VentureBeat

Businesses are going all in on AI for the holidays, but will it really make a difference?

Presented by Commercetools

Retailers have been ahead of the AI curve for a long time, embracing predictive AI algorithms earlier than most other industries. Now, in this next wave of AI, which includes generative AI and more advanced algorithms, many brands have already leaned in, testing the technology's potential. This holiday shopping season, which reached its peak during Cyber Week, marks the inflection point where the experimental phase has come to a close and we're starting to see the promised results, says Jen Jones, CMO of Commercetools.

"Our recent survey shows that AI is hitting both the bottom line and customer satisfaction," Jones says. "We learned that 91% of businesses have seen improved demand forecasting accuracy. It's critical to avoid stockouts or overstocks at their busiest time of year, when customers expect to find things and don't want to be disappointed, and retailers also don't want to be discounting overstock in January."

On the personalization and recommendation engine side, the latest iteration of AI technology shines not only in analyzing larger-than-ever data sets, full of rich customer information that used to be difficult to process in its entirety, but gets a major glow-up with functions like autonomous bots that detect patterns and make real-time decisions.

"We're finally creating those curated, personalized experiences," Jones adds. "From a customer standpoint, that's where having our data in the hands of a brand that we trust makes sense. Now we're getting something in return, with the items we want served up from the start, making for a far more seamless customer journey."

But while AI adoption is widespread, with 62% of businesses already leveraging AI and another 32% planning to implement it soon, the journey from implementation to meaningful outcomes is not always straightforward.

Prioritizing AI investments

Brands continue to keep AI and social commerce front and center of their ecommerce strategy, with 69% planning to ramp up their investments in both technologies. But AI can be an expensive proposition, especially when fully committing. To prioritize spend, Jones advises that brands home in on the customer journey, from ideation and discovery to selection, checkout and delivery, and look at how AI can transform key customer touchpoints. The whole journey is the short answer, of course. But you can rank those potential applications by examining each point at which a measurable outcome could be achieved, and consider what impact that outcome would have on the brand's ultimate goals.

That said, there are a few practical areas in which brands are seeing great success when leveraging AI, including inventory management, demand prediction, fraud detection and customer service. Fraud detection is especially critical during the holiday shopping season, when the number of transactions jumps dramatically and keeping track of unusual activity becomes even more of a challenge. To protect their organization's bottom line, 94% of businesses have added AI-enhanced fraud detection amid rising online threats.

AI-powered fraud detection harnesses what predictive AI does best — analyzing behavioral patterns and detecting anomalies and spikes in suspicious activity. Generative AI takes that a step further, taking action if it's an issue the system can handle on its own, or bumping it up the chain of command by alerting humans to a problem that needs their attention.
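The predictive half of that pipeline is classic anomaly detection. As a rough illustration only (the article names no specific tooling, and the transaction features here are invented), a minimal sketch in Python using scikit-learn's IsolationForest to flag unusual orders for human review:

```python
# Minimal anomaly-detection sketch; not Commercetools' implementation.
# Features (order amount, items per order, account age) are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Simulated normal holiday transactions: [order_amount, item_count, account_age_days]
normal = np.column_stack([
    rng.normal(80, 25, 5000),    # typical order amounts
    rng.poisson(3, 5000),        # typical basket sizes
    rng.uniform(30, 2000, 5000), # established accounts
])

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# Score an incoming transaction; -1 marks an anomaly worth escalating.
suspicious = np.array([[2500.0, 40, 1.0]])  # huge order, brand-new account
print(model.predict(suspicious))  # expected: [-1] (flagged)
```

In the two-stage setup the article describes, a generative layer would then decide whether a flagged transaction can be handled automatically or should be escalated to a human.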
Generative AI has also improved customer service chatbots to the point where they understand natural language and are significantly better at detecting intent. As a result, they're far better at handling the less complex customer issues that flood call centers during the busiest time of year, letting agents focus on more meaningful, complex problems while call volume is reduced. In turn, customer satisfaction goes up, and so does agent satisfaction.

Additionally, AI-driven personalized advertising is driving huge gains in campaign performance for 93% of retailers. In an era of mounting advertising costs, anything that optimizes advertising outcomes directly leads to greater revenue. From a consumer standpoint, better, more personalized, more targeted ads take frustration out of the shopping experience and can make customers feel more seen.

And then there are the innovations — for example, Commercetools customer Sephora launched its Color IQ foundation-matching service, which helps customers find the right foundation shade and the brand that's right for them. On the B2B side, Dawn Foods is using AI-powered search capabilities to help customers navigate a growing product catalog and turn up results more quickly and more accurately, as well as offer up useful new selections to improve customer relationships.

Why barriers to AI value remain

Moving from experimentation to successful outcomes has proven to be a challenge for some brands. One of the major barriers is navigating the gap between the demands of cutting-edge AI technology and commerce software that in some cases has been around for decades.

"Many brands are still using software that was built for a different world," Jones says. "But you need to be able to seamlessly integrate your AI technology into your platform, and you must be able to immediately react to data, stay agile and make changes on the fly. Otherwise, AI just won't live up to your expectations."

That's where having a modern commerce platform approach, including composable commerce, sets companies up for success, she adds. Another recent Commercetools survey found that 90% of businesses that switch to modern ecommerce platforms report significant boosts in sales and revenue. Of those platforms, composable commerce has emerged as a frontrunner for brands: 91% are already using or considering it, and 92% cite increased agility as the most important adoption factor.

"Composable commerce, at its heart, is about being ready for whatever's next," Jones says. "AI is a great example of a technology that broke into the limelight, and brands with flexible, more modern architecture were able to jump on it early, do those early experiments and be ready for this year. They're seeing benefits both on the operational efficiency side of things, but also on the customer experience and loyalty side. I expect they'll have a standout holiday shopping season compared to their peers on

Businesses are going all in on AI for the holidays, but will it really make a difference? Read More »

Enterprise AI gets closer to data with Couchbase’s new Capella AI services

Database platform developer Couchbase is looking to solve an increasingly common problem for enterprise AI deployments: how to get data closer to AI as quickly and securely as possible. The end goal is to make it simpler and more operationally efficient to build and deploy enterprise AI.

Couchbase today announced Capella AI Services, a suite of capabilities designed to help enterprises build and deploy AI applications while maintaining data security and streamlining development workflows. Among the new offerings is the model service, for secure hosting of AI models within organizational boundaries. The vectorization service automates vector operations for efficient AI processing. AI functions simplify AI integration through SQL++ queries, while the new agent catalog centralizes AI development resources and templates.

The announcement comes as organizations grapple with integrating AI into their existing applications while managing concerns about data privacy, operational complexity and development efficiency. According to the company, Capella AI Services will enable enterprises to build and deploy AI applications more efficiently and with lower latency, leading to improved business outcomes.

This expansion builds on Couchbase's existing strengths in NoSQL database technology and its cloud-to-edge capabilities. Couchbase is among the early pioneers of the NoSQL database world, and the company went public in 2021. Over the past year, it has increasingly focused on building out vector database capabilities, including an assistive gen AI feature known as Capella iQ in 2023 and expanded vector search this year.

"We're focusing on building a developer data platform for critical applications in our AI world today," Matt McDonough, SVP of product and partners at Couchbase, told VentureBeat. "Traditional applications are designed for humans to input data. AI really flips that on its head; the emphasis moves from the UI or front-end application to the database, and making it as efficient as possible for AI agents to work with."

How Couchbase aims to differentiate in an increasingly crowded database market

As has been the case in the database market for decades, there is a healthy amount of competition. Just as NoSQL database capabilities have become increasingly common, the same is now true of vector database functionality. NoSQL vendors such as MongoDB, DataStax and Neo4j, as well as traditional database vendors like Oracle, all have vector capabilities today.

"Everyone has vector capabilities today, I think that's probably an accurate statement," McDonough admitted. That said, he noted that even before the new Capella AI Services, Couchbase aimed to have a somewhat differentiated offering. In particular, Couchbase has long had mobile and edge deployment capabilities. The database also provides in-memory capabilities that help accelerate all types of queries, including vector search.

Couchbase is also notable for its SQL++ query language. SQL++ allows developers to query and manipulate JSON data stored in Couchbase using familiar SQL syntax, bridging the gap between relational and NoSQL data models. With the new Capella AI Services, SQL++ functionality is being extended to make it easier for application developers to directly query AI models with standard database queries.

Mohan Varthakavi, VP of software development, AI and edge at Couchbase, explained to VentureBeat that AI functions enable developers to easily execute common AI operations on their data. For example, he noted, an organization might already have a large volume of data in Couchbase. With the new AI functions, the organization can simply use SQL++ to summarize data, or execute any other AI function directly on the data, without needing to host a separate AI model, connect data stores or learn different syntax.
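The article doesn't show the query syntax, but the shape of the idea can be sketched. Below is a hypothetical Python example using the official couchbase SDK to run a SQL++ query; the ai_summarize() function name and the connection details are invented for illustration and are not a documented Capella API:

```python
# Hypothetical sketch only: ai_summarize() is an invented stand-in for the
# kind of AI function Capella AI Services exposes through SQL++; it is not
# a documented API. Connection details are placeholders.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster(
    "couchbases://cb.example.cloud.couchbase.com",
    ClusterOptions(PasswordAuthenticator("app_user", "app_password")),
)

# An AI function invoked inline, next to the data, with plain SQL++,
# with no separate model host and no new client syntax to learn.
rows = cluster.query(
    """
    SELECT r.product_id, ai_summarize(r.review_text) AS summary
    FROM reviews r
    WHERE r.rating <= 2
    LIMIT 10
    """
)
for row in rows:
    print(row["product_id"], row["summary"])
```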
How Capella AI brings semantic context to accelerate enterprise deployments

The new Capella AI Services suite introduces several key components that address common enterprise AI challenges. One of them is the model service, which addresses enterprise security concerns by enabling AI model hosting within organizational boundaries. A model can be hosted, for example, within the same virtual private cloud (VPC).

"Our customers consistently told us that they are concerned about data going across the wire to foundational models sourced outside," Varthakavi said.

The service supports both open-source models and commercial offerings, with value-added features including request batching and semantic caching. Varthakavi explained that semantic caching provides the ability to cache not just the literal responses to queries, but the semantic meaning and context behind those responses. By caching semantically relevant responses, Couchbase can provide more contextual and meaningful information to the AI models or applications consuming the data. Semantic caching can also reduce the number of calls made to AI models, as Couchbase can often serve relevant responses from its own cache, lowering the operational costs and latency associated with calling AI services.

McDonough emphasized that the core focus for Couchbase with the new AI services is to make it simpler for developers to build, test and deploy AI without having to use a collection of different platforms. "Ultimately we believe that is going to reduce latency and operational cost, by keeping these models and the data together throughout the entire software development life cycle for AI applications," he said.
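Mechanically, a semantic cache keys on meaning rather than exact strings: embed each query, and if a past query is close enough in embedding space, reuse its stored answer instead of calling the model. A minimal generic sketch of that idea (not Couchbase's implementation; embed() is a placeholder for a real sentence encoder):

```python
# Generic semantic-cache sketch; not Couchbase's implementation.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: substitute a real sentence-embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []

    def get(self, query: str) -> str | None:
        q = embed(query)
        for vec, answer in self.entries:
            # Cosine similarity (vectors are unit-normalized).
            if float(np.dot(q, vec)) >= self.threshold:
                return answer  # close enough in meaning: skip the model call
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))
```

A production cache would use a real embedding model and a vector index rather than a linear scan, but the cost-saving logic is the same: a cache hit means no round trip to the model.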

Enterprise AI gets closer to data with Couchbase’s new Capella AI services Read More »

Sakana AI’s CycleQD outperforms traditional fine-tuning methods for multi-skill language models

Researchers at Sakana AI have developed a resource-efficient framework that can create hundreds of language models specializing in different tasks. Called CycleQD, the technique uses evolutionary algorithms to combine the skills of different models without the need for expensive and slow training processes. CycleQD can create swarms of task-specific agents that offer a more sustainable alternative to the current paradigm of increasing model size.

Rethinking model training

Large language models (LLMs) have shown remarkable capabilities in various tasks. However, training LLMs to master multiple skills remains a challenge. When fine-tuning models, engineers must balance data from different skills and ensure that one skill doesn't dominate the others. Current approaches often involve training ever-larger models, which leads to increasing computational demands and resource requirements.

"We believe rather than aiming to develop a single large model to perform well on all tasks, population-based approaches to evolve a diverse swarm of niche models may offer an alternative, more sustainable path to scaling up the development of AI agents with advanced capabilities," the Sakana researchers write in a blog post.

To create populations of models, the researchers took inspiration from quality diversity (QD), an evolutionary computing paradigm that focuses on discovering a diverse set of solutions from an initial population sample. QD aims to create specimens with various "behavior characteristics" (BCs), which represent different skill domains. It achieves this through evolutionary algorithms (EAs) that select parent examples and use crossover and mutation operations to create new samples.

Quality Diversity (source: Sakana AI)

CycleQD

CycleQD incorporates QD into the post-training pipeline of LLMs to help them learn new, complex skills. CycleQD is useful when you have multiple small models that have been fine-tuned for very specific skills, such as coding or performing database and operating system operations, and you want to create new variants that have different combinations of those skills.

In the CycleQD framework, each of these skills is considered a behavior characteristic or a quality that the next generation of models is optimized for. In each generation, the algorithm focuses on one specific skill as its quality metric while using the other skills as BCs. "This ensures every skill gets its moment in the spotlight, allowing the LLMs to grow more balanced and capable overall," the researchers explain.

CycleQD (source: Sakana AI)

CycleQD starts with a set of expert LLMs, each specialized in a single skill. The algorithm then applies "crossover" and "mutation" operations to add new, higher-quality models to the population. Crossover combines the characteristics of two parent models to create a new model, while mutation makes random changes to the model to explore new possibilities.

The crossover operation is based on model merging, a technique that combines the parameters of two LLMs to create a new model with combined skills. This is a cost-effective and quick method for developing well-rounded models without the need to fine-tune them. The mutation operation uses singular value decomposition (SVD), a factorization method that breaks down any matrix into simpler components, making it easier to understand and manipulate its elements. CycleQD uses SVD to break down the model's skills into fundamental components, or sub-skills. By tweaking these sub-skills, the mutation process creates models that explore new capabilities beyond those of their parent models. This helps the models avoid getting stuck in predictable patterns and reduces the risk of overfitting.
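At the weight level, the two operators are simple to picture: crossover interpolates parent parameters (model merging), and SVD-based mutation perturbs a weight matrix's singular values. A toy NumPy sketch of both, shown on a single weight matrix as a simplification of the paper's actual operators:

```python
# Toy illustration of CycleQD-style operators on one weight matrix.
# Real model merging and SVD mutation operate across all model parameters
# and are more sophisticated than this sketch.
import numpy as np

rng = np.random.default_rng(0)
parent_a = rng.normal(size=(64, 64))  # stand-in for one expert's weights
parent_b = rng.normal(size=(64, 64))  # stand-in for a second expert's weights

def crossover(wa: np.ndarray, wb: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Model merging as linear interpolation of parameters."""
    return alpha * wa + (1.0 - alpha) * wb

def mutate_svd(w: np.ndarray, scale: float = 0.05) -> np.ndarray:
    """Decompose into singular components ("sub-skills"), jitter their
    strengths, and recompose."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    s_perturbed = s * (1.0 + scale * rng.normal(size=s.shape))
    return (u * s_perturbed) @ vt  # equivalent to u @ diag(s') @ vt

child = mutate_svd(crossover(parent_a, parent_b))
print(child.shape)  # (64, 64)
```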
Evaluating CycleQD's performance

The researchers applied CycleQD to a set of Llama 3-8B expert models fine-tuned for coding, database operations and operating system operations. The goal was to see if the evolutionary method could combine the skills of the three models to create a superior model.

The results showed that CycleQD outperformed traditional fine-tuning and model merging methods across the evaluated tasks. Notably, a model fine-tuned on all the datasets combined performed only marginally better than the single-skill expert models, despite being trained on more data. Moreover, the traditional training process is much slower and more expensive. CycleQD was also able to create various models with different performance levels on the target tasks. "These results clearly show that CycleQD outperforms traditional methods, proving its effectiveness in training LLMs to excel across multiple skills," the researchers write.

CycleQD vs other fine-tuning methods (source: Sakana AI)

The researchers believe that CycleQD has the potential to enable lifelong learning in AI systems, allowing them to continuously grow, adapt and accumulate knowledge over time. This can have direct implications for real-world applications. For example, CycleQD can be used to continuously merge the skills of expert models instead of training a large model from scratch. Another exciting direction is the development of multi-agent systems, where swarms of specialized agents evolved through CycleQD can collaborate, compete and learn from one another.

"From scientific discovery to real-world problem-solving, swarms of specialized agents could redefine the limits of AI," the researchers write.

Sakana AI’s CycleQD outperforms traditional fine-tuning methods for multi-skill language models Read More »

AWS now allows prompt caching with 90% cost reduction

The usage of AI continues to expand, and as more enterprises integrate AI tools into their workflows, many are looking for ways to cut the costs associated with running AI models.

To answer that demand, AWS announced two new capabilities on Bedrock that cut the cost of running AI models and applications, both of which are already available on competitor platforms. During a keynote at AWS re:Invent, Swami Sivasubramanian, vice president for AI and data at AWS, announced Intelligent Prompt Routing on Bedrock and the arrival of prompt caching.

Intelligent Prompt Routing helps customers direct prompts to the right-sized model, so a big model doesn't answer a simple query. "Developers need the right models for their applications, which is why we offer a diverse set of options," Sivasubramanian said. AWS said Intelligent Prompt Routing "can reduce costs by up to 30% without compromising on accuracy." Users choose a model family, and Bedrock's Intelligent Prompt Routing pushes prompts to the right-sized models within that family.

Moving prompts through different models to optimize usage and cost has slowly gained prominence in the AI industry; startup Not Diamond announced its smart routing feature in July. Voice agent company Argo Labs, an AWS customer, said it uses Intelligent Prompt Routing to ensure correctly sized models handle different customer inquiries. Simple yes-or-no questions like "Do you have a reservation?" are managed by a smaller model, while more complicated ones like "What vegan options are available?" are routed to a bigger one.

Caching prompts

AWS also announced that Bedrock will now support prompt caching, where Bedrock can keep common or repeated prompts without pinging the model and generating more tokens. "Token generation costs can quickly rise, especially when prompts are frequently repeated," Sivasubramanian said. "We wanted to give customers an easy way to dynamically cache prompts without sacrificing accuracy." AWS said prompt caching reduces costs "by up to 90% and latency by up to 85% for supported models."

However, AWS is a little late to this trend. Prompt caching has been available on other platforms to help users cut costs when reusing prompts: Anthropic offers prompt caching for Claude 3.5 Sonnet and Haiku on its API, and OpenAI has also expanded prompt caching for its API.
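In practice, this kind of caching works by marking where the reusable prefix of a prompt ends, so subsequent calls reuse the cached portion. A rough sketch with boto3's Converse API follows; the cachePoint content block reflects AWS's announced mechanism as best I understand it, so treat the exact field names and the model ID as assumptions to verify against current Bedrock documentation:

```python
# Hedged sketch of Bedrock prompt caching via the Converse API.
# The cachePoint block and model ID are assumptions; verify against AWS docs.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

long_shared_context = "...thousands of tokens of menu data, policies, FAQs..."

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    system=[
        {"text": long_shared_context},
        {"cachePoint": {"type": "default"}},  # everything above is cacheable
    ],
    messages=[
        {"role": "user", "content": [{"text": "What vegan options are available?"}]}
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```

Repeat calls that share the same prefix before the cache point are the ones that benefit; the per-user question after it is still processed fresh each time.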
Using AI models can be expensive

Running AI applications remains expensive, not just because of the cost of training models but because of the cost of using them. Enterprises have said the cost of using AI is still one of the biggest barriers to broader deployment. As enterprises move toward agentic use cases, there is still a cost every time a user pings the model for an agent to start doing its tasks. Methods like prompt caching and intelligent routing may help cut costs by limiting when a prompt pings a model API to answer a query. Model developers, though, have said that as adoption grows, some model prices could fall; OpenAI has said it anticipates AI costs could come down soon.

More models

AWS, which hosts many models from Amazon — including its new Nova models — and leading open-source providers, will add new models on Bedrock, including models from Poolside, Stability AI's Stable Diffusion 3.5 Large and Luma's Ray 2. The models are expected to launch on Bedrock soon.

Luma CEO and co-founder Amit Jain told VentureBeat that AWS is the company's first cloud provider partner to host its models. Jain said the company used Amazon's SageMaker HyperPod when building and training Luma's models. "The AWS team had engineers who felt like part of our team because they were helping us figure out issues. It took us almost a week or two to bring our models to life," Jain said.

AWS now allows prompt caching with 90% cost reduction Read More »

Calling all gen AI disruptors of the enterprise! Apply now to present at Transform 2025

The Innovation Showcase is back at Transform: The Orchestration of Enterprise Agentic AI at Scale, in June 2025 in San Francisco. We're on the hunt for the 10 generative AI products most likely to disrupt the enterprise. If you think your technology fits the bill, we'd like to invite you to present its impact on the main stage at Transform.

Those selected will present in front of an invite-only audience of 400 industry decision-makers and will receive direct feedback from a panel of enterprise tech analysts, brand executives and others. Every presenter will receive exclusive editorial coverage from VentureBeat, getting your company out in front of our millions of monthly readers.

Who should apply? Dynamic companies with compelling new gen AI technologies eager to present on stage. All finalists will be winners (since we're being selective). We will award winners in three categories: most likely to succeed, best technology and best presentation style. In total, up to 10 candidates will be selected from what is sure to be a multitude of qualified applicants. Candidates must offer new enterprise AI solutions, and we will select five early-stage companies (seed to early Series A, having raised $50M or less) and five later-stage startups (Series A or later, having raised more than $50M). If you represent a unit that is part of a mature, large company, please enter as later-stage.

If you have a story to tell and an AI product or service that delivers real business results and use cases, please submit your application by 5 p.m. PT on March 31, 2025. Read about the winners from VB Transform 2024: SambaNova, Instabase and Tabnine.

Calling all gen AI disruptors of the enterprise! Apply now to present at Transform 2025 Read More »

Hume launches Voice Control allowing users and developers to make custom AI voices

Hume AI, the startup specializing in emotionally intelligent voice interfaces, has launched Voice Control, an experimental feature that empowers developers and users to create custom AI voices through precise modulation of vocal characteristics — no coding, AI prompt engineering or sound design skills required.

This release builds on the foundation laid by the company's earlier Empathic Voice Interface 2 (EVI 2), which introduced advanced capabilities in naturalness, emotional responsiveness and customization. Both EVI 2 and Voice Control avoid the risks of voice cloning, a practice that Hume co-founder and CEO Alan Cowen has said carries ethical and practical challenges. Instead, Hume focuses on providing tools for creating unique, expressive voices that align with user needs, such as customer service chatbots, digital assistants, tutors, guides or accessibility features.

Moving beyond preset AI voices toward custom bespoke solutions

Voice Control offers developers the ability to adjust voices along 10 distinct dimensions:

• "Masculine/Feminine: The vocalization of gender, ranging between more masculine and more feminine.
• Assertiveness: The firmness of the voice, ranging between timid and bold.
• Buoyancy: The density of the voice, ranging between deflated and buoyant.
• Confidence: The assuredness of the voice, ranging between shy and confident.
• Enthusiasm: The excitement within the voice, ranging between calm and enthusiastic.
• Nasality: The openness of the voice, ranging between clear and nasal.
• Relaxedness: The stress within the voice, ranging between tense and relaxed.
• Smoothness: The texture of the voice, ranging between smooth and staccato.
• Tepidity: The liveliness behind the voice, ranging between tepid and vigorous.
• Tightness: The containment of the voice, ranging between tight and breathy."

This no-code tool allows users to fine-tune voice attributes in real time through virtual onscreen sliders. It's currently available in Hume's virtual playground, which requires a free user sign-up to access.

The release addresses a key pain point in the AI industry: preset voices often fail to meet the specific needs of brands or applications, while voice cloning carries its own risks. This focus on customization aligns with Hume's broader goal of developing emotionally nuanced voice AI. The company's efforts were highlighted in September 2024 with the launch of EVI 2, which Hume described as a significant upgrade to its predecessor: EVI 2 improved latency by 40%, reduced costs by 30% and expanded voice modulation features, offering developers a safer alternative to voice cloning.

Sliders > text prompts

Hume's research-driven approach plays a central role in its product development. The company, whose co-founder Cowen previously worked at Google DeepMind, uses a proprietary model based on cross-cultural voice recordings paired with emotional survey data. This methodology, rooted in emotion science, forms the backbone of both EVI 2 and the newly launched Voice Control.

Voice Control extends these principles by addressing the granular, often ineffable ways humans perceive voices. The tool's slider-based interface reflects common perceptual qualities of voice, such as buoyancy or assertiveness, without attempting to oversimplify these attributes through text-based prompts.
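As a mental model only, a Voice Control setting is essentially a vector of slider positions over those 10 dimensions. The sketch below is hypothetical Python, not Hume's SDK; the dimension keys mirror the list above, and the normalized value range around a base voice is an assumption:

```python
# Hypothetical representation of a Voice Control configuration.
# This is NOT Hume's API; dimension names mirror the article, and the
# value range (-1.0 to 1.0 around a base voice) is an assumption.
from dataclasses import dataclass, field

DIMENSIONS = [
    "masculine_feminine", "assertiveness", "buoyancy", "confidence",
    "enthusiasm", "nasality", "relaxedness", "smoothness",
    "tepidity", "tightness",
]

@dataclass
class VoiceConfig:
    base_voice: str
    sliders: dict[str, float] = field(default_factory=dict)

    def set(self, dimension: str, value: float) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        # Clamp to the assumed slider range.
        self.sliders[dimension] = max(-1.0, min(1.0, value))

voice = VoiceConfig(base_voice="default_base")  # placeholder base voice name
voice.set("assertiveness", 0.7)   # bolder
voice.set("nasality", -0.4)       # clearer
print(voice.sliders)
```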
Voice Control is immediately available in beta and integrates with Hume's Empathic Voice Interface (EVI), making it accessible for a wide range of applications. Developers can select a base voice, adjust its characteristics and preview the results in real time. The process ensures reproducibility and stability across sessions, key features for real-time applications like customer service bots or virtual assistants.

EVI 2's influence is evident in Voice Control's capabilities. The earlier model introduced features like in-conversation prompts and multilingual capabilities, which have broadened the scope of voice AI applications. For example, EVI 2 supports sub-second response times, enabling natural and immediate conversations. It also allows dynamic adjustments to speaking style during interactions, making it a versatile tool for businesses.

Differentiating in a competitive market

Hume's focus on voice customization and emotional intelligence positions it as a strong competitor in the voice AI space, even against well-funded rivals such as OpenAI, with its Advanced Voice Mode, and ElevenLabs, both of which offer libraries of preset voices.

Hume continues to build on this approach: plans for expanding Voice Control include introducing additional modifiable dimensions, refining voice quality under extreme adjustments and increasing the range of base voices available. With the launch of Voice Control, Hume reinforces its position as a leader in voice AI innovation, offering tools that prioritize customization, emotional intelligence and real-time adaptability. Developers can access Voice Control today via Hume's platform, marking another step forward in the evolution of AI-driven voice solutions.

Hume launches Voice Control allowing users and developers to make custom AI voices Read More »

Meta launches open source Llama 3.3, shrinking powerful bigger model into smaller size

Meta's VP of generative AI, Ahmad Al-Dahle, took to rival social network X today to announce the release of Llama 3.3, the latest open-source multilingual large language model (LLM) from the parent company of Facebook, Instagram, WhatsApp and Quest VR. As he wrote: "Llama 3.3 improves core performance at a significantly lower cost, making it even more accessible to the entire open-source community."

With 70 billion parameters — or settings governing the model's behavior — Llama 3.3 delivers results on par with Meta's 405B-parameter Llama 3.1 model from this summer, but at a fraction of the cost and computational overhead, e.g., the GPU capacity needed to run the model at inference. It's designed to offer top-tier performance and accessibility in a smaller package than prior foundation models.

Llama 3.3 is offered under the Llama 3.3 Community License Agreement, which grants a non-exclusive, royalty-free license for use, reproduction, distribution and modification of the model and its outputs. Developers integrating Llama 3.3 into products or services must include appropriate attribution, such as "Built with Llama," and adhere to an Acceptable Use Policy that prohibits activities like generating harmful content, violating laws or enabling cyberattacks. While the license is generally free, organizations with over 700 million monthly active users must obtain a commercial license directly from Meta.

A statement from the AI at Meta team underscores this vision: "Llama 3.3 delivers leading performance and quality across text-based use cases at a fraction of the inference cost."

How much savings are we talkin' about, really?

Some back-of-the-envelope math: Llama 3.1-405B requires between 243 GB and 1,944 GB of GPU memory, depending on precision, according to the Substratus blog (an open-source cross-cloud substrate project). The older Llama 2-70B requires between 42 and 168 GB of GPU memory by the same estimates, though some have claimed it can run on as little as 4 GB, or, as Exo Labs has shown, on a few Mac computers with M4 chips and no discrete GPUs.

Therefore, if the GPU savings for lower-parameter models hold up here, those looking to deploy Meta's most powerful open-source Llama models can expect to save up to nearly 1,940 GB of GPU memory, or roughly 24 fewer standard 80 GB Nvidia H100 GPUs. At an estimated $25,000 per H100 GPU, that's up to $600,000 in up-front GPU cost savings, potentially — not to mention the continuous power costs.
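Those Substratus figures follow a common rule of thumb: parameters times 4 bytes, scaled by the quantization bit width relative to 32-bit, plus roughly 20% overhead. A quick check in Python reproduces the article's numbers:

```python
# GPU memory rule of thumb: M = params * 4 bytes * (bits / 32) * 1.2 overhead
# (the formula popularized by the Substratus blog cited above).

def gpu_memory_gb(params_billions: float, quant_bits: int) -> float:
    return params_billions * 4 * (quant_bits / 32) * 1.2

# Llama 3.1-405B at full 32-bit precision vs. 4-bit quantization:
print(gpu_memory_gb(405, 32))  # 1944.0 GB, the article's upper bound
print(gpu_memory_gb(405, 4))   # 243.0 GB, the article's lower bound

# A 70B model lands far lower at the same precisions:
print(gpu_memory_gb(70, 32))   # 336.0 GB
print(gpu_memory_gb(70, 4))    # 42.0 GB
```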
A highly performant model in a small form factor

According to Meta AI on X, the Llama 3.3 model handily outperforms the identically sized Llama 3.1-70B, as well as Amazon's new Nova Pro model, on several benchmarks covering multilingual dialogue, reasoning and other advanced natural language processing (NLP) tasks (Nova outperforms it on HumanEval coding tasks).

Llama 3.3 was pretrained on 15 trillion tokens from "publicly available" data and fine-tuned on over 25 million synthetically generated examples, according to the information Meta provided in the "model card" posted on its website. The model's development consumed 39.3 million GPU hours on H100-80GB hardware, which Meta says reflects its commitment to energy efficiency and sustainability.

Llama 3.3 leads in multilingual reasoning tasks with a 91.1% accuracy rate on MGSM, demonstrating its effectiveness in supporting languages such as German, French, Italian, Hindi, Portuguese, Spanish and Thai, in addition to English.

Cost-effective and environmentally conscious

Llama 3.3 is specifically optimized for cost-effective inference, with token generation costs as low as $0.01 per million tokens. This makes the model highly competitive against industry counterparts like GPT-4 and Claude 3.5, with greater affordability for developers seeking to deploy sophisticated AI solutions.

Meta has also emphasized the environmental responsibility of this release. Despite the intensive training process, the company leveraged renewable energy to offset greenhouse gas emissions, resulting in net-zero emissions for the training phase. Location-based emissions totaled 11,390 tons of CO2-equivalent, but Meta's renewable energy initiatives ensured net-zero emissions overall.

Advanced features and deployment options

The model introduces several enhancements, including a longer context window of 128k tokens (comparable to GPT-4o, or about 400 pages of book text), making it suitable for long-form content generation and other advanced use cases. Its architecture incorporates grouped query attention (GQA), improving scalability and performance during inference.

Designed to align with user preferences for safety and helpfulness, Llama 3.3 uses reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). This alignment ensures robust refusals of inappropriate prompts and assistant-like behavior optimized for real-world applications.

Llama 3.3 is already available for download through Meta, Hugging Face, GitHub and other platforms, with integration options for researchers and developers. Meta is also offering resources like Llama Guard 3 and Prompt Guard to help users deploy the model safely and responsibly.
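For developers pulling it from Hugging Face, loading the instruct variant looks like any other Llama checkpoint. A minimal sketch with the transformers library, assuming you have accepted Meta's license on the Hub and have enough GPU memory (roughly 168 GB at 16-bit by the rule of thumb above):

```python
# Minimal sketch: running Llama 3.3 70B Instruct via Hugging Face transformers.
# Assumes gated-model access has been granted and sufficient GPU memory.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,  # 16-bit weights
    device_map="auto",           # shard across available GPUs
)

messages = [
    {"role": "user", "content": "Summarize the Llama 3.3 license in two sentences."}
]
result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```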

Meta launches open source Llama 3.3, shrinking powerful bigger model into smaller size Read More »

Microsoft Copilot Vision is here, letting AI see what you do online

Microsoft Copilot is getting smarter by the day. The Satya Nadella-led company has just announced that its AI assistant now has 'vision' capabilities that enable it to browse the internet with users. While the feature was first announced in October this year, the company is now previewing it with a select set of Pro subscribers. According to Microsoft, these users will be able to trigger Copilot Vision on webpages opened in their Edge browser and interact with it about the contents visible on the screen.

The feature is still in the early stages of development and fairly restricted, but once fully evolved, it could prove to be a game-changer for Microsoft's enterprise customers, helping them with analysis and decision-making as they interact with products in the company's ecosystem (OneDrive, Excel, SharePoint, etc.). In the long run, it will also be interesting to see how Copilot Vision fares against more open and capable agentic offerings, such as those from Anthropic and Emergence AI, that allow developers to integrate agents that see, reason and take actions across applications from different vendors.

What to expect with Copilot Vision?

When a user opens a website, they may or may not have an intended goal. When they do, like researching for an academic paper, executing the task means going through the website, reading all its content and then making a call on it (such as whether the site's content should be used as a reference for the paper). The same applies to other day-to-day web tasks like shopping.

With the new Copilot Vision experience, Microsoft aims to make this entire process simpler. Essentially, the user now has an assistant that sits at the bottom of their browser and can be called upon whenever needed to read the contents of the website, covering all the text and images, and help with decision-making. It can immediately scan, analyze and provide all the required information, considering the user's intended goal — just like a second set of eyes.

The capability has far-reaching benefits — it can speed up workflows considerably — as well as major implications, given the agent is reading and assessing whatever you're browsing. However, Microsoft has said that all the context and information shared by users is deleted as soon as a Vision session is closed, and that websites' data is not captured or stored for training the underlying models. "In short, we're prioritizing copyright, creators, and our user's privacy and safety – and are putting them all first," the Copilot team wrote in a blog post announcing the preview.

Expansion based on feedback

Currently, a select set of Copilot Pro subscribers in the US who have signed up for the early-access Copilot Labs program will be able to use vision capabilities in their Edge browser. The capability is opt-in, which means users don't have to worry about AI reading their screens all the time. Further, at this stage, it will only work with select websites. Microsoft says it will take feedback from early users and gradually improve the capability while expanding support to more Pro users and other websites. In the long run, the company may even expand these capabilities to other products in its ecosystem, such as OneDrive and Excel, allowing enterprise users to work and make decisions more easily.
However, there's no official confirmation yet, and given the cautious approach signaled here, that may take some time to become a reality.

Microsoft's move to launch Copilot Vision's preview comes at a time when competitors are pushing the bar in the agentic AI space. Salesforce has already rolled out Agentforce across its Customer 360 offerings to automate workflows across domains like sales, marketing and service. Meanwhile, Anthropic has launched Computer Use, which allows developers to integrate Claude so it can interact with a computer desktop environment, performing tasks that were previously handled only by human workers, such as opening applications, interacting with interfaces and filling out forms.

Microsoft Copilot Vision is here, letting AI see what you do online Read More »

AWS Bedrock upgrades to add model teaching, hallucination detector

AWS announced more updates for Bedrock aimed at spotting hallucinations and building smaller models faster, as enterprises demand more customization and accuracy from models. During re:Invent 2024, AWS announced Amazon Bedrock Model Distillation and Automated Reasoning checks, in preview for enterprise customers interested in training smaller models and catching hallucinations.

Amazon Bedrock Model Distillation lets users employ a larger AI model to train a smaller one, giving enterprises access to a model they feel would work best with their workload. Larger models, such as Llama 3.1 405B, have more knowledge but are slow and unwieldy. A smaller model responds faster but usually has more limited knowledge. AWS said Bedrock Model Distillation streamlines the process of transferring a bigger model's knowledge to a smaller one without sacrificing response time.

Users select the heavier-weight model they want, find a smaller model within the same family (families like Llama and Claude offer a range of model sizes), and write out sample prompts. Bedrock generates responses, fine-tunes the smaller model and continues producing more sample data to finish distilling the larger model's knowledge. Right now, model distillation works with Anthropic, Amazon and Meta models, and Bedrock Model Distillation is currently in preview.
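Under the hood, this is the familiar teacher-student recipe: the large model labels prompts, and the small model is fine-tuned on those outputs. A generic sketch of the idea in Python (this is not the Bedrock API; teacher_generate and finetune_student are placeholder callables standing in for a large model's inference endpoint and a small model's fine-tuning job):

```python
# Generic response-based distillation sketch; not the Bedrock API.
from typing import Callable

def distill(
    prompts: list[str],
    teacher_generate: Callable[[str], str],
    finetune_student: Callable[[list[tuple[str, str]]], None],
) -> None:
    # 1. The teacher labels every sample prompt with a high-quality response.
    training_pairs = [(p, teacher_generate(p)) for p in prompts]
    # 2. The student is fine-tuned to imitate the teacher's responses.
    finetune_student(training_pairs)

# Toy stand-ins so the sketch runs end to end:
distill(
    prompts=["Do you have a reservation?", "What vegan options are available?"],
    teacher_generate=lambda p: f"(teacher answer to: {p})",
    finetune_student=lambda pairs: print(f"fine-tuning on {len(pairs)} pairs"),
)
```

Bedrock's value-add, per AWS, is automating the middle steps: generating the responses, producing additional sample data and running the fine-tuning without manual ML work.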
Why enterprises are interested in model distillation

For enterprises that want a faster-responding model, such as one that can quickly answer customer questions, there is a balance to strike between knowing a lot and responding quickly. While they could simply choose a smaller version of a large model, AWS is banking on enterprises wanting more customization in the models, both large and small, that they use. AWS, which offers a choice of models in Bedrock's model garden, hopes enterprises will want to choose any model family and train a smaller model for their needs.

Many organizations, mostly model providers, use model distillation to train smaller models, but AWS said the process usually demands considerable machine-learning expertise and manual fine-tuning. Model providers such as Meta have used distillation to bring a broader knowledge base to a smaller model, and Nvidia leveraged distillation and pruning techniques to make Llama 3.1-Minitron 4B, a small language model it says performs better than similarly sized models. Model distillation is not new for Amazon, which has been working on distillation methods since 2020.

Catching factual errors faster

Hallucinations remain an issue for AI models, even though enterprises have created workarounds like fine-tuning and limiting what models will respond to. Even the most fine-tuned model that only performs retrieval-augmented generation (RAG) tasks against a fixed data set can still make mistakes. AWS's answer is Automated Reasoning checks on Bedrock, which use mathematical validation to prove that a response is correct.

"Automated Reasoning checks is the first and only generative AI safeguard that helps prevent factual errors due to hallucinations using logically accurate and verifiable reasoning," AWS said. "By increasing the trust that customers can place in model responses, Automated Reasoning checks opens generative AI up to new use cases where accuracy is paramount."

Customers can access Automated Reasoning checks from Amazon Bedrock Guardrails, the product that brings responsible AI and fine-tuning to models. Researchers and developers have long used automated reasoning to obtain precise, verifiable answers to complex problems. Users upload their data, Bedrock develops the rules for the model to follow, and the service guides customers to ensure the model is tuned to them. Once that is set up, Automated Reasoning checks verify the responses from the model; if the model returns an incorrect answer, Bedrock suggests a new one. AWS CEO Matt Garman said during his keynote that automated checks ensure an enterprise's data remains its differentiator, with its AI models reflecting that data accurately.

AWS Bedrock upgrades to add model teaching, hallucination detector Read More »

Amazon launches Nova AI model family for generating text, images and videos

As one of the biggest tech companies in the world, Amazon has so far focused its generative AI efforts mainly on building out developer tools and platforms, as well as providing significant funding for the startup Anthropic. But no longer: as announced today by CEO Andy Jassy at the annual Amazon Web Services (AWS) re:Invent conference, the e-commerce giant is fielding a whole new AI model family called Nova that allows users to generate text, images and videos, pitting it right up against the likes of OpenAI, Google and even its own investment Anthropic.

Several of the new models, including the text, image and video offerings, are available now, though you'll need an Amazon Bedrock account to access them. A speech-to-speech audio generation model is said to be coming in 2025.

Super nova

The Amazon Nova suite introduces several models tailored to specific use cases, all supporting more than 200 languages:

• Amazon Nova Micro: A text-only model optimized for low-latency responses at minimal cost.
• Amazon Nova Lite: A multimodal model offering fast processing for text, images and videos at a very low cost.
• Amazon Nova Pro: A multimodal model combining accuracy, speed and cost-efficiency, designed for a wide range of tasks.
• Amazon Nova Premier: The most advanced multimodal model for complex reasoning tasks and for distilling custom models (launching in Q1 2025).
• Amazon Nova Canvas: An advanced image generation model for creative content development.
• Amazon Nova Reel: A state-of-the-art video generation model offering dynamic capabilities.

All models support fine-tuning and knowledge distillation, allowing customers to tailor AI tools to their proprietary data for improved accuracy and performance. These models also excel at supporting retrieval-augmented generation (RAG), which grounds outputs in specific organizational data to enhance reliability.
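Because the Nova models sit behind Bedrock's unified API, invoking one looks the same as invoking any other Bedrock model. A rough sketch with boto3's Converse API follows; the amazon.nova-lite-v1:0 model ID matches AWS's announced naming as best I can tell, but treat it as an assumption to verify:

```python
# Sketch of calling a Nova model through the Bedrock Converse API.
# The model ID is an assumption based on AWS's announced naming; verify it.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Write a one-line product tagline for a travel mug."}],
        }
    ],
    inferenceConfig={"maxTokens": 100, "temperature": 0.7},
)
print(response["output"]["message"]["content"][0]["text"])
```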
An image canvas and complex camera controls

The Nova Canvas and Reel models highlight Amazon's push into creative content generation:

• Nova Canvas: Users can edit images through natural language text prompts and adjust layouts or color schemes. Built-in safety measures, such as watermarking and content moderation, are meant to ensure responsible AI usage.
• Nova Reel: This video generation model supports advanced features, including camera motion controls like panning, zooming and 360-degree rotations. It allows for the creation of dynamic six-second videos, with additional functionality expected in the future.

Human evaluations have validated the models' capabilities: Nova Reel outperformed Runway's Gen-3 Alpha in A/B testing, achieving winning rates of 61.4% for video quality and 71.6% for video consistency.

Integrated with Bedrock (duh)

Unsurprisingly, the Amazon Nova models are deeply integrated with Bedrock, Amazon's fully managed service that simplifies access to high-performing AI models through a single API. Customers can use the platform to experiment with, evaluate and deploy Nova models or other foundation models available on Bedrock. There are also options for fine-tuning and distillation, allowing users to adapt models to their specific needs.

Designed for brands

Rohit Prasad, senior vice president of Amazon artificial general intelligence, noted that Amazon Nova is designed to address common challenges faced by application builders. The models deliver advances in latency, cost-effectiveness and information grounding, providing flexible and powerful solutions for both internal and external customers. Brands using Amazon Nova tools in advertising have reported significant improvements, including a fivefold increase in the number of products advertised and a doubling of images per product. These tools also enable advertisers to explore new strategies, such as keyword-level creative optimization and video advertising.

More to come

Amazon has announced plans to expand the Nova family in 2025 with two additional models:

• A speech-to-speech model for natural, humanlike verbal interactions.
• An any-to-any modality model that can process and generate text, images, audio and video, enabling seamless translation and editing across modalities.

Amazon emphasizes safety and transparency, with integrated protections across all Nova models. The company has introduced AWS AI Service Cards, offering clear documentation on use cases, limitations and responsible AI practices. Features like watermarking and content moderation are embedded to ensure compliance with ethical standards.

Amazon Nova represents a significant step in the company's AI journey, bringing new generative AI tools to businesses and individuals. As these tools become more widely available, Amazon says it will continue to prioritize delivering real-world value to its customers.

Amazon launches Nova AI model family for generating text, images and videos Read More »