VentureBeat

Alibaba researchers unveil Marco-o1, an LLM with advanced reasoning capabilities

The recent release of OpenAI o1 has brought great attention to large reasoning models (LRMs) and is inspiring new models aimed at solving complex problems that classic language models often struggle with. Building on the success of o1 and the concept of LRMs, researchers at Alibaba have introduced Marco-o1, which enhances reasoning capabilities and tackles problems with open-ended solutions, where clear standards and quantifiable rewards are absent.

OpenAI o1 uses "inference-time scaling" to improve the model's reasoning ability by giving it "time to think." Basically, the model uses more compute cycles during inference to generate more tokens and review its responses, which improves its performance on tasks that require reasoning. o1 is renowned for its impressive reasoning capabilities, especially on tasks with standard answers such as mathematics, physics and coding. However, many applications involve open-ended problems that lack clear solutions and quantifiable rewards.

"We aimed to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges," the Alibaba researchers write.

Marco-o1 is a fine-tuned version of Alibaba's Qwen2-7B-Instruct that integrates advanced techniques such as chain-of-thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS) and reasoning action strategies. The researchers trained Marco-o1 on a combination of datasets, including the Open-O1 CoT dataset; the Marco-o1 CoT dataset, a synthetic dataset generated using MCTS; and the Marco-o1 Instruction dataset, a collection of custom instruction-following data for reasoning tasks.

Marco-o1 uses CoT and MCTS to reason about tasks (source: arXiv)

MCTS is a search algorithm that intelligently explores different solution paths by repeatedly sampling possibilities, simulating outcomes and gradually building a decision tree. It has proven very effective in complex AI problems, such as mastering the game of Go.

Marco-o1 leverages MCTS to explore multiple reasoning paths as it generates response tokens. The model uses the confidence scores of candidate response tokens to build its decision tree and explore different branches. This enables the model to consider a wider range of possibilities and arrive at more informed and nuanced conclusions, especially in scenarios with open-ended solutions. The researchers also introduced a flexible reasoning action strategy that adjusts the granularity of MCTS steps by defining the number of tokens generated at each node in the tree. This provides a tradeoff between accuracy and computational cost, giving users the flexibility to balance performance and efficiency.

Another key innovation in Marco-o1 is a reflection mechanism. During the reasoning process, the model periodically prompts itself with the phrase, "Wait! Maybe I made some mistakes! I need to rethink from scratch." This causes the model to re-evaluate its reasoning steps, identify potential errors and refine its thought process. "This approach allows the model to act as its own critic, identifying potential errors in its reasoning," the researchers write. "By explicitly prompting the model to question its initial conclusions, we encourage it to re-express and refine its thought process."

To evaluate the performance of Marco-o1, the researchers conducted experiments on several tasks, including MGSM, a benchmark of multilingual grade-school math problems. Marco-o1 significantly outperformed the base Qwen2-7B model, particularly when the MCTS component was adjusted for single-token granularity.

Different versions of Marco-o1 vs the base model (source: arXiv)

However, the primary objective of Marco-o1 was to address the challenges of reasoning in open-ended scenarios. To this end, the researchers tested the model on translating colloquial and slang expressions, a task that requires understanding subtle nuances of language, culture and context. The experiments showed that Marco-o1 captured and translated these expressions more effectively than traditional translation tools. For instance, the model correctly translated a colloquial Chinese expression that literally means "This shoe offers a stepping-on-poop sensation" into the English equivalent, "This shoe has a comfortable sole." The model's reasoning chain shows how it evaluates different potential meanings and arrives at the correct translation. This paradigm can prove useful for tasks such as product design and strategy, which require deep, contextual understanding and do not have well-defined benchmarks and metrics.

Example of reasoning chain for translation task (source: arXiv)

A new wave of reasoning models

Since the release of o1, AI labs have been racing to release reasoning models. Last week, Chinese AI lab DeepSeek released R1-Lite-Preview, its o1 competitor, which is currently available only through the company's online chat interface. R1-Lite-Preview reportedly beats o1 on several key benchmarks.

The open-source community is also catching up with the private model market, releasing models and datasets that take advantage of inference-time scaling laws. The Alibaba team released Marco-o1 on Hugging Face along with a partial reasoning dataset that researchers can use to train their own reasoning models. Another recently released model is LLaVA-o1, developed by researchers from multiple universities in China, which brings the inference-time reasoning paradigm to open-source vision language models (VLMs).

The release of these models comes amid uncertainty about the future of model scaling laws. Various reports indicate that the returns on training larger models are diminishing and might be hitting a wall. But what's certain is that we are just beginning to explore the possibilities of inference-time scaling.
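To make the token-level search concrete, here is a minimal Python sketch of the general idea: partial reasoning paths are expanded with the model's top candidate tokens and kept or pruned by their mean confidence. This is a simplified, beam-style illustration rather than full MCTS (it omits rollouts and value backpropagation), and `next_candidates` is a toy stand-in for a real LLM's softmax output, not Alibaba's code.

```python
import math

def tree_search(next_candidates, max_steps=8, top_k=4, beam=4):
    # next_candidates(path) -> [(token, prob), ...]: a toy stand-in for taking
    # the top-k next tokens and their softmax confidences from a real LLM.
    frontier = [([], 0.0)]  # (token path, sum of log-confidences)
    for _ in range(max_steps):
        expanded = []
        for path, logp in frontier:
            for tok, p in next_candidates(path)[:top_k]:
                expanded.append((path + [tok], logp + math.log(p)))
        # Keep the `beam` paths with the highest mean per-token confidence.
        expanded.sort(key=lambda x: x[1] / len(x[0]), reverse=True)
        frontier = expanded[:beam]
    return frontier[0]

# Toy "model": four candidate tokens with decaying confidence at every step.
def toy_candidates(path):
    return [(t, 0.6 - 0.1 * t) for t in range(4)]

best_path, score = tree_search(toy_candidates)
print(best_path, math.exp(score / len(best_path)))  # best path and its mean confidence
```

The paper's "reasoning action strategy" corresponds to varying how many tokens each expansion step emits, trading search granularity against compute.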


OpenAI appears poised to launch ChatGPT Pro subscription plan at $200 per month

OpenAI appears to be launching a new subscription tier for its signature chatbot product, ChatGPT. Screenshots posted on X by third-party AI engineer Tibor Blaho show the new tier, ChatGPT Pro, priced at $200 per month, 10 times the amount currently charged for the ChatGPT Plus individual plan. It is also the highest price among OpenAI's tiers, ahead of ChatGPT Team ($30 per month), Enterprise (varies, but estimated at $60-$100 per month) and Edu (varies, but estimated at $12 per month).

Yet the Pro plan will grant users "the best of OpenAI with the highest level of access," according to the screenshot, which includes "unlimited" access to its newest o1 and o1-mini reasoning models and even "more compute," meaning more of the graphics processing unit (GPU) capacity OpenAI uses to serve model inference. Plus, by contrast, grants users only "limited access" to the o1 and o1-mini reasoning models.

VentureBeat has reached out to OpenAI contacts for confirmation or further information on ChatGPT Pro and will update when we hear back.

The news comes just a day after OpenAI said it would today begin 12 days of holiday-themed announcements titled "12 Days of OpenAI," an obvious allusion to the "12 Days of Christmas" song and tradition. OpenAI is expected to make its first announcement today around 10 a.m. PT.


Cohere’s Rerank 3.5 is here, and it’s about to change enterprise search forever

Artificial intelligence company Cohere released a powerful new search model today that promises to transform how global businesses find and use their data across languages and complex systems. The new model, Rerank 3.5, arrives as businesses struggle with increasingly complex data environments and multilingual operations. Its most notable advancement is the ability to process queries across more than 100 languages, with particular strength in major business languages including Arabic, Japanese and Korean.

Breaking language barriers in enterprise search

What sets Rerank 3.5 apart isn't just its linguistic prowess; it's the model's ability to fundamentally reshape how global enterprises handle information retrieval. In an era when data silos and language barriers still plague multinational corporations, this advancement could level the playing field for non-English-speaking markets and dramatically accelerate global business operations.

The technology shows particular promise in specialized sectors such as finance, healthcare and manufacturing, where precision in information retrieval is crucial. Internal testing by Cohere showed Rerank 3.5 performing 23.4% better than hybrid search systems and 30.8% better than traditional BM25 search algorithms on financial services datasets. These improvements, while impressive on paper, could translate to millions in saved costs and significantly reduced risk in regulated industries where information accuracy is paramount.

AI-powered reasoning transforms complex queries

Perhaps most significant are the model's enhanced reasoning capabilities. Using a technique called "cross-encoding," Rerank 3.5 can better understand queries with multiple constraints, a common stumbling block for traditional search systems. This leap in search intelligence represents a crucial shift from simple keyword matching to genuine understanding of context and intent, potentially eliminating the frustrating trial-and-error approach that characterizes most enterprise search experiences.

The integration of cross-encoding with retrieval-augmented generation (RAG) systems could prove to be the killer feature that finally makes enterprise search feel as intuitive as consumer search engines. This combination might finally deliver on the long-promised dream of truly intelligent enterprise search, where systems understand not just what users are asking for, but why they're asking for it.

Enterprise race for smarter search solutions intensifies

The timing of this release is particularly significant. As enterprise AI moves from experimentation to production, the battle for market dominance in intelligent search is heating up. Cohere's focus on practical implementation, allowing deployment with minimal code changes and negligible latency impact, suggests a deep understanding of enterprise pain points that competitors have often overlooked.

The model's availability through major cloud platforms (such as Amazon Bedrock) signals Cohere's ambition to become the de facto standard for enterprise search. However, the mandatory migration deadline of March 31, 2025, for older versions reveals a broader truth about the AI industry: The pace of innovation is relentless, and enterprises must be prepared for continuous adaptation.

Looking ahead, Rerank 3.5's impact could extend far beyond simple search improvement. As organizations grapple with exponentially growing data volumes and increasingly diverse global workforces, the ability to seamlessly bridge language barriers while maintaining search precision could become a critical competitive differentiator.

The real question isn't whether competitors will emerge (they certainly will) but whether they can match Cohere's balance of sophisticated AI capabilities with practical enterprise needs. What's at stake isn't just market share in the enterprise search sector, but the future of how global organizations access and utilize their collective knowledge. If Rerank 3.5 delivers on its promises, it could mark the beginning of a new era in which language and data complexity no longer impede global business operations.
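For readers unfamiliar with cross-encoding, the sketch below illustrates how it differs from embedding-based retrieval, using a small public cross-encoder from the sentence-transformers library. It is purely illustrative: this is not Cohere's model or API, and the query and documents are invented.

```python
# pip install sentence-transformers
from sentence_transformers import CrossEncoder

# A small public cross-encoder, used here only to demonstrate the technique.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What was our Q3 exposure to European bond markets?"
docs = [
    "Q3 report: European fixed-income holdings rose 12% quarter over quarter.",
    "Cafeteria menu and facilities updates for the third quarter.",
    "Bond desk memo: EU sovereign exposure increased in September.",
]

# Unlike a bi-encoder, which embeds query and documents separately, the
# cross-encoder reads each (query, document) pair jointly, so it can weigh
# all of the query's constraints (time period, asset class, region) at once.
scores = model.predict([(query, d) for d in docs])
ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

In a RAG pipeline, this scoring step typically sits between a fast first-stage retriever and the generator, which is the integration the article describes.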


The end of AI scaling may not be nigh: Here’s what’s next

As AI systems achieve superhuman performance on increasingly complex tasks, the industry is grappling with whether bigger models are even possible, or whether innovation must take a different path.

The general approach to large language model (LLM) development has been that bigger is better, and that performance scales with more data and more computing power. However, recent media discussions have focused on how LLMs are approaching their limits. "Is AI hitting a wall?" The Verge asked, while Reuters reported that "OpenAI and others seek new path to smarter AI as current methods hit limitations."

The concern is that scaling, which has driven advances for years, may not extend to the next generation of models. Reporting suggests that the development of frontier models like GPT-5, which push the current limits of AI, may face challenges due to diminishing performance gains during pre-training. The Information reported on these challenges at OpenAI, and Bloomberg covered similar news at Google and Anthropic.

This has led to concerns that such systems may be subject to the law of diminishing returns, in which each added unit of input yields progressively smaller gains. As LLMs grow larger, the costs of acquiring high-quality training data and scaling infrastructure increase exponentially, reducing the returns on performance improvement in new models. Compounding this challenge is the limited availability of high-quality new data, as much of the accessible information has already been incorporated into existing training datasets.

This does not mean the end of performance gains for AI. It simply means that sustaining progress requires further engineering: innovation in model architecture, optimization techniques and data use.
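The "law of diminishing returns" here has a concrete, commonly cited form. The compute-optimal scaling fit of Hoffmann et al. (2022) is not referenced in the article, but it is useful intuition; treat the exponents below as that paper's estimates rather than settled constants:

```latex
% N = parameter count, D = training tokens, E = irreducible loss.
% Hoffmann et al. (2022) fit roughly alpha ~= 0.34 and beta ~= 0.28.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Because both correction terms are power laws, each doubling of parameters or data buys a smaller absolute drop in loss, which is precisely the diminishing-returns pattern the reporting describes.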
Learning from Moore's Law

A similar pattern of diminishing returns appeared in the semiconductor industry. For decades, the industry benefited from Moore's Law, which predicted that the number of transistors on a chip would double every 18 to 24 months, driving dramatic performance improvements through smaller and more efficient designs. That trend, too, eventually hit diminishing returns, beginning somewhere between 2005 and 2007 when Dennard scaling (the principle that shrinking transistors also reduces their power consumption) reached its limits, fueling predictions of the death of Moore's Law. I had a close-up view of this issue when I worked with AMD from 2012 to 2022.

This problem did not mean that semiconductors, and by extension computer processors, stopped achieving performance improvements from one generation to the next. It did mean that improvements came more from chiplet designs, high-bandwidth memory, optical switches, more cache memory and accelerated computing architectures than from the scaling down of transistors.

New paths to progress

Similar phenomena are already being observed with current LLMs. Multimodal AI models like GPT-4o, Claude 3.5 and Gemini 1.5 have proven the power of integrating text and image understanding, enabling advances in complex tasks like video analysis and contextual image captioning. Further tuning of algorithms for both training and inference will lead to additional performance gains. Agent technologies, which enable LLMs to perform tasks autonomously and coordinate seamlessly with other systems, will soon significantly expand their practical applications.

Future model breakthroughs might arise from one or more hybrid AI architecture designs combining symbolic reasoning with neural networks. Already, the o1 reasoning model from OpenAI shows the potential for model integration and performance extension. While only now emerging from its early stage of development, quantum computing holds promise for accelerating AI training and inference by addressing current computational bottlenecks.

The perceived scaling wall is unlikely to end future gains, as the AI research community has consistently proven its ingenuity in overcoming challenges and unlocking new capabilities and performance advances. In fact, not everyone agrees that there even is a scaling wall. OpenAI CEO Sam Altman was succinct in his view: "There is no wall." (Source: https://x.com/sama/status/1856941766915641580)

Speaking on the "Diary of a CEO" podcast, former Google CEO and Genesis co-author Eric Schmidt essentially agreed with Altman, saying he does not believe there is a scaling wall, at least not over the next five years. "In five years, you'll have two or three more turns of the crank of these LLMs. Each one of these cranks looks like it's a factor of two, factor of three, factor of four of capability, so let's just say turning the crank on all these systems will get 50 times or 100 times more powerful," he said.

Leading AI innovators remain optimistic about the pace of progress, as well as the potential for new methodologies. This optimism is evident in a recent conversation on "Lenny's Podcast" with OpenAI CPO Kevin Weil and Anthropic CPO Mike Krieger (source: https://www.youtube.com/watch?v=IxkvVZua28k). In this discussion, Krieger said that what OpenAI and Anthropic are working on today "feels like magic," but acknowledged that in just 12 months, "we'll look back and say, can you believe we used that garbage? … That's how fast [AI development] is moving."

It's true: It does feel like magic, as I recently experienced when using OpenAI's Advanced Voice Mode. Speaking with "Juniper" felt entirely natural and seamless, showcasing how AI is evolving to understand and respond with emotion and nuance in real-time conversations. Krieger also discussed the recent o1 model, referring to it as "a new way to scale intelligence, and we feel like we're just at the very beginning." He added: "The models are going to get smarter at an accelerating rate."

These expected advances suggest that while traditional scaling approaches may or may not face diminishing returns in the near term, the AI field is poised for continued breakthroughs through new methodologies and creative engineering.

Does scaling even matter?

While scaling challenges dominate much of the current discourse around LLMs, recent studies suggest that current models are already capable of extraordinary results, raising the provocative question of whether more scaling even matters. A recent study suggested that ChatGPT could help doctors make diagnoses when presented with complicated patient cases. Conducted with an


Google Cloud launches Veo AI video generator model on Vertex

As Amazon takes a major step into the AI space with its new Nova family of foundation models, Google is doubling down on its own multimodal AI capabilities. The tech giant's cloud division has announced that its latest video and image-generation models, Veo and Imagen 3, are now available on Vertex AI. The move lets teams integrate cutting-edge video and image-generation capabilities into their AI workflows, unlocking diverse use cases, especially in marketing and advertising. It also makes Google Cloud the first hyperscaler to offer a video-generation model to its customers.

While Veo is currently in private preview, Imagen 3 will be generally available to all Vertex AI users starting next week. Notably, Imagen 3 also includes editing features, enabling users to refine generated images to meet specific creative needs.

What do Veo and Imagen 3 offer?

First unveiled at Google's I/O developer conference, Veo is Google DeepMind's response to competitors like Runway's Gen-3 and OpenAI's Sora, delivering a sophisticated video-generation experience. The model transforms text or image prompts into cinematic, high-definition videos in various visual styles, generating clips over 60 seconds long. What sets it apart is frame-level consistency, ensuring subjects move seamlessly within shots.

Imagen 3, also from DeepMind, takes on the task of text-to-image generation, producing photorealistic visuals in a variety of styles. Google claims it surpasses its predecessors in detail, lighting accuracy and artifact reduction. Beyond generation, users on Google's allowlist can also access advanced customization options with Imagen 3. These include image upscaling, inpainting, outpainting and background replacement, all guided by text prompts. Additionally, users can provide reference images, enabling Imagen 3 to create content aligned with specific brand aesthetics, logos or product features.

Broader implications for industry

Vertex AI has long been Google Cloud's flagship platform for streamlining AI application development and deployment. By integrating Veo and Imagen 3, the platform offers organizations an even more comprehensive suite of tools for innovation in marketing, sales and beyond. Imagen 3, for instance, simplifies the creation of high-quality assets such as product images and social media content, while Veo extends this capability by giving teams an option to convert those visuals into polished videos. This speeds up production, cuts costs and accelerates prototyping, allowing teams to iterate rapidly on their creative strategies.

"Customers like Agoda are using the power of AI models like Veo, Gemini and Imagen to streamline their video ad production, achieving a significant reduction in production time," said Warren Barkley, senior director of product management at Google, in a blog post. He also highlighted that both models include safety features such as digital watermarking and content-moderation guardrails to mitigate risks associated with generative AI. Other early adopters include Mondelez International (owner of brands such as Oreo, Cadbury and Milka) and global marketing and communications company WPP.

As Google's foundation models expand their reach, businesses across industries have a powerful opportunity to reimagine how they create and deliver visual content.
Competition continues to heat up

While all major cloud providers, including Google Cloud, Amazon Web Services and Microsoft Azure, have been providing image-generation models on their respective AI orchestration platforms, video generation has been quite a rarity thus far. Google's move to launch Veo in private preview today changes that.

Interestingly, soon after the Veo announcement, AWS made a splash at re:Invent with the announcement of Nova Reel, a foundation model that generates six-second, studio-quality videos from text and image prompts. This model, along with others in the Nova family, is set to become available via Amazon Bedrock, the company's fully managed service designed to simplify the creation and deployment of generative AI applications.

Microsoft, for its part, appears to be lagging in this category at this stage. Its AI Foundry does not include models for video generation. However, we expect that to change as soon as OpenAI's Sora hits the market.
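For a rough idea of what calling Imagen 3 on Vertex AI looks like, here is a minimal sketch using the preview Python SDK. The project ID and prompt are placeholders, and the model identifier and method names reflect the preview SDK at the time of writing and may differ by version.

```python
# pip install google-cloud-aiplatform
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

# Placeholder project; requires a GCP project with Vertex AI enabled.
vertexai.init(project="my-gcp-project", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")
images = model.generate_images(
    prompt="Studio product shot of a ceramic mug on a walnut table, soft light",
    number_of_images=1,
    aspect_ratio="1:1",
)
images[0].save(location="mug.png")
```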


Emergence’s AI orchestrator launches to do what big tech offerings can’t: play well with others

Emergence AI, a startup founded by IBM Research veterans that emerged from stealth earlier this year with more than $97 million in funding, today unveiled its enterprise-grade autonomous multi-agent AI orchestrator, which it claims is among the best offerings for enterprises on the market.

Why should an enterprise go with Emergence AI, a company whose name is likely unfamiliar, over rival offerings from big tech vendors such as Microsoft with its Magentic-One framework, Salesforce with Agentforce, or even, as also announced today, Amazon's Bedrock multi-agent orchestrator? One simple advantage: cross-application and cross-vendor compatibility.

"Modern enterprises have hundreds of systems—some legacy, some modern," said Satya Nitta, co-founder and CEO of Emergence AI, in a recent video call interview with VentureBeat. "Our technology bridges these systems, automating workflows across platforms where vendors like Microsoft or Salesforce fall short."

Emergence AI founding team (L-R): Dr. Ravi Kokku, co-founder and chief technology officer; Dr. Satya Nitta, co-founder and chief executive officer; Sharad Sundararajan, co-founder and chief information officer. Credit: Emergence AI

Put another way: Your company likely uses multiple tech vendors, such as Salesforce for Slack, Microsoft or Google for email, and perhaps Notion or Monday for project tracking. Emergence's contention is that trusting a first-party agent solution from any one of them would be a mistake, because their products don't always play well together. Emergence aims to sit above the fray, working with whatever applications and vendors the enterprise uses and uniting them all with its orchestrator.

Emergence's orchestrator acts as an advanced meta-agent that integrates API interactions and web navigation to optimize enterprise workflows. Credit: Emergence AI

The orchestrator operates as a hierarchical planner, enabling real-time planning, execution and verification. "We think of autonomous agents as analogous to autonomous driving—dealing with dynamic environments and requiring planning, reasoning, and deterministic execution," Nitta said. "We are proud to announce the first multi-agent orchestrator built for web automation, featuring an over-the-top web agent and an API agent working together to tackle complex workflows."

Emergence AI's agents excel at intricate tasks such as navigating dynamically changing interfaces, extracting data from unstructured sources, and recovering from errors like broken links or unexpected pop-ups. These capabilities are further enhanced by secure API interactions, enabling seamless cross-application workflows and integration across enterprise systems.

"Autonomous agents are exciting but must be deterministic," Nitta emphasized. "Enterprises won't deploy systems that work 95% of the time—they must work every time."

The orchestrator's flexible architecture allows organizations to integrate new agents and prompts seamlessly. "Enterprises need runtime determinism and design-time flexibility. They should be able to adjust workflows and integrate new agents or prompts without writing code," Nitta explained.
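Emergence has not published code for its orchestrator, but the plan-route-verify loop Nitta describes can be sketched generically. Everything below (the agent registry, the `Step` schema, the `verify` check) is invented for illustration and is not Emergence's implementation:

```python
from dataclasses import dataclass

@dataclass
class Step:
    kind: str     # "api" or "web", matching the two agent types described above
    payload: str

# Invented agent stubs standing in for the real API and web agents.
AGENTS = {
    "api": lambda p: f"[API agent result for: {p}]",
    "web": lambda p: f"[web agent result for: {p}]",
}

def verify(output: str) -> bool:
    # Stand-in for a real post-condition check (schema match, status code, etc.).
    return bool(output)

def orchestrate(plan, max_retries=2):
    # Hierarchical loop: execute each planned step, verify it, retry on failure.
    results = []
    for step in plan:
        for _ in range(max_retries + 1):
            out = AGENTS[step.kind](step.payload)
            if verify(out):  # "runtime determinism": only verified output passes
                results.append(out)
                break
        else:
            raise RuntimeError(f"Step failed verification: {step}")
    return results

print(orchestrate([Step("api", "pull supplier data from SAP"),
                   Step("web", "collect updates from the supplier portal")]))
```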
Real-world use cases

Emergence AI says it has already demonstrated its orchestrator's effectiveness across several industries:

• Supply chain management: API agents retrieve and update supplier data from platforms like SAP, while web agents gather insights from supplier portals, enabling comprehensive reporting and proactive decision-making.

• Financial services: Autonomous agents combine historical data aggregation with regulatory document analysis to create in-depth financial reports.

• Quality assurance (QA): Web agents automate testing by simulating user interactions, helping ensure error-free deployments of web applications. "Our agents can plan and execute test scenarios, reducing weeks of manual effort into hours by identifying bugs and generating detailed reports. It's a massive productivity gain," Nitta said.

• Research and analytics: Agents integrate structured API data with public records and research papers for detailed analyses.

These use cases underscore the orchestrator's ability to handle tasks previously reliant on manual intervention, significantly reducing time and resource expenditures. Nitta also said the orchestrator can assist with travel planning and booking. In general, across all of these use cases, "these workflows often involve integrating legacy and modern systems."

Flex pricing

Emergence AI is offering a tiered pricing model designed to cater to developers and enterprises alike. "For developers, we offer a freemium model with 100 free actions, and after that, we charge per action," said Nitta. "Pricing ranges from five cents to $1.50 per action depending on complexity—whether it's a simple search or something like test scenario development."

For larger organizations, Emergence AI is testing an enterprise-focused pricing approach tied directly to measurable outcomes. "For enterprises, we're exploring value-based pricing," Nitta explained. "It's about tying the cost to the ROI—like extending the capability of a data scientist or automating workflows that would otherwise take extensive manual effort."

Future enhancements promised

Emergence AI also introduced a roadmap to expand its orchestrator's capabilities. Planned features include a "Build Your Own Orchestrator" platform and an agent software development kit (SDK), allowing developers to create custom agents. Future updates aim to integrate vision-language models (VLMs) for advanced DOM processing and to introduce multi-turn conversational interfaces.

The company is addressing a gap in autonomous agent evaluation by introducing enterprise-specific benchmarks. In recent tests using the WebVoyager benchmark, Emergence AI says its orchestrator outperformed industry standards by 10-30%.

Supporting safety and scalability

Dr. Margaret Honey, president and CEO of the Scratch Foundation, commended the orchestrator's role in advancing safety and scalability: "We are partnering with Emergence and leveraging their Multiagent Orchestrator to implement an innovative agentic solution for platform moderation at scale. This approach will be instrumental in enhancing user safety while supporting the Scratch Foundation's mission to provide a secure and creative learning environment for children worldwide."

Emergence AI emphasizes enterprise-grade security in its solutions. The orchestrator can be deployed within virtual private cloud (VPC) or on-premises environments, helping ensure compliance with stringent data protection standards.
The integration of external frameworks and second- or third-party agents allows businesses to customize solutions tailored to their specific needs.

An emerging future of enterprise innovation

Emergence AI's orchestrator is designed to meet enterprises where they are, addressing challenges like legacy infrastructure, data privacy and system adaptability. With a commitment to continuous innovation, the company aims to push the boundaries of AI-driven automation. Nitta summarized the team's expertise, stating, "We are a group of researchers


AWS brings multi-agent orchestration to Bedrock

AWS is doubling down on AI agents with the announcement of multi-agent collaboration capabilities on its Amazon Bedrock platform.

During his keynote at the AWS re:Invent conference, AWS CEO Matt Garman said customers building agents on Bedrock wanted a way to make agents work together. "While a single agent can be useful, more complex tasks, like performing financial analysis across hundreds or thousands of different variables, may require a large number of agents with their own specializations," Garman said. "However, creating a system that can coordinate multiple agents, share context across them, and dynamically route different tasks to the right agent requires specialized tools and generative AI expertise that many companies do not have available."

The new capabilities let enterprises using Bedrock build AI agents and establish their entire agentic ecosystem, including orchestration agents that manage multiple agents and multi-step workflows.

AWS's emphasis on agent collaboration differentiates it from the approach Microsoft took in its multi-agent announcements two weeks ago, and it builds on AWS's extensive experience with microservices. In an interview with VentureBeat, Swami Sivasubramanian, AWS VP of AI and data, said the company drew key insights from developing its Q Developer agent, which he says is best-in-class on SWE-bench, a benchmark based on real-world software engineering tasks. That background shaped the tools being introduced today, he said. This production-ready focus lets enterprises move from prototype to deployment faster, he said, with orchestration capabilities that streamline workflows, manage state sharing and dynamically allocate tasks across specialized agents. These features differentiate AWS from competitors like Microsoft, he argued, whose agent tools prioritize broader frameworks but lack the same level of orchestration focus and deployment readiness.

"Using multi-agent collaboration in Amazon Bedrock, customers can get more accurate results by creating and assigning specialized agents for specific steps of a project and accelerate tasks by orchestrating multiple agents working in parallel," AWS said.

Customers build their specialized agents on Bedrock, then create a supervisor (orchestrator) agent to manage the others. AWS said the supervisor agent "handles the coordination, like breaking up and routing tasks to the right agents, giving specific agents access to the information they need to complete their work and determining what actions can be processed in parallel." Once the other agents finish their tasks, the supervisor agent pulls all of that information together.

Garman pointed to Moody's, the credit rating agency and an early customer of the multi-agent capability. Moody's created a series of agents for its risk-analysis workflow, including agents that analyzed macroeconomic trends or assessed the risks of individual companies, and was able to produce more accurate risk assessments.

AWS first introduced agentic capabilities in 2023 with the release of Agents for Amazon Bedrock, which gave enterprises a way to start building agents.
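AWS has not shared implementation details beyond the description above, but the supervisor pattern itself is straightforward to sketch. The following is a generic, hypothetical illustration of fan-out and aggregation, loosely modeled on the Moody's example; the agent names and stubs are invented, and none of this is the Bedrock API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialist agents; in Bedrock these would be individual agents
# created by the customer, each with its own instructions and tools.
SPECIALISTS = {
    "macro_trends": lambda task: f"[macro analysis of {task}]",
    "company_risk": lambda task: f"[company risk profile for {task}]",
    "regulatory": lambda task: f"[regulatory findings on {task}]",
}

def supervisor(job: str) -> str:
    # Fan the job out to every specialist in parallel, as the supervisor agent
    # is described as doing, then pull the results together into one answer.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agent, job) for name, agent in SPECIALISTS.items()}
        results = {name: f.result() for name, f in futures.items()}
    return "\n".join(f"{name}: {out}" for name, out in results.items())

print(supervisor("Acme Corp 2025 credit outlook"))
```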
During its re:Invent event this week, AWS customers like PagerDuty and GitLab announced new agents that let users develop applications on their platforms and use AWS agents to enhance their workflows.

Multiple agents becoming the norm

Enterprises are turning to agents to simplify their workflows, so agentic ecosystems with multiple AI agents are fast becoming popular, and service providers have noticed. Microsoft offers a library of agents for Copilot users (and has amassed one of the largest agent ecosystems). ServiceNow also has a suite of AI agents and makes its orchestrator agent a big selling point. Even OpenAI has acknowledged the growing appeal of AI agents with its Swarm agent framework.

However, enterprises also need to control the sprawl of AI agents and make sure each agent actually does the job it is assigned. The orchestration layer, often consisting of an orchestrator agent, monitors task progress and decides which agent starts working next.

AWS's approach is to let enterprises build the complex multi-agent workflows they want along with the orchestration layer they need. Other providers, like ServiceNow and Salesforce, give customers access to agents that can be customized to their needs and then offer orchestrator agents built by the company.

Managing AI agent sprawl is the next big step in the agentic wars, and the space is heating up.


What AI vendor should you choose? Here are the top 7 (OpenAI still leads)

Vendors are deploying new generative AI tools every day in a market that has been likened to the Wild West. But because the technology is so new and ever-evolving, the landscape can be extremely confusing, with platform providers making sometimes speculative promises. IT research firm GAI Insights hopes to bring some clarity to enterprise decision-makers with its release of the first known buyer's guide to large language models (LLMs) and gen AI. It reviewed more than two dozen vendors, identifying seven emerging leaders (with OpenAI way ahead of the pack). It also predicts that proprietary, open-source and small models will all be in high demand in 2025 as the C-suite prioritizes AI spending.

"We're seeing real migration from awareness to early experimentation to really driving systems into production," Paul Baier, GAI Insights CEO and co-founder, told VentureBeat. "This is exploding; AI is transforming the entire enterprise IT stack."

7 emerging leaders

GAI Insights, which aims to be the "Gartner of gen AI," reviewed 29 vendors across common enterprise gen AI use cases such as customer service, sales support, marketing and supply chains. It found that OpenAI remains firmly in the lead, with 65% market share. The firm points out that the startup has partnerships with a multitude of content and chip vendors (including Broadcom, with which it is developing chips). "Obviously they're the first; they defined the category," said Baier. However, he noted, the industry is "splintering into sub-categories."

The six other vendors GAI Insights identified as emerging leaders (in alphabetical order):

Amazon (Titan, Bedrock): Takes a vendor-neutral approach and is a "one-stop shop" for deployment. It also offers custom AI infrastructure in the form of specialized AI chips such as Trainium and Inferentia.

Anthropic (Sonnet, Haiku, Opus): A "formidable" competitor to OpenAI, with models boasting long context windows and strong coding performance. The company also has a strong focus on AI safety and has released multiple tools for enterprise use this year, alongside Artifacts, Computer Use and contextual retrieval.

Cohere (Command R): Offers enterprise-focused models and multilingual capabilities, as well as private cloud and on-premises deployments. Its Embed and Rerank models can improve search and retrieval with retrieval-augmented generation (RAG), which is important for enterprises looking to work with internal data.

CustomGPT: Has a no-code offering, and its models feature high accuracy and low hallucination rates. It also has enterprise features such as single sign-on and OAuth, and provides analytics and insights into how employees and customers are using its tools.

Meta (Llama): Features "best-in-class" models ranging from small and specialized to frontier. Its Llama 3 series, with 405 billion parameters, rivals GPT-4o and Claude 3.5 Sonnet in complex tasks such as reasoning, math, multilingual processing and long-context comprehension.

Microsoft (Azure, Phi-3): Takes a dual approach, leveraging tools from OpenAI while investing in proprietary platforms. The company is also reducing chip dependency by developing its own, including Maia 100 and Cobalt 100.

Other vendors GAI Insights assessed include SambaNova, IBM, Deepset, Glean, LangChain, LlamaIndex and Mistral AI.
Vendors were rated on a variety of factors, including product and service innovation; clarity of products, services, benefits and features; track record in launching products and partnerships; defined target buyers; quality of technical teams and management experience; strategic relationships and quality of investors; money raised; and valuation.

Meanwhile, Nvidia continues to dominate, with 85% market share. The company will continue to offer products up and down the hardware and software stack, and to innovate and grow in 2025 at a "blistering" pace.

Other top trends for 2025

While the gen AI market is still in its early stages (just 5% of enterprises have applications in production), 2025 will see massive growth, with 33% of companies pushing models into production, GAI Insights projects. Gen AI is the leading budget priority for CIOs and CTOs amid a 240X drop in the cost of AI computation over the last 18 months.

Interestingly, 90% of current deployments use proprietary LLMs (as opposed to open source), a trend the firm calls "Own Your Own Intelligence." This is due to the need for greater data privacy, control and regulatory compliance. Top use cases for gen AI include customer support, coding, summarization, text generation and contract management. But ultimately, Baier noted, "there is an explosion in just about any use case right now."

He pointed out that an estimated 90% of data is unstructured, spread across emails, PDFs, videos and other formats, and marveled that "gen AI allows us to talk to machines; it allows us to unlock the value of unstructured data. We could never do that cost-effectively before. Now we can. There's a stunning IT revolution going on right now."

2025 will also see an increasing number of vertical-specific small language models (SLMs), and open-source models will be in demand as well (even as their definition remains contentious). There will also be better performance from even smaller models such as Gemma (2B to 7B parameters), Phi-3 (3.8B to 7B parameters) and Llama 3.2 (1B and 3B). GAI Insights points out that small models are cost-effective and secure, and that key developments in byte-level tokenization, weight pruning and knowledge distillation are shrinking model size while increasing performance.

Further, voice assistants are expected to be the "killer interface" in 2025, as they offer more personalized experiences, and on-device AI is expected to see a significant boost. "We see a real boom next year when smartphones start shipping with AI chips embedded in them," said Baier.

Will we truly see AI agents in 2025?

While AI agents are all the talk in the enterprise right now, it remains to be seen how viable they will be in the year ahead. There are many hurdles to overcome, Baier noted, such as unregulated spread, agentic AI making "unreliable or questionable" decisions and agents operating on poor-quality data.


AI2 closes the gap between closed-source and open-source post-training

The Allen Institute for AI (Ai2) claims to have narrowed the gap between closed-source and open-source post-training with the release of its new model training family, Tülu 3, advancing the argument that open-source models can thrive in the enterprise. Tülu 3 brings open-source models up to par with OpenAI's GPT models, Anthropic's Claude and Google's Gemini. It allows researchers, developers and enterprises to fine-tune open-source models without degrading the models' core skills, bringing them close to the quality of closed-source models.

Ai2 said it released Tülu 3 with all of its data, data mixes, recipes, code, infrastructure and evaluation frameworks. The company needed to create new datasets and training methods to improve Tülu's performance, including "training directly on verifiable problems with reinforcement learning."

"Our best models result from a complex training process that integrates partial details from proprietary methods with novel techniques and established academic research," Ai2 said in a blog post. "Our success is rooted in careful data curation, rigorous experimentation, innovative methodologies and improved training infrastructure."

Tülu 3 will be available in a range of sizes.

Open-source for enterprises

Open-source models have often lagged behind closed-source models in enterprise adoption, although anecdotally more companies report choosing open-source large language models (LLMs) for projects. Ai2's thesis is that improved fine-tuning with open-source models like Tülu 3 will increase the number of enterprises and researchers picking open-source models, because they can be confident these models will perform as well as a Claude or a Gemini.

The company points out that Tülu 3 and Ai2's other models are fully open source, and notes that for big model trainers like Anthropic and Meta that claim to be open source, "none of their training data nor training recipes are transparent to users." The Open Source Initiative recently published the first version of its open-source AI definition, but some organizations and model providers don't fully follow the definition in their licenses. Enterprises care about the transparency of models, but many choose open-source models not so much for research or data openness as because they are the best fit for their use cases. Tülu 3 offers enterprises more choice when looking for open-source models to bring into their stack and fine-tune with their own data.

Ai2's other models, OLMoE and Molmo, are also open source, and the company said they have started to outperform leading models like GPT-4o and Claude.

Other Tülu 3 features

Ai2 said Tülu 3 lets companies mix and match their data during fine-tuning. "The recipes help you balance the datasets, so if you want to build a model that can code, but also follow instructions precisely and speak in multiple languages, you just select the particular datasets and follow the steps in the recipe," Ai2 said. Mixing and matching datasets can make it easier for developers to move from a smaller model to a larger one while keeping the same post-training settings. The company said the infrastructure code it released with Tülu 3 allows enterprises to rebuild that pipeline as they move through model sizes. The evaluation framework from Ai2 gives developers a way to specify the settings for what they want to see out of the model.
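Ai2's phrase "training directly on verifiable problems with reinforcement learning" refers to rewarding outputs that pass a programmatic check rather than scoring them with a learned preference model. A minimal sketch of such a reward function, with an invented answer format and extraction regex (not Ai2's actual code), might look like this:

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    # Reward 1.0 only if the completion's final answer matches the known-good
    # answer; this assumes completions end with "the answer is <number>".
    match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", completion.lower())
    return 1.0 if match and match.group(1) == gold_answer else 0.0

# These binary rewards would then drive a policy-gradient update (e.g., PPO)
# over sampled completions, in place of a reward model.
print(verifiable_reward("...so the answer is 42.", "42"))  # 1.0
print(verifiable_reward("...the answer is 41.", "42"))     # 0.0
```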


Nous Research is training an AI model using machines distributed across the internet

The team of AI researchers known as Nous Research is currently doing something unique in the fast-moving space of generative AI (at least to my knowledge): Nous is in the midst of pre-training a new 15-billion-parameter large language model (LLM) using machines distributed around the internet and the world, avoiding the need to concentrate model development, as has traditionally been done, in expensive, power-hungry AI data centers and "superclusters" of graphics processing units (GPUs), such as the one recently completed by Elon Musk's xAI in Memphis, Tennessee.

Furthermore, Nous is livestreaming the pre-training process on a dedicated website, distro.nousresearch.com, showing how well the model is performing on evaluation benchmarks as it goes along, as well as a simple map of the various locations of the training hardware behind the exercise, including several places in the U.S. and Europe. As of the time of this article's publication, roughly 57 hours (2.3 days) remain in the pre-training run, with more than 75% of the process complete.

Pre-training is the first of two stages in training an LLM, and arguably the most foundational: The model is trained on a vast corpus of text data to learn the statistical properties and structures of language. It processes extensive text datasets, capturing patterns, grammar and contextual relationships between words. This stage equips the model with a broad understanding of language, enabling it to generate coherent text and perform various language-related tasks. Following pre-training, the model undergoes fine-tuning on a more specific dataset tailored to particular tasks or domains.

If successful, Nous will prove that it is possible to train frontier-class LLMs without expensive superclusters or low-latency transmission, using a novel, open-source training method. That could usher in a new era of distributed AI training as a major, or potentially dominant, source of new AI models, and shift the balance of power in gen AI away from well-moneyed big tech companies and toward smaller groups and non-corporate actors.

Nous DisTrO: the tech behind the training exercise

Nous, which made headlines earlier this year for the release of its permissive and existentially conflicted Meta Llama 3.1 variant Hermes 3, and for its overall mission to make AI development personalized and unrestricted, is using its open-source distributed training technology called Nous DisTrO (Distributed Training Over-the-Internet), which it first published in a research paper in August 2024.

According to Nous Research's recent publication, DisTrO reduces inter-GPU communication bandwidth requirements by up to 10,000x during pre-training. This innovation allows models to be trained on slower and more affordable internet connections (potentially as low as 100Mbps download and 10Mbps upload) while maintaining competitive convergence rates and loss curves.

DisTrO's core breakthrough lies in its ability to efficiently compress the data exchanged between GPUs without sacrificing model performance. As described in an August 2024 VentureBeat article, the method reduced communication requirements from 74.4 gigabytes to just 86.8 megabytes during a test using a Llama 2 architecture, an efficiency gain of roughly 857x. This dramatic improvement paves the way for a new era of decentralized, collaborative AI research.
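The article does not spell out DisTrO's exact compression scheme, but the sketch below illustrates the general family of techniques it belongs to, using simple top-k gradient sparsification. This is an illustration of the bandwidth-reduction idea only, not Nous's actual algorithm:

```python
import math
import torch

def compress(grad: torch.Tensor, k_frac: float = 0.001):
    # Keep only the k largest-magnitude entries; a node then transmits
    # (indices, values) instead of the full dense gradient tensor.
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_frac))
    idx = flat.abs().topk(k).indices
    return idx, flat[idx], tuple(grad.shape)

def decompress(idx, vals, shape):
    # Rebuild a dense (mostly zero) gradient on the receiving side.
    flat = torch.zeros(math.prod(shape), dtype=vals.dtype)
    flat[idx] = vals
    return flat.reshape(shape)

# A 1024x1024 fp32 gradient is ~4 MB dense; at k_frac=0.001, only about a
# thousand index/value pairs (a few KB) cross the wire each step. Production
# schemes also accumulate the dropped residual locally ("error feedback") so
# the sparsification does not bias training.
g = torch.randn(1024, 1024)
idx, vals, shape = compress(g)
g_hat = decompress(idx, vals, shape)
```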
DisTrO builds on earlier work on Decoupled Momentum Optimization (DeMo), an algorithm designed to reduce inter-GPU communication by several orders of magnitude while maintaining training performance comparable to traditional methods. Both the DeMo algorithm and the DisTrO stack are part of Nous Research's ongoing mission to decentralize AI capabilities and bring advanced AI development to a broader audience. The team has also made the DeMo algorithm available as open-source code on GitHub, inviting researchers and developers worldwide to experiment with and build upon its findings.

Hardware partners

The pre-training of Nous Research's 15-billion-parameter language model involved contributions from several notable partners, including Oracle, Lambda Labs, Northern Data Group, Crusoe Cloud and the Andromeda Cluster. Together, they provided the heterogeneous hardware necessary to test DisTrO's capabilities in a real-world distributed environment.

Profound implications for future AI model development

The implications of DisTrO extend beyond technical innovation. By reducing reliance on centralized data centers and specialized infrastructure, DisTrO offers a path to a more inclusive and collaborative AI research ecosystem. Smaller institutions, independent researchers and even hobbyists with access to consumer-grade internet and GPUs could potentially train large models, a feat previously reserved for companies with significant capital and expertise.

Diederik P. Kingma, a co-author of the research paper and co-inventor of the Adam optimizer, has joined Nous Research as a collaborator on the development of DeMo and DisTrO. Kingma's contributions, alongside those of Nous Research co-founders Bowen Peng and Jeffrey Quesnelle, lend credibility to the project and signal its potential impact on the broader AI community.

Next steps

Nous Research has opened the door to a future in which AI development is no longer dominated by a handful of corporations. Its work on DisTrO demonstrates that, with the right optimizations, large-scale AI models can be trained efficiently in a decentralized manner. While the current demonstration used cutting-edge GPUs like the Nvidia H100, the scalability of DisTrO to less specialized hardware remains an area for further exploration. As Nous Research continues to refine its methods, the potential applications of this technology, ranging from decentralized federated learning to training diffusion models for image generation, could redefine the boundaries of AI innovation.
