VentureBeat

New open-source math model Light-R1-32B surpasses equivalent DeepSeek performance with only $1000 in training costs

Researchers have introduced Light-R1-32B, a new open-source AI model optimized to solve advanced math problems. It is now available on Hugging Face under a permissive Apache 2.0 license — free for enterprises and researchers to take, deploy, fine-tune or modify as they wish, even for commercial purposes. The 32-billion-parameter (number of model settings) model surpasses the performance of similarly sized (and even larger) open-source models such as DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B on the third-party American Invitational Mathematics Examination (AIME) benchmark, which contains 15 math problems designed for extremely advanced students and has an allotted time limit of 3 hours. Developed by Liang Wen, Fenrui Xiao, Xin He, Yunke Cai, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia and Xiangzheng Zhang, the model surpasses previous open-source alternatives on competitive math benchmarks. Incredibly, the researchers completed the model’s training in fewer than six hours on 12 Nvidia H800 GPUs at an estimated total cost of $1,000. This makes Light-R1-32B one of the most accessible and practical approaches for developing high-performing math-specialized AI models. However, it’s important to remember that the model was trained on a variant of Alibaba’s open-source Qwen 2.5-32B-Instruct, which itself is presumed to have had much higher upfront training costs. Alongside the model, the team has released its training datasets, training scripts and evaluation tools, providing a transparent and accessible framework for building math-focused AI models. The arrival of Light-R1-32B follows similar efforts from rivals, such as Microsoft Orca-Math. A new math king emerges To help Light-R1-32B tackle complex mathematical reasoning, the researchers started from a base model that wasn’t equipped with long chain-of-thought (CoT) reasoning. They applied curriculum-based supervised fine-tuning (SFT) and direct preference optimization (DPO) to refine its problem-solving capabilities. When evaluated, Light-R1-32B achieved 76.6 on AIME24 and 64.6 on AIME25, surpassing DeepSeek-R1-Distill-Qwen-32B, which scored 72.6 and 54.9, respectively. This improvement suggests that the curriculum-based training approach effectively enhances mathematical reasoning, even when training from models that initially lack long CoT. Fair benchmarking To ensure fair benchmarking, the researchers decontaminated training data against common reasoning benchmarks, including AIME24/25, MATH-500 and GPQA Diamond, preventing data leakage. They also implemented difficulty-based response filtering using DeepScaleR-1.5B-preview, ultimately forming a 76,000-example dataset for the first stage of supervised fine-tuning. A second, more challenging dataset of 3,000 examples further improved performance. After training, the team merged multiple trained versions of Light-R1-32B, leading to additional gains. Notably, the model maintains strong generalization abilities on scientific reasoning tasks (GPQA), despite being math-specialized. How enterprises can benefit Light-R1-32B is released under the Apache License 2.0, a permissive open-source license that allows free use, modification and commercial deployment without requiring derivative works to be open-sourced. 
This makes it an attractive option for enterprises, AI developers and software engineers looking to integrate or customize the model for proprietary applications. The license also includes a royalty-free, worldwide patent grant, reducing legal risks for businesses while discouraging patent disputes. Companies can freely deploy Light-R1-32B in commercial products, maintaining full control over their innovations while benefiting from an open and transparent AI ecosystem. For CEOs, CTOs and IT leaders, Apache 2.0 ensures cost efficiency and vendor independence, eliminating licensing fees and restrictive dependencies on proprietary AI solutions. AI developers and engineers gain the flexibility to fine-tune, integrate and extend the model without limitations, making it ideal for specialized math reasoning, research and enterprise AI applications. However, as the license provides no warranty or liability coverage, organizations should conduct their own security, compliance and performance assessments before deploying Light-R1-32B in critical environments. Transparency in low-cost training and optimization for math problem solving The researchers emphasize that Light-R1-32B provides a validated, cost-effective way to train strong long CoT models in specialized domains. By sharing their methodology, training data and code, they aim to lower cost barriers for high-performance AI development. Looking ahead, they plan to explore reinforcement learning (RL) to further enhance the model’s reasoning capabilities. source
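The team's released scripts cover the full pipeline, but as a rough, library-free illustration of the decontamination and difficulty-based filtering steps described above, a filter of this kind could look like the following Python sketch. The n-gram size, the solve-rate thresholds and the helper names are assumptions for illustration, not Light-R1's actual code.

```python
# Minimal sketch of benchmark decontamination and difficulty-based filtering,
# loosely following the process described above. The n-gram size, thresholds
# and helper names are illustrative assumptions, not Light-R1's released scripts.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams for a piece of text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(problem: str, benchmark_problems: list[str], n: int = 8) -> bool:
    """Flag a training problem that shares any long n-gram with a benchmark item
    (AIME24/25, MATH-500 and GPQA Diamond in the article's description)."""
    grams = ngrams(problem, n)
    return any(grams & ngrams(bench, n) for bench in benchmark_problems)

def build_sft_stages(candidates, benchmark_problems, solve_rate):
    """Split decontaminated problems into an easier stage-1 set and a harder
    stage-2 set, using a difficulty signal such as a small model's solve rate."""
    clean = [ex for ex in candidates if not is_contaminated(ex["problem"], benchmark_problems)]
    stage1 = [ex for ex in clean if solve_rate(ex) < 0.9]    # drop trivially easy items
    stage2 = [ex for ex in stage1 if solve_rate(ex) < 0.2]   # keep only the hardest for stage 2
    return stage1, stage2
```

In the published pipeline, the difficulty signal came from DeepScaleR-1.5B-preview responses, and the two curriculum stages ended up at roughly 76,000 and 3,000 examples.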

New open-source math model Light-R1-32B surpasses equivalent DeepSeek performance with only $1000 in training costs Read More »

Cerebras just announced 6 new AI datacenters that process 40M tokens per second — and it could be bad news for Nvidia

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Cerebras Systems, an AI hardware startup that has been steadily challenging Nvidia’s dominance in the artificial intelligence market, announced Tuesday a significant expansion of its data center footprint and two major enterprise partnerships that position the company to become the leading provider of high-speed AI inference services. The company will add six new AI data centers across North America and Europe, increasing its inference capacity twentyfold to over 40 million tokens per second. The expansion includes facilities in Dallas, Minneapolis, Oklahoma City, Montreal, New York and France, with 85% of the total capacity located in the United States. “This year, our goal is to truly satisfy all the demand and all the new demand we expect will come online as a result of new models like Llama 4 and new DeepSeek models,” said James Wang, director of product marketing at Cerebras, in an interview with VentureBeat. “This is our huge growth initiative this year to satisfy [the] almost unlimited demand we’re seeing across the board for inference tokens.” The data center expansion represents the company’s ambitious bet that the market for high-speed AI inference — the process where trained AI models generate outputs for real-world applications — will grow dramatically as companies seek faster alternatives to GPU-based solutions from Nvidia. Cerebras plans to expand capacity from 2 million to over 40 million tokens per second by Q4 2025 across eight data centers in North America and Europe. (Credit: Cerebras) Strategic partnerships that bring high-speed AI to developers and financial analysts Alongside the infrastructure expansion, Cerebras announced partnerships with Hugging Face, the popular AI developer platform, and AlphaSense, a market intelligence platform widely used in the financial services industry. The Hugging Face integration will allow its five million developers to access Cerebras Inference with a single click, without having to sign up for Cerebras separately. This becomes a major distribution channel for Cerebras, particularly for developers working with open-source models like Llama 3.3 70B. “Hugging Face is kind of the GitHub of AI and the center of all open-source AI development,” Wang explained. “The integration is super nice and native. [We] just appear in their inference providers list. You just check the box and then you can use Cerebras right away.” The AlphaSense partnership represents a significant enterprise customer win, with the financial intelligence platform switching from what Wang described as a “global, top-three closed-source AI model vendor” to Cerebras. AlphaSense, which serves approximately 85% of Fortune 100 companies, is using Cerebras to accelerate its AI-powered search capabilities for market intelligence. “This is a tremendous customer win and a very large contract for us,” Wang said. “We speed them up by 10 times, so what used to take five seconds or longer basically become[s] instant on Cerebras.” Mistral’s Le Chat, powered by Cerebras, processes 1,100 tokens per second — significantly outpacing competitors like Google’s Gemini, ChatGPT and Claude. 
(Credit: Cerebras) How Cerebras is winning the race for AI inference speed as reasoning models slow down Cerebras has been positioning itself as a specialist in high-speed inference, claiming its Wafer-Scale Engine (WSE-3) processor can run AI models 10 to 70 times faster than GPU-based solutions. This speed advantage has become increasingly valuable as AI models evolve toward more complex reasoning capabilities. “If you listen to Jensen’s remarks, reasoning is the next big thing, even according to Nvidia,” Wang said, referring to Nvidia CEO Jensen Huang. “But what he’s not telling you is that reasoning makes the whole thing run 10 times slower because the model has to think and generate a bunch of internal monologue before it gives you the final answer.” This slowdown creates an opportunity for Cerebras, whose specialized hardware is designed to accelerate these more complex AI workloads. The company has already secured high-profile customers including Perplexity AI and Mistral AI, who use Cerebras to power their AI search and assistant products, respectively. “We help Perplexity become the world’s fastest AI search engine. This just isn’t possible otherwise,” Wang said. “We help Mistral achieve the same feat [with AI assistants]. Now they have a reason for people to subscribe to Le Chat Pro, whereas before, your model is probably not the same cutting-edge level as GPT-4.” Cerebras’ hardware delivers inference speeds up to 13x faster than GPU solutions across popular AI models like Llama 3.3 70B and DeepSeek-R1 70B. (Credit: Cerebras) The compelling economics behind Cerebras’ challenge to OpenAI and Nvidia Cerebras is betting that the combination of speed and cost will make its inference services attractive even to companies already using leading models like GPT-4. Wang pointed out that Meta’s Llama 3.3 70B, an open-source model that Cerebras has optimized for its hardware, now scores the same on intelligence tests as OpenAI’s GPT-4, while costing significantly less to run. “Anyone who is using GPT-4 today can just move to Llama 3.3 70B as a drop-in replacement,” he explained. “The price for GPT-4 is [about] $4.40 in blended terms. And Llama 3.3 is like 60 cents. We’re about 60 cents, right? So you reduce cost by almost an order of magnitude. And if you use Cerebras, you increase speed by another order of magnitude.” Inside Cerebras’ tornado-proof data centers built for AI resilience The company is making substantial investments in resilient infrastructure as part of its expansion. Its Oklahoma City facility, scheduled to come online in June 2025, is designed to withstand extreme weather events. “Oklahoma, as you know, is a kind of a tornado zone. So this data center actually is rated and designed to be fully resistant to tornadoes and seismic activity,” Wang said. “It will withstand the strongest tornado ever recorded on record. If [such a tornado] goes through, this thing will just keep sending Llama tokens to developers.” The Oklahoma City facility, operated in partnership with Scale Datacenter, will house over 300 Cerebras CS-3 systems and feature triple-redundant power stations and custom water-cooling solutions specifically

Cerebras just announced 6 new AI datacenters that process 40M tokens per second — and it could be bad news for Nvidia Read More »

The great software rewiring: AI isn’t just eating everything; it is everything

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Once upon a time, software ate the world. Now, AI is here to digest what’s left. The old model of computing, where apps ruled, marketplaces controlled access and platforms took their cut, is unraveling. What’s emerging is an AI-first world where software functions aren’t trapped inside apps but exist as dynamic, on-demand services accessible through AI-native interfaces. For decades, computing has been a glorified filing cabinet. Applications were digital folders, self-contained, rigid and walled off from one another. Want to check the weather? Open an app. Need to book a flight? Another app. Pay a bill? Yet another. The result? A fragmented user experience where we toggle between countless silos, each competing for real estate on a home screen. Generative AI breaks this model. Instead of clicking and tapping through individual programs, users will interact with intelligent agents that dynamically retrieve, process and generate responses in real time, no app required. Ask a single AI assistant to manage travel, optimize finances and recommend a workout routine? Done. Need legal documents reviewed while ordering groceries and summarizing today’s news? Seamless. The new interface is not an app. It is conversational, predictive and frictionless. To be fair, this new world of functional intelligence is not yet entirely ready. Apps are not disappearing overnight, but their grip on computing may well be slipping. AI doesn’t care about pre-packaged software silos. It rewires the experience, making software modular, dynamic and deeply integrated. The entire idea of opening and switching between apps? That is going to quickly feel like legacy thinking. The incumbent risk: Traditional marketplaces are on the clock For years, digital storefronts and walled-garden marketplaces were unbeatable moats. Control distribution, tax every transaction and rake in billions. Beautiful. But what happens when applications become…. unnecessary? The rise of AI-driven interactions threatens the entire app distribution economy. If users rely on AI-native systems instead of installing discrete software, traditional software marketplaces become a relic. AI eats the middleman. The economic model shifts from app monetization to AI-driven service layers, where interactions are seamless, personalized and, most importantly, beyond the reach of legacy platform control. Two unavoidable consequences: Revenue disruption: No more 30% cuts on app sales or in-app purchases. If AI handles transactions autonomously, app store economics implode. Platform disintermediation: AI is cloud-native and hardware-agnostic. Control over digital ecosystems diminishes as software becomes an ambient service rather than a gated experience. The new question is who owns the AI-powered service layers? Because, whoever does will own the next trillion-dollar industry. The new power structures: AI models and vertical AI solutions AI eating applications creates an obvious power vacuum. Where does the value shift? Simple, control over: AI models: The entities developing the most advanced foundation models define the core intelligence layer. User interface and personalization: Whoever builds the most intuitive, AI-native interfaces will dominate engagement. Data and integration: AI thrives on access to real-time, proprietary data. Whoever owns the data pipelines controls the insights, the intelligence and, ultimately, the economy. 
But there is another force at play: Vertical AI solutions. Right now, most large language models (LLMs) feel like a Swiss Army knife with infinite tools — exciting but overwhelming. Users don’t want to “figure out” AI. They want solutions, AI agents tailored for specific industries and workflows. Think: legal AI drafting contracts, financial AI managing investments, creative AI generating content, scientific AI accelerating research. Broad AI is interesting. Vertical AI is valuable. Right now, LLMs are too broad, too abstract, too unapproachable for most. A blank chat box is not a product, it is homework. If AI is going to replace applications, it must become invisible, integrating seamlessly into daily workflows without forcing users to think about prompts, settings or backend capabilities. The companies that succeed in this next wave will not just build better AI models, but better AI experiences. The future of computing is not about one AI that does everything. It is about many specialized AI systems that know exactly what users need and execute on that flawlessly. The entire software stack is being rewritten in real time. What replaces the old model? Microservices over apps: Forget bloated applications. Future software will be modular, on-demand and AI-callable. Booking a trip? The AI agent sources flights, hotels and rental cars in real time, without you ever opening an app. AI-powered marketplaces: The next software marketplace is not an app store. It is an AI-native services marketplace, where users subscribe to function-specific AI agents rather than downloading static software. AI-as-a-service: Instead of selling standalone apps, developers will build “skills” or “agents” that integrate into an overarching AI ecosystem, monetized through subscriptions or usage-based pricing. The inevitable disruption This is not an evolution; it is a coup. Gen AI is not just another technology layer; it has the potential to eat the entire software industry from the inside out. The old software model was built on scarcity. Control distribution, limit access, charge premiums. AI obliterates this. The new model is fluid, frictionless, and infinitely scalable. The platforms and businesses that fail to adapt may well be relegated to the history books, joining the ranks of those who dismissed the internet, mobile and cloud before it. AI is not just the next software wave; it is the wave that breaks everything before it. The only question left is: Who rides it and who gets drowned? Justin Westcott leads the global technology sector for Edelman. source

The great software rewiring: AI isn’t just eating everything; it is everything Read More »

Mistral releases new optical character recognition (OCR) API claiming top performance globally

Well-funded French AI startup Mistral is content to go its own way. In a sea of competing reasoning models, the company has introduced Mistral OCR, a new optical character recognition (OCR) API designed to provide advanced document understanding capabilities. The API extracts content — including handwritten notes, typed text, images, tables and equations — from unstructured PDFs and images with high accuracy, presenting it in a structured format. Structured data is information that is organized in a predefined manner, typically using rows and columns, making it easy to search and analyze. Common examples include names, addresses and financial transactions stored in databases or spreadsheets. By contrast, unstructured data lacks a specific format or structure, making it more challenging to process and analyze. This category encompasses a wide range of data types, such as emails, social media posts, videos, images and audio files. Since unstructured data doesn’t fit neatly into traditional databases, specialized tools and techniques, like natural language processing (NLP) and machine learning (ML), are often employed to extract meaningful insights. Understanding the distinction between these data types is crucial for businesses looking to effectively manage and leverage their information assets. With multilingual support, fast processing speeds and integration with large language models (LLMs) for document understanding, Mistral OCR is positioned to assist organizations in making their documentation AI-ready. Given that — according to Mistral’s blog post announcing the new API — 90% of all business information is unstructured, the new API should be a huge boon to organizations seeking to digitize and catalog their data for use in AI applications or internal/external knowledge bases. Mistral sets a new gold standard for OCR Mistral OCR aims to improve how organizations process and analyze complex documents. Unlike traditional OCR solutions that primarily focus on text extraction, Mistral OCR is designed to interpret a document’s varied elements, including tables, mathematical expressions and interleaved images, while maintaining structured outputs. According to Mistral’s chief science officer Guillaume Lample, this technology represents a significant step toward wider AI adoption in enterprises, particularly for companies seeking to simplify access to their internal documentation. The API is already integrated into Le Chat, which millions of users rely on for document processing. Now, developers and businesses can access the model via la Plateforme, Mistral’s developer suite. The API is also expected to become available through cloud and inference partners and will offer on-premises deployment for organizations with high-security requirements. Advancing an early (70-year-old) computing technology OCR technology has played a significant role in automating data extraction and document digitization for decades. The first commercial OCR machine was developed in the 1950s by David Shepard and his colleagues Harvey and William Lawless Jr., who founded Intelligent Machines Research Co. (IMR) to bring the technology to market. The system gained traction when Reader’s Digest became its first major customer, followed by banks, telecom companies like AT&T and major oil firms. 
In 1959, IBM licensed IMR’s patents and introduced its own OCR machine, formalizing the term as the industry standard. Since then, OCR technology has continued to evolve, incorporating AI and ML to improve accuracy, expand language support and handle increasingly complex document formats, and can be found in such leading enterprise software as PDF reader Adobe Acrobat. Mistral OCR represents the next step in this evolution, as it leverages AI to enhance document comprehension beyond simple text recognition. Benchmarks show the power of Mistral OCR Mistral highlights its OCR’s competitive edge over existing tools, citing benchmark tests where it outperformed major alternatives including Google Document AI, Azure OCR and OpenAI’s GPT-4o. The model achieved the highest accuracy scores in math recognition, scanned documents and multilingual text processing. Mistral OCR is also designed to operate faster than competing models and is capable of processing up to 2,000 pages per minute on a single node. This speed advantage makes it suitable for high-volume document processing in industries such as research, customer service and historical preservation. Sophia Yang, head of developer relations at Mistral, has been actively showcasing the OCR capabilities on her X account. Notably, she highlighted its top-tier performance benchmarks, multilingual support and ability to accurately extract mathematical equations from PDFs. In a recent post, she shared an example of Mistral OCR successfully recognizing and formatting complex mathematical expressions, reinforcing its effectiveness for scientific and academic applications. Key features and use cases Mistral OCR introduces several features that make it a versatile tool for businesses and institutions handling large document repositories: Multilingual and multimodal processing: The model supports a wide range of languages, scripts and document layouts, making it useful for global organizations. Yang emphasized this capability, calling it a game-changer for multilingual document processing. Structured output and document hierarchy preservation: Unlike basic OCR models, Mistral OCR retains formatting elements such as headers, paragraphs, lists and tables, ensuring extracted text is more useful for downstream applications. Document-as-prompt and structured outputs: Users can extract specific content and format it in structured outputs, such as JSON or Markdown, enabling integration with other AI-driven workflows. Self-hosting option: Organizations with stringent data security and compliance requirements can deploy Mistral OCR within their own infrastructure. The Mistral AI developer documentation online also highlights document understanding capabilities that go beyond OCR. After extracting text and structure, Mistral OCR integrates with LLMs, allowing users to interact with document content using natural language queries. This feature enables: Question answering about specific document content; Automated information extraction and summarization; Comparative analysis across multiple documents; Context-aware responses that consider the full document. What enterprise decision makers should know about Mistral OCR For CEOs, CIOs, CTOs, IT managers and team leaders, Mistral OCR presents significant opportunities for efficiency, security and scalability in document-driven workflows. 1. Increased efficiency and cost savings By automating document processing and reducing manual data entry, Mistral OCR cuts down on administrative overhead and streamlines operations. 
Organizations can process large volumes of documents faster and with higher accuracy,
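For developers who want to try the API described above against their own documents, a minimal request might look like the sketch below. The endpoint path, model identifier and payload field names are assumptions based on Mistral's public examples at the time of writing; consult la Plateforme's documentation for the authoritative request format.

```python
# Minimal sketch of calling Mistral OCR over HTTP and printing the structured
# (Markdown) output. Endpoint, model name and field names are assumptions;
# check Mistral's la Plateforme docs for the authoritative request format.
import os
import requests

API_KEY = os.environ["MISTRAL_API_KEY"]

payload = {
    "model": "mistral-ocr-latest",            # assumed model identifier
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/quarterly-report.pdf",  # placeholder URL
    },
    "include_image_base64": False,             # skip embedded images to keep the response small
}

resp = requests.post(
    "https://api.mistral.ai/v1/ocr",           # assumed endpoint path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()

# The response is expected to contain per-page structured text (e.g. Markdown),
# which downstream LLM or RAG pipelines can ingest directly.
for page in resp.json().get("pages", []):
    print(page.get("markdown", ""))
```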

Mistral releases new optical character recognition (OCR) API claiming top performance globally Read More »

Mayo Clinic’s secret weapon against AI hallucinations: Reverse RAG in action

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Even as large language models (LLMs) become ever more sophisticated and capable, they continue to suffer from hallucinations: Offering up inaccurate information, or, to put it more harshly, lying.  This can be particularly harmful in areas like healthcare, where wrong information can have dire results.  Mayo Clinic, one of the top-ranked hospitals in the U.S., has adopted a novel technique to address this challenge. To succeed, the medical facility must overcome the limitations of retrieval-augmented generation (RAG). That’s the process by which large language models (LLMs) pull information from specific, relevant data sources. The hospital has employed what is essentially backwards RAG, where the model extracts relevant information, then links every data point back to its original source content.  Remarkably, this has eliminated nearly all data-retrieval-based hallucinations in non-diagnostic use cases — allowing Mayo to push the model out across its clinical practice. “With this approach of referencing source information through links, extraction of this data is no longer a problem,” Matthew Callstrom, Mayo’s medical director for strategy and chair of radiology, told VentureBeat. Accounting for every single data point Dealing with healthcare data is a complex challenge — and it can be a time sink. Although vast amounts of data are collected in electronic health records (EHRs), data can be extremely difficult to find and parse out.  Mayo’s first use case for AI in wrangling all this data was discharge summaries (visit wrap-ups with post-care tips), with its models using traditional RAG. As Callstrom explained, that was a natural place to start because it is simple extraction and summarization, which is what LLMs generally excel at.  “In the first phase, we’re not trying to come up with a diagnosis, where you might be asking a model, ‘What’s the next best step for this patient right now?’,” he said.  The danger of hallucinations was also not nearly as significant as it would be in doctor-assist scenarios; not to say that the data-retrieval mistakes weren’t head-scratching.  “In our first couple of iterations, we had some funny hallucinations that you clearly wouldn’t tolerate — the wrong age of the patient, for example,” said Callstrom. “So you have to build it carefully.”  While RAG has been a critical component of grounding LLMs (improving their capabilities), the technique has its limitations. Models may retrieve irrelevant, inaccurate or low-quality data; fail to determine if information is relevant to the human ask; or create outputs that don’t match requested formats (like bringing back simple text rather than a detailed table).  While there are some workarounds to these problems — like graph RAG, which sources knowledge graphs to provide context, or corrective RAG (CRAG), where an evaluation mechanism assesses the quality of retrieved documents — hallucinations haven’t gone away.  Referencing every data point This is where the backwards RAG process comes in. Specifically, Mayo paired what’s known as the clustering using representatives (CURE) algorithm with LLMs and vector databases to double-check data retrieval.  Clustering is critical to machine learning (ML) because it organizes, classifies and groups data points based on their similarities or patterns. This essentially helps models “make sense” of data. 
CURE goes beyond typical clustering with a hierarchical technique, using distance measures to group data based on proximity (think: data closer to one another are more related than those further apart). The algorithm has the ability to detect “outliers,” or data points that don’t match the others.  Combining CURE with a reverse RAG approach, Mayo’s LLM split the summaries it generated into individual facts, then matched those back to source documents. A second LLM then scored how well the facts aligned with those sources, specifically if there was a causal relationship between the two.  “Any data point is referenced back to the original laboratory source data or imaging report,” said Callstrom. “The system ensures that references are real and accurately retrieved, effectively solving most retrieval-related hallucinations.”  Callstrom’s team used vector databases to first ingest patient records so that the model could quickly retrieve information. They initially used a local database for the proof of concept (POC); the production version is a generic database with logic in the CURE algorithm itself. “Physicians are very skeptical, and they want to make sure that they’re not being fed information that isn’t trustworthy,” Callstrom explained. “So trust for us means verification of anything that might be surfaced as content.”  ‘Incredible interest’ across Mayo’s practice The CURE technique has proven useful for synthesizing new patient records too. Outside records detailing patients’ complex problems can have “reams” of data content in different formats, Callstrom explained. This needs to be reviewed and summarized so that clinicians can familiarize themselves before they see the patient for the first time.  “I always describe outside medical records as a little bit like a spreadsheet: You have no idea what’s in each cell, you have to look at each one to pull content,” he said.  But now, the LLM does the extraction, categorizes the material and creates a patient overview. Typically, that task could take 90 or so minutes out of a practitioner’s day — but AI can do it in about 10, Callstrom said.   He described “incredible interest” in expanding the capability across Mayo’s practice to help reduce administrative burden and frustration.  “Our goal is to simplify the processing of content — how can I augment the abilities and simplify the work of the physician?” he said.  Tackling more complex problems with AI Of course, Callstrom and his team see great potential for AI in more advanced areas. For instance, they have teamed with Cerebras Systems to build a genomic model that predicts what will be the best arthritis treatment for a patient, and are also working with Microsoft on an image encoder and an imaging foundation model.  Their first imaging project with Microsoft is chest X-rays. They have so far converted 1.5 million X-rays and plan to do another 11 million in the
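Mayo has not published its implementation, but the verification loop described above can be sketched in outline: split the generated summary into individual facts, retrieve the nearest source passage for each, and have a second model score whether that passage actually supports the fact. In the Python sketch below, embed(), nearest_source() and judge_support() are hypothetical stand-ins for whatever embedding model, vector database and judge LLM an organization already runs; it illustrates the pattern, not Mayo's code.

```python
# Minimal sketch of a "reverse RAG" verification pass, loosely following the
# process described above. embed(), nearest_source() and judge_support() are
# hypothetical stand-ins for a real embedding model, vector store and judge LLM.
from dataclasses import dataclass

@dataclass
class VerifiedFact:
    fact: str
    source_passage: str
    support_score: float  # e.g. the judge LLM's 0-1 estimate that the source entails the fact

def split_into_facts(summary: str) -> list[str]:
    """Naive fact splitter: one sentence per fact. A production system would
    use an LLM to extract atomic claims instead."""
    return [s.strip() for s in summary.split(".") if s.strip()]

def verify_summary(summary, source_passages, embed, nearest_source, judge_support,
                   threshold: float = 0.8):
    """Link every generated fact back to its closest source passage and flag
    anything the judge model cannot ground in that passage."""
    verified, flagged = [], []
    for fact in split_into_facts(summary):
        passage = nearest_source(embed(fact), source_passages)  # vector lookup
        score = judge_support(fact, passage)                    # second LLM as judge
        record = VerifiedFact(fact, passage, score)
        (verified if score >= threshold else flagged).append(record)
    return verified, flagged  # flagged facts are removed or sent for human review
```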

Mayo Clinic’s secret weapon against AI hallucinations: Reverse RAG in action Read More »

Chain-of-experts (CoE): A lower-cost LLM framework that increases efficiency and accuracy

Enterprises increasingly rely on large language models (LLMs) to deliver advanced services, but struggle to handle the computational costs of running these models. A new framework, chain-of-experts (CoE), aims to make LLMs more resource-efficient while increasing their accuracy on reasoning tasks. The CoE framework addresses the limitations of earlier approaches by activating “experts” — separated elements of a model, each specializing in certain tasks — sequentially instead of in parallel. This structure allows experts to communicate intermediate results and gradually build on each other’s work. Architectures such as CoE can become very useful in inference-intensive applications, where gains in efficiency can result in huge cost savings and better user experience. Dense LLMs and mixture-of-experts Classic LLMs, sometimes referred to as dense models, activate every parameter simultaneously during inference, leading to extensive computational demands as a model grows larger. Mixture-of-experts (MoE), an architecture used in models such as DeepSeek-V3 and (presumably) GPT-4o, addresses this challenge by splitting the model into a set of experts. During inference, MoE models use a router that selects a subset of experts for each input. MoEs significantly reduce the computational overhead of running LLMs compared to dense models. For example, DeepSeek-V3 is a 671-billion-parameter model with 257 experts, nine of which are used for any given input token, totaling 37 billion active parameters during inference. But MoEs have limitations. The two main drawbacks are, first, that each expert operates independently of others, reducing the model’s performance on tasks that require contextual awareness and coordination among experts. And second, the MoE architecture causes high sparsity, resulting in a model with high memory requirements, even though a small subset is used at any given time. Chain-of-experts The chain-of-experts framework addresses the limitations of MoEs by activating experts sequentially instead of in parallel. This structure allows experts to communicate intermediate results and gradually build on each other’s work. CoE uses an iterative process. The input is first routed to a set of experts, which process it and pass on their answers to another set of experts. The second group of experts processes the intermediate results and can pass them on to the next set of experts. This sequential approach provides context-aware inputs, significantly enhancing the model’s ability to handle complex reasoning tasks. Chain-of-experts versus mixture-of-experts (source: Notion) For example, in mathematical reasoning or logical inference, CoE allows each expert to build on previous insights, improving accuracy and task performance. This method also optimizes resource use by minimizing redundant computations common in parallel-only expert deployments, addressing enterprise demands for cost-efficient and high-performing AI solutions. Key advantages of CoE The chain-of-experts approach, using sequential activation and expert collaboration, results in several key benefits, as described in a recent analysis from a group of researchers testing the CoE framework. In CoE, the expert selection is performed in an iterative fashion. In each iteration, the experts are determined by the output of the previous stage. 
This enables different experts to communicate and form interdependencies to create a more dynamic routing mechanism. “In this way, CoE can significantly improve model performance while maintaining computational efficiency, especially in complex scenarios (e.g., the Math task in experiments),” the researchers write. CoE models outperform dense LLMs and MoEs with equal resources (source: Notion) The researchers’ experiments show that with equal compute and memory budgets, CoE outperforms dense LLMs and MoEs. For example, in mathematical benchmarks, a CoE with 64 experts, four routed experts and two inference iterations (CoE-2(4/64)) outperforms an MoE with 64 experts and eight routed experts (MoE(8/64)). The researchers also found that CoE reduces memory requirements. For example, a CoE with four routed experts out of 48 and two iterations (CoE-2(4/48)) achieves performance similar to MoE(8/64) while using fewer total experts, reducing memory requirements by 17.6%. CoE also allows for more efficient model architectures. For example, a CoE-2(8/64) with four layers of neural networks matches the performance of an MoE(8/64) with eight layers, while using 42% less memory. “Perhaps most significantly, CoE seems to provide what we call a ‘free lunch’ acceleration,” the researchers write. “By restructuring how information flows through the model, we achieve better results with similar computational overhead compared to previous MoE methods.” Case in point: A CoE-2(4/64) provides 823 times more expert combinations than MoE(8/64), enabling the model to learn more complex tasks without increasing the size of the model or its memory and compute requirements. CoE’s lower operational costs and improved performance on complex tasks can make advanced AI more accessible to enterprises, helping them remain competitive without substantial infrastructure investments. “This research opens new pathways for efficiently scaling language models, potentially making advanced artificial intelligence capabilities more accessible and sustainable,” the researchers write. source
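To make the sequential-routing idea concrete, here is a small PyTorch-style sketch of an expert layer that runs top-k routing for several iterations, so that later routing decisions can see what earlier experts produced. It is a schematic under simplifying assumptions (plain top-k gating, no shared expert, no load-balancing loss, unoptimized loops), not the reference implementation from the cited analysis.

```python
# Schematic comparison of MoE-style single-pass routing and CoE-style iterative
# routing. Simplifying assumptions: plain top-k gating, no shared expert, no
# load-balancing loss. Not the reference implementation from the cited work.
import torch
import torch.nn as nn

class ExpertLayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 64, top_k: int = 4, iterations: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k
        self.iterations = iterations  # 1 => ordinary MoE; >1 => chain-of-experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for _ in range(self.iterations):
            # Routing depends on the *current* hidden state, so later iterations
            # see what earlier experts produced (the chain-of-experts idea).
            scores = self.router(h)                                  # (batch, num_experts)
            weights, idx = scores.softmax(-1).topk(self.top_k, dim=-1)
            out = torch.zeros_like(h)
            for slot in range(self.top_k):
                for b in range(h.shape[0]):
                    e = idx[b, slot].item()
                    out[b] += weights[b, slot] * self.experts[e](h[b])
            h = h + out  # residual update feeds the next iteration's router
        return h

# CoE-2(4/64) in the article's notation: two sequential iterations of
# 4-out-of-64 routing per token.
layer = ExpertLayer(dim=256, num_experts=64, top_k=4, iterations=2)
tokens = torch.randn(8, 256)
print(layer(tokens).shape)  # torch.Size([8, 256])
```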

Chain-of-experts (CoE): A lower-cost LLM framework that increases efficiency and accuracy Read More »

SimilarWeb data: This obscure AI startup grew 8,658% while OpenAI crawled at 9%

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More SimilarWeb‘s latest Global AI Tracker report reveals dramatic shifts in the AI landscape, painting a clear picture of market winners and losers. The comprehensive report tracks traffic patterns across various AI tool categories, providing crucial insights for industry strategists and investors. DevOps and code completion tools lead the pack with a remarkable 72% year-over-year growth in the 12-week period ending February 28, 2025. Meanwhile, traditional educational technology platforms continue their downward spiral, declining 20% during the same period as AI alternatives gain traction. These numbers reveal the stark reality of AI’s market impact: We’ve moved beyond speculation into actual market restructuring. The dramatic contrasts between soaring developer tools and plummeting EdTech platforms show how rapidly AI is redrawing competitive boundaries. The winners aren’t just technically superior; they’re fundamentally reimagining how problems are solved in ways legacy systems cannot match. Let’s examine the most surprising takeaways from SimilarWeb’s latest intelligence report that showcase the evolving AI landscape. SimilarWeb’s Global AI Tracker shows DevOps tools leading with 72% growth, while sectors like music generation and writing content declined. Data analytics (42%) and HCM (31%) emerged as surprising growth categories in early 2025. (Credit: SimilarWeb) AI-powered developer tools show extraordinary momentum, with 72% year-over-year growth. DevOps and code completion tools have emerged as the category with the clearest product-market fit in the generative AI era. Tools like Cursor (97% growth) and Replit (67% growth) demonstrate that AI’s most immediate impact may be on software development itself — creating a virtuous cycle where AI accelerates the creation of more advanced AI systems. This suggests that the most transformative effects of AI in the near term may be invisible to consumers but profoundly important for technological progress. AI tools market grows 21% quarterly, with Deepseek’s extraordinary 8,658% surge leading the sector. Established players like Microsoft (-6%) and Claude (2%) have lost momentum while Hugging Face maintains strong growth at 42%, highlighting the sector’s extreme volatility. (Source: SimilarWeb) 2. The digital freelance contraction Traffic to digital freelance platforms has steadily declined by 20% throughout the reporting period, raising profound questions about the future of knowledge work. As AI tools become more capable of producing design assets, written content and even code, the traditional freelance marketplace model appears increasingly threatened. Every major platform in this category showed significant declines: Fiverr (-22%), Upwork (-18%), Freelancer (-15%) and Toptal (-35%). This pattern suggests that businesses may be shifting budget from human freelancers to AI tools for certain categories of work, particularly in content creation and basic design tasks. Major freelance platforms show consistent year-over-year traffic declines, with Toptal experiencing the steepest drop at -35%. Guru’s trajectory has deteriorated most dramatically, shifting from 18% growth last September to a -30% decline by February 2025, highlighting the mounting pressure on creative service marketplaces as AI alternatives gain traction. (Source: SimilarWeb) 3. 
The resilience of design platforms amid AI art growth Despite the proliferation of AI image generation tools showing 8% growth by February 2025, traditional design platforms have demonstrated remarkable resilience with 16% growth in the same period. Canva maintained 18% growth, while Adobe Express and Figma showed 19% and 8% growth, respectively. This challenges the narrative that AI tools cannibalize their traditional counterparts. Instead, data suggests that established design platforms may be successfully integrating AI capabilities into their offerings, creating a complementary rather than competitive relationship with generative technologies. Design platforms show resilience with 16% year-over-year growth, with Canva (18%) and Adobe Express (19%) maintaining strong performance. Newer entrant Kittl, despite being labeled “falling,” still posts impressive 55% growth, highlighting the sector’s overall stability. (Source: SimilarWeb) 4. Traditional EdTech’s accelerating decline Traditional EdTech platforms show a consistent downward trend culminating in a 20% year-over-year decline, with the trajectory worsening over time (as indicated by the “falling” trend designation). Individual platforms tell an even more dramatic story. Chegg and Course Hero, once dominant players in the homework help space, saw traffic plummet by 58% and 59%, respectively. These platforms, which built their business models around human tutors and crowdsourced study materials, appear particularly vulnerable to AI-powered alternatives that offer instant, personalized assistance. Education technology platforms face a stark 20% traffic decline, with Course Hero and Chegg plummeting nearly 60% as students increasingly adopt AI alternatives. Even established players like Udemy (-11%) struggle, while Duolingo shows relative resilience at just -1%. (Source: SimilarWeb) 5. The meteoric rise of niche AI challengers The most stunning growth story isn’t from an established tech giant but from relative newcomers. Deepseek, in the general AI category, posted an astonishing 8,658% growth in the 12-week period ending February 2025. While OpenAI’s properties grew by just 9% in the same timeframe, these emerging contenders are redefining market dynamics. In the DevOps category, Lovable demonstrated similarly explosive growth, with increases measured in the thousands of percentage points throughout the tracking period. These patterns suggest that the AI market remains highly dynamic, with opportunities for specialized tools to capture significant market share despite the presence of established players. Developer tools show exceptional 72% quarterly growth, largely driven by Lovable’s astonishing 928% increase and Cursor’s steady 97% rise. The market reveals a clear winner-take-all dynamic as traditional tools like Tabnine (-24%) and Bito (-25%) rapidly lose market share to AI-powered alternatives. (Source: SimilarWeb) The new AI landscape takes shape The SimilarWeb report offers more than just traffic statistics — it provides a window into the practical impact of AI technologies across sectors. The data reveals a nuanced picture where AI isn’t simply replacing existing tools, but is creating new value in specific domains while challenging established business models in others. For businesses navigating this shifting landscape, the message is clear: AI adoption isn’t a monolithic trend, but a series of specialized transformations happening at different rates across sectors. 
The tools that gain traction aren’t necessarily the most technologically advanced, but those that solve real problems for specific user groups. As developers

SimilarWeb data: This obscure AI startup grew 8,658% while OpenAI crawled at 9% Read More »

Contextual AI’s new AI model crushes GPT-4o in accuracy — here’s why it matters

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Contextual AI unveiled its grounded language model (GLM) today, claiming it delivers the highest factual accuracy in the industry by outperforming leading AI systems from Google, Anthropic and OpenAI on a key benchmark for truthfulness. The startup, founded by the pioneers of retrieval-augmented generation (RAG) technology, reported that its GLM achieved an 88% factuality score on the FACTS benchmark, compared to 84.6% for Google’s Gemini 2.0 Flash, 79.4% for Anthropic’s Claude 3.5 Sonnet and 78.8% for OpenAI’s GPT-4o. While large language models have transformed enterprise software, factual inaccuracies — often called hallucinations — remain a critical challenge for business adoption. Contextual AI aims to solve this by creating a model specifically optimized for enterprise RAG applications where accuracy is paramount. “We knew that part of the solution would be a technique called RAG — retrieval-augmented generation,” said Douwe Kiela, CEO and cofounder of Contextual AI, in an exclusive interview with VentureBeat. “And we knew that because I was a co-inventor of RAG. What this company is about is really about doing RAG the right way, to kind of the next level of doing RAG.” The company’s focus differs significantly from general-purpose models like ChatGPT or Claude, which are designed to handle everything from creative writing to technical documentation. Contextual AI instead targets high-stakes enterprise environments where factual precision outweighs creative flexibility. “If you have a RAG problem and you’re in an enterprise setting in a highly regulated industry, you have no tolerance whatsoever for hallucination,” explained Kiela. “The same general-purpose language model that is useful for the marketing department is not what you want in an enterprise setting where you are much more sensitive to mistakes.” A benchmark comparison showing Contextual AI’s new grounded language model (GLM) outperforming competitors from Google, Anthropic and OpenAI on factual accuracy tests. The company claims its specialized approach reduces AI hallucinations in enterprise settings.(Credit: Contextual AI) How Contextual AI makes ‘groundedness’ the new gold standard for enterprise language models The concept of “groundedness” — ensuring AI responses stick strictly to information explicitly provided in the context — has emerged as a critical requirement for enterprise AI systems. In regulated industries like finance, healthcare and telecommunications, companies need AI that either delivers accurate information or explicitly acknowledges when it doesn’t know something. Kiela offered an example of how this strict groundedness works: “If you give a recipe or a formula to a standard language model, and somewhere in it, you say, ‘but this is only true for most cases,’ most language models are still just going to give you the recipe assuming it’s true. But our language model says, ‘Actually, it only says that this is true for most cases.’ It’s capturing this additional bit of nuance.” The ability to say “I don’t know” is a crucial one for enterprise settings. “Which is really a very powerful feature, if you think about it in an enterprise setting,” Kiela added. Contextual AI’s RAG 2.0: A more integrated way to process company information Contextual AI’s platform is built on what it calls “RAG 2.0,” an approach that moves beyond simply connecting off-the-shelf components. 
“A typical RAG system uses a frozen off-the-shelf model for embeddings, a vector database for retrieval, and a black-box language model for generation, stitched together through prompting or an orchestration framework,” according to a company statement. “This leads to a ‘Frankenstein’s monster’ of generative AI: the individual components technically work, but the whole is far from optimal.” Instead, Contextual AI jointly optimizes all components of the system. “We have this mixture-of-retrievers component, which is really a way to do intelligent retrieval,” Kiela explained. “It looks at the question, and then it thinks, essentially, like most of the latest generation of models, it thinks, [and] first it plans a strategy for doing a retrieval.” This entire system works in coordination with what Kiela calls “the best re-ranker in the world,” which helps prioritize the most relevant information before sending it to the grounded language model. Beyond plain text: Contextual AI now reads charts and connects to databases While the newly announced GLM focuses on text generation, Contextual AI’s platform has recently added support for multimodal content including charts, diagrams and structured data from popular platforms like BigQuery, Snowflake, Redshift and Postgres. “The most challenging problems in enterprises are at the intersection of unstructured and structured data,” Kiela noted. “What I’m mostly excited about is really this intersection of structured and unstructured data. Most of the really exciting problems in large enterprises are smack bang at the intersection of structured and unstructured, where you have some database records, some transactions, maybe some policy documents, maybe a bunch of other things.” The platform already supports a variety of complex visualizations, including circuit diagrams in the semiconductor industry, according to Kiela. Contextual AI’s future plans: Creating more reliable tools for everyday business Contextual AI plans to release its specialized re-ranker component shortly after the GLM launch, followed by expanded document-understanding capabilities. The company also has experimental features for more agentic capabilities in development. Founded in 2023 by Kiela and Amanpreet Singh, who previously worked at Meta’s Fundamental AI Research (FAIR) team and Hugging Face, Contextual AI has secured customers including HSBC, Qualcomm and the Economist. The company positions itself as helping enterprises finally realize concrete returns on their AI investments. “This is really an opportunity for companies who are maybe under pressure to start delivering ROI from AI to start looking at more specialized solutions that actually solve their problems,” Kiela said. “And part of that really is having a grounded language model that is maybe a bit more boring than a standard language model, but it’s really good at making sure that it’s grounded in the context and that you can really trust it to do its job.” source
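For context on what the "Frankenstein's monster" framing refers to, a conventional stitched-together RAG pipeline looks roughly like the sketch below: independently chosen components connected only through a prompt template. The embed, vector_index and llm objects are hypothetical placeholders rather than anything from Contextual AI's stack, and the prompt illustrates the "answer only from context or say I don't know" behavior the article describes.

```python
# Sketch of the "stitched together" RAG baseline the article contrasts with a
# jointly optimized system: an off-the-shelf embedder, a vector index and a
# black-box LLM connected only through prompting. embed(), vector_index and
# llm() are hypothetical placeholders for components chosen separately.

GROUNDED_PROMPT = """Answer using ONLY the context below.
If the context does not contain the answer, reply exactly: "I don't know."

Context:
{context}

Question: {question}
Answer:"""

def answer(question: str, embed, vector_index, llm, k: int = 5) -> str:
    """Retrieve the k most similar chunks and ask the generator to stay grounded.
    Each component was trained separately, which is the weakness Kiela points to."""
    chunks = vector_index.search(embed(question), k=k)                 # frozen retriever
    context = "\n\n".join(chunk.text for chunk in chunks)
    return llm(GROUNDED_PROMPT.format(context=context, question=question))  # black-box generator
```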

Contextual AI’s new AI model crushes GPT-4o in accuracy — here’s why it matters Read More »

Google launches free Gemini-powered Data Science Agent on its Colab Python platform

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More AI agents are all the rage, but how about one focused specifically on analyzing, sorting and drawing conclusions from vast volumes of data? Google’s data science agent does just that: The new, free, Gemini 2.0-powered AI assistant that automates data analysis is now available to users age 18-plus in select countries and languages. The assistant is available through Google Colab, the company’s eight-year-old service for running Python code live online atop graphics processing units (GPUs) owned by the search giant and its own, in-house tensor processing units (TPUs). Initially launched for trusted testers in December 2024, data science agent is designed to help researchers, data scientists and developers streamline their workflows by generating fully-functional Jupyter notebooks from natural language descriptions, all in the user’s browser. This expansion aligns with Google’s ongoing efforts to integrate AI-driven coding and data science features into Colab, building on past updates such as Codey-powered AI coding assistance, announced in May 2023. It also acts as a kind of advanced and belated rejoinder to OpenAI’s ChatGPT advanced data analysis (previously Code Interpreter), which is now built into ChatGPT when running GPT-4. What is Google Colab? Google Colab (short for colaboratory) is a cloud-based Jupyter Notebook environment that enables users to write and execute Python code directly in their browser. Jupyter Notebook is an open-source web application that enables users to create and share documents containing live code, equations, visualizations and narrative text. Originating from the IPython project in 2014, it now supports more than 40 programming languages, including Python, R and Julia. This interactive platform is widely used in data science, research and education for tasks like data analysis, visualization and teaching programming concepts. Since its launch in 2017, Google Colab has become one of the most widely-used platforms for machine learning (ML) data science and education. As Ori Abramovsky, data science lead at Spectralops.io, detailed in an excellent Medium post from 2023, Colab’s ease of use and free access to GPUs and TPUs make it a standout option for many developers and researchers. He noted that the low barrier to entry, seamless integration with Google Drive and support for TPUs allowed his team to dramatically shorten training cycles while working on AI models. However, Abramovsky also pointed out Colab’s limitations, such as: Session time limits (especially for free-tier users). Unpredictable resource allocation at peak usage times. Lack of critical features, like efficient pipeline execution and advanced scheduling. Support challenges, as Google provides limited options for direct assistance. Despite these drawbacks, Abramovsky emphasized that Colab remains one of the best serverless notebook solutions available — particularly in the early stages of ML and data analysis projects. Simplifying data analysis with AI The data science agent builds on Colab’s serverless notebook environment by eliminating the need for manual setup. Using Google’s Gemini AI, users can describe their analytical goals in plain English (“visualize trends,” “train a prediction model,” “clean missing values”), and the agent generates fully-executable Colab notebooks in response. 
It supports users by: Automating analysis: Generates complete, working notebooks instead of isolated code snippets. Saving time: Eliminates manual setup and repetitive coding. Enhancing collaboration: Features built-in sharing features for team-based projects. Offering modifiable solutions: Users can adjust and customize generated code. Data science agent is already accelerating real-world scientific research According to Google, early testers have reported significant time savings when using data science agent. For instance, a scientist at Lawrence Berkeley National Laboratory working on tropical wetland methane emissions estimated that their data processing time dropped from one week to just five minutes when using the agent. The tool has also performed well in industry benchmarks, ranking 4th on the DABStep: Data Agent Benchmark for Multi-step Reasoning on Hugging Face, ahead of AI agents such as ReAct (GPT-4.0), Deepseek, Claude 3.5 Haiku and Llama 3.3 70B. However, OpenAI’s rival o3-mini and o1 models, as well as Anthropic’s Claude 3.5 Sonnet, both outclassed the new Gemini data science agent. Getting started Users can start using data science agent in Google Colab by following these steps: Open a new Colab notebook. Upload a dataset (CSV, JSON, etc.). Describe the analysis in natural language using the Gemini side panel. Execute the generated notebook to see insights and visualizations. Google provides sample datasets and prompt ideas to help users explore its capabilities, including: Stack Overflow developer survey: “Visualize most popular programming languages.” Iris Species dataset: “Calculate and visualize Pearson, Spearman and Kendall correlations.” Glass Classification dataset: “Train a random forest classifier.” Anytime a user wants to use the new agent, they’ll have to navigate to Colab and click “file,” then “new notebook in drive,” and the resulting notebook will be stored in their Google Drive cloud account. My own brief demo usage was more mixed Granted, I’m a lowly tech journalist and not a data scientist, but my own usage of the new Gemini 2.0-powered data science agent in Colab so far has been less than seamless. I uploaded five CSV files (comma separated values, standard spreadsheet files from Excel or Sheets) and asked it “How much am I spending each month and quarter on my utilities?”. The agent went ahead and performed the following operations: Merged datasets, handling date and account number inconsistencies. Filtered and cleaned the data, ensuring only relevant expenses remained. Grouped transactions by month and quarter to calculate spending. Generated visualizations, such as line charts for trend analysis. Summarized findings in a clear, structured report. Before execution, Colab prompted a confirmation message, reminding me that it might interact with external APIs. It did all this very rapidly and smoothly in the browser, in a matter of seconds. And it was impressive to watch it work through the analysis and programming with visible step-by-step descriptions of what it was doing. However, it ultimately generated an inaccurate graph showing just one month’s utility spending, failing to recognize the sheets included a full year’s worth broken out by
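For readers curious what such a generated notebook contains, the operations described in the demo above (merging CSVs, cleaning dates, grouping by month and quarter, charting a trend) boil down to a few lines of pandas. The sketch below is a hand-written approximation, not the agent's actual output, and the file and column names ("date", "amount") are invented for illustration; the agent infers the real schema from the uploaded files.

```python
# Sketch of the kind of pandas code the agent generates for the utility-spend
# question above. Column names ("date", "amount") are invented for illustration;
# the agent infers the real schema from the uploaded CSVs.
import glob
import pandas as pd

# Merge all uploaded CSVs into one table, tolerating different column orders.
frames = [pd.read_csv(path) for path in glob.glob("*.csv")]
df = pd.concat(frames, ignore_index=True)

# Clean: parse dates and drop rows that could not be parsed.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df = df.dropna(subset=["date", "amount"])

# Group spending by calendar month and by quarter.
monthly = df.groupby(df["date"].dt.to_period("M"))["amount"].sum()
quarterly = df.groupby(df["date"].dt.to_period("Q"))["amount"].sum()

print(monthly)
print(quarterly)

# A simple trend chart, as in the agent's report.
monthly.plot(kind="line", title="Utility spend by month")
```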

Google launches free Gemini-powered Data Science Agent on its Colab Python platform Read More »

Top AI leaders are shaping VB Transform 2025 — here’s what’s next for enterprise AI

Enterprise AI leaders face a new set of challenges in 2025 — how can they deploy agentic AI, drive real ROI and navigate evolving AI economics? At VB Transform 2025, a hand-picked group of AI executives — from LinkedIn, Bank of America, Intuit and more — is shaping the agenda to deliver the most actionable, no-fluff insights. “The goal is a VB Transform program that serves the needs of executives by addressing questions, challenges and strategic priorities that execs are facing, and provide blueprints for driving AI initiatives forward,” says Matt Marshall, CEO of VentureBeat. “Our attendees will learn how to implement practical agentic AI applications at scale, directly from industry leaders already in the trenches.” As AI hurtles forward, what’s your next move? Join us to get real-world strategies directly from the leaders in the trenches. But spots are filling fast — secure your seat today! Have an AI success story to share? Apply to speak and position yourself as an industry leader. Want to get noticed by AI decision-makers? Apply to sponsor now. VB Transform 2025 committee members Meet the committee members who are helping deepen VB Transform 2025 discussions: Deepak Agarwal, chief AI officer, LinkedIn; Awais Sher Bajwa, head of data and AI banking, Bank of America; Fiona Tan, CTO, Wayfair; Patrick Baginski, global VP of data, data science, data engineering and machine learning engineering, AB InBev; Ashok Srivastava, SVP and chief data officer, Intuit; May Habib, CEO and co-founder, Writer; Bill Braun, former CIO, Chevron; and Xuedong Huang, CTO, Zoom. The critical themes and big issues shaping Transform 2025 Agentic AI: Hype vs. reality. Every company wants to leverage AI agents — but what’s truly working in 2025? Leaders will separate fact from fiction, offering real deployment lessons. Data quality: Business leaders will discuss why the need for high-quality, well-governed data remains paramount — and why garbage in, garbage out has continued to be a critical concern in data lakes, data integration, data labeling and data annotation. Legacy systems and disparate data sources create significant challenges, though solutions to address these issues are still emerging. ROI of AI: Moving forward, practicality and ROI are key, with the focus shifting from theoretical discussions to tangible business value. Learn how enterprises can seek proven use cases with clear ROI, especially in areas like process automation and improved customer experience. Changing economics: The economics of AI continue to evolve and will shape ROI and business value. Inference costs and latency are rapidly decreasing, enabling more complex and sophisticated AI applications, but questions remain about the cost-effectiveness of different approaches (e.g., building vs. buying models) — what best practices are emerging among leaders across industries? Open source: While there’s cautious optimism about open-source AI models and infrastructure, driven by transparency, cost savings and increased control, execs warn that challenges remain regarding security, maintenance and support — and will offer actionable ways forward into the future. Seats are filling fast for Transform 2025 — secure your place today! Share your success story and position yourself as an industry leader — apply to speak now. Connect with AI decision-makers: apply to sponsor. source

Top AI leaders are shaping VB Transform 2025 — here’s what’s next for enterprise AI Read More »