VentureBeat

Mistral AI drops new open-source model that outperforms GPT-4o Mini with fraction of parameters

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More French artificial-intelligence startup Mistral AI unveiled a new open-source model today that the company says outperforms similar offerings from Google and OpenAI, setting the stage for increased competition in a market dominated by U.S. tech giants. The model, called Mistral Small 3.1, processes both text and images with just 24 billion parameters—a fraction of the size of leading proprietary models—while matching or exceeding their performance, according to the company. “This new model comes with improved text performance, multimodal understanding, and an expanded context window of up to 128k tokens,” Mistral said in a company blog post announcing the release. The firm claims the model processes information at speeds of 150 tokens per second, making it suitable for applications requiring rapid response times. By releasing the model under the permissive Apache 2.0 license, Mistral is pursuing a markedly different strategy than its larger competitors, which have increasingly restricted access to their most powerful AI systems. The approach highlights a growing divide in the AI industry between closed, proprietary systems and open, accessible alternatives. How a $6 billion European startup is taking on Silicon Valley’s AI giants Founded in 2023 by former researchers from Google DeepMind and Meta, Mistral AI has rapidly established itself as Europe’s leading AI startup, with a valuation of approximately $6 billion after raising around $1.04 billion in capital. This valuation, while impressive for a European startup, remains a fraction of OpenAI’s reported $80 billion or the resources available to tech giants like Google and Microsoft. Mistral has achieved notable traction, particularly in its home region. Its chat assistant Le Chat recently reached one million downloads in just two weeks following its mobile release, bolstered by vocal support from French President Emmanuel Macron, who urged citizens to “download Le Chat, which is made by Mistral, rather than ChatGPT by OpenAI — or something else” during a television interview. The company strategically positions itself as “the world’s greenest and leading independent AI lab,” emphasizing European digital sovereignty as a key differentiator from American competitors. Small but mighty: How Mistral’s 24 billion parameter model punches above its weight class Mistral Small 3.1 stands out for its remarkable efficiency. With just 24 billion parameters—a fraction of models like GPT-4—the system delivers multimodal capabilities, multilingual support, and handles long-context windows of up to 128,000 tokens. This efficiency represents a significant technical achievement. While the AI industry has generally pursued ever-larger models requiring massive computational resources, Mistral has focused on algorithmic improvements and training optimizations to extract maximum capability from smaller architectures. The approach addresses one of the most pressing challenges in AI deployment: the enormous computational and energy costs associated with state-of-the-art systems. By creating models that run on relatively modest hardware—including a single RTX 4090 graphics card or a Mac with 32GB of RAM—Mistral makes advanced AI accessible for on-device applications where larger models prove impractical. This emphasis on efficiency may ultimately prove more sustainable than the brute-force scaling pursued by larger competitors. 
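Because the weights ship under Apache 2.0, the model can be pulled and run locally much like any other open checkpoint. Below is a minimal, text-only sketch using Hugging Face transformers; the repository ID and the choice of the text-generation pipeline are assumptions (the multimodal variant may need a different pipeline, and Mistral's model card should be treated as authoritative), and fitting a 24-billion-parameter model on a single RTX 4090 in practice requires 4-bit or 8-bit quantization.

```python
# Minimal sketch: running an Apache 2.0 open-weight Mistral checkpoint locally with
# Hugging Face transformers. The repo ID below is an assumption based on Mistral's
# naming scheme; check the model card for the exact ID, prompt format and recommended
# serving stack. On a 24 GB card such as the RTX 4090, add 4-bit/8-bit quantization
# (e.g. bitsandbytes) or the model will not fit in bf16.
import torch
from transformers import pipeline

MODEL_ID = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"  # assumed repo ID

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # spread layers across available GPU/CPU memory
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."},
]

result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```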
As climate concerns and energy costs increasingly constrain AI deployment, Mistral’s lightweight approach could transition from alternative to industry standard. Why Europe’s AI champion could benefit from growing geopolitical tensions Mistral’s latest release emerges amid growing concerns about Europe’s ability to compete in the global AI race, traditionally dominated by American and Chinese companies. “Not being American or Chinese may now be a help, not a hindrance,” The Economist reported in a recent analysis of Mistral’s position, suggesting that as geopolitical tensions rise, a European alternative may become increasingly attractive for certain markets and governments. Arthur Mensch, Mistral’s CEO, has advocated forcefully for European digital sovereignty. At the Mobile World Congress in Barcelona this month, he urged European telecoms to “get into the hyperscaler game” by investing in data center infrastructure. “We would welcome more domestic effort in making more data centers,” Mensch said, suggesting that “the AI revolution is also bringing opportunities to decentralize the cloud.” The company’s European identity provides significant regulatory advantages. As the EU’s AI Act takes effect, Mistral enters the market with systems designed from inception to align with European values and regulatory expectations. This contrasts sharply with American and Chinese competitors who must retrofit their technologies and business practices to comply with an increasingly complex global regulatory landscape. Beyond text: Mistral’s expanding portfolio of specialized AI models Mistral Small 3.1 joins a rapidly expanding suite of AI products from the company. In February, Mistral released Saba, a model focused specifically on Arabic language and culture, demonstrating an understanding that AI development has concentrated excessively on Western languages and contexts. Earlier this month, the company introduced Mistral OCR, an optical character recognition API that converts PDF documents into AI-ready Markdown files—addressing a critical need for enterprises seeking to make document repositories accessible to AI systems. These specialized tools complement Mistral’s broader portfolio, which includes Mistral Large 2 (their flagship large language model), Pixtral (for multimodal applications), Codestral (for code generation), and “Les Ministraux,” a family of models optimized for edge devices. This diversified portfolio reveals a sophisticated product strategy that balances innovation with market demands. Rather than pursuing a single monolithic model, Mistral creates purpose-built systems for specific contexts and requirements — an approach that may prove more adaptable to the rapidly evolving AI landscape. From Microsoft to military: How strategic partnerships are fueling Mistral’s growth Mistral’s rise has accelerated through strategic partnerships, including a deal with Microsoft that includes distribution of its AI models through Microsoft’s Azure platform and a $16.3 million investment. The company has also secured partnerships with France’s army and job agency, German defense tech startup Helsing, IBM, Orange, and Stellantis, positioning itself as a key player in Europe’s AI ecosystem. In January, Mistral signed a deal with press agency Agence France-Presse (AFP) to allow its chat assistant to query AFP’s entire text archive dating back to 1983, enriching its knowledge base with high-quality journalistic content. These partnerships reveal

Mistral AI drops new open-source model that outperforms GPT-4o Mini with fraction of parameters Read More »

GPUs go biological: BBB unveils Bionode, lab-grown, living neuron compute for AI applications

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Graphics processing units (GPUs), the expensive computer chips made by companies like Nvidia, AMD, and Sima.ai, are no longer the only way to train and deploy artificial intelligence. Biological Black Box (BBB), a Baltimore-founded startup developing a new class of AI hardware, has emerged from stealth with its Bionode platform—a computing system that integrates living, lab-grown neurons with traditional processors. The company, which has been operating quietly while filing patents and refining its technology, believes its biological computing approach — growing new neurons specifically to act as computer chips using donor human stem cells and rat-derived cells — could offer a low-power, adaptive alternative to conventional GPUs. “Over the last 20 years, three independent fields—biology, hardware, and computational tools—have advanced to the point where biological computing is now possible,” said Alex Ksendzovsky, BBB’s co-founder and CEO, in a video call interview with VentureBeat. A member of Nvidia’s Inception incubator, BBB is positioning itself as an advancement and augmentation to the dominant silicon-based AI chips that Nvidia and others produce. By leveraging neurons’ ability to physically rewire themselves, the company aims to reduce energy costs, improve processing efficiency, and accelerate AI model training—challenges that have become increasingly urgent as AI adoption expands. This isn’t sci-fi, despite the incredible premise: BBB’s neural chips are already powering computer vision and LLMs for customers. The company has entered talks with two partners to license its tech for computer vision apps—though the company declined to name its customers and partners specifically, citing confidentiality agreements. It is also accepting inquiries from prospective partners and clients on its website. Blending biology and hardware At the core of BBB’s approach is the Bionode platform, which uses lab-grown neurons wired into computing systems. “We have multiple models that we use,” Ksendzovsky told me. “One of those models is from rat cells. One of those models is from actually human stem cells that are converted into neurons.” The co-founder said that “hundreds of thousands of them” are integrated into a dish containing 4,096 electrodes, which forms the basis of one Bionode chip. He also said they live for over a year before needing to be replaced. The idea is to harness neurons’ natural adaptability for AI processing, creating a hybrid computing system that differs fundamentally from today’s rigid, transistor-based chips. Microscopic image of BBB neural compute cell with information flowing through it. Credit: BBB Ksendzovsky, who has been working with neurons on electrodes since 2005, originally considered using them to predict the stock market. His mentor, Steve Potter, dismissed the idea at the time. “Why aren’t we using neurons to predict the stock market so we can all be rich?” Ksendzovsky recalled asking Potter, who laughed it off as impractical. “At the time, he was right,” Ksendzovsky admitted. Since then, improvements in electrode technology, computational tools, and neuron longevity have made biological computing viable. “The biological network has evolved over hundreds of millions of years into the most efficient computing system ever created,” Ksendzovsky explained. 
This setup offers two immediate advantages:

• More efficient computer vision: Bionode has been tested as a pre-processing layer for AI classification tasks, reducing both inference times and GPU power consumption.

• Accelerated large language model (LLM) training: Unlike GPUs, which require frequent retraining cycles, neurons adapt on the fly. This could significantly reduce the time and energy needed to update LLMs, addressing a key bottleneck in AI scaling.

“One of our biggest breakthroughs is using biological networks to train LLMs more efficiently, reducing the massive energy consumption required today,” Ksendzovsky said.

Building a viable, living GPU with Nvidia’s help

Nvidia’s GPUs have been instrumental in AI’s rapid advancement, but their high energy consumption and increasing cost have raised concerns about scalability. BBB sees an opportunity to introduce a more power-efficient alternative while still operating within Nvidia’s ecosystem. “We don’t see ourselves as direct competitors to Nvidia, at least in the near future,” Ksendzovsky noted. “Biological computing and silicon computing will coexist. We still need GPUs and CPUs to process the data coming from neurons.” In fact, according to the co-founder, “we can use our biological networks to augment and improve silicon-based AI models, making them more accurate and more energy-efficient.”

He argued that the long-term vision for AI hardware is a modular ecosystem in which biological computing, silicon chips and even quantum computing each play a role. “The future of computing will be a modular ecosystem where traditional silicon, biological computing, and quantum computing each play a role based on their strengths,” he said. Although BBB has yet to disclose a commercial launch date, the company is relocating from Baltimore, Maryland, to the Bay Area as it prepares to scale its technology.

The future of hybrid AI processing

While silicon-based GPUs remain the industry standard, BBB’s brain-on-a-chip concept offers a glimpse of a future where AI hardware is no longer limited to transistors and circuits. The ability of neurons to reconfigure themselves dynamically could enable AI systems that are more energy-efficient, adaptive and capable of continuous learning. “We’re already applying biological computing to computer vision. We can encode images into a biological network, let neurons process them, and then decode the neural response to improve classification accuracy,” Ksendzovsky said. Beyond efficiency gains, BBB also believes its biological approach can provide deeper insight into how AI models process data. “We’ve built a closed-loop system that allows neurons to rewire themselves, increasing efficiency and accuracy for AI tasks,” he explained.

Despite the potential, Ksendzovsky acknowledges that ethical considerations will be an ongoing discussion. BBB is already working with ethicists and regulatory experts to ensure its technology is developed responsibly. “We don’t need millions of neurons to process the entire environment like a brain does. We use only what’s necessary for specific tasks, keeping ethical considerations in mind,” he emphasized. BBB is betting that living tissue, not just silicon, could be the key to

GPUs go biological: BBB unveils Bionode, lab-grown, living neuron compute for AI applications Read More »

Visa’s AI edge: How RAG-as-a-service and deep learning are strengthening security and speeding up data retrieval

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Global payments giant Visa operates in 200-plus countries and territories, all with their own unique, complex rules and regulations.  Its client services team must understand those nuances when policy-related questions come up — like ‘are we allowed to process this type of payment in this country?’ — but it’s simply not humanly possible to know all those answers top-of-mind.  This means they’ve typically had to track down relevant information manually — an exhaustive process that can take days depending on how accessible it is.  When generative AI emerged, Visa saw this as a perfect use case, applying retrieval-augmented generation (RAG) to not only pull out information up to 1,000X faster, but cite it back to its sources.  “First of all, it’s better quality results,” Sam Hamilton, Visa’s SVP of data and AI, told VentureBeat. “It’s also latency, right? They can handle a lot more cases than they were able to before.” This is just one way Visa is using gen AI to enhance its operations — supported by a deliberately-built, tiered tech stack — while managing risk and keeping fraud at bay.  Secure ChatGPT: Visa’s protected models November 30, 2022, the day ChatGPT was introduced to the world, will go down in history as a pivotal moment for AI.  Not long thereafter, Hamilton noted, “employees at Visa were all asking, ‘Where is my chatGPT?’ ‘Can I use ChatGPT?’ ‘I don’t have access to ChatGPT.’ ‘I want ChatGPT.’” However, as one of the world’s largest digital payments providers, Visa naturally had concerns about its customers’ sensitive data — specifically, that it remained secure, out of the public domain and wouldn’t be used for future model training.   To meet employee demand while balancing these concerns, Visa introduced what it calls ‘Secure ChatGPT,’ which sits behind a firewall and runs internally on Microsoft Azure. The company can control input and output via data loss prevention (DLP) screening to ensure no sensitive data is leaving Visa’s systems.  “All the hundreds of petabytes of data, everything is encrypted, everything is secure at rest and also in transport,” Hamilton explained. Despite the name, Secure ChatGPT is a multi-model interface offering six different options: GPT (and its various iterations), Mistral, Anthropic’s Claude, Meta’s Llama, Google’s Gemini and IBM’s Granite. Hamilton described this as model-as-a-service or RAG-as-a-service.  “Think of that as a kind of a layer where we can provide an abstraction,” he said. Instead of people building their own vector databases, they can pick and choose the API that best fits their particular use case. For instance, if they just need a little bit of fine-tuning, they’ll typically choose a smaller open-source model like Mistral; by contrast, if they’re looking for more of a sophisticated reasoning model, they can choose something like OpenAI o1 or o3.  This way, people don’t feel constrained or as if they’re missing out on what’s readily available in the public domain (which can lead to ‘shadow AI,’ or the use of unapproved models). Secure GPT is “nothing more than a shell on top of the model,” Hamilton explained. “Now they can pick the model they want on top of that.”  Beyond Secure ChatGPT, all Visa developers are given access to GitHub Copilot to assist in their day to day coding and testing. 
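Hamilton’s description of Secure ChatGPT as “nothing more than a shell on top of the model” suggests a thin routing layer: screen the prompt with DLP rules, pick an approved model for the use case, and screen the output on the way back. The sketch below is purely illustrative and is not Visa’s actual implementation; the model names, the DLP rule and the call_internal_gateway helper are hypothetical stand-ins.

```python
# Illustrative sketch only -- not Visa's Secure ChatGPT code. It mimics the idea of a
# "shell" that screens prompts with DLP rules and routes requests to whichever approved
# model fits the use case. All names here are hypothetical.
import re

APPROVED_MODELS = {
    "light_finetune": "mistral-small",  # small open model for simple tasks
    "reasoning": "openai-o1",           # heavier reasoning model
    "default": "gpt-4o",
}

PAN_PATTERN = re.compile(r"\b\d{13,19}\b")  # toy rule: block card-number-like strings

def dlp_screen(text: str) -> None:
    """Raise if the text looks like it contains sensitive payment data."""
    if PAN_PATTERN.search(text):
        raise ValueError("DLP: text appears to contain a card number; request blocked")

def call_internal_gateway(model: str, prompt: str) -> str:
    """Stub for a firewalled inference endpoint (e.g. an internal Azure-hosted service)."""
    return f"[{model}] response to: {prompt[:40]}..."

def secure_chat(prompt: str, use_case: str = "default") -> str:
    dlp_screen(prompt)                               # screen input before it leaves the perimeter
    model = APPROVED_MODELS.get(use_case, APPROVED_MODELS["default"])
    response = call_internal_gateway(model, prompt)
    dlp_screen(response)                             # screen output as well
    return response

print(secure_chat("Summarize our chargeback policy for Brazil.", use_case="reasoning"))
```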
Developers use Copilot and plugins for various integrated development environments (IDEs) to understand code, enhance code and perform unit testing (determining that code runs as intended), Hamilton noted. “So the code coverage [identifying areas where proper testing is lacking] increases significantly because we have this assistant,” he said.

RAG-as-a-service in action

One of the most potent use cases for Secure ChatGPT is the handling of policy-related questions specific to a given region. “As you can imagine, being in 200 countries with different regulations, documents could be thousands and thousands, hundreds of thousands,” Hamilton noted. “That gets really complicated. You need to nail that, right? And it needs to be an exhaustive search.” Not to mention, local policy changes over time, so Visa’s experts must stay up to date.

Now, with a robust RAG grounded in reliable, current data, Visa’s AI not only quickly retrieves answers but also provides citations and source materials. “It tells you what you can do or cannot do, and says, ‘Here is the document that you want, I’m giving an answer based on that,’” Hamilton explained. “We have narrowed answers with the knowledge that we have built into the RAG.” Normally, the exhaustive process would take “if not hours, days” to draw concrete conclusions. “Now I can get that in five minutes, two minutes,” said Hamilton.

Visa’s four-layer ‘birthday cake’ data infrastructure

These capabilities are the result of Visa’s heavy investment in data infrastructure over the last 10 years: The finance giant has spent around $3 billion on its tech stack, according to Hamilton. He describes that stack as a “birthday cake” with four layers: The foundation is a ‘data-platform-as-a-service’ layer, with ‘data-as-a-service,’ an AI and machine learning (ML) ecosystem, and data services and products layers built on top.

Data-platform-as-a-service essentially serves as an operating system built on a data lake that aggregates “hundreds of petabytes of data,” Hamilton explained. The layer above, data-as-a-service, serves as a sort of “data highway” with multiple lanes going at different speeds to power hundreds of applications. Layer three, the AI/ML ecosystem, is where Visa continuously tests models to ensure they are performing the way they should be and are not susceptible to bias and drift. Finally, the fourth layer is where Visa builds products for employees and clients.

Blocking $40 billion in fraud

As a trusted payment provider, Visa counts fraud prevention among its top priorities, and AI is playing an increased role here as well. Hamilton explained that the company has invested more than $10 billion to help reduce fraud and increase network security. Ultimately, this helped the company block $40 billion in attempted fraud in 2024 alone. For instance, a new

Visa’s AI edge: How RAG-as-a-service and deep learning are strengthening security and speeding up data retrieval Read More »

What you need to know about Manus, the new AI agentic system from China hailed as a second ‘DeepSeek moment’

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Stop me if you’ve heard this one before: A little-known Chinese startup is making waves globally for an impressive new AI product. No, we’re not talking about DeepSeek-R1, the AI reasoning model that made waves among western AI circles earlier this year. Instead, the hot new product du jour is Manus, a new AI multipurpose agent — that is, more than an AI model, it’s an interface for controlling multiple models that can autonomously complete complicated tasks like generating reports or running dozens of social media accounts on the user’s behalf. If it sounds similar to the Deep Research modes offered by OpenAI, Google and others, as well as OpenAI’s Operator agent and Anthropic’s Computer Use mode, (the latter two of which can, like Manus, take control of a user’s computer or programs on it, move cursors and type to perform actions within software), then congrats — you’ve understood what it aims to offer. But what do action-oriented leaders and decision makers within enterprises here in the west and abroad — such as CTO, product managers, IT team leaders and more — need to know about Manus and the capabilities it offers? Read on to find out. What is Manus and who’s behind it? Manus AI was officially announced on March 5 on social network X, with a post from its builder Butterfly Effect describing it as “the first general AI agent” that autonomously executes complex tasks rather than just generating ideas. According to South China Morning Post (SCMP), Butterfly Effect has offices in Beijing and Wuhan. The company reportedly has only a few dozen employees, but has rapidly gained attention in China’s AI landscape. The founding team includes entrepreneurs and experienced product managers, led by Xiao Hong, a 33-year-old serial entrepreneur and 2015 graduate of Wuhan’s Huazhong University of Science and Technology. Manus team. Credit: Optics Valley of China/Facebook Xiao previously built WeChat-based applications that were acquired by larger companies and later launched Monica.ai, an AI assistant available as a browser extension and mobile app. On its website, Manus explains that its name comes from the Latin word for “hand,” a nod to the fact that users can rely on it to perform actions for them, or, in my words, to “lend them a hand.” How does Manus AI work? Manus AI is designed as a multi-agent system, meaning it combines several AI models to handle tasks independently. Unlike AI chatbots that assist users by providing information, Manus can research, analyze data, generate reports, automate workflows and even write and deploy code. According to X posts by Ji Yichao, co-founder and chief scientist of Manus AI, the system is built on Anthropic’s Claude 3.5 Sonnet — a nine-month-old AI model — and fine-tuned versions of Alibaba’s Qwen models. The team is currently testing upgrading Manus to Anthropic’s newest and most performant model, Claude 3.7, which is expected to further enhance its reasoning and execution capabilities. Manus AI operates asynchronously, meaning users can assign tasks and walk away while it completes them autonomously. It is currently in private beta, with access granted through invitation codes. How does Manus AI stack up against U.S.-based competition? One of the biggest reasons Manus AI has gained traction is its strong benchmark performance: It beat U.S. 
firm OpenAI’s own o3-powered Deep Research agent and the “previous state-of-the-art,” according to a graph posted on the official Manus website. This claim, along with real-world tests, has led some AI power users and early adopters to the conclusion that Manus may be one of the most capable autonomous AI agents available today. Beyond benchmarks, Manus has already proven itself on freelance platforms like Upwork and Fiverr and in Kaggle machine learning (ML) challenges, successfully executing complex real-world tasks. AI influencers celebrate Manus’s arrival and impressive performance Conversation about Manus in media and AI circles took off late last week when users on X noted that some people were using it to automate the management of up to 50 social accounts at one time, in realtime, showing off its ability to create fleets of engagement that businesses could use for reviews. In addition, although this hasn’t yet been proven for Manus, the same technology could presumably be used for all kinds of marketing and influence campaigns, even political propaganda or disinformation. But for the most part, AI power users and influencers in the west were largely impressed and celebrated Manus’s arrival — saying they were awed by initial tests once they received scarce beta invites, or observed the work of others with access to the tool. Rowan Cheung, founder of The Rundown AI newsletter, described Manus AI’s launch as a potential turning point for AI agents and said “China’s second DeepSeek moment is here” in a post on his LinkedIn account. “This AI agent called ‘Manus’ is going crazy viral in China right now… It’s like Deep Research + Operator + Claude Computer combined, and it’s REALLY good.” Cheung personally tested Manus and found that it: Created and deployed a biography website about himself, with 100% accuracy and real-time data retrieval. Found top rental spots in San Francisco based on crime rates, AI industry presence and entrepreneurship density. Developed a full AI course, generating eight chapters of content, including tools, use cases and prompts. He received 500 invite codes from the Manus team and has been doling them out to his subscribers and readers. Former Googler and AI-focused YouTuber Bilawal Sidhu shared a hands-on video review, calling Manus “the closest thing I have seen to an autonomous AI agent.” “It’s like you’re standing over the shoulder of somebody using a computer… asking them what to do at the highest level, and it basically does it for you.” Sidhu tested Manus on various tasks, including: Researching locations: Scanning Google Maps and news sources to recommend the best places based on regulations, accessibility and safety. Developing video applications: Automating video

What you need to know about Manus, the new AI agentic system from China hailed as a second ‘DeepSeek moment’ Read More »

Alibaba’s new open source model QwQ-32B matches DeepSeek-R1 with way smaller compute requirements

Qwen Team — a division of Chinese e-commerce giant Alibaba developing its growing family of open-source Qwen large language models (LLMs) — has introduced QwQ-32B, a new 32-billion-parameter reasoning model designed to improve performance on complex problem-solving tasks through reinforcement learning (RL). The model is available as open-weight on Hugging Face and on ModelScope under an Apache 2.0 license. This means it’s available for commercial and research uses, so enterprises can employ it immediately to power their products and applications (even ones they charge customers to use). It can also be accessed by individual users via Qwen Chat.

Qwen-with-Questions was Alibaba’s answer to OpenAI’s original reasoning model o1

QwQ, short for Qwen-with-Questions, was first introduced by Alibaba in November 2024 as an open-source reasoning model aimed at competing with OpenAI’s o1-preview. At launch, the model was designed to enhance logical reasoning and planning by reviewing and refining its own responses during inference, a technique that made it particularly effective in math and coding tasks. The initial version of QwQ (called simply “QwQ”) also featured 32 billion parameters and a 32,000-token context length. Alibaba highlighted its ability to outperform o1-preview in mathematical benchmarks like AIME and MATH, as well as scientific reasoning tasks such as GPQA.

Despite its strengths, QwQ’s early iterations struggled with programming benchmarks like LiveCodeBench, where OpenAI’s models maintained an edge. Additionally, as with many emerging reasoning models, QwQ faced challenges such as language mixing and occasional circular reasoning loops. However, Alibaba’s decision to release the model under an Apache 2.0 license ensured that developers and enterprises could freely adapt and commercialize it, distinguishing it from proprietary alternatives like OpenAI’s o1.

Since QwQ’s initial release, the AI landscape has evolved rapidly. The limitations of traditional LLMs have become more apparent, with scaling laws yielding diminishing returns in performance improvements. This shift has fueled interest in large reasoning models (LRMs) — a new category of AI systems that use inference-time reasoning and self-reflection to enhance accuracy. These include OpenAI’s o3 series and the massively successful DeepSeek-R1 from rival Chinese lab DeepSeek, an offshoot of Chinese quantitative analysis firm High-Flyer Capital Management. A new report from web traffic analytics and research firm SimilarWeb found that since the launch of R1 in January 2025, DeepSeek has rocketed up the charts to become the most-visited AI model-providing website behind OpenAI.

Credit: SimilarWeb, Global Sector Trends on Generative AI

QwQ-32B, Alibaba’s latest iteration, builds on these advancements by integrating RL and structured self-questioning, positioning it as a serious competitor in the growing field of reasoning-focused AI. The context length of the new model has been extended to 131,000 tokens as well — similar to the 128,000 of OpenAI’s models and many others, though Google Gemini 2.0’s context remains superior at 2 million tokens. (Recall that context refers to the number of tokens the LLM can input/output in a single interaction, with a higher token count meaning more information; 131,000 tokens is equivalent to around a 300-page book.)
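Because the weights are open under Apache 2.0, a team can pull QwQ-32B straight from Hugging Face and run it with standard tooling. The sketch below is a minimal example using transformers; the repository ID is an assumption based on Qwen’s naming convention, and generation settings should follow the official model card. In bf16 a 32-billion-parameter model needs roughly 64 GB of memory, so smaller cards require quantization or multi-GPU sharding.

```python
# Minimal sketch of loading an open-weight QwQ-32B checkpoint and running one prompt.
# The repo ID is assumed; see the Hugging Face model card for the exact ID and the
# recommended sampling settings for reasoning traces.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/QwQ-32B"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)  # reasoning traces can be long
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```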
Scaling up performance with multi-stage reinforcement learning

Traditional instruction-tuned models often struggle with difficult reasoning tasks, but the Qwen Team’s research suggests that RL can significantly improve a model’s ability to solve complex problems. QwQ-32B builds on this idea by implementing a multi-stage RL training approach to enhance mathematical reasoning, coding proficiency and general problem-solving.

The model has been benchmarked against leading alternatives such as DeepSeek-R1, o1-mini and DeepSeek-R1-Distill-Qwen-32B, demonstrating competitive results despite having fewer parameters than some of these models. For example, while DeepSeek-R1 operates with 671 billion parameters (with 37 billion activated), QwQ-32B achieves comparable performance with a much smaller footprint — typically requiring 24 GB of vRAM on a GPU (Nvidia’s H100s have 80 GB) compared to more than 1,500 GB of vRAM for running the full DeepSeek-R1 (16 Nvidia A100 GPUs) — highlighting the efficiency of Qwen’s RL approach.

QwQ-32B follows a causal language model architecture and includes several optimizations:

• 64 transformer layers with RoPE, SwiGLU, RMSNorm and attention QKV bias;

• Grouped-query attention (GQA) with 40 attention heads for queries and 8 for key-value pairs;

• An extended context length of 131,072 tokens, allowing for better handling of long-sequence inputs;

• Multi-stage training including pretraining, supervised fine-tuning and RL.

The RL process for QwQ-32B was executed in two phases:

• Math and coding focus: The model was trained using an accuracy verifier for mathematical reasoning and a code execution server for coding tasks. This approach ensured that generated answers were validated for correctness before being reinforced.

• General capability enhancement: In a second phase, the model received reward-based training using general reward models and rule-based verifiers. This stage improved instruction following, human alignment and agent reasoning without compromising its math and coding capabilities.

What it means for enterprise decision-makers

For enterprise leaders — including CEOs, CTOs, IT leaders, team managers and AI application developers — QwQ-32B represents a potential shift in how AI can support business decision-making and technical innovation. With its RL-driven reasoning capabilities, the model can provide more accurate, structured and context-aware insights, making it valuable for use cases such as automated data analysis, strategic planning, software development and intelligent automation.

Companies looking to deploy AI solutions for complex problem-solving, coding assistance, financial modeling or customer service automation may find QwQ-32B’s efficiency an attractive option. Additionally, its open-weight availability allows organizations to fine-tune and customize the model for domain-specific applications without proprietary restrictions, making it a flexible choice for enterprise AI strategies. The fact that it comes from a Chinese e-commerce giant may raise security and bias concerns for some non-Chinese users, especially when using the Qwen Chat interface. But as with DeepSeek-R1, the fact that the model is available on Hugging Face for download, offline usage and fine-tuning or retraining suggests that these concerns can be overcome fairly easily. And it is a viable alternative to DeepSeek-R1.

Early reactions from AI power users and influencers

The release

Alibaba’s new open source model QwQ-32B matches DeepSeek-R1 with way smaller compute requirements Read More »

Beyond RAG: SEARCH-R1 integrates search engines directly into reasoning models

Large language models (LLMs) have made remarkable advances in reasoning capabilities. However, their ability to correctly reference and use external data — information they weren’t trained on — in conjunction with reasoning has largely lagged behind. This is an issue especially when using LLMs in dynamic, information-intensive scenarios that demand up-to-date data from search engines.

But an improvement has arrived: SEARCH-R1, a technique introduced in a paper by researchers at the University of Illinois at Urbana-Champaign and the University of Massachusetts Amherst, trains LLMs to generate search queries and seamlessly integrate search engine retrieval into their reasoning. With enterprises seeking ways to integrate these new models into their applications, techniques such as SEARCH-R1 promise to unlock new reasoning capabilities that rely on external data sources.

The challenge of integrating search with LLMs

Search engines are crucial for providing LLM applications with up-to-date, external knowledge. The two main methods for integrating search engines with LLMs are retrieval-augmented generation (RAG) and tool use, implemented through prompt engineering or model fine-tuning. However, both methods have limitations that make them unsuitable for reasoning models. RAG often struggles with retrieval inaccuracies and lacks the ability to perform multi-turn, multi-query retrieval, which is essential for reasoning tasks. Prompting-based tool use often struggles with generalization, while training-based approaches require extensive, annotated datasets of search-and-reasoning interactions, which are difficult to produce at scale. (In our own experiments with reasoning models, we found that information retrieval remains one of the key challenges.)

SEARCH-R1

SEARCH-R1 enables LLMs to interact with search engines during their reasoning process, as opposed to having a separate retrieval stage. SEARCH-R1 defines the search engine as part of the LLM’s environment, enabling the model to integrate its token generation with search engine results seamlessly.

The researchers designed SEARCH-R1 to support iterative reasoning and search. The model is trained to generate separate sets of tokens for thinking, search, information and answer segments. This means that during its reasoning process (marked by <think></think> tags), if the model determines that it needs external information, it generates a <search></search> sequence that contains the search query. The query is then passed on to a search engine and the results are inserted into the context window in an <information></information> segment. The model then continues to reason with the added context and, when ready, generates the results in an <answer></answer> segment. This structure allows the model to invoke the search engine multiple times as it reasons about the problem and obtains new information (see example below).

Example of LLM reasoning with SEARCH-R1 (source: arXiv)

Reinforcement learning

Training LLMs to interleave search queries with their reasoning chain is challenging. To simplify the process, the researchers designed SEARCH-R1 to train the model through pure reinforcement learning (RL), where the model is left to explore the use of reasoning and search tools without guidance from human-generated data.
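To make that control flow concrete, here is a minimal, illustrative loop in the spirit of the tag structure described above. It is not the researchers’ released implementation (their code is on GitHub); the generate() and web_search() functions are toy stand-ins for a trained model and a search backend.

```python
# Illustrative sketch of the interleaved generation-and-retrieval loop described above.
# Not the SEARCH-R1 release; generate() and web_search() are toy stand-ins.
import re

def generate(context: str) -> str:
    """Toy stand-in for an LLM call. A real implementation would continue the prompt
    and stop after the next </search> or </answer> tag."""
    if "<information>" in context:
        return " The retrieved passage answers the question.</think>\n<answer>Paris</answer>"
    return " I need to look this up.</think>\n<search>capital of France</search>"

def web_search(query: str, k: int = 3) -> str:
    """Toy stand-in for a retrieval call returning the top-k passages as plain text."""
    return "Paris is the capital and largest city of France."

def search_r1_answer(question: str, max_turns: int = 4) -> str:
    context = f"Question: {question}\n<think>"
    for _ in range(max_turns):
        continuation = generate(context)
        context += continuation
        # If the model asked for a search, run it, splice the results back in as an
        # <information> segment, and let the model keep reasoning.
        query = re.search(r"<search>(.*?)</search>", continuation, re.S)
        if query:
            context += f"\n<information>{web_search(query.group(1).strip())}</information>\n<think>"
            continue
        # Otherwise, look for a final <answer> segment and return it.
        answer = re.search(r"<answer>(.*?)</answer>", continuation, re.S)
        if answer:
            return answer.group(1).strip()
    return "no answer produced within the turn limit"

print(search_r1_answer("What is the capital of France?"))  # -> Paris
```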
SEARCH-R1 uses an “outcome-based reward model,” in which the model is only evaluated based on the correctness of the final response. This eliminates the need for creating complex reward models that verify the model’s reasoning process. This is the same approach used in DeepSeek-R1-Zero, where the model was given a task and judged only on the outcome. The use of pure RL obviates the need to create large datasets of manually annotated examples (supervised fine-tuning). “SEARCH-R1 can be viewed as an extension of DeepSeek-R1, which primarily focuses on parametric reasoning by introducing search-augmented RL training for enhanced retrieval-driven decision-making,” the researchers write in their paper.

SEARCH-R1 in action

The researchers tested SEARCH-R1 by fine-tuning the base and instruct versions of Qwen-2.5 and Llama-3.2 and evaluating them on seven benchmarks encompassing a diverse range of reasoning tasks requiring single-turn and multi-hop search. They compared SEARCH-R1 against different baselines: direct inference with chain-of-thought (CoT) reasoning, inference with RAG, and supervised fine-tuning for tool use. SEARCH-R1 consistently outperforms baseline methods by a fair margin. It also outperforms reasoning models trained on RL but without search retrieval. “This aligns with expectations, as incorporating search into LLM reasoning provides access to relevant external knowledge, improving overall performance,” the researchers write. SEARCH-R1 is also effective for different model families and both base and instruction-tuned variants, suggesting that RL with outcome-based rewards can be useful beyond pure reasoning scenarios. The researchers have released the code for SEARCH-R1 on GitHub.

SEARCH-R1’s ability to autonomously generate search queries and integrate real-time information into reasoning can have significant implications for enterprise applications. It can enhance the accuracy and reliability of LLM-driven systems in areas such as customer support, knowledge management and data analysis. By enabling LLMs to dynamically adapt to changing information, SEARCH-R1 can help enterprises build more intelligent and responsive AI solutions. This capability can be very helpful for applications that require access to constantly changing data and that require multiple steps to find an answer. It also suggests that we have yet to explore the full potential of the new reinforcement learning paradigm that has emerged since the release of DeepSeek-R1.
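For a concrete picture of the outcome-based reward described earlier, the sketch below scores a rollout solely on whether its final <answer> segment matches the gold answer, with no credit for intermediate reasoning or search steps. The exact-match normalization is illustrative rather than the paper’s precise scoring rule.

```python
# Minimal sketch of an outcome-based reward: only the final <answer> segment is scored.
import re

def outcome_reward(rollout: str, gold_answer: str) -> float:
    match = re.search(r"<answer>(.*?)</answer>", rollout, re.S)
    if not match:
        return 0.0  # malformed output earns nothing
    prediction = match.group(1).strip().lower()
    return 1.0 if prediction == gold_answer.strip().lower() else 0.0

print(outcome_reward("<think>...</think><answer>Paris</answer>", "Paris"))  # 1.0
```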

Beyond RAG: SEARCH-R1 integrates search engines directly into reasoning models Read More »

Gemini 2.0 Flash Thinking now has memory and Google apps integration

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More A few months ago, Google added access to reasoning modes to its Gemini AI chatbot. Now, it’s expanded the reach of Gemini 2.0 Flash Thinking Experimental to other features of the chat experience as it doubles down on context-filled responses.  The company announced it’s making Gemini more personal, connected and helpful. It’s also making its version of Deep Research, which searches the Internet for information, more widely available to Gemini users.  Deep Research will now be backed by Gemini 2.0 Flash Thinking Experimental. Google said in a blog post that, by adding the power of Flash Thinking, Deep Research can now give users “a real-time look into how it’s going about solving your research tasks.” The company said this combination will improve the quality of reports done through Deep Research by providing more details and insights.  Before this update, Gemini 1.5 Pro powered Deep Research and was only available on the $20-a-month Google One AI Premium plan. However, VentureBeat’s Carl Franzen found even this now less-powerful version to be a helpful research assistant.  A more personal Gemini Gemini 2.0 Flash Thinking Experimental will also power a new capability called personalization.  Personalization is precisely that: Responses will be more tailored to the user by referencing previous conversations or searches. To enable this level of personalization, Gemini connects to users’ Google apps and services, including Search and Photos. Google emphasized that it will use information from your Google apps only with permission.  “In the coming months, Gemini will expand its ability to understand you by connecting with other Google apps and services, including Photos and YouTube,” Dave Citron, senior director, product management, Gemini app, said in a blog post. “This will enable Gemini to provide more personalized insights, drawing from a broader understanding of your activities and preferences to deliver responses that truly resonate with you.” Since Gemini 2.0 Flash Thinking Experimental is built into the personalization feature, users can see an outline of which data sources the model is tapping to answer queries or to complete requests.  Gemini Advanced users can toggle other preferences they want the chatbot to remember, such as instructing it to refer to past conversations or reminding it of dietary restrictions. This allows Gemini to offer more natural and relevant responses. Of course, Google is not the only company that recognizes the importance of personalized and relevant responses. In November, Anthropic launched its Styles feature, which allows people to customize how Claude speaks to them.  More connected apps As personalization requires access to more data about the user, think of it as RAG, but for a Gemini user rather than an entire organization, with Google connecting more of its services to Gemini 2.0 Flash Thinking Experimental.  The model can tap apps like Calendar, Notes, Tasks and Photos.  “With this thinking model, Gemini can better tackle complex requests like prompts that involve multiple apps, because the new model can better reason over the overall request, break it down into distinct steps, and assess its own progress as it goes,” Citron said.  Google said that in a couple of weeks, Gemini will be able to look at photos in Google Photos and answer questions based on users’ images. 
It can create travel itineraries based on pictures from recent trips, and recall information like the expiration date for a driver’s license, or whether you happen to have taken a photo of milk in the store. Integrating applications to provide more context to chatbot responses has been a big trend for AI companies. In the enterprise space, this has translated to giving chatbots access to developer environments or emails. ChatGPT can open most IDEs so developers can bring their code from VSCode and query ChatGPT about it. Google’s coding helper, Code Assist, also connects to IDEs. Google’s increasing app and service integration and its personalization of Gemini underscore the importance of context and data in making these chatbots more useful, even if the query is just asking for a restaurant recommendation.

Gemini 2.0 Flash Thinking now has memory and Google apps integration Read More »

ServiceNow expands AI offerings with pre-built agents, targeting broader enterprise adoption

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More ServiceNow believes that more areas within an enterprise can benefit from agents, and as it upgrades its agent platform and makes acquisitions, the company plans on doubling down on agents even more.  ServiceNow announced the acquisition of Moveworks on Tuesday along with new agent capabilities. It also made its orchestration platform generally available and announced plans to integrate agentic enterprise search.  Amit Zavery, ServiceNow CPO, told VentureBeat that the company sees a lot of potential in the infrastructure around AI.  “There are a lot of things to do in this space and [this new] release has a lot of capabilities around AI agents,” said Zavery. “It’s core elements of a platform, and that’s something we’re going to continue doubling down [on] and investing in.”  ServiceNow has added new agents to its library of pre-built AI agents, including a new security operations (SecOps) agent, a set of autonomous change management agents, and a network test and repair agent. The company also updated its AI Agent Orchestrator and AI Agent Studio, adding new features and making them generally available. Wider agent reach ServiceNow offers pre-built agents for specific teams and also helps customers build their own. The company first announced its library of AI agents in September and has continually added to that library.  The SecOps agent ideally would “help eliminate repetitive tasks and empower SecOps teams to focus on quickly stopping real threats,” said Zavery. The autonomous change management agents will generate custom implementation and tests, while the proactive network test and repair AI agent troubleshoots, detects and diagnoses network issues before performance is impacted. These new agents automate more areas of an organization, and Zavery said ServiceNow is “seeing a lot of new use cases emerge as we see that value for customers.” The company hopes that agents will help employees interact with them differently.  Zavery said that its acquisition of Moveworks and its updated agentic platform allows ServiceNow to expand agentic capabilities to enterprise search, using agents to find any information an employee needs about their organization.  “This is where Moveworks helps us get into and double down on our AI investment going forward,” said Zavery. “We are the workflow leader and have a lot of data capabilities. We’ve been doing a lot of AI-related and very capable use cases delivery there. So that all remains intact and it just accelerates some of those things, especially around AI.” Enhanced orchestration  ServiceNow has also opened up its AI Agent Orchestrator and AI Agent Studio to general availability. The platforms help organizations build agents, onboard and coordinate them. Users can now refer to an analytics dashboard to visualize AI agent usage, quality and value. In a demo to reporters, ServiceNow VP of AI and innovation Dorit Zilbershot said that administrators will be able to clearly see how many tasks the agent is closing and determine if it is working according to plan.  The AI Agent Studio now has guided instructions, which the company says make it “easier than ever to design and configure new AI agents using natural language descriptions.” ServiceNow has long advocated for agents to be part of an organization’s workflow. 
It also supports “invisible agents” — where employees interact with agents but don’t necessarily know what those agents are doing to fulfill their tasks.

ServiceNow expands AI offerings with pre-built agents, targeting broader enterprise adoption Read More »

Nvidia’s Cosmos-Transfer1 makes robot training freakishly realistic—and that changes everything

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Nvidia has released Cosmos-Transfer1, an innovative AI model that enables developers to create highly realistic simulations for training robots and autonomous vehicles. Available now on Hugging Face, the model addresses a persistent challenge in physical AI development: bridging the gap between simulated training environments and real-world applications. “We introduce Cosmos-Transfer1, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge,” Nvidia researchers state in a paper published alongside the release. “This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real.” Unlike previous simulation models, Cosmos-Transfer1 introduces an adaptive multimodal control system that allows developers to weight different visual inputs—such as depth information or object boundaries—differently across various parts of a scene. This breakthrough enables more nuanced control over generated environments, significantly improving their realism and utility. How adaptive multimodal control transforms AI simulation technology Traditional approaches to training physical AI systems involve either collecting massive amounts of real-world data — a costly and time-consuming process — or using simulated environments that often lack the complexity and variability of the real world. Cosmos-Transfer1 addresses this dilemma by allowing developers to use multimodal inputs (like blurred visuals, edge detection, depth maps, and segmentation) to generate photorealistic simulations that preserve crucial aspects of the original scene while adding natural variations. “In the design, the spatial conditional scheme is adaptive and customizable,” the researchers explain. “It allows weighting different conditional inputs differently at different spatial locations.” This capability proves particularly valuable in robotics, where a developer might want to maintain precise control over how a robotic arm appears and moves while allowing more creative freedom in generating diverse background environments. For autonomous vehicles, it enables the preservation of road layout and traffic patterns while varying weather conditions, lighting, or urban settings. Physical AI applications that could transform robotics and autonomous driving Dr. Ming-Yu Liu, one of the core contributors to the project, explained why this technology matters for industry applications. “A policy model guides a physical AI system’s behavior, ensuring that the system operates with safety and in accordance with its goals,” Liu and his colleagues note in the paper. “Cosmos-Transfer1 can be post-trained into policy models to generate actions, saving the cost, time, and data needs of manual policy training.” The technology has already demonstrated its value in robotics simulation testing. When using Cosmos-Transfer1 to enhance simulated robotics data, Nvidia researchers found the model significantly improves photorealism by “adding more scene details and complex shading and natural illumination” while preserving the physical dynamics of robot movement. 
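Conceptually, the adaptive scheme amounts to giving each control modality its own spatial weight map before the signals are combined, so a developer can lock down the region around a robot arm with depth and edge control while leaving the background loosely constrained. The NumPy sketch below illustrates the idea only; it is not Nvidia’s Cosmos-Transfer1 implementation, and the weight values and region are arbitrary stand-ins.

```python
# Conceptual sketch of spatially weighted multimodal conditioning (illustrative only,
# not the Cosmos-Transfer1 API). Each control map gets a per-pixel weight; the weighted
# blend is what a conditional generator would consume alongside the text prompt.
import numpy as np

H, W = 64, 64
controls = {
    "depth": np.random.rand(H, W),          # stand-ins for real control maps
    "edge": np.random.rand(H, W),
    "segmentation": np.random.rand(H, W),
}

# Per-modality spatial weights: emphasize depth/edge inside a "robot arm" box,
# rely mostly on segmentation elsewhere.
weights = {name: np.full((H, W), 0.2) for name in controls}
arm_region = (slice(16, 48), slice(16, 48))
weights["depth"][arm_region] = 0.8
weights["edge"][arm_region] = 0.8
weights["segmentation"][arm_region] = 0.1

# Normalize per pixel and blend the control signals.
total = sum(weights.values())
blended = sum(controls[name] * (weights[name] / total) for name in controls)
print(blended.shape)  # (64, 64) spatially varying conditioning map
```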
For autonomous vehicle development, the model enables developers to “maximize the utility of real-world edge cases,” helping vehicles learn to handle rare but critical situations without needing to encounter them on actual roads.

Inside Nvidia’s strategic AI ecosystem for physical world applications

Cosmos-Transfer1 represents just one component of Nvidia’s broader Cosmos platform, a suite of world foundation models (WFMs) designed specifically for physical AI development. The platform includes Cosmos-Predict1 for general-purpose world generation and Cosmos-Reason1 for physical common sense reasoning. “Nvidia Cosmos is a developer-first world foundation model platform designed to help Physical AI developers build their Physical AI systems better and faster,” the company states on its GitHub repository. The platform includes pre-trained models under the Nvidia Open Model License and training scripts under the Apache 2.0 license. This positions Nvidia to capitalize on the growing market for AI tools that can accelerate autonomous system development, particularly as industries from manufacturing to transportation invest heavily in robotics and autonomous technology.

Real-time generation: How Nvidia’s hardware powers next-gen AI simulation

Nvidia also demonstrated Cosmos-Transfer1 running in real time on its latest hardware. “We further demonstrate an inference scaling strategy to achieve real-time world generation with an Nvidia GB200 NVL72 rack,” the researchers note. The team achieved an approximately 40x speedup when scaling from one to 64 GPUs, enabling the generation of 5 seconds of high-quality video in just 4.2 seconds — effectively real-time throughput. This performance at scale addresses another critical industry challenge: simulation speed. Fast, realistic simulation enables more rapid testing and iteration cycles, accelerating the development of autonomous systems.

Open-source innovation: Democratizing advanced AI for developers worldwide

Nvidia’s decision to publish both the Cosmos-Transfer1 model and its underlying code on GitHub removes barriers for developers worldwide. This public release gives smaller teams and independent researchers access to simulation technology that previously required substantial resources. The move fits into Nvidia’s broader strategy of building robust developer communities around its hardware and software offerings. By putting these tools in more hands, the company expands its influence while potentially accelerating progress in physical AI development.

For robotics and autonomous vehicle engineers, these newly available tools could shorten development cycles through more efficient training environments. The practical impact may be felt first in testing phases, where developers can expose systems to a wider range of scenarios before real-world deployment. While open source makes the technology available, putting it to effective use still requires expertise and computational resources — a reminder that in AI development, the code itself is just the beginning of the story.

Nvidia’s Cosmos-Transfer1 makes robot training freakishly realistic—and that changes everything Read More »

Successful AI adoption comes down to one thing: Smarter, right-size compute

Presented by AMD

As AI adoption accelerates, businesses are encountering compute bottlenecks that extend beyond just raw processing power. The challenge is not only about having more compute; it’s about having smarter, more efficient compute, customized to an organization’s needs, with the ability to scale alongside AI innovation. AI models are growing in size and complexity, requiring architectures that can process massive datasets, support continuous learning and provide the efficiency needed for real-time decision-making. From AI training and inference in hyperscale data centers to AI-driven automation in enterprises, the ability to deploy and scale compute infrastructure seamlessly is now a competitive differentiator.

“It’s a tall order. Organizations are struggling to stay up-to-date with AI compute demands, scale AI workloads efficiently and optimize their infrastructure,” says Mahesh Balasubramanian, director of data center GPU product marketing at AMD. “Every company we talk to wants to be at the forefront of AI adoption and business transformation. The challenge is that they’ve never before faced such a massive, era-defining technology.”

Launching a nimble AI strategy

Where to start? Modernizing existing data centers is an essential first step to removing bottlenecks to AI innovation. This frees up space and power, improves efficiency and greens the data center, all of which helps the organization stay nimble enough to adapt to the changing AI environment. “You can upgrade your existing data center from a three-generation-old Intel Xeon 8280 CPU to the latest generation of AMD EPYC CPU and save up to 68% on energy while using 87% fewer servers,” Balasubramanian says. “It’s not just a smart and efficient way of upgrading an existing data center; it opens up options for the next steps in upgrading a company’s compute power.”

And as an organization evolves its AI strategy, it’s critical to have a plan for fast-growing hardware and computational requirements. It’s a complex undertaking, whether you’re working with a single model underlying organizational processes, customized models for each department or agentic AI. “If you understand your foundational situation — where AI will be deployed, and what infrastructure is already available from a space, power, efficiency and cost perspective — you have a huge number of robust technology solutions to solve these problems,” Balasubramanian says.

Beyond one-size-fits-all compute

A common perception in the enterprise is that AI solutions require a massive investment right out of the gate, across the board, on hardware, software and services. That has proven to be one of the most common barriers to adoption — and an easy one to overcome, Balasubramanian says. The AI journey kicks off with a look at existing tech and upgrades to the data center; from there, an organization can start scaling for the future by choosing technology that can be right-sized for today’s problems and tomorrow’s goals. “Rather than spending everything on one specific type of product or solution, you can now right-size the fit and solution for the organizations you have,” Balasubramanian says. “AMD is unique in that we have a broad set of solutions to meet bespoke requirements. We have solutions from cloud to data center, edge solutions, client and network solutions and more.
This broad portfolio lets us provide the best performance across all solutions, and lets us offer in-depth guidance to enterprises looking for the solution that fits their needs.”

That AI portfolio is designed to tackle the most demanding AI workloads — from foundation model training to edge inference. The latest AMD Instinct MI325X GPUs, powered by HBM3e memory and the CDNA architecture, deliver superior performance for generative AI workloads, providing up to 1.3X better inference performance compared to competing solutions. AMD EPYC CPUs continue to set industry standards, delivering unmatched core density, energy efficiency and the high memory bandwidth critical for AI compute scalability. Collaboration with a wide range of industry leaders — including OEMs like Dell, Supermicro, Lenovo and HPE, network vendors like Broadcom and Marvell, and switching vendors like Arista and Cisco — maximizes the modularity of these data center solutions. They scale seamlessly from two or four servers to thousands, all built with next-gen Ethernet-based AI networking and backed by industry-leading technology and expertise.

Why open-source software is critical for AI advancement

While both hardware and software are crucial for tackling today’s AI challenges, open-source software will drive true innovation. “We believe there’s no one company in this world that has the answers for every problem,” Balasubramanian says. “The best way to solve the world’s problems with AI is to have a united front, and to have a united front means having an open software stack that everyone can collaborate on. That’s a key part of our vision.”

AMD’s open-source software stack, ROCm, is widely adopted by industry leaders like OpenAI, Microsoft, Meta, Oracle and more. Meta runs its largest and most complicated model on AMD Instinct GPUs. ROCm comes with standard support for PyTorch, the largest AI framework, and supports more than a million models from Hugging Face’s model repository, enabling customers to begin their journey with a seamless out-of-the-box experience on ROCm software and Instinct GPUs. AMD works with vendors like PyTorch, TensorFlow, JAX, OpenAI’s Triton and others to ensure that no matter the size of the model, small or large, applications and use cases can scale anywhere from a single GPU all the way to tens of thousands of GPUs — just as its AI hardware can scale to match any size workload.

ROCm’s deep ecosystem engagement, with continuous integration and continuous development, ensures that new AI functions and features can be securely integrated into the stack. These features go through an automated testing and development process to ensure each one fits in, is robust, doesn’t break anything and can provide support right away to the software developers and data scientists using it. And as AI evolves, ROCm is pivoting to offer new capabilities rather than locking an organization into one particular vendor that might not offer the flexibility necessary to grow. “We want to give organizations an open-source software stack that is completely open

Successful AI adoption comes down to one thing: Smarter, right-size compute Read More »