VentureBeat

You.com unveils AI research agent that processes 400+ sources at once

You.com launched a new artificial intelligence research tool today that promises to transform how businesses conduct market research by analyzing hundreds of sources simultaneously and producing comprehensive reports in minutes instead of weeks. The tool, called the Advanced Research & Insights agent (ARI), targets the $250-billion management consulting industry by automating the labor-intensive research process that typically requires teams of analysts to pore over documents for days or weeks.

“The entire world of knowledge work is changing, and that’s a trillion-dollar-plus industry,” said Richard Socher, cofounder and CEO of You.com, in an interview with VentureBeat. “When every employee has instant access to comprehensive, validated insights that previously required teams of consultants and weeks of work, it changes the speed and quality of business decision-making.”

10x more sources: How ARI’s technical breakthrough powers enterprise research

What sets ARI apart from existing AI research tools is its ability to process and analyze more than 400 sources simultaneously — roughly 10 times the number that competing systems can handle, according to the company. This capability comes from a novel approach to managing context and compressing information.

“The way that we’re able to find that many sources is [that] we’re taking this iterative research approach,” Bryan McCann, cofounder and CTO of You.com, told VentureBeat. “We bring back an initial set of sources, summarize and create a first research report, and then gather even more. At each step, we’re compressing that information down so we’re only adding new things.”

ARI doesn’t just compile text-based reports. The system automatically generates interactive visualizations based on the data it discovers — a feature You.com claims is unique among current AI research tools. During a demonstration, Socher showed examples of automatically generated reports on renewable energy that included interactive plots showing a variety of information.

“It puts together this beautiful PDF,” Socher said. “Since it talks about energy, it includes useful plots looking at market size, market growth expectations, mix of renewables and fossil fuels, solar energy growth rates.”

‘Click to Verify’: How ARI solves AI’s accuracy problem for business users

Crucially for enterprise customers, ARI provides direct source verification for every claim. Users can click on any citation and the system will highlight exactly where the information came from, making fact-checking substantially faster.

“When you click on the citation, it actually scrolls down and highlights exactly where it found that fact,” Socher demonstrated. “If your career and your job depends on the facts being right, that’s very helpful.”

You.com is positioning ARI primarily for enterprise customers in research-intensive industries. Early adopters include Germany’s largest medical publisher, Wort & Bild Verlag, and global consulting firm APCO Worldwide. “We already have several hundreds of active accounts from each major consulting company,” Socher noted. “We’re excited to partner with them and help them be more productive.”

Dr. Dennis Ballwieser, managing director at Wort & Bild Verlag, reported that research time “has dropped from a few days to just a few hours” through using ARI, and praised the accuracy of its results across both German and English content.
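McCann’s description maps onto a simple gather-summarize-compress loop. The sketch below is a hypothetical reconstruction of that pattern, not You.com’s code: search_web and summarize are caller-supplied stand-ins for a search backend and an LLM call.

```python
# Hypothetical sketch of the iterative gather-summarize-compress loop
# McCann describes. search_web and summarize are caller-supplied stand-ins
# for a search backend and an LLM call; this is not You.com's code.

def iterative_research(query: str, search_web, summarize,
                       rounds: int = 4, batch: int = 100) -> str:
    report = ""                  # running, compressed state of everything read so far
    seen_urls: set[str] = set()

    for _ in range(rounds):
        # Gather a fresh batch of sources, skipping anything already used.
        sources = [s for s in search_web(query, limit=batch)
                   if s["url"] not in seen_urls]
        if not sources:
            break
        seen_urls.update(s["url"] for s in sources)

        # Compress: fold only the *new* information into the running report,
        # so the context stays bounded even as total sources pass 400.
        new_text = "\n\n".join(s["text"] for s in sources)
        report = summarize(
            f"Current report:\n{report}\n\n"
            f"New sources:\n{new_text}\n\n"
            "Rewrite the report, adding only facts not already covered."
        )
    return report
```

Because each round compresses before the next gather, the context handed to the model stays roughly constant while the total number of sources consulted grows with every iteration.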
ARI enters an increasingly crowded marketplace for AI research tools. Recent weeks have seen announcements of DeepSeek, Claude 3.7 from Anthropic and various other research-oriented models. Socher claims ARI differentiates itself through comprehensiveness, verification capabilities and speed. “Compared to research tools from OpenAI, for instance, ARI has 10 times the sources, but at the same time, it’s three times faster,” he said.

Unlike some competing systems, ARI doesn’t make decisions about which information is most trustworthy but instead presents comprehensive findings. “ARI is optimized for comprehensiveness, so if it comes across contradicting statements, it’s much more likely to just tell you this source said this, and these sources said that,” McCann explained. “It’s not really inclined to make the decision for you as to which information is the most trustworthy.”

Beyond public data: How ARI integrates enterprises’ internal knowledge

A key aspect of ARI’s enterprise strategy is its ability to incorporate internal company data alongside public sources — creating a bridge between an organization’s proprietary information and the broader research landscape. “The biggest thing that we’re already doing now with enterprise customers is to give ARI access to their company internal data,” Socher said. “So you get all these amazing dashboards and insights right away about your own organization.”

You.com is taking an unusual approach to pricing ARI, charging per report rather than based on computational resources consumed — a strategy that aligns costs with business value rather than technical implementation. “We’re looking at pricing not by token anymore, but more on a usage basis, at the actual response level,” McCann explained. “Thinking of it more like: The final piece of collateral that comes out is the thing that you’re paying for — orders of magnitude cheaper than it would have cost in the past.”

This approach reflects You.com’s belief that AI usage will follow Jevons paradox: As efficiency increases, total consumption rises rather than falls. “Of course, the training will get cheaper,” said Socher. “But at the same time, when everyone realizes [that] the more compute you give a model, the better the results get, it doesn’t even make sense to think of one model as how intelligent it is.”

From research to action: ARI’s future as an autonomous business agent

You.com sees ARI as just the first step in transforming how organizations approach knowledge work. The company plans to make ARI more agentic — capable of taking independent actions based on research findings. “For as long as we’ve been building it, we’ve wanted to make it more agentic,” McCann said. “If you can access almost all of the information out there about a thing, that should provide a better foundation for any decision-making on top of that information.”

Socher frames You.com’s evolution around what he calls “the four A’s: accurate answers, agents and AGI.” He envisions a future when everyone becomes a manager.

AI agents are redefining digital commerce: Don’t let your platform be the bottleneck

Presented by commercetools

Digital commerce leaders are under immense pressure. Navigating an increasingly volatile market, while still delivering exceptional value and experiences to customers, is a precarious juggling act — and that’s why it’s time to go all-in on AI. It’s not just about today’s benefits; it’s about preparing for a fast-approaching future. Across industries, AI is delivering on its promise, helping companies create efficiencies while creating outstanding shopping experiences, delivering on time and unifying all touch points. It’s also on the cusp of transforming how we shop, says Dirk Hoerig, founder and chief innovation officer of commercetools.

“Very soon AI will change behavior in humans, and how we interact with companies and products,” Hoerig says. “Companies need to embrace AI now, to leverage its powerful capabilities, and position themselves to take advantage of its potential when AI, not the storefront, will become the center of the customer experience.”

For digital commerce, the interaction point for shoppers has always been the storefront, on every device, and a human has done the browsing, selecting, ordering and returning. But agentic shopping, in which AI handles all those tasks on behalf of the human consumer, is on the horizon. For retailers, that means optimizing product and customer data, pricing, inventory and more for an AI on the hunt at the direction of the human.

“The AI is interacting with the brands, the manufacturers, the retailers, but this is not just about putting another layer in between the human and the company,” Hoerig says. “This is a fundamental shift in how shoppers experience brands and retailers, and it’s upending the customer journey, not to mention customer acquisition, marketing and sales tactics.”

For example, retailers currently design shopping experiences around human behavior, placing upsell and cross-sell opportunities where shoppers are most likely to add extra items. However, as AI-driven shopping agents become more common, this approach may fall short. These AI shoppers, focused on finding the best product match through data, aren’t swayed by impulse buys. To offset customer acquisition costs and maintain profitability, retailers must rethink their strategies to cater to AI-driven purchasing behavior.

It’s already happening, with big tech companies making moves to control the search market, which is often an entry point for shoppers. Social networks are also considering new ways to integrate commerce and product discovery into their customer experiences.

The retailers with the right data, and the kind of powerful, flexible infrastructure that composable commerce provides, are positioned to pivot in the direction of agentic shoppers, Hoerig explains. A composable commerce platform gives retailers the ability to create shopping experiences across channels and touchpoints, in a cloud-native, component-based and tech-agnostic way that lets a company structure its data for any AI tool or agent. For instance, organizations with traditional, monolithic commerce platforms will need to find ways to let an agent crawl an array of functions without causing any data breaches. But composable commerce not only lets brands integrate a product catalog into the agentic web; it also allows an AI agent to make a transaction, access return information, create a customer query on behalf of the human and more, without exposing any internal data.
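That boundary between agent-facing capabilities and internal data is easiest to picture as a narrow, typed interface. The sketch below is purely illustrative: the class and method names are invented and this is not commercetools’ API.

```python
# Purely illustrative sketch of a narrow, agent-facing commerce surface:
# the agent gets capabilities (search, transact, returns, support), while
# inventory systems, margin data and customer records stay behind the
# boundary. All names here are invented; this is not commercetools' API.

from dataclasses import dataclass

@dataclass
class Offer:
    sku: str
    title: str
    price_cents: int
    in_stock: bool

class AgentCommerceGateway:
    """The only actions reachable by an external shopping agent."""

    def search_catalog(self, query: str) -> list[Offer]:
        ...  # returns structured product data only, no internal fields

    def place_order(self, sku: str, qty: int, payment_token: str) -> str:
        ...  # returns an order ID; payment flows through a token, never raw card data

    def get_return_info(self, order_id: str) -> str:
        ...  # return windows and instructions for a past order

    def open_customer_query(self, order_id: str, message: str) -> str:
        ...  # files a support query on behalf of the human shopper
```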
While agentic shopping is breaking over the horizon, AI is already changing the shopping experience here and now. Here’s a look at the AI trends brands need to know about.

AI and hyper-personalization

“The term ‘hyper-personalization’ isn’t new; it’s been used to describe algorithmic optimizations of the product catalog, mostly based on past searches and cohort data,” Hoerig says. “With generative AI, we have a unique opportunity to personalize and tailor the whole experience in real time, from content to tone and presentation.”

Generative AI can rewrite the page layout, content and wording, adjust the assortment of products based on a customer’s direct intentions, and offer personalized interactions through chat on the application and website, based on customer context. A 50-year-old shopper will have a different vocabulary and preferred style of communication than an 18-year-old shopper, for instance. Or if you’re in a rush on a travel site, interactions can be short, sweet and transactional. If you’re browsing a beauty site, it can offer more in-depth conversation.

Localization is no longer a time-consuming, expensive endeavor — for instance, a retailer won’t have to pick and choose which languages to translate and optimize for their site and content. Translation becomes efficient at scale across any language, even down to local dialects.

“It’s the kind of interaction customers crave, driving better customer loyalty and increasing engagement,” Hoerig says. “If you asked a retailer five years ago, ‘Would you customize interactions based on buyer cohorts, adjust your language and tone to better fit each category’s needs?’ they would say that sounds like a fine idea, but they’d never do it on a large catalog. Now it’s possible.”

The power of predictive operational intelligence

AI can process huge sets of data in a very short time, and then come up with ways to improve critical facets of retail operations. That includes inventory optimization, fraud detection in user click behavior, demand forecasting, pricing optimization and more.

Supply chain AI. Many retailers have adopted sophisticated, and expensive, demand forecasting software, with algorithms that can forecast inventory trends, offer replenishment advice and more. Adding AI into the mix makes these kinds of tools far less expensive to build and integrate, and makes them far faster and far more precise, operating nearly in real time. It even improves the customer buying experience, making tools like click-and-collect more accurate.

Fraud prevention. If AI is good at anything, it’s pattern detection, which can be applied directly to fraud prevention. For instance, AI can detect anomalies in real time and determine whether your system is experiencing malicious bot traffic that scrapes data and burns compute power, versus an influx of genuine interest from shoppers.

Autonomous decision making. Today, it’s critical to create efficiency gains and reduce costs, which becomes more complex when scaling in any context. Combining that with the customer expectation that

Crunchbase’s AI can predict startup success with 95% accuracy—will it change investing?

Crunchbase will abandon its roots as a historical data provider to become an AI-powered predictions engine that forecasts startup funding rounds, acquisitions and company growth trajectories. The San Francisco-based company announced today it will relaunch its platform with AI models that can predict future business events with up to 95% accuracy, betting that artificial intelligence will fundamentally reshape how investors and companies make decisions about private markets.

“The historical data industry as we know it is dead,” said Jager McConnell, CEO of Crunchbase, in an interview with VentureBeat. “If you are a company, a data company, and all you’re dealing with is historical data…I think you’re going to find that you don’t use it as much anymore in the future.”

AI disrupts traditional market data; Crunchbase declares the old model ‘dead’

The move marks a dramatic shift for Crunchbase, which built its reputation as a crowdsourced database of startup information over 15 years. McConnell argues that traditional data providers face an existential threat from AI systems that can easily absorb and analyze historical information.

“AI companies are an existential threat for data companies, not just software companies,” McConnell said. “If you deal in historical data, once your data gets into these systems, the facts remain facts. Even data behind paywalls eventually leaks, and once it does, your value disappears because AIs can build better insights by combining it with all the data on the internet.”

Instead of focusing solely on past events, Crunchbase now leverages its massive dataset — including usage patterns from 80 million active users — to predict future business outcomes. The company’s AI analyzes thousands of signals to forecast events around fundraising, acquisitions and growth.

How Crunchbase’s AI uses 80 million users to predict the next big startup

According to Megh Gautam, Crunchbase’s chief product officer, the company’s predictions stem from a unique combination of contributed data, captured data from public sources and anonymized user engagement patterns. “The real magic behind our ability to predict key milestones in company lifecycles lies in our unparalleled breadth and depth of knowledge,” Gautam told VentureBeat. “We’ve built features that are generalized, not tuned to any single dataset.”

The company claims its fundraising predictions achieve up to 95% precision and 99% recall in backtesting — meaning the model catches nearly all of the companies that go on to raise funding, while producing few false positives. For 12-month predictions, accuracy remains in the “high 70s percent,” according to McConnell.

Beyond fundraising, Crunchbase’s AI can predict acquisitions, IPOs, company growth and even potential layoffs — though McConnell said some negative predictions won’t be displayed publicly to avoid causing harm to companies.

The future of investing: Can AI outperform human decision-making?

The strategic shift comes as investors increasingly seek predictive signals rather than historical data alone. “The problem they’re trying to tackle is, what do we do next?” Gautam said. “Our users want to be first to market.”

Looking ahead, McConnell envisions Crunchbase becoming a platform that powers AI-driven investment decisions, potentially including automated investing systems and indexes tracking private market sectors.
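For readers less familiar with the two metrics, here is the backtest arithmetic behind a claim like “95% precision and 99% recall.” The counts below are invented for illustration; they are not Crunchbase’s numbers, only the definitions are standard.

```python
# Illustrative backtest arithmetic for a precision/recall claim.
# The counts below are invented; only the definitions are standard.

flagged = 1_000                 # companies the model predicted would raise funding
flagged_and_raised = 950        # of those, the ones that actually raised (true positives)
all_raisers = 960               # every company in the window that actually raised

precision = flagged_and_raised / flagged       # 0.95 -> few false positives
recall = flagged_and_raised / all_raisers      # ~0.99 -> catches nearly all raisers

print(f"precision={precision:.2f}, recall={recall:.2f}")
```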
“I think in five years, everyone’s dead,” McConnell warned, referring to traditional data companies. “The Salesforces of the world have to figure out what their UI experience is going to be like…this thing is so fluid that in five years, a data company that’s not doing the stuff we’re talking about won’t exist.”

The transformation positions Crunchbase to compete more directly with both traditional market intelligence providers and emerging AI-powered investment platforms. The company plans to allow customers to incorporate its predictive signals into their own models while it maintains control of its valuable underlying data.

Industry analysts note that Crunchbase’s shift comes amid growing interest in using AI for investment decisions, though many investors remain skeptical of fully automated approaches. The company’s success may depend on whether it can maintain high prediction accuracy as it scales while convincing customers to trust its AI-generated insights. McConnell emphasizes that Crunchbase aims to augment rather than replace human decision-making: “We fundamentally believe in augmentation…investments [are] pretty subjective, and your thesis has to match, and the price has to match.”

The rebranded platform launches publicly today at Crunchbase.ai, marking what McConnell calls a “precipice of just everything changing” in how investors evaluate private companies. In his view, the future belongs not to those who collect the most data, but to those who can best predict what happens next.

OpenAI expands Deep Research access to Plus users, heating up AI agent wars with DeepSeek and Claude

OpenAI announced today that it is rolling out its powerful Deep Research capability to all ChatGPT Plus, Team, Education and Enterprise users, significantly expanding access to what many experts consider the company’s most transformative AI agent since the original ChatGPT.

According to an announcement on OpenAI’s official X account, Plus, Team, Education and Enterprise users will initially receive 10 deep research queries per month, while Pro tier subscribers will have access to 120 queries monthly.

Deep Research, which is powered by a specialized version of OpenAI’s upcoming o3 model, represents a significant shift in how AI can assist with complex research tasks. Unlike traditional chatbots that provide immediate responses, Deep Research independently scours hundreds of online sources, analyzes text, images and PDFs, and synthesizes comprehensive reports comparable to those produced by professional analysts.

Deep research is now rolling out to all ChatGPT Plus, Team, Edu, and Enterprise users — OpenAI (@OpenAI) February 25, 2025

The AI research arms race: DeepSeek’s open challenge meets OpenAI’s premium play

The timing of OpenAI’s expanded rollout is hardly coincidental. The generative AI landscape has transformed dramatically in recent weeks, with China’s DeepSeek emerging as an unexpected disruptor. By open-sourcing its DeepSeek-R1 model under an MIT license, the company has fundamentally challenged the closed, subscription-based business model that has defined Western AI development.

What makes this competition particularly interesting is the divergent philosophies at play. While OpenAI continues to gate its most powerful capabilities behind increasingly complex subscription tiers, DeepSeek has opted for a radically different approach: Give away the technology and let a thousand applications bloom.

Chinese AI company Deepseek recently made waves when it announced R1, an open-source reasoning model that it claimed achieved comparable performance to OpenAI’s o1, at a fraction of the cost. But for those following AI developments closely, Deepseek and R1 didn’t come out of… — Y Combinator (@ycombinator) February 5, 2025

This strategy echoes earlier eras of technology adoption, where open platforms ultimately created more value than closed systems. Linux’s dominance in server infrastructure offers a compelling historical parallel. For enterprise decision-makers, the question becomes whether to invest in proprietary solutions that may offer immediate competitive advantages or embrace open alternatives that could foster broader innovation across their organization.

Perplexity’s recent integration of DeepSeek-R1 into its own research tool — at a fraction of OpenAI’s price point — demonstrates how quickly this open approach can yield competing products. Meanwhile, Anthropic’s Claude 3.7 Sonnet has taken yet another path, focusing on transparency in its reasoning process with “visible extended thinking.”

deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price. we will obviously deliver much better models and also it’s legit invigorating to have a new competitor! we will pull up some releases. — Sam Altman (@sama) January 28, 2025

The result is a fragmented market where each major player now offers a distinctive approach to AI-powered research.
For enterprises, this means greater choice, but also increased complexity in determining which platform best aligns with their specific needs and values.

From walled garden to public square: OpenAI’s calculated democratic pivot

When Sam Altman writes that Deep Research “probably is worth $1,000 a month to some users,” he’s revealing more than just price elasticity — he’s acknowledging the extraordinary value disparity that exists among potential users. This admission cuts to the heart of OpenAI’s ongoing strategic balancing act. The company faces a fundamental tension: Maintaining the premium exclusivity that funds its development while simultaneously fulfilling its mission of ensuring that “artificial general intelligence benefits all of humanity.” Today’s announcement represents a careful step toward greater accessibility without undermining its revenue model.

i think we are going to initially offer 10 uses per month for chatgpt plus and 2 per month in the free tier, with the intent to scale these up over time. it probably is worth $1000 a month to some users but i’m excited to see what everyone does with it! https://t.co/YBICvzodPF — Sam Altman (@sama) February 12, 2025

By limiting free tier users to just two queries monthly, OpenAI is essentially offering a teaser — enough to demonstrate the technology’s capabilities without cannibalizing its premium offerings. This approach follows the classic “freemium” playbook that has defined much of the digital economy, but with unusually tight constraints that reflect the substantial computing resources required for each Deep Research query. The allocation of 10 monthly queries for Plus users ($20/month) compared to 120 for Pro users ($200/month) creates a clear delineation that preserves the premium value proposition.

This tiered rollout strategy suggests OpenAI recognizes that democratizing access to advanced AI capabilities requires more than just lowering price barriers — it necessitates a fundamental rethinking of how these capabilities are packaged and delivered.

Beyond the surface: Deep Research’s hidden strengths and surprising vulnerabilities

The headline figure — 26.6% accuracy on “Humanity’s Last Exam” — tells only part of the story. This benchmark, designed to be extraordinarily challenging even for human experts, represents a quantum leap beyond previous AI capabilities. For context, achieving even 10% on this test would have been considered remarkable just a year ago. What’s most significant isn’t just the raw performance, but the nature of the test itself, which requires synthesizing information across disparate domains and applying nuanced reasoning that goes far beyond pattern matching.

Deep Research’s approach combines several technological breakthroughs: multi-stage planning, adaptive information retrieval and, perhaps most crucially, a form of computational self-correction that allows it to recognize and remedy its own limitations during the research process.

Yet these capabilities come with notable blind spots. The system remains vulnerable to what might be called “consensus bias” — a tendency to privilege widely accepted viewpoints while potentially overlooking contrarian perspectives that challenge established thinking. This bias could be particularly problematic in domains where innovation often emerges from challenging conventional wisdom. Moreover, the system’s reliance on existing web

The rise of browser-use agents: Why Convergence’s Proxy is beating OpenAI’s Operator

A new wave of AI-powered browser-use agents is emerging, promising to transform how enterprises interact with the web. These agents can autonomously navigate websites, retrieve information and even complete transactions — but early testing reveals significant gaps between promise and performance.

While consumer examples offered by OpenAI’s new browser-use agent Operator, like ordering pizza or buying game tickets, have grabbed headlines, the question is where the main developer and enterprise use cases are. “The thing that we don’t know is what will be the killer app,” said Sam Witteveen, cofounder of Red Dragon, a company that develops AI agent applications. “My guess is it’s going to be things that just take time on the web that you don’t actually enjoy.” This includes things like searching the web for a product’s cheapest price or booking the best hotel accommodations. More likely, it will be used in combination with other tools like Deep Research, where companies can do even more sophisticated research plus execution of tasks around the web.

Companies need to carefully evaluate the rapidly evolving landscape, as established players and startups take different approaches to solving the autonomous browsing challenge.

Key players in the browser-use agent landscape

The field has quickly become crowded with major tech companies as well as innovative startups. Operator and Proxy are the most advanced in terms of being consumer-friendly and out-of-the-box ready. Many of the others appear to be positioning themselves more for developer or enterprise usage. One example is Browser Use, a Y Combinator startup that allows users to customize the models used with the agent. This gives you more control over how the agent works, including the ability to use a model from your local machine. But it’s definitely more involved. The other players provide varying degrees of functionality and interaction with local machine resources. I decided not to even test ByteDance’s UI-TARS for now, because it requested lower-level access to my machine’s security and privacy features (if I test it out, I’ll definitely use a secondary computer).

Testing reveals reasoning challenges

The easiest to test are OpenAI’s Operator and Convergence’s Proxy. In our testing, the results highlighted how reasoning capabilities can matter more than raw automation features. Operator, in particular, was more buggy.

For example, I asked the agents to find and summarize VentureBeat’s five most popular stories. It was an ambiguous task, because VentureBeat doesn’t have a “most popular” section per se. Operator struggled with this. It first fell into an infinite scrolling loop while searching for “most popular” stories, requiring manual intervention. In another attempt, it found a three-year-old article titled “Top five stories of the week.” In contrast, Proxy demonstrated better reasoning by identifying the five most visible stories on the homepage as a practical proxy for popularity, and it gave accurate summaries.

The distinction became even clearer in real-world tasks. I asked the agents to book a reservation at a romantic restaurant for noon in Napa, California. Operator approached the task linearly — finding a romantic restaurant first, then checking availability at noon. When no tables were available, it reached a dead end.
Proxy showed more sophisticated reasoning by starting with OpenTable to find restaurants that were both romantic and available at the desired time. It even came back with a slightly better-rated restaurant.

Even seemingly simple tasks revealed important differences. When searching for a “YubiKey 5C NFC price” on Amazon, Proxy found the item more easily than Operator.

OpenAI hasn’t divulged much about the technologies it uses for training its Operator agent, other than saying it has trained its model on browser-use tasks. Convergence, however, has provided more detail: Its agent uses something called Generative Tree Search to “leverage Web-World Models that predict the state of the web after a proposed action has been taken. These are generated recursively to produce a tree of possible futures that are searched over to select the next optimal action, as ranked by our value models. Our Web-World models can also be used to train agents in hypothetical situations without generating a lot of expensive data.”

Benchmarks may be useless for now

On paper, these tools appear closely matched. Convergence’s Proxy achieves 88% on the WebVoyager benchmark, which evaluates web agents across 643 real-world tasks on 15 popular websites like Amazon and Booking.com. OpenAI’s Operator scores 87%, while Browser Use says it reaches 89%, though it concedes that figure came only after changing the WebVoyager codebase slightly “according to our needs.”

These benchmark scores should be taken with a grain of salt, though, as they can be gamed. The real test comes in practical usage for real-world cases. It’s very early, the space is changing rapidly, and these products are updated almost daily. The results will depend more on the specific jobs you’re trying to do, and you may want to instead rely on the vibes you get while using the different products.

Enterprise implications

The implications for enterprise automation are significant. As Witteveen points out in our video podcast conversation, where we do a deep dive into the browser-use trend, many companies are currently paying for virtual assistants — operated by real people — to handle basic web research and data-gathering tasks. These browser-use agents could dramatically change that equation. “If AI takes this over,” Witteveen notes, “that’s going to be some of the first low-hanging fruit of people losing their jobs. It’s going to show up in some of these kinds of things.”

This could feed into the robotic process automation (RPA) trend, where browser use is pulled in as just another tool for companies to automate more tasks. And as mentioned earlier, the more powerful use cases will come when an agent combines browser use with other tools, including things like Deep Research, where an LLM-driven agent uses a search tool plus browser use to do more sophisticated
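Convergence’s description, a world model proposing futures and a value model ranking them, maps onto a familiar search pattern. The sketch below is schematic only: world_model, value_model and propose_actions are invented stand-ins, and this is not Convergence’s code.

```python
# Schematic sketch of the search pattern Convergence describes: a web-world
# model predicts the page state after each candidate action, a value model
# scores the predicted states, and the best-ranked action is chosen.
# world_model, value_model and propose_actions are caller-supplied
# stand-ins; this is not Convergence's code.

def generative_tree_search(state, world_model, value_model, propose_actions,
                           depth: int = 2, branch: int = 3):
    """Pick the next browser action whose simulated future scores highest."""

    def rollout_value(s, d) -> float:
        if d == 0:
            return value_model.score(s)
        # Recursively expand a small tree of simulated futures and
        # propagate the best branch's value back up.
        futures = [world_model.predict(s, a) for a in propose_actions(s, k=branch)]
        return max(rollout_value(f, d - 1) for f in futures)

    candidates = propose_actions(state, k=branch)
    return max(candidates,
               key=lambda a: rollout_value(world_model.predict(state, a), depth - 1))
```

Because futures are simulated rather than executed, a misstep costs a model call instead of a wrong click, which also fits Convergence’s claim that the same world models can generate training scenarios without expensive real-world data.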

OpenAI’s ChatGPT explodes to 400M weekly users, with GPT-5 on the way

OpenAI’s ChatGPT has surpassed 400 million weekly active users, a milestone that underscores the company’s growing reach across both consumer and enterprise markets, according to an X post from chief operating officer Brad Lightcap on Thursday.

The rapid expansion comes as OpenAI faces intensifying competition from rivals such as Elon Musk’s xAI and China’s DeepSeek, both of which have recently launched high-performing models aimed at disrupting OpenAI’s dominance. Despite this, OpenAI has seen significant traction in the business sector, with more than two million enterprise users now using ChatGPT at work — double the number from September 2024.

“ChatGPT recently crossed 400M WAU, we feel very fortunate to serve 5% of the world every week,” Lightcap wrote. He also noted that usage of OpenAI’s reasoning model API has surged fivefold since the launch of its o3-mini model, which is designed to enhance logical inference and structured problem-solving capabilities.

chatgpt recently crossed 400M WAU, we feel very fortunate to serve 5% of the world every week 2M+ business users now use chatgpt at work, and reasoning model API use is up 5x since o3 mini launch we’ll bring GPT-4.5 and GPT-5 to chat and the API soon, with unlimited GPT-5 for… https://t.co/7hfyUcIyBW — Brad Lightcap (@bradlightcap) February 20, 2025

AI is reshaping the workplace: 2 million businesses now rely on ChatGPT

The surge in enterprise adoption represents a crucial validation of OpenAI’s strategy to position ChatGPT as not just a chatbot for casual queries, but as a serious productivity tool for businesses. Companies such as Morgan Stanley, Uber and T-Mobile have integrated OpenAI’s models into their workflows, using AI to generate reports, automate customer service and streamline decision-making.

Notably, OpenAI’s progress comes amid heightened scrutiny over the role of generative AI in business-critical applications. The company recently secured its first federal agency customer, USAID, which is deploying ChatGPT Enterprise to reduce administrative burdens and streamline partnerships, according to FedScoop. The expansion into government contracts suggests OpenAI is succeeding in navigating the regulatory hurdles that have slowed adoption of AI in public-sector institutions.

At the same time, OpenAI is deepening its presence in Japan through a joint venture with SoftBank, dubbed SB OpenAI Japan. The partnership, which involves a $3 billion annual investment from SoftBank, aims to integrate OpenAI’s technology into major Japanese enterprises, with initial deployments inside SoftBank’s own ecosystem, including its semiconductor subsidiary Arm and digital payments platform PayPay.

GPT-5 is coming: OpenAI’s next leap in artificial intelligence

Lightcap also revealed that OpenAI is preparing to launch GPT-4.5 and GPT-5, with the latter set to merge the company’s GPT and o-series models into a single, more powerful system. “We’ll bring GPT-4.5 and GPT-5 to chat and the API soon, with unlimited GPT-5 for free users (plus users can run at even higher intelligence),” he wrote. This move signals OpenAI’s ambition to consolidate its AI offerings into a unified model that can handle both general conversational AI tasks and more specialized reasoning-based applications.
By integrating the capabilities of its flagship GPT models with the structured problem-solving of the o-series, OpenAI is betting that a one-model-to-rule-them-all approach will give it a competitive edge over rivals that are still segmenting their AI offerings.

OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5: We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings. We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten. We hate… — Sam Altman (@sama) February 12, 2025

The timing of the GPT-5 release is particularly critical. Musk’s xAI recently introduced Grok 3, a model that the company claims outperforms OpenAI’s GPT-4o in certain benchmarks, including math, science and coding. Meanwhile, DeepSeek’s rapid rise in China has added to the pressure on OpenAI to maintain its lead in AI sophistication and accessibility.

The AI wars: OpenAI, xAI and DeepSeek battle for global dominance

OpenAI’s expansion comes at a moment of fierce competition in the AI sector, with rival companies racing to secure market share in both consumer and enterprise applications. Musk, who co-founded OpenAI before departing in 2018, has been vocal about his concerns regarding the company’s shift toward a for-profit model. The billionaire recently launched an unsolicited $97 billion bid to take control of OpenAI, a move that was swiftly rejected by the company’s board.

OpenAI has since positioned itself as the leader in enterprise AI deployments, with Microsoft’s backing providing both financial stability and cloud infrastructure. Meanwhile, DeepSeek has disrupted the market with low-cost, open-source AI models that have gained traction, particularly among developers wary of OpenAI’s pricing model. The Chinese firm has claimed that it trained its latest model for under $6 million — an order of magnitude less than what OpenAI and xAI are spending on comparable systems.

What’s next for OpenAI? The future of AI in business and beyond

OpenAI’s latest user metrics suggest that the company is still expanding at a rapid clip despite the mounting competition. The leap from 300 million to 400 million weekly active users in just three months indicates that demand for AI-powered tools continues to grow, with businesses increasingly integrating them into their everyday operations.

The launch of GPT-5 will be a crucial test of OpenAI’s ability to maintain its leadership in AI. If the model delivers on promises of higher reasoning capability, better personalization and improved efficiency, it could cement OpenAI’s position as the go-to provider for both consumer and enterprise AI applications. However, with Musk’s xAI, DeepSeek and Google’s Gemini models all vying for dominance, OpenAI cannot afford to slow down. The next 12 months will likely determine whether it remains the uncontested leader in generative AI, or whether a new player disrupts the balance of power in artificial intelligence.

How big U.S. bank BNY manages armies of AI agents

The financial services industry is one of the most regulated sectors. It also manages huge amounts of data. Conscious of a need for caution, financial companies have slowly added generative AI and AI agents to their stables of services.

The industry is no stranger to automation. But use of the term “agent” has been muted. And understandably, many in the industry took a very cautious stance toward generative AI, especially in the absence of regulatory frameworks. Now, however, banks like JPMorgan and Bank of America have debuted AI-powered assistants.

A bank at the forefront of the trend is BNY. The financial services company founded by Alexander Hamilton is updating its AI tool, Eliza (named after Hamilton’s wife), developing it into a multi-agent resource. The bank sees AI agents as providing valuable assistance to its sales representatives while engaging its customers more.

A multi-agent approach

Sarthak Pattanaik, head of BNY’s Artificial Intelligence Hub, told VentureBeat in an interview that the bank began by figuring out how to connect its many units so their information can be easily accessed.

BNY created a lead recommendation agent for its various teams. But it did more: It uses a multi-agent architecture to help its sales team make suitable recommendations to clients.

“We have an agent which has everything [the sales team] know[s] about our client,” Pattanaik said. “We have another agent which talks about products, all the products that the bank has…from liquidity to collateral, to payments, the treasury and so forth. Ultimately…we are trying to solve a client need through the capabilities we have, the product capabilities we have.”

Pattanaik added that its agents have reduced the number of people many of its client-facing employees must speak to in order to determine a good recommendation for customers. So, “instead of the salespeople talking to 10 different product managers, 10 different client people, 10 different segment people, all of that is done now through this agent.”

The agent lets the sales team answer very specific questions that clients might have. For example, does the bank support foreign currencies like the Malaysian ringgit if a client wants to launch a credit card in the country?

How they built it

The multi-agent recommendation capabilities debuted in BNY’s Eliza tool. There are about 13 agents that “negotiate with each other” to figure out a good product recommendation, depending on the marketing segment. Pattanaik explained that the agents range from functional agents, like client agents, to segment agents that touch on structured and unstructured data. Many of the agents within Eliza have a “sense of reasoning.”

The bank understands that its agent ecosystem is not fully agentic. As Pattanaik pointed out, “the fully agentic version would be that it would automatically generate a PowerPoint we can give to the client, but that’s not what we do.”

Pattanaik said the bank turned to Microsoft’s AutoGen to bring its AI agents to life. “We started off with AutoGen since it is open source,” he said. “We are generally a builder company; wherever we can use open source, we do it.” Pattanaik said AutoGen provided the bank with a set of solid guardrails it can use to ground many of the agents’ responses and make them more deterministic. The bank also looked into LangChain to architect the system.
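Pattanaik doesn’t describe BNY’s wiring, but AutoGen’s group-chat abstraction gives a feel for how specialist agents can “negotiate” over a recommendation. Below is a minimal sketch in the AutoGen 0.2-style Python API; the agent names, system prompts and the ringgit question are invented for illustration, and this is not BNY’s configuration.

```python
# Minimal sketch of negotiating specialist agents, using the AutoGen
# 0.2-style group-chat API. Agent names, system prompts and the example
# question are invented for illustration; this is not BNY's setup.

import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_API_KEY"}]}

client_agent = autogen.AssistantAgent(
    name="client_agent",
    system_message="You know the client: holdings, history, relationships, needs.",
    llm_config=llm_config,
)

product_agent = autogen.AssistantAgent(
    name="product_agent",
    system_message="You know the bank's products: liquidity, collateral, payments, treasury.",
    llm_config=llm_config,
)

sales_rep = autogen.UserProxyAgent(
    name="sales_rep",
    human_input_mode="NEVER",      # fully scripted for this sketch
    code_execution_config=False,
)

# The group chat is where the "negotiation" happens: the manager routes
# turns between agents until they converge on a recommendation.
groupchat = autogen.GroupChat(
    agents=[sales_rep, client_agent, product_agent],
    messages=[],
    max_round=8,
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

sales_rep.initiate_chat(
    manager,
    message="A client wants to launch a credit card in Malaysia. "
            "Do we support the ringgit, and which products fit?",
)
```

A production system like the one Pattanaik describes would add roughly a dozen such specialists plus the grounding and guardrails he mentions; the sketch shows only the routing pattern.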
BNY built a framework around the agentic system that gives the agents a blueprint for responding to requests. To accomplish this, the company’s AI engineers worked closely with other bank departments. Pattanaik underscored that BNY has been building mission-critical platforms for years and has scaled products like its clearance and collateral platforms. This deep bench of knowledge was key to helping the AI engineers in charge of the agent platform give the agents the specialized expertise they needed.

“Having less hallucination is a characteristic that always helps, compared to just having AI engineers driving the engine,” Pattanaik said. “Our AI engineers worked very closely with the full-stack engineers who built the mission-critical systems to help us ground the problem. It’s about componentizing so that it’s reusable.”

Building, for example, a lead-recommendation agent this way allows it to be developed by BNY’s different lines of business. It acts as a microservice “that continues to learn, reason and act.”

Expanding Eliza

As its agentic footprint expands, BNY plans to further upgrade its flagship AI tool, Eliza. BNY released the tool in 2024, though it had been in development since 2023. Eliza lets BNY employees access a marketplace of AI apps, get approved datasets and look for insights.

Pattanaik said Eliza is already providing a blueprint for how BNY can move forward with AI agents and offer users more advanced, intelligent service. But the bank doesn’t want to be stagnant, and it wants the next iteration of Eliza to be more intelligent.

“What we built using Eliza 1.0 is a representation, and the learning aspect of things,” Pattanaik said. “With 2.0, we’re going to improve the process and also ask, how do we build a great agent? If you think about agents, it’s about something that can learn and reason and, at some point in time, provide some actions as to this is a break, this is not a break and so forth. This is the direction we are going towards as we build 2.0, because a lot of things have to be set up in terms of the risk guardrails, the explainability, the transparency, the linkages and so forth, before we become completely autonomous.”

AI still has a hallucination problem: How MongoDB aims to solve it with advanced rerankers and embedding models

To get the best possible result from an AI query, organizations need the best possible data. The answer many organizations have adopted to overcome that challenge is retrieval-augmented generation (RAG). With RAG, results are grounded in data from a database. As it turns out, though, not all RAG is the same, and actually optimizing a database for the best possible results can be challenging.

Database vendor MongoDB is no stranger to the world of AI or RAG. The company’s namesake database is already being used for RAG, and MongoDB has also launched AI application development initiatives. While the company and its users — such as medical giant Novo Nordisk — have had success with gen AI, there is still more to be done. In particular, hallucination and accuracy continue to be issues holding some organizations back from getting gen AI into production.

To that end, MongoDB today announced the acquisition of privately held Voyage AI, which develops advanced embedding and retrieval models. Voyage raised $20 million in funding in October 2024 in a round supported by cloud data giant Snowflake. The acquisition will bring Voyage AI’s expertise in embedding generation and reranking — critical components for AI-powered search and retrieval — directly into MongoDB’s database platform.

“Over the last year, and especially as organizations have tried to think about how they could build AI-powered applications, it became increasingly clear that the quality and trust of the applications they build, or the lack thereof, was becoming one of the barriers for applying AI to mission-critical use cases,” MongoDB CPO Sahir Azam told VentureBeat.

What are the challenges of hallucination? Doesn’t RAG solve them?

The basic idea behind RAG is that, instead of simply relying on a knowledge base from trained data, the gen AI engine can get grounded data from a database. Creating highly accurate RAG is quite complex, and there is still a potential risk for hallucinations — a challenge faced by MongoDB and its users. While Azam declined to provide any specific example or incident where gen AI RAG failed a user, he did note that accuracy is always a concern.

Improving accuracy and reducing hallucination involves multiple steps. The first is to improve the quality of retrieval (the “R” in RAG). “In many cases, the retrieval quality is not good enough,” Tengyu Ma, founder and CEO of Voyage AI, told VentureBeat. “In the retrieval step, if they are not retrieving relevant information, then the retrieval is not very useful, and the large language model (LLM) hallucinates because it has to guess some context.”

The Voyage AI models now part of MongoDB help improve RAG in a few key ways:

Domain-specific models and rerankers: These are trained on large amounts of unstructured data from specific verticals, allowing them to better understand the terminology and semantics of those domains.

Customization and fine-tuning: Users can fine-tune the retrieval mechanism for unique datasets and use cases.

MongoDB’s competition

MongoDB isn’t the first or only vendor to recognize the need for and value of having highly optimized embedding and reranker technology. After all, that’s one of the reasons Snowflake invested in Voyage AI and is using the company’s models.
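The two pieces Voyage AI supplies, embedding models and rerankers, slot into a two-stage retrieval pipeline: cheap vector recall over the whole corpus, then a more expensive reranker over the shortlist. The sketch below is generic; embed and rerank_score are placeholders for whatever models you plug in, not MongoDB’s or Voyage AI’s API.

```python
# Generic two-stage retrieval sketch: coarse vector recall, then a reranker
# re-scores the shortlist against the query. embed and rerank_score are
# caller-supplied model functions; this is not MongoDB's or Voyage AI's API.

import numpy as np

def retrieve(query: str, docs: list[str], doc_vecs: np.ndarray,
             embed, rerank_score,
             shortlist_k: int = 50, final_k: int = 5) -> list[str]:
    # Stage 1: cheap vector similarity over the whole corpus.
    q = embed(query)                     # 1D unit vector from the embedding model
    sims = doc_vecs @ q                  # cosine similarity if rows are normalized
    shortlist = np.argsort(sims)[::-1][:shortlist_k]

    # Stage 2: the reranker reads query and document together, catching
    # relevance that raw nearest-neighbor similarity misses. This is the
    # retrieval quality that keeps the LLM from having to guess context.
    reranked = sorted(shortlist,
                      key=lambda i: rerank_score(query, docs[i]),
                      reverse=True)
    return [docs[i] for i in reranked[:final_k]]
```

Reranking only the shortlist keeps the expensive model off the full corpus, which is why the two stages are typically paired rather than used alone.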
It’s important to note that, even after being acquired by MongoDB, Voyage AI’s models will still be available to Snowflake and to Voyage AI’s other users. The big difference is that Voyage AI will now be increasingly integrated into MongoDB’s database platforms.

Directly integrating advanced embedding models into a database is an approach taken by other rival database vendors as well. Back in June 2024, DataStax announced its own RAGStack technology that combines advanced embedding and retrieval models.

Azam argued that MongoDB is a bit different, though. For one, it is an operational database, as opposed to an analytical database: Rather than just providing insights and analysis, MongoDB helps power transactions and real-world operations. MongoDB is also what is known as a “document model database,” which has a different structure than a traditional relational database. That structure does not rely on columns and tables, which are not particularly good at representing unstructured data (a critical element for AI applications).

“We’re the only database technology that combines the management of metadata about a customer’s information, the operations and transactions, which is the heartbeat of what’s happening in the business, as well as the foundation for retrieval — all with a single system,” said Azam.

Why Voyage AI matters for agentic AI workflows

The need for highly accurate embedding and retrieval models is being further accelerated by agentic AI. “Agentic AI still needs retrieval methods, because an agent cannot make decisions out of context,” said Ma. “Sometimes, actually, multiple retrieval components are used in even one decision.”

Ma noted that Voyage AI is currently working on models that are highly customized for agentic AI use cases. He explained that agentic AI can use different types of queries that can still benefit from more optimization.

As gen AI increasingly moves into operational use cases, the need to remove the risk of hallucinations is clearly paramount. While MongoDB has had success with gen AI, Azam expects the integration of Voyage AI to open new mission-critical use cases.

“If we can now say, ‘Hey, we can give you well north of 90% accuracy for your applications that today may only, in some cases, get to 30 or 60% accuracy for the results,’ the aperture widens in terms of the types of opportunities people can apply AI to in their software applications,” said Azam.

How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs)

Very small language models (SLMs) can outperform leading large language models (LLMs) in reasoning tasks, according to a new study by Shanghai AI Laboratory. The authors show that with the right tools and test-time scaling techniques, an SLM with 1 billion parameters can outperform a 405B LLM on complicated math benchmarks. The ability to deploy SLMs in complex reasoning tasks can be very useful as enterprises look for new ways to use these models in different environments and applications.

Test-time scaling explained

Test-time scaling (TTS) is the process of giving LLMs extra compute cycles during inference to improve their performance on various tasks. Leading reasoning models, such as OpenAI o1 and DeepSeek-R1, use “internal TTS,” which means they are trained to “think” slowly by generating a long string of chain-of-thought (CoT) tokens.

An alternative approach is “external TTS,” where model performance is enhanced with (as the name implies) outside help. External TTS is suitable for repurposing existing models for reasoning tasks without further fine-tuning them. An external TTS setup is usually composed of a “policy model,” which is the main LLM generating the answer, and a process reward model (PRM) that evaluates the policy model’s answers. These two components are coupled together through a sampling or search method.

The easiest setup is “best-of-N,” where the policy model generates multiple answers and the PRM selects one or more best answers to compose the final response. More advanced external TTS methods use search. In “beam search,” the model breaks the answer down into multiple steps. For each step, it samples multiple answers and runs them through the PRM. It then chooses one or more suitable candidates and generates the next step of the answer. And in “diverse verifier tree search” (DVTS), the model generates several branches of answers to create a more diverse set of candidate responses before synthesizing them into a final answer.

(Figure: Different test-time scaling methods. Source: arXiv)

What is the right scaling strategy?

Choosing the right TTS strategy depends on multiple factors. The study authors carried out a systematic investigation of how different policy models and PRMs affect the efficiency of TTS methods. Their findings show that efficiency is largely dependent on the policy and PRM models. For example, for small policy models, search-based methods outperform best-of-N. However, for large policy models, best-of-N is more effective because the models have better reasoning capabilities and don’t need a reward model to verify every step of their reasoning.

Their findings also show that the right TTS strategy depends on the difficulty of the problem. For example, for small policy models with fewer than 7B parameters, best-of-N works better for easy problems, while beam search works better for harder problems. For policy models that have between 7B and 32B parameters, diverse tree search performs well for easy and medium problems, and beam search works best for hard problems. But for large policy models (72B parameters and more), best-of-N is the optimal method for all difficulty levels.
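The simplest of these setups, best-of-N, fits in a few lines. In the sketch below, policy and prm are placeholders for the two models the article describes; this is not the paper’s code.

```python
# Minimal best-of-N sketch: the policy model samples N candidate answers
# and a process reward model (PRM) picks the highest-scoring one.
# policy and prm are placeholder objects, not the paper's code.

def best_of_n(question: str, policy, prm, n: int = 16) -> str:
    candidates = [policy.generate(question, temperature=0.8) for _ in range(n)]
    return max(candidates, key=lambda answer: prm.score(question, answer))
```

And the strategy guidance above reduces to a small decision table over model size and problem difficulty. Transcribing the reported findings (the thresholds are the paper’s; the function itself is just an illustration):

```python
# The strategy recommendations above, transcribed as a lookup. The
# thresholds come from the findings quoted in this article; the function
# is illustrative, not from the paper.

def pick_tts_strategy(policy_params_billions: float, difficulty: str) -> str:
    if policy_params_billions >= 72:
        return "best-of-N"          # strong models don't need step-by-step verification
    if policy_params_billions < 7:
        return "best-of-N" if difficulty == "easy" else "beam search"
    # 7B-32B models:
    return "beam search" if difficulty == "hard" else "diverse verifier tree search"
```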
Why small models can beat large models

(Figure: SLMs outperform large models on MATH and AIME24. Source: arXiv)

Based on these findings, developers can create compute-optimal TTS strategies that take into account the policy model, PRM and problem difficulty to make the best use of their compute budget to solve reasoning problems. For example, the researchers found that a Llama-3.2-3B model with the compute-optimal TTS strategy outperforms Llama-3.1-405B on MATH-500 and AIME24, two complicated math benchmarks. This shows that an SLM can outperform a model that is 135X larger when using the compute-optimal TTS strategy. In other experiments, they found that a Qwen2.5 model with 500 million parameters can outperform GPT-4o with the right compute-optimal TTS strategy. Using the same strategy, the 1.5B distilled version of DeepSeek-R1 outperformed o1-preview and o1-mini on MATH-500 and AIME24.

When accounting for both training and inference compute budgets, the findings show that with compute-optimal scaling strategies, SLMs can outperform larger models using 100-1,000X fewer FLOPS.

The researchers’ results show that compute-optimal TTS significantly enhances the reasoning capabilities of language models. However, as the policy model grows larger, the improvement from TTS gradually decreases.

“This suggests that the effectiveness of TTS is directly related to the reasoning ability of the policy model,” the researchers write. “Specifically, for models with weak reasoning abilities, scaling test-time compute leads to a substantial improvement, whereas for models with strong reasoning abilities, the gain is limited.”

The study validates that SLMs can perform better than larger models when applying compute-optimal test-time scaling methods. While this study focuses on math benchmarks, the researchers plan to expand their work to other reasoning tasks, such as coding and chemistry.
