VentureBeat

Ex-OpenAI CTO Mira Murati unveils Thinking Machines: A startup focused on multimodality, human-AI collaboration

Ever since Mira Murati departed OpenAI last fall, many have wondered about the former CTO’s next move. Well, now we know (or at least have a rough idea). Murati took to X today to announce her new venture Thinking Machines Lab, an AI research and product company with a goal to “advance AI by making it broadly useful and understandable through solid foundations, open science and practical applications.” Murati posted: “We’re building three things: Helping people adapt AI systems to work for their specific needs; Developing strong foundations to build more capable AI systems; Fostering a culture of open science that helps the whole field understand and improve these systems.” Thinking Machines’ team of roughly two dozen engineers and scientists is stacked with other OpenAI alums — including cofounder and deep reinforcement learning pioneer John Schulman and ChatGPT co-creator Barret Zoph — which could position the startup to make significant strides in AI research and development. As of the posting of this article, the company’s newly launched X account @thinkymachines had already amassed roughly 14,000 followers.

Model intelligence, multimodality, strong infrastructure

Thinking Machines — not to be confused with the now-defunct supercomputer and AI firm of the 1980s — isn’t yet offering specific examples of intended projects, but suggests a broad focus on multimodal capabilities, human-AI collaboration (as opposed to purely agentic systems) and strong infrastructure. The goal is to build more “flexible, adaptable and personalized AI systems,” the company writes on its new website. Multimodality is “critical” to the future of AI, Thinking Machines says, as it allows for more natural and efficient communication that captures intent and supports deeper integration. “Instead of focusing solely on making fully autonomous AI systems, we are excited to build multimodal systems that work with people collaboratively,” the company writes. Going forward, the startup will build models “at the frontier of capabilities” in areas including science and programming. Model intelligence will be key, as will infrastructure quality. “We aim to build things correctly for the long haul, to maximize both productivity and security, rather than taking shortcuts,” Thinking Machines writes. Calling scientific progress a “collective effort,” the company says it will collaborate with the AI community and publish technical blog posts, papers and code. It will also take an “empirical and iterative” approach to AI safety, and pledges to maintain a “high safety bar” to prevent misuse, perform red-teaming and post-deployment monitoring, and share best practices, datasets, code and model specs.

Expanding on a decorated team

Thinking Machines boasts an impressive team of scientists, engineers and builders behind top AI models including ChatGPT, Character AI and Mistral, as well as popular open-source projects PyTorch, the OpenAI Gym Python library, the Fairseq sequence modeling toolkit and Meta AI’s Segment Anything. The startup is looking to expand on that base; it is currently on the lookout for a research program manager, as well as product builders and machine learning (ML) experts. The goal is to put together a “small, high-caliber team” of both PhD holders and the self-taught, the company writes. Those interested should apply here.

Another OpenAI competitor?
Murati abruptly resigned from OpenAI in September 2024 — following the unexpected departures of other execs including Schulman and cofounder and former chief scientist Ilya Sutskever — after joining the company in 2018 and ascending to CTO in 2022 (the year that brought us the groundbreaking ChatGPT). At the time, she teased on X: “I’m stepping away because I want to create the time and space to do my own exploration.” Her next move had been a topic of speculation given her reputation as a steady operational force during OpenAI’s tumultuous period in late 2023, when the board’s attempted ousting of CEO Sam Altman briefly upended the company. Internally, she was seen as a pragmatic leader who kept OpenAI stable through uncertainty. Her decision to strike out on her own follows a broader shift in the AI research landscape, where the breakneck race to train ever-larger models is giving way to an era of applied AI, agentic systems and real-world execution. Her Thinking Machines announcement comes amid a flurry of new model capabilities and benchmark-shattering releases. OpenAI continues to make innovative breakthroughs — including its new o3-powered Deep Research mode, which scours the web and curates extensive reports — but it also faces strong competition in all directions. Just today, for instance, xAI released Grok 3, which rivals GPT-4o’s performance. With OpenAI cofounder Ilya Sutskever launching Safe Superintelligence and other industry veterans charting their own paths, the question now is where Thinking Machines will place its bets in this evolving landscape.


Elon Musk just released an AI that’s smarter than ChatGPT — here’s why that matters

Elon Musk’s artificial intelligence startup xAI has unveiled Grok 3, its latest AI model, which the company claims outperforms leading competitors across key technical benchmarks. The announcement marks a significant escalation in the race to develop more powerful AI systems. The launch comes just days after Musk’s failed $97.4 billion bid to acquire OpenAI, the company he co-founded with Sam Altman in 2015. During a livestreamed demonstration on X, Musk characterized Grok 3 as “an order of magnitude more capable than Grok 2” and emphasized its ability to reason through complex problems. Early testing appears to support some of xAI’s claims. The model topped the influential Chatbot Arena leaderboard, scoring higher than OpenAI’s GPT-4o, Google’s Gemini and DeepSeek’s V3 model in blind user testing. Published benchmarks show Grok 3 achieving superior scores in mathematics (AIME ’24), scientific reasoning (GPQA) and coding tasks.

Grok 3 leads the Chatbot Arena leaderboard with a score of approximately 1400, significantly outperforming other major AI models in blind user testing. (Source: xAI)

Inside Grok 3’s massive computing infrastructure: 200,000 GPUs and a new data center

“Grok 3 clearly has around state of the art thinking capabilities,” wrote former OpenAI researcher Andrej Karpathy in an X post after early-access testing. “Few models get this right reliably. The top OpenAI thinking models get it too, but all of DeepSeek-R1, Gemini 2.0 Flash Thinking, and Claude do not.” The model’s development required massive computational resources. xAI doubled its GPU cluster to 200,000 Nvidia chips for training, housed in a new Memphis data center. This infrastructure investment highlights the increasing computational demands of advanced AI development, as companies race to build more capable systems.

DeepSearch and advanced reasoning: how Grok 3 aims to outsmart ChatGPT and Google Gemini

A key innovation is Grok 3’s “DeepSearch” feature, which combines web searching with reasoning capabilities to analyze information from multiple sources. The system also includes specialized modes for complex problem-solving, including a “Think” function that shows its reasoning process and a “Big Brain” mode that allocates additional computing power to difficult tasks. “The thing to really pay attention to in AI is learning speed. And @xai is learning way faster than any other,” posted tech industry veteran Robert Scoble, citing a conversation with Apple Siri cofounder Tom Gruber. However, some limitations emerged during testing. Karpathy noted that the model sometimes fabricates citations and struggles with certain types of humor and ethical reasoning tasks.
These challenges are common across current AI systems and highlight the ongoing difficulties in developing truly human-like artificial intelligence. Scale AI CEO Alexandr Wang praised the release, tweeting: “Grok 3 is a new best model in the world from the @xai team!” He noted its superior performance on various benchmarks and expressed enthusiasm for future collaboration.

AI industry competition heats up: what Grok 3’s launch means for OpenAI, DeepSeek and the future of artificial intelligence

The model will be available through X’s Premium+ subscription ($40/month) and a new standalone “SuperGrok” service ($30/month). Enterprise API access is planned for the coming weeks. This launch intensifies competition in the AI industry, particularly as Chinese startup DeepSeek recently demonstrated comparable performance with reportedly lower computational requirements. The development also raises questions about the sustainability of the computational arms race in AI, as companies invest billions in increasingly powerful hardware infrastructure.

In key performance benchmarks, Grok 3 and its mini variant show superior scores across mathematics, science and coding tests compared to competing models from Google, OpenAI, Anthropic and DeepSeek. The full-size Grok 3 model (dark blue) achieved particularly strong results in scientific reasoning. (Source: xAI)

Musk emphasized that Grok 3 remains in beta, with improvements expected “almost every day.” The company plans to add voice interaction capabilities within weeks and will open-source its previous model, Grok 2, once the new version stabilizes. Yet perhaps the most telling aspect of Grok 3’s debut isn’t its technical specifications or benchmark scores, but what it represents: the mounting tension between Musk and his former colleagues at OpenAI. Just days after his failed $97.4 billion bid to acquire OpenAI, Musk has unveiled a model that challenges its supremacy — suggesting that in the high-stakes race for AI dominance, even a rejected suitor can become a formidable rival.


Snowflake expands AI tools with Anthropic partnership — what it means for businesses

Snowflake and Anthropic have unveiled a major partnership to embed AI agents directly into corporate data environments, empowering businesses to analyze vast amounts of information while maintaining strict security controls. The companies will integrate Anthropic’s Claude 3.5 Sonnet model into Snowflake’s new Cortex Agents platform, allowing organizations to deploy AI systems that can analyze both structured database information and unstructured content like documents within their existing security frameworks. “We believe that AI agents will soon be essential to the enterprise workforce,” Baris Gultekin, head of AI at Snowflake, said during a media roundtable. “They’ll enhance the productivity for many teams, such as customer support, analytics and engineering, and they’ll free up employee time to focus on higher-value things.”

Snowflake strengthens AI capabilities with Anthropic’s Claude 3.5

The partnership addresses a crucial challenge in enterprise AI adoption: deploying powerful AI models securely at scale. Claude will run entirely within Snowflake’s security boundary, eliminating concerns about sending sensitive data to external AI services. “Running Claude within Snowflake’s security perimeter allows customers to build and deploy AI applications while keeping their data governed,” said Mike Krieger, Anthropic’s chief product officer, during the press conference. Early results show promise. Snowflake reports 90% accuracy on complex text-to-SQL tasks in internal benchmarks, significantly outperforming previous approaches. Siemens Energy has already built an AI chatbot analyzing more than half a million pages of internal documents, while Nissan North America achieved 97% accuracy in analyzing customer sentiment about dealer experiences.

How Snowflake is using AI to automate business data analysis

Cortex Agents orchestrates complex data tasks across both structured databases and unstructured content. The system combines two key components: Cortex Analyst, which converts natural language into accurate database queries, and Cortex Search, a hybrid search system that Snowflake claims outperforms competitors by at least 11% on standard benchmarks. “Having such a state-of-the-art model available to Snowflake customers contributes to the ease-of-use experience,” said Christian Kleinerman, EVP of product at Snowflake. “Instead of which model to use, and how many prompts I need to go push to get something to behave the way I want it, or answer the question I need… it is phenomenal.”

Snowflake’s Cortex Agents promise smarter, faster enterprise AI

The partnership signals a shift in enterprise AI strategy. Companies now seek to integrate AI directly into existing data infrastructure, rather than treating it as separate technology. “Nobody is looking for just a token vendor that exchanges input tokens for output tokens,” Krieger explained. “They’re looking for somebody who will help them craft their AI strategy and do so in a way that’s aligned with their values, and also that they trust to remain on the frontier.” The platform includes comprehensive monitoring capabilities and maintains existing access controls and compliance requirements — crucial features as AI regulation evolves. “Some amount of regulatory clarity would be helpful,” Kleinerman noted during the announcement.
“But I think it’s on all of us, especially research labs that understand this in next-level detail, to be involved in helping inform how that regulation is formed.”

Why Snowflake’s AI strategy focuses on security and governance

The partnership offers technical decision-makers a potential path to deploying AI at scale while maintaining security and governance. Success will likely depend on careful implementation and clear use cases that deliver measurable business value. For enterprises grappling with growing data volumes and complexity, the ability to deploy AI safely and effectively could become a crucial competitive advantage. The platform’s combination of advanced AI capabilities with robust security controls suggests a future where intelligent agents become an integral part of corporate operations.
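For a concrete sense of the pattern Gultekin and Krieger describe, where the model runs inside the data platform rather than data leaving for an external API, here is a minimal, hedged sketch in Python. It uses Snowflake's documented SNOWFLAKE.CORTEX.COMPLETE SQL function via the standard Python connector rather than the Cortex Agents interface itself; the table, column and model names are assumptions for illustration.

```python
# Hedged sketch: text-to-SQL inside the Snowflake boundary.
# Assumptions: a SALES(region, revenue, quarter) table exists, and
# "claude-3-5-sonnet" is an available Cortex model in your account.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",  # placeholder credentials
    warehouse="my_wh", database="my_db", schema="public",
)
cur = conn.cursor()

question = "Which region had the highest ticket revenue last quarter?"
prompt = (
    "Write one SQL query over the table SALES(region, revenue, quarter) "
    f"that answers: {question}. Return only the SQL."
)

# The model call executes inside Snowflake's security perimeter;
# no data is sent to an external AI service.
cur.execute("SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)", ("claude-3-5-sonnet", prompt))
generated_sql = cur.fetchone()[0]
print(generated_sql)  # review, then execute under existing access controls
```

Because the query runs as ordinary SQL, the generated statement inherits the caller's existing roles and row-level policies, which is the governance point the article emphasizes.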


Microsoft’s Muse AI can design video game worlds after watching you play

Microsoft researchers have achieved what many in artificial intelligence considered a distant goal: teaching AI to understand and interact with three-dimensional spaces the way humans do. The breakthrough comes in the form of Muse, an AI model that can comprehend and generate complex gameplay sequences while maintaining consistent physics and character behaviors. The model, detailed in a paper published in Nature, learned entirely from observing human gameplay data — over seven years’ worth — from the Xbox game Bleeding Edge. Unlike traditional AI models that work with text or static images, Muse develops what researchers call a “practical understanding” of how objects, characters and environments interact in three-dimensional space over time.

Three key capabilities of Microsoft’s Muse AI system: consistency in physics, diversity in outcomes and persistence of user modifications. (Credit: Microsoft)

How Microsoft’s Muse AI sees, learns and plays like a human

“The model architecture is agnostic to the game; the only requirement is access to an appropriate dataset,” said Katja Hofmann, senior principal research manager at Microsoft Research, in an exclusive interview with VentureBeat. “We designed the model to use the most general data format, which we call the ‘human interface’ of visuals and controller actions.” This approach allows Muse to generate consistent gameplay sequences lasting up to two minutes — a significant technical achievement in maintaining coherent 3D world interactions over extended periods. The system can take just one second of game visuals as input and generate complex scenarios that respect game physics and character behaviors. However, limitations exist. “Image resolution is fixed to 300×180 pixels,” Hofmann told VentureBeat. “There is a trade-off between model size and speed, meaning that our largest and most consistent models are also slowest at inference time.”

Beyond gaming: how Muse could shape architecture, retail and manufacturing

The development of Muse was shaped by extensive input from game creators. Microsoft researchers interviewed 27 game developers globally, including studios from both developed and developing nations, to ensure the technology would serve real creative needs. Beyond gaming, Microsoft sees broader applications for the technology. Peter Lee, president of Microsoft Research, highlighted in a blog post potential uses in architecture, retail and manufacturing: “From reconfiguring the kitchen in your home to redesigning a retail space to building a digital twin of a factory floor to test and explore different scenarios. All these things are just now becoming possible with AI.” “The main limitation for applications beyond gaming is access to high-quality data,” Hofmann told VentureBeat. “Gaming is an excellent application area for driving advances, because large amounts of high-quality data can typically be collected more easily than in other 3D environments.”

Preserving gaming history and empowering future creators

For the gaming industry specifically, Xbox is exploring how this technology could help preserve classic games. “Thanks to this breakthrough, we are exploring the potential for Muse to take older back catalog games from our studios and optimize them for any device,” said Fatima Kardar, corporate vice president of gaming AI at Microsoft, in a blog post.
The model achieves three key technical innovations: maintaining coherent physics and game mechanics over extended sequences; generating multiple varied but plausible continuations from the same starting point; and allowing users to modify generated content while preserving those changes consistently. “I am personally fascinated by Muse’s ability to learn a detailed understanding of a complex 3D environment purely from observing human gameplay data,” Hofmann said. “Our research demonstrates an exciting step towards novel interactive experiences crafted by creatives that are hyper-personalized to and by their players.” Microsoft is releasing the model weights and a demonstrator tool to researchers and creatives under a Microsoft Research License, though this is not yet an enterprise customer offering. This release aims to encourage further research and exploration of the technology’s capabilities. The development signals a broader shift in AI capabilities: from understanding static content like text and images to comprehending dynamic 3D environments and human interactions. This could have far-reaching implications for how we design and interact with virtual spaces across industries. As Microsoft moves to productize this research, it emphasizes that human creativity remains central. The technology is positioned as an assistive tool rather than a replacement for human game designers, aiming to augment rather than automate the creative process.
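To make the “human interface” idea concrete, here is an illustrative Python sketch of the loop the article describes: roughly a second of frames at the paper's fixed 300×180 resolution plus controller actions in, next frame out, repeated to build a multi-minute sequence. The class, method and frame-rate numbers are hypothetical stand-ins, not Microsoft's actual API.

```python
# Illustrative sketch of a Muse-style world-model interface (not Microsoft's API).
import numpy as np

class WorldModelStub:
    """Stand-in for a model mapping (frames, controller actions) -> next frame."""
    HEIGHT, WIDTH = 180, 300  # resolution fixed at 300x180 pixels, per the article

    def predict_next_frame(self, frames: list, actions: list) -> np.ndarray:
        # A real model autoregressively generates the next frame conditioned
        # on visual context and controller inputs; this stub returns black.
        return np.zeros((self.HEIGHT, self.WIDTH, 3), dtype=np.uint8)

def random_controller_action() -> np.ndarray:
    # Hypothetical encoding: 12 buttons/stick directions as a binary vector.
    return np.random.randint(0, 2, size=12)

model = WorldModelStub()
# Seed with ~1 second of game visuals, as the article describes.
frames = [np.zeros((WorldModelStub.HEIGHT, WorldModelStub.WIDTH, 3), np.uint8)]
actions = []
for _ in range(1200):  # ~2 minutes at an assumed 10 fps
    actions.append(random_controller_action())
    frames.append(model.predict_next_frame(frames, actions))
```

The appeal of this interface is its generality: any game (or, per Lee's examples, any instrumented 3D environment) that can log visuals plus inputs produces training data in the same shape.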


Decoding OpenAI’s Super Bowl ad and Sam Altman’s grandiose blog post

If you were in one of the nearly 40 million U.S. households that tuned into NFL Super Bowl LIX this year, in addition to watching the Philadelphia Eagles trounce the Kansas City Chiefs, you may have caught an advertisement for OpenAI. This is the company’s first Super Bowl ad, and it cost a reported $14 million — in keeping with the astronomical sums commanded by ads during the big game, which some viewers tune in for instead of the football. The ad depicts various advancements throughout human history, leading up to ChatGPT and what OpenAI calls the “Intelligence Age.” While reaction to the ad was mixed — I’ve seen more praise and defense for it than criticism in my feeds — it clearly indicates that OpenAI has arrived as a major force in American culture, and quite obviously seeks to connect itself to a long lineage of invention, discovery and technological progress that’s taken place here.

Inoffensive and simple, or dramatic and stark?

On its own, the OpenAI Super Bowl ad seems to me a totally inoffensive and simple message designed to appeal to the widest possible audience — perfect for the Super Bowl and its large audience across demographics. In a way, it’s even so smooth and uncontroversial that it is forgettable. But coupled with a blog post OpenAI CEO Sam Altman published on his personal website earlier on Sunday entitled “Three Observations,” suddenly OpenAI’s assessment of the current moment and the future becomes much more dramatic and stark. Altman begins the blog post with a pronouncement about artificial general intelligence (AGI), the raison d’être of OpenAI’s founding and its ongoing efforts to release more and more powerful AI models such as the latest o3 series. This pronouncement, like OpenAI’s Super Bowl ad, also seeks to connect OpenAI’s work building these models and approaching the goal of AGI with the history of human innovation more broadly. “Systems that start to point to AGI* are coming into view, and so we think it’s important to understand the moment we are in. AGI is a weakly defined term, but generally speaking we mean it to be a system that can tackle increasingly complex problems, at human level, in many fields. People are tool-builders with an inherent drive to understand and create, which leads to the world getting better for all of us. Each new generation builds upon the discoveries of the generations before to create even more capable tools — electricity, the transistor, the computer, the internet and soon AGI.” A few paragraphs later, he even seems to concede that AI — as many developers and users of the tech agree — is simply another new tool. Yet he immediately flips to suggest this may be a much different tool than anyone in the world has ever experienced to date. As he writes: “In some sense, AGI is just another tool in this ever-taller scaffolding of human progress we are building together.
In another sense, it is the beginning of something for which it’s hard not to say ‘this time it’s different’; the economic growth in front of us looks astonishing, and we can now imagine a world where we cure all diseases, have much more time to enjoy with our families and can fully realize our creative potential.”

OpenAI’s (and others’) quest for longevity

The idea of “curing all diseases” is certainly appealing — it mirrors something rival tech boss Mark Zuckerberg of Meta also set out to do with his Chan Zuckerberg Initiative, a medical research nonprofit co-founded with his wife, Priscilla Chan. As of two years ago, the timeline proposed to reach this goal was 2100. Yet now, thanks to the progress of AI, Altman seems to believe it’s attainable even sooner, writing: “In a decade, perhaps everyone on earth will be capable of accomplishing more than the most impactful person can today.” Altman and Zuckerberg are hardly the only high-profile tech billionaires interested in medicine, and longevity science in particular. Google’s co-founders, especially Sergey Brin, have put money toward analogous efforts; in fact, at one point so many leaders in the tech industry were interested in prolonging human life and ending disease that back in 2017, The New Yorker magazine ran an extensive feature: “Silicon Valley’s Quest to Live Forever.” This utopian notion of ending disease and ultimately death seems patently hubristic to me on the face of it — how many folklore stories and fairy tales are there about the perils of trying to cheat death? — but it aligns neatly with the larger techno-utopian beliefs of some in the industry, which have been helpfully grouped by AGI critics and researchers Timnit Gebru and Émile P. Torres under the umbrella term TESCREAL, an acronym for “transhumanism, extropianism, singularitarianism, (modern) cosmism, rationalism, effective altruism and longtermism.” As these authors elucidate, the veneer of progress sometimes masks uglier beliefs, such as the inherent superiority or humanity of those with higher IQs or specific demographics, ultimately evoking the racial science and phrenology of more openly discriminatory and oppressive ages past. There’s nothing to suggest in Altman’s note that he shares such beliefs, mind you… in fact, rather the opposite. He writes: “Ensuring that the benefits of AGI are broadly distributed is critical. The historical impact of technological progress suggests that most of the metrics we care about (health outcomes, economic prosperity) get better on average and over the long-term, but increasing equality does not seem technologically determined and getting this right may require new ideas.” In other words: He wants to ensure everyone’s life gets better with AGI, but is uncertain how to achieve that. It’s a laudable notion, and one that maybe AGI itself could help answer. But, for one thing, OpenAI’s latest and greatest models remain closed and proprietary as opposed to competitors such


Perplexity just made AI research crazy cheap—what that means for the industry

Perplexity has shattered the AI market’s status quo today by launching Deep Research, a tool that generates comprehensive research reports in minutes and opens advanced AI capabilities to users at a fraction of typical enterprise costs. “Thankful for open source! We’re going to keep making this faster and cheaper,” Perplexity CEO Aravind Srinivas wrote in a post on X. “Knowledge should be universally accessible and useful. Not kept behind obscenely expensive subscription plans that benefit the corporates, not in the interests of humanity!”

Perplexity Deep Research is redefining AI pricing — can enterprise AI survive?

The launch exposes a painful truth in AI pricing: Expensive enterprise subscriptions may be unnecessary. While Anthropic and OpenAI charge thousands monthly for their services, Perplexity offers five free queries daily to all users. Pro subscribers pay $20 monthly for 500 daily queries and faster processing — a price point that could force larger AI companies to explain why their services cost up to 100 times more. Companies have been significantly increasing their AI investments, with enterprise AI spending expected to rise by 5.7% in 2025, despite overall IT budget increases of less than 2%. Some businesses are planning to increase their AI spending by 10% or more, with an average increase of $3.4 million dedicated to AI initiatives. These investments now look questionable as Perplexity delivers similar capabilities at consumer prices.

In a typical query, Perplexity’s Deep Research tool performs 8 searches and consults 42 sources to generate a 1,300-word report in under 3 minutes. (Credit: Perplexity)

How Perplexity Deep Research is outperforming Google and OpenAI

Deep Research’s technical achievements suggest expensive AI services may be overpriced rather than superior. The system scored 93.9% accuracy on the SimpleQA benchmark and reached 20.5% on Humanity’s Last Exam, outperforming Google’s Gemini Thinking and other leading models. OpenAI’s Deep Research still leads with 26.6% on the same exam, but OpenAI charges $200 per month for that service. Perplexity’s ability to deliver near-enterprise-level performance at consumer prices raises important questions about the AI industry’s pricing structure. “Deep Research on Perplexity completes most tasks in under 3 minutes,” the company announced, highlighting its ability to perform dozens of searches and analyze hundreds of sources simultaneously. The tool combines web search, coding capabilities and reasoning functions to refine research iteratively — mimicking expert human researchers but at machine speed.

Perplexity Deep Research achieved a 93.9% accuracy score on the SimpleQA benchmark, substantially higher than competing models from OpenAI, Google and Anthropic. (Credit: Perplexity)

Why Perplexity’s affordable AI is breaking down barriers to advanced technology

The implications stretch beyond pricing. Enterprise AI has created a digital divide between well-funded companies and everyone else.
Small businesses, researchers and professionals who couldn’t afford thousand-dollar subscriptions were effectively locked out of advanced AI capabilities. Perplexity’s approach changes this calculation. The tool handles complex tasks, from financial analysis and market research to technical documentation and healthcare insights. Users can export findings as PDFs or share them through Perplexity’s platform, potentially replacing expensive research subscriptions and specialized tools. The company plans to expand Deep Research to iOS, Android and Mac platforms, which could accelerate adoption among users who previously viewed AI tools as out of reach. This broad access may prove more valuable than any technical breakthrough — finally putting advanced AI capabilities in the hands of the users who need them most. For technical decision-makers, this shift demands attention. Companies paying premium prices for AI services should examine whether those investments deliver value beyond what Perplexity now offers at a fraction of the cost. The answer may reshape how organizations approach AI spending in 2025 and beyond. While Perplexity’s competitors scramble to justify their premium pricing, thousands of users are already testing Deep Research’s capabilities. Their verdict might matter more than any benchmark: In AI’s new reality, the best technology isn’t the one that costs the most — it’s the one people can actually use.


Fast break AI: How Databricks helped the Pacers slash ML costs 12,000X while speeding up insights

Stats might be everything in basketball — but for Pacers Sports and Entertainment (PS&E), data about fans is just as valuable. Yet while the parent company of the Indianapolis Pacers (NBA), the Indiana Fever (WNBA) and the Indiana Mad Ants (NBA G League) was pumping untold amounts of it into a $100,000-a-year machine learning (ML) platform to generate predictive models around such factors as pricing and ticket demand, the insights weren’t coming fast enough. Jared Chavez, manager of data engineering and strategy, set out to change that, making the move to Databricks on Salesforce a year and a half ago. Now? His team is performing the same range of predictive projects with careful compute configurations to gain critical insights into fan behavior — for just $8 a year. It’s a jaw-dropping, seemingly unthinkable decrease that Chavez credits largely to his team’s ability to reduce ML compute to near-infinitesimal amounts. “We’re very good at optimizing our compute and figuring out exactly how far we can push down the limit to get our models to run,” he told VentureBeat. “That’s really what we’ve been known for with Databricks.”

PS&E cuts OpEx by 98%

In addition to its three basketball teams, the Indianapolis-based PS&E hosts March Madness games and runs a busy, 300-plus-day event business through the Gainbridge Fieldhouse arena (concerts, comedy shows, rodeos, other sporting events). Further, the company just last month announced plans to build a $78 million Indiana Fever Sports Performance Center, which will be connected by skybridge to the arena and a parking garage (expected to open in 2027). All this makes for a mind-boggling amount of data — and data sprawl. From a data infrastructure standpoint, Chavez pointed out that, up until two years ago, the organization hosted two completely independent warehouses built on Microsoft Azure Synapse Analytics. Different teams across the business all used their own form of analytics, and tooling and skill sets varied wildly. While Azure Synapse did a great job connecting to external platforms, it was cost-prohibitive for an organization of PS&E’s size, he explained. Also, integrating the company’s ML platform with Microsoft Azure Data Studio led to fragmentation. To address these problems, Chavez switched over to Databricks AutoML and the Databricks Machine Learning Workspace in August 2023. The initial focus was to configure, train and deploy models around ticket pricing and game demand. Both technical and non-technical users immediately found the platforms helpful, Chavez noted, and they quickly sped up the ML process (and sent costs plummeting). “It dramatically improves response times for my marketing team, because they don’t have to know how to code,” said Chavez. “It’s all buttons for them, and all that data comes back down to Databricks as unified records.” Further, his team organized the company’s 60-some-odd systems into Salesforce Data Cloud. Now, he reports that they have 440X more data in storage and 8X more data sources in production. PS&E today operates at just under 2% of its previous annual OpEx costs. “We saved hundreds of thousands a year just on operations,” said Chavez. “We reinvested it into customer data enrichment.
We reinvested into better tooling for not just my team, but the analytics units around the company.”

Continued refinement, deep understanding of data

How did his team get compute so staggeringly low? Databricks has continually refined cluster configurations, enhanced connectivity options to schemas and integrated model outputs back into PS&E’s data tables, Chavez explained. The powerful ML engine is “continuously enriching, refining, merging and predicting” on PS&E’s customer records across every system and revenue stream. This leads to better-informed predictions with each iteration — in fact, the occasional AutoML model makes it straight to production without any further tweaking from his team, Chavez reported. “Truthfully, it’s just knowing the size of the data going in, but also roughly how long it is going to take to train,” said Chavez. He added: “It’s on the smallest cluster size you could possibly run, it might just be a memory-optimized cluster, but it’s just knowing Apache Spark fairly well and knowing which way we could store and read the data fairly optimally.”

Who’s most likely to buy season tickets?

One way Chavez’s team is using data, AI and ML is in propensity scoring for season ticket packages (see the sketch below). As he put it: “We sell an ungodly number of them.” The goal is to determine which customer characteristics influence where they choose to sit. Chavez explained that his team is geolocating addresses they have on file to make correlations between demographics, income levels and travel distances. They’re also analyzing users’ purchase histories across retail, food and beverage, mobile app engagement and other events they might attend on PS&E’s campus. Further, they’re pulling in data from StubHub, SeatGeek and other vendors outside of Ticketmaster to evaluate price points and determine how well inventories are moving. This can all be married with everything they know about a given customer to figure out where they’re going to sit, Chavez explained. Armed with that data, they could then, for instance, upsell a given customer from Section 201 to Section 101. “Now we’re able to not only resell his seat in the higher deck, we can also sell another smaller package on the same seats he purchased in the mid-season, using the same characteristics for another person,” said Chavez. Similarly, data can be used to enhance sponsorships, which are critical to any sports franchise. “Of course, they want to align with organizations who overlap with theirs,” said Chavez. “So can we better enrich? Can we better predict? Can we do custom segmentation?” Ideally, the goal is an interface where any user could ask questions like: ‘Give me a section of the Pacers fan base in their mid-to-late 20s with disposable income.’ Going even further: ‘Look for those that make more than $100K a year and have an interest in luxury vehicles.’ The interface could then bring back a
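As a rough illustration of the propensity-scoring setup described above, here is a minimal sketch using Databricks AutoML's Python API on a Spark table of fan features. The table and column names are hypothetical; the automl.classify call follows Databricks' published interface, and the small time budget mirrors the team's keep-compute-tiny approach.

```python
# Hedged sketch: season-ticket propensity scoring with Databricks AutoML.
# Runs in a Databricks notebook, where `spark` is provided by the runtime.
from databricks import automl

# Hypothetical feature table: demographics, travel distance, purchase history, etc.
train_df = spark.table("analytics.fan_features")

summary = automl.classify(
    dataset=train_df,
    target_col="bought_season_tickets",  # hypothetical binary label
    timeout_minutes=30,                  # a small budget keeps cluster costs low
)

# The best trial can be registered or served; downstream, scores flow back
# into unified customer records for segmentation and upsell targeting.
print(summary.best_trial.model_path)
```

Sizing the cluster to the known data volume and expected training time, as Chavez describes, is what keeps a run like this in the dollars-per-year range rather than the thousands.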


20,000 AI agents per company: How will they all work together?

Presented by Outshift by Cisco

AI agents are poised to transform how we work, but there’s a critical challenge ahead: getting them to work together effectively at massive scale. As organizations deploy thousands of AI agents across their operations, the need for a new kind of internet — the Internet of Agents (IoA) — has become increasingly urgent, says Vijoy Pandey, SVP of Outshift by Cisco. “You’ll have agents sitting within all software, whether it’s business software, personal software, avatars on the social network or embodied AI sitting inside robotic entities and doing physical work,” Pandey says. “In the B2B context, all SaaS in the future is going to have multiple agents within it, agentic software from different providers coming together to talk to each other to solve a business outcome.” This is why we need the Internet of Agents: an open, interoperable internet that will revolutionize how agents collaborate in a quantum-safe way. But doing that at scale is the central challenge, he says. It’s not just about the number of agents — an average-size organization will soon deploy upwards of 20,000 agents, and an enterprise could be handling hundreds of thousands. It’s about enabling them to communicate and interact effectively across vendor boundaries, security domains and profiles.

Building ensembles of agents

The IoA, powered by sophisticated LLMs and machine learning algorithms, needs to not only understand user intent but proactively act on it, communicate and seamlessly collaborate in a multi-agent framework, and stitch together workflows to execute a broad array of tasks across domains, all without explicit commands. Each individual agent can be considered a subject matter expert — a master artisan, even, in its own particular context and domain, Pandey says. The challenge is bringing each of these disparate, specialized agents together in order to activate this future of collaboration and advance the power of agentic AI. “A collection of these agents needs to come together, whether it’s a symphony or a trio, pick the right agents to solve for the problem at hand,” Pandey says. “How do you deal with that in communication and scaling out? There are issues of identity, trust, authentication. Then you think about planning and composition. Which skill sets do you need to bring together to build out your symphony or your quartet?” While today’s internet excels at sharing data between humans and systems, AI agents need to share complex states and reasoning processes and make coordinated decisions in real time — all while maintaining security and trust. This collaboration framework is fundamentally different from how current internet infrastructure works. Each agent has a set of tools it can access, training and an inherent skill set driven by the LLM. It lives in an environment of data sources, institutional knowledge bases, techniques like RAG and dynamic database access, all of which need to be brought into the equation. And communication is now probabilistic in nature, after decades of deterministic software — and every agent may have its own language.
But even when the language hurdle is managed, every agent might interpret things differently, depending on its context. And finally, there’s a massive amount of state being exchanged, with multimodal agents exchanging video, images and text to manage an end-to-end workflow.

Why an open ecosystem remains critical

Pandey uses code development as an example of why universal infrastructure and tooling for AI agents is critical. Generating and productionizing code is a laborious process, requiring syntactic checkers, semantic checkers, security agents, scaling agents and compliance agents before it hits production and runtime. A developer can pick the best of breed of these from various providers, stitch them together, build out code and push it to production. However, these agents aren’t fully contained within one provider’s wheelhouse, nor even within each individual provider’s training and data set capabilities, because the developer’s code base and APIs are included in development. And that whole production needs to scale out in a trustworthy, scalable, explainable and secure way. “All of these things can happen, and the best way possible is to build it in an open, interoperable manner,” Pandey says. “Because what does openness give us? No walled gardens, which means it maximizes value for every entity in that value chain. For the software developer in this case, for the operator, for the customer and consumer.”

Moving from closed gardens to open systems

Open ecosystems are the basis of technological progress. The internet began with proprietary systems like Solaris and Sun boxes and proprietary databases, but it actually took off when open-source, open-standard technologies like Linux, Apache, MySQL, PHP and the LAMP stack appeared. Cloud took off only when cloud-native open-source ecosystems came about, like Kubernetes and Docker containers, and were adopted by everybody. “We’re at the same point here where yes, you can build proprietary systems, but if you want to expedite the development of agentic software, and if you want to expedite the way these pieces of software interact with each other inside an organization, all across various providers, the only way to do this is an open, interoperable way,” Pandey says. “Open source to drive the development of these pieces of software, and open source or specification-based outcomes for discoverability, reputation, identity, risk management and communication. If you don’t have those things in an open standards way, then these things don’t communicate with each other. We will not have the Internet of Agents until that happens.”
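None of the specifications Pandey calls for exist yet as settled standards, but a toy sketch can make the requirements concrete. Here, hypothetically, is the kind of envelope an inter-agent message might need: identity, advertised capability, shared state and a trust proof. Every field name below is invented for illustration, not drawn from any actual Internet of Agents specification.

```python
# Toy sketch of an inter-agent message envelope. All field names are
# hypothetical illustrations, not an actual Internet of Agents standard.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class AgentMessage:
    sender_id: str                 # verifiable identity (e.g., a certificate subject)
    capability: str                # advertised skill the sender claims
    intent: str                    # what the sender wants done
    state: dict[str, Any] = field(default_factory=dict)  # shared workflow/reasoning state
    modality: str = "text"         # text, image or video, since state is multimodal
    trust_proof: bytes = b""       # signature from an identity/reputation service

# One security agent in a code-production "ensemble" asking another to act:
msg = AgentMessage(
    sender_id="agent://provider-a/security-checker",
    capability="security-review",
    intent="Scan the attached diff for injection risks before production.",
    state={"workflow_id": "wf-42", "step": 3},
)
print(msg.intent)
```

The point of the sketch is that identity, capability discovery and trust are fields of the protocol itself, not add-ons, which is exactly why Pandey argues they must be standardized in the open.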


Identify key challenges, sandbox, assess vendors — how to accelerate your AI journey

With 77% of companies already using or exploring the use of AI, and more than 80% claiming it’s a top priority, leaders are eager to get maximum value from the technology. However, the volume of solutions available and the onslaught of marketing messages accompanying them can make finding a clear path difficult. Here are some guidelines to help you evaluate AI tools’ capabilities and determine the best fit for your organization. When the media lauds a particular platform, or you discover your competitors are using the same one, it’s natural to wonder if you should, too. But before examining a new system, identify the problems your business is facing. What are its key challenges? Its core needs? Once you’ve redirected your focus, reframe the solution you’re considering through this lens. If AI technology will solve well-defined, measurable issues your company has been encountering (say, automating routine tasks or increasing team productivity), the tool is worth exploring. If it doesn’t directly connect to solving your problems, move on. AI can be incredibly powerful, but it does have limitations. Your goal should be to apply it only to the areas where it can make the most meaningful impact.

Pilot programs and experimental budgets

When you’ve determined that a given system may strategically support your needs, you’ve fulfilled the first necessary criterion — but this doesn’t mean you’re ready to make a purchase. The next step is to take time to test the technology thoroughly through a small-scale pilot program to determine its efficacy. The most valuable testing uses a framework tied to crucial key performance indicators (KPIs). According to Google Cloud: “KPIs are essential in gen AI deployments for a number of reasons: Objectively assessing performance, aligning with business goals, enabling data-driven adjustments, enhancing adaptability, facilitating clear stakeholder communication and demonstrating the AI project’s ROI. They are critical for measuring success and guiding improvements in AI initiatives.” In other words, your testing framework could be based on accuracy, coverage, risk or whichever KPI is most important to you. You just need to have clear KPIs. Once you do, gather five to 15 people to perform the testing; two teams of seven people are ideal. As those experienced individuals begin testing the tools, you will be able to gather enough input to determine whether the system is worth scaling. Leaders often ask what they should do if a vendor isn’t willing to do a pilot program with them. This is a valid question, but the answer is simple: If you find yourself in this situation, do not engage further with the company. Any worthy vendor will consider it an honor to create a pilot program for you. Additionally, plan ahead and set aside funds for an experimental AI budget. This should be where you turn when you want to try various solutions without overcommitting resources. Even if everything seems to be going seamlessly, give your team plenty of time to familiarize themselves with the technology and adapt before making a purchase or scaling up.

Prioritize data security and vendor transparency

When you consider a platform, remember you’re not just evaluating the technology but the company behind it. Vendors should be put through just as much scrutiny — if not more — than the technology itself.
Make sure you only work with vendors that maintain the highest standards of data security. They should adhere to global standards for data protection and ethical AI principles, and the platforms themselves should hold SOC 2 Type 1 and Type 2 and ISO 27001 certifications and comply with the General Data Protection Regulation (GDPR). Furthermore, verify that your vendors aren’t using your company’s data for AI training purposes without explicit consent. Virtual meeting provider Zoom is an example of a popular company that had planned to harvest customer content for use in its AI and ML models. Even though it ultimately didn’t carry out these plans, the incident should raise concerns for enterprises and consumers alike. If you put a dedicated AI lead in charge of this area, this person can manage all data security needs and ensure organizational compliance. This might feel like unnecessary, additional work, but it’s essential. Remember that all it takes is a single data breach by one of your providers to make you lose customer trust — if not your customers.

Final thoughts

Leaders must use a structured approach to assessing AI solutions to get maximum value from them. Focus first on problem-solving, followed closely by testing and pilot programs, data security and identifying tangible value. AI can be immensely powerful, but only when applied to the right problems after careful selection and implementation.

Arjun Pillai is co-founder and CEO of DocketAI.

DataDecisionMakers

Welcome to the VentureBeat community! DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation. If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers. You might even consider contributing an article of your own!


Out-analyzing analysts: OpenAI’s Deep Research pairs reasoning LLMs with agentic RAG to automate work

Enterprise companies need to take note of OpenAI’s Deep Research. It provides a powerful product based on new capabilities, and it is so good that it could put a lot of people out of jobs. Deep Research is on the bleeding edge of a growing trend: integrating large language models (LLMs) with search engines and other tools to greatly expand their capabilities. (Just as this article was being reported, for example, Elon Musk’s xAI unveiled Grok 3, which claims similar capabilities, including a Deep Search product. However, it’s too early to assess Grok 3’s real-world performance, since most subscribers haven’t actually gotten their hands on it yet.) OpenAI’s Deep Research, released on February 3, requires a Pro account with OpenAI, costing $200 per month, and is currently available only to U.S. users. So far, this restriction may have limited early feedback from the global developer community, which is typically quick to dissect new AI advancements. With Deep Research mode, users can ask OpenAI’s leading o3 model any question. The result? A report often superior to what human analysts produce, delivered faster and at a fraction of the cost.

How Deep Research works

While Deep Research has been widely discussed, its broader implications have yet to fully register. Initial reactions praised its impressive research capabilities, despite occasional hallucinations in its citations. One man said he used it to help his wife, who has breast cancer: It provided deeper analysis than her oncologists did on why radiation therapy was the right course of action, he said. The consensus, summarized by Wharton AI professor Ethan Mollick, is that its advantages far outweigh occasional inaccuracies, as fact-checking takes less time than what the AI saves overall. This is something I agree with, based on my own usage. Financial institutions are already exploring applications. BNY Mellon, for instance, sees potential in using Deep Research for credit risk assessments. Its impact will extend across industries, from healthcare to retail, manufacturing and supply chain management — virtually any field that relies on knowledge work.

A smarter research agent

Unlike traditional AI models that attempt one-shot answers, Deep Research first asks clarifying questions. It might ask four or more questions to make sure it understands exactly what you want. It then develops a structured research plan, conducts multiple searches, revises its plan based on new insights, and iterates in a loop until it compiles a comprehensive, well-formatted report. This can take between a few minutes and half an hour. Reports range from 1,500 to 20,000 words, and typically include citations from 15 to 30 sources with exact URLs, at least according to my usage over the past week and a half.

The technology behind Deep Research: reasoning LLMs and agentic RAG

Deep Research does this by merging two technologies in a way we haven’t seen before in a mass-market product. Reasoning LLMs: The first is OpenAI’s cutting-edge model, o3, which leads in logical reasoning and extended chain-of-thought processes. When it was announced in December 2024, o3 scored an unprecedented 87.5% on the super-difficult ARC-AGI benchmark designed to test novel problem-solving abilities. What’s interesting is that o3 hasn’t been released as a standalone model for developers to use.
Indeed, OpenAI CEO Sam Altman announced last week that the model would instead be wrapped into a “unified intelligence” system, which would unite models with agentic tools like search, coding agents and more. Deep Research is an example of such a product. And while competitors like DeepSeek-R1 have approached o3’s capabilities (one of the reasons there was so much excitement a few weeks ago), OpenAI is still widely considered to be slightly ahead. Agentic RAG: The second, agentic RAG, is a technology that has been around for about a year now. It uses agents to autonomously seek out information and context from other sources, including by searching the internet. This can include other tool-calling agents to find non-web information via APIs; coding agents that can complete complex sequences more efficiently; and database searches. Initially, OpenAI’s Deep Research is primarily searching the open web, but company leaders have suggested it will be able to search more sources over time.

OpenAI’s competitive edge (and its limits)

While these technologies are not entirely new, OpenAI’s refinements — enabled by its head start on working on these technologies, massive funding and its closed-source development model — have taken Deep Research to a new level. It can work behind closed doors and leverage feedback from the more than 300 million active users of OpenAI’s popular ChatGPT product. OpenAI has led research in these areas, for example in how to do verification step by step to get better results. And it has clearly implemented search in an interesting way, perhaps borrowing from Microsoft’s Bing and other technologies. While it still hallucinates some results from its searches, it does so less than competitors, perhaps in part because the underlying o3 model itself has set an industry low for hallucinations at 8%. And there are ways to reduce mistakes still further, by using mechanisms like confidence thresholds, citation requirements and other sophisticated credibility checks. At the same time, there are limits to OpenAI’s lead and capabilities. Within two days of Deep Research’s launch, Hugging Face introduced an open-source AI research agent called Open Deep Research that got results that weren’t too far off OpenAI’s — similarly merging leading models and freely available agentic capabilities. There are few moats. Open-source competitors like DeepSeek appear set to stay close in the area of reasoning models, and Microsoft’s Magentic-One offers a framework for most of OpenAI’s agentic capabilities, to name just two more examples. Furthermore, Deep Research has limitations. The product is really efficient at researching obscure information that can be found on the web. But in areas where there is not much online and where domain expertise is largely private — whether in
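To make the plan-search-revise loop described above concrete, here is a minimal, hedged sketch of agentic RAG in Python. The llm and web_search helpers are placeholders for any reasoning model and search tool; this is a generic illustration of the technique, not OpenAI's actual Deep Research pipeline.

```python
# Generic agentic RAG loop: plan, search, revise, then compile a report.
# llm() and web_search() are stand-ins to be wired to real services.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a reasoning model here")

def web_search(query: str) -> list[str]:
    raise NotImplementedError("plug in a search tool here")

def deep_research(question: str, max_rounds: int = 5) -> str:
    # Start from a structured plan of sub-queries. (Deep Research first asks
    # the user clarifying questions; here the model refines on its own.)
    plan = llm(f"List search sub-queries, one per line, for: {question}")
    notes: list[str] = []
    for _ in range(max_rounds):
        for query in plan.splitlines():
            notes.extend(web_search(query))           # gather sources
        # Revise the plan in light of what has been found so far.
        plan = llm(
            "Given these findings, list remaining sub-queries, "
            "or reply DONE if coverage is sufficient:\n" + "\n".join(notes)
        )
        if plan.strip() == "DONE":
            break
    # Compile a cited, well-formatted report from the accumulated notes.
    return llm(
        f"Write a report with citations answering: {question}\n"
        "Use only these sources:\n" + "\n".join(notes)
    )
```

The design point is the feedback loop: unlike one-shot RAG, the model re-plans after each round of retrieval, which is what lets these agents chase down obscure web sources, and also why they stall in domains where little is online.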
