VentureBeat

Dario Amodei challenges DeepSeek’s $6 million AI narrative: What Anthropic thinks about China’s latest AI move

The AI world was rocked last week when DeepSeek, a Chinese AI startup, announced its latest language model DeepSeek-R1, which appeared to match the capabilities of leading American AI systems at a fraction of the cost. The announcement triggered a widespread market selloff that wiped nearly $600 billion from Nvidia's market value and sparked heated debates about the future of AI development.

The narrative that quickly emerged suggested that DeepSeek had fundamentally disrupted the economics of building advanced AI systems, supposedly achieving with just $6 million what American companies had spent billions to accomplish. This interpretation sent shockwaves through Silicon Valley, where companies like OpenAI, Anthropic and Google have justified massive investments in computing infrastructure to maintain their technological edge.

But amid the market turbulence and breathless headlines, Dario Amodei, co-founder of Anthropic and one of the pioneering researchers behind today's large language models (LLMs), published a detailed analysis that offers a more nuanced perspective on DeepSeek's achievements. His blog post cuts through the hysteria to deliver several crucial insights about what DeepSeek actually accomplished and what it means for the future of AI development. Here are the four key insights from Amodei's analysis that reshape our understanding of DeepSeek's announcement.

1. The '$6 million model' narrative misses crucial context

DeepSeek's reported development costs need to be viewed through a wider lens, according to Amodei. He directly challenges the popular interpretation: "DeepSeek does not 'do for $6 million what cost U.S. AI companies billions.' I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few $10s of millions to train (I won't give an exact number). Also, 3.5 Sonnet was not trained in any way that involved a larger or more expensive model (contrary to some rumors)."

This correction fundamentally shifts the narrative around DeepSeek's cost efficiency. Considering that Sonnet was trained 9-12 months ago and still outperforms DeepSeek's model on many tasks, the achievement looks more like the natural progression of AI development costs than a revolutionary breakthrough.

The timing and context also matter significantly. Following historical trends of cost reduction in AI development — which Amodei estimates at roughly 4X per year — DeepSeek's cost structure appears to be largely on trend rather than dramatically ahead of the curve.

2. DeepSeek-V3, not R1, was the real technical achievement

While markets and media focused intensely on DeepSeek's R1 model, Amodei points out that the company's more significant innovation came earlier. "DeepSeek-V3 was actually the real innovation and what should have made people take notice a month ago (we certainly did). As a pretrained model, it appears to come close to the performance of state of the art U.S. models on some important tasks, while costing substantially less to train."

The distinction between V3 and R1 is crucial for understanding DeepSeek's true technological advancement. V3 represented genuine engineering innovations, particularly in managing the model's "Key-Value cache" and pushing the boundaries of the mixture of experts (MoE) method. This insight helps explain why the market's dramatic reaction to R1 may have been misplaced.
R1 essentially added reinforcement learning capabilities to V3's foundation — a step that multiple companies are currently taking with their models.

3. Total corporate investment reveals a different picture

Perhaps the most revealing aspect of Amodei's analysis concerns DeepSeek's overall investment in AI development. "It's been reported — we can't be certain it is true — that DeepSeek actually had 50,000 Hopper generation chips, which I'd guess is within a factor ~2-3X of what the major U.S. AI companies have. Those 50,000 Hopper chips cost on the order of ~$1B. Thus, DeepSeek's total spend as a company (as distinct from spend to train an individual model) is not vastly different from U.S. AI labs."

This dramatically reframes the narrative around DeepSeek's resource efficiency. While the company may have achieved impressive results with individual model training, its overall investment in AI development appears roughly comparable to that of its American counterparts. The distinction between model training costs and total corporate investment highlights the ongoing importance of substantial resources in AI development. It suggests that while engineering efficiency can be improved, remaining competitive in AI still requires significant capital investment.

4. The current 'crossover point' is temporary

Amodei describes the present moment in AI development as unique but fleeting. "We're therefore at an interesting 'crossover point', where it is temporarily the case that several companies can produce good reasoning models," he wrote. "This will rapidly cease to be true as everyone moves further up the scaling curve on these models."

This observation provides crucial context for understanding the current state of AI competition. The ability of multiple companies to achieve similar results in reasoning capabilities is a temporary phenomenon rather than a new status quo. The implications are significant for the future of AI development: as companies continue to scale up their models, particularly in the resource-intensive area of reinforcement learning, the field is likely to once again differentiate based on who can invest the most in training and infrastructure. This suggests that while DeepSeek has achieved an impressive milestone, it hasn't fundamentally altered the long-term economics of advanced AI development.

The true cost of building AI: What Amodei's analysis reveals

Amodei's detailed analysis of DeepSeek's achievements cuts through weeks of market speculation to expose the actual economics of building advanced AI systems. His blog post systematically dismantles both the panic and enthusiasm that followed DeepSeek's announcement, showing how the company's $6 million model training cost fits within the steady march of AI development. Markets and media gravitate toward simple narratives, and the story of a Chinese company dramatically undercutting U.S. AI development costs proved irresistible. Yet Amodei's breakdown reveals a more complex reality: DeepSeek's total investment, particularly its reported $1 billion in computing hardware, mirrors the spending of its American counterparts. This moment of cost parity
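As a back-of-the-envelope check on the 4X-per-year trend Amodei cites, the numbers work out roughly as follows. This is a minimal sketch; the $30 million Sonnet-class figure and the 10-month gap are assumptions standing in for the ranges in his post, not reported values:

```python
# Rough "on trend" cost check, using assumed stand-in figures:
# "a few $10s of millions" -> $30M, and a 9-12 month gap -> 10 months.
sonnet_cost = 30e6        # assumed Sonnet-class training cost, USD
months_later = 10         # assumed gap before DeepSeek's training run
yearly_reduction = 4.0    # Amodei's ~4X/year cost-decline estimate

on_trend_cost = sonnet_cost / yearly_reduction ** (months_later / 12)
print(f"on-trend cost: ${on_trend_cost / 1e6:.1f}M")  # ~ $9.4M
# A reported ~$6M is in the same ballpark, which is why the figure reads
# as trend-following rather than a 100x break in the economics.
```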

Is DeepSeek really sending data to China? Let’s decode

Last week, Chinese startup DeepSeek sent shockwaves through the AI community with its frugal yet highly performant open-source release, DeepSeek-R1. The model uses pure reinforcement learning (RL) to match OpenAI's o1 on a range of benchmarks, challenging the longstanding notion that only large-scale training with powerful chips can lead to high-performing AI.

However, with the blockbuster release, many have also started pondering the implications of the Chinese model, including the possibility of DeepSeek transmitting personal user data to China. The concerns started with the company's privacy policy. Soon, the issue snowballed, with OpenAI technical staff member Steven Heidel indirectly suggesting that Americans love to "give away their data" to the Chinese Communist Party to get free stuff.

The allegations are significant from a security standpoint, but the fact is that DeepSeek can only store data on Chinese servers when the models are used through the company's own ChatGPT-like service. If the open-source model is hosted locally or orchestrated via GPUs in the U.S., the data does not go to China.

Concerns about DeepSeek's privacy policy

In its privacy policy, which was itself unavailable for a couple of hours, DeepSeek notes that the company collects information in different ways, including when users sign up for its services or use them. This means everything from account setup information — names, emails, numbers and passwords — to usage data such as text or audio input prompts, uploaded files, feedback and broader chat history goes to the company.

But that's not all. The policy further states that the information collected will be stored in secure servers located in the People's Republic of China and may be shared with law enforcement agencies, public authorities and others for reasons such as helping investigate illegal activities or simply complying with applicable law, legal process or government requests. The latter point is important, as China's data protection laws allow the government to seize data from any server in the country with minimal pretext. With such a range of information on Chinese servers, many things become possible, including the profiling of individuals and organizations, leakage of sensitive business data and even cyber-surveillance campaigns.

The catch

While the policy can easily raise security and privacy alarms (as it already has for many), it is important to note that it applies only to DeepSeek's own services — apps, websites and software — using the R1 model in the cloud. If you have signed up for the DeepSeek Chat website or are using the DeepSeek AI assistant on your Android or iOS device, there's a good chance that your device data, personal information and prompts so far have been sent to and stored in China. The company has not shared its stance on the matter, but given that the iOS DeepSeek app has been trending at #1, even ahead of ChatGPT, it's fair to say that many people may have already signed up for the assistant to test its capabilities — and shared their data at some level in the process. The Android app of the service has also scored over a million downloads.

DeepSeek-R1 is open-source itself

As for the core DeepSeek-R1 model, there's no question of data transmission. R1 is fully open-source, which means teams can run it locally for their targeted use case through open-source implementation tools like Ollama.
This ensures the model does its job effectively while keeping data restricted to the machine itself. According to Emad Mostaque, founder and former CEO of Stability AI, the R1-distill-Qwen-32B model can run smoothly on the new Macs with 16GB of VRAM.

As an alternative, teams can also use GPU clusters from third-party orchestrators to train, fine-tune and deploy the model — without data transmission risks. One of these is Hyperbolic Labs, which allows users to rent a GPU to host R1. The company also allows inference via a secured API.

That said, if you're looking simply to chat with DeepSeek-R1 to solve a particular reasoning problem, the best way to go right now is Perplexity. The company has just added R1 to its model selector, allowing users to do deep web research with chain-of-thought reasoning. According to Aravind Srinivas, the CEO of Perplexity, the company has enabled this use case for its customers by hosting the model in data center servers located in the U.S. and Europe.

Long story short: your data is safe as long as it's going to a locally hosted version of DeepSeek-R1, whether it's on your machine or a GPU cluster somewhere in the West.
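For reference, a fully local query through Ollama's HTTP endpoint might look like the minimal sketch below. It assumes the model has already been pulled locally under the `deepseek-r1` tag (the exact tag is an assumption):

```python
import requests

# Ollama serves models on localhost:11434; this request never leaves the machine,
# so prompts and outputs stay local rather than going to any remote server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1",  # assumed local tag, e.g. after `ollama pull deepseek-r1`
        "prompt": "Summarize the key idea behind chain-of-thought reasoning.",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```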

Agentic AI needs orchestration: How ServiceNow’s AI orchestrator automates complex enterprise workflows

Agentic AI isn't just the latest AI hype cycle; it's real technology that can make a big difference for enterprise workflows. That's the big bet that ServiceNow has been making and is now doubling down on in a bid to bring a higher return on investment to enterprise AI efforts.

ServiceNow is in the business of enterprise workflow solutions, helping its more than 8,000 global customers with all manner of processes, ranging from human resources to IT service management (ITSM). Over the past two years, the company has been incrementally adding AI services through its Now Assist technology. In 2024, its first agentic AI services debuted, with a library of AI agents in the Now Assist Skill Kit. While the underlying technology is sophisticated, ServiceNow's approach emphasizes invisible integration — AI agents working behind the scenes to enhance enterprise productivity without requiring direct user interaction.

So how do you make agentic AI even more useful? That's where AI orchestration comes in. Today, ServiceNow is announcing its new AI Agent Orchestrator and AI Agent Studio, marking a significant shift from basic generative AI assistance to comprehensive, end-to-end task automation.

What the ServiceNow AI orchestrator is all about

The AI Agent Orchestrator is the "brain" that coordinates and manages the interactions between different AI agents to accomplish complex tasks. The AI Agent Studio is the development and management platform that gives customers the tools to create, deploy and maintain their own custom AI agents. The company plans to release thousands of specialized AI agents designed to handle complex enterprise workflows across IT, HR and customer service functions.

Early adopters report significant efficiency gains, according to ServiceNow. One early user is achieving a 70% reduction in ticket resolution time — cutting average handling time from 30 minutes to seven to eight minutes.

The key to getting a better outcome is to fully realize the promise of agentic AI. And the promise of agentic AI workflows isn't about using just a single agent, but rather a series of them to achieve a business goal. "You can have all these single AI agents being specialized in a specific task, but bringing in the AI agent orchestrator, it's bringing order to the chaos," Dorit Zilbershot, vice president of AI experiences and innovation at ServiceNow, told VentureBeat. "It's making sure that there is some kind of supervision on all these AI agents and that there's an understanding of what's the end-to-end goal or business problem that we're trying to solve."

Why agentic AI needs more than just agents

There is no shortage of interest in agentic AI in 2025. Many large enterprise technology vendors, including big players like Salesforce with its Agentforce platform and Microsoft, have been emphasizing the technology. ServiceNow isn't all that different in recognizing the value of agentic AI for enterprises. Where it differs is in the application. ServiceNow is all about workflows for enterprise processes, which can often involve many different steps. An agent can help to automate components within a specific domain, while AI agent orchestration can coordinate multiple agents across domains in a complicated workflow. An example of how the AI Agent Orchestrator can help with a complex enterprise workflow might be onboarding a new employee.
This involves several steps and tasks that need to be coordinated: setting up the employee's IT accounts and equipment, enrolling them in HR systems and benefits, scheduling training and orientation sessions, and granting access to necessary business systems. With the AI Agent Orchestrator, ServiceNow can create a team of specialized AI agents to handle each of these tasks. For example, an IT agent would provision the laptop, an HR agent would enroll the employee in HR systems, and a training agent would schedule onboarding sessions.

The AI Agent Orchestrator coordinates the handoffs and communication between the agents. It understands the overall onboarding workflow, monitors progress and ensures all the necessary steps are completed successfully. If any issues arise, the orchestrator can troubleshoot, reassign tasks or escalate to human intervention as needed. The system also provides end-to-end visibility and management of the onboarding process.

How agentic AI orchestration works

The idea of chaining together different AI processes or LLMs is not new, either. There are technologies like LangChain that allow organizations to "chain" together multiple LLMs, and there are LLM router technologies that allow different queries to be routed to different models. Zilbershot said that ServiceNow's orchestrator is built entirely on its proprietary platform and does not rely on external frameworks.

She explained that the system incorporates both short-term and long-term memory capabilities; the memory helps provide context for the AI agents. Within the orchestrator platform there are also multiple types of models. The orchestrator uses larger language models for decision-making and planning, then uses smaller LLMs for specific actions like summarization or email generation.

Agentic AI workflows need data

ServiceNow's strategic positioning at the intersection of AI and enterprise automation hinges on data, and specifically the company's Workflow Data Fabric. The Workflow Data Fabric is a foundational technology for ServiceNow. It enables cross-system data access, consistent context maintenance, secure data handling across workflows and integration with existing enterprise systems. "We want to drive agentic AI and conversational experiences across the board, and we're driving everything with the Workflow Data Fabric at its core," Zilbershot said. "We're able to access any data systems as well as any systems for actions, and really create the single place where our customers can manage and orchestrate all their enterprise processes and workflows with the ServiceNow platform."
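ServiceNow hasn't published the Orchestrator's internals, but the coordination pattern described above can be sketched generically. Everything in this example (the class names, tasks and escalation rule) is illustrative, not ServiceNow's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str

    def run(self, task: str) -> bool:
        # A real domain agent would call tools/APIs here (ITSM, HRIS, etc.).
        print(f"[{self.name}] handling: {task}")
        return True

@dataclass
class Orchestrator:
    agents: dict
    log: list = field(default_factory=list)

    def onboard(self, employee: str) -> None:
        # The end-to-end business goal, decomposed into per-domain steps.
        plan = [
            ("it", f"provision laptop and accounts for {employee}"),
            ("hr", f"enroll {employee} in HR systems and benefits"),
            ("training", f"schedule orientation sessions for {employee}"),
        ]
        for domain, task in plan:
            done = self.agents[domain].run(task)
            self.log.append((domain, task, done))
            if not done:  # supervision: troubleshoot, reassign or escalate
                print(f"escalating '{task}' to human intervention")
                return
        print(f"onboarding complete for {employee}: {len(plan)} steps done")

orchestrator = Orchestrator(agents={
    "it": Agent("IT agent"),
    "hr": Agent("HR agent"),
    "training": Agent("Training agent"),
})
orchestrator.onboard("Jane Doe")
```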

Sam Altman admits OpenAI was ‘on the wrong side of history’ in open source debate

Sam Altman, CEO of OpenAI, made a striking admission on Friday that his company has been "on the wrong side of history" regarding open source AI, signaling a potential seismic shift in strategy as competition from China intensifies and efficient open models gain traction. The candid acknowledgment came during a Reddit "Ask Me Anything" session, just days after Chinese AI firm DeepSeek rattled global markets with its open source R1 model, which claims comparable performance to OpenAI's systems at a fraction of the cost.

"Yes, we are discussing [releasing model weights]," Altman wrote. "I personally think we have been on the wrong side of history here and need to figure out a different open source strategy." He noted that not everyone at OpenAI shares his view and that it isn't the company's current highest priority.

The statement represents a remarkable departure from OpenAI's increasingly proprietary approach in recent years, which has drawn criticism from some AI researchers and former allies, most notably Elon Musk, who is suing the company for allegedly betraying its original open source mission.

Sam Altman, the chief executive of OpenAI, acknowledged in a Reddit forum on Friday that the company needs to reconsider its closed-source approach to artificial intelligence, though he noted internal disagreement on the issue. (Credit: Reddit)

Sam Altman on DeepSeek: 'We will maintain less of a lead'

Altman's comments come amid market turmoil triggered by DeepSeek's emergence. The Chinese company's claims of building advanced AI models for just $5.6 million in training costs (though total development costs are likely much higher) sent Nvidia's stock plummeting, wiping out nearly $600 billion in market value — the largest single-day drop for any U.S. company in history.

"We will produce better models, but we will maintain less of a lead than we did in previous years," Altman acknowledged in the same AMA, addressing DeepSeek's impact directly.

Sam Altman, the chief executive of OpenAI, acknowledged in a Reddit forum on Friday that DeepSeek's model is "very good" and predicted his company would "maintain less of a lead than we did in previous years" in AI development. (Credit: Reddit)

Sam Altman admits OpenAI's closed strategy may be flawed

DeepSeek's breakthrough, whether or not its specific claims prove accurate, has highlighted shifting dynamics in AI development. The company says it achieved its results using only 2,000 Nvidia H800 GPUs — far fewer than the estimated 10,000+ chips typically deployed by major AI labs. This approach suggests that algorithmic innovation and architectural optimization might matter more than raw computing power. The revelation threatens not just OpenAI's technical strategy, but its entire business model, which is built on exclusive access to massive computational resources.

The open source debate: innovation vs. security

However, DeepSeek's rise has also intensified national security concerns. The company stores user data on servers in mainland China, where it could be subject to government access. Several U.S. agencies have already moved to restrict its use, with NASA becoming the latest to block the application, citing "security and privacy concerns."

OpenAI's potential pivot to open source would mark a return to its roots. The company was founded as a non-profit in 2015 with the mission of ensuring artificial general intelligence benefits humanity.
However, its transition to a "capped-profit" model and increasingly closed approach has drawn criticism from open source advocates.

"The correct reading is: 'Open source models are surpassing proprietary ones,'" wrote Meta's chief AI scientist Yann LeCun on LinkedIn, responding to DeepSeek's emergence. "They came up with new ideas and built them on top of other people's work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source."

A new chapter in AI development

While Altman's comments suggest a strategic shift may be coming, he emphasized that open source isn't currently OpenAI's top priority. This hesitation reflects the complex reality facing AI leaders: balancing innovation, security and commercialization in an increasingly multipolar AI world. The stakes extend far beyond OpenAI's bottom line; the company's decision could reshape the entire AI ecosystem. Open-sourcing key models could accelerate innovation and democratize access, but it might also complicate efforts to ensure AI safety and security — core tenets of OpenAI's mission.

The timing of Altman's admission, coming after DeepSeek's market shock rather than before it, suggests that OpenAI may be reacting to market forces rather than leading them. This reactive stance marks a striking role reversal for a company that has long positioned itself as AI's north star. As the dust settles from DeepSeek's debut, one thing becomes clear: the real disruption isn't just about technology or market value — it's about challenging the assumption that closely guarded AI models are the surest path to artificial general intelligence. In that light, Altman's admission might be less about being on the wrong side of history and more about recognizing that history itself has changed course.

Essential principles to produce and consume data for AI acceleration

This is a VB Lab Insights article presented by Capital One.

AI offers transformative potential, but unlocking its value requires strong data management. AI builds on a solid data foundation that can iteratively improve, creating a flywheel effect between data and AI. This flywheel enables companies to build more customized, real-time solutions that unlock impact for their customers and the business.

Managing data in today's world is not without complexity. Data volume is skyrocketing, with research showing it has doubled in the last five years alone. As a result, 68% of the data available to enterprises is left untapped. Within that data there is a huge variety of structures and formats, with MIT noting that around 80-90% of data is unstructured — fueling complexity in putting it to use. And finally, the velocity at which data needs to be delivered to users is accelerating. Some use cases call for sub-10-millisecond data availability; in other words, roughly ten times faster than the blink of an eye. The data ecosystems of today are big, diverse and fast — and the AI revolution is further raising the stakes on how companies manage and use data.

Fundamentals for great data

The data lifecycle is complicated and unforgiving, often involving many steps, many hops and many tools. This can lead to disparate ways of working with data and varying levels of maturity and instrumentation to drive data management. To empower users with trustworthy data for innovation, we first need to tackle the fundamentals of managing great data: self-service, automation and scale.

Self-service means empowering users to do their job with minimal friction. It covers areas like seamless data discovery, ease of data production and tools that democratize data access. Automation ensures that all core data management capabilities are embedded in the tools and experiences that enable users to work with data. And data ecosystems need to scale — especially in the AI era. Among other considerations, enterprises need to weigh the scalability of certain technologies, resilience capabilities and service level agreements that set baseline obligations for how data is to be managed (as well as enforcement mechanisms for such agreements). These principles lay the foundation to produce and consume great data.

Producing great data

Data producers are responsible for onboarding and organizing data, enabling quick and efficient consumption. A well-designed, self-service portal can play a key role here by allowing producers to interact seamlessly with systems across the ecosystem — such as storage, access controls, approvals, versioning and business catalogs. The goal is to create a unified control plane that mitigates the complexity of these systems, making data available in the right format, at the right time and in the right place.

To scale and enforce governance, enterprises can choose between a central platform and a federated model — or even adopt a hybrid approach. A central platform simplifies data publishing and governance rules, while a federated model offers flexibility, using purpose-built SDKs to manage governance and infrastructure locally. The key is to implement consistent mechanisms that ensure automation and scalability, enabling the business to reliably produce high-quality data that fuels AI innovation.

Consuming great data

Data consumers — such as data scientists and data engineers — need easy access to reliable, high-quality data for rapid experimentation and development. Simplifying the storage strategy is a foundational step.
By centralizing compute within the data lake and using a single storage layer, enterprises can minimize data sprawl and reduce complexity, since every compute engine consumes data from the same layer.

Enterprises should also adopt a zone strategy to handle diverse use cases. For instance, a raw zone may support expanded data and file types such as unstructured data, while a curated zone enforces stricter schema and quality requirements. This setup allows for flexibility while maintaining governance and data quality. Consumers can use these zones for activities like creating personal spaces for experimentation or collaborative zones for team projects. Automated services ensure data access, lifecycle management and compliance, empowering users to innovate with confidence and speed.

Lead with simplicity

Effective AI strategies are grounded in robust, well-designed data ecosystems. By simplifying how you produce and consume data — and improving the quality of said data — businesses can empower users to innovate in new performance-driving areas with confidence. As a foundation, it's paramount that businesses prioritize ecosystems and processes that enhance trustworthiness and accessibility. By implementing the principles outlined above, they can do just that — building scalable and enforceable data management that will power rapid experimentation in AI and ultimately deliver long-term business value.

Marty Andolino is VP, Software Engineering at Capital One. Kajal Wood is Sr. Director, Software Engineering at Capital One.

VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and it's always clearly marked. For more information, contact [email protected].
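To make the zone idea concrete, here is a minimal, hypothetical sketch of the raw-versus-curated split. The paths, required fields and JSON format are illustrative placeholders, not Capital One's implementation:

```python
import json
from pathlib import Path

RAW = Path("datalake/raw")          # accepts anything, including unstructured files
CURATED = Path("datalake/curated")  # enforces schema and quality gates

REQUIRED_FIELDS = {"id", "timestamp", "amount"}  # placeholder schema

def land_raw(name: str, payload: bytes) -> Path:
    """Land data in the raw zone with no validation."""
    RAW.mkdir(parents=True, exist_ok=True)
    path = RAW / name
    path.write_bytes(payload)
    return path

def promote_to_curated(raw_path: Path) -> Path:
    """Promote a raw record only if it meets the curated zone's schema."""
    record = json.loads(raw_path.read_text())
    missing = REQUIRED_FIELDS - record.keys()
    if missing:  # the curated zone rejects records that fail quality checks
        raise ValueError(f"rejected: missing fields {missing}")
    CURATED.mkdir(parents=True, exist_ok=True)
    out = CURATED / raw_path.name
    out.write_text(json.dumps(record))
    return out
```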

Ex-Google, Apple engineers launch unconditionally open source Oumi AI platform that could help to build the next DeepSeek

If it wasn't clear before, it's definitely very clear now: open source really does matter for AI. The success of DeepSeek-R1 has substantively proven there is a need and demand for open-source AI.

But what exactly is open-source AI? For Meta and its Llama models, it means free access to use the model, with some conditions. DeepSeek-R1 is available under a permissive open-source license providing significant access to its architecture and capabilities. However, the specific training code and detailed methodologies, particularly those involving reinforcement learning (RL) techniques like Group Relative Policy Optimization (GRPO), have not been publicly disclosed. This omission limits the community's ability to fully understand and replicate the model's training process.

What neither DeepSeek nor Llama enables, however, is full, unconditional access to everything behind the model: the training code and training data as well as the weights. Without all of that, developers can still work with an open model, but they don't have the tools and insights to understand how it really works and, more importantly, how to build an entirely new model. That's a challenge that a new startup led by former Google and Apple AI veterans aims to solve.

Launching today, Oumi is backed by an alliance of 13 leading research universities, including Princeton, Stanford, MIT, UC Berkeley, the University of Oxford, the University of Cambridge, the University of Waterloo and Carnegie Mellon. Oumi's founders raised $10 million, a modest seed round they say meets their needs. While major players like OpenAI contemplate $500 billion investments in massive data centers through projects like Stargate, Oumi is taking a radically different approach. The platform provides researchers and developers with a complete toolkit for building, evaluating and deploying foundation models.

"Even the biggest companies can't do this on their own," Oussama Elachqar, cofounder of Oumi and previously a machine learning engineer at Apple, told VentureBeat. "We were effectively working in silos within Apple, and there are many other silos happening across the industry. There has to be a better way to develop these models collaboratively."

What open-source models like DeepSeek and Llama are missing

Oumi CEO and former Google Cloud AI senior engineering manager Manos Koukoumidis told VentureBeat that researchers consistently tell him AI experimentation has become extremely complex. While today's open models are a step forward, they are not enough. Koukoumidis explained that with current "open" AI models like DeepSeek-R1 and Llama, an organization can use the model and deploy it on its own. What's missing is that anyone else who wants to build on the model doesn't know exactly how it was built. The Oumi founders believe this lack of transparency is a major hindrance to collaborative AI research and development. Even a project like Llama requires a significant amount of effort from researchers to figure out how to reproduce and build upon the work.

How Oumi works to open AI for enterprise users, researchers and everyone else

The Oumi platform works by providing an all-in-one environment that streamlines the complex workflows involved in building AI models. Koukoumidis explained that to build a foundation model, there are typically 10 or more steps that need to be done, often in parallel.
Oumi integrates all the necessary tools and workflows into a unified environment, eliminating the need for researchers to piece together and configure various open-source components. Key technical features include:

- Support for models ranging from 10M to 405B parameters
- Implementation of advanced training techniques including SFT, LoRA, QLoRA and DPO
- Compatibility with both text and multimodal models
- Built-in tools for training data synthesis and curation using LLM judges
- Deployment options through modern inference engines like vLLM and SGLang
- Comprehensive model evaluation across standard industry benchmarks

"We don't have to deal with the open-source development hell of figuring out what you can combine and what works well," Koukoumidis explained. The platform allows users to start small, using their own laptops for initial experiments and model training. As users progress, they can then scale up to larger compute resources, such as university clusters or cloud providers, all within the same Oumi environment.

You don't need massive training infrastructure to build an open model

One of the big surprises with DeepSeek-R1 is the fact that it was apparently built with a fraction of the resources that Meta or OpenAI use to build their models. As OpenAI and others invest billions in centralized infrastructure, Oumi is betting on a distributed approach that could dramatically reduce costs. "The idea that you need hundreds of billions [of dollars] for AI infrastructure is fundamentally flawed," Koukoumidis said. "With distributed computing across universities and research institutions, we can achieve similar or better results at a fraction of the cost."

The initial focus for Oumi is to build out the open-source ecosystem of users and development, but that's not all the company has planned. Oumi plans to develop enterprise offerings to help businesses deploy these models in production environments.
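Oumi's actual API may differ from this; purely as an illustration of the "start on a laptop, scale up later" workflow the founders describe, a unified pipeline could look like the hypothetical sketch below (all names are invented for the example):

```python
from dataclasses import dataclass

@dataclass
class RunConfig:
    base_model: str = "tiny-llm-135m"  # hypothetical small model for a laptop run
    method: str = "lora"               # could be "sft", "qlora" or "dpo"
    dataset: str = "my_corpus.jsonl"
    devices: int = 1                   # raise this to move to a cluster or cloud

def run_pipeline(cfg: RunConfig) -> None:
    # The point of a unified environment: the same steps, configured in one
    # place, whether the target is a laptop or a university cluster.
    steps = ["curate data", "synthesize data with LLM judges", "train",
             "evaluate on benchmarks", "deploy to an inference engine"]
    for step in steps:
        print(f"[{cfg.method}, {cfg.devices} device(s)] {step}")

run_pipeline(RunConfig())            # laptop-scale experiment
run_pipeline(RunConfig(devices=64))  # same workflow, cluster-scale
```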

Former Google, Meta leaders launch Palona AI, bringing personalized, emotive customer agents to non-techie enterprises

Speaking for myself, interacting with any merchant's AI-powered chatbot on their website is often an exercise in frustration. Phone trees with robot voices are typically worse. I'd wager I'm hardly alone in my assessment. Who amongst us hasn't experienced long hold times, slow responses, outdated information, no awareness of the customer's own account history, grating faux politeness and a host of other inefficiencies?

A new startup called Palona, which debuted last week, aims to fix this sorry state of affairs. It equips direct-to-consumer enterprises — think pizza shops and electronics vendors — with live, 24/7 customer support sales agents that are uniquely reflective of each business's brand personality, voice, inventory stock and value proposition. The electronics vendor has a "wizard" agent made by Palona, while the pizza shop gets a surfer-dude agent personality — naturally. In all cases, Palona focuses on creating AI agents that have high "EQ" (emotional intelligence, or emotional quotient), building them from a combination of open source and proprietary AI models and training some of its own using human sociology research.

"A kind of fundamental thesis is that we can create an experience that is delightful and feels genuine, like a real human conversation," Palona co-founder and CTO Tim Howes said in an in-person interview with VentureBeat. "ChatGPT is a hugely useful tool, but it does not feel like a human conversation."

Palona claims its system can be easily implemented by a non-techie brand on its website, mobile app or phone lines — with responses uniquely tailored to each brand and each communications environment. And, in fact, its agents are already at work handling orders, answering questions and complaints, and suggesting products and upsells to customers.

Strong founding background

In addition to Howes, Palona was co-founded and is led by a team of engineers from some of the top tech companies in the world, among them:

Maria Zhang, Palona's CEO, is a former VP of engineering at Google, VP/GM of AI for products at Meta and CTO of Tinder. She also founded Alike, which was acquired by Yahoo in 2013.

Palona's chief scientist, Steve Liu, was formerly chief scientist at Samsung AI Center and Tinder. A tenured professor at McGill University, Liu is also a Fellow of IEEE and the Canadian Academy of Engineering, with more than 390 research papers to his name.

And Howes himself is the co-inventor of the industry-standard, open source Lightweight Directory Access Protocol (LDAP) online data storage system, as well as co-founder of LoudCloud and Opsware (the latter was acquired by HP for $1.65 billion). He was also previously CTO at Netscape and HP Software, and led developer productivity at Meta's AI infrastructure business.

"We're building fully autonomous sales agents — not tools for salespeople, but actual AI salespeople," said Zhang, adding that AI will be "the employee of the century."

24/7 polite, distinct, personable sales agents

Palona AI positions itself as a solution for companies looking to improve their sales performance, customer engagement and brand loyalty. The Palona agents act as customized virtual sales employees, combining soft sales skills with 24/7 availability, unlimited capacity and advanced memory recall, and they can interact with customers in an online chatbot, over SMS/text or through AI-powered voices.
"100% — we support voice," Zhang explained. "For example, in pizza ordering, voice is still a major user pattern. In the Midwest, about 50% of people still call to order. On the east and west coasts, it's around 20%, but it's still significant."

Palona's voices are licensed, but the company has the ability to train and deploy custom ones — even voice clones of authorized customer reps or a CEO. The company realized through testing that the voice version of Palona's AI sales agents would need distinctly different interaction styles from the text chatbot. "We tested different voice interactions, and for pizza ordering, for example, customers wanted efficiency," Zhang related. "They didn't want a chatty AI — they just wanted to get their order done as fast as possible. So we optimized for that, making it have less personality, less verbosity, more efficiency."

Unlike traditional chatbots that serve as assistants to human representatives, Palona AI is designed to handle entire sales cycles without human intervention. "There's a big gap between lifelike AI models like ChatGPT and what businesses actually need — an AI agent that can fully sell, convert and upsell," Zhang explained. Palona claims to minimize errors and reduce AI hallucinations by up to 98%, ensuring reliable interactions.

Zhang and Howes said that for even the most analog businesses, it takes just a short lead time to get going, and only several days for a simple implementation. Customers provide Palona with "FAQs, employee training manuals, policies and procedures," said Howes. Then, they define what actions the agent should take — be it processing orders, answering inquiries or handling support issues. One of the biggest factors affecting setup time is how much integration is required with the customer's existing systems (point of sale, customer relationship management, ordering platforms). "If we already support their system, it's plug-and-play," Howes explained. "The agent can be ready in a couple of days. If they're using a new, unfamiliar system, that requires additional engineering work, which could take longer." In addition, Zhang said that Palona is "actually in the process of automating agent setup. Eventually, businesses will be able to use a Palona agent to configure their own Palona agent."

Three language models are better than one

Palona achieves all this by combining three different models. The first is a custom, fine-tuned large language model (LLM) that serves as the basis for every distinct business sales agent — the pizza shop gets a different tone and personality from the electronics vendor, and each one is customized out of the box. There's also a supervisory model that detects, catches and removes hallucinations from the main model before it outputs them
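Palona hasn't published how its supervisory model works; as a loose illustration of the general draft-then-check pattern described above, a sketch might look like this (the function names and prompt format are invented for the example):

```python
def safe_reply(main_model, supervisor_model, user_message: str, brand_facts: str) -> str:
    """Generate a sales reply, letting a second model veto unsupported claims."""
    draft = main_model(user_message)

    # The supervisor sees the draft alongside ground-truth business facts
    # (menu, prices, policies) and judges whether every claim is supported.
    verdict = supervisor_model(
        f"Facts:\n{brand_facts}\n\nDraft reply:\n{draft}\n\n"
        "Does the draft make any claim the facts do not support? Answer YES or NO."
    )
    if verdict.strip().upper().startswith("NO"):
        return draft
    # Fall back rather than ship a potential hallucination to the customer.
    return "Let me double-check that and get right back to you."
```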

Beyond benchmarks: How DeepSeek-R1 and o1 perform on real-world tasks

DeepSeek-R1 has surely created a lot of excitement and concern, especially for OpenAI's rival model o1. So, we put them to the test in a side-by-side comparison on a few simple data analysis and market research tasks.

To put the models on equal footing, we used Perplexity Pro Search, which now supports both o1 and R1. Our goal was to look beyond benchmarks and see if the models can actually perform ad hoc tasks that require gathering information from the web, picking out the right pieces of data and performing simple tasks that would otherwise require substantial manual effort. Both models are impressive but make errors when the prompts lack specificity. o1 is slightly better at reasoning tasks, but R1's transparency gives it an edge in cases (and there will be quite a few) where it makes mistakes. Here is a breakdown of a few of our experiments and the links to the Perplexity pages where you can review the results yourself.

Calculating returns on investments from the web

Our first test gauged whether models could calculate returns on investment (ROI). We considered a scenario where the user has invested $140 in the Magnificent Seven (Alphabet, Amazon, Apple, Meta, Microsoft, Nvidia, Tesla) on the first day of every month from January to December 2024. We asked the model to calculate the value of the portfolio at the current date. To accomplish this task, the model would have to pull Mag 7 price information for the first day of each month, split the monthly investment evenly across the stocks ($20 per stock), sum them up and calculate the portfolio value according to the value of the stocks on the current date.

In this task, both models failed. o1 returned a list of stock prices for January 2024 and January 2025 along with a formula to calculate the portfolio value. However, it failed to calculate the correct values and basically said that there would be no ROI. R1, on the other hand, made the mistake of only investing in January 2024 and calculating the returns for January 2025.

[Image: o1's reasoning trace does not provide enough information]

However, what was interesting was the models' reasoning process. While o1 did not provide much detail on how it had reached its result, R1's reasoning trace showed that it did not have the correct information because Perplexity's retrieval engine had failed to obtain the monthly data for stock prices (many retrieval-augmented generation applications fail not because the model lacks abilities but because of bad retrieval). This proved to be an important bit of feedback that led us to the next experiment.

[Image: The R1 reasoning trace reveals that it is missing information]

Reasoning over file content

We decided to run the same experiment as before, but instead of prompting the model to retrieve the information from the web, we provided it in a text file. For this, we copy-pasted monthly stock data for each stock from Yahoo! Finance into a text file and gave it to the model. The file contained the name of each stock plus the HTML table that contained the price for the first day of each month from January to December 2024 and the last recorded price. The data was not cleaned, to reduce the manual effort and to test whether the model could pick the right parts from the data. Again, both models failed to provide the right answer.
o1 seemed to have extracted the data from the file, but it suggested the calculation be done manually in a tool like Excel. The reasoning trace was very vague and did not contain any useful information to troubleshoot the model. R1 also failed and didn't provide an answer, but its reasoning trace contained a lot of useful information. For example, it was clear that the model had correctly parsed the HTML data for each stock and was able to extract the correct information. It had also been able to do the month-by-month calculation of investments, sum them and calculate the final value according to the latest stock price in the table. However, that final value remained buried in its reasoning chain and never made it into the final answer. The model had also been confounded by a row in the Nvidia chart that marked the company's 10:1 stock split on June 10, 2024, and ended up miscalculating the final value of the portfolio.

[Image: R1 hid the results in its reasoning trace, along with information about where it went wrong]

Again, the real differentiator was not the result itself, but the ability to investigate how the model arrived at its response. In this case, R1 provided us with a better experience, allowing us to understand the model's limitations and how we can reformulate our prompt and format our data to get better results in the future.

Comparing data over the web

Another experiment we carried out required the model to compare the stats of four leading NBA centers and determine which one had the best improvement in field goal percentage (FG%) from the 2022/2023 to the 2023/2024 season. This task required the model to do multi-step reasoning over different data points. The catch in the prompt was that it included Victor Wembanyama, who only entered the league as a rookie in 2023. The retrieval for this prompt was much easier, since player stats are widely reported on the web and are usually included in their Wikipedia and NBA profiles.

Both models answered correctly (it's Giannis, in case you were curious), although depending on the sources they used, their figures were a bit different. However, they did not realize that Wemby did not qualify for the comparison and gathered other stats from his time in the European league. In its answer, R1 provided a better breakdown of the results, with a comparison table along with links to the sources it used for its answer.
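For reference, the dollar-cost-averaging arithmetic both models stumbled on is only a few lines of code once the prices are in hand. Here is a minimal sketch; the price table is a made-up placeholder, not real market data:

```python
# Value of investing $140/month, split evenly across 7 stocks ($20 each),
# on the first trading day of each month of 2024.
monthly_prices = {
    # ticker: price on the first trading day of each month, Jan..Dec 2024
    "AAPL": [185.0, 188.0, 179.0, 170.0, 173.0, 192.0,
             213.0, 218.0, 229.0, 226.0, 222.0, 239.0],  # placeholder values
    # ... the other six Magnificent Seven tickers would go here
}
latest_prices = {"AAPL": 236.0}  # placeholder "current" prices

portfolio_value = 0.0
for ticker, prices in monthly_prices.items():
    shares = sum(20.0 / p for p in prices)  # $20 buys 20/p shares each month
    portfolio_value += shares * latest_prices[ticker]

print(f"portfolio value: ${portfolio_value:,.2f}")
```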

Galileo launches Agentic Evaluations to fix AI agent errors before they cost you

Galileo, a San Francisco-based startup, is betting that the future of artificial intelligence depends on trust. Today, the company launched a new product, Agentic Evaluations, to address a growing challenge in the world of AI: making sure the increasingly complex systems known as AI agents actually work as intended.

AI agents — autonomous systems that perform multi-step tasks like generating reports or analyzing customer data — are gaining traction across industries. But their rapid adoption raises a crucial question: How can companies verify these systems remain reliable after deployment? Galileo's CEO, Vikram Chatterji, believes his company has found the answer. "Over the last six to eight months, we started to see some of our customers trying to adopt agentic systems," said Chatterji in an interview. "Now LLMs can be used as a smart router to pick and choose the right API calls towards actually completing a task. Going from just generating text to actually completing a task was a very big chasm that was unlocked."

A diagram showing how Galileo evaluates AI agents at three key stages: tool selection, error detection and task completion. (Credit: Galileo)

AI agents show promise, but enterprises demand accountability

Major enterprises like Cisco and Ema (the latter founded by Coinbase's former chief product officer) have already adopted Galileo's platform. These companies use AI agents to automate tasks from customer support to financial analysis, and they report significant productivity gains. "A sales representative who's trying to do outreach and outbounds would otherwise use maybe a week of their time to do that, versus with some of these AI-enabled agents, they're doing that within two days or less," Chatterji explained, highlighting the return on investment for enterprises.

Galileo's new framework evaluates tool selection quality, detects errors in tool calls and tracks overall session success. It also monitors essential metrics for large-scale AI deployment, including costs and latency.

A dashboard showing how Galileo evaluates AI agents at three key stages: tool selection, error detection and task completion. (Credit: Galileo)

$68 million in funding fuels Galileo's push into enterprise AI

The launch builds on Galileo's recent momentum. The company raised $45 million in Series B funding led by Scale Venture Partners last October, bringing its total funding to $68 million. Industry analysts project the market for AI operations tools could reach $4 billion by 2025.

The stakes are high as AI deployment accelerates. Studies show even advanced models like GPT-4 can hallucinate about 23% of the time during basic question-and-answer tasks. Galileo's tools help enterprises identify these issues before they impact operations. "Before we launch this thing, we really, really need to know that this thing works," Chatterji said, describing customer concerns. "The bar is really high. So that's where we gave them this tool chain, such that they could just use our metrics as the basis for these tests."

Addressing AI hallucinations and enterprise-scale challenges

The company's focus on reliable, production-ready solutions positions it well in a market increasingly concerned with AI safety. For technical leaders deploying enterprise AI, Galileo's platform provides essential guardrails for ensuring AI agents perform as intended while controlling costs.
As enterprises expand their use of AI agents, performance monitoring tools become crucial infrastructure. Galileo's latest offering aims to help businesses deploy AI responsibly and effectively at scale. "2025 will be the year of agents. It is going to be very prolific," Chatterji noted. "However, what we've also seen is a lot of companies that are just launching these agents without good testing is leading to negative implications… The need for proper testing and evaluations is more than ever before."
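Galileo's metrics themselves are proprietary; purely as an illustration of the three evaluation stages described above (tool selection, error detection, task completion) plus cost and latency tracking, a session might be scored like this (all names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolCall:
    tool_used: str
    tool_expected: str    # ground-truth label for this step
    error: Optional[str]  # None if the call succeeded
    latency_ms: float
    cost_usd: float

def score_session(calls: list, task_completed: bool) -> dict:
    n = len(calls)
    return {
        # Stage 1: did the agent pick the right tool at each step?
        "tool_selection_quality": sum(c.tool_used == c.tool_expected for c in calls) / n,
        # Stage 2: how many tool calls errored out?
        "tool_error_rate": sum(c.error is not None for c in calls) / n,
        # Stage 3: did the overall session achieve its goal?
        "task_completed": task_completed,
        # Deployment metrics: cost and latency across the session.
        "total_latency_ms": sum(c.latency_ms for c in calls),
        "total_cost_usd": sum(c.cost_usd for c in calls),
    }
```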

It’s here: OpenAI’s o3-mini advanced reasoning model arrives to counter DeepSeek’s rise

OpenAI has released a new proprietary AI model in time to counter the rapid rise of open source rival DeepSeek R1 — but will it be enough to blunt the latter's success?

Today, after several days of rumors and increasing anticipation among AI users on social media, OpenAI is debuting o3-mini, the second model in its new family of "reasoners" — AI models that take slightly more time to "think," analyze their own processes and reflect on their own "chains of thought" before responding to user queries and inputs with new outputs. The result is a model that can perform at the level of a PhD student, or even a degree holder, when answering hard questions in math, science, engineering and many other fields.

The o3-mini model is now available on ChatGPT, including the free tier, and through OpenAI's application programming interface (API). It's actually less expensive, faster and more performant than OpenAI's previous high-end reasoning model, o1, and o1's faster, lower-parameter-count sibling, o1-mini.

While it will inevitably be compared to DeepSeek R1, and its release date seen as a reaction, it's important to remember that o3 and o3-mini were announced well prior to the January release of DeepSeek R1, in December 2024 — and that OpenAI CEO Sam Altman stated previously on X that, due to feedback from developers and researchers, o3-mini would be coming to ChatGPT and the OpenAI API at the same time.

Unlike DeepSeek R1, o3-mini will not be made available as an open source model — meaning the code cannot be taken and downloaded for offline usage, nor customized to the same extent, which may limit its appeal compared to DeepSeek R1 for some applications. OpenAI did not provide any further details about the (presumed) larger o3 model announced back in December alongside o3-mini. At that time, OpenAI's opt-in dropdown form for testing o3 stated that it would undergo a "delay of multiple weeks" before third parties could test it.

Performance and Features

Similar to o1, OpenAI o3-mini is optimized for reasoning in math, coding and science. Its performance is comparable to OpenAI o1 when using medium reasoning effort, but it offers the following advantages:

- 24% faster response times compared to o1-mini. (OpenAI didn't provide a specific number here, but according to third-party evaluation group Artificial Analysis's tests, o1-mini takes 12.8 seconds to receive and output 100 tokens, so a 24% speed bump would drop the response time to 10.32 seconds.)
- Improved accuracy, with external testers preferring o3-mini's responses 56% of the time.
- 39% fewer major errors on complex real-world questions.
- Better performance in coding and STEM tasks, particularly when using high reasoning effort.
- Three reasoning effort levels (low, medium and high), allowing users and developers to balance accuracy and speed.

It also boasts impressive benchmarks, even outpacing o1 in some cases, according to the o3-mini system card OpenAI released online (which was published earlier than the official model availability announcement). o3-mini's context window — the number of combined tokens it can input/output in a single interaction — is 200,000, with a maximum of 100,000 in each output. That's the same as the full o1 model, and it outperforms DeepSeek R1's context window of around 128,000/130,000 tokens. But it is far below Google Gemini 2.0 Flash Thinking's new context window of up to 1 million tokens.
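For developers, selecting the effort level is exposed as an API parameter. A minimal sketch using the OpenAI Python SDK follows; it assumes the `reasoning_effort` parameter accepts "low", "medium" or "high" for o3-mini, and that an API key is set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low" | "medium" | "high"; trades speed for accuracy
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
)
print(response.choices[0].message.content)
```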
While o3-mini focuses on reasoning capabilities, it doesn't have vision capabilities yet. Developers and users looking to upload images and files should keep using o1 in the meantime.

The competition heats up

The arrival of o3-mini marks the first time OpenAI is making a reasoning model available to free ChatGPT users. The prior o1 model family was only available to paying subscribers of the ChatGPT Plus, Pro and other plans, as well as via OpenAI's paid application programming interface. As it did with large language model (LLM)-powered chatbots via the launch of ChatGPT in November 2022, OpenAI essentially created the entire category of reasoning models back in September 2024, when it first unveiled o1, a new class of models with a new training regime and architecture.

But OpenAI, in keeping with its recent history, did not make o1 open source, contrary to its name and original founding mission. Instead, it kept the model's code proprietary. And over the last two weeks, o1 has been overshadowed by Chinese AI startup DeepSeek, which launched R1 — a rival, highly efficient, largely open-source reasoning model that anyone around the world can freely take, retrain and customize, and use for free on DeepSeek's website and mobile app — a model reportedly trained at a fraction of the cost of o1 and other LLMs from top labs.

DeepSeek R1's permissive MIT licensing terms, free app and website for consumers, and decision to make R1's codebase freely available to take and modify have led to a veritable explosion of usage in both the consumer and enterprise markets, with even OpenAI investor Microsoft and Anthropic backer Amazon rushing to add variants of it to their cloud marketplaces. Perplexity, the AI search company, also quickly added a variant of it for users.

DeepSeek also dethroned the ChatGPT iOS app for the number one place in the U.S. Apple App Store, and it is notable for outpacing OpenAI by connecting its R1 model to web search in its app and on the web, something that OpenAI has not yet done for o1. This has led to further techno-anxiety among tech workers and others online that China is catching up to, or has outpaced, the U.S. in AI innovation, and even in technology more generally. Many AI researchers and scientists and top VCs such as Marc Andreessen, however, have welcomed the rise of DeepSeek and its open sourcing in particular as a tide that lifts all boats in the AI field, increasing the intelligence available to everyone while reducing costs.

Availability in ChatGPT

o3-mini is now rolling out globally to ChatGPT
