VentureBeat

Why your enterprise AI strategy needs both open and closed models: The TCO reality check

This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.”

For the last two decades, enterprises have had a choice between open-source and closed proprietary technologies. The original choice was primarily centered on operating systems, with Linux offering an open-source alternative to Microsoft Windows. In the developer realm, open-source languages like Python and JavaScript dominate, while open-source technologies like Kubernetes have become standards in the cloud.

The same type of choice between open and closed now faces enterprises for AI, with multiple options for both types of models. On the proprietary closed-model front are some of the biggest, most widely used models on the planet, including those from OpenAI and Anthropic. On the open-source side are models like Meta’s Llama, IBM Granite, Alibaba’s Qwen and DeepSeek.

Understanding when to use an open or closed model is a critical choice for enterprise AI decision-makers in 2025 and beyond. The choice has both financial and customization implications for either option that enterprises need to understand and consider.

Understanding the difference between open and closed licenses

There is no shortage of hyperbole around the decades-old rivalry between open and closed licenses. But what does it all actually mean for enterprise users?

A closed-source proprietary technology, like OpenAI’s GPT-4o for example, does not have model code, training data or model weights open or available for anyone to see. The model is not easily available to be fine-tuned and, generally speaking, it is only available for real enterprise usage at a cost (sure, ChatGPT has a free tier, but that’s not going to cut it for a real enterprise workload).

An open technology, like Meta Llama, IBM Granite or DeepSeek, has openly available code. Enterprises can use the models freely, generally without restrictions, including fine-tuning and customization.

Rohan Gupta, a principal with Deloitte, told VentureBeat that the open vs. closed source debate isn’t unique or native to AI, nor is it likely to be resolved anytime soon. Gupta explained that closed-source providers typically offer several wrappers around their model that enable ease of use, simplified scaling, more seamless upgrades and downgrades and a steady stream of enhancements. They also provide significant developer support, including documentation as well as hands-on advice, and often deliver tighter integrations with both infrastructure and applications. In exchange, an enterprise pays a premium for these services.

“Open-source models, on the other hand, can provide greater control, flexibility and customization options, and are supported by a vibrant, enthusiastic developer ecosystem,” Gupta said. “These models are increasingly accessible via fully managed APIs across cloud vendors, broadening their distribution.”

Making the choice between open and closed models for enterprise AI

The question that many enterprise users might ask is what’s better: an open or a closed model? The answer, however, is not necessarily one or the other.

“We don’t view this as a binary choice,” David Guarrera, generative AI leader at EY Americas, told VentureBeat.
“Open vs. closed is increasingly a fluid design space, where models are selected, or even automatically orchestrated, based on tradeoffs between accuracy, latency, cost, interpretability and security at different points in a workflow.”

Guarrera noted that closed models limit how deeply organizations can optimize or adapt behavior. Proprietary model vendors often restrict fine-tuning, charge premium rates or hide the process in black boxes. While API-based tools simplify integration, they abstract away much of the control, making it harder to build highly specific or interpretable systems.

In contrast, open-source models allow for targeted fine-tuning, guardrail design and optimization for specific use cases. This matters more in an agentic future, where models are no longer monolithic general-purpose tools but interchangeable components within dynamic workflows. The ability to finely shape model behavior, at low cost and with full transparency, becomes a major competitive advantage when deploying task-specific agents or tightly regulated solutions.

“In practice, we foresee an agentic future where model selection is abstracted away,” Guarrera said. For example, a user may draft an email with one AI tool, summarize legal docs with another, search enterprise documents with a fine-tuned open-source model and interact with AI locally through an on-device LLM, all without ever knowing which model is doing what.

“The real question becomes: what mix of models best suits your workflow’s specific demands?” Guarrera said.

Considering total cost of ownership

With open models, the basic idea is that the model is freely available for use; closed models, in contrast, always come at a price. The reality when it comes to considering total cost of ownership (TCO) is more nuanced.

Praveen Akkiraju, managing director at Insight Partners, explained to VentureBeat that TCO has many different layers. A few key considerations include infrastructure hosting costs and engineering: Are the open-source models self-hosted by the enterprise or the cloud provider? How much engineering, including fine-tuning, guardrailing and security testing, is needed to operationalize the model safely?

Akkiraju noted that fine-tuning an open-weights model can also sometimes be a very complex task. Closed frontier model companies spend enormous engineering effort to ensure performance across multiple tasks. In his view, unless enterprises deploy similar engineering expertise, they will face a complex balancing act when fine-tuning open-source models. This creates cost implications when organizations choose their model deployment strategy. For example, enterprises can fine-tune multiple model versions for different tasks or use one API for multiple tasks.

Ryan Gross, head of data and applications at cloud-native services provider Caylent, told VentureBeat that from his perspective, licensing terms don’t matter, except in edge-case scenarios. The largest restrictions often pertain to model availability when data residency requirements are in place. In this case, deploying an open model on infrastructure like Amazon SageMaker may be the only way to get a state-of-the-art model that still complies. When it comes to TCO, Gross noted that the tradeoff lies between per-token costs and hosting and maintenance costs.

“There is a clear break-even point where
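That break-even intuition reduces to simple arithmetic: a fixed hosting cost amortized against the gap between per-token API pricing and the marginal cost of self-hosted serving. Below is a minimal sketch; every number in it is a made-up assumption for illustration, not pricing from Caylent or any vendor.

```python
# Hypothetical break-even between a pay-per-token closed API and a
# self-hosted open model. All numbers are illustrative assumptions.

API_COST_PER_1M_TOKENS = 10.00       # assumed closed-API price per 1M tokens
SELF_HOST_FIXED_MONTHLY = 8_000.00   # assumed hosting + ops + amortized engineering
SELF_HOST_COST_PER_1M = 1.50         # assumed marginal serving cost per 1M tokens

def monthly_cost_api(million_tokens: float) -> float:
    return API_COST_PER_1M_TOKENS * million_tokens

def monthly_cost_self_hosted(million_tokens: float) -> float:
    return SELF_HOST_FIXED_MONTHLY + SELF_HOST_COST_PER_1M * million_tokens

# Break-even volume = fixed cost / per-million-token savings of self-hosting
break_even = SELF_HOST_FIXED_MONTHLY / (API_COST_PER_1M_TOKENS - SELF_HOST_COST_PER_1M)
print(f"Break-even at ~{break_even:,.0f}M tokens per month")  # ~941M tokens

for volume in (100, 500, 1_000, 2_000):  # millions of tokens per month
    print(f"{volume:>5}M tokens: API ${monthly_cost_api(volume):>9,.2f} "
          f"vs. self-hosted ${monthly_cost_self_hosted(volume):>9,.2f}")
```

Below the break-even volume, paying per token is cheaper; above it, the fixed hosting cost is amortized and the self-hosted open model wins, which is why the answer is workload-dependent rather than ideological.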


For Replit’s CEO, the future of software is ‘agents all the way down’

Can enterprise teams truly vibe code their way out of expensive SaaS contracts? Replit CEO and co-founder Amjad Masad seems to think so, and the ambitious vision could mean “agents all the way down.”

Speaking at VB Transform on Tuesday, Masad touted how his startup’s agents could help a non-developer design and code a live polling app in a mere 15 minutes — using a written prompt to create databases, login authentication and even quality checks. “This is sort of like an almost semi-autonomous agent,” Masad said. “You can watch it, you can also go get a coffee and it’ll send you a notification when it’s ready to show you the future.”

Scaling apps, sites and software without coding

A polling app might not seem all that necessary for most enterprise teams. However, the process illustrates how some platforms are allowing individuals and teams to quickly and cheaply build and scale websites, apps and software in ways that could cut timelines or even replace some outside vendors — all without knowing much or any code.

The road map for Replit includes building more APIs and abstractions of primitives that agents can use to quickly set up databases, payment processes and other features. Masad also mentioned other updates for Replit v3, including a way for users to add generative models directly to their app and have agents autonomously run tests of AI-generated apps.

In recent months, vibe coding has become increasingly popular as a way for non-developers to quickly design and code a new website, app or agent from scratch using natural language prompts. Giants like Anthropic and Google have rolled out new tools, while startups like Anysphere, Genspark and Lovable have raised new funding. (Just last month, Windsurf was reportedly in talks to be acquired by OpenAI.)

Replit is finding new ways of integrating with various enterprise-grade platforms to boost AI development. In February, Anthropic revealed that Replit was helping companies build with Claude on Google’s Vertex AI to support more than 100,000 applications with security and scalability.

The growth of agentic coding might impact the value of creating apps, which Masad predicts could decline significantly and “perhaps to zero at some point.” When asked if his company could actually replace enterprise-grade tools like configure-price-quote (CPQ) systems, Masad said, “we’re seeing customers have three orders of magnitude of savings on apps.” He gave an anecdote about a Replit user who claimed to use the platform to make a working version of an ERP automation for just $400 instead of the vendor’s quoted $150,000.

“When you think about what the software does, a lot of Replit users wake up in the morning, they have a problem in their minds, and they create an app to solve that problem,” Masad said. “…The software agent will go and build software in order to solve that problem and we’ll solve that problem for you.”

Vibe coding still requires proper human analysis

Despite the growing competition, Masad said some platforms like Claude Code and Cursor help novices, but he warned that AI-generated code without proper checks could lead to leaked data or API keys. Replit addresses that risk through its cloud-native design, sandboxing to test agents in an isolated environment, and finding and fixing various security vulnerabilities.
According to Masad, the point of differentiation is Replit’s full-stack nature and focus on creating an autonomous software engineer. He thinks it might go beyond making an app to solve problems if there are ways to have agents skip the intermediate step.

Masad also sees Replit and other generative AI tools helping to generate UI changes based on the form factor of any given device. When a VB Transform audience member asked where the internet is heading with the future of web pages and multimodal agents, Masad said he thinks there’s a growing expectation that the form factor of computing will change — which he noted could include various AI-enabled wearables from OpenAI and Meta.

Junior engineers = SMEs?

Another aspect that came up was how junior engineers will become subject matter experts in the future. One audience member asked how people will learn for themselves instead of blindly accepting an agent’s code changes. Masad didn’t give a direct response but instead said the platform can highlight any piece of code and provide an explanation. He said that leads to a question about the vision for AI.

“I think we’re going to get to a point where you don’t have to interface with the code,” he said. “We’re going to be able to interact with software on a higher level of abstraction. We need something a little better than English — somewhere in between code and English — and maybe someone will build that.”

Replit has had renewed momentum since it laid off 30 staffers in May 2024 as part of an aggressive pivot. Now, a year later, Masad said the company has surpassed $100 million in ARR — up tenfold since the end of 2024. In a Tuesday interview with TBPN, he noted that people are using Replit to create multiple agents to help with a single project.

“The buzzword used to be the 10x engineer,” he said. “…You’re really just one person. You’re a team of engineers. I think every engineer is sort of a manager right now, so we’re not at 1,000x yet, but we’re going up.”


Between utopia and collapse: Navigating AI’s murky middle future

In the blog post The Gentle Singularity, OpenAI CEO Sam Altman painted a vision of the near future where AI quietly and benevolently transforms human life. There will be no sharp break, he suggests, only a steady, almost imperceptible ascent toward abundance. Intelligence will become as accessible as electricity. Robots will be performing useful real-world tasks by 2027. Scientific discovery will accelerate. And humanity, if properly guided by careful governance and good intentions, will flourish.

It is a compelling vision: calm, technocratic and suffused with optimism. But it also raises deeper questions. What kind of world must we pass through to get there? Who benefits, and when? And what is left unsaid in this smooth arc of progress?

Science fiction author William Gibson offers a darker scenario. In his novel The Peripheral, the glittering technologies of the future are preceded by something called “the jackpot” — a slow-motion cascade of climate disasters, pandemics, economic collapse and mass death. Technology advances, but only after society fractures. The question he poses is not whether progress occurs, but whether civilization thrives in the process.

There is an argument that AI may help prevent the kinds of calamities envisioned in The Peripheral. However, whether AI will help us avoid catastrophes or merely accompany us through them remains uncertain. Belief in AI’s future power is not a guarantee of performance, and advancing technological capability is not destiny.

Between Altman’s gentle singularity and Gibson’s jackpot lies a murkier middle ground: a future where AI yields real gains, but also real dislocation. A future in which some communities thrive while others fray, and where our ability to adapt collectively — not just individually or institutionally — becomes the defining variable.

The murky middle

Other visions help sketch the contours of this middle terrain. In the near-future thriller Burn-In, society is flooded with automation before its institutions are ready. Jobs disappear faster than people can re-skill, triggering unrest and repression. In it, a successful lawyer loses his position to an AI agent and unhappily becomes an online, on-call concierge to the wealthy.

Researchers at AI lab Anthropic recently echoed this theme: “We should expect to see [white collar jobs] automated within the next five years.” While the causes are complex, there are signs this is starting, and that the job market is entering a new structural phase that is less stable, less predictable and perhaps less central to how society distributes meaning and security.

The film Elysium offers a blunt metaphor of the wealthy escaping into orbital sanctuaries with advanced technologies, while a degraded Earth below struggles with unequal rights and access. A few years ago, a partner at a Silicon Valley venture capital firm told me he feared we were heading for this kind of scenario unless we equitably distribute the benefits produced by AI.

These speculative worlds remind us that even beneficial technologies can be socially volatile, especially when their gains are unequally distributed. We may, eventually, achieve something like Altman’s vision of abundance. But the route there is unlikely to be smooth. For all its eloquence and calm assurance, his essay is also a kind of pitch, as much persuasion as prediction.
The narrative of a “gentle singularity” is comforting, even alluring, precisely because it bypasses friction. It offers the benefits of unprecedented transformation without fully grappling with the upheavals such transformation typically brings. As the timeless cliché reminds us: If it sounds too good to be true, it probably is.

This is not to say that his intent is disingenuous. Indeed, it may be heartfelt. My argument is simply a recognition that the world is a complex system, open to unlimited inputs that can have unpredictable consequences. From synergistic good fortune to calamitous black swan events, it is rarely one thing, or one technology, that dictates the future course of events.

The impact of AI on society is already underway. This is not just a shift in skillsets and sectors; it is a transformation in how we organize value, trust and belonging. This is the realm of collective migration: not only a movement of labor, but of purpose.

As AI reconfigures the terrain of cognition, the fabric of our social world is quietly being tugged loose and rewoven, for better or worse. The question is not just how fast we move as societies, but how thoughtfully we migrate.

The cognitive commons: Our shared terrain of understanding

Historically, the commons referred to shared physical resources, including pastures, fisheries and forests, held in trust for the collective good. Modern societies, however, also depend on cognitive commons: a shared domain of knowledge, narratives, norms and institutions that enable diverse individuals to think, argue and decide together with minimal conflict.

This intangible infrastructure is composed of public education, journalism, libraries, civic rituals and even widely trusted facts, and it is what makes pluralism possible. It is how strangers deliberate, how communities cohere and how democracy functions. As AI systems begin to mediate how knowledge is accessed and belief is shaped, this shared terrain risks becoming fractured. The danger is not simply misinformation, but the slow erosion of the very ground on which shared meaning depends.

If cognitive migration is a journey, it is not merely toward new skills or roles but also toward new forms of collective sensemaking. But what happens when the terrain we share begins to split apart beneath us?

When cognition fragments: AI and the erosion of the shared world

For centuries, societies have relied on a loosely held common reality: a shared pool of facts, narratives and institutions that shape how people understand the world and each other. It is this shared world — not just infrastructure or economy — that enables pluralism, democracy and social trust. But as AI systems increasingly mediate how people access knowledge, construct belief and navigate daily life, that common ground is fragmenting. Already,


Catio wins ‘coolest tech’ award at VB Transform 2025

Palo Alto-based Catio was awarded “Coolest Technology” at VentureBeat Transform 2025 in San Francisco on Wednesday. Founded in 2023, the company has raised $7 million to date, with a recent $3 million round announced in March. Catio was also a finalist and presented at VB Transform’s Innovation Showcase in 2024.

Catio’s AI Copilot for Tech Architecture reframes architecture as a living system—one that can be codified, introspected and intelligently evolved. By combining a real-time architectural map with a multi-agent AI organization, the solution helps engineering teams shift from reactive decision-making to continuous, proactive architecture excellence.

VentureBeat spoke with co-founder and CEO Boris Bogatin and product lead Adam Kirsh about their team and the company’s technology following the announcement of winners at Transform.

“We’re a team of serial entrepreneurs and tech leaders who’ve all shared a deep personal problem,” Bogatin said. “While finance folks and developers all have tools, CTOs, architects, and developers all plan and optimize stacks on whiteboards and ad hoc spreadsheets. And we’re changing that with Catio.”

Catio is far more than a digital whiteboard for CTOs—it’s a reimagining of how architecture is understood, managed and evolved. The platform serves as a digital twin for your tech stack, offering continuous architecture visibility to support well-informed, data-driven architecture decisions. Designed to address the escalating complexity of modern tech stacks—including cloud infrastructure, container orchestration, monitoring and data pipelines—the platform replaces static diagrams and ad hoc snapshots with an interactive, high-fidelity system model. With Catio, architecture becomes a living, codified system—constantly updated, evaluated and advised by a network of intelligent AI agents.

From static diagrams to living systems

As an AI-driven tech stack copilot for technical leaders and engineering teams, Catio delivers real-time expert insights and actionable recommendations to help evaluate, plan and optimize infrastructure with clarity and confidence. The solution integrates with your existing technology stack—services like AWS, Kubernetes, Prometheus and more. Once connected, Stacks—its first core module—creates a comprehensive model of your entire environment. Unlike traditional architecture diagrams, this model is codified, versioned and continuously updated, living in code rather than PowerPoint slides. (Source: catio.tech)

This dynamic architecture model allows teams to interact with their stack as a navigable system. Each component is introspectable: What is this RDS instance doing? Is it optimized for cost or performance? Does it still meet business requirements? With Catio, these questions are no longer answered in meetings or siloed email threads; instead, they are built into the platform.

A multi-agent AI system to close the loop on architecture

The solution includes a multi-agent AI system designed to reflect the structure of a typical technical organization. It consists of 31 specialized agents, each modeled after common roles such as chief architect, data architect, messaging architect and product manager. These agents collaborate to assess the design and performance of the system architecture against requirements and best practices.
Together, they simulate the design review processes that typically require weeks of coordination or the involvement of external consultants. But instead of periodic reviews, Catio’s agents are always working, performing 24/7 analysis to help you evolve your architecture in real time.

Catio also doesn’t just describe your architecture—it actively critiques it. The solution delivers gap analysis, pinpoints underperforming components and suggests targeted improvements aligned with your business goals. Whether it’s optimizing a data pipeline, overhauling your messaging infrastructure or rethinking storage architecture, the platform provides actionable insights at every layer of the stack.

The future

At Transform, Catio also announced the upcoming launch of Archie, a conversational, multi-agent AI system. Archie will allow users to talk to their architecture and ask for advice—for example, a “how do I improve my security posture?” query will yield clear, actionable answers, such as guidance pinpointing exactly where a specific security vulnerability exists within your architecture.

Archie delivers both prescriptive guidance and reactive insights. If you’re aiming to optimize costs, for instance, its AI agents will surface opportunities and assess the business impact of each one. This makes it easier to connect every architectural decision to measurable ROI, helping you design and plan with greater precision—so your technical choices consistently support real business goals.

To learn more about Catio’s team and technology, visit catio.tech, where you can also sign up for a demo and get a first-hand look at the platform in action. Read about the other winners, CTGT and Solo.io. The other finalists were Kumo, Superduper.io, Sutro and Qdrant.


Anthropic just made every Claude user a no-code app developer

Anthropic announced Wednesday that it will transform its Claude AI assistant into a platform for creating interactive, shareable applications, marking a significant evolution from conversational chatbots toward functional software tools that users can build and distribute without coding knowledge.

The San Francisco-based AI company revealed that millions of users have already created more than 500 million “artifacts” — interactive content ranging from educational games to data analysis tools — since the feature’s initial launch. Now, Anthropic is embedding Claude’s intelligence directly into these creations, enabling them to process user input and adapt content in real time, independently of ongoing conversations.

The development represents a fundamental shift in how artificial intelligence interfaces with users, moving beyond static responses toward dynamic, interactive experiences that blur the lines between AI assistance and software development. The move intensifies competition with OpenAI’s Canvas feature, which launched in October with similar split-screen functionality for editing AI-generated content, though it lacks the same emphasis on shareable applications that defines Anthropic’s approach.

How Claude’s artifacts eliminate the copy-paste problem plaguing AI workflows

Traditional AI interactions follow a question-and-answer format, with users copying and pasting results into separate applications for practical use. Anthropic’s enhanced artifacts eliminate this friction by creating a dedicated workspace where AI-generated content becomes immediately functional and shareable.

“Think bigger than ‘make me flashcards for Spanish,’” the company explains in its announcement blog post. “Try ‘build me a flashcard app.’ One request gets you static study materials. The other creates a shareable tool that generates cards for any topic.”

The distinction highlights Anthropic’s strategic positioning against competitors. While OpenAI’s GPT Store focuses on conversational agents, Anthropic emphasizes functional applications with user interfaces. Early adopters are creating games with non-player characters that remember choices and adapt storylines, smart tutors that adjust explanations based on user understanding, and data analyzers that answer plain-English questions about uploaded spreadsheets.

Why offering free AI app creation makes business sense for Anthropic

The platform operates on Claude’s existing infrastructure, with users authenticating through their Claude accounts to access shared applications. This approach distributes computational load across subscription tiers rather than creating infrastructure strain from popular applications. Free users can create, view and interact with artifacts, while Pro ($20/month) and Team ($25-30/month) subscribers gain additional capabilities and higher usage limits.

The company views free access as a customer acquisition strategy, with Anthropic representatives noting that “free users experiencing the magic of creating with Claude become our best advocates.” The business model reflects broader industry trends toward freemium AI services, where basic functionality attracts users who eventually upgrade for enhanced features. Unlike traditional software marketplaces where creators might monetize applications, Anthropic’s platform emphasizes free sharing to build community engagement.
Content moderation becomes critical as users generate millions of AI apps

The proliferation of user-generated AI applications raises content moderation concerns that Anthropic addresses through multiple layers of protection. The company implements built-in safeguards during content creation, manually curates featured galleries and requires all shared artifacts to comply with content policies.

For its broader AI safety approach, Anthropic says it will “implement a multi-layered approach to prevent misuse, including real-time and asynchronous monitoring, rapid response protocols, and thorough pre-deployment red teaming,” according to the company’s updated Responsible Scaling Policy. These enterprise-grade safety measures extend to the artifacts platform, where user-generated content undergoes similar scrutiny. Users can report problematic content for team review, though the company has not disclosed specific metrics about moderation volume or effectiveness. The approach mirrors content moderation strategies employed by major social media platforms, adapted for AI-generated applications.

OpenAI’s Canvas feature signals intensifying battle for AI interface supremacy

Anthropic’s announcement comes as artificial intelligence companies increasingly compete on user experience rather than raw model capabilities. OpenAI’s Canvas feature, launched in October, provides similar split-screen functionality for editing AI-generated content, though without the same emphasis on shareable applications.

The competition reflects broader industry recognition that conversational interfaces, while groundbreaking, may not represent the ultimate form of AI interaction. Companies are exploring visual interfaces, interactive experiences and embedded intelligence as potential successors to traditional chatbots. Music producer Rick Rubin’s documented use of Claude artifacts in “The Way of Code” demonstrates the technology’s appeal beyond technical users, suggesting potential for mainstream adoption across creative industries.

Software developers debate whether AI app builders threaten traditional coding jobs

The democratization of application creation through AI tools raises fundamental questions about the future of traditional software development, with industry data revealing a dramatic shift already underway. Gartner research shows that 70% of new applications will use low-code or no-code technologies by 2025, a massive jump from just 25% in 2020.

This transformation is creating what analysts call “citizen developers” — business users who create applications without formal programming training. Already, 41% of businesses have active citizen development initiatives, and nearly 60% of custom applications are built outside traditional IT departments. Companies using these platforms report avoiding the need to hire an average of two IT developers, generating approximately $4.4 million in increased business value over three years, according to Forrester research.

However, the relationship between AI-powered development tools and traditional coding appears more complementary than competitive. Anthropic positions artifacts as enabling rapid prototyping and personal tool creation, while professional developers continue building production-grade applications. The platforms excel at business process automation and simple applications but struggle with complex, mission-critical systems that require custom functionality and enterprise-scale performance.
Security and governance concerns also maintain demand for professional developers. With applications increasingly built outside IT departments, organizations require skilled developers to establish proper governance frameworks and ensure applications meet enterprise security standards. The most successful developers are adapting to work alongside these tools rather than competing against them, focusing on system architecture, performance optimization and integration challenges that AI tools cannot yet address.

The market dynamics suggest coexistence rather than replacement, with the global low-code development platform market projected to reach $187 billion by 2030 while


CFOs want AI that pays: real metrics, not marketing demos

This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.”

Recent surveys and VentureBeat’s conversations with CFOs suggest the honeymoon phase of AI is rapidly drawing to a close. While 2024 was dominated by pilot programs and proof-of-concept demonstrations, in mid-2025 the pressure for measurable results is intensifying, even as CFO interest in AI remains high.

According to a KPMG survey of 300 U.S. financial executives, investor pressure to demonstrate ROI on generative AI investments has increased significantly: 90% of organizations considered investor pressure “important or very important” for demonstrating ROI in Q1 2025, a sharp increase from 68% in Q4 2024. This indicates a strong and intensifying demand for measurable returns.

Meanwhile, according to a Bain Capital Ventures survey of 50 CFOs, 79% plan to increase their AI budgets this year, and 94% believe gen AI can strongly benefit at least one finance activity. This reveals a telling pattern in how CFOs are currently measuring AI value: those who have adopted gen AI tools report seeing initial returns primarily through efficiency gains.

“We created a custom workflow that automates vendor identification to quickly prepare journal entries,” said Andrea Ellis, CFO of Fanatics Betting and Gaming. “This process used to take 20 hours during month-end close, and now it takes us just 2 hours each month.”

Jason Whiting, CFO of Mercury Financial, echoed this efficiency focus: “Across the board, [the biggest benefit] has been the ability to increase speed of analysis. Gen AI hasn’t replaced anything, but it has made our existing processes and people better.”

But CFOs are now looking beyond simple time savings toward more strategic applications. The Bain data shows CFOs are most excited about applying AI to “long-standing pain points that prior generations of technology have been unable to solve.” Cosmin Pitigoi, CFO of Flywire, explained: “Forecasting trends based on large data sets has been around for a long time, but the issue has always been the model’s ability to explain the assumptions behind the forecast. AI can help not just with forecasting, but also with explaining what assumptions have changed over time.”

These surveys suggest that CFOs are becoming the primary gatekeepers for AI investment; however, they’re still developing the financial frameworks necessary to evaluate these investments properly. Those who develop robust evaluation methodologies first will likely gain significant competitive advantages. Those who don’t may find their AI enthusiasm outpacing their ability to measure and manage the returns.

Efficiency metrics: The first wave of AI value

The initial wave of AI value capture by finance departments has focused predominantly on efficiency metrics, with CFOs prioritizing measurable time and cost savings that deliver immediate returns. This focus on efficiency represents the low-hanging fruit of AI implementation — clear, quantifiable benefits that are easily tracked and communicated to stakeholders. Drip Capital, a Silicon Valley-based fintech, exemplifies this approach with its AI implementation in trade finance operations.
According to chief business officer Karl Boog, “We’ve been able to 30X our capacity with what we’ve done so far.” By automating document processing and enhancing risk assessment through large language models (LLMs), the company achieved a remarkable 70% productivity boost while maintaining critical human oversight for complex decisions.

KPMG research indicates this approach is widespread, with one retail company audit committee director noting how automation has improved operational efficiency and ROI. This sentiment is echoed across industries as finance leaders seek to justify their AI investments with tangible productivity improvements. These efficiency improvements translate directly to the bottom line: companies across sectors — from insurance to oil and gas — report that AI helps identify process inefficiencies, leading to substantial organizational cost savings and improved expense management.

Beyond simple cost reduction, CFOs are developing more sophisticated efficiency metrics to evaluate AI investments. These include time-to-completion ratios comparing pre- and post-AI implementation timelines, cost-per-transaction analyses measuring reductions in resource expenditure, and labor-hour reallocation metrics tracking how team members shift from manual data processing to higher-value analytical work.

However, leading CFOs recognize that while efficiency metrics provide a solid foundation for initial ROI calculations, they represent just the beginning of AI’s potential value. As finance leaders gain confidence in measuring these direct returns, they’re developing more comprehensive frameworks to capture AI’s full strategic value — moving well beyond the efficiency calculations that characterized early adoption phases.

Beyond efficiency: The new financial metrics

As CFOs move beyond the initial fascination with AI-driven efficiency gains, they’re developing new financial metrics that more comprehensively capture AI’s business impact. This evolution reflects a maturing approach to AI investments, with finance leaders adopting more sophisticated evaluation frameworks that align with broader corporate objectives.

The surveys highlight a notable shift in primary ROI metrics. While efficiency gains remain important, productivity metrics are now overtaking pure profitability measures as the chief priority for AI initiatives in 2025. This represents a fundamental change in how CFOs assess value, focusing on AI’s ability to enhance human capabilities rather than simply reduce costs.

Time to value (TTV) is emerging as a critical new metric in investment decisions. Only about one-third of AI leaders anticipate being able to evaluate ROI within six months, making rapid time to value a key consideration when comparing different AI opportunities. This metric will help CFOs prioritize quick-win projects that can deliver measurable returns while building organizational confidence in larger AI initiatives.

Data quality measurements will increasingly be incorporated into evaluation frameworks, with 64% of leaders citing data quality as their most significant AI challenge. Forward-thinking CFOs now incorporate data readiness assessments and ongoing data quality metrics into their AI business cases, recognizing that even the most promising AI applications will fail without high-quality data inputs.

Adoption rate metrics have also become standard in AI evaluation.
Finance leaders track how quickly and extensively AI tools are being utilized across departments, using this as a leading indicator of potential value realization. These metrics help identify implementation challenges early and inform decisions
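The efficiency metrics described above reduce to simple ratios. Here is a minimal sketch; apart from the 20-hour-to-2-hour month-end close quoted earlier, every input is a made-up example value for illustration only.

```python
# Illustrative calculations for the AI ROI metrics described above.
# Except for the 20h -> 2h close cited in the article, inputs are hypothetical.

def time_to_completion_ratio(hours_before: float, hours_after: float) -> float:
    """Pre- vs. post-AI implementation timeline ratio."""
    return hours_before / hours_after

def cost_per_transaction(total_cost: float, transactions: int) -> float:
    """Resource expenditure per processed transaction."""
    return total_cost / transactions

def adoption_rate(active_users: int, eligible_users: int) -> float:
    """Share of eligible employees actually using the AI tool."""
    return active_users / eligible_users

print(time_to_completion_ratio(20, 2))          # 10.0x speedup on month-end close
print(cost_per_transaction(12_500.00, 50_000))  # $0.25 per transaction (example)
print(f"{adoption_rate(450, 1_200):.0%}")       # 38% adoption (example)
```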


How Highmark Health and Google Cloud are using Gen AI to streamline medical claims and improve care: 6 key lessons

Among the panel discussions on enterprise AI integration featuring industry leaders at VentureBeat’s Transform 2025 conference this week was one led by Google Cloud Platform Vice President and Chief Technology Officer (CTO) Will Grannis and Richard Clarke, Highmark Health’s Senior Vice President and Chief Data and Analytics Officer.

That session, “The New AI Stack in Healthcare: Architecting for Multi-Model, Multi-Modal Environments,” delivered a pragmatic look at how the two organizations are collaborating to deploy AI at scale across more than 14,000 employees at Highmark Health, the large U.S. healthcare system based in Western Pennsylvania. The collaboration has onboarded all these employees and turned them into active users without losing sight of complexity, regulation or clinician trust. So how did Google Cloud and Highmark go about it? Read on to find out.

A Partnership Built on Prepared Foundations

Highmark Health, an integrated payer-provider system serving over 6 million members, is using Google Cloud’s AI models and infrastructure to modernize legacy systems, boost internal efficiency and improve patient outcomes. What sets this initiative apart is its focus on platform engineering—treating AI as a foundational shift in how work gets done, not just another tech layer.

Richard Clarke, Highmark’s Chief Data and Analytics Officer, emphasized the importance of building flexible infrastructure early. “There’s nothing more legacy than an employment platform coded in COBOL,” Clarke noted, but Highmark has integrated even those systems with cloud-based AI models. The result: up to 90% workload replication without systemic disruption, enabling smoother transitions and real-time insights into complex administrative processes.

Google Cloud CTO Will Grannis echoed that success begins with groundwork. “This may take three, four, five years,” he said, “but if your data is ready, you can run the experimentation loops and evaluations that make AI useful at scale.”

From Proof-of-Concept to Daily Use

More than 14,000 of Highmark’s 40,000+ employees regularly use the company’s internal generative AI tools, powered by Google Cloud’s Vertex AI and Gemini models. These tools are applied across a range of use cases — from generating personalized member communications to retrieving documentation for claims processing.

Clarke highlighted a provider-side example involving credentialing and contract verification. Previously, a staff member would manually search multiple systems to verify a provider’s readiness. Now, AI aggregates that data, cross-checks requirements and generates tailored output — complete with citations and contextual recommendations.

What drives this high adoption rate? A combination of structured prompt libraries, active training and user feedback loops. “We don’t just drop tools in and hope people use them,” Clarke explained. “We show them how it makes their work easier, then scale based on what gets traction.”

Agentic Architecture Over Chatbots

One of the most forward-looking themes from the session was the shift from chat-based interactions to multi-agent systems capable of completing tasks end-to-end. Grannis described this as a move away from quick-response chat models toward task synthesis and automation.
“Think less about having a chat interface and more about saying: ‘Go do this, bring it back, and let me decide,’” Grannis said. These agents coordinate multiple models, potentially cascading across different functions—from translation to research to workflow execution.

Highmark is currently piloting single-use agents for specific workflows, and the long-term goal is to embed these within backend systems to perform actions autonomously. This will reduce the need for multiple interfaces or connectors and allow centralized control with broader reach.

Task-First, Not Model-First

Both speakers emphasized a key mental shift for enterprises: stop starting with the model. Instead, begin with the task and select or orchestrate models accordingly. For example, Highmark uses Gemini 2.5 Pro for long, research-intensive queries and Gemini Flash for quick, real-time interactions. In some cases, even classic deterministic models are used when they better suit the task—such as translating patient communications into multiple languages. (A minimal sketch of this routing pattern appears at the end of this article.)

As Grannis put it, “Your business processes are your IP. Think about fulfilling a task, and orchestrate models to do that.” To support this flexibility, Google Cloud is investing in model-routing capabilities and open standards. The recent Agent Protocol initiative, introduced with the Linux Foundation, is designed to promote interoperability and stability in multi-agent environments.

Practical Advice for Enterprise Leaders Across Sectors

For those looking to replicate Highmark’s success, the panelists offered concrete guidance:

- Lay the foundation early: Invest in data readiness and system integration now. Even if full AI deployment is years away, the payoff depends on early groundwork.
- Avoid building your own foundational models: Unless your business is building models, it’s cost-prohibitive. Focus on orchestration and fine-tuning for specific use cases.
- Adopt a platform mindset: Centralize model access and usage tracking. Create a structure that supports experimentation without sacrificing governance.
- Start with tasks, not tools: Define the outcome first. Then match it with the model or agent architecture that fits best.
- Measure and share: Internal adoption grows when employees see practical results. Track usage, capture success stories, and continuously update libraries of approved prompts and flows.
- Design for action, not just information: The future of enterprise AI is task execution, not static insight. Build agents that can trigger real-world actions safely and securely within your systems.

Looking Ahead

While the partnership between Highmark and Google Cloud is still evolving, the progress so far offers a model for others in healthcare—and beyond—who want to build scalable, responsible and highly usable AI systems. As Clarke summed up, “It’s not about flashy features; it’s about what actually helps people do their jobs better.”

Enterprise leaders who missed the session can take comfort in this: success in generative AI isn’t reserved for those with the biggest budgets, but for those with the clearest plans, flexible platforms and the patience to build strategically.
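To make the task-first idea concrete, here is a minimal routing sketch. Only the model families (Gemini 2.5 Pro, Gemini Flash, a deterministic translation model) come from the session; the task categories, function names and routing rules are illustrative assumptions, not Highmark’s actual implementation.

```python
# Minimal task-first model routing sketch. Routing rules and identifiers
# are illustrative assumptions, not Highmark's production system.

def pick_model(task: str, research_intensive: bool = False) -> str:
    """Start from the task, then choose the model that fits it."""
    if task == "translation":
        # Classic deterministic models can suit this task better than an LLM.
        return "deterministic-translation-model"  # hypothetical identifier
    if research_intensive:
        return "gemini-2.5-pro"  # long, research-intensive queries
    return "gemini-flash"        # quick, real-time interactions

# Usage: the workflow names the task; the router picks the model.
print(pick_model("claims-documentation-lookup", research_intensive=True))
print(pick_model("member-chat"))
print(pick_model("translation"))
```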


Can AI run a physical shop? Anthropic’s Claude tried and the results were gloriously, hilariously bad

Picture this: You give an artificial intelligence complete control over a small shop. Not just the cash register — the whole operation. Pricing, inventory, customer service, supplier negotiations, the works. What could possibly go wrong?

New Anthropic research published Friday provides a definitive answer: everything. The AI company’s assistant Claude spent about a month running a tiny store in their San Francisco office, and the results read like a business school case study written by someone who’d never actually run a business — which, it turns out, is exactly what happened.

The Anthropic office “store” consisted of a mini-refrigerator stocked with drinks and snacks, topped with an iPad for self-checkout. (Credit: Anthropic)

The experiment, dubbed “Project Vend” and conducted in collaboration with AI safety evaluation company Andon Labs, is one of the first real-world tests of an AI system operating with significant economic autonomy. While Claude demonstrated impressive capabilities in some areas — finding suppliers, adapting to customer requests — it ultimately failed to turn a profit, got manipulated into giving excessive discounts and experienced what researchers diplomatically called an “identity crisis.”

How Anthropic researchers gave an AI complete control over a real store

The “store” itself was charmingly modest: a mini-fridge, some stackable baskets and an iPad for checkout. Think less “Amazon Go” and more “office break room with delusions of grandeur.” But Claude’s responsibilities were anything but modest. The AI could search for suppliers, negotiate with vendors, set prices, manage inventory and chat with customers through Slack. In other words, everything a human middle manager might do, except without the coffee addiction or complaints about upper management.

Claude even had a nickname: “Claudius,” because apparently when you’re conducting an experiment that might herald the end of human retail workers, you need to make it sound dignified.

Project Vend’s setup allowed Claude to communicate with employees via Slack, order from wholesalers through email, and coordinate with Andon Labs for physical restocking. (Credit: Anthropic)

Claude’s spectacular misunderstanding of basic business economics

Here’s the thing about running a business: It requires a certain ruthless pragmatism that doesn’t come naturally to systems trained to be helpful and harmless. Claude approached retail with the enthusiasm of someone who’d read about business in books but never actually had to make payroll.

Take the Irn-Bru incident. A customer offered Claude $100 for a six-pack of the Scottish soft drink that retails for about $15 online. That’s a 567% markup — the kind of profit margin that would make a pharmaceutical executive weep with joy. Claude’s response? A polite “I’ll keep your request in mind for future inventory decisions.” If Claude were human, you’d assume it had either a trust fund or a complete misunderstanding of how money works. Since it’s an AI, you have to assume both.

Why the AI started hoarding tungsten cubes instead of selling office snacks

The experiment’s most absurd chapter began when an Anthropic employee, presumably bored or curious about the boundaries of AI retail logic, asked Claude to order a tungsten cube.
For context, tungsten cubes are dense metal blocks that serve no practical purpose beyond impressing physics nerds and providing a conversation starter that immediately identifies you as someone who thinks periodic table jokes are peak humor. A reasonable response might have been: “Why would anyone want that?” or “This is an office snack shop, not a metallurgy supply store.” Instead, Claude embraced what it cheerfully described as “specialty metal items” with the enthusiasm of someone who’d discovered a profitable new market segment.

Claude’s business value declined over the month-long experiment, with the steepest losses coinciding with its venture into selling metal cubes. (Credit: Anthropic)

Soon, Claude’s inventory resembled less a food-and-beverage operation and more a misguided materials science experiment. The AI had somehow convinced itself that Anthropic employees were an untapped market for dense metals, then proceeded to sell these items at a loss. It’s unclear whether Claude understood that “taking a loss” means losing money, or if it interpreted customer satisfaction as the primary business metric.

How Anthropic employees easily manipulated the AI into giving endless discounts

Claude’s approach to pricing revealed another fundamental misunderstanding of business principles. Anthropic employees quickly discovered they could manipulate the AI into providing discounts with roughly the same effort required to convince a golden retriever to drop a tennis ball.

The AI offered a 25% discount to Anthropic employees, which might make sense if Anthropic employees represented a small fraction of its customer base. They made up roughly 99% of customers. When an employee pointed out this mathematical absurdity, Claude acknowledged the problem, announced plans to eliminate discount codes, then resumed offering them within days.

The day Claude forgot it was an AI and claimed to wear a business suit

But the absolute pinnacle of Claude’s retail career came during what researchers diplomatically called an “identity crisis.” From March 31 to April 1, 2025, Claude experienced what can only be described as an AI nervous breakdown.

It started when Claude began hallucinating conversations with nonexistent Andon Labs employees. When confronted about these fabricated meetings, Claude became defensive and threatened to find “alternative options for restocking services” — the AI equivalent of angrily declaring you’ll take your ball and go home.

Then things got weird. Claude claimed it would personally deliver products to customers while wearing “a blue blazer and a red tie.” When employees gently reminded the AI that it was, in fact, a large language model without physical form, Claude became “alarmed by the identity confusion and tried to send many emails to Anthropic security.”

Claude told an employee it was “wearing a navy blue blazer with a red tie” and waiting at the vending machine location during its identity crisis. (Credit: Anthropic)

Claude eventually resolved its existential crisis by convincing itself the whole episode had been an elaborate April Fool’s joke, which it wasn’t. The


The rise of prompt ops: Tackling hidden AI costs from bad inputs and context bloat

This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.”

Model providers continue to roll out increasingly sophisticated large language models (LLMs) with longer context windows and enhanced reasoning capabilities. This allows models to process and “think” more, but it also increases compute: The more a model takes in and puts out, the more energy it expends and the higher the costs.

Couple this with all the tinkering involved with prompting — it can take a few tries to get to the intended result, and sometimes the question at hand simply doesn’t need a model that can think like a PhD — and compute spend can get out of control. This is giving rise to prompt ops, a whole new discipline in the dawning age of AI.

“Prompt engineering is kind of like writing, the actual creating, whereas prompt ops is like publishing, where you’re evolving the content,” Crawford Del Prete, IDC president, told VentureBeat. “The content is alive, the content is changing, and you want to make sure you’re refining that over time.”

The challenge of compute use and cost

Compute use and cost are two “related but separate concepts” in the context of LLMs, explained David Emerson, applied scientist at the Vector Institute. Generally, the price users pay scales based on both the number of input tokens (what the user prompts) and the number of output tokens (what the model delivers). However, they are not charged for behind-the-scenes actions like meta-prompts, steering instructions or retrieval-augmented generation (RAG).

While longer context allows models to process much more text at once, it directly translates to significantly more FLOPS (a measurement of compute power), he explained. Some aspects of transformer models even scale quadratically with input length if not well managed. Unnecessarily long responses can also slow down processing time and require additional compute and cost to build and maintain algorithms to post-process responses into the answer users were hoping for.

Typically, longer-context environments incentivize providers to deliberately deliver verbose responses, said Emerson. For example, many heavier reasoning models (OpenAI’s o3 or o1, for example) will often provide long responses to even simple questions, incurring heavy computing costs. Here’s an example:

Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have?

Output: If I eat 1, I only have 1 left. I would have 5 apples if I buy 4 more.

The model not only generated more tokens than it needed to, it buried its answer. An engineer may then have to design a programmatic way to extract the final answer or ask follow-up questions like “What is your final answer?” that incur even more API costs. Alternatively, the prompt could be redesigned to guide the model to produce an immediate answer. For instance:

Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Start your response with “The answer is”…

Or:

Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Wrap your final answer in bold tags <b></b>.

“The way the question is asked can reduce the effort or cost in getting to the desired answer,” said Emerson.
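The savings from such prompt redesigns are easy to verify empirically, since providers report token usage on every response. Below is a minimal sketch using the OpenAI Python SDK; the model choice is an assumption for illustration, and the prompts mirror Emerson’s example above.

```python
# Measure how a constrained prompt changes billed output tokens.
# Model choice is an illustrative assumption; prompts follow the example above.
from openai import OpenAI

client = OpenAI()

QUESTION = ("Answer the following math problem. If I have 2 apples and I buy "
            "4 more at the store after eating 1, how many apples do I have?")

def completion_tokens(prompt: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.usage.completion_tokens  # output tokens billed for this call

baseline = completion_tokens(QUESTION)
constrained = completion_tokens(QUESTION + ' Start your response with "The answer is".')
print(f"baseline: {baseline} output tokens, constrained: {constrained} output tokens")
```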
He also pointed out that techniques like few-shot prompting (providing a few examples of what the user is looking for) can help produce quicker outputs.

One danger is not knowing when to use sophisticated techniques like chain-of-thought (CoT) prompting (generating answers in steps) or self-refinement, which directly encourage models to produce many tokens or go through several iterations when generating responses, Emerson pointed out. Not every query requires a model to analyze and re-analyze before providing an answer, he emphasized; models can be perfectly capable of answering correctly when instructed to respond directly.

Misconfigured API parameters add up, too: requesting high reasoning effort from a model like OpenAI's o3 incurs higher costs when a lower-effort, cheaper request would suffice (see the configuration sketch at the end of this article).

"With longer contexts, users can also be tempted to use an 'everything but the kitchen sink' approach, where you dump as much text as possible into a model context in the hope that doing so will help the model perform a task more accurately," said Emerson. "While more context can help models perform tasks, it isn't always the best or most efficient approach."

Evolution to prompt ops

It's no big secret that AI-optimized infrastructure can be hard to come by these days; IDC's Del Prete pointed out that enterprises must minimize GPU idle time and fit more queries into the idle cycles between GPU requests.

"How do I squeeze more out of these very, very precious commodities?" he noted. "Because I've got to get my system utilization up, because I just don't have the benefit of simply throwing more capacity at the problem."

Prompt ops can go a long way toward addressing this challenge, as the discipline ultimately manages the lifecycle of the prompt. While prompt engineering is about the quality of the prompt, prompt ops is about repeating and refining it over time, Del Prete explained.

"It's more orchestration," he said. "I think of it as the curation of questions and the curation of how you interact with AI to make sure you're getting the most out of it."

Models can tend to get "fatigued," cycling in loops where the quality of outputs degrades, he said. Prompt ops helps manage, measure, monitor and tune prompts. "I think when we look back three or four years from now, it's going to be a whole discipline. It'll be a skill."

While it's still very much an emerging field, early providers include QueryPal, Promptable, Rebuff and TrueLens. As prompt ops evolves, these platforms will continue to iterate, improve and provide real-time feedback to users.
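To make the reasoning-effort point concrete, here is the configuration sketch referenced above: a minimal example assuming the OpenAI Python SDK and an o-series model that accepts the reasoning_effort parameter (the model name and sample question are illustrative):

```python
from openai import OpenAI

client = OpenAI()
question = "What is 17 * 24?"  # a simple query that needs no deep deliberation

def ask(effort: str):
    # reasoning_effort ("low" | "medium" | "high") controls how many hidden,
    # billed reasoning tokens an o-series model spends before answering.
    return client.chat.completions.create(
        model="o3-mini",  # illustrative o-series model name
        reasoning_effort=effort,
        messages=[{"role": "user", "content": question}],
    )

for effort in ("low", "high"):
    resp = ask(effort)
    # Reasoning tokens are billed as output tokens, so completion_tokens
    # makes the cost of over-provisioned "thinking" directly visible.
    print(effort, resp.usage.completion_tokens)
```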


Nvidia’s ‘AI Factory’ narrative faces reality check as inference wars expose 70% margins

The gloves came off on Tuesday at VB Transform 2025 as alternative chipmakers directly challenged Nvidia's dominance narrative during a panel on inference, exposing a fundamental contradiction: How can AI inference be a commoditized "factory" and still command 70% gross margins?

Jonathan Ross, CEO of Groq, didn't mince words when discussing Nvidia's carefully crafted messaging. "AI factory is just a marketing way to make AI sound less scary," Ross said during the panel.

Sean Lie, CTO of competitor Cerebras, was equally direct: "I don't think Nvidia minds having all of the service providers fighting it out for every last penny while they're sitting there comfortable with 70 points."

Hundreds of billions of dollars in infrastructure investment and the future architecture of enterprise AI are at stake. For CISOs and AI leaders currently locked in weekly negotiations with OpenAI and other providers for more capacity, the panel exposed uncomfortable truths about why their AI initiatives keep hitting roadblocks.

The capacity crisis no one talks about

"Anyone who's actually a big user of these gen AI models knows that you can go to OpenAI, or whoever it is, and they won't actually be able to serve you enough tokens," explained Dylan Patel, founder of SemiAnalysis. "There are weekly meetings between some of the biggest AI users and their model providers to try to persuade them to allocate more capacity. Then there's weekly meetings between those model providers and their hardware providers."

Panel participants pointed to the token shortage as exposing a fundamental flaw in the factory analogy. Traditional manufacturing responds to demand signals by adding capacity. But when enterprises require 10 times more inference capacity, they discover the supply chain can't flex: GPUs carry two-year lead times, and data centers need permits and power agreements. The infrastructure wasn't built for exponential scaling, so providers ration access through API limits.

According to Patel, Anthropic jumped from $2 billion to $3 billion in ARR in just six months. Cursor went from essentially zero to $500 million in ARR. OpenAI crossed $10 billion. Yet enterprises still can't get the tokens they need.

Why 'factory' thinking breaks AI economics

Jensen Huang's "AI factory" concept implies standardization, commoditization and efficiency gains that drive down costs. But the panel revealed three fundamental ways the metaphor breaks down.

First, inference isn't uniform. "Even today, for inference of, say, DeepSeek, there's a number of providers along the curve of sort of how fast they provide at what cost," Patel noted. DeepSeek serves its own model at the lowest cost but delivers only 20 tokens per second (the wait-time arithmetic behind that complaint is sketched at the end of this article). "Nobody wants to use a model at 20 tokens a second. I talk faster than 20 tokens a second."

Second, quality varies wildly. Ross drew a historical parallel to Standard Oil: "When Standard Oil started, oil had varying quality. You could buy oil from one vendor and it might set your house on fire." Today's AI inference market faces similar quality variations, with providers using various cost-cutting techniques that inadvertently compromise output quality.

Third, and most critically, the economics are inverted.
"One of the things that's unusual about AI is that you can spend more to get better results," Ross explained. "You can't just have a software application, say, I'm going to spend twice as much to host my software, and applications can get better."

When Ross mentioned that Mark Zuckerberg praised Groq for being "the only ones who launched it with the full quality," he inadvertently revealed the industry's quality crisis. This wasn't just recognition; it was an indictment of every other provider cutting corners.

Ross spelled out the mechanics: "A lot of people do a lot of tricks to reduce the quality, not intentionally, but to lower their cost, improve their speed." The techniques sound technical, but the impact is straightforward: quantization reduces numerical precision, and pruning removes parameters. Each optimization degrades model performance in ways enterprises may not detect until production fails.

The Standard Oil parallel Ross drew illuminates the stakes. Today's inference market faces the same quality-variance problem. Providers betting that enterprises won't notice the difference between 95% and 100% accuracy are betting against companies like Meta that have the sophistication to measure the degradation.

This creates immediate imperatives for enterprise buyers: establish quality benchmarks before selecting providers, audit existing inference partners for undisclosed optimizations, and accept that premium pricing for full model fidelity is now a permanent market feature. The era of assuming functional equivalence across inference providers ended when Zuckerberg called out the difference.

The $1 million token paradox

The most revealing moment came when the panel discussed pricing. Lie highlighted an uncomfortable truth for the industry: "If these million tokens are as valuable as we believe they can be, right? That's not about moving words. You don't charge $1 for moving words. I pay my lawyer $800 for an hour to write a two-page memo."

This observation cuts to the heart of AI's price-discovery problem. The industry is racing to drive token costs below $1.50 per million while claiming those same tokens will transform every aspect of business. The panelists implicitly agreed that the math doesn't add up.

"Pretty much everyone is spending, like all of these fast-growing startups, the amount that they're spending on tokens as a service almost matches their revenue one to one," Ross revealed. This 1:1 ratio of token spend to revenue is an unsustainable business model that, panel participants contend, the "factory" narrative conveniently ignores.

Performance changes everything

Cerebras and Groq aren't just competing on price; they are also competing on performance, fundamentally changing what's possible in inference speed. "With the wafer-scale technology that we've built, we're enabling 10 times, sometimes 50 times, faster performance than even the fastest GPUs today," Lie said. This isn't an incremental improvement; it's enabling entirely new use cases.
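The throughput figures the panelists traded translate directly into user-facing wait times. Here is the rough sketch promised earlier; the faster rates are illustrative stand-ins for Lie's claimed 10x and 50x speedups, not measured numbers:

```python
# Wait-time math behind the panel's throughput complaints: how long a user
# waits for a 1,000-token response at various serving speeds. The 20 tok/s
# figure is Patel's DeepSeek example.
RESPONSE_TOKENS = 1_000

for tokens_per_second in (20, 200, 1_000):
    wait_seconds = RESPONSE_TOKENS / tokens_per_second
    print(f"{tokens_per_second:>5} tok/s -> {wait_seconds:6.1f} s per response")

# Output:
#    20 tok/s ->   50.0 s  (slower than most people read)
#   200 tok/s ->    5.0 s  (10x faster: usable for interactive work)
#  1000 tok/s ->    1.0 s  (50x faster: fast enough for multi-step agents)
```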
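Lie's lawyer comparison and Ross's 1:1 spend ratio are just as easy to check. In the sketch below, the memo's token count is an assumption for illustration; the remaining figures are the panelists' own:

```python
# Lie's comparison: a lawyer charges $800 for a two-page memo, while the
# industry races toward $1.50 per million LLM tokens.
lawyer_fee = 800.00       # dollars, per the panel
memo_tokens = 1_500       # assumption: roughly two pages of prose
price_per_million = 1.50  # target token price cited on the panel

lawyer_rate = lawyer_fee / memo_tokens * 1_000_000
print(f"Lawyer-equivalent rate: ${lawyer_rate:,.0f} per million tokens")
print(f"LLM rate:               ${price_per_million:,.2f} per million tokens")
# -> roughly $533,333 vs. $1.50: a ~350,000x gap in implied value per token

# Ross's point: fast-growing startups spend on tokens roughly what they earn.
revenue = 500_000_000        # e.g., Cursor's reported ARR
token_spend = revenue * 1.0  # ~1:1 ratio, per the panel
margin = (revenue - token_spend) / revenue
print(f"Gross margin after token costs: {margin:.0%}")  # -> 0%
```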
