VentureBeat

Stack Overflow data reveals the hidden productivity tax of ‘almost right’ AI code

More developers than ever before are using AI tools to both assist and generate code. While enterprise AI adoption accelerates, new data from Stack Overflow’s 2025 Developer Survey exposes a critical blind spot: the mounting technical debt created by AI tools that generate “almost right” solutions, potentially undermining the productivity gains they promise to deliver. Stack Overflow’s annual developer survey is one of the largest such reports in any given year. In 2024, the report found that developers were not worried that AI would steal their jobs. Somewhat ironically, Stack Overflow was initially negatively impacted by the growth of gen AI, with declining traffic and resulting layoffs in 2023. The 2025 survey of over 49,000 developers across 177 countries reveals a troubling paradox in enterprise AI adoption. AI usage continues climbing—84% of developers now use or plan to use AI tools, up from 76% in 2024. Yet trust in these tools has cratered. “One of the most surprising findings was a significant shift in developer preferences for AI compared to previous years; while most developers use AI, they like it less and trust it less this year,” Erin Yepis, Senior Analyst for Market Research and Insights at Stack Overflow, told VentureBeat. “This response is surprising because with all of the investment in and focus on AI in tech news, I would expect that the trust would grow as the technology gets better.” The numbers tell the story. Only 33% of developers trust AI accuracy in 2025, down from 43% in 2024 and 42% in 2023. AI favorability dropped from 77% in 2023 to 72% in 2024 to just 60% this year. But the survey data reveals a more urgent concern for technical decision-makers. Developers cite “AI solutions that are almost right, but not quite” as their top frustration—66% report this problem. Meanwhile, 45% say debugging AI-generated code takes more time than expected. AI tools promise productivity gains but may actually create new categories of technical debt.

The ‘almost right’ phenomenon disrupts developer workflows

AI tools don’t just produce obviously broken code. They generate plausible solutions that require significant developer intervention to become production-ready. This creates a particularly insidious productivity problem. “AI tools seem to have a universal promise of saving time and increasing productivity, but developers are spending time addressing the unintended breakdowns in the workflow caused by AI,” Yepis explained. “Most developers say AI tools do not address complexity; only 29% believed AI tools could handle complex problems this year, down from 35% last year.” Unlike obviously broken code that developers quickly identify and discard, “almost right” solutions demand careful analysis. Developers must understand what’s wrong and how to fix it. Many report it would be faster to write the code from scratch than to debug and correct AI-generated solutions. The workflow disruption extends beyond individual coding tasks.
The survey found 54% of developers use six or more tools to complete their jobs. This adds context-switching overhead to an already complex development process. Enterprise governance frameworks trail behind adoption Rapid AI adoption has outpaced enterprise governance capabilities. Organizations now face potential security and technical debt risks they haven’t fully addressed. “Vibe coding requires a higher level of trust in the AI’s output, and sacrifices confidence and potential security concerns in the code for a faster turnaround,” Ben Matthews, Senior Director of Engineering at Stack Overflow, told VentureBeat. Developers largely reject vibe coding for professional work, with 77% noting that it’s not part of their professional development process. Yet the survey reveals gaps in how enterprises manage AI-generated code quality. Matthews warns that AI coding tools powered by LLMs can and do produce mistakes. He noted that while knowledgeable developers are able to identify and test vulnerable code themselves, LLMs are sometimes simply unable to even register any mistakes they may produce. Security risks compound these quality issues. The survey data shows that when developers would still turn to humans for coding help, 61.7% cite “ethical or security concerns about code” as a key reason. This suggests that AI tools introduce integration challenges around data access, performance and security that organizations are still learning to manage. Developers still use Stack Overflow and other human sources of expertise Despite declining trust, developers aren’t abandoning AI tools. They’re developing more sophisticated strategies for integrating them into workflows. The survey shows 69% of developers spent time learning new coding techniques or programming languages in the past year. Of these, 44% used AI-enabled tools for learning, up from 37% in 2024. Even with the rise of vibe coding and AI, the survey data shows that developers maintain strong connections to human expertise and community resources. Stack Overflow remains the top community platform at 84% usage. GitHub follows at 67% and YouTube at 61%. Most tellingly, 89% of developers visit Stack Overflow multiple times per month. Among these, 35% turn to the platform specifically after encountering issues with AI responses. “Although we have seen a decline in traffic, in no way is it as dramatic as some would indicate,” Jody Bailey, Chief Product & Technology Officer, told VentureBeat. That said, Bailey did admit that times change and the day-to-day needs of users are not the same as they were 16 years ago when Stack Overflow got started. He noted that there is not a single site or company not seeing a shift in where users come from or how they are now engaging with gen AI tools. That shift is causing Stack Overflow to critically reassess how it gauges success in


C8 Health started with an AI that gives anesthesiologists guidance on demand — now it’s targeting whole hospitals

Medicine is one of the most highly regulated fields in the world, and for good reason — the difference between doing a process correctly and incorrectly can often be that of life or death. But think of the many people involved in providing care at hospitals: it’s not just doctors and nurses, but also the entire medical support staff who handle patient records, equipment, and dispose of medical waste. They all need to be following the rules and best practices to ensure the hospital remains a safe and healthy place to work, administer, and receive care. Yet each job duty and department has its own list of guidance and best practices to follow — often siloed away in different applications like SharePoint, SmartVault, Docuware or others. For New York City AI startup C8 Health, that disconnect isn’t just an inconvenience — it’s a $345 billion problem. “When I began practicing medicine, I was shocked by how hard it was to access the information I needed,” said Dr. Ido Zamberg, an anesthesiologist and C8’s Chief Medical Officer, in a recent video call interview with VentureBeat. “In a field so knowledge-intensive, it felt absurd to have to search across 10 or 15 different systems just to find an answer.” Now, the New York-based startup that initially focused on providing anesthesiologists with knowledge to put people under is betting that its AI-powered chatbot and knowledge platform, built to bring structure and immediacy to clinical best practices, can solve it. “Our platform ensures knowledge is always accessible — whether on mobile, desktop, or within the Electronic Medical Records (EMR) — so clinicians don’t waste time hunting across 20 systems,” said CEO and co-founder Galia Rosen Schwarz in the same interview. Backed by a fresh $12 million Series A round led by Team8, with 10D and Vertex Ventures also participating, the company plans to scale up deployments and broaden its reach across the U.S. healthcare system.

Funding to expand

The newly announced funding brings C8 Health’s total raised to $18 million. According to Rosen Schwarz, the capital will go toward expanding the team, refining the product, and meeting the growing demand among hospital systems seeking a better way to deliver on their own standards of care. Founded in 2022, the company is focused squarely on one of the most persistent challenges in healthcare: ensuring that evidence-based best practices actually make it into the hands of those delivering care — regardless of whether they’re on night shift, rotating in from another facility, or just starting their residency. “In many hospitals, protocols are still printed out and taped to the walls,” said Zamberg. “Stakeholders know no one has time to dig through software to find them. It’s that rudimentary.” Zamberg, a trained physician and former software engineer, built the first version of C8 Health as a workaround to this challenge in his own department. But he and the company’s other leadership saw the problem went far beyond the anesthesiology department.
As Rosen Schwarz put it: “I interviewed over 100 health professionals in the U.S. and Switzerland, and it became clear how massive this problem was — the impact on both providers and patients was undeniable.” Little surprise, then, that C8’s application quickly spread, first to 13,000 employees across five hospitals in Switzerland, and then on to more than 100 hospitals across the U.S., with clients including Dartmouth Health, Mount Sinai, MetroHealth, and the University of Texas Medical Branch.

A Red Panda avatar provides suggestions without prompting

C8 Health’s platform aims to centralize every piece of clinical guidance—policies, protocols, guidelines, educational materials—and make it instantly accessible in a format tailored to the clinician’s role, department, and even daily schedule. The system is available via mobile, desktop, and directly within EMRs, allowing hospital staff to engage with it in the flow of their work. A friendly red panda avatar serves up the knowledge from the organization’s own siloed databases, complete with citations to the underlying knowledge sources, files, and documents. The system doesn’t just wait for users to query. Based on behavioral patterns, schedule data, and institutional context, it can proactively surface relevant protocols before a scheduled procedure, or deliver targeted quality reminders if an individual’s performance metrics are slipping. “We don’t just let users search,” Rosen Schwarz explained. “We proactively push the right content to the right person, at the right time—based on their role, behavior, and what others like them are doing.”

Early impact earns rave reviews

In each deployment, the company reports clinician adoption rates above 90% within three to six months—a notable achievement in a sector where new software tools often struggle to gain traction. At MetroHealth, Dr. Luis Tollinche, Chair of Anesthesiology, described the state of affairs before C8 as a mess of six different protocol locations—email threads, shared drives, and policy databases among them. “We had protocols scattered across six different locations—emails, shared drives, policy databases, even cognitive aids in the EHR,” he wrote in a quote provided to VentureBeat by C8. “When clinicians needed guidance, they often couldn’t remember where to find it, or simply gave up trying. We needed a single, reliable source that made our best practices instantly accessible at the point of care.” After deployment, over 750 knowledge items were centralized, daily engagement hit 3.49 views per user, and nearly 90% of that activity came from mobile devices.

Real-time feedback and performance tracking

One of C8’s defining features is its ability to integrate performance data into the same experience clinicians use to access knowledge. Through dashboards


Digital Twins and AI: Powering the future of creativity at Nestlé

Presented by NVIDIA and Microsoft

Consumer preferences can shift in the blink of an eye in the fast-paced retail landscape, forcing brands to do more than just keep up — they need to stand out. Accomplishing that requires not just creativity but also deeply personalized marketing strategies that resonate and evolve with every customer interaction. Many marketers now use AI for content generation, data-driven personalization, and campaign optimization. The pace of technology adoption is picking up speed across the retail and consumer packaged goods sectors, transforming the way brands connect, operate, and compete. NVIDIA and Microsoft are helping companies in these sectors, including food and beverage leader Nestlé, leverage AI and 3D digital twins — virtual models of physical objects, processes, or environments — to transform creative workflows and drive marketing innovation. Powered by NVIDIA Omniverse libraries on Microsoft Azure, OpenUSD (Universal Scene Description), and integration from Accenture Song, marketers have a technical foundation that enables real-time collaboration, interactive experiences, and 3D asset management.

Using digital twins to transform marketing

The digital transformation of retail marketing is progressing quickly, with global revenue for AI in marketing projected to reach $47 billion this year and soar above $107 billion by 2028. Meanwhile, brands leveraging digital twins are reporting up to 30% faster content production cycles and significant cost savings in creative development. These dynamic models are unlocking new possibilities across marketing and creative workflows, including:

Content creation: Digital twins can generate high-quality, photo-realistic marketing content at scale by automating repetitive tasks and enabling faster iterations on creative campaigns.

Predictive analytics: By connecting digital twins with data sources, businesses can gain insights into customer behavior, predict trends, and optimize marketing strategies.

Product configuration: Interactive digital twins of products allow customers to customize and visualize products in real time, for improved online shopping experiences and targeted marketing campaigns.

Immersive customer experiences: Interactive digital twins also can be streamed to augmented and extended reality devices, offering immersive experiences in which consumers can explore products.

These uses of digital twins can produce several beneficial effects in an enterprise’s marketing efforts, including cost savings, a better ability to quickly create campaigns and products that keep up with shifting customer demands, and greater effectiveness in personalizing efforts.

How Nestlé has embraced tech-forward creative strategies

NVIDIA, Microsoft, and Accenture Song already are helping Nestlé, the world’s largest food and beverage company, to use digital twins, AI, and a range of related tools at scale to empower tech-forward marketing strategies. This collaboration is launching an AI-powered, in-house service to create high-quality product content at scale for ecommerce and digital media channels. Based on digital twins powered by NVIDIA Omniverse, Nestlé can create exact 3D virtual replicas of physical products, with the ability to adjust or localize product packaging digitally. This means that new creative content can be generated without having to constantly reshoot from scratch.
More than 250 Nestlé global digital specialists are relying on these solutions to enable the company’s brand marketing strategies at scale. Nestlé already has a library of 4,000 3D digital products, mainly for global brands, with the ambition to convert a total of 10,000 products into digital twins in the next two years.

Enabling digital marketing with NVIDIA and Microsoft platforms

Nestlé’s approach exemplifies how retail brands can benefit from advanced digital marketing strategies, but this requires the right technologies and expertise to achieve. NVIDIA Omniverse on Microsoft Azure and OpenUSD provide a powerful platform that Nestlé and other enterprises can use to achieve creative transformation. NVIDIA Omniverse on Azure allows for building and seamlessly integrating advanced simulation and generative AI into existing 3D workflows. This cloud-based platform includes APIs and services enabling developers to easily integrate OpenUSD, as well as other sensor and rendering applications. OpenUSD’s capabilities accelerate workflows, teams, and projects when creating 3D assets and environments for large-scale, AI-enabled virtual worlds. The Omniverse Development Workstation on Azure accelerates the process of building Omniverse apps and tools, removing the time and complexity of configuring individual software packages and GPU drivers. With NVIDIA Omniverse on Azure and OpenUSD, marketing teams can create ultra-realistic 3D product previews and environments so that customers can explore a retailer’s products in an engaging and informative way. The platform also can deliver immersive augmented and virtual reality experiences for customers, such as virtually test-driving a car or seeing how new furniture pieces would look in an existing space. For retailers, NVIDIA Omniverse can help create digital twins of stores or in-store displays to simulate and evaluate different layouts to optimize how customers interact with them. These augmented reality tools can overlay digital information onto the physical store, helping employees monitor, stock, and reset sales shelves or inventory. When a digital twin and in-store data are connected, NVIDIA Omniverse on Azure can even unearth insights into customer behavior and sales statistics.

Creating successful digital campaigns with Accenture Song

Accenture Song, the world’s largest tech-powered creative group, acted as a systems integrator for Nestlé’s adoption of digital marketing platforms. The group offers a full range of services, including product design, technology platforms, and marketing transformation. With help from NVIDIA, Accenture Song creates cinematic 3D environments via conversational prompts for fully immersive 3D scenes and digital twins. At this year’s Cannes Lions Festival of Creativity, Accenture Song showed attendees how technology and creativity can be effectively fused together for persuasive results. This collaboration between NVIDIA, Microsoft, and Accenture Song has helped establish Nestlé as a leader in the digital transformation of consumer packaged goods marketing. By transforming workflows, enhancing personalization for consumers, and driving innovation, Nestlé is able to improve customer connections and drive business growth.

Learn more about digital twins and AI in marketing

Some of the world’s most successful retail and consumer packaged goods companies are finding that digital solutions such as digital twins and AI can enhance their marketing operations and, as a result, boost sales.
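To make the 3D-asset workflow described above more concrete, here is a minimal, hypothetical sketch of authoring a product digital twin as an OpenUSD stage using the open-source pxr Python bindings. The file name, prim paths, and the packaging-variant attribute are illustrative assumptions, not Nestlé’s or NVIDIA’s actual production pipeline.

```python
# Minimal sketch: author a product "digital twin" as an OpenUSD stage.
# Assumes the open-source USD Python bindings (pxr) are installed;
# all names are illustrative, not a real production pipeline.
from pxr import Usd, UsdGeom, Sdf

stage = Usd.Stage.CreateNew("product_twin.usda")

# Root transform for the product, set as the default prim so the asset
# can later be referenced into larger marketing scenes.
product = UsdGeom.Xform.Define(stage, "/Product")
stage.SetDefaultPrim(product.GetPrim())

# Placeholder geometry standing in for the scanned packaging mesh.
packaging = UsdGeom.Cube.Define(stage, "/Product/Packaging")
packaging.GetSizeAttr().Set(10.0)

# Custom metadata a localization step could swap per market without reshoots.
label = product.GetPrim().CreateAttribute(
    "marketing:labelVariant", Sdf.ValueTypeNames.String
)
label.Set("en-US")

stage.GetRootLayer().Save()
```

In a real pipeline, the placeholder cube would be a referenced, photoreal asset, and variants or localized packaging would typically be expressed with USD variant sets rather than a single string attribute.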
NVIDIA Omniverse on Azure provides retailers with a powerful, collaborative platform to create highly engaging retail experiences, optimize in-store operations, and streamline workflows


Nightfall launches ‘Nyx,’ an AI that automates data loss prevention at enterprise scale

Nightfall AI launched the industry’s first autonomous data loss prevention platform Wednesday, introducing an AI agent that automatically investigates security incidents and tunes policies without human intervention — a breakthrough that could reshape how enterprises protect sensitive information in an era of expanding cyber threats. The San Francisco-based startup’s new platform, called Nightfall Nyx, represents a fundamental shift from traditional data loss prevention tools that rely on manual rule-setting and generate high volumes of false alerts. Instead, the system uses an AI agent to mirror the work of security analysts, automatically prioritizing threats and distinguishing between legitimate business activities and genuine security risks. “Security teams are drowning in alerts while sophisticated insider threats slip through legacy DLP systems,” said Rohan Sathe, CEO and co-founder of Nightfall, in an exclusive interview with VentureBeat. “When analysts spend hours investigating false positives only to discover that real threats went undetected because they didn’t match a predefined pattern, organizations aren’t just losing time—they’re losing control over their most sensitive data.” The announcement comes as enterprises grapple with an explosion of data security challenges driven by remote work, cloud adoption, and the rapid proliferation of AI tools in the workplace. The global cybersecurity market, valued at approximately $173 billion in 2023, is expected to reach $270 billion by 2026, with data protection representing a significant portion of that growth.

How AI-powered detection cuts false alerts from 80% to 5%

Traditional data loss prevention systems have long frustrated security teams with accuracy rates as low as 10-20%, according to Sathe. These legacy platforms rely heavily on pattern matching and regular expressions to identify sensitive data, creating a constant stream of false alerts that require manual investigation. “What ends up happening is you end up staffing like a SOC analyst to go and sift through all the false positives,” Sathe explained. “With an AI kind of native approach to actually doing content classification, you can get in that like 90, 95% accuracy.” Nightfall’s approach combines three AI-powered components: advanced content classification using large language models and computer vision, data lineage tracking that understands where information originates and travels, and autonomous policy optimization that learns from user behavior over time. The platform’s AI agent, dubbed “Nyx,” sits atop this detection infrastructure and “basically mirrors what a DLP SOC analyst would do,” Sathe said.
“Taking a look at all the incidents that Nightfall surfaces in the dashboard, and then making recommendations on what to investigate most urgently, and then what policy tweaks to make to differentiate between real business workflows versus things that are actually dangerous.” The platform arrives as enterprises confront a new category of data risk: “Shadow AI,” where employees use unauthorized artificial intelligence tools like ChatGPT, Claude, or Copilot for work tasks, often inadvertently exposing sensitive corporate information. Unlike traditional DLP solutions that rely on static application allow-lists or basic content scanning, Nightfall captures the actual content pasted, typed, or uploaded to AI tools, along with data lineage showing where the information originated. The system can monitor prompt-level interactions across major AI platforms including ChatGPT, Microsoft Copilot, Claude, Gemini, and Perplexity. “It’s a little meta, because it’s like, AI is identifying risks of AI usage,” Sathe noted. The platform analyzes content being shared with AI applications, tracks where that content originated, and determines whether usage patterns represent normal business activity or potential security violations. Customer adoption surges as accuracy rates hit 95% across enterprise deployments Nightfall’s approach has gained traction among enterprise customers seeking alternatives to legacy solutions from Microsoft, Google, and traditional cybersecurity vendors. The company now serves “many hundreds” of customers and processes “hundreds of terabytes a day” of data across deployments supporting over 50,000 employees, according to Sathe. Aaron’s, the furniture retailer, exemplifies the customer value proposition. The company previously struggled with a legacy DLP solution that generated excessive false positives when monitoring Slack communications. After deploying Nightfall, “they were like, wow, we can really cut down the time that we need to go investigate all these things, because most of everything that you’re surfacing to us is actually legitimate and things that we’re looking for,” Sathe said. The rapid adoption reflects broader market frustration with traditional approaches. Within six months of launching its endpoint DLP capabilities, Nightfall achieved 20% penetration among its existing customer base — a metric Sathe highlighted as evidence of strong product-market fit. Legacy DLP vendors face disruption from autonomous security platforms Nightfall competes against established players including Microsoft Purview, which comes bundled with enterprise Office 365 licenses, as well as dedicated DLP vendors like Forcepoint, Symantec, and newer entrants. However, Sathe argues that bundled solutions carry hidden costs in the form of human labor required to manage false positives. “Sure, they threw it in for free, quote unquote, but then you had to staff a SOC analyst to go and review all this stuff,” he said. “Hiring people, training them, and having them spend time on DLP, when they could be doing something else, from an opportunity cost standpoint is also dollars at the end of the day.” The company’s lightweight architecture, which uses API-based integrations rather than network proxies, enables faster deployment compared to traditional solutions that can require three to six months for implementation. Nightfall customers typically see value within weeks rather than months, according to Sathe. Lightweight architecture enables weeks-long deployments vs. 
months-long rollouts Central to Nightfall’s differentiation is its AI-native architecture. While legacy systems require extensive manual tuning to reduce false positives, Nightfall employs machine learning models that improve automatically
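To make the “AI-native classification” idea concrete, here is an illustrative sketch of pairing a cheap regex prescreen with a model-based judgment that filters out benign look-alikes before anything reaches an analyst. This is not Nightfall’s API; the class, function, and label names are hypothetical.

```python
# Illustrative sketch only: regex prescreen + model-style triage for DLP.
# NOT Nightfall's API; all names are hypothetical.
import re
from dataclasses import dataclass

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

@dataclass
class Finding:
    snippet: str
    label: str        # e.g. "ssn", "api_key"
    confidence: float

def prescreen(text: str) -> list[str]:
    """Fast, high-recall pattern pass; candidates still need triage."""
    return SSN_PATTERN.findall(text)

def classify_with_model(snippet: str, context: str) -> Finding:
    """Placeholder for a model call that judges whether a match is a real
    secret or a benign look-alike (test fixture, documentation example)."""
    looks_like_test_data = "example" in context.lower()
    return Finding(snippet, "ssn", 0.2 if looks_like_test_data else 0.95)

def scan(text: str) -> list[Finding]:
    findings = [classify_with_model(m, text) for m in prescreen(text)]
    # Only high-confidence findings reach an analyst, cutting false alerts.
    return [f for f in findings if f.confidence >= 0.8]

print(scan("Customer SSN 123-45-6789 attached to the support ticket."))
```

The design point is the split: patterns stay cheap and high-recall, while the model-based step absorbs the precision work that legacy DLP pushes onto human analysts.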


Writer launches a ‘super agent’ that actually gets sh*t done, outperforms OpenAI on key benchmarks

Writer, the enterprise AI company valued at $1.9 billion, launched an autonomous “super agent” Tuesday that can independently execute complex, multi-step business tasks across hundreds of software platforms — marking a significant escalation in the corporate AI arms race. The new Action Agent represents a fundamental shift from AI chatbots that simply answer questions to systems that can autonomously complete entire projects. The agent can browse websites, analyze data, create presentations, write code and coordinate work across an organization’s entire technology stack without human intervention. “Other AI chatbots can tell you what to do,” said May Habib, Writer’s CEO and co-founder. “Action Agent does it. It’s the difference between getting a research report and having your entire sales pipeline updated and acted upon.” The launch positions San Francisco-based Writer as a formidable competitor to Microsoft’s Copilot and OpenAI’s ChatGPT in the lucrative enterprise market. Unlike consumer-focused AI tools, Writer’s agent includes enterprise-grade security controls and audit trails required by regulated industries like banking and healthcare.

How Writer’s super agent executes tasks other AI can only describe

Writer’s Action Agent fundamentally differs from existing AI assistants by operating at what the company calls “level four orchestration” — the highest tier of AI automation. Most current enterprise AI tools operate at levels one or two, handling basic tasks like answering questions or retrieving documents. “What we’ve done here is full orchestration,” Matan-Paul Shetrit, Writer’s head of product, explained in an interview with VentureBeat. “This is an agent that calls agents, writes its own tools when needed, can execute on that with full visibility.” The distinction goes far beyond simple automation capabilities. While traditional AI assistants like ChatGPT or Copilot are “very much built for like a Q&A experience,” Shetrit noted, Action Agent is designed for execution. “The difference is not, let me do this back and forth brainstorming, but more, once I want to do the brainstorming, I can act on it.” The agent operates within its own isolated virtual computer for each session, allowing it to independently browse web pages, build software, solve technical problems and execute complex multi-step plans. When asked to perform a product analysis, for example, Action Agent will automatically process thousands of customer reviews, perform sentiment analysis, identify themes and generate a presentation — all without human guidance. The system’s capabilities extend to generating its own tools when existing ones prove insufficient. “It can action whether or not it has MCP or any tool access, because it can just generate its own tools on the fly for the purpose of the task,” Shetrit explained. During a demonstration, Shetrit showed the agent conducting clinical trial site selection — a process that typically requires weeks of human research.
The agent systematically analyzed demographics across multiple cities, ranked locations by suitability criteria and generated comprehensive reports with supporting evidence. “This is weeks worth of work by these companies,” Shetrit noted. “It’s not something that’s trivial to do.” Breaking benchmarks: Action agent outperforms OpenAI on key tests Writer’s claims about capabilities are backed by impressive benchmark results. Action Agent scored 61% on GAIA Level 3, the most challenging benchmark for AI agent performance, outperforming competing systems including OpenAI’s Deep Research. The agent also achieved a 10.4% score on the CUB (Computer Use Benchmark) leaderboard, making it the top performer for computer and browser use tasks. These results demonstrate the agent’s ability to handle complex reasoning tasks that have traditionally stumped AI systems. GAIA Level 3 tests require agents to navigate multiple tools, synthesize information from various sources and complete multi-step workflows — precisely the kind of work that enterprises need automated. The performance stems from Writer’s Palmyra X5 model, which features a one-million-token context window — enough to process hundreds of pages of documents simultaneously while maintaining coherence across complex tasks. This massive context capability allows the agent to work with entire codebases, lengthy research reports and comprehensive datasets without losing track of the overall objective. Writer’s enterprise focus sets it apart in a market dominated by consumer-oriented AI companies attempting to adapt their products for business use. The company built Action Agent on its existing enterprise platform, which already serves hundreds of major corporations, including Accenture, Vanguard, Qualcomm, Uber and Salesforce. The distinction proves crucial for enterprise adoption. While consumer AI tools often operate as “black boxes” with limited transparency, Writer’s system provides complete audit trails showing exactly how the agent reached its conclusions and what actions it took. Shetrit emphasized this transparency as essential for regulated industries: “If you start talking about some of the largest companies in the world, whether it’s banks, pharmaceutical or healthcare, it’s unacceptable that you don’t know how these autonomous agents are behaving and what they’re doing.” The system provides “full traceability, auditability and visibility,” allowing IT administrators to set fine-grained permissions controlling which tools each agent can access and what actions they can perform. Action Agent’s ability to connect with more than 600 enterprise tools represents a significant technical achievement. The agent uses Model Context Protocol (MCP), an emerging standard for AI tool integration, but Writer has enhanced it with enterprise-grade controls that address security and governance concerns. Writer has been working closely with Amazon Web Services and other industry players to bring MCP to enterprise standards. “There’s still place to bring it to enterprise grade,” Shetrit noted, referencing recent issues with MCP implementations at companies like Asana and GitHub. The company’s approach allows granular control that extends beyond simple user permissions. “It’s not just by a user,” Shetrit explained.
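For context on what MCP integration involves, here is a minimal sketch of an MCP-style tool call. MCP messages are JSON-RPC 2.0; the tool name, arguments, and the permission check below are invented for illustration and are not Writer’s actual implementation.

```python
# Sketch of a Model Context Protocol (MCP) style tool invocation.
# MCP uses JSON-RPC 2.0 messages; the tool name and arguments here are
# invented for illustration and are not Writer's actual integration.
import json

tool_call_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "crm_update_opportunity",   # hypothetical tool
        "arguments": {
            "opportunity_id": "OPP-1234",
            "stage": "negotiation",
        },
    },
}

# An enterprise wrapper could enforce per-agent permissions before the
# request ever reaches the tool server, in the spirit of the fine-grained
# controls described above.
ALLOWED_TOOLS = {"crm_update_opportunity"}

def authorize(request: dict) -> bool:
    return request["params"]["name"] in ALLOWED_TOOLS

print(authorize(tool_call_request))
print(json.dumps(tool_call_request, indent=2))
```

The governance value comes from that interception point: every tool call is a structured message that can be logged, audited, and allowed or denied per agent.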


Arcee opens up new enterprise-focused, customizable AI model AFM-4.5B trained on ‘clean, rigorously filtered data’

Arcee.ai, a startup focused on developing small AI models for commercial and enterprise use, is opening up its own AFM-4.5B model for limited free usage by small companies — posting the weights on Hugging Face and allowing enterprises that make less than $1.75 million in annual revenue to use it without charge under a custom “Arcee Model License.” Designed for real-world enterprise use, the 4.5-billion-parameter model — much smaller than the tens of billions to trillions of parameters of leading frontier models — combines cost efficiency, regulatory compliance, and strong performance in a compact footprint. AFM-4.5B was one part of a two-part release made by Arcee last month, and is already “instruction tuned,” or an “instruct” model, which is designed for chat, retrieval, and creative writing and can be deployed immediately for these use cases in enterprises. Another base model was also released at the time that was not instruction tuned, only pre-trained, allowing more customizability by customers. However, both were only available through commercial licensing terms — until now. Arcee’s chief technology officer (CTO) Lucas Atkins also noted in a post on X that more “dedicated models for reasoning and tool use are on the way,” as well. “Building AFM-4.5B has been a huge team effort, and we’re deeply grateful to everyone who supported us. We can’t wait to see what you build with it,” he wrote in another post. “We’re just getting started. If you have feedback or ideas, please don’t hesitate to reach out at any time.” The model is available now for deployment across a variety of environments — from cloud to smartphones to edge hardware. It’s also geared toward Arcee’s growing list of enterprise customers and their needs and wants — specifically, a model trained without violating intellectual property. As Arcee wrote in its initial AFM-4.5B announcement post last month: “Tremendous effort was put towards excluding copyrighted books and material with unclear licensing.” Arcee notes it worked with third-party data curation firm DatologyAI to apply techniques like source mixing, embedding-based filtering, and quality control — all aimed at minimizing hallucinations and IP risks.

Focused on enterprise customer needs

AFM-4.5B is Arcee.ai’s response to what it sees as major pain points in enterprise adoption of generative AI: high cost, limited customizability, and regulatory concerns around proprietary large language models (LLMs). Over the past year, the Arcee team held discussions with more than 150 organizations, ranging from startups to Fortune 100 companies, to understand the limitations of existing LLMs and define their own model goals. According to the company, many businesses found mainstream LLMs — such as those from OpenAI, Anthropic, or DeepSeek — too expensive and difficult to tailor to industry-specific needs. Meanwhile, while smaller open-weight models like Llama, Mistral, and Qwen offered more flexibility, they introduced concerns around licensing, IP provenance, and geopolitical risk.
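Since the weights are posted on Hugging Face, a minimal sketch of pulling the instruct model down with the Transformers library might look like the following. The repository ID is an assumption for illustration; check Arcee’s Hugging Face page, and the Arcee Model License terms, before using it.

```python
# Minimal sketch of loading an open-weight instruct model with Hugging Face
# Transformers. The repository ID below is an assumption for illustration;
# verify it (and the Arcee Model License) before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/AFM-4.5B"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize our refund policy in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```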
AFM-4.5B was developed as a “no-trade-offs” alternative: customizable, compliant, and cost-efficient without sacrificing model quality or usability. AFM-4.5B is designed with deployment flexibility in mind. It can operate in cloud, on-premise, hybrid, or even edge environments—thanks to its efficiency and compatibility with open frameworks such as Hugging Face Transformers, llama.cpp, and (pending release) vLLM. The model supports quantized formats, allowing it to run on lower-RAM GPUs or even CPUs, making it practical for applications with constrained resources.

Company vision secures backing

Arcee.ai’s broader strategy focuses on building domain-adaptable, small language models (SLMs) that can power many use cases within the same organization. As CEO Mark McQuade explained in a VentureBeat interview last year, “You don’t need to go that big for business use cases.” The company emphasizes fast iteration and model customization as core to its offering. This vision gained investor backing with a $24 million Series A round back in 2024.

Inside AFM-4.5B’s architecture and training process

The AFM-4.5B model uses a decoder-only transformer architecture with several optimizations for performance and deployment flexibility. It incorporates grouped query attention for faster inference and ReLU² activations in place of SwiGLU to support sparsification without degrading accuracy. Training followed a three-phase approach:

Pretraining on 6.5 trillion tokens of general data

Midtraining on 1.5 trillion tokens emphasizing math and code

Instruction tuning using high-quality instruction-following datasets and reinforcement learning with verifiable and preference-based feedback

To meet strict compliance and IP standards, the model was trained on nearly 7 trillion tokens of data curated for cleanliness and licensing safety.

A competitive model, but not a leader

Despite its smaller size, AFM-4.5B performs competitively across a broad range of benchmarks. The instruction-tuned version averages a score of 50.13 across evaluation suites such as MMLU, MixEval, TriviaQA, and Agieval—matching or outperforming similar-sized models like Gemma-3 4B-it, Qwen3-4B, and SmolLM3-3B. Multilingual testing shows the model delivers strong performance across more than 10 languages, including Arabic, Mandarin, German, and Portuguese. According to Arcee, adding support for additional dialects is straightforward due to its modular architecture. AFM-4.5B has also shown strong early traction in public evaluation environments. In a leaderboard that ranks conversational model quality by user votes and win rate, the model ranks third overall, trailing only Claude Opus 4 and Gemini 2.5 Pro. It boasts a win rate of 59.2% and the fastest latency of any top model at 0.2 seconds, paired with a generation speed of 179 tokens per second.

Built-in support for agents

In addition to general capabilities, AFM-4.5B comes with built-in support for function calling and agentic reasoning. These features aim to simplify the process of building AI agents and workflow automation tools, reducing the need for complex prompt engineering or orchestration layers. This functionality


Positron believes it has found the secret to take on Nvidia in AI inference chips — here’s how it could benefit enterprises

As demand for large-scale AI deployment skyrockets, the lesser-known, private chip startup Positron is positioning itself as a direct challenger to market leader Nvidia by offering dedicated, energy-efficient, memory-optimized inference chips aimed at relieving the industry’s mounting cost, power, and availability bottlenecks. “A key differentiator is our ability to run frontier AI models with better efficiency—achieving 2x to 5x performance per watt and dollar compared to Nvidia,” said Thomas Sohmers, Positron co-founder and CTO, in a recent video call interview with VentureBeat. Obviously, that’s good news for big AI model providers, but Positron’s leadership contends it is helpful for many more enterprises beyond, including those using AI models in their workflows, not as service offerings to customers. “We build chips that can be deployed in hundreds of existing data centers because they don’t require liquid cooling or extreme power densities,” pointed out Mitesh Agrawal, Positron’s CEO and the former chief operating officer of AI cloud inference provider Lambda, also in the same video call interview with VentureBeat. Venture capitalists and early users seem to agree. Positron yesterday announced an oversubscribed $51.6 million Series A funding round led by Valor Equity Partners, Atreides Management and DFJ Growth, with support from Flume Ventures, Resilience Reserve, 1517 Fund and Unless. As for Positron’s early customer base, that includes both name-brand enterprises and companies operating in inference-heavy sectors. Confirmed deployments include the major security and cloud content networking provider Cloudflare, which uses Positron’s Atlas hardware in its globally distributed, power-constrained data centers, and Parasail, via its AI-native data infrastructure platform SnapServe. Beyond these, Positron reports adoption across several key verticals where efficient inference is critical, such as networking, gaming, content moderation, content delivery networks (CDNs), and Token-as-a-Service providers. These early users are reportedly drawn in by Atlas’s ability to deliver high throughput and lower power consumption without requiring specialized cooling or reworked infrastructure, making it an attractive drop-in option for AI workloads across enterprise environments.

Entering a challenging market that is decreasing AI model size and increasing efficiency

But Positron is also entering a challenging market. The Information just reported that rival buzzy AI inference chip startup Groq — where Sohmers previously worked as Director of Technology Strategy — has reduced its 2025 revenue projection from $2 billion+ to $500 million, highlighting just how volatile the AI hardware space can be.
Even well-funded firms face headwinds as they compete for data center capacity and enterprise mindshare against entrenched GPU providers like Nvidia, not to mention the elephant in the room: the rise of more efficient, smaller large language models (LLMs) and specialized small language models (SLMs) that can even run on devices as small and low-powered as smartphones. Yet Positron’s leadership is for now embracing the trend and shrugging off the possible impacts on its growth trajectory. “There’s always been this duality—lightweight applications on local devices and heavyweight processing in centralized infrastructure,” said Agrawal. “We believe both will keep growing.” Sohmers agreed, stating: “We see a future where every person might have a capable model on their phone, but those will still rely on large models in data centers to generate deeper insights.” Atlas is an inference-first AI chip While Nvidia GPUs helped catalyze the deep learning boom by accelerating model training, Positron argues that inference — the stage where models generate output in production — is now the true bottleneck. Its founders call it the most under-optimized part of the “AI stack,” especially for generative AI workloads that depend on fast, efficient model serving. Positron’s solution is Atlas, its first-generation inference accelerator built specifically to handle large transformer models. Unlike general-purpose GPUs, Atlas is optimized for the unique memory and throughput needs of modern inference tasks. The company claims Atlas delivers 3.5x better performance per dollar and up to 66% lower power usage than Nvidia’s H100, while also achieving 93% memory bandwidth utilization—far above the typical 10–30% range seen in GPUs. From Atlas to Titan, supporting multi-trillion parameter models Launched just 15 months after founding — and with only $12.5 million in seed capital — Atlas is already shipping and in production. The system supports up to 0.5 trillion-parameter models in a single 2kW server and is compatible with Hugging Face transformer models via an OpenAI API-compatible endpoint. Positron is now preparing to launch its next-generation platform, Titan, in 2026. Built on custom-designed “Asimov” silicon, Titan will feature up to two terabytes of high-speed memory per accelerator and support models up to 16 trillion parameters. Today’s frontier models are in the hundred billions and single digit trillions of parameters, but newer models like OpenAI’s GPT-5 are presumed to be in the multi-trillions, and larger models are currently thought to be required to reach artificial general intelligence (AGI), AI that outperforms humans on most economically valuable work, and superintelligence, AI that exceeds the ability for humans to understand and control. Crucially, Titan is designed to operate with standard air cooling in conventional data center environments, avoiding the high-density, liquid-cooled configurations that next-gen GPUs increasingly require. Engineering for efficiency and compatibility From the start, Positron designed its system to be a drop-in replacement, allowing customers to use existing model binaries without code rewrites. “If a customer had to change their behavior or their actions in any way, shape or form, that was a barrier,” said Sohmers. Sohmers explained that instead of building a complex compiler stack or rearchitecting software ecosystems, Positron focused narrowly on inference, designing hardware that ingests Nvidia-trained models directly. 
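To illustrate the drop-in claim, the sketch below points the standard OpenAI Python client at a hypothetical Atlas endpoint. The base URL, API key, and model name are placeholders rather than real Positron values; the point is that application code written against an OpenAI-compatible API would not otherwise change.

```python
# Sketch of calling an OpenAI API-compatible inference endpoint, the
# integration style Positron describes for Atlas. The base URL, key, and
# model name are placeholders, not real Positron values.
from openai import OpenAI

client = OpenAI(
    base_url="https://atlas.example.internal/v1",  # hypothetical endpoint
    api_key="YOUR_KEY",
)

response = client.chat.completions.create(
    # Any Hugging Face transformer model served behind the endpoint.
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Draft a one-paragraph incident summary."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```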
“CUDA mode isn’t something to fight,” said Agrawal. “It’s an


How E2B became essential to 88% of Fortune 100 companies and raised $21 million

E2B, a startup providing cloud infrastructure specifically designed for artificial intelligence agents, has closed a $21 million Series A funding round led by Insight Partners, capitalizing on surging enterprise demand for AI automation tools. The funding comes as a remarkable 88% of Fortune 100 companies have already signed up to use E2B’s platform, according to the company, highlighting the rapid enterprise adoption of AI agent technology. The round included participation from existing investors Decibel, Sunflower Capital, and Kaya, along with notable angels including Scott Johnston, former CEO of Docker. E2B’s technology addresses a critical infrastructure gap as companies increasingly deploy AI agents — autonomous software programs that can execute complex, multi-step tasks including code generation, data analysis, and web browsing. Unlike traditional cloud computing designed for human users, E2B provides secure, isolated computing environments where AI agents can safely run potentially dangerous code without compromising enterprise systems. “Enterprises have enormous expectations for AI agents. However, we’re asking them to scale and perform on legacy infrastructure that wasn’t designed for autonomous agents,” said Vasek Mlejnsky, co-founder and CEO of E2B, in an exclusive interview with VentureBeat. “E2B solves this by equipping AI agents with safe, scalable, high-performance cloud infrastructure designed specifically for production-scale agent deployments.”

Seven-figure monthly revenue spike shows enterprises betting big on AI automation

The funding reflects explosive revenue growth, with E2B adding “seven figures” in new business just in the past month, according to Mlejnsky. The company has processed hundreds of millions of sandbox sessions since October, demonstrating the scale at which enterprises are deploying AI agents. E2B’s customer roster reads like a who’s who of AI innovation: search engine Perplexity uses E2B to power advanced data analysis features for Pro users, implementing the capability in just one week. AI chip company Groq relies on E2B for secure code execution in its Compound AI systems. Workflow automation platform Lindy integrated E2B to enable custom Python and JavaScript execution within user workflows. The startup’s technology has also become critical infrastructure for AI research. Hugging Face, the leading AI model repository, uses E2B to safely execute code during reinforcement learning experiments for replicating advanced models like DeepSeek-R1. Meanwhile, UC Berkeley’s LMArena platform has launched over 230,000 E2B sandboxes to evaluate large language models’ web development capabilities.

Firecracker microVMs solve the dangerous code problem plaguing AI development

E2B’s core innovation lies in its use of Firecracker microVMs — lightweight virtual machines originally developed by Amazon Web Services — to create completely isolated environments for AI-generated code execution.
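In practice, the developer-facing pattern is small: hand the sandbox a string of untrusted code and read back the result. The sketch below is modeled on E2B’s public Python SDK examples, but the package, class, and method names are assumptions that should be verified against current documentation.

```python
# Hedged sketch of the sandbox pattern described above, modeled on E2B's
# public Python SDK examples. Package, class, and method names are
# assumptions; verify against current docs before use.
from e2b_code_interpreter import Sandbox  # assumed package/class name

# Untrusted, AI-generated code never touches the host: it runs inside an
# isolated Firecracker microVM created on demand.
ai_generated_code = """
import statistics
print(statistics.mean([3, 9, 27]))
"""

sandbox = Sandbox()                       # E2B reports ~150 ms startup
execution = sandbox.run_code(ai_generated_code)
print(execution)                          # inspect stdout/results of the run
sandbox.kill()                            # shut the microVM down when done
```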
That isolation addresses a fundamental security challenge: AI agents often need to run untrusted code that could potentially damage systems or access sensitive data. “When talking to customers, and especially enterprises, their biggest decision is almost always build versus buy,” Mlejnsky explained in an interview. “With the build versus buy solution, it all really comes down to whether you want to spend the next six to 12 months building this, hiring a five to 10 person infrastructure team that will cost you at least half a million dollars…or you can use our plug-and-play solution.” The platform supports multiple programming languages including Python, JavaScript, and C++, and can spin up new computing environments in approximately 150 milliseconds — fast enough to maintain the real-time responsiveness users expect from AI applications. Enterprise customers particularly value E2B’s open-source approach and deployment flexibility. Companies can self-host the entire platform for free or deploy it within their own virtual private clouds (VPCs) to maintain data sovereignty — a critical requirement for Fortune 100 firms handling sensitive information.

Perfect timing as Microsoft layoffs signal shift toward AI worker replacement

The funding comes at a pivotal moment for AI agent technology. Recent advances in large language models have made AI agents increasingly capable of handling complex, real-world tasks. Microsoft recently laid off thousands of employees while expecting AI agents to perform previously human-only work, Mlejnsky pointed out in our interview. However, infrastructure limitations have constrained AI agent adoption. Industry data suggests fewer than 30% of AI agents successfully make it to production deployment, often due to security, scalability, and reliability challenges that E2B’s platform aims to solve. “We’re building the next cloud,” Mlejnsky said, outlining the company’s ambitious vision. “The current world runs on Cloud 2.0, which was made for humans. We’re building the open-source cloud for AI agents where they can be autonomous and run securely.” The market opportunity appears substantial. Code generation assistants already produce at least 25% of the world’s software code, while JPMorgan Chase saved 360,000 hours annually through document processing agents. Enterprise leaders expect to automate 15% to 50% of manual tasks using AI agents, creating massive demand for supporting infrastructure.

Open-source strategy creates defensive moat against tech giants like Amazon and Google

E2B faces potential competition from cloud giants like Amazon, Google, and Microsoft, which could theoretically replicate similar functionality. However, the company has built competitive advantages through its open-source approach and focus on AI-specific use cases. “We don’t really care” about the underlying virtualization technology, Mlejnsky explained, noting that E2B focuses on creating an open standard for how AI agents interact with computing resources. “We are even like actually partnering with a lot of these cloud providers too, because a lot of enterprise customers actually want to deploy E2B inside their AWS account.” The company’s open-source sandbox protocol has become a de facto standard, with hundreds of millions of compute instances demonstrating its real-world effectiveness. This network effect makes it difficult for competitors to displace E2B once enterprises have standardized on


It’s Qwen’s summer: new open source Qwen3-235B-A22B-Thinking-2507 tops OpenAI, Gemini reasoning models on key benchmarks

If the AI industry had an equivalent to the recording industry’s “song of the summer” — a hit that catches on in the warmer months here in the Northern Hemisphere and is heard playing everywhere — the title would clearly go to Alibaba’s Qwen Team. Over just the past week, the frontier model AI research division of the Chinese e-commerce behemoth has released not one, not two, not three, but four (!!) new open source generative AI models that offer record-setting benchmarks, besting even some leading proprietary options. Last night, Qwen Team capped it off with the release of Qwen3-235B-A22B-Thinking-2507, its updated reasoning large language model (LLM), which takes longer to respond than a non-reasoning or “instruct” LLM, engaging in “chains-of-thought” or self-reflection and self-checking that hopefully result in more correct and comprehensive responses on more difficult tasks. Indeed, the new Qwen3-Thinking-2507, as we’ll call it for short, now leads or closely trails top-performing models across several major benchmarks. As AI influencer and news aggregator Andrew Curran wrote on X: “Qwen’s strongest reasoning model has arrived, and it is at the frontier.” In the AIME25 benchmark—designed to evaluate problem-solving ability in mathematical and logical contexts—Qwen3-Thinking-2507 posts a score of 92.3, narrowly trailing OpenAI’s o4-mini (92.7) while surpassing Gemini-2.5 Pro (88.0). The model also shows a commanding performance on LiveCodeBench v6, scoring 74.1, ahead of Google Gemini-2.5 Pro (72.5), OpenAI o4-mini (71.8), and significantly outperforming its earlier version, which posted 55.7. In GPQA, a benchmark for graduate-level multiple-choice questions, the model achieves 81.1, nearly matching Deepseek-R1-0528 (81.0) and trailing Gemini-2.5 Pro’s top mark of 86.4. On Arena-Hard v2, which evaluates alignment and subjective preference through win rates, Qwen3-Thinking-2507 scores 79.7, placing it ahead of all competitors. The results show that this model not only surpasses its predecessor in every major category but also sets a new standard for what open-source, reasoning-focused models can achieve.

A shift away from ‘hybrid reasoning’

The release of Qwen3-Thinking-2507 reflects a broader strategic shift by Alibaba’s Qwen team: moving away from hybrid reasoning models that required users to manually toggle between “thinking” and “non-thinking” modes. Instead, the team is now training separate models for reasoning and instruction tasks. This separation allows each model to be optimized for its intended purpose—resulting in improved consistency, clarity, and benchmark performance. The new Qwen3-Thinking model fully embodies this design philosophy. Alongside it, Qwen launched Qwen3-Coder-480B-A35B-Instruct, a 480B-parameter model built for complex coding workflows. It supports 1 million token context windows and outperforms GPT-4.1 and Gemini 2.5 Pro on SWE-bench Verified.
Also announced was Qwen3-MT, a multilingual translation model trained on trillions of tokens across more than 92 languages. It supports domain adaptation and terminology control, with inference starting at $0.50 per million tokens. Earlier in the week, the team released Qwen3-235B-A22B-Instruct-2507, a non-reasoning model that surpassed Claude Opus 4 on several benchmarks, and introduced a lightweight FP8 variant for more efficient inference on constrained hardware. All models are licensed under Apache 2.0 and are available through Hugging Face, ModelScope, and the Qwen API.

Licensing: Apache 2.0 and its enterprise advantage

Qwen3-235B-A22B-Thinking-2507 is released under the Apache 2.0 license, a highly permissive and commercially friendly license that allows enterprises to download, modify, self-host, fine-tune, and integrate the model into proprietary systems without restriction. This stands in contrast to proprietary models or research-only open releases, which often require API access, impose usage limits, or prohibit commercial deployment. For compliance-conscious organizations and teams looking to control cost, latency, and data privacy, Apache 2.0 licensing enables full flexibility and ownership.

Availability and pricing

Qwen3-235B-A22B-Thinking-2507 is available now for free download on Hugging Face and ModelScope. Enterprises that don't want to, or lack the resources to, host the model on their own hardware or virtual private cloud can instead access it through Alibaba Cloud's API or serve it with inference frameworks such as vLLM and SGLang. API pricing: input, $0.70 per million tokens; output, $8.40 per million tokens; free tier, 1 million tokens, valid for 180 days. The model is compatible with agentic frameworks via Qwen-Agent and supports deployment via OpenAI-compatible APIs. It can also be run locally using transformer frameworks or integrated into dev stacks through Node.js, CLI tools, or structured prompting interfaces. Recommended sampling settings for best performance are temperature=0.6, top_p=0.95, and a maximum output length of 81,920 tokens for complex tasks (see the short sketch at the end of this article).

Enterprise applications and future outlook

With its strong benchmark performance, long-context capability, and permissive licensing, Qwen3-Thinking-2507 is particularly well suited for use in enterprise AI systems involving reasoning, planning, and decision support. The broader Qwen3 ecosystem, including coding, instruction, and translation models, further extends the appeal to technical teams and business units looking to incorporate AI across verticals like engineering, localization, customer support, and research. The Qwen team's decision to release specialized models for distinct use cases, backed by technical transparency and community support, signals a deliberate shift toward building open, performant, and production-ready AI infrastructure. As more enterprises seek alternatives to API-gated, black-box models, Alibaba's Qwen series increasingly positions itself as a viable open source foundation for intelligent systems, offering both control and capability at scale. source
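For readers who want to try the recommended sampling settings above, here is a minimal Python sketch of calling the model through an OpenAI-compatible endpoint. The base URL, API key, and served model id (the Hugging Face repo name Qwen/Qwen3-235B-A22B-Thinking-2507) are illustrative assumptions for a self-hosted vLLM or SGLang deployment, not confirmed details from the release; only the temperature, top_p, and output-length values come from the settings cited above.

```python
# Minimal sketch: querying Qwen3-235B-A22B-Thinking-2507 via an
# OpenAI-compatible server (e.g. a self-hosted vLLM or SGLang instance).
# Endpoint, key, and model id below are placeholders; substitute your own.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local inference endpoint
    api_key="EMPTY",                      # placeholder; local servers often ignore the key
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",  # assumed served model id
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    temperature=0.6,    # recommended sampling settings from the release notes
    top_p=0.95,
    max_tokens=81920,   # generous budget, since the chain-of-thought counts toward output
)

print(response.choices[0].message.content)
```

Because reasoning models emit their chain-of-thought before the final answer, the large output budget matters more here than it would for an instruct model of the same size.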

It’s Qwen’s summer: new open source Qwen3-235B-A22B-Thinking-2507 tops OpenAI, Gemini reasoning models on key benchmarks Read More »

From pilot to payoff: Turning AI investment into real ROI

Presented by Google Cloud

Ignoring agentic AI's potential, particularly its demand for modernized data infrastructure, carries the same existential risk that faced retailers who ignored the internet. The question isn't whether to invest, but how to ensure those investments translate into measurable, real-world payoff. But measuring tangible return on agentic AI investment can feel elusive. So how should you position yourself for the agentic AI future, while also ensuring measurable successes along the way?

Get clearer about what you're aiming for

This is a crucial moment for enterprises to move beyond the tinkering phase of AI. The era of experimenting for experimentation's sake is over. Today's models are powerful, but their value depends on the clarity of the outcomes they're meant to achieve. Without a sharp understanding of business objectives, even the most advanced AI capabilities risk becoming expensive science projects. It's time to get precise about what success looks like, and build towards it deliberately. For instance, agents now manage governance, orchestrate pipelines, accelerate onboarding, and enhance customer engagement. Some benefits are easily quantified, like a 15% lift in marketing conversion or a 40% drop in onboarding time. Others are more structural, such as optimized resource utilization and the elimination of redundant tools. When starting out, determine which use cases make the most impact in the least amount of time, and build from there.

Governance: Where ROI takes root

So how do you model more specific ROI goals into your AI strategy? It starts with governance. This isn't just about compliance; governance agents actively enforce policies, dynamically detect schema drift, and pinpoint lineage gaps in real time. That creates trustable feedback loops for both developers and executives evaluating outcomes. Successful organizations aren't fixated on a single big AI use case. They're embedding agents across the stack, from customer-facing applications to internal systems for governance, data quality monitoring, and workload optimization. Without a strong command of your data, however, understanding what these agents achieve, and more importantly measuring their ROI, becomes impossible. As the investor and author Robert Kiyosaki noted, "The rich don't work for money; they make money work for them." A similar principle applies to your data. When your data is agile, clean, and actively working for you — improving decisions, training sophisticated systems, and powering autonomous agents — ROI from AI becomes not just theoretical, but real. The most successful early adopters built governance deliberately. They invested in metadata systems, automation, and domain-based organization. This creates efficiencies, from eliminating redundant data pipelines to speeding delivery. The payoff isn't always immediate, but it is foundational. Robust governance transforms raw data into a reliable, usable product that enables agents to deliver consistent, repeatable value.

Measuring ROI across the stack

ROI can emerge in many places, and not all of them look the same. On the business side, agentic AI is already delivering impact. Marketing teams use generative agents for hyper-personalized campaigns, while sales and support teams deploy copilots that dramatically improve response times and customer satisfaction. These are direct accelerators for revenue and key performance indicators.
For example, I recently spoke with a financial services firm that used generative agents to personalize onboarding sequences, cutting customer setup time from two weeks to three days while improving conversion by 20%. On the supply side, AI agents are optimizing infrastructure, significantly reducing manual work, and mitigating risk. This includes automating complex governance, improving observability, and intelligently tuning workloads to reduce spend. These efficiency gains often materialize more quickly than customer-facing improvements. A common antipattern is fragmented platforms. When teams adopt overlapping tools, hidden costs pile up. Whether you run a unified platform or a mixed environment, significant ROI is gained by reducing duplication and consolidating workloads. Interoperability matters. When agents operate across systems and governance is consistent, both compute and operational costs fall. The most agile and successful enterprises relentlessly streamline their core platforms. Think of AI ROI as a continuum. Some investments yield immediate returns. Others build long-term value. The key is knowing where you are and what to measure.

From guesswork to guidance

Don't treat AI as a mere cost-cutting tool. Its deeper opportunity is horizontal: helping teams move faster, innovate more, and focus on higher-value work. This benefit, however, only materializes if your data is ready, and that readiness begins with governance. By making ROI visible and trackable, governance inherently breaks down the organizational silos that fragment efforts and dilute results. It establishes a shared framework that directly aligns data investments with company-wide OKRs. In this age of agentic AI, ROI isn't a static number on a dashboard; it's a distributed force waiting to be captured across your enterprise. Learn how Google Cloud provides the integrated platform to turn these challenges into your competitive advantage.

Gus Kimble is GM and Head of North America Data Analytics Customer Engineering at Google Cloud.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact [email protected]. source

From pilot to payoff: Turning AI investment into real ROI Read More »