VentureBeat

From chatbots to collaborators: How AI agents are reshaping enterprise work

Scott White still marvels at how quickly artificial intelligence has transformed from a novelty into a true work partner. Just over a year ago, the product lead for Claude AI at Anthropic watched as early AI coding tools could barely complete a single line of code. Today, he’s building production-ready software features himself — despite not being a professional programmer. “I no longer think about my job as writing a PRD and trying to convince someone to do something,” White said during a fireside chat at VB Transform 2025, VentureBeat’s annual enterprise AI summit in San Francisco. “The first thing I do is, can I build a workable prototype of this on our staging server and then share a demo of it actually working.” This shift represents a broader transformation in how enterprises are adopting AI, moving beyond simple chatbots that answer questions to sophisticated “agentic” systems capable of autonomous work. White’s experience offers a glimpse into what may be coming for millions of other knowledge workers.

From code completion to autonomous programming: AI’s breakneck evolution

The evolution has been remarkably swift. When White joined Anthropic, the company’s Claude 2 model could handle basic text completion. The release of Claude 3.5 Sonnet enabled the creation of entire applications, leading to features like Artifacts that let users generate custom interfaces. Now, with Claude 4 achieving a 72.5% score on the SWE-bench coding benchmark, the model can function as what White calls “a fully remote agentic software engineer.” Claude Code, the company’s latest coding tool, can analyze entire codebases, search the internet for API documentation, issue pull requests, respond to code review comments, and iterate on solutions — all while working asynchronously for hours. White noted that 90% of Claude Code itself was written by the AI system. “That is like an entire agentic process in the background that was not possible six months ago,” White explained.

Enterprise giants slash work time from weeks to minutes with AI agents

The implications extend far beyond software development. Novo Nordisk, the Danish pharmaceutical giant, has integrated Claude into clinical reporting workflows that previously took 10 weeks, finishing the same work in 10 minutes. GitLab uses the technology for everything from sales proposals to technical documentation. Intuit deploys Claude to provide tax advice directly to consumers. White distinguishes between different levels of AI integration: simple language models that answer questions, models enhanced with tools like web search, structured workflows that incorporate AI into business processes, and full agents that can pursue goals autonomously using multiple tools and iterative reasoning. “I think about an agent as something that has a goal, and then it can just do many things to accomplish that goal,” White said. The key enabler has been what he calls the “inexorable” relationship between model intelligence and new product capabilities.
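To make that last tier concrete, here is a minimal, hypothetical sketch of an agent loop: a goal, a small set of tools, and a model that is called repeatedly until the goal is met. The tool names and the call_model stub are illustrative assumptions, not Anthropic's implementation.

```python
# Hypothetical agent loop: a goal goes in, tools are used iteratively until a final answer.
from typing import Callable

def search_docs(query: str) -> str:
    return f"top documentation hits for {query!r}"   # stand-in tool

def run_tests(patch: str) -> str:
    return "12 passed, 0 failed"                      # stand-in tool

TOOLS: dict[str, Callable[[str], str]] = {"search_docs": search_docs, "run_tests": run_tests}

def call_model(transcript: str) -> dict:
    """Stand-in for an LLM API call; returns either a tool request or a final answer."""
    raise NotImplementedError("wire up your model client here")

def run_agent(goal: str, max_steps: int = 20) -> str:
    transcript = f"GOAL: {goal}"
    for _ in range(max_steps):                        # iterate rather than answer once
        step = call_model(transcript)
        if "final" in step:                           # goal reached
            return step["final"]
        tool, arg = step["tool"], step["input"]       # otherwise run the requested tool
        transcript += f"\n{tool}({arg!r}) -> {TOOLS[tool](arg)}"  # feed the result back
    return "stopped without finishing"
```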
The infrastructure revolution: Building networks of AI collaborators

A critical infrastructure development has been Anthropic’s Model Context Protocol (MCP), which White describes as “the USB-C for integrations.” Rather than companies building separate connections to each data source or tool, MCP provides a standardized way for AI systems to access enterprise software, from Salesforce to internal knowledge repositories. “It’s really democratizing access to data,” White said, noting that integrations built by one company can be shared and reused by others through the open-source protocol. For organizations looking to implement AI agents, White recommends starting small and building incrementally. “Don’t try to build an entire agentic system from scratch,” he advised. “Build the component of it, make sure that component works, then build a next component.” He also emphasized the importance of evaluation systems to ensure AI agents perform as intended. “Evals are the new PRD,” White said, referring to product requirement documents, highlighting how companies must develop new methods to assess AI performance on specific business tasks.

From AI assistants to AI organizations: The next workforce frontier

Looking ahead, White envisions AI development becoming accessible to non-technical workers, similar to how coding capabilities have advanced. He imagines a future where individuals manage not just one AI agent but entire organizations of specialized AI systems. “How can everyone be their own mini CPO or CEO?” White asked. “I don’t exactly know what that looks like, but that’s the kind of thing that I wake up and want to get there.” The transformation White describes reflects broader industry trends as companies grapple with AI’s expanding capabilities. While early adoption focused on experimental use cases, enterprises are increasingly integrating AI into core business processes, fundamentally changing how work gets done. As AI agents become more autonomous and capable, the challenge shifts from teaching machines to perform tasks to managing AI collaborators that can work independently for extended periods. For White, this future is already arriving — one production feature at a time.
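As a rough illustration of the integration pattern MCP standardizes, the sketch below exposes an internal knowledge-base search as a tool any MCP-capable assistant could call. It assumes the open-source MCP Python SDK; the server name, tool, and data source are invented for the example.

```python
# Hypothetical MCP server exposing an internal knowledge-base lookup as a tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-knowledge")

@mcp.tool()
def search_knowledge_base(query: str, limit: int = 5) -> list[str]:
    """Return the top matching snippets for a query (stubbed for the example)."""
    # A real server would query Salesforce, a wiki, a vector store, etc.
    return [f"stub result {i} for {query!r}" for i in range(limit)]

if __name__ == "__main__":
    mcp.run()  # serves the tool over the Model Context Protocol
```

Because the protocol is open, a connector like this one, once written, can be reused by any other team or company whose assistant speaks MCP.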

From chatbots to collaborators: How AI agents are reshaping enterprise work Read More »

AWS doubles down on infrastructure as strategy in the AI race with SageMaker upgrades

AWS seeks to extend its market position with updates to SageMaker, its machine learning and AI model training and inference platform, adding new observability capabilities, connected coding environments and GPU cluster performance management. However, AWS continues to face competition from Google and Microsoft, which also offer many features that help accelerate AI training and inference. SageMaker, which transformed into a unified hub for integrating data sources and accessing machine learning tools in 2024, will add features that provide insight into why model performance slows and offer AWS customers more control over the amount of compute allocated for model development. Other new features include connecting local integrated development environments (IDEs) to SageMaker, so locally written AI projects can be deployed on the platform. SageMaker General Manager Ankur Mehrotra told VentureBeat that many of these new updates originated from customers themselves. “One challenge that we’ve seen our customers face while developing Gen AI models is that when something goes wrong or when something is not working as per the expectation, it’s really hard to find what’s going on in that layer of the stack,” Mehrotra said. SageMaker HyperPod observability enables engineers to examine the various layers of the stack, such as the compute layer or networking layer. If anything goes wrong or models become slower, SageMaker can alert them and publish metrics on a dashboard. Mehrotra pointed to a real issue his own team faced while training new models, where training code began stressing GPUs, causing temperature fluctuations. He said that without the latest tools, developers would have taken weeks to identify the source of the issue and then fix it.

Connected IDEs

SageMaker already offered two ways for AI developers to train and run models. It offered fully managed IDEs, such as JupyterLab or Code Editor, to seamlessly run the training code on the models through SageMaker. Understanding that other engineers prefer to use their local IDEs, including all the extensions they have installed, AWS allowed them to run their code on their machines as well. However, Mehrotra pointed out that it meant locally coded models only ran locally, so if developers wanted to scale, it proved to be a significant challenge. AWS added new secure remote execution to allow customers to continue working on their preferred IDE — either locally or managed — and connect it to SageMaker. “So this capability now gives them the best of both worlds where if they want, they can develop locally on a local IDE, but then in terms of actual task execution, they can benefit from the scalability of SageMaker,” he said.

More flexibility in compute

AWS launched SageMaker HyperPod in December 2023 as a means to help customers manage clusters of servers for training models. Similar to providers like CoreWeave, HyperPod enables SageMaker customers to direct unused compute power to their preferred location. HyperPod knows when to schedule GPU usage based on demand patterns and allows organizations to balance their resources and costs effectively. However, AWS said many customers wanted the same service for inference. Many inference tasks occur during the day when people use models and applications, while training is usually scheduled during off-peak hours.
Mehrotra noted that even in the world of inference, developers can prioritize the inference tasks that HyperPod should focus on. Laurent Sifre, co-founder and CTO at AI agent company H AI, said in an AWS blog post that the company used SageMaker HyperPod when building out its agentic platform. “This seamless transition from training to inference streamlined our workflow, reduced time to production, and delivered consistent performance in live environments,” Sifre said.

AWS and the competition

Amazon may not be offering the splashiest foundation models like its cloud provider rivals, Google and Microsoft. Still, AWS has been more focused on providing the infrastructure backbone for enterprises to build AI models, applications, or agents. In addition to SageMaker, AWS also offers Bedrock, a platform specifically designed for building applications and agents. SageMaker has been around for years, initially serving as a means to connect disparate machine learning tools to data lakes. As the generative AI boom began, AI engineers began using SageMaker to help train language models. However, Microsoft is pushing hard for its Fabric ecosystem, already adopted by 70% of Fortune 500 companies, to become a leader in the data and AI acceleration space. Google, through Vertex AI, has quietly made inroads in enterprise AI adoption. AWS, of course, has the advantage of being the most widely used cloud provider. Any updates that would make its many AI infrastructure platforms easier to use will always be a benefit.
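As a hedged sketch of the "develop locally, execute on SageMaker" pattern described above, the snippet below uses the SageMaker Python SDK's remote function decorator as I understand it; the instance type, dependencies file, and function body are illustrative assumptions, and running it requires a configured AWS account and role.

```python
# Hypothetical sketch: code written in a local IDE, executed as a SageMaker job.
from sagemaker.remote_function import remote

@remote(instance_type="ml.g5.xlarge", dependencies="./requirements.txt")
def train(learning_rate: float, epochs: int) -> float:
    # Develop and debug this function locally; the decorator ships it to SageMaker.
    ...
    return 0.0  # e.g. final validation loss

if __name__ == "__main__":
    loss = train(learning_rate=3e-4, epochs=5)  # triggers a remote training job
    print(loss)
```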

AWS doubles down on infrastructure as strategy in the AI race with SageMaker upgrades Read More »

The human harbor: Navigating identity and meaning in the AI age

Image generated by ChatGPT.

We are living through a time when AI is reshaping not only how we work but also how we think, perceive and assign meaning. This phase is not just about smarter tools or faster work. AI is beginning to reshape how we define value, purpose and identity itself. The future is not just unpredictable in terms of unknowable events; it is marked by deepening uncertainty about our place in it, and by growing ambiguity about the nature of human purpose itself. Until now, the terrain of thought and judgment was distinctly human. But that ground is shifting. We find ourselves in motion, part of a larger migration toward something unknown; a journey as exhilarating as it is unnerving. Perhaps a redefinition of what it means to live, contribute and have value in a world where cognition is no longer our exclusive domain.

Reflected wisdom

Trained on vast expanses of human knowledge, machines now reflect versions of us through our language, reasoning and creativity, powered by statistical prediction and amplified by computational speed unimaginable just five years ago. Much like Narcissus, transfixed by his reflection and unable to look away, we are drawn to AI’s mirrored intelligence. In chatbots, we encounter echoes of ourselves in their language, empathy and insight. This fascination with our reflected intelligence, however, unfolds against a backdrop of rapid economic transformation that threatens to make the metaphor literal, leaving us transfixed while the ground shifts beneath our feet. OpenAI CEO Sam Altman has said Gen Z and Millennials are now treating AI chatbots as “life advisors.” Yet what chatbots show us is not a perfect mirror. It is subtly reshaped by algorithmic logic, probabilistic inference and sycophantic reinforcement. Like a carnival mirror, its distortions are seductive precisely because they flatter.

The emotional toll

Even as AI offers an imperfect mirror, its proliferation is triggering profound and mixed emotions. In “The Master Algorithm,” University of Washington professor Pedro Domingos offers reassurance about the impact of AI: “Humans are not a dying twig on the tree of life. On the contrary, we are about to start branching. In the same way that culture coevolved with larger brains, we will coevolve with our creations.” Not everyone is so certain. Psychologist Elaine Ryan, in an interview with Business Insider, noted: “[AI] didn’t arrive quietly. It appeared everywhere — at work, in healthcare, in education, even in creativity. People feel disoriented. They worry not just about losing jobs but about losing relevance. Some even wonder if they’re losing their sense of identity. I’ve heard it again and again: ‘Where do I fit now?’ or ‘What do I have to offer that AI can’t?’” These feelings are not personal failures. They are signals of a system in flux and of a story we have not yet written.

Losing our place

This sense of dislocation is not just an emotional reaction; it signals something deeper: A reexamination of the very ground on which human identity has stood. This moment compels us to revisit foundational questions: What does it mean to be human when cognition itself can be outsourced or surpassed? Where does meaning reside when our crowning trait — the capacity to reason and create — is no longer uniquely ours?
These feelings point toward a fundamental shift: We are moving from defining ourselves by what we do to discovering who we are beyond our cognitive outputs. One path sees us as conductors or orchestrators of AI. For example, Altman foresees a world where each of us has multiple AI agents running in parallel, anticipating needs, analyzing conversations and surfacing ideas. He noted: “We have this team of agents, assistants, companions… doing stuff in the background all the time… [that] will really transform what people can do and how we work, and to some extent how we live our lives.” Another trajectory points toward AI systems that do not just assist but outperform. For example, Microsoft researchers developed the Microsoft AI Diagnostic Orchestrator (MAI-DxO), a system that uses multiple frontier AI models to mimic several human doctors working together in a virtual panel. In a blog post, Microsoft said this led to successful diagnoses at a rate more than four times higher than a group of experienced physicians. According to Microsoft AI CEO Mustafa Suleyman: “This orchestration mechanism — multiple agents that work together in this chain-of-debate style — is going to drive us closer to medical superintelligence.” The distinction between augmentation and replacement matters because our response, and the harbor we build, depends partly on which trajectory dominates. If AI acts continuously on our behalf by anticipating, executing, even exceeding us, what becomes of human initiative, surprise or the cognitive friction that fosters growth? And who, in this new orchestration, still finds a role that feels essential? That question is especially poignant now, as some startups urge companies to “stop hiring humans” and employ AI agents instead. Others pursue the wholesale automation of white-collar labor “as fast as possible.” These efforts may not succeed, but companies are investing as if they will and doing so at speed. A survey of U.S.-based C-suite and business leaders by management consulting firm KPMG found that “as AI-agent adoption accelerates, there is near-unanimous agreement that comprehensive organizational changes are coming.” Nearly 9 in 10 respondents said agents will require organizations to redefine performance metrics and will also prompt organizations to upskill employees currently in roles that may be displaced. “Clients are no longer asking ‘if’ AI will transform their business, they’re asking ‘how fast’ it can be deployed.” Joe Rogan, in conversation with Senator Bernie Sanders, expressed concern about AI displacing workers and its impact. “Even if people have universal basic income, they don’t have meaning.” Sanders responded: “What you’re talking about here is a revolution in human existence… We have to find [meaning] in ourselves in ways you don’t know,

The human harbor: Navigating identity and meaning in the AI age Read More »

The great AI agent acceleration: Why enterprise adoption is happening faster than anyone predicted

The chatter around artificial general intelligence (AGI) may dominate headlines coming from Silicon Valley companies like OpenAI, Meta and xAI, but for enterprise leaders on the ground, the focus is squarely on practical applications and measurable results. At VentureBeat’s recent Transform 2025 event in San Francisco, a clear picture emerged: the era of real, deployed agentic AI is here, is accelerating and is already reshaping how businesses operate. Companies like Intuit, Capital One, LinkedIn, Stanford University and Highmark Health are quietly putting AI agents into production, tackling concrete problems, and seeing tangible returns. Here are the four biggest takeaways from the event for technical decision-makers.

1. AI agents are moving into production, faster than anyone realized

Enterprises are now deploying AI agents in customer-facing applications, and the trend is accelerating at a breakneck pace. A recent VentureBeat survey of 2,000 industry professionals conducted just before VB Transform revealed that 68% of enterprise companies (with 1,000+ employees) had already adopted agentic AI – a figure that seemed high at the time. (In fact, I worried it was too high to be credible, so when I announced the survey results on the event stage, I cautioned that the high adoption may be a reflection of VentureBeat’s specific readership.) However, new data validates this rapid shift. A KPMG survey released on June 26, a day after our event, shows that 33% of organizations are now deploying AI agents, a surprising threefold increase from just 11% in the previous two quarters. This market shift validates the trend VentureBeat first identified just weeks ago in its pre-Transform survey. This acceleration is being fueled by tangible results. Ashan Willy, CEO of New Relic, noted a staggering 30% quarter-over-quarter growth in its customers’ monitoring of AI applications, mainly because of their move to adopt agents. Companies are deploying AI agents to automate the workflows their customers need help with. Intuit, for instance, has deployed invoice generation and reminder agents in its QuickBooks software. The result? Businesses using the feature are getting paid five days faster and are 10% more likely to be paid in full. Even non-developers are feeling the shift. Scott White, the product lead of Anthropic’s Claude AI product, described how he, despite not being a professional programmer, is now building production-ready software features himself. “This wasn’t possible six months ago,” he explained, highlighting the power of tools like Claude Code. Similarly, OpenAI’s head of product for its API platform, Olivier Godement, detailed how customers like Stripe and Box are using its Agents SDK to build out multi-agent systems.

2. The hyperscaler race has no clear winner as multi-cloud, multi-model reigns

The days of betting on a single large language model (LLM) provider are over. A consistent theme throughout Transform 2025 was the move towards a multi-model and multi-cloud strategy. Enterprises want the flexibility to choose the best tool for the job, whether it’s a powerful proprietary model or a fine-tuned open-source alternative.
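As a generic, hypothetical sketch of what multi-model routing can look like in practice (the model names and prices are invented; IBM's actual gateway, discussed next, is certainly more sophisticated):

```python
# Hypothetical model gateway: route each request to the cheapest model that
# claims to handle the task, falling back to a frontier model otherwise.
from dataclasses import dataclass

@dataclass
class ModelRoute:
    name: str
    cost_per_1k_tokens: float
    good_for: set[str]

ROUTES = [
    ModelRoute("small-open-model", 0.0002, {"classification", "extraction"}),
    ModelRoute("mid-tier-model", 0.002, {"summarization", "rag"}),
    ModelRoute("frontier-model", 0.02, {"reasoning", "code"}),
]

def pick_model(task: str) -> ModelRoute:
    """Cheapest route that lists the task; default to the frontier model."""
    candidates = [r for r in ROUTES if task in r.good_for]
    return min(candidates, key=lambda r: r.cost_per_1k_tokens) if candidates else ROUTES[-1]

print(pick_model("summarization").name)  # -> mid-tier-model
```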
As Armand Ruiz, VP of AI platform at IBM, explained, the company’s development of a model gateway — which routes applications to use whatever LLM is most efficient and performant for the specific case — was a direct response to customer demand. IBM started by offering enterprise customers its own models, then added open-source support, and finally realized it needed to support all models. This desire for flexibility was echoed by XD Huang, the CTO of Zoom, who described his company’s three-tiered model approach: supporting proprietary models, offering their own fine-tuned model and allowing customers to create their own fine-tuned versions. This trend is creating a powerful but constrained ecosystem, where GPUs and the power needed to generate tokens are in limited supply. As Dylan Patel of SemiAnalysis and fellow panelists Jonathan Ross of Groq and Sean Lie of Cerebras pointed out, this puts pressure on the profitability of a lot of companies that simply buy more tokens when they are available, instead of locking into profits as the cost of those tokens continues to fall. Enterprises are getting smarter about how they use different models for different tasks to optimize for both cost and performance — and that may often mean not just relying on Nvidia chips, but being much more customized — something also echoed in a VB Transform session led by Solidigm around the emergence of customized memory and storage solutions for AI.

3. Enterprises are focused on solving real problems, not chasing AGI

While tech leaders like Elon Musk, Mark Zuckerberg and Sam Altman are talking about the dawn of superintelligence, enterprise practitioners are rolling up their sleeves and solving immediate business challenges. The conversations at Transform were refreshingly grounded in reality. Take Highmark Health, the nation’s third-largest integrated health insurance and provider company. Its Chief Data Officer Richard Clarke said it is using LLMs for practical applications like multilingual communication to better serve its diverse customer base, and streamlining medical claims. In other words, leveraging technology to deliver better services today. Similarly, Capital One is building teams of agents that mirror the functions of the company, with specific agents for tasks like risk evaluation and auditing, including helping their car dealership clients connect customers with the right loans. The travel industry is also seeing a pragmatic shift. CTOs from Expedia and Kayak discussed how they are adapting to new search paradigms enabled by LLMs. Users can now search for a hotel with an “infinity pool” on ChatGPT, and travel platforms need to incorporate that level of natural language discovery to stay competitive. The focus is on the customer, not the technology for its own sake.

4. The future of AI teams is small, nimble, and empowered

The age of AI agents is also transforming how teams are structured. The consensus is that small, agile “squads” of three to four engineers are most effective. Varun Mohan, CEO

The great AI agent acceleration: Why enterprise adoption is happening faster than anyone predicted Read More »

Don’t wait for a ‘bake-off’: How Intuit and Amex beat competitors to production AI agents

As generative AI matures, enterprises are shifting from experimentation to implementation—moving beyond chatbots and copilots into the realm of intelligent, autonomous agents. In a conversation with VentureBeat’s Matt Marshall at VB Transform, Ashok Srivastava, SVP and Chief Data Officer at Intuit, and Hilary Packer, EVP and CTO at American Express, detailed how their companies are embracing agentic AI to transform customer experiences, internal workflows and core business operations.

From models to missions: the rise of intelligent agents

At Intuit, agents aren’t just about answering questions—they’re about executing tasks. In TurboTax, for instance, agents help customers complete their taxes 12% faster, with nearly half finishing in under an hour. These intelligent systems draw data from multiple streams—including real-time and batch data—via Intuit’s internal bus and persistent services. Once processed, the agent analyzes the information to make a decision and take action. “This is the way we’re thinking about agents in the financial domain,” said Srivastava. “We’re trying to make sure that as we build, they’re robust, scalable and actually anchored in reality. The agentic experiences we’re building are designed to get work done for the customer, with their permission. That’s key to building trust.” These capabilities are made possible by GenOS, Intuit’s custom generative AI operating system. At its heart is GenRuntime, which Srivastava likens to a CPU: it receives the data, reasons over it, and determines an action that’s then executed for the end user. The OS was designed to abstract away technical complexity, so developers don’t need to reinvent risk safeguards or security layers every time they build an agent. Across Intuit’s brands—from TurboTax and QuickBooks to Mailchimp and Credit Karma—GenOS helps create consistent, trusted experiences and ensure robustness, scalability and extensibility across use cases.

Building the agentic stack at Amex: trust, control, and experimentation

For Packer and her team at Amex, the move into agentic AI builds on more than 15 years of experience with traditional AI and a mature, battle-tested big data infrastructure. As gen AI capabilities accelerate, Amex is reshaping its strategy to focus on how intelligent agents can drive internal workflows and power the next generation of customer experiences. For example, the company is focused on developing internal agents that boost employee productivity, like the APR agent that reviews software pull requests and advises engineers on whether code is ready to merge. This project reflects Amex’s broader approach: start with internal use cases, move quickly, and use early wins to refine the underlying infrastructure, tools, and governance standards. To support fast experimentation, strong security, and policy enforcement, Amex developed an “enablement layer” that allows for rapid development without sacrificing oversight. “And so now as we think about agentic, we’ve got a nice control plane to plug in these additional guardrails that we really do need to have in place,” said Packer. Within this system is Amex’s concept of modular “brains”—a framework in which agents are required to consult with specific “brains” before taking action.
These brains serve as modular governance layers—covering brand values, privacy, security, and legal compliance—that every agent must engage with during decision-making. Each brain represents a domain-specific set of policies, such as brand voice, privacy rules, or legal constraints, and functions as a consultable authority. By routing decisions through this system of constraints, agents remain accountable, aligned with enterprise standards and worthy of user trust. For example, a dining reservation agent operating through Resy, Amex’s restaurant booking platform, would need to validate that it’s selecting the right restaurant at the right time, matching the user’s intent while adhering to brand and policy guidelines.

Architecture that enables speed and safety

Both AI leaders agreed that enabling rapid development at scale demands thoughtful architectural design. At Intuit, the creation of GenOS empowers hundreds of developers to build safely and consistently. The platform ensures each team can access shared infrastructure, common safeguards, and model flexibility without duplicating work. Amex took a similar approach with its enablement layer. Designed around a unified control plane, the layer lets teams rapidly develop AI-driven agents while enforcing centralized policies and guardrails. It ensures consistent implementation of risk and governance frameworks while encouraging speed. Developers can deploy experiments quickly, then evaluate and scale based on feedback and performance, all without compromising brand trust.

Lessons in agentic AI adoption

Both AI leaders stressed the need to move quickly, but with intent. “Don’t wait for a bake-off,” Packer advised. “It’s better to pick a direction, get something into production, and iterate quickly, rather than delaying for the perfect solution that may be outdated by launch time.” They also emphasized that measurement must be embedded from the very beginning. According to Srivastava, instrumentation isn’t something to bolt on later—it has to be an integral part of the stack. Tracking cost, latency, accuracy and user impact is essential for assessing value and maintaining accountability at scale. “You have to be able to measure it. That’s where GenOS comes in—there’s a built-in capability that lets us instrument AI applications and track both the cost going in and the return coming out,” said Srivastava. “I review this every quarter with our CFO. We go line by line through every AI use case across the company, assessing exactly how much we’re spending and what value we’re getting in return.”

Intelligent agents are the next enterprise platform shift

Intuit and American Express are among the leading enterprises adopting agentic AI not just as a technology layer, but as a new operating model. Their approach focuses on building the agentic platform, establishing governance, measuring impact, and moving quickly. As enterprise expectations evolve from simple chatbot functionality to autonomous execution, organizations that treat agentic AI as a first-class discipline—with control planes, observability, and modular governance—will be best positioned to lead the agentic race.
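To make the "modular brains" consultation pattern described above more concrete, here is a minimal, hypothetical sketch: every proposed action is reviewed by each governance module, and any one of them can veto. The class names and rules are invented for illustration, not Amex's implementation.

```python
# Hypothetical "brains" pattern: each governance module reviews a proposed action.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    metadata: dict

class Brain:
    name = "base"
    def review(self, action: Action) -> tuple[bool, str]:
        raise NotImplementedError

class PrivacyBrain(Brain):
    name = "privacy"
    def review(self, action):
        ok = not action.metadata.get("shares_pii", False)
        return ok, "no PII leaves the platform" if ok else "blocked: action would share PII"

class BrandBrain(Brain):
    name = "brand"
    def review(self, action):
        ok = "guaranteed" not in action.description.lower()   # e.g. no absolute promises
        return ok, "tone OK" if ok else "blocked: overpromising language"

def execute_with_governance(action: Action, brains: list[Brain]) -> bool:
    for brain in brains:
        approved, reason = brain.review(action)
        print(f"[{brain.name}] {reason}")
        if not approved:
            return False                                       # any brain can veto
    print(f"executing: {action.description}")
    return True

execute_with_governance(
    Action("Book a table for two at 7pm via Resy", {"shares_pii": False}),
    [PrivacyBrain(), BrandBrain()],
)
```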

Don’t wait for a ‘bake-off’: How Intuit and Amex beat competitors to production AI agents Read More »

Hugging Face just launched a $299 robot that could disrupt the entire robotics industry

Hugging Face, the $4.5 billion AI platform that has become the GitHub of machine learning (ML), announced Tuesday the launch of Reachy Mini, a $299 desktop robot designed to bring AI-powered robotics to millions of developers worldwide. The 11-inch humanoid companion represents the company’s boldest move yet to democratize robotics development and challenge the industry’s traditional closed-source, high-cost model. The announcement comes as Hugging Face crosses a significant milestone of 10 million users, with CEO Clément Delangue revealing in an exclusive interview that “more and more of them are building in relation to robotics.” The compact robot, which can sit on any desk next to a laptop, addresses what Delangue calls a fundamental barrier in robotics development: Accessibility. “One of the challenges with robotics is that you know you can’t just build on your laptop. You need to have some sort of robotics partner to help in your building, and most people won’t be able to buy $70,000 robots,” Delangue explained, referring to traditional industrial robotics systems and even newer humanoid robots like Tesla’s Optimus, which is expected to cost $20,000 to $30,000.

How a software company is betting big on physical AI robots

Reachy Mini emerges from Hugging Face’s April acquisition of French startup Pollen Robotics, marking the company’s most significant hardware expansion since its founding. The robot represents the first consumer product to integrate natively with the Hugging Face Hub, allowing developers to access thousands of pre-built AI models and share robotics applications through the platform’s “Spaces” feature. The timing appears deliberate as the AI industry grapples with the next frontier: Physical AI. While large language models (LLMs) have dominated the past two years, industry leaders increasingly believe that AI will need physical embodiment to achieve human-level capabilities. Goldman Sachs projects that the humanoid robotics market could reach $38 billion by 2035, while the World Economic Forum identifies robotics as a critical frontier technology for industrial operations. “We’re seeing more and more people moving to robotics, which is extremely exciting,” Delangue said. “The idea is to really become the desktop, open-source robot for AI builders.”

Inside the $299 robot that could democratize AI development

Reachy Mini packs sophisticated capabilities into its compact form factor. The robot features six degrees of freedom in its moving head, full body rotation, animated antennas, a wide-angle camera, multiple microphones and a 5-watt speaker. The wireless version includes a Raspberry Pi 5 computer and battery, making it fully autonomous. The robot ships as a DIY kit and can be programmed in Python, with JavaScript and Scratch support planned. Pre-installed demonstration applications include face and hand tracking, smart companion features and dancing moves. Developers can create and share new applications through Hugging Face’s Spaces platform, potentially creating what Delangue envisions as “thousands, tens of thousands, millions of apps.” This approach contrasts sharply with traditional robotics companies that typically release one product annually with limited customization options. “We want to have a model where we release tons of things,” Delangue explained. “Maybe we’ll release 100 prototypes a year.
Out of this 100 prototypes, maybe we’ll assemble only 10 ourselves… and maybe fully assembled, fully packaged, fully integrated with all the software stack, maybe there’s going to be just a couple of them.”

Why open-source hardware might be the future of robotics

The launch represents a fascinating test of whether open-source principles can translate successfully to hardware businesses. Hugging Face plans to release all hardware designs, software and assembly instructions as open source, allowing anyone to build their own version. The company monetizes through convenience, selling pre-assembled units to developers who prefer to pay rather than build from scratch. “You try to share as much as possible to really empower the community,” Delangue explained. “There are people who, even if they have all the recipes open source to build their own Reachy Mini, would prefer to pay $300 bucks, $500, and get it already ready, or easy to assemble at home.” This freemium approach for hardware echoes successful software models but faces unique challenges. Manufacturing costs, supply chain complexity and physical distribution create constraints that don’t exist in pure software businesses. However, Delangue argues that this creates valuable feedback loops: “You learn from the open-source community about what they want to build, how they want to build and you can reintegrate it into what you sell.”

The privacy challenge facing AI robots in your home

The move into robotics raises new questions about data privacy and security that don’t exist with purely digital AI systems. Robots equipped with cameras, microphones and the ability to take physical actions in homes and workplaces create unprecedented privacy considerations. Delangue positions open source as the solution to these concerns. “One of my personal motivations to do open-source robotics is that it’s going to fight concentration of power… the natural tendency of creating black box robots that users don’t really understand or really control,” he said. “The idea of ending up in a world where just a few companies are controlling millions of robots that are in people’s homes, being able to take action in real life, is quite scary.” The open-source approach allows users to inspect code, understand data flows and potentially run AI models locally rather than relying on cloud services. For enterprise customers, Hugging Face’s existing enterprise platform could provide private deployment options for robotics applications.

From prototype to production: Hugging Face’s manufacturing gamble

Hugging Face faces significant manufacturing and scaling challenges as it transitions from a software platform to a hardware company. The company plans to begin shipping Reachy Mini units as early as next month, starting with more DIY-oriented versions where customers complete final assembly. “The first versions shipping will be a bit DIY, in the sense that we’ll split the weight of assembling with the user,” Delangue explained. “We’ll do some of the assembling ourselves, and the user will

Hugging Face just launched a $299 robot that could disrupt the entire robotics industry Read More »

A new paradigm for AI: How ‘thinking as optimization’ leads to better general-purpose models

Researchers at the University of Illinois Urbana-Champaign and the University of Virginia have developed a new model architecture that could lead to more robust AI systems with more powerful reasoning capabilities. Called an energy-based transformer (EBT), the architecture shows a natural ability to use inference-time scaling to solve complex problems. For the enterprise, this could translate into cost-effective AI applications that can generalize to novel situations without the need for specialized fine-tuned models.

The challenge of System 2 thinking

In psychology, human thought is often divided into two modes: System 1, which is fast and intuitive, and System 2, which is slow, deliberate and analytical. Current large language models (LLMs) excel at System 1-style tasks, but the AI industry is increasingly focused on enabling System 2 thinking to tackle more complex reasoning challenges. Reasoning models use various inference-time scaling techniques to improve their performance on difficult problems. One popular method is reinforcement learning (RL), used in models like DeepSeek-R1 and OpenAI’s “o-series” models, where the AI is rewarded for producing reasoning tokens until it reaches the correct answer. Another approach, often called best-of-n, involves generating multiple potential answers and using a verification mechanism to select the best one. However, these methods have significant drawbacks. They are often limited to a narrow range of easily verifiable problems, like math and coding, and can degrade performance on other tasks such as creative writing. Furthermore, recent evidence suggests that RL-based approaches might not be teaching models new reasoning skills, instead just making them more likely to use successful reasoning patterns they already know. This limits their ability to solve problems that require true exploration and are beyond their training regime.

Energy-based models (EBM)

The new architecture takes a different approach, based on a class of models known as energy-based models (EBMs). The core idea is simple: Instead of directly generating an answer, the model learns an “energy function” that acts as a verifier. This function takes an input (like a prompt) and a candidate prediction and assigns a value, or “energy,” to it. A low energy score indicates high compatibility, meaning the prediction is a good fit for the input, while a high energy score signifies a poor match. Applying this to AI reasoning, the researchers propose in a paper that developers should view “thinking as an optimization procedure with respect to a learned verifier, which evaluates the compatibility (unnormalized probability) between an input and candidate prediction.” The process begins with a random prediction, which is then progressively refined by minimizing its energy score and exploring the space of possible solutions until it converges on a highly compatible answer. This approach is built on the principle that verifying a solution is often much easier than generating one from scratch. This “verifier-centric” design addresses three key challenges in AI reasoning. First, it allows for dynamic compute allocation, meaning models can “think” for longer on harder problems and shorter on easy problems. Second, EBMs can naturally handle the uncertainty of real-world problems where there isn’t one clear answer.
Third, they act as their own verifiers, eliminating the need for external models. Unlike other systems that use separate generators and verifiers, EBMs combine both into a single, unified model. A key advantage of this arrangement is better generalization. Because verifying a solution on new, out-of-distribution (OOD) data is often easier than generating a correct answer, EBMs can better handle unfamiliar scenarios. Despite their promise, EBMs have historically struggled with scalability. To solve this, the researchers introduce EBTs, which are specialized transformer models designed for this paradigm. EBTs are trained to first verify the compatibility between a context and a prediction, then refine predictions until they find the lowest-energy (most compatible) output. This process effectively simulates a thinking process for every prediction. The researchers developed two EBT variants: A decoder-only model inspired by the GPT architecture, and a bidirectional model similar to BERT.

Energy-based transformer (source: GitHub)

The architecture of EBTs makes them flexible and compatible with various inference-time scaling techniques. “EBTs can generate longer CoTs, self-verify, do best-of-N [or] you can sample from many EBTs,” Alexi Gladstone, a PhD student in computer science at the University of Illinois Urbana-Champaign and lead author of the paper, told VentureBeat. “The best part is, all of these capabilities are learned during pretraining.”

EBTs in action

The researchers compared EBTs against established architectures: the popular transformer++ recipe for text generation (discrete modalities) and the diffusion transformer (DiT) for tasks like video prediction and image denoising (continuous modalities). They evaluated the models on two main criteria: “Learning scalability,” or how efficiently they train, and “thinking scalability,” which measures how performance improves with more computation at inference time. During pretraining, EBTs demonstrated superior efficiency, achieving up to a 35% higher scaling rate than Transformer++ across data, batch size, parameters and compute. This means EBTs can be trained faster and more cheaply. At inference, EBTs also outperformed existing models on reasoning tasks. By “thinking longer” (using more optimization steps) and performing “self-verification” (generating multiple candidates and choosing the one with the lowest energy), EBTs improved language modeling performance by 29% more than Transformer++. “This aligns with our claims that because traditional feed-forward transformers cannot dynamically allocate additional computation for each prediction being made, they are unable to improve performance for each token by thinking for longer,” the researchers write. For image denoising, EBTs achieved better results than DiTs while using 99% fewer forward passes. Crucially, the study found that EBTs generalize better than the other architectures. Even with the same or worse pretraining performance, EBTs outperformed existing models on downstream tasks. The performance gains from System 2 thinking were most substantial on data that was further out-of-distribution (different from the training data), suggesting that EBTs are particularly robust when faced with novel and challenging tasks. The researchers suggest that “the benefits of EBTs’ thinking are not uniform
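To make the "thinking as optimization" procedure described above concrete, here is a minimal, hypothetical sketch in PyTorch: a learned energy function scores a (context, candidate) pair, and "thinking" is simply gradient descent on the candidate until the energy is low. The network shape, dimensions and step counts are invented for illustration and are not the paper's implementation.

```python
# Hypothetical sketch: refine a random candidate by minimizing a learned energy score.
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Scores (context, candidate) pairs; lower energy means a better fit."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, 1))

    def forward(self, context: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        return self.scorer(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def think(model: EnergyModel, context: torch.Tensor, dim: int = 128,
          steps: int = 32, lr: float = 0.1):
    candidate = torch.randn(context.shape[0], dim, requires_grad=True)  # random initial guess
    optimizer = torch.optim.SGD([candidate], lr=lr)
    for _ in range(steps):                      # more steps = "thinking longer"
        energy = model(context, candidate).sum()
        optimizer.zero_grad()
        energy.backward()
        optimizer.step()
    final_energy = model(context, candidate).detach()   # doubles as a self-verification score
    return candidate.detach(), final_energy

model = EnergyModel()
context = torch.randn(4, 128)
answer, energy = think(model, context)          # pick the lowest-energy answer across samples for best-of-N
```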

A new paradigm for AI: How ‘thinking as optimization’ leads to better general-purpose models Read More »

Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks

Moonshot AI, the Chinese artificial intelligence startup behind the popular Kimi chatbot, released an open-source language model on Friday that directly challenges proprietary systems from OpenAI and Anthropic with particularly strong performance on coding and autonomous agent tasks. The new model, called Kimi K2, features 1 trillion total parameters with 32 billion activated parameters in a mixture-of-experts architecture. The company is releasing two versions: a foundation model for researchers and developers, and an instruction-tuned variant optimized for chat and autonomous agent applications.

"Hello, Kimi K2! Open-Source Agentic Model! 1T total / 32B active MoE model. SOTA on SWE Bench Verified, Tau2 & AceBench among open models. Strong in coding and agentic tasks. Multimodal & thought-mode not supported for now. With Kimi K2, advanced agentic intelligence…" pic.twitter.com/PlRQNrg9JL — Kimi.ai (@Kimi_Moonshot) July 11, 2025

“Kimi K2 does not just answer; it acts,” the company stated in its announcement blog. “With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can’t wait to see what you build.” The model’s standout feature is its optimization for “agentic” capabilities — the ability to autonomously use tools, write and execute code, and complete complex multi-step tasks without human intervention. In benchmark tests, Kimi K2 achieved 65.8% accuracy on SWE-bench Verified, a challenging software engineering benchmark, outperforming most open-source alternatives and matching some proprietary models.

David meets Goliath: How Kimi K2 outperforms Silicon Valley’s billion-dollar models

The performance metrics tell a story that should make executives at OpenAI and Anthropic take notice. Kimi K2-Instruct doesn’t just compete with the big players — it systematically outperforms them on tasks that matter most to enterprise customers. On LiveCodeBench, arguably the most realistic coding benchmark available, Kimi K2 achieved 53.7% accuracy, decisively beating DeepSeek-V3’s 46.9% and GPT-4.1’s 44.7%. More striking still: it scored 97.4% on MATH-500 compared to GPT-4.1’s 92.4%, suggesting Moonshot has cracked something fundamental about mathematical reasoning that has eluded larger, better-funded competitors. But here’s what the benchmarks don’t capture: Moonshot is achieving these results with a model that costs a fraction of what incumbents spend on training and inference. While OpenAI burns through hundreds of millions on compute for incremental improvements, Moonshot appears to have found a more efficient path to the same destination. It’s a classic innovator’s dilemma playing out in real time — the scrappy outsider isn’t just matching the incumbent’s performance, they’re doing it better, faster, and cheaper. The implications extend beyond mere bragging rights. Enterprise customers have been waiting for AI systems that can actually complete complex workflows autonomously, not just generate impressive demos. Kimi K2’s strength on SWE-bench Verified suggests it might finally deliver on that promise.
The MuonClip breakthrough: Why this optimizer could reshape AI training economics Buried in Moonshot’s technical documentation is a detail that could prove more significant than the model’s benchmark scores: their development of the MuonClip optimizer, which enabled stable training of a trillion-parameter model “with zero training instability.” This isn’t just an engineering achievement — it’s potentially a paradigm shift. Training instability has been the hidden tax on large language model development, forcing companies to restart expensive training runs, implement costly safety measures, and accept suboptimal performance to avoid crashes. Moonshot’s solution directly addresses exploding attention logits by rescaling weight matrices in query and key projections, essentially solving the problem at its source rather than applying band-aids downstream. The economic implications are staggering. If MuonClip proves generalizable — and Moonshot suggests it is — the technique could dramatically reduce the computational overhead of training large models. In an industry where training costs are measured in tens of millions of dollars, even modest efficiency gains translate to competitive advantages measured in quarters, not years. More intriguingly, this represents a fundamental divergence in optimization philosophy. While Western AI labs have largely converged on variations of AdamW, Moonshot’s bet on Muon variants suggests they’re exploring genuinely different mathematical approaches to the optimization landscape. Sometimes the most important innovations come not from scaling existing techniques, but from questioning their foundational assumptions entirely. Open source as competitive weapon: Moonshot’s radical pricing strategy targets big tech’s profit centers Moonshot’s decision to open-source Kimi K2 while simultaneously offering competitively priced API access reveals a sophisticated understanding of market dynamics that goes well beyond altruistic open-source principles. At $0.15 per million input tokens for cache hits and $2.50 per million output tokens, Moonshot is pricing aggressively below OpenAI and Anthropic while offering comparable — and in some cases superior — performance. But the real strategic masterstroke is the dual availability: enterprises can start with the API for immediate deployment, then migrate to self-hosted versions for cost optimization or compliance requirements. This creates a trap for incumbent providers. If they match Moonshot’s pricing, they compress their own margins on what has been their most profitable product line. If they don’t, they risk customer defection to a model that performs just as well for a fraction of the cost. Meanwhile, Moonshot builds market share and ecosystem adoption through both channels simultaneously. The open-source component isn’t charity — it’s customer acquisition. Every developer who downloads and experiments with Kimi K2 becomes a potential enterprise customer. Every improvement contributed by the community reduces Moonshot’s own development costs. It’s a flywheel that leverages the global developer community to accelerate innovation while building competitive moats that are nearly impossible for closed-source competitors to replicate. From demo to reality: Why Kimi K2’s agent capabilities signal the end of chatbot theater The demonstrations Moonshot shared on social media reveal something more significant than impressive technical capabilities—they show AI finally graduating from parlor tricks to practical utility. 
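Returning briefly to the MuonClip technique described earlier: the public description suggests that when attention logits grow too large, the query and key projection weights are rescaled at the source. The sketch below is a rough, hypothetical reading of that idea, not Moonshot's code; the threshold, layer shapes and function name are invented.

```python
# Hypothetical QK-rescaling sketch inspired by the MuonClip description above.
import torch
import torch.nn as nn

def qk_clip_(q_proj: nn.Linear, k_proj: nn.Linear,
             max_logit_seen: float, threshold: float = 100.0) -> None:
    """If attention logits exceeded `threshold`, shrink W_q and W_k in place."""
    if max_logit_seen <= threshold:
        return
    gamma = threshold / max_logit_seen   # target shrink factor for the logits
    scale = gamma ** 0.5                 # logits come from q·k, so split the factor across both projections
    with torch.no_grad():
        q_proj.weight.mul_(scale)
        k_proj.weight.mul_(scale)

# Toy usage on a single attention head's projections.
q, k = nn.Linear(64, 64), nn.Linear(64, 64)
qk_clip_(q, k, max_logit_seen=250.0)     # future logits are rescaled toward the threshold
```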
Consider the salary analysis example: Kimi K2 didn’t just answer questions about data, it autonomously executed 16 Python operations to generate statistical analysis and interactive visualizations. The London concert planning demonstration involved 17 tool calls across multiple platforms — search, calendar, email, flights, accommodations, and

Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks Read More »

Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase

Cambridge, Mass.-based Solo.io was awarded “Most Likely to Succeed” at the Innovation Showcase at VB Transform in San Francisco on June 25. Founded in 2017, the cloud-native application networking company — which raised $135 million in a Series C round in 2021 and is valued at $1 billion — provides tools for connecting, securing and observing modern applications, particularly those built on Kubernetes and microservices.

Introducing Kagent Studio

The company’s Kagent platform is a first-of-its-kind cloud native framework that helps DevOps and platform engineers build and run AI agents in Kubernetes. During the Innovation Showcase at this year’s VB Transform, the company announced the launch of Kagent Studio, a framework that allows enterprises to build, secure, run and manage their AI agents in Kubernetes. Keith Babo, the company’s CPO, presented the new offering from VB Transform’s main stage. He said the framework aims to address platform engineering challenges by offering features including:
- Native VS Code extension integration
- Real-time incident response capabilities
- Bilateral communication between workplace communications platforms such as Slack and Teams and the integrated development environment (IDE)
- Automated root cause analysis generation
- Live infrastructure monitoring and diagnostics
“It’s the first framework of its kind that targets this audience that’s building and running on Kubernetes for agents,” Babo said in an interview with VentureBeat. “We wanted to make sure that we could bring that directly into the tools that platform engineers are using on a day-to-day basis.”

Helping platform engineers share context

Babo said the framework operates as a native extension in VS Code and checks the boxes on many core platform engineering workflows, including incident response. “So you get maybe a PagerDuty alert that shows up in your IDE. You can acknowledge it and immediately our agents running locally inside the IDE will pick up that incident and start to diagnose. All of this is live right in front of the engineer. So you’re seeing the actual charts and the logs and the status of the core pods or infrastructure under which you’re running, and you’re getting all that live and the human in the loop the whole time,” he explained. The engineer then instructs the system to move forward (or not), allowing the agent and human to coexist and partner in the effective resolution of this issue, Babo added. He said the framework will allow for the live injection of context from the communications platforms to the IDE, with analysis being shared back to the communications platform, enabling bilateral communication for people working on the issue.

Salesforce for platform engineers

Idit Levine, the company’s founder and CEO, said in an interview with VentureBeat that she sees Kagent Studio as an “essential engineering tool,” similar to Salesforce’s CRM, which is essential for sales teams across the enterprise. She said Kagent Studio connects context and communication across platforms and platform engineering subteams. Levine said winning the “Most Likely to Succeed” award was “great validation for us” and indicative of enterprise interest in their offering. Kagent Studio has already gained significant traction, according to Babo and Levine.
It has 1,000-plus contributors, 1,100-plus GitHub stars, and users already running it in production. It is currently in a closed preview phase; users can request access. More information can be found on the Discord server. Each finalist presented to an audience of industry decision-makers and received feedback from a panel of venture capital judges at the showcase. These included Emily Zhao, principal at Salesforce Ventures; Matt Kraning, partner at Menlo Ventures; and Rebecca Li, investment director at Amex Ventures. Read about the other winners, CTGT and Catio. Finalists included Kumo, Superduper.io, Sutro and Qdrant.

Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase Read More »

Employee AI agent adoption: Maximizing gains while navigating challenges

While agentic AI definitely marks a turning point in human-computer interaction, moving from tool use to collaboration, the next step is integrating these agents and actually deriving value. At VentureBeat’s Transform 2025, Matthew Kropp, managing director and senior partner at BCG, offered a game plan for workflow evolution, employee adoption, and organizational change. “The companies that are at the top of this curve — what we call future built, the ones that are most mature — are seeing substantial results: 1.5 times more revenue growth, 1.8 times higher shareholder value,” Kropp said. “There’s value here, but we’re early.”

Deploy, reshape, invent

To take advantage and create value with AI and with agents, a company needs to determine where to focus, using a deploy, reshape, invent framework. AI is already being deployed in every enterprise, and every enterprise will have agents within the next few years. But if you give an employee a chatbot, you haven’t changed the way the work is done. You have to rethink the work, and reshape functions, departments, and workflows by identifying where human work can be automated. “We’re advising companies right now to focus on your three or four big rocks. If you have a big customer support organization, you should apply AI in customer support. It has a huge impact. If you have a big engineering organization, you should employ tools like Windsurf to reshape the way that you do engineering, software development.” Invention is still in the very early stage, but enterprises should be thinking about how to use AI’s ability to be creative, reason, and plan. Look at services and products, and how you interact with customers: can you reinvent that using those capabilities? For instance, makeup company L’Oreal launched a virtual beauty advisor to scale that exclusive service beyond its retail locations, reinventing the way it thinks about interacting with customers at scale.

Thinking beyond basic use cases

It’s also critical to think about how AI changes your business. There’s been a lot of focus in the last couple of years on cost reduction by replacing workers, but that isn’t big-picture thinking. AI amplifies the employees you currently have, dramatically increasing their productivity. “This is what we’re seeing in software development,” he said. “I don’t think we’ll see companies laying off their software developers. We’re going to see a massive explosion in the amount of capability and features that software companies are building.” In a study BCG conducted with Harvard, Wharton, and MIT, they asked 750 knowledge workers to write a business and marketing plan, with and without generative AI. The participants using GPT-4 executed 25% faster, completed 15% more tasks, and the quality of their output was 40% better. And when given an LLM, the bottom performers in the baseline did just as well as the top performers. “It brought everyone’s performance up, which is very powerful, because in most organizations the new joiners are less effective than more experienced people,” he said. “It has the ability to increase time to proficiency.” AI can also surpass human scale, even open up new applications that were not previously possible. For example, in the medical space, outcomes for patients are significantly improved with preoperative and postoperative follow-up from a nurse, but implementing this has been cost-prohibitive — until the advent of AI nurses that can take on that task for a large patient population.
Overcoming the biggest hurdle: Adoption

While these tools are fantastic, people aren’t using them. BCG tracked the adoption of GitHub Copilot and productivity metrics for an organization with about 10,000 software engineers. The top 5% of engineers doubled in productivity in four months, while 60% showed zero improvement, because they just didn’t adopt the tool at all. Why won’t humans adopt? There are three reasons. First is capability ignorance. The second, habit inertia. The third is identity threat, and that is the hardest to overcome. Developers are asking, “If this AI can write code for me, then who am I? What’s my value?” “This is going to be the real work of the next three to five years,” Kropp said. “It’s getting people to use the agents.”

Strategies for overcoming reluctance

There are a few valuable ways to overcome these challenges. Naturally, getting the right tool is the first step, and integrating it with the way people work by training them explicitly. It’s also critical to measure and celebrate adoption for those employees actively using the tools so that everyone else starts to see they need to get on this bandwagon. Another important step is ramping up scarcity — that means taking away resources so employees need to do more with less. At the same time, it’s essential to redesign work processes hand-in-hand with those employees who are on the front lines. Don’t just identify laborious processes where manual work can be automated — identify the parts where humans bring value. “We minimize the toil and we maximize the joy,” Kropp said. “We’re left with a much more efficient process, a much more efficient company, a much more productive workforce, and jobs that people like to be in.”

Employee AI agent adoption: Maximizing gains while navigating challenges Read More »