VentureBeat

Gartner forecasts gen AI spending to hit $644B in 2025: What it means for enterprise IT leaders

Make no mistake about it: There is a lot of money being spent on generative AI in 2025. Analyst firm Gartner released a new report today forecasting that global gen AI spending will hit $644 billion in 2025, a 76.4% year-over-year increase over gen AI spending in 2024.

Gartner's report joins a chorus of other industry analyses in recent months that all point to increasing adoption of, and spending on, gen AI. Spending has been growing by 130%, according to research conducted by AI at Wharton, a research center at the Wharton School of the University of Pennsylvania. Deloitte reported that 74% of enterprises have already met or exceeded expectations for their gen AI initiatives.

While it's no surprise that spending on gen AI is growing, the Gartner report provides new clarity on where the money is going and where enterprises might get the most value. According to Gartner's analysis, hardware will claim a staggering 80% of all gen AI spending in 2025. The forecast shows:

Devices will account for $398.3 billion (99.5% growth)

Servers will reach $180.6 billion (33.1% growth)

Software spending follows at just $37.2 billion (93.9% growth)

Services will total $27.8 billion (162.6% growth)

"The device market was the biggest surprise, it is the market most driven by the supply side rather than the demand side," John Lovelock, distinguished VP analyst at Gartner, told VentureBeat. "Consumers and enterprises are not seeking AI enabled devices, but manufacturers are producing them and selling them. By 2027, it will be almost impossible to buy a PC that is not AI enabled."

Hardware's dominance will intensify, not diminish, for enterprise AI

With hardware claiming approximately 80% of gen AI spending in 2025, many might assume this ratio would gradually shift toward software and services as the market matures. Lovelock's insights suggest the opposite.
"The ratios shift more in hardware's favor over time," Lovelock said. "While more and more software will have gen AI enabled features, there will be less attributable money spent on gen AI software—gen AI will be embedded functionality delivered as part of the price of the software."

This projection has profound implications for technology budgeting and infrastructure planning. Organizations expecting to shift spending from hardware to software over time may need to recalibrate their financial models to account for ongoing hardware requirements.

Moreover, the embedded nature of future gen AI functionality means that discrete AI projects may become less common. Instead, AI capabilities will increasingly arrive as features within existing software platforms, making intentional adoption strategies and governance frameworks even more critical.

The PoC graveyard: Why internal enterprise AI projects fail

Gartner's report highlights a sobering reality: Many internal gen AI proof-of-concept (PoC) projects have failed to deliver expected results. This has created what Lovelock calls a "paradox" in which expectations are declining despite massive investment. When asked to elaborate on these challenges, Lovelock identified three specific barriers that consistently derail gen AI initiatives.

"Corporations with more experience with AI had higher success rates with gen AI, while enterprises with less experience suffered higher failure rates," Lovelock explained. "However, most enterprises failed for one or more of the top three reasons: Their data was of insufficient size or quality, their people were unable to use the new technology or change to use the new process or the new gen AI would not have a sufficient ROI."

These insights reveal that gen AI's primary challenges aren't technical limitations but organizational readiness factors:

Data inadequacy: Many organizations lack sufficient high-quality data to train or implement gen AI systems effectively.
Change resistance: Users struggle to adopt new tools or adapt workflows to incorporate AI capabilities.

ROI shortfalls: Projects fail to deliver measurable business value that justifies their implementation costs.

The strategic pivot: From internal development to commercial solutions

The Gartner forecast anticipates a shift away from ambitious internal projects in 2025 and beyond. Instead, the expectation is that enterprises will opt for commercial off-the-shelf solutions that deliver more predictable implementation and business value.

This transition reflects the growing recognition that building custom gen AI solutions often presents more challenges than anticipated. Lovelock's comments about failure rates underscore why many organizations are pivoting to commercial options offering predictable implementation paths and clearer ROI.

For technical leaders, this suggests prioritizing vendor solutions that embed gen AI capabilities into existing systems rather than building custom applications from scratch. As Lovelock noted, these capabilities will increasingly be delivered as part of standard software functionality rather than as separate gen AI products.

What this means for enterprise AI strategy

For enterprises looking to lead in AI adoption, Gartner's forecast challenges several common assumptions about the gen AI marketplace. The emphasis on hardware spending, supply-side drivers and embedded functionality suggests a more evolutionary approach may yield better results than revolutionary initiatives.

Technical decision-makers should focus on integrating commercial gen AI capabilities into existing workflows rather than building custom solutions. This approach aligns with Lovelock's observation that CIOs are reducing self-development efforts in favor of features from existing software providers.

For organizations planning more conservative adoption, the inevitability of AI-enabled devices presents challenges and opportunities.
While these capabilities may arrive through regular refresh cycles regardless of strategic intent, organizations that prepare to leverage them effectively will gain competitive advantages.

As gen AI spending accelerates toward $644 billion in 2025, success won't be determined by spending volume alone. Organizations that align their investments with organizational readiness, focus on overcoming the three key failure factors and develop strategies to leverage increasingly embedded gen AI capabilities will extract the most value from this rapidly evolving technology landscape.
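As a back-of-the-envelope check, Gartner's four segment forecasts are internally consistent with the headline figures. The calculation below is ours, not part of the Gartner report:

```python
# 2025 forecasts in $B and YoY growth rates, as cited in the article
segments = {
    "devices":  (398.3, 0.995),
    "servers":  (180.6, 0.331),
    "software": (37.2,  0.939),
    "services": (27.8,  1.626),
}

total_2025 = sum(value for value, _ in segments.values())
# Implied 2024 baseline per segment: 2025 value / (1 + growth rate)
total_2024 = sum(value / (1 + growth) for value, growth in segments.values())
implied_growth = total_2025 / total_2024 - 1

print(f"2025 total: ${total_2025:.1f}B")            # -> $643.9B, the $644B headline
print(f"implied 2024 base: ${total_2024:.1f}B")     # -> roughly $365B
print(f"implied YoY growth: {implied_growth:.1%}")  # -> 76.4%, matching Gartner's figure
```

The segment totals sum to $643.9 billion, and backing out each segment's 2024 base from its growth rate reproduces the 76.4% overall growth figure, so the numbers hang together.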


Why businesses judge AI like humans — and what that means for adoption

As businesses rush to adopt AI, they're discovering an unexpected truth: Even the most rational enterprise buyers aren't making purely rational decisions—their subconscious requirements go far beyond conventional software evaluation standards.

Let me share an anecdote: It's November 2024, and I'm sitting in a New York City skyscraper, working with a fashion brand on its first AI assistant. The avatar, Nora, is a 25-year-old digital assistant displayed on a six-foot-tall kiosk. She has sleek brown hair, a chic black suit and a charming smile. She waves "hi" when recognizing a client's face, nods as they speak and answers questions about company history and tech news.

I came prepared with a standard technical checklist: response accuracy, conversation latency, face recognition precision. But my client didn't even glance at the checklist. Instead, they asked, "Why doesn't she have her own personality? I asked her favorite handbag, and she didn't give me one!"

Changing how we evaluate technology

It's striking how quickly we forget these avatars aren't human. While many worry about AI blurring the lines between humans and machines, I see a more immediate challenge for businesses: a fundamental shift in how we evaluate technology. When software begins to look and act human, users stop evaluating it as a tool and begin judging it as a human being.

This phenomenon—judging non-human entities by human standards—is anthropomorphism. It has been well studied in human-pet relationships and is now emerging in the human-AI relationship.

When it comes to procuring AI products, enterprise decisions are not as rational as you might think, because decision-makers are still human. Research has shown that unconscious perceptions shape most human-to-human interactions, and enterprise buyers are no exception.
Thus, businesses signing an AI contract aren't just entering a "utility contract" seeking cost reduction or revenue growth anymore; they're also entering an implicit "emotional contract." Often, they don't even realize it themselves.

Getting the 'AI baby' perfect?

Although every software product has always had an emotional element, when the product becomes infinitely similar to a real human being, this aspect becomes much more prominent and unconscious. These unconscious reactions shape how your employees and customers engage with AI, and my experience tells me how widespread these responses are—they're truly human. Consider these four examples and their underlying psychological ideas:

When my client in New York asked about Nora's favorite handbag, craving a personality for her, they were tapping into social presence theory, treating the AI as a social being that needs to be present and real.

One client fixated on their avatar's smile: "The mouth shows a lot of teeth—it's unsettling." This reaction reflects the uncanny valley effect, where nearly human-like features provoke discomfort.

Conversely, a visually appealing yet less functional AI agent sparked praise because of the aesthetic-usability effect—the idea that attractiveness can outweigh performance issues.

Yet another client, a meticulous business owner, kept delaying the project launch. "We need to get our AI baby perfect," he repeated in every meeting. "It needs to be flawless before we can show it to the world." This obsession with creating an idealized AI entity suggests a projection of an ideal self onto our AI creations, as if we're crafting a digital entity that embodies our highest aspirations and standards.

What matters most to your business?

So, how can you lead the market by tapping into these hidden emotional contracts and win over competitors who are just stacking up one fancy AI solution after another? The key is determining what matters for your business's unique needs.
Set up a testing process. This will not only help you identify top priorities but, more importantly, deprioritize minor details, no matter how emotionally compelling. Since the sector is so new, there are almost no readily usable playbooks, but you can be the first mover by establishing your own way of figuring out what suits your business best.

For example, the client's question about "the AI avatar's personality" was validated by testing with internal users. By contrast, most people couldn't tell the difference between the several versions the business owner had agonized over for his "perfect AI baby," meaning we could stop at a "good enough" point.

To help you recognize patterns more easily, consider hiring team members or consultants with a background in psychology. None of the four examples above is a one-off; all are well-researched psychological effects that appear in human-to-human interactions.

Your relationship with the tech vendor must also change: They must be a partner who navigates the experience with you. You can set up weekly meetings after signing a contract and share your takeaways from testing so they can build better products for you. If you don't have the budget, at least buffer extra time to compare products and test with users, allowing those hidden "emotional contracts" to surface.

We are at the forefront of defining how humans and AI interact. Successful business leaders will embrace the emotional contract and set up processes to navigate the ambiguity, which will help them win the market.

Joy Liu has led enterprise products at AI startups and cloud and AI initiatives at Microsoft.


New approach to agent reliability, AgentSpec, forces agents to follow rules

AI agents have safety and reliability problems. Although agents would allow enterprises to automate more steps in their workflows, they can take unintended actions while executing a task, are not very flexible and are difficult to control. Organizations have already raised the alarm about unreliable agents, worried that once deployed, agents might forget to follow instructions.

OpenAI even admitted that ensuring agent reliability would involve working with outside developers, so it opened up its Agents SDK to help solve this issue. However, researchers at Singapore Management University (SMU) have developed a new approach to the problem. AgentSpec is a domain-specific framework that lets users "define structured rules that incorporate triggers, predicates and enforcement mechanisms." The researchers said AgentSpec will make agents work only within the parameters that users want.

Guiding LLM-based agents with a new approach

AgentSpec is not a new large language model (LLM) but rather an approach to guiding LLM-based AI agents. The researchers believe AgentSpec can be used for agents in enterprise settings and in self-driving applications.

The first AgentSpec tests were integrated with the LangChain framework, but the researchers said they designed it to be framework-agnostic, meaning it can also run on the AutoGen and Apollo ecosystems.
Experiments using AgentSpec showed it prevented "over 90% of unsafe code executions, ensures full compliance in autonomous driving law-violation scenarios, eliminates hazardous actions in embodied agent tasks and operates with millisecond-level overhead." LLM-generated AgentSpec rules, created with OpenAI's o1, also performed strongly, enforcing 87% of risky code executions and preventing "law-breaking in 5 out of 8 scenarios."

Current methods are a little lacking

AgentSpec is not the only method for helping developers give agents more control and reliability. Other approaches include ToolEmu and GuardAgent. The startup Galileo launched Agentic Evaluations, a way to ensure agents work as intended. The open-source platform H2O.ai uses predictive models to improve the accuracy of agents used by companies in finance, healthcare, telecommunications and government.

The AgentSpec researchers said current approaches to mitigating risks, like ToolEmu, effectively identify risks, but noted that "these methods lack interpretability and offer no mechanism for safety enforcement, making them susceptible to adversarial manipulation."

Using AgentSpec

AgentSpec works as a runtime enforcement layer for agents. It intercepts the agent's behavior while it executes tasks and adds safety rules set by humans or generated by prompts. Since AgentSpec is a custom domain-specific language, users must define the safety rules. Each rule has three components: the first is the trigger, which lays out when to activate the rule; the second is check, which adds the conditions to evaluate; and the third is enforce, which specifies the actions to take if the rule is violated.

AgentSpec is built on LangChain, though, as previously stated, the researchers said it can also be integrated into other frameworks, such as AutoGen or the autonomous vehicle software stack Apollo.
These frameworks orchestrate the steps agents need to take by taking in the user input, creating an execution plan, observing the result and then deciding whether the action was completed and, if not, planning the next step. AgentSpec adds rule enforcement into this flow.

"Before an action is executed, AgentSpec evaluates predefined constraints to ensure compliance, modifying the agent's behavior when necessary. Specifically, AgentSpec hooks into three key decision points: before an action is executed (AgentAction), after an action produces an observation (AgentStep), and when the agent completes its task (AgentFinish). These points provide a structured way to intervene without altering the core logic of the agent," the paper states.

More reliable agents

Approaches like AgentSpec underscore the need for reliable agents in enterprise use. As organizations begin to plan their agentic strategies, technical decision-makers are also looking at ways to ensure reliability.

For many, agents will eventually do tasks for users autonomously and proactively. The idea of ambient agents, where AI agents and apps continuously run in the background and trigger themselves to execute actions, would require agents that do not stray from their path and accidentally introduce unsafe actions.

If ambient agents are where agentic AI is headed, expect more methods like AgentSpec to proliferate as companies seek to make AI agents continuously reliable.
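The trigger/check/enforce rule structure and the three hook points described above can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual AgentSpec DSL; all class names, hook names and the sample rule are our own invention:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hook names loosely mirroring the paper's AgentAction / AgentStep / AgentFinish
HOOKS = ("agent_action", "agent_step", "agent_finish")

@dataclass
class Rule:
    trigger: str                        # which hook activates this rule
    check: Callable[[dict], bool]       # predicate over the pending event; True = safe
    enforce: Callable[[dict], dict]     # how to modify the event when the check fails

@dataclass
class EnforcementLayer:
    """Runtime layer that intercepts agent events before they execute."""
    rules: list = field(default_factory=list)

    def add_rule(self, rule: Rule) -> None:
        assert rule.trigger in HOOKS
        self.rules.append(rule)

    def intercept(self, hook: str, event: dict) -> dict:
        for rule in self.rules:
            if rule.trigger == hook and not rule.check(event):
                event = rule.enforce(event)  # modify behavior instead of running as-is
        return event

# Hypothetical rule: block shell commands containing a destructive pattern
layer = EnforcementLayer()
layer.add_rule(Rule(
    trigger="agent_action",
    check=lambda e: "rm -rf" not in e.get("command", ""),
    enforce=lambda e: {**e, "command": "", "blocked": True},
))

safe = layer.intercept("agent_action", {"command": "ls -la"})
risky = layer.intercept("agent_action", {"command": "rm -rf /tmp/data"})
print(safe)   # passes through unchanged
print(risky)  # command stripped and flagged as blocked
```

The point of the sketch is the interception pattern: the agent framework calls `intercept` at each decision point, so unsafe actions are rewritten or blocked without touching the agent's core planning logic.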


Credit where credit’s due: Inside Experian’s AI framework that’s changing financial access

While many enterprises are now racing to adopt and deploy AI, credit bureau giant Experian has taken a very measured approach. Experian has developed its own internal processes, frameworks and governance models that have helped it test out generative AI, deploy it at scale and have an impact. The company's journey has helped transform it from a traditional credit bureau into a sophisticated AI-powered platform company. Its approach—blending advanced machine learning (ML), agentic AI architectures and grassroots innovation—has improved business operations and expanded financial access to an estimated 26 million Americans.

Experian's AI journey contrasts sharply with that of companies that only began exploring machine learning after ChatGPT's emergence in 2022. The credit giant has been methodically developing AI capabilities for nearly two decades, creating a foundation that allowed it to capitalize rapidly on generative AI breakthroughs.

"AI has been part of the fabric at Experian way beyond when it was cool to be in AI," Shri Santhanam, EVP and GM, Software, Platforms and AI products at Experian, told VentureBeat in an exclusive interview. "We've used AI to unlock the power of our data to create a better impact for businesses and consumers for the past two decades."

From traditional machine learning to AI innovation engine

Before the modern gen AI era, Experian was already using and innovating with ML. Santhanam explained that instead of relying on basic, traditional statistical models, Experian pioneered the use of gradient-boosted decision trees alongside other machine learning techniques for credit underwriting. The company also developed explainable AI systems—crucial for regulatory compliance in financial services—that could articulate the reasoning behind automated lending decisions.
Most significantly, the Experian Innovation Lab (formerly Data Lab) experimented with language models and transformer networks well before ChatGPT's release. This early work positioned the company to quickly leverage generative AI advancements rather than starting from scratch.

"When the ChatGPT meteor hit, it was a fairly straightforward point of acceleration for us, because we understood the technology, had applications in mind, and we just stepped on the pedal," Santhanam explained.

This technology foundation enabled Experian to bypass the experimental phase that many enterprises are still navigating and move directly to production implementation. While other organizations were just beginning to understand what large language models (LLMs) could do, Experian was already deploying them within its existing AI framework, applying them to specific business problems it had previously identified.

Four pillars for enterprise AI transformation

When generative AI emerged, Experian didn't panic or pivot; it accelerated along a path already charted. The company organized its approach around four strategic pillars that offer technical leaders a comprehensive framework for AI adoption:

Product Enhancement: Experian examines existing customer-facing offerings to identify opportunities for AI-driven improvements and entirely new customer experiences. Rather than creating standalone AI features, Experian integrates generative capabilities into its core product suite.

Productivity Optimization: The second pillar addressed productivity optimization by implementing AI across engineering teams, customer service operations and internal innovation processes. This included providing AI coding assistance to developers and streamlining customer service operations.

Platform Development: The third pillar—perhaps most critical to Experian's success—centered on platform development.
Experian recognized early that many organizations would struggle to move beyond proof-of-concept implementations, so it invested in building platform infrastructure designed specifically for the responsible scaling of AI initiatives enterprise-wide.

Education and Empowerment: The fourth pillar addressed education, empowerment and communication—creating structured systems to drive innovation throughout the organization rather than limiting AI expertise to specialized teams.

This structured approach offers a blueprint for enterprises seeking to move beyond scattered AI experiments toward systematic implementation with measurable business impact.

Technical architecture: How Experian built a modular AI platform

For technical decision-makers, Experian's platform architecture demonstrates how to build enterprise AI systems that balance innovation with governance, flexibility and security. The company constructed a multi-layered technical stack with core design principles that prioritize adaptability.

"We avoid going through one-way doors," Santhanam explained. "If we're making choices on technology or frameworks, we want to ensure that for the most part… we make choices which we could pivot from if needed."

The architecture includes:

Model layer: Multiple large language model options, including OpenAI APIs through Azure, AWS Bedrock models (including Anthropic's Claude) and fine-tuned proprietary models.

Application layer: Service tooling and component libraries enabling engineers to build agentic architectures.

Security layer: An early partnership with Dynamo AI for security, policy governance and penetration testing specifically designed for AI systems.

Governance structure: A Global AI Risk Council with direct executive involvement.

This approach contrasts with enterprises that have committed to single-vendor solutions or proprietary models, giving Experian greater flexibility as AI capabilities continue to evolve.
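The "no one-way doors" principle at the model layer can be illustrated with a thin provider-agnostic wrapper. This is a simplified sketch under our own assumptions: the provider names mirror those mentioned above, but the interface, class names and stubbed responses are hypothetical, not Experian's code:

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """Minimal interface so the application layer never binds to one vendor."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class AzureOpenAIProvider(ModelProvider):
    def complete(self, prompt: str) -> str:
        # In production this would call the Azure OpenAI API; stubbed here.
        return f"[azure-openai] {prompt}"

class BedrockClaudeProvider(ModelProvider):
    def complete(self, prompt: str) -> str:
        # In production this would call AWS Bedrock (Anthropic Claude); stubbed here.
        return f"[bedrock-claude] {prompt}"

class ModelRouter:
    """Swappable registry: pivoting providers is a config change, not a rewrite."""
    def __init__(self):
        self._providers = {}
        self._default = None

    def register(self, name: str, provider: ModelProvider, default: bool = False):
        self._providers[name] = provider
        if default or self._default is None:
            self._default = name

    def complete(self, prompt: str, provider: str = None) -> str:
        return self._providers[provider or self._default].complete(prompt)

router = ModelRouter()
router.register("azure-openai", AzureOpenAIProvider(), default=True)
router.register("bedrock-claude", BedrockClaudeProvider())

print(router.complete("score this applicant"))                    # default provider
print(router.complete("score this applicant", "bedrock-claude"))  # pivot per call
```

Because callers depend only on the `ModelProvider` interface, swapping or adding a vendor touches the registry, not every application that consumes completions, which is the essence of avoiding a one-way door.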
The company is now seeing its architecture shift toward what Santhanam describes as "AI systems architected more as a mixture of experts and agents powered by more focused specialist or small language models."

Measurable impact: AI-driven financial inclusion at scale

Beyond architectural sophistication, Experian's AI implementation demonstrates concrete business and societal impact, particularly in addressing the challenge of "credit invisibles."

In the financial services industry, "credit invisibles" refers to the approximately 26 million Americans who lack sufficient credit history to generate a traditional credit score. These individuals, often younger consumers, recent immigrants or those from historically underserved communities, face significant barriers to accessing financial products despite potentially being creditworthy.

Traditional credit scoring models primarily rely on standard credit bureau data like loan payment history, credit card utilization and debt levels. Without this conventional history, lenders have historically viewed these consumers as high-risk or declined to serve them entirely. This creates a catch-22 in which people cannot build credit because they cannot access credit products in the first place.

Experian tackled this problem through four specific AI innovations:

Alternative data models: Machine learning systems incorporating non-traditional data sources (rental payments, utilities, telecom payments) into creditworthiness assessments, analyzing hundreds of variables rather than the limited factors in conventional models.

Explainable AI for compliance: Frameworks that maintain regulatory compliance by articulating


Observe launches VoiceAI agents to automate customer call centers with realistic, humanlike voices that don’t interrupt

Observe.AI has officially launched VoiceAI agents, a solution designed to automate routine customer interactions in contact centers. The latest addition to the company's AI-driven conversational intelligence platform, VoiceAI agents aim to improve customer experience while reducing operational costs.

With this release, Observe.AI is positioning itself as the only complete AI-powered platform that supports enterprises across the entire customer journey. The company's suite of solutions now includes enterprise-grade VoiceAI agents, real-time agent assist tools, AutoQA for quality monitoring, agent coaching and business insights.

Automating the routine

Observe.AI's VoiceAI agents are built to handle a wide range of customer service inquiries, from frequently asked questions to more complex, multi-step conversations. They are built atop a combination of in-house AI models and partnerships with major AI providers like OpenAI and Anthropic for large language models (LLMs).

"It's an ensemble of multiple smaller models," Swapnil Jain, CEO and co-founder of Observe.AI, told VentureBeat in a recent video call interview. "For example, we have a specific model for number detection, a specific model for entity detection, a model for turn detection, and so on."

The goal is to alleviate the burden on human agents, allowing them to focus on higher-value interactions. "Enterprises are saying, 'Do we really need human agents for these kinds of use cases?'" Jain said. Companies often receive calls for basic tasks like checking an account balance or resetting a password—interactions that AI can now handle efficiently. For customers, this means eliminating long hold times and avoiding frustrating IVR menus that require pressing multiple buttons or repeatedly requesting a human agent.
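One piece of such an ensemble, turn detection, can be illustrated with a toy heuristic: deciding whether a pause means the caller has finished speaking. This is a simplified sketch of the general signals involved (silence length plus linguistic completeness cues); the function, word lists and thresholds are made up for illustration, and production systems like Observe.AI's use trained models instead:

```python
# Cues that a sentence looks complete vs. cues the speaker is mid-thought
SENTENCE_FINAL = (".", "?", "!")
CONTINUATION_CUES = ("and", "but", "so", "because", "um", "uh")

def turn_is_over(transcript: str, silence_ms: int) -> bool:
    """Guess whether the caller is done, from the transcript and pause length."""
    text = transcript.strip().lower()
    if not text:
        return False
    words = text.rstrip(".?!").split()
    last_word = words[-1] if words else ""
    if last_word in CONTINUATION_CUES:
        # Trailing connector ("...and") suggests more is coming:
        # only a long pause ends the turn.
        return silence_ms > 2000
    if text.endswith(SENTENCE_FINAL):
        # Complete sentence: a short pause is enough to hand over the turn.
        return silence_ms > 400
    # Ambiguous ending: fall back to a medium pause.
    return silence_ms > 1200

print(turn_is_over("I want to check my balance.", 600))    # True
print(turn_is_over("I want to check my balance and", 600)) # False
```

The example shows why the problem is hard: the same 600 ms pause should end one turn but not the other, which is exactly the ambiguity Jain describes below in discussing when the AI agent can start processing.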
The voice AI space is becoming increasingly crowded, with options ranging from proprietary models like OpenAI's newly released GPT-4o-transcribe family and ElevenLabs to open-source solutions. So why would someone pick Observe.AI's agents over these? In a nutshell: specialization and ease of use. Instead of having to use raw voice AI models through providers' APIs and build custom integrations with a business, or custom voice apps, Observe.AI's platform is already built to essentially "plug and play" with existing workflows and operations. So while GPT-4o and other LLMs provide raw AI capabilities, Jain and Observe.AI's contention is that they don't offer a fully integrated solution for customer service workflows.

In addition, unlike traditional voice AI assistants, Observe.AI's VoiceAI agents are specifically designed for contact centers. The system combines various AI technologies, including:

Automatic speech recognition (ASR): Converts spoken language into text in real time.

Text-to-speech (TTS): Delivers responses in a human-like voice.

Proprietary AI models: Specialized for handling numbers, turn-taking and interruptions—critical in customer service settings.

Jain noted that one of the key challenges AI agents face is knowing when a customer has actually finished speaking. "When do you know that the AI agent can start processing and the customer has stopped speaking?" he asked. "Sometimes I'm taking pauses because my sentence is over and I'm starting a new one. Sometimes I just stop speaking. How do you know the difference?" Observe.AI has developed custom in-house models that handle these nuances, ensuring smoother conversations between AI and customers.

Deploys fast while integrating deeply with enterprise product support and tracking systems

One of Observe.AI's key advantages is its ability to integrate seamlessly with existing enterprise systems.
Over time, the company has developed pre-built integrations with more than 250 platforms, including leading telephony, CRM and workforce management tools such as Salesforce, Zendesk and ServiceNow. This approach allows businesses to implement VoiceAI agents quickly. While AI deployments can sometimes take months, Observe.AI claims that its VoiceAI agents can go live in as little as one week, with minimal setup costs.

"It's not a professional services model where we take six months to customize something for you," Jain said. "We come in, take two weeks to configure the product, and it works."

Security and compliance at the forefront

Given the sensitivity of customer interactions, Observe.AI has built its solution with enterprise-grade security. The company's compliance credentials include GDPR, HIPAA, HITRUST, SOC 2 and ISO 27001. While voice biometrics have been used for authentication in the past, Jain said that Observe.AI does not rely on them due to security concerns. Instead, the system follows traditional authentication methods, such as verifying Social Security numbers or account details. Additionally, Observe.AI offers data redaction capabilities to remove personally identifiable information (PII) before storage, and customers can opt for private instances to keep their data isolated.

"In today's world, you cannot rely on individual speech patterns for authentication," Jain said. "We work with businesses to configure the same security rules they use for their human agents into our AI agents."

Saving $$$ through automation

Observe.AI's pricing model is based on completed tasks rather than per-minute usage. The cost depends on the complexity of the interaction, with simpler tasks (such as routing a call) priced lower than more involved ones (such as processing an insurance claim). According to Jain, businesses can expect to save between 70% and 80% on customer service costs compared to using human agents.
Early enterprise success stories

Companies using VoiceAI agents are already seeing significant improvements. Emmanual Noyola, Director of Patient Services at Affordable Care, highlighted the impact on his team: "Beth, our VoiceAI agent, handles multiple intents with a 95% containment rate so our customer care team can focus on more complex cases."

By analyzing every conversation, Observe.AI's platform continuously refines AI agent performance, ensuring accuracy and compliance. Businesses can also use AutoQA to evaluate both AI and human agents, identifying areas for improvement.

One of the key challenges in AI-driven customer service is maintaining accuracy while preventing unintended responses. Jain acknowledged these concerns, referencing past AI missteps in customer service automation. "The core thesis behind making these enterprise-grade is having a very high bar on the confidence of the response," he said. "If our response confidence is less than a certain threshold, it's better for the AI agent to not even engage."

Blending AI automation with

Observe launches VoiceAI agents to automate customer call centers with realistic, humanlike voices that don’t interrupt

The open source Model Context Protocol was just updated — here’s why it’s a big deal

The Model Context Protocol (MCP)—a rising open standard designed to help AI agents interact seamlessly with tools, data and interfaces—just hit a significant milestone. Today, developers behind the initiative finalized an updated version of the MCP spec, introducing key upgrades to make AI agents more secure, capable and interoperable. In a significant move, OpenAI, the industry leader in generative AI, followed the announcement today by saying it is also adding support for MCP across its products. CEO Sam Altman said the support is available today in OpenAI’s Agents SDK and that support for ChatGPT’s desktop app and the Responses API would be coming soon. Microsoft announced support for MCP alongside this release, including launching a new Playwright-MCP server that allows AI agents like Claude to browse the web and interact with sites using the Chrome accessibility tree. “This new version is a major leap forward for agent-tool communication,” Alex Albert, a key contributor to the MCP project, said in a post on X. “And having Microsoft building real-world infrastructure on top of it shows how quickly this ecosystem is evolving.” What’s new in the updated MCP version? The March 26 update brings several important protocol-level changes: OAuth 2.1-Based Authorization Framework: Adds a robust standard for securing agent-server communication, especially in HTTP-based transports. Streamable HTTP Transport: Replaces the older HTTP+SSE setup, enabling real-time, bidirectional data flow with better compatibility. JSON-RPC Batching: Allows clients to send multiple requests in one go, improving efficiency and reducing latency in agent-tool interactions. Tool Annotations: Adds rich metadata for describing tool behavior, enabling richer discovery and reasoning by AI agents.
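JSON-RPC batching is easy to picture: a batch is simply an array of standard JSON-RPC 2.0 request objects sent in a single payload, so a client pays one round trip for several calls. A minimal sketch (`tools/list` and `tools/call` are real MCP method names; the specific tool and arguments shown are illustrative):

```python
import json

def make_request(req_id, method, params=None):
    """Build a single JSON-RPC 2.0 request object."""
    req = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        req["params"] = params
    return req

# A batch is just a JSON array of request objects; the server may return
# the responses in any order, matched back to requests by "id".
batch = [
    make_request(1, "tools/list"),
    make_request(2, "tools/call", {
        "name": "browser_navigate",
        "arguments": {"url": "https://datasette.io"},
    }),
]

payload = json.dumps(batch)
print(payload)
```

Without batching, each of these calls would be its own HTTP exchange; with it, an agent can list tools and invoke one in a single round trip.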
Figure 1: Claude Desktop using Playwright-MCP to navigate and describe datasette.io, demonstrating web automation powered by the Model Context Protocol. The protocol uses a modular JSON-RPC 2.0 base, with a layered architecture separating core transport, lifecycle management, server features (like resources and prompts) and client features (like sampling or logging). Developers can pick and choose which components to implement, depending on their use case. Microsoft’s contribution: Browser automation via MCP Two days ago, Microsoft released Playwright-MCP, a server that wraps its powerful browser automation tool in the MCP standard. This means AI agents like Claude can now do more than talk—they can click, type, browse, and interact with the web like real users. Built on the Chrome accessibility tree, the integration allows Claude to access and describe page contents in a human-readable form. The available toolset includes: Navigation: browser_navigate, go_back, go_forward Input: browser_type, browser_click, browser_press_key Snapshots: browser_snapshot, browser_screenshot Element-based interactions using accessibility descriptors This turns any compliant AI agent into a test automation bot, QA assistant or data navigator. As Altman posted on X: “people love MCP and we are excited to add support across our products. available today in the agents SDK and support for chatgpt desktop app + responses api coming soon!” Setup is easy: users simply add Playwright as a command in claude_desktop_config.json, and the Claude Desktop app will recognize the tools at runtime. The bigger picture: Interoperability at scale Figure 2: The modular design of MCP enables developers to implement only the layers they need, while maintaining compatibility.
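The claude_desktop_config.json setup mentioned above boils down to a single server entry. A typical fragment looks like the following (the `npx @playwright/mcp` invocation follows Microsoft's published package; treat the exact fields as an assumption and check the Playwright-MCP README for the current syntax):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

On restart, Claude Desktop launches the command, speaks MCP to it over stdio, and exposes the browser tools listed above to the model.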
Anthropic first introduced MCP in late 2024 to solve a growing pain point: AI agents need to interact with real-world tools, but every app speaks a different “language.” MCP aims to fix that by providing a standard protocol for describing and using tools across ecosystems. With backing from Anthropic, LangChain and now Microsoft, MCP is emerging as a serious contender for becoming the standard layer of agent interconnectivity. Because MCP was launched by Anthropic, questions lingered over whether Anthropic’s largest competitor, OpenAI, would support the protocol. And of course, Microsoft, a big ally of OpenAI, was another question mark. The fact that both players have supported the protocol shows momentum is building among enterprise and open-source communities. OpenAI itself has been opening its ecosystem around agents, including with its latest Agents SDK announced a week ago — and the move has solidified support around OpenAI’s API formats becoming a standard, given that others like Anthropic and Google have fallen in line. So with OpenAI’s API formats and MCP both seeing support, standardization has seen a big win over the past few weeks. “We’re entering the protocol era of AI,” tweeted Alexander Doria, the co-founder of AI startup Pleias. “This is how agents will actually do things.” What’s next? With the updated spec and Microsoft’s tangible support, the groundwork is being laid for a new generation of agents that can think and act securely and flexibly across the stack. Figure 3: OAuth 2.1 Authorization Flow in Model Context Protocol (MCP) The big question now is: Will others follow? If Meta, Amazon, or Apple sign on, MCP could soon become the universal “language” of AI actions. For now, it’s a big day for the agent ecosystem—one that brings the promise of AI interoperability closer to reality.


METASCALE improves LLM reasoning with adaptive strategies

A new framework called METASCALE enables large language models (LLMs) to dynamically adapt their reasoning mode at inference time. The framework addresses one of LLMs’ key shortcomings: using the same reasoning strategy for all types of problems. Introduced in a paper by researchers at the University of California, Davis, the University of Southern California and Microsoft Research, METASCALE uses “meta-thoughts”—adaptive thinking strategies tailored to each task—to improve LLM performance and generalization across various tasks. This approach can offer enterprises a way to enhance the accuracy and efficiency of their LLM applications without changing models or engaging in expensive fine-tuning efforts. The limitations of fixed reasoning strategies One of the main challenges of LLM applications is their fixed and inflexible reasoning behavior. Unlike humans, who can consciously choose different approaches to solve problems, LLMs often rely on pattern matching from their training data, which may not always align with the sound reasoning principles that humans use. Current methods for adjusting the reasoning process of LLMs, such as chain-of-thought (CoT) prompting, self-verification and reverse thinking, are often designed for specific tasks, limiting their adaptability and effectiveness across diverse scenarios. The researchers point out that “these approaches impose fixed thinking structures rather than enabling LLMs to adaptively determine the most effective task-specific strategy, potentially limiting their performance.” To address this limitation, the researchers propose the concept of “meta-thinking.” This process allows LLMs to reflect on their approach before generating a response.
Meta-thoughts guide the reasoning process through two components inspired by human cognition: Cognitive mindset: The perspective, expertise, or role the model adopts to approach the task. Problem-solving strategy: A structured pattern used to formulate a solution for the task based on the chosen mindset. Instead of directly tackling a problem, the LLM first determines how to think, selecting the most appropriate cognitive strategy. For example, when faced with a complex software problem, the LLM might first think about the kind of professional who would solve it (e.g., a software engineer) and choose a strategy to approach the problem (e.g., using design patterns to break down the problem or using a micro-services approach to simplify the deployment).  “By incorporating this meta-thinking step, LLMs can dynamically adapt their reasoning process to different tasks, rather than relying on rigid, predefined heuristics,” the researchers write. Building upon meta-thoughts, the researchers introduce METASCALE, a test-time framework that can be applied to any model through prompt engineering.  “The goal is to enable LLMs to explore different thinking strategies, and generate the most effective response for a given input,” they state. METASCALE operates in three phases: Initialization: METASCALE generates a diverse pool of reasoning strategies based on the input prompt. It does this by prompting the LLM to self-compose strategies and leveraging instruction-tuning datasets containing reasoning templates for different types of problems. This combination creates a rich initial pool of meta-thoughts. Selection: A Multi-Armed Bandit (MAB) algorithm selects the most promising meta-thought for each iteration. MAB is a problem framework where an agent must repeatedly choose between multiple options, or “arms,” each with unknown reward distributions. 
The core challenge lies in balancing “exploration” (e.g., trying different reasoning strategies) and “exploitation” (consistently selecting the reasoning strategy that previously provided the best responses). In METASCALE, each meta-thought is treated as an arm, and the goal is to maximize the reward (response quality) based on the selected meta-thought. Evolution: A genetic algorithm refines and expands the pool of cognitive strategies iteratively. METASCALE uses high-performing meta-thoughts as “parents” to produce new “child” meta-thoughts. The LLM is prompted to develop refined meta-thoughts that integrate and improve upon the selected parents. To remain efficient, METASCALE operates within a fixed sampling budget when generating meta-thoughts.  The researchers evaluated METASCALE on mathematical reasoning benchmarks (GSM8K), knowledge and language understanding (MMLU-Pro), and Arena-Hard, comparing it to four baseline inference methods: direct responses (single-pass inference), CoT, Best-of-N (sampling multiple responses and choosing the best one), and Best-of-N with CoT. They used GPT-4o and Llama-3.1-8B-Instruct as the backbone models for their experiments. The results show that METASCALE significantly enhances LLM problem-solving capabilities across diverse tasks, consistently outperforming baseline methods. METASCALE achieved equal or superior performance compared to all baselines, regardless of whether they used CoT prompting. Notably, GPT-4o with METASCALE outperformed o1-mini under style control. “These results demonstrate that integrating meta-thoughts enables LLMs to scale more effectively during test time as the number of samples increases,” the researchers state. As the number of candidate solutions increased, METASCALE showed significantly higher gains than other baselines, indicating that it is a more effective scaling strategy. 
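The selection phase described above can be pictured as a classic UCB-style bandit over candidate meta-thoughts: each strategy is an arm, each sampled response yields a reward, and the exploration bonus keeps untried strategies in play. A toy sketch of that loop (the strategies, reward simulation, and UCB1 constant are invented for illustration, not the paper's actual reward model):

```python
import math
import random

def ucb_select(counts, rewards, t, c=1.4):
    """UCB1: pick the arm maximizing average reward plus an exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every meta-thought at least once
    return max(
        range(len(counts)),
        key=lambda i: rewards[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i]),
    )

# Hypothetical pool of meta-thoughts (cognitive mindset + strategy pairs).
strategies = ["step-by-step derivation", "software-engineer persona", "work backwards"]
counts = [0] * len(strategies)
rewards = [0.0] * len(strategies)

random.seed(0)
true_quality = [0.3, 0.8, 0.5]  # hidden response quality per strategy (simulated)

for t in range(1, 101):  # fixed sampling budget, as in METASCALE
    arm = ucb_select(counts, rewards, t)
    # Stand-in for scoring an LLM response generated under this meta-thought.
    reward = random.random() < true_quality[arm]
    counts[arm] += 1
    rewards[arm] += reward

best = max(range(len(strategies)), key=lambda i: counts[i])
print(strategies[best])
```

The budget cap mirrors the paper's efficiency constraint: exploration is paid for out of a fixed number of sampled responses, not an open-ended search.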
Implications for the enterprise As a test-time technique, METASCALE can help enterprises improve the quality of LLM reasoning through smart prompt engineering, without the need to fine-tune or switch models. It also doesn’t require building complex software scaffolding on top of models, as the logic is provided entirely by the LLM itself. By dynamically adjusting the reasoning strategies of LLMs, METASCALE is also practical for real-world applications that handle various reasoning tasks. And because it is a black-box method, it can be applied to open-source models running on the enterprise cloud as well as closed models running behind third-party APIs. It also demonstrates the promise of test-time scaling techniques for reasoning tasks.


ChatGPT gets smarter: OpenAI adds internal data referencing

OpenAI has finally added a long-requested feature for ChatGPT users: the ability to reference internal knowledge sources. Users on ChatGPT Team, one of the company’s paid tiers, can connect internal knowledge databases directly to the platform during this beta period, bringing in company-specific information, a feature many enterprises say yields better responses to their questions. This lets users perform semantic searches of their data, link directly to internal sources in responses, get the most relevant and up-to-date context, and ensure that ChatGPT understands internal company lingo. Right now, ChatGPT Team admins can connect Google Drive to ChatGPT. However, Nate Gonzales, a product manager at OpenAI, said in a LinkedIn post that the team “is already working on the next wave of connectors, aiming to support all the key internal knowledge sources your team relies on today.” These could include data analytics platforms and CRMs. “One of my favorite parts: over time, the model learns your org’s unique language—project names and acronyms, and team-specific terms—while respecting your user permissions so responses are grounded in the right context. (We love our codenames at OpenAI),” Gonzales said. Internal documents lead to better institutional knowledge By connecting internal knowledge bases, ChatGPT Team could become even more valuable to users who already ask the platform strategy questions or for analysis. Querying company and domain-specific data gives users more context for their conversations and makes AI chatbots more useful. Unsurprisingly, many companies with AI platforms, chatbots, agents, or applications tout their proprietary internal knowledge graphs as a differentiator. This is also why enterprise search is a rising area of enterprise AI. Companies like Glean offer a way to use AI to find information throughout companies.
ServiceNow acquired Moveworks in a bid to boost its enterprise search capabilities. OpenAI already lets people upload documents directly from Google Drive or Microsoft’s OneDrive. Google brought the power of Gemini to its Workspace product, meaning users could ask the model questions about their work while in a file. Perplexity added the capability to use internal documents as data sources. Control and customization OpenAI said controls around the data sources will look different for some users. While only admins can add data connectors, users on smaller teams can configure when ChatGPT taps into internal knowledge bases and which drives it can access. On larger teams, however, an administrator must decide which shared Google Drives can be accessed. OpenAI said that ChatGPT knows when to access connected data sources for many common prompts. Users can still select “Internal Knowledge” in the message composer. The company said ChatGPT “fully respects existing organization settings and permissions,” so users who do not have access to specific drives or documents cannot force ChatGPT to read those.


Groq and PlayAI just made voice AI sound way more human

Groq and PlayAI announced a partnership today to bring Dialog, an advanced text-to-speech model, to market through Groq’s high-speed inference platform. The partnership combines PlayAI’s expertise in voice AI with Groq’s specialized processing infrastructure, creating what the companies claim is one of the most natural-sounding and responsive text-to-speech systems available. “Groq provides a complete, low latency system for automatic speech recognition (ASR), GenAI, and text-to-speech, all in one place,” said Ian Andrews, Chief Revenue Officer at Groq, in an exclusive interview with VentureBeat. “With Dialog now running on GroqCloud, this means customers won’t have to use multiple providers for a single use case — Groq is a one stop solution.” Groq powers first Arabic voice AI, expanding Middle East tech presence Dialog is notable for being available in both English and Arabic, with the Arabic version representing the first voice AI specifically designed for the Middle East region. The inclusion of Arabic as one of the initial offerings was strategic for both companies. “Arabic is the fourth most spoken language globally — by partnering with PlayAI to offer an Arabic TTS model, Groq is unlocking a key global market and enabling broader access to fast AI inference,” Andrews told VentureBeat. The companies claim their solution addresses key shortcomings in existing voice AI technologies, particularly around natural speech patterns and response speed. According to benchmark testing conducted by third-party evaluator Podonos, Dialog was preferred by users at a rate of 10:1 versus ElevenLabs v2.5 Turbo and over 3:1 against ElevenLabs Multilingual v2.0. Innovative ‘adaptive speech contextualizer’ transforms conversational AI What sets Dialog apart is its sophisticated approach to context.
Rather than treating each vocalization as an isolated event, the system maintains awareness of the entire conversation flow. “We built a novel architecture that we call an ‘adaptive speech contextualizer‘ (ASC), which allows the model to use the full context and history of a conversation,” said Mahmoud Felfel, co-founder and CEO of PlayAI, in an interview with VentureBeat. “This means that every response isn’t just a standalone output; it’s enriched with appropriate prosody, tone, and emotion that reflect the flow of the conversation.” For enterprises looking to implement conversational AI, latency — the delay between request and response — has been a persistent challenge. Groq’s specialized Language Processing Units (LPUs) appear to provide a significant advantage in this area. “Based on initial internal testing, Groq is delivering up to 140 characters per second on PlayAI’s Dialog model, a significant boost compared to the same model running on GPUs at 86 characters per second,” explained Andrews. “That means that Dialog generates text up to 10 times faster than real-time.” Groq secures $1.5 billion Saudi investment to build world-class AI infrastructure The partnership comes at a time of significant expansion for Groq, which recently secured a $1.5 billion commitment from Saudi Arabia to fund additional infrastructure. The company has established a data center in Dammam, which it describes as “the region’s largest inference cluster.” “Partnering with Groq was a no-brainer; they’re the industry leader in advanced AI inference infrastructure,” said Felfel. “With TTS and agents, low latency is key. We’ve already optimized Dialog for these real-time applications, but partnering with Groq allows us to deliver the lowest latency voice model on the market.” The voice AI market has seen rapid growth as businesses look to automate customer interactions while maintaining a natural, human-like experience. 
Applications range from customer service and sales automation to voice-overs and accessibility features for the visually impaired. Enterprise applications extend beyond traditional customer service use cases “Beyond customer service, other enterprise use cases include automating sales and appointment scheduling, on-boarding and personal assistants, creating voice overs to existing content, translating English audio and video content into Arabic, increasing website and static content accessibility for the visually impaired, and more,” Andrews said. For PlayAI, which was founded by entrepreneurs from the Middle East and North Africa region, the inclusion of Arabic language capabilities was particularly meaningful. “As MENA founders, we know the region is heavily investing in AI capabilities and infrastructure as inflected in investments like Groq, but also world-leading adoption,” said Felfel. “Arabic is a global business language and one that we grew up speaking, so it was a natural choice as one of our core languages.” The companies have made the Dialog technology available through GroqCloud’s tiered service model, which includes both free and paid options. This approach allows developers to experiment with the technology before committing to larger implementations. “GroqCloud offers both free and paid plans. Anyone can create an account and create an API code for free,” Andrews explained. “Our paid Developer Tier is self-serve, meaning anyone with a credit card can sign up themselves.” As voice becomes an increasingly important interface for AI systems, this partnership positions both companies to capitalize on the growing demand for more natural and responsive conversational experiences. By addressing the technical challenges of latency and natural speech patterns, Groq and PlayAI may have removed significant barriers to wider adoption of voice AI in enterprise settings.


‘Studio Ghibli’ AI image trend overwhelms OpenAI’s new GPT-4o feature, delaying free tier

If you’ve been on the internet — or, at least, on the social network X — in the last day or so, you’ve likely come across colorful, smooth anime-style images of famous photographs rendered in the style of the Japanese studio, Studio Ghibli (the one that made Princess Mononoke, The Boy and the Heron, and My Neighbor Totoro, among many other classic animated films). In fact, some users are complaining because their feeds seem to be filled with nearly exclusively these types of images. Whether it’s current President Trump, the iconic image of the “Tank Man” during the 1989 pro-democracy Tiananmen Square protests, Osama Bin Laden, Jeffrey Epstein, or even other pop culture moments and characters like Sam Rockwell’s iconic cameo on The White Lotus and many popular memes of yore, people have been making and sharing these images at a rapid clip. Powered by the new GPT-4o model’s native image gen Much of that is thanks to OpenAI’s new update to the GPT-4o model behind ChatGPT for Pro, Plus, and Team subscription tiers, which turns on “native image generation.” While ChatGPT previously allowed users to create images from text prompts, it did so by routing them to another, separate OpenAI model, DALL-E 3. But OpenAI’s GPT-4o model is so named with an “o” because it is an “omni” model — the company trained it not only on text and code, but also on imagery and presumably, video and audio as well, allowing it to understand all these forms of media and their similarities and differences, conceive of ideas across them (an “apple” is not just a word, but also something that can be drawn as a red or yellow or green fruit), and accurately produce said media given text prompts by a user without connecting to any external models.
As a consequence, like rival Google AI Studio’s recent update to include a Gemini 2.0 Flash experimental image creation model, the new OpenAI GPT-4o can also accept image uploads of any pre-existing image in your camera roll or that you’ve screenshotted or saved off the web. How to use ChatGPT to make Studio Ghibli-style images (and change or transfer any image into any style!) First, navigate to Chat.com or ChatGPT.com and ensure you’re logged in with your ChatGPT Plus, Pro, or Team account and that the AI model selector (located in the left corner of the session window) is showing “GPT-4o” as the chosen model (you can click it to drop down and select the proper model between the available options). Once you do that, upload an image to ChatGPT using the “+” button in the lower left-hand corner of the prompt entry text box, then ask the new GPT-4o image creation model to render your pre-existing image in a new style. If you want, you can try it by uploading a photo of yourself and friends and typing “make all these people in the style of a Studio Ghibli animation.” And after a few seconds, it will do so with some pretty convincing and amusing results. It even supports attaching multiple images and combining them into a single piece. ChatGPT free tier usage delayed OpenAI initially said it would also enable this feature for free (non-paying) users of ChatGPT, but unfortunately for them, co-founder and CEO Sam Altman today posted that the feature will be delayed due to the overwhelming demand from existing paying subscribers to ChatGPT Plus, Pro, and Team tiers. As he wrote on X: “images in chatgpt are wayyyy more popular than we expected (and we had pretty high expectations). rollout to our free tier is unfortunately going to be delayed for awhile.” Meanwhile, those who do have access will likely continue cranking out image edits in this and other recognizable or novel styles. Of course, not everyone is a fan of OpenAI’s work here.
In fact, Studio Ghibli co-founder Hayao Miyazaki himself appeared in a documentary back in 2016 — and one of the most memorable moments from it, still referenced to this day, is him reacting with overwhelming disgust and revulsion to an early demonstration of AI-powered animation and physics. As with many generative AI products and services, OpenAI’s training data for this new image generation capability remains under wraps, but it is widely speculated to contain copyrighted material — and while imitating a style is generally not considered copyright infringement in the U.S., it is rubbing some fans of the original animation the wrong way. For now, brands and enterprises looking to play with this style should do so with caution and after serious consideration, given the possible negative blowback among some users. But for those who are unabashedly pro-AI or have more forgiving and fun-loving fanbases, it’s clear that OpenAI has yet another hit on its hands.
