VentureBeat

Nvidia’s GTC 2025 keynote: 40x AI performance leap, open-source ‘Dynamo’, and a walking Star Wars-inspired ‘Blue’ robot

Nvidia CEO Jensen Huang took to the stage at the SAP Center on Tuesday morning, leather jacket intact and without a teleprompter, to deliver what has become one of the most anticipated keynotes in the technology industry. The GPU Technology Conference (GTC) 2025, self-described by Huang as the “Super Bowl of AI,” arrives at a critical juncture for Nvidia and the broader artificial intelligence sector. “What an amazing year it was, and we have a lot of incredible things to talk about,” Huang told the packed arena, addressing an audience that has grown exponentially as AI has transformed from a niche technology into a fundamental force reshaping entire industries.

The stakes were particularly high this year following market turbulence triggered by Chinese startup DeepSeek’s release of its highly efficient R1 reasoning model, which sent Nvidia’s stock tumbling earlier this year amid concerns about potential reduced demand for its expensive GPUs. Against this backdrop, Huang delivered a comprehensive vision of Nvidia’s future, emphasizing a clear roadmap for data center computing, advancements in AI reasoning capabilities, and bold moves into robotics and autonomous vehicles. The presentation painted a picture of a company working to maintain its dominant position in AI infrastructure while expanding into new territories where its technology can create value.

Nvidia’s stock traded down throughout the presentation, closing more than 3% lower for the day, suggesting investors may have hoped for even more dramatic announcements. But if Huang’s message was clear, it was this: AI isn’t slowing down, and neither is Nvidia. From groundbreaking chips to a push into physical AI, here are the five most important takeaways from GTC 2025.
Blackwell platform ramps up production with 40x performance gain over Hopper

The centerpiece of Nvidia’s AI computing strategy, the Blackwell platform, is now in “full production,” according to Huang, who emphasized that “customer demand is incredible.” This is a significant milestone after what Huang had previously described as a “hiccup” in early production. Huang made a striking comparison between Blackwell and its predecessor, Hopper: “Blackwell NVLink 72 with Dynamo is 40 times the AI factory performance of Hopper.” This performance leap is particularly crucial for inference workloads, which Huang positioned as “one of the most important workloads in the next decade as we scale out AI.”

The performance gains come at a critical time for the industry, as reasoning AI models like DeepSeek’s R1 require substantially more computation than traditional large language models. Huang illustrated this with a demonstration comparing a traditional LLM’s approach to a wedding seating arrangement (439 tokens, but wrong) versus a reasoning model’s approach (nearly 9,000 tokens, but correct). “The amount of computation we have to do in AI is so much greater as a result of reasoning AI and the training of reasoning AI systems and agentic systems,” Huang explained, directly addressing the challenge posed by more efficient models like DeepSeek’s. Rather than positioning efficient models as a threat to Nvidia’s business model, Huang framed them as driving increased demand for computation — effectively turning a potential weakness into a strength.

Next-generation Rubin architecture unveiled with clear multi-year roadmap

In a move clearly designed to give enterprise customers and cloud providers confidence in Nvidia’s long-term trajectory, Huang laid out a detailed roadmap for AI computing infrastructure through 2027. This is an unusual level of transparency about future products for a hardware company, but it reflects the long planning cycles required for AI infrastructure.
“We have an annual rhythm of roadmaps that has been laid out for you so that you could plan your AI infrastructure,” Huang stated, emphasizing the importance of predictability for customers making massive capital investments.

The roadmap includes Blackwell Ultra, coming in the second half of 2025 and offering 1.5 times more AI performance than the current Blackwell chips. This will be followed by Vera Rubin, named after the astronomer whose measurements provided key evidence for dark matter, in the second half of 2026. Rubin will feature a new CPU that’s twice as fast as the current Grace CPU, along with new networking architecture and memory systems. “Basically everything is brand new, except for the chassis,” Huang explained about the Vera Rubin platform.

The roadmap extends even further to Rubin Ultra in the second half of 2027, which Huang described as an “extreme scale up” offering 14 times more computational power than current systems. “You can see that Rubin is going to drive the cost down tremendously,” he noted, addressing concerns about the economics of AI infrastructure. This detailed roadmap serves as Nvidia’s answer to market concerns about competition and the sustainability of AI investments, effectively telling customers and investors that the company has a clear path forward regardless of how AI model efficiency evolves.

Nvidia Dynamo emerges as the ‘operating system’ for AI factories

One of the most significant announcements was Nvidia Dynamo, an open-source software system designed to optimize AI inference. Huang described it as “essentially the operating system of an AI factory,” drawing a parallel to how traditional data centers rely on operating systems like VMware to orchestrate enterprise applications. Dynamo addresses the complex challenge of managing AI workloads across distributed GPU systems, handling tasks like pipeline parallelism, tensor parallelism, expert parallelism, in-flight batching, disaggregated inferencing, and workload management.
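One of those tasks, disaggregated inferencing, splits the compute-bound prompt-processing (prefill) stage from the memory-bandwidth-bound token-generation (decode) stage and runs them on separate GPU pools. The scheduling idea can be sketched as a toy in Python. This is purely illustrative: all names are hypothetical, and it is not Dynamo’s actual API.

```python
# Toy sketch of disaggregated inference scheduling: separate pools for
# prefill (compute-bound) and decode (memory-bandwidth-bound) stages.
# Illustrative only; not Nvidia Dynamo's implementation or API.

from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    decoded: list = field(default_factory=list)

class DisaggregatedScheduler:
    def __init__(self, prefill_gpus: int, decode_gpus: int):
        self.prefill_gpus = prefill_gpus    # slots for the prefill pool
        self.decode_gpus = decode_gpus      # slots for the decode pool
        self.prefill_queue = deque()
        self.decode_queue = deque()

    def submit(self, req: Request):
        self.prefill_queue.append(req)

    def step(self):
        # Run up to one prefill per prefill-pool slot; in a real system the
        # KV cache would be handed off to the decode pool here.
        for _ in range(min(self.prefill_gpus, len(self.prefill_queue))):
            self.decode_queue.append(self.prefill_queue.popleft())
        # Decode one token per decode-pool slot; requests stay in flight.
        for _ in range(min(self.decode_gpus, len(self.decode_queue))):
            req = self.decode_queue.popleft()
            req.decoded.append("<tok>")
            self.decode_queue.append(req)

sched = DisaggregatedScheduler(prefill_gpus=2, decode_gpus=4)
for p in ["a", "b", "c"]:
    sched.submit(Request(prompt=p))
sched.step()
print(len(sched.decode_queue))  # 2 requests moved past prefill this step
```

The point of the split is that the two stages have different bottlenecks, so pooling them separately keeps both kinds of hardware busy.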
These technical challenges have become increasingly important as AI models grow more complex and reasoning-based approaches require more computation. The system gets its name from the dynamo, which Huang noted was “the first instrument that started the last Industrial Revolution, the industrial revolution of energy.” The comparison positions Dynamo as a foundational technology for the AI revolution. By making Dynamo open source, Nvidia is attempting to strengthen its ecosystem and ensure its hardware remains the preferred platform for AI workloads, even as software optimization becomes increasingly important for performance and efficiency. Partners including Perplexity are already working with Nvidia on Dynamo implementation. “We’re so happy that so many of our partners are working with us on


Adobe’s new AI agents can make personal websites for your customers

Adobe first made its mark in generative AI with its Firefly image generation model in 2023 and its generative fill feature in Photoshop. With enterprise customers turning their attention from exploring AI-powered creation tools to agents, Adobe is throwing its hat into the agentic ring and adding more personalization features to everyday customer experience tasks.

Adobe announced the launch of 10 agents and an orchestration tool on its Adobe Experience Platform (AEP). These tools target specific needs such as customer channel engagement, content production, data management and site optimization. The company also debuted Brand Concierge, a way for organizations to personalize their websites for customers based on their previous interactions with the brand.

Loni Stark, vice president of strategy and product for Adobe, told VentureBeat in an interview that agents will change the customer experience for both enterprises and their clients. “We see that agents can scale up the capacity of experience makers. It’s not just because of the hype out there, but because when we have delivered our tools to the customers we work with, we see that as their trust in the AI capabilities we deliver increases, they start to think, oh, can I make them autonomous,” Stark said. She added that the idea is to let these agents work ambiently, meaning the agents and the orchestrator continue to work in the background to provide information or solve issues for enterprises proactively.
Orchestration and agents for customer experience

The new agents launching on AEP are:

- Account qualification agent, which evaluates new sales pipelines
- Audience agent, which analyzes cross-channel engagement data
- Content production agent, which helps marketers and creatives scale by generating and assembling content
- Data insights agent, which simplifies and expands the process of deriving insights from signals
- Data engineering agent
- Experimentation agent, which helps stimulate new ideas and conduct impact analysis
- Journey agent, which can orchestrate cross-channel experiences
- Product advisor agent, which recommends experience and product engagement experiments
- Site optimization agent, which manages and detects traffic and engagement issues on a website
- Workflow optimization agent, for cross-team collaboration and monitoring ongoing projects

Stark highlighted the Site Optimization agent during a demo with VentureBeat. The agent can check for broken links or proactively examine a brand’s website for traffic and bounce rates and suggest fixes. “Most companies don’t have people that spend all of their days looking at broken links, for example, especially if they have tens of thousands of pages, or can’t check on these daily,” Stark said. “What’s happening is that there’s lost opportunity both if you think about the bounce rate. This agent is pre-trained, so out of the box, it already comes with skills like looking for broken backlinks.”

Stark said enterprises using the Experience Platform can fine-tune how much agents access their data through the orchestrator. Adobe joins companies like Salesforce and ServiceNow in providing users with pre-built agents for specific tasks and teams.

A customized brand website

Another new feature for the Adobe Experience Platform is Brand Concierge, which will help enterprises build websites that offer customized customer visits.
Organizations can create a website for their company or product that greets customers by name and provides a query box asking them what information they want. Say a company has a website for a hotel chain. A customer can ask the chat function, or click on premade prompts, to ask about amenities specific to one location; Brand Concierge helps the company push the appropriate information to the front page of the site and also customize all other assets and experiences to that location. Stark said customers can still browse the site as usual, but Brand Concierge pushes customer engagement further by remembering how particular customers have interacted with the enterprise before.

Brand Concierge is a separate offering from the agents that sits on top of the AEP, but Stark said, “It’ll leverage agents such as the Product Advisor Agent, which is already built into the Concierge app.” The offering also draws on a customer’s past interactions and preferences. Stark said Adobe customers increasingly find their clients more comfortable using AI chatbots, making it easier to transition them to more personalized, prompt-based website experiences.

“I think what we’re seeing is that consumers are increasingly comfortable with an AI-powered conversational experience. New Adobe Analytics data shows a 1,200% surge in traffic to U.S. retail sites and a 1,700% surge for U.S. travel sites from generative AI sources (July 2024 to Feb 2025). Companies can surface this on high-traffic properties (like their website) with an increasingly familiar form factor that is gaining traction,” Stark said.

The company launched the Adobe Experience Platform in 2019, but the real-time customer experience management solution saw a massive update last year, including an AI assistant for users.


Inside Zoom’s AI evolution: From basic meeting tools to agentic productivity platform powered by LLMs and SLMs

Zoom became a household name during the pandemic as remote work became the norm nearly overnight. While the company was once synonymous only with video conferencing, it has been quietly building a sophisticated AI infrastructure over the last several years with an aim to redefine workplace productivity. While video conferencing remains the cornerstone of Zoom’s business, there’s a lot more now, too, thanks to AI.

Moving from meeting to milestone

Everyone knows that Zoom is a technology for meetings. But what is the meeting for? In a business context, there can certainly be meetings that have no purpose, but those should be outliers. Meetings should lead to something, whether that’s an action item or some other milestone. “In the agentic AI era, finally technology is reaching the point that we can transform from meeting to milestone,” Zoom CTO Xuedong (X.D.) Huang told VentureBeat in an exclusive interview. Today, Zoom is announcing an aggressive agentic AI strategy that includes a series of new services. The update introduces agentic capabilities that promise to transform meetings from communication events into action-oriented workflows, alongside a new AI Studio that lets enterprises create customized AI agents.

The hidden technical evolution behind Zoom’s agentic AI

Prior to joining Zoom, Huang spent 30 years at Microsoft, working on speech technologies as well as Microsoft’s Azure OpenAI service. He carried forward a lot of lessons from that experience when he joined Zoom in 2023. Under Huang’s direction, Zoom began quietly building an AI architecture designed to facilitate tasks rather than just summarize conversations. Zoom publicly announced a partnership with Anthropic in May 2023 — but that’s not the only large language model (LLM) used at Zoom.
While Microsoft Teams generally relies on OpenAI via the Microsoft Azure OpenAI service, and Google Meet is supported by Google Gemini, Zoom has taken an agnostic approach to LLMs. Huang explained that when Zoom launched the first iteration of its AI Companion in 2023, it wasn’t based on any one single LLM. Instead, the company started with a federated approach, using multiple LLMs including its own custom-built small language model (SLM). “We’ve partnered with the best models out there, including OpenAI and Anthropic, but we’ve also built our own highly customized 2 billion parameter language model,” said Huang.

Zoom’s AI Companion uses a federated approach in which the smaller Zoom model is used in conjunction with larger, industry-leading language models. The smaller model initially evaluates and processes the input, and the partial results are then passed to larger models to produce the final output. This approach allows Zoom to take advantage of the strengths of both the smaller, customized model and the larger, more powerful models, while reducing costs and improving performance.

How the small language model is at the center of Zoom’s agentic AI journey

Perhaps the most technically intriguing aspect of Zoom’s AI strategy is its focus on SLMs. Rather than following the industry trend of distilling smaller models from larger ones, Zoom built its 2-billion-parameter model entirely from scratch. The technical advantage of this approach becomes apparent when customizing for specific domains. “When you customize, it takes more effort; it’s just hard to steer a bigger ship,” Huang explained. As it turns out, the ability to customize the small model is a critical component in the development of specific agentic AI workflows. Looking ahead, Zoom envisions its SLMs eventually running directly on user devices, enabling both better privacy and more personalized experiences.
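This small-model-first routing pattern can be sketched in a few lines of Python. Everything below is an illustrative stub under assumed behavior, not Zoom’s implementation: a cheap stand-in “SLM” pre-processes each request, and a stubbed “LLM” is invoked only when the small model’s confidence is low.

```python
# Illustrative sketch of a federated SLM + LLM pipeline. Model calls are
# stubbed; the names, thresholds, and outputs here are all hypothetical.

def small_model(text: str) -> dict:
    # Stand-in for a small language model doing cheap intent classification.
    intent = "summarize" if "summary" in text.lower() else "general"
    confidence = 0.9 if intent == "summarize" else 0.4
    return {"intent": intent, "confidence": confidence, "sketch": text[:50]}

def large_model(text: str, partial: dict) -> str:
    # Stand-in for a frontier LLM, given the SLM's partial result as context.
    return f"[LLM:{partial['intent']}] {text}"

def federated_answer(text: str, threshold: float = 0.8) -> str:
    partial = small_model(text)
    if partial["confidence"] >= threshold:
        # High confidence: the small model's own output suffices (cheap path).
        return f"[SLM:{partial['intent']}] {partial['sketch']}"
    # Otherwise escalate, passing the partial result to the larger model.
    return large_model(text, partial)

print(federated_answer("Give me a summary of the meeting"))  # handled by SLM
print(federated_answer("Draft a contract clause"))           # escalated to LLM
```

The design choice is a cost/quality trade-off: the small model handles the easy majority of requests cheaply, and only ambiguous or complex inputs pay for a large-model call.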
AI Companion 2.0: Agentic AI transforms meetings to milestones

At the heart of Zoom’s updates is AI Companion 2.0, which transforms Zoom’s AI capabilities from meeting support to fully agentic functions. With 2.0, Zoom is evolving from an assistant into agentic AI that is capable of reasoning, memory and task execution. The evolved AI Companion can now execute multi-step actions on behalf of users, orchestrating tasks like scheduling meetings, generating video clips and creating documents. Key updates include:

- Agentic skills: calendar management, clip generation, advanced writing assistance
- Task management: automatic detection of action items from meetings and chats
- Meeting enhancements: AI-powered agendas, live notes and voice recording
- Document creation: advanced references and automatic data table generation in Zoom Docs
- Virtual agents: self-service capabilities for customer service with both chat and voice support
- Industry solutions: specialized tools for frontline workers, healthcare professionals and educators
- Zoom Drive: a new central repository for meeting assets and productivity documents
- Custom avatars: AI-generated video avatars for creating presentation clips

Most features will roll out between March and July 2025. While the standard AI Companion is included at no additional cost for paid users, specialized agents and custom configurations will require additional fees. “The most important aspect for us of agentic AI is really enabling the action-oriented information flow,” said Huang. “What that means is that when you have a meeting, the action task will flow into Docs or chat or into other actions you have to take.”

AI Studio: Building custom agents for enterprises

While Zoom is providing a lot of different agentic AI capabilities out of the box for users, Huang recognized that enterprises often need more customized options. That’s where AI Studio comes in, allowing companies to create customized AI agents tailored to specific business needs.
These can be deeply integrated with company-specific knowledge and workflow processes. As an example, Huang detailed one practical application for human resources policy. Enterprises can use the AI Studio to upload all of their internal HR policy documents. The AI companion will then be trained on this company-specific HR policy information, allowing it to accurately answer employee questions about HR guidelines and procedures. IT administrators can also use the AI Studio to connect the companion to other internal knowledge bases, like IT support documentation. The goal is to enable companies to create AI agents that are deeply integrated with their own processes, data and workflows, transforming the AI companion into a customized and


Why smarter ERP data is the key to AI-powered growth

Presented by SAP

Sometimes two contradictory forces can be true at the same time. In the past 20 years, we’ve seen an explosion of innovative new technologies for businesses but slowing business productivity growth. Between 1995 and 2005, productivity in the U.S. grew at 2.6%. But in the decades since 2005, it grew at just 1.4% — a drop of nearly 50%, according to McKinsey. And this happened during a time when the internet matured, smartphones and mobile apps became pervasive and there were big advancements in ERP technologies. It is clear from the data that innovation alone doesn’t lead to broad-based productivity gains. And that is a problem, because when businesses are more productive, everyone benefits.

This is especially important to take note of right now, as new data and business AI innovations are reaching the market at breakneck speed. But these innovations are also bringing more complexity. What really matters is how you use data and AI together. Clean and meaningful data is the foundation for relevant AI. At the center of this is ERP data.

Why is ERP data so valuable? Because it is structured, describes processes and has semantic meaning. Examples include purchase orders, invoices, financial postings in a general ledger and supplier delivery schedules. ERP data has business semantics that connect individual data sets, and that’s what makes it possible to truly understand how businesses run. When you add in industry-specific data, like energy prices, interest rates and consumer trends and preferences, it becomes even richer. Considering that over 80% of the business data generated in the world runs through an SAP system, this is knowledge that is unique to SAP. We know how businesses run best across any industry and geography.

Emerging AI applications as a force multiplier

The cloud has changed the game in how this data is accessed.
On top of this data, a new class of AI-driven applications can be built that are relevant to a business and its industry and are useful through an intuitive user interface. That’s what creates a force multiplier for business productivity and business value. That’s exactly what SAP is focused on right now. In the past few weeks, SAP has announced a series of data and AI innovations to make businesses more productive. This includes Joule for Developers, an AI-powered assistant that understands SAP’s development framework and empowers developers of all skill levels to be more productive, creative and proficient in accelerating ABAP, Java and JavaScript-based application development and automation. Last month, we announced Business Data Cloud (BDC), our solution to unify all SAP and third-party data for our customers, providing the trusted data foundation organizations need to make more impactful decisions. In combination with our partner Databricks, BDC combines structured data with unstructured data such as customer feedback and industry trends, leading to better decision making.

Unifying AI across the organization

The source generating the necessary operational data is the SAP Business Suite. It represents a comprehensive set of integrated applications that seamlessly connect every part of a business. And that really speaks to the elegance of the SAP system — that everything works together seamlessly and provides customers a 360-degree view of their business. Think of it like an orchestra. Instead of individual musicians playing different instruments, an orchestra creates music and harmony because the players perform seamlessly together. In other words, the whole becomes greater than the sum of its parts. That’s what the SAP Business Suite does for business data.

A great example is KIND, the health and wellness division of Mars, Inc. Their recent growth created lots of complexity, and they needed to rethink their decentralized ways of working.
So they implemented SAP S/4HANA Cloud Public Edition as their new business technology foundation while partnering with implementation experts from Accenture. As a result, they can better access and leverage their ERP data in the public cloud, leading to operational efficiencies and better use of analytics to drive better decisions for efficient business operations.

It’s clear that the next wave of AI is agentic — agents or services that come as software. For example, think of a service to enter and process travel expenses. What is still needed is software that stores the data in a structured way, a system of record. But what is changing is the user experience. In the case of SAP, our UX is Joule, and it is becoming the orchestrator of services. There isn’t a single business technology conversation today that doesn’t start and end with the impact of data and AI. But just like an orchestra, it can either be noise or music depending on how it all comes together. Companies that embrace an active approach to data management supercharged by generative AI will stay ahead of what’s coming next and create a blueprint for a new era of business productivity growth.

Jan Gilg is Chief Revenue Officer, Americas and SAP Global Business Suite.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact [email protected].


Halliday raises $20 million to build AI agents that operate safely on blockchain

Halliday has raised $20 million in Series A funding to develop AI agents that can safely operate on blockchain networks without requiring traditional smart contract development. The funding round, led by Andreessen Horowitz’s crypto arm (a16z crypto), brings the company’s total funding to over $26 million. The investment signals growing confidence in Halliday’s approach to solving one of AI’s thorniest challenges: safely deploying autonomous agents in decentralized environments where mistakes can be costly and irreversible.

“AI on blockchain has remained inaccessible due to compliance and safety concerns,” said Griffin Dunaif, CEO of Halliday, in an exclusive interview with VentureBeat. “Typically with smart contract development for AI agents, any minor mistake can cause a breach, leaving them vulnerable. For AI to operate onchain, there needs to be robust safety infrastructure where businesses have oversight of AI-enabled automations.”

How Halliday’s safety protocol makes AI agents viable for enterprise blockchain applications

Halliday’s core innovation, the Agentic Workflow Protocol (AWP), creates what the company calls “immutable guardrails” for AI agents operating on blockchain networks. These guardrails ensure that autonomous systems can execute tasks within strictly defined parameters without the risk of exploits or unintended actions. “We’ve built the proper tooling with our workflow protocol to make safe AI possible with immutable guardrails onchain,” explained Dunaif. “The protocol ensures that no party, not even Halliday, can circumvent the guardrails placed on tasks.” This approach could solve a fundamental challenge in AI development: how to allow autonomous agents to interact with financial systems and digital assets while maintaining security and compliance.
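The general shape of the idea can be illustrated with a small Python sketch: an agent’s actions are validated against a policy fixed at construction time. This is a generic illustration of guardrailed execution, not Halliday’s Agentic Workflow Protocol; all names and rules here are hypothetical.

```python
# Generic sketch of "immutable guardrails" for an agent: the policy is made
# read-only after construction and every action is checked against it.
# Illustrative only; not Halliday's protocol or API.

from types import MappingProxyType

class GuardrailedAgent:
    def __init__(self, allowed_actions, max_transfer):
        # MappingProxyType makes the policy read-only once set, loosely
        # mirroring guardrails that no party can later circumvent.
        self._policy = MappingProxyType({
            "allowed_actions": frozenset(allowed_actions),
            "max_transfer": max_transfer,
        })
        self.log = []

    def execute(self, action: str, amount: float = 0.0) -> bool:
        if action not in self._policy["allowed_actions"]:
            self.log.append(("rejected", action, "not allowlisted"))
            return False
        if amount > self._policy["max_transfer"]:
            self.log.append(("rejected", action, "over limit"))
            return False
        self.log.append(("executed", action, amount))
        return True

agent = GuardrailedAgent(allowed_actions={"pay_invoice", "rebalance"},
                         max_transfer=1_000.0)
print(agent.execute("pay_invoice", 250.0))    # True: allowed and under limit
print(agent.execute("pay_invoice", 5_000.0))  # False: over limit
print(agent.execute("mint_tokens"))           # False: not allowlisted
```

In an onchain setting the constraints would be enforced at the protocol layer rather than in application code, but the separation is the same: the agent proposes actions, and an immutable policy decides whether they run.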
By embedding safety constraints directly into the protocol layer, Halliday aims to make AI agents trustworthy enough for enterprise applications.

AI-powered treasury management and B2B automation already in production

Unlike many AI startups announcing funding before their technology reaches production, Halliday has already deployed its AI-enabled workflow engine with several high-profile partners, including DeFi Kingdoms, Core Wallet by Ava Labs, and ApeChain. The company’s workflows enable AI agents to automate complex, multi-step processes like treasury management, recurring payments and business-to-business transactions. These autonomous systems can operate across multiple blockchain networks while maintaining compliance with predefined rules. “Halliday handles all the heavy lifting: protocol integrations like bridges and DEXs, data translation, and reliable multi-chain execution,” said Dunaif. “With workflows, teams can create breakthrough applications in hours, not years.”

Enterprise AI adoption accelerated by removing blockchain programming barriers

For enterprise customers, Halliday’s technology offers a path to leveraging blockchain networks and AI agents without specialized development teams or the security risks typically associated with smart contracts. This could dramatically accelerate adoption of AI in financial services, where institutions have shown interest in autonomous systems but have been deterred by security and compliance concerns. According to Dunaif, Halliday offers a “10,000x development cost reduction,” potentially transforming AI-blockchain integration from a capital-intensive project into an operational expense. “There’s opportunity to expand beyond web3 and bring our Workflow Protocol to fintechs and banks looking to enter the crypto arena, whether through stablecoin subscriptions, global yield products, or programmable treasury management,” Dunaif noted.

Will safe AI agents unlock the next wave of enterprise blockchain adoption?
The intersection of AI and blockchain has long promised powerful new business models. Autonomous agents that can execute complex financial transactions without human intervention could dramatically reduce costs and enable new services. However, the risk of rogue AI agents or exploitable code has limited enterprise adoption. Halliday’s approach aims to solve this problem by creating a middleware layer that handles security and execution while allowing businesses to focus on defining the logic of their AI agents. This separation of concerns could make AI-blockchain integration more accessible to mainstream enterprises. “We’re extending Stripe’s technology to be utilized by developers who are creating their own blockchains,” explained Dunaif when discussing how Halliday works with traditional financial infrastructure.

The future of enterprise AI: Automated workflows replace manual processes

With the new funding, Halliday plans to accelerate development of its AI workflow protocol and expand its team. The company has already attracted talent from leading technology companies, including Alphabet, Meta, Netflix, Stripe and Compound. Halliday Payments, a first-party application built on the protocol, demonstrates how AI agents can simplify complex processes. It offers an end-to-end payments solution that uses AI to handle the complexities of blockchain transactions, lowering the barrier for new users. “By safely delegating workflows to autonomous systems, such as agents or software, teams can create breakthrough applications in hours, not years,” Dunaif said, describing his vision for how AI will transform enterprise operations.

As businesses continue to explore AI’s potential to automate complex workflows, solutions like Halliday’s that address the fundamental challenges of safety and compliance could play a crucial role in bringing these technologies to mainstream adoption.


OpenAI’s strategic gambit: The Agents SDK and why it changes everything for enterprise AI

OpenAI reshaped the enterprise AI landscape Tuesday with the release of its comprehensive agent-building platform – a package combining a revamped Responses API, powerful built-in tools and an open-source Agents SDK. While this announcement might have been overshadowed by other AI headlines — Google’s unveiling of the impressive open-source Gemma 3 model, and the emergence of Manus, a Chinese startup whose autonomous agent platform astonished observers — it is clearly a significant move for enterprises to be aware of. It consolidates a previously fragmented, complex API ecosystem into a unified, production-ready framework.

For enterprise AI teams, the implications are potentially profound: Projects that previously demanded multiple frameworks, specialized vector databases and complex orchestration logic can now be achieved through a single, standardized platform. But perhaps most revealing is OpenAI’s implicit acknowledgment that solving AI agent reliability issues requires outside expertise. This shift comes amid growing evidence that external developers are finding innovative solutions to agent reliability — something that the shocking Manus release also clearly demonstrated. This strategic concession represents a critical turning point: OpenAI recognizes that even with its vast resources, the path to truly reliable agents requires opening up to outside developers who can discover innovative solutions and workarounds that OpenAI’s internal teams might miss.

A unified approach to agent development

At its core, the announcement represents OpenAI’s comprehensive strategy to provide a complete, production-ready stack for building AI agents.
The release brings several key capabilities into a unified framework: The Responses API builds on the Chat Completions API but adds seamless integration for tool use, with improved interface design for creating agents; Built-in tools include web search, file search and computer use (the technology behind OpenAI’s Operator feature); An open-source Agents SDK for orchestrating single-agent and multi-agent workflows with handoffs. What makes this announcement transformative is how it addresses the fragmentation that has plagued enterprise AI development. Companies that decide to standardize on OpenAI’s API format and open SDK will no longer need to cobble together different frameworks, manage complex prompt engineering or struggle with unreliable agents. “The word ‘reliable’ is so key,” Sam Witteveen, co-founder of Red Dragon, an independent developer of AI agents, said in a recent conversation with me on a video podcast deep dive on the release. “We’ve talked about it many times…most agents are just not reliable. And so OpenAI is looking at like, ‘Okay, how do we bring this sort of reliability in?’” After the announcement, Jeff Weinstein, the product lead of payments company Stripe, took to X to say Stripe had already demonstrated the practical application of OpenAI’s new Agents SDK by releasing a toolkit that enables developers to integrate Stripe’s financial services into agentic workflows. This integration allows for the creation of AI agents capable of automating payments to contractors by checking files to see who needed payment, as well as handling billing and other transactions. Strategic implications for OpenAI and the market This release reveals a significant shift in OpenAI’s strategy. Having established a lead with foundation models, the company is now consolidating its position in the agent ecosystem through several calculated moves: 1.
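The handoff mechanism at the heart of the SDK — a triage agent delegating work to specialists — can be sketched in a few lines of plain Python. This is a toy illustration of the control flow only, not the Agents SDK’s actual API; every class and name below is invented for illustration:

```python
# Toy illustration of the multi-agent "handoff" pattern that the Agents SDK
# orchestrates. All names here are invented; the real SDK wraps model calls,
# tool use and tracing around this same basic control flow.

class Agent:
    def __init__(self, name, can_handle, handoffs=()):
        self.name = name
        self.can_handle = can_handle    # predicate: does this task belong to me?
        self.handoffs = list(handoffs)  # specialists this agent may delegate to

    def run(self, task):
        # Hand off to the first specialist that claims the task;
        # otherwise handle it directly.
        for specialist in self.handoffs:
            if specialist.can_handle(task):
                return specialist.run(task)
        return f"{self.name} handled: {task}"

billing = Agent("billing_agent", lambda t: "invoice" in t)
support = Agent("support_agent", lambda t: True)
triage = Agent("triage_agent", lambda t: True, handoffs=[billing, support])

print(triage.run("customer invoice is wrong"))   # routed to billing_agent
print(triage.run("how do I reset my password"))  # falls through to support_agent
```

The real SDK layers model-driven routing, tool invocation and tracing on top of this delegation loop — which is exactly where the reliability problems the article describes tend to appear.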
Opening up to external innovation OpenAI acknowledges that even its extensive resources aren’t enough to outpace community innovation. The launch of tools and an open-source SDK suggests a major strategic concession. The timing of the release coincided with the emergence of Manus, which impressed the AI community with a very capable autonomous agent platform — demonstrating capabilities using existing models such as Anthropic’s Claude and Alibaba’s Qwen, essentially showing that clever integration and prompt engineering could achieve reliability that even major AI labs were struggling with. “Maybe even OpenAI are not the best at making Operator,” Witteveen noted, referring to the web-browsing tool that OpenAI shipped in late January, but which we found had bugs and was inferior to competitor Proxy. “Maybe the Chinese startup has some nice hacks in their prompt, or in whatever, that they’re able to use these sort of open-source tools.” The lesson is clear: OpenAI needs the community’s innovation to improve reliability. No team, however strong — whether at OpenAI, Anthropic or Google — can try out as many things as the open-source community can. 2. Securing the enterprise market through API standardization OpenAI’s API format has emerged as the de facto standard for large language model (LLM) interfaces, supported by multiple vendors including Google’s Gemini and Meta’s Llama. Changes to OpenAI’s API are significant because many third-party players will fall in line and support them as well. By controlling the API standard while making it more extensible, OpenAI looks set to create a powerful network effect. Enterprise customers can adopt the Agents SDK knowing it works with multiple models, but OpenAI maintains its position at the center of the ecosystem. 3. Consolidating the RAG pipeline The file search tool challenges database companies like Pinecone, Chroma, Weaviate and others.
OpenAI now offers a complete retrieval-augmented generation (RAG) tool out-of-the-box. The question now is what happens to the long list of RAG vendors and agent orchestration vendors that popped up with large funding to pursue the enterprise AI opportunity, if much of this can be had through a single standard like OpenAI’s. In other words, enterprises may consider consolidating multiple vendor relationships into a single API provider: OpenAI. Companies can upload any data documents they want to use with OpenAI’s leading foundation models — and search it all within the API. While enterprises may encounter limitations compared to dedicated RAG databases like Pinecone, OpenAI’s built-in file and web search tools offer clear citations and URLs, allowing users to trace exactly where information comes from and validate its accuracy against the original documents — a capability critical for enterprises that prioritize transparency and accuracy.
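The citation-backed retrieval described above can be illustrated in miniature. The snippet below is a toy keyword retriever, not OpenAI’s file search tool; it only shows the output shape enterprises care about, where every retrieved passage is paired with its source document:

```python
# Toy citation-backed retrieval: every returned passage is paired with the
# document it came from, mirroring the grounded, traceable answers that
# built-in file search aims to provide. Not OpenAI's actual tool or API.

documents = {
    "hr_policy.txt": "Employees accrue 15 vacation days per year.",
    "it_guide.txt": "Reset passwords via the self-service portal.",
}

def search_with_citations(query):
    terms = set(query.lower().split())
    results = []
    for doc_id, text in documents.items():
        # Naive keyword overlap stands in for real vector search.
        if terms & set(text.lower().split()):
            results.append({"passage": text, "citation": doc_id})
    return results

for hit in search_with_citations("vacation days"):
    # Each hit carries a citation a reviewer can trace back to the source.
    print(hit["citation"], "->", hit["passage"])
```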


Nous Research just launched an API that gives developers access to AI models that OpenAI and Anthropic won’t build

Nous Research, the New York-based AI collective known for developing what it calls “personalized, unrestricted” language models, has launched a new Inference API that makes its models more accessible to developers and researchers through a programmatic interface. The API launch represents a significant expansion of Nous Research’s offerings, which have gained attention because they challenge the more restricted approaches of larger AI companies like OpenAI and Anthropic. “We heard your feedback, and built a simple system to make our language models more accessible to developers and researchers everywhere,” the company announced on social media. The initial API release features two of the company’s flagship models: Hermes 3 Llama 70B, a powerful general-purpose model based on Meta’s Llama 3.1 architecture, and DeepHermes-3 8B Preview, the company’s recently released reasoning model that allows users to toggle between standard responses and detailed chains-of-thought (CoT). Inside Nous Research’s waitlist-based portal: How the AI upstart is managing high demand To manage demand, Nous has implemented a waitlist system through its new portal, with access granted on a first-come, first-served basis. The company is providing all new accounts with $5 in free credits. Developers can access the API documentation to learn more about integration options. The waitlist approach provides critical insight into Nous Research’s strategic positioning.
Unlike major players with massive GPU reserves, Nous faces the infrastructure constraints common to smaller organizations in AI. The waitlist serves as both a technical necessity and a marketing tactic, creating an exclusivity that generates buzz while managing computational load. What makes this approach particularly notable is how it reflects Nous’s grassroots ethos. While the company positions itself as an alternative to big tech AI, it’s also adopting pragmatic business strategies that acknowledge the realities of scaling inference services. This tension between idealism and practicality will likely define Nous’ journey as it transitions from purely open-source releases to commercial offerings. The API follows OpenAI’s API design pattern for completions and chat completions, making it potentially easier for developers already familiar with that interface to integrate Nous’ models into their applications. From GitHub downloads to cloud API: Nous Research’s evolution signals a new business model This API launch comes just four months after Nous debuted Nous Chat, the company’s first user-facing chatbot interface. While the company has released numerous open-source models for local deployment, the new API allows developers to access high-performance versions of these models without managing their own infrastructure. “Previously, if researchers and users wanted to actually deploy these models, they needed to download and run the code on their own machines — a time-consuming, finicky and potentially costly endeavor,” VentureBeat executive editor Carl Franzen wrote in his coverage of the Nous Chat launch. DeepHermes-3, released just last month, represents the company’s entry into the increasingly competitive field of reasoning-focused AI models. The model allows users to switch between concise responses and detailed reasoning processes through a system prompt that activates its “thinking” capabilities. 
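Because the API mirrors OpenAI’s chat-completions format, a request body looks identical to one aimed at OpenAI; only the endpoint URL and API key change. A minimal sketch — the endpoint URL and model identifier below are placeholders, not values taken from Nous’ documentation:

```python
# Sketch of a request body for an OpenAI-compatible chat-completions
# endpoint. The endpoint URL and model identifier are illustrative
# placeholders, not values from Nous Research's documentation.
import json

payload = {
    "model": "Hermes-3-Llama-70B",  # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain retrieval-augmented generation."},
    ],
    "max_tokens": 256,
}

body = json.dumps(payload)
# POST this body to something like https://api.example.com/v1/chat/completions
# with an "Authorization: Bearer <key>" header, exactly as with OpenAI's API.
```

This compatibility is why existing OpenAI client code can often be pointed at a compatible provider by changing only the base URL and key, which is presumably the integration ease the article refers to.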
The ‘unrestricted AI’ philosophy: How Nous Research challenges big tech’s guardrails Since its founding in 2023, Nous Research has positioned itself as an alternative to more tightly controlled AI systems. The company emphasizes individual agency and alignment with user needs, reflected in blog posts with titles like “Freedom at the frontier” and “From black box to glass house: The imperative for transparent AI development.” “Superintelligence should solve for maximal individual agency and freedom of spirit,” the company wrote in a recent blog post announcing its Psyche project on Solana. “Its development cannot be left solely in the hands of a few corporations and oligarchs.” This philosophical stance has resonated with developers seeking more flexible AI systems, although the approach has also raised questions about responsible deployment. Despite marketing itself as “unrestricted,” the company’s models do include some guardrails against harmful outputs. Monetizing open AI research: Nous’s API strategy and roadmap for Hermes, DeepHermes and beyond The API launch signals Nous Research’s move toward a more sustainable business model while maintaining its commitment to open-source principles. According to the company’s release timeline, Nous has released 29 AI artifacts since July 2023, including models, papers, code and datasets. The API represents a delicate but crucial evolution in Nous Research’s business model. By commercializing deployment while continuing to release model weights, Nous is attempting to square a difficult circle: Generating revenue without alienating the open-source community that forms its foundation. This hybrid approach appears designed to capture different segments of the market. Individual developers and researchers can still download and run models locally, while enterprises seeking reliability, convenience and performance optimization can pay for API access.
In effect, Nous is monetizing the infrastructure and optimization layer rather than the models themselves — a strategy that addresses the fundamental economic challenge of open-source AI without compromising its core principles. The success of this approach may determine whether independent AI labs can establish sustainable business models that preserve their independence from big tech or venture capital firms that might push for more aggressive commercialization. For developers concerned about AI centralization, Nous’ experiment represents a potential middle path that could maintain diversity in the AI ecosystem. Nous Research indicates that its inference offerings will expand over time, potentially including more of its models like Hermes 2 Pro, which specializes in function-calling, or its Psyche project. For the growing ecosystem of AI startups building on open models, the new API provides another option beyond established players like Together AI, Anthropic and OpenAI, potentially increasing competition and driving further innovation in the AI inference space. “We welcome your ideas to help shape the future,” the company noted in its announcement.


Patronus AI’s Judge-Image wants to keep AI honest — and Etsy is already using it

Patronus AI announced today the launch of what it calls the industry’s first multimodal large language model-as-a-judge (MLLM-as-a-Judge), a tool designed to evaluate AI systems that interpret images and produce text. The new evaluation technology aims to help developers detect and mitigate hallucinations and reliability issues in multimodal AI applications. E-commerce giant Etsy has already implemented the technology to verify caption accuracy for product images across its marketplace of handmade and vintage goods. “Super excited to announce that Etsy is one of our ship customers,” said Anand Kannappan, cofounder of Patronus AI, in an exclusive interview with VentureBeat. “They have hundreds of millions of items in their online marketplace for handmade and vintage products that people are creating around the world. One of the things that their AI team wanted to be able to leverage generative AI for was the ability to auto-generate image captions and to make sure that as they scale across their entire global user base, that the captions that are generated are ultimately correct.” Why Google’s Gemini powers the new AI judge rather than OpenAI Patronus built its first MLLM-as-a-Judge, called Judge-Image, on Google’s Gemini model after extensive research comparing it with alternatives like OpenAI’s GPT-4V. “We tended to see that there was a slighter preference toward egocentricity with GPT-4V, whereas we saw that Gemini was less biased in those ways and had more of an equitable approach to being able to judge different kinds of input-output pairs,” Kannappan explained. “That was seen in the uniform scoring distribution across the different sources that they looked at.” The company’s research yielded another surprising insight about multimodal evaluation.
Unlike text-only evaluations where multi-step reasoning often improves performance, Kannappan noted that it “typically doesn’t actually increase MLLM judge performance” for image-based assessments. Judge-Image provides ready-to-use evaluators that assess image captions on multiple criteria, including caption hallucination detection, recognition of primary and non-primary objects, object location accuracy, and text detection and analysis. Beyond retail: How marketing teams and law firms can benefit from AI image evaluation While Etsy represents a flagship customer in e-commerce, Patronus sees applications extending far beyond retail. These include “marketing teams across companies that are generally looking at being able to scalably create descriptions and captions against new blocks in design, especially marketing design, but also product design,” Kannappan said. He also highlighted applications for enterprises dealing with document processing: “Larger enterprises like venture services companies and law firms typically might have engineering teams that are using relatively legacy technology to be able to extract different kinds of information from PDFs, to be able to summarize the content inside of larger documents.” As AI becomes increasingly critical to business processes, many companies face the build-versus-buy dilemma for evaluation tools. Kannappan argues that outsourcing AI evaluation makes strategic and economic sense. “As we’ve worked with teams, [we’ve found that] a lot of folks may start with something to see if they can develop something internally, and then they realize that it’s, one, not core to their value prop or the product they’re developing. And two, it is a very challenging problem, both from an AI perspective, but also from an infrastructure perspective,” he said. This applies particularly to multimodal systems, where failures can occur at multiple points in the process. 
“When you’re dealing with RAG systems or agents, or even multimodal AI systems, we’re seeing that failures happen across all parts of the system,” Kannappan noted. How Patronus plans to make money while competing with tech giants Patronus offers multiple pricing tiers, starting with a free option that allows users to experiment with the platform up to certain volume limits. Beyond that threshold, customers pay as they go for evaluator usage or can engage with the sales team for enterprise arrangements with custom features and tailored pricing. Despite using Google’s Gemini model as its foundation, the company positions itself as complementary rather than competitive with foundation model providers like Google, OpenAI and Anthropic. “We don’t necessarily see the technology that we build or the solutions that we build as competitive with foundational companies, but rather very complementary and additional new powerful tools in the toolkit that ultimately help folks develop better LLM systems, as opposed to LLMs themselves,” Kannappan said. Audio evaluation coming next as Patronus expands multimodal oversight Today’s announcement represents one step in Patronus’s broader strategy for AI evaluation across different modalities. The company plans to expand beyond images into audio evaluation soon. “We’re excited because this is the next phase of our vision towards multimodal, and specifically focused on images today — and then over time, we’re excited about what we’ll do, especially with audio in the future,” Kannappan confirmed. This roadmap aligns with what Kannappan describes as the company’s “research vision towards scalable oversight” — developing evaluation mechanisms that can keep pace with increasingly sophisticated AI systems. “We continue to develop new systems, products, frameworks, methods that ultimately are equally capable as the intelligent systems that we intend to want to have oversight over as humans in the long run,” he said. 
As businesses race to deploy AI systems that can interpret images, extract text from documents, and generate visual content, the risk of inaccuracies, hallucinations and biases grows. Patronus is betting that even as foundation models improve, the challenges of evaluating complex multimodal AI systems will remain — requiring specialized tools that can serve as impartial judges of increasingly human-like AI output. In the high-stakes world of commercial AI deployment, these digital judges may prove as valuable as the models they evaluate.


Inching towards AGI: How reasoning and deep research are expanding AI from statistical prediction to structured problem-solving

AI has evolved at an astonishing pace. What seemed like science fiction just a few years ago is now an undeniable reality. Back in 2017, my firm launched an AI Center of Excellence. AI was certainly getting better at predictive analytics and many machine learning (ML) algorithms were being used for voice recognition, spam detection, spell checking (and other applications) — but it was early. We believed then that we were only in the first inning of the AI game. The arrival of GPT-3 and especially GPT-3.5 — which was tuned for conversational use and served as the basis for the first ChatGPT in November 2022 — was a dramatic turning point, now forever remembered as the “ChatGPT moment.” Since then, there has been an explosion of AI capabilities from hundreds of companies. In March 2023, OpenAI released GPT-4, which promised “sparks of AGI” (artificial general intelligence). By that time, it was clear that we were well beyond the first inning. Now, it feels like we are in the final stretch of an entirely different sport. The flame of AGI Two years on, the flame of AGI is beginning to appear. On a recent episode of the Hard Fork podcast, Dario Amodei — who has been in the AI industry for a decade, formerly as VP of research at OpenAI and now as CEO of Anthropic — said there is a 70 to 80% chance that we will have a “very large number of AI systems that are much smarter than humans at almost everything before the end of the decade, and my guess is 2026 or 2027.” Anthropic CEO Dario Amodei appearing on the Hard Fork podcast. Source: https://www.youtube.com/watch?v=YhGUSIvsn_Y The evidence for this prediction is becoming clearer. Late last summer, OpenAI launched o1 — the first “reasoning model.” They’ve since released o3, and other companies have rolled out their own reasoning models, including Google and, famously, DeepSeek.
Reasoners use chain-of-thought (CoT), breaking down complex tasks at run time into multiple logical steps, just as a human might approach a complicated task. Sophisticated AI agents including OpenAI’s deep research and Google’s AI co-scientist have recently appeared, portending huge changes to how research will be performed. Unlike earlier large language models (LLMs) that primarily pattern-matched from training data, reasoning models represent a fundamental shift from statistical prediction to structured problem-solving. This allows AI to tackle novel problems beyond its training, enabling genuine reasoning rather than advanced pattern recognition. I recently used Deep Research for a project and was reminded of the quote from Arthur C. Clarke: “Any sufficiently advanced technology is indistinguishable from magic.” In five minutes, this AI produced what would have taken me 3 to 4 days. Was it perfect? No. Was it close? Yes, very. These agents are quickly becoming truly magical and transformative and are among the first of many similarly powerful agents that will soon come onto the market. The most common definition of AGI is a system capable of doing almost any cognitive task a human can do. These early agents of change suggest that Amodei and others who believe we are close to that level of AI sophistication could be correct, and that AGI will be here soon. This reality will lead to a great deal of change, requiring people and processes to adapt in short order. But is it really AGI? There are various scenarios that could emerge from the near-term arrival of powerful AI. It is challenging and frightening that we do not really know how this will go.
New York Times columnist Ezra Klein addressed this in a recent podcast: “We are rushing toward AGI without really understanding what that is or what that means.” For example, he claims there is little critical thinking or contingency planning going on around the implications and, for example, what this would truly mean for employment. Of course, there is another perspective on this uncertain future and lack of planning, as exemplified by Gary Marcus, who believes deep learning generally (and LLMs specifically) will not lead to AGI. Marcus issued what amounts to a takedown of Klein’s position, citing notable shortcomings in current AI technology and suggesting it is just as likely that we are a long way from AGI. Marcus may be correct, but this might also be simply an academic dispute about semantics. As an alternative to the AGI term, Amodei simply refers to “powerful AI” in his Machines of Loving Grace blog, as it conveys a similar idea without the imprecise definition, “sci-fi baggage and hype.” Call it what you will, but AI is only going to grow more powerful. Playing with fire: The possible AI futures In a 60 Minutes interview, Alphabet CEO Sundar Pichai said he thought of AI as “the most profound technology humanity is working on. More profound than fire, electricity or anything that we have done in the past.” That certainly fits with the growing intensity of AI discussions. Fire, like AI, was a world-changing discovery that fueled progress but demanded control to prevent catastrophe. The same delicate balance applies to AI today. A discovery of immense power, fire transformed civilization by enabling warmth, cooking, metallurgy and industry. But it also brought destruction when uncontrolled. Whether AI becomes our greatest ally or our undoing will depend on how well we manage its flames.
To take this metaphor further, there are various scenarios that could soon emerge from even more powerful AI: The controlled flame (utopia): In this scenario, AI is harnessed as a force for human prosperity. Productivity skyrockets, new materials are discovered, personalized medicine becomes available for all, goods and services become abundant and inexpensive and individuals are freed from drudgery to pursue more meaningful work and activities. This is the scenario championed by many accelerationists, in which AI brings progress without engulfing us in too much chaos. The unstable fire (challenging): Here, AI


Launching your first AI project with a grain of RICE: Weighing reach, impact, confidence and effort to create your roadmap

Businesses know they can’t ignore AI, but when it comes to building with it, the real question isn’t what AI can do — it’s what it can do reliably. And more importantly: Where do you start? This article introduces a framework to help businesses prioritize AI opportunities. Inspired by project management frameworks like the RICE scoring model for prioritization, it balances business value, time-to-market, scalability and risk to help you pick your first AI project. Where AI is succeeding today AI isn’t writing novels or running businesses just yet, but where it succeeds is still valuable. It augments human effort, not replaces it. In coding, AI tools improve task completion speed by 55% and boost code quality by 82%. Across industries, AI automates repetitive tasks — emails, reports, data analysis — freeing people to focus on higher-value work. This impact doesn’t come easy. All AI problems are data problems. Many businesses struggle to get AI working reliably because their data is stuck in silos, poorly integrated or simply not AI-ready. Making data accessible and usable takes effort, which is why it’s critical to start small. Generative AI works best as a collaborator, not a replacement. Whether it’s drafting emails, summarizing reports or refining code, AI can lighten the load and unlock productivity. The key is to start small, solve real problems and build from there. A framework for deciding where to start with generative AI Everyone recognizes the potential of AI, but when it comes to making decisions about where to start, they often feel paralyzed by the sheer number of options. That’s why having a clear framework to evaluate and prioritize opportunities is essential. It gives structure to the decision-making process, helping businesses balance the trade-offs between business value, time-to-market, risk and scalability.
This framework draws on what I’ve learned from working with business leaders, combining practical insights with proven approaches like RICE scoring and cost-benefit analysis, to help businesses focus on what really matters: Delivering results without unnecessary complexity. Why a new framework? Why not use existing frameworks like RICE? While useful, they don’t fully account for AI’s stochastic nature. Unlike traditional products with predictable outcomes, AI is inherently uncertain. The “AI magic” fades fast when it fails, producing bad results, reinforcing biases or misinterpreting intent. That’s why time-to-market and risk are critical. This framework helps bias against failure, prioritizing projects with achievable success and manageable risk. By tailoring your decision-making process to account for these factors, you can set realistic expectations, prioritize effectively and avoid the pitfalls of chasing over-ambitious projects. In the next section, I’ll break down how the framework works and how to apply it to your business. The framework: Four core dimensions Business value: What’s the impact? Start by identifying the potential value of the application. Will it increase revenue, reduce costs or enhance efficiency? Is it aligned with strategic priorities? High-value projects directly address core business needs and deliver measurable results. Time-to-market: How quickly can this project be implemented? Evaluate the speed at which you can go from idea to deployment. Do you have the necessary data, tools and expertise? Is the technology mature enough to execute efficiently? Faster implementations reduce risk and deliver value sooner. Risk: What could go wrong? Assess the risk of failure or negative outcomes. This includes technical risks (will the AI deliver reliable results?), adoption risks (will users embrace the tool?) and compliance risks (are there data privacy or regulatory concerns?). Lower-risk projects are better suited for initial efforts.
Ask yourself: If you can only achieve 80% accuracy, is that OK? Scalability (long-term viability): Can the solution grow with your business? Evaluate whether the application can scale to meet future business needs or handle higher demand. Consider the long-term feasibility of maintaining and evolving the solution as your requirements grow or change. Scoring and prioritization Each potential project is scored across these four dimensions using a simple 1-5 scale: Business value: How impactful is this project? Time-to-market: How realistic and quick is it to implement? Risk: How manageable are the risks involved? (Lower risk scores are better.) Scalability: Can the application grow and evolve to meet future needs? For simplicity, you can use T-shirt sizing (small, medium, large) to score dimensions instead of numbers. Calculating a prioritization score Once you’ve sized or scored each project across the four dimensions, you can calculate a prioritization score: Prioritization score formula. Source: Sean Falconer Here, α (the risk weight parameter) allows you to adjust how heavily risk influences the score: α=1 (standard risk tolerance): Risk is weighted equally with other dimensions. This is ideal for organizations with AI experience or those willing to balance risk and reward. α>1 (risk-averse organizations): Risk has more influence, penalizing higher-risk projects more heavily. This is suitable for organizations new to AI, operating in regulated industries, or in environments where failures could have significant consequences. Recommended values: α=1.5 to α=2. α<1 (high-risk, high-reward approach): Risk has less influence, favoring ambitious, high-reward projects. This is for companies comfortable with experimentation and potential failure. Recommended values: α=0.5 to α=0.9. By adjusting α, you can tailor the prioritization formula to match your organization’s risk tolerance and strategic goals.
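Since the prioritization formula itself appears only as an image in the original article, the sketch below assumes a simple additive form — value, time-to-market and scalability add, while α-weighted risk subtracts. That form matches the behavior the text describes, but it is an assumption, not the author’s exact formula:

```python
# Assumed additive prioritization score: value, time-to-market and
# scalability add; risk subtracts, scaled by the risk weight alpha.
# (The article's own formula is shown as an image, so this exact form
# is an assumption for illustration.)

def prioritization_score(value, time_to_market, scalability, risk, alpha=1.0):
    """All inputs use the article's 1-5 scale; lower risk is better."""
    return value + time_to_market + scalability - alpha * risk

# A safe, quick project vs. an ambitious, risky one, scored by a
# risk-averse organization (alpha = 2): the safe project wins.
safe = prioritization_score(value=4, time_to_market=5, scalability=3, risk=2, alpha=2.0)
ambitious = prioritization_score(value=5, time_to_market=2, scalability=5, risk=4, alpha=2.0)
print(safe, ambitious)  # 8.0 4.0
```

Raising α to 2 here is exactly the risk-averse setting described above: the ambitious project’s score is dragged down by its doubled risk penalty, so it falls behind the safer bet.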
This formula ensures that projects with high business value, reasonable time-to-market, and scalability — but manageable risk — rise to the top of the list. Applying the framework: A practical example Let’s walk through how a business could use this framework to decide which gen AI project to start with. Imagine you’re a mid-sized e-commerce company looking to leverage AI to improve operations and customer experience. Step 1: Brainstorm opportunities Identify inefficiencies and automation opportunities, both internal and external. Here’s a brainstorming session output: Internal opportunities: Automating internal meeting summaries and action items. Generating product descriptions for new inventory. Optimizing inventory restocking forecasts. Performing sentiment analysis and automatic scoring for customer reviews. External opportunities: Creating personalized marketing email campaigns. Implementing a chatbot for
