VentureBeat

From dot-com to dot-AI: How we can learn from the last tech transformation (and avoid making the same mistakes)

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More At the height of the dot-com boom, adding “.com” to a company’s name was enough to send its stock price soaring — even if the business had no real customers, revenue or path to profitability. Today, history is repeating itself. Swap “.com” for “AI,” and the story sounds eerily familiar. Companies are racing to sprinkle “AI” into their pitch decks, product descriptions and domain names, hoping to ride the hype. As reported by Domain Name Stat, registrations for “.ai” domains surged about 77.1% year-over-year in 2024, driven by startups and incumbents alike rushing to associate themselves with artificial intelligence — whether they have a true AI advantage or not. The late 1990s made one thing clear: Using breakthrough technology isn’t enough. The companies that survived the dot-com crash weren’t chasing hype — they were solving real problems and scaling with purpose. AI is no different. It will reshape industries, but the winners won’t be those slapping “AI” on a landing page — they’ll be the ones cutting through the hype and focusing on what matters. The first steps? Start small, find your wedge and scale deliberately. Start small: Find your wedge before you scale One of the most costly mistakes of the dot-com era was trying to go big too soon — a lesson AI product builders today can’t afford to ignore. Take eBay, for example. It began as a simple online auction site for collectibles — starting with something as niche as Pez dispensers. Early users loved it because it solved a very specific problem: It connected hobbyists who couldn’t find each other offline. Only after dominating that initial vertical did eBay expand into broader categories like electronics, fashion and, eventually, almost anything you can buy today. Compare that to Webvan, another dot-com era startup with a much different strategy. 
Webvan aimed to revolutionize grocery shopping with online ordering and rapid home delivery — all at once, in multiple cities. It spent hundreds of millions of dollars building massive warehouses and complex delivery fleets before it had strong customer demand. When growth didn’t materialize fast enough, the company collapsed under its own weight. The pattern is clear: Start with a sharp, specific user need. Focus on a narrow wedge you can dominate. Expand only when you have proof of strong demand. For AI product builders, this means resisting the urge to build an “AI that does everything.” Take, for example, a generative AI tool for data analysis. Are you targeting product managers, designers or data scientists? Are you building for people who don’t know SQL, those with limited experience or seasoned analysts? Each of those users has very different needs, workflows and expectations. Starting with a narrow, well-defined cohort — like technical project managers (PMs) with limited SQL experience who need quick insights to guide product decisions — allows you to deeply understand your user, fine-tune the experience and build something truly indispensable. From there, you can expand intentionally to adjacent personas or capabilities. In the race to build lasting gen AI products, the winners won’t be the ones who try to serve everyone at once — they’ll be the ones who start small, and serve someone incredibly well. Own your data moat: Build compounding defensibility early Starting small helps you find product-market fit. But once you gain traction, your next priority is to build defensibility — and in the world of gen AI, that means owning your data. The companies that survived the dot-com boom didn’t just capture users — they captured proprietary data. Amazon, for example, didn’t stop at selling books. They tracked purchases and product views to improve recommendations, then used regional ordering data to optimize fulfillment. 
By analyzing buying patterns across cities and zip codes, they predicted demand, stocked warehouses smarter and streamlined shipping routes — laying the foundation for Prime’s two-day delivery, a key advantage competitors couldn’t match. None of it would have been possible without a data strategy baked into the product from day one. Google followed a similar path. Every query, click and correction became training data to improve search results — and later, ads. They didn’t just build a search engine; they built a real-time feedback loop that constantly learned from users, creating a moat that made their results and targeting harder to beat. The lesson for gen AI product builders is clear: Long-term advantage won’t come from simply having access to a powerful model — it will come from building proprietary data loops that improve their product over time. Today, anyone with enough resources can fine-tune an open-source large language model (LLM) or pay to access an API. What’s much harder — and far more valuable — is gathering high-signal, real-world user interaction data that compounds over time. If you’re building a gen AI product, you need to ask critical questions early: What unique data will we capture as users interact with us? How can we design feedback loops that continuously refine the product? Is there domain-specific data we can collect (ethically and securely) that competitors won’t have? Take Duolingo, for example. With GPT-4, they’ve gone beyond basic personalization. Features like “Explain My Answer” and AI role-play create richer user interactions — capturing not just answers, but how learners think and converse. Duolingo combines this data with their own AI to refine the experience, creating an advantage competitors can’t easily match. In the gen AI era, data should be your compounding advantage. Companies that design their products to capture and learn from proprietary data will be the ones that survive and lead. 
Conclusion: It’s a marathon, not a sprint

The dot-com era showed us that hype fades fast, but fundamentals endure. The gen AI boom is no different. The companies that thrive won’t be the ones chasing headlines — they’ll be the ones solving real problems, scaling with discipline and building real moats. The future of AI will belong to

Why Microsoft Fabric has already been adopted by 70% of the Fortune 500 

Microsoft is bringing even more database options into the Microsoft Fabric fold, alongside a series of initiatives that aim to help tackle enterprise data complexity. For generations of databases, compute and storage were tightly coupled. That caused all kinds of scalability and data silo issues for enterprises. In 2023, Microsoft Fabric was first introduced as a strategy to help overcome that challenge. The basic idea behind Microsoft Fabric is to be a common data layer across Microsoft’s data and analytics tools. In November 2024, Microsoft Fabric expanded with support for the Azure SQL transactional database platform. Microsoft, just like its rivals at Google and Amazon, has a lot of different database platforms. While Azure SQL is widely used, when it comes to AI there is another, more influential database platform: CosmosDB. At the Build 2025 conference today, Microsoft is announcing that CosmosDB is finally coming to Microsoft Fabric. CosmosDB is among the most critical databases in use today for AI, as it is the database at the foundation of OpenAI’s ChatGPT service. CosmosDB is also getting a boost via integration with Azure AI Foundry, giving agentic AI more direct access to data. There are also a series of additional data updates, including support for Microsoft Copilot in the Power BI business intelligence platform. The SQL Server 2025 database is being previewed, and the DiskANN (Disk Approximate Nearest Neighbor) vector index is being open sourced. These innovations directly address the integration complexity that plagues enterprise data teams when building AI applications. A key focus is to eliminate the data fragmentation that hampers enterprise AI initiatives.
“When I talk to customers, the message I consistently get is, please unify. I’m Chief Information Officer, I don’t want to be the Chief Integration Officer helping translate AI into my competitive advantage,” Arun Ulag, Corporate Vice President for Azure Data at Microsoft, told VentureBeat.

Fabric accelerates enterprise AI by eliminating data silos

Microsoft Fabric, the company’s unified data platform, continues its rapid growth trajectory by bringing previously separate products together in a cohesive ecosystem. “We’re bringing all of our products together and unifying them into a single product, which is Microsoft Fabric,” Ulag said. “In some ways, you can think about Fabric as almost like what we did with Office 30 years ago.” This strategy has clearly resonated with enterprises. Ulag said that Microsoft Fabric now has over 21,000 organizations as paying customers worldwide, including 70% of the Fortune 500. “It’s growing very, very quickly,” he said.

CosmosDB in Fabric eliminates NoSQL infrastructure overhead

The headline addition to Fabric is CosmosDB, Microsoft’s NoSQL document database that powers many high-profile AI applications. “CosmosDB is, by far, often becoming the database of choice for the world’s AI workloads,” Ulag said. “ChatGPT itself is built on CosmosDB… Walmart’s e-commerce store runs on CosmosDB as well.” By bringing CosmosDB into Fabric, Microsoft enables organizations to deploy NoSQL databases without managing complex infrastructure. A key challenge of a disaggregated compute and storage approach is maintaining performance without added latency. Microsoft addresses this with a caching system. “Inside Fabric, we maintain a highly performant cache, which handles all the fast updates that CosmosDB does,” Ulag explained.
“We have a very fast synchronization mechanism that is completely transparent to the customer, where the data is replicated in near real-time into OneLake.” This approach delivers the millisecond response times required for AI applications while eliminating infrastructure management tasks.

Why open source data formats are key to Fabric’s success

While Microsoft connects all its data products through the Fabric strategy, OneLake technology actually stores the data. There is tremendous complexity in having a unified data lake that handles multiple data types and formats across SQL, NoSQL and unstructured data. It’s a challenge that Microsoft is solving with an open source approach. “Microsoft has completely embraced open source data formats, so everything in Fabric, regardless of which workload it is, by default, is always in Apache Parquet and Delta Lake,” Ulag said. “It’s really a unified product, with the unified architecture and a unified business model, with all of the data sitting in a global SaaS data lake, which is OneLake in open source data formats.” This optimization means all Fabric services, from SQL to Power BI to CosmosDB, can access the same underlying data without conversion or duplication, eliminating the traditional performance penalty associated with open formats.

DiskANN open source release brings enterprise-grade vector search to all

Microsoft isn’t just using open source for data formats; it’s also contributing its own code. At Build, Microsoft is announcing that it is open sourcing the DiskANN vector search technology. Microsoft’s decision to open source DiskANN represents a significant contribution to the AI ecosystem, making enterprise-grade vector search capabilities available to all developers. “We have a very, very strong vector capability called DiskANN, it was originally created in Microsoft Research, and it’s used in Bing… built into CosmosDB and built into Fabric,” said Ulag.
DiskANN implements approximate nearest neighbor (ANN) search algorithms optimized for disk-based operations, making it ideal for large-scale vector databases that exceed memory limitations. By open sourcing DiskANN, Microsoft enables developers to implement the same high-performance vector search used by ChatGPT and other leading AI applications. This helps address one of the key challenges in building retrieval-augmented generation (RAG) systems, where finding semantically similar content quickly is essential for grounding AI responses in enterprise data. “We’re allowing everybody to be able to get the benefits of the vector store that we’re using internally,” Ulag said.

Why it matters for enterprise data leaders

For enterprises leading in AI adoption, these announcements enable more sophisticated applications that seamlessly integrate multiple data types. The complexity and the challenges of dealing with data silos aren’t just about different locations but different formats too. The continued evolution of Microsoft Fabric directly addresses that concern in
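
For readers unfamiliar with vector search, the exact nearest-neighbor lookup that ANN indexes such as DiskANN approximate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DiskANN's algorithm or API: the three-dimensional vectors, the document names and the `top_k` helper are all made up for the example, and a real RAG system would obtain its vectors from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, documents, k=2):
    """Exact search: score every stored vector, then keep the k best.

    This linear scan is the baseline an ANN index avoids; it touches
    every vector, which becomes too slow at very large scale.
    """
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy "embeddings" (assumed values, for illustration only).
DOCUMENTS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], DOCUMENTS, k=2))
```

An approximate index gives up the guarantee of always returning the true top results in exchange for inspecting only a small fraction of the stored vectors, which is the trade-off that makes search over collections larger than memory practical.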

Is your AI app pissing off users or going off-script? Raindrop emerges with AI-native observability platform to monitor performance

As enterprises increasingly look to build and deploy generative AI-powered applications and services for internal or external use (employees or customers), one of the toughest questions they face is understanding exactly how well these AI tools are performing out in the wild. In fact, a recent survey by consulting firm McKinsey and Company found that only 27% of 830 respondents said that their enterprises reviewed all of the outputs of their generative AI systems before they went out to users. Unless a user actually writes in with a complaint, how is a company to know if its AI product is behaving as expected and planned? Raindrop, formerly known as Dawn AI, is a new startup tackling the challenge head-on, positioning itself as the first observability platform purpose-built for AI in production, catching errors as they happen and explaining to enterprises what went wrong and why. The goal? Help solve generative AI’s so-called “black box problem.” “AI products fail constantly—in ways both hilarious and terrifying,” wrote co-founder Ben Hylak on X recently. “Regular software throws exceptions. But AI products fail silently.” Raindrop seeks to offer a category-defining tool akin to what observability company Sentry does for traditional software. But while traditional exception tracking tools don’t capture the nuanced misbehaviors of large language models or AI companions, Raindrop attempts to fill the hole. “In traditional software, you have tools like Sentry and Datadog to tell you what’s going wrong in production,” Hylak told VentureBeat in a video call interview last week. “With AI, there was nothing.” Until now — of course.

How Raindrop works

Raindrop offers a suite of tools that allow teams at enterprises large and small to detect, analyze, and respond to AI issues in real time.
The platform sits at the intersection of user interactions and model outputs, analyzing patterns across hundreds of millions of daily events, but doing so with SOC-2 encryption enabled, protecting the data and privacy of users and the company offering the AI solution. “Raindrop sits where the user is,” Hylak explained. “We analyze their messages, plus signals like thumbs up/down, build errors, or whether they deployed the output, to infer what’s actually going wrong.” Raindrop uses a machine learning pipeline that combines LLM-powered summarization with smaller bespoke classifiers optimized for scale.

[Image: Promotional screenshot of Raindrop’s dashboard. Credit: Raindrop.ai]

“Our ML pipeline is one of the most complex I’ve seen,” Hylak said. “We use large LLMs for early processing, then train small, efficient models to run at scale on hundreds of millions of events daily.” Customers can track indicators like user frustration, task failures, refusals, and memory lapses. Raindrop uses feedback signals such as thumbs down, user corrections, or follow-up behavior (like failed deployments) to identify issues. Fellow Raindrop co-founder and CEO Zubin Singh Koticha told VentureBeat in the same interview that while many enterprises relied on evaluations, benchmarks, and unit tests for checking the reliability of their AI solutions, there was very little designed to check AI outputs during production. “Imagine in traditional coding if you’re like, ‘Oh, my software passes ten unit tests. It’s great. It’s a robust piece of software.’ That’s obviously not how it works,” Koticha said. “It’s a similar problem we’re trying to solve here, where in production, there isn’t actually a lot that tells you: is it working extremely well? Is it broken or not?
And that’s where we fit in.” For enterprises in highly regulated industries or for those seeking additional levels of privacy and control, Raindrop offers Notify, a fully on-premises, privacy-first version of the platform aimed at enterprises with strict data handling requirements. Unlike traditional LLM logging tools, Notify performs redaction both client-side via SDKs and server-side with semantic tools. It stores no persistent data and keeps all processing within the customer’s infrastructure. Raindrop Notify provides daily usage summaries and surfaces high-signal issues directly within workplace tools like Slack and Teams—without the need for cloud logging or complex DevOps setups.

Advanced error identification and precision

Identifying errors, especially with AI models, is far from straightforward. “What’s hard in this space is that every AI application is different,” said Hylak. “One customer might build a spreadsheet tool, another an alien companion. What ‘broken’ looks like varies wildly between them.” That variability is why Raindrop treats each AI product it monitors as unique. The platform learns the shape of the data and behavior norms for each deployment, then builds a dynamic issue ontology that evolves over time. “Raindrop learns the data patterns of each product,” Hylak explained. “It starts with a high-level ontology of common AI issues—things like laziness, memory lapses, or user frustration—and then adapts those to each app.” Whether it’s a coding assistant that forgets a variable, an AI alien companion that suddenly refers to itself as a human from the U.S., or even a chatbot that starts randomly bringing up claims of “white genocide” in South Africa, Raindrop aims to surface these issues with actionable context. The notifications are designed to be lightweight and timely.
Teams receive Slack or Microsoft Teams alerts when something unusual is detected, complete with suggestions on how to reproduce the problem. Over time, this allows AI developers to fix bugs, refine prompts, or even identify systemic flaws in how their applications respond to users. “We classify millions of messages a day to find issues like broken uploads or user complaints,” said Hylak. “It’s all about surfacing patterns strong and specific enough to warrant a notification.”

From Sidekick to Raindrop

The company’s origin story is rooted in hands-on experience. Hylak, who previously worked as a human interface designer on visionOS at Apple and in avionics software engineering at SpaceX, began exploring AI after encountering GPT-3 in its early days back in 2020. “As soon as I used GPT-3—just a simple text completion—it blew my mind,” he recalled. “I instantly thought, ‘This is going

At Google I/O, Sergey Brin makes surprise appearance — and declares Google will build the first AGI

At Google I/O this week, amid the usual parade of dazzling product demos and AI-powered announcements, something unusual happened: Google declared war — quietly — in the race to build artificial general intelligence (AGI). “We fully intend that Gemini will be the very first AGI,” said Google co-founder Sergey Brin, who made a surprise, unscheduled appearance at what was originally planned as a solo fireside chat with Demis Hassabis, CEO of DeepMind, Google’s AI research powerhouse. The conversation, hosted by Big Technology founder Alex Kantrowitz, pressed both men on the future of intelligence, scale, and the evolving definition of what it means for a machine to think. The moment was fleeting, but unmistakable. In a field where most players hedge their talk of AGI with caveats — or avoid the term altogether — Brin’s comment stood out. It marked the first time a Google executive has explicitly stated an intent to win the AGI race, a contest often associated more with Silicon Valley rivals like OpenAI and Elon Musk than with the search giant. Yet Brin’s boldness contrasted sharply with the caution expressed by Hassabis, a former neuroscientist and game developer whose vision has long steered DeepMind’s approach to AI. While Brin framed AGI as an imminent milestone and competitive objective, Hassabis called for clarity, restraint, and scientific precision. “What I’m interested in, and what I would call AGI, is really a more theoretical construct, which is, what is the human brain as an architecture able to do?” Hassabis explained. “It’s clear to me today, systems don’t have that. And then the other thing that why I think it’s sort of overblown the hype today on AGI is that our systems are not consistent enough to be considered to be fully General.
Yet they’re quite general.” This philosophical tension between Brin and Hassabis — one chasing scale and first-mover advantage, the other warning of overreach — may define Google’s future as much as any product launch. Inside Google’s AGI timeline: Why Brin and Hassabis disagree on when superintelligence will arrive The contrast between the two executives became even more apparent when Kantrowitz posed a simple question: AGI before or after 2030? “Before,” Brin answered without hesitation. “Just after,” Hassabis countered with a smile, prompting Brin to joke that Hassabis was “sandbagging.” This five-second exchange encapsulates the subtle but significant tension in Google’s AGI strategy. While both men clearly believe powerful AI systems are coming this decade, their different timelines reflect fundamentally different approaches to the technology’s development. Hassabis took pains throughout the conversation to establish a more rigorous definition of AGI than is commonly used in industry discussions. For him, the human brain serves as “an important reference point, because it’s the only evidence we have, maybe in the universe, that general intelligence is possible.” True AGI, in his view, would require showing “your system was capable of doing the range of things even the best humans in history were able to do with the same brain architecture. It’s not one brain but the same brain architecture. So what Einstein did, what Mozart was able to do, what Marie Curie and so on.” By contrast, Brin’s focus appeared more oriented toward competitive positioning than scientific precision. When asked about his return to day-to-day technical work at Google, Brin explained: “As a computer scientist, it’s a very unique time in history, like, honestly, anybody who’s a computer scientist should not be retired right now. 
Should be working on AI.” DeepMind’s scientific roadmap clashes with Google’s competitive AGI strategy Despite their different emphases, both leaders outlined similar technical challenges that need to be solved on the path to more advanced AI. Hassabis identified several specific barriers, noting that “to get all the way to something like AGI, I think may require one or two more new breakthroughs.” He pointed to limitations in current systems’ reasoning abilities, creative invention, and the accuracy of their “world models.” “For me, for something to be called AGI, it would need to be consistent, much more consistent across the board than it is today,” Hassabis explained. “It should take, like, a couple of months for maybe a team of experts to find a hole in it, an obvious hole in it, whereas today, it takes an individual minutes to find that.” Both executives agreed on the importance of “thinking” capabilities in AI systems. Google’s newly announced “deep think” feature, which allows AI models to engage in parallel reasoning processes that check each other, represents a step in this direction. “We’ve always been big believers in what we’re now calling this thinking paradigm,” Hassabis said, referencing DeepMind’s early work on systems like AlphaGo. “If you look at a game like chess or go… we had versions of AlphaGo and AlphaZero with the thinking turned off. So it was just the model telling you its first idea. And, you know, it’s not bad. It’s maybe like master level… But then if you turn the thinking on, it’s been way beyond World Champion level.” Brin concurred, adding: “Most of us, we get some benefit by thinking before we speak. 
And although not always, I was reminded to do that, but I think that the AIs obviously, are much stronger once you add that capability.” Beyond scale: How Google is betting on algorithmic breakthroughs to win the AGI race When pressed on whether scaling current models or developing new algorithmic approaches would drive progress, both leaders emphasized the need for both — though with slightly different emphases. “I’ve always been of the opinion you need both,” Hassabis said. “You need to scale to the maximum the techniques that you know about. You want to exploit them to the limit, whether that’s data or compute, scale, and at the same time, you want to spend a bunch of effort on what’s coming next.” Brin agreed but added a notable historical perspective: “If

Microsoft just taught its AI agents to talk to each other—and it could transform how we work

Microsoft announced a significant expansion of its Copilot Studio platform at Build 2025 today, introducing multi-agent systems that allow different AI agents to collaborate on complex business tasks, along with new developer tools, security enhancements, and integration with WhatsApp. The suite of announcements represents Microsoft’s most ambitious attempt yet to make AI agents more practical for enterprise use, addressing key limitations that have hindered broader adoption of agent technology in business settings. “We’re seeing from customers doing large-scale production rollouts that governance and observability become even more critical,” said Ray Smith, VP of AI Agents at Microsoft, in an exclusive interview with VentureBeat. “The beauty of Copilot Studio is that we provide a managed infrastructure framework with built-in lifecycle management and comprehensive governance capabilities.”

How Microsoft’s new multi-agent system is transforming enterprise workflows

At the heart of the announcements is Microsoft’s new multi-agent system, which enables agents built with Copilot Studio, Microsoft 365, Azure AI Agents Service, and Azure Fabric to work together, delegating tasks to one another to complete complex business processes. This capability addresses a fundamental challenge organizations have faced when implementing agent technology. “Creating a reliable process within a single agent is extremely challenging,” Smith explained. “Breaking it down into multiple agents not only improves maintainability and simplifies solution building, but it also significantly enhances overall reliability.” The system enables scenarios such as a Copilot Studio agent pulling sales data from a CRM, handing it to a Microsoft 365 agent to draft a proposal in Word, and then triggering another agent to schedule follow-ups in Outlook.
Microsoft is also emphasizing interoperability through support for the agent-to-agent protocol recently announced by Google, potentially enabling cross-platform agent communication. ‘Computer use’ feature brings human-like UI interaction to AI agents without API dependencies Another key announcement is “computer use” for Copilot Studio agents, which allows agents to interact with desktop applications and websites by controlling interfaces directly — clicking buttons, navigating menus, and typing in fields — even when APIs aren’t available. “When APIs aren’t available, this feature can interact directly with user interfaces — whether desktop applications, browsers, or other platforms,” Smith said. “It provides what we call ‘no-cliffs extensibility’ and operates based on intent rather than pixel coordinates, unlike traditional desktop automation. This goal-oriented approach makes it significantly more robust.” This capability is currently available through Microsoft’s Frontier program for eligible customers with 500,000+ Copilot Studio messages and a U.S.-based environment. Customizable model selection and Python-powered analytics supercharge enterprise AI solutions Microsoft is giving organizations more flexibility with their AI models by enabling them to bring custom models from Azure AI Foundry into Copilot Studio. This includes access to over 1,900 models, including the latest from OpenAI GPT-4.1, Llama, and DeepSeek. “Start with off-the-shelf models because they’re already fantastic and continuously improving,” Smith said. “Companies typically choose to fine-tune these models when they need to incorporate specific domain language, unique use cases, historical data, or customer requirements. 
This customization ultimately drives either greater efficiency or improved accuracy.” The company is also adding a code interpreter feature that brings Python capabilities to Copilot Studio agents, enabling data analysis, visualization, and complex calculations without leaving the Copilot Studio environment. Smith highlighted financial applications as a particular strength: “In financial analysis and services, we’ve seen a remarkable breakthrough over the past six months,” Smith said. “Deep reasoning models, powered by reinforcement learning, can effectively self-verify any process that produces quantifiable outputs.” He added that these capabilities excel at “complex financial analysis where users need to generate code for creating graphs, producing specific outputs, or conducting detailed financial assessments.” WhatsApp integration extends AI agent reach to billions of global users Starting in early July, organizations will be able to publish Copilot Studio agents to WhatsApp, enabling them to reach customers through one of the world’s most popular messaging platforms. “WhatsApp is obviously a key channel. Globally, it’s pretty huge,” Smith explained. “So that became a high priority for us through the various channels and integrations, to unlock that as a way for end users to interact at a time that suits them.” This addition, along with a new SharePoint channel (now generally available), significantly expands the reach of custom agents beyond Microsoft’s own ecosystem. Microsoft bridges the developer experience gap with VS Code extension and enhanced admin controls For professional developers, Microsoft is launching a Visual Studio Code extension for Copilot Studio, bringing familiar tooling and workflows to agent development. The extension provides features like IntelliSense, color formatting, and “find all references” functionality, enabling developers to edit agents directly from within Visual Studio Code. 
IT administrators gain new tools as well, including a centralized “Agents & connectors” page in the Microsoft 365 Admin Center for managing the full lifecycle of Copilot agents. This interface allows admins to view all agents, filter by metadata, assign sensitivity labels, manage connector behavior, and block or delete agents that violate security policies. AI-powered agent discovery solves the ‘which agent do I need?’ problem for end users As organizations develop more specialized agents, Microsoft is addressing the challenge of agent discovery with new in-conversation agent recommendations. This feature suggests the most relevant agent based on a user’s needs, dynamically recommending handoffs when appropriate. “It’s fundamentally like an agentic RAG pattern,” Smith explained, referring to retrieval-augmented generation. “It uses vector databases and indexing based on the description of the agent. When you describe the task, it’s looking up and going, ‘Hey, I think you should look at this agent.’” The system only recommends agents the user has access to, respecting existing permissions structures. Overcoming enterprise AI implementation hurdles: Microsoft’s roadmap for success Despite the new capabilities, organizations still face significant challenges when implementing multi-agent systems. Smith recommends focusing on specific use cases rather than attempting comprehensive transformations from the outset. “Best practices have been customers who’ve focused on a use case or problem to start with, broken that down
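As a rough illustration of the agent-recommendation pattern Smith describes (index each agent by its description, match the user's task against that index, and only suggest agents the user may access), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the agent names, the descriptions and the bag-of-words "embedding" are hypothetical stand-ins, not Copilot Studio's actual implementation, which would use a real vector model and database.

```python
import math
from collections import Counter

# Hypothetical agent catalog: name -> description used for indexing.
AGENTS = {
    "expense-agent": "file and approve travel expense reports",
    "hr-agent": "answer questions about vacation policy and benefits",
    "it-agent": "reset passwords and troubleshoot laptop problems",
}

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def recommend_agent(task, allowed):
    """Return the best-matching agent the user may access, or None."""
    task_vec = embed(task)
    scored = [
        (cosine(task_vec, embed(desc)), name)
        for name, desc in AGENTS.items()
        if name in allowed  # respect existing permissions
    ]
    if not scored:
        return None
    score, name = max(scored)
    return name if score > 0 else None

print(recommend_agent("reset my laptop password", {"hr-agent", "it-agent"}))
```

Filtering by `allowed` before scoring means an agent the user cannot access is never surfaced, even when it would be the closest match.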


Google finally launches NotebookLM mobile app at I/O: hands-on, first impressions

Among the numerous announcements search giant Google made this week during the 2025 edition of its annual I/O developer conference (short for “input/output”), one of the most notable for enterprise leaders is that it is finally bringing its hit conversational AI application NotebookLM, previously available only on the web, to the Google Play and Apple App Stores.

NotebookLM, for those who aren’t already users, is a free Google AI-powered service that began as a way for users to upload and query documents via text, only to become an exceedingly popular AI podcast generator, complete with two extremely personable default AI host voices (one masculine and one feminine) that banter and intonate very similarly to real people about whatever subjects the user wants, guided by user-uploaded PDFs, links and even YouTube videos. It was first released on the web back in July 2023 (and made generally available in the U.S. in December of that year), so the mobile app has been a long time coming. So how does the mobile version compare? Read on to find out.

Strong user demand drives mobile release

The mobile app addresses one of the most frequent user requests: a way to use NotebookLM beyond the desktop. According to a blog post by Biao Wang, Product Manager at Google Labs, many users wanted the ability to listen to summaries, interact with AI hosts and share content to NotebookLM directly from their phones. The release has seen strong early momentum. Within 24 hours of launch, the app reached the number two position in the Productivity category of the Apple App Store and placed ninth overall. Meanwhile, traffic analytics firm Similarweb reported that monthly visits to NotebookLM’s platform increased by 56% over the past six months to more than 48 million, citing a strong adoption rate.

Mobile-first features focus on offline and multitasking The NotebookLM mobile app introduces several features tailored for on-the-go usage. Users can download Audio Overviews—AI-generated podcast discussions of uploaded documents or media—for offline playback. This is a huge unlock for those wanting to take audio summaries with them on hikes, on the subway, or anywhere cell and wireless service is lacking. Users can continue listening without interruption, whether commuting or dealing with limited connectivity. Additionally, Audio Overviews support background playback, enabling users to listen while using other apps or performing other tasks on their devices. Interactive AI hosts add dynamic engagement The mobile experience has a new layer of interactivity. When connected to the internet, users can tap a “Join” button during playback to ask the AI hosts questions, redirect the summary flow, or even request light-hearted content. This real-time engagement allows for a more personalized and flexible understanding of complex materials. Easy content sharing from any app The app also streamlines the process of adding new source materials. Users browsing a website, viewing a PDF, or watching a YouTube video can share that content directly to NotebookLM via their phone’s native share menu. The initial version supports three input types—websites, PDFs and YouTube links—with more formats planned for future updates. Available now on iOS and Android NotebookLM’s app is available immediately for download. It supports iPhones and iPads running iOS 17 or later, as well as Android phones and tablets running Android 10 or higher. The version released is described as a Minimum Viable Product (MVP), with ongoing improvements expected based on user input. The development team has encouraged feedback through social media and its community channels, particularly around which features from the web app should be prioritized for mobile use. 
Video Overviews on the way In addition to the launch, NotebookLM previewed a new feature: Video Overviews. The feature will arrive soon in English, allowing users to generate brief, animated video summaries from various content types, including PDFs and images. The mobile release represents a key step in NotebookLM’s ongoing product evolution. It aims to make information analysis and comprehension accessible in more contexts, whether users are at their desks or on the go. Hands-on experience with NotebookLM iOS app shows promise and pitfalls My initial hands-on tests of the iOS version of NotebookLM revealed a mix of promise and performance gaps. The core flow—pasting a URL to add it as a source, generating summaries and interacting with the AI—is intuitive and consistent with the web version’s focus on simplifying complex content. For instance, a source titled Explained: Neural Networks loaded successfully, created an overview with 7 sources, and allowed playback of an Audio Overview with standard controls like rewind, fast forward and playback speed adjustments. However, the experience wasn’t entirely smooth. The app shows inconsistent behavior when trying to add URLs as sources. In multiple cases, adding a website URL failed, resulting in the error message “Could not add source.” This occurred even with typical educational content (e.g., pages about how large language models work). The app did not provide details about why these URLs failed, whether due to unsupported formats, site restrictions or network issues. This lack of feedback makes troubleshooting difficult for users. While switching between notebooks, the app occasionally displayed a message stating “Could not load notebooks.” This occurred while navigating between previously created notebooks and attempting to create or access new ones. 
The loading message “This may take 10–20 seconds” appeared frequently but didn’t always resolve, potentially leaving users uncertain about whether an operation was actually successful or not. UI polish and real-time response The interface itself is visually clean and consistent with Google’s Material Design patterns. However, subtle usability delays—such as loading animations persisting for longer than necessary or summary generation pauses—detract from the sense of real-time responsiveness, especially for users familiar with the web app’s speed. Despite the above issues, some key promises of the mobile app were validated in use. My Audio Overview of Microsoft’s Build 2025 keynote address played back reliably in offline mode. The app


Reduce model integration costs while scaling AI: LangChain’s open ecosystem delivers where closed vendors can’t

LangChain, one of the leaders in the AI framework and orchestration space, plans to remain committed to the open source ecosystem, particularly as it reinforces its vendor-agnostic stance.

Harrison Chase, LangChain co-founder and CEO, told VentureBeat that the success of its different platforms can be attributed to developers demanding model choice rather than staying locked into a closed provider.

“The power of the LangChain framework is in its integrations and the ecosystem,” Chase said. “The scale of the ecosystem is enormous, and much of that is made possible by the framework being open source.”

Chase said LangChain downloads reached 72.3 million last month, a figure he compared favorably with competitors like OpenAI’s Agents SDK. He added that the LangChain Python and JS frameworks “have 4,500 contributors, that’s more contributors than Spark.”

LangChain, founded in 2022, has grown beyond its initial framework, which helped developers build AI applications. In February last year, it released the testing and evaluation platform LangSmith, a second framework called LangGraph and LangGraph Platform to help deploy autonomous agents.

LangChain remained open source and agnostic to vendors and models throughout its growth. For example, it has partnered with multiple companies, like Google and Cisco, around agent interoperability.

As enterprises began experimenting with AI agents, Chase said LangChain saw an opportunity to offer deployment options that considered developer choices.

“Over the past year and a half or so, more and more enterprises and companies are just looking to go into production. So we matured all of our offerings, not just the open source LangChain, but all of our offerings collectively as a company to meet that demand and make it as easy as possible to build agentic applications,” he said.

LangGraph Platform extends open-source offerings

One of LangChain’s new open-source platforms is the LangGraph Platform, which became generally available this week. The LangGraph Platform lets developers manage and begin deploying long-lasting or stateful agents. These agents build on what Chase calls “ambient agents,” or agents that work in the background and are triggered by certain events.

“We’ve tried to focus a lot on some of the harder infrastructure problems that surround these agents,” Chase said. “LangGraph is good for long-running stateful agents, so if you’re deploying a simple application, you don’t want to use LangGraph Platform.”

He added that the company wants to bet big on ambient or long-running agents, finding this more independent, autonomous kind of agent a more interesting infrastructure challenge.

Through the LangGraph Platform, organizations can deploy agents with one-click deployment, horizontal scaling to handle “bursty, long-running traffic,” a persistence layer to support agentic memory, API endpoints for customization and native access to LangGraph Studio to debug any agents.

Organizations might find themselves bringing more and more agents online. LangGraph Platform includes a management console that lays out all the agents currently deployed and lets users find agents, reuse common agent architectures and create multi-agent architectures.

“One of the big benefits of LangGraph is that it gives the builder of the agent full control over the cognitive architecture. If there’s an LLM [large language model] action that must be done right, a good tool you have to enforce quality is to create an in-the-loop evaluation directly in your LangGraph app,” Chase said.

Chase added that with LangGraph, developers can access “a good orchestration framework” to build agents and bring these reliable agents into LangGraph Platform for deployment.

During the beta test, Chase said over 370 teams used LangGraph Platform.
LangChain offers three tiers to use LangGraph Platform, with pricing dependent on how developers plan to host the service.

The broader LangChain open-source ecosystem

For Chase, one of LangChain’s strengths is its ability to create an entire application and agent development ecosystem.

LangSmith, the company’s testing and observability platform, works with LangGraph and LangGraph Platform to track agent metrics. Since many agents built and run with LangGraph Platform are longer-running, enterprises need to constantly check whether they continue to perform to specifications.

Chase boasted that LangGraph “is the most widely adopted agent framework” and claimed it’s downloaded more than AutoGen from Microsoft and the CrewAI agentic platform, once again crediting open source for its success.

“LangGraph is most often selected by teams that need to build end-user facing or highly trafficked agents (LinkedIn, Uber, GitLab) – the reason is that you won’t scale off of LangGraph because it’s very low-level and controllable, which is needed for reliable agents. CrewAI and Autogen are often used because they have a less steep learning curve – these frameworks make more decisions for the user, so you’re trading ease of adoption for power,” he said.


Google’s Jules aims to out-code Codex in battle for the AI developer stack

Vibe coding and the growth of AI-powered coding platforms have given rise to yet another battleground among tech companies.

In December, Google released Jules, an autonomous coding agent that can fix bugs asynchronously, as an experiment. During Google I/O, however, Google announced that Jules will now be available in beta.

With the broader release of Jules, Google positions itself as a strong competitor against a rising number of AI coding assistants designed to write, check and fix code autonomously.

Josh Woodward, vice president of Google Labs, told reporters in a briefing that Jules “will be available to help developers fix bugs, create tests, consult documentation all happening in the background.”

“People are describing apps into existence,” Woodward said. “This started out as an asynchronous coding agent with the idea that, what if you created a way where you could assign tasks to this agent for the things you didn’t want to do?”

Jules will be integrated into GitHub and uses Google’s Gemini 2.5 Pro. During the public beta phase, developers can access Jules for free but with usage limits.

Asynchronous and parallel

Jules works asynchronously, allowing developers to assign it a task while they work separately on something else. It runs tasks inside a virtual machine, shows tasks and their reasoning and even offers audio summaries.

But Jules is not the only asynchronous and parallel task coding agent on the market, nor was it the only one announced in May.

OpenAI surprised the industry by releasing a research preview of its coding agent Codex, after rumors circulated that the company would buy the coding startup Windsurf. Codex began life as a coding model but has since transformed into a coding agent able to write, fix bugs and answer codebase questions in a separate sandbox.
Codex was also behind one of the first code completion assistants, GitHub Copilot. GitHub announced at Microsoft Build this week a GitHub Copilot Agent, doing much of the same asynchronous work as Codex and Jules.  Social media is abuzz with interest in the upcoming arms race around coding agents, even before Jules and Codex were fully released to the public.  Yeah, I think Jules beats Codex by a lot. Only tested on a my lazy prompt so far “Analyze the project and write unit tests to cover 100%”. – Jules plans first and creates its own tasks. Codex does not. That’s major.– Jules VMs have internet pic.twitter.com/DCGPKwiNiP — Daniel Nakov (@dnak0v) May 19, 2025 @Google ai agent Jules just made her first contribution to a project I’m working onFeedback: I really wish there was a way where I could select files or directories where I would want the AI to focus on pic.twitter.com/z5yMaF2ERb — Nicolas (@NicolasSerna314) May 20, 2025 Seems like Coding agents that can submit PRs are the new shiny objects. Codex from OpenAI, Copilot coding agent from GitHub/Microsoft, Jules from Google, Claude and xAI when? — Samuel (@SamuelSurfboard) May 20, 2025 These more autonomous coding platforms follow the growth of “vibe coding,” where code and applications are generated mostly through prompting rather than hard coding written by humans. The entrance of Big Tech companies like Google and OpenAI into this arena brings coding agents even more to the forefront of the AI arms race.  More AI-powered code Even inside Google, Jules is not the only AI coding platform for building applications. Google offers Code Assist, AI Studio, Jules and Firebase.  Firebase, announced in April, allows non-coders to build applications and add AI features. Google updated the platform, adding a new AI Workspace for Firebase Studio and Firebase AI Logic to monitor AI usage.  Firebase Studio, powered by Gemini 2.5 Pro, allows users to build more sophisticated applications. 
Firebase AI Logic offers developers the means to add features to the app’s backend, like authentication and identity. It also allows people to check token usage or resolve latency issues without needing a third-party orchestration program.

Jeanine Banks, vice president and general manager for Developer X and head of Developer Relations at Google, told VentureBeat that Firebase differentiates itself from Jules and other Google coding products by being the first place people new to coding can experiment with making their own AI applications.

“Google offers many wonderful tools to help you with specialized parts of your stack. So, for example, you can use Google AI Studio, which helps in experimenting with your AI inference to figure out the best optimized prompts,” Banks said. “But Firebase is the single place that integrates all of those things together, and it’s a single place for full-stack developers and professionals, but also creators who are vibe coding.”


Guardian agents: New approach could reduce AI hallucinations to below 1%

Hallucination is a risk that limits the real-world deployment of enterprise AI.

Many organizations have attempted to solve the challenge of hallucination reduction with various approaches, with varying degrees of success. Among the many vendors that have been working for the last several years to reduce the risk is Vectara. The company got its start as an early pioneer in grounded retrieval, which is better known today as retrieval-augmented generation (RAG). An early promise of RAG was that it could help reduce hallucinations by sourcing information from provided content.

While RAG is helpful as a hallucination reduction approach, hallucinations still occur even with RAG. Among existing industry solutions, most technologies focus on detecting hallucinations or implementing preventative guardrails. Vectara has unveiled a fundamentally different approach: automatically identifying, explaining and correcting AI hallucinations through guardian agents inside a new service called the Vectara Hallucination Corrector.

The guardian agents are functionally software components that monitor and take protective actions within AI workflows. Instead of just applying rules inside of an LLM, the promise of guardian agents is to apply corrective measures in an agentic AI approach that improves workflows. Vectara’s approach makes surgical corrections while preserving the overall content and providing detailed explanations of what was changed and why.

The approach appears to deliver meaningful results. According to Vectara, the system can reduce hallucination rates for smaller language models (under 7 billion parameters) to less than 1%.

“As enterprises are implementing more agentic workflows, we all know that hallucinations are still an issue with LLMs and how that is going to exponentially amplify the negative impact of making mistakes in an agentic workflow is kind of scary for enterprises,” Eva Nahari, chief product officer at Vectara told VentureBeat in an exclusive interview. “So what we have set out as a continuation of our mission to build out trusted AI and enable the full potential of gen AI for enterprise… is this new track of releasing guardian agents.” The enterprise AI hallucination detection landscape It’s not surprising that every enterprise wants to have accurate AI. It’s also not surprising that there are many different options for reducing hallucinations. RAG approaches help reduce hallucinations by providing grounded responses from content, but they can still yield inaccurate results. One of the more interesting implementations of RAG is one from the Mayo Clinic, which uses a ‘reverse RAG‘ approach to limit hallucinations. Improving data quality and how vector data embeddings are created is another approach to improving accuracy. Among the many vendors working on that approach is database vendor MongoDB, which recently acquired advanced embedding and retrieval model vendor Voyage AI. Guardrails, available from many vendors, including Nvidia and AWS, among others, help detect risky outputs and can help with accuracy in some cases. IBM actually has a set of its Granite open-source models known as Granite Guardian that directly integrates guardrails as a series of fine-tuning instructions to reduce risky outputs. Another potential solution is using reasoning to validate output. AWS claims that its Bedrock Automated Reasoning approach catches 100% of hallucinations, though that claim is difficult to validate. 
Startup Oumi offers another approach: validating claims made by AI on a sentence-by-sentence basis against source materials with an open-source technology called HallOumi.

How the guardian agent approach is different

While there is merit to all the other approaches to hallucination reduction, Vectara claims its approach is different.

Rather than just identifying if a hallucination is present and then either flagging or rejecting the content, the guardian agent approach actually corrects the issue. Nahari emphasized that the guardian agent takes action.

“It’s not just a learning on something,” she said. “It’s taking an action on behalf of someone, and that makes it an agent.”

The technical mechanics of guardian agents

The guardian agent is a multi-stage pipeline rather than a single model.

Suleman Kazi, machine learning tech lead at Vectara, told VentureBeat that the system comprises three key components: a generative model, a hallucination detection model and a hallucination correction model. This agentic workflow allows for dynamic guardrailing of AI applications, addressing a critical concern for enterprises hesitant to fully embrace generative AI technologies.

Rather than wholesale elimination of potentially problematic outputs, the system can make minimal, precise adjustments to specific terms or phrases. Here’s how it works:

1. A primary LLM generates a response.
2. Vectara’s hallucination detection model (Hughes Hallucination Evaluation Model) identifies potential hallucinations.
3. If hallucinations are detected above a certain threshold, the correction agent activates.
4. The correction agent makes minimal, precise changes to fix inaccuracies while preserving the rest of the content.
5. The system provides detailed explanations of what was hallucinated and why.

Why nuance matters for hallucination detection

The nuanced correction capabilities are critically important.
Understanding the context of the query and source materials can distinguish between an accurate answer and a hallucination. When discussing the nuances of hallucination correction, Kazi provided a specific example to illustrate why blanket hallucination correction isn’t always appropriate. He described a scenario where an AI is processing a science fiction book that describes the sky as red, instead of the typical blue. In this context, a rigid hallucination correction system might automatically “correct” the red sky to blue, which would be incorrect for the creative context of a science fiction narrative.  The example was used to demonstrate that hallucination correction needs contextual understanding. Not every deviation from expected information is a true hallucination – some are intentional creative choices or domain-specific descriptions. This highlights the complexity of developing an AI system that can distinguish between genuine errors and purposeful variations in language and description. Alongside its guardian agent, Vectara is releasing HCMBench, an open-source evaluation toolkit for hallucination correction models. This benchmark provides standardized ways to evaluate how well different approaches correct hallucinations. The goal of the benchmark is to help the community at large and to enable enterprises to evaluate the accuracy of hallucination
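
The generate, detect, correct and explain flow described in this article can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Vectara's implementation: the `generate`, `detect` and `correct` functions below are hypothetical stand-ins (a hard-coded answer, a word-overlap score and a positional word swap) for the generative, detection and correction models, and the 0.1 threshold is arbitrary.

```python
from dataclasses import dataclass

@dataclass
class GuardedResponse:
    text: str          # final answer returned to the user
    corrected: bool    # whether the correction step ran
    explanation: str   # what was changed and why

def generate(question, source):
    # Hypothetical stand-in for the primary LLM; returns one wrong detail.
    return "The warranty lasts 5 years."

def detect(answer, source):
    # Hypothetical stand-in for a detection model: the share of answer
    # words that never appear in the source, plus the offending words.
    src = set(source.lower().replace(".", "").split())
    words = answer.lower().replace(".", "").split()
    unsupported = [w for w in words if w not in src]
    return len(unsupported) / len(words), unsupported

def correct(answer, source, unsupported):
    # Hypothetical stand-in for a correction model: swap each unsupported
    # word for the source word in the same position, leaving the rest as is.
    a, s = answer.split(), source.split()
    fixed = [
        s[i] if w.strip(".").lower() in unsupported and i < len(s) else w
        for i, w in enumerate(a)
    ]
    return " ".join(fixed), f"replaced unsupported terms {unsupported} using the source"

def guard(question, source, threshold=0.1):
    answer = generate(question, source)
    score, unsupported = detect(answer, source)
    if score <= threshold:
        return GuardedResponse(answer, False, "no correction needed")
    fixed, why = correct(answer, source, unsupported)
    return GuardedResponse(fixed, True, why)

result = guard("How long is the warranty?", "The warranty lasts 2 years.")
print(result.text, "|", result.explanation)
```

Only the unsupported term is changed; the rest of the answer passes through untouched, which mirrors the "minimal, precise changes" idea. A real detector would also need the contextual judgment Kazi describes, which this word-overlap score does not have.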


Inside Google’s AI leap: Gemini 2.5 thinks deeper, speaks smarter and codes faster

Google is moving closer to its goal of a “universal AI assistant” that can understand context, plan and take action.

Today at Google I/O, the tech giant announced enhancements to its Gemini 2.5 Flash (it’s now better across nearly every dimension, including benchmarks for reasoning, code and long context) and 2.5 Pro, including an experimental enhanced reasoning mode, ‘Deep Think,’ that allows Pro to consider multiple hypotheses before responding.

“This is our ultimate goal for the Gemini app: An AI that’s personal, proactive and powerful,” Demis Hassabis, CEO of Google DeepMind, said in a press pre-brief.

‘Deep Think’ scores impressively on top benchmarks

Google announced Gemini 2.5 Pro (what it considers its most intelligent model yet, with a one-million-token context window) in March, and released its “I/O” coding edition earlier this month (with Hassabis calling it “the best coding model we’ve ever built!”).

“We’ve been really impressed by what people have created, from turning sketches into interactive apps to simulating entire cities,” said Hassabis.

He noted that, based on Google’s experience with AlphaGo, AI model responses improve when they’re given more time to think. This led DeepMind scientists to develop Deep Think, which uses Google’s latest cutting-edge research in thinking and reasoning, including parallel techniques.

Deep Think has shown impressive scores on the hardest math and coding benchmarks, including the 2025 USA Mathematical Olympiad (USAMO). It also leads on LiveCodeBench, a difficult benchmark for competition-level coding, and scores 84.0% on MMMU, which tests multimodal understanding and reasoning.
Hassabis added, “We’re taking a bit of extra time to conduct more frontier safety evaluations and get further input from safety experts.” (Meaning: For now, it is available to trusted testers via the API for feedback before the capability is made widely available.)

Overall, the new 2.5 Pro leads popular coding leaderboard WebDev Arena with an Elo score of 1420 (intermediate to proficient); Elo ratings measure the relative skill level of players in two-player games like chess. It also leads across all categories of the LMArena leaderboard, which evaluates AI based on human preference.

Important updates to Gemini 2.5 Pro, Flash

Also today, Google announced an enhanced 2.5 Flash, considered its workhorse model designed for speed, efficiency and low cost. 2.5 Flash has been improved across the board in benchmarks for reasoning, multimodality, code and long context. Hassabis noted that it’s “second only” to 2.5 Pro on the LMArena leaderboard. The model is also more efficient, using 20 to 30% fewer tokens.

Google is making final adjustments to 2.5 Flash based on developer feedback; it is now available for preview in Google AI Studio, Vertex AI and in the Gemini app. It will be generally available for production in early June.

Google is bringing additional capabilities to both Gemini 2.5 Pro and 2.5 Flash, including native audio output to create more natural conversational experiences, text-to-speech to support multiple speakers, thought summaries and thinking budgets.

With native audio output (in preview), users can steer Gemini’s tone, accent and style of speaking (think: directing the model to be melodramatic or maudlin when telling a story). Like Project Mariner, the model is also equipped with tool use, allowing it to search on users’ behalf.
Other experimental early voice features include affective dialogue, which gives the model the ability to detect emotion in user voice and respond appropriately; proactive audio that allows it to tune out background conversations; and thinking in the Live API to support more complex tasks.

New multiple-speaker features in both Pro and Flash support more than 24 languages, and the models can quickly switch from one dialect to another. “Text-to-speech is expressive and can capture subtle nuances, such as whispers,” Koray Kavukcuoglu, CTO of Google DeepMind, and Tulsee Doshi, senior director for product management at Google DeepMind, wrote in a blog post published today.

Further, 2.5 Pro and Flash now include thought summaries in the Gemini API and Vertex AI. These “take the model’s raw thoughts and organize them into a clear format with headers, key details, and information about model actions, like when they use tools,” Kavukcuoglu and Doshi explain. The goal is to provide a more structured, streamlined format for the model’s thinking process and give users interactions with Gemini that are simpler to understand and debug.

Like 2.5 Flash, Pro is also now equipped with ‘thinking budgets,’ which give developers the ability to control the number of tokens a model uses to think before it responds, or, if they prefer, turn its thinking capabilities off altogether. This capability will be generally available in coming weeks.

Finally, Google has added native SDK support for Model Context Protocol (MCP) definitions in the Gemini API so that models can more easily integrate with open-source tools.

As Hassabis put it: “We’re living through a remarkable moment in history where AI is making possible an amazing new future. It’s been relentless progress.”
