VentureBeat

Windsurf: OpenAI’s potential $3B bet to drive the ‘vibe coding’ movement

‘Vibe coding’ is the term of the moment, referring to the increasingly accepted use of AI and natural language prompts for basic code completion. OpenAI is reportedly looking to get in on the movement — and own more of the full-stack coding experience — as it eyes a $3 billion acquisition of Windsurf (formerly Codeium). If the deal materializes, it would be OpenAI’s most expensive acquisition to date.

The news comes on the heels of the company’s release of o3 and o4-mini, which are capable of “thinking with images,” or more intuitively understanding low-quality sketches and diagrams. This development follows the launch of the GPT-4.1 model family. The AI company nobody can stop talking about also recently raised a $40 billion funding round.

Industry watchers and insiders have been abuzz about the potential deal, as it could not only make OpenAI an even bigger industry player than it already is, but also further accelerate the cultural adoption of vibe coding.

“Windsurf could be game-changing for OpenAI because it is one of the tools that developers are racing to,” Lisa Martin, research director at The Futurum Group, told VentureBeat. “This deal could solidify OpenAI as a developer’s best friend.”

A bet on vibe coding?

AI-assisted coding isn’t a new concept by a long shot, but “vibe coding” — a term coined by OpenAI cofounder Andrej Karpathy — is a relatively new approach that leverages generative AI and natural language prompts to automate coding tasks. It stands in contrast to other AI coding assistants and to no-code and low-code tools that use visual drag-and-drop elements. Vibe coding is about incorporating AI into end-to-end development workflows, with the focus on intent rather than manual coding minutiae.

Windsurf is among the top tools in the space, along with Cursor, Replit, Lovable, Bolt, Devin and Aider. The company released Wave 6 earlier this month, which aims to address common workflow bottlenecks.

“Windsurf has been leading the charge in building truly AI-native development tools, helping developers accelerate delivery without compromising on experience,” said Mitchell Johnson, chief product development officer at software security firm Sonatype. “Like early open source, this started as ‘outlaw tech’ — but it’s quickly becoming foundational.”

Andrew Hill, CEO and co-founder of crowdsourced AI agent platform Recall, said the potential acquisition is “a bet on vibe coding as the future of software development.” Windsurf has fast feedback loops, good defaults and “just the right toggles” for people with the right intuition to guide AI to solve their problems. It is also an environment designed for co-creation.

“Let the coding leapfrogging commence from Replit, Claude, Cursor, Windsurf — what’s next?” said Hill, calling vibe coding a “productivity unlock.” “The best agents will be built by humans who can vibe through a hundred ideas in a weekend.”

OpenAI owning more of the stack

Others note that if OpenAI does acquire Windsurf, it signals a clear move to own more of the full-stack coding experience rather than just supplying the underlying models.

“Windsurf has focused on developer-centric workflows, not just raw code generation, which aligns with the growing need for contextual and collaborative coding tools,” said Kaveh Vahdat, AI industry watcher and founder of RiseAngle and RiseOpp.
Arvind Rongala, CEO of corporate training services company Edstellar, called it more of a power move than a software grab. With vibe coding, developers want environments that are “expressive, intuitive and nearly collaborative, rather than merely text editors.”

With Windsurf, OpenAI would have direct access to the next generation of code creation and sharing, he noted, with the plan being vertical integration. “The intelligence layer already belongs to OpenAI. It wants the canvas now.”

OpenAI would have enormous power over not just what is developed, but how it is built, said Rongala, since it would own the creative tools that developers use for hours every day. “This isn’t about taking market share away from Replit or GitHub,” he said. “Making such platforms seem antiquated is the goal.”

A strategy move or a scramble?

Vahdat pointed out that a Windsurf acquisition would put OpenAI in more direct competition with GitHub Copilot and Amazon CodeWhisperer, both of which are backed by platform giants.

“The real value here is not just in the tool itself but in the distribution and user behavior data that comes with it,” he said. “That kind of insight is strategically important for improving AI coding systems at scale.”

The move is especially interesting because it could position OpenAI more directly against Microsoft, even though the two are closely partnered through tools like GitHub Copilot, noted Brian Jackson, principal research director at Info-Tech Research Group.

A deal would support OpenAI’s “larger strategy of moving beyond simple chat interactions and becoming a tool that helps users take real action and automate everyday workflows,” he said.

Still, Sonatype’s Johnson raised a question: What if Windsurf becomes tightly coupled with OpenAI’s ecosystem? Developers benefit most when tools can integrate freely with the AI models that suit their needs — whether that’s GPT, Claude or open source.

“If ownership limits that flexibility, it could introduce a form of vendor lock-in that slows the very momentum Windsurf helped create,” he said.

Some OpenAI critics, meanwhile, see it as a desperate move. Matt Murphy, partner at Menlo Ventures, called Anthropic superior at coding, saying the company has the best models and strongest partnerships.

“OpenAI’s move here feels like a scramble to close the gap — but it risks alienating key allies and still doesn’t address the core issue: Claude is the better model,” he posited.


NVIDIA announcements, news and more, from GTC 2025

Presented by NVIDIA

The AI revolution is accelerating, driven by the billion-parameter reasoning models needed to develop agentic and physical AI. As NVIDIA founder and CEO Jensen Huang shared in his GTC keynote, the move from training to full-production inference is causing AI compute demand to skyrocket as data centers worldwide transform into AI factories designed to churn out millions of user queries efficiently and effectively.

To meet this $1 trillion opportunity, NVIDIA at GTC unveiled major advancements – from the Blackwell Ultra AI platform and an operating system for AI factories to advancements in networking, robotics and accelerated computing.

Blackwell is already in full production — delivering an astonishing 40x performance boost over Hopper. This architecture is redefining AI model training and inference, making AI applications more efficient and more scalable. And coming in the second half of 2025 is the next evolution of the Blackwell AI factory platform: Blackwell Ultra — a powerhouse GPU with expanded memory to support the next generation of AI models. NVIDIA continues to move fast, committed to an annual AI architecture refresh, and NVIDIA Vera Rubin is designed to supercharge AI data center performance and efficiency.

Beyond GPUs, AI infrastructure is undergoing a seismic shift with innovations in photonics, AI-optimized storage and advanced networking. These breakthroughs will dramatically improve scalability, efficiency and energy use across massive AI data centers.

Meanwhile, physical AI for robotics and industry is a colossal $50 trillion opportunity, according to Huang. From manufacturing and logistics to healthcare and beyond, AI-powered automation is poised to reshape entire industries. NVIDIA Isaac and Cosmos platforms are at the forefront, driving the next era of AI-driven robotics.

Some of the NVIDIA announcements at GTC

NVIDIA roadmap: The NVIDIA roadmap includes Vera Rubin, to be released in the second half of 2026, followed by the launch of Vera Rubin Ultra in 2027. The Rubin chips and servers boast improved speeds, especially in data transfers between chips, which is a critical feature for large AI systems with many chips. And scheduled for 2028 is Feynman, the next architecture to be released, making use of next-gen HBM memory.

DGX personal AI computers: Powered by the NVIDIA Grace Blackwell platform, DGX Spark and DGX Station are designed to develop, fine-tune and run inference on large models on desktops. They’ll be manufactured by a number of companies, including ASUS, Dell and HP.

Spectrum-X and Quantum-X networking platforms: These silicon photonics networking switches help AI factories connect millions of GPUs across sites, and reduce energy consumption dramatically. The Quantum-X Photonics InfiniBand switches will be available later this year, and Spectrum-X Photonics Ethernet switches will arrive in 2026.

Dynamo software: Released for free, open-source Dynamo software helps speed the process of multi-step reasoning, improving efficiency and reducing time to innovation in AI factories.

NVIDIA Accelerated Quantum Research Center: A Boston-based research center will provide cutting-edge technologies to advance quantum computing in collaboration with leading hardware and software makers.

NVIDIA Isaac GR00T N1: A foundational model for humanoid robots, GR00T N1 is the world’s first open, fully customizable foundation model for generalized humanoid reasoning and skills. It has a dual system similar to reasoning models, for both fast and slow thinking.
Newton Physics Engine: NVIDIA also announced a collaboration with Google DeepMind and Disney Research to develop Newton, an open-source physics engine that lets robots learn how to handle complex tasks with greater precision.

These are just the highlights — don’t miss the full GTC recap, live on NVIDIA’s blog.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact [email protected].


Beyond ARC-AGI: GAIA and the search for a real intelligence benchmark

Intelligence is pervasive, yet its measurement seems subjective. At best, we approximate its measure through tests and benchmarks. Think of college entrance exams: Every year, countless students sign up, memorize test-prep tricks and sometimes walk away with perfect scores. Does a single number, say a 100%, mean those who got it share the same intelligence — or that they’ve somehow maxed out their intelligence? Of course not. Benchmarks are approximations, not exact measurements of someone’s — or something’s — true capabilities.

The generative AI community has long relied on benchmarks like MMLU (Massive Multitask Language Understanding) to evaluate model capabilities through multiple-choice questions across academic disciplines. This format enables straightforward comparisons, but fails to capture the full range of intelligent capabilities. Both Claude 3.5 Sonnet and GPT-4.5, for instance, achieve similar scores on this benchmark. On paper, this suggests equivalent capabilities. Yet people who work with these models know that there are substantial differences in their real-world performance.

What does it mean to measure ‘intelligence’ in AI?

On the heels of the new ARC-AGI benchmark release — a test designed to push models toward general reasoning and creative problem-solving — there’s renewed debate around what it means to measure “intelligence” in AI. While not everyone has tested the ARC-AGI benchmark yet, the industry welcomes this and other efforts to evolve testing frameworks. Every benchmark has its merit, and ARC-AGI is a promising step in that broader conversation.

Another notable recent development in AI evaluation is ‘Humanity’s Last Exam,’ a comprehensive benchmark containing 3,000 peer-reviewed, multi-step questions across various disciplines. While this test represents an ambitious attempt to challenge AI systems at expert-level reasoning, early results show rapid progress — with OpenAI reportedly achieving a 26.6% score within a month of its release. However, like other traditional benchmarks, it primarily evaluates knowledge and reasoning in isolation, without testing the practical, tool-using capabilities that are increasingly crucial for real-world AI applications.

In one example, multiple state-of-the-art models fail to correctly count the number of “r”s in the word strawberry. In another, they incorrectly identify 3.8 as being smaller than 3.1111. These kinds of failures — on tasks that even a young child or basic calculator could solve — expose a mismatch between benchmark-driven progress and real-world robustness, reminding us that intelligence is not just about passing exams, but about reliably navigating everyday logic.

The new standard for measuring AI capability

As models have advanced, these traditional benchmarks have shown their limitations — GPT-4 with tools achieves only about 15% on the more complex, real-world tasks in the GAIA benchmark, despite impressive scores on multiple-choice tests. This disconnect between benchmark performance and practical capability has become increasingly problematic as AI systems move from research environments into business applications. Traditional benchmarks test knowledge recall but miss crucial aspects of intelligence: the ability to gather information, execute code, analyze data and synthesize solutions across multiple domains. GAIA represents the needed shift in AI evaluation methodology.
Created through a collaboration between the Meta-FAIR, Meta-GenAI, Hugging Face and AutoGPT teams, the benchmark includes 466 carefully crafted questions across three difficulty levels. These questions test web browsing, multi-modal understanding, code execution, file handling and complex reasoning — capabilities essential for real-world AI applications.

Level 1 questions require approximately 5 steps and one tool for humans to solve. Level 2 questions demand 5 to 10 steps and multiple tools, while Level 3 questions can require up to 50 discrete steps and any number of tools. This structure mirrors the actual complexity of business problems, where solutions rarely come from a single action or tool.

By prioritizing flexibility over complexity, one AI agent reached 75% accuracy on GAIA — outperforming industry giants Microsoft’s Magentic-One (38%) and Google’s Langfun Agent (49%). Its success stems from using a combination of specialized models for audio-visual understanding and reasoning, with Anthropic’s Claude 3.5 Sonnet as the primary model.

This evolution in AI evaluation reflects a broader shift in the industry: We’re moving from standalone SaaS applications to AI agents that can orchestrate multiple tools and workflows. As businesses increasingly rely on AI systems to handle complex, multi-step tasks, benchmarks like GAIA provide a more meaningful measure of capability than traditional multiple-choice tests.

The future of AI evaluation lies not in isolated knowledge tests but in comprehensive assessments of problem-solving ability. GAIA sets a new standard for measuring AI capability — one that better reflects the challenges and opportunities of real-world AI deployment.

Sri Ambati is the founder and CEO of H2O.ai.
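Accuracy figures like the 75% quoted above are typically produced by comparing an agent's final answer against a reference answer, with light normalization, then averaging over questions. The snippet below is a minimal sketch of that kind of scorer; the normalization rules here are illustrative assumptions, not GAIA's official evaluation code.

```python
# Illustrative GAIA-style scorer: lightly normalize answers, then count exact
# matches. The normalization rules below are assumptions for illustration only,
# not the benchmark's official implementation.
import re

def normalize(answer: str) -> str:
    a = answer.strip().lower()
    a = re.sub(r"[\s,]+", " ", a)   # collapse whitespace and commas
    a = re.sub(r"[.!?]+$", "", a)   # drop trailing punctuation
    return a

def accuracy(predictions: list[str], references: list[str]) -> float:
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Toy example with made-up answers: two of three match after normalization.
preds = ["Paris ", "42", "blue whale."]
refs = ["paris", "41", "Blue whale"]
print(f"accuracy: {accuracy(preds, refs):.0%}")   # prints: accuracy: 67%
```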


OpenAI launches o3 and o4-mini, AI models that ‘think with images’ and use tools autonomously

OpenAI launched two groundbreaking AI models today that can reason with images and use tools independently, representing what experts call a step change in artificial intelligence capabilities.

The San Francisco-based company introduced o3 and o4-mini, the latest in its “o-series” of reasoning models, which it claims are its most intelligent and capable models to date. These systems can integrate images directly into their reasoning process, search the web, run code, analyze files, and even generate images within a single task flow.

“There are some models that feel like a qualitative step into the future. GPT-4 was one of those. Today is also going to be one of those days,” said Greg Brockman, OpenAI’s president, during a press conference announcing the release. “These are the first models where top scientists tell us they produce legitimately good and useful novel ideas.”

How OpenAI’s new models ‘think with images’ to transform visual problem-solving

The most striking feature of these new models is their ability to “think with images” — not just see them, but manipulate and reason about them as part of their problem-solving process.

“They don’t just see an image — they think with it,” OpenAI said in a statement sent to VentureBeat. “This unlocks a new class of problem-solving that blends visual and textual reasoning.”

During a demonstration at the press conference, a researcher showed how o3 could analyze a physics poster from a decade-old internship, navigate its complex diagrams independently, and even identify that the final result wasn’t present in the poster itself.

“It must have just read, you know, at least like 10 different papers in a few seconds for me,” Brandon McKenzie, a researcher at OpenAI working on multimodal reasoning, said during the demo. He estimated the task would have taken him “many days just for me to even like, onboard myself, back to my project, and then a few days more probably, to actually search through the literature.”

The ability for AI to manipulate images in its reasoning process — zooming in on details, rotating diagrams, or cropping unnecessary elements — represents a novel approach that industry analysts say could revolutionize fields from scientific research to education.

I had early access, o3 is an impressive model, seems very capable. Some fun examples: 1) Cracked a business case I use in my class 2) Creating some SVGs (images created by code alone) 3) Writing a constrained story of two interlocking gyres 4) Hard science fiction space battle. pic.twitter.com/TK4PKvKNoT — Ethan Mollick (@emollick) April 16, 2025

OpenAI executives emphasized that these releases represent more than just improved models — they’re complete AI systems that can independently use and chain together multiple tools when solving problems.

“We’ve trained them to use tools through reinforcement learning—teaching them not just how to use tools, but to reason about when to use them,” the company explained in its release.

Greg Brockman highlighted the models’ extensive tool use capabilities: “They actually use these tools in their chain of thought as they’re trying to solve a hard problem. For example, we’ve seen o3 use like 600 tool calls in a row trying to solve a really hard task.”

This capability allows the models to perform complex, multi-step workflows without constant human direction.
For instance, if asked about future energy usage patterns in California, the AI can search the web for utility data, write Python code to analyze it, generate visualizations, and produce a comprehensive report — all as a single fluid process.

OpenAI surges ahead of competitors with record-breaking performance on key AI benchmarks

OpenAI claims o3 sets new state-of-the-art benchmarks across key measures of AI capability, including Codeforces, SWE-bench, and MMMU. In evaluations by external experts, o3 reportedly makes 20 percent fewer major errors than its predecessor on difficult, real-world tasks. The smaller o4-mini model is optimized for speed and cost efficiency while maintaining strong reasoning capabilities. On the AIME 2025 mathematics competition, o4-mini scored 99.5 percent when given access to a Python interpreter.

“I really do believe that with this suite of models, o3 and o4-mini, we’re going to see more advances,” Mark Chen, OpenAI’s head of research, said during the press conference.

The timing of this release is significant, coming just two days after OpenAI unveiled its GPT-4.1 model, which excels at coding tasks. The rapid succession of announcements signals an acceleration in the competitive AI landscape, where OpenAI faces increasing pressure from Google’s Gemini models, Anthropic’s Claude, and Elon Musk’s xAI.

Last month, OpenAI closed what amounts to the largest private tech funding round in history, raising $40 billion at a $300 billion valuation. The company is also reportedly considering building its own social network, potentially to compete with Elon Musk’s X platform and to secure a proprietary source of training data.

o3 and o4-mini are super good at coding, so we are releasing a new product, Codex CLI, to make them easier to use. this is a coding agent that runs on your computer. it is fully open source and available today; we expect it to rapidly improve. — Sam Altman (@sama) April 16, 2025

How OpenAI’s new models transform software engineering with unprecedented code navigation abilities

One area where the new models particularly excel is software engineering. Brockman noted during the press conference that o3 is “actually better than I am at navigating through our OpenAI code base, which is really useful.”

As part of the announcement, OpenAI also introduced Codex CLI, a lightweight coding agent that runs directly in a user’s terminal. The open-source tool allows developers to leverage the models’ reasoning capabilities for coding tasks, with support for screenshots and sketches.

“We’re also sharing a new experiment: Codex CLI, a lightweight coding agent you can run from your terminal,” the company announced. “You can get the benefits of multimodal reasoning from the command line by passing screenshots or low fidelity sketches to the model, combined with access to your code locally.”
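For readers who want a sense of what a tool-chaining workflow like the one described above looks like from the API side, here is a minimal, hedged sketch using the OpenAI Python SDK's function-calling interface. The "o3" model name, the placeholder search_web and run_python helpers, and the loop structure are assumptions for illustration; this is not OpenAI's internal implementation of the feature.

```python
# Hypothetical sketch of a tool-chaining loop in the style described above.
# Assumptions: the "o3" model name is available to your API key, and
# search_web/run_python are stand-in helpers you would implement yourself.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def search_web(query: str) -> str:
    """Placeholder: return search-result snippets for the query."""
    return f"[search results for: {query}]"

def run_python(code: str) -> str:
    """Placeholder: execute analysis code in a sandbox and return its output."""
    return "[execution output]"

TOOLS = [
    {"type": "function", "function": {
        "name": "search_web",
        "description": "Search the web and return result snippets.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "run_python",
        "description": "Run Python code and return its output.",
        "parameters": {"type": "object",
                       "properties": {"code": {"type": "string"}},
                       "required": ["code"]}}},
]
HANDLERS = {"search_web": search_web, "run_python": run_python}

messages = [{"role": "user",
             "content": "Estimate future energy usage patterns in California and summarize."}]

# Let the model call tools repeatedly until it answers in plain text.
while True:
    resp = client.chat.completions.create(model="o3", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final report
        break
    messages.append(msg)  # keep the assistant turn that requested the tools
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = HANDLERS[call.function.name](**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```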


Moveworks joins AI agent library craze

AI agent marketplaces have become ubiquitous as enterprises look for ready-made agents they can customize to cover most of their use cases. ServiceNow, Google, Writer, Amazon Web Services and Microsoft are just a few of the companies that have launched or recently announced platforms where customers can choose pre-built agents and then deploy them to their organizations.

Banking on this popularity, enterprise AI company Moveworks launched an AI Agent Marketplace, where customers can find more than 100 pre-built agents and install them into their systems.

Bhavin Shah, founder and CEO of Moveworks, told VentureBeat in an interview that agent marketplaces help enterprises spin up agentic use cases quickly and can act as idea generators for other use cases.

“What we found is that to go from this transformation and business productivity is to identify the kinds of agents that can be used to translate these objectives that people have,” Shah said. “Sometimes you have an idea and ask how do we make it work, so what we’ve done with the AI Agent Marketplace is show real agents that connect to third-party systems.”

The AI Agent Marketplace from Moveworks has over 100 agents across HR, sales, finance and IT operations. Some of these agents handle timesheet management, talent recruitment and expense management.

Customize agents to fit workflows

Shah said enterprises can take one of the agents in the marketplace and configure it to their own needs. The marketplace offers agent templates that connect to third-party platforms and can be integrated into an organization’s tech and data environment.

Shah noted that in the previous iteration of agents, robotic process automation (RPA), enterprises had to write out workflows “that were step by step and accounted for every variation and every if/then/this scenario which were brittle.” With AI agents, these workflows are coded more easily and can be connected to data sources for more context and understanding.

Moveworks said its marketplace differs from others in that, “unlike other marketplaces with simple prompt packs,” its version offers integrations to enterprise workloads. Users can discover AI agents through plugins to “extend the capabilities of an AI agent” or install them through a low-code platform.

More and more marketplaces

It’s not a surprise that Moveworks is creating an agent marketplace. Think of agent marketplaces or agent libraries as the evolution of model gardens. Cloud model gardens, like those from Amazon, Google, Hugging Face and Microsoft, offer models from different providers so customers can build applications or agents with any LLM they want. Agent libraries or marketplaces, in turn, become a repository of existing agents that users can browse through.

ServiceNow launched an agent library last year, while Microsoft followed with one of the largest AI agent marketplaces. Salesforce has its AgentExchange with more than 200 partners, and AWS has a library on Bedrock.

Ashley Sprague, senior director of IT and corporate engineering at GitHub, told VentureBeat in a separate interview that agent libraries will help cut down on the time needed to bring AI into the hands of all of its employees.

“We have this backlog of ideas, so when you look into the marketplace, you can go and just search for solutions to those ideas, and they’re pre-built and ready to go,” she said.
GitHub, which uses Moveworks’ Creator Studio to help build AI applications, has not yet used the AI Agent Marketplace. Sprague said, however, that because the company already uses Moveworks, it made sense that more teams in the company “who are interested in getting involved with agents can get involved because they can efficiently work from this low-code environment.”

Sprague added that it could free up her IT teams from building all the agent ideas from other teams.

Harmonizing platforms

Moveworks’ agent platform could soon be integrated with other libraries. In March, ServiceNow announced it was acquiring Moveworks for an undisclosed amount.

Shah said that while the deal is still being finalized, Moveworks’ AI Agent Marketplace operates independently. However, since both companies share multiple customers, he believes Moveworks’ marketplace and ServiceNow’s agent library exist for different levels of users.

“There is overlap, which tells you that we’ve actually coexisted quite nicely,” Shah said. “What ServiceNow identified and what you saw play out over the years is that they have an incredible agentic platform that has a lot of different capabilities for fulfillers and IT and HR teams, and they go basically north to south. We go east to west across employees.”


Google Cloud Next ’25: New AI chips and agent ecosystem challenge Microsoft and Amazon

Google Cloud is aggressively attempting to solidify its position in the increasingly competitive artificial intelligence landscape. It has announced a sweeping array of new technologies focused on “thinking models,” agent ecosystems, and specialized infrastructure designed specifically for large-scale AI deployments.

At its annual Cloud Next conference in Las Vegas today, Google revealed its seventh-generation Tensor Processing Unit (TPU), Ironwood. The company claims it delivers more than 42 exaflops of computing power per pod — a staggering 24 times more powerful than the world’s leading supercomputer, El Capitan.

“The opportunity with AI is as big as it gets,” said Amin Vahdat, Google’s vice president and general manager of ML systems and cloud AI, during a press conference ahead of the event. “Together with our customers, we’re powering a new golden age of innovation.”

The conference comes at a pivotal moment for Google, which has seen considerable momentum in its cloud business. In January, the company reported that its Q4 2024 cloud revenue reached $12 billion, a 30% increase year over year. Google executives say active users in AI Studio and the Gemini API have increased by 80% in just the past month.

How Google’s new Ironwood TPUs are transforming AI computing with power efficiency

Google is positioning itself as the only major cloud provider with a “fully AI-optimized platform” built from the ground up for what it calls “the age of inference,” where the focus shifts from model training to actually using AI systems to solve real-world problems.

The star of Google’s infrastructure announcements is Ironwood, representing a fundamental shift in chip design philosophy. Unlike previous generations that balanced training and inference, Ironwood was built specifically to run complex AI models after they’ve been trained.

“It’s no longer about the data put into the model, but what the model can do with data after it’s been trained,” Vahdat explained.

Each Ironwood pod contains more than 9,000 chips and delivers two times better power efficiency than the previous generation. This focus on efficiency addresses one of the most pressing concerns about generative AI: its enormous energy consumption.

In addition to the new chips, Google is opening up its massive global network infrastructure to enterprise customers through Cloud WAN (Wide Area Network). This service makes Google’s 2-million-mile fiber network — the same one that powers consumer services like YouTube and Gmail — available to businesses.

According to Google, Cloud WAN improves network performance by up to 40% while simultaneously reducing the total cost of ownership by the same percentage compared to customer-managed networks. This represents an unusual step for a hyperscaler, essentially turning its internal infrastructure into a product.

Inside Gemini 2.5: How Google’s ‘thinking models’ improve enterprise AI applications

On the software side, Google is expanding its Gemini model family with Gemini 2.5 Flash, a cost-effective version of its flagship AI system that includes what the company describes as “thinking capabilities.” Unlike traditional large language models that generate responses directly, these “thinking models” break down complex problems through multi-step reasoning and even self-reflection.
Gemini 2.5 Pro launched two weeks ago and is positioned for high-complexity use cases like drug discovery and financial modeling, while the newly announced Flash variant adjusts its reasoning depth based on prompt complexity to balance performance and cost.

Google is also significantly expanding its generative media capabilities with updates to Imagen (image generation), Veo (video), Chirp (audio), and the introduction of Lyria, a text-to-music model.

During a demonstration at the press conference, Nenshad Bardoliwalla, director of product management for Vertex AI, showed how these tools could work together to create a promotional video for a concert, complete with custom music and sophisticated editing capabilities like removing unwanted elements from video clips.

“Only Vertex AI brings together all of these models, along with third-party models, onto a single platform,” Bardoliwalla said.

Beyond single AI systems: How Google’s multi-agent ecosystem aims to enhance enterprise workflows

Perhaps the most forward-looking announcements focused on creating what Google calls a “multi-agent ecosystem” — an environment where multiple AI systems can work together across different platforms and vendors.

Google is introducing an Agent Development Kit (ADK) that allows developers to build multi-agent systems with less than 100 lines of code. The company is also proposing a new open protocol called Agent2Agent (A2A), allowing AI agents from different vendors to communicate with each other.

“2025 will be a transition year where generative AI shifts from answering single questions to solving complex problems through agented systems,” Vahdat predicted.

More than 50 partners, including major enterprise software providers like Salesforce, ServiceNow and SAP, have signed on to support this protocol, suggesting a potential industry shift toward interoperable AI systems.

For non-technical users, Google is enhancing its Agentspace platform with features like Agent Gallery (providing a single view of available agents) and Agent Designer (a no-code interface for creating custom agents). During a demonstration, Google showed how a banking account manager could use these tools to analyze client portfolios, forecast cash flow issues, and automatically draft communications to clients — all without writing any code.

From document summaries to drive-thru orders: How Google’s specialized AI agents are affecting industries

Google is also deeply integrating AI across its Workspace productivity suite, with new features like “Help me Analyze” in Sheets, which automatically identifies insights from data without explicit formulas or pivot tables, and Audio Overviews in Docs, which create human-like audio versions of documents.

The company highlighted five categories of specialized agents that are seeing significant adoption: customer service, creative work, data analysis, coding and security. In customer service, Google pointed to Wendy’s AI drive-through system, which now handles 60,000 orders daily, and The Home Depot’s “Magic Apron” agent, which offers home improvement guidance. For creative teams, companies like WPP are using Google’s AI to conceptualize and produce marketing campaigns at scale.

Cloud AI competition intensifies: How Google’s comprehensive approach challenges Microsoft and Amazon

Google’s announcements come amid intensifying competition in the cloud AI market.
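The multi-agent idea described above ultimately comes down to agents exchanging structured task messages rather than sharing internal state. The sketch below is a deliberately generic Python illustration of that pattern; it is not the A2A protocol or the ADK API, whose details Google publishes separately, and every class, field, and agent name here is an assumption made for illustration.

```python
# Generic illustration of the "multi-agent ecosystem" idea: one agent delegates
# a subtask to another through a structured message instead of shared state.
# This is NOT Google's A2A protocol or ADK API; all names here are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Task:
    skill: str                      # the kind of work being requested
    payload: dict                   # inputs for that work
    result: Optional[dict] = None   # filled in by the agent that handles it

class Agent:
    def __init__(self, name: str, skills: Dict[str, Callable[[dict], dict]]):
        self.name = name
        self.skills = skills

    def handle(self, task: Task) -> Task:
        task.result = self.skills[task.skill](task.payload)
        return task

# Two specialized agents that could, in principle, come from different vendors.
analyst = Agent("portfolio-analyst", {
    "analyze": lambda p: {"risk": "moderate", "top_holding": p["holdings"][0]},
})
writer = Agent("client-writer", {
    "draft_email": lambda p: {"email": f"Dear client, your portfolio risk is {p['risk']}."},
})

# A simple orchestrator routes tasks between agents using only the message contract.
analysis = analyst.handle(Task("analyze", {"holdings": ["ACME", "GLOBEX"]}))
draft = writer.handle(Task("draft_email", analysis.result))
print(draft.result["email"])
```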


Claude just gained superpowers: Anthropic’s AI can now search your entire Google Workspace without you

Anthropic launched major upgrades to its Claude AI assistant today, introducing an autonomous research capability and Google Workspace integration that transform the AI into what the company calls a “true virtual collaborator” for enterprise users. The expansion directly challenges OpenAI and Microsoft in the increasingly competitive market for AI productivity tools.

The new Research capability enables Claude to independently conduct multiple searches that build upon each other while determining what to investigate next. Simultaneously, the Google Workspace integration connects Claude to users’ emails, calendars, and documents, eliminating the need for manual uploads.

‘Minutes not hours’: How Claude’s research speed aims to win over busy executives

Anthropic positions Claude’s Research functionality as dramatically faster than competing solutions, promising comprehensive answers in minutes rather than the “up to 30 minutes” it says rival products require.

“At Anthropic, we’re laser-focused on enterprise workers and use cases, and our Research tool is reflective of that,” an Anthropic spokesperson told VentureBeat. “Research is a tool to help enterprise workers get well-researched answers to queries in less than a minute. Other solutions on the market take up to 30 minutes to generate responses — that’s not what your average Sales exec or financial services employee needs.”

This speed-focused approach represents a calculated bet that enterprise users prioritize quick responses for time-sensitive decisions over more exhaustive but slower research capabilities.

(Claude’s Drive integration suggests ways to analyze documents, revealing patterns in how users work with their files. Credit: Anthropic)

Enterprise-grade security promises to keep company data protected while Claude works

For technical decision makers considering AI tools, data security remains paramount. Anthropic emphasizes its security-first approach, particularly for the Google Drive Catalog feature that uses retrieval augmented generation (RAG) techniques.

“Privacy is foundational to our approach. We don’t train our models on user data by default,” the Anthropic spokesperson said. “We’ve implemented strict authentication and access control mechanisms. Each user or organization’s connections to external services are properly authenticated and authorized for only that specific user or organization.”

The company restricts its Google Drive Catalog feature to Enterprise plan customers, which includes “enhanced security infrastructure, dedicated support, and advanced administrative controls designed for organizations with stringent data protection requirements.”

(Claude prepares for a sales meeting by scanning a user’s emails and calendar – a task that once took hours. Credit: Anthropic)

Fighting AI hallucinations: How Claude uses citations to build trust

Anthropic tackles one of AI’s most persistent challenges — the tendency to generate plausible-sounding but incorrect information — through explicit source citation.

“Anthropic is dedicated to building the most trustworthy enterprise AI tools. When conducting research — whether on the Web or in internal docs — Claude cites its sources so users can easily reference where information is coming from,” according to the spokesperson.
This verification mechanism addresses growing enterprise demand for accountable AI, especially in business-critical applications where accuracy determines success.

Real-world ROI: Early users report hours saved across multiple departments

Though still in early beta, Anthropic reports promising results from internal testing, with employees reporting that they “saved hours every week that they typically spent drudging through a sea of docs and emails.”

The company highlighted several implementations, including communications teams compiling launch information and briefing books, engineers preparing for client meetings by researching industry news and reviewing existing documents, and sales leaders analyzing sector growth without manual data work.

“New employees are skipping the classic ‘new hire questions’ and asking Claude to summarize upcoming company moments, OKRs and share relevant style guides,” the spokesperson noted, suggesting potential applications beyond day-to-day productivity.

The AI assistant wars heat up: How Claude compares to ChatGPT, Copilot, and Gemini

Anthropic’s updates arrive amid fierce competition. OpenAI’s ChatGPT offers web browsing, Microsoft’s Copilot integrates with Office 365, and Google’s Gemini connects with Workspace applications. Anthropic differentiates Claude through a combination of speed, enterprise focus, and what it describes as a more autonomous approach to research and information retrieval.

The Research feature is available in early beta for Max, Team, and Enterprise plans in the United States, Japan, and Brazil. Web search has expanded to Brazil and Japan after its U.S. launch in March. The Google Workspace integration is available in beta to all paid users, though administrators must enable it company-wide before individual use.

From assistants to collaborators: The new era of AI knowledge work

As AI systems evolve from simple query responders to proactive research partners, they’re reshaping how knowledge work happens. Claude’s new capabilities represent more than incremental feature additions — they signal a fundamental shift in the relationship between professionals and their digital tools.

For enterprises navigating digital transformation, the question isn’t whether AI will transform knowledge work, but how quickly. Anthropic’s vision of AI that can independently research, reason through problems, and connect directly with workplace systems suggests a future where the most valuable human skill may not be information gathering but rather asking the right questions.

In the high-stakes race to make AI truly useful for everyday work, Claude’s latest upgrades reveal that the finish line isn’t just smarter AI — it’s AI that works the way humans already do.


Cohere launches Embed 4: New multimodal search model processes 200-page documents

Enterprise retrieval augmented generation (RAG) remains integral to the current agentic AI craze. Taking advantage of the continued interest in agents, Cohere released the latest version of its embeddings model with a longer context window and more multimodality.

Cohere’s Embed 4 builds on the multimodal updates of Embed 3 and adds more capabilities around unstructured data. Thanks to a 128,000-token context window, organizations can generate embeddings for documents of around 200 pages.

“Existing embedding models fail to natively understand complex multimodal business materials, leading companies to develop cumbersome data pre-processing pipelines that only slightly improve accuracy,” Cohere said in a blog post. “Embed 4 solves this problem, allowing enterprises and their employees to efficiently surface insights that are hidden within mountains of unsearchable information.”

Enterprises can deploy Embed 4 on virtual private clouds or on-premises technology stacks for added data security.

Companies can generate embeddings to transform their documents or other data into numerical representations for RAG use cases. Agents can then reference these embeddings to answer prompts.

Domain-specific knowledge

Embed 4 “excels in regulated industries” like finance, healthcare and manufacturing, the company said. Cohere, which mainly focuses on enterprise AI use cases, said its models consider the security needs of regulated sectors and have a strong understanding of businesses.

The company trained Embed 4 “to be robust against noisy real-world data,” meaning it remains accurate despite the “imperfections” of enterprise data, such as spelling mistakes and formatting issues.

“It is also performant at searching over scanned documents and handwriting. These formats are common in legal paperwork, insurance invoices, and expense receipts. This capability eliminates the need for complex data preparations or pre-processing pipelines, saving businesses time and operational costs,” Cohere said.

Organizations can use Embed 4 for investor presentations, due diligence files, clinical trial reports, repair guides and product documents. The model supports more than 100 languages, just like the previous version.

Agora, a Cohere customer, used Embed 4 for its AI search engine and found that the model could surface relevant products. “E-commerce data is complex, containing images and multifaceted text descriptions. Being able to represent our products in a unified embedding makes our search faster and our internal tooling more efficient,” said Param Jaggi, founder of Agora, in the blog post.

Agent use cases

Cohere argues that models like Embed 4 would improve agentic use cases and claims it can be “the optimal search engine” for agents and AI assistants across an enterprise. “In addition to strong accuracy across data types, the model delivers enterprise-grade efficiency,” Cohere said. “This allows it to scale to meet the demands of large organizations.” Cohere added that Embed 4 creates compressed data embeddings to cut high storage costs.

Embeddings and RAG-based searches let agents reference specific documents to fulfill request-related tasks. Many believe these provide more accurate results, ensuring the agents do not respond with incorrect or hallucinated answers.
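To make the embeddings-for-RAG workflow described above concrete, here is a minimal, hedged sketch of the retrieval step. The embed() function is a stand-in for whatever embedding endpoint you use (for example, Embed 4 via Cohere's SDK); the cosine-similarity math is the generic form, not Cohere's internal implementation, and the documents are invented examples.

```python
# Minimal RAG-retrieval sketch: embed documents once, then find the passages
# most similar to a query embedding. embed() is a stand-in for an embedding
# API such as Embed 4; swap in your provider's client call.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per input text."""
    rng = np.random.default_rng(0)               # deterministic stand-in vectors
    return rng.normal(size=(len(texts), 1024))   # real models return ~1k-dim floats

documents = [
    "Q3 clinical trial report: adverse events summary ...",
    "Repair guide for model X-200 hydraulic pump ...",
    "Investor presentation, fiscal year 2024 ...",
]

doc_vectors = embed(documents)
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)  # unit-normalize

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed([query])[0]
    q /= np.linalg.norm(q)
    scores = doc_vectors @ q                      # cosine similarity after normalization
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# An agent would pass the retrieved passages to the LLM as grounding context.
print(retrieve("What were the adverse events in the latest trial?"))
```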
Other embedding models that Cohere competes against include Qodo’s Qodo-Embed-1-1.5B and models from Voyage AI, which database vendor MongoDB recently acquired.


Sam Altman at TED 2025: Inside the most uncomfortable — and important — AI interview of the year

OpenAI CEO Sam Altman revealed that his company has grown to 800 million weekly active users and is experiencing “unbelievable” growth rates, during a sometimes tense interview at the TED 2025 conference in Vancouver last week.

“I have never seen growth in any company, one that I’ve been involved with or not, like this,” Altman told TED head Chris Anderson during their on-stage conversation. “The growth of ChatGPT — it is really fun. I feel deeply honored. But it is crazy to live through, and our teams are exhausted and stressed.”

The interview, which closed out the final day of TED 2025: Humanity Reimagined, showcased not just OpenAI’s skyrocketing success but also the increasing scrutiny the company faces as its technology transforms society at a pace that alarms even some of its supporters.

‘Our GPUs are melting’: OpenAI struggles to scale amid unprecedented demand

Altman painted a picture of a company struggling to keep up with its own success, noting that OpenAI’s GPUs are “melting” due to the popularity of its new image generation features. “All day long, I call people and beg them to give us their GPUs. We are so incredibly constrained,” he said.

This exponential growth comes as OpenAI is reportedly considering launching its own social network to compete with Elon Musk’s X, according to CNBC. Altman neither confirmed nor denied these reports during the TED interview.

The company recently closed a $40 billion funding round, valuing it at $300 billion — the largest private tech funding in history — and this influx of capital will likely help address some of these infrastructure challenges.

From non-profit to $300 billion giant: Altman responds to ‘Ring of Power’ accusations

Throughout the 47-minute conversation, Anderson repeatedly pressed Altman on OpenAI’s transformation from a non-profit research lab to a for-profit company with a $300 billion valuation. Anderson voiced concerns shared by critics, including Elon Musk, who has suggested Altman has been “corrupted by the Ring of Power,” referencing “The Lord of the Rings.”

Altman defended OpenAI’s path: “Our goal is to make AGI and distribute it, make it safe for the broad benefit of humanity. I think by all accounts, we have done a lot in that direction. Clearly, our tactics have shifted over time… We didn’t think we would have to build a company around this. We learned a lot about how it goes and the realities of what these systems were going to take from capital.”

When asked how he personally handles the enormous power he now wields, Altman responded: “Shockingly, the same as before. I think you can get used to anything step by step… You’re the same person. I’m sure I’m not in all sorts of ways, but I don’t feel any different.”

‘Divvying up revenue’: OpenAI plans to pay artists whose styles are used by AI

One of the most concrete policy announcements from the interview was Altman’s acknowledgment that OpenAI is working on a system to compensate artists whose styles are emulated by AI.

“I think there are incredible new business models that we and others are excited to explore,” Altman said when pressed about apparent IP theft in AI-generated images.
“If you say, ‘I want to generate art in the style of these seven people, all of whom have consented to that,’ how do you divvy up how much money goes to each one?”

Currently, OpenAI’s image generator refuses requests to mimic the style of living artists without consent, but will generate art in the style of movements, genres, or studios. Altman suggested a revenue-sharing model could be forthcoming, though details remain scarce.

Autonomous AI agents: The ‘most consequential safety challenge’ OpenAI has faced

The conversation grew particularly tense when discussing “agentic AI” — autonomous systems that can take actions on the internet on a user’s behalf. OpenAI’s new “Operator” tool allows AI to perform tasks like booking restaurants, raising concerns about safety and accountability.

Anderson challenged Altman: “A single person could let that agent out there, and the agent could decide, ‘Well, in order to execute on that function, I got to copy myself everywhere.’ Are there red lines that you have clearly drawn internally, where you know what the danger moments are?”

Altman referenced OpenAI’s “preparedness framework” but provided few specifics about how the company would prevent misuse of autonomous agents.

“AI that you give access to your systems, your information, the ability to click around on your computer… when they make a mistake, it’s much higher stakes,” Altman acknowledged. “You will not use our agents if you do not trust that they’re not going to empty your bank account or delete your data.”

’14 definitions from 10 researchers’: Inside OpenAI’s struggle to define AGI

In a revealing moment, Altman admitted that even within OpenAI, there’s no consensus on what constitutes artificial general intelligence (AGI) — the company’s stated goal.

“It’s like the joke, if you’ve got 10 OpenAI researchers in a room and asked to define AGI, you’d get 14 definitions,” Altman said.

He suggested that rather than focusing on a specific moment when AGI arrives, we should recognize that “the models are just going to get smarter and more capable and smarter and more capable on this long exponential… We’re going to have to contend and get wonderful benefits from this incredible system.”

Loosening the guardrails: OpenAI’s new approach to content moderation

Altman also disclosed a significant policy change regarding content moderation, revealing that OpenAI has loosened restrictions on its image generation models.

“We’ve given the users much more freedom on what we would traditionally think about as speech harms,” he explained. “I think part of model alignment is following what the user of a model wants it to do within the very broad bounds of what society decides.”

This shift could signal a broader move toward giving users more control over AI outputs, potentially aligning with Altman’s expressed preference


OpenAI slashes prices for GPT-4.1, igniting AI price war among tech giants

OpenAI released GPT-4.1 this morning, directly challenging competitors Anthropic, Google and xAI. By ramping up its coding and context-handling capabilities to a one-million-token window and aggressively cutting API prices, GPT-4.1 is positioning itself as the go-to generative AI model. If you’re managing budgets or crafting code at scale, this pricing shake-up might just make your quarter.

Performance upgrades at Costco prices

The new GPT-4.1 series boasts serious upgrades, including a 54.6% score on the SWE-bench coding benchmark, marking a considerable leap from prior versions. But the buzz isn’t just about better benchmarks. Real-world tests by Qodo.ai on actual GitHub pull requests showed GPT-4.1 beating Anthropic’s Claude 3.7 Sonnet in 54.9% of cases, primarily thanks to fewer false positives and more precise, relevant code suggestions. That’s significant because Claude 3.7 Sonnet has been considered the coding leader among LLMs.

OpenAI’s new pricing structure — openly targeting affordability — might finally tip the scales for teams wary of runaway AI expenses:

Model          Input (per Mtok)   Output (per Mtok)
GPT-4.1        $2.00              $8.00
GPT-4.1 mini   $0.40              $1.60
GPT-4.1 nano   $0.10              $0.40

The standout here? That generous 75% caching discount, effectively incentivizing developers to optimize prompt reuse — particularly beneficial for iterative coding and conversational agents.

Feeling the heat

Anthropic’s Claude models have established their footing by balancing power and cost. But GPT-4.1’s bold pricing undercuts their market position significantly:

Model              Input (per Mtok)   Output (per Mtok)
Claude 3.7 Sonnet  $3.00              $15.00
Claude 3.5 Haiku   $0.80              $4.00
Claude 3 Opus      $15.00             $75.00

Anthropic still offers compelling caching discounts (up to 90% in some scenarios), but GPT-4.1’s base pricing advantage and developer-centric caching improvements position OpenAI as the budget-friendlier choice — particularly appealing for startups and smaller teams.

Hidden financial pitfalls

Gemini’s pricing complexity is becoming increasingly notorious in developer circles. According to Prompt Shield, Gemini’s tiered structure — especially with the powerful 2.5 Pro variant — can quickly escalate into financial nightmares due to surcharges for lengthy inputs and outputs that double past certain context thresholds:

Model                  Input (per Mtok)   Output (per Mtok)
Gemini 2.5 Pro ≤200k   $1.25              $10.00
Gemini 2.5 Pro >200k   $2.50              $15.00
Gemini 2.0 Flash       $0.10              $0.40

Moreover, Gemini lacks an automatic billing shutdown, which Prompt Shield says exposes developers to Denial-of-Wallet attacks — malicious requests designed to deliberately inflate your cloud bill, which Gemini’s current safeguards don’t fully mitigate. GPT-4.1’s predictable, no-surprise pricing seems to be a strategic counter to Gemini’s complexity and hidden risks.

Context is king

xAI’s Grok series, championed by Elon Musk, unveiled API pricing for its latest models last week:

Model             Input (per Mtok)   Output (per Mtok)
Grok-3            $3.00              $15.00
Grok-3 Fast-Beta  $5.00              $25.00
Grok-3 Mini-Fast  $0.60              $4.00

One complicating factor with Grok has been its context window. Musk touted that Grok 3 could handle 1 million tokens (similar to GPT-4.1’s claim), but the current API actually maxes out at 131k tokens, well short of that promise.
This discrepancy drew some criticism from users on X, pointing to a bit of overzealous marketing on xAI’s part. For developers evaluating Grok vs. GPT-4.1, this is notable: GPT-4.1 offers the full 1M context as advertised, whereas Grok’s API might not (at least at launch). In terms of pricing transparency, xAI’s model is straightforward on paper, but the limitations and the need to pay more for “fast” service show the trade-offs of a smaller player trying to compete with industry giants.

Windsurf bets big on GPT-4.1’s developer appeal

Demonstrating high confidence in GPT-4.1’s practical advantages, Windsurf — the AI-powered IDE — has offered an unprecedented free, unlimited GPT-4.1 trial for a week. This isn’t mere generosity; it’s a strategic gamble that once developers experience GPT-4.1’s capabilities and cost savings firsthand, reverting to pricier or less capable models will be a tough sell.

A new era of competitive AI pricing

OpenAI’s GPT-4.1 isn’t just shaking up the pricing game; it’s potentially setting new standards for the AI development community. With precise, reliable outputs verified by external benchmarks, simple pricing transparency, and built-in protections against runaway costs, GPT-4.1 makes a persuasive case for being the default choice in closed-model APIs.

Developers should brace themselves — not just for cheaper AI, but for the domino effect this pricing revolution might trigger as Anthropic, Google, and xAI scramble to keep pace. For teams previously limited by cost, complexity, or both, GPT-4.1 might just be the catalyst for a new wave of AI-powered innovation.
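To put the GPT-4.1 list prices and 75% caching discount quoted above into concrete terms, here is a small worked cost sketch. The prices come from the table earlier in this article; the monthly token volumes and cache hit rates are made-up assumptions for illustration, not real usage data.

```python
# Back-of-the-envelope cost sketch using the GPT-4.1 list prices quoted above
# ($2.00 input / $8.00 output per million tokens, 75% discount on cached input).
# The token volumes below are illustrative assumptions, not real usage data.

INPUT_PER_M = 2.00
OUTPUT_PER_M = 8.00
CACHED_INPUT_PER_M = INPUT_PER_M * 0.25   # 75% caching discount on cached input

def monthly_cost(input_tokens, cached_fraction, output_tokens):
    """Dollar cost for one month of traffic, given a prompt-cache hit rate."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    return (fresh / 1e6 * INPUT_PER_M
            + cached / 1e6 * CACHED_INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M)

# Example: 500M input tokens and 100M output tokens per month.
for hit_rate in (0.0, 0.5, 0.9):
    print(f"cache hit rate {hit_rate:.0%}: ${monthly_cost(500e6, hit_rate, 100e6):,.2f}")
# 0% -> $1,800.00, 50% -> $1,425.00, 90% -> $1,125.00
```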
