VentureBeat

Time Magazine appears to accidentally publish embargoed story confirming new Anthropic model

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More The San Francisco-based AI startup Anthropic had already informed the general public that it planned to make an announcement today, Thursday, May 22nd with a livestream scheduled for 9:30 am PT. Many in the wider AI power user community, especially on X, began to theorize that the announcement would mark the launch of a new Claude model codenamed “Neptune,” based on earlier inspection of Anthropic documents and code online, which some took to be a new version of its powerful Claude large language model (LLM) family, potentially the long-awaited “Claude Opus,” a larger-parameter (denoting more internal settings and complexity) model successor to the current Claude 3.7 Sonnet. Now, it appears that Time Magazine has accidentally confirmed that rumor, publishing and quickly removing an article on its website, according to the observations of eagle-eyed AI programmers and news bloggers on X. Someone also appears to have published a full scrape of the Time article online on the news aggregator app Newsbreak, though that too has now been taken offline. The focus of Time’s piece was on safety risks and mitigations, but it does reveal that the new Claude is smart enough to potentially help novices create new bioweapons. There are scant specifics about the model size, cost, licensing terms, and performance on commonly used third-party AI benchmarks. For those, we will just have to wait until more information is revealed by Anthropic — or leaked by members of the press and AI community. source

Time Magazine appears to accidentally publish embargoed story confirming new Anthropic model Read More »

The future of engineering belongs to those who build with AI, not without it

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More When Salesforce CEO Marc Benioff recently announced that the company would not hire any more engineers in 2025, citing a “30% productivity increase on engineering” due to AI, it sent ripples through the tech industry. Headlines quickly framed this as the beginning of the end for human engineers — AI was coming for their jobs. But those headlines miss the mark entirely. What’s really happening is a transformation of engineering itself. Gartner named agentic AI as its top tech trend for this year. The firm also predicts that 33% of enterprise software applications will include agentic AI by 2028 — a significant portion, but far from universal adoption. The extended timeline suggests a gradual evolution rather than a wholesale replacement. The real risk isn’t AI taking jobs; it’s engineers who fail to adapt and are left behind as the nature of engineering work evolves. The reality across the tech industry reveals an explosion of demand for engineers with AI expertise. Professional services firms are aggressively recruiting engineers with generative AI experience, and technology companies are creating entirely new engineering positions focused on AI implementation. The market for professionals who can effectively leverage AI tools is extraordinarily competitive. While claims of AI-driven productivity gains may be grounded in real progress, such announcements often reflect investor pressure for profitability as much as technological advancement. Many companies are adept at shaping narratives to position themselves as leaders in enterprise AI — a strategy that aligns well with broader market expectations. How AI is transforming engineering work The relationship between AI and engineering is evolving in four key ways, each representing a distinct capability that augments human engineering talent but certainly doesn’t replace it.  AI excels at summarization, helping engineers distill massive codebases, documentation and technical specifications into actionable insights. Rather than spending hours poring over documentation, engineers can get AI-generated summaries and focus on implementation. Also, AI’s inferencing capabilities allow it to analyze patterns in code and systems and proactively suggest optimizations. This empowers engineers to identify potential bugs and make informed decisions more quickly and with greater confidence. Third, AI has proven remarkably adept at converting code between languages. This capability is proving invaluable as organizations modernize their tech stacks and attempt to preserve institutional knowledge embedded in legacy systems. Finally, the true power of gen AI lies in its expansion capabilities — creating novel content like code, documentation or even system architectures. Engineers are using AI to explore more possibilities than they could alone, and we’re seeing these capabilities transform engineering across industries.  In healthcare, AI helps create personalized medical instruction systems that adjust based on a patient’s specific conditions and medical history. In pharmaceutical manufacturing, AI-enhanced systems optimize production schedules to reduce waste and ensure an adequate supply of critical medications. Major banks have invested in gen AI for longer than most people realize, too; they are building systems that help manage complex compliance requirements while improving customer service.  The new engineering skills landscape As AI reshapes engineering work, it’s creating entirely new in-demand specializations and skill sets, like the ability to effectively communicate with AI systems. Engineers who excel at working with AI can extract significantly better results. Similar to how DevOps emerged as a discipline, large language model operations (LLMOps) focuses on deploying, monitoring and optimizing LLMs in production environments. Practitioners of LLMOps track model drift, evaluate alternative models and help to ensure consistent quality of AI-generated outputs. Creating standardized environments where AI tools can be safely and effectively deployed is becoming crucial. Platform engineering provides templates and guardrails that enable engineers to build AI-enhanced applications more efficiently. This standardization helps ensure consistency, security and maintainability across an organization’s AI implementations. Human-AI collaboration ranges from AI merely providing recommendations that humans may ignore, to fully autonomous systems that operate independently. The most effective engineers understand when and how to apply the appropriate level of AI autonomy based on the context and consequences of the task at hand.  Keys to successful AI integration Effective AI governance frameworks — which ranks No. 2 on Gartner’s top trends list — establish clear guidelines while leaving room for innovation. These frameworks address ethical considerations, regulatory compliance and risk management without stifling the creativity that makes AI valuable. Rather than treating security as an afterthought, successful organizations build it into their AI systems from the beginning. This includes robust testing for vulnerabilities like hallucinations, prompt injection and data leakage. By incorporating security considerations into the development process, organizations can move quickly without compromising safety. Engineers who can design agentic AI systems create significant value. We’re seeing systems where one AI model handles natural language understanding, another performs reasoning and a third generates appropriate responses, all working in concert to deliver better results than any single model could provide. As we look ahead, the relationship between engineers and AI systems will likely evolve from tool and user to something more symbiotic. Today’s AI systems are powerful but limited; they lack true understanding and rely heavily on human guidance. Tomorrow’s systems may become true collaborators, proposing novel solutions beyond what engineers might have considered and identifying potential risks humans might overlook. Yet the engineer’s essential role — understanding requirements, making ethical judgments and translating human needs into technological solutions — will remain irreplaceable. In this partnership between human creativity and AI, there lies the potential to solve problems we’ve never been able to tackle before — and that’s anything but a replacement. Rizwan Patel is head of information security and emerging technology at Altimetrik.  source

The future of engineering belongs to those who build with AI, not without it Read More »

Less is more: Meta study shows shorter reasoning improves AI accuracy by 34%

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Researchers from Meta’s FAIR team and The Hebrew University of Jerusalem have discovered that forcing large language models to “think” less actually improves their performance on complex reasoning tasks. The study released today found that shorter reasoning processes in AI systems lead to more accurate results while significantly reducing computational costs. “In this work, we challenge the assumption that long thinking chains results in better reasoning capabilities,” write the authors in their paper titled “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning.” The research contradicts the prevailing trend in AI development, where companies have invested heavily in scaling up computing resources to allow models to perform extensive reasoning through lengthy “thinking chains” — detailed step-by-step trajectories that AI systems use to solve complex problems. AI accuracy jumps 34% when models use shorter reasoning chains The researchers discovered that within the same reasoning task, “shorter reasoning chains are significantly more likely to yield correct answers — up to 34.5% more accurate than the longest chain sampled for the same question.” This finding held true across multiple leading AI models and benchmarks. “While demonstrating impressive results, [extensive reasoning] incurs significant computational costs and inference time,” the authors note, pointing to a substantial inefficiency in how these systems are currently deployed. Based on these findings, the team developed a novel approach called “short-m@k,” which executes multiple reasoning attempts in parallel but halts computation once the first few processes complete. The final answer is then selected through majority voting among these shorter chains. New ‘short-m@k’ method slashes computing costs by 40% while boosting performance For organizations deploying large AI reasoning systems, the implications could be substantial. The researchers found their method could reduce computational resources by up to 40% while maintaining the same level of performance as standard approaches. “Short-3@k, while slightly less efficient than short-1@k, consistently surpasses majority voting across all compute budgets, while still being substantially faster (up to 33% wall time reduction),” the paper states. Michael Hassid, the paper’s lead author, and his team also discovered that training AI models on shorter reasoning examples improved their performance — challenging another fundamental assumption in AI development. “Training on the shorter ones leads to better performance,” the researchers write. “Conversely, finetuning on S1-long increases reasoning time with no significant performance gains.” Tech giants could save millions by implementing “don’t overthink it” approach The findings come at a critical time for the AI industry, as companies race to deploy increasingly powerful models that consume enormous computational resources. “Our findings suggest rethinking current methods of test-time compute in reasoning LLMs, emphasizing that longer ‘thinking’ does not necessarily translate to improved performance and can, counter-intuitively, lead to degraded results,” the researchers conclude. ‘This research stands in contrast to other prominent approaches. Previous influential studies, including OpenAI’s work on “chain-of-thought” prompting and “self-consistency” methods, have generally advocated for more extensive reasoning processes. It also builds upon recent work like Princeton and Google DeepMind’s “Tree of Thoughts” framework and Carnegie Mellon’s “Self-Refine” methodology, which have explored different approaches to AI reasoning. For technical decision makers evaluating AI investments, the research suggests that bigger and more computationally intensive isn’t always better. The study points toward potential cost savings and performance improvements by optimizing for efficiency rather than raw computing power. In an industry obsessed with scaling up, it turns out that teaching AI to be more concise doesn’t just save computing power — it makes the machines smarter too. Sometimes, even artificial intelligence benefits from the age-old wisdom: don’t overthink it. source

Less is more: Meta study shows shorter reasoning improves AI accuracy by 34% Read More »

Emotive voice AI startup Hume launches new EVI 3 model with rapid custom voice creation

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More New York-based AI startup Hume has unveiled its latest Empathic Voice Interface (EVI) conversational AI model, EVI 3 (pronounced “Evee” Three, like the Pokémon character), targeting everything from powering customer support systems and health coaching to immersive storytelling and virtual companionship. EVI 3 lets users create their own voices by talking to the model (it’s voice-to-voice/speech-to-speech), and aims to set a new standard for naturalness, expressiveness, and “empathy” according to Hume — that is, how users perceive the model’s understanding of their emotions and its ability to mirror or adjust its own responses, in terms of tone and word choice. Designed for businesses, developers, and creators, EVI 3 expands on Hume’s previous voice models by offering more sophisticated customization, faster responses, and enhanced emotional understanding. Individual users can interact with it today through Hume’s live demo on its website and iOS app, but developer access through Hume’s proprietary application programming interface (API) is said to be made available in “the coming weeks,” as a blog post from the company states. At that point, developers will be able to embed EVI 3 into their own customer service systems, creative projects, or virtual assistants — for a price (see below). My own usage of the demo allowed me to create a new, custom synthetic voice in seconds based on qualities I described to it — a mix of warm and confident, and a masculine tone. Speaking to it felt more naturalistic and easy than other AI models and certainly the stock voices from legacy tech leaders such Apple with Siri and Amazon with Alexa. What developers and businesses should know about EVI 3 Hume’s EVI 3 is built for a range of uses—from customer service and in-app interactions to content creation in audiobooks and gaming. It allows users to specify precise personality traits, vocal qualities, emotional tone, and conversation topics. This means it can produce anything from a warm, empathetic guide to a quirky, mischievous narrator—down to requests like “a squeaky mouse whispering urgently in a French accent about its scheme to steal cheese from the kitchen.” EVI 3’s core strength lies in its ability to integrate emotional intelligence directly into voice-based experiences. Unlike traditional chatbots or voice assistants that rely heavily on scripted or text-based interactions, EVI 3 adapts to how people naturally speak — picking up on pitch, prosody, pauses, and vocal bursts to create more engaging, humanlike conversations. However, one big feature Hume’s models currently lack — and which is offered by rivals open source and proprietary, such as ElevenLabs — is voice cloning, or the rapid replication of a user’s or other voice, such as a company CEO. Yet Hume has indicated it will add such a capability to its Octave text-to-speech model, as it is noted as “coming soon” on Hume’s website, and prior reporting by yours truly on the company found it will allow users to replicate voices from as little as five seconds of audio. Hume has stated it’s prioritizing safeguards and ethical considerations before making this feature broadly available. Currently, this cloning capability is not available in EVI itself, with Hume emphasizing flexible voice customization instead. Internal benchmarks show users prefer EVI 3 to OpenAI’s GPT-4o voice model According to Hume’s own tests with 1,720 users, EVI 3 was preferred over OpenAI’s GPT-4o in every category evaluated: naturalness, expressiveness, empathy, interruption handling, response speed, audio quality, voice emotion/style modulation on request, and emotion understanding on request (the “on request” features are covered in “instruction following” seen below). It also usually bested Google’s Gemini model family and the new open source AI model firm Sesame from former Oculus co-creator Brendan Iribe. It also boasts lower latency (~300 milliseconds), robust multilingual support (English and Spanish, with more languages coming), and effectively unlimited custom voices. As Hume writes on its website (see screenshot immediately below): Key capabilities include: Prosody generation and expressive text-to-speech with modulation. Interruptibility, enabling dynamic conversational flow. In-conversation voice customizability, so users can adjust speaking style in real time. API-ready architecture (coming soon), so developers can integrate EVI 3 directly into apps and services. Pricing and developer access Hume offers flexible, usage-based pricing across its EVI, Octave TTS, and Expression Measurement APIs. While EVI 3’s specific API pricing has not been announced yet (marked as TBA), the pattern suggests it will be usage-based, with enterprise discounts available for large deployments. For reference, EVI 2 is priced at $0.072 per minute — 30% lower than its predecessor, EVI 1 ($0.102/minute). For creators and developers working with text-to-speech projects, Hume’s Octave TTS plans range from a free tier (10,000 characters of speech, ~10 minutes of audio) to enterprise-level plans. Here’s the breakdown: Free: 10,000 characters, unlimited custom voices, $0/month Starter: 30,000 characters (~30 minutes), 20 projects, $3/month Creator: 100,000 characters (~100 minutes), 1,000 projects, usage-based overage ($0.20/1,000 characters), $10/month Pro: 500,000 characters (~500 minutes), 3,000 projects, $0.15/1,000 extra, $50/month Scale: 2,000,000 characters (~2,000 minutes), 10,000 projects, $0.13/1,000 extra, $150/month Business: 10,000,000 characters (~10,000 minutes), 20,000 projects, $0.10/1,000 extra, $900/month Enterprise: Custom pricing and unlimited usage For developers working on real-time voice interactions or emotional analysis, Hume also offers a Pay as You Go plan with $20 in free credits and no upfront commitment. High-volume enterprise customers can opt for a dedicated Enterprise plan featuring dataset licenses, on-premises solutions, custom integrations, and advanced support. Hume’s history of emotive AI voice models Founded in 2021 by Alan Cowen, a former researcher at Google DeepMind, Hume aims to bridge the gap between human emotional nuance and AI interaction. The company trained its models on an expansive dataset drawn from hundreds of thousands of participants worldwide—capturing not just speech and text, but also vocal bursts and facial expressions. “Emotional intelligence includes the ability to infer intentions and preferences from behavior. That’s the very core of what AI interfaces are trying to achieve,” Cowen told VentureBeat. Hume’s mission is to make

Emotive voice AI startup Hume launches new EVI 3 model with rapid custom voice creation Read More »

Microsoft announces over 50 AI tools to build the ‘agentic web’ at Build 2025

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Microsoft launched a comprehensive strategy to position itself at the center of what it calls the “open agentic web” at its annual Build conference this morning, introducing dozens of AI tools and platforms designed to help developers create autonomous systems that can make decisions and complete tasks with limited human intervention. The Redmond, Wash.-based technology giant introduced more than 50 announcements spanning its entire product portfolio, from GitHub and Azure to Windows and Microsoft 365, all focused on advancing AI agent technologies that can work independently or collaboratively to solve complex business problems. “We’ve entered the era of AI agents,” said Frank Shaw, Microsoft’s Chief Communications Officer, in a blog post coinciding with the Build announcements. “Thanks to groundbreaking advancements in reasoning and memory, AI models are now more capable and efficient, and we’re seeing how AI systems can help us all solve problems in new ways.” How AI agents transform software development through autonomous capabilities The concept of the “agentic web” moves far beyond today’s AI assistants. While current AI tools mainly respond to human questions and commands, agents actively initiate tasks, make decisions independently, coordinate with other AI systems, and complete complex workflows with minimal human supervision. This marks a fundamental shift in how AI systems operate and interact with both users and other technologies. Kevin Scott, Microsoft’s CTO, described this shift during a press conference as fundamentally changing how humans interact with technology: “Reasoning will continue to improve. We’re going to see great progress there. But there are a handful of new things that have to start happening pretty quickly in order for agents to be the recipients of more complicated work.” One critical missing element, according to Scott, is memory: “One of the things that is quite conspicuously missing right now in agents is memory.” To address this, Microsoft is introducing several memory-related technologies, including structured RAG (Retrieval-Augmented Generation), which helps AI systems more precisely recall information from large volumes of data. “You will likely have a personal agent and a work agent, and the work agent is going to have a whole bunch of your employer’s information that belongs to both you and your employer,” explained Steven Bathiche, CVP and technical fellow at Microsoft, during a presentation about agents. Bathiche emphasized that this contextual awareness is crucial for creating agents that “understand you well, contextualize where you are and what you want to do, and ultimately understand you so that you can click fewer buttons at the end of the day.” This shift from purely reactive AI to systems with persistent memory represents one of the most profound aspects of the agentic revolution. GitHub evolves from code completion to autonomous developer experience Microsoft is placing GitHub, its popular developer platform, at the forefront of its agentic strategy with the introduction of the GitHub Copilot coding agent, which goes beyond suggesting code snippets to autonomously solving programming tasks. The new GitHub Copilot coding agent can now operate as a member of software development teams, autonomously refactoring code, improving test coverage, fixing defects, and even implementing new features. For complex tasks, GitHub Copilot can collaborate with other agents across all stages of the software lifecycle. Microsoft is also open-sourcing GitHub Copilot Chat in Visual Studio Code, allowing the developer community to contribute to its evolution. This reflects Microsoft’s dual approach of both leading AI innovation while embracing open-source principles. “Over the next few months, the AI-powered capabilities from the GitHub Copilot extensions will be part of the VS Code open-source repository, the same open-source repository that drives the most popular software development tool,” the company explained in its announcement, emphasizing its commitment to transparency and community-driven innovation. Multi-agent systems enable complex business workflows and process automation For businesses looking to deploy AI agents, Microsoft unveiled significant updates to its Azure AI Foundry, a platform for developing and managing AI applications and agents. Ray Smith, VP of AI Agents at Microsoft, highlighted the importance of multi-agent systems in an exclusive interview with VentureBeat: “Multi-agent invocation, debugging and drilling down into those multiple agents is key, and that extends beyond just Copilot Studio to what’s coming with Azure AI Foundry agents. Our customers have consistently emphasized that this multi-agent capability is essential for their needs.” Smith explained why splitting tasks across multiple agents is crucial: “It’s very hard to create a reliable process that you squeeze into one agent. Breaking it up into parts improves maintainability and makes building solutions easier, but it also significantly enhances reliability as well.” The Azure AI Foundry Agent Service, now generally available, allows developers to build enterprise-grade AI agents with support for multi-agent workflows and open protocols like Agent2Agent (A2A) and Model Context Protocol (MCP). This enables organizations to orchestrate multiple specialized agents to handle complex tasks. Local AI capabilities expand as processing power shifts to client devices While cloud-based AI has dominated headlines, Microsoft is making a significant push toward local, on-device AI with several announcements targeting developers who want to deploy AI directly on user devices. Windows AI Foundry, an evolution of Windows Copilot Runtime, provides a unified platform for local AI development on Windows. It includes Windows ML, a built-in AI inferencing runtime, and tools for preparing and optimizing models for on-device deployment. “Foundry Local will make it easy to run AI models, tools and agents directly on-device, whether Windows 11 or MacOS,” the company announced. “Leveraging ONNX Runtime, Foundry Local is designed for situations where users can save on internet data usage, prioritize privacy and reduce costs.” Steven Bathiche explained during a presentation how client-side AI has advanced remarkably fast: “We’re super busy trying to essentially predict and stay ahead. Most of our predictions come true within three or four months, which is kind of crazy, because I’m used to predicting a year or two years out, and then feeling good about that timeline. Now it’s like we’re stressed

Microsoft announces over 50 AI tools to build the ‘agentic web’ at Build 2025 Read More »

DeepSeek R1-0528 arrives in powerful open source challenge to OpenAI o3 and Google Gemini 2.5 Pro

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More The whale has returned. After rocking the global AI and business community early this year with the January 20 initial release of its hit open source reasoning AI model R1, the Chinese startup DeepSeek — a spinoff of formerly only locally well-known Hong Kong quantitative analysis firm High-Flyer Capital Management — has released DeepSeek-R1-0528, a significant update that brings DeepSeek’s free and open model near parity in reasoning capabilities with proprietary paid models such as OpenAI’s o3 and Google Gemini 2.5 Pro This update is designed to deliver stronger performance on complex reasoning tasks in math, science, business and programming, along with enhanced features for developers and researchers. Like its predecessor, DeepSeek-R1-0528 is available under the permissive and open MIT License, supporting commercial use and allowing developers to customize the model to their needs. Open-source model weights are available via the AI code sharing community Hugging Face, and detailed documentation is provided for those deploying locally or integrating via the DeepSeek API. Existing users of the DeepSeek API will automatically have their model inferences updated to R1-0528 at no additional cost. The current cost for DeepSeek’s API is For those looking to run the model locally, DeepSeek has published detailed instructions on its GitHub repository. The company also encourages the community to provide feedback and questions through their service email. Individual users can try it for free through DeepSeek’s website here, though you’ll need to provide a phone number or Google Account access to sign in. Enhanced reasoning and benchmark performance At the core of the update are significant improvements in the model’s ability to handle challenging reasoning tasks. DeepSeek explains in its new model card on HuggingFace that these enhancements stem from leveraging increased computational resources and applying algorithmic optimizations in post-training. This approach has resulted in notable improvements across various benchmarks. In the AIME 2025 test, for instance, DeepSeek-R1-0528’s accuracy jumped from 70% to 87.5%, indicating deeper reasoning processes that now average 23,000 tokens per question compared to 12,000 in the previous version. Coding performance also saw a boost, with accuracy on the LiveCodeBench dataset rising from 63.5% to 73.3%. On the demanding “Humanity’s Last Exam,” performance more than doubled, reaching 17.7% from 8.5%. These advances put DeepSeek-R1-0528 closer to the performance of established models like OpenAI’s o3 and Gemini 2.5 Pro, according to internal evaluations — both of those models either have rate limits and/or require paid subscriptions to access. UX upgrades and new features Beyond performance improvements, DeepSeek-R1-0528 introduces several new features aimed at enhancing the user experience. The update adds support for JSON output and function calling, features that should make it easier for developers to integrate the model’s capabilities into their applications and workflows. Front-end capabilities have also been refined, and DeepSeek says these changes will create a smoother, more efficient interaction for users. Additionally, the model’s hallucination rate has been reduced, contributing to more reliable and consistent output. One notable update is the introduction of system prompts. Unlike the previous version, which required a special token at the start of the output to activate “thinking” mode, this update removes that need, streamlining deployment for developers. Smaller variants for those with more limited compute budgets Alongside this release, DeepSeek has distilled its chain-of-thought reasoning into a smaller variant, DeepSeek-R1-0528-Qwen3-8B, which should help those enterprise decision-makers and developers who don’t have the hardware necessary to run the full This distilled version reportedly achieves state-of-the-art performance among open-source models on tasks such as AIME 2024, outperforming Qwen3-8B by 10% and matching Qwen3-235B-thinking. According to Modal, running an 8-billion-parameter large language model (LLM) in half-precision (FP16) requires approximately 16 GB of GPU memory, equating to about 2 GB per billion parameters. Therefore, a single high-end GPU with at least 16 GB of VRAM, such as the NVIDIA RTX 3090 or 4090, is sufficient to run an 8B LLM in FP16 precision. For further quantized models, GPUs with 8–12 GB of VRAM, like the RTX 3060, can be used. DeepSeek believes this distilled model will prove useful for academic research and industrial applications requiring smaller-scale models. Initial AI developer and influencer reactions The update has already drawn attention and praise from developers and enthusiasts on social media. Haider aka “@slow_developer” shared on X that DeepSeek-R1-0528 “is just incredible at coding,” describing how it generated clean code and working tests for a word scoring system challenge, both of which ran perfectly on the first try. According to him, only o3 had previously managed to match that performance. Meanwhile, Lisan al Gaib posted that “DeepSeek is aiming for the king: o3 and Gemini 2.5 Pro,” reflecting the consensus that the new update brings DeepSeek’s model closer to these top performers. Another AI news and rumor influencer, Chubby, commented that “DeepSeek was cooking!” and highlighted how the new version is nearly on par with o3 and Gemini 2.5 Pro. Chubby even speculated that the last R1 update might indicate that DeepSeek is preparing to release its long-awaited and presumed “R2” frontier model soon, as well. Looking Ahead The release of DeepSeek-R1-0528 underscores DeepSeek’s commitment to delivering high-performing, open-source models that prioritize reasoning and usability. By combining measurable benchmark gains with practical features and a permissive open-source license, DeepSeek-R1-0528 is positioned as a valuable tool for developers, researchers, and enthusiasts looking to harness the latest in language model capabilities. Let me know if you’d like to add any more quotes, adjust the tone further, or highlight additional elements! source

DeepSeek R1-0528 arrives in powerful open source challenge to OpenAI o3 and Google Gemini 2.5 Pro Read More »

Mistral launches API for building AI agents that run Python, generate images, perform RAG and more

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More The well-funded and innovative French AI startup Mistral AI is introducing a new service for enterprise customers and independent software developers alike. Mistral’s Agents application programming interface (API) allows third-party software developers to easily and rapidly add autonomous generative AI capabilities — such as pulling information securely from enterprise documents — to their existing enterprise and independent applications using the newest Mistral proprietary model, Medium 3, as the “brains” of each agent. It’s essentially designed as a “plug and play” platform with nearly limitless customization for getting AI agents up and running to handle enterprise and developer workflows. “With Agents API, we empower enterprises to use AI in more practical and impactful ways,” wrote Mistral’s Head of Developer Relations, Sophia Yang, on X. Designed to complement Mistral’s existing Chat Completion API, this latest release focuses on agentic orchestration, built-in connectors, persistent memory and the flexibility to coordinate multiple AI agents to tackle complex tasks. Surpassing the limits of typical LLMs… While traditional language models excel at generating text, they often fail to execute actions or maintain conversational context over time. Mistral’s Agents API addresses these limitations by providing developers with the tools to create AI agents capable of performing real-world tasks, managing interactions across conversations and dynamically orchestrating multiple agents when needed. The Agents API comes equipped with several built-in connectors, including: Code Execution: Securely runs Python code, enabling applications in data visualization, scientific computing and other technical tasks. Image Generation: Leverages Black Forest Lab FLUX1.1 [pro] Ultra to create custom visuals for marketing, education or artistic uses. Document Library: Accesses documents stored in Mistral Cloud, enhancing retrieval-augmented generation (RAG) features. Web Search: Allows agents to retrieve up-to-date information from online sources, news outlets and other reputable platforms. The API also supports MCP tools, which connect agents to external resources like APIs, databases, user data and documents—extending the agents’ abilities to handle dynamic, real-world content. Enhanced accuracy using web search One significant feature of the Agents API is the integration of web search as a connector, which notably improves performance on tasks requiring accurate, up-to-date information. In benchmark testing on the SimpleQA dataset, Mistral Large’s accuracy rose from 23% to 75% when web search was enabled. Mistral Medium showed a similar improvement, increasing from 22.08% to 82.32%. Real-world use cases Mistral AI has highlighted a range of use cases for the Agents API, demonstrating its flexibility across multiple sectors: Coding Assistant with GitHub: An agent oversees a developer assistant powered by DevStral, managing tasks and automating code development workflows. Linear Tickets Assistant: Transforms call transcripts into project deliverables using a multi-server MCP architecture. Financial Analyst: Sources financial metrics and securely compiles reports through orchestrated MCP servers. Travel Assistant: Helps users plan trips, book accommodations and manage travel needs. Nutrition Assistant: Supports users in setting dietary goals, logging meals and receiving personalized recommendations. Managing context and conversations The Agents API’s stateful conversation system ensures that agents maintain context throughout their interactions. Developers can start or continue new conversations without losing the thread, with conversation history stored and accessible for future use. Additionally, the API supports streaming output, enabling real-time updates in response to user requests or agent actions. Dynamic orchestration of multiple agents A core capability of the Agents API is its ability to coordinate multiple agents seamlessly. Developers can create customized workflows, assigning specific tasks to specialized agents and enabling handoffs as needed. This modular approach allows enterprises to deploy AI agents that work together to solve complex problems more effectively. What the Mistral Agents API means for enterprise technical decision-makers For senior-level engineers working at enterprise organizations, the Mistral Agents API represents a powerful addition to their AI toolkit. The ability to dynamically orchestrate agents and seamlessly integrate real-world data sources means these roles can deploy AI solutions faster and with greater precision—critical in environments where quick iteration and performance tuning are paramount. Specifically, these professionals often balance tight deployment timelines and the need to maintain model performance across different environments. The Agents API’s built-in connectors—like web search, document libraries, and secure code execution—can significantly reduce the need for ad hoc integrations and patchwork tooling. This streamlined approach saves time and lowers friction, allowing teams to focus more on fine-tuning models and less on building surrounding infrastructure. Moreover, stateful conversation management and real-time updates through streaming output align well with AI orchestration and deployment demands. These features make it easier for engineers to maintain context across iterations and ensure consistent, high-quality interactions with end users. For those responsible for introducing and integrating new AI tools into organizational workflows, the MCP tool support also ensures that agents can access data from a wide range of APIs and systems, further enhancing operational efficiency. Continuing to bolster Mistral’s enterprise AI push The release of the Agents API follows Mistral AI’s recent launch of Le Chat Enterprise, a unified AI assistant platform designed for enterprise productivity and data privacy. Le Chat Enterprise is powered by the new Mistral Medium 3 model, which offers impressive performance at a lower computational cost than larger models. Mistral Medium 3 is particularly strong in software development tasks, outperforming comparable models in key coding benchmarks like HumanEval and MultiPL-E. It also shows competitive performance in multilingual and multimodal scenarios, making it an attractive option for businesses operating in diverse environments. Le Chat Enterprise supports enterprise-grade features such as data sovereignty, hybrid deployment, and strict access controls, which can be crucial for organizations in regulated sectors. The platform consolidates AI functionality within a single environment, enabling customization, seamless integration with existing workflows, and full control over deployment and data security. But it’s another proprietary service Mistral’s earlier releases, like Mistral 7B, were open source and widely embraced by the developer community for their transparency and flexibility. However, Mistral Medium 3 is a proprietary model—requiring access through Mistral’s platform, APIs or partners—and is no longer available under an open-source license.

Mistral launches API for building AI agents that run Python, generate images, perform RAG and more Read More »

Anthropic debuts Claude conversational voice mode on mobile that searches your Google Docs, Drive, Calendar

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More San Francisco AI startup Anthropic has more up its sleeve than the new Claude Opus 4 and Sonnet 4 large language models (LLMs) announced last week — today it has unveiled two major updates for its similarly named Claude AI chatbot: a new conversational voice mode available on its mobile apps in the Apple App Store (iOS devices, iPhones) and Google Play Store (Android devices). In addition, the AI startup that rivals OpenAI (and was formed by some of its defectors) is also extending web search to all users on free plans. These updates aim to make Claude more versatile and accessible to a wider audience. “Early implementation of voice but something that I’ve already found very fun and useful,” wrote Alex Albert, Anthropic’s Head of Claude Relations, on the social network X. “Let us know how you like it (the good and the bad) so we can make it much better in the future!” However, Claude’s conversational voice interface is limited to English for now, and there has not yet been mention of API support for the feature, nor web support — meaning it is restricted to individual mobile app users for now. Hitting OpenAI where it hurts The new Claude voice mode is said to be rolling out over the next few weeks to all mobile app users, according to Anthropic. While rival OpenAI has offered a conversational voice mode on ChatGPT since late 2023 and upgraded it significantly several times, Claude’s new conversational mode brings it up to parity and then some, offering features OpenAI does not. As shown in a promotional video posted on X, users of Claude’s mobile apps can now ask via the conversational voice interface for Claude to check their Google Calendars, Gmail, and Google Docs for specific information that the chatbot will summarize and read back to them audibly, including upcoming appointments and presentation materials. While the conversational interface and web search are available to users of Claude’s free tier, the integration with outside apps and tools is available only to paying subscribers to Claude Pro ($20 per month or $214.99 per year up-front) and Claude Max ($100 monthly per user). As with OpenAI, users can choose from differing voice options — in Claude’s case, they’re known as “Buttery, Airy, Mellow, Glassy and Rounded” — each with distinct tones, accents and conversational quirks. Voice conversations generate full transcripts and voice mode summaries. Additionally, Claude provides visual notes that capture key insights from each discussion, giving users an easy way to review and revisit important points. Seamless transitions between text and voice, plus rich media support One notable feature of voice mode is the ability to switch seamlessly between text and voice interactions without losing conversation context. This flexibility supports different user preferences and use cases. Beyond spoken dialogue, voice mode also handles rich media interactions. Users can discuss documents, images and complex information using voice commands while Claude maintains the conversational flow. This allows for deeper engagement with content and easier access to insights. For Pro Plan users and above, voice mode also integrates personal information sources—such as emails, calendar events and documents—alongside real-time web search results. This combination of data sources offers a more comprehensive and actionable conversational experience. Web search for all In parallel with the voice mode rollout, Anthropic has expanded access to web search by making it available to all users on free plans. This new capability enables Claude to draw on real-time internet data, offering fresher and more accurate responses to questions about breaking news, market trends and other dynamic topics. Web search for free plans adds to Claude’s growing toolkit of integrations and knowledge resources, making it easier for users to get relevant answers and stay current. Anthropic’s broader vision Anthropic notes that voice technology isn’t new territory for the company. Beyond the speech-to-text features in Claude’s mobile apps, Anthropic also powers Amazon’s Alexa+ and Otter AI’s transcription services. These experiences inform the development of the new voice mode and its potential to integrate with other parts of the user’s digital life. These updates join a broader suite of enhancements to Claude, including the launch of Claude 4, integrations with Google Workspace, and expanded research capabilities. A push toward more versatile user interactivity Anthropic highlighted the ease with which users can start a voice conversation and ask Claude to summarize calendar entries or search for documents—demonstrating the platform’s expanded capabilities. Anthropic has also shared media assets to provide additional resources for users interested in learning more about the updates. With the rollout of voice mode in beta and web search now included for free plans, Anthropic continues to broaden the functionality and accessibility of Claude’s AI services. These updates represent another step forward in making conversational AI more adaptable and relevant to users’ daily tasks. source

Anthropic debuts Claude conversational voice mode on mobile that searches your Google Docs, Drive, Calendar Read More »

Google just leapfrogged every competitor with mind-blowing AI that can think deeper, shop smarter, and create videos with dialogue

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Google announced a sweeping set of artificial intelligence advancements Tuesday at its annual I/O developer conference, introducing more powerful AI models, expanding its search capabilities, and launching new creative tools that push the boundaries of what its technology can accomplish. The Mountain View-based company unveiled Gemini 2.5 enhancements, rolled out AI Mode in Search to all U.S. users, introduced new generative media models, and launched a premium $249.99 monthly subscription tier called Google AI Ultra for power users — all reflecting Google’s accelerating AI momentum across its product ecosystem. “More intelligence is available, for everyone, everywhere. And the world is responding, adopting AI faster than ever before,” said Sundar Pichai, CEO of Google and Alphabet, during a press briefing ahead of the conference. “What all this progress means is that we’re in a new phase of the AI platform shift, where decades of research are now becoming reality for people, businesses and communities all over the world.” Enhanced reasoning: Gemini 2.5 models introduce revolutionary “Deep Think” capabilities At the center of Google’s announcements is the continued evolution of its Gemini large language models, with significant improvements to both the Pro and Flash versions. The updated Gemini 2.5 Flash will be generally available in early June, with Pro following shortly after. Most notable is the introduction of “Deep Think,” an enhanced reasoning mode for the Pro model that Google claims delivers breakthrough performance on complex tasks by using parallel thinking techniques. The company says this approach allows the model to consider multiple possibilities simultaneously, similar to how AlphaGo revolutionized game playing. “Deep Think pushes model performance to its limits, delivering groundbreaking results,” said Demis Hassabis, CEO of Google DeepMind, during the press briefing. “It gets an impressive score on USAMO 2025, one of the hardest maths benchmarks. It also leads on LiveCodeBench, a benchmark for competition-level coding.” The company is proceeding cautiously with Deep Think, planning to first make it available to trusted testers for feedback before wider release. This measured approach reflects Google’s emphasis on responsible AI deployment, especially for frontier capabilities that push the boundaries of what AI can accomplish. Reimagining search: AI Mode expands with personalization and agentic features Google is bringing AI deeper into its core search product, rolling out “AI Mode” to all U.S. users after previously limiting it to Labs testers. This alternative search experience uses a technique called “query fan-out” to break questions into subtopics and issue multiple simultaneous searches, delivering more comprehensive results than traditional search. “AI Mode is our most powerful AI search with more advanced reasoning and multimodality, and the ability to go deeper through follow-up questions and helpful links to the web,” said Liz Reid, VP and Head of Google Search. The company revealed impressive metrics around its existing AI Overviews feature, which now reaches more than 1.5 billion users. “In our biggest markets like the U.S. and India, AI overviews is driving over 10% increase in usage of Google for the types of queries that show AI overviews,” Reid noted during the preview. New features coming to AI Mode include Deep Search for comprehensive research reports, Live capabilities for real-time visual assistance, and personalization options that can incorporate data from users’ Google accounts. This personalization, which requires explicit user opt-in, aims to deliver more relevant results by understanding individual preferences and contexts. Google is making a significant push into AI-powered shopping experiences, introducing a virtual try-on feature that allows users to see how clothes would look on them using just a single photo of themselves. The technology represents a major advancement in making online shopping more intuitive and personalized. “This is a situation where I found maybe five dresses that I like, and I see how it looks on the website and on the models there. However, I look nothing like those models, and I’m wondering which one will really work for me,” explained Vidhya Srinivasan, VP and General Manager of Ads and Commerce. The system is powered by a specialized image generation model designed specifically for fashion applications. According to Srinivasan, it has “a very deep understanding of 3D shapes” and fabrics, allowing it to realistically render how clothing items would drape and fit on different body types. Beyond visual try-on, Google is also introducing agentic checkout capabilities that can automatically complete purchases when items reach a user-specified price point. This feature handles the entire checkout process through Google Pay, showcasing how Google is applying its agentic AI capabilities to streamline everyday tasks. Google unveiled significant upgrades to its generative media models, introducing Veo 3 for video generation and Imagen 4 for images. The most dramatic advancement comes in Veo 3’s ability to generate videos with synchronized audio — including ambient sounds, effects, and character dialogue. “For the first time, we’re emerging from the silent era of video generation,” said Hassabis. “Not only does Veo 3 offer even more stunning visual quality, but it can also generate sound effects, background noises and even dialog.” These advanced models power Flow, Google’s new AI filmmaking tool designed for creative professionals. Flow integrates Google’s best AI models to help storytellers create cinematic clips and scenes with a more intuitive interface. “Flow is inspired by what it feels like when time slows down and creation is effortless, iterative and full of possibility,” according to a company statement. The tool has already been tested with several filmmakers who have created short films using the technology in combination with traditional methods. Imagen 4, meanwhile, delivers improvements in image quality, with particular attention to typography and text rendering — making it especially valuable for creating marketing materials, presentations, and other content that combines visuals and text. Immersive communication: Google Beam evolves from Project Starline research The company announced that Project Starline, its experimental 3D video communication technology first showcased several years ago, is evolving into a commercial product called Google Beam. This technology creates the

Google just leapfrogged every competitor with mind-blowing AI that can think deeper, shop smarter, and create videos with dialogue Read More »

Everyone’s looking to get in on vibe coding — and Google is no different with Stitch, its follow-up to Jules

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Vibe coding is arguably one of the hottest trends in tech right now, as it reflects a wider adoption of AI and natural language prompts for basic code completion (challenging the conventional coding mindset that humans should complete downstream tasks). Google is releasing Stitch, a new experiment from Google Labs, to compete with Microsoft, AWS, and other existing end-to-end coding tools. Now in beta, the platform designs user interfaces (UIs) with one prompt—and some developers are already gushing.  “Google dropped the most powerful UI designer in the world,” Brendan Jowett, owner of voice AI company Inflate AI, posted on X.  The use of AI in programming and development certainly isn’t new, but the concept of “vibe coding” — coined by OpenAI cofounder Andrej Karpathy earlier this year — is a newer concept incorporating generative AI to automate coding tasks typically done manually. This goes beyond existing AI assistants and drag-and-drop no-code and low-code tools: The focus is on the end result, not the journey there.  “You finally give into the vibes, embrace exponentials and forget that code even exists,” Karpathy wrote on X. Top players in the integrated development environment (IDE) space include Windsurf (formerly Codeium), Cursor, Replit, Lovable, Bolt, Devin and Aider. Anthropic also recently launched its command-line AI agent Claude Code.    Larger players in addition to Google are looking to stake their claim, as well: Amazon Web Services (AWS) is offering its Amazon Q Developer AI assistant as an add-on for developers to access directly at any point in their coding; Microsoft released GitHub Copilot agent mode; OpenAI is looking to extend its capabilities in vibe coding with its Codex update and intended $3 billion purchase of Windsurf; and Agentforce is writing about 20% of Salesforce’s code.  Google, for its part, also recently released autonomous coding agent Jules into beta.  Stitch builds UIs with a prompt With Google Stitch, users can designate whether they want to build a dashboard or web or mobile app and describe what it should look like (such as color palettes or the user experience they’re going for).  The platform instantly generates HTML, CSS+ and templates with editable components that devs and non-devs can customize and edit (such as instructing Stitch to add a search function to the home screen). They can then add directly to apps or export to Figma.  “Design is an iterative process, and Stitch facilitates this by allowing you to generate multiple variants of your interface,” Google Labs researchers explain. “Experiment with different layouts, components and styles to achieve the desired look and feel.”  Users can choose a ‘standard mode’ that runs on Gemini 2.5 Flash or switch to an ‘experimental mode’ that uses Gemini Pro and allows users to upload visual elements such as screenshots, wireframes and sketches to guide what the platform generates. Google also plans to release a feature allowing users to annotate screenshots to make changes.  Stitch is “meant for quick first drafts, wireframes and MVP-ready frontends,” Jowett notes in his X thread.  Some say layouts are ‘unreal’; others call Bolt superior Many users have offered early praise. One noted: “I tried Stitch with a ‘crypto wallet dashboard’ prompt and it nailed the layout in under 10 seconds. Unreal.”  X user “God of Prompt” posted: “Honestly shocked this isn’t getting more attention. A real UI generator backed by Gemini with Figma export? Instant use-case.” However, others found the beta version less than optimal. Elizabeth Alli of DesignerUp outlined her experiences in a blog post: The dev prompted Stitch to make an app to help build mindfulness habits. She reported that it “missed the mark” on design elements (such as the colors she was looking for) and she wasn’t able to click around, as the platform only generated one screen (and in subsequent prompts she had difficulty generating the next logical screen or any other screen).  Also, there aren’t many editing options to choose from, and when Alli uploaded an image from a website she designed, she was not impressed by the formatting, typography, color combinations and “dated” shadows and icons.  “I had much higher expectations from Google given that there are already so many existing AI to UI design generation tools on the market that do this much better,” she writes. “Their effort seems half-baked at best.”  While it is in beta, it doesn’t have “anywhere near” the polished output of other offerings such as Figma’s First Draft or Uizard’s Autodesigner. “This release seems like a bit of a mad dash to throw their hat into the ring of the AI UI design hype,” Alli notes.  Other early users agree that Stitch can be wonky and underwhelming, that the designs aren’t quite there yet, and that other existing tools are still superior.  One X user lamented: “I used the same prompt that I used to generate landing pages in other AI tools which return direct code, but the designs were so much better in the other tools such as Bolt.” It’s clear Google has some kinks to work out if it intends to compete with already entrenched players. Still, it’s early in the vibe coding game, and users are eager to experiment with a variety of tools, so it’ll be interesting to see Stitch’s next iteration.  Try Stitch out for yourself here.  source

Everyone’s looking to get in on vibe coding — and Google is no different with Stitch, its follow-up to Jules Read More »