VentureBeat

Perplexity launches Sonar API, taking aim at Google and OpenAI with real-time AI search

Perplexity has launched an aggressive bid to capture the enterprise AI search market, unveiling Sonar, an API service that outperforms offerings from Google, OpenAI and Anthropic on key benchmarks while also undercutting their prices. The move signals a significant shift in the AI landscape, as Perplexity — now valued at $9 billion — directly challenges larger competitors by making its real-time, web-connected search capabilities available to developers and enterprises. The company’s dual-tier strategy — offering both a lightweight Sonar service and a more robust Sonar Pro version — targets different segments of the growing AI integration market.

Perplexity’s Sonar Pro outperforms major AI competitors in the SimpleQA benchmark, which measures response accuracy. (Credit: Perplexity)

Sonar’s real-time advantage: Bringing fresh data to enterprises

Zoom has already integrated Sonar into its AI Companion 2.0 product, allowing users to access real-time information without leaving video conferences — a capability that could reshape how businesses conduct remote meetings and research.

The pricing structure appears to be designed to disrupt the market. Sonar’s base tier costs $5 per 1,000 searches plus minimal token fees, while Sonar Pro, despite higher token costs, offers doubled citation density and multi-search capabilities for complex queries.

What sets Sonar apart is its real-time web connection, a feature absent in many competing APIs that rely solely on training data. This approach could prove particularly valuable for enterprises requiring current information, although it may face challenges in applications requiring deterministic outputs.
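A back-of-envelope cost model makes the base-tier pricing concrete. The $5-per-1,000-searches figure comes from the article; the per-token rate below is a placeholder, since the piece describes token fees only as "minimal."

```python
# Rough monthly cost model for Sonar's base tier, using the
# $5 per 1,000 searches figure quoted above. The per-token rate
# is a placeholder ("minimal token fees"), not a published price.

SEARCH_RATE = 5.00 / 1000   # dollars per search
TOKEN_RATE = 0.000001       # placeholder: dollars per token

def monthly_cost(searches: int, avg_tokens_per_search: int) -> float:
    """Estimate monthly spend for a given query volume."""
    search_cost = searches * SEARCH_RATE
    token_cost = searches * avg_tokens_per_search * TOKEN_RATE
    return round(search_cost + token_cost, 2)

# 50,000 searches/month at ~1,000 tokens each:
print(monthly_cost(50_000, 1_000))
```

At this placeholder token rate, search fees (not tokens) dominate the bill, which is consistent with the article's framing of token fees as minimal.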
Perplexity’s two-tier API offering shows the feature differences between Sonar Pro (left) and the base Sonar service (right), with Pro featuring enhanced citation capability and support for complex queries. (Credit: Perplexity)

Disruptive pricing: Affordable AI search for the enterprise market

The launch comes at a pivotal moment in the AI industry, when companies are increasingly seeking ways to integrate AI search capabilities into their products. With recent benchmarks showing Sonar Pro achieving an 85.8 F-score on the SimpleQA benchmark — significantly outperforming GPT-4o and Claude — Perplexity appears positioned to capitalize on growing enterprise demand for accurate, citation-backed AI responses.

The launch also coincides with significant market momentum: Perplexity has just secured a $500 million funding round led by Institutional Venture Partners, valuing the company at $9 billion. The strategy could prove particularly effective as enterprises increasingly prioritize AI tools that provide verifiable, current information over black-box solutions.

For technical decision makers, Sonar’s launch represents a new option in the AI toolkit, particularly for applications requiring real-time information access and citation tracking. However, the true test will be whether Perplexity can maintain its performance edge and pricing advantage as larger competitors inevitably adjust their strategies.
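For readers unfamiliar with the metric, an F-score is the harmonic mean of precision and recall. The sketch below shows the generic F1 computation; SimpleQA's exact scoring rubric may differ in detail, and the inputs are illustrative rather than Perplexity's actual precision and recall figures.

```python
# Generic F1 score (harmonic mean of precision and recall).
# This only shows what an "F-score" of 85.8 means mechanically;
# SimpleQA's exact scoring may differ in detail.

def f1(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative inputs: 90% precision with 82% recall lands near 0.858,
# i.e. an F-score of roughly 85.8 on a 0-100 scale.
print(round(f1(0.90, 0.82), 3))
```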


AI factories are factories: Overcoming industrial challenges to commoditize AI

This article is part of VentureBeat’s special issue, “AI at Scale: From Vision to Viability.” Read more from the issue here.

If you were to travel 60 years back in time to Stevenson, Alabama, you’d find Widows Creek Fossil Plant, a 1.6-gigawatt generating station with one of the tallest chimneys in the world. Today, there’s a Google data center where the Widows Creek plant once stood. Instead of running on coal, the old facility’s transmission lines bring in renewable energy to power the company’s online services.

That metamorphosis, from a carbon-burning facility to a digital factory, is symbolic of a global shift to digital infrastructure. And we’re about to see the production of intelligence kick into high gear thanks to AI factories. These data centers are decision-making engines that gobble up compute, networking and storage resources as they convert information into insights. Densely packed data centers are springing up in record time to satisfy the insatiable demand for artificial intelligence.

The infrastructure to support AI inherits many of the same challenges that defined industrial factories, from power to scalability and reliability, requiring modern solutions to century-old problems.

The new labor force: Compute power

In the era of steam and steel, labor meant thousands of workers operating machinery around the clock. In today’s AI factories, output is determined by compute power. Training large AI models requires massive processing resources. According to Aparna Ramani, VP of engineering at Meta, the growth of training these models is about a factor of four per year across the industry.

That level of scaling is on track to create some of the same bottlenecks that existed in the industrial world. There are supply chain constraints, to start.
GPUs — the engines of the AI revolution — come from a handful of manufacturers. They’re incredibly complex. They’re in high demand. And so it should come as no surprise that they’re subject to cost volatility.

In an effort to sidestep some of those supply limitations, big names like AWS, Google, IBM, Intel and Meta are designing their own custom silicon. These chips are optimized for power, performance and cost, making them specialists with unique features for their respective workloads.

This shift isn’t just about hardware, though. There’s also concern about how AI technologies will affect the job market. Research published by Columbia Business School studied the investment management industry and found the adoption of AI leads to a 5% decline in the labor share of income, mirroring shifts seen during the Industrial Revolution.

“AI is likely to be transformative for many, perhaps all, sectors of the economy,” says Professor Laura Veldkamp, one of the paper’s authors. “I’m pretty optimistic that we will find useful employment for lots of people. But there will be transition costs.”

Where will we find the energy to scale?

Cost and availability aside, the GPUs that serve as the AI factory workforce are notoriously power-hungry. When the xAI team brought its Colossus supercomputer cluster online in September 2024, it reportedly had access to somewhere between seven and eight megawatts from the Tennessee Valley Authority. But the cluster’s 100,000 H100 GPUs need a lot more than that. So, xAI brought in VoltaGrid mobile generators to temporarily make up the difference. In early November, Memphis Light, Gas & Water reached a more permanent agreement with the TVA to deliver xAI an additional 150 megawatts of capacity.

But critics counter that the site’s consumption is straining the city’s grid and contributing to its poor air quality. And Elon Musk already has plans for another 100,000 H100/H200 GPUs under the same roof.
According to McKinsey, the power needs of data centers are expected to increase to approximately three times current capacity by the end of the decade. At the same time, the rate at which processors are doubling their performance efficiency is slowing. That means performance per watt is still improving, but at a decelerating pace, and certainly not fast enough to keep up with the demand for compute horsepower.

So, what will it take to match the feverish adoption of AI technologies? A report from Goldman Sachs suggests that U.S. utilities need to invest about $50 billion in new generation capacity just to support data centers. Analysts also expect data center power consumption to drive around 3.3 billion cubic feet per day of new natural gas demand by 2030.

Scaling gets harder as AI factories get larger

Training the models that make AI factories accurate and efficient can take tens of thousands of GPUs, all working in parallel, months at a time. If a GPU fails during training, the run must be stopped, restored to a recent checkpoint and resumed. However, as the complexity of AI factories increases, so does the likelihood of a failure. Ramani addressed this concern during an AI Infra @ Scale presentation.

“Stopping and restarting is pretty painful. But it’s made worse by the fact that, as the number of GPUs increases, so too does the likelihood of a failure. And at some point, the volume of failures could become so overwhelming that we lose too much time mitigating these failures and you barely finish a training run.”

According to Ramani, Meta is working on near-term ways to detect failures sooner and to get back up and running more quickly. Further over the horizon, research into asynchronous training may improve fault tolerance while simultaneously improving GPU utilization and distributing training runs across multiple data centers.
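Ramani's point about failures compounding with scale is easy to quantify. The sketch below assumes each GPU independently survives a checkpoint interval with some fixed probability; the 0.999999 figure is purely illustrative, not a measured failure rate.

```python
# Why failures dominate at scale: if each GPU independently survives
# a checkpoint interval with probability p, the chance that ALL of
# n GPUs survive is p**n, which collapses as n grows. The per-GPU
# survival probability here is illustrative, not a measured MTBF.

def run_survives(n_gpus: int, p_gpu_survives: float = 0.999999) -> float:
    """Probability that no GPU fails during one checkpoint interval."""
    return p_gpu_survives ** n_gpus

for n in (1_000, 10_000, 100_000):
    print(n, round(run_survives(n), 3))
```

Even with a one-in-a-million per-GPU failure chance per interval, a 100,000-GPU run loses roughly 10% of its intervals to at least one failure, which is why faster detection, faster restarts and asynchronous training matter.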
Always-on AI will change the way we do business

Just as factories of the past relied on new technologies and organizational models to scale the production of goods, AI factories feed on compute power, networking infrastructure and storage to produce tokens — the smallest piece of information an AI model uses. “This AI factory is generating, creating, producing something of great value, a new commodity,” said Nvidia CEO Jensen Huang during his Computex 2024 keynote. “It’s


Google releases free Gemini 2.0 Flash Thinking model, pressuring OpenAI’s premium strategy

Google has quietly released a major update to its popular artificial intelligence model, Gemini, which now explains its reasoning process, sets new performance records in mathematical and scientific tasks, and offers a free alternative to OpenAI’s premium services. The new Gemini 2.0 Flash Thinking model, released Tuesday in Google AI Studio under the experimental designation “Exp-01-21,” has achieved a 73.3% score on the American Invitational Mathematics Examination (AIME) and 74.2% on the GPQA Diamond science benchmark. These results show clear improvements over earlier AI models and demonstrate Google’s increasing strength in advanced reasoning.

“We’ve been pioneering these types of planning systems for over a decade, starting with programs like AlphaGo, and it is exciting to see the powerful combination of these ideas with the most capable foundation models,” wrote Demis Hassabis, CEO of Google DeepMind, in a post on X.com (formerly Twitter) on January 21, 2025.

Gemini 2.0 Flash Thinking breaks records with million-token processing

The model’s most striking feature is its ability to process up to one million tokens of text — five times more than OpenAI’s o1 Pro model — while maintaining faster response times. This expanded context window allows the model to analyze multiple research papers or extensive datasets simultaneously, a capability that could transform how researchers and analysts work with large volumes of information.
“As a first experiment, I took various religious and philosophical texts and asked Gemini 2.0 Flash Thinking to weave them together, extracting novel and unique insights,” Dan Mac, an AI researcher who tested the model, said in a post on X.com. “It processed 970,000 tokens in total. The output is pretty incredible.”

The release comes at a critical moment in the AI industry’s evolution. OpenAI recently announced its o3 model, which achieved an 87.7% score on the GPQA Diamond benchmark. However, Google’s decision to offer its model free during beta testing (with usage limits) could attract developers and enterprises seeking alternatives to OpenAI’s $200 monthly subscription.

Benchmark results show Google’s latest Gemini 2.0 Flash Thinking model dramatically outperforming earlier versions across mathematics, science and reasoning tasks. (Credit: Google DeepMind)

Google offers free Gemini 2.0 Flash Thinking with built-in code execution

Jeff Dean, chief scientist at Google DeepMind, emphasized improvements in the model’s reliability: “We’re continuing to iterate, with higher reliability and reduced contradictions between the model’s thoughts and final answers,” he wrote, pointing developers to the model under the identifier gemini-2.0-flash-thinking-exp-01-21.

The model also includes native code execution capabilities, allowing developers to run and test code directly within the system. This feature, combined with improved contradiction safeguards, positions Gemini 2.0 Flash Thinking as a serious contender for both research and commercial applications. Industry analysts note that Google’s focus on explaining its reasoning process could help address growing concerns about AI transparency and reliability. Unlike traditional “black box” models, Gemini 2.0 Flash Thinking shows its work, making it easier for users to understand and verify its conclusions.

AI transparency becomes the new battleground as Google challenges OpenAI

The model has already claimed the top spot on the Chatbot Arena leaderboard, a prominent benchmark for AI performance, leading in categories including hard prompts, coding and creative writing. However, questions remain about the model’s real-world performance and limitations. While benchmark scores provide valuable metrics, they don’t always translate directly to practical applications. Google’s challenge will be convincing enterprise customers that its free offering can match or exceed the capabilities of premium alternatives.

As the AI arms race intensifies, Google’s latest release suggests a shift in strategy: combining advanced capabilities with accessibility. Whether this approach will help close the gap with OpenAI remains to be seen, but it certainly gives technical decision makers a compelling reason to reconsider their AI partnerships. For now, one thing is clear: the era of AI that can show its work has arrived, and it’s available to anyone with a Google account.


What’s next for agentic AI? LangChain founder looks to ambient agents

Agentic AI is the latest big trend in generative AI, but what comes after that? While full artificial general intelligence (AGI) is likely still some time in the future, there might well be an intermediate step with an approach known as ambient agents.

LangChain, the agentic AI pioneer, introduced the term “ambient agents” on January 14. The technology that LangChain develops includes its eponymous open-source LangChain framework, which enables organizations to chain different large language models (LLMs) together to get a result. LangChain Inc. raised $24 million in funding in February 2024. The company also has a series of commercial products, including LangSmith for LLM Ops.

With a traditional AI interface, users typically interact with an LLM via text prompts to initiate an action. Agentic AI generally refers to LLM-powered systems that take actions on the user’s behalf. The concept of ambient agents takes that paradigm a step further.

What are ambient agents?

Ambient agents are AI systems that run in the background, continuously monitoring event streams and triggered to act when appropriate, according to pre-set instructions and user intent. While the term “ambient agents” is new, the concept of ambient intelligence, where AI is always listening, is not. Amazon refers to its Alexa personal assistant technology as enabling ambient intelligence.

The goal of ambient agents is to automate repetitive tasks and scale the user’s capabilities by having multiple agents running persistently, rather than the human user having to call them up and interact with each one individually. This allows the user to focus on higher-level tasks while the agents handle routine work.
To help prove out and advance the concept of ambient agents, LangChain has developed a pair of initial use cases: one monitors email, the other social media, helping users manage and respond when needed.

“I think agents in general are powerful and exciting and cool,” Harrison Chase, cofounder and CEO of LangChain, told VentureBeat. “Ambient agents are way more powerful if there’s a bunch of them doing things in the background, you can just scale yourself way more.”

The tech leverages many open-source components, and LangChain has not yet indicated how much it will charge for any new tools.

How ambient agents work to improve AI usability

Like many great technology innovations, the original motivation for ambient agents wasn’t to create a new paradigm, but rather to solve a real problem. For Chase, the problem is one that is all too familiar for many of us: email inbox overload. Six months ago, Chase started building an ambient agent for his own email. The assistant categorizes his emails, handling the triage process automatically; he no longer has to manually sort through his inbox.

Through his own use of the agent inbox over an extended period, Chase was able to refine and improve its capabilities. It started off imperfect, but by using it regularly and addressing the pain points, he was able to enhance the agent’s performance. To be clear, the email assistant isn’t some kind of simplistic rules-based system for sorting email. It’s a system that actually understands his email and helps him decide how to manage it.

The ambient agent architecture for the email assistant use case

The architecture of Chase’s email assistant is quite complex, involving multiple components and language models.
“It starts off with a triage step that’s kind of like an LLM and a pretty complicated prompt and some few-shot examples which are retrieved semantically from a vector database,” Chase explained. “Then, if it’s determined that it should try to respond, it goes to a drafting agent.”

Chase further explained that the drafting agent has access to additional tools, including a sub-agent specifically for interacting with the calendar: “There’s an agent that I have specifically for interacting with the calendar, because actually LLMs kind of suck at dates,” Chase said. “So I had to have a dedicated agent just to interact with the calendar.”

After the draft response is generated, Chase said, an additional LLM call rewrites the response to ensure the correct tone and formatting. “I found that having the LLM try to call all these tools and construct an email and then also write in the correct tone was really tricky, so I have a step explicitly for tone,” Chase said.

The agent inbox as a way to control and monitor agents

A key part of the ambient agent experience is having control and visibility into what the agents are doing. Chase noted that in an initial implementation, he just had agents message him via Slack, but that quickly became unwieldy. Instead, LangChain designed a new user interface, the agent inbox, specifically for interacting with ambient agents.

Screenshot of LangChain’s agent inbox. (Credit: VentureBeat)

The system displays all open lines of communication between users and agents and makes it easy to track outstanding actions.

How to build an ambient agent

LangChain is first and foremost a tool for developers, and it is now a tool to help build and deploy ambient agents too. Any developer can use the open-source LangChain technology to build an ambient agent, though additional tools can simplify the process. Chase explained that the agent inbox he built is, in some respects, a view on top of the LangGraph platform.
LangGraph is an open-source framework for building agents that provides the infrastructure for operating long-running background jobs. On top of that, LangChain is using its commercial LangSmith platform, which provides observability and evaluation for agents. This helps developers put agents into production with the necessary monitoring and evaluation tools to ensure they are performing as expected.

Ambient agents: A step toward using generalized intelligence

Chase is optimistic that the concept of


Deloitte: 74% of enterprises have already met or exceeded gen AI initiatives (but challenges remain)

Enterprises of all sizes around the world are trying to make sense of generative AI and determine where it might add value. The good news: The majority of organizations are actually making it work. According to a new report today from Deloitte, most enterprises are meeting or exceeding their own expectations for return on investment (ROI) from gen AI.

The “State of Generative AI Q4” report, based on a survey of 2,773 leaders across 14 countries, highlights both the progress and challenges organizations face in their gen AI journeys. The report shows considerable progress from the first version released a year ago, in which business leaders expressed multiple concerns. There is also positive progress over the third-quarter report, which showed that the majority of organizations had avoided some gen AI use cases due to data issues.

Despite longer-than-expected time to value, nearly three-quarters (74%) of respondents reported that their most advanced gen AI initiatives are meeting or exceeding ROI expectations. Cybersecurity and IT functions are leading the way in terms of ROI and successful scaling.

Key findings include:

- Organizations require at least 12 months to resolve major adoption challenges
- IT, cybersecurity, operations, marketing and customer service show the strongest adoption and results
- Regulatory compliance has emerged as the top barrier to gen AI deployment
- 78% of respondents expect to increase their overall AI spending in the next fiscal year

Jim Rowan, head of AI at Deloitte, told VentureBeat that the biggest gains enterprises are reporting from AI usage are efficiency and cost savings. “We’re taking time out of day-to-day tasks and activities and making individuals more efficient,” said Rowan.
The challenge of gen AI moving at enterprise speed

Enterprise technology, by definition, is about stability and resilience. It is supposed to be the stuff businesses run on. For many types of technology, enterprise adoption can take multiple years, as organizations first need to validate use cases and ROI potential. While the rapid advancements in gen AI capabilities have captured the public’s imagination, enterprises are often moving at a much slower pace when it comes to adoption. This disconnect between the breakneck speed of AI innovation and the more deliberate nature of enterprise technology rollouts presents a significant challenge.

“Enterprises are moving at enterprise speed,” said Rowan. “That plays out in a couple different areas within the report, in terms of scaling questions, risk and regulatory challenges that organizations are facing across the board.”

This disparity in speed is further complicated by the fact that many enterprises are still grappling with foundational technology challenges, such as data governance and platform modernization. Rowan noted that those underlying issues must be addressed before enterprises can fully capitalize on the potential of generative AI.

Rather than rushing to deploy the latest gen AI tools, Rowan emphasized the importance of a more measured, strategic approach that focuses on building the necessary infrastructure and cultural readiness. By taking the time to properly integrate gen AI into existing operations and workflows, enterprises can ensure the technology delivers tangible, long-term value rather than serving as a fleeting novelty. This patient, deliberate approach, while potentially slower in the short term, may ultimately prove more effective in driving lasting transformation.

Where enterprise AI is delivering the most ROI today

One of the key areas where enterprises are seeing tangible value from AI is in the software development lifecycle.
According to the report, AI is helping to drive efficiency gains across the entire process — from requirements gathering to testing and deployment. “We’re seeing it a ton in the software development life cycle,” Rowan said. “This is why IT has been a big, big proponent of this.”

Beyond software development, enterprises are tapping into AI to enhance their customer service and contact center operations. By automating certain tasks and interactions, companies are able to improve efficiency and responsiveness. “The other big use case is around contact centers, customer service, sort of engagement from those two,” said Rowan. “So those tend to be the largest areas where we’re seeing the most amount of efficiency being taken out.”

How enterprises can measure the impact of gen AI

As enterprises seek to quantify the impact of their AI investments, Rowan emphasized the importance of looking at both quantitative and qualitative metrics. While cost savings and efficiency gains are important, companies should also track the number of new ideas and use cases generated, as well as the impact on employee skills and culture.

On the quantitative side, Rowan cited a few key metrics:

- Efficiency measured through cost savings
- Increased revenue generation
- Increased efficiency per full-time equivalent (FTE) employee on some activities

On the qualitative side, Rowan pointed to metrics around employee development, continuous learning and the overall transformation of business processes. “How are your employees’ skills improving? How are you using this moment to really change the culture around learning and development?” he said.

Benefiting from the promise of agentic AI

Perhaps the biggest area of innovation for enterprises to consider in 2025 is agentic AI. The report indicates that 52% of organizations are pursuing AI agents, with 45% specifically exploring multi-agent systems.
Rowan expressed optimism about the potential of agentic AI, but noted that it will take time for enterprises to fully adopt and integrate the technology. He explained that enterprises will likely start with simpler, more focused agent applications before expanding their use. Agentic AI has the potential to fundamentally transform enterprise processes and drive significant ROI, Rowan said, but only if approached strategically.

With the initial rollouts of gen AI, enterprises often focused on proof-of-concept (PoC) deployments. A different approach will be required for agentic AI. Instead of looking at individual use cases, enterprises will be well served by looking at the broader process chain. He explained that the true value of agentic AI will come from rethinking entire business processes to be AI-driven, rather


Microsoft launches Copilot Chat with AI agents; take that, Gemini!

Microsoft has been positioning Copilot as the “UI for AI.” The company has already launched several variants of the GPT-4o-powered assistant for business and personal users. Now, as the next step in this work, it is launching Microsoft 365 Copilot Chat — a rebranded version of its free AI chat experience for businesses, enhanced with agentic capabilities.

Available starting today, the offering is designed to give businesses an easy way to explore most, if not all, of the capabilities of the more full-featured Microsoft 365 Copilot, which is priced at $30 per user per month. Although the experience is free, there is a notable caveat: The agentic capabilities promising task automation will work only on a consumption-based model.

The goal here is pretty obvious: Microsoft wants to give its commercial customers a taste of what it has on offer in the paid version of Copilot. If, with powerful features like agents, the company can make using Copilot a daily habit of Microsoft 365 users — from customer service representatives to marketing leads to frontline technicians — those users might eventually turn to the paid plan.

This development is not a surprise, given that the rollout of Microsoft 365 Copilot has reportedly been far from perfect, with some enterprises describing it as expensive and complex to implement due to security concerns. For its part, Google continues to move ahead with Gemini for Workspace, positioning it as an affordable, easily accessible AI for work.

What to expect from Microsoft 365 Copilot Chat

Just like the original version, Microsoft 365 Copilot Chat will continue to have a chat interface, where users can input their queries and get answers from AI.
The model under the hood, GPT-4o from OpenAI, will provide information grounded in the web, allowing users to do market research or prepare strategy documents. It even supports file uploads, enabling users to seek summaries, analyses or suggestions from documents, and image generation for use cases like social media marketing.

But the real deal is support for AI agents. IT admins can now use Copilot Studio to build domain-specific agents and make them available to employees via Microsoft 365 Copilot Chat. These agents can serve as virtual teammates for employees, helping them automate repetitive tasks, from providing customer information before meetings to monitoring relevant events. They can be grounded using data from the web as well as work data, either via Microsoft Graph or third-party graph connectors.

“A customer service representative can ask a customer relationship management (CRM) agent for account details before a customer meeting, while field service agents can access step-by-step instructions and real-time product knowledge stored in SharePoint,” Microsoft notes in a blog post.

By providing access to agents within Microsoft 365 Copilot Chat, Microsoft wants to show businesses the value its AI offerings can bring. However, this experience will not be entirely free. The agents will be accessible on a consumption-based model, with total usage determined by the number of messages used by an organization. “You can purchase messages through the Copilot Studio meter in Microsoft Azure, a pay-as-you-go option, for $0.01/message, or via pre-paid message packs priced at $200 for 25,000 messages/month,” the company notes in a separate post. It’s worth noting that different kinds of interactions consume messages at different rates, with Microsoft Graph-based answers taking up as many as 30 messages (30 cents).
Microsoft 365 Copilot Chat vs. Microsoft 365 Copilot

Taking on Gemini dominance

With this move, Microsoft hopes to squeeze some money out of Microsoft 365 users with basic AI needs while creating an opportunity to convert them into paying customers. It also comes as a counter to Google’s push with the Gemini assistant. The Sundar Pichai-led company has just announced that Gemini will be available for free within its Workspace apps, including Gmail, Docs, Sheets, Meet, Chat and Vids. This integration is offered to Workspace Business and Enterprise customers, meaning companies paying a base price of $14 per user per month will gain access to AI features inside their core applications.

In contrast, Microsoft 365 users must subscribe to the full Copilot version, priced at $30 per user per month, to access AI features within apps like Teams, Outlook, Word, Excel and PowerPoint. But Microsoft differentiates itself by offering usage-based agentic AI capabilities. This allows businesses to create custom agents for task automation — a feature currently absent in Gemini.

Ultimately, the choice comes down to the ecosystem you’re aligned with and your specific needs. Google’s approach enables easy access to Gemini within essential business apps but lacks agentic capabilities for now. Meanwhile, Microsoft 365 provides web-based chat and agentic features (on a pay-as-you-go model) but requires a higher investment to unlock AI functionality within its work apps.

Microsoft launches Copilot Chat with AI agents; take that, Gemini! Read More »

Microsoft AutoGen v0.4: A turning point toward more intelligent AI agents for enterprise developers

The world of AI agents is undergoing a revolution, and Microsoft’s release of AutoGen v0.4 this week marked a significant leap forward in this journey. Positioned as a robust, scalable and extensible framework, AutoGen represents Microsoft’s latest attempt to address the challenges of building multi-agent systems for enterprise applications. But what does this release tell us about the state of agentic AI today, and how does it compare to other major frameworks like LangChain and CrewAI? This article unpacks the implications of AutoGen’s update, explores its standout features, and situates it within the broader landscape of AI agent frameworks, helping developers understand what’s possible and where the industry is headed.
The promise of “asynchronous event-driven architecture”
A defining feature of AutoGen v0.4 is its adoption of an asynchronous, event-driven architecture (see Microsoft’s full blog post). This is a step forward from older, sequential designs, enabling agents to perform tasks concurrently rather than waiting for one process to complete before starting another. For developers, this translates into faster task execution and more efficient resource utilization — especially critical for multi-agent systems. For example, consider a scenario where multiple agents collaborate on a complex task: One agent collects data via APIs, another parses the data, and a third generates a report. With asynchronous processing, these agents can work in parallel, dynamically interacting with a central reasoner agent that orchestrates their tasks. This architecture aligns with the needs of modern enterprises seeking scalability without compromising performance. Asynchronous capabilities are increasingly becoming table stakes.
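The collect-parse-report scenario above can be sketched with plain Python asyncio. To be clear, this is a generic illustration of the asynchronous pattern, not AutoGen’s actual API; the agent functions are hypothetical stand-ins:

```python
# A minimal asyncio sketch of the pattern described above: worker "agents"
# run concurrently while an orchestrator gathers their results.
import asyncio

async def collect_data(source: str) -> str:
    await asyncio.sleep(0.1)     # stand-in for an API call
    return f"raw data from {source}"

async def parse(raw: str) -> str:
    await asyncio.sleep(0.05)    # stand-in for parsing work
    return raw.upper()

async def orchestrate(sources: list[str]) -> str:
    # All collection tasks run concurrently instead of one after another;
    # the orchestrator plays the role of the central reasoner agent.
    raw_items = await asyncio.gather(*(collect_data(s) for s in sources))
    parsed = await asyncio.gather(*(parse(r) for r in raw_items))
    return "REPORT: " + "; ".join(parsed)

report = asyncio.run(orchestrate(["sales", "support"]))
print(report)
```

With sequential execution the total latency would be the sum of every call; with `asyncio.gather` it is roughly the latency of the slowest call at each stage, which is the efficiency gain event-driven agent frameworks are chasing.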
AutoGen’s main competitors, LangChain and CrewAI, already offered this, so Microsoft’s emphasis on this design principle underscores its commitment to keeping AutoGen competitive.
AutoGen’s role in Microsoft’s enterprise ecosystem
Microsoft’s strategy for AutoGen reveals a dual approach: Empower enterprise developers with a flexible framework like AutoGen, while also offering prebuilt agent applications and other enterprise capabilities through Copilot Studio (see my coverage of Microsoft’s extensive agentic buildout for its existing customers, crowned by its 10 pre-built applications, announced in November at Microsoft Ignite). By thoroughly updating the AutoGen framework’s capabilities, Microsoft provides developers the tools to create bespoke solutions while offering low-code options for faster deployment.
This image depicts the AutoGen v0.4 update, which includes the framework, developer tools and applications, and supports both first-party and third-party applications and extensions.
This dual strategy positions Microsoft uniquely. Developers prototyping with AutoGen can seamlessly integrate their applications into Azure’s ecosystem, encouraging continued use during deployment. Additionally, Microsoft’s Magentic-One app introduces a reference implementation of what cutting-edge AI agents can look like when they sit on top of AutoGen — thus showing the way for developers to use AutoGen for the most autonomous and complex agent interactions. (Magentic-One is Microsoft’s generalist multi-agent system, announced in November, for solving open-ended web and file-based tasks across a variety of domains.) That said, it’s not clear how precisely Microsoft’s prebuilt agent applications leverage this latest AutoGen framework. After all, Microsoft has just finished overhauling AutoGen to make it more flexible and scalable — and its pre-built agents were released in November.
But by gradually integrating AutoGen into its offerings going forward, Microsoft clearly aims to balance accessibility for developers with the demands of enterprise-scale deployments.
How AutoGen stacks up against LangChain and CrewAI
In the realm of agentic AI, frameworks like LangChain and CrewAI have carved out their niches. CrewAI, a relative newcomer, gained traction for its simplicity and emphasis on drag-and-drop interfaces, making it accessible to less technical users. However, even CrewAI has grown more complex to use as it has added features, as Sam Witteveen mentions in the podcast we published this morning, where we discuss these updates. At this point, none of these frameworks is strongly differentiated in terms of technical capabilities. However, AutoGen is now distinguishing itself through its tight integration with Azure and its enterprise-focused design. While LangChain has recently introduced “ambient agents” for background task automation (see our story on this, which includes an interview with founder Harrison Chase), AutoGen’s strength lies in its extensibility — allowing developers to build custom tools and extensions tailored to specific use cases. For enterprises, the choice among these frameworks often boils down to specific needs. LangChain’s developer-centric tools make it a strong choice for startups and agile teams. CrewAI’s user-friendly interfaces appeal to low-code enthusiasts. AutoGen, on the other hand, will now be the go-to for organizations already embedded in Microsoft’s ecosystem. However, a big point made by Witteveen is that these frameworks are still mainly used as great places to build prototypes and experiment, and that many developers port their work over to their own custom environments and code (the Pydantic library for Python, for example) when it comes to actual deployment. It’s true, though, that this could change as these frameworks build out extensibility and integration capabilities.
Enterprise readiness: the data and adoption challenge
Despite the excitement around agentic AI, many enterprises are not ready to fully embrace these technologies. Organizations I’ve talked with over the past month, like Mayo Clinic, Cleveland Clinic and GSK in healthcare, Chevron in energy, and Wayfair and AB InBev in retail, are focusing on building robust data infrastructures before deploying AI agents at scale. Without clean, well-organized data, the promise of agentic AI remains out of reach. Even with advanced frameworks like AutoGen, LangChain and CrewAI, enterprises face significant hurdles in ensuring alignment, safety and scalability. Controlled flow engineering — the practice of tightly managing how agents execute tasks — remains critical, particularly for industries with stringent compliance requirements like healthcare and finance.
What’s next for AI agents?
As the competition among agentic AI frameworks heats up, the industry is shifting from a race to build better models to a focus on real-world usability. Features like asynchronous architectures, tool extensibility and ambient agents are no longer optional but essential. AutoGen v0.4 marks a significant step for Microsoft, signaling its intent to lead in the enterprise AI

Microsoft AutoGen v0.4: A turning point toward more intelligent AI agents for enterprise developers Read More »

Purpose-built AI hardware: Smart strategies for scaling infrastructure

This article is part of VentureBeat’s special issue, “AI at Scale: From Vision to Viability.” Read more from the issue here.
Enterprises can look forward to new capabilities — and strategic decisions — around the crucial task of creating a solid foundation for AI expansion in 2025. New chips, accelerators, co-processors, servers and other networking and storage hardware specially designed for AI promise to ease current shortages and deliver higher performance, expand service variety and availability, and speed time to value. The evolving landscape of new purpose-built hardware is expected to fuel continued double-digit growth in AI infrastructure that IDC says has lasted 18 straight months. The IT firm reports that organizational buying of compute hardware (primarily servers with accelerators) and storage hardware infrastructure for AI grew 37% year over year in the first half of 2024. Sales are forecast to triple to $100 billion a year by 2028. “Combined spending on dedicated and public cloud infrastructure for AI is expected to represent 42% of new AI spending worldwide through 2025,” writes Mary Johnston Turner, research VP for digital infrastructure strategies at IDC.
The main highway for AI expansion
Many analysts and experts say these staggering numbers illustrate that infrastructure is the main highway for AI growth and enterprise digital transformation. Accordingly, they advise, technology and business leaders in mainstream companies should make AI infrastructure a crucial strategic, tactical and budget priority in 2025. “Success with generative AI hinges on smart investment and robust infrastructure,” said Anay Nawathe, director of cloud and infrastructure delivery at ISG, a global research and advisory firm.
“Organizations that benefit from generative AI redistribute their budgets to focus on these initiatives.” As evidence, Nawathe cited a recent ISG global survey that found that, proportionally, organizations had 10 projects in the pilot phase and 16 in limited deployment, but only six deployed at scale. A major culprit, says Nawathe, was the current infrastructure’s inability to affordably, securely and performantly scale. His advice? “Develop comprehensive purchasing practices and maximize GPU availability and utilization, including investigating specialized GPU and AI cloud services.” Others agree that when expanding AI pilots, proofs of concept or initial projects, it’s essential to choose deployment strategies that offer the right mix of scalability, performance, price, security and manageability.
Experienced advice on AI infrastructure strategy
To help enterprises build their infrastructure strategy for AI expansion, VentureBeat consulted more than a dozen CTOs, integrators, consultants and other experienced industry experts, as well as an equal number of recent surveys and reports. The insights and advice, along with hand-picked resources for deeper exploration, can help guide organizations along the smartest path for leveraging new AI hardware and help drive operational and competitive advantages.
Smart strategy 1: Start with cloud services and hybrid
For most enterprises, including those scaling large language models (LLMs), experts say the best way to benefit from new AI-specific chips and hardware is indirectly — that is, through cloud providers and services. That’s because much of the new AI-ready hardware is costly and aimed at giant data centers. Most new products will be snapped up by hyperscalers Microsoft, AWS, Meta and Google; cloud providers like Oracle and IBM; AI giants such as xAI and OpenAI and other dedicated AI firms; and major colocation companies like Equinix.
All are racing to expand their data centers and services to gain competitive advantage and keep up with surging demand. As with cloud in general, consuming AI infrastructure as a service brings several advantages, notably faster jump-starts and scalability, freedom from staffing worries and the convenience of pay-as-you-go and operational expense (OpEx) budgeting. But plans are still emerging, and analysts say 2025 will bring a parade of new cloud services based on powerful AI-optimized hardware, including new end-to-end and industry-specific options.
Smart strategy 2: DIY for the deep-pocketed and mature
New optimized hardware won’t change the current reality: Do-it-yourself (DIY) infrastructure for AI is best suited for deep-pocketed enterprises in financial services, pharmaceuticals, healthcare, automotive and other highly competitive and regulated industries. As with general-purpose IT infrastructure, success requires the ability to shoulder high capital expenses (CAPEX), run sophisticated AI operations, secure staff and partners with specialty skills, absorb hits to productivity and seize market opportunities during the build-out. Most firms tackling their own infrastructure do so for proprietary applications with high return on investment (ROI). Duncan Grazier, CTO of BuildOps, a cloud-based platform for building contractors, offered a simple guideline. “If your enterprise operates within a stable problem space with well-known mechanics driving results, the decision remains straightforward: Does the capital outlay outweigh the cost and timeline for a hyperscaler to build a solution tailored to your problem? If deploying new hardware can reduce your overall operational expenses by 20-30%, the math often supports the upfront investment over a three-year period.” Despite its demanding requirements, DIY is expected to grow in popularity.
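Grazier’s 20-30%-over-three-years guideline reduces to simple break-even arithmetic. The dollar figures in this sketch are hypothetical, chosen only to illustrate the calculation:

```python
# Back-of-the-envelope version of the DIY guideline quoted above:
# do cumulative opex savings over the payback window cover the
# upfront hardware outlay?

def breaks_even(capex: float, annual_opex: float, savings_rate: float,
                years: int = 3) -> bool:
    """True if the opex savings over `years` cover the capital expense."""
    return annual_opex * savings_rate * years >= capex

# Hypothetical: $1.5M in AI hardware vs. $2.5M/year of cloud opex,
# cut by 25% after the move on-premises.
print(breaks_even(capex=1_500_000, annual_opex=2_500_000, savings_rate=0.25))
```

In that example the three-year savings ($1.875M) exceed the outlay, so the math supports building; halve the opex base or drop the savings rate and the same formula flips against DIY.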
Hardware vendors will release new, customizable AI-specific products, prompting more and more mature organizations to deploy purpose-built, finely tuned, proprietary AI in private clouds or on-premises. Many will be motivated by faster performance of specific workloads, derisking of model drift, greater data protection and control, and better cost management. Ultimately, the smartest near-term strategy for most enterprises navigating the new infrastructure paradigm will mirror current cloud approaches: an open, “fit-for-purpose” hybrid that combines private and public clouds with on-premises and edge.
Smart strategy 3: Investigate new enterprise-friendly AI devices
Not every organization can get its hands on $70,000 high-end GPUs or afford $2 million AI servers. Take heart: New AI hardware with more realistic pricing for everyday organizations is starting to emerge. The Dell AI Factory, for example, includes AI Accelerators, high-performance servers, storage, networking and open-source software in a single integrated package. The company has also announced new PowerEdge servers and an Integrated Rack 5000 series offering air- and liquid-cooled, energy-efficient AI infrastructure. Major PC makers continue to introduce

Purpose-built AI hardware: Smart strategies for scaling infrastructure Read More »

Tencent introduces ‘Hunyuan3D 2.0’ AI that speeds up 3D design from days to seconds

Tencent has unveiled “Hunyuan3D 2.0,” an AI system that turns single images or text descriptions into detailed 3D models within seconds. The system makes a typically lengthy process — one that can take skilled artists days or weeks — into a rapid, automated task. Following its predecessor, this new version of the model is available as an open-source project on both Hugging Face and GitHub, making the technology immediately accessible to developers and researchers worldwide. “Creating high-quality 3D assets is a time-intensive process for artists, making automatic generation a long-term goal for researchers,” the company’s research team writes in a technical report. The upgraded system builds upon its predecessor’s foundation while introducing significant improvements in speed and quality.
How Hunyuan3D 2.0 turns images into 3D models
Hunyuan3D 2.0 uses two main components: Hunyuan3D-DiT creates the basic shape, while Hunyuan3D-Paint adds surface details. The system first makes multiple 2D views of an object, then builds these into a complete 3D model. A new guidance system ensures all views of the object match — solving a common problem in AI-generated 3D models. “We position cameras at specific heights to capture the maximum visible area of each object,” the researchers explain. This approach, combined with their method of mixing different viewpoints, helps the system capture details that other models often miss, especially on the tops and bottoms of objects.
A diagram showing how Hunyuan3D 2.0 transforms a single panda image into a 3D model through multi-view diffusion and sparse-view reconstruction techniques. (Credit: arxiv.org)
Faster and more accurate: What sets Hunyuan3D 2.0 apart
The technical results are impressive.
Hunyuan3D 2.0 produces more accurate and visually appealing models than existing systems, according to standard industry measurements. The standard version creates a complete 3D model in about 25 seconds, while a smaller, faster version works in just 10 seconds. What sets Hunyuan3D 2.0 apart is its ability to handle both text and image inputs, making it more versatile than previous solutions. The system also introduces innovative features like “adaptive classifier-free guidance” and “hybrid inputs” that help ensure consistency and detail in generated 3D models. According to their published benchmarks, Hunyuan3D 2.0 achieves a CLIP score of 0.809, surpassing both open-source and proprietary alternatives. The technology introduces significant improvements in texture synthesis and geometric accuracy, outperforming existing solutions across all standard industry metrics. The system’s key technical advance is its ability to create high-resolution models without requiring massive computing power. The team developed a new way to increase detail while keeping processing demands manageable — a frequent limitation of other 3D AI systems. These advances matter for many industries. Game developers can quickly create test versions of characters and environments. Online stores could show products in 3D. Movie studios could preview special effects more efficiently. Tencent has shared nearly all parts of their system through Hugging Face. Developers can now use the code to create 3D models that work with standard design software, making it practical for immediate use in professional settings. While this technology marks a significant step forward in automated 3D creation, it raises questions about how artists will work in the future. Tencent sees Hunyuan3D 2.0 not as a replacement for human artists, but as a tool that handles technical tasks while creators focus on artistic decisions. 
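The two-stage flow described above (multi-view synthesis, shape reconstruction, then texture painting) can be sketched schematically. The function names and return values below are illustrative stubs, not the project’s real API, and serve only to show the order of the stages:

```python
# Schematic of the pipeline described above: geometry is built from
# synthesized multi-view images (the Hunyuan3D-DiT stage), then surface
# detail is added (the Hunyuan3D-Paint stage). All functions are stubs.

def diffuse_multiview(image: str, views: int = 6) -> list[str]:
    # Stand-in for multi-view diffusion: render the object from several
    # camera heights so tops and bottoms are covered.
    return [f"{image}@view{i}" for i in range(views)]

def reconstruct_shape(views: list[str]) -> dict:
    # Stand-in for sparse-view reconstruction into a base mesh.
    return {"mesh": "base_mesh", "source_views": len(views)}

def paint_texture(shape: dict) -> dict:
    # Stand-in for the texture-synthesis stage.
    return {**shape, "textured": True}

model = paint_texture(reconstruct_shape(diffuse_multiview("panda.png")))
print(model)
```

The key design point is that consistency is enforced at the multi-view stage, before reconstruction: if the rendered views disagree, no amount of texture painting can repair the geometry.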
As 3D content becomes increasingly central to gaming, shopping, and entertainment, tools like Hunyuan3D 2.0 suggest a future where creating virtual worlds is as simple as describing them. The challenge ahead may not be generating 3D models, but deciding what to do with them. source

Tencent introduces ‘Hunyuan3D 2.0’ AI that speeds up 3D design from days to seconds Read More »

Borderless AI secures $32 million to challenge HR software giants with its AI-powered platform

A new artificial intelligence startup is betting that HR departments will become the next major battleground for enterprise AI adoption, launching a specialized search engine that aims to transform how companies manage their workforce. Borderless AI, which emerged from stealth last year, announced today the release of HRGPT, a free AI-powered search engine that allows companies to query their internal HR data alongside employment laws and regulations. The company also secured a $5 million strategic investment as part of its latest funding round, with participation from Cohere co-founders Aidan Gomez and Ivan Zhang, bringing its total seed funding to $32 million. “Every HR department is going to have AI agents that manage various aspects across the HR stack,” said Willson Cross, cofounder and CEO of Borderless AI, in an exclusive interview with VentureBeat. “We’re proud to be at the forefront of that vertical.”
How Borderless AI’s HRGPT is transforming workforce management
The Toronto-based startup is positioning itself to compete with established HR software providers like Workday and ADP by focusing exclusively on AI-powered solutions. Its platform already counts several multinational companies as customers, including Dunlop Sporting Goods, which uses the technology to manage employee onboarding across 17 global offices. Unlike general-purpose AI chatbots, HRGPT combines real-time web search with access to internal company data and specialized HR knowledge. The system can perform tasks ranging from generating employment agreements to tracking time-off requests and managing international expense reimbursements. “Unlike ChatGPT, we have real-time web search. When a customer asks HRGPT a question, it scans the web for real-time sourcing and citations,” Cross told VentureBeat.
The platform also integrates with PricewaterhouseCoopers for employment law expertise.
Borderless AI’s platform displays employee time-off requests and compliance data in a conversational interface designed for HR professionals. (Credit: Borderless AI)
The investment from Cohere’s co-founders signals growing interest in vertical-specific AI applications for the enterprise. While consumer AI tools like ChatGPT have captured public attention, Cross believes the next wave of AI adoption will come from businesses. “For the next two to three years, it’s going to be the businesses that are catching up and waking up to bringing AI to their organizations,” he said. “HR is one that has many applicable use cases.” Borderless AI’s approach reflects a broader trend of AI companies focusing on specific industries rather than trying to build general-purpose tools. Similar vertical-focused companies include Harvey AI in legal tech and Sierra in customer service.
Building a billion-dollar HR tech company with AI at its core
The company’s ambitious vision includes automating complex HR processes like payroll management and employee analytics. Cross indicated they aim to build a billion-dollar company with fewer than 50 employees by leveraging AI extensively in their own operations. However, Borderless AI faces significant challenges, including prioritizing which features to build next amid strong customer demand. The company must also maintain accuracy and compliance in its automated HR functions, particularly for sensitive tasks like employment agreements and international payments. The startup’s success could signal whether specialized AI tools will successfully compete against established enterprise software providers who are racing to add AI capabilities to their existing products. For now, early customers appear convinced: Borderless AI reports that its AI agents perform tasks hourly across its customer base. source

Borderless AI secures $32 million to challenge HR software giants with its AI-powered platform Read More »