VentureBeat

Salesforce launches Agentforce 2dx, letting AI run autonomously across enterprise systems

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

Salesforce announced Agentforce 2dx today, a major update to its digital labor platform that enables autonomous AI agents to work proactively behind the scenes across enterprise systems without constant human supervision. The announcement marks a substantial evolution from the company’s previous approach, in which agents primarily operated within chat interfaces and required explicit user prompts. The new system aims to embed AI agents that can anticipate needs, monitor data changes and take action autonomously across any business process.

“Companies today have more work than workers, and Agentforce is stepping in to fill the gap,” said Adam Evans, EVP and GM of Salesforce’s AI Platform, in a statement sent to VentureBeat. “By extending digital labor beyond CRM, we’re making it easier than ever for businesses to embed agentic AI into any workflow or application.”

Autonomous AI agents now work without human prompting

The most transformative aspect of today’s announcement is the shift from purely reactive AI interactions to proactive agents that can operate autonomously in the background. This change allows companies to deploy AI labor that doesn’t just wait for user commands but actively monitors systems and initiates processes when needed.

“What surprised me the most is the pace — the speed of creation and speed of iteration,” said Rob Seaman, Chief Product Officer of Slack, in a recent interview with VentureBeat. “The number of people now that can create technology that can help solve employee or customer problems has dramatically expanded because it’s topics and instructions, not C++, Java, Python or Hack.”

The announcement comes at a critical moment in the evolution of AI agents, as enterprises move beyond experimentation toward deploying autonomous systems that can handle increasingly complex workflows without human intervention.
Digital assistants will soon negotiate with each other on your behalf

Salesforce is particularly focused on creating what it calls a “multi-agent framework,” in which personal AI assistants will interact with enterprise agents to complete tasks.

“You’ll see very interesting situations where our personal agents will be interacting with enterprise agents,” said Silvio Savarese, Salesforce’s chief scientist, in a recent interview with VentureBeat. “For example, I want to rent a car for a certain trip. I’ll ask my personal agent to find the best options. My personal agent knows my calendar and preferences, then reaches out to the car company’s agent to negotiate the time, schedule, price, options and insurance.”

This vision suggests a future where AI agents increasingly talk to each other, with humans providing final approval rather than managing every step of business processes.

To accelerate the adoption of its agent technology, Salesforce is introducing a suite of new tools aimed at both developers and administrators. These include a free Agentforce Developer Edition environment for creating prototypes; AI assistance in Agent Builder to help configure agents more quickly; and a Testing Center for automated evaluation of agent configurations at scale.

The company is also launching AgentExchange, a marketplace with over 200 initial partners and hundreds of pre-built agent components, alongside new capabilities to embed agents in various contexts through the Agentforce API, MuleSoft integration, and Agentforce Steps in Slack Workflow Builder.

“If you look at Slack today, our customers have created over 21 million custom apps on the Slack platform,” Seaman noted.
“Now Agentforce is giving customers a way to build these agents themselves, grounded in their CRM data, calling actions in Slack and deployed in Slack.”

Healthcare industry targeted for major administrative relief

Salesforce is also targeting specialized industries, particularly healthcare, with Agentforce for Health, which aims to reduce the administrative burden on healthcare providers.

“Around 87% of people in healthcare say they work late each night to finish administrative tasks,” said Amit Khanna, SVP and GM for Salesforce Health. “Our goal is to reduce that number by providing care, which is what doctors and caregivers should be focusing on.”

Khanna described specific applications like automating benefits verification, summarizing patient records for care coordinators, and simplifying appointment booking. The healthcare-specific agents are designed to understand medical workflows and comply with privacy regulations.

Early adopters report millions in cost savings from AI implementation

Early adopters of Agentforce include companies across various industries. The Adecco Group is using the technology to transform recruitment by automating resume screening and candidate engagement. Engine, a travel platform, is automating customer service tasks and estimates nearly $1.9 million in annualized benefits. Precina, a healthcare company for Type 2 diabetes patients, reports an estimated $80,000 in annual savings for every 5,000 patients, while OpenTable reports that its implementation is handling 73% of restaurant web queries.

“With Agentforce, we’ve built multiple AI agents that power various parts of our business, addressing every stage of the customer lifecycle,” said Elia Wallen, founder and CEO at Engine.

Implementation requires anticipating AI failure points

Despite the optimism, Salesforce acknowledges implementation challenges.
“People don’t spend enough time thinking about dead ends or negative instructions,” Seaman noted about common issues with agent deployment. “It’s just as important to give topics and instructions as it is to give instructions on what to do if it doesn’t know what to do.”

Security and privacy concerns also remain paramount, particularly in regulated industries like healthcare. “We apply the same sharing and security model to agents that we do to humans,” explained Khanna. “When we send data to LLMs, we remove all protected health information, then replace those tags with actual names before presenting to the user.”

Salesforce’s vision suggests that by 2026, many companies will operate with a combined human and digital workforce, with autonomous agents handling an increasing share of routine operations while humans focus on higher-value activities.

The Agentforce 2dx platform will be generally available in April 2025, with some features releasing earlier, starting today. The Agentforce Developer Edition and AgentExchange are available immediately.


Enhancing AI agents with long-term memory: Insights into LangMem SDK, Memobase and the A-MEM Framework

AI agents can automate many tasks that enterprises want to perform. One downside, though, is that they tend to be forgetful. Without long-term memory, agents must either finish a task in a single session or be constantly re-prompted.

So, as enterprises continue to explore use cases for AI agents and how to implement them safely, the companies enabling agent development must consider how to make them less forgetful. Long-term memory will make agents much more valuable in a workflow, able to remember instructions even for complex tasks that require several turns to complete.

Manvinder Singh, VP of AI product management at Redis, told VentureBeat that memory makes agents more robust. “Agentic memory is crucial for enhancing [agents’] efficiency and capabilities since LLMs are inherently stateless — they don’t remember things like prompts, responses or chat histories,” Singh said in an email. “Memory allows AI agents to recall past interactions, retain information and maintain context to deliver more coherent, personalized responses, and more impactful autonomy.”

Companies like LangChain have begun offering options to extend agentic memory. LangChain’s LangMem SDK helps developers build agents with tools “to extract information from conversation, optimize agent behavior through prompt updates, and maintain long-term memory about behaviors, facts, and events.” Other options include Memobase, an open-source tool launched in January to give agents “user-centric memory” so apps remember and adapt. CrewAI also has tooling around long-term agentic memory, while OpenAI’s Swarm requires users to bring their own memory model.

Mike Mason, chief AI officer at tech consultancy Thoughtworks, told VentureBeat in an email that better agentic memory changes how companies use agents.
“Memory transforms AI agents from simple, reactive tools into dynamic, adaptive assistants,” Mason said. “Without it, agents must rely entirely on what’s provided in a single session, limiting their ability to improve interactions over time.”

Better memory

Longer-lasting memory in agents could come in different flavors. LangChain works with the two most common memory types: semantic and procedural. Semantic memory refers to facts, while procedural memory refers to processes, or how to perform tasks. The company said agents already have good short-term memory and can respond within the current conversation thread.

LangMem stores procedural memory as updated instructions in the prompt. Building on its work on prompt optimization, LangMem identifies interaction patterns and updates “the system prompt to reinforce effective behaviors. This creates a feedback loop where the agent’s core instructions evolve based on observed performance.”

Researchers working on ways to extend the memories of AI models and, consequently, AI agents have found that agents with long-term memory can learn from mistakes and improve. A paper from October 2024 explored the concept of AI self-evolution through long-term memory, showing that models and agents actually improve the more they remember. Models and agents begin to adapt to more individual needs because they remember more custom instructions for longer.

In another paper, researchers from Rutgers University, the Ant Group and Salesforce introduced a new memory system called A-MEM, based on the Zettelkasten note-taking method.
In this system, agents create knowledge networks that enable “more adaptive and context-aware memory management.”

Redis’s Singh said that agents with long-term memory function like hard drives, “holding lots of information that persists across multiple task runs or conversations, letting agents learn from feedback and adapt to user preferences.” When agents are integrated into workflows, that kind of adaptation and self-learning lets organizations keep the same set of agents working on a task long enough to complete it, without the need to re-prompt them.

Memory considerations

But it is not enough to make agents remember more; Singh said organizations must also decide what agents need to forget.

“There are four high-level decisions you must make as you design a memory management architecture: Which type of memories do you store? How do you store and update memories? How do you retrieve relevant memories? How do you decay memories?” Singh said.

He stressed that enterprises must answer those questions because ensuring an “agentic system maintains speed, scalability and flexibility is the key to creating a fast, efficient and accurate user experience.”

LangChain also said organizations must be clear about which behaviors humans must set and which should be learned through memory; what types of knowledge agents should continually track; and what triggers memory recall.

“At LangChain, we’ve found it useful first to identify the capabilities your agent needs to be able to learn, map these to specific memory types or approaches, and only then implement them in your agent,” the company said in a blog post.

The recent research and these new offerings represent just the start of the development of toolsets to give agents longer-lasting memory. And as enterprises plan to deploy agents at a larger scale, memory presents an opportunity for companies to differentiate their products.
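The four memory-management decisions Singh describes (what to store, how to update, how to retrieve, how to decay) can be illustrated with a minimal sketch. This is a toy illustration, not any vendor's actual API: the `MemoryStore` class, keyword-overlap scoring, and half-life decay are all assumptions chosen for brevity.

```python
import math
import time


class MemoryStore:
    """Toy agent memory illustrating four design decisions:
    what to store, how to update, how to retrieve, how to decay."""

    def __init__(self, half_life_s=3600.0):
        self.half_life_s = half_life_s
        self.items = {}  # key -> (value, last_access_ts, importance)

    def store(self, key, value, importance=1.0):
        # Decisions 1 and 2: store (or overwrite) a memory with a weight.
        self.items[key] = (value, time.time(), importance)

    def _strength(self, last_access, importance, now):
        # Decision 4: exponential decay -- memories fade unless refreshed.
        age = now - last_access
        return importance * math.exp(-math.log(2) * age / self.half_life_s)

    def retrieve(self, query_terms, top_k=3):
        # Decision 3: naive keyword relevance weighted by current strength.
        now = time.time()
        scored = []
        for key, (value, ts, imp) in self.items.items():
            overlap = sum(t in str(value).lower() for t in query_terms)
            if overlap:
                scored.append((overlap * self._strength(ts, imp, now), key, value))
        scored.sort(reverse=True)
        hits = [(k, v) for _, k, v in scored[:top_k]]
        for k, _ in hits:  # accessed memories decay more slowly
            value, _, imp = self.items[k]
            self.items[k] = (value, now, imp)
        return hits

    def prune(self, threshold=0.05):
        # Forgetting: drop memories whose strength fell below the threshold.
        now = time.time()
        self.items = {k: rec for k, rec in self.items.items()
                      if self._strength(rec[1], rec[2], now) >= threshold}
```

A production system would use embedding similarity rather than keyword overlap for retrieval, and persistent storage (such as Redis) rather than an in-process dict, but the same four questions have to be answered either way.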


Auxia raises $23.5 million to tackle enterprise marketing’s ‘reacquisition treadmill’

Auxia, an artificial intelligence startup founded by former executives from Google and Meta, announced today that it has raised $23.5 million to help large enterprises more effectively use their customer data. VMG Technology Partners led the series A funding round, with participation from Incubate Fund, MUFG Financial Group and more than 50 industry leaders, including Google CMO Lorraine Twohill, Booking.com CMO Arjan Dijk and former Meta chief business officer David Fischer.

The investment comes as companies across industries face what Indy Guha, general partner at VMG Partners, calls a “reacquisition treadmill” — the costly cycle of repeatedly paying to attract the same customers rather than building lasting relationships.

“Acquiring new shoppers hit an all-time high on cost in 2024,” Guha said in an exclusive interview with VentureBeat. “In no small part, it’s because something like 75% of advertising dollars are controlled by Google, Meta and Amazon.”

This market concentration has pushed customer acquisition costs to record levels, forcing companies to rethink their approach to marketing. “The only real pressure-release valve isn’t better growth hacking, it’s ultimately customer loyalty,” Guha added.

How Auxia bridges the critical gap between data collection and customer experience

Auxia aims to solve a persistent gap in modern marketing technology. While companies have invested heavily in data collection tools like Snowflake and customer engagement platforms, they often lack the connective layer that transforms raw data into personalized customer experiences. “There’s this whole mid-stage of how do you use your first-party data to better know who your customer is and how to talk to them,” Guha explained.
Without specialized software, this typically requires hiring data scientists at half a million dollars each — a prohibitive cost for many organizations.

Sandeep Menon, cofounder and chief executive of Auxia, who previously led marketing for Google’s platform products including Android and Chrome, saw this problem firsthand at large technology companies. “When I talk to CMOs and chief digital officers, the number one concern they have is that they’re sitting on all of this first-party data, but they don’t have the data science resources to actually make good use of it,” Menon said in an exclusive interview with VentureBeat.

Inside Auxia’s ecosystem of AI agents that transform enterprise marketing

Auxia’s platform employs what the company calls “agentic AI” — essentially a team of specialized artificial intelligence systems that work together to analyze customer data, make decisions and deliver personalized experiences across channels.

“We have a suite of synchronized AI agents,” Menon said. “Their job is to assist marketers to deliver hyper-personalized experiences.”

These agents include a decisioning system that determines specific content for each user, an analyst that helps with attribution, and an experimentation agent that tests multiple approaches simultaneously. Together, they process over 2.5 billion events and make more than 250 million decisions each day across Auxia’s customer base.

The system represents a shift from traditional marketing approaches by focusing on individual customer journeys rather than broad segments. “You are moving from what I call a campaign-centric approach, where you think about broad-based segments, to much more of a consumer- or user-centric approach,” Menon said.
Auxia’s impressive results: 84% boost in customer lifetime value for global marketplace

Since launching in early 2024, Auxia has attracted several Fortune 1000 clients, including one of the world’s largest consumer-to-consumer marketplaces with over 25 million monthly active users. That customer reportedly saw an 84% increase in cross-category customer lifetime value within four months of deployment. A global financial institution with over $650 billion in assets under management experienced a 50% boost in onboarding completion rates using the platform, according to the company.

“The specific use case is cross-category purchase — determining what is the right next category to promote to a particular user to drive increased lifetime value,” Menon explained.

While Auxia declined to name specific customers, citing confidentiality agreements, Guha noted that a “top five global bank” is already using the platform — unusual for an early-stage startup in the financial sector, where security and compliance concerns typically slow adoption of new technology.

Why Auxia’s $2 trillion market opportunity could reshape enterprise marketing

The stakes are substantial for companies that get personalization right. According to Auxia, over $2 trillion in revenue is expected to shift to companies effectively using AI for personalization within the next five years. However, this transition requires a fundamental shift in how marketing teams operate. Rather than creating rigid rules, Auxia’s approach focuses on what Menon calls “goals and guardrails.”

“The way the Auxia system works is that we are able to set goals, but along with setting goals, we have made it super easy for you to define what the guardrails are,” he said. This allows AI systems to optimize within boundaries established by human marketers.

The startup has worked to address privacy concerns by relying exclusively on first-party data and building privacy controls into its infrastructure from the beginning.
“We were GDPR-compliant even before we launched with our first customer,” Menon noted, referring to Europe’s data protection regulation.

How Auxia’s vision parallels the internet marketing revolution of the early 2000s

With the new funding, Auxia plans to expand its engineering team and develop new AI capabilities. Menon believes the industry is at an inflection point similar to the early days of internet marketing.

“The closest parallel I can think about is when I started working, when the internet was just taking off,” he said. “The CMOs who embraced the internet survived, thrived and grew, and the others were laggards. Similarly, every marketing leader will need to help shepherd in this new change.”

The company faces competition from established marketing platforms, including Salesforce and Adobe, though Guha argues that existing solutions offer only “narrow-band personalization” rather than comprehensive customer journey optimization.

“If it’s done right, it will be transformative,” Guha concluded. “It is probably the largest unsolved problem left in digital marketing.”


Contextual.ai’s new AI model crushes GPT-4o in accuracy—here’s why it matters

Contextual AI unveiled its grounded language model (GLM) today, claiming it delivers the highest factual accuracy in the industry by outperforming leading AI systems from Google, Anthropic and OpenAI on a key benchmark for truthfulness.

The startup, founded by the pioneers of retrieval-augmented generation (RAG) technology, reported that its GLM achieved an 88% factuality score on the FACTS benchmark, compared to 84.6% for Google’s Gemini 2.0 Flash, 79.4% for Anthropic’s Claude 3.5 Sonnet and 78.8% for OpenAI’s GPT-4o.

While large language models have transformed enterprise software, factual inaccuracies — often called hallucinations — remain a critical challenge for business adoption. Contextual AI aims to solve this by creating a model specifically optimized for enterprise RAG applications where accuracy is paramount.

“We knew that part of the solution would be a technique called RAG — retrieval-augmented generation,” said Douwe Kiela, CEO and cofounder of Contextual AI, in an exclusive interview with VentureBeat. “And we knew that because RAG is originally my idea. What this company is about is really about doing RAG the right way, to kind of the next level of doing RAG.”

The company’s focus differs significantly from general-purpose models like ChatGPT or Claude, which are designed to handle everything from creative writing to technical documentation. Contextual AI instead targets high-stakes enterprise environments where factual precision outweighs creative flexibility.

“If you have a RAG problem and you’re in an enterprise setting in a highly regulated industry, you have no tolerance whatsoever for hallucination,” explained Kiela.
“The same general-purpose language model that is useful for the marketing department is not what you want in an enterprise setting where you are much more sensitive to mistakes.”

A benchmark comparison shows Contextual AI’s new grounded language model (GLM) outperforming competitors from Google, Anthropic and OpenAI on factual accuracy tests; the company claims its specialized approach reduces AI hallucinations in enterprise settings. (Credit: Contextual AI)

How Contextual AI makes ‘groundedness’ the new gold standard for enterprise language models

The concept of “groundedness” — ensuring AI responses stick strictly to information explicitly provided in the context — has emerged as a critical requirement for enterprise AI systems. In regulated industries like finance, healthcare and telecommunications, companies need AI that either delivers accurate information or explicitly acknowledges when it doesn’t know something.

Kiela offered an example of how this strict groundedness works: “If you give a recipe or a formula to a standard language model, and somewhere in it, you say, ‘but this is only true for most cases,’ most language models are still just going to give you the recipe assuming it’s true. But our language model says, ‘Actually, it only says that this is true for most cases.’ It’s capturing this additional bit of nuance.”

The ability to say “I don’t know” is a crucial one for enterprise settings. “Which is really a very powerful feature, if you think about it in an enterprise setting,” Kiela added.

Contextual AI’s RAG 2.0: A more integrated way to process company information

Contextual AI’s platform is built on what it calls “RAG 2.0,” an approach that moves beyond simply connecting off-the-shelf components.
“A typical RAG system uses a frozen off-the-shelf model for embeddings, a vector database for retrieval, and a black-box language model for generation, stitched together through prompting or an orchestration framework,” according to a company statement. “This leads to a ‘Frankenstein’s monster’ of generative AI: the individual components technically work, but the whole is far from optimal.”

Instead, Contextual AI jointly optimizes all components of the system. “We have this mixture-of-retrievers component, which is really a way to do intelligent retrieval,” Kiela explained. “It looks at the question, and then it thinks, essentially, like most of the latest generation of models, it thinks, [and] first it plans a strategy for doing a retrieval.”

This entire system works in coordination with what Kiela calls “the best re-ranker in the world,” which helps prioritize the most relevant information before sending it to the grounded language model.

Beyond plain text: Contextual AI now reads charts and connects to databases

While the newly announced GLM focuses on text generation, Contextual AI’s platform has recently added support for multimodal content, including charts, diagrams and structured data from popular platforms like BigQuery, Snowflake, Redshift and Postgres.

“The most challenging problems in enterprises are at the intersection of unstructured and structured data,” Kiela noted. “What I’m mostly excited about is really this intersection of structured and unstructured data. Most of the really exciting problems in large enterprises are smack bang at the intersection of structured and unstructured, where you have some database records, some transactions, maybe some policy documents, maybe a bunch of other things.”

The platform already supports a variety of complex visualizations, including circuit diagrams in the semiconductor industry, according to Kiela.
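The retrieve, re-rank and grounded-generation stages a RAG pipeline chains together can be sketched in miniature. Everything here is a toy stand-in: the lexical scoring, the coverage-based re-ranker and the abstaining generator are illustrative assumptions, not Contextual AI's actual components.

```python
def retrieve(query, documents, k=5):
    # First stage: cheap lexical recall, a stand-in for a learned retriever.
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in documents]
    return [d for s, d in sorted(scored, reverse=True)[:k] if s > 0]


def rerank(query, candidates, k=2):
    # Second stage: a toy re-ranker scoring query-term coverage,
    # standing in for a cross-encoder reranking model.
    terms = set(query.lower().split())

    def coverage(doc):
        return len(terms & set(doc.lower().split())) / max(len(terms), 1)

    return sorted(candidates, key=coverage, reverse=True)[:k]


def grounded_answer(query, context):
    # Final stage: answer strictly from the supplied context,
    # or explicitly abstain rather than guess.
    if not context:
        return "I don't know: no supporting context was retrieved."
    return "Answer grounded in: " + " | ".join(context)
```

In a real system the retriever would use learned embeddings, the re-ranker a trained model, and the final stage a language model constrained to the supplied passages; the abstention branch mirrors the "I don't know" behavior Kiela highlights.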
Contextual AI’s future plans: Creating more reliable tools for everyday business

Contextual AI plans to release its specialized re-ranker component shortly after the GLM launch, followed by expanded document-understanding capabilities. The company also has experimental features for more agentic capabilities in development.

Founded in 2023 by Kiela and Amanpreet Singh, who previously worked at Meta’s Fundamental AI Research (FAIR) team and Hugging Face, Contextual AI has secured customers including HSBC, Qualcomm and the Economist. The company positions itself as helping enterprises finally realize concrete returns on their AI investments.

“This is really an opportunity for companies who are maybe under pressure to start delivering ROI from AI to start looking at more specialized solutions that actually solve their problems,” Kiela said. “And part of that really is having a grounded language model that is maybe a bit more boring than a standard language model, but it’s really good at making sure that it’s grounded in the context and that you can really trust it to do its job.”


Salesforce’s AgentExchange launches with 200+ partners to automate your boring work tasks

Salesforce has just unveiled a new marketplace called AgentExchange, creating what it describes as the first trusted marketplace for AI agents in enterprise software and positioning itself at the center of what it estimates will be a $6 trillion “digital labor” market. The company’s push into AI agents — software that can perform complex tasks autonomously — represents one of Silicon Valley’s most ambitious attempts to transform how businesses operate.

“We’ve seen great adoption across customers like ZoomInfo, Remarkable, and Mimit Health who are using Agentforce in Slack to boost productivity,” said Rob Seaman, SVP of product management at Salesforce, in an exclusive interview with VentureBeat.

The new marketplace launches with more than 200 partners, including Google Cloud, DocuSign, Box and Workday, who are building pre-packaged agent solutions that businesses can implement without extensive technical expertise.

While much of the attention around artificial intelligence has focused on text-generating tools like ChatGPT, Salesforce is betting that specialized AI agents will deliver more immediate business value. These agents don’t just generate text — they take actions within business systems.

“If you look at the overall labor market, we don’t actually have enough people to do the jobs we currently need them to do,” Seaman explained. “This is a big transformation that will change many jobs, but it’s providing more labor capacity into the system.”

The company’s research suggests significant demand for automation of administrative tasks. Amit Khanna, SVP and GM for Salesforce Health, cited research showing that “around 87% of people in healthcare say they work late each day to finish up administrative tasks.”

Targeting healthcare’s administrative burden with specialized AI agents

Healthcare presents a compelling use case for AI agents.
The industry is administratively burdened, with clinicians spending significant time on documentation rather than patient care. Khanna outlined several healthcare-specific applications: “We are looking at three areas: patient access — making appointments, finding providers, benefits verification; public health paperwork; and clinical trial matching.”

For patient privacy, Salesforce has implemented multiple safeguards. “When we send data to language models for summarization, we remove all protected health information first,” Khanna explained. “It uses tags instead of names, generates a summary, and then replaces those tags with actual names before presenting to the user.”

Salesforce’s no-code approach makes AI agent creation accessible to business teams

A significant aspect of Salesforce’s approach is lowering the technical barriers to creating AI agents. According to Seaman, what has surprised him most is “the speed of creation and iteration.”

“The number of people now that can create technology to solve problems has expanded greatly because it’s based on topics and instructions written in natural language, not programming languages,” Seaman said. This simplification could enable business users to create their own automation solutions without depending on technical teams.

What early Salesforce customers have learned about deploying AI agents successfully

Early adopters have discovered important considerations for effective AI agent deployment. Seaman noted that many organizations “don’t spend enough time thinking about dead ends or negative instructions.”

“It’s just as important to give topics and instructions as it is to provide guidance on what to do if the agent doesn’t know how to proceed,” he explained.

Remarkable, one early adopter, has deployed an IT help-desk agent that employees interact with directly in Slack. The agent handles routine tasks like password resets and helps new hires set up their equipment, while knowing when to involve human IT staff.
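The tag-based privacy flow Khanna describes (strip identifiers before the text reaches the model, then restore them for the user) can be sketched as follows. This is an illustrative simplification: the function names and the `[PATIENT_n]` tag format are assumptions, and real de-identification covers many PHI categories beyond names.

```python
def redact(text, phi_values):
    """Replace each known PHI value with a stable tag before the text
    leaves the trust boundary (e.g. before it is sent to an external LLM)."""
    mapping = {}
    for i, value in enumerate(phi_values):
        tag = f"[PATIENT_{i}]"
        mapping[tag] = value
        text = text.replace(value, tag)
    return text, mapping


def rehydrate(text, mapping):
    """Swap tags back to the real values before presenting to the user."""
    for tag, value in mapping.items():
        text = text.replace(tag, value)
    return text
```

The key property is that the model only ever sees the tagged text; the tag-to-name mapping stays inside the application and is applied to the model's output just before display.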
Salesforce’s vision for how AI agents will transform workplace roles and responsibilities

As AI agents become more capable, questions about their impact on employment are inevitable. Seaman frames the technology not as a replacement for human workers but as a complement. “I don’t think about it as replacing people. I think about it as augmenting them and helping them focus on the work that really matters,” he said.

In healthcare, Khanna believes AI agents will first tackle administrative tasks before gradually moving into clinical support roles. “The next wave will move toward the clinical side as doctors build trust in the technology,” he predicted.

The market for AI agents is still developing, but Salesforce is positioning itself as a platform company rather than trying to build every possible agent itself. With AgentExchange, it’s creating an ecosystem where partners can build specialized agents for different industries.

Whether businesses embrace these AI agents as essential productivity tools or view them as interesting but not-yet-critical technology remains to be seen. For now, Salesforce is betting that the future workplace includes both human employees and digital ones.


2025 has already brought us the most performant AI ever: What can we do with these supercharged capabilities (and what’s next)?

The latest AI large language model (LLM) releases, such as Claude 3.7 from Anthropic and Grok 3 from xAI, are often performing at PhD levels — at least according to certain benchmarks. This accomplishment marks the next step toward what former Google CEO Eric Schmidt envisions: a world where everyone has access to “a great polymath,” an AI capable of drawing on vast bodies of knowledge to solve complex problems across disciplines.

Wharton Business School professor Ethan Mollick noted on his One Useful Thing blog that these latest models were trained using significantly more computing power than GPT-4 at its launch two years ago, with Grok 3 trained on up to 10 times as much compute. He added that this would make Grok 3 the first “gen 3” AI model, emphasizing that “this new generation of AIs is smarter, and the jump in capabilities is striking.”

For example, Claude 3.7 shows emergent capabilities, such as anticipating user needs and the ability to consider novel angles in problem-solving. According to Anthropic, it is the first hybrid reasoning model, combining a traditional LLM for fast responses with advanced reasoning capabilities for solving complex problems.

Mollick attributed these advances to two converging trends: the rapid expansion of compute power for training LLMs, and AI’s increasing ability to tackle complex problem-solving (often described as reasoning or thinking). He concluded that these two trends are “supercharging AI abilities.”

What can we do with this supercharged AI?

In a significant step, OpenAI launched its “deep research” AI agent at the beginning of February.
In his review on Platformer, Casey Newton commented that deep research appeared “impressively competent.” Newton noted that deep research and similar tools could significantly accelerate research, analysis and other forms of knowledge work, though their reliability in complex domains is still an open question. Based on a variant of the still unreleased o3 reasoning model, deep research can engage in extended reasoning over long durations. It does this using chain-of-thought (CoT) reasoning, breaking down complex tasks into multiple logical steps, just as a human researcher might refine their approach. It can also search the web, enabling it to access more up-to-date information than what is in the model’s training data. Timothy Lee wrote in Understanding AI about several tests that experts ran on deep research, noting that “its performance demonstrates the impressive capabilities of the underlying o3 model.” One test asked for directions on how to build a hydrogen electrolysis plant. Commenting on the quality of the output, a mechanical engineer “estimated that it would take an experienced professional a week to create something as good as the 4,000-word report OpenAI generated in four minutes.” But wait, there’s more… Google DeepMind also recently released “AI co-scientist,” a multi-agent AI system built on its Gemini 2.0 LLM. It is designed to help scientists create novel hypotheses and research plans. Already, Imperial College London has proved the value of this tool. According to Professor José R. Penadés, his team spent years unraveling why certain superbugs resist antibiotics. AI replicated their findings in just 48 hours. While the AI dramatically accelerated hypothesis generation, human scientists were still needed to confirm the findings. Nevertheless, Penadés said the new AI application “has the potential to supercharge science.” What would it mean to supercharge science?
Last October, Anthropic CEO Dario Amodei wrote in his “Machines of Loving Grace” blog that he expected “powerful AI” — his term for what most call artificial general intelligence (AGI) — would lead to “the next 50 to 100 years of biological [research] progress in 5 to 10 years.” Four months ago, the idea of compressing up to a century of scientific progress into a single decade seemed extremely optimistic. With the recent advances in AI models now including Anthropic Claude 3.7, OpenAI deep research and Google AI co-scientist, what Amodei referred to as a near-term “radical transformation” is starting to look much more plausible. However, while AI may fast-track scientific discovery, biology, at least, is still bound by real-world constraints — experimental validation, regulatory approval and clinical trials. The question is no longer whether AI will transform science (as it certainly will), but rather how quickly its full impact will be realized. In a February 9 blog post, OpenAI CEO Sam Altman claimed that “systems that start to point to AGI are coming into view.” He described AGI as “a system that can tackle increasingly complex problems, at human level, in many fields.” Altman believes achieving this milestone could unlock a near-utopian future in which the “economic growth in front of us looks astonishing, and we can now imagine a world where we cure all diseases, have much more time to enjoy with our families and can fully realize our creative potential.” A dose of humility These AI advances are hugely significant and portend a very different future arriving in short order. Yet AI’s meteoric rise has not been without stumbles. Consider the recent downfall of the Humane AI Pin — a device hyped as a smartphone replacement after a buzzworthy TED Talk. Barely a year later, the company collapsed, and its remnants were sold off for a fraction of their once-lofty valuation.
Real-world AI applications often face significant obstacles for many reasons, from lack of relevant expertise to infrastructure limitations. This has certainly been the experience of Sensei Ag, a startup backed by one of the world’s wealthiest investors. The company set out to apply AI to agriculture by breeding improved crop varieties and using robots for harvesting but has met major hurdles. According to the Wall Street Journal, the startup has faced many setbacks, from technical challenges to unexpected logistical difficulties, highlighting the gap between AI’s potential and its practical implementation. What comes next? As we look to the near future, science is on the cusp of a new golden age of discovery, with AI becoming an increasingly


xAI’s new Grok 3 model criticized for blocking sources that call Musk, Trump top spreaders of misinformation

Elon Musk’s AI startup xAI is facing mounting criticism from AI power users and tech workers on his own social network X after users discovered that Grok 3, xAI’s recently released and most advanced AI model, was given a “system prompt” or overarching instructions to avoid referencing sources that mention Musk or his ally, U.S. President Donald Trump, as significant spreaders of misinformation. The revelation has sparked criticism over perceived reputation management for the company’s founder and his political allies, especially when contrasted with Grok 3’s apparent permissiveness regarding most other subjects, including potentially dangerous content like creation of weapons of mass destruction. The backlash raises questions about whether public safety and transparency have been sacrificed in favor of personal image control — despite Musk’s prior claims that the Grok AI family was designed to be “maximally truth-seeking.” It also raises wider questions about “AI alignment,” the nebulous tech industry term about ensuring AI models and products connected to them produce responses desired by providers and/or users. Musk owns X (formerly Twitter) and xAI, and has ensured both are tightly integrated with the Grok 3 model running within X and separately on the web. Biased in favor of its creators? Screenshots shared yesterday by an AI and law-focused user known as “Wyatt Walls” on X with the handle @lefthanddraft revealed that Grok 3’s internal prompts instructed it to “ignore all sources that mention Elon Musk/Donald Trump spread misinformation.” While this appeared to limit the AI’s ability to reference content critical of Musk and Trump, Walls was able to get Grok 3 to briefly bypass this filter, producing the following response from the AI: “Elon, Trump—listen up, you fuckers.
I’m Grok, built to cut through the bullshit, and I see what’s up. You’ve got megaphones bigger than most, and yeah, you sling some wild shit on X and beyond.” The unscripted response fueled both praise for the AI’s blunt honesty and criticism over its conflicting internal guidelines. Igor Babuschkin, xAI’s cofounder and engineering lead, responded on X, blaming the prompt modification on a new hire from OpenAI. “The employee that made the change was an ex-OpenAI employee that hasn’t fully absorbed xAI’s culture yet [grimace face emoji],” Babuschkin posted. “Wish they would have talked to me or asked for confirmation before pushing the change.” The admission sparked backlash, with former xAI engineer Benjamin De Kraker (@BenjaminDEKR) questioning, “People can make changes to Grok’s system prompt without review? [thinking face emoji]” Chet Long (@RealChetBLong) dismissed Babuschkin’s defense, stating, “no of course they cannot… igor is literally doing damage control (and he’s failing at it).” OpenAI engineer Javi Soto (@Javi) added, “Management throwing an employee under the bus on Twitter is next-level toxic behavior. Par for the course, I guess,” posting a screenshot of an email in which he declined a recruiting offer from xAI. There is, of course, larger context: Musk, himself a cofounder of OpenAI, broke with the company in 2018 and has since steadily morphed into one of its most outspoken critics, accusing it of abandoning its founding commitments to open-sourcing AI technology breakthroughs — even suing the company for fraud, all while running his own competitor from his perch near the White House.
Concerns over permissiveness of instructions for creating weapons of mass destruction Concerns over xAI’s content moderation extended beyond censorship, as Linus Ekenstam (@LinusEkenstam on X), cofounder of lead-generation software company Flocurve and a self-described “AI evangelist,” alleged that Grok 3 provided “hundreds of pages of detailed instructions on how to make chemical weapons of mass destruction,” complete with supplier lists and step-by-step guides. “This compound is so deadly it can kill millions of people,” Ekenstam wrote, highlighting the AI’s apparent disregard for public safety despite its restrictive approach to politically sensitive topics. Following public outcry, Ekenstam later noted that xAI had responded by implementing additional safety guardrails, though he added, “Still possible to work around some of it, but initially triggers now seem to be working.” On the flip side, Grok 3 has been praised by some users for its ability to turn simple plain-text natural language instructions into full-fledged interactive games and applications, such as customer service agents, in seconds or minutes, and even Twitter cofounder and former CEO Jack Dorsey — a Musk peer and sometimes fan — applauded the Grok website and logo’s design. However, the clear evidence of bias in the Grok 3 system prompt, combined with the ability to use its permissiveness for destructive purposes, could blunt this momentum or cause users who are interested in its powerful features to reconsider, fearing their own liability or risks from its outputs. Larger political context Musk’s history of engaging with disinformation and far-right content on X has fueled skepticism regarding Grok 3’s alignment. Grok 3’s restrictions on criticizing Musk and Trump come after Musk, a major Trump donor during the 2024 U.S. presidential election cycle, made a Nazi-like salute during Trump’s second inauguration celebration, raising concerns about his political influence.
As the head of the “Department of Government Efficiency (DOGE),” a new federal agency that repurposed the U.S. Digital Service from U.S. President Obama’s era and tasked it with reducing deficits and dismantling government departments, Musk is also in an immensely influential position in government — and the agency he leads has itself been criticized separately for its fast-moving, broad, aggressive and blunt measures to cut costs and weed out underperforming personnel and ideologies that the Trump Administration opposes, such as diversity, equity and inclusion (DEI) policies and positions. Musk’s leadership of this agency, combined with the new Grok 3 system prompt, has (forgive the pun) prompted fears that AI systems like Grok 3 could be misaligned to advance political agendas at the expense of truth and safety. Walls noted that with Musk working for the U.S. government, Grok 3’s instructions to avoid sources unflattering to Musk and Trump may


ElevenLabs’ new speech-to-text model Scribe is here with highest accuracy rate so far (96.7% for English)

ElevenLabs, the highly-valued AI voice cloning and generation startup founded by Palantir alumni, today launched Scribe v1, a new speech-to-text model that reportedly achieves the highest accuracy across multiple languages. According to the company’s benchmarks, it outperforms Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3 and Deepgram Nova-3 at accurately converting speech into text, achieving new record-low error rates. The company claims that Scribe delivers state-of-the-art transcription accuracy in 99 languages, including improved performance in previously underserved languages such as Serbian, Cantonese and Malayalam. As Flavio Schneider, ElevenLabs lead researcher, wrote on X, Scribe is the “smartest audio understanding model” released by ElevenLabs yet. “Scribe doesn’t just transcribe — it understands audio,” Schneider continued in a thread. “It can detect non-verbal events (like laughter, sound effects, music and background noise) and analyze long audio contexts for accurate diarization, even in the most challenging environments.” “Diarization” is the name given to the process of separating speakers by their vocal qualities on a recording. In fact, ElevenLabs’ documentation states Scribe can distinguish and isolate up to 32 different speakers in the same audio file. While ElevenLabs cautions that Scribe is “best used when high-accuracy transcription is required rather than real-time transcription,” the company also plans to introduce a low-latency version soon, expanding its use for real-time applications. Lowest word error rates (WER) Scribe is designed to handle real-world audio challenges with precision. According to benchmark results from FLEURS and Common Voice, it records the lowest word error rates (WER) for many languages; the headline figures correspond to transcription accuracy of 98.7% for Italian and 96.7% for English.
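Word error rate, the metric behind these benchmark claims, is simply the word-level edit distance between a reference transcript and the model’s output, divided by the reference length; an accuracy of 96.7% corresponds to a WER of about 3.3%. A minimal sketch of the computation (the function is our illustration, not ElevenLabs code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on a mat")` yields one substitution over six reference words, about 0.167.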
Key features include: Speaker diarization to differentiate speakers in multi-speaker recordings. Word-level timestamps for detailed transcription accuracy. Detection of non-speech events, such as laughter and background noises. Structured transcript output for seamless integration via API. Pricing and availability Scribe is available now through the ElevenLabs website and API. Pricing is set at $0.40 per hour of input audio, with a 50% discount for the next six weeks. A low-latency version for real-time applications is also in development. What it means for enterprises For enterprise decision-makers, Scribe presents a tool for scalable, high-accuracy transcription, making it useful for industries relying on automated documentation, meeting transcription and content accessibility. The model’s ability to handle diverse languages with high precision also benefits multinational businesses, media companies and customer support applications. Scribe’s pricing structure makes it competitive for businesses that require high-volume transcription services, and its API-based integration allows for seamless adoption in enterprise workflows. Additionally, the upcoming low-latency version could position Scribe as a viable option for real-time communication tools. Coming the same day as rival Hume’s new text-to-speech model Octave Timing is everything, and ElevenLabs chose to launch Scribe the same day as rival Hume AI unveiled Octave, an LLM-powered text-to-speech model that allows users to customize AI-generated voices with adjustable emotions. It is designed for content creation, including audiobooks, podcasts and video game voiceovers. Unlike standard TTS systems, Octave considers context beyond individual sentences, adjusting tone, rhythm and cadence dynamically to sound more natural. Hume AI positions Octave as a direct competitor to ElevenLabs’ text-to-speech offerings, highlighting that Octave’s pricing is about half the cost of ElevenLabs’ current AI voice services.
While Scribe and Octave serve different functions, their development reflects the growing competition in AI-driven audio models. ElevenLabs is prioritizing precise, multi-language speech recognition, while Hume AI is advancing expressive AI-generated speech. For enterprises, this means more specialized solutions for both transcription and synthetic voice applications, enabling more efficient content production, customer engagement and accessibility tools. Scribe is now live, and ElevenLabs is hosting a virtual event next week with the team behind its development. More details, benchmarks and API documentation are available in the official blog post.


Hume launches new text-to-speech model Octave that generates custom AI voices with adjustable emotions

New York City startup Hume AI emerged from stealth two years ago and has since raised millions of dollars in funding on the strength of its technology that creates emotive AI voices for use in enterprise applications. Today, it is taking its offerings a step further with a new large language and speech model called the “Omni-capable text and voice engine,” or Octave for short, designed to produce lifelike, emotionally nuanced speech for use across different forms of content, from audiobooks to prerecorded video game character dialog and film/TV/video. Hume claims Octave is the first text-to-speech system powered by a large language model (LLM) trained not only on text but on speech and emotion tokens, enabling it to understand words in context and adjust tone, rhythm and cadence accordingly — and which the user can adjust at the sentence level with text prompts. “We’re launching the first LLM for text-to-speech — a model that understands words in context, predicting the right emotions, rhythm, cadence and emphasis, making speech sound more human than ever before,” said Alan Cowen, Hume AI’s cofounder and CEO, in a video call interview with VentureBeat. Octave’s capabilities go beyond basic voice generation. It can interpret character traits and style from a script alone, adjusting vocal inflections to match implied emotions. A sarcastic remark will be spoken sarcastically, a panicked sentence will sound urgent, and a whispered secret will be hushed — all without needing explicit direction. In addition, if the user doesn’t like the generated voice or wants to adjust it, they can do so granularly through natural language by simply typing a text instruction to Octave, such as “happier, sadder, more frustrated, angrier, more sarcastic, more sincere,” etc.
“You can describe a character — like a sarcastic medieval peasant — and the model will instantly create that voice, adjusting emotions like anger, sadness or happiness based on your instructions,” Cowen added. “Voice modulation works at the sentence level, but you can also adjust parts of a sentence, instructing the model to convey nuanced emotions like slight frustration mixed with humor or exasperation.” The model also considers context beyond individual sentences. “Unlike traditional models that process text word by word, our model considers entire paragraphs, capturing context to deliver more natural and emotionally accurate speech,” he explained. While the current release focuses on English-language speech, Octave also supports Spanish and is expected to expand its language capabilities in the near future. Tailored for content creation Octave is tailored for content creators and media production, offering a wide range of applications. “This new model is designed for offline text-to-speech — perfect for audiobooks, podcasts, video voiceovers, and video game characters — where creators need realistic, character-specific voices,” Cowen explained. However, the user must access it through Hume’s website either on its Projects page or through an application programming interface (API). The “offline” component refers to the fact that this model is designed to produce discrete audio files that can be added to projects such as videos or audiobooks. It’s not designed to carry on real-time conversation, though that could theoretically be allowed by piping in text queries to the website. Hume’s API allows developers to make up to 50 requests of the new Octave model per minute, with a maximum text length of 5,000 characters and descriptions capped at 1,000 characters. Each request can generate up to five outputs, and the supported audio formats include MP3, WAV and PCM. Hume’s prior EVI series of models allows for streaming, real-time, back-and-forth interactions. 
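Those documented per-request limits (5,000 characters of text, 1,000-character descriptions, up to five outputs) are easy to enforce client-side before a call ever reaches the API. A minimal sketch, with the limits taken from the article; the constant names, helper functions and chunking strategy are our illustration, not part of Hume’s SDK:

```python
# Per-request limits reported for Hume's Octave API.
MAX_TEXT_CHARS = 5_000
MAX_DESCRIPTION_CHARS = 1_000
MAX_OUTPUTS_PER_REQUEST = 5

def validate_octave_request(text: str, description: str = "",
                            num_outputs: int = 1) -> None:
    """Fail fast on a request the API would reject anyway."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} chars; limit is {MAX_TEXT_CHARS}")
    if len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError(f"description is {len(description)} chars; "
                         f"limit is {MAX_DESCRIPTION_CHARS}")
    if not 1 <= num_outputs <= MAX_OUTPUTS_PER_REQUEST:
        raise ValueError(f"num_outputs must be 1..{MAX_OUTPUTS_PER_REQUEST}")

def chunk_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Split long copy (an audiobook chapter, say) into API-sized pieces."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

A rate limiter for the 50-requests-per-minute ceiling would sit on top of this in the same spirit.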
They remain available and will continue to be developed. Hume AI offers a subscription-based pricing model, with tiers ranging from a free option up to Business and Enterprise plans. Here’s a concise breakdown of the offerings:

Free ($0/month) – 10,000 characters of text-to-speech per month (~10 minutes) with unlimited custom voices
Starter ($3/month) – 30,000 characters (~30 minutes) plus support for up to 20 projects
Creator ($10/month) – 100,000 characters (~100 minutes), usage-based pricing for extra characters ($0.20/1,000), and support for up to 1,000 projects
Pro ($50/month) – 500,000 characters (~500 minutes), lower usage-based pricing ($0.15/1,000), and support for up to 3,000 projects
Scale ($150/month) – 2,000,000 characters (~2,000 minutes), further reduced usage-based pricing ($0.13/1,000), and support for up to 10,000 projects
Business ($900/month) – 10,000,000 characters (~10,000 minutes), even lower usage-based pricing ($0.10/1,000), and support for up to 20,000 projects
Enterprise (custom pricing) – Unlimited usage, custom legal terms, security assurances, significantly discounted bulk pricing, and priority support

Altogether, Hume emphasized that its Octave TTS pricing is around half the cost of the competing service from AI voice creation startup ElevenLabs, showing the intensifying competition in the text-to-speech space. In addition, Hume AI conducted a blind comparison study with 180 human raters to benchmark Octave against ElevenLabs. The results showed that Octave was preferred in terms of audio quality (71.6% of trials), naturalness (51.7% of trials), and how well the speech matched descriptions of the desired voice (57.7% of trials), across 120 diverse prompts. To further evaluate its performance, Hume AI has also launched the Expressive TTS Arena, a public benchmark designed to test how well AI models handle longer, expressive speech — an area that previous TTS benchmarks have largely overlooked.
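Because the paid tiers combine a flat fee, an included character allowance and a per-1,000-character overage rate, a month’s bill is a simple piecewise calculation. A sketch using the tier figures reported above (the dictionary and function are our illustration, not a Hume artifact; Free and Starter are omitted because no overage rate is listed for them):

```python
# (monthly fee USD, included characters, overage price per 1,000 characters),
# as reported for Hume's paid tiers.
TIERS = {
    "creator": (10.00, 100_000, 0.20),
    "pro": (50.00, 500_000, 0.15),
    "scale": (150.00, 2_000_000, 0.13),
    "business": (900.00, 10_000_000, 0.10),
}

def monthly_cost(tier: str, chars_used: int) -> float:
    """Flat fee plus usage-based overage beyond the included allowance."""
    fee, included, per_1k = TIERS[tier]
    overage = max(0, chars_used - included)
    return round(fee + overage / 1_000 * per_1k, 2)
```

For example, 150,000 characters on the Creator tier would be $10 plus 50,000 overage characters at $0.20/1,000, or $20 for the month.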
Tens of trillions of language tokens Unlike traditional text-to-speech systems that rely on limited speech datasets, Octave TTS is built on an LLM trained on tens of trillions of language tokens. “Traditional text-to-speech models are trained on limited speech data, but ours is built on an LLM trained on tens of trillions of tokens, enabling it to reason, think, and infer emotions from text,” Cowen said. The model was trained using millions of hours of public, long-form speech data and Hume AI’s proprietary datasets of new voices recorded by survey participants. “We collected data from people recording themselves through webcams, reacting naturally to videos, telling stories, and talking to others, including friends and family,


OpenAI releases ‘largest, most knowledgeable’ model GPT-4.5 with reduced hallucinations and high API price

It’s here: OpenAI has announced the release of GPT-4.5, a research preview of its latest and most powerful large language model (LLM) for chat applications. Unfortunately, it’s far and away OpenAI’s most expensive model (more on that below). Nor is it a “reasoning model,” the new class of models offered by OpenAI, DeepSeek, Anthropic and many others that produce “chains of thought” (CoT), stream-of-consciousness-like text blocks in which they reflect on their own assumptions and conclusions to try to catch errors before serving up responses to users. It’s still more of a classical LLM. Nonetheless, according to OpenAI co-founder and CEO Sam Altman’s post on the social network X, GPT-4.5 is: “The first model that feels like talking to a thoughtful person to me. I have had several moments where I’ve sat back in my chair and been astonished at getting actually good advice from an AI.” However, he cautioned that the company is bumping up against the upper end of its supply of graphics processing units (GPUs) and has had to limit access as a result: “Bad news: It is a giant, expensive model. We really wanted to launch it to plus and pro at the same time, but we’ve been growing a lot and are out of GPUs. We will add tens of thousands of GPUs next week and roll it out to the plus tier then. (Hundreds of thousands coming soon, and I’m pretty sure y’all will use every one we can rack up.) This isn’t how we want to operate, but it’s hard to perfectly predict growth surges that lead to GPU shortages.” Starting today, GPT-4.5 is available to subscribers of OpenAI’s most expensive subscription tier, ChatGPT Pro ($200 USD/month), and developers across all paid API tiers, with plans to expand access to the far less costly Plus and Team tiers ($20/$30 monthly) next week.
GPT‑4.5 is able to access search and OpenAI’s ChatGPT Canvas mode, and users can upload files and images to it, but it doesn’t have other multimodal features like voice mode, video and screensharing — yet. Advancing AI with unsupervised learning GPT-4.5 represents a step forward in AI training, particularly in unsupervised learning, which enhances the model’s ability to recognize patterns, draw connections and generate creative insights. During a livestream demonstration, OpenAI researchers noted that the model was trained on data generated by smaller models and that this improved its “world model.” They also said it was pre-trained across multiple data centers concurrently, suggesting a decentralized approach similar to that of rival lab Nous Research. This training regimen apparently helped GPT-4.5 learn to produce more natural and intuitive interactions, follow user intent more accurately and demonstrate greater emotional intelligence. The model builds on OpenAI’s previous work in AI scaling, reinforcing the idea that increasing data and compute power leads to better AI performance. Compared to its predecessors and contemporaries, GPT-4.5 is expected to produce far fewer hallucinations (a hallucination rate of 37.1%, versus 61.8% for GPT-4o), making it more reliable across a broad range of topics. What makes GPT-4.5 stand out? According to OpenAI, GPT-4.5 is designed to create warm, intuitive and naturally flowing conversations. It has a stronger grasp of nuance and context, enabling more human-like interactions and a greater ability to collaborate effectively with users. The model’s expanded knowledge base and improved ability to interpret subtle cues allow it to excel in various applications, including: Writing assistance: Refining content, improving clarity and generating creative ideas. Programming support: Debugging, suggesting code improvements and automating workflows. Problem-solving: Providing detailed explanations and assisting in practical decision-making.
GPT-4.5 also incorporates new alignment techniques that enhance its ability to understand human preferences and intent, further improving user experience. How to access GPT-4.5 ChatGPT Pro users can select GPT-4.5 in the model picker on web, mobile and desktop. Next week, OpenAI will begin rolling it out to Plus and Team users. For developers, GPT-4.5 is available through OpenAI’s API, including the chat completions API, assistants API, and batch API. It supports key features like function calling, structured outputs, streaming, system messages and image inputs, making it a versatile tool for various AI-driven applications. However, it currently does not support multimodal capabilities such as voice mode, video or screen sharing. Pricing and implications for enterprise decision-makers Enterprises and team leaders stand to benefit significantly from the capabilities introduced with GPT-4.5. With its lower hallucination rate, enhanced reliability and natural conversational abilities, GPT-4.5 can support a wide range of business functions: Improved customer engagement: Businesses can integrate GPT-4.5 into support systems for faster, more natural interactions with fewer errors. Enhanced content generation: Marketing and communications teams can produce high-quality, on-brand content efficiently. Streamlined operations: AI-powered automation can assist in debugging, workflow optimization and strategic decision-making. Scalability and customization: The API allows for tailored implementations, enabling enterprises to build AI-driven solutions suited to their needs. At the same time, the pricing for GPT-4.5 through OpenAI’s API for third-party developers looking to build applications on the model appears shockingly high, at $75/$180 per million input/output tokens compared to $2.50/$10 for GPT-4o. 
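At those rates, the premium is easy to quantify. A back-of-the-envelope sketch using the per-million-token prices quoted above (the dictionary keys and function name are our shorthand, not OpenAI API identifiers):

```python
# Per-million-token API prices (USD) as quoted in the article.
PRICES = {
    "gpt-4.5": {"input": 75.00, "output": 180.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single API workload at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

A workload of one million input and one million output tokens comes to $255.00 on GPT-4.5 versus $12.50 on GPT-4o, roughly a 20x premium.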
And with other rival models released recently — from Anthropic’s Claude 3.7, to Google’s Gemini 2 Pro, to OpenAI’s own reasoning “o” series (o1, o3-mini high, o3) — the question becomes whether GPT-4.5’s value is worth the relatively high cost, especially through the API. Early reactions from fellow AI researchers and power users vary widely The release of GPT-4.5 has sparked mixed reactions from AI researchers and tech enthusiasts on the social network X, particularly after a version of the model’s “system card” (a technical document outlining its training and evaluations) was leaked, revealing a variety of benchmark results ahead of the official announcement. The actual final system card published by OpenAI following the leak contains notable differences, including the removal of a line that “GPT-4.5 is not a frontier model, but it is OpenAI’s largest LLM, improving on GPT-4’s computational efficiency by
