Anthropic raises $3.5 billion, reaching $61.5 billion valuation as AI investment frenzy continues

Anthropic closed a $3.5 billion series E funding round, valuing the AI company at $61.5 billion post-money, the firm announced today. Lightspeed Venture Partners led the round with a $1 billion contribution, cementing Anthropic’s status as one of the world’s most valuable private companies and demonstrating investors’ unwavering appetite for leading AI developers despite already astronomical valuations.

The financing attracted participation from an impressive roster of investors including Salesforce Ventures, Cisco Investments, Fidelity Management & Research Company, General Catalyst, D1 Capital Partners, Jane Street, Menlo Ventures and Bessemer Venture Partners. “With this investment, Anthropic will advance its development of next-generation AI systems, expand its compute capacity, deepen its research in mechanistic interpretability and alignment, and accelerate its international expansion,” the company said in its announcement.

Revenue skyrockets 1,000% year-over-year as enterprise clients flock to Claude

Anthropic’s dramatic valuation reflects its exceptional commercial momentum. The company’s annualized revenue reached $1 billion by December 2024, representing a tenfold increase year-over-year, according to people familiar with the company’s finances. That growth has accelerated further, with revenue reportedly increasing by 30% in just the first two months of 2025, according to a Bloomberg report.

Founded in 2021 by former OpenAI researchers including siblings Dario and Daniela Amodei, Anthropic has positioned itself as a more research-focused and safety-oriented alternative to its chief rival. The company’s Claude chatbot has gained significant market share since its public launch in March 2023, particularly in enterprise applications. Krishna Rao, Anthropic’s CFO, said in a statement that the investment “fuels our development of more intelligent and capable AI systems that expand what humans can achieve,” adding that “continued advances in scaling across all aspects of model training are powering breakthroughs in intelligence and expertise.”

AI valuation metrics evolve: 58x revenue multiple shows market maturation

The funding round comes at a pivotal moment in AI startup valuations. While Anthropic’s latest round values the company at roughly 58 times its annualized revenue, down from approximately 150 times a year ago, this still represents an extraordinary premium compared to traditional software companies, which typically trade at 10 to 20 times revenue.

What we’re witnessing with AI valuations isn’t merely another tech bubble, but rather a fundamental recalibration of how growth is valued in the marketplace. Traditional valuation models simply weren’t designed for companies experiencing growth curves this steep. When a firm like Anthropic can increase revenue tenfold in a single year — something that would take a typical software company a decade to achieve — investors are essentially buying future market dominance rather than current financials. This phenomenon creates a fascinating paradox: As AI companies grow larger, their revenue multiples are contracting, yet they remain astronomically high compared to any other sector.
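To make the comparison concrete, here is a small back-of-the-envelope calculation in Python. The annualized-revenue figure is an assumption backed out of the reported ~58x multiple, not a disclosed number.

```python
# Rough revenue-multiple comparison using the figures reported above.
# All inputs are approximations, not disclosed financials.

valuation = 61.5e9            # post-money valuation from the series E round
annualized_revenue = 1.06e9   # assumed annualized revenue (~$1B as of Dec 2024, still growing)

anthropic_multiple = valuation / annualized_revenue
print(f"Anthropic revenue multiple: ~{anthropic_multiple:.0f}x")   # ~58x

# The typical enterprise-software band cited in the article:
for multiple in (10, 20):
    implied_valuation = multiple * annualized_revenue
    print(f"At {multiple}x revenue, the implied valuation would be ${implied_valuation / 1e9:.1f}B")
```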
These numbers suggest investors aren’t simply drunk on AI hype but are making calculated bets that these firms will eventually grow into their valuations by capturing the enormous productivity gains that advanced AI promises to unleash across every sector of the economy. Anthropic’s valuation surge contrasts with conventional tech wisdom that multiples should decrease as companies mature. The continued investor enthusiasm underscores beliefs that AI represents a fundamental technological shift rather than just another software category.

Amazon and Google back Anthropic’s B2B strategy with $11 billion combined investment

The funding comes after Anthropic secured major strategic investments from tech giants. Amazon has invested a total of $8 billion in the startup, making AWS Anthropic’s “primary cloud and training partner” for deploying its largest AI models. Google has committed more than $3 billion to the company.

Unlike OpenAI, which has increasingly focused on developing consumer applications, Anthropic has positioned itself primarily as a B2B technology provider enabling other companies to build with its models. This approach has attracted clients ranging from startups like Cursor and Replit to global corporations including Zoom, Snowflake and Pfizer. “Replit integrated Claude into ‘Agent’ to turn natural language into code, driving 10X revenue growth,” Anthropic noted in its announcement. Other notable implementations include Thomson Reuters’ tax platform CoCounsel, which uses Claude to assist tax professionals, and Novo Nordisk, which has used Claude to reduce clinical study report writing “from 12 weeks to 10 minutes.” Anthropic also highlighted that Claude now helps power Amazon’s Alexa+, “bringing advanced AI capabilities to millions of households and Prime members.”

SoftBank, OpenAI and DeepSeek intensify global AI competition with billion-dollar moves

The funding announcement comes just weeks after reports that SoftBank is finalizing a massive $40 billion investment in OpenAI at a $260 billion pre-money valuation, highlighting the escalating stakes in the AI race. Meanwhile, Chinese AI firm DeepSeek has disrupted the market with its R1 model, which reportedly achieved similar capabilities to U.S. competitors’ systems but at a fraction of the cost. This challenge has prompted established players to accelerate their development timelines. Anthropic recently responded with the launch of Claude 3.7 Sonnet and Claude Code, with the former specifically optimized for programming tasks. The company claims these products have “set a new high-water mark in coding abilities” and plans “to make further progress in the coming months.”

The trillion-dollar AI market: Investors bet big despite profitability questions

The massive funding rounds flowing into leading AI companies signal that investors believe the generative AI market could indeed reach the $1 trillion valuation that analysts predict within the next decade. However, profitability remains a distant goal. Like its competitors, Anthropic continues to operate at a significant loss as it invests heavily in research, model development, and compute infrastructure. The long path to profitability hasn’t deterred investors, who view these companies as platforms that could fundamentally transform how humans interact with technology.
As the AI arms race intensifies, the key question remains whether these multi-billion dollar valuations will eventually be justified by sustainable business models or if the current investment environment represents an AI bubble. For now, Anthropic’s successful fundraise suggests investors are firmly betting on the former.

GPT-4.5 for enterprise: Are accuracy and knowledge worth the high cost?

The release of OpenAI GPT-4.5 has been somewhat disappointing, with many pointing out its insane price point (about 10 to 20X more expensive than Claude 3.7 Sonnet and 15 to 30X more costly than GPT-4o). However, given that this is OpenAI’s largest and most powerful non-reasoning model, it is worth considering its strengths and the areas where it shines.

Better knowledge and alignment

There is little detail about the model’s architecture or training corpus, but we have a rough estimate that it has been trained with 10X more compute. And the model was so large that OpenAI needed to spread training across multiple data centers to finish in a reasonable time. Bigger models have a larger capacity for learning world knowledge and the nuances of human language (given that they have access to high-quality training data). This is evident in some of the metrics presented by the OpenAI team. For example, GPT-4.5 has a record-high score on PersonQA, a benchmark that evaluates hallucinations in AI models. Practical experiments also show that GPT-4.5 is better than other general-purpose models at remaining true to facts and following user instructions. Users have pointed out that GPT-4.5’s responses feel more natural and context-aware than previous models. Its ability to follow tone and style guidelines has also improved.

After the release of GPT-4.5, AI scientist and OpenAI co-founder Andrej Karpathy, who had early access to the model, said he “expect[ed] to see an improvement in tasks that are not reasoning-heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by e.g. world knowledge, creativity, analogy making, general understanding, humor, etc.” However, evaluating writing quality is also very subjective. In a survey that Karpathy ran on different prompts, most people preferred the responses of GPT-4o over GPT-4.5. He wrote on X: “Either the high-taste testers are noticing the new and unique structure but the low-taste ones are overwhelming the poll. Or we’re just hallucinating things. Or these examples are just not that great. Or it’s actually pretty close and this is way too small sample size. Or all of the above.”

Better document processing

In its experiments, Box, which has integrated GPT-4.5 into its Box AI Studio product, wrote that GPT-4.5 is “particularly potent for enterprise use-cases, where accuracy and integrity are mission critical… our testing shows that GPT-4.5 is one of the best models available both in terms of our eval scores and also its ability to handle many of the hardest AI questions that we have come across.” In its internal evaluations, Box found GPT-4.5 to be more accurate on enterprise document question-answering tasks, outperforming the original GPT-4 by about 4 percentage points on their test set.

Source: Box

Box’s tests also indicated that GPT-4.5 excelled at math questions embedded in business documents, which older GPT models often struggled with. For example, it was better at answering questions about financial documents that required reasoning over data and performing calculations. GPT-4.5 also showed improved performance at extracting information from unstructured data. In a test that involved extracting fields from hundreds of legal documents, GPT-4.5 was 19% more accurate than GPT-4o.
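Box has not published its test harness, but the kind of field-extraction call it describes can be sketched against OpenAI’s chat completions API. The model identifier, prompt and file name below are assumptions to adapt to your own setup, not Box’s actual configuration.

```python
# A minimal sketch of a document field-extraction task via OpenAI's chat
# completions API. The model identifier and prompt are assumptions; check
# OpenAI's documentation for the current GPT-4.5 model name and pricing.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

contract_text = open("contract.txt").read()  # hypothetical legal document

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed identifier for the GPT-4.5 research preview
    messages=[
        {
            "role": "system",
            "content": "Extract the requested fields from the document. "
                       "Answer in JSON. If a field is absent, return null.",
        },
        {
            "role": "user",
            "content": "Fields: party_names, effective_date, termination_clause.\n\n"
                       f"Document:\n{contract_text}",
        },
    ],
    temperature=0,  # keep extraction deterministic
)

print(response.choices[0].message.content)
```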
Planning, coding, evaluating results

Given its improved world knowledge, GPT-4.5 can also be a suitable model for creating high-level plans for complex tasks. Broken-down steps can then be handed over to smaller but more efficient models to elaborate and execute. According to Constellation Research, “In initial testing, GPT-4.5 seems to show strong capabilities in agentic planning and execution, including multi-step coding workflows and complex task automation.”

GPT-4.5 can also be useful in coding tasks that require internal and contextual knowledge. GitHub now provides limited access to the model in its Copilot coding assistant and notes that GPT-4.5 “performs effectively with creative prompts and provides reliable responses to obscure knowledge queries.”

Given its deeper world knowledge, GPT-4.5 is also suitable for “LLM-as-a-Judge” tasks, where a strong model evaluates the output of smaller models. For example, a model such as GPT-4o or o3 can generate one or several responses, reason over the solution and pass the final answer to GPT-4.5 for revision and refinement.

Is it worth the price?

Given the huge costs of GPT-4.5, though, it is very hard to justify many of the use cases. But that doesn’t mean it will remain that way. One of the constant trends we have seen in recent years is the plummeting costs of inference, and if this trend applies to GPT-4.5, it is worth experimenting with it and finding ways to put its power to use in enterprise applications.

It is also worth noting that this new model can become the basis for future reasoning models. Per Karpathy: “Keep in mind that that GPT4.5 was only trained with pretraining, supervised finetuning and RLHF [reinforcement learning from human feedback], so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.)… Presumably, OpenAI will now be looking to further train with reinforcement learning on top of GPT-4.5 model to allow it to think, and push model capability in these domains.”
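As a rough illustration of that generate-then-refine cascade, here is a minimal sketch using OpenAI’s Python SDK. The model identifiers, prompts and task are assumptions for illustration, not a recommended configuration.

```python
# Sketch of the "cheaper model drafts, larger model reviews and refines" pattern
# described above. Model identifiers are assumptions; substitute whatever models
# you actually have access to.
from openai import OpenAI

client = OpenAI()

def ask(model: str, system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

question = "Summarize the key risks in the attached vendor contract."  # hypothetical task

# 1) A smaller, cheaper model produces the draft answer.
draft = ask("gpt-4o-mini", "You are a careful analyst.", question)

# 2) The larger model reviews and refines the draft rather than answering from
#    scratch, which keeps most of the tokens on the cheaper model.
refined = ask(
    "gpt-4.5-preview",  # assumed identifier for GPT-4.5
    "Review the draft answer for factual errors and missing caveats, then return an improved version.",
    f"Question: {question}\n\nDraft answer:\n{draft}",
)
print(refined)
```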

Cohere’s first vision model Aya Vision is here with broad, multilingual understanding and open weights — but there’s a catch

Canadian AI startup Cohere launched in 2019 specifically targeting the enterprise, but independent research has shown it has so far struggled to gain much of a market share among third-party developers compared to rival proprietary U.S. model providers such as OpenAI and Anthropic, not to mention the rise of Chinese open-source competitor DeepSeek.

Yet Cohere continues to bolster its offerings: Today, its non-profit research division Cohere for AI announced the release of its first vision model, Aya Vision, a new open-weight multimodal AI model that integrates language and vision capabilities and boasts the differentiator of supporting inputs in 23 different languages spoken by what Cohere says in an official blog post is “half the world’s population,” making it appeal to a wide global audience.

Aya Vision is designed to enhance AI’s ability to interpret images, generate text, and translate visual content into natural language, making multilingual AI more accessible and effective. This would be especially helpful for enterprises and organizations operating in multiple markets around the world with different language preferences.

It’s available now on Cohere’s website and on AI code communities Hugging Face and Kaggle under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, allowing researchers and developers to freely use, modify and share the model for non-commercial purposes as long as proper attribution is given. In addition, Aya Vision is available through WhatsApp, allowing users to interact with the model directly in a familiar environment. The non-commercial license limits its use for enterprises and as an engine for paid apps or moneymaking workflows, unfortunately.

It comes in 8-billion and 32-billion parameter versions (parameters refer to the number of internal settings in an AI model, including its weights and biases, with more usually denoting a more powerful and performant model).

Supports 23 languages and counting

Even though leading AI models from rivals can understand text across multiple languages, extending this capability to vision-based tasks is a challenge. But Aya Vision overcomes this by allowing users to generate image captions, answer visual questions, translate images, and perform text-based language tasks in a diverse set of languages:

1. English
2. French
3. German
4. Spanish
5. Italian
6. Portuguese
7. Japanese
8. Korean
9. Chinese
10. Arabic
11. Greek
12. Persian
13. Polish
14. Indonesian
15. Czech
16. Hebrew
17. Hindi
18. Dutch
19. Romanian
20. Russian
21. Turkish
22. Ukrainian
23. Vietnamese

In its blog post, Cohere showed how Aya Vision can analyze imagery and text on product packaging and provide translations or explanations. It can also identify and describe art styles from different cultures, helping users learn about objects and traditions through AI-powered visual understanding.

Aya Vision’s capabilities have broad implications across multiple fields:

• Language learning and education: Users can translate and describe images in multiple languages, making educational content more accessible.
• Cultural preservation: The model can generate detailed descriptions of art, landmarks and historical artifacts, supporting cultural documentation in underrepresented languages.
• Accessibility tools: Vision-based AI can assist visually impaired users by providing detailed image descriptions in their native language.
• Global communication: Real-time multimodal translation enables organizations and individuals to communicate across languages more effectively.

Strong performance and high efficiency across leading benchmarks

One of Aya Vision’s standout features is its efficiency and performance relative to model size. Despite being significantly smaller than some leading multimodal models, Aya Vision has outperformed much larger alternatives in several key benchmarks.

• Aya Vision 8B outperforms Llama 90B, which is 11 times larger.
• Aya Vision 32B outperforms Qwen 72B, Llama 90B and Molmo 72B, all of which are at least twice as large (or more).
• Benchmarking results on AyaVisionBench and m-WildVision show Aya Vision 8B achieving win rates of up to 79%, and Aya Vision 32B reaching 72% win rates in multilingual image understanding tasks.

A visual comparison of efficiency vs. performance highlights Aya Vision’s advantage. As shown in the efficiency vs. performance trade-off graph, Aya Vision 8B and 32B demonstrate best-in-class performance relative to their parameter size, outperforming much larger models while maintaining computational efficiency.

The tech innovations powering Aya Vision

Cohere For AI attributes Aya Vision’s performance gains to several key innovations:

• Synthetic annotations: The model leverages synthetic data generation to enhance training on multimodal tasks.
• Multilingual data scaling: By translating and rephrasing data across languages, the model gains a broader understanding of multilingual contexts.
• Multimodal model merging: Advanced techniques combine insights from both vision and language models, improving overall performance.

These advancements allow Aya Vision to process images and text with greater accuracy while maintaining strong multilingual capabilities. The step-by-step performance improvement chart showcases how incremental innovations, including synthetic fine-tuning (SFT), model merging, and scaling, contributed to Aya Vision’s high win rates.

Implications for enterprise decision-makers

Although Aya Vision ostensibly caters to the enterprise, businesses may have a hard time making much use of it given its restrictive non-commercial licensing terms. Nonetheless, CEOs, CTOs, IT leaders and AI researchers may use the models to explore AI-driven multilingual and multimodal capabilities within their organizations — particularly in research, prototyping and benchmarking. Enterprises can still use it for internal research and development, evaluating multilingual AI performance and experimenting with multimodal applications.

CTOs and AI teams will find Aya Vision valuable as a highly efficient, open-weight model that outperforms much larger alternatives while requiring fewer computational resources. This makes it a useful tool for benchmarking against proprietary models, exploring potential AI-driven solutions, and testing multilingual multimodal interactions before committing to a commercial deployment strategy.

For data scientists and AI researchers, Aya Vision is much more useful. Its open-weight nature and rigorous benchmarks provide a transparent foundation for studying model behavior, fine-tuning in non-commercial settings, and contributing to open AI advancements. Whether used for internal research, academic collaborations, or AI ethics evaluations, Aya Vision serves as a cutting-edge resource for enterprises looking to stay at the forefront of multilingual and multimodal AI — without

OpenAI releases GPT-4.5 claiming 10X efficiency over GPT-4, but says it’s ‘not a frontier model’

It’s here: OpenAI has announced the release of GPT-4.5, a research preview of its latest and most powerful large language model (LLM) for chat applications. Unfortunately, it’s far and away OpenAI’s most expensive model (more on that below).

It’s also not a “reasoning model,” or the new class of models offered by OpenAI, DeepSeek, Anthropic and many others that produce “chains of thought” (CoT) or stream-of-consciousness-like text blocks in which they reflect on their own assumptions and conclusions to try and catch errors before serving up responses/outputs to users. It’s still more of a classical LLM. Nonetheless, according to OpenAI co-founder and CEO Sam Altman’s post on the social network X, GPT-4.5 is: “The first model that feels like talking to a thoughtful person to me. I have had several moments where I’ve sat back in my chair and been astonished at getting actually good advice from an AI.”

However, he cautioned that the company is bumping up against the upper end of its supply of graphics processing units (GPUs) and has had to limit access as a result: “Bad news: It is a giant, expensive model. We really wanted to launch it to plus and pro at the same time, but we’ve been growing a lot and are out of GPUs. We will add tens of thousands of GPUs next week and roll it out to the plus tier then. (Hundreds of thousands coming soon, and I’m pretty sure y’all will use every one we can rack up.) This isn’t how we want to operate, but it’s hard to perfectly predict growth surges that lead to GPU shortages.”

Starting today, GPT-4.5 is available to subscribers of OpenAI’s most expensive subscription tier, ChatGPT Pro ($200 USD/month), and developers across all paid API tiers, with plans to expand access to the far less costly Plus and Team tiers ($20/$30 monthly) next week. GPT‑4.5 is able to access search and OpenAI’s ChatGPT Canvas mode, and users can upload files and images to it, but it doesn’t have other multimodal features like voice mode, video and screensharing — yet.

Advancing AI with unsupervised learning

GPT-4.5 represents a step forward in AI training, particularly in unsupervised learning, which enhances the model’s ability to recognize patterns, draw connections and generate creative insights. During a livestream demonstration, OpenAI researchers noted that the model was trained on data generated by smaller models and that this improved its “world model.” They also said it was pre-trained across multiple data centers concurrently, suggesting a decentralized approach similar to that of rival lab Nous Research.

This training regimen apparently helped GPT-4.5 learn to produce more natural and intuitive interactions, follow user intent more accurately and demonstrate greater emotional intelligence. The model builds on OpenAI’s previous work in AI scaling, reinforcing the idea that increasing data and compute power leads to better AI performance. Compared to its predecessors and contemporaries, GPT-4.5 is expected to produce far fewer hallucinations (a rate of 37.1%, versus 61.8% for GPT-4o), making it more reliable across a broad range of topics.

What makes GPT-4.5 stand out?

According to OpenAI, GPT-4.5 is designed to create warm, intuitive and naturally flowing conversations. It has a stronger grasp of nuance and context, enabling more human-like interactions and a greater ability to collaborate effectively with users.
The model’s expanded knowledge base and improved ability to interpret subtle cues allow it to excel in various applications, including:

• Writing assistance: Refining content, improving clarity and generating creative ideas.
• Programming support: Debugging, suggesting code improvements and automating workflows.
• Problem-solving: Providing detailed explanations and assisting in practical decision-making.

GPT-4.5 also incorporates new alignment techniques that enhance its ability to understand human preferences and intent, further improving user experience.

How to access GPT-4.5

ChatGPT Pro users can select GPT-4.5 in the model picker on web, mobile and desktop. Next week, OpenAI will begin rolling it out to Plus and Team users. For developers, GPT-4.5 is available through OpenAI’s API, including the chat completions API, assistants API, and batch API. It supports key features like function calling, structured outputs, streaming, system messages and image inputs, making it a versatile tool for various AI-driven applications. However, it currently does not support multimodal capabilities such as voice mode, video or screen sharing.

Pricing and implications for enterprise decision-makers

Enterprises and team leaders stand to benefit significantly from the capabilities introduced with GPT-4.5. With its lower hallucination rate, enhanced reliability and natural conversational abilities, GPT-4.5 can support a wide range of business functions:

• Improved customer engagement: Businesses can integrate GPT-4.5 into support systems for faster, more natural interactions with fewer errors.
• Enhanced content generation: Marketing and communications teams can produce high-quality, on-brand content efficiently.
• Streamlined operations: AI-powered automation can assist in debugging, workflow optimization and strategic decision-making.
• Scalability and customization: The API allows for tailored implementations, enabling enterprises to build AI-driven solutions suited to their needs.

At the same time, the pricing for GPT-4.5 through OpenAI’s API for third-party developers looking to build applications on the model appears shockingly high, at $75/$180 per million input/output tokens compared to $2.50/$10 for GPT-4o. And with other rival models released recently — from Anthropic’s Claude 3.7, to Google’s Gemini 2 Pro, to OpenAI’s own reasoning “o” series (o1, o3-mini high, o3) — the question will become whether GPT-4.5’s value is worth the relatively high cost, especially through the API.

Early reactions from fellow AI researchers and power users vary widely

The release of GPT-4.5 has sparked mixed reactions from AI researchers and tech enthusiasts on the social network X, particularly after a version of the model’s “system card” (a technical document outlining its training and evaluations) was leaked, revealing a variety of benchmark results ahead of the official announcement. The actual final system card published by OpenAI following the leak contains notable differences, including the removal of a line stating that “GPT-4.5 is not a frontier model, but it is OpenAI’s largest LLM, improving on GPT-4’s computational efficiency by more than 10X.”
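To put the per-token rates quoted above in perspective, here is a small cost calculation in Python using the published prices. The monthly token volumes are purely illustrative assumptions.

```python
# Back-of-the-envelope API cost comparison using the per-million-token rates
# quoted above. The monthly workload figures are illustrative assumptions.

PRICES = {                      # (input, output) USD per million tokens
    "gpt-4.5": (75.00, 180.00),
    "gpt-4o":  (2.50, 10.00),
}

monthly_input_tokens = 200_000_000    # assumed: 200M input tokens per month
monthly_output_tokens = 50_000_000    # assumed: 50M output tokens per month

for model, (in_price, out_price) in PRICES.items():
    cost = (monthly_input_tokens / 1e6) * in_price + (monthly_output_tokens / 1e6) * out_price
    print(f"{model}: ${cost:,.0f} per month")

# gpt-4.5: $24,000 per month
# gpt-4o:  $1,000 per month  -> roughly a 24x difference for this workload mix
```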

How Yelp reviewed competing LLMs for correctness, relevance and tone to develop its user-friendly AI assistant

The review app Yelp has provided helpful information to diners and other consumers for decades. It had experimented with machine learning since its early years. During the recent explosion in AI technology, it was still encountering stumbling blocks as it worked to employ modern large language models to power some features. Yelp realized that customers, especially those who only occasionally used the app, had trouble connecting with its AI features, such as its AI-powered assistant.

“One of the obvious lessons that we saw is that it’s very easy to build something that looks cool, but very hard to build something that looks cool and is very useful,” Craig Saldanha, chief product officer at Yelp, told VentureBeat in an interview.

It certainly wasn’t all easy. After it launched Yelp Assistant, its AI-powered service search assistant, in April 2024 to a broader swathe of customers, Yelp saw usage figures for its AI tools actually beginning to decline.

“The one that took us by surprise was when we launched this as a beta to consumers — a few users and folks who are very familiar with the app — [and they] loved it. We got such a strong signal that this would be successful, and then we rolled it out to everyone, [and] the performance just fell off,” Saldanha said. “It took us a long time to figure out why.”

It turned out that Yelp’s more casual users, those who occasionally visited the site or app to find a new tailor or plumber, did not expect to be immediately talking with an AI representative.

From simple to more involved AI features

Most people know Yelp as a website and app to look up restaurant reviews and menu photos. I use Yelp to find pictures of food in new eateries and to see if others share my feelings about a particularly bland dish. It’s also a place that tells me if a coffee shop I plan to use as a workspace for the day has WiFi, plugs and seating, a rarity in Manhattan.

Saldanha recalled that Yelp had been investing in AI “for the better part of a decade.” “Way back when, I’d say in the 2013-2014 timeline, we were in a very different generation of AI, so our focus was on building our own models to do things like query understanding. Part of the job of making a meaningful connection is helping people refine their own search intent,” he said.

But as AI continued to evolve, so did Yelp’s needs. It invested in AI to recognize food in pictures submitted by users to identify popular dishes, and then it launched new ways to connect to tradespeople and services and help guide users’ searches on the platform.

Yelp Assistant helps Yelp users find the right “Pro” to work with. People can tap the chatbox and either use the prompts or type out the task they need done. The assistant then asks follow-up questions to narrow down potential service providers before drafting a message to Pros who might want to bid for the job. Saldanha said Pros are encouraged to respond to users themselves, though he acknowledges that larger brands often have call centers that handle messages generated by Yelp’s AI Assistant.

In addition to Yelp Assistant, Yelp launched Review Insights and Highlights. LLMs analyze user and reviewer sentiment, which Yelp collects into sentiment scores. Yelp uses a detailed GPT-4o prompt to generate a dataset for a list of topics. Then, it’s fine-tuned with a GPT-4o-mini model.
The review highlights feature, which presents information from reviews, also uses an LLM prompt to generate a dataset. However, it is based on GPT-4, with fine-tuning from GPT-3.5 Turbo. Yelp said it will update the feature with GPT-4o and o1.

Yelp joined many other companies using LLMs to improve the usefulness of reviews by adding better search functions based on customer comments. For example, Amazon launched Rufus, an AI-powered assistant that helps people find recommended items.

Big models and performance needs

For many of its new AI features, including the AI assistant, Yelp turned to OpenAI’s GPT-4o and other models, but Saldanha noted that no matter the model, Yelp’s data is the secret sauce for its assistants. Yelp did not want to lock itself into one model and kept an open mind about which LLMs would provide the best service for its customers.

“We use models from OpenAI, Anthropic and other models on AWS Bedrock,” Saldanha said.

Saldanha explained that Yelp created a rubric to test the performance of models in correctness, relevance, consciousness, customer safety and compliance. He said that “it’s really the top end models” that performed best. The company runs a small pilot with each model before taking into account iteration cost and response latency.

Teaching users

Yelp also embarked on a concerted effort to educate both casual and power users to get comfortable with the new AI features. Saldanha said one of the first things they realized, especially with the AI assistant, is that the tone had to feel human. It couldn’t respond too fast or too slowly; it couldn’t be overly encouraging or too brusque.

“We put a bunch of effort into helping people feel comfortable, especially with that first response. It took us almost four months to get this second piece right. And as soon as we did, it was very obvious and you could see that hockey stick in engagement,” Saldanha said.

Part of that process involved training the Yelp Assistant to use certain words and to sound positive. After all that fine-tuning, Saldanha said they’re finally seeing higher usage numbers for Yelp’s AI features.
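Yelp has not published its evaluation harness, but a rubric-driven comparison of candidate models along these lines can be sketched as follows. The criteria, candidate names and stub scoring are assumptions for illustration only.

```python
# Generic sketch of a rubric-driven model comparison like the one described above.
# Not Yelp's harness; criteria, candidates and the dummy scorer are assumptions.
import random
from statistics import mean

CRITERIA = ["correctness", "relevance", "tone", "customer_safety", "compliance"]
CANDIDATES = ["model-a", "model-b"]          # stand-ins for the LLMs under test

def call_model(model: str, prompt: str) -> str:
    """Stub: replace with a real API call to the candidate model."""
    return f"[{model}] response to: {prompt}"

def score_response(response: str, reference: str) -> dict:
    """Stub grader: in practice each criterion is scored by human reviewers
    or an LLM judge against a golden reference answer."""
    return {c: random.uniform(0.6, 1.0) for c in CRITERIA}

test_cases = [
    {"prompt": "Find me a plumber for a leaking sink", "reference": "..."},
    {"prompt": "Does this cafe have WiFi and seating?", "reference": "..."},
]

for model in CANDIDATES:
    per_criterion = {c: [] for c in CRITERIA}
    for case in test_cases:
        scores = score_response(call_model(model, case["prompt"]), case["reference"])
        for c in CRITERIA:
            per_criterion[c].append(scores[c])
    summary = {c: round(mean(v), 2) for c, v in per_criterion.items()}
    print(model, summary)

# A pilot would then weigh these scores against iteration cost and response latency.
```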

Less is more: How ‘chain of draft’ could cut AI costs by 90% while improving performance

A team of researchers at Zoom Communications has developed a breakthrough technique that could dramatically reduce the cost and computational resources needed for AI systems to tackle complex reasoning problems, potentially transforming how enterprises deploy AI at scale. The method, called chain of draft (CoD), enables large language models (LLMs) to solve problems with minimal words — using as little as 7.6% of the text required by current methods while maintaining or even improving accuracy. The findings were published in a paper last week on the research repository arXiv.

“By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT (chain-of-thought) in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks,” write the authors, led by Silei Xu, a researcher at Zoom.

Chain of draft (red) maintains or exceeds the accuracy of chain-of-thought (yellow) while using dramatically fewer tokens across four reasoning tasks, demonstrating how concise AI reasoning can cut costs without sacrificing performance. (Credit: arxiv.org)

How ‘less is more’ transforms AI reasoning without sacrificing accuracy

CoD draws inspiration from how humans solve complex problems. Rather than articulating every detail when working through a math problem or logical puzzle, people typically jot down only essential information in abbreviated form. “When solving complex tasks — whether mathematical problems, drafting essays or coding — we often jot down only the critical pieces of information that help us progress,” the researchers explain. “By emulating this behavior, LLMs can focus on advancing toward solutions without the overhead of verbose reasoning.”

The team tested their approach on numerous benchmarks, including arithmetic reasoning (GSM8k), commonsense reasoning (date understanding and sports understanding) and symbolic reasoning (coin flip tasks). In one striking example in which Claude 3.5 Sonnet processed sports-related questions, the CoD approach reduced the average output from 189.4 tokens to just 14.3 tokens — a 92.4% reduction — while simultaneously improving accuracy from 93.2% to 97.3%.

Slashing enterprise AI costs: The business case for concise machine reasoning

“For an enterprise processing 1 million reasoning queries monthly, CoD could cut costs from $3,800 (CoT) to $760, saving over $3,000 per month,” AI researcher Ajith Vallath Prabhakar writes in an analysis of the paper.

The research comes at a critical time for enterprise AI deployment. As companies increasingly integrate sophisticated AI systems into their operations, computational costs and response times have emerged as significant barriers to widespread adoption. Current state-of-the-art reasoning techniques like CoT, which was introduced in 2022, have dramatically improved AI’s ability to solve complex problems by breaking them down into step-by-step reasoning. But this approach generates lengthy explanations that consume substantial computational resources and increase response latency. “The verbose nature of CoT prompting results in substantial computational overhead, increased latency and higher operational expenses,” writes Prabhakar.

What makes CoD particularly noteworthy for enterprises is its simplicity of implementation.
Unlike many AI advancements that require expensive model retraining or architectural changes, CoD can be deployed immediately with existing models through a simple prompt modification. “Organizations already using CoT can switch to CoD with a simple prompt modification,” Prabhakar explains.

The technique could prove especially valuable for latency-sensitive applications like real-time customer support, mobile AI, educational tools and financial services, where even small delays can significantly impact user experience. Industry experts suggest that the implications extend beyond cost savings, however. By making advanced AI reasoning more accessible and affordable, CoD could democratize access to sophisticated AI capabilities for smaller organizations and resource-constrained environments.

As AI systems continue to evolve, techniques like CoD highlight a growing emphasis on efficiency alongside raw capability. For enterprises navigating the rapidly changing AI landscape, such optimizations could prove as valuable as improvements in the underlying models themselves. “As AI models continue to evolve, optimizing reasoning efficiency will be as critical as improving their raw capabilities,” Prabhakar concluded.

The research code and data have been made publicly available on GitHub, allowing organizations to implement and test the approach with their own AI systems.
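Because the switch is purely a prompting change, it can be tried with any chat model. Below is a minimal sketch; the CoD instruction paraphrases the style described in the paper rather than quoting it, and the model identifier is an assumption.

```python
# Minimal sketch of switching from chain-of-thought to chain-of-draft prompting.
# The CoD instruction paraphrases the approach described in the paper; exact
# wording may differ, and the model identifier is an assumption.
from openai import OpenAI

client = OpenAI()

COT_SYSTEM = (
    "Think step by step to answer the question. "
    "Return the final answer after the separator ####."
)

COD_SYSTEM = (
    "Think step by step, but keep only a minimum draft of each thinking step, "
    "five words at most per step. Return the final answer after the separator ####."
)

question = (
    "A juggler has 16 balls. Half are golf balls, and half of the golf balls "
    "are blue. How many blue golf balls are there?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # works with any chat model; CoD requires no retraining
    messages=[
        {"role": "system", "content": COD_SYSTEM},  # swap in COT_SYSTEM to compare token usage
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
print("completion tokens:", response.usage.completion_tokens)
```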

A standard, open framework for building AI agents is coming from Cisco, LangChain and Galileo

One goal for an agentic future is for AI agents from different organizations to freely and seamlessly talk to one another. But getting to that point requires interoperability, and these agents may have been built with different LLMs, data frameworks and code. To achieve interoperability, developers of these agents must agree on how they can communicate with each other. This is a challenging task.

A group of companies, including Cisco, LangChain, LlamaIndex, Galileo and Glean, have now created AGNTCY, an open-source collective with the goal of creating an industry-standard agent interoperability language. AGNTCY aims to make it easy for any AI agent to communicate and exchange data with another.

Uniting AI Agents

“Just like when the cloud and the internet came about and accelerated applications and all social interactions at a global scale, we want to build the Internet of Agents that accelerate all of human work at a global scale,” said Vijoy Pandey, head of Outshift by Cisco, Cisco’s incubation arm, in an interview with VentureBeat.

Pandey likened AGNTCY to the advent of the Transmission Control Protocol/Internet Protocol (TCP/IP) and the domain name system (DNS), which helped organize the internet and allowed for interconnections between different computer systems. “The way we are thinking about this problem is that the original internet allowed for humans and servers and web farms to all come together,” he said. “This is the Internet of Agents, and the only way to do that is to make it open and interoperable.”

Cisco, LangChain and Galileo will act as AGNTCY’s core maintainers, with Glean and LlamaIndex as contributors. However, this structure may change as the collective adds more members.

Standardizing a fast-moving industry

AI agents cannot be islands. To reach their full potential, they must be able to communicate with other agents that lie outside of an enterprise’s network. This is where interoperability comes in. Setting standards in traditional industries is challenging enough; it becomes even more difficult for technology like AI, where upgrades and model changes occur every few months. However, this is not the first time a standard has been proposed for generative AI.

LangChain, one of AGNTCY’s core members, has its own protocol for working with agents built on frameworks other than LangChain. The Agent Protocol, launched in November last year, allows LangChain agents to talk to agents created with AutoGen, CrewAI or any other framework. Meanwhile, Anthropic announced its Model Context Protocol (MCP) in November. This protocol aims to standardize how models and AI tools connect to data sources. But while many developers have embraced MCP, it’s not exactly a standard just yet.

Yash Sheth, cofounder of AI evaluation platform Galileo, said standardization “is critical.” “Standardization is needed, in fact, it will drive increased velocity for agentic adoption. Today, teams are building in silos, having to figure out how to develop their own infrastructure components from scratch,” Sheth said in an email. “Standardization of multi-agentic systems can only happen if these agents powered by non-deterministic models have a strong anchor in measuring and reporting their performance, accuracy and reliability.”

Sheth admits that making AI agents interoperable can be complex.
AGNTCY “wants to encourage developers to extend these specs, APIs and tools to suit their needs instead of reinventing the wheel, which will be crucial to achieving standardization.”

LangChain CEO Harrison Chase said in a separate conversation that creating a standard is not impossible, especially now that it’s easier to build the agents themselves. “Building agents is already possible, and being done. Replit, Klarna, LinkedIn, Uber, Appfolio and many others have all already done this. Agents aren’t a thing of the future, they are now. Now that we know how to build agents, the next step is to allow them to connect to each other. That is what a standard agent protocol will help enable,” Chase said.

A platform and a language all at once

Pandey envisions AGNTCY as more than just a set of codes for agents. It will also allow customers to discover agents from different developers who run the AGNTCY standard. “Customers can stitch together all these agents on the AGNTCY platform so they can discover, compose, deploy and evaluate as they build their workflows,” Pandey said.

AGNTCY still needs to recruit more AI players to add new agents to the platform and gain momentum as a standard. After all, for something to become an industry standard, there needs to be mass adoption, to prevent the establishment of too many competing standards. That’s where projects like AGNTCY face an uphill battle. Pandey said the collective has been speaking with many other industry players, and they want to get as many viewpoints as possible while developing the platform. That will take time.

In the meantime, enterprises continue to experiment and even deploy AI agents. Maybe in the future, these will all be able to speak to each other.

Anthropic just launched a new platform that lets everyone in your company collaborate on AI — not just the tech team

Anthropic has launched a significant overhaul of its developer platform, introducing team collaboration features and extended reasoning capabilities for its Claude AI assistant that aim to solve major pain points for organizations implementing AI solutions. The upgraded Anthropic Console now allows cross-functional teams to collaborate on AI prompts — the text instructions that guide AI models — and also supports the company’s latest Claude 3.7 Sonnet model with new controls for complex problem-solving.

“We built our shareable prompts to help our customers and developers work together effectively on prompt development,” an Anthropic spokesperson told VentureBeat. “What we learned from talking to customers was that prompt creation rarely happens in isolation. It’s a team effort involving developers, subject matter experts, product managers and QA folks all trying to get the best results.”

The move addresses a growing challenge for enterprises adopting AI: Coordinating prompt engineering work across technical and business teams. Before this update, companies often resorted to sharing prompts through documents or messaging apps, creating version control issues and knowledge silos.

How Claude’s new thinking controls balance advanced AI power with budget-friendly cost management

The updated platform also introduces “extended thinking controls” for Claude 3.7 Sonnet, allowing developers to specify when the AI should use deeper reasoning while setting budget limits to control costs. “Claude 3.7 Sonnet gives you two modes in one model: Standard mode for quick responses, and extended thinking mode when you need deeper problem-solving,” Anthropic told VentureBeat. “In extended thinking mode, Claude takes time to work through problems step-by-step, similar to how humans approach complex challenges.”

This dual approach helps companies balance performance with expenditure — a key consideration as AI implementation costs come under greater scrutiny amid widespread adoption.

Breaking AI’s technical barrier: How Anthropic is democratizing advanced AI for business users

Anthropic’s platform update signals a broader industry shift to make AI development more accessible to non-technical staff, potentially accelerating enterprise adoption. “We believe AI should be accessible to everyone, not just technical specialists,” according to the Anthropic spokesperson. “Our console’s prompt library serves as a knowledge hub that helps spread best practices throughout organizations. This means someone in marketing or customer service can benefit from prompts developed by the technical team, without needing to understand all the technical details themselves.”

Complete prompt lifecycle management: Anthropic’s strategic edge in the enterprise AI platform race

The enhanced Console positions Anthropic distinctly in the competitive AI market, where rivals like OpenAI and Google have focused primarily on model capabilities rather than comprehensive development workflows. “We’ve recognized that enterprise success depends on the entire workflow around those models,” the spokesperson said. “Unlike other companies that offer either powerful models OR developer tools, we’re providing both in an integrated ecosystem.”

Industry analysts say this approach could appeal particularly to mid-sized enterprises seeking to implement AI without expanding their technical teams.
For larger organizations, the collaborative features may help standardize AI implementations across departments — a growing priority as AI usage expands beyond initial pilot projects.

As enterprise AI spending continues to grow, companies are developing more sophisticated methods to measure return on investment. According to Anthropic, customers track metrics including time savings, quality improvements and new capabilities enabled by AI implementations. “The Console updates specifically target development efficiency, knowledge sharing and consistent quality,” the spokesperson told VentureBeat. “Companies can reduce their prompt development cycles by using our collaborative tools, and better scale their Claude implementation.”

The updated Anthropic Console is available immediately to all users.
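The Console exposes these controls in the UI; developers calling the API directly enable extended thinking per request. The sketch below uses Anthropic’s Python SDK, with the model identifier and token budgets as assumptions to verify against Anthropic’s current documentation.

```python
# Minimal sketch of requesting extended thinking with a token budget via the
# Anthropic Messages API. Model identifier and budget values are assumptions;
# verify parameter names and limits against Anthropic's documentation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",                     # assumed Claude 3.7 Sonnet identifier
    max_tokens=4096,                                      # total output cap, including thinking
    thinking={"type": "enabled", "budget_tokens": 2048},  # cap on tokens spent reasoning
    messages=[{
        "role": "user",
        "content": "Plan a migration of a 2 TB PostgreSQL database to a new region with minimal downtime.",
    }],
)

# The response interleaves 'thinking' blocks with the final 'text' blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```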

How the A-MEM framework supports powerful long-context memory so LLMs can take on more complicated tasks

Researchers at Rutgers University, Ant Group and Salesforce Research have proposed a new framework that enables AI agents to take on more complicated tasks by integrating information from their environment and creating automatically linked memories to develop complex structures.

Called A-MEM, the framework uses large language models (LLMs) and vector embeddings to extract useful information from the agent’s interactions and create memory representations that can be retrieved and used efficiently. With enterprises looking to integrate AI agents into their workflows and applications, having a reliable memory management system can make a big difference.

Why LLM memory is important

Memory is critical in LLM and agentic applications because it enables long-term interactions between tools and users. Current memory systems, however, are either inefficient or based on predefined schemas that might not fit the changing nature of applications and the interactions they face. “Such rigid structures, coupled with fixed agent workflows, severely restrict these systems’ ability to generalize across new environments and maintain effectiveness in long-term interactions,” the researchers write. “The challenge becomes increasingly critical as LLM agents tackle more complex, open-ended tasks, where flexible knowledge organization and continuous adaptation are essential.”

A-MEM explained

A-MEM introduces an agentic memory architecture that enables autonomous and flexible memory management for LLM agents, according to the researchers. Every time an LLM agent interacts with its environment — whether by accessing tools or exchanging messages with users — A-MEM generates “structured memory notes” that capture both explicit information and metadata such as time, contextual description, relevant keywords and linked memories. Some details are generated by the LLM as it examines the interaction and creates semantic components.

Once a memory is created, an encoder model is used to calculate the embedding value of all its components. The combination of LLM-generated semantic components and embeddings provides both human-interpretable context and a tool for efficient retrieval through similarity search.

Building up memory over time

One of the interesting components of the A-MEM framework is a mechanism for linking different memory notes without the need for predefined rules. For each new memory note, A-MEM identifies the nearest memories based on the similarity of their embedding values. The LLM then analyzes the full content of the retrieved candidates to choose the ones that are most suitable to link to the new memory.

“By using embedding-based retrieval as an initial filter, we enable efficient scalability while maintaining semantic relevance,” the researchers write. “A-MEM can quickly identify potential connections even in large memory collections without exhaustive comparison. More importantly, the LLM-driven analysis allows for nuanced understanding of relationships that goes beyond simple similarity metrics.”

After creating links for the new memory, A-MEM updates the retrieved memories based on their textual information and relationships with the new memory. As more memories are added over time, this process refines the system’s knowledge structures, enabling the discovery of higher-order patterns and concepts across memories.
In each interaction, A-MEM uses context-aware memory retrieval to provide the agent with relevant historical information. Given a new prompt, A-MEM first computes its embedding value with the same mechanism used for memory notes. The system uses this embedding to retrieve the most relevant memories from the memory store and augment the original prompt with contextual information that helps the agent better understand and respond to the current interaction. “The retrieved context enriches the agent’s reasoning process by connecting the current interaction with related past experiences and knowledge stored in the memory system,” the researchers write.

A-MEM in action

The researchers tested A-MEM on LoCoMo, a dataset of very long conversations spanning multiple sessions. LoCoMo contains challenging tasks such as multi-hop questions that require synthesizing information across multiple chat sessions and reasoning questions that require understanding time-related information. The dataset also contains knowledge questions that require integrating contextual information from the conversation with external knowledge.

The experiments show that A-MEM outperforms other baseline agentic memory techniques on most task categories, especially when using open source models. Notably, researchers say that A-MEM achieves superior performance while lowering inference costs, requiring up to 10X fewer tokens when answering questions.

Effective memory management is becoming a core requirement as LLM agents become integrated into complex enterprise workflows across different domains and subsystems. A-MEM — whose code is available on GitHub — is one of several frameworks that enable enterprises to build memory-enhanced LLM agents.
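A-MEM’s actual implementation is in the GitHub repository; the snippet below is only a simplified, self-contained sketch of the ideas described above (structured notes, embedding-based linking and retrieval-time prompt augmentation). The stub encoder, link policy and top-k values are illustrative assumptions rather than the paper’s settings.

```python
# Simplified sketch of A-MEM-style memory notes, linking and retrieval.
# Not the authors' implementation; the stub encoder, link policy and top-k
# values below are illustrative assumptions.
from dataclasses import dataclass, field
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stub encoder: deterministic pseudo-embedding. In practice, use a real
    sentence-embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

@dataclass
class MemoryNote:
    content: str                    # what happened in the interaction
    keywords: list[str]             # in A-MEM, generated by the LLM
    context: str                    # LLM-written contextual description
    embedding: np.ndarray = field(init=False)
    links: list[int] = field(default_factory=list)

    def __post_init__(self):
        self.embedding = embed(f"{self.content} {self.context} {' '.join(self.keywords)}")

class MemoryStore:
    def __init__(self, link_top_k: int = 3):
        self.notes: list[MemoryNote] = []
        self.link_top_k = link_top_k

    def _nearest(self, query: np.ndarray, k: int) -> list[int]:
        sims = [float(query @ n.embedding) for n in self.notes]
        return sorted(range(len(sims)), key=sims.__getitem__, reverse=True)[:k]

    def add(self, note: MemoryNote) -> int:
        # Embedding similarity is the cheap first-pass filter; in A-MEM an LLM
        # then reads the candidates, decides which links to keep and updates the
        # older notes. Here we simply link all top-k candidates.
        if self.notes:
            note.links = self._nearest(note.embedding, self.link_top_k)
        self.notes.append(note)
        return len(self.notes) - 1

    def retrieve(self, prompt: str, k: int = 2) -> list[MemoryNote]:
        return [self.notes[i] for i in self._nearest(embed(prompt), k)] if self.notes else []

def augment(prompt: str, memories: list[MemoryNote]) -> str:
    history = "\n".join(f"- {m.content} ({m.context})" for m in memories)
    return f"Relevant past interactions:\n{history}\n\nCurrent request: {prompt}"

store = MemoryStore()
store.add(MemoryNote("User asked about the refund policy", ["refund", "policy"], "support chat"))
store.add(MemoryNote("Agent queried the returns API", ["returns", "api"], "tool call"))
store.add(MemoryNote("User disputed a refund amount", ["refund", "dispute"], "support chat"))

print(augment("Why was my refund smaller than expected?",
              store.retrieve("refund amount dispute")))
```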

Operational excellence with AI: How companies are boosting success with process intelligence everyone can access

Presented by ARIS

According to the Process Excellence Network, 2025 will see a laser focus on operational excellence as a pillar of survival and competitiveness. Optimized process flows — the right people, following the right rules at the right time — make for happier customers, more motivated employees, better staying power and a healthier bottom line. And it’s against this backdrop that technologies that power process intelligence have gained significant traction.

Now, that landscape is shifting. And how! Generative AI is reshaping the business world, revolutionizing operations and enabling AI-powered process optimization. But, in a business world still generally defined by process immaturity, rushing to roll out emerging technologies doesn’t necessarily equate to faster time-to-value. Sorry to say! So where should enterprises be focusing their use of AI to boost operational efficiency? Without doubt, some of the most meaningful operational changes are happening at the interface between intelligence and its application — the human behind the keyboard.

Use case: AI makes everyone a process expert

Here’s an example: Using AI to make business process transformation accessible to everyone across the value chain, a world-renowned mortgage provider translated a technical exercise into a value exercise. The organization is constantly stretched between process excellence and heavy regulatory obligations. Thousands of employees manage the touchpoints in each mortgage process from application to acceptance and disbursement, and in many cases, detailed rules apply to ensure compliance. Failure to deliver on the process can have serious financial consequences for a customer as well as trigger a regulatory audit or fine. But following the right rule at the right time involves a laborious search in each instance — a massive source of process waste.

So, the company decided to make it possible for every person in the process chain to have responsive and dynamic access to the specific best practices needed for each step — presented in a way everyone could understand. Embedding AI meant it could ‘democratize’ access to process knowledge, enabling each team member to search for relevant rules based on natural language rather than technical terms. The operational impact has been huge: Now, everyone can feed suggestions back into the process, providing context. And the company learns from any errors or breaks in the process — tightening them up just in time.

AI-powered process intelligence: A motor for achieving time-to-value

It’s becoming increasingly challenging for companies to innovate in the realm of process excellence these days. Long lead times come at a huge cost: reducing competitiveness, adding internal process waste, shifting perspectives away from evidence-based transformation and ultimately leading to poorer process performance. Time-to-value is, therefore, one of the most compelling reasons for embedding AI into process intelligence, with the most impact made in the areas of process automation and operational resilience. Here’s why:

1. Process automation: Companies first need to understand how their processes are performing — where bad practices threaten strategic targets — and discover opportunities for improvement. This requires both data and context to identify next steps and meaningful outcomes. Traditionally, this has been an expensive and time-consuming practice, requiring experienced resources and long-running analyses.
AI is a differentiator: The right tooling can make a company’s processes visible and accessible to more than just its process experts. With strategic stakeholders and line-of-business users involved, the very people who best know the business can contribute to innovation, design new processes and cut out endless wasted hours briefing process experts. AI, essentially, lowers the barrier to entry so everyone can come into the conversation, from process experts to line-of-business users. This speeds up time-to-value in transformation.

2. Operational resilience: Companies are required to perform a combination of practices to prevent, respond to, recover from and learn from business disruption. This too can be expensive — both in terms of the time it takes to set up and the impact of failure.

AI is a differentiator: Rather than simply ‘survive,’ companies can use AI to build true resilience — or antifragility — in which they learn from system failures or cybersecurity breaches and operationalize that knowledge. By putting AI into the loop on process breaks and testing potential scenarios via a digital twin of the organization, non-process experts and stakeholders are empowered to mitigate risk before escalations. Imagine building the practice of antifragility into an end-to-end process lifecycle: Making everything visible via AI, analyzed through AI, kept up to date through AI and adaptive to change with AI — all at a minimal cost. For some, this is already a reality.

Data is king, context is crown

So how, on a practical level, does the right AI technology enable operational efficiency? Firstly, it earns its buy-in by making highly technical information understandable by and available to a wider audience (full access). Secondly, it provides a holistic context by means of its integration into a single process intelligence suite (full lifecycle).

Full access: Non-process experts must be able to make data-driven decisions faster with AI-powered insights that recommend best practices and design principles for dashboards. Any queries that arise should be answered by means of automatically generated visualizations which can be integrated directly into apps — saving time and effort. While generative AI (the choreography) powers most of this, the right solution also includes agentic AI (the dance steps), which can be put to use in advanced automation applications.

Full lifecycle: In the mission to achieve operational excellence, pockets of success simply won’t cut it. AI must be deployed across the full process intelligence lifecycle — not just at a use case, capability or functional level. Only when utilized across the entire enterprise — horizontally and vertically — can its insights be understood in true context and reliably aligned to strategy.

Turn your process intelligence into business value with ARIS

When you embed AI into every part of a single, platform-agnostic process management platform, and make your processes transparent and accessible to all, everyone can be part of the solution. That’s the ARIS advantage. Recognized as
