
Vanta’s AI agent wants to run your compliance program — and it just might

Vanta, the San Francisco-based compliance automation startup, unveiled its most ambitious artificial intelligence product yet on Tuesday — an autonomous AI agent that handles end-to-end security and compliance workflows without human intervention. The launch signals a major evolution in how enterprises manage governance, risk and compliance (GRC) programs as regulatory pressures intensify and manual processes become unsustainable.

The Vanta AI Agent, entering private beta immediately with general availability planned for July, represents a fundamental shift from AI as a productivity enhancer to AI as a trusted program partner. Unlike traditional automation tools that follow pre-defined rules, the agent proactively identifies compliance issues, suggests fixes and takes action on behalf of security teams while keeping humans in control of final decisions.

“We built the Vanta AI Agent to meet teams exactly where they are, stepping in during the most manual parts of compliance and surfacing issues they may not catch on their own,” said Jeremy Epling, Vanta’s Chief Product Officer, in an interview with VentureBeat. “By minimizing human error and taking on repetitive tasks, the Vanta AI agent enables teams to focus on higher-value work—the work that truly builds trust.”

The timing reflects urgent market needs. According to Vanta’s State of Trust report, 55% of companies report security risks at an all-time high, with AI-powered threats contributing to the escalation. Simultaneously, organizations spend increasing amounts of time on compliance — U.K. companies alone dedicate 12 working weeks annually to compliance tasks, according to industry data.

How AI tackles policy management and audit preparation in four critical areas

The AI Agent tackles four critical areas that typically consume hundreds of hours of manual work. For policy onboarding, the system scans uploaded documents, extracts key details including version history and service level agreements, and automatically maps policies to relevant compliance controls while providing rationale for its recommendations.

“Policies outline how an organization governs its systems and data, but managing them is often a slow, resource-intensive process that involves manually mapping them to dozens of compliance and security controls,” the company explained in its announcement. The agent eliminates this bottleneck by automating control mapping and generating policy change summaries for annual reviews.

Perhaps most significantly, the agent proactively monitors for inconsistencies between written policies and actual practices—a common source of audit failures. “If an SLA outlined in your policy is five days, but the SLA you’re monitoring with Vanta’s automated tests is ten days, the agent will flag this mismatch and provide recommendations and next steps to make a quick fix,” Epling explained.

The system also functions as an intelligent knowledge base, answering complex policy questions in real time. Security teams can query the agent about password requirements, vendor risk coverage, or compliance status for frameworks like SOC 2, ISO 27001 or HIPAA without manually searching through documentation.

Customers report saving 12 hours weekly as AI streamlines compliance workflows

Early customer feedback suggests substantial productivity gains.
Anne Simpson, head of privacy, security and compliance at Databook, reported that her team has saved 12 hours weekly since implementing the AI Agent. “The Vanta AI Agent complements my team’s expertise by filling in knowledge gaps, helping us learn faster and double-checking critical information—ultimately saving us 12 hours weekly. And in our organization, time is money,” Simpson said.

The agent’s evidence verification capabilities address another persistent pain point. Auditors frequently request revisions or clarifications during evidence reviews, creating bottlenecks that can derail audit timelines. The AI Agent reviews uploaded documents against audit requirements to ensure accuracy and completeness, identifying gaps before they become issues.

“With so many detailed evidence requirements, it’s not unusual for auditors or consultants to ask for revisions or clarifications after their manual evidence review,” Epling noted. “The Vanta AI Agent reviews uploaded evidence against audit requirements to confirm accuracy and completeness, offering clear guidance when revisions are needed and reducing back-and-forth with auditors and internal stakeholders.”

$150M Series C funding validates booming compliance automation market

Vanta’s AI Agent launch comes as the compliance automation market experiences unprecedented growth. The company raised $150 million in Series C funding in July 2024, reaching a $2.45 billion valuation, with Sequoia Capital leading the round alongside Goldman Sachs and J.P. Morgan. The startup now serves over 8,000 customers globally, surpassing $100 million in annual recurring revenue.

The broader market validates this trajectory. Compliance-focused startups are attracting significant investor attention as enterprises grapple with expanding regulatory requirements, from the EU AI Act to enhanced cybersecurity frameworks. Traditional manual approaches cannot scale to meet current demands.

“Automation has always been at the heart of Vanta,” Epling emphasized. “The Vanta AI Agent continues this by eliminating time-consuming, manual, and repetitive tasks, such as gathering and reviewing evidence for audits, keeping your security program in sync across policies, controls, risks, and automation.”

Advanced security features protect sensitive compliance data while enabling AI innovation

Unlike rule-based automation or reactive chatbots, the Vanta AI Agent operates with the same platform access as human users, enabling proactive program improvements and one-click resolutions. The system benefits from complete context about a company’s compliance history and current risk posture, unlocking additional value through personalized recommendations.

Security remains paramount given the sensitive nature of compliance data. Vanta leverages its existing identity and authorization system, ensuring users can only access information they’re already authorized to see. The company maintains formal Data Processing Agreements with third-party partners, guaranteeing that shared data won’t train external models.

“We exclude documents marked as sensitive from being accessed by the Agent and give users control over this setting,” Epling explained. As one of the first companies certified under ISO 42001, Vanta applies rigorous AI governance standards across its platform.

Why human control remains essential in AI-powered compliance automation

Despite the automation, human oversight remains central to the system’s design. “The Vanta AI Agent is designed to empower, not replace, human teams,” Epling stressed.
“Teams retain full control and approval over any recommended changes


OpenAI announces 80% price drop for o3, its most powerful reasoning model

Good news, AI developers! OpenAI has announced a substantial price cut on o3, its flagship reasoning large language model (LLM), slashing costs by a whopping 80% for both input and output tokens.

(Recall that tokens are the individual numeric strings LLMs use to represent words, phrases, mathematical and coding strings, and other content. They are representations of the semantic constructions the model has learned through training and are, in essence, the LLM’s native language. Most LLM providers offer their models through application programming interfaces, or APIs, that developers can build apps atop or plug their external apps into, and most providers charge for the privilege at a cost per million tokens.)

The update positions the model as a more accessible option for developers seeking advanced reasoning capabilities, and places OpenAI in more direct pricing competition with rival models such as Gemini 2.5 Pro from Google DeepMind, Claude Opus 4 from Anthropic, and DeepSeek’s reasoning suite.

Announced by Altman himself on X

Sam Altman, CEO of OpenAI, confirmed the change in a post on X, highlighting that the new pricing is intended to encourage broader experimentation, writing: “we dropped the price of o3 by 80%!! excited to see what people will do with it now. think you’ll also be happy with o3-pro pricing for the performance :)”

The cost of using o3 is now $2 per million input tokens and $8 per million output tokens, with cached input (input that is stored and identical to what the user provided before) priced at $0.50 per million tokens. This marks a significant reduction from the previous rates of $10 (input) and $40 (output), as OpenAI researcher Noam Brown pointed out on X.

Ray Fernando, a developer and early adopter, celebrated the price drop in a post reading “LFG!”, short for “let’s fucking go!” The sentiment reflects growing enthusiasm among builders looking to scale their projects without prohibitive model access costs.

Price comparison to other rival reasoning LLMs

The price adjustment comes at a time when AI providers are competing more aggressively on both performance and affordability. A comparison with other leading AI reasoning models illustrates how significant this move could be:

Gemini 2.5 Pro Preview, developed by Google DeepMind, charges between $1.25 and $2.50 for input depending on prompt size, and $10 to $15 for output. While its integration with Google Search offers additional functionality, that service carries its own cost: free for the first 1,500 requests per day, then $35 per thousand requests.

Claude Opus 4, marketed by Anthropic as a model optimized for complex tasks, is the most expensive of the group, charging $15 per million input tokens and $75 for output. Prompt caching read and write services come at $1.50 and $18.75 respectively, although users can unlock a 50% discount with batch processing.

DeepSeek’s models, particularly DeepSeek-Reasoner and DeepSeek-Chat, undercut much of the market with aggressively low pricing. Input tokens range from $0.07 to $0.55 per million depending on caching and time of day, while output ranges from $1.10 to $2.19.
Discounted rates during off-peak hours bring prices down even further, to as low as $0.035 per million tokens for cached inputs.

| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Discount Notes |
|---|---|---|---|---|
| OpenAI o3 | $2.00 (down from $10.00) | $0.50 | $8.00 (down from $40.00) | Flex Processing: $5 / $20 |
| Gemini 2.5 Pro | $1.25 – $2.50 | $0.31 – $0.625 | $10.00 – $15.00 | Higher rate applies to prompts >200k tokens |
| Claude Opus 4 | $15.00 | $1.50 (read) / $18.75 (write) | $75.00 | 50% off with batch processing |
| DeepSeek-Chat | $0.07 (hit) / $0.27 (miss) | — | $1.10 | 50% off during off-peak hours |
| DeepSeek-Reasoner | $0.14 (hit) / $0.55 (miss) | — | $2.19 | 75% off during off-peak hours |

In addition, the independent third-party AI model comparison and research group Artificial Analysis ran the new o3 through its suite of benchmarking tests on various tasks and found it cost $390 to complete them all, versus $971 for Gemini 2.5 Pro and $342 for Claude 4 Sonnet.

Narrowing the cost vs. intelligence gap for developers

OpenAI’s pricing move not only narrows the gap with ultra-low-cost models like DeepSeek but also puts downward pressure on higher-priced offerings like Claude Opus and Gemini Pro. Unlike Claude or Gemini, OpenAI’s o3 also now offers a flex mode for synchronous processing that charges $5 for input and $20 for output per million tokens, giving developers more control over compute cost and latency depending on workload type.

o3 is currently available through the OpenAI API and Playground. Users with balances as low as a few dollars can now explore the model’s full capabilities, enabling prototyping and deployment with fewer financial barriers. This could particularly benefit startups, research teams, and individual developers who previously found higher-tier model access cost-prohibitive.

By substantially lowering the cost of its most advanced reasoning model, OpenAI is signaling a broader trend in the generative AI space: premium performance is quickly becoming more affordable, and developers now have a growing number of viable, economically scalable options.
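For developers budgeting around these numbers, the per-request arithmetic is straightforward. Here is a minimal sketch using the new o3 list prices from the table above; the request sizes are made-up examples:

```python
# Rough per-request cost arithmetic at the new o3 rates (USD per 1M tokens).
O3_INPUT, O3_CACHED_INPUT, O3_OUTPUT = 2.00, 0.50, 8.00

def o3_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of a single o3 API call."""
    fresh = input_tokens - cached_tokens
    return (fresh * O3_INPUT
            + cached_tokens * O3_CACHED_INPUT
            + output_tokens * O3_OUTPUT) / 1_000_000

# Hypothetical call: 50k input tokens (30k of them cached), 10k output tokens.
print(f"${o3_cost(50_000, 10_000, cached_tokens=30_000):.4f}")  # $0.1350
```

Ignoring caching, the same 50k-in / 10k-out call drops from $0.90 at the old $10/$40 rates to $0.18 at the new ones, which is exactly the advertised 80% cut.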


AlphaOne gives AI developers a new dial to control LLM ‘thinking’ and boost performance

A new framework from researchers at the University of Illinois Urbana-Champaign and the University of California, Berkeley gives developers more control over how large language models (LLMs) “think,” improving their reasoning capabilities while making more efficient use of their inference budget.

The framework, called AlphaOne (α1), is a test-time scaling technique that tweaks a model’s behavior during inference without costly retraining. It provides a universal method for modulating the reasoning process of advanced LLMs, offering developers the flexibility to improve performance on complex tasks in a more controlled and cost-effective manner than existing approaches.

The challenge of slow thinking

In recent years, developers of large reasoning models (LRMs), such as OpenAI o3 and DeepSeek-R1, have incorporated mechanisms inspired by “System 2” thinking—the slow, deliberate, and logical mode of human cognition. This is distinct from “System 1” thinking, which is fast, intuitive, and automatic. Incorporating System 2 capabilities enables models to solve complex problems in domains like mathematics, coding, and data analysis.

Models are trained to automatically generate transition tokens like “wait,” “hmm,” or “alternatively” to trigger slow thinking. When one of these tokens appears, the model pauses to self-reflect on its previous steps and correct its course, much like a person pausing to rethink a difficult problem.

However, reasoning models don’t always use their slow-thinking capabilities effectively. Different studies show they are prone to either “overthinking” simple problems, wasting computational resources, or “underthinking” complex ones, leading to incorrect answers. As the AlphaOne paper notes, “This is because of the inability of LRMs to find the optimal human-like system-1-to-2 reasoning transitioning and limited reasoning capabilities, leading to unsatisfactory reasoning performance.”

There are two common methods to address this. Parallel scaling, like the “best-of-N” approach, runs a model multiple times and picks the best answer, which is computationally expensive. Sequential scaling attempts to modulate the thinking process during a single run. For example, s1 is a technique that forces more slow thinking by adding “wait” tokens to the model’s context, while the “Chain of Draft” (CoD) method prompts the model to use fewer words, thereby reducing its thinking budget. These methods, however, offer rigid, one-size-fits-all solutions that are often inefficient.

A universal framework for reasoning

Instead of simply increasing or reducing the thinking budget, the researchers behind AlphaOne asked a more fundamental question: Is it possible to develop a better strategy for transitioning between slow and fast thinking that can modulate reasoning budgets universally?

Their framework gives developers fine-grained control over the model’s reasoning process at test time. The system works by introducing Alpha (α), a parameter that acts as a dial to scale the model’s thinking-phase budget. Before a certain point in the generation, which the researchers call the “α moment,” AlphaOne strategically schedules how frequently it inserts a “wait” token to encourage slow, deliberate thought.
This allows for what the paper describes as “both controllable and scalable thinking.” Once the “α moment” is reached, the framework inserts a </think> token into the model’s context, ending the slow-thinking process and forcing the model to switch to fast reasoning and produce its final answer.

Previous techniques typically apply what the researchers call “sparse modulation,” making only a few isolated adjustments, such as adding a “wait” token once or twice during the entire process. AlphaOne, in contrast, can be configured to intervene often (dense) or rarely (sparse), giving developers more granular control than other methods.

AlphaOne modulates reasoning by adding “wait” tokens into the model’s context at different intervals. (Source: AlphaOne GitHub page)

“We see AlphaOne as a unified interface for deliberate reasoning, complementary to chain-of-thought prompting or preference-based tuning, and capable of evolving alongside model architectures,” the AlphaOne team told VentureBeat in written comments. “The key takeaway is not tied to implementation details, but to the general principle: slow-to-fast structured modulation of the reasoning process enhances capability and efficiency.”

AlphaOne in action

The researchers tested AlphaOne on three different reasoning models with parameter sizes ranging from 1.5 billion to 32 billion. They evaluated its performance across six challenging benchmarks in mathematics, code generation, and scientific problem-solving.

They compared AlphaOne against three baselines: the vanilla, unmodified model; the s1 method, which monotonically increases slow thinking; and the Chain of Draft (CoD) method, which monotonically decreases it. The results produced several key findings that are particularly relevant for developers building AI applications.

First, a “slow thinking first, then fast thinking” strategy leads to better reasoning performance in LRMs. This highlights a fundamental gap between LLMs and human cognition, which is usually structured as fast thinking followed by slow thinking. Unlike humans, researchers found that models benefit from enforced slow thinking before acting fast.

“This suggests that effective AI reasoning emerges not from mimicking human experts, but from explicitly modulating reasoning dynamics, which aligns with practices such as prompt engineering and staged inference already used in real-world applications,” the AlphaOne team said. “For developers, this means that system design should actively impose a slow-to-fast reasoning schedule to improve performance and reliability, at least for now, while model reasoning remains imperfect.”

Another interesting finding was that investing in slow thinking can lead to more efficient inference overall. “While slow thinking slows down reasoning, the overall token length is significantly reduced with α1, inducing more informative reasoning progress brought by slow thinking,” the paper states. In other words, although the model takes more time to “think,” it produces a more concise and accurate reasoning path, ultimately reducing the total number of tokens generated and lowering inference costs.

Compared to s1-style baselines, AlphaOne reduces average token usage by ~21%, lowering compute overhead, while concurrently boosting reasoning accuracy by 6.15%, even on PhD-level math, science, and code problems.
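Mechanically, the schedule described above can be pictured as a small decoding loop. This is a simplified sketch, not the paper’s implementation: `generate_next_token` is a hypothetical stand-in for one decode step, and the real α1 anneals the “wait”-insertion probability rather than holding it fixed:

```python
import random

def alpha_one_decode(generate_next_token, prompt_tokens,
                     avg_think_len=2000, alpha=1.4, p_wait=0.3, max_tokens=8192):
    """Slow-to-fast test-time modulation in the spirit of AlphaOne."""
    # The alpha moment: alpha scales the average thinking-phase length.
    alpha_moment = len(prompt_tokens) + int(alpha * avg_think_len)
    tokens = list(prompt_tokens)
    switched = False
    while len(tokens) < max_tokens:
        if not switched and len(tokens) >= alpha_moment:
            tokens.append("</think>")   # force the switch to fast reasoning
            switched = True
        tok = generate_next_token(tokens)
        tokens.append(tok)
        if tok == "<eos>":
            break
        # Before the alpha moment, stochastically extend slow thinking by
        # appending "wait" after sentence boundaries.
        if not switched and tok in {".", "\n"} and random.random() < p_wait:
            tokens.append("wait")
    return tokens
```

Raising `alpha` buys more deliberate thinking before the forced switch; lowering it cuts the thinking budget, which is exactly the dial the framework exposes.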
While AlphaOne makes slow progress in the beginning, it ends up getting better results with fewer tokens compared to other test-time scaling methods. (Source: AlphaOne GitHub page)

“For enterprise applications like


Apple makes major AI advance with image generation technology rivaling DALL-E and Midjourney

Apple‘s machine learning research team has developed a breakthrough AI system for generating high-resolution images that could challenge the dominance of diffusion models, the technology powering popular image generators like DALL-E and Midjourney.

The advancement, detailed in a research paper published last week, introduces “STARFlow,” a system developed by Apple researchers in collaboration with academic partners that combines normalizing flows with autoregressive transformers to achieve what the team calls “competitive performance” with state-of-the-art diffusion models.

The breakthrough comes at a critical moment for Apple, which has faced mounting criticism over its struggles with artificial intelligence. At Monday’s Worldwide Developers Conference, the company unveiled only modest AI updates to its Apple Intelligence platform, highlighting the competitive pressure facing a company that many view as falling behind in the AI arms race.

“To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution,” wrote the research team, which includes Apple machine learning researchers Jiatao Gu, Joshua M. Susskind, and Shuangfei Zhai, along with academic collaborators from institutions including the University of California, Berkeley and Georgia Tech.

How Apple is fighting back against OpenAI and Google in the AI wars

The STARFlow research represents Apple’s broader effort to develop distinctive AI capabilities that could differentiate its products from competitors. While companies like Google and OpenAI have dominated headlines with their generative AI advances, Apple has been working on alternative approaches that could offer unique advantages.

The research team tackled a fundamental challenge in AI image generation: scaling normalizing flows to work effectively with high-resolution images. Normalizing flows, a type of generative model that learns to transform simple distributions into complex ones, have traditionally been overshadowed by diffusion models and generative adversarial networks in image synthesis applications.

“STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality,” the researchers wrote, demonstrating the system’s versatility across different types of image synthesis challenges.

Inside the mathematical breakthrough that powers Apple’s new AI system

Apple’s research team introduced several key innovations to overcome the limitations of existing normalizing flow approaches. The system employs what researchers call a “deep-shallow design,” using “a deep Transformer block [that] captures most of the model representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial.”

The breakthrough also involves operating in the “latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling,” according to the paper. This approach allows the model to work with compressed representations of images rather than raw pixel data, significantly improving efficiency.
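For context, the exact-likelihood property invoked in the next paragraph rests on the standard change-of-variables identity that any normalizing flow satisfies (textbook math, not a formula quoted from the STARFlow paper). For an invertible map $f_\theta$ sending an image $x$ to a sample under a simple base distribution $p_Z$:

```latex
\log p_X(x) = \log p_Z\!\big(f_\theta(x)\big) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
```

Because this log-likelihood is computable exactly, a flow can be trained by maximizing it directly, rather than optimizing the approximate objectives used by diffusion models.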
Unlike diffusion models, which rely on iterative denoising processes, STARFlow maintains the mathematical properties of normalizing flows, enabling “exact maximum likelihood training in continuous spaces without discretization.”

What STARFlow means for Apple’s future iPhone and Mac products

The research arrives as Apple faces increasing pressure to demonstrate meaningful progress in artificial intelligence. A recent Bloomberg analysis highlighted how Apple Intelligence and Siri have struggled to compete with rivals. Apple’s modest announcements at WWDC this week underscored the company’s challenges in the AI space.

For Apple, STARFlow’s exact likelihood training could offer advantages in applications requiring precise control over generated content, or in scenarios where understanding model uncertainty is critical for decision-making — potentially valuable for enterprise applications and the on-device AI capabilities Apple has emphasized.

The research demonstrates that alternative approaches to diffusion models can achieve comparable results, potentially opening new avenues for innovation that could play to Apple’s strengths in hardware-software integration and on-device processing.

Why Apple is betting on university partnerships to solve its AI problem

The research exemplifies Apple’s strategy of collaborating with leading academic institutions to advance its AI capabilities. Co-author Tianrong Chen, a doctoral student at Georgia Tech who interned with Apple’s machine learning research team, brings expertise in stochastic optimal control and generative modeling. The collaboration also includes Ruixiang Zhang from U.C. Berkeley’s mathematics department and Laurent Dinh, a machine learning researcher known for pioneering work on flow-based models at Google Brain and DeepMind.

“Crucially, our model remains an end-to-end normalizing flow,” the researchers emphasized, distinguishing their approach from hybrid methods that sacrifice mathematical tractability for improved performance.

The full research paper is available on arXiv, providing technical details for researchers and engineers looking to build upon this work in the competitive field of generative AI.

While STARFlow represents a significant technical achievement, the real test will be whether Apple can translate such research breakthroughs into the kind of consumer-facing AI features that have made competitors like ChatGPT household names. For a company that once revolutionized entire industries with products like the iPhone, the question isn’t whether Apple can innovate in AI — it’s whether it can do so fast enough.


Phonely’s new AI agents hit 99% accuracy—and customers can’t tell they’re not human

A three-way partnership between AI phone support company Phonely, inference optimization platform Maitai, and chip maker Groq has achieved a breakthrough that addresses one of conversational artificial intelligence’s most persistent problems: the awkward delays that immediately signal to callers that they’re talking to a machine.

The collaboration has enabled Phonely to reduce response times by more than 70% while simultaneously boosting accuracy from 81.5% to 99.2% across four model iterations, surpassing GPT-4o’s 94.7% benchmark by 4.5 percentage points. The improvements stem from Groq’s new capability to instantly switch between multiple specialized AI models without added latency, orchestrated through Maitai’s optimization platform.

The achievement solves what industry experts call the “uncanny valley” of voice AI — the subtle cues that make automated conversations feel distinctly non-human. For call centers and customer service operations, the implications could be transformative: one of Phonely’s customers is replacing 350 human agents this month alone.

Why AI phone calls still sound robotic: the four-second problem

Traditional large language models like OpenAI’s GPT-4o have long struggled with what appears to be a simple challenge: responding quickly enough to maintain natural conversation flow. While a few seconds of delay barely registers in text-based interactions, the same pause feels interminable during live phone conversations.

“One of the things that most people don’t realize is that major LLM providers, such as OpenAI, Claude, and others have a very high degree of latency variance,” said Will Bodewes, Phonely’s founder and CEO, in an exclusive interview with VentureBeat. “4 seconds feels like an eternity if you’re talking to a voice AI on the phone – this delay is what makes most voice AI today feel non-human.”

The problem occurs roughly once every ten requests, meaning standard conversations inevitably include at least one or two awkward pauses that immediately reveal the artificial nature of the interaction. For businesses considering AI phone agents, these delays have created a significant barrier to adoption.

“This kind of latency is unacceptable for real-time phone support,” Bodewes explained. “Aside from latency, conversational accuracy and humanlike responses is something that legacy LLM providers just haven’t cracked in the voice realm.”

How three startups solved AI’s biggest conversational challenge

The solution emerged from Groq’s development of what the company calls “zero-latency LoRA hotswapping” — the ability to instantly switch between multiple specialized AI model variants without any performance penalty. LoRA, or low-rank adaptation, allows developers to create lightweight, task-specific modifications to existing models rather than training entirely new ones from scratch.

“Groq’s combination of fine-grained software controlled architecture, high-speed on-chip memory, streaming architecture, and deterministic execution means that it is possible to access multiple hot-swapped LoRAs with no latency penalty,” explained Chelsey Kantor, Groq’s chief marketing officer, in an interview with VentureBeat.
“The LoRAs are stored and managed in SRAM alongside the original model weights.”

This infrastructure advancement enabled Maitai to create what founder Christian DalSanto describes as a “proxy-layer orchestration” system that continuously optimizes model performance. “Maitai acts as a thin proxy layer between customers and their model providers,” DalSanto said. “This allows us to dynamically select and optimize the best model for every request, automatically applying evaluation, optimizations, and resiliency strategies such as fallbacks.”

The system works by collecting performance data from every interaction, identifying weak points, and iteratively improving the models without customer intervention. “Since Maitai sits in the middle of the inference flow, we collect strong signals identifying where models underperform,” DalSanto explained. “These ‘soft spots’ are clustered, labeled, and incrementally fine-tuned to address specific weaknesses without causing regressions.”

From 81% to 99% accuracy: the numbers behind AI’s human-like breakthrough

The results demonstrate significant improvements across multiple performance dimensions. Time to first token — how quickly an AI begins responding — dropped 73.4%, from 661 milliseconds to 176 milliseconds at the 90th percentile. Overall completion times fell 74.6%, from 1,446 milliseconds to 339 milliseconds.

Perhaps more significantly, accuracy improvements followed a clear upward trajectory across four model iterations, starting at 81.5% and reaching 99.2% — a level that exceeds human performance in many customer service scenarios.

“We’ve been seeing about 70%+ of people who call into our AI not being able to distinguish the difference between a person,” Bodewes told VentureBeat. “Latency is, or was, the dead giveaway that it was an AI. With a custom fine tuned model that talks like a person, and super low-latency hardware, there isn’t much stopping us from crossing the uncanny valley of sounding completely human.”

The performance gains translate directly to business outcomes. “One of our biggest customers saw a 32% increase in qualified leads as compared to a previous version using previous state-of-the-art models,” Bodewes noted.

350 human agents replaced in one month: call centers go all-in on AI

The improvements arrive as call centers face mounting pressure to reduce costs while maintaining service quality. Traditional human agents require training, scheduling coordination, and significant overhead costs that AI agents can eliminate.

“Call centers are really seeing huge benefits from using Phonely to replace human agents,” Bodewes said. “One of the call centers we work with is actually replacing 350 human agents completely with Phonely just this month. From a call center perspective this is a game changer, because they don’t have to manage human support agent schedules, train agents, and match supply and demand.”

The technology shows particular strength in specific use cases. “Phonely really excels in a few areas, including industry-leading performance in appointment scheduling and lead qualification specifically, beyond what legacy providers are capable of,” Bodewes explained. The company has partnered with major firms handling insurance, legal, and automotive customer interactions.

The hardware edge: why Groq’s chips make sub-second AI possible

Groq’s specialized AI inference chips, called Language Processing Units (LPUs), provide the hardware foundation that makes the multi-model approach viable.
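Before looking at the chips themselves, here is a rough schematic of the proxy-layer pattern DalSanto describes: pick a task-specific adapter per request, serve it, and log signals for later fine-tuning. Every class, function, and adapter name is an illustrative assumption, not Maitai’s or Groq’s actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class LoRARouter:
    """Pick a task-specific LoRA variant per request; the swap is just a lookup
    because adapters are assumed to be kept hot alongside the base weights."""
    adapters: Dict[str, str] = field(default_factory=dict)  # intent -> adapter id
    telemetry: List[Tuple[str, str, int, int]] = field(default_factory=list)

    def route(self, intent: str) -> str:
        return self.adapters.get(intent, self.adapters["general"])

    def handle(self, intent: str, text: str,
               infer: Callable[[str, str], str]) -> str:
        adapter = self.route(intent)
        reply = infer(adapter, text)
        # Collect per-request signals; weak spots get clustered and fine-tuned later.
        self.telemetry.append((intent, adapter, len(text), len(reply)))
        return reply

router = LoRARouter(adapters={"general": "lora-base", "scheduling": "lora-sched-v4"})
fake_infer = lambda adapter, text: f"[{adapter}] ok: {text[:24]}"
print(router.handle("scheduling", "Book me for Tuesday at 3pm", fake_infer))
```

The point of the pattern is that selection happens outside the model: swapping adapters is a routing decision, so it adds no retraining and, on hardware that keeps adapters in SRAM, no latency.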
Unlike the general-purpose graphics processors typically used for AI inference, LPUs are optimized specifically for the sequential nature of language processing. “The LPU architecture


Beyond the keyword: How AI is forging the future of enterprise search

Presented by Salesforce

In the rapidly evolving landscape of artificial intelligence, the very definition of “search” is undergoing a profound transformation. No longer confined to simple keyword matching, enterprise search is shifting toward understanding and reasoning over data in a conversational interface, and ultimately, enabling autonomous AI agents to reshape how work gets done in an organization. This evolution — driven by innovations like vector search, knowledge graphs, and agentic reasoning — is reshaping how businesses access, understand, and act upon their vast troves of information.

Data challenge: Enabling AI agents to access enterprise-wide data

Today, organizations struggle to navigate their vast and fragmented data landscape. The data your organization gathers generally takes three forms: structured, semi-structured and unstructured.

Organizations produce enormous volumes of unstructured content — call transcripts, formal documents, Slack messages, and emails — that holds immense value but often goes underutilized. Leveraging this content is challenging due to inconsistent formats, poor data quality, and growing requirements around privacy and security. These challenges will only increase with the advent of interoperable AI agents, which must not only identify accurate information but also act on this data autonomously and securely while maintaining critical trust, privacy and compliance guardrails.

To be truly effective, AI agents need real-time access to comprehensive, accurate, and contextually rich information — especially about their customers. Often, they’re unable to identify the insights needed to solve customer problems or take proactive action. For instance, data about a customer’s loyalty history or family status may be buried across systems — blocking even simple autonomous actions, like sending a personalized push notification for a family-friendly resort deal.

When data is siloed, fragmented, or noisy, AI agents are forced to guess, resulting in unreliable outputs. This leaves organizations stuck in the “garbage in, garbage out” cycle that has plagued many CIOs. Simply put, bad data = bad AI.

The evolution of search: From keywords to meaning

Traditional search engines rely heavily on keywords. If a document doesn’t contain the exact phrase you’re looking for, you might miss crucial information. The first significant leap in AI-enabled search came with vector search. Where queries are often spoken or expressed in natural language, systems need to grasp the meaning behind the words. Vector search converts data and queries into numerical representations (vectors), allowing the system to match based on semantic similarity, not just literal word presence. This means a query like “customer sentiment about product XYZ” can find relevant documents even if they don’t explicitly use the word “sentiment” but discuss customer opinions, reviews, or feelings.

However, the complexity of enterprise data demands more. While vector search is a powerful initial step, the sheer variety of content formats and the need for deeper contextual understanding led to the rise of enriched indexing. Here, AI goes a step further, first understanding the data and building a graph-like ontology. Think of this as organizing messy, unstructured data (fact: 80% of enterprise data today is unstructured in nature) — documents, emails, presentations — into a structured network of who, what, where, when, and why.
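As a toy illustration of that enriched-indexing step, consider extracting who/what/when triples into a graph keyed by entity. Everything here (the documents, the triples, and the shortcut of hand-writing them rather than extracting them with ML models) is invented for illustration:

```python
# Toy sketch of enriched indexing: unstructured text distilled into
# (subject, relation, object) triples, stored in a graph keyed by entity.
from collections import defaultdict

docs = [
    "Acme Corp signed a renewal with Globex on 2024-03-01 in Austin.",
    "Globex filed a support ticket about product XYZ on 2024-03-04.",
]

# In a real pipeline an extraction model produces these; hand-written here.
triples = [
    ("Acme Corp", "signed_renewal_with", "Globex", 0),
    ("Globex", "filed_ticket_about", "product XYZ", 1),
]

graph = defaultdict(list)  # entity -> list of (relation, object, source_doc_id)
for subj, rel, obj, doc_id in triples:
    graph[subj].append((rel, obj, doc_id))

# A query for "Globex" now pulls connected facts plus their source documents,
# recovering the who/what/when context that keyword matching alone misses.
print(graph["Globex"])
```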
Such a “knowledge graph” provides the critical context that enhances the quality of search responses, allowing for more insightful and accurate results.

Bridging the divide: Unstructured, structured, and deep search

The enterprise doesn’t just deal with unstructured documents; a vast amount of critical information resides in structured databases. To truly unify the search experience, natural language to SQL (NL2SQL) technology comes into play. This innovation allows users to pose questions about structured data in plain English (e.g., “Show me sales figures for Q1 in California for product A”), and the AI system automatically translates them into SQL for data retrieval. This complements vector search, creating a holistic approach to querying both unstructured and structured information.

At Salesforce, we are heavily focused on optimizing Search and Retrieval Augmented Generation (RAG) in Data Cloud to enhance the performance and accuracy of gen AI applications, particularly for powering AI agents like Agentforce. Salesforce’s hybrid approach of combining vector search and keyword search addresses the limitations of either model alone — leading to more consistent and accurate results. Additionally, Salesforce is implementing methods to embed additional metadata into our documents and indexes. This allows AI models to access structured context before generating responses, helping to prevent the LLM from fabricating answers based on partial or ambiguous data.

Grounding LLMs to increase autonomous trustworthiness

Large Language Models (LLMs), the powerful engines driving generative AI, have revolutionized how we interact with technology. They can tackle complex questions, conjure original content, and even code with impressive fluency. Yet businesses quickly hit a wall: LLMs alone are limited by their training data, which is often static and doesn’t include an organization’s specific, real-time, or proprietary information.

This is precisely where Retrieval-Augmented Generation (RAG) becomes indispensable. RAG acts as the critical bridge, allowing companies to securely connect their unique, internal data directly to LLMs. This connection transforms AI’s potential for businesses, making responses not only more trustworthy and relevant but also up-to-the-minute accurate.

Imagine this: with RAG seamlessly linking an LLM to your internal knowledge base, an autonomous AI agent can instantly provide customer service responses that factor in a client’s entire interaction history, or generate marketing briefs perfectly aligned with the very latest brand guidelines and campaign performance data. It’s the difference between generic AI and an intelligent system deeply informed by your business’s living data.

To unlock unparalleled efficiency and success across your entire organization, you’ll need to bring together the power of LLMs, a cloud-based data engine, your CRM, and conversational AI through RAG. This potent combination will enable you to deploy a fleet of powerful agents, each informed and precisely tailored to the unique demands of every department — deeply integrated into your workflows, and constantly refreshing information to drive business outcomes.

The road ahead: Enterprise intelligence and autonomous agents

The ultimate vision is nothing short of enterprise intelligence powered by autonomous AI agents. Imagine AI agents within a company that can independently access and search across all enterprise information to


QwenLong-L1 solves long-context reasoning challenge that stumps current LLMs

Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This development could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The challenge of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human “slow thinking,” where they develop sophisticated strategies to tackle complex tasks.

However, these improvements are primarily seen when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the ability to perform multi-step analysis. “This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments,” the developers of QwenLong-L1 write in their paper.

The researchers formalize these challenges into the concept of “long-context reasoning RL.” Unlike short-context reasoning, which often relies on knowledge already stored within the model, long-context reasoning RL requires models to retrieve and ground relevant information from lengthy inputs accurately. Only then can they generate chains of reasoning based on this incorporated information.

Training models for this through RL is tricky and often results in inefficient learning and unstable optimization processes. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: A multi-stage approach

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process:

- Warm-up Supervised Fine-Tuning (SFT): The model first undergoes an SFT phase, where it is trained on examples of long-context reasoning. This stage establishes a solid foundation, enabling the model to ground information accurately from long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.
- Curriculum-Guided Phased RL: At this stage, the model is trained through multiple phases, with the target length of the input documents gradually increasing. This systematic, step-by-step approach helps the model stably adapt its reasoning strategies from shorter to progressively longer contexts. It avoids the instability often seen when models are abruptly trained on very long texts.
- Difficulty-Aware Retrospective Sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring the model continues to learn from the hardest problems.
This final stage prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths.

QwenLong-L1 process (Source: arXiv)

Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer in a math problem), QwenLong-L1 employs a hybrid reward mechanism. This combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with an “LLM-as-a-judge.” The judge model compares the semantic content of a generated answer with the ground truth, allowing for more flexibility and better handling of the diverse ways correct answers can be expressed when dealing with long, nuanced documents.

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where AI must understand dense documents to answer complex questions.

Experimental results across seven long-context DocQA benchmarks showed QwenLong-L1’s capabilities. Notably, the QwenLong-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic’s Claude-3.7 Sonnet Thinking, and outperformed models like OpenAI’s o3-mini and Qwen3-235B-A22B. The smaller QwenLong-L1-14B model also outperformed Google’s Gemini 2.0 Flash Thinking and Qwen3-32B. (Source: arXiv)

An important finding relevant to real-world applications is how RL training results in the model developing specialized long-context reasoning behaviors. The paper notes that models trained with QwenLong-L1 become better at “grounding” (linking answers to specific parts of a document), “subgoal setting” (breaking down complex questions), “backtracking” (recognizing and correcting their own mistakes mid-reasoning), and “verification” (double-checking their answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or get stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model demonstrated an ability to engage in effective self-reflection. It could successfully filter out distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques like QwenLong-L1 could significantly expand the utility of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research on annual reports and financial filings for risk assessment or investment opportunities) and customer service (analyzing long customer interaction histories to provide more informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.
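To make the hybrid reward mechanism described above concrete, here is a minimal sketch. The `llm_judge` callable is a hypothetical stand-in for a call to a judge model; this is not the released QwenLong-L1 code:

```python
import re

def rule_based_match(prediction: str, gold: str) -> bool:
    """Strict check: normalized exact match against the reference answer."""
    norm = lambda s: re.sub(r"\s+", " ", s.strip().lower())
    return norm(prediction) == norm(gold)

def hybrid_reward(prediction: str, gold: str, llm_judge) -> float:
    """Rule-based verification first; fall back to an LLM judge for answers
    that are worded differently but semantically equivalent."""
    if rule_based_match(prediction, gold):
        return 1.0
    return 1.0 if llm_judge(prediction, gold) else 0.0

# Usage with a trivially permissive stand-in judge:
lenient_judge = lambda pred, gold: gold.lower() in pred.lower()
print(hybrid_reward("The SLA is ten (10) days.", "10 days", lenient_judge))  # 1.0
```

The two-tier design is what makes the reward usable on long documents: the rule keeps rewards precise where exact answers exist, while the judge tolerates the many surface forms a correct long-context answer can take.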


Agent-based computing is outgrowing the web as we know it

We are on the cusp of a fundamental redesign of the internet. Not a facelift. A full-body transplant.

For more than 30 years, the web has been our playground, our workplace, our high street and our therapist’s couch. But it’s also been designed entirely for us simple humans who type, tap, click and scroll. Interfaces built for eyes. Navigation designed for fingers. Decision trees dressed up as websites.

But here’s the truth: We’re not going to be the web’s primary users for much longer.

AI agents based on ChatGPT, Copilot, Claude and Gemini are moving from passive assistants to active participants. Today, we ask them to do things for us. Tomorrow, we’ll authorize them to act as us. And right now, we’re asking Ferraris to drive on cobblestone.

AI is already trying to operate inside a human-shaped world. Clicking buttons. Dragging cursors. Filling out forms. It’s like putting a robot in a glove and telling it to pretend it’s got fingers. It works, for now, but it’s wildly inefficient.

Remember when cars first appeared on horse trails? Well, I don’t, but I know the story. It worked, barely, until someone realized speed requires tarmac. The same logic applies to the web. AI agents aren’t going to be just digital chauffeurs. They’re going to be drivers that navigate, decide and transact. Fast. Without us in the loop. We’re about to need a new web.

AI agents require machine-native design

What does the web look like when it’s built for machines? It’s fast. Invisible. Transactional. Pages become endpoints. Interfaces dissolve. There are no “click here” buttons. Just structured data, unstructured context, exposed capabilities and intent flowing between systems.

APIs will become the new storefronts. AI doesn’t need to read a product page or scroll through a review carousel. It needs to ask one question: “Is this the best option based on my user’s preferences, budget and priorities?” And it needs that answer instantly.

The entire architecture of the internet will bend toward AI-native interfaces. Faster protocols. Cleaner metadata. Verifiable sources. Trust becomes the currency, because AI can’t rely on vibes. Agents will assess source reliability, cross-check facts and learn from user outcomes. Reputation, structure and verification signals will matter more than design.

And suddenly, “user experience” takes on a different meaning. You’re not designing for a distracted shopper. You’re designing for a synthetic brain with infinite tabs open and zero tolerance for friction.

Two webs, one future

So what happens next? We may end up with two parallel webs. One for humans that remains visual, persuasive, slow. One for machines that is minimal, efficient, fast. But more likely, the future is layered. Every digital surface will need a machine-readable skin. Your website, your content, your commerce: if it’s not optimized for autonomous agents, it’s invisible.

This changes everything: SEO becomes MEO, machine experience optimization. Content becomes data. Brand trust becomes even more quantifiable and transparent. Influence shifts from design to accessibility, from layout to latency. Efficiency and trustworthiness become key website differentiators.

The brands that embrace this shift early, building AI-ready front doors rather than just pretty landing pages, will thrive. They will treat AI compatibility the way they once treated mobile optimization or security.
Because in five years, it won’t be a human clicking “buy now.” It’ll be your AI agent, acting on your behalf, making hundreds of decisions a day — not just purchases, but scheduling meetings, booking travel, screening content and negotiating services across every domain of digital life. And it won’t choose the prettiest site; it’ll choose the fastest, most reliable, most trustworthy, most machine-readable one.

Bottom line

We’re not just upgrading browsers. We’re rewriting the rules of the web. The old internet was built for people. The new one will be built for agents. And the companies that recognize this, and build infrastructure, content and interfaces accordingly, will most likely own the future.

Just like roads evolved for cars, the web will evolve for AI. And the next digital revolution? It’ll be executed in milliseconds, by machines, for machines, on a web designed for (and quite possibly by) them.

Justin Westcott leads the global technology sector for Edelman.


AI’s big interoperability moment: Why A2A and MCP are key for agent collaboration

Presented by Google Cloud

AI agents are approaching the kind of breakthrough moment that APIs had in the early 2010s. At that time, REST and JSON unlocked system-to-system integration at scale by simplifying what had been a tangle of SOAP, WSDL, and tightly coupled web services. That change didn’t just make developers more productive; it enabled entire business ecosystems built around modular software.

A similar shift is underway in artificial intelligence. As agents become more capable and specialized, enterprises are discovering that coordination is the next big challenge. Two open protocols — Agent-to-Agent (A2A) and Model Context Protocol (MCP) — are emerging to meet that need. They simplify how agents share tasks, exchange information, and access enterprise context, even when they were built using different models or tools.

These protocols are more than technical conveniences. They are foundational to scaling intelligent software across real-world workflows. AI systems are moving beyond general-purpose copilots. In practice, most enterprises are designing agents to specialize: managing inventory, handling returns, optimizing routes, or processing approvals. Value comes not only from their intelligence, but from how these agents work together.

A2A provides the mechanism for agents to interact across systems. It allows agents to advertise their capabilities, discover others, and send structured requests. Built on JSON-RPC and OpenAPI-style authentication, A2A supports stateless communication between agents, making it simpler and more secure to run multi-agent workflows at scale.

MCP is another protocol that is empowering AI agents with seamless access to essential tools, comprehensive data, and relevant context. It provides a standardized framework for connecting to diverse enterprise systems. Once an MCP server is established by a service provider, its full functionality becomes universally accessible to all agents, enabling more intelligent and coordinated actions across the ecosystem.

These protocols don’t require organizations to build or glue systems together manually. They make it possible to adopt a shared foundation for AI collaboration that works across the ecosystem.

Why it’s gaining traction quickly

Google Cloud initiated A2A as an open standard and published its draft in the open, encouraging contributions from across the industry. More than 50 partners have participated in its evolution, including Salesforce, Deloitte, and UiPath. Microsoft now supports A2A in Azure AI Foundry and Copilot Studio; SAP has integrated A2A into its Joule assistant.

Other examples are emerging across the ecosystem. Zoom is using A2A to facilitate cross-agent interactions in its open platform. Box and Auth0 are demonstrating how enterprise authentication can be handled across agents using standardized identity flows. This kind of participation is helping the protocol mature quickly, both in specification and in tooling. The Python A2A SDK is stable and production-ready. Google Cloud has also released the Java Agent Development Kit to broaden support for enterprise development teams. Renault Group is among the early adopters already deploying these tools.

Multi-agent workflows unlock new enterprise use cases

The transition from standalone agents to coordinated systems is already underway. Imagine a scenario where a customer service agent receives a request. It uses A2A to check with an inventory agent about product availability. It then consults a logistics agent to recommend a shipping timeline. A request of this kind might look like the sketch below.
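For instance, the inventory check might be carried as a JSON-RPC 2.0 call over HTTP. The method name, payload fields, and endpoint URL below are illustrative of the JSON-RPC shape A2A builds on, not quoted from the specification:

```python
import json
import urllib.request

# Illustrative A2A-style exchange: one agent sends a structured JSON-RPC
# request to another agent's endpoint. Field names and URL are assumptions.
request_body = {
    "jsonrpc": "2.0",
    "id": "req-001",
    "method": "message/send",
    "params": {
        "message": {
            "role": "user",
            "parts": [{"type": "text",
                       "text": "Check availability of SKU 12345 for zip 94105"}],
        }
    },
}

req = urllib.request.Request(
    "https://inventory-agent.example.com/a2a",   # hypothetical agent endpoint
    data=json.dumps(request_body).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # returns a JSON-RPC result or error
```

Because the envelope is plain JSON-RPC, the calling agent needs no knowledge of how the inventory agent is built, only its advertised capabilities and endpoint.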
If needed, the workflow loops in a finance agent to issue a refund. Each of these agents may be built using different models, toolkits, or platforms — but they can interoperate through A2A and MCP.

In more advanced settings, this pattern enables use cases like live operations management. For example, an AI agent monitoring video streams at a theme park could coordinate with operations agents to adjust staff allocation based on real-time crowd conditions. Video, sensor, and ticketing data can be made available through tools like BigLake metastore and accessed by agents through MCP. Decisions are made and executed across agents, with minimal need for human orchestration.

Architecturally, this is a new abstraction layer

MCP and A2A represent more than messaging protocols. They are part of a broader shift toward clean, open abstractions in enterprise software. These agent protocols decouple intelligence from integration. With MCP, developers don’t need to hand-code API access for every data source. With A2A, they don’t need to maintain brittle logic for how agents interact. The result is a more maintainable, secure, and portable approach to building intelligent multi-agent systems — one that scales across business units and platforms.

Google Cloud’s investment in open agent standards

Google Cloud’s contributions to the ecosystem are both foundational and practical. We are working with Anthropic on MCP, and we have released A2A as an open specification and backed both with production-grade tooling. These protocols are deeply integrated into our AI platforms, including Vertex AI, where multi-agent workflows can be developed and managed directly. It is great to see other cloud providers embracing the MCP and A2A standards.

By releasing the Agent Development Kit for both Python and Java, and by making these components modular and extensible, Google Cloud is enabling teams to adopt these standards without needing to reinvent infrastructure. The Agent Development Kit now also features built-in tools to access data in BigQuery, making it easy to build your own agents backed by your enterprise data. We are committed to enabling you to access BigQuery, AlloyDB, and other GCP data services via the MCP and A2A protocols. You can get started by using MCP Toolbox for Databases today and expose your database queries as MCP tools. We are continuously adding more tools via MCP to enable developers to build even more sophisticated agents using the native capabilities of BigQuery.

Why this is worth tracking closely

For organizations investing in AI agents today, interoperability is going to matter more with each passing quarter. Systems built around isolated agents will struggle to scale; systems built on shared protocols will be more agile, collaborative, and future-proof.

This transition echoes the rise of APIs in the last decade. REST and JSON didn’t just improve efficiency; they became the foundation of modern cloud applications. MCP and A2A are poised to do the same for AI agents. Adopting these protocols doesn’t require a full system rebuild. The

AI’s big interoperability moment: Why A2A and MCP are key for agent collaboration

How much information do LLMs really memorize? Now we know, thanks to Meta, Google, Nvidia and Cornell

Most people interested in generative AI likely already know that Large Language Models (LLMs) — like those behind ChatGPT, Anthropic’s Claude, and Google’s Gemini — are trained on massive datasets: trillions of words pulled from websites, books, codebases, and, increasingly, other media such as images, audio, and video. But why?

From this data, LLMs develop a statistical, generalized understanding of language, its patterns, and the world — encoded in the form of billions of parameters, or “settings,” in a network of artificial neurons (mathematical functions that transform input data into output signals). By being exposed to all this training data, LLMs learn to detect and generalize patterns, which are reflected in the parameters of their neurons.

For instance, the word “apple” often appears near terms related to food, fruit, or trees, and sometimes computers. The model picks up that apples can be red, green, or yellow (or occasionally other colors, if rotten or rare), that “apple” is spelled “a-p-p-l-e” in English, and that apples are edible. This statistical knowledge influences how the model responds when a user enters a prompt — shaping the output it generates based on the associations it “learned” from the training data.

But a big question — even among AI researchers — remains: how much of an LLM’s training data is used to build generalized representations of concepts, and how much is instead memorized verbatim, stored in a way that is identical or nearly identical to the original data?

This matters not only for understanding how LLMs operate — and when they go wrong — but also as model providers defend themselves in copyright infringement lawsuits brought by data creators and owners, such as artists and record labels. If LLMs are shown to reproduce significant portions of their training data verbatim, courts could be more likely to side with plaintiffs arguing that the models unlawfully copied protected material. If not — if the models are found to generate outputs based on generalized patterns rather than exact replication — developers may be able to continue scraping and training on copyrighted data under existing legal defenses such as fair use.

Now we finally have an answer to the question of how much LLMs memorize versus generalize: a new study released this week by researchers at Meta, Google DeepMind, Cornell University, and NVIDIA finds that GPT-style models have a fixed memorization capacity of approximately 3.6 bits per parameter.

To understand what 3.6 bits means in practice: a single bit is the smallest unit of digital data, representing either a 0 or a 1, and eight bits make up one byte. Storing 3.6 bits allows for approximately 12.13 distinct values, as calculated by 2^3.6. This is about the amount of information needed to choose one of 12 options — similar to selecting a month of the year or the outcome of a roll of a 12-sided die. It is not enough to store even one letter of the full English alphabet (which needs about 4.7 bits), but it is just enough to encode a character from a reduced set of 10 common English letters (which requires about 3.32 bits). In bytes, 3.6 bits is 0.45 bytes — less than half the size of a typical character stored in ASCII (which uses 8 bits, or 1 byte).

This number is model-independent within reasonable architectural variations: different depths, widths, and precisions produced similar results. The estimate held steady across model sizes and even precision levels, with full-precision models reaching slightly higher values (up to 3.83 bits per parameter). The unit conversions are easy to verify, as in the snippet below.
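A quick sanity check of those numbers in Python, using nothing beyond the standard library:

```python
import math

bits_per_param = 3.6

# Number of distinct values 3.6 bits can represent: 2 ** 3.6
print(2 ** bits_per_param)   # ~12.13

# Bits needed for one letter of the full 26-letter English alphabet
print(math.log2(26))         # ~4.70 -> 3.6 bits is not enough

# Bits needed for a character from a reduced set of 10 common letters
print(math.log2(10))         # ~3.32 -> 3.6 bits suffices

# 3.6 bits expressed in bytes (8 bits per byte)
print(bits_per_param / 8)    # 0.45
```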
More training data DOES NOT lead to more memorization — in fact, a model will be less likely to memorize any single data point

One key takeaway from the research is that models do not memorize more when trained on more data. Instead, a model’s fixed capacity is distributed across the dataset, meaning each individual data point receives a smaller share of it. Jack Morris, the lead author, explained via the social network X that “training on more data will force models to memorize less per-sample.”

These findings may help ease concerns around large models memorizing copyrighted or sensitive content. If memorization is limited and diluted across many examples, the likelihood of reproducing any one specific training example decreases. In essence, more training data leads to safer generalization behavior, not increased risk.

How the researchers identified these findings

To precisely quantify how much language models memorize, the researchers used an unconventional but powerful approach: they trained transformer models on datasets composed of uniformly random bitstrings. Each of these bitstrings was sampled independently, ensuring that no patterns, structure, or redundancy existed across examples. Because each sample is unique and devoid of shared features, any ability the model shows in reconstructing or identifying these strings during evaluation directly reflects how much information it retained — or memorized — during training.

The key reason for this setup was to eliminate the possibility of generalization entirely. Unlike natural language — which is full of grammatical structure, semantic overlap, and repeating concepts — uniform random data contains no such information. Every example is essentially noise, with no statistical relationship to any other. In such a scenario, any performance by the model on test data must come purely from memorization of the training examples, since there is no distributional pattern to generalize from.

The authors argue their method is perhaps one of the only principled ways to decouple memorization from learning in practice: when LLMs are trained on real language, even an output that exactly matches the training data leaves it unclear whether the model memorized the input or merely inferred the underlying structure from the patterns it observed.

This method allowed the researchers to map a direct relationship between the number of model parameters and the total information stored. By gradually increasing model size and training each variant to saturation, across hundreds of experiments on models ranging from 500K to 1.5 billion parameters, they observed a consistent result: 3.6 bits memorized per parameter. A simplified version of the capacity arithmetic is sketched below.
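The following is a back-of-the-envelope sketch of the dilution logic described above, not the paper’s actual estimator: it assumes total memorization capacity scales at roughly 3.6 bits per parameter and shows how that fixed budget shrinks per sample as the training set grows.

```python
# Simplified capacity arithmetic under the paper's ~3.6 bits/parameter finding.
BITS_PER_PARAM = 3.6

def capacity_bits(n_params: int) -> float:
    """Approximate total memorization capacity of a model, in bits."""
    return BITS_PER_PARAM * n_params

def bits_per_sample(n_params: int, n_samples: int) -> float:
    """Fixed capacity spread across the dataset: more data, less per sample."""
    return capacity_bits(n_params) / n_samples

model = 1_500_000_000  # 1.5B parameters, the largest size in the study
for n_samples in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{n_samples:>14,} samples -> "
          f"{bits_per_sample(model, n_samples):10.2f} bits per sample")
```

At one million training samples the hypothetical budget works out to thousands of bits per sample; at ten billion samples it falls below a single bit, which is the intuition behind “training on more data will force models to memorize less per-sample.”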
