VentureBeat

The great cognitive migration: How AI is reshaping human purpose, work and meaning

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Humans have always migrated to survive. When glaciers advanced, when rivers dried up, when cities fell, people moved. Their journeys were often painful, but necessary, whether across deserts, mountains or oceans. Today, we are entering a new kind of migration — not across geography but across cognition. AI is reshaping the cognitive landscape faster than any technology before it. In the last two years, large language models (LLMs) have achieved PhD-level performance across many domains. It is reshaping our mental map much like an earthquake can upset the physical landscape. The rapidity of this change has led to a seemingly watchful inaction: We know a migration is coming soon, but we are unable to imagine exactly how or when it will unfold. But, make no mistake, the early stage of a staggering transformation is underway. Tasks once reserved for educated professionals (including authoring essays, composing music, drafting legal contracts and diagnosing illnesses), are now performed by machines at breathtaking speed. Not only that, but the latest AI systems can make fine-grained inferences and connections long thought to require unique human insight, further accelerating the need for migration. For example, in a New Yorker essay, Princeton history of science professor Graham Burnett marveled at how Google’s NotebookLM made an unexpected and illuminating link between theories from Enlightenment philosophy and a modern TV advertisement.  As AI grows more capable, humans will need to embrace new domains of meaning and value in areas where machines still falter, and where human creativity, ethical reasoning, emotional resonance and the weaving of generational meaning remain indispensable. This “cognitive migration” will define the future of work, education and culture, and those who recognize and prepare for it will shape the next chapter of human history. Where machines advance, humans must move Like climate migrants who must leave their familiar surroundings due to rising tides or growing heat, cognitive migrants will need to find new terrain where their contributions can have value. But where and how exactly will we do this?  Moravec’s Paradox provides some insight. This phenomenon is named for Austrian scientist Hans Moravec, who observed in the 1980s that tasks humans find difficult are easy for a computer, and vice-versa. Or, as computer scientist and futurist Kai-Fu Lee has said: “Let us choose to let machines be machines, and let humans be humans.” Moravec’s insight provides us with an important clue. People excel at tasks that are intuitive, emotional and deeply tied to embodied experience, areas where machines still falter. Successfully navigating through a crowded street, recognizing sarcasm in conversation and intuiting that a painting feels melancholy are all feats of perception and judgment that millions of years of evolution have etched deep into human nature. In contrast, machines that can ace a logic puzzle or summarize a thousand-page novel often stumble at tasks we consider second nature. The human domains AI cannot yet reach As AI rapidly advances, the safe terrain for human endeavor will migrate toward creativity, ethical reasoning, emotional connection and the weaving of deep meaning. 
The work of humans in the not-too-distant future will increasingly demand uniquely human strengths, including the cultivation of insight, imagination, empathy and moral wisdom. Like climate migrants seeking new fertile ground, cognitive migrants must chart a course toward these distinctly human domains, even as the old landscapes of labor and learning shift under our feet. Not every job will be swept away by AI. Unlike geographical migrations which might have clearer starting points, cognitive migration will unfold gradually at first, and unevenly across different sectors and regions. The diffusion of AI technologies and its impact may take a decade or two.  Many roles that rely on human presence, intuition and relationship-building may be less affected, at least in the near term. These roles include a range of skilled professions from nurses to electricians and frontline service workers. These roles often require nuanced judgment, embodied awareness and trust, which are human attributes for which machines are not always suited.  Cognitive migration, then, will not be universal. But the broader shift in how we assign value and purpose to human work will still ripple outward. Even those whose tasks remain stable may find their work and meaning reshaped by a world in flux. Some promote the idea that AI will unlock a world of abundance where work becomes optional, creativity flourishes and society thrives on digital productivity. Perhaps that future will come. But we cannot ignore the monumental transition it will require. Jobs will change faster than many people can realistically adapt. Institutions, built for stability, will inevitably lag. Purpose will erode before it is reimagined. If abundance is the promised land, then cognitive migration is the required, if uncertain, journey to reach it.  The uneven road ahead Just as in climate migration, not everyone will move easily or equally. Our schools are still training students for a world that is vanishing, not the one that is emerging. Many organizations cling to efficiency metrics that reward repeatable output, the very thing AI can now outperform us on. And far too many individuals will be left wondering where their sense of purpose fits in a world where machines can do what they once proudly did. Human purpose and meaning are likely to undergo significant upheaval.  For centuries, we have defined ourselves by our ability to think, reason and create. Now, as machines take on more of those functions, the questions of our place and value become unavoidable. If AI-driven job losses occur on a large scale without a commensurate ability for people to find new forms of meaningful work, the psychological and social consequences could be profound. It is possible that some cognitive migrants could slip into despair. AI scientist Geoffrey Hinton, who won the 2024 Nobel Prize in physics for his groundbreaking work on deep learning neural networks that underpin LLMs, has warned in recent years about the potential

The great cognitive migration: How AI is reshaping human purpose, work and meaning Read More »

Anthropic launches Claude web search API, betting on the future of post-Google information access

Anthropic has introduced a web search capability for its Claude AI assistant, intensifying competition in the rapidly evolving AI search market where tech giants are racing to redefine how users find information online. The company announced today that developers can now enable Claude to access current web information through its API, allowing the AI assistant to conduct multiple progressive searches to compile comprehensive answers complete with source citations. The move comes as web search undergoes its most significant transformation since Google revolutionized the field more than two decades ago.

“Developers can now augment Claude’s comprehensive knowledge with current, real-world data by enabling the web search tool when making requests to the Messages API,” Anthropic said in its announcement.

The new capability arrives amid signs that traditional search is losing ground to AI-powered alternatives. Apple’s senior vice president of services, Eddy Cue, testified today in Google’s antitrust trial that searches in Safari fell last month for the first time in the browser’s 22-year history. “I’ve lost a lot of sleep thinking about it,” Cue said regarding potential revenue loss from Google’s estimated $20 billion payment to be Safari’s default search engine.

AI assistants are eating Google’s lunch: The decline of traditional search engine dominance

The data points to a seismic shift in information discovery patterns. SOCi’s Consumer Behavior Index shows that 19% of consumers already use AI for search, creating the first meaningful challenge to Google’s stranglehold on web information access in decades. This transformation stems from fundamental differences in how AI assistants process and present information. Unlike traditional search engines that display a list of links requiring users to sift through results, AI assistants synthesize information from multiple sources, delivering concise, contextual answers. This eliminates the cognitive load of evaluating numerous websites and extracts the precise information users seek.

The timing of Anthropic’s announcement is particularly significant. With Safari searches declining for the first time ever — a metric Cue called unprecedented in his testimony — we’re witnessing early indicators of a mass consumer behavior shift. Traditional search engines optimized for advertising revenue are increasingly being bypassed in favor of conversation-based interactions that prioritize information quality over commercial interests.

Under the hood: How Anthropic’s API transforms information retrieval for developers

Anthropic’s technical approach represents a significant advance in how AI systems can be deployed as information gathering tools. The system employs a sophisticated decision-making layer that determines when external information would improve response quality, generating targeted search queries rather than simply passing user questions verbatim to a search backend. This “agentic” capability — allowing Claude to conduct multiple progressive searches using earlier results to inform subsequent queries — enables a more thorough research process than traditional search.
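For developers who want to try this, the request shape is straightforward. The sketch below shows roughly what enabling the tool from Anthropic’s Python SDK could look like; the tool type string, model id and parameter values are illustrative assumptions rather than details confirmed in the announcement, and the max_uses cap is discussed further below.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # illustrative model id
    max_tokens=1024,
    messages=[{"role": "user", "content": "What changed in EU AI regulation this week? Cite sources."}],
    tools=[{
        "type": "web_search_20250305",  # assumed tool type identifier; check Anthropic's docs
        "name": "web_search",
        "max_uses": 3,  # cap on sequential searches, per the parameter described below
    }],
)
print(response.content)

In this pattern, Claude decides whether and how often to search, then returns a cited answer alongside the results it used.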
The implementation essentially mimics how a human researcher might explore a topic, starting with general queries and progressively refining them based on initial findings. For developers, the API offers granular control through the max_uses parameter that limits how many sequential searches Claude can perform. This addresses both cost considerations and prevents the AI from falling into research rabbit holes. The domain control features provide crucial guardrails for enterprise deployments, allowing organizations to ensure information comes only from trusted sources. At $10 per 1,000 searches plus standard token costs, Anthropic has positioned the feature as a premium offering, suggesting confidence that the value proposition justifies the price point compared to direct integration with free search engines. AI search wars heat up: Big tech scrambles to redefine digital information access The competitive landscape around AI search has become increasingly crowded and contentious. OpenAI integrated web search into ChatGPT last fall and recently expanded with shopping features, leveraging its massive user base — 800 million weekly active users, according to OpenAI CEO Sam Altman — to challenge Google’s commercial search dominance. Apple’s potential pivot represents perhaps the most significant threat to the status quo. With Safari accounting for 17.25% of global browser usage and 22.32% on mobile devices according to the most current data from StatCounter, any move to replace Google with AI search alternatives would dramatically alter market dynamics. Apple’s discussions with Perplexity AI, OpenAI, and Anthropic suggest the company is actively exploring multiple partnerships rather than developing its own search technology. Meanwhile, specialized players like Perplexity AI are making strategic moves to embed themselves directly in the hardware ecosystem. Its partnership with Motorola represents an early attempt to position AI search as a device-level feature rather than merely an app or website. Google’s reported resistance to this partnership, revealed during antitrust proceedings, indicates the search giant recognizes the existential threat these new models pose. The court-mandated unwinding of Google’s Safari search deal — potentially eliminating up to $20 billion in annual payments to Apple — may accelerate this transition by removing the financial incentive for Apple to maintain the status quo. Content economy disruption: Publishers face existential threat as AI bypasses traditional websites The shift to AI search presents profound challenges for the content economy that has evolved around traditional search engines. When AI assistants provide direct answers synthesized from multiple sources, they dramatically reduce click-through to original content sites. This threatens the advertising-based business model that sustains much of the internet’s information ecosystem. Google Search vice president Pandu Nayak’s inability to offer “any guarantees” about traffic recovery highlights the gravity of this situation. Content creators face a future where being cited by AI may become more important than appearing in search results — yet citations alone don’t generate the advertising revenue that sustains content creation. This creates a potentially unsustainable dynamic: AI systems rely on high-quality web content to generate responses, but by redirecting user attention away from source websites, they undermine the economic model that produces that content. 
Without a new compensation mechanism for content creators, the long-term

Anthropic launches Claude web search API, betting on the future of post-Google information access Read More »

Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-V2 on Hugging Face

Nvidia has become one of the most valuable companies in the world in recent years as the stock market has taken note of the enormous demand for its graphics processing units (GPUs), the powerful chips used to render graphics in video games and, increasingly, to train large language and diffusion AI models. But Nvidia does far more, of course, than just make the hardware and the software to run it. As the generative AI era wears on, the Santa Clara-based company has also been steadily releasing more and more of its own AI models — mostly open source and free for researchers and developers to take, download, modify and use commercially — and the latest among them is Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can, in the words of Hugging Face’s Vaibhav “VB” Srivastav, “transcribe 60 minutes of audio in 1 second [mind blown emoji].”

This is the new generation of the Parakeet model Nvidia first unveiled back in January 2024 and updated again in April of that year, but this version two is so powerful, it currently tops the Hugging Face Open ASR Leaderboard with an average word error rate (WER, the share of spoken words the model transcribes incorrectly) of just 6.05%. To put that in perspective, it nears proprietary transcription models such as OpenAI’s GPT-4o-transcribe (with a WER of 2.46% in English) and ElevenLabs Scribe (3.3%). And it’s offering all this while remaining freely available under a commercially permissive Creative Commons CC-BY-4.0 license, making it an attractive proposition for commercial enterprises and indie developers looking to build speech recognition and transcription services into their paid applications.

Performance and benchmark standing

The model boasts 600 million parameters and leverages a combination of the FastConformer encoder and TDT decoder architectures. It is capable of transcribing an hour of audio in just one second, provided it’s running on Nvidia’s GPU-accelerated hardware. The performance benchmark is measured at an RTFx (inverse real-time factor) of 3386.02 with a batch size of 128, placing it at the top of current ASR benchmarks maintained by Hugging Face.

Use cases and availability

Released globally on May 1, 2025, Parakeet-TDT-0.6B-v2 is aimed at developers, researchers, and industry teams building applications such as transcription services, voice assistants, subtitle generators, and conversational AI platforms. The model supports punctuation, capitalization, and detailed word-level timestamping, offering a full transcription package for a wide range of speech-to-text needs.

Access and deployment

Developers can deploy the model using Nvidia’s NeMo toolkit. The setup process is compatible with Python and PyTorch, and the model can be used directly or fine-tuned for domain-specific tasks. The open-source license (CC-BY-4.0) also allows for commercial use, making it appealing to startups and enterprises alike.

Training data and model development

Parakeet-TDT-0.6B-v2 was trained on a diverse and large-scale corpus called the Granary dataset. This includes around 120,000 hours of English audio, composed of 10,000 hours of high-quality human-transcribed data and 110,000 hours of pseudo-labeled speech. Sources range from well-known datasets like LibriSpeech and Mozilla Common Voice to YouTube-Commons and Librilight.
Nvidia plans to make the Granary dataset publicly available following its presentation at Interspeech 2025.

Evaluation and robustness

The model was evaluated across multiple English-language ASR benchmarks, including AMI, Earnings22, GigaSpeech, and SPGISpeech, and showed strong generalization performance. It remains robust under varied noise conditions and performs well even with telephony-style audio formats, with only modest degradation at lower signal-to-noise ratios.

Hardware compatibility and efficiency

Parakeet-TDT-0.6B-v2 is optimized for Nvidia GPU environments, supporting hardware such as the A100, H100, T4, and V100 boards. While high-end GPUs maximize performance, the model can still be loaded on systems with as little as 2GB of RAM, allowing for broader deployment scenarios.

Ethical considerations and responsible use

Nvidia notes that the model was developed without the use of personal data and adheres to its responsible AI framework. Although no specific measures were taken to mitigate demographic bias, the model passed internal quality standards and includes detailed documentation on its training process, dataset provenance, and privacy compliance. The release drew attention from the machine learning and open-source communities, especially after being publicly highlighted on social media. Commentators noted the model’s ability to outperform commercial ASR alternatives while remaining fully open source and commercially usable. Developers interested in trying the model can access it via Hugging Face or through Nvidia’s NeMo toolkit. Installation instructions, demo scripts, and integration guidance are readily available to facilitate experimentation and deployment.
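To make the NeMo deployment path above concrete, here is a minimal transcription sketch; the Hugging Face model id and the exact call signature are assumptions to verify against Nvidia’s model card.

# Assumes the NeMo toolkit is installed, e.g. pip install -U "nemo_toolkit[asr]"
import nemo.collections.asr as nemo_asr

# Model id as it appears on Hugging Face (assumption; check the model card).
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")

# Transcribe a local audio file; per the article, output includes punctuation,
# capitalization and word-level timestamps when requested.
outputs = asr_model.transcribe(["meeting_recording.wav"], timestamps=True)
print(outputs[0].text)

The headline one-hour-of-audio-in-one-second throughput cited above assumes Nvidia GPU acceleration; on modest hardware the model still loads, but transcription will be slower.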

Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-V2 on Hugging Face Read More »

Korl launches platform orchestrating AI agents from OpenAI, Gemini and Anthropic to hyper-customize customer messaging

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More It’s a conundrum: Customer teams have more data than they can ever begin to use—from Salesforce notes, Jira tickets, project dashboards and Google Docs—but they struggle to combine it all when crafting customer messaging that really resonates.  Existing tools often rely on generic templates or slides and fail to provide a complete picture of customer journeys, roadmaps, project goals and business objectives.  Korl, a startup launched today, hopes to overcome these challenges with a new platform that works across multiple systems to help create highly customized communications. The multi-agent, multimodal tool uses a mix of models from OpenAI, Gemini, and Anthropic to source and contextualize data.  “Engineers have powerful AI tools, but customer-facing teams are stuck with shallow, disconnected solutions,” Berit Hoffmann, CEO and co-founder of Korl, told VentureBeat in an exclusive interview. “Korl’s core innovation is rooted in our advanced multi-agent pipelines designed to build the customer and product context that generic presentation tools lack.”  Creating tailored customer materials through a multi-source view Korl’s AI agents aggregate information from across different systems — such as engineering documentation from Jira, outlines from Google Docs, designs from Figma, and project data from Salesforce — to build a multi-source view.  For instance, once a customer connects Korl to Jira, its agent studies existing and planned product capabilities to figure out how to map data and import new product capabilities, Hoffmann explained. The platform matches product data with customer information—such as usage history, business priorities and lifecycle stage—filling in gaps with AI.  “Korl’s data agents automatically gather, enrich, and structure diverse datasets from internal sources and external public data,” said Hoffmann.  The platform then automatically generates personalized quarterly business reviews (QBRs), renewal pitches, tailored presentations and other materials for use in important customer milestones.  Hoffmann said the company’s core differentiator is its ability to deliver “polished, customer-ready materials” such as slides, narratives and emails, “rather than merely analytics or raw insights.” “We think this delivers a level of operational value that customer-facing teams need today given the pressures to do more with less,” she said.  Switching between OpenAI, Gemini, Anthropic, based on performance Korl orchestrates an “ensemble of models” across OpenAI, Gemini and Anthropic, selecting the best model for the job at the time based on speed, accuracy and cost, Hoffmann explained. Korl needs to perform complex, diverse tasks — nuanced narratives, data computation, visuals — so each use case is matched with the most performant model. The company has implemented “sophisticated fallback mechanisms” to mitigate failures; early on, they observed high failure rates when relying on a single provider, Hoffman reported. The startup developed a proprietary auto-mapper fine-tuned to handle diverse enterprise data schemas across Jira, Salesforce and other systems. The platform automatically maps to relevant fields in Korl.  “Rather than just semantic or field-name matching, our approach evaluates additional factors like data sparsity to score and predict field matches,” said Hoffmann.  
To speed the process, Korl combines low-latency, high-throughput models (such as GPT-4o for rapid, context-building responses) with deeper analytical models (Claude 3.7 for more complex, customer-facing communications).  “This ensures that we optimize for the best end user experience, making context-driven tradeoffs between immediacy and accuracy,” Hoffmann explained.  Because “security is paramount,” Korl seeks enterprise-grade privacy guarantees from vendors to ensure customer data is excluded from training datasets. Hoffmann pointed out that its multi-vendor orchestration and contextual prompting further limit inadvertent exposure and data leaks. Grappling with data that is ‘too messy’ or ‘incomplete’ Hoffman noted that, early on, Korl heard from customers that they worried their data would be “too messy” or “incomplete” to be put to good use. In response, the company built pipelines to understand business object relationships and fill in gaps — such as how to position features externally, or how to align values around desired outcomes.  “Our presentation agent is what leverages that data to generate customer slides and talk track [guide conversations with potential customers or leads] dynamically when needed,” said Hoffmann.  She also said Korl features “true multimodality.” The platform isn’t just pulling data from various sources; it’s interpreting different types of information such as text, structured data, charts or diagrams.  “The critical step is moving beyond the raw data to answer: What story does this graph tell? What are the deeper implications here, and will they actually resonate with this specific customer?,” she said. “We’ve built our process to perform that crucial due diligence, ensuring the output isn’t just aggregated data, but genuinely rich content delivered with meaningful context.” Two of Korl’s close competitors include Gainsight and Clari; however, Hoffmann said Korl differentiates itself by incorporating deep product and roadmap context. Effective customer renewal and expansion strategies require a deep understanding of what a product does, and this should be coupled with an analysis of customer data and behavior. Further, Hoffmann said Korl addresses two “foundational shortcomings” of existing platforms: deep business context and brand accuracy. Korl’s agents gather business context from multiple systems. “Without this comprehensive data intelligence, automated decks lack strategic business value,” she said.  When it comes to branding, Korl’s proprietary technology extracts and replicates guidelines from existing materials. Reducing deck prep time from ‘multiple hours to minutes’  Early indications suggest Korl can unlock at least a 1-point improvement in net revenue retention (NRR) for mid-market software companies, said Hoffmann. This is because it uncovers previously unrealized product value and makes it easy to communicate that to customers before they churn or make renewal or expansion decisions.  The platform also improves efficiency, reducing deck preparation time for each customer call from “multiple hours to minutes,” according to Hoffman.  Early customers include skills-building platform Datacamp and gifting and direct mail company Sendoso.  “They tackle a critical and overlooked challenge: Too often, product features are released while go-to-market (GTM) teams are not prepared to sell, support or communicate them effectively,” said Amir Younes, Sendoso’s chief customer officer. “With Korl’s AI, [go-to-market] GTM enablement and asset

Korl launches platform orchestrating AI agents from OpenAI, Gemini and Anthropic to hyper-customize customer messaging Read More »

Business leaders are losing trust in their data — agentic analytics promises a fix

Presented by Salesforce Data-driven decision-making isn’t just best practice; it’s a survival imperative. Business leaders are under immense pressure to back their arguments with data – 76% feel this acutely, according to a Salesforce survey. And while the volume of raw business data continues to mount, leaders’ confidence in using their data for decision-making has dropped significantly, down 18% from 2023, to less than half of leaders overall. This uncertainty is stifling executives’ ability to navigate today’s uncertain times. “Most executives don’t have their own data analysts on call,” says Southard Jones, chief product officer of Tableau. “They also don’t have the training they need to be really confident that they and their team are using the right data to help make the right decisions, especially as these decisions become more involved and more complex.” The solution lies in agentic analytics, the next evolution of business intelligence (BI). With agentic analytics, any business user – regardless of how data savvy they are – can collaborate with autonomous AI agents to automate repetitive, manual tasks like data preparation, and enable AI-powered insights and recommended actions delivered proactively into their preferred workflow. Bridging the trust gap with agentic AI Business leaders often leave valuable data on the table because it’s too intimidating, complex or time-consuming to dig into. AI agents are the key to bridging this data-to-insight gap. Solutions like Tableau Next, Salesforce’s agentic analytics solution, proactively identify patterns and anomalies in the data, and with business metrics that users might not think to ask about. Through a native integration with Agentforce, Salesforce’s digital labor platform, Tableau Next leverages AI agents to deliver insights in natural language, within a company’s daily workflow through any app on the platform — even without a specific inquiry. That’s important because one-third of business leaders say they don’t even know what questions to ask their data, with execs and VPs feeling particularly adrift. Agentic analytics, running quietly and autonomously in the background, solves that problem, surfacing key information a leader needs to know about their business. And that’s how to rebuild trust between a business leader and the data they rely on, Jones says. He likens it to navigating with a mapping app. There’s no need to request continuous updates on better routes or potential traffic slowdowns, because the app has your back, reasoning over your data and keeping tabs at all times. “When AI agents are running behind the scenes, it should be able to tell you whenever something critical is happening in your business,” he says. “That’s changing the game, democratizing data access.” When agentic analytics is working behind the scenes, it’s also dramatically speeding up time to action, bringing recommendations along with its insights. For instance, Tableau Next features a skill — a task or job that an AI agent can perform — to execute data inspections. The Agentforce Inspector continuously tracks a company’s data for key changes, analyzes trends and predicts improvements to address concerns. For example, it can proactively notify a business about an increase in bugs found in a new product. Or, after noting a sudden sales decline, recommend launching a targeted marketing campaign to specific customer segments. 
In addition to proactively surfacing insights, with Tableau Next’s pre-built Agentforce Concierge skill, users can write a question in their own words and get both a written insight and an interactive data visualization in response, making it easy to understand, as quickly as possible. This moves beyond static dashboards many executives say aren’t useful. As Jones sees it, “if you ask most business executives today, they’d probably tell you they have too many dashboards. The disconnect is that a dashboard was probably created one month ago to answer a business question that’s no longer relevant.” This dynamic, conversational approach allows for immediate answers that keep pace with evolving business needs, unlike static dashboards that quickly become outdated. Tableau Next also eliminates the friction that comes when a user switches over to a dashboard in the middle of a workflow. Instead, agents bring insights to people where they work, whether that’s Slack or Teams, email, Salesforce Sales Cloud or other applications. This abstracts away all the effort it takes to dig into data to find answers, whether that’s building a visualization or asking a question, making it easy for leaders to get trusted insights from their data. Turning data into AI-driven insight While agentic analytics holds immense promise for transforming raw data into actionable insights in any business workflow, its effectiveness hinges on the state of the underlying data. The issue isn’t having enough data — it’s ensuring the data is clean, integrated and enriched with the necessary business context. “Data should be deduped and consolidated to avoid skewing analysis, and unified to provide a ‘single source of truth,’” says Jones. And this doesn’t have to be a manual effort. For example, instead of users manually cleaning up and changing data using complex steps (like traditional Extract, Transform, Load), the Tableau Next Data Pro skill gives smart suggestions on how to do it and can even automatically handle some of the complicated changes, saving time and effort. This data also needs to be captured in a semantic layer for agents and humans to be able to extract meaning and insights. “Most businesses struggle with data because they’re missing a semantic layer, which bridges the gap between raw data and business users, contextualizing complex data and making it more accessible, understandable and usable,” notes Jones. Tableau Semantics serves as the semantic layer, providing Tableau Next and Agentforce with a unified understanding of business data. By establishing consistent definitions and context, it enables AI agents to generate accurate and relevant responses. This capability is significantly enhanced through its integration with Salesforce Data Cloud, which provides a comprehensive data foundation by unifying and federating customer and business data across various sources and systems. This powerful combination allows organizations to connect siloed data repositories and leverage a single data environment that feeds directly into Tableau Semantics.

Business leaders are losing trust in their data — agentic analytics promises a fix Read More »

SOC teams take note: The open-source AI that delivers tier-3 analysis at tier-1 costs

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More With cyberattacks accelerating at machine speed, open-source large language models (LLMs) have quickly become the infrastructure that enables startups and global cybersecurity leaders to develop and deploy adaptive, cost-effective defenses against threats that evolve faster than human analysts can respond. Open-source LLMs’ initial advantages of faster time-to-market, greater adaptability and lower cost have created a scalable, secure foundation for delivering infrastructure. At last week’s RSAC 2025 conference, Cisco, Meta and ProjectDiscovery announced new open-source LLMs and a community-driven attack surface innovation that together define the future of open-source in cybersecurity.    One of the key takeaways from this year’s RSAC is the shift in open-source LLMs to extend and strengthen infrastructure at scale. Open-source AI is on the verge of delivering what many cybersecurity leaders have called on for years, which is the ability of the many cybersecurity providers to join forces against increasingly complex threats. The vision of being collaborators in creating a unified, open-source LLM and infrastructure is a step closer, given the announcements at RSAC. Cisco’s Chief Product Officer Jeetu Patel emphasized in his keynote, “The true enemy is not our competitor. It is actually the adversary. And we want to make sure that we can provide all kinds of tools and have the ecosystem band together so that we can actually collectively fight the adversary.” Patel explained the urgency of taking on such a complex challenge, saying, “AI is fundamentally changing everything, and cybersecurity is at the heart of it all. We’re no longer dealing with human-scale threats; these attacks are occurring at machine scale.” Cisco’s Foundation-sec-8B LLM defines a new era of open-source AI Cisco’s newly established Foundation AI group originates from the company’s recent acquisition of Robust Intelligence. Foundation AI’s focus is on delivering domain-specific AI infrastructure tailored explicitly to cybersecurity applications, which are among the most challenging to solve. Built on Meta’s Llama 3.1 architecture, this 8-billion parameter, open-weight Large Language Model isn’t a retrofitted general-purpose AI. It was purpose-built, meticulously trained on a cybersecurity-specific dataset curated in-house by Cisco Foundation AI. “By their nature, the problems in this charter are some of the most difficult ones in AI today. To make the technology accessible, we decided that most of the work we do in Foundation AI should be open. Open innovation allows for compounding effects across the industry, and it plays a particularly important role in the cybersecurity domain,” writes Yaron Singer, VP of AI and Security at Foundation. With open-source anchoring Foundation AI, Cisco has designed an efficient architectural approach for cybersecurity providers who typically compete with each other, selling comparable solutions, to become collaborators in creating more unified, hardened defenses. Singer writes, “Whether you’re embedding it into existing tools or building entirely new workflows, foundation-sec-8b adapts to your organization’s unique needs.” Cisco’s blog post announcing the model recommends that security teams apply foundation-sec-8b across the security lifecycle. 
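Because the model ships as open weights on Hugging Face (licensing details below), security teams can experiment with it directly using standard tooling. The following is a minimal sketch with Hugging Face transformers; the repository id and the prompt are illustrative assumptions, not taken from Cisco’s documentation.

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "fdtn-ai/Foundation-Sec-8B"  # assumed repository id; verify on Hugging Face before use

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Illustrative security-analysis prompt; CVE-2021-44228 is the Log4Shell flaw in Apache Log4j2.
prompt = "CVE-2021-44228 is a remote code execution vulnerability in Apache Log4j2. Summarize the risk for a SOC analyst:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))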
Potential use cases Cisco recommends for the model include SOC acceleration, proactive threat defense, engineering enablement, AI-assisted code reviews, validating configurations and custom integration. Foundation-sec-8B’s weights and tokenizer have been open-sourced under the permissive Apache 2.0 license on Hugging Face, allowing enterprise-level customization and deployment without vendor lock-in, maintaining compliance and privacy controls. Cisco’s blog also notes plans to open-source the training pipeline, further fostering community-driven innovation.

Cybersecurity is in the LLM’s DNA

Cisco chose to create a cybersecurity-specific model optimized for the needs of SOC, DevSecOps and large-scale security teams. Retrofitting an existing, generic AI model wouldn’t get them to their goal, so the Foundation AI team engineered its training using a large-scale, expansive and well-curated cybersecurity-specific dataset. By taking a more precision-focused approach to building the model, the Foundation AI team was able to ensure that the model deeply understands real-world cyber threats, vulnerabilities and defensive strategies. Key training datasets included the following:

- Vulnerability Databases: Including detailed CVEs (Common Vulnerabilities and Exposures) and CWEs (Common Weakness Enumerations) to pinpoint known threats and weaknesses.
- Threat Behavior Mappings: Structured from proven security frameworks such as MITRE ATT&CK, providing context on attacker methodologies and behaviors.
- Threat Intelligence Reports: Comprehensive insights derived from global cybersecurity events and emerging threats.
- Red-Team Playbooks: Tactical plans outlining real-world adversarial techniques and penetration strategies.
- Real-World Incident Summaries: Documented analyses of cybersecurity breaches, incidents, and their mitigation paths.
- Compliance and Security Guidelines: Established best practices from leading standards bodies, including the National Institute of Standards and Technology (NIST) frameworks and the Open Worldwide Application Security Project (OWASP) secure coding principles.

This tailored training regimen positions Foundation-sec-8B uniquely to excel at complex cybersecurity tasks, offering significantly enhanced accuracy, deeper contextual understanding and quicker threat response capabilities than general-purpose alternatives.

Benchmarking Foundation-sec-8B LLM

Cisco’s technical benchmarks show Foundation-sec-8B delivers cybersecurity performance comparable to significantly larger models:

Benchmark | Foundation-sec-8B | Llama-3.1-8B | Llama-3.1-70B
CTI-MCQA | 67.39 | 64.14 | 68.23
CTI-RCM | 75.26 | 66.43 | 72.66

By designing the foundation model to be cybersecurity-specific, Cisco is enabling SOC teams to gain greater efficiency with advanced threat analytics without having to pay high infrastructure costs to get it. Cisco’s broader strategic vision, detailed in its blog, Foundation AI: Robust Intelligence for Cybersecurity, addresses common AI integration challenges, including limited domain alignment of general-purpose models, insufficient datasets and legacy system integration difficulties. Foundation-sec-8B is specifically designed to navigate these barriers, running efficiently on minimal hardware configurations, typically requiring just one or two Nvidia A100 GPUs. Meta also underscored its open-source strategy at RSAC 2025, expanding its AI Defenders Suite to strengthen security across generative AI infrastructure.
Their open-source toolkit now includes Llama Guard 4, a multimodal classifier detecting policy violations across text and images, improving compliance monitoring within AI workflows. Also introduced is LlamaFirewall, an open-source, real-time security framework integrating modular capabilities that includes PromptGuard 2, which is used to detect prompt injections and jailbreak attempts. Also launched as part of LlamaFirewall are Agent Alignment Checks that monitor and protect AI agent decision-making processes along with CodeShield, which is designed to inspect

SOC teams take note: The open-source AI that delivers tier-3 analysis at tier-1 costs Read More »

ServiceNow lets users see more of their AI

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More With agents, applications and new workflows, enterprises will inevitably need a way to look at all the AI they use. ServiceNow’s new AI Control Tower, released this month, offers a holistic view of the entire AI ecosystem. The company also announced its own agentic communication system, which supports existing protocols.  AI Control Tower acts as a “command center” to help enterprise customers govern and manage all their AI workflows, including agents and models.  The AI Control Tower lets AI systems administrators and other AI stakeholders monitor and manage every AI agent, model or workflow in their system — even third-party agents. It also provides end-to-end lifecycle management, real-time reporting for different metrics, and embedded compliance and AI governance.  The idea around AI Control Tower is to give users a central location to see where all of the AI in the enterprise is.  “I can go to a single place to see all the AI systems, how many were onboarded or are currently deployed, which ones are an AI agent or classic machine learning,” said Dorit Zilbershot, ServiceNow’s Group Vice President of AI Experiences and Innovation, in a press briefing. “I could be managing these in a single place, making sure that I have full governance and understanding of what’s going on across my enterprise.” She added that the platform helps users “really drill down to understand the different systems by the provider and by type,” to understand risk and compliance better.  A holistic view of AI systems  Enterprises have begun deploying, even if just through small pilot programs, agents and AI-powered workflows. However, as with onboarding different SAAS providers or software, losing track of these AI features is easy.  Other companies have also begun offering customers a way to view and manage all their AI systems, especially to monitor agent behavior. Writer recently launched its AI HQ platform, which will include an observability feature.  ServiceNow has been considering agent management since it began offering pre-built AI agents to enterprises in September last year. The company’s agent library allows customers to choose the agent that best fits their workflows, and it has built-in orchestration features to help manage agent actions. Since then, the company has begun expanding the number of agents it offers in its agent library to target more use cases for enterprises.  Agent Fabric ServiceNow also unveiled its AI Agent Fabric, a way for its agent to communicate with other agents or tools.  The company said Agent Fabric will work with other agentic communication protocols like Model Context Protocol (MCP) from Anthropic, AGNTCY from Cisco and Google’s Agent2Agent (A2A). Zilbershot said ServiceNow will still support other protocols and will continue working with other companies to develop standards for agentic communication. “When we look at the AI Agent Fabric, it’s less about the protocol and more about the capability. At this point, we really look at ourselves as an open platform, and we will be able to support all the common protocols that are available out there and make sure that our customers can benefit from all these great innovations,” she said.  AI Agent Fabric is available to early adopters but will generally be available in the third quarter.  
Many enterprises have been thinking about getting agents from one company to talk to agents from another company, or to agents built on a different system. Interoperability could fuel an even bigger boom in AI agents, because agents would no longer be confined to one system and could get context and data from another to fulfill tasks. But despite popular protocols like MCP and A2A, the industry has yet to settle on a standard interoperability protocol, mainly because these protocols have only been around for several months.

ServiceNow lets users see more of their AI Read More »

Former DeepSeeker and collaborators release new method for training reliable AI agents: RAGEN

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 was, by many expert accounts, supposed to be the year of AI agents — task-specific AI implementations powered by leading large language and multimodal models (LLMs) like the kinds offered by OpenAI, Anthropic, Google, and DeepSeek. But so far, most AI agents remain stuck as experimental pilots in a kind of corporate purgatory, according to a recent poll conducted by VentureBeat on the social network X. Help may be on the way: a collaborative team from Northwestern University, Microsoft, Stanford, and the University of Washington — including a former DeepSeek researcher named Zihan Wang, currently completing a computer science PhD at Northwestern — has introduced RAGEN, a new system for training and evaluating AI agents that they hope makes them more reliable and less brittle for real-world, enterprise-grade usage. Unlike static tasks like math solving or code generation, RAGEN focuses on multi-turn, interactive settings where agents must adapt, remember, and reason in the face of uncertainty. Built on a custom RL framework called StarPO (State-Thinking-Actions-Reward Policy Optimization), the system explores how LLMs can learn through experience rather than memorization. The focus is on entire decision-making trajectories, not just one-step responses. StarPO operates in two interleaved phases: a rollout stage where the LLM generates complete interaction sequences guided by reasoning, and an update stage where the model is optimized using normalized cumulative rewards. This structure supports a more stable and interpretable learning loop compared to standard policy optimization approaches. The authors implemented and tested the framework using fine-tuned variants of Alibaba’s Qwen models, including Qwen 1.5 and Qwen 2.5. These models served as the base LLMs for all experiments and were chosen for their open weights and robust instruction-following capabilities. This decision enabled reproducibility and consistent baseline comparisons across symbolic tasks. Here’s how they did it and what they found: The Echo trap: how reinforcement learning rewards lead to LLM reasoning loss Wang summarized the core challenge in a widely shared X thread: Why does your RL training always collapse? According to the team, LLM agents initially generate symbolic, well-reasoned responses. But over time, RL systems tend to reward shortcuts, leading to repetitive behaviors that degrade overall performance—a pattern they call the “Echo Trap.” This regression is driven by feedback loops where certain phrases or strategies earn high rewards early on, encouraging overuse and stifling exploration. Wang notes that the symptoms are measurable: reward variance cliffs, gradient spikes, and disappearing reasoning traces. RAGEN test environments aren’t exactly enterprise-grade To study these behaviors in a controlled setting, RAGEN evaluates agents across three symbolic environments: Bandit: A single-turn, stochastic task that tests symbolic risk-reward reasoning. Sokoban: A multi-turn, deterministic puzzle involving irreversible decisions. Frozen Lake: A stochastic, multi-turn task requiring adaptive planning. Each environment is designed to minimize real-world priors and focus solely on decision-making strategies developed during training. In the Bandit environment, for instance, agents are told that Dragon and Phoenix arms represent different reward distributions. 
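As a concrete aside, here is a toy sketch of what such a single-turn symbolic bandit environment could look like in code; the arm names follow the description above, while the probabilities and payoffs are invented for illustration and are not the paper’s actual implementation.

import random

class SymbolicBanditEnv:
    """Toy single-turn bandit in the spirit of RAGEN's Bandit task (illustrative values only)."""

    # (win probability, payoff) per arm; these numbers are made up for the sketch.
    ARMS = {"Dragon": (0.3, 10.0), "Phoenix": (0.7, 3.0)}

    def reset(self) -> str:
        return "Two arms are available: Dragon and Phoenix. Choose one."

    def step(self, action: str):
        win_prob, payoff = self.ARMS[action]
        reward = payoff if random.random() < win_prob else 0.0
        done = True  # single-turn task: the episode ends after one choice
        return reward, done

env = SymbolicBanditEnv()
print(env.reset())
print(env.step("Dragon"))  # a StarPO rollout pairs the LLM's reasoning and choice with this reward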
Rather than being told the probabilities directly, they must reason symbolically—e.g., interpreting Dragon as “strength” and Phoenix as “hope”—to predict outcomes. This kind of setup pressures the model to generate explainable, analogical reasoning.

Stabilizing reinforcement learning with StarPO-S

To address training collapse, the researchers introduced StarPO-S, a stabilized version of the original framework. StarPO-S incorporates three key interventions:

- Uncertainty-based rollout filtering: Prioritizing rollouts where the agent shows outcome uncertainty.
- KL penalty removal: Allowing the model to deviate more freely from its original policy and explore new behaviors.
- Asymmetric PPO clipping: Amplifying high-reward trajectories more than low-reward ones to boost learning.

These changes delay or eliminate training collapse and improve performance across all three tasks. As Wang put it: “StarPO-S… works across all 3 tasks. Relieves collapse. Better reward.”

What makes for a good agentic AI model?

The success of RL training hinges not just on architecture, but on the quality of the data generated by the agents themselves. The team identified three dimensions that significantly impact training:

- Task diversity: Exposing the model to a wide range of initial scenarios improves generalization.
- Interaction granularity: Allowing multiple actions per turn enables more meaningful planning.
- Rollout freshness: Keeping training data aligned with the current model policy avoids outdated learning signals.

Together, these factors make the training process more stable and effective. An interactive demo site published by the researchers on GitHub makes this explicit, visualizing agent rollouts as full dialogue turns—including not just actions, but the step-by-step thought process that preceded them. For example, in solving a math problem, an agent may first ‘think’ about isolating a variable, then submit an answer like ‘x = 5’. These intermediate thoughts are visible and traceable, which adds transparency into how agents arrive at decisions.

When reasoning runs out

While explicit reasoning improves performance in simple, single-turn tasks like Bandit, it tends to decay during multi-turn training. Despite the use of structured prompts and dedicated reasoning tokens, reasoning traces often shrink or vanish unless directly rewarded. This points to a limitation in how rewards are typically designed: focusing on task completion may neglect the quality of the process behind it. The team experimented with format-based penalties to encourage better-structured reasoning, but acknowledges that more refined reward shaping is likely needed. RAGEN, along with its StarPO and StarPO-S frameworks, is now available as an open-source project at https://github.com/RAGEN-AI/RAGEN. However, no explicit license is listed in the GitHub repository at the time of writing, which may limit use or redistribution by others. The system provides a valuable foundation for those interested in developing AI agents that do more than complete tasks—they think, plan, and evolve. As AI continues to move toward autonomy, projects like RAGEN help illuminate what it takes to train models that learn not just from data, but from the consequences of their own actions.

Outstanding questions for real-world enterprise adoption

While the RAGEN paper offers a detailed technical roadmap, several practical questions remain for those looking to apply these methods in enterprise

Former DeepSeeker and collaborators release new method for training reliable AI agents: RAGEN Read More »

Report: OpenAI is buying AI-powered developer platform Windsurf — what happens to its support for rival LLMs?

OpenAI appears to be on the verge of making its biggest public acquisition to date with an agreement reached to buy Windsurf, the software developer tool powered by large language models (LLMs), to the tune of $3 billion, according to Bloomberg (unpaywalled Yahoo reprint). Rumors have swirled around such a deal for weeks, but now it appears to be happening as early as today, May 6, 2025, with Windsurf CEO and co-founder Varun Mohan posting on X last night: “Big announcement tomorrow!”

According to Bloomberg, the deal is meant to “help OpenAI take on rising competition in the market for AI-driven coding assistants — systems capable of tasks like writing code based on natural language prompting,” and Windsurf had been in talks with venture capital firms to raise another round of private investment around that $3 billion valuation, up from $1.25 billion last year. The startup, formerly known as Exafunction and later Codeium, was founded in 2021 by MIT graduates Varun Mohan and Douglas Chen, initially as a “security-focused LLM toolkit that provides intelligent code suggestions in the context of the codebase,” as VentureBeat reported last year. As it gained more users, its ambitions grew, culminating in the launch of the Windsurf Integrated Development Environment (IDE) in November 2024, a fork of Microsoft’s Visual Studio Code, and the renaming of the company after it. Windsurf reportedly now counts more than 800,000 developer users and 1,000 enterprises as customers.

It’s far from the only game in town when it comes to LLM-powered IDEs and dev tools, though: OpenAI was reportedly in talks to buy another very similar and rival startup, Cursor, and there’s of course Amazon’s Q Developer and GitHub Copilot as well. But all share the view that LLMs and AI models are going to change software development for the foreseeable future, writing code in the blink of an eye that would take human developers minutes, hours, or days to do manually.

What will happen to Windsurf’s support and offering of non-OpenAI LLMs?

For users, the integration with OpenAI will undoubtedly raise questions. Part of Windsurf’s appeal is that it is somewhat model agnostic, in that developers who use it can choose which LLM they want to help them write code. Right now, it offers several large language model options for its chat interface, including a custom Windsurf Base Model that’s a fine-tuned variant of Meta’s Llama 3.1 70B, while the Premier Model is based on Meta’s larger Llama 3.1 405B and is integrated with Windsurf’s internal reasoning tools to support more complex tasks, particularly in coding. Subscribers can also access external models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, allowing for flexibility in model selection depending on the use case.

Will OpenAI seek to remove the option for users to select outside LLMs and restrict them to OpenAI’s model families such as GPT-4o, o3, o4, etc? We’ll see, but I for one highly doubt it, given Windsurf’s business has succeeded in some part based on the flexibility of its tool offerings. It would also likely raise complaints of anti-competitive business measures and could even lead to some potential lawsuits.

A usage and data play meant to bolster OpenAI’s models against competitors in the coding space?
Instead, I would imagine that OpenAI is looking at the Windsurf acquisition as a means not only of acquiring a popular developer tool that plays well with its own models, but as a way to gather tons of user and usage data — and from this, it could see which types of developers use rival models such as the Meta Llama variants and Anthropic’s Claude, and for what purposes, and seek to ensure that new versions of OpenAI’s own LLMs are competitive on these fronts. Either way, it’s a “big freakin deal” — to paraphrase former President Joe Biden — and it will undoubtedly have many far-reaching ripple effects throughout Windsurf’s entire userbase and the wider pool of developers and AI-powered dev tools. Already, Windsurf’s Discord server is filled with posts from users bracing for the worst — an increase in prices or new access tiers bundling and limiting its usage to ChatGPT subscribers or OpenAI API developers. We’ll be tracking and reporting what we uncover that’s useful for technical decision-makers. Stay tuned!

Report: OpenAI is buying AI-powered developer platform Windsurf — what happens to its support for rival LLMs? Read More »

Hidden costs in AI deployment: Why Claude models may be 20-30% more expensive than GPT in enterprise settings

It is a well-known fact that different model families can use different tokenizers. However, there has been limited analysis on how the process of “tokenization” itself varies across these tokenizers. Do all tokenizers result in the same number of tokens for a given input text? If not, how different are the generated tokens? How significant are the differences? In this article, we explore these questions and examine the practical implications of tokenization variability. We present a comparative story of two frontier model families: OpenAI’s ChatGPT vs Anthropic’s Claude. Although their advertised “cost-per-token” figures are highly competitive, experiments reveal that Anthropic models can be 20–30% more expensive than GPT models.

API Pricing — Claude 3.5 Sonnet vs GPT-4o

As of June 2024, the pricing structure for these two advanced frontier models is highly competitive. Both Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o have identical costs for output tokens, while Claude 3.5 Sonnet offers a 40% lower cost for input tokens. Source: Vantage

The hidden “tokenizer inefficiency”

Despite the lower input token rates of the Anthropic model, we observed that the total cost of running experiments (on a given set of fixed prompts) with GPT-4o is much lower when compared to Claude 3.5 Sonnet. Why? The Anthropic tokenizer tends to break down the same input into more tokens compared to OpenAI’s tokenizer. This means that, for identical prompts, Anthropic models produce considerably more tokens than their OpenAI counterparts. As a result, while the per-token cost for Claude 3.5 Sonnet’s input may be lower, the increased tokenization can offset these savings, leading to higher overall costs in practical use cases. This hidden cost stems from the way Anthropic’s tokenizer encodes information, often using more tokens to represent the same content. The token count inflation has a significant impact on costs and context window utilization.

Domain-dependent tokenization inefficiency

Different types of domain content are tokenized differently by Anthropic’s tokenizer, leading to varying levels of increased token counts compared to OpenAI’s models. The AI research community has noted similar tokenization differences. We tested our findings on three popular domains, namely: English articles, code (Python) and math.

Domain | GPT Tokens | Claude Tokens | % Token Overhead
English articles | 77 | 89 | ~16%
Code (Python) | 60 | 78 | ~30%
Math | 114 | 138 | ~21%

% Token overhead of the Claude 3.5 Sonnet tokenizer (relative to GPT-4o). Source: Lavanya Gupta

When comparing Claude 3.5 Sonnet to GPT-4o, the degree of tokenizer inefficiency varies significantly across content domains. For English articles, Claude’s tokenizer produces approximately 16% more tokens than GPT-4o for the same input text. This overhead increases sharply with more structured or technical content: for mathematical equations, the overhead stands at 21%, and for Python code, Claude generates 30% more tokens. This variation arises because some content types, such as technical documents and code, often contain patterns and symbols that Anthropic’s tokenizer fragments into smaller pieces, leading to a higher token count. In contrast, more natural language content tends to exhibit a lower token overhead.

Other practical implications of tokenizer inefficiency

Beyond the direct implication on costs, there is also an indirect impact on the context window utilization.
While Anthropic models advertise a larger context window of 200K tokens, compared with OpenAI’s 128K tokens, the verbosity of Anthropic’s tokenizer means the effective usable token space may be smaller for Anthropic models. Hence, there could potentially be a small or large difference in the “advertised” context window sizes vs the “effective” context window sizes.

Implementation of tokenizers

GPT models use Byte Pair Encoding (BPE), which merges frequently co-occurring character pairs to form tokens. Specifically, the latest GPT models use the open-source o200k_base tokenizer. The model-to-encoding mapping used by GPT models in the tiktoken tokenizer is:

{
    # reasoning
    "o1-xxx": "o200k_base",
    "o3-xxx": "o200k_base",
    # chat
    "chatgpt-4o-": "o200k_base",
    "gpt-4o-xxx": "o200k_base",  # e.g., gpt-4o-2024-05-13
    "gpt-4-xxx": "cl100k_base",  # e.g., gpt-4-0314, etc., plus gpt-4-32k
    "gpt-3.5-turbo-xxx": "cl100k_base",  # e.g., gpt-3.5-turbo-0301, -0401, etc.
}

Unfortunately, not much can be said about Anthropic’s tokenizer, as it is not as directly and easily available as GPT’s. Anthropic released its Token Counting API in December 2024; however, it was discontinued in later 2025 versions. Latenode reports that “Anthropic uses a unique tokenizer with only 65,000 token variations, compared to OpenAI’s 100,261 token variations for GPT-4.” This Colab notebook contains Python code to analyze the tokenization differences between GPT and Claude models. Another tool that enables interfacing with some common, publicly available tokenizers validates our findings. The ability to proactively estimate token counts (without invoking the actual model API) and budget costs is crucial for AI enterprises.

Key Takeaways

- Anthropic’s competitive pricing comes with hidden costs: While Anthropic’s Claude 3.5 Sonnet offers 40% lower input token costs compared to OpenAI’s GPT-4o, this apparent cost advantage can be misleading due to differences in how input text is tokenized.
- Hidden “tokenizer inefficiency”: Anthropic models are inherently more verbose. For businesses that process large volumes of text, understanding this discrepancy is crucial when evaluating the true cost of deploying models.
- Domain-dependent tokenizer inefficiency: When choosing between OpenAI and Anthropic models, evaluate the nature of your input text. For natural language tasks, the cost difference may be minimal, but technical or structured domains may lead to significantly higher costs with Anthropic models.
- Effective context window: Due to the verbosity of Anthropic’s tokenizer, its larger advertised 200K context window may offer less effective usable space than OpenAI’s 128K, leading to a potential gap between advertised and actual context window sizes.

Anthropic did not respond to VentureBeat’s requests for comment by press time. We’ll update the story if they respond.
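For readers who want to reproduce the GPT-side token counts discussed above, a small sketch using the open tiktoken library is shown below; the sample strings are illustrative, and the Claude-side comparison is omitted because, as noted, Anthropic’s tokenizer is not packaged for offline use.

# pip install tiktoken
import tiktoken

samples = {
    "english": "The committee will reconvene next quarter to review the findings.",
    "python": "def add(a, b):\n    return a + b",
    "math": "Solve for x: 3x^2 + 5x - 2 = 0",
}

encoding = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o, per the mapping above

for domain, text in samples.items():
    token_count = len(encoding.encode(text))
    print(f"{domain}: {token_count} tokens")

# A Claude-side count would require Anthropic's own tokenizer or token-counting endpoint,
# so any offline comparison should be treated as an estimate rather than a billing-accurate figure.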

Hidden costs in AI deployment: Why Claude models may be 20-30% more expensive than GPT in enterprise settings Read More »