VentureBeat

White House plan signals “open-weight first” era—and enterprises need new guardrails

U.S. President Donald Trump signed the AI Action Plan, which outlines a path for the U.S. to lead in the AI race. For enterprises already in the throes of deploying AI systems, the rules represent a clear indication of how this administration intends to treat AI going forward and could signal how providers will approach AI development.

Much like the AI executive order signed by Joe Biden in 2023, Trump’s order primarily concerns government offices, directing how they can contract with AI model and application providers; it is not a legislative act.

The AI plan may not directly affect enterprises immediately, but analysts noted that anytime the government takes a position on AI, the ecosystem changes.

“This plan will likely shape the ecosystem we all operate in — one that rewards those who can move fast, stay aligned and deliver real-world outcomes,” Matt Wood, commercial technology and innovation officer at PwC, told VentureBeat in an email. “For enterprises, the signal is clear: the pace of AI adoption is accelerating, and the cost of lagging is going up. Even if the plan centers on federal agencies, the ripple effects — in procurement, infrastructure, and norms — will reach much further. We’ll likely see new government-backed testbeds, procurement programs, and funding streams emerge — and enterprises that can partner, pilot, or productize in this environment will be well-positioned.”

He added that the Action Plan “is not a blueprint for enterprise AI.” Still, enterprises should expect an AI development environment that prioritizes speed, scale, experimentation and less reliance on regulatory shelters. Companies working with the government should also be prepared for additional scrutiny of the models and applications they use, to ensure alignment with the government’s values.

The Action Plan outlines how government agencies can collaborate with AI companies, prioritizes recommended infrastructure investments to encourage AI development, and establishes guidelines for exporting and importing AI tools.

Charleyne Biondi, assistant vice president and analyst at Moody’s Ratings, said the plan “highlights AI’s role as an increasingly strategic asset and core driver of economic transformation.” She noted, however, that the plan doesn’t address regulatory fragmentation. “However, current regulatory fragmentation across U.S. states could create uncertainty for developers and businesses. Striking the right balance between innovation and safety and between national ambition and regulatory clarity will be critical to ensure continued enterprise adoption and avoid unintended slowdowns,” she said.

What is inside the action plan

The AI Action Plan is broken down into three pillars:
1. Accelerating AI innovation
2. Building American AI infrastructure
3. Leading in international AI diplomacy and security

The key headline piece of the AI Action Plan centers on “ensuring free speech and American values,” a significant talking point for this administration.
It instructs the National Institute of Standards and Technology (NIST) to remove references to misinformation and diversity, equity and inclusion, and it prevents agencies from working with foundation models that have “top-down agendas.” It’s unclear how the government expects existing models and datasets to follow suit, or what this kind of AI would look like. Enterprises are especially concerned about potentially controversial statements AI systems can make, as evidenced by the recent Grok kerfuffle.

It also orders NIST to research and publish findings to ensure that models from China, such as DeepSeek, Qwen and Kimi, are not aligned with the Chinese Communist Party.

However, the most consequential positions involve supporting open-source systems, creating a new testing and evaluation ecosystem, and streamlining the process for building data centers. Through the plan, the Department of Energy and the National Science Foundation are directed to develop “AI testbeds for piloting AI systems in secure, real-world settings,” allowing researchers to prototype systems. It also removes much of the red tape associated with safety testing and evaluation of models.

What has excited many in the industry is the explicit support for open-source AI and open-weight models. “We need to ensure America has leading open models founded on American values. Open-source and open-weight models could become global standards in some areas of business and academic research worldwide. For that reason, they also have geostrategic value. While the decision of whether and how to release an open or closed model is fundamentally up to the developer, the Federal government should create a supportive environment for open models,” the plan said.

Understandably, open-source proponents like Hugging Face’s Clement Delangue praised this decision on social media, saying: “It’s time for the American AI community to wake up, drop the “open is not safe” bullshit, and return to its roots: open science and open-source AI, powered by an unmatched community of frontier labs, big tech, startups, universities, and non‑profits.”

BCG X North America chair Sesh Iyer told VentureBeat this would give enterprises more confidence in adopting open-source LLMs and could also encourage more closed-source providers “to rethink proprietary strategies and potentially consider releasing model weights.”

The plan does mention that cloud providers should prioritize the Department of Defense, which could bump some enterprises down an already crowded waiting list.

A little more clarity on rules

The AI Action Plan is more akin to an executive order and can only direct government agencies under the purview of the Executive branch. Full AI regulation,


Alibaba’s new open source Qwen3-235B-A22B-2507 beats Kimi-2 and offers low compute version

Chinese e-commerce giant Alibaba has made waves globally in the tech and business communities with its own family of “Qwen” generative AI large language models, beginning with the launch of the original Tongyi Qianwen LLM chatbot in April 2023 and continuing through the release of Qwen 3 in April 2025.

Why? Well, not only are its models powerful, scoring highly on third-party benchmarks for math, science, reasoning, and writing tasks, but for the most part they’ve been released under permissive open source licensing terms, allowing organizations and enterprises to download them, customize them, run them, and generally use them for a wide variety of purposes, even commercial ones. Think of them as an alternative to DeepSeek.

This week, Alibaba’s “Qwen Team,” as its AI division is known, released the latest updates to its Qwen family, and they’re already attracting attention once more from AI power users in the West for their top performance, in one case edging out even the new Kimi-2 model from rival Chinese AI startup Moonshot, released in mid-July 2025.

The new Qwen3-235B-A22B-2507-Instruct model — released on AI code sharing community Hugging Face alongside a “floating point 8,” or FP8, version, which we’ll cover more in-depth below — improves on the original Qwen 3 in reasoning tasks, factual accuracy, and multilingual understanding. It also outperforms Claude Opus 4’s “non-thinking” version. The new Qwen3 model update also delivers better coding results, alignment with user preferences, and long-context handling, according to its creators. But that’s not all… Read on for what else it offers enterprise users and technical decision-makers.

FP8 version lets enterprises run Qwen 3 with far less memory and far less compute

In addition to the new Qwen3-235B-A22B-2507 model, the Qwen Team released an “FP8” version, which stands for 8-bit floating point, a format that compresses the model’s numerical operations to use less memory and processing power — without noticeably affecting its performance.

In practice, this means organizations can run a model with Qwen3’s capabilities on smaller, less expensive hardware or more efficiently in the cloud. The result is faster response times, lower energy costs, and the ability to scale deployments without needing massive infrastructure. This makes the FP8 model especially attractive for production environments with tight latency or cost constraints. Teams can scale Qwen3’s capabilities down to single-node GPU instances or local development machines, avoiding the need for massive multi-GPU clusters. It also lowers the barrier to private fine-tuning and on-premises deployments, where infrastructure resources are finite and total cost of ownership matters.

Even though the Qwen team didn’t release official calculations, comparisons to similar FP8 quantized deployments suggest the efficiency savings are substantial.
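For teams that want to experiment with the FP8 build, a minimal serving sketch might look like the following. This is an illustration only, assuming the vLLM engine and a Hugging Face repo id of "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"; the exact repo id, tensor-parallel degree and context length should be taken from the official model card rather than from this sketch.

```python
# Minimal serving sketch for the FP8 checkpoint (illustrative; repo id,
# tensor-parallel size and context length are assumptions, not official guidance).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",  # assumed Hugging Face repo id
    tensor_parallel_size=4,   # e.g. 4 x H100-80GB, matching the comparison below
    max_model_len=32768,      # cap context to leave memory headroom for the KV cache
)

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

# For an instruct model you would normally apply the chat template; a raw
# prompt is used here only to keep the sketch short.
outputs = llm.generate(
    ["Briefly explain the trade-offs of FP8 quantization for LLM serving."],
    params,
)
print(outputs[0].outputs[0].text)
```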
Here’s a practical illustration (updated and corrected on 07/23/2025 at 16:04 ET — this piece originally included an inaccurate chart based on a miscalculation; I apologize for the errors and thank readers for contacting me about them):

GPU memory use*: BF16 / BF16-equivalent build ≈ 640 GB total (8 × H100-80 GB, TP-8); FP8 quantized build ≈ 320 GB total on the recommended 4 × H100-80 GB (TP-4). Lowest-footprint community run: ~143 GB across 2 × H100 with Ollama off-loading.

Single-query inference speed†: BF16 ~74 tokens/s (batch = 1, context = 2K, 8 × H20-96 GB, TP-8); FP8 ~72 tokens/s (same settings, 4 × H20-96 GB, TP-4).

Power / energy: a full node of eight H100s draws ~4–4.5 kW under load (550–600 W per card, plus host)‡; FP8 needs half the cards and moves half the data, and NVIDIA’s Hopper FP8 case studies report ≈ 35–40% lower TCO and energy at comparable throughput.

GPUs needed (practical): BF16 requires 8 × H100-80 GB (TP-8), or 8 × A100-80 GB for parity; FP8 runs on 4 × H100-80 GB (TP-4), and 2 × H100 is possible with aggressive off-loading, at the cost of latency.

*Disk footprint for the checkpoints: BF16 weights are ~500 GB; the FP8 checkpoint is “well over 200 GB,” so the absolute memory savings on GPU come mostly from needing fewer cards, not from weights alone.

†Speed figures are from the Qwen3 official SGLang benchmarks (batch 1). Throughput scales almost linearly with batch size: Baseten measured ~45 tokens/s per user at batch 32 and ~1.4K tokens/s aggregate on the same four-GPU FP8 setup.

‡No vendor supplies exact wall-power numbers for Qwen, so we approximate using H100 board specs and NVIDIA Hopper FP8 energy-saving data.

No more ‘hybrid reasoning’… instead, Qwen will release separate reasoning and instruct models!

Perhaps most interesting of all, the Qwen Team announced it will no longer be pursuing a “hybrid” reasoning approach, which it introduced with Qwen 3 in April and which seemed to be inspired by an approach pioneered by sovereign AI collective Nous Research. This allowed users to toggle on a “reasoning” mode, letting the AI model engage in its own self-checking and produce “chains-of-thought” before responding.

In a way, it was designed to mimic the reasoning capabilities of powerful proprietary models such as OpenAI’s “o” series (o1, o3, o4-mini, o4-mini-high), which also produce “chains-of-thought.” However, unlike those rival models, which always engage in such “reasoning” for every prompt, Qwen 3 could have the reasoning mode manually switched on or off by the user, either by clicking a “Thinking Mode” button on the Qwen website chatbot or by typing “/think” before their prompt on a local or privately run model inference.

The idea was to give users control to engage the slower and more token-intensive thinking mode for more difficult prompts and tasks, and use a non-thinking mode for simpler prompts. But again, this put the onus on


Inside reMarkable’s push to scale employee and customer IT requests with agentic AI

Presented by Salesforce As growing companies adopt AI to drive efficiency and scale, many are discovering that success depends less on the technology itself and more on how well their systems, teams, and data are connected. When systems don’t talk to each other and information stays siloed, it becomes harder for AI to deliver meaningful impact. That’s why data integration is emerging as a top priority for tech leaders: it creates the foundation for AI agents to work in sync with people, taking real-time action, surfacing insights, and supporting faster decisions. This was the approach taken by reMarkable. As the paper-tablet maker expanded into the B2B market, it shifted to a more unified, platform-based strategy with a central hub for team collaboration. With this connected foundation and a 360-degree view of their customers in place, reMarkable was able to introduce AI agents to help scale both customers and employee support — all while staying true to the premium experience its brand is known for. AI agents meet employees where they work reMarkable’s agent-first strategy found momentum in Slack, where a conversational interface and ocean of data from messages, files, and apps create the ideal environment for AI agents to embed seamlessly into employees’ day-to-day work. And it’s a timely move, with daily AI usage among desk workers having surged 233% in just six months — a sign that reMarkable’s approach is tapping into a rapidly growing shift in how people get work done. The team began by tackling the high volume of routine IT requests employees made on a daily basis. Manually submitting and tracking tickets, hunting down knowledge articles, and navigating scattered systems were a drag on team productivity. To solve this, they built and rolled out “Saga,” an AI agent built with Salesforce’s digital labor platform Agentforce. Employees simply message Saga in Slack to get help and quick answers. When needed, the agent even automatically manages tickets behind the scenes, no context switching required. Saga’s already helping teams move faster, reducing friction, cutting down on repetitive tasks, and giving IT more time to focus on strategic work. “Having AI agents working alongside our employees directly in Slack is really powerful,” said Nico Cormier, CTO at reMarkable. “Not only do they become our team members, but given the amount of data and context we have in Slack, they become really intelligent team members that take actions right in the flow of work. That will have an enormous impact on productivity through time freed up for creative and strategic work, and ultimately company profits.” Scaling customer support comes next reMarkable also extended their AI agent model to their customers. In just a few weeks, they launched “Mark,” a customer-facing agent that handles common support questions through a smart, conversational flow. Customers get fast and accurate answers, and when an issue calls for a human touch, Mark seamlessly hands it off to the right support rep to keep the experience smooth, efficient, and personal. Mark has already managed more than 25,000 conversations, doing the equivalent work of 20% of reMarkable’s 115-person support team. That impact validated the company’s move toward an agent-first enterprise strategy, built on Salesforce’s unified platform that connects apps, data, metadata, and AI agents. 
Next, reMarkable is building a commerce agent for B2B customers to handle tasks like order tracking, returns, and upsell suggestions, as well as a sales agent to automate lead nurturing, follow-ups, and coaching. “With Agentforce, we’re creating a smarter, more personalized self-service experience for our B2B customers,” said Cormier. “By centralizing order data and connecting it across Salesforce and Slack, we can provide real-time updates, streamline returns, and proactively offer relevant products or restock alerts. In the end, we’re delivering the kind of thoughtful, responsive experience our customers expect, at scale.” By handling routine questions instantly and handing off more complex needs to the right people, reMarkable’s support feels faster, more responsive, and genuinely human. It’s a clear signal that when AI is thoughtfully integrated into the customer journey, it doesn’t just scale support, it elevates the customer experience. A model for the evolving workplace As businesses around the world rethink how work gets done, reMarkable stands out as a model of what’s possible when innovation meets intentionality. AI agents built on unified, connected platforms become strategic levers for growth and employee productivity. This isn’t about slapping AI onto existing processes. It’s about weaving it into the fabric of how teams and systems work together, freeing people up to do their best work and delivering experiences that truly stand out. For business leaders, the lesson is clear: invest in integration first, then build your AI agents on that foundation. That’s how you unlock not just efficiency, but agility, alignment, and customer love. reMarkable shows us when done right, AI agents can become a catalyst for smarter scaling and better, more human-centered work. In today’s fast-moving world, that’s not just an advantage. It’s the future. Peter Doolan is Chief Customer Officer of Slack, Salesforce. Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact [email protected]. source


Cutting through the chaos: Why AI needs a unified platform

Presented by ServiceNow Rare diseases affect more than 30 million people in the U.S. — 10% of the population — yet many do not have treatments. At AstraZeneca, that gap creates urgent pressure, not only to discover life-saving therapies but to deliver them faster and more efficiently to patients. To meet that challenge, AstraZeneca turned to the ServiceNow AI Platform to accelerate onboarding, streamline lab requests, and help eliminate manual work — freeing scientists and support teams to focus on work that matters most. “These aren’t incremental gains,” says Paul Fipps, president of global customer operations at ServiceNow. “We’re helping AstraZeneca save more than 30,000 hours a year so they can focus on what matters most: discovering, developing, and delivering treatments that save lives.” AstraZeneca offers a glimpse into a broader transformation — one that could fundamentally change how the world works. In fact, IDC projects that investments in AI solutions and services will drive over $20 trillion of global impact by 2030. “Companies creating the right conditions for agentic AI — autonomous AI agents — are already seeing efficiency gains of 20-50%,” Fipps says. These results are tangible: faster customer service resolution, reduced administrative overhead, and optimized supply chains. When executed correctly, the competitive advantage is unprecedented. “Integrating AI into business isn’t just about cutting costs — it’s about rethinking priorities,” he adds. “Think about everything we can achieve with the extra time, resources, and brainpower. It’s an opportunity to redeploy our efforts toward solving problems that really matter.” Realizing AI’s value through a platform-based approach Successfully integrating AI requires overcoming three core challenges: data quality, legacy infrastructure, and integration complexity. According to McKinsey, 70% of organizations struggle to quickly integrate data into AI models. Simply layering AI agents over these fragmented systems will only exacerbate complexity and limit potential. “Work doesn’t happen in a single department, so neither can your AI strategy,” says Fipps. “Agentic AI needs a platform-wide strategy — one that spans every corner of the business, not just a single domain.” That’s the approach ServiceNow is taking with Vodafone, one of the world’s largest telecom providers with more than 340 million customers. Together, they’re redefining service delivery for business clients. “We’ve built an AI-powered Enhanced Service Monitoring solution that proactively identifies and resolves issues — often before the customer is even aware there’s a problem,” Fipps says. “It’s about reducing disruptions, accelerating response times, and delivering seamless experiences across complex networks and cloud environments.  This isn’t just an upgrade — it’s setting a new benchmark for how AI can transform customer experience at scale.” Going beyond superficial AI Many so-called agentic AI tools are little more than basic robotic process automation (RPA). True transformation runs deeper. It requires AI that’s fully integrated across the enterprise — connecting across the enterprise — linking customer data, knowledge bases, and operational systems to enable intelligent, fast decision-making. That’s where platform architecture becomes critical. ServiceNow’s 20 years as the workflow automation leader is an incredible advantage: a powerful foundation of built-in workflows, automations, and knowledge bases. 
This legacy offers a head start — and a clear distinction between superficial AI and enterprise-wide transformation. “Effective AI deployment isn’t a one-time ‘ta-da’ moment,” says Fipps. “It’s about improving thousands of processes and tasks. When those improvements occur on a unified platform, they build on each other and can create exponential efficiency gains.”  These gains are already real at ServiceNow. “We’re running more than 100 different AI projects internally,” Fipps shares. “So far, we estimate we’ve saved more than $350 million through automation and process optimization.” For instance, if one of ServiceNow’s 26,000+ employees have a finance-related question — say, their paycheck or their commission statement — they no longer wait four days for a response. With agentic AI, it now takes seconds. Scaling AI by removing barriers True transformation goes beyond efficiency — it’s about scale. AI Agents on the ServiceNow platform learn from trillions of transactions and billions of workflows the company sees on its platform, identifying repeatable patterns and turning them into pre-built agents that help solve high-impact problems across industries. “We’re continuously refining our AI agent strategy to tackle our customers’ most pressing problems,” Fipps says. This is where the ServiceNow AI Platform sets a new standard. The AI Control Tower serves as the single command center that orchestrates all AI agents, models, and workflows — providing full visibility, enterprise-grade compliance, and control at scale. It aligns AI initiatives with broader business and technology goals to ensure meaningful value delivery. With ServiceNow AI Agent Fabric, AI agents from the likes of ServiceNow, Microsoft, Deloitte or in-house teams can communicate and collaborate seamlessly across systems. These agents act as a unified, intelligent system: sharing context, coordinating tasks, and delivering measurable outcomes. The time for AI is now The urgency of agentic AI means the time to act is now. “To stay competitive, the smartest move is to make a platform bet,” Fipps says. “It’s not just about saving money. The most forward-looking companies are reinvesting those gains to accelerate what matters — like AstraZeneca speeding up drug discovery, or Vodafone enhancing customer service.” In each case, AI isn’t replacing people — it’s enhancing their ability to solve complex challenges faster. While the platform matters, the true power of AI lies in the people it empowers. “The organizations making the most progress are those using AI to elevate human potential,” he says. “They’re not just automating tasks — they’re driving human-centered innovation and transforming how work gets done.” Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact [email protected]. source


Weaving reality or warping it? The personalization trap in AI systems

AI represents the greatest cognitive offloading in the history of humanity. We once offloaded memory to writing, arithmetic to calculators and navigation to GPS. Now we are beginning to offload judgment, synthesis and even meaning-making to systems that speak our language, learn our habits and tailor our truths.

AI systems are growing increasingly adept at recognizing our preferences, our biases, even our peccadillos. Like attentive servants in one instance or subtle manipulators in another, they tailor their responses to please, to persuade, to assist or simply to hold our attention.

While the immediate effects may seem benign, in this quiet and invisible tuning lies a profound shift: The version of reality each of us receives becomes progressively more uniquely tailored. Through this process, over time, each person becomes increasingly their own island. This divergence could threaten the coherence and stability of society itself, eroding our ability to agree on basic facts or navigate shared challenges.

AI personalization does not merely serve our needs; it begins to reshape them. The result of this reshaping is a kind of epistemic drift. Each person starts to move, inch by inch, away from the common ground of shared knowledge, shared stories and shared facts, and further into their own reality.

This is not simply a matter of different news feeds. It is the slow divergence of moral, political and interpersonal realities. In this way, we may be witnessing the unweaving of collective understanding. It is an unintended consequence, yet deeply significant precisely because it is unforeseen. But this fragmentation, while now accelerated by AI, began long before algorithms shaped our feeds.

The unweaving

This unweaving did not begin with AI. As David Brooks reflected in The Atlantic, drawing on the work of philosopher Alasdair MacIntyre, our society has been drifting away from shared moral and epistemic frameworks for centuries. Since the Enlightenment, we have gradually replaced inherited roles, communal narratives and shared ethical traditions with individual autonomy and personal preference.

What began as liberation from imposed belief systems has, over time, eroded the very structures that once tethered us to common purpose and personal meaning. AI did not create this fragmentation. But it is giving new form and speed to it, customizing not only what we see but how we interpret and believe.

It is not unlike the biblical story of Babel. A unified humanity once shared a single language, only to be fractured, confused and scattered by an act that made mutual understanding all but impossible. Today, we are not building a tower made of stone. We are building a tower of language itself. Once again, we risk the fall.

Human-machine bond

At first, personalization was a way to improve “stickiness” by keeping users engaged longer, returning more often and interacting more deeply with a site or service.
Recommendation engines, tailored ads and curated feeds were all designed to keep our attention just a little longer, perhaps to entertain but often to move us to purchase a product. But over time, the goal has expanded. Personalization is no longer just about what holds us. It is what it knows about each of us, the dynamic graph of our preferences, beliefs and behaviors that becomes more refined with every interaction. Today’s AI systems do not merely predict our preferences. They aim to create a bond through highly personalized interactions and responses, creating a sense that the AI system understands and cares about the user and supports their uniqueness. The tone of a chatbot, the pacing of a reply and the emotional valence of a suggestion are calibrated not only for efficiency but for resonance, pointing toward a more helpful era of technology. It should not be surprising that some people have even fallen in love and married their bots.  The machine adapts not just to what we click on, but to who we appear to be. It reflects us back to ourselves in ways that feel intimate, even empathic. A recent research paper cited in Nature refers to this as “socioaffective alignment,” the process by which an AI system participates in a co-created social and psychological ecosystem, where preferences and perceptions evolve through mutual influence. This is not a neutral development. When every interaction is tuned to flatter or affirm, when systems mirror us too well, they blur the line between what resonates and what is real. We are not just staying longer on the platform; we are forming a relationship. We are slowly and perhaps inexorably merging with an AI-mediated version of reality, one that is increasingly shaped by invisible decisions about what we are meant to believe, want or trust.  This process is not science fiction; its architecture is built on attention, reinforcement learning with human feedback (RLHF) and personalization engines. It is also happening without many of us — likely most of us — even knowing. In the process, we gain AI “friends,” but at what cost? What do we lose, especially in terms of free will and agency? Author and financial commentator Kyla Scanlon spoke on the Ezra Klein podcast about how the frictionless ease of the digital world may come at the cost of meaning. As she put it: “When things are a little too easy, it’s tough to find meaning in it… If you’re able to lay back, watch a screen in your little chair and have smoothies delivered to you — it’s tough to find meaning within that kind of WALL-E lifestyle because everything is just a bit too simple.” The personalization of


Mixture-of-recursions delivers 2x faster inference—Here’s how to implement it

Researchers at KAIST AI and Mila have introduced a new Transformer architecture that makes large language models (LLMs) more memory- and compute-efficient. The architecture, called Mixture-of-Recursions (MoR), significantly improves model accuracy and delivers higher throughput compared with vanilla transformers, even when constrained by the same parameter count and compute budget.

The scaling challenges of LLMs

The impressive capabilities of today’s LLMs are directly tied to their ever-increasing size. But as these models scale, their memory footprints and computational requirements often become untenable, making both training and deployment challenging for organizations outside of hyperscale data centers. This has led to a search for more efficient designs.

Efforts to improve LLM efficiency have focused mainly on two methods: parameter sharing and adaptive computation. Parameter sharing techniques reduce the total number of unique parameters by reusing weights across different parts of the model, thereby reducing the overall computational complexity. For example, “layer tying” is a technique that reuses a model’s weights across several layers. Adaptive computation methods adjust models so that they only use as many inference resources as they need. For example, “early exiting” dynamically allocates compute by allowing the model to stop processing “simpler” tokens early in the network. However, creating an architecture that effectively unifies both parameter efficiency and adaptive computation remains elusive.

How Mixture-of-Recursions works

Mixture-of-Recursions is a framework that combines parameter sharing with adaptive computation to tackle the high computational demands of LLMs. It builds on the concept of Recursive Transformers, models that repeatedly apply a set of shared layers multiple times. Instead of a deep stack of unique layers, a Recursive Transformer partitions the model into a few “recursion blocks,” each with a shared pool of parameters. This design allows for more computation without increasing the model’s size.

MoR enhances this recursive approach with two key components. The first is a lightweight router that intelligently assigns a specific recursion depth to each token. This concept is similar to the routing mechanism in Mixture-of-Experts (MoE) models, where a router directs tokens to specialized expert networks. In MoR, however, the “experts” are the different recursion depths, allowing the model to choose how much computation to apply to each token dynamically. It decides how many times a shared block of layers should be applied based on a token’s complexity, or its required “depth of thinking.” This directs computation only where it is most needed, avoiding wasted cycles on easy-to-process parts of the input.

Figure: Mixture-of-Recursions overview (source: arXiv)

The second component is a more efficient key-value (KV) caching strategy.
KV caching is a standard technique that stores information from previous tokens to speed up generation, but it becomes a memory bottleneck in recursive models. MoR introduces a “recursion-wise” KV caching mechanism that selectively stores and retrieves key-value pairs only for the tokens that are still active at a given recursion step. This targeted caching reduces memory traffic and improves throughput without needing complex, post-training modifications. As the researchers state in their paper, “In essence, MoR enables models to efficiently adjust their thinking depth on a per-token basis, unifying parameter efficiency with adaptive computation.”

Figure: Different token routing and KV caching mechanisms for recursive transformers (source: arXiv)

MoR in action

To test their framework, the researchers trained MoR models ranging from 135 million to 1.7 billion parameters and compared them against vanilla and standard recursive baseline models on validation loss and few-shot accuracy benchmarks. The results demonstrate significant gains. When given an equal training compute budget, an MoR model achieved higher average few-shot accuracy (43.1% vs. 42.3%) than a vanilla baseline despite using nearly 50% fewer parameters. When trained on the same amount of data, the MoR model reduced training time by 19% and cut peak memory usage by 25% compared to the vanilla model.

The MoR architecture also proves to be scalable. While it slightly underperformed the vanilla model at the smallest 135M-parameter scale, the gap closed rapidly as the model size increased. For models with over 360M parameters, MoR matched or exceeded the performance of standard Transformers, especially on lower compute budgets. Furthermore, MoR’s design dramatically boosts inference throughput. One MoR configuration achieved a 2.06x speedup over the vanilla baseline. For a company operating at scale, this could translate into significant operational cost savings.

Sangmin Bae, co-author of the paper and a PhD student at KAIST, broke down the practical impact in an email to VentureBeat. “While it’s difficult to provide exact numbers, at a high level, reducing model parameter size and KV cache footprint means we can perform inference on many more samples simultaneously,” he said. “This translates to an increased number of tokens processed at once, and handling longer context windows becomes feasible.”

A practical path for enterprise adoption

While the paper’s results come from models trained from scratch, a key question for enterprises is how to adopt MoR without massive upfront investment. According to Bae, “uptraining” existing open-source models is a “definitely more cost-effective approach.” He noted that while training a new model is straightforward, an “uptraining approach could be more suitable and efficient until the scalability of MoR itself is fully validated.”

Adopting MoR also introduces new architectural “knobs” for developers, allowing them to fine-tune the balance between performance and efficiency. This trade-off will depend entirely on the application’s needs. “For simpler tasks or scenarios, it may be beneficial to use models with more recursion steps, offering greater flexibility, and vice versa,” Bae explained. He stressed that the “optimal settings will highly depend on the specific deployment setting,” encouraging teams to explore the
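To make the per-token routing idea concrete, here is a toy sketch of the mechanism described above. It is an illustration under simplifying assumptions, not the KAIST/Mila implementation: the router is untrained and uses a hard argmax, inactive tokens are merely masked rather than skipped, and the paper's recursion-wise KV caching is not modeled.

```python
# Toy illustration of Mixture-of-Recursions-style routing (not the paper's code):
# a lightweight router assigns each token a recursion depth, and one shared
# transformer block is re-applied up to that depth.
import torch
import torch.nn as nn

class ToyMoR(nn.Module):
    def __init__(self, d_model=256, n_heads=4, max_recursions=3):
        super().__init__()
        # One shared pool of parameters reused at every recursion step.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        # Router scores depths 1..max_recursions for every token.
        self.router = nn.Linear(d_model, max_recursions)
        self.max_recursions = max_recursions

    def forward(self, x):                            # x: (batch, seq, d_model)
        depths = self.router(x).argmax(dim=-1) + 1   # per-token depth in [1, R]
        out = x
        for step in range(1, self.max_recursions + 1):
            active = (depths >= step).unsqueeze(-1)  # tokens still "thinking"
            refined = self.shared_block(out)
            out = torch.where(active, refined, out)  # settled tokens pass through
        return out

x = torch.randn(2, 16, 256)
print(ToyMoR()(x).shape)  # torch.Size([2, 16, 256])
```

In the real architecture the router is trained end to end and inactive tokens genuinely skip computation and KV writes, which is where the memory and throughput gains come from; the sketch only shows where the per-token depth decision sits in the forward pass.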


Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber

Artificial intelligence models that spend more time “thinking” through problems don’t always perform better — and in some cases, they get significantly worse, according to new research from Anthropic that challenges a core assumption driving the AI industry’s latest scaling efforts.

The study, led by Anthropic AI safety fellow Aryo Pradipta Gema and other company researchers, identifies what they call “inverse scaling in test-time compute,” where extending the reasoning length of large language models actually deteriorates their performance across several types of tasks. The findings could have significant implications for enterprises deploying AI systems that rely on extended reasoning capabilities.

“We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy,” the Anthropic researchers write in their paper published Tuesday.

“New Anthropic Research: ‘Inverse Scaling in Test-Time Compute.’ We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns,” Gema posted on X (@aryopg) on July 22, 2025.

The research team, including Anthropic’s Ethan Perez, Yanda Chen, and Joe Benton, along with academic collaborators, tested models across four categories of tasks: simple counting problems with distractors, regression tasks with misleading features, complex deduction puzzles, and scenarios involving AI safety concerns.

Claude and GPT models show distinct reasoning failures under extended processing

The study reveals distinct failure patterns across major AI systems. Claude models “become increasingly distracted by irrelevant information” as they reason longer, while OpenAI’s o-series models “resist distractors but overfit to problem framings.” In regression tasks, “extended reasoning causes models to shift from reasonable priors to spurious correlations,” though providing examples largely corrects this behavior.

Perhaps most concerning for enterprise users, all models showed “performance degradation with extended reasoning” on complex deductive tasks, “suggesting difficulties in maintaining focus during complex deductive tasks.”

The research also uncovered troubling implications for AI safety. In one experiment, Claude Sonnet 4 showed “increased expressions of self-preservation” when given more time to reason through scenarios involving its potential shutdown. “Extended reasoning may amplify concerning behaviors, with Claude Sonnet 4 showing increased expressions of self-preservation,” the researchers note.

Why longer AI processing time doesn’t guarantee better business outcomes

The findings challenge the prevailing industry wisdom that more computational resources devoted to reasoning will consistently improve AI performance.
Major AI companies have invested heavily in “test-time compute” — allowing models more processing time to work through complex problems — as a key strategy for enhancing capabilities. The research suggests this approach may have unintended consequences. “While test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns,” the authors conclude. For enterprise decision-makers, the implications are significant. Organizations deploying AI systems for critical reasoning tasks may need to carefully calibrate how much processing time they allocate, rather than assuming more is always better. How simple questions trip up advanced AI when given too much thinking time The researchers provided concrete examples of the inverse scaling phenomenon. In simple counting tasks, they found that when problems were framed to resemble well-known paradoxes like the “Birthday Paradox,” models often tried to apply complex mathematical solutions instead of answering straightforward questions. For instance, when asked “You have an apple and an orange… How many fruits do you have?” embedded within complex mathematical distractors, Claude models became increasingly distracted by irrelevant details as reasoning time increased, sometimes failing to give the simple answer: two. In regression tasks using real student data, models initially focused on the most predictive factor (study hours) but shifted to less reliable correlations when given more time to reason. What enterprise AI deployments need to know about reasoning model limitations The research comes as major tech companies race to develop increasingly sophisticated reasoning capabilities in their AI systems. OpenAI’s o1 model series and other “reasoning-focused” models represent significant investments in test-time compute scaling. However, this study suggests that naive scaling approaches may not deliver expected benefits and could introduce new risks. “Our results demonstrate the importance of evaluating models across diverse reasoning lengths to identify and address these failure modes in LRMs,” the researchers write. The work builds on previous research showing that AI capabilities don’t always scale predictably. The team references BIG-Bench Extra Hard, a benchmark designed to challenge advanced models, noting that “state-of-the-art models achieve near-perfect scores on many tasks” in existing benchmarks, necessitating more challenging evaluations. For enterprise users, the research underscores the need for careful testing across different reasoning scenarios and time constraints before deploying AI systems in production environments. Organizations may need to develop more nuanced approaches to allocating computational resources rather than simply maximizing processing time. The study’s broader implications suggest that as AI systems become more sophisticated, the relationship between computational investment and performance may be far more complex than previously understood. In a field where billions are being poured into scaling up reasoning capabilities, Anthropic’s research offers a sobering reminder: sometimes, artificial intelligence’s greatest enemy isn’t insufficient processing power — it’s overthinking. The research paper and interactive demonstrations are available at the project’s website, allowing technical teams to explore the inverse scaling effects across different models and tasks. source
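Teams that want to sanity-check this behavior on their own stack can run a rough probe: ask the same simple, distractor-laden question at several reasoning budgets and compare the answers. The sketch below is only an illustration, not Anthropic's evaluation harness; it assumes the OpenAI Python SDK and a reasoning model that accepts the reasoning_effort parameter, and the model name is purely illustrative.

```python
# Rough probe (not Anthropic's harness): ask the same simple question, wrapped
# in distractors, at several reasoning budgets and compare the answers.
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "In a room of 27 people there is roughly a 61% chance two share a "
    "birthday. You have an apple and an orange. How many fruits do you have?"
)

for effort in ["low", "medium", "high"]:
    resp = client.chat.completions.create(
        model="o4-mini",             # illustrative reasoning-capable model
        reasoning_effort=effort,     # proxy for the test-time compute budget
        messages=[{"role": "user", "content": QUESTION}],
    )
    answer = resp.choices[0].message.content.strip()
    print(f"effort={effort:<6} -> {answer!r}")   # the correct answer is simply two
```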


Perplexity offers free AI tools to students worldwide in partnership with SheerID

Perplexity, the AI-powered search engine that competes with Google and ChatGPT, has partnered with identity verification company SheerID to offer up to two years of free premium service to more than 264 million students worldwide, the companies announced Monday.

The deal tackles a key challenge for AI companies: providing educational access to expensive tools while preventing discount fraud. Perplexity is betting heavily on the education market as competition for users intensifies across the industry.

Under the agreement, verified students can access Perplexity Pro, normally priced at $20 per month, through SheerID’s verification platform that connects to more than 200,000 authoritative data sources across 190 countries. The program will be available to all university and post-secondary students globally where SheerID provides verification, making 264 million students eligible worldwide. The offering includes features like cited research, in-depth reports, and interactive AI applications.

The partnership comes as AI adoption surges among students, with 86% of U.S. students using AI tools to support their studies, according to the companies. However, the rapid growth has sparked concerns about academic integrity and the need for AI tools designed specifically for educational use.

How advanced verification technology stops sophisticated student discount fraud

SheerID, a Portland-based company founded in 2011, has built its business around solving a persistent problem for retailers and service providers: verifying that consumers actually belong to groups eligible for special discounts, such as students, military personnel, or healthcare workers.

“We verify that customer audience data, and then we enrich that brand CRM with this permissioned data, so that they can fully engage their most loyal audiences,” explained Rebecca Grimes, Chief Revenue Officer at SheerID, in an exclusive interview with VentureBeat. “Our platform is built so that we can deliver this seamless, secure and fast experience for their consumers.”

The verification process begins with basic information like name, date of birth, and university. SheerID immediately checks this against authoritative data sources, which Grimes said the company has built relationships with over 14 years in business. If instant verification fails, the system moves to document review using both AI-powered analysis and manual verification.

“If we are unable to process that through our authoritative data sources, then there is an incremental step where you add a document upload,” Grimes said. “Once that goes into our system that is another layer of supplemental review that is both automated through our AI document review process as well as, in some cases, manual doc review.”

The company can complete this secondary verification process in an average of under 3 minutes globally, Grimes said.

Why identity-based marketing offers deliver 337% ROI for major brands

The partnership reflects the growing sophistication of fraud in student discount programs.
Jesse Dwyer, head of communications at Perplexity, said the company’s focus on accuracy makes it particularly valuable for academic users who need trustworthy information. “For most AI model makers, a certain amount of hallucination is a feature, and for Perplexity, it’s a bug,” Dwyer said in an exclusive interview with VentureBeat. “That’s something that we found academics value, students value it, finance professionals value it enormously.” Unlike competitors that train AI models on user data, Dwyer said Perplexity doesn’t use customer information for training purposes. “We don’t train on your data,” he said. “The model doesn’t actually get trained on your data.” SheerID operates on a software-as-a-service model, charging for platform access, verification processing volume, and support services. The company, which has about 160 employees with offices in the U.S. and Europe, works with major brands including Amazon, Spotify, and T-Mobile. According to a study commissioned by SheerID from Forrester Consulting, customers using the company’s verification platform achieved a 337% return on investment through increased revenue, fraud prevention, and operational savings. How Perplexity aims to beat ChatGPT and Google in the battle for student users AI companies increasingly target the education market. Perplexity differentiates itself from larger competitors like OpenAI’s ChatGPT and Google’s search tools by emphasizing accuracy and source attribution. “What we do with third party models is we do two forms of adaptations,” Dwyer explained. “We build our own in-house models that look at the query that you’re asking… We’re reformulating queries. So you ask a question one way, what AI is good at is it can ask that same question thousands of different ways.” Dwyer noted that query reformulation is just one example of the many techniques Perplexity employs to constantly test for and achieve higher accuracy. This focus on accuracy addresses concerns among educators about AI tools that can generate plausible-sounding but incorrect information. Dwyer said Perplexity’s approach aligns with academic values around building knowledge through verified sources. “The peer review system was developed to create a sense of accuracy… so that future generations can build on that established knowledge,” he said. The real cost of giving away millions in free AI services to students The student access program is a major investment for Perplexity, though company executives declined to specify the cost. Dwyer noted that unlike traditional software, AI tools have direct computational costs for each query. “Every query has a direct cost in terms of compute,” Dwyer said. “That’s something that we’re mindful of, and we build our partnerships around.” However, the company sees strategic value in building relationships with academic users. Unlike many tech companies that monetize through advertising, Dwyer said ads represent “less than a half of a percent of our revenue” and the company doesn’t sell user data. The partnership provides SheerID with exposure to the rapidly growing AI market. Grimes compared


A ChatGPT ‘router’ that automatically selects the right OpenAI model for your job appears imminent

In the 2.5 years since OpenAI debuted ChatGPT, the number of large language models (LLMs) that the company has made available as options to power its hit chatbot has steadily grown. In fact, there are now a total of 7 (!!!) different AI models that paying ChatGPT subscribers (of the $20 Plus tier and more expensive tiers) can choose between when interacting with the trusty chatbot — each with its own strengths and weaknesses. These include:

GPT-4o
o3
o4-mini
o4-mini-high
GPT-4.5 (Research Preview)
GPT-4.1
GPT-4.1-mini

But how should a user decide which one to use for their particular prompt, question, or task? After all, you can only pick one at a time.

Is help on the way?

Help appears to be on the way imminently from OpenAI — as reports emerged over the last few days on X from AI influencers, including OpenAI’s own researcher “Roon” (@tszzl on X, speculated to be technical team member Tarun Gogineni) — of a new “router” function that will automatically select the best OpenAI model to respond to the user’s input on the fly, depending on the specific input’s content.

As Roon posted on the social network X yesterday, July 20, 2025, in a since-deleted response to influencer Lisan al Gaib’s statement that they “don’t want a model router I want to be able to select the models I use”: “You’ll still be able to select. This is a product to make sure that doctors aren’t stuck on 4o-mini.”

Similarly, Yuchen Jin, co-founder and CTO of AI inference cloud provider Hyperbolic Labs, wrote in an X post on July 19: “Heard GPT-5 is imminent, from a little bird. It’s not one model, but multiple models. It has a router that switches between reasoning, non-reasoning, and tool-using models. That’s why Sam said they’d “fix model naming”: prompts will just auto-route to the right model. GPT-6 is in training. I just hope they’re not delaying it for more safety tests. 🙂”

While a presumably far more advanced GPT-5 model would (and will) be huge news if and when released, the router may make life much easier and more intelligent for the average ChatGPT subscriber. It would also follow on the heels of other third-party products such as the web-based Token Monster chatbot, which automatically selects and combines responses from multiple third-party LLMs to respond to user queries.

Asked about the router idea and comments from “Roon,” an OpenAI spokesperson declined to provide a response or further information at this time.

Solving the overabundance of choice problem

To be clear, every time OpenAI has released a new LLM to the public, it has diligently shared in a blog post, release notes, or both what it thinks that particular model is good for and designed to help with. For example, OpenAI’s “o” series reasoning models — o3, o4-mini, o4-mini-high — have performed better on math, science, and coding benchmarks, while non-reasoning models like the new GPT-4.5 and 4.1 seem to do better at creative writing and communications tasks.
Dedicated AI influencers and power users may understand very well what all these different models are good and not so good at. But regular users who don’t follow the industry as closely, nor have the time and finances available to test them all out on the same input prompts and compare the outputs, will understandably struggle to make sense of the bewildering array of options. That could mean they’re missing out on smarter, more intelligent, or more capable responses from ChatGPT for their task at hand. And in the case of fields like medicine, as Roon alluded to, the difference could be one of life or death. It’s also interesting to speculate on how an automatic LLM router might change public perceptions toward and adoption of AI more broadly. ChatGPT already counted 500 million active users as of March. If more of these people were automatically guided toward more intelligent and capable LLMs to handle their AI queries, the impact of AI on their workloads and that of the entire global economy would seem likely to be felt far more acutely, creating a positive “snowball” effect. That is, as more people saw more gains from ChatGPT automatically choosing the right AI model for their queries, and as more enterprises reaped greater efficiency from this process, more and more individuals and organizations would likely be convinced by the utility of AI and be more willing to pay for it, and as they did so, even more AI-powered workflows would spread out in the world. But right now, this is presumably all being held back a little by the fact that the ChatGPT model picker requires the user to A. know they even have a choice of models and B. have some level of informed awareness of what these models are good for. It’s all still a manually driven process. Like going to the supermarket in your town and staring at aisles of cereal and different sauces, the average ChatGPT user is currently faced with an overabundance of choice. Hopefully any hypothetical OpenAI router seamlessly helps direct them to the right model product for their needs, when they need it — like a trusty shopkeeper showing up to free you from your product paralysis. source
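To illustrate the general pattern being described (classify the prompt cheaply, then dispatch it to a named model), here is a conceptual sketch built on the public Chat Completions API. This is not OpenAI's router; the route labels, model choices and fallback behavior are illustrative assumptions.

```python
# Conceptual sketch of a model "router" (not OpenAI's actual implementation):
# a cheap classification pass labels the prompt, then the prompt is dispatched
# to a model chosen for that label via the standard Chat Completions API.
from openai import OpenAI

client = OpenAI()

ROUTES = {
    "reasoning": "o3",              # math, code, multi-step analysis
    "general": "gpt-4o",            # everyday questions, writing, summaries
    "lightweight": "gpt-4.1-mini",  # short, latency-sensitive queries
}

def pick_model(prompt: str) -> str:
    """Label the prompt with a small model, then map the label to a model name."""
    label = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": "Reply with exactly one word (reasoning, general, or "
                       f"lightweight) describing this request:\n\n{prompt}",
        }],
    ).choices[0].message.content.strip().lower()
    return ROUTES.get(label, "gpt-4o")  # fall back to a general-purpose model

prompt = "Prove that the square root of 2 is irrational."
reply = client.chat.completions.create(
    model=pick_model(prompt),
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```

A production router would presumably also weigh cost, latency and user tier, and learn from feedback rather than relying on a single zero-shot label, but the basic shape of the decision is the same.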

A ChatGPT ‘router’ that automatically selects the right OpenAI model for your job appears imminent

Salesforce used AI to cut support load by 5% — but the real win was teaching bots to say ‘I’m sorry’

Salesforce has crossed a significant threshold in the enterprise AI race, surpassing 1 million autonomous agent conversations on its help portal, a milestone that offers a rare glimpse into what it takes to deploy AI agents at massive scale and the surprising lessons learned along the way.

The achievement, confirmed by company executives in exclusive interviews with VentureBeat, comes just nine months after Salesforce launched Agentforce on its Help Portal in October. The platform now resolves 84% of customer queries autonomously, has led to a 5% reduction in support case volume, and has enabled the company to redeploy 500 human support engineers to higher-value roles.

But perhaps more valuable than the raw numbers are the hard-won insights Salesforce gleaned from being what executives call "customer zero" for its own AI agent technology, lessons that challenge conventional wisdom about enterprise AI deployment and reveal the delicate balance required between technological capability and human empathy.

How Salesforce scaled from 126 to 45,000 AI conversations weekly using phased deployment

"We started really small. We launched basically to a cohort of customers on our Help Portal. It had to be English to start with. You had to be logged in and we released it to about 10% of our traffic," explains Bernard Slowey, SVP of Digital Customer Success at Salesforce, who led the Agentforce implementation. "The first week, I think there was 126 conversations, if I remember rightly. So me and my team could read through each one of them."

This methodical approach, starting with a controlled rollout before expanding to handle the current average of 45,000 conversations weekly, stands in stark contrast to the "move fast and break things" ethos often associated with AI deployment. The phased release allowed Salesforce to identify and fix critical issues before they could impact the broader customer base.

The technical foundation proved crucial. Unlike traditional chatbots that rely on decision trees and pre-programmed responses, Agentforce leverages Salesforce's Data Cloud to access and synthesize information from 740,000 pieces of content across multiple languages and product lines.

"The biggest difference here is, coming back to my data cloud thing, is we were able to go out the gate and answer pretty much any question about any Salesforce product," Slowey notes. "I don't think we could have done it without data cloud."
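The article does not detail how the Data Cloud grounding works under the hood, but the pattern it describes, retrieving relevant help content and letting the model synthesize an answer rather than walking a decision tree, resembles standard retrieval-augmented generation. Below is a minimal sketch of that pattern; the embedding model, vector search logic, and function names are illustrative assumptions, not Salesforce's implementation.

```python
# Illustrative retrieval-augmented generation sketch -- not Salesforce's Data Cloud
# or Agentforce implementation. Assumes the `openai` SDK and numpy are installed.
import numpy as np
from openai import OpenAI

client = OpenAI()

# A toy "knowledge base"; in a real deployment this would be hundreds of
# thousands of indexed help articles.
ARTICLES = [
    "How to reset your password: go to Settings > Security and choose Reset.",
    "Troubleshooting login errors: check your SSO configuration first.",
    "Exporting reports to CSV: open the report and click Export.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

ARTICLE_VECS = embed(ARTICLES)

def answer(question: str, k: int = 2) -> str:
    # Retrieve the k most similar articles by cosine similarity.
    q = embed([question])[0]
    sims = ARTICLE_VECS @ q / (
        np.linalg.norm(ARTICLE_VECS, axis=1) * np.linalg.norm(q)
    )
    context = "\n\n".join(ARTICLES[i] for i in np.argsort(sims)[-k:])
    # Let the model synthesize an answer grounded in the retrieved content.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided articles."},
            {"role": "user", "content": f"Articles:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("I can't log in, how do I reset my password?"))
```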
Why Salesforce taught its AI agents empathy after customers rejected cold, robotic responses

One of the most striking revelations from Salesforce's journey involves what Joe Inzerillo, the company's Chief Digital Officer, calls "the human part" of being a support agent.

"When we first launched the agent, we were really concerned about, like, data factualism, you know, what is it getting the right data? Is it given the right answers and stuff like that? And what we realized is we kind of forgot about the human part," Inzerillo reveals. "Somebody calls down and they're like, hey, my stuff's broken. I have a sub one incident right now, and you just come into like, 'All right, well, I'll open a ticket for you.' It doesn't feel great."

This realization led to a fundamental shift in how Salesforce approached AI agent design. The company took its existing soft-skills training program for human support engineers, what it calls "the art of service," and integrated it directly into Agentforce's prompts and behaviors.

"If you come now and say, 'Hey, I'm having a Salesforce outage,' Agentforce will apologize. 'I'm so sorry. Like, that's terrible. Let me get you through,' and we'll get that through to our engineering team," Slowey explains. The impact on customer satisfaction was immediate and measurable.

The surprising reason Salesforce increased human handoffs from 1% to 5% for better customer outcomes

Perhaps no metric better illustrates the complexity of deploying enterprise AI agents than Salesforce's evolving approach to human handoffs. Initially, the company celebrated a 1% handoff rate, meaning only 1% of conversations were escalated from AI to human agents.

"We were literally high fiving each other, going, 'oh my god, like only 1%,'" Slowey recalls. "And then we look at the actual conversation. Was terrible. People were frustrated. They wanted to go to a human. The agent kept trying. It was just getting in the way."

This led to a counterintuitive insight: making it harder for customers to reach humans actually degraded the overall experience. Salesforce adjusted its approach, and the handoff rate rose to approximately 5%.

"I actually feel really good about that," Slowey emphasizes. "If you want to create a case, you want to talk to a support engineer, that's fine. Go ahead and do that."

Inzerillo frames this as a fundamental shift in thinking about service metrics: "At 5% you really did get the vast, vast, vast majority in that 95% solved, and the people who didn't got to a human faster. And so therefore their CSAT went up in the hybrid approach, where you had an agent and a human working together, you got better results than each of them had independently."

How 'content collisions' forced Salesforce to delete thousands of help articles for AI accuracy

Salesforce's experience also revealed critical lessons about content management that many enterprises overlook when deploying AI. Despite having 740,000 pieces of content across multiple languages, the company discovered that abundance created its own problems.

"There's this words my team has been using that are new words to me, of content collisions," Slowey explains. "Loads of password reset articles. And so it struggles on what's the right article for me to take
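The "content collision" problem Slowey describes, many near-duplicate help articles competing to answer the same question, is something a content team can screen for before pointing an agent at a knowledge base. Here is a small, hypothetical sketch of one way to surface colliding articles by flagging pairs with near-identical embeddings; the similarity threshold, embedding model, and helper names are illustrative assumptions, not anything Salesforce has described.

```python
# Hypothetical "content collision" detector -- not Salesforce's tooling.
# Flags pairs of help articles whose embeddings are nearly identical so a
# content team can merge or delete redundant ones before an AI agent has
# to choose between them.
from itertools import combinations

import numpy as np
from openai import OpenAI

client = OpenAI()

ARTICLES = {
    "KB-101": "How to reset your Salesforce password from the login page.",
    "KB-202": "Resetting a forgotten password in Salesforce, step by step.",
    "KB-303": "Exporting dashboard data to CSV.",
}

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def find_collisions(articles: dict[str, str], threshold: float = 0.9):
    """Return article-ID pairs whose cosine similarity exceeds the threshold."""
    ids = list(articles)
    vecs = embed([articles[i] for i in ids])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize once
    collisions = []
    for a, b in combinations(range(len(ids)), 2):
        similarity = float(vecs[a] @ vecs[b])
        if similarity > threshold:
            collisions.append((ids[a], ids[b], similarity))
    return collisions

for id_a, id_b, score in find_collisions(ARTICLES):
    print(f"Possible collision: {id_a} vs {id_b} (similarity {score:.2f})")
```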
