VentureBeat

Cracking AI’s storage bottleneck and supercharging inference at the edge

As AI applications increasingly permeate enterprise operations, from enhancing patient care through advanced medical imaging to powering complex fraud detection models and even aiding wildlife conservation, a critical bottleneck often emerges: data storage. During VentureBeat’s Transform 2025, Greg Matson, head of products and marketing at Solidigm, and Roger Cummings, CEO of PEAK:AIO, spoke with Michael Stewart, managing partner at M12, about how innovations in storage technology enable enterprise AI use cases in healthcare.

The MONAI framework is a breakthrough in medical imaging, enabling researchers to build imaging AI faster, more safely, and more securely. Advances in storage technology are what enable researchers to build on top of this framework and to iterate and innovate quickly. PEAK:AIO partnered with Solidigm to integrate power-efficient, performant, high-capacity storage, which enabled MONAI to store more than two million full-body CT scans on a single node within its IT environment.

“As enterprise AI infrastructure evolves rapidly, storage hardware increasingly needs to be tailored to specific use cases, depending on where they are in the AI data pipeline,” Matson said. “The type of use case we talked about with MONAI, an edge-use case, as well as the feeding of a training cluster, are well served by very high-capacity solid-state storage solutions, but the actual inference and model training need something different. That’s a very high-performance, very high I/O-per-second requirement from the SSD. For us, RAG is bifurcating the types of products that we make and the types of integrations we have to make with the software.”

Improving AI inference at the edge

For peak performance at the edge, it’s critical to scale storage down to a single node, in order to bring inference closer to the data. And what’s key is removing memory bottlenecks. That can be done by making memory a part of the AI infrastructure, in order to scale it along with data and metadata. The proximity of data to compute dramatically improves time to insight.

“You see all the huge deployments, the big green field data centers for AI, using very specific hardware designs to be able to bring the data as close as possible to the GPUs,” Matson said. “They’ve been building out their data centers with very high-capacity solid-state storage, to bring petabyte-level storage, very accessible at very high speeds, to the GPUs. Now, that same technology is happening in a microcosm at the edge and in the enterprise.”

For purchasers of AI systems, it’s becoming critical to ensure they’re getting the most performance out of the system by running it on all solid-state storage. That allows them to bring in huge amounts of data and enables incredible processing power in a small system at the edge.

The future of AI hardware

“It’s imperative that we provide solutions that are open, scalable, and at memory speed, using some of the latest and greatest technology out there to do that,” Cummings said. “That’s our goal as a company, to provide that openness, that speed, and the scale that organizations need. I think you’re going to see the economies match that as well.”

For the overall training and inference data pipeline, and within inference itself, hardware needs will keep increasing, whether it’s a very high-speed SSD or a very high-capacity solution that’s power efficient.
“I would say it’s going to move even further toward very high-capacity, whether it’s a one-petabyte SSD out a couple of years from now that runs at very low power and that can basically replace four times as many hard drives, or a very high-performance product that’s almost near memory speeds,” Matson said. “You’ll see that the big GPU vendors are looking at how to define the next storage architecture, so that it can help augment, very closely, the HBM in the system. What was a general-purpose SSD in cloud computing is now bifurcating into capacity and performance. We’ll keep doing that further out in both directions over the next five or 10 years.”

Cracking AI’s storage bottleneck and supercharging inference at the edge Read More »

Confidence in agentic AI: Why eval infrastructure must come first

As AI agents enter real-world deployment, organizations are under pressure to define where they belong, how to build them effectively, and how to operationalize them at scale. At VentureBeat’s Transform 2025, tech leaders gathered to talk about how they’re transforming their businesses with agents: Joanne Chen, general partner at Foundation Capital; Shailesh Nalawadi, VP of project management at Sendbird; Thys Waanders, SVP of AI transformation at Cognigy; and Shawn Malhotra, CTO of Rocket Companies.

A few top agentic AI use cases

“The initial attraction of any of these deployments for AI agents tends to be around saving human capital — the math is pretty straightforward,” Nalawadi said. “However, that undersells the transformational capability you get with AI agents.”

At Rocket, AI agents have proven to be powerful tools for increasing website conversion. “We’ve found that with our agent-based experience, the conversational experience on the website, clients are three times more likely to convert when they come through that channel,” Malhotra said.

But that’s just scratching the surface. For instance, a Rocket engineer built an agent in just two days to automate a highly specialized task: calculating transfer taxes during mortgage underwriting. “That two days of effort saved us a million dollars a year in expense,” Malhotra said. “In 2024, we saved more than a million team member hours, mostly off the back of our AI solutions. That’s not just saving expense. It’s also allowing our team members to focus their time on people making what is often the largest financial transaction of their life.”

Agents are essentially supercharging individual team members. That million hours saved isn’t the entirety of someone’s job replicated many times; it’s the fractions of the job that employees don’t enjoy doing, or that weren’t adding value to the client. And that million hours saved gives Rocket the capacity to handle more business. “Some of our team members were able to handle 50% more clients last year than they were the year before,” Malhotra added. “It means we can have higher throughput, drive more business, and again, we see higher conversion rates because they’re spending the time understanding the client’s needs versus doing a lot of the more rote work that the AI can do now.”

Tackling agent complexity

“Part of the journey for our engineering teams is moving from the mindset of software engineering – write once and test it and it runs and gives the same answer 1,000 times – to the more probabilistic approach, where you ask the same thing of an LLM and it gives different answers through some probability,” Nalawadi said. “A lot of it has been bringing people along. Not just software engineers, but product managers and UX designers.”

What’s helped is that LLMs have come a long way, Waanders said. If they built something 18 months or two years ago, they really had to pick the right model, or the agent would not perform as expected. Now, he says, we’re at a stage where most of the mainstream models behave very well. They’re more predictable. But today the challenge is combining models, ensuring responsiveness, orchestrating the right models in the right sequence and weaving in the right data.

“We have customers that push tens of millions of conversations per year,” Waanders said. “If you automate, say, 30 million conversations in a year, how does that scale in the LLM world? That’s all stuff that we had to discover, simple stuff, from even getting the model availability with the cloud providers. Having enough quota with a ChatGPT model, for example. Those are all learnings that we had to go through, and our customers as well. It’s a brand-new world.”

A layer above orchestrating the LLM is orchestrating a network of agents, Malhotra said. A conversational experience has a network of agents under the hood, and the orchestrator decides which of the available agents to farm the request out to. “If you play that forward and think about having hundreds or thousands of agents who are capable of different things, you get some really interesting technical problems,” he said. “It’s becoming a bigger problem, because latency and time matter. That agent routing is going to be a very interesting problem to solve over the coming years.”

Tapping into vendor relationships

Up to this point, the first step for most companies launching agentic AI has been building in-house, because specialized tools didn’t yet exist. But generic LLM or AI infrastructure is not where companies differentiate and create value, and specialized expertise is needed to go beyond the initial build: to debug, iterate, and improve on what’s been built, and to maintain the infrastructure.

“Often we find the most successful conversations we have with prospective customers tend to be someone who’s already built something in-house,” Nalawadi said. “They quickly realize that getting to a 1.0 is okay, but as the world evolves and as the infrastructure evolves and as they need to swap out technology for something new, they don’t have the ability to orchestrate all these things.”

Preparing for agentic AI complexity

Theoretically, agentic AI will only grow in complexity — the number of agents in an organization will rise, they’ll start learning from each other, and the number of use cases will explode. How can organizations prepare for the challenge?

“It means that the checks and balances in your system will get stressed more,” Malhotra said. “For something that has a regulatory process, you have a human in the loop to make sure that someone is signing off on this. For critical internal processes or data access, do you have observability? Do you have the right alerting and monitoring so that if something goes wrong, you know it’s going wrong? It’s doubling down on your detection, understanding where you need a human in the loop, and then trusting that those processes are going to catch if something does go wrong. But because of the power it unlocks, you have to do it.” So how can you have confidence
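The orchestration pattern Malhotra describes, in which a router decides which specialized agent should handle each incoming request, can be sketched in a few lines. This is a minimal illustration rather than any vendor's implementation; the agent names and the keyword-overlap routing heuristic are hypothetical stand-ins for what would normally be an LLM or trained classifier.

```python
# Minimal sketch of an orchestrator routing requests across a network of agents.
# Hypothetical agent names and scoring heuristic; a real system would use an LLM
# or trained classifier for routing and would track latency per agent.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    skills: set[str]              # what this agent is capable of
    handle: Callable[[str], str]  # the agent's entry point

def route(request: str, agents: list[Agent]) -> Agent:
    """Pick the agent whose skills best overlap the request's keywords."""
    words = set(request.lower().split())
    return max(agents, key=lambda a: len(a.skills & words))

agents = [
    Agent("transfer_tax", {"transfer", "tax", "underwriting"}, lambda r: "tax calculated"),
    Agent("conversion_chat", {"rate", "loan", "preapproval"}, lambda r: "chat response"),
]

chosen = route("what transfer tax applies to this underwriting file?", agents)
print(chosen.name, "->", chosen.handle("..."))  # transfer_tax -> tax calculated
```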

Confidence in agentic AI: Why eval infrastructure must come first Read More »

New 1.5B router model achieves 93% accuracy without costly retraining

Researchers at Katanemo Labs have introduced Arch-Router, a new routing model and framework designed to intelligently map user queries to the most suitable large language model (LLM). For enterprises building products that rely on multiple LLMs, Arch-Router aims to solve a key challenge: how to direct queries to the best model for the job without relying on rigid logic or costly retraining every time something changes.

The challenges of LLM routing

As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use the unique strengths of each model for specific tasks (e.g., code generation, text summarization, or image editing). LLM routing has emerged as a key technique for building and deploying these systems, acting as a traffic controller that directs each user query to the most appropriate model.

Existing routing methods generally fall into two categories: “task-based routing,” where queries are routed based on predefined tasks, and “performance-based routing,” which seeks an optimal balance between cost and performance. However, task-based routing struggles with unclear or shifting user intentions, particularly in multi-turn conversations. Performance-based routing, on the other hand, rigidly prioritizes benchmark scores, often neglecting real-world user preferences and adapting poorly to new models unless it undergoes costly fine-tuning.

More fundamentally, as the Katanemo Labs researchers note in their paper, “existing routing approaches have limitations in real-world use. They typically optimize for benchmark performance while neglecting human preferences driven by subjective evaluation criteria.” The researchers highlight the need for routing systems that “align with subjective human preferences, offer more transparency, and remain easily adaptable as models and use cases evolve.”

A new framework for preference-aligned routing

To address these limitations, the researchers propose a “preference-aligned routing” framework that matches queries to routing policies based on user-defined preferences. In this framework, users define their routing policies in natural language using a “Domain-Action Taxonomy.” This is a two-level hierarchy that reflects how people naturally describe tasks, starting with a general topic (the Domain, such as “legal” or “finance”) and narrowing to a specific task (the Action, such as “summarization” or “code generation”). Each of these policies is then linked to a preferred model, allowing developers to make routing decisions based on real-world needs rather than just benchmark scores. As the paper states, “This taxonomy serves as a mental model to help users define clear and structured routing policies.”

The routing process happens in two stages. First, a preference-aligned router model takes the user query and the full set of policies and selects the most appropriate policy. Second, a mapping function connects that selected policy to its designated LLM. Because the model selection logic is separated from the policy, models can be added, removed, or swapped simply by editing the routing policies, without any need to retrain or modify the router itself. This decoupling provides the flexibility required for practical deployments, where models and use cases are constantly evolving.
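To make the two-stage design concrete, here is a minimal sketch of preference-aligned routing under the assumptions above: policies are written as natural-language descriptions, a router model (stubbed out here) returns a policy identifier, and a plain dictionary maps policies to models. The policy names, model names and the select_policy stub are illustrative, not Katanemo's actual API.

```python
# Sketch of preference-aligned routing: policy selection is decoupled from model choice.
# The router only ever sees the policies; swapping a model means editing POLICY_TO_MODEL.

POLICIES = {
    "finance_summarization": "Summarize financial documents such as earnings reports.",
    "legal_qa": "Answer questions about contracts and legal documents.",
    "code_generation": "Write or modify source code from a natural-language request.",
}

POLICY_TO_MODEL = {
    "finance_summarization": "claude-3-7-sonnet",
    "legal_qa": "gpt-4o",
    "code_generation": "gemini-2.5-pro",
}

def select_policy(query: str, policies: dict[str, str]) -> str:
    """Stage 1: in a real system this prompt goes to the router model (e.g. Arch-Router),
    which generates the identifier of the best-matching policy."""
    prompt = "Policies:\n" + "\n".join(f"- {k}: {v}" for k, v in policies.items())
    prompt += f"\nQuery: {query}\nBest policy id:"  # shown for illustration only
    # Placeholder heuristic standing in for the router model's generation:
    return max(policies, key=lambda k: len(set(policies[k].lower().split()) & set(query.lower().split())))

def route(query: str) -> str:
    policy = select_policy(query, POLICIES)   # stage 1: query -> policy
    return POLICY_TO_MODEL[policy]            # stage 2: policy -> model

print(route("Summarize this quarterly earnings report"))  # e.g. claude-3-7-sonnet
```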
Figure: Preference-aligned routing framework (Source: arXiv)

The policy selection is powered by Arch-Router, a compact 1.5B-parameter language model fine-tuned for preference-aligned routing. Arch-Router receives the user query and the complete set of policy descriptions within its prompt. It then generates the identifier of the best-matching policy. Since the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning and without retraining. This generative approach allows Arch-Router to use its pre-trained knowledge to understand the semantics of both the query and the policies, and to process the entire conversation history at once.

A common concern with including extensive policies in a prompt is the potential for increased latency. However, the researchers designed Arch-Router to be highly efficient. “While the length of routing policies can get long, we can easily increase the context window of Arch-Router with minimal impact on latency,” explains Salman Paracha, co-author of the paper and founder/CEO of Katanemo Labs. He notes that latency is primarily driven by the length of the output, and for Arch-Router, the output is simply the short name of a routing policy, like “image_editing” or “document_creation.”

Arch-Router in action

To build Arch-Router, the researchers fine-tuned a 1.5B-parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples. They then tested its performance against state-of-the-art proprietary models from OpenAI, Anthropic and Google on four public datasets designed to evaluate conversational AI systems. The results show that Arch-Router achieves the highest overall routing score of 93.17%, surpassing all other models, including top proprietary ones, by an average of 7.71%. The model’s advantage grew with longer conversations, demonstrating its strong ability to track context over multiple turns.

Figure: Arch-Router vs. other models (Source: arXiv)

In practice, this approach is already being applied in several scenarios, according to Paracha. For example, in open-source coding tools, developers use Arch-Router to direct different stages of their workflow, such as “code design,” “code understanding,” and “code generation,” to the LLMs best suited for each task. Similarly, enterprises can route document creation requests to a model like Claude 3.7 Sonnet while sending image editing tasks to Gemini 2.5 Pro. The system is also ideal “for personal assistants in various domains, where users have a diversity of tasks from text summarization to factoid queries,” Paracha said, adding that “in those cases, Arch-Router can help developers unify and improve the overall user experience.”

This framework is integrated with Arch, Katanemo Labs’ AI-native proxy server for agents, which allows developers to implement sophisticated traffic-shaping rules. For instance, when integrating a new LLM, a team can send a small portion of traffic for a specific routing policy to the new model, verify its performance with internal metrics, and then fully transition traffic with confidence. The company is also working to integrate its tools with evaluation platforms to further streamline this process for enterprise developers. Ultimately, the goal is to move beyond siloed AI implementations.
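The traffic-shaping pattern described above, where a new model is canaried on a slice of one routing policy's traffic before a full cutover, could look roughly like the sketch below. The percentage split, policy name and model names are hypothetical; this is not Arch's actual configuration syntax.

```python
# Sketch of a canary rollout for one routing policy: a small share of traffic
# goes to the candidate model, the rest to the incumbent, while metrics are compared.
import random

CANARY = {
    "policy": "document_creation",
    "incumbent": "claude-3-7-sonnet",
    "candidate": "new-model-x",      # hypothetical model being evaluated
    "candidate_share": 0.05,         # 5% of this policy's traffic
}

def default_model(policy: str) -> str:
    return {"image_editing": "gemini-2.5-pro"}.get(policy, "gpt-4o")

def pick_model(policy: str) -> str:
    if policy == CANARY["policy"] and random.random() < CANARY["candidate_share"]:
        return CANARY["candidate"]
    return CANARY["incumbent"] if policy == CANARY["policy"] else default_model(policy)

# Once internal metrics (quality, latency, cost) look good, raise candidate_share to 1.0.
print(pick_model("document_creation"))
```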

New 1.5B router model achieves 93% accuracy without costly retraining Read More »

Capital One builds agentic AI to supercharge auto sales

Inspiration can come from different places, even for architecting and designing agentic systems. At VB Transform, Capital One explained how it built its agentic platform for its auto business. Milind Naphade, SVP of Technology and Head of AI Foundations at Capital One, said during VB Transform that the company wanted its agents to function similarly to human agents, in that they problem-solve alongside customers.

Naphade said Capital One began designing its agentic offerings 15 months ago, “before agentic became a buzzword.” For Capital One, it was crucial that, in building its agent systems, they learn from how their human agents ask customers for information to identify their problems. Capital One also looked to another source of organizational structure for its agents: itself. “We took inspiration from how Capital One itself functions,” Naphade said. “Within Capital One, as I’m sure within other financial services, you have to manage risk, and then there are other entities that you also observe, evaluate, question and audit.”

This same structure applies to agents that Capital One wants to monitor. The company created an agent, trained on Capital One’s policies and regulations, that evaluates existing agents. This evaluator agent can kick back the process if it detects a problem. Naphade said to think of it as “a team of experts where each of them has a different expertise and comes together to solve a problem.”

Financial services organizations recognize the potential of agents to provide their human agents with information to resolve customer issues, manage customer service, and attract more people to their products. Other banks, like BNY, have deployed agents this year.

Auto dealership agents

Capital One deployed agents to its auto business to assist the bank’s dealership clients in helping their customers find the right car and car loan. Consumers can look at dealerships’ vehicle inventories and find cars that are ready for test drives. Naphade said their dealership customers reported a 55% improvement in metrics such as engagement and serious sales leads. “They’re able to generate much better serious leads through this more conversational, natural conversation,” he said. “They can have 24/7 agents working, and if the car breaks down at midnight, the chat is there for you.”

Naphade said Capital One would love to bring this type of agent to its travel business, especially for its customer-facing engagements. Capital One, which opened a new lounge in New York’s JFK Airport, offers a very popular credit card for travel points. However, Naphade pointed out that the bank needs to conduct extensive internal testing.

Data and models for bank agents

Like many enterprises, Capital One has a lot of data for its AI systems, but it has to figure out the best way to bring that context to its agents. It also has to experiment with the best model architecture for its agents. Naphade and Capital One’s team of applied researchers, engineers and data scientists used methods like model distillation to arrive at more efficient architectures. “The understanding agent is the bulk of our cost because that’s the one that has to disambiguate,” he said. “It’s a bigger model, so we try to distill it down and get a lot of bang for our buck. Then there’s also multi-token prediction and aggregated pre-fill, a lot of interesting ways we can optimize this.”

In terms of data, Naphade said his team went through several “iterations of experimentation, testing, evaluation, human in the loop and all the right guardrails” before releasing its AI applications. “But one of the biggest challenges we faced was that we didn’t have any precedents. We couldn’t go and say, oh, somebody else did it this way, and ask how it worked out for them,” Naphade said.
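The evaluator-agent pattern Naphade describes, a policy-trained agent that reviews other agents' outputs and can kick a workflow back, can be sketched as a simple gate. The policy check below is a stand-in heuristic rather than Capital One's system; the agent names and rules are hypothetical.

```python
# Sketch of an evaluator agent gating the output of worker agents.
# A real evaluator would be a model trained on the bank's policies and regulations;
# here a keyword rule stands in for that check.

from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    output: str

BANNED_PHRASES = ["guaranteed approval", "no credit check"]  # hypothetical policy rules

def evaluator(result: AgentResult) -> tuple[bool, str]:
    """Return (approved, reason). If not approved, the workflow is kicked back."""
    for phrase in BANNED_PHRASES:
        if phrase in result.output.lower():
            return False, f"policy violation: '{phrase}'"
    return True, "ok"

def run_workflow(results: list[AgentResult]) -> None:
    for r in results:
        approved, reason = evaluator(r)
        if not approved:
            print(f"kick back to {r.agent}: {reason}")  # re-run or escalate to a human
        else:
            print(f"{r.agent}: approved")

run_workflow([AgentResult("loan_chat", "This loan has guaranteed approval!"),
              AgentResult("inventory_search", "3 cars match your budget.")])
```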

Capital One builds agentic AI to supercharge auto sales Read More »

Transform 2025: Why observability is critical for AI agent ecosystems

The autonomous software revolution is coming. At Transform 2025, Ashan Willy, CEO of New Relic, and Sam Witteveen, CEO and co-founder of Red Dragon AI, talked about how they’re instrumenting agentic systems for measurable ROI and charting the infrastructure roadmap to maximize agentic AI.

New Relic provides observability to customers by capturing and correlating application, log, and infrastructure telemetry in real time. Observability goes beyond monitoring — it’s about equipping teams with the context and insight needed to understand, troubleshoot, and optimize complex systems, even in the face of unexpected issues. That has become a considerably more complex undertaking now that generative and agentic AI are in the mix. Observability for the company now includes monitoring everything from Nvidia NIM to DeepSeek to ChatGPT, and use of its AI monitoring is up roughly 30% quarter over quarter, reflecting the acceleration of adoption.

“The other thing we see is a huge diversity in models,” Willy said. “Enterprises started with GPT, but are starting to use a whole bunch of models. We’ve seen about a 92% increase in variance of models that are being used. And we’re starting to see enterprises adopt more models. The question is, how do you measure the effectiveness?”

Observability in an agentic world

In other words, how is observability evolving? That’s a big question. The use cases vary wildly across industries, and the functionality is fundamentally different for each individual company, depending on size and goals. A financial firm might be focused on maximizing EBITDA margins, while a product-focused company measures speed to market alongside quality control.

When New Relic was founded in 2008, the center of gravity for observability was application monitoring for SaaS, mobile and, eventually, cloud infrastructure. The rise of AI and agentic AI is bringing observability back to applications, as agents, micro-agents, and nano-agents run and produce AI-written code.

AI for observability

As the number of services and microservices rises, especially for digitally native organizations, the cognitive load for any human handling observability tasks becomes overwhelming. Of course, AI can help with that, Willy says. “The way it’s going to work is you’re going to have enough information where you’ll work in cooperative mode,” he explained. “The promise of agents in observability is to take some of those automatic workloads and make them happen. That will democratize it to more people.”

Single-platform agentic observability

A single observability platform is well placed to take advantage of the agentic world. Agents automate workflows, and they form deep integrations across the entire ecosystem, spanning the multiple tools an organization has in play, like Harness, GitHub, ServiceNow, and so on. With agentic AI, developers can be alerted to code errors anywhere in the ecosystem and fix them immediately, without leaving their coding platform. In other words, if there’s an issue with code deployed in GitHub, an observability platform powered by agents can detect it, determine how to solve it, and then alert the engineer — or automate the process entirely.

“Our agent is fundamentally looking at every piece of information we have on our platform,” Willy said. “That could be anything from how the application’s performing, how the underlying Azure or AWS structure is performing — anything we think is relevant to that code deployment. We call it agentic skills. We don’t rely on a third party to know APIs and so on.”

In GitHub, for example, the agents let a developer know when code is running fine, where errors are being handled, or even when a software rollback is necessary, and then automate that rollback, with developer approval. The next step, which New Relic announced last month, is working with the Copilot coding agent to tell the developer exactly which lines of code the issue is in. Copilot then goes back, corrects the issue, and gets a version ready to deploy again.

The future of agentic AI

As organizations adopt agentic AI and start to adapt to it, they’re going to find that observability is a critical part of its functionality, Willy says. “As you start to build all these agentic integrations and pieces, you’re going to want to know what the agent does,” he says. “This is sort of reasoning for the infrastructure. Reasoning to find out what’s going on in your production. That’s what observability will bring, and we’re on the forefront of that.”
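A hedged sketch of the detect-and-act loop described above: an agent watches deployment telemetry, and when the error rate crosses a threshold it either alerts the engineer or, with approval, triggers a rollback. The threshold, event shape and rollback behavior are illustrative assumptions, not New Relic's API.

```python
# Sketch of an observability agent: detect an error spike on a deployment,
# then either alert or (with approval) roll back. All names here are illustrative.

ERROR_RATE_THRESHOLD = 0.05   # 5% of requests failing

def error_rate(events: list[dict]) -> float:
    failures = sum(1 for e in events if e["status"] >= 500)
    return failures / max(len(events), 1)

def handle_deployment(deploy_id: str, events: list[dict], auto_rollback: bool) -> str:
    rate = error_rate(events)
    if rate < ERROR_RATE_THRESHOLD:
        return f"{deploy_id}: healthy ({rate:.1%} errors)"
    if auto_rollback:
        # in a real system this would call the CI/CD or source-control integration
        return f"{deploy_id}: rolling back ({rate:.1%} errors)"
    return f"{deploy_id}: alert engineer, rollback recommended ({rate:.1%} errors)"

events = [{"status": 200}] * 90 + [{"status": 503}] * 10
print(handle_deployment("deploy-42", events, auto_rollback=False))
```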

Transform 2025: Why observability is critical for AI agent ecosystems Read More »

From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways

Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: Build a model that could look at a photo of a laptop and identify any physical damage — things like cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated. Along the way, we ran into issues with hallucinations, unreliable outputs and images that were not even laptops. To solve these, we ended up applying an agentic framework in an atypical way — not for task automation, but to improve the model’s performance. In this post, we will walk through what we tried, what didn’t work and how a combination of approaches eventually helped us build something reliable.

Where we started: Monolithic prompting

Our initial approach was fairly standard for a multimodal model. We used a single, large prompt to pass an image into an image-capable LLM and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along. We ran into three major issues early on:

- Hallucinations: The model would sometimes invent damage that did not exist or mislabel what it was seeing.
- Junk image detection: It had no reliable way to flag images that were not even laptops; pictures of desks, walls or people occasionally slipped through and received nonsensical damage reports.
- Inconsistent accuracy: The combination of these problems made the model too unreliable for operational use.

This was the point when it became clear we would need to iterate.

First fix: Mixing image resolutions

One thing we noticed was how much image quality affected the model’s output. Users uploaded all kinds of images, ranging from sharp and high-resolution to blurry. This led us to prior research highlighting how image resolution impacts deep learning models. We trained and tested the model using a mix of high- and low-resolution images. The idea was to make the model more resilient to the wide range of image qualities it would encounter in practice. This helped improve consistency, but the core issues of hallucination and junk image handling persisted.

The multimodal detour: Text-only LLM goes multimodal

Encouraged by recent experiments in combining image captioning with text-only LLMs — like the technique covered in The Batch, where captions are generated from images and then interpreted by a language model — we decided to give it a try. Here’s how it works:

1. The LLM begins by generating multiple possible captions for an image.
2. Another model, called a multimodal embedding model, checks how well each caption fits the image. In this case, we used SigLIP to score the similarity between the image and the text.
3. The system keeps the top few captions based on these scores.
4. The LLM uses those top captions to write new ones, trying to get closer to what the image actually shows.
5. It repeats this process until the captions stop improving, or it hits a set limit.

While clever in theory, this approach introduced new problems for our use case:

- Persistent hallucinations: The captions themselves sometimes included imaginary damage, which the LLM then confidently reported.
- Incomplete coverage: Even with multiple captions, some issues were missed entirely.
- Increased complexity, little benefit: The added steps made the system more complicated without reliably outperforming the previous setup.

It was an interesting experiment, but ultimately not a solution.

A creative use of agentic frameworks

This was the turning point. While agentic frameworks are usually used for orchestrating task flows (think agents coordinating calendar invites or customer service actions), we wondered if breaking down the image interpretation task into smaller, specialized agents might help. We built an agentic framework structured like this (a minimal code sketch appears below):

- Orchestrator agent: It checked the image and identified which laptop components were visible (screen, keyboard, chassis, ports).
- Component agents: Dedicated agents inspected each component for specific damage types; for example, one for cracked screens, another for missing keys.
- Junk detection agent: A separate agent flagged whether the image was even a laptop in the first place.

This modular, task-driven approach produced much more precise and explainable results. Hallucinations dropped dramatically, junk images were reliably flagged and each agent’s task was simple and focused enough to control quality well.

The blind spots: Trade-offs of an agentic approach

As effective as this was, it was not perfect. Two main limitations showed up:

- Increased latency: Running multiple sequential agents added to the total inference time.
- Coverage gaps: Agents could only detect issues they were explicitly programmed to look for. If an image showed something unexpected that no agent was tasked with identifying, it would go unnoticed.

We needed a way to balance precision with coverage.

The hybrid solution: Combining agentic and monolithic approaches

To bridge the gaps, we created a hybrid system:

- The agentic framework ran first, handling precise detection of known damage types and junk images. We limited the number of agents to the most essential ones to improve latency.
- Then, a monolithic image LLM prompt scanned the image for anything else the agents might have missed.
- Finally, we fine-tuned the model using a curated set of images for high-priority use cases, like frequently reported damage scenarios, to further improve accuracy and reliability.

This combination gave us the precision and explainability of the agentic setup, the broad coverage of monolithic prompting and the confidence boost of targeted fine-tuning.

What we learned

A few things became clear by the time we wrapped up this project:

- Agentic frameworks are more versatile than they get credit for: While they are usually associated with workflow management, we found they could meaningfully boost model performance when applied in a structured, modular way.
- Blending different approaches beats relying on just one: The combination of precise, agent-based detection alongside the broad coverage
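Here is the minimal sketch referenced above of the orchestrator, component-agent and junk-detector structure, followed by the monolithic catch-all pass from the hybrid setup. The model calls are stubbed with placeholder functions; prompts, component names and return values are assumptions for illustration, not the authors' actual implementation.

```python
# Sketch of the hybrid damage-detection pipeline: junk check, orchestrator,
# per-component agents, then a monolithic catch-all prompt for anything missed.
# All model calls are stubs; replace them with real vision-LLM calls.

def is_laptop(image) -> bool:                 # junk detection agent (stub)
    return True

def visible_components(image) -> list[str]:   # orchestrator agent (stub)
    return ["screen", "keyboard", "chassis"]

COMPONENT_AGENTS = {                          # component agents (stubs)
    "screen":   lambda img: ["cracked screen"],
    "keyboard": lambda img: [],
    "chassis":  lambda img: ["dent on lid"],
    "ports":    lambda img: [],
}

def monolithic_scan(image, already_found: list[str]) -> list[str]:
    # single broad prompt to an image LLM for damage the agents were not built to find
    return []

def assess(image) -> dict:
    if not is_laptop(image):
        return {"junk": True, "damage": []}
    damage = []
    for component in visible_components(image):
        damage.extend(COMPONENT_AGENTS[component](image))
    damage.extend(monolithic_scan(image, damage))
    return {"junk": False, "damage": damage}

print(assess(object()))  # {'junk': False, 'damage': ['cracked screen', 'dent on lid']}
```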

From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways Read More »

How CISOs became the gatekeepers of $309B AI infrastructure spending

Enterprise AI infrastructure spending is expected to reach $309 billion by 2032. The winners won’t be determined by who has the best models; it’ll come down to who controls the infrastructure layer that makes AI operational at scale. Security vendors are making the most aggressive moves. Palo Alto Networks, CrowdStrike and Cisco each report AI-driven security revenue growing 70 to 80% year-over-year while traditional infrastructure sales decline. The pattern is clear: Security is becoming the control plane for enterprise AI.

“The complexity of AI workloads is straining existing infrastructure to its breaking point,” Ali Ghodsi, CEO of Databricks, notes in a blog post. “Enterprises need fundamentally new approaches to manage AI at scale.” The evidence is mounting. According to IDC, 73% of enterprises cite infrastructure inadequacy as their primary barrier to AI adoption. Meanwhile, adversaries are weaponizing AI faster than enterprises can deploy defenses. The infrastructure wars have begun.

AgenticOps emerges as the new battleground

AgenticOps isn’t one vendor’s vision. It’s an industry-wide recognition that traditional IT operations can’t manage AI agents operating at machine speed with human permissions. Cisco kicked off the category at Cisco Live 2025, but Microsoft’s AI Orchestration, Google’s Model Operations and startups like Weights & Biases are already racing to own it. The battle lines are drawn.

The technical requirements are brutal. Enterprises deploying 50,000 AI agents need infrastructure that handles cross-domain data access, real-time governance and multi-team collaboration. Traditional tools break at 5,000 agents. The math doesn’t work.

“For the very first time, security is becoming an accelerant to adoption, rather than an impediment,” Jeetu Patel, Cisco’s president and CPO, told VentureBeat in a recent interview. The shift is fundamental: Security teams now enable AI deployment rather than blocking it.

Three pillars define enterprise-grade AgenticOps: unified data access across all domains, collaborative environments where NetOps and SecOps teams work together, and purpose-built models that govern agent actions. Forrester research confirms multi-domain visibility as critical. Vendors who master all three components will be the ones to dominate. But most struggle to deliver even one effectively.

Figure: Cisco president and CPO Jeetu Patel launched AgenticOps at Cisco Live 2025, signaling a decisive move toward cross-domain, multiplayer AI operations built on a purpose-built model engineered to handle the complexity and scale of enterprise infrastructure in the agentic AI era. (Source: LinkedIn)

The death of perimeter security

Traditional firewalls can’t protect AI workloads. The evidence is overwhelming. Palo Alto’s Prisma Cloud processes 2 billion security events daily at runtime. Fortinet’s Security Fabric connects more than 500 integration points because perimeter defense has failed. Check Point’s Infinity operates on zero-trust principles, assuming a breach at every layer.

Extended Berkeley Packet Filter (eBPF) changed the game. This Linux kernel technology enables security enforcement without the 40% performance hit of traditional approaches. Cisco’s $2.8 billion Isovalent acquisition validated the approach. Cilium, Isovalent’s open-source project, now secures production workloads at Netflix, Adobe and Capital One. The 15,000 GitHub stars reflect enterprise adoption, not just developer interest.

Craig Connors, Cisco’s VP and CTO of security, framed the shift in a recent VentureBeat interview: “Security policy now applies across every layer, from workload to silicon.” The implication is clear: Security becomes an integral part of infrastructure, not an overlay.

Hardware acceleration seals the transformation. Silicon-embedded security operates at nanosecond latency. The math is brutal: Software-defined security adds 50 to 200 milliseconds; hardware security adds 50 to 200 nanoseconds. That’s a million-fold improvement. Vendors without silicon capabilities can’t compete.

The 72-hour exploit window

Adversaries weaponize vulnerabilities in 72 hours. Enterprises patch in 45 days. This gap generates 84% of successful breaches. Every security vendor is racing to close it. CrowdStrike’s Falcon Prevent blocks exploits before patches exist. Qualys VMDR delivers real-time vulnerability management. Tanium Patch promises sub-hour automated response. Cisco’s Live Protect applies kernel-level shields within minutes.

The economics are undeniable. Ponemon Institute research shows that each hour of delayed patching costs $84,000 in breach risk. Automated platforms deliver a return on investment (ROI) in 4.7 months. CISOs can’t ignore the math. “Time is everything in cybersecurity,” emphasizes Shlomo Kramer, CEO of Cato Networks. “Automation isn’t just about efficiency; it’s about surviving attacks that human teams can’t respond to quickly enough.”

The observability wars intensify

The $28 billion Splunk acquisition signals a larger truth: Observability determines who wins the AI infrastructure battle. Datadog processes 18 trillion events daily. New Relic monitors 10 billion transactions per minute. Dynatrace tracks 2.5 million cloud applications. The stakes are existential. Enterprises deploying AI without observability are flying blind. “You can’t secure what you can’t see,” states Etay Maor, senior director of security strategy at Cato Networks. “Observability isn’t optional, it’s the very foundation of secure digital transformation.”

Generative UI represents the next frontier. Instead of dashboards, AI creates interfaces in real time based on the exact problem being solved. ServiceNow, Splunk and emerging players like Observable are betting that dynamic interfaces will replace static dashboards within 24 months.

Market consolidation accelerates

The infrastructure giants are assembling their armies through acquisition. Cisco paid $28 billion for Splunk. Palo Alto acquired Cider Security, Dig Security and Talon for a combined $1.2 billion. CrowdStrike bought Reposify, Humio and Preempt. Broadcom’s $69 billion VMware acquisition reshapes the entire landscape.

Platform velocity now determines survival. Unified architectures cut development time from years to months. What took 18 months to deploy now launches in 8 weeks. Engineering teams are voting with their feet, joining companies that ship at startup speed with enterprise scale. The AI infrastructure market is expected to consolidate from over 200 vendors to fewer than 20 platforms within 36 months. Gartner predicts 60% of current vendors won’t exist by 2027. The message is brutal: Control the full stack or become irrelevant.

The bottom line

AgenticOps represents the most significant architectural shift since the advent of cloud computing. Enterprises that build AI infrastructure assuming continuous compromise, infinite identities and machine-speed attacks will

How CISOs became the gatekeepers of $309B AI infrastructure spending Read More »

Creatio’s new 8.3 Twin CRM update hits Salesforce where it hurts: ‘we don’t think of AI as an add-on…it’s just part of our app experience’

Creatio, the Boston-headquartered customer relationship management (CRM) company focused on no-code and low-code CRM app deployment, has officially launched its latest platform update, the 8.3 “Twin” Release, introducing a suite of AI-native capabilities designed to streamline CRM and workflow automation. With this update, Creatio continues its mission to build enterprise software where humans and AI agents collaborate across functions such as sales, marketing, service, and application development.

“This release for us is sort of that pivot point where we move beyond the traditional SaaS CRM,” said Burley Kawasaki, Chief Product Officer at Creatio, in an interview with VentureBeat. “The way we work with SaaS has fundamentally changed—it’s now about fluid movement between applications and AI agents.”

AI-native throughout Creatio with conversational and classic interface access

The new release centers on embedding AI into the core of the Creatio platform, rather than treating it as an add-on. Creatio 8.3 includes a conversational user experience, prebuilt role-based AI agents, and a no-code agent builder that allows businesses to customize how automation functions in their specific environment. These features are now available to all customers and trial users with no added cost or licensing requirements.

“You won’t see us ever release a Creatio Force, right?” Kawasaki said, a twinkle in his eye as he subtly called out rival CRM provider Salesforce’s Agentforce AI agent creation platform, which requires a separate additional subscription or paid credits to access agent-building capabilities. Creatio’s platform starts at $25 per user per month. “Because we don’t think of it as an add-on or something separate you have to use in addition to your existing apps,” said Kawasaki. “This is just part of our app experience.”

Multi-conversation support across the platform

The platform now offers a natural language interface that spans Creatio’s web and mobile apps, as well as integrations with tools such as Microsoft Outlook and Teams. Users can switch between channels and devices without losing context, allowing for ongoing, persistent interactions with AI agents across workflows. “We’ve enabled multi-conversation support with persistent context across devices and platforms—whether you’re in Outlook, Teams, or on mobile,” Kawasaki explained. “Our adaptive user experience lets users live in a prompt-driven CRM, accessing functionality through natural language. You can toggle between classic and conversational modes as needed.” Zoom and Gmail integrations are expected later in the year.

Prebuilt and customizable AI agents for diverse sectors and functions

In this release, Creatio debuts several prebuilt agents that support high-frequency tasks across core business areas:

- Sales agents handle tasks such as researching accounts, preparing meetings, and generating quotes.
- Marketing agents assist with creating campaign messaging, emails, and other targeted content.
- Service agents focus on faster case resolution by drawing from internal knowledge bases and suggesting new content where gaps exist.
- No-code agents, starting with the Dashboards Agent, help non-technical users generate and refine analytics using natural language prompts.

“In sales, agents handle data entry, follow-ups, and paperwork, freeing up reps to focus on relationships and strategy,” said Kawasaki. “In marketing, agents take on content creation and campaign analytics, letting marketers focus more on storytelling.” “While others are still offering fragmented AI products and complex pricing models, we’ve taken a different path,” said Kawasaki. “This release offers one platform, one experience, and one clear route to accelerated AI adoption and realizing real business value.”

The conversational assistant in Creatio 8.3 also introduces advanced features like file uploads for grounding responses in organization-specific documents, and support for retrieval-augmented generation (RAG) to ensure accuracy based on proprietary knowledge. “Agents can now be grounded in uploaded documents and Creatio’s metadata, ensuring responses are accurate and personalized, not generic,” Kawasaki noted.

Build and deploy custom AI agents with no code

The no-code development environment has also been overhauled with embedded AI that assists in building dashboards, apps, and even new AI agents. Users can design and deploy agents by combining reusable skills, workflows, prompts, and knowledge sources—offering full flexibility to tailor automation without needing engineering resources. “We’re launching a no-code agent builder where you define skills, prompts, actions, and workflows,” said Kawasaki. “It’s visual and accessible but still requires thoughtful training and data input.”

Under the hood, Creatio supports multiple foundation models, including OpenAI, Anthropic, and Gemini. A bring-your-own-model capability is planned for later in 2025, enabling customers to host their own models, such as Llama or DeepSeek, and pair them with specific tasks or agents. “We decided early on not to build our own LLM,” Kawasaki explained. “Instead, we support OpenAI, Anthropic, and Gemini—and soon, customers will be able to bring their own model.”

Security, data governance, and grounding remain priorities

Security and governance were also a priority in this release. Documents used to train or inform AI agents are stored in each customer’s secure, dedicated instance, and are not shared with external large language models. Customers can control which documents persist for agent grounding and manage access to ensure regulatory compliance, especially in sensitive industries.

The release follows extensive hands-on testing and feedback from Creatio’s customer base. Early users highlighted the need for new guidance on designing processes in AI-assisted environments, which Creatio plans to address with additional best-practice resources later this year. The company also expects to follow this summer launch with a second wave of feature enhancements and new agents in the fall. “We see the real opportunity not in agents replacing humans, but in agents complementing teams—human and digital working together in hybrid organizations,” said Kawasaki.

Creatio positions the 8.3 release as a shift away from other CRM vendors’ bolt-on AI strategies. Instead, it aims to unify AI and human contributions into a single, fluid experience. For customers, this means the freedom to work how they choose—whether through classic CRM interfaces or through conversational, AI-supported flows.
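As background on the RAG grounding Creatio describes, here is a generic sketch of the pattern: uploaded documents are embedded, the most relevant chunk is retrieved for a user question, and the agent's prompt is built from it. This is a textbook illustration with stand-in helper functions, not Creatio's implementation.

```python
# Generic retrieval-augmented generation (RAG) sketch: ground an agent's answer
# in the customer's own documents. embed() stands in for a real embedding model,
# and the returned prompt would be sent to whichever LLM the platform has configured.

import math

def embed(text: str) -> list[float]:
    # stand-in embedding: real systems call an embedding model here
    return [text.lower().count(w) for w in ("refund", "quote", "contract", "sla")]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

DOCS = [
    "Refund requests over $500 require manager approval.",
    "Standard SLA response time is 4 business hours.",
]

def grounded_prompt(question: str, top_k: int = 1) -> str:
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(embed(d), q), reverse=True)[:top_k]
    return "Answer using only this context:\n" + "\n".join(ranked) + f"\n\nQ: {question}"

print(grounded_prompt("What is the SLA response time?"))
```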

Creatio’s new 8.3 Twin CRM update hits Salesforce where it hurts: ‘we don’t think of AI as an add-on…it’s just part of our app experience’ Read More »

Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%

Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively creating a “dream team” of AI agents. The method, called Multi-LLM AB-MCTS, enables models to perform trial-and-error and combine their unique strengths to solve problems that are too complex for any individual model. For enterprises, this approach provides a means to develop more robust and capable AI systems. Instead of being locked into a single provider or model, businesses could dynamically leverage the best aspects of different frontier models, assigning the right AI for the right part of a task to achieve superior results.

The power of collective intelligence

Frontier AI models are evolving rapidly. However, each model has its own distinct strengths and weaknesses derived from its unique training data and architecture. One might excel at coding, while another excels at creative writing. Sakana AI’s researchers argue that these differences are not a bug, but a feature. “We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,” the researchers state in their blog post. They believe that just as humanity’s greatest achievements come from diverse teams, AI systems can also achieve more by working together. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”

Thinking longer at inference time

Sakana AI’s new algorithm is an “inference-time scaling” technique (also referred to as “test-time scaling”), an area of research that has become very popular in the past year. While most of the focus in AI has been on “training-time scaling” (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model is already trained. One common approach involves using reinforcement learning to prompt models to generate longer, more detailed chain-of-thought (CoT) sequences, as seen in popular models such as OpenAI o3 and DeepSeek-R1. Another, simpler method is repeated sampling, where the model is given the same prompt multiple times to generate a variety of potential solutions, similar to a brainstorming session. Sakana AI’s work combines and advances these ideas.

“Our framework offers a smarter, more strategic version of Best-of-N (aka repeated sampling),” Takuya Akiba, research scientist at Sakana AI and co-author of the paper, told VentureBeat. “It complements reasoning techniques like long CoT through RL. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks.”

How adaptive branching search works

The core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to effectively perform trial-and-error by intelligently balancing two different search strategies: “searching deeper” and “searching wider.” Searching deeper involves taking a promising answer and repeatedly refining it, while searching wider means generating completely new solutions from scratch. AB-MCTS combines these approaches, allowing the system to improve a good idea but also to pivot and try something new if it hits a dead end or discovers another promising direction.

To accomplish this, the system uses Monte Carlo Tree Search (MCTS), a decision-making algorithm famously used by DeepMind’s AlphaGo. At each step, AB-MCTS uses probability models to decide whether it’s more strategic to refine an existing solution or generate a new one.

Figure: Different test-time scaling strategies (Source: Sakana AI)

The researchers took this a step further with Multi-LLM AB-MCTS, which not only decides “what” to do (refine vs. generate) but also “which” LLM should do it. At the start of a task, the system doesn’t know which model is best suited for the problem. It begins by trying a balanced mix of available LLMs and, as it progresses, learns which models are more effective, allocating more of the workload to them over time.

Putting the AI ‘dream team’ to the test

The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark. ARC (Abstraction and Reasoning Corpus) is designed to test a human-like ability to solve novel visual reasoning problems, making it notoriously difficult for AI. The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1. The collective of models was able to find correct solutions for over 30% of the 120 test problems, a score that significantly outperformed any of the models working alone. The system demonstrated the ability to dynamically assign the best model for a given problem. On tasks where a clear path to a solution existed, the algorithm quickly identified the most effective LLM and used it more frequently.

Figure: AB-MCTS vs. individual models (Source: Sakana AI)

More impressively, the team observed instances where the models solved problems that were previously impossible for any single one of them. In one case, a solution generated by the o4-mini model was incorrect. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini 2.5 Pro, which were able to analyze the error, correct it, and ultimately produce the right answer. “This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence,” the researchers write.

Figure: AB-MCTS can select different models at different stages of solving a problem (Source: Sakana AI)

“In addition to the individual pros and cons of each model, the tendency to hallucinate can vary significantly among them,” Akiba said. “By creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness. Since hallucination is a major issue in a business context, this approach could be valuable for its mitigation.”

From research to real-world applications

To help developers and businesses apply this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest,
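To illustrate the idea (this is not TreeQuest's actual API), here is a heavily simplified sketch of adaptive branching search: at each step the loop decides whether to refine the best answer so far or generate a fresh one, and which model to call, based on the scores observed so far. The scoring stub, the bandit-style model choice and the call_llm placeholder are assumptions for illustration.

```python
# Simplified sketch of multi-model adaptive branching search (in the spirit of
# Multi-LLM AB-MCTS, not the real implementation): choose refine-vs-generate and
# which model to use based on scores observed so far.
import random

MODELS = ["model-a", "model-b", "model-c"]   # hypothetical frontier models

def call_llm(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt[:30]}"   # stub for a real API call

def score(candidate: str) -> float:
    return random.random()                         # stub: verifier / unit tests / judge

def search(task: str, budget: int = 20) -> str:
    candidates: list[tuple[float, str]] = []       # (score, answer)
    model_stats = {m: [1.0, 2.0] for m in MODELS}  # [reward, trials] pseudo-counts
    for _ in range(budget):
        model = max(MODELS, key=lambda m: model_stats[m][0] / model_stats[m][1]
                    + random.random() * 0.1)       # explore/exploit over models
        if candidates and random.random() < 0.5:   # search deeper: refine the best so far
            prompt = f"Improve this answer to '{task}': {max(candidates)[1]}"
        else:                                      # search wider: start from scratch
            prompt = task
        answer = call_llm(model, prompt)
        s = score(answer)
        candidates.append((s, answer))
        model_stats[model][0] += s
        model_stats[model][1] += 1
    return max(candidates)[1]

print(search("Solve the ARC puzzle described above"))
```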

Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30% Read More »

HOLY SMOKES! A new, 200% faster DeepSeek R1-0528 variant appears from German lab TNG Technology Consulting GmbH

It’s been a little more than a month since Chinese AI startup DeepSeek, an offshoot of Hong Kong-based High-Flyer Capital Management, released the latest version of its hit open-source model, DeepSeek R1-0528. Like its predecessor, DeepSeek-R1 — which rocked the AI and global business communities with how cheaply it was trained and how well it performed on reasoning tasks, all available to developers and enterprises for free — R1-0528 is already being adapted and remixed by other AI labs and developers, thanks in large part to its permissive Apache 2.0 license.

This week, the 24-year-old German firm TNG Technology Consulting GmbH released one such adaptation: DeepSeek-TNG R1T2 Chimera, the latest model in its Chimera large language model (LLM) family. R1T2 delivers a notable boost in efficiency and speed, scoring upwards of 90% of R1-0528’s intelligence benchmark scores while generating answers with less than 40% of R1-0528’s output token count. That means it produces shorter responses, translating directly into faster inference and lower compute costs. On the model card TNG released for R1T2 on the AI code-sharing community Hugging Face, the company states that it is “about 20% faster than the regular R1” (the one released back in January) “and more than twice as fast as R1-0528” (the official May update from DeepSeek).

Already, the response from the AI developer community has been incredibly positive. “DAMN! DeepSeek R1T2 – 200% faster than R1-0528 & 20% faster than R1,” wrote Vaibhav (VB) Srivastav, a senior leader at Hugging Face, on X. “Significantly better than R1 on GPQA & AIME 24, made via Assembly of Experts with DS V3, R1 & R1-0528 — and it’s MIT-licensed, available on Hugging Face.”

This gain is made possible by TNG’s Assembly-of-Experts (AoE) method, a technique for building LLMs by selectively merging the weight tensors (internal parameters) from multiple pre-trained models. TNG described the method in a paper published in May on arXiv, the non-peer-reviewed open-access preprint repository.

A successor to the original R1T Chimera, R1T2 introduces a new “Tri-Mind” configuration that integrates three parent models: DeepSeek-R1-0528, DeepSeek-R1, and DeepSeek-V3-0324. The result is a model engineered to maintain high reasoning capability while significantly reducing inference cost. R1T2 is constructed without further fine-tuning or retraining. It inherits the reasoning strength of R1-0528, the structured thought patterns of R1, and the concise, instruction-oriented behavior of V3-0324 — delivering a more efficient, yet capable model for enterprise and research use.

How Assembly-of-Experts (AoE) Differs from Mixture-of-Experts (MoE)

Mixture-of-Experts (MoE) is an architectural design in which different components, or “experts,” are conditionally activated per input. In MoE LLMs like DeepSeek-V3 or Mixtral, only a subset of the model’s expert layers (e.g., 8 out of 256) are active during any given token’s forward pass. This allows very large models to achieve higher parameter counts and specialization while keeping inference costs manageable — because only a fraction of the network is evaluated per token.

Assembly-of-Experts (AoE) is a model merging technique, not an architecture. It’s used to create a new model from multiple pre-trained MoE models by selectively interpolating their weight tensors. The “experts” in AoE refer to the model components being merged — typically the routed expert tensors within MoE layers — not experts dynamically activated at runtime.

TNG’s implementation of AoE focuses primarily on merging routed expert tensors — the part of a model most responsible for specialized reasoning — while often retaining the more efficient shared and attention layers from faster models like V3-0324. This approach enables the resulting Chimera models to inherit reasoning strength without replicating the verbosity or latency of the strongest parent models.

Performance and Speed: What the Benchmarks Actually Show

According to benchmark comparisons presented by TNG, R1T2 achieves between 90% and 92% of the reasoning performance of its most intelligent parent, DeepSeek-R1-0528, as measured by the AIME-24, AIME-25, and GPQA-Diamond test sets. However, unlike DeepSeek-R1-0528 — which tends to produce long, detailed answers due to its extended chain-of-thought reasoning — R1T2 is designed to be much more concise. It delivers similarly intelligent responses while using significantly fewer words.

Rather than focusing on raw processing time or tokens per second, TNG measures “speed” in terms of output token count per answer — a practical proxy for both cost and latency. According to benchmarks shared by TNG, R1T2 generates responses using approximately 40% of the tokens required by R1-0528. That translates to a 60% reduction in output length, which directly reduces inference time and compute load, speeding up responses by roughly 2X. When compared to the original DeepSeek-R1, R1T2 is also around 20% more concise on average, offering meaningful gains in efficiency for high-throughput or cost-sensitive deployments.

This efficiency does not come at the cost of intelligence. As shown in the benchmark chart presented in TNG’s technical paper, R1T2 sits in a desirable zone on the intelligence vs. output cost curve. It preserves reasoning quality while minimizing verbosity — an outcome critical to enterprise applications where inference speed, throughput, and cost all matter.

Deployment Considerations and Availability

R1T2 is released under a permissive MIT License and is available now on Hugging Face, meaning it is open source and can be used and built into commercial applications. TNG notes that while the model is well suited for general reasoning tasks, it is not currently recommended for use cases requiring function calling or tool use, due to limitations inherited from its DeepSeek-R1 lineage. These may be addressed in future updates. The company also advises European users to assess compliance with the EU AI Act, which comes into effect on August 2, 2025. Enterprises operating in the EU should review relevant provisions or consider halting model use after that date if requirements cannot be met. However, U.S. companies operating domestically and servicing U.S.-based users, or those of other nations, are not subject to the terms of the EU
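As a rough illustration of what tensor-level merging looks like (a generic sketch, not TNG's Assembly-of-Experts code), the snippet below interpolates routed-expert weight tensors from several parent checkpoints while copying shared and attention tensors from a single base parent. The tensor-naming convention and mixing weights are assumptions.

```python
# Generic sketch of merging model checkpoints by interpolating selected weight tensors.
# Assumption: routed-expert tensors can be identified by a substring like ".experts."
# in their state-dict keys; real MoE checkpoints differ in naming and structure.
import torch

def merge_checkpoints(parents: dict[str, dict[str, torch.Tensor]],
                      mix: dict[str, float],
                      base: str) -> dict[str, torch.Tensor]:
    """Interpolate expert tensors across parents; take everything else from `base`."""
    merged = {}
    for name, tensor in parents[base].items():
        if ".experts." in name:  # routed expert weights: weighted average across parents
            merged[name] = sum(mix[p] * parents[p][name] for p in parents)
        else:                    # shared / attention layers: keep the base parent's weights
            merged[name] = tensor.clone()
    return merged

# Toy example with tiny random "checkpoints" sharing the same keys.
keys = ["layers.0.attn.q_proj", "layers.0.experts.0.w1", "layers.0.experts.1.w1"]
parents = {p: {k: torch.randn(4, 4) for k in keys} for p in ("r1_0528", "r1", "v3_0324")}
new_model = merge_checkpoints(parents, mix={"r1_0528": 0.5, "r1": 0.3, "v3_0324": 0.2},
                              base="v3_0324")
print({k: tuple(v.shape) for k, v in new_model.items()})
```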

HOLY SMOKES! A new, 200% faster DeepSeek R1-0528 variant appears from German lab TNG Technology Consulting GmbH Read More »