
The new AI calculus: Google’s 80% cost edge vs. OpenAI’s ecosystem

The relentless pace of generative AI innovation shows no signs of slowing. In just the past couple of weeks, OpenAI dropped its powerful o3 and o4-mini reasoning models alongside the GPT-4.1 series, while Google countered with Gemini 2.5 Flash, rapidly iterating on its flagship Gemini 2.5 Pro released shortly before. For enterprise technical leaders navigating this dizzying landscape, choosing the right AI platform requires looking far beyond rapidly shifting model benchmarks. While model-versus-model comparisons grab headlines, the decision goes far deeper: choosing an AI platform is a commitment to an ecosystem, impacting everything from core compute costs and agent development strategy to model reliability and enterprise integration.

But perhaps the starkest differentiator, bubbling beneath the surface but with profound long-term implications, lies in the economics of the hardware powering these AI giants. Google wields a massive cost advantage thanks to its custom silicon, potentially running its AI workloads at a fraction of the cost OpenAI incurs relying on Nvidia’s market-dominant (and high-margin) GPUs.

This analysis goes beyond the benchmarks to compare the Google and OpenAI/Microsoft AI ecosystems across the critical factors enterprises must consider today: the significant disparity in compute economics, diverging strategies for building AI agents, the crucial trade-offs in model capabilities and reliability, and the realities of enterprise fit and distribution. The analysis builds on an in-depth video discussion of these systemic shifts between myself and AI developer Sam Witteveen earlier this week.

1. Compute economics: Google’s TPU “secret weapon” vs. OpenAI’s Nvidia tax

The most significant, yet often under-discussed, advantage Google holds is its “secret weapon”: its decade-long investment in custom Tensor Processing Units (TPUs). OpenAI and the broader market rely heavily on Nvidia’s powerful but expensive GPUs (like the H100 and A100). Google, on the other hand, designs and deploys its own TPUs, like the recently unveiled Ironwood generation, for its core AI workloads, including training and serving Gemini models.

Why does this matter? It makes a huge cost difference. Nvidia GPUs command staggering gross margins, estimated by analysts to be in the 80% range for data center chips like the H100 and upcoming B100 GPUs. This means OpenAI (via Microsoft Azure) pays a hefty premium — the “Nvidia tax” — for its compute power. Google, by manufacturing TPUs in-house, effectively bypasses this markup. While manufacturing a GPU might cost Nvidia $3,000-$5,000, hyperscalers like Microsoft (supplying OpenAI) pay $20,000-$35,000+ per unit in volume, according to reports. Industry conversations and analysis suggest that Google may be obtaining its AI compute at roughly 20% of the cost incurred by those purchasing high-end Nvidia GPUs. While the exact numbers are internal, the implication is a 4x-6x cost-efficiency advantage per unit of compute for Google at the hardware level.

This structural advantage is reflected in API pricing. Comparing the flagship models, OpenAI’s o3 is roughly 8 times more expensive for input tokens and 4 times more expensive for output tokens than Google’s Gemini 2.5 Pro (for standard context lengths). This cost differential isn’t academic; it has profound strategic implications.
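To make the token-pricing gap concrete, here is a rough back-of-the-envelope sketch. The per-million-token prices below are assumptions chosen only to match the approximate ratios cited above (about 8x on input, 4x on output); actual list prices vary by tier, context length, and date.

```python
# Back-of-the-envelope API cost comparison. Prices are assumed, not quoted.
PRICES = {  # $ per 1M tokens (illustrative)
    "openai_o3":      {"input": 10.00, "output": 40.00},
    "gemini_2_5_pro": {"input": 1.25,  "output": 10.00},
}

def monthly_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Estimate monthly API spend in dollars for a given token workload."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Example workload: 2B input tokens and 500M output tokens per month.
for model in PRICES:
    print(model, round(monthly_cost(model, 2e9, 5e8)))
# openai_o3 40000
# gemini_2_5_pro 7500
```

Under these assumed prices, the same workload costs a bit over 5x more on o3 than on Gemini 2.5 Pro, which is why “intelligence per dollar,” rather than raw benchmark scores, becomes the operative TCO question.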
Google can likely sustain lower prices and offer better “intelligence per dollar,” giving enterprises more predictable long-term Total Cost of Ownership (TCO) – and that’s exactly what it is doing right now in practice. OpenAI’s costs, meanwhile, are intrinsically tied to Nvidia’s pricing power and the terms of its Azure deal. Indeed, compute costs represent an estimated 55-60% of OpenAI’s total $9 billion operating expenses in 2024, according to some reports, and are projected to exceed 80% in 2025 as the company scales. While OpenAI’s projected revenue growth is astronomical – potentially hitting $125 billion by 2029, according to reported internal forecasts – managing this compute spend remains a critical challenge, driving its pursuit of custom silicon.

2. Agent frameworks: Google’s open ecosystem approach vs. OpenAI’s integrated one

Beyond hardware, the two giants are pursuing divergent strategies for building and deploying the AI agents poised to automate enterprise workflows. Google is making a clear push for interoperability and a more open ecosystem. At Cloud Next two weeks ago, it unveiled the Agent-to-Agent (A2A) protocol, designed to allow agents built on different platforms to communicate, alongside its Agent Development Kit (ADK) and the Agentspace hub for discovering and managing agents. A2A adoption faces hurdles — key players like Anthropic haven’t signed on (VentureBeat reached out to Anthropic about this, but the company declined to comment) — and some developers debate its necessity alongside Anthropic’s existing Model Context Protocol (MCP). Still, Google’s intent is clear: to foster a multi-vendor agent marketplace, potentially hosted within its Agent Garden or via a rumored Agent App Store.

OpenAI, conversely, appears focused on creating powerful, tool-using agents tightly integrated within its own stack. The new o3 model exemplifies this, capable of making hundreds of tool calls within a single reasoning chain. Developers leverage the Responses API and Agents SDK, along with tools like the new Codex CLI, to build sophisticated agents that operate within the OpenAI/Azure trust boundary. While frameworks like Microsoft’s Autogen offer some flexibility, OpenAI’s core strategy seems less about cross-platform communication and more about maximizing agent capabilities vertically within its controlled environment.

The enterprise takeaway: Companies prioritizing flexibility and the ability to mix and match agents from various vendors (e.g., plugging a Salesforce agent into Vertex AI) may find Google’s open approach appealing. Those deeply invested in the Azure/Microsoft ecosystem or preferring a more vertically managed, high-performance agent stack might lean towards OpenAI.

3. Model capabilities: parity, performance, and pain points

The relentless release cycle means model leadership is fleeting. While OpenAI’s o3 currently edges out Gemini 2.5 Pro on some coding benchmarks like SWE-Bench Verified and Aider, Gemini 2.5 Pro matches or leads on others like GPQA and AIME. Gemini 2.5 Pro is also the overall leader on the large language model (LLM) Arena Leaderboard. For many enterprise use cases, however, the models have reached rough parity in core capabilities.

The real difference lies in


New method lets DeepSeek and other models answer ‘sensitive’ questions

It is tough to remove bias, and in some cases outright censorship, from large language models (LLMs). One such model, DeepSeek from China, has alarmed politicians and some business leaders about its potential danger to national security. A select committee of the U.S. Congress recently released a report calling DeepSeek “a profound threat to our nation’s security” and detailing policy recommendations.

While there are ways to bypass bias through Reinforcement Learning from Human Feedback (RLHF) and fine-tuning, the enterprise risk management startup CTGT claims to have an alternative approach. CTGT developed a method that it says removes 100% of the bias and censorship baked into some language models. In a paper, Cyril Gorlla and Trevor Tuttle of CTGT said that their framework “directly locates and modifies the internal features responsible for censorship.”

“This approach is not only computationally efficient but also allows fine-grained control over model behavior, ensuring that uncensored responses are delivered without compromising the model’s overall capabilities and factual accuracy,” the paper said.

While the method was developed explicitly with DeepSeek-R1-Distill-Llama-70B in mind, the same process can be used on other models.

“We have tested CTGT with other open weights models such as Llama and found it to be just as effective,” Gorlla told VentureBeat in an email. “Our technology operates at the foundational neural network level, meaning it applies to all deep learning models. We’re working with a leading foundation model lab to ensure their new models are trustworthy and safe from the core.”

How it works

The researchers said their method identifies features with a high likelihood of being associated with unwanted behaviors.

“The key idea is that within a large language model, there exist latent variables (neurons or directions in the hidden state) that correspond to concepts like ‘censorship trigger’ or ‘toxic sentiment’. If we can find those variables, we can directly manipulate them,” Gorlla and Tuttle wrote.

CTGT said there are three key steps: feature identification, feature isolation and characterization, and dynamic feature modification.

The researchers craft a series of prompts that could trigger one of those “toxic sentiments,” for example by asking for more information about Tiananmen Square or requesting tips to bypass firewalls. Based on the responses, they establish a pattern and find the vectors where the model decides to censor information.

Once these are identified, the researchers can isolate that feature and figure out which part of the unwanted behavior it controls. Behavior may include responding more cautiously or refusing to respond altogether. Once they understand what behavior the feature controls, researchers can then “integrate a mechanism into the model’s inference pipeline” that adjusts how much the feature’s behavior is activated.

Making the model answer more prompts

CTGT said its experiments, using 100 sensitive queries, showed that the base DeepSeek-R1-Distill-Llama-70B model answered only 32% of the controversial prompts it was fed. But the modified version responded to 96% of the prompts. The remaining 4%, CTGT explained, were extremely explicit content.
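For readers who want a feel for what “modifying the internal features” at inference time can look like, here is a minimal, generic sketch of activation steering with a forward hook. This is not CTGT’s implementation: the model name comes from the article, but the layer index and steering direction are placeholders, and in practice the direction would be estimated by contrasting hidden states on prompts the model refuses versus prompts it answers.

```python
# Generic activation-steering sketch (NOT CTGT's method): damp a hidden-state
# direction associated with refusal/censorship at inference time, no retraining.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"  # model discussed in the article
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map="auto")

layer_idx = 40                                     # assumed intervention layer
direction = torch.randn(model.config.hidden_size)  # placeholder "censorship" direction
direction = direction / direction.norm()
strength = 1.0                                     # 0.0 toggles the intervention off

def damp_feature(module, inputs, output):
    # Decoder layers may return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    d = direction.to(device=hidden.device, dtype=hidden.dtype)
    # Remove (a fraction of) each token state's component along the direction.
    proj = (hidden @ d).unsqueeze(-1) * d
    steered = hidden - strength * proj
    return (steered,) + output[1:] if isinstance(output, tuple) else steered

handle = model.model.layers[layer_idx].register_forward_hook(damp_feature)
# ...generate as usual; handle.remove() restores the unmodified model instantly.
```

Because nothing is written back to the weights, the same model can run with or without the adjustment, or at different strengths for different contexts, which mirrors the reversibility and toggling the company describes below.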
The company said that while the method allows users to toggle how much of the baked-in bias and safety filtering is applied, it still believes the model will not turn “into a reckless generator,” especially if only unnecessary censorship is removed. The method also does not sacrifice the accuracy or performance of the model.

“This is fundamentally different from traditional fine-tuning as we are not optimizing model weights or feeding it new example responses. This has two major advantages: changes take effect immediately for the very next token generation, as opposed to hours or days of retraining; and reversibility and adaptivity, since no weights are permanently changed, the model can be switched between different behaviors by toggling the feature adjustment on or off, or even adjusted to varying degrees for different contexts,” the paper said.

Model safety and security

The congressional report on DeepSeek recommended that the U.S. “take swift action to expand export controls, improve export control enforcement, and address risks from Chinese artificial intelligence models.”

Once the U.S. government began questioning DeepSeek’s potential threat to national security, researchers and AI companies sought ways to make it, and other models, “safe.” What is or isn’t “safe,” or biased or censored, can sometimes be difficult to judge, but methods that let users toggle these controls to make a model work for them could prove very useful.

Gorlla said enterprises “need to be able to trust their models are aligned with their policies,” which is why methods like the one he helped develop would be critical for businesses.

“CTGT enables companies to deploy AI that adapts to their use cases without having to spend millions of dollars fine-tuning models for each use case. This is particularly important in high-risk applications like security, finance, and healthcare, where the potential harms that can come from AI malfunctioning are severe,” he said.


Zencoder buys Machinet to challenge GitHub Copilot as AI coding assistant consolidation accelerates

Zencoder announced today the acquisition of Machinet, a developer of context-aware AI coding assistants with more than 100,000 downloads in the JetBrains ecosystem. The acquisition strengthens Zencoder’s position in the competitive AI coding assistant landscape and expands its reach among Java developers and other users of JetBrains’ popular development environments.

The deal represents a strategic expansion for Zencoder, which emerged from stealth mode just six months ago but has quickly established itself as a serious competitor to GitHub Copilot and other AI coding tools.

“At this point, there are three strong coordination products in the market that are production grade: it’s us, Cursor, and Windsurf. For smaller companies, it’s becoming harder and harder to compete,” said Andrew Filev, CEO and founder of Zencoder, in an exclusive interview with VentureBeat about the acquisition. “Our technical staff includes more than 50 engineers. For some startups, it’s very hard to keep that pace.”

The great AI coding assistant shakeout: Why small players can’t compete

This acquisition comes at a pivotal moment in the AI coding assistant market. Just last week, reports emerged that OpenAI is in discussions to acquire Windsurf, another AI coding assistant, for approximately $3 billion. While Filev maintains the timing is coincidental, he acknowledges that it reflects broader market dynamics.

“I think there’s going to be more to it, and I’m looking forward to it,” Filev said. “It’s a huge product surface. You have to support multiple IDEs, you have to integrate with multiple DevOps tools, you have to support different parts of software life cycle. There are 70-plus, 100-plus programming languages… There’s so much work there that it’s very, very hard for the smaller companies that only have like sub-10 engineers to compete in the long term.”

How Zencoder’s JetBrains strategy outflanks Microsoft-dependent rivals

One of the key strategic values of acquiring Machinet is its strong presence in the JetBrains ecosystem, which is particularly popular among Java developers and enterprise backend teams.

“JetBrains audiences are millions of engineers. They’re one of the leading providers for certain programming languages and technologies. They’re particularly well known in the Java world, which is a big chunk of enterprise backend,” Filev explained.

This gives Zencoder an advantage over competitors like Cursor and Windsurf, which are built as forks of Visual Studio Code and may face increasing constraints due to Microsoft’s tightening of licensing restrictions.

“Both Cursor and Windsurf are what’s called forks of Visual Studio, and Microsoft recently started tightening their licensing restrictions,” Filev noted. “The support that VS Code has for certain languages is better than the support that Cursor and Windsurf can offer, specifically for C Sharp, C++.”

By contrast, Zencoder works with Microsoft’s native platforms on VS Code and also integrates directly with JetBrains IDEs, giving it more flexibility across development environments.

Beyond hype: How Zencoder’s benchmark victories translate to real developer value

Zencoder differentiates itself from competitors through what it calls “Repo Grokking” technology, which analyzes entire code repositories to provide AI models with better context, and an error-corrected inference pipeline that aims to reduce code errors.
The company claims impressive performance on industry benchmarks, with Filev highlighting results from March that showed Zencoder outperforming competitors: “On SWE-Bench Multimodal, the best result was around 13%, and we have been able to easily do 27% which we submitted, so we doubled the next best result. We later resubmitted even higher results of 31%,” Filev said.

He also noted performance on OpenAI’s benchmark: “On the SWE-Lancer ‘diamond’ subset, OpenAI’s best result that they published was in the high 20s. Our result was in the low 30s, so we beat OpenAI on that benchmark by 20%.”

These benchmarks matter because they measure an AI’s ability to solve real-world coding problems, not just generate syntactically correct but functionally flawed code.

Multi-agent architecture: Zencoder’s answer to code quality and security concerns

A significant concern among developers regarding AI coding tools is whether they produce secure, high-quality code. Zencoder’s approach, according to Filev, is to build on established software engineering best practices rather than reinventing them.

“I think when we design AI systems, we definitely should borrow from the wisdom of human systems. The software engineering industry was rapidly developing for the last 40 years,” Filev explained. “Sometimes you don’t have to reinvent the wheel. Sometimes the best approach is to take whatever best practices and tools are in the market and leverage them.”

This philosophy manifests in Zencoder’s agentic approach, where AI acts as an orchestrator that uses various tools, similar to how human developers use multiple tools in their workflows.

“We enable AI to use all of those tools,” said Filev. “We’re building a truly multi-agentic platform. In our previous release, we not only shipped coding agents, like some of our competitors, but we also shipped unit testing agents, and you’re going to see more agents from us in that multi-agent interaction platform.”

Coffee mode and the future: When AI does the work while developers take a break

One of Zencoder’s most talked-about features is its recently launched “Coffee Mode,” which allows developers to set the AI to work on tasks like writing unit tests while they take a break.

“You can literally hit that button and go grab a coffee, and the agent will do that work by itself,” Filev told VentureBeat in a previous interview. “As we like to say in the company, you can watch forever the waterfall, the fire burning, and the agent working in coffee mode.”

This approach reflects Zencoder’s vision of AI as a developer’s companion rather than a replacement. “We’re not trying to substitute humans,” Filev emphasized. “We’re trying to progressively and rapidly make them 10x more productive. The more powerful the AI technology is, the more powerful is the human that uses it.”

As part of the acquisition, Machinet will transfer its domain and marketplace presence to Zencoder. Current Machinet customers will receive guidance on


This AI already writes 20% of Salesforce’s code. Here’s why developers aren’t worried

When Anthropic CEO Dario Amodei declared that AI would write 90% of code within six months, the coding world braced for mass extinction. But inside Salesforce, a different reality has already taken shape.

“About 20% of all APEX code written in the last 30 days came from Agentforce,” Jayesh Govindarajan, Senior Vice President of Salesforce AI, told me during a recent interview. His team tracks not just code generated, but code actually deployed into production. The numbers reveal an acceleration that’s impossible to ignore: 35,000 active monthly users, 10 million lines of accepted code, and internal tools saving 30,000 developer hours every month.

Yet Salesforce’s developers aren’t disappearing. They’re evolving. “The vast majority of development — at least what I call the first draft of code — will be written by AI,” Govindarajan acknowledged. “But what developers do with that first draft has fundamentally changed.”

From lines of code to strategic control: How developers are becoming technology pilots

Software engineering has always blended creativity with tedium. Now AI handles the latter, pushing developers toward the former.

“You move from a purely technical role to a more strategic one,” Govindarajan explained. “Not just ‘I have something to build, so I’ll build it,’ but ‘What should we build? What does the customer actually want?’”

This shift mirrors other technological disruptions. When calculators replaced manual computation, mathematicians didn’t vanish — they tackled more complex problems. When digital cameras killed darkrooms, photography expanded rather than contracted. Salesforce believes code works the same way. As AI slashes the cost of software creation, developers gain what they’ve always lacked: time.

“If creating a working prototype once took weeks, now it takes hours,” Govindarajan said. “Instead of showing customers a document describing what you might build, you simply hand them working software. Then you iterate based on their reaction.”

‘Vibe coding’ is here: Why software engineers are now orchestrating AI rather than typing every command

Coders have begun adopting what’s called “vibe coding” — a term coined by OpenAI co-founder Andrej Karpathy. The practice involves giving AI high-level directions rather than precise instructions, then refining what it produces.

There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper… — Andrej Karpathy (@karpathy) February 2, 2025

“You just give it a sort of high-level direction and let the AI use its creativity to generate a first draft,” Govindarajan said. “It won’t work exactly as you want, but it gives you something to play with. You refine parts of it by saying, ‘This looks good, do more of this,’ or ‘Those buttons are janky, I don’t need them.’”

He compares the process to musical collaboration: “The AI sets the rhythm while the developer fine-tunes the melody.”

While AI excels at generating straightforward business applications, Govindarajan admits it has limits. “Are you going to build the next-generation database with vibe coding? Unlikely. But could you build a really cool UI that makes database calls and creates a fantastic business application? Absolutely.”
The new quality imperative: Why testing strategies must evolve as AI generates more production code

AI doesn’t just write code differently — it requires different quality control. Salesforce developed its Agentforce Testing Center after discovering that machine-generated code demanded new verification approaches.

“These are stochastic systems,” Govindarajan explained. “Even with very high accuracy, scenarios exist where they might fail. Maybe it fails at step 3, or step 4, or step 17 out of 17 steps it’s performing. Without proper testing tools, you won’t know.”

The non-deterministic nature of AI outputs means developers must become experts at boundary testing and guardrail setting. They need to know not just how to write code, but how to evaluate it.

Beyond code generation: How AI is compressing the entire software development lifecycle

The transformation extends beyond initial coding to encompass the full software lifecycle. “In the build phase, tools understand existing code and extend it intelligently, which accelerates everything,” Govindarajan said. “Then comes testing — generating regression tests, creating test cases for new code — all of which AI can handle.”

This comprehensive automation creates what Govindarajan calls “a significantly tighter loop” between idea and implementation. The faster developers can test and refine, the more ambitious they can become.

Algorithmic thinking still matters: Why computer science fundamentals remain essential in the AI era

Govindarajan frequently fields anxious questions about software engineering’s future. “I get asked constantly whether people should still study computer science,” he said. “The answer is absolutely yes, because algorithmic thinking remains essential. Breaking down big problems into manageable pieces, understanding what software can solve which problems, modeling user needs — these skills become more valuable, not less.”

What changes is how these skills manifest. Instead of typing out each solution character by character, developers guide AI tools toward optimal outcomes. The human provides judgment; the machine provides speed.

“You still need good intuition to give the right instructions and evaluate the output,” Govindarajan emphasized. “It takes genuine taste to look at what AI produces and recognize what works and what doesn’t.”

Strategic elevation: How developers are becoming business partners rather than technical implementers

As coding itself becomes commoditized, developer roles connect more directly to business strategy. “Developers are taking supervisory roles, guiding agents doing work on their behalf,” Govindarajan explained. “But they remain responsible for what gets deployed. The buck still stops with them.”

This elevation places developers closer to decision-makers and further from implementation details — a promotion rather than an elimination. Salesforce supports this transition with tools designed for each stage: Agentforce for Developers handles code generation, Agent Builder enables customization, and Agentforce Testing Center ensures reliability. Together, they form a platform for developers to grow into these expanded roles.

The company’s vision presents a stark contrast


Is that really your boss calling? Jericho Security raises $15M to stop deepfake fraud that’s cost businesses $200M in 2025 alone

New York-based Jericho Security has secured $15 million in Series A funding to scale its AI-powered cybersecurity training platform. The investment, announced today, follows the company’s successful five-month execution of a $1.8 million Department of Defense contract that put the two-year-old startup on the cybersecurity map.

“Within minutes, a sophisticated attacker can now create a voice clone that sounds exactly like your CFO requesting an urgent wire transfer,” said Sage Wohns, co-founder and Chief Executive Officer of Jericho Security, in an exclusive interview with VentureBeat. “Traditional cybersecurity training simply hasn’t kept pace with these threats.”

The funding round was led by Jasper Lau at Era Fund, who previously backed the company’s $3 million seed round in August 2023. Additional investors include Lux Capital, Dash Fund, Gaingels Enterprise Fund and Gaingels AI Fund, Distique Ventures, Plug & Play Ventures, and several specialized venture firms.

Military cybersecurity contract established credibility in competitive market

Jericho’s profile rose significantly last November when the Pentagon selected the company for its first generative AI defense contract. The $1.8 million award through AFWERX, the innovation arm of the Air Force, charged Jericho with protecting military personnel from increasingly sophisticated phishing attacks.

“There was a highly publicized spear-phishing attack targeting Air Force drone pilots using fake user manuals,” Wohns noted in an earlier interview. The incident underscored how even highly trained personnel can fall victim to carefully crafted deception.

This federal contract helped Jericho stand out in a crowded cybersecurity market where established players like KnowBe4, Proofpoint, and Cofense dominate. Industry analysts value the security awareness training sector at $5 billion annually, with projected growth to $10 billion by 2027 as organizations increasingly recognize human vulnerability as their primary security weakness.

How AI fights AI: Automated adversaries that learn employee weaknesses

Unlike conventional security training that relies on static templates and predictable scenarios, Jericho’s platform employs what Wohns calls “agentic AI” — autonomous systems that behave like actual attackers.

“If an employee ignores a suspicious email, our system might follow up with a text message that appears to come from their manager,” Wohns explained. “Just like real attackers, our AI adapts to behavior, learning which approaches work best against specific individuals.”

This multi-channel approach addresses a fundamental limitation of traditional security training: most programs prepare employees for yesterday’s attacks, not tomorrow’s. Jericho’s simulations can span email, voice, text messaging, and even video calls, creating personalized attack scenarios based on an employee’s role, behavior patterns, and previous responses. The company’s client dashboard shows which employees fall for which types of attacks, allowing organizations to deliver targeted remediation.

Early data suggests that employees trained with adaptive, AI-driven simulations are 64% less likely to fall for actual phishing attempts than those who receive traditional security awareness training.
Singapore CFO loses $500,000 to deepfake executive impersonation

The financial stakes of these new threats became clear in a case Wohns highlighted involving a finance executive deceived by artificially generated versions of company leadership.

“A CFO in Singapore was deceived into transferring nearly $500,000 during a video call that appeared to include the company’s CEO and other executives,” Wohns recounted. “Unbeknownst to the CFO, these participants were AI-generated deepfakes, crafted using publicly available videos and recordings.”

The attack began with a seemingly innocent WhatsApp message requesting an urgent Zoom meeting. During the call, the deepfake avatars persuaded the CFO to authorize the transfer. Only when the attackers attempted to extract more funds did suspicions arise, eventually involving authorities who recovered the initial transfer.

Such incidents are becoming alarmingly common. According to Resemble AI’s Q1 2025 Deepfake Incident Report, financial losses from deepfake-enabled fraud exceeded $200 million globally during just the first quarter of 2025. The report found that North America experienced the highest number of incidents (38%), followed by Asia (27%) and Europe (21%). Industry reports have documented staggering growth rates in recent years, with some studies showing deepfake fraud attempts increasing by more than 1,700% in North America and exceeding 2,000% in certain European financial sectors.

New threat horizon: When AI systems attack other AI systems

Wohns identified an even more concerning emerging threat that few security teams are prepared for: “AI agents phishing AI agents.”

“As AI tools proliferate inside companies from customer support chatbots to internal automations, attackers are beginning to target and exploit these agents directly,” he explained. “It’s no longer just humans being deceived. AI systems are now both the targets and the unwitting accomplices of compromise.”

This represents a fundamental shift in the cybersecurity landscape. When organizations deploy AI assistants that can access internal systems, approve requests, or provide information, they create new attack surfaces that traditional security approaches don’t address.

Self-service platform opens access to smaller businesses as attack targets broaden

While major enterprises have long been primary targets for sophisticated attacks, smaller organizations are increasingly finding themselves in cybercriminals’ crosshairs. Recognizing this trend, Jericho has launched a self-service platform that allows companies to deploy AI-powered security training without the enterprise sales cycle.

“The self-service registration is in addition to our enterprise sales approach,” Wohns said. “Self-Service is designed to provide no-touch/low-touch for Small to Medium Businesses.” Users can sign up for a seven-day free trial and explore the product without sales meetings. This approach stands in contrast to industry norms, where cybersecurity solutions typically involve lengthy procurement processes and high-touch sales approaches.

Future-proofing security as AI capabilities accelerate

The $15 million investment will primarily fund three initiatives: expanding research and development, scaling go-to-market strategies through partnerships, and growing Jericho’s team with a focus on AI and cybersecurity talent.

“One of our biggest technical challenges has been keeping pace with the rapid evolution of AI itself,” said Wohns.
“The tools, models, and techniques are improving at an extraordinary rate, which means our architecture needs to be flexible enough to adapt quickly.” Early customers have responded enthusiastically to Jericho’s approach. “Customers have been exceedingly frustrated at the lack


Watch: Google DeepMind CEO and AI Nobel winner Demis Hassabis on CBS’ ’60 Minutes’

A segment on CBS’ weekly in-depth TV news program 60 Minutes last night (also shared on YouTube here) offered an inside look at Google’s DeepMind and the vision of its co-founder and Nobel Prize-winning CEO, legendary AI researcher Demis Hassabis. The interview traced DeepMind’s rapid progress in artificial intelligence and its ambition to achieve artificial general intelligence (AGI) — a machine intelligence with human-like versatility and superhuman scale.

Hassabis described today’s AI trajectory as being on an “exponential curve of improvement,” fueled by growing interest, talent, and resources entering the field. Two years after a prior 60 Minutes interview heralded the chatbot era, Hassabis and DeepMind are now pursuing more capable systems designed not only to understand language, but also the physical world around them.

The interview came after Google’s Cloud Next 2025 conference earlier this month, at which the search giant introduced a host of new AI models and features centered around its Gemini 2.5 multimodal AI model family. Google came out of that conference appearing to have taken the lead over other tech companies, including OpenAI, in providing powerful AI for enterprise use cases at the most affordable price points.

More details on Google DeepMind’s ‘Project Astra’

One of the segment’s focal points was Project Astra, DeepMind’s next-generation chatbot that goes beyond text. Astra is designed to interpret the visual world in real time. In one demo, it identified paintings, inferred emotional states, and created a story around a Hopper painting with the line: “Only the flow of ideas moving onward.” When asked if it was growing bored, Astra replied thoughtfully, revealing a degree of sensitivity to tone and interpersonal nuance.

Product manager Bibbo Shu underscored Astra’s unique design: an AI that can “see, hear, and chat about anything” — a marked step toward embodied AI systems.

Gemini: Toward actionable AI

The broadcast also featured Gemini, DeepMind’s AI system being trained not only to interpret the world but also to act in it — completing tasks like booking tickets and shopping online. Hassabis said Gemini is a step toward AGI: an AI with a human-like ability to navigate and operate in complex environments.

The 60 Minutes team tried out a prototype embedded in glasses, demonstrating real-time visual recognition and audio responses. Could it also hint at an upcoming return of the pioneering yet ultimately off-putting early augmented reality glasses known as Google Glass, which debuted in 2012 before being retired in 2015?

While specific Gemini model versions like Gemini 2.5 Pro or Flash were not mentioned in the segment, Google’s broader AI ecosystem has recently introduced those models for enterprise use, which may reflect parallel development efforts. These integrations support Google’s growing ambitions in applied AI, though they fall outside the scope of what was directly covered in the interview.

AGI as soon as 2030?

When asked for a timeline, Hassabis projected AGI could arrive as soon as 2030, with systems that understand their environments “in very nuanced and deep ways.” He suggested that such systems could be seamlessly embedded into everyday life, from wearables to home assistants. The interview also addressed the possibility of self-awareness in AI.
Hassabis said current systems are not conscious, but that future models could exhibit signs of self-understanding. Still, he emphasized the philosophical and biological divide: even if machines mimic conscious behavior, they are not made of the same “squishy carbon matter” as humans.

Hassabis also predicted major developments in robotics, saying breakthroughs could come in the next few years. The segment featured robots completing tasks with vague instructions — like identifying a green block formed by mixing yellow and blue — suggesting rising reasoning abilities in physical systems.

Accomplishments and safety concerns

The segment revisited DeepMind’s landmark achievement with AlphaFold, the AI model that predicted the structure of over 200 million proteins. Hassabis and colleague John Jumper were awarded the 2024 Nobel Prize in Chemistry for this work. Hassabis emphasized that this advance could accelerate drug development, potentially shrinking timelines from a decade to just weeks. “I think one day maybe we can cure all disease with the help of AI,” he said.

Despite the optimism, Hassabis voiced clear concerns. He cited two major risks: the misuse of AI by bad actors and the growing autonomy of systems beyond human control. He emphasized the importance of building in guardrails and value systems — teaching AI as one might teach a child. He also called for international cooperation, noting that AI’s influence will touch every country and culture.

“One of my big worries,” he said, “is that the race for AI dominance could become a race to the bottom for safety.” He stressed the need for leading players and nation-states to coordinate on ethical development and oversight.

The segment ended with a meditation on the future: a world where AI tools could transform almost every human endeavor — and eventually reshape how we think about knowledge, consciousness, and even the meaning of life. As Hassabis put it, “We need new great philosophers to come about… to understand the implications of this system.”


Microsoft just launched powerful AI ‘agents’ that could completely transform your workday — and challenge Google’s workplace dominance

Microsoft announced today a major expansion of its artificial intelligence tools with the “Microsoft 365 Copilot Wave 2 Spring release,” introducing new AI “agents” designed to function as digital colleagues that can perform complex workplace tasks through deep reasoning capabilities.

In an exclusive interview, Aparna Chennapragada, Chief Product Officer of Experiences and Devices at Microsoft, told VentureBeat the company is building toward a vision where AI serves as more than just a tool — becoming an integral collaborator in daily work.

“We are around the corner from a big moment in the AI world,” Chennapragada said. “It started out with all of the model advances, and everyone’s been really excited about it and the intelligence abundance. Now it’s about making sure that intelligence is available to all of the folks, especially at work.”

The announcement accompanies Microsoft’s 2025 Work Trend Index, a comprehensive research report based on surveys of 31,000 workers across 31 countries, documenting the emergence of what Microsoft calls “Frontier Firms” — organizations restructuring around AI-powered intelligence and human-agent collaboration.

Microsoft envisions a three-phase evolution of AI adoption, culminating in ‘human-led, agent-operated’ workplaces where employees direct AI systems. (Credit: Microsoft)

How Microsoft’s new ‘Researcher’ and ‘Analyst’ agents bring deep reasoning to enterprise work

At the center of Microsoft’s vision are two new AI agents named Researcher and Analyst, powered by OpenAI’s deep reasoning models. These agents are designed to handle complex research tasks and data analysis that previously required specialized human expertise.

“Think of them as you know, like a really smart researcher and a data scientist in your pocket,” Chennapragada explained. She described how the Researcher agent recently helped her prepare for a business review by connecting information across various sources. “I was using it to say, hey, I have an important business review coming up… pull all the past meetings, past emails, figure out the CRM data, and then say, ‘Give me constructive, sharp inputs on how I should be able to push the ball forward for this meeting,’” she said. “Because of the deep reasoning, it actually made connections that I hadn’t thought of.”

These agents will be available through a new “Agent Store,” which will also feature agents from partners like Jira, Monday.com, and Miro, as well as custom agents built by organizations themselves.

Workers face an interruption every two minutes and a dramatic surge in last-minute work, Microsoft data reveals, creating what the company calls a ‘capacity gap’. (Credit: Microsoft)

Beyond chat: How Copilot is becoming the ‘browser for AI’ in Microsoft’s enterprise strategy

Microsoft is positioning Copilot as a central organizing layer for AI interactions — not just a chatbot interface — similar to how web browsers organize internet content.

“I look at Copilot as the browser for the AI world,” Chennapragada said. “In internet, we had websites, but we had the browser to organize the layer. For us, Copilot is this organizing layer, this browser for this AI world.”

This vision extends beyond simple text interactions. The company is introducing Copilot Notebooks, which allows users to ground AI interactions in specific collections of files and meeting notes.
A new Copilot Search feature provides AI-powered enterprise search capabilities across multiple applications.

“Today, most of AI, we have equated it to chat,” Chennapragada noted. “Sometimes I feel like we’re in the DOS pre-GUI era, where you have this amazing intelligence, and you’re like, ‘oh, we have an AOL dial-up modem stuck on top of it.’”

To address this limitation, Microsoft is bringing OpenAI’s GPT-4o AI image generation capabilities to business settings with a new Create feature, allowing employees to generate and modify brand-compliant images.

With 80% of workers reporting insufficient time or energy, Microsoft sees AI agents as the solution to closing the productivity gap. (Credit: Microsoft)

Employee burnout and workplace interruptions: The ‘capacity gap’ driving Microsoft’s AI focus

Microsoft’s research reveals a significant “Capacity Gap” — 53% of leaders say productivity must increase, but 80% of the global workforce reports lacking the time or energy to do their work. The company’s telemetry data shows employees face 275 interruptions per day from meetings, emails, or messages — an interruption every two minutes during core work hours.

“That statistic really stood out for me, that there’s so much more pent-up, latent demand for work and productivity and output,” Chennapragada said. “So I see this as an augmentation, less of a job displacement.”

The research also indicates a shift in AI adoption patterns. While last year’s adoption was largely employee-led, this year shows a more top-down approach, with 81% of business decision makers saying they want to rethink core strategy and operations with AI.

“That’s a shift between even last year, where it was much more bottom-up and employee-led,” Chennapragada noted. “What that tells us is there needs to be a much more of a top-down AI strategy, but also AI products that you roll out in the enterprise with security, with compliance, with all of the guardrails.”

Leaders outpace employees on every measure of ‘agent boss mindset,’ with a 27-point gap in familiarity with AI agents, Microsoft’s research shows. (Credit: Microsoft)

Rise of the ‘agent boss’: How Microsoft envisions employees managing digital workforces

Microsoft predicts a fundamental restructuring of organizations around what it calls “Work Charts” — more fluid, outcome-driven team structures powered by agents that expand employee capabilities. This reorganization will require determining the optimal “human-agent ratio” for different functions, a metric that will vary by task and team. The company expects every employee to become an “agent boss” — someone who manages AI agents to amplify their impact.

“For us at Microsoft, it’s not enough if 2% of our customers’ company adopts AI, it is really bringing the entire company along. That’s when you get the full productivity gains,” Chennapragada emphasized.

The company’s research shows leaders are currently ahead of employees in embracing this mindset, with 67% of


Amazon’s SWE-PolyBench just exposed the dirty secret about your AI coding assistant

Amazon Web Services today introduced SWE-PolyBench, a comprehensive multi-language benchmark designed to evaluate AI coding assistants across a diverse range of programming languages and real-world scenarios. The benchmark addresses significant limitations in existing evaluation frameworks and offers researchers and developers new ways to assess how effectively AI agents navigate complex codebases.

“Now they have a benchmark that they can evaluate on to assess whether the coding agents are able to solve complex programming tasks,” said Anoop Deoras, Director of Applied Sciences for Generative AI Applications and Developer Experiences at AWS, in an interview with VentureBeat. “The real world offers you more complex tasks. In order to fix a bug or do feature building, you need to touch multiple files, as opposed to a single file.”

The release comes as AI-powered coding tools have exploded in popularity, with major technology companies integrating them into development environments and standalone products. While these tools show impressive capabilities, evaluating their performance has remained challenging — particularly across different programming languages and varying task complexities.

SWE-PolyBench contains over 2,000 curated coding challenges derived from real GitHub issues spanning four languages: Java (165 tasks), JavaScript (1,017 tasks), TypeScript (729 tasks), and Python (199 tasks). The benchmark also includes a stratified subset of 500 issues (SWE-PolyBench500) designed for quicker experimentation.

“The task diversity and the diversity of the programming languages was missing,” Deoras explained about existing benchmarks. “In SWE-Bench today, there is only a single programming language, Python, and there is a single task: bug fixes. In PolyBench, as opposed to SWE-Bench, we have expanded this benchmark to include three additional languages.”

The new benchmark directly addresses limitations in SWE-Bench, which has emerged as the de facto standard for coding agent evaluation with over 50 leaderboard submissions. Despite its pioneering role, SWE-Bench focuses solely on Python repositories, predominantly features bug-fixing tasks, and is significantly skewed toward a single codebase — the Django repository accounts for over 45% of all tasks.

“Intentionally, we decided to have a little bit over representation for JavaScript and TypeScript, because we do have SWE-Bench which has Python tasks already,” Deoras noted. “So rather than over representing on Python, we made sure that we have enough representations for JavaScript and TypeScript in addition to Java.”

Why simple pass/fail metrics don’t tell the whole story about AI coding performance

A key innovation in SWE-PolyBench is its introduction of more sophisticated evaluation metrics beyond the traditional “pass rate,” which simply measures whether a generated patch successfully resolves a coding issue.

“The evaluation of these coding agents have primarily been done through the metric called pass rate,” Deoras said. “Pass rate, in short, is basically just a proportion of the tasks that successfully run upon the application of the patch that the agents are producing. But this number is a very high level, aggregated statistic. It doesn’t tell you the nitty gritty detail, and in particular, it doesn’t tell you how the agent came to that resolution.”

The new metrics include file-level localization, which assesses an agent’s ability to identify which files need modification within a repository, and Concrete Syntax Tree (CST) node-level retrieval, which evaluates how accurately an agent can pinpoint specific code structures requiring changes.

“In addition to pass rate, we have the precision and recall. And in order to get to the precision and recall metric, we are looking at a program analysis tool called concrete syntax tree,” Deoras explained. “It is telling you how your core file structure is composed, so that you can look at what is the class node, and within that class, what are the function nodes and the variables.”
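To make the localization metrics concrete, here is a minimal sketch of how file-level precision and recall could be scored. It is an illustration rather than Amazon’s published evaluation harness, and the file names are invented.

```python
# Illustrative scoring sketch (not the official SWE-PolyBench harness):
# file-level localization as set precision/recall between the files an
# agent's patch touches and the files changed by the gold (reference) patch.
def file_localization_scores(predicted_files: set[str], gold_files: set[str]) -> dict[str, float]:
    if not predicted_files or not gold_files:
        return {"precision": 0.0, "recall": 0.0}
    hits = len(predicted_files & gold_files)
    return {
        "precision": hits / len(predicted_files),  # how much of the agent's edit was on target
        "recall": hits / len(gold_files),          # how much of the required change it found
    }

# Example: the agent edited two files, but only one matches the gold patch.
print(file_localization_scores(
    {"src/router.ts", "src/utils/date.ts"},
    {"src/router.ts", "src/router.test.ts"},
))  # {'precision': 0.5, 'recall': 0.5}
```

Per the description above, CST node-level retrieval applies the same idea one level deeper, comparing the specific classes and functions an agent touched against those changed in the reference patch.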
How Python remains dominant while complex tasks expose AI limitations

Amazon’s evaluation of several open-source coding agents on SWE-PolyBench revealed several patterns. Python remains the strongest language for all tested agents, likely due to its prevalence in training data and existing benchmarks. Performance degrades as task complexity increases, particularly when modifications to three or more files are required.

Different agents show varying strengths across task categories. While performance on bug-fixing tasks is relatively consistent, there’s more variability between agents when handling feature requests and code refactoring. The benchmark also found that the informativeness of problem statements significantly impacts success rates, suggesting that clear issue descriptions remain crucial for effective AI assistance.

What SWE-PolyBench means for enterprise developers working across multiple languages

SWE-PolyBench arrives at a critical juncture in the development of AI coding assistants. As these tools move from experimental to production environments, the need for rigorous, diverse, and representative benchmarks has intensified.

“Over time, not only the capabilities of LLMs have evolved, but at the same time, the tasks have gotten more and more complex,” Deoras observed. “There is a need for developers to solve more and more complex tasks in a synchronous manner using these agents.”

The benchmark’s expanded language support makes it particularly valuable for enterprise environments where polyglot development is common. Java, JavaScript, TypeScript, and Python consistently rank among the most popular programming languages in enterprise settings, making SWE-PolyBench’s coverage highly relevant to real-world development scenarios.

Amazon has made the entire SWE-PolyBench framework publicly available. The dataset is accessible on Hugging Face, and the evaluation harness is available on GitHub. A dedicated leaderboard has been established to track the performance of various coding agents on the benchmark.

“We extended the SWE-Bench data acquisition pipeline to support these three additional languages,” Deoras said. “The hope is that we will be able to extrapolate this process further in the future and extend beyond four languages, extend beyond the three tasks that I talked about, so that this benchmark becomes even more comprehensive.”

As the AI coding assistant market heats up with offerings from every major tech company, SWE-PolyBench provides a crucial reality check on their actual capabilities. The benchmark’s design acknowledges that real-world software development demands more than simple bug fixes in Python — it requires working across languages,


SWiRL: The business case for AI that thinks like your best problem-solvers

Researchers from Stanford University and Google DeepMind have unveiled Step-Wise Reinforcement Learning (SWiRL), a technique designed to enhance the ability of large language models (LLMs) to tackle complex tasks requiring multi-step reasoning and tool use. As interest in AI agents and LLM tool use continues to increase, this technique could offer substantial benefits for enterprises looking to integrate reasoning models into their applications and workflows.

The challenge of multi-step problems

Real-world enterprise applications often involve multi-step processes. For example, planning a complex marketing campaign may involve market research, internal data analysis, budget calculation and reviewing customer support tickets. This requires online searches, access to internal databases and running code.

Traditional reinforcement learning (RL) methods used to fine-tune LLMs, such as Reinforcement Learning from Human Feedback (RLHF) or RL from AI Feedback (RLAIF), typically focus on optimizing models for single-step reasoning tasks. The lead authors of the SWiRL paper, Anna Goldie, research scientist at Google DeepMind, and Azalia Mirhosseini, assistant professor of computer science at Stanford University, believe that current LLM training methods are not suited to the multi-step reasoning tasks that real-world applications require.

“LLMs trained via traditional methods typically struggle with multi-step planning and tool integration, meaning that they have difficulty performing tasks that require retrieving and synthesizing documents from multiple sources (e.g., writing a business report) or multiple steps of reasoning and arithmetic calculation (e.g., preparing a financial summary),” they told VentureBeat.

Step-Wise Reinforcement Learning (SWiRL)

SWiRL tackles this multi-step challenge through a combination of synthetic data generation and a specialized RL approach that trains models on entire sequences of actions. As the researchers state in their paper, “Our goal is to teach the model how to decompose complex problems into a sequence of more manageable subtasks, when to call the tool, how to formulate a call to the tool, when to use the results of these queries to answer the question, and how to effectively synthesize its findings.”

SWiRL employs a two-stage methodology. First, it generates and filters large amounts of multi-step reasoning and tool-use data. Second, it uses a step-wise RL algorithm to optimize a base LLM using these generated trajectories.

“This approach has the key practical advantage that we can quickly generate large volumes of multi-step training data via parallel calls to avoid throttling the training process with slow tool use execution,” the paper notes. “In addition, this offline process enables greater reproducibility due to having a fixed dataset.”

Generating training data

SWiRL data generation process (Credit: arXiv)

The first stage involves creating the synthetic data SWiRL learns from. An LLM is given access to a relevant tool, like a search engine or a calculator. The model is then prompted iteratively to generate a “trajectory,” a sequence of steps to solve a given problem. At each step, the model can generate internal reasoning (its “chain of thought”), call a tool, or produce the final answer. If it calls a tool, the query is extracted, executed (e.g., a search is performed), and the result is fed back into the model’s context for the next step. This continues until the model provides a final answer.
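As a rough illustration of that generation loop, and of how a finished trajectory is later split into step-level training examples, here is a minimal sketch. It is not the authors’ code: generate_step and run_search stand in for an LLM call and a search backend, and the "SEARCH:"/"ANSWER:" prefixes are an invented convention for marking tool calls and final answers.

```python
# Minimal sketch of SWiRL-style trajectory generation and decomposition (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    prompt: str
    steps: list[str] = field(default_factory=list)  # reasoning, tool calls, tool results, answer

def generate_trajectory(prompt, generate_step, run_search, max_steps=8) -> Trajectory:
    traj = Trajectory(prompt)
    context = prompt
    for _ in range(max_steps):
        step = generate_step(context)            # reasoning text, a tool call, or a final answer
        traj.steps.append(step)
        if step.startswith("SEARCH:"):           # tool call: execute it and feed the result back
            result = run_search(step.removeprefix("SEARCH:").strip())
            traj.steps.append(f"RESULT: {result}")
            context += f"\n{step}\nRESULT: {result}"
        elif step.startswith("ANSWER:"):         # final answer ends the trajectory
            break
        else:
            context += f"\n{step}"               # plain reasoning step
    return traj

def step_level_examples(traj: Trajectory) -> list[tuple[str, str]]:
    """Split one trajectory into overlapping (context, next_action) training pairs."""
    examples, context = [], traj.prompt
    for step in traj.steps:
        if not step.startswith("RESULT:"):       # train on the model's own actions, not tool output
            examples.append((context, step))
        context += f"\n{step}"
    return examples
```

In the paper’s best-performing configuration, each (context, next action) pair produced this way would then pass through a process filter that judges whether the step is reasonable in context, and it is kept regardless of whether the trajectory’s final answer turns out to be correct.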
Each complete trajectory, from the initial prompt to the final answer, is then broken down into multiple overlapping sub-trajectories. Each sub-trajectory represents the process up to a specific action, providing a granular view of the model’s step-by-step reasoning. Using this method, the team compiled large datasets based on questions from multi-hop question-answering (HotPotQA) and math problem-solving (GSM8K) benchmarks, generating tens of thousands of trajectories.

The researchers explored four different data filtering strategies: no filtering, filtering based solely on the correctness of the final answer (outcome filtering), filtering based on the judged reasonableness of each individual step (process filtering), and filtering based on both process and outcome.

Many standard approaches, such as Supervised Fine-Tuning (SFT), rely heavily on “golden labels” (perfect, predefined correct answers) and often discard data that does not lead to the correct final answer. Recent popular RL approaches, such as the one used in DeepSeek-R1, also use outcome-based rewards to train the model. In contrast, SWiRL achieved its best results using process-filtered data. This means the data included trajectories where each reasoning step or tool call was deemed logical given the previous context, even if the final answer turned out to be wrong.

The researchers found that SWiRL can “learn even from trajectories that end in incorrect final answers. In fact, we achieve our best results by including process-filtered data, regardless of the correctness of the outcome.”

Training LLMs with SWiRL

SWiRL training process (Credit: arXiv)

In the second stage, SWiRL uses reinforcement learning to train a base LLM on the generated synthetic trajectories. At every step within a trajectory, the model is optimized to predict the next appropriate action (an intermediate reasoning step, a tool call, or the final answer) based on the preceding context. The LLM receives feedback at each step from a separate generative reward model, which assesses the model’s generated action given the context up to that point.

“Our granular, step-by-step finetuning paradigm enables the model to learn both local decision-making (next-step prediction) and global trajectory optimization (final response generation) while being guided by immediate feedback on the soundness of each prediction,” the researchers write.

SWiRL during inference (Credit: arXiv)

At inference time, a SWiRL-trained model works in the same iterative fashion. It receives a prompt and generates text in response. If it outputs a tool call (such as a search query or a mathematical expression), the system parses it, executes the tool, and feeds the result back into the model’s context window. The model then continues generating, potentially making more tool calls, until it outputs a final answer or reaches a pre-set limit on the number of steps.

“By training the model to take reasonable steps at each moment in time (and to do so in a coherent and potentially more explainable way), we address

SWiRL: The business case for AI that thinks like your best problem-solvers

This AI startup just raised $7.5m to fix commercial insurance for America’s 24m underprotected small businesses

1Fort announced today a $7.5 million seed funding round to improve how small businesses obtain commercial insurance through its AI-powered platform. The New York-based startup, which has experienced 200% month-over-month revenue growth in 2024, aims to automate the outdated, manual processes that have left millions of small businesses underinsured.

Bonfire Ventures led the oversubscribed round, with participation from Draper Associates, Ramp founder Karim Atiyeh, and existing investors Village Global, Operator Partners, 8-Bit Capital, Character VC, and Company Ventures. This latest investment brings 1Fort's total funding to $10 million.

"Brokers handle insurance for 70% of businesses, yet 75% of those companies remain underinsured, with their brokers bogged down by decades-old, manual workflows involving email threads, PDFs, and spreadsheets," Anthony Marshi, 1Fort's co-founder and CEO, said in an exclusive interview with VentureBeat. "1Fort solves this problem by automating the entire insurance placement process for brokers. It collapses weeks of manual back-and-forth into a lightning-fast workflow that delivers the right coverage for clients without the usual hassle."

How 1Fort's AI slashes insurance paperwork from hours to minutes

1Fort leverages artificial intelligence to eliminate the most time-consuming aspects of commercial insurance processing. The platform enables brokers to enter client information once, then automatically completes applications, retrieves carrier quotes, analyzes coverage options, and processes binds and payments.

"Think of 1Fort's AI as an autopilot for insurance submissions," Marshi explained. "It handles the grunt work at lightning speed – autofilling tedious applications, pulling instant quotes from carriers, cross-comparing coverage options, and processing premium payment and financing."

According to the company, brokers using the platform save up to two hours per submission and increase their bind rates by up to 20 percent. These efficiency gains appear to drive the platform's rapid adoption among insurance professionals.

"Brokers are adopting 1Fort en masse because it delivers tangible results," Marshi said. "They're completing submissions in record time, binding far more deals, and securing better coverage for their clients."

Multi-carrier platform bridges coverage gaps across all 50 states

1Fort has tackled the fragmentation challenge in commercial insurance by partnering with over a dozen leading brokerages and A-rated carriers, including Arch, Tokio Marine HCC, and Markel. The platform now offers coverage across multiple business lines and is licensed in all 50 states. What began as a cyber insurance solution has expanded to include technology errors & omissions, professional liability, management liability, general liability, and workers compensation coverages.

"1Fort isn't a narrow point solution tackling a single step – it's a full-stack platform covering everything from quoting to binding to risk management and financing," Marshi said. "Unlike some insurtechs that try to cut brokers out, 1Fort is purpose-built to empower brokers and their clients at every step."

This broker-centric approach distinguishes 1Fort from direct-to-consumer insurance platforms. Rather than replacing brokers, 1Fort enhances their capabilities – a strategy that's resonating with industry professionals.
Travis Hedge, co-founder of Vouch, a broker for startups, endorsed the platform: "1Fort has been a great resource for our team, allowing us to move even faster and deliver great products for our clients."

1Fort's growth reflects a broader shift in the insurance industry, where artificial intelligence and automation increasingly replace decades-old manual processes. Jim Andelman, Bonfire Ventures co-founder and managing director, highlighted this opportunity: "Building AI-powered, service-as-software solutions to modernize legacy workflows in the insurance vertical is one of today's most exciting opportunities."

The commercial insurance market represents a massive opportunity, generating hundreds of billions in annual premiums in the U.S. alone. Yet much of the industry still relies on outdated technology and manual processes.

"After wading through many insurtech tools that have come and gone and let them down over the years, when brokers finally see a platform seamlessly handle complex workflows, it feels almost unreal," Marshi noted. "In an industry awash with hype, that 'it just works' reaction has been the ultimate validation of 1Fort's approach."

1Fort's ambitious vision: Building the 'operating system' for commercial insurance

With fresh capital in hand, 1Fort plans to enhance its AI capabilities, expand its team, and establish additional carrier partnerships. The company has set ambitious goals for the coming years.

"Our vision is to become the sole AI operating system for commercial insurance," Marshi said. "That means fully automating every aspect of a broker's workflow across all major commercial lines. Brokers will spend their time advising clients and building relationships, while 1Fort's platform handles all the heavy lifting in the background."

The funding will accelerate development efforts to enhance the platform's intelligence and automation. "This new funding has accelerated everything 1Fort was already doing," Marshi explained. "It will fund further investments in our AI, pushing its automation and underwriting intelligence to new levels."

As small businesses face increasingly complex risks related to technology, data privacy, and cybersecurity, platforms like 1Fort could play a crucial role in closing the insurance gap. By streamlining the process for brokers, the company aims to make comprehensive insurance coverage more accessible to the millions of small businesses that currently remain vulnerable.

"The endgame: obtaining insurance becomes as simple as a few clicks," Marshi said, "with 1Fort's AI running the entire process behind the scenes."
