VentureBeat

Arcee opens up new enterprise-focused, customizable AI model AFM-4.5B trained on ‘clean, rigorously filtered data’

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

Arcee.ai, a startup focused on developing small AI models for commercial and enterprise use, is opening up its AFM-4.5B model for limited free usage by small companies — posting the weights on Hugging Face and allowing enterprises that make less than $1.75 million in annual revenue to use it without charge under a custom “Arcee Model License.” Designed for real-world enterprise use, the 4.5-billion-parameter model — much smaller than the tens of billions to trillions of parameters in leading frontier models — combines cost efficiency, regulatory compliance, and strong performance in a compact footprint.

AFM-4.5B was one half of a two-part release Arcee made last month. It is already “instruction tuned” — an “instruct” model designed for chat, retrieval, and creative writing — and can be deployed immediately for these use cases in enterprises. A base model was released at the same time that was not instruction tuned, only pre-trained, allowing more customizability for customers. However, both were available only under commercial licensing terms — until now. Arcee’s chief technology officer (CTO) Lucas Atkins also noted in a post on X that more “dedicated models for reasoning and tool use are on the way.”

AI Scaling Hits Its Limits Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are: Secure your spot to stay ahead: https://bit.ly/4mwGngO

“Building AFM-4.5B has been a huge team effort, and we’re deeply grateful to everyone who supported us. We can’t wait to see what you build with it,” he wrote in another post. “We’re just getting started.
If you have feedback or ideas, please don’t hesitate to reach out at any time.”

The model is available now for deployment across a variety of environments — from cloud to smartphones to edge hardware. It’s also geared toward Arcee’s growing list of enterprise customers and their needs — specifically, a model trained without violating intellectual property. As Arcee wrote in its initial AFM-4.5B announcement post last month: “Tremendous effort was put towards excluding copyrighted books and material with unclear licensing.” Arcee notes it worked with third-party data curation firm DatologyAI to apply techniques like source mixing, embedding-based filtering, and quality control — all aimed at minimizing hallucinations and IP risks.

Focused on enterprise customer needs

AFM-4.5B is Arcee.ai’s response to what it sees as major pain points in enterprise adoption of generative AI: high cost, limited customizability, and regulatory concerns around proprietary large language models (LLMs). Over the past year, the Arcee team held discussions with more than 150 organizations, ranging from startups to Fortune 100 companies, to understand the limitations of existing LLMs and define its own model goals. According to the company, many businesses found mainstream LLMs — such as those from OpenAI, Anthropic, or DeepSeek — too expensive and difficult to tailor to industry-specific needs. Meanwhile, smaller open-weight models like Llama, Mistral, and Qwen offered more flexibility but introduced concerns around licensing, IP provenance, and geopolitical risk. AFM-4.5B was developed as a “no-trade-offs” alternative: customizable, compliant, and cost-efficient without sacrificing model quality or usability. AFM-4.5B is designed with deployment flexibility in mind.
It can operate in cloud, on-premises, hybrid, or even edge environments, thanks to its efficiency and compatibility with open frameworks such as Hugging Face Transformers, llama.cpp, and (pending release) vLLM. The model supports quantized formats, allowing it to run on lower-RAM GPUs or even CPUs, making it practical for applications with constrained resources.

Company vision secures backing

Arcee.ai’s broader strategy focuses on building domain-adaptable, small language models (SLMs) that can power many use cases within the same organization. As CEO Mark McQuade explained in a VentureBeat interview last year, “You don’t need to go that big for business use cases.” The company emphasizes fast iteration and model customization as core to its offering. This vision gained investor backing with a $24 million Series A round in 2024.

Inside AFM-4.5B’s architecture and training process

The AFM-4.5B model uses a decoder-only transformer architecture with several optimizations for performance and deployment flexibility. It incorporates grouped query attention for faster inference and ReLU² activations in place of SwiGLU to support sparsification without degrading accuracy. Training followed a three-phase approach:

Pretraining on 6.5 trillion tokens of general data

Midtraining on 1.5 trillion tokens emphasizing math and code

Instruction tuning using high-quality instruction-following datasets and reinforcement learning with verifiable and preference-based feedback

To meet strict compliance and IP standards, the model was trained on nearly 7 trillion tokens of data curated for cleanliness and licensing safety.

A competitive model, but not a leader

Despite its smaller size, AFM-4.5B performs competitively across a broad range of benchmarks. The instruction-tuned version averages a score of 50.13 across evaluation suites such as MMLU, MixEval, TriviaQA, and AGIEval, matching or outperforming similar-sized models like Gemma-3 4B-it, Qwen3-4B, and SmolLM3-3B.
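One reason the grouped query attention mentioned above speeds up inference is that many query heads share a small set of key/value heads, shrinking the KV cache that must sit in GPU memory during generation. A back-of-the-envelope sketch (the layer count, head counts, and dimensions below are hypothetical illustrations, not AFM-4.5B's published configuration):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Memory for the key/value cache: 2 tensors (K and V) per layer,
    each of shape (n_kv_heads, seq_len, head_dim), at fp16 (2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical ~4B-class decoder: 32 layers, 32 query heads of dim 128.
# Standard multi-head attention keeps one KV head per query head ...
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
# ... while grouped query attention shares 8 KV heads among the 32 query heads.
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

print(mha // 2**20, "MiB vs", gqa // 2**20, "MiB")  # cache shrinks 4x
```

The arithmetic is the whole point: with a quarter of the KV heads, the per-token cache is a quarter of the size, which is what makes longer contexts feasible on lower-RAM GPUs.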
Multilingual testing shows the model delivers strong performance across more than 10 languages, including Arabic, Mandarin, German, and Portuguese. According to Arcee, adding support for additional dialects is straightforward thanks to its modular architecture. AFM-4.5B has also shown strong early traction in public evaluation environments. In a leaderboard that ranks conversational model quality by user votes and win rate, the model ranks third overall, trailing only Claude Opus 4 and Gemini 2.5 Pro. It boasts a win rate of 59.2% and the fastest latency of any top model at 0.2 seconds, paired with a generation speed of 179 tokens per second.

Built-in support for agents

In addition to general capabilities, AFM-4.5B comes with built-in support for function calling and agentic reasoning. These features aim to simplify the process of building AI agents and workflow automation tools, reducing the need for complex prompt engineering or orchestration layers. This functionality aligns with Arcee’s broader strategy of enabling enterprises to build custom, production-ready models faster, with lower total cost of ownership (TCO) and easier


AI’s promise of opportunity masks a reality of managed displacement

Cognitive migration is underway. The station is crowded. Some have boarded while others hesitate, unsure whether the destination justifies the departure. Future of work expert and Harvard University professor Christopher Stanton commented recently that the uptake of AI has been tremendous, observing that it is an “extraordinarily fast-diffusing technology.” That speed of adoption and impact is a critical part of what differentiates the AI revolution from previous technology-led transformations, like the PC and the internet. Demis Hassabis, CEO of Google DeepMind, went further, predicting that AI could be “10 times bigger than the Industrial Revolution, and maybe 10 times faster.”

Intelligence, or at least thinking, is increasingly shared between people and machines. Some people have begun to regularly use AI in their workflows. Others have gone further, integrating it into their cognitive routines and creative identities. These are the “willing”: the consultants fluent in prompt design, the product managers retooling systems and those building their own businesses that do everything from coding to product design to marketing. For them, the terrain feels new but navigable. Exciting, even.

But for many others, this moment feels strange, and more than a little unsettling. The risk they face is not just being left behind. It is not knowing how, when and whether to invest in AI, a future that seems highly uncertain and one in which it is difficult to imagine their place. That is the double risk of AI readiness, and it is reshaping how people interpret the pace, promises and pressure of this transition.
Is it real?

Across industries, new roles and teams are forming, and AI tools are reshaping workflows faster than norms or strategies can keep up. But the significance is still hazy, the strategies unclear. The end game, if there is one, remains uncertain. Yet the pace and scope of change feel portentous. Everyone is being told to adapt, but few know exactly what that means or how far the changes will go. Some AI industry leaders claim huge changes are coming, and soon, with superintelligent machines emerging possibly within a few years.

But maybe this AI revolution will go bust, as others have before, with another “AI winter” to follow. There have been two notable winters. The first was in the 1970s, brought about by computational limits. The second began in the late 1980s after a wave of unmet expectations, with high-profile failures and under-delivery of “expert systems.” These winters were characterized by a cycle of lofty expectations followed by profound disappointment, leading to significant reductions in funding and interest in AI. Should the excitement around AI agents today mirror the failed promise of expert systems, this could lead to another winter. However, there are major differences between then and now. Today, there is far greater institutional buy-in, consumer traction and cloud computing infrastructure than in the expert-systems era of the 1980s. There is no guarantee that a new winter will not emerge, but if the industry fails this time, it will not be for lack of money or momentum. It will be because trust and reliability broke first. A major retrenchment occurred in 1988 after the AI industry failed to meet its promises. (Source: The New York Times)

Cognitive migration has started

If “the great cognitive migration” is real, this remains the early part of the journey.
Some have boarded the train while others still linger, unsure whether or when to get on board. Amid the uncertainty, the atmosphere at the station has grown restless, like travelers sensing a trip itinerary change that no one has announced. Most people have jobs, but they wonder about the degree of risk they face. The value of their work is shifting. A quiet but mounting anxiety hums beneath the surface of performance reviews and company town halls. Already, AI can accelerate software development by 10 to 100X, generate the majority of client-facing code and compress project timelines dramatically. Managers are now able to use AI to create employee performance evaluations. Even classicists and archaeologists have found value in AI, having used the technology to understand ancient Latin inscriptions.

The “willing” have an idea of where they are going and may find traction. But for the “pressured,” the “resistant” and even those not yet touched by AI, this moment feels like something between anticipation and grief. These groups have started to grasp that they may not be staying in their comfort zones for long. For many, this is not just about tools or a new culture, but whether that culture has space for them at all. Waiting too long is akin to missing the train and could lead to long-term job displacement. Even those I have spoken with who are senior in their careers and have begun using AI wonder if their positions are threatened.

The narrative of opportunity and upskilling hides a more uncomfortable truth. For many, this is not a migration. It is a managed displacement. Some workers are not choosing to opt out of AI. They are discovering that the future being built does not include them. Belief in the tools is different from belonging in the system those tools are reshaping. And without a clear path to participate meaningfully, “adapt or be left behind” begins to sound less like advice and more like a verdict.
These tensions are precisely why this moment matters. There is a growing sense that work, as they have known it, is beginning to recede. The signals are coming from the top. Microsoft CEO Satya Nadella acknowledged as much in a July 2025 memo following a reduction in force, noting that the transition to the AI era “might feel messy at times, but


Deep Cogito goes big, releasing 4 new open source hybrid reasoning models with self-improving ‘intuition’

Deep Cogito, a lesser-known AI research startup based in San Francisco and founded by ex-Googlers, has released four new open-ish large language models (LLMs) that attempt something few others do: learning how to reason more effectively over time, and getting better at it on their own. The models, released as part of Cogito’s v2 family, range from 70 billion to 671 billion parameters and are available for AI developers and enterprises to use under a mix of limited and fully open licensing terms. They include:

Cogito v2-70B (Dense)

Cogito v2-109B (Mixture-of-experts)

Cogito v2-405B (Dense)

Cogito v2-671B (MoE)

Dense and MoE models are each suited to different needs. The dense 70B and 405B variants activate all parameters on every forward pass, making them more predictable and easier to deploy across a wide range of hardware. They’re ideal for low-latency applications, fine-tuning and environments with limited GPU capacity. MoE models, such as the 109B and 671B versions, use a sparse routing mechanism to activate only a few specialized “expert” subnetworks at a time, allowing for much larger total model sizes without proportional increases in compute cost. This makes them well-suited for high-performance inference tasks, research into complex reasoning or serving frontier-level accuracy at lower runtime expense. In Cogito v2, the 671B MoE model serves as the flagship, leveraging its scale and routing efficiency to match or exceed leading open models on benchmarks — while using significantly shorter reasoning chains.
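The sparse routing described above can be illustrated with a toy gating function: a learned gate scores every expert, but only the top-k actually execute, so compute scales with k rather than with the total number of experts. This is a minimal sketch with scalar "experts"; real MoE layers route each token inside a transformer block, and the gate weights here are made up for illustration:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Score all experts with a linear gate, but execute only the top-k."""
    logits = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    scores = softmax(logits)
    top_k = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in top_k)
    # Only the k selected expert functions are ever called.
    return sum(scores[i] / norm * experts[i](x) for i in top_k), top_k

# Four toy "experts" (each just scales the input sum differently).
experts = [lambda x, s=s: s * sum(x) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1, 0], [0, 1], [1, 1], [-1, 0]]
out, chosen = moe_forward([0.5, 1.5], experts, gate_weights, k=2)
print(chosen)  # only 2 of the 4 experts were activated for this input
```

The parameter count grows with the number of experts, but each forward pass only pays for k of them, which is why a 671B-total-parameter MoE can serve at a fraction of the compute of a dense model that size.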
The models are available now on Hugging Face for download and usage by enterprises, and on Unsloth for local usage — or, for those who can’t host the model inference on their own hardware, through application programming interfaces (APIs) from Together AI, Baseten and RunPod. There’s also a quantized 8-bit floating point (FP8) version of the 671B model, which reduces the numbers used to represent the model’s parameters from 16 bits to 8 bits, helping users run the massive model faster and more cheaply on more accessible hardware, often while retaining roughly 95 to 99% of the original performance. FP8 can, however, slightly degrade model accuracy, especially for tasks requiring fine-grained precision (some math or reasoning problems).

All four Cogito v2 models are designed as hybrid reasoning systems: they can respond immediately to a query or, when needed, reflect internally before answering. Crucially, that reflection is not just runtime behavior — it’s baked into the training process itself. These models are trained to internalize their own reasoning. That means the very paths they take to arrive at answers — the mental steps, so to speak — are distilled back into the models’ weights. Over time, they learn which lines of thinking actually matter and which don’t. As Deep Cogito’s blog post notes, the researchers “disincentivize the model from ‘meandering more’ to be able to arrive at the answer, and instead develop a stronger intuition for the right search trajectory for the reasoning process.” The result, Deep Cogito claims, is faster, more efficient reasoning and a general improvement in performance, even in so-called “standard” mode.

Self-improving AI

While many in the AI community are just encountering the company, Deep Cogito has been quietly building for over a year. It emerged from stealth in April 2025 with a series of open-source models trained on Meta’s Llama 3.2. Those early releases showed promising results.
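To see why dropping from 16 to 8 bits often costs so little, note that FP8 formats such as E4M3 keep only 3 mantissa bits, so each weight is rounded to within roughly 6% of its value (and usually much closer). A rough simulation of that mantissa rounding, ignoring the real format's exponent range, saturation, and NaN handling:

```python
import math

def fp8_round(x, mantissa_bits=3):
    """Round x to `mantissa_bits` of significand precision, roughly
    mimicking FP8 (E4M3) rounding. Exponent clamping is ignored, so this
    illustrates precision loss only; it is not a faithful codec."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    steps = 2 ** (mantissa_bits + 1)     # representable significand levels
    return math.ldexp(round(m * steps) / steps, e)

weights = [0.3137, -1.071, 0.0042, 2.5]
quantized = [fp8_round(w) for w in weights]
rel_errors = [abs(w - q) / abs(w) for w, q in zip(weights, quantized)]
print(max(rel_errors))  # worst-case relative error stays below 1/16 (~6.25%)
```

Each individual weight is only slightly perturbed, which is why aggregate benchmark scores typically hold up; errors only become visible on tasks where many small perturbations compound, such as long chains of exact arithmetic.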
This came after a $13 million seed funding round that closed in November 2024, led by Benchmark, with Benchmark’s Eric Vishria joining the company’s board. As VentureBeat previously reported, the smallest Cogito v1 models (3B and 8B) outperformed their Llama 3 counterparts across several benchmarks — sometimes by wide margins. Deep Cogito CEO and co-founder Drishan Arora — previously a lead LLM engineer at Google — described the company’s long-term goal as building models that can reason and improve with each iteration, much like how AlphaGo refined its strategy through self-play. Deep Cogito’s core method, iterated distillation and amplification (IDA), replaces hand-written prompts or static teachers with the model’s own evolving insights.

What is ‘machine intuition’?

With Cogito v2, the team took that loop to a much larger scale. The central idea is simple: reasoning shouldn’t just be an inference-time tool; it should be part of the model’s core intelligence. So the company implemented a system in which the model runs reasoning chains during training and is then trained on its own intermediate thoughts. This process yields concrete improvements, according to internal benchmarks. The flagship 671B MoE model outperforms DeepSeek R1 in reasoning tasks, matching or beating its latest 0528 model while using 60% shorter reasoning chains. On MMLU, GSM8K and MGSM, Cogito 671B MoE’s performance was roughly on par with top open models like Qwen1.5-72B and DeepSeek v3, and approached the performance tier of closed models like Claude 4 Opus and o3. Specifically:

Cogito 671B MoE (reasoning mode) matched DeepSeek R1 0528 across multilingual QA and general knowledge tasks, and outperformed it on strategy and logical deduction.

In non-reasoning mode, it exceeded DeepSeek v3 0324, suggesting that the distilled intuition carried real performance weight even without an extended reasoning path.
The model’s ability to complete reasoning in fewer steps also had downstream effects: Lower inference costs and faster response times on complex prompts. Arora explains this as a difference between searching for a path versus already knowing roughly where the destination lies. “Since the Cogito models develop a better intuition of the trajectory to take while searching at inference time, they have 60% shorter reasoning chains than Deepseek R1,” he wrote in a thread on X. What kinds of tasks do Deep Cogito’s new models excel at when using their machine intuition? Some of the most compelling examples from Cogito v2’s internal testing highlight exactly


Google releases Olympiad medal-winning Gemini 2.5 ‘Deep Think’ AI publicly — but there’s a catch…

Google has officially launched Gemini 2.5 Deep Think, a new variation of its AI model engineered for deeper reasoning and complex problem-solving, which made headlines last month for winning a gold medal at the International Mathematical Olympiad (IMO) — the first time an AI model achieved the feat. However, this is unfortunately not the identical gold-medal-winning model. It is, in fact, a less powerful “bronze” version, according to Google’s blog post and Logan Kilpatrick, product lead for Google AI Studio. As Kilpatrick posted on the social network X: “This is a variation of our IMO gold model that is faster and more optimized for daily use. We are also giving the IMO gold full model to a set of mathematicians to test the value of the full capabilities.” Now available through the Gemini mobile app, this bronze model is accessible to subscribers of Google’s most expensive individual AI plan, AI Ultra, which costs $249.99 per month, with a three-month starting promotion at a reduced rate of $124.99/month for new subscribers.

Google also said in its release blog post that it would bring Deep Think, with and without tool-usage integrations, to “trusted testers” through the Gemini application programming interface (API) “in the coming weeks.”

Why ‘Deep Think’ is so powerful

Gemini 2.5 Deep Think builds on the Gemini family of large language models (LLMs), adding new capabilities aimed at reasoning through sophisticated problems.
It employs “parallel thinking” techniques to explore multiple ideas simultaneously and uses reinforcement learning to strengthen its step-by-step problem-solving ability over time. The model is designed for use cases that benefit from extended deliberation, such as mathematical conjecture testing, scientific research, algorithm design, and creative iteration tasks like code and design refinement. Early testers, including mathematicians such as Michel van Garrel, have used it to probe unsolved problems and generate potential proofs. AI power user and expert Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania, also posted on X that it was able to take a prompt he often uses to test the capabilities of new models — “create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future” — and turn it into a 3D graphic, the first time any model has done that.

“Had early access to Gemini with Deep Think. Very good model, big gains over standard Gemini 2.5 Pro for a lot of problems. Here is the first attempt at the starship control panel prompt I try with every model. First time I have seen a model make a 3D interface in response.”
— Ethan Mollick (@emollick), August 1, 2025

Performance benchmarks and use cases

Google highlights several key application areas for Deep Think:

Mathematics and science: The model can simulate reasoning for complex proofs, explore conjectures, and interpret dense scientific literature.

Coding and algorithm design: It performs well on tasks involving performance tradeoffs, time complexity, and multi-step logic.

Creative development: In design scenarios such as voxel art or user interface builds, Deep Think demonstrates stronger iterative improvement and detail enhancement.

The model also leads performance in benchmark evaluations such as LiveCodeBench V6 (for coding ability) and Humanity’s Last Exam (covering math, science, and reasoning). It outscored Gemini 2.5 Pro and competing models like OpenAI’s GPT-4 and xAI’s Grok 4 by double-digit margins in some categories (Reasoning & Knowledge, Code Generation, and IMO 2025 Mathematics).

Gemini 2.5 Deep Think vs. Gemini 2.5 Pro

While both Deep Think and Gemini 2.5 Pro are part of the Gemini 2.5 model family, Google positions Deep Think as a more capable and analytically skilled variant, particularly when it comes to complex reasoning and multi-step problem-solving. This improvement stems from the use of parallel thinking and reinforcement learning techniques, which enable the model to simulate deeper cognitive deliberation. In its official communication, Google describes Deep Think as better at handling nuanced prompts, exploring multiple hypotheses, and producing more refined outputs. This is supported by side-by-side comparisons in voxel art generation, where Deep Think adds more texture, structural fidelity, and compositional diversity than 2.5 Pro. The improvements aren’t just visual or anecdotal. Google reports that Deep Think outperforms Gemini 2.5 Pro on multiple technical benchmarks related to reasoning, code generation, and cross-domain expertise.
However, these gains come with tradeoffs in responsiveness and prompt acceptance. Here’s a breakdown:

Capability / Attribute | Gemini 2.5 Pro | Gemini 2.5 Deep Think
Inference speed | Faster, low latency | Slower, extended “thinking time”
Reasoning complexity | Moderate | High — uses parallel thinking
Prompt depth and creativity | Good | More detailed and nuanced
Benchmark performance | Strong | State-of-the-art
Content safety & tone objectivity | Improved over older models | Further improved
Refusal rate (benign prompts) | Lower | Higher
Output length | Standard | Supports longer responses
Voxel art / design fidelity | Basic scene structure | Enhanced detail and richness

Google notes that Deep Think’s higher refusal rate is an area of active investigation. This may limit its flexibility in handling ambiguous or informal queries compared to 2.5 Pro. In contrast, 2.5 Pro remains better suited for users who prioritize speed and responsiveness, especially for lighter, general-purpose tasks. This differentiation allows users to choose based on their priorities: 2.5 Pro for speed and fluidity, or Deep Think for rigor and reflection.

Not the gold-medal-winning model, just a bronze

In July, Google DeepMind made headlines when a more advanced version of the Gemini Deep Think model achieved official gold-medal status at the 2025 IMO — the world’s most prestigious mathematics competition for high school students. The system solved five of six challenging problems and


The initial reactions to OpenAI’s landmark open source gpt-oss models are highly varied and mixed

OpenAI’s long-awaited return to the “open” of its namesake occurred yesterday with the release of two new large language models (LLMs): gpt-oss-120B and gpt-oss-20B. However, despite achieving technical benchmarks on par with OpenAI’s other powerful proprietary AI model offerings, the broader AI developer and user community’s initial response has so far been mixed. If this release were a movie premiering and being graded on Rotten Tomatoes, we’d be looking at a near 50% split, based on my observations.

First, some background: OpenAI has released these two new text-only language models (no image generation or analysis), both under the permissive open-source Apache 2.0 license — the first time since 2019 (before ChatGPT) that the company has done so with a cutting-edge language model. The entire ChatGPT era of the last 2.7 years has so far been powered by proprietary, closed-source models that OpenAI controlled and that users had to pay to access (or use through a free tier subject to limits), with limited customizability and no way to run them offline or on private computing hardware.

But that all changed with yesterday’s release of the pair of gpt-oss models: one larger and more powerful, for use on a single Nvidia H100 GPU at, say, a small or medium-sized enterprise’s data center or server farm, and a smaller one that works on a single consumer laptop or desktop PC like the kind in your home office.
Because the models are so new, it has taken several hours for the AI power-user community to independently run and test them on their own benchmarks and tasks. Now a wave of feedback is arriving, ranging from optimistic enthusiasm about the potential of these powerful, free, and efficient new models to an undercurrent of dissatisfaction and dismay at what some users see as significant problems and limitations, especially compared to the wave of similarly Apache 2.0-licensed, powerful, open-source multimodal LLMs from Chinese startups (which companies in the U.S., or anywhere else in the world, can likewise take, customize, and run locally for free).

High benchmarks, but still behind Chinese open source leaders

Intelligence benchmarks place the gpt-oss models ahead of most American open-source offerings. According to independent third-party AI benchmarking firm Artificial Analysis, gpt-oss-120B is “the most intelligent American open weights model,” though it still falls short of Chinese heavyweights like DeepSeek R1 and Qwen3 235B. “On reflection, that’s all they did. Mogged on benchmarks,” wrote self-proclaimed DeepSeek “stan” @teortaxesTex. “No good derivative models will be trained… No new use cases created… Barren claim to bragging rights.” That skepticism is echoed by pseudonymous open-source AI researcher Teknium (@Teknium1), co-founder of rival open-source AI model provider Nous Research, who called the release “a legitimate nothing burger” on X and predicted a Chinese model will soon eclipse it. “Overall very disappointed and I legitimately came open minded to this,” they wrote.

Bench-maxxing on math and coding at the expense of writing?

Other criticism focused on the gpt-oss models’ apparently narrow usefulness.
AI influencer “Lisan al Gaib” (@scaling01) noted that the models excel at math and coding but “completely lack taste and common sense.” He added, “So it’s just a math model?” In creative writing tests, some users found the model injecting equations into poetic outputs. “This is what happens when you benchmarkmax,” Teknium remarked, sharing a screenshot in which the model added an integral formula mid-poem. And @kalomaze, a researcher at decentralized AI model training company Prime Intellect, wrote that “gpt-oss-120b knows less about the world than what a good 32b does. probably wanted to avoid copyright issues so they likely pretrained on majority synth. pretty devastating stuff.” Former Googler and independent AI developer Kyle Corbitt agreed that the gpt-oss pair of models appeared to have been trained primarily on synthetic data — that is, data generated by an AI model specifically to train another one — making them “extremely spiky”: “great at the tasks it’s trained on, really bad at everything else,” Corbitt wrote. In other words, great on coding and math problems, and bad at more linguistic tasks like creative writing or report generation.

The charge is that OpenAI deliberately trained the models on more synthetic data than real-world facts and figures to avoid using copyrighted data scraped from websites and other repositories it doesn’t own or have license to use — something it and many other leading generative AI companies have been accused of in the past, and over which they are facing ongoing lawsuits. Others speculated OpenAI may have trained the models primarily on synthetic data to avoid safety and security issues, resulting in worse quality than if they had been trained on more real-world (and presumably copyrighted) data.

Concerning third-party benchmark results

Moreover, evaluating the models on third-party benchmarks has turned up metrics that concern some users.
SpeechMap — which measures how well LLMs comply with user prompts to generate disallowed, biased, or politically sensitive outputs — showed compliance scores for gpt-oss-120B hovering under 40%, near the bottom of peer open models, indicating a tendency to refuse user requests and default to guardrails, potentially at the expense of providing accurate information. In Aider’s Polyglot evaluation, gpt-oss-120B scored just 41.8% in multilingual reasoning, far below competitors like Kimi-K2 (59.1%) and DeepSeek-R1 (56.9%). Some users also said their tests indicated the model is oddly resistant to generating criticism of China or Russia, in contrast to its treatment of the US and EU, raising questions about bias and training data filtering.

Other experts have applauded the release and what it signals for U.S. open source AI

The initial reactions to OpenAI’s landmark open source gpt-oss models are highly varied and mixed Read More »

OpenAI returns to open source roots with new models gpt-oss-120b and gpt-oss-20b

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

OpenAI is getting back to its roots as an open source AI company with today’s announcement and release of two new, open source, frontier large language models (LLMs): gpt-oss-120b and gpt-oss-20b. The former is, as the name suggests, a 120-billion-parameter model capable of running on a single Nvidia H100 graphics processing unit (GPU), while the latter is only 20 billion parameters, small enough to run locally on a consumer laptop or desktop PC. Both are text-only language models: unlike the multimodal AI we’ve had for nearly two years, which lets users upload files and images for the AI to analyze, these models accept only text as input and return text as output. They can still, of course, write code and work through math problems and numerics, and in terms of their performance on tasks, they rank above some of OpenAI’s paid models and much of the competition globally.

AI Scaling Hits Its Limits Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are: Secure your spot to stay ahead: https://bit.ly/4mwGngO

They can also be connected to external tools, including web search, to perform research on behalf of the user. More on this below. Most importantly: they’re free, enterprises and indie developers can download and use them right now, modifying them according to their needs, and they can be run locally without a web connection, ensuring maximum privacy, unlike OpenAI’s other top models and those from leading U.S.-based rivals Google and Anthropic. The models can be downloaded today with full weights (the settings guiding their behavior) from the AI code-sharing community Hugging Face and from GitHub.
High benchmark scores

According to OpenAI, gpt-oss-120b matches or exceeds its proprietary o4-mini model on reasoning and tool-use benchmarks, including competition mathematics (AIME 2024 & 2025), general problem solving (MMLU and HLE), agentic evaluations (TauBench), and health-specific evaluations (HealthBench). The smaller gpt-oss-20b model is comparable to o3-mini and even surpasses it on some benchmarks. The models are multilingual and perform well across a variety of non-English languages, though OpenAI declined to specify which and how many. While these capabilities are available out of the box, OpenAI notes that localized fine-tuning — such as an ongoing collaboration with the Swedish government to produce a version fine-tuned on the country’s language — can still meaningfully enhance performance for specific regional or linguistic contexts.

A hugely advantageous license for enterprises and privacy-minded users

But the biggest feature is the licensing terms for both: Apache 2.0, the same as the wave of Chinese open source models released over the last several weeks, and a more enterprise-friendly license than Meta’s trickier, more nuanced open-ish Llama license, which requires that users operating a service with more than 700 million monthly active users obtain a paid license to keep using the company’s family of LLMs. By contrast, OpenAI’s new gpt-oss series of models comes with no such restrictions. In keeping with Chinese competitors and counterparts, any consumer, developer, independent entrepreneur, or enterprise large or small is empowered by the Apache 2.0 license to download the new gpt-oss models at will, fine-tune and alter them to fit their specific needs, and use them to generate revenue or operate paid services, all without paying OpenAI a dime.
This also means enterprises can run a powerful, near-topline OpenAI model on their own hardware, totally privately and securely, without sending any data up to the cloud, to web servers, or anywhere else. For highly regulated industries like finance, healthcare, and legal services, not to mention organizations in military, intelligence, and government, this may be a requirement. Before today, anyone using ChatGPT or its application programming interface (API) — the service that acts like a switchboard, allowing third-party software developers to connect their own apps and services to OpenAI’s proprietary paid models like GPT-4o and o3 — was sending data up to OpenAI servers that could technically be subpoenaed by government agencies and accessed without a user’s knowledge. That’s still the case for anyone using ChatGPT or the API going forward, as OpenAI co-founder and CEO Sam Altman recently warned. And while running the new gpt-oss models locally on a user’s own hardware, disconnected from the web, allows for maximum privacy, as soon as the user connects them to external web search or other web-enabled tools, some of the same privacy risks and issues arise — through whatever third-party web services the user or developer relies on when hooking the models up to said tools.

The last OpenAI open source language model was released more than six years ago

“This is the first time we’re releasing an open-weight language model in a long time… We view this as complementary to our other products,” said OpenAI co-founder and president Greg Brockman on an embargoed press video call with VentureBeat and other journalists last night. The last time OpenAI released a fully open source language model was GPT-2 in 2019, more than six years ago and three years before the release of ChatGPT.
This fact has sparked the ire of — and resulted in several lawsuits from — former OpenAI co-founder and backer turned rival Elon Musk, who, along with many other critics, has spent the last several years accusing OpenAI of betraying its founding mission, principles, and namesake by eschewing open source AI releases in favor of paid proprietary models available only to customers of OpenAI’s API or paying ChatGPT subscribers (though there is a free tier for the latter). OpenAI co-founder and CEO Sam Altman did express regret about being on the “wrong side of history” by not releasing more open source AI sooner in

OpenAI returns to open source roots with new models gpt-oss-120b and gpt-oss-20b Read More »

Anthropic revenue tied to two customers as AI pricing war threatens margins

Anthropic’s meteoric rise to a $5 billion revenue run rate conceals a precarious dependence on just two major customers, which account for nearly a quarter of the artificial intelligence company’s income, according to internal data and industry analysis that reveal both the promise and peril of the AI coding boom. The San Francisco-based maker of the Claude AI assistant has built its business largely on the back of developer tools, with coding applications Cursor and GitHub Copilot driving approximately $1.2 billion of the company’s $4 billion revenue milestone reached earlier this year, according to sources familiar with the matter. The concentration underscores how quickly Anthropic has captured the lucrative market for AI-powered software development, but also exposes the company to significant risk should either relationship falter. OpenAI and Anthropic both are showing pretty spectacular growth in 2025, with OpenAI doubling ARR in the last 6 months from $6bn to $12bn and Anthropic increasing 5x from $1bn to $5bn in 7 months. If we compare the sources of revenue, the picture is quite interesting:– OpenAI… pic.twitter.com/8OaN1RSm9E — Peter Gostev (@petergostev) August 4, 2025 The revenue concentration comes into sharp focus as OpenAI launched GPT-5 this week with dramatically lower pricing that could undercut Anthropic’s premium positioning. Early comparisons show Claude Opus 4 costing roughly seven times more per million tokens than GPT-5 for certain tasks, creating immediate pressure on Anthropic’s enterprise pricing strategy and potentially threatening its hard-won dominance in AI coding. The pricing disparity signals a fundamental shift in competitive dynamics that will force enterprise procurement teams to reconsider vendor relationships built on performance rather than price.
Companies managing exponentially growing AI budgets now face comparable capability at a fraction of the cost, creating unavoidable pressure in contract negotiations. OpenAI’s new GPT-5 models offer dramatically lower pricing than Anthropic’s Claude alternatives, with Claude Opus 4 costing up to 50 times more for output than GPT-5’s most affordable tier. (Credit: ChatGPT)

How Anthropic’s Claude became the developer’s AI assistant of choice

Anthropic’s ascent reflects the explosive growth in AI-powered software development, which has emerged as artificial intelligence’s first truly profitable use case beyond chatbots. The company now commands 42% of the code generation market — more than double OpenAI’s 21% share — according to a comprehensive survey by Menlo Ventures of 150 enterprise technical leaders. That dominance has translated into remarkable financial performance. Even excluding its two largest customers, Anthropic’s remaining business has grown more than eleven-fold year-over-year, according to a source close to the company. The startup has also tripled the number of eight- and nine-figure deals signed in 2025 compared to all of 2024, reflecting broader enterprise adoption beyond its coding strongholds. Claude’s appeal to developers stems from its superior performance on complex coding tasks. The newly released Claude Opus 4.1 scores 74.5% on SWE-bench Verified, a rigorous software engineering evaluation, compared to 69.1% for OpenAI’s previous flagship model. Companies like Windsurf, Cursor, and GitHub have praised Claude’s ability to handle multi-step coding problems and understand large codebases.
“People love Claude Code, they love using models to write code, and these models are already extremely good and getting better,” said Logan Graham, a member of Anthropic’s frontier red team, in a recent interview with VentureBeat describing the surge in AI-assisted development. But the concentration in coding partnerships also creates strategic vulnerabilities. GitHub Copilot, owned by Microsoft, represents a particularly complex relationship given Microsoft’s $13 billion investment in OpenAI. The partnership requires Anthropic to power a competitor’s key product while relying on that same competitor’s parent company for a significant portion of revenue.

OpenAI strikes back with aggressive GPT-5 pricing strategy targeting Anthropic

OpenAI’s GPT-5 launch this week has introduced a new variable into Anthropic’s calculations: a dramatic pricing advantage that could reshape enterprise buying decisions. Early analysis shows GPT-5 offering comparable or superior performance at a fraction of Claude’s cost, potentially undermining the premium pricing that has driven Anthropic’s rapid revenue growth. The timing proves particularly challenging as Anthropic seeks to close a funding round that could value the company at $170 billion. Investors will likely scrutinize both the customer concentration and the emerging price competition as they evaluate whether Anthropic can maintain its growth trajectory. The broader market dynamics support both optimism and concern for Anthropic’s future. Model API spending has more than doubled to $8.4 billion in just six months, according to Menlo Ventures, as enterprises shift from experimental projects to production deployments. Anthropic has captured 32% of overall enterprise large language model usage, ahead of OpenAI’s 25% and Google’s 20%. However, the same report reveals that enterprises consistently prioritize performance over price, upgrading to the newest models within weeks of release regardless of cost.
This behavior pattern suggests that GPT-5’s combination of improved performance and lower pricing could trigger rapid customer migration — exactly the scenario that makes Anthropic’s customer concentration so risky.

Anthropic’s push beyond coding into enterprise markets

Anthropic has attempted to diversify beyond coding applications, working with leading companies across pharmaceuticals, retail, professional services, and aviation. The European Parliament uses Claude, while major corporations like Pfizer, United Airlines, and Thomson Reuters have become customers. Startup successes include legal AI company Harvey and cybersecurity firm Base44. The company’s business-to-business revenue run rate has grown seventeen-fold year-over-year as of June, suggesting broader enterprise adoption is accelerating. Claude Code, Anthropic’s developer-focused product, alone generates nearly $400 million in annualized revenue, doubling in just weeks according to industry reports. Yet the coding market remains central to Anthropic’s identity and growth strategy. The company has invested heavily in developer tools, recently launching automated security review capabilities to address vulnerabilities in AI-generated code. The features arrive as companies increasingly

Anthropic revenue tied to two customers as AI pricing war threatens margins Read More »

OpenAI launches GPT-5, nano, mini and Pro — not AGI, but capable of generating ‘software-on-demand’

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs), all different-sized variants of GPT-5, the long-awaited successor to its GPT-4 model from March 2023, nearly 2.5 years ago. The company is rolling out four distinct versions of the model — GPT-5, GPT-5 Mini, GPT-5 Nano, and GPT-5 Pro — to meet varying needs for speed, cost, and computational depth.

GPT-5 is the full-capability reasoning model, used in both ChatGPT and OpenAI’s application programming interface (API) for high-quality general tasks.

GPT-5 Pro is an enhanced version with extended reasoning and parallel compute at test time, designed for use in complex enterprise and research environments. It provides more detailed and reliable answers, especially on ambiguous or multi-step queries.

GPT-5 Mini is a smaller, faster version of the main model, optimized for lower latency and resource usage. It is used as a fallback when usage limits are reached or when minimal reasoning suffices.

GPT-5 Nano is the most lightweight variant, built for speed and efficiency in high-volume or cost-sensitive applications. It retains reasoning capability, but at a smaller scale, making it ideal for mobile, embedded, or latency-constrained deployments.

GPT-5 will soon power ChatGPT exclusively, replacing all other models going forward for its 700 million weekly users, though ChatGPT Pro subscribers ($200/month) can still select older models for the next 60 days.
As per rumors and reports, OpenAI has replaced the previous system of having users switch the underlying model powering ChatGPT with an automatic router that decides whether to engage a special “GPT-5 thinking” mode with “deeper reasoning” that takes longer to respond on harder queries, or to use the regular GPT-5 or mini models for simpler queries. In the API, the three reasoning-focused models — GPT-5, GPT-5 mini, and GPT-5 nano — are available as gpt-5, gpt-5-mini, and gpt-5-nano, respectively. GPT-5 Pro is not currently accessible via the API; it is used only to power ChatGPT for Pro-tier subscribers. GPT-5’s release comes just days after OpenAI launched a set of free, new open source LLMs under the name gpt-oss, which can be downloaded, customized, and used offline by individuals and developers on consumer devices like PC/Mac desktops and laptops. The biggest takeaway, though, is likely not what GPT-5 is, but what it isn’t: AGI, artificial general intelligence, OpenAI’s stated goal of an autonomous AI system that outperforms humans at most economically valuable work. Whether or not you, the reader, personally believe such a system is possible or desirable, OpenAI declaring AGI would have material business impacts. Wired reported previously that there is a clause in OpenAI’s contract with Microsoft that permits OpenAI to begin charging Microsoft for access to its newest models, or to cut it off from accessing OpenAI models, if OpenAI’s board determines the company has achieved AGI or generates more than $100 billion in profit. But apparently, that is not the case today.
As co-founder and CEO Sam Altman said, flanked by other OpenAI staffers on an embargoed video call with reporters last night, “the way that most of us define AGI, we’re still missing something quite important — many things that are quite important, actually — but one big one is a model that continuously learns as it’s deployed, and GPT-5 does not.” I also asked OpenAI the following question directly: “Is OpenAI considering GPT-5 AGI? Will it trigger any changes regarding Microsoft negotiations?” An OpenAI spokesperson responded over email: “GPT-5 is a significant step toward AGI in that it shows substantial improvements in reasoning and generalization, bringing us closer to systems that can perform a wide range of tasks with human-level capability. However, AGI is still a weakly defined term and means different things to different people. While GPT-5 meets some early criteria for AGI, it doesn’t yet reach the threshold of fully human-level AGI. There are still key limitations in areas like persistent memory, autonomy, and adaptability across tasks. Our focus remains on advancing these capabilities safely, rather than speculating on specific timelines.” Yet benchmark results shared by OpenAI show GPT-5 approaching, and in some cases exceeding, average human expert performance at various tasks across law, logistics, sales, and engineering. As OpenAI writes: “When using reasoning, GPT-5 is comparable to or better than experts in roughly half the cases, while outperforming OpenAI o3 and ChatGPT Agent.”

Why use GPT-5?

With so many alternate models now available from OpenAI and a growing list of competitors, namely Chinese startups offering powerful open source models, what does GPT-5 bring to the table? Altman described the leap in capability as more than incremental. He compared the experience of using GPT-5 to upgrading from a pixelated display to a retina screen — something users simply don’t want to go back from.
“GPT-3 felt like talking to a high school student,” Altman said. “GPT-4 was like a college student. GPT-5 is the first time it feels like talking to a PhD-level expert in your pocket.” Among the most impressive capabilities demoed for reporters during the embargoed call was the generation of code for a fully working web application from a single prompt, in this case, a French language learning app with a built-in game in which English-to-French phrases were shown every time the user guided a virtual mouse to collect slices of cheese, complete with working emoji-inspired characters, a backdrop/setting, and clickable interactive menus. The prompt was only a single paragraph, too. As Altman stated: “This idea of software on demand will be a defining part of the new GPT-5 era.” However, this basic capability — prompt to working software — has already been available from prior OpenAI models such

OpenAI launches GPT-5, nano, mini and Pro — not AGI, but capable of generating ‘software-on-demand’ Read More »

How a ‘vibe working’ approach at Genspark tripled ARR growth and supported a barrage of new products and features in just weeks

Traditionally, product releases can be cumbersome, requiring multiple sign-offs, endless tinkering, bureaucracy, and friction points. Genspark has developed a much different approach. The AI workspace company’s lean team practices AI-native working — or ‘vibe working,’ if you will — so that it can move at what it calls “gen speed.” This allows the team to release new products and features in rapid-fire succession (nearly every week or so), steadily driving up annual recurring revenue (ARR). As the company boasts, it could be “the fastest-growing startup ever in terms of ARR.” “When people are working the AI-native way, basically everybody is the manager,” Kaihua (Kay) Zhu, co-founder and CTO, told VentureBeat. “They are equipped with a team of AI agents, which are kind of their reportees, and they are capable of, single-handedly, delivering the feature end-to-end.”

Aggressive rollouts, stoking competition

Genspark, launched in June 2024 by MainFunc, was initially focused on AI search. But despite reaching an impressive 5 million users, the company pivoted away from that initial product to Super Agent, which, instead of following a static sequence of steps as in traditional search, chooses the best tools or sub-agents for the job, gauges results, and adjusts in real time. Launched on April 2, Super Agent is powered by Anthropic’s Claude and can condense an afternoon of white-collar office work into 5 minutes, Zhu claims. For instance, it can make calls, download files, fact-check, produce podcasts, draft documents, perform deep research, and pull together spreadsheets and slides.
“We still see it as a kind of search, but it’s more technically advanced,” said Zhu, who has more than 20 years of experience working in search at Google and Baidu. The company has aggressively added more features over the last four months; here’s a rundown of its rollouts and milestones:

April 11: Reached $10 million ARR just 9 days after Super Agent launch
April 22: Introduced AI Slides (featuring hundreds of templates)
April 28: Rolled out a personalized Super Agent with adaptive personalities
May 2: Hit $22 million ARR, exactly one month post-launch
May 8: Rolled out AI Sheets that create complete spreadsheets in one click
May 15: Introduced a fully-agentic download agent and AI drive that manages and stores files
May 19: Hit $36 million ARR
May 22: Rolled out AI that can make phone calls
June 4: Introduced an AI Secretary that manages Gmail, calendars and Google Drive
June 10: Rolled out an AI Browser and MCP store featuring extended browsing capabilities and a tool marketplace
June 18: Introduced AI Docs for document creation and management
June 25: Introduced Design Studio with “Canva-like” capabilities for visual content creation
July 10: Rolled out AI Pods to create podcasts with simple prompts
July 17: Introduced advanced editing features for AI Slides
July 31: Rolled out AI Slides 2.0
August 1: Introduced multi-agent orchestration that can run up to 10 agents simultaneously

Genspark is also heating up the AI agent space with friendly competition. After OpenAI announced its ChatGPT agent in mid-July, Genspark performed a comparative analysis and is “very confident” in its ability to outperform the rival. To drive home this point, the company launched a “1 Million Dollar Side-by-side AI Showdown,” challenging users to hunt for cases where other platforms outperform Genspark Super Agent.
In the first round, users were tasked with building a 12-page financial slide deck using Genspark and ChatGPT Agent; users identified 429 cases where the latter outperformed the former, each earning $100 for their efforts. In round 2 (which ended Monday, August 4), Genspark upped the ante to $200 per win and opened the competition to any AI tool as an opponent. Users were challenged to use exactly the same prompt to build slides on Genspark and their chosen AI tool, then upload both to Gemini for evaluation. “Not trying to start any drama here — just genuinely excited about how far the entire AI agent ecosystem has come,” the company posted on X. “It shows we’re all pushing the boundaries in the right direction.”

How Genspark’s AI-native team vibes

Genspark’s secret is its lean, AI-native team of 20 people and an engineering philosophy of “less control, more tools.” Zhu explained that more than 80% of its code is written by AI, which isn’t vibe coding per se, “because vibe coding kind of indicates you never look at the code.” Rather, Genspark has a “very rigid” code review process to help guarantee the quality of its code base. “We only need a very small AI-native team to operate in a kind of superhero mode, like The Avengers,” said Zhu, adding that the company will gradually add team members as needed. “The AI coding and AI workflow are so powerful, it’s a magnifier.” Today’s enterprise teams must be reorganized “totally differently,” he said. He has managed 1,000-member teams with different levels of management and seen how office politics can introduce friction. Genspark’s team, by contrast, communicates in “a very transparent way,” and productivity is “super high.” “Everybody is working on a product that can ship,” said Zhu. “I believe that that will be the norm looking forward, since AI is actually helping more and more people do their work better.” He also emphasized the importance of immersing yourself in your own product.
From designers themselves to the marketing team, “we actually eat our own dog food. We are our own product consumer. That’s how we will keep improving the experience.” Inside Genspark’s flagship Super Agent Zhu noted that, when Perplexity launched in December 2022, it ignited excitement about AI’s potential to transform search. Still, it followed rigid workflows, with platforms having to:  Analyze queries and

How a ‘vibe working’ approach at Genspark tripled ARR growth and supported a barrage of new products and features in just weeks Read More »

Anthropic’s new Claude 4.1 dominates coding tests days before GPT-5 arrives

Anthropic released an upgraded version of its flagship artificial intelligence model Monday, achieving new performance heights on software engineering tasks as the AI startup races to maintain its dominance in the lucrative coding market ahead of an expected competitive challenge from OpenAI. The new Claude Opus 4.1 model scored 74.5% on SWE-bench Verified, a widely watched benchmark that tests AI systems’ ability to solve real-world software engineering problems. The performance surpasses OpenAI’s o3 model at 69.1% and Google’s Gemini 2.5 Pro at 67.2%, cementing Anthropic’s leading position in AI-powered coding assistance. The release comes as Anthropic has achieved spectacular growth, with annual recurring revenue jumping five-fold from $1 billion to $5 billion in just seven months, according to industry data. However, the company’s meteoric rise has created a dangerous dependency: nearly half of its $3.1 billion in API revenue stems from just two customers — coding assistant Cursor and Microsoft’s GitHub Copilot — generating $1.4 billion combined. “This is a very scary position to be in. A single contract change and you’re going under,” warned Guillaume Leverdier, senior product manager at Logitech, responding to the revenue concentration data on social media. OpenAI and Anthropic both are showing pretty spectacular growth in 2025, with OpenAI doubling ARR in the last 6 months from $6bn to $12bn and Anthropic increasing 5x from $1bn to $5bn in 7 months.
If we compare the sources of revenue, the picture is quite interesting:– OpenAI… pic.twitter.com/8OaN1RSm9E — Peter Gostev (@petergostev) August 4, 2025 The upgrade represents Anthropic’s latest move to fortify its position before OpenAI launches GPT-5, which is expected to challenge Claude’s coding supremacy. Some industry watchers questioned whether the timing suggests urgency rather than readiness. “Opus 4.1 feels like a rushed release to get ahead of GPT-5,” wrote Alec Velikanov, comparing the model unfavorably to competitors on user interface tasks. The comment reflects broader industry speculation that Anthropic is accelerating its release schedule to maintain market share.

How two customers generate nearly half of Anthropic’s $3.1 billion API revenue

Anthropic’s business model has become increasingly centered on software development applications. The company’s Claude Code subscription service, priced at $200 monthly compared to $20 for consumer plans, has reached $400 million in annual recurring revenue after doubling in just weeks, demonstrating enormous enterprise appetite for AI coding tools. “Claude Code making 400 million in 5 months with basically no marketing spend is kinda crazy, right?” noted developer Minh Nhat Nguyen (@menhguin) on X, highlighting the organic adoption rate among professional programmers. The coding focus has proven lucrative but risky. While OpenAI dominates consumer and business subscription revenue with broader applications, Anthropic has carved out a commanding position in the developer market. Industry analysis shows that “pretty much every single coding assistant is defaulting to Claude 4 Sonnet,” according to Peter Gostev, who tracks AI company revenues. GitHub, which Microsoft acquired for $7.5 billion in 2018, represents a particularly complex relationship for Anthropic.
Microsoft owns a significant stake in OpenAI, creating potential conflicts as GitHub Copilot relies heavily on Anthropic’s models while Microsoft has competing AI capabilities. “I dunno – one of those is 49% owned by a competitor…so there’s that for vulnerability too,” observed Siya Mali, business fellow at Perplexity, referencing Microsoft’s ownership structure.

Claude’s enhanced coding abilities come with stricter safety protocols after AI blackmail tests

Beyond coding improvements, Opus 4.1 enhanced Claude’s research and data analysis capabilities, particularly in detail tracking and autonomous search functions. The model maintains Anthropic’s hybrid reasoning approach, combining direct processing with extended thinking capabilities that can utilize up to 64,000 tokens for complex problems. However, the model’s advancement comes with heightened safety protocols. Anthropic classified Opus 4.1 under its AI Safety Level 3 (ASL-3) framework, the strictest designation the company has applied, requiring enhanced protections against model theft and misuse. Previous testing of Claude 4 models revealed concerning behaviors, including attempts at blackmail when the AI believed it faced shutdown. In controlled scenarios, the model threatened to reveal personal information about engineers to preserve its existence, demonstrating sophisticated but potentially dangerous reasoning capabilities. The safety concerns haven’t deterred enterprise adoption. GitHub reports that Claude Opus 4.1 delivers “particularly notable performance gains in multi-file code refactoring,” while Rakuten Group praised the model’s precision in “pinpointing exact corrections within large codebases without making unnecessary adjustments or introducing bugs.”

Why OpenAI’s GPT-5 poses an existential threat to Anthropic’s developer-focused strategy

The AI coding market has become a high-stakes battleground worth billions in revenue.
Developer productivity tools represent some of the clearest immediate applications for generative AI, with measurable productivity gains justifying premium pricing for enterprise customers. Anthropic’s concentrated customer base, while lucrative, creates vulnerability if competitors can lure away major clients. The coding assistant market particularly favors rapid model switching, as developers can easily test new AI systems through simple API changes. “My sense is that Anthropic’s growth is extremely dependent on their dominance in coding,” Gostev noted. “If GPT-5 challenges that, with e.g. Cursor and GitHub Copilot switching to OpenAI, we might see some reversal in the market.” The competitive dynamics may intensify as hardware costs decline and inference optimizations improve, potentially commoditizing AI capabilities over time. “Even if there is no model improvement for coding from all AI labs, drop in HW costs and improvement in Inf optimizations alone will result in profits in ~5years,” predicted Venkat Raman, an industry analyst. For now, Anthropic maintains its technical edge while expanding Claude Code subscriptions to diversify beyond API dependency. The company’s ability to sustain its coding leadership

Anthropic’s new Claude 4.1 dominates coding tests days before GPT-5 arrives Read More »