DeepSeek-R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost

(Updated Monday, 1/27, 8am) DeepSeek-R1’s release last Monday sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance. Matching OpenAI’s o1 at just 3%-5% of the cost, this open-source model has not only captivated developers but also challenged enterprises to rethink their AI strategies.

The model has rocketed to become the top-trending model on Hugging Face (downloaded 109,000 times, as of this writing) as developers rush to try it out and to understand what it means for their AI development. Users report that DeepSeek’s accompanying search feature (available on DeepSeek’s site) is now superior to competitors like OpenAI and Perplexity, and is rivaled only by Google’s Gemini Deep Research. (Update as of Monday, 1/27, 8am: DeepSeek has also shot to the top of the iPhone app store, and caused a selloff on Wall Street this morning as investors reexamine the capital-expenditure efficiency of leading U.S. AI companies.)

The implications for enterprise AI strategies are profound: With reduced costs and open access, enterprises now have an alternative to costly proprietary models like OpenAI’s. DeepSeek’s release could democratize access to cutting-edge AI capabilities, enabling smaller organizations to compete effectively in the AI arms race.

This story focuses on exactly how DeepSeek managed this feat and what it means for the vast number of users of AI models. For enterprises developing AI-driven solutions, DeepSeek’s breakthrough challenges assumptions of OpenAI’s dominance and offers a blueprint for cost-efficient innovation. How DeepSeek did what it did is the most instructive part of the story.

DeepSeek-R1’s breakthrough #1: Moving to pure reinforcement learning

In November, DeepSeek made headlines with its announcement that it had achieved performance surpassing OpenAI’s o1, but at the time it offered only a limited R1-lite-preview model. With Monday’s full release of R1 and the accompanying technical paper, the company revealed a surprising innovation: a deliberate departure from the conventional supervised fine-tuning (SFT) process widely used to train large language models (LLMs).

SFT, a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought (CoT), and is considered essential for improving reasoning capabilities. DeepSeek challenged this assumption by skipping SFT entirely, opting instead to rely on reinforcement learning (RL) to train the model. This bold move forced DeepSeek-R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets. While some flaws emerged, leading the team to reintroduce a limited amount of SFT during the final stages of building the model, the results confirmed the fundamental breakthrough: Reinforcement learning alone could drive substantial performance gains.
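To make the SFT-versus-RL distinction concrete, here is a deliberately tiny PyTorch sketch of the two training signals: SFT minimizes cross-entropy against a curated label, while RL samples an answer, verifies it, and reinforces whatever scored well. This illustrates the general idea only; DeepSeek’s actual recipe (a group-based RL algorithm, GRPO, per its technical paper) operates on a full LLM, not a toy categorical policy like this one.

```python
# Toy contrast between supervised fine-tuning (SFT) and pure RL on a
# verifiable task. The "policy" is just logits over four candidate answers
# to a single question; a real LLM policy is vastly larger.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.zeros(4, requires_grad=True)  # stand-in policy parameters
correct = torch.tensor([2])                  # index of the verifiably right answer
optimizer = torch.optim.SGD([logits], lr=0.5)

# SFT signal: imitate a curated label directly via cross-entropy.
sft_loss = F.cross_entropy(logits.unsqueeze(0), correct)
print(f"SFT loss against the curated label: {sft_loss.item():.3f}")

# RL signal (REINFORCE): sample, verify, reinforce what scored well.
for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    answer = dist.sample()
    reward = 1.0 if answer.item() == correct.item() else 0.0  # rule-based check
    loss = -dist.log_prob(answer) * reward  # raise log-prob of rewarded answers
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # probability mass concentrates on answer 2
```

The point of the contrast: the RL loop never sees a labeled reasoning trace, only a pass/fail verdict, which is what lets a model discover its own chains of thought instead of imitating prescribed ones.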
The company got much of the way using open source: a conventional and unsurprising route

First, some background on how DeepSeek got to where it did. DeepSeek, a 2023 spinoff of Chinese hedge fund High-Flyer Quant, began by developing AI models for its proprietary chatbot before releasing them for public use. Little is known about the company’s exact approach, but it quickly open-sourced its models, and it’s extremely likely that it built upon open projects produced by Meta, such as the Llama model and the ML library PyTorch.

To train its models, High-Flyer Quant secured over 10,000 Nvidia GPUs before U.S. export restrictions kicked in, and reportedly expanded to 50,000 GPUs through alternative supply routes despite trade barriers. (In truth, no one knows; these extras may have been Nvidia H800s, which are compliant with the restrictions and have reduced chip-to-chip transfer speeds.) Either way, this pales in comparison to leading AI labs like OpenAI, Google, and Anthropic, which operate with more than 500,000 GPUs each. DeepSeek’s ability to achieve competitive results with limited resources highlights how ingenuity and resourcefulness can challenge the high-cost paradigm of training state-of-the-art LLMs.

Despite speculation, DeepSeek’s full budget is unknown

DeepSeek reportedly trained its base model, called V3, on a $5.58 million budget over two months, according to Nvidia engineer Jim Fan. While the company hasn’t divulged the exact training data it used (side note: critics say this means DeepSeek isn’t truly open source), modern techniques make training on web and open datasets increasingly accessible.

Estimating the total cost of training DeepSeek-R1 is challenging. While running 50,000 GPUs suggests significant expenditures (potentially hundreds of millions of dollars), precise figures remain speculative. But it was certainly more than the $6 million budget often quoted in the media. (Update: A good analysis just released by Ben Thompson goes into more detail on cost and the significant innovations the company made at the GPU and infrastructure levels.)

What’s clear, though, is that DeepSeek has been very innovative from the get-go. Last year, reports emerged about some of its initial innovations, around things like mixture-of-experts and multi-head latent attention. (Update: A very detailed report on DeepSeek’s various infrastructure innovations has just been published by Jeffrey Emanuel, a former quant investor and now entrepreneur. It’s long but very good. See the “Theoretical Threat” section for three other innovations worth mentioning: (1) mixed-precision training, which let DeepSeek use 8-bit floating-point numbers throughout training instead of 32-bit, dramatically reducing memory requirements per GPU and translating into needing fewer GPUs; (2) multi-token prediction during inference; and (3) advances in GPU communication efficiency through its DualPipe algorithm, resulting in higher GPU utilization.)

How DeepSeek-R1 got to the “aha moment”

The journey to DeepSeek-R1’s final iteration began with an intermediate model, DeepSeek-R1-Zero, which was trained using pure reinforcement learning. By relying solely on RL, DeepSeek incentivized the model to think independently, rewarding both correct answers and the logical processes used to arrive at them. This approach led to an unexpected phenomenon: The model began allocating additional processing time to more complex problems, demonstrating an ability to prioritize tasks based on their difficulty. DeepSeek’s researchers described this as an “aha moment,” where the model itself identified
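The “rewarding both correct answers and the logical processes” signal can be made concrete. DeepSeek’s paper describes rule-based rewards rather than a learned reward model: the final answer is checked against a verifiable reference, and the reasoning must appear in designated tags. The sketch below is illustrative only; the tag names and weights are assumptions, not DeepSeek’s exact implementation.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy R1-Zero-style reward: format plus verifiable accuracy."""
    score = 0.0
    # Format reward: reasoning and answer must be wrapped in tags
    # (tag names here are illustrative, not DeepSeek's exact choice).
    thinking = re.search(r"<think>.+?</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    if thinking and answer:
        score += 0.5
    # Accuracy reward: the extracted answer must match a checkable reference.
    if answer and answer.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score

print(rule_based_reward("<think>7*6=42</think><answer>42</answer>", "42"))  # 1.5
print(rule_based_reward("The answer is 42", "42"))                          # 0.0
```

Because the reward comes from simple programmatic checks rather than a neural critic, it is cheap to evaluate at scale and resistant to reward hacking, which is part of what made pure RL viable here.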

DeepSeek-R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost Read More »

SAP to give on-prem customers three-year reprieve

“The delay is obviously intended to win over customers for the ‘Rise with SAP’ program,” Hungershausen said, interpreting SAP’s change of strategy. “An understandable step by SAP in the course of its cloud strategy.” From DSAG’s point of view, however, this is unfortunately another measure “that gives the impression of forcing on-premises customers to switch to the cloud.” Still, on-premises users must not be left behind, the DSAG boss has repeatedly demanded. “We therefore believe it is essential that SAP grants companies more flexibility, transparency, and freedom of choice when it comes to their move to the cloud or their desire to continue to rely on SAP’s on-premises software.” While details are not yet known, Hungershausen stressed, “we are in a constructive and critical dialogue with SAP” on this matter. source

SAP to give on-prem customers three-year reprieve Read More »

'Extraordinary' $630M CDK Deal Wraps Auto Dealer Data MDL

By Bryan Koenig (January 28, 2025, 7:52 PM EST) — A certified class of car dealership app makers is seeking preliminary approval for the final settlement in the years-old web of cases accusing CDK Global of monopolizing auto dealership management software, with a $630 million Wisconsin federal court deal that puts a $140 million premium on estimated damages…. source

'Extraordinary' $630M CDK Deal Wraps Auto Dealer Data MDL Read More »

The AI Coding Honeymoon (And What Comes After)

Let me tell you about my first time using an AI coding assistant. Picture this: I enter a prompt and sit at my desk watching in slack-jawed amazement as my digital partner spins up hundreds of lines of perfectly formatted code faster than I can brew my morning coffee. “Add authentication!” POOF! “Create a data visualization!” WHOOSH! “Fix this bug!” ZING! I felt invincible, like I’d discovered a cheat code for software development.

Oh, My Sweet Summer Child

Don’t get me wrong: that initial magic is real. Going from zero to a thousand lines of code has never been easier. But as my projects grew more complex, I found myself juggling 15 browser tabs, switching between the AI chat and my integrated development environment (IDE), and watching my productivity drip away with each context switch. The cognitive load was starting to feel like carrying water in a leaky bucket. It’s worth noting that at this stage I was just working within a web interface (picture ChatGPT in a website, and you’ve got it). Switching to an IDE-embedded chat window with drop-down access to multiple large language models (LLMs) made an enormous difference in both productivity and results.

Here’s the thing that took me an embarrassingly long time to remember: Writing code is just one of many pieces of software development. Something about that initial AI coding magic made me temporarily forget everything I knew about proper software development lifecycle principles. I was like a kid with a new superpower, trying to solve every problem by shooting laser beams at it. The first breakthrough came when I stopped treating AI as just a code generator and started using it as a partner in the entire development process. I built a workflow that leveraged LLMs to help create design documents, sprint plans, and architecture specifications. For a while, it was perfect: all the magic of AI assistance, but with proper structure and planning.

All Honeymoons Must End

I’d gotten clever with my architecture. Microservices everywhere! Clean separations! But as I wrapped up my third sprint, architectural drift had set in. A little compromise here, a tiny violation of domain boundaries there … death by a thousand tiny, pragmatic decisions. No problem, I thought. We can just refactor everything now! Just me and my AI pair programmer, cleaning house and restoring order. At first, it was glorious. We were moving at lightning speed, reorganizing code across more than a dozen files simultaneously. Then, without any worry or trepidation, I tried to start the FastAPI service that I had built for my application’s back end.

Oh No!

Remember that scene in “Jurassic Park” when they realize the raptors have been testing the fences for weaknesses? That’s what my error messages felt like. Each fix spawned two new problems. My test coverage plummeted to barely 50%, and every attempt to improve it just revealed more issues. My AI assistant and I, previously an unstoppable duo, were now like perfect strangers trying to solve a Rubik’s cube in the dark while wearing oven mitts.

The humbling irony? The very tools that had made me feel invincible had just helped me dig the deepest development hole I’d experienced in years. I had moved too fast, changed too much, and completely overwhelmed both my own ability to reason about the system and my AI assistant’s capacity to help fix it. Had I forgotten that design validation for adherence to best practices and patterns should be conducted in every sprint, rather than kicking the can down the road? Apparently, I had!
After days of playing whack-a-mole with errors, I had to make a tough call. The project needed a fresh start, this time with sounder design principles from day one. Sometimes, the best way out of a hole is to stop digging and climb out a different way entirely.

The lesson? For the uninitiated, AI-assisted development can be like strapping a rocket engine to a bicycle. It can help you move at incredible speeds, but strap a rocket to a bicycle with wobbly wheels, and, well … you can picture how that ends. The fundamentals of software engineering are now more important than ever: clean architecture, careful design, disciplined development practices, thorough testing. These aren’t constraints; they’re the foundations that let us safely use all this newfound power.

So embrace the magic of AI-assisted development, but don’t get drunk on the power like I did. Speed means nothing if you’re running in the wrong direction. And maybe, just maybe, think twice before refactoring a dozen files at once, no matter how invincible your AI sidekick makes you feel. After all, the goal isn’t just to write code faster; it’s to build better software faster. Sometimes, you need to learn that lesson the hard way. I know I did. source

The AI Coding Honeymoon (And What Comes After) Read More »

SAP restructures board to emphasize AI-first, suite-first strategy

SAP is starting the 2025 financial year with a new executive board. The German software company announced that 39-year-old Sebastian Steinhäuser will be promoted to the executive board. In the future, Steinhäuser will head the newly created Strategy & Operations board group and, in this position, will continue to drive the implementation of SAP’s strategy and simplify company processes. The integration of Strategy & Operations with Global Marketing, which is headed by the newly appointed Chief Marketing Officer Ada Agrait, is intended to improve collaboration and strengthen the digital experience for customers and partners, SAP said in a statement. Steinhäuser joined SAP in 2020 and has held various roles, including chief strategy officer. In this role, he led the growth areas of Business Transformation Management, Business Network, and Sustainability. In 2024, he was appointed chief strategy and operations officer. His area of responsibility expanded to include business operations, processes, and IT, as well as partner network and commercial functions. Before joining SAP, Steinhäuser worked at Boston Consulting Group. source

SAP restructures board to emphasize AI-first, suite-first strategy Read More »

Strategic Considerations for Japan Semiconductor Players

Japan is vigorously revitalizing its semiconductor industry to reclaim its leadership position in the global chip market. To achieve this vision, Japan has implemented a multi-faceted strategy. First, through strategic subsidies, it has successfully attracted major international manufacturers like TSMC to invest in advanced process technologies, thereby enhancing domestic manufacturing capabilities. Second, it has established a collaborative model between industry, government, and academia to advance research on 2nm process technology, with mass production targeted for 2027. Japan is also leveraging its strengths in semiconductor materials to develop advanced packaging technologies based on the 2nm process. By securing a strong foundation in mature processes while advancing leading-edge ones, Japan aims to achieve its ambitious target of 15 trillion yen in domestic semiconductor sales by 2030.

Due to various historical factors, Japan’s semiconductor industry has largely retreated from the global market, with limited exposure to globalization and a business model that remains primarily that of an integrated device manufacturer (IDM). Product applications focus mainly on mature-process chips for automotive and home appliances, leaving Japan technologically behind the leading nations. To revitalize their market position, Japanese players must better understand market developments and competitive dynamics. For Japanese semiconductor companies, we believe three key developments require close attention in 2025.

Driven by AI, Data Centers Will Be the Key Driver from an Application Perspective

The global semiconductor market will maintain growth in 2025, benefiting from rising demand for AI and generative AI. IDC sees vigorous development opportunities for industries such as IoT, automotive and autonomous vehicles, terminal devices, and communications, all of which require computing power to support them. Coupled with the concept of sovereign AI, which various countries have gradually come to emphasize, more data center construction is expected in Southeast Asia, India, and other emerging markets. Data centers are expected to be the application area with the most significant growth in 2025.

2025 Will Be a Critical Year for 2nm Technology

With all three major foundries entering 2nm mass production, 2025 will be a critical year for 2nm technology. TSMC is actively expanding its fabs in Hsinchu and Kaohsiung, which are expected to enter mass production in the second half of the year. Samsung, following past trends, is expected to enter production earlier than TSMC. Intel, under strategic adjustment, will focus on 18A, which already incorporates a backside power delivery network (BSPDN). All three players will confront critical optimization challenges in balancing performance, power consumption, and cost per area at 2nm. In particular, 2nm technology will simultaneously enter mass production for key products such as smartphone application processors, mining chips, and AI accelerators. As yield rates improve, each company’s pace of production expansion will become the focus of market attention.

Chinese Foundry Players Are Still Performing Well Despite the Trade Restrictions

The utilization rate (UTR) of China’s foundry players remained high in 2024, benefiting from the “Design by China + Manufacturing in China” policy and highly competitive wafer pricing. Chinese foundries’ UTR is expected to be approximately 87% in 2025.
Driven by the “China+1” policy, U.S. fabless companies will transfer more orders from China to Taiwan, which will help Taiwanese foundries’ UTR improve. IDC expects Taiwan’s UTR to reach 79% in 2025.

Due to policy restrictions on advanced process development, China’s semiconductor strategy focuses on mature process technologies. Government subsidies are now linked to operational performance, requiring fabs to secure orders and maintain high utilization rates. This will significantly impact wafer prices and competitive dynamics, making it a critical concern for Japanese semiconductor companies.

Japanese semiconductor companies also need to closely monitor China’s development of third-generation semiconductors alongside its advanced process technologies. Wide-bandgap semiconductors like SiC and GaN are vital for EVs, 5G, and green energy. The substrate is usually the main cost issue for SiC, but its share of the total cost is falling from 49% to 45% as Chinese players aggressively build epitaxy capability and expand capacity, which will help accelerate silicon carbide adoption. We expect China to have an even greater impact on the market as it prioritizes expanding the SiC and GaN markets. The Biden administration has included third-generation semiconductors in its “Section 301 investigation” of China’s mature process technologies, particularly in light of China’s aggressive development in this sector. Since third-generation semiconductors are also a key development target for Japan’s future, China’s expansion and movements in this sector require ongoing monitoring.

Conclusion

AI has become the key force reshaping the whole industry, and computing power will play a crucial role in developing and deploying AI for all applications. To support that, leading-edge nodes like 2nm will become more important. In the meantime, we expect Chinese players to take more action to break through in the next AI era. To cope with a changing environment, Japan’s semiconductor players need to build a comprehensive strategy; more technical innovation and new cooperation alliances will be key to building competitive strengths. source

Strategic Considerations for Japan Semiconductor Players Read More »

Alibaba’s Qwen2.5-Max challenges U.S. tech giants, reshapes enterprise AI

Alibaba Cloud unveiled its Qwen2.5-Max model today, marking the second major artificial intelligence breakthrough from China in less than a week, further rattling U.S. technology markets and intensifying concerns about America’s eroding AI leadership.

The new model outperforms DeepSeek’s R1 model, whose success sent Nvidia’s stock plunging 17% on Monday, in several key benchmarks, including Arena-Hard, LiveBench, and LiveCodeBench. Qwen2.5-Max also demonstrates competitive results against industry leaders like GPT-4o and Claude-3.5-Sonnet in tests of advanced reasoning and knowledge.

“We have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes,” Alibaba Cloud announced in a blog post. The company emphasized the model’s efficiency: It was trained on over 20 trillion tokens using a mixture-of-experts architecture that requires significantly fewer computational resources than traditional approaches.

The timing of these back-to-back Chinese AI releases has deepened Wall Street’s anxiety about U.S. technological supremacy. Both announcements came during President Trump’s first week back in office, prompting questions about the effectiveness of U.S. chip export controls meant to slow China’s AI advancement.

[Figure: Qwen2.5-Max outperforms major AI models across key benchmarks, including a significant lead in Arena-Hard testing, where it scored 89.4%. (Source: Alibaba Cloud)]

How Qwen2.5-Max could reshape enterprise AI strategies

For CIOs and technical leaders, Qwen2.5-Max’s architecture represents a potential shift in enterprise AI deployment strategies. Its mixture-of-experts approach demonstrates that competitive AI performance can be achieved without massive GPU clusters, potentially reducing infrastructure costs by 40-60% compared with traditional large language model deployments.

The technical specifications show sophisticated engineering choices that matter for enterprise adoption. The model activates only specific neural network components for each task, allowing organizations to run advanced AI capabilities on more modest hardware configurations.

This efficiency-first approach could reshape enterprise AI roadmaps. Rather than investing heavily in data center expansions and GPU clusters, technical leaders might prioritize architectural optimization and efficient model deployment. The model’s strong performance in code generation (LiveCodeBench: 38.7%) and reasoning tasks (Arena-Hard: 89.4%) suggests it could handle many enterprise use cases while requiring significantly less computational overhead.

However, technical decision-makers should carefully consider factors beyond raw performance metrics. Questions about data sovereignty, API reliability, and long-term support will likely influence adoption decisions, especially given the complex regulatory landscape surrounding Chinese AI technologies.

[Figure: Qwen2.5-Max achieves top scores across key AI benchmarks, including 94.5% accuracy in mathematical reasoning tests, outperforming major competitors. (Source: Alibaba Cloud)]
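The “activates only specific neural network components for each task” behavior is mixture-of-experts (MoE) routing. Below is a minimal, hedged PyTorch sketch of the idea: a learned router selects the top-k expert sub-networks per token, so most parameters sit idle on any given forward pass. All sizes, expert counts, and the top-k value are illustrative assumptions; Alibaba has not published Qwen2.5-Max’s actual configuration.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (illustrative dimensions)."""
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # learned routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # route to top_k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                     # tokens sent to expert e
                if mask.any():                            # only chosen experts run
                    out[mask] += weights[mask][:, k:k+1] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(5, 64)).shape)  # torch.Size([5, 64]); 2 of 8 experts per token
```

The efficiency claim follows directly from this design: compute per token scales with the number of active experts (here, 2), while total model capacity scales with all experts (here, 8), so a sparse model can approach the quality of a much larger dense one at a fraction of the FLOPs.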
China’s AI leap: how efficiency is driving innovation

Qwen2.5-Max’s architecture reveals how Chinese companies are adapting to U.S. restrictions. This efficiency-focused innovation suggests China may have found a sustainable path to AI advancement despite limited access to cutting-edge chips. The technical achievement here cannot be overstated. While U.S. companies have focused on scaling up through brute computational force, exemplified by OpenAI’s estimated use of over 32,000 high-end GPUs for its latest models, Chinese companies are finding success through architectural innovation and efficient resource use.

U.S. export controls: catalysts for China’s AI renaissance?

These developments force a fundamental reassessment of how technological advantage can be maintained in an interconnected world. U.S. export controls, designed to preserve American leadership in AI, may have inadvertently accelerated Chinese innovation in efficiency and architecture.

“The scaling of data and model size not only showcases advancements in model intelligence but also reflects our unwavering commitment to pioneering research,” Alibaba Cloud stated in its announcement. The company emphasized its focus on “enhancing the thinking and reasoning capabilities of large language models through the innovative application of scaled reinforcement learning.”

What Qwen2.5-Max means for enterprise AI adoption

For enterprise customers, these developments could herald a more accessible AI future. Qwen2.5-Max is already available through Alibaba Cloud’s API services, offering capabilities similar to leading U.S. models at potentially lower costs. This accessibility could accelerate AI adoption across industries, particularly in markets where cost has been a barrier.

However, security concerns persist. The U.S. Commerce Department has launched a review of both DeepSeek and Qwen2.5-Max to assess potential national security implications. The ability of Chinese companies to develop advanced AI capabilities despite export controls raises questions about the effectiveness of current regulatory frameworks.

The future of AI: efficiency over power?

The global AI landscape is shifting rapidly. The assumption that advanced AI development requires massive computational resources and cutting-edge hardware is being challenged. As Chinese companies demonstrate the possibility of achieving similar results through efficient innovation, the industry may be forced to reconsider its approach to AI advancement.

For U.S. technology leaders, the challenge is now twofold: responding to immediate market pressures while developing sustainable strategies for long-term competition in an environment where hardware advantages may no longer guarantee leadership.

The next few months will be crucial as the industry adjusts to this new reality. With both Chinese and U.S. companies promising further advances, the global race for AI supremacy enters a new phase, one where efficiency and innovation may prove more important than raw computational power. source

Alibaba’s Qwen2.5-Max challenges U.S. tech giants, reshapes enterprise AI Read More »

Inside New Commerce Tech Restrictions: Mitigation Strategies

By Peter Jeydel (January 24, 2025, 5:49 PM EST) — The U.S. Department of Commerce’s Bureau of Industry and Security has issued the final rule that will determine how its Information and Communications Technology and Services regulations will work going forward.[1]… source

Inside New Commerce Tech Restrictions: Mitigation Strategies Read More »

Take An Audience-Centric Approach To Create Compelling B2B Event Content

Delivering compelling B2B event content is central to an event’s success, but it’s a high-stakes endeavor. Leaders must build organizational alignment around a key theme that supports business objectives. They need to use this theme to craft a narrative that feeds into all aspects of event content. They must coordinate with demanding speakers. Resourcing and budgets are stretched. And all of this happens under the pressure of the unmovable deadline of a live event.

Forrester research has identified three key challenges that marketers face when it comes to managing their event content:

Measuring content effectiveness. Budgets are under pressure, and marketers need to better measure the impact of their event content, but two-thirds of B2B marketers tell us they find this difficult, and many organizations lack the integrated event infrastructure required for effective content measurement.

Offering personalized content. Marketers recognize that attendees want (and now expect) higher levels of personalized event content, but they struggle to deliver this. While AI holds the potential to help here, marketers are reluctant (or unable) to fully exploit its capabilities.

Driving post-event engagement. Over half of marketers struggle to create content that nurtures attendees post-event and helps to build “community.” Too often, event teams need to shift focus to the next event and lack the bandwidth or internal support to focus here.

Take An Audience-Centric Approach To Create Enduring, Impactful Event Content

To overcome these challenges, marketers must take a more disciplined, process-driven approach to their event content strategy and creation. The Forrester Event Content Lifecycle Framework places the target audience at the center of event content planning. It breaks the event content lifecycle into four key phases: pre-event content planning, pre-event content production, at-event content delivery, and post-event content value realization. For each of these phases, we examine the objectives, inputs, activities, team, and infrastructure that leaders need to consider. Forrester clients can read the report, Master B2B Event Content Best Practices To Drive Engagement, which goes into each of these phases in more detail, and can also request a guidance session to discuss their own event content strategies! source

Take An Audience-Centric Approach To Create Compelling B2B Event Content Read More »