DeepSeek unveils new technique for smarter, scalable AI reward models

DeepSeek AI, a Chinese research lab gaining recognition for its powerful open-source language models such as DeepSeek-R1, has introduced a significant advancement in reward modeling for large language models (LLMs). Their new technique, Self-Principled Critique Tuning (SPCT), aims to create generalist and scalable reward models (RMs). This could potentially lead to more capable AI applications for open-ended tasks and domains where current models can’t capture the nuances and complexities of their environment and users.

The crucial role and current limits of reward models

Reinforcement learning (RL) has become a cornerstone in developing state-of-the-art LLMs. In RL, models are fine-tuned based on feedback signals that indicate the quality of their responses. Reward models are the critical component that provides these signals. Essentially, an RM acts as a judge, evaluating LLM outputs and assigning a score or “reward” that guides the RL process and teaches the LLM to produce more useful responses.

However, current RMs often face limitations. They typically excel in narrow domains with clear-cut rules or easily verifiable answers. For example, current state-of-the-art reasoning models such as DeepSeek-R1 underwent an RL phase, in which they were trained on math and coding problems where the ground truth is clearly defined. However, creating a reward model for complex, open-ended, or subjective queries in general domains remains a major hurdle.
In the paper explaining their new technique, researchers at DeepSeek AI write, “Generalist RM requires to generate high-quality rewards beyond specific domains, where the criteria for rewards are more diverse and complex, and there are often no explicit reference or ground truth.” They highlight four key challenges in creating generalist RMs capable of handling broader tasks:

Input flexibility: The RM must handle various input types and be able to evaluate one or more responses simultaneously.
Accuracy: It must generate accurate reward signals across diverse domains where the criteria are complex and the ground truth is often unavailable.
Inference-time scalability: The RM should produce higher-quality rewards when more computational resources are allocated during inference.
Learning scalable behaviors: For RMs to scale effectively at inference time, they need to learn behaviors that allow for improved performance as more computation is used.

[Figure: Different types of reward models. Credit: arXiv]

Reward models can be broadly classified by their “reward generation paradigm” (e.g., scalar RMs outputting a single score, generative RMs producing textual critiques) and their “scoring pattern” (e.g., pointwise scoring assigns individual scores to each response, pairwise selects the better of two responses). These design choices affect the model’s suitability for generalist tasks, particularly its input flexibility and potential for inference-time scaling.

For instance, simple scalar RMs struggle with inference-time scaling because they will generate the same score repeatedly, while pairwise RMs can’t easily rate single responses. The researchers propose that “pointwise generative reward modeling” (GRM), where the model generates textual critiques and derives scores from them, can offer the flexibility and scalability needed for generalist tasks.
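To make the pointwise generative pattern concrete, here is a minimal Python sketch. It is not DeepSeek's implementation: it assumes a hypothetical critique format in which the model closes its critique with a line such as `Scores: [8, 3]`, and the function names `extract_pointwise_scores` and `best_response_index` are invented for illustration.

```python
import re
from typing import List


def extract_pointwise_scores(critique: str) -> List[int]:
    """Parse one score per response from a textual critique.

    Assumption for this sketch: the critique contains a line of the
    form 'Scores: [a, b, ...]'. Real GRM output formats may differ.
    """
    match = re.search(r"Scores:\s*\[([^\]]*)\]", critique)
    if match is None:
        raise ValueError("critique does not contain a 'Scores: [...]' line")
    return [int(tok) for tok in match.group(1).split(",") if tok.strip()]


def best_response_index(scores: List[int]) -> int:
    """Pointwise scores also cover the pairwise case: pick the top score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# A made-up critique covering two responses.
critique = (
    "Principle 1: factual accuracy. Principle 2: clarity.\n"
    "Response 1 is accurate and clear; Response 2 contains an error.\n"
    "Scores: [8, 3]"
)
scores = extract_pointwise_scores(critique)
print(scores, best_response_index(scores))
```

Because each response gets its own score, the same parser handles a single response (`Scores: [5]`) as well as a pair or a longer list, which is the input flexibility the pairwise pattern lacks.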
The DeepSeek team conducted preliminary experiments on models like GPT-4o and Gemma-2-27B, and found that “certain principles could guide reward generation within proper criteria for GRMs, improving the quality of rewards, which inspired us that inference-time scalability of RM might be achieved by scaling the generation of high-quality principles and accurate critiques.”

Training RMs to generate their own principles

Based on these findings, the researchers developed Self-Principled Critique Tuning (SPCT), which trains the GRM to generate principles and critiques based on queries and responses dynamically. The researchers propose that principles should be a “part of reward generation instead of a preprocessing step.” This way, the GRMs could generate principles on the fly based on the task they are evaluating and then generate critiques based on the principles.

“This shift enables [the] principles to be generated based on the input query and responses, adaptively aligning [the] reward generation process, and the quality and granularity of the principles and corresponding critiques could be further improved with post-training on the GRM,” the researchers write.

[Figure: Self-Principled Critique Tuning (SPCT). Credit: arXiv]

SPCT involves two main phases:

Rejective fine-tuning: This phase trains the GRM to generate principles and critiques for various input types using the correct format. The model generates principles, critiques and rewards for given queries/responses. Trajectories (generation attempts) are accepted only if the predicted reward aligns with the ground truth (correctly identifying the better response, for instance) and rejected otherwise. This process is repeated and the model is fine-tuned on the filtered examples to improve its principle/critique generation capabilities.
Rule-based RL: In this phase, the model is further fine-tuned through outcome-based reinforcement learning. The GRM generates principles and critiques for each query, and the reward signals are calculated based on simple accuracy rules (e.g., did it pick the known best response?). Then the model is updated. This encourages the GRM to learn how to generate effective principles and accurate critiques dynamically and in a scalable way.

“By leveraging rule-based online RL, SPCT enables GRMs to learn to adaptively posit principles and critiques based on the input query and responses, leading to better outcome rewards in general domains,” the researchers write.

To tackle the inference-time scaling challenge (getting better results with more compute), the researchers run the GRM multiple times for the same input, generating different sets of principles and critiques. The final reward is determined by voting (aggregating the sample scores). This allows the model to consider a broader range of perspectives, leading to potentially more accurate and nuanced final judgments as it is provided with more resources.

However, some generated principles/critiques might be low-quality or biased due to model limitations or randomness. To address this, the researchers introduced a “meta RM”: a separate, lightweight scalar RM trained specifically to predict whether a principle/critique generated by the primary GRM will likely lead to a correct final reward. During inference, the meta RM evaluates the generated samples and filters out the low-quality judgments before the final voting, further enhancing scaling performance.

Putting SPCT into practice with
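The sampling, filtering, and voting procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' code: aggregation is assumed to be a plain sum of per-response scores, the meta RM is represented only by a list of made-up quality scores, and `aggregate_rewards` and `keep_top` are hypothetical names.

```python
from typing import List, Optional, Sequence


def aggregate_rewards(
    samples: Sequence[Sequence[int]],
    meta_scores: Optional[Sequence[float]] = None,
    keep_top: Optional[int] = None,
) -> List[int]:
    """Vote over sampled judgments by summing per-response scores.

    samples[i][j] is the score that sampled judgment i gave response j.
    If meta_scores and keep_top are both given, only the keep_top
    judgments with the highest meta RM score take part in the vote
    (a stand-in for the meta RM filtering step).
    """
    if not samples:
        raise ValueError("need at least one sampled judgment")
    indices = list(range(len(samples)))
    if meta_scores is not None and keep_top is not None:
        ranked = sorted(indices, key=lambda i: meta_scores[i], reverse=True)
        indices = ranked[:keep_top]
    totals = [0] * len(samples[0])
    for i in indices:
        for j, score in enumerate(samples[i]):
            totals[j] += score
    return totals


# Four sampled judgments over two responses (made-up numbers).
samples = [[8, 3], [7, 5], [2, 9], [6, 4]]
# Hypothetical meta RM outputs: the third judgment looks unreliable.
meta_scores = [0.9, 0.8, 0.1, 0.7]

print(aggregate_rewards(samples))                           # plain voting
print(aggregate_rewards(samples, meta_scores, keep_top=3))  # filtered voting
```

In this toy data the outlier judgment narrows the margin between the two responses under plain voting, and dropping it before the vote widens the margin again, which is the effect the meta RM is meant to have.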

Judge Rules That Google Is An Illegal Monopoly

Meta’s not the only Big Tech company in the hot seat this week. US District Judge Leonie Brinkema found Google liable for illegally monopolizing two online advertising technology markets: publisher ad servers and ad exchanges. This comes less than a year after another federal judge ruled that the company had a monopoly in online search. Google disagrees with the court’s decision and plans to appeal the ruling, asserting that publishers choose Google over other options because its tech tools are “simple, affordable, and effective.”

As we’ve said before, the impact of these cases won’t be fully realized until the remedies stage, which may take years to play out. Any order to break up Google will spend time in the court of appeals and potentially go to the Supreme Court. When we surveyed consumers about Google’s illegal monopolies, only 18% said they “believe that Google will have to break up.”

The Google Era Gives Way To A Google Overhaul

Judge Brinkema’s ruling, paired with Judge Amit Mehta’s finding that Google maintains an illegal search monopoly, raises the likelihood of Google’s overhaul. The Department of Justice specifically requested divestment of Google Ad Manager, which includes its publisher ad exchange and ad server. At a minimum, Google will be compelled to not destroy evidence of its monopolization going forward. According to Judge Brinkema, “Google’s systemic disregard of the evidentiary rules regarding spoliation of evidence and its misuse of the attorney-client privilege may well be sanctionable.” In addition, Google’s publisher adtech could be restructured by separating its ad server from its ad exchange, opening the loop between two products that have been tied together to the detriment of competition.

Publishers Can Expect (Eventual) Changes To The Sell-Side Adtech Ecosystem

This ruling heightens the already substantial counterparty risk between Google and publishers, which is exacerbated by generative AI. Google’s AI Overviews, which facilitate zero-click searches, retain traffic that would, pre-ChatGPT, have landed on publishers’ sites. During guidance sessions, publishers tell us that they’re losing tons of traffic to AI Overviews. Publishers missing traffic must now also deal with uncertainty about the future of Google’s sell-side adtech.

Advertisers, however, are relatively unaffected by this decision. The DOJ failed to prove that Google has a monopoly on the tech that advertisers use to buy display ads. In ruling for Google on the buy side, where Google fortifies tech acquired from DoubleClick and Admeld, Judge Brinkema found that advertisers choose among various ad platforms based on perceived return on ad spend. Advertisers continue to be dissatisfied by the lack of transparency and control in Google’s buy-side adtech, but Google doesn’t monopolize that market.

Forrester clients: Let’s chat more about this via a Forrester guidance session.

Trade Wars to Tech Wars: Can China’s Stimulus Offset U.S. Tariffs in ICT Markets?

The U.S.-China tech rivalry escalated to a new level in April 2025, with U.S. tariffs becoming a targeted trade tool. The Trump administration unleashed waves of tariffs on Chinese goods: on March 4, a 10% tariff on all imports was imposed on top of raising tariffs from 10% to 20% on many Chinese electronics, machinery and industrial components; on April 2, de minimis eligibility for China and Hong Kong was ended (from May 2) and the “reciprocal tariffs” on key critical sectors imposed an additional 34%; and on April 8, an additional 50% tariff on semiconductors, EVs, and robotics was announced. There also continue to be tariff escalations, clarifications, and exemptions, such as in cases where final products have more than 20% U.S.-produced components. Tariffs on Chinese imports can be as high as 245% on needles and syringes or as low as zero for children’s books. Imported smartphones, computers and electronics appear to be currently granted a partial tariff reprieve and may only be subject to the March tariffs of 20%. These adjustments will probably continue as the impacts are felt by American consumers and the global markets.

Some view these measures as a means to derail China’s technological ascendancy by inflating costs, disrupting supply chains, and isolating it from global markets. Beijing’s counterstrategy, a mix of aggressive fiscal stimulus packages, supply chain resilience frameworks, and enforced technology self-reliance, suggests a calculated pivot to absorb short-term shocks while securing long-term growth. The question then is: can China’s 2025 policy playbook neutralize U.S. tariffs’ impacts and sustain its ICT ambitions?

U.S. Tariffs’ Impact on China’s ICT Sector

IDC’s 2025 projections reveal a sector under strain but adapting. In our baseline scenario, with 20% tariffs in place, China’s ICT spending is expected to grow 9.1%, driven by domestic AI, cloud, and industrial software demand. IDC’s downside scenario, with 50% tariffs, slows growth to 5.7%, with consumer electronics (PCs, smartphones) declining 7.6% due to inflated import duties. An optimistic scenario, with tariffs rolled back, sees China’s ICT spending growth at 9.9%, fueled by pent-up innovation and the continuation of existing global partnerships.

Contributing to China’s ICT spending in 2025 are the key trends we are seeing in software/cloud services (+10 to 16% YoY), largely due to organizations’ prioritization of digital efficiency, as well as in industrial technologies such as AI, IoT, and automation, which remain resilient due to state subsidies. While the consumer hardware export market is expected to falter (e.g., iPhone costs rose 25% post-April hikes), domestic demand is expected to remain steady with government subsidies.

China’s Growing Tech Independence

There is increased emphasis on China’s “dual circulation” strategy, which was originally a response to the U.S. tariffs and other sanctions introduced between 2018 and 2020. The strategy seeks to prioritize domestic consumption and non-western international trade to gain greater self-reliance and resilience. This strategy can be seen at work in the likes of DeepSeek, whose open-source models are now powering a significant portion of Chinese cloud services including Tencent, Alibaba and many more. Huawei’s Ascend AI chips also increased their share of AI-accelerator chips to 27% in 2024 and are expected to reach 40% by the end of 2025.

Companies’ Response: Increased Agility to Respond to Tariff Chaos

Agility is the name of the game amid all this tariff chaos. Chinese tech giants are restructuring their supply chains by accelerating offshoring to Southeast Asia, shifting their assembly lines to sidestep tariffs. They are also diversifying their markets by pivoting to emerging markets, such as expanding electric vehicle and cloud service exports to tariff-immune regions. Some companies are also innovating operations by adopting leaner strategies like AI-powered factories to cut waste or using direct shipping tech from e-commerce platforms to bypass tariffs.

China’s 5-Point Plan: A Phase-Matched Counterattack

In response to each wave of tariffs, a 5-point plan helped blunt immediate impacts while increasing long-term leverage:

1. Domestic Demand Boost via “Consumer Upgrade Action” Plan
With the aim of boosting domestic demand and spurring economic growth, the Chinese government has put in place subsidies and trade-in programs for eligible consumer goods. For smartphones, tablets, and smartwatches, the government subsidy is up to 15% of product price, capped at ¥500/item. This trade-in program is expanded in 2025 to apply to other electronics, EVs and home appliances as well, as illustrated in the following chart:
[Chart]
The subsidies also target rural/low-tier cities for 5G adoption, smart home devices, and rural e-commerce logistics. There are also plans to stabilize consumer confidence through stock/real estate market reforms and wage growth policies. The effect of these subsidies can be seen in the latest sales-out PC shipments, with flat growth of 1% in 1Q 2025 compared to -16% in 1Q 2024.

2. Increased Funding for Emerging Tech
China’s $138B Innovation Fund aims to boost homegrown tech innovation and reduce foreign reliance amid escalating U.S. tariffs. It focuses on discovering and increasing “original technological breakthroughs” in early-stage startups in AI, quantum computing, hydrogen energy, biomanufacturing, and 6G technology. Funding is a combination of state capital and private/local government long-term (over 20 years) investments in R&D infrastructure and tech-to-product pipelines. The innovation fund also involves industry stakeholders such as the MIIT (Ministry of Industry and Information Technology), academia, enterprises (to enhance smart manufacturing), and foreign collaborators in the telecom/robotics sectors. The program also aims to cultivate and highlight domestic STEM talent to offset global supply chain risks, with existing success stories like DeepSeek.

3. China’s “Five Financial Priorities” Guidelines
These guidelines provide financial support to organizations providing technology, green finance, digitalization, financial inclusion, and pension products and services. They use technology investments to bolster innovation and self-reliance. Key measures include: comprehensive financing for national tech projects and SMEs via equity, debt, and insurance tools; capital market focus prioritizing early-stage investments in emerging technology through multi-layered markets; risk mitigation mechanisms to disperse R&D risks and expand venture capital/angel funding; and patient capital to cultivate long-term investments that nurture tech leaders, unicorns, and specialized SMEs. This framework integrates financial resources to advance China’s tech competitiveness and industrial

From Copilot to agent – AI is growing up, and CISOs need to be ready

Now, agentic AI has stepped into the spotlight. More autonomous and adaptive than its predecessors, this next-gen approach can take on more complex security tasks, anticipate emerging threats, and dynamically adjust defenses in real time. This class of advanced AI systems is designed to operate autonomously, making decisions and taking actions to reach specific goals with little to no human monitoring. The big difference is that agentic AI uses advanced reasoning, adaptability, and learning capabilities to independently navigate complex tasks, rather than relying on human approval and guidance to make decisions as existing AI does. It’s an astonishing step ahead, combining the power of large language models (LLMs) and real-time data processing to act as a proactive “agent” in dynamic environments without human intervention.

But questions linger. Will the AI take over entire processes? And if so, could the lack of a human in the loop cause unexpected issues? For example, might an agentic AI stop or block a legitimate business transaction because the agent thinks it’s fraud? Alternatively, could the agent accidentally create a vulnerability that can be exploited?

Understanding the potential of agentic AI

For CISOs, agentic AI represents both a transformative opportunity and a strategic shift. As cyber threats grow in speed and sophistication, CISOs are pressured to maintain or boost their organizational resilience while managing resource constraints and/or worker burnout. That’s where agentic AI can make its mark: stepping in as a force multiplier, automating decision-making, adapting to evolving threats, and enabling CISOs to evolve from reactive defenders to architects of business-aligned security strategies.

Zurich Stuck With $12.2M Solar Farm Verdict, Judge Rules

By Chart Riggall (April 18, 2025, 4:41 PM EDT) -- A Georgia federal judge has shot down Zurich American Insurance Co.’s bid to escape a $12.2 million judgment that followed a January trial where a jury found the insurer shortchanged a Peach State solar farm’s claim for storm damage…

OpenAI's New AI Models o3 and o4-mini Can Now 'Think With Images'

[Image: OpenAI’s CEO Sam Altman. Credit: Creative Commons]

OpenAI has rolled out two new AI models, o3 and o4-mini, that can literally “think with images,” marking a big step forward in how machines understand pictures. These models, announced in an OpenAI press release, can reason about images the same way they do about text: cropping, zooming, and rotating photos as part of their internal thought process.

At the heart of this update is the ability to blend visual and verbal reasoning. “OpenAI o3 and o4-mini represent a significant breakthrough in visual perception by reasoning with images in their chain of thought,” the company said in its press release. Unlike past versions, these models don’t rely on separate vision systems; instead, they natively mix image tools and text tools for richer, more accurate answers.

How does ‘thinking with images’ work?

The models can crop, zoom, rotate, or flip an image as part of their thinking process, just like humans would. They’re not just recognizing what’s in a photo but working with it to draw conclusions. The company notes that “ChatGPT’s enhanced visual intelligence helps you solve tougher problems by analyzing images more thoroughly, accurately, and reliably than ever before.” This means if you upload a photo of a handwritten math problem, a blurry sign, or a complicated chart, the model can not only understand it, but also break it down step by step, possibly even better than before.

Outperforms previous models in key benchmarks

These new abilities aren’t just impressive in theory; OpenAI says both models outperform their predecessors on top academic and AI benchmarks. “Our models set new state-of-the-art performance in STEM question-answering (MMMU, MathVista), chart reading and reasoning (CharXiv), perception primitives (VLMs are Blind), and visual search (V*),” the company noted in a statement. “On V*, our visual reasoning approach achieves 95.7% accuracy, largely solving the benchmark.”

But the models aren’t perfect. OpenAI admits the models can sometimes overthink, leading to prolonged and unnecessary image manipulations. There are also cases where the AI might misinterpret what it sees, despite correctly using tools to analyze the image. The company also warned of reliability issues when trying the same task multiple times.

Who can use OpenAI o3 and o4-mini?

As of April 16, both o3 and o4-mini are available to ChatGPT Plus, Pro, and Team users; they replace older models like o1 and o3-mini. Enterprise and education users will get access next week, and free users can try o4-mini through a new “Think” feature.

CIOs highlight negotiation opportunities as AWS and Google lower cloud costs

“I always look at the platform’s reliability, the quality of support, how well it integrates with our systems, and whether it gives us the flexibility to scale or pivot in the future. Price cuts may open the door, but the overall value and long-term fit matter,” Nekvinda added. For Wilfredo Perez, CIO at Muvi Cinemas, a multi-cloud and platform-agnostic approach will minimise the impact of any single provider’s price drop. “For sure, we celebrate a price reduction, but our strategy goes in the direction of being platform-independent or multi-cloud. Combining this idea with a pay-per-use model, we can balance and get a higher ROI,” he said. Perez divides workloads between pay-per-use, cloud-native services for transactional needs and reserved instances for resource-intensive tasks, using containers for maximum flexibility. In contract negotiations, he focuses on cost transparency, clear SLAs, and the ability to add licenses as needed.

Trump’s ‘war on science’ hands Europe major tech talent opportunity

As the Trump administration ramps up what academics call a “war on science,” US researchers are increasingly looking to Europe for new opportunities, which could be good news for the continent’s tech sectors.

France, in particular, is positioning itself as a safe haven for scientists. In a not-so-subtle appeal to disaffected US talent on Friday, the country’s president, Emmanuel Macron, called on researchers to “choose France, choose Europe” for their next job. In a post on X, he promoted a new platform that aims to make it easier for international scientists to conduct research in the country. “Here in France, research is a priority, innovation a culture, science a limitless horizon,” he said.

Yann LeCun, Meta’s chief AI scientist, called Macron’s announcement a “smart move.” LeCun has previously criticised Trump’s cuts to science funding at institutions such as Harvard, Columbia, and NASA. “The US seems set on destroying its public research funding system,” he said in a LinkedIn post last month. “[Europe] may have an opportunity to attract some of the best scientists in the world.”

European institutions are already seizing that opportunity. Last month, France’s Aix-Marseille University opened applications for its Safe Space for Science scheme, which specifically targets US researchers looking to relocate. Belgium’s Vrije Universiteit Brussel has opened a similar programme targeting American scientists “under threat.”

Europe’s appeal to refugees from a war on science

Three out of four US researchers recently surveyed by Nature said they were thinking about relocating to Europe or Canada, driven by growing concerns over President Trump’s stance on science.

An exodus of US researchers could have knock-on impacts on Europe’s tech ecosystem. Many of the continent’s most successful startups, from DeepMind to Climeworks, emerged from university labs.

Kanika Chandaria, a climate expert at Danish carbon credit startup Agreena, told TNW that the exodus of US researchers presents a “strategic opportunity for European countries,” especially in climate tech. With the US rolling back climate protections, European countries could move to “attract top talent and position themselves at the forefront of climate research and technology development,” she said.

However, while Europe hopes to lure in disillusioned US scientists with promises of a high quality of life and research freedoms, there are potential drawbacks to relocating. LeCun highlighted several of them, including lower compensation than in the US and limited access to research funding. “To attract the best scientific and technological talents, make science and technology research professions attractive,” he wrote. “It’s pretty straightforward.”

Forrester’s Top Threats For 2025

2025 started with a bang! Technology and geopolitics are changing so fast that many can’t keep track of the latest trends, with an announcement of new, benchmark-shattering genAI-related tech seemingly every week. Meanwhile, planned job cuts across US employers are at their highest levels since 2020, we are on the brink of a global trade war, and geopolitical tensions are high. On the plus side, there was a reported 35% year-over-year decrease in ransomware payments in 2024, but we’re not even one-third of the way through 2025 and things are already hectic.

To help security leaders better prepare for the chaos that is and will be this year, Forrester has released our yearly report on the top threats that we expect organizations to face in 2025. Read the full report here: The Top Cybersecurity Threats In 2025. This report is based on data and trends from the changing dynamics in the threat landscape. We expect that organizations will face the following in 2025:

Global regulatory disruptions. Some regulations are coming into force this year, others are brand new, and still others are being revoked. With so much regulatory change, organizations must focus on compliance change management and prioritize requirements that are being enforced now.
High-quality deepfakes. Convincing deepfakes are becoming easier to create thanks to the proliferation of open-source algorithms, purpose-built websites, cheap GPU power, and the wide availability of voice and audio profiles. Mitigating deepfakes requires an investment in end-user education and the implementation of strong authentication methods.
Tech exuberance over generative AI. The anthropomorphizing of genAI means that people trust it even when they shouldn’t, which puts your organization at risk. It’s critical to invest in ML and AI security tools and create processes focused on discovery, policy enforcement, and detection and response.
Job loss radicalization. A new economic reality has emerged in 2025, with a flurry of activity that saw job cuts to 4% of the US federal government workforce, massive tech layoffs, and job cuts in Europe. Managing potential insider threats with an insider risk management program is paramount this year.
Generative AI-driven extortion. Ransomware became less lucrative in 2024, and attackers are likely to mix things up because of it. Before genAI, stealing data was only so useful: reviewing millions of emails takes far too much time. Now, with genAI, attackers can perform a quick sentiment analysis on troves of stolen data for extortion schemes. Prepare now for infostealers, which lead to extortion and will become a bigger threat than ransomware.

For more on what to know about these threats and what to do about them, read the full report. If you have more questions about the threat landscape, book an inquiry or guidance session with me or one of my colleagues.
