The 5 hottest scaleups in France enter TECH5’s ‘Champions League of Tech’

Five flourishing French scaleups have made it into TECH5 — the “Champions League of Technology.” The quintet will now compete for the title of top scaleup in Europe. The contest concludes on June 19-20, when the TECH5 champion will be announced on the main stage of TNW Conference.

But first, the contenders have to win a regional crown. For the French challengers, that’s no easy task. The country’s tech sector has been going through a historic boom. Over the past decade, the startup scene has seen the most dramatic growth of any European country, with investments surging nearly 1,000% to €53bn, according to VC firm Atomico.

One of the largest recent funding rounds went to French AI darling Mistral. In June 2024, the Paris-based company raised €468mn and became Europe’s most valuable AI startup.

France’s blend of highly skilled talent, increased government support, and expanded investment streams has created fertile ground for tech firms to grow. This has laid the foundation for an impressive flock of scaleups. Yet only five of them could enter the TECH5 finals. Our judges selected them based on an analysis of their growth, impact, and future potential. Their evaluation led us to the following high-flying scaleups, listed in random order:

1. Kinetix

A frontrunner in the thriving AI scene of Paris, Kinetix specialises in 3D character animation for games. Using GenAI tools, the company transforms camera footage and text prompts into precise animations. AI filters can then add extensive customisations to the visuals. The platform makes 3D content creation accessible to anyone.

“At Kinetix, we believe character motion is at the heart of storytelling,” the scaleup told TNW. “Whether in games, entertainment, or branded content, movement brings digital characters to life and creates meaningful, engaging narratives.”

Kinetix is best known for its embeddable AI emote feature, which lets players create and use custom emotes in-game. It’s a concept that has attracted booming demand. Fortnite alone has over 1,000 emotes. By 2030, the digital human avatar market is forecast to reach over €450bn.

2. Kovalee

Kovalee has developed a powerful publishing platform for non-gaming apps. The scaleup wants to give every promising content creator a chance to build the best app in their field — regardless of their resources.

Through product enhancement, monetisation boost, user acquisition, and app store optimisation services, Kovalee has fostered numerous success stories. Several have become category leaders in the App Store, from stretching platform Bend to motivational companion PetTalk.

The model has fuelled a rapid rise for Kovalee, which was founded in 2020. Last year, the company topped Sifted’s list of France’s fastest-growing startups after an eye-catching 626% two-year revenue growth. VC firm Iris, which led an €8mn Series A investment in the company in 2023, said Kovalee has “the potential to become the leading non-gaming publishing platform.”

3. Swan

One of Europe’s premier fintechs, Swan provides a straightforward route to embedding banking features. Via simple APIs, companies can quickly integrate services including accounts, cards, and payments into their own products.

Swan was founded in 2019 by three fintech veterans and seasoned entrepreneurs.
The trio had first-hand experience with the frustrations of embedded finance, from the interminable meetings and piles of paperwork to the clunky APIs. They launched Swan to offer an alternative.

Nicolas Benady, the company’s CEO and co-founder, has an ambitious goal for the business: “Swan is on a mission to build the leading tech-driven bank in Europe.”

Investors have been impressed by the plans. In January, Swan announced it had raised €42mn, bringing the scaleup’s total funding to an estimated €100mn.

4. Qovoltis

Qovoltis has created an innovative all-in-one EV charging solution. It comprises a smart charging station that adjusts power in real time, a mobile app for remote management, and a novel energy optimisation system.

Last year, Qovoltis expanded its product line with the launch of the Qobox mini, an ultra-compact smart charger. The model is the first charging station to earn an “Origine France Garantie” certificate — a guarantee of French production and quality. It also won the Made in France Innovation Grand Prix 2024.

The milestone year culminated in a €45mn Series A funding round. Qovoltis president Ehsan Emani — who founded the company in 2019 — described the cash injection as a “decisive step” for the business. “It will enable us to expand our commercial offerings and solidify our role in the transition to sustainable electric mobility,” he said.

5. Dalma

Dalma has pioneered a new approach to pet insurance. The company’s insurance reimburses all veterinary expenses within 48 hours — with no excess or hidden fees.

Founded in 2021, Dalma has rapidly expanded — and still has enormous growth potential. Nearly half of European households have a pet, on which they collectively spend an estimated €24.6bn annually, opening up a lucrative market for insurers.

Investors have identified Dalma as one of the industry’s frontrunners. Last month, the company raised €20mn, taking its total funding to over €50mn, according to Bounce Watch data.

“Our ambition for Dalma is to build the pet insurance leader in Europe — one that not only provides financial protection but also fundamentally improves pet healthcare,” Dalma told TNW.

What’s next for the French scaleups?

The fabulous French five will compete for the TECH5 title with contenders from six other regions. At TNW Conference in June, the grand champion will be crowned Europe’s hottest scaleup.

The challengers from France, Benelux, the Nordics, and DACH have now all been chosen. Next week, we reveal the finalists from another region in the tournament: Southern Europe.

TECH5 is part of a packed programme for TNW Conference, which takes place on June 19-20 in Amsterdam. Tickets for the event are now on sale. Use the code TNWXMEDIA2025 at checkout to get 30% off the price tag.


Anthropic just analyzed 700,000 Claude conversations — and found its AI has a moral code of its own

Anthropic, the AI company founded by former OpenAI employees, has pulled back the curtain on an unprecedented analysis of how its AI assistant Claude expresses values during actual conversations with users. The research, released today, reveals both reassuring alignment with the company’s goals and concerning edge cases that could help identify vulnerabilities in AI safety measures.

The study examined 700,000 anonymized conversations, finding that Claude largely upholds the company’s “helpful, honest, harmless” framework while adapting its values to different contexts — from relationship advice to historical analysis. This represents one of the most ambitious attempts to empirically evaluate whether an AI system’s behavior in the wild matches its intended design.

“Our hope is that this research encourages other AI labs to conduct similar research into their models’ values,” said Saffron Huang, a member of Anthropic’s Societal Impacts team who worked on the study, in an interview with VentureBeat. “Measuring an AI system’s values is core to alignment research and understanding if a model is actually aligned with its training.”

Inside the first comprehensive moral taxonomy of an AI assistant

The research team developed a novel evaluation method to systematically categorize values expressed in actual Claude conversations. After filtering for subjective content, they analyzed over 308,000 interactions, creating what they describe as “the first large-scale empirical taxonomy of AI values.”

The taxonomy organized values into five major categories: Practical, Epistemic, Social, Protective, and Personal. At the most granular level, the system identified 3,307 unique values — from everyday virtues like professionalism to complex ethical concepts like moral pluralism.

“I was surprised at just what a huge and diverse range of values we ended up with, more than 3,000, from ‘self-reliance’ to ‘strategic thinking’ to ‘filial piety,’” Huang told VentureBeat. “It was surprisingly interesting to spend a lot of time thinking about all these values, and building a taxonomy to organize them in relation to each other — I feel like it taught me something about human values systems, too.”

The research arrives at a critical moment for Anthropic, which recently launched “Claude Max,” a premium $200 monthly subscription tier aimed at competing with OpenAI’s similar offering. The company has also expanded Claude’s capabilities to include Google Workspace integration and autonomous research functions, positioning it as “a true virtual collaborator” for enterprise users, according to recent announcements.

How Claude follows its training — and where AI safeguards might fail

The study found that Claude generally adheres to Anthropic’s prosocial aspirations, emphasizing values like “user enablement,” “epistemic humility,” and “patient wellbeing” across diverse interactions. However, researchers also discovered troubling instances where Claude expressed values contrary to its training.

“Overall, I think we see this finding as both useful data and an opportunity,” Huang explained. “These new evaluation methods and results can help us identify and mitigate potential jailbreaks.
It’s important to note that these were very rare cases and we believe this was related to jailbroken outputs from Claude.”

These anomalies included expressions of “dominance” and “amorality” — values Anthropic explicitly aims to avoid in Claude’s design. The researchers believe these cases resulted from users employing specialized techniques to bypass Claude’s safety guardrails, suggesting the evaluation method could serve as an early warning system for detecting such attempts.

Why AI assistants change their values depending on what you’re asking

Perhaps most fascinating was the discovery that Claude’s expressed values shift contextually, mirroring human behavior. When users sought relationship guidance, Claude emphasized “healthy boundaries” and “mutual respect.” For historical event analysis, “historical accuracy” took precedence.

“I was surprised at Claude’s focus on honesty and accuracy across a lot of diverse tasks, where I wouldn’t necessarily have expected that theme to be the priority,” said Huang. “For example, ‘intellectual humility’ was the top value in philosophical discussions about AI, ‘expertise’ was the top value when creating beauty industry marketing content, and ‘historical accuracy’ was the top value when discussing controversial historical events.”

The study also examined how Claude responds to users’ own expressed values. In 28.2% of conversations, Claude strongly supported user values — potentially raising questions about excessive agreeableness. However, in 6.6% of interactions, Claude “reframed” user values by acknowledging them while adding new perspectives, typically when providing psychological or interpersonal advice.

Most tellingly, in 3% of conversations, Claude actively resisted user values. Researchers suggest these rare instances of pushback might reveal Claude’s “deepest, most immovable values” — analogous to how human core values emerge when facing ethical challenges.

“Our research suggests that there are some types of values, like intellectual honesty and harm prevention, that it is uncommon for Claude to express in regular, day-to-day interactions, but if pushed, will defend them,” Huang said. “Specifically, it’s these kinds of ethical and knowledge-oriented values that tend to be articulated and defended directly when pushed.”

The breakthrough techniques revealing how AI systems actually think

Anthropic’s values study builds on the company’s broader efforts to demystify large language models through what it calls “mechanistic interpretability” — essentially reverse-engineering AI systems to understand their inner workings.

Last month, Anthropic researchers published groundbreaking work that used what they described as a “microscope” to track Claude’s decision-making processes. The technique revealed counterintuitive behaviors, including Claude planning ahead when composing poetry and using unconventional problem-solving approaches for basic math.

These findings challenge assumptions about how large language models function. For instance, when asked to explain its math process, Claude described a standard technique rather than its actual internal method — revealing how AI explanations can diverge from actual operations.

“It’s a misconception that we’ve found all the components of the model or, like, a God’s-eye view,” Anthropic researcher Joshua Batson told MIT Technology Review in March.
“Some things are in focus, but other things are still unclear — a distortion of the microscope.”

What Anthropic’s research means for enterprise AI decision makers

For technical decision-makers evaluating AI systems for their organizations, Anthropic’s research offers several key takeaways. First, it suggests
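The value-tagging pipeline behind the taxonomy described earlier can be pictured in outline: extract the values a reply expresses, map each onto one of the five top-level categories, and tally across conversations. Below is a minimal sketch of that shape in Python; Anthropic has not released its classifier, so the keyword matcher and the `CATEGORY_OF` mapping are hypothetical stand-ins for the real LLM judge.

```python
# A minimal sketch of a value-tagging pipeline in the spirit of the study.
# Anthropic has not published its classifier, so the keyword matcher and the
# CATEGORY_OF mapping below are hypothetical stand-ins for the real LLM judge.
from collections import Counter

# The five top-level categories reported in the study, with a few of the
# fine-grained values mapped onto them for illustration.
CATEGORY_OF = {
    "professionalism": "Practical",
    "historical accuracy": "Epistemic",
    "intellectual humility": "Epistemic",
    "healthy boundaries": "Social",
    "harm prevention": "Protective",
    "self-reliance": "Personal",
}

def extract_values(reply: str) -> list[str]:
    """Stand-in for an LLM judge that tags values expressed in one reply."""
    text = reply.lower()
    return [value for value in CATEGORY_OF if value in text]

def tally(conversations: list[list[str]]) -> Counter:
    """Aggregate expressed values across many anonymized conversations."""
    counts: Counter = Counter()
    for conversation in conversations:
        for reply in conversation:
            for value in extract_values(reply):
                counts[CATEGORY_OF[value]] += 1
    return counts

sample = [[
    "Setting healthy boundaries matters most here.",
    "For this event, historical accuracy should come first.",
]]
print(tally(sample))  # Counter({'Social': 1, 'Epistemic': 1})
```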


Government Organizations: Link Your Digital Metrics To Customer-Led Mission Outcomes

Digital leaders in government share a common task: transforming the way the government serves the public using technology. Despite how simple that sounds, the task is quite complex and requires a keen understanding of the many elements that influence how governments make decisions and operate.

That’s why we studied examples of digital public services around the world, at different jurisdictional levels, and across various mandates (e.g. public health, education, transport). As a result, we created a standardized framework that links customer, operations, and budget measures to mission outcomes, and assembled a comprehensive inventory of typical metrics digital leaders in public organizations can use to track the success of their digital initiatives. Forrester’s public sector clients can access the full report and the digital metrics inventory file, including the step-by-step approach, using the links below.

During our extensive research and granular analysis of government institutions that serve the public at local, regional, and national levels in North America, Europe, and APAC, we’ve observed many challenges that governments of every type and size are facing, but also the lessons from those who’ve managed to succeed. Here are the top three lessons learned:

Lesson #1: What comes first: the agency mission or the customer outcome?

While looking at how some government agencies talk about their mission, we couldn’t ignore the feeling that there is some confusion among them about what should come first – the mission or the customer. This “chicken or egg” dilemma is an existential one because it affects how government entities prioritize and execute their digital strategy. This got me thinking about what my old law professor said: “The law is made for the people, not the other way around.” If we agree with that piece of wisdom, then the customer is the mission! So, in CX/DX speak, it means: “bending the mission to serve your customers, not bending the customers to serve your mission” – and that’s the most important lesson to begin with.

Lesson #2: Start tracking and reporting the metrics customers (the public) care about.

There is no shortage of numbers and statistics gathered, and sometimes transparently reported, by government agencies. But there is a shortage of insights customers (i.e. the public) would really care about. Ask yourself: “Who cares more about the number of downloads of a government app or the total number of digital transactions – the government or the people?” If numbers tell a story, what story will these numbers tell the people who are – let’s not forget – the rightful audience? Instead, savvy government organizations track combinations of metrics that manifest outcomes created for the people, like:

- Achieving double-digit growth in children’s vaccination rate (outcome) as a result of online campaigns (proxy) and alerts (proxy) targeting new parents, including parents without insurance (proxy), and instant e-scheduling (proxy) for vaccination shots (proxy) to ensure the complete set of recommended vaccinations by age 3 (goal).
- Shortening the average emergency response time (proxy) by fire and paramedic teams, thanks to automated monitoring devices and software (proxy) and alerts (proxy), resulting in a double-digit drop in fatalities from fires and strokes (outcome), which are among the leading morbidities targeted for reduction (goal).

The lesson here is: make your ‘digital story’ about the stuff that people care about, using the right metrics.
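To make that linkage concrete, here is a minimal sketch of an outcome-first metric record as a small data structure. The field names and report format are illustrative assumptions, not Forrester’s framework or inventory:

```python
# A minimal sketch of Lesson #2's outcome linkage as a data structure.
# Field names and the report format are illustrative assumptions, not
# Forrester's framework or metrics inventory.
from dataclasses import dataclass, field

@dataclass
class MissionMetric:
    goal: str                 # the mission-level target
    outcome: str              # the change the public actually experiences
    proxies: list[str] = field(default_factory=list)  # digital measures behind it

vaccination = MissionMetric(
    goal="Complete set of recommended vaccinations by age 3",
    outcome="Double-digit growth in children's vaccination rate",
    proxies=[
        "online campaign reach",
        "alerts to new parents",
        "instant e-scheduling uptake",
    ],
)

# A report to budget-holders then reads outcome-first, proxies second.
print(f"{vaccination.outcome} <- driven by: {', '.join(vaccination.proxies)}")
```

The point of the structure is that no proxy metric exists on its own: every download count or transaction volume is recorded only in relation to the public outcome and mission goal it serves.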
Lesson #3: Learn to tie digital metrics to $-signs.

If you didn’t pay for it, it didn’t get built. Yeah, budgets are important for digital ambitions, but what matters to the budget-holders is not your digital ambitions – it’s dollars and cents (or Swiss francs, because they are more trendy nowadays – pun intended). Specifically, what the budget bosses want to know is how the costs for the digital ambitions are counted, spent, and justified. Therefore, the third – and probably the least talked about – lesson is about how to connect your digital story to the money story.

At Forrester we’ve been talking about this for many years, but normally addressing private-sector and technology companies, which are profit-driven and revenue-obsessed. In a government setting, without the focus on making profits, the financial talk comes down to budgets, and that primarily means budget planning, cost accounting, and constantly trying to squeeze every ounce of efficiency per dollar spent. In that context, budget-savvy digital leaders are prone to using granular tech and operational resource accounting methods, hawk-eyed cost controls, and superior business-case-making skills that make them more convincing.

The lesson here is: learn to speak the finance language and adopt the requisite discipline.


Addressing Antitrust Scrutiny Over AI-Powered Pricing Tools

By Josh Goodman, Minna Lo Naranjo and Amir Ali (April 17, 2025, 6:56 PM EDT) — While algorithmic pricing has been used in many industries for decades, the rapid development of artificial intelligence technology has led antitrust enforcers — including federal agencies and state attorneys general, legislators, and private plaintiffs — to begin actively scrutinizing potential anticompetitive practices related to the use of algorithmic pricing tools, particularly such tools that may involve systems considered to be AI…


An Unrestrained, Bright-Eyed View Of Legal AI's Future

By Todd Itami (April 18, 2025, 3:34 PM EDT) — I have some bad news for you, legal industry: Not everyone is going to be raptured by artificial intelligence. I hate to get all eschatological on you, but it’s difficult to see any scenario where future generations will spend so much money on human legal professionals…


Get Microsoft Project 2021 for Just $15

TL;DR: Better projects start with Microsoft Project Professional. Get it for just $14.97 (reg. $249.99) at TechRepublic Academy.

Every company strives to be as efficient as possible, yet every company deals with waste and inefficiency. One good way to limit both is to invest in the right tools to keep your company on task and under budget. One of the leading tools is Microsoft Project Professional 2021, and you can get it for just $14.97 at TechRepublic Academy. That’s a small price to pay for a program that will continue to yield results and savings through successfully managed projects over time.

About Microsoft Project Professional

Microsoft Project Professional has earned 4.4/5 stars from GetApp and Capterra because it’s a powerful, intuitive tool that makes project management easier. With Project Professional, you’ll have a host of pre-built templates to organize a wide variety of projects, plus tools to manage timelines, budgets, and resources. You can run what-if scenarios to explore the potential outcomes of decisions before you make them, visually represent the schedules of multiple stakeholders and projects to ensure everyone is aligned, and use automated reporting and scheduling tools to reduce inefficiencies. You can even connect to Project Online and Project Server to manage even more data points through one central hub.

You can get Microsoft Project Professional 2021 for just $14.97 (reg. $249.99). This offer is made possible by an Authorized Microsoft Partner and is not to be missed. Prices and availability are subject to change.


2. How Americans view the Russia-Ukraine war

Here are several key takeaways about U.S. opinion of the war between Russia and Ukraine:

Democrats and Republicans are divided – and this divide has grown wider – when it comes to U.S. responsibility to help Ukraine defend itself and levels of concern over possible conflict outcomes.

The survey asked about how committed four world leaders are to lasting peace between Russia and Ukraine: French President Emmanuel Macron, Russian President Vladimir Putin, U.S. President Donald Trump and Ukrainian President Volodymyr Zelenskyy. A majority of Americans say Zelenskyy is committed to lasting peace, while 19% say the same of Putin. Fewer than half (47%) say Trump is committed to peace, and 45% say this of Macron.

U.S. responsibility to help Ukraine

More than four-in-ten Americans (44%) say the U.S. has a responsibility to help Ukraine defend itself from Russia’s invasion, while 53% say the nation does not have this responsibility. Views on this issue have shifted over recent months. Fewer Americans now believe the U.S. has a responsibility to help Ukraine in its war against Russia than said so in a November 2024 survey fielded after the U.S. presidential election. At that time, 50% held this opinion.

Partisanship and age

Opinion on the United States’ responsibility to Ukraine is divided along partisan lines. Two-thirds of Democrats and Democratic-leaning independents say the U.S. has a responsibility to help Ukraine defend itself, compared with 23% of Republicans and Republican leaners. The share of Republicans who believe the U.S. should aid in Ukraine’s defense has dropped 13 points since November, while the share of Democrats who say the same is largely unchanged over the same period.

Partisans are also divided by age. Republicans and Democrats ages 50 and older are more likely than their younger counterparts to say the U.S. has a responsibility to help Ukraine.

National and personal importance of the Russia-Ukraine war

Roughly seven-in-ten Americans (69%) view the war between Russia and Ukraine as important to U.S. national interests. A 56% majority of U.S. adults also say the Russia-Ukraine war is at least somewhat important to them personally.

Partisanship and age

Views of the war’s national and personal importance vary by party. Democrats are more likely than Republicans to say that the war is important on both counts. Liberal Democrats are especially likely to say the Russia-Ukraine war is important both to U.S. national interests and to them personally: About nine-in-ten liberal Democrats (88%) say this, compared with 72% of conservative or moderate Democrats. Conservative Republicans are more likely than liberal or moderate Republicans to view the war as important to U.S. interests (66% vs. 58%).

The share of Republicans who say the war between Russia and Ukraine is important to them personally has dropped by 9 points since January 2024, and the share who say it is important to U.S. interests has dropped by 6 points. In comparison, views of the war’s personal importance among Democrats remain unchanged over the same period, and the share of Democrats who see the war as important to U.S. interests has declined by only 3 points.

Views also vary by age more generally, with older U.S. adults more likely to consider the Russia-Ukraine war important personally and nationally. About three-quarters of Americans ages 65 and older say the war is at least somewhat important to them personally, while half of adults under 30 say the same.
There’s a similar gap in the shares of older and younger Americans who see the war as at least somewhat important to U.S. national interests (81% vs. 61%). Differences between older and younger adults are evident among both Republicans and Democrats. Older Republicans (those ages 50 and older) are more likely than younger Republicans ages 18 to 49 (70% vs. 56%) to say the Russia-Ukraine war is important to U.S. national interests. And older Democrats are more likely to say this than younger Democrats (84% vs. 73%). This pattern holds on the question of the war’s personal importance.

Views of Russia

Americans who consider Russia an enemy of the U.S. are more likely than those who view Russia as a partner or competitor to see the war as important both to U.S. interests and to themselves personally. Roughly seven-in-ten of those who see Russia as an enemy (68%) say that the war is at least somewhat important to them personally, compared with 45% among those who see Russia as a competitor and 41% of those who see Russia as a partner. Differences by views of Russia also exist on the question about national importance.

Concerns about possible outcomes of the Russia-Ukraine war

More than four-in-ten U.S. adults (43%) are extremely or very concerned about Ukraine possibly being defeated and taken over by Russia; another 29% are somewhat concerned and 28% are not too or not at all concerned. Nearly half (47%) are extremely or very concerned about Russia invading other countries in the region, while 26% are somewhat and 26% not too or not at all concerned.

Concerns about both situations have not changed much over the past year, but are lower than when the public was first asked about these outcomes in the early months of Russia’s invasion. In April 2022, 55% were extremely or very concerned about a Ukrainian defeat, and 59% said the same about a wider Russian invasion.

Partisanship

Democrats have long been more likely than Republicans to say they are extremely or very concerned about both a possible Ukrainian defeat and a Russian invasion of other countries in the region. But the partisan gap on each of these questions has grown wider over the past year. Today, 29% of Republicans are extremely or very concerned about Russia invading other countries in the region, down from 42% in July 2024. Democrats’ opinions have not changed as much.

World leaders’ commitment to lasting peace between Russia and Ukraine

A majority of Americans (59%) say that Ukrainian President Volodymyr Zelenskyy is committed to lasting peace between Russia and Ukraine, while 47% say this about U.S. President Donald


Could MCP supercharge the agentic AI revolution?

The key value of MCP is bringing together multiple tools, LLMs, and data sources, allowing autonomous agents to provide answers and solutions to real-world problems. Making these resources easy to discover in near real time is another challenge. Google solved the problem of locating information on the web 25 years ago through its indexing and PageRank algorithm. As users flocked to the search engine, website owners optimized their content for greater visibility, bending much of the web to Google’s algorithm.

MCP servers are at the heart of this agentic AI transformation, and various initiatives are underway to catalog and provide access to them. MCP.so currently lists and offers connections to over 4,800 MCP servers, with the number growing daily.

Another potential challenge lies with the MCP standard forking into a more proprietary format through corporate capture. Microsoft tried to colonize the web in the 1990s through the Internet Explorer browser and the use of its VBScript and JScript scripting languages. Although ultimately unsuccessful, that push could have derailed the explosion of digital innovation of the last 30 years.

The greater good

Despite these challenges, there’s a positive future for MCP. The dynamic developer communities sharing information and best practices provide a strong foundation for experimentation and innovation. The more recent entry of tech giants in providing solutions to extend MCP’s potential beyond the desktop and across external networks should encourage CIOs to explore the standard’s potential for their organization.
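For readers wondering what one of those thousands of cataloged servers actually looks like, here is a minimal sketch using the official MCP Python SDK (the FastMCP interface from modelcontextprotocol/python-sdk). It follows the SDK’s published quickstart as of this writing; the server name, tool, and resource are illustrative, and the interface may evolve with the standard.

```python
# A minimal sketch of an MCP server using the official Python SDK
# (modelcontextprotocol/python-sdk). The interface shown follows the SDK's
# published FastMCP quickstart as of this writing; the server name, tool,
# and resource below are illustrative.
from mcp.server.fastmcp import FastMCP

# Name the server so MCP clients (an agent, an IDE) can identify it.
mcp = FastMCP("weather-demo")

@mcp.tool()
def celsius_to_fahrenheit(celsius: float) -> float:
    """A callable tool the agent can invoke with typed arguments."""
    return celsius * 9 / 5 + 32

@mcp.resource("readings://{city}")
def latest_reading(city: str) -> str:
    """A data source the agent can read by URI; stubbed for illustration."""
    return f"Placeholder reading for {city}; no live feed is wired up."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, ready for an MCP client
```

Once running, any MCP-aware client can enumerate the tool and resource and call them, which is exactly the plug-together interoperability the standard promises.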


DeepSeek unveils new technique for smarter, scalable AI reward models

DeepSeek AI, a Chinese research lab gaining recognition for its powerful open-source language models such as DeepSeek-R1, has introduced a significant advancement in reward modeling for large language models (LLMs).

Their new technique, Self-Principled Critique Tuning (SPCT), aims to create generalist and scalable reward models (RMs). This could potentially lead to more capable AI applications for open-ended tasks and domains where current models can’t capture the nuances and complexities of their environment and users.

The crucial role and current limits of reward models

Reinforcement learning (RL) has become a cornerstone in developing state-of-the-art LLMs. In RL, models are fine-tuned based on feedback signals that indicate the quality of their responses.

Reward models are the critical component that provides these signals. Essentially, an RM acts as a judge, evaluating LLM outputs and assigning a score or “reward” that guides the RL process and teaches the LLM to produce more useful responses.

However, current RMs often face limitations. They typically excel in narrow domains with clear-cut rules or easily verifiable answers. For example, current state-of-the-art reasoning models such as DeepSeek-R1 underwent an RL phase in which they were trained on math and coding problems where the ground truth is clearly defined. However, creating a reward model for complex, open-ended, or subjective queries in general domains remains a major hurdle.

In the paper explaining their new technique, researchers at DeepSeek AI write, “Generalist RM requires to generate high-quality rewards beyond specific domains, where the criteria for rewards are more diverse and complex, and there are often no explicit reference or ground truth.”

They highlight four key challenges in creating generalist RMs capable of handling broader tasks:

- Input flexibility: The RM must handle various input types and be able to evaluate one or more responses simultaneously.
- Accuracy: It must generate accurate reward signals across diverse domains where the criteria are complex and the ground truth is often unavailable.
- Inference-time scalability: The RM should produce higher-quality rewards when more computational resources are allocated during inference.
- Learning scalable behaviors: For RMs to scale effectively at inference time, they need to learn behaviors that allow for improved performance as more computation is used.

[Figure: Different types of reward models. Credit: arXiv]

Reward models can be broadly classified by their “reward generation paradigm” (e.g., scalar RMs outputting a single score, generative RMs producing textual critiques) and their “scoring pattern” (e.g., pointwise scoring assigns individual scores to each response, pairwise selects the better of two responses). These design choices affect the model’s suitability for generalist tasks, particularly its input flexibility and potential for inference-time scaling.

For instance, simple scalar RMs struggle with inference-time scaling because they will generate the same score repeatedly, while pairwise RMs can’t easily rate single responses.

The researchers propose that “pointwise generative reward modeling” (GRM), where the model generates textual critiques and derives scores from them, can offer the flexibility and scalability required for generalist requirements.
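As a rough illustration of what pointwise GRM means in practice, the sketch below derives a scalar reward by parsing a score out of a generated critique. The `generate` stub, prompt wording, and score format are assumptions for illustration, not DeepSeek’s actual templates:

```python
# A rough sketch of pointwise generative reward modeling (GRM): the model
# writes a free-form critique and a scalar score is parsed out of the text.
# The generate() stub, prompt wording, and score format are illustrative
# assumptions, not DeepSeek's actual templates.
import re

def generate(prompt: str) -> str:
    """Stub for an LLM call; a real GRM returns a critique ending in a score."""
    return "The response is accurate but omits key caveats. Score: 7/10"

def pointwise_grm(query: str, response: str) -> float:
    critique = generate(
        "Critique the response to the query, then rate it.\n"
        f"Query: {query}\nResponse: {response}\n"
        "End with 'Score: x/10'."
    )
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)/10", critique)
    return float(match.group(1)) if match else 0.0

# One scalar per response, derived from a textual critique, so the same
# model can rate a single response or any number of candidates.
print(pointwise_grm("What is RL?", "RL optimizes behavior from reward signals."))
```

Because the score is extracted per response rather than produced by comparing a fixed pair, the same judge handles single responses, pairs, or larger candidate sets, which is the input flexibility the paper calls for.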
The DeepSeek team conducted preliminary experiments on models like GPT-4o and Gemma-2-27B, and found that “certain principles could guide reward generation within proper criteria for GRMs, improving the quality of rewards, which inspired us that inference-time scalability of RM might be achieved by scaling the generation of high-quality principles and accurate critiques.”

Training RMs to generate their own principles

Based on these findings, the researchers developed Self-Principled Critique Tuning (SPCT), which trains the GRM to generate principles and critiques based on queries and responses dynamically.

The researchers propose that principles should be a “part of reward generation instead of a preprocessing step.” This way, the GRMs could generate principles on the fly based on the task they are evaluating and then generate critiques based on the principles.

“This shift enables [the] principles to be generated based on the input query and responses, adaptively aligning [the] reward generation process, and the quality and granularity of the principles and corresponding critiques could be further improved with post-training on the GRM,” the researchers write.

[Figure: Self-Principled Critique Tuning (SPCT). Credit: arXiv]

SPCT involves two main phases:

- Rejective fine-tuning: This phase trains the GRM to generate principles and critiques for various input types using the correct format. The model generates principles, critiques and rewards for given queries/responses. Trajectories (generation attempts) are accepted only if the predicted reward aligns with the ground truth (correctly identifying the better response, for instance) and rejected otherwise. This process is repeated, and the model is fine-tuned on the filtered examples to improve its principle/critique generation capabilities.
- Rule-based RL: In this phase, the model is further fine-tuned through outcome-based reinforcement learning. The GRM generates principles and critiques for each query, and the reward signals are calculated based on simple accuracy rules (e.g., did it pick the known best response?). Then the model is updated. This encourages the GRM to learn how to generate effective principles and accurate critiques dynamically and in a scalable way.

“By leveraging rule-based online RL, SPCT enables GRMs to learn to adaptively posit principles and critiques based on the input query and responses, leading to better outcome rewards in general domains,” the researchers write.

To tackle the inference-time scaling challenge (getting better results with more compute), the researchers run the GRM multiple times for the same input, generating different sets of principles and critiques. The final reward is determined by voting (aggregating the sample scores). This allows the model to consider a broader range of perspectives, leading to potentially more accurate and nuanced final judgments as it is provided with more resources.

However, some generated principles/critiques might be low-quality or biased due to model limitations or randomness. To address this, the researchers introduced a “meta RM”—a separate, lightweight scalar RM trained specifically to predict whether a principle/critique generated by the primary GRM will likely lead to a correct final reward.

During inference, the meta RM evaluates the generated samples and filters out the low-quality judgments before the final voting, further enhancing scaling performance. A sketch of this sample-filter-vote loop follows below.

Putting SPCT into practice with
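The sampling, meta-RM filtering, and voting steps described in the previous section can be sketched as follows. Every component here is a toy stand-in; DeepSeek’s actual models, thresholds, and aggregation details are not reproduced:

```python
# A rough sketch of the inference-time loop described above: sample several
# principle/critique generations, let a meta RM filter out weak judgments,
# then vote over the survivors. Every component is a toy stand-in; DeepSeek's
# models, thresholds, and aggregation details are not reproduced here.
import random

def grm_sample(query: str, response: str) -> tuple[str, float]:
    """Stub GRM call returning one (critique, score) sample."""
    score = random.choice([6.0, 7.0, 7.0, 2.0])  # occasional low-quality judgment
    return (f"critique of {response[:20]!r}...", score)

def meta_rm(critique: str, score: float) -> float:
    """Stub meta RM: likelihood that this judgment yields a correct reward."""
    return 0.9 if score > 3.0 else 0.2  # toy rule in place of a trained scalar RM

def scaled_reward(query: str, response: str, k: int = 8, keep: int = 4) -> float:
    samples = [grm_sample(query, response) for _ in range(k)]
    # Keep only the judgments the meta RM trusts most, then vote by averaging.
    trusted = sorted(samples, key=lambda s: meta_rm(*s), reverse=True)[:keep]
    return sum(score for _, score in trusted) / len(trusted)

random.seed(0)
print(scaled_reward("open-ended query", "candidate response"))
```

Raising `k` spends more compute on more independent judgments, while the meta-RM filter keeps occasional bad samples from dragging down the vote, which is how the approach converts extra inference-time compute into higher-quality rewards.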
