VentureBeat

DeepSeek-R1 is a boon for enterprises — making AI apps cheaper, easier to build, and more innovative

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

The release of the DeepSeek-R1 reasoning model has caused shockwaves across the tech industry, the most obvious sign being the sudden selloff of major AI stocks. The advantage of well-funded AI labs such as OpenAI and Anthropic no longer seems so solid, as DeepSeek has reportedly been able to develop its o1 competitor at a fraction of the cost. While some AI labs are currently in crisis mode, for the enterprise sector this is mostly good news.

Cheaper applications, more applications

As we have said here before, one of the trends worth watching in 2025 is the continued drop in the cost of using AI models. Enterprises should experiment and build prototypes with the latest AI models regardless of price, knowing that continued price reductions will let them eventually deploy their applications at scale.

That trendline just saw a huge step change. OpenAI o1 costs $60 per million output tokens, versus $2.19 per million for DeepSeek-R1. And if you’re concerned about sending your data to Chinese servers, you can access R1 on U.S.-based providers such as Together.ai and Fireworks AI, where it is priced at $8 and $9 per million tokens, respectively — still a huge bargain compared to o1.

To be fair, o1 still has an edge over R1, but not enough to justify such a huge price difference. Moreover, R1’s capabilities will be sufficient for most enterprise applications, and we can expect more advanced and capable models in the coming months. We can also expect second-order effects on the overall AI market. For instance, OpenAI CEO Sam Altman announced that free ChatGPT users will soon have access to o3-mini. Although he did not explicitly cite R1 as the reason, the fact that the announcement came shortly after R1’s release is telling.
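The price gap is easy to put in concrete terms. Below is a minimal sketch using the per-million-output-token prices quoted above; it is illustrative only, since real bills also include input tokens and provider pricing changes over time:

```python
# Per-million-output-token prices quoted in the article (USD).
PRICES = {
    "OpenAI o1": 60.00,
    "DeepSeek-R1 (DeepSeek API)": 2.19,
    "DeepSeek-R1 (Together.ai)": 8.00,
    "DeepSeek-R1 (Fireworks AI)": 9.00,
}

def output_cost(model: str, output_tokens: int) -> float:
    """Cost in USD for generating `output_tokens` output tokens with the given model."""
    return PRICES[model] * output_tokens / 1_000_000

# Example workload: 50M output tokens per month.
tokens = 50_000_000
o1 = output_cost("OpenAI o1", tokens)
r1 = output_cost("DeepSeek-R1 (DeepSeek API)", tokens)
print(f"o1: ${o1:,.2f}  R1: ${r1:,.2f}  ratio: {o1 / r1:.1f}x")
# o1: $3,000.00  R1: $109.50  ratio: 27.4x
```

Even on the pricier U.S.-hosted endpoints, the same arithmetic gives a roughly 7x saving over o1.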
More innovation

R1 still leaves a lot of questions unanswered — for example, there are multiple reports that DeepSeek trained the model on outputs from OpenAI large language models (LLMs). But if its paper and technical report are correct, DeepSeek was able to create a model that nearly matches the state of the art while slashing costs and removing some of the labor-intensive technical steps. If others can reproduce DeepSeek’s results, that could be good news for AI labs and companies that were sidelined by the financial barriers to innovation in the field. Enterprises can expect faster innovation and more AI products to power their applications.

What will happen to the billions of dollars that big tech companies have spent acquiring hardware accelerators? We still haven’t reached the ceiling of what is possible with AI, so leading tech companies will be able to do more with their resources, and more affordable AI will, in fact, increase demand in the medium to long term. More importantly, R1 is proof that not everything is tied to bigger compute clusters and datasets. With the right engineering chops and talent, it is possible to push the limits.

Open source for the win

To be clear, R1 is not fully open source: DeepSeek has released only the weights, not the code or full details of the training data. Nonetheless, it is a big win for the open-source community. Since the release of DeepSeek-R1, more than 500 derivatives have been published on Hugging Face, and the model has been downloaded millions of times. It will also give enterprises more flexibility over where to run their models. Aside from the full 671-billion-parameter model, there are distilled versions of R1 ranging from 1.5 billion to 70 billion parameters, enabling companies to run the model on a variety of hardware.
Moreover, unlike o1, R1 reveals its full thought chain, giving developers a better understanding of the model’s behavior and the ability to steer it in the desired direction. With open source catching up to closed models, we can hope for a renewed commitment to sharing knowledge and research so that everyone can benefit from advances in AI.


Ai2 releases Tülu 3, a fully open source model that bests DeepSeek v3, GPT-4o with novel post-training approach

The open-source model race just keeps getting more interesting. Today, the Allen Institute for AI (Ai2) debuted its latest entry with the launch of its open-source Tülu 3 405-billion-parameter large language model (LLM). The new model not only matches the capabilities of OpenAI’s GPT-4o, it surpasses DeepSeek’s v3 model across critical benchmarks.

This isn’t the first time Ai2 has made bold claims about a new model. In November 2024, the company released its first version of Tülu 3, in 8- and 70-billion-parameter versions. At the time, Ai2 claimed the model was on par with the latest GPT-4 model from OpenAI, Anthropic’s Claude and Google’s Gemini. The big difference is that Tülu 3 is open source. Ai2 also claimed back in September 2024 that its Molmo models were able to beat GPT-4o and Claude on some benchmarks.

While the benchmark performance data is interesting, what’s perhaps more useful is the set of training innovations that enable the new Ai2 model.

Pushing post-training to the limit

The big breakthrough for Tülu 3 405B is rooted in an innovation that first appeared with the initial Tülu 3 release in 2024, which used a combination of advanced post-training techniques to get better performance. With the Tülu 3 405B model, those techniques have been pushed even further, using an advanced post-training methodology that combines supervised fine-tuning, preference learning and a novel reinforcement learning approach that has proven exceptional at larger scales.
“Applying Tülu 3’s post-training recipes to Tülu 3-405B, our largest-scale, fully open-source post-trained model to date, levels the playing field by providing open fine-tuning recipes, data and code, empowering developers and researchers to achieve performance comparable to top-tier closed models,” Hannaneh Hajishirzi, senior director of NLP research at Ai2, told VentureBeat.

Advancing the state of open-source AI post-training with RLVR

Post-training is something that other models, including DeepSeek v3, do as well. The key innovation that differentiates Tülu 3 is Ai2’s “reinforcement learning from verifiable rewards” (RLVR) system. Unlike traditional training approaches, RLVR uses verifiable outcomes — such as solving mathematical problems correctly — to fine-tune the model’s performance. This technique, combined with direct preference optimization (DPO) and carefully curated training data, has enabled the model to achieve better accuracy in complex reasoning tasks while maintaining strong safety characteristics.

Key technical innovations in the RLVR implementation include:

- Efficient parallel processing across 256 GPUs
- Optimized weight synchronization
- Balanced compute distribution across 32 nodes
- Integrated vLLM deployment with 16-way tensor parallelism

The RLVR system showed improved results at the 405B-parameter scale compared to smaller models. It also demonstrated particularly strong results in safety evaluations, outperforming DeepSeek v3, Llama 3.1 and Nous Hermes 3. Notably, the RLVR framework’s effectiveness increased with model size, suggesting potential benefits from even larger-scale implementations.

How Tülu 3 405B compares to GPT-4o and DeepSeek v3

The model’s competitive positioning is particularly noteworthy in the current AI landscape. Tülu 3 405B not only matches the capabilities of GPT-4o but also outperforms DeepSeek v3 in some areas, particularly on safety benchmarks.
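Ai2’s actual RLVR implementation is not shown in this article; the toy sketch below illustrates only the core idea it describes: a binary, automatically verifiable reward (here, exact-match on the final number in a completion) used to score sampled completions, in place of a learned reward model. The prompt, samples and answer-extraction heuristic are all illustrative.

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """RLVR-style reward: 1.0 only if the completion's final numeric
    answer exactly matches the verifiable gold answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == gold_answer else 0.0

# Score a batch of sampled completions for the prompt "What is 12 * 7?".
samples = [
    "12 * 7 = 84, so the answer is 84",
    "I think it's 74",
    "Multiplying gives 84",
]
rewards = [verifiable_reward(s, "84") for s in samples]
print(rewards)  # [1.0, 0.0, 1.0]
```

In a full RLVR pipeline, these rewards would feed a policy-gradient update; the point of the design is that the reward signal is checkable, so it cannot be gamed the way a learned reward model can.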
Across a suite of 10 AI benchmarks, including safety benchmarks, Ai2 reported that the Tülu 3 405B RLVR model had an average score of 80.7, surpassing DeepSeek v3’s 75.9. Tülu, however, is not quite as good as GPT-4o, which scored 81.6. Overall, the metrics suggest that Tülu 3 405B is at the very least extremely competitive with GPT-4o and DeepSeek v3 across the benchmarks.

Why open-source AI matters and how Ai2 is doing it differently

What makes Tülu 3 405B different for users, though, is how Ai2 has made the model available. There is a lot of noise in the AI market about open source. DeepSeek says its model is open source, as does Meta for Llama 3.1, which Tülu 3 405B also outperforms. With both DeepSeek and Llama, the models are freely available for use, and some code, but not all, is available. For example, DeepSeek has released R1’s model code and pre-trained weights, but not the training data.

Ai2 is taking a different approach in an attempt to be more open. “We don’t leverage any closed datasets,” Hajishirzi said. “As with our first Tülu 3 release in November 2024, we are releasing all of the infrastructure code.” She added that Ai2’s fully open approach, which includes data, training code and models, ensures users can easily customize their pipeline for everything from data selection through evaluation. Users can access the full suite of Tülu 3 models, including Tülu 3-405B, on Ai2’s Tülu 3 page, or test Tülu 3-405B through Ai2’s Playground demo space.


Observo’s AI-native data pipelines cut noisy telemetry by 70%, strengthening enterprise security

The AI boom has set off an explosion of data. AI models need massive datasets to train on, and the workloads they power — whether internal tools or customer-facing apps — generate a flood of telemetry data: logs, metrics, traces and more. Even with observability tools that have been around for some time, organizations often struggle to keep up, making it harder to detect and respond to incidents in time. That’s where a new player, Observo AI, comes in. The California-based startup, which has just been backed by Felicis and Lightspeed Venture Partners, has developed a platform that creates AI-native data pipelines to automatically manage surging telemetry flows. This ultimately helps companies like Informatica and Bill.com cut incident response times by over 40% and slash observability costs by more than half.

The problem: rule-based telemetry control

Modern enterprise systems generate petabyte-scale operational data on an ongoing basis. While this noisy, unstructured information has some value, not every data point is a critical signal for identifying incidents, leaving teams with a lot of data to filter for their response systems. If they feed everything into the system, costs and false positives increase; if they pick and choose, scalability and accuracy suffer, again leading to missed threat detection and response. In a recent survey by KPMG, nearly 50% of enterprises said they had suffered security breaches, with poor data quality and false alerts being major contributors. Some security information and event management (SIEM) systems and observability tools do have rule-based filters to cut down the noise, but that rigid approach doesn’t evolve in response to surging data volumes.
To address this gap, Gurjeet Arora, who previously led engineering at Rubrik, developed Observo, a platform that optimizes these operational data pipelines with the help of AI. The offering sits between telemetry sources and destinations and uses ML models to analyze the incoming stream of data. It understands this information, cuts down the noise and decides where each piece should go: to a high-value incident alert and response system, or to a more affordable data lake covering different data categories. In essence, it finds the high-importance signals on its own and routes them to the right place.

“Observo AI…dynamically learns, adapts and automates decisions across complex data pipelines,” Arora told VentureBeat. “By leveraging ML and LLMs, it filters through noisy, unstructured telemetry data, extracting only the most critical signals for incident detection and response. Plus, Observo’s Orion data engineer automates a variety of data pipeline functions, including the ability to derive insights using a natural language query capability.”

What’s even more interesting is that the platform continues to evolve its understanding on an ongoing basis, proactively adjusting its filtering rules and optimizing the pipeline between sources and destinations in real time. This ensures it keeps up even as new threats and anomalies emerge, without requiring new rules to be set up.

The value to enterprises

Observo AI has been around for nine months and has already signed over a dozen enterprise customers, including Informatica, Bill.com, Alteryx, Rubrik, Humber River Health and Harbor Freight. Arora noted that the company has seen 600% quarter-over-quarter revenue growth and has already drawn some of its competitors’ customers. “Our biggest competitor today is another start-up called Cribl. We have clear product and value differentiation against Cribl, and have also displaced them at a few enterprises.
At the highest level, our use of AI is the key differentiating factor, which leads to higher data optimization and enrichment, leading to better ROI and analytics, leading to faster incident resolution,” he added, noting that the company typically reduces pipeline “noise” by 60-70%, compared to competitors’ 20-30%.

The CEO did not share how the above-mentioned customers benefited from Observo, although he did point out what the platform has done for companies operating in highly regulated industries (without sharing names). In one case, a large North American hospital was struggling with the growing volume of security telemetry from different sources, leading to thousands of insignificant alerts and massive expenses for Azure Sentinel SIEM, data retention and compute. The organization’s security operations analysts tried creating makeshift pipelines to manually sample and reduce the amount of data ingested, but they feared they could be missing signals that could have a big impact. With Observo’s data-source-specific algorithms, the organization was initially able to cut more than 78% of the total log volume ingested into Sentinel while fully onboarding all the data that mattered. As the tool continues to improve, the company expects to achieve more than 85% reduction within the first three months. On the cost front, it reduced the total cost of Sentinel, including storage and compute, by over 50%. This allowed the team to prioritize the most important alerts, leading to a 35% reduction in mean time to resolve critical incidents. Similarly, in another case, a global data and AI company was able to reduce its log volumes by more than 70% and cut its total Elasticsearch observability and SIEM costs by more than 40%.

Plan ahead

As the next step in this work, the company plans to accelerate its go-to-market efforts and take on other players in the category — Cribl, Splunk, Datadog and others.
It also plans to enhance the product with more AI capabilities, including anomaly detection, a data policy engine, analytics, and more source and destination connectors. According to MarketsAndMarkets, the global market for observability tools and platforms is expected to grow at nearly 12% annually, from $2.4 billion in 2023 to $4.1 billion by 2028.
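Observo’s scoring models are proprietary, but the score-and-route pattern the article describes, sending high-signal events to the SIEM and the rest to a cheaper data lake, can be sketched in a few lines. Here a crude keyword heuristic stands in for the company’s ML models, and the event shape and threshold are illustrative:

```python
def signal_score(event: dict) -> float:
    """Stand-in for Observo's ML scoring: a crude keyword heuristic."""
    high_signal = {"error", "denied", "breach", "anomaly", "failed_login"}
    words = set(event["message"].lower().split())
    return 1.0 if words & high_signal else 0.0

def route(events: list[dict], threshold: float = 0.5):
    """Split telemetry: high-value signals -> SIEM, the rest -> data lake."""
    siem, lake = [], []
    for e in events:
        (siem if signal_score(e) >= threshold else lake).append(e)
    return siem, lake

events = [
    {"message": "health check ok"},
    {"message": "failed_login from 10.0.0.8"},
    {"message": "cache refreshed"},
    {"message": "access denied on /admin"},
]
siem, lake = route(events)
noise_cut = len(lake) / len(events)
print(f"routed {len(siem)} to SIEM, cut {noise_cut:.0%} as noise")
# routed 2 to SIEM, cut 50% as noise
```

The interesting part of the real product is that the scoring function is learned and keeps adapting, so the equivalent of `high_signal` above is never a fixed rule set.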


93% of IT leaders see value in AI agents but struggle to deliver, Salesforce finds

Everyone is talking about AI agents. But so far, a lot of that has just been, well, talk. That is set to change in 2025, according to Salesforce: AI agents are finally getting real. A new survey from MuleSoft, its integration and automation software company, finds that 93% of enterprise IT leaders have implemented or plan to implement AI agents in the next two years. Still, enterprises continue to struggle with delivery times — for instance, 29% missed their delivery goals in 2024 — and 80% say data integration is one of their biggest challenges when using AI.

“Integration challenges hinder companies from fully realizing the technology’s potential to create a limitless digital workforce,” Andrew Comstock, SVP and GM of MuleSoft, told VentureBeat. “Integration is incredibly foundational to making AI agents work, because AI agent outputs depend on connected data that enables a comprehensive understanding of the context and nuances within user queries.”

Enterprises still struggle, but are seeing AI take shape

Salesforce’s 10th annual MuleSoft Connectivity Benchmark Report surveyed 1,050 enterprise IT leaders. One of the key findings was that organizations today use 897 apps on average. Further, the number of AI models organizations use has doubled (from nine in 2024 to 18 this year), and organizations using agents have roughly 22 AI models deployed, significantly more than those not yet using agents (15). However, only about 29% of those apps are connected, and the vast majority of respondents (95%) say they struggle to integrate data across systems.
Comstock explained that such integration issues hurt agent accuracy and usefulness; agents need to gather structured and unstructured data from diverse sources, including enterprise resource planning (ERP), customer relationship management (CRM) and human capital management (HCM) platforms, as well as emails, PDFs, Slack and other systems, to make decisions and take action. IT leaders do see great value in application programming interfaces (APIs), which allow different apps to talk to each other, saying they are beneficial for improving IT infrastructure, sharing data across teams and integrating disparate systems. With correct integration and APIs, agents “can interact directly with their existing systems, automations and other agents across the enterprise, so that they don’t have to refit everything for the AI world,” said Comstock.

The survey also found that IT leaders expect an 18% increase in projects this year, and that they spent an average of $16.9 million on IT staff in 2024, more than double the 2023 average. Yet nearly 40% of IT teams’ time is still spent designing, building and testing new integrations between systems and data. “That’s an incredibly high percentage to spend on cumbersome work,” Comstock pointed out. “Every IT [team] has more work to do than resources available, leading to backlogs, delays and inefficiencies. Agents, we believe, can close the IT delivery gap.”

Indeed, the vast majority of IT leaders polled (93%) say that AI will increase the productivity of their developers over the next three years. “A digital labor workforce can act autonomously in a business to successfully carry out both simple and complex tasks, enabling increased productivity and efficiency,” said Comstock. He noted that enterprises will eventually move beyond simple AI agents to “super agents,” which don’t just respond to a single command but pursue a goal and perform complex human tasks.
Bottom line: Enterprise leaders are already seeing, and experiencing, AI at work. As evidenced just this week, “DeepSeek has changed the baseline of what we think we can do,” said Comstock. “We’re seeing more readiness to talk about the things that AI is going to do, more than just sort of behind the scenes,” he said. “What we’re seeing from these benchmark studies is that IT leaders are ready for that conversation.”

How PenFed Credit Union and Adecco are using AI agents

PenFed Credit Union is one Salesforce customer already seeing the benefit of AI agents. In less than eight weeks, the country’s third-largest federal credit union set up two new support channels — live agent chat and chatbots built on Agentforce — with just one engineer. The company is using MuleSoft to gather data into one unified platform. This gives service agents a 360-degree view of member data, Comstock explained, allowing them to provide better and faster support with live chat and self-service options, while branch representatives can handle multiple types of customer requests in a single window. As a result, PenFed now resolves 20% of cases on first contact with AI-powered chatbots, and chat and chatbot activity has increased 223% in the past year. Coincidentally or not, the credit union has also grown membership by 31%.

“Members get channels of choice when they need help,” Comstock explained. “They’re enjoying short wait times because they’re not repeating the same information when they talk to service representatives, because the information is being connected together more effectively.”

Leading talent company Adecco, meanwhile, is using Agentforce, MuleSoft and Salesforce Data Cloud to centralize more than 40 disparate systems. The company processes 300 million applications a year and places 1 million people daily.
However, with its traditional tooling, its recruiters can respond to only a fraction of the applications it receives, unintentionally ghosting a significant number of candidates. To address this, Agentforce will autonomously and automatically sift through resumes and pull together shortlists of candidates based on preset criteria such as skills, experience or location. After passing that list to human recruiters, the model will notify candidates who weren’t a good fit and suggest alternative roles. The goal: to eventually respond to 100% of applicants, Comstock explained. Similarly, Agentforce will help with job postings, identifying the most effective job boards and platforms based on past success rates and regional needs, removing the need for recruiters to manually publish listings. “It’s about ‘How effective can we make our employees and free them


Pig API: Give your AI agents a virtual desktop to automate Windows apps

In the evolving landscape of AI, enterprises face the challenge of integrating modern solutions with legacy systems that often lack the application programming interfaces (APIs) needed for seamless integration. Approximately 66% of organizations continue to rely on legacy applications for core operations, leading to increased maintenance costs and security vulnerabilities. Tools like Pig API take a different approach to this problem by enabling AI agents to interact directly with graphical user interfaces (GUIs) within virtual Windows desktops hosted in the cloud. This connects modern AI capabilities with legacy software, allowing automation of tasks such as data entry and workflow management without the need for local infrastructure. Additionally, users can intervene at any point, taking control of the virtual machine (VM) to guide or adjust tasks as needed. For businesses grappling with legacy challenges, this hybrid approach offers a practical way to modernize operations without overhauling existing systems.

Breaking through legacy system barriers

Traditional robotic process automation (RPA) tools, such as UiPath and Automation Anywhere, are designed to automate repetitive tasks by mimicking human interactions with software applications. However, these tools often encounter significant challenges with legacy systems, particularly those that are GUI-based and lack modern integration points. The absence of user-friendly APIs in these older systems makes integration cumbersome and error-prone. Additionally, RPA solutions are typically rule-based and struggle to adapt to dynamic changes in user interfaces or workflows, leading to brittle automation that requires constant maintenance and updates.
By contrast, AI agents, such as those enabled by Pig API, offer a more flexible and intelligent approach to automation. Unlike traditional RPA tools, AI agents are not solely rule-based; they can learn and adapt to changes in the user interface, making them more resilient to updates or modifications in legacy systems. This adaptability reduces the need for constant maintenance and allows more complex task automation. Furthermore, by operating within virtual environments, AI agents can scale more efficiently, handling multiple tasks across different systems simultaneously without the constraints of physical hardware. For example, in the finance sector, AI agents can facilitate the migration of data from outdated accounting systems to modern customer relationship management (CRM) platforms by mimicking manual data entry. In healthcare, they can interact with legacy electronic health record (EHR) systems to extract and input patient information, streamlining administrative tasks and reducing the potential for human error.

Technical details: How Pig API powers GUI automation with AI agents

Pig API enables AI agents to interact directly with GUIs within cloud-hosted virtual Windows desktops. Through its Python software development kit (SDK), Pig makes it possible for developers to integrate virtual environments into workflows, automating processes that traditionally required manual effort.

Connecting AI agents to cloud-hosted virtual desktops

At the heart of Pig API is its ability to create and manage VMs for AI agents. These cloud-hosted environments eliminate the need for local infrastructure, allowing enterprises to scale workflows seamlessly. For instance, developers can initialize a VM, connect to it and define tasks for their AI agents using a straightforward process.
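The article links to Pig’s examples rather than reproducing them, and the real SDK calls aren’t shown here, so the sketch below uses a hypothetical stand-in VM class, not the actual pig-python API, to illustrate the workflow just described: obtain a VM handle, drive the GUI with simulated input, and capture a screenshot for the agent to inspect.

```python
# Hypothetical stand-in for a Pig-style VM handle. Method names are
# illustrative only and are NOT the real pig-python API.
class FakeVM:
    def __init__(self):
        self.log = []  # record of simulated GUI actions

    def mouse_move(self, x, y):
        self.log.append(("move", x, y))

    def click(self):
        self.log.append(("click",))

    def type_text(self, text):
        self.log.append(("type", text))

    def screenshot(self) -> bytes:
        self.log.append(("screenshot",))
        return b"<png bytes>"

def fill_form(vm, fields: dict, values: dict) -> bytes:
    """Drive a GUI the way the article describes: move, click, type,
    then capture the screen so an agent (or LLM) can verify the result."""
    for name, (x, y) in fields.items():
        vm.mouse_move(x, y)
        vm.click()
        vm.type_text(values[name])
    return vm.screenshot()

vm = FakeVM()
fill_form(vm, {"username": (120, 80), "password": (120, 120)},
          {"username": "jdoe", "password": "hunter2"})
print(len(vm.log))  # 7: (move, click, type) x 2 fields + 1 screenshot
```

In the real product, the screenshot would be fed to an LLM to decide the next action; consult the pig-python repository for the actual client and method names.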
An example setup is shown in the announcement thread (https://x.com/erikdunteman/status/1881754445899567315). It provides AI agents with a dedicated environment to perform tasks such as interacting with desktop applications, simulating user inputs and automating workflows. By abstracting the complexities of GUI interaction, Pig ensures that developers of varying expertise can leverage its capabilities effectively.

Simulating human-like interactions

Pig API enables AI agents to perform a variety of actions that closely mimic human behavior. This includes moving a mouse, clicking, dragging, typing into forms or spreadsheets and capturing screenshots of the current desktop view (see the SDK repository: https://github.com/pig-dot-dev/pig-python). These tools allow agents to make informed decisions during their operations and execute complex workflows.

LLM integration for multi-step workflows

One of Pig API’s standout features is its integration with large language models (LLMs) such as Anthropic’s Claude or OpenAI’s GPT. This capability enables AI agents to incorporate decision-making into their automation workflows, handling tasks that go beyond predefined rules. In one data extraction and processing example from the announcement thread, the AI agent opens a browser, navigates to a specified URL, extracts relevant customer reviews and organizes the data into an Excel spreadsheet. By integrating with LLMs, Pig enables agents to execute multi-step tasks that combine GUI automation with AI-driven logic, demonstrating its potential for streamlining complex operations.

Pig API in the automation ecosystem

The automation landscape includes a variety of tools tailored for different use cases, from traditional RPA platforms to advanced agentic AI solutions.
Tools like UiPath and AutoHotkey excel at automating structured workflows and repetitive tasks, but are often limited when it comes to unstructured processes or GUI-heavy environments. Both require predefined scripts or rule-based logic, making them less adaptable to changes in user interfaces or dynamic workflows. Pig API positions itself as a solution for scenarios where traditional automation tools hit barriers, particularly in interacting with legacy Windows applications. Other emerging solutions, such as Microsoft’s UFO project and Anthropic’s Computer Use, also aim to enhance automation through intelligent agents capable of interacting with GUIs. However, these technologies remain experimental and focus more on augmenting user productivity than on enterprise-scale workflows. Pig’s specific focus on enabling agents to operate within isolated virtual environments provides an alternative that aligns with the needs of enterprises dealing with legacy systems.

What’s next for Pig API and AI automation

As enterprises continue to navigate the complexities of integrating modern AI solutions with legacy systems, tools like Pig API take a new approach to bridging this gap. By enabling AI agents to interact directly with GUIs within virtual Windows desktops, Pig opens up new possibilities for automation in environments that have traditionally been difficult to modernize. Its cloud-hosted architecture and ability to work without APIs position it as a valuable tool for enterprises looking to extend the lifespan of legacy systems while improving operational efficiency.


Tencent introduces ‘Hunyuan3D 2.0,’ AI that speeds up 3D design from days to seconds

Tencent has unveiled “Hunyuan3D 2.0,” an AI system that turns single images or text descriptions into detailed 3D models within seconds. The system turns a typically lengthy process, one that can take skilled artists days or weeks, into a rapid, automated task. Following its predecessor, this new version of the model is available as an open-source project on both Hugging Face and GitHub, making the technology immediately accessible to developers and researchers worldwide. “Creating high-quality 3D assets is a time-intensive process for artists, making automatic generation a long-term goal for researchers,” the company’s research team writes in a technical report. The upgraded system builds on its predecessor’s foundation while introducing significant improvements in speed and quality.

How Hunyuan3D 2.0 turns images into 3D models

Hunyuan3D 2.0 uses two main components: Hunyuan3D-DiT creates the basic shape, while Hunyuan3D-Paint adds surface details. The system first generates multiple 2D views of an object, then builds these into a complete 3D model. A new guidance system ensures all views of the object match, solving a common problem in AI-generated 3D models. “We position cameras at specific heights to capture the maximum visible area of each object,” the researchers explain. This approach, combined with their method of mixing different viewpoints, helps the system capture details that other models often miss, especially on the tops and bottoms of objects.

[Figure: a diagram showing how Hunyuan3D 2.0 transforms a single panda image into a 3D model through multi-view diffusion and sparse-view reconstruction. Credit: arxiv.org]

Faster and more accurate: What sets Hunyuan3D 2.0 apart

The technical results are impressive.
Hunyuan3D 2.0 produces more accurate and visually appealing models than existing systems, according to standard industry measurements. The standard version creates a complete 3D model in about 25 seconds, while a smaller, faster version works in just 10 seconds. What sets Hunyuan3D 2.0 apart is its ability to handle both text and image inputs, making it more versatile than previous solutions. The system also introduces innovative features like “adaptive classifier-free guidance” and “hybrid inputs” that help ensure consistency and detail in generated 3D models. According to its published benchmarks, Hunyuan3D 2.0 achieves a CLIP score of 0.809, surpassing both open-source and proprietary alternatives. The technology introduces significant improvements in texture synthesis and geometric accuracy, outperforming existing solutions across all standard industry metrics. The system’s key technical advance is its ability to create high-resolution models without requiring massive computing power. The team developed a new way to increase detail while keeping processing demands manageable — a frequent limitation of other 3D AI systems.

These advances matter for many industries. Game developers can quickly create test versions of characters and environments. Online stores could show products in 3D. Movie studios could preview special effects more efficiently. Tencent has shared nearly all parts of its system through Hugging Face. Developers can now use the code to create 3D models that work with standard design software, making it practical for immediate use in professional settings.

While this technology marks a significant step forward in automated 3D creation, it raises questions about how artists will work in the future. Tencent sees Hunyuan3D 2.0 not as a replacement for human artists, but as a tool that handles technical tasks while creators focus on artistic decisions.
As 3D content becomes increasingly central to gaming, shopping, and entertainment, tools like Hunyuan3D 2.0 suggest a future where creating virtual worlds is as simple as describing them. The challenge ahead may not be generating 3D models, but deciding what to do with them.
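The CLIP score cited above is, at bottom, built on cosine similarity between the embedding of a rendered view and the embedding of the text prompt. A minimal sketch of that underlying computation, using made-up toy vectors in place of real model-produced CLIP embeddings:

```python
import math

def cosine_similarity(image_emb, text_emb):
    """Cosine similarity between two embedding vectors, the quantity a
    CLIP-style score is built on. Real scores use embeddings produced by
    a CLIP model; the vectors passed in here are illustrative only."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_t = math.sqrt(sum(b * b for b in text_emb))
    return dot / (norm_i * norm_t)
```

A score near 1.0 means the render's embedding points in nearly the same direction as the prompt's embedding, i.e. the model produced what was asked for.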

Tencent introduces ‘Hunyuan3D 2.0,’ AI that speeds up 3D design from days to seconds Read More »

We asked OpenAI’s o1 about the top AI trends in 2025 — here’s a look into our conversation

AI is already reshaping industries and society on a global scale. IDC predicts that AI will contribute $19.9 trillion to the global economy by 2030, comprising 3.5% of GDP. This momentum is exemplified by the recent announcement of “Project Stargate,” a partnership to invest up to $100 billion in new AI-focused data center capacity. This is all indicative of the tremendous activity going on with AI development. On a single day, AI made headlines for discovering proteins to counteract cobra venom, creating a Star Trek-style universal translator and paving the way for true AI assistants. These and other developments highlight individual achievements, as well as their interconnected progress. This flywheel of innovation is where breakthroughs in one domain amplify advancements in others, compounding AI’s transformative potential.

Separating signal from noise

Even for someone who follows AI developments closely, the rapid technological breakthroughs and their diffusion across industries and applications are dizzying, making it challenging not only to keep track of what is going on but also to understand the relative importance of each development. It is hard to separate the signal from the noise. In the past, I might have turned to an AI industry analyst to help explain the dynamics and meaning of recent and projected developments. This time, I decided instead to see if AI itself might be able to help me. This led me to a conversation with OpenAI’s o1 model. The 4o model might have worked as effectively, but I expected that a reasoning model such as o1 might be more effective. I asked o1 what it thought were the top AI trends and why. I started by asking for the top 10 to 15, but over the course of our collaborative dialog, this expanded to 25. Yes, there really are that many, which is a testament to AI’s value as a general-purpose technology.
In dialog about leading AI trends with OpenAI’s o1 model.

After about 30 seconds of inference-time “thinking,” o1 responded with a list of trends in AI development and use, ranked according to their potential significance and impact on business and society. I asked several qualifying questions and made a few suggestions that led to slight changes in the evaluation method and rankings.

Methodology

Rankings of the various AI trends are determined by a blended heuristic that balances quantitative indicators (near-term commercial viability) with qualitative judgments (disruptive potential and near-term societal impact), described as follows:

Current commercial viability: The trend’s market presence and adoption.
Long-term disruptive potential: How a trend could significantly reshape industries and create new markets.
Societal impact: Weighing the immediate and near-term effects on society, including accessibility, ethics and daily life.

In addition to the overall AI trend rankings, each trend receives a long-term social transformation score (STS), ranging from incremental improvements (6) to civilization-altering breakthroughs (10). The STS reflects the trend’s maximum potential impact if fully realized, offering an absolute measure of transformational significance.

Levels of social transformation associated with top AI trends.

The development of this ranking process reflects the potential of human-AI collaboration. o1 provided a foundation for identifying and ranking trends, while my human oversight helped ensure that the insights were contextualized and relevant. The result shows how humans and AI can work together to navigate complexity.

Top AI trends in 2025

For tech leaders, developers and enthusiasts alike, these trends signal both immense opportunity and significant challenges in navigating the many changes brought by AI.
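A blended ranking heuristic of this kind can be sketched in a few lines. The weights and the example factor scores below are illustrative assumptions, not the values o1 actually used in the conversation:

```python
# Illustrative sketch of a blended ranking heuristic: each trend gets a
# score on three factors (commercial viability, disruptive potential,
# societal impact), combined with weights. All numbers here are made up.
def blended_score(viability, disruption, societal, weights=(0.4, 0.35, 0.25)):
    """Combine three factor scores (each on a 1-10 scale) into one number."""
    return sum(w * f for w, f in zip(weights, (viability, disruption, societal)))

trends = {
    "Generative AI": (9, 9, 9),
    "Edge AI": (7, 8, 7),
    "Multi-modal AI": (6, 8, 7),
}
ranked = sorted(trends, key=lambda t: blended_score(*trends[t]), reverse=True)
```

Varying the weights shifts the ordering, which is exactly the kind of subjectivity the article acknowledges in its rankings.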
Highly ranked trends typically have broad current adoption, high commercial viability or significant near-term disruptive effects.

Table of top 10 trends for 2025 ranked on current commercial viability, long-term disruptive potential and potential for social impact.

Specific use cases — like self-driving cars or personal assistant robots — are not considered individual trends but are instead subsumed within the broader foundational trends.

Honorable mention list: AI trends 11–25

One can quibble whether number 11 or any of the following should be in the top 10, but keep in mind that these are relative rankings and include a certain amount of subjectivity (whether from o1 or from me), based on our iterative conversation. I suppose this is not too different from the conversations that take place within any research organization when completing reports ranking the comparative merits of trends. In general, this next set of trends has significant potential but is either: 1) not yet as widespread, or 2) several or more years away from its potential payoff. While these trends did not make the top 10, they showcase the expanding influence of AI across healthcare, sustainability and other critical domains.

Table of top 11 to 25 trends for 2025 ranked on current commercial viability, long-term disruptive potential and potential for societal impact.

Digital humans show the innovation flywheel in action

One use case that highlights the convergence of these trends is digital humans, which exemplify how foundational and emerging AI technologies come together to drive transformative innovation. These AI-powered avatars create lifelike, engaging interactions and span roles such as digital coworkers, tutors, personal assistants, entertainers and companions. Their development shows how interconnected AI trends create transformative innovations.
The flywheel of AI innovation: Interconnected advancements in AI technologies drive transformative progress, where breakthroughs in one domain amplify developments in others, creating a self-reinforcing cycle of innovation leading to new uses.

For example, these lifelike avatars are developed using the capabilities of generative AI (trend 1) for natural conversation, explainable AI (2) to build trust through transparency and agentic AI (3) for autonomous decision-making. With synthetic data generation, digital humans are trained on diverse, privacy-preserving datasets, ensuring they adapt to cultural and contextual nuances. Meanwhile, edge AI (5) enables near real-time responsiveness and multi-modal AI (17) enhances interactions by integrating text, audio and visual elements. By using the technologies described by these trends, digital humans exemplify how advancements in one domain can accelerate progress in others.

We asked OpenAI’s o1 about the top AI trends in 2025 — here’s a look into our conversation Read More »

DeepSeek-R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost

(Updated Monday, 1/27 8am) DeepSeek-R1’s release last Monday has sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance. Matching OpenAI’s o1 at just 3%-5% of the cost, this open-source model has not only captivated developers but also challenged enterprises to rethink their AI strategies. The model has rocketed to become the top-trending model on Hugging Face, downloaded 109,000 times as of this writing, as developers rush to try it out and seek to understand what it means for their AI development. Users are commenting that DeepSeek’s accompanying search feature (which you can find at DeepSeek’s site) is now superior to competitors like OpenAI and Perplexity, and is rivaled only by Google’s Gemini Deep Research. (Update as of Monday 1/27, 8am: DeepSeek has also shot up to the top of the iPhone app store, and caused a selloff on Wall Street this morning as investors reexamine the efficiencies of capital expenditures by leading U.S. AI companies.)

The implications for enterprise AI strategies are profound: With reduced costs and open access, enterprises now have an alternative to costly proprietary models like OpenAI’s. DeepSeek’s release could democratize access to cutting-edge AI capabilities, enabling smaller organizations to compete effectively in the AI arms race. This story focuses on exactly how DeepSeek managed this feat, and what it means for the vast number of users of AI models. For enterprises developing AI-driven solutions, DeepSeek’s breakthrough challenges assumptions of OpenAI’s dominance — and offers a blueprint for cost-efficient innovation. It’s the “how” of what DeepSeek did that should be the most educational here.
DeepSeek-R1’s breakthrough #1: Moving to pure reinforcement learning

In November, DeepSeek made headlines with its announcement that it had achieved performance surpassing OpenAI’s o1, but at the time it only offered a limited R1-lite-preview model. With Monday’s full release of R1 and the accompanying technical paper, the company revealed a surprising innovation: a deliberate departure from the conventional supervised fine-tuning (SFT) process widely used in training large language models (LLMs). SFT, a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought (CoT). It is considered essential for improving reasoning capabilities. DeepSeek challenged this assumption by skipping SFT entirely, opting instead to rely on reinforcement learning (RL) to train the model. This bold move forced DeepSeek-R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets. While some flaws emerged — leading the team to reintroduce a limited amount of SFT during the final stages of building the model — the results confirmed the fundamental breakthrough: Reinforcement learning alone could drive substantial performance gains.

The company got much of the way there using open source — a conventional and unsurprising approach

First, some background on how DeepSeek got to where it did. DeepSeek, a 2023 spinoff of Chinese hedge fund High-Flyer Quant, began by developing AI models for its proprietary chatbot before releasing them for public use. Little is known about the company’s exact approach, but it quickly open-sourced its models, and it’s extremely likely that the company built upon open projects produced by Meta, such as the Llama model and the ML library PyTorch. To train its models, High-Flyer Quant secured over 10,000 Nvidia GPUs before U.S.
export restrictions kicked in, and reportedly expanded to 50,000 GPUs through alternative supply routes despite trade barriers (in truth, no one knows; these extras may have been Nvidia H800s, which are compliant with the barriers and have reduced chip-to-chip transfer speeds). Either way, this pales in comparison to leading AI labs like OpenAI, Google and Anthropic, which operate with more than 500,000 GPUs each. DeepSeek’s ability to achieve competitive results with limited resources highlights how ingenuity and resourcefulness can challenge the high-cost paradigm of training state-of-the-art LLMs.

Despite speculation, DeepSeek’s full budget is unknown

DeepSeek reportedly trained its base model — called V3 — on a $5.58 million budget over two months, according to Nvidia engineer Jim Fan. While the company hasn’t divulged the exact training data it used (side note: critics say this means DeepSeek isn’t truly open-source), modern techniques make training on web and open datasets increasingly accessible. Estimating the total cost of training DeepSeek-R1 is challenging. While running 50,000 GPUs suggests significant expenditures (potentially hundreds of millions of dollars), precise figures remain speculative. But it was certainly more than the $6 million budget that is often quoted in the media. (Update: A good analysis just released by Ben Thompson goes into more detail on cost and the significant innovations the company made at the GPU and infrastructure levels.) What’s clear, though, is that DeepSeek has been very innovative from the get-go. Last year, reports emerged about some initial innovations it was making, around things like mixture-of-experts and multi-head latent attention. (Update: Here is a very detailed report just published about DeepSeek’s various infrastructure innovations by Jeffrey Emanuel, a former quant investor and now entrepreneur. It’s long but very good.
See the “Theoretical Threat” section about three other innovations worth mentioning: (1) mixed-precision training, which let DeepSeek use 8-bit floating-point numbers throughout training instead of 32-bit, dramatically reducing memory requirements per GPU and translating into fewer GPUs needed; (2) multi-token prediction during inference; and (3) advances in GPU communication efficiency through their DualPipe algorithm, resulting in higher GPU utilization.)

How DeepSeek-R1 got to the “aha moment”

The journey to DeepSeek-R1’s final iteration began with an intermediate model, DeepSeek-R1-Zero, which was trained using pure reinforcement learning. By relying solely on RL, DeepSeek incentivized this model to think independently, rewarding both correct answers and the logical processes used to arrive at them. This approach led to an unexpected phenomenon: The model began allocating additional processing time to more complex problems, demonstrating an ability to prioritize tasks based on their difficulty. DeepSeek’s researchers described this as an “aha moment,” where the model itself recognized the value of devoting more thinking time to harder problems.
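The pure-RL recipe hinges on rewards that can be checked automatically rather than on curated reasoning datasets. A toy rule-based reward in that spirit, scoring format compliance plus answer correctness (the tag names and weights are illustrative assumptions, not DeepSeek's actual implementation):

```python
import re

def reward(completion: str, ground_truth: str) -> float:
    """Toy rule-based reward: format compliance plus verifiable correctness.
    The <think>/<answer> tag names and the 0.5/1.0 weights are assumptions
    made for illustration, not DeepSeek's published recipe."""
    score = 0.0
    # Format reward: reasoning must appear inside <think>...</think> tags.
    if re.search(r"<think>.+?</think>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: the final answer must match a verifiable ground truth.
    match = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == ground_truth.strip():
        score += 1.0
    return score
```

Because the reward is computed mechanically from the model's own output, the RL loop needs no human-labeled chain-of-thought data, which is the point of skipping SFT.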

DeepSeek-R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost Read More »

Alibaba’s Qwen2.5-Max challenges U.S. tech giants, reshapes enterprise AI

Alibaba Cloud unveiled its Qwen2.5-Max model today, marking the second major artificial intelligence breakthrough from China in less than a week, further rattling U.S. technology markets and intensifying concerns about America’s eroding AI leadership. The new model outperforms DeepSeek’s R1 model, whose success sent Nvidia’s stock plunging 17% on Monday, in several key benchmarks including Arena-Hard, LiveBench and LiveCodeBench. Qwen2.5-Max also demonstrates competitive results against industry leaders like GPT-4o and Claude-3.5-Sonnet in tests of advanced reasoning and knowledge. “We have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes,” Alibaba Cloud announced in a blog post. The company emphasized its model’s efficiency, having trained it on over 20 trillion tokens while using a mixture-of-experts architecture that requires significantly fewer computational resources than traditional approaches. The timing of these back-to-back Chinese AI releases has deepened Wall Street’s anxiety about U.S. technological supremacy. Both announcements came during President Trump’s first week back in office, prompting questions about the effectiveness of U.S. chip export controls meant to slow China’s AI advancement.

Qwen2.5-Max outperforms major AI models across key benchmarks, including a significant lead in Arena-Hard testing, where it scored 89.4%. (Source: Alibaba Cloud)

How Qwen2.5-Max could reshape enterprise AI strategies

For CIOs and technical leaders, Qwen2.5-Max’s architecture represents a potential shift in enterprise AI deployment strategies.
Its mixture-of-experts approach demonstrates that competitive AI performance can be achieved without massive GPU clusters, potentially reducing infrastructure costs by 40-60% compared to traditional large language model deployments. The technical specifications show sophisticated engineering choices that matter for enterprise adoption. The model activates only specific neural network components for each task, allowing organizations to run advanced AI capabilities on more modest hardware configurations. This efficiency-first approach could reshape enterprise AI roadmaps. Rather than investing heavily in data center expansions and GPU clusters, technical leaders might prioritize architectural optimization and efficient model deployment. The model’s strong performance in code generation (LiveCodeBench: 38.7%) and reasoning tasks (Arena-Hard: 89.4%) suggests it could handle many enterprise use cases while requiring significantly less computational overhead. However, technical decision-makers should carefully consider factors beyond raw performance metrics. Questions about data sovereignty, API reliability and long-term support will likely influence adoption decisions, especially given the complex regulatory landscape surrounding Chinese AI technologies.

Qwen2.5-Max achieves top scores across key AI benchmarks, including 94.5% accuracy in mathematical reasoning tests, outperforming major competitors. (Source: Alibaba Cloud)

China’s AI leap: how efficiency is driving innovation

Qwen2.5-Max’s architecture reveals how Chinese companies are adapting to U.S. restrictions. This efficiency-focused innovation suggests China may have found a sustainable path to AI advancement despite limited access to cutting-edge chips. The technical achievement here cannot be overstated. While U.S.
companies have focused on scaling up through brute computational force — exemplified by OpenAI’s estimated use of over 32,000 high-end GPUs for its latest models — Chinese companies are finding success through architectural innovation and efficient resource use.

U.S. export controls: catalysts for China’s AI renaissance?

These developments force a fundamental reassessment of how technological advantage can be maintained in an interconnected world. U.S. export controls, designed to preserve American leadership in AI, may have inadvertently accelerated Chinese innovation in efficiency and architecture. “The scaling of data and model size not only showcases advancements in model intelligence but also reflects our unwavering commitment to pioneering research,” Alibaba Cloud stated in its announcement. The company emphasized its focus on “enhancing the thinking and reasoning capabilities of large language models through the innovative application of scaled reinforcement learning.”

What Qwen2.5-Max means for enterprise AI adoption

For enterprise customers, these developments could herald a more accessible AI future. Qwen2.5-Max is already available through Alibaba Cloud’s API services, offering capabilities similar to leading U.S. models at potentially lower costs. This accessibility could accelerate AI adoption across industries, particularly in markets where cost has been a barrier. However, security concerns persist. The U.S. Commerce Department has launched a review of both DeepSeek and Qwen2.5-Max to assess potential national security implications. The ability of Chinese companies to develop advanced AI capabilities despite export controls raises questions about the effectiveness of current regulatory frameworks.

The future of AI: efficiency over power?

The global AI landscape is shifting rapidly. The assumption that advanced AI development requires massive computational resources and cutting-edge hardware is being challenged.
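The efficiency argument rests on mixture-of-experts routing: a gating network scores every expert subnetwork, but only a few actually run per input, so compute grows far more slowly than parameter count. A minimal top-k routing sketch (the expert count, gating weights and top_k value are toy assumptions, not Qwen2.5-Max's real configuration):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Score every expert with the gating network, but run only the top_k."""
    scores = softmax([sum(w * xi for w, xi in zip(row, x)) for row in gate_weights])
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in chosen)  # renormalize over the chosen experts
    return sum(scores[i] / norm * experts[i](x) for i in chosen)

# Toy setup: three scalar "experts" and a 2-feature input.
experts = [sum, max, min]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
out = moe_forward([2.0, 1.0], gate, experts, top_k=2)
```

With top_k=2 of 3 experts, one third of the expert compute is skipped on every input; at the scale of a large MoE LLM with hundreds of experts, that skipped fraction is what shrinks the GPU bill.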
As Chinese companies demonstrate the possibility of achieving similar results through efficient innovation, the industry may be forced to reconsider its approach to AI advancement. For U.S. technology leaders, the challenge is now twofold: responding to immediate market pressures while developing sustainable strategies for long-term competition in an environment where hardware advantages may no longer guarantee leadership. The next few months will be crucial as the industry adjusts to this new reality. With both Chinese and U.S. companies promising further advances, the global race for AI supremacy enters a new phase — one where efficiency and innovation may prove more important than raw computational power.

Alibaba’s Qwen2.5-Max challenges U.S. tech giants, reshapes enterprise AI Read More »

Calm down: DeepSeek R1 is great, but ChatGPT’s product advantage is far from over

Just a week ago — on January 20, 2025 — Chinese AI startup DeepSeek unleashed a new, open-source AI model called R1 that might initially have been mistaken for just another of the ever-growing mass of nearly interchangeable rivals that have sprung up since OpenAI debuted ChatGPT (powered by its own GPT-3.5 model, initially) more than two years ago. That impression quickly proved unfounded. In that short time, DeepSeek’s mobile app has rocketed up the charts of the Apple App Store in the U.S. to dethrone ChatGPT for the number one spot, and it has caused a massive market correction as investors dumped stock in formerly hot computer chip makers such as Nvidia, whose graphics processing units (GPUs) have been in high demand for training new AI models in massive superclusters and for serving them up to customers on an ongoing basis (a process known as “inference”). Venture capitalist Marc Andreessen, echoing sentiments of other tech workers, wrote on the social network X last night: “Deepseek R1 is AI’s Sputnik moment,” comparing it to the pivotal October 1957 launch of the first artificial satellite in history, Sputnik 1, by the Soviet Union, which sparked the “space race” between that country and the U.S. to dominate space travel. Sputnik’s launch galvanized the U.S. to invest heavily in research and development of spacecraft and rocketry. While it’s not a perfect analogy — heavy investment was not needed to create DeepSeek-R1, quite the contrary (more on this below) — it does seem to signify a major turning point in the global AI marketplace, as for the first time, an AI product from China has become the most popular in the world. But before we jump on the DeepSeek hype train, let’s take a step back and examine the reality.
As someone who has extensively used OpenAI’s ChatGPT — on both web and mobile platforms — and followed AI advancements closely, I believe that while DeepSeek-R1’s achievements are noteworthy, it’s not time to dismiss ChatGPT or U.S. AI investments just yet. And please note, I am not being paid by OpenAI to say this — I’ve never taken money from the company and don’t plan on it.

What DeepSeek-R1 does well

DeepSeek-R1 is part of a new generation of large “reasoning” models that do more than answer user queries: They reflect on their own analysis while they are producing a response, attempting to catch errors before serving them to the user. And DeepSeek-R1 matches or surpasses OpenAI’s own reasoning model, o1, released in September 2024 initially only for ChatGPT Plus and Pro subscription users, in several areas. For instance, on the MATH-500 benchmark, which assesses high-school-level mathematical problem-solving, DeepSeek-R1 achieved a 97.3% accuracy rate, slightly outperforming OpenAI o1’s 96.4%. In terms of coding capabilities, DeepSeek-R1 scored 49.2% on the SWE-bench Verified benchmark, edging out OpenAI o1’s 48.9%. Moreover, financially, DeepSeek-R1 offers substantial cost savings. The model was developed with an investment of under $6 million, a fraction of the expenditure — estimated to be multiple billions — reportedly associated with training models like OpenAI’s o1. DeepSeek was essentially forced to become more efficient with scarce and older GPUs because of U.S. export restrictions on sales of the most advanced chips to China. Additionally, DeepSeek provides API access at $0.14 per million tokens, significantly undercutting OpenAI’s rate of $7.50 per million tokens. DeepSeek-R1’s massive efficiency gain, cost savings and equivalent performance to the top U.S. AI model have caused Silicon Valley and the wider business community to freak out over what appears to be a complete upending of the AI market, geopolitics and the known economics of AI model training.
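At the API prices quoted above ($0.14 versus $7.50 per million tokens), the gap compounds quickly at production volumes. A quick back-of-the-envelope calculation (the monthly token volume is a hypothetical workload, not a figure from the article):

```python
def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Dollar cost of a monthly token volume at a per-million-token price."""
    return tokens_per_month / 1_000_000 * price_per_million

volume = 500_000_000  # hypothetical workload: 500M tokens per month
deepseek_cost = monthly_cost(volume, 0.14)  # roughly $70 per month
openai_cost = monthly_cost(volume, 7.50)    # $3,750 per month
ratio = openai_cost / deepseek_cost         # roughly a 54x price gap
```

The ratio is fixed by the per-token prices, so the absolute savings scale linearly with usage: whatever the workload, the cheaper API costs about 2% as much.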
While DeepSeek’s gains are revolutionary, the pendulum is swinging too far toward it right now

There’s no denying that DeepSeek-R1’s cost-effectiveness is a significant achievement. But let’s not forget that DeepSeek itself owes much of its success to U.S. AI innovations, going back to the initial 2017 transformer architecture developed by Google AI researchers (which started the whole LLM craze). DeepSeek-R1 was trained on synthetic question-and-answer data — specifically, according to the paper released by its researchers, on the supervised fine-tuning “dataset of DeepSeek-V3,” the company’s previous (non-reasoning) model, which was found to have many indicators of being generated with OpenAI’s GPT-4o model itself. It seems pretty clear-cut to say that without GPT-4o to provide this data, and without OpenAI’s own release of the first commercial reasoning model, o1, back in September 2024, which created the category, DeepSeek-R1 would almost certainly not exist. Furthermore, OpenAI’s success required vast amounts of GPU resources, paving the way for breakthroughs that DeepSeek has undoubtedly benefited from. The current investor panic about U.S. chip and AI companies feels premature and overblown.

ChatGPT’s vision and image generation capabilities are still hugely important and valuable in workplace and personal settings — DeepSeek-R1 doesn’t have any yet

While DeepSeek-R1 has impressed with its visible “chain of thought” reasoning — a kind of stream of consciousness wherein the model displays text as it analyzes the user’s prompt and seeks to answer it — and efficiency in text- and math-based workflows, it lacks several features that make ChatGPT a more robust and versatile tool today.

No image generation or vision capabilities

The official DeepSeek-R1 website and mobile app do let users upload photos and file attachments.
But they can only extract text from them using optical character recognition (OCR), one of the earliest computing technologies (dating back to 1959). This pales in comparison to ChatGPT’s vision capabilities. A user can upload images without any text whatsoever and have ChatGPT analyze the image, describe it, or provide further information based on what it sees and the user’s text prompts. ChatGPT allows users to upload photos and can analyze visual material and provide detailed insights or actionable advice. For example, when I needed guidance on repairing my bike or maintaining my air conditioning unit, ChatGPT’s ability to process images proved invaluable.

Calm down: DeepSeek R1 is great, but ChatGPT’s product advantage is far from over Read More »