
Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost

Chinese AI startup DeepSeek, known for challenging leading AI vendors with open-source technologies, just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1.

Based on the recently introduced DeepSeek-V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI's frontier reasoning LLM, across math, coding and reasoning tasks. The best part? It does this at a far more tempting cost, proving to be 90-95% more affordable than OpenAI's model.

The release marks a major leap forward in the open-source arena, showing that open models are closing the gap with closed commercial models in the race to artificial general intelligence (AGI). To show the prowess of its work, DeepSeek also used R1 to distill six Llama and Qwen models, taking their performance to new levels. In one case, the distilled version of Qwen-1.5B outperformed much bigger models, GPT-4o and Claude 3.5 Sonnet, on select math benchmarks. These distilled models, along with the main R1, have been open-sourced and are available on Hugging Face under an MIT license.

What does DeepSeek-R1 bring to the table?

The industry's focus is sharpening on artificial general intelligence: a level of AI that can perform intellectual tasks like humans. Many teams are doubling down on enhancing models' reasoning capabilities. OpenAI made the first notable move in the domain with its o1 model, which uses a chain-of-thought reasoning process to tackle a problem. Through RL (reinforcement learning, or reward-driven optimization), o1 learns to hone its chain of thought and refine the strategies it uses, ultimately learning to recognize and correct its mistakes, or to try new approaches when the current ones aren't working.

Now, continuing the work in this direction, DeepSeek has released DeepSeek-R1, which uses a combination of RL and supervised fine-tuning to handle complex reasoning tasks and match the performance of o1.

When tested, DeepSeek-R1 scored 79.8% on the AIME 2024 mathematics test and 97.3% on MATH-500. It also achieved a 2,029 rating on Codeforces, better than 96.3% of human programmers. By comparison, o1-1217 scored 79.2%, 96.4% and 96.6%, respectively, on these benchmarks. DeepSeek-R1 also demonstrated strong general knowledge, with 90.8% accuracy on MMLU, just behind o1's 91.8%.

[Chart: Performance of DeepSeek-R1 vs OpenAI o1 and o1-mini]

The training pipeline

DeepSeek-R1's reasoning performance marks a big win for the Chinese startup in the US-dominated AI space, especially as the entire work is open-source, including how the company trained the model.

However, the work isn't as straightforward as it sounds. According to the paper describing the research, DeepSeek-R1 was developed as an enhanced version of DeepSeek-R1-Zero, a breakthrough model trained solely with reinforcement learning.

"We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive – truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely. DeepSeek-R1 not only open-sources a barrage of models but…" — Jim Fan (@DrJimFan), January 20, 2025

The company first used DeepSeek-V3-Base as the base model, developing its reasoning capabilities without employing supervised data, essentially focusing only on its self-evolution through a pure RL-based trial-and-error process.
This capability, which emerged organically during training, allows the model to solve increasingly complex reasoning tasks by leveraging extended test-time computation to explore and refine its thought processes in greater depth.

"During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors," the researchers note in the paper. "After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912."

However, despite showing improved performance, including behaviors like reflection and exploration of alternatives, the initial model did exhibit some problems, including poor readability and language mixing. To fix these, the company built on the work done for R1-Zero, using a multi-stage approach that combines supervised learning and reinforcement learning, and thus came up with the enhanced R1 model.

"Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model," the researchers explained. "Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217."

Far more affordable than o1

In addition to performance that nearly matches OpenAI's o1 across benchmarks, the new DeepSeek-R1 is also very affordable. Where OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, DeepSeek Reasoner, which is based on the R1 model, costs $0.55 per million input tokens and $2.19 per million output tokens.

"Sooo @deepseek_ai's reasoner model, which sits somewhere between o1-mini & o1 is about 90-95% cheaper 👀" — Emad (@EMostaque), January 20, 2025

The model can be tested as "DeepThink" on the DeepSeek chat platform, which is similar to ChatGPT. Interested users can access the model weights and code repository via Hugging Face, under an MIT license, or use the API for direct integration.
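Those list prices make the headline savings straightforward to check. A minimal sketch in Python, using only the per-million-token rates quoted above (real-world bills will vary with prompt mix and caching):

```python
# Published list prices, USD per million tokens, as quoted above.
O1 = {"input": 15.00, "output": 60.00}
DEEPSEEK_REASONER = {"input": 0.55, "output": 2.19}

for kind in ("input", "output"):
    saving = 1 - DEEPSEEK_REASONER[kind] / O1[kind]
    print(f"{kind}: {saving:.1%} cheaper")

# input: 96.3% cheaper
# output: 96.4% cheaper
```

On these list prices the per-token gap works out to roughly 96% in both directions, consistent with the 90-95%-plus savings cited in the article.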


Adobe Firefly Bulk Create edits several images all at once

Enterprise creatives often need to re-edit the same images repeatedly to keep their websites and social media fresh. With the advent of AI-powered image and video editing platforms, organizations are turning to these products to cut down on such tedious tasks.

Adobe has announced Adobe Firefly Bulk Create, a new product connected to its Adobe Firefly Services API suite that streamlines the process. Bulk Create is a web app that allows users to edit several photos with AI in one go. It doesn't require users to download a desktop app or get a license to run Photoshop: as long as someone pays for Adobe Firefly Services, they can access the web app.

Right now, Bulk Create can do two things: change backgrounds and resize images. Using Adobe's Firefly model, users can remove or replace the backgrounds of all the photos they upload at once. They can also generate new backgrounds with Firefly and upload them to the platform. The resize feature contains presets for popular social media platforms like Instagram or Facebook, which have specific image dimensions. The app also learns brand preferences, so customers can customize how their images look.

Bulk Create is currently in a private beta, but Adobe said the platform will become generally available in the first quarter of this year.

A math problem

Hannah Elsakr, vice president of product at Adobe Firefly GenAI for Enterprise, told VentureBeat that balancing the volume of images against the time needed to update and edit them is a math problem that remains a significant issue for many enterprise creative teams. "There's an unprecedented demand for content on every level, and it's backbreaking for a lot of creative teams," Elsakr said. "And what I mean by that is [that] we have more and more digital channels with the need for personalization with engagement."

Elsakr said enterprises — retailers, for example — often have to update product shots for their websites and social media channels to show seasonality. This could mean about 52 refreshes a year, one new photo a week. Photoshoots are expensive, so re-editing images to change backgrounds, or resizing them for different social platforms, is the more cost-effective option. While users would normally turn to Adobe's enormously popular editing tool, Photoshop, Bulk Create lets people who may not have Photoshop access the editing features needed to complete these tasks.

Right now, Bulk Create only edits images, though Elsakr said video support may be coming soon.

Adobe has long been a major player in the creative space. In June 2023, it released an enterprise version of Firefly that offered "commercially viable" AI content. After that announcement, some Adobe stock image creators voiced concerns that the company might be training the model on their work. In 2024, after users noticed wording in Adobe's Terms of Service requiring customers to agree to give the company access to their work, Adobe promised it doesn't train Firefly on customer work.

AI in creative works

Using generative AI to generate or edit images has become normal for many enterprises, though many artists believe these tools train on their work without their consent. Most AI model providers have an image generation or editing model. OpenAI has DALL·E 3, which is now integrated into ChatGPT. Google's Imagen 3 is now available to users, and Meta lets people on Instagram and Facebook generate photos.
Other popular image generators include those from Stability AI and Midjourney, and even Getty Images has a text-to-image generator. As these tools become more widely available, and if the move to label AI-generated content gains steam, more enterprises will turn to AI to get the images they need. At the same time, there continues to be a broader backlash against AI-generated photos, and the public has become highly sensitive to art it believes was made by an AI model.

Disclosure: VentureBeat uses Adobe Firefly, Midjourney and other AI tools and programs to generate content.


AI comes alive: From bartenders to surgical aides to puppies, tomorrow’s robots are on their way

Humanoid robots are no longer the stuff of science fiction. Imagine a world where robots not only collaborate with us in factories but also greet us in stores, aid in surgeries and care for our loved ones. With Tesla planning to deploy thousands of Optimus robots by 2026, the age of humanoid robots is closer than we think.

This vision is becoming increasingly tangible as more companies showcase groundbreaking innovations. The 2025 Consumer Electronics Show (CES) offered several examples of how robotics is advancing in both functionality and human-centric design. These included ADAM, the robot bartender from Richtech Robotics, which mixes more than 50 types of drinks and interacts with customers, and Tombot's robotic puppies, which wag their tails and make sounds designed to comfort older adults with dementia. While there may be a market for these and other robots on display at the show, it is still early days for broad deployment of this type of robotic technology.

Nevertheless, real technological progress is being made in the field. Increasingly, this includes "humanoid" robots that use generative AI to create more human-like abilities, enabling robots to learn, sense and act in complex environments. From Tesla's Optimus to Realbotix's Aria, the next decade will see a proliferation of humanoid robots.

[Video: A conversation with "Aria." Source: CNET, https://youtu.be/2HQ84TVcbMw]

Despite these promising advancements, some experts caution that achieving fully human-like capabilities is still a distant goal. Citing shortcomings in current technology, Yann LeCun — one of the "Godfathers of AI" — argued recently that AI systems do not "have the capacity to plan, reason … or understand the physical world." He added that we cannot yet build sufficiently capable robots because "we can't get them to be smart enough."

LeCun might be correct, although that doesn't mean we won't soon see more humanoid robots. Elon Musk recently said that Tesla will produce several thousand Optimus units in 2025 and that he expects to ship 50,000 to 100,000 of them in 2026. That is a dramatic increase from the handful that exist today performing circumscribed functions. Of course, Musk has been known to get his timelines wrong, such as when he said in 2016 that fully autonomous driving would be achieved within two years.

Nevertheless, it seems clear that significant advances are being made with humanoid robots. Tesla is not alone in pursuing this goal; Agility Robotics, Boston Dynamics and Figure AI are among the other leaders in the humanoid robotics field. Business Insider recently spoke with Agility Robotics CEO Peggy Johnson, who said it would soon be "very normal" for humanoid robots to work alongside humans across a variety of workplaces. Last month, Figure announced in a LinkedIn post: "We delivered F.02 humanoid robots to our commercial client, and they're currently hard at work." With significant backing from major investors including Microsoft and Nvidia, Figure will provide fierce competition in the humanoid robot market.

[Video: Figure 02 humanoid robots at work in a BMW factory. Source: YouTube, https://youtu.be/WlUFoZstcWg]

Creating a world view

LeCun did have a point, however: more advances are required before robots approach complete human capabilities. It is far simpler to move parts in a factory than to navigate dynamic, complex environments.
The current generation of robots faces three key challenges: processing visual information quickly enough to react in real time; understanding the subtle cues in human behavior; and adapting to unexpected changes in their environment. Most humanoid robots today depend on cloud computing, and the resulting network latency can make even simple tasks, like picking up an object, difficult.

One company working to overcome these limitations is startup World Labs, founded by "AI Godmother" Fei-Fei Li. Speaking with Wired, Li said: "The physical world for computers is seen through cameras, and the computer brain behind the cameras. Turning that vision into reasoning, generation and eventual interaction involves understanding the physical structure, the physical dynamics of the physical world. And that technology is called spatial intelligence."

Gen AI powers spatial intelligence by helping robots map their surroundings in real time, much as humans do, predicting how objects might move or change. Such advancements are crucial for creating autonomous humanoid robots capable of navigating complex, real-world scenarios with the adaptability and decision-making skills needed for success.

While spatial intelligence relies on real-time data to build mental maps of the environment, another approach is to help a humanoid robot infer the real world from a single still image. As explained in a preprint paper, Generative World Explorer (GenEx) uses AI to create a detailed virtual world from a single image, mimicking how humans make inferences about their surroundings. While still in the research phase, this capability could help robots make split-second decisions or navigate new environments with limited sensor data, allowing them to quickly understand and adapt to spaces they have never experienced before.

The ChatGPT moment for robotics is coming

While World Labs and GenEx push the boundaries of AI reasoning, Nvidia's Cosmos and GR00T are addressing the challenges of equipping humanoid robots with real-world adaptability and interactive capabilities. Cosmos is a family of AI "world foundation models" that help robots understand physics and spatial relationships, while GR00T (Generalist Robot 00 Technology) allows robots to learn by watching humans, much as an apprentice learns from a master. Together, these technologies help robots understand both what to do and how to do it naturally.

These innovations reflect a broader push in the robotics industry to equip humanoid robots with both cognitive and physical adaptability. GR00T could enable humanoid robots to help in healthcare by observing and mimicking medical professionals, while GenEx might allow robots to navigate disaster zones by inferring environments from limited visual input. As reported by Investor's Business Daily, Nvidia CEO Jensen Huang said: "The ChatGPT moment for robotics is coming."

Another company working to create physical AI


The AI gold rush: Why risks and rewards remain a balancing act

Presented by Stibo Systems

In the race to capitalize on the transformative potential of AI, enterprises may be taking major risks with their organization's future by deploying AI solutions without fully considering the ethical, governance and security implications. A recent study from global SaaS solutions provider Stibo Systems, "AI: The High-Stakes Gamble for Enterprises," found that a full 49% of business leaders admit they are not prepared to use AI responsibly, 79% of organizations do not have bias-mitigation policies and practices in place, and 54% of organizations have not implemented new security measures to keep up with AI integration — yet only 32% of business leaders admit they've rushed AI adoption.

Gaps in literacy, ethical usage and organizational preparedness are critical concerns, says Gustavo Amorim, CMO at Stibo Systems, but it's a balancing act. "Companies need to adopt AI to stay competitive — to realize the major benefits like efficiency, higher productivity, lower costs and greater innovation," Amorim says. "But in the need to move forward, they're often leaving business and organizational readiness behind."

Part of the wave of adoption comes from a shift in how business leaders view AI: today it's overwhelmingly considered an overall enabler, with nearly 90% of business leaders surveyed saying they are eager to use the technology as their partner in critical decision-making.

"It's not that leaders don't see the risks or don't realize and acknowledge that there are some risks involved," he explains. "It's that we've seen the short-term business benefits at this point, and long-term risks and implications have not come home to roost for many organizations yet. But the price tag is steep, including reputational damage, regulatory penalties and the erosion of trust among customers and stakeholders."

Where change management fails technology adoption

The three pillars of business readiness and change management are technology, people and process. From an AI perspective, it has become far easier, and far faster, to implement an AI tool and flip the switch, and only then consider the potential consequences. Digging into the data to ensure it's fair, free from bias and fully secure throughout the AI pipeline takes a great deal of change-management effort and time. Unfortunately, that data, and how it's used, is also foundational to the actual results an AI initiative produces.

"Companies are not necessarily taking the steps and the time to ensure that those things are done in parallel with adoption," Amorim says. "Changing an internal process, or training the people who are going to be using the technology, giving them the skills required to eliminate bias and write fair treatment and data privacy into the DNA of a strategy, is a major hurdle."

Governance, too, is a challenge, and often overlooked. Without standards and processes for managing inputs and outputs, governance doesn't run in parallel with other business processes and become a regular part of how business is done; instead, outputs become a problem to be managed. All of these issues affect how customer data is used in AI, for instance, sharply increasing the potential for problems like non-compliant data use or data breaches. However, the Stibo Systems study shows that these aren't currently major concerns for most business leaders, and they haven't taken those preliminary steps.
AI adoption is simply outpacing the development of ethical guidelines, 61% of leaders report, while 49% say they're not prepared to use AI responsibly, even though 65% feel confident in their AI literacy skills. Unfortunately, that confidence in their preparedness is not reflected in their organizations' AI policies and procedures: a full 69% of organizations have not implemented any data governance training as part of their AI strategy.

AI literacy: the foundation of an ethical framework

Data is the foundation of AI, but humans remain the single most important element of an AI strategy right from the jump. Models are created by humans, and humans are in charge of choosing and preparing the data required to train those models, the data that leads to conclusions and outcomes. But AI literacy also covers the business implications of the technology: understanding which business processes can be run, how they can be run fairly and accurately, and whether they comply with a company's internal policies. And because this is a technology that's evolving at breakneck speed, one that self-learns, adapts and becomes more intelligent based on the data inputs it receives, you're never done. Part of data literacy includes continuously analyzing outputs against criteria that will evolve just as rapidly.

AI literacy and organizational preparedness, like most technology initiatives, start at the top. It's not just sponsorship of AI initiatives, but the top executive level actively engaging with the subject, offering education around how AI is incorporated into the day-to-day of an organization's business processes, and setting an example around the importance of responsible AI.

"This is usually not a conversation that most senior executives will engage with," Amorim says. "Imagine a CMO, a CFO or a CEO talking about data bias and how that might become a corporate risk for the organization," he says. "It's not a common agenda, which is why it's ideal to start there."

From there, it's a matter of turning that into action by establishing cross-functional teams that can develop policies, standards and guidelines. "Most every company has guidelines and standards around using social media in the workplace, but not every company has guidelines on how you should use AI — and this is essential," Amorim says.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact [email protected].


Imec spins out Vertical Compute memory chip firm in $20.5M deal

Memory chip firm Vertical Compute is spinning out of Europe's Imec with a $20.5 million (20 million euro) seed investment round. Founded by CEO Sylvain Dubois (ex-Google) and CTO Sebastien Couet (ex-Imec), the company announced today that it has closed the round, which was led by Imec.xpand and supported by a strong investor base including Eurazeo, XAnge, Vector Gestion and Imec.

The funding will support Vertical Compute's ambition to develop a novel vertically integrated memory-and-compute technology, unlocking a new generation of AI applications. The company says the technology will enable next-generation applications with unparalleled efficiency and privacy: by minimizing data movement and bringing large data closer to computation, it promises energy savings of up to 80%, hyper-personalized AI solutions, and the elimination of remote data transfers, protecting user privacy.

"Memory technologies face limitations in both density and performance scaling, while processor performance continues to surge. The extreme data-access requirements of AI workloads exacerbate this challenge, making it imperative to overcome the memory wall to enable the next wave of AI innovations. We believe going vertical is the path to 100X gains," said Sébastien Couet, CTO of Vertical Compute, in a statement.

Tackling the Memory Wall

The rapid advancements in large language models and generative AI are transforming virtually all industries at an unprecedented pace. However, these large-scale AI models still rely heavily on complex cloud infrastructure and high-bandwidth memories, leading to data-transfer latency, high energy consumption and the sending of sensitive data to distant servers. Edge computing can address these issues, but inferencing large AI models on smartphones, PCs or smart home devices faces significant cost, power and scalability constraints.

The big underlying problem is the "memory wall." Static random access memory (SRAM), integrated as caches in the CPU or GPU, is fast but very small and expensive. Dynamic random access memory (DRAM), the main memory of computer systems, is larger but expensive and energy-consuming. The scaling of both memory technologies in density and performance is slowing while processor speeds and market needs keep increasing, causing a significant bottleneck. The problem is rapidly escalating due to surging demand for AI workloads, which require vast amounts of data to be accessed quickly. Overcoming this memory wall is crucial for advancing AI inference.

Innovating with Vertical Compute's Chiplet Technology

The convergence of large-scale AI models and edge computing calls for a transformative shift in the way data is processed. Vertical Compute aims to capture this opportunity by developing chiplet-based solutions — which take a modular approach to chip design — leveraging a new way to store bits in a high-aspect-ratio vertical structure. The concept behind Vertical Compute's core patented technology was invented by Sebastien Couet, Imec's former magnetic program director. The core innovation resides in the integration of vertical data lanes on top of computation units. It has the potential to outperform DRAM in density, cost and energy by reducing data movements from centimeters to nanometers.
This promising technology, coupled with an ambitious commercialization plan, led to the creation of the new semiconductor venture.

"The surge in data-intensive applications like generative AI demands a drastic new approach to transferring data between computing cores and memory units. Our solution is designed to overcome the fundamental scaling limitations of memory technologies by going vertical. We are committed to unlocking the full potential of large language models on the edge without any compromise," said Sylvain Dubois, CEO of Vertical Compute, in a statement. "We want to recruit the very best from all over Europe and finally put Europe at the forefront in terms of tech," Dubois added.

Driving Recruitment and Growth

Vertical Compute is headquartered in Louvain-La-Neuve (BE), with its main R&D offices in Leuven (BE), Grenoble (FR) and Nice (FR). The company is recruiting an elite team of engineers to support its ambitious R&D goals and accelerate the development and commercialization of its chiplet-based technology.

The seed investment round highlights the confidence in the leadership team's capabilities and the disruptive potential of this game-changing technology. "We could not be more excited to collaborate with Sylvain, Sebastien and their team and to help them achieve their ambitious goals," said Tom Vanhoutte from Imec.xpand, in a statement. "We are confident that, with the ongoing support of our teams and ecosystem, Vertical Compute can become a disruptor in the semiconductor industry. The strong international investor base shows that we are not alone in this belief," said Patrick Vandenameele, co-COO at Imec, in a statement.

Vertical Compute was founded in 2024 to solve the memory bottleneck in computer systems.


Cerebras Systems teams with Mayo Clinic on genomic model that predicts arthritis treatment

Cerebras Systems has teamed up with Mayo Clinic to create an AI genomic foundation model that predicts the best medical treatments for people with rheumatoid arthritis. It could also be useful in predicting the best treatment for people with cancer and cardiovascular disease, said Andrew Feldman, CEO of Cerebras Systems, in an interview with GamesBeat.

Mayo Clinic, in collaboration with Cerebras Systems, announced significant progress in developing artificial intelligence tools to advance patient care today at the J.P. Morgan Healthcare Conference in San Francisco. As part of Mayo Clinic's commitment to transforming healthcare, the institution has led the development of a world-class genomic foundation model designed to support physicians and patients.

Like Nvidia and other semiconductor companies, Cerebras is focused on AI supercomputing, but its approach is much different from Nvidia's, which relies on individual AI processors. Cerebras Systems designs an entire wafer — with many chips on a single piece of silicon — that collectively solves big AI problems and other computing tasks with much lower power consumption. Feldman said it took tens of such systems, computing over a period of months, to train the genomic foundation model. Still, that was far less time, effort, power and cost than traditional computing solutions would have required, he said. PitchBook recently predicted that Cerebras would have an IPO in 2025.

[Image: Cerebras Systems' calculations can determine which treatment will work on a given patient with rheumatoid arthritis.]

Building on Mayo Clinic's leadership in precision medicine, the model is designed to improve diagnostics and personalize treatment selection, with an initial focus on rheumatoid arthritis (RA). RA treatment presents a significant clinical challenge, often requiring multiple attempts to find effective medications for individual patients. Traditional approaches examining single genetic markers have shown limited success in predicting treatment response.

The joint team's genomic model was trained by mixing publicly available human reference genome data with Mayo's comprehensive patient exome data. The human reference genome is a digital DNA sequence representing a composite, "idealized" version of the human genome. It serves as a standard framework against which individual human genomes can be compared, enabling researchers to identify genetic variations. In contrast to models trained exclusively on the human reference genome, Mayo's genomic foundation model demonstrates significantly better results on genomic variant classification because it was also trained on data sourced from 500 Mayo Clinic patients. As more patient data is incorporated into training, the team expects continuous improvement in model quality.

The team designed new benchmarks to evaluate the model's clinically relevant capabilities, such as detecting specific medical conditions from DNA data, addressing a gap in publicly available benchmarks, which focus primarily on identifying structural elements like regulatory or functional regions.

Cerebras Systems says its AI predictions for treatment are highly accurate. The Mayo Clinic genomic foundation model demonstrates state-of-the-art accuracy in several key areas: 68-100% accuracy on RA benchmarks, 96% accuracy in cancer-predisposition prediction, and 83% accuracy in cardiovascular phenotype prediction.
These capabilities align with Mayo Clinic's vision of delivering world-leading healthcare through AI technology. More testing will need to be done to verify the results, Feldman said.

"Mayo Clinic is committed to using the most advanced AI technology to train models that will fundamentally transform healthcare," said Matthew Callstrom, Mayo Clinic's medical director for strategy and chair of radiology, in a statement. "Our collaboration with Cerebras enabled us to create a state-of-the-art AI model for genomics. In less than a year, we've developed promising AI tools that will help our physicians make more informed decisions based on genomic data."

"Mayo's genomic foundation model sets a new bar for genomic models, excelling not only in standard tasks like predicting functional and regulatory properties of DNA but also enabling discoveries of complex correlations between genetic variants and medical conditions," said Natalia Vassilieva, field CTO at Cerebras Systems, in a statement. "Unlike current approaches focused on single-variant associations, this model enables the discovery of connections where collections of variants contribute to a particular condition."

[Image: Cerebras Systems can parse the meaning of mutations.]

The rapid development of these models – typically a multi-year endeavor – was accelerated by training Mayo Clinic's custom models on the Cerebras AI platform. The Mayo genomic foundation model represents a significant step toward enhancing clinical decision support and advancing precision medicine. Cerebras' flagship product is the CS-3, a system powered by the Wafer-Scale Engine-3.

Advancing AI for chest X-rays

Mayo Clinic today also unveiled separate collaborations with Microsoft Research and with Cerebras Systems in the field of generative AI, designed to personalize patient care, significantly accelerate diagnostic time and improve accuracy. Announced during the J.P. Morgan Healthcare Conference, the projects focus on developing and testing foundation models customized for various applications: multimodal radiology images and data (including CT scans and MRIs) with Microsoft Research, and genomic sequencing data with Cerebras. The innovations have the potential to transform how clinicians approach diagnosis and treatment, ultimately leading to better patient outcomes.

Foundation AI models are large, pre-trained models capable of adapting to and carrying out many tasks with minimal extra training. They learn from massive datasets, acquiring general knowledge that can be used across diverse applications. This adaptability makes them efficient and versatile building blocks for numerous AI systems.

Mayo Clinic and Microsoft Research are collaboratively developing foundation models that integrate text and images. For this use case, the two are exploring the use of generative AI in radiology, combining Microsoft Research's AI technology with Mayo Clinic's X-ray data. Empowering clinicians with instant access to the information they need is at the heart of this research project. Mayo Clinic aims to develop a model that can automatically generate reports, evaluate tube and line placement in chest X-rays, and detect changes from prior images. This proof-of-concept model seeks to improve clinician workflow and patient care by providing a more efficient and comprehensive analysis of radiographic images.

The Mayo Clinic has 76,000 people


Anthropomorphizing AI: Dire consequences of mistaking human-like for human have already emerged

In our rush to understand and relate to AI, we have fallen into a seductive trap: attributing human characteristics to these robust but fundamentally non-human systems. This anthropomorphizing of AI is not just a harmless quirk of human nature — it is becoming an increasingly dangerous tendency that might cloud our judgment in critical ways. From business leaders comparing AI learning to human education in order to justify training practices, to lawmakers crafting policies based on flawed human-AI analogies, this tendency to humanize AI is inappropriately shaping crucial decisions across industries and regulatory frameworks.

Viewing AI through a human lens in business has led companies to overestimate AI capabilities or underestimate the need for human oversight, sometimes with costly consequences. The stakes are particularly high in copyright law, where anthropomorphic thinking has led to problematic comparisons between human learning and AI training.

The language trap

Listen to how we talk about AI: We say it "learns," "thinks," "understands" and even "creates." These human terms feel natural, but they are misleading. When we say an AI model "learns," it is not gaining understanding like a human student. Instead, it performs complex statistical analyses on vast amounts of data, adjusting weights and parameters in its neural networks based on mathematical principles. There is no comprehension, eureka moment, spark of creativity or actual understanding — just increasingly sophisticated pattern matching.

This linguistic sleight of hand is more than merely semantic. As noted in the paper Generative AI's Illusory Case for Fair Use: "The use of anthropomorphic language to describe the development and functioning of AI models is distorting because it suggests that once trained, the model operates independently of the content of the works on which it has trained." This confusion has real consequences, particularly when it influences legal and policy decisions.

The cognitive disconnect

Perhaps the most dangerous aspect of anthropomorphizing AI is how it masks the fundamental differences between human and machine intelligence. While some AI systems excel at specific types of reasoning and analytical tasks, the large language models (LLMs) that dominate today's AI discourse — and that we focus on here — operate through sophisticated pattern recognition. These systems process vast amounts of data, identifying and learning statistical relationships between words, phrases, images and other inputs to predict what should come next in a sequence. When we say they "learn," we're describing a process of mathematical optimization that helps them make increasingly accurate predictions based on their training data.

Consider this striking example from research by Berglund and his colleagues: A model trained on materials stating "A is equal to B" often cannot reason, as a human would, to conclude that "B is equal to A." If an AI learns that Valentina Tereshkova was the first woman in space, it might correctly answer "Who was Valentina Tereshkova?" but struggle with "Who was the first woman in space?" This limitation reveals the fundamental difference between pattern recognition and true reasoning, between predicting likely sequences of words and understanding their meaning.
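To make the asymmetry concrete, here is a toy illustration in Python (a plain one-way lookup table, not an LLM): an association stored in only one direction simply cannot be queried in reverse, loosely analogous to the failure Berglund and colleagues observed.

```python
# A one-way association, loosely analogous to a model trained only on
# statements of the form "A is B" (the reverse direction is never stored).
facts = {"Valentina Tereshkova": "the first woman in space"}

def ask(key: str) -> str:
    return facts.get(key, "I don't know")

print(ask("Valentina Tereshkova"))      # "the first woman in space"
print(ask("the first woman in space"))  # "I don't know": no reverse mapping
```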
The copyright conundrum

This anthropomorphic bias has particularly troubling implications in the ongoing debate about AI and copyright. Microsoft CEO Satya Nadella recently compared AI training to human learning, suggesting that if humans can learn from books without copyright implications, AI should be able to do the same. This comparison perfectly illustrates the danger of anthropomorphic thinking in discussions about ethical and responsible AI.

The analogy breaks down on closer inspection of how human learning and AI training actually work. When humans read books, we do not make copies of them — we understand and internalize concepts. AI systems, on the other hand, must make actual copies of works — often obtained without permission or payment — encode them into their architecture and maintain these encoded versions to function. The works don't disappear after "learning," as AI companies often claim; they remain embedded in the system's neural networks.

The business blind spot

Anthropomorphizing AI creates dangerous blind spots in business decision-making that go beyond simple operational inefficiencies. When executives and decision-makers think of AI as "creative" or "intelligent" in human terms, it can lead to a cascade of risky assumptions and potential legal liabilities.

Overestimating AI capabilities

One critical area where anthropomorphizing creates risk is content generation and copyright compliance. When businesses view AI as capable of "learning" like humans, they might incorrectly assume that AI-generated content is automatically free from copyright concerns. This misunderstanding can lead companies to:

- Deploy AI systems that inadvertently reproduce copyrighted material, exposing the business to infringement claims
- Fail to implement proper content filtering and oversight mechanisms
- Assume incorrectly that AI can reliably distinguish between public domain and copyrighted material
- Underestimate the need for human review in content generation processes

The cross-border compliance blind spot

The anthropomorphic bias in AI also creates dangers when we consider cross-border compliance. As explained by Daniel Gervais, Haralambos Marmanis, Noam Shemtov and Catherine Zaller Rowland in "The Heart of the Matter: Copyright, AI Training, and LLMs," copyright law operates on strict territorial principles, with each jurisdiction maintaining its own rules about what constitutes infringement and what exceptions apply.

This territorial nature of copyright law creates a complex web of potential liability. Companies might mistakenly assume their AI systems can freely "learn" from copyrighted materials across jurisdictions, failing to recognize that training activities that are legal in one country may constitute infringement in another. The EU has recognized this risk in its AI Act, particularly through Recital 106, which requires any general-purpose AI model offered in the EU to comply with EU copyright law regarding training data, regardless of where that training occurred.

This matters because anthropomorphizing AI's capabilities can lead companies to underestimate or misunderstand their legal obligations across borders. The comfortable fiction of AI "learning" like humans obscures the reality that AI training involves complex copying and storage operations that trigger different legal obligations in different jurisdictions. This fundamental misunderstanding of AI's


Google’s Gemini AI just shattered the rules of visual processing — here’s what that means for you

Google's Gemini AI has quietly upended the AI landscape, achieving a milestone few thought possible: the simultaneous processing of multiple visual streams in real time. This breakthrough — which allows Gemini to not only watch live video feeds but also analyze static images at the same time — wasn't unveiled through Google's flagship platforms. Instead, it emerged from an experimental application called AnyChat. This unanticipated leap underscores the untapped potential of Gemini's architecture, pushing the boundaries of AI's ability to handle complex, multimodal interactions.

For years, AI platforms have been restricted to managing either live video streams or static photos, but never both at once. With AnyChat, that barrier has been decisively broken. "Even Gemini's paid service can't do this yet," Ahsen Khaliq, machine learning (ML) lead at Gradio and the creator of AnyChat, said in an exclusive interview with VentureBeat. "You can now have a real conversation with AI while it processes both your live video feed and any images you want to share."

[Video: A Gradio team member demonstrates Gemini AI's new capability to process real-time video alongside static images during a voice chat session. Credit: x.com / @freddy_alfonso_]

How Google's Gemini is quietly redefining AI vision

The technical achievement behind Gemini's multi-stream capability lies in its advanced neural architecture — an infrastructure that AnyChat skillfully exploits to process multiple visual inputs without sacrificing performance. This capability already exists in Gemini's API, but it has not been made available in Google's official applications for end users. In contrast, the computational demands of many AI platforms, including ChatGPT, limit them to single-stream processing: ChatGPT, for example, currently disables live video streaming when an image is uploaded. Even handling one video feed can strain resources, let alone combining it with static image analysis.

The potential applications of this breakthrough are as transformative as they are immediate. Students can now point their camera at a calculus problem while showing Gemini a textbook for step-by-step guidance. Artists can share works-in-progress alongside reference images, receiving nuanced, real-time feedback on composition and technique.

[Image: The interface of Gemini Chat, an experimental platform leveraging Google's Gemini AI for real-time audio, video streaming and simultaneous image processing. Credit: Hugging Face / Gradio]

The technology behind Gemini's multi-stream AI breakthrough

What makes AnyChat's achievement remarkable is not just the technology itself but the way it circumvents the limitations of Gemini's official deployment. This breakthrough was made possible through specialized allowances from Google's Gemini API, enabling AnyChat to access functionality that remains absent in Google's own platforms. Using these expanded permissions, AnyChat optimizes Gemini's attention mechanisms to track and analyze multiple visual inputs simultaneously — all while maintaining conversational coherence. Developers can easily replicate this capability using a few lines of code, as demonstrated by AnyChat's use of Gradio, an open-source platform for building ML interfaces. For example, developers can launch their own Gemini-powered chat platform with image-upload support in a short script.
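The original article showed such a snippet as a screenshot (credit: Hugging Face / Gradio). Here is a minimal sketch in the same spirit, not AnyChat's actual code: it assumes the gradio and google-generativeai packages, a GOOGLE_API_KEY environment variable and an illustrative model name, and it covers only the image-upload half (live video streaming requires Gemini's real-time API and considerably more plumbing):

```python
import os

import google.generativeai as genai
import gradio as gr
from PIL import Image

# Assumes a GOOGLE_API_KEY environment variable; the model name is
# illustrative, not necessarily the one AnyChat uses.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def chat(message, history):
    # With multimodal=True, Gradio passes {"text": ..., "files": [...]}.
    parts = [message["text"]]
    for f in message.get("files", []):
        path = f["path"] if isinstance(f, dict) else f
        parts.append(Image.open(path))  # attach each uploaded image
    return model.generate_content(parts).text

# A chat UI that accepts text plus image uploads in a single box.
gr.ChatInterface(chat, multimodal=True).launch()
```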
This simplicity highlights how AnyChat isn't just a demonstration of Gemini's potential, but a toolkit for developers looking to build custom vision-enabled AI applications. "The real-time video feature in Google AI Studio can't handle uploaded images during streaming," Khaliq told VentureBeat. "No other platform has implemented this kind of simultaneous processing right now."

The experimental app that unlocked Gemini's hidden capabilities

AnyChat's success wasn't an accident. The platform's developers worked closely with Gemini's technical architecture to expand its limits and, in doing so, revealed a side of Gemini that even Google's official tools haven't yet explored. This experimental approach allowed AnyChat to handle simultaneous streams of live video and static images, essentially breaking the "single-stream barrier." The result is a platform that feels more dynamic, intuitive and capable of handling real-world use cases far more effectively than its competitors.

Why simultaneous visual processing is a game-changer

The implications of Gemini's new capabilities stretch far beyond creative tools and casual AI interactions. Imagine a medical professional showing an AI both live patient symptoms and historical diagnostic scans at the same time. Engineers could compare real-time equipment performance against technical schematics, receiving instant feedback. Quality-control teams could match production-line output against reference standards with unprecedented accuracy and efficiency.

In education, the potential is transformative. Students can use Gemini in real time to analyze textbooks while working on practice problems, receiving context-aware support that bridges the gap between static and dynamic learning environments. For artists and designers, the ability to showcase multiple visual inputs simultaneously opens up new avenues for creative collaboration and feedback.

What AnyChat's success means for the future of AI innovation

For now, AnyChat remains an experimental developer platform, operating with expanded rate limits granted by Gemini's developers. Yet its success proves that simultaneous, multi-stream AI vision is no longer a distant aspiration — it's a present reality, ready for large-scale adoption.

AnyChat's emergence raises provocative questions. Why hasn't Gemini's official rollout included this capability? Is it an oversight, a deliberate choice in resource allocation, or an indication that smaller, more agile developers are driving the next wave of innovation?

As the AI race accelerates, the lesson of AnyChat is clear: the most significant advances may not always come from the sprawling research labs of tech giants. Instead, they may originate from independent developers who see potential in existing technologies — and dare to push them further. With Gemini's groundbreaking architecture now proven capable of multi-stream processing, the stage is set for a new era of AI applications. Whether Google will fold this capability into its official platforms remains uncertain. One thing is clear, however: the gap between what AI can do and what it


Devin 1.2: Updated AI engineer enhances coding with smarter in-context reasoning, voice integration

Last year, Cognition kicked off the AI agent wave with a product called Devin, billed as the world's first AI software engineer. The offering was under wraps for several months, but now it's generally available and learning new chops very quickly. Case in point: the Scott Wu-led startup has just released Devin 1.2, which brings new capabilities that take the AI engineer's ability to handle entire development projects to a whole new level.

The biggest highlight of Devin 1.2 is its improved in-context reasoning, which makes the agent better at handling and reusing code. It also adds the ability to take voice messages via Slack, giving users a more seamless way to tell Devin what to do.

The development comes at a time when AI-powered agents are being touted as the future of modern work. Experts believe there will soon be a time when humans and agents work together, with the latter seamlessly handling repetitive tasks (which is already beginning to happen). Recently, at CES, Nvidia boss Jensen Huang said that in the future, enterprise IT departments would evolve into "HR departments" for AI, responsible for commissioning and maintaining agents working across different functions within the company.

What does Devin 1.2 bring to the table?

While not a major upgrade, Devin 1.2 introduces some interesting capabilities that make the agent better at its job. The number-one feature is the improved ability to reason in context within a code repository. This essentially means Devin can now better understand the structure and content of a repository. With this understanding, the agent can identify which files are relevant to a particular task, recognize and reuse existing code and patterns, and be more accurate in suggesting edits or creating pull requests (PRs), reducing errors and manual adjustments. For developers, this capability means accelerated workflows and reduced cognitive load from searching for files, understanding codebases or fixing inconsistent code.

The other notable update in Devin 1.2 is the introduction of voice messages: Devin can now take voice commands from users via Slack.

[Image: Voice messages for Devin via Slack]

All one has to do is tag Devin in a Slack chat, hit the "Record audio clip" button and describe the task or feedback the AI engineer should execute. Devin will prepare a step-by-step action plan and begin executing the command using its developer tools — its own shell, code editor and browser. The move simplifies how one interacts with the agent, saving the hassle of typing natural-language prompts into Devin's chatbot-style interface.

Improved login process, new enterprise controls

Cognition has also made some usability improvements to Devin. For instance, the new release introduces machine snapshots to simplify the login process for Devin's workspace. "If you log in for Devin during onboarding with Devin's browser, we'll save the cookie for future sessions (if the cookie expires, you'll need to provide credentials for Devin in Secrets as well). This also unblocks authentication processes that require visiting a URL on Devin's machine," the company wrote in a blog post.

Cognition is also introducing enterprise accounts, where organization admins get a centralized console to manage multiple Devin workspaces, including members and their access controls, as well as billing for them.
Finally, the company is adding a usage-based billing model, allowing users to pay for additional capacity beyond their subscription limits. This way, once users have exhausted their monthly allocation of ACUs (agent compute units), they can continue building by paying for extra usage. The model has been active since January 9, with users able to set additional usage budgets according to their needs. This lets users maintain control over spending while ensuring uninterrupted service when they need extra capacity.

Currently, Devin is generally available for engineering assistance at a starting price of $500 a month, with no seat limits. Multiple enterprises are already incorporating it into their workflows, including Lumos, OpenSea, Curai Health, Nubank and Ramp.

Devin's new capabilities come as competition in the AI engineering space heats up. From GitHub Copilot's widespread adoption to Magic and Poolside AI raising substantial funding to develop cutting-edge capabilities, the race to create the ultimate AI coding assistant is intensifying. Each player is striving to redefine software development, promising faster workflows, reduced cognitive load and seamless collaboration between humans and AI. As these AI-powered agents continue to evolve, they're not only transforming how developers work but also shaping the future of modern work itself, where efficiency and innovation are driven by a partnership between humans and machines. By 2028, Gartner estimates, 33% of enterprise software applications will include agentic AI, enabling autonomous decision-making in 15% of day-to-day work.


Microsoft’s AutoGen update boosts AI agents with cross-language interoperability and observability

Microsoft has updated its AutoGen orchestration framework so that the agents it helps build can become more flexible and give organizations more control.

AutoGen v0.4 brings added robustness to AI agents and addresses issues customers identified around architectural constraints. "The initial release of AutoGen generated widespread interest in agentic technologies," Microsoft researchers said in a blog post. "At the same time, users struggled with architectural constraints, an inefficient API compounded by rapid growth and limited debugging and intervention functionality." The researchers added that customers have been asking for stronger observability and control, flexibility around multi-agent collaboration and reusable components.

AutoGen v0.4 is more modular and extensible, supporting scalable, distributed agent networks. It adds asynchronous messaging; cross-language support; observability and debugging; and built-in and community extensions. Asynchronous messaging means agents built with AutoGen v0.4 support both event-driven and request-response interaction patterns. The framework is more modular, so developers can add plug-in components and build long-running agents, and it enables users to design more complex, distributed agent networks. AutoGen's extension module simplifies working with multi-agent teams and advanced model clients, and allows open-source developers to manage their own extensions.

To address observability, AutoGen v0.4 has built-in metric tracking, message tracing and debugging tools, so users can monitor agent interactions. The update also enables interoperability between agents written in different programming languages; for now, AutoGen v0.4 supports Python and .NET, with support for additional languages in the works.

New framework

Microsoft restructured AutoGen's framework to better define responsibilities across framework, tools and applications. It now has three layers: core, the foundational building blocks for an event-driven system; AgentChat, a "task-driven, high-level API built on the core layer" that features group chat, code execution and pre-built agents, and is most similar to AutoGen v0.2; and first-party extensions, which interface with integrations like the Azure code executor and OpenAI's model client.

Along with the framework, some of the tools Microsoft built around AutoGen also got an upgrade. AutoGen Studio, a low-code interface for rapidly prototyping agents, was rebuilt on the AutoGen v0.4 AgentChat API. Users can get real-time agent updates, pause conversations or redirect agents with mid-execution control, design agent teams with a drag-and-drop interface, import custom agents and get interactive feedback.
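For a sense of what the AgentChat layer described above looks like in practice, here is a minimal sketch, assuming the autogen-agentchat and autogen-ext packages from the v0.4 release and an OpenAI API key in the environment (the model name is illustrative):

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    # AgentChat sits on the event-driven core, so agent calls are
    # asynchronous end to end and long-running agents don't block.
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    assistant = AssistantAgent("assistant", model_client=model_client)
    # Run a single task to completion and print the final message.
    result = await assistant.run(task="Explain what an agent framework does.")
    print(result.messages[-1].content)

asyncio.run(main())
```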
Microsoft and agents

Microsoft released AutoGen in October 2023 with the aim of simplifying how agents communicate with each other. Along with LangChain and LlamaIndex, AutoGen was one of the first AI agent orchestration frameworks, released before agents became the buzzword they are today. Since then, Microsoft has released other agentic systems, including Magentic-One, a generalist agentic system that can power multiple agents to complete tasks. The company has embraced AI agents, deploying perhaps the largest AI agent ecosystem through its Copilot Studio platform.

But other companies are hot on its heels. Salesforce launched Agentforce, and more recently its updated Agentforce 2.0, while ServiceNow released a library of customizable agents. AWS has also added more support for creating multi-agent systems to its Bedrock platform.
