
Nvidia unveils AI foundation models running on RTX AI PCs

Nvidia today announced foundation models running locally on Nvidia RTX AI PCs that supercharge digital humans, content creation, productivity and development. Jensen Huang, CEO of Nvidia, made the announcement during his CES 2025 opening keynote.

GeForce has long been a vital platform for AI developers. The first GPU-accelerated deep learning network, AlexNet, was trained on the GeForce GTX 580 in 2012 — and last year, over 30% of published AI research papers cited the use of GeForce RTX.

Now, with generative AI and RTX AI PCs, anyone can be a developer. A new wave of low-code and no-code tools, such as AnythingLLM, ComfyUI, Langflow and LM Studio, enables enthusiasts to use AI models in complex workflows via simple graphical user interfaces. NIM microservices connected to these GUIs will make it effortless to access and deploy the latest generative AI models. Nvidia AI Blueprints, built on NIM microservices, provide easy-to-use, preconfigured reference workflows for digital humans, content creation and more. To meet the growing demand from AI developers and enthusiasts, every top PC manufacturer and system builder is launching NIM-ready RTX AI PCs.

“AI is advancing at light speed, from perception AI to generative AI and now agentic AI,” said Huang. “NIM microservices and AI Blueprints give PC developers and enthusiasts the building blocks to explore the magic of AI.”

The NIM microservices will also be available with Nvidia Digits, a personal AI supercomputer that provides AI researchers, data scientists and students worldwide with access to the power of Nvidia Grace Blackwell. Project Digits features the new Nvidia GB10 Grace Blackwell Superchip, offering a petaflop of AI computing performance for prototyping, fine-tuning and running large AI models.

Making AI NIMble

How AI gets smarter

Foundation models — neural networks trained on immense amounts of raw data — are the building blocks for generative AI. Nvidia will release a pipeline of NIM microservices for RTX AI PCs from top model developers such as Black Forest Labs, Meta, Mistral and Stability AI. Use cases span large language models (LLMs), vision language models, image generation, speech, embedding models for retrieval-augmented generation (RAG), PDF extraction and computer vision.

“Making FLUX an Nvidia NIM microservice increases the rate at which AI can be deployed and experienced by more users, while delivering incredible performance,” said Robin Rombach, CEO of Black Forest Labs, in a statement.

Nvidia today also announced the Llama Nemotron family of open models that provide high accuracy on a wide range of agentic tasks. The Llama Nemotron Nano model will be offered as a NIM microservice for RTX AI PCs and workstations, and excels at agentic AI tasks like instruction following, function calling, chat, coding and math.

NIM microservices include the key components for running AI on PCs and are optimized for deployment across Nvidia GPUs — whether in RTX PCs and workstations or in the cloud. Developers and enthusiasts will be able to quickly download, set up and run these NIM microservices on Windows 11 PCs with Windows Subsystem for Linux (WSL).

“AI is driving Windows 11 PC innovation at a rapid rate, and Windows Subsystem for Linux (WSL) offers a great cross-platform environment for AI development on Windows 11 alongside Windows Copilot Runtime,” said Pavan Davuluri, corporate vice president of Windows at Microsoft, in a statement. “Nvidia NIM microservices, optimized for Windows PCs, give developers and enthusiasts ready-to-integrate AI models for their Windows apps, further accelerating deployment of AI capabilities to Windows users.”

The NIM microservices, running on RTX AI PCs, will be compatible with top AI development and agent frameworks, including AI Toolkit for VSCode, AnythingLLM, ComfyUI, CrewAI, Flowise AI, LangChain, Langflow and LM Studio. Developers can connect applications and workflows built on these frameworks to AI models running as NIM microservices through industry-standard endpoints, enabling them to use the latest technology with a unified interface across the cloud, data centers, workstations and PCs.
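Those industry-standard endpoints follow the OpenAI API convention, which is why existing OpenAI-style client code can usually point at a local NIM with a one-line change. Here is a minimal sketch, assuming a microservice is already running locally; the port and model id are illustrative assumptions, not confirmed defaults for any particular NIM.

```python
# Minimal sketch: querying a locally running NIM microservice through its
# OpenAI-compatible endpoint. Port and model id are assumptions; check the
# specific microservice's documentation for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-locally",         # local deployments typically ignore the key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # hypothetical model id; list via client.models.list()
    messages=[{"role": "user", "content": "Summarize what NIM microservices do."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

The same snippet works against a cloud-hosted endpoint by swapping the base URL, which is the "unified interface" the frameworks above rely on.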
Enthusiasts will also be able to experience a range of NIM microservices using an upcoming release of the Nvidia ChatRTX tech demo.

Putting a Face on Agentic AI

Nvidia AI Blueprints

To demonstrate how enthusiasts and developers can use NIM to build AI agents and assistants, Nvidia today previewed Project R2X, a vision-enabled PC avatar that can put information at a user’s fingertips, assist with desktop apps and video conference calls, read and summarize documents, and more. The avatar is rendered using Nvidia RTX Neural Faces, a new generative AI algorithm that augments traditional rasterization with entirely generated pixels. The face is then animated by a new diffusion-based Nvidia Audio2Face-3D model that improves lip and tongue movement. R2X can be connected to cloud AI services such as OpenAI’s GPT-4o and xAI’s Grok, as well as to NIM microservices and AI Blueprints, such as PDF retrievers or alternative LLMs, via developer frameworks such as CrewAI, Flowise AI and Langflow.

AI Blueprints Coming to PC

A wafer full of Nvidia Blackwell chips.

NIM microservices are also available to PC users through AI Blueprints — reference AI workflows that can run locally on RTX PCs. With these blueprints, developers can create podcasts from PDF documents, generate stunning images guided by 3D scenes and more.

The blueprint for PDF to podcast extracts text, images and tables from a PDF to create a podcast script that can be edited by users. It can also generate a full audio recording from the script using voices available in the blueprint or based on a user’s voice sample. In addition, users can have a real-time conversation with the AI podcast host to learn more. The blueprint uses NIM microservices like Mistral-Nemo-12B-Instruct for language, Nvidia Riva for text-to-speech and automatic speech recognition, and the NeMo Retriever collection of microservices for PDF extraction.

The AI Blueprint for 3D-guided generative AI gives artists finer control over image generation. While AI can generate amazing images from simple text prompts, controlling image composition using only words can be challenging. With this blueprint, creators can use simple 3D scenes to guide the composition of the images the AI generates.


How Narvar is using AI and data to enhance post-purchase customer experiences

What happens after a customer clicks the “buy” button on an ecommerce website? It’s a domain known as post-purchase, and it’s often one of the costliest and most impactful aspects of operations for retailers. Post-purchase activities include figuring out delivery, customer retention and, if needed, returns.

Among the pioneers in the space is Narvar, which counts more than 1,500 global retailers, including big brands like Gap, Levi’s and Sonos, among its customers. Across all its various customer touchpoints, Narvar collects information from more than 42 billion consumer interactions annually.

Narvar today is expanding its services’ intelligence with a new AI-powered platform it calls IRIS (Intelligent Retail Insights Service). IRIS combines data, AI and analytics in a highly optimized platform. The goal is to help retailers combat fraud, optimize delivery promises, streamline returns and create more personalized customer experiences.

Among the first services IRIS enables is the AI-powered Narvar Assist, which is designed to automate claims management and help reduce delivery claims fraud. Early results from a cohort of 20 retailers show dramatic improvements: an 80% reduction in fraud-related inquiries and a 25% decrease in appeasements, or the compensation retailers provide for shipping-related issues.

“We’re not just solving problems; we’re transforming what has traditionally been a cost center into a strategic advantage for retailers,” Anisa Kumar, Narvar’s CEO, told VentureBeat in an exclusive interview.

Why AI in post-purchase operations is critical to retail success

Kumar joined Narvar in 2021 as chief customer officer and became the CEO in October 2024. Prior to that, she had a long history working in the trenches of customer operations at Levi Strauss & Co., Walmart and Target, where she saw retailers’ challenges firsthand.

Retailers of all types generally spend a lot of time and effort thinking about consumer acquisition. Kumar noted that the big challenge, however, is keeping customers. “Post-purchase is really thinking through what’s that next frontier to keep your consumers coming back, and really treat them the way they need to be treated, give them personalized experiences,” she said.

With all the data that Narvar collects, AI is now able to help retailers turn post-purchase into an activity that helps with customer retention. The use of AI in retail operations overall has struggled; for instance, a 2024 report from Forrester found high levels of interest, but low levels of adoption. As a SaaS offering, Narvar is making it easier for retailers to get the benefits of AI. Kumar explained that the IRIS platform will help create hyper-personalized post-purchase experiences for retailers and their end consumers.

How Narvar is using AI to improve the bottom line

The IRIS system uses a combination of AI and data services from Google Cloud, as well as proprietary machine learning (ML) and predictive AI algorithms. Ram Ravicharan, CTO of Narvar, emphasized the power and importance of the data the company has to inform AI to help retailers. Narvar processes billions of consumer touchpoints, giving it unique insights into customer behavior and intent. Narvar’s IRIS is not using generative AI, although it is using techniques that underpin large language models (LLMs), including transformers.

“If you think of transactions that people do on the purchase journey as a language, we now almost have a language of what the next sentence is going to be,” Ravicharan explained. “And that’s literally the way we look at it.”
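To make that analogy concrete, here is a toy sketch of next-event prediction over a purchase journey, treating each post-purchase event as a token fed to a small transformer. The event vocabulary, dimensions and untrained model are illustrative assumptions, not Narvar's actual system.

```python
# Toy sketch of the "transactions as a language" idea: each post-purchase
# event is a token, and a transformer scores the likely next event.
import torch
import torch.nn as nn

EVENTS = ["purchase", "ship", "deliver", "return_start", "refund", "repurchase"]
vocab_size, d_model = len(EVENTS), 32

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(d_model, vocab_size)

# One customer's history: purchase -> ship -> deliver
history = torch.tensor([[0, 1, 2]])
h = encoder(embed(history))      # contextual representation of the event sequence
logits = head(h[:, -1])          # score every candidate next event
print(EVENTS[logits.argmax(-1).item()])  # arbitrary until trained
```

A production system would train a model like this, with a causal mask, on billions of interaction sequences, so the predicted "next sentence" reflects real customer intent.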
With predictive AI models and the data, Narvar has a solid understanding of customer intent. That can be extremely useful for customer retention as well as fraud prevention. Beyond fraud mitigation, IRIS is also designed to help retailers make more accurate delivery promises and enhance customer loyalty.

Prior to IRIS, Narvar tended to rely on rules-based models, particularly for commitments such as estimated delivery date. With the new models, there is more intelligence from across the retail network to provide a higher degree of accuracy, Kumar noted. For example, the system is aware of weather issues and carrier delivery systems that can impact delivery.

“Everyone focuses on customer acquisition, but they lose them and pay to acquire them all over again,” Kumar explained. “IRIS helps retailers create lasting relationships by delivering personalized experiences when they matter most — after the sale.”

Early users see gains

The Narvar Assist technology is not yet generally available, although it is being piloted by existing customers. Among those is Boston Proper. The clothing retailer has been a Narvar customer for six years, explained DeAnne Judd, Boston Proper’s CIO.

To date, Boston Proper has used Narvar’s Engage solution to proactively notify consumers about the delivery of their orders and potential exceptions to improve visibility and customer experience. The company also uses Narvar’s Return and Exchange solution to automate return processing and provide visibility to the consumer on the status of their refund. Judd noted that, right now, Boston Proper is using the first IRIS solution, Assist, which leverages Narvar’s ecosystem to reduce costs due to fraud.

“Since integrating Narvar Assist, customer contacts and costs have decreased due to its enhanced user interface and streamlined intelligent processes,” said Judd.

Bridging online and in-store

Looking forward, Narvar plans to extend IRIS in a number of ways. While the initial Assist product has been focused on online transactions, Kumar noted that Narvar is working with a few retailers to extend the capabilities in-store as well. The Narvar platform has insights into data and interactions across online, in-store and even warehouse operations.

“Our vision is to bridge online and in-store environments, and the way we’ve constructed our models and the way we will develop transactional intent goes across channels,” she said.


Cohere just launched ‘North’, its biggest AI bet yet for privacy-focused enterprises

Cohere released North today, a secure AI workspace platform that directly challenges Microsoft Copilot and Google Vertex AI in the enterprise market. The company claims its new platform outperforms both tech giants’ offerings across finance, HR, customer support and IT functions.

North combines large language models, search capabilities and automation tools in a secure package that lets companies deploy AI while maintaining control over sensitive data. The platform operates in private cloud environments or on-premises installations, targeting regulated industries like finance and healthcare.

Internal benchmarks comparing Microsoft Copilot, Google Vertex AI and Cohere North show significant discrepancies between automated and human evaluations of AI performance. While all platforms scored well in automated tests, Cohere North maintained consistent accuracy when subjected to human review, while competitors showed marked declines. (Credit: Cohere)

Security becomes major battleground for enterprise AI adoption

“The market for artificial intelligence is maturing, and enterprises have begun to understand the opportunity,” Cohere CEO Aidan Gomez said in an internal company letter shared on LinkedIn last month. “While consumers have fallen in love with the technology and use it as part of their daily lives, enterprises are struggling to keep up.”

Royal Bank of Canada has already partnered with Cohere to develop North for Banking, a specialized version designed for financial institutions. This marks one of the first major enterprise deployments of the platform.

Search technology promises to slash workflow times

North’s built-in search system, Compass, processes multiple data types including images, presentations, spreadsheets and documents across languages. Internal testing shows the system reduces task completion times by more than 80% compared to manual searches.

“It’s becoming clear that for enterprises, it is not sufficient to simply prompt or fine-tune an off-the-shelf consumer AI chatbot for a work environment,” Gomez said. “They want something customized for their needs. They want a true partner to help achieve their goals.”

Cohere North outperformed rival platforms across key business functions, with particularly strong advantages in finance and IT operations. Microsoft Copilot notably lagged in IT-related tasks, achieving only 29% relative accuracy compared to North’s benchmark performance. Graph shows relative accuracy normalized by maximum value for each platform. (Credit: Cohere)

Enterprise AI race shifts from raw power to practical implementation

Gomez challenged the industry’s focus on computational scale, noting that “data quality and novel methods like synthetic data have driven far more of the progress these past 18 months than scale.” He claimed this approach has made Cohere “an order of magnitude more capital-efficient than our competition.”

The platform lets employees build and customize AI tools for their specific needs without requiring technical expertise. Early testers include companies in finance, healthcare, manufacturing and infrastructure — sectors where data security has traditionally limited AI adoption.

Cohere North early access

North is currently available through an early access program, targeting the finance, healthcare, manufacturing and critical infrastructure sectors.
The launch could reshape how businesses implement AI technologies, as companies increasingly prioritize security and customization over raw computational power. “Going forward, every company will be an AI company,” Gomez said, emphasizing the need for secure, rapidly deployable solutions.


LlamaIndex goes beyond RAG so agents can make complex decisions

Popular AI orchestration framework LlamaIndex has introduced Agentic Document Workflows (ADW), a new architecture that the company says goes beyond retrieval-augmented generation (RAG) processes and increases agent productivity. As orchestration frameworks continue to improve, this method could offer organizations an option for enhancing agents’ decision-making capabilities.

LlamaIndex says ADW can help agents manage “complex workflows beyond simple extraction or matching.” Some agentic frameworks are based on RAG systems, which provide agents the information they need to complete tasks. However, this method does not allow agents to make decisions based on that information.

LlamaIndex gave some real-world examples of how ADW would work well. For instance, in contract reviews, human analysts must extract key information, cross-reference regulatory requirements, identify potential risks and generate recommendations. When deployed in that workflow, AI agents would ideally follow the same pattern and make decisions based on the documents they read for contract review and knowledge from other documents.

“ADW addresses these challenges by treating documents as part of broader business processes,” LlamaIndex said in a blog post. “An ADW system can maintain state across steps, apply business rules, coordinate different components and take actions based on document content — not just analyze it.”

LlamaIndex has previously said that RAG, while an important technique, remains primitive, particularly for enterprises seeking more robust decision-making capabilities using AI.

Understanding context for decision making

LlamaIndex has developed reference architectures combining its LlamaCloud parsing capabilities with agents. It “builds systems that can understand context, maintain state and drive multi-step processes.” To do this, each workflow has an orchestrator that can direct agents to tap LlamaParse to extract information from data, maintain the state of the document context and process, then retrieve reference material from another knowledge base. From there, the agents can start generating recommendations for the contract review use case or other actionable decisions for different use cases.

“By maintaining state throughout the process, agents can handle complex multi-step workflows that go beyond simple extraction or matching,” the company said. “This approach allows them to build deep context about the documents they’re processing while coordinating between different system components.”
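The stateful, multi-step pattern is easiest to see in code. Below is a minimal sketch using the event-driven Workflow API in llama_index (as of recent versions); the extraction step is a stub standing in for LlamaParse, and the clause data is invented for illustration.

```python
# Minimal sketch of a stateful, multi-step document workflow in the spirit of
# ADW. The extraction step is a stub; a real system would call LlamaParse and
# a retriever against a knowledge base.
from llama_index.core.workflow import (
    Context, Event, StartEvent, StopEvent, Workflow, step,
)

class ClausesExtracted(Event):
    pass

class ContractReview(Workflow):
    @step
    async def extract(self, ctx: Context, ev: StartEvent) -> ClausesExtracted:
        # Stub for LlamaParse extraction; persist results as shared workflow state.
        await ctx.set("clauses", ["payment terms: net 30", "liability cap: $1M"])
        return ClausesExtracted()

    @step
    async def recommend(self, ctx: Context, ev: ClausesExtracted) -> StopEvent:
        clauses = await ctx.get("clauses")  # state carried across steps
        # A real step would cross-reference regulatory requirements here.
        return StopEvent(result=f"Flagged {len(clauses)} clauses for review")

# Usage (inside an async context):
#   result = await ContractReview(timeout=60).run()
```

The point of the sketch is the `Context`: state set in one step is read in later steps, which is what lets an agent accumulate document context instead of treating each retrieval as a one-shot lookup.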
Differing agent frameworks

Agentic orchestration is an emerging space, and many organizations are still exploring how agents — or multiple agents — work for them. Orchestrating AI agents and applications may become a bigger conversation this year as agents go from single systems to multi-agent ecosystems.

AI agents are an extension of what RAG offers, that is, the ability to find information grounded in enterprise knowledge. But as more enterprises begin deploying AI agents, they also want them to do many of the tasks human employees do. And, for these more complicated use cases, “vanilla” RAG isn’t enough. One of the advanced approaches enterprises have considered is agentic RAG, which expands agents’ knowledge base. Models can decide if they need to find more information, which tool to use to get that information, and whether the context they just fetched is relevant, before coming up with a result.


Nvidia’s Nemotron model families will advance AI agents

As part of its bevy of AI announcements at CES 2025 today, Nvidia announced Nemotron model families to advance agentic AI. Available as Nvidia NIM microservices, the open Llama Nemotron large language models and Cosmos Nemotron vision language models can supercharge AI agents on any accelerated system. Nvidia made the announcement as part of CEO Jensen Huang’s opening keynote at CES 2025.

Agentic AI

Artificial intelligence is entering a new era — the age of agentic AI — where teams of specialized agents can help people solve complex problems and automate repetitive tasks. With custom AI agents, enterprises across industries can manufacture intelligence and achieve unprecedented productivity. These advanced AI agents require a system of multiple generative AI models optimized for agentic AI functions and capabilities. This complexity means that the need for powerful, efficient enterprise-grade models has never been greater.

“AI agents is the next robotic industry and likely to be a multibillion-dollar opportunity,” Huang said.

The Llama Nemotron family of open large language models (LLMs) is intended to provide a foundation for enterprise agentic AI. Built with Llama, the models can help developers create and deploy AI agents across a range of applications, including customer support, fraud detection, and product supply chain and inventory management optimization. To be effective, many AI agents need both language skills and the ability to perceive the world and respond with the appropriate action.

Words and Visuals

Nvidia Nemotron

With the new Nvidia Cosmos Nemotron vision language models (VLMs) and Nvidia NIM microservices for video search and summarization, developers can build agents that analyze and respond to images and video from autonomous machines, hospitals, stores and warehouses, as well as sports events, movies and news. For developers seeking to generate physics-aware videos for robotics and autonomous vehicles, Nvidia today separately announced Nvidia Cosmos world foundation models.

The Nemotron models optimize compute efficiency and accuracy for AI agents built with Llama foundation models — one of the most popular commercially viable open-source model collections, downloaded over 650 million times — and provide optimized building blocks for AI agent development. The models are pruned and trained with Nvidia’s latest techniques and high-quality datasets for enhanced agentic capabilities. They excel at instruction following, chat, function calling, coding and math, while being size-optimized to run on a broad range of Nvidia accelerated computing resources.

“Agentic AI is the next frontier of AI development, and delivering on this opportunity requires full-stack optimization across a system of LLMs to deliver efficient, accurate AI agents,” said Ahmad Al-Dahle, vice president and head of GenAI at Meta, in a statement. “Through our collaboration with Nvidia and our shared commitment to open models, the Nvidia Llama Nemotron family built on Llama can help enterprises quickly create their own custom AI agents.”

Early adopters

Leading AI agent platform providers including SAP and ServiceNow are expected to be among the first to use the new Llama Nemotron models.
“AI agents that collaborate to solve complex tasks across multiple lines of the business will unlock a whole new level of enterprise productivity beyond today’s generative AI scenarios,” said Philipp Herzig, chief AI officer at SAP, in a statement. “Through SAP’s Joule, hundreds of millions of enterprise users will interact with these agents to accomplish their goals faster than ever before. Nvidia’s new open Llama Nemotron model family will foster the development of multiple specialized AI agents to transform business processes.”

“AI agents make it possible for organizations to achieve more with less effort, setting new standards for business transformation,” said Jeremy Barnes, vice president of platform AI at ServiceNow, in a statement. “The improved performance and accuracy of Nvidia’s open Llama Nemotron models can help build advanced AI agent services that solve complex problems across functions, in any industry.”

The Nvidia Llama Nemotron models use Nvidia NeMo for distilling, pruning and alignment. Using these techniques, the models are small enough to run on a variety of computing platforms while providing high accuracy as well as increased model throughput. The Nemotron models will be available as downloadable models and as Nvidia NIM microservices that can be easily deployed on clouds, data centers, PCs and workstations. They are intended to offer enterprises industry-leading performance with reliable, secure and seamless integration into their agentic AI application workflows.

Customize and connect to business knowledge with Nvidia NeMo

The Llama Nemotron and Cosmos Nemotron model families are coming in Nano, Super and Ultra sizes to provide options for deploying AI agents at every scale.

● Nano: The most cost-effective model, optimized for real-time applications with low latency and ideal for deployment on PCs and edge devices.
● Super: A high-accuracy model offering exceptional throughput on a single GPU.
● Ultra: The highest-accuracy model, designed for data-center-scale applications demanding the highest performance.

Enterprises can also customize the models for their specific use cases and domains with Nvidia NeMo microservices to simplify data curation, accelerate model customization and evaluation, and apply guardrails to keep responses on track. With Nvidia NeMo Retriever, developers can also integrate retrieval-augmented generation (RAG) capabilities to connect models to their enterprise data. And using Nvidia Blueprints for agentic AI, enterprises can create their own applications using Nvidia’s advanced AI tools and end-to-end development expertise. In fact, Nvidia Cosmos Nemotron, Nvidia Llama Nemotron and NeMo Retriever supercharge the new Nvidia Blueprint for video search and summarization (announced separately today). NeMo, NeMo Retriever and Nvidia Blueprints are all available with the Nvidia AI Enterprise software platform.

Availability

Llama Nemotron and Cosmos Nemotron models will be available as hosted APIs and for download on build.nvidia.com and on Hugging Face. Access for development, testing and research is free for members of the Nvidia Developer Program. Enterprises can run Llama Nemotron and Cosmos Nemotron NIM microservices in production with the Nvidia AI Enterprise software platform on accelerated data center and cloud infrastructure.
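For developers who want to experiment with a hosted Nemotron model, Nvidia's hosted catalog exposes models through an OpenAI-compatible API. A hedged sketch follows; the endpoint reflects build.nvidia.com conventions, and the model id is an example from the existing catalog rather than one of the newly announced Nano/Super/Ultra variants, so both should be verified against current documentation.

```python
# Hedged sketch: calling a hosted Nemotron model via the OpenAI-compatible
# API on build.nvidia.com. Verify the endpoint and model id against current docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # key from the Nvidia Developer Program
)

completion = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",  # example catalog id, not a new CES model
    messages=[{"role": "user", "content": "Plan the function calls for a refund-request agent."}],
    temperature=0.2,
)
print(completion.choices[0].message.content)
```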


Story uses Web3 to enable creators to capture the value they contribute to the AI ecosystem

Story, an intellectual property blockchain, believes that creators, developers and artists should be rewarded for what they contribute to AI. But it’s not easy to trace their contributions, and Story (formerly Story Protocol), which has raised $140 million to date, is doing something about that.

The company has announced its use of Stability AI’s cutting-edge models to usher in a new era of open-source AI development, one that allows contributors – creators, developers, artists – to capture the value they create by contributing to the AI ecosystem. With the use of Stability AI’s technology, Story aims to address the critical challenge of properly attributing, tracking and monetizing creative work generated via AI.

Story is focused on addressing the lack of a clear path for creators to monetize their derivative works in the open-source ecosystem. Despite the incredible progress in AI, proper attribution and monetization for creators’ IP has not kept up with the rate of innovation.

Story’s ecosystem partners.

“We’re thrilled to leverage Stability AI’s models to tackle the most pressing challenges we face with the rapid rise of AI,” said Jason Zhao, chief protocol officer at PIP Labs, Story’s initial core contributor, in a statement. “The combination of AI and blockchain is not only incredibly powerful, but necessary. Blockchains secure digital property rights in the era of AI-driven creative abundance. By leveraging Stability AI’s technology and Story’s technology, we’re showcasing how the proper incentive structures can ensure attribution and empower creators, driving AI development forward.”

Jason Zhao, chief protocol officer at PIP Labs.

Story and its ecosystem applications will use Stability AI’s leading foundational image models to build AI applications that embed tracking of contributions across the AI development life cycle, enabling fair compensation for all creators involved with a monetized output.

Mahojin and ABLO are two AI applications building on Story that leverage Stability AI’s foundation models and Story’s blockchain technology. Mahojin, a search-to-generate AI remixing platform, and ABLO, a collaborative AI platform that allows creators to design physical goods with leading brand IPs, use Stability AI’s models to allow users to easily bring their creative vision to life, and Story’s technology to enable better provenance and attribution across the AI stack. These two projects showcase real-world use cases and illustrate how to unlock new ways for creators to safeguard their IP and earn from their contributions in a dynamic, shared creative economy.

With this kind of tracking, artists and other creators will be able to get paid for their work more fairly, quickly and easily. It also means that the work of those artists, creators and developers could be used more widely.

Stability AI, maker of Stable Diffusion, builds generative AI models used by more than five million people to generate images and media. It focuses on open-source models and has expanded into generation of video, audio and more.

“Empowering creators is at the core of everything we do at Stability AI. We are thrilled to see our models used in Story’s blockchain technology to ensure proper attribution and reward contributors,” said Scott Trowbridge, vice president of Stability AI, in a statement.
He said a decentralized model for the creator industry is going to become increasingly important.

Credit: Image generated by VentureBeat with Stable Diffusion 3.5 Large

Story said it is committed to exploring different use cases for how AI and blockchain can come together to meet the evolving needs of creators and developers in the age of generative AI. For example, one area of exploration is registering training data, like an artist’s unique style or voice, as IP with transparent usage terms on Story. Anyone can then train and fine-tune their own model using this IP. If a creator uses this model to generate an output that is monetized, everyone in this chain of creation wins and benefits together. By leveraging Stability AI’s cutting-edge models, Story is taking a key step toward creating a sustainable and fair internet in the age of AI.

PIP Labs, an initial core contributor to the Story Network, is backed by investors including a16z crypto, Endeavor and Polychain. PIP Labs was cofounded by a serial entrepreneur with a $440 million exit and DeepMind’s youngest PM, alongside a veteran founding executive team with a diverse background in consumer tech, generative AI and Web3 infrastructure.


Meta proposes new scalable memory layers that improve knowledge, reduce hallucinations

As enterprises continue to adopt large language models (LLMs) in various applications, one of the key challenges they face is improving the factual knowledge of models and reducing hallucinations. In a new paper, researchers at Meta AI propose “scalable memory layers,” which could be one of several possible solutions to this problem.

Scalable memory layers add more parameters to LLMs to increase their learning capacity without requiring additional compute resources. The architecture is useful for applications where you can spare extra memory for factual knowledge but also want the inference speed of nimbler models.

Dense and memory layers

Traditional language models use “dense layers” to encode vast amounts of information in their parameters. In dense layers, all parameters are used at their full capacity and are mostly activated at the same time during inference. Dense layers can learn more complex functions as they grow larger, but increasing their size requires additional computational and energy resources.

In contrast, for simple factual knowledge, much simpler layers with associative memory architectures resembling lookup tables would be more efficient and interpretable. This is what memory layers do. They use simple sparse activations and key-value lookup mechanisms to encode and retrieve knowledge. Sparse layers take up more memory than dense layers but only use a small portion of the parameters at once, which makes them much more compute-efficient.

Memory layers have existed for several years but are rarely used in modern deep learning architectures, partly because they are not optimized for current hardware accelerators. Current frontier LLMs usually use some form of “mixture of experts” (MoE) architecture, which uses a mechanism vaguely similar to memory layers. MoE models are composed of many smaller expert components that specialize in specific tasks. At inference time, a routing mechanism determines which expert becomes activated based on the input sequence. PEER, an architecture recently developed by Google DeepMind, extends MoE to millions of experts, providing more granular control over the parameters that become activated during inference.

Upgrading memory layers

Memory layers are light on compute but heavy on memory, which presents specific challenges for current hardware and software frameworks. In their paper, the Meta researchers propose several modifications that solve these challenges and make it possible to use them at scale.

Memory layers can store knowledge in parallel across several GPUs without slowing down the model (source: arXiv)

First, the researchers configured the memory layers for parallelization, distributing them across several GPUs to store millions of key-value pairs without changing other layers in the model. They also implemented a special CUDA kernel for handling high-memory-bandwidth operations. And they developed a parameter-sharing mechanism that supports a single set of memory parameters across multiple memory layers within a model. This means that the keys and values used for lookups are shared across layers. These modifications make it possible to implement memory layers within LLMs without slowing down the model.
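To make the mechanism concrete, here is a simplified sketch of a sparse key-value memory layer. It is not the paper's implementation: the real design uses product keys and custom CUDA kernels so it never scores all keys directly, and the dimensions here are illustrative.

```python
# Simplified sketch of a sparse key-value memory layer: a query selects the
# top-k of many key-value slots, so only k value vectors are touched per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    def __init__(self, d_model=512, num_slots=65536, topk=32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.values = nn.Embedding(num_slots, d_model)  # large, sparsely accessed
        self.query_proj = nn.Linear(d_model, d_model)
        self.topk = topk

    def forward(self, x):                       # x: (batch, seq, d_model)
        q = self.query_proj(x)
        scores = q @ self.keys.T                # naive full scoring; product keys avoid this
        w, idx = scores.topk(self.topk, dim=-1)
        w = F.softmax(w, dim=-1)                # weights over the k selected slots
        v = self.values(idx)                    # (batch, seq, topk, d_model)
        # Only topk of num_slots value vectors participate: sparse activation.
        return x + (w.unsqueeze(-1) * v).sum(dim=-2)
```

Compared with a dense feed-forward layer, capacity grows with `num_slots` while per-token compute grows only with `topk`, which is the memory-for-compute trade-off the researchers describe.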
“Memory layers with their sparse activations nicely complement dense networks, providing increased capacity for knowledge acquisition while being light on compute,” the researchers write. “They can be efficiently scaled, and provide practitioners with an attractive new direction to trade-off memory with compute.”

To test memory layers, the researchers modified Llama models by replacing one or more dense layers with a shared memory layer. They compared the memory-enhanced models against the dense LLMs, as well as MoE and PEER models, on several tasks, including factual question answering, scientific and common-sense world knowledge, and coding.

A 1.3B memory model (solid line) trained on 1 trillion tokens approaches the performance of a 7B model (dashed line) on factual question-answering tasks as it is given more memory parameters (source: arXiv)

Their findings show that memory models improve significantly over dense baselines and compete with models that use 2X to 4X more compute. They also match the performance of MoE models that have the same compute budget and parameter count. The models’ performance is especially notable on tasks that require factual knowledge. For example, on factual question answering, a memory model with 1.3 billion parameters approaches the performance of Llama-2-7B, which was trained on twice as many tokens and 10X more compute.

Moreover, the researchers found that the benefits of memory models remain consistent as they scaled their experiments from 134 million to 8 billion parameters. “Given these findings, we strongly advocate that memory layers should be integrated into all next-generation AI architectures,” the researchers write, while adding that there is still a lot more room for improvement. “In particular, we hope that new learning methods can be developed to push the effectiveness of these layers even further, enabling less forgetting, fewer hallucinations and continual learning.”


Eureka’s robotic vacuum can detect liquid stains and cut itself free of tangles

Eureka, a household cleaning pioneer with over a century of history, has unveiled its Eureka J15 Max Ultra robotic vacuum. While such robotic vacuums are plentiful now, this one has cool features like the ability to detect and adapt to transparent liquids. It’s another example of a 115-year-old company showing up at CES 2025 with new technology.

It also introduces an anti-tangle mechanism that can cut the vacuum free when it gets caught on carpet tangles. It has an extendable side brush and mop for accessing the tightest corners, enhanced obstacle-crossing capabilities to glide smoothly over thresholds, and a powerful 22,000 Pa suction system for cleaning performance. My family had a Eureka vacuum when I was growing up. But it wasn’t autonomous.

Eureka’s ScrubExtend feature.

The Eureka J15 Max Ultra has a self-cleaning base station and FlexiRazor technology, which effortlessly cuts through tangles to minimize maintenance. Pet owners have also been wowed by the series’ pet-friendly design, featuring the ability to avoid pet waste and allow remote video interaction with pets (with user permission), ensuring a stress-free cleaning experience for both pets and owners.

IntelliView AI 2.0 – transparent liquid stain detection

Eureka’s IntelliView AI 2.0

Eureka’s proprietary IntelliView AI technology, first introduced with the earlier J15 Pro Ultra, provides a more intelligent way to manage wet messes. When encountering wet messes, it commands the robot to automatically rotate its body, prioritize mop cleaning, and lift the roller brush to prevent liquid from entering the dustbin. However, transparent liquids could still be missed due to the influence of ambient light on the robot’s vision sensors.

The Eureka J15 Max Ultra overcomes this limitation with the groundbreaking IntelliView AI 2.0, an advanced system that integrates an infrared vision system and an FHD vision sensor. This combination allows the vacuum to generate two types of views in real time: high-definition images of objects and views of their surface structures, both largely unaffected by ambient light or lighting variations. These images are processed by powerful AI algorithms trained to identify subtle differences in surface reflections and texture, enabling the robot to detect liquids clearly, even transparent ones.

Enhanced side brushes – extendable and anti-tangle tech

Eureka robotic vacuum’s DragonClaw Side Brush.

To tackle even the tightest corners and smallest nooks, the Eureka J15 Max Ultra features an advanced dual extension system. This system combines the widely acclaimed ScrubExtend mop extension technology from the J15 Pro Ultra with the newly introduced SweepExtend. Together, these innovations enable the mop and side brush to automatically extend when detecting corners and edges, ensuring thorough cleaning coverage, even in the most hard-to-reach spaces.

Eureka takes anti-tangle innovation a step further with the introduction of the DragonClaw Side Brush. Unlike traditional side brushes with equilateral-triangle bristles prone to tangling, the DragonClaw features a cutting-edge V-shaped design. This design leverages centrifugal force during rotation to actively untangle hair, offering unparalleled anti-tangle performance and ease of maintenance.

Enhanced obstacle-crossing and suction power

Eureka’s robotic vacuum can cross obstacles.
With enhanced power and agility, the Eureka J15 Max Ultra sets a new standard in performance. Delivering an impressive 22,000 Pa of suction power — a 35% increase over its predecessor — it ensures deep and thorough cleaning. Equipped with advanced ObstaCross Technology, the robot effortlessly navigates standard thresholds up to 1.18 inches and handles complex double-layer thresholds up to 1.57 inches, allowing it to seamlessly transition across diverse floor types and obstacles with ease.

Availability and pricing

The Eureka J15 Max Ultra is expected to be available in June 2025, priced at $1,300. The initial sales wave will kick off in the United States, Germany, France, Italy and Spain. In addition, Eureka also introduced the J15 Ultra, an entry-level model in the Eureka J15 Series, featuring 19,000 Pa suction power, FlexiRazor Technology, ScrubExtend and the All-in-One Base Station. The J15 Ultra is set to launch in March 2025 at a price of $800. For more information, visit Eureka’s official website.

Founded in 1909 in Detroit, Michigan, Eureka offers a full line of vacuum cleaners, including uprights, canisters, sticks, handhelds, cordless and robot vacuum cleaners.


Diffbot’s AI model doesn’t guess—it knows, thanks to a trillion-fact knowledge graph

Diffbot, a small Silicon Valley company best known for maintaining one of the world’s largest indexes of web knowledge, announced today the release of a new AI model that promises to address one of the biggest challenges in the field: factual accuracy.

The new model, a fine-tuned version of Meta’s Llama 3.3, is the first open-source implementation of a system known as graph retrieval-augmented generation, or GraphRAG. Unlike conventional AI models, which rely solely on vast amounts of preloaded training data, Diffbot’s LLM draws on real-time information from the company’s Knowledge Graph, a constantly updated database containing more than a trillion interconnected facts.

“We have a thesis: that eventually general-purpose reasoning will get distilled down into about 1 billion parameters,” said Mike Tung, Diffbot’s founder and CEO, in an interview with VentureBeat. “You don’t actually want the knowledge in the model. You want the model to be good at just using tools so that it can query knowledge externally.”

How it works

Diffbot’s Knowledge Graph is a sprawling, automated database that has been crawling the public web since 2016. It categorizes web pages into entities such as people, companies, products and articles, extracting structured information using a combination of computer vision and natural language processing. Every four to five days, the Knowledge Graph is refreshed with millions of new facts, ensuring it remains up to date.

Diffbot’s AI model leverages this resource by querying the graph in real time to retrieve information, rather than relying on static knowledge encoded in its training data. For example, when asked about a recent news event, the model can search the web for the latest updates, extract relevant facts and cite the original sources. This process is designed to make the system more accurate and transparent than traditional LLMs. “Imagine asking an AI about the weather,” Tung said. “Instead of generating an answer based on outdated training data, our model queries a live weather service and provides a response grounded in real-time information.”

How Diffbot’s Knowledge Graph beats traditional AI at finding facts

In benchmark tests, Diffbot’s approach appears to be paying off. The company reports its model achieves an 81% accuracy score on FreshQA, a Google-created benchmark for testing real-time factual knowledge, surpassing both ChatGPT and Gemini. It also scored 70.36% on MMLU-Pro, a more difficult version of a standard test of academic knowledge.

Perhaps most significantly, Diffbot is making its model fully open source, allowing companies to run it on their own hardware and customize it for their needs. This addresses growing concerns about data privacy and vendor lock-in with major AI providers. “You can run it locally on your machine,” Tung noted. “There’s no way you can run Google Gemini without sending your data over to Google and shipping it outside of your premises.”
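The tool-using pattern Tung describes, querying an external knowledge source first and answering only from the returned facts, can be sketched in a few lines. This is a schematic illustration, not Diffbot's actual API; `query_knowledge_graph` and the sample fact are hypothetical stand-ins.

```python
# Schematic sketch of graph-grounded answering: fetch facts from an external
# knowledge source, then constrain the model to answer from those facts only.
import json

def query_knowledge_graph(query: str) -> list[dict]:
    # Hypothetical stand-in; a real system would call a live knowledge-graph endpoint.
    return [{"fact": "Example Corp was founded in 2016", "source": "https://example.com"}]

def grounded_answer(llm, question: str) -> str:
    facts = query_knowledge_graph(question)
    prompt = (
        "Answer using ONLY the facts below, citing each source.\n"
        f"Facts: {json.dumps(facts)}\n"
        f"Question: {question}"
    )
    return llm(prompt)  # any chat-completion callable
```

Because the answer is assembled from retrieved, citable facts rather than parametric memory, the model itself can stay small, which is consistent with Tung's thesis that reasoning, not knowledge, belongs in the weights.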
Open-source AI could transform how enterprises handle sensitive data

The release comes at a pivotal moment in AI development. Recent months have seen mounting criticism of large language models’ tendency to “hallucinate,” or generate false information, even as companies continue to scale up model sizes. Diffbot’s approach suggests an alternative path forward, one focused on grounding AI systems in verifiable facts rather than attempting to encode all human knowledge in neural networks. “Not everyone’s going after just bigger and bigger models,” Tung said. “You can have a model that has more capability than a big model with kind of a non-intuitive approach like ours.”

Industry experts note that Diffbot’s Knowledge Graph-based approach could be particularly valuable for enterprise applications where accuracy and auditability are crucial. The company already provides data services to major firms including Cisco, DuckDuckGo and Snapchat.

The model is available immediately through an open-source release on GitHub and can be tested through a public demo at diffy.chat. For organizations wanting to deploy it internally, Diffbot says the smaller 8-billion-parameter version can run on a single Nvidia A100 GPU, while the full 70-billion-parameter version requires two H100 GPUs.

Looking ahead, Tung believes the future of AI lies not in ever-larger models, but in better ways of organizing and accessing human knowledge: “Facts get stale. A lot of these facts will be moved out into explicit places where you can actually modify the knowledge and where you can have data provenance.”

As the AI industry grapples with challenges around factual accuracy and transparency, Diffbot’s release offers a compelling alternative to the dominant bigger-is-better paradigm. Whether it succeeds in shifting the field’s direction remains to be seen, but it has certainly demonstrated that when it comes to AI, size isn’t everything.


Nvidia announces early access for Omniverse Sensor RTX for smarter autonomous machines

Nvidia announced early access for Omniverse Cloud Sensor RTX software to enable smarter autonomous machines with generative AI. Nvidia made the announcement during CEO Jensen Huang’s keynote at CES 2025.

Generative AI and foundation models let autonomous machines generalize beyond the operational design domains on which they’ve been trained. Using new AI techniques such as tokenization and large language and diffusion models, developers and researchers can now address longstanding hurdles to autonomy.

These larger models require massive amounts of diverse data for training, fine-tuning and validation. But collecting such data — including from rare edge cases and potentially hazardous scenarios, like a pedestrian crossing in front of an autonomous vehicle (AV) at night or a human entering a welding robot work cell — can be incredibly difficult and resource-intensive.

To help developers fill this gap, Nvidia Omniverse Cloud Sensor RTX APIs enable physically accurate sensor simulation for generating datasets at scale. The application programming interfaces (APIs) are designed to support sensors commonly used for autonomy — including cameras, radar and lidar — and can integrate seamlessly into existing workflows to accelerate the development of autonomous vehicles and robots of every kind.

Omniverse Sensor RTX APIs are now available to select developers in early access. Organizations such as Accenture, Foretellix, MITRE and Mcity are integrating these APIs via domain-specific blueprints to provide end customers with the tools they need to deploy the next generation of industrial manufacturing robots and self-driving cars.

Powering Industrial AI With Omniverse Blueprints

In complex environments like factories and warehouses, robots must be orchestrated to safely and efficiently work alongside machinery and human workers. All those moving parts present a massive challenge when designing, testing or validating operations while avoiding disruptions.

Mega is an Omniverse Blueprint that offers enterprises a reference architecture of Nvidia accelerated computing, AI, Nvidia Isaac and Nvidia Omniverse technologies. Enterprises can use it to develop digital twins and test AI-powered robot brains that drive robots, cameras, equipment and more for handling enormous complexity and scale. Integrating Omniverse Sensor RTX, the blueprint lets robotics developers simultaneously render sensor data from any type of intelligent machine in a factory for high-fidelity, large-scale sensor simulation.

With the ability to test operations and workflows in simulation, manufacturers can save considerable time and investment, and improve efficiency in entirely new ways. International supply chain solutions company Kion Group and Accenture are using the Mega blueprint to build Omniverse digital twins that serve as virtual training and testing environments for industrial AI’s robot brains, tapping into data from smart cameras, forklifts, robotic equipment and digital humans.

The robot brains perceive the simulated environment with physically accurate sensor data rendered by the Omniverse Sensor RTX APIs. They use this data to plan and act, with each action precisely tracked by Mega, alongside the state and position of all the assets in the digital twin. With these capabilities, developers can continuously build and test new layouts before they’re implemented in the physical world.
Driving AV Development and Validation

Autonomous vehicles have been under development for over a decade, but barriers in acquiring the right training and validation data, along with slow iteration cycles, have hindered large-scale deployment.

To address this need for sensor data, companies are harnessing the Nvidia Omniverse Blueprint for AV simulation, a reference workflow that enables physically accurate sensor simulation. The workflow uses Omniverse Sensor RTX APIs to render the camera, radar and lidar data necessary for AV development and validation.

AV toolchain provider Foretellix has integrated the blueprint into its Foretify AV development toolchain to transform object-level simulation into physically accurate sensor simulation. The Foretify toolchain can generate any number of testing scenarios simultaneously. By adding sensor simulation capabilities to these scenarios, Foretify can now enable developers to evaluate the completeness of their AV development, as well as train and test at the levels of fidelity and scale needed to achieve large-scale and safe deployment. In addition, Foretellix will use the newly announced Nvidia Cosmos platform to generate an even greater diversity of scenarios for verification and validation.

Nuro, an autonomous driving technology provider with one of the largest Level 4 deployments in the U.S., is using the Foretify toolchain to train, test and validate its self-driving vehicles before deployment.

In addition, research organization MITRE is collaborating with the University of Michigan’s Mcity testing facility to build a digital AV validation framework for regulatory use, including a digital twin of Mcity’s 32-acre proving ground for autonomous vehicles. The project uses the AV simulation blueprint to render physically accurate sensor data at scale in the virtual environment, boosting training effectiveness.

The future of robotics and autonomy is coming into sharp focus, thanks to the power of high-fidelity sensor simulation. Learn more about these solutions at CES by visiting Accenture at Ballroom F and Foretellix at booth 4016 in the West Hall.
