Nvidia’s ‘hard pivot’ to AI reasoning bolsters Llama models for agentic AI

Nvidia’s Briski said the company’s “hard pivot” to reasoning has boosted the accuracy of its Llama Nemotron models by up to 20% compared with the base model. Inference speed has also been optimized to run up to 5x faster than other leading open reasoning models, she claimed. These improvements make the family of models capable of handling more complex reasoning tasks, Briski said, which in turn reduces operational costs for enterprises.

The Llama Nemotron family of models is available as Nvidia NIM microservices in Nano, Super, and Ultra sizes, enabling organizations to deploy the models at a scale suited to their needs. Nano microservices are optimized for deployment on PCs and edge devices, Super microservices for high throughput on a single GPU, and Ultra microservices for multi-GPU servers and data-center-scale applications.
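To make the deployment model concrete, here is a minimal sketch of how an application might query a locally running Llama Nemotron NIM microservice. NIM microservices expose an OpenAI-compatible API, but the port, the model identifier, and the “detailed thinking” system-prompt toggle shown below are assumptions based on Nvidia’s published examples and may differ for a given deployment.

```python
# Minimal sketch: querying a locally deployed Llama Nemotron NIM microservice.
# Assumptions: the container serves an OpenAI-compatible API on port 8000,
# the model ID matches Nvidia's catalog naming, and step-by-step reasoning is
# toggled via the "detailed thinking" system prompt. Verify all three for
# your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default NIM serving endpoint (assumed)
    api_key="none",                       # local NIM deployments don't require a key
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",  # hypothetical Nano-size model ID
    messages=[
        # The system prompt switches the model between reasoning and
        # direct-answer modes on supported Nemotron models.
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "Which warehouse should fulfill an order "
                                    "of 40 units given stock of 25, 30, and 50?"},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
```

Because the endpoint is OpenAI-compatible, an agent built against any OpenAI-style client could in principle switch among Nano, Super, and Ultra deployments by changing only the base URL and model name.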

Partners extend reasoning to Llama ecosystem

Nvidia’s partners are also getting in on the action. Microsoft is expanding its Azure AI Foundry model catalog with Llama Nemotron reasoning models and NIM microservices to enhance services such as the Azure AI Agent Service for Microsoft 365. SAP is leveraging them for its SAP Business AI solutions and Joule copilot, and is also using NeMo microservices to increase code-completion accuracy for models of its ABAP programming language. ServiceNow said Llama Nemotron models will give its AI agents greater performance and accuracy.
