VentureBeat

Reflection 70B saga continues as training data provider releases post-mortem report

On September 5, 2024, Matt Shumer, co-founder and CEO of the startup Hyperwrite AI (also known as OthersideAI), took to the social network X to post the bombshell news that he had fine-tuned a version of Meta's open-source Llama 3.1 70B into an even more performant large language model (LLM) known as Reflection 70B — so performant, in fact, based on alleged third-party benchmarking results he published, that it was "the world's top open-source model," according to his post.

"I'm excited to announce Reflection 70B, the world's top open-source model. Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes. 405B coming next week – we expect it to be the best model in the world. Built w/ @GlaiveAI." — Matt Shumer (@mattshumer_), September 5, 2024

However, shortly after its release, third-party evaluators in the AI research and hosting community struggled to reproduce the claimed results, leading to accusations of fraud. Researchers cited discrepancies between the announced benchmark results and their independent tests, sparking a wave of criticism on social platforms such as Reddit and X.

In response to these concerns, Shumer pledged to conduct a review of the issues alongside Sahil Chaudhary, founder of Glaive, the AI startup whose synthetic data Shumer claimed he had trained Reflection 70B on, and in which he later revealed he had invested what he described as a small amount.

Now, nearly a month later, Chaudhary last night released a post-mortem report on the Glaive AI blog about the Reflection 70B model and published resources for the open-source AI community to test the model and his training process on their own. He says that while he was unable to reproduce all of the same benchmarks, he "found a bug in the initial code" that had caused several of the original scores to come in higher than what he has measured in recent tests of Reflection 70B. However, other benchmark results now come in higher than originally reported, adding to the mystery.

"On September 5th, @mattshumer_ announced Reflection 70B, a model fine-tuned on top of Llama 3.1 70B, showing SoTA benchmark numbers, which was trained by me on Glaive generated data. Today, I'm sharing model artifacts to reproduce the initial claims and a post-mortem to address…" — Sahil Chaudhary (@csahil28), October 2, 2024

As Chaudhary wrote in the post: "There were a lot of mistakes made by us in the way we launched the model, and handled the problems reported by the community. I understand that things like these have a significant negative effect on the open source ecosystem, and I'd like to apologize for that. I hope that this adds some clarity to what happened, and is a step in the direction of regaining the lost trust. I have released all of the assets required to independently verify the benchmarks and use this model."

Sharing model artifacts

To restore transparency and rebuild trust, Chaudhary shared several resources to help the community replicate the Reflection 70B benchmarks. These include:

Model weights: available on Hugging Face, providing the fine-tuned Reflection 70B model.

Training data: released for public access, enabling independent tests on the dataset used to fine-tune the model.

Training scripts and evaluation code: available on GitHub, allowing the model's training and evaluation process to be reproduced.
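For readers who want to inspect the release themselves, a minimal sketch of loading the published weights with Hugging Face's transformers library might look like the following. The repository id shown is a placeholder rather than a confirmed name, and running a 70B-parameter model requires very large GPU memory or quantization.

```python
# Hypothetical sketch: loading the released Reflection 70B weights with Hugging Face transformers.
# The repo id below is a placeholder; check Glaive's post for the actual repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "glaiveai/Reflection-Llama-3.1-70B"  # assumed repo id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",   # requires accelerate; shards the 70B model across available GPUs
    torch_dtype="auto",  # a 70B model needs roughly 140GB+ of memory in 16-bit precision
)

prompt = "What is 7 * 8? Think step by step, then give the final answer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The released evaluation code on GitHub, rather than ad hoc prompting like this, is what allows the benchmark numbers below to be reproduced.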
These resources aim to clarify how the model was developed and offer a path for the community to validate the original performance claims.

Reproducing the benchmarks

In his post-mortem, Chaudhary explained that a major issue with reproducing the initial benchmark results stemmed from a bug in the evaluation code. The bug caused inflated scores on certain tasks, such as MATH and GSM8K, due to an error in how the system handled responses from an external API. The corrected benchmarks show lower scores on several tasks, but still strong performance relative to the initial report.

The updated benchmark results for Reflection 70B, compared with the originally stated performance, are as follows:

MMLU: 90.94% (originally stated: 89.9%)
GPQA: 55.6% (originally stated: 55.3%)
HumanEval: 89.02% (originally stated: 91%)
MATH: 70.8% (originally stated: 79.7%)
GSM8K: 95.22% (originally stated: 99.2%)
IFEVAL: 87.63% (originally stated: 90.13%)

Although several of the revised scores are not as high as those initially reported, Chaudhary asserts that they are more accurate reflections of the model's capabilities. He also addressed concerns about dataset contamination, confirming that tests showed no significant overlap between the training data and benchmark sets.

Reflecting on a hasty release

Chaudhary admitted that the decision to release Reflection 70B was made hastily, driven by enthusiasm for the model's performance on reasoning-based tasks. He noted that the launch lacked sufficient testing, particularly regarding the compatibility of the model files, and that he and Shumer had not verified whether the model could be easily downloaded and run by the community. "We shouldn't have launched without testing, and with the tall claims of having the best open-source model," Chaudhary wrote.

He also acknowledged that more transparency was needed, especially regarding the model's strengths and weaknesses. While Reflection 70B excels at reasoning tasks, it struggles in areas like creativity and general user interaction, a fact that was not communicated at launch.

Clarifying API confusion

One of the more serious accusations involved the suspicion that the Reflection 70B API was simply relaying outputs from Anthropic's Claude model. Users reported strange behavior in the model's outputs, including responses that seemed to reference Claude directly. Chaudhary addressed these concerns, acknowledging that some of these behaviors were reproducible but asserting that no Claude APIs or any form of word filtering were used in the Reflection 70B model. He reiterated that the API was run on Glaive AI's compute infrastructure, and that Matt Shumer had no access to the code or servers used during this period.

Looking ahead

In closing, Chaudhary emphasized his commitment to transparency and expressed his hope that this post-mortem and the release of model artifacts will help restore trust in the project. He also confirmed that


Vectorize debuts agentic RAG platform for real time enterprise data

While vector databases are now increasingly commonplace as a core element of enterprise AI deployments for retrieval-augmented generation (RAG), they are not all that's needed.

Chris Latimer, the CEO and co-founder of startup Vectorize, spent several years working at DataStax, where he helped lead the database vendor's cloud efforts. A recurring issue he saw time and again was that the vector database wasn't really the hard part of enabling enterprise RAG. The hard part was taking all the unstructured data and getting it into the vector database in a form that was optimized and going to work well for generative AI. That's why Latimer started Vectorize just ten months ago, in a bid to help solve that challenge.

Today the company is announcing that it has raised $3.6 million in a seed round of funding led by True Ventures. Alongside the funding, the company announced the general availability of its enterprise RAG platform, which supports an agentic RAG approach with near real-time data.

Vectorize focuses on the data engineering side of AI. The platform helps companies prepare and maintain their data for use in vector databases and large language models, and it enables enterprises to quickly build a RAG data pipeline through an intuitive interface. Another core capability is a RAG evaluation feature that allows enterprises to test different approaches.

"We kept seeing people get to the end of the development cycle with their Gen AI projects and find out that they didn't work really well," Chris Latimer, co-founder and CEO of Vectorize, told VentureBeat in an exclusive interview. "The context they were getting for their vector database wasn't the most useful to the large language model, it was still hallucinating or it was misinterpreting the data."

How Vectorize fits into the enterprise RAG stack

Vectorize is not a vector database itself. Rather, it's a platform that connects unstructured data sources to existing vector databases like Pinecone, DataStax, Couchbase and Elastic. Latimer explained that Vectorize ingests and optimizes data from diverse sources for vector databases. The platform provides a production-ready data pipeline that handles ingestion, synchronization, error handling and other data engineering best practices.

Vectorize is not a vector embedding technology either; vector embedding is the process of converting data, be it text, images or audio, into vectors. Instead, Vectorize helps users evaluate different embedding models and data chunking methods to determine the best configuration for the enterprise's specific use case and data. Latimer explained that Vectorize allows users to choose from any number of different embedding models, which could include, for example, OpenAI's ada or Voyage AI embeddings, which are now being adopted by Snowflake.

"We do take into account innovative ways to vectorize the data so that you get the best results," Latimer said. "But ultimately, where we see the value is in giving enterprises and developers a production-ready solution that they just don't have to worry about the data engineering side."

Using agentic AI to power enterprise RAG

One of Vectorize's key innovations is its "agentic RAG" approach.
It's an approach that combines traditional RAG techniques with AI agent capabilities, allowing for more autonomous problem-solving in applications.

Agentic RAG isn't a hypothetical concept, either. It's already being used by one of Vectorize's early users, AI inference silicon startup Groq, which recently raised $640 million. Groq is using Vectorize's agentic RAG capabilities to power an AI support agent that can autonomously solve customer problems using the data and context provided by Vectorize's data pipelines.

"If a customer has a question that's been asked and answered before, you want that agent to be able to solve the customer's problem without a human getting involved," Latimer said. "But if there's something that the agent can't solve, you do want to have a human in the loop where you can escalate, so this idea of being able to have an agent reason its way through solving a problem, is the whole idea behind an AI agent architecture."

Why real-time data pipelines are essential to enterprise RAG

A primary reason why an enterprise will use RAG is to connect to its own sources of data. What's equally important, though, is making sure that data is up to date. "Stale data is going to lead to stale decisions," Latimer said. Vectorize provides real-time and near-real-time data update capabilities, and customers can configure their tolerance for data staleness.

"We've actually let people configure the platform based on their tolerance for stale data and their need for real-time data," he said. "So if all you need is to schedule your pipeline to run once a week, we'll let you do that, and then if you need to run real-time, we'll let you do that as well, and you'll have real-time updates as soon as they're available."
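To make the data engineering work described above more concrete, here is a minimal, hypothetical sketch of the kind of ingestion pipeline Latimer describes: chunk unstructured documents, embed each chunk with a chosen model, and upsert the results into a vector store on whatever schedule matches the organization's staleness tolerance. This is not Vectorize's actual API; every name below is illustrative, and the toy embedding function stands in for a real model such as OpenAI's ada or Voyage AI.

```python
# Hypothetical RAG ingestion pipeline sketch: chunk -> embed -> upsert.
# None of these names come from Vectorize's product; they are illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: Optional[List[float]] = None


def chunk_document(doc_id: str, text: str, chunk_size: int = 500, overlap: int = 50) -> List[Chunk]:
    """Split a document into overlapping character windows (one simple chunking strategy)."""
    chunks, start = [], 0
    step = max(chunk_size - overlap, 1)
    while start < len(text):
        chunks.append(Chunk(doc_id, text[start:start + chunk_size]))
        start += step
    return chunks


def ingest(docs: Dict[str, str],
           embed_fn: Callable[[str], List[float]],
           store: Dict[str, Chunk]) -> None:
    """Embed every chunk with the chosen model and upsert it into the vector store."""
    for doc_id, text in docs.items():
        for i, chunk in enumerate(chunk_document(doc_id, text)):
            chunk.vector = embed_fn(chunk.text)   # swap in a real embedding model here
            store[f"{doc_id}:{i}"] = chunk        # upsert keyed by document and chunk index


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding so the sketch runs without any external service."""
    return [len(text) / 1000.0, text.count(" ") / 100.0]


if __name__ == "__main__":
    store: Dict[str, Chunk] = {}
    docs = {"faq": "Q: How do I reset my password? A: Use the account settings page."}
    ingest(docs, toy_embed, store)
    print(f"Indexed {len(store)} chunks")
    # In production, ingest() would run on a schedule matching the staleness tolerance
    # Latimer mentions: weekly batches at one extreme, near real-time triggers at the other.
```

A platform like the one described in the article would add the production concerns this sketch omits: synchronization with source systems, error handling and retries, and the ability to swap embedding models or chunking strategies to compare retrieval quality.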


OpenAI will bring Cosmopolitan publisher Hearst’s content to ChatGPT

Is the future of written media — and potentially imagery and videos, too — going to be primarily surfaced to us through ChatGPT? It's not out of the question at the rate OpenAI is going.

At the very least, the AI unicorn, valued at $157 billion and fresh off the launch of its new Canvas feature for ChatGPT and a record-setting $6.6 billion fundraising round, is making damn well sure it has most of the leading U.S. magazine and text-based news publishers entered into content licensing agreements with it. These agreements enable OpenAI to train on, or at least serve up, vast archives of prior written articles, photos, videos and other journalistic and editorial materials through ChatGPT, SearchGPT and other AI products, potentially as truncated summaries.

The latest major American media firm to join with OpenAI is Hearst, the media company named for its "yellow journalism" founder William Randolph Hearst (who helped beat the drum for the U.S. to enter the Spanish-American War, demonized marijuana, and was memorably fictionalized as Citizen Kane's Charles Foster Kane). Today, Hearst is perhaps best known as the publisher of Cosmopolitan, the sex and lifestyle magazine aimed at young women, as well as Esquire, Elle, Car & Driver, Country Living, Good Housekeeping, Popular Mechanics and many more. In total, Hearst operates 25 brands in the U.S., 175 websites and more than 200 magazine editions worldwide, according to its media page.

However, OpenAI will be specifically surfacing "curated content" from more than 20 magazine brands and over 40 newspapers, including well-known titles such as Cosmopolitan, Esquire, the Houston Chronicle, the San Francisco Chronicle, ELLE, and Women's Health. The content will be clearly attributed, with appropriate citations and direct links to Hearst's original sources, ensuring transparency, according to the companies. "Hearst's other businesses outside of magazines and newspapers are not included in this partnership," reads a release jointly published on Hearst's and OpenAI's websites.

It's unclear whether the company will be training its models specifically on Hearst content, or merely piping said content through to end users of ChatGPT and other products. I've reached out to an OpenAI spokesperson for clarity and will update when I hear back.

Hearst now joins the long and growing list of media publishers that have struck content licensing deals with OpenAI. Among the many that have forged such deals are:

These partnerships represent OpenAI's broader ambition to collaborate with established media brands and elevate the quality of content provided through its AI systems. With Hearst's integration, OpenAI continues to expand its network of trusted content providers, ensuring users of its AI products, like ChatGPT, have access to reliable information across a wide range of topics.

What the executives are saying it means

Jeff Johnson, President of Hearst Newspapers, emphasized the critical role that professional journalism plays in the evolution of AI. "As generative AI matures, it's critical that journalism created by professional journalists be at the heart of all AI products," he said, underscoring the importance of integrating trustworthy, curated content into these platforms.
Debi Chirichella, President of Hearst Magazines, echoed this sentiment, noting that the partnership allows Hearst to help shape the future of magazine content while preserving the credibility and high standards of the company's journalism.

These deals signal a growing trend of cooperation between tech companies and traditional publishers as both industries adapt to the changes brought about by advances in AI. While OpenAI's partnerships offer media companies access to cutting-edge technology and the opportunity to reach larger audiences, they also raise questions about the long-term impact on the future of publishing. Some critics argue that licensing content to AI platforms could ultimately create competition for publishers, as AI systems become more capable of generating content that rivals traditional journalism. I myself, as a journalist whose work has undoubtedly been scraped to train many AI models (and used for plenty of other things over which I had no control or say), have voiced my own hesitation about media publishers moving so quickly to ink deals with OpenAI.

These concerns were amplified by recent legal actions, such as the lawsuit filed by The New York Times against OpenAI and Microsoft alleging copyright infringement in the development of AI models. The case remains in court for now, and the NYT remains one of a dwindling number of holdouts that have yet to settle with OpenAI or strike a deal to license their content.

Despite these concerns, publishers like Hearst, Condé Nast, and Vox Media are actively embracing AI as a means of staying competitive in an increasingly digital landscape. As Chirichella pointed out, Hearst's partnership with OpenAI is not only about delivering the company's high-quality content to a new audience but also about preserving the cultural and historical context that defines its publications. This collaboration, she said, "ensures that our high-quality writing and expertise, cultural and historical context and attribution and credibility are promoted as OpenAI's products evolve."

For OpenAI, these partnerships with major media brands enhance its ability to deliver reliable, engaging content to its users, aligning with the company's stated goal of building AI products that provide trustworthy and relevant information. As Brad Lightcap, COO of OpenAI, explained, bringing Hearst's content into ChatGPT elevates the platform's value to users, particularly as AI becomes an increasingly common tool for consuming and interacting with news and information.


Foxconn to build Taiwan’s fastest AI supercomputer with Nvidia Blackwell

Nvidia and Foxconn are building Taiwan's largest supercomputer using Nvidia Blackwell chips.

The project, the Hon Hai Kaohsiung Super Computing Center, revealed Tuesday at Hon Hai Tech Day, will be built around Nvidia's Blackwell graphics processing unit (GPU) architecture and feature the GB200 NVL72 platform, which includes a total of 64 racks and 4,608 Tensor Core GPUs. Expected to deliver more than 90 exaflops of AI performance, the machine would easily be the fastest in Taiwan.

Foxconn plans to use the supercomputer, once operational, to power breakthroughs in cancer research, large language model development and smart city innovations, positioning Taiwan as a global leader in AI-driven industries. Foxconn's "three-platform strategy" focuses on smart manufacturing, smart cities and electric vehicles, and the new supercomputer will play a pivotal role in supporting the company's ongoing efforts in digital twins, robotic automation and smart urban infrastructure, bringing AI-assisted services to urban areas like Kaohsiung.

Construction has started on the new supercomputer, which will be housed in Kaohsiung, Taiwan. The first phase is expected to be operational by mid-2025, with full deployment targeted for 2026. The project will integrate Nvidia technologies such as the Omniverse and Isaac robotics platforms for AI and digital twins, helping to transform manufacturing processes.

Nvidia is providing Blackwell AI chips to Foxconn for a new supercomputer.

"Powered by Nvidia's Blackwell platform, Foxconn's new AI supercomputer is one of the most powerful in the world, representing a significant leap forward in AI computing and efficiency," said Foxconn vice president James Wu in a statement.

The GB200 NVL72 is a state-of-the-art data center platform optimized for AI and accelerated computing. Each rack features 36 Nvidia Grace CPUs and 72 Nvidia Blackwell GPUs connected via Nvidia's NVLink technology, delivering 130TB/s of bandwidth. The Nvidia NVLink Switch allows the 72-GPU system to function as a single, unified GPU, making it ideal for training large AI models and executing complex inference tasks in real time on trillion-parameter models.

Taiwan-based Foxconn, officially known as Hon Hai Precision Industry Co., is the world's largest electronics manufacturer, known for producing a wide range of products, from smartphones to servers, for the world's top technology brands. Foxconn is building digital twins of its factories using Nvidia Omniverse, and it was also one of the first companies to use Nvidia NIM microservices in the development of domain-specific large language models (LLMs) embedded into a variety of internal systems and processes across its AI factories for smart manufacturing, smart electric vehicles and smart cities.
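As a quick sanity check, the rack-level and cluster-level figures quoted above are mutually consistent; the short calculation below simply multiplies them out, assuming all 64 racks use the 72-GPU, 36-CPU NVL72 configuration described (the total CPU count is derived here, not stated in the article).

```python
# Arithmetic check of the GB200 NVL72 figures quoted above (illustrative only).
racks = 64
gpus_per_rack = 72   # Blackwell GPUs per NVL72 rack
cpus_per_rack = 36   # Grace CPUs per NVL72 rack

total_gpus = racks * gpus_per_rack  # 64 * 72 = 4,608 Tensor Core GPUs, matching the article
total_cpus = racks * cpus_per_rack  # 64 * 36 = 2,304 Grace CPUs (derived, not stated)

print(f"Total GPUs: {total_gpus}")        # 4608
print(f"Total Grace CPUs: {total_cpus}")  # 2304
```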


DeepMind’s Michelangelo benchmark reveals limitations of long-context LLMs

Large language models (LLMs) with very long context windows have been making headlines lately. The ability to cram hundreds of thousands or even millions of tokens into a single prompt unlocks many possibilities for developers. But how well do these long-context LLMs really understand and utilize the vast amounts of information they receive?

Researchers at Google DeepMind have introduced Michelangelo, a new benchmark designed to evaluate the long-context reasoning capabilities of LLMs. Their findings, published in a new research paper, show that while current frontier models have progressed in retrieving information from large in-context data, they still struggle with tasks that require reasoning over the data structure.

The need for better long-context benchmarks

The emergence of LLMs with extremely long context windows, ranging from 128,000 to over 1 million tokens, has prompted researchers to develop new benchmarks to evaluate their capabilities. However, most of the focus has been on retrieval tasks, such as the popular "needle-in-a-haystack" evaluation, where the model is tasked with finding a specific piece of information within a large context.

"Over time, models have grown considerably more capable in long context performance," Kiran Vodrahalli, research scientist at Google DeepMind, told VentureBeat. "For instance, the popular needle-in-a-haystack evaluation for retrieval has now been well saturated up to extremely long context lengths. Thus, it has become important to determine whether the harder tasks models are capable of solving in short context regimes are also solvable at long ranges."

Retrieval tasks don't necessarily reflect a model's capacity for reasoning over the entire context. A model might be able to find a specific fact without understanding the relationships between different parts of the text. Meanwhile, existing benchmarks that evaluate a model's ability to reason over long contexts have limitations.

"It is easy to develop long reasoning evaluations which are solvable with a combination of only using retrieval and information stored in model weights, thus 'short-circuiting' the test of the model's ability to use the long-context," Vodrahalli said.

Michelangelo

To address the limitations of current benchmarks, the researchers introduced Michelangelo, a "minimal, synthetic, and unleaked long-context reasoning evaluation for large language models." Michelangelo is based on the analogy of a sculptor chiseling away irrelevant pieces of marble to reveal the underlying structure. The benchmark focuses on evaluating the model's ability to understand the relationships and structure of the information within its context window, rather than simply retrieving isolated facts.

The benchmark consists of three core tasks:

Latent list: The model must process a long sequence of operations performed on a Python list, filter out irrelevant or redundant statements, and determine the final state of the list. "Latent List measures the ability of a model to track a latent data structure's properties over the course of a stream of code instructions," the researchers write.

Multi-round co-reference resolution (MRCR): The model must produce parts of a long conversation between a user and an LLM.
This requires the model to understand the structure of the conversation and resolve references to previous turns, even when the conversation contains confusing or distracting elements. "MRCR measures the model's ability to understand ordering in natural text, to distinguish between similar drafts of writing, and to reproduce a specified piece of previous context subject to adversarially difficult queries," the researchers write.

"I don't know" (IDK): The model is given a long story and asked to answer multiple-choice questions about it. For some questions, the context does not contain the answer, and the model must be able to recognize the limits of its knowledge and respond with "I don't know." "IDK measures the model's ability to understand whether it knows what it doesn't know based on the presented context," the researchers write.

Latent Structure Queries

The tasks in Michelangelo are based on a novel framework called Latent Structure Queries (LSQ). LSQ provides a general approach for designing long-context reasoning evaluations that can be extended to arbitrary lengths. It can also test the model's understanding of implicit information as opposed to retrieving simple facts. LSQ relies on synthesizing test data to avoid the pitfalls of test data leaking into the training corpus.

"By requiring the model to extract information from structures rather than values from keys (sculptures from marble rather than needles from haystacks), we can more deeply test language model context understanding beyond retrieval," the researchers write.

LSQ has three key differences from other approaches to evaluating long-context LLMs. First, it has been explicitly designed to avoid short-circuiting flaws in evaluations that go beyond retrieval tasks. Second, it specifies a methodology for increasing task complexity and context length independently. And finally, it is general enough to capture a large range of reasoning tasks. The three tests used in Michelangelo cover code interpretation and reasoning over loosely written text.

"The goal is that long-context beyond-reasoning evaluations implemented by following LSQ will lead to fewer scenarios where a proposed evaluation reduces to solving a retrieval task," Vodrahalli said.

Evaluating frontier models on Michelangelo

The researchers evaluated ten frontier LLMs on Michelangelo, including different variants of Gemini, GPT-4 and GPT-4o, and Claude. They tested the models on contexts up to 1 million tokens. Gemini models performed best on MRCR, GPT models excelled on Latent List, and Claude 3.5 Sonnet achieved the highest scores on IDK.

However, all models exhibited a significant drop in performance as the complexity of the reasoning tasks increased, suggesting that even with very long context windows, current LLMs still have room to improve in their ability to reason over large amounts of information.

Frontier LLMs struggle with reasoning on long-context windows (source: arXiv)

"Frontier models have room to improve on all of the beyond-retrieval reasoning primitives (Latent List, MRCR, IDK) that we investigate in Michelangelo," Vodrahalli said. "Different frontier models have different strengths and weaknesses – each class performs well on different context ranges and on different tasks. What does seem to be universal across models is the initial drop


Accenture forms Nvidia business group to scale enterprise AI adoption

Accenture has formed an Nvidia business group, with 30,000 professionals set to receive training to help enterprises scale up for the AI era. The aim is to train Accenture's team to help clients reinvent processes and scale enterprise AI adoption with AI agents, said Lan Guan, chief AI officer at Accenture, in a press briefing.

"We are living in the future we are envisioning, starting with our own company," Guan said. "We are reinventing this business."

Accenture is tapping its existing workforce for the talent, but it is also training current employees and hiring new people to meet the 30,000-person goal for the new group, Guan said. She did not disclose how many new hires there would be, nor how much each company will invest in the partnership.

"Demand for GenAI is not slowing down," Guan said. "We are coming together to increase adoption so they can use generative AI as a competitive advantage."

Justin Boitano, Nvidia's vice president of enterprise AI software, said in a press call: "Every job function can benefit. There are a lot of great early successes. Customers are not always AI experts. The Accenture team has invested a lot" in this expertise.

The new group amounts to an expanded partnership between Accenture and Nvidia. With generative AI demand driving $3 billion in Accenture bookings in its recently closed fiscal year, the new group will help clients lay the foundation for agentic AI functionality using Accenture's AI Refinery, which uses the full Nvidia AI stack, including Nvidia AI Foundry, Nvidia AI Enterprise and Nvidia Omniverse, to advance areas such as process reinvention, AI-powered simulation and sovereign AI. This software foundation will also help Nvidia sell more of its AI processors.

Guan said the Accenture AI Refinery will be available on all public and private cloud platforms and will integrate seamlessly with other Accenture Business Groups to accelerate AI across the SaaS and Cloud AI ecosystem.

"We are breaking significant new ground with our partnership with NVIDIA and enabling our clients to be at the forefront of using generative AI as a catalyst for reinvention," said Julie Sweet, chair and CEO at Accenture, in a statement. "Accenture AI Refinery will create opportunities for companies to reimagine their processes and operations, discover new ways of working, and scale AI solutions across the enterprise to help drive continuous change and create value."

I asked if the group was a division of Accenture. The company replied that the Accenture Nvidia Business Group is wholly owned by Accenture, and that Accenture maintains business groups with its largest and most strategic ecosystem partners. These groups bring together leading technology from partners with Accenture's innovation and industry experience to help joint clients reinvent their businesses. The Accenture Nvidia Business Group will leverage Accenture's AI Refinery and Nvidia's technology to help enterprises rapidly deploy and scale AI-driven solutions.

"AI will supercharge enterprises to scale innovation at greater speed," Nvidia CEO Jensen Huang said in a statement.
"Nvidia's platform, Accenture's AI Refinery and our combined expertise will help businesses and nations accelerate this transformation to drive unprecedented productivity and growth."

Scaling agentic AI for enterprises

The new Accenture Nvidia Business Group will accelerate momentum with generative AI and help clients scale agentic AI systems, the next frontier of gen AI, to drive new levels of productivity and growth. This significant investment will be supported by over 30,000 professionals receiving training globally to help clients reinvent processes and scale enterprise AI adoption.

Agentic AI systems represent a leap forward for generative AI, the companies said. Instead of a human typing in a prompt or automating pre-existing business steps, agentic AI systems can act on the intent of the user, create new workflows and take appropriate actions based on their environment, potentially reinventing entire processes or functions.

Accenture and Nvidia are already helping clients adopt and scale agentic AI systems. For example, Indosat Group announced the first sovereign AI in Indonesia that enables businesses to securely deploy AI while ensuring data governance and adhering to regulations. It is collaborating with Accenture to build industry-specific solutions on top of Indosat's data center, which includes Nvidia AI software and accelerated computing, to support local enterprises. With an initial focus on the financial services sector, the new solutions, powered by the AI Refinery platform, will help Indonesian banks harness AI to drive profitability, operational efficiency and sustainable growth in a highly competitive market.

Accenture will also debut a new Nvidia NIM Agent Blueprint for virtual facility robot fleet simulation, which integrates Nvidia Omniverse, Isaac and Metropolis software, to enable industrial companies to build autonomous, robot-operated, software-defined factories and facilities. Accenture will use these new capabilities at Eclipse Automation, an Accenture-owned manufacturing automation company, to deliver as much as 50% faster designs and a 30% reduction in cycle times on behalf of its clients.

Network of AI engineering hubs

As part of its Center for Advanced AI, Accenture is introducing a network of hubs with deep engineering skills and the technical capacity for using agentic AI systems to transform large-scale operations. These hubs will focus on the selection, fine-tuning and large-scale inferencing of foundation models, all of which pose significant accuracy, cost, latency and compliance challenges when development is scaled. Building on existing hubs in Mountain View, California, and Bangalore, Accenture is adding AI Refinery Engineering Hubs in Singapore, Tokyo, Malaga and London.

In addition to its use of agentic AI at Eclipse Automation, Accenture's marketing function is integrating the AI Refinery platform with autonomous agents to help create and run smarter campaigns faster. This is expected to deliver a 25% to 35% reduction in manual steps, 6% cost savings, and a 25% to 55% increase in speed to market.
