
Nvidia unveils next-gen AI and industrial digitalization capabilities at Microsoft Ignite

This is a VB Lab Insights article presented by Nvidia.

At the recent Microsoft Ignite conference, Nvidia and Microsoft unveiled innovations that will power the next wave of generative AI from cloud to PCs and fuel a new era of industrial AI. The announcements span the full stack, including next-gen AI infrastructure powered by Nvidia Blackwell, new Omniverse workflows for digital twins and visual generative AI, and tools that let Windows developers build and optimize AI-powered apps on RTX AI PCs. Together, Nvidia and Microsoft are helping businesses across industries unleash the full potential of AI and drive greater efficiency and transformation through a comprehensive, full-stack approach. Here are some of the new announcements from Microsoft Ignite:

Nvidia Blackwell empowers next-gen AI on Microsoft Azure

Microsoft announced the first cloud private preview of the Azure ND GB200 V6 VM series. Based on the Nvidia Blackwell platform, this new AI-optimized virtual machine (VM) series combines the Nvidia GB200 NVL72 rack design with Nvidia Quantum InfiniBand networking. Optimized for large-scale deep learning workloads, including trillion-parameter-scale AI model training and inference as well as complex tasks in advanced natural language processing, computer vision and more, it complements the previously announced Azure AI clusters with ND H200 V5 VMs, powered by Nvidia H200 GPUs.

Azure Container Apps enables serverless AI inference with Nvidia

Microsoft Azure Container Apps now supports Nvidia accelerated computing for simplified and scalable AI deployments. Serverless computing gives AI application developers more agility to deploy, scale and iterate on applications without worrying about the underlying infrastructure, improving functionality while minimizing operational overhead. Additionally, Nvidia and Microsoft will bring Nvidia NIM microservices to serverless Nvidia GPUs in Azure to optimize AI model performance.

Omniverse reference workflows unveiled for industrial AI and visual generative AI

The Nvidia Omniverse platform on Azure includes new reference workflows for industrial AI. These workflows help developers build 3D, simulation and digital twin applications on Nvidia Omniverse and Universal Scene Description (OpenUSD). The reference workflow for 3D remote monitoring of industrial operations will enable customers to connect physically accurate 3D models of industrial systems to real-time data from Azure IoT Operations and Power BI. The Omniverse Blueprint for precise visual generative AI enables developers to build applications that let nontechnical teams generate AI-enhanced visuals aligned with brand assets.

Nvidia introduced Nemovision-4B Instruct, a new multimodal small language model (SLM) for RTX AI PCs and workstations that will enable digital humans to understand visual imagery and provide relevant responses. Nvidia also announced updates to Nvidia TensorRT Model Optimizer (ModelOpt) that enable Windows developers to optimize models for ONNX Runtime deployment. With these new features and optimizations, Windows application developers can deploy AI models with higher throughput and minimal accuracy loss, allowing them to run on a wider range of PCs.

Watch on-demand sessions about Nvidia AI on Azure

At Microsoft Ignite, Nvidia and Microsoft shared how these technologies can accelerate generative AI, help launch new AI applications quickly and bring new digital capabilities to industries.
Whether you missed Microsoft Ignite or want to revisit these in-depth presentations, the following sessions are now available on demand:

Accelerate generative AI adoption with Nvidia AI on Azure: Discover how enterprises can leverage the Nvidia accelerated computing platform on Azure to transform with generative AI. Along with enterprise-grade software and services, you can optimize performance, reduce total cost of ownership and accelerate time to solution.

Get from pilot to production the quick, easy, optimized way: Explore the latest development of Nvidia NIM microservices within Azure AI Studio to get your generative AI applications into production successfully. Learn how you can streamline workflows, enhance performance and quickly move from pilot to production.

The new era of industrial digitalization and manufacturing innovation: Hear from Nvidia and Microsoft experts about the latest advancements in computer graphics, data interoperability, generative AI and accelerated cloud computing. Learn how Azure IoT Operations, powered by Nvidia Omniverse Cloud APIs and OpenUSD, is enabling manufacturers to digitalize operations.

Explore a new era of digital manufacturing: Check out this hands-on demo of how Azure IoT Operations and Power BI can provide real-time monitoring, collaboration and physically based visualization capabilities for production insights and new operational possibilities.

Harness the power of RTX AI PCs to elevate next-gen AI applications: See how Nvidia RTX AI provides an end-to-end, GPU-accelerated platform for building and deploying advanced models and applications that keep your data private while tapping into the power of AI on your PC.

Understanding and accelerating business decisions for M365 data using AI: Enterprises are leveraging AI to accelerate their business but face challenges integrating diverse, siloed data into AI services. Learn how Cohesity, with Nvidia generative AI and Microsoft Azure OpenAI, provides instant, high-quality insights from secondary data.

Learn more about Nvidia and Microsoft solutions

The announcements and sessions at Microsoft Ignite reinforced that accelerated computing technology from Nvidia and purpose-built AI architecture from Microsoft Azure are helping organizations unlock new capabilities and revolutionize whole industries. Check out the Nvidia partner page for on-demand sessions, demos, special offers for developers and more from Microsoft Ignite.

VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and it is always clearly marked. For more information, contact [email protected].


Stable Diffusion 3.5 hits Amazon Bedrock: What it means for enterprise AI workflows

Creating fancy generative AI images can be fun and useful, but that’s not all enterprises need. Enterprise text-to-image generation is about more than just creating images; it’s about integration with existing workflows and other enterprise AI tools. That’s a direction that Stability AI, the vendor behind Stable Diffusion, understands.

Today, Stability AI and Amazon Web Services (AWS) jointly announced that Stable Diffusion 3.5 Large is available on the Amazon Bedrock service. AWS is the only public cloud service offering the flagship Stability AI models.

The move isn’t just about simple availability. It’s about integration and a go-to-market strategy that breathes more life into Stability AI’s efforts as the company’s new CEO brings renewed focus to meeting customer needs. Amazon Bedrock provides a single, unified API that allows enterprises to access and use multiple AI models, including Stable Diffusion. That’s important, as AWS’ own research shows that most enterprises use more than one model at a time. It’s an approach that users including the National Football League (NFL) and Stride Learning are already benefiting from.

Stable Diffusion 3.5’s Amazon Bedrock deployment comes as Stability AI faces an increasingly competitive landscape, with rivals including Google, Midjourney, Ideogram and Black Forest Labs’ Flux Pro, among others. The company aims to differentiate on image diversity in terms of style, prompt adherence and enterprise workflows.

“There’s a reason that we’re on AWS,” Prem Akkaraju, CEO of Stability AI, told VentureBeat in an exclusive interview. “It’s because that’s where the developers and the creators are, and we want to bring our tools and our models where they are. Our goal is to empower professional content creators.”

AWS has its own image generation tech, so why does it need Stable Diffusion?

The increasingly competitive text-to-image gen AI landscape also includes models from Amazon. At the beginning of December, the Amazon Nova AI model family was announced, including image generation models.

Baskar Sridharan, VP of AI and ML services and infrastructure at AWS, told VentureBeat that having multiple text-to-image generation models gives users choice. Amazon Bedrock provides a single unified API, so users can deploy any model available on the platform through the same interface. Sridharan also noted that AWS provides model evaluation tools that can help enterprises choose the best tool for a specific deployment.

Not surprisingly, Akkaraju sees Stable Diffusion 3.5 as superior to other models. That’s an assertion Stability AI has backed up through reported benchmarks on prompt adherence. “We show in our research that Stable Diffusion 3.5 Large leads the market in prompt adherence, allowing for models to closely follow a given text prompt and really making it the top choice for efficiency and high-quality performance,” said Akkaraju.

How enterprises can fit Stable Diffusion into an Amazon Bedrock AI workflow

Stable Diffusion 3.5 Large has been available to Stability AI users via the company’s API, as well as its Stable Assistant service, since late October. Stability AI doesn’t see any real price difference between using Stable Diffusion via its own API or via Amazon Bedrock.
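For context, here is a minimal sketch of what calling Stable Diffusion 3.5 Large through Bedrock’s unified invoke_model API could look like. The model ID and the request and response fields are assumptions based on Bedrock’s usual pattern for Stability models; check the Bedrock model catalog for the exact identifiers before relying on them.

```python
# Hypothetical sketch: invoking Stable Diffusion 3.5 Large via Amazon Bedrock.
# The model ID and request/response fields are assumptions; consult the
# Bedrock model catalog for the exact identifiers.
import base64
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.invoke_model(
    modelId="stability.sd3-5-large-v1:0",  # assumed model ID
    body=json.dumps({
        "prompt": "A watercolor illustration of a lighthouse at dawn",
        "mode": "text-to-image",
        "aspect_ratio": "16:9",
        "output_format": "png",
    }),
)

payload = json.loads(response["body"].read())
image_bytes = base64.b64decode(payload["images"][0])  # assumed response shape
with open("lighthouse.png", "wb") as f:
    f.write(image_bytes)
```

The same client and call pattern would work for any other Bedrock-hosted model by swapping the model ID and body, which is the unified-API point Sridharan makes above.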
The real benefit of having Stable Diffusion available on Amazon Bedrock is that enterprise users can fit it into larger, more complex workflows. Enterprises get a unified workflow that ties together multiple models from different vendors with a single API.

It’s an approach the NFL is already using. The NFL has an application called “My Cause, My Cleats” that uses Amazon Bedrock to enable a collaborative, community-focused experience for creating custom cleat designs. Sridharan explained that the application uses both Anthropic Claude and Stable Diffusion. The NFL uses Claude to create detailed prompts that capture user preferences and determine what they want. That prompt is then fed into Stable Diffusion to generate the image. The entire workflow runs on Amazon Bedrock without the need to jump between different services.

Another organization that has benefited from the integration is education vendor Stride Learning. The company needed images to support its online learning game Legends Library — a lot of images, up to 1,000 images per minute. Amazon Bedrock provides the infrastructure to run Stable Diffusion at that scale. Beyond high-performance scale, there was also a need to secure image generation output. That’s where the Amazon Bedrock Guardrails API fits in. Sridharan noted that with guardrails, Stride Learning is able to meet responsible AI policies for image generation.

“When you do this all using a single API endpoint, it makes it very easy for customers to build these kinds of applications,” said Sridharan.

Stability AI’s eventful 2024 and the road ahead with its new leadership

The Stable Diffusion 3.5 update and availability on Amazon Bedrock caps off an eventful year for Stability AI. The company’s founder and former CEO Emad Mostaque resigned in March amid concerns about focus and a lack of revenue. It took until June for Stability AI to find a permanent replacement with the appointment of Akkaraju.

So far, Akkaraju has been at the helm for a series of model updates. He has also helped bring in new investors such as Napster founder Sean Parker and, in September, advisors including acclaimed director James Cameron. Akkaraju comes from a visual effects background and helped make movies including Cameron’s Avatar. In his view, the professional visual media industry will completely transform over the next several years, moving to generated content rather than rendered content.

“It’s a good thing that we work a lot in the creative business, because those are probably some of the most demanding clients that we could ask for,” Akkaraju said. Looking forward, he joked that Stability AI’s plan is world domination. On a more serious note, he expects continued innovation as his company strives to meet real workflow needs.

“We’re going to continue to push the model forward,” said Akkaraju. “You might even see us do another release


Beyond LLMs: How SandboxAQ’s large quantitative models could optimize enterprise AI

While large language models (LLMs) and generative AI have dominated enterprise AI conversations over the past year, there are other ways that enterprises can benefit from AI. One alternative is large quantitative models (LQMs).

LQMs are trained to optimize for specific objectives and parameters relevant to an industry or application, such as material properties or financial risk metrics. This contrasts with the more general language understanding and generation tasks of LLMs. Among the leading advocates and commercial vendors of LQMs is SandboxAQ, which today announced it has raised $300 million in a new funding round. The company was originally part of Alphabet and was spun out as a separate business in 2022. The funding is a testament to the company’s success and, more importantly, to its future growth prospects as it looks to solve enterprise AI use cases. SandboxAQ has established partnerships with major consulting firms including Accenture, Deloitte and EY to distribute its enterprise solutions.

The key advantage of LQMs is their ability to tackle complex, domain-specific problems in industries where the underlying physics and quantitative relationships are critical. “It’s all about core product creation at the companies that use our AI,” SandboxAQ CEO Jack Hidary told VentureBeat. “And so if you want to create a drug, a diagnostic, a new material or you want to do risk management at a big bank, that’s where quantitative models shine.”

Why LQMs matter for enterprise AI

LQMs have different goals and work differently than LLMs. Unlike LLMs, which process internet-sourced text data, LQMs generate their own data from mathematical equations and physical principles. The goal is to tackle the quantitative challenges an enterprise might face. “We generate data and get data from quantitative sources,” Hidary explained.

This approach enables breakthroughs in areas where traditional methods have stalled. For instance, in battery development, where lithium-ion technology has dominated for 45 years, LQMs can simulate millions of possible chemical combinations without physical prototyping. Similarly, in pharmaceutical development, where traditional approaches face a high failure rate in clinical trials, LQMs can analyze molecular structures and interactions at the electron level.

In financial services, meanwhile, LQMs address the limitations of traditional modeling approaches. “Monte Carlo simulation is not sufficient anymore to handle the complexity of structured instruments,” said Hidary. A Monte Carlo simulation is a classic computational algorithm that uses repeated random sampling to estimate results. With the SandboxAQ LQM approach, a financial services firm can scale in a way that a Monte Carlo simulation can’t. Hidary noted that some financial portfolios can be exceedingly complex, with all manner of structured instruments and options. “If I have a portfolio and I want to know what the tail risk is given changes in this portfolio,” said Hidary, “what I’d like to do is I’d like to create 300 to 500 million versions of that portfolio with slight changes to it, and then I want to look at the tail risk.”
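To make the scale problem concrete, here is a minimal sketch of the classic Monte Carlo tail-risk estimate that Hidary describes as insufficient: perturb a portfolio across many simulated scenarios and read risk off the loss distribution’s tail. The portfolio weights, expected returns and covariance below are invented for illustration.

```python
# Minimal sketch of classic Monte Carlo tail risk: sample many scenarios,
# compute portfolio returns, and read off Value-at-Risk and expected
# shortfall. All portfolio parameters here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_scenarios = 1_000_000                   # Hidary talks about 300-500M variants
weights = np.array([0.4, 0.35, 0.25])     # hypothetical asset weights
mu = np.array([0.05, 0.03, 0.07])         # hypothetical expected annual returns
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.02, 0.01],
                [0.00, 0.01, 0.09]])      # hypothetical return covariance

# Sample correlated asset returns and compute portfolio return per scenario.
asset_returns = rng.multivariate_normal(mu, cov, size=n_scenarios)
portfolio_returns = asset_returns @ weights

# Tail risk: 99% Value-at-Risk and expected shortfall (CVaR).
var_99 = np.percentile(portfolio_returns, 1)
cvar_99 = portfolio_returns[portfolio_returns <= var_99].mean()
print(f"99% VaR: {var_99:.4f}, 99% expected shortfall: {cvar_99:.4f}")
```

Even at a million scenarios this toy model runs quickly; the difficulty Hidary points to comes from structured instruments whose payoffs can’t be reduced to a linear combination of asset returns, and from scaling to hundreds of millions of portfolio variants.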
How SandboxAQ is using LQMs to improve cybersecurity

SandboxAQ’s LQM technology is focused on enabling enterprises to create new products, materials and solutions, rather than just optimizing existing processes. Among the enterprise verticals in which the company has been innovating is cybersecurity. In 2023, the company first released its Sandwich cryptography management technology. That has since been expanded into the company’s AQtive Guard enterprise solution.

The software can analyze an enterprise’s files, applications and network traffic to identify the encryption algorithms in use, including detecting outdated or broken algorithms like MD5 and SHA-1. SandboxAQ feeds this information into a management model that can alert the chief information security officer (CISO) and compliance teams about potential vulnerabilities.

While an LLM could be used for the same purpose, the LQM takes a different approach. LLMs are trained on broad, unstructured internet data, which can include information about encryption algorithms and vulnerabilities. In contrast, SandboxAQ’s LQMs are built using targeted, quantitative data about encryption algorithms, their properties and known vulnerabilities. The LQMs use this structured data to build models and knowledge graphs specifically for encryption analysis, rather than relying on general language understanding. Looking forward, SandboxAQ is also working on a remediation module that can automatically suggest and implement updates to the encryption in use.

Quantum dimensions without a quantum computer or transformers

The original idea behind SandboxAQ was to combine AI techniques with quantum computing. Hidary and his team realized early on that real quantum computers were not going to be easy to come by or powerful enough in the short term. Instead, SandboxAQ uses quantum principles implemented on enhanced GPU infrastructure. Through a partnership, SandboxAQ has extended Nvidia’s CUDA capabilities to handle quantum techniques.

SandboxAQ also isn’t using transformers, which are the basis of nearly all LLMs. “The models that we train are neural network models and knowledge graphs, but they’re not transformers,” said Hidary. “You can generate from equations, but you can also have quantitative data coming from sensors or other kinds of sources and networks.”

While LQMs are different from LLMs, Hidary doesn’t see it as an either-or situation for enterprises. “Use LLMs for what they’re good at, then bring in LQMs for what they’re good at,” he said.
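As a toy illustration of the cryptographic inventory problem AQtive Guard tackles, the sketch below simply greps Python source files for weak-algorithm names. This is not SandboxAQ’s method (the real product analyzes files, applications and network traffic far more deeply), but it shows the shape of the detection task.

```python
# Naive illustration of cryptographic inventory scanning: flag source lines
# that mention weak algorithms such as MD5 or SHA-1. A toy sketch only;
# real tools inspect binaries, configs and live traffic, not just source.
import re
from pathlib import Path

WEAK_PATTERNS = re.compile(r"\b(md5|sha1|sha-1|des|rc4)\b", re.IGNORECASE)

def scan(root: str) -> list[tuple[str, int, str]]:
    findings = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if WEAK_PATTERNS.search(line):
                findings.append((str(path), lineno, line.strip()))
    return findings

for file, lineno, line in scan("."):
    print(f"{file}:{lineno}: possible weak crypto -> {line}")
```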


Microsoft’s smaller AI model beats the big guys: Meet Phi-4, the efficiency king

Microsoft launched a new artificial intelligence model today that achieves remarkable mathematical reasoning capabilities while using far fewer computational resources than its larger competitors. The 14-billion-parameter Phi-4 frequently outperforms much larger models like Google’s Gemini Pro 1.5, marking a significant shift in how tech companies might approach AI development.

The breakthrough directly challenges the AI industry’s “bigger is better” philosophy, under which companies have raced to build increasingly massive models. While competitors like OpenAI’s GPT-4o and Google’s Gemini Ultra operate with hundreds of billions or possibly trillions of parameters, Phi-4’s streamlined architecture delivers superior performance in complex mathematical reasoning.

Microsoft’s Phi-4 AI model outperforms larger competitors in mathematical reasoning while using significantly fewer computational resources, as shown by its position at the forefront of small but powerful models on the efficiency-performance frontier. (Image: Microsoft)

Small language models could reshape enterprise AI economics

The implications for enterprise computing are significant. Current large language models (LLMs) require extensive computational resources, driving up costs and energy consumption for businesses deploying AI solutions. Phi-4’s efficiency could dramatically reduce these overhead costs, making sophisticated AI capabilities more accessible to mid-sized companies and organizations with limited computing budgets.

This development comes at a critical moment for enterprise AI adoption. Many organizations have hesitated to fully embrace LLMs because of their resource requirements and operational costs. A more efficient model that maintains or exceeds current capabilities could accelerate AI integration across industries.

Mathematical reasoning shows promise for scientific applications

Phi-4 particularly excels at mathematical problem-solving, demonstrating impressive results on standardized problems from the Mathematical Association of America’s American Mathematics Competitions (AMC). This capability suggests potential applications in scientific research, engineering and financial modeling — areas where precise mathematical reasoning is crucial.

The model’s performance on these rigorous tests indicates that smaller, well-designed AI systems can match or exceed the capabilities of much larger models in specialized domains. This targeted excellence could prove more valuable for many business applications than the broad but less focused capabilities of larger models.

Microsoft’s Phi-4 achieves the highest average score on the November 2024 AMC 10/12 tests, outperforming both large and small AI models, including Google’s Gemini Pro, demonstrating superior mathematical reasoning with fewer computational resources. (Image: Microsoft)

Microsoft emphasizes safety and responsible AI development

The company is taking a measured approach to Phi-4’s release, making it available through its Azure AI Foundry platform under a research license agreement, with plans for a wider release on Hugging Face. This controlled rollout includes comprehensive safety features and monitoring tools, reflecting growing industry awareness of AI risk management.
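Once the planned wider release lands, experimenting with Phi-4 locally could be as simple as the sketch below. The repository name microsoft/phi-4 is an assumption, since at announcement time the model was only available through Azure AI Foundry.

```python
# Hypothetical sketch: running Phi-4 locally once the planned Hugging Face
# release is available. The repo name "microsoft/phi-4" is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-4"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "If 3x + 7 = 22, what is x? Show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```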
Through Azure AI Foundry, developers can access evaluation tools to assess model quality and safety, along with content filtering capabilities to prevent misuse. These features address mounting concerns about AI safety while providing practical tools for enterprise deployment.

Phi-4’s introduction suggests that the future of artificial intelligence might not lie in building increasingly massive models, but in designing more efficient systems that do more with less. For businesses and organizations looking to implement AI solutions, this development could herald a new era of more practical and cost-effective AI deployment.


Google: AI agents, multimodal AI, enterprise search will dominate in 2025

If 2024 was all about experimentation, 2025 will truly be the year enterprises scale AI, according to a new trends report from Google Cloud out today. Notably, sophisticated multimodal AI will support ever more complex tasks, AI agents will be embedded across the enterprise, and internal search engines will unlock critical business insights.

Interestingly, Google identified these trends by using NotebookLM to analyze data from a previous research study, pulling out the fastest-growing AI topics in Google Trends and plugging in third-party research and insights.

“Moving forward, you’ll see different agents talking to different agents, almost to the point where we all go to sleep at the end of the evening, and there’s a series of tasks and things and actions that are happening behind the scenes,” Oliver Parker, VP for global generative AI go-to-market at Google Cloud, told VentureBeat.

Enterprises will move from chatbots to multi-agent systems

AI agents are able to work autonomously (or semi-autonomously) and perform multi-step processes. According to Capgemini, only about 10% of large enterprises are already using AI agents, but 82% plan to integrate them in the next three years. Google identifies six types of AI agents:

- Customer agents that understand user needs, answer questions, resolve issues and recommend products and services. They work across channels and can integrate voice and video.
- Employee agents that help streamline processes, manage repetitive tasks, answer questions, and edit and translate.
- Creative agents that generate content, images and ideas to support design, marketing, writing projects and other endeavors.
- Data agents that assist with research and data analysis by finding and acting on data (while ensuring factual integrity).
- Code agents that support code generation and provide coding assistance.
- Security agents that help mitigate attacks or speed up investigations.

However, Parker pointed out, having many agents taking on many processes across many functions can create a bit of chaos, which he predicts will give rise to new platforms. “Being able to have a single canvas for managing and enabling your agents is where we think there’s going to be a huge opportunity,” he said. This will lead to “agentic governance,” or the need for an agentic layer that supports “different agents that are going everywhere and working across all these different systems.”

Multimodal AI will provide more context

The global multimodal AI market was estimated at $2.4 billion in 2025 and is expected to reach $98.9 billion by the end of 2037. Multimodal AI takes AI comprehension to the next level, allowing models to decipher and process a range of data sources: not only text, but also images, video and audio. Several leading vendors and cutting-edge startups already offer highly capable multimodal tools — for instance, Google’s own Gemini 2.0 Flash, Mistral’s Pixtral 12B or Cohere’s Embed 3. Google predicts that the explosion of multimodal AI will support complex data analysis and lead to greater grounding and more personalized insights.

Along with this, enterprises will go multi-model as they adopt AI. Parker pointed out that conversations have transitioned from enterprises adopting a single model to deploying many for different use cases. “It’s not just an OpenAI model,” he said.
“It’s also Gemini, it’s Anthropic, it’s Mistral, it’s Cohere, it’s Llama.”

It’s been a fast evolution over the past 12 months, Parker noted. Enterprises have moved beyond just looking at models to analyzing different platforms and laying out AI and AI agent roadmaps. While much of the focus to this point has been on development, the goal in 2025 will be getting gen AI capabilities into the hands of enterprise users. “The first half of ’24 was heavy, heavy experimentation, but without a lot of production,” said Parker. Now, enterprises are beginning to move into production, although not yet production at scale (more of that will come in 2025). “These are typically trends you see over several years,” he said. “We’ve just seen them very compressed over a 12-month period. It’s breathtaking.”

We will finally unlock the power of enterprise search

Enterprise search — supported by internal search engines that query specific enterprise data — will only become more intuitive with AI, Google predicts. It will no longer be limited to keyword-based queries; employees will be able to use images, audio, video and conversational prompts to quickly access internal data. This allows for more advanced and intuitive searches, Parker pointed out, and gen AI can process different data formats such as documents, spreadsheets and multimedia.

“It’s not just search, it’s search plus conversational AI,” said Parker. “People’s jobs are really about finding information and bringing it together to be able to get insights and take actions.” He noted that many organizations have information siloed across different applications — whether a coding system, Jira, Confluence, Box, or platforms like SharePoint or ServiceNow. AI search can quickly move across these to bring data together. “These systems of reasoning are able to search across enterprise systems,” he said. “So how do you query and find out what’s happening across your organization, across all your systems, and then start to apply agents to take action on it?”

Yes, here, too, AI agents will play a big part. “We’re seeing a confluence of conversational and agent-based capabilities combined with search inside organizations,” said Parker.
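To illustrate what “search plus conversational AI” rests on, here is a minimal sketch of hybrid retrieval that blends a keyword score with a semantic score. The documents, the character-trigram stand-in for an embedding model, and the blending weight are all invented for illustration.

```python
# Minimal sketch of hybrid search: blend a keyword-overlap score with a
# semantic similarity score. A real system would use BM25 and a learned
# embedding model; the trigram vectors below are a runnable stand-in.
import math
from collections import Counter

DOCS = {
    "jira-142": "Payment service timeout when retrying failed invoices",
    "conf-88": "Runbook: restarting the invoice payment service",
    "box-7": "Q3 marketing plan for the loyalty program",
}

def keyword_score(query: str, doc: str) -> float:
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values()) / max(len(query.split()), 1)

def semantic_score(query: str, doc: str) -> float:
    # Stand-in for embeddings: cosine similarity over character trigrams.
    def vec(text: str) -> Counter:
        return Counter(text.lower()[i:i + 3] for i in range(len(text) - 2))
    qv, dv = vec(query), vec(doc)
    dot = sum(qv[k] * dv[k] for k in qv)
    norm = (math.sqrt(sum(v * v for v in qv.values()))
            * math.sqrt(sum(v * v for v in dv.values())))
    return dot / norm if norm else 0.0

def hybrid(query: str, alpha: float = 0.5) -> list[tuple[float, str]]:
    # Weighted blend of the two signals; alpha tunes keyword vs. semantic.
    scored = [(alpha * keyword_score(query, d)
               + (1 - alpha) * semantic_score(query, d), key)
              for key, d in DOCS.items()]
    return sorted(scored, reverse=True)

print(hybrid("how do I restart the payment service"))
```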


Meta’s new BLT architecture replaces tokens to make LLMs more efficient and versatile

The AI research community continues to find new ways to improve large language models (LLMs), the latest being a new architecture introduced by scientists at Meta and the University of Washington. Their technique, the Byte Latent Transformer (BLT), could be the next important paradigm for making LLMs more versatile and scalable. BLT addresses one of the longstanding problems of LLMs by operating at the byte level as opposed to tokens. It can open the way to models that process raw data, are robust to input changes and don’t rely on fixed vocabularies.

Tokens vs. bytes

Most LLMs are trained on a static set of tokens: predefined groups of byte sequences. During inference, a tokenizer breaks the input sequence into tokens before passing it to the LLM. This makes the models more efficient in their use of compute, but it also creates biases that can degrade performance on inputs not well covered by the vocabulary.

For example, many leading language models become slower and more costly on languages with small representation on the web, because their words were not included in the model’s token vocabulary. Misspelled words can cause the model to tokenize the input incorrectly, and tokenized models can struggle with character-level tasks such as manipulating sequences. Moreover, modifying the vocabulary requires retraining the model, and expanding the token vocabulary can require architectural changes to accommodate the added complexity.

Alternatively, LLMs can be trained directly on single bytes, which solves many of the above problems. However, byte-level LLMs are prohibitively costly to train at scale and can’t handle very long sequences, which is why tokenization remains an essential part of current LLMs.

Byte Latent Transformer (BLT)

The Byte Latent Transformer is a tokenizer-free architecture that learns directly from raw bytes and matches the performance of tokenization-based models. To solve the inefficiencies of other byte-level LLMs, BLT uses a dynamic method that groups bytes based on the amount of information they contain. “Central to our architecture is the idea that models should dynamically allocate compute where it is needed,” the researchers write.

Unlike tokenized models, BLT has no fixed vocabulary. Instead, it maps arbitrary groups of bytes into patches using entropy measures. BLT performs this dynamic patching through a novel architecture with three transformer blocks: two small byte-level encoder/decoder models and a large “latent global transformer.”

BLT architecture (source: arXiv)

The encoder and decoder are lightweight models. The encoder takes in raw input bytes and creates the patch representations that are fed to the global transformer. At the other end, the local decoder takes the patch representations processed by the global transformer and decodes them into raw bytes. The latent global transformer is the model’s main workhorse. It takes in the patch representations generated by the encoder and predicts the next patch in the sequence. When processed by the decoder, this patch is unpacked into one or several bytes. The global transformer accounts for the largest share of compute during training and inference.
Therefore, the patching mechanism determines how the global transformer is used and can help control how much compute goes to different portions of the input and output.

BLT redefines the tradeoff between vocabulary size and compute requirements. In standard LLMs, increasing the vocabulary size means larger tokens on average, which reduces the number of steps required to process a sequence; however, it also requires larger dimensions in the projection layers inside the transformer, which itself consumes more resources. In contrast, BLT balances compute against the complexity of the data instead of the vocabulary size. For example, the ending of most words is easy to predict and requires few resources, while predicting the first byte of a new word or the first word of a sentence requires more compute cycles.

“BLT unlocks a new dimension for scaling, allowing simultaneous increases in model and patch size within a fixed inference budget,” the researchers write. “This new paradigm becomes advantageous for compute regimes commonly encountered in practical settings.”

BLT in action

The researchers conducted experiments with BLT and classic transformers at model scales ranging from 400 million to 8 billion parameters. According to the authors, this is “the first flop-controlled scaling study of byte-level models up to 8B parameters and 4T training bytes, showing that we can train a model end-to-end at scale from bytes without fixed-vocabulary tokenization.” Their findings show that, when controlled for the amount of training compute, BLT matches the performance of Llama 3 while using up to 50% fewer FLOPs at inference. This efficiency comes from the model’s dynamic patching, which produces longer groups of bytes, saving compute that can be reallocated to growing the latent global transformer. “To the best of our knowledge, BLT is the first byte-level Transformer architecture to achieve matching scaling trends with BPE-based models at compute optimal regimes,” the researchers write.

Beyond efficiency, BLT models proved more robust to noisy inputs than tokenizer-based models. They showed enhanced character-level understanding and improved performance on tasks such as character manipulation and low-resource machine translation. According to the researchers, BLT’s ability to directly process raw bytes “provides significant improvements in modeling the long tail of the data,” meaning the models are better at handling patterns that appear rarely in the training corpus.

This is still the beginning of what could become a new standard for language models. The researchers note that existing transformer libraries and codebases are designed to be highly efficient for tokenizer-based architectures, which means BLT still has room to benefit from software and hardware optimizations.
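To make the dynamic patching idea concrete, here is a toy sketch of entropy-based byte grouping. The paper uses a small learned byte-level language model to estimate next-byte entropy; this sketch substitutes simple bigram counts over a reference corpus so it runs standalone, and the threshold is arbitrary.

```python
# Toy sketch of entropy-based byte patching in the spirit of BLT. A real
# implementation uses a small learned byte LM to estimate next-byte entropy;
# this version substitutes bigram frequency counts so the sketch runs.
import math
from collections import Counter

def next_byte_entropy(context: bytes, corpus: bytes) -> float:
    """Estimate next-byte entropy from bigram counts over a reference corpus."""
    if not context:
        return 8.0  # maximum entropy for a byte with no context
    prev = context[-1]
    followers = Counter(corpus[i + 1]
                        for i in range(len(corpus) - 1) if corpus[i] == prev)
    total = sum(followers.values())
    if total == 0:
        return 8.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())

def patch(data: bytes, corpus: bytes, threshold: float = 2.0) -> list[bytes]:
    """Start a new patch whenever next-byte entropy exceeds the threshold."""
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        if current and next_byte_entropy(data[:i], corpus) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

corpus = b"the quick brown fox jumps over the lazy dog " * 50
print(patch(b"the quick brown fox", corpus))
```

Because predictable continuations (the letters inside a common word) stay below the threshold, whole words tend to merge into single patches, while unpredictable positions such as the start of a new word open new ones. That is the mechanism that lets BLT spend its large global transformer only where the data is hard.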


Salesforce drops Agentforce 2.0, brings reasoning AI to enterprise

Salesforce unveiled a major upgrade to its artificial intelligence platform on Tuesday, introducing technology that enables AI agents to perform deeper reasoning and take more autonomous actions across enterprise workflows — part of what the company’s CEO frames as an ambitious push into “digital labor.”

The San Francisco software giant’s Agentforce 2.0 represents a significant evolution in how AI assistants operate within businesses, moving beyond simple chatbots to AI agents that can understand complex requests, access relevant company data and independently complete multi-step tasks. “We’re creating a new industry,” said Marc Benioff, Salesforce’s chief executive, at a press conference announcing the release. “This isn’t just about managing and sharing information and data anymore. We’re a digital labor provider.”

How the Atlas Reasoning Engine powers next-generation enterprise AI

The upgraded platform introduces what Salesforce calls the Atlas Reasoning Engine, which enables AI agents to engage in more sophisticated analysis and decision-making. Unlike traditional AI assistants that provide quick responses based on pattern matching, Atlas employs “System 2” reasoning — a more deliberative approach inspired by psychologist Daniel Kahneman’s research on human thought processes. “The reasoning engine should be one of the first factors enterprise organizations consider when comparing digital labor options,” said Claire Cheng, Ph.D., VP of machine learning and engineering at Salesforce.

Early results appear promising. In testing, Agentforce 2.0 achieved a 33% improvement in answer accuracy compared to DIY AI solutions, while doubling response relevance, according to Salesforce. The company has already deployed the technology internally: At help.salesforce.com, AI agents now handle 83% of customer support queries independently, with human escalations dropping by 50% since implementation two weeks ago. “Suddenly, as a CEO, I’m not just managing human beings, but I’m also managing agents,” said Benioff. “There is an authentic agentic layer around the platform today. It’s not some vision fantasy in the future idea, it’s what is happening right now.”

Digital labor: The key to solving global workforce challenges

Salesforce’s push into “digital labor” comes amid growing labor shortages across industries. With birth rates declining and companies struggling to fill positions, Benioff sees AI agents as a crucial solution for business growth. “To unlock GDP growth, we need breakthrough technology. We have to become a digital labor provider,” he said. “This is the new horizon for business — this idea that a door has opened and business will never be the same.”

The technology is already finding real-world applications. The Adecco Group, a global staffing firm, is using Agentforce to process millions of resumes and match candidates to opportunities. Digital tablet maker reMarkable deployed it for customer service, while accounting firm 1-800 Accountant expects to deflect 65% of incoming service requests using AI agents.

Behind the tech: The innovation powering Salesforce’s AI revolution

Under the hood, Agentforce 2.0 introduces several technical advances. The Atlas Reasoning Engine creates a detailed semantic understanding of company data and processes, enabling more contextual responses.
“We’re able to associate each data component with contextual metadata information, which allows us to find the mapping between data and the corresponding semantic meaning,” explained Silvio Savarese, who leads Salesforce’s AI research. “This enables much more relevant, much more aligned responses to user queries.”

The platform also introduces enhanced integration with Slack, Salesforce’s workplace messaging platform, allowing employees to work alongside AI agents directly in their communication flows. “If you want these agents to be used, to be engaged with, and you want them to get better over time, having them where people are already working is critical,” said Rob Seaman, who oversees Slack integration. Looking ahead, Salesforce envisions expanding into physical robotics, with Benioff announcing plans for a “robot force partner program” to connect physical robots with the company’s AI agent platform.

Trust, security and the future: Navigating AI’s enterprise integration

For Salesforce, the stakes of this initiative are significant. While the company expects $38 billion in revenue this year from its traditional software business, Benioff believes the digital labor market represents a multi-trillion-dollar opportunity. However, challenges remain, particularly around trust and security. Salesforce emphasizes its “trust layer,” which prevents toxic content and maintains data privacy while giving customers control over how agents operate within their organizations. “These things act as a user — they don’t have God permissions or admin permissions,” noted Seaman. “We don’t create any holes for the AI to see things that it should not be able to.”

As businesses grapple with persistent labor shortages and productivity challenges, Salesforce is betting that AI agents will become an essential part of the modern workforce. The company’s vision suggests a future where human employees work alongside AI agents that can handle increasingly complex tasks — fundamentally changing how businesses operate and scale. “This is the beginning of the beginning,” said Benioff. “When you’re at the beginning of the beginning, you see these little things, and then you try to extrapolate what this is going to be. This is an incredible moment.”


Google upgrades its programming agent Code Assist with Gemini 2.0, adds source integrations

On the heels of releasing its new generative AI models, Google has updated its Code Assist tools to work with Gemini 2.0 and expanded the external data sources they connect to. Code Assist will now run on the recently released Gemini 2.0, offering a larger context window for understanding enterprises’ bigger code bases.

Google will also launch Gemini Code Assist tools in a private preview. The platform will connect to data sources like GitLab, GitHub, Google Docs, Sentry.io, Atlassian and Snyk, allowing developers and other coders to ask Code Assist for help directly in their IDEs. Previously, Code Assist connected to VS Code and JetBrains. Google Cloud senior director for product management Ryan J. Salva told VentureBeat in an interview that the idea is to let coders add more context to their work without interrupting their flow. Salva said Google will add more partners in the future.

Formerly Duet AI, Code Assist was launched for enterprises in October, as demand for AI coding platforms like GitHub Copilot grew among organizations seeking to streamline coding projects. Code Assist added enterprise-grade security and legal indemnification when the enterprise option was released.

AI where developers work

Salva said connecting Code Assist to other tools developers use provides more context for their work without forcing them to keep multiple windows open at once. “There’s so many other tools that a developer uses in the course of a day,” Salva said. “They might use GitHub or Atlassian Jira or DataDog or Snyk or all these other tools. What we wanted to do is to enable developers to bring in that additional context to their IDE.”

Demo of Code Assist

Salva said developers just need to open the Code Assist chat window and ask it to summarize the most recent comments for particular issues or the most recent pull requests on repositories, “so that it queries the data source and brings the context back to the IDE and [the] large language model can synthesize it.”

AI code assistants were some of the first significant use cases for generative AI, especially after software developers began using ChatGPT to help with coding. Since then, a slew of enterprise-focused coding assistants have been released. GitHub released Copilot Enterprise in February, and Oracle launched its Java and SQL coding assistant. Harness came out with a coding assistant built with Gemini that gives real-time suggestions. Meanwhile, OpenAI and Anthropic began offering interface features that let coders work directly on their chat platforms. ChatGPT’s Canvas lets users generate and edit code without copying and pasting it elsewhere. OpenAI also added integrations with tools like VS Code, Xcode, Terminal and iTerm2 from the ChatGPT macOS desktop app. Meanwhile, Anthropic launched Artifacts so Claude users can generate, edit and run code.

Not Jules

Salva pointed out that while Code Assist now supports Gemini 2.0, it remains wholly separate from Jules, the coding tool Google announced during the launch of the new Gemini models. “Jules is really one of the many experiments to emerge out of the Google Labs team to show how we can use autonomous or semiautonomous agents to automate the process of coding,” Salva said.
“You can expect that over time, the experiments that graduate from Google Labs, those same capabilities, might become a part of products like Gemini Code Assist.” He added that his team works closely with the Jules team and is excited to see Jules progress, but Code Assist remains the only generally available enterprise-grade coding tool powered by Gemini.

Salva said early feedback from Code Assist and Jules users shows great interest in Gemini 2.0’s latency improvements. “When you’re sitting there trying to code and trying to stay in the flow state, you want those kinds of responses to come up in milliseconds. Any moment the developer feels like they’re waiting for the tool is a bad thing, and so we’re getting faster and faster responses out of it,” he said.

Coding assistants will remain crucial to the growth of the generative AI space, but Salva said the next few years may bring a change in how companies develop code generation models and applications. He pointed to the 2024 Accelerate State of DevOps Report from Google’s DevOps Research and Assessment team, which showed 39% of respondents distrusted AI-generated code, along with a decline in documentation and delivery quality.

“We have, as an industry with AI-assistive tools, focused largely on throughput productivity improvements and velocity improvements over the course of the last four years,” Salva said. “And as we’re starting to see that be associated with a drop in overall stability, I suspect that the conversation in the next year is really going to shift to how we are using AI to improve quality across multiple dimensions.”


We’ve come a long way from RPA: How AI agents are revolutionizing automation

In the past year, the race to automate has intensified, with AI agents emerging as the ultimate game-changers for enterprise efficiency. While generative AI tools have made significant strides over the past three years — acting as valuable assistants in enterprise workflows — the spotlight is now shifting to AI agents capable of thinking, acting and collaborating autonomously. For enterprises preparing to embrace the next wave of intelligent automation, understanding the leap from chatbots to retrieval-augmented generation (RAG) applications to autonomous multi-agent AI is crucial.

As Gartner noted in a recent survey, 33% of enterprise software applications will include agentic AI by 2028, up from less than 1% in 2024. And as Google Brain founder Andrew Ng aptly stated: “The set of tasks that AI can do will expand dramatically because of agentic workflows.” This marks a paradigm shift in how organizations view the potential of automation, moving beyond predefined processes to dynamic, intelligent workflows.

The limitations of traditional automation

Despite their promise, traditional automation tools are constrained by rigidity and high implementation costs. Over the past decade, robotic process automation (RPA) platforms like UiPath and Automation Anywhere have struggled with workflows that lack clear processes or rely on unstructured data. These tools mimic human actions but often lead to brittle systems that require costly vendor intervention when processes change. Current gen AI tools, such as ChatGPT and Claude, have advanced reasoning and content generation capabilities but fall short of autonomous execution. Their dependency on human input for complex workflows introduces bottlenecks, limiting efficiency gains and scalability.

The emergence of vertical AI agents

As the AI ecosystem evolves, a significant shift is occurring toward vertical AI agents — highly specialized AI systems designed for specific industries or use cases. As Microsoft founder Bill Gates said in a recent blog post: “Agents are smarter. They’re proactive — capable of making suggestions before you ask for them. They accomplish tasks across applications. They improve over time because they remember your activities and recognize intent and patterns in your behavior.”

Unlike traditional software-as-a-service (SaaS) models, vertical AI agents do more than optimize existing workflows; they reimagine them entirely, bringing new possibilities to life. Here’s what makes vertical AI agents the next big thing in enterprise automation:

- Elimination of operational overhead: Vertical AI agents execute workflows autonomously, eliminating the need for operational teams. This is not just automation; it’s a complete replacement of human intervention in these domains.
- Unlocking new possibilities: Unlike SaaS, which optimized existing processes, vertical AI fundamentally reimagines workflows, bringing entirely new capabilities that didn’t exist before and creating opportunities for innovative use cases that redefine how businesses operate.
- Building strong competitive advantages: AI agents’ ability to adapt in real time makes them highly relevant in today’s fast-changing environments. Compliance with regulations such as HIPAA, SOX, GDPR, CCPA and new and forthcoming AI rules can help these agents build trust in high-stakes markets.
Additionally, proprietary data tailored to specific industries can create strong, defensible moats and competitive advantages.

Evolution from RPA to multi-agent AI

The most profound shift in the automation landscape is the transition from RPA to multi-agent AI systems capable of autonomous decision-making and collaboration. According to a recent Gartner survey, this shift will enable 15% of day-to-day work decisions to be made autonomously by 2028. These agents are evolving from simple tools into true collaborators, transforming enterprise workflows and systems. This reimagination is happening at multiple levels:

- Systems of record: AI agents like Lutra AI and Relevance AI integrate diverse data sources to create multimodal systems of record. Leveraging vector databases like Pinecone, these agents analyze unstructured data such as text, images and audio, enabling organizations to extract actionable insights from siloed data seamlessly.
- Workflows: Multi-agent systems automate end-to-end workflows by breaking complex tasks into manageable components. For example, startups like Cognition automate software development workflows, streamlining coding, testing and deployment, while Observe.AI handles customer inquiries by delegating tasks to the most appropriate agent and escalating when necessary.
- Real-world case study: In a recent interview, Lenovo’s Linda Yao said, “With our gen AI agents helping support customer service, we’re seeing double-digit productivity gains on call handling time. And we’re seeing incredible gains in other places too. We’re finding that marketing teams, for example, are cutting the time it takes to create a great pitch book by 90% and also saving on agency fees.”
- Reimagined architectures and developer tools: Managing AI agents requires a paradigm shift in tooling. Platforms like AI Agent Studio from Automation Anywhere enable developers to design and monitor agents with built-in compliance and observability features. These tools provide guardrails, memory management and debugging capabilities, ensuring agents operate safely within enterprise environments.
- Reimagined co-workers: AI agents are more than just tools; they are becoming collaborative co-workers. For example, Sierra leverages AI to automate complex customer support scenarios, freeing up employees to focus on strategic initiatives. Startups like Yurts AI optimize decision-making processes across teams, fostering human-agent collaboration. According to McKinsey, “60 to 70% of the work hours in today’s global economy could theoretically be automated by applying a wide variety of existing technology capabilities, including gen AI.”
- Future outlook: As agents gain better memory, advanced orchestration capabilities and enhanced reasoning, they will seamlessly manage complex workflows with minimal human intervention, redefining enterprise automation.

The accuracy imperative and economic considerations

As AI agents progress from handling tasks to managing workflows and entire jobs, they face a compounding accuracy challenge: each additional step introduces potential errors, multiplying and degrading overall performance. Geoffrey Hinton, a leading figure in deep learning, warns: “We should not be afraid of machines thinking; we should be afraid of machines acting without thinking.” This highlights the critical need for robust evaluation frameworks to ensure high accuracy in automated processes.
Case in point: An AI agent with 85% accuracy on a single task achieves only about 72% overall accuracy across two tasks (0.85 × 0.85 ≈ 0.72).
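A few lines of arithmetic show how quickly this compounds; the per-step accuracy below matches the article’s example.

```python
# Worked example of the compounding-accuracy problem: an agent that is 85%
# accurate per step degrades quickly as workflows get longer.
per_step_accuracy = 0.85
for steps in (1, 2, 5, 10):
    overall = per_step_accuracy ** steps
    print(f"{steps:>2} steps -> {overall:.1%} end-to-end accuracy")
# 1 step -> 85.0%, 2 steps -> 72.2%, 5 steps -> 44.4%, 10 steps -> 19.7%
```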


Nvidia and DataStax just made generative AI smarter and leaner

Nvidia and DataStax launched new technology today that dramatically reduces storage requirements for companies deploying generative AI systems, while enabling faster and more accurate information retrieval across multiple languages. The new Nvidia NeMo Retriever microservices, integrated with DataStax’s AI platform, cut data storage volume by 35 times compared to traditional approaches — a crucial capability, as enterprise data is projected to exceed 20 zettabytes by 2027.

“Today’s enterprise unstructured data is at 11 zettabytes, roughly equal to 800,000 copies of the Library of Congress, and 83% of that is unstructured, with 50% being audio and video,” said Kari Briski, VP of product management for AI at Nvidia, in an interview with VentureBeat. “Significantly reducing these storage costs while enabling companies to effectively embed and retrieve information becomes a game changer.”

Nvidia’s NeMo Retriever technology delivers a 35x improvement in data storage efficiency, as illustrated in a comparison of raw text storage, baseline vector embeddings and reduced embedding dimensions. This breakthrough underpins the scalability of generative AI across enterprise applications. (Credit: Nvidia)

The technology is already proving transformative for the Wikimedia Foundation, which used the integrated solution to reduce processing time for 10 million Wikipedia entries from 30 days to under three days. The system handles real-time updates across hundreds of thousands of entries edited daily by 24,000 global volunteers. “You can’t just rely on large language models for content — you need context from your existing enterprise data,” explained Chet Kapoor, CEO of DataStax. “This is where our hybrid search capability comes in, combining both semantic search and traditional text search, then using Nvidia’s re-ranker technology to deliver the most relevant results in real time at global scale.”

Enterprise data security meets AI accessibility

The partnership addresses a critical challenge facing enterprises: how to make their vast stores of private data accessible to AI systems without exposing sensitive information to external language models. “Take FedEx — 60% of their data sits in our products, including all package delivery information for the past 20 years with personal details. That’s not going to Gemini or OpenAI anytime soon, or ever,” Kapoor explained. The technology is finding early adoption across industries, with financial services firms leading the charge despite regulatory constraints. “I’ve been blown away by how far ahead financial services firms are now,” said Kapoor, citing Commonwealth Bank of Australia and Capital One as examples.

The next frontier for AI: Multimodal document processing

Looking ahead, Nvidia plans to expand the technology’s capabilities to handle more complex document formats. “We’re seeing great results with multimodal PDF processing — understanding tables, graphs, charts and images and how they relate across pages,” Briski revealed. “It’s a really hard problem that we’re excited to tackle.” For enterprises drowning in unstructured data while trying to deploy AI responsibly, the new offering provides a path to make their information assets AI-ready without compromising security or breaking the bank on storage costs. The solution is available immediately through the Nvidia API catalog with a 90-day free trial license.
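Nvidia has not published the exact recipe behind the 35x figure, but the captioned comparison above points to reduced embedding dimensions. A back-of-the-envelope sketch with invented numbers shows why dimension reduction plus quantization shrinks vector storage so dramatically:

```python
# Back-of-the-envelope sketch of how embedding dimension reduction and
# quantization shrink vector storage. All numbers are invented; Nvidia has
# not published the exact recipe behind the 35x figure.
def storage_gb(n_vectors: int, dim: int, bytes_per_value: int) -> float:
    return n_vectors * dim * bytes_per_value / 1e9

n = 100_000_000  # hypothetical corpus of 100M text chunks

baseline = storage_gb(n, 4096, 4)   # float32 embeddings at dimension 4096
reduced = storage_gb(n, 384, 1)     # int8 embeddings truncated to dimension 384

print(f"baseline: {baseline:,.0f} GB, reduced: {reduced:,.0f} GB, "
      f"ratio: {baseline / reduced:.0f}x")
# ratio: ~43x with these assumed settings
```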
The announcement underscores the growing focus on enterprise AI infrastructure as companies move beyond experimentation to large-scale deployment, with data management and cost efficiency becoming critical success factors.
