VentureBeat

Multimodal RAG is growing, here’s the best way to get started

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

As companies begin experimenting with multimodal retrieval-augmented generation (RAG), the vendors behind multimodal embeddings — a way to transform data into numerical representations a RAG system can read — advise enterprises to start small when embedding images and videos.

Multimodal RAG, which can surface a variety of file types — text, images or videos — relies on embedding models that transform data into numerical representations that AI models can read. Embeddings that can process all kinds of files let enterprises find information in financial graphs, product catalogs or any informational video they have, and get a more holistic view of their company.

Cohere, which updated its embedding model, Embed 3, to process images and videos last month, said enterprises need to prepare their data differently and verify that the embeddings perform adequately before they can make good use of multimodal RAG. "Before committing extensive resources to multimodal embeddings, it’s a good idea to test it on a more limited scale. This enables you to assess the model’s performance and suitability for specific use cases and should provide insights into any adjustments needed before full deployment," a blog post from Cohere staff solutions architect Yann Stoneman said.

The company said many of the processes discussed in the post apply to other multimodal embedding models as well. Stoneman said that, depending on the industry, models may also need "additional training to pick up fine-grain details and variations in images." He cited medical applications as an example, where radiology scans or photos of microscopic cells require a specialized embedding system that understands the nuances of those kinds of images.

Data preparation is key

Before feeding images to a multimodal RAG system, they must be pre-processed so the embedding model can read them well.
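As a concrete illustration, pre-processing might begin with a sizing policy like the sketch below. The function name, pixel thresholds and return format are illustrative assumptions, not part of Cohere's API:

```python
def plan_resize(width, height, max_side=1024, min_side=224):
    """Decide how to normalize an image before embedding.

    Hypothetical policy: upscale images whose longest side is below
    min_side, downscale those above max_side, and leave the rest alone.
    Aspect ratio is preserved in every case.
    """
    longest = max(width, height)
    if longest < min_side:
        scale = min_side / longest
        action = "upscale"
    elif longest > max_side:
        scale = max_side / longest
        action = "downscale"
    else:
        return ("keep", width, height)
    return (action, round(width * scale), round(height * scale))
```

A batch job would apply such a policy to every image before calling the embedding endpoint, so the model sees consistently sized inputs.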
Images may need to be resized so they’re all a consistent size. Organizations also need to decide whether to enhance low-resolution photos so important details don’t get lost, and whether to downscale overly high-resolution pictures so they don’t strain processing time.

"The system should be able to process image pointers (e.g. URLs or file paths) alongside text data, which may not be possible with text-based embeddings. To create a smooth user experience, organizations may need to implement custom code to integrate image retrieval with existing text retrieval," the blog said.

Multimodal embeddings become more useful

Many RAG systems deal mainly with text data, because using text-based information as embeddings is easier than using images or videos. However, since most enterprises hold all kinds of data, RAG that can search both pictures and text has become more popular. Organizations often had to implement separate RAG systems and databases, preventing mixed-modality searches.

Multimodal search is nothing new: OpenAI and Google offer it on their respective chatbots. OpenAI launched its latest generation of embedding models in January. Other companies also provide ways for businesses to harness their varied data for multimodal RAG. For example, Uniphore released a way to help enterprises prepare multimodal datasets for RAG.


AnyChat brings together ChatGPT, Google Gemini, and more for ultimate AI flexibility

A new tool called AnyChat is giving developers unprecedented flexibility by uniting a wide range of leading large language models (LLMs) under a single interface. Developed by Ahsen Khaliq (also known as "AK"), a prominent figure in the AI community and machine learning growth lead at Gradio, the platform allows users to switch seamlessly between models like ChatGPT, Google’s Gemini, Perplexity, Claude, Meta’s LLaMA, and Grok, all without being locked into a single provider. AnyChat promises to change how developers and enterprises interact with artificial intelligence by offering a one-stop solution for accessing multiple AI systems.

At its core, AnyChat is designed to make it easier for developers to experiment with and deploy different LLMs without the restrictions of traditional platforms. "We wanted to build something that gave users total control over which models they can use," said Khaliq. "Instead of being tied to a single provider, AnyChat gives you the freedom to integrate models from various sources, whether it’s a proprietary model like Google’s Gemini or an open-source option from Hugging Face."

Khaliq’s brainchild is built on Gradio, a popular framework for creating customizable AI applications. The platform features a tab-based interface that allows users to easily switch between models, along with dropdown menus for selecting specific versions of each AI. AnyChat also supports token authentication, ensuring secure access to APIs for enterprise users. For models requiring paid API keys — such as Gemini’s search capabilities — developers can input their own credentials, while others, like basic Gemini models, are available without an API key thanks to a free key provided by Khaliq.

How AnyChat fills a critical gap in AI development

The launch of AnyChat comes at a critical time for the AI industry.
As companies increasingly integrate AI into their operations, many have found themselves constrained by the limitations of individual platforms. Most developers currently have to choose between committing to a single model, such as OpenAI’s GPT-4o, or spending significant time and resources integrating multiple models separately. AnyChat addresses this pain point by offering a unified interface that can handle both proprietary and open-source models, giving developers the flexibility to choose the best tool for the job at any given moment.

This flexibility has already attracted interest from the developer community. In a recent update, a contributor added support for DeepSeek V2.5, a specialized model made available through the Hyperbolic API, demonstrating how easily new models can be integrated into the platform. "The real power of AnyChat lies in its ability to grow," said Khaliq. "The community can extend it with new models, making the potential of this platform far greater than any one model alone."

What makes AnyChat useful for teams and companies

For developers, AnyChat offers a streamlined solution to what has historically been a complicated and time-consuming process. Rather than building separate infrastructure for each model or being forced to use a single AI provider, users can deploy multiple models within the same app. This is particularly useful for enterprises that may need different models for different tasks — an organization could use ChatGPT for customer support, Gemini for research and search capabilities, and Meta’s LLaMA for vision-based tasks, all within the same interface.

The platform also supports real-time search and multimodal capabilities, making it a versatile tool for more complex use cases. For example, Perplexity models integrated into AnyChat offer real-time search functionality, a feature that many enterprises find valuable for keeping up with constantly changing information.
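The unified-interface idea can be sketched as a simple model router: one registry, many backends, one call signature. The backend functions below are hypothetical stand-ins for real provider SDK calls, not AnyChat's actual code:

```python
from typing import Callable, Dict

# Hypothetical stand-ins for real provider SDK calls; in an AnyChat-style
# Gradio app each would wrap an actual API client behind the same signature.
def _call_openai(prompt: str) -> str:
    return f"[gpt] {prompt}"

def _call_gemini(prompt: str) -> str:
    return f"[gemini] {prompt}"

class ModelRouter:
    """Dispatch a prompt to whichever backend the user selected."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        # New models can be contributed without touching the core code.
        self._backends[name] = fn

    def chat(self, model: str, prompt: str) -> str:
        if model not in self._backends:
            raise KeyError(f"unknown model: {model}")
        return self._backends[model](prompt)

router = ModelRouter()
router.register("gpt-4o", _call_openai)
router.register("gemini", _call_gemini)
```

Because every backend shares one signature, adding a community-contributed model is a single `register` call, which is the growth property Khaliq describes.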
On the other hand, models like LLaMA 3.2 provide vision support, expanding the platform’s capabilities beyond text-based AI. Khaliq noted that one of the key advantages of AnyChat is its open-source support. "We wanted to make sure that developers who prefer working with open-source models have the same access as those using proprietary systems," he said. AnyChat supports a broad range of models hosted on Hugging Face, a popular platform for open-source AI implementations. This gives developers more control over their deployments and allows them to avoid the costly API fees associated with proprietary models.

How AnyChat handles both text and image processing

One of the most exciting aspects of AnyChat is its support for multimodal AI — models that can process both text and images. This capability is becoming increasingly crucial as companies look for AI systems that can handle more complex tasks, from analyzing images for diagnostic purposes to generating text-based insights from visual data. Models like LLaMA 3.2, which includes vision support, are key to addressing these needs, and AnyChat makes it easy to switch between text-based and multimodal models as needed.

For many enterprises, this flexibility is a huge deal. Rather than investing in separate systems for text and image analysis, they can now deploy a single platform that handles both. This can lead to significant cost savings, as well as faster development times for AI-driven projects.

AnyChat’s growing library of AI models

AnyChat’s potential extends beyond its current capabilities. Khaliq believes that the platform’s open architecture will encourage more developers to contribute models, making it an even more powerful tool over time. "The beauty of AnyChat is that it doesn’t just stop at what’s available now. It’s designed to grow with the community, which means the platform will always be at the cutting edge of AI development," he told VentureBeat. The community has already embraced this vision.
In a discussion on Hugging Face, developers have noted how easy it is to add new models to the platform. With support for models like DeepSeek V2.5 already being integrated, AnyChat is poised to become a hub for AI experimentation and deployment.

What’s next for AnyChat and AI development

As the AI landscape continues to evolve, tools like AnyChat will play a crucial role in shaping how developers and enterprises interact with AI technology. By offering a unified interface for multiple models and allowing for seamless integration of both proprietary and open-source systems, AnyChat is breaking down the barriers that


Microsoft brings AI to the farm and factory floor, partnering with industry giants

Microsoft has launched a new suite of specialized AI models designed to address specific challenges in manufacturing, agriculture, and financial services. In collaboration with partners such as Siemens, Bayer, Rockwell Automation, and others, the tech giant is aiming to bring advanced AI technologies directly into the heart of industries that have long relied on traditional methods and tools.

These purpose-built models — now available through Microsoft’s Azure AI catalog — represent Microsoft’s most focused effort yet to develop AI tools tailored to the unique needs of different sectors. The company’s initiative reflects a broader strategy to move beyond general-purpose AI and deliver solutions that can provide immediate operational improvements in industries like agriculture and manufacturing, which are increasingly facing pressure to innovate.

"Microsoft is in a unique position to deliver the industry-specific solutions organizations need through the combination of the Microsoft Cloud, our industry expertise, and our global partner ecosystem," Satish Thomas, Corporate Vice President of Business & Industry Solutions at Microsoft, said in a LinkedIn post announcing the new AI models. "Through these models," he added, "we’re addressing top industry use cases, from managing regulatory compliance of financial communications to helping frontline workers with asset troubleshooting on the factory floor — ultimately, enabling organizations to adopt AI at scale across every industry and region… and much more to come in future updates!"

Siemens and Microsoft remake industrial design with AI-powered software

At the center of the initiative is a partnership with Siemens to integrate AI into its NX X software, a widely used platform for industrial design.
Siemens’ NX X copilot uses natural language processing to allow engineers to issue commands and ask questions about complex design tasks. This feature could drastically reduce onboarding time for new users while helping seasoned engineers complete their work faster. By embedding AI into the design process, Siemens and Microsoft are addressing a critical need in manufacturing: the ability to streamline complex tasks and reduce human error. This partnership also highlights a growing trend in enterprise technology, where companies are looking for AI solutions that improve day-to-day operations rather than experimental or futuristic applications.

Smaller, faster, smarter: How Microsoft’s compact AI models are transforming factory operations

Microsoft’s new initiative relies heavily on its Phi family of small language models (SLMs), which are designed to perform specific tasks while using less computing power than larger models. This makes them ideal for industries like manufacturing, where computing resources can be limited and companies often need AI that can operate efficiently on factory floors.

Perhaps one of the most novel uses of AI in this initiative comes from Sight Machine, a leader in manufacturing data analytics. Sight Machine’s Factory Namespace Manager addresses a long-standing but often overlooked problem: the inconsistent naming conventions used to label machines, processes, and data across different factories. This lack of standardization has made it difficult for manufacturers to analyze data across multiple sites. The Factory Namespace Manager helps by automatically translating these varied naming conventions into standardized formats, allowing manufacturers to better integrate their data and make it more actionable. While this may seem like a minor technical fix, the implications are far-reaching. Standardizing data across a global manufacturing network could unlock operational efficiencies that have been difficult to achieve.
Early adopters like Swire Coca-Cola USA, which plans to use this technology to streamline its production data, likely see the potential for gains in both efficiency and decision-making. In an industry where even small improvements in process management can translate into substantial cost savings, addressing this kind of foundational issue is a crucial step toward more sophisticated data-driven operations.

Smart farming gets real: Bayer’s AI model tackles modern agriculture challenges

In agriculture, the Bayer E.L.Y. Crop Protection model is poised to become a key tool for farmers navigating the complexities of modern farming. Trained on thousands of real-world questions related to crop protection labels, the model provides farmers with insights into how best to apply pesticides and other crop treatments, factoring in everything from regulatory requirements to environmental conditions. This model comes at a crucial time for the agricultural industry, which is grappling with the effects of climate change, labor shortages, and the need to improve sustainability. By offering AI-driven recommendations, Bayer’s model could help farmers make more informed decisions that not only improve crop yields but also support more sustainable farming practices.

The initiative also extends into the automotive and financial sectors. Cerence, which develops in-car voice assistants, will use Microsoft’s AI models to enhance in-vehicle systems. Its CaLLM Edge model allows drivers to control various car functions, such as climate control and navigation, even in settings with limited or no cloud connectivity — making the technology more reliable for drivers in remote areas. In finance, Saifr, a regulatory technology startup within Fidelity Investments, is introducing models aimed at helping financial institutions manage regulatory compliance more effectively.
These AI tools can analyze broker-dealer communications to flag potential compliance risks in real time, significantly speeding up the review process and reducing the risk of regulatory penalties. Rockwell Automation, meanwhile, is releasing the FT Optix Food & Beverage model, which helps factory workers troubleshoot equipment in real time. By providing recommendations directly on the factory floor, this AI tool can reduce downtime and help maintain production efficiency in a sector where operational disruptions can be costly.

The release of these AI models marks a shift in how businesses can adopt and implement artificial intelligence. Rather than requiring companies to adapt to broad, one-size-fits-all AI systems, Microsoft’s approach allows businesses to use AI models that are custom-built to address their specific operational challenges. This addresses a major pain point for industries that have been hesitant to adopt AI due to concerns about cost, complexity, or relevance to their particular needs. The focus on practicality also reflects Microsoft’s understanding that many businesses are looking for AI tools that can deliver immediate, measurable results. In sectors like manufacturing and agriculture,


How Microsoft’s next-gen BitNet architecture is turbocharging LLM efficiency

One-bit large language models (LLMs) have emerged as a promising approach to making generative AI more accessible and affordable. By representing model weights with a very limited number of bits, 1-bit LLMs dramatically reduce the memory and computational resources required to run them. Microsoft Research has been pushing the boundaries of 1-bit LLMs with its BitNet architecture. In a new paper, the researchers introduce BitNet a4.8, a new technique that further improves the efficiency of 1-bit LLMs without sacrificing their performance.

The rise of 1-bit LLMs

Traditional LLMs use 16-bit floating-point numbers (FP16) to represent their parameters. This requires a lot of memory and compute resources, which limits the accessibility and deployment options for LLMs. One-bit LLMs address this challenge by drastically reducing the precision of model weights while matching the performance of full-precision models. Previous BitNet models used 1.58-bit values (-1, 0, 1) to represent model weights and 8-bit values for activations. This approach significantly reduced memory and I/O costs, but the computational cost of matrix multiplications remained a bottleneck, and optimizing neural networks with extremely low-bit parameters is challenging.

Two techniques help to address this problem. Sparsification reduces the number of computations by pruning activations with smaller magnitudes. This is particularly useful in LLMs because activation values tend to have a long-tailed distribution, with a few very large values and many small ones. Quantization, on the other hand, uses a smaller number of bits to represent activations, reducing the computational and memory cost of processing them. However, simply lowering the precision of activations can lead to significant quantization errors and performance degradation.
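The 1.58-bit weight scheme can be illustrated in a few lines of NumPy. This is a sketch of absmean-style ternarization (scale by the mean absolute weight, then round and clip to {-1, 0, 1}), not Microsoft's actual training or inference code:

```python
import numpy as np

def ternarize(weights):
    """Quantize a weight matrix to the three values {-1, 0, 1}.

    Sketch of an absmean scheme: divide by the mean absolute weight,
    then round and clip. The returned scale lets a kernel rescale
    outputs after the (multiplication-free) ternary matmul.
    """
    scale = np.abs(weights).mean()
    if scale == 0:
        return np.zeros_like(weights), 1.0
    w_q = np.clip(np.round(weights / scale), -1, 1)
    return w_q, scale
```

Because the quantized weights are only -1, 0 or 1, a matrix multiply against them reduces to additions, subtractions and skips, which is why the article notes that BitNet "minimizes the need for matrix multiplication."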
Furthermore, combining sparsification and quantization is challenging, and presents special problems when training 1-bit LLMs. "Both quantization and sparsification introduce non-differentiable operations, making gradient computation during training particularly challenging," Furu Wei, Partner Research Manager at Microsoft Research, told VentureBeat. Gradient computation is essential for calculating errors and updating parameters when training neural networks. The researchers also had to ensure that their techniques could be implemented efficiently on existing hardware while maintaining the benefits of both sparsification and quantization.

BitNet a4.8

[Figure: BitNet a4.8 transformer architecture (source: arXiv)]

BitNet a4.8 addresses the challenges of optimizing 1-bit LLMs through what the researchers describe as "hybrid quantization and sparsification." They achieved this by designing an architecture that selectively applies quantization or sparsification to different components of the model based on the specific distribution pattern of activations. The architecture uses 4-bit activations for inputs to attention and feed-forward network (FFN) layers. It uses sparsification with 8 bits for intermediate states, keeping only the top 55% of the parameters. The architecture is also optimized to take advantage of existing hardware.

"With BitNet b1.58, the inference bottleneck of 1-bit LLMs switches from memory/IO to computation, which is constrained by the activation bits (i.e., 8-bit in BitNet b1.58)," Wei said. "In BitNet a4.8, we push the activation bits to 4-bit so that we can leverage 4-bit kernels (e.g., INT4/FP4) to bring 2x speed up for LLM inference on the GPU devices. The combination of 1-bit model weights from BitNet b1.58 and 4-bit activations from BitNet a4.8 effectively addresses both memory/IO and computational constraints in LLM inference."

BitNet a4.8 also uses 3-bit values to represent the key (K) and value (V) states in the attention mechanism.
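The two ingredients of this hybrid scheme can be sketched in NumPy. These are illustrative toy functions, not BitNet's actual GPU kernels, and the keep ratio and bit width are just the figures the article cites:

```python
import numpy as np

def sparsify_topk(x, keep_ratio=0.55):
    """Keep only the largest-magnitude activations; zero out the rest.

    Mirrors the idea that LLM activations are long-tailed: a few large
    values carry most of the signal, so pruning small ones saves compute.
    """
    k = max(1, int(round(keep_ratio * x.size)))
    threshold = np.sort(np.abs(x).ravel())[-k]
    return np.where(np.abs(x) >= threshold, x, 0.0)

def quantize_int4(x):
    """Symmetric 4-bit quantization: snap values onto integer levels.

    INT4 covers [-8, 7], so the largest magnitude maps to +/-7 and
    everything else is rounded relative to that scale.
    """
    max_abs = np.abs(x).max()
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale
```

In the real architecture these operations are applied selectively per component, and the rounding is handled with care during training precisely because, as Wei notes, it is non-differentiable.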
The KV cache is a crucial component of transformer models. It stores the representations of previous tokens in the sequence. By lowering the precision of KV cache values, BitNet a4.8 further reduces memory requirements, especially when dealing with long sequences.

The promise of BitNet a4.8

Experimental results show that BitNet a4.8 delivers performance comparable to its predecessor BitNet b1.58 while using less compute and memory. Compared to full-precision Llama models, BitNet a4.8 reduces memory usage by a factor of 10 and achieves a 4x speedup. Compared to BitNet b1.58, it achieves a 2x speedup through 4-bit activation kernels. But the design can deliver much more.

"The estimated computation improvement is based on the existing hardware (GPU)," Wei said. "With hardware specifically optimized for 1-bit LLMs, the computation improvements can be significantly enhanced. BitNet introduces a new computation paradigm that minimizes the need for matrix multiplication, a primary focus in current hardware design optimization."

The efficiency of BitNet a4.8 makes it particularly suited for deploying LLMs at the edge and on resource-constrained devices. This can have important implications for privacy and security: by enabling on-device LLMs, users can benefit from the power of these models without needing to send their data to the cloud.

Wei and his team are continuing their work on 1-bit LLMs. "We continue to advance our research and vision for the era of 1-bit LLMs," Wei said. "While our current focus is on model architecture and software support (i.e., bitnet.cpp), we aim to explore the co-design and co-evolution of model architecture and hardware to fully unlock the potential of 1-bit LLMs."
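The roughly 10x weight-memory figure versus FP16 follows from simple bit arithmetic. This is a back-of-envelope estimate that ignores activations, the KV cache and storage packing overhead:

```python
import math

BITS_FP16 = 16
# Encoding one of {-1, 0, 1} takes log2(3) ~= 1.58 bits of information,
# which is where the "1.58-bit" name comes from.
BITS_TERNARY = math.log2(3)

# Ideal per-weight compression ratio versus 16-bit floats: ~10.1x.
compression = BITS_FP16 / BITS_TERNARY
```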


Unlocking generative AI’s true value: a guide to measuring ROI

In the race to harness the transformative power of generative AI, companies are betting big – but are they flying blind? As billions pour into gen AI initiatives, a stark reality emerges: enthusiasm outpaces understanding. A recent KPMG survey reveals that a staggering 78% of C-suite leaders are confident in gen AI’s ROI. Confidence alone, however, is hardly an investment thesis. Most companies are still grappling with what gen AI can even do, much less able to quantify it.

"There’s a profound disconnect between gen AI’s potential and our ability to measure it," warns Matt Wallace, CTO of Kamiwaza, a startup building generative AI platforms for enterprises. "We’re seeing companies achieve incredible results, but struggling to quantify them. It’s like we’ve invented teleportation, but we’re still measuring its value in miles per gallon."

This disconnect is not merely an academic concern. It’s a critical challenge for leaders tasked with justifying large gen AI investments to their boards. Yet the unique nature of this technology can often defy conventional measurement approaches.

Why measuring gen AI’s impact is so challenging

Unlike traditional IT investments with predictable returns, gen AI’s impact often unfolds over months or years. This delayed realization of benefits can make it difficult to justify AI investments in the short term, even when the long-term potential is significant. At the heart of the problem lies a glaring absence of standardization. "It’s like we’re trying to measure distance in a world where everyone uses different units," explains Wallace. "One company’s ‘productivity boost’ might be another’s ‘cost savings.’" This lack of universally accepted metrics for measuring AI ROI makes it difficult to benchmark performance or draw meaningful comparisons across industries or even within organizations.
Compounding this issue is the complexity of attribution. In today’s interconnected business environments, isolating the impact of AI from other factors – market fluctuations, concurrent tech upgrades, or even changes in workforce dynamics – is akin to untangling a Gordian knot. "When you implement gen AI, you’re not just adding a tool, you’re often transforming entire processes," explains Wallace.

Further, some of the most significant benefits of gen AI resist traditional quantification. Improved decision-making, enhanced customer experiences, and accelerated innovation don’t always translate neatly into dollars and cents. These indirect and intangible benefits, while potentially transformative, are notoriously difficult to capture in conventional ROI calculations.

The pressure to demonstrate ROI on gen AI investments continues to mount. As Wallace puts it, "We’re not just measuring returns anymore. We’re redefining what ‘return’ means in the age of AI." This shift is forcing technical leaders to rethink not just how they measure AI’s impact, but how they conceptualize value creation in the digital age. The question then becomes not just how to measure ROI, but how to develop a new framework for understanding and quantifying the multifaceted impact of AI on business operations, innovation, and competitive positioning. The answer to this question may well redefine not just how we value AI, but how we understand business value itself in the age of artificial intelligence.

Summary table: Challenges in measuring gen AI ROI

Lack of standardized metrics: No universally accepted metrics exist for measuring gen AI ROI, making comparisons across industries and organizations difficult. Impact on measurement: limits cross-industry benchmarking and internal consistency.

Complexity of attribution: Difficult to isolate gen AI’s contribution from other influencing factors such as market conditions or other technological changes. Impact on measurement: introduces ambiguity in identifying gen AI’s true impact.

Indirect and intangible benefits: Many gen AI benefits, like improved decision-making or enhanced customer experience, are hard to quantify directly in financial terms. Impact on measurement: complicates the creation of financial justifications for gen AI.

Time lag in realizing benefits: Full benefits of gen AI might take time to materialize, requiring long-term evaluation periods. Impact on measurement: delays meaningful ROI assessments.

Data quality and availability issues: Accurate ROI analysis requires comprehensive and high-quality data, which many organizations struggle to gather and maintain. Impact on measurement: undermines the reliability of ROI measurements.

Rapidly evolving technology: Gen AI advances rapidly, making benchmarks and measurement approaches outdated quickly. Impact on measurement: increases the need for continuous recalibration.

Varying implementation scales: ROI can differ significantly between pilot tests and full implementations, making it difficult to extrapolate results. Impact on measurement: creates inconsistencies when projecting future returns.

Integration complexities: Gen AI implementations often require significant changes to processes and systems, making it challenging to isolate the specific impact of gen AI. Impact on measurement: obscures direct cause-and-effect analysis.

Key performance indicators for gen AI ROI

To better navigate these challenges, organizations need a blend of quantitative and qualitative metrics that reflect both the direct and indirect impact of gen AI initiatives. "Traditional KPIs won’t cut it," says Wallace. "You have to look beyond the obvious numbers." Among the essential KPIs for gen AI are productivity gains, cost savings and time reductions — metrics that provide tangible evidence to satisfy boardrooms. Yet focusing only on these metrics can obscure the real value gen AI creates. For example, reduced error rates may not show immediate financial returns, but they prevent future losses, while higher customer satisfaction signals long-term brand loyalty.
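One way to blend such metrics is a weighted score over normalized values. Every metric name, score and weight below is illustrative, not a standard framework:

```python
def scorecard(metrics, weights):
    """Blend financial and non-financial gen AI metrics into one score.

    metrics: dict of metric name -> normalized score in [0, 1]
    weights: dict of metric name -> relative weight
    Returns a weighted average, also in [0, 1].
    """
    total_weight = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight

# Hypothetical example: financial KPIs weighted alongside softer signals.
roi_score = scorecard(
    {"cost_savings": 0.8, "error_rate_reduction": 0.6, "customer_satisfaction": 0.7},
    {"cost_savings": 2.0, "error_rate_reduction": 1.0, "customer_satisfaction": 1.0},
)
```

The point of such a blend is exactly the one the article makes: hard-dollar KPIs get a seat at the table, but they cannot drown out the intangible benefits.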
The true value of gen AI goes beyond numbers, and companies must balance financial metrics with qualitative assessments. Improved decision-making, accelerated innovation and enhanced customer experiences often play a crucial role in determining the success of gen AI initiatives—yet these benefits don’t easily fit into traditional ROI models. Some companies are also tracking a more nuanced metric: Return on Data. This measures how effectively gen AI converts existing data into actionable insights. “Companies sit on massive amounts of data,” Wallace notes. “The ability to turn that data into value is often where gen AI makes the biggest impact.” A balanced scorecard approach helps address this gap by giving equal weight to both financial and non-financial metrics. In cases where direct measurement isn’t possible, companies can develop proxy metrics—for instance, using employee engagement as an indicator of improved processes. The key is alignment: every metric, whether


Our brains are vector databases — here’s why that’s helpful when using AI

In 2017, a breakthrough at Google transformed how machines understand language: the self-attention model. This innovation allowed AI to grasp context and meaning in human communication by treating words as mathematical vectors — precise numerical representations that capture relationships between ideas. Today, this vector-based approach has evolved into sophisticated vector databases, systems that mirror how our own brains process and retrieve information. This convergence of human cognition and AI technology isn’t just changing how machines work — it’s redefining how we need to communicate with them.

How our brains already think in vectors

Think of vectors as GPS coordinates for ideas. Just as GPS uses numbers to locate places, vector databases use mathematical coordinates to map concepts, meanings and relationships. When you search a vector database, you’re not just looking for exact matches — you’re finding patterns and relationships, just as your brain does when recalling a memory. Remember searching for your lost car keys? Your brain didn’t methodically scan every room; it quickly accessed relevant memories based on context and similarity. This is exactly how vector databases work.

The three core skills, evolved

To thrive in this AI-augmented future, we need to evolve what I call the three core skills: reading, writing and querying. While these may sound familiar, their application in AI communication requires a fundamental shift in how we use them. Reading becomes about understanding both human and machine context. Writing transforms into precise, structured communication that machines can process. And querying — perhaps the most crucial new skill — involves learning to navigate vast networks of vector-based information in ways that combine human intuition with machine efficiency.
Mastering vector communication

Consider an accountant facing a complex financial discrepancy. Traditionally, they’d rely on their experience and manual searches through documentation. In our AI-augmented future, they’ll use vector-based systems that work like an extension of their professional intuition. As they describe the issue, the AI doesn’t just search for keywords — it understands the problem’s context, pulling from a vast network of interconnected financial concepts, regulations and past cases. The key is learning to communicate with these systems in a way that leverages both human expertise and AI’s pattern-recognition capabilities. But mastering these evolved skills isn’t about learning new software or memorizing prompt templates. It’s about understanding how information connects and relates — thinking in vectors, just like our brains naturally do. When you describe a concept to AI, you’re not just sharing words; you’re helping it navigate a vast map of meaning. The better you understand how these connections work, the more effectively you can guide AI systems to the insights you need.

Taking action: Developing your core skills for AI

Ready to prepare yourself for the AI-augmented future? Here are concrete steps you can take to develop each of the three core skills.

Strengthen your reading

Reading in the AI age requires more than just comprehension — it demands the ability to quickly process and synthesize complex information. To improve:

- Study two new words daily from technical documentation or AI research papers. Write them down and practice using them in different contexts. This builds the vocabulary needed to communicate effectively with AI systems.
- Read at least two to three pages of AI-related content daily. Focus on technical blogs, research summaries or industry publications. The goal isn’t just consumption but developing the ability to extract patterns and relationships from technical content.
- Practice reading documentation from major AI platforms. Understanding how different AI systems are described and explained will help you better grasp their capabilities and limitations.

Evolve your writing

Writing for AI requires precision and structure. Your goal is to communicate in a way that machines can accurately interpret.

- Study grammar and syntax intentionally. AI language models are built on patterns, so understanding how to structure your writing will help you craft more effective prompts.
- Practice writing prompts daily. Create three new ones each day, then analyze and refine them. Pay attention to how slight changes in structure and word choice affect AI responses.
- Learn to write with query elements in mind. Incorporate database-like thinking into your writing by being specific about what information you’re requesting and how you want it organized.

Master querying

Querying is perhaps the most crucial new skill for AI interaction. It’s about learning to ask questions in ways that leverage AI’s capabilities:

- Practice writing search queries for traditional search engines. Start with simple searches, then gradually make them more complex and specific. This builds the foundation for AI prompting.
- Study basic SQL concepts and database query structures. Understanding how databases organize and retrieve information will help you think more systematically about information retrieval.
- Experiment with different query formats in AI tools. Test how various phrasings and structures affect your results. Document what works best for different types of requests.

The future of human-AI collaboration

The parallels between human memory and vector databases go deeper than simple retrieval. Both excel at compression, reducing complex information into manageable patterns. Both organize information hierarchically, from specific instances to general concepts. And both excel at finding similarities and patterns that might not be obvious at first glance.
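The "database-like thinking" recommended above can be practiced directly with Python's built-in sqlite3 module. The table and rows here are invented for illustration; the point is the shape of a good query, stating exactly which fields you want, under which filter, in which order.

```python
import sqlite3

# A throwaway in-memory database with a few invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (topic TEXT, detail TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?)",
    [
        ("embeddings", "words become coordinate vectors", 2013),
        ("attention", "models weigh context dynamically", 2017),
        ("vector databases", "retrieval by similarity, not keywords", 2023),
    ],
)

# A vague ask ("tell me about AI") has no structure. A query specifies
# the fields (topic, detail), the filter (year >= 2015) and the order.
rows = conn.execute(
    "SELECT topic, detail FROM notes WHERE year >= ? ORDER BY year", (2015,)
).fetchall()
for topic, detail in rows:
    print(f"{topic}: {detail}")
```

The same discipline carries over to prompting an AI system: say what you want returned, what constraints apply, and how the answer should be organized.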
This isn’t just about professional efficiency — it’s about preparing for a fundamental shift in how we interact with information and technology. Just as literacy transformed human society, these evolved communication skills will be essential for full participation in the AI-augmented economy. But unlike previous technological revolutions that sometimes replaced human capabilities, this one is about enhancement. Vector databases and AI systems, no matter how advanced, lack the uniquely human qualities of creativity, intuition, and emotional intelligence. The future belongs to those who understand how to think and communicate in vectors — not to replace human thinking, but to enhance it. Just as vector databases combine precise mathematical representation with intuitive pattern matching, successful professionals will blend human creativity with AI’s analytical power. This isn’t about competing with AI or


ServiceNow rolls out enterprise AI governance capabilities to accelerate production deployment

ServiceNow has long been a cornerstone of enterprise IT operations with its flagship Now platform. In recent years, the company has been expanding the platform with enterprise AI capabilities, including Now Assist. Because organizations use the platform to run their core operations, a high degree of confidence in it is critical. With generative AI in particular, enterprises have hesitated over safety and concerns about potential hallucinations.

Today, the company announced a series of new governance capabilities for its flagship Now platform designed to help increase confidence in enterprise AI usage. The new governance features address a growing challenge in enterprise AI adoption: the gap between experimentation and full production deployment. The governance components include Now Assist Guardian, Now Assist Data Kit and Now Assist Analytics. These tools help organizations manage AI deployments across the enterprise, and they are crucial as companies move beyond proof-of-concept stages into full production environments.

“Last year, broadly, it was more an experimentation approach and this year it’s getting real,” Jeremy Barnes, VP of AI Product at ServiceNow, told VentureBeat. “People are deploying AI for something related to their top or their bottom line.”

Why AI governance is critical to enterprise adoption

In an enterprise, governance and compliance are critical operations. The ServiceNow platform recognizes the often complex relationship between different enterprise stakeholders. “Typically, our customers will have governance and compliance in a different organization to the organization which is defining and owning the economic benefits of the generative AI,” Barnes said.
What that generally means in most organizations is that one team can put together a proof of concept to try out generative AI. At that stage, the constraints are lighter than when an application or service is rolled out across an enterprise in a full production deployment. Inevitably, a governance team within the enterprise will tell the development team that they can’t deploy something without first ensuring compliance with the organization’s policies. Barnes said that what tends to happen as a result is that generative AI efforts end up in ‘limbo’ between proof of concept and production for a very long time. He noted that the new AI governance updates help bridge this divide by providing tools and visibility that satisfy both business and compliance requirements.

“AI governance is not just about researching the models,” Barnes commented. He explained that it’s about having a system that includes AI components alongside traditional workflows, understanding whether that system delivers the outcome the enterprise expects, and knowing when something is wrong and being able to manage the situation.

How agentic AI accelerates the governance imperative

Among the reasons why more AI governance is needed now is the fact that agentic AI is starting to be deployed. Many organizations, including ServiceNow, are deploying agent frameworks to provide more autonomous capabilities to AI. Barnes noted that with more autonomous AI agents, there is a greater need for robust governance, controls and human oversight to ensure the systems are operating as intended and within acceptable parameters. The governance tools and workflows provided by ServiceNow aim to help enterprises manage the risks and maintain the necessary level of control over these more autonomous AI systems.
The intersection of enterprise AI governance and hallucination

A primary challenge for enterprise adoption is the risk of hallucination. Governance itself is not the answer to that challenge, but it is a necessary component of the solution. Hallucination is an industry-wide concern that affects all generative AI models in one way or another. ServiceNow is taking a multi-layered approach to mitigating it, which includes fine-tuning language models to focus on extracting information rather than generating new information. Governance is another critical layer of risk mitigation: the new Now Assist Guardian tool will provide an extra layer of protection against hallucination by analyzing AI outputs. Barnes said that a key goal for ServiceNow is to make sure that hallucination is not a ‘showstopper’ for enterprise AI deployments, but rather is viewed as a risk that can be addressed with tools in the platform.

How enterprise AI will help Configuration Management Database deployments

Configuration Management Database (CMDB) is a cornerstone of IT operations management. CMDB systems manage the inventory of systems, software and configurations used across an enterprise. As part of today’s ServiceNow update, there is also a new Now Assist for CMDB capability that brings AI assistance to that data. Barnes explained that the new capability does not directly address the population or discovery of the CMDB, which is typically done through other means, but rather focuses on improving the productivity of users interacting with the CMDB data. The CMDB analysis feature is part of ServiceNow’s broader strategy to provide AI-powered productivity enhancements for different personas within their customer organizations. It is integrated with the AI governance framework, ensuring that the deployment and use of this AI-powered tool is subject to the same governance processes and controls.
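To make the idea of "analyzing AI outputs" concrete, here is a deliberately naive guardrail sketch. It is not how Now Assist Guardian works internally; it only illustrates the general pattern such tools follow: inspect a generated answer against trusted source material and flag the parts that cannot be grounded. The stopword list and grounding rule are invented for the example.

```python
# Flag generated sentences whose content words never appear in the source
# text. A real guardrail would use semantic similarity, not word overlap;
# this sketch only shows the inspect-before-delivery pattern.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "in", "to", "and"}

def ungrounded_sentences(source_text: str, generated: str) -> list:
    source_words = set(source_text.lower().split())
    flagged = []
    for sentence in generated.split("."):
        words = [w for w in sentence.lower().split() if w not in STOPWORDS]
        # Flag the sentence if none of its content words occur in the source.
        if words and not any(w in source_words for w in words):
            flagged.append(sentence.strip())
    return flagged

source = "the outage began at 9am and was resolved by the network team"
answer = "The outage began at 9am. Aliens caused it."
print(ungrounded_sentences(source, answer))  # → ['Aliens caused it']
```

A production system would route flagged sentences to review or suppress them, which is the kind of "manage the situation" control the governance tooling is meant to provide.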
This helps address the trust and operational constraints that IT operations teams may have when deploying AI-based tools within their critical systems and data. “The more that you rely on an AI tool, the more you need to be sure that it is trustworthy for what you’re doing,” Barnes said.


OpenAI launches ChatGPT desktop integrations, rivaling Copilot

When OpenAI released desktop app versions of ChatGPT, it was clear the goal was to get more users to bring ChatGPT into their daily workflows. Now, new updates to the macOS and Windows PC versions encourage users to stay in the ChatGPT apps for most of their tasks. Some ChatGPT on macOS users can now open third-party applications directly from the app. ChatGPT Plus and Teams subscribers — with ChatGPT Enterprise and Edu users following soon after — can access VS Code, Xcode, Terminal and iTerm2 from a dropdown. This kind of integration calls to mind GitHub Copilot’s integration with coding platforms announced in October.

Alexander Embiricos, product lead on the ChatGPT desktop team, said one of the most common user behaviors the company saw was copy-pasting text or generated code from ChatGPT into other applications. Embiricos was the CEO of Multi, a screen-sharing and collaboration startup acquired by OpenAI in June. “We wanted to start integration with [integrated development environments] IDEs because we know a lot of our customers are developers, as we were seeing a lot of copy-pasting text-based material from the app to other platforms,” Embiricos said. He added that OpenAI wanted to focus on privacy while building the integrations, so the third-party apps only open manually.

Users can begin coding with ChatGPT and choose VS Code from the app. Once launched, VS Code will open with the same code they were working on. Embiricos said that, in principle, people can have multiple third-party apps open while using ChatGPT on Mac. Right now, third-party app integration is only available on macOS, but Embiricos said PC users will also get the feature eventually. OpenAI also plans to expand the number of supported apps in the future.
Windows PC is not left behind

The Windows PC version of the ChatGPT desktop app will now be available for download to all ChatGPT users, following a limited release to subscribers. Along with expanding the user base, OpenAI updated the PC app with access to Advanced Voice Mode and screenshot capabilities. Embiricos said customers have been asking to use Advanced Voice Mode on desktop for a while, so the team prioritized the feature for the PC app. The screenshot capability will also take advantage of Windows-specific features that let users choose which windows to capture. “ChatGPT can understand what you’re describing to it, of course, but if you add a photo to your chat, its responses are richer, and we see a lot of users copy-pasting photos into ChatGPT so adding a screenshot option makes that easier,” Embiricos said. Many of the features in the macOS desktop app will also come to PC, but Embiricos noted that the team focused on making the PC app more widely available first.

Interfaces are the new battleground

Chat interfaces like ChatGPT have proved incredibly useful to a variety of users, but before the advent of desktop versions, people had to go to a website to generate text, code or photos, then bring the chat responses back to whichever application they were actually working in. So it’s no surprise that companies like OpenAI want to capture more of their customer base by bringing their workflows closer to their interface. GitHub made this possible with its Copilot integrations for VS Code and Xcode. Anthropic’s Claude, while not integrated with third-party apps, created Artifacts so users don’t have to go elsewhere to see what their generated webpage looks like. OpenAI followed suit with Canvas, which functions similarly.
Meanwhile, Amazon Web Services (AWS) recently integrated its Q Developer AI assistant into the popular IDEs Visual Studio Code and JetBrains as an inline-suggestion and code-completion add-on, allowing developers to highlight chunks of code and type instructions directly to the LLM without toggling over to another screen. App integration is nothing new for software, as many companies work together to bring services to where users are. For example, Slack includes apps from Zoom, Atlassian, Asana and Google that people can call up within a chat window.


SambaNova and Hugging Face make AI chatbot deployment easier with one-click integration

SambaNova and Hugging Face launched a new integration today that lets developers deploy ChatGPT-like interfaces with a single button click, reducing deployment time from hours to minutes. For developers interested in trying the service, the process is relatively straightforward. First, visit SambaNova Cloud’s API website and obtain an access token. Then, using Python, enter these three lines of code:

```python
import gradio as gr
import sambanova_gradio

gr.load("Meta-Llama-3.1-70B-Instruct-8k", src=sambanova_gradio.registry, accept_token=True).launch()
```

The final step is clicking “Deploy to Hugging Face” and entering the SambaNova token. Within seconds, a fully functional AI chatbot becomes available on Hugging Face’s Spaces platform.

The three-line code required to deploy an AI chatbot using SambaNova and Hugging Face’s new integration. The interface includes a “Deploy into Huggingface” button, demonstrating the simplified deployment process. (Credit: SambaNova / Hugging Face)

How one-click deployment changes enterprise AI development

“This gets an app running in less than a minute versus having to code and deploy a traditional app with an API provider, which might take an hour or more depending on any issues and how familiar you are with the API, reading docs, etc.,” Ahsen Khaliq, ML growth lead at Gradio, told VentureBeat in an exclusive interview. The integration supports both text-only and multimodal chatbots capable of processing text and images. Developers can access powerful models like Llama 3.2-11B-Vision-Instruct through SambaNova’s cloud platform, with performance metrics showing processing speeds of up to 358 tokens per second on unconstrained hardware.

Performance metrics reveal enterprise-grade capabilities

Traditional chatbot deployment often requires extensive knowledge of APIs, documentation and deployment protocols.
The new system simplifies this process to a single “Deploy to Hugging Face” button, potentially increasing AI deployment across organizations of varying technical expertise. “Sambanova is committed to serve the developer community and make their life as easy as possible,” Kaizhao Liang, senior principal of machine learning at SambaNova Systems, told VentureBeat. “Accessing fast AI inference shouldn’t have any barrier, partnering with Hugging Face Spaces with Gradio allows developers to utilize fast inference for SambaNova cloud with a seamless one-click app deployment experience.” The integration’s performance metrics, particularly for the Llama3 405B model, demonstrate significant capabilities, with benchmarks showing average power usage of 8,411 W for unconstrained racks, suggesting robust performance for enterprise-scale applications.

Performance metrics for SambaNova’s Llama3 405B model deployment, showing processing speeds and power consumption across different server configurations. The unconstrained rack demonstrates higher performance capabilities but requires more power than the 9KW configuration. (Credit: SambaNova)

Why this integration could reshape enterprise AI adoption

The timing of this release coincides with growing enterprise demand for AI solutions that can be rapidly deployed and scaled. While tech giants like OpenAI and Anthropic have dominated headlines with their consumer-facing chatbots, SambaNova’s approach targets the developer community directly, providing them with enterprise-grade tools that match the sophistication of leading AI interfaces. To encourage adoption, SambaNova and Hugging Face will host a hackathon in December, offering developers hands-on experience with the new integration. This initiative comes as enterprises increasingly seek ways to implement AI solutions without the traditional overhead of extensive development cycles.
For technical decision makers, this development presents a compelling option for rapid AI deployment. The simplified workflow could potentially reduce development costs and accelerate time-to-market for AI-powered features, particularly for organizations looking to implement conversational AI interfaces. But faster deployment brings new challenges. Companies must think harder about how they’ll use AI effectively, what problems they’ll solve, and how they’ll protect user privacy and ensure responsible use. Technical simplicity doesn’t guarantee good implementation. “We’re removing the complexity of deployment,” Liang told VentureBeat, “so developers can focus on what really matters: building tools that solve real problems.” The tools for building AI chatbots are now simple enough for nearly any developer to use. But the harder questions remain uniquely human: What should we build? How will we use it? And most importantly, will it actually help people? Those are the challenges worth solving.


Qwen2.5-Coder just changed the game for AI programming—and it’s free

Alibaba Cloud has released Qwen2.5-Coder, a new AI coding assistant that has already become the second most popular demo on Hugging Face Spaces. Early tests suggest its performance rivals GPT-4o, and it’s available to developers at no cost. The release includes six model variants, from 0.5 billion to 32 billion parameters, making advanced AI coding accessible to developers with different computing resources. This achievement by the Chinese tech company comes despite facing export restrictions on advanced semiconductors. According to the team’s technical report on arXiv, Qwen2.5-Coder’s success stems from refined data processing, synthetic data generation and balanced training datasets, resulting in strong code generation while maintaining broader capabilities.

A comparison of AI coding models shows Alibaba’s Qwen2.5-Coder-32B (in blue) outperforming GPT-4 and other competitors across multiple industry benchmarks. Source: Alibaba Cloud Research

State-of-the-art performance raises stakes in global AI race

The flagship model, Qwen2.5-Coder-32B-Instruct, has shattered previous benchmarks for open-source coding assistants. It scored 92.7% on HumanEval and 90.2% on MBPP, two crucial metrics for measuring code generation abilities. Most impressively, it achieved 31.4% accuracy on LiveCodeBench, a contemporary benchmark testing AI models on real-world programming challenges. The achievement goes far beyond typical performance metrics. While most AI coding assistants specialize in one or two popular languages like Python or JavaScript, Qwen2.5-Coder’s mastery of 92 programming languages — from mainstream tools to niche languages like Haskell and Racket — represents a major leap forward in AI versatility.
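For readers unfamiliar with how scores like the 92.7% HumanEval figure are produced: benchmarks of this kind execute each model-generated solution against hidden unit tests and count the fraction that pass. The sketch below is a heavily simplified illustration of that scoring loop, not the actual HumanEval harness (which sandboxes execution and reports pass@k); the two candidate solutions are invented stand-ins for model output.

```python
# Simplified sketch of execution-based code-generation scoring:
# run each candidate against its tests, count the passes.
def run_candidate(code: str, tests: str) -> bool:
    """Execute a candidate solution and its tests; True if all tests pass."""
    namespace = {}
    try:
        exec(code, namespace)   # define the candidate function
        exec(tests, namespace)  # run the benchmark's assertions against it
        return True
    except Exception:
        return False

problems = [
    ("def add(a, b): return a + b", "assert add(2, 3) == 5"),
    ("def is_even(n): return n % 2", "assert is_even(4)"),  # buggy candidate
]

passed = sum(run_candidate(code, tests) for code, tests in problems)
print(f"pass rate: {passed}/{len(problems)}")  # → pass rate: 1/2
```

Real harnesses run untrusted generated code inside a sandbox for safety; `exec` on arbitrary model output should never be used outside one.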
This broad language support, combined with its ability to handle complex tasks like repository-level code completion and debugging, suggests we’re entering a new era where AI coding assistants can truly function as universal programming partners rather than just specialized tools.

Benchmark results comparing Alibaba’s Qwen2.5-Coder against leading AI models, including GPT-4 and Claude 3.5. The new model (leftmost column) achieves top scores in several key metrics, including a 92.7% accuracy rate on HumanEval, surpassing both open-source and proprietary competitors. Source: Alibaba Cloud Research

Open-source strategy could reshape enterprise software development

Unlike its closed-source competitors, most Qwen2.5-Coder models carry the permissive Apache 2.0 license, allowing companies to freely integrate them into their products. This could dramatically reduce development costs for businesses worldwide while accelerating AI adoption. The model’s capabilities extend beyond basic coding. It excels at repository-level code completion, understands context across multiple files, and can generate visual applications like websites and data visualizations. “We explore the practicality of Qwen2.5-Coder in two scenarios, including code assistants and Artifacts, with some examples showcasing the potential applications in real-world scenarios,” the researchers explained in their paper.

China’s AI innovation defies U.S. chip restrictions

This release could fundamentally alter the economics of AI-assisted software development. While companies like OpenAI and Anthropic have built their business models around subscription access to proprietary models, Alibaba’s decision to open-source Qwen2.5-Coder creates a new dynamic. Enterprise customers who currently pay hundreds of thousands of dollars annually for AI coding assistance could soon have access to comparable capabilities at a fraction of the cost.
This doesn’t just challenge existing business models – it could accelerate AI adoption among smaller companies and developers in emerging markets who have been priced out of the current AI boom. The shift toward open-source, enterprise-grade AI tools also raises strategic questions for Western tech companies. As more sophisticated open-source alternatives emerge, maintaining high-priced subscription models for AI services may become increasingly difficult to justify to enterprise customers.

The achievement is particularly important given the ongoing U.S. restrictions on chip exports to China. Alibaba’s success suggests Chinese tech companies have found ways to innovate despite these constraints, possibly reshaping the global AI competitive landscape. The model’s release intensifies the AI development race between the U.S. and China. While American companies have traditionally led in large language models, Chinese firms are increasingly matching or exceeding their capabilities in specialized domains like coding and mathematics.

Alibaba’s researchers plan to explore scaling up both data size and model size while enhancing reasoning capabilities. This suggests the company isn’t content with current achievements and aims to push the boundaries further. For developers and businesses worldwide, Qwen2.5-Coder presents a new option in the AI toolkit — one that combines state-of-the-art performance with the freedom of open-source software. As the AI arms race continues to accelerate, this release may mark a shift in how advanced AI capabilities are distributed and accessed globally.
