VentureBeat

Google quietly launches AI Edge Gallery, letting Android phones run AI without the cloud

Join the event trusted by enterprise leaders for nearly two decades. VB Transform brings together the people building real enterprise AI strategy. Learn more

Google has quietly released an experimental Android application that enables users to run sophisticated artificial intelligence models directly on their smartphones without requiring an internet connection, marking a significant step in the company’s push toward edge computing and privacy-focused AI deployment.

The app, called AI Edge Gallery, allows users to download and execute AI models from the popular Hugging Face platform entirely on their devices, enabling tasks such as image analysis, text generation, coding assistance, and multi-turn conversations while keeping all data processing local.

The application, released under an open-source Apache 2.0 license and available through GitHub rather than official app stores, represents Google’s latest effort to democratize access to advanced AI capabilities while addressing growing privacy concerns about cloud-based artificial intelligence services.

“The Google AI Edge Gallery is an experimental app that puts the power of cutting-edge Generative AI models directly into your hands, running entirely on your Android devices,” Google explains in the app’s user guide. “Dive into a world of creative and practical AI use cases, all running locally, without needing an internet connection once the model is loaded.”

Google’s AI Edge Gallery app shows the main interface, model selection from Hugging Face, and configuration options for processing acceleration. (Credit: Google)

How Google’s lightweight AI models deliver cloud-level performance on mobile devices

The application builds on Google’s LiteRT platform, formerly known as TensorFlow Lite, and MediaPipe frameworks, which are specifically optimized for running AI models on resource-constrained mobile devices.
The system supports models from multiple machine learning frameworks, including JAX, Keras, PyTorch, and TensorFlow.

At the heart of the offering is Google’s Gemma 3 model, a compact 529-megabyte language model that can process up to 2,585 tokens per second during prefill inference on mobile GPUs. This performance enables sub-second response times for tasks like text generation and image analysis, making the experience comparable to cloud-based alternatives.

The app includes three core capabilities: AI Chat for multi-turn conversations, Ask Image for visual question-answering, and Prompt Lab for single-turn tasks such as text summarization, code generation, and content rewriting. Users can switch between different models to compare performance and capabilities, with real-time benchmarks showing metrics like time-to-first-token and decode speed.

“Int4 quantization cuts model size by up to 4x over bf16, reducing memory use and latency,” Google noted in technical documentation, referring to optimization techniques that make larger models feasible on mobile hardware.

The AI Chat feature provides detailed responses and displays real-time performance metrics including token speed and latency. (Credit: Google)

Why on-device AI processing could revolutionize data privacy and enterprise security

The local processing approach addresses growing concerns about data privacy in AI applications, particularly in industries handling sensitive information. By keeping data on-device, organizations can maintain compliance with privacy regulations while leveraging AI capabilities.

This shift represents a fundamental reimagining of the AI privacy equation. Rather than treating privacy as a constraint that limits AI capabilities, on-device processing transforms privacy into a competitive advantage. Organizations no longer need to choose between powerful AI and data protection — they can have both.
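Returning to the int4 claim quoted earlier: the roughly 4x saving follows directly from the bit widths (16-bit bf16 versus 4-bit int4). A minimal numpy sketch of symmetric per-tensor int4 quantization illustrates the trade-off; this is a simplification for illustration, not LiteRT's actual quantization scheme.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    # Symmetric quantization: map floats to integers in [-8, 7] (4 bits).
    scale = np.max(np.abs(w)) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int4(w)

# bf16 stores 16 bits per weight; packed int4 stores 4 bits: a 4x reduction.
bf16_bytes = w.size * 2
int4_bytes = w.size // 2  # two 4-bit values per byte
print(bf16_bytes // int4_bytes)  # 4

# Mean reconstruction error stays small relative to the weight scale.
err = float(np.abs(dequantize(q, scale) - w).mean())
```

The memory saving is exact; the cost is the rounding error `err`, which production schemes reduce further with per-channel or per-block scales.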
The elimination of network dependencies also means that intermittent connectivity, traditionally a major limitation for AI applications, becomes irrelevant for core functionality. The approach is particularly valuable for sectors like healthcare and finance, where data sensitivity requirements often limit cloud AI adoption. Field applications such as equipment diagnostics and remote work scenarios also benefit from the offline capabilities.

However, the shift to on-device processing introduces new security considerations that organizations must address. While the data itself becomes more secure by never leaving the device, the focus shifts to protecting the devices themselves and the AI models they contain. This creates new attack vectors and requires different security strategies than traditional cloud-based AI deployments. Organizations must now consider device fleet management, model integrity verification, and protection against adversarial attacks that could compromise local AI systems.

Google’s platform strategy takes aim at Apple and Qualcomm’s mobile AI dominance

Google’s move comes amid intensifying competition in the mobile AI space. Apple’s Neural Engine, embedded across iPhones, iPads, and Macs, already powers real-time language processing and computational photography on-device. Qualcomm’s AI Engine, built into Snapdragon chips, drives voice recognition and smart assistants in Android smartphones, while Samsung uses embedded neural processing units in Galaxy devices.

However, Google’s approach differs significantly from competitors by focusing on platform infrastructure rather than proprietary features. Rather than competing directly on specific AI capabilities, Google is positioning itself as the foundation layer that enables all mobile AI applications. This strategy echoes successful platform plays from technology history, where controlling the infrastructure proves more valuable than controlling individual applications.
The timing of this platform strategy is particularly shrewd. As mobile AI capabilities become commoditized, the real value shifts to whoever can provide the tools, frameworks, and distribution mechanisms that developers need. By open-sourcing the technology and making it widely available, Google ensures broad adoption while maintaining control over the underlying infrastructure that powers the entire ecosystem.

What early testing reveals about mobile AI’s current challenges and limitations

The application currently faces several limitations that underscore its experimental nature. Performance varies significantly based on device hardware, with high-end devices like the Pixel 8 Pro handling larger models smoothly while mid-tier devices may experience higher latency.

Testing revealed accuracy issues with some tasks. The app occasionally provided incorrect responses to specific questions, such as incorrectly identifying crew counts for fictional spacecraft or misidentifying comic book covers. Google acknowledges these limitations, with the AI itself stating during testing that it was “still under development and still learning.”

Installation remains cumbersome, requiring users to enable developer mode on Android devices and manually install the application via APK files. Users must also create Hugging Face accounts to download models, adding friction to the onboarding process.

The hardware constraints highlight a fundamental challenge facing mobile AI: the tension between model sophistication and device limitations. Unlike cloud environments


The battle to AI-enable the web: NLWeb and what enterprises need to know

In the first generation of the web, back in the late 1990s, search was okay but not great, and it wasn’t easy to find things. That led to the rise of syndication protocols in the early 2000s, with Atom and RSS (Really Simple Syndication) providing a simplified way for website owners to make headlines and other content easily available and searchable.

In the modern era of AI, a new group of protocols is emerging to serve the same basic purpose. This time, instead of making sites easier for humans to find, it’s all about making websites easier for AI. Anthropic’s Model Context Protocol (MCP), Google’s Agent2Agent and LLMs.txt are among the existing efforts.

The newest protocol is Microsoft’s open-source NLWeb (natural language web) effort, which was announced during the Build 2025 conference. NLWeb is also directly linked to the first generation of web syndication standards, as it was conceived and created by RV Guha, who helped create RSS, RDF (Resource Description Framework) and schema.org.

NLWeb enables websites to easily add AI-powered conversational interfaces, effectively turning any website into an AI app where users can query content using natural language. NLWeb isn’t necessarily about competing with other protocols; rather, it builds on top of them. The new protocol uses existing structured data formats like RSS, and each NLWeb instance functions as an MCP server.

“The idea behind NLWeb is it is a way for anyone who has a website or an API already to very easily make their website or their API an agentic application,” Microsoft CTO Kevin Scott said during his Build 2025 keynote.
“You really can think about it a little bit like HTML for the agentic web.”

How NLWeb works to AI-enable the web for enterprises

NLWeb transforms websites into AI-powered experiences through a straightforward process that builds on existing web infrastructure while leveraging modern AI technologies.

Building on existing data: The system begins by leveraging structured data that websites already publish, including schema.org markup, RSS feeds and other semi-structured formats that are commonly embedded in web pages. This means publishers don’t need to rebuild their content infrastructure completely.

Data processing and storage: NLWeb includes tools for adding this structured data to vector databases, which enable efficient semantic search and retrieval. The system supports all major vector database options, allowing developers to choose the solution that best fits their technical requirements and scale.

AI enhancement layer: LLMs then enhance this stored data with external knowledge and context. For instance, when a user queries about restaurants, the system automatically layers on geographic insights, reviews and related information by combining the vectorized content with LLM capabilities to provide comprehensive, intelligent responses rather than simple data retrieval.

Universal interface creation: The result is a natural language interface that serves both human users and AI agents. Visitors can ask questions in plain English and receive conversational responses, while AI systems can programmatically access and query the site’s information through the MCP framework.

This approach allows any website to participate in the emerging agentic web without requiring extensive technical overhauls. It makes AI-powered search and interaction as accessible as creating a basic webpage was in the early days of the internet.

The emerging AI protocol landscape brings many choices to enterprises

There are a lot of different protocols emerging in the AI space; not all do the same thing.
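The four NLWeb steps described above (index published structured data, store it in a vector database, enhance retrieval with an LLM, expose a query interface) can be sketched end to end. Everything below is illustrative rather than NLWeb's real implementation: the hash-based `embed` stands in for a real embedding model, `VectorStore` for a real vector database, and the final LLM answer step is left as a comment.

```python
import hashlib
import math

def embed(text: str, dim: int = 128) -> list[float]:
    # Toy embedding: hash character trigrams into a fixed-size vector.
    # A real deployment would call an actual embedding model here.
    v = [0.0] * dim
    low = text.lower()
    for i in range(len(low) - 2):
        h = int(hashlib.md5(low[i:i + 3].encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

class VectorStore:
    """In-memory stand-in for a real vector database."""

    def __init__(self):
        self.items = []  # list of (embedding, structured_record)

    def add(self, record: dict):
        # Steps 1-2: index structured data (e.g. from RSS or schema.org markup).
        text = record["name"] + " " + record["description"]
        self.items.append((embed(text), record))

    def search(self, query: str, k: int = 1) -> list[dict]:
        q = embed(query)
        scored = sorted(self.items,
                        key=lambda it: -sum(a * b for a, b in zip(q, it[0])))
        return [rec for _, rec in scored[:k]]

store = VectorStore()
store.add({"name": "Trattoria Roma",
           "description": "Italian restaurant, wood-fired pizza"})
store.add({"name": "Sakura Sushi",
           "description": "Japanese sushi bar and omakase"})

# Steps 3-4: the retrieved records would be passed to an LLM, which layers on
# context and returns a conversational answer to the user or calling agent.
hits = store.search("where can I get pizza?")
```

The retrieval step is what makes the LLM layer cheap: the model only has to synthesize an answer from a handful of matching records rather than the whole site.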
Google’s Agent2Agent, for example, is all about enabling agents to talk to each other. It’s about orchestrating and communicating agentic AI and is not particularly focused on AI-enabling existing websites or AI content. Maria Gorskikh, founder and CEO of AIA and a contributor to the Project NANDA team at MIT, explained to VentureBeat that Google’s A2A enables structured task passing between agents using defined schemas and lifecycle models.

“While the protocol is open-source and model-agnostic by design, its current implementations and tooling are closely tied to Google’s Gemini stack — making it more of a backend orchestration framework than a general-purpose interface for web-based services,” she said.

Another emerging effort is LLMs.txt. Its goal is to help LLMs better access web content. While on the surface it might sound somewhat like NLWeb, it’s not the same thing.

“NLWeb doesn’t compete with LLMs.txt; it is more comparable to web scraping tools that try to deduce intent from a website,” Michael Ni, VP and principal analyst at Constellation Research, told VentureBeat.

Krish Arvapally, co-founder and CTO of Dappier, explained to VentureBeat that LLMs.txt provides a markdown-style format with training permissions that helps LLM crawlers ingest content appropriately. NLWeb, by contrast, focuses on enabling real-time interactions directly on a publisher’s website. Dappier has its own platform that automatically ingests RSS feeds and other structured data, then delivers branded, embeddable conversational interfaces. Publishers can also syndicate their content to Dappier’s data marketplace.

MCP is the other big protocol, and it is increasingly becoming a de facto standard and a foundational element of NLWeb. Fundamentally, MCP is an open standard for connecting AI systems with data sources. Ni explained that in Microsoft’s view, MCP is the transport layer, where, together, MCP and NLWeb provide the HTML and TCP/IP of the open agentic web.
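For contrast with NLWeb's interactive approach, an LLMs.txt file is simply a markdown document served at a site's root that tells crawlers where the machine-readable content lives. A minimal hypothetical example, following the published llms.txt convention (all names and URLs below are invented for illustration):

```markdown
# Example Corp

> Example Corp publishes developer documentation for its payments API.

## Docs

- [API reference](https://example.com/docs/api.md): endpoints and authentication
- [Quickstart](https://example.com/docs/quickstart.md): first integration steps

## Optional

- [Changelog](https://example.com/changelog.md)
```

A file like this is static and consumed at crawl time, which is exactly the distinction Ni and Arvapally draw: NLWeb answers live queries, LLMs.txt guides ingestion.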
Forrester Senior Analyst Will McKeon-White sees a number of advantages for NLWeb over other options.

“The main advantage of NLWeb is better control over how AI systems ‘see’ the pieces that make up websites, allowing for better navigation and more complete understanding of the tooling,” McKeon-White told VentureBeat. “This could reduce both errors from systems misunderstanding what they’re seeing on websites, as well as reduce interface rework.”

Early adopters already see the promise of NLWeb for enterprise agentic AI

Microsoft didn’t just throw NLWeb over the proverbial wall and hope someone would use it. Microsoft already has multiple organizations engaged and using NLWeb, including Chicago Public Media, Allrecipes, Eventbrite, Hearst (Delish), O’Reilly Media, Tripadvisor and Shopify.

Andrew Odewahn, Chief Technology Officer at O’Reilly Media, is among the early adopters and sees


Your AI models are failing in production—Here’s how to fix model selection

Enterprises need to know if the models that power their applications and agents work in real-life scenarios. This type of evaluation can be complex because it is hard to predict specific scenarios. A revamped version of the RewardBench benchmark looks to give organizations a better idea of a model’s real-life performance.

The Allen Institute for AI (Ai2) launched RewardBench 2, an updated version of its reward model benchmark, RewardBench, which it claims provides a more holistic view of model performance and assesses how models align with an enterprise’s goals and standards.

Ai2 built RewardBench with classification tasks that measure correlations through inference-time compute and downstream training. RewardBench mainly deals with reward models (RMs), which can act as judges and evaluate LLM outputs. RMs assign a score or a “reward” that guides reinforcement learning from human feedback (RLHF).

“RewardBench 2 is here! We took a long time to learn from our first reward model evaluation tool to make one that is substantially harder and more correlated with both downstream RLHF and inference-time scaling.” — Ai2 (@allen_ai), June 2, 2025

Nathan Lambert, a senior research scientist at Ai2, told VentureBeat that the first RewardBench worked as intended when it was launched. Still, the model environment rapidly evolved, and so should its benchmarks.

“As reward models became more advanced and use cases more nuanced, we quickly recognized with the community that the first version didn’t fully capture the complexity of real-world human preferences,” he said.
Lambert added that with RewardBench 2, “we set out to improve both the breadth and depth of evaluation—incorporating more diverse, challenging prompts and refining the methodology to reflect better how humans actually judge AI outputs in practice.” He said the second version uses unseen human prompts, has a more challenging scoring setup and covers new domains.

Using evaluations for models that evaluate

While reward models test how well models work, it’s also important that RMs align with company values; otherwise, the fine-tuning and reinforcement learning process can reinforce bad behaviors such as hallucinations, reduce generalization, and score harmful responses too highly. RewardBench 2 covers six different domains: factuality, precise instruction following, math, safety, focus and ties.

“Enterprises should use RewardBench 2 in two different ways depending on their application. If they’re performing RLHF themselves, they should adopt the best practices and datasets from leading models in their own pipelines because reward models need on-policy training recipes (i.e. reward models that mirror the model they’re trying to train with RL). For inference time scaling or data filtering, RewardBench 2 has shown that they can select the best model for their domain and see correlated performance,” Lambert said.

Lambert noted that benchmarks like RewardBench offer users a way to evaluate the models they’re choosing based on the “dimensions that matter most to them, rather than relying on a narrow one-size-fits-all score.” He said the idea of performance, which many evaluation methods claim to assess, is very subjective, because a good response from a model highly depends on the context and goals of the user. At the same time, human preferences get very nuanced.

Ai2 released the first version of RewardBench in March 2024. At the time, the company said it was the first benchmark and leaderboard for reward models.
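The inference-time scaling pattern Lambert mentions (score n candidate responses with a reward model and keep the best, often called best-of-n sampling) can be sketched as follows. The `reward_model` here is a toy, hand-written stand-in with invented heuristics; a real RM is a trained model that returns a scalar preference score.

```python
def reward_model(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a trained reward model. The heuristics below
    # are invented for illustration only.
    score = 0.0
    if response.strip():                                # penalize empty output
        score += 1.0
    if len(response.split()) <= 30:                     # favor concision
        score += 0.5
    if prompt.lower().split()[0] in response.lower():   # crude topical overlap
        score += 0.5
    return score

def best_of_n(prompt: str, candidates: list[str]) -> str:
    # Inference-time scaling: sample n candidates, keep the highest-reward one.
    return max(candidates, key=lambda r: reward_model(prompt, r))

prompt = "Summarize the quarterly report"
candidates = [
    "",                                          # empty: low reward
    "Summarize: revenue grew 12%, churn fell.",  # on-topic and concise
    "lorem ipsum " * 40,                         # long and off-topic
]
best = best_of_n(prompt, candidates)
```

RewardBench 2's point is that the quality of `reward_model` determines whether this loop actually improves outputs, which is why picking an RM that scores well on the domains you care about matters.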
Since then, several methods for benchmarking and improving RMs have emerged. Researchers at Meta’s FAIR came out with reWordBench. DeepSeek released a new technique called Self-Principled Critique Tuning for smarter and scalable RMs.

“Super excited that our second reward model evaluation is out. It’s substantially harder, much cleaner, and well correlated with downstream PPO/BoN sampling. Happy hillclimbing! Huge congrats to @saumyamalik44 who lead the project with a total commitment to excellence.” — Nathan Lambert (@natolambert), June 2, 2025

How models performed

Since RewardBench 2 is an updated version of RewardBench, Ai2 tested both existing and newly trained models to see if they continue to rank high. These included a variety of models, such as versions of Gemini, Claude, GPT-4.1 and Llama-3.1, along with datasets and models like Qwen, Skywork and Ai2’s own Tulu.

The company found that larger reward models perform best on the benchmark because their base models are stronger. Overall, the strongest-performing models are variants of Llama-3.1 Instruct. In terms of focus and safety, Skywork data “is particularly helpful,” and Tulu did well on factuality.

Ai2 said that while it believes RewardBench 2 “is a step forward in broad, multi-domain accuracy-based evaluation” for reward models, it cautioned that model evaluation should mainly be used as a guide to pick models that work best with an enterprise’s needs.


Solidroad just raised $6.5M to reinvent customer service with AI that coaches, not replaces

Solidroad, an artificial intelligence startup that promises to solve one of customer service’s most persistent problems, has raised $6.5 million in seed funding to expand its platform that automatically trains customer service representatives and improves AI agents.

The Dublin-founded company, whose round was led by First Round Capital with participation from Y Combinator, addresses a fundamental challenge facing growing businesses: how to maintain high-quality customer experiences while controlling costs as conversation volumes explode past 10,000 interactions per month.

“CX leaders scaling past 10,000 conversations a month are often stuck between two options: either they maintain quality and eat the cost, or cut costs and watch customer satisfaction suffer,” said Mark Hughes, co-founder and CEO of Solidroad, in an exclusive interview with VentureBeat. “None of the traditional solutions work.”

The funding round, which brings Solidroad’s total capital raised to $8 million, comes as companies increasingly struggle to balance customer experience quality with operational efficiency. Traditional approaches — offshore outsourcing, legacy quality assurance tools, or fully automated AI agents — often result in deteriorating customer satisfaction scores, according to Hughes.

How AI analyzes every customer conversation to create personalized training simulations

Solidroad’s platform operates as what Hughes calls “an aggregation layer” that sits atop existing customer communication channels, analyzing every interaction between companies and their customers. Unlike AI solutions that attempt to replace human agents entirely, Solidroad focuses on making both human representatives and AI systems more effective.
The platform automatically reviews 100% of customer conversations across multiple channels, applying AI-powered quality assurance to a workload where manual review traditionally covered just 1-3% of interactions. More critically, it transforms these insights into actionable improvements through individualized training simulations for human agents and refinement recommendations for AI systems.

“Traditional QA has always been manual and retrospective,” Hughes explained. “Someone reviews a handful of calls or emails, applies a rubric, and tells you how you did. We had to completely rethink that approach. It wasn’t enough to just score conversations with AI — we set out to make the insights actionable.”

The system generates personalized training scenarios based on actual conversation patterns and identified skill gaps, creating what Hughes describes as targeted coaching at scale without adding process overhead or additional staff.

Early customer results suggest the approach delivers measurable improvements. Crypto.com, the cryptocurrency exchange, used Solidroad to reduce average handling time by 18% while simultaneously improving customer satisfaction scores from 87% to 90% — a 3-percentage-point increase that represents significant improvement in the customer service industry.

Marketing automation platform ActiveCampaign reported saving the equivalent of a full year of manual coaching time, which the company reinvested into higher-leverage training initiatives and faster feedback mechanisms. Customer engagement platform Podium cut new-hire ramp time in half by embedding Solidroad’s AI simulations into its onboarding process.

“Across the board, Solidroad customers are seeing 90% or higher go-live CSAT scores, faster ramp times, and a huge reduction in manual QA work,” Hughes said, citing additional results from PartnerHero, which saw a 30% improvement in agent proficiency scores.
The platform currently analyzes hundreds of thousands of conversations monthly for more than 50 customers, with new companies signing up weekly, according to the company.

Hughes and co-founder Patrick Finlay, who serves as chief technology officer, developed their understanding of customer experience challenges during their tenure at Intercom, the customer messaging platform where they first met and collaborated.

“Patrick was building features; I was selling them,” Hughes recalled. “We saw firsthand how important customer experience is to growth, but also how frustrating it was to work with tools that didn’t actually help CX teams do their jobs better. Even great companies were stuck duct-taping together solutions that weren’t built for them.”

The duo represents a growing trend of second-time founders applying artificial intelligence to enterprise operational challenges. Hughes previously founded and sold Gradguide, a career guidance platform, while Finlay co-founded Y Combinator-backed no-code startup Monaru.

Why Solidroad chose human augmentation over the AI replacement trend sweeping customer service

The customer experience software market has exploded as companies recognize the revenue impact of customer satisfaction, but many existing solutions focus on either full automation or basic analytics rather than systematic improvement of human performance.

Traditional quality assurance tools typically require significant manual oversight and provide retrospective insights rather than proactive training. Meanwhile, fully automated AI agents, while promising cost savings, often struggle with complex or emotionally nuanced customer interactions, sometimes delivering what Hughes characterizes as “hallucinations” rather than helpful responses.

“Unlike other AI-powered CX solutions, we don’t handle conversations ourselves,” Hughes explained. “Most AI CX tools are trying to replace humans with AI agents.
We help them improve.” This positioning reflects a broader industry debate about the optimal balance between human agents and artificial intelligence in customer service operations.

First Round Capital’s bet signals confidence in human-AI collaboration over full automation

First Round Capital’s lead investment represents a significant validation of Solidroad’s approach. The venture firm previously led early rounds for companies including Notion, Uber, and other category-defining platforms, suggesting confidence in Solidroad’s potential to reshape customer experience technology.

“We’re excited to be working with First Round, which was the first institutional investor in companies like Notion, Uber, and many more,” Hughes noted in the company’s announcement. “But more importantly, they’ve backed founders who know how to build.”

The funding will primarily support aggressive hiring, particularly in San Francisco, where the company is establishing its primary hub. Solidroad plans to relocate its Ireland-based team to the Bay Area while expanding across engineering and go-to-market functions.

“We’re currently focused on hiring engineering and go-to-market roles,” Hughes said. “We’re looking for people who want to be at the frontier of AI and customer experience.”

Enterprise security measures address growing concerns about AI analyzing sensitive conversations

As Solidroad analyzes sensitive customer conversations, the company has implemented enterprise-grade security measures, including SOC 2 Type 2 and ISO 27001 compliance. Customer data remains isolated in secure workspaces


Stop guessing why your LLMs break: Anthropic’s new tool shows you exactly what goes wrong

Large language models (LLMs) are transforming how enterprises operate, but their “black box” nature often leaves enterprises grappling with unpredictability. Addressing this critical challenge, Anthropic recently open-sourced its circuit tracing tool, allowing developers and researchers to directly understand and control models’ inner workings.

The tool lets researchers investigate unexplained errors and unexpected behaviors in open-weight models. It can also help with granular fine-tuning of LLMs for specific internal functions.

Understanding the AI’s inner logic

The circuit tracing tool is based on “mechanistic interpretability,” a burgeoning field dedicated to understanding how AI models function based on their internal activations rather than merely observing their inputs and outputs.

While Anthropic’s initial research on circuit tracing applied this methodology to its own Claude 3.5 Haiku model, the open-sourced tool extends this capability to open-weights models. Anthropic’s team has already used the tool to trace circuits in models like Gemma-2-2b and Llama-3.2-1b and has released a Colab notebook that helps use the library on open models.

The core of the tool lies in generating attribution graphs, causal maps that trace the interactions between features as the model processes information and generates an output. (Features are internal activation patterns of the model that can be roughly mapped to understandable concepts.) It is like obtaining a detailed wiring diagram of an AI’s internal thought process. More importantly, the tool enables “intervention experiments,” allowing researchers to directly modify these internal features and observe how changes in the AI’s internal states impact its external responses, making it possible to debug models.
The tool integrates with Neuronpedia, an open platform for understanding and experimenting with neural networks.

Circuit tracing on Neuronpedia (source: Anthropic blog)

Practicalities and future impact for enterprise AI

While Anthropic’s circuit tracing tool is a great step toward explainable and controllable AI, it has practical challenges, including the high memory costs of running the tool and the inherent complexity of interpreting the detailed attribution graphs. However, these challenges are typical of cutting-edge research.

Mechanistic interpretability is a big area of research, and most major AI labs are developing methods to investigate the inner workings of large language models. By open-sourcing the circuit tracing tool, Anthropic enables the community to develop interpretability tools that are more scalable, automated, and accessible to a wider array of users, opening the way for practical applications of all the effort that is going into understanding LLMs.

As the tooling matures, the ability to understand why an LLM makes a certain decision can translate into practical benefits for enterprises.

Circuit tracing explains how LLMs perform sophisticated multi-step reasoning. For example, in their study, the researchers were able to trace how a model inferred “Texas” from “Dallas” before arriving at “Austin” as the capital. It also revealed advanced planning mechanisms, like a model pre-selecting rhyming words in a poem to guide line composition. Enterprises can use these insights to analyze how their models tackle complex tasks like data analysis or legal reasoning. Pinpointing internal planning or reasoning steps allows for targeted optimization, improving efficiency and accuracy in complex business processes.

Source: Anthropic

Furthermore, circuit tracing offers better clarity into numerical operations.
For example, in their study, the researchers uncovered how models handle arithmetic, like 36+59=95, not through simple algorithms but via parallel pathways and “lookup table” features for digits. Enterprises can use such insights to audit the internal computations leading to numerical results, identify the origin of errors and implement targeted fixes to ensure data integrity and calculation accuracy within their open-source LLMs.

For global deployments, the tool provides insights into multilingual consistency. Anthropic’s previous research shows that models employ both language-specific and abstract, language-independent “universal mental language” circuits, with larger models demonstrating greater generalization. This can potentially help debug localization challenges when deploying models across different languages.

Finally, the tool can help combat hallucinations and improve factual grounding. The research revealed that models have “default refusal circuits” for unknown queries, which are suppressed by “known answer” features. Hallucinations can occur when this inhibitory circuit “misfires.”

Source: Anthropic

Beyond debugging existing issues, this mechanistic understanding unlocks new avenues for fine-tuning LLMs. Instead of merely adjusting output behavior through trial and error, enterprises can identify and target the specific internal mechanisms driving desired or undesired traits. For instance, understanding how a model’s “Assistant persona” inadvertently incorporates hidden reward model biases, as shown in Anthropic’s research, allows developers to precisely re-tune the internal circuits responsible for alignment, leading to more robust and ethically consistent AI deployments.

As LLMs increasingly integrate into critical enterprise functions, their transparency, interpretability and control become increasingly important.
This new generation of tools can help bridge the gap between AI’s powerful capabilities and human understanding, building foundational trust and ensuring that enterprises can deploy AI systems that are reliable, auditable, and aligned with their strategic objectives. source

Stop guessing why your LLMs break: Anthropic’s new tool shows you exactly what goes wrong Read More »

Which LLM should you use? Token Monster automatically combines multiple models and tools for you

Token Monster, a new AI chatbot platform, has launched its alpha preview, aiming to change how users interact with large language models (LLMs). Developed by Matt Shumer, co-founder and CEO of OthersideAI and its hit AI writing assistant Hyperwrite AI, Token Monster’s key selling point is its ability to route user prompts to the best available LLMs for the task at hand, delivering enhanced outputs by leveraging the strengths of multiple models.

Seven major LLMs are presently available through Token Monster. Once a user types something into the prompt entry box, Token Monster uses pre-prompts developed through iteration by Shumer himself to automatically analyze the user’s input, decide which combination of the available models and linked tools is best suited to answer it, and then provide a combined response leveraging the strengths of those models. The available LLMs include:

Anthropic Claude 3.5 Sonnet
Anthropic Claude 3.5 Opus
OpenAI GPT-4.1
OpenAI GPT-4o
Perplexity AI PPLX (for research)
OpenAI o3 (for reasoning)
Google Gemini 2.5 Pro

Unlike other chatbot platforms, Token Monster automatically identifies which LLM is best for specific tasks — as well as which LLM-connected tools would be helpful, such as web search or coding environments — and orchestrates a multi-model workflow. “We’re just building the connectors to everything and then a system that decides what to use when,” said Shumer. For instance, it might use Claude for creativity, o3 for reasoning, and PPLX for research, among others. This approach eliminates the need for users to manually choose the right model for each prompt, simplifying the process for anyone who wants high-quality, tailored results.
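In miniature, prompt-to-model routing of this kind can look like the sketch below. The routing table, trigger keywords and model names are assumptions for illustration only; the real product uses LLM-driven pre-prompts to pick models, not keyword matching:

```python
# Illustrative sketch of prompt-to-model routing in the spirit of Token
# Monster. Keywords and model names here are invented for illustration.

ROUTES = [
    ({"poem", "story", "creative"}, "claude-3.5-sonnet"),  # creativity
    ({"prove", "step by step", "logic"}, "o3"),            # reasoning
    ({"search", "latest", "research"}, "pplx"),            # web research
]
DEFAULT_MODEL = "gpt-4o"

def route(prompt: str) -> str:
    """Pick a model by scanning the prompt for trigger keywords."""
    lowered = prompt.lower()
    for keywords, model in ROUTES:
        if any(k in lowered for k in keywords):
            return model
    return DEFAULT_MODEL
```

The design point is the dispatch layer itself: the caller never names a model, only a task, which is what lets the platform swap or add models behind the scenes.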
Feature highlights

The alpha preview, which is currently free to sign up for at tokenmonster.ai, allows users to upload a range of file types, including Excel, PowerPoint, and Docs. It also includes features such as webpage extraction, persistent conversation sessions, and a “FAST mode” that auto-routes to the best model without user input.

At the heart of Token Monster is OpenRouter, a third-party service that acts as a gateway to multiple LLMs, and into which Shumer has invested a small sum, by his admission. This architecture lets Token Monster tap into a range of models from different providers without having to build separate integrations for each one.

Pricing and availability

As of right now, Token Monster does not charge a flat monthly fee. Instead, users only pay for the tokens they consume through OpenRouter, making it flexible for varying levels of usage. According to Shumer, this model was inspired by Cline, a tool that enables high-spending users to access unlimited AI power, allowing them to achieve better outputs by simply using more compute resources.

Multi-step workflows produce richer LLM responses

Token Monster’s AI workflows extend beyond simple prompt routing. In one example, the chatbot might start with a research phase using web search APIs, pass that data to o3 for identifying information gaps, then create an outline with Gemini 2.5 Pro, draft text with Claude Opus, and refine it with Claude 3.5 Sonnet. This multi-step orchestration is designed to provide richer, more complete answers than a single LLM might be able to generate alone.

The platform also includes the ability to save sessions, with data securely stored using the open source online database service Supabase. This ensures that users can return to ongoing projects without losing their work, while still giving them control over what data is saved and what is ephemeral.
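The multi-step workflow described above can be sketched as a staged pipeline, where each stage's output feeds the next. The transforms below are trivial string stand-ins; in the real product each stage would presumably be a call to the named model via OpenRouter (an assumption about the wiring, not Token Monster's actual code):

```python
# Staged-pipeline sketch of a multi-model workflow: research -> gap analysis
# -> outline -> draft -> refine. Model names mirror the article's example;
# the transforms are illustrative stand-ins, not real LLM calls.

def run_pipeline(topic: str, stages) -> str:
    artifact = topic
    for model, transform in stages:
        # In reality: send `artifact` to `model` and take its completion.
        artifact = transform(artifact)
    return artifact

stages = [
    ("pplx", lambda t: f"research on {t}"),             # gather sources
    ("o3", lambda t: f"gaps in {t}"),                   # find information gaps
    ("gemini-2.5-pro", lambda t: f"outline from {t}"),  # structure
    ("claude-opus", lambda t: f"draft of {t}"),         # draft
    ("claude-3.5-sonnet", lambda t: f"polished {t}"),   # refine
]

result = run_pipeline("the report topic", stages)
```

Because every stage consumes the previous stage's artifact, the orchestrator only has to manage a linear hand-off, which is what makes this kind of chain easy to reorder or extend.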
A non-traditional CEO

In a notable experiment, Token Monster’s leadership has been handed over to Anthropic’s Claude model. Shumer announced that he is committed to following every decision made by “CEO Claude,” calling it a test to see whether an AI can manage a business effectively. “Either we’ve revolutionized management forever or made a huge mistake,” he wrote on X.

Emerging from the Reflection 70B controversy

Token Monster’s launch comes less than a year after Shumer faced controversy over his launch and ultimate retraction of Reflection 70B, a fine-tuned version of Meta’s Llama 3.1 that was initially touted as the best-performing open-source model in the world. The model quickly became subject to criticism and accusations of fraud after third-party researchers were unable to reproduce its stated performance on third-party benchmark tests. Shumer apologized and said the issues were born out of mistakes made due to speed. The episode underscored the challenges and risks of rapid AI development and the importance of transparency in model releases.

MCP integrations coming next

Shumer said his team on Token Monster is also exploring new capabilities, such as integrating with Model Context Protocol (MCP) servers, which allow websites and companies to let LLMs make use of their knowledge, tools, and products to achieve higher-order tasks than just text or image generation. This would enable Token Monster to connect with a user’s internal data and services, opening possibilities for it to handle tasks like managing customer support tickets or interfacing with other business systems.

Shumer emphasized that Token Monster is still very much in its early stages. While it already supports a suite of powerful features, the platform remains an alpha product and is expected to see rapid iterations and updates as more users provide feedback. “We’re going to keep iterating and adding things,” he said.
A promising experiment

For users who want to take advantage of the combined power of multiple LLMs without the hassle of model switching, Token Monster could be an appealing choice. It’s designed to work for people who don’t want to spend hours tweaking prompts or testing different models themselves, instead letting the system’s automated routing and multi-step workflows handle the complexity. As Token Monster’s capabilities grow, it will be interesting to see how users and businesses adopt it — and how its experiment with AI-led management pans out. For now, it’s a promising addition to the rapidly expanding landscape of AI chatbots and digital assistants. source

Which LLM should you use? Token Monster automatically combines multiple models and tools for you Read More »

Google claims Gemini 2.5 Pro preview beats DeepSeek R1 and Grok 3 Beta in coding performance

Google has released an updated preview of Gemini 2.5 Pro, its “most intelligent” model, first announced in March and upgraded in May, intending to release the same model to general availability in a couple of weeks.

Enterprises can test building new applications or replace earlier versions with an updated version of the “I/O edition” of Gemini 2.5 Pro that, according to a blog post by Google, is more creative in its responses and outperforms other models in coding and reasoning.

Our latest Gemini 2.5 Pro update is now in preview. It’s better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads @lmarena_ai with a 24pt Elo score jump since the previous version. We also… pic.twitter.com/SVjdQ2k1tJ — Sundar Pichai (@sundarpichai) June 5, 2025

During its annual I/O developer conference in May, Google announced that it had updated Gemini 2.5 Pro to be better than its earlier iteration, which it had quietly released. Google DeepMind CEO Demis Hassabis said the I/O edition is the company’s best coding model yet.

But this new preview, called Gemini 2.5 Pro Preview 06-05 Thinking, is even better than the I/O edition. The stable version Google plans to release publicly is “ready for enterprise-scale capabilities.” The I/O edition, or gemini-2.5-pro-preview-05-06, was first made available to developers and enterprises in May through Google AI Studio and Vertex AI. Gemini 2.5 Pro Preview 06-05 Thinking can be accessed via the same platforms.

Performance metrics

This new version of Gemini 2.5 Pro performs even better than the first release. Google said the new version of Gemini 2.5 Pro improved by 24 points in LMArena and by 35 points in WebDevArena, where it currently tops the leaderboard.
The company’s benchmark tests showed that the model outscored competitors like OpenAI’s o3, o3-mini, and o4-mini, Anthropic’s Claude 4 Opus, Grok 3 Beta from xAI and DeepSeek R1.

“We’ve also addressed feedback from our previous 2.5 Pro releases, improving its style and structure — it can be more creative with better-formatted responses,” Google said in the blog post.

What enterprises can expect

Google’s continuous improvement of Gemini 2.5 Pro might be confusing for many, but Google previously framed these updates as a response to community feedback. Pricing for the new version is $1.25 per million input tokens (without caching) and $10 per million output tokens.

When the very first version of Gemini 2.5 Pro launched in March, VentureBeat’s Matt Marshall called it “the smartest model you’re not using.” Since then, Google has integrated the model into many of its new applications and services, including “Deep Think,” where Gemini considers multiple hypotheses before responding.

The release of Gemini 2.5 Pro, and its two upgraded versions, revived Google’s place in the large language model space after competitors like DeepSeek and OpenAI diverted the industry’s attention to their reasoning models.

Within just a few hours of the announcement, developers had already begun playing around with the updated Gemini 2.5 Pro. While many found the update to live up to Google’s promise of being faster, the jury is still out on whether this latest Gemini 2.5 Pro actually performs better.

First hour with “Gemini 2.5 Pro Preview 06-05”
Positives:
– It’s faster
– It produces more output
– It has a better macro play (multi file edits, better overview)
– Output structure is better (readable)
– It’s more concise and LESS APOLOGETIC!!
Before: “You are absolutely… — Patrick Bade (@nishffx) June 5, 2025

you guys cooked, really enjoying the app builder. made a game and tested it out, it was using imagen to build assets on the fly ? and it’s up, hosted, easy to share.
Really the best no-experience no-code builder yet. keep building out the vibe app marketplace, this could… — bone (@boneGPT) June 5, 2025

Gemini 2.5 Pro Preview is pretty good.. used it yesterday for deep research and the results are better than some of the big names.. — Janak (@janaks09) June 5, 2025

source

Google claims Gemini 2.5 Pro preview beats DeepSeek R1 and Grok 3 Beta in coding performance Read More »

Model Context Protocol: A promising AI integration layer, but not a standard (yet)

In the past couple of years, as AI systems have become capable of not just generating text but taking actions, making decisions and integrating with enterprise systems, they have brought additional complexities. Each AI model has its own proprietary way of interfacing with other software. Every system added creates another integration jam, and IT teams are spending more time connecting systems than using them. This integration tax is not unique: It’s the hidden cost of today’s fragmented AI landscape.

Anthropic’s Model Context Protocol (MCP) is one of the first attempts to fill this gap. It proposes a clean, stateless protocol for how large language models (LLMs) can discover and invoke external tools with consistent interfaces and minimal developer friction. This has the potential to transform isolated AI capabilities into composable, enterprise-ready workflows. In turn, it could make integrations standardized and simpler. Is it the panacea we need?

Before we delve in, let us first understand what MCP is all about. Right now, tool integration in LLM-powered systems is ad hoc at best. Each agent framework, each plugin system and each model vendor tends to define its own way of handling tool invocation, which reduces portability. MCP offers a refreshing alternative:

A client-server model, where LLMs request tool execution from external services;
Tool interfaces published in a machine-readable, declarative format;
A stateless communication pattern designed for composability and reusability.

If adopted widely, MCP could make AI tools discoverable, modular and interoperable, similar to what REST (REpresentational State Transfer) and OpenAPI did for web services.
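The declarative, discover-and-invoke pattern described above can be sketched in miniature. The descriptor fields below mirror the general shape of machine-readable tool definitions (a name, a description, a JSON-Schema-style input contract), but they are illustrative assumptions, not MCP's actual wire format:

```python
# Sketch of declarative tool publication and stateless invocation, in the
# spirit of MCP. Field names and the registry shape are assumptions for
# illustration -- consult the MCP specification for the real protocol.

TOOLS = {
    "get_weather": {
        "description": "Current weather for a city",
        "input_schema": {"type": "object", "required": ["city"]},
        "handler": lambda args: f"Sunny in {args['city']}",
    }
}

def invoke(tool_name: str, args: dict) -> str:
    """Stateless dispatch: look up the tool, check required args, call it."""
    tool = TOOLS[tool_name]
    for field in tool["input_schema"].get("required", []):
        if field not in args:
            raise ValueError(f"missing required argument: {field}")
    return tool["handler"](args)
```

The point of the declarative layer is that a client (or an LLM) can read `TOOLS` to discover what is callable and with what arguments, without any tool-specific integration code.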
Why MCP is not (yet) a standard

While MCP is an open-source protocol developed by Anthropic and has recently gained traction, it is important to recognize what it is — and what it is not. MCP is not yet a formal industry standard. Despite its open nature and rising adoption, it is still maintained and guided by a single vendor, primarily designed around the Claude model family.

A true standard requires more than just open access. There should be an independent governance group, representation from multiple stakeholders and a formal consortium to oversee its evolution, versioning and any dispute resolution. None of these elements are in place for MCP today.

This distinction is more than technical. In recent enterprise implementation projects involving task orchestration, document processing and quote automation, the absence of a shared tool interface layer has surfaced repeatedly as a friction point. Teams are forced to develop adapters or duplicate logic across systems, which leads to higher complexity and increased costs. Without a neutral, broadly accepted protocol, that complexity is unlikely to decrease.

This is particularly relevant in today’s fragmented AI landscape, where multiple vendors are exploring their own proprietary or parallel protocols. For example, Google has announced its Agent2Agent protocol, while IBM is developing its own Agent Communication Protocol. Without coordinated efforts, there is a real risk of the ecosystem splintering rather than converging, making interoperability and long-term stability harder to achieve.

Meanwhile, MCP itself is still evolving, with its specifications, security practices and implementation guidance being actively refined. Early adopters have noted challenges around developer experience, tool integration and robust security, none of which are trivial for enterprise-grade systems. In this context, enterprises must be cautious.
While MCP presents a promising direction, mission-critical systems demand predictability, stability and interoperability, which are best delivered by mature, community-driven standards. Protocols governed by a neutral body ensure long-term investment protection, safeguarding adopters from unilateral changes or strategic pivots by any single vendor. For organizations evaluating MCP today, this raises a crucial question: How do you embrace innovation without locking into uncertainty? The next step isn’t to reject MCP, but to engage with it strategically: Experiment where it adds value, isolate dependencies and prepare for a multi-protocol future that may still be in flux.

What tech leaders should watch for

While experimenting with MCP makes sense, especially for those already using Claude, full-scale adoption requires a more strategic lens. Here are a few considerations:

1. Vendor lock-in: If your tools are MCP-specific, and only Anthropic supports MCP, you are tied to their stack. That limits flexibility as multi-model strategies become more common.

2. Security implications: Letting LLMs invoke tools autonomously is powerful and dangerous. Without guardrails like scoped permissions, output validation and fine-grained authorization, a poorly scoped tool could expose systems to manipulation or error.

3. Observability gaps: The “reasoning” behind tool use is implicit in the model’s output. That makes debugging harder. Logging, monitoring and transparency tooling will be essential for enterprise use.

4. Tool ecosystem lag: Most tools today are not MCP-aware. Organizations may need to rework their APIs to be compliant or build middleware adapters to bridge the gap.

Strategic recommendations

If you are building agent-based products, MCP is worth tracking.
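The "isolate dependencies" advice can be made concrete with a small adapter seam: application code depends on a neutral interface, and the protocol-specific client lives behind a single adapter class. All names below are invented for illustration:

```python
# Adapter-seam sketch: app code talks to ToolTransport, never to an
# MCP-specific client. Class and method names are assumptions, not any
# vendor's API.

from abc import ABC, abstractmethod

class ToolTransport(ABC):
    """Protocol-agnostic seam between the application and any tool protocol."""
    @abstractmethod
    def call(self, tool: str, args: dict) -> str: ...

class FakeMCPTransport(ToolTransport):
    # A real implementation would speak MCP; this stub records calls instead.
    def __init__(self):
        self.calls = []
    def call(self, tool, args):
        self.calls.append((tool, args))
        return f"{tool} ok"

def summarize(transport: ToolTransport, doc_id: str) -> str:
    # App code depends only on ToolTransport, so swapping MCP for, say,
    # Agent2Agent means writing one new adapter, not rewriting call sites.
    return transport.call("summarize_document", {"id": doc_id})
```

If MCP's spec shifts, or a vendor-neutral successor emerges, the blast radius of the change is confined to one adapter class.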
Adoption should be staged:

Prototype with MCP, but avoid deep coupling;
Design adapters that abstract MCP-specific logic;
Advocate for open governance, to help steer MCP (or its successor) toward community adoption;
Track parallel efforts from open-source players like LangChain and AutoGPT, or industry bodies that may propose vendor-neutral alternatives.

These steps preserve flexibility while encouraging architectural practices aligned with future convergence.

Why this conversation matters

Based on experience in enterprise environments, one pattern is clear: The lack of standardized model-to-tool interfaces slows down adoption, increases integration costs and creates operational risk. The idea behind MCP is that models should speak a consistent language to tools. Prima facie, this is not just a good idea, but a necessary one. It is a foundational layer for how future AI systems will coordinate, execute and reason in real-world workflows. The road to widespread adoption is neither guaranteed nor without risk. Whether MCP becomes that standard remains to be seen. But the conversation it is sparking is one the industry can no longer avoid.

Gopal Kuppuswamy is co-founder of Cognida.

source

Model Context Protocol: A promising AI integration layer, but not a standard (yet) Read More »

OpenAI hits 3M business users and launches workplace tools to take on Microsoft

OpenAI announced Wednesday that its business user base has surged 50% since February, reaching 3 million paying enterprise customers, as the artificial intelligence company unveiled an expansive suite of new workplace tools designed to compete directly with Microsoft’s enterprise AI offerings.

The milestone, revealed alongside the launch of several new business-focused features, underscores OpenAI’s aggressive push into corporate markets, where reliable, secure AI tools can command premium prices. The company introduced new “connectors” that integrate ChatGPT with popular business applications, a meeting transcription feature called Record Mode, and enhanced versions of its Deep Research and Codex coding tools.

“ChatGPT is helping transform businesses by helping employees work with more productivity, efficiency, and more strategically,” an OpenAI spokesperson told VentureBeat. “Over the last few months, we’ve continued evolving ChatGPT into an increasingly impactful platform for work with business products like connectors, record mode with ChatGPT, Codex, image generation, deep research, and more.”

The rapid enterprise adoption comes as OpenAI faces intensifying competition from tech giants like Microsoft and Google, which offer deep workplace integrations through existing enterprise relationships. Yet the company appears to be winning customers by positioning itself as the premier destination for cutting-edge AI capabilities.

“Customers often choose ChatGPT for direct access to SOTA (state-of-the-art) models and tools, combined with enterprise-grade security and commitments on never training on business data,” the spokesperson said, emphasizing OpenAI’s competitive advantage as an “AI-native” company focused solely on advancing artificial intelligence rather than integrating it into legacy systems.
OpenAI’s new workplace connectors challenge Microsoft and Google’s enterprise AI dominance

The newly announced connectors represent OpenAI’s most direct challenge yet to Microsoft’s workplace AI strategy. The integrations allow workers to access company data stored in Dropbox, Box, SharePoint, OneDrive, and Google Drive directly through ChatGPT, eliminating the need to switch between applications.

The connectors also extend to OpenAI’s Deep Research feature, an AI agent that conducts multi-step research tasks by gathering and synthesizing information from both external sources and internal company data. Deep Research connectors now work with HubSpot, Linear, and various Microsoft and Google tools, enabling the creation of comprehensive research reports that combine web data with proprietary business insights.

“Every organization holds vast knowledge, but it’s often trapped in silos,” OpenAI explained in its announcement. The company’s goal is to “evolve ChatGPT into a platform that unlocks your organization’s entire knowledge base — enabling each employee to continuously leverage this knowledge.”

Record Mode, available to Team users, automatically transcribes and summarizes meetings while generating actionable items and integrating with internal documents. The feature represents OpenAI’s entry into a market dominated by services like Otter.ai and Microsoft’s own transcription tools.

Perhaps most significantly, OpenAI expanded access to its Codex software engineering agent, powered by the new codex-1 model based on the company’s upcoming o3 reasoning system. Codex can write code, fix bugs, and propose pull requests while working in isolated cloud environments, offering enterprises a powerful tool for accelerating software development.
Data security and privacy remain key hurdles for enterprise ChatGPT adoption

Despite the growth, OpenAI continues to face questions about data security and privacy — critical concerns for enterprise customers handling sensitive business information. When asked about companies’ hesitations to input confidential data into ChatGPT, particularly given recent AI security incidents across the industry, the OpenAI spokesperson directed attention to the company’s security policies without providing specific details. “Security is critical at OpenAI–more details here,” the spokesperson said, referring to the company’s published security documentation.

The response highlights ongoing challenges for AI companies seeking enterprise adoption. Many organizations remain cautious about cloud-based AI services, particularly after high-profile data breaches and concerns about how AI models are trained and where sensitive information might be stored.

OpenAI has attempted to address these concerns by implementing enterprise-grade security measures and promising never to train its models on business customer data. However, the company’s rapid growth and the complex technical nature of large language models continue to generate skepticism among some IT decision-makers.

Sam Altman says AI is ready for enterprise deployment as competition heats up

OpenAI’s enterprise push occurs amid a broader transformation in how businesses adopt artificial intelligence. Recent industry analysis suggests that AI adoption is accelerating faster than any previous technology in history, with companies moving beyond experimental pilots to production deployments.
“Certainly, what you are seeing with enterprises and AI is that the people making the early bets and learning very quickly are doing much better than the people who are waiting to see how it’s all going to shake out,” OpenAI CEO Sam Altman said recently at the Snowflake Summit in San Francisco, advising enterprise leaders to “just do it” when it comes to AI adoption.

This represents a notable shift in Altman’s messaging. A year ago, he advised companies to experiment cautiously with AI rather than deploy it in critical business processes. Now, he argues that AI capabilities have matured sufficiently for production use in most enterprise contexts.

The competitive landscape has also intensified significantly. While OpenAI dominates public attention and developer mindshare, the company faces mounting pressure from well-funded rivals. Anthropic, the AI safety-focused startup founded by former OpenAI researchers, has been successfully recruiting top talent from both OpenAI and Google’s DeepMind division, according to recent talent analysis.

Meanwhile, Microsoft’s integration of OpenAI technologies into its Office suite and the recent launch of free Sora video generation through Bing demonstrate how the partnership between the two companies continues to evolve. Microsoft’s announcement that Bing users can now access OpenAI’s Sora video creation tool for free — bypassing the $20 monthly ChatGPT subscription requirement — illustrates the complex dynamics of their relationship.

Deep research and coding capabilities give OpenAI a competitive edge in enterprise markets

OpenAI’s enterprise success stems largely from its technical capabilities, particularly in reasoning and research tasks. The company’s Deep Research feature, powered by a version of the upcoming o3 model, represents a

OpenAI hits 3M business users and launches workplace tools to take on Microsoft Read More »

OpenAI’s Sora is now available for FREE to all users through Microsoft Bing Video Creator on mobile

OpenAI‘s Sora was one of the most hyped releases of the AI era, launching in December 2024, nearly 10 months after it was first previewed to awe-struck reactions due to its — at the time, at least — unprecedented level of realism, camera dynamism and prompt adherence, along with generation clips up to 60 seconds long.

However, much of the luster has worn off as numerous other AI video generators — from U.S. startups Runway and Luma to Chinese competitors Kling and Hailuo MiniMax to Israel’s LTX Studio — are offering generative AI video models and applications for consumers and enterprise users that rival or have already surpassed OpenAI’s offering. Also, we still haven’t gotten 60-second generations from a single Sora prompt (as far as I know, the maximum appears to be 20 seconds).

But now OpenAI and its ally/investor/frenemy Microsoft are seeking to bring Sora to far more users — for free (at least for a few generations). Today, Microsoft announced that Sora is now being offered through its Bing Video Creator feature on the free Bing mobile app for iOS (Apple iPhone and App Store) and Android (Google Play Store). That’s an incredible value, given that to get it through ChatGPT and OpenAI, you’ll need to pay for a ChatGPT Plus ($20 monthly) or Pro ($200 monthly) subscription.

Bing Video Creator with Sora is the latest in a series of AI-driven offerings from Microsoft, following the release of Bing Image Creator and Copilot. As Microsoft Corporate Vice President (CVP) and Head of Search Jordi Ribas wrote on X: “Two years ago, Bing was the first product to ship image creation for free for our users. Today, I’m excited to share that Bing Video Creator is now available in the Bing mobile app, everywhere that Bing Image Creator is available worldwide. Powered by Sora, Bing Video Creator transforms your text prompts into short videos.
Just describe what you want to see and watch your vision come to life.”

To introduce Bing Video Creator, Microsoft has released a promotional video ad that showcases how the tool brings creative ideas to life. The ad demonstrates users typing prompts like “Create a hummingbird flapping its wings in ultra slow motion,” “A turtle drifting slowly through a neon coral canyon,” and “A tiny astronaut exploring a giant mushroom planet.” The AI then generates short, vibrant video clips based on these prompts. The video emphasizes how easy it is to create and share these videos, including an example of the astronaut video being shared in a chat and receiving positive reactions.

Free 5-second vertical video creations on mobile — with horizontal videos coming soon

Bing Video Creator turns text prompts into five-second AI-generated videos. It does not yet support image-to-video or video-to-video generations (which many other rival AI video generators, including OpenAI’s own implementation of Sora, do).

To use the tool, users can open the Bing mobile app, tap the menu in the bottom right corner, and select “Video Creator.” Alternatively, you can launch the video creation process by typing a prompt directly into the Bing search bar in the app, beginning with “Create a video of…” Once the prompt is entered, Bing Video Creator generates a short video based on the description. For example, a prompt like “In a busy Italian pizza restaurant, a small otter works as a chef and wears a chef’s hat and an apron. He kneads the dough with his paws and is surrounded by other pizza ingredients” would result in an engaging, AI-generated five-second video.
Currently, videos are available in 9:16 portrait format — that is, vertical, perfect for TikTok and YouTube Shorts — though Microsoft says in its announcement blog post that a 16:9 (landscape, or horizontal) aspect ratio option is “coming soon.” Users can queue up to three video generations at a time, and each creation is stored for up to 90 days. Once a video is ready, it can be downloaded, shared via email or social media, or accessed through a direct link.

Bing Video Creator will be available worldwide today, except in China and Russia. It’s available now on the Bing mobile app, and desktop and Copilot Search versions are also said to be launching “soon.”

Free to use for 10 fast generations, unlimited slow generations

Bing Video Creator is free for all users. Each user is allowed ten “Fast” video generations, which can create videos in seconds. After using these, users can continue with Standard-speed generations — which take minutes — at no cost, or redeem 100 Microsoft Rewards points for each additional Fast creation. Those reward points come from Microsoft’s free, opt-in program that allows users to earn points for everyday activities — like searching with Bing, shopping in the Microsoft Store, or playing games with Xbox Game Pass. To participate, users must sign in with a Microsoft account and activate their Rewards dashboard.

Beyond fun videos and social media posts, Bing Video Creator is positioned as a tool for enhancing everyday communication and creativity. Bing’s announcement encourages users to create videos to celebrate special moments, test creative ideas, and communicate more effectively. To help users get the best results, Bing suggests providing descriptive prompts, incorporating action-oriented language and experimenting with tone and style, such as cinematic or playful aesthetics.
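The free-tier arithmetic above (ten free Fast generations, then 100 Rewards points per extra Fast creation, with Standard-speed generations free and unlimited) works out as follows; this helper is an illustrative sketch, not part of any Microsoft API:

```python
# Back-of-envelope sketch of Bing Video Creator's free-tier math: 10 free
# "Fast" generations, then 100 Microsoft Rewards points per additional Fast
# creation. Standard-speed generations remain free and unlimited.

FREE_FAST = 10
POINTS_PER_FAST = 100

def fast_generations_available(points_balance: int) -> int:
    """Total Fast creations a user can make given a Rewards points balance."""
    return FREE_FAST + points_balance // POINTS_PER_FAST

# A user holding 350 points gets 10 free + 3 redeemed = 13 Fast generations.
fast_generations_available(350)
```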
Responsible AI and safety, built-in

Microsoft says that Bing Video Creator is designed according to its Responsible AI principles, leveraging C2PA standards for content credentials to help identify AI-generated content. The tool also includes moderation features that automatically block prompts that could generate harmful or unsafe videos.

Implications for enterprises and technical decision-makers

Although Bing Video Creator is currently framed as a consumer-focused tool, its underlying technology and capabilities could have interesting implications for enterprise users — particularly those involved in AI orchestration, data engineering and AI model deployment. For AI engineers responsible for deploying and fine-tuning large language models, Bing Video Creator highlights the growing

OpenAI’s Sora is now available for FREE to all users through Microsoft Bing Video Creator on mobile Read More »