VentureBeat

OpenAI’s o1 model doesn’t show its thinking, giving open source an advantage

OpenAI has ushered in a new reasoning paradigm in large language models (LLMs) with its o1 model, which recently received a major upgrade. But while OpenAI holds a strong lead in reasoning models, it may lose some ground to the open-source rivals that are quickly emerging.

Models like o1, sometimes referred to as large reasoning models (LRMs), use extra inference-time compute cycles to “think” more, review their responses and correct their answers. This enables them to solve complex reasoning problems that classic LLMs struggle with, and makes them especially useful for tasks such as coding, math and data analysis.

However, in recent days developers have shown mixed reactions to o1, especially after the updated release. Some have posted examples of o1 accomplishing incredible tasks, while others have expressed frustration over the model’s confusing responses, with problems ranging from illogical changes to code to ignored instructions.

Secrecy around o1 details

Part of the confusion is due to OpenAI’s secrecy and refusal to reveal the details of how o1 works. The secret sauce behind the success of LRMs is the extra tokens the model generates as it works toward the final response, referred to as the model’s “thoughts” or “reasoning chain.” For example, if you prompt a classic LLM to generate code for a task, it will immediately generate the code. In contrast, an LRM will generate reasoning tokens that examine the problem, plan the structure of the code, and generate multiple solutions before emitting the final answer.

o1 hides the thinking process and only shows the final response, along with a message that displays how long the model thought and possibly a high-level overview of the reasoning process. This is partly to avoid cluttering the response and to provide a smoother user experience.
But more importantly, OpenAI treats the reasoning chain as a trade secret and wants to make it difficult for competitors to replicate o1’s capabilities. The costs of training new models continue to grow while profit margins are not keeping pace, which is pushing some AI labs to become more secretive in order to extend their lead. Even Apollo Research, which red-teamed the model, was not given access to its reasoning chain. This lack of transparency has led users to make all kinds of speculations, including accusing OpenAI of degrading the model to cut inference costs.

Open-source models are fully transparent

On the other hand, open-source alternatives such as Alibaba’s Qwen with Questions (QwQ) and Marco-o1 show the full reasoning chain of their models. Another alternative is DeepSeek R1, which is not open source but still reveals the reasoning tokens. Seeing the reasoning chain enables developers to troubleshoot their prompts and find ways to improve the model’s responses by adding instructions or in-context examples. Visibility into the reasoning process is especially important when you want to integrate the model’s responses into applications and tools that expect consistent results.

Moreover, having control over the underlying model is important in enterprise applications. Private models, and the scaffolding that supports them (such as the safeguards and filters that test their inputs and outputs), are constantly changing. While this may result in better overall performance, it can break many prompts and applications built on top of them. In contrast, open-source models give the developer full control of the model, which can be a more robust option for enterprise applications where performance on very specific tasks matters more than general skills. QwQ and R1 are still in preview versions, and o1 has the lead in terms of accuracy and ease of use.
And for many uses, such as general ad hoc prompts and one-time requests, o1 can still be a better option than the open-source alternatives. But the open-source community is quick to catch up with private models, and we can expect more reasoning models to emerge in the coming months. They can become a suitable alternative where visibility and control are crucial.
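In practice, working with a model that reveals its chain means the client separates the reasoning tokens from the final answer before displaying or logging them. A minimal sketch in Python, assuming the `<think>...</think>` delimiter convention that DeepSeek R1 uses (other models mark the chain differently, so the delimiter here is an assumption):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate delimited reasoning tokens from the final answer.

    Assumes the chain is wrapped in <think>...</think>, as DeepSeek R1
    does; a model that hides its chain simply yields an empty first part.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()  # no visible chain: behave like a classic LLM
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()  # everything after the chain
    return reasoning, answer

raw = "<think>The user wants a stable sort; use sorted().</think>Use sorted(xs)."
chain, answer = split_reasoning(raw)
```

Having the `chain` string available is what makes prompt troubleshooting possible: you can inspect where the model’s plan went wrong instead of guessing from the answer alone.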


UAE’s Falcon 3 challenges open-source leaders amid surging demand for small AI models

The UAE government-backed Technology Innovation Institute (TII) has announced the launch of Falcon 3, a family of open-source small language models (SLMs) designed to run efficiently on lightweight, single-GPU infrastructures. Falcon 3 comes in four sizes (1B, 3B, 7B and 10B parameters), each with base and instruct variants, promising to democratize access to advanced AI capabilities for developers, researchers and businesses. According to the Hugging Face leaderboard, the models already outperform or closely match popular open-source counterparts in their size class, including Meta’s Llama and category leader Qwen 2.5.

The launch comes at a time when demand for SLMs, which have fewer parameters and simpler designs than LLMs, is growing rapidly due to their efficiency, affordability and ability to run on resource-constrained devices. They suit a range of applications across industries, such as customer service, healthcare, mobile apps and IoT, where typical LLMs might be too computationally expensive to run effectively. According to Valuates Reports, the market for these models is expected to grow at a CAGR of nearly 18% over the next five years.

What does Falcon 3 bring to the table?

Trained on 14 trillion tokens, more than double its predecessor Falcon 2, the Falcon 3 family employs a decoder-only architecture with grouped query attention to share parameters and minimize memory usage for the key-value (KV) cache during inference. This enables faster and more efficient operations when handling diverse text-based tasks. The models natively support four languages (English, French, Spanish and Portuguese) and come equipped with a 32K context window, allowing them to process long inputs such as lengthy documents.
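The KV-cache saving from grouped query attention follows directly from the head counts: the cache scales with the number of key/value heads, which is smaller than the number of query heads. A back-of-the-envelope sketch (the head counts and layer counts below are illustrative, not Falcon 3’s published configuration):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    """Size of the KV cache: two tensors (keys and values) per layer,
    each of shape [batch, n_kv_heads, seq_len, head_dim], assuming
    16-bit elements by default."""
    return 2 * n_layers * batch * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative comparison: full multi-head attention (32 KV heads)
# vs grouped query attention sharing one KV head per 4 query heads.
mha = kv_cache_bytes(n_layers=40, n_kv_heads=32, head_dim=128, seq_len=32_768)
gqa = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, seq_len=32_768)
ratio = mha / gqa  # GQA shrinks the cache in proportion to the KV-head count
```

With a 32K context window this cache is a major share of inference memory, which is why sharing KV heads matters for single-GPU deployments.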
“Falcon 3 is versatile, designed for both general-purpose and specialized tasks, providing immense flexibility to users. Its base model is perfect for generative applications, while the instruct variant excels in conversational tasks like customer service or virtual assistants,” TII notes on its website.

According to the Hugging Face leaderboard, while all four Falcon 3 models perform fairly well, the 10B and 7B versions are the stars of the show, achieving state-of-the-art results on reasoning, language understanding, instruction following, code and mathematics tasks. Among models under the 13B-parameter size class, Falcon 3’s 10B and 7B versions outperform competitors including Google’s Gemma 2-9B, Meta’s Llama 3.1-8B, Mistral-7B and Yi 1.5-9B. They even surpass Alibaba’s category leader Qwen 2.5-7B on most benchmarks, such as MUSR, MATH, GPQA and IFEval, with the exception of MMLU, the benchmark that evaluates how well language models understand and process human language.

(Image: Falcon 3 benchmarks)

Deployment across industries

With the Falcon 3 models now available on Hugging Face, TII aims to serve a broad range of users, enabling cost-effective AI deployments without computational bottlenecks. With their ability to handle specific, domain-focused tasks with fast processing times, the models can power applications at the edge and in privacy-sensitive environments, including customer service chatbots, personalized recommender systems, data analysis, fraud detection, healthcare diagnostics, supply chain optimization and education.

The institute also plans to expand the Falcon family by introducing models with multimodal capabilities, expected to launch sometime in January 2025. Notably, all models have been released under the TII Falcon License 2.0, a permissive Apache 2.0-based license with an acceptable use policy that encourages responsible AI development and deployment.
To help users get started, TII has also launched the Falcon Playground, a testing environment where researchers and developers can try out Falcon 3 models before integrating them into their applications.


Writer’s new AI model aims to fix the ‘sameness problem’ in generative content

Writer, the fast-rising enterprise AI startup recently valued at $1.9 billion, has launched Palmyra Creative, a specialized AI model promising to change how businesses tackle creative tasks. Unlike traditional AI models, often criticized for their rigid, predictable outputs, Palmyra Creative introduces an approach aimed at fostering originality and breaking free from the sameness that has begun to plague AI-generated content.

“All the AI models sound remarkably similar,” said Writer’s chief technology officer Waseem AlShikh in an interview with VentureBeat. “What’s surprising is how quickly humans have learned to spot AI-generated text — not just specialists, but everyone can now identify it almost instantly.”

By addressing this “sameness problem,” Writer is positioning itself as a key player in the $1 trillion generative AI market, offering enterprises a tool that combines creativity with domain-specific expertise, a balance that few competitors have managed to achieve.

(Image: Writer’s Palmyra Creative model shown in a product demo, with customization controls for tailoring AI outputs to different business needs. Credit: Writer)

How Palmyra Creative thinks differently

Rather than chasing the industry trend of expanding training data, Writer has developed a fundamentally different approach to AI architecture. While most AI models rely on vast datasets and fine-tuning, Palmyra Creative uses merging techniques and adaptive model layering to restructure how the model interprets and generates language. “[We thought] let’s not actually focus on the training data,” AlShikh explained. “Can we focus on actually recreating the layering [within] the model so the model can see the token differently?
We trained three different models with three different datasets and used some techniques called merging techniques.” This method reorganizes how relationships between tokens are processed within the model, resulting in outputs that are more dynamic and less predictable. The result is a model capable of generating unique, engaging content without requiring monumental amounts of new training data. Writer’s approach is also cost-effective: training Palmyra Creative cost just $700,000, a fraction of the $4.6 million it reportedly cost competitors like OpenAI to train similarly sized models.

Creativity meets accuracy: Guardrails for enterprise use

Palmyra Creative doesn’t just produce creative outputs — it does so while maintaining high levels of accuracy and reliability, thanks to Writer’s new “claim detection” system. This feature ensures that creative outputs generated by the model remain grounded when combined with Writer’s industry-specific models, such as Palmyra Med for healthcare or Palmyra Fin for finance. “We had to put [in] a lot of guardrails because creative is great, but there is some limit when [what is] created actually could ruin the input,” said AlShikh. “When you have a chain of thought with multiple models, we create a cell layer to develop checks — what we call the claim system.” This system evaluates whether claims made by the creative model align with factual inputs from domain-specific models, ensuring outputs are as reliable as they are innovative. For instance, when integrating a healthcare model with Palmyra Creative, the system flags any divergence from established medical facts, allowing enterprises to maintain compliance and trustworthiness.

Measuring creativity: A unique challenge

Unlike traditional benchmarks such as Stanford’s HELM or MMLU, which evaluate models on accuracy and reasoning, creativity doesn’t fit neatly into established metrics.
To address this, Writer developed a new evaluation framework, employing a team of 20 linguists who spent three weeks analyzing Palmyra Creative’s outputs. The company also introduced a benchmark system that measures token uniqueness across multiple generations of outputs. “We try to work differently,” AlShikh said. “We measure creativity by looking at token uniqueness and how relationships between tokens differ from the training data. It’s a way to quantify originality.” Writer plans to publish this benchmark as an open-source tool in January, potentially setting a new industry standard for evaluating creative AI.

Real-world applications: Creativity in action

Palmyra Creative is already being used to address a variety of creative challenges in industries like marketing, finance and product development. For example, the model can help businesses brainstorm original strategies, such as devising loyalty programs or crafting unconventional marketing campaigns, while maintaining brand distinctiveness. In one demonstration, Palmyra Creative suggested unique ideas for a small-town bakery competing with a national chain. Among its recommendations: hosting sensory baking sessions for seniors to recreate childhood treats, organizing community bake-offs for charity, and using gamified loyalty programs to engage customers. These kinds of imaginative, tailored solutions are precisely what enterprises need to stand out in competitive markets.

A billion-dollar bet on the future of enterprise AI

The Palmyra Creative launch comes at a pivotal time for Writer, which recently raised $200 million in Series C funding co-led by Premji Invest, Radical Ventures and ICONIQ Growth. With high-profile clients such as Salesforce, Uber, L’Oréal and Accenture, Writer is doubling down on its enterprise-first strategy, offering tools that promise measurable ROI and scalability.
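Writer has not yet published its benchmark, but a distinct-n style metric conveys the flavor of measuring token uniqueness across multiple generations. A toy sketch (my own construction for illustration, not Writer’s actual method):

```python
def distinct_n(generations, n=2):
    """Ratio of unique n-grams to total n-grams across a set of
    generations: a crude proxy for token uniqueness. 1.0 means no
    n-gram ever repeats; values near 0 indicate heavy repetition."""
    ngrams = []
    for text in generations:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

# Repetitive outputs share most bigrams; varied outputs share none.
samey = ["the cat sat on the mat", "the cat sat on the rug"]
varied = ["a bakery hosting sensory sessions", "gamified loyalty programs for regulars"]
```

A production benchmark would compare against the training distribution as AlShikh describes, but even this toy version distinguishes “same-sounding” generations from varied ones.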
“Enterprises don’t just need AI — they need AI that works for their unique challenges,” said Patrick Stokes, EVP of product at Salesforce, in a statement. “Writer provides a refined, AI-powered solution that’s effective, easy to deploy, and has rapidly accelerated our workflows.”

Writer’s partnership with Nvidia further underscores its commitment to enterprise scalability. Palmyra Creative is packaged as an NVIDIA NIM microservice, allowing businesses to deploy it across cloud, data center and edge environments with ease.

Can Writer outpace the competition?

By addressing the “sameness problem” in AI-generated content, Writer is staking its claim in a crowded market dominated by tech giants like OpenAI, Google, Microsoft and Anthropic. Its focus on creativity, coupled with cost-effective innovation and enterprise-grade reliability, gives it a unique edge. However, the road ahead won’t be easy. Competing in generative AI requires not only technical excellence but also robust governance frameworks to address emerging issues like bias and safety. Writer’s claim detection system and open-source benchmarks are promising steps, but the company will need to keep innovating to stay ahead. With the generative AI market


Engineered Arts restructures with $10M to create humanoid robots

Engineered Arts, a United Kingdom firm making humanoid robots, has restructured as a U.S. company and raised $10 million. The move to the U.S. is intended to expand its footprint and meet growing U.S. demand, while the new round will accelerate product refinement, manufacturing readiness, scaled production and investment in advanced business systems. This milestone brings Engineered Arts’ total funding to $16.2 million to date, and it advances the company’s mission to integrate humanoid robots into daily life with a human-focused approach to AI. Engineered Arts’ humanoid entertainment robots are designed to foster natural and intuitive interactions, enhancing experiences at businesses, science centers, theme parks and conventions with unforgettable, one-of-a-kind engagements.

Helium-3 Ventures led the Series A funding, with additional participation from AppDirect CEO Nicolas Desmarais, Belvoir Investments and a consortium of investors including ThirtySeven Holdings and Figueira Capital. Matthew Bellamy, frontman of the English rock band Muse and a partner in Helium-3 Ventures, will join Engineered Arts’ board as an observer.

“Our motto is simple: ‘Be wow!’” said Will Jackson, CEO of Engineered Arts, in a statement. “When you meet one of our robots, you’ll experience a connection to technology in the most human way possible. The saying goes, ‘The future is already here; you just haven’t seen it yet.’ We’re changing that. Get ready to experience the power of embodied AI.”

Scaling robots

(Image: Ameca is a humanoid robot from Engineered Arts.)

While many companies are just beginning to explore the development and commercialization of humanoid robotics, Engineered Arts has been a pioneer in the field for over 20 years.
With a proven track record, the company has deployed over 200 robots worldwide and developed six distinct humanoid robotic models, all ready to scale. Two years ago, footage of Ameca, its most advanced humanoid robot, went viral, captivating millions with videos showcasing its conversations with researchers.

The new funding will enable Engineered Arts to make its full-sized and desktop robots more accessible, launch a virtual robot character platform, and expand its cloud-based AI services to enhance product features and fleet deployment. Focusing on next-generation robot hardware development, Engineered Arts will improve dexterity and locomotion to bring humanoid robots closer to everyday functionality. Additionally, scaled support and regional offices will enable Engineered Arts to provide customization for specific use cases. The company plans to hire approximately 20 new employees for its Redwood City, California location over the next year and a half, ranging from top-level executives and sales staff to software, assembly and support engineers.

“We envision a world where the virtual seamlessly integrates into everyday life,” said Jackson in a statement. “Our robots are designed to support, entertain, inform and educate — providing a genuinely human-centric vision of AI-driven technology.”

Captivating and engaging applications

(Image: Ameca has human-like expressions.)

Engineered Arts’ humanoid robots are already making an impact. They serve marquee customers like Madison Square Garden’s Sphere in Las Vegas, where they deliver entertainment and drive customer engagement. Pharmaceutical giant GSK uses them to connect with attendees at trade events. At the Computer History Museum in Mountain View, California, Ameca sits at the center of an exhibit, “Chatbots Decoded: Exploring AI,” an immersive experience that takes visitors through the history, current landscape and future possibilities of chatbots and AI.
“Ameca is a milestone in the history of AI, bringing together decades of work in robotics, natural language processing, large language models, and more,” shared Kirsten Tashev, vice president and chief curatorial and exhibitions officer at CHM, in a statement. “It delivers a highly engaging, nearly mind-blowing experience for our visitors. With its lifelike expressions, dynamic personality, sharp sense of humor, and remarkable ability to ‘read the room,’ Ameca continually amazes and delights audiences of all ages — even younger visitors, who are notoriously hard to engage.”

Tashev cites the partnership with Engineered Arts as a significant contributor to the exhibit’s success. “Interactive experiences must be reliable, safe, and magical in the education and entertainment industry. It’s a tall order, but Engineered Arts masterfully does all three,” Tashev said.


Google unveils AI coding assistant ‘Jules,’ promising autonomous bug fixes and faster development cycles

Google unveiled “Jules” on Wednesday, an artificial intelligence coding assistant that can autonomously fix software bugs and prepare code changes while developers sleep, marking a significant advancement in the company’s push to automate core programming tasks.

The experimental AI-powered code agent, built on Google’s newly announced Gemini 2.0 platform, integrates directly with GitHub’s workflow system and can analyze complex codebases, implement fixes across multiple files, and prepare detailed pull requests without constant human supervision.

The timing of Jules’ release is strategic. As the software development industry grapples with a persistent talent shortage and mounting technical debt, automated coding assistants have become increasingly crucial. Market research firm Gartner estimates that by 2028, AI-assisted coding will be involved in 75% of new application development.

Unlike traditional coding assistants that merely suggest fixes, Jules operates as an autonomous agent within GitHub’s ecosystem. It analyzes codebases, creates comprehensive repair plans, and executes fixes across multiple files simultaneously. Most importantly, it integrates seamlessly with existing developer workflows.

During a press conference, Jaclyn Konzelmann, director of product management at Google Labs, emphasized the system’s safety features. “Developers are in control along the way,” she explained. “Jules presents a suggested plan before taking action, and users can monitor its progress writing code.” The system requires explicit approval before merging any changes, maintaining human oversight of the development process.
The rise of AI agents: How Jules fits into Google’s master plan

Jules represents more than just a coding assistant; it’s part of Google’s broader vision for AI agents that can operate autonomously while remaining under human supervision. The system is powered by Gemini 2.0, Google’s latest large language model, which brings significant improvements in code understanding and generation. “We’re early in our understanding of the full capabilities of AI agents for computer use,” Konzelmann acknowledged during the press conference. This cautious approach reflects broader industry concerns about AI safety and reliability, particularly in critical systems.

The human factor: What Jules means for developer jobs

For many developers, Jules raises important questions about the future of their profession. However, early testing suggests it’s more likely to enhance than replace human developers. At Lawrence Berkeley National Laboratory, researchers using Jules and related Google AI tools reduced certain analysis tasks from a week to minutes, allowing them to focus on more complex challenges.

The financial implications of Jules could be substantial. Software development projects typically run significant risks of cost overruns, with large IT projects running 45% over budget and delivering 56% less value than predicted, according to McKinsey. By automating routine bug fixes and maintenance tasks, Jules could significantly reduce these costs while accelerating development cycles. Google’s strategy also positions it competitively against Microsoft’s GitHub Copilot and Amazon’s CodeWhisperer. The integration with GitHub’s workflow gives Google a strong foothold in the developer tools market, estimated to reach $937 billion by 2027.

What’s next for AI-powered development

Jules will initially be available to a select group of trusted testers, with broader access planned for early 2025.
Google has already announced plans to integrate similar capabilities across its development ecosystem, including Android Studio and Chrome DevTools. The true test of Jules will be its ability to handle increasingly complex programming challenges while maintaining code quality and security. As one senior developer at a major tech firm noted, “The promise isn’t just about fixing bugs faster — it’s about fundamentally changing how we approach software development.”

In an industry where the cost of poor code quality reaches $2.84 trillion annually, according to CISQ, Jules might represent more than just another tool in the developer’s arsenal. It could mark the beginning of a new era in which AI and human developers work in genuine partnership, potentially reshaping the future of software development itself.


Lambda launches ‘inference-as-a-service’ API claiming lowest costs in AI industry

Lambda is a 12-year-old San Francisco company best known for offering graphics processing units (GPUs) on demand as a service to machine learning researchers and AI model builders and trainers. But today it is taking its offerings a step further with the launch of the Lambda Inference API (application programming interface), which it claims is the lowest-cost service of its kind on the market. The API lets enterprises deploy AI models and applications into production for end users without worrying about procuring or maintaining compute, and it complements Lambda’s existing focus on providing GPU clusters for training and fine-tuning machine learning models.

“Our platform is fully verticalized, meaning we can pass dramatic cost savings to end users compared to other providers like OpenAI,” said Robert Brooks, Lambda’s vice president of revenue, in a video call interview with VentureBeat. “Plus, there are no rate limits inhibiting scaling, and you don’t have to talk to a salesperson to get started.” In fact, as Brooks told VentureBeat, developers can head over to Lambda’s new Inference API webpage, generate an API key, and get started in less than five minutes.

Lambda’s Inference API supports leading-edge models such as Meta’s Llama 3.3 and 3.1, Nous’s Hermes-3 and Alibaba’s Qwen 2.5, making it one of the most accessible options for the machine learning community.
The full list of supported models includes:

deepseek-coder-v2-lite-instruct
dracarys2-72b-instruct
hermes3-405b
hermes3-405b-fp8-128k
hermes3-70b
hermes3-8b
lfm-40b
llama3.1-405b-instruct-fp8
llama3.1-70b-instruct-fp8
llama3.1-8b-instruct
llama3.2-3b-instruct
llama3.1-nemotron-70b-instruct
llama3.3-70b

Pricing begins at $0.02 per million tokens for smaller models like Llama-3.2-3B-Instruct and scales up to $0.90 per million tokens for larger, state-of-the-art models such as Llama 3.1-405B-Instruct. As Lambda cofounder and CEO Stephen Balaban put it recently on X, “Stop wasting money and start using Lambda for LLM Inference.” Balaban published a graph showing the company’s per-token cost for serving AI models through inference compared to rivals in the space. Furthermore, unlike many other services, Lambda’s pay-as-you-go model ensures customers pay only for the tokens they use, eliminating the need for subscriptions or rate-limited plans.

Closing the AI loop

Lambda has a decade-plus history of supporting AI advancements with its GPU-based infrastructure. From its hardware solutions to its training and fine-tuning capabilities, the company has built a reputation as a reliable partner for enterprises, research institutions and startups.

“Understand that Lambda has been deploying GPUs for well over a decade to our user base, and so we’re sitting on literally tens of thousands of Nvidia GPUs, and some of them can be from older life cycles and newer life cycles, allowing us to still get maximum utility out of those AI chips for the wider ML community, at reduced costs as well,” Brooks explained. “With the launch of Lambda Inference, we’re closing the loop on the full-stack AI development lifecycle.
The new API formalizes what many engineers had already been doing on Lambda’s platform — using it for inference — but now with a dedicated service that simplifies deployment.”

Brooks noted that its deep reservoir of GPU resources is one of Lambda’s distinguishing features, reiterating that “Lambda has deployed tens of thousands of GPUs over the past decade, allowing us to offer cost-effective solutions and maximum utility for both older and newer AI chips.” This GPU advantage enables the platform to support scaling to trillions of tokens monthly, providing flexibility for developers and enterprises alike.

Open and flexible

Lambda is positioning itself as a flexible alternative to cloud giants by offering unrestricted access to high-performance inference. “We want to give the machine learning community unrestricted access to inference APIs. You can plug and play, read the docs, and scale rapidly to trillions of tokens,” Brooks explained. The API supports a range of open-source and proprietary models, including popular instruction-tuned Llama models. The company has also hinted at expanding to multimodal applications, including video and image generation, in the near future. “Initially, we’re focused on text-based LLMs, but soon we’ll expand to multimodal models,” Brooks said.

Serving devs and enterprises with privacy and security

The Lambda Inference API targets a wide range of users, from startups to large enterprises, in media, entertainment and software development. These industries are increasingly adopting AI to power applications like text summarization, code generation and generative content creation. “There’s no retention or sharing of user data on our platform. We act as a conduit for serving data to end users, ensuring privacy,” Brooks emphasized, reinforcing Lambda’s commitment to security and user control.
As AI adoption continues to rise, Lambda’s new service is poised to attract attention from businesses seeking cost-effective solutions for deploying and maintaining AI models. By eliminating common barriers such as rate limits and high operating costs, Lambda hopes to empower more organizations to harness AI’s potential. The Lambda Inference API is available now, with detailed pricing and documentation accessible through Lambda’s website.
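Under pay-as-you-go pricing, the bill is a straight function of tokens consumed at the quoted per-million rates. A quick sketch of the arithmetic using the launch prices reported above:

```python
def inference_cost(tokens, price_per_million_usd):
    """Pay-as-you-go cost in dollars: you pay only for tokens used,
    with no subscription or rate-limited tier."""
    return tokens / 1_000_000 * price_per_million_usd

# Launch prices quoted in the article: $0.02/M for the smallest model,
# $0.90/M for the largest. 50M tokens at each rate:
small = inference_cost(50_000_000, 0.02)  # Llama-3.2-3B-Instruct tier
large = inference_cost(50_000_000, 0.90)  # Llama-3.1-405B-Instruct tier
```

The 45x spread between the two tiers is why routing routine requests to smaller models is the usual cost-control lever on services priced this way.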


Scaling AI: Platform best practices

This is a VB Lab Insights article presented by Capital One.

Enterprises are now deeply invested in how they build and continually evolve world-class enterprise platforms that enable AI use cases to be built, deployed, scaled and evolved over time. Many companies have historically taken a federated approach to platforms, building capabilities and features to support the bespoke needs of individual areas of their business. Today, however, advances like generative AI introduce new challenges that require an evolved approach to building and scaling enterprise platforms. These include the specialized talent and graphics processing unit (GPU) resources needed for training and hosting large language models, access to huge volumes of high-quality data, close collaboration across many teams to deploy agentic workflows, and a high level of maturity in the internal application programming interfaces (APIs) and tooling that multi-agent workflows require, to name a few. Disparate systems and a lack of standardization hinder companies’ ability to embrace the full potential of AI.

At Capital One, we’ve learned that large enterprises should be guided by a common set of best practices and platform standards to effectively deploy AI at scale. While the details will vary, four common principles help companies successfully deploy AI at scale and unlock value for their business:

1. Everything starts with the user

The goal of any enterprise platform is to empower users, so you must start with those users’ needs. Seek to understand how your users engage with your platforms, what problems they are trying to solve and any friction they are coming up against. At Capital One, for instance, a key tenet guiding our AI/ML platform teams is that we obsess over all aspects of the customer experience, even those we don’t directly oversee.
For example, we undertook a number of initiatives in recent years to solve the data and access management pain points for our users, even though we rely on other enterprise platforms for these. As you earn the trust and engagement of your users, you can innovate and reimagine the art of what’s possible with new ideas and by going “further up the stack.” This customer obsession is the foundation for building long-lasting and sustainable platforms. 2. Establishing a multi-tenant platform control plane Multi-tenancy is essential for any enterprise platform, allowing multiple business lines and distributed teams to use the core platform capabilities such as compute, storage, inference services, workflow orchestration, etc. in a shared but well-managed environment. It allows you to solve core data access pain points, allows abstraction, enables multiple compute patterns, and it simplifies the provisioning and management of compute instances for core services — for example, the large fleet of GPUs and Central Processing Units (CPUs) that AI/ML workloads require. With the right design of a multi-tenant platform control plane, you can integrate both best-in-class open-source and commercial software components, and scale flexibly as the platform evolves over time. At Capital One, we have developed a robust platform control plane with Kubernetes as the foundation, which scales to our large fleet of compute clusters on AWS, that are used by thousands of active AI/ML users across the company. We routinely experiment with and adopt best-in-class open-source and commercial software components as plug-ins, and develop our own proprietary capabilities where they give us a competitive edge. For the end-user, this enables access to the latest technologies and greater self-service capabilities, empowering teams to build and deploy on our platforms without having to call on our engineering teams for support.  3. 
Embedding automation and governance As you build a new platform, it’s critical to have the right mechanisms in place to collect logs and insights on models and features along the end-to-end lifecycle, as they are built, tested and deployed. Enterprises can automate core tasks such as lineage tracking, adherence to enterprise controls, observability, monitoring and detection across various layers of their platforms. By standardizing and automating these tasks, it is possible to cut weeks and in some cases, months of time from developing and deploying new mission-critical models and AI use cases. At Capital One, we’ve taken this a step further by building a marketplace of reusable components and software development kits (SDKs) that have built-in observability and governance standards. These empower our associates to find the reusable libraries, workflows and user-contributed code they need to develop AI models and apps with confidence knowing that the artifacts they are building on enterprise platforms are well-managed under the hood. In fact, at this point in our journey, we consider this level of automation and standardization as a competitive advantage. 4. Investing in talent and effective business routines Building state-of-the-art AI platforms requires a world-class, cross-functional team. An effective AI platform team must be multidisciplinary and diverse, inclusive of data scientists, engineers, designers,  product managers, cyber and model risk experts and more. Each of these team members brings with them unique skills and experiences and has a key role to play in building and iterating on an AI platform that works for all users and can be extensible over time.  At Capital One, we have made it our mission to partner cross-functionally across the company as we build and deploy our AI platform capabilities. 
As we’ve sought to evolve our organization and build up our AI workforce, we established the Machine Learning Engineer role in 2021 and more recently, the AI Engineer role, to recruit and retain the technical talent that will help us continue to stay at the frontier of AI and solve the most challenging problems in financial services. Along the way, establishing and communicating well-defined roadmaps and change controls for the platform users, and incorporating feedback loops into your planning and software delivery processes is critical to ensuring your users stay informed, can contribute to what’s coming, and understand the benefits of the platform strategy you’re putting in place. Future-proofing your foundations for AI Building or transforming enterprise platforms for the AI era is no small task, but it will set your business
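The multi-tenant control-plane idea described above can be illustrated with a toy sketch. This is not Capital One's implementation (which the article says is built on Kubernetes); it is a minimal, self-contained model of the core behavior: multiple tenants drawing on a shared compute fleet, with per-tenant quotas enforced at request time. All names and numbers are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Quota:
    """Per-tenant ceilings on shared compute resources."""
    gpus: int
    cpus: int

@dataclass
class Tenant:
    name: str
    quota: Quota
    allocated: Quota = field(default_factory=lambda: Quota(gpus=0, cpus=0))

class ControlPlane:
    """Toy multi-tenant control plane: tracks a shared fleet and
    enforces per-tenant quotas when workloads request capacity."""

    def __init__(self, fleet_gpus: int, fleet_cpus: int):
        self.fleet = Quota(gpus=fleet_gpus, cpus=fleet_cpus)
        self.tenants: dict[str, Tenant] = {}

    def register(self, name: str, gpus: int, cpus: int) -> Tenant:
        tenant = Tenant(name, Quota(gpus, cpus))
        self.tenants[name] = tenant
        return tenant

    def request(self, name: str, gpus: int = 0, cpus: int = 0) -> bool:
        """Grant capacity only if it fits both the tenant's own quota
        and the remaining capacity of the shared fleet."""
        t = self.tenants[name]
        within_quota = (t.allocated.gpus + gpus <= t.quota.gpus
                        and t.allocated.cpus + cpus <= t.quota.cpus)
        in_use_gpus = sum(x.allocated.gpus for x in self.tenants.values())
        in_use_cpus = sum(x.allocated.cpus for x in self.tenants.values())
        fleet_ok = (in_use_gpus + gpus <= self.fleet.gpus
                    and in_use_cpus + cpus <= self.fleet.cpus)
        if within_quota and fleet_ok:
            t.allocated.gpus += gpus
            t.allocated.cpus += cpus
            return True
        return False

# Two hypothetical business lines sharing one fleet of 8 GPUs / 64 CPUs.
plane = ControlPlane(fleet_gpus=8, fleet_cpus=64)
plane.register("fraud-ml", gpus=4, cpus=32)
plane.register("card-recs", gpus=4, cpus=32)
print(plane.request("fraud-ml", gpus=4))   # True: within quota and fleet
print(plane.request("card-recs", gpus=8))  # False: exceeds tenant quota
```

In a production system, the quota check and the allocation would live in a real control plane (e.g., Kubernetes namespaces with resource quotas) rather than in application code, but the invariant is the same: a shared fleet stays well-managed because no tenant can exceed its assigned share.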

Scaling AI: Platform best practices Read More »

NotebookLM updates Business to Plus with more audio, lets all users interact with AI hosts

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Google expanded access to the business version of its popular NotebookLM app, now called NotebookLM Plus, targeting enterprises, teams and individuals who rely on the app's research tools. The company also updated its podcast-like Audio Overview feature, which now lets users interact with the AI hosts and ask questions out loud. The research tool, which lets people gather information into "notebooks" and ask questions answered from the source material, launched in preview in July last year. It proved popular and became generally available in December. Originally built with Gemini 1.5, NotebookLM has been upgraded to an experimental version of Gemini 2.0 Flash, Google said. After the Google team noticed a wide range of use cases for NotebookLM, including many enterprise projects, the company launched NotebookLM Plus for enterprises, teams and individual power users. NotebookLM Plus offers five times as many Audio Overviews, notebooks and sources per notebook. Premium users can also customize the style and tone of notebooks, share notebooks with team members and see usage analytics. Google said it also added more privacy and security features. NotebookLM Plus can be accessed through Google Workspace or Google Agentspace. Next year, NotebookLM Plus will be included in the Google One AI Premium subscription. Google announced what was then called NotebookLM Business in October as a pilot program for new business-focused uses of the application. Audio interaction Audio Overviews, which let users generate an audio conversation based on the information in a notebook, came out in September and became an instant hit: the podcast-style conversation between two AI hosts offered an approachable way to digest complex information.
The tool features two AI-generated hosts chatting about the information in the notebooks; now, NotebookLM users can interject and ask questions using their voice to get more details or steer the conversation. Users can create a new Audio Overview, tap the "interactive" button and then click "join" while listening, and the AI hosts will call on the user to ask their question. Interacting with the hosts will be available only on new Audio Overviews, not existing ones. Google warned in a blog post that interacting with the Audio Overview is still experimental, and the "hosts may also pause awkwardly before responding or [may] occasionally introduce inaccuracies." Former NotebookLM product lead Raiza Martin had told VentureBeat that Google would introduce more controls and interactions for Audio Overviews. All-new look Google redesigned NotebookLM to help users "better manage content and ask the AI interface questions about their sources." The new look introduces three panels: a Sources panel for all the documents or files uploaded to NotebookLM; a Chat panel for the Gemini chat box used to interrogate data sources; and a Studio panel for creating study guides, briefing documents and Audio Overviews. "From the start, we wanted NotebookLM to be a tool that would let you move effortlessly from asking questions to reading your sources to capturing your own ideas. Today, we're rolling out a new design that makes it easier than ever to switch between those different activities in a single, unified interface," Google said in a blog post. Enterprise interest Since its launch, NotebookLM has had various uses, even in the enterprise space; some have even claimed it is a "CRM killer." Users posted on social media about the different ways they've been using NotebookLM. Sam Lessin, former vice president of product at Meta and general partner at Slow Ventures, said on X that his firm uses NotebookLM instead of a CRM.
Martin previously told VentureBeat that her team saw many users begin sharing notebooks with others, making some notebooks the repository of data around company policies or project research.

NotebookLM updates Business to Plus with more audio, lets all users interact with AI hosts Read More »

ChatGPT gets screensharing and real-time video analysis, rivaling Gemini 2

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI finally added long-awaited video and screen-sharing capabilities to its advanced voice mode, allowing users to interact with the chatbot in new modalities. Both capabilities are now available on the iOS and Android mobile apps for ChatGPT Team, Plus and Pro users, and will roll out to ChatGPT Enterprise and Edu subscribers in January. However, users in the EU, Switzerland, Iceland, Norway and Liechtenstein won't be able to access advanced voice mode. OpenAI first teased the feature in May, when the company unveiled GPT-4o and showed ChatGPT "watching" a game and explaining what was happening. Advanced voice mode rolled out to users in September. Users can start a video via new buttons on the advanced voice mode screen. OpenAI's video mode feels like a FaceTime call, because ChatGPT responds in real time to what users show in the video. It can see what is around the user, identify objects and even remember people who introduce themselves. In an OpenAI demo during the company's "12 Days of Shipmas" event, ChatGPT used the video feature to help brew coffee: it recognized the coffee paraphernalia, advised when to put in a filter and critiqued the result. The feature is also very similar to Google's recently announced Project Astra, in which users can open a video chat and Gemini 2.0 will answer questions about what it sees, like identifying a sculpture on a London street. In many ways, these features are more advanced versions of what AI devices like the Humane Pin and the Rabbit r1 were marketed to do: have an AI voice assistant respond to questions about what it's seeing in a video. Sharing a screen The new screen-sharing feature brings ChatGPT out of the app and into the realm of the browser. For screen share, a three-dot menu allows users to navigate out of the ChatGPT app.
They can open apps on their phones and ask ChatGPT questions about what it's seeing. In the demo, OpenAI researchers triggered screen share, then opened the Messages app to ask ChatGPT for help responding to a photo sent via text message. However, the screen-sharing feature in advanced voice mode bears similarities to recently released features from Microsoft and Google. Last week, Microsoft released a preview version of Copilot Vision, which lets Pro subscribers open a Copilot chat while browsing a webpage. Copilot Vision can look at photos on a store's website or even help play the map-guessing game GeoGuessr. Google's Project Astra can read browsers in the same way. Both Google and OpenAI released screen-sharing AI chat features on phones to target consumers who may use ChatGPT or Gemini more on the go. But these types of features could also signal a way for enterprises to collaborate more with AI agents, since an agent can see what a person is looking at onscreen. They can be a precursor to computer-use models, like Anthropic's Computer Use, where the AI model is not only looking at a screen but actively opening tabs and programs for the user. Ho ho ho, ask Santa a question In a bid for levity, OpenAI also rolled out "Santa Mode" in advanced voice mode. The new preset voice sounds much like the jolly old man in the red suit. Unlike the new features restricted to specific users, "Santa Mode" is available to anyone with access to advanced voice mode on the mobile app, the web version of ChatGPT and the Windows and macOS apps until early January. Chats with Santa, though, will not be saved in chat history and will not affect ChatGPT's memory. Even OpenAI is feeling the Christmas spirit.

ChatGPT gets screensharing and real-time video analysis, rivaling Gemini 2 Read More »

Midjourney is launching a multiplayer collaborative worldbuilding tool called ‘Patchwork’

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Midjourney, the popular AI image generation startup with more than 21 million users on its Discord server alone, is branching out from AI image creation and editing. Patchwork revealed Max Kreminski, leader of Midjourney's Storytelling Lab, demoed the new tool, called "Patchwork," in a livestream screenshare on Discord and X via Restream. Screenshot of a Patchwork world. He clarified that it would be a standalone app requiring a Midjourney account to log in, and that the URL would be available as a "research preview" in the Midjourney Discord server's "updates" channel. Users will need to connect their Midjourney Discord account to their Google account to access Patchwork's research preview; the company posted instructions for doing so on its X account. The tool appears to be a web-based, blank, white, infinite canvas with a "toolbox" on the left side of the browser screen, showing buttons labeled "character," "event," "faction," "place," "prop" and "random," as well as tools such as "note," "image," "portal," "save" and "share." "Save" downloads a JSON file with links to all the Midjourney images created in the canvas. Midjourney considers each canvas a separate digital "world." To switch between worlds, the user creates a "portal," a small black circular button. To generate a new world, the user enters a text prompt into an editor bar at the top of the "create" screen and selects one or more of a set of 10 image styles. This produces a new whiteboard with a batch of new still-image assets and text boxes, or entities known as "scraps," including input boxes that let the user prompt for new images or settings that fit the initial world description, even whole new AI-generated character descriptions.
In the demo livestream, the character name automatically populated as Marcus "Dizzy" Gillespie, echoing the name of famous jazz musician Dizzy Gillespie. Dragging the description into a new character image creator box produces four new AI-generated images. Adding new character boxes, the user can then prompt to create names and characteristics, as well as motivations that can spur a conflict as the basis of a story. The user can then link characters together with lines denoting connections between them. They can also write action sequences and scene descriptions that each narrate part of a story. Each character can be used in multiple images, and these images can be gathered together with a single option. The user can "share" the board with other Midjourney users, who can collaborate, purportedly in real time, with multiple cursors moving across the same shared canvas. A single world can support dozens, even up to 100, users, according to Kreminski. However, he noted that the more users, the more chaotic the experience. Kreminski said only logged-in users can view boards (for now), but in the future boards may be viewable by non-users. He mentioned that tabletop roleplaying groups were already using the feature to chart their campaigns. He also said that Midjourney version 7 (V7) would include a setting to allow consistency for multiple characters across different and new images. Moving toward immersive, 3D worlds Kreminski further revealed that at least three different large language models power the application, including a fine-tuned open-source one unique to Midjourney. Ultimately, it appears to be a novel, complex, powerful, somewhat overwhelming yet compelling tool for storyboarding. I could easily see it being used by writers, film directors, game designers, comic book creators and even live theater directors and writers.
In the long term, Kreminski said, there is a "very clear path in terms of escalation of the details and interactions in the worlds," including fully immersive 3D virtual reality scenes, but that is likely years away. The news comes as other AI researchers, startups such as Fei-Fei Li's World Labs, and big tech companies such as Google seek to develop AI that can create immersive, navigable 3D worlds online from simple prompts or images. More Midjourney updates coming soon In addition, Midjourney creator David Holz joined the announcement livestream to say the startup would launch multiple model personalization modes in the coming days. Currently, Midjourney allows users to rate images to personalize the kinds of visuals they want to see in generations and to fine-tune the model to personal preferences; soon, users will be able to maintain multiple personalized versions and toggle between them. Holz also shared that Midjourney would allow users to upload and reference multiple images on boards to guide generations. Furthermore, sometime after Christmas (December 25), Midjourney will introduce video models and a Midjourney V7 AI image generator featuring improved prompt understanding. Holz further revealed that Midjourney is working on three to four new hardware projects and said the startup was "trying to branch out and become a full research lab…it may take us six months to announce all six things."
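The article notes that Patchwork's "Save" button downloads a JSON file linking to every Midjourney image in a world. Midjourney hasn't published that file's schema, so the layout below, with a list of typed "scraps," is purely a hypothetical assumption; but given some such structure, pulling out the image links would be a few lines of Python:

```python
import json

# Hypothetical export: the real Patchwork JSON schema is undocumented,
# so this "scraps" layout is an illustrative guess, not Midjourney's format.
export = json.loads("""
{
  "world": "Neon Jazz Age",
  "scraps": [
    {"type": "character", "name": "Marcus Gillespie"},
    {"type": "image", "url": "https://example.com/world/0.png"},
    {"type": "image", "url": "https://example.com/world/1.png"}
  ]
}
""")

def image_links(world: dict) -> list[str]:
    """Collect the links to all generated images in a saved world."""
    return [s["url"] for s in world["scraps"] if s["type"] == "image"]

print(image_links(export))
```

A machine-readable export like this is what would let tabletop groups or film storyboarders, the users Kreminski mentions, carry their generated assets into other tools.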

Midjourney is launching a multiplayer collaborative worldbuilding tool called ‘Patchwork’ Read More »