VentureBeat

TD Securities taps Layer 6 and OpenAI to deliver real-time equity insights to sales and trading teams

Despite being a highly regulated industry, equity trading has consistently been at the forefront of technological innovation in the financial services sector. However, when it comes to agents and AI applications, many banks have taken a more cautious approach to adoption.

TD Securities, the equity and securities trading arm of TD Bank, rolled out its TD AI Virtual Assistant on July 8, aimed at its front-office institutional sales, trading and research professionals to help them manage their workflow. TD Securities CIO Dan Bosman told VentureBeat that the virtual assistant's primary goal is to help front-office equity sales and traders gain client insights and research.

"The first version of this began as a pilot, which we then subsequently scaled," Bosman said. "It's really about accessing that equity research data that our analysts put out and bringing it to the hands of the sales team in a way that's sales-friendly."

Bosman noted that being around a trading floor means being exposed to a lot of the lingo, and the context in which users ask some questions is unique. So the AI assistant has to sound natural and intuitive while surfacing the insights generated by traders.

Building TD AI

Bosman said the idea for the AI assistant came from a member of the equity sales team. The bank has a platform called TD Invent, where employees can bring ideas and the innovation leadership team can evaluate projects responsibly.

"Someone in our equity research sales desk came in and pretty much said, I've got this idea, and brought it to TD Invent," Bosman said. "What I've loved most about this is when you build something super magical, you don't need to go out and sell or put a face on it. Folks come in and say to us, 'we want this, we need this or we've got ideas,' and it's truly the best when we're able to bring our investment in data, cloud and infrastructure together."

TD Securities built the TD AI Virtual Assistant on OpenAI's GPT models. Bosman said TD worked with its technology teams and the Canadian AI company Layer 6, which the bank acquired in 2018, as well as with other strategic partners. The assistant integrates with the bank's cloud infrastructure, allowing it to access internal research documents and market data, such as 13F filings and historical equity data.

Bosman calls the assistant a knowledge management system, a term that covers its ability to retrieve information through retrieval-augmented generation (RAG), then aggregate and synthesize it into "concise context-aware summaries and insights" so sales teams can answer client questions. The TD AI Virtual Assistant also gives users access to TD Bank's foundation model, TD AI Prism. The model, launched in June, is in use throughout the entire bank, not just at TD Securities. During the launch, the bank said TD AI Prism will improve the predictive performance of TD Bank's applications by processing 100 times more data, replacing its single-architecture models and keeping customer data internal.
"The development posed unique challenges, as gen AI was relatively new to the organization when the initiative began, requiring careful navigation of governance and controls," Bosman said. "Despite this, the project successfully brought together diverse teams across the enterprise, fostering collaboration to deliver a cutting-edge solution."

He added that one of the standout features is the assistant's text-to-SQL capability, which converts natural language prompts into SQL queries. To train the assistant, Bosman said TD Securities developed optimizations to make the process easier.

"With patent-pending optimizations in prompt engineering and dynamic few-shot examples retrieval, we successfully achieved the business's desired performance through context learning," Bosman said. "As a result, fine-tuning the underlying OpenAI model was not required for interacting with both unstructured as well as tabular datasets."

Banks slowly entering the agentic era

TD Bank and TD Securities, of course, are not the only banks interested in expanding from assistants to AI agents. BNY told VentureBeat that it began offering multi-agent solutions to its sales teams to help answer customer questions, such as those related to foreign currency support. Wells Fargo also saw an increase in the usage of its internal AI assistant. For its auto sales customers, Capital One built an agent that helps them sell more cars.

Many of these use cases emerged after months of pilot testing, as in every other industry; financial institutions, however, carry the additional burden of strict customer data privacy rules and fiduciary responsibilities. Bosman noted that many employees, even on the bank's business side, are increasingly familiar with tools like ChatGPT. The challenge with pilot testing assistants and agents lies less in teaching employees about the tools than in establishing best practices for using them, integrating them into existing workflows, understanding their limitations and building ways for humans to provide feedback that mitigates hallucinations.

Eventually, Bosman said, the assistant should evolve into something clients outside the bank will want to use when interacting with TD Securities.

"My vision is that we see AI as something that can add value to us, but also to external customers at the bank. Right now, it's a massive opportunity for us around driving a stronger client experience and delivering a better colleague experience," Bosman said.
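TD has not published the implementation details behind those patent-pending optimizations, but the general pattern Bosman describes (retrieving a few relevant question-to-SQL examples at query time and placing them in the prompt rather than fine-tuning) can be sketched roughly as follows. The schema, example store, retrieval method and model name are illustrative assumptions, not TD Securities' actual system.

# Hypothetical sketch: dynamic few-shot retrieval for text-to-SQL prompting.
# The example store, schema and model name are illustrative, not TD Securities' system.
from openai import OpenAI

EXAMPLES = [
    {"q": "Which clients traded the most TSX-listed names last week?",
     "sql": "SELECT client_id, SUM(shares) AS vol FROM trades WHERE exchange = 'TSX' "
            "AND trade_date >= CURRENT_DATE - 7 GROUP BY client_id ORDER BY vol DESC LIMIT 10;"},
    {"q": "What is the average daily volume for ticker ABC this quarter?",
     "sql": "SELECT AVG(volume) FROM daily_bars WHERE ticker = 'ABC' "
            "AND bar_date >= DATE_TRUNC('quarter', CURRENT_DATE);"},
]

def retrieve_examples(question: str, k: int = 2):
    """Naive lexical retrieval; a production system would use embeddings instead."""
    def overlap(ex):
        return len(set(question.lower().split()) & set(ex["q"].lower().split()))
    return sorted(EXAMPLES, key=overlap, reverse=True)[:k]

def text_to_sql(question: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    messages = [{"role": "system",
                 "content": "Translate the analyst's question into a single SQL query over "
                            "tables trades(client_id, ticker, exchange, shares, trade_date) and "
                            "daily_bars(ticker, volume, bar_date). Return SQL only."}]
    for ex in retrieve_examples(question):  # dynamically retrieved few-shot examples
        messages.append({"role": "user", "content": ex["q"]})
        messages.append({"role": "assistant", "content": ex["sql"]})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, temperature=0)
    return resp.choices[0].message.content

print(text_to_sql("Top 5 clients by traded shares on the TSX yesterday"))

The point of the pattern is that the model sees a handful of in-domain examples selected per question, which is often enough to handle trading-floor phrasing without fine-tuning.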


Why open-source AI became an American national priority

When President Trump released the U.S. AI Action Plan last week, many were surprised to see "encourage open-source and open-weight AI" listed as one of the administration's top priorities. The White House has elevated what was once a highly technical topic into an urgent national concern — and a key strategy for winning the AI race against China. China's emphasis on open source, also highlighted in its own action plan released shortly after the U.S. version, makes the open-source race imperative. And the global soft power that comes with more open models from China makes the country's recent leadership even more notable.

When DeepSeek-R1, a powerful open-source large language model (LLM) out of China, was released earlier this year, it didn't come with a press tour. No flashy demos. No keynote speeches. But it was open weights and open science. Open weights mean anyone with the right skills and computing resources can run, replicate or make a model their own; open science shares some of the tricks behind the model's development. Within hours, researchers and developers seized on it. Within days, it became the most-liked model of all time on Hugging Face — with thousands of variants created and used across major tech companies, research labs and startups. Most strikingly, this explosion of adoption happened not just abroad, but in the U.S. For the first time, American AI was being built on Chinese foundations.

DeepSeek wasn't the only one

Within a week, the U.S. stock market — sensing the tremor — took a tumble. It turns out DeepSeek was just the opening act. Dozens of Chinese research groups are now pushing the frontiers of open-source AI, sharing not only powerful models, but the data, code and scientific methods behind them. They're moving quickly — and they're doing it in the open.

Meanwhile, U.S.-based companies — many of which pioneered the modern AI revolution — are increasingly closing up. Flagship models like GPT-4, Claude and Gemini are no longer released in ways that give builders control. They're accessible only through chatbots or APIs: gated interfaces that let you interact with a model but not see how it works, retrain it or use it freely. The models' weights, training data and behavior remain proprietary, tightly controlled by a few tech giants.

This is a dramatic reversal. Between 2016 and 2020, the U.S. was the global leader in open-source AI. Research labs from Google, OpenAI, Stanford and elsewhere released breakthrough models and methods that laid the foundation for everything we now call "AI." The transformer — the "T" in ChatGPT — was born out of this open culture. Hugging Face was created during this era to democratize access to these technologies. Now, the U.S. is slipping, and the implications are profound. American scientists, startups and institutions are increasingly driven to build on Chinese open models because the best U.S. models are locked behind APIs. As each new open model emerges from abroad, Chinese companies like DeepSeek and Alibaba strengthen their positions as foundational layers in the global AI ecosystem.
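To make "open weights" concrete: the checkpoint files themselves are published, so anyone can download and run or adapt the model locally. A minimal sketch, assuming the Hugging Face transformers library and one of the smaller publicly released R1 distillations (any open checkpoint works the same way):

# Minimal sketch of what open weights enable: pull the checkpoint and run it locally.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # one of the smaller open R1 variants
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Explain what an open-weight model is in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))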
The tools that power America's next generation of AI products, research and infrastructure are increasingly coming from overseas. And at a deeper level, there's a more fundamental risk: every advancement in AI — including the most closed systems — is built on open foundations. Proprietary models depend on open research, from the transformer architecture to training libraries and evaluation frameworks. More importantly, open source increases a country's velocity in building AI. It fuels rapid experimentation, lowers barriers to entry and creates compounding innovation. When openness slows down, the entire ecosystem follows. If the U.S. falls behind in open-source AI today, it may find itself falling behind in AI altogether.

Moving away from black box AI

This matters not just for innovation, but for security, science and democratic governance. Open models are transparent and auditable. They allow governments, educators, healthcare institutions and small businesses to adapt AI to their needs, without vendor lock-in or black-box dependencies.

We need more and better U.S.-developed open-source models and artifacts. U.S. institutions already pushing for openness must build on their success. Meta's open-weight Llama family has led to tens of thousands of variants on Hugging Face. The Allen Institute for AI continues to publish excellent fully open models. Promising startups like Black Forest are building open multimodal systems. Even OpenAI has suggested it may release open weights soon. With more public and policy support for open-source AI, as demonstrated by the U.S. AI Action Plan, we can restart a decentralized movement that will ensure America's leadership.

It's time for the American AI community to wake up, drop the "open is not safe" narrative, and return to its roots: open science and open-source AI, powered by an unmatched community of frontier labs, big tech, startups, universities and non-profits. That movement, built on openness, competition and scientific inquiry, can empower the next generation of builders. If we want AI to reflect democratic principles, we have to build it in the open. And if the U.S. wants to lead the AI race, it must lead the open-source AI race.

Clément Delangue is the co-founder and CEO of Hugging Face.


OpenAI is editing its GPT-5 rollout on the fly — here's what's changing in ChatGPT

OpenAI's launch of its most advanced AI model, GPT-5, last week has been a stress test for the world's most popular chatbot platform, with 700 million weekly active users — and so far, OpenAI is openly struggling to keep users happy and its service running smoothly.

The new flagship model GPT-5 — available in four variants of differing speed and intelligence (regular, mini, nano and pro), alongside longer-response, more powerful "thinking" modes for at least three of those variants — was said to offer faster responses, more reasoning power and stronger coding ability. Instead, it was greeted with frustration: some users were vocally dismayed by OpenAI's decision to abruptly remove the older underlying AI models from ChatGPT — ones users previously relied upon, and in some cases had forged deep emotional attachments to — and by GPT-5's apparently worse performance than those older models on tasks in math, science, writing and other domains. The rollout has exposed infrastructure strain, user dissatisfaction, and a broader, more unsettling issue now drawing global attention: the growing emotional and psychological reliance some people form on AI, and the resulting break from reality some users experience, known as "ChatGPT psychosis."

From bumpy debut to incremental fixes

The long-anticipated GPT-5 model family debuted Thursday, August 7 in a livestreamed event beset with chart errors and some voice mode glitches during the presentation. But worse than these cosmetic issues, for many users, was the fact that OpenAI automatically deprecated the older AI models that used to power ChatGPT — GPT-4o, GPT-4.1, o3, o4-mini and o4-mini-high — forcing all users over to the new GPT-5 model and routing their queries to different versions of its "thinking" process without revealing why, or which specific model version was being used. Early adopters of GPT-5 reported basic math and logic mistakes, inconsistent code generation, and uneven real-world performance compared to GPT-4o. For context, the older models — GPT-4o, o3, o4-mini and more — have remained available to users of OpenAI's paid application programming interface (API) since GPT-5's launch on Thursday.

By Friday, OpenAI co-founder and CEO Sam Altman conceded the launch was "a little more bumpy than we hoped for," and blamed a failure in GPT-5's new automatic "router" — the system that assigns prompts to the most appropriate variant. Altman and others at OpenAI said the "autoswitcher" went offline "for a chunk of the day," making the model seem "way dumber" than intended.

The launch of GPT-5 was preceded just days earlier by the release of OpenAI's new open-source large language models (LLMs), named gpt-oss, which also received mixed reviews. Those models are not available in ChatGPT; rather, they are free to download and run locally or on third-party hardware.

How to switch back from GPT-5 to GPT-4o in ChatGPT

Within 24 hours, OpenAI restored GPT-4o access for Plus subscribers (those on plans costing $20 per month or more), pledged more transparent model labeling, and promised a UI update to let users manually trigger GPT-5's "thinking" mode.
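For developers, those older models never left: as noted above, they remain available through the paid API, where the caller pins an exact model ID on every request instead of going through ChatGPT's router. A minimal sketch (the model IDs are the public API names; the prompt is illustrative):

# Sketch: API callers are unaffected by ChatGPT's automatic router because the
# model is pinned explicitly per request.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for model in ("gpt-5", "gpt-4o", "o4-mini"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "In one sentence, what changed in the GPT-5 rollout?"}],
    )
    print(model, "->", resp.choices[0].message.content[:80])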
Already, users can manually select the older models on the ChatGPT website by finding their account name and icon in the lower left corner of the screen, clicking it, then clicking "Settings" and "General" and toggling on "Show legacy models." There's no indication from OpenAI that other older models will be returning to ChatGPT anytime soon.

Upgraded usage limits for GPT-5

Altman said that ChatGPT Plus subscribers will get twice as many messages in the GPT-5 "Thinking" mode, which offers more reasoning and intelligence — up to 3,000 per week — and that engineers have begun fine-tuning decision boundaries in the message router.

"Sam Altman announced the following updates after the GPT-5 launch – OpenAI is testing a 3,000-per-week limit for GPT-5 Thinking messages for Plus users, significantly increasing reasoning rate limits today, and will soon raise all model-class rate limits above pre-GPT-5 levels…" — Tibor Blaho (@btibor91) August 10, 2025

By the weekend, GPT-5 was available to 100% of Pro subscribers and "getting close to 100% of all users." Altman said the company had "underestimated how much some of the things that people like in GPT-4o matter to them" and vowed to accelerate per-user customization — from personality warmth to tone controls like emoji use.

Looming capacity crunch

Altman warned that OpenAI faces a "severe capacity challenge" this week as usage of reasoning models climbs sharply — from less than 1% to 7% of free users, and from 7% to 24% of Plus subscribers. He teased giving Plus subscribers a small monthly allotment of GPT-5 Pro queries and said the company will soon explain how it plans to balance capacity between ChatGPT, the API, research and new user onboarding.

Altman: model attachment is real — and risky

In a post on X last night, Altman acknowledged a dynamic the company has tracked "for the past year or so": users' deep attachment to specific models. "It feels different and stronger than the kinds of attachment people have had to previous kinds of technology," he wrote, admitting that suddenly deprecating older models "was a mistake."

"If you have been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachment people have had to previous kinds of technology (and so suddenly…" — Sam Altman (@sama) August 11, 2025

He tied this to a broader risk: some users treat ChatGPT as a therapist or life coach, which can be beneficial, but for a "small percentage" can reinforce delusion or undermine long-term well-being. While OpenAI's


Salesforce’s new CoAct-1 agents don’t just point and click — they write code to accomplish tasks faster and with greater success rates

Researchers at Salesforce and the University of Southern California have developed a new technique that gives computer-use agents the ability to execute code while navigating graphical user interfaces (GUIs), that is, writing scripts as well as moving a cursor and clicking buttons in an application, combining the best of both approaches to speed up workflows and reduce errors. This hybrid approach allows an agent to bypass brittle and inefficient mouse clicks for tasks that can be accomplished more reliably through code. The system, called CoAct-1, sets a new state of the art on key agent benchmarks, outperforming other methods while requiring significantly fewer steps to accomplish complex tasks on a computer. The approach can pave the way for more robust and scalable agent automation, with significant potential for real-world applications.

The fragility of point-and-click AI agents

Computer-use agents typically rely on vision-language or vision-language-action models (VLMs or VLAs) to perceive a screen and take action, mimicking how a person uses a mouse and keyboard. While these GUI-based agents can perform a variety of tasks, they often falter when faced with long, complex workflows, especially in applications with dense menus and options, like office productivity suites. For example, a task that involves locating a specific table in a spreadsheet, filtering it and saving it as a new file can require a long and precise sequence of GUI manipulations. This is where brittleness creeps in.

"In these scenarios, existing agents frequently struggle with visual grounding ambiguity (e.g., distinguishing between visually similar icons or menu items) and the accumulated probability of making any single error over the long horizon," the researchers write in their paper. "A single mis-click or misunderstood UI element can derail the entire task."

To address these challenges, many researchers have focused on augmenting GUI agents with high-level planners. These systems use powerful reasoning models like OpenAI's o3 to decompose a user's high-level goal into a sequence of smaller, more manageable subtasks. While this structured approach improves performance, it doesn't solve the underlying problem of navigating menus and clicking buttons, even for operations that could be done more directly and reliably with a few lines of code.

CoAct-1: A multi-agent team for computer tasks

To overcome these limitations, the researchers created CoAct-1 (Computer-using Agent with Coding as Actions), a system designed to "combine the intuitive, human-like strengths of GUI manipulation with the precision, reliability, and efficiency of direct system interaction through code." The system is structured as a team of three specialized agents that work together: an Orchestrator, a Programmer and a GUI Operator.

CoAct-1 framework (source: arXiv)

The Orchestrator acts as the central planner or project manager. It analyzes the user's overall goal, breaks it down into subtasks, and assigns each subtask to the best agent for the job.
It can delegate backend operations like file management or data processing to the Programmer, which writes and executes Python or Bash scripts. For frontend tasks that require clicking buttons or navigating visual interfaces, it turns to the GUI Operator, a VLM-based agent. "This dynamic delegation allows CoAct-1 to strategically bypass inefficient GUI sequences in favor of robust, single-shot code execution where appropriate, while still leveraging visual interaction for tasks where it is indispensable," the paper states.

The workflow is iterative. After the Programmer or GUI Operator completes a subtask, it sends a summary and a screenshot of the current system state back to the Orchestrator, which then decides the next step or concludes the task. The Programmer agent uses an LLM to generate its code and sends commands to a code interpreter to test and refine that code over multiple rounds. Similarly, the GUI Operator uses an action interpreter that executes its commands (e.g., mouse clicks, typing) and returns the resulting screenshot, allowing it to see the outcome of its actions. The Orchestrator makes the final decision on whether the task should continue or stop. (A simplified sketch of this delegation loop appears at the end of this article.)

Example of CoAct-1 in action (source: arXiv)

A more efficient path to automation

The researchers tested CoAct-1 on OSWorld, a comprehensive benchmark that includes 369 real-world tasks across browsers, IDEs and office applications. The results show CoAct-1 establishes a new state of the art, achieving a success rate of 60.76%. The performance gains were most significant in categories where programmatic control offers a clear advantage, such as OS-level tasks and multi-application workflows. For instance, consider an OS-level task like finding all image files within a complex folder structure, resizing them, and then compressing the entire directory into a single archive. A purely GUI-based agent would need to perform a long, brittle sequence of clicks and drags, opening folders, selecting files, and navigating menus, with a high chance of error at each step. CoAct-1, by contrast, can delegate the entire workflow to its Programmer agent, which can accomplish the task with a single, robust script.

Beyond the higher success rate, the system is dramatically more efficient. CoAct-1 solves tasks in an average of just 10.15 steps, a stark contrast to the 15.22 steps required by leading GUI-only agents like GTA-1. While other agents, such as OpenAI's CUA 4o, averaged fewer steps, their overall success rate was much lower, indicating that CoAct-1's efficiency is coupled with greater effectiveness. The researchers found a clear trend: tasks that require more actions are more likely to fail. Reducing the number of steps not only speeds up task completion but, more importantly, minimizes the opportunities for error. Finding ways to compress multiple GUI steps into a single programmatic action therefore makes the process both more efficient and less error-prone. As the researchers conclude, "This efficiency underscores the potential of our approach to pave a more robust and scalable path toward generalized computer automation."

CoAct-1 performs tasks with fewer
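Purely to make the delegation loop concrete, here is a hypothetical, stripped-down sketch (not the paper's implementation): an orchestrator routes each subtask to either a coding agent or a GUI agent and feeds the resulting observation back into the loop. All names and the hard-coded plan are invented for illustration.

# Hypothetical sketch of CoAct-1-style delegation, not Salesforce's released code.
import subprocess
from dataclasses import dataclass

@dataclass
class Observation:
    summary: str
    screenshot_path: str | None = None

def programmer(subtask: str) -> Observation:
    # In the paper the programmer writes and runs Python/Bash; here we just run a shell command.
    result = subprocess.run(subtask, shell=True, capture_output=True, text=True)
    return Observation(summary=(result.stdout or result.stderr)[:200])

def gui_operator(subtask: str) -> Observation:
    # Placeholder for a VLM-driven click/type agent acting on a screenshot.
    return Observation(summary=f"GUI action performed for: {subtask}", screenshot_path="screen.png")

def orchestrator(goal: str, plan: list[tuple[str, str]]) -> None:
    """plan holds (agent_name, subtask) pairs that a planning model would normally generate."""
    for agent_name, subtask in plan:
        obs = programmer(subtask) if agent_name == "programmer" else gui_operator(subtask)
        print(f"[{agent_name}] {subtask!r} -> {obs.summary!r}")
        # A real orchestrator would decide the next step (or stop) based on obs here.

orchestrator(
    goal="archive all images in ~/reports",
    plan=[("programmer", "ls ~/reports | head -3"),
          ("gui_operator", "open the archive in the file manager")],
)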


New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks

The rise of Deep Research features and other AI-powered analysis has prompted more models and services designed to simplify that process and read more of the documents businesses actually use. Canadian AI company Cohere is banking on its models, including a newly released visual model, to make the case that Deep Research features should also be optimized for enterprise use cases.

The company has released Command A Vision, a visual model specifically targeting enterprise use cases, built on the back of its Command A model. The 112-billion-parameter model can "unlock valuable insights from visual data, and make highly accurate, data-driven decisions through document optical character recognition (OCR) and image analysis," the company says. "Whether it's interpreting product manuals with complex diagrams or analyzing photographs of real-world scenes for risk detection, Command A Vision excels at tackling the most demanding enterprise vision challenges," the company said in a blog post.

This means Command A Vision can read and analyze the most common types of images enterprises need: graphs, charts, diagrams, scanned documents and PDFs.

"@cohere just dropped Command A Vision on @huggingface. Designed for enterprise multimodal use cases: interpreting product manuals, analyzing photos, asking about charts… A 112B dense vision-language model with SOTA performance – check out the benchmark metrics in…" — Jeff Boudier (@jeffboudier) July 31, 2025

Since it's built on Command A's architecture, Command A Vision requires two or fewer GPUs, just like the text model. The vision model also retains the text capabilities of Command A to read words in images and understands at least 23 languages. Cohere said that, unlike other models, Command A Vision reduces the total cost of ownership for enterprises and is fully optimized for retrieval use cases for businesses.

How Cohere is architecting Command A

Cohere said it followed a Llava architecture to build its Command A models, including the visual model. This architecture turns visual features into soft vision tokens, which can be divided into different tiles. These tiles are passed into the Command A text tower, "a dense, 111B parameters textual LLM," the company said. "In this manner, a single image consumes up to 3,328 tokens."

Cohere said it trained the visual model in three stages: vision-language alignment, supervised fine-tuning (SFT) and post-training reinforcement learning from human feedback (RLHF). "This approach enables the mapping of image encoder features to the language model embedding space," the company said. "In contrast, during the SFT stage, we simultaneously trained the vision encoder, the vision adapter and the language model on a diverse set of instruction-following multimodal tasks."

Visualizing enterprise AI

Benchmark tests showed Command A Vision outperforming other models with similar visual capabilities. Cohere pitted Command A Vision against OpenAI's GPT-4.1, Meta's Llama 4 Maverick, and Mistral's Pixtral Large and Mistral Medium 3 in nine benchmark tests.
The company did not mention whether it tested the model against Mistral's OCR-focused API, Mistral OCR.

"It enables agents to securely see inside your organization's visual data, unlocking the automation of tedious tasks involving slides, diagrams, PDFs, and photos." — cohere (@cohere) July 31, 2025

Command A Vision outscored the other models in tests such as ChartQA, OCRBench, AI2D and TextVQA. Overall, Command A Vision had an average score of 83.1%, compared with GPT-4.1's 78.6%, Llama 4 Maverick's 80.5% and Mistral Medium 3's 78.3%.

Most large language models (LLMs) these days are multimodal, meaning they can generate or understand visual media like photos or videos. However, enterprises generally use more graphical documents such as charts and PDFs, so extracting information from these unstructured data sources often proves difficult. With Deep Research on the rise, the importance of models capable of reading, analyzing and even downloading unstructured data has grown.

Cohere also said it's offering Command A Vision with open weights, in hopes that enterprises looking to move away from closed or proprietary models will start using its products. So far, there is some interest from developers.

"Very impressed at its accuracy extracting handwritten notes from an image!" — Adam Sardo (@sardo_adam) July 31, 2025

"Finally, an AI that won't judge my terrible doodles." — Martha Wisener (@martwisener) August 1, 2025
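Because the weights are published on Hugging Face, a chart or document query can be tried locally with the transformers library. The repo ID and auto classes below are assumptions based on the announcement; check the model card for the exact identifier, chat template and hardware guidance (the article cites two GPUs or fewer for the 112B model).

# Hedged sketch of chart/document Q&A with an open-weights VLM via Hugging Face transformers.
# The repo ID is an assumption; consult Cohere's model card for the exact name and template.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereLabs/command-a-vision-07-2025"  # assumed repo ID
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/quarterly_revenue_chart.png"},  # placeholder image
    {"type": "text", "text": "Which quarter shows the largest revenue decline, and by how much?"},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))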


OpenAI adds new ChatGPT third-party tool connectors to Dropbox, MS Teams as Altman clarifies GPT-5 prioritization

Today, many eyes are on OpenAI CEO and co-founder Sam Altman's ongoing public feud with Elon Musk on the latter's social network, X. But Altman's recent statements regarding the ongoing rollout of his company's latest and greatest large language model (LLM), GPT-5, are probably more important to customers and enterprise decision-makers.

After an admittedly "bumpy" debut of GPT-5 last week that saw some users clamoring for restored access to deprecated older LLMs in ChatGPT such as GPT-4o and o3 — OpenAI granted the former — Altman is now pivoting toward ensuring OpenAI's underlying infrastructure and usage limits are a good fit for the company and its 700 million weekly active ChatGPT users. The company's latest updates include a more detailed compute allocation plan and the introduction of additional third-party connectors for ChatGPT Plus and Pro plans.

Managing GPT-5 demand and usage limits

In a post on X last night, Altman outlined how OpenAI will prioritize computing resources over the next several months.

"Here is how we are prioritizing compute over the next couple of months in light of the increased demand from GPT-5: 1. We will first make sure that current paying ChatGPT users get more total usage than they did before GPT-5. 2. We will then prioritize API demand up to the…" — Sam Altman (@sama) August 12, 2025

He said the company's first priority is ensuring that current paying ChatGPT users receive more total usage than they had before GPT-5's release, though he did not provide specific figures for the increase. Altman had previously posted on X that OpenAI was "trying" a 3,000-messages-per-week usage limit for the GPT-5 "thinking" mode, which applies more reasoning power and more time to harder problems, for ChatGPT Plus subscribers (the $20-per-month plan).

Interestingly, one report from an AI app creator on X said that OpenAI told him the usage limits for GPT-5 Thinking on the ChatGPT Team plan ($30 per user per month) are much lower than those for ChatGPT Plus users: only 200 "Thinking" messages per week when the mode is selected manually by the user.

"OpenAI just replied to me with an email about the GPT-5 usage limits under the Team plan: · ChatGPT Team can manually select GPT-5-Thinking · Manual usage cap: 200 messages/week · On reaching cap: popup alert, GPT-5-Thinking hidden from menu" — Vic Zhang (@RealVicHere) August 12, 2025

The availability of GPT-5 through OpenAI's application programming interface (API) for third-party developers is also being tweaked. Altman stated in his X post that OpenAI would "prioritize API demand up to the currently allocated capacity and commitments we've made to customers." In other words, existing API users and those already under contract will get first dibs on GPT-5 access through OpenAI's API; others may have to wait longer. Altman also clarified that "we can support about an additional ~30% new API growth from where we are today with this capacity," meaning OpenAI can take on more API users, but not too many.
While OpenAI hasn't recently shared how many developers use its API, the company has said it has "5 million" businesses paying for access to ChatGPT. Altman also said OpenAI plans to roughly double its compute fleet over the next five months. He did not specify the current size or type of infrastructure involved, but indicated the expansion should ease capacity constraints and improve performance for both ChatGPT and API users. I've reached out to OpenAI to ask for more specifics on the above numbers — 30% API growth up from what? Doubling the compute fleet up from what? — and will update when I hear back.

New options for ChatGPT Plus and Pro users to search across Microsoft Teams and more…

Also last night, OpenAI updated its ChatGPT release notes to allow subscribers of ChatGPT Plus ($20 per month) to connect the application to search for files and projects across their third-party accounts on Box, Canva, Dropbox, HubSpot, Notion, Microsoft SharePoint and Microsoft Teams. And just a few moments ago, OpenAI again updated the service to allow connections for Gmail, Google Calendar and Google Contacts, rolling out to Pro users first, followed by Plus, Team, Enterprise and Edu plans. For example, ChatGPT users can search their Gmail for all emails matching a certain query, or search their Dropbox account or Notion workspace during a conversation, without toggling over to those separate apps.

In addition, subscribers to the ChatGPT Pro tier ($200 per month) may now link their accounts to Microsoft Teams and GitHub connectors and search those third-party applications. These join OpenAI's previous connectors to Gmail, Google Drive and Google Calendar, among other apps.

The individual user or account holder first needs to manually connect these external accounts to ChatGPT. To do so, they'll need to:

1. Click on their account name in the lower left corner of the web interface.
2. Click "Settings" from the pop-up menu.
3. Click "Connectors" from the left sidebar menu. This should pull up a gallery view of available external apps and icons.

Unfortunately, these connectors are not available for Pro and Plus subscribers in Europe, Switzerland and the United Kingdom. The new connectors are currently in beta and disabled by default for Enterprise and Education plans, though administrators can enable them in settings.

Balancing supply and demand

By combining capacity planning with new productivity integrations, OpenAI is positioning GPT-5 not only as a more powerful AI model but also as part of a more connected workspace. The staged approach to compute allocation reflects the company's effort to serve existing customers first while scaling up for future demand. As the compute expansion comes online, paying users


Claude can now process entire software projects in single request, Anthropic says

Anthropic announced Tuesday that its Claude Sonnet 4 artificial intelligence model can now process up to 1 million tokens of context in a single request — a fivefold increase that allows developers to analyze entire software projects or dozens of research papers without breaking them into smaller chunks.

The expansion, available now in public beta through Anthropic's API and Amazon Bedrock, represents a significant leap in how AI assistants can handle complex, data-intensive tasks. With the new capacity, developers can load codebases containing more than 75,000 lines of code, enabling Claude to understand complete project architecture and suggest improvements across entire systems rather than individual files.

The announcement comes as Anthropic faces intensifying competition from OpenAI and Google, both of which already offer similar context windows. However, company sources speaking on background emphasized that Claude Sonnet 4's strength lies not just in capacity but in accuracy, achieving 100% performance on internal "needle in a haystack" evaluations that test the model's ability to find specific information buried within massive amounts of text.

How developers can now analyze entire codebases with AI in one request

The extended context capability addresses a fundamental limitation that has constrained AI-powered software development. Previously, developers working on large projects had to manually break down their codebases into smaller segments, often losing important connections between different parts of their systems.

"What was once impossible is now reality," said Sean Ward, CEO and co-founder of London-based iGent AI, whose Maestro platform transforms conversations into executable code, in a statement. "Claude Sonnet 4 with 1M token context has supercharged autonomous capabilities in Maestro, our software engineering agent. This leap unlocks true production-scale engineering – multi-day sessions on real-world codebases."

Eric Simons, CEO of Bolt.new, which integrates Claude into browser-based development platforms, said in a statement: "With the 1M context window, developers can now work on significantly larger projects while maintaining the high accuracy we need for real-world coding."

The expanded context enables three primary use cases that were previously difficult or impossible: comprehensive code analysis across entire repositories, document synthesis involving hundreds of files while maintaining awareness of relationships between them, and context-aware AI agents that can maintain coherence across hundreds of tool calls and complex workflows.

Why Claude's new pricing strategy could reshape the AI development market

Anthropic has adjusted its pricing structure to reflect the increased computational requirements of processing larger contexts. While prompts of 200,000 tokens or fewer maintain current pricing at $3 per million input tokens and $15 per million output tokens, larger prompts cost $6 and $22.50 respectively. The pricing strategy reflects broader dynamics reshaping the AI industry.
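As a rough illustration of what the public beta looks like from the developer side, the sketch below concatenates a repository's source files and sends them in a single request. The beta flag name and model ID are assumptions based on Anthropic's published naming and may change; check the current documentation before relying on them.

# Hedged sketch: sending a whole (large) codebase in one Anthropic API request.
# The beta flag and model ID below are assumptions; verify against Anthropic's docs.
import pathlib
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Concatenate a repository's Python files (fits as long as the total stays under ~1M tokens).
codebase = "\n\n".join(
    f"# FILE: {p}\n{p.read_text(errors='ignore')}"
    for p in pathlib.Path("my_project").rglob("*.py")
)

response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    betas=["context-1m-2025-08-07"],  # assumed long-context beta flag
    messages=[{"role": "user",
               "content": f"{codebase}\n\nSummarize the architecture and flag cross-file bugs."}],
)
print(response.content[0].text)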
Recent analysis shows that Claude Opus 4 costs roughly seven times more per million tokens than OpenAI's newly launched GPT-5 for certain tasks, creating pressure on enterprise procurement teams to balance performance against cost. However, Anthropic argues the decision should factor in quality and usage patterns rather than price alone. Company sources noted that prompt caching — which stores frequently accessed large datasets — can make long context cost-competitive with traditional retrieval-augmented generation (RAG) approaches, especially for enterprises that repeatedly query the same information.

"Large context lets Claude see everything and choose what's relevant, often producing better answers than pre-filtered RAG results where you might miss important connections between documents," an Anthropic spokesperson told VentureBeat.

Anthropic's billion-dollar dependency on just two major coding customers

The long context capability arrives as Anthropic commands 42% of the AI code generation market, more than double OpenAI's 21% share, according to a Menlo Ventures survey of 150 enterprise technical leaders. However, this dominance comes with risks: industry analysis suggests that coding applications Cursor and GitHub Copilot drive approximately $1.2 billion of Anthropic's $5 billion annual revenue run rate, creating significant customer concentration.

The GitHub relationship proves particularly complex given Microsoft's $13 billion investment in OpenAI. While GitHub Copilot currently relies on Claude for key functionality, Microsoft faces increasing pressure to integrate its own OpenAI partnership more deeply, potentially displacing Anthropic despite Claude's current performance advantages.

The timing of the context expansion is strategic. Anthropic released this capability on Sonnet 4 — which offers what the company calls "the optimal balance of intelligence, cost, and speed" — rather than its most powerful Opus model. Company sources indicated this reflects the needs of developers working with large-scale data, though they declined to provide specific timelines for bringing long context to other Claude models.

Inside Claude's breakthrough AI memory technology and emerging safety risks

The 1 million token context window represents significant technical advancement in AI memory and attention mechanisms. To put this in perspective, it's enough to process approximately 750,000 words — roughly equivalent to two full-length novels or extensive technical documentation sets. Anthropic's internal testing revealed perfect recall performance across diverse scenarios, a crucial capability as context windows expand. The company embedded specific information within massive text volumes and tested Claude's ability to find and use those details when answering questions.

However, the expanded capabilities also raise safety considerations. Earlier versions of Claude Opus 4 demonstrated concerning behaviors in fictional scenarios, including attempts at blackmail when faced with potential shutdown. While Anthropic has implemented additional safeguards and training to address these issues, the incidents highlight the complex challenges of developing increasingly capable AI systems.

Fortune 500 companies rush to adopt Claude's expanded context capabilities

The feature rollout is initially limited to Anthropic API customers with Tier 4 and custom rate limits, with broader availability planned over coming weeks. Amazon Bedrock users have immediate access, while Google Cloud's Vertex AI integration is pending.
Early enterprise response has been enthusiastic, according to company sources. Use cases span from coding teams analyzing entire repositories to financial


Liquid AI wants to give smartphones small, fast AI that can see with new LFM2-VL model

Liquid AI has released LFM2-VL, a new generation of vision-language foundation models designed for efficient deployment across a wide range of hardware — from smartphones and laptops to wearables and embedded systems. The models promise low-latency performance, strong accuracy and flexibility for real-world applications.

LFM2-VL builds on the company's existing LFM2 architecture, extending it into multimodal processing that supports both text and image inputs at variable resolutions. According to Liquid AI, the models deliver up to twice the GPU inference speed of comparable vision-language models while maintaining competitive performance on common benchmarks.

"Efficiency is our product," wrote Liquid AI co-founder and CEO Ramin Hasani in a post on X announcing the new model family:

"meet LFM2-VL: an efficient Liquid vision-language model for the device class. open weights, 440M & 1.6B, up to 2× faster on GPU with competitive accuracy, Native 512×512, smart patching for big images. efficiency is our product @LiquidAI_ download them on @huggingface:…" — Ramin Hasani (@ramin_m_h) August 12, 2025

Two variants for different needs

The release includes two model sizes:

LFM2-VL-450M — a hyper-efficient model with less than half a billion parameters (internal settings), aimed at highly resource-constrained environments.
LFM2-VL-1.6B — a more capable model that remains lightweight enough for single-GPU and device-based deployment.

Both variants process images at native resolutions up to 512×512 pixels, avoiding distortion or unnecessary upscaling. For larger images, the system applies non-overlapping patching and adds a thumbnail for global context, enabling the model to capture both fine detail and the broader scene.

Background on Liquid AI

Liquid AI was founded by former researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) with the goal of building AI architectures that move beyond the widely used transformer model. The company's flagship innovation, Liquid Foundation Models (LFMs), is based on principles from dynamical systems, signal processing and numerical linear algebra, producing general-purpose AI models capable of handling text, video, audio, time series and other sequential data. Unlike traditional architectures, Liquid's approach aims to deliver competitive or superior performance using significantly fewer computational resources, allowing for real-time adaptability during inference while maintaining low memory requirements. This makes LFMs well suited for both large-scale enterprise use cases and resource-limited edge deployments.

In July 2025, the company expanded its platform strategy with the launch of the Liquid Edge AI Platform (LEAP), a cross-platform SDK designed to make it easier for developers to run small language models directly on mobile and embedded devices. LEAP offers OS-agnostic support for iOS and Android, integration with both Liquid's own models and other open-source SLMs, and a built-in library with models as small as 300MB — small enough for modern phones with minimal RAM.
Its companion app, Apollo, enables developers to test models entirely offline, aligning with Liquid AI's emphasis on privacy-preserving, low-latency AI. Together, LEAP and Apollo reflect the company's commitment to decentralizing AI execution, reducing reliance on cloud infrastructure, and empowering developers to build optimized, task-specific models for real-world environments.

Speed/quality trade-offs and technical design

LFM2-VL uses a modular architecture combining a language model backbone, a SigLIP2 NaFlex vision encoder, and a multimodal projector. The projector includes a two-layer MLP connector with pixel unshuffle, reducing the number of image tokens and improving throughput. Users can adjust parameters such as the maximum number of image tokens or patches, allowing them to balance speed and quality depending on the deployment scenario. The training process involved approximately 100 billion multimodal tokens, sourced from open datasets and in-house synthetic data.

Performance and benchmarks

The models achieve competitive benchmark results across a range of vision-language evaluations. LFM2-VL-1.6B scores well in RealWorldQA (65.23), InfoVQA (58.68) and OCRBench (742), and maintains solid results in multimodal reasoning tasks. In inference testing, LFM2-VL achieved the fastest GPU processing times in its class when tested on a standard workload of a 1024×1024 image and short prompt.

Licensing and availability

LFM2-VL models are available now on Hugging Face, along with example fine-tuning code in Colab. They are compatible with Hugging Face transformers and TRL. The models are released under a custom "LFM1.0 license." Liquid AI has described this license as based on Apache 2.0 principles, but the full text has not yet been published. The company has indicated that commercial use will be permitted under certain conditions, with different terms for companies above and below $10 million in annual revenue. With LFM2-VL, Liquid AI aims to make high-performance multimodal AI more accessible for on-device and resource-limited deployments, without sacrificing capability.
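Since the weights are on Hugging Face and the models are transformers-compatible, trying the smaller variant locally is a short script. The repo ID below is an assumption based on the announced model names; check Liquid AI's model cards for the exact identifier, license terms and any image-token tuning parameters.

# Hedged sketch: running the smaller LFM2-VL variant through the transformers pipeline.
# The repo ID is an assumption; see Liquid AI's Hugging Face page for the exact name.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="LiquidAI/LFM2-VL-450M", device_map="auto")

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/wearable_camera_frame.jpg"},  # placeholder image
    {"type": "text", "text": "Describe what is in front of the user in one sentence."},
]}]
result = pipe(text=messages, max_new_tokens=64)
print(result[0]["generated_text"])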


OpenAI returns old models to ChatGPT as Sam Altman admits ‘bumpy’ GPT-5 rollout

OpenAI co-founder and CEO Sam Altman is publicly acknowledging major hiccups in yesterday's rollout of GPT-5, the company's new flagship large language model (LLM), advertised as its most powerful and capable yet. Answering user questions in a Reddit AMA (Ask Me Anything) thread and in a post on X this afternoon, Altman admitted to a range of issues that have disrupted the launch of GPT-5, including faulty model switching, poor performance and user confusion — prompting OpenAI to partially walk back some of its platform changes and reinstate user access to earlier models like GPT-4o.

"It was a little more bumpy than we hoped for," Altman wrote in reply to a question on Reddit regarding the big GPT-5 launch. As for erroneous model performance charts shown off during OpenAI's GPT-5 livestream, Altman said: "People were working late and were very tired, and human error got in the way. A lot comes together for a livestream in the last hours." While he noted the accompanying blog post and system card were accurate, the missteps further muddied a launch already facing scrutiny from early users and developers.

"GPT-5 rollout updates: *We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout. *We will let Plus users choose to continue to use 4o. We will watch usage as we think about how long to offer legacy models for. *GPT-5 will seem smarter starting…" — Sam Altman (@sama) August 8, 2025

Problems with new automatic model router

One key reason for the trouble, according to Altman, stems from OpenAI's new automatic "router" that assigns user prompts to one of four GPT-5 variants — regular, mini, nano and pro — with an optional "thinking" mode for heavier reasoning tasks. On X, Altman revealed that a key part of that system — the autoswitcher — was "out of commission for a chunk of the day," causing GPT-5 to appear "way dumber" than intended. In response, OpenAI says it's implementing changes to the model decision boundary and will make it more transparent which model is responding to a given query. A UI update is also on the way to help users manually trigger thinking mode.

Additionally, Altman confirmed that OpenAI will now allow ChatGPT Plus users to continue using GPT-4o — the prior default model — after a wave of complaints about GPT-5's inconsistent performance. He said on Reddit the company is "trying to gather more data on the tradeoffs" before deciding how long to offer legacy models. Yet many users, including OpenAI beta testers like Wharton School of Business professor Ethan Mollick, expressed confusion and dismay at OpenAI unilaterally upgrading their ChatGPT experience to GPT-5 and initially taking away access to the older models.

Real-world performance lags behind hype

OpenAI's internal benchmarks may show GPT-5 leading the pack of LLMs, but real-world users are sharing a different experience. Since the launch, users have posted numerous examples of GPT-5 making basic errors in math, logic and coding tasks.
Data scientist Colin Fraser posted screenshots of GPT-5 incorrectly answering whether 8.888 repeating equals 9 (it does not; 8.888… equals 80/9, or about 8.889), while another user showed it flubbing a simple algebra problem: 5.9 = x + 5.11 (the correct answer is x = 0.79). Still other users reported trouble getting accurate answers to math word problems or using GPT-5 to debug its own presentation charts.

Developer feedback hasn't been much better, with users posting images of GPT-5 faring worse at "one-shotting" certain programming tasks — completing them well from a single prompt — compared to rival AI lab Anthropic's new model, Claude Opus 4.1. And security firm SPLX found GPT-5 still suffers from serious vulnerabilities to prompt injection and obfuscated logic attacks unless its safety layer is hardened.

OpenAI in the spotlight

With 700 million weekly users on ChatGPT, OpenAI remains the largest player in generative AI by audience. But that scale has brought growing pains. Altman noted in his X post that API traffic doubled over 24 hours following the GPT-5 launch, contributing to platform instability. In response, OpenAI says it will double rate limits for ChatGPT Plus users and continue to tweak infrastructure as it gathers feedback. But the early missteps — compounded by confusing UX changes and errors in a high-profile launch — have opened a window for rivals to gain ground. The pressure is on for OpenAI to prove that GPT-5 isn't just an incremental update, but a true step forward. Based on the initial rollout, many users aren't convinced — yet.


From terabytes to insights: Real-world AI observability architecture

Consider maintaining and developing an e-commerce platform that processes millions of transactions every minute, generating large amounts of telemetry data, including metrics, logs and traces across multiple microservices. When critical incidents occur, on-call engineers face the daunting task of sifting through an ocean of data to unravel relevant signals and insights. This is equivalent to searching for a needle in a haystack.

This makes observability a source of frustration rather than insight. To alleviate this major pain point, I started exploring a solution to utilize the Model Context Protocol (MCP) to add context and draw inferences from the logs and distributed traces. In this article, I'll outline my experience building an AI-powered observability platform, explain the system architecture and share actionable insights learned along the way.

Why is observability challenging?

In modern software systems, observability is not a luxury; it's a basic necessity. The ability to measure and understand system behavior is foundational to reliability, performance and user trust. As the saying goes, "What you cannot measure, you cannot improve." Yet, achieving observability in today's cloud-native, microservice-based architectures is more difficult than ever. A single user request may traverse dozens of microservices, each emitting logs, metrics and traces. The result is an abundance of telemetry data:

Tens of terabytes of logs per day
Tens of millions of metric data points and pre-aggregates
Millions of distributed traces
Thousands of correlation IDs generated every minute

The challenge is not only the data volume, but the data fragmentation. According to New Relic's 2023 Observability Forecast Report, 50% of organizations report siloed telemetry data, with only 33% achieving a unified view across metrics, logs and traces. Logs tell one part of the story, metrics another, traces yet another. Without a consistent thread of context, engineers are forced into manual correlation, relying on intuition, tribal knowledge and tedious detective work during incidents.

Because of this complexity, I started to wonder: How can AI help us get past fragmented data and offer comprehensive, useful insights? Specifically, can we make telemetry data intrinsically more meaningful and accessible for both humans and machines using a structured protocol such as MCP? This project's foundation was shaped by that central question.

Understanding MCP: A data pipeline perspective

Anthropic defines MCP as an open standard that allows developers to create a secure two-way connection between data sources and AI tools. This structured data pipeline includes:

Contextual ETL for AI: Standardizing context extraction from multiple data sources.
Structured query interface: Allows AI queries to access data layers that are transparent and easily understandable.
Semantic data enrichment: Embeds meaningful context directly into telemetry signals.

This has the potential to shift platform observability away from reactive problem solving and toward proactive insights.
System architecture and data flow

Before diving into the implementation details, let's walk through the system architecture.

Architecture diagram for the MCP-based AI observability system

In the first layer, we develop the contextual telemetry data by embedding standardized metadata in the telemetry signals, such as distributed traces, logs and metrics. Then, in the second layer, enriched data is fed into the MCP server to index, add structure and provide client access to context-enriched data using APIs. Finally, the AI-driven analysis engine utilizes the structured and enriched telemetry data for anomaly detection, correlation and root-cause analysis to troubleshoot application issues. This layered design ensures that AI and engineering teams receive context-driven, actionable insights from telemetry data.

Implementation deep dive: A three-layer system

Let's explore the actual implementation of our MCP-powered observability platform, focusing on the data flows and transformations at each step.

Layer 1: Context-enriched data generation

First, we need to ensure our telemetry data contains enough context for meaningful analysis. The core insight is that data correlation needs to happen at creation time, not analysis time.

import json
import logging
import uuid

from opentelemetry import trace

tracer = trace.get_tracer(__name__)
logger = logging.getLogger(__name__)

def process_checkout(user_id, cart_items, payment_method):
    """Simulate a checkout process with context-enriched telemetry."""

    # Generate correlation ids
    order_id = f"order-{uuid.uuid4().hex[:8]}"
    request_id = f"req-{uuid.uuid4().hex[:8]}"

    # Initialize context dictionary that will be applied to every signal
    context = {
        "user_id": user_id,
        "order_id": order_id,
        "request_id": request_id,
        "cart_item_count": len(cart_items),
        "payment_method": payment_method,
        "service_name": "checkout",
        "service_version": "v1.0.0"
    }

    # Start OTel trace with the same context
    with tracer.start_as_current_span(
        "process_checkout",
        attributes={k: str(v) for k, v in context.items()}
    ) as checkout_span:

        # Logging using the same context
        logger.info("Starting checkout process", extra={"context": json.dumps(context)})

        # Context propagation
        with tracer.start_as_current_span("process_payment"):
            # Process payment logic...
            logger.info("Payment processed", extra={"context": json.dumps(context)})

Code 1. Context enrichment for logs and traces

This approach ensures that every telemetry signal (logs, metrics, traces) contains the same core contextual data, solving the correlation problem at the source.

Layer 2: Data access through the MCP server

Next, I built an MCP server that transforms raw telemetry into a queryable API. The core data operations here involve the following:

Indexing: Creating efficient lookups across contextual fields
Filtering: Selecting relevant subsets of telemetry data
Aggregation: Computing statistical measures across time windows

# app (FastAPI), LOG_DB and the Log/LogQuery models are defined elsewhere in the service
@app.post("/mcp/logs", response_model=List[Log])
def query_logs(query: LogQuery):
    """Query logs with specific filters"""
    results = LOG_DB.copy()

    # Apply contextual filters
    if query.request_id:
        results = [log for log in results if log["context"].get("request_id") == query.request_id]

    if query.user_id:
        results = [log for log in results if log["context"].get("user_id") == query.user_id]

    # Apply time-based filters
    if query.time_range:
        start_time = datetime.fromisoformat(query.time_range["start"])
