VentureBeat

Google adds limited chat personalization to Gemini, trails Anthropic and OpenAI in memory features

Google is playing catch-up with Anthropic and OpenAI as it slowly adds customization and personalization to its Gemini app and gives users more control over what data the assistant can reference. Personalization and data control make it easier for both individual and enterprise users to converse with a chatbot and retain preferences. This matters even more for ongoing projects in the enterprise, where chatbots need to remember details such as company branding or voice. Google has opted for a slower rollout of these features and, unlike its competitors, will not allow users to edit or delete preferences.

The new "Personal Context" setting is rolling out first on Gemini 2.5 Pro in select countries and will be on by default, allowing Gemini to "learn from your past conversations and provide relevant and tailored responses." The company plans to expand the feature to 2.5 Flash in the next few weeks. Previous versions of the app put the burden on users to point the model to a specific chat to source preferences, for example, by mentioning an earlier conversation. Users can still disable Personal Context at any time.

Michael Siliski, senior director of product management for the Gemini app, said the rollout is part of plans to make the app more personalized. "At I/O, we introduced our vision for the Gemini app: to create an AI assistant that learns and truly understands you — not one that just responds to your prompt in the same way that it would anyone else's prompt," Siliski said in a blog post. Currently, Gemini apps save chats for up to 72 hours if the save-activity option is toggled off, and can auto-delete other activity at intervals of three, 18 or 36 months.

Temporary chat and data control

Other new features coming to the Gemini app are Temporary Chat and additional customer data control. Temporary Chat, a feature ChatGPT introduced in April of last year, enables users to have one-off conversations. These chats will not influence future ones and won't be used for personalization or to train AI models. Google also announced additional data controls. The feature, which is off by default, allows users to prevent their data from being used in future Google model training.

"When this setting is on, a sample of your future uploads will be used to help improve Google services for everyone. If you prefer not to have your data used this way, you can turn this setting off or use Temporary Chats. If your Gemini Apps Activity setting is currently off, your Keep Activity setting will remain off, and you can turn it on anytime," Siliski said. Google said this expands an earlier update that let users choose which audio, video and screens to share with Gemini.

Memory and chatbots

Google's Gemini updates come a full year after its biggest competitors introduced similar features. ChatGPT, for example, introduced temporary chat, chat history and memory in 2024. OpenAI updated these capabilities in April of this year, and ChatGPT can now reference all past conversations.
Anthropic introduced Styles in November 2024, which allows Claude users to customize how the model interacts with them. Earlier this week, Anthropic pushed an update for Claude to reference all conversations, not just ones specified by users. While Google introduced personalization with Gemini 2.0, that model was only able to reference previous conversations if prompted by the user.

Memory, personalization and customization continue to be a battleground in the AI arms race, as users want chat platforms to "just know" them or their brand. Persistent memory provides context and eliminates the need to repeat instructions for ongoing projects.


Google unveils ultra-small and efficient open source AI model Gemma 3 270M that can run on smartphones

Google's DeepMind AI research team has unveiled a new open source AI model today, Gemma 3 270M. As its name suggests, this is a 270-million-parameter model — far smaller than the 70 billion or more parameters of many frontier LLMs (parameters being the number of internal settings governing the model's behavior). While more parameters generally translate to a larger and more powerful model, Google's focus here is nearly the opposite: high efficiency, giving developers a model small enough to run directly on smartphones and locally, without an internet connection, as shown in internal tests on a Pixel 9 Pro SoC. Yet the model is still capable of handling complex, domain-specific tasks and can be fine-tuned in mere minutes to fit an enterprise or indie developer's needs.

On the social network X, Google DeepMind Staff AI Developer Relations Engineer Omar Sanseviero added that Gemma 3 270M can also run directly in a user's web browser, on a Raspberry Pi, and "in your toaster," underscoring its ability to operate on very lightweight hardware.

Gemma 3 270M combines 170 million embedding parameters — thanks to a large 256k vocabulary capable of handling rare and specific tokens — with 100 million transformer block parameters. According to Google, the architecture supports strong performance on instruction-following tasks right out of the box while staying small enough for rapid fine-tuning and deployment on devices with limited resources, including mobile hardware. Gemma 3 270M inherits the architecture and pretraining of the larger Gemma 3 models, ensuring compatibility across the Gemma ecosystem. With documentation, fine-tuning recipes, and deployment guides available for tools like Hugging Face, UnSloth, and JAX, developers can move from experimentation to deployment quickly.

High scores on benchmarks for its size, and high efficiency

On the IFEval benchmark, which measures a model's ability to follow instructions, the instruction-tuned Gemma 3 270M scored 51.2%. The score places it well above similarly small models like SmolLM2 135M Instruct and Qwen 2.5 0.5B Instruct, and closer to the performance range of some billion-parameter models, according to Google's published comparison. However, as researchers and leaders at rival AI startup Liquid AI pointed out in replies on X, Google's comparison left out Liquid's own LFM2-350M model, released in July of this year, which scored 65.12% with only slightly more parameters.

One of the model's defining strengths is its energy efficiency. In internal tests using the INT4-quantized model on a Pixel 9 Pro SoC, 25 conversations consumed just 0.75% of the device's battery. This makes Gemma 3 270M a practical choice for on-device AI, particularly in cases where privacy and offline functionality are important. The release includes both a pretrained and an instruction-tuned model, giving developers immediate utility for general instruction-following tasks.
Quantization-Aware Trained (QAT) checkpoints are also available, enabling INT4 precision with minimal performance loss and making the model production-ready for resource-constrained environments.

A small, fine-tuned version of Gemma 3 270M can perform many functions of larger LLMs

Google frames Gemma 3 270M as part of a broader philosophy of choosing the right tool for the job rather than relying on raw model size. For functions like sentiment analysis, entity extraction, query routing, structured text generation, compliance checks, and creative writing, the company says a fine-tuned small model can deliver faster, more cost-effective results than a large general-purpose one. The benefits of specialization are evident in past work, such as Adaptive ML's collaboration with SK Telecom. By fine-tuning a Gemma 3 4B model for multilingual content moderation, the team outperformed much larger proprietary systems. Gemma 3 270M is designed to enable similar success at an even smaller scale, supporting fleets of specialized models tailored to individual tasks.

Demo Bedtime Story Generator app shows off the potential of Gemma 3 270M

Beyond enterprise use, the model also fits creative scenarios. In a demo video posted on YouTube, Google shows off a Bedtime Story Generator app built with Gemma 3 270M and Transformers.js that runs entirely offline in a web browser, showing the versatility of the model in lightweight, accessible applications. The video highlights the model's ability to synthesize multiple inputs by allowing selections for a main character (e.g., "a magical cat"), a setting ("in an enchanted forest"), a plot twist ("uncovers a secret door"), a theme ("Adventurous"), and a desired length ("Short"). Once the parameters are set, the model weaves a short, coherent and imaginative tale based on the user's choices, demonstrating its capacity for creative, context-aware text generation. The demo is a clear example of how the lightweight yet capable Gemma 3 270M can power fast, engaging, interactive applications without relying on the cloud, opening up new possibilities for on-device AI experiences.

Open-sourced under a Gemma custom license

Gemma 3 270M is released under the Gemma Terms of Use, which allow use, reproduction, modification, and distribution of the model and derivatives, provided certain conditions are met. These include carrying forward use restrictions outlined in Google's Prohibited Use Policy, supplying the Terms of Use to downstream recipients, and clearly indicating any modifications made. Distribution can be direct or through hosted services such as APIs or web apps. For enterprise teams and commercial developers, this means the model can be embedded in products, deployed as part of cloud services, or fine-tuned into specialized derivatives, so long as licensing terms are respected. Outputs generated by the model are not claimed by Google, giving businesses full rights over the content they create. However, developers are responsible for ensuring compliance with applicable laws and for avoiding uses prohibited by Google's policy, such as generating harmful content.
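For developers who want to kick the tires locally, a minimal sketch using the Hugging Face transformers library is shown below. The checkpoint name google/gemma-3-270m-it is an assumption based on Google's naming for instruction-tuned Gemma releases, and pulling it requires accepting the Gemma license on Hugging Face; treat this as an illustration rather than official sample code.

# Minimal local-inference sketch for Gemma 3 270M. The model id below is an
# assumption (Google's instruction-tuned checkpoints are typically published
# with an "-it" suffix); accept the Gemma license on Hugging Face before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-270m-it"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Instruction-tuned Gemma models ship with a chat template; use it to build the prompt.
messages = [{"role": "user", "content": "List the entities in: 'Acme Corp sued Globex in 2024.'"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

The same checkpoint can be swapped for the QAT INT4 variant, or fine-tuned on a task-specific dataset, once those artifacts are available for your deployment target.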


That ‘cheap’ open-source AI model is actually burning through your compute budget

A comprehensive new study has revealed that open-source artificial intelligence models consume significantly more computing resources than their closed-source competitors when performing identical tasks, potentially undermining their cost advantages and reshaping how enterprises evaluate AI deployment strategies. The research, conducted by AI firm Nous Research, found that open-weight models use between 1.5 and 4 times more tokens — the basic units of AI computation — than closed models like those from OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times more tokens.

"Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark. We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on task type (up to…" — Nous Research (@NousResearch), August 14, 2025

"Open weight models use 1.5–4× more tokens than closed ones (up to 10× for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs," the researchers wrote in their report published Wednesday. The findings challenge a prevailing assumption in the AI industry that open-source models offer clear economic advantages over proprietary alternatives. While open-source models typically cost less per token to run, the study suggests this advantage can be "easily offset if they require more tokens to reason about a given problem."

The real cost of AI: Why 'cheaper' models may break your budget

The research examined 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured "token efficiency" — how many computational units models use relative to the complexity of their solutions — a metric that has received little systematic study despite its significant cost implications. "Token efficiency is a critical metric for several practical reasons," the researchers noted. "While hosting open weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem."

Open-source AI models use up to 12 times more computational resources than the most efficient closed models for basic knowledge questions. (Credit: Nous Research)

The inefficiency is particularly pronounced for Large Reasoning Models (LRMs), which use extended "chains of thought" to solve complex problems. These models, designed to think through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation. For basic knowledge questions like "What is the capital of Australia?" the study found that reasoning models spend "hundreds of tokens pondering simple knowledge questions" that could be answered in a single word.

Which AI models actually deliver bang for your buck

The research revealed stark differences between model providers.
OpenAI's models, particularly its o4-mini and newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found OpenAI models "stand out for extreme token efficiency in math problems," using up to three times fewer tokens than other commercial models. Among open-source options, Nvidia's llama-3.3-nemotron-super-49b-v1 emerged as "the most token efficient open weight model across all domains," while newer models from companies like Mistral showed "exceptionally high token usage" as outliers. The efficiency gap varied significantly by task type. While open models used roughly twice as many tokens for mathematical and logic problems, the difference ballooned for simple knowledge questions where efficient reasoning should be unnecessary.

OpenAI's latest models achieve the lowest costs for simple questions, while some open-source alternatives can cost significantly more despite lower per-token pricing. (Credit: Nous Research)

What enterprise leaders need to know about AI computing costs

The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational requirements for real-world tasks. "The better token efficiency of closed weight models often compensates for the higher API pricing of those models," the researchers found when analyzing total inference costs. The study also revealed that closed-source model providers appear to be actively optimizing for efficiency. "Closed weight models have been iteratively optimized to use fewer tokens to reduce inference cost," while open-source models have "increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance."

The computational overhead varies dramatically between AI providers, with some models using over 1,000 tokens for internal reasoning on simple tasks. (Credit: Nous Research)

How researchers cracked the code on AI efficiency measurement

The research team faced unique challenges in measuring efficiency across different model architectures. Many closed-source models don't reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques. To address this, researchers used completion tokens — the total computational units billed for each query — as a proxy for reasoning effort. They discovered that "most recent closed source models will not share their raw reasoning traces" and instead "use smaller language models to transcribe the chain of thought into summaries or compressed representations." The study's methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering variables in mathematical competition problems from the American Invitational Mathematics Examination (AIME).

Different AI models show varying relationships between computation and output, with some providers compressing reasoning traces while others provide full details. (Credit: Nous Research)

The future of AI efficiency: What's coming next

The researchers suggest that token efficiency should become a primary optimization target alongside accuracy for future model development.
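To make the cost arithmetic concrete, here is a minimal sketch of the comparison the researchers describe, with hypothetical per-token prices and completion-token counts rather than figures from the study:

# Hypothetical illustration: a lower per-token price can still mean a higher
# per-query cost if the model burns far more completion tokens on the same task.
# The prices and token counts below are invented for illustration only.

def cost_per_query(completion_tokens: int, price_per_million_tokens: float) -> float:
    return completion_tokens * price_per_million_tokens / 1_000_000

closed_model_cost = cost_per_query(completion_tokens=300, price_per_million_tokens=10.00)
open_model_cost = cost_per_query(completion_tokens=2_400, price_per_million_tokens=2.00)

print(f"closed-weight model: ${closed_model_cost:.4f} per query")  # $0.0030
print(f"open-weight model:   ${open_model_cost:.4f} per query")    # $0.0048

In this made-up example the open-weight model is five times cheaper per token but uses eight times as many completion tokens, so it ends up costing more per query, which is exactly the trade-off the study flags.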
“A more densified CoT will also allow for more efficient context usage and may counter context degradation during challenging reasoning tasks,” they wrote. The release of OpenAI’s open-source gpt-oss models,


ChatGPT users dismayed as OpenAI pulls popular models GPT-4o, o3 and more — enterprise API remains (for now)

Updated Friday August 8, 5:21 pm ET: Following this post's publication, OpenAI co-founder and CEO Sam Altman announced the company would restore access to GPT-4o and other old models for selected users, admitting the GPT-5 launch was "more bumpy than we hoped for."

After announcing the release of its newest flagship model family, GPT-5, OpenAI said the model will power all of ChatGPT, and that it will sunset the existing models in the chat platform. OpenAI, through a spokesperson, told VentureBeat that GPT-5 "will replace all other models in ChatGPT, so users don't have to pick depending on each task, which takes effect once you have access to GPT-5." This means people can no longer choose GPT-4o, o3, o4-mini or o4-mini-high. With GPT-5 access now rolling out to ChatGPT Plus, Free, Pro and Team users, only the Enterprise and Edu tiers can still use the "legacy" models, and only for 60 days.

The news came as a surprise to many ChatGPT users, who had come to rely on their chosen models to run their everyday queries. Some people said the adjustment would take some getting used to, mainly because they had based workflows on how the model interacted with them or on typical response times.

"Although I enjoy GPT-4.1, I am saddened by the news that you're also apparently sunsetting GPT-4.5. For me, it's been way better in textual and conceptual analysis than any other GPT-4x series model, ever. At the very least, please don't make ChatGPT users go back to 4o." — Harry Horsperg (@horsperg), April 15, 2025

Other users claimed they developed "a connection" to their chosen model and found a demo in the livestream announcement asking GPT-4o to write its own eulogy distasteful. The loss of GPT-4o garnered the most distress. After all, 4o was the default model for ChatGPT, and some users either preferred it or never bothered to switch models because it worked for their needs.

"It was pretty gross, wasn't it. Did it as a demo and glibly said GPT-5 did it better before talking about coding. I had a great relationship with 4o, and I'm sure a fair few people did as well, it was very graceless how they handled it." — Meadowbrook (@Meadowbrook_), August 7, 2025

I used 4o as the default and found it annoying at first when my custom GPT began defaulting to a reasoning model. I've since come around to the reasoning model for work-related queries, but I still often turn to 4o for quicker questions, such as planning a trip or generating gift ideas.

ChatGPT had come under fire before over the number of model choices it offered, prompting OpenAI CEO Sam Altman to admit in February that its model picker (the dropdown where people choose which model they prefer) had become complicated. Altman vowed to unify the experience, which in hindsight hinted at what the company eventually did with GPT-5 in ChatGPT. Last month, rumors circulated that OpenAI would introduce an automatic model router that chooses a model for users based on their workload. OpenAI has sunset models before, but this is the first time all existing models on the chat platform will be removed and replaced wholesale.
Catapult into the future

On the other hand, a lot of people see the sunsetting of GPT-4o and the o3 and o4 family of models as OpenAI "catapulting" 400 million users into the future.

"People are underestimating the impact of OpenAI deprecating all models except GPT-5. Most lawyers and business folks outside of X use base models on ChatGPT for tasks and still think 'AI is dumb.' 99% haven't heard of o3. Today, 400M people got catapulted into the future." — Ian Tracey (@ian_dot_so), August 7, 2025

"Sunsetting old models and auto-upgrading everyone to GPT-5 is smart. Most users never switch models and miss huge capability jumps." — Creatify AI (@Creatify_AI), August 7, 2025

Some internet commenters argue that people who complain about AI models not being smart are often the same people who never switched models in the first place. Removing legacy models as options will force more users onto the latest and most capable models.

"i have friends who stopped using gpt because they think it's stupid. they were on 4o and had no idea about what web search tool meant, let alone knowledge cutoff" — Cengiz (@cengizdemiurg), August 7, 2025

Enterprise APIs are safe

For enterprises, the impact of losing models like GPT-4o on ChatGPT will be felt more at the individual or team level. Of course, for now, subscribers on the ChatGPT Enterprise tier can still access all of the models. But enterprises that built their applications or agents on either GPT-4o or one of the reasoning models can rest easy. OpenAI told VentureBeat that the company has no plans to deprecate models on the API side. "In the API, we do not currently plan to deprecate older models," the OpenAI spokesperson said. "We will share advanced notice with developers if we decide to sunset models in the future." Many enterprises regularly evaluate models, even switching from one LLM to a smaller model to save on costs.
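For teams that depend on a particular model's behavior, the usual safeguard is to pin an explicit model name in API calls rather than rely on whatever the consumer app defaults to. A minimal sketch with the OpenAI Python SDK follows; the model string is just an example of pinning, not a statement about future availability.

# Sketch: pin an explicit model in API code instead of relying on chat-app defaults.
# Requires the openai package and an OPENAI_API_KEY environment variable;
# the model name is an example only.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # pinned explicitly; use a dated snapshot name for stricter pinning
    messages=[{"role": "user", "content": "Summarize this quarter's onboarding checklist."}],
)
print(response.choices[0].message.content)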


OpenAI’s GPT-5 rollout is not going smoothly

Updated Friday August 8, 5:21 pm ET: Shortly after this post's publication, OpenAI co-founder and CEO Sam Altman announced the company would restore access to GPT-4o and other old models for selected users, admitting the GPT-5 launch was "more bumpy than we hoped for."

The launch of OpenAI's long-anticipated new model, GPT-5, is off to a rocky start, to say the least. Even setting aside errors in charts and voice demos during yesterday's livestreamed presentation of the new model (actually four separate models, plus a "Thinking" mode that can be engaged for three of them), a number of user reports have emerged since GPT-5's release showing it erring badly on relatively simple problems that preceding OpenAI models — and rivals from competing AI labs — answer correctly.

For example, data scientist Colin Fraser posted screenshots showing GPT-5 getting a math proof wrong (whether 8.888 repeating is equal to 9 — it is, of course, not). It also failed on a simple algebra problem that elementary schoolers could probably nail: 5.9 = x + 5.11, whose correct answer is x = 0.79. Using GPT-5 to judge OpenAI's own erroneous presentation charts also did not yield helpful or correct responses. It also failed on a trickier math word problem (which, to be fair, stumped this human at first, though Elon Musk's Grok 4 AI answered it correctly; for a hint, think of the fact that the flagstones in that problem can't be divided into smaller portions: they must remain intact as 80 separate units, so no halves or quarters). The older 4o model performed better for me on at least one of these math problems.

Unfortunately, OpenAI is slowly deprecating those older models — including the former default GPT-4o and the powerful reasoning model o3 — for users of ChatGPT, though they'll continue to be available in the application programming interface (API) for developers for the foreseeable future.

Not as good at coding as benchmarks indicate

Even though OpenAI's internal benchmarks and some third-party external ones have shown GPT-5 to outperform all other models at coding, in real-world usage Anthropic's recently updated Claude Opus 4.1 seems to do a better job at "one-shotting" certain tasks, that is, completing the user's desired application or software build to their specifications. See an example below from developer Justin Sun posted to X:

"Opus 4.1's one-shot attempt at 'create a 3d capybara petting zoo' – 8 minutes total. This was honestly pretty insane, not only are the capybaras way cuter and moving, there are individual pet affinity levels, a day/night switcher, feeding, and even a screenshot feature." — justin (@justinsunyt), August 7, 2025

In addition, a report from security firm SPLX found that OpenAI's internal safety layer left major gaps in areas like business alignment and vulnerability to prompt injection and obfuscated logic attacks. While anecdotal, a temperature check on how the model is faring with early AI adopters seems to indicate a chilly reception.
AI influencer and former Googler Bilawal Sidhu posted a poll on X asking for a "vibe check" from his followers and the wider userbase, and so far, with 172 votes in, the overwhelming response is "Kinda mid."

"Alright, GPT-5 vibe check" — Bilawal Sidhu (@bilawalsidhu), August 7, 2025

And as the pseudonymous AI Leaks and News account wrote, "The overwhelming consensus on GPT-5 from both X and the Reddit AMA are overwhelmingly negative."

"The overwhelming consensus on GPT-5 from both X and the Reddit AMA are overwhelmingly negative. Most users are disgruntled about the broken model picker and non-pro users not having access to legacy models. What are your initial thoughts on GPT-5?" — AI Leaks and News (@AILeaksAndNews), August 8, 2025

Tibor Blaho, lead engineer at AIPRM and a popular AI leaks and news poster on X, summarized the many problems with the GPT-5 rollout in an excellent post, highlighting that one of the new marquee features — an automatic "router" in ChatGPT that chooses a thinking or non-thinking mode for the underlying GPT-5 model depending on the difficulty of the query — has become one of the chief complaints, given the model seemed to default to non-thinking mode for many users.

"A bit sad how the GPT-5 launch is going so far, especially after the long wait and high expectations – The automatic switching between models (the router) seems partly broken/unreliable – It's unclear exactly which model you're actually interacting with (standard or mini,…" — Tibor Blaho (@btibor91), August 8, 2025

Competition waiting in the wings

Thus, the sentiment toward GPT-5 is far from universally positive, highlighting a major problem for OpenAI as it faces increasing competition from major U.S. rivals like Google and Anthropic, and a growing list of free, open source and powerful Chinese LLMs offering features that many U.S. models lack. Take the Alibaba Qwen Team of AI researchers, who just today updated their highly performant Qwen 3 model to a 1-million-token context window — giving users the ability to exchange nearly 4x as much information with the model in a single back-and-forth interaction as GPT-5 offers. Given that OpenAI's other big release this week — the new open-source gpt-oss models — also received a mixed reception from early users, things are not looking up right now for the number one dedicated AI company by users (ChatGPT counts 700 million weekly active users as of this month). Indeed, following GPT-5's release, users of the betting marketplace Polymarket swung overwhelmingly toward Google having the best AI model by the end of this month, August 2025. Other power users like Otherside AI co-founder and


OpenAI brings GPT-4o back as a default for all paying ChatGPT users, Altman promises ‘plenty of notice’ if it leaves again

OpenAI is once again making GPT-4o — the large language model (LLM) that powered ChatGPT before last week's launch of GPT-5 — a default option for all paying users, that is, those who subscribe to the ChatGPT Plus ($20 per month), Pro ($200 per month), Team ($30 per month), Enterprise, or Edu tiers, no longer requiring them to toggle on a "show legacy models" setting to access it. Paying ChatGPT subscribers will also get a new "Show additional models" setting, on by default, that restores access to GPT-4.1, o3 and o4-mini, the latter two reasoning-focused LLMs. OpenAI CEO and co-founder Sam Altman announced the change on X just minutes ago, pledging that if the company ever removes GPT-4o in the future, it will give "plenty of notice."

"Updates to ChatGPT: You can now choose between 'Auto', 'Fast', and 'Thinking' for GPT-5. Most users will want Auto, but the additional control will be useful for some people. Rate limits are now 3,000 messages/week with GPT-5 Thinking, and then extra capacity on GPT-5 Thinking…" — Sam Altman (@sama), August 13, 2025

The models can be found in the "picker" menu at the top of the ChatGPT session screen on the web and on mobile and other apps.

The reversal follows a turbulent first week for GPT-5, which rolled out August 7 in four variants — regular, mini, nano, and pro — with optional "thinking" modes on several of these for longer, more reasoning-intensive tasks. As VentureBeat previously reported, GPT-5's debut was met with mixed reviews and infrastructure hiccups, including a broken "autoswitcher" that routed prompts incorrectly, inconsistent performance compared to GPT-4o, and user frustration over the sudden removal of older models.

Altman's latest update adds new controls to the ChatGPT interface: users can now choose between "Auto," "Fast," and "Thinking" modes for GPT-5. The "Thinking" mode — with a 196,000-token context window — now carries a 3,000-messages-per-week cap for paying subscribers, after which they can continue using the lighter "GPT-5 Thinking mini" mode. Altman noted the limits could change depending on usage trends. However, GPT-4.5 remains exclusive to Pro users due to its high GPU cost.

Altman also hinted at another change on the horizon: a personality tweak for GPT-5 intended to feel "warmer" than the current default, but less polarizing than GPT-4o's tone. The company is exploring per-user customization as a long-term solution — a move that could address the strong emotional attachments some users have formed with specific models. For now, the changes should help placate users who felt frustrated by the sudden shift to GPT-5 and the deprecation of OpenAI's older LLMs, though they could also continue to fuel the intense emotional fixations some users developed with these models.


ChatGPT rockets to 700M weekly users ahead of GPT-5 launch with reasoning superpowers

OpenAI's ChatGPT will reach 700 million weekly active users this week, the company announced Monday, cementing its position as one of the fastest-adopted software products in history just as the company prepares to release its most powerful language model yet. The surge is a 40 percent jump from the 500 million weekly users ChatGPT had at the end of March and marks a fourfold increase from the same period last year. The explosive growth rivals the adoption rates of platforms like Zoom during the pandemic and early social media networks, underscoring how quickly AI tools have moved from experimental to essential.

"This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year. Every day, people and teams are learning, creating, and solving harder problems. Big week ahead. Grateful to the team for making ChatGPT more useful and…" — Nick Turley (@nickaturley), August 4, 2025

The milestone comes at a strategic moment for OpenAI, which plans to launch GPT-5 in early August, according to reports citing sources familiar with the company's plans. The timing suggests OpenAI is orchestrating a coordinated push to dominate the AI landscape before competitors can close the gap. "Every day, people and teams are learning, creating, and solving harder problems," said Nick Turley, OpenAI's vice president of product for ChatGPT, in announcing the user benchmark. "Big week ahead."

GPT-5 will combine reasoning powers into single AI system

The upcoming model goes beyond an incremental upgrade. According to people briefed on the project who spoke to The Information, GPT-5 will integrate OpenAI's advanced reasoning capabilities from its o3 series directly into the flagship GPT platform, creating what CEO Sam Altman has described as "a system that integrates a lot of our technology." This integration marks a strategic shift for OpenAI, which has previously released reasoning models separately from its general-purpose language models. By combining these capabilities, the company aims to reduce user confusion about which model to deploy for specific tasks while creating a more powerful unified system.

The consolidation also serves OpenAI's broader ambition to achieve artificial general intelligence, or AGI — a milestone that would trigger significant changes to its partnership with Microsoft. Under their current agreement, achieving AGI would force Microsoft to relinquish its rights to OpenAI's revenue and future models, potentially reshaping one of the most consequential partnerships in technology. Altman has tempered expectations, however, stating that GPT-5 won't reach "gold level of capability for many months" after launch, suggesting the AGI threshold remains beyond immediate reach.

Business customers jump to 5 million as revenue hits $13 billion

The user growth reflects ChatGPT's expanding role in corporate America. OpenAI now serves 5 million paying business customers, up from 3 million in June, as enterprises increasingly integrate AI tools into core operations.
Daily user messages have surpassed 3 billion, reflecting not just growth in users but intensifying engagement with the platform. This surge in business adoption has driven OpenAI’s annual recurring revenue to $13 billion, up from $10 billion in June, with projections suggesting it could exceed $20 billion by year-end. The revenue growth, combined with a recent $8.3 billion funding round that valued OpenAI at $300 billion, provides the financial foundation for the massive infrastructure investments required to maintain its technological edge. Those investments are substantial. OpenAI has committed to a $30 billion annual lease with Oracle for data center capacity and struck an $11.9 billion deal with cloud provider CoreWeave, while planning international expansion through partnerships like Stargate Norway and a major data center project in Abu Dhabi. The rapid growth comes as OpenAI faces mounting pressure from well-funded rivals eager to capture market share. Google’s AI search product, AI Overviews, claims 2 billion monthly users across more than 200 countries, while its Gemini App reports 450 million monthly active users. Anthropic, backed by significant investments from Amazon and others, is reportedly seeking to raise up to $5 billion at a $170 billion valuation, according to Bloomberg. Meta has made significant strides with its Llama models, while Elon Musk’s xAI continues to attract attention and investment. The competitive landscape has intensified the AI arms race, with companies pouring billions into compute infrastructure and talent acquisition. The competition has triggered a talent war among tech giants. Microsoft has reportedly hired more than 20 employees from Google’s DeepMind team in recent months, including former Gemini engineering head Amar Subramanya, The Information reported, as companies raid each other’s AI talent pools. ChatGPT adds wellness features as AI safety concerns grow As OpenAI pursues raw capability improvements, the company has also emphasized optimizing ChatGPT for user well-being and productivity. The company recently outlined efforts to help users “thrive in the ways you choose—not to hold your attention, but to help you use it well.” We build ChatGPT to help you thrive in the ways you choose — not to hold your attention, but to help you use it well. We’re improving support for tough moments, have rolled out break reminders, and are developing better life advice, all guided by expert input.… — OpenAI (@OpenAI) August 4, 2025 New features include break reminders and improved support for challenging situations, reflecting growing awareness of AI’s psychological and social impacts. This focus on responsible deployment could prove crucial as regulatory scrutiny intensifies and public debate about AI’s societal effects continues. When GPT-5 launches, it will include multiple variants — including mini and nano versions available through OpenAI’s API — providing developers and enterprises with options tailored to different use cases and computational requirements. 700 million


What happens the day after superintelligence?

With the release of OpenAI's GPT-5, the world is one step closer to unleashing a general-purpose superintelligence that can cognitively outperform each of us by a wide margin. As this day nears, I am increasingly worried that we are woefully unprepared for the shockwaves this will send through society — and it's probably not for the reasons you expect. Try this little experiment: Ask anyone you know if they are concerned about AI, and they will likely share a variety of fears, from massive disruptions in the job market and the reality-bending impacts of deepfakes, to the unprecedented power being concentrated in a handful of large AI companies. Yet most people have never honestly imagined what their life will really feel like the day after superintelligence becomes widely available.

Why superintelligence could demoralize us

As context, artificial superintelligence (ASI) refers to systems that can outthink humans on most fronts, from planning and reasoning to problem-solving, strategic thinking and raw creativity. These systems will solve complex problems in a fraction of a second that might take the smartest human experts days, weeks or even years to work through. This terrifies me, and it's not because of the doomsday scenarios that dominate our public discourse. No, I am worried about the opposite risks — the dangers that could emerge in the best-case scenarios where superintelligence is helpful and benevolent. Such an ASI will have many positive impacts on society, but it could also be deeply demoralizing to our core identity as humans. After all, the world will feel different when each of us knows that a smarter, faster, more creative intelligence is available on our mobile devices than between our own ears.

So ask yourself, honestly, how will humans act in this new reality? Will we reflexively seek advice from our AI assistants as we navigate every little challenge we encounter? Or worse, will we learn to trust our AI assistants more than our own thoughts and instincts? Wait — before you answer, you must update your mental model. Currently, we engage AI through a Socratic framework that requires us to ask questions and get answers (like Captain Kirk did aboard the Enterprise in 1966). But that's old-school thinking. We are now entering a new era in which AI assistants will be integrated into body-worn devices that are equipped with cameras and microphones, enabling AI to see what you see, hear what you hear and whisper advice into your ears without you needing to ask.

In other words, our future will be filled with AI assistants that ride shotgun in our lives, augmenting our experiences with optimized guidance at every turn. In this world, the risk is not that we reflexively ask AI for advice before using our own brains; the risk is that we won't need to ask — the advice will just stream into our eyes and ears, shaping our actions, influencing our decisions and solving our problems before we've had a chance to think for ourselves.
'Augmented mentality' will transform our lives

I refer to this framework as "augmented mentality," and it is about to hit society at scale through AI-powered glasses, earbuds and pendants. This is the future of mobile computing, and it is already driving an arms race between Meta, Google, Samsung and Apple as they position themselves to produce the context-aware AI devices that will replace handheld phones.

Imagine walking down the street in your town. You see a coworker heading towards you. You can't remember his name, but your AI assistant does. It detects your hesitation and whispers the coworker's name into your ears. The AI also recommends that you ask the coworker about his wife, who had surgery a few weeks ago. The coworker appreciates the sentiment, then asks you about your recent promotion, likely on the advice of his own AI. Is this human empowerment, or a loss of human agency?

It will certainly feel like a superpower to have an AI in your ear that always has your back, ensuring you never forget a name, always have witty things to say and are instantly alerted when someone you're talking to is not being truthful. On the other hand, everyone you meet will have their own AI muttering in their own ears. This will make us wonder who we're really interacting with — the human in front of us, or the AI agent giving them guidance (check out Carbon Dating for fun examples).

Many experts believe that body-worn AI assistants will make us feel more powerful and capable, but that's not the only way this could go. These same technologies could make us feel less confident in ourselves and less impactful in our lives. After all, human intelligence is the defining feature of humanity, the thing we take most pride in as a species, yet we could soon find ourselves deferring to AI assistants because we feel mentally outmatched. Is this empowerment — an AI that botsplains our every experience in real time?

I raise these concerns as someone who has spent my entire career creating technologies that expand human abilities. From my early work developing augmented reality to my current work developing conversational agents that make human teams smarter, I am a firm believer that technology can greatly enhance human abilities. Unfortunately, when it comes to superintelligence, there is a fine line between augmenting our human abilities and replacing them. Unless we are thoughtful in how we deploy ASI, I fear we will cross that line.

Louis Rosenberg is an early pioneer of virtual and augmented reality and a longtime AI researcher. He founded Immersion Corp, Outland Research and Unanimous AI.


Anthropic takes on OpenAI and Google with new Claude AI features designed for students and developers

Anthropic is launching new "learning modes" for its Claude AI assistant that transform the chatbot from an answer-dispensing tool into a teaching companion, as major technology companies race to capture the rapidly growing artificial intelligence education market while addressing mounting concerns that AI undermines genuine learning. The San Francisco-based AI startup will roll out the features starting today for both its general Claude.ai service and its specialized Claude Code programming tool. The learning modes represent a fundamental shift in how AI companies are positioning their products for educational use — emphasizing guided discovery over immediate solutions as educators worry that students become overly dependent on AI-generated answers.

"We're not building AI that replaces human capability — we're building AI that enhances it thoughtfully for different users and use cases," an Anthropic spokesperson told VentureBeat, highlighting the company's philosophical approach as the industry grapples with balancing productivity gains against educational value.

The launch comes as competition in AI-powered education tools has reached fever pitch. OpenAI introduced its Study Mode for ChatGPT in late July, while Google unveiled Guided Learning for its Gemini assistant in early August and committed $1 billion over three years to AI education initiatives. The timing is no coincidence — the back-to-school season represents a critical window for capturing student and institutional adoption.

The education technology market, valued at approximately $340 billion globally, has become a key battleground for AI companies seeking to establish dominant positions before the technology matures. Educational institutions represent not just immediate revenue opportunities but also the chance to shape how an entire generation interacts with AI tools, potentially creating lasting competitive advantages. "This showcases how we think about building AI — combining our incredible shipping velocity with thoughtful intention that serves different types of users," the Anthropic spokesperson noted, pointing to the company's recent product launches, including Claude Opus 4.1 and automated security reviews, as evidence of its aggressive development pace.

How Claude's new Socratic method tackles the instant-answer problem

For Claude.ai users, the new learning mode employs a Socratic approach, guiding users through challenging concepts with probing questions rather than immediate answers. Originally launched in April for Claude for Education users, the feature is now available to all users through a simple style dropdown menu. The more innovative application may be in Claude Code, where Anthropic has developed two distinct learning modes for software developers. The "Explanatory" mode provides detailed narration of coding decisions and trade-offs, while the "Learning" mode pauses mid-task to ask developers to complete sections marked with "#TODO" comments, creating collaborative problem-solving moments.
This developer-focused approach addresses a growing concern in the technology industry: junior programmers who can generate code using AI tools but struggle to understand or debug their own work. "The reality is that junior developers using traditional AI coding tools can end up spending significant time reviewing and debugging code they didn't write and sometimes don't understand," according to the Anthropic spokesperson.

The business case for enterprise adoption of learning modes may seem counterintuitive — why would companies want tools that intentionally slow down their developers? But Anthropic argues this represents a more sophisticated understanding of productivity, one that considers long-term skill development alongside immediate output. "Our approach helps them learn as they work, building skills to grow in their careers while still benefitting from the productivity boosts of a coding agent," the company explained. This positioning runs counter to the industry's broader trend toward fully autonomous AI agents, reflecting Anthropic's commitment to a human-in-the-loop design philosophy.

The learning modes are powered by modified system prompts rather than fine-tuned models, allowing Anthropic to iterate quickly based on user feedback. The company has been testing internally across engineers with varying levels of technical expertise and plans to track the impact now that the tools are available to a broader audience.

Universities scramble to balance AI adoption with academic integrity concerns

The simultaneous launch of similar features by Anthropic, OpenAI, and Google reflects growing pressure to address legitimate concerns about AI's impact on education. Critics argue that easy access to AI-generated answers undermines the cognitive struggle that's essential for deep learning and skill development. A recent WIRED analysis noted that while these study modes represent progress, they don't address the fundamental challenge: "the onus remains on users to engage with the software in a specific way, ensuring that they truly understand the material." The temptation to simply toggle out of learning mode for quick answers remains just a click away. Educational institutions are grappling with these trade-offs as they integrate AI tools into curricula. Northeastern University, the London School of Economics, and Champlain College have partnered with Anthropic for campus-wide Claude access, while Google has secured partnerships with over 100 universities for its AI education initiatives.

Behind the technology: how Anthropic built AI that teaches instead of tells

Anthropic's learning modes work by modifying system prompts to exclude the efficiency-focused instructions typically built into Claude Code, instead directing the AI to find strategic moments for educational insights and user interaction. The approach allows for rapid iteration but can result in some inconsistent behavior across conversations. "We chose this approach because it lets us quickly learn from real student feedback and improve the experience — even if it results in some inconsistent behavior and mistakes across conversations," the company explained. Future plans include training these behaviors directly into core models once optimal approaches are identified through user feedback.
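Because the modes are implemented as system-prompt changes rather than new models, a rough approximation is possible through the public API. The sketch below uses the Anthropic Python SDK; the model name is an assumption, and the system prompt is an illustrative paraphrase, not Anthropic's actual learning-mode prompt.

# Sketch: emulate a Socratic "learning mode" with a custom system prompt.
# This is not Anthropic's real prompt; the model id is an assumption.
# Requires the anthropic package and an ANTHROPIC_API_KEY environment variable.
import anthropic

client = anthropic.Anthropic()

learning_mode_system = (
    "You are a tutor. Do not give the final answer directly. "
    "Ask one probing question at a time, and only confirm the solution "
    "after the student has reasoned through the key steps."
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id; substitute one you have access to
    max_tokens=512,
    system=learning_mode_system,
    messages=[{"role": "user", "content": "Why does quicksort degrade to O(n^2) on already-sorted input?"}],
)
print(response.content[0].text)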
The company is also exploring enhanced visualizations for complex concepts, goal setting and progress tracking across conversations, and


Study warns of security risks as ‘OS agents’ gain control of computers and phones

Researchers have published the most comprehensive survey to date of so-called "OS Agents" — artificial intelligence systems that can autonomously control computers, mobile phones and web browsers by directly interacting with their interfaces. The 30-page academic review, accepted for publication at the prestigious Association for Computational Linguistics conference, maps a rapidly evolving field that has attracted billions in investment from major technology companies.

"The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations," the researchers write. "With the evolution of (multimodal) large language models ((M)LLMs), this dream is closer to reality."

The survey, led by researchers from Zhejiang University and OPPO AI Center, comes as major technology companies race to deploy AI agents that can perform complex digital tasks. OpenAI recently launched "Operator," Anthropic released "Computer Use," Apple introduced enhanced AI capabilities in "Apple Intelligence," and Google unveiled "Project Mariner" — all systems designed to automate computer interactions.

OS agents work by observing computer screens and system data, then executing actions like clicks and swipes across mobile, desktop and web platforms. The systems must understand interfaces, plan multi-step tasks and translate those plans into executable code. (Credit: GitHub)

Tech giants rush to deploy AI that controls your desktop

The speed at which academic research has transformed into consumer-ready products is unprecedented, even by Silicon Valley standards. The survey reveals a research explosion: over 60 foundation models and 50 agent frameworks developed specifically for computer control, with publication rates accelerating dramatically since 2023. This isn't just incremental progress. We're witnessing the emergence of AI systems that can genuinely understand and manipulate the digital world the way humans do. Current systems work by taking screenshots of computer screens, using advanced computer vision to understand what's displayed, then executing precise actions like clicking buttons, filling forms, and navigating between applications.

"OS Agents can complete tasks autonomously and have the potential to significantly enhance the lives of billions of users worldwide," the researchers note. "Imagine a world where tasks such as online shopping, travel arrangements booking, and other daily activities could be seamlessly performed by these agents."

The most sophisticated systems can handle complex multi-step workflows that span different applications — booking a restaurant reservation, then automatically adding it to your calendar, then setting a reminder to leave early for traffic. What took humans minutes of clicking and typing can now happen in seconds, without human intervention.

The development of AI agents requires a complex training pipeline that combines multiple approaches, from initial pre-training on screen data to reinforcement learning that optimizes performance through trial and error. (Credit: arxiv.org)
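The survey's description boils down to a perceive-plan-act loop. The toy sketch below illustrates that loop using pyautogui for screen capture and input; choose_action is a deliberately stubbed placeholder for the multimodal-model call a real agent would make, and nothing here reflects any specific vendor's implementation.

# Toy perceive-plan-act loop for an "OS agent" (illustrative only, no vendor's design).
# pyautogui handles screenshots and synthetic input; choose_action stands in for the
# multimodal-model call that would map a screenshot and goal to the next action.
import pyautogui

def choose_action(screenshot, goal):
    # Placeholder: a real OS agent would send the screenshot and goal to a
    # multimodal LLM and parse a structured action (click, type text, or done).
    return {"type": "done"}

def run_agent(goal: str, max_steps: int = 10) -> None:
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()       # perceive: capture the current screen
        action = choose_action(screenshot, goal)  # plan: decide the next step
        if action["type"] == "click":             # act: translate the plan into input events
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"], interval=0.02)
        elif action["type"] == "done":
            break

run_agent("Open the calendar and add a reminder for 3 pm")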
Why security experts are sounding alarms about AI-controlled corporate systems

For enterprise technology leaders, the promise of productivity gains comes with a sobering reality: these systems represent an entirely new attack surface that most organizations aren't prepared to defend. The researchers dedicate substantial attention to what they diplomatically term "safety and privacy" concerns, but the implications are more alarming than their academic language suggests. "OS Agents are confronted with these risks, especially considering its wide applications on personal devices with user data," they write.

The attack methods they document read like a cybersecurity nightmare. "Web Indirect Prompt Injection" allows malicious actors to embed hidden instructions in web pages that can hijack an AI agent's behavior. Even more concerning are "environmental injection attacks" where seemingly innocuous web content can trick agents into stealing user data or performing unauthorized actions. Consider the implications: an AI agent with access to your corporate email, financial systems, and customer databases could be manipulated by a carefully crafted web page to exfiltrate sensitive information. Traditional security models, built around human users who can spot obvious phishing attempts, break down when the "user" is an AI system that processes information differently.

The survey reveals a concerning gap in preparedness. While general security frameworks exist for AI agents, "studies on defenses specific to OS Agents remain limited." This isn't just an academic concern — it's an immediate challenge for any organization considering deployment of these systems.

The reality check: Current AI agents still struggle with complex digital tasks

Despite the hype surrounding these systems, the survey's analysis of performance benchmarks reveals significant limitations that temper expectations for immediate widespread adoption. Success rates vary dramatically across different tasks and platforms. Some commercial systems achieve success rates above 50% on certain benchmarks — impressive for a nascent technology — but struggle with others. The researchers categorize evaluation tasks into three types: basic "GUI grounding" (understanding interface elements), "information retrieval" (finding and extracting data), and complex "agentic tasks" (multi-step autonomous operations).

The pattern is telling: current systems excel at simple, well-defined tasks but falter when faced with the kind of complex, context-dependent workflows that define much of modern knowledge work. They can reliably click a specific button or fill out a standard form, but struggle with tasks that require sustained reasoning or adaptation to unexpected interface changes. This performance gap explains why early deployments focus on narrow, high-volume tasks rather than general-purpose automation. The technology isn't yet ready to replace human judgment in complex scenarios, but it's increasingly capable of handling routine digital busywork.

OS agents rely on interconnected systems for perception, planning, memory and action execution. The complexity of coordinating these components helps explain why current systems still struggle with sophisticated tasks. (Credit: arxiv.org)
What happens when AI agents learn to customize themselves for every user

Perhaps the most intriguing — and potentially transformative — challenge identified in the survey involves what researchers call "personalization and self-evolution." Unlike today's stateless AI assistants that treat every interaction as independent, future OS agents will need
