VentureBeat

Midjourney v7 launches with voice prompting and faster draft mode — why is it getting mixed reviews?

Midjourney, the bootstrapped startup viewed by many AI power users as the "gold standard" of AI image generation since its launch in 2022, has introduced the much-anticipated, most advanced version of its generator model, Midjourney v7. The headline feature is a new way to prompt the model to create images. Previously, users were limited to typing text prompts and attaching other images to help guide generations (the model could incorporate a variety of user-uploaded and attached images, including other Midjourney generations, to influence the style and subjects of new ones). Now, the user can simply speak aloud to Midjourney's alpha website (alpha.midjourney.com) — provided they have a microphone attached to their computer, or are using a connected device with audio input such as headphones or a smartphone — and the model will listen, conjure up its own text prompts based on the user's spoken descriptions, and generate images from them. It's unclear whether Midjourney built a new speech-to-text model from scratch or is using a fine-tuned or off-the-shelf version of one from another provider such as ElevenLabs or OpenAI. I asked Midjourney founder David Holz on X, but he has yet to answer.

Using Draft Mode and conversational Voice Input to prompt in a flow state

Going hand-in-hand with this input method is a new "Draft Mode" that generates images more rapidly than Midjourney v6.1, the immediately preceding version, often in under a minute or even 30 seconds in some cases. While the images are initially of lower quality than v6.1's, the user can click the "enhance" or "vary" buttons to the right of each generation to re-render the draft at full quality. The idea is that the user will be happy to use the two together — in fact, you need to have "Draft Mode" turned on to activate audio input — entering a more seamless flow state of creative drafting with the model, spending less time refining the specific language of prompts and more time seeing new generations, reacting to them in real time, and adjusting or tweaking them as needed more naturally and rapidly by simply speaking thoughts aloud to the model. "Make this look more detailed, darker, lighter, more realistic, more kinetic, more vibrant," and so on, are some of the instructions a user could give through the new audio interface in response to generations to produce new, adjusted ones that better match their creative vision.

Getting started with Midjourney v7

To enter these modes, starting with the new "Draft" feature, the user must first clear one new hurdle: Midjourney's personalization feature. This feature was introduced on Midjourney v6 back in June 2024, but it was optional, allowing the user to create a personal "style" that could be applied to all subsequent generations by rating 200 pairs of images (selecting which one they liked best) on the Midjourney website. The user could then toggle on a style that matched the images they preferred during the pairwise rating process. Midjourney v7, by contrast, requires users to generate a new v7-specific personalized style before using the model at all. Once the user does that, they'll land on the familiar Midjourney Alpha website dashboard, where they can click "Create" in the left side rail to open the creation tab.
Then, in the prompt entry bar at the top, the user can click the new "P" button to the right of the bar to turn on their personalization mode. Midjourney founder and leader David Holz confirmed to VentureBeat on X that older personalization styles from v6 can also be selected, but not the separate "moodboards" — styles made up of user-uploaded image collections — though Midjourney's X account separately stated that feature will be returning soon as well. However, I didn't see an option to select my older v6 style. Nonetheless, the user can then click the new "Draft Mode" button to the right of the Personalization button (and further to the right of the text prompt entry box) to activate the faster image generation mode. Once selected, it turns orange to indicate it is on, and a new button with a microphone icon should appear to its right. This is the voice prompting mode, which the user can click to activate. Once the user has pressed the microphone button to enter voice prompting mode, the icon changes from white to orange to indicate it is engaged, and a waveform line appears to the right of it, undulating in time with the user's speech. The model can then hear you and should also detect when you finish speaking. In practice, I sometimes got an error message saying "Realtime API disconnected," but stopping and restarting the voice entry mode and refreshing the webpage usually cleared it quickly. After a few seconds of speaking, Midjourney begins flashing keyword windows below the prompt entry box at the top and also generates a full text prompt to the right as it produces a new set of four images based on what the user said. The user can further modify these generations by speaking to the model again, toggling voice mode on and off as needed. Here's a quick demo video of me using it today to generate some sample imagery. You'll see the process is far from perfect, but it is really fast and allows for a more uninterrupted flow of prompting, refining, and receiving images from the model.

More new features…but also many missing features and limitations from v6/6.1

Midjourney v7 is launching with two operational modes: Turbo and Relax. Turbo Mode provides high performance at twice the cost


Now it’s TikTok parent ByteDance’s turn for a reasoning AI: enter Seed-Thinking-v1.5!

It started with the announcement of OpenAI's o1 model in Sept. 2024, but really took off with the DeepSeek R1 release in Jan. 2025. Now, it seems that most major AI model providers and trainers are in a new race to deliver better, faster and cheaper "reasoning" AI language models — that is, models that may take a little longer to respond to a human user but ideally do so with better, more comprehensive, more well-"reasoned" answers, which this class of models achieves by performing "chain-of-thought" reasoning, reflecting on their own conclusions and interrogating them for veracity before responding. ByteDance, the Chinese web media giant and parent of TikTok, is the latest to join the party with the announcement and publication of the technical paper behind Seed-Thinking-v1.5, an upcoming large language model (LLM) designed to advance reasoning performance across science, technology, engineering and math (STEM) fields as well as general-purpose domains. The model is not yet available for download or use, and it's unclear what the licensing terms will be—whether it will be proprietary/closed source, open source/free for all to use and modify at will, or somewhere in between. However, the technical paper provides some noteworthy details that are worth going over now, in advance of whenever the model is made available.

Built atop the increasingly popular Mixture-of-Experts (MoE) architecture

Like Meta's new Llama 4 and Mistral's Mixtral before it, Seed-Thinking-v1.5 is built using a Mixture-of-Experts (MoE) architecture. This architecture is designed to make models more efficient: it essentially combines the capabilities of multiple models into one, each specializing in a different domain. In this case, the MoE architecture means that Seed-Thinking-v1.5 uses only 20 billion of its 200 billion parameters at a time. ByteDance says in its technical paper, published to GitHub, that Seed-Thinking-v1.5 prioritizes structured reasoning and thoughtful response generation. The results nearly speak for themselves, with Seed-Thinking-v1.5 outperforming DeepSeek R1 and approaching Google's newly released Gemini 2.5 Pro and OpenAI's o3-mini-high reasoner on many third-party benchmark evaluations. It even exceeds those two on the ARC-AGI benchmark, which measures progress toward artificial general intelligence, seen as the goal or "Holy Grail" of AI and defined by OpenAI as a system that outperforms humans at most economically valuable work. Positioned as a compact yet capable alternative to larger state-of-the-art models, Seed-Thinking-v1.5 achieves competitive benchmark results and introduces innovations in reinforcement learning (RL), training data curation and AI infrastructure.

Performance benchmarks and model focus

Seed-Thinking-v1.5 shows strong performance on a suite of challenging tasks, scoring 86.7% on AIME 2024, 55.0% pass@8 on Codeforces and 77.3% on the GPQA science benchmark. These results place it close to or matching models like OpenAI's o3-mini-high and Google's Gemini 2.5 Pro on specific reasoning metrics. On non-reasoning tasks, the model was evaluated through human preference comparisons and achieved an 8.0% higher win rate over DeepSeek R1, suggesting that its strengths generalize beyond logic- or math-heavy challenges.
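To make the active-parameter idea concrete, here is a toy sketch of top-k expert routing, the basic mechanism behind MoE models like Seed-Thinking-v1.5. The sizes, expert count and random "experts" below are invented for illustration and do not reflect ByteDance's actual architecture.

```python
# Toy Mixture-of-Experts routing: only a few experts (a fraction of total
# parameters) run for each token. All sizes are illustrative placeholders,
# not Seed-Thinking-v1.5's real configuration.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 10   # total expert sub-networks
TOP_K = 2          # experts activated per token (~20% of parameters)
D_MODEL = 16       # hidden size of the toy model

# Each "expert" is just a random linear layer here.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))  # learned in a real model


def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts and mix their outputs."""
    logits = token @ router_w                      # score every expert
    top = np.argsort(logits)[-TOP_K:]              # indices of the k best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    # Only TOP_K expert matrices are touched; the rest stay idle, which is why
    # a 200B-parameter MoE can use roughly 20B parameters per token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))


token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (16,)
```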
To address saturation in standard benchmarks like AIME, ByteDance introduced BeyondAIME, a new, harder math benchmark with curated problems designed to resist memorization and better discriminate model performance. This and the Codeforces evaluation set are expected to be publicly released to support future research.

Data strategy

Training data played a central role in the model's development. For supervised fine-tuning (SFT), the team curated 400,000 samples, including 300,000 verifiable (STEM, logic and coding tasks) and 100,000 non-verifiable problems like creative writing and role-playing. For RL training, data was segmented into:

Verifiable problems: 100,000 rigorously filtered STEM questions and logic puzzles with known answers, sourced from elite competitions and expert review.

Non-verifiable tasks: Human-preference datasets focused on open-ended prompts, evaluated using pairwise reward models.

The STEM data leaned heavily on advanced mathematics, accounting for over 80% of the problem set. Additional logic data included tasks like Sudoku and 24-point puzzles, with adjustable difficulty to match model progress.

Reinforcement learning approach

Reinforcement learning in Seed-Thinking-v1.5 is powered by custom actor-critic (VAPO) and policy-gradient (DAPO) frameworks, developed to address known instabilities in RL training. These techniques reduce reward signal sparsity and enhance training stability, especially in long chain-of-thought (CoT) settings. Reward models play a critical role in supervising RL outputs. ByteDance introduced two key tools:

Seed-Verifier: A rule-based LLM that checks if generated and reference answers are mathematically equivalent.

Seed-Thinking-Verifier: A step-by-step reasoning-based judge that improves judgment consistency and resists reward hacking.

This two-tiered reward system enables nuanced evaluation for both straightforward and complex tasks.

Infrastructure and scaling

To support efficient large-scale training, ByteDance built a system atop its HybridFlow framework. Execution is handled by Ray clusters, and training and inference processes are co-located to reduce GPU idle time. The Streaming Rollout System (SRS) is a notable innovation that separates model evolution from runtime execution. It accelerates iteration speed by asynchronously managing partially completed generations across model versions. This architecture reportedly delivers up to 3× faster RL cycles. Additional infrastructure techniques include:

Mixed precision (FP8) for memory savings

Expert parallelism and kernel auto-tuning for MoE efficiency

ByteCheckpoint for resilient and flexible checkpointing

AutoTuner for optimizing parallelism and memory configurations

Human evaluation and real-world impact

To evaluate alignment with human-centric preferences, ByteDance conducted human testing across a range of domains, including creative writing, humanities knowledge and general conversation. Seed-Thinking-v1.5 consistently outperformed DeepSeek R1 across sessions, reinforcing its applicability to real-world user needs. The development team notes that reasoning models trained primarily on verifiable tasks demonstrated strong generalization to creative domains—an outcome attributed to the structure and rigor embedded in mathematical training workflows.
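The two-tiered reward setup described above can be pictured as a simple dispatch: try a cheap, rule-like equivalence check first, and fall back to a step-by-step reasoning judge for anything it cannot settle. The sketch below only illustrates that flow under assumed interfaces (`rule_equivalent`, `llm_judge`); it is not ByteDance's Seed-Verifier code.

```python
# Illustrative two-tier reward for RL on verifiable problems: a fast equivalence
# check handles clear-cut cases, a reasoning judge handles the rest.
# `rule_equivalent` and `llm_judge` are hypothetical stand-ins, not ByteDance APIs.
from typing import Callable, Optional


def two_tier_reward(
    generated: str,
    reference: str,
    rule_equivalent: Callable[[str, str], Optional[bool]],
    llm_judge: Callable[[str, str], float],
) -> float:
    """Return 1.0 for a verified match, 0.0 for a clear miss, judge score otherwise."""
    verdict = rule_equivalent(generated, reference)  # tier 1: cheap, rule-based check
    if verdict is not None:                          # confident yes/no: use it directly
        return 1.0 if verdict else 0.0
    # Tier 2: a step-by-step reasoning judge scores ambiguous cases, which
    # resists reward hacking better than string matching alone.
    return llm_judge(generated, reference)
```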
What it means for technical leaders, data engineers and enterprise decision-makers

For technical leads managing the lifecycle of large language models—from data curation to deployment—Seed-Thinking-v1.5 presents an opportunity to rethink how reasoning capabilities are integrated into enterprise AI stacks. Its modular training process, which includes verifiable reasoning datasets and multi-phase reinforcement learning, particularly appeals to teams looking to scale LLM development while retaining fine-grained control. ByteDance's moves to introduce Seed-Verifier and Seed-Thinking-Verifier offer


What you need to know about Amazon Nova Act: the new AI agent SDK challenging OpenAI, Microsoft, Salesforce

The sleeping giant has awoken! For a while, it seemed like Amazon was playing catch-up in the race to offer its users — particularly the millions of developers building atop Amazon Web Services (AWS)'s cloud infrastructure — compelling first-party AI models and tools. But in late 2024, it debuted its own internal foundation model family, Amazon Nova, with text, image and even video generation capabilities, and last month it introduced a new Amazon Alexa voice assistant powered in part by Anthropic's Claude family of models. Then, on Monday, the e-commerce and cloud giant's artificial general intelligence division, Amazon AGI, announced the release of Amazon Nova Act, an experimental developer kit for building AI agents that can navigate the web and complete tasks autonomously, powered by a custom, proprietary version of Amazon's Nova large language model (LLM). Oh, and the software development kit (SDK) is open source under a permissive Apache 2.0 license, though it is designed to work only with Amazon's in-house Nova model, not any third-party ones. The goal is to enable third-party developers to build AI agents capable of reliably performing tasks within web browsers. But how does Amazon's Nova Act stack up against other agent-building platforms on the market, such as Microsoft's AutoGen, Salesforce's Agentforce and, of course, OpenAI's recently released open source Agents SDK?

A different, more thoughtful approach to AI agents

Since the public rise of large language models (LLMs), most "agent" systems have been limited to responding in natural language or providing information by querying knowledge bases. Nova Act is part of the larger industry shift toward action-based agents—systems that can complete actual tasks across digital environments on behalf of the user. One leading example is OpenAI's new Responses API, which gives developers access to its autonomous browser-navigation capability and can be integrated into AI agents through the OpenAI Agents SDK. Amazon AGI emphasizes that current agent systems, while promising, struggle with reliability and often require human supervision, especially when handling multi-step or complex workflows. Nova Act is specifically designed to address these limitations by providing a set of atomic, prescriptive commands that can be chained together into reliable workflows. Deniz Birlikci, a Member of Technical Staff at Amazon, described the broader vision in a video introducing Nova Act: soon, there will be more AI agents than people browsing the web, carrying out tasks on behalf of users. David Luan, VP of Amazon's Autonomy Team and Head of AGI SF Lab, framed the mission more directly in a recent video call interview with VentureBeat: "We've created this new experimental AI model that is trained to perform actions in a web browser. Fundamentally, we think that agents are the building block of computing," he said. Luan, formerly co-founder and CEO of Adept AI, joined Amazon in 2024 as part of an acqui-hire. He said he has long been a proponent of AI agents. "With Adept, we were the first company to really start working on AI agents. At this point, everybody knows how important agents are. It was pretty cool to be a bit ahead of our time," he added.
What Nova Act offers devs

The Nova Act SDK provides developers with a framework for constructing web-based automation agents using natural language prompts broken down into clear, manageable steps. Unlike typical LLM-powered agents that attempt entire workflows from a single prompt—often resulting in unreliable behavior—Nova Act is designed to incrementally execute smaller, verifiable tasks. Some of the key features of Nova Act include:

Fine-Grained Task Decomposition: Developers can break down complex digital workflows into smaller act() calls, each guiding the agent to perform specific UI interactions.

Direct Browser Manipulation via Playwright: Nova Act integrates with Playwright, an open-source browser automation framework developed by Microsoft. Playwright allows developers to control web browsers programmatically—clicking elements, filling forms, or navigating pages—without relying solely on AI predictions. This integration is particularly useful for handling sensitive tasks such as entering passwords or credit card details. For example, instead of sending sensitive information to the model, developers can instruct Nova Act to focus on a password field and then use Playwright APIs to securely enter the password without the model ever "seeing" it. This approach helps strengthen security and privacy when automating web interactions.

Python Integration: The SDK allows developers to interleave Python code with Nova Act commands, including standard Python tools such as breakpoints, assertions, or thread pooling for parallel execution.

Structured Information Extraction: The SDK supports structured data extraction through Pydantic schemas, allowing agents to convert screen content into structured formats.

Parallelization and Scheduling: Developers can run multiple Nova Act instances concurrently and schedule automated workflows without the need for continuous human oversight.

Luan emphasized that Nova Act is a tool for developers rather than a general-purpose chatbot. "Nova Act is built for developers. It's not a chatbot you talk to for fun. It's designed to let developers start building useful products," he said. For example, one of the sample workflows demonstrated in Amazon's documentation shows how Nova Act can automate apartment searches by scraping rental listings and calculating biking distance to train stations, then sorting the results in a structured table (a code sketch of this act()-call pattern appears below). Another showcased example uses Nova Act to order a specific salad from Sweetgreen every Tuesday, entirely hands-free and on a schedule, illustrating how developers can automate repeatable digital tasks in a way that feels reliable and customizable.

Benchmark performance and a focus on reliability

A central message in Amazon's announcement is that reliability, not just intelligence, is the key barrier to widespread agent adoption. Current state-of-the-art models are actually quite brittle at powering AI agents, with agents typically achieving 30% to 60% success rates on browser-based multi-step tasks, according to Amazon. Nova Act, however, emphasizes a building-block approach, scoring over 90% on internal evaluations of tasks that challenge other models—such as interacting with dropdowns, date pickers, or pop-ups. Luan
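For a sense of what that incremental act()-call style looks like in practice, here is a short sketch loosely modeled on Amazon's published apartment-search example. The site URL and prompts are invented, and the exact parameter and attribute names (starting_page, schema, parsed_response) follow the nova-act examples as published at the time of writing; confirm them against the current SDK documentation before relying on them.

```python
# Sketch of the Nova Act SDK's incremental act() style, loosely based on Amazon's
# published examples. Parameter/attribute names are assumptions to verify against
# the current nova-act docs; the site and prompts are hypothetical.
from pydantic import BaseModel
from nova_act import NovaAct


class Apartment(BaseModel):
    address: str
    price: str


class ApartmentList(BaseModel):
    apartments: list[Apartment]


with NovaAct(starting_page="https://example-rentals.test") as nova:  # hypothetical site
    # Each act() call is one small, verifiable step rather than a whole workflow.
    nova.act("search for 2-bedroom apartments near the train station")
    nova.act("sort the results by price, lowest first")
    # Structured extraction: screen content is parsed against a Pydantic schema.
    result = nova.act(
        "return the address and price of the first three listings",
        schema=ApartmentList.model_json_schema(),
    )
    listings = ApartmentList.model_validate(result.parsed_response)
    for apt in listings.apartments:
        print(apt.address, apt.price)
```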


Bigger isn’t always better: Examining the business case for multi-million token LLMs

The race to expand large language models (LLMs) beyond the million-token threshold has ignited a fierce debate in the AI community. Models like MiniMax-Text-01 boast a 4-million-token capacity, and Gemini 1.5 Pro can process up to 2 million tokens simultaneously. They now promise game-changing applications: the ability to analyze entire codebases, legal contracts or research papers in a single inference call. At the core of this discussion is context length — the amount of text an AI model can process and remember at once. A longer context window allows a machine learning (ML) model to handle much more information in a single request and reduces the need to chunk documents into sub-documents or split conversations. For context, a model with a 4-million-token capacity could digest roughly 10,000 pages of books in one go. In theory, this should mean better comprehension and more sophisticated reasoning. But do these massive context windows translate to real-world business value? As enterprises weigh the costs of scaling infrastructure against potential gains in productivity and accuracy, the question remains: Are we unlocking new frontiers in AI reasoning, or simply stretching the limits of token memory without meaningful improvements? This article examines the technical and economic trade-offs, benchmarking challenges and evolving enterprise workflows shaping the future of large-context LLMs.

The rise of large context window models: Hype or real value?

Why AI companies are racing to expand context lengths

AI leaders like OpenAI, Google DeepMind and MiniMax are in an arms race to expand context length, which equates to the amount of text an AI model can process in one go. The promise? Deeper comprehension, fewer hallucinations and more seamless interactions. For enterprises, this means AI that can analyze entire contracts, debug large codebases or summarize lengthy reports without breaking context. The hope is that eliminating workarounds like chunking or retrieval-augmented generation (RAG) could make AI workflows smoother and more efficient.

Solving the 'needle-in-a-haystack' problem

The needle-in-a-haystack problem refers to AI's difficulty identifying critical information (the needle) hidden within massive datasets (the haystack). LLMs often miss key details, leading to inefficiencies in:

Search and knowledge retrieval: AI assistants struggle to extract the most relevant facts from vast document repositories.

Legal and compliance: Lawyers need to track clause dependencies across lengthy contracts.

Enterprise analytics: Financial analysts risk missing crucial insights buried in reports.

Larger context windows help models retain more information and potentially reduce hallucinations. They help improve accuracy and also enable:

Cross-document compliance checks: A single 256K-token prompt can analyze an entire policy manual against new legislation.

Medical literature synthesis: Researchers use 128K+ token windows to compare drug trial results across decades of studies.

Software development: Debugging improves when AI can scan millions of lines of code without losing dependencies.

Financial research: Analysts can analyze full earnings reports and market data in one query.

Customer support: Chatbots with longer memory deliver more context-aware interactions.
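As a rough sanity check on the 10,000-page figure above, the short calculation below uses common rules of thumb (about 0.75 English words per token and roughly 300 words per page); exact numbers vary by tokenizer and formatting.

```python
# Back-of-the-envelope: how much text fits in a 4-million-token context window.
# The conversion factors are rough rules of thumb, not exact tokenizer figures.
CONTEXT_TOKENS = 4_000_000
WORDS_PER_TOKEN = 0.75      # common approximation for English text
WORDS_PER_PAGE = 300        # typical book page

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"~{words:,.0f} words, or roughly {pages:,.0f} pages")  # ~3,000,000 words, ~10,000 pages
```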
Increasing the context window also helps the model better reference relevant details and reduces the likelihood of generating incorrect or fabricated information. A 2024 Stanford study found that 128K-token models reduced hallucination rates by 18% compared to RAG systems when analyzing merger agreements. However, early adopters have reported some challenges: JPMorgan Chase's research demonstrates that models perform poorly on approximately 75% of their context, with performance on complex financial tasks collapsing to near zero beyond 32K tokens. Models still broadly struggle with long-range recall, often prioritizing recent data over deeper insights. This raises questions: Does a 4-million-token window truly enhance reasoning, or is it just a costly expansion of memory? How much of this vast input does the model actually use? And do the benefits outweigh the rising computational costs?

Cost vs. performance: RAG vs. large prompts: Which option wins?

The economic trade-offs of using RAG

RAG combines the power of LLMs with a retrieval system that fetches relevant information from an external database or document store. This allows the model to generate responses based on both pre-existing knowledge and dynamically retrieved data. As companies adopt AI for complex tasks, they face a key decision: use massive prompts with large context windows, or rely on RAG to fetch relevant information dynamically.

Large prompts: Models with large token windows process everything in a single pass, which reduces the need to maintain external retrieval systems and makes it easier to capture cross-document insights. However, this approach is computationally expensive, with higher inference costs and memory requirements.

RAG: Instead of processing the entire document at once, RAG retrieves only the most relevant portions before generating a response. This reduces token usage and costs, making it more scalable for real-world applications.

Comparing AI inference costs: Multi-step retrieval vs. large single prompts

While large prompts simplify workflows, they require more GPU power and memory, making them costly at scale. RAG-based approaches, despite requiring multiple retrieval steps, often reduce overall token consumption, leading to lower inference costs without sacrificing accuracy. For most enterprises, the best approach depends on the use case: Need deep analysis of documents? Large context models may work better. Need scalable, cost-efficient AI for dynamic queries? RAG is likely the smarter choice. A large context window is valuable when:

The full text must be analyzed at once (e.g., contract reviews, code audits).

Minimizing retrieval errors is critical (e.g., regulatory compliance).

Latency is less of a concern than accuracy (e.g., strategic research).

Per Google research, stock prediction models using 128K-token windows to analyze 10 years of earnings transcripts outperformed RAG by 29%. On the other hand, GitHub Copilot's internal testing showed 2.3x faster task completion versus RAG for monorepo migrations.

Breaking down the diminishing returns

The limits of large context models: Latency, costs and usability

While large context models offer impressive capabilities, there are limits to how much extra context is truly beneficial. As context windows expand, three key factors come into play:

Latency: The more tokens a model processes, the slower the inference. Larger context windows can lead to significant delays, especially when real-time


ChatGPT’s memory can now reference all past conversations, not just what you tell it to

OpenAI is slowly rolling out improved memory in ChatGPT, making it the default for ChatGPT to reference past conversations. This has raised fears that the platform is proactively "listening" to users, making some uncomfortable with how much it knows. ChatGPT already logs information from previous interactions through its Memory feature, ensuring preferences are saved and conversations can seamlessly continue from where the user left off. The new update allows ChatGPT to "draw on past conversations to deliver more relevant and useful responses" and works across all modalities in the platform. Improvements in Memory allow future conversations, not just the current chat window, to reference previous chats. It will initially be available only to ChatGPT Plus and Pro users; ChatGPT Enterprise, Team and Edu will get access to the feature later.

Starting today, memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond. pic.twitter.com/s9BrWl94iY — OpenAI (@OpenAI) April 10, 2025

OpenAI added Memory to ChatGPT in February last year to make talking to ChatGPT more helpful. Memory is now a feature of most chat platforms and large language models (LLMs): Gemini 2.0 Flash Thinking added Memory, while frameworks like A-Mem improve long-context memory for more complicated tasks.

Proactive memory

Improvements to memory will let ChatGPT "naturally build" on earlier chats, and over time, OpenAI said, interactions on ChatGPT will become more tailored to the user. OpenAI offers two ways to control Memory through settings. The first is Reference Saved Memories, where the user can direct ChatGPT to remember facts like names or preferences. The company said people usually add this information by explicitly telling ChatGPT to remember something. The model will figure out which information will be helpful in future conversations. The second control is Reference Chat History. This setting permits ChatGPT to draw context from previous discussions and "adapt to your tone, goals, interests, or other recurring topics." However, this context will not be stored or shown in the settings page the way saved memories are. "You can choose to have both settings on or off, or just turn on reference saved memories," OpenAI said. "The settings are flexible, and you can change them anytime, including managing specific saved memories. If you opt out, ChatGPT won't draw on past conversations. You can also ask what it remembers or switch to Temporary Chat for memory‑free sessions."

Concerns from some users

Remembering conversations and carrying details into future ones not only makes it easy to continue a chat; ideally, for enterprise tasks, having access to preferences and context also makes AI models more useful. AI investor Allie K. Miller said in a post on X that with this update, ChatGPT is "listening all the time. It's cutting across all of your conversations, whether you have explicitly asked it to remember something or not." "As I mentioned a few weeks ago, memory is the best feature inside these platforms. As models and features get commoditized, it's going to come down to personalization, collaboration and network effects. Memory is the key. Memory is the moat," Miller said.
Let me explain the brand-new OpenAI release that kept Sam Altman up all night last night. ⬇️⬇️⬇️ I don't have memory turned on ChatGPT (because I can't for work), and today is the day that starts to hurt. Until today, ChatGPT's memory was pretty boring. It waited for a clear… https://t.co/fdAq1JLlwA — Allie K. Miller (@alliekmiller) April 10, 2025

However, after OpenAI announced the Memory update, some users expressed concern that it might change how the model interacts with them. Prominent AI commentator and Wharton professor Ethan Mollick noted it's not a feature he will turn on. "I totally get why AI long-term memory is useful and, based on my testing, think many people will love it… but I actually don't want my LLMs I use for work to chime in with personal details or subtly change its answers as a result of my past interactions. Boundaries are good," Mollick said.

I totally get why AI long term memory is useful and, based on my testing, think many people will love it… but I actually don't want my LLMs I use for work to chime in with personal details or subtly change its answers as a result of my past interactions. Boundaries are good. — Ethan Mollick (@emollick) April 10, 2025

OpenAI cofounder Andrej Karpathy worried that ChatGPT might "think worse of me based on that noob bash question I asked 7 months ago."

Will GPT think worse of me based on that noob bash question I asked 7 months ago ? — Andrej Karpathy (@karpathy) April 10, 2025

Memory in ChatGPT will be helpful, but it will be up to the user to determine how much they want the chat platform to know about them and how crucial past information will be for future conversations.


From MIPS to exaflops in mere decades: Compute power is exploding, and it will transform AI

At the recent Nvidia GTC conference, the company unveiled what it described as the first single-rack system of servers capable of one exaflop — one billion billion, or a quintillion, floating-point operations per second (FLOPS). This breakthrough is based on the latest GB200 NVL72 system, which incorporates Nvidia's latest Blackwell graphics processing units (GPUs). A standard computer rack is about 6 feet tall, a little more than 3 feet deep and less than 2 feet wide.

Shrinking an exaflop: From Frontier to Blackwell

A couple of things about the announcement struck me. First, the world's first exaflop-capable computer was installed only a few years ago, in 2022, at Oak Ridge National Laboratory. For comparison, that "Frontier" supercomputer, built by HPE and powered by AMD GPUs and CPUs, originally consisted of 74 racks of servers. The new Nvidia system has achieved roughly 73X greater performance density in just three years, equivalent to more than quadrupling performance density every year. This advancement reflects remarkable progress in computing density, energy efficiency and architectural design. Secondly, it needs to be said that while both systems hit the exascale milestone, they are built for different challenges: one is optimized for speed, the other for precision. Nvidia's exaflop specification is based on lower-precision math — specifically 4-bit and 8-bit floating-point operations — considered optimal for AI workloads, including tasks like training and running large language models (LLMs). These calculations prioritize speed over precision. By contrast, the exaflop rating for Frontier was achieved using 64-bit double-precision math, the gold standard for scientific simulations where accuracy is critical.

We've come a long way (very quickly)

This level of progress seems almost unbelievable, especially as I recall the state of the art when I began my career in the computing industry. My first professional job was as a programmer on the DEC KL 1090. This machine, part of DEC's PDP-10 series of timeshare mainframes, offered 1.8 million instructions per second (MIPS). Aside from its CPU performance, the machine connected to cathode ray tube (CRT) displays via hardwired cables. There were no graphics capabilities, just light text on a dark background. And of course, no Internet. Remote users connected over phone lines using modems running at speeds up to 1,200 bits per second.

DEC System 10; Source: By Joe Mabel, CC BY-SA 3.0.

500 billion times more compute

While comparing MIPS to FLOPS gives a general sense of progress, it is important to remember that these metrics measure different computing workloads. MIPS reflects integer processing speed, which is useful for general-purpose computing, particularly in business applications. FLOPS measures floating-point performance, which is crucial for scientific workloads and the heavy number-crunching behind modern AI, such as the matrix math and linear algebra used to train and run machine learning (ML) models. While not a direct comparison, the sheer scale of the difference between MIPS then and FLOPS now provides a powerful illustration of the rapid growth in computing performance. Using these as a rough heuristic to measure work performed, the new Nvidia system is approximately 500 billion times more powerful than the DEC machine.
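For readers who want to see where those figures come from, here is the rough arithmetic; as noted above, MIPS and FLOPS measure different kinds of work, so treat this as an order-of-magnitude heuristic rather than a real benchmark comparison.

```python
# Order-of-magnitude comparison: DEC KL 1090 (~1.8 MIPS) vs. a 1-exaflop rack.
# MIPS and FLOPS measure different workloads, so this is only a heuristic.
DEC_KL1090_OPS_PER_SEC = 1.8e6    # 1.8 million instructions per second
EXAFLOP_OPS_PER_SEC = 1e18        # one quintillion floating-point operations per second

ratio = EXAFLOP_OPS_PER_SEC / DEC_KL1090_OPS_PER_SEC
print(f"{ratio:.2e}")             # ~5.6e11, i.e. roughly 500 billion times more ops/sec

# Sanity check on the density claim: ~73x improvement over three years.
annual_factor = 73 ** (1 / 3)
print(f"{annual_factor:.1f}x per year")  # ~4.2x, i.e. more than quadrupling annually
```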
That kind of leap exemplifies the exponential growth of computing power over a single professional career and raises the question: If this much progress is possible in 40 years, what might the next five bring? Nvidia, for its part, has offered some clues. At GTC, the company shared a roadmap predicting that its next-generation full-rack system, based on the "Vera Rubin" Ultra architecture, will deliver 14X the performance of the Blackwell Ultra rack shipping this year, reaching somewhere between 14 and 15 exaflops of AI-optimized work in the next year or two. Just as notable is the efficiency. Achieving this level of performance in a single rack means less physical space per unit of work, fewer materials and potentially lower energy use per operation, although the absolute power demands of these systems remain immense.

Does AI really need all that compute power?

While such performance gains are indeed impressive, the AI industry is now grappling with a fundamental question: How much computing power is truly necessary, and at what cost? The race to build massive new AI data centers is being driven by the growing demands of exascale computing and ever-more capable AI models. The most ambitious effort is the $500 billion Project Stargate, which envisions 20 data centers across the U.S., each spanning half a million square feet. A wave of other hyperscale projects is either underway or in planning stages around the world, as companies and countries scramble to ensure they have the infrastructure to support the AI workloads of tomorrow. Some analysts now worry that we may be overbuilding AI data center capacity. Concern intensified after the release of R1, a reasoning model from China's DeepSeek that requires significantly less compute than many of its peers. Microsoft later canceled leases with multiple data center providers, sparking speculation that it might be recalibrating its expectations for future AI infrastructure demand. However, The Register suggested that this pullback may have more to do with some of the planned AI data centers lacking sufficiently robust power and cooling capacity for next-gen AI systems. Already, AI models are pushing the limits of what present infrastructure can support. MIT Technology Review reported that this may be the reason many data centers in China are struggling and failing, having been built to specifications that are not optimal for present needs, let alone those of the next few years.

AI inference demands more FLOPs

Reasoning models perform most of their work at runtime through a process known as inference. These models power some of the most advanced and resource-intensive applications today, including deep research assistants and the emerging wave of agentic AI systems. While DeepSeek-R1 initially spooked the industry into thinking that future AI might require less computing power, Nvidia CEO Jensen Huang pushed back hard. Speaking to CNBC, he


$40B into the furnace: As OpenAI adds a million users an hour, the race for enterprise AI dominance hits a new gear

In a move that surprised the tech industry Monday, OpenAI said it had secured a monumental $40 billion funding round led by SoftBank, catapulting its valuation to an unprecedented $300 billion and making it the largest private funding round on record. The landmark investment underscores the escalating significance of AI and also signals a shift in the enterprise technology landscape. With such a vast war chest, OpenAI now has much more staying power in its battle to serve companies with sophisticated generative AI solutions — where it is going up against giant competitors like Google and AWS, as well as navigating a sensitive relationship with its partner Microsoft. It is also facing tough competitors like Anthropic and Elon Musk's xAI. Before this round closed, questions remained about whether OpenAI had the capital to continue to play in the big leagues. Spending by companies on generative AI is expected to hit $644 billion this year, according to research company Gartner. That's 76 percent more than was spent last year, and it shows why the race is on among large companies to grab market share. In its announcement, OpenAI said it now has 500 million weekly active users, a significant jump from the 400 million figure it cited just a month ago. With such viral growth, the company badly needed capital to build the servers and other infrastructure to keep up with demand. It also shows that the intense competition, in which other providers such as Google, Anthropic and even Chinese companies like DeepSeek are offering AI models that often match the functionality of OpenAI's own leading models, has not slowed OpenAI's growth rate. In another significant twist Monday, OpenAI also announced that it plans to launch an open-weights reasoning model and will allow developers to run it on their own hardware, a departure from the cloud subscription model that has so far driven its revenue.

The funding details: a closer look

For decision-makers navigating this rapidly evolving environment, understanding the implications of OpenAI's latest financial maneuver is important. The $40 billion infusion came primarily from SoftBank, with contributions from Microsoft, Coatue, Altimeter and Thrive Capital, according to reporting by CNBC. The capital is earmarked for OpenAI's AI research, computational infrastructure and enhancements to its suite of AI tools, including the widely adopted ChatGPT, according to OpenAI's post on the news. Notably, $18 billion of this funding is allocated to the Stargate project — a joint venture between OpenAI, SoftBank and Oracle — aimed at developing extensive AI infrastructure. The reports also suggested that the latest OpenAI funding will come in several tranches, and that part of it depends on OpenAI converting into a for-profit company by the end of this year. While OpenAI is still operating at a significant loss, the company projects that it will generate enough revenue to break even by 2029 and then start making significant profits. CEO Sam Altman tweeted Monday morning that the company had added "one million users in the last hour," contrasting that with the million users added in the five days after ChatGPT launched 26 months ago.
The latest viral surge in usage comes from the big update OpenAI made last week to its image-generation technology, which has taken image creation to a whole new level of ease and sophistication — with consumers going crazy making selfies in the style of Studio Ghibli. Notably, OpenAI announced Monday that it was restoring free users' access to the new image-generation technology, something it had temporarily pulled back last week after usage overwhelmed the company's servers. While a lot of the excitement around OpenAI remains on the consumer side, the funding development also carries big implications for enterprise technology leaders: OpenAI's bolstered resources will help it fast-track the development of advanced AI models and products for the enterprise as well, allowing it to stay ahead amid increased competition. Enterprises should anticipate a continued flurry of new AI-driven solutions, and they will need to stay on top of these releases in order to remain competitive.


DeepCoder delivers top coding performance in efficient 14B open model

Researchers at Together AI and Agentica have released DeepCoder-14B, a new coding model that delivers impressive performance comparable to leading proprietary models like OpenAI's o3-mini. Built on top of DeepSeek-R1, the model gives teams more flexibility to integrate high-performance code generation and reasoning capabilities into real-world applications. Importantly, the teams have fully open-sourced the model, its training data, code, logs and system optimizations, which can help researchers improve their work and accelerate progress.

Competitive coding capabilities in a smaller package

The research team's experiments show that DeepCoder-14B performs strongly across several challenging coding benchmarks, including LiveCodeBench (LCB), Codeforces and HumanEval+. "Our model demonstrates strong performance across all coding benchmarks… comparable to the performance of o3-mini (low) and o1," the researchers write in a blog post describing the model. Interestingly, despite being trained primarily on coding tasks, the model shows improved mathematical reasoning, scoring 73.8% on the AIME 2024 benchmark, a 4.1% improvement over its base model (DeepSeek-R1-Distill-Qwen-14B). This suggests that the reasoning skills developed through RL on code can generalize effectively to other domains.

Credit: Together AI

The most striking aspect is that the model achieves this level of performance with only 14 billion parameters. That makes DeepCoder significantly smaller and potentially more efficient to run than many frontier models.

Innovations driving DeepCoder's performance

While developing the model, the researchers solved some of the key challenges in training coding models with reinforcement learning (RL). The first challenge was curating the training data. Reinforcement learning requires reliable reward signals indicating whether the model's output is correct. As the researchers point out, "Unlike math—where abundant high-quality, verifiable data is readily available on the Internet—the coding domain suffers from a relative scarcity of such data." To address this problem, the DeepCoder team implemented a strict pipeline that gathers examples from different datasets and filters them for validity, complexity and duplication. This process yielded 24,000 high-quality problems, providing a solid foundation for effective RL training. The team also designed a straightforward reward function that provides a positive signal only if the generated code passes all sampled unit tests for the problem within a specific time limit (sketched in code below). Combined with the high-quality training examples, this outcome-focused reward system prevents the model from learning tricks like printing memorized answers for public tests or optimizing for simple edge cases without solving the core problem. The model's core training algorithm is based on Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that proved very successful in DeepSeek-R1. However, the team made several modifications to the algorithm to make it more stable and to allow the model to keep improving as training extends over a longer time.

GRPO+ enables DeepCoder-14B to continue training for longer durations without collapsing. Credit: Together AI

Finally, the team extended the model's context window iteratively, first training it on shorter reasoning sequences and gradually increasing the length.
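Here is a minimal sketch of that kind of outcome-only reward: the signal is positive only when every sampled unit test passes within a time budget. The subprocess-based test harness and function names below are assumptions made for illustration, not the DeepCoder team's actual implementation.

```python
# Illustrative outcome-only reward: 1.0 only if the generated program passes
# every sampled unit test within a time budget, otherwise 0.0. The test-runner
# details are invented for illustration; DeepCoder's real harness differs.
import os
import subprocess
import tempfile


def run_tests(code: str, tests: list[str], timeout_s: float = 5.0) -> bool:
    """Execute the candidate code followed by each test snippet in a fresh subprocess."""
    for test in tests:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n" + test)
            path = f.name
        try:
            proc = subprocess.run(
                ["python", path], capture_output=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            return False  # exceeding the time budget counts as a failure
        finally:
            os.unlink(path)
        if proc.returncode != 0:
            return False  # any failing test zeroes the reward
    return True


def reward(code: str, sampled_tests: list[str]) -> float:
    """Sparse, outcome-focused reward: no partial credit for near misses."""
    return 1.0 if run_tests(code, sampled_tests) else 0.0
```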
They also developed a filtering method to avoid penalizing the model when it created reasoning chains that exceeded the context limits while solving a hard prompt.

DeepCoder was trained on 32K-context problems but was also able to solve 64K tasks. Credit: Together AI

The researchers explain the core idea: "To preserve long-context reasoning while enabling efficient training, we incorporated overlong filtering… This technique masks out truncated sequences during training so that models aren't penalized for generating thoughtful but lengthy outputs that exceed the current context limit." The training was gradually scaled from a 16K to a 32K context window, and the resulting model could also solve problems that required up to 64K tokens.

Optimizing long-context RL training

Training large models with RL, especially on tasks requiring long generated sequences like coding or complex reasoning, is computationally intensive and slow. A major bottleneck is the "sampling" step, where the model generates potentially thousands of tokens per example in the batch. Variations in response length mean some responses finish much later than others, leaving GPUs idle and slowing down the entire training loop. To accelerate this, the team developed verl-pipeline, an optimized extension of the open-source verl library for reinforcement learning from human feedback (RLHF). The key innovation, which they call "One-Off Pipelining," rearranges response sampling and model updates to reduce bottlenecks and accelerator idle time. Their experiments showed that one-off pipelining provided up to a 2x speedup for coding RL tasks compared to baseline implementations. This optimization was crucial for training DeepCoder within a reasonable timeframe (2.5 weeks on 32 H100s) and is now open-sourced as part of verl-pipeline for the community to use and build upon.

Enterprise impact

The researchers have made all the artifacts for training and running DeepCoder-14B available on GitHub and Hugging Face under a permissive license. "By fully sharing our dataset, code, and training recipe, we empower the community to reproduce our work and make RL training accessible to all," the researchers write. DeepCoder-14B powerfully illustrates a broader, accelerating trend in the AI landscape: the rise of highly capable yet efficient and openly accessible models. For the enterprise world, this shift signifies more options and greater accessibility of advanced models. Cutting-edge performance is no longer solely the domain of hyperscalers or those willing to pay premium API fees. Models like DeepCoder can empower organizations of all sizes to leverage sophisticated code generation and reasoning, customize solutions to their specific needs, and securely deploy them within their environments. This trend can lower the barrier to entry for AI adoption and foster a more competitive and innovative ecosystem, where progress is driven through open source collaboration.
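The overlong-filtering idea quoted above can be expressed as a simple loss mask: rollouts that hit the context limit before finishing are dropped from the RL loss instead of being scored as failures. The sketch below is a paraphrase of the concept under assumed data structures, not the actual verl-pipeline code.

```python
# Conceptual sketch of "overlong filtering": truncated rollouts are masked out of
# the RL loss rather than penalized, so long-but-unfinished reasoning chains do
# not push the policy toward shorter answers. Not the actual verl-pipeline code.
from dataclasses import dataclass


@dataclass
class Rollout:
    tokens: list[int]
    reward: float
    truncated: bool  # True if generation stopped because it hit the context limit


def loss_weights(rollouts: list[Rollout], max_len: int) -> list[float]:
    """Return a per-rollout weight; 0.0 masks the sample out of the policy update."""
    weights = []
    for r in rollouts:
        if r.truncated or len(r.tokens) >= max_len:
            weights.append(0.0)   # masked: neither rewarded nor punished
        else:
            weights.append(1.0)   # contributes normally to the policy update
    return weights
```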


What’s inside the LLM? Ai2 OLMoTrace will ‘trace’ the source

Understanding precisely how the output of a large language model (LLM) relates to its training data has long been a mystery and a challenge for enterprise IT. A new open-source effort launched this week by the Allen Institute for AI (Ai2) aims to help solve that challenge by tracing LLM output back to training inputs. The OLMoTrace tool allows users to trace language model outputs directly back to the original training data, addressing one of the most significant barriers to enterprise AI adoption: the lack of transparency in how AI systems make decisions. OLMo is an acronym for Open Language Model, which is also the name of Ai2's family of open-source LLMs. On the company's Ai2 Playground site, users can try out OLMoTrace with the recently released OLMo 2 32B model. The open-source code is also available on GitHub and is freely available for anyone to use. Unlike existing approaches that focus on confidence scores or retrieval-augmented generation, OLMoTrace offers a direct window into the relationship between model outputs and the multi-billion-token training datasets that shaped them. "Our goal is to help users understand why language models generate the responses they do," Jiacheng Liu, a researcher at Ai2, told VentureBeat.

How OLMoTrace works: More than just citations

LLMs with web search functionality, like Perplexity or ChatGPT Search, can provide source citations. However, those citations are fundamentally different from what OLMoTrace does. Liu explained that Perplexity and ChatGPT Search use retrieval-augmented generation (RAG). With RAG, the purpose is to improve the quality of model generation by providing more sources than what the model was trained on. OLMoTrace is different because it traces the output from the model itself without any RAG or external document sources. The technology identifies long, unique text sequences in model outputs and matches them with specific documents from the training corpus. When a match is found, OLMoTrace highlights the relevant text and provides links to the original source material, allowing users to see exactly where and how the model learned the information it's using.

Beyond confidence scores: Tangible evidence of AI decision-making

By design, LLMs generate outputs based on model weights that can be used to produce a confidence score. The basic idea is that the higher the confidence score, the more accurate the output. In Liu's view, confidence scores are fundamentally flawed. "Models can be overconfident of the stuff they generate and if you ask them to generate a score, it's usually inflated," Liu said. "That's what academics call a calibration error—the confidence that models output does not always reflect how accurate their responses really are." Instead of another potentially misleading score, OLMoTrace provides direct evidence of the model's learning source, enabling users to make their own informed judgments. "What OLMoTrace does is showing you the matches between model outputs and the training documents," Liu explained. "Through the interface, you can directly see where the matching points are and how the model outputs coincide with the training documents."

How OLMoTrace compares to other transparency approaches

Ai2 is not alone in the quest to better understand how LLMs generate output. Anthropic recently released its own research into the issue.
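To illustrate the general idea of matching long output spans against a training corpus, here is a deliberately naive sketch using fixed-length word n-grams and an inverted index. OLMoTrace's real system works at corpus scale with far more efficient indexing; this toy version, with invented names and thresholds, only conveys the concept.

```python
# Naive illustration of tracing model output back to training documents by
# matching long word n-grams. OLMoTrace operates at a very different scale with
# much more efficient indexing; this toy version only conveys the idea.
from collections import defaultdict

SPAN = 8  # only fairly long spans count as interesting matches (illustrative value)


def ngrams(text: str, n: int = SPAN):
    """Yield every n-word window of the text, lowercased."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i : i + n])


def build_index(corpus: dict[str, str]) -> dict[str, set[str]]:
    """Map each n-gram to the set of training documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for gram in ngrams(text):
            index[gram].add(doc_id)
    return index


def trace(output: str, index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return output spans that also appear verbatim in training documents."""
    return {gram: index[gram] for gram in ngrams(output) if gram in index}
```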
That research focused on the model's internal operations rather than on its training data. "We are taking a different approach from them," Liu said. "We are directly tracing into the model behavior, into their training data, as opposed to tracing things into the model neurons, internal circuits, that kind of thing." This approach makes OLMoTrace more immediately useful for enterprise applications, as it doesn't require deep expertise in neural network architecture to interpret the results.

Enterprise AI applications: From regulatory compliance to model debugging

For enterprises deploying AI in regulated industries like healthcare, finance or legal services, OLMoTrace offers significant advantages over existing black-box systems. "We think OLMoTrace will help enterprise and business users to better understand what is used in the training of models so that they can be more confident when they want to build on top of them," Liu said. "This can help increase the transparency and trust between them of their models, and also for customers of their model behaviors." The technology enables several critical capabilities for enterprise AI teams:

Fact-checking model outputs against original sources

Understanding the origins of hallucinations

Improving model debugging by identifying problematic patterns

Enhancing regulatory compliance through data traceability

Building trust with stakeholders through increased transparency

The Ai2 team has already used OLMoTrace to identify and correct issues in its own models. "We are already using it to improve our training data," Liu said. "When we built OLMo 2 and we started our training, through OLMoTrace, we found out that actually some of the post-training data was not good."

What this means for enterprise AI adoption

For enterprises looking to lead the way in AI adoption, OLMoTrace represents a significant step toward more accountable enterprise AI systems. The technology is available under an Apache 2.0 open-source license, which means that any organization with access to its model's training data can implement similar tracing capabilities. "OLMoTrace can work on any model, as long as you have the training data of the model," Liu noted. "For fully open models where everyone has access to the model's training data, anyone can set up OLMoTrace for that model, and for proprietary models, maybe some providers don't want to release their data, they can also do this OLMoTrace internally." As AI governance frameworks continue to evolve globally, tools like OLMoTrace that enable verification and auditability will likely become essential components of enterprise AI stacks, particularly in regulated industries where algorithmic transparency is increasingly mandated. For technical decision-makers weighing the benefits and risks of AI adoption, OLMoTrace offers a practical path to implementing more trustworthy and explainable AI systems without sacrificing the power of large language models.


Writer unveils ‘AI HQ’ platform, betting on agents to transform enterprise work

Enterprise AI company Writer unveiled a new platform today that it claims will help businesses finally bridge the gap between AI's theoretical potential and real-world results. The "AI HQ" product represents a significant shift toward autonomous AI systems that can execute complex workflows across organizations.

"This is not another hype train, but a massive change coming to enterprise software," said May Habib, Writer's CEO and co-founder, at a press conference announcing the product. "The vast majority of enterprises have not gotten meaningful results from generative AI, and it's been two years. There has never before been such a gap between what the tech is capable of and what the enterprise results have been."

AI HQ is Writer's answer to this problem: a platform for building, activating, and supervising AI "agents" that can perform sequences of tasks traditionally requiring human intervention. These agents can make decisions, reason through problems and act across different systems with little human oversight.

How Writer's AI agents move beyond chatbots to deliver real business value

The announcement comes as many enterprises reevaluate their AI strategies. According to Habib, most AI implementations have failed to deliver substantial value, with businesses struggling to move beyond basic generative AI use cases.

"Process mapping is the new prompt engineering," Habib said, highlighting how the company's approach has evolved beyond simply crafting the right text prompts to designing entire workflows for AI systems.

AI HQ consists of three main components: a development environment called Agent Builder, where IT and business teams collaboratively create agents; Writer Home, which provides access to over 100 pre-built agents for specific industries and functions; and observability tools for monitoring and governing agent behavior at scale.

During a product demonstration, Writer executives showed how customers already use these technologies. For example, an investment management firm uses Writer's agents to automatically generate fund reports and personalized market commentary by pulling data from Snowflake, SEC filings, and real-time web searches. Another demonstration showed a marketing workflow where an agent could analyze a strategy brief, create a project in Adobe Workfront, generate content, find or create supporting images, and prepare the material for legal review.
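For readers unfamiliar with what "an agent executing a workflow" looks like in practice, here is a rough sketch of the marketing example above as orchestration code. Every name in it is hypothetical; Writer has not published AI HQ's internals, and the helpers below are stubs standing in for the systems an agent would touch (a language model, Adobe Workfront, an image tool, a legal review queue). The point is only the general pattern: model calls chained with actions in external systems.

```python
# Hypothetical sketch of the multi-step marketing workflow Writer demonstrated.
# The helpers are stand-ins, not Writer or Adobe APIs.

def llm(prompt: str) -> str:
    """Stand-in for a call to a language model."""
    return f"[model output for: {prompt[:40]}...]"

def create_workfront_project(name: str) -> str:
    """Stand-in for creating a project in a work-management system."""
    return f"project-123 ({name})"

def make_image(description: str) -> str:
    """Stand-in for finding or generating a supporting image."""
    return f"image for '{description}'"

def submit_for_legal_review(project: str, assets: list[str]) -> str:
    """Stand-in for routing material to a legal review queue."""
    return f"review ticket for {project} covering {len(assets)} assets"

def run_marketing_workflow(strategy_brief: str) -> str:
    # 1. Analyze the strategy brief and turn it into a campaign plan.
    plan = llm("Summarize this brief into a campaign plan: " + strategy_brief)
    # 2. Create a project to track the work.
    project = create_workfront_project(name="Spring campaign")
    # 3. Generate the campaign copy.
    copy = llm("Write campaign copy for this plan: " + plan)
    # 4. Find or create supporting images.
    images = [make_image(d) for d in ("hero banner", "social tile")]
    # 5. Prepare everything for legal review.
    return submit_for_legal_review(project, [copy, *images])

print(run_marketing_workflow("Launch brief: promote the new analytics product."))
```

In a production deployment, each of those steps would presumably run under the kind of observability and governance layer AI HQ's third component is described as providing.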
Enterprise AI that actually works: How Writer's autonomous agents tackle complex business workflows

Writer's pivot to agent-based AI reflects broader market trends. While many companies initially focused on using large language models for text generation and chat functions, businesses are increasingly exploring how AI can automate complex processes.

"Ten percent of the headcount is going to be enough," Habib told Forbes in a recent interview about the potential workforce impact of agent technologies. This dramatic assertion underscores the transformative potential—and potential disruption—these technologies may bring to knowledge work.

Anna Griffin, chief marketing officer at cybersecurity firm Commvault and an early adopter of Writer's agent technology, spoke during the press conference about the value of connecting previously siloed systems. "What if I could connect our Salesforce, Gainsight, Optimizely? What if I could pull together enough of the insights across these systems that we could actually work to create an experience for our customer that is seamless?" Griffin said.

Her advice for others: "Think about the hardest, gnarliest problem your industry has, and start thinking about how agentic AI is going to solve that."

The future of AI learning: Writer's self-evolving models remember mistakes and learn without retraining

The event also featured a presentation from Waseem AlShikh, Writer's co-founder and CTO, who unveiled research into "self-evolving models" — AI systems that can learn from their mistakes over time without additional training.

"If we expect AI to behave more like a human, we need it to learn more like a human," AlShikh explained. He demonstrated how traditional AI models repeatedly make the same errors when faced with a maze challenge, while self-evolving models remember past failures and find better solutions.

"This unique architecture means that over time, as the model is used, it gains knowledge — a model that gets smarter the more you engage with it," AlShikh said. Writer expects to have self-evolving models in pilot by the end of the year.

Inside Writer's $1.9 billion valuation: How enterprise AI adoption is driving explosive growth

Writer's aggressive expansion comes after raising $200 million in Series C funding last November, which valued the company at $1.9 billion. The funding round was co-led by Premji Invest, Radical Ventures and ICONIQ Growth, with participation from major enterprise players including Salesforce Ventures, Adobe Ventures and IBM Ventures.

The company has seen impressive growth, with a reported 160% net retention rate, meaning existing customers expand their spending by 60% on average, net of churn. According to a Forbes report published today, some clients have grown from initial contracts of $200,000-$300,000 to spending approximately $1 million each.

Writer's approach differs from competitors like OpenAI and Anthropic, which have raised billions but focus more on developing general-purpose AI models. Instead, Writer has developed its own family of models, Palmyra, specifically designed for enterprise use cases.

"We trained our own models even though everyone advised against it," AlShikh told Forbes. This strategy has allowed Writer to create AI that's more secure for enterprise deployment, as client data is retrieved from dedicated servers and isn't used to train models, mitigating concerns about sensitive information leaks.

Navigating the $114 billion enterprise AI market: Opportunities and obstacles ahead

Writer's ambitions face obstacles in a competitive landscape. The enterprise AI software market — projected to grow from $58 billion to $114 billion by 2027 — is attracting intense competition from established tech giants and well-funded startups alike.

Paul Dyrwal, VP of generative AI at Marriott, who appeared at Writer's press conference, shared advice for enterprises navigating this rapidly evolving field: "Focus on fewer, higher-value opportunities rather than chasing every possibility."

The announcement also comes amid growing concerns about AI's impact on jobs. While Habib acknowledged that AI will change work dramatically, she painted an optimistic picture of the transition. "Your people are instrumental to redesigning
