VentureBeat

Halliday unveils AI smart glasses with lens-free AR viewing

Halliday has come up with eyewear we didn’t know we needed: smart glasses that project images directly onto your eye.

At the CES Unveiled event at CES 2025, Halliday showed AI smart glasses that beam images directly into your field of vision rather than onto a lens. Blending fashion with functionality, they are the first AI glasses to feature a proactive AI agent and DigiWindow, a near-eye display that Halliday describes as completely unnoticeable and the first of its kind. The glasses still have lenses that can carry your prescription, but nothing is projected onto the lens itself.

Inspired by the visionary character James Halliday from Ready Player One (Ernest Cline’s novel and Steven Spielberg’s film), the frames accommodate any prescription lens, or none at all, and are available in matte black or tortoiseshell. (Halliday is not affiliated with either Spielberg or Cline.)

The glasses discreetly deliver information and AI-enhanced insights directly to the user’s eyes, using a minimal optical module projection technology called DigiWindow to place images in front of your eyes. The eyewear can also serve as prescription glasses.

Invisible, undetectable superpowers

Image: Halliday can put images on its smart glasses.

The display is invisible to anyone you are conversing with (except at night, when they may see a green light), and because there is no camera on the device, it doesn’t raise the same privacy issue as smart glasses that record video.
The device has a microphone integrated into the eyewear, which lets the user engage with the proactive AI agent. As with voice assistants such as Siri or Alexa, the user must manually launch the Halliday AI application before it starts listening to conversations, and the audio is not permanently recorded.

Halliday may look like an ordinary pair of nostalgic, retro glasses, but appearances can be deceiving. Developed by Ph.D. engineers from Stanford with deep expertise in advanced optics, Halliday’s DigiWindow is, according to the company, the world’s smallest and lightest near-eye display module. Installed on the upper-right part of the frame, the DigiWindow projects information within the user’s natural field of vision, regardless of whether they have perfect eyesight or require vision correction. Perceived as a 3.5-inch screen in the upper-right corner of the user’s view, DigiWindow delivers essential information without obstructing the user’s main field of view or requiring a lens.

Traditional smart glasses use waveguide lenses, which often suffer from issues like rainbow patterns and front light leakage. Halliday says its approach avoids those issues and offers superior display quality. The company also says the display’s light efficiency keeps it clear outdoors, even under the brightest sunlight.

Invisible to onlookers, Halliday said it provides users with a hidden superpower to tackle life’s challenges. Designed for discretion, it is primarily controlled through a trackpad ring, with additional interaction options available via voice commands and a built-in frame interface. This blend of style and advanced engineering empowers users to be smarter, more capable, and always one step ahead, Halliday said.
Proactive AI: Assistance before you ask

Image: The components in Halliday’s smart glasses.

Halliday says its AI agent doesn’t just respond to commands; it anticipates users’ needs and offers assistance proactively. While traditional reactive AI assistants are limited to basic tasks like setting alarms or identifying objects, Halliday’s proactive AI agent analyzes conversations, answers direct questions, and offers additional insights, all without waiting for a prompt.

The proactive AI agent listens to and processes the ongoing context of conversations and tasks, which lets it identify opportunities to add value. For instance, during a meeting, it can proactively answer complex questions, summarize key discussion points, and generate summarized meeting notes afterward. Halliday says the agent predicts the next logical move before you ask.

Connected to smartphones via Bluetooth, the glasses offer the following capabilities:

Echo Mode: The proactive AI agent anticipates user needs and automatically provides suggestions and answers based on context, leading to more effective conversations and meetings.
Audio Memo: Capture and summarize audio content from meetings or conversations quickly and accurately, allowing for efficient note-taking and recall.
Notifications & Instant Replies: Discreetly check messages and respond with quick actions, all without anyone noticing.
Cheat Sheet/Teleprompter: Speak without notes while Halliday discreetly displays prompts.
AI Translation: Real-time translation in up to 40 languages.
Navigation: Access real-time navigation directly on your glasses, eliminating the need to check your phone for directions.
Music & Lyrics: Enjoy your favorite songs with synchronized lyrics displayed in real time.
Quick Notes: Voice-to-text instant note-taking for your everyday ideas and inspirations.

Retro Design with Revolutionary Technology

Halliday combines a nostalgic, retro aesthetic with cutting-edge technology. Weighing just 1.23 ounces (35 grams), nearly half the weight of traditional smart glasses, and offering eight hours of continuous use, the glasses are designed for all-day wear. Halliday said the product combines vintage charm with innovation, offering users an invisible superpower to navigate life’s challenges with confidence and style.

Availability and price

Image: Halliday smart glasses.

Halliday will be available for purchase after its debut at the CES conference, with prices ranging from $399 to $499. Shipping is expected to begin by the end of Q1 2025. Founded in 2021, Shenzhen, China-based Halliday is a


Despite intense AI arms race, we’re in for a multi-model future

Every week, and sometimes every day, a new state-of-the-art AI model is released. As we move into 2025, the pace at which new models are being released is dizzying, if not exhausting. The rollercoaster’s curve keeps steepening, and fatigue and wonder have become constant companions. Each release highlights why this particular model is better than all others, with endless collections of benchmarks and bar charts filling our feeds as we scramble to keep up.

Image: The number of large foundation models released each year has been exploding since 2020. Source: Charlie Giattino, Edouard Mathieu, Veronika Samborska and Max Roser (2023), “Artificial Intelligence,” published online at OurWorldinData.org.

Eighteen months ago, the vast majority of developers and businesses were using a single AI model. Today, the opposite is true. It is rare to find a business of significant scale that confines itself to the capabilities of a single model. Companies are wary of vendor lock-in, particularly for a technology that has quickly become a core part of both long-term corporate strategy and short-term bottom-line revenue. It is increasingly risky for teams to bet everything on a single large language model (LLM).

But despite this fragmentation, many model providers still champion the view that AI will be a winner-takes-all market. They claim that the expertise and compute required to train best-in-class models are scarce, defensible and self-reinforcing. From their perspective, the hype bubble for building AI models will eventually collapse, leaving behind a single, giant artificial general intelligence (AGI) model that will be used for anything and everything. To exclusively own such a model would mean being the most powerful company in the world.
The size of this prize has kicked off an arms race for more and more GPUs, with a new zero added to the number of training parameters every few months.

Image: Deep Thought, the monolithic AGI from the Hitchhiker’s Guide to the Galaxy. Source: BBC, Hitchhiker’s Guide to the Galaxy, television series (1981). Still image retrieved for commentary purposes.

We believe this view is mistaken. There will be no single model that will rule the universe, neither next year nor next decade. Instead, the future of AI will be multi-model.

Language models are fuzzy commodities

The Oxford Dictionary of Economics defines a commodity as a “standardized good which is bought and sold at scale and whose units are interchangeable.” Language models are commodities in two important senses:

The models themselves are becoming more interchangeable on a wider set of tasks;
The research expertise required to produce these models is becoming more distributed and accessible, with frontier labs barely outpacing each other and independent researchers in the open-source community nipping at their heels.

Image: Commodities describing commodities (Credit: Not Diamond)

But while language models are commoditizing, they are doing so unevenly. There is a large core of capabilities that any model, from GPT-4 all the way down to Mistral Small, can handle perfectly well. At the same time, as we move towards the margins and edge cases, we see greater and greater differentiation, with some model providers explicitly specializing in code generation, reasoning, retrieval-augmented generation (RAG) or math. This leads to endless handwringing, Reddit-searching, evaluation and fine-tuning to find the right model for each job.

Image: AI models are commoditizing around core capabilities and specializing at the edges. Credit: Not Diamond

And so while language models are commodities, they are more accurately described as fuzzy commodities.
For many use cases, AI models will be nearly interchangeable, with metrics like price and latency determining which model to use. But at the edge of capabilities, the opposite will happen: Models will continue to specialize, becoming more and more differentiated. As an example, DeepSeek-V2.5 is stronger than GPT-4o on coding in C#, despite being a fraction of the size and 50 times cheaper.

Both of these dynamics, commoditization and specialization, undermine the thesis that a single model will be best suited to handle every possible use case. Rather, they point towards a progressively fragmented landscape for AI.

Multi-model orchestration and routing

There is an apt analogy for the market dynamics of language models: the human brain. The structure of our brains has remained unchanged for 100,000 years, and brains are far more similar than they are dissimilar. For the vast majority of our time on Earth, most people learned the same things and had similar capabilities.

But then something changed. We developed the ability to communicate in language, first in speech, then in writing. Communication protocols facilitate networks, and as humans began to network with each other, we also began to specialize to greater and greater degrees. We were freed from the burden of needing to be generalists across all domains, to be self-sufficient islands. Paradoxically, the collective riches of specialization have also meant that the average human today is a far stronger generalist than any of our ancestors.

On a sufficiently wide input space, the universe always tends towards specialization. This is true all the way from molecular chemistry, to biology, to human society. Given sufficient variety, distributed systems will always be more computationally efficient than monoliths. We believe the same will be true of AI.
The more we can leverage the strengths of multiple models instead of relying on just one, the more those models can specialize, expanding the frontier for capabilities.

Image: Multi-model systems can allow for greater specialization, capability and efficiency. Source: Not Diamond

An increasingly important pattern for leveraging the strengths of diverse models is routing: dynamically sending queries to the best-suited model, while also using cheaper, faster models when doing so doesn’t degrade quality. Routing allows us to take advantage of all the benefits of specialization (higher accuracy with lower costs and latency) without giving up any of the robustness of generalization. A simple demonstration of the power of routing can be seen in the fact


Swave Photonics raises $28.3M for 3D holographic smartglasses and displays

Swave Photonics, a holographic display company, has raised $28.27 million in funding as it prepares components for AI-powered smartglasses and heads-up displays.

Swave said the Series A investment will advance its Holographic eXtended Reality (HXR) platform, enabling a reality-first user experience for AI-powered augmented reality (AR) smartglasses and heads-up displays. The company will show its tech at CES 2025.

The funding round was co-led by investors Imec.xpand and SFPIM Relaunch, with participation from new investors EIC Fund, IAG Capital Partners, and Murata Electronics North America, as well as existing investors Qbic Fund, PMV, Imec, and Luminate.

Leuven, Belgium-based Swave previously raised a $10.47 million seed round in 2023, which funded the launch of its HXR technology and the expansion of its team, which includes veterans in photonics and semiconductors.

“This round will accelerate Swave’s product introductions as we continue to solve the challenges of today’s AR experiences through true holography,” said Mike Noonen, Swave CEO, in a statement. “We are thrilled with continued support from our existing investors and our new investors. They recognize that Swave uniquely brings together semiconductor, holographic and AI technologies in a way that will deliver cost-effective and truly useful solutions.”

Image: Swave is bringing NanoPixel holography to glasses.

“AR glasses are set to become the primary interface for AI-powered spatial computing and other applications, and Swave is uniquely positioned to enable this future,” said Theo Marescaux, Swave’s chief product officer, in a statement.
“We are co-designing every element, from our holographic SLMs with cutting-edge nano-pixels, to real-time compute chips, light engines, and AR combiners, delivering the most advanced and integrated solution yet.”

“With Swave’s seed funding, we successfully built our team, proved the capabilities of the technology, and completed prototype designs,” said Dmitri Choutov, COO, in a statement. “With Series A funding secured and silicon running at our partner fabs, we are on track to introduce product development kits and soon thereafter production devices.”

Swave’s HXR technology uses what it calls the “world’s smallest pixel” to shape light and sculpt high-quality 3D holographic images that create a reality-first user experience, where digital information interacts with and adapts to the user’s surroundings. Using patented DynamicDepth technology, the images can be processed naturally by the human visual system.

AR devices currently being prototyped or on the market all face challenges of high cost, uncomfortable size and weight, significant power usage, and visual phenomena like vergence-accommodation conflict, which cause nausea or fatigue for users. Swave says its HXR technology not only solves these issues, but also eliminates the need for the most costly components, such as the waveguides or varifocal lenses required by existing AR devices.

Swave’s technology has been in development for over a decade, and the company currently holds 60 core technology patents. Swave announced its HXR platform in April 2024, followed by what it describes as the world’s first true color holographic display, and recently announced that HXR will be recognized at CES 2025 with a CES Innovation Award.


Nvidia to open-source Run:ai, the software it acquired for $700M to help companies manage GPUs for AI

Nvidia has completed its acquisition of Run:ai, a software company that makes it easier for customers to orchestrate GPU clouds for AI, and said that it would open-source the software.

The purchase price wasn’t disclosed, but was pegged by reports at $700 million when Nvidia first announced its intent to acquire the company in April. Run:ai posted the deal news on its website today and confirmed the plan to open-source the software. Run:ai’s software remotely schedules Nvidia GPU resources for AI in the cloud.

Neither company explained why Run:ai will open-source its platform, but it’s probably not hard to figure out. Since Nvidia has grown to be the number one maker of AI chips, its market value has soared to $3.56 trillion, making it the most valuable company in the world. That’s great for Nvidia, but it also makes acquisitions harder because of antitrust oversight.

A spokesperson for Nvidia said in a statement only that “We’re delighted to welcome the Run:ai team to Nvidia.”

When Microsoft acquired Activision Blizzard for $68.7 billion, it appeased antitrust regulators by licensing Activision’s Call of Duty game to other platforms for a decade to address worries that the company would become too powerful in gaming. The same might be happening here.

Run:ai founders Omri Geller and Ronen Dar said in a press release that open-sourcing the software will help the community build better AI, faster.

“While Run:ai currently supports only Nvidia GPUs, open-sourcing the software will enable it to extend its availability to the entire AI ecosystem,” Geller and Dar said.
They said they will continue to help their customers get the most out of their AI infrastructure and offer the ecosystem maximum flexibility, efficiency and utilization for GPU systems, wherever they are: on-prem, in the cloud through native solutions, or on Nvidia DGX Cloud, co-engineered with leading CSPs.

The founders also said, “True to our open-platform philosophy, as part of Nvidia, we will keep empowering AI teams with the freedom to choose the tools, platforms, and frameworks that best suit their needs. We will continue to strengthen our partnerships and work alongside the ecosystem to deliver a wide variety of AI solutions and platform choices.”

The Israel-based company said its goal when it was founded in 2018 was to be a driving force in the AI revolution and empower organizations to unlock the full potential of their AI infrastructures.

“Over the years, our world-class team has achieved milestones that we could only dream of back then. Together, we’ve built innovative technology, an amazing product, and an incredible go-to-market engine,” the founders said.

Run:ai helps customers orchestrate their AI infrastructure, increase efficiency and utilization, and boost the productivity of their AI teams.

“We are thrilled to build on this momentum, now as part of Nvidia. AI and accelerated computing are transforming the world at an unprecedented pace, and we believe this is just the beginning,” the Run:ai founders said. “GPUs and AI infrastructure will remain at the forefront of driving these transformative innovations and joining Nvidia provides us an extraordinary opportunity to carry forward a joint mission of helping humanity solve the world’s greatest challenges.”

Nvidia has long made graphics chips, and those chips have become far more useful in recent years for running AI software.
Now the company is also emphasizing software, and this acquisition is aimed at giving customers maximum choice, efficiency and flexibility for GPU orchestration software. Nvidia and Run:ai have been working together since 2020 and they have joint customers.

TLV Partners led the seed round for Run:ai in 2018. Rona Segev, managing director of TLV, said in a statement, “The AI market in early 2018 seemed like a different world. OpenAI was still a research company and Nvidia’s market cap was ‘only’ around $100 billion. We met Omri and Ronen who painted a picture for us of what the future of AI would look like. In their vision of the future, AI was ubiquitous.”

Segev added, “Everyone on the planet would be interacting with AI daily, and it would be obvious that every company would be leveraging AI in one way or another. The only thing preventing that vision from becoming a reality, according to them, was the lack of efficiency and [the] costs associated with training AI models and running them in production on multiple GPU clusters. To solve this problem, Omri and Ronen pitched an idea of creating an orchestration layer between AI models and GPUs that would enable a much more efficient use of the underlying compute resources leading to faster training times and significantly reduced costs.”

And Segev said, “Of course, this was all theoretical at the time as they hadn’t yet incorporated a company, let alone a product. We didn’t know much about the industry at the time. But there was something special about Omri and Ronen. They had a unique combination of intellect, charm, craziness and humility that created the perfect recipe for the type of founders we’re looking to back.”


How Meta leverages generative AI to understand user intent

Meta, the parent company of Facebook, Instagram, WhatsApp, Threads and more, runs one of the biggest recommendation systems in the world. In two recently released papers, its researchers have revealed how generative models can be used to better understand and respond to user intent.

Treating recommendation as a generative problem makes it possible to tackle it in ways that are richer in content and more efficient than classic approaches. This approach can have important uses for any application that requires retrieving documents, products or other kinds of objects.

Dense vs generative retrieval

The standard approach to creating recommendation systems is to compute, store and retrieve dense representations of documents. For example, to recommend items to users, an application must train a model that can compute embeddings for the users’ requests and embeddings for a large store of items.

At inference time, the recommendation system tries to understand the user’s intent by finding one or more items whose embeddings are similar to the user’s. This approach requires an increasing amount of storage and computation capacity as the number of items grows, because every item embedding must be stored and every recommendation operation requires comparing the user embedding against the entire item store.

Image: Dense retrieval (source: arXiv)

Generative retrieval is a more recent approach that tries to understand user intent and make recommendations not by searching a database but by predicting the next item in a sequence that represents what it knows about a user’s interactions.

Here’s how it works: The key to making generative retrieval work is to compute “semantic IDs” (SIDs), which contain the contextual information about each item. Generative retrieval systems like TIGER work in two phases.
First, an encoder model is trained to create a unique embedding value for each item based on its description and properties. These embedding values become the SIDs and are stored along with the item.

Image: Generative retrieval (source: arXiv)

In the second stage, a transformer model is trained to predict the next SID in an input sequence. The list of input SIDs represents the user’s interactions with past items, and the model’s prediction is the SID of the item to recommend. Generative retrieval reduces the need for storing and searching across individual item embeddings, so its inference and storage costs remain constant as the list of items grows. It also enhances the ability to capture deeper semantic relationships within the data, and provides other benefits of generative models, such as modifying the temperature to adjust the diversity of recommendations.

Advanced generative retrieval

Despite its lower storage and inference costs, generative retrieval suffers from some limitations. For example, it tends to overfit to the items it has seen during training, which means it has trouble dealing with items that were added to the catalog after the model was trained. In recommendation systems, this is often referred to as “the cold start problem,” which pertains to users and items that are new and have no interaction history.

To address these shortcomings, Meta has developed a hybrid recommendation system called LIGER, which combines the computational and storage efficiencies of generative retrieval with the robust embedding quality and ranking capabilities of dense retrieval. During training, LIGER uses both similarity-score and next-token objectives to improve the model’s recommendations. During inference, LIGER selects several candidates based on the generative mechanism and supplements them with a few cold-start items, which are then ranked based on the embeddings of the generated candidates.
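The inference step described above (generate a handful of candidates, add a few cold-start items, then rank by embedding) can be illustrated with a toy sketch. This is not Meta’s implementation: the function names, the tiny hand-written vectors and the choice of cosine similarity against a user embedding are all assumptions made for illustration, and the generative model is stubbed out as a plain list of candidate IDs. For contrast, `dense_retrieve` scores every item in the store, which is the cost that grows with catalog size.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_retrieve(user_vec, item_vecs, k):
    """Dense retrieval: compare the user embedding against every stored item."""
    ranked = sorted(
        item_vecs,
        key=lambda item_id: cosine(user_vec, item_vecs[item_id]),
        reverse=True,
    )
    return ranked[:k]


def hybrid_retrieve(user_vec, item_vecs, generated_ids, cold_start_ids, k):
    """Hybrid idea: rank only generated candidates plus a few cold-start items."""
    # dict.fromkeys removes duplicates while preserving order.
    candidates = dict.fromkeys(list(generated_ids) + list(cold_start_ids))
    subset = {item_id: item_vecs[item_id] for item_id in candidates}
    return dense_retrieve(user_vec, subset, k)


# Hypothetical toy catalog; "new" stands in for an item added after training.
ITEMS = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "new": [1.0, 0.05]}
USER = [1.0, 0.0]

if __name__ == "__main__":
    print(dense_retrieve(USER, ITEMS, 2))
    print(hybrid_retrieve(USER, ITEMS, ["c", "b"], ["new"], 2))
```

In this sketch the hybrid path only ever scores the candidate subset, so the ranking cost depends on the number of candidates rather than on the size of the catalog, and the cold-start item can still surface even though the stubbed generator never proposed it.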
Image: LIGER combines generative and dense retrieval (source: arXiv)

The researchers note that “the fusion of dense and generative retrieval methods holds tremendous potential for advancing recommendation systems,” and as the models evolve “they will become increasingly practical for real-world applications, enabling more personalized and responsive user experiences.”

In a separate paper, the researchers introduce a novel multimodal generative retrieval method named Multimodal preference discerner (Mender), a technique that can enable generative models to pick up implicit preferences from users’ interactions with different items. Mender builds on top of the generative retrieval methods based on SIDs and adds a few components that can enrich recommendations with user preferences.

Mender uses a large language model (LLM) to translate user interactions into specific preferences. For example, if the user has praised or complained about a specific item in a review, the model will summarize it into a preference about that product category.

The main recommender model is trained to be conditioned both on the sequence of user interactions and the user preferences when predicting the next semantic ID in the input sequence. This gives the recommender model the ability to generalize, perform in-context learning and adapt to user preferences without being explicitly trained on them.

“Our contributions pave the way for a new class of generative retrieval models that unlock the ability to utilize organic data for steering recommendation via textual user preferences,” the researchers write.

Image: Mender recommendation framework (source: arXiv)

Implications for enterprise applications

The efficiency provided by generative retrieval systems can have important implications for enterprise applications. These advancements translate into immediate practical benefits, including reduced infrastructure costs and faster inference.
The technology’s ability to maintain constant storage and inference costs regardless of catalog size makes it particularly valuable for growing businesses. The benefits extend across industries, from ecommerce to enterprise search. Generative retrieval is still in its early stages and we can expect applications and frameworks to emerge as it matures.


Five breakthroughs that make OpenAI’s o3 a turning point for AI — and one big challenge

The end of 2024 brought a reckoning for artificial intelligence, as industry insiders feared progress toward even more intelligent AI was slowing down. But OpenAI’s o3 model, announced just last week, has sparked a fresh wave of excitement and debate, and suggests big improvements are still to come in 2025 and beyond.

This model, which has been made available to researchers for safety testing but not yet released publicly, achieved an impressive score on the important ARC benchmark. The benchmark was created by François Chollet, a renowned AI researcher and creator of the Keras deep learning framework, and is specifically designed to measure a model’s ability to handle novel tasks that require intelligence. As such, it provides a meaningful gauge of progress toward truly intelligent AI systems.

Notably, o3 scored 75.7% on the ARC benchmark under standard compute conditions and 87.5% using high compute, significantly surpassing previous state-of-the-art results, such as the 53% scored by Claude 3.5.

This achievement by o3 represents a surprising advancement, according to Chollet, who had been a critic of the ability of large language models (LLMs) to achieve this sort of intelligence. It highlights innovations that could accelerate progress toward superior intelligence, whether we call it artificial general intelligence (AGI) or not.

AGI is a hyped and ill-defined term, but it signals a goal: intelligence capable of adapting to novel challenges or questions in ways that surpass human abilities.

OpenAI’s o3 tackles specific hurdles in reasoning and adaptability that have long stymied large language models. At the same time, it exposes challenges, including the high costs and efficiency bottlenecks inherent in pushing these systems to their limits.
This article will explore five key innovations behind the o3 model, many of which are underpinned by advancements in reinforcement learning (RL). It will draw on insights from industry leaders, OpenAI’s claims, and above all Chollet’s important analysis, to unpack what this breakthrough means for the future of AI as we move into 2025.

The five core innovations of o3

1. “Program synthesis” for task adaptation

OpenAI’s o3 model introduces a new capability called “program synthesis,” which enables it to dynamically combine things that it learned during pre-training (specific patterns, algorithms, or methods) into new configurations. These things might include mathematical operations, code snippets, or logical procedures that the model has encountered and generalized during its extensive training on diverse datasets. Most significantly, program synthesis allows o3 to address tasks it has never directly seen in training, such as solving advanced coding challenges or tackling novel logic puzzles that require reasoning beyond rote application of learned information. François Chollet describes program synthesis as a system’s ability to recombine known tools in innovative ways, like a chef crafting a unique dish using familiar ingredients. This feature marks a departure from earlier models, which primarily retrieve and apply pre-learned knowledge without reconfiguration. It is also one that Chollet had advocated months ago as the only viable way forward to better intelligence.

2. Natural language program search

At the heart of o3’s adaptability is its use of chains of thought (CoTs) and a sophisticated search process that takes place during inference, when the model is actively generating answers in a real-world or deployed setting. These CoTs are step-by-step natural language instructions the model generates to explore solutions.
Guided by an evaluator model, o3 actively generates multiple solution paths and evaluates them to determine the most promising option. This approach mirrors human problem-solving, where we brainstorm different methods before choosing the best fit. For example, in mathematical reasoning tasks, o3 generates and evaluates alternative strategies to arrive at accurate solutions. Competitors like Anthropic and Google have experimented with similar approaches, but OpenAI’s implementation sets a new standard.

3. Evaluator model: A new kind of reasoning

During inference, o3 generates multiple solution paths and evaluates each with the help of an integrated evaluator model to determine the most promising option. By training the evaluator on expert-labeled data, OpenAI ensures that o3 develops a strong capacity to reason through complex, multi-step problems. This feature enables the model to act as a judge of its own reasoning, moving large language models closer to being able to “think” rather than simply respond.

4. Executing its own programs

One of o3’s most groundbreaking features is its ability to execute its own CoTs as tools for adaptive problem-solving. Traditionally, CoTs have been used as step-by-step reasoning frameworks to solve specific problems. OpenAI’s o3 extends this concept by leveraging CoTs as reusable building blocks, allowing the model to approach novel challenges with greater adaptability. Over time, these CoTs become structured records of problem-solving strategies, akin to how humans document and refine their learning through experience. This ability demonstrates how o3 is pushing the frontier in adaptive reasoning. According to OpenAI engineer Nat McAleese, o3’s performance on unseen programming challenges, such as achieving a CodeForces rating above 2700, showcases its innovative use of CoTs to rival top competitive programmers. This 2700 rating places the model at “Grandmaster” level, among the top echelon of competitive programmers globally.
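The generate-then-evaluate loop described in sections 2 and 3 can be illustrated with a toy best-of-N search. This is only a sketch of the general idea: OpenAI has not published o3’s implementation, and the `Candidate` class, `search` function, and the toy generator and evaluator below are hypothetical stand-ins rather than anything taken from the model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    steps: List[str]  # natural-language reasoning steps (a toy "chain of thought")
    answer: float     # the final answer this chain arrives at


def search(generate: Callable[[random.Random], Candidate],
           evaluate: Callable[[Candidate], float],
           n_paths: int,
           seed: int = 0) -> Candidate:
    """Sample n_paths candidate chains and return the one the evaluator scores highest."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n_paths)]
    return max(candidates, key=evaluate)


# Hypothetical toy problem: find x such that 3x + 4 = 19.
def toy_generate(rng: random.Random) -> Candidate:
    guess = rng.randint(0, 10)
    steps = [f"Assume x = {guess}", f"Then 3x + 4 = {3 * guess + 4}"]
    return Candidate(steps, float(guess))


def toy_evaluate(candidate: Candidate) -> float:
    # Stand-in for a learned evaluator: higher is better, 0 means the equation holds.
    return -abs(3 * candidate.answer + 4 - 19)


best = search(toy_generate, toy_evaluate, n_paths=64)
print(best.steps, best.answer)
```

In this sketch, raising `n_paths` is the analogue of spending more compute at inference time: more candidate paths are explored before one is chosen. The quality of the result also depends entirely on the evaluator, which is the dependency Chollet’s critique in section 5 focuses on.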
5. Deep learning-guided program search

The o3 model uses a deep learning-driven approach during inference to evaluate and refine potential solutions to complex problems. This process involves generating multiple solution paths and using patterns learned during training to assess their viability. François Chollet and other experts have noted that this reliance on “indirect evaluations,” where solutions are judged based on internal metrics rather than tested in real-world scenarios, can limit the model’s robustness when applied to unpredictable or enterprise-specific contexts. Additionally, o3’s dependence on expert-labeled datasets for training its evaluator model raises concerns about scalability. While these datasets enhance precision, they also require significant human oversight, which can restrict the system’s adaptability and cost-efficiency. Chollet highlights that these trade-offs illustrate the challenges of scaling reasoning systems beyond controlled benchmarks like ARC-AGI. Ultimately, this approach demonstrates both the potential and the limitations of integrating


CES 2025 tips and tricks: A guide to tech’s biggest trade show

CES 2025 is coming to Las Vegas during the week of January 5, and it will once again be one of the biggest tech expos in the world. Last year’s attendance reached 138,789, according to an audited report by the Consumer Technology Association (CTA), the group that puts on the show.

Last year at CES 2024, I recorded around 80 press events, interviews, and sessions. I walked 46.78 miles, or 105,407 steps, over six days. My feet hurt and my back was sore. There were more than 4,300 exhibitors and 2.4 million square feet of exhibit space to crawl. And the Goodyear blimp was there.

I’ve been attending the Consumer Electronics Show since the 1990s, when then-Microsoft CEO Bill Gates gave the opening keynote speeches every year. This time, the biggest speech will come from Jensen Huang, CEO of Nvidia, the graphics chip maker that has become the king of AI hardware with a market value of $3.42 trillion, making it the most valuable company in the world. He will give a talk at 6:30 p.m. Pacific time at the Michelob Ultra Arena at the Mandalay Bay on January 6.

Most attendees arrive at the show on January 7 and stay through January 10, when the expos are open. But the press, a few thousand of us, start arriving on January 5 for the afternoon previews and CES Unveiled (press only, in Mandalay Bay), where award-winning exhibitors show their wares at tables.

“The CES is an amazing, powerful tech event. I was looking back at what you had written last year about it, before and after,” said Gary Shapiro, CEO of the Consumer Technology Association, in an interview with GamesBeat. “A lot of people go with a very full agenda, but we always say you have to have time for serendipity and discovery. We have a new look, a new feel.
We focused the campaign on ‘Dive in.’ We’re inviting attendees to do three things: connect, solve, and discover.”

He said the average attendee has about 29 meetings during the show, as face-to-face business is still important. About 75% of attendees say their business is primarily B2B, or both B2B and B2C.

Gary Shapiro is CEO of the CTA, creator of CES.

My No. 1 tip is to always wear comfortable shoes. I learned that lesson after some blisters during a CES years ago. As for a mask, I wear it on planes and occasionally during big, crowded indoor events. But things have changed since the COVID days when the show was canceled outright in January 2021 and severely restricted in January 2022, with only 45,000 showing up.

Much of my advice is not rocket science. But I renew this story every year since there are new people attending the show and many going for the first time. I take no responsibility for bad advice. You can check out the CES app here.

Attendance is not quite where it once was. The show drew 175,212 in 2019 and 171,268 in 2020. CES 2020 had about 4,500 exhibitors across 2.9 million square feet of space. This year, the show is at least a few days after New Year’s and that gives some breathing room for those planning on going. As a bonus, here are my gaming predictions for 2025.

The latest word from the CTA: “CES 2025 will be the world’s most powerful business event, setting the tech agenda for the year. With nearly 140,000 global attendees at CES 2024, we are seeing positive momentum and interest from industry executives, top manufacturers, buyers, retailers, and media for CES 2025. Thousands of startups and companies from around the world will showcase innovation that will solve some of our most pressing challenges.”

Should you go?

The line to get into the first press event at CES 2024.

It’s a big show, and, unlike the gaming industry’s canceled E3 show, it’s still relevant.
Last year, there were 5,355 media in attendance, up from 4,800 at CES 2023 and 3,100 at CES 2022. Eureka Park at the Venetian’s Sands Expo had more than 1,400 startups. There were 46,000 international visitors, or 40.6% of the total, last year. Some 5,975 were from the Americas outside the U.S., 12,424 were from Europe, 36,017 were from the Middle East and Asia, 229 were from Africa and 552 were from Oceania. About 15,723 were presidents or founders, or 11.7% of overall attendance, and another 12,768, or 9.5%, were C-suite executives.

Last year’s top areas of interest were AI, vehicle tech, IoT sensors, smart homes and appliances, AR/XR/VR, robotics, marketing and advertising, startups, video technologies, 5G, energy/power, cloud computing/data and fitness/wearables. The year before, the top areas of interest were AI, IoT/sensors, vehicle technology, AR/VR/XR, smart homes and appliances, 5G, robotics, startups, energy/power, fitness and wearables, and marketing and advertising.

I still view CES as a bellwether for the tech economy, as no other event spans the entire tech world like it does. Companies want to create a buzz at CES, which is designed to signal products coming in the next year. I find the show a useful way to stay up to speed on the latest technology. If you find the health risk acceptable, then it can still be a valuable way to stay in touch.

Apple doesn’t attend the show, but just about every other tech giant does. It’s where the tech industry will be next week, though it’s not so much of a game event these days. Sony, however, will be there and is expected to show off the Afeela electric car (created with Honda), complete with a PlayStation 5 in the vehicle.

Gary Shapiro wore his mask at CES 2022 during the Omicron wave.

Since 2020, CES official exhibit venues have been equipped with improved ventilation systems and fresh air flow. The Las Vegas Convention Center, Mandalay


Unintended consequences: U.S. election results could vastly accelerate AI development

While the 2024 U.S. election focused on traditional issues like the economy and immigration, its quiet impact on AI policy could prove even more transformative. Without a single debate question or major campaign promise about AI, voters inadvertently tipped the scales in favor of accelerationists, those who advocate for rapid AI development with minimal regulatory hurdles. The implications of this acceleration are profound, heralding a new era of AI policy that prioritizes innovation over caution and signals a decisive shift in the debate between AI’s potential risks and rewards.

The pro-business stance of President-elect Donald Trump leads many to assume that his administration will favor those developing and marketing AI and other advanced technologies. His party platform has little to say about AI. However, it does emphasize a policy approach focused on repealing AI regulations, particularly targeting what it described as “radical left-wing ideas” within existing executive orders of the outgoing administration. In contrast, the platform supported AI development aimed at fostering free speech and “human flourishing,” calling for policies that enable innovation in AI while opposing measures perceived to hinder technological progress.

Early indications based on appointments to leading government positions underscore this direction. However, there is a larger story unfolding: The resolution of the intense debate over AI’s future.

An intense debate

Ever since ChatGPT appeared in November 2022, there has been a raging debate between those in the AI field who want to accelerate AI development and those who want to decelerate.
Famously, in March 2023 the latter group proposed a six-month AI pause in development of the most advanced systems, warning in an open letter that AI tools present “profound risks to society and humanity.” This letter, spearheaded by the Future of Life Institute, was prompted by OpenAI’s release of the GPT-4 large language model (LLM), several months after ChatGPT launched. The letter was initially signed by more than 1,000 technology leaders and researchers, including Elon Musk, Apple Co-founder Steve Wozniak, 2020 Presidential candidate Andrew Yang, podcaster Lex Fridman, and AI pioneers Yoshua Bengio and Stuart Russell. The number of signees of the letter eventually swelled to more than 33,000. Collectively, they became known as “doomers,” a term to capture their concerns about potential existential risks from AI. Not everyone agreed. OpenAI CEO Sam Altman did not sign. Nor did Bill Gates and many others. Their reasons for not doing so varied, although many voiced concerns about potential harm from AI. This led to many conversations about the potential for AI to run amok, leading to disaster. It became fashionable for many in the AI field to talk about their assessment of the probability of doom, often referred to as an equation: p(doom). Nevertheless, work on AI development did not pause. For the record, my p(doom) in June 2023 was 5%. That might seem low, but it was not zero. I felt that the major AI labs were sincere in their efforts to stringently test new models prior to release and in providing significant guardrails for their use. Many observers concerned about AI dangers have rated existential risks higher than 5%, and some have rated much higher. AI safety researcher Roman Yampolskiy rated the probability of AI ending humanity at over 99%. 
That said, a study released early this year, well before the election and representing the views of more than 2,700 AI researchers, showed that “the median prediction for extremely bad outcomes, such as human extinction, was 5%.” Would you board a plane if there were a 5% chance it might crash? This is the dilemma AI researchers and policymakers face.

Must go faster

Others have been openly dismissive of worries about AI, pointing instead to what they perceived as the huge upside of the technology. These include Andrew Ng (who founded and led the Google Brain project) and Pedro Domingos (a professor of computer science and engineering at the University of Washington and author of “The Master Algorithm”). They argued, instead, that AI is part of the solution. As put forward by Ng, there are indeed existential dangers, such as climate change and future pandemics, and AI can be part of how these are addressed and mitigated.

Ng argued that AI development should not be paused, but should instead go faster. This utopian view of technology has been echoed by others who are collectively known as “effective accelerationists” or “e/acc” for short. They argue that technology, and especially AI, is not the problem, but the solution to most, if not all, of the world’s issues. Startup accelerator Y Combinator CEO Garry Tan, along with other prominent Silicon Valley leaders, included the term “e/acc” in their usernames on X to show alignment to the vision. Reporter Kevin Roose at the New York Times captured the essence of these accelerationists by saying they have an “all-gas, no-brakes approach.”

A Substack newsletter from a couple of years ago described the principles underlying effective accelerationism. Here is the summation they offer at the end of the article, plus a comment from OpenAI CEO Sam Altman.

AI acceleration ahead

The 2024 election outcome may be seen as a turning point, putting the accelerationist vision in a position to shape U.S.
AI policy for the next several years. For example, the President-elect recently appointed technology entrepreneur and venture capitalist David Sacks as “AI czar.” Sacks, a vocal critic of AI regulation and a proponent of market-driven innovation, brings his experience as a technology investor to this role. He is one of the leading voices in the AI industry, and much of what he has said about AI aligns with the accelerationist viewpoints expressed by the incoming party platform. In response to the AI executive order from the Biden administration in 2023, Sacks tweeted: “The U.S. political and fiscal situation is hopelessly broken, but we have one unparalleled asset as a country: Cutting-edge innovation in AI driven by a completely


DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.

Available via Hugging Face under the company’s license agreement, the new model comes with 671B parameters but uses a mixture-of-experts architecture to activate only select parameters, in order to handle given tasks accurately and efficiently. According to benchmarks shared by DeepSeek, the offering is already topping the charts, outperforming leading open-source models, including Meta’s Llama 3.1-405B, and closely matching the performance of closed models from Anthropic and OpenAI.

The release marks another major development closing the gap between closed and open-source AI. Ultimately, DeepSeek, which started as an offshoot of Chinese quantitative hedge fund High-Flyer Capital Management, hopes these developments will pave the way for artificial general intelligence (AGI), where models will have the ability to understand or learn any intellectual task that a human being can.

What does DeepSeek-V3 bring to the table?

Just like its predecessor DeepSeek-V2, the new ultra-large model uses the same basic architecture revolving around multi-head latent attention (MLA) and DeepSeekMoE. This approach ensures it maintains efficient training and inference, with specialized and shared “experts” (individual, smaller neural networks within the larger model) activating 37B parameters out of 671B for each token.

While the basic architecture ensures robust performance for DeepSeek-V3, the company has also debuted two innovations to further raise the bar.

The first is an auxiliary loss-free load-balancing strategy. This dynamically monitors and adjusts the load on experts to utilize them in a balanced way without compromising overall model performance.
The second is multi-token prediction (MTP), which allows the model to predict multiple future tokens simultaneously. This innovation not only enhances the training efficiency but enables the model to perform three times faster, generating 60 tokens per second.

“During pre-training, we trained DeepSeek-V3 on 14.8T high-quality and diverse tokens… Next, we conducted a two-stage context length extension for DeepSeek-V3,” the company wrote in a technical paper detailing the new model. “In the first stage, the maximum context length is extended to 32K, and in the second stage, it is further extended to 128K. Following this, we conducted post-training, including Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on the base model of DeepSeek-V3, to align it with human preferences and further unlock its potential. During the post-training stage, we distill the reasoning capability from the DeepSeek-R1 series of models, and meanwhile carefully maintain the balance between model accuracy and generation length.”

Notably, during the training phase, DeepSeek used multiple hardware and algorithmic optimizations, including the FP8 mixed precision training framework and the DualPipe algorithm for pipeline parallelism, to cut down on the costs of the process. Overall, it claims to have completed DeepSeek-V3’s entire training in about 2788K H800 GPU hours, or about $5.57 million, assuming a rental price of $2 per GPU hour. This is much lower than the hundreds of millions of dollars usually spent on pre-training large language models. Llama-3.1, for instance, is estimated to have been trained with an investment of over $500 million.

Strongest open-source model currently available

Despite the economical training, DeepSeek-V3 has emerged as the strongest open-source model in the market. The company ran multiple benchmarks to compare the performance of the AI and noted that it convincingly outperforms leading open models, including Llama-3.1-405B and Qwen 2.5-72B.
It even outperforms closed-source GPT-4o on most benchmarks, except English-focused SimpleQA and FRAMES, where the OpenAI model sat ahead with scores of 38.2 and 80.5 (vs 24.9 and 73.3), respectively.

Notably, DeepSeek-V3’s performance particularly stood out on the Chinese and math-centric benchmarks, scoring better than all counterparts. In the Math-500 test, it scored 90.2, with Qwen’s score of 80 the next best.

The only model that managed to challenge DeepSeek-V3 was Anthropic’s Claude 3.5 Sonnet, outperforming it with higher scores in MMLU-Pro, IF-Eval, GPQA-Diamond, SWE Verified and Aider-Edit.

🚀 Introducing DeepSeek-V3! Biggest leap forward yet:
⚡ 60 tokens/second (3x faster than V2!)
💪 Enhanced capabilities
🛠 API compatibility intact
🌍 Fully open-source models & papers
🐋 1/n pic.twitter.com/p1dV9gJ2Sd
DeepSeek (@deepseek_ai), December 26, 2024

The work shows that open-source is closing in on closed-source models, promising nearly equivalent performance across different tasks. The development of such systems is extremely good for the industry as it potentially eliminates the chances of one big AI player ruling the game. It also gives enterprises multiple options to choose from and work with while orchestrating their stacks.

Currently, the code for DeepSeek-V3 is available via GitHub under an MIT license, while the model is being provided under the company’s model license. Enterprises can also test out the new model via DeepSeek Chat, a ChatGPT-like platform, and access the API for commercial use. DeepSeek is providing the API at the same price as DeepSeek-V2 until February 8. After that, it will charge $0.27/million input tokens ($0.07/million tokens with cache hits) and $1.10/million output tokens.
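The cost and pricing figures quoted in this article are simple products, and a short sketch makes the arithmetic easy to check. The numbers below are the ones stated above; the function names and the example workload in the usage note are illustrative assumptions, not part of DeepSeek’s API.

```python
# Figures quoted in the article.
GPU_HOURS = 2_788_000        # "about 2788K H800 GPU hours"
PRICE_PER_GPU_HOUR = 2.00    # assumed rental price in USD per GPU hour
TOTAL_PARAMS_B = 671         # total parameters, in billions
ACTIVE_PARAMS_B = 37         # parameters activated per token, in billions

# API prices in USD per million tokens, effective after February 8.
INPUT_PRICE = 0.27           # input tokens without a cache hit
CACHED_INPUT_PRICE = 0.07    # input tokens served with a cache hit
OUTPUT_PRICE = 1.10          # output tokens


def training_cost_usd(gpu_hours: float, price_per_hour: float) -> float:
    """Estimated training cost as GPU hours times the hourly rental price."""
    return gpu_hours * price_per_hour


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters activated for each token."""
    return active_b / total_b


def api_cost_usd(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Cost of a workload; cached_input_tokens is the part of input_tokens with cache hits."""
    uncached = input_tokens - cached_input_tokens
    total = (uncached * INPUT_PRICE
             + cached_input_tokens * CACHED_INPUT_PRICE
             + output_tokens * OUTPUT_PRICE)
    return total / 1_000_000


print(training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR))
print(active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B))
print(api_cost_usd(input_tokens=1_000_000, output_tokens=1_000_000))
```

The training product works out to $5,576,000, in line with the “about $5.57 million” figure, and 37B of 671B is roughly 5.5% of the parameters active per token. The hypothetical workload of one million input and one million output tokens with no cache hits comes to $1.37 under the post-February 8 prices.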


Test-driving Google’s Gemini-Exp-1206 model: Competitive data analysis and sophisticated visualizations in under a minute

One of Google’s latest experimental models, Gemini-Exp-1206, shows the potential to alleviate one of the most grueling aspects of any analyst’s job: getting their data and visualizations to sync up perfectly and provide a compelling narrative, without having to work all night.

Investment analysts, junior bankers, and members of consulting teams aspiring to partnership take their roles knowing that long hours, weekends, and pulling the occasional all-nighter could give them an inside edge on a promotion.

What burns so much of their time is getting advanced data analysis done while also creating visualizations that reinforce a compelling storyline. Making this more challenging is that every banking, fintech and consulting firm, like JP Morgan, McKinsey and PwC, has unique formats and conventions for data analysis and visualization.

VentureBeat interviewed members of internal project teams whose employers had hired these firms and assigned them to the project. Employees working on consultant-led teams said producing visuals that condense and consolidate the massive amount of data is a persistent challenge. One said it was common for consultant teams to work overnight and do a minimum of three to four iterations of a presentation’s visualizations before settling on one and getting it ready for board-level updates.

A compelling use case for test-driving Google’s latest model

The process analysts rely on to create presentations that support a storyline with solid visualizations and graphics has so many manual steps and repetitions that it proved a compelling use case for testing Google’s latest model.
In launching the model earlier in December, Google’s Patrick Kane wrote, “Whether you’re tackling complex coding challenges, solving mathematical problems for school or personal projects, or providing detailed, multistep instructions to craft a tailored business plan, Gemini-Exp-1206 will help you navigate complex tasks with greater ease.” Google noted the model’s improved performance in more complex tasks, including math reasoning, coding, and following a series of instructions.

VentureBeat took Google’s Exp-1206 model for a thorough test drive this week. We created and tested over 50 Python scripts in an attempt to automate and integrate analysis and intuitive, easily understood visualizations that could simplify the complex data being analyzed. Given how hyperscalers are dominant in news cycles today, our specific goal was to create an analysis of a given technology market while also creating supporting tables and advanced graphics.

Through over 50 different iterations of verified Python scripts, our findings included:

The greater the complexity of a Python code request, the more the model “thinks” and tries to anticipate the desired result. Exp-1206 attempts to anticipate what’s needed from a given complex prompt and will vary what it produces by even the slightest nuance change in a prompt. We saw this in how the model would alternate between formats of table types placed directly above the spider graph of the hyperscaler market analysis we created for the test.

Forcing the model to attempt complex data analysis and visualization and produce an Excel file delivers a multi-tabbed spreadsheet. Without ever being asked for an Excel spreadsheet with multiple tabs, Exp-1206 created one. The primary tabular analysis requested was on one tab, visualizations on another, and an ancillary table on the third.

Telling the model to iterate on the data and recommend the 10 visualizations it decides best fit the data delivers beneficial, insightful results.
Aiming to reduce the time drain of having to create three or four iterations of slide decks before a board review, we forced the model to produce multiple concept iterations of images. These could be easily cleaned up and integrated into a presentation, saving many hours of manual work creating diagrams on slides.

Pushing Exp-1206 toward complex, layered tasks

VentureBeat’s goal was to see how far the model could be pushed in terms of complexity and layered tasks. Its performance in creating, running, editing and fine-tuning 50 different Python scripts showed how quickly the model attempts to pick up on nuances in code and react immediately. The model flexes and adapts based on prompt history.

The result of running Python code created with Exp-1206 in Google Colab showed that the nuanced granularity extended into shading and translucency of layers in an eight-point spider graph that was designed to show how six hyperscaler competitors compare. The eight attributes we asked Exp-1206 to identify across all hyperscalers and to anchor the spider graph stayed consistent, while graphical representations varied.

Battle of the hyperscalers

We chose the following hyperscalers to compare in our test: Alibaba Cloud, Amazon Web Services (AWS), Digital Realty, Equinix, Google Cloud Platform (GCP), Huawei, IBM Cloud, Meta Platforms (Facebook), Microsoft Azure, NTT Global Data Centers, Oracle Cloud, and Tencent Cloud.

Next, we wrote an 11-step prompt of over 450 words. The goal was to see how well Exp-1206 can handle sequential logic and not lose its place in a complex multistep process. (You can read the prompt in the appendix at the end of this article.)

We then submitted the prompt in Google AI Studio, selecting the Gemini Experimental 1206 model, as shown in the figure below.

Finally, we copied the code into Google Colab and saved it into a Jupyter notebook (Hyperscaler Comparison – Gemini Experimental 1206.ipynb), then ran the Python script.
The script ran flawlessly and created three files (denoted with the red arrows in the upper left).

Hyperscaler comparative analysis and a graphic, in less than a minute

The first series of instructions in the prompt asked Exp-1206 to create a Python script that would compare 12 different hyperscalers by their product name, unique features and differentiators, and data center locations. Below is how the Excel file that was requested in the script turned out. It took less than a minute to format the spreadsheet so that the text shrank to fit the columns.

The next series of commands asked for a table of the top six hyperscalers compared across the top of a page and the

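For readers unfamiliar with how the eight-point spider graph in the Gemini-Exp-1206 test is constructed, the geometry is straightforward: each attribute gets an evenly spaced axis, and each competitor’s scores become the vertices of a closed polygon. The sketch below is not the script Exp-1206 generated, which the excerpt does not reproduce. The `radar_vertices` function, the attribute names, and the scores are hypothetical placeholders.

```python
import math
from typing import Dict, List, Tuple


def radar_vertices(scores: Dict[str, float], max_score: float = 10.0) -> List[Tuple[float, float]]:
    """Map attribute scores to (x, y) polygon vertices on a unit-radius spider chart."""
    if not scores:
        raise ValueError("at least one attribute is required")
    n_axes = len(scores)
    points = []
    for i, value in enumerate(scores.values()):
        angle = 2 * math.pi * i / n_axes   # axes are evenly spaced around the circle
        radius = value / max_score         # normalize the score to the range [0, 1]
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    points.append(points[0])               # repeat the first vertex to close the polygon
    return points


# Placeholder attributes and scores, purely for illustration.
example_scores = {
    "attribute_1": 8, "attribute_2": 6, "attribute_3": 9, "attribute_4": 7,
    "attribute_5": 5, "attribute_6": 8, "attribute_7": 6, "attribute_8": 7,
}
for x, y in radar_vertices(example_scores):
    print(f"{x:.3f}, {y:.3f}")
```

A plotting library would then draw one such polygon per competitor, typically with a translucent fill so that overlapping layers stay readable, which is the shading and translucency behavior described in the test.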