VentureBeat

Here’s what AI-powered startups need to succeed in 2025

Presented by Twilio

In 2024, thousands of startups emerged built on the powerful capabilities of cutting-edge large language models (LLMs). Statistically speaking, only one-fifth will survive to the end of 2025. To make it beyond this year, these companies will need an edge.

That said, I’ve never been more excited about the potential of a new tech sector. AI-powered startups will remake our world in ways we can’t imagine yet — if they have the ingredients to succeed. Serving as a judge for Twilio’s Startup Searchlight 2.0 competition, which celebrates the builders creating the future of communications and customer engagement, drove this point home for me. We selected 12 honorees from among the more than 500 companies that applied. All of the winners embody a few basic principles that startups need to remember when building AI-powered solutions. Keep these in mind, and you will have a good start on building an AI business that will last.

1. Focus on business basics

You can’t count on AI alone to give you a competitive advantage — it’s too ubiquitous. The challenges of starting and running a startup remain much as they were before LLMs came along. You need to attract, convert and retain customers. You need to keep costs under control: While AI is getting cheaper all the time, it is still possible to run up the tab if you create complicated workflows (and take note that 72% of IT and financial leaders say AI costs are becoming “unmanageable”). Of course, you also need to establish and defend a sustainable competitive advantage.

AI’s power can be your advantage too, if you can take something that is currently complicated to do and encapsulate it in an easy-to-use API framework. That’s what Twilio did for telephony a decade ago, and it’s what the big AI models are doing today. If you want to build a sustainable tech business today, think about how you can deliver it via an API.

2. Build more than a wrapper

If you’re just creating a “wrapper” for existing LLMs, you won’t be able to maintain differentiation over the long haul. For example, if you’re trying to create a tool to help write code, do speech transcription or scan PDFs and extract information, it doesn’t matter how nice your interface is — the major LLMs are already excellent at these tasks.

Focus on an area where you can provide a differentiated service that gives you a compounding advantage through a data flywheel or network effects. For example, one of the AI Startup Searchlight honorees, Goodcall, automates voice calls for businesses. It has been amassing anonymized data from over 4 million customer calls to build a more robust database and improved analytics. Another area startups could focus on is pulling data out of unstructured customer conversations. One Searchlight honoree, Spoke AI, does this by pulling data from customers’ voice calls so that business users can see who is calling them, what they might want, how they are feeling and what they talked about previously with colleagues.

3. Understand the growth trajectory of AI

AI is changing incredibly fast. The number of AI patents per year has increased 31x since 2010, with over 62,000 granted in 2022. When deciding where to focus your efforts, first learn about the arc of LLM development and where it’s likely to go in the next 12 months. If you don’t, your solution may be obsolete before you can get it to market. For example, the big AI labs are currently working to enhance the reasoning capabilities of these models, improving their performance in various complex domains. Don’t focus on advanced reasoning unless you have billions in funding! By contrast, one of the Searchlight honorees, CuraJOY, is a grassroots tech nonprofit that uses AI and entertainment to improve the accessibility, effectiveness and equity of social and mental health support.
That’s definitely not an area of focus for the big AI models — but it’s meeting a major societal need.

4. Capture the excitement

New AI solutions attract a lot of interest, but the excitement is fleeting. If you don’t have a plan to capture those tire-kickers and turn them into long-term customers, your business will fade quickly along with the hype. You need to maintain a high interest level. One way to do that is to keep improving your product based on customers’ input. For example, you might use AI to capture and sort customer feedback and route the highest-value feature requests directly to your product team. That will keep customers coming back and fuel sustainable growth.

Another way is to keep raising the bar with new capabilities, certifications and customer-friendly offers. Here’s an illustration from a Searchlight honoree: Alpharun is an AI-powered phone interview platform; it was part of the OpenAI accelerator this year and won the audience award at the 2024 Staffing Industry Analysts conference. The company wasn’t content to rest on its laurels: It’s already securing key technology certifications and offering its customers uptime guarantees, international support and top-notch reliability — essential offerings for the enterprise customers it’s targeting.

Looking forward to an AI-powered economy

While 2024 marked the year of AI experimentation, 2025 will be defined by AI-powered startups delivering measurable business impact. Through my work at Lightspeed and experience judging Twilio’s Searchlight competition, one thing is clear: The most promising companies aren’t just creating clever AI implementations — they’re building robust businesses that can weather the inevitable changes in technology. The AI Searchlight honorees exemplify this approach, building true competitive moats with compounding advantages. These companies show us that lasting success comes from combining AI capabilities with deep domain expertise and strong business fundamentals.
We’re at the dawn of a new tech boom, and I have no doubt that some of today’s builders will emerge as tomorrow’s tech giants. Learn more about the Twilio AI Startup Searchlight and the honorees here.

Nnamdi Iregbulem is an investment partner at Lightspeed Venture Partners.


The era of custom chips

Presented by Marvell

This article is part of VentureBeat’s special issue, “AI at Scale: From Vision to Viability.” Read more from the issue here.

AI is about to face some serious growing pains. Demand for AI services is exploding globally. Unfortunately, so is the challenge of delivering those services in an economical and sustainable manner. AI power demand is forecast to grow by 44.7% annually, a surge that will double data center power consumption to 857 terawatt hours in 2028. If data centers were a nation, that would make them the sixth-largest consumer of electricity, right behind Japan. It’s an imbalance that threatens the “smaller, cheaper, faster” mantra that has driven every major trend in technology for the last 50 years. It also doesn’t have to happen. Custom silicon — unique silicon optimized for specific use cases — is already demonstrating how we can continue to increase performance while cutting power even as Moore’s Law fades into history. Custom may account for 25% of AI accelerators (XPUs) by 2028 (Marvell estimate), and that’s just one category of chips going custom.

The data center as a factory

Jensen Huang’s vision for AI factories is apt. These coming AI data centers will churn at an unrelenting pace, 24/7. And, like manufacturing facilities, their ultimate success or failure for service providers will be determined by operational excellence, the two-word phrase that rules manufacturing. Are we consuming more, or less, energy per token than our competitor? Why is mean time to failure rising? What’s the current overall equipment effectiveness (OEE)? In oil and chemicals, the end products sold to customers are indistinguishable commodities. Where producers differ is in process design, as they leverage distinct combinations of technologies to squeeze out marginal gains. The same will occur in AI.
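A quick sanity check on the forecast above: compounding at 44.7% a year, consumption doubles in under two years. The sketch below is illustrative only; the 2028 figure comes from the article, while the baseline is back-calculated from the stated doubling, not separately reported.

```python
import math

ANNUAL_GROWTH = 0.447   # 44.7% forecast annual growth in AI power demand
TWH_2028 = 857.0        # projected data center consumption in 2028 (TWh)

def years_to_double(rate: float) -> float:
    """How long a quantity compounding at `rate` takes to double."""
    return math.log(2.0) / math.log(1.0 + rate)

# If 857 TWh represents a doubling, the implied baseline is ~428 TWh,
# and at 44.7% annual growth that doubling takes well under two years.
doubling_years = years_to_double(ANNUAL_GROWTH)
implied_baseline_twh = TWH_2028 / 2.0
```

At that pace, every delay in efficiency work compounds quickly, which is exactly why the operational-excellence questions above matter.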
Going forward, diversity will rule, and the operators with the lowest cost, the least downtime and the ability to roll out new differentiating services and applications will become the favorites of businesses and consumers. In short, the best infrastructure will win.

The custom chip concept

One of the chief ways to differentiate will be through custom semiconductors — that is, chips containing unique IP or features that achieve leapfrog performance for an application. It’s a spectrum ranging from AI accelerators built around a distinct, singular design to a merchant chip containing additional custom IP, cores and firmware to optimize it for a particular software environment. While the focus is now primarily on higher-value chips such as AI accelerators, every chip will get customized: Meta, for example, recently unveiled a custom network interface controller (NIC), a relatively unsung chip that connects servers to networks, to reduce the impact of downtime.

A single stack of high bandwidth memory (HBM) can require an interface with 2,048 pins to transfer data, or more than 8,000 per XPU. Customizing can dramatically reduce power and pin count while increasing memory capacity. XPUs with custom HBM are expected in one to two years. Customization will involve rethinking every aspect of semiconductor design. Some, for example, are looking at ways to optimize the base chip and interfaces for managing the gigabytes of HBM used as a cache in high-end AI accelerators. Optimization can potentially increase memory inside the chip package by up to 33%, reduce interface power by 70% and increase the available silicon real estate for logic functions by close to 25% (Marvell estimate). The custom category also includes new, emerging classes of interconnect chips aimed at scaling up the size and capabilities of computing systems.
Today, servers typically contain eight or fewer XPUs and/or CPUs, and all of the components are housed in an aluminum box that slides into a rack. In the future, AI systems will contain hundreds of accelerators along with storage and memory spread over several racks, connected by a portfolio of devices tailored to the specifications of the XPUs: optical engines, CXL controllers, PCIe retimers, transmit-receive optical digital signal processors (DSPs) and others. Many of these devices didn’t even exist a few years ago, but they are expected to grow rapidly: 75% of AI and cloud servers may contain PCIe retimers within two years, according to The 650 Group. While these devices and servers will be grounded in technology standards, architectures and designs will vary widely from cloud to cloud.

A periodic table for semis

But how does one make custom semiconductors — where designing a platform for producing 3nm or 2nm chips can cost over $500 million? In a market where large language models (LLMs) change every few months? And how will these technologies work with emerging ideas like cold plate or immersion cooling? As basic as it sounds, it starts with the elemental ingredients. Serializer-deserializer (SerDes) circuits are the textbook “most important technology in the world” you’ve never heard about. These components control the flow of data between chips and infrastructure devices such as switches and servers. An 800G optical module, for example, is built with eight 100G SerDes. A single data center rack will contain tens of thousands of SerDes. You can think of them as the molecules of networking: fundamental building blocks that have an outsized influence on the health of the system as a whole. Slightly reducing the picojoules consumed in transmitting bits across a SerDes can translate into substantial energy savings across a global infrastructure.
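To see why shaving even a single picojoule per bit matters, here is a back-of-the-envelope sketch. The lane rate, lane count and fleet size are illustrative assumptions, not figures from the article:

```python
def power_saved_watts(pj_per_bit: float, gbps_per_lane: float, lanes: int) -> float:
    """Power saved = (energy saved per bit) x (bits per second) x (number of lanes)."""
    return pj_per_bit * 1e-12 * gbps_per_lane * 1e9 * lanes

# Assumed scenario: save 1 pJ/bit on 100G SerDes lanes,
# with 10,000 lanes per rack (the article says "tens of thousands").
per_rack_w = power_saved_watts(1.0, 100.0, 10_000)

# Across an assumed fleet of 100,000 racks, that single picojoule
# becomes on the order of 100 MW of continuous savings.
fleet_mw = per_rack_w * 100_000 / 1e6
```

A kilowatt per rack from one picojoule per bit is the kind of marginal gain the oil-and-chemicals analogy above is pointing at.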
Similarly, chip packaging now plays an outsized role in chip design because it provides a mechanism for streamlining power delivery and data paths while continuing to boost computing performance. More than 50% of the power in a chip can be consumed by moving data between different subsystems inside the chip itself.

Chip industry 2.0?

As custom becomes the norm, we will also face a new dilemma: How does a company deliver custom products and still leverage the benefits of mass manufacturing? To date, semiconductor makers have succeeded by making very large numbers of a small handful of devices. ”Custom” used to mean taking fairly simple actions like tweaking speed or cache size, similar


Not just hype — here are real-world use cases for AI agents

This article is part of VentureBeat’s special issue, “AI at Scale: From Vision to Viability.” Read more from this special issue here.

Just seven or eight months ago, when a customer called or emailed BACA Systems with a service question, a human agent handling the query would begin searching for similar cases in the system and analyzing technical documents. This process would take roughly five to seven minutes; then the agent could offer the “first meaningful response” and finally begin troubleshooting. But now, with AI agents powered by Salesforce, that time has been shortened to as few as five to 10 seconds.

“That’s a big [reduction],” Andrew Russo, enterprise architect at BACA Systems, told VentureBeat. He emphasized that, “for us, it’s not about how do we eliminate headcount, reduce staffing. Our goal is, how do we make sure the customer is back up and running as quickly as possible?”

Closing time gaps, delivering faster time to resolution

BACA Systems, a Michigan-based robotics manufacturing company, first implemented Salesforce in 2014, eventually adding Service Cloud to replace its “vanilla, or maybe more like strawberry ice cream, basic service cloud,” Russo explained. The company then did a “giant digital transformation” in 2021, bringing on Salesforce’s enterprise resource planning (ERP) platform. Team members soon began working with predictive AI for sales and manufacturing forecasts; then the company evolved to AI agents, implementing Salesforce’s Agentforce within the last year.

An initial key use case was service calls. Russo explained that about 57% of questions coming in from customers are hardware-related (for instance, a machine failing or requiring calibration). Now, instead of having to sift through databases for previous customer calls and similar cases, human reps can ask the AI agent to find the relevant information. The AI runs in the background and allows humans to respond right away, Russo noted.
AI can also support preventative maintenance. For instance, a circuit breaker might be continually tripping, indicating that there’s a short in a wire that should be investigated, Russo explained. This could help eliminate ongoing issues that haven’t been resolved in the past. “It’s all about how do we deliver faster time to resolution for customers,” said Russo.

AI agents generating sales leads, handling customer inquiries

Another critical use case is sales: As a small company, BACA naturally doesn’t have hundreds of salespeople or even dozens (in fact, it has fewer than 10). “We have a boatload of leads that we haven’t had time to actually make a reachout to,” said Russo. “Our goal is: How do we start to engage those?” AI can serve as a sales development representative (SDR) to send out general inquiries and emails, have a back-and-forth dialogue, then pass the prospect to a member of the sales team, Russo explained. Bringing on additional salespeople to handle such tasks would require tens of thousands of dollars for salaries, but if AI can develop new deals, its upfront cost is “very easy to justify.”

In the coming months, the company plans to deploy customer-facing service agents that can interact with human users via text message to open and handle cases without initial need for human intervention. If the AI agent isn’t able to solve a problem, it will escalate the issue to a human rep. The intent is, “How do we keep delivering more value to customers on the service side and create more deals on the sales side?” Russo noted.

Outside sales and service, BACA is using AI to generate emails, create receivables and craft “very stern collections letters” when required. Russo, for his part, is using the technology for part deduplication checking, leveraging retrieval-augmented generation (RAG) with prompt builders to detect duplicates and prevent bad data from porting into Salesforce.
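The article doesn’t detail BACA’s deduplication pipeline, but at its core any dedup check compares a new record against retrieved candidates by similarity. The sketch below is a hypothetical stand-in that uses simple token overlap instead of learned embeddings or an LLM; the part names and threshold are invented for illustration:

```python
def jaccard(a: str, b: str) -> float:
    """Similarity between two part descriptions via token overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def find_duplicates(new_part: str, catalog: list[str], threshold: float = 0.6) -> list[str]:
    """Flag catalog entries similar enough to be likely duplicates of the new part."""
    return [p for p in catalog if jaccard(new_part, p) >= threshold]

# Hypothetical catalog entries, not from BACA's actual data
catalog = [
    "M6 hex bolt stainless 30mm",
    "M6 hex bolt 30mm stainless steel",
    "M8 washer zinc",
]
dupes = find_duplicates("M6 stainless hex bolt 30mm", catalog)
```

In a RAG setup like the one Russo describes, the retrieval step would surface candidate parts and an LLM prompt would make the final duplicate/not-duplicate call; the similarity gate above is just the cheapest version of that idea.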
There’s been little to no pushback from employees, he reports: The company started small, initially giving a select group of users access. Others then quickly began inquiring. “They actually started to beg [us] to give them access,” Russo noted. “No one’s scared of it; they like using it because it helps make their job better.” The company is keeping that deliberate, incremental approach as it further incorporates AI so it can remain agile. “Our goals are not changing, it’s just how we get there and the road we’re taking,” said Russo. “It’s a different road, it’s a better road — it’s the highway.”

AI serving up savings for ezCater

Corporate catering is more complicated than it might sound. There can be shifts in headcounts, food preferences and dietary restrictions, as well as other logistical challenges. This sometimes has organizers at ezCater scrambling. “Concierge agents have really struggled to keep up with the pace,” Erin DeCesare, CTO of the workplace catering platform, told VentureBeat.

But once the company implements Salesforce’s Agentforce, a customer needing to modify an order will be able to communicate their needs to the AI in natural language, and the AI agent will automatically make adjustments. When more complex issues come up — such as a reconfiguration of an order or an all-out venue change — the AI agent will quickly push the matter up to a human rep. “This is a huge cost savings for us,” said DeCesare.

Another intended use case is “restaurant discovery” — that is, AI agents will be able to guide users to the best venue based on inputs about their food preferences, budget, location and other factors. This will be supported by data from millions of workplace food orders. “This is what NLP and AI is perfect for,” said DeCesare. ezCater is initially incorporating AI agents in-house to assist concierge agents, and the human agents love it, she reports.
“We’re giving them tools to be better, and be able to handle more calls.” There’s been a shift in the comfort level of engineers, too, as they are able to conceive of agents more structurally. “They can test and trust in a way that feels like software development,” said DeCesare. “It’s more like what they would expect in the software development lifecycle.”


Colossal raises $200M to “de-extinct” the woolly mammoth, thylacine and dodo

Colossal Biosciences has raised $200 million in a new round of funding to bring back extinct species like the woolly mammoth. Dallas- and Boston-based Colossal is making strides in the scientific breakthroughs toward “de-extinction,” or bringing back extinct species like the woolly mammoth, thylacine and the dodo.

I would be remiss if I did not mention this is the plot of Michael Crichton’s novel Jurassic Park, where scientists used the DNA found in mosquitoes preserved in amber to bring back the Tyrannosaurus rex and other dinosaurs. I mean, what could go wrong when science fiction becomes reality? Kidding aside, this is pretty amazing work, and I’m not surprised to see game dev Richard Garriott among the investors.

The big investor this time was TWG Global, a diversified holding company with operating businesses and investments in technology/AI, financial services, private lending, and sports and media. The firm is jointly led by Mark Walter and Thomas Tull. Since launching in September 2021, Colossal has raised $435 million in total funding. This latest round of capital places the company at a $10.2 billion valuation. Colossal will leverage this latest infusion of capital to continue to advance its genetic engineering technologies while pioneering new revolutionary software, wetware and hardware solutions, which have applications beyond de-extinction, including species preservation and human healthcare.

“Our recent successes in creating the technologies necessary for our end-to-end de-extinction toolkit have been met with enthusiasm by the investor community. TWG Global and our other partners have been bullish in their desire to help us scale as quickly and efficiently as possible,” said Colossal CEO Ben Lamm in a statement.
“This funding will grow our team, support new technology development, expand our de-extinction species list, while continuing to allow us to carry forth our mission to make extinction a thing of the past.”

Colossal employs over 170 scientists and partners with labs in Boston, Dallas and Melbourne, Australia. In addition, Colossal sponsors over 40 full-time postdoctoral scholars and research programs in 16 partner labs at some of the most prestigious universities around the globe. Colossal’s scientific advisory board has grown to include over 95 of the top scientists working in genomics, ancient DNA, ecology, conservation, developmental biology and paleontology. Together, these teams are tackling some of the hardest problems in biology, including mapping genotypes to traits and behaviors; understanding developmental pathways to phenotypes like craniofacial shape, tusk formation and coat color patterning; and developing new tools for multiplex and large-insert genome engineering.

“Colossal is the leading company working at the intersection of AI, computational biology and genetic engineering for both de-extinction and species preservation,” said Mark Walter, CEO of TWG Global, in a statement. “Colossal has assembled a world-class team that has already driven, in a short period of time, significant technology innovations and impact in advancing conservation, which is a core value of TWG Global. We are thrilled to support Colossal as it accelerates and scales its mission to combat the animal extinction crisis.”

“Colossal is a revolutionary genetics company making science fiction into science fact. We are creating the technology to build de-extinction science and scale conservation biology particularly for endangered and at-risk species.
I could not be more appreciative of the investor support for this important mission,” said George Church, Colossal cofounder and a professor of genetics at Harvard Medical School and professor of Health Sciences and Technology at Harvard and the Massachusetts Institute of Technology (MIT).

In October 2024, the Colossal Foundation was launched, a sister 501(c)(3) focused on overseeing the deployment and application of Colossal-developed science and technology innovations. The organization currently supports 48 conservation partners and their initiatives around the world. This includes partners like Re:wild, Save The Elephants, Biorescue, Birdlife International, Conservation Nation, Sezarc, Mauritian Wildlife Foundation, Aussie Ark, International Elephant Foundation and Saving Animals From Extinction. Currently, the Colossal Foundation is focused on supporting conservation partners who are working on new innovative technologies that can be applied to conservation, and those who benefit from the development and deployment of new genetic rescue and de-extinction technologies to help combat the biodiversity extinction crisis.

Tracking Progress on Colossal’s De-Extinction Projects

(Image caption: Ben Lamm is CEO of Colossal Biosciences.)

The first step in every de-extinction project is to recover and analyze preserved genetic material and use that data to identify each species’ core genomic components. In addition to recruiting Beth Shapiro, a global leader in ancient DNA research, as Colossal’s chief science officer, Colossal has built a team of Ph.D. experts in ancient DNA among its scientific advisors, including Love Dalen, Andrew Pask, Tom Gilbert, Michael Hofreiter, Hendrik Poinar, Erez Lieberman Aiden and Matthew Wooler. With this team, Colossal continues to push advances in ancient DNA through support to academic labs and internal scientific research. All three core species – mammoth, thylacine and dodo – have already benefited from this coalescence of expertise.
As an example, Colossal now has the most contiguous and complete ancient genomes to date for each of these three species; these genomes are the blueprints from which these species’ core traits will be engineered. The path from ancient genome to living species requires a systems-model approach to innovation across computational biology, cellular engineering, genetic engineering, embryology and animal husbandry, with refinement and tuning in each step along the de-extinction pipeline occurring simultaneously. To date, Colossal’s scientists have achieved monumental breakthroughs at each step for each of the three flagship species. In the last three years, Colossal’s first major project to be announced, the woolly mammoth project, generated new genomic resources, made breakthroughs in cell biology and genome engineering, and explored the ecological impact of de-extinction, with implications for mammoths, elephants and species across the vertebrate tree of life.

Woolly Mammoth De-extinction Project Progress

The mammoth team has generated chromosome-scale reference genomes for the African elephant, Asian elephant and rock hyrax, all of which have been released on the National Center for


Borderless AI emerges from stealth with $32M in funding to disrupt HR tech

A new artificial intelligence startup is betting that HR departments will become the next major battleground for enterprise AI adoption, launching a specialized search engine that aims to transform how companies manage their workforce. Borderless AI, which emerged from stealth last year, announced today the release of HRGPT, a free AI-powered search engine that allows companies to query their internal HR data alongside employment laws and regulations. The company also disclosed a $5 million strategic investment from AI company Cohere, bringing its total seed funding to $32 million.

“Every HR department is going to have AI agents that manage various aspects across the HR stack,” said Willson Cross, cofounder and CEO of Borderless AI, in an exclusive interview with VentureBeat. “We’re proud to be at the forefront of that vertical.”

How Borderless AI’s HRGPT is transforming workforce management

The Toronto-based startup is positioning itself to compete with established HR software providers like Workday and ADP by focusing exclusively on AI-powered solutions. Its platform already counts several multinational companies as customers, including Dunlop Sporting Goods, which uses the technology to manage employee onboarding across 17 global offices. Unlike general-purpose AI chatbots, HRGPT combines real-time web search with access to internal company data and specialized HR knowledge. The system can perform tasks ranging from generating employment agreements to tracking time-off requests and managing international expense reimbursements.

“Unlike ChatGPT, we have real-time web search. When a customer asks HRGPT a question, it scans the web for real-time sourcing and citations,” Cross told VentureBeat. The platform also integrates with PricewaterhouseCoopers for employment law expertise.
(Image caption: Borderless AI’s platform displays employee time-off requests and compliance data in a conversational interface designed for HR professionals. Credit: Borderless AI)

The investment from Cohere signals growing interest in vertical-specific AI applications for the enterprise. While consumer AI tools like ChatGPT have captured public attention, Cross believes the next wave of AI adoption will come from businesses. “For the next two to three years, it’s going to be the businesses that are catching up and waking up to bringing AI to their organizations,” he said. “HR is one that has many applicable use cases.” Borderless AI’s approach reflects a broader trend of AI companies focusing on specific industries rather than trying to build general-purpose tools. Similar vertical-focused companies include Harvey AI in legal tech and Sierra in customer service.

Building a billion-dollar HR tech company with AI at its core

The company’s ambitious vision includes automating complex HR processes like payroll management and employee analytics. Cross indicated the company aims to build a billion-dollar business with fewer than 50 employees by leveraging AI extensively in its own operations. However, Borderless AI faces significant challenges, including prioritizing which features to build next amid strong customer demand. The company must also maintain accuracy and compliance in its automated HR functions, particularly for sensitive tasks like employment agreements and international payments. The startup’s success could signal whether specialized AI tools will successfully compete against established enterprise software providers who are racing to add AI capabilities to their existing products. For now, early customers appear convinced: Borderless AI reports that its AI agents perform tasks hourly across its customer base.


MiniMax unveils its own open source LLM with industry-leading 4M token context

MiniMax is perhaps best known in the U.S. today as the Singaporean company behind Hailuo, a realistic, high-resolution generative AI video model that competes with Runway, OpenAI’s Sora and Luma AI’s Dream Machine. But the company has far more tricks up its sleeve: Today, for instance, it announced the release and open-sourcing of the MiniMax-01 series, a new family of models built to handle ultra-long contexts and enhance AI agent development. The series includes MiniMax-Text-01, a foundation large language model (LLM), and MiniMax-VL-01, a visual multimodal model.

A massive context window

MiniMax-Text-01 is of particular note for enabling up to 4 million tokens in its context window — equivalent to a small library’s worth of books. The context window is how much information the LLM can handle in one input/output exchange, with words and concepts represented as numerical “tokens,” the LLM’s own internal mathematical abstraction of the data it was trained on. And while Google previously led the pack with its Gemini 1.5 Pro model and 2-million-token context window, MiniMax has remarkably doubled that.

As MiniMax posted on its official X account today: “MiniMax-01 efficiently processes up to 4M tokens — 20 to 32 times the capacity of other leading models. We believe MiniMax-01 is poised to support the anticipated surge in agent-related applications in the coming year, as agents increasingly require extended context handling capabilities and sustained memory.”

The models are available now for download on Hugging Face and GitHub under a custom MiniMax license, for users to try directly on Hailuo AI Chat (a ChatGPT/Gemini/Claude competitor), and through MiniMax’s application programming interface (API), where third-party developers can link their own unique apps to them.
MiniMax is offering APIs for text and multimodal processing at competitive rates:

$0.20 per 1 million input tokens
$1.10 per 1 million output tokens

For comparison, OpenAI’s GPT-4o costs $2.50 per 1 million input tokens through its API, a staggering 12.5X more expensive. MiniMax has also integrated a mixture-of-experts (MoE) framework with 32 experts to optimize scalability. This design balances computational and memory efficiency while maintaining competitive performance on key benchmarks.

Striking new ground with Lightning Attention Architecture

At the heart of MiniMax-01 is a Lightning Attention mechanism, an innovative alternative to transformer architecture. This design significantly reduces computational complexity. The models consist of 456 billion parameters, with 45.9 billion activated per inference. Unlike earlier architectures, Lightning Attention employs a mix of linear and traditional SoftMax layers, achieving near-linear complexity for long inputs. SoftMax, for those new to the concept, is a transformation that turns input numbers into probabilities adding up to 1, so that the LLM can approximate which meaning of the input is likeliest.

MiniMax has rebuilt its training and inference frameworks to support the Lightning Attention architecture. Key improvements include:

MoE all-to-all communication optimization: Reduces inter-GPU communication overhead.
Varlen ring attention: Minimizes computational waste for long-sequence processing.
Efficient kernel implementations: Tailored CUDA kernels improve Lightning Attention performance.

These advancements make MiniMax-01 models accessible for real-world applications, while maintaining affordability.

Performance and benchmarks

On mainstream text and multimodal benchmarks, MiniMax-01 rivals top-tier models like GPT-4 and Claude-3.5, with especially strong results on long-context evaluations.
Notably, MiniMax-Text-01 achieved 100% accuracy on the Needle-In-A-Haystack task with a 4-million-token context. The models also demonstrate minimal performance degradation as input length increases.

MiniMax plans regular updates to expand the models' capabilities, including code and multimodal enhancements. The company views open-sourcing as a step toward building foundational AI capabilities for the evolving AI agent landscape. With 2025 predicted to be a transformative year for AI agents, the need for sustained memory and efficient inter-agent communication is increasing. MiniMax's innovations are designed to meet these challenges.

Open to collaboration

MiniMax invites developers and researchers to explore the capabilities of MiniMax-01. Beyond open-sourcing, its team welcomes technical suggestions and collaboration inquiries at [email protected]. With its commitment to cost-effective and scalable AI, MiniMax positions itself as a key player in shaping the AI agent era. The MiniMax-01 series offers an exciting opportunity for developers to push the boundaries of what long-context AI can achieve.
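For readers unfamiliar with Needle-In-A-Haystack evaluations, here is a toy sketch of how such a test is typically constructed: bury one fact (the needle) at a chosen depth in filler text, then check whether the model retrieves it. The filler sentence, needle and scoring rule are my own illustrative choices, and the actual model call is omitted:

```python
# Toy needle-in-a-haystack construction; the LLM call itself is stubbed.
NEEDLE = "The secret passcode is 7421."

def build_haystack(n_filler: int, needle_depth: float) -> str:
    """needle_depth in [0, 1]: 0 buries the needle at the start, 1 at the end."""
    sentences = ["The grass is green."] * n_filler
    sentences.insert(int(needle_depth * n_filler), NEEDLE)
    return " ".join(sentences)

def score(model_answer: str) -> bool:
    # Accuracy on this item: did the model reproduce the buried fact?
    return "7421" in model_answer

prompt = build_haystack(10_000, needle_depth=0.5) + "\nWhat is the secret passcode?"
assert NEEDLE in prompt  # the needle really is buried mid-context
```

A full benchmark sweeps both context length and needle depth; 100% accuracy at 4M tokens means the model found the needle at every depth tested.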

MiniMax unveils its own open source LLM with industry-leading 4M token context

Do new AI reasoning models require new approaches to prompting?

The era of reasoning AI is well underway. After OpenAI once again kickstarted an AI revolution with its o1 reasoning model introduced back in September 2024 — which takes longer to answer questions but with the payoff of higher performance, especially on complex, multi-step problems in math and science — the commercial AI field has been flooded with copycats and competitors.

There's DeepSeek's R1, Google's Gemini 2.0 Flash Thinking and, just today, LlamaV-o1, all of which seek to offer similar built-in "reasoning" to OpenAI's new o1 and upcoming o3 model families. These models engage in "chain-of-thought" (CoT) prompting — or "self-prompting" — forcing them to reflect on their analysis midstream, double back, check over their own work and ultimately arrive at a better answer than just shooting it out of their embeddings as fast as possible, as other large language models (LLMs) do.

Yet the high cost of o1 and o1-mini ($15.00 per 1 million input tokens vs. $1.25 per 1 million input tokens for GPT-4o on OpenAI's API) has caused some to balk at the supposed performance gains. Is it really worth paying 12X as much as the typical state-of-the-art LLM? As it turns out, there is a growing number of converts — but the key to unlocking reasoning models' true value may lie in users prompting them differently.

Shawn Wang (founder of the AI news service Smol) featured on his Substack over the weekend a guest post from Ben Hylak, a former Apple interface designer for visionOS (which powers the Vision Pro spatial computing headset) and co-founder of Dawn, an analytics and diagnostics platform for AI products. The post has gone viral, as it convincingly explains how Hylak prompts OpenAI's o1 model to receive incredibly valuable outputs (for him).
In short, instead of writing prompts for the o1 model, users should think about writing "briefs": detailed explanations that include lots of context up front about what the user wants the model to output, who the user is and what format they want the model to output information in. As Hylak writes on Substack:

"With most models, we've been trained to tell the model how we want it to answer us. e.g. 'You are an expert software engineer. Think slowly and carefully.' This is the opposite of how I've found success with o1. I don't instruct it on the how — only the what. Then let o1 take over and plan and resolve its own steps. This is what the autonomous reasoning is for, and can actually be much faster than if you were to manually review and chat as the 'human in the loop.'"

Hylak also includes a great annotated screenshot of an example prompt for o1 that produced a useful result: a list of hikes.

This blog post was so helpful that OpenAI's own president and co-founder Greg Brockman re-shared it on his X account with the message: "o1 is a different kind of model. Great performance requires using it in a new way relative to standard chat models."

I tried it myself on my recurring quest to learn to speak fluent Spanish, and here was the result, for those curious. Perhaps not as impressive as Hylak's well-constructed prompt and response, but it definitely shows strong potential.

Separately, even when it comes to non-reasoning LLMs such as Claude 3.5 Sonnet, there may be room for regular users to improve their prompting to get better, less constrained results. As Louis Arge, former Teton.ai engineer and current creator of the neuromodulation device openFUS, wrote on X, "one trick i've discovered is that LLMs trust their own prompts more than my prompts," and provided an example of how he convinced Claude to be "less of a coward" by first "trigger[ing] a fight" with him over its outputs.
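Hylak's "brief" style can be sketched as a simple prompt builder. The field names and layout here are my own framing of his advice, not an official template:

```python
# A sketch of "brief"-style prompting for reasoning models: front-load
# the goal, the user's context, and the desired output format, and
# deliberately omit step-by-step instructions on *how* to reason.

def build_brief(goal: str, background: str, output_format: str) -> str:
    return (
        f"Goal:\n{goal}\n\n"
        f"Background (who I am, what I already know):\n{background}\n\n"
        f"Deliverable format:\n{output_format}\n"
        # Note what is absent: no "think step by step", no persona --
        # the reasoning model plans and resolves its own steps.
    )

brief = build_brief(
    goal="Recommend 3 day hikes within 2 hours of San Francisco.",
    background="Intermediate hiker; wants water views; hiking in March.",
    output_format="A ranked list with distance, elevation gain, and why it fits.",
)
print(brief)
```

The contrast with classic chat prompting is the point: the "how" instructions that help older models are exactly what this template leaves out.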
All of which goes to show that prompt engineering remains a valuable skill as the AI era wears on.


Hallucinations in AI: How GSK is addressing a critical problem in drug development

Pharmaceutical giant GSK is pushing the boundaries of what generative AI can achieve in healthcare areas like scientific literature review, genomic analysis and drug discovery. But it faces a persistent problem with hallucinations, or when AI models generate incorrect or fabricated information. Errors in healthcare are not merely inconvenient; they can have life-altering consequences. Here's how GSK is tackling the problem.

The hallucination problem in generative healthcare

A lot of the focus on reducing hallucinations has been applied during the training of a large language model (LLM), when it is learning from data. But to mitigate hallucinations, GSK instead employs strategies at inference time, when a model is actually being used in a real application. These strategies include self-reflection mechanisms, multi-model sampling and iterative output evaluation. According to Kim Branson, SVP of AI and machine learning (ML) at GSK, these techniques help ensure that agents are "robust and reliable," while enabling scientists to generate actionable insights more quickly. "We're all about increasing the iteration cycles at GSK — how we think faster," he said.

Leveraging test-time compute scaling

Improving a generative AI application's performance at inference time, also referred to as test time, is mostly done by increasing computational resources when a model is trying to figure out the answer to a problem. This includes more complex operations such as iterative output refinement or multi-model aggregation, which are critical for reducing hallucinations and improving model performance.
Branson emphasized the transformative role of scaling test-time compute in GSK's AI efforts, noting that by using strategies like self-reflection and ensemble modeling, GSK can leverage these additional compute cycles to produce results that are not only quicker, but more accurate and reliable. In fact, this is a broader industry trend, not only across healthcare but in other verticals too. "You're seeing this war happening with how much I can serve, my cost per token and time per token," said Branson. "That allows people to bring these different algorithmic strategies which were before not technically feasible, and that also will drive the kind of deployment and adoption of agents."

Strategies for reducing hallucinations

To tackle hallucinations in healthcare gen AI apps, GSK employs two main strategies that require additional computational resources during inference.

Self-reflection and iterative output review

One core technique is self-reflection, where LLMs critique or edit their own responses to improve quality. The model "thinks step by step," analyzing its initial output, pinpointing weaknesses and revising answers as needed. GSK's literature search tool exemplifies this: it collects data from internal repositories and an LLM's memory, then re-evaluates its findings through self-criticism to uncover inconsistencies. This iterative process results in clearer, more detailed final answers. Branson underscored the value of self-criticism, saying: "If you can only afford to do one thing, do that." Refining its own logic before delivering results allows the system to produce insights that align with healthcare's strict standards.

Multi-model sampling

GSK's second strategy relies on multiple LLMs, or different configurations of a single model, to cross-verify outputs.
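The self-reflection loop described above can be sketched in a few lines, with the three LLM passes stubbed out as plain functions; in a real system each stub would be an inference request, which is why the technique consumes extra test-time compute:

```python
# Minimal sketch of inference-time self-reflection: draft an answer,
# critique it, and revise until the critique passes or a budget runs out.
# All three "model" functions are toy stubs, not GSK's actual system.

def draft(question):                      # stub: first LLM pass
    return "The compound inhibits kinase X"   # deliberately missing a citation

def critique(answer):                     # stub: self-review pass
    return "missing citation" if "[ref]" not in answer else "OK"

def revise(answer, feedback):             # stub: revision pass
    return answer + " [ref]"

def self_reflect(question, max_rounds=3):
    answer = draft(question)
    for _ in range(max_rounds):           # each round costs extra compute
        feedback = critique(answer)
        if feedback == "OK":
            break
        answer = revise(answer, feedback)
    return answer

print(self_reflect("What does the compound target?"))
```

The `max_rounds` budget is the knob Branson alludes to: more rounds buy reliability at the price of latency and cost.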
In practice, the system might run the same query at various temperature settings to generate diverse answers, employ fine-tuned versions of the same model specializing in particular domains, or call on entirely separate models trained on distinct datasets. Comparing and contrasting these outputs helps confirm the most consistent or convergent conclusions. "You can get that effect of having different orthogonal ways to come to the same conclusion," said Branson. Although this approach requires more computational power, it reduces hallucinations and boosts confidence in the final answer — an essential benefit in high-stakes healthcare environments.

The inference wars

GSK's strategies depend on infrastructure that can handle significantly heavier computational loads. In what Branson calls the "inference wars," AI infrastructure companies — such as Cerebras, Groq and SambaNova — compete to deliver hardware breakthroughs that enhance token throughput, lower latency and reduce costs per token. Specialized chips and architectures enable complex inferencing routines, including multi-model sampling and iterative self-reflection, at scale. Cerebras' technology, for example, processes thousands of tokens per second, allowing advanced techniques to work in real-world scenarios. "You're seeing the results of these innovations directly impacting how we can deploy generative models effectively in healthcare," Branson noted.

This week, in a partnership with Mayo Clinic and Microsoft, Cerebras announced a genomic foundation model that predicts the best medical treatments for people with rheumatoid arthritis using the efficiencies found in its custom silicon. When hardware keeps pace with software demands, solutions emerge to maintain accuracy and efficiency.

Challenges remain

Even with these advancements, scaling compute resources presents obstacles. Longer inference times can slow workflows, especially if clinicians or researchers need prompt results.
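Multi-model sampling ultimately needs an aggregation step to pick the "most convergent" conclusion. A common, simple choice (assumed here for illustration; the article does not describe GSK's actual aggregation logic) is majority voting over the candidate answers:

```python
# Sketch of aggregating multi-model / multi-temperature samples by
# majority vote; real systems would call different models or settings.
from collections import Counter

def majority_vote(candidates):
    """Return the most common answer and its agreement ratio."""
    answer, count = Counter(candidates).most_common(1)[0]
    return answer, count / len(candidates)

samples = [
    "gene ABC is upregulated",   # e.g. model A at temperature 0.2
    "gene ABC is upregulated",   # e.g. model A at temperature 0.8
    "gene XYZ is upregulated",   # e.g. a separately trained model B
    "gene ABC is upregulated",   # e.g. a domain fine-tune
]
answer, agreement = majority_vote(samples)
print(answer, f"agreement={agreement:.0%}")
```

A low agreement ratio is itself a useful signal: it can flag an answer for human review rather than letting a possibly hallucinated conclusion through.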
This is where the advanced silicon comes in. Higher compute usage also drives up costs, requiring careful resource management. Nonetheless, GSK considers these trade-offs necessary for stronger reliability and richer functionality. "As we enable more tools in the agent ecosystem, the system becomes more useful for people, and you end up with increased compute usage," Branson noted. Balancing performance, costs and system capabilities allows GSK to maintain a practical yet forward-looking strategy.

What's next?

GSK plans to keep refining its AI-driven healthcare solutions with test-time compute scaling as a top priority. The combination of self-reflection, multi-model sampling and robust infrastructure helps ensure that generative models meet the rigorous demands of clinical environments. This approach also serves as a road map for other organizations, illustrating how to reconcile accuracy, efficiency and scalability. Maintaining a leading edge in compute innovations and sophisticated inference techniques not only addresses current challenges, but also lays the groundwork for breakthroughs in drug discovery, patient care and beyond.

This is part of our Healthcare and Gen AI feature series.


McAfee launches scam detector to stop scams before they strike

Scams are everywhere. McAfee's new scam detector spots and stops scams across text, email and video to keep you from being fooled.

McAfee announced at CES 2025 the launch of McAfee Scam Detector, which it bills as the most comprehensive protection against text, email and video scams. Today's scams are smarter, sneakier and more convincing than ever — and they're everywhere. One in three Americans admit to falling for a text, email or video scam in the last 12 months. From fake emails and suspicious texts to deepfake videos that look incredibly real, scammers are using clever tricks to steal people's money and personal information. McAfee is helping consumers take back control with its AI-powered Scam Detector, which uses automatic detection to try to stop scams, or potential scams, before they strike.

Scammers have a couple of things working in their favor, said Steve Grobman, CTO of McAfee, in an interview with GamesBeat. One is that there is a large number of private communication channels, like encrypted messaging or direct messaging, where there isn't necessarily a moderator; scammers can communicate with the victim under the guise of security. The other piece is real time, and it's one of the reasons McAfee has made its deepfake detector product work on any video stream on the web, Grobman said.

Beginning this spring, McAfee's Scam Detector will be included at no extra cost for McAfee customers. The product uses the latest in advanced AI technology to proactively analyze and flag risky messages in real time. That "Hi, how are you?" text from a stranger? It's one of the top text scams of 2024. I got a scam email about using an automated document signing service, with an attachment labeled "termination NDA."
I almost fell for that one, given the urgency it conveyed. (I wasn't fired.) An urgent email about a failed delivery? Probably fake. And no, Elon Musk doesn't have a unique investment opportunity for you. McAfee makes it easy to tell real from fake in seconds and gives you the winning combination of tips and technology to keep scams out of your life for good.

"I think the thing that is most clever is the personalization," said Grobman. "One of the things that we have seen are job scams, where people have made their search public, and those scams take the form of taking your job interview to the next level, but it requires a background check and we require the applicant to pay for the background check." He said the scanning for scams works in real time, making it possible to catch the scammers at the right moments.

Background

A year ago, Grobman talked to me about focusing on scam defense in three areas: scams that come through email, scams that come through text, and scams that users are exposed to more generally in other media forms like video. To address these, the company developed advanced AI models to protect consumers in all three areas and started to deploy them in late 2024. One of those was a deepfake detector to help consumers identify whether videos are AI-generated or authentic. The company also ran technology previews for the other modalities, namely email and text.

"And now that we're moving into 2025, we're very excited to move all of these technologies into a much larger scale by making them available to the vast majority of our customers," Grobman said. "We're evolving our AI models to take advantage of advanced AI PCs when an advanced AI PC is there.
So we can run our AI models on the [PC's own] NPU, but we also have the ability to run on other inference engines, either the CPU or GPU, which is able to give us a broader scope, to give capabilities to a broader set of users."

The email scam defense is moving beyond the technology preview McAfee ran in 2024, where it was available only for Microsoft-based email properties. Now it can support Gmail and other properties. "In offerings in 2025, we've developed advanced technology to detect deepfake images. So we did a partnership with Yahoo News in 2024 around helping to ensure that images that came through the news pipeline, that their editors, their quality assurance personnel, could best detect if anything was generated with AI image generation technology and help provide that insight."

Protecting people in today's Scamiverse

Every day, scammers trick people with fake emails, texts and videos, and the results can be devastating. Americans report receiving 12 scam messages daily, losing as much as $1,000 when they fall for one, and spending 80 hours a year simply trying to figure out whether the onslaught of messages they receive is real or fake, according to a McAfee survey. And deepfake scams, which use AI to create fake video, can be even worse; some people have lost up to half a million dollars, based on a McAfee survey. It's clear that scams have become a drain on people's time, energy and finances.

"I've had more issues with scam texts recently. I'd say within the last year, it's just been bad. Like I've been getting a lot of spam emails, texts, calls…it's a lot," said Tina, 31, in South Carolina. "It was a fake email for UPS. I thought that I was signing up to change my address and instead it charged my credit card," said Alexandria, 46, in Georgia. "While I love new technology, I have been very scared of AI generated videos and information since seeing just how realistic it can be," said Haley, 24, from New Jersey.
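The NPU-first fallback Grobman describes can be sketched as a simple preference list. The backend names and detection here are toy assumptions for illustration, not McAfee's implementation:

```python
# Toy sketch of inference-backend selection on consumer hardware:
# prefer the NPU on AI PCs, fall back to GPU, then CPU (always present).
# Real code would query the hardware or an inference runtime instead.

PREFERRED_ORDER = ["npu", "gpu", "cpu"]

def pick_backend(available: set) -> str:
    for backend in PREFERRED_ORDER:
        if backend in available:
            return backend
    return "cpu"  # safe default

print(pick_backend({"npu", "gpu", "cpu"}))  # AI PC
print(pick_backend({"gpu", "cpu"}))         # no NPU
print(pick_backend({"cpu"}))                # minimal machine
```

The point of the fallback chain is the one Grobman makes: the same models can reach users on older machines, just on a slower engine.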
“Scammers are getting smarter every day, using technology like artificial intelligence to make their tricks more convincing and harder to spot,” said


Up Network and DreamSmart partner on Web3 AI glasses powered by Google Gemini

Up Network, the user-powered AI agent operating system, has announced a partnership with DreamSmart to create Web3 AI glasses powered by Google Gemini. The product integrates state-of-the-art industrial design, AI, extended reality (XR) capabilities and Web3 incentives, with the aim of redefining human-machine interaction and advancing the post-smartphone era.

Developed under DreamSmart's StarV brand, the Web3 AI glasses are aimed at changing how we connect with technology. Powered by Google's Gemini, the glasses let you interact naturally, just by talking, while delivering seamless, intuitive experiences. They simplify complexity, adapt to your needs with context-aware intelligence, and ensure your data remains private and under your control.

The glasses weigh just 44 grams, about as much as a heavier pair of ordinary glasses, and are built for all-day wear, delivering up to eight hours of battery life for uninterrupted usage. With an optical waveguide display, the glasses deliver a seamless XR experience for productivity, entertainment and daily tasks. Powered by Google Gemini and other advanced AI agents, the glasses provide real-time contextual intelligence, and the companies claim they surpass current offerings from major tech players like Google and Samsung.

Web3 Made Simple: AI Glasses Empowering Web3 for Everyone

Web3 technologies are complex, requiring users to interact with decentralized systems, manage wallets and digital assets, and engage with blockchain-based applications. They haven't proven as popular among consumers due to this complexity. Through the integration with Up Network, the Web3 AI glasses elevate the experience by providing hands-free, natural-language interaction with real-time, context-aware assistance, bridging the gap between complexity and accessibility.
The companies said AI agent swarms eliminate the steep learning curve and complexities of Web3. By handling tasks collaboratively and intuitively, these agents enable anyone — even crypto newcomers — to interact with blockchain and AI using natural language. Tokenized incentives allow users to earn by interacting with AI agents, contributing insights and engaging in decentralized activities. And users own their data as an asset, maintaining full control and privacy through on-device processing and anonymized storage. The companies said they are creating a privacy-first experience: all interactions are securely processed on-device, ensuring users retain their data sovereignty without compromising usability.

"This partnership with DreamSmart to launch the first Web3 AI Glasses represents a major step forward for Up Network," said Devansh Khatri, cofounder at Up Network, in a statement. "These glasses are not just a device—they're a gateway to the future of computing and decentralized technology, combining AI, XR, and Web3 incentives into one powerful ecosystem."

The Web3 AI Glasses will be available in Q1 2025. Additional details on pricing, market availability and exclusive previews will be announced soon. DreamSmart is based in China. It was founded in March 2023 and has more than 4,000 people. Up Network is based in Singapore and has 15 people. It was founded in the summer of 2024 and expects to announce a funding round soon.
