VentureBeat

Lightning AI’s AI Hub shows AI app marketplaces are the next enterprise game-changer

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

The last mile problem in generative AI refers to the difficulty enterprises have in getting applications into production. For many companies, the answer lies in marketplaces that enterprises and developers can browse for applications, much as consumers browse Apple's App Store and download new programs onto their phones. Providers such as AWS Bedrock and Hugging Face have begun building marketplaces, offering ready-built applications from partners that customers can integrate into their stack.

The latest entrant into the AI marketplace space is Lightning AI, the company that maintains the open-source Python library PyTorch Lightning. Today it is launching AI Hub, a marketplace for both AI models and applications. What sets it apart from other marketplaces, however, is that Lightning allows developers to actually do deployment — and enjoy enterprise security too. Lightning AI CEO William Falcon told VentureBeat in an exclusive interview that AI Hub allows enterprises to find the application they want without having all the other platforms required to run it.

Falcon noted that previously, enterprises had to find hardware providers that could run and host models. The next step was to find a way to deploy that model and make it into something useful.

"But then you need those models to do something, and that's where the last mile issue is, that's the end thing enterprises use, and most of that is from standalone companies that offer an app," he said. "They bought all these tools, did a bunch of experiments, and then couldn't deploy them or really take them to that last mile."

Falcon added that AI Hub "removes the need for specialized platforms." Enterprises can find any type of AI application they want in one place. This helps organizations stuck in the prototype phase move faster to deployment.

AI Hub as an app store

AI Hub hosts more than 50 APIs at launch, with a mix of foundation models and applications. It hosts many popular models, including DeepSeek-R1.

Enterprises can access AI Hub and find applications built using Lightning's flagship product, Lightning AI Studio, or by other developers. They can then run these on Lightning's cloud or in private enterprise cloud environments. Organizations can link their AWS or Google Cloud instances and keep data within their company's virtual private cloud. Falcon said this offers enterprises control over deployment.

Lightning AI's AI Hub can work with most cloud providers. While it hosts open-source models, Falcon said the apps it hosts are not open-source, meaning users cannot alter their code.

Lightning AI will offer AI Hub free for current customers, with 15 monthly credits to run applications. It will offer different pricing tiers for enterprises that want to connect to their private clouds.

Falcon said AI Hub speeds up the deployment of AI applications within an organization because everything teams need is on the platform.

"Ultimately, as a platform, what we offer enterprises is iteration and speed," he said. "I'll give you an example: We have a big Fortune 100 pharma company customer. Within a few days of when DeepSeek came out, they had it in production, already running."

More AI marketplaces

Lightning AI's AI Hub is not the first AI app marketplace, but its launch indicates how fast the enterprise AI space has moved since the launch of ChatGPT, which powered a generative AI boom in enterprise technology.
API marketplaces already offer enterprises a wealth of SaaS applications, and more companies are beginning to provide App Store-style access to AI-powered applications to make them easier to deploy. AWS, for instance, announced the AWS Bedrock Marketplace for specialized foundation models and Buy with AWS — which features services from AWS partners — during re:Invent in December. Hugging Face, for its part, has made Spaces, an AI app directory that lets developers search for and try out new apps, generally available. Hugging Face CEO Clement Delangue posted on X that Spaces "has quietly become the biggest AI app store, with 400,000 total apps, 2,000 new apps created every day, getting visited 2.5M times every week!" He added that the launch of Spaces shows how "The future of AI will be distributed." Even OpenAI's GPT Store on ChatGPT technically functions as a marketplace for people to try out custom GPTs.

https://twitter.com/_akhaliq/status/1886831521216016825

Falcon noted that most technologies end up being offered in a marketplace, especially to reach many potential customers. In fact, this is not the first time Lightning AI has launched an AI marketplace. Lightning AI Studio, first announced in December 2023, lets enterprises create AI platforms using pre-built templates.

"Every technology ends up here," said Falcon. "Through the evolution of any technology, you're going to end up in something like this. The iPhone's a good example. You went from point solutions to calculators, flashlights and notepads. Something like Slack did the same thing where you had an app to send files or photos before, but now it's all in one. There hasn't really been that for AI because it's still kind of new."

Lightning AI, though, faces tough competition, especially from Hugging Face, which has long been a repository of models and applications and is widely used by developers. Falcon said what makes AI Hub different is that users not only get access to state-of-the-art applications built on powerful models, but can also start their AI deployment on the platform with a focus on enterprise security.

"I can hit deployment here. As an enterprise they can point to their AWS or Google Cloud and the application runs in their private cloud. No data leaks or security issues — it's all within your firewall," he said. source
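Falcon's private-cloud pitch is easiest to picture as an API call that never leaves the company's network. The sketch below is illustrative only: the endpoint URL, key and model name are hypothetical placeholders rather than Lightning AI's documented interface, and it assumes the deployed app exposes an OpenAI-compatible chat API.

```python
# Hypothetical sketch: querying a marketplace app deployed inside a company VPC.
# The endpoint, key and model name are placeholders, not Lightning AI's real API.
from openai import OpenAI

client = OpenAI(
    base_url="https://deepseek-r1.internal.example.com/v1",  # endpoint running in the enterprise's own cloud
    api_key="INTERNAL_SERVICE_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Summarize last quarter's support tickets."}],
)
print(response.choices[0].message.content)
```

Because the endpoint lives behind the company's firewall, prompts and outputs stay inside the virtual private cloud, which is the security property Falcon emphasizes.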

Lightning AI’s AI Hub shows AI app marketplaces are the next enterprise game-changer Read More »

LangChain shows AI agents aren’t human-level yet because they’re overwhelmed by tools

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More With AI agents showing promise, organizations have to grapple with figuring out if a single agent is enough, or if they should invest in building out a wider multi-agent network that touches more points in their organization.  Orchestration framework company LangChain sought to get closer to an answer to this question. It subjected an AI agent to several experiments that found single agents do have a limit of context and tools before their performance begins to degrade. These experiments could lead to a better understanding of the architecture needed to maintain agents and multi-agent systems.  In a blog post, LangChain detailed a set of experiments it performed with a single ReAct agent and benchmarked its performance. The main question LangChain hoped to answer was, “At what point does a single ReAct agent become overloaded with instructions and tools, and subsequently sees performance drop?” LangChain chose to use the ReAct agent framework because it is “one of the most basic agentic architectures.” While benchmarking agentic performance can often lead to misleading results, LangChain chose to limit the test to two easily quantifiable tasks of an agent: answering questions and scheduling meetings.  “There are many existing benchmarks for tool-use and tool-calling, but for the purposes of this experiment, we wanted to evaluate a practical agent that we actually use,” LangChain wrote. “This agent is our internal email assistant, which is responsible for two main domains of work — responding to and scheduling meeting requests and supporting customers with their questions.” Parameters of LangChain’s experiment LangChain mainly used pre-built ReAct agents through its LangGraph platform. These agents featured tool-calling large language models (LLMs) that became part of the benchmark test. These LLMs included Anthropic’s Claude 3.5 Sonnet, Meta’s Llama-3.3-70B and a trio of models from OpenAI, GPT-4o, o1 and o3-mini.  The company broke testing down to better assess the performance of email assistant on the two tasks, creating a list of steps for it to follow. It began with the email assistant’s customer support capabilities, which look at how the agent accepts an email from a client and responds with an answer.  LangChain first evaluated the tool calling trajectory, or the tools an agent taps. If the agent followed the correct order, it passed the test. Next, researchers asked the assistant to respond to an email and used an LLM to judge its performance.  For the second work domain, calendar scheduling, LangChain focused on the agent’s ability to follow instructions.  “In other words, the agent needs to remember specific instructions provided, such as exactly when it should schedule meetings with different parties,” the researchers wrote.  Overloading the agent Once they defined parameters, LangChain set to stress out and overwhelm the email assistant agent.  It set 30 tasks each for calendar scheduling and customer support. These were run three times (for a total of 90 runs). The researchers created a calendar scheduling agent and a customer support agent to better evaluate the tasks.  “The calendar scheduling agent only has access to the calendar scheduling domain, and the customer support agent only has access to the customer support domain,” LangChain explained.  
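For readers who want to picture the setup, here is a minimal sketch of a ReAct-style agent built with LangGraph's prebuilt helper, similar in spirit to what the post describes. The tool bodies, model choice and instructions are assumptions for illustration; LangChain's internal email assistant is not public.

```python
# Minimal ReAct-agent sketch using LangGraph's prebuilt helper. Tool logic,
# model choice and instructions are illustrative assumptions, not LangChain's
# internal email assistant.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email reply."""
    return f"Email sent to {to}"

@tool
def schedule_meeting(attendees: str, start_time: str) -> str:
    """Create a calendar event."""
    return f"Meeting scheduled for {start_time} with {attendees}"

# Each additional work domain adds more tools and more instructions like these,
# which is exactly the variable LangChain stressed in its experiments.
agent = create_react_agent(ChatOpenAI(model="gpt-4o"), tools=[send_email, schedule_meeting])

result = agent.invoke({"messages": [
    ("system", "You are an email assistant. Follow the scheduling rules exactly."),
    ("user", "Book a 30-minute call with Dana next Tuesday at 10am."),
]})
print(result["messages"][-1].content)
```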
The researchers then added more domain tasks and tools to the agents to increase the number of responsibilities. These ranged from human resources to technical quality assurance to legal and compliance, among a host of other areas.

Single-agent instruction degradation

After running the evaluations, LangChain found that single agents would often get overwhelmed when told to do too many things. They began forgetting to call tools or were unable to respond to tasks when given more instructions and context.

LangChain found that calendar scheduling agents using GPT-4o "performed worse than Claude-3.5-sonnet, o1 and o3 across the various context sizes, and performance dropped off more sharply than the other models when larger context was provided." The performance of GPT-4o calendar schedulers fell to 2% once the number of domains reached seven or more.

Other models didn't fare much better. Llama-3.3-70B forgot to call the send_email tool, "so it failed every test case." Claude-3.5-sonnet, o1 and o3-mini all remembered to call the tool, but Claude-3.5-sonnet performed worse than the two OpenAI models, and o3-mini's performance degraded once irrelevant domains were added to the scheduling instructions.

The customer support agent can call on more tools, but for this test, LangChain said Claude-3.5-sonnet performed just as well as o3-mini and o1, and showed a shallower performance drop when more domains were added. As the context window grew, however, the Claude model's performance worsened. Here, too, GPT-4o performed the worst among the models tested.

"We saw that as more context was provided, instruction following became worse. Some of our tasks were designed to follow niche-specific instructions (e.g., do not perform a certain action for EU-based customers)," LangChain noted. "We found that these instructions would be successfully followed by agents with fewer domains, but as the number of domains increased, these instructions were more often forgotten, and the tasks subsequently failed."

The company said it is exploring how to evaluate multi-agent architectures using the same domain overloading method.

LangChain is already invested in the performance of agents, as it introduced the concept of "ambient agents," or agents that run in the background and are triggered by specific events. These experiments could make it easier to figure out how best to ensure agentic performance. source

LangChain shows AI agents aren’t human-level yet because they’re overwhelmed by tools Read More »

Cognida.ai raises $15M to fix enterprise AI’s biggest bottleneck: deployment

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Cognida.ai, a Chicago-based AI company, has raised $15 million in Series A funding to help enterprises move beyond AI pilots to production-grade solutions that deliver measurable business impact. The funding round was led by Nexus Venture Partners. The investment comes at a critical time when enterprises are struggling to transform AI experiments into operational solutions. While 87% of enterprises are investing in AI, only 20% successfully deploy solutions into production, according to Cognida. “Enterprise AI adoption has reached its tipping point,” Feroze Mohammed, founder and CEO of Cognida, said in an exclusive interview with VentureBeat. “The biggest challenges enterprises face isn’t just building AI models — it’s getting them to work in production.” How Zunō platform cuts AI implementation time from 8 months to 12 weeks Mohammed, who previously led Hitachi Vantara as COO, identified three major barriers to enterprise AI adoption: data readiness, integration challenges with existing business processes and lack of AI expertise within organizations. To address these challenges, Cognida has developed Zunō, a platform that includes accelerators for predictive modeling, intelligent document processing and advanced graph-based solutions. The company claims its approach reduces typical AI implementation times from 6 to 8 months to 10 to 12 weeks. “Most conventional approaches require long lead times of doing consulting projects, doing a lot of change management with long timelines and long upfront investments,” Anup Gupta, managing director at Nexus Venture Partners, said in an interview with VentureBeat. “Cognida is one of the first times we have come across a business that can talk about demonstrable use cases across various industries.” The company has already deployed solutions at more than 30 enterprises. In one case, Cognida helped a major garage door manufacturer transform its catalog generation process from a six-month cycle to just weeks using generative AI. The solution allows the manufacturer to create virtual door designs and render them in different settings, enabling rapid testing with dealers. Other successful implementations include a 70% improvement in invoice processing speed and a one percent reduction in customer churn for SaaS clients — metrics that translate to significant revenue impact for large enterprises. The future of enterprise software: Every stack is being rewritten with AI The funding will support three primary initiatives: market expansion, intellectual property development and capability building. Mohammed envisions Cognida becoming “the practical AI company for the enterprise” within five years. “Every software stack is being rewritten leveraging AI,” said Gupta. “In the next few years, every workflow in all enterprises will have a lot more AI than is being used today.” This investment reflects a broader trend in enterprise AI, where focus is shifting from experimental projects to practical implementations that deliver clear return on investment. As businesses seek to operationalize AI while maintaining existing systems, Cognida’s approach of building solutions that integrate with current workflows appears particularly timely. The company plans to expand its AI solution library, advance its Zunō platform and grow its implementation teams to meet increasing enterprise demand. 
With offices in Chicago, Silicon Valley and Hyderabad, India, Cognida serves clients across manufacturing, healthcare, finance and technology sectors. Industry analysts suggest this funding round could signal a new phase in enterprise AI adoption, where practical implementation and measurable outcomes take precedence over experimental pilots. As organizations continue to grapple with AI integration challenges, solutions that can demonstrate concrete business impact while working within existing systems may find increasing traction in the market. source

Cognida.ai raises $15M to fix enterprise AI’s biggest bottleneck: deployment Read More »

Inside Monday’s AI pivot: Building digital workforces through modular AI

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More The Monday.com work platform has been steadily growing over the past decade, in a quest to achieve its goal of helping empower teams at organizations small and large to be more efficient and productive. According to co-founder Roy Mann, AI has been a part of the company for much of its history. The initial use cases supported its own performance marketing. (Who among us has not seen a Monday advertisement somewhere over the last 10 years?) A large part of that effort has benefited from AI and machine learning (ML). With the advent and popularity of generative AI in the last three years, particularly since the debut of ChatGPT, Monday — much like every other enterprise on the planet — began to consider and integrate the technology. The initial deployment of gen AI at Monday didn’t quite generate the return on investment users wanted, however. That realization led to a bit of a rethink and pivot as the company looked to give its users AI-powered tools that actually help to improve enterprise workflows. That pivot has now manifested itself with the company’s “AI blocks” technology and the preview of its agentic AI technology that it calls “digital workforce.” Monday’s AI journey, for the most part, is all about realizing the company’s founding vision. “We wanted to do two things, one is give people the power we had as developers,” Mann told VentureBeat in an exclusive interview. “So they can build whatever they want, and they feel the power that we feel, and the other end is to build something they really love.” Any type of vendor, particularly an enterprise software vendor, is always trying to improve and help its users. Monday’s AI adoption fits securely into that pattern. The company’s public AI strategy has evolved through several distinct phases: AI assistant: Initial platform-wide integration; AI blocks: Modular AI capabilities for workflow customization; Digital workforce: Agentic AI. Much like many other vendors, the first public foray into gen AI involved an assistant technology. The basic idea with any AI assistant is that it provides a natural language interface for queries. Mann explained that the Monday AI assistant was initially part of the company’s formula builder, giving non-technical users the confidence and ability to build things they couldn’t before. While the service is useful, there is still much more that organizations need and want to do. Or Fridman, AI product group lead at Monday, explained that the main lesson learned from deploying the AI assistant is that customers want AI to be integrated into their workflows. That’s what led the company to develop AI blocks. Building the foundation for enterprise workflows with AI blocks Monday realized the limitations of the AI assistant approach and what users really wanted.  Simply put, AI functionality needs to be in the right context for users — directly in a column, component or service automation.  AI blocks are pre-built AI functions that Monday has made accessible and integrated directly into its workflow and automation tools. For example, in project management, the AI can provide risk mapping and predictability analysis, helping users better manage their projects. This allows them to focus on higher-level tasks and decision-making, while the AI handles the more repetitive or data-intensive work. 
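Conceptually, an AI block behaves like a small, reusable function that wraps a model call so a workflow automation can use its output. The sketch below is an illustrative assumption of that pattern, not Monday's actual implementation or API; in practice the platform would route the call to whichever provider it uses behind the scenes.

```python
# Illustrative sketch of the "AI block" pattern: a small reusable function that
# wraps a model call so a workflow automation can act on it. Not Monday's real API.
from openai import OpenAI

client = OpenAI()  # the platform would route to Azure OpenAI, Bedrock, etc.

def sentiment_block(update_text: str) -> str:
    """Classify a board update as positive, neutral or negative."""
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Reply with exactly one word: positive, neutral or negative."},
            {"role": "user", "content": update_text},
        ],
    )
    return result.choices[0].message.content.strip().lower()

# A workflow rule might then act on the block's output, for example escalating bad news.
if sentiment_block("The vendor missed the delivery window again.") == "negative":
    print("Escalate to the account manager")
```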
This approach has particular significance for the platform’s user base, 70% of which consists of non-technical companies. The modular nature allows businesses to implement AI capabilities without requiring deep technical expertise or major workflow disruptions. Monday is taking a model agnostic approach to integrating AI An early approach taken by many vendors on their AI journeys was to use a single vendor large language model (LLM). From there, they could build a wrapper around it or fine tune for a specific use case. Mann explained that Monday is taking a very agnostic approach. In his view, models are increasingly becoming a commodity. The company builds products and solutions on top of available models, rather than creating its own proprietary models. Looking a bit deeper, Assaf Elovic, Monday’s AI director, noted that the company uses a variety of AI models. That includes OpenAI models such as GPT-4o via Azure, and others through Amazon Bedrock, ensuring flexibility and strong performance. Elovic noted that the company’s usage follows the same data residency standards as all Monday features. That includes multi-region support and encryption, to ensure the privacy and security of customer data. Agentic AI and the path to the digital workforce The latest step in Monday’s AI journey is in the same direction as the rest of the industry — the adoption of agentic AI. The promise of agentic AI is more autonomous operations that can enable an entire workflow. Some organizations build agentic AI on top of frameworks such as LangChain or Crew AI. But that’s not the specific direction that Monday is taking with its digital workforce platform. Elovic explained that Monday’s agentic flow is deeply connected to its own AI blocks infrastructure. The same tools that power its agents are built on AI blocks like sentiment analysis, information extraction and summarization.  Mann noted that digital workforce isn’t so much about using a specific agentic AI tool or framework, but about creating better automation and flow across the integrated components on the Monday platform. Digital workforce agents are tightly integrated into the platform and workflows. This allows the agents to have contextual awareness of the user’s data, processes and existing setups within Monday. The first digital workforce agent is set to become available in March. Mann said it will be called the monday “expert” designed to build solutions for specific users. Users describe their problems and needs to the agent, and the AI will provide them relevant workflows, boards and automations to address those challenges. AI specialization and integration provides differentiation in a commoditized market There is no shortage of competition across the markets that Monday serves. As a workflow platform, it crosses

Inside Monday’s AI pivot: Building digital workforces through modular AI Read More »

Agents, shadow AI and AI factories: Making sense of it all in 2025

Presented by Nvidia

The rise of agentic AI

Over a decade ago, "perceptive AI" gave us models that could find patterns or anomalies in data and make predictions. But that intelligence was anchored to answers that were already known. Is this a picture of a dog or a cat? Is that a pedestrian crossing the road in front of me? With "generative AI", we can create new, never-before-seen content, shaped by our prompting. Instead of being served an answer to a previously answered question, each one of us is creating new text, images, voice and video using all the unstructured data we can throw at these modern AI models. "Agentic AI" promises "digital agents" that learn from us, and can perceive, reason problems out in multiple steps and then make autonomous decisions on our behalf. They can solve multilayered questions that require them to interact with many other agents, formulate answers and take actions. Consider forecasting agents in the supply chain predicting customer needs by engaging customer service agents, and then proactively adjusting warehouse stock by engaging inventory agents. Every knowledge worker will find themselves gaining these superhuman capabilities, backed by a team of domain-specific task agents helping them tackle large, complex jobs with less expended effort.

The growing "shadow AI" problem

However, the proliferation of generative, and soon agentic, AI presents a growing problem for IT teams. Maybe you're familiar with "shadow IT," where individual departments or users procure their own resources without IT knowing. In today's world we have "shadow AI," and it's hitting businesses on two fronts. Consumer-oriented AI apps are proliferating at a rapid pace, and many enterprise knowledge workers are using them¹, feeding them potentially sensitive intellectual property and customer data, often engaging with services that are not properly guardrailed². This is creating a huge governance risk for most enterprises. Many developers are also standing up their own IT silos to support their projects. Most of these silos have little if any knowledge of each other's work, and are ramping up operating expenses as they procure computing for short-term projects, which then goes underutilized or wasted, along with data silos that impede the flow of vital information between teams. And maybe worst of all, they're losing the opportunity to learn from each other in terms of sharing expertise and best practices to efficiently deliver AI applications to production.

The AI factory — built on Blackwell-powered Nvidia DGX

Today's enterprises create value through insights and answers driven by intelligence, setting them apart from their competitors. Just as past industrial revolutions transformed industries — think about steam, electricity, internet and later computer software — the age of AI heralds a new era where the production of intelligence is the core engine of every business. The ability to create this digitized intelligence on a large scale is driving the demand for a new type of factory. This "AI factory" is the next evolution of enterprise infrastructure.
Instead of coal, electricity or software (the fuels of factories past), AI factories manufacture AI models to:

- Reduce operational costs
- Analyze vast amounts of data and drive innovation
- Foster scale with agility
- Enhance enterprise productivity

AI factories are now the essential infrastructure on which organizations can have their own AI "center of excellence" — namely a unified platform on which people, process and infrastructure can be consolidated to gain key benefits including:

- Scaling AI talent, with citizen data science expertise groomed from within instead of hired from outside
- Standardization of tools and best practices that create an application development flywheel
- Maximized utilization of accelerated computing infrastructure that is centrally orchestrated

To enable the age of large language models (LLMs), agentic AI and what comes next, we've created the Nvidia DGX platform to be the engine that powers AI factories. Businesses have begun building their platforms with it, to enable leading-edge applications requiring many different expert models to work in concert with imperceptible latency, solving complex, multi-layered problems. GPU-driven Nvidia DGX™ systems with Intel® Xeon® CPUs integrate Nvidia Blackwell accelerators with a next-generation architecture optimized for the era of agentic AI, while providing fifteen times greater inference throughput with twelve times greater energy efficiency³. This platform includes best-of-breed developer and infrastructure management software that streamlines and accelerates the application development lifecycle from development to deployment, while supporting ongoing model fine-tuning.

Real-world impact now, not later

In an Nvidia analysis of AI factory implementers, we found many derived benefits that can counter the impact of shadow AI, improving time to market, productivity and infrastructure utilization — while enabling support for the rising tide of generative and agentic AI. These organizations shared the following benefits⁴, as expressed by Nvidia DGX platform customers:

- 6X increase in infrastructure performance compared with legacy IT infrastructure
- 20% greater productivity for data scientists and AI practitioners
- 90% infrastructure utilization

Typically, these benefits have been confined to hyperscalers that have decades of experience operating high-performance infrastructure, along with a deep bench of expertise in running such platforms. The reality is that even the "experts" admit that their own platforms often can't deliver the efficiencies needed, with many accepting 20-30% as the typical utilization factor⁴ of their infrastructure. Now every business has the opportunity to have a hyperscale-class platform for its own AI factory, one that is easier to acquire, dramatically more efficient, simpler to manage and delivers benefits to the business now, not later.

Learn how to achieve AI-powered insights faster on GPU-driven Nvidia DGX™ systems, powered by Nvidia Tensor Core GPUs and Intel® Xeon® processors.

Tony Paikeday is Senior Director of Product Marketing, Artificial Intelligence Systems at Nvidia.

1. "Why IT leaders should seize on shadow AI to modernize governance," Dell Technologies / VentureBeat, Dec 2023
2. "Generative AI: From Buzz to Business Value," KPMG, June 2023
3. NVIDIA test comparisons from www.nvidia.com/dgx-b200: 32,768 GPU scale; 4,096x eight-way DGX H100 air-cooled cluster: 400G IB network; 4,096x eight-way DGX B200 air-cooled cluster: 400G IB network. Projected performance subject to change.
4. Chowdhery, Aakanksha, et al., "PaLM:

Agents, shadow AI and AI factories: Making sense of it all in 2025 Read More »

Would you stop using OpenAI’s ChatGPT and API if Elon Musk took it over?

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

There's almost never a dull moment on the AI beat, and today was no exception: The Wall Street Journal this afternoon reported that a consortium of private investors led by the world's wealthiest man, the multi-company-owning Elon Musk, had presented a bid of $97.4 billion to OpenAI's non-profit board of directors to acquire the for-profit subsidiary of the company led by former co-founder turned rival, OpenAI CEO Sam Altman.

Putting aside the long and messy history between the two men, which has already resulted in several lawsuits, Musk's stated goal in seeking to acquire yet another company, atop the six he already owns or runs (SpaceX, Tesla, Starlink, Neuralink, X, xAI), is to make OpenAI open source, per its original founding mission statement of delivering AI benefits and artificial general intelligence (AGI) for all.

"It's time for OpenAI to return to the open-source, safety-focused force for good it once was," Musk said in a statement provided by his attorney to the Journal. "We will make sure that happens."

The takeover bid is also personal for Musk, who, after co-founding and bankrolling the company in 2015 alongside Altman and nine others, decided to exit the venture in 2018, only to turn into one of its biggest public critics and business rivals. He founded his own xAI startup in 2023 and is building a massive AI model training supercluster of graphics processing units (GPUs) in Memphis, Tennessee, known as Colossus.

Untangling Musk's motivations

Yet the bid to control OpenAI would seem to be a tacit admission that, despite all his rapid and hefty investment in spinning up a competitor — the rival Grok chatbot baked into social network X, and the underlying Grok-2 large multimodal model and application programming interface (API) for third-party software developers — Musk is not succeeding at winning as many users as he and his collaborators might like. It would also suggest that the Grok-3 model — reportedly in training, and which pseudonymous AI rumor accounts on X have hyped as industry-leading — is perhaps not as advanced or ready as the competition (namely, these days, DeepSeek-R1 and OpenAI's "o" series of models).

Altman himself took to Musk's social network X to dismiss the idea of Musk acquiring OpenAI, writing: "no thank you but we will buy twitter for $9.74 billion if you want," to which Musk responded with another post calling Altman "Scam Altman."

While some journalists have suggested that Musk's bid may have the effect of complicating OpenAI's neat plans to spin off the for-profit arm from the non-profit holding company — for which the price was suggested to be less than $40 billion — the truth is that OpenAI's last funding round valued it at $157 billion, so both prices are ultimately much lower than the valuation at which the current crop of investors bought into the company.

Not out of the question

Yet it's hardly out of the question that Musk could succeed with this takeover bid. After all, his bid to take over Twitter (and ultimately change the name to X) was also deemed a longshot by some in the press — until it happened for real, and arguably changed the course of history by promoting more posts from conservative and freewheeling influencers and paying subscribers over the verified journalists of yore, influencing the 2024 election and myriad other global events and individual perceptions/worldviews.
Which also raises the question, especially in light of his controversial Nazi-like salute on Trump’s second inauguration day and general support for far-right politics globally: If Musk does succeed in taking over OpenAI, would you continue to use its products (ChatGPT, Sora, DALL-E 3, its APIs and various other models and services) or switch to another AI model provider? After all, a number of people and organizations left X for competing short social posting platforms BlueSky and Threads in the wake of Musk’s takeover and general moves to support Trump and far-right politicians. I should also hasten to point out that while Musk has promoted viewpoints and political positions I personally find detestable, his Grok AI model has enabled a much more freewheeling and freeform expression than most other competing AI models, especially with regards to image generation — it’s how I created the likeness of Altman at the top of this post, for example. This is a laudable position in my eyes, and might indicate that a takeover of OpenAI would result in less censored/restricted models, which I support. Vote for yourself below: source

Would you stop using OpenAI’s ChatGPT and API if Elon Musk took it over? Read More »

Google drops AI weapons ban—what it means for the future of artificial intelligence

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

Google has removed its long-standing prohibition against using AI for weapons and surveillance systems, marking a significant shift in the company's ethical stance on AI development that former employees and industry experts say could reshape how Silicon Valley approaches AI safety. The change, quietly implemented this week, eliminates key portions of Google's AI Principles that explicitly banned the company from developing AI for weapons or surveillance. These principles, established in 2018, had served as an industry benchmark for responsible AI development.

"The last bastion is gone," Tracy Pizzo Frey, who spent five years implementing Google's original AI principles as senior director of outbound product management, engagements and responsible AI at Google Cloud, wrote in a BlueSky post. "It's no holds barred. Google really stood alone in this level of clarity about its commitments for what it would build."

The revised principles remove four specific prohibitions: technologies likely to cause overall harm; weapons applications; surveillance systems; and technologies that violate international law and human rights. Instead, Google now says it will "mitigate unintended or harmful outcomes" and align with "widely accepted principles of international law and human rights."

(Credit: BlueSky / Tracy Pizzo Frey)

Google loosens AI ethics: What this means for military and surveillance tech

This shift comes at a particularly sensitive moment, as AI capabilities advance rapidly and debates intensify about appropriate guardrails for the technology. The timing has raised questions about Google's motivations, although the company maintains these changes have been long in development.

"We're in a state where there's not much trust in big tech, and every move that even appears to remove guardrails creates more distrust," Pizzo Frey said in an interview with VentureBeat. She emphasized that clear ethical boundaries had been crucial for building trustworthy AI systems during her tenure at Google.

The original principles emerged in 2018 amid employee protests over Project Maven, a Pentagon contract involving AI for drone footage analysis. While Google eventually declined to renew that contract, the new changes could signal openness to similar military partnerships. The revision maintains some elements of Google's previous ethical framework, but shifts from prohibiting specific applications to emphasizing risk management. This approach aligns more closely with industry standards like the NIST AI Risk Management Framework, although critics argue it provides fewer concrete restrictions on potentially harmful applications.

"Even if the rigor is not the same, ethical considerations are no less important to creating good AI," Pizzo Frey noted, highlighting how ethical considerations improve AI products' effectiveness and accessibility.

From Project Maven to policy shift: The road to Google's AI ethics overhaul

Industry observers say this policy change could influence how other technology companies approach AI ethics. Google's original principles had set a precedent for corporate self-regulation in AI development, with many enterprises looking to Google for guidance on responsible AI implementation. The modification reflects broader tensions in the tech industry between rapid innovation and ethical constraints.
As competition in AI development intensifies, companies face pressure to balance responsible development with market demands. “I worry about how fast things are getting out there into the world, and if more and more guardrails are removed,” said Pizzo Frey, expressing concern about the competitive pressure to release AI products quickly without sufficient evaluation of potential consequences. Big tech’s ethical dilemma: Will Google’s AI policy shift set a new industry standard? The revision also raises questions about internal decision-making processes at Google and how employees might navigate ethical considerations without explicit prohibitions. During her time at Google, Pizzo Frey established review processes that brought together diverse perspectives to evaluate AI applications’ potential impacts. While Google maintains its commitment to responsible AI development, the removal of specific prohibitions marks a significant departure from its previous leadership role in establishing clear ethical boundaries for AI applications. As AI continues to advance, the industry is watching to see how this shift might influence the broader landscape of AI development and regulation. source

Google drops AI weapons ban—what it means for the future of artificial intelligence Read More »

Ai2 releases Tülu 3, a fully open-source model that bests DeepSeek v3, GPT-4o with novel post-training approach

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

The open-source model race just keeps on getting more interesting. Today, the Allen Institute for AI (Ai2) debuted its latest entry in the race with the launch of its open-source Tülu 3 405 billion-parameter large language model (LLM). The new model not only matches the capabilities of OpenAI's GPT-4o, it surpasses DeepSeek's v3 model across critical benchmarks.

This isn't the first time Ai2 has made bold claims about a new model. In November 2024 the company released its first version of Tülu 3, which came in both 8- and 70-billion-parameter versions. At the time, Ai2 claimed the model was on par with the latest GPT-4 model from OpenAI, Anthropic's Claude and Google's Gemini. The big difference is that Tülu 3 is open-source. Ai2 also claimed back in September 2024 that its Molmo models were able to beat GPT-4o and Claude on some benchmarks.

While benchmark performance data is interesting, what's perhaps more useful are the training innovations that enable the new Ai2 model.

Pushing post-training to the limit

The big breakthrough for Tülu 3 405B is rooted in an innovation that first appeared with the initial Tülu 3 release in 2024. That release utilized a combination of advanced post-training techniques to get better performance. With the Tülu 3 405B model, those post-training techniques have been pushed even further, using an advanced post-training methodology that combines supervised fine-tuning, preference learning, and a novel reinforcement learning approach that has proven exceptional at larger scales.

"Applying Tülu 3's post-training recipes to Tülu 3-405B, our largest-scale, fully open-source post-trained model to date, levels the playing field by providing open fine-tuning recipes, data and code, empowering developers and researchers to achieve performance comparable to top-tier closed models," Hannaneh Hajishirzi, senior director of NLP Research at Ai2, told VentureBeat.

Advancing the state of open-source AI post-training with RLVR

Post-training is something that other models, including DeepSeek v3, do as well. The key innovation that helps to differentiate Tülu 3 is Ai2's "reinforcement learning from verifiable rewards" (RLVR) system. Unlike traditional training approaches, RLVR uses verifiable outcomes — such as solving mathematical problems correctly — to fine-tune the model's performance. This technique, when combined with direct preference optimization (DPO) and carefully curated training data, has enabled the model to achieve better accuracy in complex reasoning tasks while maintaining strong safety characteristics.

Key technical innovations in the RLVR implementation include:

- Efficient parallel processing across 256 GPUs
- Optimized weight synchronization
- Balanced compute distribution across 32 nodes
- Integrated vLLM deployment with 16-way tensor parallelism

The RLVR system showed improved results at the 405B-parameter scale compared to smaller models. The system also demonstrated particularly strong results in safety evaluations, outperforming DeepSeek v3, Llama 3.1 and Nous Hermes 3. Notably, the RLVR framework's effectiveness increased with model size, suggesting potential benefits from even larger-scale implementations.

How Tülu 3 405B compares to GPT-4o and DeepSeek v3

The model's competitive positioning is particularly noteworthy in the current AI landscape.
Tülu 3 405B not only matches the capabilities of GPT-4o but also outperforms DeepSeek v3 in some areas, particularly on safety benchmarks. Across a suite of 10 AI benchmarks, including safety benchmarks, Ai2 reported that the Tülu 3 405B RLVR model had an average score of 80.7, surpassing DeepSeek v3's 75.9. Tülu, however, is not quite as good as GPT-4o, which scored 81.6. Overall, the metrics suggest that Tülu 3 405B is at the very least extremely competitive with GPT-4o and DeepSeek v3 across the benchmarks.

Why open-source AI matters and how Ai2 is doing it differently

What makes Tülu 3 405B different for users, though, is how Ai2 has made the model available. There is a lot of noise in the AI market about open source. DeepSeek says its model is open-source, and so is Meta's Llama 3.1, which Tülu 3 405B also outperforms. With both DeepSeek and Llama, the models are freely available to use, and some, but not all, of the code is available. For example, DeepSeek has released R1's model code and pre-trained weights but not the training data.

Ai2 is taking a different approach in an attempt to be more open. "We don't leverage any closed datasets," Hajishirzi said. "As with our first Tülu 3 release in November 2024, we are releasing all of the infrastructure code." She added that Ai2's fully open approach, which includes data, training code and models, ensures users can easily customize their pipeline for everything from data selection through evaluation. Users can access the full suite of Tülu 3 models, including Tülu 3-405B, on Ai2's Tülu 3 page, or test the Tülu 3-405B functionality through Ai2's Playground demo space. source
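To make the RLVR idea described above more concrete, here is a minimal sketch of a verifiable reward function of the kind such a system could use for math problems. It is illustrative only and assumes each training prompt ships with a programmatically checkable answer; Ai2's actual training code is what it publishes in its open-source Tülu 3 releases.

```python
# Illustrative sketch of a "verifiable reward": the model gets credit only when
# its final answer can be checked programmatically. Not Ai2's actual RLVR code.
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the last number in the model's output matches the known answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    return 1.0 if numbers and numbers[-1] == ground_truth else 0.0

# During RL, this binary signal stands in for a learned reward model:
# completions that verify get reinforced, everything else gets zero reward.
print(verifiable_reward("Adding the terms gives a total of 42.", "42"))  # 1.0
print(verifiable_reward("The answer is probably 41.", "42"))             # 0.0
```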

Ai2 releases Tülu 3, a fully open-source model that bests DeepSeek v3, GPT-4o with novel post-training approach Read More »

OpenAI responds to DeepSeek competition with detailed reasoning traces for o3-mini

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI is now showing more details of the reasoning process of o3-mini, its latest reasoning model. The change was announced on OpenAI’s X account and comes as the AI lab is under increased pressure by DeepSeek-R1, a rival open model that fully displays its reasoning tokens. Models like o3 and R1 undergo a lengthy “chain of thought” (CoT) process in which they generate extra tokens to break down the problem, reason about and test different answers and reach a final solution. Previously, OpenAI’s reasoning models hid their chain of thought and only produced a high-level overview of reasoning steps. This made it difficult for users and developers to understand the model’s reasoning logic and change their instructions and prompts to steer it in the right direction.  OpenAI considered chain of thought a competitive advantage and hid it to prevent rivals from copying to train their models. But with R1 and other open models showing their full reasoning trace, the lack of transparency becomes a disadvantage for OpenAI. The new version of o3-mini shows a more detailed version of CoT. Although we still don’t see the raw tokens, it provides much more clarity on the reasoning process. Why it matters for applications In our previous experiments on o1 and R1, we found that o1 was slightly better at solving data analysis and reasoning problems. However, one of the key limitations was that there was no way to figure out why the model made mistakes — and it often made mistakes when faced with messy real-world data obtained from the web. On the other hand, R1’s chain of thought enabled us to troubleshoot the problems and change our prompts to improve reasoning. For example, in one of our experiments, both models failed to provide the correct answer. But thanks to R1’s detailed chain of thought, we were able to find out that the problem was not with the model itself but with the retrieval stage that gathered information from the web. In other experiments, R1’s chain of thought was able to provide us with hints when it failed to parse the information we provided it, while o1 only gave us a very rough overview of how it was formulating its response. We tested the new o3-mini model on a variant of a previous experiment we ran with o1. We provided the model with a text file containing prices of various stocks from January 2024 through January 2025. The file was noisy and unformatted, a mixture of plain text and HTML elements. We then asked the model to calculate the value of a portfolio that invested $140 in the Magnificent 7 stocks on the first day of each month from January 2024 to January 2025, distributed evenly across all stocks (we used the term “Mag 7” in the prompt to make it a bit more challenging). o3-mini’s CoT was really helpful this time. First, the model reasoned about what the Mag 7 was, filtered the data to only keep the relevant stocks (to make the problem challenging, we added a few non–Mag 7 stocks to the data), calculated the monthly amount to invest in each stock, and made the final calculations to provide the correct answer (the portfolio would be worth around $2,200 at the latest time registered in the data we provided to the model). It will take a lot more testing to see the limits of the new chain of thought, since OpenAI is still hiding a lot of details. But in our vibe checks, it seems that the new format is much more useful. 
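For reference, the calculation the models were asked to perform is straightforward once the data is clean. Below is a minimal sketch of that portfolio math, assuming tidy first-of-month prices rather than the noisy, unformatted file used in the actual test.

```python
# Sketch of the portfolio math from the experiment: $140 invested on the first
# trading day of each month, split evenly across the Magnificent 7. Assumes
# clean monthly prices, unlike the noisy scraped file given to the models.
MAG7 = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"]

def portfolio_value(prices_by_month: dict[str, dict[str, float]],
                    monthly_budget: float = 140.0) -> float:
    """prices_by_month maps 'YYYY-MM' -> {ticker: first-trading-day price}."""
    per_stock = monthly_budget / len(MAG7)  # $20 into each stock every month
    shares = {ticker: 0.0 for ticker in MAG7}
    for month in sorted(prices_by_month):
        for ticker in MAG7:
            shares[ticker] += per_stock / prices_by_month[month][ticker]
    latest = prices_by_month[max(prices_by_month)]  # last month in the data
    return sum(shares[t] * latest[t] for t in MAG7)
```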
What it means for OpenAI

When DeepSeek-R1 was released, it had three clear advantages over OpenAI's reasoning models: It was open, cheap and transparent. Since then, OpenAI has managed to narrow the gap. While o1 costs $60 per million output tokens, o3-mini costs just $4.40, while outperforming o1 on many reasoning benchmarks. R1 costs between $7 and $8 per million tokens from U.S. providers. (DeepSeek offers R1 at $2.19 per million tokens on its own servers, but many organizations will not be able to use it because it is hosted in China.)

With the new change to the CoT output, OpenAI has managed to somewhat work around the transparency problem. It remains to be seen what OpenAI will do about open sourcing its models. Since its release, R1 has already been adapted, forked and hosted by many different labs and companies, potentially making it the preferred reasoning model for enterprises. OpenAI CEO Sam Altman recently admitted that he was "on the wrong side of history" in the open-source debate. We'll have to see how this realization manifests itself in OpenAI's future releases. source

OpenAI responds to DeepSeek competition with detailed reasoning traces for o3-mini Read More »

U.S. Copyright Office says AI generated content can be copyrighted — if a human contributes to or edits it

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More In an important and helpful update issued today, the U.S. Copyright Office — which administers copyright protections from the government to human-authored works such as films, TV shows, novels, art, music, even software — clarified that some forms of AI generated content can, in fact, receive copyright protection, provided that a human substantially contributed or changed the content in question. The clarity came in a new document, “Copyright and Artificial Intelligence, Part 2: Copyrightability” (a PDF is embedded below), the second portion of a report that was initially released in July 2024. The report confirms that human creativity remains central to copyright law and intellectual property (IP) rights, even as AI tools become more widely used in artistic and commercial creation. But it should also give enterprises, in particular, reassurance that their brands and IP will remain protected even when they integrate distinctive products and brand marks into AI generated media, such as Coca Cola’s controversial AI holiday commercial released late last year. It marks something of an about-face for the Copyright Office, after it issued, then rescinded a copyright protection to Kris Kashtanova, an artist and AI evangelist for Adobe, on her graphic novel “Zarya of the Dawn,” who created the images using AI image generator Midjourney (which VentureBeat also uses, including for this article header). Reacting to today’s news, Kashtanova wrote on the social network X: “Two years ago I started advocating for copyright in AI. It was first Zarya of the Dawn and then Rose Enigma I did for this. It’s a small step forward and I am so happy today. AI work can be copyrighted. Your work matters. AI are tools for creativity (not replacement of it).” The Copyright Office also said a third section of this same report will be issued in the future to address the legal implications of training AI on copyrighted material, including licensing and liability. That third section should be a big deal for AI image, video and music generating companies, not to mention large language model (LLM) providers such as OpenAI, Anthropic, Google, Meta and numerous others — as they are all said to have trained on vast quantities of copyrighted material without express permission and are currently facing various lawsuits from human creators as a result. What qualifies for copyright in the AI generated era of content The report reaffirms the longstanding principle that copyright applies only to human creativity. While AI can serve as a tool in the creative process, its outputs are not copyrightable unless a human author has exercised sufficient creative control. The Copyright Office outlines three key scenarios where AI-generated material can apply for, and receive, an official certificate of copyright from the office: When human-authored content is incorporated into the AI output. When a human significantly modifies or arranges the AI-generated material. When the human contribution is sufficiently expressive and creative. In addition, the Copyright Office makes clear that using AI in the creative process does not disqualify a work from copyright protection. AI can assist with: Editing and refining text, images or music. Generating drafts or preliminary ideas for human creators to shape. Acting as a creative assistant while the human determines the final expression. 
As long as human authorship remains a core part of the final work, copyright protection can still apply. However, merely providing text prompts to an AI system is not enough to establish authorship. The Copyright Office determined that prompts are generally instructions or ideas rather than expressive contributions, which are required for copyright protection. Thus, an image generated with a text-to-image AI service such as Midjourney or OpenAI’s DALL-E 3 (via ChatGPT), on its own could not qualify for copyright protection. However, if the image was used in conjunction with a human-authored or human-edited article (such as this one), then it would seem to qualify. Similarly, for those looking to use AI video generation tools such as Runway, Pika, Luma, Hailuo, Kling, OpenAI Sora, Google Veo 2 or others, simply generating a video clip based on a description would not qualify for copyright. Yet, a human editing together multiple AI generated video clips into a new whole would seem to qualify. The report also clarifies that using AI in the creative process does not disqualify a work from copyright protection. If an AI tool assists an artist, writer or musician in refining their work, the human-created elements remain eligible for copyright. This aligns with historical precedents, where copyright law has adapted to new technologies such as photography, film and digital media. No legislative changes recommended After analyzing public feedback — including more than 10,000 comments from creators, legal experts and technology companies — the Copyright Office found no immediate need for new legislation, stating that the current laws around copyright in the U.S. should stand the test of time. While some had called for additional protections for AI-generated content, the report states that existing copyright law is sufficient to handle these issues. The Office did, however, acknowledge that it will continue monitoring technological developments and legal interpretations to determine if future changes are warranted. Shira Perlmutter, register of copyrights and director of the U.S. Copyright Office, emphasized the importance of human creativity in the copyright system: “After considering the extensive public comments and the current state of technological development, our conclusions turn on the centrality of human creativity to copyright. Where that creativity is expressed through the use of AI systems, it continues to enjoy protection. Extending protection to material whose expressive elements are determined by a machine, however, would undermine rather than further the constitutional goals of copyright.” Additionally, the Copyright Office plans to update its official Compendium of Copyright Practices to provide clearer guidelines for creators using AI tools. AI creators celebrate the news As news of the Copyright Office’s new document spread across social media, particularly on X — the unofficial nexus of AI research

U.S. Copyright Office says AI generated content can be copyrighted — if a human contributes to or edits it Read More »