VentureBeat

OpenAI rolls out ChatGPT for iPhone in landmark AI integration with Apple

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

OpenAI demonstrated its new iPhone integration on Wednesday as iOS 18.2 rolled out to users, bringing ChatGPT directly into Siri, writing tools and camera features. The feature update, shown off on day five of OpenAI’s “12 Days of Shipmas” product launches, marks a rare opening of Apple’s core iPhone features to outside software.

ChatGPT can now process commands through Siri and handle tasks across the operating system. “When Siri thinks that it would be helped by giving a task over to ChatGPT, it can just hand it off,” Dave Cummings, engineering manager for ChatGPT at OpenAI, explained during Wednesday’s demonstration.

The system works through three main paths: Siri voice commands, Writing Tools for text editing and Visual Intelligence through the Camera Control button. Users can access basic ChatGPT features without an account, although premium capabilities require a subscription.

Inside Apple’s AI strategy: Why the iPhone maker chose OpenAI instead of building its own

The partnership addresses critical challenges for both companies. Apple, despite its $3 trillion market capitalization, has struggled to match competitors in AI development. Google’s Gemini and Anthropic’s Claude have demonstrated capabilities that surpass anything in Apple’s current AI portfolio.

“We really want to make ChatGPT as frictionless and easy to use everywhere,” Sam Altman, CEO of OpenAI, said during Wednesday’s press conference. “We love Apple devices, and so this integration is one that we’re very, very proud of.”

The timing of this release could boost Apple’s high-end device sales at a crucial moment. While the company doesn’t break out AI-specific revenue, limiting these features to iPhone 15 Pro models and newer devices creates a compelling reason for consumers to upgrade.
This strategy mirrors Apple’s previous pattern of using advanced features — like ProRAW photography or ProRes video — to drive adoption of its premium devices, which carry margins estimated at over 60%.

The move also positions Apple differently in the AI race. Rather than competing head-on with Google and Microsoft in building foundational AI models, Apple is leveraging partnerships to bring AI to its ecosystem while maintaining its focus on hardware and user experience. This approach could prove more profitable in the short term, as AI model training remains enormously expensive with uncertain returns.

The $5 billion question: How OpenAI plans to monetize its billion-user iPhone base

For OpenAI, the partnership provides immediate access to Apple’s installed base of more than one billion iPhone users. This comes at a crucial time for the AI company, which is under pressure to generate revenue while managing massive computing costs. Recent reports indicate OpenAI’s computing expenses could reach $5 billion annually by 2025.

The partnership also arrives amid OpenAI’s broader monetization push. The company recently announced a partnership with defense contractor Anduril and launched a $200-per-month ChatGPT Pro tier. OpenAI’s CFO Sarah Friar has indicated the company is exploring advertising revenue streams.

Corporate AI spending could shift as ChatGPT comes to enterprise iPhones

For enterprise users, this integration represents more than just a new iPhone feature. Many companies have invested heavily in standalone AI solutions, often paying for multiple services like Jasper, Claude or corporate ChatGPT licenses. Native iPhone AI integration could consolidate these tools, potentially reshaping how businesses approach mobile productivity. Companies might shift their enterprise software budgets from specialized AI applications to platforms that integrate seamlessly with Apple’s ecosystem.

The integration could also reshape the competitive landscape.
Google, which pays Apple billions annually to remain the iPhone’s default search engine, may need to reassess its mobile strategy. The search giant has already accelerated its AI efforts, recently launching Gemini across its products.

Apple’s privacy-first reputation influenced the integration’s design. The system requires explicit user permission before sharing data with ChatGPT, and anonymous usage options preserve user privacy. All processing occurs on-device for basic features, with more advanced capabilities requiring cloud computation.

The future of mobile AI: A new platform war begins

The partnership highlights a broader shift in computing, where AI capabilities become as fundamental as operating systems themselves. We’re seeing the emergence of a new platform war, but unlike the mobile OS battles of the 2000s, this one centers on AI integration. The stakes are much higher: Whoever controls the AI interface likely controls the primary way users will interact with technology for years to come.

Apple’s choice to partner rather than compete suggests it has learned from history — sometimes being the platform that hosts the best services is more valuable than trying to build everything in-house.

OpenAI has more announcements planned as part of its “12 Days of Shipmas” campaign. But the Apple partnership may prove the most consequential, reshaping how a billion users interact with AI technology daily.

Neither company is exchanging cash payments in the initial partnership, with Apple viewing the massive distribution potential of its devices as compensation enough for OpenAI. However, future revenue-sharing agreements are being explored, particularly around ChatGPT’s premium subscriptions. For OpenAI, the deal offers something potentially priceless: Seamless access to hundreds of millions of Apple devices.
For Apple, it’s a strategic play that keeps the company competitive in AI while maintaining flexibility to partner with other providers like Google and Anthropic — suggesting that in the emerging AI platform wars, Apple is positioning itself not as a combatant, but as the battlefield itself.


Realtime AI video analysis app Lloyd will offer developer kit after passing 50,000 users

Disclaimer: EndlessAI previously published a contributor piece on VentureBeat announcing the launch of Lloyd in early October.

Four-year-old AI startup EndlessAI isn’t a household name — yet. But its founders and leaders believe they have a bona fide hit on their hands: Their freemium iOS app Lloyd, which uses proprietary video streaming and encoding tech to feed the user’s live video view to underlying AI models including OpenAI’s GPT-4o for help with a wide variety of tasks, from bicycle repair to telling bedtime stories, has achieved 50,000+ users three months after a stealth launch. Forty-one percent of those users engage with the app daily, according to data provided to VentureBeat by EndlessAI.

While it’s no ChatGPT — which became the fastest product in history to cross the 100 million user mark in January 2023, just two months after launch — it is nonetheless encouraging enough to EndlessAI CEO Roi Ginat and executive chairman Thomas Pompidou, who told VentureBeat in a recent video call interview that they plan to open their platform up to third-party developers in early 2025 and launch a consumer-facing Android app in January.

EndlessAI has also already begun upgrading Lloyd with what it calls “Powers,” or as Pompidou describes them, “basically fine-tuned large language models (LLMs) that provide deep dive to consumer on specific use case[s].” For example, the first Lloyd Power live now in the app is “Chef,” a real-time AI cooking coach that watches as you cook (if you point your smartphone camera at your stovetop or cooking area) and provides step-by-step guidance.

Another Lloyd Power planned to launch shortly is Tour Guide, which allows users to hold up their phone and see real-time contextual information about their surroundings.
By capturing a video of a location, it identifies points of interest, provides relevant details, and can even recommend nearby attractions or activities.

Making realtime video analysis accessible at scale

Current LLMs have struggled to process live video efficiently due to high computational costs. EndlessAI says its technology overcomes this limitation, reducing the cost of video analysis by over 99%.

Pompidou underscored the app’s broader mission: “Our mission is to scale AI to the real world. The real world is visual and live, and today’s large language models, as they’re architected, face challenges in analyzing video accurately, at scale, and cost-effectively. That’s what we make possible.”

Enabling real-time video analysis allows users to interact with their environment in novel ways, from diagnosing mechanical issues to creating personalized bedtime stories. Lloyd’s core differentiation lies in its ability to process video data through LLMs at a fraction of the cost typically associated with such tasks. Traditional LLM architectures are not optimized for video, making real-time video analysis prohibitively expensive and slow.

“Analyzing video with ChatGPT, assuming it could, would cost over $300 per hour,” Pompidou said. “With Lloyd, we deliver the same level of accuracy for just tens of cents per hour.”

This cost-efficiency is achieved without sacrificing accuracy, the company says, setting Lloyd apart from competitors that rely on reduced frame rates or lower resolutions to cut costs, often at the expense of reliability. “Our communication layer is robust in ways other solutions aren’t. It lets developers integrate real-time AI services like speech-to-text, text-to-speech, and video analysis with unmatched reliability and performance.”

As Pompidou envisions the future, he offered a glimpse into the app’s potential: “Imagine a finely tuned LLM trained on every IKEA instruction manual, guiding customers step by step with video and recognizing errors in real time.
It’s just one example of how our technology can transform user experiences.”

Another big arena that EndlessAI plans to court through Lloyd and its underlying video encoding tech is law enforcement, specifically analysis of police bodycam footage. “If there is someone who has a heart attack, it is going to recognize that and provide the officer with instructions on what to do right away,” said Pompidou.

Privacy and security

Even though Lloyd itself sees exactly whatever you point your smartphone camera at, EndlessAI says it prioritizes user privacy. “Data stays private to [user] accounts, and we only access it for support if users explicitly request assistance,” Ginat said. This approach ensures robust safeguards while enabling seamless interactions.

But as a consequence, EndlessAI isn’t exactly sure what the most popular uses for Lloyd are among its users. Anecdotally, it says that its surveys and feedback forms have shown interest in food preparation, household repairs, fashion and lifestyle coaching, and more.

While Lloyd’s consumer-facing features gain traction, EndlessAI is also building tools to empower developers and enterprises to harness its technology. “Our long-term roadmap includes an SDK for developers, starting early next year,” Pompidou said. “It will empower them to create unique visual AI solutions with extreme simplicity.”

The SDK will allow developers to integrate AI vision capabilities into their own applications. “The first offering for developers will be a robust platform for real-time API communication, connecting to OpenAI and other backends,” Ginat told VentureBeat. “Developers can pick and choose which components they want to use, such as audio services or speech-to-text.”

Applications for these tools span industries, from creating AI-enhanced chat applications to integrating video analysis into production lines and safety monitoring systems. EndlessAI aims to offer scalable solutions that adapt to different performance and cost requirements.
“Our developer tools will allow on-the-fly adjustments — choosing between backend services or lightweight, on-device solutions depending on the use case and cost requirements,” Ginat added.

By combining robust APIs with an intuitive SDK, EndlessAI envisions a new wave of AI-driven applications that go beyond traditional text or image processing. “We’ll offer developers the ability to integrate various services, including side-processing video, enhancing their sessions with additional capabilities,” Ginat said.

Transforming consumer and enterprise AI

Lloyd’s ability to leverage existing smartphones — without requiring additional hardware — makes it uniquely accessible. By reducing barriers to entry, EndlessAI is redefining what’s possible with AI in daily life.
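The cost gap EndlessAI describes comes down to token volume. The back-of-envelope sketch below shows how sparser frame sampling plus a compressed per-frame representation cuts cost roughly proportionally; every figure in it (frame rates, tokens per frame, per-token price) is an illustrative assumption, not EndlessAI's or OpenAI's actual number:

```python
# Back-of-envelope cost of naive frame-by-frame video analysis with a
# vision-capable LLM. All figures here are illustrative assumptions.
def hourly_cost(fps, tokens_per_frame, usd_per_million_tokens):
    """Cost in USD of sending every sampled frame to the model for one hour."""
    tokens = fps * 3600 * tokens_per_frame
    return tokens / 1e6 * usd_per_million_tokens

# Naive approach: 2 frames/sec, ~1,000 vision tokens per frame.
naive = hourly_cost(fps=2, tokens_per_frame=1000, usd_per_million_tokens=5.0)

# Optimized: sample 10x fewer frames and compress each frame's
# representation 10x, shrinking token volume (and cost) 100x.
optimized = hourly_cost(fps=0.2, tokens_per_frame=100, usd_per_million_tokens=5.0)

print(f"naive: ${naive:.2f}/hr, optimized: ${optimized:.2f}/hr")
print(f"reduction: {(1 - optimized / naive):.0%}")
```

With these toy numbers, the optimized pipeline is 99% cheaper per hour, the same order of savings the company claims, though the real gains would depend on how aggressively frames can be sampled and compressed without losing accuracy.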


New LLM optimization technique slashes memory costs up to 75%

Researchers at the Tokyo-based startup Sakana AI have developed a new technique that enables language models to use memory more efficiently, helping enterprises cut the costs of building applications on top of large language models (LLMs) and other Transformer-based models. The technique, called “universal transformer memory,” uses special neural networks to optimize LLMs to keep bits of information that matter and discard redundant details from their context.

Optimizing Transformer memory

The responses of Transformer models, the backbone of LLMs, depend on the content of their “context window” — that is, what they receive as input from users. The context window can be considered the model’s working memory. Tweaking the content of the context window can have a tremendous impact on the model’s performance, which has given rise to an entire field of “prompt engineering.”

Current models support very long context windows with hundreds of thousands, or even millions, of tokens (an LLM’s numerical representations of the words, word parts, phrases, concepts and numbers inputted by users in their prompts). This enables users to cram more information into their prompts. However, longer prompts can result in higher compute costs and slower performance. Optimizing prompts to remove unnecessary tokens while keeping important information can reduce costs and increase speed.

Current prompt optimization techniques are resource-intensive or require users to manually test different configurations to reduce the size of their prompts.

Neural attention memory modules

Universal transformer memory optimizes prompts using neural attention memory models (NAMMs), simple neural networks that decide whether to “remember” or “forget” each given token stored in the LLM’s memory.
“This new capability allows Transformers to discard unhelpful or redundant details, and focus on the most critical information, something we find to be crucial for tasks requiring long-context reasoning,” the researchers write.

Universal transformer memory (source: Sakana AI)

NAMMs are trained separately from the LLM and are combined with the pre-trained model at inference time, which makes them flexible and easy to deploy. However, they need access to the inner activations of the model, which means they can only be applied to open-source models.

Like other techniques developed by Sakana AI, NAMMs are trained through evolutionary algorithms instead of gradient-based optimization methods. By iteratively mutating and selecting the best-performing models through trial and error, evolutionary algorithms optimize NAMMs for efficiency and performance. This is especially important since NAMMs are trying to achieve a non-differentiable goal: keeping or discarding tokens.

NAMMs operate on the attention layers of LLMs, one of the key components of the Transformer architecture that determines the relations and importance of each token in the model’s context window. Based on attention values, NAMMs determine which tokens should be preserved and which can be discarded from the LLM’s context window. This attention-based mechanism makes it possible to use a trained NAMM on various models without further modification. For example, a NAMM trained on text-only data can be applied to vision or multi-modal models without additional training.

Neural attention memory models (NAMMs) examine attention layers to determine which tokens should be kept or discarded from the context window (source: Sakana AI)

Universal memory in action

To test the universal transformer memory concept in action, the researchers trained a NAMM on top of an open-source Meta Llama 3-8B model.
Their experiments show that with NAMMs, Transformer-based models perform better on natural language and coding problems on very long sequences. Meanwhile, by discarding unnecessary tokens, NAMMs enabled the model to save up to 75% of its cache memory while performing the tasks.

“Across our benchmarks, NAMMs provide clear performance improvements to the Llama 3-8B transformer,” the researchers write. “Furthermore, our memory systems yield notable side benefits, reducing the context size of each layer, while never being explicitly optimized for memory efficiency.”

NAMMs compete with leading prompt optimization techniques while improving the model’s performance (source: Sakana AI)

They also tested the model on the 70B version of Llama as well as Transformer models designed for other modalities and tasks, such as Llava (computer vision) and Decision Transformer (reinforcement learning).

“Even in these out-of-distribution settings, NAMMs retain their benefits by discarding tokens such as redundant video frames and suboptimal actions, allowing their new base models to focus on the most relevant information to improve performance,” the researchers write.

Task-dependent behavior

Another interesting finding is that NAMMs automatically adjust their behavior based on the task. For example, for coding tasks, the model discards contiguous chunks of tokens that correspond to comments and whitespace that don’t affect the code’s execution. On the other hand, in natural language tasks, the model discards tokens that represent grammatical redundancies and don’t affect the meaning of the sequence.

The researchers have released the code for creating your own NAMMs. Techniques such as universal transformer memory can be very useful for enterprise applications that process millions of tokens and can benefit from speed boosts and cost reductions. The reusability of a trained NAMM also makes it a versatile tool to use across different applications in an enterprise.
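The keep-or-discard flow can be sketched in a few lines. This is only an illustrative approximation: Sakana AI's actual NAMMs extract richer features from attention values and are trained by evolution, whereas the toy scorer below uses random, untrained weights, and every shape, feature and threshold here is an assumption:

```python
import numpy as np

def token_scores(attn, w1, w2):
    """Score each cached token from the attention it receives.

    attn: (num_queries, num_tokens) attention weights.
    Returns a score in (0, 1) per token; higher means keep.
    """
    # Simple per-token features: mean and max attention received.
    feats = np.stack([attn.mean(axis=0), attn.max(axis=0)], axis=1)
    hidden = np.tanh(feats @ w1)          # tiny MLP standing in for the NAMM
    logits = hidden @ w2
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

def prune_cache(kv_cache, attn, w1, w2, threshold=0.5):
    """Drop cached tokens whose score falls below the threshold."""
    keep = token_scores(attn, w1, w2) >= threshold
    return [tok for tok, k in zip(kv_cache, keep) if k], keep

rng = np.random.default_rng(0)
num_tokens = 8
attn = rng.dirichlet(np.ones(num_tokens), size=4)  # 4 queries over 8 tokens
w1, w2 = rng.normal(size=(2, 4)), rng.normal(size=(4,))
cache = [f"tok{i}" for i in range(num_tokens)]
pruned, keep_mask = prune_cache(cache, attn, w1, w2)
print(f"kept {len(pruned)} of {num_tokens} cached tokens")
```

Because the scorer reads only attention statistics, not token embeddings, the same trained scorer could in principle be dropped onto a different model's attention layers, which is the property behind the cross-model transfer results described above.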
Looking ahead, the researchers suggest more advanced techniques, such as using NAMMs during the training of LLMs to further extend their memory capabilities. “This work has only begun to tap into the potential of our new class of memory models, which we anticipate might offer many new opportunities to advance future generations of transformers,” the researchers write.


ServiceNow open sources Fast-LLM in a bid to help enterprises train AI models 20% quicker

Training a large language model (LLM) is among the most costly and time-consuming exercises for enterprises. A new open-source framework being released today by ServiceNow could make a big difference, with the promise of training models 20% faster, saving enterprises time and money.

The Fast-LLM technology has already been in development inside the company, helping ServiceNow accelerate its own LLM training efforts. Fast-LLM helped train ServiceNow’s StarCoder 2 LLM, which the company released earlier this year. StarCoder itself is an open-source effort as well, benefiting from the contributions of Hugging Face, Nvidia and others. ServiceNow also uses Fast-LLM for large, trillion-token continuous pre-training from existing models, as well as for fine-tuning jobs.

Because it is an open-source technology, anyone can use Fast-LLM to help accelerate AI training, including fine-tuning operations. The intent is that it can be a drop-in replacement for an existing AI training pipeline with minimal configuration changes. The new open-source project aims to differentiate against commonly used AI training frameworks, including the open-source PyTorch, with a series of innovations for data parallelism and memory management.

“When you’re dealing with compute clusters that cost hundreds of millions and training runs that cost millions of dollars, 20% can be a huge saving in terms of both dollars and time and the overall CO2 footprint,” Nicolas Chapados, VP of research at ServiceNow, told VentureBeat.

The innovations that enable Fast-LLM to accelerate AI training

The AI industry well understands the challenge of training AI more efficiently. VentureBeat Transform 2024 featured a panel that discussed that very issue, detailing options for scaling infrastructure.
The Fast-LLM approach isn’t about scaling infrastructure; it’s about optimizing the efficiency of existing training resources.

“We carefully looked at all the operations needed to train large language models, especially transformer-based large language models,” Chapados explained. “We carefully optimize both the way in which the compute is distributed to the individual cores within the GPU, as well as how the memory is being used by the models themselves.”

Fast-LLM’s competitive advantage stems from two primary innovations. The first is its approach to computation ordering, which defines the order in which computations occur in an AI training run. Chapados explained that Fast-LLM uses a new technique that ServiceNow calls “Breadth-First Pipeline Parallelism.”

“This is the fundamental scientific innovation around the way that compute is scheduled, both inside a single GPU and across multiple GPUs,” said Chapados.

The second major innovation addresses memory management. In large training operations, memory fragments over time: as training progresses, it becomes broken into small, scattered pieces, preventing training clusters from using all available memory.

“We’ve been very careful in the way that we design Fast-LLM to almost completely eliminate the problem of memory fragmentation when training those large language models,” said Chapados.

How enterprises can use Fast-LLM today to accelerate training

The Fast-LLM framework is designed to be accessible while maintaining enterprise-grade capabilities. It functions as a drop-in replacement for PyTorch environments and integrates with existing distributed training setups.

“For any model developer or any researcher, it’s just a simple configuration file that lets you specify all the architectural details that matter,” said Chapados.
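The fragmentation problem Chapados describes arises when buffers of varying sizes are repeatedly allocated and freed, leaving free memory scattered in holes too small to reuse. One classic remedy, shown here as a generic sketch (not Fast-LLM's actual allocator), is to reserve one contiguous region up front and sub-allocate from it:

```python
import numpy as np

class BumpPool:
    """Minimal bump allocator over one preallocated contiguous buffer.

    Because every allocation is a view into a single region and freeing
    happens all at once (reset), the free space is always one contiguous
    block, so it cannot fragment into unusable holes.
    """

    def __init__(self, capacity):
        self.buf = np.empty(capacity, dtype=np.float32)  # one big reservation
        self.offset = 0

    def alloc(self, n):
        if self.offset + n > self.buf.size:
            raise MemoryError("pool exhausted")
        view = self.buf[self.offset:self.offset + n]  # a view, no new allocation
        self.offset += n
        return view

    def reset(self):
        """Free every sub-allocation at once, e.g. between training steps."""
        self.offset = 0

pool = BumpPool(1024)
activations = pool.alloc(256)
gradients = pool.alloc(512)
print(f"used {pool.offset} of {pool.buf.size} floats")
```

The trade-off is that individual buffers cannot be freed out of order; real training allocators layer more sophisticated schemes on top, but the underlying idea of keeping allocations contiguous is the same.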
Running training operations faster has multiple benefits and can allow enterprises to experiment more. “It makes the risk of large training runs smaller,” said Chapados. “It equips users, researchers and model builders with a bit more ambition to train larger runs, because they will not be afraid that it will cost so much anymore.”

Looking forward, the expectation is that as an open-source project, Fast-LLM will be able to expand faster, benefiting from external contributions. ServiceNow has already been successful with that approach with StarCoder.

“Our goal is really to be very, very transparent and responsive to the community contributions in terms of the use of this framework,” said Chapados. “We’re still getting early feedback about what people like, what they are able to do with it, and our goal is really to scale this.”


OpenAI expands ChatGPT Canvas to all users

OpenAI is extending access to its side-by-side digital editing space, Canvas, to all ChatGPT users and adding new features, the company announced today in a livestream, the fourth of its “12 Days of OpenAI” holiday-themed announcements.

Canvas, which was announced in October, was previously available only to paying ChatGPT Plus, Teams, Edu and Enterprise subscribers. Available on desktop web browsers, it converts ChatGPT’s traditional interface — a conversation at the top and a text entry box at the bottom — into a left-hand sidebar, and adds a new space on the right side of the chat session screen for the content the user is working on, such as a code block for an application or a text document.

When a user converses with ChatGPT and asks for changes to the content in the right sidebar, the changes appear there automatically, rather than ChatGPT generating a whole new text response in the traditional interface. Canvas can also make suggestions for text and code, which it will implement immediately.

Canvas added to GPT-4o

Starting today, Canvas will be integrated into GPT-4o, eliminating the need to toggle to “GPT-4o with Canvas” on the model picker. Canvas will automatically open for some prompts or pasted text. It is available only on the web version or the Windows app of ChatGPT.

With the wider release, OpenAI also updated Canvas to run Python code, support more text pasting, and be launchable in custom GPTs. Users can paste Python code into ChatGPT, which may automatically open Canvas. Previously, users couldn’t see whether code that had just been generated or edited actually worked; they had to copy it and run it on their own systems. Allowing Canvas to run code brings it closer to Anthropic’s Claude Artifacts, which already let people see a sample webpage rendered from their code.
During a demo, OpenAI showed that Canvas can also create and preview graphics from code alone, so developers or analysts can adjust the formulas or data before finalizing a chart. Canvas also has a feature that can find bugs in the code and suggest fixes.

Custom GPTs with Canvas

For those who create custom GPTs, Canvas will be integrated by default, though users can still define the parameters of when and if Canvas will open for prompts on the assistant they created. But for existing custom GPTs, OpenAI did not make Canvas a default, to avoid disrupting how these already work. Users can add Canvas as a feature to their GPTs through the settings of the custom GPT.

Adding Canvas makes custom GPTs as powerful and useful as the base ChatGPT, allowing for more features specific to customers’ needs. OpenAI said it plans “to continue making improvements and launching new features available in Canvas in the near term.”

Features like Canvas and Artifacts are indicative of the interface battleground model-makers find themselves in as users look for more useful features to keep using the chat platforms.


‘Not there yet’: Sora rollout receives mixed response from AI filmmakers citing inconsistent results, content restrictions

Ten months after previewing it with eye-catching, vividly lifelike videos, OpenAI finally released Sora, its AI video generator model, to the public on Monday.

However, in the two days since, the debut has been less than picture-perfect: Early-adopter AI filmmakers have reported surprisingly inconsistent and unrealistic results from Sora, especially compared to leading rival AI video creation tools from the likes of Runway, Luma, Hailuo, Kling, and Tencent’s new Hunyuan. Others have taken issue with OpenAI’s content restrictions prohibiting violence and explicit content, even for cartoonish or unserious visuals. And OpenAI has now closed off Sora account creation temporarily to deal with unanticipated high demand, according to a post by OpenAI CEO Sam Altman on X yesterday.

Sora’s bumpy rollout already has some stalwart AI critics, such as public relations agent Ed Zitron, suggesting that it was a “bait and switch” designed to earn OpenAI positive press coverage despite the company being technically unable to serve the model reliably to the masses.

Wide-ranging reactions, from impressed to disappointed

Regardless, those who have been able to access the tool starting this week (or earlier, when OpenAI pre-seeded it to selected alpha and beta testers) report a wide range of experiences, from impressive to disappointing, especially given the price point for accessing it: $20 a month for 50 generations through ChatGPT Plus subscriptions, or $200 a month for unlimited generations through ChatGPT Pro.

“Nope, Sora is not there yet!” wrote creator Umesh on X. “HailuoAI seems far better.
I just tried four generations with varying prompts to achieve what HailuoAI did so easily, but none of them worked.”

Similarly, artist PurzBeats posted on X saying Sora was “[p]robably only worth it on the Pro plan,” and that they experienced “[v]ery strange and choppy motion on everything but the subject” in their generations, among other complaints.

“OpenAI has been lying to us this whole time!” wrote independent filmmaker el cine on X. “It loses in every way, most of the clips not usable and it doesn’t even follow prompts properly,” they noted, posting clips of a generation with people walking backwards with their legs facing opposite their torsos and heads. Ultimately, they concluded: “Think twice before going for the Pro plan.”

Others have been more impressed with the results, including futurist podcaster Ed Krassenstein, who called the model “amazing” in a post on X based on his experiences making quick clips with it. He posted a four-minute-long Sora-generated film by another creator, KNGMKRlabs, that shows cavemen in a documentary-style program called “The First Humans,” which to my eyes looks incredibly realistic and compelling.

A highly competitive market leaves less room for error and tinkering

Nonetheless, as AI video generators work to out-compete one another for users, with new features that make Hollywood-caliber filmmaking available to the masses, Sora’s debut seems challenged, to say the least. And for the actual Hollywood studios that OpenAI and rivals are reportedly courting, the rivals may currently have the edge. Already, for example, Runway has inked a deal for an unspecified amount with Lionsgate to provide the John Wick studio with custom AI models trained on its catalog of 20,000+ films and TV shows.

Especially for those looking to shell out the money for the “Pro” subscription tier, the question is whether Sora is worth it now, or whether other AI generators with similar or less-expensive pricing structures are a better deal.
Sora’s current output and relatively high entry price points (it offers no free tier, unlike other AI video generators) may make widespread adoption more challenging.

In response to these reactions, an OpenAI spokesperson emailed VentureBeat the following statement, taken from Sora’s official launch blog post: “The current model still has room for improvement. It may struggle to simulate the physics of a complex scene, and may not comprehend specific instances of cause and effect (for example: a cookie might not show a mark after a character bites it). The model may also confuse spatial details included in a prompt, such as discerning left from right, or struggle with precise descriptions of events that unfold over time, like specific camera trajectories.”

OpenAI’s spokesperson also noted: “We’ve seen significant demand for Sora.”

‘Not there yet’: Sora rollout receives mixed response from AI filmmakers citing inconsistent results, content restrictions

OpenAI releases hyperrealistic AI video generator Sora Turbo to the public

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

OpenAI announced the public release of its hyperrealistic AI video generation software Sora today — nearly 10 months after it was first shown publicly in February 2024. In fact, OpenAI is releasing a much-upgraded model from the one debuted back then: The new Sora Turbo will be available at sora.com to paying ChatGPT Plus and Pro subscribers ($20/month or $200/month) in the U.S. and most countries outside the EU and UK. OpenAI cofounder and CEO Sam Altman presented the news in a YouTube livestream, part of the company’s “12 Days of OpenAI” series of holiday-themed announcements scheduled for 1 pm ET / 10 am PT.

Sora can generate a wide range of videos from text inputs or still images, creating clips between 10 and 20 seconds long, in resolutions from 480p to 1080p and in aspect ratios from landscape to square and vertical. OpenAI created a whole new interface for the product, which includes a grid or list view the user can toggle between to see their generations. Users can also enter a mode called Storyboarding, which lets them generate multiple linked clips in a Timeline view. The model attempts to provide a seamless transition between the clips — users can drag to make cuts more abrupt or to make takes longer and more fluid.

ChatGPT Plus users can generate up to 50 videos per month at 480p resolution. For professionals and heavy users, the Pro plan offers higher resolutions, longer durations and unlimited generations at slow speeds. OpenAI also announced plans to release tailored pricing options for diverse user needs by early 2025.

News broken by MKBHD

Popular tech reviewing YouTuber Marques Brownlee, better known by his handle MKBHD, broke the news of Sora’s release about an hour beforehand.
“The rumors are true — SORA, OpenAI’s AI video generator, is launching for the public today…” Brownlee wrote in a post on the social network X. Brownlee also shared a thread of examples of videos he made using the text/image/video-to-video generator, to which he was given early access as one of several dozen early creative partners OpenAI seeded with the program before its general release.

Brownlee shared that while Sora could produce impressive and sometimes eerily realistic footage, such as that of newscasters or a gadget reviewer like himself, it also tends to hallucinate random details and show telltale signs of being AI-generated, such as garbled, nonsensical text in news chyrons, unnatural physics, and even adding or removing objects seemingly at random. He also noted that OpenAI imposes fairly strict guardrails against generating likenesses of real people and against violence and explicit themes. Credit: MKBHD/YouTube

Still, in his full YouTube review, he ultimately concluded that “this is a lot for humanity to digest now…[it] is the new baseline, this is once again the worst that it will ever be.”

Leaked on Hugging Face in protest by early testers

The release follows a leak of Sora onto the AI code-sharing community Hugging Face by beta testers roughly two weeks ago, in protest of OpenAI’s handling of the beta testing program. As the leakers wrote on their Hugging Face space: “Hundreds of artists provide unpaid labor through bug testing, feedback and experimental work for the program for a $150B valued company. While hundreds contribute for free, a select few will be chosen through a competition to have their Sora-created films screened — offering minimal compensation which pales in comparison to the substantial PR and marketing value OpenAI receives.”

Sora also arrives in the midst of an increasingly competitive landscape for realistic, live-action AI video generation.
Runway continues to rapidly upgrade its AI video generation platform with new features — including, just last week, the ability to re-record dialog in pre-existing footage and have the characters’ faces match. Luma AI and Chinese competitors such as Kling, Hailuo and, most recently, Tencent have all fielded impressive AI video generation tools in the last few weeks alone. So even though OpenAI — by virtue of its success with ChatGPT and early, eye-catching Sora footage — may have strong name recognition that can help popularize this new AI video generator with the masses, there are now many competing options that appear, at least superficially, to offer similar or better video quality. That makes Sora less of a guaranteed success.


Here’s the one thing you should never outsource to an AI model

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

In a world where efficiency is king and disruption creates billion-dollar markets overnight, it’s inevitable that businesses are eyeing generative AI as a powerful ally. From OpenAI’s ChatGPT generating human-like text to DALL-E producing art on demand, we’ve seen glimpses of a future where machines create alongside us — or even lead the charge. Why not extend this into research and development (R&D)? After all, AI could turbocharge idea generation, iterate faster than human researchers and potentially discover the “next big thing” with breathtaking ease, right?

Hold on. This all sounds great in theory, but let’s get real: Betting on gen AI to take over your R&D will likely backfire in significant, maybe even catastrophic, ways. Whether you’re an early-stage startup chasing growth or an established player defending your turf, outsourcing generative tasks in your innovation pipeline is a dangerous game. In the rush to embrace new technologies, there’s a looming risk of losing the very essence of what makes breakthrough innovations possible — and, worse yet, sending your entire industry into a death spiral of homogenized, uninspired products. Let me break down why over-reliance on gen AI in R&D could be innovation’s Achilles’ heel.

1. The unoriginal genius of AI: Prediction ≠ imagination

Gen AI is essentially a supercharged prediction machine. It creates by predicting which words, images, designs or code snippets fit best based on a vast history of precedents. As sleek and sophisticated as this may seem, let’s be clear: AI is only as good as its dataset. It’s not genuinely creative in the human sense of the word; it doesn’t “think” in radical, disruptive ways. It’s backward-looking — always relying on what’s already been created. In R&D, this is a fundamental flaw, not a feature.
To truly break new ground, you need more than incremental improvements extrapolated from historical data. Great innovations often arise from leaps, pivots and re-imaginings, not from a slight variation on an existing theme. Consider how companies like Apple with the iPhone or Tesla in the electric vehicle space didn’t just improve on existing products — they flipped paradigms on their heads. Gen AI might iterate design sketches of the next smartphone, but it won’t conceptually liberate us from the smartphone itself. The bold, world-changing moments — the ones that redefine markets, behaviors, even industries — come from human imagination, not from probabilities calculated by an algorithm. When AI is driving your R&D, you end up with better iterations of existing ideas, not the next category-defining breakthrough.

2. Gen AI is a homogenizing force by nature

One of the biggest dangers in letting AI take the reins of your product ideation process is that AI processes content — be it designs, solutions or technical configurations — in ways that lead to convergence rather than divergence. Given the overlapping bases of training data, AI-driven R&D will result in homogenized products across the market: different flavors of the same concept, but still the same concept. Imagine this: Four of your competitors implement gen AI systems to design their phones’ user interfaces (UIs). Each system is trained on more or less the same corpus of information — data scraped from the web about consumer preferences, existing designs, bestselling products and so on. What do all those AI systems produce? Variations of a similar result. What you’ll see develop over time is a disturbing visual and conceptual sameness in which rival products start mirroring one another. Sure, the icons might be slightly different, or the product features will differ at the margins, but substance, identity and uniqueness? Pretty soon, they evaporate.
We’ve already seen early signs of this phenomenon in AI-generated art. On platforms like ArtStation, many artists have raised concerns about the influx of AI-produced content that, instead of showing unique human creativity, feels like recycled aesthetics remixing popular cultural references, broad visual tropes and styles. This is not the cutting-edge innovation you want powering your R&D engine. If every company runs gen AI as its de facto innovation strategy, then your industry won’t get five or ten disruptive new products each year — it’ll get five or ten dressed-up clones.

3. The magic of human mischief: How accidents and ambiguity propel innovation

We’ve all read the history books: Penicillin was discovered by accident after Alexander Fleming left some bacteria cultures uncovered. The microwave oven was born when engineer Percy Spencer accidentally melted a chocolate bar by standing too close to a radar device. And the Post-it note? Another happy accident — a failed attempt at creating a super-strong adhesive. Failure and accidental discoveries are intrinsic to R&D. Human researchers, uniquely attuned to the value hidden in failure, are often able to see the unexpected as opportunity. Serendipity, intuition, gut feeling — these are as pivotal to successful innovation as any carefully laid-out roadmap.

But here’s the crux of the problem with gen AI: It has no concept of ambiguity, let alone the flexibility to interpret failure as an asset. The AI’s programming teaches it to avoid mistakes, optimize for accuracy and resolve data ambiguities. That’s great if you’re streamlining logistics or increasing factory throughput, but it’s terrible for breakthrough exploration. By eliminating the possibility of productive ambiguity — interpreting accidents, pushing against flawed designs — AI flattens potential pathways toward innovation. Humans embrace complexity and know how to let things breathe when an unexpected output presents itself.
AI, meanwhile, will double down on certainty, mainstreaming middle-of-the-road ideas and sidelining anything that looks irregular or untested.

4. AI lacks empathy and vision — two intangibles that make products revolutionary

Here’s the thing: Innovation is not just a product of logic; it’s a product of empathy, intuition, desire and vision. Humans innovate because they care, not just about logical efficiency or bottom lines, but about responding to nuanced human needs and emotions. We dream of making things faster, safer, more
