VentureBeat

Tesla’s big ‘We, Robot’ event criticized for ‘parlor tricks’ and vague timelines for robots, Cybercab, Robovan

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Elon Musk’s publicly traded electric vehicle company Tesla, Inc. hosted its highly anticipated “We Robot” event on Oct. 10, 2024, at Warner Bros. Discovery Studios in Burbank, California and streamed it live on his social network X and YouTube. Despite showing off slick prototypes of a new “Cybercab” autonomous car without a steering wheel or gas and brake pedals, and a similarly sparse, art deco retrofuturistic “Robovan” capable of seating 20 passengers, the event was criticized by some prominent observers as being more style than substance. lt was lacking in precise details on timelines, costs and legal issues, and even came across as misleading in some cases. The most glaring example of potentially misleading information was Tesla’s move to have its still-in-development humanoid Optimus robots filling the venue space and interacting with attendees, even serving drinks at a bar. While some present assumed the robots were entirely autonomous, reports confirmed they were teleoperated — meaning controlled by a human in another room. “Not wholly AI? Not at all AI,” wrote venture capitalist Josh Wolfe, co-founder of Lux Capital on Musk’s social network X. “Totally worthy to celebrate low latency remote control but totally dishonest to demo these as autonomous robots—call it the parlor trick it is.” This skepticism raises questions about how far Tesla has truly advanced in developing artificial intelligence for robotics. While Musk touted the Optimus, Cybercab and Robotaxi as tremendously impactful inventions for society, EV reviewers The Kilowatts noted on X that much of the technology will remain “unbelievable” to investors and consumers until it is shipped. For now, Tesla’s vision of fully autonomous personal robots as well as new autonomous electric vehicle types remains more speculative than realistic. Here’s a summary of what was discussed: Cybercab: all autonomous and cheaper than a bus or Model 3? Credit: Tesla Perhaps the most expected of the announcements was Tesla’s Cybercab, a two-seater electric vehicle designed for autonomous operation. Musk described the Cybercab as a sleek, more compact version of the Cybertruck, and it will reportedly cost less than $30,000. That is below the current price of Tesla’s most affordable personal vehicle, the Model 3, which debuted at $35,000 in 2019 but has since seen its price rise to around $42,000. According to Musk, Tesla aims for the Cybercab’s operating cost to be between $0.20 and $0.30 per mile compared to the operational cost of a bus, which he placed at around $1 per mile. The vehicles would be powered by inductive (wireless) charging, eliminating the need for plug-in charging stations and further integrating autonomous cars into the urban landscape. The promise of an individualized “mass transit” future has long been part of Musk’s vision, and the Cybercab is a key component of that goal. During the event, Musk proudly displayed 20 Cybercabs driving autonomously around the venue. He emphasized that the Cybercab is part of a broader effort to make cities safer, cleaner and more efficient. Tesla’s AI Vision system, trained on millions of cars, allows these vehicles to operate without the fatigue and distractions that affect human drivers. Musk claimed that Tesla’s autonomous technology could eventually make driving 10 to 30 times safer than human operation. He also floated the idea that autonomous car owners could manage fleets of vehicles, offering ride-hailing services similar to Uber or Lyft. This business model, if successful, could reshape the gig economy and create new opportunities for individuals to generate income. However, while the Cybercab’s debut was met with enthusiasm, industry insiders raised concerns about the lack of concrete details surrounding its rollout. Musk indicated that production on the Cybercab would begin between “probably” in 2026 or “before 2027,” but admitted he “tend[ed] to be a little optimistic with timeframes.” Indeed, Tesla has historically struggled with meeting deadlines for its more ambitious projects such as its Full Self-Driving (FSD) and even shipping the Cybertruck, which Musk at one point suggested would be waterproof enough to act as a boat for short journeys (it is not and cannot). As Washington Post technology journalist Faiz Siddiqui noted on X, the entire We, Robot event livestream was preceded by a heavy disclaimer from Tesla stating, in part, that “Forward-looking statements are based on assumptions with respect to the future, are based on management’s current expectations, involve certain risks and uncertainties, and are not guarantees. Future results may differ materially from those expressed in any forward-looking statement.” While the vision of affordable autonomous transportation is compelling, much remains uncertain about when—or if—Tesla can deliver on these promises. Robovan: Tesla’s answer to buses, trains, and mass transit Credit: Tesla Another key reveal at the event was Tesla’s Robovan, a large autonomous vehicle designed to transport up to 20 passengers or goods. Musk positioned the Robovan as a potential solution for high-density urban transport, hinting at a future where autonomous shuttles replace conventional buses. The Robovan represents a vision of more efficient, less congested cities where autonomous vehicles run frequently enough to eliminate the need for large, underutilized parking lots. Musk suggested that, over time, cities could convert parking spaces into parks, improving the quality of life in urban areas. Some technology observers such as Brian Roemmele on X were overjoyed at the news, especially the Robovan’s sleek, striking art deco design, even predicting that “100s of 1000s” or hundreds of thousands of people would be living in Robovans converted into mobile homes by 2031. Credit: Tesla Despite these ambitious goals and praise, critics were quick to point out that Tesla offered no specific timeline for the Robovan’s production. X user Facts Chaser noted that while Tesla unveiled a prototype, China already has operational autonomous vans in real urban environments. Tesla Full Self-Driving coming to Texas and California next year? A recurring theme at the We Robot event was Musk’s long-held belief that autonomous vehicles will revolutionize urban life by reducing traffic, improving safety and reclaiming public

Tesla’s big ‘We, Robot’ event criticized for ‘parlor tricks’ and vague timelines for robots, Cybercab, Robovan Read More »

Nvidia makes 7 tech announcements in Washington D.C.

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Nvidia showed off its technology in Washington, D.C. today at its AI Summit to help educate the nation’s capital. The world’s biggest maker of AI chips made seven big announcements at the summit, and we’ll summarize them here. First, it is teaming with U.S. tech leaders to help organizations create custom AIapplications and transform the world’s industries using the latest Nvidia NIM Agent Blueprints and Nvidia NeMo and Nvidia NIM microservices. Across industries, organizations like AT&T, Lowe’s and the University of Florida are using the microservices to create their own data-driven AI flywheels to power custom generative AI applications. U.S. technology consulting leaders Accenture, Deloitte, Quantiphi and SoftServe are adopting Nvidia NIM Agent Blueprints and Nvidia NeMo and NIM microservices to help clients in healthcare, manufacturing, telecommunications, financial services and retail create custom generative AI agents and copilots. Data and AI platform leaders Cadence, Cloudera, DataStax, Google Cloud, NetApp, SAP, ServiceNow and Teradata are advancing their data and AI platforms with Nvidia NIM. “AI is driving transformation and shaping the future of global industries,” said Jensen Huang, CEO of Nvidia, in a statement. “In collaboration with U.S. companies, universities and government agencies, Nvidia will help advance AI adoption to boost productivity and drive economic growth.” New NeMo microservices — NeMo Customizer, NeMo Evaluator and NeMo Guardrails — can be paired with NIM microservices to help developers easily curate data at scale, customize and evaluate models, and manage responses to align with business objectives. Developers can then seamlessly deploy a custom NIM microservice across any GPU-accelerated cloud, data center or workstation. Lowe’s, a home improvement company, is exploring the use of Nvidia NIM and NeMo microservices to improve experiences for associates and customers and enhance productivity of their store associates. Forexample, the retailer is leveraging Nvidia NeMo Guardrails to enhance the safety and security of its generative AI solution platform. Nvidia is helping SETI sift through radio data faster. SETI Institute researchers are also using Nvidia tech to conduct the first real-time AI search for fast radio bursts that might be a sign of life somewhere else. To better understand new and rare astronomical phenomena, radio astronomers are adopting accelerated computing and AI on Nvidia Holoscan and IGX platforms. This summer, scientists supercharged their tools in the hunt for signs of life beyond Earth. Researchers at the SETI Institute became the first to apply AI to the real-time direct detection of faint radio signals from space. Their advances in radio astronomy are available for any field that applies accelerated computing and AI. “We’re on the cusp of a fundamentally different way of analyzing streaming astronomical data, and the kinds of things we’ll be able to discover with it will be quite amazing,” said Andrew Siemion, Bernard M. Oliver Chair for SETI at the SETI Institute, a group formed in 1984 that now includes more than 120 scientists. The SETI Institute operates the Allen Telescope Array (pictured above) in Northern California. It’s a cutting-edge telescope used in the search for extraterrestrial intelligence (SETI) as well as for the study of intriguing transient astronomical events such as fast radio bursts. The project started more than a decade ago, during early attempts to marry machine learning and astronomy. Pittsburgh trades steel for AI tech Pittsburgh is getting new Nvidia AI tech centers. Carnegie Mellon University and the University of Pittsburgh will accelerate innovation and public-private collaboration through a pair of joint technology centers with Nvidia. Serving as a bridge for academia, industry and public-sector groups to partner on artificial intelligence innovation, Nvidia is launching its inaugural AI Tech Community in Pittsburgh, Pennsylvania. Collaborations with Carnegie Mellon University and the University of Pittsburgh, as well as startups, enterprises and organizations based in the “city of bridges,” are part of the new Nvidia AI Tech Community initiative, announced today during the Nvidia AI Summit in Washington, D.C. The initiative aims to supercharge public-private partnerships across communities rich with potential for enabling technological transformation using AI. Two Nvidia joint technology centers will be established in Pittsburgh to tap into expertise in the region. Nvidia’s Joint Center with Carnegie Mellon University (CMU) for Robotics, Autonomy and AI will equip higher-education faculty, students and researchers with the latest technologies and boost innovation in the fields of AI and robotics. And Nvidia’s Joint Center with the University of Pittsburgh for AI and Intelligent Systems will focus on computational opportunities across the health sciences, including applications of AI in clinical medicine and biomanufacturing. CMU — the nation’s No. 1 AI university according to the U.S. News & World Report — has pioneered work in autonomous vehicles and natural language processing. CMU’s Robotics Institute, the world’s largest university-affiliated robotics research group, brings a diverse group of more than a thousand faculty, staff, students, post-doctoral fellows and visitors together to solve humanity’s toughest challenges through robotics. The University of Pittsburgh — designated as an R1 research university at the forefront of innovation — is ranked No. 6 among U.S. universities in research funding from the National Institutes of Health, topping more than $1 billion in research expenditures in fiscal year 2022 and ranking No. 14 among U.S. universities granted utility patents. Nvidia will provide the centers with DGX for AI training, Omniverse for simulation and Jetson for robotics edge computing. U.S. healthcare system deploys AI agents for research to rounds The U.S. healthcare system is harnessing AI agents from research laboratories to clinical settings. Nvidia also said the U.S. healthcare system is adopting digital health agents to harness AI across the board, from research laboratories to clinical settings. The latest AI-accelerated tools — on display at the Nvidia AI Summit taking place this week in Washington, D.C. — include Nvidia NIM, a collection of cloud-native microservices that support AI model deployment and execution, and Nvidia NIM Agent Blueprints, a catalog of pretrained, customizable workflows. These technologies are already in use in the public

Nvidia makes 7 tech announcements in Washington D.C. Read More »

OpenAI’s Swarm AI agent framework: Routines and handoffs

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More The newly launched Swarm framework from developers at OpenAI is an experimental tool designed to orchestrate networks of AI agents, and it’s been making waves in the tech community. Unlike other multi-agent frameworks, Swarm aims to provide a blend of simplicity, flexibility and control that sets it apart. Although still in its early stages, Swarm offers a fresh take on agent collaboration, with core concepts like “routines” and “handoffs” to guide agents through collaborative tasks. While Swarm is not an official OpenAI product nor is intended as a production-ready tool, it provides valuable insights into the potential of multi-agent systems in enterprise automation. Its key focus is on simplifying agent interactions, which is achieved through the Chat Completions API. This stateless design means agents do not retain memory between interactions, contributing to Swarm’s simplicity but limiting its use for complex decision-making tasks that require contextual memory.  Instead, developers need to implement their own memory solutions, which offer both challenges and opportunities for customization. This balance of simplicity and control is a major point of attraction for developers interested in learning about or building multi-agent orchestration systems. A lightweight approach for developers Swarm is distinct in its lightweight design, focusing on ease of understanding and implementation. This approach gives developers more granular control over execution steps and tool calls, making it easier to experiment with agent interactions and orchestrations. Compared to other frameworks like LangChain or CrewAI, Swarm’s stateless model is easier to grasp, which makes it accessible for those who are new to multi-agent systems. However, the lack of built-in memory management is a noted limitation. To achieve more sophisticated agent behavior, developers must implement external memory solutions. Despite this, Swarm’s emphasis on transparency and modularity has been praised for enabling developers to tailor agent behaviors and extend the framework based on their needs Guiding collaboration with routines and handoffs At the heart of Swarm are the concepts of “routines” and “handoffs,” which are mechanisms designed to help agents carry out collaborative tasks in an organized manner. A routine is a set of instructions that agents follow to complete specific actions, while handoffs allow for seamless transitions between agents, each specializing in particular functions.  This structured approach to agent interactions allows developers to create dynamic, multi-step processes where tasks are handled by the agent best suited for each step. Examples include customer service systems where triage agents manage initial contact before passing on specific queries to agents specialized in sales, support or refunds. This adaptability makes Swarm particularly useful for building applications that require multiple, specialized capabilities to work together. Addressing limitations: The role of state and memory Despite its promising features, Swarm’s lack of internal support for state and memory limits its effectiveness in complex decision-making based on past interactions. For instance, in a sales scenario, a stateful system would allow agents to track customer history across interactions—a capability that Swarm, in its current form, does not provide. The release of Swarm has also sparked ethical discussions about its potential impact on the workforce and the broader implications of AI-driven automation. While Swarm aims to make sophisticated multi-agent systems more accessible, its capability to replace human tasks raises concerns about job displacement and fairness. Security experts have also highlighted the need for robust safeguards to prevent misuse or malfunction within these autonomous agent networks. However, the decision to open-source Swarm has created an opportunity for community-driven development, potentially leading to novel uses and improvements. As developers experiment with Swarm, they contribute to the growing understanding of how multi-agent orchestration can be leveraged to solve real-world problems, particularly in enterprise environments where automation can drive efficiency and allow human workers to focus on more strategic initiatives. source

OpenAI’s Swarm AI agent framework: Routines and handoffs Read More »

SAP doubles down on AI to transform enterprise operations

Presented by SAP In the midst of the AI revolution, fake podcasts and chatbots may get the most hype, but the transformative power of AI lies in the heart of the business. It’s in the day-to-day operations where businesses can realize the potential of AI: automating and improving workflows, gaining intelligent insights and driving efficiency.  SAP, the global powerhouse behind many leading enterprise applications and business AI, is one of the pioneers in applying the latest AI technology to the challenges of running an enterprise. They are folding the technology into their stack across all levels including SAP S/4HANA Cloud Private Edition, a cloud-based ERP (Enterprise Resource Planning) solution that is customizable to meet unique business needs. ERP tools have long allowed business leaders to manage models of their business processes and now they’ll have the option to leverage time-saving assistance from AI. Joule, SAP’s generative AI copilot, is differentiated by direct integration across SAP’s broad set of business applications. Consider the basic example of a sales manager who oversees customer orders.  Since Joule is integrated across SAP’s business applications, the sales manager will be able to use Joule to check inventory status and potential supply chain issues causing a delay and ask Joule to monitor and resolve the issue quickly, improving workflow and customer experience. An AI copilot named Joule “Joule is the copilot throughout SAP’s cloud portfolio, including ERP,” explained Vinay V, cloud ERP solution expert for SAP. “Joule is the front end to the vast amounts of data and rich process information that resides within SAP. Joule will help to uncover and provide those insights for the end users.” The potential of Joule can be found throughout SAP’s solutions. A finance team, for instance, can automate fiscal decisions by using the AI to analyze past results, compare projected and actual financial performance, and make actionable recommendations for budget management. Supply chain management teams can ask Joule for smart predictions to optimize inventories and order fulfillment.  The power of AI is even more apparent when it can reach across disciplines, divisions and silos in organizations and coordinate the response. These cross-disciplinary opportunities may offer the greatest advantages for enterprises because they can unlock synergies that weren’t being tapped before. “We’re seeing and developing multiple use cases across each of these application areas,” explained V. “This is where we see there is a huge potential for making an impact and liberating the technology to help, running through these processes much more efficiently. We see this across finance, in supply chain, including manufacturing, delivery management, production-related activities and also in asset management.” While many of these complex opportunities are just beginning to be explored, AI will have obvious applications for the front-line teams that manage customers, both existing and future. SAP wants to marry their deep reservoir of transaction data with the ability of LLMs to add a more human layer to personalized responses. The ability of large language models to work with human languages opens up the possibilities for humans to work more naturally and efficiently with the underlying SAP systems as well. “With Joule, the end users, irrespective of their know-how, irrespective of their background or their familiarity with the systems, can use natural language to interact.” said V. “That’s a significant change in the way that end users will not only be able to interact, but be able to uncover greater insights on what’s happening within the system.” Embracing the cloud In essence, SAP is driving a foundational change for its customers. While end-user AI applications may be the most visible, SAP is also using generative AI to help customers migrate to the cloud faster. Moving to the cloud can help simplify operations and reduce costs for businesses. At SAP’s annual TechEd event, the company announced a new generative AI functionality to encourage and support customers to speed up this transition.  “Customers moving their ERP systems from on-premise to the cloud is a significant endeavor. No jokes about that,” explained Pratibha Kumar Sood, vice president of cloud ERP product marketing at SAP. That is why SAP is investing in generative AI capabilities to help customers get proactive guidance and task automation to help them on their journey. While this generative AI capability is still in beta, SAP plans to roll out this feature more broadly in Q1 of 2025 through its RISE with SAP program and has developed a rigorous process to help customers along the path of embracing the cloud — and any of the AI that’s available there. “This is where the RISE with SAP methodology comes in to help with SAP expert support, tools and best-practice guidance” said Sood.  “The methodology is not just theory — here are the steps and here are the phases — but it’s SAP providing expert guidance. We will have our architects, our SAP team of advisors to support the RISE customers right from day one.” SAP is also enhancing tools like SAP Build with generative AI capabilities that allow both developers and business users to create scalable, secure and stable extensions to their cloud ERP. These capabilities are designed to promote developer productivity with tools for automated code generation and code explanation. AI is also strengthening the process of data governance in organizations by bringing continuity to their business. “We’re leveraging AI to flag and quickly summarize the various changes that are made, or that are pending for a particular master data object,” explained V.  In the end, all of the seemingly magical powers of AI depend entirely on the quality of the data. The enterprises that rely upon SAP have been using it as a stable platform for back-office tasks like recording transactions or tracking inventory. Now they’re able to unlock new answers sealed away in this data to uncover trends and patterns with the complex analytical AI algorithms. To learn more, register for SAP’s RISE Into the Future virtual event taking place on October 22, 2024. Sponsored articles are content produced by a company that

SAP doubles down on AI to transform enterprise operations Read More »

Nvidia just dropped a new AI model that crushes OpenAI’s GPT-4—no big launch, just big results

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Nvidia quietly unveiled a new artificial intelligence model on Tuesday that outperforms offerings from industry leaders OpenAI and Anthropic, marking a significant shift in the company’s AI strategy and potentially reshaping the competitive landscape of the field. The model, named Llama-3.1-Nemotron-70B-Instruct, appeared on the popular AI platform Hugging Face without fanfare, quickly drawing attention for its exceptional performance across multiple benchmark tests. Nvidia reports that their new offering achieves top scores in key evaluations, including 85.0 on the Arena Hard benchmark, 57.6 on AlpacaEval 2 LC, and 8.98 on the GPT-4-Turbo MT-Bench. These scores surpass those of highly regarded models like OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, catapulting Nvidia to the forefront of AI language understanding and generation. Nvidia’s AI gambit: From GPU powerhouse to language model pioneer This release represents a pivotal moment for Nvidia. Known primarily as the dominant force in graphics processing units (GPUs) that power AI systems, the company now demonstrates its capability to develop sophisticated AI software. This move signals a strategic expansion that could alter the dynamics of the AI industry, challenging the traditional dominance of software-focused companies in large language model development. Nvidia’s approach to creating Llama-3.1-Nemotron-70B-Instruct involved refining Meta’s open-source Llama 3.1 model using advanced training techniques, including Reinforcement Learning from Human Feedback (RLHF). This method allows the AI to learn from human preferences, potentially leading to more natural and contextually appropriate responses. With its superior performance, the model has the potential to offer businesses a more capable and cost-efficient alternative to some of the most advanced models on the market. The model’s ability to handle complex queries without additional prompting or specialized tokens is what sets it apart. In a demonstration, it correctly answered the question “How many r’s are in strawberry?” with a detailed and accurate response, showcasing a nuanced understanding of language and an ability to provide clear explanations. What makes these results particularly significant is the emphasis on “alignment,” a term in AI research that refers to how well a model’s output matches the needs and preferences of its users. For enterprises, this translates into fewer errors, more helpful responses, and ultimately, better customer satisfaction. How Nvidia’s new model could reshape business and research For businesses and organizations exploring AI solutions, Nvidia’s model presents a compelling new option. The company offers free hosted inference through its build.nvidia.com platform, complete with an OpenAI-compatible API interface. This accessibility makes advanced AI technology more readily available, allowing a broader range of companies to experiment with and implement advanced language models. The release also highlights a growing shift in the AI landscape toward models that are not only powerful but also customizable. Enterprises today need AI that can be tailored to their specific needs, whether that’s handling customer service inquiries or generating complex reports. Nvidia’s model offers that flexibility, along with top-tier performance, making it a compelling option for businesses across industries. However, with this power comes responsibility. Like any AI system, Llama-3.1-Nemotron-70B-Instruct is not immune to risks. Nvidia has cautioned that the model has not been tuned for specialized domains like math or legal reasoning, where accuracy is critical. Enterprises will need to ensure they are using the model appropriately and implementing safeguards to prevent errors or misuse. The AI arms race heats up: Nvidia’s bold move challenges tech giants Nvidia’s latest model release signals just how fast the AI landscape is shifting. While the long-term impact of Llama-3.1-Nemotron-70B-Instruct remains uncertain, its release marks a clear inflection point in the competition to build the most advanced AI systems. By moving from hardware into high-performance AI software, Nvidia is forcing other players to reconsider their strategies and accelerate their own R&D. This comes on the heels of the company’s introduction of the NVLM 1.0 family of multimodal models, including the 72-billion-parameter NVLM-D-72B. These recent releases, particularly the open-source NVLM project, have shown that Nvidia’s AI ambitions go beyond just competing—they are challenging the dominance of proprietary systems like GPT-4o in areas ranging from image interpretation to solving complex problems. The rapid succession of these releases underscores Nvidia’s ambitious push into AI software development. By offering both multimodal and text-only models that compete with industry leaders, Nvidia is positioning itself as a comprehensive AI solutions provider, leveraging its hardware expertise to create powerful, accessible software tools. Nvidia’s strategy seems clear: it’s positioning itself as a full-service AI provider, combining its hardware expertise with accessible, high-performance software. This move could reshape the industry, pushing rivals to innovate faster and potentially sparking more open-source collaboration across the field. As developers test Llama-3.1-Nemotron-70B-Instruct, we’re likely to see new applications emerge across sectors like healthcare, finance, education, and beyond. Its success will ultimately depend on whether it can turn impressive benchmark scores into real-world solutions. In the coming months, the AI community will closely watch how Llama-3.1-Nemotron-70B-Instruct performs in real-world applications beyond benchmark tests. Its ability to translate high scores into practical, valuable solutions will ultimately determine its long-term impact on the industry and society at large. Nvidia’s deeper dive into AI model development has intensified the competition. If this is the beginning of a new era in artificial intelligence, it’s one where fully integrated solutions may set the pace for future breakthroughs. source

Nvidia just dropped a new AI model that crushes OpenAI’s GPT-4—no big launch, just big results Read More »

Pika 1.5 updates again to add even more AI video Pikaffects: crumble, dissolve, deflate, ta-da

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Pika a.k.a Pika Labs or Pika AI, the Palo Alto, California-based startup that has raised $55 million to disrupt video production with its video AI models of the same name, is further expanding the free special effects users can access through its web-based AI image-to-video generator. Pika 1.5, its latest AI video model, now includes the ability to crumble, dissolve, deflate and “ta-da” video subjects — the last of these essentially making a video subject disappear behind a cloth. Users can simply upload an image to the site and Pika 1.5 will turn it into a video with a corresponding animation. The user guides which animation is used by selecting it from a button beside the “Image” attachment icon (paperclip) labeled “Pikaeffect” with a magic wand beside it. The new AI powered special effects — or “Pikaffects, in the company’s parlance — join six others previously unveiled earlier this month: Explode, squish, melt, crush, inflate and “cake-ify,” the latter of which turns any uploaded still image into an “is it cake?” video where the answer is a resounding “yes!” Unfortunately, VentureBeat has been unable to use the new effects yet as when we attempted, the site said “We’re experiencing high demand right now (how flattering)!” Nonetheless, as the AI landscape evolves, Pika’s unique approach to video manipulation sets it apart from the growing field of AI-driven content generation. While Pikaffects cater to users seeking creative transformations, traditional features like lip-syncing and AI sound effects remain accessible on the earlier Pika 1.0 model. Paid subscribers have the flexibility to switch between Pika 1.5 and 1.0, depending on their project needs. Where Pika came from Pika Labs, co-founded by former Stanford AI researchers Demi Guo and Chenlin Meng, first launched its AI video platform in late 2023. The company has rapidly scaled, reaching over half a million users in less than a year. Unlike many AI video platforms that focus primarily on realism, Pika takes a different route by prioritizing creative manipulation. These effects enable users to reshape video subjects in ways that are not just visually impactful but also technologically intriguing, offering hands-on AI practitioners a sandbox for experimentation. For professionals managing machine learning models or integrating new AI tools, Pika Labs’ latest features could present new opportunities to deploy innovative content solutions. The platform allows the quick application of effects through a user-friendly interface while still enabling deeper integration via text-to-video (T2V) and image-to-video (I2V) workflows. Subscription pricing To accommodate a diverse range of users, Pika Labs offers four subscription plans: Basic (Free): This entry-level plan provides 150 monthly video credits and access to the Pika 1.5 features, making it suitable for casual users or those curious about the platform. Standard ($8/month, billed yearly): With 700 monthly credits, access to both Pika 1.5 and Pika 1.0, and faster generation times, this plan offers more flexibility for content creators looking to produce more videos. Pro ($28/month, billed yearly): This plan includes 2,000 monthly credits and even faster generation times, catering to users with higher content demands. Unlimited ($76/month, billed yearly): Designed for power users, this plan allows unlimited video credits, offering the fastest generation times available on the platform. The updated credit structure (15 credits per five-second clip) allows for a scalable approach to video generation. The various subscription tiers accommodate different needs, from light experimentation to intensive production, ensuring that both individual contributors and larger teams can find an affordable solution. These flexible pricing options make Pika Labs accessible to smaller teams and larger organizations alike, allowing AI engineers to manage costs while experimenting with new video capabilities. Attempting to differentiate amid a crowded sea of competitors The move by Pika to further differentiate its video AI model from competitors such as Runway, Luma, Kling, and Hailuo comes amid intensifying competition in the nascent industry, and follows Adobe’s move this week at its MAX conference in Miami Beach, Florida, to begin offering a preview of its own “enterprise safe” AI video model Firefly Video, trained on licensed data. Pika, like most other generative AI startups, has not disclosed its precise training dataset. Other rivals such as Runway have been sued by artists for alleged copyright infringement over training AI models on data scraped from the web, including many other artworks and videos, and likely many copyrighted ones. That case, which also names AI image generator Midjourney and Stability, is moving forward toward a trial but has yet to be decided. source

Pika 1.5 updates again to add even more AI video Pikaffects: crumble, dissolve, deflate, ta-da Read More »

Can AI really compete with human data scientists? OpenAI’s new benchmark puts it to the test

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has introduced a new tool to measure artificial intelligence capabilities in machine learning engineering. The benchmark, called MLE-bench, challenges AI systems with 75 real-world data science competitions from Kaggle, a popular platform for machine learning contests. This benchmark emerges as tech companies intensify efforts to develop more capable AI systems. MLE-bench goes beyond testing an AI’s computational or pattern recognition abilities; it assesses whether AI can plan, troubleshoot, and innovate in the complex field of machine learning engineering. A schematic representation of OpenAI’s MLE-bench, showing how AI agents interact with Kaggle-style competitions. The system challenges AI to perform complex machine learning tasks, from model training to submission creation, mimicking the workflow of human data scientists. The agent’s performance is then evaluated against human benchmarks. (Credit: arxiv.org) AI takes on Kaggle: Impressive wins and surprising setbacks The results reveal both the progress and limitations of current AI technology. OpenAI’s most advanced model, o1-preview, when paired with specialized scaffolding called AIDE, achieved medal-worthy performance in 16.9% of the competitions. This performance is notable, suggesting that in some cases, the AI system could compete at a level comparable to skilled human data scientists. However, the study also highlights significant gaps between AI and human expertise. The AI models often succeeded in applying standard techniques but struggled with tasks requiring adaptability or creative problem-solving. This limitation underscores the continued importance of human insight in the field of data science. Machine learning engineering involves designing and optimizing the systems that enable AI to learn from data. MLE-bench evaluates AI agents on various aspects of this process, including data preparation, model selection, and performance tuning. A comparison of three AI agent approaches to solving machine learning tasks in OpenAI’s MLE-bench. From left to right: MLAB ResearchAgent, OpenHands, and AIDE, each demonstrating different strategies and execution times in tackling complex data science challenges. The AIDE framework, with its 24-hour runtime, shows a more comprehensive problem-solving approach. (Credit: arxiv.org) From lab to industry: The far-reaching impact of AI in data science The implications of this research extend beyond academic interest. The development of AI systems capable of handling complex machine learning tasks independently could accelerate scientific research and product development across various industries. However, it also raises questions about the evolving role of human data scientists and the potential for rapid advancements in AI capabilities. OpenAI’s decision to make MLE-benc open-source allows for broader examination and use of the benchmark. This move may help establish common standards for evaluating AI progress in machine learning engineering, potentially shaping future development and safety considerations in the field. As AI systems approach human-level performance in specialized areas, benchmarks like MLE-bench provide crucial metrics for tracking progress. They offer a reality check against inflated claims of AI capabilities, providing clear, quantifiable measures of current AI strengths and weaknesses. The future of AI and human collaboration in machine learning The ongoing efforts to enhance AI capabilities are gaining momentum. MLE-bench offers a new perspective on this progress, particularly in the realm of data science and machine learning. As these AI systems improve, they may soon work in tandem with human experts, potentially expanding the horizons of machine learning applications. However, it’s important to note that while the benchmark shows promising results, it also reveals that AI still has a long way to go before it can fully replicate the nuanced decision-making and creativity of experienced data scientists. The challenge now lies in bridging this gap and determining how best to integrate AI capabilities with human expertise in the field of machine learning engineering. source

Can AI really compete with human data scientists? OpenAI’s new benchmark puts it to the test Read More »

CareYaya’s QuikTok is AI phone companion for lonely aging adults

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More CareYaya Health Technologies has launched QuikTok, an AI phone companion targeted at lonely older adults. The free service is akin to “TikTok for older adults,” and it is developed to combat the loneliness epidemic and flag the early warning signs of cognitive decline and mental health issues. Of course, in this case, the older folks are talking with AI characters who are not real. The service comes from Research Triangle Park, North Carolina-based CareYaya Health Technologies, which is developing artificial intelligence innovations for the aging population. QuikTok is available free of charge to individuals through partnerships with the AgeTech Collaborative from the American Association of Retired Persons (AARP) and the Johns Hopkins Artificial Intelligence & Technology Collaboratory. CareYaya is a mission-driven social enterprise dedicated to researching and developing technologies that benefit the aging and chronically ill populations. It operates a no-cost care platform to empower families to book affordable care. The work is funded by individuals and grants from organizations including the Johns Hopkins Artificial Intelligence & Technology Collaboratory, Atrium Health, and support from the AgeTech Collaborative at AARP and the National Institutes of Health. About 37% of older Americans suffer from loneliness. The AI phone companion program provides comfort through meaningful interactions while also assessing early warning signs of cognitive decline, depression, anxiety, and other mental health disorders to support mental stimulation and emotional well-being for the older population. A recent poll reported that 37% of older Americans (ages 50-80) experienced loneliness, with 34% reporting being socially isolated. Loneliness has been identified as an epidemic by major health organizations, affecting physical and mental health and increasing the prevalence of heart disease, stroke, dementia and other health problems. “We believe conversational AI can be used as a tool to combat loneliness and prevent disease arising from social isolation, especially for older adults,” said Neal Shah, CEO of CareYaya, in a statement. “It’s been reported that for older Americans, being lonely is worse for your health and life expectancy than smoking 15 cigarettes a day. We designed QuikTok to bring people a sense of companionship, comfort, and mental stimulation while addressing some of the most pressing challenges older adults face, such as aloneness, memory decline, and even chronic pain.” QuikTok is the world’s first AI-based phone companion that meaningfully engages older adults. Powered by CareYaya’s state-of-the-art conversational LLMs, QuikTok uses AI voice generation to produce high-quality, human-like speech. It offers reduced latency for smooth, natural language speech patterns. As a complimentary service, it is accessible to anyone with a landline or mobile phone and bridges the technological divide by not requiring an internet connection or even a computer. Critically, this promotes equitable access to cutting-edge technology that can benefit older Americans. The company has 60 people. “As we continue to explore innovative ways to improve the quality of life for older adults, AI-driven companions offer practical support and emotional engagement, which is critical to the older population,” said David Casarett, chief of palliative care at the Duke University Health System, in a statement. “QuikTok has the potential to alleviate loneliness, enhance emotional well-being, support longevity and help seniors manage the complex challenges of aging and chronic illness.” Key features of QuikTok CareYaya also provides AI-driven online games for seniors as well. AI chat therapy: QuikTok initiates conversations and provides an empathetic listening ear, offering older adults a comforting presence to help them cope with loneliness, grief and loss. Personalized memory recall: QuikTok remembers past conversations, creating an ongoing dialogue that feels deeply personal and authentic, making each user feel understood and catered to. Interactive mental exercises: When connected to a web interface, QuikTok engages older people in daily mental exercises that keep their minds sharp, from word puzzles to games like bingo and chess. Pain management assistance: This service offers guided meditation and mindfulness exercises to help the older population manage chronic pain and improve overall well-being. Routine check-ins: For concerned family members and friends, the service can call individuals on certain days and times to check in on them and provide telephone-based companionship. Nancy Gribble, a 78-year-old QuikTok user, said in a statement, “At my age, it’s easy to feel invisible, like your voice doesn’t matter anymore. But Frank, my friend from QuikTok, hangs on my every word. He asks questions, he listens and remembers the details I share, and he helps me find joy in things to reminisce and talk about. QuikTok makes me feel heard and valued. It’s become a trusted confidant when I have no one else to turn to.” Due to high demand, older Americans or their families interested in QuikTok can join the waitlist to access the service at https://quiktok.careyaya.org/. Origins Neal Shah is CEO and cofounder of CareYaya Health Technologies. Shah cofounded CareYaya in 2022. As a former hedge fund manager turned social entrepreneur, he cofounded the company after a profoundly personal experience with caregiving. Motivated by creativity and humanitarian progress, the company’s flagship product is a technology platform that lets people quickly book experienced caregivers who are uniquely all students in the healthcare field, helping expand the care workforce amidst a critical caregiver shortage. Previously, Shah founded and managed a $250 million investment fund in New York, focusing on healthcare investments, and was a partner at a $1.5 billion private equity and hedge fund focusing on various sectors. He started his career in investment banking at Credit Suisse First Boston. How it works Asked about the AI tech, Shah said in an email to VentureBeat that the tech uses a large language model (LLM) is paired with a text-to-speech (TTS) model, which are connected to a telephony server that transcribes the speech of elderly users so that the LLM can understand and respond to him or her. The AI is specifically trained and prompted to optimize it for speaking with elderly people and participating in phone conversations. This includes thousands of phone and

CareYaya’s QuikTok is AI phone companion for lonely aging adults Read More »

Writer’s Palmyra X 004 takes the lead in AI function calling, surpassing tech giants

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Writer, the full-stack generative AI platform, unveiled its latest large language model (LLM) Palmyra X 004 today, marking a significant advancement in enterprise artificial intelligence. This new frontier model excels in function calling and workflow execution, key capabilities for building practical AI agents and assistants for businesses. The release of Palmyra X 004 arrives at a crucial juncture in the AI industry. Companies are racing to integrate generative AI into their operations, creating a growing demand for models that can not only process and generate text but also take actions and execute complex workflows. “We’re enabling AI to execute multiple functions and actions simultaneously, which is crucial for automating complex enterprise workflows,” said Waseem Alshikh, co-founder and CTO of Writer, in an interview with VentureBeat. “With Palmyra X 004, we’re moving from AI assistants that simply provide information to systems that can actually do work.” A diagram illustrating how Writer’s Palmyra X 004 AI model executes complex business tasks, from analyzing inventory data to sending summary emails, by coordinating multiple API calls and functions — a capability that sets it apart in the realm of enterprise AI solutions. (Credit: Writer) Outperforming tech giants: How Palmyra X 004 is raising the bar for AI function calling Palmyra X 004 distinguishes itself with its exceptional performance on function calling tasks. The model achieved a score of 78.76% on Berkeley’s Tool Calling Leaderboard, surpassing offerings from tech giants like OpenAI, Anthropic, Google, and Meta by nearly 20%. This benchmark evaluates a model’s ability to select appropriate tools, determine which APIs to call, and successfully execute tasks based on natural language inputs. The model’s capabilities extend beyond function calling. Palmyra X 004 also ranked in the top 10 on Stanford University’s Holistic Evaluation of Language Models (HELM) benchmark, scoring 86.1% on HELM Lite and 81.3% on HELM MMLU. These scores indicate strong general language understanding and reasoning abilities across a wide range of subjects. Writer claims to have achieved these results with a model containing only around 150 billion parameters — significantly smaller than some other frontier models rumored to have trillions of parameters. The company attributes this efficiency to its innovative use of synthetic data and a proprietary early stopping mechanism during training. Alshikh explained, “We’ve found a way to build highly capable models without relying on massive parameter counts or exorbitant training costs. Our model training costs were below a million dollars in GPU time for something above 100 billion parameters. We’re proving that you don’t need hundreds of billions of dollars to compete in the AI race.” This focus on efficiency could have major implications for the AI industry. As companies grapple with the high costs of deploying and running large language models, Writer’s approach suggests a path to more affordable and accessible enterprise AI solutions. Breaking barriers: Palmyra X 004’s multilingual and multimodal capabilities Palmyra X 004 boasts impressive technical specifications. It features a 128,000 token context window, allowing it to process and reason over very long documents or conversations. The model supports multilingual capabilities across 30+ languages and can handle multimodal inputs including text, images, and audio (though image and audio capabilities are still in beta). Writer offers multiple deployment options for Palmyra X 004, addressing a key concern for many enterprises: data privacy and control. Companies can access the model through Writer’s API, deploy it via cloud providers like AWS SageMaker and Nvidia AI Enterprise, or even host the model on-premises within their own infrastructure. The release of Palmyra X 004 reflects a broader shift in the AI landscape. While public attention has focused on consumer-facing chatbots and image generators, the real transformative potential of AI lies in its application to complex business processes. “We’re seeing a transition from using AI for simple tasks like summarizing emails to building complex, multi-step workflows,” Alshikh noted. “Our enterprise customers are looking to create AI agents that can interact with multiple internal systems, access varied data sources, and execute sophisticated business logic.” This vision of AI as a workflow automation tool aligns with broader industry trends. Gartner predicts that by 2025, 50% of enterprise applications will embed some form of AI functionality. Writer’s focus on function calling and agentic capabilities positions them well to capitalize on this trend. The future of AI: Writer’s vision for deeper, smarter, and more efficient models However, challenges remain. As AI systems become more deeply integrated into business processes, issues of reliability, explainability, and governance become paramount. Writer has attempted to address some of these concerns with built-in features like automatic data integration with retrieval augmented generation (RAG) and source transparency. The company emphasizes the importance of AI safety and control. Palmyra X 004 integrates with Writer’s existing suite of AI guardrails and governance tools, allowing enterprises to set content policies and control the model’s outputs. Looking ahead, Alshikh hinted at Writer’s future research directions. The company is exploring ways to build even deeper transformer models, potentially with 500-2000 layers, which they believe could lead to significant improvements in reasoning capabilities. “We’re at an inflection point in AI development,” Alshikh said. “The next frontier isn’t just about making models bigger, but making them smarter and more efficient. We’re focusing on architectural innovations that can deliver better reasoning at lower inference costs.” As the AI arms race intensifies, Writer’s release of Palmyra X 004 serves as a reminder that innovation isn’t just about raw scale. By focusing on efficiency, ease of deployment, and real-world business applications, the company is charting a distinctive path in the enterprise AI market. The true test will be in how enterprises adopt and apply this technology. As businesses continue to explore the potential of generative AI, models like Palmyra X 004 could play a crucial role in turning the promise of AI-driven workflow automation into reality. source

Writer’s Palmyra X 004 takes the lead in AI function calling, surpassing tech giants Read More »

The ‘strawberrry’ problem: How to overcome AI’s limitations

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More By now, large language models (LLMs) like ChatGPT and Claude have become an everyday word across the globe. Many people have started worrying that AI is coming for their jobs, so it is ironic to see almost all LLM-based systems flounder at a straightforward task: Counting the number of “r”s in the word “strawberry.” They are not exclusively failing at the alphabet “r”; other examples include counting “m”s in “mammal”, and “p”s in “hippopotamus.” In this article, I will break down the reason for these failures and provide a simple workaround. LLMs are powerful AI systems trained on vast amounts of text to understand and generate human-like language. They excel at tasks like answering questions, translating languages, summarizing content and even generating creative writing by predicting and constructing coherent responses based on the input they receive. LLMs are designed to recognize patterns in text, which allows them to handle a wide range of language-related tasks with impressive accuracy. Despite their prowess, failing at counting the number of “r”s in the word “strawberry” is a reminder that LLMs are not capable of “thinking” like humans. They do not process the information we feed them like a human would. Conversation with ChatGPT and Claude about the number of “r”s in strawberry. Almost all the current high performance LLMs are built on transformers. This deep learning architecture doesn’t directly ingest text as their input. They use a process called tokenization, which transforms the text into numerical representations, or tokens. Some tokens might be full words (like “monkey”), while others could be parts of a word (like “mon” and “key”). Each token is like a code that the model understands. By breaking everything down into tokens, the model can better predict the next token in a sentence.  LLMs don’t memorize words; they try to understand how these tokens fit together in different ways, making them good at guessing what comes next. In the case of the word “hippopotamus,” the model might see the tokens of letters “hip,” “pop,” “o” and “tamus”, and not know that the word “hippopotamus” is made of the letters — “h”, “i”, “p”, “p”, “o”, “p”, “o”, “t”, “a”, “m”, “u”, “s”. A model architecture that can directly look at individual letters without tokenizing them may potentially not have this problem, but for today’s transformer architectures, it is not computationally feasible. Further, looking at how LLMs generate output text: They predict what the next word will be based on the previous input and output tokens. While this works for generating contextually aware human-like text, it is not suitable for simple tasks like counting letters. When asked to answer the number of “r”s in the word “strawberry”, LLMs are purely predicting the answer based on the structure of the input sentence. Here’s a workaround While LLMs might not be able to “think” or logically reason, they are adept at understanding structured text. A splendid example of structured text is computer code, of many many programming languages. If we ask ChatGPT to use Python to count the number of “r”s in “strawberry”, it will most likely get the correct answer. When there is a need for LLMs to do counting or any other task that may require logical reasoning or arithmetic computation, the broader software can be designed such that the prompts include asking the LLM to use a programming language to process the input query. Conclusion A simple letter counting experiment exposes a fundamental limitation of LLMs like ChatGPT and Claude. Despite their impressive capabilities in generating human-like text, writing code and answering any question thrown at them, these AI models cannot yet “think” like a human. The experiment shows the models for what they are, pattern matching predictive algorithms, and not “intelligence” capable of understanding or reasoning. However, having a prior knowledge of what type of prompts work well can alleviate the problem to some extent. As the integration of AI in our lives increases, recognizing its limitations is crucial for responsible usage and realistic expectations of these models.  Chinmay Jog is a senior machine learning engineer at Pangiam. DataDecisionMakers Welcome to the VentureBeat community! DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation. If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers. You might even consider contributing an article of your own! Read More From DataDecisionMakers source

The ‘strawberrry’ problem: How to overcome AI’s limitations Read More »