VentureBeat

IP Copilot wants to use AI to turn your Slack messages into patents

IP Copilot, a startup using artificial intelligence to modernize intellectual property management, announced today it has raised $4.2 million in seed funding led by Salesforce Ventures and Preface Ventures, with participation from NextGen Ventures and Notation. The San Francisco-based company, founded by AI experts with over 1,000 patents between them, aims to streamline how enterprises discover and protect innovative ideas by analyzing internal communications and documents in real time.

“Everyone is an inventor,” said Austin Walters, CEO of IP Copilot, in an exclusive interview with VentureBeat. “Engineers are busier than ever and our goal is to minimize friction between ideas and patents, helping more innovators become inventors.”

Unlike other AI tools focused on patent drafting, IP Copilot emphasizes early discovery by integrating with platforms like Slack and Jira to identify potentially patentable ideas as they emerge in everyday work conversations.

How AI supercharges IP legal teams’ workflow

“At a large company, one IP counsel might be responsible for 10,000 employees. You can’t possibly read all the Slacks available to you every day, all your Jira tickets, and all the Confluence pages that change,” explained Jason Harrier, who recently joined as founder and general counsel after serving as Head of IP at Plaid. “Our tool gives patent teams the superpowers to actually read everything available to them and automatically categorize the best patent candidates.”

The company’s approach combines traditional machine learning with large language models, prioritizing accuracy over pure automation. “About 60% is traditional machine learning,” said Harrier. “We use what I think is the best AI for what it does well, and then use large language models where they work really well.”

To address privacy concerns, the system only monitors public channels and can be deployed within an enterprise’s own cloud environment. “Everything is a first-party system with us,” Walters emphasized. “We’re not sending communications to a third party.”

Enterprise IP management faces AI transformation

The funding comes at a critical time for enterprise IP management. As AI innovation accelerates, companies are struggling to identify and protect intellectual property effectively. While most AI startups in the space focus on automating patent drafting, IP Copilot’s emphasis on early discovery could reshape how companies build their patent portfolios.

The startup’s roadmap suggests broader ambitions. Plans include expanding into trade secret management and introducing natural language interfaces for portfolio analysis. These moves could position IP Copilot to become a comprehensive IP intelligence platform rather than just another legal tech tool.

But perhaps the company’s most striking innovation isn’t technological – it’s philosophical. In a landscape crowded with AI companies promising to replace human expertise, IP Copilot has chosen a different path. “AI isn’t going to take your job,” says Harrier, “but an attorney that’s using AI could take your job.” For patent professionals watching the AI revolution unfold, that distinction might make all the difference.
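
The workflow the founders describe, scanning public Slack channels and Jira tickets, scoring each item for patentability, and surfacing candidates to IP counsel, can be pictured with a short sketch. The Python below is purely illustrative; the scoring heuristic, message format, and threshold are assumptions, not IP Copilot's actual system:

```python
from dataclasses import dataclass

@dataclass
class Message:
    channel: str       # only public channels are scanned, per the company
    author: str
    text: str

def patentability_score(text: str) -> float:
    """Stand-in for a trained classifier/LLM that rates how novel an idea sounds."""
    signal_words = ("invented", "novel", "new approach", "prototype", "algorithm")
    hits = sum(word in text.lower() for word in signal_words)
    return min(1.0, hits / len(signal_words))

def surface_candidates(messages: list[Message], threshold: float = 0.4) -> list[Message]:
    """Flag messages that look like patentable ideas for IP counsel to review."""
    return [m for m in messages if patentability_score(m.text) >= threshold]

# Usage sketch with made-up messages:
inbox = [
    Message("#eng-infra", "ada", "We invented a novel caching algorithm for the prototype."),
    Message("#random", "bo", "Lunch at noon?"),
]
for candidate in surface_candidates(inbox):
    print(f"Review: {candidate.channel} / {candidate.author}: {candidate.text}")
```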


OpenAI’s hyper realistic AI video generator Sora launches today, MKBHD reports

OpenAI announced the public release of its hyperrealistic AI video generation software Sora today — nearly 10 months after it was first shown publicly in February 2024. In fact, OpenAI is actually releasing a much upgraded model from the one debuted back then: the new Sora Turbo will be available at sora.com to ChatGPT Plus and Pro paying subscribers ($20/month or $200/month) for those in the U.S. and most countries outside of the EU and UK.

OpenAI cofounder and CEO Sam Altman presented the news in a YouTube livestream, part of the company’s “12 Days of OpenAI” series of holiday-themed announcements scheduled for 1 pm ET / 10 am PT.

Sora can generate a wide range of videos from text inputs or still images, creating clips between 10 and 20 seconds long in a range of resolutions from 480p to 1080p, as well as aspect ratios from landscape to square and vertical. OpenAI created a whole new interface for the product, which includes a grid or list view the user can toggle between to see their generations. Users can also enter a mode called Storyboarding, which lets them generate multiple linked clips in a Timeline view. The model attempts to provide a seamless transition between the clips — users can drag to make cuts more abrupt or make takes longer and more fluid.

ChatGPT Plus users can generate up to 50 videos per month at 480p resolution. For professionals and heavy users, the Pro plan offers higher resolutions, longer durations, and unlimited generations at slow speeds. OpenAI also announced plans to release tailored pricing options for diverse user needs by early 2025.

News broken by MKBHD

Popular tech reviewing YouTuber Marques Brownlee, better known by his handle MKBHD, broke the news of Sora’s release about an hour beforehand. “The rumors are true — SORA, OpenAI’s AI video generator, is launching for the public today…” Brownlee wrote in a post on the social network X.

Brownlee also shared a thread of examples of videos he made using the text/image/video-to-video generator, to which he was given early access as one of several dozen early creative partners to whom OpenAI seeded the program before its general release.

Brownlee shared that while Sora could produce impressive and sometimes eerily realistic footage such as that of newscasters or a gadget reviewer like himself, it also tends to hallucinate random details and telltale signs of being AI-generated, such as garbled, nonsensical text in news chyrons, unnatural physics, and even adding or removing objects seemingly at random. He also noted that OpenAI imposes fairly strict guardrails against generating likenesses of real people and against violence and explicit themes.

Credit: MKBHD/YouTube

Still, in his full YouTube review, he ultimately concluded that “this is a lot for humanity to digest now…[it] is the new baseline, this is once again the worst that it will ever be.”

Leaked on Hugging Face in protest by early testers

The release follows a leak of Sora onto the AI code sharing community Hugging Face by beta testers roughly two weeks ago, in protest of OpenAI’s handling of the beta testing program. As the leakers wrote on their Hugging Face space: “Hundreds of artists provide unpaid labor through bug testing, feedback and experimental work for the program for a $150B valued company. While hundreds contribute for free, a select few will be chosen through a competition to have their Sora-created films screened — offering minimal compensation which pales in comparison to the substantial PR and marketing value OpenAI receives.”

Sora also arrives in the midst of an increasingly competitive landscape for realistic, live-action AI video generation. Runway continues to upgrade its AI video generation platform rapidly with new features including, just last week, the ability to re-record dialog in pre-existing footage and have the characters’ faces match. Luma AI and Chinese competitors such as Kling, Hailuo, and recently, Tencent, have all fielded impressive AI video generation tools in the last few weeks alone.

So even though OpenAI — by virtue of its success with ChatGPT and early, eye-catching Sora footage — may have strong recognition that can help popularize the launch of this new AI video generator to the masses, there are now many competing options that appear, at least superficially, to offer similar or better video quality. That makes Sora less of a guaranteed success.


Google’s new Trillium AI chip delivers 4x speed and powers Gemini 2.0

Google has just unveiled Trillium, its sixth-generation artificial intelligence accelerator chip, claiming performance improvements that could fundamentally alter the economics of AI development while pushing the boundaries of what’s possible in machine learning.

The custom processor, which powered the training of Google’s newly announced Gemini 2.0 AI model, delivers four times the training performance of its predecessor while using significantly less energy. This breakthrough comes at a crucial moment, as tech companies race to build increasingly sophisticated AI systems that require enormous computational resources.

“TPUs powered 100% of Gemini 2.0 training and inference,” Sundar Pichai, Google’s CEO, explained in an announcement post highlighting the chip’s central role in the company’s AI strategy. The scale of deployment is unprecedented: Google has connected more than 100,000 Trillium chips in a single network fabric, creating what amounts to one of the world’s most powerful AI supercomputers.

How Trillium’s 4x performance boost is transforming AI development

Trillium’s specifications represent significant advances across multiple dimensions. The chip delivers a 4.7x increase in peak compute performance per chip compared to its predecessor, while doubling both high-bandwidth memory capacity and interchip interconnect bandwidth. Perhaps most importantly, it achieves a 67% increase in energy efficiency — a crucial metric as data centers grapple with the enormous power demands of AI training.

“When training the Llama-2-70B model, our tests demonstrate that Trillium achieves near-linear scaling from a 4-slice Trillium-256 chip pod to a 36-slice Trillium-256 chip pod at a 99% scaling efficiency,” said Mark Lohmeyer, VP of compute and AI infrastructure at Google Cloud. This level of scaling efficiency is particularly remarkable given the challenges typically associated with distributed computing at this scale.

The economics of innovation: Why Trillium changes the game for AI startups

Trillium’s business implications extend beyond raw performance metrics. Google claims the chip provides up to a 2.5x improvement in training performance per dollar compared to its previous generation, potentially reshaping the economics of AI development. This cost efficiency could prove particularly significant for enterprises and startups developing large language models.

AI21 Labs, an early Trillium customer, has already reported significant improvements. “The advancements in scale, speed, and cost-efficiency are significant,” noted Barak Lenz, CTO of AI21 Labs, in the announcement.

Scaling new heights: Google’s 100,000-chip AI supernetwork

Google’s deployment of Trillium within its AI Hypercomputer architecture demonstrates the company’s integrated approach to AI infrastructure. The system combines over 100,000 Trillium chips with a Jupiter network fabric capable of 13 petabits per second of bisectional bandwidth — enabling a single distributed training job to scale across hundreds of thousands of accelerators.

“The growth of flash usage has been more than 900% which has been incredible to see,” noted Logan Kilpatrick, a product manager on Google’s AI studio team, during the developer conference, highlighting the rapidly increasing demand for AI computing resources.
Beyond Nvidia: Google’s bold move in the AI chip wars

The release of Trillium intensifies the competition in AI hardware, where Nvidia has dominated with its GPU-based solutions. While Nvidia’s chips remain the industry standard for many AI applications, Google’s custom silicon approach could provide advantages for specific workloads, particularly in training very large models.

Industry analysts suggest that Google’s massive investment in custom chip development reflects a strategic bet on the growing importance of AI infrastructure. The company’s decision to make Trillium available to cloud customers indicates a desire to compete more aggressively in the cloud AI market, where it faces strong competition from Microsoft Azure and Amazon Web Services.

Powering the future: What Trillium means for tomorrow’s AI

The implications of Trillium’s capabilities extend beyond immediate performance gains. The chip’s ability to handle mixed workloads efficiently — from training massive models to running inference for production applications — suggests a future where AI computing becomes more accessible and cost-effective.

For the broader tech industry, Trillium’s release signals that the race for AI hardware supremacy is entering a new phase. As companies push the boundaries of what’s possible with artificial intelligence, the ability to design and deploy specialized hardware at scale could become an increasingly critical competitive advantage.

“We’re still in the early stages of what’s possible with AI,” Demis Hassabis, CEO of Google DeepMind, wrote in the company blog post. “Having the right infrastructure — both hardware and software — will be crucial as we continue to push the boundaries of what AI can do.”

As the industry moves toward more sophisticated AI models that can act autonomously and reason across multiple modes of information, the demands on the underlying hardware will only increase. With Trillium, Google has demonstrated that it intends to remain at the forefront of this evolution, investing in the infrastructure that will power the next generation of AI advancement.
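
For context on the scaling claim, here is a minimal Python sketch of how a scaling-efficiency figure like Lohmeyer's 99% is typically computed (measured speedup divided by ideal linear speedup). The throughput numbers are invented purely for illustration, not Google's published data:

```python
def scaling_efficiency(base_slices: int, scaled_slices: int,
                       base_throughput: float, scaled_throughput: float) -> float:
    """Return scaling efficiency: measured speedup / ideal (linear) speedup."""
    ideal_speedup = scaled_slices / base_slices          # e.g. 36 / 4 = 9x
    measured_speedup = scaled_throughput / base_throughput
    return measured_speedup / ideal_speedup

# Hypothetical throughputs (tokens/sec) for a 4-slice vs. 36-slice Trillium-256 pod.
# Real numbers were not disclosed; these are chosen only to illustrate ~99% efficiency.
eff = scaling_efficiency(4, 36, base_throughput=1_000_000, scaled_throughput=8_910_000)
print(f"Scaling efficiency: {eff:.1%}")  # -> Scaling efficiency: 99.0%
```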


The biggest news from Amazon Web Services (AWS) re:Invent 2024

Cloud computing leader Amazon Web Services’ (AWS) annual re:Invent conference for 2024 is taking place this week in Las Vegas, Nevada, and it’s shaping up to be the biggest of the series since it launched 12 years ago. Why? Generative AI, of course, and the increasing competition between tech giants and startups to offer useful tools to enterprises — AWS’s bread and butter.

VentureBeat’s senior AI reporter Emilia David is reporting directly from the conference and is joined remotely by the rest of us covering the most important news for business leaders and those looking to embrace and deploy the latest, most useful AWS technology. Here’s the biggest news we’ve found from the show so far:

AWS Brings Multi-Agent Orchestration to Bedrock: AWS has introduced multi-agent orchestration to its Bedrock platform, allowing enterprises to build collaborative AI agents and streamlined workflows. This upgrade enables companies like Moody’s to achieve more accurate analyses by coordinating specialized agents for complex tasks.

AWS says new Bedrock Automated Reasoning catches 100% of AI hallucinations: New features on Amazon Bedrock include Model Distillation for training smaller, faster AI models and Automated Reasoning Checks to reduce hallucinations. These tools aim to improve response accuracy and enable enterprises to create tailored models for specific needs.

AWS SageMaker Transforms Into a Combined Data and AI Hub: AWS unveiled the next generation of SageMaker, integrating analytics and ML tools into a unified platform. The upgrades, including Lakehouse and Unified Studio capabilities, allow enterprises to seamlessly link data from various sources for faster AI app development.

Amazon Launches Nova AI Model Family for Generating Text, Images, and Video: Amazon debuted the Nova family of generative AI models at re:Invent 2024, targeting text, image, and video creation. The Nova models, integrated with Bedrock, offer businesses customizable tools for creative content development and advanced AI applications.

Qodo Introduces AI Regression Testing Agent, Qodo Cover: Qodo launched its fully autonomous regression testing agent, Qodo Cover, to streamline software quality validation by automatically generating and validating test suites. The tool, built on Meta’s TestGen-LLM, recently demonstrated its capabilities by contributing production-quality tests accepted by Hugging Face, a major ML repository.

Amazon HyperPod Task Governance Keeps GPUs Running, Cutting Costs 40%: AWS introduced HyperPod Task Governance, a feature for SageMaker HyperPod that optimizes GPU usage and reduces idle time, cutting AI infrastructure costs by up to 40%. By intelligently managing resource allocation and prioritizing tasks, the system ensures higher utilization rates, even during off-peak hours, addressing a critical efficiency challenge for enterprises scaling AI initiatives.

AWS Now Allows Prompt Caching with 90% Cost Reduction: AWS announced Intelligent Prompt Routing and Prompt Caching on Bedrock, offering cost savings of up to 30% and 90%, respectively, for running AI applications. Intelligent Prompt Routing optimizes prompt handling by directing queries to appropriately sized models, while Prompt Caching reduces token generation costs by storing common queries for reuse, significantly lowering expenses and latency for enterprises.

AWS Debuts Advanced RAG Features for Structured and Unstructured Data: AWS unveiled new tools at re:Invent 2024 to simplify retrieval augmented generation (RAG) workflows for both structured and unstructured data, including Amazon Bedrock Knowledge Bases and GraphRAG. These features automate complex tasks like generating SQL queries and creating knowledge graphs, enabling enterprises to build more accurate, intelligent AI applications without custom coding or expertise.

Tackling Unstructured Data with Amazon Bedrock Data Automation: AWS introduced Bedrock Data Automation to transform unstructured data — like PDFs, audio, and videos — into structured formats ready for generative AI use cases. This gen AI-powered ETL (extract, transform and load) tool processes multimodal content at scale, streamlining data preparation and expanding AI’s ability to leverage diverse enterprise datasets.

This year’s AWS announcements highlight the company’s efforts to empower enterprises with advanced AI, data analytics, and generative tools. Explore these innovations to stay ahead in the AI race.
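
The Prompt Caching announcement describes a familiar pattern: store responses to repeated or common queries so they are not regenerated (and billed) each time. A minimal illustrative sketch of that idea in Python follows; it is not AWS's API, and the client object and cache policy are assumptions:

```python
import hashlib

class CachedPromptClient:
    """Illustrative wrapper that caches model responses for repeated prompts."""

    def __init__(self, model_client):
        self.model_client = model_client   # assumed to expose .generate(prompt) -> str
        self.cache: dict[str, str] = {}

    def _key(self, prompt: str) -> str:
        # Hash the normalized prompt so equivalent queries hit the same entry.
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    def generate(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.cache:              # cache hit: no new tokens generated
            return self.cache[key]
        response = self.model_client.generate(prompt)
        self.cache[key] = response         # store for reuse on the next identical query
        return response
```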


Google Gemini 2.0: Could this be the beginning of truly autonomous AI?

Google unveiled Gemini 2.0 today, marking an ambitious leap toward AI systems that can independently complete complex tasks and introducing native image generation and multilingual audio capabilities — features that position the tech giant for direct competition with OpenAI and Anthropic in an increasingly heated race for AI dominance.

The release arrives almost exactly one year after Google’s initial Gemini launch, emerging during a pivotal moment in artificial intelligence development. Rather than simply responding to queries, these new “agentic” AI systems can understand nuanced context, plan multiple steps ahead, and take supervised actions on behalf of users.

How Google’s new AI assistant could reshape daily digital life

During a recent press conference, Tulsee Doshi, director of product management for Gemini, outlined the system’s enhanced capabilities while demonstrating real-time image generation and multilingual conversations. “Gemini 2.0 brings enhanced performance and new capabilities like native image and multilingual audio generation,” Doshi explained. “It also has native intelligent tool use, which means that it can directly access Google products like search or even execute code.”

The initial release centers on Gemini 2.0 Flash, an experimental version that Google claims operates at twice the speed of its predecessor while surpassing the capabilities of more powerful models. This represents a significant technical achievement, as previous speed improvements typically came at the cost of reduced functionality.

Inside the new generation of AI agents that promise to transform how we work

Perhaps most significantly, Google introduced three prototype AI agents built on Gemini 2.0’s architecture that demonstrate the company’s vision for AI’s future. Project Astra, an updated universal AI assistant, showcased its ability to maintain complex conversations across multiple languages while accessing Google tools and maintaining contextual memory of previous interactions.

“Project Astra now has up to 10 minutes of in-session memory, and can remember conversations you’ve had with it in the past, so you can have a more helpful, personalized experience,” explained Bibo Xu, group product manager at Google DeepMind, during a live demonstration. The system smoothly transitioned between languages and accessed real-time information through Google Search and Maps, suggesting a level of integration previously unseen in consumer AI products.

For developers and enterprise customers, Google introduced Project Mariner and Jules, two specialized AI agents designed to automate complex technical tasks. Project Mariner, demonstrated as a Chrome extension, achieved an impressive 83.5% success rate on the WebVoyager benchmark for real-world web tasks — a significant improvement over previous attempts at autonomous web navigation.

“Project Mariner is an early research prototype that explores agent capabilities for browsing the web and taking action,” said Jaclyn Konzelmann, director of product management at Google Labs. “When evaluated against the WebVoyager benchmark, which tests agent performance on end-to-end, real-world web tasks, Project Mariner achieved the impressive results of 83.5%.”

Custom silicon and massive scale: The infrastructure behind Google’s AI ambitions

Supporting these advances is Trillium, Google’s sixth-generation Tensor Processing Unit (TPU), which becomes generally available to cloud customers today. The custom AI accelerator represents a massive investment in computational infrastructure, with Google deploying over 100,000 Trillium chips in a single network fabric.

Logan Kilpatrick, a product manager on the AI studio and Gemini API team, highlighted the practical impact of this infrastructure investment during the press conference. “The growth of flash usage has been more than 900% which has been incredible to see,” Kilpatrick said. “You know, we’ve had like six experimental model launches in the last few months, there’s now millions of developers who are using Gemini.”

The road ahead: Safety concerns and competition in the age of autonomous AI

Google’s shift toward autonomous agents represents perhaps the most significant strategic pivot in artificial intelligence since OpenAI’s release of ChatGPT. While competitors have focused on enhancing the capabilities of large language models, Google is betting that the future belongs to AI systems that can actively navigate digital environments and complete complex tasks with minimal human intervention.

This vision of AI agents that can think, plan, and act marks a departure from the current paradigm of reactive AI assistants. It’s a risky bet — autonomous systems bring inherently greater safety concerns and technical challenges — but one that could reshape the competitive landscape if successful. The company’s massive investment in custom silicon and infrastructure suggests it’s prepared to compete aggressively in this new direction.

However, the transition to more autonomous AI systems raises new safety and ethical concerns. Google has emphasized its commitment to responsible development, including extensive testing with trusted users and built-in safety measures. The company’s approach to rolling out these features gradually, starting with developer access and trusted testers, suggests an awareness of the potential risks involved in deploying autonomous AI systems.

The release comes at a crucial moment for Google, as it faces increasing pressure from competitors and heightened scrutiny over AI safety. Microsoft and OpenAI have made significant strides in AI development this year, while other companies like Anthropic have gained traction with enterprise customers.

“We firmly believe that the only way to build AI is to be responsible from the start,” emphasized Shrestha Basu Mallick, group product manager for the Gemini API, during the press conference. “We’ll continue to prioritize making safety and responsibility a key element of our model development process as we advance our models and agents.”

As these systems become more capable of taking action in the real world, they could fundamentally reshape how people interact with technology. The success of Gemini 2.0 could determine not only Google’s position in the AI market but also the broader trajectory of AI development as the industry moves toward more autonomous systems.

One year ago, when Google launched the first version of Gemini, the AI landscape was dominated by chatbots that could engage in clever conversation but struggled with real-world tasks. Now, as AI agents begin to take their first tentative steps toward autonomy, the industry stands at another inflection point. The question is no


Liquid AI’s new STAR model architecture outshines Transformer efficiency

As rumors and reports swirl about the difficulty facing top AI companies in developing newer, more powerful large language models (LLMs), the spotlight is increasingly shifting toward alternative architectures to the Transformer — the tech underpinning most of the current generative AI boom, introduced by Google researchers in the seminal 2017 paper “Attention Is All You Need.” As described in that paper and since, a Transformer is a deep learning neural network architecture that processes sequential data, such as text or time-series information.

Now, MIT-birthed startup Liquid AI has introduced STAR (Synthesis of Tailored Architectures), an innovative framework designed to automate the generation and optimization of AI model architectures. The STAR framework leverages evolutionary algorithms and a numerical encoding system to address the complex challenge of balancing quality and efficiency in deep learning models.

According to Liquid AI’s research team, which includes Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli, and Michael Poli, STAR’s approach represents a shift from traditional architecture design methods. Instead of relying on manual tuning or predefined templates, STAR uses a hierarchical encoding technique — referred to as “STAR genomes” — to explore a vast design space of potential architectures. These genomes enable iterative optimization processes such as recombination and mutation, allowing STAR to synthesize and refine architectures tailored to specific metrics and hardware requirements.

90% cache size reduction versus traditional ML Transformers

Liquid AI’s initial focus for STAR has been on autoregressive language modeling, an area where traditional Transformer architectures have long been dominant. In tests conducted during their research, the Liquid AI research team demonstrated STAR’s ability to generate architectures that consistently outperformed highly optimized Transformer++ and hybrid models.

For example, when optimizing for quality and cache size, STAR-evolved architectures achieved cache size reductions of up to 37% compared to hybrid models and 90% compared to Transformers. Despite these efficiency improvements, the STAR-generated models maintained or exceeded the predictive performance of their counterparts.

Similarly, when tasked with optimizing for model quality and size, STAR reduced parameter counts by up to 13% while still improving performance on standard benchmarks. The research also highlighted STAR’s ability to scale its designs. A STAR-evolved model scaled from 125 million to 1 billion parameters delivered comparable or superior results to existing Transformer++ and hybrid models, all while significantly reducing inference cache requirements.

Re-architecting AI model architecture

Liquid AI stated that STAR is rooted in a design theory that incorporates principles from dynamical systems, signal processing, and numerical linear algebra. This foundational approach has enabled the team to develop a versatile search space for computational units, encompassing components such as attention mechanisms, recurrences, and convolutions. One of STAR’s distinguishing features is its modularity, which allows the framework to encode and optimize architectures across multiple hierarchical levels.
This capability provides insights into recurring design motifs and enables researchers to identify effective combinations of architectural components.

What’s next for STAR?

STAR’s ability to synthesize efficient, high-performing architectures has potential applications far beyond language modeling. Liquid AI envisions this framework being used to tackle challenges in various domains where the balance between quality and computational efficiency is critical.

While Liquid AI has yet to disclose specific plans for commercial deployment or pricing, the research findings signal a significant advancement in the field of automated architecture design. For researchers and developers looking to optimize AI systems, STAR could represent a powerful tool for pushing the boundaries of model performance and efficiency.

With its open research approach, Liquid AI has published the full details of STAR in a peer-reviewed paper, encouraging collaboration and further innovation. As the AI landscape continues to evolve, frameworks like STAR are poised to play a key role in shaping the next generation of intelligent systems. STAR might even herald the birth of a new post-Transformer architecture boom — a welcome winter holiday gift for the machine learning and AI research community.
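
The core loop Liquid AI describes (encode candidate architectures as genomes, then iterate recombination, mutation, and evaluation against quality and efficiency metrics) is a standard evolutionary-search pattern. Liquid AI has not released STAR's code, so the Python below is only a generic sketch of that pattern; the genome format, toy fitness function, and selection policy are all assumptions:

```python
import random

# A "genome" here is just a list of integer-coded architectural choices
# (e.g., block type, width, cache strategy). STAR's real encoding is hierarchical.
GENOME_LENGTH = 16
NUM_CHOICES = 8

def random_genome():
    return [random.randrange(NUM_CHOICES) for _ in range(GENOME_LENGTH)]

def mutate(genome, rate=0.1):
    return [random.randrange(NUM_CHOICES) if random.random() < rate else g for g in genome]

def recombine(a, b):
    cut = random.randrange(1, GENOME_LENGTH)
    return a[:cut] + b[cut:]

def evaluate(genome):
    """Placeholder fitness: in STAR this would score the decoded architecture
    on quality (e.g., perplexity) and efficiency (e.g., cache size) objectives."""
    return -sum(genome)  # toy objective: smaller codes count as "more efficient"

def evolve(pop_size=32, generations=20):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[: pop_size // 4]                  # keep the top quarter
        children = [mutate(recombine(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)

best = evolve()
print("Best genome:", best)
```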


Qodo’s fully autonomous agent tackles the complexities of regression testing

Code is continuously evolving in the software development process, requiring ongoing testing for quality and maintainability. This is the root of regression testing, in which existing tests are re-run to ensure that modified code continues to function as intended. However, regression testing can be time-consuming and complex, and may often be neglected in favor of other priorities.

Easing the pain of software testing

Qodo (formerly CodiumAI) says it can ease headaches in the process with the release today of its new fully autonomous AI regression testing agent, Qodo Cover. Its agent creates validation suites to ensure that software applications are, essentially, behaving. The two-and-a-half-year-old startup announced its new tool at AWS re:Invent, where it also pitched as a finalist in an AWS Unicorn Tank competition.

“We’re moving toward a place where AI doesn’t just write code — it helps tackle the majority of developers’ workload by proving that code functions correctly,” Qodo CEO Itamar Friedman told VentureBeat.

Supporting the next big leap in software development

Qodo explained earlier this year at VentureBeat Transform that it is approaching AI agents in an incremental fashion — taking on competitors such as Devin that offer more end-to-end suites. The Israeli startup offers numerous small agents that handle specific tasks within software development workflows.

Qodo Cover is the newest of these. The fully autonomous agent analyzes source code and performs regression tests to validate it as it changes throughout its lifecycle. The platform ensures that each test runs successfully, passes and increases the amount of code it covers — and only keeps those that meet all three criteria.

It’s estimated that enterprise developers spend only an hour a day actually writing code; the rest of their time goes to other crucial tasks such as testing and review, Friedman pointed out. However, “many companies are rushing to generate code with AI, focusing on that one hour while ignoring the rest of the equation.” Traditional testing approaches simply don’t scale, he noted. This can stall the next leap in software development, where AI can reliably generate 80% or more of high-quality code. “Just like how hardware verification revolutionized chip manufacturing a few decades ago, we’re now at a similar inflection point with software. When 25% or more of code is AI-generated, we need new paradigms to ensure reliability.”

Hugging Face-approved

Demonstrating its ability to generate production-quality tests, a pull request generated fully autonomously by Qodo Cover was recently accepted into Hugging Face’s PyTorch Image Models repository. Pull requests are a means of quality control in software development, allowing collaborators to propose and review changes before they are integrated into a codebase. This can keep bad code and bugs out of the main codebase to ensure quality and consistency.

The acceptance by Hugging Face validates Qodo’s offering and exposes it to more than 40,000 projects in the popular machine learning (ML) repository.

“Getting a contribution accepted into a major open-source project is a signal that AI agents are beginning to operate at the level of professional developers when it comes to understanding complex codebases and maintaining high standards for quality,” said Friedman. “It’s a peek into how software development will evolve.”

Qodo Cover is built on an open-source project that Qodo launched in May. That project was based on TestGen-LLM, a tool developed by Meta researchers to fully automate test coverage. To overcome challenges with large language model (LLM)-generated tests, the researchers set out to answer specific questions:

Does the test compile and run properly?
Does the test increase code coverage?

Once those questions are validated, it’s important to perform a manual investigation, Friedman writes in a blog post. This involves asking:

How well is the test written?
How much value does it actually add?
Does it meet any additional requirements?

Users provide several inputs to Qodo Cover, including:

The source file for code to be tested
Existing test suite
Coverage report
Command for building and running suites
Code coverage targets and maximum number of iterations to run
Additional context and prompting options

Qodo Cover then generates more tests in the same style, validates them using the runtime environment (i.e., do they build and pass?), reviews metrics such as increased code coverage, and updates existing test suites and coverage reports. This is repeated until code either reaches the coverage threshold or the maximum number of iterations is reached.

Giving devs full control, providing progress reports

Qodo’s agent can be deployed as a comprehensive tool that analyzes full repositories to identify gaps and irregularities and extend test suites. Or, it can be established as a GitHub action that creates pull requests automatically to suggest tests for newly changed code. Qodo emphasizes that developers maintain full control and have the ability to review and selectively accept tests. Each pull request also includes detailed coverage progress reports.

Qodo Cover supports all popular AI models, including GPT-4o and Claude 3.5 Sonnet. The company says it delivers high-quality results across more than a dozen programming languages, including JavaScript, TypeScript, C++, C#, Ruby, Go and Rust. It is intended to integrate with Qodo Merge, which reviews and handles pull requests, and coding tool Qodo Gen.
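
The generate-validate-iterate loop described above maps onto a simple control structure. The following Python sketch is purely illustrative and is not Qodo's implementation; the helper callables, their signatures, and the acceptance criteria shown are assumptions based on the article's description:

```python
def extend_test_suite(source_file, test_suite, coverage_target, max_iterations,
                      generate_candidate_tests, run_suite, measure_coverage):
    """Iteratively keep only generated tests that build, pass, and raise coverage.

    The three callables are assumed to be provided by the surrounding tooling:
    - generate_candidate_tests(source_file, test_suite) -> list of test snippets
    - run_suite(test_suite) -> True if every test builds and passes
    - measure_coverage(source_file, test_suite) -> float in [0, 1]
    """
    coverage = measure_coverage(source_file, test_suite)
    for _ in range(max_iterations):
        if coverage >= coverage_target:
            break
        for candidate in generate_candidate_tests(source_file, test_suite):
            trial_suite = test_suite + [candidate]
            if not run_suite(trial_suite):          # must compile and pass
                continue
            new_coverage = measure_coverage(source_file, trial_suite)
            if new_coverage > coverage:             # must increase coverage
                test_suite, coverage = trial_suite, new_coverage
    return test_suite, coverage
```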


Amazon HyperPod Task Governance keeps GPUs running, cutting costs 40%

Cost remains a primary concern of enterprise AI usage, and it’s a challenge that AWS is tackling head-on. At the AWS re:Invent 2024 conference today, the cloud giant announced HyperPod Task Governance, a sophisticated solution targeting one of the most expensive inefficiencies in enterprise AI operations: underutilized GPU resources.

According to AWS, HyperPod Task Governance can increase AI accelerator utilization, helping enterprises optimize AI costs and potentially producing significant savings.

“This innovation helps you maximize computer resource utilization by automating the prioritization and management of these Gen AI tasks, reducing the cost by up to 40%,” said Swami Sivasubramanian, VP of AI and Data at AWS.

End GPU idle time

As organizations rapidly scale their AI initiatives, many are discovering a costly paradox. Despite heavy investments in GPU infrastructure to power various AI workloads, including training, fine-tuning and inference, these expensive computing resources frequently sit idle. Enterprise leaders report surprisingly low utilization rates across their AI projects, even as teams compete for computing resources.

As it turns out, it’s actually a challenge that AWS itself faced. “Internally, we had this kind of problem as we were scaling up more than a year ago, and we built a system that takes into account the consumption needs of these accelerators,” Sivasubramanian told VentureBeat. “I talked to many of our customers, CIOs and CEOs, they said we want exactly that; we want it as part of SageMaker and that’s what we are launching.”

Sivasubramanian said that once the system was deployed, AWS’ AI accelerator utilization went through the roof, with utilization rates rising over 90%.

How HyperPod Task Governance works

The SageMaker HyperPod technology was first announced at the re:Invent 2023 conference. SageMaker HyperPod is built to handle the complexity of training large models with billions or tens of billions of parameters, which requires managing large clusters of machine learning accelerators.

HyperPod Task Governance adds a new layer of control to SageMaker HyperPod by introducing intelligent resource allocation across different AI workloads. The system recognizes that different AI tasks have varying demand patterns throughout the day. For instance, inference workloads typically peak during business hours when applications see the most use, while training and experimentation can be scheduled during off-peak hours.

The system provides enterprises with real-time insights into project utilization, team resource consumption, and compute needs. It enables organizations to effectively load balance their GPU resources across different teams and projects, ensuring that expensive AI infrastructure never sits idle.

AWS wants to make sure enterprises don’t leave money on the table

Sivasubramanian highlighted the critical importance of AI cost management during his keynote address. As an example, he said that if an organization has a thousand AI accelerators deployed, not all of them are utilized consistently over a 24-hour period. During the day, they are heavily used for inference, but at night, when inference demand might be very low, a large portion of these costly resources sit idle.
“We live in a world where compute resources are finite and expensive and it can be difficult to maximize utilization and efficiently allocate resources, which is typically done through spreadsheets and calendars,” he said. “Now, without a strategic approach to resource allocation, you’re not only missing opportunities, but you’re also leaving money on the table.”
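
The governance idea, filling otherwise idle accelerators with lower-priority work while urgent inference jobs are scheduled first, resembles a classic priority-queue scheduler. The Python below is a generic toy sketch of that pattern, not AWS's implementation; the task fields and scheduling policy are assumptions:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                      # lower number = more urgent (e.g., 0 = inference)
    name: str = field(compare=False)
    gpus_needed: int = field(compare=False)

class GpuScheduler:
    """Toy scheduler: always runs the highest-priority tasks that fit the free GPUs."""

    def __init__(self, total_gpus: int):
        self.total_gpus = total_gpus
        self.queue: list[Task] = []

    def submit(self, task: Task) -> None:
        heapq.heappush(self.queue, task)

    def schedule(self) -> list[Task]:
        free = self.total_gpus
        running, deferred = [], []
        while self.queue and free > 0:
            task = heapq.heappop(self.queue)
            if task.gpus_needed <= free:
                running.append(task)
                free -= task.gpus_needed
            else:
                deferred.append(task)      # keep for the next scheduling pass
        for task in deferred:
            heapq.heappush(self.queue, task)
        return running

sched = GpuScheduler(total_gpus=8)
sched.submit(Task(0, "customer-inference", 4))   # business-hours inference
sched.submit(Task(2, "nightly-fine-tune", 6))    # off-peak training, lower priority
print([t.name for t in sched.schedule()])        # inference runs first; training waits
```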


AWS says new Bedrock Automated Reasoning catches 100% of AI hallucinations

AWS announced more updates for Bedrock aimed at spotting hallucinations and building smaller models faster, as enterprises want more customization and accuracy from their models.

During re:Invent 2024, AWS announced Amazon Bedrock Model Distillation and Automated Reasoning Checks, in preview for enterprise customers interested in training smaller models and catching hallucinations.

Amazon Bedrock Model Distillation will let users employ a larger AI model to train a smaller one, offering enterprises access to a model they feel would work best with their workload. Larger models, such as Llama 3.1 405B, have more knowledge but are slow and unwieldy. A smaller model responds faster but most often has limited knowledge. AWS said Bedrock Model Distillation will simplify the process of transferring a bigger model’s knowledge to a smaller one without sacrificing response time.

Users can select the heavier-weight model they want, find a smaller model within the same family (like Llama or Claude, which each offer a range of model sizes), and write out sample prompts. Bedrock will generate responses and fine-tune the smaller model, and continue to make more sample data to finish distilling the larger model’s knowledge. Right now, model distillation works with Anthropic, Amazon and Meta models. Bedrock Model Distillation is currently in preview.

Why enterprises are interested in model distillation

For enterprises that want a faster response model — such as one that can quickly answer customer questions — there must be a balance between knowing a lot and responding quickly. While they can choose to use a smaller version of a large model, AWS is banking that more enterprises want more customization in the kinds of models — both the larger and smaller ones — that they use. AWS, which offers a choice of models in Bedrock’s model garden, hopes enterprises will want to choose any model family and train a smaller model for their needs.

Many organizations, mostly model providers, use model distillation to train smaller models. However, AWS said the process usually entails a lot of machine learning expertise and manual fine-tuning. Model providers such as Meta have used model distillation to bring a broader knowledge base to a smaller model. Nvidia leveraged distillation and pruning techniques to make Llama 3.1-Minitron 4B, a small language model it said performs better than similar-sized models. Model distillation is not new for Amazon, which has been working on model distillation methods since 2020.

Catching factual errors faster

Hallucinations remain an issue for AI models, even though enterprises have created workarounds like fine-tuning and limiting what models will respond to. However, even the most fine-tuned model that only performs retrieval augmented generation (RAG) tasks with a data set can still make mistakes.

AWS’s solution is Automated Reasoning checks on Bedrock, which uses mathematical validation to prove that a response is correct. “Automated Reasoning checks is the first and only generative AI safeguard that helps prevent factual errors due to hallucinations using logically accurate and verifiable reasoning,” AWS said. “By increasing the trust that customers can place in model responses, Automated Reasoning checks opens generative AI up to new use cases where accuracy is paramount.”

Customers can access Automated Reasoning checks from Amazon Bedrock Guardrails, the product that brings responsible AI and fine-tuning to models. Researchers and developers often use automated reasoning to arrive at precise answers for complex problems involving math.

Users have to upload their data, and Bedrock will develop the rules for the model to follow and guide customers to ensure the model is tuned to them. Once that setup is complete, Automated Reasoning checks on Bedrock will verify the responses from the model. If a response is incorrect, Bedrock will suggest a new answer.

AWS CEO Matt Garman said during his keynote that automated checks ensure an enterprise’s data remains its differentiator, with its AI models reflecting that data accurately.
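
Conceptually, the distillation workflow AWS describes (prompt a large "teacher" model on sample prompts, collect its responses, and fine-tune a smaller "student" model on those pairs) can be sketched in a few lines. This is a generic illustration in Python, not the Bedrock API; the callables and their behavior are assumptions:

```python
def build_distillation_dataset(teacher_generate, sample_prompts):
    """Collect (prompt, teacher_response) pairs to use as fine-tuning data.

    teacher_generate is assumed to be a callable that sends a prompt to the
    larger model and returns its text response.
    """
    return [{"prompt": p, "completion": teacher_generate(p)} for p in sample_prompts]

def distill(teacher_generate, fine_tune_student, sample_prompts):
    """Fine-tune the smaller model on the teacher's outputs (one distillation round)."""
    dataset = build_distillation_dataset(teacher_generate, sample_prompts)
    return fine_tune_student(dataset)   # assumed to return the tuned student model

# Usage sketch: the prompts would come from the customer's own workload, e.g.
# student = distill(big_model.generate, small_model.fine_tune, ["Summarize our refund policy"])
```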


ChatGPT’s second birthday: What will gen AI (and the world) look like in another 2 years?

It is now just over two years since the first appearance of ChatGPT on November 30, 2022. At the time of its launch, OpenAI viewed ChatGPT as a demonstration project designed to learn how people would make use of the tool and the underlying GPT-3.5 large language model (LLM). An LLM is a model based on the transformer architecture first introduced by Google in 2017, which uses self-attention mechanisms to process and generate human-like text across tasks like natural language understanding.

It was more than a successful demonstration project! OpenAI was as surprised as anyone by the rapid uptake of ChatGPT, which reached one hundred million users within two months.

Although perhaps they should not have been so surprised. Futurist Kevin Kelly, also the co-founder of Wired, advised in 2014 that “the business plans of the next 10,000 startups are easy to forecast: Take X and add AI. This is a big deal, and now it’s here.” Kelly said this several years before ChatGPT. Yet, this is exactly what has happened.

Equally remarkable is his prediction in the same Wired article that: “By 2024, Google’s main product will not be search but AI.” It could be debated whether this is true, but it might soon be. Gemini is Google’s flagship AI chat product, but AI pervades its search and likely every other one of its products, including YouTube, TensorFlow and AI features in Google Workspace.

The bot heard around the world

The headlong rush of AI startups that Kelly foresaw really gained momentum after the ChatGPT launch. You could call it the AI big bang moment, or the bot heard around the world. And it jumpstarted the field of generative AI — the broad category of LLMs for text and diffusion models for image creation. This reached the heights of hype, or what Gartner calls “The Peak of Inflated Expectations,” in 2023.

The hype of 2023 may have diminished, but only by a little. By some estimates, there are as many as 70,000 AI companies worldwide, representing a 100% increase since 2017. This is a veritable Cambrian explosion of companies pursuing novel uses for AI technology. Kelly’s 2014 foresight about AI startups proved prophetic.

If anything, huge venture capital investments continue to flow into startup companies looking to harness AI. The New York Times reported that investors poured $27.1 billion into AI start-ups in the U.S. in the second quarter of 2024 alone, “accounting for nearly half of all U.S. start-up funding in that period.” Statista added: “In the first nine months of 2024, AI-related investments accounted for 33% of total investments in VC-backed companies headquartered in the U.S. That is up from 14% in 2020 and could go even higher in the years ahead.”

The large potential market is a lure for both the startups and established companies. A recent Reuters Institute survey of consumers indicated individual usage of ChatGPT was low across six countries, including the U.S. and U.K. Just 1% used it daily in Japan, rising to 2% in France and the UK, and 7% in the U.S. This slow uptake might be attributed to several factors, ranging from a lack of awareness to concerns about the safety of personal information. Does this mean AI’s impact is overestimated? Hardly, as most of the survey respondents expected gen AI to have a significant impact on every sector of society in the next five years.

The enterprise sector tells quite a different story.
As reported by VentureBeat, industry analyst firm GAI Insights estimates that 33% of enterprises will have gen AI applications in production next year. Enterprises often have clearer use cases, such as improving customer service, automating workflows and augmenting decision-making, which drive faster adoption than among individual consumers. For example, the healthcare industry is using AI for capturing notes, and financial services firms are using the technology for enhanced fraud detection. GAI further reported that gen AI is the leading 2025 budget priority for CIOs and CTOs.

What’s next? From gen AI to the dawn of superintelligence

The uneven rollout of gen AI raises questions about what lies ahead for adoption in 2025 and beyond. Both Anthropic CEO Dario Amodei and OpenAI CEO Sam Altman suggest that artificial general intelligence (AGI) — or even superintelligence — could appear within the next two to 10 years, potentially reshaping our world. AGI is thought to be the ability of AI to understand, learn and perform any intellectual task that a human being can, thereby emulating human cognitive abilities across a wide range of domains.

Sparks of AGI in 2025

As reported by Variety, Altman said that we could see the first glimmers of AGI as soon as 2025. Likely he was talking about AI agents, in which you can give an AI system a complicated task and it will autonomously use different tools to complete it.

For example, Anthropic recently introduced a Computer Use feature that enables developers to direct the Claude chatbot “to use computers the way people do — by looking at a screen, moving a cursor, clicking buttons and typing text.” This feature allows developers to delegate tasks to Claude, such as scheduling meetings, responding to emails or analyzing data, with the bot interacting with computer interfaces as if it were a human user. In a demonstration, Anthropic showcased how Claude could autonomously plan a day trip by interacting with computer interfaces — an early glimpse of how AI agents may oversee complex tasks.

Caption: Anthropic shows how its Claude chatbot can autonomously plan tasks like a day trip. Source: https://www.youtube.com/watch?v=jqx18KgIzAE

In September, Salesforce said it “is ushering in the third wave of the AI revolution, helping businesses deploy AI agents alongside human workers.” They see agents focusing on repetitive, lower-value tasks, freeing people to focus on more strategic priorities. These agents could enable human workers to focus on innovation, complex problem-solving or customer relationship management.
