Galileo launches ‘Agentic Evaluations’ to fix AI agent errors before they cost you

Galileo, a San Francisco-based startup, is betting that the future of artificial intelligence depends on trust. Today, the company launched a new product, Agentic Evaluations, to address a growing challenge in AI: making sure the increasingly complex systems known as AI agents actually work as intended.

AI agents — autonomous systems that perform multi-step tasks like generating reports or analyzing customer data — are gaining traction across industries. But their rapid adoption raises a crucial question: How can companies verify these systems remain reliable after deployment? Galileo’s CEO, Vikram Chatterji, believes his company has found the answer.

“Over the last six to eight months, we started to see some of our customers trying to adopt agentic systems,” said Chatterji in an interview. “Now LLMs can be used as a smart router to pick and choose the right API calls towards actually completing a task. Going from just generating text to actually completing a task was a very big chasm that was unlocked.”

A diagram showing how Galileo evaluates AI agents at three key stages: tool selection, error detection and task completion. (Credit: Galileo)

AI agents show promise, but enterprises demand accountability

Major enterprises like Cisco and Ema (the latter founded by Coinbase’s former chief product officer) have already adopted Galileo’s platform. These companies use AI agents to automate tasks from customer support to financial analysis, and they report significant productivity gains.

“A sales representative who’s trying to do outreach and outbounds would otherwise use maybe a week of their time to do that, versus with some of these AI-enabled agents, they’re doing that within two days or less,” Chatterji explained, highlighting the return on investment for enterprises.
Galileo’s new framework evaluates tool selection quality, detects errors in tool calls, and tracks overall session success. It also monitors essential metrics for large-scale AI deployment, including cost and latency.

$68 million in funding fuels Galileo’s push into enterprise AI

The launch builds on Galileo’s recent momentum. The company raised $45 million in series B funding led by Scale Venture Partners last October, bringing its total funding to $68 million. Industry analysts project the market for AI operations tools could reach $4 billion by 2025.

The stakes are high as AI deployment accelerates. Studies show even advanced models like GPT-4 can hallucinate about 23% of the time during basic question-and-answer tasks. Galileo’s tools help enterprises identify these issues before they impact operations.

“Before we launch this thing, we really, really need to know that this thing works,” Chatterji said, describing customer concerns. “The bar is really high. So that’s where we gave them this tool chain, such that they could just use our metrics as the basis for these tests.”

Addressing AI hallucinations and enterprise-scale challenges

The company’s focus on reliable, production-ready solutions positions it well in a market increasingly concerned with AI safety. For technical leaders deploying enterprise AI, Galileo’s platform provides essential guardrails for ensuring AI agents perform as intended while controlling costs.

As enterprises expand their use of AI agents, performance monitoring tools become crucial infrastructure. Galileo’s latest offering aims to help businesses deploy AI responsibly and effectively at scale.

“2025 will be the year of agents. It is going to be very prolific,” Chatterji noted.
“However, what we’ve also seen is a lot of companies that are just launching these agents without good testing is leading to negative implications…The need for proper testing and evaluations is more than ever before.” source
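The three evaluation stages the article describes — tool selection quality, tool-call error detection, and session-level task completion — can be illustrated with a minimal scoring harness. This is a hypothetical sketch, not Galileo's API: the ToolCall and Session types, the field names, and the evaluate function are illustrative inventions that assume you have reference labels for which tool the agent should have chosen.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str          # tool the agent actually invoked
    expected: str      # tool a reference label says it should have invoked
    error: bool        # did the call fail or return malformed output?
    latency_ms: float
    cost_usd: float

@dataclass
class Session:
    calls: list        # list of ToolCall, in order
    completed: bool    # did the agent finish the end-to-end task?

def evaluate(sessions):
    """Aggregate stage-level metrics plus the cost and latency figures
    that matter for large-scale deployment."""
    calls = [c for s in sessions for c in s.calls]
    return {
        "tool_selection_quality": sum(c.tool == c.expected for c in calls) / len(calls),
        "tool_error_rate": sum(c.error for c in calls) / len(calls),
        "session_success_rate": sum(s.completed for s in sessions) / len(sessions),
        "total_cost_usd": sum(c.cost_usd for c in calls),
        "mean_latency_ms": sum(c.latency_ms for c in calls) / len(calls),
    }
```

In practice the "expected" labels would come from human review or an LLM judge, but the aggregation step — rolling per-call outcomes up into per-session and per-deployment metrics — looks roughly like this regardless of how the labels are produced.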

Galileo launches ‘Agentic Evaluations’ to fix AI agent errors before they cost you Read More »

Where CIOs should place their 2025 AI bets

“In 2025, companies at the forefront of the agentic AI revolution will face a critical challenge: balancing the delivery of seamless, done-for-you experiences with the need to give customers ultimate authority and control over final decision-making at their discretion,” says Ashok Srivastava, chief data officer at Intuit. “To achieve this, innovation must focus on AI systems that seamlessly blend advanced autonomy with user-centric control, incorporating adaptive transparency, ethical safeguards, and context-aware learning to empower customer decision-making.”

What to bet on: Expect significant agentic AI hype in 2025 on one end and potential employee fears of autonomous agents taking their jobs on the other. CIOs should bet on change management programs and on evangelizing high-quality agents with whom employees collaborate to deliver value beyond productivity.

Build toward intelligent document management

Most enterprises have document management systems to extract information from PDFs, word processing files, and scanned paper documents where the document structure and the required information aren’t complex. Examples include scanning invoices, extracting basic contract information, and capturing information from PDF forms. Even these simple use cases have had exceptions requiring business process outsourcing (BPO) or internal data processing teams to manage. source

Where CIOs should place their 2025 AI bets Read More »

Highlights And Implications Of Biden’s Executive Order On Strengthening And Promoting Innovation In The Nation’s Cybersecurity

Building on the 2021 Executive Order on Improving the Nation’s Cybersecurity, former US President Joe Biden’s 2025 Executive Order (EO) 14144 puts forth additional actions for strengthening security, improving accountability for software and cloud service providers, and promoting innovation, including the use of emerging technologies. In this blog, we’ll break down the key topics and technology areas of this latest cybersecurity executive order, highlighting the good that will come from it as well as other implications.

Raising The Bar Once More For Third-Party Software Supply Chains

What’s good: This EO pushes for the Federal Acquisition Regulation (FAR) to update contract language as a risk management tool. It requires software providers to provide machine-readable secure software development attestations, high-level artifacts to validate those attestations, and a list of the providers’ Federal Civilian Executive Branch (FCEB) agency software customers. It sets a higher bar: attestations must be updated to address both the delivery and the security of software and be machine-readable, agency discretion to collect evidence is removed, and attestation verification and artifact validation are centralized with the Cybersecurity and Infrastructure Security Agency (CISA). Notably, it recommends “[referring] attestations that fail validation to the Attorney General for action as appropriate,” which aligns with the National Cybersecurity Strategy’s aim of holding accountable providers that fail to adhere to secure development practices. This will give federal agencies the processes, tools, and resources necessary to ensure supplier submission and conformity. For suppliers, the establishment of common procurement standards reduces ambiguity about expectations, minimizes duplicated attestation effort, and provides a more efficient process.
Forrester’s take: Federal agencies should assess their progress in adopting cybersecurity risk management practices in compliance with the National Institute of Standards and Technology’s (NIST) SP 800-161 Revision 1 before the Office of Management and Budget (OMB) begins requesting progress reports. Agencies should watch for updates to NIST Special Publication (SP) 800-161 on how to securely and reliably deploy patches and updates, as well as guidance on managing open-source software usage. Software providers should look out for updates to the NIST Secure Software Development Framework (SSDF), modifications to the attestation form, and methods to automate attestation. Providers should also keep an eye out for the enumeration of “high-level artifacts to validate those attestations,” with a software bill of materials (SBOM) being the most likely evidence to be required.

A Focus On EDR And Enabling Threat Hunting And Response Capabilities

What’s good: The EO prioritizes the use of endpoint detection and response (EDR) controls to enable CISA’s hunting and response capabilities in FCEB agencies. It gives CISA latitude in specifying what qualifies as timely access to, and completeness of, data for threat hunting and response, and it requires CISA to provide advance notice of if and when it accesses FCEB systems. The EO also emphasizes the use of phishing-resistant authentication and standards like WebAuthn, as well as requirements for configuration baselines for cloud-based systems from cloud service providers in the FedRAMP Marketplace, improving the cybersecurity of federal systems overall.

Forrester’s take: FCEB participation in the working groups is fundamental to ensuring that the EDR technologies CISA supports include those implemented by each agency.
This helps determine what “timely access to required data” and “completeness” of data should mean when delivering data to CISA for hunting and response, and it establishes use cases for administrative accommodation on restricted data access. FCEB agencies should now start preparing a comprehensive, continually updated list of systems, endpoints, and datasets that need more controls, have data access restrictions, or require periods of nondisruption. Cloud service providers can be proactive in recommending baselines, such as checking for insecure configurations and detecting and remediating configuration drift.

A First Acknowledgement Of Defending Against Threats To Space Systems

What’s good: While the White House has not officially designated space systems as critical infrastructure, this EO is the first to acknowledge that space systems must be protected as if they were. Space systems’ roles in supporting critical infrastructure and services such as global commerce, health, communication, and national security make them key targets for attack. The EO sets requirements for FCEB agencies that deploy, operate, and maintain space systems to enhance the security of communications between ground and in-orbit systems. It directs the FAR Council to develop new cybersecurity contract requirements for agency-procured civil space systems that follow NIST SSDF best practices, and it brings space systems into agencies’ existing continuous risk assessment requirements. The EO also requires the National Cyber Director to create the government’s first inventory of space ground systems to support a national study on recommendations to improve civil space cyber defenses.

Forrester’s take: A governmentwide inventory will be difficult to achieve.
While FCEB agencies are already required to report all federal information systems to CISA, the federal definition of an “information system” and the unique category of “space system” are not exactly the same, making it potentially difficult for agencies to meet the deadlines. Additionally, the government has historically left civil space system cybersecurity up to global standards bodies, with NIST only recently publishing space-related guidance for ground and satellite systems. This creates an opportunity for the private sector to influence best practices and standards going forward as threats and the technologies that comprise space systems evolve. FCEB agencies should not wait for FAR-mandated requirements and should begin evaluating their existing contracts to ensure that minimum SSDF best practices are already in place.

The Prioritization Of Advancing Cryptographic Infrastructure: E2EE, PQC, And Key Protection

What’s good: The EO takes a holistic view of securing communications, from internet routing, DNS traffic, and email messages to end-to-end encryption (E2EE) for modern communications such as voice- and videoconferencing and instant messaging. It stresses continued urgency and action on quantum security, including the migration to post-quantum cryptographic (PQC) algorithms and measures to protect cryptographic keys, in particular a call to take advantage of commercial security technologies like hardware security modules (HSMs), trusted execution environments (TEEs), and other isolation technologies.

Highlights And Implications Of Biden’s Executive Order On Strengthening And Promoting Innovation In The Nation’s Cybersecurity Read More »

Ex-FCC Members Oppose 5th Circ. Universal Service Ruling

By Christopher Cole ( January 21, 2025, 7:32 PM EST) — A bipartisan group of eight former members of the Federal Communications Commission is urging the U.S. Supreme Court to overturn a Fifth Circuit ruling that found the mechanism for funding the FCC’s universal service subsidies unconstitutional…. Law360 is on it, so you are, too. A Law360 subscription puts you at the center of fast-moving legal issues, trends and developments so you can act with speed and confidence. Over 200 articles are published daily across more than 60 topics, industries, practice areas and jurisdictions. A Law360 subscription includes features such as Daily newsletters Expert analysis Mobile app Advanced search Judge information Real-time alerts 450K+ searchable archived articles And more! Experience Law360 today with a free 7-day trial. source

Ex-FCC Members Oppose 5th Circ. Universal Service Ruling Read More »

Thoma Bravo Clinches $3.6B Credit Fund III

By Jade Martinez-Pogue (January 21, 2025, 2:06 PM EST) — Software investor Thoma Bravo on Tuesday announced that it wrapped fundraising on its most recent credit fund after securing $3.6 billion in total available capital…. source

Thoma Bravo Clinches $3.6B Credit Fund III Read More »

Cooley-Led Insulin Device Maker Preps $113M IPO

By Jade Martinez-Pogue (January 22, 2025, 4:09 PM EST) — Insulin delivery system maker Beta Bionics on Wednesday announced the terms for its initial public offering, planning to raise $113 million…. source

Cooley-Led Insulin Device Maker Preps $113M IPO Read More »

Microsoft just built an AI that designs materials for the future: Here’s how it works

Microsoft Research today introduced a powerful new AI system that generates novel materials with specific desired properties, potentially accelerating the development of better batteries, more efficient solar cells and other critical technologies.

The system, called MatterGen, represents a fundamental shift in how scientists discover new materials. Rather than screening millions of existing compounds — the traditional approach, which can take years — MatterGen directly generates novel materials based on desired characteristics, similar to how AI image generators create pictures from text descriptions.

“Generative models provide a new paradigm for materials design by directly generating entirely novel materials given desired property constraints,” said Tian Xie, principal research manager at Microsoft Research and lead author of the study published today in Nature. “This represents a major advancement towards creating a universal generative model for materials design.”

How Microsoft’s AI engine works differently than traditional methods

MatterGen uses a specialized type of AI called a diffusion model — similar to those behind image generators like DALL-E — but adapted to work with three-dimensional crystal structures. It gradually refines random arrangements of atoms into stable, useful materials that meet specified criteria.

The results surpass previous approaches. According to the research paper, materials produced by MatterGen are “more than twice as likely to be novel and stable, and more than 15 times closer to the local energy minimum” than those from previous AI approaches. This means the generated materials are both more likely to be useful and more likely to be physically possible to create.
In one striking demonstration, the team collaborated with scientists at China’s Shenzhen Institutes of Advanced Technology to synthesize a new material, TaCr2O6, that MatterGen had designed. The real-world material closely matched the AI’s predictions, validating the system’s practical utility.

Real-world applications could transform energy storage and computing

The system is particularly notable for its flexibility. It can be “fine-tuned” to generate materials with specific properties — from particular crystal structures to desired electronic or magnetic characteristics. This could be invaluable for designing materials for specific industrial applications.

The implications could be far-reaching. New materials are crucial for advancing technologies in energy storage, semiconductor design and carbon capture. For instance, better battery materials could accelerate the transition to electric vehicles, while more efficient solar cell materials could make renewable energy more cost-effective.

“From an industrial perspective, the potential here is enormous,” Xie explained. “Human civilization has always depended on material innovations. If we can use generative AI to make materials design more efficient, it could accelerate progress in industries like energy, healthcare and beyond.”

Microsoft’s open source strategy aims to accelerate scientific discovery

Microsoft has released MatterGen’s source code under an open-source license, allowing researchers worldwide to build upon the technology. This move could accelerate the system’s impact across various scientific fields.

The development of MatterGen is part of Microsoft’s broader AI for Science initiative, which aims to accelerate scientific discovery using AI. The project integrates with Microsoft’s Azure Quantum Elements platform, potentially making the technology accessible to businesses and researchers through cloud computing services.
However, experts caution that while MatterGen represents a significant advance, the path from computationally designed materials to practical applications still requires extensive testing and refinement. The system’s predictions, while promising, need experimental validation before industrial deployment.

Nevertheless, the technology represents a significant step forward in using AI to accelerate scientific discovery. As Daniel Zügner, a senior researcher on the project, noted, “We’re deeply committed to research that can have a positive, real-world impact, and this is just the beginning.” source
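The diffusion idea described above — start from random atom positions and iteratively refine them into an ordered structure — can be sketched in miniature. This toy is emphatically not MatterGen: the real model learns its denoising directions from training data, whereas this sketch hard-codes a stand-in "denoiser" that nudges coordinates toward an assumed cubic lattice, purely to show the shape of the reverse-diffusion loop (shrinking noise, growing corrections).

```python
import random

def toy_denoise_direction(x, lattice=0.5):
    # Stand-in for a learned score model: the direction from a coordinate
    # toward its nearest ideal lattice site. MatterGen learns an analogous
    # correction from data instead of assuming a fixed lattice.
    return round(x / lattice) * lattice - x

def reverse_diffusion(n_coords=24, steps=100, seed=0):
    # Treat n_coords as flattened 3-D atom coordinates (8 atoms x 3 axes).
    # Start from pure noise and iteratively denoise; the injected noise
    # shrinks to zero while the correction step grows toward the end.
    rng = random.Random(seed)
    coords = [rng.uniform(0.0, 2.0) for _ in range(n_coords)]
    for t in range(steps):
        step_size = 1.0 / (steps - t)            # full correction at the last step
        noise_scale = 0.01 * (1.0 - t / steps)   # annealed injected noise
        coords = [c + step_size * toy_denoise_direction(c)
                    + rng.gauss(0.0, noise_scale)
                  for c in coords]
    return coords
```

After the loop, every coordinate sits essentially on a lattice site, which is the toy analogue of the "stable, useful materials" the article describes the real system converging to.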

Microsoft just built an AI that designs materials for the future: Here’s how it works Read More »

Meta Wants Mass. Justices To Intervene In AG's Suit

By Julie Manganis (January 23, 2025, 5:31 PM EST) — Meta Platforms has urged Massachusetts’ highest court to take up its challenge to a pending lawsuit brought by the state attorney general’s office, which accused the social media company of intentionally designing Instagram to be addictive to children and teenagers…. source

Meta Wants Mass. Justices To Intervene In AG's Suit Read More »

Hugging Face shrinks AI vision models to phone-friendly size, slashing computing costs

Hugging Face has achieved a remarkable breakthrough in AI, introducing vision-language models that run on devices as small as smartphones while outperforming predecessors that require massive data centers.

The company’s new SmolVLM-256M model, requiring less than one gigabyte of GPU memory, surpasses the performance of its Idefics 80B model from just 17 months ago — a system 300 times larger. This dramatic reduction in size and improvement in capability marks a watershed moment for practical AI deployment.

“When we released Idefics 80B in August 2023, we were the first company to open-source a video language model,” Andrés Marafioti, machine learning research engineer at Hugging Face, said in an exclusive interview with VentureBeat. “By achieving a 300X size reduction while improving performance, SmolVLM marks a breakthrough in vision-language models.”

Performance comparison of Hugging Face’s new SmolVLM models shows the smaller versions (256M and 500M) consistently outperforming their 80-billion-parameter predecessor across key visual reasoning tasks. (Credit: Hugging Face)

Smaller AI models that run on everyday devices

The advancement arrives at a crucial moment for enterprises struggling with the astronomical computing costs of implementing AI systems. The new SmolVLM models — available in 256M and 500M parameter sizes — process images and understand visual content at speeds previously unattainable in their size class. The smallest version processes 16 examples per second while using only 15GB of RAM with a batch size of 64, making it particularly attractive for businesses looking to process large volumes of visual data.

“For a mid-sized company processing 1 million images monthly, this translates to substantial annual savings in compute costs,” Marafioti told VentureBeat.
“The reduced memory footprint means businesses can deploy on cheaper cloud instances, cutting infrastructure costs.”

The development has already caught the attention of major technology players. IBM has partnered with Hugging Face to integrate the 256M model into Docling, its document processing software. “While IBM certainly has access to substantial compute resources, using smaller models like these allows them to efficiently process millions of documents at a fraction of the cost,” said Marafioti.

Processing speeds of SmolVLM models across different batch sizes, showing how the smaller 256M and 500M variants significantly outperform the 2.2B version on both A100 and L4 graphics cards. (Credit: Hugging Face)

How Hugging Face reduced model size without compromising power

The efficiency gains come from technical innovations in both the vision and language components. The team switched from a 400M parameter vision encoder to a 93M parameter version and implemented more aggressive token compression techniques. These changes maintain high performance while dramatically reducing computational requirements.

For startups and smaller enterprises, these developments could be transformative. “Startups can now launch sophisticated computer vision products in weeks instead of months, with infrastructure costs that were prohibitive mere months ago,” said Marafioti.

The impact extends beyond cost savings to enabling entirely new applications. The models are powering advanced document search capabilities through ColiPali, an algorithm that creates searchable databases from document archives. “They obtain very close performances to those of models 10X the size while significantly increasing the speed at which the database is created and searched, making enterprise-wide visual search accessible to businesses of all types for the first time,” Marafioti explained.
A breakdown of SmolVLM’s 1.7 billion training examples shows document processing and image captioning comprising nearly half of the dataset. (Credit: Hugging Face)

Why smaller AI models are the future of AI development

The breakthrough challenges conventional wisdom about the relationship between model size and capability. While many researchers have assumed that larger models were necessary for advanced vision-language tasks, SmolVLM demonstrates that smaller, more efficient architectures can achieve similar results. The 500M parameter version achieves 90% of the performance of its 2.2B parameter sibling on key benchmarks.

Rather than suggesting an efficiency plateau, Marafioti sees these results as evidence of untapped potential: “Until today, the standard was to release VLMs starting at 2B parameters; we thought that smaller models were not useful. We are proving that, in fact, models at 1/10 of the size can be extremely useful for businesses.”

This development arrives amid growing concerns about AI’s environmental impact and computing costs. By dramatically reducing the resources required for vision-language AI, Hugging Face’s innovation could help address both issues while making advanced AI capabilities accessible to a broader range of organizations.

The models are available open source, continuing Hugging Face’s tradition of increasing access to AI technology. This accessibility, combined with the models’ efficiency, could accelerate the adoption of vision-language AI across industries from healthcare to retail, where processing costs have previously been prohibitive.

In a field where bigger has long meant better, Hugging Face’s achievement suggests a new paradigm: The future of AI might not be found in ever-larger models running in distant data centers, but in nimble, efficient systems running right on our devices. As the industry grapples with questions of scale and sustainability, these smaller models might just represent the biggest breakthrough yet.
source
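The economics behind the article's "1 million images monthly" scenario are straightforward arithmetic given the quoted throughput of 16 images per second for the 256M model. The sketch below works that out; the dollar rate per GPU-hour is a hypothetical placeholder, not Hugging Face or cloud-provider pricing.

```python
def monthly_gpu_hours(images_per_month: int, images_per_second: float) -> float:
    # Wall-clock GPU hours needed to process a month's image volume
    # at a sustained throughput.
    return images_per_month / images_per_second / 3600.0

def monthly_compute_cost(images_per_month: int, images_per_second: float,
                         usd_per_gpu_hour: float) -> float:
    # Compute cost at an assumed hourly cloud GPU rate (placeholder value).
    return monthly_gpu_hours(images_per_month, images_per_second) * usd_per_gpu_hour

# The article's mid-sized-company scenario: 1M images/month at 16 images/s
# works out to roughly 17.4 GPU-hours per month.
hours = monthly_gpu_hours(1_000_000, 16.0)
cost = monthly_compute_cost(1_000_000, 16.0, 1.0)  # at a hypothetical $1/GPU-hour
```

At roughly 17 GPU-hours a month, the workload fits comfortably on a single inexpensive cloud GPU instance, which is the concrete sense in which the reduced memory footprint "cuts infrastructure costs" for this kind of volume.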

Hugging Face shrinks AI vision models to phone-friendly size, slashing computing costs Read More »