Microsoft just launched powerful AI ‘agents’ that could completely transform your workday — and challenge Google’s workplace dominance

Microsoft announced today a major expansion of its artificial intelligence tools with the “Microsoft 365 Copilot Wave 2 Spring release,” introducing new AI “agents” designed to function as digital colleagues that can perform complex workplace tasks through deep reasoning capabilities.

In an exclusive interview, Aparna Chennapragada, Chief Product Officer of Experiences and Devices at Microsoft, told VentureBeat the company is building toward a vision where AI serves as more than just a tool, becoming an integral collaborator in daily work.

“We are around the corner from a big moment in the AI world,” Chennapragada said. “It started out with all of the model advances, and everyone’s been really excited about it and the intelligence abundance. Now it’s about making sure that intelligence is available to all of the folks, especially at work.”

The announcement accompanies Microsoft’s 2025 Work Trend Index, a comprehensive research report based on surveys of 31,000 workers across 31 countries, documenting the emergence of what Microsoft calls “Frontier Firms”: organizations restructuring around AI-powered intelligence and human-agent collaboration.

Microsoft envisions a three-phase evolution of AI adoption, culminating in ‘human-led, agent-operated’ workplaces where employees direct AI systems. (Credit: Microsoft)

How Microsoft’s new ‘Researcher’ and ‘Analyst’ agents bring deep reasoning to enterprise work

At the center of Microsoft’s vision are two new AI agents named Researcher and Analyst, powered by OpenAI’s deep reasoning models. These agents are designed to handle complex research tasks and data analysis that previously required specialized human expertise.

“Think of them as you know, like a really smart researcher and a data scientist in your pocket,” Chennapragada explained.

She described how the Researcher agent recently helped her prepare for a business review by connecting information across various sources.

“I was using it to say, hey, I have an important business review coming up… pull all the past meetings, past emails, figure out the CRM data, and then say, ‘Give me constructive, sharp inputs on how I should be able to push the ball forward for this meeting,’” she said. “Because of the deep reasoning, it actually made connections that I hadn’t thought of.”

These agents will be available through a new “Agent Store,” which will also feature agents from partners like Jira, Monday.com, and Miro, as well as custom agents built by organizations themselves.

Workers face an interruption every two minutes and a dramatic surge in last-minute work, Microsoft data reveals, creating what the company calls a ‘capacity gap’. (Credit: Microsoft)

Beyond chat: How Copilot is becoming the ‘browser for AI’ in Microsoft’s enterprise strategy

Microsoft is positioning Copilot not as just a chatbot interface but as a central organizing layer for AI interactions, similar to how web browsers organize internet content.

“I look at Copilot as the browser for the AI world,” Chennapragada said. “In internet, we had websites, but we had the browser to organize the layer. For us, Copilot is this organizing layer, this browser for this AI world.”

This vision extends beyond simple text interactions. The company is introducing Copilot Notebooks, which allows users to ground AI interactions in specific collections of files and meeting notes. A new Copilot Search feature provides AI-powered enterprise search capabilities across multiple applications.

“Today, most of AI, we have equated it to chat,” Chennapragada noted. “Sometimes I feel like we’re in the DOS pre-GUI era, where you have this amazing intelligence, and you’re like, ‘oh, we have an AOL dial-up modem stuck on top of it.’”

To address this limitation, Microsoft is bringing OpenAI’s GPT-4o AI image generation capabilities to business settings with a new Create feature, allowing employees to generate and modify brand-compliant images.

With 80% of workers reporting insufficient time or energy, Microsoft sees AI agents as the solution to closing the productivity gap. (Credit: Microsoft)

Employee burnout and workplace interruptions: The ‘capacity gap’ driving Microsoft’s AI focus

Microsoft’s research reveals a significant “Capacity Gap”: 53% of leaders say productivity must increase, but 80% of the global workforce reports lacking the time or energy to do their work. The company’s telemetry data shows employees face 275 interruptions per day from meetings, emails, or messages, an interruption every two minutes during core work hours.

“There’s so much more pent-up, latent demand for work and productivity and output,” Chennapragada said. “That statistic really stood out for me, that there’s so much more pent-up, latent demand for work and productivity and output. So I see this as an augmentation, less of a job displacement.”

The research also indicates a shift in AI adoption patterns. While last year’s adoption was largely employee-led, this year shows a more top-down approach, with 81% of business decision makers saying they want to rethink core strategy and operations with AI.

“That’s a shift between even last year, where it was much more bottom-up and employee-led,” Chennapragada noted. “What that tells us is there needs to be a much more of a top-down AI strategy, but also AI products that you roll out in the enterprise with security, with compliance, with all of the guardrails.”

Leaders outpace employees on every measure of ‘agent boss mindset,’ with a 27-point gap in familiarity with AI agents, Microsoft’s research shows. (Credit: Microsoft)

Rise of the ‘agent boss’: How Microsoft envisions employees managing digital workforces

Microsoft predicts a fundamental restructuring of organizations around what it calls “Work Charts”: more fluid, outcome-driven team structures powered by agents that expand employee capabilities. This reorganization will require determining the optimal “human-agent ratio” for different functions, a metric that will vary by task and team.

The company expects every employee to become an “agent boss”: someone who manages AI agents to amplify their impact.

“For us at Microsoft, it’s not enough if 2% of our customers’ company adopts AI, it is really bringing the entire company along. That’s when you get the full productivity gains,” Chennapragada emphasized.

The company’s research shows leaders are currently ahead of employees in embracing this mindset, with 67% of

Thanks to AI, the data reckoning has arrived

2. Data classification

As data is consolidated in data lakes and other increasingly connected stores, another challenge is classification. Who is allowed to look at particular data? From government security classifications to confidential HR information, data shouldn’t be accessible to everyone. Data must be properly classified, and those categories and the limits they entail must be maintained and carried forward as companies integrate and harness data in new ways.

3. Stability

A lot of data is transient. If you’re taking data from sensors, for example, you need to understand how often you’ll refresh the data based on sensor readings. This is an issue of data stability, as constantly changing data may lead to different results.

Data also ages. For example, imagine you had a specific process for raising a job requisition for a new employee for nine years, but you revised the process last year. If you use all 10 years’ worth of data to train a model and then ask how to open a job requisition, most of the time you will get a wrong answer, because most of the data is outdated.
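
The job-requisition example can be made concrete with a small sketch. It assumes each training record carries a creation date and that the date of the process revision is known; the field names, the cutoff date, and the sample records are all hypothetical and chosen only to mirror the nine-years-old, one-year-new split described above.

```python
from datetime import date

# Hypothetical cutoff: the date the job-requisition process was revised.
PROCESS_REVISED_ON = date(2024, 1, 1)


def split_by_freshness(records, cutoff):
    """Separate records created on or after the cutoff from older ones."""
    current = [r for r in records if r["created"] >= cutoff]
    stale = [r for r in records if r["created"] < cutoff]
    return current, stale


# Ten years of illustrative records, one per year (2015 through 2024).
records = [{"id": i, "created": date(2015 + i, 6, 1)} for i in range(10)]

current, stale = split_by_freshness(records, PROCESS_REVISED_ON)

# Share of the data that describes the old process.
stale_share = len(stale) / len(records)
print(f"{len(stale)} of {len(records)} records predate the revision ({stale_share:.0%})")
```

With nine of the ten records predating the revision, a model trained on everything sees mostly the old process, which is why filtering or down-weighting by date is worth considering before training.
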

11 Tips For Contractors Dealing With DOD Staff Reductions

By Scott Freling and Homer La Rue (April 22, 2025, 5:06 PM EDT)

Since Jan. 20, the Trump administration has taken numerous executive actions that affect federal government contractors and grant recipients. In just a single day earlier this month, the Trump administration issued a series of executive orders and presidential memoranda that, among other things, seek to reform the defense acquisition system, bolster the U.S. maritime industry and streamline foreign military sales. The potential impact of these changes on U.S. Department of Defense procurement policy is relatively self-evident….

When AI writes the laws: UAE’s bold move forces a rethink on compliance and human touch

Paving the way for smarter compliance

The UAE’s new AI system marks a major shift for businesses facing complex regulations. With the global AI legal tech market set to grow from $1.2 billion in 2024 to $3.5 billion by 2030, there’s a clear demand for automation in tasks like eDiscovery and regulatory reporting, according to Grand View Research. In the UAE, where rapid economic growth is a priority, AI is set to play a critical role.

Manish Bahl, Founder and CEO of Curious Insights, sees a clear path: “Over the next five years, AI will automate compliance monitoring, deliver real-time risk alerts, and simplify due diligence, helping organizations stay ahead of regulatory changes.”

However, as AI enters the legal space, experts caution that integration must be done thoughtfully. Ekhlaque Bari, AI Strategy Consultant at Minfy Technologies and a former CIO, stresses that if legal systems aren’t central to a business, AI-driven tools can still serve as standalone solutions for legal and contracting teams. But Bari also pointed out the challenge of balancing data security and privacy while keeping the system up to date with evolving laws.

Sun Life’s Laura Money on driving innovation with AI and championing women in tech.

Sure, you can’t go anywhere, and it’s been a couple of years now, you can’t go anywhere in tech and not talk about generative AI, or AI in general, in fact. And it really has been an incredible opportunity to use a new tool to help our businesses progress and help technology be more efficient as well. And it really does transform the way we’re working, so we can be better at meeting our clients’ needs and moving faster, and all those aspects.

I think what we were lucky about is our CEO, Kevin Strain, was really keen right from the start that we understand the technology and understand how it could be used. And one of the very first things we did was to create a safe and secure internal chatbot. We call it Sun Life Asks. It’s got well over a million queries. We’re seeing over 20,000 visits to beat. And employees are using it for a whole variety of tasks. I love what they’re using it for. I’m less concerned about what they’re using it for, but what I’m thrilled about is people are using it, and they’re starting to understand what this technology is capable of. And to me, that’s the most important thing.

Sure, people have used it to build scripts for videos. They’ve used it to analyze reports. In some cases, people have used it for coding, even though we have, you know, Copilot for GitHub. But we are actually finding that it’s the learning and understanding of it that is really perhaps the most valuable thing, and we’ve provided training to all our employees around AI and generative AI. Again, that’s more from the point of view of, if all the employees understand the capabilities and what the potential of this technology is, then we have all those different ways that we might be able to innovate. In fact, in 2024, just last year, we were recognized by CIO Awards Canada for our innovation with Sun Life Asks.

But we’ve, you know, we’ve taken it further. We have a Gen AI Iris, our chatbot, which is helping our Service Desk support our employees. So that’s our internal IT help desk, and it’s actually able to answer about 80% of employee queries, which reduces, for those 80% of queries, the resolution time from 12 minutes to two minutes. And that’s where we’re really starting to test and learn in a real environment with agentic AI. So the intent is that IRIS will be able to do everything that the Service Desk can do in terms of those simple asks: you know, order a new phone, order a new screen, reset passwords. Of course, that’s been around for a while, but it’s been really helpful, and because of that, we now have Gen AI squads seated in each of our business groups, and they have a big, long backlog of things that they’re working through.

Our employees have been so enthusiastic that they actually organized themselves into a multi-function team to set our guardrails. So up until now, we have not had Gen AI interact directly with clients. We’ve always kept the people in the loop, and that was one of the first guardrails. And of course, there’s guardrails around data, there’s guardrails around privacy, there’s guardrails around new technologies, but they’ve also allowed us to go fast with certain types of Gen AI. So chatbot is a good one: we now have the ability to build, in a matter of days, a chatbot that can be based on any knowledge base. And so because we can do that, you can imagine, there’s literally backlogs. And I would say probably we now have tens of these that are live, and hundreds in backlogs, because, for example, one of the first ones we did was the HR policies and procedures. How many times have you wanted to know, oh, you know, what’s the limit for, I don’t know, lunch for two colleagues when you’re doing the coaching session? Well, now you can put that in and immediately get an answer, rather than having to find the email, where is that linked policy, etc., etc. So that proved really useful. It proved useful in different contact centers, you know, when we have complex knowledge bases, and again, those are just evolving to be more and more sophisticated.

And so that safe adoption is something that we’re really focused on as well, as part of the training, and it has been part of the guardrails that people have put in place, that they actually self-organized and realized as employees: if we don’t do this safely, we’re not going to be able to be effective at it. So that’s probably our biggest learning: involve and help entice the broader team into understanding the capabilities of the technology.

And then the other one that we’re really thinking a lot about now is change management. So there’s a good portion of our employees who just immediately embrace Gen AI. They love it. They want to use it for all sorts of things. But that’s not everybody. So you really need to think about, how do you manage through some of the change for some of our employees? And of course, that helps with the communication, leader advocacy, great stories and testimonials. So those are some of the things that we’re doing. And don’t be afraid to fail. We’ve worked a lot on that. Don’t be afraid to fail, pivot and, yeah, you know, move forward.

The Verdict Is In, It’s Buying Groups For The Win

We’ve just wrapped up another great B2B Summit event, and buying groups were all the buzz … again. Marketing leaders have built their plans, budgets, metrics, and success based on the lead (marketing-qualified lead, or MQL) for decades now. But for most organizations we talk to, it’s not working like it used to. That decision-maker who sales wants to talk to isn’t working alone and definitely isn’t taking cold calls or emails. In fact, Forrester’s Buyers’ Journey Survey, 2024, says that on average, 13 people are involved in making a purchasing decision. If your organization is aiming to enhance performance and foster sustainable revenue growth, it’s time to pivot from individual leads to engaging multiple members of the buying group. This shift is not only beneficial but essential.

The Real Opportunity Is Buying Groups

Organizations must shift to looking beyond the lead to the group of individuals involved in making buying decisions. The typical B2B buying group is made up of individuals from different parts of the organization, each with their unique needs and roles in the buying process, including decision-maker, champion, influencer, ratifier, and user. Engaging with the entire buying group, understanding the roles that each member plays, and catering to their specific needs is the key to unlocking more opportunities and driving growth.

Providers that have shifted their focus to buying groups have seen significant benefits, from uncovering more opportunities from hidden prospects and existing engagement to improving sales efficiency.

“It’s all about adding buying group members with the right titles or more deeply qualifying the existing members with the right titles,” said Jeremy Schwartz, senior manager of global lead management and strategy at Palo Alto Networks and Forrester 2025 B2B Program Of The Year Awards Winner. “Our revenue process transformation approach resulted in significant improvements. During the pilot stage, win rates doubled. Upon fully scaling, we saw an 800% increase in opportunity progression to forecast. We also saw a large increase in business development rep conversions and average deal sizes for the quarter that, if applied to our previous year’s results, would have generated an estimated 13% increase in revenue.”

Three Steps To Shift To Buying Groups Today

Ready to make the shift? Here are three steps to get you started today:

1. Outline the buying group and roles. Think of it as moving from a simple game of checkers to a strategic game of chess. Each member of the buying group plays a different role, from the champion pushing for change to the decision-maker holding the budget. How will your product or solution help each of them?
2. Connect and package signals for revenue development teams. Once you’ve got a good sense of the buying group, pass that info along to your revenue development reps (also known as business development reps or sales development reps). Equip them with the insights needed to identify and engage other members of the group.
3. Assemble buying groups and signals in early-stage opportunities. The sooner you can get a clear picture of the buying group in your sales and marketing systems, the better, as your visibility into potential revenue will improve.

Why You Should Make This Shift And Begin Your Revenue Process Transformation

In today’s world, it shouldn’t be a surprise that buyers have more power than ever before and expect more personalized experiences. By focusing on buying groups rather than individual leads, we can meet these expectations, uncover more opportunities, and drive sustainable growth. Plus, aligning marketing, sales, and customer success around the buyer’s journey ensures that we’re all working together to deliver value and a consistent, relevant experience every step of the way.

Making the move to buying groups is not just about improving conversion rates; it’s the shift from being revenue-obsessed to being truly customer-obsessed and better aligned to how businesses make purchasing decisions today. Making the shift to buying groups isn’t news, but after Summit this year, it’s buying groups for the win.

To learn more about how to embrace the shift to buying groups, check out our latest research, Buying Groups For The Win. Want to know about some of the organizations that have already begun this shift? Hear directly from them in this webinar with Siemens, Palo Alto Networks, and Zendesk.

Breaking Down the Walls Between IT and OT

IT and OT systems can seem worlds apart, and historically, they have been treated that way. Different teams and departments managed their operations, often with little or no communication. But over time OT systems have become increasingly networked, and those two worlds are bleeding into one another. And threat actors are taking advantage.

For organizations that have both IT and OT systems, oftentimes critical infrastructure organizations, the risk to both of these environments is present and pressing. CISOs and other security leaders are tasked with the challenge of breaking down the barriers between the two to create a comprehensive cybersecurity strategy.

The Gulf Between IT and OT

Why are IT and OT treated as such separate spheres when both face cybersecurity threats?

“Even though there’s cyber on both sides, they are fundamentally different in concept,” Ian Bramson, vice president of global industrial cybersecurity at Black & Veatch, an engineering, procurement, consulting, and construction company, tells InformationWeek. “It’s one of the things that have kept them more apart traditionally.”

Age is one of the most prominent differences. In a Fortinet survey of OT organizations, 74% of respondents shared that the average age of their industrial control systems is between six and 10 years old.

OT technology is built to last for years, if not decades, and it is deeply embedded in an organization’s operations. The lifespan of IT, on the other hand, looks quite different.

“OT is looked at as having a much longer lifespan, 30 to 50 years in some cases. An IT asset, the typical laptop these days that’s issued to an individual in a company, three years is about when most organizations start to think about issuing a replacement,” says Chris Hallenbeck, CISO at endpoint management company Tanium.

Maintaining IT and OT systems looks very different, too. IT teams can have regular patching schedules. OT teams have to plan far in advance for maintenance windows, if the equipment can even be updated. Downtime in OT environments is complicated and costly.

The skillsets required of the teams to operate IT and OT systems are also quite different. On one side, you likely have people skilled in traditional systems engineering. They may have no idea how to manage the programmable logic controllers (PLCs) commonly used in OT systems.

The divide between IT and OT has been, in some ways, purposeful. The Purdue model, for example, provides a framework for segmenting ICS networks, keeping them separate from corporate networks and the internet.

But over time, more and more occasions to cross the gulf between IT and OT systems, intentionally and unintentionally, have arisen.

People working on the OT side want the ability to monitor and control industrial processes remotely. “If I want to do that remotely, I need to facilitate that connectivity. I need to get data out of these systems to review it and analyze it in a remote location. And then send commands back down to that system,” Sonu Shankar, CPO at Phosphorus, an enterprise xIoT cybersecurity company, explains.

The very real possibility that OT and IT systems intersect accidentally is another consideration for CISOs. Hallenbeck has seen an industrial arc welder plugged into the IT side of an environment, unbeknownst to the people working at the company.

“Somehow that system was even added to the IT Active Directory, and they just were operating it as if it was a regular Windows server, which in every way it was, except for the part where it was directly attached to an industrial system,” he shares. “It happens far too often.”

Cyberattack vectors on IT and OT environments look different and result in different consequences.

“On the IT side, the impact is primarily data loss and all of the second order effects of your data getting stolen or your data getting held for ransom,” says Shankar. “Disrupt the manufacturing process, disrupt food production, disrupt oil and gas production, disrupt power distribution … the effects are more obvious to us in the physical world.”

While the differences between IT and OT are apparent, enterprises ignore the reality of the two worlds’ convergence at their peril. As the connectivity between these systems grows, so do their dependencies and the potential consequences of an attack.

Ultimately, a business does not care if a threat actor compromised an IT system or an OT system. They care about the impact. Has the attack resulted in data theft? Has it impacted physical safety? Can the business operate and generate revenue?

“You have to start thinking of that holistically as one system against those consequences,” urges Bramson.

Integrating IT and OT Cybersecurity

How can CISOs create a cybersecurity strategy that effectively manages IT and OT?

The first step is gaining a comprehensive understanding of what devices and systems are a part of both the IT and OT spheres of a business. Without that information, CISOs cannot quantify and mitigate risk.

“You need to know that the systems exist. There’s this tendency to just put them on the other side of a wall, physical or virtual, and no one knows what number of them exist, what state they’re in, what versions they’re in,” says Hallenbeck.

In one of his CISO roles, Christos Tulumba, CISO at data security and management company Cohesity, worked with a company that had multiple manufacturing plants and distribution centers. The IT and OT sides of the house operated quite separately.

“I walked in there … I did my first network map, and I saw all this exposure all over,” he tells InformationWeek. “It raised a lot of alarms.”

Once CISOs have that network map on the IT and OT side, they can begin to assess risk and build a strategy for mitigation. Are there devices running on default passwords? Are there devices running suboptimal configurations or vulnerable firmware? Are there unnecessary IT and OT connections?

“You start

Rivian Secures Calif. State Court Win Over Investors' IPO Suit

By Katryna Perera (April 24, 2025, 6:49 PM EDT)

A California state appellate court affirmed the dismissal of a suit brought against Rivian Automotive accusing the electric vehicle manufacturer and its underwriters of misleading investors ahead of its blockbuster 2021 initial public offering, finding that Rivian’s articles of incorporation direct any federal securities-related claims to federal court….

Amazon’s SWE-PolyBench just exposed the dirty secret about your AI coding assistant

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Amazon Web Services today introduced SWE-PolyBench, a comprehensive multi-language benchmark designed to evaluate AI coding assistants across a diverse range of programming languages and real-world scenarios. The benchmark addresses significant limitations in existing evaluation frameworks and offers researchers and developers new ways to assess how effectively AI agents navigate complex codebases. “Now they have a benchmark that they can evaluate on to assess whether the coding agents are able to solve complex programming tasks,” said Anoop Deoras, Director of Applied Sciences for Generative AI Applications and Developer Experiences at AWS, in an interview with VentureBeat. “The real world offers you more complex tasks. In order to fix a bug or do feature building, you need to touch multiple files, as opposed to a single file.” The release comes as AI-powered coding tools have exploded in popularity, with major technology companies integrating them into development environments and standalone products. While these tools show impressive capabilities, evaluating their performance has remained challenging — particularly across different programming languages and varying task complexities. SWE-PolyBench contains over 2,000 curated coding challenges derived from real GitHub issues spanning four languages: Java (165 tasks), JavaScript (1,017 tasks), TypeScript (729 tasks), and Python (199 tasks). The benchmark also includes a stratified subset of 500 issues (SWE-PolyBench500) designed for quicker experimentation. “The task diversity and the diversity of the programming languages was missing,” Deoras explained about existing benchmarks. “In SWE-Bench today, there is only a single programming language, Python, and there is a single task: bug fixes. 
In PolyBench, as opposed to SWE-Bench, we have expanded this benchmark to include three additional languages.” The new benchmark directly addresses limitations in SWE-Bench, which has emerged as the de facto standard for coding agent evaluation with over 50 leaderboard submissions. Despite its pioneering role, SWE-Bench focuses solely on Python repositories, predominantly features bug-fixing tasks, and is significantly skewed toward a single codebase — the Django repository accounts for over 45% of all tasks. “Intentionally, we decided to have a little bit over representation for JavaScript and TypeScript, because we do have SWE-Bench which has Python tasks already,” Deoras noted. “So rather than over representing on Python, we made sure that we have enough representations for JavaScript and TypeScript in addition to Java.” Why simple pass/fail metrics don’t tell the whole story about AI coding performance A key innovation in SWE-PolyBench is its introduction of more sophisticated evaluation metrics beyond the traditional “pass rate,” which simply measures whether a generated patch successfully resolves a coding issue. “The evaluation of these coding agents have primarily been done through the metric called pass rate,” Deoras said. “Pass rate, in short, is basically just a proportion of the tasks that successfully run upon the application of the patch that the agents are producing. But this number is a very high level, aggregated statistic. It doesn’t tell you the nitty gritty detail, and in particular, it doesn’t tell you how the agent came to that resolution.” The new metrics include file-level localization, which assesses an agent’s ability to identify which files need modification within a repository, and Concrete Syntax Tree (CST) node-level retrieval, which evaluates how accurately an agent can pinpoint specific code structures requiring changes. “In addition to pass rate, we have the precision and recall. 
And in order to get to the precision and recall metric, we are looking at a program analysis tool called concrete syntax tree,” Deoras explained. “It is telling you how your core file structure is composed, so that you can look at what is the class node, and within that class, what are the function nodes and the variables.”

How Python remains dominant while complex tasks expose AI limitations

Amazon’s evaluation of several open-source coding agents on SWE-PolyBench revealed several patterns. Python remains the strongest language for all tested agents, likely due to its prevalence in training data and existing benchmarks. Performance degrades as task complexity increases, particularly when modifications to three or more files are required.

Different agents show varying strengths across task categories. While performance on bug-fixing tasks is relatively consistent, there’s more variability between agents when handling feature requests and code refactoring. The benchmark also found that the informativeness of problem statements significantly impacts success rates, suggesting that clear issue descriptions remain crucial for effective AI assistance.

What SWE-PolyBench means for enterprise developers working across multiple languages

SWE-PolyBench arrives at a critical juncture in the development of AI coding assistants. As these tools move from experimental to production environments, the need for rigorous, diverse, and representative benchmarks has intensified.

“Over time, not only the capabilities of LLMs have evolved, but at the same time, the tasks have gotten more and more complex,” Deoras observed. “There is a need for developers to solve more and more complex tasks in a synchronous manner using these agents.”

The benchmark’s expanded language support makes it particularly valuable for enterprise environments where polyglot development is common.
Java, JavaScript, TypeScript, and Python consistently rank among the most popular programming languages in enterprise settings, making SWE-PolyBench’s coverage highly relevant to real-world development scenarios.

Amazon has made the entire SWE-PolyBench framework publicly available. The dataset is accessible on Hugging Face, and the evaluation harness is available on GitHub. A dedicated leaderboard has been established to track the performance of various coding agents on the benchmark.

“We extended the SWE-Bench data acquisition pipeline to support these three additional languages,” Deoras said. “The hope is that we will be able to extrapolate this process further in the future and extend beyond four languages, extend beyond the three tasks that I talked about, so that this benchmark becomes even more comprehensive.”

As the AI coding assistant market heats up with offerings from every major tech company, SWE-PolyBench provides a crucial reality check on their actual capabilities. The benchmark’s design acknowledges that real-world software development demands more than simple bug fixes in Python — it requires working across languages,

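To make the metrics discussed in this article concrete, here is a minimal Python sketch of pass rate and file-level localization scored with precision and recall. It is not Amazon's evaluation harness: the `TaskResult` structure, the function names, and the sample file paths are all hypothetical, and the precision and recall formulas are the standard set-based definitions, assumed here because the article does not spell out how SWE-PolyBench computes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one benchmark task (hypothetical structure, for illustration)."""
    resolved: bool                    # did the agent's patch resolve the issue?
    gold_files: frozenset[str]        # files touched by the reference patch
    predicted_files: frozenset[str]   # files touched by the agent's patch


def pass_rate(results: list[TaskResult]) -> float:
    """Proportion of tasks whose generated patch resolved the issue."""
    if not results:
        return 0.0
    return sum(r.resolved for r in results) / len(results)


def file_precision_recall(result: TaskResult) -> tuple[float, float]:
    """File-level localization for one task, using set-based precision/recall."""
    hits = len(result.gold_files & result.predicted_files)
    # Precision: share of the agent's edited files that needed editing.
    precision = hits / len(result.predicted_files) if result.predicted_files else 0.0
    # Recall: share of the files that needed editing that the agent found.
    recall = hits / len(result.gold_files) if result.gold_files else 0.0
    return precision, recall


# Two made-up tasks: one solved with perfect localization, one failed.
results = [
    TaskResult(True,
               frozenset({"src/a.py", "src/b.py"}),
               frozenset({"src/a.py", "src/b.py"})),
    TaskResult(False,
               frozenset({"lib/x.ts", "lib/y.ts"}),
               frozenset({"lib/x.ts", "lib/z.ts", "lib/w.ts"})),
]

print(f"pass rate: {pass_rate(results):.2f}")
for r in results:
    p, rec = file_precision_recall(r)
    print(f"precision={p:.2f} recall={rec:.2f}")
```

In this toy data the second task counts as a plain failure under pass rate, while the localization scores show the agent found one of the two files that needed changes. That is the kind of detail Deoras says the aggregate number hides.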

Meta MDL Judge Doubts Insurers' Bid To Kick Fight To Del.

By Dorothy Atkins (April 23, 2025, 8:10 PM EDT) — A California federal judge presiding over sprawling social media personal injury multidistrict litigation doubted on Wednesday insurers’ arguments their multimillion-dollar coverage fight with Meta belongs in Delaware state court, questioning how moving the case would preserve judicial resources, while observing that Hartford’s pre-litigation conduct may have been in bad faith….
