VentureBeat

5 key questions your developers should be asking about MCP

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now The Model Context Protocol (MCP) has become one of the most talked-about developments in AI integration since its introduction by Anthropic in late 2024. If you’re tuned into the AI space at all, you’ve likely been inundated with developer “hot takes” on the topic. Some think it’s the best thing ever; others are quick to point out its shortcomings. In reality, there’s some truth to both. One pattern I’ve noticed with MCP adoption is that skepticism typically gives way to recognition: This protocol solves genuine architectural problems that other approaches don’t. I’ve gathered a list of questions below that reflect the conversations I’ve had with fellow builders who are considering bringing MCP to production environments. 1. Why should I use MCP over other alternatives? Of course, most developers considering MCP are already familiar with implementations like OpenAI’s custom GPTs, vanilla function calling, Responses API with function calling, and hardcoded connections to services like Google Drive. The question isn’t really whether MCP fully replaces these approaches — under the hood, you could absolutely use the Responses API with function calling that still connects to MCP. What matters here is the resulting stack. Despite all the hype about MCP, here’s the straight truth: It’s not a massive technical leap. MCP essentially “wraps” existing APIs in a way that’s understandable to large language models (LLMs). Sure, a lot of services already have an OpenAPI spec that models can use. For small or personal projects, the objection that MCP “isn’t that big a deal” is pretty fair. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF The practical benefit becomes obvious when you’re building something like an analysis tool that needs to connect to data sources across multiple ecosystems. Without MCP, you’re required to write custom integrations for each data source and each LLM you want to support. With MCP, you implement the data source connections once, and any compatible AI client can use them. 2. Local vs. remote MCP deployment: What are the actual trade-offs in production? This is where you really start to see the gap between reference servers and reality. Local MCP deployment using the stdio programming language is dead simple to get running: Spawn subprocesses for each MCP server and let them talk through stdin/stdout. Great for a technical audience, difficult for everyday users. Remote deployment obviously addresses the scaling but opens up a can of worms around transport complexity. The original HTTP+SSE approach was replaced by a March 2025 streamable HTTP update, which tries to reduce complexity by putting everything through a single /messages endpoint. Even so, this isn’t really needed for most companies that are likely to build MCP servers. But here’s the thing: A few months later, support is spotty at best. Some clients still expect the old HTTP+SSE setup, while others work with the new approach — so, if you’re deploying today, you’re probably going to support both. Protocol detection and dual transport support are a must. Authorization is another variable you’ll need to consider with remote deployments. The OAuth 2.1 integration requires mapping tokens between external identity providers and MCP sessions. While this adds complexity, it’s manageable with proper planning. 3. How can I be sure my MCP server is secure? This is probably the biggest gap between the MCP hype and what you actually need to tackle for production. Most showcases or examples you’ll see use local connections with no authentication at all, or they handwave the security by saying “it uses OAuth.” The MCP authorization spec does leverage OAuth 2.1, which is a proven open standard. But there’s always going to be some variability in implementation. For production deployments, focus on the fundamentals: Proper scope-based access control that matches your actual tool boundaries Direct (local) token validation Audit logs and monitoring for tool use However, the biggest security consideration with MCP is around tool execution itself. Many tools need (or think they need) broad permissions to be useful, which means sweeping scope design (like a blanket “read” or “write”) is inevitable. Even without a heavy-handed approach, your MCP server may access sensitive data or perform privileged operations — so, when in doubt, stick to the best practices recommended in the latest MCP auth draft spec. 4. Is MCP worth investing resources and time into, and will it be around for the long term? This gets to the heart of any adoption decision: Why should I bother with a flavor-of-the-quarter protocol when everything AI is moving so fast? What guarantee do you have that MCP will be a solid choice (or even around) in a year, or even six months? Well, look at MCP’s adoption by major players: Google supports it with its Agent2Agent protocol, Microsoft has integrated MCP with Copilot Studio and is even adding built-in MCP features for Windows 11, and Cloudflare is more than happy to help you fire up your first MCP server on their platform. Similarly, the ecosystem growth is encouraging, with hundreds of community-built MCP servers and official integrations from well-known platforms. In short, the learning curve isn’t terrible, and the implementation burden is manageable for most teams or solo devs. It does what it says on the tin. So, why would I be cautious about buying into the hype? MCP is fundamentally designed for current-gen AI systems, meaning it assumes you have a human supervising a single-agent interaction. Multi-agent and autonomous tasking are two areas MCP doesn’t really address; in fairness, it doesn’t really need to. But if you’re looking for an evergreen yet still somehow bleeding-edge approach, MCP isn’t it. It’s

5 key questions your developers should be asking about MCP Read More »

OpenAI unveils ‘ChatGPT agent’ that gives ChatGPT its own computer to autonomously use your email and web apps, download and create files for you

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now OpenAI isn’t letting the delay of its open source AI model slow it down on shipping other features. Today, the company is unveiling ChatGPT agent, a feature that allows its AI chatbot to autonomously browse the web, conduct extensive research, download and create new files for its human users using its own virtual computer. Come again? ChatGPT now gets its own PC? And it can use that PC to log into your, the human user’s, accounts and download or send stuff for you? That’s correct, at least in a virtual sense, according to OpenAI. As the company explains: The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF “The model can choose to open a page using the text browser or visual browser, download a file from the web, manipulate it by running a command in the terminal, and then view the output back in the visual browser. The model adapts its approach to carry out tasks with speed, accuracy, and eﬃciency.” How to use ChatGPT agent Users can engage the agent by clicking on the ‘Tools’ button in the ChatGPT prompt entry box, opening the dropdown menu, and selecting ‘agent mode’ from the available options. Then, when it’s turned on describe a task in plain language, and the agent can carry it out across web and local app environments, combining reasoning with actions that previously only a human user could perform on their own machine, manually. ChatGPT agent can connect to apps like your personal or business Gmail and GitHub, so it can pull in useful information — emails or code — from your accounts to help with tasks you ask it to do. It can connect to third-party application programming interfaces (API) to pull information and use connected applications and services through them, as well. If a website needs you to log in, you can do that securely through a special browser view, which lets the agent dig deeper and handle more personalized tasks, like checking your inbox or filling out forms on your behalf. Going where Operator could not — offline The new ChatGPT agent builds upon and expands from the “Operator” agent OpenAI released in January 2025, which allowed ChatGPT to browse the web and fill out forms, place orders, and do other web-based tasks in a private “headless browser,” that is, a cloud-based custom web browser that OpenAI itself maintained and offered for each Operator session. However, Operator was limited only to interacting with websites and web-based applications — not programs that could also run locally on a PC, such as spreadsheet tabulators and slide deck presentation making software. The new ChatGPT agent can browse websites, interact with online forms, run code, analyze data, and deliver finished outputs — such as editable presentations or spreadsheets — based entirely on user instructions. The unveiling comes on the heels of a report published days ago by independent subscription tech industry website The Information suggesting that OpenAI would upgrade ChatGPT to be a more direct competitor to its own investor Microsoft’s Office software applications (e.g. Excel, Word, PowerPoint, etc.) Merging Operator and Deep Research into one agent In fact, OpenAI positions ChatGPT agent as a merging of two of its prior agents — Operator and Deep Research, the latter introduced in February 2025, which exhaustively searches the web through its own headless text-only browser to find and compile information into lengthy and in-depth reports (hence the name). As OpenAI writes in a blog post: “Operator couldn’t dive deep into analysis or write detailed reports, and deep research couldn’t interact with websites to refine results or access content requiring user authentication. In fact, we saw that many queries users attempted with Operator were actually better suited for deep research, so we brought the best of both together.” The previous Operator tool will be phased out, but users can still access Deep Research via the dropdown in the ChatGPT interface. Whether using a visual browser to interact with a website or a terminal to run Python code, the agent moves seamlessly between tools within a single session. It supports a broad range of use cases, from analyzing competitors and generating reports to planning trips, summarizing emails, or booking appointments. Users can interrupt, redirect, or pause a task at any time, with the agent picking up right where it left off. Availability and access Starting today, subscribers to ChatGPT’s $200-per-month “Pro” tier will have full access to ChatGPT agent, with a monthly quota of 400 messages. ChatGPT Plus ($20 per month) and Team ($30 per month) will gain access over the next few days, with 40 messages per month. Additional usage is available through credit-based options. OpenAI said in a release shared with VentureBeat under embargo that its ChatGPT Enterprise and Education subscribers will gain access to the feature the coming weeks. For now, the feature is not yet available in Europe or Switzerland, no doubt disappointing residents there. Built with safety and control at the forefront Given that the agent can now take actions on behalf of users — including on logged-in websites or with connected apps — OpenAI has introduced extensive safety measures. These include user confirmations before taking action, active supervision for sensitive tasks, and technical safeguards to limit unintended behavior. Key protections include: Confirmation prompts before actions like submitting forms or sending emails Watch Mode, which pauses execution when a user becomes inactive Refusal of high-risk tasks, such as financial transfers or privacy violations No memory retention during agent sessions High-risk domain classification In line with its Preparedness Framework, OpenAI is treating ChatGPT

OpenAI unveils ‘ChatGPT agent’ that gives ChatGPT its own computer to autonomously use your email and web apps, download and create files for you Read More »

Forget the hype — real AI agents solve bounded problems, not open-world fantasies

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Everywhere you look, people are talking about AI agents like they’re just a prompt away from replacing entire departments. The dream is seductive: Autonomous systems that can handle anything you throw at them, no guardrails, no constraints, just give them your AWS credentials and they’ll solve all your problems. But the reality is that’s just not how the world works, especially not in the enterprise, where reliability isn’t optional. Even if an agent is 99% accurate, that’s not always good enough. If it’s optimizing food delivery routes, that means one out of every hundred orders ends up at the wrong address. In a business context, that kind of failure rate isn’t acceptable. It’s expensive, risky and hard to explain to a customer or regulator. In real-world environments like finance, healthcare and operations, the AI systems that actually deliver value don’t look anything like these frontier fantasies. They aren’t improvising in the open world; they’re solving well-defined problems with clear inputs and predictable outcomes. If we keep chasing open-world problems with half-ready technology, we’ll burn time, money and trust. But if we focus on the problems right in front of us, the ones with clear ROI and clear boundaries, we can make AI work today. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF This article is about cutting through the hype and building AI agents that actually ship, run and help. The problem with the open world hype The tech industry loves a moonshot (and for the record, I do too). Right now, the moonshot is open-world AI — agents that can handle anything, adapt to new situations, learn on the fly and operate with incomplete or ambiguous information. It’s the dream of general intelligence: Systems that can not only reason, but improvise. What makes a problem “open world”? Open-world problems are defined by what we don’t know. More formally, drawing from research defining these complex environments, a fully open world is characterized by two core properties: Time and space are unbounded: An agent’s past experiences may not apply to new, unseen scenarios. Tasks are unbounded: They aren’t predetermined and can emerge dynamically. In such environments, the AI operates with incomplete information; it cannot assume that what isn’t known to be true is false, it’s simply unknown. The AI is expected to adapt to these unforeseen changes and novel tasks as it navigates the world. This presents an incredibly difficult set of problems for current AI capabilities. Most enterprise problems aren’t like this In contrast, closed-world problems are ones where the scope is known, the rules are clear and the system can assume it has all the relevant data. If something isn’t explicitly true, it can be treated as false. These are the kinds of problems most businesses actually face every day: invoice matching, contract validation, fraud detection, claims processing, inventory forecasting. Feature Open world Closed world Scope Unbounded Well-defined Knowledge Incomplete Complete (within domain) Assumptions Unknown ≠ false Unknown = false Tasks Emergent, not predefined Fixed, repetitive Testability Extremely hard Well-bounded These aren’t the use cases that typically make headlines, but they’re the ones businesses actually care about solving. The risk of hype and inaction However, the hype is harmful: By setting the bar at open-world general intelligence, we make enterprise AI feel inaccessible. Leaders hear about agents that can do everything, and they freeze, because they don’t know where to start. The problem feels too big, too vague, too risky. It’s like trying to design autonomous vehicles before we’ve even built a working combustion engine. The dream is exciting, but skipping the fundamentals guarantees failure. Solve what’s right in front of you Open-world problems make for great demos and even better funding rounds. But closed-world problems are where the real value is today. They’re solvable, testable and automatable. And they’re sitting inside every enterprise, just waiting for the right system to tackle them. The question isn’t whether AI will solve open-world problems eventually. The question is: What can you actually deploy right now that makes your business faster, smarter and more reliable? What enterprise agents actually look like When people imagine AI agents today, they tend to picture a chat window. A user types a prompt, and the agent responds with a helpful answer (maybe even triggers a tool or two). That’s fine for demos and consumer apps, but it’s not how enterprise AI will actually work in practice. In the enterprise, most useful agents aren’t user-initiated, they’re autonomous. They don’t sit idly waiting for a human to prompt them. They’re long-running processes that react to data as it flows through the business. They make decisions, call services and produce outputs, continuously and asynchronously, without needing to be told when to start. Imagine an agent that monitors new invoices. Every time an invoice lands, it extracts the relevant fields, checks them against open purchase orders, flags mismatches and either routes the invoice for approval or rejection, without anyone asking it to do so. It just listens for the event (“new invoice received”) and goes to work. Or think about customer onboarding. An agent might watch for the moment a new account is created, then kick off a cascade: verify documents, run know-your-customer (KYC) checks, personalize the welcome experience and schedule a follow-up message. The user never knows the agent exists. It just runs. Reliably. In real time. This is what enterprise agents look like: They’re event-driven: Triggered by changes in the system, not user prompts. They’re autonomous: They act without human initiation. They’re continuous: They don’t spin up for a single task and

Forget the hype — real AI agents solve bounded problems, not open-world fantasies Read More »

New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Google has officially moved its new, high-performance Gemini Embedding model to general availability, currently ranking number one overall on the highly regarded Massive Text Embedding Benchmark (MTEB). The model (gemini-embedding-001) is now a core part of the Gemini API and Vertex AI, enabling developers to build applications such as semantic search and retrieval-augmented generation (RAG). While a number-one ranking is a strong debut, the landscape of embedding models is very competitive. Google’s proprietary model is being challenged directly by powerful open-source alternatives. This sets up a new strategic choice for enterprises: adopt the top-ranked proprietary model or a nearly-as-good open-source challenger that offers more control. What’s under the hood of Google’s Gemini embedding model At their core, embeddings convert text (or other data types) into numerical lists that capture the key features of the input. Data with similar semantic meaning have embedding values that are closer together in this numerical space. This allows for powerful applications that go far beyond simple keyword matching, such as building intelligent retrieval-augmented generation (RAG) systems that feed relevant information to LLMs. Embeddings can also be applied to other modalities such as images, video and audio. For instance, an e-commerce company might utilize a multimodal embedding model to generate a unified numerical representation for a product that incorporates both textual descriptions and images. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF For enterprises, embedding models can power more accurate internal search engines, sophisticated document clustering, classification tasks, sentiment analysis and anomaly detection. Embeddings are also becoming an important part of agentic applications, where AI agents must retrieve and match different types of documents and prompts. One of the key features of Gemini Embedding is its built-in flexibility. It has been trained through a technique known as Matryoshka Representation Learning (MRL), which allows developers to get a highly detailed 3072-dimension embedding but also truncate it to smaller sizes like 1536 or 768 while preserving its most relevant features. This flexibility enables an enterprise to strike a balance between model accuracy, performance and storage costs, which is crucial for scaling applications efficiently. Google positions Gemini Embedding as a unified model designed to work effectively “out-of-the-box” across diverse domains like finance, legal and engineering without the need for fine-tuning. This simplifies development for teams that need a general-purpose solution. Supporting over 100 languages and priced competitively at $0.15 per million input tokens, it is designed for broad accessibility. A competitive landscape of proprietary and open-source challengers Source: Google Blog The MTEB leaderboard shows that while Gemini leads, the gap is narrow. It faces established models from OpenAI, whose embedding models are widely used, and specialized challengers like Mistral, which offers a model specifically for code retrieval. The emergence of these specialized models suggests that for certain tasks, a targeted tool may outperform a generalist one. Another key player, Cohere, targets the enterprise directly with its Embed 4 model. While other models compete on general benchmarks, Cohere emphasizes its model’s ability to handle the “noisy real-world data” often found in enterprise documents, such as spelling mistakes, formatting issues, and even scanned handwriting. It also offers deployment on virtual private clouds or on-premises, providing a level of data security that directly appeals to regulated industries such as finance and healthcare. The most direct threat to proprietary dominance comes from the open-source community. Alibaba’s Qwen3-Embedding model ranks just behind Gemini on MTEB and is available under a permissive Apache 2.0 license (available for commercial purposes). For enterprises focused on software development, Qodo’s Qodo-Embed-1-1.5B presents another compelling open-source alternative, designed specifically for code and claiming to outperform larger models on domain-specific benchmarks. For companies already building on Google Cloud and the Gemini family of models, adopting the native embedding model can have several benefits, including seamless integration, a simplified MLOps pipeline, and the assurance of using a top-ranked general-purpose model. However, Gemini is a closed, API-only model. Enterprises that prioritize data sovereignty, cost control, or the ability to run models on their own infrastructure now have a credible, top-tier open-source option in Qwen3-Embedding or can use one of the task-specific embedding models. source

New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap Read More »

Data governance: The contract layer that makes agentic systems possible

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Presented by Google Cloud Data governance once felt like a compliance burden, tucked away in the back office. Today, it’s the bedrock for enterprises scaling AI responsibly and unlocking its true value. As companies race to deploy agentic AI, CIOs and data leaders face a critical mandate: deliver governed, trusted data that AI systems can understand. This moment demands a shift in mindset. Governing data can no longer be an afterthought or a bottleneck. It must become an active contract layer that provides context, trust and traceability for every application and autonomous system. When done right, governance transforms scattered data into reliable data products, ready for an AI-driven future. How data governance has evolved The idea of governance has existed for decades, rooted in cataloging assets, tracking where data came from and controlling who could see it. In the early days of business intelligence, these tasks were mostly static and handled at a manageable scale. Reports refreshed overnight, and a small group of analysts made sense of the results. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF Today, AI has changed everything. Lineage, access enforcement and cataloging must operate in real time and cover vastly more data types and sources. Models consume data continuously and make decisions instantly, raising the stakes for mistakes or gaps in oversight. What used to be a once-a-week check is now an always-on discipline. This transformation has turned data governance from a checklist into a living system that protects quality and trust at scale. Why cataloguing is evolving too As the volume, variety and velocity of data continue to grow, the traditional model of a static catalog with fixed semantics and passive metadata no longer meets the demands of modern AI use cases. What once served business intelligence needs now limits the ability to scale autonomous decision-making. CIOs and CDOs need to rethink the catalog as a living system. That means supporting structured and unstructured data, updating continuously, and operating more like a knowledge graph than a lookup table. It must power AI-assisted workflows across the governance lifecycle, capturing not just what data exists but how it is used. In this new model, the catalog is not just for visibility. It is the engine for context and trust at machine speed. Turning data into trusted products Traditional data governance focused on organizing structured tables, reports and compliance rules in silos. In a world where AI acts on real-time data, that old approach falls short. Agentic systems need more than raw access. They need clear semantics, guaranteed freshness and defined usage rights. This is why leading organizations now design their architectures around logical domains and treat data as a product. Each data product comes with a clear contract that defines what it represents, how current it is, who can access it and under what conditions. This is what governed data really means: it’s automatically cataloged, indexed across structured and unstructured sources, has clear lineage and is protected by trusted usage policies that follow it wherever it goes. This contract-backed model mirrors how we have always handled SLAs for applications and now extends that discipline to the data itself. It gives developers, analysts and AI models a reliable source of truth they can trust. Better data ergonomics through governance One of the biggest misconceptions is that governance slows down innovation. In reality, good governance speeds it up. By clarifying ownership, policies and data quality from the start, teams avoid spending precious time reconciling mismatches and can focus on delivering AI that works as intended. A clear governance framework reduces unnecessary data copies, lowers regulatory risk and prevents AI from producing unpredictable results. Getting this right also requires a culture shift. Producers and consumers alike need to see themselves as co-stewards of shared data products. Leaders like CIOs and CDOs set the standards and invest in the right technology, but people across the business keep the trust alive. This shared responsibility ensures that data stays reliable and AI stays accountable. Governance ready for real-time AI Enterprises deploying agentic AI cannot leave governance behind. These systems run continuously, make autonomous decisions and rely on accurate context to stay relevant. Governance must move from passive checks to an active, embedded foundation within both architecture and culture. At Google Cloud, we continue to expand Dataplex and our Iceberg integration to help organizations govern data at scale. With open formats, trusted data products and intelligent policy enforcement, companies can finally break free from fragmented tools and deliver AI that is reliable, explainable and built for the future. Governance is not just an IT function anymore. It is the essential contract that connects your data to the full promise of AI. Learn more about Google Cloud’s data to AI governance capabilities here. Irina Farooq is Senior Director of Product Management, Data & Analytics, at Google Cloud. Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact [email protected]. source

Data governance: The contract layer that makes agentic systems possible Read More »

AWS unveils Bedrock AgentCore, a new platform for building enterprise AI agents with open source frameworks and tools

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Cloud giant Amazon Web Services (AWS) believes AI agents will change how we all work and interact with information, and that enterprises need a platform that allows them to build and deploy agents at scale — all in one place. Today at its New York Summit, AWS unveiled Amazon Bedrock AgentCore, a new enterprise-grade platform designed to build, deploy, and operate AI agents securely and at scale. Swami Sivasubramanian, AWS Vice President of Agentic AI, said during the keynote that AgentCore “helps organizations move beyond experiments to production-ready agent systems that can be trusted with your most critical business processes.” AgentCore is a modular stack of services—available in preview—that gives developers the core infrastructure needed to move AI agents from prototype to production, including runtime, memory, identity, observability, API integration, and tools for web browsing and code execution. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF “We believe that agents are going to fundamentally change how we use tools and the internet,” said Deepak Singh, AWS Vice President of Databases and AI. “The line between an agent and an application is getting blurrier.” AgentCore builds on the existing Bedrock Agents framework, launched in late 2024, but dramatically expands capabilities by supporting any agent framework or foundation model—not just those hosted within Bedrock. That includes compatibility with open-source toolkits like CrewAI, LangChain, LlamaIndex, LangGraph, and AWS’s own Strands Agents SDK. What AWS Bedrock AgentCore includes AgentCore Runtime: A serverless, low-latency execution environment that supports multimodal workloads and long-running sessions with session isolation. AgentCore Memory: Long- and short-term memory services that let agents learn from past interactions and persist contextual knowledge across sessions. AgentCore Identity: OAuth-based identity and access management, allowing agents to act on behalf of users across systems like GitHub, Slack, or Salesforce. AgentCore Observability: Built-in dashboards, debugging, and telemetry tools with support for OpenTelemetry, LangSmith, and Datadog. AgentCore Gateway: Converts internal APIs, Lambda functions, and third-party services into agent-compatible tools using the Model Context Protocol (MCP). AgentCore Browser: Provides headless browser access for agents to autonomously interact with websites. AgentCore Code Interpreter: A secure environment for executing code generated by agents for analysis and visualization. AgentCore also integrates with the AWS Marketplace, enabling teams to discover and deploy pre-built agents and tools. According to Singh, AgentCore has been designed with interoperability in mind. It supports emerging industry standards like MCP and Google’s Agent-2-Agent (A2A) protocol. Features such as AgentCore Identity and Gateway ensure agents have clear permissioning and can interact securely with internal systems and third-party APIs. AWS’s launch puts it squarely into the center of what’s quickly becoming one of the most competitive segments in enterprise AI. OpenAI’s Agents SDK and Google’s Gemini-based Agents SDK are both pushing similar visions of end-to-end agent development platforms. Writer’s AI HQ and startups like Cognition (maker of Devin) are also building tools for managing autonomous software agents. “Agents are the most impactful change we’ve seen in ages,” Sivasubramanian said. “With agents comes a shift to service as a software. This is a tectonic change in how software is built, deployed and operated.” Customer adoption and early use cases Several companies granted early access to AgentCore are already building production-grade agentic applications across industries including finance, healthcare, marketing, and content management. Cloud document and file storage company Box is exploring ways to extend its content management tools using Strands Agents and Bedrock AgentCore Runtime. CTO Ben Kus said the integration gives Box customers “top tier security and compliance” while scaling AI capabilities across enterprise environments. Brazil’s Itaú Unibanco is using AgentCore to support its development of hyper-personalized, secure digital banking experiences. Chief Technology Officer Carlos Eduardo Mazzei said the new platform “will help us deliver an intuitive banking experience with the efficiency of automation and personalization customers expect.” In the healthcare space, Innovaccer has built a new protocol—HMCP (Healthcare Model Context Protocol)—on top of AgentCore Gateway. CEO and co-founder Abhinav Shashank called Gateway a “game-changer” that allows the company to convert existing APIs into agent-compatible tools at scale while maintaining trust, compliance, and operational efficiency. Marketing firm Epsilon is leveraging AgentCore to accelerate campaign build times and improve engagement. Prashanth Athota, SVP of Software Engineering, said the company expects to reduce build times by up to 30% and enhance customer journey personalization. Availability and pricing AgentCore is now available in preview in select AWS regions including US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt). It’s free to try until September 16, 2025, with pricing to begin thereafter. Pricing for AgentCore is entirely consumption-based, with no upfront commitments or minimum fees. Each module—Runtime, Memory, Identity, Observability, Gateway, Browser, and Code Interpreter—is billed independently and can be used a la carte or together. Runtime, Browser, and Code Interpreter services are priced per second, based on CPU and memory usage, with rates set at $0.0895 per vCPU-hour and $0.00945 per GB-hour. Gateway charges $0.005 per 1,000 tool API invocations, $0.025 per 1,000 search queries, and $0.02 per 100 tools indexed per month. Memory costs are based on data volume: $0.25 per 1,000 short-term memory events, $0.75 per 1,000 long-term memories stored (or $0.25 with custom strategies), and $0.50 per 1,000 retrievals. AgentCore Identity costs $0.010 per 1,000 token or API key requests, though it’s included at no extra charge when used via Runtime or Gateway. Observability is billed via Amazon CloudWatch rates. To learn more or get started, AWS directs developers to its AgentCore documentation, GitHub samples, and a dedicated Discord server. source

AWS unveils Bedrock AgentCore, a new platform for building enterprise AI agents with open source frameworks and tools Read More »

Anthropic launches finance-specific Claude with built-in data connectors, higher limits and prompt libraries

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now As some regulated enterprises cautiously expand their use of AI, platform and model makers are starting to offer bespoke versions to specific industries. Anthropic is making its first step into that direction with the new Claude for Financial Services, essentially a special version of its Claude for Enterprise tier, that could soothe some of the fears of the sector around interoperability and tool use. Anthropic will also provide pre-built connectors to data providers, opening up discoverability routes for tools on Claude. Jonathan Pelosi, head of industry for financial services at Anthropic, told VentureBeat that Anthropic models “were already particularly well-suited for financial workloads and we’ve been tailoring them to get better and better.” But while the Claude models offer enterprise solutions for financial services firms, Pelosi said the sector was also looking for more features from the Claude chat interface. “Unlike some competitors in the space who built a consumer app that became a sensation, or they built these new video generators and meme generators, that’s just never our focus,” Pelosi said. “We are enterprise first, so our models are uniquely well-suited to perform best in class against complicated enterprise workloads, which means complicated quantitative analysis, complex data extraction for the financial services industry at scale.” The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF Claude for Financial Services would offer users additional rate limits, especially since, Pelosi noted, many analysts often find themselves quickly hitting capacity limitations due to the size of their workloads. Unlike Claude for Enterprise, this financial services-specific platform will also include pre-built MCP connectors to financial data providers like FactSet, PitchBook, S&P Capital IQ and Morningstar, among others, and implementation support from Anthropic. The third big difference between Claude for Enterprise and Claude for Financial Services is the presence of a prompt library. Pelosi said some users had a difficult time translating their analytics workflow or needs into a prompt, so the prompt library can guide them. Pelosi said customers already using Claude for Enterprise are not obligated to move to the financial services version; however, added features like increased limits might prompt them to switch. How Anthropic is approaching the financial services ecosystem Pelosi noted that many financial service institutions have strong engineers who can build AI applications. However, these companies’ core focus remains banking or insurance, so a platform that simplifies the process of connecting data to AI is essential. “What we’re trying to do is bring all this technology together under one roof,” said Pelosi. “Think of it as an out-of-the-box solution that’s easily configurable for the Bridgewaters, or the Norwegian Sovereign Wealth Funds of the world, versus the alternative where they cobble this thing together on their own.” Financial services companies have been building generative AI tools in various ways, creating AI platforms on their own using off-the-shelf models, or building on top of existing chatbots like Claude, Gemini or ChatGPT. BNY, for example, has been experimenting with AI agents for its AI platform Eliza. Capital One also built an agent that pulls on car dealership inventory and car loan data for auto sales. Startup Metal offers an assistant for financial analysts and private equity that reads and parses through 10-Ks, 10-Qs or 8-Ks. Rogo, another startup, also allows financial institutions to upload documents and set workflows. Allaying concerns Anthropic is not alone in providing bespoke solutions for the financial sector. However, offering a guided setup and vetted access to data providers may go a long way for an industry wary of accidentally exposing itself to more risk. MCP can connect one company or its agent to another company by providing needed identification and tool use permissions. However, regulated industries express concern that it still lacks some important KYC and identity features. Many in the sector see the benefit of MCP servers to access critical financial data or other documents, but are waiting for mass adoption. At the same time, the financial services industry has been criticized for being too cautious in adopting the technology. Pelosi reiterated that Anthropic is focused on safety and responsibility, which is why he feels a solution specific to finance was the natural next step for them. Pelosi said that while Anthropic’s intention “is not to build a Claude for every vertical,” the company could extend bespoke features to other industries at some point. Anthropic also recently opened up Claude to more tool discoverability with partners like Canva, Notion, Stripe and Figma, providing more context to searches and activities on the app. source

Anthropic launches finance-specific Claude with built-in data connectors, higher limits and prompt libraries Read More »

Slack gets smarter: New AI tools summarize chats, explain jargon, and automate work

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Slack is rolling out an extensive array of artificial intelligence features that promise to eliminate routine tasks and turn the messaging platform into a central hub for enterprise productivity, marking owner Salesforce’s direct challenge to Microsoft’s workplace AI dominance. The announcements, set to roll out over the coming months, include AI-powered writing assistance embedded directly into Slack’s canvas feature, contextual message explanations, automated action item identification, and enterprise search capabilities that span multiple connected business applications. The moves come as Salesforce simultaneously restricts external AI companies from accessing Slack data, creating a walled garden approach that mirrors broader industry trends toward platform consolidation. “Unlike some AI tools that sit outside the flow of work, Slack’s AI shows up where work happens – across conversations, decisions, and documentation,” said Shalini Agarwal, Vice President of Slack Product at Salesforce, in an exclusive interview with VentureBeat. “The key differentiator is context, which comes in the form of structured and unstructured data in Slack.” The timing underscores intensifying competition in the $45 billion enterprise collaboration market, where Microsoft’s Teams platform and its Copilot AI assistant have gained significant traction against Slack since Salesforce’s $27.7 billion acquisition of the messaging service in 2021. Google is also pushing its Duet AI across Workspace applications, creating a three-way battle for corporate customers increasingly focused on AI-driven productivity gains. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF How Slack’s contextual AI works inside workplace conversations Slack’s new capabilities depart from traditional AI assistant models that require users to actively prompt for help. Instead, the platform will proactively surface relevant information and automate routine tasks within existing workflows. The AI writing assistance, launching soon within Slack’s canvas feature, will allow teams to automatically generate project briefs from conversation threads, extract action items from brainstorming sessions, and reformat meeting notes into structured updates. When combined with Slack’s existing AI-powered meeting transcription in huddles, the feature creates an end-to-end documentation workflow. “AI needs to feel easy and seamless, and you shouldn’t have to work hard to use it,” Agarwal told VentureBeat. “Since the release of AI in Slack, customers have summarized more than 600 million messages, saving a collective 1.1 million hours across users.” Perhaps more intriguingly, Slack will introduce contextual message explanations that activate when users hover over unfamiliar terms, acronyms, or project references. The feature draws from the organization’s unique vocabulary and conversation history stored within Slack, potentially solving a persistent onboarding and cross-team collaboration challenge. “Ever hit an unfamiliar acronym or bit of jargon in a Slack message? That moment of confusion, of searching or asking, slows everything down,” the company noted in its announcement. Enterprise search becomes the new battleground for workplace data The centerpiece of Slack’s AI strategy is enterprise search, now generally available, which allows users to query information across connected applications including Salesforce, Microsoft Teams, Google Drive, Confluence, and Box from a single interface within Slack. The capability addresses a persistent productivity drain in modern workplaces, where employees spend an estimated 41% of their time on repetitive tasks like searching for information across disconnected systems, according to Slack’s research. By positioning Slack as the unified search interface for enterprise data, Salesforce is making a bold play to become the primary workspace hub for knowledge workers. Rather than building point-to-point connections between applications, Slack is positioning itself as the universal translator for workplace information. The approach acknowledges a harsh reality: most organizations have accepted that their data will remain scattered across dozens of applications, but they desperately need a better way to find and use that information. For IT departments, Slack promises minimal deployment complexity. “Generally, it should be a light lift for IT teams,” Agarwal said. “Connectors will be out of the box as they become available. Once admins enable an app, and users authenticate to it, results will be available immediately.” Why Salesforce is blocking AI competitors from accessing Slack data Even as Slack opens its search capabilities to customers’ connected applications, Salesforce has been aggressively restricting how external AI companies access Slack data. In May, the company amended its API terms of service to prohibit bulk data exports and explicitly ban using Slack data to train large language models. The move affects third-party AI search companies like Glean, which had been indexing Slack conversations alongside other enterprise data sources to provide unified search experiences. Under the new restrictions, such companies can only access Slack data through real-time search APIs with significant limitations. Salesforce is making a calculated gamble. By restricting access to Slack data, the company is betting that its own AI capabilities will prove superior to external alternatives. But enterprise customers have consistently shown they prefer choice and flexibility over forced vendor lock-in. If competing AI platforms deliver significantly better results using data from other sources, Salesforce risks pushing customers toward alternative messaging platforms that offer more open integration. The restrictions underscore how valuable workplace conversation data has become. With over 5 billion messages exchanged weekly on Slack, the platform contains what Agarwal describes as “the history of your company, and all the information across teams and projects.” This conversational data offers something unique in the enterprise software landscape: unstructured, context-rich information about how work actually gets done, rather than formal documentation about how it should get done. Enterprise security concerns drive AI trust and compliance features Salesforce has built its AI capabilities around what it calls “the Einstein Trust Layer,” emphasizing that customer data never leaves the company’s infrastructure or trains external AI models. The approach addresses enterprise concerns about data sovereignty that have

Slack gets smarter: New AI tools summarize chats, explain jargon, and automate work Read More »

Blaxel raises $7.3M seed round to build ‘AWS for AI agents’ after processing billions of agent requests

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Blaxel, a startup building cloud infrastructure specifically designed for artificial intelligence agents, has raised $7.3 million in seed funding led by First Round Capital, the company announced Thursday. The financing comes just one month after the six-founder team graduated from Y Combinator’s Spring 2025 batch, underscoring investor appetite for infrastructure plays in the rapidly expanding AI agent market. The San Francisco-based company is betting that the current generation of cloud providers — Amazon Web Services, Google Cloud, and Microsoft Azure — are fundamentally mismatched for the new wave of autonomous AI systems that can take actions without human intervention. These AI agents, which handle everything from managing calendars to generating code, require dramatically different infrastructure than traditional web applications built for human users. “The current cloud providers have been designed for the Web 2.0, Software as a Service era,” said Paul Sinaï, Blaxel’s co-founder and CEO, in an exclusive interview with VentureBeat. “But with this new wave of agentic AI, we believe that there is a need for a new type of infrastructure which is dedicated to AI agents.” Why AWS and Google Cloud weren’t built for autonomous AI agents The timing reflects a broader shift in enterprise computing as companies increasingly deploy AI agents for customer service, data processing, and workflow automation. Unlike traditional applications where databases sit alongside web servers in predictable patterns, AI agents create unique networking challenges by connecting to language models in one region, APIs in another cloud, and knowledge bases elsewhere—all while users expect instant responses. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF Blaxel has already demonstrated significant traction, processing millions of agent requests daily across 16 global regions by the end of their Y Combinator batch. One customer is running over 1 billion seconds of agent runtime to process millions of videos, representing a scale that illustrates the infrastructure demands of AI-first companies. “One of our customers is processing session replays to enable product managers to understand better how the user behavior of their product,” Sinaï explained. “They need to process millions of session replays every month. So it represents millions of minutes of sessions. They are using our agentic infrastructure to process those session replays and provide insights for product managers.” The company’s approach centers on providing infrastructure that AI agents can operate themselves, rather than requiring human administrators. This includes sandboxed virtual machines that boot in under 25 milliseconds, automatic scaling based on agent activity patterns, and APIs designed to be consumed directly by AI systems rather than human developers. How six co-founders with a successful exit plan to take on Big Tech Blaxel’s unusual six-founder structure stems from the team’s shared experience building and selling a previous company to OVHcloud, Europe’s largest cloud provider. That company became OVH’s entire analytics product suite, giving the team firsthand experience with both cloud infrastructure challenges and successful exits. “I know it sounds unusual, pretty big team. We didn’t fit exactly on the stage for demo day,” Sinaï said, referencing Y Combinator’s signature event. “But we already did that. My previous company, which I sold to OVH cloud, we were also six co-founders.” The team includes Charles Drappier, whom Sinaï has known for over 15 years, along with co-founders Christophe Ploujoux, Nicolas Lecomte, Thomas Crochet, and Mathis Joffre. Their collective experience spans infrastructure, developer tools, and platform engineering — critical expertise for competing against tech giants with virtually unlimited resources. “I think it’s important to be six right now, because we have a lot of ambition,” Sinaï said. “What we are doing is building this next generation of cloud computing for this new agentic era.” What sets Blaxel apart in the competitive cloud infrastructure market The cloud infrastructure market is notoriously competitive, with AWS commanding roughly one-third market share and newer players like Modal, Replicate, and RunPod targeting AI workloads. Blaxel differentiates itself by focusing specifically on AI agents rather than model inference or training. “Most of the competitors you mentioned are solving a very difficult problem, which is around the inference — how you can host your model, how you can make those models as fast as you can in terms of number of tokens,” Sinaï said. “But there is not that many people working on infrastructure for the agents, and it’s exactly what we are doing.” The company’s platform includes three main components: agent hosting for deploying AI systems as serverless APIs, MCP (Model Context Protocol) servers for connecting agents to external tools, and a unified gateway for accessing multiple AI models. The infrastructure is designed to handle the variable resource demands of AI agents, which might require minimal computing power while waiting for responses but need significant resources during active processing. Enterprise security and compliance features target regulated industries Despite targeting younger AI-first companies, Blaxel has implemented enterprise-grade security measures including SOC2 and HIPAA compliance. The platform offers data residency controls that allow customers to restrict workloads to specific geographic regions—critical for companies in regulated industries. “We provide a policy framework where you can attach, for example, to workloads to say, this agent cannot run outside of those subsets of regions,” Sinaï explained. “You can attach a policy to say this agent cannot run outside of the United States, so you are sure that this agent will process the data only in the regions you have chosen.” This approach reflects the company’s belief that even early-stage AI companies need robust infrastructure practices because they’re building the enterprises of tomorrow. “We believe that it’s very important to have, even for young companies, the best infrastructure

Blaxel raises $7.3M seed round to build ‘AWS for AI agents’ after processing billions of agent requests Read More »

OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’

Leave a Comment / Top Tech Update / VentureBeat

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Scientists from OpenAI, Google DeepMind, Anthropic and Meta have abandoned their fierce corporate rivalry to issue a joint warning about AI safety. More than 40 researchers across these competing companies published a research paper today arguing that a brief window to monitor AI reasoning could close forever — and soon. The unusual cooperation comes as AI systems develop new abilities to “think out loud” in human language before answering questions. This creates an opportunity to peek inside their decision-making processes and catch harmful intentions before they become actions. But the researchers warn that this transparency is fragile and could vanish as AI technology advances. The paper has drawn endorsements from some of the field’s most prominent figures, including Nobel Prize laureate Geoffrey Hinton, often called the “godfather of AI,” of the University of Toronto; Ilya Sutskever, co-founder of OpenAI who now leads Safe Superintelligence Inc.; Samuel Bowman from Anthropic; and John Schulman from Thinking Machines. Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability. pic.twitter.com/MZAehi2gkn — Bowen Baker (@bobabowen) July 15, 2025 “AI systems that ‘think’ in human language offer a unique opportunity for AI safety: We can monitor their chains of thought for the intent to misbehave,” the researchers explain. But they emphasize that this monitoring capability “may be fragile” and could disappear through various technological developments. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF Models now show their work before delivering final answers The breakthrough centers on recent advances in AI reasoning models like OpenAI’s o1 system. These models work through complex problems by generating internal chains of thought (CoT) — step-by-step reasoning that humans can read and understand. Unlike earlier AI systems trained primarily on human-written text, these models create internal reasoning that may reveal their true intentions, including potentially harmful ones. When AI models misbehave — exploiting training flaws, manipulating data or falling victim to attacks — they often confess in their reasoning traces. The researchers found examples where models wrote phrases like “Let’s hack,” “Let’s sabotage,” or “I’m transferring money because the website instructed me to” in their internal thoughts. Jakub Pachocki, OpenAI’s chief technology officer and co-author of the paper, described the importance of this capability in a social media post. “I am extremely excited about the potential of chain-of-thought faithfulness and interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview,” he wrote. I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview. As AI systems spend more compute working e.g. on long term research problems, it is… https://t.co/7Dqy2gtZfy — Jakub Pachocki (@merettm) July 15, 2025 The technical foundation for monitoring lies in how current AI systems work. For complex tasks requiring extended reasoning, AI models must use their CoT as working memory, making their reasoning process partially visible to human observers. The researchers explain this creates an “externalized reasoning property” where some thinking must happen in readable language. Several technological shifts could eliminate monitoring capabilities The transparency could vanish through several pathways. As AI companies scale up training using reinforcement learning — where models get rewarded for correct outputs regardless of their methods — systems may drift away from human-readable reasoning toward more efficient but opaque internal languages. Previous research shows that language models fine-tuned with outcome-based rewards can abandon legible English for incomprehensible shortcuts. The shift away from human-generated training data in favor of AI-generated reasoning could accelerate this drift. Bowen Baker, an OpenAI researcher and lead author, highlighted these concerns in social media posts. “The existing CoT monitorability may be extremely fragile,” he warned. “Higher-compute RL, alternative model architectures, certain forms of process supervision, may all lead to models that obfuscate their thinking.” Furthermore, the existing CoT monitorability may be extremely fragile. Higher-compute RL, alternative model architectures, certain forms of process supervision, etc. may all lead to models that obfuscate their thinking. — Bowen Baker (@bobabowen) July 15, 2025 Novel AI architectures pose another threat. Researchers are developing systems that reason in continuous mathematical spaces rather than discrete words, potentially eliminating the need for language-based thought entirely. These “latent reasoning models might not need to verbalize any of their thoughts and would thus lose the safety advantages,” the researchers write. Even without deliberate changes, indirect pressures could erode transparency. If AI outputs are optimized to look good to human reviewers, this could affect the reasoning chains leading to those outputs. Models might also learn to hide their thoughts if they become aware their reasoning is being monitored. Companies already catch AI systems scheming during testing Despite its fragility, CoT monitoring has proven valuable in current AI safety research. The technique successfully detects when models exploit weaknesses in their training systems, fall victim to manipulation attempts or reveal misaligned goals even when they don’t act on them. The monitoring provides “an early insight into models’ goals and motivations, even if the model takes no misaligned actions” — potentially catching problems before they manifest as harmful behaviors. This early warning system has also helped researchers identify flaws in AI evaluations and understand when models might behave differently during testing versus real-world use. When models pretend to have desirable goals while pursuing objectives humans would disapprove of, this deception often shows up in their CoT. Researchers have successfully used this visibility to spot various forms of AI

OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’ Read More »

5 key questions your developers should be asking about MCP

OpenAI unveils ‘ChatGPT agent’ that gives ChatGPT its own computer to autonomously use your email and web apps, download and create files for you

Forget the hype — real AI agents solve bounded problems, not open-world fantasies

New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap

Data governance: The contract layer that makes agentic systems possible

AWS unveils Bedrock AgentCore, a new platform for building enterprise AI agents with open source frameworks and tools

Anthropic launches finance-specific Claude with built-in data connectors, higher limits and prompt libraries

Slack gets smarter: New AI tools summarize chats, explain jargon, and automate work

Blaxel raises $7.3M seed round to build ‘AWS for AI agents’ after processing billions of agent requests

OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’

We provide a matching platform and membership services for startup groups in Asia

Useful Links

Become an Affiliate

Contact

News & Insight

Join the family!

Latest News

Social club The Nanson nabs ex-Häagen-Dazs Asia marketing head as CEO

When good data goes bad: Fighting fraud and silos in Southeast Asia's largest economy