On-prem AI agents: the signal in three announcements

Three vendors, one week, one direction: the cloud-only assumption for AI agents is cracking.

TL;DR: Three announcements from three vendors in one week all point the same way: the cloud‑only assumption for AI agents is cracking, under pressure from data gravity, sovereignty, and compliance.


The OpenAI/Dell partnership on May 18 got the most attention. But it is the least surprising of the three once you see it in the context of what Microsoft shipped in February and what Anthropic did on the exact same day. Put them together and a pattern emerges that is worth understanding before building anything agent-shaped for an enterprise.

What happened, in sequence

February 24: Microsoft announced Foundry Local and Azure Local disconnected operations. The headline capability is running large multimodal AI models on NVIDIA hardware, fully air‑gapped, with local inference APIs, inside a customer's sovereign private cloud, not Microsoft’s. No network required. Microsoft 365 Local (Exchange, SharePoint, Skype) follows the same pattern: full on-prem, under the customer's own governance, even when offline. In practice, this is an agent runtime that lives entirely inside a sovereign environment.

Microsoft Sovereign Cloud: large AI models running fully disconnected
Azure Local disconnected operations, Microsoft 365 Local, and Foundry Local with large model support, now available for sovereign and air-gapped environments.

April 27: OpenAI and Microsoft amended their partnership. The change that matters here: OpenAI's exclusivity with Azure ends. OpenAI products can now ship on any cloud provider. The phrasing is “flexibility,” but the practical meaning is that OpenAI can now go where enterprise customers need the model to be, including outside Azure entirely and, eventually, into more on‑prem and carrier‑hotel environments.

May 18: Two moves on the same day. OpenAI and Dell announced that Codex will connect with Dell's AI Data Platform and Dell AI Factory, i.e. on‑prem enterprise data storage and compute. And Anthropic acquired Stainless, the company that builds SDKs and MCP servers—the connectivity layer that lets agents reach internal tools and systems.

OpenAI and Dell: Codex for hybrid and on-premises enterprise environments
Codex connects with the Dell AI Data Platform and AI Factory, bringing agentic AI to hybrid and on-premises environments where enterprise data already lives.

Anthropic acquires Stainless
Stainless builds SDKs and MCP servers. Anthropic's stated reason: "agents are only as useful as what they can connect to."

Three vendors, three different moves, one direction: agents moving to where the data and systems already live.

Why on-prem agents at all

The case for keeping AI agents in the cloud is real: no infrastructure to manage, instant access to frontier models, and billing that scales with usage rather than upfront capex. For a lot of workloads, that trade‑off is fine.

But a few classes of enterprise problems push back hard.

Data gravity. An agent that needs to reason over a large, frequently-updated internal knowledge base (codebases, internal wikis, ERP exports, support ticket history) pulls data across a network boundary on every invocation. Moving the context to the cloud and back is slow and expensive at scale. Moving the agent to the data is faster.

Sovereignty and compliance. Healthcare, financial services, and public sector customers in many jurisdictions cannot send data outside a specific boundary, let alone outside the country. "Use our cloud API" is not an option regardless of model quality. Microsoft's February announcement was explicit about this: the target market is defence, regulated industries, and governments operating under strict legal and regulatory requirements.

Latency. Agentic workflows that need to iterate quickly on local context (running a test suite, querying a database repeatedly, analyzing a crash dump) benefit from co-location with the systems they touch. Round-trip latency to a cloud API adds up when the agent is making dozens of tool calls per task.
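
To make the "adds up" point concrete, here is a back-of-the-envelope sketch. The round-trip times and the number of tool calls are hypothetical values chosen only for illustration, not measurements, and the model assumes the calls run strictly one after another.

```python
def added_latency_seconds(tool_calls: int, round_trip_ms: float) -> float:
    """Total network wait for sequential tool calls, ignoring compute time."""
    return tool_calls * round_trip_ms / 1000.0


# Hypothetical numbers for illustration only; measure your own environment.
TOOL_CALLS_PER_TASK = 40   # "dozens of tool calls per task"
CLOUD_ROUND_TRIP_MS = 80.0  # assumed round trip to a remote API
LOCAL_ROUND_TRIP_MS = 1.0   # assumed round trip inside the same network

cloud_wait = added_latency_seconds(TOOL_CALLS_PER_TASK, CLOUD_ROUND_TRIP_MS)
local_wait = added_latency_seconds(TOOL_CALLS_PER_TASK, LOCAL_ROUND_TRIP_MS)

print(f"cloud: {cloud_wait:.2f}s of network wait per task")
print(f"local: {local_wait:.2f}s of network wait per task")
```

Under these assumed numbers the per-task gap is a few seconds, which matters mostly when an agent runs many tasks or a person is waiting on each iteration.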

None of this is a new argument. It is the same set of pressures that drove on‑prem databases, private cloud deployments, and data‑residency provisions in enterprise contracts for the last decade. What changed is that frontier‑level AI capabilities are now available in forms that can realistically run locally, and the vendors are shipping infrastructure to support that.

What the Anthropic/Stainless deal adds to this picture

The Stainless acquisition is easy to misread as a developer-experience play. Stainless has built every official Anthropic SDK since the beginning, and hundreds of companies use its tooling to generate SDKs, CLIs, and MCP servers from API specs. That is genuinely useful, and Anthropic wanting to own it makes sense on those grounds alone.

But the framing Anthropic chose is the tell: "agents are only as useful as what they can connect to." They bought the company that builds the connectivity layer between agents and enterprise systems at exactly the moment the whole industry is trying to figure out how to wire agents into internal data and tools.

On‑prem and hybrid agent deployments make the connectivity problem harder, not easier. When the agent runs inside a customer's boundary, the SDK and MCP server tooling has to work there too, often in more constrained and heterogeneous environments. Owning that layer is a meaningful structural advantage if enterprise on‑prem and hybrid is where a significant share of the market is heading.

The part nobody talks about

Every announcement in this sequence underweights one thing: governance is harder when the agent runs close to production systems without a cloud intermediary keeping logs.

A cloud-based agent service provides audit trails, rate limiting, content filtering, and billing as side effects of routing through the vendor's infrastructure. Move the agent on-prem and all of that becomes the operator's problem. A locally hosted safety classifier such as Llama Guard can cover content filtering. Structured logging is solvable. But building the full governance stack that enterprises actually need, especially for agents with write access to production systems, is non-trivial engineering.
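
As a rough illustration of what "the operator's problem" means, here is a minimal sketch of one small piece of such a stack: a wrapper that puts an allow-list, a rate limit, and an audit log in front of every tool call. The class name `GovernedTools`, its parameters, and the log format are all hypothetical, invented for this example. It is not any vendor's API, and a real deployment would also need durable storage, identity, and approval flows.

```python
import json
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, List


class GovernedTools:
    """Hypothetical wrapper: allow-list, sliding-window rate limit, audit log."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._recent: Deque[float] = deque()  # timestamps of accepted calls
        # In-memory JSON lines; a real system would write to durable storage.
        self.audit_log: List[str] = []

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        """Add a tool to the allow-list."""
        self._tools[name] = fn

    def call(self, agent_id: str, name: str, **kwargs: Any) -> Any:
        """Run a registered tool, recording every attempt and its outcome."""
        now = self._clock()
        # Drop timestamps that have aged out of the rate-limit window.
        while self._recent and now - self._recent[0] >= self._window:
            self._recent.popleft()
        if name not in self._tools:
            self._record(agent_id, name, kwargs, "denied:unknown_tool")
            raise PermissionError(f"tool not registered: {name}")
        if len(self._recent) >= self._max_calls:
            self._record(agent_id, name, kwargs, "denied:rate_limited")
            raise RuntimeError("rate limit exceeded")
        self._recent.append(now)
        try:
            result = self._tools[name](**kwargs)
        except Exception as exc:
            self._record(agent_id, name, kwargs, f"error:{type(exc).__name__}")
            raise
        self._record(agent_id, name, kwargs, "ok")
        return result

    def _record(self, agent_id: str, name: str,
                kwargs: Dict[str, Any], outcome: str) -> None:
        entry = {"ts": time.time(), "agent": agent_id, "tool": name,
                 "args": kwargs, "outcome": outcome}
        self.audit_log.append(json.dumps(entry, default=str))


tools = GovernedTools(max_calls=2, window_seconds=60.0)
tools.register("add", lambda a, b: a + b)
print(tools.call("agent-1", "add", a=1, b=2))
print(len(tools.audit_log), "audit entries")
```

Denied and failed calls are logged as well as successful ones, since an audit trail that only records successes says little about what an agent tried to do.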

Whether an operator-built governance stack works in practice depends on whether the agent's tool calls map cleanly onto whatever governance model the customer already runs, which is not guaranteed. The model quality and serving infrastructure problems are mostly solved. The governance and auditability story for on-prem agents is where most of the engineering work remains.

And that is the actual question underneath all three announcements: not "can frontier AI run on‑prem?" but "who owns the governance layer when it does?" 🤔