Managing AI Agents at Scale: The Organizational Problem

Andrius Putna · Wed May 13 2026 · 9 min read

#ai #agents #enterprise #governance #organizational-design

The average Fortune 500 firm will run 150,000 agents by 2028. Only 13% have governance that can handle them. The bottleneck isn't engineering.

The conversation in enterprise corridors has shifted from “should we build agents?” to “how do we make sense of the ones we already built?” That’s not a technology question. It’s an organizational design question — and most companies are answering it wrong.

Gartner delivered the defining number for this half of the year: by 2028, the average Global Fortune 500 enterprise will run over 150,000 AI agents, up from fewer than 15 in 2025 (Gartner, Digital Workplace Summit 2026). Four orders of magnitude. In three years. Yet only 13% of organizations say they have adequate governance to manage what they’re deploying (Complete AI Training / Gartner reporting).

We’ve watched this play out across our own engagements. The teams that succeed don’t have better models or more engineers. They have better organizational structure around their agent programs. This post breaks down what that structure looks like and why most orgs are building the wrong one.

The Sprawl Problem No One Planned For

Agent sprawl isn’t a hypothetical. It’s what happens when five teams each build three customer-facing agents without a shared identity system, audit trail, or cancellation protocol. Two years later you have 15,000 unregistered agents across staging and production, most of them running on service accounts that haven’t been rotated since the pilot phase.

The structural problem is that organizations treat agents like regular software: ship it, monitor it, incident-response when it breaks. But agents are different in three ways:

They make decisions at runtime. Traditional software follows deterministic logic paths. An agent’s behavior changes based on the model it loads, the context it receives, and the tools it calls. You can’t “version” agent behavior the way you version code.
They multiply through composition. A single workflow agent becomes four components when you add a supervisor, a critic, and a memory layer. Teams don’t intentionally architect this sprawl; it emerges as they stack capabilities to solve edge cases.
They create liability without intent. When a deterministic script deletes records, you have a bug report. When an agent deletes records because it misinterpreted a tool schema, you have a governance incident, a PR problem, and potential regulatory exposure.

Why the Central AI Team Model Breaks Down

The most common organizational response to “we have too many agents” is to create a central AI team — sometimes called a Center of Excellence, sometimes an AI Platform team. This team owns the models, the governance, the infrastructure. Everyone else files requests and waits.

The fatal flaw: central AI teams become the very bottleneck they were created to prevent. A centralized team that must approve every agent deployment cannot keep pace with 150,000 agents. The math doesn’t work.
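To make "the math doesn’t work" concrete, here is a back-of-envelope sketch. Only the 150,000-agent figure comes from the Gartner projection above; the team size and review throughput are illustrative assumptions, and `years_to_review` is a hypothetical helper.

```python
# Back-of-envelope check: can a gated review process keep up?
# Only the 150,000-agent figure comes from the article; reviewer count
# and weekly throughput are illustrative assumptions.

def years_to_review(agents: int, reviewers: int, reviews_per_reviewer_per_week: float) -> float:
    """Years needed for a central team to review every agent exactly once."""
    weekly_capacity = reviewers * reviews_per_reviewer_per_week
    return agents / weekly_capacity / 52


# Assume 20 reviewers, each clearing 5 agent reviews per week.
backlog_years = years_to_review(
    agents=150_000, reviewers=20, reviews_per_reviewer_per_week=5
)
print(f"{backlog_years:.1f} years to review every agent once")  # prints 28.8 years ...
```

Even if the assumed throughput is off by a factor of ten, a one-time review of every agent still takes close to three years, before any re-review after a model or tool change.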

What actually separates organizations that scale successfully from those that don’t isn’t whether they have a central AI team — it’s how they structure the relationship between central governance and decentralized execution.

The Platform Model That Works: Guardrails, Not Gates

The orgs that get this right in 2026 share one characteristic: their central AI team acts as a platform provider, not a gatekeeper. They ship guardrails that make it easy to build agents correctly, rather than requiring approval to build agents at all.

The Three-Layer Operating Model

Layer 1: The Platform (central team, 8–20 people). Builds and maintains the shared agent infrastructure: identity and RBAC, audit logging, tool registry, model routing, eval harnesses, and cost tracking. This is the foundation that any team’s agent needs to be production-grade. Think of it the way you think about your cloud platform team: nobody debates whether each product team should provision its own VPC.

Layer 2: Embedded Agent Engineers (domain teams). Every business unit with meaningful agent use cases (customer service, finance ops, IT, product) has one or two agent specialists embedded. These engineers report to their business unit but maintain strong dotted lines to the platform team. They build domain-specific agents using platform primitives and contribute tool definitions and eval datasets back to the platform.

Layer 3: Business Owners (non-technical accountability). Every production agent has a named business owner accountable for its performance metrics, cost budget, and incident response. This is the single most overlooked organizational element — without business ownership, agents become orphaned infrastructure the moment the original engineer moves teams.

IDC’s FutureScape 2026 research found organizations with mature Agentic Centers of Excellence are 20% more competitive on innovation metrics, but only when the CoE operates as a platform layer rather than a centralized build team (IDC FutureScape, via AgentCorps).

The Five Operational Capabilities Every Agent Program Needs

Regardless of org structure, any organization running agents at scale needs these five capabilities in place. Teams missing more than two are almost certainly in the 88% pilot-to-production failure cohort we’ve tracked (see our analysis).

1. Agent Identity and Lifecycle Management

Every production agent needs its own identity — separate from the human user who launched it and separate from generic service accounts. This means agent-specific credentials, role-based scope definitions, provisioning workflows, suspension mechanisms, and rotation policies.

Okta formalized this pattern at Showcase 2026 with their Agent Identity blueprint, and Google added unique cryptographic IDs for every agent on its enterprise platform at Cloud Next 2026. The principle is clear: agents are principals, not proxies.

Without this, you cannot answer the most basic governance question when an incident occurs: which agent did this? Only 21.9% of teams currently grant agents their own identity and apply RBAC accordingly (see our governance toolkit analysis).
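A minimal sketch of what "agents are principals" could look like as a data model. This is not Okta’s or Google’s API; the `AgentIdentity` class, the scope strings, and the 90-day rotation window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class AgentState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"


@dataclass
class AgentIdentity:
    """Hypothetical identity record: one per agent, never shared with a human or service account."""
    agent_id: str
    business_owner: str
    scopes: frozenset
    credential_issued_at: datetime
    max_credential_age: timedelta = timedelta(days=90)  # assumed rotation policy
    state: AgentState = AgentState.ACTIVE

    def needs_rotation(self, now: datetime) -> bool:
        return now - self.credential_issued_at > self.max_credential_age

    def can(self, scope: str, now: datetime) -> bool:
        # An agent may act only if it is active, its credential is fresh,
        # and the requested scope was explicitly granted.
        return (
            self.state is AgentState.ACTIVE
            and not self.needs_rotation(now)
            and scope in self.scopes
        )

    def suspend(self) -> None:
        self.state = AgentState.SUSPENDED


now = datetime(2026, 5, 13, tzinfo=timezone.utc)
agent = AgentIdentity(
    agent_id="agent:invoice-matcher:001",  # illustrative naming scheme
    business_owner="finance-ops-lead",
    scopes=frozenset({"invoices:read", "invoices:match"}),
    credential_issued_at=now - timedelta(days=30),
)
print(agent.can("invoices:read", now))    # granted scope, fresh credential
print(agent.can("invoices:delete", now))  # scope never granted
```

Because every check goes through one `agent_id`, an audit log keyed on that field can answer "which agent did this?" directly.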

2. The Tool Registry

Every tool or MCP server available to agents across your organization should live in a central registry with documented contracts, version history, SLA expectations, and deprecation timelines. This is the agent equivalent of an API catalog, with one added dimension: the consumers are non-deterministic. An LLM can use a tool in ways its human author never anticipated.

Microsoft addressed this via their Agent Governance Toolkit, which includes a Policy Engine that enforces action-level constraints at runtime. The key insight: tool access isn’t binary (allowed/denied). It’s contextual — the same tool should be available to agent A under condition X but not to agent B under condition Y.
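The contextual-access idea can be sketched as a default-deny rule table. This is not the Agent Governance Toolkit’s Policy Engine API; `ToolCall`, `ToolPolicy`, the tool name, and the agent IDs below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    tool: str
    environment: str   # e.g. "staging" or "production"
    record_count: int  # how many records the call would touch


Rule = Callable[[ToolCall], bool]


class ToolPolicy:
    """Default-deny policy: a call passes only if some rule for that tool accepts it."""

    def __init__(self) -> None:
        self._rules: dict[str, list[Rule]] = {}

    def allow(self, tool: str, rule: Rule) -> None:
        self._rules.setdefault(tool, []).append(rule)

    def is_allowed(self, call: ToolCall) -> bool:
        # Tools with no registered rules fall through to an empty list, so they are denied.
        return any(rule(call) for rule in self._rules.get(call.tool, []))


policy = ToolPolicy()
# Agent A under condition X: the cleanup agent may delete, but only in staging.
policy.allow(
    "crm.delete_records",
    lambda c: c.agent_id == "agent:cleanup" and c.environment == "staging",
)
# Agent B under condition Y: the retention agent may delete small batches anywhere.
policy.allow(
    "crm.delete_records",
    lambda c: c.agent_id == "agent:retention" and c.record_count <= 100,
)

call = ToolCall("agent:cleanup", "crm.delete_records", "production", 5)
print(policy.is_allowed(call))  # same tool, same agent, wrong context
```

The same tool name yields different answers depending on who calls it and under what conditions, which is the property a binary allow list cannot express.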

3. Eval-as-a-Service

The evaluation gap is the top blocker for agent productionization (64% of leaders cite it, per Forrester and Anaconda’s 2026 surveys). The platform team should provide a shared eval infrastructure — golden datasets, scoring pipelines, regression testing against model upgrades — that any team can use without building their own.

This isn’t just a nice-to-have. It’s economics. Teams that build eval infrastructure from scratch per agent spend 3–5 weeks getting it right. Teams that consume it as a service ship in days. We’ve written extensively about the evaluation landscape; see our 2026 eval tutorial for specifics.
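As a rough sketch of the regression-testing piece, a shared eval service could expose something like the gate below. The exact-match scorer, the 2-point tolerance, and the stub agents are assumptions for illustration; a real harness would use richer scoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str


Agent = Callable[[str], str]


def pass_rate(agent: Agent, cases: list[GoldenCase]) -> float:
    """Fraction of golden cases answered correctly (case-insensitive exact match)."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = sum(
        agent(c.prompt).strip().lower() == c.expected.strip().lower() for c in cases
    )
    return passed / len(cases)


def regression_gate(baseline: Agent, candidate: Agent, cases: list[GoldenCase],
                    max_drop: float = 0.02) -> bool:
    """True if the candidate scores no more than max_drop below the baseline."""
    return pass_rate(candidate, cases) >= pass_rate(baseline, cases) - max_drop


def make_stub(answers: dict[str, str]) -> Agent:
    # Stand-in for a real agent call; looks the answer up in a fixed table.
    return lambda prompt: answers[prompt]


golden = [
    GoldenCase("2+2", "4"),
    GoldenCase("capital of France", "Paris"),
    GoldenCase("opposite of hot", "cold"),
]
baseline = make_stub({"2+2": "4", "capital of France": "paris", "opposite of hot": "cold"})
candidate = make_stub({"2+2": "4", "capital of France": "Lyon", "opposite of hot": "cold"})

print(regression_gate(baseline, candidate, golden))  # candidate regressed on one case
```

A domain team supplies only the golden cases and a callable; the scoring and gating logic stays with the platform team.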

4. Cost Attribution and Budget Enforcement

When median enterprise LLM bills grow 7.2x year-over-year, you need token spend attribution down to the individual agent. Not “the AI program spent $47K this month” — “the invoice-matching agent spent $3,200 on GPT-5.5 calls, $800 on embedding refresh, and $1,100 on evaluation runs.”

The organizations doing this right treat cost attribution like their cloud FinOps program: tagged resources, dashboards per team, budget alerts with automatic throttle, and monthly cost reviews between the platform team and each agent’s business owner. We’ve covered the mechanics in our AI FinOps breakdown.
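Per-agent attribution with a budget throttle can be sketched in a few lines. The dollar figures reuse the invoice-matching example above; the `CostLedger` class, the category names, the $5,000 budget, and the choice to throttle agents with no budget are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    agent_id: str
    category: str   # e.g. "inference", "embedding", "evaluation"
    cost_usd: float


class CostLedger:
    """Hypothetical ledger that attributes spend to individual agents."""

    def __init__(self, monthly_budgets_usd: dict[str, float]) -> None:
        self._budgets = dict(monthly_budgets_usd)
        self._spend: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def record(self, event: UsageEvent) -> None:
        self._spend[event.agent_id][event.category] += event.cost_usd

    def breakdown(self, agent_id: str) -> dict[str, float]:
        return dict(self._spend.get(agent_id, {}))

    def total(self, agent_id: str) -> float:
        return sum(self.breakdown(agent_id).values())

    def should_throttle(self, agent_id: str) -> bool:
        # Assumed policy: an agent with no registered budget is throttled outright.
        budget = self._budgets.get(agent_id)
        return budget is None or self.total(agent_id) >= budget


ledger = CostLedger({"invoice-matching": 5000.0})  # assumed monthly budget
ledger.record(UsageEvent("invoice-matching", "inference", 3200.0))
ledger.record(UsageEvent("invoice-matching", "embedding", 800.0))
ledger.record(UsageEvent("invoice-matching", "evaluation", 1100.0))

print(ledger.breakdown("invoice-matching"))
print(ledger.should_throttle("invoice-matching"))  # 5,100 spent against a 5,000 budget
```

The breakdown is what goes into the monthly review with the agent’s business owner; the throttle check is what the platform calls before routing the next request.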

5. Incident Response Playbooks for Agent Failures

Traditional incident response assumes a human operator can restart a failed process or roll back a deploy. Agent incidents are different: the agent may have initiated actions in three downstream systems before its failure was detected, and “restarting” means resuming from a checkpoint — if one exists — while reconciling the actions that already completed.

Runbooks for agent incidents need three things traditional runbooks don’t: checkpoint recovery procedures, downstream reconciliation workflows, and a clear escalation path to the business owner (the engineer who built the agent can usually be reached; the business owner who authorized its production deployment may not be).
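The three runbook elements can be sketched as a single planning function over an action journal. This assumes the agent writes a journal of numbered steps and periodic checkpoints; `Action`, `RecoveryPlan`, and `plan_recovery` are hypothetical names, not part of any specific framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionStatus(Enum):
    COMPLETED = "completed"
    PENDING = "pending"


@dataclass(frozen=True)
class Action:
    step: int          # 1-indexed position in the agent's run
    system: str        # downstream system the action touched
    description: str
    status: ActionStatus


@dataclass(frozen=True)
class RecoveryPlan:
    resume_from_step: int
    to_reconcile: list   # completed actions the checkpoint does not know about
    escalate_to: str


def plan_recovery(journal: list, last_checkpoint_step: Optional[int],
                  business_owner: str) -> RecoveryPlan:
    """Resume after the last checkpoint and flag side effects that happened past it."""
    checkpoint = last_checkpoint_step or 0  # no checkpoint: restart from the first step
    to_reconcile = [
        a for a in journal
        if a.step > checkpoint and a.status is ActionStatus.COMPLETED
    ]
    return RecoveryPlan(
        resume_from_step=checkpoint + 1,
        to_reconcile=to_reconcile,
        escalate_to=business_owner,
    )


journal = [
    Action(1, "erp", "create purchase order", ActionStatus.COMPLETED),
    Action(2, "crm", "update account", ActionStatus.COMPLETED),
    Action(3, "email", "send confirmation", ActionStatus.COMPLETED),
    Action(4, "ledger", "post entry", ActionStatus.PENDING),
]
plan = plan_recovery(journal, last_checkpoint_step=1, business_owner="finance-ops-lead")
print(plan.resume_from_step, [a.system for a in plan.to_reconcile], plan.escalate_to)
```

Steps 2 and 3 already ran against downstream systems but are not reflected in the checkpoint, so a naive resume would repeat them. The plan surfaces them for reconciliation before the agent continues.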

The Agent Inventory Problem

Here’s a number that should land uncomfortably: 28,663 agent control systems with exposed interfaces were discovered accessible from the public internet in April 2026 alone (AI Agent Store reporting). You probably don’t know how many agents your organization has running right now. Neither does anyone else at your scale.

The first operational act for any serious agent program should be an audit: catalog every agent, every service account, every API key used as an agent credential, and every external-facing endpoint that accepts agent control commands. If you can produce this list in under a week, you’re ahead of most enterprises. If the list takes months to compile, you’re already behind.
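A first-pass audit can be as simple as comparing what discovery finds against what the registry knows. The sketch below assumes you already have a discovery feed and a set of registered agent names; the field names, the 90-day staleness threshold, and the sample data are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DiscoveredAgent:
    name: str
    credential_rotated_at: datetime
    publicly_reachable: bool   # control endpoint reachable from the internet


def audit(discovered: list, registry: set, now: datetime,
          max_credential_age: timedelta = timedelta(days=90)) -> dict:
    """Group discovered agents by the governance problem they exhibit."""
    findings = {"unregistered": [], "stale_credentials": [], "publicly_exposed": []}
    for agent in discovered:
        if agent.name not in registry:
            findings["unregistered"].append(agent.name)
        if now - agent.credential_rotated_at > max_credential_age:
            findings["stale_credentials"].append(agent.name)
        if agent.publicly_reachable:
            findings["publicly_exposed"].append(agent.name)
    return findings


now = datetime(2026, 5, 13, tzinfo=timezone.utc)
discovered = [
    DiscoveredAgent("invoice-matcher", now - timedelta(days=20), False),
    DiscoveredAgent("pilot-chatbot", now - timedelta(days=400), True),
    DiscoveredAgent("ticket-triage", now - timedelta(days=120), False),
]
registry = {"invoice-matcher", "ticket-triage"}

for problem, names in audit(discovered, registry, now).items():
    print(problem, names)
```

An agent can land in more than one bucket; the pilot-era chatbot here is unregistered, running on a stale credential, and exposed all at once.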

What the Numbers Say About Org Structure

The data on organizational success correlates strongly with the three-layer model we outlined:

Organizations with a centralized AI team that builds agents directly see a 25% pilot-to-production conversion rate, roughly double the cross-industry average of 12%.
Organizations with a centralized AI team that operates as a platform see a 41% conversion rate, with median payback of 5.1 months (per BCG and Forrester 2026 surveys, as documented in our industry benchmarks analysis).
Organizations with no central AI team or governance model (79% of enterprises at last count) remain locked in pilot-to-production churn, burning through budgets that their CFOs approved for transformation but that are being spent on repeated prototype development.

The gap between 25% and 41% conversion maps directly to a single organizational decision: does your central team provide rails or gates?

The Hard Truth About Agent Sprawl

Gartner’s 150,000-agent projection isn’t a warning — it’s a forecast. The agents will exist. The question is whether they’ll exist under your governance or outside it.

The organizations that will manage 150,000 agents productively in 2028 are building their operating models right now. Not by hiring 200 more AI engineers. By designing the platform, identity, and accountability layers that make agents runnable, observable, and cancellable at scale.

The companies that treat agent governance as a compliance checkbox will discover — as many already have — that governance that arrives after deployment is just expensive incident forensics.
