Key takeaways
- Anthropic Managed Agents owns L1-L3: model, tool execution, and orchestration. These three layers constitute the infrastructure harness — the runtime that makes AI agents operationally viable. This is excellent engineering and increasingly commodity. It is not governance.
- Regulated enterprises must own L4-L7: policy, evidence, control, and business workflow. These four layers constitute the governed harness for AI agents — the accountability layer that makes agentic AI defensible under SOX, HIPAA, EU AI Act, and DORA. No vendor can own this because compliance is jurisdiction-specific and institution-specific.
- The shared-responsibility boundary sits between L3 and L4 and is non-negotiable. This boundary is conceptually identical to AWS shared responsibility for cloud. The vendor owns the substrate; the enterprise owns the posture. Mistaking L3 orchestration for governance is the single most expensive architectural mistake teams will make in 2026.
- The integration surface is narrow: three boundaries. Tool registration, per-call action interception, and evidence emission. Getting these three right is most of the work of building a governed harness on top of Managed Agents — and all three map to Context OS architectural primitives.
- Hosting the runtime raises the bar on governance, not lowers it. When Anthropic operates more of the stack, the enterprise has fewer hooks into the runtime and must do more enforcement at the boundary. Convenience at L1-L3 raises the bar at L4-L7. This is the most commonly misread implication of managed agent runtimes.
Why does the AI agent layered architecture matter for regulated agentic operations?
Article 1 made the category claim: regulated enterprises need a governed harness above the agent runtime. This article does the unglamorous work of mapping the layers cleanly so engineering teams stop confusing capability with control.
The point is not to critique Anthropic Managed Agents — it is excellent at what it was built for. The point is to read it on its own terms so the boundary between vendor responsibility and enterprise responsibility is sharp enough to design against.
If you cannot draw the line, you cannot defend the line. And if you cannot defend the line, you will discover it during an audit — which is the worst possible time.
The AI agent layered architecture defines seven layers, a shared-responsibility boundary, and three integration surfaces. Understanding this architecture is the prerequisite for building AI agent governance that survives contact with SOX, HIPAA, EU AI Act, and DORA obligations — and for implementing the AI Agent Audit Evidence Framework that regulated production requires.
What is the Anthropic Managed Agents architecture and what does it provide?
Anthropic Managed Agents is a hosted runtime for AI agents built on Claude. It provides the components that make an agent operationally viable within the AI agents computing platform:
- A managed orchestration loop that handles reasoning steps
- A tool execution layer that routes calls to MCP servers and built-in capabilities
- Sandboxed code execution with network and process isolation
- Persistent memory across turns and sessions
- Lifecycle management that keeps long-running agents alive without customer infrastructure
It is, in the language of this series, an infrastructure harness — and a very capable one.
What it does not provide, by design, is the layer that makes an agent's behaviour defensible inside a regulated enterprise's control environment. That layer is not missing because Anthropic forgot it. It is absent because it cannot be standardised across customers — every enterprise's policies, jurisdictions, and risk appetites are different. The runtime is correctly scoped. The governance work is correctly left to the institution that owns the risk.
This is the same architectural truth that Decision Infrastructure addresses: the layer between AI capability and AI accountability must be owned by the enterprise, not the runtime vendor.
What is the L1-L7 AI agent layered architecture reference diagram?
The reference layer diagram maps the full agentic stack in a regulated enterprise. The shared-responsibility boundary divides it: everything below the boundary the enterprise consumes through Anthropic Managed Agents; everything above it the enterprise must own through its governed harness for AI agents.
| Layer | Name | Function | Owner |
|---|---|---|---|
| L7 | Business workflow | Intent, SLAs, KPIs, business owner accountability | Enterprise (governed harness) |
| L6 | Control plane | Approvals, kill switches, rollback, escalation routing | Enterprise (governed harness) |
| L5 | Evidence plane | Lineage, attestation, audit-grade records, tamper-evidence | Enterprise (governed harness) |
| L4 | Policy plane | Enforced rules, jurisdictions, risk tiers, deterministic evaluation | Enterprise (governed harness) |
| Shared-responsibility boundary | |||
| L3 | Orchestration | Agent loop, planning, memory, lifecycle management | Anthropic (infrastructure harness) |
| L2 | Tool execution | MCP invocation, code sandbox, retrieval, file handling | Anthropic (infrastructure harness) |
| L1 | Model | Claude inference, alignment, safety training | Anthropic (infrastructure harness) |
Reading this diagram is the entire job of this article. The layers are not interchangeable, the boundary is not negotiable, and the integration surface between L4 and L3 is where most architectural mistakes in AI agent governance will be made over the next eighteen months.
What does Anthropic own at L1 through L3 in the infrastructure harness?
L1 — The model layer
Claude itself: weights, inference, alignment, the safety training that shapes default behaviour. The enterprise consumes this layer through versioned model identifiers and never modifies it. Model risk management at L1 is largely about version pinning, regression testing against evaluation suites, and tracking model card disclosures.
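A minimal sketch of what L1 model-risk discipline can look like in practice: a pinned model identifier plus a regression gate over a frozen evaluation suite. The model identifiers, evaluation cases, and thresholds here are all illustrative assumptions, not real Anthropic values.

```python
# L1 model-risk controls sketch: version pinning plus a regression gate
# against a frozen evaluation baseline. All names and numbers are
# hypothetical, chosen only to illustrate the pattern.

PINNED_MODEL = "claude-example-2026-01-01"  # hypothetical versioned identifier

# Frozen evaluation suite: case id -> minimum acceptable score
EVAL_BASELINE = {
    "refusal-harmful-request": 0.95,
    "sox-journal-entry-extraction": 0.90,
    "phi-redaction": 0.98,
}

def gate_model_upgrade(candidate_model: str, candidate_scores: dict) -> tuple:
    """Approve a model version change only if every baseline case holds."""
    regressions = [
        case for case, floor in EVAL_BASELINE.items()
        if candidate_scores.get(case, 0.0) < floor
    ]
    return (len(regressions) == 0, regressions)

ok, failed = gate_model_upgrade(
    "claude-example-2026-02-01",
    {"refusal-harmful-request": 0.96,
     "sox-journal-entry-extraction": 0.88,  # regressed below the 0.90 floor
     "phi-redaction": 0.99},
)
# ok is False; failed lists the regressed case, so the pin stays in place.
```

The point of the sketch is that a model version change is a controlled event with a recorded pass/fail outcome, not a silent upgrade.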
L2 — The tool execution layer
The mechanics of calling tools: MCP server invocation, sandboxed code execution, retrieval primitives, file handling, the network and process isolation that prevents tool calls from causing infrastructure-level damage. Anthropic provides the substrate and its safety properties. The enterprise provides the tools themselves and decides which ones an agent is allowed to register.
L3 — The orchestration layer
The agent loop: plan, act, observe, repeat. Memory across turns and sessions. Lifecycle management of long-running agents. Recovery from transient failures. This is the layer Managed Agents most clearly commoditises — and the layer most likely to be mistaken for governance. L3 is where the agent's behaviour visibly happens. It is not governance. It is execution.
This is the critical distinction for AI agent governance: orchestration coordinates what the agent does; governance determines whether the agent should do it, proves what it did, and gives a regulator something to read six months later.
What must the enterprise own at L4 through L7 in the governed harness for AI agents?
L4 — The policy plane: where AI agent governance is enforced
Enforced rules — not guidance, not system prompts. The policy plane sits between the orchestration loop and any consequential action and answers a single question per call: is this action permitted, given who the agent is acting for, what data it is touching, what jurisdiction applies, and what risk tier the action falls into?
A policy decision is itself a structured, logged event. A denial is a first-class outcome the model cannot prompt-engineer around because the enforcement happens outside the model's control. This is the architectural role of Decision Infrastructure — deterministic policy evaluation separated from model output.
System prompts are guidance the model can be argued out of. Policy enforcement at L4 is structural — the model cannot route around a denial because the enforcement sits between the orchestration loop and the tool call.
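To make the contrast concrete, here is a hedged sketch of an L4 policy check: a deterministic rule evaluated outside the model that returns a structured, loggable decision. The rule names, risk tiers, and record fields are illustrative assumptions, not a real Decision Infrastructure API.

```python
# L4 policy plane sketch: deterministic evaluation, structured outcome.
# A denial is a first-class event; the model never sees or controls this code.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PolicyDecision:
    action: str
    principal: str        # the human the agent is acting for
    jurisdiction: str
    risk_tier: str
    outcome: str          # "allow" | "deny" | "escalate"
    rule: str             # which rule fired, for the evidence plane
    at: str               # ISO timestamp

def evaluate(action: str, principal: str, jurisdiction: str, amount: float) -> PolicyDecision:
    now = datetime.now(timezone.utc).isoformat()
    if jurisdiction == "EU" and action == "export_customer_data":
        return PolicyDecision(action, principal, jurisdiction, "high",
                              "deny", "eu-data-residency", now)
    if amount > 10_000:
        return PolicyDecision(action, principal, jurisdiction, "high",
                              "escalate", "high-value-threshold", now)
    return PolicyDecision(action, principal, jurisdiction, "low",
                          "allow", "default-allow-low-risk", now)

d = evaluate("export_customer_data", "j.doe", "EU", 0.0)
# d.outcome is "deny": a structured, logged event the model cannot argue around.
```

Note that the decision object carries everything the evidence plane needs later: who, what, where, which rule, and when.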
L5 — The evidence plane: where Decision Traces are produced
Audit-grade records designed for someone who was not in the room: structured input and output capture, the policy decisions that fired and why, model and tool versions, data lineage of every retrieved document, timestamps, actor identity, and tamper-evident anchoring.
This is where Decision Traces live within the Context OS architecture. Logs become evidence only when the discipline around them makes them admissible — lineage, version pinning, tamper-evidence, and policy context. The evidence plane transforms runtime telemetry into the audit-grade artifacts that the AI Agent Audit Evidence Framework requires.
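A minimal sketch of what tamper-evident anchoring can look like: each evidence record carries a hash of its body plus the hash of the previous record, so any in-place edit breaks the chain. The field names and chaining scheme are illustrative assumptions, not the actual Decision Trace schema.

```python
# L5 evidence plane sketch: hash-chained records for tamper-evidence.
import hashlib, json

def append_trace(chain: list, event: dict) -> dict:
    """Append an event (tool call, policy decision, model call) to the chain."""
    prev = chain[-1]["record_hash"] if chain else "0" * 64
    body = {
        "event": event,       # structured capture: inputs, outputs, versions
        "prev_hash": prev,    # links each record to its predecessor
    }
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    record = {**body, "record_hash": digest}
    chain.append(record)
    return record

def verify(chain: list) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    prev = "0" * 64
    for rec in chain:
        body = {"event": rec["event"], "prev_hash": rec["prev_hash"]}
        if rec["prev_hash"] != prev:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != rec["record_hash"]:
            return False
        prev = rec["record_hash"]
    return True

chain = []
append_trace(chain, {"type": "tool_call", "tool": "read_ledger"})
append_trace(chain, {"type": "policy_decision", "outcome": "allow"})
# verify(chain) holds here; editing any record in place makes it fail.
```

Production systems would anchor the head of the chain externally (for example to a write-once store), but the principle is the same: evidence that cannot be silently rewritten.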
L6 — The control plane: where delegated authority is enforced
Where humans stay in the loop on the actions that warrant it: approval gates above defined thresholds, kill switches with bounded latency guarantees, rollback procedures with tested recovery paths, and the routing logic that decides which actions escalate to which humans.
The control plane is the architectural assertion that the agent operates under delegated authority — a scoped, revocable grant from a named human principal — rather than autonomous authority. This maps to the Authority Model within Context OS, where every AI agent action is verified through explicit, time-bound authority before execution.
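A sketch of what a delegated-authority check can look like: a scoped, revocable, time-bound grant from a named human principal, verified before execution. The structure is an illustrative assumption, not the Context OS Authority Model API.

```python
# L6 control plane sketch: delegated authority as an explicit, checkable grant.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class AuthorityGrant:
    principal: str         # named human who delegated the authority
    agent_id: str
    scope: frozenset       # the actions this grant covers, nothing more
    expires_at: datetime   # time-bound: authority lapses on its own
    revoked: bool = False  # revocable: a human can withdraw it at any time

def authorised(grant: AuthorityGrant, agent_id: str, action: str, now=None) -> bool:
    """Every consequential action is checked against the grant before execution."""
    now = now or datetime.now(timezone.utc)
    return (not grant.revoked
            and grant.agent_id == agent_id
            and action in grant.scope
            and now < grant.expires_at)

grant = AuthorityGrant(
    principal="cfo.office",
    agent_id="reconciliation-agent-7",
    scope=frozenset({"read_ledger", "propose_journal_entry"}),
    expires_at=datetime.now(timezone.utc) + timedelta(hours=8),
)
# Posting a journal entry is outside scope, so it fails the check even
# though the agent is otherwise healthy and within its time window.
```

The design choice worth noting: authority is data, not prompt text, so revocation and expiry are enforced structurally rather than requested politely.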
L7 — The business workflow: where accountability lives
The actual job: the intent the agent is serving, the SLAs it must meet, the KPIs it is judged against, the business owner who answers for the outcome. L7 is often left implicit, which is a mistake.
Without an explicit owner at L7, the governance layers below have no one to answer to and decay into ceremony. Every governed agent in agentic operations must have a named business owner at L7 — the person who answers for the outcome when the agent's decision is questioned.
Where is the integration surface between the infrastructure harness and the governed harness?
The integration surface is narrower than most teams expect — which is the good news. It lives in three places, and getting all three right is most of the work of building a governed harness for AI agents on top of Managed Agents:
Integration surface 1: Tool registration boundary
Every tool exposed to the agent runtime is a potential consequential action. The governed harness intercepts tool registration so that each tool is wrapped with a policy check before the runtime can invoke it. The wrap is mechanical; the policy logic behind it is institutional.
Within Context OS, this maps to the Agent Registry — where every tool is registered with identity, authority scope, and governance constraints before becoming available to any AI agent.
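The mechanical part of the wrap can be sketched in a few lines. Here `policy_evaluate` is a stand-in for the enterprise's L4 policy plane, and the registry shape is an illustrative assumption rather than a real Agent Registry API.

```python
# Integration surface 1 sketch: wrap every tool with a policy check at
# registration time, so the runtime can only ever see the governed version.
GOVERNED_REGISTRY = {}

class PolicyDenied(Exception):
    pass

def policy_evaluate(tool_name: str, payload: dict) -> str:
    # Stand-in deterministic rule: deny any write targeting production ledgers.
    if payload.get("target") == "prod_ledger":
        return "deny"
    return "allow"

def register_tool(name: str, fn):
    """Wrap the tool so every invocation passes through the policy plane."""
    def governed(payload: dict):
        outcome = policy_evaluate(name, payload)
        if outcome != "allow":
            raise PolicyDenied(f"{name}: {outcome}")  # surfaced as a logged, first-class event
        return fn(payload)
    GOVERNED_REGISTRY[name] = governed  # only the wrapped tool is ever exposed

register_tool("write_record", lambda p: f"wrote to {p['target']}")
result = GOVERNED_REGISTRY["write_record"]({"target": "sandbox_db"})  # allowed
# A call with {"target": "prod_ledger"} raises PolicyDenied instead of executing.
```

The wrapper is a dozen lines; the institutional work is the policy logic it delegates to.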
Integration surface 2: Action interception boundary
For tools that survive registration but whose individual invocations need per-call evaluation (high-value transactions, PHI access, cross-jurisdiction data movement), the policy plane evaluates the call payload before the runtime executes it — and can deny, allow, or escalate. This is execution governance at the action level.
Integration surface 3: Evidence emission boundary
Every event the runtime produces — model call, tool call, memory write, policy decision — must flow into the evidence plane in a normalised schema. Anthropic emits operational telemetry; the governed harness transforms it into audit-grade evidence by adding lineage, attestation, and the policy context that makes it interpretable later.
The runtime emits telemetry. The governed harness produces evidence. The difference is whether someone can stand behind it under oath.
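The transformation at this third boundary can be sketched as an enrichment step. The telemetry shape and the enrichment fields below are illustrative assumptions about what such a normalised schema might contain.

```python
# Integration surface 3 sketch: enrich a raw runtime telemetry event into
# an audit-grade evidence record. Telemetry says what happened; evidence
# adds the context that makes it interpretable six months later.
def to_evidence(telemetry: dict, policy_decision: dict, lineage: list) -> dict:
    return {
        "event_type": telemetry["type"],     # e.g. model_call | tool_call | memory_write
        "payload": telemetry["payload"],
        "occurred_at": telemetry["ts"],
        # Enterprise-added context the hosted runtime cannot supply:
        "policy_decision": policy_decision,  # which rule fired and why
        "lineage": lineage,                  # every document the agent retrieved
        "model_version": telemetry.get("model", "unknown"),
        "actor": policy_decision.get("principal"),
    }

ev = to_evidence(
    {"type": "tool_call", "payload": {"tool": "read_ledger"},
     "ts": "2026-01-01T00:00:00Z", "model": "claude-example-2026-01-01"},
    {"rule": "default-allow-low-risk", "outcome": "allow", "principal": "j.doe"},
    ["doc-123"],
)
# The resulting record carries actor, lineage, and policy context alongside
# the raw event, which is what separates evidence from telemetry.
```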
What is the most common misreading of Managed Agents and AI agent governance?
The most common misreading is that hosting the runtime collapses the shared-responsibility boundary — that because Anthropic now operates more of the stack, less of the stack belongs to the customer.
The opposite is true.
When the runtime is hosted, the governance layer becomes more important, not less, because the enterprise has fewer hooks into the runtime itself and must do more of its enforcement at the boundary. Convenience at L1-L3 raises the bar at L4-L7.
This misreading is the same one that plagued early cloud adoption: enterprises assumed that hosting infrastructure in AWS meant AWS owned compliance. The shared-responsibility model corrected that assumption. The same correction is happening now for agentic AI — and the enterprises that understand it will build their governed harness while competitors are still debating whether they need one.
For enterprise teams evaluating their governance readiness, the Governed AI Agent Platform Maturity Framework provides a five-level assessment. Platforms at Level 1-2 (Observed/Instrumented) typically exhibit this misreading. Level 3+ (Governed) requires explicit ownership of L4-L7.
How does the L1-L7 architecture map to Context OS and Decision Infrastructure?
| Layer | Function | Context OS mapping |
|---|---|---|
| L4 — Policy plane | Deterministic rule enforcement | Decision Infrastructure — Decision Boundaries evaluated as code |
| L5 — Evidence plane | Audit-grade records | Decision Traces — queryable artifacts with lineage, policy, identity |
| L6 — Control plane | Human authority gates | Authority Model — delegated, scoped, revocable authority |
| L7 — Business workflow | Intent, SLAs, accountability | Governed Agent Runtime — business-aligned agent execution |
Context OS provides the architectural implementation of L4-L7 as a unified governed operating system above any agent runtime. The AI agent layered architecture defines what each layer must do. Context OS defines how each layer is implemented as Decision Infrastructure within a single AI agents computing platform.
Conclusion: Why layers tell you who owns what — and why you must know before you ship
AI agent systems do not fail because of model limitations — they fail because ownership across the AI agent layered architecture is unclear.
Anthropic Managed Agents owns L1 (model), L2 (tool execution), and L3 (orchestration) — the infrastructure harness that makes AI agents operational. In regulated environments, enterprises must own L4 (policy), L5 (evidence), L6 (control), and L7 (business workflow) — the governed harness for AI agents that makes decisions defensible.
The boundary between L3 and L4 is where AI agent governance becomes real. It is a non-negotiable separation required by compliance frameworks such as SOX, HIPAA, the EU AI Act, and DORA. The integration surface is intentionally narrow — tool registration, per-call interception, and evidence emission — but getting these right determines whether your system can be audited, trusted, and scaled.
This is where the AI Agent Audit Evidence Framework and the Governed Agent Pipeline for Regulated AI come into play. Without structured evidence, policy enforcement, and control, AI decisions cannot be explained or validated.
Managed infrastructure does not reduce responsibility — it increases it. As control at L1-L3 becomes abstracted, strong governance at L4-L7 becomes essential.
This is exactly what Context OS and Decision Infrastructure provide:
- Policy enforcement through Decision Boundaries
- Evidence through Decision Traces
- Control through the Authority Model
- Execution through a Governed Agent Runtime
Together, they form a complete Governed Agent Pipeline for Regulated AI.
Ultimately, layers define ownership, and pipelines define execution.
Without layers, governance breaks. Without pipelines, execution fails.
Before you ship AI agents into production, you must have both—because enterprise AI is not about what agents can do, but what they are allowed, controlled, and proven to have done.
Frequently asked questions
What is the AI agent layered architecture?
A seven-layer reference architecture for the full agentic stack in regulated enterprises: L1 (model), L2 (tool execution), L3 (orchestration) owned by the runtime vendor, and L4 (policy), L5 (evidence), L6 (control), L7 (business workflow) owned by the enterprise. The shared-responsibility boundary sits between L3 and L4.
What does Anthropic Managed Agents own in the stack?
L1 through L3: the model (Claude inference, alignment, safety), tool execution (MCP, sandbox, retrieval), and orchestration (agent loop, memory, lifecycle). These constitute the infrastructure harness that makes agents operationally viable.
What must the enterprise own?
L4 through L7: the policy plane (enforced rules, not system prompts), the evidence plane (audit-grade Decision Traces, not logs), the control plane (approval gates, kill switches, delegated authority), and the business workflow (intent, SLAs, named business owner). These constitute the governed harness for AI agents.
Does Anthropic Managed Agents include policy enforcement for HIPAA or EU AI Act?
No. Managed Agents provides infrastructure-level safety at L1-L3. Framework-specific policy enforcement (HIPAA minimum necessary, EU AI Act risk classification, DORA operational resilience) is enterprise-owned at L4 and cannot be standardised across customers.
How does this compare to AWS shared responsibility?
Conceptually identical. AWS owns the substrate; the customer owns the posture. Managed Agents owns L1-L3; the enterprise owns L4-L7. The boundary moves up the stack but the principle is unchanged: the vendor cannot own jurisdiction-specific compliance.
Where should policy checks live — in system prompts or outside the runtime?
Outside the runtime, always. System prompts are guidance the model can be argued out of. Policy enforcement at L4 is structural — it sits between the orchestration loop and the tool call, and a denial is a logged event the model cannot route around. This is the architectural role of Decision Infrastructure.
Why does hosting the runtime raise the bar on governance?
Because the enterprise has fewer hooks into the runtime itself and must do more enforcement at the boundary. Convenience at L1-L3 means the enterprise cannot rely on internal runtime modifications for governance — it must build governance structurally at L4-L7.
Why is confusing L3 for governance the most expensive mistake?
Because orchestration (L3) is where agent behaviour is visible, so teams assume it includes governance. It does not. Orchestration coordinates execution; governance enforces policy, produces evidence, and maintains human control. Without L4-L7, the enterprise has no governance — only execution.
How does the AI agent layered architecture relate to Context OS?
Context OS provides the architectural implementation of L4-L7: Decision Infrastructure for the policy plane, Decision Traces for the evidence plane, the Authority Model for the control plane, and the Governed Agent Runtime for business workflow execution. It is the governed operating system above any agent runtime.
Can this architecture work with runtimes other than Anthropic Managed Agents?
Yes. The layered architecture is runtime-agnostic. L4-L7 (the governed harness) sits above any L1-L3 infrastructure harness — whether Anthropic Managed Agents, LangChain, CrewAI, or custom frameworks. Context OS provides the governance layer regardless of the underlying runtime.