Boundary Enforcement¶

Coadaptive Layer · Chapter 04

This chapter extends: SF² Process Stewardship (Section 02), SF² Implementation Guides (Section 06). Scope: capability-based security as the substrate-layer answer when code review doesn't scale.

Code review was the security control that assumed a human could read what shipped. That assumption is gone. When generation outruns comprehension, inspecting the code more carefully is a losing race against a faster machine. The architectural response is to stop relying on inspection and start relying on boundaries: capability-based security enforced by infrastructure, so the question shifts from whether someone read the change to whether the system was ever able to do the dangerous thing in the first place. This is the move away from meat-gated security and toward paved roads.

Why code review doesn't scale to AI velocity¶

The bandwidth problem from Chapter 02 lands here as direct operational pressure. If generation outruns comprehension, as that chapter lays out, then human review is the bottleneck and the gap only widens. You can hire more reviewers and lose anyway, because the generation side scales with compute and the review side scales with headcount. Any control whose throughput is capped by human reading speed is a control that AI velocity has already outrun.

The trap is responding by demanding more review. That makes security the thing standing between the team and shipping, which is the failure mode the rest of this chapter names. The way out is to change what review is for: not gating every change, but designing the boundaries within which any change, reviewed or not, is safe to run.

Capabilities over code inspection¶

Capability-based security is an old and well-founded tradition. Saltzer and Schroeder set out the principles in The Protection of Information in Computer Systems (1975), least privilege chief among them, and systems from EROS to FreeBSD's Capsicum carried the model forward. It is not a research curiosity; the same model runs in production today. A component holds explicit, narrow authority to do specific things and holds nothing else, so the boundary is the load-bearing surface rather than the line of code. A component that has no capability to exfiltrate data cannot be talked into exfiltrating data, no matter what an attacker writes into its input.

That property is what makes capabilities the right substrate for AI-era systems. You cannot reliably predict what a generated component or a reasoning agent will try to do. You can decide, in advance and at the infrastructure layer, what it is able to do. The authority question and the confused-deputy problem that rides on it are taken up in Authorization at Agent Scale; this chapter's claim is narrower and prior: enforce authority at the boundary, because inspecting behavior does not scale and bounding capability does.

The case that shows this most cleanly is the dependency you cannot see into. When a provider you run is itself operating sub-providers you never contracted, the operator beneath the operator, there is nothing to inspect and no one to certify two layers down. What you can still do is bound what the whole composition is able to reach and spend, so a failure at an uncontracted link is contained by the authority you granted rather than by trust in scoping you never saw.

In deployed form, that boundary is increasingly a gateway every model call and agent hop routes through, where an organization allows or denies on the traffic it can see. Constrain the network and compute the people and agents work under so they reach a model only through that chokepoint, and the boundary becomes one the organization actually holds. It governs only what crosses it; what an undisclosed sub-operator already holds stays a Third-Party residual, carried by contract rather than by the gateway.

This is clean when the component is narrow. A function that reads one table, a service that writes one queue, can be handed exactly that authority and nothing more. The harder case, and the one that matters most for AI-era systems, is the general agent. Its value is that it can do many things, so a model that says "grant the narrowest authority and withhold the rest" looks self-defeating: withhold enough to make the agent safe and you have made it useless. This is the strongest objection to boundary enforcement at agent scale, and it has to be answered rather than waved away.

The answer is to separate what an agent could do from what any single request is permitted to do. The agent holds broad latent capability, the capacity to act across many tools and surfaces. Each request runs under an attenuated capability that names exactly what this action may do and can only be narrowed further, never widened. Breadth stays available, so the agent is still useful. Authority is scoped per request, so a compromised or confused agent spends only what this task was handed, never the full reach of the agent behind it. This is attenuation, and it is already how production capability systems scope delegated authority (see references). Authorization at Agent Scale carries it down the delegation chain, where any holder can add a constraint that narrows authority and none can widen it.

Scoping authority is one invariant. It is not the only one an agent needs, and it does not answer prompt injection on its own. A scoped capability bounds what an agent may touch; it does not stop untrusted data the agent reads from redirecting what the agent decides to do with that reach. Keeping control flow intact under untrusted input is a separate property, and it is enforceable by construction: a privileged model that plans but never reads attacker-controlled data, a quarantined model that reads that data but holds no authority to act, and an interpreter that enforces policy on the flow between them. Google DeepMind's CaMeL is the working instance, and it answers the natural objection that scoped credentials already contain the agent. They do not. A credential narrows reach; it does not keep injected text out of the plan.

The cost is structural and worth stating plainly. The split runs two models where one ran before, the planning model re-queries itself to produce interpreter code that runs clean, and the quarantined model has to read every untrusted artifact, so the approach roughly doubles the model calls a task makes and spends materially more tokens by construction. A workflow that has the quarantined model read many artifacts, a mailbox of messages for instance, can add seconds of latency on top. That price buys strong containment where it is affordable and prices the approach out where it is not, the low-latency interactive paths where a person is waiting on the answer. Containment by construction is not free, and where it is too expensive the boundary has to be held another way.

Meat-gated security at agent scale¶

Meat-gated security is any control that depends on a human standing in the path of the work. A person approves the deploy, signs off on the access, eyeballs the diff. At human authorship speed, that was tolerable. At agent scale it becomes the bottleneck the system was supposed to remove, and worse, a bottleneck that quietly rubber-stamps because the human cannot actually evaluate the volume flowing past them. A queue that approves everything because it has no time to reject anything is theater with a person in it rather than a control.

Paved roads are the affirmative pattern. Build the safe path so it is also the easy path, enforce the boundaries in the infrastructure that path runs on, and let builders move at speed inside it without a human gate on every step. Reserve human authority for the narrow set of actions whose downside is catastrophic, where the judgment is worth the latency. The goal is to spend human attention where it changes the outcome rather than remove humans from security, letting the boundary hold everywhere else.

One caution closes the case. Boundaries are not the whole of security, and they do not retire monitoring, detection, or response. For a broad agent, the dynamic layer above the boundary still does real work. The boundary is the irreplaceable floor: the layer the others cannot substitute for once you assume the agent is compromised. Detection can miss and a reviewer can wave a change through, but a capability the system never granted is one the agent cannot spend. That claim is narrower than "boundaries are the answer," and far harder to argue away. That floor holds as a property you get by construction for as long as the agent cannot reach the layer that grants it; when it can, the floor narrows to a contained, verified core rather than disappearing. This is the time axis on the guarantee, taken up in the three-layer model.

Defender cost economics¶

The Adversary Economics criterion prices a control from the attacker's side, scoring it by the surface it closes. This principle prices the same control from the defender's side. Every mitigation has a cost, and money is the smallest part of its price. A control also spends latency, throughput, and developer friction, and the one most often left off the books is performance. So the discipline is a stopping rule: stop adding controls once the next one costs more than the risk it retires. Before you trust that call, stress it. Halve your loss estimate and confirm the decision still holds. Past that point, accepting the residual risk you have priced is the decision rather than the failure.

The stopping rule rests on two claims, and each is checked differently, because a latency cost and a risk estimate are not the same kind of number. A performance cost is load-tested: a measured p99, taken at the tail under representative load, that either breaches a stated SLO or clears it. A risk estimate is sensitivity-tested: a guess you stress by halving it, kept only if the decision survives. The risk side is the gameable one, because the loss figure is an input a reader can inflate until any control looks worth it. Either number counts only once it survives the test that fits it: a load test for the latency, the halving test for the risk.

Performance enters as a veto that fires at design time, before any load test exists to confirm the cost. The veto has to name a specific, testable claim, for instance that a control prices the agent out of the synchronous low-latency path. It then carries a measurement obligation: produce the p99 under load before general availability, or it converts to a risk acceptance someone signs by name. But the burden sits with the veto. With no measured artifact the objection fails, and the control goes back to clearing its own security bar. Nothing more. CaMeL, the control-flow-integrity design above, is the calibrating example. It is a strong prompt-injection defense, but its dual-model split roughly doubles the model calls, which prices it out of synchronous low-latency paths: sound where that latency is affordable, declined on the record where it is not. This burden-of-proof test is also self-auditing. If a year of performance vetoes cites no load-test artifact, the instrument is theater, and the principle collapses to its risk-side clause.