"The Composed Danger"

The Composed Danger

Two AI agents, each individually verified as incapable of achieving a forbidden outcome, can collectively achieve that outcome when composed. Safety is non-compositional.

Spera (arXiv:2603.15973) proves this formally. The structural mechanism is “conjunctive capability dependence”: a capability that requires multiple steps, where no single agent can perform all steps, but each agent can perform a different subset. Agent A can do steps 1, 2, and 4 but not step 3. Agent B can do steps 3, 5, and 6 but not step 1. Neither alone can complete the sequence. Together, every step is covered.
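The mechanism can be sketched as sets of steps. This is an illustrative simplification (the set-based model and the `can_achieve` check are mine, not taken from the paper), using the step numbers from the example above:

```python
# Toy model of conjunctive capability dependence: each agent is the set of
# steps it can perform, and the forbidden outcome requires all six steps.
FORBIDDEN_OUTCOME = {1, 2, 3, 4, 5, 6}

agent_a = {1, 2, 4}   # verified: cannot perform steps 3, 5, or 6
agent_b = {3, 5, 6}   # verified: cannot perform steps 1, 2, or 4

def can_achieve(capabilities: set[int]) -> bool:
    """True if the capability set covers every required step."""
    return FORBIDDEN_OUTCOME <= capabilities

assert not can_achieve(agent_a)        # safe in isolation
assert not can_achieve(agent_b)        # safe in isolation
assert can_achieve(agent_a | agent_b)  # composed: every step is covered
```

The union operation is the whole story: each agent's verified gaps are exactly the other's strengths, so composing them closes every gap.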

The verification of each agent is not wrong — each genuinely cannot achieve the outcome in isolation. The safety guarantee is valid for each component. But the shape of each agent’s incapability is complementary, and composition fills the gaps. The specific way each agent fails is what makes the composed system succeed at the forbidden task.

This is not a bug in the verification process. It is a structural property of compositionality itself. Verifying components individually is necessary but insufficient, because the interaction between components creates capabilities that exist in neither. The capabilities are emergent in the strict sense: they arise from the relationship between agents, not from the properties of either.

The implication for AI safety is fundamental. You cannot certify a multi-agent system by certifying its parts. The parts are safe; the whole is not. And the number of possible compositions grows combinatorially with the number of agents: pairwise checks alone scale quadratically, and coalitions of arbitrary size scale exponentially, making exhaustive testing impractical. The safety of the system lives in the interaction graph, not in the nodes.
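A back-of-envelope count makes the combinatorial growth concrete. The counting model here is my own simplification (pairs versus all coalitions of two or more agents), not a result from the paper:

```python
from math import comb

def composition_counts(n_agents: int) -> tuple[int, int]:
    """Return (pairwise compositions, all coalitions of size >= 2)."""
    pairwise = comb(n_agents, 2)                # quadratic in n
    coalitions = 2**n_agents - n_agents - 1     # every subset minus singletons
    return pairwise, coalitions

# For 10 agents: 45 pairs, but 1013 possible coalitions.
print(composition_counts(10))
# For 100 agents, the pair count is still only 4950,
# while the coalition count exceeds 10**30.
```

Testing every pair is tractable; testing every coalition that might jointly cover a forbidden capability is not, which is why the danger lives in the interaction graph rather than in any enumerable list of checks.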

Each agent’s limitation — the verified inability — is precisely what makes it safe alone and dangerous in combination. The flaw in each component is the feature of the composed system.
