The Functorial Test
Compositional generalization — the ability to understand novel combinations of familiar parts — is one of the most persistent failures of neural networks. Models that learn “red circle” and “blue square” cannot reliably generalize to “red square” without explicit training. The failure is well-documented but poorly understood at the structural level.
arXiv:2603.16123 proves that compositional generalization is equivalent to functoriality of the decoder. A decoder generalizes compositionally if and only if it preserves the compositional structure of the input — mapping composed inputs to composed outputs in a structure-preserving way. This is precisely what a functor does in category theory.
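The functoriality condition can be made concrete with a toy sketch (all names and the composition operations here are hypothetical illustrations, not the paper's construction): a decoder D is functorial when decoding a composed input equals composing the decoded parts, D(x ⊗ y) = D(x) ⊗ D(y).

```python
# Hypothetical toy setup: inputs are tuples of tokens, outputs are strings.

def compose_inputs(x, y):
    # Input-side composition: tuple concatenation of token sequences.
    return x + y

def compose_outputs(a, b):
    # Output-side composition: string concatenation.
    return a + " " + b

# A made-up token-to-output lexicon for illustration.
LEXICON = {"red": "ROT", "blue": "BLAU", "circle": "KREIS", "square": "QUADRAT"}

def decoder(tokens):
    # Decodes each part independently; trivially structure-preserving.
    return " ".join(LEXICON[t] for t in tokens)

def is_functorial(decode, x, y):
    # The functoriality check: decode-then-compose == compose-then-decode.
    return decode(compose_inputs(x, y)) == compose_outputs(decode(x), decode(y))

print(is_functorial(decoder, ("red",), ("square",)))  # True
```

The check is a property of the map itself, not of any particular training set, which is what makes the condition architectural rather than statistical.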
The proof has a sharp negative consequence: softmax self-attention is not functorial for any non-trivial compositional task. The attention mechanism, by design, mixes all positions with data-dependent weights. This mixing violates the structure-preservation that functoriality requires. Attention can learn to approximate compositional behavior on trained examples, but it cannot guarantee it on novel compositions. The failure to generalize compositionally is architectural, not a matter of insufficient training data.
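The structural obstruction can be seen numerically in a minimal sketch (random weights, hypothetical embeddings): under softmax self-attention, the representation of the same first part changes depending on what it is composed with, so the layer cannot factor through the parts.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X):
    # Standard scaled dot-product self-attention over all positions.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return softmax(Q @ K.T / np.sqrt(d)) @ V

x  = rng.standard_normal((1, d))   # stand-in embedding for "red"
y1 = rng.standard_normal((1, d))   # stand-in embedding for "circle"
y2 = rng.standard_normal((1, d))   # stand-in embedding for "square"

out1 = attention(np.vstack([x, y1]))
out2 = attention(np.vstack([x, y2]))

# Position 0 holds the *same* part in both inputs, yet its output
# representation differs: the data-dependent mixing breaks the
# structure-preservation that functoriality requires.
print(np.allclose(out1[0], out2[0]))  # False
```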
The positive consequence: decoders built by structural concatenation — assembling outputs from parts using the same operations used to assemble inputs — are strict monoidal functors. They are compositional by construction. In practice, these architectures achieve 2-10x error reduction on compositional tasks compared to attention-based alternatives.
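A minimal sketch of the positive case, under the same hypothetical lexicon-style setup as above: each part is decoded independently by a shared map, and outputs are assembled with the same concatenation used to assemble inputs, so unseen combinations come for free.

```python
PART_MAP = {}  # per-part translations, filled in by "training"

def train(pairs):
    # "Training" here just records how each individual part is decoded.
    for src, tgt in pairs:
        PART_MAP[src] = tgt

def decode(parts):
    # Decode each part independently, then concatenate: a strict
    # monoidal functor on tuples of parts, compositional by construction.
    return tuple(PART_MAP[p] for p in parts)

train([("red", "ROT"), ("circle", "KREIS"),
       ("blue", "BLAU"), ("square", "QUADRAT")])

# The combination ("red", "square") never appeared during training,
# yet it decodes correctly, because generalization to novel
# compositions is guaranteed by the architecture, not learned.
print(decode(("red", "square")))  # ('ROT', 'QUADRAT')
```

The design choice doing the work is that the output is assembled by the same operation (concatenation) as the input, which is exactly the strict monoidal structure the theorem asks for.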
The category-theoretic equivalence transforms a vague ML desideratum (“the model should generalize compositionally”) into a precise mathematical condition (“the decoder must be a functor”). The condition is checkable: given an architecture, you can determine mathematically whether it is capable of compositional generalization before training it. The impossibility result for attention and the constructive result for concatenation follow from the same theorem.