"The Captured Algorithm"
The Captured Algorithm
Transformers are universal approximators. Given enough parameters, they can approximate any computable function to arbitrary accuracy. This is the expressivity result that underwrites their deployment across every domain.
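A minimal sketch of what universal approximation means in practice, using a one-hidden-layer ReLU network rather than a transformer (the choice of target function, feature count, and fitting by least squares are all illustrative assumptions, not anything from the paper): scatter random ReLU features over an interval and fit only the output weights, and a smooth target is matched to high accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2 * np.pi, 500)
y = np.sin(x)  # target: any continuous function on the interval would do

# 300 random ReLU features, breakpoints scattered across [0, 2*pi]
b = rng.uniform(-2 * np.pi, 0.0, size=300)
features = np.maximum(x[:, None] + b[None, :], 0.0)
features = np.column_stack([features, x, np.ones_like(x)])  # plus linear + bias terms

# Only the output layer is fit, by least squares
w, *_ = np.linalg.lstsq(features, y, rcond=None)
err = np.max(np.abs(features @ w - y))
print(f"max approximation error: {err:.2e}")
assert err < 1e-2
```

The point of the sketch is only the hypothesis-space claim: enough units make the function representable. It says nothing yet about whether gradient descent would find those weights, which is exactly the gap the rest of the section develops.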
The universality is real but misleading. The paper shows that while transformers can express any algorithm in principle, their gradient-based training is captured by a specific complexity class: EPTHS (extended polynomial-time hierarchy with sparse access). Algorithms outside this class are expressible but unreachable — they exist in the weight space but live in regions that gradient descent cannot find from typical initializations.
This is algorithmic capture: the training dynamics select a subset of the expressible functions, and that subset has a definite complexity-theoretic characterization. The inductive bias of transformers is not soft (a preference for smooth functions) or architectural (a preference for certain input patterns) but computational (a preference for algorithms of bounded complexity).
The consequence is a gap between what the model can represent and what it can learn. A transformer with enough layers can represent an algorithm that solves an NP-hard problem. But gradient-based training will not find this representation — it will find a polynomial-time approximation, or nothing. The expressivity ceiling is set by architecture. The learning ceiling is set by optimization. The two ceilings are at different heights, and the lower one is what matters in practice.
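A concrete way to see "representable but not found by training" is the classic parity example (my illustration, not the paper's construction; parity is not NP-hard, but it is a standard stand-in for functions that gradient-based learning struggles to find). The weights below compute n-bit parity exactly with a single ReLU hidden layer, so the representation demonstrably exists in weight space, whether or not gradient descent from a random initialization would ever reach it.

```python
import numpy as np

n = 8
# Hidden unit k computes relu(sum(x) - k): all-ones input weights, bias -k
W1 = np.ones((n, n))
b1 = -np.arange(n, dtype=float)
# Output weights 1, -2, +2, -2, ... fold the ramp into a triangle wave,
# which equals parity(x) at every integer bit-count
w2 = np.array([1.0] + [2.0 * (-1) ** k for k in range(1, n)])

def net(x):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    return w2 @ h

# Verify exact parity on all 2^n binary inputs
for i in range(2 ** n):
    x = np.array([(i >> j) & 1 for j in range(n)], dtype=float)
    assert net(x) == x.sum() % 2
print(f"exact parity on all {2 ** n} inputs")
```

The construction is hand-built, not learned: it certifies the expressivity ceiling, while the learning ceiling is a separate question about the optimizer.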
The structural lesson: inductive bias is a complexity-theoretic concept, not merely a statistical one. When we say a model “prefers” certain functions, we mean the training dynamics can reach them. The dynamics have a computational complexity, and that complexity determines which functions are learnable. Universal expressivity is a statement about the hypothesis space. Algorithmic capture is a statement about the search algorithm. The practical power of the model is the intersection — functions that are both expressible and findable.