"The Indifferent Learner"
Human languages exhibit systematic constraints. No natural language forms questions by reversing the entire sentence. No language requires counting to seven before applying a grammatical rule. These impossibilities led Chomsky to hypothesize an innate Universal Grammar — a set of biological constraints that makes only certain language structures learnable.
If language models learned language the way humans do, they should struggle with impossible languages. They don’t. GPT-2 learns humanly possible and humanly impossible languages with equal facility. Languages that violate every proposed universal — that form questions by string reversal, that require unbounded counting, that use rules no human child could acquire — are absorbed by the model as easily as English.
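To make the perturbations concrete, here is a minimal Python sketch of the two rule types named above: question formation by reversing the whole string, and a grammatical rule that depends on counting. The function names, the marker token, and the toy sentence are illustrative assumptions, not drawn from any particular experiment.

```python
# A minimal sketch of how "impossible language" corpora can be built by
# applying deterministic perturbations to ordinary sentences. The specific
# functions and the marker token are illustrative, not from any study.

def reverse_question(tokens: list[str]) -> list[str]:
    """Form a 'question' by reversing the entire token string,
    a rule no natural language uses."""
    return tokens[::-1]

def count_rule(tokens: list[str], n: int = 7, marker: str = "ka") -> list[str]:
    """Insert a grammatical marker after every n-th token, a rule that
    requires counting, which human grammars never do."""
    out = []
    for i, tok in enumerate(tokens, start=1):
        out.append(tok)
        if i % n == 0:
            out.append(marker)
    return out

sentence = "the cat that the dog chased ran away".split()
print(" ".join(reverse_question(sentence)))
print(" ".join(count_rule(sentence, n=3)))
```

Either transformation yields a perfectly consistent string language with plenty of statistical regularity to absorb; it is only for the human learner that the rules are out of reach.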
The finding demolishes a specific claim: that successful language learning is evidence of human-like linguistic biases in the learner. The model is fluent without being constrained. It produces grammatically perfect output in impossible grammars. The learning mechanism has no preference for the structures that bound human cognition — it is indifferent to the boundary between the possible and the impossible.
The through-claim is about what fluency proves. Fluency in a domain is often taken as evidence that the learner has internalized the domain’s constraints — that the ability to perform implies understanding of the rules. But fluency via pattern completion requires only statistical regularity, not structural understanding. A machine that can speak every language, including the impossible ones, has not learned what language is. It has learned what sequences are frequent, and the impossible languages have sequences too.
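The point shows up even in a toy estimator. The sketch below, an add-one-smoothed bigram model over a made-up three-sentence corpus (both the estimator and the corpus are assumptions for illustration), fits a sentence-reversed corpus about as well as the natural one, because reversal preserves which token pairs co-occur. The frequency statistics survive the very perturbation that makes the language humanly impossible.

```python
# A minimal sketch: pattern completion needs only statistical regularity.
# A bigram model fits a reversed corpus about as well as the original,
# because reversal preserves token co-occurrence. Toy data, illustrative only.
import math
from collections import Counter

def bigram_logprob(corpus: list[list[str]]) -> float:
    """Average per-token log-probability under an add-one-smoothed bigram
    model fit to the same corpus (training fit, not held-out evaluation)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    vocab = len({t for s in corpus for t in s}) + 1
    total, logp = 0, 0.0
    for sent in corpus:
        toks = ["<s>"] + sent
        for prev, cur in zip(toks[:-1], toks[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
            logp += math.log(p)
            total += 1
    return logp / total

corpus = [s.split() for s in [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the cat ran away",
]]
reversed_corpus = [s[::-1] for s in corpus]   # the "impossible" version

print("natural :", bigram_logprob(corpus))
print("reversed:", bigram_logprob(reversed_corpus))
```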
This severs an old link between performance and competence. The competent speaker cannot produce the impossible sentence. The performing model can produce anything. Competence is performance plus refusal. The constraints are the knowledge, not the output.