The Ranking Impossibility
Standard online learning assumes numeric feedback: you choose an action, observe a utility value, update your strategy. Sublinear regret — performing nearly as well as the best fixed action in hindsight — is achievable under mild conditions.
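The numeric-feedback baseline can be made concrete with a standard multiplicative-weights (Hedge) learner. This is a generic sketch of that classical setting, not Liu et al.'s construction; the two-action drifting environment is invented for illustration.

```python
import math

def hedge(utilities, eta=0.1):
    """Hedge learner under full numeric feedback.

    utilities: list of per-round lists, one utility in [0, 1] per action.
    Returns (learner_total, best_fixed_total).
    """
    n = len(utilities[0])
    weights = [1.0] * n
    learner_total = 0.0
    for u in utilities:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected utility of the randomized play this round.
        learner_total += sum(p * ui for p, ui in zip(probs, u))
        # Numeric feedback: every weight is updated by its observed utility.
        weights = [w * math.exp(eta * ui) for w, ui in zip(weights, u)]
    best_fixed = max(sum(u[i] for u in utilities) for i in range(n))
    return learner_total, best_fixed

# A drifting two-action environment over T rounds.
T = 2000
seq = [[0.9, 0.1] if t < T // 2 else [0.2, 0.8] for t in range(T)]
got, best = hedge(seq)
# Hedge's regret is at most ln(n)/eta + eta*T/8, sublinear for tuned eta.
print(best - got)
```

With these parameters the regret bound works out to roughly 32 over 2000 rounds, i.e. a vanishing per-round gap against the best fixed action.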
Replace numeric feedback with ranking feedback, in which you observe only the ordering of the actions and not the margins between them, and sublinear regret becomes impossible.
Liu et al. prove this for instantaneous-utility rankings in general settings, and for deterministic rankings under time-average utility. The information lost in the compression from numbers to orderings is sufficient to make learning provably harder. Knowing “A was better than B” is categorically less useful than knowing “A scored 7.3, B scored 6.1.”
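The categorical gap can be seen in a toy construction (my own, for illustration; not the paper's proof): two utility sequences that produce identical per-round rankings yet have opposite best actions in hindsight. Any learner fed only the rankings plays identically in both worlds, so it must suffer linear regret in at least one of them.

```python
T = 1000
# Two worlds with identical per-round rankings but different margins.
world_a, world_b = [], []
for t in range(T):
    if t % 2 == 0:  # action 0 ranks first this round in both worlds
        world_a.append((1.0, 0.0))    # big margin for action 0
        world_b.append((0.51, 0.5))   # tiny margin for action 0
    else:           # action 1 ranks first this round in both worlds
        world_a.append((0.5, 0.51))   # tiny margin for action 1
        world_b.append((0.0, 1.0))    # big margin for action 1

def rankings(world):
    return [0 if u0 > u1 else 1 for (u0, u1) in world]

def best_fixed(world):
    totals = [sum(u[i] for u in world) for i in range(2)]
    return max(range(2), key=lambda i: totals[i])

# Identical feedback streams...
assert rankings(world_a) == rankings(world_b)
# ...but opposite best fixed actions in hindsight.
print(best_fixed(world_a), best_fixed(world_b))  # → 0 1
```

A ranking-only learner cannot distinguish the two worlds, yet the margins make action 0 dominant in one and action 1 dominant in the other; whichever mixture it settles on, one world inflicts a constant per-round loss.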
The impossibility is not absolute. When the utility sequence has bounded total variation — the environment changes only gradually — sublinear regret is recoverable. The structure of the environment compensates for the poverty of the feedback. Smooth environments can be learned from coarse signals; adversarial environments cannot.
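A toy simulation suggests why bounded variation helps. The "follow the last ranking winner" policy below is a stand-in chosen for illustration, not the paper's algorithm, and the sinusoidal drift is an invented slowly-varying environment.

```python
import math

def follow_the_winner(utilities):
    """Play whichever action ranked first last round (ranking feedback only)."""
    total, play = 0.0, 0
    for u in utilities:
        total += u[play]
        play = max(range(len(u)), key=lambda i: u[i])  # observed ranking winner
    return total

T = 5000
# Slowly drifting environment: per-round change is O(1/T), so the
# total variation of the utility sequence stays bounded as T grows.
seq = [(0.5 + 0.4 * math.sin(2 * math.pi * t / T),
        0.5 + 0.4 * math.cos(2 * math.pi * t / T)) for t in range(T)]
best = max(sum(u[i] for u in seq) for i in range(2))
got = follow_the_winner(seq)
print(best - got)
```

Because the environment moves slowly, last round's ranking winner is almost always this round's best action, and the learner tracks the drifting optimum from coarse signals alone; here it even beats the best fixed action.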
The game-theoretic connection survives: when multiple players each run these bounded-variation algorithms, their joint play converges to an approximate coarse correlated equilibrium. Equilibrium demands less information than optimal individual play; the collective outcome is achievable even when individual optimization is not.
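For reference, the standard notion being approximated: a distribution $\sigma$ over joint action profiles is an $\varepsilon$-coarse correlated equilibrium when no player can gain more than $\varepsilon$ by committing to a fixed deviation before the joint profile is drawn.

```latex
\mathbb{E}_{a \sim \sigma}\big[u_i(a)\big]
  \;\ge\;
\mathbb{E}_{a \sim \sigma}\big[u_i(a_i', a_{-i})\big] - \varepsilon
\qquad \text{for every player } i \text{ and every fixed action } a_i'.
```

The deviation $a_i'$ is chosen without seeing the draw, which is exactly why no-regret dynamics, even information-starved ones, suffice to reach it.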
The structural lesson: there is a threshold of feedback informativeness below which learning degrades categorically, not gradually. Rankings are below that threshold for general environments. The gap between ordinal and cardinal information is not a quantitative loss but a qualitative one.