"The Reasonable Agent"
AI agents interacting repeatedly in strategic settings — pricing games, resource allocation, competitive markets — often fail to reach Nash equilibrium. Empirical studies document persistent cycling, collusion, and instability. The proposed fix is alignment: post-training to steer agents toward equilibrium behavior. But alignment requires coordinating across independently developed models, which is impractical at scale.
This paper proves coordination is unnecessary. Off-the-shelf reasoning agents converge to Nash-like play on their own, provably, under two conditions: they form beliefs about others’ strategies from observation, and they learn to best-respond to those beliefs. The authors call this “reasonably reasoning” — not optimal, not omniscient, just responsive to evidence and capable of approximate optimization.
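The two conditions above (beliefs formed from observation, then a best response to those beliefs) can be illustrated with classical fictitious play. This is a minimal sketch of that textbook procedure, not the paper's construction: the 2x2 coordination game, the uniform prior counts, the exact (rather than approximate) best response, and all function names are assumptions made for illustration.

```python
def best_response(payoffs_by_action, belief):
    """Index of the action with the highest expected payoff against a belief.

    payoffs_by_action[a][b] is this agent's payoff for its own action a when
    the opponent plays b. Ties go to the lowest index.
    """
    expected = [sum(p * q for p, q in zip(row, belief)) for row in payoffs_by_action]
    return max(range(len(expected)), key=lambda a: expected[a])


def fictitious_play(row_payoff, col_payoff, rounds=200):
    """Two agents repeatedly best-respond to the empirical play of the other."""
    n_row, n_col = len(row_payoff), len(row_payoff[0])
    # Column agent's payoffs indexed as [own action][opponent action].
    col_view = [[col_payoff[i][j] for i in range(n_row)] for j in range(n_col)]
    seen_col = [1] * n_col  # row agent's counts of column's actions (uniform prior)
    seen_row = [1] * n_row  # column agent's counts of row's actions (uniform prior)
    history = []
    for _ in range(rounds):
        # Beliefs are the empirical frequencies of the opponent's past actions.
        belief_col = [c / sum(seen_col) for c in seen_col]
        belief_row = [c / sum(seen_row) for c in seen_row]
        a = best_response(row_payoff, belief_col)
        b = best_response(col_view, belief_row)
        # Each agent updates its counts with what it just observed.
        seen_col[b] += 1
        seen_row[a] += 1
        history.append((a, b))
    return history


def is_pure_nash(row_payoff, col_payoff, a, b):
    """True if neither agent gains by deviating unilaterally from (a, b)."""
    row_ok = row_payoff[a][b] >= max(r[b] for r in row_payoff)
    col_ok = col_payoff[a][b] >= max(col_payoff[a])
    return row_ok and col_ok


# Hypothetical coordination game: both agents prefer to match, ideally on action 0.
A = [[2.0, 0.0], [0.0, 1.0]]
B = [[2.0, 0.0], [0.0, 1.0]]
history = fictitious_play(A, B)
```

In this toy game, `history[-1]` is `(0, 0)` and `is_pure_nash(A, B, 0, 0)` holds, so the realized play settles on an equilibrium without any coordination between the two agents. Fictitious play is known to cycle in some games, so the sketch only shows the belief-then-respond loop and says nothing about the paper's general guarantee.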
The convergence is on-path: along almost every realized play sequence, the agents’ behavior becomes weakly close to a Nash equilibrium of the continuation game. This is weaker than full convergence to equilibrium (the beliefs might not converge), but it’s exactly the relevant guarantee — what matters is that play looks equilibrium-like, not that internal models agree.
The result is robust in a further way. The common assumption that all agents know the payoff matrix can be dropped: each agent observes only its own stochastic payoffs, not the payoffs of others and not the game structure, and convergence still holds. The agents need only their own reward signal and their observations of others’ behavior.
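One way to picture an agent that works only from its own reward signal and its observations of others' behavior is a running-mean payoff estimate per joint action. The sketch below is an illustrative assumption, not the paper's procedure: the `PayoffEstimator` class, the Gaussian noise model, the uniform random exploration, and the hidden `TRUE_MEAN` table are all hypothetical.

```python
import random


class PayoffEstimator:
    """Running-mean estimate of one agent's own payoff for each joint action."""

    def __init__(self, n_own, n_other):
        self.mean = [[0.0] * n_other for _ in range(n_own)]
        self.count = [[0] * n_other for _ in range(n_own)]

    def update(self, own, other, reward):
        """Fold one noisy reward into the estimate for the pair (own, other)."""
        self.count[own][other] += 1
        n = self.count[own][other]
        self.mean[own][other] += (reward - self.mean[own][other]) / n

    def best_response(self, belief):
        """Best action against a belief, using estimated rather than true payoffs."""
        expected = [sum(m * q for m, q in zip(row, belief)) for row in self.mean]
        return max(range(len(expected)), key=lambda a: expected[a])


rng = random.Random(0)
# Hypothetical mean payoffs. The agent never reads this table directly;
# it only sees noisy samples of its own reward.
TRUE_MEAN = [[2.0, 0.0], [0.0, 1.0]]

est = PayoffEstimator(2, 2)
for _ in range(8000):
    # Assumed exploration phase: joint actions drawn uniformly at random.
    own, other = rng.randrange(2), rng.randrange(2)
    reward = TRUE_MEAN[own][other] + rng.gauss(0.0, 1.0)
    est.update(own, other, reward)
```

After enough samples, `est.mean` approximates the hidden table, so `est.best_response(belief)` can stand in for a best response computed from a known payoff matrix.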
The practical implication: in many strategic settings, the problem of multi-agent alignment may not exist. Agents that reason about others and optimize their own payoffs converge to stable play as an emergent property of reasonable reasoning, without explicit coordination or post-training alignment.