"The Universal Feedback"
The linear-quadratic regulator (LQR) is solved. Given a system’s dynamics matrices and a cost function, the optimal feedback law is computable in closed form via the Riccati equation. There’s nothing left to learn — for any single system.
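For a single discrete-time system, "computable in closed form" is a short calculation: iterate the Riccati recursion to a fixed point and read off the gain. A minimal NumPy sketch (the double-integrator plant and the cost weights here are illustrative, not from the paper):

```python
import numpy as np

def dlqr(A, B, Q, R, iters=1000, tol=1e-12):
    """Discrete-time LQR: iterate the Riccati recursion to a fixed point.
    Returns the feedback gain K (so u = -K x) and the cost-to-go matrix P."""
    P = Q.copy()
    for _ in range(iters):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati step: P <- Q + A' P A - A' P B K
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Example plant: a double integrator discretized at dt = 0.1.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)   # state cost
R = np.eye(1)   # control cost
K, P = dlqr(A, B, Q, R)
# The closed loop A - B K is stable: spectral radius below 1.
print(np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

The point of the post is that this computation needs A, B, Q, R in hand; everything below is about what happens when they aren't.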
But here’s the question: can a single controller learn the family of optimal feedback laws across many different systems? Not one system’s optimal control, but the mapping from system parameters to optimal actions?
The answer (arXiv:2603.14910) is yes. A transformer trained on LQR-generated trajectories from systems with varying dimensions, dynamics, and cost functions learns a controller that achieves near-optimal performance across the family — without seeing the plant matrices at inference time. It takes state history as input and produces control actions, and those actions are empirically close to what the Riccati equation would prescribe for each specific system.
The counterintuitive element: the optimal control law for each system is already known. The transformer isn’t solving a problem that lacks a solution — it’s learning to recognize which solution applies by observing behavior alone. The plant matrices are never provided. The transformer infers the system’s identity from how it responds to inputs, then applies the corresponding optimal policy.
This inverts the usual motivation for learned controllers. You don’t learn because you can’t solve — you learn because solving requires knowing things you might not have at decision time.