The Penalty Escape
Gradient descent gets stuck at saddle points. In high-dimensional optimization, saddle points vastly outnumber local minima — most critical points are saddles — and gradient descent has no mechanism to escape them. The gradient vanishes, the descent stalls, and the algorithm lingers in a region that is neither minimum nor maximum.
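The stall is easy to reproduce. A minimal sketch (NumPy only; the function and step size are illustrative choices, not from the paper) runs plain gradient descent on f(x, y) = x² − y², which has a strict saddle at the origin. Started on the saddle's stable manifold (y = 0), the iterates converge to the saddle and stop:

```python
import numpy as np

def f(p):                    # f(x, y) = x^2 - y^2: strict saddle at the origin
    x, y = p
    return x**2 - y**2

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])

p = np.array([1.0, 0.0])     # start on the saddle's stable manifold (y = 0)
for _ in range(200):
    p = p - 0.1 * grad(p)    # plain gradient descent

# The x-coordinate contracts by a factor 0.8 per step and y never moves:
# the iterate converges to the saddle (0, 0), where the gradient vanishes.
```

The point is not that this starting point is typical, but that gradient descent has no term that reacts to the negative curvature sitting right under it.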
The standard fixes each have costs. Stochastic perturbation escapes saddle points with high probability but offers no deterministic guarantee on escape time. Newton methods use curvature information from the Hessian, but plain Newton treats every critical point alike and is attracted to saddles rather than repelled by them, and it breaks down entirely when the Hessian is singular or nearly so.
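The stochastic fix can be sketched in a few lines (the noise scale and threshold here are illustrative assumptions): whenever the gradient nearly vanishes, inject a small random kick. Any component of the kick along the unstable direction then grows geometrically, so escape is almost sure, but when it happens depends on the random draw:

```python
import numpy as np

def grad(p):                   # gradient of f(x, y) = x^2 - y^2
    x, y = p
    return np.array([2 * x, -2 * y])

rng = np.random.default_rng(0)
p = np.array([1.0, 0.0])       # on the stable manifold: unperturbed GD stalls
for _ in range(300):
    if np.linalg.norm(grad(p)) < 1e-3:        # gradient nearly vanished:
        p = p + rng.normal(0.0, 1e-3, size=2) # inject a small random kick
    p = p - 0.1 * grad(p)

# The kick's y-component, however small, grows by a factor 1.2 per step,
# so the iterate eventually leaves the saddle region.
```

The escape is probabilistic in exactly the sense the paragraph above criticizes: nothing bounds how small the kick's unstable component happens to be.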
Mudrik, Kaminer, Kragelund, and Clark (arXiv:2603.15606) add a penalty on the most negative Hessian eigenvalue to the objective function. This Curvature-Regularized Gradient Dynamics (CRGD) transforms the augmented cost into a Lyapunov function with user-selectable convergence rates to second-order stationary points. The escape is deterministic and its speed is tunable.
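A hedged sketch of the idea follows; the test function, step sizes, and penalty form are illustrative assumptions, not the paper's exact dynamics. Each iteration takes the usual gradient step, then adds a step along the eigenvector of the most negative Hessian eigenvalue, scaled by the penalty term max(0, −λ_min). On a function with a strict saddle at the origin and minima at (0, ±√2), the iteration escapes and reaches a minimum even when started on the stable manifold, where plain gradient descent stalls:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 - y**2 + 0.25 * y**4   # strict saddle at (0, 0), minima at (0, ±sqrt(2))

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y + y**3])

def hess(p):
    x, y = p
    return np.array([[2.0, 0.0], [0.0, -2.0 + 3 * y**2]])

def crgd_step(p, eta=0.05, beta=0.1):
    """One CRGD-style step (a sketch, not the paper's exact dynamics):
    a gradient step, plus a step along the eigenvector of the most negative
    Hessian eigenvalue, scaled by the penalty max(0, -lam_min)."""
    w, V = np.linalg.eigh(hess(p))       # eigenvalues in ascending order
    lam_min, v_min = w[0], V[:, 0]
    p = p - eta * grad(p)
    pen = max(0.0, -lam_min)
    if pen > 0:
        s = beta * pen * v_min
        p = p + s if f(p + s) <= f(p - s) else p - s   # descend along +/- v_min
    return p

p = np.array([1.0, 0.0])                 # on the stable manifold of the saddle
for _ in range(500):
    p = crgd_step(p)
```

Note the escape term is deterministic: its size is fixed by the local curvature, not by a random draw, which is the property the paper's Lyapunov analysis exploits.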
The counterintuitive result: escape time decreases as the eigenvalue gap increases. For gradient descent a wider gap makes the saddle harder, not easier: the large positive eigenvalue caps the stable step size, so progress along the weakly negative escape direction slows to a crawl. In CRGD, the penalty targets the most negative eigenvalue directly, so a sharper saddle produces a stronger penalty and a faster escape. The difficulty metric is inverted.
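The CRGD side of the inversion is easy to check numerically. A sketch under the same assumed dynamics as above (not the paper's experiment): on f(x, y) = x² − a·y², the most negative eigenvalue is −2a, so increasing a both widens the eigenvalue gap and strengthens the penalty, and the measured escape time shrinks as a grows:

```python
import numpy as np

def escape_steps(a, eta=0.01, beta=0.05, max_steps=10_000):
    """Steps for a CRGD-style iteration to escape the saddle of
    f(x, y) = x^2 - a*y^2 (i.e. to reach |y| > 1), starting on the stable
    manifold where plain gradient descent never escapes.  The escape push
    is scaled by the penalty max(0, -lam_min) = 2a.  A sketch, not the
    paper's exact dynamics."""
    p = np.array([1.0, 0.0])
    v_min = np.array([0.0, 1.0])     # eigenvector of lam_min = -2a
    for t in range(max_steps):
        g = np.array([2 * p[0], -2 * a * p[1]])
        p = p - eta * g + beta * (2 * a) * v_min   # penalty-scaled push
        if abs(p[1]) > 1.0:
            return t + 1
    return max_steps

# Sharper saddle (larger a, wider gap) => stronger penalty => fewer steps.
```

Running `escape_steps` for a = 0.5, 1.0, 4.0 gives a strictly decreasing step count, matching the claimed inversion.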
The deeper principle: gradient descent fails at saddle points not because it lacks information but because it has too much freedom. It can move in any direction but has no preference among directions with zero gradient. The curvature penalty removes the freedom to linger in saddle directions by making lingering costly. The constraint is the capability. Adding a penalty creates escape that unconstrained descent cannot achieve.