The Two-Dimensional Curvature
Newton’s method converges quadratically by using second-order curvature information — the Hessian matrix. But the Hessian is expensive: for n parameters, it is n×n, and solving the Newton system costs O(n³). For large-scale problems, this is prohibitive. First-order methods (gradient descent, proximal gradient) avoid the Hessian but converge slowly on ill-conditioned problems, zigzagging along narrow valleys because they have no curvature information to straighten the path.
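The trade-off can be seen on a tiny ill-conditioned quadratic (a toy problem chosen for illustration, not from the paper): gradient descent with the largest stable step size crawls along the flat direction, while a single Newton step, which requires solving the full linear system, lands on the minimizer exactly.

```python
import numpy as np

# Toy ill-conditioned quadratic f(x) = 0.5 * x @ A @ x, condition number 1000.
A = np.diag([1.0, 1000.0])
grad = lambda x: A @ x

# Gradient descent: step size ~ 1/L (L = largest eigenvalue) for stability,
# so progress along the small-eigenvalue direction is painfully slow.
x_gd = np.array([1.0, 1.0])
lr = 1.0 / 1000.0
for _ in range(1000):
    x_gd = x_gd - lr * grad(x_gd)

# Newton: one step, but it solves an n x n system (O(n^3) in general).
x0 = np.array([1.0, 1.0])
x_newton = x0 - np.linalg.solve(A, grad(x0))

print(np.linalg.norm(x_newton))  # exactly 0 for a quadratic
print(np.linalg.norm(x_gd))      # still far from the minimizer after 1000 steps
```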
Chen and Yang show that two dimensions of curvature suffice. Their method restricts the Newton subproblem to a 2D subspace spanned by the current preconditioned gradient direction and the momentum direction from the previous iteration. Within this tiny subspace, they solve the full proximal Newton problem — computing the Hessian projection and finding the exact minimizer. Outside this subspace, the method is first-order.
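A minimal sketch of the subspace idea, in the smooth (non-proximal) case and with illustrative names rather than the authors' exact algorithm: project the Hessian onto the two spanning directions using just two Hessian-vector products, solve the resulting 2×2 Newton system exactly, and map the step back to the full space.

```python
import numpy as np

def subspace_newton_step(grad_f, hvp, x, g, m):
    """One 2D-subspace Newton step (illustrative sketch).

    grad_f: gradient at x; hvp(v): Hessian-vector product H @ v;
    g: current gradient direction; m: momentum direction.
    Costs only two Hessian-vector products, never the full n x n Hessian.
    """
    V = np.column_stack([g, m])             # n x 2 basis of the subspace
    HV = np.column_stack([hvp(g), hvp(m)])  # two Hessian-vector products
    H2 = V.T @ HV                           # 2 x 2 projected Hessian
    g2 = V.T @ grad_f                       # projected gradient
    alpha = np.linalg.solve(H2, g2)         # exact 2D Newton solve, O(1)
    return x - V @ alpha                    # step back in the full space

# Toy quadratic check: with n = 2 the subspace is the whole space,
# so one step reaches the minimizer (the origin) exactly.
A = np.diag([1.0, 50.0])
x = np.array([1.0, 1.0])
g = A @ x
m = np.array([0.0, 1.0])  # stand-in for a momentum direction
x_new = subspace_newton_step(g, lambda v: A @ v, x, g, m)
print(np.linalg.norm(x_new))
```

For n in the millions, the per-step cost is two Hessian-vector products plus a 2×2 solve, which is why the method stays at nearly first-order cost.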
The choice of subspace is not arbitrary. The gradient direction captures the current steepest descent. The momentum direction captures the trajectory history — where the optimizer came from and how fast. Together, these two directions span the plane most relevant to the optimizer’s immediate future. Orthogonalizing the basis with respect to the Hessian-induced metric ensures the two directions contribute independent curvature information.
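Orthogonalizing in the Hessian-induced metric is a Gram–Schmidt step under the inner product ⟨u, v⟩_H = uᵀHv, making the two directions H-conjugate. A sketch, with illustrative names, assuming access to a Hessian-vector product:

```python
import numpy as np

def h_orthogonalize(g, m, hvp):
    """Make m H-conjugate to g: subtract m's H-projection onto g.

    hvp(v) returns the Hessian-vector product H @ v.
    Afterward g @ (H @ d2) == 0, so the two directions carry
    independent curvature information.
    """
    Hg = hvp(g)
    coef = (m @ Hg) / (g @ Hg)  # <m, g>_H / <g, g>_H
    d2 = m - coef * g
    return g, d2

A = np.diag([2.0, 10.0])
g = np.array([1.0, 1.0])
m = np.array([1.0, -0.5])
d1, d2 = h_orthogonalize(g, m, lambda v: A @ v)
print(d1 @ (A @ d2))  # H-inner product is now zero
```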
The result: global convergence, Q-linear convergence under strong convexity, and practical performance that matches full Newton on ill-conditioned problems at nearly first-order cost. The curvature that matters for convergence is concentrated in a remarkably low-dimensional subspace.
This is a statement about optimization landscapes, not algorithms. If two directions — current gradient plus historical momentum — capture enough curvature to match the full Hessian’s effect, then the effective dimensionality of the curvature relevant to convergence is two, regardless of the problem’s nominal dimension. The landscape may be n-dimensional, but the part of the landscape that the optimizer needs to understand at any given step lives in a plane.