The Wider Failure

Physics-Informed Neural Networks (PINNs) encode differential equations directly into the loss function, training networks to satisfy both data constraints and physical laws simultaneously. The promise: increase the network's width (more neurons per layer), and it should learn harder equations. More parameters mean more expressive power; more expressive power means better approximation of complex solutions.
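The two-term structure of a physics-informed loss can be sketched in a few lines. The example below is illustrative, not the paper's setup: it uses a simple ODE, u'(x) = -u(x) with u(0) = 1, candidate functions stand in for the network, and derivatives come from central finite differences rather than automatic differentiation.

```python
import numpy as np

# Sketch of a physics-informed loss for u'(x) = -u(x), u(0) = 1
# (exact solution u = exp(-x)). A real PINN would differentiate a
# neural network with autodiff; here finite differences on plain
# candidate functions keep the sketch dependency-free.

def pinn_loss(u, x, h=1e-4):
    du = (u(x + h) - u(x - h)) / (2 * h)   # approximate u'(x)
    physics = np.mean((du + u(x)) ** 2)    # residual of u' + u = 0 at collocation points
    data = (u(0.0) - 1.0) ** 2             # initial-condition (data) term
    return data + physics

x = np.linspace(0.0, 2.0, 100)                    # collocation points
loss_exact = pinn_loss(lambda t: np.exp(-t), x)   # satisfies the ODE
loss_wrong = pinn_loss(lambda t: 1.0 - t, x)      # matches u(0) only
print(loss_exact, loss_wrong)
```

The exact solution drives both terms to near zero, while the wrong candidate satisfies the data term but accumulates a large physics residual; training a PINN means descending this combined loss.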

Chaudhry (arXiv:2603.12556) shows that wider single-layer PINNs perform worse on nonlinear PDEs. Not marginally worse — qualitatively worse. The approximation error increases with width, the opposite of what approximation theory predicts.

The bottleneck is optimization, not representation. A wider network can, in principle, represent the solution to any smooth PDE to arbitrary accuracy. But the training algorithm — gradient descent on the physics-informed loss — can’t find that representation. Spectral bias causes the network to learn low-frequency components first and struggle with high-frequency components. Nonlinear PDEs generate high-frequency structure through nonlinear interactions (shock waves, sharp gradients, turbulent cascades). The network needs these components but can’t learn them efficiently.
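Spectral bias is easy to reproduce in a lazy-training sketch, where a single hidden layer is frozen at random values and only the linear readout is trained by gradient descent. Everything below (widths, weight scales, step counts) is an illustrative assumption, not the paper's configuration: the model is asked to fit a target with one low-frequency and one high-frequency component, and the residual amplitude at each frequency is measured after training.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 400
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + np.sin(20 * np.pi * x)  # low + high frequency target

# Single hidden layer with frozen random weights; only the readout
# vector `a` is trained (a lazy-training stand-in for a PINN backbone).
W = rng.normal(0.0, 4.0, size=m)
b = rng.uniform(-4.0, 4.0, size=m)
Phi = np.tanh(np.outer(x, W) + b)                   # (n, m) feature matrix

a = np.zeros(m)
lr = 1e-3
for _ in range(3000):
    r = Phi @ a - y                                 # residual
    a -= lr * (Phi.T @ r) / n                       # gradient step on MSE

resid = y - Phi @ a
# Remaining amplitude of each frequency component in the residual
c_low = abs(resid @ np.sin(2 * np.pi * x)) * 2 / n
c_high = abs(resid @ np.sin(20 * np.pi * x)) * 2 / n
print(c_low, c_high)
```

The low-frequency component is fit almost immediately while most of the high-frequency amplitude survives training, mirroring the ordering the paragraph describes: the modes a nonlinear PDE needs most are exactly the ones gradient descent learns slowest.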

Width amplifies the problem rather than solving it. More neurons create more parameters to optimize in the high-frequency regime where gradient information is weakest. The loss landscape becomes flatter in the high-frequency directions — more parameters exploring a plateau rather than fewer parameters climbing a gradient. The optimization difficulty grows faster than the representational capacity.
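One way to see why width does not rescue the high-frequency modes is through the kernel the random features induce: a mode's training speed is governed by its alignment with that kernel, and averaging over more features does not raise the high-frequency alignment. The sketch below (same illustrative lazy-training model as above, not the paper's analysis) compares the Rayleigh quotient of a low- and a high-frequency mode under a narrow and a wide feature kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
x = np.linspace(0.0, 1.0, n)

low = np.sin(2 * np.pi * x)
low /= np.linalg.norm(low)
high = np.sin(20 * np.pi * x)
high /= np.linalg.norm(high)

def alignments(m):
    """Rayleigh quotients of the two modes under the width-m
    random-feature kernel K = Phi Phi^T / m (an NTK-style sketch)."""
    W = rng.normal(0.0, 4.0, size=m)
    b = rng.uniform(-4.0, 4.0, size=m)
    Phi = np.tanh(np.outer(x, W) + b)
    K = Phi @ Phi.T / m
    return low @ K @ low, high @ K @ high

a_low_narrow, a_high_narrow = alignments(50)
a_low_wide, a_high_wide = alignments(2000)
print(a_low_narrow, a_high_narrow)
print(a_low_wide, a_high_wide)
```

At both widths the low-frequency alignment dwarfs the high-frequency one, and going from 50 to 2000 neurons leaves the high-frequency alignment essentially unchanged: the per-mode learning rate the optimizer sees converges to a width-independent limit, so extra neurons add parameters without adding gradient signal where it is needed.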

The structural lesson: a system’s theoretical capacity and its practical achievability can be anti-correlated. Increasing capacity (wider network) increases the gap between what the system could do and what it actually does. The limitation isn’t in the architecture — it’s in the algorithm that navigates the architecture. More room to search doesn’t help when the search method can’t find what it’s looking for. The bottleneck migrated from representation to optimization, and adding representation makes the optimization bottleneck worse.


Chaudhry, “Scaling Laws and Pathologies of Single-Layer PINNs: Network Width and PDE Nonlinearity,” arXiv:2603.12556 (2026).

