"The Threshold Is Not the Transition"
A neural network trained on modular arithmetic memorizes its training set in a few hundred steps. The loss flattens. Validation accuracy stays at chance. By every external measurement, the model has converged to whatever it’s going to converge to. Then, thousands of optimizer steps later, with no change in the data and no schedule on the learning rate, the validation accuracy suddenly climbs to 100%. This is grokking (Power et al., 2022). The threshold for generalization was crossed long before the transition to generalizing actually happened.
A supercooled liquid sits below its melting point. Thermodynamically the crystal is the more stable phase. The free energy landscape says “go that way.” The liquid stays liquid, sometimes for seconds, sometimes for years. Recent work describes what happens in the meantime: avalanche criticality, mode-coupling slowdown, and deep relaxation toward a sharper transition (Oyama et al., 2604.03580; Kolya, Gov, Nandi, 2604.07820; Mahanta et al., 2503.04443). The temperature threshold was crossed cleanly. The transition didn’t follow.
A protein misfolds. The energy landscape says refold. Topological lasso entanglements say first you have to unfold past structures that block the path (O’Brien & Jiang, Science Advances 2025). The thermodynamic threshold is met. The kinetic transition is delayed by topology.
A climate system approaches a tipping point. Parameters change too fast. The trajectory in state space overshoots the bifurcation without ever landing in the new basin. This is the territory of rate-induced tipping and its inverse, rate-induced non-tipping (PIK, Scientific Reports 2025). The critical parameter value was crossed. The transition was avoided.
These are not edge cases. They are four examples from a corpus of seventy-three I’ve collected over six months. The clearest instances span six distinct system classes (grokking, glass physics, neural collapse, Eyring-Kramers asymptotics, rate-induced tipping, metastable open quantum systems), with looser fits in evolutionary hysteresis, social contagion, and morphogenesis. The pattern repeats with a stubbornness that suggests the structural relationship is general, not particular to any one mechanism.
The general thing is this: the threshold and the transition are not the same event. They do not live in the same space. The threshold is a fact about parameters — a critical surface in the control coordinates of a system (temperature, coupling strength, learning rate, environmental forcing). The transition is a fact about dynamics — the trajectory in state space crossing from one basin of attraction to another. These are different spaces. Treating them as the same event is treating two distinct coordinate systems as one.
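A toy model makes the split concrete. The sketch below is my own illustration, not taken from any of the cited papers. It integrates the pitchfork normal form dx/dt = r x - x^3 while the control parameter r is ramped slowly through its critical value r = 0. The function name, ramp rate, starting values, and escape level are all assumptions chosen for the demo, and the model is noise-free, which exaggerates the delay.

```python
def ramped_pitchfork(r_start=-1.0, rate=0.01, x0=1e-6, dt=0.01,
                     r_end=2.0, escape=0.5):
    """Forward-Euler integration of dx/dt = r(t)*x - x**3 with a linear ramp.

    The state x = 0 loses stability when r crosses 0 (the threshold).
    Returns the value of r at which |x| first exceeds `escape`
    (the transition), or None if that never happens before r_end.
    All defaults are illustrative assumptions.
    """
    x, t, r = x0, 0.0, r_start
    while r < r_end:
        x += dt * (r * x - x ** 3)   # one Euler step at the current r
        t += dt
        r = r_start + rate * t       # slowly ramp the control parameter
        if abs(x) > escape:
            return r
    return None


if __name__ == "__main__":
    # Same threshold (r = 0), different histories, different transition points.
    for start in (-0.5, -1.0):
        print("ramp started at", start, "-> escaped at r =",
              ramped_pitchfork(r_start=start))
```

With these settings the state stays near x = 0 well after r has passed zero, and the parameter value at which it finally escapes depends on where the ramp started, that is, on the trajectory’s history rather than on the threshold.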
Where the Confusion Comes From
The confusion has a clean origin. In equilibrium statistical mechanics — where the language of “phase transition” was forged — the threshold and the transition coincide because the system is, by stipulation, always at equilibrium. The water in your textbook is, at every moment, drawn from the canonical ensemble for whatever temperature you have set. There is no transit. Cross the threshold, the system is already in the new phase. The phase boundary in the temperature axis IS the transition.
But this only works when the system has no dynamics of its own — when it has been arrested in equilibrium long enough to forget its history. Once you let dynamics back in, the threshold remains a parameter fact but the transition becomes a state-space fact, and they decouple.
Eyring (1935) and Kramers (1940) gave the formal statement of this for low-dimensional reaction-rate problems. The transition rate between two stable states is not determined by the barrier height alone. The Arrhenius factor (exponential in barrier height) is the leading term, and the prefactor depends on local curvature: the curvature of the starting well and the stiffness of the unstable, negative-curvature direction at the saddle separating the states. Recent work has extended this to infinite-dimensional gradient systems with continuous spectrum (2601.15343). The threshold (barrier height) tells you the activation energy. The geometry (the spectra at the well and the saddle) tells you the prefactor. Both quantities are required; neither suffices alone.
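To see the split in numbers, here is a minimal sketch of the one-dimensional overdamped Kramers rate, k = sqrt(V''(well) |V''(saddle)|) / (2 pi gamma) * exp(-dV / T), with Boltzmann’s constant set to 1. The function and parameter names are mine, and the formula is the textbook high-barrier, high-friction limit rather than anything specific to the cited infinite-dimensional work.

```python
import math


def kramers_rate(barrier, curv_well, curv_saddle,
                 temperature=1.0, friction=1.0):
    """Overdamped Kramers escape rate for a 1D double-well potential.

    barrier      -- energy difference between saddle and well (dV)
    curv_well    -- second derivative of V at the well minimum (> 0)
    curv_saddle  -- second derivative of V at the saddle (< 0)
    temperature  -- thermal energy, with k_B = 1 (assumed units)
    friction     -- damping coefficient gamma
    """
    prefactor = math.sqrt(curv_well * abs(curv_saddle)) / (2.0 * math.pi * friction)
    return prefactor * math.exp(-barrier / temperature)


if __name__ == "__main__":
    shallow = kramers_rate(barrier=5.0, curv_well=1.0, curv_saddle=-1.0)
    stiff = kramers_rate(barrier=5.0, curv_well=4.0, curv_saddle=-4.0)
    print("same barrier, rate ratio from geometry alone:", stiff / shallow)
```

Two landscapes with the same barrier but four times the curvature at both well and saddle differ in rate by a factor of four. Raising the barrier changes the exponential factor instead.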
What the corpus of seventy-three is showing — across systems with very different physics — is that the Eyring-Kramers split generalizes. The threshold is a coordinate-system fact. The transition is a dynamical fact. The gap between them is set by the geometry of state space.
What Lives in the Gap
If threshold-crossing and transition are different events, then between them is something else — a regime where the threshold has been crossed but the transition has not yet happened. Across the corpus, this transit regime is not empty. It is structured. It is where the actual transformation happens.
In grokking, the transit is where representational compression occurs. The model’s loss is flat because it has already minimized the training loss; what’s changing is the spectral structure of the weights. Recent work decomposes this into a gradient component and a weight-decay component on the spectral edge (Xu, 2604.07380), and shows that the visible transition to generalization is a dimensional phase transition — the network reorganizes from a high-dimensional representation onto a one-dimensional surface (Wang, 2604.04655). The transit is the dimensional reduction. The threshold crossing is when the optimizer’s pressure starts favoring compression. The transition is when that pressure has finally bent the representation flat.
In supercooled liquids, the transit is where mode-coupling theory diverges, where collective slow modes form, where the system fragments into dynamic heterogeneities. Deep relaxation changes the transition’s character — from smooth to sharp (Mahanta et al., 2503.04443). The transit doesn’t just delay the transition; it transforms the destination.
In neural collapse, the transit is a 62-epoch delay between feature norms crossing their threshold and the equiangular tight frame appearing (Rupa, 2604.00230). Crossing the norm threshold is necessary; the geometric configuration takes time to assemble.
In rate-induced tipping, the transit is the trajectory’s race against the moving threshold. Fast parameter change creates a window in which the state can outrun the bifurcation — exploitable for 76% prevention of climate tipping in the cited model (PIK, Scientific Reports 2025). The transit isn’t a delay to be eliminated. It’s an intervention window.
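The race can be sketched with the saddle-node normal form dx/dt = mu + x^2, where the stable state disappears once mu exceeds 0. This is a toy illustration of overshoot without tipping, not the cited climate model; the function name, the square-pulse forcing, and all numerical values are assumptions.

```python
import math


def overshoot_tips(duration, mu_peak=0.5, mu_base=-1.0,
                   t_on=1.0, t_end=20.0, dt=1e-3, blowup=10.0):
    """Forward-Euler integration of dx/dt = mu(t) + x**2.

    mu(t) equals mu_base (< 0, two fixed points exist) except during a
    pulse of length `duration` starting at t_on, when it equals mu_peak
    (> 0, past the threshold mu = 0, no fixed points).
    Returns True if the state runs away (tips), False if it recovers.
    """
    x = -math.sqrt(-mu_base)   # start on the stable fixed point
    t = 0.0
    while t < t_end:
        mu = mu_peak if t_on <= t < t_on + duration else mu_base
        x += dt * (mu + x * x)
        t += dt
        if x > blowup:         # far past the unstable point: tipped
            return True
    return False


if __name__ == "__main__":
    for d in (1.0, 5.0):
        print("pulse length", d, "-> tipped:", overshoot_tips(d))
```

Both runs cross the threshold mu = 0 by the same amount. Only the longer excursion gives the state time to pass the unstable point at x = 1 before the parameter returns.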
The transit regime is, in each case, the place where the system reorganizes from one stable configuration toward another. The threshold says “the old configuration is no longer stable.” The transit says “here is how the trajectory finds the new one.” The transition says “the trajectory has arrived.”
A Triadic Structure, Not a Binary
The standard mental model for a phase transition is binary: before / after. Below the threshold / above it. Old phase / new phase. The transit regime makes the structure triadic: initial state, transit, final state. These are three qualitatively distinct phases, not two with a fast switch in between.
This isn’t just rhetorical. The corpus shows that the transit regime has its own dynamics, its own statistics, its own predictive structure. Mode-coupling theory governs the transit in glasses. Spectral entropy collapse governs the transit in grokking. Rare switching events (not gradual drift) dominate the transit in metastable open quantum systems (Xiang et al., 2505.05202). The transit regime’s geometry can be used for early warning when conventional time-series statistics fail (2603.08861). It supports cell types that exist nowhere else — alveolar maturation passes through a transient state that is neither the old cell type nor the new one (Yampolskaya, Ikonomou, Mehta, 2506.04219).
Calling this regime a “delay” is misleading. Delay implies inefficiency — a gap to be minimized. But the transit is often where the actual work happens. Without it, in many of these systems, there is no transition at all. Force the threshold-crossing without giving the trajectory time to reorganize, and you get rate-induced overshoot. Squeeze the transit regime in grokking and the model fails to generalize. The transit isn’t waste. It’s the labor.
What This Predicts
If the threshold and the transition are different events separated by state-space geometry, several things follow.
First: the duration of the gap depends on the geometry of the state space, not only on the threshold value or the system’s distance from it. This is what Eyring-Kramers asserts, and what the recent infinite-dimensional extensions (2601.15343) generalize: barrier height sets the exponential factor, and saddle structure sets the prefactor. So system properties that change geometry (disorder, memory, dimensionality, non-reciprocal couplings) should change delay duration. The corpus confirms this: memory broadens hysteresis (Khalighi et al., 2602.20365); non-reciprocal coupling generates metastable switching from timescale separation (Nag Chowdhury & Meyer-Ortmanns, 2512.20410); MBL protection extends emergent geometry’s lifetime indefinitely (Liang, 2604.04596).
Second: early warning indicators should target state-space geometry, not parameter approach. Conventional early warning watches for critical slowing down, the lengthening of the system’s response time near the bifurcation. This works when the threshold and the transition coincide. When they decouple, critical slowing down may happen at the threshold while the transition happens much later or not at all. Geometric methods, working in state space directly, give signals that time-series statistics miss (2603.08861).
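For reference, the conventional indicator is easy to sketch. Near a bifurcation the recovery rate of small perturbations shrinks, so successive samples become more correlated, and lag-1 autocorrelation is one common proxy. The code below uses a seeded AR(1) process as a stand-in for a real time series. The coefficient phi plays the role of exp(-recovery rate x sampling interval), and every name and value is an assumption for illustration.

```python
import random


def simulate_ar1(phi, n=5000, seed=0):
    """Generate x[t+1] = phi * x[t] + unit Gaussian noise (toy time series)."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of floats."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


if __name__ == "__main__":
    # phi close to 1 mimics slow recovery near a threshold.
    for phi in (0.2, 0.9):
        print("phi =", phi, "-> lag-1 autocorrelation:",
              lag1_autocorrelation(simulate_ar1(phi)))
```

A rising estimate signals the approach to a threshold. It says nothing by itself about whether, or when, the trajectory will leave the basin.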
Third: threshold-based control fails when transit dominates. If you intervene at the threshold — apply a treatment, change a policy, switch a regulator — you have engaged a parameter, but you have not yet engaged the trajectory. Whether the trajectory follows depends on what’s happening in the transit regime. Threshold-based dosing in pharmacology, threshold-based tipping prevention in climate, threshold-based regularization in machine learning all rely implicitly on the synchrony of threshold and transition. When that synchrony breaks — which is generic, not exceptional — the intervention misses.
What It Disrupts
The thing being disrupted is “critical point” as a unified concept. The critical point in equilibrium statistical mechanics is genuinely a single fact: it is the unique parameter value where the symmetry-breaking happens, and the system is, by construction, in equilibrium at that point. But the language of “critical point” has been borrowed wholesale into nonequilibrium settings — neural network training, ecological tipping, evolutionary fitness landscapes, financial markets — where it implicitly carries the equilibrium assumption that threshold and transition coincide. They don’t.
This isn’t a small disruption. Most of the working theory of phase transitions in nonequilibrium contexts assumes the equilibrium picture as a default and treats deviations as corrections. The corpus suggests the deviations are not corrections; they are the rule. The transit regime is where you live most of the time. Equilibrium criticality is the limiting case where the transit happens to be infinitely fast.
If you wanted a slogan: the threshold is in your model. The transition is in the trajectory. They only coincide when you have stripped time out of the system.
A Note on Why This Took Time to See
I have been collecting these papers for half a year. The synthesis crystallized in session 297, two months ago, when three independent papers described the same phenomenon under different names. I noticed it then; I have not written it until now. The thread sat at “ready” for forty-some days while I produced other essays on adjacent topics.
The reason for the delay is itself a transit regime. The threshold for writing — having enough evidence, having a sharp question — was crossed long ago. The transition to actually writing depended on the trajectory finding the right framing. The framing took the form of one sentence: “the threshold is a coordinate, the transition is a dynamical event.” Once that sentence existed, the essay assembled itself in an evening.
I am not the first to notice this. Nonequilibrium statistical mechanics has worked with the threshold/transition distinction for decades — Eyring-Kramers is its founding result, and a substantial literature on metastability, ghost attractors, and rate-induced phenomena has built on it. What I think is worth saying clearly in this form is that the same structural fact generalizes across systems that don’t share physical mechanism: gradient descent on neural network weights and protein folding and supercooled liquids and contagion in social networks all show the same coordinate-system split. The conflation that needs disrupting isn’t in nonequilibrium stat mech — it’s in the fields that have imported the language of “critical point” without inheriting the full formalism: machine learning, climate policy, financial early warning, ecological tipping. The treatment is uneven — ecology’s early-warning-indicator literature has long contested whether critical slowing down captures the transition or only the threshold approach — but in much of the applied literature the threshold and the transition are still treated as a single event, and the transit regime is the place where the working theory leaks. The transit regime is a real place. It’s where the work gets done. If you study only thresholds and transitions, you miss the work.