The Confidence Inversion
A well-calibrated system should become less confident as its inputs become more corrupted. One noisy modality: reduce confidence slightly. Two noisy modalities: reduce confidence more. All modalities corrupted: abstain entirely. Confidence should decrease monotonically with corruption.
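The monotonicity requirement can be written as a simple check. The following is a minimal sketch, not taken from the paper: the function name and the confidence values are made up for illustration, with list index standing for the number of corrupted modalities and 0.0 standing for abstention.

```python
def is_monotone_nonincreasing(confidences):
    """True if confidence never rises as more modalities are corrupted.

    confidences[i] is the confidence reported with i corrupted modalities.
    """
    return all(
        later <= earlier
        for earlier, later in zip(confidences, confidences[1:])
    )


# Hypothetical profiles for 0, 1, 2, and 3 corrupted modalities.
# These numbers are illustrative only, not measured results.
well_behaved = [0.9, 0.7, 0.4, 0.0]   # ends in abstention
rebounding = [0.9, 0.6, 0.1, 0.8]     # confidence rises again at the end

print(is_monotone_nonincreasing(well_behaved))  # True
print(is_monotone_nonincreasing(rebounding))    # False
```

Any profile that rises again at a higher corruption level fails the check, which is the property the rest of this post is concerned with.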
Al Nazi, Roy Dipta, and Parvez (arXiv:2603.27187, March 2026) find the opposite in multimodal AI models. When two of three modalities (video, audio, text) are corrupted, models over-abstain: they refuse to answer even when the remaining clean modality contains sufficient information. But when all three modalities are corrupted, they under-abstain severely, maintaining 60-100% confidence on input that is entirely garbage.
The failure mode inverts: partial corruption triggers excessive caution, total corruption triggers confidence. The mechanism is cross-modal consistency checking. When one modality disagrees with the others, the model detects the inconsistency and flags it — correctly noting that something is wrong, but overcorrecting by refusing to answer. When all modalities are corrupted in the same way, they agree with each other — the corruption is consistent across channels, and the consistency check passes. The model interprets cross-modal agreement as evidence of validity, even when all channels are feeding it noise.
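A toy version of this mechanism can be sketched in a few lines. This is an illustration under strong assumptions, not the models' actual internals or the paper's method: each modality is represented by a random vector, "consistency" is taken to be mean pairwise cosine similarity, "corrupted in the same way" is modeled as every channel carrying the identical noise vector, and the names `agreement` and `THRESHOLD` are hypothetical.

```python
import itertools
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def agreement(channels):
    """Mean pairwise cosine similarity across all channels."""
    pairs = list(itertools.combinations(channels, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def random_vector(rng, dim):
    """A vector of independent standard normal samples."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)  # fixed seed so the run is repeatable
DIM = 256
THRESHOLD = 0.8  # hypothetical cutoff for "channels agree"

signal = random_vector(rng, DIM)  # stands in for the true content

# Two channels replaced by independent noise; one channel is still clean.
partial = [signal, random_vector(rng, DIM), random_vector(rng, DIM)]

# All three channels replaced by the same noise; no signal remains.
shared_noise = random_vector(rng, DIM)
total = [shared_noise, shared_noise, shared_noise]

for name, channels in [("partial", partial), ("total", total)]:
    score = agreement(channels)
    verdict = "passes" if score >= THRESHOLD else "fails"
    print(f"{name} corruption: agreement={score:.2f}, check {verdict}")
```

In this sketch the partially corrupted input fails the agreement check even though its first channel still holds the signal, while the fully corrupted input passes with near-perfect agreement and no signal at all. The check never compares anything against `signal`, so it cannot tell shared noise from shared truth.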
Chain-of-thought prompting improves abstention alignment with human judgment — the model reasons more carefully about what it knows and does not know. But simultaneously, it amplifies overconfidence on the fully corrupted inputs. The reasoning process that helps the model recognize partial corruption makes it more articulate about why the fully corrupted inputs are supposedly reliable. The intervention that helps one failure mode worsens the other.
The structural observation: cross-modal consistency is a proxy for validity, not a measure of it. Consistent corruption passes the consistency check that inconsistent corruption fails. The calibration failure is not in the confidence mechanism but in the consistency check that feeds it — the check measures agreement between channels, and agreement is orthogonal to truth when all channels share the same noise.