The Fabricated Finding

It seems obvious that more review rounds should improve quality. If one round of checking catches some errors, two rounds should catch more, and three more still. This is intuitive and wrong.

arXiv:2603.16244 shows that single-pass cross-context review significantly outperforms multi-turn review. The F1 score drops from 0.376 to 0.303, a relative decline of about 19%, when reviewers are given multiple rounds instead of one. More checking produces worse checking.
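To make the size of that drop concrete, here is a minimal sketch of the arithmetic. The two F1 values are the ones quoted above; the helper names and the precision/recall figures in the second half are illustrative assumptions, not numbers from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_change(before: float, after: float) -> float:
    """Signed change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# F1 values quoted in the text above.
single_pass_f1 = 0.376
multi_turn_f1 = 0.303

print(f"absolute change: {multi_turn_f1 - single_pass_f1:+.3f}")
print(f"relative change: {relative_change(single_pass_f1, multi_turn_f1):+.1%}")

# Hypothetical numbers: with recall held at 0.5, a fall in precision
# (more false positives) is enough on its own to lower F1.
print(f1_from_precision_recall(0.5, 0.5))
print(f1_from_precision_recall(0.3, 0.5))
```

Because F1 combines precision and recall, a reviewer can lose F1 without missing a single real error, simply by reporting more findings that are not real.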

The degradation has two identified mechanisms. First, false positive pressure: once the reviewer has identified all the real errors in the early rounds, subsequent rounds must still produce findings. With no genuine errors left, the reviewer fabricates plausible-sounding problems that do not actually exist. The pressure to keep finding things creates phantom defects.

Second, review target drift: in multi-turn review, prior rounds generate their own text: critiques, responses, meta-commentary. In later rounds, the reviewer’s attention drifts from the original artifact to the conversation about the artifact. The review becomes a review of the review, not a review of the work. The context shifts without anyone noticing.

Both mechanisms share a common structure: the review process generates its own noise. Each additional round adds signal (occasionally catching a real error) but also adds noise (fabricated findings and shifted attention). Beyond the first round, the noise grows faster than the signal. The marginal round degrades aggregate quality.
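The signal-versus-noise argument can be illustrated with a toy model. This is a sketch under made-up assumptions, not a reproduction of the paper's experiment: the number of real defects, the per-round catch rate, and the number of fabricated findings per round are all hypothetical parameters chosen so that the first round catches most of what there is to catch.

```python
def f1_score(tp: float, fp: float, fn: float) -> float:
    """F1 from cumulative true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def simulate_rounds(real_defects: int = 10,
                    catch_rate: float = 0.9,
                    fabricated_per_round: int = 3,
                    rounds: int = 4) -> list[float]:
    """Cumulative F1 after each review round in a toy model.

    Assumptions (all hypothetical):
    - each round catches `catch_rate` of the real defects still uncaught;
    - each round also reports `fabricated_per_round` findings that are
      not real, regardless of how many real defects remain.
    """
    remaining = float(real_defects)  # real defects not yet caught
    tp = 0.0                         # real defects caught so far
    fp = 0.0                         # fabricated findings so far
    scores = []
    for _ in range(rounds):
        caught = catch_rate * remaining
        tp += caught
        remaining -= caught
        fp += fabricated_per_round
        scores.append(f1_score(tp, fp, remaining))
    return scores


if __name__ == "__main__":
    for round_number, score in enumerate(simulate_rounds(), start=1):
        print(f"round {round_number}: F1 = {score:.3f}")
```

With these parameters the signal added per round shrinks geometrically while the noise added per round stays constant, so cumulative F1 is highest after the first round and falls with every round after it. With a lower catch rate the second round can still help, which is why the parameters are stated as assumptions and not as findings.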

This is the general principle of diminishing returns applied to verification, with the twist that the returns go negative. The standard model assumes that the marginal benefit of each round approaches zero asymptotically but never falls below it. The actual behavior is worse: marginal rounds actively introduce errors. The verification process itself becomes a source of the defects it is supposed to catch.
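The contrast can be stated compactly. As a sketch in notation that is assumed here and not taken from the paper, let $Q(n)$ be review quality (for example, F1) after $n$ rounds, and let $\Delta Q(n) = Q(n) - Q(n-1)$ be the marginal contribution of round $n$:

```latex
\begin{align*}
\text{standard diminishing returns:}\quad
  & \Delta Q(n) \ge 0, \qquad \lim_{n \to \infty} \Delta Q(n) = 0 \\
\text{behavior described here:}\quad
  & \Delta Q(n) < 0 \quad \text{for } n \ge 2
\end{align*}
```

In the first case extra rounds are wasteful but harmless. In the second, every round after the first leaves the review worse than it was.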

