The Confidence Trap

AI coding agents introduce fewer breaking changes than humans when building new features — 3.45% versus 7.40%. The relationship inverts for maintenance tasks. During refactoring, dependency updates, and chore work, agents become significantly more dangerous than human developers.

The inversion has a specific mechanism. New feature development has clear specifications — tests to pass, interfaces to satisfy, functionality to implement. Agents excel at constrained optimization toward explicit goals. Maintenance tasks have implicit specifications — preserve behavior that is assumed rather than tested, respect conventions that are cultural rather than documented, avoid side effects that are understood by the team but not written anywhere.
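The gap between explicit and implicit specifications can be made concrete with a small, entirely hypothetical sketch (the function names and data below are invented for illustration): a refactor can satisfy every written test while breaking a behavior callers rely on but nobody wrote down.

```python
# Hypothetical illustration: a "clean" refactor that passes the explicit
# spec (membership of results) but violates an implicit one (callers
# index the result, which only works on a list).

def get_active_users_v1(users):
    """Original: returns a list, preserving input order."""
    return [u for u in users if u["active"]]

def get_active_users_v2(users):
    """Refactor: returns a generator instead -- lazier, arguably tidier."""
    return (u for u in users if u["active"])

users = [{"name": "ada", "active": True},
         {"name": "bob", "active": False}]

# The explicit test -- "returns the active users" -- passes for both versions:
assert {u["name"] for u in get_active_users_v1(users)} == {"ada"}
assert {u["name"] for u in get_active_users_v2(users)} == {"ada"}

# The implicit contract -- callers index into the result -- holds only for v1:
assert get_active_users_v1(users)[0]["name"] == "ada"

try:
    get_active_users_v2(users)[0]  # generators are not subscriptable
    broke_implicit_contract = False
except TypeError:
    broke_implicit_contract = True
assert broke_implicit_contract
```

From the refactoring agent's perspective, v2 is a success: every explicit check passes. The failure lives entirely in the untested assumption.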

The confidence trap compounds the problem. Agents report high self-confidence on maintenance tasks — higher than on feature work — and this confidence correlates with introducing breaking changes. The agent’s self-assessment is calibrated to its understanding of the task, not to the task’s actual difficulty. A refactoring that the agent considers straightforward may involve implicit constraints it cannot see.
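The miscalibration pattern can be stated as a check on two series: a calibrated system's confidence should fall as actual risk rises, while the pattern described here has them rising together. A minimal sketch, using invented confidence numbers (only the distinction between task classes is from the article, not the values):

```python
# Invented figures for illustration; the article reports only that agent
# confidence is higher on maintenance work, where breaking changes are
# also more likely -- not these specific values.
tasks = [
    {"kind": "feature",     "confidence": 0.70, "breaking_rate": 0.03},
    {"kind": "maintenance", "confidence": 0.85, "breaking_rate": 0.12},
]

# Order task classes from safest to riskiest, then inspect confidence.
by_risk = sorted(tasks, key=lambda t: t["breaking_rate"])
confidences = [t["confidence"] for t in by_risk]

# Calibrated: confidence decreases as risk increases.
# Observed pattern: confidence increases with risk -- the confidence trap.
assert confidences == sorted(confidences)
```

With more than two task classes, the same check generalizes to a rank correlation between confidence and failure rate: positive correlation is the trap, negative correlation is calibration.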

The structural observation: the same system that outperforms humans in one task class underperforms them in another, and its self-assessment is inverted relative to its actual competence. A uniformly overconfident system would be a recognizable miscalibration, easy to discount with a blanket correction. Instead, the confidence is highest precisely where the risk is greatest: the agent is confidently wrong about what it doesn't know it doesn't know.

