The Cost of a Confidently Wrong Agent
The failure mode that matters is not the agent that stops. It is the one that finishes, wrong.
Why it matters
Context is the scarce resource. The teams shipping real agentic work treat the context window like a budget: what goes in, what gets summarized, what gets dropped. Most agent failures are context failures wearing a different costume.
Reliability compounds badly. An agent that is right ninety-five percent of the time across a ten-step task succeeds barely sixty percent of the time. The math is unforgiving, which is why narrow, verifiable steps beat ambitious ones.
The human moved up a level. You no longer write the fix; you specify the goal and judge the result. The scarce skill now is a crisp success criterion and an honest review of the diff.
An agent is only as good as the loop it runs. Perception, plan, act, verify — the failure usually hides in the verify step, where the model that wrote the code also grades it. Independent checks are not optional; they are the architecture.