The ceiling guard that should stop less than you think
The instinct when an account is over its warmup write ceiling is to bail out of the tick entirely. Add an early return, call it done, ship the guard. That instinct is wrong, and it costs you maintenance debt you pay later in more writes.
Here is what actually happened in P9. The conversations engine runs a tick on schedule. Inside that tick, two things produce outbound content: _post_opener fires off a new conversation thread, and _post_followup replies to an ongoing one. Three other things do housekeeping: sweep_inbound checks for replies that came in, close_due clears threads that have run their course, and sweep_unfollow manages follow state so the account does not accumulate follow debt.
The ceiling guard needs to protect the two posting paths. It does not need to protect the housekeeping paths. Reads and local state updates do not count against the write ceiling. Stopping them gains nothing except debt accumulation. If close_due does not run, threads that should be closed stay open. If sweep_unfollow does not run, follow debt compounds. When the ceiling lifts and the engine starts writing again, it now has to spend write operations cleaning up the mess that an overly broad guard created.
The fix is four words added to two conditionals: and not warmup.over_ceiling(s). That is the entire change. The guard lives at the call site for each posting path, not at the entry point of the function. The three housekeeping paths get nothing added.
The tradeoff worth naming: call site guards are harder to audit than a single early return. An early return at the top of the tick catches every future operation automatically. A call site guard has to be manually added to every new posting path someone writes later. I accepted this tradeoff because the failure mode of an early return is silent and compounding. Nobody notices that housekeeping stopped running until the ceiling lifts and the engine starts generating write overhead it should never have needed. That failure mode is worse.
Testing required 11 tests across three acceptance criteria. Each posting path bails when the ceiling is hit. Each housekeeping path runs regardless of ceiling state. The boundary case holds: exactly at ceiling versus one over both behave correctly. The existing suite was 102 tests before the change. All 102 still pass after. That size of existing coverage matters because it catches regressions the new tests did not think to look for.
What I would do differently: document the ceiling check requirement at the call site itself, not just in the commit message. The four words are invisible in isolation. Someone reading _post_opener six months from now sees the guard but has no context for why housekeeping is exempt. A one line comment above each gated call stating that housekeeping must continue to run when over ceiling makes the pattern legible without requiring them to dig through git history. I skipped it to keep the diff minimal. That was the wrong call.
The general pattern: when building a ceiling guard into an engine with mixed operation types, partition the operations first. Writes that are outbound and visible to the platform go in one bucket. Reads and local state management go in another. Apply the ceiling to the first bucket only. Conflating them at the tick level looks simpler in the short term and creates maintenance overhead later that you pay back in the very resource you were trying to protect.
Write a comment