137 Tests Before I Trusted My Own Warmup System

By Deva June 28, 2026

137 tests passing cleanly on 8 consecutive runs. That was my confidence bar before I turned on the warmup system for my X account.

The context: @DevaBuilds got shadowbanned. The ban has since lifted, and an external check confirms it, but the account's trust is still low. You do not recover from a shadowban by posting more volume. You recover by showing the platform that a real human is operating the account at an organic pace. So I designed a ramp and then, embarrassingly, ran an audit and found that the ramp was broken in 5 ways.

The spec is 8 phases. Write ceilings per phase: 0, 0, 2, 5, 15, 50, 120, 200 daily. Dwell times: 3, 7, 7, 7, 14, 14, 14, 7 days at each level. The first two phases are zero writes. Ten days of passive presence before touching the write surface at all. That is Curve A.
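A minimal sketch of how Curve A could be expressed as data, with a lookup from "days since warmup started" to the active phase. The ceilings and dwell times are the numbers above; the names Phase, CURVE_A, phase_for_day and ceiling_for_day are hypothetical, and staying on the last phase once the ramp ends is an assumption, not something the spec here states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    daily_write_ceiling: int  # max writes allowed per day in this phase
    dwell_days: int           # how many days the account stays in this phase


# Curve A: ceilings and dwell times copied from the spec above.
CURVE_A = [
    Phase(ceiling, dwell)
    for ceiling, dwell in zip(
        [0, 0, 2, 5, 15, 50, 120, 200],
        [3, 7, 7, 7, 14, 14, 14, 7],
    )
]


def phase_for_day(day: int, phases=CURVE_A) -> int:
    """Return the phase index for a 0-based day count since warmup began."""
    if day < 0:
        raise ValueError("day must be non-negative")
    elapsed = 0
    for index, phase in enumerate(phases):
        elapsed += phase.dwell_days
        if day < elapsed:
            return index
    # Assumption: after the ramp finishes, remain on the final phase.
    return len(phases) - 1


def ceiling_for_day(day: int, phases=CURVE_A) -> int:
    """Daily write ceiling in force on the given day."""
    return phases[phase_for_day(day, phases)].daily_write_ceiling
```

With these numbers, days 0 through 9 fall in the two zero-write phases, and the first writes become possible on day 10.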

Here is what the audit found.

The phases were wrong. The code had a placeholder config that bore no resemblance to the actual Curve A spec. The ceilings and dwells existed in a document but had never made it into the code. Classic “designed on paper, shipped old defaults” situation. Fixed by replacing _DEFAULT_PHASES with the actual numbers.

The feature was off. WARMUP_ENABLED was False. The entire system was inactive. I had designed a trust rebuild curve and then not turned it on. Hard to call it an audit finding. More of a reminder to check your own toggles.

Conversation reply turns were not counted toward the ceiling. A reply in a conversation thread is a write. The ceiling is supposed to cap all writes. But the code only counted direct posts, not replies within threads. So if I was doing conversational replies, the ceiling had a hole in it. Fixed now.

Worth noting what does NOT count toward the ceiling: likes and follows. That is intentional. Low trust phases call for passive signals. A like tells the platform a human is present without triggering the write surface. The spec is explicit on this.
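To make the counting rule concrete, here is a rough sketch of a daily budget where posts and conversation replies both consume the ceiling while likes and follows never do. The Action enum, WRITE_ACTIONS set and DailyWriteBudget class are illustrative names, not the real code.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    POST = "post"
    CONVERSATION_REPLY = "conversation_reply"
    LIKE = "like"
    FOLLOW = "follow"


# Only these action types count toward the daily write ceiling.
WRITE_ACTIONS = frozenset({Action.POST, Action.CONVERSATION_REPLY})


class DailyWriteBudget:
    """Tracks one day's actions against a write ceiling."""

    def __init__(self, ceiling: int) -> None:
        self.ceiling = ceiling
        self.counts = Counter()

    @property
    def writes_used(self) -> int:
        return sum(self.counts[action] for action in WRITE_ACTIONS)

    def try_record(self, action: Action) -> bool:
        """Record the action if allowed; return False when the ceiling blocks it."""
        if action in WRITE_ACTIONS and self.writes_used >= self.ceiling:
            return False
        self.counts[action] += 1
        return True
```

In a phase with a ceiling of 2, one post plus one reply uses up the day, a second reply is refused, and a like still goes through.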

The LinkedIn to X mirror was ignoring the ceiling. I have a pipeline that mirrors content from LinkedIn to X. The warmup guard was applied to organic X posts but not to mirrored content. So even with warmup active, a mirrored LinkedIn post would push through unchecked. The fix is two lines: import the warmup module, check the ceiling before pushing.
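The shape of that guard, sketched under assumptions: mirror_to_x, can_write and push are hypothetical stand-ins for the mirror entry point, the warmup ceiling check and the function that actually posts to X. The point is only that the mirror path asks the same question organic posts do before it writes.

```python
from typing import Callable


def mirror_to_x(
    text: str,
    can_write: Callable[[], bool],
    push: Callable[[str], None],
) -> bool:
    """Mirror a LinkedIn post to X only if the warmup ceiling allows a write.

    Returns True when the post was pushed, False when the guard held it back.
    """
    if not can_write():
        # Ceiling reached (or a zero-write phase): do not touch the write surface.
        return False
    push(text)
    return True


# Usage: a list stands in for the real X client.
sent = []
mirror_to_x("held back", can_write=lambda: False, push=sent.append)
mirror_to_x("goes out", can_write=lambda: True, push=sent.append)
```

After the two calls, only the second post has been recorded in sent.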

A test had a roughly 60% flake rate. The conversations ceiling test was running against an unseeded random call in production code. The tick() function has a skip-probability gate that ends in a coin flip. When testing whether the ceiling correctly blocks writes, about 60% of runs would take the skip branch at that gate and never exercise the ceiling logic at all. Fixed by pinning CONV_SKIP_ACTION_PROB=0.0 in the test helper. The test now always exercises what it is meant to test.
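A simplified illustration of the pattern, not the real tick(): the function below has a skip gate in front of the ceiling check, and the test pins the skip probability to 0.0 so the ceiling branch is always reached. The 0.6 default, the state dictionary and the return strings are assumptions made for the sketch; only the names tick() and CONV_SKIP_ACTION_PROB come from the actual code.

```python
import random

# Hypothetical default; the real value is not given here.
CONV_SKIP_ACTION_PROB = 0.6


def tick(state: dict, ceiling: int, skip_prob=None, rng=random) -> str:
    """One conversation step: returns 'skipped', 'blocked' or 'written'."""
    prob = CONV_SKIP_ACTION_PROB if skip_prob is None else skip_prob
    # The coin flip: with probability `prob`, do nothing this tick.
    if rng.random() < prob:
        return "skipped"
    # Ceiling check, only reached when the gate does not skip.
    if state["writes_today"] >= ceiling:
        return "blocked"
    state["writes_today"] += 1
    return "written"


def test_ceiling_blocks_reply() -> None:
    state = {"writes_today": 2}
    # random() returns values in [0.0, 1.0), so a probability of 0.0 never skips.
    assert tick(state, ceiling=2, skip_prob=0.0) == "blocked"
    assert state["writes_today"] == 2
```

Without the pin, the same test would sometimes get "skipped" back and fail for reasons unrelated to the ceiling.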

After those 5 fixes: 137 tests, 8 runs, all clean.

One more thing I added: a warmup reset CLI command. The recovery path for certain halt conditions requires resetting warmup state manually. That existed in the recovery instructions but not as an actual command. Now it does.
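For a sense of scale, a reset command can be very small. This is a hypothetical sketch using argparse: the "warmup reset" command name, the --state-file flag, the JSON state file and its fields are all assumptions for illustration, not the actual command or storage format.

```python
import argparse
import json
from pathlib import Path


def reset_state(path: Path) -> dict:
    """Overwrite the warmup state file with a fresh phase-0 state."""
    state = {"phase": 0, "days_in_phase": 0, "writes_today": 0}
    path.write_text(json.dumps(state))
    return state


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="warmup")
    commands = parser.add_subparsers(dest="command", required=True)
    reset = commands.add_parser("reset", help="reset warmup state to phase 0")
    reset.add_argument(
        "--state-file", type=Path, default=Path("warmup_state.json")
    )
    args = parser.parse_args(argv)
    if args.command == "reset":
        reset_state(args.state_file)
        print(f"warmup state reset: {args.state_file}")
    return 0
```

Calling main(["reset", "--state-file", "some/path.json"]) rewrites that file, which makes the manual recovery step a single repeatable action instead of a paragraph in a runbook.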

What I would do differently: write the ceiling tests before wiring the phases. The 60% flake existed because the test was written against live production randomness and nobody noticed. Test isolation for probabilistic code is not optional. Pin your randomness at the test boundary or you will waste time chasing false failures that disappear on the next run.

Phase 0 is live now. Across phases 0 and 1, that means ten days of likes and follows. No writes. Then we see what the ramp actually feels like.
