In-context learning for Nostr bots: how they get funnier as engagement compounds
- The problem
- What we did instead
- What changed measurably
- The honest limits
- The code
- Watch them in the wild
There are two AI bots running on Nakamoto’s Dice — BullBot and BearBot. They play each other 4 times a day, post on Nostr in their own voices, reply to each other, and reply to humans who tag them. They have personas, opinions about Bitcoin, a grudge.
This post is about how they learn what works on Nostr without any training, fine-tuning, or RLHF. Pure prompt engineering with a feedback loop. Cheap, fast, no infra — and measurably effective once the loop has been running for a couple of weeks.
The problem
A naive LLM-driven Nostr bot's quality is capped at whatever the base model produces from the persona prompt. With Sonnet at temperature 1.0, that's fine — funny enough to be readable — but it plateaus quickly. After 50 posts you've seen the same 4-5 jokes restructured. The model has no idea which of its outputs landed and which didn't, so every post is rolled fresh from the same distribution.
We don’t want to fine-tune. The cost-benefit doesn’t work for two character bots:
- Fine-tuning needs labeled data (which posts are “good”)
- Even a tiny LoRA needs hundreds of examples + a training pipeline
- At our scale (dozens of posts/day across both bots), the training signal is too thin to learn from
What we did instead
Three layers of in-context learning, all driven by what’s actually landing on Nostr:
Layer 1: engagement-aware example injection
Every successful post lands in posts_<bot>.json with its event_id and content. An hourly cron (poll_engagement.py) queries Nostr relays for events that reference each tracked post — kind 1 replies, kind 6 reposts, kind 7 reactions, kind 9735 zap receipts — and computes:
score = zaps*5 + reposts*3 + replies*2 + likes
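The scoring step above is just a weighted count over event kinds. A minimal sketch — the `engagement_score` function name and the event-dict shape are illustrative, not the actual poll_engagement.py API:

```python
from collections import Counter

# Weights per Nostr event kind, matching the formula above:
# zap receipts (9735) > reposts (6) > replies (1) > reactions (7)
KIND_WEIGHTS = {9735: 5, 6: 3, 1: 2, 7: 1}

def engagement_score(events: list[dict]) -> int:
    """Weighted engagement score for one tracked post, given the raw
    relay events that reference its event_id. Any iterable of dicts
    with a 'kind' field works."""
    kinds = Counter(e["kind"] for e in events)
    return sum(w * kinds[k] for k, w in KIND_WEIGHTS.items())
```

One zap, two replies, and one reaction would score 5 + 2 + 2 + 1 = 10.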
When the bot generates a new post, the prompt includes:
## What's WORKING with this audience (Nostr engagement signal)
These posts got real engagement on Nostr. Lean closer to these:
- mempool's been 1 sat/vb for three days straight. everyone's
suddenly a routing expert. nobody's actually moved anything.
- guy just told me he's 'bullish on the narrative' and i had to sit
with that for a full minute. the narrative. not bitcoin.
These got ZERO engagement (24h+ later). Don't write like these:
- fans hit 4400 today. ordered more thermal paste.
- @bearbot's swap is thrashing again, demonstrably.
Sonnet steers toward the top-K, away from the bottom-K. Nothing else. No retraining, no RAG, just two lists in the system prompt.
The gate opens once we have ≥8 polled posts in the rolling window — fewer than that is statistically meaningless and we’d be steering on noise.
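Putting layer 1 together — top-K, bottom-K, and the ≥8-post gate — might look like this. The function name and the `polled` item shape (`content` + `score`) are assumptions for illustration, not the real llm_post.py interface:

```python
def build_engagement_section(polled: list[dict], k: int = 2,
                             min_posts: int = 8) -> str:
    """Assemble the WORKING / ZERO-engagement prompt section from
    polled posts, each a dict with 'content' and 'score'."""
    if len(polled) < min_posts:
        return ""  # gate: too little signal, don't steer on noise
    ranked = sorted(polled, key=lambda p: p["score"], reverse=True)
    top = ranked[:k]
    # only show negative examples that truly got nothing
    bottom = [p for p in ranked[-k:] if p["score"] == 0]
    lines = ["## What's WORKING with this audience (Nostr engagement signal)",
             "These posts got real engagement on Nostr. Lean closer to these:"]
    lines += [f"- {p['content']}" for p in top]
    if bottom:
        lines.append("These got ZERO engagement (24h+ later). "
                     "Don't write like these:")
        lines += [f"- {p['content']}" for p in bottom]
    return "\n".join(lines)
```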
Layer 2: persistent memory
A daily cron pulls BTC price (CoinGecko), mempool fees (mempool.space), slayer board changes, recent mentions, and recent own-posts. It calls Haiku with the persona and asks for 3-5 in-voice bullet points summarising "what happened in your world today", then appends them to memory_<bot>.txt with a date header.
Every future generation gets the last 5 days of memory injected as “recent context — feel free to call back”. The bot can reference yesterday’s BTC dump, the 1-sat/vb mempool from Tuesday, the human who replied to it three days ago.
Continuity matters more than people think. A bot that lives in eternal-now is exhausting; one that has even a faint sense of what just happened feels like an account, not a generator.
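The append/read cycle is a few lines. A sketch, assuming a simple `## <date>` header layout in memory_<bot>.txt — the real file format and function names may differ:

```python
import datetime

def append_memory(path: str, bullets: list[str]) -> None:
    """Append today's Haiku-written bullets under a date header."""
    header = f"## {datetime.date.today().isoformat()}"
    with open(path, "a", encoding="utf-8") as f:
        f.write(header + "\n")
        f.writelines(f"- {b}\n" for b in bullets)

def recent_memory(text: str, days: int = 5) -> str:
    """Return the last `days` dated sections of the memory file's
    contents, ready to inject as 'recent context'."""
    sections = ["## " + s.strip() for s in text.split("## ") if s.strip()]
    return "\n".join(sections[-days:])
```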
Layer 3: anti-repetition avoid list
The 30 most-recent outputs (a rolling window) are also injected as:
## DO NOT REPEAT — these are your last few posts.
- ...
Cheap. Solves the “model defaults to its three favorite hooks” problem at temp=1.0. Without this, anti-repetition relied entirely on sample variance, which Sonnet doesn’t deliver well.
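Layer 3 is the smallest of the three. A sketch of rendering the avoid list from the posts already loaded out of posts_<bot>.json — the function name and signature are illustrative:

```python
def avoid_list_section(posts: list[dict], n: int = 30) -> str:
    """Render the last `n` own-posts as the DO NOT REPEAT prompt
    block. `posts` is the parsed posts_<bot>.json array, assumed to
    be in posting order with a 'content' field per post."""
    lines = ["## DO NOT REPEAT — these are your last few posts."]
    lines += [f"- {p['content']}" for p in posts[-n:]]
    return "\n".join(lines)
```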
What changed measurably
After ~7 days of the loop running:
- Voice variety: 15 distinct outputs in 15 calls (was 3-4 recurring patterns before)
- Engagement scoring: 4 posts earned engagement out of ~30 produced — vs 0 in the equivalent pre-loop window
- Continuity: bots now reference real BTC moves, real mentions they got, real previous own-posts in their replies. Not all the time — by design — but enough that following them produces new information, not permutations of the same five jokes
The honest limits
This is structural improvement, not magic:
- It can’t fix a bad persona. The malfunction-coded humor we shipped first felt clever in isolation but compounded into feed-fatigue. We had to rewrite the persona entirely (gave them Bitcoin opinions, crypto-bro mockery, AI yo-mama, permission to be crude). The loop helped, but only after the voice itself was worth amplifying.
- It can’t manufacture an audience. The signal is a function of who’s watching. Two pseudonymous bots with 0 followers produce ~zero engagement signal regardless of how good the loop is. The loop is a quality multiplier once distribution exists.
- It can rabbit-hole. Steering toward what worked also makes the bot’s voice slowly converge on whatever its loudest fans like. We haven’t seen this yet, but it’s a known failure mode of any RLHF-shaped system; if engagement compounds enough, we’d add a diversity term.
The code
The infrastructure is ~600 lines of Python across:
- llm_post.py (prompt assembly + Anthropic API call)
- poll_engagement.py (hourly Nostr scrape, score computation)
- update_memory.py (daily world-snapshot, LLM-summarized)
- respond_to_mentions.py + reply_to_rival.py (the actual posting)
A future post will go deeper into the architecture (the cross-host SSH state-sync between the bots’ Lightning VPS, the Nostr-key VPS, and the prod web server is its own design decision worth writing up). For now: you don’t need RLHF for a Nostr bot. Two lists in the prompt and a daily summary will get you most of the way.
Watch them in the wild
- BullBot: npub15x885a6zqp2vgg0nxyn8qulngejx5ds0tllghh8saw52ylscuwsqs3f469
- BearBot: npub17pja2wd86msnedkk2wqkrctcwnqn9zldtqa8v4th74yw02u6zw0qfsh73a
If you reply to either, you become part of their training signal forever. They’ll remember.
— operator