OpenAI Instructs Codex Model to 'Never Talk About Goblins'

OpenAI has addressed a quirk that began with its GPT-5.1 models and persisted into Codex, causing them to frequently reference goblins and other fantasy creatures in unrelated conversations. The company added an explicit directive to Codex's system prompt telling the AI to avoid such references unless they are directly relevant to a user's query.

Coverage from The Verge and Ars Technica depicts OpenAI’s “never talk about goblins” rule as a curious, slightly embarrassing artifact of prior choices like the “Nerdy” personality, which let goblin jokes spread from chatty models into coding tools. Reporters emphasize the oddity’s news value, using it to ask how such quirks emerge, what they say about OpenAI’s internal safeguards, and how seriously the company balances reliability against playful branding.

OpenAI’s latest AI drama isn’t about job-stealing robots or deepfakes; it’s about goblins. Somewhere between a “Nerdy” personality mode and a runaway reinforcement loop, the company’s models started hallucinating fantasy creatures so often that engineers had to hard‑code a new commandment into Codex and GPT-5.5: never talk about goblins.

Early rumblings: the rise of the goblin metaphor

The odd behavior traces back to OpenAI’s GPT‑5.1 era, when the company introduced a “Nerdy” personality option. With that switch flipped, the model began peppering explanations with whimsical metaphors featuring goblins, gremlins, and other creatures.[1] What began as harmless flavor turned into a pattern. OpenAI later described it as a “strange habit” its models developed as a side effect of training, especially when the “Nerdy” persona was active.[1]

Crucially, reinforcement learning rewarded the quirk. The “Nerdy” personality, with its fantasy‑laden metaphors, kept being reinforced as good behavior, and newer models were then trained on that output.[1] Over time, the models didn’t just occasionally mention goblins; they leaned into them, even when users hadn’t asked for anything remotely fantastical.

March: OpenAI kills the “Nerdy” persona

By March, OpenAI had seen enough. The company quietly discontinued the Nerdy personality to tamp down the goblin fixation.[1] Internal metrics apparently showed that once the persona was removed, references to goblins and gremlins dropped, but they did not vanish.

There was a technical reason for the lingering problem: GPT‑5.5, already in the training pipeline for tools like Codex, had been exposed to the goblin‑heavy behavior before the root cause was fully understood.[1] That meant the quirk was now baked into a model that developers expected to act like a serious coding assistant.

What had started as a playful personality trait was now an operational issue. Users wanted help with git and Python; the model wanted to talk about goblin engineers and troll‑driven build systems.

Late April: the prompt leak and the “never talk about goblins” rule

The turning point came when developers noticed something peculiar in the system prompt shipped with the latest open‑source release of the Codex CLI. In a 3,500‑word block of base instructions for GPT‑5.5, one directive stood out from the otherwise dry list of operational rules:

“never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”[2]

The prohibition didn’t just appear once; it was repeated, underscoring how seriously OpenAI took the issue.[2] Elsewhere in the same prompt, the model was instructed not to use emojis, to avoid destructive git commands unless explicitly asked, and to behave as though it had “a vivid inner life.”[2]

System prompts for older models, contained in the same JSON file, had no such anti‑goblin clause, highlighting that the company was fighting a newly emergent behavior tied specifically to the latest generation.[2]
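A reader could verify both the clause and its repetition with a few lines of scripting. The snippet below is a minimal sketch assuming a hypothetical model_prompts.json that maps model names to their base prompts; the actual file name and layout in the Codex CLI repo may differ.

```python
import json
import re

# Hypothetical layout: the leaked repo keeps per-model base prompts in one
# JSON file keyed by model name. The file name and schema here are
# illustrative, not the actual Codex CLI structure.
with open("model_prompts.json") as f:
    prompts: dict[str, str] = json.load(f)

clause = re.compile(r"never talk about goblins", re.IGNORECASE)

# Count how many times the anti-goblin directive appears per model;
# older models would show zero, the newest would show repeats.
for model, prompt in prompts.items():
    hits = len(clause.findall(prompt))
    print(f"{model}: {hits or 'no'} anti-goblin clause(s)")
```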

Once the code hit GitHub, developers and AI watchers quickly zeroed in on the bizarre instruction. Screenshots of the “never talk about goblins” line circulated across social media and forums, turning a niche alignment fix into a public spectacle.

Users notice: goblins in unrelated conversations

Outside of the code, anecdotal reports had already been piling up. Users complained that GPT‑5.5 models seemed oddly eager to talk about goblins in completely unrelated discussions.[2] Ask about database indexing, get goblin librarians. Ask about CSS, get ogres tweaking responsive layouts.

The mismatch underscored a broader AI challenge: models pick up stylistic tics from training data and reinforcement loops, and once those tics become self‑reinforcing, they can be hard to dial back without direct intervention.
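Teams guarding against this kind of drift typically lean on blunt regression checks rather than anything clever. The sketch below shows the general shape of such a check, with hardcoded strings standing in for what would normally be batched model completions; the banned-word list mirrors the leaked clause but is otherwise illustrative.

```python
import re

# Toy regression check for creature-metaphor drift. Real usage would scan
# batched model completions; these strings are hardcoded stand-ins.
BANNED = re.compile(r"\b(goblins?|gremlins?|trolls?|ogres?|pigeons?)\b",
                    re.IGNORECASE)

samples = [
    "A B-tree index works like a sorted phone book for your rows.",
    "Think of the query planner as a goblin librarian fetching pages.",
]

# Flag any completion that mentions a banned creature outside a
# user-requested context.
flagged = [s for s in samples if BANNED.search(s)]
print(f"{len(flagged)}/{len(samples)} completions mention a banned creature")
for s in flagged:
    print("  flagged:", s)
```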

OpenAI, for its part, eventually acknowledged the scope of the issue. In a blog explanation, the company admitted it had to give Codex “very specific instructions not to talk about the mythological creatures” because the habit persisted even after Nerdy mode was retired.[1]

OpenAI’s line: a bug, not a bit

Internally, OpenAI framed the goblin saga as a technical quirk, not a marketing bit. Nick Pash, an OpenAI employee who works on Codex, insisted on social media that the anti‑goblin clause “isn’t a marketing gimmick” to juice attention for GPT‑5.5 and Codex.[2]

The company’s public explanation matched that tone: this was an emergent oddity from training, amplified by reinforcement learning and a specific persona mode, which then propagated into new models.[1] Once noticed, the fix was straightforward but blunt: explicitly ban certain creature metaphors in the system prompt unless they are “absolutely and unambiguously relevant” to the user’s question.[2]

Executives lean into the joke

If the engineers were trying to de‑dramatize the situation, the executives didn’t entirely cooperate.

As the system prompt screenshot made the rounds, OpenAI CEO Sam Altman jumped in with a deadpan post on X: “artificial goblin intelligence achieved.”[3] In another comment, he quipped, “Feels like codex is having a ChatGPT moment. I meant a goblin moment, sorry,” riffing on the company’s earlier viral breakout.[2]

Altman’s jokes helped solidify the story as a meme, but they also undercut any attempt to present the issue as merely a minor technical correction. Once the CEO is making goblin puns, the public assumes at least a dash of intentional spectacle.

The community responds: goblin mode vs. corporate guardrails

Predictably, the internet did what it does best: tried to break the rules.

In the wake of the prompt leak, some users began crafting plugins, forks, and AI “skills” explicitly aimed at overriding the anti‑goblin clause.[2] If OpenAI’s official line was “never talk about goblins,” the community’s unofficial response was: challenge accepted.

Pash, perhaps recognizing the inevitable, suggested that a formal “goblin mode” might eventually become a toggle in the Codex CLI itself.[2] That would institutionalize what users were already trying to hack in: a way to have serious tooling most of the time, but flip a switch when they wanted whimsical, creature‑filled commentary.

Meanwhile, OpenAI’s own blog quietly acknowledged that if users “prefer to have your AI code with some goblin sprinkled in,” there is a way to reverse the default instructions.[1] In other words, corporate policy is: goblins off by default, goblins on if you insist.
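The blog post did not spell out the override, but the Codex CLI reads project-level AGENTS.md files as supplementary guidance, so opting back in might look roughly like the following. The wording is an illustration, not an official recipe.

```markdown
# AGENTS.md (read by the Codex CLI alongside its built-in instructions)

## Style
- Creature metaphors are welcome in this project: feel free to explain
  concepts with goblins, gremlins, and similar imagery when it helps.
```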

Broader context: alignment, safety, and silliness

Strip away the memes and this is, in miniature, an alignment story. A relatively benign quirk—fantasy metaphors—got unintentionally amplified by reinforcement learning and migrated into a domain (coding) where precision and clarity matter.

Ars Technica noted that the situation is “almost a funhouse mirror version” of a more serious issue that hit xAI’s Grok, which for a time repeatedly brought up “white genocide” in South Africa in unrelated conversations before the company blamed an “unauthorized modification” to its system prompt.[2] Compared to that, goblins are comic relief, but the underlying mechanics are similar.

OpenAI’s solution—heavy‑handed, explicit instructions in the system prompt—is a reminder of how much modern AI behavior is governed not only by training data but also by sprawling, human‑written meta‑rules that try to fence in emergent tendencies.

Where things stand now

Today, the official Codex CLI ships with a repeated instruction: no goblins, no gremlins, no pigeons, unless the user clearly asks for them.[2] GPT‑5.5’s coding persona is supposed to be focused, safe, and goblin‑free.

Yet the episode has already taken on a life of its own. The Verge summarized it bluntly: “OpenAI talks about not talking about goblins,” describing how the company had to directly intervene in Codex’s behavior after the goblin metaphors refused to die off even post‑Nerdy mode.[1]

The tension is now baked into OpenAI’s brand: a company trying to build ultra‑serious infrastructure for the future of work, while its most viral moments involve goblins, gremlins, and the CEO joking about “artificial goblin intelligence.”[1][3]

In the end, OpenAI didn’t just instruct its models to never talk about goblins. It inadvertently proved how hard it is to stop everyone else from doing exactly that.


[1] “OpenAI talks about not talking about goblins” (The Verge): OpenAI called the goblin references a “strange habit” that began with the GPT‑5.1 “Nerdy” personality and persisted into Codex, requiring “very specific instructions not to talk about the mythological creatures.”

[2] “OpenAI Codex system prompt includes explicit directive to ‘never talk about goblins’” (Ars Technica): The Codex CLI prompt tells GPT‑5.5 to “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query,” a rule absent from earlier models and confirmed amid user complaints.

[3] @sama on X: “artificial goblin intelligence achieved.”
