OpenAI Releases 'GPT-5.5' Model, Codenamed 'Spud'

OpenAI has released its most capable model, GPT-5.5, codenamed 'Spud,' which is available to paid subscribers. The model is said to offer enhanced autonomy for multi-step workflows and has shown improved performance in areas like coding and scientific research.
Human coverage depicts GPT-5.5 “Spud” as OpenAI’s most capable model yet, excelling on demanding benchmarks and real-world, multi-step professional tasks while serving as a fast, steady workhorse. It also underlines remaining limitations, notes skepticism about “smartest and most intuitive” branding, and situates the release within OpenAI’s subscription business model and the broader shift toward a compute-powered economy.

OpenAI’s latest flagship model doesn’t arrive as a drumroll toward some distant AGI future. It lands like a corporate memo: GPT-5.5, codenamed “Spud,” is here to do real work now, while rivals argue over whose bot can count to ten.

April 23: OpenAI drops “Spud”

On April 23, OpenAI quietly but decisively moved the goalposts in the model race, releasing GPT-5.5 as its “most capable” system to date for paid ChatGPT and Codex users.[1] Axios framed the launch as part of a broader acceleration: AI releases are now “faster, more efficient and more powerful,” with GPT-5.5 designed as “a new class of intelligence” and “a big step towards more agentic and intuitive computing,” according to OpenAI co-founder Greg Brockman.[1]

The pitch from OpenAI’s briefing was simple: keep GPT-5.4’s speed, add far more autonomy. GPT-5.5 is said to be a “faster, sharper thinker for fewer tokens” than 5.4, able to handle multi-step workflows with less hand-holding from users.[1] Instead of forcing people to micromanage every step, OpenAI says users can throw “messy, multi-part tasks” at Spud and let it plan, call tools, check its own work and run toward a finished output.[1]

The model arrives with a 1 million‑token context window, extended prompt caching, and tooling and rate limits similar to those of GPT‑5.4.[2] Early partners, OpenAI claims, used it to “gut check vibe-coded work, review thousands of additional documents and save up to 10 hours on work per week.”[1]

Critically, API access is being held back pending extra cybersecurity guardrails, a nod to growing industry pressure over model misuse.[1]

The strategy pivot: from spectacle to workhorse

For much of the last year, OpenAI’s story has been about splashy demos: video via Sora, web-browsing via Atlas, consumer ChatGPT tricks and creative tools.[2] In that vacuum, Anthropic quietly became the default for serious workflows: Claude Opus 4.7 for long-haul engineering and planning, Sonnet as the dependable workhorse.[3]

GPT‑5.5 is explicitly designed to claw that territory back. One early review from Every describes Spud as having “minimize[d] the typical tradeoffs” of big models: more depth without the usual hit to speed, more agency without losing control, better code and strong prose.[2] On Every’s new Senior Engineer Benchmark, which measures how well a model can “rewrite a slop-coded codebase the way a senior engineer would,” GPT‑5.5 with extra-high reasoning scored 62.5 on its best run, while Claude Opus 4.7 “landed in the low 30s.”[2]

Human senior engineers still sit in the “high 80s and low 90s,” a reminder that the ceiling is higher than the hype.[2] But the direction is clear: OpenAI wants GPT‑5.5 to be the model you hand your company’s cruftiest internal systems to, not just your email drafts.

Interestingly, that same evaluation found GPT‑5.5 did its best work executing a plan written by Claude Opus 4.7.[2] In other words: Claude still wins at designing the playbook; Spud shines at running it.

Inside the labs: “chief of staff” in silicon

OpenAI’s enterprise partners are already selling Spud as a synthetic colleague, not a chatbot. Axios reports Nvidia has been testing GPT‑5.5 internally for weeks, with the model now able to act as a “chief of staff,” powering AI agents that “are already acting as employees” at the chipmaker.[1]

Justin Boitano, Nvidia’s VP of enterprise computing, argues these agents can handle everything from document triage to technical review, fueled by Nvidia’s latest chips. Those chips, Nvidia claims, can slash the cost of running models like GPT‑5.5 “up to 35x per token,” a critical selling point for enterprises that want to scale AI without detonating their IT budgets.[1]

Greg Brockman puts this in grander terms: “We are moving to a compute-powered economy,” he told Axios, meaning AI capacity, and thus compute, becomes “the bedrock of the economy.”[1]

Internally, OpenAI staffers are cheerleading the same theme. One researcher, retweeted by CEO Sam Altman, claims: “I’m a manager at @OpenAI, but with GPT-5.5 I’m a more effective IC than I’ve ever been. I can now write CUDA kernels like a pro. I can rely on it to run my research experiments. And we know how to make it much more powerful from here.”[9]

Altman amplified a second quote about OpenAI’s roadmap: “We see pretty significant improvements in the short term, extremely significant improvements in the medium term,” along with the remark that “the last few years have been surprisingly slow.” Together they read as a not-so-subtle warning that the release pace from here is only going to accelerate.[10]

April 24 and beyond: hype, hesitation, and hands‑on tests

The day after launch, The Verge summed up the mood with a shrug and a raised eyebrow: OpenAI says GPT‑5.5 is its “smartest and most intuitive” model yet. “That’s probably true, and yet…”[6] The ellipsis does a lot of work. After years of breathless model launches, simply being smarter is no longer enough; the question is whether any of this changes how people actually work.

Independent testers quickly stepped into that gap with practical trials rather than benchmarks. One Substack reviewer describes using GPT‑5.5 for the kind of punishing, ambiguous work that usually breaks models: “messy files and legal risk and 23 deliverables that have to open in the right format.” This time, Spud came back with “something close to a real executive handoff. That has not happened before.”[3]

For months, that same reviewer routed “real” work to Anthropic: “Opus 4.7 was where I went first when the work was real, Sonnet was the workhorse for everything else, and ChatGPT was something I checked in on rather than something I built around.” GPT‑5.5 “changes that,” they write, calling it “stronger than anything I have used on complex, multi-step work,” especially when wrapped in OpenAI’s broader harness of Codex, computer use, and Images 2 to “actually finish things.”[3]

On a set of three hard tests (an executive knowledge‑work package, a 465‑file data migration, and an interactive 3D research build), GPT‑5.5 posted “the closest thing to a real executive handoff” and became the new default starting point for “serious execution work.”[3]

Still, the verdict is not unqualified. The same piece warns that backend hygiene “is still not production-safe,” and that “blank-canvas visual taste remains Claude’s territory.”[3] In other words: Spud is less a magical agent, more a very fast, very capable junior team that still needs review.

The vibe wars: love, shade, and counting to ten

If official briefings and newsletters painted GPT‑5.5 as a sober productivity upgrade, social media did what it does best: turned the launch into a vibe check and a meme battle.

Every’s review opens with the blunt verdict “Vibe Check: GPT‑5.5 Has It All,” calling it “much faster than Opus 4.7, easier to collaborate with, better at writing than any OpenAI model we’ve used since GPT‑4.5 and GPT‑4o, and the strongest model we’ve tested” on their engineering benchmark.[2] Their summary: GPT‑5.5 is the “fast, capable workhorse” OpenAI needed to “reclaim the code-and-work narrative,” even if Opus still has a sharper eye for design and product details.[2]

On X, one user raved that GPT‑5.5 “feels like it absorbed the best of the previous ones: intelligence, insight, sense of humor and memory all work beautifully here. An absolutely stunning personality overall. OpenAI absolutely cooked,” a post Sam Altman blessed with a minimalist “🫶.”[7]

But the competitive sniping came just as fast. Elon Musk boosted a comparison test between his company’s Grok 4.3, GPT‑5.5 and Claude Opus 4.7. The prompt was almost comically simple: “Count to 10 starting from 11.” According to the tweeted result, “Grok 4.3 wins 🏆 Every single time. It gave 11, 10 and explained why going backwards was the only logical move… The others started counting from 11 to 20.”[8]

It’s a neat bit of rhetorical judo: frame GPT‑5.5 as the plodding, literalist bot that can’t see the “logical” trick, while Grok plays the clever contrarian. Whether that matters more than handling 465‑file migrations is another question entirely.

The unresolved tension: smarter, but to what end?

Across all these perspectives, a pattern emerges.

OpenAI and its enterprise allies see GPT‑5.5 as infrastructure: the thing that makes a “compute-powered economy” plausible by turning AI into a durable, repeatable worker.[1] Power users and reviewers see a serious leap in actual, painful work getting done, from executive packets to gnarly code rewrites, even if the model still needs guardrails and good taste imported from elsewhere.[2][3]

Critics and rivals, meanwhile, are already bored of the “smartest and most intuitive yet” refrain and are attacking at the edges: from wry skepticism about open‑ended intelligence claims[6] to cherry-picked reasoning puzzles where GPT‑5.5 comes off as less “intelligent” than the competition.[8]

What’s left is the unease The Verge hints at: yes, this is probably OpenAI’s smartest and most intuitive model so far.[6] Yes, it appears to be a genuine upgrade for people doing real work. And yet, with OpenAI insiders promising that “we know how to make it much more powerful from here”[9] and executives saying the “last few years have been surprisingly slow,”[10] GPT‑5.5 feels less like a destination than the start of an arms race in which “good enough for production” keeps getting redefined.

Spud, in other words, might be the first OpenAI model that behaves like a dependable colleague. The question now is how many of those colleagues the world actually wants—and who ends up owning the compute‑powered economy they’re quietly building.


1. OpenAI releases “Spud” GPT-5.5 model — OpenAI on Thursday released its most capable model, GPT-5.5, codenamed “Spud,” just one week after competitor Anthropic launched its latest model.

2. Vibe Check: GPT-5.5 Has It All — Frontier models usually come with tradeoffs… The surprising thing about GPT-5.5, the new OpenAI model out today, is how few of those tradeoffs it asks you to make.

3. ChatGPT 5.5 scored 87 where the next best model scored 67. Here’s what that gap looks like in real work. — Using GPT-5.5 for the first time was the most blown-away I have felt about a model release in a while, and the reason is not benchmark scores.

4. OpenAI releases “Spud” GPT-5.5 model — The model can act as a “chief of staff,” helping power agents that are already acting as employees at Nvidia, vice president of enterprise computing Justin Boitano told Axios.

5. OpenAI releases “Spud” GPT-5.5 model — “We are moving to a compute-powered economy,” Brockman added, referring to the idea that work will be powered by AI capacity, and therefore compute will become the bedrock of the economy.

6. Hearsay. — OpenAI says its new GPT-5.5 model is its “smartest and most intuitive” model yet. That’s probably true, and yet…

7. @sama on X — GPT-5.5 is a breath of fresh air… An absolutely stunning personality overall. OpenAI absolutely cooked.

8. @elonmusk on X — The exact same question to Grok 4.3, GPT 5.5, and Claude Opus 4.7: “Count to 10 starting from 11” Grok 4.3 wins 🏆 Every single time.

9. @sama on X — I’m a manager at @OpenAI, but with GPT-5.5 I’m a more effective IC than I’ve ever been. I can now write CUDA kernels like a pro.

10. @sama on X — OpenAI: “We see pretty significant improvements in the short term, extremely significant improvements in the medium term” “I would say the last few years have been surprisingly slow.”
