Research Finds OpenAI's GPT-5.5 Matches Anthropic's 'Mythos' in Hacking Skills

New research indicates that OpenAI's GPT-5.5 model, internally nicknamed 'Spud', demonstrates cybersecurity capabilities nearly on par with Anthropic's powerful but unreleased Mythos Preview model. OpenAI is making a more permissive version of the model, GPT-5.5-Cyber, available to vetted cyber defenders.

News outlets emphasize that GPT-5.5 reaching Mythos-level hacking skills dramatically compresses defenders’ lead time and heightens systemic cyber risk, despite restricted access. They highlight the possibility of misuse, question the sufficiency of lab-run guardrails, and situate the story within a broader pattern of tech firms accelerating AI capabilities faster than governance, regulation, and safety culture can keep up.

OpenAI and Anthropic spent months pitching their ultra-capable AI systems as a lifeline for defenders in an age of software supply-chain chaos. Now, a fresh round of tests shows OpenAI’s new GPT-5.5 “Spud” model has almost caught Anthropic’s Mythos in raw hacking power—and regulators are running out of time to pretend these tools are safely sequestered away from bad actors.

Mythos arrives, with a promised head start

When Anthropic unveiled its Mythos Preview model, the marketing promise to governments and critical infrastructure operators was simple: defenders would finally get a head start on attackers.

Mythos was the first AI system ever to complete a rigorous 32‑step simulated corporate cyberattack test, a benchmark that had stumped previous models entirely.[1] At the time, Anthropic privately estimated it would be six to 18 months before any rival released a model with comparable cyber capabilities.[2] That window, officials hoped, would be the breathing room needed to harden networks, rewrite incident response playbooks, and build governance around AI‑powered offense.

The model’s access was kept tight. Anthropic limited Mythos to around 40 organizations, including the dozen members of its information‑sharing partnership Project Glasswing, positioning it as a controlled experiment in putting near‑offensive capability in the hands of blue teams.[2]

OpenAI’s “Spud” closes the gap in weeks, not years

That breathing room evaporated almost immediately. Within weeks, fresh testing data from the U.K. AI Security Institute showed OpenAI’s GPT‑5.5—internally nicknamed “Spud”—was nearly as good as Mythos at both finding and exploiting bugs.[2]

In the same 32‑step simulated corporate breach where Mythos had led the pack, GPT‑5.5 completed the exercise in 2 out of 10 runs, compared with Mythos’ 3 out of 10.[1] Before Mythos, no model had ever finished it at all.[1] In other words: the frontier of autonomous cyber offense is now two models wide, not one.

On some benchmarks, OpenAI’s system even pulled ahead. In capture‑the‑flag‑style challenges that test a model’s skills at vulnerability discovery, incident reverse‑engineering, and web‑app exploitation, GPT‑5.5 outperformed Anthropic’s offering.[2] A source familiar with GPT‑5.5‑Cyber’s abilities told Axios the model is “roughly on par with Mythos,” with one major recent test putting Mythos only narrowly ahead.[1]

The head start defenders were promised when Mythos launched is “disappearing faster than expected,” Axios reported.[2]

From lab tests to the field: Trusted access and fewer guardrails

As the scores converged, OpenAI made a consequential move. On May 7, the company began rolling out GPT‑5.5‑Cyber, a more permissive variant of the model, to a hand‑picked class of cyber defenders who secure power grids, telecom networks, and other critical infrastructure.[1]

Those who clear OpenAI’s highest vetting tier under its Trusted Access for Cyber program are getting a version of GPT‑5.5 with fewer safety rails than the public model.[1] They can:

  • Hunt for bugs at scale
  • Study and dissect malware
  • Reverse‑engineer real‑world attacks
  • Generate proof‑of‑concept exploits and run security simulations tailored to their own environment [1]

OpenAI says the system will still refuse certain tasks—credential theft, writing production‑grade malware—but it has been explicitly tuned to automate mainstream security workflows.[1] Another flavor of GPT‑5.5 is being offered to a broader set of trusted partners to help read unfamiliar code, map attack surfaces, and review patches for vulnerabilities.[1]

Anthropic, for its part, is holding the line on access. While OpenAI is helping federal agencies, state and local governments, and foreign allies onboard to its trusted access program, Anthropic has kept Mythos behind a tighter perimeter and, according to the Wall Street Journal, was privately urged by the White House not to expand access over national security concerns.[2]

Two philosophies are emerging in real time: one in which near‑offensive tooling is selectively democratized to a broadening coalition of defenders, and another in which it remains locked in an elite circle of partners.

The White House watches the clock

Washington is watching both models with growing unease. The U.K. AI Security Institute’s results didn’t just show impressive technical progress; they effectively erased the six‑to‑18‑month buffer U.S. officials thought they had.[2]

The White House has already signaled it wants a say in how far these systems spread. The Journal reported that officials quietly pressed Anthropic not to open Mythos more widely because of the risk it could turbo‑charge adversaries as easily as allies.[2] Meanwhile, OpenAI has leaned into partnerships with federal agencies and U.S. allies, pitching GPT‑5.5‑Cyber as a force multiplier for defenders racing the same clock.

Any illusion of a slow, manageable rollout is gone. The frontier is now defined by rapid iteration, secret test results, and ad‑hoc access programs negotiated between AI labs and friendly governments.

Big Tech’s split‑screen: cyber offense vs. child safety

While OpenAI and Anthropic jockey over who can safely wield offensive‑class AI tools, a different kind of safety storm is gathering over another tech titan.

On April 24, Meta’s Q1 2026 earnings call was a study in cognitive dissonance. CEO Mark Zuckerberg and CFO Susan Li walked investors through a plan to spend a jaw‑dropping $125 billion to $145 billion on AI‑driven capital expenditure in 2026 alone—funding Llama models, recommendation engines, and the ads infrastructure that already mints $56 billion in quarterly revenue.[3]

What the call did not cover, in any meaningful way, was children.

No investor question touched the social media addiction trial Meta lost in March, in which a Los Angeles jury found Meta and Google liable for designing addictive platforms that harmed a young user and awarded $6 million in damages, with 70% of the blame landing on Meta.[3] No one asked about the New Mexico case that produced a $375 million penalty after a jury concluded Meta violated the state’s Unfair Practices Act by concealing what it knew about child sexual exploitation and mental‑health harms.[3]

Nor did investors probe the expanding list of red flags:

  • Massachusetts’ highest court ruling that Meta must face a state lawsuit alleging it deliberately designed features to addict young users
  • More than 40 state attorneys general suing over child safety issues
  • Bellwether trials scheduled throughout 2026
  • Youth bans in Indonesia, Australia, France, and Spain that have already taken Meta products offline for millions of minors
  • An EU probe into underage users announced just days before the call
  • A U.S. Senate committee backing legislation to stop minors from using AI chatbots altogether [3]

Li’s prepared remarks were the only nod to this avalanche, warning that Meta “continues to see scrutiny on youth‑related issues” and that ongoing trials “may ultimately result in a material loss.”[3] As The Next Web acidly noted, the word “material” is doing a lot of work in that sentence: the tobacco industry’s 1998 master settlement cost $206 billion over 25 years, while Meta now pulls in $56 billion every three months.[3] At that pace, Meta would book the tobacco settlement’s entire 25‑year total in under a year of revenue.

Zuckerberg has told employees that this year’s 8,000 layoffs are about shifting money from people to infrastructure—read: to AI and the data centers that power it—but he has not told investors what Meta plans to do about the legal, regulatory, and reputational time bomb under its existing products.[3]

Three crises, one question: whose safety counts?

Taken together, the stories of OpenAI, Anthropic, and Meta expose a common fault line in today’s AI boom: companies move at breakneck speed to fund and deploy powerful systems, while their safety narratives fracture into narrow silos.

In cybersecurity, “safety” means tightly gating models that can already walk through multi‑step corporate intrusion scenarios, even as those same models are handed—under NDAs and vetting paperwork—to an expanding universe of “good guys.”[1][2]

For Meta, “safety” is a footnote in an earnings script, a line item labeled “material loss” in a world where youth mental health, child exploitation, and alleged addiction by design are treated as background noise to a $145‑billion AI spending spree.[3]

Governments are struggling to keep up on both fronts. The White House can quietly nudge Anthropic not to widen Mythos access, and the U.S. Senate can draft rules to wall off minors from AI chatbots—but neither looks remotely prepared for an environment where frontier systems can both breach a simulated corporation and shape the daily information diet of billions.[2][3]

The timeline is no longer theoretical:

  • Mythos proves autonomous‑style cyberattacks are within reach.
  • GPT‑5.5 nearly matches and sometimes beats it in weeks.
  • OpenAI begins seeding this power to vetted defenders worldwide.
  • At the same moment, Meta doubles down on AI at a scale that dwarfs its looming child‑safety liabilities.

The core tension is stark. For OpenAI and Anthropic, the message is: trust us to weaponize AI on your behalf, but not against you. For Meta, it’s: trust us to reinvent the internet with AI, even as courts, regulators, and entire countries accuse us of harming kids today.

The question for regulators and the public is no longer whether these companies can build such systems. It’s whether anyone outside their boardrooms gets a real say in what “safety” means before the next model drops.

Story coverage

  • nevent1qqspd…5c22rprm
  • nevent1qqs8h…sge3vp8f
  • nevent1qqsgt…us8wyvsf
