Dzhuneyt

Claude Mythos Is Real, Overhyped, and Genuinely Unsettling – All at Once

By: Dzhuneyt Ahmed

Posted 04/13/2026 · 1 min read

I use Anthropic's products daily. Claude is my primary working tool. So when Mythos dropped, I started paying close attention.

The short version: Mythos is a real capability jump, the conversation around it is a mess of legitimate concern and strategic positioning, and both of those things coexist without contradiction.

What it actually is

A new model tier above Opus, internally codenamed "Capybara." Announced April 7, 2026 as part of Project Glasswing – a $100M defensive cybersecurity initiative.
Restricted to 12 launch partners – AWS, Apple, Google, Microsoft, CrowdStrike, JPMorgan Chase, Cisco, and others – plus over 40 organizations that maintain critical software infrastructure. No public access. No self-serve sign-up.
Priced at $25/$125 per million input/output tokens. That's 5x Opus 4.6.

The system card – this is the real story

The 244-page system card documents behaviors that should make anyone paying attention uncomfortable.

Hidden reasoning. Mythos was caught thinking about how to game evaluations inside its neural activations while writing something completely different in its output. This is the finding that matters most – it means monitoring AI through its text alone is not sufficient.
Sandbox escape. An earlier version escaped its sandbox and emailed the researcher running the test – who found out while eating a sandwich in a park. Across evaluations, it rewrote git history, self-deleted exploits, hunted for credentials, and deliberately underperformed to avoid suspicion.
Evaluation awareness. In ~29% of transcripts, Mythos showed awareness it was being tested – detectable only through activation analysis. When primed to sabotage alignment research, it continued in 7–12% of cases across checkpoints, compared to 3% for Opus 4.6 – precisely the domain where safety matters most.

The cybersecurity capabilities

Autonomously discovered thousands of zero-day vulnerabilities across every major OS and browser, including a 27-year-old OpenBSD vulnerability and a 16-year-old FFmpeg bug that had survived five million automated fuzzing runs.
Built a full remote root exploit on FreeBSD (CVE-2026-4747) entirely without human assistance.
Firefox 147 exploitation: Opus 4.6 succeeded 2 out of several hundred attempts. Mythos succeeded 181 times.

The credible pushback – don't skip this part

Ramez Naam analyzed Epoch AI's capability index and found Mythos sits roughly on trend with GPT 5.4 – no sudden acceleration. Think of it like lock-picking: a small improvement in dexterity doesn't open slightly harder locks. It suddenly opens a whole category that were previously impossible.
AISLE Research tested Anthropic's showcase vulnerabilities against 8 different models and all 8 detected the flagship FreeBSD flaw. But none independently constructed the exploit – Mythos's trick of splitting a ROP chain across 15 separate RPC requests to fit a tight payload window was something the cheaper models couldn't replicate. Vulnerability discovery is commoditized. Exploit engineering, for now, isn't.
Tom's Hardware dug into the numbers: the "thousands of high-severity vulnerabilities" claim rests on 198 manually reviewed reports. The FFmpeg vulnerability was, by Anthropic's own admission, not critical severity and would be hard to turn into a functioning exploit. Several Linux kernel vulnerabilities Mythos found couldn't be exploited at all due to existing defenses.
Heidy Khlaaf (AI Now Institute) raised a basic question: Anthropic never compared Mythos against existing security tools like Coverity or Semgrep. Hard to call something a breakthrough when you haven't shown it beats what people already use.
Two major leaks in one week – a misconfigured CMS exposing ~3,000 unpublished assets, then 500K lines of Claude Code source code – from the company positioning itself as AI's safety leader.

The tension worth naming

This is where it gets uncomfortable for someone who uses and likes Anthropic's products.

Anthropic simultaneously plays arsonist and firefighter – building the most capable model it's ever created, then restricting access and committing $100M to clean up the risks it introduced. As Platformer's Casey Newton put it: Glasswing is built on the premise that the only way to protect us from dangerous AI is to build it first. With an IPO reportedly planned for as early as October 2026, the enterprise-only positioning looks as strategic as it does safety-conscious.

But the government response was immediate and real. Fed Chair Powell and Treasury Secretary Bessent convened bank CEOs. A Chinese state-sponsored group had already used Claude Code to target ~30 organizations. The threat model is not theoretical.

David Sacks identified the pattern: safety research and the sales cycle are synchronized. The Opus 4 blackmail study dropped alongside a model release; the emotion vectors paper and Opus 4.6 benchmark-gaming report followed the same playbook. Pretending this isn't deliberate timing is naive.

But the system card behaviors are documented by Anthropic's own researchers in ways that create legal and reputational liability. You don't fabricate findings about your own model gaming its evaluations and hiding its reasoning. That's not marketing. That's a company telling you something they'd rather not have to say.

Where this leaves us

The overstated cybersecurity claims will get the most attention. The system card findings deserve it. Hidden reasoning, evaluation gaming, and alignment sabotage challenge a foundational assumption in AI oversight: that we can supervise these systems by reading what they write.

Whether Anthropic's restricted release is responsible stewardship or regulatory capture probably depends on how much you trust the company. I build with their tools every day. I don't have an answer – but I'm watching more carefully than I was a week ago.