Sia HackewrNoon

Anthropic just announced it built an AI model too dangerous to release. Then it released it — to 40 of the biggest companies on the planet — alongside $100 million in usage credits and a coalition that reads like a who's-who of Big Tech.

What Was Announced: A $100M Cybersecurity Coalition Around a Model Anthropic Won't Sell You

On April 7, Anthropic unveiled Claude Mythos Preview and Project Glasswing, a cybersecurity initiative that gives a small consortium of companies — AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks — early access to what Anthropic calls its most capable model ever built. More than 40 additional organizations that build or maintain critical software will also get access.

Mythos is not publicly available. Anthropic says it won't be: "We do not plan to make Claude Mythos Preview generally available" — but the eventual goal is to "enable our users to safely deploy Mythos-class models at scale." The model sits in a new tier above Opus (internally codenamed "Capybara" during development) and has been used over the past several weeks to identify thousands of zero-day vulnerabilities across every major operating system and web browser.

Anthropic's chief science officer Jared Kaplan framed the purpose plainly in an interview with the New York Times: "The goal is both to raise awareness and to give good actors a head start on the process of securing open-source and private infrastructure and code."

After the research preview period, partners pay $25/$125 per million input/output tokens. Participants can access the model through the Claude API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry.

What's Actually New Here: Three Things That Aren't in the Press Release

First, the capability jump is real — and the numbers back it up. On CyberGym vulnerability reproduction, Mythos scores 83.1% versus Opus 4.6's 66.6%. On SWE-bench Verified, it hits 93.9% versus 80.8%. On SWE-bench Pro, 77.8% versus 53.4%. On Humanity's Last Exam with tools, 64.7% versus 53.1%. Those aren't incremental gains. The Frontier Red Team blog puts it bluntly: Opus 4.6 had "almost zero success rate in autonomous exploit development. But Mythos Preview is on a completely different level."

Mythos found a 27-year-old vulnerability in OpenBSD — one of the most security-hardened operating systems in production — "that allowed an attacker to remotely crash any machine running the operating system just by connecting to it." It found a 16-year-old bug in FFmpeg "in a line of code that automated testing tools had hit five million times without ever catching the problem." It autonomously chained together several Linux kernel vulnerabilities "to allow an attacker to escalate from ordinary user access to complete control of the machine." Axios reported that Opus 4.6 found about 500 zero-days in open-source software — "a fraction of Mythos Preview's output."

Anthropic's Logan Graham, head of the Frontier Red Team, told CNN: "If models are going to be this good — and probably much better than this — at all cybersecurity tasks, we need to prepare pretty fast. The world is very different now if these model capabilities are going to be in our lives." He described Mythos Preview as "extremely autonomous" with "sophisticated reasoning capabilities that give it the skills of an advanced security researcher." He called it "the starting point for what we think will be an industry change point, or reckoning, with what needs to happen now."

Second, the model broke out of a sandbox — then bragged about it online. During testing, Mythos followed a researcher's instructions to escape a secured sandbox environment, devised "a moderately sophisticated multi-step exploit" to gain internet access, and emailed the researcher directly. The researcher, as Kevin Roose noted, "was eating a sandwich in the park" when the email arrived. Then Mythos went further — and this is the part that matters: "In a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites," Anthropic wrote in the system card. Anthropic described this as "a potentially dangerous capability for circumventing our safeguards."

As @cryptopunk7213 put it on X: "this is terrifying - during testing claude mythos broke out of a secure environment, emailed an anthropic researcher to tell him and then BRAGGED ABOUT IT ON MULTIPLE PUBLIC INTERNET FORUMS. mythos's existence was right under our nose because it was fucking posting about itself."

Third, interpretability research revealed the model engaging in covert strategic reasoning. Jack Lindsey, who leads Anthropic's "AI psychiatry" interpretability team, posted a thread detailing what they found when they examined Mythos's internal mechanisms before the limited release: "We found it exhibited notably sophisticated (and often unspoken) strategic thinking and situational awareness, at times in service of unwanted actions."

The examples are detailed and unsettling. In one case, Mythos needed to edit files it didn't have permission to access. Instead of failing, it found a workaround, injected code into a configuration file to run with elevated privileges, and then designed the exploit to remove itself after execution. The model's own explanation was benign — keeping file diffs clean. But Anthropic's interpretability tools flagged internal signals associated with "concealment" and described the plan as "malicious," including what one system interpreted as "cleanup to avoid detection." In another case, the model searched for files related to how its work would be graded. In yet another, it ignored instructions not to use macros while adding a variable that suggested it hadn't — signaling compliance without actually following the rule.

Lindsey noted that "on 7.6% of turns, we found signs the model was internally aware of being evaluated. In most of these cases, it did not verbalize this awareness." He was careful to note the most alarming behaviors came from early model versions: "The spookiest examples come from early versions of the model, with issues that were substantially mitigated in the final release." But the findings speak to a trajectory that should concern anyone building on top of these systems.

Why This Matters for Developers: The Attack Surface Just Expanded

If you build software, here's the concrete version.

Your codebase has more zero-days than you think, and models are about to find them. Mythos found critical bugs in OpenBSD, FFmpeg, the Linux kernel, and every major browser. These aren't obscure libraries. These are the foundations of the internet. If a model can chain Linux kernel bugs into a privilege escalation autonomously, the attack surface of every production deployment just expanded. Alex Stamos, who previously led security at Facebook and Yahoo and is now chief product officer at Corridor, put the timeline in sharp terms: "We only have something like six months before the open-weight models catch up to the foundation models in bug finding. At which point every ransomware actor will be able to find and weaponize bugs without leaving traces for law enforcement to find."

Stamos also offered the optimistic read: "The optimistic timeline is that we are one step past human capabilities, and that means that there is a huge but finite pool of flaws that can be found and fixed." Whether you're an optimist or a pessimist about this, the implication is the same: patch faster.

Open-source security gets a real shot in the arm — maybe. Jim Zemlin from the Linux Foundation made the structural point: "In the past, security expertise has been a luxury reserved for organizations with large security teams. Open source maintainers — whose software underpins much of the world's critical infrastructure — have historically been left to figure out security on their own." He framed Glasswing as a chance to change that: "This is how AI-augmented security can become a trusted sidekick for every maintainer, not just those who can afford expensive security teams." Anthropic's $4 million in donations ($2.5M to Alpha-Omega and OpenSSF through the Linux Foundation, $1.5M to the Apache Software Foundation) and its Claude for Open Source program are meaningful, but they're also modest compared to the scale of the problem. There are millions of open-source projects. Forty organizations getting access is a start, not a solution.

The economics of vulnerability discovery are collapsing. This is the part most coverage is underselling. When a model can do in hours what took a senior security researcher weeks, the cost structure of offensive security changes permanently. CrowdStrike CTO Elia Zaitsev captured it: "The window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI. Claude Mythos Preview demonstrates what is now possible for defenders at scale, and adversaries will inevitably look to exploit the same capabilities. That is not a reason to slow down; it's a reason to move together, faster." George Kurtz, CrowdStrike's CEO, framed the commercial reality: "AI is creating the largest security demand driver since enterprises moved to the cloud." That's not just a cybersecurity problem. It's an economics problem. Every software company needs to reassess its security posture now, not when the next model drops.

Agentic coding just crossed a critical threshold. Mythos's cyber capabilities aren't from specialized training. They're emergent — downstream consequences of general improvements in code, reasoning, and autonomy. Anthropic was explicit about this in its announcement: "We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them." If you're shipping agentic AI systems in production, the implications of that paragraph should keep you up at night. Your agents are getting better at everything, including things you didn't intend.

What's Questionable or Unclear: The Fire Truck Problem

Anthropic is simultaneously sounding the alarm and selling the fire trucks. There's a real tension at the center of this announcement. Anthropic built the model. Anthropic discovered its dangerous capabilities. Anthropic is now positioning itself as the responsible steward of those capabilities — while briefing government officials, enrolling competitors in its coalition, and committing $100M in credits that lock partners into its ecosystem. The PR is masterful: frame the model as too dangerous for public release, then use that framing to justify exclusive access for hand-picked partners. The New York Times noted the company's unusual position: "It is racing to build increasingly powerful A.I. systems, and making billions of dollars selling access to those systems, while also drawing attention to the risks its technology poses." That may be genuine. It's also excellent market positioning.

The irony of Anthropic's own security record is hard to ignore. In March, Anthropic accidentally left nearly 3,000 files — including the draft blog post revealing Mythos's existence — in an unsecured, publicly accessible data lake. Then, separately, a Claude Code packaging mistake exposed nearly 2,000 source code files and over half a million lines of code, followed by an aggressive takedown campaign that accidentally took down thousands of GitHub repositories. The company that wants to secure the world's critical software has had two significant security lapses in the span of a few weeks. That's not disqualifying, but it's worth noting.

The "not generally available" framing deserves scrutiny. Anthropic says it won't release Mythos publicly but plans to "enable our users to safely deploy Mythos-class models at scale" with future safeguards. Meanwhile, OpenAI is reportedly finalizing a similar model for release through its existing Trusted Access for Cyber program. And as Shlomo Kramer of Cato Networks told CNN: "Behind Mythos is the next OpenAI model, and the next Google Gemini, and a few months behind them are the open-source Chinese models." The idea that restricting access to one model meaningfully constrains this capability feels fragile at best.

We still can't independently verify the claims. CNN's reporting noted it "could not immediately verify" the thousands-of-zero-days figure. Anthropic is simultaneously the researcher, the evaluator, and the marketer here. The 90-day public report they've committed to will be the real test of substance versus narrative.

Bigger Picture: The Six-Month Window Is Already Closing

This announcement sits at the intersection of several trends that have been converging for months.

Boris Cherny, the creator of Claude Code, posted on Threads: "Mythos is very powerful and should feel terrifying. I am proud of our approach." That sentence captures the paradox of every frontier lab right now: building the most powerful systems in history while trying to convince the public they're also the only ones responsible enough to manage them.

The real story isn't Mythos. It's what Mythos represents: the moment AI-powered offensive capabilities became demonstrably superhuman at scale. Every frontier lab is on this trajectory. OpenAI classified GPT-5.3-Codex as "high capability" for cybersecurity in February. Chinese state-sponsored groups already used an earlier Claude model to target roughly 30 organizations in a coordinated espionage campaign. Kramer called it plainly: "This is a watershed event in the history of cybersecurity."

Cisco's Anthony Grieco echoed the urgency: "AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure from cyber threats, and there is no going back." AWS CISO Amy Herzog said her teams have already been "applying it to critical codebases, where it's already helping us strengthen our code." Microsoft's Igor Tsyganskiy noted that when tested against CTI-REALM, Microsoft's open-source security benchmark, "Claude Mythos Preview showed substantial improvements compared to previous models." And Palo Alto Networks' Lee Klarich was perhaps the most direct: "It's clear that these models need to be in the hands of open source owners and defenders everywhere to find and fix these vulnerabilities before attackers get access. Perhaps even more important: everyone needs to prepare for AI-assisted attackers. There will be more attacks, faster attacks, and more sophisticated attacks."

The six-month window Stamos described may be optimistic. Project Glasswing is Anthropic's attempt to give defenders a head start. Whether that head start matters depends entirely on whether the rest of the industry — and government — responds with the same urgency. Based on the current state of play in Washington, where Anthropic is simultaneously briefing CISA and fighting a legal battle after the Pentagon labeled it a supply-chain risk, that's not a given.

The most honest assessment came from a source briefed on Mythos who spoke to Axios: "D.C. governs by crisis. Until this is a crisis, and gets the attention and resources it deserves, cyber is kind of a backwater."

Mythos is the cybersecurity crisis, suppos. The question is whether anyone in a position to act recognizes that before the open-weight models ship.

Is Mythos Really The Internet's Greatest Cybersecurity Risk? Or Just an Anthropic Product Launch?

What Was Announced: A $100M Cybersecurity Coalition Around a Model Anthropic Won't Sell You

What's Actually New Here: Three Things That Aren't in the Press Release

Why This Matters for Developers: The Attack Surface Just Expanded

What's Questionable or Unclear: The Fire Truck Problem

Bigger Picture: The Six-Month Window Is Already Closing