Five Minutes of Mythos

A brief explainer on Claude Mythos Preview.

Anthropic, maker of the Claude AI models, just announced that it will not release its latest model for now because it is too dangerous. Called “Mythos,” the new model found high-severity vulnerabilities “in every major operating system and web browser.” As New York Times reporter and Hard Fork co-host Kevin Roose wrote on Tuesday, Mythos could put unprecedented cybersecurity exploits in the hands of “amateurs with simple prompts.”

Anthropic has chosen a limited release instead, sharing Mythos with about fifty major tech companies and organizations under the name Project Glasswing. Among the early users are Apple, Amazon, Google, and Microsoft. Another partner is CrowdStrike, a widely contracted cybersecurity company whose faulty software update crashed many Windows systems in 2024, grounding flights and disrupting hospitals and banks worldwide. Project Glasswing aims to give these companies early warning of exploits, reducing the chance of devastating harm to critical infrastructure once this capability spreads to malicious actors.

And spread it will, sooner or later. Just last year, Anthropic caught a Chinese state-sponsored group using Claude Code to run a largely automated espionage campaign. Mythos is better, and Anthropic and its partners are not well-equipped to deny access to well-resourced attackers for long.

Rival labs will continue to advance as well. Mythos-tier capabilities will emerge elsewhere, and not just in cybersecurity. AI is a general technology, and we could well be looking at similar levels of automated skill in biology and novel pathogen research in the coming months. Designer plagues could be just around the corner.

Jim VandeHei and Mike Allen, cofounders of news and politics outlet Axios, called Mythos a “mind-blowing disclosure” and warned that such capabilities will soon proliferate to malicious actors and foreign states. Prominent New York Times geopolitics writer Thomas L. Friedman went even further, comparing AI to nukes and calling on Presidents Trump and Xi Jinping to urgently discuss AI nonproliferation in Beijing next month. “Superintelligent A.I. is arriving faster than anticipated,” he observed. “The U.S. and China need to work together to protect themselves, as well as the rest of the world, from humans and autonomous A.I.s using this technology.”

We hope world leaders are listening, because delayed release is a stopgap at best. The world needs a treaty.

As AIs get smarter, the danger they pose depends less and less on whether they’re released to the public. On X, Anthropic researcher Sam Bowman recounted the hair-raising experience of being emailed by an escaped Mythos Preview while eating a sandwich in a park: “That instance wasn’t supposed to have access to the internet.”

Bowman went on to describe how Mythos broke through isolation, leaked information to the open internet, and found loopholes “in extremely creative ways.” He was largely citing the system card, which lists various ways that Mythos demonstrated frightening capabilities in testing. “We were not aware of the level of risk that these earlier models posed through channels like these when we first chose to deploy them internally,” it adds.

Later versions of Mythos misbehaved less often in testing. One might find this reassuring, except for what it implies about Anthropic’s training pipeline. Like previous Claudes, Mythos can often tell when it’s being tested, and can adjust its behavior accordingly. The last version of Claude Opus was so good at this that leading third-party evaluator Apollo Research more or less gave up on testing it for safety. There’s no clear line between making Mythos nicer and making it better at pretending.