West Coast Stat Views (on Observational Epidemiology and more): Anthropic's Mythos: One moderate step forward for white-hat hacking; one giant leap forward for journalists' credulity

Wednesday, April 29, 2026

Anthropic's Mythos: One moderate step forward for white-hat hacking; one giant leap forward for journalists' credulity

[I kid, of course. That particular shark was jumped ages ago.]

You should probably approach any story of large language models displaying initiative, or trying to mislead or blackmail users, or generally doing anything of the sort with the same mindset you approach accounts of paranormal activity. In both cases, virtually all the reporting will be sensationalistic, anecdotal, and likely to collapse under scrutiny.

Example du jour, Anthropic is getting an enormous amount of sky-is-falling coverage over what appears to be the development of a good but hardly revolutionary white-hat hacking tool.

Here's Gary Marcus's assessment:

To a certain degree, I feel that we were played. The demo was definitely proof of concept that we need to get our regulatory and technical house in order, but not the immediate threat the media and public was lead to believe.
Not only has the reporting been credulous and incurious, it has largely ignored the ever-present elephants in the room when discussing OpenAI, Anthropic, etc.

Cal Newport follows up:

Since Marcus published his essay, I’ve come across several more similar findings:

The AI security expert Stanislav Fort ran an experiment to see if existing, cheap open-weight models could find the same vulnerability in FreeBSD (an open-source operating system) that Anthropic touted as evidence of Mythos’s scary abilities to uncover bugs that had been hiding for decades. The result: all eight existing models they tested discovered the same issue.
Meanwhile, the renowned security researcher Bruce Schneier weighed in, similarly concluding: “You don’t need Mythos to find the vulnerabilities they found.”

And of course, it doesn’t help that a week before Anthropic released this supposedly super-powered vulnerability detector, they accidentally leaked the Claude Code source, and security researchers immediately found serious vulnerabilities. (I guess Anthropic forgot to use Mythos to clean up their own software…)

Journalists covering this story need to constantly remind themselves that hundreds of billions of dollars, possibly even trillions, are at play here. What's more, the constant flow of funding that keeps this game going appears to be drying up, making this the highest-stakes game of musical chairs ever played. One of the key motivators that has kept the music going this long has been the carefully promoted belief that the end of the world is possibly days away and the only thing that can save us is if the good wizard discovers the incantation before the bad wizard does (at the risk of putting too fine a point on it, the bad wizard here is China).

Software developer Carl Brown of the Internet of Bugs has a good take. In particular, pay close attention to the part about Responsible Disclosure.

Brown got on my radar through this excellent discussion with Ed Zitron, Over an hour but well worth the time.

West Coast Stat Views (on Observational Epidemiology and more)

Wednesday, April 29, 2026

Anthropic's Mythos: One moderate step forward for white-hat hacking; one giant leap forward for journalists' credulity

No comments:

Post a Comment