Mythos and Legends

12 May 2026

A recent pastime for me has been reading the reports coming out from Project Glasswing. As you're probably aware by now, that's the scheme where Anthropic is permitting various companies and open source projects to scan their code for security vulnerabilities using the fancy new model called Mythos.

This all kicked off with a detailed post from Anthropic's own security researchers. They suggested that by evolving and improving its general capabilities, Mythos became dramatically better than previous models at locating and stringing together bugs to create working exploits. One of their proudest scalps was a 17-year-old RCE bug in FreeBSD's NFS implementation, and Mythos was apparently also adept at browser exploitation. Despite the overall sombre tone, they still found an opportunity to link to James Mickens' The Night Watch, a piece that deserves to be linked more often.

Recently I've been reading more LinkedIn (I know, I know) and it's been pretty funny watching this play out. First there was the marvelling and FUD concerning the new capabilities, then others came out to trumpet that it was pure marketing. This second group was reaching peak intensity when Mozilla published its results.

271 vulnerabilities identified by Mythos and fixed. They later posted more details about their process and also updated their count to 423 security bugs in April (achieved via various means). Remember that Tor Browser is based on Firefox? Probably a good thing if that one's secure. Thanks, Anthropic.

AISLE came out with a provocative post claiming that they were able to find some of the same vulnerabilities using less capable models. This was interesting, but much of the trick was isolating the relevant buggy part so that the model wouldn't be distracted by everything else. I'll come back to this in a minute.

wolfSSL also made an announcement: 8 new CVEs.

So, is Mythos as good as the hype? On our codebase, yes.

Now curl has had the chance to get checked out by Mythos. Apparently it has found one low-severity vulnerability (not memory safety) and around 20 non-security bugs.

Daniel Stenberg's comments about AI are pretty interesting in general. He's been through a lot in recent months: AI slop on curl's bug bounty program eventually causing him to shut it down, then later reporting that the security reports are getting quite good actually (due to AI), and now the Mythos thing.

He said this about the latest work:

My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing.

I feel he's being modest about the absence of bugs[1], or at least avoiding hubris. Curl is a heavily scrutinised codebase that's already had significant attention from researchers assisted by AI. To me, the more likely explanation is that there's not much left to find. Amusingly, Daniel has a recent blog post stating exactly the opposite, that there are probably still plenty of bugs to come. You can all come and laugh at me in a few months if there's a raft of CVEs that Mythos didn't identify. It would only be fair. I'm still going to make this prediction, however, for two reasons.

The first is Mozilla's experience. They said that they found 22 security bugs with Opus 4.6 and then 271 with Mythos. If the model was only incrementally better at finding vulnerabilities, that's not the kind of result you would expect.

The other is AISLE's finding. If Opus 4.6 (or 4.7) is okay at finding vulnerabilities when targeted at particular vulnerable code, then it's likely that this has already been happening. If there are hundreds of researchers using publicly-available models and pointing them at different parts of curl's code in varying levels of detail, it's logical that they've found much of what Mythos could, except with more effort involved.

As you would expect, all the reports I've seen so far have come from open source projects. They don't mind talking about it. Sure, it's mildly embarrassing when memory safety bugs keep showing up in memory-unsafe languages but we're used to that. Best to just fix them and keep on keeping on.

For the commercial entities that are also part of Glasswing, the calculus is a little different. I expect that few companies want to crow about the number of security bugs they had lurking in their code until Anthropic swept in to help. Still, I hope they find lots and lots of things and fix them all. The big tech companies are responsible for the security and privacy of most people on this planet and it's what their users deserve. Once the vulnerabilities are fixed, I hold out hope that Microsoft will spend some Mythos tokens taking care of the other bugs. A man can dream. And maybe some of them will find the courage to post something publicly.

What's most interesting to me is this: if we have all this neato AI tech that can find vulnerabilities, what's the path to getting this capability running against all my PRs? Will it be expensive? My hope is that this task can be optimised without having to reach for a huge general model like Mythos. Focusing analysis on diff contents will save tokens but sometimes it's the interactions with other parts of the codebase that bite you. Whoever gets this right at an economical price point is probably going to make a lot of money.
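To make the diff-focused idea a little more concrete, here's a rough sketch in Python of the first step such a tool might take. Everything here is hypothetical (the function name, the context size, the whole approach); it just parses a unified diff to work out which line ranges in each changed file a model would need to see, padded with surrounding context so that at least the nearby interactions aren't lost.

```python
import re

# Matches a unified diff hunk header, capturing the start line and
# length of the hunk in the *new* version of the file.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_regions(diff_text, context=20):
    """Map each changed file to a list of (start, end) line ranges in
    the new version, padded with `context` lines on each side so a
    model sees some surrounding code, not just the bare hunk."""
    regions = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            current = line[len("+++ b/"):]
            regions[current] = []
        elif current is not None and (m := HUNK_RE.match(line)):
            start = int(m.group(1))
            length = int(m.group(2) or 1)
            regions[current].append(
                (max(1, start - context), start + length - 1 + context)
            )
    return regions
```

The extracted regions could then be sliced out of the post-merge files and handed to whatever model is doing the review, which is exactly where the trade-off in the paragraph above shows up: a bigger `context` costs more tokens, a smaller one misses more of the cross-file interactions.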


  1. I said as much in a comment on HN. Part of my comment was relayed anonymously to Mastodon. Daniel's response was possibly either roasting me, or riffing on it. I'm just going to assume riffing. ↩︎

Serious Computer Business Blog by Thomas Karpiniec