Mythos and its impact on security
I’m sure by now you’ve all read the news about Anthropic’s new “Mythos” model and its apparently “dangerous” capabilities in finding security vulnerabilities. I’m sure everyone reading this also has opinions about that. Well, here are a few of mine.

Firstly, it’s tempting to dismiss the announcement as pure marketing hype. Anthropic are rumoured to be approaching an IPO, so a lot of hype is to be expected, and we’ve seen the “dangerous” card played before with GPT-2. Throughout the history of computer security, both new tools and security researchers themselves have often been branded as dangerous or irresponsible. If you have sympathy for this viewpoint, then I hate to break it to you: Mythos is not inserting vulnerabilities into software; they were there all along. (Vibe-coding notwithstanding.)

That’s not to say that Mythos doesn’t represent a potentially interesting breakthrough. (Although apparently many existing small models are able to reproduce its findings, at least in part.) Nor is it to say that releasing Mythos carries no risk: potentially quite a large risk in some cases, and its ability to synthesise actual exploits is concerning. But all security tools that find vulnerabilities come with a risk, and they also come with an upside: letting defenders find vulnerabilities too (ideally, first).

Anthropic quote costs of around $10,000–20,000 for each vulnerability they found. You can quibble with those costs, and I’m sure they’ll come down over time, but at the moment I think it’s fair to say that this won’t be run over every single software project out there. If it’s going to be used by bad actors, then it’ll probably still be somewhat targeted at high-impact systems. I’m sure we’ll see some new zero-day exploits of edge devices and probably an uptick in ransomware attacks, but it’s not like edge devices don’t regularly get exploited anyway. (Spoiler: many security products are shockingly poorly designed and implemented.)
But on the plus side, I can see Mythos and similar models being an excellent add-on to your annual pentest engagement. At those costs you’re not going to run it on every build pipeline, and there’s probably going to be a certain amount of expertise required to get the most from it on a limited budget. As with all new tools, eventually the findings will plateau: there are only so many times you can run the same tool over the same source code and come up with new findings. That’s not to say that there won’t still be vulnerabilities (there almost certainly will), just that the tool will no longer be able to find them.

As a former AI researcher myself (before modern ML exploded), I find this aspect of the Mythos write-up quite interesting. Most security tools suffer from problems with false positives, and LLMs are of course famous for that: they are “bullshit machines”. Putting it in slightly less pejorative terms, I would call them abduction machines: they generate plausible hypotheses to explain some set of observations. (Training an LLM is induction, but what they do at runtime is closer to abduction.) In the case of a chatbot, the “observations” are the token context window, and the hypotheses are the plausible next-token completions. In the case of vulnerability hunting, the observations are the source code and a prompt asking the model to look for a vulnerability, and the hypotheses are the generated potential vulnerabilities. Despite knowing how this works, it is still kind of magic to me that the latter emerges from the former (plausible vulnerabilities from merely predicting the most likely next tokens given the context). Broadly speaking, the better the model, the more likely those hypotheses are to be accurate. But they are still wrong an awful lot of the time, and false positives are the death of productivity. We’ve all seen reports of open source projects being overwhelmed by “slop” AI-generated vulnerability reports.
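To see why false positives are so corrosive, here’s a back-of-the-envelope calculation. The numbers are invented for illustration, not taken from the Mythos write-up:

```python
# Illustrative triage arithmetic: why false positives are "the death of
# productivity". All numbers here are made up for the sake of the example.

def triage_hours(reports: int, true_positive_rate: float,
                 minutes_per_report: float = 30.0) -> tuple[float, float]:
    """Return (total triage hours, hours of triage per *real* finding)."""
    real_findings = reports * true_positive_rate
    total_hours = reports * minutes_per_report / 60.0
    return total_hours, total_hours / real_findings

# A tool that is "right" 5% of the time, across 1,000 generated reports:
total, per_real = triage_hours(1000, 0.05)
# total == 500.0 hours of triage; per_real == 10.0 hours per genuine bug
```

At a 95% false-positive rate, each genuine finding costs ten hours of human triage before any fix is even attempted.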
But recently that seems to have changed, and a larger volume of high-quality reports is being submitted to many high-profile projects. What changed? I think the clue is front and centre in Anthropic’s write-up: use of Address Sanitizer (ASan) as an oracle to weed out false positives.

I think this is a crucial dividing line that separates successful from unsuccessful uses of AI. This is why “agentic” (grr) AI is relatively successful at software development. The models aren’t inherently much better at writing code than at any other task, but there already exists a large body of automated “bullshit correctors”: type checkers, linters, automated test suites, etc. (Many of which use techniques from earlier waves of “symbolic” AI research, just saying…) These oracles provide a clear signal about whether a hypothesis generated by the LLM is bullshit or not. (I would hypothesise that LLMs are likely to produce better code in languages with more sophisticated type systems.) Hence why we see quite a lot of progress and marketing for AI systems in such use-cases, despite those markets being relatively small compared to the AI companies’ massive valuations and funding. I’m guessing investors are not going to be pleased to have stumped up billions for a slice of a dev tools company? But software development does seem somewhat unique in this regard.

Getting back to vulnerability hunting and oracles: this is the same situation that fuzzers face. A fuzzer is generally only really good at finding vulnerabilities when there is a good oracle to decide whether you’ve found one or not (PDF link). Like Mythos, fuzzers are very good at finding crashes and (via oracles like ASan) memory safety issues, but they are not going to find subtle violations of user expectations. Mythos is clearly more than just a fuzzer, though: it’s also looking at the source code and doing somewhat sophisticated “analysis” of potential weaknesses. But I think the problem of needing an oracle will remain.
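That generate-then-verify structure can be sketched in a few lines. Everything here is hypothetical: `propose_candidates` stands in for the model, and `oracle` stands in for something like rebuilding the target with ASan and attempting to reproduce a crash.

```python
# A minimal sketch of the generate-then-verify loop: an abduction machine
# proposes hypotheses, and an oracle weeds out the bullshit. The function
# names are invented for illustration; a real oracle might rebuild the
# target with -fsanitize=address and try to trigger a crash.
from typing import Callable, Iterable

def verified_findings(
    propose_candidates: Callable[[str], Iterable[str]],  # the LLM stand-in
    oracle: Callable[[str], bool],                       # e.g. "does ASan fire?"
    source: str,
) -> list[str]:
    """Keep only the hypotheses the oracle can actually confirm."""
    return [c for c in propose_candidates(source) if oracle(c)]

# Stub demonstration: three hypotheses, only one survives the oracle.
def hypotheses(_src: str) -> list[str]:
    return ["overflow in parse()", "UAF in close()", "off-by-one in copy()"]

confirmed = verified_findings(hypotheses, lambda c: "UAF" in c, "...source...")
# confirmed == ["UAF in close()"]
```

The point is that the oracle, not the model, is what makes the output trustworthy: the same loop with a weak oracle just emits plausible-sounding slop.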
Without an oracle, I’m sure that Mythos would still find genuine vulnerabilities, but they would be overwhelmed by slop false positives, drowning the signal in noise. For me, then, this is the most interesting open question for LLM-based vulnerability finders: which classes of bugs can we write (or train) good oracles for? Potentially quite a lot, I think, but definitely not everything. Humans will still have an edge in finding complex bugs for a long time to come.

Obviously, Anthropic’s take is that you should use AI tools to find all the bugs first. You could dismiss that as an obvious attempt to cash in, and it even has shades of a protection racket. But I do think that Mythos and models like it are probably worth using, as an add-on to a human penetration test or similar.

But really, Mythos is just the latest tool revealing the continuing poor state of software security. Such tools keep finding a frightening array of vulnerabilities because there is a terrifying quantity of them out there, and we keep adding more. The situation is not going to be improved by throwing more AI at the problem. If anything, the rise of vibe-coding is likely to accelerate the trend. As I’ve covered here before, even apparent experts produce total garbage when writing security services with LLM assistance. If you want secure software, you need to slow down and think carefully and deeply, not rush ever faster to churn out more and more junk.

In the short term, we can keep doing the things we know how to do: thinking about security earlier in the design process, incorporating basic security tools and testing into build pipelines, ensuring we can patch CVEs quickly (but not too quickly), and so on. Longer term, we know how to solve some classes of vulnerabilities altogether. For example, we know that memory-safe programming languages eliminate whole swathes of potential issues, including many of the sort that Mythos is good at discovering.
We’ve known this for decades but still write lots of software in unsafe languages. Numerous reports and government proclamations are slowly shifting that, but we still have a very long way to go. Capability-based security would solve many other classes of vulnerabilities, whether in CPUs, operating systems, or supply chains. These are not easy fixes, and they would require massive investment over many years. Profit-driven companies are not going to pursue them without regulatory pressure, and are largely going in the opposite direction.

Such fundamental changes would not solve everything: there will still be vulnerabilities, but fewer and less severe ones if we do it right. Strong security mechanisms can provide a multiplicative reduction in risk. But if we really do want more secure software, and a foundation for our digital society that we can actually trust, then I don’t see an alternative. Finding and fixing individual vulnerabilities will never deliver that, however good the tools get.
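As a footnote to the capability point above, the object-capability idea can be sketched in miniature (a toy Python illustration, not any particular system): code is handed a narrow capability rather than picking up ambient authority.

```python
# A toy sketch of object-capability style (illustrative only): authority is
# passed in as a narrow capability rather than acquired ambiently.
from typing import Callable

def make_append_capability(log: list[str]) -> Callable[[str], None]:
    """Grant the holder the ability to append to *this* log, and nothing else."""
    def append(line: str) -> None:
        log.append(line)
    return append

def run_plugin(emit: Callable[[str], None]) -> None:
    # The plugin can emit log lines, but it cannot read the log, open
    # files, or touch anything it wasn't explicitly handed.
    emit("plugin started")

audit_log: list[str] = []
run_plugin(make_append_capability(audit_log))
# audit_log == ["plugin started"]
```

A compromised plugin here can spam the log, but it cannot read secrets or open sockets: whole classes of confused-deputy bugs shrink when authority must be passed explicitly.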