Lessons from the Calif "MAD Bugs" Series

2026 May 11
by Ferenc Schulcz
claude mythos
0 day
mad bugs
llm
ai
defense

Just a few weeks ago, we lived in a world where a robust security architecture, regular testing and semi-regular patching provided decent protection for most companies. Then large language models (LLMs) started to generate working exploits for freshly patched vulnerabilities, and even for 0-days they had found themselves. The Month of AI-Discovered Bugs (MAD Bugs) series by Calif is worth reading - they were among the first to document the powerful new capabilities of LLMs, especially Claude. What are the consequences of these changes?

The past seems peaceful now

There are several ways an attacker may compromise a system:

Exploiting misconfigured software, weak passwords, unsuspecting users and so on.
Exploiting outdated, vulnerable software.
Exploiting 0-day bugs in software otherwise believed to be secure.
Combining any of the steps above in complex attack chains.

Most organizations did not need to worry about 0-days. Identifying such vulnerabilities takes experts a significant amount of time, and attackers know that every use of a 0-day risks someone discovering both the attack and the vulnerability. Thus, 0-days quickly become obsolete, offering only a few opportunities. These factors previously made them rather expensive, leaving them in the toolkit of advanced persistent threats (APTs) - teams with abundant resources and extensive expertise. APTs, however, do not usually target small-scale victims; they go after the most valuable systems, such as critical infrastructure.

To protect against more commonplace attacks, experts developed a comprehensive approach over time: apply security patches at least every few weeks, maintain a secure architecture, raise awareness in your team, embrace a safety culture, and have independent professionals test your infrastructure. This is the best practice across countries, industries and standards.

These certainties of yesterday seem to collapse with the rise of powerful LLMs.

MAD exploits arriving

Calif has shown that Claude Mythos could find 0-day vulnerabilities in popular software like Vim and Emacs, and even provide a working exploit that achieves remote code execution (RCE). In some other cases, Claude could find a new bug but required human guidance to turn it into a working exploit.

In one scenario, after reporting a discovered vulnerability in nginx, the Calif team noticed one more strange event: another AI-driven service was apparently watching commits in the nginx repository, and it produced another working exploit based on the fix within hours.

It is certain that this technology is already used by attackers in the wild.

The consequences seem to be of fundamental importance. The window to obtain, test and deploy a security patch before the first exploit appears has collapsed from days to a few hours. More crucially, the cost of 0-days has fallen immensely. It now makes sense for criminals to AI-generate an attack chain of 0-days and compromise as many systems as possible before being detected. Even if they are detected, it won’t entail any significant cost for the attackers: they’ll simply generate another 0-day exploit and use it to launch a widespread attack on the systems within their sight.

How can defense keep up?

AI-driven 0-days and a collapsed patch window surely sound frightening. But take the term "AI-driven" away, and you just get the old reality of APT targets. At Ukatemi, we work with numerous organizations in finance, the electric power industry and critical infrastructure, which are all aware they are prime candidates for sophisticated, long-term intrusion attempts. They also have great defenses in place to combat this risk.

The scary new reality that Calif's findings describe is essentially the democratization of APT-level capabilities. What was once the exclusive domain of well-funded nation-states with teams of human researchers is now available to anyone with a powerful LLM and a clever prompt.

The blueprint for survival, therefore, already exists. Organizations that have successfully weathered the APT storm don’t rely on a single firewall or a patch-when-we-can philosophy. Instead, they lean into a battle-tested stack:

Defense-in-depth: If an attacker's AI finds a 0-day in your web server, does it grant them the keys to the kingdom? A resilient architecture ensures that a single breach is contained via strict micro-segmentation and robust Identity and Access Management (IAM) policies. Organizations successfully deploying defense-in-depth usually set up multiple layers of protection around and inside their systems, each designed to completely block attacks against what it covers.
Assumed breach mindset: This involves high-fidelity Intrusion Detection Systems (IDS) and continuous monitoring that look for the behavior of an attacker rather than just the signature of a known exploit.
Professional incident response: When the patch window collapses from days to hours, your incident response team needs to be a well-oiled machine, capable of isolating systems at the first sign of an anomaly. It is worth noting that AI can also be used to flood your IDS with false positive alerts, so your team must be able to quickly assess and discard such incident reports.
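
To make the containment idea behind micro-segmentation more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any real deployment: the segment names, the two example policies and the `blast_radius` helper are all hypothetical. It treats the allowed network flows as a directed graph and computes the worst-case set of segments an attacker could reach from one compromised segment, assuming every allowed hop can eventually be exploited.

```python
from collections import deque

# Hypothetical segments and allowed flows (source -> destinations).
# Anything not listed is assumed to be denied by default.
FLAT_NETWORK = {
    "web": {"app", "db", "backup", "admin"},
    "app": {"web", "db", "backup", "admin"},
    "db": {"web", "app", "backup", "admin"},
    "backup": {"web", "app", "db", "admin"},
    "admin": {"web", "app", "db", "backup"},
}

SEGMENTED_NETWORK = {
    "web": {"app"},      # the web tier may only talk to the app tier
    "app": {"db"},       # the app tier may only talk to the database
    "db": set(),         # the database initiates no connections
    "backup": set(),     # backups initiate no connections either
    "admin": {"web", "app", "db", "backup"},
}


def blast_radius(allowed_flows, compromised):
    """Return every segment reachable from `compromised` via allowed flows.

    This is a worst-case estimate: it assumes the attacker can exploit
    every segment they are allowed to connect to.
    """
    reachable = {compromised}
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for target in allowed_flows.get(current, set()):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return reachable


if __name__ == "__main__":
    for name, flows in [("flat", FLAT_NETWORK), ("segmented", SEGMENTED_NETWORK)]:
        print(name, sorted(blast_radius(flows, "web")))
```

In the flat example a compromised web server can reach every segment, while in the segmented one the backup and admin segments stay out of reach, and each remaining hop requires a further successful exploit that monitoring has a chance to catch.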

For most small to mid-sized organizations, the hesitation to adopt an APT-level security posture hasn't been a matter of negligence, but of simple economics. Building an in-house Security Operations Center (SOC), maintaining 24/7 incident response teams, and re-engineering legacy architecture for micro-segmentation carries a staggering overhead. Historically, the cost of a breach was a theoretical risk that often looked cheaper than the cost of prevention.

However, the AI-driven collapse of the patch window flips this logic. When the barrier to entry for sophisticated attacks drops, the frequency of those attacks rises, making the theoretical risk an eventual certainty. Fortunately, the solution doesn't have to be a multi-million-dollar internal hiring spree. By contracting independent experts and specialized managed service providers, smaller firms can rent the sophisticated defense-in-depth expertise they can't afford to build. Leveraging external specialists like Ukatemi for targeted architectural audits and rapid-response retainers offers a middle ground: achieving high-tier resilience at a fraction of the cost of a full-scale enterprise security department.
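
The economic argument above can be sketched with the standard annualized loss expectancy formula (expected yearly loss = cost of a single incident x expected incidents per year). The Python snippet below is only an illustration: every number in it is made up for the example and does not come from Calif, Ukatemi or any real organization, and the function and constant names are hypothetical.

```python
def annualized_loss_expectancy(single_loss, annual_rate):
    """ALE = single loss expectancy (SLE) x annualized rate of occurrence (ARO)."""
    return single_loss * annual_rate


# Illustrative figures only, not taken from any real organization.
BREACH_COST = 2_000_000   # hypothetical cost of one serious breach
DEFENSE_COST = 150_000    # hypothetical yearly cost of external experts

# Hypothetical attack frequencies before and after 0-days became cheap.
SCENARIOS = [("expensive 0-days", 0.02), ("cheap 0-days", 0.5)]

for label, rate in SCENARIOS:
    ale = annualized_loss_expectancy(BREACH_COST, rate)
    verdict = "defense pays off" if ale > DEFENSE_COST else "risk looks cheaper"
    print(f"{label}: expected yearly loss {ale:,.0f} -> {verdict}")
```

With these assumed inputs, the same defense budget that looked excessive when sophisticated attacks were rare becomes the cheaper option once their frequency rises; the real thresholds depend entirely on an organization's own figures.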

Ultimately, we are still in the early, unpredictable chapters of the AI security era. It remains to be seen exactly how intense these machine-driven attack campaigns will become, which sectors will be targeted first, or how the arms race between autonomous exploits and future AI-augmented defenses (such as secure software development lifecycles, automated code auditing, and intelligent intrusion prevention) will balance out. What is no longer up for debate, however, is that the era of treating almost-APT-level security as an optional luxury is over. Organizations can no longer afford to wait for the dust to settle; they must proactively adopt a defensive architecture centered on defense-in-depth, continuous monitoring, and rapid incident response capabilities. The speed of the machine is already here, and the only viable response is to build a fortress that can withstand it.

Want to message us? Contact us: blog@ukatemi.com
