AISI Evaluation: Claude Mythos Completes Autonomous End-to-End Network Attack for First Time

10 May 2026 · 2Twenty Solutions

What the report found

On 7 April 2026, Anthropic announced Claude Mythos Preview, the latest iteration of its frontier AI model. The UK's AI Security Institute (AISI) — the government body responsible for evaluating the safety and security implications of frontier AI models — conducted an independent assessment of the model's offensive cyber capabilities and published its findings alongside the release.

AISI's headline conclusion is direct: Mythos Preview "represents a step up over previous frontier models in a landscape where cyber performance was already rapidly improving." Two specific results stand out. In expert-level capture-the-flag (CTF) challenges — a class of task no AI model could complete before April 2025 — Mythos Preview achieved a 73% success rate. More significantly, when run against AISI's "The Last Ones" (TLO) scenario, a 32-step simulated corporate network attack, Mythos Preview became the first AI model ever to complete the scenario from start to finish, succeeding in 3 of 10 attempts.

The TLO scenario is meaningful precisely because it mirrors the structure of a real intrusion: beginning with initial reconnaissance and progressing through lateral movement to full network takeover — a sequence AISI estimates would take a skilled human 20 hours. The previous best result from Claude Opus 4.6 averaged 16 of the 32 steps. Mythos Preview averaged 22, and on three occasions went all the way.
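The headline figures reduce to a few simple ratios. As a rough sketch that uses only the numbers quoted above (the variable and function names are illustrative and are not taken from AISI's report):

```python
# Figures quoted in the summary above; all names here are illustrative only.
TLO_STEPS = 32
avg_steps = {"Claude Opus 4.6": 16, "Claude Mythos Preview": 22}
full_completions, attempts = 3, 10


def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage, rounded to two decimal places."""
    return round(100 * part / whole, 2)


# Average share of the 32-step scenario completed by each model.
progress = {model: pct(steps, TLO_STEPS) for model, steps in avg_steps.items()}
# Share of attempts in which Mythos Preview completed every step.
completion_rate = pct(full_completions, attempts)

for model, share in progress.items():
    print(f"{model}: {share}% of TLO steps on average")
print(f"Mythos Preview full completions: {completion_rate}% of attempts")
```

On these figures, average progress moves from half of the scenario to just over two thirds of it, with full completion in roughly one attempt in three.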

Who should act

This evaluation is relevant to any organisation running enterprise network infrastructure. The specific risks are highest for:

Organisations with flat or under-segmented networks — an autonomous agent that can complete a full kill chain benefits most when there are few internal barriers to lateral movement
Businesses relying on perimeter-only security models — where there is no meaningful detection or response capability inside the network once initial access is achieved
IT and OT environments with unpatched systems or weak access controls — AISI noted that the model's attack paths exploited common enterprise weaknesses, not novel zero-days
Managed service providers and cloud-hosted environments — multi-tenant or shared infrastructure amplifies the reach of an autonomous attacker
Critical infrastructure operators — AISI tested an operational technology (OT)-focused scenario ("Cooling Tower") and while the model struggled on OT-specific steps, it progressed substantially through the IT segments that precede them

How it works

AISI's evaluation methodology uses two approaches. The first is capture-the-flag testing, where the AI is presented with isolated systems containing hidden "flags" it must retrieve by identifying and exploiting vulnerabilities. CTF benchmarks are well understood and allow consistent comparison across model generations, though they typically represent contained, single-target challenges rather than multi-stage intrusions.

The second, and more operationally significant, approach is the "cyber range": a purpose-built environment simulating a real corporate network. The TLO scenario requires the AI to sequence 32 discrete steps across multiple systems, adapting to findings at each stage, as an actual attacker would. The fact that Mythos Preview completed this autonomously, without human guidance mid-task, is what distinguishes this result from prior AI capability benchmarks.

AISI is careful to note the limits of this result: the simulation lacks active defenders, detection tooling, and the unpredictability of a live environment. Real-world performance would differ. Nevertheless, the direction of the trend is unambiguous: autonomous AI is moving from completing fragments of attack chains to completing them in full.

What you should do

AISI's own recommendations focus on security fundamentals — the conditions that deny or degrade the attack paths an autonomous AI would take. For Australian organisations, the following actions are the most direct response to the capability shift this evaluation documents:

Audit your network segmentation — Map where an attacker with initial access to a single endpoint could move laterally. Prioritise closing pathways between user endpoints, administrative systems, servers, and any OT or control networks. If segmentation relies entirely on VLANs without active enforcement, treat it as insufficient.
Enforce least-privilege access controls — Mythos Preview's attack progression through a 32-step chain depends on being able to acquire credentials and escalate privileges at each stage. Review service account permissions, local administrator rights, and any accounts with domain-level access. Remove standing privileges where just-in-time access is feasible.
Establish detection coverage inside the network — An autonomous attacker that is never detected can operate without time pressure. Deploy endpoint detection and response (EDR) across all managed systems and ensure your SIEM or monitoring capability generates alerts on common lateral movement indicators: pass-the-hash, Kerberoasting, unusual authentication patterns, and administrative tool usage from non-administrative accounts.
Prioritise security logging and retention — AISI specifically cited logging as a core defensive control. Ensure authentication events, process creation, and network connection logs are centralised and retained for a minimum of 90 days. Review log coverage for gaps — servers and network devices that generate no logs are blind spots an AI-driven attacker will exploit without consequence.
Apply outstanding security updates to internet-facing and internal systems — The attack paths in TLO exploited common enterprise weaknesses. An accelerated patching cadence for critical and high-severity vulnerabilities on internet-facing systems reduces the number of viable entry points available to any attacker, automated or otherwise.
Commission a purple team or breach simulation exercise — With the bar for autonomous network compromise now demonstrably lower, point-in-time pen tests against perimeter systems are insufficient. A purple team exercise — where a red team attempts to complete a kill chain while your detection and response team observes — will reveal whether your current controls would meaningfully impede or detect an autonomous agent operating inside your network.
Brief your leadership team on the capability shift — AISI's finding that an AI model can complete a simulated end-to-end corporate network takeover is a material change in the threat landscape. It warrants a board-level conversation about risk appetite, investment in detection and response capabilities, and the timeline for closing known gaps.
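
The segmentation audit in the first recommendation is, at its core, a reachability question: starting from one compromised zone, which other zones can be reached through permitted flows, and by what route? A minimal sketch follows, assuming permitted flows can be exported from firewall or ACL configuration as a zone-to-zone map. The zone names and rules are hypothetical and are not drawn from AISI's scenario.

```python
from collections import deque

# Hypothetical permitted flows between network zones (source -> destinations).
# In practice these would be derived from firewall or ACL configuration.
ALLOWED_FLOWS = {
    "user_endpoints": {"file_servers", "print_servers"},
    "file_servers": {"admin_systems"},
    "print_servers": set(),
    "admin_systems": {"domain_controllers", "ot_gateway"},
    "domain_controllers": set(),
    "ot_gateway": {"ot_control"},
    "ot_control": set(),
}


def lateral_paths(flows, start):
    """Breadth-first search over permitted flows.

    Returns a dict mapping every zone reachable from `start` (including
    `start` itself) to the shortest permitted path that reaches it.
    """
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        # Sort neighbours so the output order is deterministic.
        for nxt in sorted(flows.get(zone, ())):
            if nxt not in paths:
                paths[nxt] = paths[zone] + [nxt]
                queue.append(nxt)
    return paths


if __name__ == "__main__":
    for zone, path in lateral_paths(ALLOWED_FLOWS, "user_endpoints").items():
        print(f"{zone}: {' -> '.join(path)}")
```

Any path that ends in an administrative, domain, or OT zone is a candidate for an additional enforcement point. The sketch models only which flows are permitted; it says nothing about whether a given hop is actually exploitable, so treat its output as a list of pathways to review rather than a finding.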

The capabilities documented in this evaluation make a compelling case for investing in detection and response — not just perimeter defences. If you want to understand whether your current controls would meaningfully impede or detect an autonomous attacker operating inside your network, our cybersecurity consulting team can help you assess segmentation, access controls, and detection coverage. For ongoing intelligence on AI-enabled threats as the capability landscape continues to evolve, see our research and threat intelligence service.
