Anthropic pulled the covers off its most capable model yet this week, unveiling Claude Mythos Preview together with Project Glasswing, a carefully scoped initiative that puts the new model into the hands of cybersecurity defenders first. The 7 April rollout is unusual: Anthropic is holding Mythos back from general release and instead routing it through partners working on the world’s most critical software.
Mythos performs strongly across standard benchmarks, but its most striking capability is in computer security. Over the past several weeks, Anthropic says it used Mythos Preview to autonomously identify thousands of previously unknown vulnerabilities across every major operating system and browser, including a 17-year-old remote code execution flaw in FreeBSD. Through Project Glasswing, the model is being made available to a small group of partners including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft and Nvidia so they can shore up critical defences before models with similar capabilities become widely available.
The announcement matters because it reframes how a frontier lab can ship a model that is plainly too dangerous to release openly. Rather than sitting on the work or relying solely on refusals, Anthropic is giving defenders a head start. For enterprises, it also signals a new reality: the bar for offensive cyber capability in commercial AI is climbing fast, and the window to patch long-standing vulnerabilities is shrinking.
This is not just a product launch; it is a new playbook for shipping dangerous capabilities. For years, the AI safety debate has lived at the two extremes: release everything and let the community react, or lock everything down and hope nobody else catches up. Anthropic has carved out a third path with Glasswing. By routing Mythos to a vetted group of defenders running some of the world’s most critical infrastructure, the company is turning a “too dangerous to ship” moment into an offensive security exercise that favours the good guys. It is the most concrete example so far of what “responsible scaling” looks like when the risk is no longer theoretical.
The commercial and regulatory implications are enormous. Enterprises that assumed they had years to patch long-standing vulnerabilities now know that a frontier model can find thousands of zero-days on its own. Defenders with Glasswing access get early warning; everyone else is suddenly on a shorter clock. Expect CISOs, regulators and insurers to start asking hard questions about whether their suppliers are inside Glasswing or its equivalents, and whether mandatory disclosure rules should follow this kind of discovery at scale.
The bottom line: Mythos is a reminder that frontier capability no longer arrives with a press release and an API. The real question for 2026 is not who has the most powerful model, but who gets to use it first, under what conditions, and with what obligations to the rest of us.
The Great AI Capability Split

This week made it clear that the AI industry is splitting into two parallel races. On one side, closed frontier labs like Anthropic and OpenAI are spending tens of billions on compute and choosing who gets to use their most dangerous capabilities first. On the other, Chinese and open-weight players like Z.AI are closing the gap on benchmark performance and shipping models anyone can run. Everything from the Anthropic-Google-Broadcom deal to GLM-5.1’s SWE-Bench Pro win and the Glasswing launch points at the same reality.
The signal for enterprise leaders: raw capability is no longer the scarce resource. Trusted, well-governed deployment is. The companies that figure out how to use frontier models safely, inside proper permissioning and audit frameworks, will pull ahead faster than those still chasing the biggest benchmark number.
Chinese lab Z.AI released GLM-5.1 on 7 April, a 754-billion-parameter Mixture-of-Experts model published under the MIT licence with a 200K context window. GLM-5.1 topped SWE-Bench Pro at 58.4, narrowly beating GPT-5.4 (57.7), Claude Opus 4.6 (57.3) and Gemini 3.1 Pro (54.2). It is the first open-weight model to lead the most demanding software engineering benchmark.
GLM-5.1 was built explicitly for long-horizon autonomous execution. Z.AI reports the model sustained up to 8 hours of independent software engineering work across thousands of tool calls in internal tests. Combined with permissive licensing, this makes GLM-5.1 the strongest argument yet that open-weight models can match or beat frontier closed models for the hardest enterprise coding and agent tasks.
Read the coverage →