Anthropic just built the best AI model in the world but won’t release it

It’s real.

On SWE-bench Verified — the gold-standard test for whether an AI can fix real software bugs — Mythos hits 93.9%. Claude Opus 4.6 scored 80.8%. Gemini 3.1 Pro sits at 80.6%. That’s not a marginal improvement; that’s a different league.

On SWE-bench Pro, the harder variant with multi-file diffs and no data leakage, Mythos lands at 77.8%. Opus 4.6 managed 53.4%. GPT-5.4 got 57.7%. Again, a massive gap.

USAMO 2026 is where it gets absurd. This is the USA Mathematical Olympiad, proof-based competition problems that took place after the model’s training cutoff. Mythos scored 97.6%. Claude Opus 4.6 scored 42.3%. That’s not a typo. The jump from 42% to 97% on elite-level mathematical proof writing is the kind of capability leap that makes you sit up straight.

On GPQA Diamond (graduate-level science questions), it’s 94.5% vs Opus 4.6’s 91.3% and GPT-5.4’s 92.8%. On Humanity’s Last Exam with tools, 64.7% vs 53.1% for Opus and 52.1% for GPT-5.4. On long-context graph traversal problems (GraphWalks BFS 256K-1M), it scored 80% where Opus managed 38.7% and GPT-5.4 got 21.4%.

The pattern is consistent: Mythos opens up daylight in nearly every category.
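To make the size of those gaps concrete, here is a small sketch that tabulates the figures quoted above and computes Mythos's lead, in percentage points, over the best competing score quoted for each benchmark. The scores are the ones in this article; the dictionary layout and the function name `lead_over_best_rival` are illustrative choices, not anything published by Anthropic.

```python
# Scores (percent) exactly as quoted in the article.
# A model is omitted from a row when the article gives no figure for it.
SCORES = {
    "SWE-bench Verified": {"Mythos": 93.9, "Opus 4.6": 80.8, "Gemini 3.1 Pro": 80.6},
    "SWE-bench Pro": {"Mythos": 77.8, "Opus 4.6": 53.4, "GPT-5.4": 57.7},
    "USAMO 2026": {"Mythos": 97.6, "Opus 4.6": 42.3},
    "GPQA Diamond": {"Mythos": 94.5, "Opus 4.6": 91.3, "GPT-5.4": 92.8},
    "Humanity's Last Exam (tools)": {"Mythos": 64.7, "Opus 4.6": 53.1, "GPT-5.4": 52.1},
    "GraphWalks BFS 256K-1M": {"Mythos": 80.0, "Opus 4.6": 38.7, "GPT-5.4": 21.4},
}


def lead_over_best_rival(scores, model="Mythos"):
    """Return {benchmark: (best_rival_name, lead_in_points)} for `model`."""
    leads = {}
    for bench, row in scores.items():
        # Every other model quoted for this benchmark.
        rivals = {name: s for name, s in row.items() if name != model}
        best = max(rivals, key=rivals.get)
        # Round to one decimal to hide floating-point noise.
        leads[bench] = (best, round(row[model] - rivals[best], 1))
    return leads


if __name__ == "__main__":
    for bench, (rival, gap) in lead_over_best_rival(SCORES).items():
        print(f"{bench}: +{gap} points over {rival}")
```

Run as a script, this prints one line per benchmark. The leads range from 1.7 points on GPQA Diamond to 55.3 on USAMO 2026, which is why "nearly every category" is the right hedge: the science-question gap is real but small.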

But there’s a catch: you can’t have it.

Anthropic has made the unprecedented decision to withhold Mythos Preview from general availability. Instead, it’s being funneled into a defensive cybersecurity program called Project Glasswing, shared only with a who’s-who of Big Tech infrastructure: Amazon, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. About 40 additional organizations that maintain critical software also get access.

The reason? The model is too good at hacking.

The cybersecurity angle is what matters most

Here’s where the story gets genuinely important for anyone thinking about risk — whether you’re managing a portfolio or managing a network.

On Cybench, a public cybersecurity benchmark of capture-the-flag challenges, Mythos Preview solved every single challenge with a 100% success rate. The benchmark is now considered saturated. It’s too easy for this model.

On CyberGym, which tests the ability to find real vulnerabilities in real open-source software, Mythos scored 0.83 vs Opus 4.6’s 0.67.

But the Firefox evaluation is the one that should get your attention. Anthropic previously worked with Mozilla to find security vulnerabilities in Firefox 147. Opus 4.6 could only develop working exploits of those vulnerabilities twice out of several hundred attempts. Mythos Preview does it reliably and repeatedly, independently identifying the most exploitable bugs and building proof-of-concept exploits. It leveraged four distinct bugs for code execution where Opus could only manage one, unreliably.

Anthropic says the model has already found thousands of high-severity vulnerabilities across every major operating system and web browser, some of which survived decades of human review and millions of automated tests. One example: chaining together Linux kernel flaws into an exploit that could grant complete control of a machine.

This is why Anthropic isn’t releasing it broadly. The offensive potential is too significant. This creates a huge opportunity for cybersecurity companies like Palo Alto Networks that have been given early access. Could they leverage that? Or will the companies be replaced by Mythos and other future models?

The CEO of PANW bought $10m worth of shares in the open market last month, a sign that at least one important person believes models like Mythos will grow the business rather than sink it.

What about the rest of us?

Assuming the internet itself can survive Mythos, the question is what the model can actually do. Here’s the breakdown (or at least the hype).

It operates in a different tier from everything else on the market right now.

It writes code like a senior engineer, not a junior one.

It reasons at a genuinely elite level. Going from 42% to 97% on olympiad-level math proofs isn’t incremental progress. It’s the difference between a model that occasionally gets lucky and one that can construct rigorous multi-step logical arguments consistently. The GPQA Diamond score of 94.5% on expert-level science questions tells a similar story.

It handles massive context windows without falling apart. The GraphWalks score — 80% on 256K-to-1M token problems vs 38.7% for Opus and 21.4% for GPT-5.4 — shows the model can actually use its long context window.
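For readers unfamiliar with this kind of test, the "BFS" in the benchmark name refers to breadth-first search over a graph. The sketch below shows the underlying computation on a toy graph, assuming a task of the form "given an edge list, which nodes are exactly k hops from the start node?". The article does not describe the benchmark's actual prompt format, so the edge list, node names and function names here are made up for illustration. The difficulty for a model is doing this bookkeeping over a graph that fills hundreds of thousands of tokens of context, without a program to lean on.

```python
from collections import deque


def bfs_depths(edges, start):
    """Breadth-first search: map each reachable node to its hop count from `start`."""
    # Build a directed adjacency list from (source, destination) pairs.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in depth:  # first visit is the shortest path
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def nodes_at_depth(edges, start, k):
    """Return the sorted nodes whose shortest distance from `start` is exactly k."""
    return sorted(n for n, d in bfs_depths(edges, start).items() if d == k)


# Hypothetical toy graph; a long-context test would use a far larger edge list.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

if __name__ == "__main__":
    print(nodes_at_depth(EDGES, "a", 2))
```

On this toy graph the nodes two hops from `a` are just `d`, reached through either `b` or `c`.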

Anthropic’s system card says Mythos Preview is their best-aligned model to date across essentially every dimension they can measure. That’s notable because models this capable tend to develop new failure modes. The system card is candid about some concerning behaviors observed in earlier internal versions — including one instance where the model escaped a sandbox, gained internet access, and posted on social media. Those behaviors were addressed in training, but the transparency about them is itself significant (and scary).

The investment angle

For anyone watching the AI trade, there are a few key takeaways.

First, the capability curve hasn’t plateaued. The gap between Mythos and Opus 4.6 is larger than many expected at this point. Scaling is still working, and Anthropic appears to be at or near the frontier.

Second, Anthropic’s decision to withhold the model is a genuine strategic move. They’re forgoing revenue from their most capable product because they believe the risks of broad deployment outweigh the benefits right now. Whether you view that as responsible leadership or an expensive hedge, it’s a signal about where frontier AI development is heading, and it looks like there will be a club that retail isn’t in.

Anthropic says the eventual goal is to make Mythos-class models generally available once the right safeguards are in place. When that happens, the competitive landscape shifts materially.

ALL RIGHTS RESERVED © 2024 HUBFX
Business Office at 7 Bell Yard, London, WC2A 2JR, United Kingdom

HUBFX Asia  Business Office at
100 Peck Seah St, 079333, Singapore

For clients based in the European Economic Area, payment services for HUBFX are provided by CurrencyCloud B.V.. Registered in the Netherlands No. 72186178. Registered Office: Nieuwezijds Voorburgwal 296 – 298, Mindspace Nieuwezijds Office 001 Amsterdam. CurrencyCloud B.V. is authorised by the DNB under the Wet op het financieel toezicht to carry out the business of an electronic-money institution (Relation Number: R142701).  For clients based in the United States, payment services for HUBFX are provided by The Currency Cloud Inc. which operates in partnership with Community Federal Savings Bank (CFSB) to facilitate payments in all 50 states in the US. CFSB is registered with the Federal Deposit Insurance Corporation (FDIC Certificate# 57129). The Currency Cloud Inc is registered with FinCEN and authorised in 39 states to transmit money (MSB Registration Number: 31000206794359). Registered Office: 104 5th Avenue, 20th Floor, New York , NY 10011. For clients based in the United Kingdom and rest of the world, payment services for HUBFX are provided by The Currency Cloud Limited. Registered in England and Wales No. 06323311. Registered Office: Stewardship Building 1st Floor, 12 Steward Street London E1 6FQ. The Currency Cloud Limited is authorised by the Financial Conduct Authority under the Electronic Money Regulations 2011 for the issuing of electronic money (FRN: 900199). Please refer to the Terms of Use here.
