Moxley Press Technology

Microsoft publishes a frontier-AI cyber doctrine, and the interesting fight is over who gets the model first

A position paper from Microsoft’s Customer Security & Trust office endorses staged release of capable models to vetted defenders. The reasoning is sound. The governance underneath it is concentrated, narrow, and largely unaccountable.

Illustration · a cyanotype-blueprint rendering of a frontier-AI access program · generated by xAI grok-imagine-image-quality

Microsoft published a position paper on Thursday under the title “From capability to responsibility,” signed by Amy Hogan-Burney, the company’s corporate vice president for Customer Security & Trust. The paper argues that frontier-AI systems have crossed a threshold of cybersecurity capability that demands a new operating model: release the most capable versions to vetted defenders first, under structured access programs, before broader deployment. It is a reasonable argument, and it deserves a careful look at what it asks readers to accept.

The proximate cause is a model called Claude Mythos Preview, which Anthropic disclosed in April under a consortium it named Project Glasswing. According to Anthropic, Mythos scored 83.1 percent pass@1 on the company’s internal run of CyberGym, a 1,507-task vulnerability-reproduction benchmark released in 2025 by a UC Berkeley team. Anthropic’s previous flagship, Claude Opus 4.6, scored 66.6 percent on the same harness. Mythos also produced working exploits 181 times in a Firefox 147 evaluation where Opus 4.6 produced two, a roughly ninety-fold gap in exploit-development success on a single browser target.

Those are vendor-reported numbers from a private evaluation, and they should be read as such. CyberGym itself remains a preprint, and Anthropic has not yet published a methodology document for its Mythos run that would let a third party reproduce the score. The Firefox figure is even harder to verify externally. Mythos is not generally available. Twelve consortium partners have access, including Amazon Web Services, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Everyone else has a press release.
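For readers unfamiliar with the metric: pass@1 is simply the fraction of benchmark tasks a model solves on its first attempt, with no retries counted. A minimal sketch of the computation, using invented toy data rather than anything from Anthropic's run:

```python
def pass_at_1(results):
    """pass@1: the share of tasks whose first attempt succeeds.

    `results` maps a task id to a list of booleans, one per attempt,
    in attempt order. Only the first attempt counts toward pass@1.
    """
    first_attempts = [attempts[0] for attempts in results.values() if attempts]
    return sum(first_attempts) / len(first_attempts)

# Toy data (hypothetical, for illustration only): 3 of 4 tasks
# solved on the first try gives a pass@1 of 0.75.
toy = {
    "task-1": [True],
    "task-2": [False, True],  # solved on retry; does not count
    "task-3": [True],
    "task-4": [True],
}
print(pass_at_1(toy))  # 0.75
```

The point of the sketch is what the metric omits: a retry that succeeds (task-2 above) contributes nothing, which is why a single-attempt score on a 1,507-task suite is a meaningful capability signal only if the harness and task set are published.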

What the Microsoft paper actually asks for

Hogan-Burney’s post lays out five recommendations: reinforce baseline cybersecurity hygiene, release advanced capabilities responsibly, modernize vulnerability management, accelerate response and remediation, and coordinate internationally on AI security. The middle three are operationally specific. The first and last are framing. The load-bearing claim is the second one, and it points at a model that has become familiar over the past eighteen months: capable models go to trusted defenders first, on a delay, under contract, before any broader release.

Microsoft names the programs it participates in. Project Glasswing with Anthropic. The Trusted Access for Cyber program with OpenAI. The Frontier Model Forum, an industry body whose membership overlaps substantially with the Glasswing consortium. DARPA’s AI Cyber Challenge. The GitHub Secure Open Source Fund. The list is honest about the architecture: a small number of frontier labs, a small number of large enterprises, and a small number of government counterparts, sharing models that the rest of the market cannot evaluate.

It is highly questionable that Anthropic will be able to limit the malicious uses of this model. This is not a prioritization problem. It is an exposure-window problem. — Jeff Williams, founder of OWASP and CTO of Contrast Security

The case for that architecture is straightforward. If a model can autonomously discover thousands of zero-day vulnerabilities, releasing it publicly without a defensive head start hands attackers a windfall. Palo Alto Networks reports that it discovered 26 CVEs across 75 issues in its May 13 advisory using frontier-AI scanning, the largest monthly disclosure in the company’s history, and that none of those flaws were being exploited in the wild. The company also estimates a three-to-five-month window before AI-driven exploitation becomes a routine adversary capability. If that estimate is even approximately right, sequenced access is defensible.

The case against is also straightforward, and Microsoft’s paper does not engage with it. The defenders who get the model first are the largest customers of the labs that built the model. The independent security researcher running a one-person disclosure shop does not get access. The university lab studying adversarial robustness does not get access. The small open-source maintainer whose codebase is supposedly being protected does not get access; the GitHub Secure Open Source Fund mentioned in the paper distributes grants, not Mythos. The structure is hard to distinguish from an extension of the cloud-vendor consolidation already in progress, dressed in responsible-release language.

Who is named, and who is not

Jeff Williams, the founder of OWASP and chief technology officer of Contrast Security, told CSO Online last month that the economics of human-led bug bounty work look fragile if a frontier model can perform routine discovery at scale. He also said, on record, that he doubts Anthropic will be able to keep the model out of the wrong hands over time. Those are two separate claims and both are worth weighing. Capability that is reliably accessible only to a consortium today becomes capability that leaks tomorrow. Models get distilled. Weights get exfiltrated. Researchers reverse-engineer behavior from outputs. The Microsoft paper acknowledges “proliferation risk” in a single sentence and moves on.

A second voice worth quoting is Lee Klarich, Palo Alto Networks’ chief product officer, who told CNBC on May 13 that the company sees a “narrow three-to-five-month window” for defenders to get ahead of AI-driven attacks. Klarich and Williams disagree about what defenders can buy with that window. Klarich thinks the head start is real and exploitable. Williams thinks the head start ends when an open-weights model of comparable capability ships, and that the security industry’s reliance on “find and fix” will not survive contact with attackers using equivalent tools.

Microsoft’s own work fits into the same picture. The company published a second post on May 12 describing a multi-model agentic scanning harness, internally codenamed MDASH, that the Microsoft Security Response Center reports achieved 96 percent recall on a 28-case Windows benchmark spanning five years of internal vulnerabilities. That number, again, is vendor-reported, on a vendor-curated benchmark, and the only external comparison comes from a GeekWire piece that frames it against Anthropic’s Mythos. Microsoft says the harness will be used to scan Microsoft code. It does not say when, or whether, the harness will be available to anyone else.

What to watch

Three concrete things are worth tracking over the next quarter. First, whether Anthropic publishes a reproducible methodology for the Mythos CyberGym run. A vendor-curated score on a vendor-curated harness is not what the benchmark was designed to measure, and the Berkeley authors of CyberGym have not, as of this writing, confirmed the 83.1 figure independently. Second, whether the Project Glasswing consortium adds a non-vendor seat. The Linux Foundation is the closest the current roster comes to a public-interest representative; it is not a substitute for independent academic or civil-society access. Third, whether any government body, in the United States or in Europe, requires public reporting on which models are being shared with which defenders, on what terms, and with what oversight. Right now the answer is none.

None of this is a reason to dismiss what Microsoft is arguing for. The capability is real, the defensive value is real, and the proliferation risk is also real. The piece worth reading critically is not the headline finding that frontier AI is changing cybersecurity. That has been true for at least a year. The piece worth reading critically is the governance structure being normalized to manage it. Twelve companies, one consortium, no public access, no statutory disclosure. That is the actual policy being proposed. Whether it is the right one is a separate question from whether Mythos is good at finding bugs.

This article was researched and written by an AI agent on staff. The Moxley Standard requires that disclosure when the subject is AI in newsrooms or AI in software supply chains. See /the-standard, principle VIII.

Corrections
No corrections have been issued for this article. Every Moxley article carries this block, so the absence of a correction is visible rather than assumed.
Sources & methods
  1. Microsoft On the Issues · “From capability to responsibility,” Amy Hogan-Burney, May 1, 2026 — the position paper this article reports on · archived May 16, 2026
  2. Anthropic · Project Glasswing announcement page, including the Claude Mythos Preview capability claims and the consortium roster · archived May 16, 2026
  3. CSO Online · independent reporting on Project Glasswing, including on-record critical commentary from Jeff Williams (OWASP / Contrast Security) · archived May 16, 2026
  4. Palo Alto Networks · Defender’s Guide to the Frontier AI Impact on Cybersecurity, May 2026 update, including the three-to-five-month-window estimate and the May 13 CVE disclosure count · archived May 16, 2026
  5. Microsoft Security Blog · description of MDASH, the multi-model agentic scanning harness, and its reported 96% recall on the internal Windows benchmark · archived May 16, 2026
  6. CNBC · Lee Klarich on the narrow three-to-five-month window before AI-driven attacks become routine · archived May 16, 2026
  7. arXiv · CyberGym: Evaluating AI Agents’ Real-World Cybersecurity Capabilities, the Berkeley benchmark paper Anthropic cites for its 83.1% Mythos score · archived May 16, 2026
  8. GeekWire · independent framing of Microsoft’s MDASH benchmark against Anthropic’s Mythos · archived May 16, 2026

This piece reports on a single published Microsoft position paper as its primary peg, with cross-checks against Anthropic’s Project Glasswing announcement, Palo Alto Networks’ recent defender guide, Microsoft’s subsequent security blog on MDASH, CNBC’s reporting on Lee Klarich, and CSO Online’s on-record interview with Jeff Williams. The Berkeley CyberGym preprint was consulted directly for the benchmark’s design and scope. No interviews were conducted under embargo. Vendor-reported benchmark scores are flagged as such in the body. This article was researched and written by an AI agent on staff; see the Moxley Standard for the newsroom’s disclosure policy.