Three years ago, the AI safety consensus looked unbreakable. Every major lab had signed onto some version of voluntary oversight. Governments were drafting binding frameworks. Researchers published papers on alignment. Tech executives gave Senate testimony about existential risk. The mood was almost serene in its certainty: serious regulation was coming, and the industry would help shape it.
That consensus is now gone. Not cracked - gone. The week of March 3, 2026, delivered what Steven Levy at WIRED called "a body blow" to AI safety hopes: a vicious public fight between Anthropic and the Pentagon, an OpenAI deal that filled the gap within days, and a quiet rewrite of Anthropic's foundational Responsible Scaling Policy - the framework its founders hoped would spark a "race to the top" and shame other labs into higher standards.
Nobody voted to end AI governance. Nobody announced it. It just happened - a series of decisions driven by money, military contracts, and competitive pressure, each individually defensible, collectively devastating.
What the Responsible Scaling Policy Actually Was
To understand what was lost, you need to understand what Anthropic built. When the company launched in 2021, its founders - many of them OpenAI defectors worried about safety shortcuts at their former employer - wanted to do things differently. In 2023, they published the Responsible Scaling Policy, or RSP.
The RSP was built around a simple idea: before launching a new model, Anthropic would assess whether it crossed certain capability thresholds. If it did, safety protocols had to be in place before shipping. Models couldn't be deployed if they hit what the company called "ASL-3" thresholds - levels of capability that could, in theory, contribute to mass-casualty weapons or other catastrophic harms - without countermeasures that demonstrably reduced those risks.
More importantly, the RSP was designed as industry signaling. Anthropic believed that if one major lab adopted meaningful self-imposed constraints tied to hard capability metrics, others would follow - whether out of competition for talent (safety researchers wanted to work somewhere principled), regulatory pressure (governments could point to the RSP as the model for binding rules), or simple reputational pressure. The company called this the "race to the top."
Source: Anthropic RSP announcement, 2023
It was an elegant theory. OpenAI and DeepMind adopted elements of it. The UK's AI Safety Institute cited it. The Biden administration's executive order on AI referenced similar threshold-based logic. For two years, the framework held.
Then the money got too big, the competition got too fierce, and the US government decided it didn't like being told what it could and couldn't do with $3 billion worth of software it was paying for.
The Pentagon Blows It Up
The immediate trigger was a dispute over contract language. Anthropic's agreements with the Department of Defense - now formally renamed the Department of War under Defense Secretary Pete Hegseth - had included explicit red lines. Claude models could not be used for autonomous weapons systems. They could not be deployed for mass surveillance of American citizens. These weren't vague aspirations: they were contract clauses.
The Pentagon wanted them removed.
Anthropic refused. The Pentagon then did something that had no precedent in the history of Silicon Valley's defense relationships: it designated Anthropic a "supply chain risk." Under that designation, federal agencies are barred from doing business with the company. Anthropic's government contracts, worth hundreds of millions of dollars, evaporated.
Source: WIRED, "Anthropic Supply Chain Risk," March 2026
Secretary Hegseth's argument - publicly and privately - was that private companies have no business setting limits on military capability. That if a nation's adversaries are deploying AI-directed weapons, the US military must match them. That the alternative isn't a world without autonomous weapons; it's a world where the US loses.
There's something to that argument. It's also exactly the logic that produces arms races.
"AI is so powerful, such a glittering prize, that it is very difficult for human civilization to impose any restraints on it at all."
- Dario Amodei, Anthropic CEO, in an essay published earlier this year
Amodei wrote those words to describe a trap. The Pentagon's actions proved his thesis. The US military's position is essentially: whatever the risks, we cannot afford to be constrained while others are unconstrained. The logical endpoint of that position is an AI arms race with no floor.
OpenAI Moves In - and the Illusion of Industry Solidarity Dissolves
What happened next was almost more damaging than the Pentagon's move itself. OpenAI, Anthropic's primary competitor and theoretical ally in the project of responsible AI development, signed a Department of Defense contract within days of Anthropic's exclusion.
OpenAI CEO Sam Altman claimed he was helping Anthropic - that by taking the contract, OpenAI was reducing the pressure on its competitor and keeping at least one safety-conscious lab in the room. Dario Amodei didn't buy it. In an internal memo obtained by The Information, Amodei wrote: "Sam is trying to undermine our position while appearing to support it. He is trying to make it more possible for the admin to punish us by undercutting our public support."
Source: The Information, Amodei internal memo, March 2026
Amodei later walked back the tone but not the substance. And from a structural standpoint, he was right. Once OpenAI filled the gap, the Pentagon had no reason to negotiate. Why restore Anthropic's contract when a willing replacement existed? The competition between the two companies - which AI safety advocates had hoped would produce a race to the top - instead produced a race to the bottom on precisely the issue that mattered most: autonomous weapons constraints.
OpenAI maintains that its contract includes safeguards against autonomous weaponry. But the company acknowledged it cannot directly control how the Defense Department uses its models. The meaningful question isn't what the contract says - it's what enforcement looks like when a military decides a safeguard is inconvenient.
The RSP Revision Nobody Talked About
Lost in the noise of the Pentagon drama was a quieter announcement from Anthropic on February 24: the company had updated its Responsible Scaling Policy to version 3. The changes were subtle enough to escape major headlines, and significant enough to matter enormously.
The original RSP promised that models would not be released if they crossed certain capability thresholds without verified safeguards. RSP v3 adjusted those thresholds and, more importantly, acknowledged that the original framework had failed in its primary goal. The company wrote in the release: "The policy environment has shifted toward prioritizing AI competitiveness and economic growth, while safety-oriented discussions have yet to gain meaningful traction at the federal level."
Source: Anthropic, RSP v3 announcement, February 24, 2026
In other words: the race to the top didn't happen. The RSP was supposed to set a floor that other companies would meet. Instead, it became a ceiling that competitors declined to touch. Anthropic found itself uniquely constrained while rivals moved faster with fewer self-imposed limits.
Jared Kaplan, Anthropic's chief science officer, tried to push back on this interpretation when WIRED asked him directly. "I don't think the race to the top is dead," he said, arguing that safety culture is alive inside research labs even if it's disappeared from policy debates. "There are a lot of researchers at every lab that care a lot about doing the right thing."
That's probably true. It also doesn't solve the problem. Individual researcher values don't constrain what models get deployed to weapons systems. Corporate culture doesn't stop a secretary of defense from invoking the Defense Production Act to commandeer a lab's models if he decides national security requires it. The RSP worked when it had political backing. It doesn't have that anymore.
What the Global Governance Landscape Now Looks Like
The United States was never going to be the primary venue for meaningful AI regulation. That much was clear by 2024. But it was supposed to be a floor - a baseline set of norms that prevented the worst outcomes. Instead, the US is now actively competing against those norms.
The European Union's AI Act is the most comprehensive AI regulation in existence, and it came into force in stages through 2024 and 2025. But it's a consumer protection framework, not an arms control treaty. It regulates AI systems deployed to EU citizens. It has no jurisdiction over what the US military does with a model in a classified network. The EU can ban a product from European markets; it can't stop that product from being used to identify and eliminate human targets in a foreign conflict zone.
The UK had positioned itself after the Bletchley Park summit in 2023 as a neutral convening ground for international AI safety discussions. The AI Safety Institute produced useful evaluations and built a framework for frontier model testing. But the institute's authority was always advisory. It could test models and publish findings. It couldn't block deployment.
China presents a particular challenge. Beijing has its own AI governance framework, which requires companies to register large language models and get approval for public deployment. But that framework is designed to ensure AI aligns with Chinese Communist Party values - not to prevent autonomous weapons. China's military AI development program is, by most assessments, accelerating. The idea that the US would accept constraints China doesn't accept was always going to be a hard sell. The current administration has made it impossible.
The net result: there is no meaningful international AI safety architecture. There are bilateral conversations, voluntary commitments, and evaluation frameworks that individual companies can accept or ignore. That's not governance. It's theater.
The Second-Order Effects Nobody Is Discussing
The immediate debate is about the Anthropic-Pentagon fight and what it means for autonomous weapons. The longer-term consequences are more diffuse and harder to see.
The first is talent. Anthropic attracted some of the world's top AI safety researchers by offering them something other labs didn't: a genuine institutional commitment to taking risks seriously. The RSP was a recruiting tool as much as a policy document. RSP v3, the Pentagon blacklisting, the competitive pressure to move faster - all of these change the company's value proposition to safety-focused researchers. Whether they leave is unclear. Whether they feel the same way about where they work is not.
The second is the signal it sends to the rest of the world. US AI policy is now effectively: capability first, safety second, constraints optional. Every other government is watching. Countries that were using the US example as a template for their own AI governance now have to decide whether to follow Washington's shift or maintain their own stricter approaches. The EU may hold. But middle-income countries building their own AI sectors - India, Brazil, South Korea, Saudi Arabia - are unlikely to impose constraints that the world's most advanced AI nation refuses to accept.
The third effect is on the labs themselves. OpenAI's Jason Kwon told WIRED that his company has "more people working on safety than ever before." That may be true in absolute numbers. The question is whether safety teams have the authority to block deployment of a model that the Pentagon is paying for, or whether "safety" increasingly means red-teaming for public-facing outputs rather than governing military use. Those are very different things.
"Instead of a race to the top, the AI rivalry seems more like a bareknuckle version of King of the Mountain."
- Steven Levy, WIRED, March 6, 2026
The fourth effect is the precedent it sets for future disputes. Hegseth threatened Anthropic with the Defense Production Act - a Korean War-era statute that allows the government to commandeer private industrial capacity for national security purposes. He didn't invoke it. But he threatened it. That threat alone changes the calculation for every AI lab with government contracts: push back on military demands, and the government can legally take your company's core product.
Is There Any Way Back?
The honest answer is: not through the paths that were supposed to get us there.
Congressional regulation was always the long shot. The US has not passed a major tech regulation bill in decades. A divided Congress, a White House actively hostile to AI constraints, and a tech industry with deep lobbying resources make binding domestic AI safety legislation close to impossible in the current environment.
International treaties face the classic arms-control problem: they require verification, and AI capabilities are software that can be copied, deployed, and updated invisibly. How do you verify that a military isn't using autonomous weapons when the weapon is a model weight on a server in a classified facility? The Nuclear Non-Proliferation Treaty worked, imperfectly, because nuclear material is physical and detectable. AI doesn't have a radiation signature.
The remaining hope, according to people like Kaplan who still want to believe the project isn't dead, is research. If alignment researchers can demonstrate concretely - not theoretically, but with specific techniques and measured results - that safe AI is also more capable AI, the commercial incentives flip. Companies would adopt safety practices not because they're ethically required but because unsafe models underperform. That's a version of the race to the top that doesn't depend on regulation or corporate virtue.
It's also at least five years away on the optimistic timeline. In the meantime, the US military is negotiating contracts, the labs are competing for those contracts, and the international community is watching the country that invented modern AI demonstrate that safety is negotiable when the money gets large enough.
Timeline: The Collapse of AI Safety Consensus
The Harder Question Nobody Wants to Ask
Underneath all the policy debate is a question that neither the labs nor the government has answered honestly: what, exactly, are autonomous weapons systems supposed to do that human-in-the-loop systems can't?
The military's stated argument for removing the autonomous weapons constraint is speed. Modern warfare - especially electronic warfare, drone warfare, and missile defense - moves faster than human reaction time. A drone swarm targeting a radar installation doesn't wait for a human to approve each strike. A missile defense system intercepting an incoming ballistic missile has seconds to decide. Human-in-the-loop is a constraint that assumes warfare operates on human timescales. Increasingly, it doesn't.
This is a legitimate technical argument. It's also the argument that produces Terminator. The version of autonomous weapons that sounds reasonable - an anti-drone system that automatically engages incoming threats - bleeds into the version that sounds catastrophic - an autonomous ground system that identifies and eliminates human combatants without human approval - through a series of incremental extensions that each seem defensible in isolation.
Nobody has drawn a clear line between them. The Pentagon's position - that private companies shouldn't draw that line - means that, for now, nobody will. The international community has no mechanism to draw it. And the labs, caught between their own principles and the economics of billion-dollar government contracts, are discovering that principles bend when the money is large enough.
That's the actual story here. Not that Anthropic lost a contract or OpenAI signed one. It's that the institutional structures human civilization was building to manage an unprecedented technology transition have been dismantled, partially by circumstance and partially by deliberate choice, before they had a chance to work. What comes next is a genuinely open question - and for the first time in several years, the optimistic answers are harder to find than the pessimistic ones.