Anthropic Safety Pledge Dropped in Policy Overhaul

Daniel Okoye

Anthropic has dropped a central commitment from its flagship safety framework, reshaping how it governs the development of its frontier models. The company's safety pledge previously promised that it would not train certain advanced systems unless it could guarantee in advance that its safety measures were adequate. Company officials said the commitment was removed during a major rewrite of its Responsible Scaling Policy.

The revision comes as competition in advanced AI accelerates and policy expectations remain unsettled. Jared Kaplan, Anthropic’s chief science officer, said the company did not see value in unilateral pauses. He argued a stop would not “help anyone” if rivals continued advancing. Kaplan said the company still intends to pursue safety work aggressively.

For markets, the change matters because safety policy influences product release timing and enterprise adoption. It also shapes regulatory exposure and reputational risk for major AI vendors. Anthropic’s policy shift arrives alongside rapid commercial momentum for its Claude product line. The company has emphasized enterprise demand and developer usage as growth drivers.

What Changed Inside The Responsible Scaling Policy

In 2023, Anthropic introduced the Responsible Scaling Policy as a strict internal constraint, and leaders promoted it as a backstop against rushing powerful models to market. The core promise set a high bar for training and deployment decisions. The new version removes that categorical bar, which was tied to advance guarantees of adequate safety measures.

The updated framework shifts toward transparency and comparative commitments. The company says it will disclose more about model risks and safety testing. It also commits to matching or surpassing competitors’ safety efforts. Under the revised approach, Anthropic may “delay” development in narrow circumstances.

Those circumstances depend on two internal judgments: leaders must see Anthropic as leading the AI race, and they must also view catastrophic risk as significant. That structure sets a higher threshold for a slowdown than the earlier pledge, and it makes the trigger more contingent on external competition.

The company has framed the rewrite as a practical adjustment rather than a reversal. Kaplan described the change as responding to political and scientific realities. He pointed to intensified competition among firms and countries. He also cited the absence of binding regulation as a constraint on unilateral action.

New Reporting Promises And Safety Roadmaps

Anthropic says it will publish more frequent, structured disclosures under the revised policy. A major addition is a new cadence of public “Risk Reports.” The policy says these reports will discuss risks and mitigation decisions. The company says reports will arrive every 3 to 6 months. 

The policy describes the scope of these reports in operational terms. Risk Reports will cover all publicly deployed models at the time of publication. They may also cover internal models deemed to pose significant incremental risks. The text says this includes models used for large-scale autonomous research. 

Anthropic also commits to publishing “Frontier Safety Roadmaps.” These roadmaps outline safety goals spanning security, safeguards, alignment, and policy work. The company says the goals are intended to be ambitious but achievable, and it describes them as public targets whose progress will be graded openly.

In its explanation of the rewrite, Anthropic also separates two ideas. One is what the company will do regardless of rivals. The other is a broader “capabilities-to-mitigations” map it recommends across the industry. That structure aims to pair frontier capability growth with scalable safeguards. 

A third-party evaluator, METR, reviewed an early draft with permission, according to reporting. A METR policy director said the shift suggests safety methods are not keeping pace. He warned that gradual risk escalation could avoid sharp alarm thresholds. That critique focuses on the loss of binary tripwires.

Business Momentum Raises Stakes For Governance

The policy change comes as Anthropic highlights rapid growth and strong funding. In a February announcement, the company said it raised US$30 billion in Series G financing. It said the round implied a US$380 billion post-money valuation. The company also disclosed a US$14 billion revenue run-rate. 

Anthropic credited much of that growth to enterprise and developer demand. It said the number of customers spending over US$100,000 annually rose sharply. It also said that more than 500 customers now spend over US$1 million annually, on a run-rate basis. It said 8 of the Fortune 10 are Claude customers. 

The company also highlighted momentum for Claude Code, its agentic coding tool. It said Claude Code became widely available in May 2025. It reported Claude Code’s revenue run rate above US$2.5 billion. It said that the figure had more than doubled since early 2026. 

For investors, governance shifts can signal how a firm balances safety and speed. A looser internal constraint may reduce the risk of delays in model releases. It can also raise scrutiny from regulators and enterprise customers. Anthropic says transparency, roadmaps, and recurring risk reports will fill that gap. 

The broader policy environment also shapes incentives. Anthropic’s rewrite reflects its view that comprehensive federal AI rules are unlikely in the near term. Company officials have argued that state regulation efforts face political resistance. That dynamic increases the importance of voluntary governance systems.
