Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced. Click any entry for details.

Mar 31, 2025

Anthropic

Moderate

Released v2.1 of their Responsible Scaling Policy. Key changes include:

  • Added and clarified new capability thresholds and safeguards for CBRN and AI R&D risks

  • Removed commitment to “define ASL-N+ 1 evaluations by the time we develop ASL-N models”

  • Confusingly, AI R&D thresholds which were implicitly the next capability level, ASL-3, are now labeled as “4.” This threshold is still associated with ASL-3 security mitigations.

A diff of the changes can be found below: