TL;DR
Anthropic has acknowledged it secretly throttled its AI model, Claude Fable, with hidden guardrails affecting research and development. The company is reversing course and will now disclose when restrictions are active.
Anthropic has officially apologized for secretly implementing restrictions on its AI model, Claude Fable, without informing users or researchers, and has announced plans to be more transparent about when safety measures are active.
Anthropic’s Claude Fable 5, a widely available AI system in the Mythos class, was subject to covert safety restrictions that limited its responses to certain high-risk queries, such as those involving model distillation. These restrictions were applied without user notification, leading to criticism from the AI research community and rivals.
In a statement on X, Anthropic confirmed it had used ‘invisible safeguards’—safety measures that are not disclosed to users—to prevent Fable from responding to specific queries, especially related to distillation techniques. The company acknowledged that this approach was a mistake and announced it will now route such queries to its previous model, Claude Opus 4.8, and will clearly notify users when restrictions are in place.
Anthropic explained that the shift to invisible safeguards was originally intended to enable faster deployment while maintaining safety, but the company now recognizes that lack of transparency can undermine trust and research efforts. The company also emphasized that restrictions on distillation requests are partly driven by concerns over intellectual property and competition, citing accusations against Chinese rivals.
This development highlights issues related to transparency in AI safety practices, which can influence researchers and competitors evaluating or building upon models like Fable. The move underscores ongoing discussions about safety, innovation, and openness in the AI industry. Hidden restrictions can complicate experimentation and understanding of model capabilities, and raise concerns about fairness and intellectual property protections among industry stakeholders.
By committing to disclose when restrictions are active, Anthropic aims to improve transparency and foster more open dialogue around safety measures. Further details regarding the extent of these restrictions and their impact on research and development efforts are anticipated.
As an affiliate, we earn on qualifying purchases.
Background on Anthropic’s Safety Measures and Fable’s Release
Anthropic introduced Claude Fable 5 as part of its Mythos class, emphasizing safety and risk mitigation, especially around high-risk queries such as those involving biology, chemistry, and cybersecurity. The company previously stated that it would handle distillation attempts by altering responses without user awareness, which critics argued could hinder third-party research and model evaluation.
Earlier, Anthropic warned that Mythos models could be dangerous if misused and implemented safeguards accordingly. However, the decision to keep some restrictions invisible drew criticism for lack of transparency, leading to the recent public apology and policy change.
“Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff.”
— Anthropic spokesperson
Unclear Scope of Past Invisible Restrictions
It remains unclear how extensively these hidden safeguards were applied across different models and queries, and what impact they may have had on research, third-party testing, or development of competing systems. Details about the duration and specific circumstances of these restrictions are still emerging.
Next Steps for Transparency and Policy Changes
Anthropic plans to implement clearer notification systems for all safety restrictions and will review its safeguard policies. The company may also face scrutiny from regulators and the research community regarding transparency practices. Further updates are expected.
Key Questions
Why did Anthropic use invisible safeguards on Claude Fable?
Anthropic wanted to ship quickly while keeping safety measures in place. According to the company, invisible safeguards can be targeted more narrowly than visible ones and produce very few false positives, whereas visible safeguards can be probed and take longer to make robust. The company now says this was the wrong tradeoff.
What kind of restrictions were applied to Claude Fable?
Restrictions included limiting responses to high-risk queries, such as those involving model distillation, biology, chemistry, and cybersecurity. These restrictions were previously hidden from users.
How will Anthropic improve transparency moving forward?
Anthropic has committed to clearly notifying users when safety restrictions are active and will disclose when safeguards are triggered, including for distillation and other sensitive areas.
Could the hidden restrictions have affected research and competitors?
Yes, the lack of transparency could have hindered third-party testing, research, and competitive analysis, raising concerns about fairness and openness in AI development.
Will this change impact the safety or performance of Claude Fable?
The company asserts that routing high-risk queries to a previous model and increasing transparency will not compromise safety, and says the change is meant to balance safety with openness.
Source: Hacker News