TL;DR

Anthropic has acknowledged it secretly throttled its AI model, Claude Fable, with hidden guardrails affecting research and development. The company is reversing course and will now disclose when restrictions are active.

Anthropic has officially apologized for secretly implementing restrictions on its AI model, Claude Fable, without informing users or researchers, and has announced plans to be more transparent about when safety measures are active.

Anthropic’s Claude Fable 5, a widely available AI system in the Mythos class, was subject to covert safety restrictions that limited its responses to certain high-risk queries, such as those involving model distillation. These restrictions were applied without user notification, leading to criticism from the AI research community and rivals.

In a statement on X, Anthropic confirmed it had used ‘invisible safeguards’—safety measures that are not disclosed to users—to prevent Fable from responding to specific queries, especially related to distillation techniques. The company acknowledged that this approach was a mistake and announced it will now route such queries to its previous model, Claude Opus 4.8, and will clearly notify users when restrictions are in place.

Anthropic explained that the shift to invisible safeguards was originally intended to enable faster deployment while maintaining safety, but the company now recognizes that lack of transparency can undermine trust and research efforts. The company also emphasized that restrictions on distillation requests are partly driven by concerns over intellectual property and competition, citing accusations against Chinese rivals.

Impact of Hidden Safeguards on AI Development

This development highlights issues related to transparency in AI safety practices, which can influence researchers and competitors evaluating or building upon models like Fable. The move underscores ongoing discussions about safety, innovation, and openness in the AI industry. Hidden restrictions can complicate experimentation and understanding of model capabilities, and raise concerns about fairness and intellectual property protections among industry stakeholders.

By committing to disclose when restrictions are active, Anthropic aims to improve transparency and foster more open dialogue around safety measures. Further details regarding the extent of these restrictions and their impact on research and development efforts are anticipated.

Background on Anthropic’s Safety Measures and Fable’s Release

Anthropic introduced Claude Fable 5 as part of its Mythos class, emphasizing safety and risk mitigation, especially around high-risk queries such as those involving biology, chemistry, and cybersecurity. The company previously stated that it would handle distillation attempts by altering responses without user awareness, which critics argued could hinder third-party research and model evaluation.

Earlier, Anthropic warned that Mythos models could be dangerous if misused and implemented safeguards accordingly. However, the decision to keep some restrictions invisible drew criticism for lack of transparency, leading to the recent public apology and policy change.

“Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff.”

— Anthropic spokesperson

Unclear Scope of Past Invisible Restrictions

It remains unclear how extensively these hidden safeguards were applied across different models and queries, and what impact they may have had on research, third-party testing, or development of competing systems. Details about the duration and specific circumstances of these restrictions are still emerging.

Next Steps for Transparency and Policy Changes

Anthropic plans to implement clearer notification systems for all safety restrictions and will review its safeguard policies. The company may also face scrutiny from regulators and the research community regarding transparency practices. Further updates are expected as Anthropic considers releasing a Linux version of Claude.

Key Questions

Why did Anthropic use invisible safeguards on Claude Fable?

Anthropic aimed to deploy safety measures quickly without hindering model performance, believing invisible safeguards could be targeted more narrowly and would produce fewer false positives. However, the company now recognizes that this approach lacked transparency.

What kind of restrictions were applied to Claude Fable?

Restrictions included limiting responses to high-risk queries, such as those involving model distillation, biology, chemistry, and cybersecurity. These restrictions were previously hidden from users.

How will Anthropic improve transparency moving forward?

Anthropic has committed to clearly notifying users when safety restrictions are active and will disclose when safeguards are triggered, including for distillation and other sensitive areas.

Could these hidden safeguards have affected research or competition?

Yes, the lack of transparency could have hindered third-party testing, research, and competitive analysis, raising concerns about fairness and openness in AI development.

Will this change impact the safety or performance of Claude Fable?

The company asserts that routing high-risk queries to a previous model and increasing transparency will not compromise safety, and says the aim is to balance safety with openness.

Source: Hacker News

