On the same day Anthropic publicly announced Claude Mythos — a model the company described as capable of finding novel cybersecurity vulnerabilities at a level exceeding human expert performance — a small group of users in a private Discord server had already gained unauthorized access to it. The breach, first reported by Bloomberg, was not the result of sophisticated hacking. It was the result of a contractor with legitimate access and a group that had learned enough about Anthropic's internal infrastructure to make an educated guess about where the model was hosted.

The incident is simultaneously mundane and alarming. Mundane because insider threats and supply chain vulnerabilities are among the most common vectors for corporate data breaches. Alarming because the asset in question is not a database of customer records or proprietary source code — it is an AI system that Anthropic's own safety team concluded was too dangerous to release to the general public.

"If some random Discord forum got access to it, it's already been breached by adversaries. The real question is not whether others have it — it's what they're doing with it."

— David Lindner, CISO, Contrast Security

Mythos was designed with a specific and narrow purpose: to find cybersecurity vulnerabilities faster and more comprehensively than human researchers. Anthropic demonstrated its capabilities by using the model to identify a 27-year-old vulnerability in OpenBSD, an operating system specifically designed for security. Mozilla subsequently used a preview version to find and patch 271 vulnerabilities in Firefox. These are not theoretical capabilities — they are demonstrated, reproducible results.

The model was deliberately restricted to a vetted group of 40 companies, including Microsoft, Apple, and Google, each of which agreed to strict usage policies and security protocols. Across those 40 companies, however, thousands of individual employees had some level of access. The contractor who provided the initial foothold was one of those thousands. Security professionals will recognize the pattern: the insider threat surface scales with the number of trusted parties, and trust cannot be perfectly audited.
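The arithmetic behind that pattern is unforgiving. As a rough sketch, with the per-person leak probability purely assumed for illustration (nothing in the reporting quantifies it), the chance that at least one access holder mishandles the model climbs quickly once "40 vetted companies" becomes thousands of individuals:

```python
# Back-of-the-envelope illustration; all numbers are assumptions, not reported figures.
# If n people hold access and each independently leaks with probability p per year,
# the chance of at least one leak is 1 - (1 - p)^n.

def chance_of_at_least_one_leak(n_people: int, p_per_person: float) -> float:
    """Probability that at least one access holder leaks within the period."""
    return 1 - (1 - p_per_person) ** n_people

# Hypothetical per-person leak probability of 0.1% per year.
for n in (40, 1_000, 5_000):
    print(n, round(chance_of_at_least_one_leak(n, 0.001), 3))
# 40    -> ~0.039
# 1000  -> ~0.632
# 5000  -> ~0.993
```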

What makes the Mythos breach particularly significant is what it reveals about the structural challenges of responsible AI deployment. The standard model for releasing dangerous technology — whether it is biosafety level 4 pathogens, nuclear materials, or classified intelligence — involves physical containment, strict access controls, and severe legal consequences for unauthorized disclosure. None of these mechanisms translate cleanly to software.

"The more they add to this elite group, the more likely it was to get released to someone who should not have access. It was bound to happen."

— David Lindner, CISO, Contrast Security

The breach also complicates Anthropic's relationship with its critics. Sam Altman, OpenAI's CEO, publicly characterized Anthropic's marketing of Mythos as fear-based, suggesting the company was overstating the model's danger to generate attention and justify its safety-focused positioning. The unauthorized access incident cuts both ways: it suggests the model is real and capable enough to attract sophisticated interest, but it also raises questions about whether Anthropic's security posture matched the severity of the risk it claimed to be managing.

For enterprise security teams, the Mythos leak is a preview of challenges that will only intensify. As AI models become more capable of autonomous action — finding vulnerabilities, writing exploits, executing attacks — the question of who has access to those capabilities becomes existential. The current framework, in which AI companies self-regulate access through contractual agreements, is clearly insufficient.

Anthropic has stated it is investigating the incident and working with affected parties. The unauthorized users, according to Bloomberg, have not used the model for malicious purposes — they have simply been using it continuously since gaining access. That distinction may matter legally, but it does little to address the underlying security architecture problem that the breach has exposed.