Yet the controversy does not stop there. Both Anthropic and OpenAI are themselves facing legal challenges over how they trained their own systems. Into this already volatile mix stepped Elon Musk, who accused Anthropic of large-scale data theft. Meanwhile, Reuters reported that DeepSeek may have trained its model using Nvidia’s most advanced chip despite US export controls.
All these developments suggest a broader and far more complex struggle, a great AI heist in a hall of mirrors where nearly every major player stands accused.
The distillation attacks
Anthropic’s complaint centers on a technique known as distillation. In standard practice, distillation allows smaller AI models to mimic the performance of larger, more capable systems by learning from their outputs. It is widely used within companies to produce cheaper, faster versions of their own models. But Anthropic claims this practice was weaponised against it.
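In its benign, in-house form, distillation means training a small "student" model to match the softened output distribution of a larger "teacher". The toy sketch below illustrates only that core idea (the function names, temperature value and logits are invented for the example; this is not any company's actual pipeline):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures smooth the
    distribution, exposing more of the teacher's 'dark knowledge'."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's -- the classic objective a student minimizes during training."""
    p = softmax(teacher_logits, temperature)  # teacher's "soft targets"
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student whose logits track the teacher's incurs a lower loss
# than one that disagrees, so gradient descent pulls it toward
# mimicking the teacher's behavior.
teacher = np.array([4.0, 1.0, 0.5])
close_student = np.array([3.8, 1.1, 0.4])
far_student = np.array([0.2, 3.0, 2.5])
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In the in-house case, the trainer has direct access to the teacher's logits. What Anthropic alleges is different in mechanics: a rival that only sees an API's text outputs must generate enormous volumes of prompt-response pairs to approximate the same signal, which is why the complaint emphasizes scale.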
According to the company’s statement, DeepSeek, Moonshot AI and MiniMax allegedly flooded Claude with vast quantities of specially engineered prompts. The objective, Anthropic says, was to extract specific capabilities from Claude and transfer them into proprietary Chinese models. Despite service restrictions barring commercial access to Claude in China, the firms allegedly used commercial proxy services to bypass these safeguards.
Anthropic estimates that approximately 24,000 fraudulently created accounts were used to generate more than 16 million exchanges with Claude. Of those, it says MiniMax alone accounted for over 13 million interactions. These outputs were allegedly harvested en masse, either to train rival systems directly or to power reinforcement learning processes in which models improve through repeated trial and error without human guidance.

OpenAI had made comparable claims weeks earlier, stating in a letter to US lawmakers that it observed activity “indicative of ongoing attempts by DeepSeek to distill frontier models of OpenAI and other US frontier labs, including through new, obfuscated methods.” The company had reportedly raised concerns as early as January 2025, when observers noted striking similarities between DeepSeek’s initial model and ChatGPT.
Distillation itself is not controversial in principle. Anthropic acknowledged that AI firms routinely distill their own models to create smaller, more efficient versions. What alarms American firms, however, is the prospect of rivals gaining frontier-level capabilities “in a fraction of the time, and at a fraction of the cost” required to develop them independently.
Corporate dispute as geopolitical contest
Both Anthropic and OpenAI have framed these alleged actions not merely as intellectual property violations but as national security threats.
OpenAI described the practice as “adversarial distillation,” while Anthropic warned of the risk that “authoritarian governments deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance.” By situating the issue within the broader geopolitical rivalry between the US and China, the companies have effectively elevated what could be seen as corporate misconduct into a matter of state concern.
On the same day Anthropic released its statement, Reuters reported that US officials had found evidence suggesting DeepSeek trained its AI model using Nvidia’s flagship Blackwell chip, potentially violating US export controls. The report, citing anonymous senior officials, indicated that China’s rapid AI gains may be tied to the use of restricted American hardware.
If accurate, such findings would deepen Washington’s anxieties. The US has sought to slow China’s access to advanced semiconductors precisely because of their importance to frontier AI development. The suggestion that a Chinese firm accessed Nvidia’s best chip despite restrictions adds fuel to an already combustible debate about enforcement and technological containment.
A hall of mirrors
Yet the moral clarity of the American firms’ position is complicated by accusations directed at them. Elon Musk, writing on his social media platform X, suggested that Anthropic itself has engaged in similar behavior. He alleged that the company “is guilty of stealing training data at massive scale” and referenced reports that it had used copyrighted books and freely available online data to train its systems. Musk claimed Anthropic had paid multi-billion dollar settlements related to these practices.
Last year Anthropic reached a $1.5 billion settlement with authors and publishers who alleged that it had used copyrighted books without permission. OpenAI likewise faces multiple high-stakes lawsuits from news organizations, authors, artists and private individuals, who argue that its extensive web scraping violated copyright law, privacy rights and terms-of-service agreements.
OpenAI has defended itself by invoking the “fair use” doctrine under US copyright law, arguing that training AI models on publicly accessible internet data constitutes a transformative use. The company has also entered into licensing agreements with certain organizations and introduced opt-out mechanisms for publishers.
In this light, the dispute over distillation begins to resemble a hall of mirrors. Chinese firms are accused of extracting intelligence from American AI models. American firms, in turn, are accused of extracting intelligence from the open web, often without explicit permission. The central difference may lie less in the act of extraction than in who is extracting from whom.
Distillation versus scraping
The controversy raises a fundamental question: when does learning become theft? Distillation between models can be seen as analogous to reverse engineering, benchmarking or competitive analysis, all long-standing practices in the technology industry. However, the scale described by Anthropic (tens of millions of exchanges across thousands of accounts) suggests automation designed specifically to harvest capabilities.
On the other hand, web scraping for training data has been foundational to the rise of large language models. The distinction between “publicly accessible” and “publicly licensed” remains legally unsettled. Courts are still grappling with whether mass data ingestion for AI training is protected under fair use or constitutes infringement.
Many expect copyright law to undergo reinterpretation in the AI era. If scraping public text for training is eventually restricted, the business models of many leading AI firms would face significant disruption. Conversely, if distillation across competing models becomes normalised, the economic moat around frontier AI systems could erode rapidly.
The great AI heist
The phrase “AI heist” captures the atmosphere of suspicion now surrounding the global AI race. American firms accuse Chinese competitors of siphoning capabilities. Chinese firms are suspected of circumventing hardware export controls. American companies face lawsuits alleging they built their own systems on unlicensed data. And tech leaders like Elon Musk openly challenge the moral authority of those raising alarms.
At stake is not only commercial dominance but strategic advantage. Frontier AI systems are increasingly viewed as dual-use technologies with implications for cyber operations, surveillance and information warfare. In that context, the question of who trains on whose data becomes inseparable from national power.
The AI revolution was built on open research, shared papers and publicly available data. Now, as models grow ever more capable, the ecosystem is hardening into guarded fortresses.