
Fair Use vs Piracy – How Anthropic’s Claude case just created the AI training playbook
June 23, 2025 marks a watershed moment in generative artificial intelligence law: for the first time, a U.S. federal court has delineated with precision the legal boundaries for using copyright-protected content to train AI systems. Judge William Alsup’s ruling in Bartz v. Anthropic PBC represents the first comprehensive operational framework defining the parameters of commercial AI training in the digital copyright context.
Origins of the Controversy and the “Dual-Track” Strategy
The legal proceeding originated from an action brought by three authors – Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson – who discovered the unauthorized use of their literary works for training Claude, the AI assistant developed by Anthropic PBC, a company backed by tech giants including Amazon and Alphabet. The case’s peculiarity lies in the dual operational strategy adopted by Anthropic: on one hand, downloading over seven million works from pirate sources such as Books3, LibGen, and PiLiMi; on the other, legitimately purchasing millions of physical copies and digitizing them through “destructive” scanning (i.e., the bindings were cut and the pages discarded after scanning, leaving no usable physical copy).
This dual acquisition method allowed the judge to draw a fundamental distinction between two distinct legal categories, analyzing each separately through an innovative interpretation of the fair use doctrine that may now serve as the reference paradigm for the entire technology ecosystem.
Evolutionary Interpretation of Fair Use: From Copying to Transformation
The ruling introduces a highly advanced interpretation of fair use doctrine, adapting it to the technological specificities of generative artificial intelligence. Judge Alsup established that training large language models (LLMs) constitutes an “exceedingly transformative” use of original works, clearly distinguishing between mere digital reproduction and the extraction of statistical patterns aimed at generating original content.
The court emphasized how “language models have not reproduced to the public the creative elements of a given work, nor even the identifiable expressive style of an author.” This distinction assumes crucial relevance in the context of U.S. copyright law, where the transformativeness of use represents one of the four cardinal factors in fair use evaluation.
Judge Alsup used a particularly evocative metaphor to describe the process: “Like any reader who aspires to become a writer, Anthropic’s language models trained on works not to replicate or supplant them, but to change them and create something different.” This analogy with the human creative process establishes a precedent of cardinal importance.
Tripartite Classification of Acquisition Methods: A New Legal Framework
The most innovative aspect of the ruling lies in the clear distinction it draws between different content acquisition methods. Judge Alsup’s decision outlines a legal tripartition: three distinct categories of conduct, each subject to its own autonomous regulatory regime.
First Category: Training Large Language Models with Legitimate Content
The use of legitimately acquired works for artificial intelligence training is recognized as fair use due to its “spectacularly transformative” nature. The court equated this process to human learning, establishing that computational analysis of linguistic structures aimed at generating new content represents a substantial transformation of the original work, not mere reproduction.
Second Category: Print-to-Digital Conversion through Legitimate Format Shifting
The court recognized the legality of format conversion of legitimately acquired works, drawing on the first sale doctrine provided by Section 109(a) of the Copyright Act and on precedents established in Sony Corp. v. Universal City Studios (1984) for time-shifting and Authors Guild v. Google (2015) for mass digitization. The technical-legal requirements the court identified for this category are:
- complete substitution, with destruction of the original copy;
- a strict one-to-one ratio, with no multiplication of copies;
- purposes limited to storage optimization and searchability;
- exclusive internal use, with no form of redistribution.
Third Category: Use of Pirate Libraries
The use of pirate libraries represents an unsalvageable copyright violation. The judge categorically rejected the argument that transformative final intent could justify initial pirate acquisition. The court applies the objective test established in the Warhol case (2023), emphasizing that each phase of the process requires autonomous justification and cannot be legitimized retroactively by final use. Copying and archiving books from pirate sources therefore constitutes a copyright violation that cannot be cured by invoking fair use doctrine.
The ruling thus establishes a cardinal principle: willful copyright infringement can support a damages claim that, under U.S. statutory damages rules (17 U.S.C. § 504(c)), may reach $150,000 per infringed work. These figures, likely to be negotiated in subsequent phases of the litigation, are to be determined at the trial scheduled for December 2025.
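To give a sense of the scale at stake, a back-of-the-envelope calculation of statutory exposure can be sketched. The per-work damage figures come from 17 U.S.C. § 504(c); the count of works actually eligible for damages is purely hypothetical, since the real number will be determined at trial:

```python
# Back-of-the-envelope statutory-damages exposure under 17 U.S.C. § 504(c).
# Per-work figures are statutory; the count of infringed works is a
# hypothetical placeholder, not a finding from the case.

STATUTORY_MIN = 750        # minimum award per infringed work
STATUTORY_MAX = 30_000     # ordinary maximum per infringed work
WILLFUL_MAX = 150_000      # maximum per work for willful infringement

def exposure(works: int, per_work: int) -> int:
    """Total statutory-damages exposure for a given per-work award."""
    return works * per_work

# Hypothetical scenario: 500,000 registered works held willfully infringed.
works = 500_000
print(f"Minimum:  ${exposure(works, STATUTORY_MIN):,}")
print(f"Ordinary: ${exposure(works, STATUTORY_MAX):,}")
print(f"Willful:  ${exposure(works, WILLFUL_MAX):,}")
```

Even at the statutory minimum, exposure at this scale reaches hundreds of millions of dollars, which explains why such figures are typically resolved by settlement rather than verdict.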
Systemic Implications for the Technology Sector and Corporate Compliance
The decision’s scope goes well beyond the specific case, outlining an operational framework that AI companies must necessarily implement. The ruling establishes a de facto legal “safe harbor” for AI training, conditioned however on compliance with specific procedural and substantive criteria, including:
- Due diligence on dataset provenance becomes imperative, requiring thorough verification of source legality.
- Documentation of the acquisition chain must guarantee complete traceability.
- Differentiation between legitimately sourced and potentially problematic content becomes an essential operational requirement.
- Periodic audits of training-process compliance complete this compliance architecture.
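A provenance check of this kind could be operationalized as a simple audit over per-work metadata records. The sketch below is purely illustrative: the record schema, field names, and classification labels are assumptions, not anything prescribed by the ruling; only the tripartite logic (pirate source, documented format shift, everything else) mirrors the court’s framework:

```python
from dataclasses import dataclass

# Sources the ruling identified as pirate libraries.
PIRATE_SOURCES = {"books3", "libgen", "pilimi"}

@dataclass
class DatasetRecord:
    """Hypothetical provenance record for one work in a training corpus."""
    title: str
    source: str               # where the copy was obtained
    purchase_receipt: bool    # documented legitimate acquisition
    original_destroyed: bool  # one-to-one format shift (print copy destroyed)

def audit(record: DatasetRecord) -> str:
    """Classify one record under the ruling's tripartite framework (sketch)."""
    if record.source.lower() in PIRATE_SOURCES:
        return "infringing: pirate-library acquisition"
    if record.purchase_receipt and record.original_destroyed:
        return "permitted: legitimate format shift"
    return "needs review: provenance documentation incomplete"

records = [
    DatasetRecord("Work A", "LibGen", False, False),
    DatasetRecord("Work B", "retail purchase", True, True),
]
for r in records:
    print(r.title, "->", audit(r))
```

In practice, such a check would feed the periodic audits mentioned above, flagging any record whose acquisition chain cannot be fully documented.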
This is good news for those working to train generative AI models on proprietary data. It also gives European entities and companies a real chance to enter the global race to develop artificial intelligence models that do not present legal issues.
Text Specificity vs. Visual Creative Forms
A crucial aspect that the Anthropic ruling does not completely address concerns the specificity of textual material compared to other forms of creative expression, particularly relevant in the context of visual generation systems like Midjourney or ChatGPT/DALL-E. While in Anthropic’s case the court could distinguish between linguistic pattern extraction and literal reproduction, this distinction becomes significantly more complex when applied to systems generating visual outputs.
Disney and Universal recently sued Midjourney, marking the first major legal clash between Hollywood studios and an artificial intelligence company. The complaint highlights basic tests showing that the tool can easily recreate images nearly identical to frames from Marvel films, raising serious doubts about the transferability of the Anthropic precedent to other creative contexts.
In the image context, the demarcation line between transformation and reproduction is inherently less clear. An image generation model could theoretically reproduce stylistic elements attributable to specific artists much more directly than a language model could do with a writer’s style. The complaints include multiple examples of Midjourney image outputs produced by simple text prompts juxtaposed with images of original copyright-protected characters.
This ontological difference between text and image appears to make the transformativeness test established in the Anthropic ruling less than universally applicable. While the extraction of linguistic patterns from text can be considered highly transformative, generating images that replicate distinctive artistic styles or recognizable characters presents qualitatively different legal challenges.
Plausibly, if Judge Alsup’s interpretive lines were applied to visual generative AI models, fair use might not be found to apply.
Regulatory Evolution and International Perspectives
The regulatory vacuum that the Alsup ruling seeks to temporarily fill requires structural legislative intervention. According to U.S. Copyright Office guidelines, copyright protection applies only to content produced by human creativity.
The American approach, based on fair use interpretive flexibility, contrasts with the European framework outlined by the AI Act, which – at least in principle – privileges an ex ante approach based on transparency obligations and preventive opt-out mechanisms. While the American system has currently entrusted courts with case-by-case evaluation of AI training legitimacy, the European model imposes standardized procedural obligations.
It is a model widely criticized by laissez-faire advocates, but one that, by pushing companies toward training AI models on legitimately held proprietary data, could enable models that are legally compatible with copyright regulations – without this seeming utopian, and above all without infringing authors’ rights.
Key Legal Citations and Resources
- Bartz v. Anthropic PBC, Case No. 3:24-cv-05417 (N.D. Cal. 2025)
- U.S. Copyright Act, 17 U.S.C. § 107 (Fair Use)
- Sony Corp. of America v. Universal City Studios, 464 U.S. 417 (1984)
- Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015)
- Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023)
- Disney & Universal v. Midjourney (C.D. Cal. 2025)
- EU AI Act (Regulation 2024/1689)
Related Coverage
- TechCrunch: Federal judge sides with Anthropic in lawsuit over training AI on books
- Reuters: Anthropic wins key US ruling on AI training in authors’ copyright lawsuit
- Washington Post: Federal court says copyrighted books are fair use for AI training
- CNBC: Judge rules Anthropic’s training of AI with books is fair use