By Elsie Wang
Artificial intelligence systems that generate text, images, music, and code are trained on vast amounts of existing material. Inevitably, much of that material is protected by copyright. This has so far triggered more than 40 lawsuits in the United States, filed by authors, artists, and other creators against AI developers.
At the heart of these disputes is a simple but powerful question: If an AI company copies copyrighted books or images to train its model, is that copyright infringement?
Two recent U.S. cases, Bartz v. Anthropic and Kadrey v. Meta, are among the first to offer meaningful judicial guidance. While the rulings do not settle the debate, they begin to clarify how courts may approach AI training under copyright law.
Why Is AI Training Legally Controversial?
To train large language models (LLMs) or other generative AI systems, developers typically copy large quantities of data, including books, articles, and other expressive works, into their systems for analysis.
Authors typically argue that this unauthorized reproduction of their works constitutes copyright infringement. AI companies respond that the original works are neither reproduced nor published verbatim, but are used solely for model analysis, and that such use therefore qualifies as “fair use” under U.S. copyright law.
In U.S. copyright law, fair use allows certain uses of copyrighted material without permission, especially when the use serves a new and different purpose.
What Did the Courts Say?
In Bartz v. Anthropic and Kadrey v. Meta, Bartz and Kadrey served as lead plaintiffs in class actions brought against Anthropic and Meta respectively, representing groups of authors whose works had been used to train the Claude and Llama models.
In June 2025, the judges in both cases largely upheld the defendants’ fair use defenses, while expressing doubts about a small portion of their arguments.
Both courts agreed on one important point: Using copyrighted works to train AI models can, in some situations, be considered “transformative.”
1. AI training may constitute “transformative use.”
Under U.S. law, one of the most important fair use factors is whether the use is transformative: that is, whether it adds something new or uses the work for a different purpose.
The judges in both cases reasoned as follows: authors create books for the purpose of educating or entertaining readers, whereas AI companies use the content of books for statistical analysis in order to train models. Because the purposes are different (reading vs. machine learning), the courts considered the training use to be highly transformative.
However, it should be emphasized that even if a use is transformative, this does not automatically make it lawful. Courts must still evaluate the other fair use factors as part of a holistic analysis. In these cases, the courts also considered the nature of the works, noting that highly expressive works such as novels and memoirs generally receive stronger copyright protection, whereas factual works such as manuals enjoy a relatively narrower scope of protection.
The courts acknowledged that AI developers selected these books precisely because of their rich linguistic expression, and therefore this factor weighed to some extent against the fair use defense. Nevertheless, this factor alone was not sufficient to determine the outcome of the cases.
2. Extent and Purpose of Copying the Works
Generally, copying an entire work weighs against a finding of fair use. However, courts will further examine the specific purpose for which the defendant copied the entire work.
In both cases, the AI developers argued that full texts were technically necessary in order to effectively analyze language patterns. Both judges accepted this explanation and considered that, in the context of AI training, copying entire works may be reasonable.
3. The Most Controversial Issue: “Market Dilution”
The most controversial issue in these cases concerns a new theory: whether AI systems may indirectly harm the market for human creators by generating large volumes of content. This theory is often referred to as “market dilution.”
Specifically, some authors argue that AI can rapidly generate massive amounts of content which, even if it does not directly copy any particular work, may still compete with human-created works. If readers increasingly turn to AI-generated novels or articles, authors’ income could decline, and in the long term this may weaken incentives for creative production.
This theory differs from the traditional copyright concept of direct substitution (such as pirated books replacing legitimate sales). Instead, it emphasizes an indirect form of market pressure.
Judge Alsup in the Bartz case expressed skepticism toward this theory, considering it overly speculative. By contrast, Judge Chhabria in the Kadrey case indicated that if plaintiffs in the future are able to present concrete evidence showing that AI-generated content has caused actual economic harm, courts may treat such claims more seriously. In particular, the possibility that generative AI outputs may create an indirect form of market substitution could be taken into account when evaluating a fair use defense.
Nevertheless, the courts have made clear that what is required is actual and specific evidence of harm, rather than mere predictions. This issue is therefore likely to be further examined in appeals and future cases.
Significance of the Cases
1. For AI Developers
The above decisions suggest:
Courts may accept that AI training is transformative.
Copying entire works may be justified for technical reasons.
However, whether AI-generated works may lead to “market dilution” remains unsettled, and courts have not yet shown a clear inclination on this issue.
Courts also hinted that, instead of banning AI training altogether, they might favor monetary compensation or licensing solutions. This signals a possible future where licensing markets develop between creators and AI companies.
2. For Authors and Creators
From the perspective of authors, well-known writers may be less affected due to their established brands and reputations. By contrast, emerging creators may face greater challenges.
Writers producing genre-based or formulaic content (such as romance, lifestyle writing, or certain forms of journalism) may experience stronger competitive pressure, as these types of works are more easily replicated at scale by AI-generated content.
3. International Perspective
Although these cases arise under U.S. law, their implications are global. The issue of AI training has become a major topic of academic and policy debate across jurisdictions. Policymakers worldwide are considering a number of key questions: whether AI training should require authorization from right holders, whether compensation mechanisms should be established, and how to balance the protection of creators with the promotion of technological innovation. The answers to these questions are still evolving.
Final Thought
These early court decisions do not give AI companies a blank check. Nor do they guarantee victory for authors.
Instead, they show that some AI training uses may qualify as fair use, while others may not, especially if economic harm becomes clearer. Courts are cautious about halting technological development outright, but they are also mindful of protecting creative industries.
The legal framework is still developing, and appeals are likely.
The debate over AI training and copyright is not just about law, but also about how societies value creativity, innovation, and economic fairness.
As AI systems grow more powerful, courts worldwide will continue shaping the balance between encouraging technological progress, and protecting the human effort behind creative works.
For now, the conversation is far from over.