July 8, 2025
AI Training and Fair Use: U.S. Court Upholds Use of Lawfully Acquired Works and Rejects Pirated Copies
In a decision issued in June 2025, the United States District Court for the Northern District of California held that using copyrighted works to train the language models behind artificial intelligence systems may, under certain circumstances, constitute fair use under Section 107 of the U.S. Copyright Act. The Court declined, however, to apply the fair use doctrine to the retention of pirated copies in the defendant’s internal library.
The controversy arose in August 2024, when three authors filed a class action lawsuit against Anthropic PBC, developer of the AI system Claude. The plaintiffs alleged that their works were copied without authorization and used for training language models, thereby infringing their copyrights.
The record established that Anthropic copied millions of books, including works authored by the plaintiffs, from two principal sources: pirated digital libraries and large-scale purchases of physical copies, which were subsequently digitized following the destruction of the originals.
During the proceedings, Anthropic moved for summary judgment based on the fair use doctrine. In ruling on the motion, the Court analyzed the company’s various uses separately, classifying them into three main categories: the training of Large Language Models (LLMs) using subsets of books; the digitization of lawfully acquired printed copies; and the creation and maintenance of an internal digital library containing pirated copies.
Each category was analyzed under the four factors set forth in Section 107 of the U.S. Copyright Act, which are used to determine whether a particular use of a copyrighted work qualifies as fair use: (1) the purpose and character of the use, including whether the use is of a commercial nature or for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.
With respect to the first category, the training of LLMs with lawfully obtained works, the Court found the use to be transformative: it did not aim to replicate or supplant the original texts, but rather to extract linguistic patterns in order to generate novel outputs. Although the use entailed copying creative works in full, that extent was deemed technically necessary, and no significant market harm was demonstrated. The Court compared the process to a reader who, having assimilated diverse styles, goes on to compose original works, an analogy that reinforced the legitimacy of the use under fair use principles.
Regarding the digitization of lawfully purchased books for internal use, the Court found that the factors, on balance, favored fair use. On the first factor, the purpose and character of the use, the Court held that replacing the physical copy with a digital version, without public redistribution, supports a finding of fair use because the use is functional rather than expressive. The second factor, the nature of the work, weighed slightly against fair use given the high degree of protection afforded to creative works. The third factor, the amount and substantiality of the portion used, favored fair use because copying the entire work was strictly necessary to create a digital replacement copy. Finally, the market effect was deemed neutral, as the digital copy neither supplanted nor adversely affected the market for the original.
As to the construction of an internal library comprising pirated copies, the Court concluded that retaining unauthorized works, even for training purposes, does not fall within the scope of fair use. This conclusion was grounded in the fact that such copies directly displaced demand for the original books, amounting to a “copy for copy” market substitution. Moreover, these were complete reproductions of creative works, with no sufficiently transformative justification to legitimize the practice.
In conclusion, the Court held on summary judgment that, in this case, the use of protected works to train LLMs constituted fair use, since the use was deemed transformative and the digitization was carried out from lawfully acquired copies. By contrast, the Court held that maintaining pirated copies in an internal library does not qualify as fair use, since all of the statutory factors weighed against the practice. A trial will be held to determine liability for the use of pirated copies in the creation of Anthropic’s central library, as well as the damages resulting from that conduct.
The judgment can be accessed through the following link: ORDER ON FAIR USE
Note: For quick release, this English version is provided by automated translation without human review.