Anthropic Wins Major Fair Use Victory for AI Training
Judge William Alsup just ruled that training AI on copyrighted books is fair use. But building permanent libraries of pirated books is not — even when used for training. The ruling in Bartz v. Anthropic splits the difference in a way that reshapes the legal ground for every AI company.
From Piracy to Purchase
Anthropic's data collection history explains why the ruling splits the way it does. Founded by former OpenAI researchers in early 2021, the company began building its training corpus from pirated content.
Co-founder Ben Mann downloaded Books3, a collection of 196,640 pirated books, in early 2021. In June 2021, he downloaded at least five million books from Library Genesis. In July 2022, Anthropic added at least two million more from the Pirate Library Mirror. The company knew these sources contained unauthorized copies.
Then Anthropic changed how it acquired books. In February 2024, it hired Tom Turvey, former head of partnerships for Google's book-scanning project. His mission: obtain "all the books in the world" while avoiding "legal/practice/business slog."
Turvey's team spent millions buying print books, often used. They stripped bindings, cut pages to size, scanned them into PDFs, and discarded the physical copies.
The Ruling
Judge Alsup's 32-page decision draws lines that other courts hearing AI copyright cases are likely to weigh.
Fair Use
AI Training: The court called training LLMs on copyrighted books "spectacularly transformative." The judge compared it to how humans learn from reading — forcing people to pay "each time they read, each time they recall from memory, each time they later draw upon it when writing new things" would be unthinkable.
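Purchased-and-Scanned Books: Converting bought print books to digital for internal use is fair use, though on narrower grounds. The court treated it as format shifting: each digital copy replaced a print copy that Anthropic had purchased and then destroyed, so no additional copies were created or distributed.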
Not Fair Use
Pirated Central Library: Maintaining a permanent digital library of millions of pirated books is not fair use. The court stressed that Anthropic kept pirated copies even after deciding not to use them for training.
What AI Companies Should Take Away
- Training on copyrighted material can be fair use when the process is transformative and doesn't produce infringing outputs
- Legitimate purchase and digitization for internal AI training is likely protected
- No special carveout for AI: The court stated plainly, "There is no carveout from the Copyright Act for AI companies"
- Piracy isn't excused by downstream fair use
- Intent matters: The court weighed the fact that Anthropic knowingly sought out pirated content
What Happens Next
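Anthropic won on the training question but still faces a jury trial over damages for its pirated book library. Under U.S. copyright law, statutory damages can reach $150,000 per work when infringement is willful, so the exposure could be steep given the size of the library.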
OpenAI, Meta, and others used similar datasets — Books3 was part of Meta's LLaMA training data. They are watching this ruling closely.
Unanswered Questions
- What counts as "transformative" use across different AI contexts?
- How does fair use apply to images, videos, or code?
- What licensing models emerge to serve both creators and AI companies?
Training on copyrighted content can be fair use, but building pirate libraries is not. The smartest approach is what Anthropic eventually adopted: invest in legitimate content acquisition, even when it is expensive.