US judge rules Anthropic’s use of books for AI training is fair use: All you need to know
A US federal judge has ruled that Anthropic’s use of copyrighted books to train its artificial intelligence system falls under fair use, but found the company in breach of copyright law for storing pirated digital copies of millions of titles. The decision, issued late on Monday by District Judge William Alsup in San Francisco, marks a significant development in ongoing legal battles over how AI companies use copyrighted material.
The lawsuit, brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, alleges that Anthropic used pirated versions of their works without permission or compensation to develop its Claude large language model. Filed as a proposed class action last year, the case is one of several brought by authors and publishers against AI developers, including OpenAI, Meta, and Microsoft, over the unauthorised use of creative works in training datasets.
Judge Alsup sided with Anthropic on the central claim, ruling that the company’s use of the books during AI training was “exceedingly transformative” and therefore protected under the doctrine of fair use. He wrote, “Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different.”
However, the judge drew a line at the company’s storage of over seven million pirated books in a so-called “central library,” which he said went beyond acceptable limits of fair use. He found this action constituted copyright infringement and scheduled a jury trial in December to determine potential damages. Under US copyright law, statutory damages for wilful infringement can run as high as $150,000 per work.
Anthropic, which is backed by tech giants Amazon and Alphabet, has yet to issue a statement on the ruling. In previous court filings, the company argued that its AI training methods were legally permissible and promoted innovation. It also asserted that the source of the training data, whether obtained from legitimate or pirated sources, was irrelevant to the issue of fair use.
Judge Alsup rejected that argument, expressing scepticism over the necessity of using pirated materials. “This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” he wrote.
The ruling represents the first time a court has directly addressed the fair use defence in the context of generative AI, a legal area still largely unsettled. It highlights the growing tension between copyright holders and AI firms over how creative works are sourced and used in machine learning.
The case will now proceed to trial in December, where a jury will determine how much Anthropic must pay for its unauthorised storage of copyrighted material.
(With inputs from Reuters)