Silverman’s suit claims that ChatGPT can summarize parts of the book, so it has clearly read the book, and since the book is copyrighted, this constitutes a violation.
I tested it, and as of today, ChatGPT’s summary of chapter 1 of The Bedwetter is pretty vague. It’s possible that OpenAI has directed ChatGPT to stop summarizing the book, as a way of evading the lawsuit. But when I asked ChatGPT to summarize chapter 1 of Malcolm Gladwell’s book The Tipping Point, it was able to provide a detailed summary.
So let’s accept Silverman’s legal claim — that OpenAI found a copy of the book on a pirate site, read it, and trained ChatGPT with it. Is anything actionable happening here?
I can’t find the copyright violation
Anyone who makes and posts illegal copies of copyrighted content is guilty of a copyright violation. As an author, I’ve found copies of my books on pirate sites, and I found that outrageous. To the extent possible, copyright owners and publishers should take whatever action they can to take such copies down and prosecute habitual copyright violators.
But let’s take a look at what OpenAI is accused of doing and try to find the violation.
- Accessing a site with pirated content is not illegal.
- Reading content on such a site does not, in itself, violate copyright. (But more on this point below.)
- Learning from what you read is obviously not illegal.
- Summarizing what you read is not illegal. (There is a whole industry of book summarizers offering summaries of books — and as far as I know, no one has ever won a lawsuit against them.)
Of these steps, only the second, reading the content, is problematic. Clearly, if a person reads a book, there is no copyright violation. But what if a computer reads it?
This passage from the AP article is relevant:
It may be a tough case for writers to win, especially after Google’s success in beating back legal challenges to its online book library. The U.S. Supreme Court in 2016 let stand lower court rulings that rejected authors’ claim that Google’s digitizing of millions of books and showing small portions of them to the public amount to “copyright infringement on an epic scale.”
“I think what OpenAI has done with books is awfully close to what Google was allowed to do with its Google Books project and so will be legal,” said Deven Desai, associate professor of law and ethics at the Georgia Institute of Technology.
Here ChatGPT is ingesting (rather than digitizing) millions of books and showing alternate versions of their content to the public — very much analogous to Google’s behavior. Paraphrasing and summarizing content does not constitute a copyright violation, as the presence of all those book summaries on Amazon demonstrates.
I’m not a lawyer, but I’ve analyzed media copyright developments for decades. You can bet that OpenAI will not settle here, because to do so would be to invite tens of thousands of suits from every copyright owner on the planet.
If you think this is unfair, you may have a point, but unfair is not the same as legally actionable.
I see two possible outcomes that could attempt to compensate copyright owners fairly.
The first would be a (very) small fee that large language models would pay to copyright owners for use of their content and release from all legal claims. Even if the fee added up to billions of dollars, the compensation for any individual copyright owner would be meager. It would be even harder to make money from this than it is for a musical artist to make money from streaming.
Alternatively, Congress and other national authorities could address this through new copyright laws. Such changes are so contentious that they tend to pass no more than once a generation; the last major revision was the Digital Millennium Copyright Act (DMCA) in 1998. In my opinion, there’s less than a 50% chance of any revision addressing AI within the next five years. As of now, there’s not even a legal framework for AI tools trained on publicly visible content.
Authors including Sarah Silverman are screwed. And from where I sit, there’s very little we can do about it.