The popular but prohibited pipeline

What happens when people demand something illegal? Eventually we find a way to make it legal and to compensate the people whose rights were violated. That’s what’s about to happen with AI.
We’ve seen this over and over with digital innovations.
Napster made electronic access to virtually all music possible, but it violated the rights of musicians and record labels. The record labels sued, and the courts shut Napster down. But the idea didn’t die. The solution wasn’t to put downloaders in jail; it was streaming licenses. Now we use Spotify and Pandora.
YouTube rapidly filled up with copyrighted clips, and people loved it. Now its Content ID system automatically detects copyrighted material, including music, and either takes it down or shares revenue with the rights holders.
Airbnb skirted hotel regulations. Regulators responded with new rules, including host certification and taxation.
You can even draw the connection to cannabis: the public wanted it, the states figured out how to license it, and now, in many parts of America, it’s a regulated business. Alcohol prohibition didn’t last forever; drinking was too popular.
What this means for AI
Large language models are, for the most part, built on unlicensed, stolen content, just as Napster and YouTube were.
Content owners have filed lawsuits, just as they did then.
The tech companies behind LLMs have seen this movie before. They’re fighting the lawsuits, of course. But everybody would be much happier with a technical solution.
Cloudflare, the internet infrastructure and security company, has announced a system to block AI crawlers and let content owners negotiate licenses for access. Google will likely offer a version of the same idea.
Amazon is the largest supplier of ebooks, through its Kindle format. It would surprise me if it didn’t implement a licensing scheme that lets ebook publishers, including traditional publishers, license their books for AI training. That would work far better than the current approaches, in which AI companies either break ebook encryption or buy physical copies and scan them. If Amazon doesn’t do it, another ebook supplier, such as Barnes & Noble or Apple, likely will.
There are two things you can count on here.
First, this is going to happen. People find AI far too useful to give it up. If a technical solution doesn’t arrive, a mandatory licensing regime from the government will.
And second, the content owners aren’t going to get rich off it. Each web page is worth very little on its own, and if you choose not to license your book, the AI companies will get along just fine with everyone else’s. Musicians don’t get rich off Spotify, and content owners won’t get rich off AI licenses.
But those licenses will happen. Bet on it.