looked at 47 large language models released between 2019 and 2023, they found that 64 percent of them were trained, in part, on Common Crawl, a dataset that includes copyrighted works,
How AI Models Steal Creative Work — and What to Do About It | Ed Newton-Rex | TED