and that's to say nothing of the smaller ones that don't get reported. There are marketplaces of training data where you can get more data. You can expand this with data that's in the public domain --
How AI Models Steal Creative Work — and What to Do About It | Ed Newton-Rex | TED