2025-11-04
The Atlantic
A profile of nonprofit Common Crawl, which has scraped billions of webpages since 2013, including paywalled ones, to build an archive used by OpenAI and others
Editor's note: This work is part of AI Watchdog, The Atlantic's ongoing investigation into the generative-AI industry. X: @kait_tiffany. Bluesky: @katienotopoulos, @damonberes.com, @justinhendrix, ...