Reddit says it will block the Internet Archive from indexing every page but its homepage, after catching AI companies scraping its data from the Wayback Machine
Amanda Yeo / Mashable: Reddit is blocking Wayback Machine from archiving...
Cloudflare says Perplexity uses stealth crawling techniques, like undeclared user agents and rotating IP addresses, to evade robots.txt rules and network blocks
We are observing stealth crawling behavior from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls …
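The evasion Cloudflare describes works because robots.txt is honor-system only: a crawler is blocked only if it identifies itself and checks the file. A minimal sketch with Python's standard-library parser shows how a rule targeting a declared agent does nothing against a request presenting a generic browser User-Agent (the rules below are hypothetical, not any site's actual file; "PerplexityBot" is the user agent Perplexity declares publicly):

```python
# robots.txt compliance is voluntary: a rule can only match the
# user-agent string the crawler chooses to send.
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the declared bot, allow everyone else.
rules = """User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The honestly declared bot is denied...
print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
# ...but the same fetch under a browser-style UA sails through, which is
# why undeclared user agents and rotating IPs defeat robots.txt entirely.
print(rp.can_fetch("Mozilla/5.0", "https://example.com/article"))    # True
```

This is why site operators pair robots.txt with network-level enforcement: the file expresses a preference, it does not enforce one.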
OpenAI's crawlers took down e-commerce site Triplegangers by relentlessly trying to scrape the entire site, whose robots.txt file was not properly configured
techcrunch.com/2025/01/10/h... #google #seo #openai @tante.cc: #OpenAI is basically the locusts of the digital by now. Their massive scrapers crushing websites in order to steal and feed th...
On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company's e-commerce site was down.
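Sites that want to refuse OpenAI's crawler can say so in robots.txt: OpenAI documents GPTBot as its crawler's user-agent token and says the bot honors these rules. As the Triplegangers incident shows, though, the rule only helps if the file exists and is configured before the crawler arrives. A minimal opt-out looks like this:

```
# Refuse OpenAI's documented crawler by its user-agent token.
User-agent: GPTBot
Disallow: /
```

Note this is a request, not a barrier: it does nothing against crawlers that ignore robots.txt or identify themselves differently.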
Some popular sites like Condé Nast's titles and Reuters.com modified robots.txt to block Anthropic's bots, but Anthropic has just made new bots with other names
We really are going to need a shared blocklist that doesn't rely on putting your website behind Cloudflare. — https://www.404media.co/... Jason Koebler / @jasonkoebler@mastodon.social: Many website...
Microsoft says “Bing stopped crawling Reddit” after Reddit updated its robots.txt file on July 1 to prohibit “all crawling of their site”
Reddit has updated its robots.txt file, preventing Bing and many other search engines from crawling the site.
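Reddit's change amounts to a blanket disallow. A file of the shape below denies every compliant crawler, search engines included, which is why Bing stopped indexing the site (this is the general form only, not a verbatim copy of Reddit's file):

```
# A blanket disallow: every crawler that honors robots.txt must stay out.
User-agent: *
Disallow: /
```

Because the wildcard group matches any user agent without a more specific group of its own, a compliant search-engine crawler is shut out just as thoroughly as an AI scraper.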
Cloudflare launches a tool that aims to block bots from scraping websites for AI training data, available free for all its customers
“We hear clearly that customers don't want AI bots visiting their websites, and especially those that do so dishonestly. To help, we've added a brand new one-click to block all AI bots. …”
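Unlike robots.txt, a block applied at the edge is enforced rather than requested. A simplified sketch of the kind of user-agent rule such a toggle compiles to (the agent list is illustrative, not Cloudflare's actual signature set, though GPTBot, CCBot, ClaudeBot, and PerplexityBot are all real declared crawler agents; production systems also use fingerprinting and IP reputation, since user agents are trivially spoofed):

```python
# Illustrative user-agent blocklist for known AI crawlers.
# Real edge products combine this with behavioral fingerprinting,
# because a dishonest crawler can send any User-Agent it likes.
AI_BOT_AGENTS = {"GPTBot", "CCBot", "ClaudeBot", "PerplexityBot"}

def should_block(user_agent: str) -> bool:
    """Return True if the request's User-Agent matches a known AI crawler."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in AI_BOT_AGENTS)

print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)"))          # True
print(should_block("Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"))   # False
```

The limits of this approach are exactly what the Cloudflare/Perplexity dispute above is about: it stops bots that identify themselves and does nothing against ones that don't.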
Reddit says it will update its robots.txt to make “as clear as possible” that companies “using an automated agent to access Reddit” need to abide by its terms
The warning comes after reports that AI companies regularly ignore instructions not to scrape.
In response to plagiarism allegations, Perplexity CEO Aravind Srinivas says the company “is not ignoring” robots.txt, but does rely on third-party web crawlers
* what we do is highly technical, you don't understand
* it wasn't us, it was a third party service/contractor/vendor
https://www.fastcompany.com/ ... @bsmall2@mstdn.jp: Automated Plagiarism f...
The AI search startup Perplexity is in hot water in the wake of a Wired investigation revealing that the startup …