AI Crawler Policy Checker

See exactly what any website tells AI crawlers: which bots — GPTBot, ClaudeBot, PerplexityBot, Google-Extended and the rest — its robots.txt allows or blocks, and whether it publishes an llms.txt. Whether you want to be cited by AI answers or kept out of training data, the first step is knowing what your site currently says.
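The core of such a check can be sketched with Python's standard-library urllib.robotparser. This is a minimal sketch, not this tool's implementation. The bot list is a hypothetical subset. Python's parser also only approximates RFC 9309: it uses the first group whose token appears as a case-insensitive substring of the bot name, rather than implementing every detail of the standard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical subset of AI user-agent tokens to check; a real checker tracks more.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended",
           "OAI-SearchBot", "PerplexityBot", "ChatGPT-User"]


def check_ai_policy(robots_txt: str, path: str = "/") -> dict:
    """Return {bot token: True if allowed to fetch `path`} for each AI bot."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the file as read,
    # so can_fetch() works without any network access.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_BOTS}


if __name__ == "__main__":
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""
    for bot, allowed in check_ai_policy(sample).items():
        print(f"{bot:16} {'allowed' if allowed else 'blocked'}")
```

With that sample, GPTBot is reported as blocked. OAI-SearchBot is allowed by its own group, and ChatGPT-User is allowed because no group names it at all.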


Why AI crawler policy suddenly matters

AI assistants now answer questions directly, citing the sites they crawled. Your robots.txt is the control surface: it decides whether your content can appear in AI answers (search/fetch bots), be used to train models (training bots), or neither. There is no single right answer — publishers chasing AI-referral traffic allow search bots while blocking training; others lock everything down; many haven't decided, which is itself a decision (everything allowed).

The three kinds of AI bot

  • Training crawlers (GPTBot, ClaudeBot, Google-Extended, anthropic-ai…): collect content to train future models. Google-Extended is a robots.txt control token rather than a separate crawler: Google fetches pages with its regular crawlers and consults the token to decide whether that content may be used for its Gemini models. Blocking training bots does not remove you from AI search.
  • Search/index bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot…): build the indexes behind AI answers and citations. Blocking one removes your chance of being cited in that service's answers.
  • User fetchers (ChatGPT-User, Claude-User…): retrieve a specific page because a human asked. Blocking these breaks "summarize this URL" for your pages.
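The three categories can be rolled up into a rough verdict on a site's overall stance. In this sketch, the category table contains only the tokens named above. The stance labels and the rule that a category counts as opted out only when every listed bot in it is blocked are illustrative assumptions, not a standard.

```python
from enum import Enum


class Kind(Enum):
    TRAINING = "training"
    SEARCH = "search"
    USER_FETCH = "user fetch"


# Tokens from the list above; real-world lists are longer and change over time.
BOT_KINDS = {
    "GPTBot": Kind.TRAINING,
    "ClaudeBot": Kind.TRAINING,
    "Google-Extended": Kind.TRAINING,
    "anthropic-ai": Kind.TRAINING,
    "OAI-SearchBot": Kind.SEARCH,
    "Claude-SearchBot": Kind.SEARCH,
    "PerplexityBot": Kind.SEARCH,
    "ChatGPT-User": Kind.USER_FETCH,
    "Claude-User": Kind.USER_FETCH,
}


def stance(allowed: dict) -> str:
    """Summarize {bot token: allowed?} into a coarse, human-readable stance."""
    blocked_kinds, open_kinds = set(), set()
    for bot, ok in allowed.items():
        kind = BOT_KINDS.get(bot)
        if kind is None:
            continue  # unknown token: ignore rather than guess its category
        (open_kinds if ok else blocked_kinds).add(kind)
    if not blocked_kinds:
        return "fully open"  # nothing blocked, which is also the default
    if not open_kinds:
        return "fully closed"
    # Only training bots blocked, and none of them left open.
    if blocked_kinds == {Kind.TRAINING} and Kind.TRAINING not in open_kinds:
        return "open to AI search, opted out of training"
    return "mixed"
```

A category that is partly blocked (say GPTBot blocked but ClaudeBot allowed) lands in "mixed", which is usually a sign the policy was written for one vendor and never revisited.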

Common stances, copy-paste ready

Open to AI search, opted out of training:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

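Fully closed to AI: add a Disallow: / group for every token listed under the three kinds of AI bot. Fully open: simply have no AI-specific groups (the default). A crawler follows the group that names its token and falls back to a User-agent: * group only when none does, so the explicit Allow groups above change nothing today but keep those search bots open if you later add a site-wide Disallow.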
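Writing those groups by hand gets repetitive. Below is a minimal generator that emits one group per bot. Its token lists are hypothetical and include only tokens named on this page, and the stance names "search-only", "closed" and "open" are made up for this sketch. The allowed() helper round-trips the output through urllib.robotparser to confirm the file says what was intended.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical lists built from tokens named on this page; extend as needed.
TRAINING = ["GPTBot", "ClaudeBot", "Google-Extended", "anthropic-ai"]
SEARCH = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
USER_FETCH = ["ChatGPT-User", "Claude-User"]


def group(token: str, allow: bool) -> str:
    """One robots.txt group granting or denying the whole site to one token."""
    rule = "Allow: /" if allow else "Disallow: /"
    return f"User-agent: {token}\n{rule}\n"


def build_robots_txt(stance: str) -> str:
    if stance == "search-only":  # open to AI search, opted out of training
        groups = [group(t, False) for t in TRAINING]
        groups += [group(t, True) for t in SEARCH + USER_FETCH]
    elif stance == "closed":  # Disallow: / for every listed AI token
        groups = [group(t, False) for t in TRAINING + SEARCH + USER_FETCH]
    elif stance == "open":  # no AI-specific groups at all
        groups = []
    else:
        raise ValueError(f"unknown stance: {stance!r}")
    # Joining with "\n" leaves a blank line between groups.
    return "\n".join(groups)


def allowed(robots_txt: str, token: str) -> bool:
    """Check the generated file the way a compliant crawler would read it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, "/")


if __name__ == "__main__":
    print(build_robots_txt("search-only"))
```

build_robots_txt("search-only") reproduces the opted-out-of-training stance, extended to every search and user-fetch token named on this page.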

Frequently asked questions

Do AI companies actually obey robots.txt?

The major ones (OpenAI, Anthropic, Google, Apple) document compliance and are observed honoring it. Smaller scrapers may not. robots.txt is a policy signal, not an enforcement mechanism — for hard enforcement you need CDN/firewall rules matching the user-agents and published IP ranges.
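The same user-agent plus IP-range idea can be sketched at the application layer as standard-library WSGI middleware. Everything here is illustrative. The token tuple is a hypothetical selection. 192.0.2.0/24 is the RFC 5737 documentation range, used as a placeholder for the ranges each crawler operator publishes. Matching on both the user-agent and the source address means a listed crawler is still refused if only one of the two identifies it.

```python
import ipaddress

# Hypothetical selection of user-agent tokens to refuse (matched case-insensitively).
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot")
# Placeholder: RFC 5737 documentation range, NOT a real crawler range.
# Replace with the ranges each operator publishes.
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)


class BlockAICrawlers:
    """WSGI middleware that answers 403 to listed AI crawlers."""

    def __init__(self, app, tokens=BLOCKED_TOKENS, networks=BLOCKED_NETWORKS):
        self.app = app
        self.tokens = tuple(t.lower() for t in tokens)
        self.networks = tuple(networks)

    def is_blocked(self, environ) -> bool:
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in self.tokens):
            return True
        try:
            addr = ipaddress.ip_address(environ.get("REMOTE_ADDR", ""))
        except ValueError:
            return False  # missing or malformed address: rely on the UA check only
        # Membership is False across IP versions, so IPv6 clients never match v4 ranges.
        return any(addr in net for net in self.networks)

    def __call__(self, environ, start_response):
        if self.is_blocked(environ):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def simulate(app, user_agent: str, remote_addr: str) -> str:
    """Call a WSGI app with a minimal environ and return its status line."""
    statuses = []
    app({"HTTP_USER_AGENT": user_agent, "REMOTE_ADDR": remote_addr},
        lambda status, headers: statuses.append(status))
    return statuses[0]
```

Wrap an existing app with application = BlockAICrawlers(application). Behind a reverse proxy or CDN, REMOTE_ADDR is the proxy's address rather than the crawler's, which is one more reason the edge is the better place for this rule.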

What is llms.txt?

An emerging convention (llmstxt.org): a markdown file at /llms.txt offering LLMs a curated map of your most useful content. It is advisory and optional — adoption is growing but no major model is known to require it. Publishing one signals you've thought about AI consumption.
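The llmstxt.org proposal describes a simple layout. An H1 with the site or project name is the only required part. It can be followed by a blockquote summary and by H2 sections whose lists link to further content. Below is a loose, hypothetical parser that handles only that layout: it keeps the first blockquote line as the summary, reads one-line "- [title](url)" list items under H2 headings, and ignores everything else.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches list items such as "- [Quick start](https://example.com/start.md): notes".
LINK = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)")


@dataclass
class LlmsTxt:
    title: str
    summary: str = ""
    sections: dict = field(default_factory=dict)  # H2 heading -> [(text, url), ...]


def parse_llms_txt(text: str) -> Optional[LlmsTxt]:
    """Return an LlmsTxt, or None if the file lacks the required H1."""
    body = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if not body or not body[0].startswith("# "):
        return None  # first non-blank line must be an H1
    doc = LlmsTxt(title=body[0][2:].strip())
    current = None  # heading of the H2 section being read
    for ln in body[1:]:
        if ln.startswith("> ") and current is None and not doc.summary:
            doc.summary = ln[2:].strip()  # first blockquote line only
        elif ln.startswith("## "):
            current = ln[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            m = LINK.match(ln)
            if m:
                doc.sections[current].append((m.group(1), m.group(2)))
    return doc
```

A checker can treat a file that fails this test, such as an HTML error page served with status 200, as "no llms.txt" rather than reporting it as published.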

Will blocking GPTBot remove my site from ChatGPT?

No. Blocking GPTBot opts you out of future training. ChatGPT's browsing and search use OAI-SearchBot and ChatGPT-User, separate tokens with separate rules, which is exactly why this tool lists them individually.


How this tool works: This tool runs in your browser and on our server in real time. Depending on the tool, results are computed directly from the input you provide or retrieved from live, authoritative data sources at the moment you run a lookup. We do not sell your data, and your lookups are kept private — any history shown here is stored only on your device.