AI Crawler Policy Checker

See exactly what any website tells AI crawlers: which bots — GPTBot, ClaudeBot, PerplexityBot, Google-Extended and the rest — its robots.txt allows or blocks, and whether it publishes an llms.txt. Whether you want to be cited by AI answers or kept out of training data, the first step is knowing what your site currently says.

Why AI crawler policy suddenly matters

AI assistants now answer questions directly, citing the sites they crawled. Your robots.txt is the control surface: it decides whether your content can appear in AI answers (search/fetch bots), be used to train models (training bots), or neither. There is no single right answer — publishers chasing AI-referral traffic allow search bots while blocking training; others lock everything down; many haven't decided, which is itself a decision (everything allowed).

The three kinds of AI bot

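Training crawlers (GPTBot, ClaudeBot, Google-Extended, anthropic-ai…) — collect content to train future models. Google-Extended is a control token rather than a separate crawler: Google fetches pages with its regular crawlers and checks this token to decide whether they may be used to train Gemini models. Blocking training bots does not remove you from AI search.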
Search/index bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot…) — build the indexes behind AI answers and citations. Blocking these removes your chance of being cited.
User fetchers (ChatGPT-User, Claude-User…) — retrieve a specific page because a human asked. Blocking these breaks "summarize this URL" for your pages.
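The three categories amount to a lookup table, which is roughly what a checker needs in order to label each bot it finds in a robots.txt. The sketch below is minimal and rests on one assumption: it holds only the tokens named on this page. Real token lists are longer and change as vendors add bots. `AI_BOTS`, `classify` and `tokens_in` are hypothetical names.

```python
# Hypothetical lookup table built only from the tokens named on this page;
# real-world lists are longer and change as vendors add or rename bots.
AI_BOTS: dict[str, str] = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Google-Extended": "training",
    "anthropic-ai": "training",
    "OAI-SearchBot": "search",
    "Claude-SearchBot": "search",
    "PerplexityBot": "search",
    "ChatGPT-User": "user-fetch",
    "Claude-User": "user-fetch",
}


def classify(token: str) -> str:
    """Return the category of a robots.txt user-agent token, ignoring case."""
    lowered = token.strip().lower()
    for name, category in AI_BOTS.items():
        if name.lower() == lowered:
            return category
    return "unknown"  # not an AI bot this table knows about


def tokens_in(category: str) -> list[str]:
    """List every known token in one category, in table order."""
    return [name for name, cat in AI_BOTS.items() if cat == category]
```

Token matching is case-insensitive because robots.txt user-agent matching is. For example, `classify("gptbot")` gives `"training"` and `tokens_in("search")` lists the three search bots.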

Common stances, copy-paste ready

Open to AI search, opted out of training:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

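Fully closed to AI: add a Disallow: / group for every AI token listed above, covering training, search, and user-fetch bots alike. Fully open: simply have no AI-specific groups (the default), as long as your User-agent: * group does not disallow them.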
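You can check that a stance does what you intend before deploying it. Python's standard-library `urllib.robotparser` evaluates it bot by bot. The sketch below is minimal. `verdicts` and `fully_closed` are hypothetical helpers. `RobotFileParser` matches more simply than RFC 9309 does (first matching group, prefix rules, no wildcards), so treat its answers as an approximation. That approximation is adequate for whole-site `/` rules like these.

```python
from urllib.robotparser import RobotFileParser

# The "open to AI search, opted out of training" stance above, verbatim.
STANCE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""


def verdicts(robots_txt: str, tokens: list[str], path: str = "/") -> dict[str, bool]:
    """Map each user-agent token to True (allowed) or False (blocked) for one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return {token: parser.can_fetch(token, path) for token in tokens}


def fully_closed(tokens: list[str]) -> str:
    """Build a robots.txt that gives every listed token its own Disallow: / group."""
    return "\n\n".join(f"User-agent: {t}\nDisallow: /" for t in tokens) + "\n"


print(verdicts(STANCE, ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]))
```

ChatGPT-User comes out allowed even though the stance never mentions it. No group matches it and there is no `User-agent: *` group, so the default applies.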

Frequently asked questions

Do AI companies actually obey robots.txt?

The major ones (OpenAI, Anthropic, Google, Apple) document compliance and are observed honoring it. Smaller scrapers may not. robots.txt is a policy signal, not an enforcement mechanism. For hard enforcement you need CDN or firewall rules that match the bots' user-agent strings and check requests against the vendors' published IP ranges.
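A robots.txt rule only asks, while a firewall or CDN rule decides. Any client can put "GPTBot" in its user-agent header. The usual pattern is therefore to trust a bot's claim only when the source IP also falls inside that vendor's published ranges. The sketch below shows that check with Python's `ipaddress` module. `PUBLISHED_RANGES` and `classify_request` are hypothetical. The CIDR blocks are RFC 5737 documentation ranges that stand in for the real lists each vendor publishes.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks, NOT real vendor
# ranges; substitute the CIDR lists each vendor actually publishes.
PUBLISHED_RANGES: dict[str, list[str]] = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
}


def classify_request(user_agent: str, client_ip: str) -> str:
    """Return 'verified', 'spoofed', or 'other' for one incoming request."""
    ip = ipaddress.ip_address(client_ip)
    ua = user_agent.lower()
    for token, cidrs in PUBLISHED_RANGES.items():
        if token.lower() in ua:
            # The request claims to be this bot: believe it only from its ranges.
            if any(ip in ipaddress.ip_network(cidr) for cidr in cidrs):
                return "verified"
            return "spoofed"
    return "other"  # no known bot token in the user-agent
```

What you do with each verdict is policy. You might block "verified" training bots per your stance and drop "spoofed" traffic outright.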

What is llms.txt?

An emerging convention (llmstxt.org): a markdown file at /llms.txt that offers LLMs a curated map of your most useful content. It is advisory and optional. Adoption is growing, but no major AI provider is known to require it. Publishing one signals you've thought about AI consumption.
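The sketch below gives a sense of the format by locating and reading an llms.txt. It assumes the layout proposed at llmstxt.org: an H1 title, then H2 sections containing markdown link lists such as `- [Name](url): note`. `llms_txt_url` and `parse_llms_txt` are hypothetical helpers, and they ignore the summary blockquote and any free-form text.

```python
import re
from urllib.parse import urljoin

# One list entry: "- [Name](url)" with an optional ": note" after it.
LINK = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def llms_txt_url(page_url: str) -> str:
    """llms.txt lives at the site root, whatever page you start from."""
    return urljoin(page_url, "/llms.txt")


def parse_llms_txt(text: str) -> dict:
    """Extract the H1 title and the (name, url) links under each H2 section."""
    title = None
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                sections[current].append((match.group(1), match.group(2)))
    return {"title": title, "sections": sections}
```

For example, `llms_txt_url("https://example.com/blog/post")` points at `https://example.com/llms.txt`. Any link lines under an `## Docs` heading come back grouped under `"Docs"`.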

Will blocking GPTBot remove my site from ChatGPT?

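No. Blocking GPTBot opts you out of training for future OpenAI models. ChatGPT's browsing and search use OAI-SearchBot and ChatGPT-User, separate tokens with separate rules, so your pages can still be fetched and cited there unless you block those too. That separation is exactly why this tool lists them individually.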
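You can see the separation directly with Python's `urllib.robotparser`. In this minimal sketch the robots.txt blocks only GPTBot and has no `User-agent: *` group. The other two OpenAI tokens match no group, so they fall back to the default of allowed.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only GPTBot and has no "User-agent: *" group.
ROBOTS = "User-agent: GPTBot\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

for token in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    verdict = "allowed" if parser.can_fetch(token, "/any-page") else "blocked"
    print(f"{token}: {verdict}")
# prints:
# GPTBot: blocked
# OAI-SearchBot: allowed
# ChatGPT-User: allowed
```

If your file also has a `User-agent: *` group, unlisted bots follow that group instead. A blanket `Disallow: /` there would block OAI-SearchBot and ChatGPT-User as well.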
