AI Bot Detector - Identify AI Crawlers by User-Agent
Instantly check whether a User-Agent belongs to a known AI crawler such as OpenAI's GPTBot, Anthropic's ClaudeBot, PerplexityBot or Common Crawl's CCBot. We classify the vendor, what the bot is for (training, search, or a user-triggered fetch), and how to allow or block it in robots.txt.
Your current User-Agent
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])
Detected: ClaudeBot
- Vendor: Anthropic
- Purpose: AI model training
- robots.txt: Obeys robots.txt. Crawls to help train Claude; block ClaudeBot (and anthropic-ai/Claude-Web) to opt out.
User-Agent strings can be freely spoofed. A match here means the UA *claims* to be this crawler — it is not proof. Confirm with reverse-DNS of the source IP and the vendor's published IP ranges.
What is an AI crawler?
An AI crawler is an automated bot that fetches public web pages to feed a large language model or an AI search product. Unlike a classic search-engine crawler that exists to rank pages in results you click through to, AI crawlers often consume your content to train models or to answer a question directly - so the visit may never send a human to your site.
AI crawlers vs. search-engine crawlers: what's the difference?
Search crawlers (Googlebot, Bingbot) index pages so people find and visit them; AI crawlers fetch pages to train models or generate answers, often without a referral click. The practical split is by purpose:
- Training crawlers - collect text to train future models (GPTBot, ClaudeBot, CCBot, Amazonbot, Meta-ExternalAgent).
- AI search crawlers - index pages to answer queries inside an assistant (OAI-SearchBot, PerplexityBot, Applebot).
- User-initiated fetchers - load one page because a human asked the assistant about it (ChatGPT-User, Claude-User, Perplexity-User).
- Training-control tokens - not bots at all, but robots.txt labels (Google-Extended, Applebot-Extended) that opt you out of training while normal indexing continues.
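The categories above lend themselves to a simple lookup table. The following is a minimal sketch of signature matching in Python, not this tool's actual implementation: the `Signature` class, the `SIGNATURES` table, and the `classify` function are hypothetical names, and the table holds only a subset of the crawlers named in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    token: str    # substring looked for in the User-Agent (case-insensitive)
    vendor: str
    purpose: str  # "training", "search", or "user-fetch"


# Illustrative subset only (hypothetical table); purposes follow the list above.
SIGNATURES = [
    Signature("GPTBot", "OpenAI", "training"),
    Signature("ClaudeBot", "Anthropic", "training"),
    Signature("CCBot", "Common Crawl", "training"),
    Signature("OAI-SearchBot", "OpenAI", "search"),
    Signature("PerplexityBot", "Perplexity", "search"),
    Signature("ChatGPT-User", "OpenAI", "user-fetch"),
    Signature("Claude-User", "Anthropic", "user-fetch"),
    Signature("Perplexity-User", "Perplexity", "user-fetch"),
]


def classify(user_agent: str) -> Optional[Signature]:
    """Return the first signature whose token appears in the User-Agent.

    A match only means the string *claims* to be that crawler.
    """
    ua = user_agent.lower()
    for sig in SIGNATURES:
        if sig.token.lower() in ua:
            return sig
    return None


if __name__ == "__main__":
    ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0)"
    print(classify(ua))
```

A plain substring match is enough here because none of the listed tokens is contained in another; a larger table may need longest-match ordering.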
How do I block or allow AI crawlers in robots.txt?
List each crawler's User-Agent token in robots.txt with a Disallow rule to block it, or an Allow rule (or an empty Disallow) to permit it. Robots.txt is voluntary: well-behaved bots obey it, but it is a request, not enforcement.
- Block OpenAI training: User-agent: GPTBot, then Disallow: /.
- Block Anthropic training: User-agent: ClaudeBot, then Disallow: /.
- Opt out of Google's Gemini training but keep Search: User-agent: Google-Extended, then Disallow: /.
- Block Common Crawl (used by many models): User-agent: CCBot, then Disallow: /.
- For bots that ignore robots.txt (e.g. Bytespider has been widely reported to), block by IP/ASN at your CDN or WAF.
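The rules listed above can be written out as a complete robots.txt and sanity-checked before deployment. This sketch uses Python's standard `urllib.robotparser`; the `ROBOTS_TXT` constant, the `is_allowed` helper, and the `example.com` URL are illustrative assumptions, and the final `User-agent: *` group is added only to show that other crawlers stay allowed.

```python
from urllib.robotparser import RobotFileParser

# Example file combining the four opt-outs described above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

This only checks what a compliant parser would conclude from the file; it says nothing about whether a given bot actually honors it.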
Can a User-Agent be faked?
Yes. A User-Agent is just a text header the client chooses, so any scraper can send GPTBot/1.0 while having nothing to do with OpenAI. Reliable detection therefore needs more than a string match:
- Reverse DNS (rDNS) - resolve the connecting IP to a hostname and confirm it ends in the vendor's domain, then forward-resolve that hostname back to the same IP (forward-confirmed rDNS).
- Published IP ranges - major vendors publish the IP/ASN ranges their crawlers use; verify the request originates there.
- Rate and behavior - genuine crawlers generally throttle themselves and back off on errors (some also honor the non-standard Crawl-delay directive); spoofers often do not.
This tool does signature matching only, so treat a match as "claims to be," not "proven to be."
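The first two checks above can be sketched with Python's standard `socket` and `ipaddress` modules. This is a minimal illustration under stated assumptions, not a vendor-endorsed procedure: the function names are hypothetical, the resolver functions are injectable so the logic can be exercised offline, and the domain suffixes and CIDR ranges must come from each vendor's own documentation (the values in the demo are reserved documentation placeholders, not real crawler ranges).

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """PTR lookup via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """A/AAAA lookup via the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def forward_confirmed_rdns(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True if ip -> hostname under an allowed domain -> same ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or resolver failure
        return False
    # Require a dot before the suffix so "evilcrawler.example" does not pass.
    suffixes = ["." + s.lower().strip(".") for s in allowed_suffixes]
    if not any(hostname.endswith(s) for s in suffixes):
        return False
    try:
        addresses = forward(hostname)
    except OSError:
        return False
    target = ipaddress.ip_address(ip)
    for candidate in addresses:
        try:
            if ipaddress.ip_address(candidate) == target:
                return True
        except ValueError:  # skip anything that is not a plain IP literal
            continue
    return False


def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)


if __name__ == "__main__":
    # Placeholder values: 192.0.2.0/24 is reserved for documentation.
    print(ip_in_ranges("192.0.2.10", ["192.0.2.0/24"]))
```

Checking the suffix before the forward lookup matters: anyone controlling their own reverse zone can set a PTR record to any name, so only the round trip back to the same IP ties the hostname to the vendor's domain.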
How this tool works: This tool runs in your browser and on our server in real time. Depending on the tool, results are computed directly from the input you provide or retrieved from live, authoritative data sources at the moment you run a lookup. We do not sell your data, and your lookups are kept private — any history shown here is stored only on your device.