Robots.txt Generator

Build a clean, valid robots.txt in seconds: choose how crawlers should treat your site by default, block specific folders, point bots to your sitemap, and opt out of named AI crawlers. The file is generated entirely in your browser — nothing is sent to us.

1. Default crawl policy

2. Paths to block (custom policy)

Type one path prefix per row, e.g. /admin/, /cart/, /search?. Leave rows empty to ignore them. Each becomes a Disallow: line under the * group.

3. Sitemap URL (optional)

Use the complete absolute URL (starting with https://). It is emitted as a single Sitemap: line at the end of the file.

4. Crawl-delay (optional)

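Added as Crawl-delay: under the * group. Note: Crawl-delay is not part of RFC 9309, so support varies. Googlebot ignores it and sets its own crawl rate (the Search Console crawl-rate limiter was retired in early 2024); Bing and some other crawlers honour it.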

5. Block AI crawlers (optional)

Tick any AI bot you want to keep out. Each ticked bot gets its own group: a User-agent: <Bot> line followed by Disallow: /. The TRAIN tag marks model-training crawlers; the SEARCH tag marks live-answer / retrieval bots.

Your robots.txt

Save this as a plain text file named robots.txt in the root of your domain so it is served at https://yourdomain.com/robots.txt.

User-agent: *
Disallow:
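
For readers who want to script the same output, the five options above can be sketched in a few lines of Python. This is a minimal illustration, not the generator's actual code: the function name build_robots_txt and its parameters are assumptions made for the example, and the default-policy choice is reduced to "allow everything unless paths are listed".

```python
def build_robots_txt(disallow_paths=(), sitemap_url=None,
                     crawl_delay=None, blocked_bots=()):
    """Return robots.txt text for the given options (hypothetical helper)."""
    lines = ["User-agent: *"]

    # Empty rows are ignored, as described in step 2.
    paths = [p.strip() for p in disallow_paths if p and p.strip()]
    if paths:
        lines += [f"Disallow: {p}" for p in paths]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")

    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")

    # Each blocked bot gets its own self-contained group.
    for bot in blocked_bots:
        lines += ["", f"User-agent: {bot}", "Disallow: /"]

    # The Sitemap line goes last, outside any group.
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        disallow_paths=["/admin/", "", "/cart/"],
        sitemap_url="https://yourdomain.com/sitemap.xml",
        blocked_bots=["GPTBot"],
    ))
```

Called with no arguments, the sketch returns the same two-line default shown above.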

What robots.txt is — and what it is not

robots.txt is a plain-text file at the root of your domain (/robots.txt) that tells well-behaved crawlers which parts of your site they may request. It follows the Robots Exclusion Protocol, standardised as RFC 9309 in 2022. Crucially, it is a request, not a lock: compliant crawlers like Googlebot, Bingbot and the major AI bots read and respect it, but it carries no technical enforcement. A bot that chooses to ignore it — or a curious person typing the URL — can still reach a disallowed path. Never use robots.txt to hide secrets. Anything truly private needs authentication or server-side access control; a public robots.txt actually advertises the folders you would rather keep quiet.

A second common surprise: blocking a URL in robots.txt does not guarantee it stays out of search results. If other pages link to a blocked URL, search engines can still list the bare link (without a snippet, because they were not allowed to read it). To reliably keep a page out of the index, leave it crawlable and add a noindex meta tag or X-Robots-Tag header instead — the crawler has to be allowed in to see the noindex instruction.
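
As an illustration of the header approach, here is a minimal sketch of a WSGI application that leaves every page crawlable but adds X-Robots-Tag: noindex to selected paths. The path prefixes and the app itself are hypothetical; in practice the header is usually set in your web server or framework configuration. The HTML equivalent is a <meta name="robots" content="noindex"> tag in the page head.

```python
# Hypothetical path prefixes that should stay out of the index.
NOINDEX_PREFIXES = ("/internal-search/", "/print/")


def app(environ, start_response):
    """Tiny WSGI app that marks some paths as noindex via a response header."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]

    if path.startswith(NOINDEX_PREFIXES):
        # The page is still served to crawlers; the header asks search
        # engines not to index it.
        headers.append(("X-Robots-Tag", "noindex"))

    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title><p>Hello</p>"]
```

To try it locally, the app can be passed to wsgiref.simple_server.make_server from the standard library. Remember that these paths must not also be disallowed in robots.txt, or the crawler never sees the header.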

How the * group and specific user-agent groups interact

A robots.txt is a list of groups. Each group starts with one or more User-agent: lines, followed by its Allow: and Disallow: rules. The wildcard group, User-agent: *, is the fallback that applies to any crawler without a group of its own. Here is the rule that trips people up: a crawler obeys only the most specific group that names it, and ignores the * group entirely once a matching named group exists. So if you write a User-agent: GPTBot group, GPTBot follows only that group, and the global Disallow: rules in the * group no longer apply to it; any of them you still want for that bot must be repeated inside its group. That is exactly why this generator emits each blocked AI bot as its own self-contained Disallow: / group rather than tacking it onto the wildcard group. Within a group, when an Allow and a Disallow both match a URL, the rule with the longest path wins; ties go to Allow.
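
The two rules in this section (named group beats the * group; longest matching path wins, ties to Allow) can be sketched as a small matcher. This is a simplified illustration rather than a full RFC 9309 parser: it assumes the crawler is identified by its bare product token (for example "GPTBot"), does plain prefix matching, and does not handle the * and $ wildcards inside paths. The function names are made up for the example.

```python
def parse_groups(text):
    """Parse robots.txt text into a list of (agents, rules) groups.

    agents is a list of lowercase tokens; rules is a list of (allow, path).
    """
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule was already seen, so a new group starts here
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field == "allow", value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups, user_agent):
    """Pick the rules a crawler obeys: its named group(s), else the * group."""
    token = user_agent.lower()
    named = [rules for agents, rules in groups if token in agents]
    if named:
        # Several groups naming the same agent are combined.
        return [rule for rules in named for rule in rules]
    return [rule for agents, rules in groups if "*" in agents for rule in rules]


def is_allowed(text, user_agent, path):
    """Longest matching path wins; on equal length, Allow wins."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for allow, pattern in rules_for(parse_groups(text), user_agent):
        if pattern and path.startswith(pattern):
            n = len(pattern)
            if n > best_len or (n == best_len and allow):
                best_len, allowed = n, allow
    return allowed
```

With a file that disallows /admin/ under * and has a separate GPTBot group that only disallows /private/, is_allowed(text, "GPTBot", "/admin/secret") returns True in this sketch, because the * group is never consulted for GPTBot.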

The 2025-2026 AI-crawler reality

AI companies run two broad kinds of bot, and the distinction decides what blocking them costs you. Training crawlers — GPTBot, ClaudeBot, anthropic-ai, Google-Extended, CCBot (Common Crawl), Bytespider (ByteDance/TikTok) and Amazonbot — harvest content to train or improve models. Blocking them keeps your words out of future training sets but has no effect on normal search rankings. Search / retrieval bots — OAI-SearchBot, ChatGPT-User, PerplexityBot — fetch pages so an assistant can cite or summarise them live in an answer. Blocking these is a real trade-off: you protect content but also remove yourself from AI-generated answers and the referral traffic they can send. Many publishers now allow search bots while blocking training bots. There is no single right answer — decide per your goals, and remember robots.txt is voluntary, so a contract or paywall is the only hard guarantee. After you publish, audit a live domain with the AI policy checker to see exactly which bots it allows or blocks.
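
The "allow search bots, block training bots" policy mentioned above can be sketched as a lookup table plus a small emitter. The bot tokens and their TRAIN / SEARCH labels are taken from this section; the dictionary and function names are illustrative assumptions, and the list will need updating as vendors add or rename crawlers.

```python
# Bot tokens and categories as listed in the paragraph above.
AI_BOTS = {
    "GPTBot": "train",
    "ClaudeBot": "train",
    "anthropic-ai": "train",
    "Google-Extended": "train",
    "CCBot": "train",
    "Bytespider": "train",
    "Amazonbot": "train",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "search",
    "PerplexityBot": "search",
}


def groups_for(blocked_categories):
    """Emit one self-contained Disallow: / group per bot in the given categories."""
    blocks = [
        f"User-agent: {bot}\nDisallow: /"
        for bot, category in AI_BOTS.items()
        if category in blocked_categories
    ]
    return ("\n\n".join(blocks) + "\n") if blocks else ""


if __name__ == "__main__":
    # Block training crawlers only; search / retrieval bots stay allowed.
    print(groups_for({"train"}))
```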

Good practice

  • One file, at the root. Only https://yourdomain.com/robots.txt is read; subdirectory copies are ignored. Each subdomain needs its own.
  • Do not block your CSS/JS. Disallowing assets stops Google rendering your pages properly and can hurt rankings.
  • Always link your sitemap so crawlers discover new and updated URLs faster.
  • Test after publishing. Paste the live file into the robots.txt analyzer to confirm it parses and blocks what you intended.
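
If you prefer to script the "test after publishing" step, Python's standard urllib.robotparser can serve as a quick smoke test. The helper below is a sketch: the check function and the expectation tuples are assumptions for illustration, and the standard-library parser may not resolve overlapping Allow / Disallow rules exactly as RFC 9309 describes, so treat it as a sanity check rather than an authoritative verdict.

```python
from urllib import robotparser


def check(robots_text, expectations):
    """Return the (agent, url) pairs whose result differs from what was expected.

    expectations is an iterable of (agent, url, should_be_allowed) tuples.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    failures = []
    for agent, url, expected in expectations:
        if parser.can_fetch(agent, url) != expected:
            failures.append((agent, url))
    return failures


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /admin/\n\nUser-agent: GPTBot\nDisallow: /\n"
    problems = check(sample, [
        ("Googlebot", "https://yourdomain.com/admin/login", False),
        ("Googlebot", "https://yourdomain.com/", True),
        ("GPTBot", "https://yourdomain.com/", False),
    ])
    print("unexpected results:", problems)
```

To run it against a live file, fetch https://yourdomain.com/robots.txt first (for example with urllib.request.urlopen) and pass the decoded text in as robots_text.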

Frequently asked questions

Will blocking a page in robots.txt remove it from Google?

Not reliably. Disallowing a URL only stops Google from reading it — if other sites link to it, the bare URL can still appear in results without a description. To keep a page out of the index, leave it crawlable and use a noindex meta tag or X-Robots-Tag header so the crawler can actually see the instruction.

Does blocking AI bots hurt my SEO?

No. The AI training and AI-search crawlers are separate from Googlebot and Bingbot, so opting them out does not change your classic search rankings. The only cost is for AI search bots (like PerplexityBot or OAI-SearchBot): blocking those removes you from the AI answers they generate, and the referral traffic that can follow.

Why is each AI bot its own group instead of one big list?

Because a crawler obeys only the most specific group that names it and ignores the wildcard * group once a named group exists. Giving each blocked bot its own User-agent: <Bot> group with Disallow: / keeps the rule self-contained: it applies to that bot alone and leaves the rules every other crawler reads in the * group untouched.

Where do I put the finished file?

Save it as a plain UTF-8 text file literally named robots.txt and upload it to the very root of your domain so it loads at https://yourdomain.com/robots.txt. A copy in a subfolder will not be used. Then verify it with the analyzer.
