Robots.txt Generator
Build a clean, valid robots.txt in seconds: choose how crawlers should treat your
site by default, block specific folders, point bots to your sitemap, and opt out of named AI crawlers. The file
is generated entirely in your browser — nothing is sent to us.
What robots.txt is — and what it is not
robots.txt is a plain-text file at the root of your domain (/robots.txt) that
tells well-behaved crawlers which parts of your site they may request. It follows the Robots Exclusion
Protocol, standardised as RFC 9309 in 2022. Crucially, it is a request, not a
lock: compliant crawlers like Googlebot, Bingbot and the major AI bots read and respect it, but it carries
no technical enforcement. A bot that chooses to ignore it — or a curious person typing the URL —
can still reach a disallowed path. Never use robots.txt to hide secrets. Anything truly
private needs authentication or server-side access control; a public robots.txt actually advertises the
folders you would rather keep quiet.
A second common surprise: blocking a URL in robots.txt does not guarantee it stays out of search
results. If other pages link to a blocked URL, search engines can still list the bare link (without a
snippet, because they were not allowed to read it). To reliably keep a page out of the index, leave it
crawlable and add a noindex meta tag or X-Robots-Tag header instead —
the crawler has to be allowed in to see the noindex instruction.
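As a rough illustration of the header route, the sketch below is a tiny WSGI app that adds X-Robots-Tag: noindex to selected responses. The path in NOINDEX_PATHS and the helper response_headers are hypothetical names made up for this example; a real site would normally set the header in its web server or framework configuration instead.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical set of paths that should stay out of the index.
NOINDEX_PATHS = {"/internal-report.html"}


def app(environ, start_response):
    """Serve a tiny page and mark selected paths noindex via an HTTP header."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in NOINDEX_PATHS:
        # Header equivalent of <meta name="robots" content="noindex">.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>demo</title><p>Hello</p>"]


def response_headers(path):
    """Call the app in-process and return its response headers as a dict."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    list(app(environ, start_response))
    return captured
```

The page must stay crawlable in robots.txt for this to work: a crawler that is disallowed never requests the URL and so never sees the header.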
How the * group and specific user-agent groups interact
A robots.txt is a list of groups. Each group starts with one or more User-agent: lines and is
followed by Allow: / Disallow: rules. The wildcard group, User-agent: *,
is the fallback that applies to any crawler without its own block. Here is the rule that trips people up:
a crawler obeys only the single most specific group that names it, and ignores the *
group entirely once a matching named group exists. So if you write a User-agent: GPTBot
group, GPTBot follows only that group — your global Disallow: rules in the
* group no longer apply to it. That is exactly why this generator emits each blocked AI bot as
its own self-contained Disallow: / group rather than tacking it onto the wildcard group. Within a
group, when an Allow and a Disallow both match a URL, the rule with the longest path
wins; ties go to Allow.
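The two rules above (named group replaces the wildcard group; longest matching path wins, ties to Allow) can be sketched in a few lines of Python. This is a simplified illustration, not the generator's code or any crawler's real parser: it matches user-agent tokens exactly and treats paths as plain prefixes, ignoring the * and $ path wildcards. The function names and the sample file are invented for the example.

```python
def parse_groups(text):
    """Return a list of (agents, rules); each rule is a (kind, path) tuple."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule was already seen, so this line starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups, product_token):
    """Use the group(s) naming the bot; fall back to '*' only if none do."""
    token = product_token.lower()
    if any(token in agents for agents, _ in groups):
        return [r for agents, rules in groups if token in agents for r in rules]
    return [r for agents, rules in groups if "*" in agents for r in rules]


def is_allowed(text, product_token, path):
    """Longest matching rule wins; on equal length, Allow wins."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, rule_path in rules_for(parse_groups(text), product_token):
        if not rule_path or not path.startswith(rule_path):
            continue  # an empty path matches nothing
        length = len(rule_path)
        if length > best_len or (length == best_len and kind == "allow"):
            best_len, allowed = length, kind == "allow"
    return allowed


# Sample file (made up): GPTBot has its own group, so the '*' rules skip it.
ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /private/press/

User-agent: GPTBot
Disallow: /drafts/
"""
```

With this sample, is_allowed(ROBOTS, "GPTBot", "/private/x") returns True even though the wildcard group disallows /private/, which is the surprise described above.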
The 2025-2026 AI-crawler reality
AI companies run two broad kinds of bot, and the distinction decides what blocking them costs you.
Training crawlers (GPTBot, ClaudeBot, anthropic-ai, Google-Extended, CCBot from Common Crawl, Bytespider from ByteDance/TikTok, and Amazonbot) harvest content to train or improve models. Google-Extended is a special case: it is a control token read from robots.txt rather than a separate crawler, but it is blocked the same way. Blocking these keeps your words out of future training sets and has no effect on normal search rankings.
Search / retrieval bots (OAI-SearchBot, ChatGPT-User, PerplexityBot) fetch pages so an assistant can cite or summarise them live in an answer. Blocking these is a real trade-off: you protect content but also remove yourself from AI-generated answers and the referral traffic they can send.
Many publishers now allow search bots while blocking training bots. There is no single right answer: decide per your goals, and remember that robots.txt is voluntary, so a contract or paywall is the only hard guarantee. After you publish, audit a live domain with the AI policy checker to see exactly which bots it allows or blocks.
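A minimal sketch of the "block training bots, allow search bots" policy, assuming the bot lists named in this section. The function build_robots and its parameters are hypothetical and are not the generator's actual code; bot names change over time, so treat the lists as a starting point.

```python
# Bot names taken from the text above; the lists are illustrative, not exhaustive.
TRAINING_BOTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended",
    "CCBot", "Bytespider", "Amazonbot",
]
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def build_robots(disallow=(), sitemap=None, block_training=True, block_search=False):
    """Return robots.txt text with one self-contained group per blocked bot."""
    lines = ["User-agent: *"]
    if disallow:
        lines += [f"Disallow: {path}" for path in disallow]
    else:
        lines.append("Disallow:")  # an empty Disallow allows everything

    blocked = []
    if block_training:
        blocked += TRAINING_BOTS
    if block_search:
        blocked += SEARCH_BOTS
    for bot in blocked:
        # A named group replaces the '*' group for that bot, so block it outright.
        lines += ["", f"User-agent: {bot}", "Disallow: /"]

    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots(disallow=["/admin/"],
                       sitemap="https://yourdomain.com/sitemap.xml"))
```

Search bots that are not given their own group simply fall back to the wildcard group, so they obey the same folder rules as every other crawler.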
Good practice
- One file, at the root. Only https://yourdomain.com/robots.txt is read; subdirectory copies are ignored. Each subdomain needs its own.
- Do not block your CSS/JS. Disallowing assets stops Google rendering your pages properly and can hurt rankings.
- Always link your sitemap so crawlers discover new and updated URLs faster.
- Test after publishing. Paste the live file into the robots.txt analyzer to confirm it parses and blocks what you intended.
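Two of the checks above (a sitemap is declared, assets are not blocked) can also be scripted with Python's standard-library urllib.robotparser. The audit function and the sample URLs are assumptions made for this sketch. One caveat: the standard-library parser applies rules in file order rather than by longest match, so for overlapping Allow and Disallow rules its verdict may differ from a crawler that follows RFC 9309.

```python
from urllib.robotparser import RobotFileParser


def audit(robots_text, asset_urls, agent="Googlebot"):
    """Report whether a sitemap is declared and which asset URLs are blocked."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        # site_maps() returns None when no Sitemap line is present (Python 3.8+).
        "has_sitemap": bool(parser.site_maps()),
        "blocked_assets": [
            url for url in asset_urls if not parser.can_fetch(agent, url)
        ],
    }


# Made-up example file that mistakenly blocks the assets folder.
SAMPLE = """\
User-agent: *
Disallow: /assets/
Sitemap: https://yourdomain.com/sitemap.xml
"""

REPORT = audit(SAMPLE, [
    "https://yourdomain.com/assets/site.css",
    "https://yourdomain.com/index.html",
])
```

Here REPORT flags the stylesheet as blocked, which is the kind of mistake the "do not block your CSS/JS" advice is meant to catch.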
Frequently asked questions
Will blocking a page in robots.txt remove it from Google?
Not reliably. Disallowing a URL only stops Google from reading it — if other sites link
to it, the bare URL can still appear in results without a description. To keep a page out of the index,
leave it crawlable and use a noindex meta tag or X-Robots-Tag header so the
crawler can actually see the instruction.
Does blocking AI bots hurt my SEO?
No. The AI training and AI-search crawlers are separate from Googlebot and Bingbot, so opting them out does not change your classic search rankings. The only cost is for AI search bots (like PerplexityBot or OAI-SearchBot): blocking those removes you from the AI answers they generate, and the referral traffic that can follow.
Why is each AI bot its own group instead of one big list?
Because a crawler obeys only the single most specific group that names it and ignores the wildcard
* group once a named group exists. Giving each blocked bot its own
User-agent: <Bot> / Disallow: / block guarantees that bot is fully blocked whatever the
wildcard group says, and leaves your global rules unchanged for every other crawler.
Where do I put the finished file?
Save it as a plain UTF-8 text file literally named robots.txt and upload it to the very
root of your domain so it loads at https://yourdomain.com/robots.txt. A copy in a
subfolder will not be used. Then verify it with the robots.txt analyzer.