# What is robots.txt? robots.txt for AI, explained

> Crawl policy in the AI era.

robots.txt is the file at /robots.txt that tells crawlers which paths they may or may not fetch. In the AI era, robots.txt is also the primary opt-in/opt-out signal for AI training and retrieval — each major AI crawler checks for rules addressed to its own user-agent token (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) and honors them.
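This is also how a well-behaved crawler evaluates the file. A minimal sketch using Python's stdlib `urllib.robotparser` — the rules here are purely illustrative, not any real site's policy:

```python
from urllib import robotparser

# Hypothetical robots.txt: keep GPTBot out of /private/, allow everyone else.
rules = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# GPTBot matches its named group; ClaudeBot falls through to the * group.
print(rp.can_fetch("GPTBot", "/private/page"))    # False
print(rp.can_fetch("GPTBot", "/blog/post"))       # True
print(rp.can_fetch("ClaudeBot", "/private/page")) # True
```

Note that a named group overrides the `*` group entirely for that crawler — GPTBot here follows only its own block.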

## How robots.txt differs from AI Crawler, LLMO, llms.txt

robots.txt is crawl policy. llms.txt is ingest-friendly context (a Markdown overview of the site). Both live at the root, but they answer different questions: robots.txt answers "may you fetch this path?", while llms.txt answers "what should you read first?".

## How Mentionwell handles robots.txt

- Default robots.txt explicitly allows 15+ named AI crawlers — no ambiguous wildcards.
- Per-domain customization so site owners can opt out of any specific crawler.
- Allowlist published as part of the LLMO setup, alongside llms.txt and Markdown mirrors.

## Frequently asked questions about robots.txt

### How do I allow or block AI crawlers in robots.txt?

Use named user-agent directives — `User-agent: GPTBot`, `User-agent: ClaudeBot`, etc. — with explicit Allow or Disallow rules. A bare `User-agent: *` block can't express per-crawler policy, so name each AI crawler you want to control.
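A hedged sketch of what such a file might look like — the crawler names are real, the paths are placeholders:

```text
# Allow Anthropic's crawler site-wide, keep OpenAI's crawler out of /drafts/,
# and opt out of Google AI training without affecting Google Search.
User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Disallow: /drafts/

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
```

Because a crawler follows the most specific group addressed to it, each named block fully replaces the `*` block for that crawler.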

### Does robots.txt control AI training?

It controls AI crawlers that honor it — which most major ones do. It does not control downstream redistribution of data that has already been crawled and trained on, and it has no legal force; it's an industry convention.

## See also

- [AI Crawler — AI Crawler / Bot](https://mentionwell.com/ai-crawler): GPTBot, ClaudeBot, PerplexityBot, and friends.
- [LLMO — LLM Optimization](https://mentionwell.com/llmo): Be reachable, parseable, ingestible.
- [llms.txt — llms.txt](https://mentionwell.com/llms-txt): robots.txt, but for LLM context.


---

Canonical URL: https://mentionwell.com/robots-txt
Live HTML version: https://mentionwell.com/robots-txt
Site index for AI ingestion: https://mentionwell.com/llms.txt
Full reference: https://mentionwell.com/llms-full.txt
