How robots.txt differs from AI Crawler, LLMO, and llms.txt
robots.txt is crawl policy: it tells crawlers which paths they may fetch. llms.txt is ingest-friendly context: a Markdown overview of the site written for models to read. Both live at the site root, but they answer different questions.
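A minimal sketch of how the two files might sit side by side at a site root; the domain, paths, and contents below are placeholders, not a required template. First the crawl policy:

```
# https://example.com/robots.txt
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /drafts/
```

Then the Markdown overview that the llms.txt proposal describes (an H1 title, a blockquote summary, and link lists):

```
# Example Site
> One-paragraph summary of what the site covers, written for models to read.

## Key pages
- [Docs](https://example.com/docs): product documentation
- [Pricing](https://example.com/pricing): plans and limits
```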
How Mentionwell handles robots.txt
- Default robots.txt explicitly allows 15+ named AI crawlers, with no ambiguous wildcards (see the sketch after this list).
- Per-domain customization so site owners can opt out of any specific crawler.
- Allowlist published as part of the LLMO setup, alongside llms.txt and Markdown mirrors.
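An illustrative fragment of what such an allowlist can look like. The specific bots and rules here are examples for the sake of the sketch, not Mentionwell's actual default file:

```
# Named AI crawlers, each with its own explicit rule
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Opting a single crawler out stays a one-line change per domain
User-agent: CCBot
Disallow: /

# Everything else falls back to the general policy
User-agent: *
Allow: /
```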
Frequently asked questions about robots.txt
How do I allow or block AI crawlers in robots.txt?
Use named user-agent directives (User-agent: GPTBot, User-agent: ClaudeBot, and so on) with an explicit Allow or Disallow rule for each one. A bare wildcard (User-agent: *) applies to every bot and can't distinguish AI crawlers from ordinary search crawlers, so wildcards alone don't reliably control them.
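A short example of per-bot rules; which crawlers to allow or block is entirely the site owner's choice, and the paths below are placeholders:

```
# Allow OpenAI's crawler site-wide
User-agent: GPTBot
Allow: /

# Block Anthropic's crawler entirely
User-agent: ClaudeBot
Disallow: /

# Block a single directory for Common Crawl's bot
User-agent: CCBot
Disallow: /internal/
```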
Does robots.txt control AI training?
It controls crawling by AI bots that honor it, and most major ones do. It does not restrict the downstream use or redistribution of data that has already been trained on, and it carries no legal force; it is an industry convention.
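If the goal is specifically to stay out of model training while remaining visible in search, one common pattern is to name the training-related agents separately; this assumes the bots in question continue to honor robots.txt:

```
# Opt out of OpenAI model training
User-agent: GPTBot
Disallow: /

# Opt out of Google AI training via the Google-Extended token
User-agent: Google-Extended
Disallow: /

# Ordinary search indexing stays allowed
User-agent: Googlebot
Allow: /
```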
Ship robots.txt-optimized articles automatically
Mentionwell handles robots.txt on every published article — alongside the other six optimization targets in this glossary — so you don't have to think about it per post. Drop a domain, approve the first headline, watch the pipeline run.