How AI Crawler differs from LLMO and robots.txt
Classic search crawlers (Googlebot, Bingbot) feed ranking systems. AI crawlers feed training corpora and retrieval indexes. A site can be visible to one and invisible to the other because robots.txt rules are written per user-agent token, so allowing or blocking one crawler by name leaves the others unaffected.
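To illustrate the separation, the sketch below feeds one illustrative robots.txt to Python's standard `urllib.robotparser` and asks whether two different crawlers may fetch the same URL. The file contents, the `allowed` helper, and the example.com URL are assumptions made for the demonstration. Python's parser also only approximates how each vendor's crawler interprets robots.txt: it matches user-agent tokens by substring and applies the first matching rule.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (not a recommendation): the classic search crawler
# is allowed everywhere, while one AI crawler is blocked everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the crawler token `user_agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)

url = "https://example.com/blog/post"
for agent in ("Googlebot", "GPTBot"):
    print(agent, allowed(ROBOTS_TXT, agent, url))
```

The same file gives Googlebot `True` and GPTBot `False`. A crawler the file does not name, such as ClaudeBot, falls through to the default and is allowed because there is no `User-agent: *` group.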
How Mentionwell handles AI Crawler
- Default robots.txt explicitly names and allows 15+ AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, and others).
- Per-bot allow/disallow control so site owners can opt out of any specific crawler.
- Sitemaps and llms.txt published at canonical paths so crawlers can discover content efficiently.
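The sketch below shows one way to generate a default-allow robots.txt that has per-bot opt-outs. It is a hypothetical `build_robots_txt` helper and not Mentionwell's actual implementation. Its crawler list holds only the nine tokens from the FAQ below, not the full 15+ set, and the sitemap URL is a placeholder.

```python
# Hypothetical sketch: name each AI crawler explicitly and allow it,
# unless the site owner has opted out of that specific crawler.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "anthropic-ai", "PerplexityBot",
    "Google-Extended", "Applebot-Extended", "Bytespider", "Meta-ExternalAgent",
]

def build_robots_txt(opt_outs: set[str], sitemap_url: str) -> str:
    """Emit one group per named crawler: Allow: / by default, Disallow: / if opted out."""
    groups = []
    for bot in AI_CRAWLERS:
        rule = "Disallow: /" if bot in opt_outs else "Allow: /"
        groups.append(f"User-agent: {bot}\n{rule}")
    # Catch-all group for every crawler not named above.
    groups.append("User-agent: *\nAllow: /")
    # The Sitemap directive lets crawlers discover the sitemap from robots.txt.
    return "\n\n".join(groups) + f"\n\nSitemap: {sitemap_url}\n"

print(build_robots_txt({"Bytespider"}, "https://example.com/sitemap.xml"))
```

Giving every crawler its own group, even when the rule is just `Allow: /`, means an opt-out only flips one line. The other crawlers' rules stay untouched.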
Frequently asked questions about AI Crawler
What are the major AI crawlers I should know about?
GPTBot and OAI-SearchBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google's generative AI uses), Applebot-Extended (Apple), Bytespider (ByteDance), and Meta-ExternalAgent (Meta).
Should I block AI crawlers?
Only if you have a specific reason. Most sites benefit from being crawlable, since that is how their content ends up in answers from ChatGPT, Claude, Perplexity, and Google's AI Overviews.
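Before you decide, it helps to check which of the major AI crawlers an existing robots.txt already blocks. The hypothetical `blocked_crawlers` audit below uses Python's standard `urllib.robotparser` and the tokens listed in the previous answer. Python's matching only approximates each vendor's own parser, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the FAQ answer above; extend as new crawlers appear.
MAJOR_AI_CRAWLERS = (
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "anthropic-ai", "PerplexityBot",
    "Google-Extended", "Applebot-Extended", "Bytespider", "Meta-ExternalAgent",
)

def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the crawler tokens that `robots_txt` disallows for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in MAJOR_AI_CRAWLERS if not parser.can_fetch(bot, path)]

# A blanket "User-agent: *" block shuts out every AI crawler too.
print(blocked_crawlers("User-agent: *\nDisallow: /\n"))
# Naming a single crawler blocks only that crawler.
print(blocked_crawlers("User-agent: GPTBot\nDisallow: /\n"))
```

A sitewide `User-agent: *` disallow, often left over from a staging environment, blocks every crawler in the list. Naming one token blocks only that crawler.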
Ship AI Crawler-optimized articles automatically
Mentionwell handles AI Crawler on every published article — alongside the other six optimization targets in this glossary — so you don't have to think about it per post. Drop a domain, approve the first headline, watch the pipeline run.