# What is an AI Crawler? AI Crawler / Bot, explained

> GPTBot, ClaudeBot, PerplexityBot, and friends.

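An AI crawler is a bot that an AI company operates to ingest web content for model training, retrieval indexing, or live grounding of answers. The major names (GPTBot, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, Applebot-Extended) each have their own robots.txt token, and most also send their own user-agent string. Google-Extended and Applebot-Extended are the exceptions: they are control tokens only, and the actual fetching is done by Googlebot and Applebot.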
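One way to see the user-agent side of this in practice is to match request headers against known tokens. The sketch below is a minimal illustration, not any vendor's documented detection method: the function name `identify_ai_crawler` and the sample header are hypothetical, and the operator mapping comes from the crawler list in this article. Google-Extended and Applebot-Extended are omitted because they are robots.txt control tokens and do not appear in request headers.

```python
from typing import Optional, Tuple

# Token -> operator, taken from the crawlers named in this article.
# The substring-matching approach is an assumption made for illustration.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "anthropic-ai": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Bytespider": "ByteDance",
    "Meta-ExternalAgent": "Meta",
}


def identify_ai_crawler(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, operator) if the header contains a known token, else None."""
    ua = user_agent.lower()
    for token, operator in AI_CRAWLER_TOKENS.items():
        if token.lower() in ua:
            return token, operator
    return None


# Simplified sample header, for illustration only.
sample = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
print(identify_ai_crawler(sample))  # ('GPTBot', 'OpenAI')
```

A matching header only shows what the client claims to be. User-agent strings can be spoofed, so token matching alone does not verify a crawler's identity.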

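## How AI Crawler differs from LLMO and robots.txt

An AI crawler is the bot itself. robots.txt is the file that sets its crawl policy, and LLMO is the broader practice of keeping content reachable, parseable, and ingestible for it.

Classic search crawlers (Googlebot, Bingbot) feed ranking systems. AI crawlers feed training corpora and retrieval indexes. Because each crawler matches its own user-agent group in robots.txt, a site can be visible to one and invisible to the other.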
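The separate control can be checked with Python's standard-library `urllib.robotparser`. This is a minimal sketch under stated assumptions: the robots.txt content, the `can_fetch` helper, and the `example.com` URL are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: open to a classic search crawler, closed to one AI crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_ok = can_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/post")
ai_ok = can_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/post")
print(search_ok, ai_ok)  # True False
```

The same page is reachable for the search crawler and off limits for the AI crawler, purely because each one matches a different group in the file.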

## How Mentionwell handles AI Crawler

- Default robots.txt explicitly names and allows 15+ AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, etc.).
- Per-bot allow/disallow control so site owners can opt out of any specific crawler.
- Sitemaps and llms.txt published at canonical paths so crawlers can discover content efficiently.
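Per-bot control of this kind comes down to emitting one user-agent group per crawler. The following is a hedged sketch of that idea, not Mentionwell's actual implementation: the `build_robots_txt` function, the shape of the `policy` dictionary, and the `example.com` sitemap URL are hypothetical.

```python
def build_robots_txt(policy: dict, sitemap_url: str) -> str:
    """Render one User-agent group per crawler token, then a Sitemap line.

    policy maps a robots.txt token to True (allow all) or False (disallow all).
    """
    groups = []
    for token, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    groups.append(f"Sitemap: {sitemap_url}")
    # Blank lines separate groups, which keeps the file easy to read.
    return "\n\n".join(groups) + "\n"


# Hypothetical policy: allow three crawlers, opt out of one.
policy = {
    "GPTBot": True,
    "ClaudeBot": True,
    "PerplexityBot": True,
    "Bytespider": False,
}
robots_txt = build_robots_txt(policy, "https://example.com/sitemap.xml")
print(robots_txt)
```

Naming each crawler explicitly, instead of relying on a single `User-agent: *` group, makes the policy for every bot visible in the file and lets one entry change without touching the others.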

## Frequently asked questions about AI Crawler

### What are the major AI crawlers I should know about?

GPTBot and OAI-SearchBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google generative uses), Applebot-Extended (Apple), Bytespider (ByteDance), Meta-ExternalAgent (Meta).

### Should I block AI crawlers?

Only if you have a specific reason. Most sites benefit from being crawlable, because that is how content shows up in answers from ChatGPT, Claude, Perplexity, and Google's AI Overviews.
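Before deciding, it can help to check what an existing robots.txt already blocks. The sketch below audits a file against the crawler tokens named in this article, using Python's standard-library parser. The `blocked_ai_crawlers` helper and the sample file are hypothetical, and Python's parser only approximates how each real crawler interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens named in this article.
AI_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "anthropic-ai", "PerplexityBot",
    "Google-Extended", "Applebot-Extended", "Bytespider", "Meta-ExternalAgent",
]


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> list:
    """Return the tokens from AI_TOKENS that may not fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in AI_TOKENS if not parser.can_fetch(token, url)]


# Hypothetical file: everything is allowed except one named crawler.
sample = """\
User-agent: *
Allow: /

User-agent: Bytespider
Disallow: /
"""
print(blocked_ai_crawlers(sample))  # ['Bytespider']
```

An empty list means none of the named crawlers is blocked from the checked URL.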

## See also

- [LLMO — LLM Optimization](https://mentionwell.com/llmo): Be reachable, parseable, ingestible.
- [robots.txt — robots.txt for AI](https://mentionwell.com/robots-txt): Crawl policy in the AI era.


---

Canonical URL: https://mentionwell.com/ai-crawler
Live HTML version: https://mentionwell.com/ai-crawler
Site index for AI ingestion: https://mentionwell.com/llms.txt
Full reference: https://mentionwell.com/llms-full.txt