Feature deep-dive

llms.txt & AI ingestion surface

Last updated May 10, 2026

LLMs ingest your content through different surfaces than browsers do. Mentionwell ships every LLM-ingestion surface in use today (llms.txt, llms-full.txt, per-page .md mirrors, RSS, JSON Feed, sitemaps, and an explicit AI crawler allowlist) and generates them automatically on every site.

What gets generated per site

/llms.txt — llmstxt.org-spec executive summary of the site.
/llms-full.txt — exhaustive deep-reference version.
/sitemap.xml — classic sitemap.
/sitemap-llms.xml — curated AI-priority sitemap.
/feed.xml — RSS.
/feed.json — JSON Feed.
/{slug}.md — markdown mirror per article (and per page on the marketing site).
/robots.txt — explicit allowlist for GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, etc.
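
As an illustration of the last item, here is a minimal sketch of how an explicit allowlist could be rendered into robots.txt. It is not Mentionwell's actual generator: the function name build_robots_txt, the AI_CRAWLERS constant, and the example.com URLs are assumptions made for this example. Only the crawler names come from the list above.

```python
# Hypothetical sketch: renders an explicit AI-crawler allowlist as robots.txt.
# The helper name and the example.com URLs are illustrative assumptions.

AI_CRAWLERS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
]


def build_robots_txt(agents, sitemap_urls):
    """Return robots.txt text with one explicit Allow group per crawler."""
    lines = []
    for agent in agents:
        # Each crawler gets its own group that allows the whole site.
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    # Sitemap lines are independent of any User-agent group.
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        build_robots_txt(
            AI_CRAWLERS,
            [
                "https://example.com/sitemap.xml",
                "https://example.com/sitemap-llms.xml",
            ],
        )
    )
```

Naming each crawler in its own group makes the policy explicit and auditable, rather than leaving it to whatever a crawler assumes when no rule mentions it.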

Why it matters

Generative engines read your site differently from search bots. RSS and JSON Feed give them a clean signal of fresh content. The .md mirror lets them ingest answer text without parsing HTML. llms.txt gives them a curated map. The robots.txt allowlist states explicitly which AI crawlers are welcome. Without these, you’re relying on each engine to infer your site’s structure, and it usually does so badly.
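
To make the "curated map" concrete, the sketch below builds a small llms.txt in the layout described at llmstxt.org: a title line, a one-line summary in a blockquote, then sections of links. It is an illustrative assumption rather than Mentionwell's implementation. The Link type, the build_llms_txt function, and the example.com URLs are hypothetical. The links point at .md mirrors so that an engine following the map lands on markdown rather than HTML.

```python
# Hypothetical sketch of an llms.txt builder following the llmstxt.org layout.
# Link, build_llms_txt, and the example.com URLs are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


def build_llms_txt(site_name, summary, sections):
    """Render llms.txt text. `sections` maps a heading to a list of Link."""
    out = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out += [f"## {heading}", ""]
        for link in links:
            # The note after the colon is optional.
            suffix = f": {link.note}" if link.note else ""
            out.append(f"- [{link.title}]({link.url}){suffix}")
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    print(
        build_llms_txt(
            "Example Site",
            "Short summary of what the site covers.",
            {
                "Articles": [
                    Link("Intro", "https://example.com/intro.md", "Overview"),
                    Link("Setup", "https://example.com/setup.md"),
                ]
            },
        )
    )
```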
