How to Build an llms.txt File for Your Business

An llms.txt file is a markdown document placed at a website's root directory that provides large language models with a structured summary of the site's most important content. Unlike robots.txt, which controls crawler access, llms.txt offers context and comprehension guidance for AI systems. This guide covers the specification, step-by-step construction, and honest limitations for founders and technical practitioners building AI search infrastructure.

Key Insights

  1. An llms.txt file is a markdown document at a website's root that provides LLMs with a curated summary of the site's structure and key content pages.
  2. The llms.txt file specification, proposed by Jeremy Howard of Answer.AI in September 2024, defines two complementary files: llms.txt for navigation and llms-full.txt for comprehensive content ingestion.
  3. An llms.txt file requires only one mandatory element: an H1 heading with the project or company name, followed by an optional blockquote summary and H2-organized content sections.
  4. The llms.txt file has achieved adoption among approximately 10% of surveyed domains, concentrated almost entirely in developer tools, AI-native companies, and SaaS documentation sites.
  5. No major LLM-driven search or answer engine has announced official support for the llms.txt file protocol as of early 2026.
  6. Google's John Mueller compared the llms.txt file to the keywords meta tag, noting that no consumer LLM or chatbot fetches the file during inference.
  7. The llms.txt file functions as a comprehension layer rather than an access control mechanism, distinguishing it fundamentally from robots.txt.
  8. Building an llms.txt file forces organizational clarity about which pages carry the highest informational value, a benefit independent of whether AI systems read the file today.

What an llms.txt File Actually Is

An llms.txt file is a plain-text markdown document hosted at the root of a website (yoursite.com/llms.txt) that gives large language models a structured overview of what the site contains and which pages matter most. Jeremy Howard, co-founder of Answer.AI, dropped the specification in September 2024 to solve a specific problem: LLMs struggle to parse complex HTML pages filled with navigation chrome, ads, JavaScript widgets, and boilerplate. The llms.txt file strips all of that away and hands the model a clean summary.

The specification defines two complementary files. The primary llms.txt provides a streamlined navigation index: company name, summary, and organized links to key pages. The secondary llms-full.txt bundles all important content into a single comprehensive document that an AI system could ingest in one pass. Neither file controls access. Neither file blocks crawlers. The llms.txt file is a context delivery mechanism, not a permission layer.

The format choice was deliberate and pragmatic. Markdown, not JSON. Not XML. Not YAML. Howard's reasoning: markdown is the format LLMs handle most naturally. Every major model parses markdown with near-perfect fidelity. Forcing site owners to learn yet another schema format would have killed adoption before it started. Whether this simplicity is a feature or a limitation depends entirely on what you expect the file to accomplish.

How the llms.txt Protocol Works

The llms.txt protocol operates through a rigid section hierarchy designed for top-down priority processing. The file opens with an H1 heading containing the project or company name. This is the only mandatory element. A blockquote follows with a one-to-three sentence summary of what the site does and who it serves. After the summary, H2 sections organize links to the site's most important pages, each link accompanied by a brief description of what that page covers.

The "Optional" section carries special meaning within the specification. Content listed under this heading signals to AI systems that those pages can be skipped when context windows are tight. This is a deliberate engineering choice: the llms.txt protocol acknowledges that models operate under token constraints and gives site owners a way to declare priority hierarchy explicitly.

Minimal llms.txt Example for a B2B Company

# Acme Analytics
> Acme Analytics provides real-time business intelligence dashboards for mid-market companies.

## Core Product
- [Platform Overview](/platform): Full feature walkthrough and pricing
- [API Reference](/docs/api): REST endpoints, authentication, rate limits

## Resources
- [Case Studies](/cases): Customer outcomes with named metrics
- [Methodology](/methodology): How the analytics engine processes data

## Optional
- [Blog](/blog): Industry commentary and product updates
- [Careers](/careers): Open positions

The file reads top to bottom. Priority decreases as you scroll down. AI systems, when they eventually consume these files, would process core sections first and skip "Optional" content when context is limited. The entire protocol bets on a future where LLMs request site context before generating answers, rather than inferring context from raw HTML.
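If a consumer ever materializes, its priority logic might look something like the following minimal sketch: a hypothetical parser that splits an llms.txt file into its H2 sections and drops the "Optional" section when a token budget is exceeded. The function names and the four-characters-per-token estimate are illustrative assumptions; the specification itself defines no parsing or budgeting rules.

```python
def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into its H2 sections.

    Returns a dict mapping each section title to its markdown body.
    (Hypothetical consumer logic; the spec defines no parsing rules.)
    """
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}

def fit_to_budget(sections: dict, token_budget: int) -> dict:
    """Drop the 'Optional' section first when the estimated token count
    exceeds the budget. Tokens are crudely estimated at ~4 chars each."""
    estimate = lambda s: len(s) // 4
    total = sum(estimate(body) for body in sections.values())
    if total > token_budget and "Optional" in sections:
        sections = {t: b for t, b in sections.items() if t != "Optional"}
    return sections
```

Feeding the Acme example above through this sketch under a tight budget would keep "Core Product" and "Resources" and discard "Optional", which is exactly the declared priority hierarchy.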

llms.txt Compared to robots.txt and Schema Markup

The llms.txt file occupies a fundamentally different layer than either robots.txt or schema markup, though all three influence how machines interact with web content. Conflating them is the most common mistake in the discourse, and the SEO-to-AI pipeline has been particularly enthusiastic about confusing these three tools into a single "AI readiness" checklist.

Robots.txt controls access. The file tells crawlers which URLs they can and cannot visit. LLM crawlers like GPTBot, ClaudeBot, and PerplexityBot all respect robots.txt directives. Schema markup provides entity context through JSON-LD: it tells machines what things are, how they relate, and where they fit in a knowledge graph. The llms.txt file does neither. The llms.txt file provides narrative context: what a site is about, which pages matter, and how to interpret them.
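For contrast, the access-control layer looks like this: a robots.txt fragment that keeps one AI crawler out of a section while allowing the others. The paths are placeholders; the user-agent tokens are the ones those crawlers publish.

```text
# Keep GPTBot out of the pricing area; allow other AI crawlers everywhere
User-agent: GPTBot
Disallow: /pricing/

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Everything else: block only the admin area
User-agent: *
Disallow: /admin/
```

Nothing equivalent exists for llms.txt: there is no directive syntax, no enforcement, and no crawler obligated to honor it.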

Comparison of llms.txt, robots.txt, and schema markup across function, format, AI consumption status, and strategic purpose:

| Dimension | llms.txt | robots.txt | Schema Markup (JSON-LD) |
|---|---|---|---|
| Primary Function | Provides narrative context and page priority for LLMs | Controls crawler access to specific URLs | Defines entities and relationships for knowledge graphs |
| File Format | Markdown plain text | Plain text with directive syntax | JSON-LD embedded in HTML head |
| AI System Consumption (2026) | No confirmed LLM fetches the file during inference | All major LLM crawlers respect directives | Indirect influence on citation mechanics under active study |
| Required Technical Skill | Minimal: markdown authorship only | Low: directive syntax is straightforward | Moderate to high: requires Schema.org vocabulary knowledge |
| What Gets Controlled | Content comprehension and page prioritization | Crawl access and rate limiting | Entity disambiguation and type classification |
| When to Choose | Building forward-looking AI context infrastructure | Controlling which pages AI crawlers can access | Anchoring brand entities in knowledge graphs |

The honest assessment: robots.txt is the only file in this trio that every AI system reads today. Schema markup has growing influence on AI citation mechanics, though the magnitude is still being quantified. The llms.txt file has no confirmed consumption by any major AI provider. Building one is a bet on future infrastructure, not a lever that produces measurable results tomorrow.

How to Build Your llms.txt File Step by Step

Building an llms.txt file requires three decisions before any markdown gets written: which pages represent the site's core identity, which pages support that identity with evidence, and which pages are genuinely optional. The entire exercise should take less than an hour for most businesses.

Step 1: Audit Your Page Inventory

The llms.txt file demands an honest page inventory. List every page on the site and categorize each as core (product pages, service pages, pricing, key landing pages), supporting (documentation, case studies, research, methodology pages), or optional (blog archives, careers, press releases). Most businesses have 5-15 core pages and 10-30 supporting pages; everything else is noise. Be ruthless. The point of the llms.txt file is curation, not comprehensiveness.

Step 2: Write the Header Block

The llms.txt file opens with an H1 containing the company name and a blockquote summary of 1-3 sentences. The summary should define what the company does, who it serves, and what makes it different. Write this as if you were explaining the business to a knowledgeable colleague in 30 seconds. No marketing fluff. No aspirational vision statements. No "leading provider of innovative solutions" nonsense.

Step 3: Organize Sections and Deploy

Group core and supporting pages under descriptive H2 headings. Common sections include "Products," "Services," "Documentation," "Research," and "About." Each link gets a brief description of 5-15 words explaining what the page contains. Place genuinely skippable content under the "Optional" heading. Upload the finished file to your site's root directory so it is accessible at yoursite.com/llms.txt. If your CMS supports it, also generate an llms-full.txt that concatenates the full content of core pages into one markdown document.
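The deploy step can be sketched as a small build script. This is an illustrative assumption, not tooling from the specification: it assumes your core pages already exist as markdown files, writes the navigation index to llms.txt, and concatenates full page content into llms-full.txt. The page titles, paths, and file names are hypothetical.

```python
from pathlib import Path

# Hypothetical page inventory: (title, public path, description, markdown source)
CORE_PAGES = [
    ("Platform Overview", "/platform",
     "Full feature walkthrough and pricing", "content/platform.md"),
    ("API Reference", "/docs/api",
     "REST endpoints, authentication, rate limits", "content/api.md"),
]

def build_llms_txt(name: str, summary: str, out_dir: Path) -> None:
    """Write llms.txt (navigation index) and llms-full.txt (concatenated
    page content) into the site's root output directory."""
    index = [f"# {name}", f"> {summary}", "", "## Core Product"]
    full = [f"# {name}", f"> {summary}"]
    for title, path, description, source in CORE_PAGES:
        index.append(f"- [{title}]({path}): {description}")
        full.append(f"\n## {title}\n")
        full.append(Path(source).read_text(encoding="utf-8"))
    (out_dir / "llms.txt").write_text("\n".join(index) + "\n", encoding="utf-8")
    (out_dir / "llms-full.txt").write_text("\n".join(full) + "\n", encoding="utf-8")
```

Running this in a CMS build pipeline keeps both files in sync with the source pages, so the llms.txt file never drifts from the content it claims to summarize.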

Where the llms.txt File Falls Short

The llms.txt file has a fundamental adoption problem that no amount of specification elegance can fix: no major AI system reads it. Google's John Mueller compared llms.txt to the keywords meta tag, a comparison that should make anyone who remembers the early 2000s wince. His assessment in mid-2025 was direct: "No AI system currently uses llms.txt. It's super-obvious if you look at your server logs." He was not being coy. Server logs do not lie.

The data is merciless. SE Ranking's analysis found no statistical correlation between llms.txt implementation and AI citation frequency. Rankability's scan of the top 1,000 websites found zero implementations. The file's strongest adoption cluster is AI-native companies and developer documentation platforms, which are precisely the organizations that need it least because their content is already highly structured and markdown-native.

The specification also lacks a governance body. The llms.txt proposal lives on GitHub as an informal document maintained by Answer.AI. No W3C working group. No IETF RFC. No formal versioning process. Robots.txt evolved through widespread de facto adoption and eventually received a formal specification from Google. The llms.txt file is still waiting for its first confirmed consumer, let alone an institutional champion. There is also no verification mechanism: a company could claim anything in its llms.txt file, and no AI system validates those claims against actual page content.

Who Should Build an llms.txt File Right Now

The llms.txt file is worth building today for a narrow set of businesses, and not worth the effort for most others. The honest framework: if creating the file takes less than one hour, build it. If it requires a multi-week project involving content audits and cross-functional approvals, wait until the protocol has a confirmed consumer.

Build now if:

  • The business maintains developer documentation, API references, or technical product guides. Developer-facing content is where llms.txt adoption is concentrated, and AI coding assistants represent the most plausible early consumers.
  • The business operates in the AI or developer tools space. Customers and users in these markets expect an llms.txt file. Anthropic, Cursor, and Vercel all maintain one.
  • The team wants the organizational clarity exercise. Building an llms.txt file forces a hard conversation about which pages actually matter, and that audit has value independent of whether any machine reads the output.

Wait if:

  • The business is a local service provider, e-commerce store, or B2C brand. The llms.txt file has zero confirmed impact on consumer-facing AI answers.
  • The immediate AI visibility priority is getting cited in ChatGPT, Gemini, or Perplexity. Schema markup, entity infrastructure, and content architecture produce measurable results today. Allocate resources there first.
  • Building the file would divert engineering time from higher-impact projects. An llms.txt file with no confirmed reader is not worth pulling engineers off work that affects revenue.

The strategic posture is preparation, not optimization. Building an llms.txt file is a low-cost bet that AI providers will eventually standardize on a context delivery protocol. The bet costs almost nothing to place. The payoff is uncertain but structurally plausible.

How This All Fits Together

  • The llms.txt file provides context for large language models, complements robots.txt and schema markup (JSON-LD), and requires markdown formatting.
  • Robots.txt controls access for AI crawlers (GPTBot, ClaudeBot, PerplexityBot) and precedes the llms.txt file in adoption maturity.
  • Schema markup enables entity disambiguation for knowledge graphs and feeds into AI citation mechanics.
  • The llms-full.txt file extends llms.txt with comprehensive content, bundling full documentation into a single markdown document.
  • Jeremy Howard proposed the llms.txt specification via Answer.AI.
  • AI search visibility depends on content architecture and entity infrastructure, and would benefit from the llms.txt file if the protocol gains consumer support.
  • A RAG pipeline consumes web content at inference time and would benefit from an llms.txt context layer for page prioritization.

Final Takeaways

  1. Build your llms.txt file as a one-hour exercise, not a quarter-long initiative. The specification is intentionally simple: H1 company name, blockquote summary, H2 sections with links. Most businesses can complete the file in a single sitting using the step-by-step process above.
  2. Prioritize your page inventory with honesty. The core value of building an llms.txt file is the audit it forces: which pages actually represent the business, and which are noise? Use the exercise to sharpen content architecture regardless of AI consumption.
  3. Do not treat an llms.txt file as a substitute for entity infrastructure. Schema markup, structured content, and knowledge graph anchoring produce measurable AI visibility results today. The llms.txt file is a forward-looking complement, not a replacement for strategies with confirmed impact.
  4. Monitor server logs for llms.txt requests. The moment AI crawlers begin fetching the file, the investment thesis shifts from speculative to active. Until then, maintain the file with minimal ongoing effort.
  5. Invest the bulk of AI visibility resources in strategies with confirmed impact: entity-linked schema markup, retrieval-optimized content architecture, and brand fact infrastructure that AI systems already consume.
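The log check in takeaway 4 can be automated in a few lines. A minimal sketch, assuming a common combined-format access log: it flags every request for /llms.txt made by a known AI crawler. The user-agent substrings are the names those crawlers publish; the log path in the usage note is a placeholder.

```python
# Known AI crawler user-agent substrings (as published by their operators)
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def llms_txt_fetches(log_lines):
    """Yield (crawler_name, raw_line) for every request to /llms.txt
    made by a known AI crawler."""
    for line in log_lines:
        if '"GET /llms.txt' not in line:
            continue
        for bot in AI_CRAWLERS:
            if bot in line:
                yield bot, line
                break
```

Point it at your access log (e.g. `open("/var/log/nginx/access.log")`, path placeholder) on a schedule; the first non-empty result is the signal that the investment thesis has shifted.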

FAQs

What is an llms.txt file and what does it do?

An llms.txt file is a markdown document placed at a website's root directory that provides large language models with a structured summary of the site's most important content. The file contains an H1 company name, a blockquote summary, and H2 sections organizing links to key pages with brief descriptions. The llms.txt file helps AI systems understand site structure without parsing complex HTML.

How is an llms.txt file different from robots.txt?

An llms.txt file provides context and comprehension guidance for AI systems, while robots.txt controls which URLs crawlers can access. Robots.txt is an access control mechanism; the llms.txt file is a context delivery mechanism. All major AI crawlers read robots.txt today, but no AI system has confirmed consumption of llms.txt files during inference.

Do any AI systems actually read llms.txt files?

No major LLM-driven search or answer engine has announced official support for the llms.txt protocol as of early 2026. Google's John Mueller stated that no consumer LLM or chatbot fetches the file during inference. Server log analysis consistently shows no llms.txt requests from AI crawlers like GPTBot, ClaudeBot, or PerplexityBot.

What format does an llms.txt file use?

An llms.txt file uses standard markdown formatting. The specification requires an H1 heading with the company or project name, followed by an optional blockquote summary and H2 sections containing descriptive links to key pages. A companion llms-full.txt file can bundle complete page content into a single comprehensive markdown document for deeper AI ingestion.

What are the limitations of an llms.txt file for AI visibility?

The llms.txt file has no confirmed impact on AI citations or search visibility as of early 2026. SE Ranking's analysis found no statistical correlation between llms.txt implementation and AI citation frequency. The specification lacks a formal governance body, has no content verification mechanism, and has achieved meaningful adoption primarily among AI-native companies and developer documentation platforms rather than mainstream websites.

Who should create an llms.txt file for their business?

Businesses maintaining developer documentation, API references, or technical products benefit most from creating an llms.txt file. Companies in the AI and developer tools space should build one because their audiences expect the file. Most consumer-facing businesses, local service providers, and e-commerce stores should prioritize schema markup and entity infrastructure over llms.txt implementation until the protocol gains confirmed AI consumer support.

What is the difference between llms.txt and llms-full.txt?

The llms.txt file serves as a streamlined navigation index linking to key pages with brief descriptions of 5-15 words each. The llms-full.txt file bundles all important documentation content into a single comprehensive markdown document for complete ingestion. Both files work together within the specification proposed by Jeremy Howard: llms.txt for quick orientation and prioritization, llms-full.txt for deep content consumption by AI systems with sufficient context windows.

All statistics verified as of March 2026. This article is reviewed quarterly. Strategies, adoption rates, and AI system behavior may have changed.

About the Author

Kurt Fischman is the CEO and founder of Growth Marshal, an AI-native search agency that helps challenger brands get recommended by large language models.
