LLMs.txt: The New Format for the AI Era
Why your brand needs to be cited by AI (and not just by Google)
Traditional search is no longer the first point of contact: more than 25% of US users already ask ChatGPT, Claude, or Perplexity first. If your content doesn’t appear in their responses, you disappear from the conversation. The new SEO isn’t about ranking among ten blue links but about getting the model to name you when someone asks about your niche.
This shift has fueled speculation about “magic formats” that would make AIs read us better. The latest on the list is LLMs.txt.
What is LLMs.txt and what problem does it solve?
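LLMs.txt is a plain text file, inspired by the classic robots.txt, that lets website owners specify how AI systems should handle their content. Its stated purpose is to tell AI crawlers which pages of a site are relevant, offer summaries in markdown, and, in theory, make semantic indexing easier. Descriptions of the format vary: the original 2024 proposal at llmstxt.org defines a markdown file with a title, a short summary, and lists of links to key pages, while the attribution-oriented version covered below uses robots.txt-style directives.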
The problem it solves
Currently, AI training data collection occurs largely without explicit permissions or clear attribution mechanisms. This creates several problems:
- Lack of Transparency: Content creators often don’t know if their work is being used for AI training
- No Attribution: Original authors don’t receive credit when their content contributes to AI models
- Ethical Concerns: Some creators may not want their work used for AI training at all
- Quality Control: No mechanism exists to ensure training data quality and source verification
LLMs.txt file format and directives
The file is placed at the website root (e.g., https://example.com/llms.txt) and can include directives like:
```text
# LLMs.txt - AI Training Data Attribution
User-agent: *
Allow: /blog/
Disallow: /private/
Attribution: required
License: CC-BY-4.0
Contact: ai-licensing@example.com
```
Main directives
- Allow/Disallow: Specify what content can be used for training
- Attribution: Require proper attribution when content is used
- License: Specify license terms for AI training use
- Contact: Provide contact information for AI licensing inquiries
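The directives above reuse robots.txt syntax, so a crawler could read them with a small line-based parser. The sketch below is a hypothetical illustration: `parse_llms_txt` and `Group` are invented names, and the grouping rule (consecutive `User-agent` lines share one group, and every other field attaches to the current group) is borrowed from robots.txt (RFC 9309), not taken from any published LLMs.txt specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Directives that apply to one set of user agents (hypothetical structure)."""
    user_agents: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)
    extras: dict[str, str] = field(default_factory=dict)  # attribution, license, contact, ...


def parse_llms_txt(text: str) -> list[Group]:
    """Parse robots.txt-style LLMs.txt directives into groups (assumed format)."""
    groups: list[Group] = []
    current: Group | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()  # field names treated as case-insensitive, as in robots.txt
        if key == "user-agent":
            # Consecutive User-agent lines share one group; after any rule, start a new one.
            if current is None or current.allow or current.disallow or current.extras:
                current = Group()
                groups.append(current)
            current.user_agents.append(value)
        elif current is None:
            continue  # ignore directives that appear before any User-agent line
        elif key == "allow":
            current.allow.append(value)
        elif key == "disallow":
            current.disallow.append(value)
        else:
            current.extras[key] = value
    return groups
```

Storing unrecognized fields in `extras` instead of rejecting them lets the parser tolerate directives it does not know about, which matters for a format that is still changing.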
Does it actually work? Market reality
OpenAI, Anthropic, and Perplexity have referenced the standard in internal documentation. OpenAI File Search itself mentions the usefulness of plain text files for building embeddings, which generated optimistic headlines. But referencing a format isn’t the same as prioritizing it.
Do SEO heavyweights use it?
To settle the question, we checked six leading SEO sites: Ahrefs, Moz, HubSpot, Semrush, Backlinko, and Wordstream. All six return a 404 for /llms.txt. The New York Times, the BBC, and the brands AIs cite most often in their responses don’t publish one either.
ChatGPT, for its part, can read navigation, footers, YouTube transcripts with timestamps, and complete articles without needing this file. The conclusion is clear: the problem isn’t the format, it’s content quality and quantity.
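The 404 check described above is easy to repeat. The sketch below requests `/llms.txt` at the root of each domain using only Python's standard library and reports the final HTTP status; the function names and the `llms-txt-check/0.1` user-agent string are hypothetical. Because `urlopen` follows redirects and some servers redirect missing pages to a page that returns 200, a 200 is only a hint that the file exists.

```python
from __future__ import annotations

from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def llms_txt_url(site: str) -> str:
    """Build the root-level /llms.txt URL for a bare domain or any URL on it."""
    if "://" not in site:
        site = "https://" + site  # assume HTTPS when no scheme is given
    parts = urlsplit(site)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def llms_txt_status(site: str, timeout: float = 10.0) -> int | None:
    """Return the final HTTP status for /llms.txt, or None if no response arrives."""
    # Hypothetical client name; any descriptive User-Agent would do.
    request = Request(llms_txt_url(site), headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status  # status after any redirects
    except HTTPError as err:
        return err.code  # 404, 403, 5xx, ... are raised as HTTPError
    except OSError:
        return None  # DNS failure, refused connection, TLS error, or timeout


def report(sites: list[str], timeout: float = 10.0) -> dict[str, int | None]:
    """Map each site's /llms.txt URL to its status code (one request per site)."""
    return {llms_txt_url(site): llms_txt_status(site, timeout) for site in sites}
```

For example, `report(["ahrefs.com", "moz.com"])` returns a dictionary keyed by each `/llms.txt` URL, with `None` for hosts that could not be reached.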
Google is becoming a publisher with its AI
Meanwhile, Google answers generic queries such as “what is inbound marketing”, “best AI courses”, or “what is urban mobility” directly, republishing the information in its own results. This turns Google into both competitor and distributor, a scenario reminiscent of Facebook Instant Articles or Apple News, but powered by language models.
The official Google I/O 2024 documentation confirms that content indexed for AI Overviews comes from the same traditional index; there’s no special LLMs.txt signal.
Potential benefits for creators and AI developers
For content creators
- Control over usage: Creators can explicitly control how their work is used
- Opt-out capability: Ability to completely exclude themselves from AI training
- Licensing revenue: Specify license terms and potentially generate income
For AI developers
- Clear guidelines: Explicit guidance on what content they can use and how
- Ethical compliance: Follow ethical AI development practices
- Quality assurance: Better tracking of training data sources
Implementation challenges
Adoption and standardization
- Major platforms need to implement the standard
- Content creators need to understand and use it
- AI developers need to respect the directives
Practical enforcement
Ensuring AI developers actually follow the directives requires:
- Technical enforcement mechanisms
- Legal frameworks for compliance
- Audit and verification processes
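Of these three requirements, auditing is the easiest to sketch. Assuming the robots.txt precedence rule from RFC 9309 (the longest matching path wins, and `Allow` wins a tie), a site owner could scan access logs for AI crawler requests that hit disallowed paths. The function names, the `(user_agent, path)` log format, and the crawler tokens in the usage note are illustrative assumptions, and the sketch ignores the `*` and `$` wildcards that robots.txt supports.

```python
from __future__ import annotations

from collections.abc import Iterable


def is_allowed(path: str, allow: list[str], disallow: list[str]) -> bool:
    """Longest matching prefix wins; Allow wins ties; no matching rule means allowed."""
    # Empty rules (e.g. "Disallow:" with no value) match nothing, as in robots.txt.
    best_allow = max((len(p) for p in allow if p and path.startswith(p)), default=-1)
    best_disallow = max((len(p) for p in disallow if p and path.startswith(p)), default=-1)
    return best_allow >= best_disallow


def find_violations(
    log: Iterable[tuple[str, str]],
    allow: list[str],
    disallow: list[str],
    ai_agent_tokens: set[str],
) -> list[tuple[str, str]]:
    """Return (user_agent, path) entries where an AI crawler fetched a disallowed path."""
    violations = []
    for user_agent, path in log:
        # Real user-agent strings are long, so match on a product token substring.
        is_ai_crawler = any(token in user_agent for token in ai_agent_tokens)
        if is_ai_crawler and not is_allowed(path, allow, disallow):
            violations.append((user_agent, path))
    return violations
```

Tokens such as `GPTBot` (OpenAI) or `ClaudeBot` (Anthropic) could populate `ai_agent_tokens`. A real audit would also need to confirm that a request claiming one of those names comes from that operator, since user-agent strings are trivial to spoof.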
Current state and outlook
Sites like MyLLMtxt.com promote it as the “basic AI Intelligence tool,” while Directory.llmstxt.cloud collects implementation examples.
The LLMs.txt standard is still at the proposal stage, but its supporters see it as an important step toward more ethical and transparent AI development. Various organizations and researchers are working on:
- Development and refinement of the standard
- Implementation tools and libraries
- Industry adoption strategies
Conclusion: content before format
LLMs.txt is hype that helps sell tools, not a confirmed ranking signal. If your goal is to appear in ChatGPT or Claude, focus your efforts on long, well-referenced articles, FAQPage and HowTo schema, and presence in sources that AIs already consume (Wikipedia, Stack Overflow, academic repositories).
In the era of generative responses, being a primary source, whether of text or video content or of transactions, is worth more than keeping up with trendy formats.
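Of the recommendations above, FAQPage schema is the most mechanical to implement. As a minimal sketch, assuming you already have question and answer pairs, the snippet below emits schema.org `FAQPage` JSON-LD (a list of `Question` items, each with an `acceptedAnswer`) and wraps it in a `<script type="application/ld+json">` tag. The helper names `faq_jsonld` and `script_tag` are hypothetical.

```python
from __future__ import annotations

import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in a script element for the page's HTML."""
    # Escape "</" so an answer containing "</script>" cannot close the tag early;
    # "<\/" is still valid JSON and decodes back to "</".
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'
```

The escaping step matters only when answers can contain markup, but it costs nothing and keeps user-supplied text from breaking the page.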
Implementation example
For those interested in experimenting, here’s a basic LLMs.txt file:
```text
# LLMs.txt for AI Training Data Attribution
User-agent: *
Allow: /blog/
Allow: /articles/
Disallow: /admin/
Disallow: /private/
Attribution: required
License: CC-BY-4.0
Contact: ai-licensing@example.com
Last-Modified: 2024-01-30
```
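Generating the file from configuration, rather than editing it by hand, keeps fields such as Last-Modified current. The sketch below reproduces the example above; `render_llms_txt`, `write_llms_txt`, and their parameters are hypothetical names, and the field order simply mirrors the example rather than any mandated order.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def render_llms_txt(
    allow: list[str],
    disallow: list[str],
    license_id: str,
    contact: str,
    last_modified: date,
    attribution: str = "required",
    user_agent: str = "*",
) -> str:
    """Render the robots.txt-style LLMs.txt used in this article (assumed format)."""
    lines = ["# LLMs.txt for AI Training Data Attribution", f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [
        f"Attribution: {attribution}",
        f"License: {license_id}",
        f"Contact: {contact}",
        f"Last-Modified: {last_modified.isoformat()}",  # ISO 8601 date, e.g. 2024-01-30
    ]
    return "\n".join(lines) + "\n"


def write_llms_txt(web_root: Path, text: str) -> Path:
    """Write llms.txt into the web root so the server exposes it at /llms.txt."""
    target = web_root / "llms.txt"
    target.write_text(text, encoding="utf-8")
    return target
```

Calling `render_llms_txt` from a deploy script with `date.today()` would refresh Last-Modified on every release.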
The LLMs.txt standard is still in development and its practical effectiveness remains unproven. Content creators and AI developers should stay informed about its evolution, but prioritize content quality over experimental formats.