Your Documentation Retrieval Quality Is Broken. Here's How to Fix It.
AI agents don't just read your docs — they retrieve from them. And most documentation is structured so badly for retrieval that agents regularly act on the wrong context. Here's why documentation retrieval quality matters and how to fix it.
There’s a post on Hacker News right now — 713 points, 273 comments — about using AI to write better code more slowly. The core insight: AI doesn’t replace code review. It gives you a first draft, and the real work starts with careful revision.
That post is about code quality. But there’s a subthread in those comments that points to something more fundamental, something nobody in the documentation space is talking about:
“There is a lot of scaffolding/documentation you can put in place to make this process way more effective.”
The scaffolding matters. The documentation structure matters. Not just for whether an AI agent reads your docs, but for whether it retrieves the right part of your docs at the right time.
The Retrieval Problem Nobody Mentions
Most writing about AI agents and documentation focuses on accuracy: “Your docs are wrong, so the AI generates wrong code.” That’s a real problem. boringdocs has written about it extensively.
But there’s a second problem that’s just as damaging and much less visible: retrieval quality.
When an AI agent needs to use your API, it doesn’t read your entire documentation set linearly. It searches. A retrieval system embeds the agent’s query, compares it against embedded chunks of your docs, and returns the top-k most similar passages.
If your documentation is structured so that the relevant passage is buried inside a wall of text, interspersed with three other unrelated topics, the retrieval system picks up noise. It returns partial context. The agent acts on incomplete information.
Not because your docs are inaccurate. Because they’re poorly retrievable.
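To make the mechanism concrete, here is a minimal sketch of top-k retrieval. It assumes a toy bag-of-words vector in place of a real embedding model, and the three chunks are hypothetical, so treat it as an illustration of the ranking step rather than a production retriever.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and keep the k best."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical chunks: one focused, one mixed-topic, one unrelated.
focused = "Pagination: pass the cursor parameter to fetch the next page of orders."
mixed = (
    "Authenticate with an API key. Requests are rate limited per minute. "
    "Pagination uses a cursor. Errors are returned as JSON."
)
unrelated = "Webhooks deliver events to your server."

results = top_k("how does pagination work", [mixed, unrelated, focused], k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk[:50]}")
```

The focused chunk ranks first, but the mixed-topic chunk still lands in the top two, and most of what it contributes to the agent’s context has nothing to do with pagination.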
Why Documentation Retrieval Quality Is Broken
There are three structural reasons most documentation sets retrieve badly:
1. Topics Are Mixed Together
A single page covers authentication, rate limiting, pagination, and error handling — in that order, as one long scroll. When an agent queries “how does pagination work?”, the retrieval system returns the entire page. The pagination content is 20% of what the agent sees. The other 80% is noise.
Humans handle this fine. We skim. We scan for the header we need. We skip what’s irrelevant. AI agents don’t skim — they weight everything in the retrieved context window. Irrelevant content dilutes relevant content.
2. Context Is Implicit
Most docs say “Pass the cursor parameter” without specifying which endpoint, what format, what happens at the boundary, and what errors to expect. A human infers from surrounding context. An agent working from a retrieved chunk often can’t.
The problem isn’t that the docs are wrong. It’s that the critical context is on a different page — or three paragraphs above the chunk that got retrieved.
3. Page Boundaries Don’t Match Conceptual Boundaries
Documentation is organized for publishing, not for retrieval. A “Getting Started” page might contain one paragraph that’s critical for a specific integration task. But the retrieval system returns the entire “Getting Started” page when the agent needs that one paragraph.
The page is the unit of retrieval. But the page is rarely the right unit of knowledge.
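If you control the indexing step, one common workaround is to index sections instead of whole pages. The sketch below assumes the docs are Markdown and that `## ` headings mark conceptual boundaries; both are assumptions about your docs, and `split_by_heading` is a hypothetical helper, not part of any library.

```python
import re


def split_by_heading(page: str) -> list[tuple[str, str]]:
    """Split a Markdown page into (heading, body) chunks at each '## ' heading."""
    chunks: list[tuple[str, str]] = []
    heading, lines = "(intro)", []
    for line in page.splitlines():
        match = re.match(r"^##\s+(.*)", line)
        if match:
            # Close the previous section before starting a new one.
            if any(ln.strip() for ln in lines):
                chunks.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):
        chunks.append((heading, "\n".join(lines).strip()))
    return chunks


# Hypothetical mixed-topic page.
page = (
    "# API Guide\n"
    "Welcome.\n"
    "## Authentication\n"
    "Send an API key.\n"
    "## Pagination\n"
    "Pass the cursor parameter.\n"
)
chunks = split_by_heading(page)
```

Each tuple can then be embedded and retrieved on its own, so a pagination query no longer drags the authentication section along with it.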
What Good Retrieval-Aware Docs Look Like
Research on documentation for AI agents — including practical experiments in indexing and classifying technical docs locally — points to a set of patterns that dramatically improve retrieval quality:
One Concept Per Page
Each page should cover exactly one thing. Not one topic. One concept. The authentication flow for a specific endpoint. The pagination behavior for a specific resource. The error format for a specific error class.
This feels wrong if you’re used to writing for humans. Humans prefer long-form comprehensive pages. But if your documentation has an AI audience — and it does — one concept per page is the single highest-impact structural change you can make.
Front-Load the Context
The first paragraph of every page should answer: what is this page about, what does it apply to, and what does it not apply to. Not as a summary — as a semantic anchor.
Good example:
“This page describes the cursor-based pagination format used by the /orders and /invoices endpoints. It does not apply to /users, which uses offset pagination. For general pagination concepts, see [Pagination Overview].”
In roughly 30 words, the agent now knows exactly when this content is relevant and when it isn’t. That’s retrieval context, not just documentation.
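The same scope statement can also be captured as data, so a retrieval pipeline can check relevance before it ever looks at similarity scores. This is a minimal sketch: `PageScope` and its fields are hypothetical names chosen for illustration, and the endpoint values come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageScope:
    """Hypothetical record of what a page covers and what it excludes."""
    summary: str
    applies_to: frozenset[str]
    excludes: frozenset[str]

    def covers(self, endpoint: str) -> bool:
        """True only when the endpoint is explicitly in scope."""
        return endpoint in self.applies_to and endpoint not in self.excludes


scope = PageScope(
    summary="Cursor-based pagination format",
    applies_to=frozenset({"/orders", "/invoices"}),
    excludes=frozenset({"/users"}),
)

print(scope.covers("/orders"))  # in scope
print(scope.covers("/users"))   # explicitly excluded
```

Whether you store the scope as prose, as metadata, or as both is a design choice. The prose version helps the agent once the chunk is retrieved; the structured version helps the retriever decide whether to return it at all.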
Use Machine-Readable Markers
There are emerging standards for making documentation explicitly retrievable by agents. The llms.txt format — showing up at the roots of websites next to robots.txt — is one of them. It’s a Markdown file that maps your documentation structure for AI agents.
But you can also add structure at the page level:
- Use consistent heading patterns (## Parameters, ## Response Format, ## Errors)
- Use structured error tables instead of prose error descriptions
- Use code examples that are self-contained (no “see above” references)
The goal: every chunk the agent retrieves should be independently meaningful. No forward references. No “as described earlier.” No “see the intro for context.”
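A rough way to audit this is to scan each chunk for phrases that point outside it. The sketch below is a heuristic under stated assumptions: the phrase list is a small, hypothetical starting set, and a regex will miss subtler dependencies such as an undefined variable carried over from an earlier example.

```python
import re

# Hypothetical starter list of phrases that signal a dependency on other text.
BACK_REFERENCES = re.compile(
    r"\b(see above|as described earlier|as mentioned (?:above|earlier)|see the intro)\b",
    re.IGNORECASE,
)


def dangling_references(chunk: str) -> list[str]:
    """Return every out-of-chunk reference phrase found, lowercased, in order."""
    return [match.group(0).lower() for match in BACK_REFERENCES.finditer(chunk)]


# Hypothetical chunks.
dependent = "Pass the cursor as described earlier. See above for auth."
standalone = "Pass the cursor returned in next_cursor to GET /orders."

print(dangling_references(dependent))
print(dangling_references(standalone))
```

An empty list does not prove a chunk is self-contained, but a non-empty one is a reliable sign that it is not.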
Classify Your Pages
One of the more interesting findings from experiments in improving local doc quality for AI agents is that page classification dramatically improves retrieval. When each page is tagged with its type — tutorial, reference, explanation, error guide — the retrieval system can filter by type instead of relying purely on semantic similarity.
You can classify pages manually or with an automated classifier. Either way, when an agent asks “how do I authenticate?”, the system retrieves the authentication reference page, not the tutorial that happens to mention authentication once.
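Here is a minimal sketch of type-filtered retrieval. It assumes each page already carries a type tag, uses plain word overlap as a stand-in for semantic similarity, and the `Page` record and sample pages are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    page_type: str  # e.g. "tutorial", "reference", "explanation", "error guide"
    text: str


def words(text: str) -> set[str]:
    """Lowercase word set; a toy stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, pages: list[Page], page_type: str, k: int = 1) -> list[Page]:
    """Filter by page type first, then rank the survivors by word overlap."""
    candidates = [p for p in pages if p.page_type == page_type]
    q = words(query)
    return sorted(candidates, key=lambda p: len(q & words(p.text)), reverse=True)[:k]


# Hypothetical pages.
pages = [
    Page("Quickstart", "tutorial", "How do I get started? Authenticate, then list orders."),
    Page("Authentication", "reference", "Authenticate each request with an API key header."),
    Page("Pagination", "reference", "Pass the cursor parameter to fetch the next page."),
]

hits = retrieve("how do I authenticate", pages, page_type="reference")
print([page.title for page in hits])
```

In this toy data the tutorial shares the most words with the query, so similarity alone would return it. The type filter is what puts the reference page on top.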
The Accuracy Trap
Here’s where this connects back to accuracy.
When an agent retrieves the wrong chunk, it doesn’t just get confused. It generates code based on wrong context. That code compiles. It might even pass basic tests. But it’s wrong for the specific operation the developer asked about.
Your docs were “accurate” — the information existed somewhere in your documentation set. But the agent retrieved the wrong part. From the agent’s perspective, your docs told it to do the wrong thing.
This is why documentation retrieval quality is not the same as documentation accuracy. You can have perfectly accurate docs that produce consistently wrong agent behavior, because the structure defeats retrieval.
What Boringdocs Does About This
This is one of the reasons boringdocs exists as a validation layer — not just checking whether your docs are accurate, but checking whether the structure of your documentation actually supports the ways agents consume it.
Some of the checks boringdocs runs:
- Chunk independence: Can each section of your docs stand alone, or does it require context from elsewhere?
- Context frontloading: Does each page explicitly state what it covers and what it doesn’t?
- Retrieval noise ratio: When an agent queries a specific topic, how much irrelevant content is co-retrieved?
These are structural properties of documentation. They’re not about writing quality. They’re about architecture.
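How boringdocs computes these checks isn’t described here. As one hypothetical definition, a retrieval noise ratio could be the share of retrieved words that come from chunks irrelevant to the query. The sketch assumes someone, or something, has already labeled each retrieved chunk as relevant or not.

```python
def noise_ratio(retrieved: list[tuple[str, bool]]) -> float:
    """Share of retrieved words that sit in chunks labeled irrelevant.

    `retrieved` holds (chunk_text, is_relevant) pairs. The labels are an
    assumption: they might come from a human audit or a separate classifier.
    """
    total = sum(len(text.split()) for text, _ in retrieved)
    if total == 0:
        return 0.0
    noise = sum(len(text.split()) for text, relevant in retrieved if not relevant)
    return noise / total


# Hypothetical retrieval result: 2 relevant words, 8 irrelevant ones.
retrieved = [
    ("cursor pagination", True),
    ("auth keys rate limits errors codes retries headers", False),
]
ratio = noise_ratio(retrieved)
print(f"{ratio:.0%} of the retrieved context is noise")
```

Tracking this number per query before and after a restructuring gives you a simple signal for whether splitting pages actually reduced the noise.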
The Takeaway
If your documentation has an AI audience — and it does, whether you intended it or not — then you have two problems: accuracy and retrieval quality.
Accuracy gets all the attention. It’s the obvious problem. You fix it by keeping docs in sync with code. boringdocs exists to solve exactly that.
But retrieval quality is the invisible problem. Your docs can be perfectly accurate and still produce wrong agent behavior because the structure makes the right information unfindable.
Fix both. Validate accuracy with boringdocs. And audit retrieval quality: break mixed-topic pages into single-concept pages, front-load context, classify by type, and make every chunk self-contained.
Your AI agents will act on the wrong context far less often. And your developers will stop wondering why the agent keeps missing things that “are in the docs”, just not where the agent can find them.