Your Docs Got an Agent Score of 72. Now What?

Mintlify just released a free tool that scores how “agent-ready” your documentation is. You paste in your docs URL, it runs an analysis, and gives you a number. 72 out of 100. 85 out of 100.

It’s a good tool. And it’s missing the point.

The Score

Mintlify’s Agent Score checks things like: Is your structure clear? Are your headings descriptive? Is the content organized so an AI agent can navigate it without getting lost? Does your markup follow patterns that LLMs parse well?

These are real problems. A score of 40 means your docs are probably a wall of text with no hierarchy, buried examples, and inconsistent formatting. An agent trying to use those docs will struggle. The score is useful as a structural diagnostic.

But here’s what the score doesn’t check: Is any of it true?

Structure ≠ Accuracy

Imagine two documentation sets for the same API.

Set A has perfect structure. Clear headings. Logical navigation. Every endpoint has a description, parameters, and an example response. The Agent Score: 94.

But the response schema for POST /users describes a field called user_id as an integer. The actual API returns a UUID string. The error codes listed don’t include the ones the API actually returns. The documented rate limit is 100 requests per minute. The real limit is 60.

Set B has mediocre structure. Some headings are vague. The navigation is flat. Examples are inline instead of in separate sections. The Agent Score: 61.

But every endpoint, field, and example matches the actual API. The docs were validated against the codebase on the last commit. The structure isn’t pretty, but the content is correct.

Which set would you rather an AI agent read?

An agent reading Set A will generate flawless-looking code that fails at runtime. The structure is beautiful. The code is wrong.

An agent reading Set B will generate code that works. The structure is rough. The code is right.

The Agent Score can’t tell these two apart. It measures findability. It doesn’t measure truth.
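
To make the Set A failure concrete, here is a minimal Python sketch. The function name and the sample payload are hypothetical, made up for illustration; only the user_id mismatch comes from the example above.

```python
def parse_user_per_docs(payload: dict) -> int:
    """Client code an agent might write from Set A, which documents user_id as an integer."""
    return int(payload["user_id"])


# Hypothetical response from the real API, which uses a UUID string.
actual_response = {"user_id": "123e4567-e89b-12d3-a456-426614174000"}

try:
    parse_user_per_docs(actual_response)
    outcome = "parsed"
except ValueError:
    # int() cannot parse a hyphenated UUID string, so the call fails at runtime.
    outcome = "runtime failure"

print(outcome)  # prints "runtime failure"
```

The parsing function is exactly what the docs asked for, so nothing about it looks wrong in review. A structural score has no way to catch this.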

The Industry’s Scoring Obsession

Mintlify isn’t alone in the scoring game. The documentation industry has been racing to define metrics: completeness scores, freshness scores, readability scores. All of them measure something real. None of them measure accuracy.

This is the food-rating problem. You can score a restaurant on ambiance, service, presentation, and menu design. A restaurant can excel at all four and still serve you food that makes you sick. If the meal is contaminated, nothing else matters.

Documentation is the same way. If the content is wrong — if the endpoint descriptions don’t match the API, if the examples return different shapes than the real response, if the documented error codes don’t include the ones that actually fire — then structure, readability, and freshness are irrelevant.

A perfectly-organized lie is still a lie.

What Mintlify’s $45M Raise Tells Us

Last month, Mintlify announced a $45M Series B led by Andreessen Horowitz and Salesforce Ventures. It’s a significant bet that documentation is a large market that deserves serious infrastructure.

They’re right about the market. Documentation is infrastructure. It’s the API contract, the onboarding material, the support deflection layer, and increasingly, the training data for AI agents. The companies that get documentation right will ship faster and support less.

But the $45M is going toward a specific vision of documentation infrastructure: better authoring, better editing, better AI-assisted writing. Mintlify is building a better word processor for docs.

A better word processor is great. If your docs are accurate, it makes them faster to produce and easier to maintain.

If your docs are inaccurate, it makes you write inaccurate docs faster.

The missing piece in the entire docs-platform landscape — Mintlify included — is validation. A system that reads what you wrote, reads your code, and tells you where they disagree. Not at publish time. On every commit. Continuously.

The Two Problems of AI-Native Docs

AI agents need documentation to solve two problems:

1. Find the right information. This is a navigation and structure problem. Mintlify’s Agent Score addresses this. llms.txt addresses this. Good information architecture addresses this.
2. Trust what they find. This is an accuracy and validation problem. No scoring tool addresses this. No docs platform addresses this. No AI writing assistant addresses this.

Most of the industry is solving problem #1. And it’s the easier problem. Structure is visible. You can score it from the outside. You can optimize it with better templates and clearer headings.

Problem #2 is harder because you can’t see it from the outside. Accuracy is only visible when you compare docs to code. When you run the endpoint and check whether the response matches the documentation. When you trace the actual error path and see if the documented errors are the ones that fire.
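
As a rough illustration of “check whether the response matches the documentation,” here is a minimal sketch. The function name, the field list, and the sample response are assumptions for the example, not any real tool’s API, and a real check would work from a full schema rather than a flat dictionary of Python types.

```python
def check_response(documented: dict[str, type], response: dict) -> list[str]:
    """Return one error string per place where the response disagrees with the docs."""
    errors = []
    for field, expected_type in documented.items():
        if field not in response:
            errors.append(f"{field}: documented but missing from response")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            errors.append(f"{field}: documented as {expected_type.__name__}, got {actual}")
    # Fields the API returns that the docs never mention.
    for field in sorted(response.keys() - documented.keys()):
        errors.append(f"{field}: returned but not documented")
    return errors


# Hypothetical documented fields and a hypothetical captured response.
documented_fields = {"user_id": int, "email": str}
sample_response = {
    "user_id": "123e4567-e89b-12d3-a456-426614174000",
    "email": "a@example.com",
}

response_errors = check_response(documented_fields, sample_response)
print(response_errors)  # ['user_id: documented as int, got str']
```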

This requires a validation layer. Not a writing tool. Not an editor. A system that sits between your code and your docs and checks, continuously, that they agree.

The Math of Wrong Docs

Here’s what makes this urgent: the cost of inaccurate documentation is multiplying.

When a human reads a wrong API doc, they get confused. They check the code. They figure it out. They waste an hour.

When an AI agent reads a wrong API doc, it generates wrong integration code. The agent doesn’t get confused. Agents don’t get confused; they hallucinate confidently. The generated code compiles. It looks right. Then it fails at runtime in ways that are hard to debug, because the code matches documentation the developer was never shown.

Now multiply that by every agent, every IDE integration, every automated support bot, every MCP server that reads your docs. Every consumer of wrong docs produces wrong output. The cost of inaccuracy scales with the number of consumers, and in 2026, you may well have more machine consumers than human ones.

A structural score of 94 means nothing if 20% of your endpoints are documented wrong. The agents are finding those wrong docs efficiently. That’s the problem.
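
A back-of-the-envelope version of that multiplication, as a sketch. Apart from the 20% figure above, every number here is an illustrative assumption, and the model is deliberately crude: it assumes each consumer hits each wrongly documented endpoint exactly once.

```python
def wrong_outputs(endpoints: int, wrong_fraction: float, consumers: int) -> int:
    """Count bad outputs, assuming every consumer uses every wrong endpoint once."""
    wrong_endpoints = round(endpoints * wrong_fraction)
    return wrong_endpoints * consumers


# Hypothetical API with 50 endpoints, 20% of them documented wrong.
one_reader = wrong_outputs(endpoints=50, wrong_fraction=0.2, consumers=1)
many_readers = wrong_outputs(endpoints=50, wrong_fraction=0.2, consumers=200)

print(one_reader)    # 10
print(many_readers)  # 2000
```

The number of wrong endpoints never changed. Only the number of readers did.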

What Comes After the Score

If you run Mintlify’s Agent Score and get a 72, the tool will tell you what’s structurally wrong with your docs. Fix the headings. Add more structure. Clarify the navigation. This is good advice. Follow it.

Then ask the question the score can’t answer: Is any of it true?

To answer that, you need a different kind of tool. One that:

Reads your code and extracts the ground truth: endpoints, schemas, types, error paths
Reads your documentation and extracts what it claims: endpoints, schemas, types, error paths
Compares the two and reports discrepancies — not as suggestions, as errors
Runs continuously, on every commit, because the code-doc gap widens with every change
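
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the extraction steps are stubbed out as hand-written dictionaries, and the spec shape, key names, and diff_specs function are all assumptions for the example. The values reuse the Set A mismatches from earlier.

```python
def diff_specs(code_spec: dict, doc_spec: dict) -> list[str]:
    """Compare what the code defines with what the docs claim; return errors."""
    errors = []
    for endpoint in sorted(code_spec.keys() | doc_spec.keys()):
        if endpoint not in doc_spec:
            errors.append(f"{endpoint}: exists in code, missing from docs")
            continue
        if endpoint not in code_spec:
            errors.append(f"{endpoint}: documented, missing from code")
            continue
        code, doc = code_spec[endpoint], doc_spec[endpoint]
        for key in sorted(code.keys() | doc.keys()):
            if code.get(key) != doc.get(key):
                errors.append(
                    f"{endpoint} {key}: code={code.get(key)!r} docs={doc.get(key)!r}"
                )
    return errors


# Steps 1 and 2, stubbed: in practice these would be extracted automatically.
code_spec = {"POST /users": {"user_id": "uuid", "rate_limit_per_min": 60}}
doc_spec = {"POST /users": {"user_id": "integer", "rate_limit_per_min": 100}}

# Step 3: compare and report discrepancies as errors, not suggestions.
spec_errors = diff_specs(code_spec, doc_spec)
for error in spec_errors:
    print(error)

# Step 4: a CI job on every commit could pass this to sys.exit() to fail the build.
exit_code = 1 if spec_errors else 0
```

Treating a mismatch as a nonzero exit code is the design choice that matters here: the build fails, so the gap can’t quietly widen.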

That’s validation. Not scoring. Not writing assistance. Not a better editor. Validation.

The Bottom Line

Mintlify’s Agent Score is a useful structural diagnostic. Use it to improve your docs’ navigability. Their $45M raise validates that documentation infrastructure is a market worth building in.

But structure is not accuracy. A high agent score on inaccurate docs just means AI agents will efficiently consume your mistakes.

The next step after the score isn’t a better editor. It’s a validation layer. Because the question isn’t whether agents can find your docs. It’s whether your docs are telling the truth when they do.

An agent score measures findability. Accuracy needs validation. Join the waitlist — boringdocs is the validation layer that checks whether your docs are right, not just readable. Because in the AI era, the cost of wrong docs isn’t confusion. It’s confident, efficient, scalable wrongness.