Why AI Tools Struggle with PDFs — and How Markdown Fixes It
16 May 2026 · 5 min read
You upload a 10-page PDF to an AI assistant and ask for a summary of section three. The response is confident, vague, and wrong about which section is which. This is not a model intelligence problem. It is a format problem — and it is almost entirely avoidable.
What actually happens when you hand a PDF to an AI
When an AI system receives a PDF, it has two broad options for reading it. The first is vision mode: the document is rendered page by page as images, and the model reads those images the same way it reads a photograph. This works surprisingly well for visual layouts, but it is expensive. Each rendered page consumes roughly 750 tokens just for the image encoding, before any of the text has been understood.
The second approach is text extraction: the PDF is parsed for its embedded text content and that raw text is passed to the model. This sounds more efficient — and it often is — but it strips every structural signal the document had. A section heading formatted in 18pt bold looks identical to body text once it has been converted to a flat string of characters. Bullet points become orphaned words separated by inconsistent whitespace. A table becomes a river of numbers with no alignment.
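To make the flattening concrete, here is a minimal sketch using invented example strings (not the output of any real extractor) showing how structure survives in Markdown but dissolves in a flat extraction:

```python
# Invented examples: the same passage after flat text extraction
# versus Markdown conversion.
flat = "Financial Summary Revenue grew 12% Risks supply chain churn"
markdown = (
    "## Financial Summary\n"
    "Revenue grew 12%.\n"
    "### Risks\n"
    "- supply chain\n"
    "- churn\n"
)

# In the Markdown version, headings and bullets are machine-detectable.
headings = [ln for ln in markdown.splitlines() if ln.startswith("#")]
bullets = [ln for ln in markdown.splitlines() if ln.startswith("- ")]

# In the flat version, nothing marks where a heading ends and body begins.
print(len(headings), len(bullets), "#" in flat)
```

A model reading the flat string has to guess the boundaries; a model reading the Markdown can simply follow the markers.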
The model receives something that reads like a shredded document reassembled by a well-meaning machine.
The token maths
Here is a concrete example. A 10-page business report containing around 2,500 words of actual content:
- As a vision PDF (10 pages × ~750 tokens/page for images, plus text): 7,500–15,000 tokens depending on the model and resolution settings.
- As extracted plain text (flat, unstructured): approximately 3,500–4,500 tokens — cheaper, but structurally useless.
- As clean Markdown (headings, bullets, tables preserved): approximately 3,000–3,500 tokens — cheaper than flat text because Markdown syntax is compact, and the structure is retained.
Converting to Markdown first typically reduces token consumption by 70–80% compared with the vision approach, while delivering far better structure than raw text extraction. On a document you reference repeatedly across a long conversation, this compounds quickly.
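The arithmetic above can be sketched in a few lines. The per-word and per-page figures are rough assumptions taken from the estimates in this post, not measured values:

```python
# Rough assumptions: ~1.4 tokens per English word,
# ~750 image-encoding tokens per rendered page.
PAGES, WORDS = 10, 2_500
TOKENS_PER_WORD = 1.4
IMAGE_TOKENS_PER_PAGE = 750

vision = PAGES * IMAGE_TOKENS_PER_PAGE + int(WORDS * TOKENS_PER_WORD)
flat_text = int(WORDS * TOKENS_PER_WORD)
markdown = int(WORDS * TOKENS_PER_WORD * 0.9)  # compact syntax, structure kept

saving_vs_vision = 1 - markdown / vision
print(vision, flat_text, markdown, round(saving_vs_vision, 2))
```

With these mid-range assumptions the vision route costs about 11,000 tokens against roughly 3,150 for Markdown, a saving of around 71% — squarely in the range quoted above.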
Why Markdown is the native language of AI models
Large language models are trained on enormous corpora of text from the internet. GitHub repositories, Stack Overflow answers, Wikipedia articles, technical documentation — a huge proportion of this is written in Markdown. The models have seen # used as a heading marker billions of times. They have seen - used as a bullet indicator across every conceivable subject domain. They are natively fluent in Markdown in a way they simply are not fluent in the structural conventions of a PDF renderer.
When the model reads ## Section 3: Financial Summary, it knows with high confidence that everything beneath this marker until the next ## belongs to that section. When it reads a flat string of text in which "Section 3: Financial Summary" appears somewhere in the middle, it is working much harder and making more assumptions.
What structure preservation means in practice
The difference becomes stark when you make targeted requests. "Summarise the risks identified in section four" requires the model to locate section four. In a Markdown document, that is a trivial lookup — scan for ## Section 4 or equivalent. In a flat text extraction, it is a semantic search through undifferentiated prose, and the boundaries between sections are often genuinely ambiguous.
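That lookup really is trivial to implement. A minimal sketch, assuming the document marks sections with ## headings (the function name and sample text are illustrative):

```python
def extract_section(markdown: str, title_fragment: str) -> str:
    """Return the text between a matching '## ' heading and the next one."""
    collected, capturing = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new section starts: capture it only if the title matches.
            capturing = title_fragment.lower() in line.lower()
            continue
        if capturing:
            collected.append(line)
    return "\n".join(collected).strip()

doc = "## Section 3: Financials\nRevenue grew.\n## Section 4: Risks\nChurn is rising."
print(extract_section(doc, "section 4"))  # -> Churn is rising.
```

No semantic search, no guessing at boundaries: the heading markers carry the document's table of contents for free.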
Properly converted Markdown also preserves tables as Markdown tables (the | pipe syntax), which models can read and reason over accurately. It preserves numbered lists as numbered lists. It preserves code blocks as code blocks. The document's information architecture survives the transfer.
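As an illustration, an invented quarterly table converted to Markdown keeps its rows and columns legible:

```markdown
| Quarter | Revenue | Change |
|---------|---------|--------|
| Q1      | £1.2m   | +4%    |
| Q2      | £1.4m   | +12%   |
```

The pipes tell the model exactly which figure belongs to which column — something a flat extraction, where the same table arrives as "Q1 £1.2m +4% Q2 £1.4m +12%", cannot guarantee.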
When to keep the PDF
Markdown conversion is not always the right answer. If the document's visual layout carries meaning — architectural drawings, charts where position matters, design mockups, scanned forms with hand-written annotations — then vision mode exists for good reason. The model needs to see the page, not read its text. For these cases, accept the token cost and use the PDF directly.
For everything else — reports, contracts, research papers, meeting notes, technical specifications — convert first, then paste.