extracting-mistral-ocr

Extracts text, tables, and images from PDFs (including scanned PDFs) using the Mistral OCR API. Use when the user asks to OCR a PDF or image, extract text from a PDF, parse a scanned document, convert a PDF to Markdown, or extract structured fields from a document.

Source: tristanmanchester/agent-skills

Added on: 2026-05-12

NPX Install

npx skill4agent add tristanmanchester/agent-skills extracting-mistral-ocr

SKILL.md Content

Mistral OCR PDF extraction

Quick start (default)

Run the bundled script to OCR a local PDF and write Markdown + JSON outputs:

```bash
python {baseDir}/scripts/mistral_ocr_extract.py --input path/to/file.pdf --out out/ocr
```

Output directory layout:

- `combined.md` (all pages concatenated)
- `pages/page-000.md` (per-page markdown)
- `raw_response.json` (full OCR response)
- `images/` (decoded embedded images, if requested)
- `tables/` (separate tables, if requested)
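
As a rough illustration of consuming this layout downstream, the sketch below loads the per-page Markdown files in page order. It relies only on the file names listed above; the `load_pages` helper is hypothetical and not part of the bundled script.

```python
from pathlib import Path


def load_pages(out_dir: str) -> list[str]:
    """Return per-page Markdown strings in page order.

    Assumes the layout above: <out_dir>/pages/page-NNN.md with zero-padded
    page numbers, so a lexicographic sort matches page order (this holds
    only while every page number fits the padding width).
    """
    pages_dir = Path(out_dir) / "pages"
    page_files = sorted(pages_dir.glob("page-*.md"))
    return [p.read_text(encoding="utf-8") for p in page_files]
```

Calling `load_pages("out/ocr")` after the quick-start command would give one string per page, which is often a convenient unit for chunking in a RAG pipeline.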

Workflow

1. Pick input mode
   - Local PDF (most common): upload via Files API, then OCR via `file_id`.
   - Public URL: OCR directly via `document_url`.
2. Choose output fidelity (defaults are safe for RAG)
   - Keep `table_format=inline` unless the user explicitly wants tables split out.
   - Set `--include-image-base64` when the user needs figures/diagrams extracted.
   - Use `--extract-header`/`--extract-footer` if header/footer noise hurts downstream search.
3. Run OCR
   - Use `scripts/mistral_ocr_extract.py` to produce a deterministic on-disk artefact set.
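
The input-mode choice in the first step can be sketched as a small helper that builds the `document` part of an OCR request. This is only an illustration: the `build_document` name is hypothetical, and the two chunk shapes (`{"type": "file", ...}` and `{"type": "document_url", ...}`) are an assumption about the Mistral OCR request format that should be checked against the official API reference. The bundled script may build its request differently.

```python
from typing import Optional


def build_document(file_id: Optional[str] = None,
                   document_url: Optional[str] = None) -> dict:
    """Build the `document` field for an OCR request (assumed shape).

    Exactly one of file_id (a local PDF already uploaded via the Files API)
    or document_url (a publicly reachable PDF) must be given.
    """
    if (file_id is None) == (document_url is None):
        raise ValueError("provide exactly one of file_id or document_url")
    if file_id is not None:
        # Local PDF path: reference the uploaded file by its ID.
        return {"type": "file", "file_id": file_id}
    # Public URL path: let the OCR service fetch the document directly.
    return {"type": "document_url", "document_url": document_url}
```

Rejecting the ambiguous cases up front (neither or both inputs) keeps the failure local, rather than surfacing later as an API error.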

(Optional) Structured extraction from the whole document

If the user wants fields (invoice totals, contract parties, etc.), provide an annotation prompt.
The OCR API can return a document-level `document_annotation` in addition to page markdown.

Example: