Convert PDFs to Markdown using Mistral OCR API with image extraction. Use when you need to extract structured text and images from PDFs, especially for scanned documents or documents with complex formatting. Outputs Markdown with embedded images.
npx skill4agent add fuzhiyu/researchprojecttemplate mistral-pdf-to-markdown

# Convert entire PDF
python scripts/convert_pdf_to_markdown.py input.pdf output.md
# Convert specific pages
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1-5"
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1,3,5"

Output/PDFConversions/
├── document.md          # Markdown with text and image references
└── images/
    ├── img-0.jpeg       # Extracted images
    ├── img-1.jpeg
    └── ...

from pathlib import Path
import subprocess

# Run conversion script
result = subprocess.run([
    "python",
    ".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py",
    "input.pdf",
    "Output/PDFConversions/output.md",
    "--pages", "1-10"
], capture_output=True, text=True)
print(result.stdout)

Extracted images are saved to images/ next to the Markdown file.
The API key is read from Notes/.env:
mistral_api_key=...
Required packages: mistralai, python-dotenv, pypdf

python scripts/convert_pdf_to_markdown.py \
    "Data/papers/research.pdf" \
    "Notes/Paper Markdown/research.md"

# Extract pages 10-20 (introduction and methods)
python scripts/convert_pdf_to_markdown.py \
    "paper.pdf" \
    "Notes/Paper Markdown/intro_methods.md" \
    --pages "10-20"

# Extract pages with figures
python scripts/convert_pdf_to_markdown.py \
    "paper.pdf" \
    "Notes/Paper Markdown/figures.md" \
    --pages "25,27,30,35"

Error: Mistral API key not found in Notes/.env
Add mistral_api_key=YOUR_KEY to Notes/.env.

Warning: Page 100 out of range, skipping
The requested page is past the end of the PDF and is skipped.

Images are saved to images/ and referenced in the Markdown as images/img-X.jpeg.

pdf
pdf
reference.md
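
The `--pages` formats used above (`"1-5"`, `"1,3,5"`) and the out-of-range warning can be illustrated with a small parser. This is only a sketch of how such a spec might be expanded, not the script's actual implementation: the function name `parse_pages`, the `total_pages` parameter, and the assumption that pages are 1-based are all hypothetical.

```python
from __future__ import annotations


def parse_pages(spec: str, total_pages: int) -> list[int]:
    """Expand a page spec such as "1-5" or "1,3,5" into sorted 1-based page numbers.

    Hypothetical helper: the real script may parse --pages differently.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "10-20" is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        for page in range(start, end + 1):
            if 1 <= page <= total_pages:
                pages.add(page)
            else:
                # Same wording as the warning quoted above.
                print(f"Warning: Page {page} out of range, skipping")
    return sorted(pages)


print(parse_pages("1-5", 10))    # [1, 2, 3, 4, 5]
print(parse_pages("1,3,5", 10))  # [1, 3, 5]
```

Collecting pages in a set means overlapping specs such as `"1-3,2"` yield each page once, and out-of-range pages are reported without aborting the rest of the conversion.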