Convert entire PDF documents to clean, structured Markdown for full context loading. Use this skill when the user wants to extract ALL text from a PDF into context (not grep/search), when discussing or analyzing PDF content in full, when the user mentions "load the whole PDF", "bring the PDF into context", "read the entire PDF", or when partial extraction/grepping would miss important context. This is the preferred method for PDF text extraction over page-by-page or grep approaches.
npx skill4agent add aliceisjustplaying/claude-resources-monorepo pdf-to-markdown
~/.claude/skills/pdf-to-markdown/.venv/
cd ~/.claude/skills/pdf-to-markdown && uv venv .venv && uv pip install --python .venv/bin/python pymupdf docling docling-core
~/.claude/skills/pdf-to-markdown/.venv/bin/python -c "import pymupdf; import docling; import docling_core; print('OK')"
# Convert PDF to markdown (always extracts images)
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py document.pdf
# Output: document.md + images/ folder (next to the .md file)
test -d ~/.claude/skills/pdf-to-markdown/.venv || (cd ~/.claude/skills/pdf-to-markdown && uv venv .venv && uv pip install --python .venv/bin/python pymupdf docling docling-core)
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py "/path/to/document.pdf"
# Output is written to document.md in the same directory as the PDF
cat /path/to/document.md
~/.cache/pdf-to-markdown/<cache_key>/
--clear-cache
--clear-all-cache
# Clear cache for a specific PDF
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py document.pdf --clear-cache
# Clear entire cache
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py --clear-all-cache
# Show cache statistics
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py --cache-stats
~/.cache/pdf-to-markdown/<cache_key>/
├── metadata.json # source path, mtime, size, total_pages
├── full_output.md # cached full markdown
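└── images/ # extracted images
Cached images: ~/.cache/pdf-to-markdown/<cache_key>/images/
Output images: images/ folder next to the .md file (for example, images/filename.png)
Image placeholder in the Markdown: **[Image: figure_1.png (1200x800, 125.3KB)]**
---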
source: document.pdf
total_pages: 42
extracted_at: 2025-01-15T10:30:00
from_cache: true
images_dir: images
---
# Main Title
## Section Header
Regular paragraph text with **bold**, *italic*, and `code` formatting.

**[Image: figure_1.png (800x600, 45.2KB)]**
| Column A | Column B |
|----------|----------|
| Data 1 | Data 2 |
---
## Extracted Images
| # | File | Dimensions | Size |
|---|------|------------|------|
| 1 | figure_1.png | 800x600 | 45.2KB |
| 2 | chart_2.png | 1200x800 | 89.1KB |
~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py
Usage: pdf_to_md.py <input.pdf> [output.md] [options]
Options:
--no-progress Disable progress indicator
Cache Options:
--clear-cache Clear cache for this PDF and re-extract
--clear-all-cache Clear entire cache directory and exit
--cache-stats Show cache statistics and exit
cd ~/.claude/skills/pdf-to-markdown && rm -rf .venv && uv venv .venv && uv pip install --python .venv/bin/python pymupdf docling docling-core
brew install tesseract
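
Once the Markdown is loaded, the metadata header and image placeholders shown in the output example can be read back programmatically. The following is a minimal sketch, not part of the skill itself: the helper names `split_front_matter` and `find_images` are hypothetical, and it assumes the header is a `---` delimited block of `key: value` lines and that placeholders follow the `**[Image: name (WxH, size)]**` form exactly as in the example.

```python
from __future__ import annotations

import re

# Matches placeholders such as **[Image: figure_1.png (800x600, 45.2KB)]**
# (assumed format, taken from the output example; file names with spaces are not handled).
IMAGE_RE = re.compile(
    r"\*\*\[Image: (?P<file>\S+) \((?P<w>\d+)x(?P<h>\d+), (?P<size>[\d.]+KB)\)\]\*\*"
)


def split_front_matter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' delimited header from the Markdown body."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole text as body.
    return {}, markdown


def find_images(body: str) -> list[tuple[str, int, int, str]]:
    """Return (file, width, height, size) for each image placeholder."""
    return [
        (m["file"], int(m["w"]), int(m["h"]), m["size"])
        for m in IMAGE_RE.finditer(body)
    ]


sample = """---
source: document.pdf
total_pages: 42
extracted_at: 2025-01-15T10:30:00
from_cache: true
images_dir: images
---
# Main Title

**[Image: figure_1.png (800x600, 45.2KB)]**
"""

meta, body = split_front_matter(sample)
images = find_images(body)
print(meta["source"], meta["total_pages"], images)
```

Values are kept as strings (for instance `total_pages` stays `"42"`), so callers can decide how strictly to convert them.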