Extract text, tables, and images from PDFs. Use when: extracting data from reports; converting PDF tables to CSV; pulling images from presentations; processing research papers; batch converting PDFs to text

    npx skill4agent add guia-matthieu/clawfu-skills pdf-extractor

Extract text, tables, and images from PDF files using pdfplumber, turning static PDFs into usable data.

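As a rough illustration of the kind of extraction described above, here is a minimal sketch built directly on pdfplumber. It is not the skill's own `scripts/main.py`, whose source is not shown here; the function names `rows_to_csv` and `extract_text_and_tables` are hypothetical, and only `pdfplumber.open`, `pdf.pages`, `page.extract_text()`, and `page.extract_tables()` are library calls.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize one table (a list of rows, each a list of cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Empty cells arrive as None, and cell text may contain line breaks.
        writer.writerow(
            ["" if cell is None else str(cell).replace("\n", " ") for cell in row]
        )
    return buffer.getvalue()


def extract_text_and_tables(pdf_path):
    """Return (full_text, [(page_number, csv_text), ...]) for one PDF."""
    # Third-party dependency, imported here so rows_to_csv works without it.
    import pdfplumber

    page_texts, tables = [], []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            # Pages without a text layer may yield an empty string or None.
            page_texts.append(page.extract_text() or "")
            for rows in page.extract_tables():
                tables.append((page_number, rows_to_csv(rows)))
    return "\n".join(page_texts), tables
```

Calling `extract_text_and_tables("document.pdf")` would return the concatenated page text plus one CSV string per detected table, tagged with its page number. Scanned PDFs without a text layer would need OCR first, which this sketch does not attempt.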
| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Metric definitions |
| Identifies patterns in data | Business interpretation |
| Creates visualization templates | Dashboard design |
| Suggests optimization areas | Action priorities |
| Calculates statistical measures | Decision thresholds |

    pip install pdfplumber pypdf click pandas
    # For image extraction:
    pip install Pillow

    python scripts/main.py text document.pdf
    python scripts/main.py text document.pdf --pages 1-5

    python scripts/main.py tables report.pdf --output tables.csv
    python scripts/main.py tables financial.pdf --page 3

    python scripts/main.py images presentation.pdf --output ./images/

    python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf

    python scripts/main.py info document.pdf

    python scripts/main.py tables annual-report.pdf --output financials.csv
    # Output: financials.csv with all tables found
    # Also creates individual CSVs: table_page3_1.csv, table_page5_1.csv

    python scripts/main.py batch ./pdfs/ --output ./text/
    # Converts all PDFs in folder to .txt files

    python scripts/main.py text whitepaper.pdf --pages 1,5-10,15
    # Extracts only pages 1, 5-10, and 15

category: automation
subcategory: document-processing
dependencies: [pdfplumber, pypdf, pandas]
difficulty: beginner
time_saved: 4+ hours/week
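
The `--pages 1,5-10,15` syntax used in the commands above mixes single pages and inclusive ranges. How `scripts/main.py` parses it is not shown, so the following is only a sketch of one plausible approach; `parse_pages` is a hypothetical helper name, and the choice to deduplicate and sort the result is an assumption.

```python
def parse_pages(spec):
    """Expand a spec such as "1,5-10,15" into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

With pdfplumber, where `pdf.pages` is a zero-indexed list, each resulting number `n` would map to `pdf.pages[n - 1]`.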