pdf-extractor

1 script checked / no sensitive code detected

Extract text, tables, and images from PDFs. Use when: extracting data from reports; converting PDF tables to CSV; pulling images from presentations; processing research papers; batch converting PDFs to text

8 installs

Source: guia-matthieu/clawfu-skills

Added on 2026-02-28

NPX Install

npx skill4agent add guia-matthieu/clawfu-skills pdf-extractor

SKILL.md Content

PDF Extractor

Extract text, tables, and images from PDF files using pdfplumber - turn static PDFs into usable data.

When to Use This Skill

Report processing - Extract data from PDF reports
Table extraction - Convert PDF tables to CSV
Image collection - Pull images from presentations
Text mining - Bulk convert PDFs to searchable text
Research - Process academic papers and whitepapers

What Claude Does vs What You Decide

Claude Does	You Decide
Structures analysis frameworks	Metric definitions
Identifies patterns in data	Business interpretation
Creates visualization templates	Dashboard design
Suggests optimization areas	Action priorities
Calculates statistical measures	Decision thresholds

Dependencies

bash

pip install pdfplumber pypdf click pandas
# For image extraction:
pip install Pillow

Commands

Extract Text

bash

python scripts/main.py text document.pdf
python scripts/main.py text document.pdf --pages 1-5

Extract Tables

bash

python scripts/main.py tables report.pdf --output tables.csv
python scripts/main.py tables financial.pdf --page 3
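
The table commands above can be approximated with a short sketch. It is not the contents of scripts/main.py, which this page does not show; the function names `table_to_csv` and `extract_tables` and the `page_number` parameter are hypothetical. It assumes pdfplumber's `extract_tables()` returns each table as a list of rows whose cells are strings or `None`.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize one extracted table (a list of rows) to CSV text.

    Empty cells, which pdfplumber reports as None, become empty strings.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def extract_tables(pdf_path, page_number=None):
    """Return (page_number, rows) pairs for every table found.

    page_number is 1-based; None means scan every page.
    """
    import pdfplumber  # imported lazily so table_to_csv works without it

    found = []
    with pdfplumber.open(str(pdf_path)) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            if page_number is not None and number != page_number:
                continue
            for rows in page.extract_tables():
                found.append((number, rows))
    return found
```

Going through the `csv` module rather than joining cells with commas keeps values such as "1,200" intact, because any cell containing a comma is quoted.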

Extract Images

bash

python scripts/main.py images presentation.pdf --output ./images/

Merge PDFs

bash

python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf
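
One way a merge like this could be written with pypdf is sketched below. The function name `merge_pdfs` and its return value are hypothetical, and the real script may work differently. The sketch assumes pypdf 3.0 or later, where `PdfWriter.append` accepts a file path.

```python
from pathlib import Path


def merge_pdfs(inputs, output):
    """Concatenate the input PDFs in order and return the merged page count."""
    paths = [Path(p) for p in inputs]
    if len(paths) < 2:
        raise ValueError("merge needs at least two input PDFs")

    from pypdf import PdfWriter  # imported lazily, after argument checks

    writer = PdfWriter()
    for path in paths:
        writer.append(str(path))  # appends every page of this file
    with open(output, "wb") as handle:
        writer.write(handle)
    return len(writer.pages)
```

Calling `merge_pdfs(["doc1.pdf", "doc2.pdf"], "combined.pdf")` would mirror the command above; the order of the input list determines the page order of the result.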

PDF Info

bash

python scripts/main.py info document.pdf

Examples

Example 1: Extract Financial Tables

bash

python scripts/main.py tables annual-report.pdf --output financials.csv

# Output: financials.csv with all tables found
# Also creates individual CSVs: table_page3_1.csv, table_page5_1.csv

Example 2: Batch Convert to Text

bash

python scripts/main.py batch ./pdfs/ --output ./text/

# Converts all PDFs in folder to .txt files
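
A batch conversion of this kind can be sketched as two steps: plan which output file each PDF maps to, then extract the text. The names `plan_batch` and `batch_convert` are hypothetical, and the sketch makes two assumptions that the source does not state: only the top level of the folder is scanned, and each output file reuses the PDF's base name.

```python
from pathlib import Path


def plan_batch(src_dir, out_dir):
    """Pair each PDF in src_dir with the .txt path it would be written to."""
    src, out = Path(src_dir), Path(out_dir)
    pdfs = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [(p, out / (p.stem + ".txt")) for p in pdfs]


def batch_convert(src_dir, out_dir):
    """Write one text file per PDF and return how many were converted."""
    import pdfplumber  # imported lazily so plan_batch works without it

    jobs = plan_batch(src_dir, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path, txt_path in jobs:
        with pdfplumber.open(str(pdf_path)) as pdf:
            # extract_text() can yield nothing for image-only pages
            text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)
        txt_path.write_text(text, encoding="utf-8")
    return len(jobs)
```

Separating the planning step makes it possible to preview the file mapping before any PDF is opened.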

Example 3: Extract Specific Pages

bash

python scripts/main.py text whitepaper.pdf --pages 1,5-10,15

# Extracts only pages 1, 5-10, and 15
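
A page specification such as `1,5-10,15` has to be expanded into individual page numbers before extraction. The sketch below shows one way to do that; `parse_page_spec` and `extract_text` are hypothetical names, and treating page numbers as 1-based and silently skipping pages past the end of the document are assumptions, not documented behavior of the script.

```python
def parse_page_spec(spec):
    """Expand a spec like "1,5-10,15" into sorted, unique 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def extract_text(pdf_path, spec):
    """Return the text of the requested pages, separated by blank lines."""
    import pdfplumber  # imported lazily so the parser works without it

    wanted = parse_page_spec(spec)
    with pdfplumber.open(str(pdf_path)) as pdf:
        chunks = [
            pdf.pages[n - 1].extract_text() or ""
            for n in wanted
            if n <= len(pdf.pages)  # skip pages beyond the end
        ]
    return "\n\n".join(chunks)
```

For example, `parse_page_spec("1,5-10,15")` yields `[1, 5, 6, 7, 8, 9, 10, 15]`.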

Skill Boundaries

What This Skill Does Well

Structuring data analysis
Identifying patterns and trends
Creating visualization frameworks
Calculating statistical measures

What This Skill Cannot Do

Access your actual data
Replace statistical expertise
Make business decisions
Guarantee prediction accuracy

Related Skills

web-scraper - Scrape web content
content-repurposer - Repurpose extracted content

Skill Metadata

Mode: centaur

yaml

category: automation
subcategory: document-processing
dependencies: [pdfplumber, pypdf, pandas]
difficulty: beginner
time_saved: 4+ hours/week
