pdf-extractor

PDF Extractor

Extract text, tables, and images from PDF files using pdfplumber - turn static PDFs into usable data.

When to Use This Skill

Report processing - Extract data from PDF reports
Table extraction - Convert PDF tables to CSV
Image collection - Pull images from presentations
Text mining - Bulk convert PDFs to searchable text
Research - Process academic papers and whitepapers

What Claude Does vs What You Decide

Claude Does	You Decide
Structures analysis frameworks	Metric definitions
Identifies patterns in data	Business interpretation
Creates visualization templates	Dashboard design
Suggests optimization areas	Action priorities
Calculates statistical measures	Decision thresholds

Dependencies

bash

pip install pdfplumber pypdf click pandas

For image extraction:

pip install Pillow

Commands

Extract Text

bash

python scripts/main.py text document.pdf
python scripts/main.py text document.pdf --pages 1-5
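
The source of scripts/main.py is not shown here, so the following is only a minimal sketch of how a text command could be built on pdfplumber. The helper names `join_page_texts` and `extract_text` are hypothetical, and page numbers are assumed to be 1-based.

```python
from typing import Iterable, Optional


def join_page_texts(texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, treating pages without a text layer as empty."""
    return "\n\n".join((text or "").strip() for text in texts)


def extract_text(pdf_path: str, pages: Optional[Iterable[int]] = None) -> str:
    """Hypothetical helper: text of the chosen 1-based pages (all pages if None)."""
    import pdfplumber  # imported lazily so join_page_texts works without it

    wanted = set(pages) if pages is not None else None
    with pdfplumber.open(pdf_path) as pdf:
        selected = [
            page
            for number, page in enumerate(pdf.pages, start=1)
            if wanted is None or number in wanted
        ]
        return join_page_texts(page.extract_text() for page in selected)
```

Under these assumptions, `extract_text("document.pdf", range(1, 6))` would roughly correspond to the `--pages 1-5` call above. Scanned pages with no text layer come back empty, because pdfplumber does not perform OCR.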

Extract Tables

bash

python scripts/main.py tables report.pdf --output tables.csv
python scripts/main.py tables financial.pdf --page 3
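
As a rough illustration of what table extraction involves, the sketch below uses pdfplumber's `page.extract_tables()`, which returns each table as a list of rows of cell strings (or `None` for empty cells), and serializes the result with the standard `csv` module. The helpers `tables_to_csv` and `extract_tables` are hypothetical names, and the blank line between tables is an assumption, not necessarily what the script writes.

```python
import csv
import io
from typing import List, Optional, Sequence

Table = Sequence[Sequence[Optional[str]]]


def tables_to_csv(tables: Sequence[Table]) -> str:
    """Serialize tables to CSV text, with one blank line between tables."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for index, table in enumerate(tables):
        if index:
            writer.writerow([])  # blank separator row between tables
        for row in table:
            # pdfplumber reports empty cells as None; write them as empty strings
            writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def extract_tables(pdf_path: str, page_number: Optional[int] = None) -> List[Table]:
    """Hypothetical helper: tables from one 1-based page, or from every page."""
    import pdfplumber  # imported lazily so tables_to_csv works without it

    found: List[Table] = []
    with pdfplumber.open(pdf_path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            if page_number is None or number == page_number:
                found.extend(page.extract_tables())
    return found
```

Keeping the CSV step separate from the PDF step makes it easy to inspect or clean the extracted rows (for instance with pandas) before writing them out.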

Extract Images

bash

python scripts/main.py images presentation.pdf --output ./images/

Merge PDFs

bash

python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf
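
Since pypdf is listed as a dependency, merging is presumably done with it, though the script itself is not shown. A minimal sketch using pypdf's `PdfWriter.append`, with `merge_pdfs` as a hypothetical helper name, could be:

```python
from typing import Sequence


def merge_pdfs(inputs: Sequence[str], output: str) -> None:
    """Hypothetical helper: concatenate the input PDFs, in order, into `output`."""
    if not inputs:
        raise ValueError("no input files given")

    from pypdf import PdfWriter  # imported lazily, after argument validation

    writer = PdfWriter()
    for path in inputs:
        writer.append(path)  # appends every page of the file at `path`
    with open(output, "wb") as handle:
        writer.write(handle)
    writer.close()
```

With this sketch, `merge_pdfs(["doc1.pdf", "doc2.pdf"], "combined.pdf")` mirrors the command above. Page order follows the order of the input list.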

PDF Info

bash

python scripts/main.py info document.pdf

Examples

Example 1: Extract Financial Tables

bash

python scripts/main.py tables annual-report.pdf --output financials.csv

Output: financials.csv, containing every table found.

The command also writes one CSV per table, for example table_page3_1.csv and table_page5_1.csv.

Example 2: Batch Convert to Text

bash

python scripts/main.py batch ./pdfs/ --output ./text/

Converts every PDF in the folder to a .txt file.
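
The batch behaviour can be pictured as a loop over the input folder. The sketch below is an assumption about the shape of that loop, not the script's actual code: `batch_convert` is a hypothetical helper, it only looks at the top level of the folder, and the text extractor is passed in as a callable so any extraction routine can be plugged in.

```python
from pathlib import Path
from typing import Callable, List


def batch_convert(
    src_dir: str, out_dir: str, extract: Callable[[Path], str]
) -> List[Path]:
    """Write one .txt file per PDF in src_dir, using `extract` to get its text."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for pdf_path in sorted(Path(src_dir).glob("*.pdf")):
        target = out / (pdf_path.stem + ".txt")  # report.pdf -> report.txt
        target.write_text(extract(pdf_path), encoding="utf-8")
        written.append(target)
    return written
```

Here `extract` could be any function that takes a PDF path and returns its text, such as a small wrapper around pdfplumber's `page.extract_text()`.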

Example 3: Extract Specific Pages

bash

python scripts/main.py text whitepaper.pdf --pages 1,5-10,15

Extracts only pages 1, 5-10, and 15
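
A page specification such as `1,5-10,15` has to be expanded into individual page numbers before extraction. How the script parses it is not shown; one plausible approach, with `parse_pages` as a hypothetical helper and pages assumed to be 1-based, is:

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a spec such as "1,5-10,15" into sorted, unique 1-based pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)
```

In this sketch, malformed input such as `a-b` or `0` raises `ValueError`, which a command-line wrapper could turn into a usage error.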

Skill Boundaries

What This Skill Does Well

Structuring data analysis
Identifying patterns and trends
Creating visualization frameworks
Calculating statistical measures

What This Skill Cannot Do

Access your actual data
Replace statistical expertise
Make business decisions
Guarantee prediction accuracy

Skill Metadata

Mode: centaur

yaml

category: automation
subcategory: document-processing
dependencies: [pdfplumber, pypdf, pandas]
difficulty: beginner
time_saved: 4+ hours/week
