pdf-skill
PDF Skill
Purpose
Provides expertise in programmatic PDF generation, parsing, and manipulation. Specializes in creating PDFs from scratch, extracting content, merging/splitting documents, and handling forms using PDFKit, PDF.js, Puppeteer, and similar tools.
When to Use
- Generating PDFs programmatically
- Extracting text or data from PDFs
- Merging or splitting PDF documents
- Filling PDF forms programmatically
- Converting HTML to PDF
- Adding watermarks or annotations
- Parsing PDF structure and metadata
- Building PDF report generators
Quick Start
Invoke this skill when:
- Generating PDFs from code or data
- Extracting content from PDF files
- Merging, splitting, or manipulating PDFs
- Filling or creating PDF forms
- Converting HTML/web pages to PDF
Do NOT invoke when:
- Word document creation → use /docx-skill
- Excel/spreadsheet work → use /xlsx-skill
- PowerPoint creation → use /pptx-skill
- General file operations → use Bash or file tools
Decision Framework
```
PDF Operation?
├── Generate from scratch
│   ├── Simple → PDFKit (Node) / ReportLab (Python)
│   └── Complex layouts → Puppeteer/Playwright + HTML
├── Parse/Extract
│   ├── Text extraction → pdf-parse / PyPDF2
│   └── Table extraction → Camelot / Tabula
├── Manipulate
│   └── pdf-lib (merge, split, edit)
└── Forms
    └── pdf-lib (fill) / PDFtk (advanced)
```
Core Workflows
1. PDF Generation with PDFKit
- Install PDFKit (`npm install pdfkit`)
- Create a new PDFDocument
- Add content (text, images, graphics)
- Style with fonts and colors
- Add pages as needed
- Pipe to file or response
2. HTML to PDF Conversion
- Set up Puppeteer/Playwright
- Navigate to HTML content or URL
- Configure page size and margins
- Set print options (headers, footers)
- Generate PDF buffer
- Save or stream result
3. PDF Parsing and Extraction
- Choose a parser (pdf-parse, PyPDF2, pdfplumber)
- Load PDF file
- Extract text or structured data
- Handle multi-page documents
- Clean and normalize extracted text
- Output in desired format
Best Practices
- Use vector graphics over raster when possible
- Embed fonts for consistent rendering
- Test PDF output across different readers
- Handle large PDFs with streaming
- Choose a library that matches the task's complexity
- Consider accessibility (tagged PDFs)
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Image-only PDFs | Text is not searchable, selectable, or accessible | Render real text with fonts |
| No font embedding | Inconsistent rendering across readers | Embed the required fonts |
| Loading large PDFs fully into memory | Out-of-memory crashes | Process page by page or as a stream |
| Ignoring encryption | Failed reads and security/access issues | Detect encrypted PDFs and decrypt them explicitly |
| Wrong tool for the job | Over-engineering | Match the tool to the task's complexity |