pdf-processing-openai

PDF Skill

When to use

  • Read or review PDF content where layout and visuals matter.
  • Create PDFs programmatically with reliable formatting.
  • Validate final rendering before delivery.

Workflow

  1. Prefer visual review: render PDF pages to PNGs and inspect them.
    • Use pdftoppm if available.
    • If unavailable, install Poppler or ask the user to review the output locally.
  2. Use reportlab to generate PDFs when creating new documents.
  3. Use pdfplumber (or pypdf) for text extraction and quick checks; do not rely on it for layout fidelity.
  4. After each meaningful update, re-render pages and verify alignment, spacing, and legibility.

Temp and output conventions

  • Use tmp/pdfs/ for intermediate files; delete when done.
  • Write final artifacts under output/pdf/ when working in this repo.
  • Keep filenames stable and descriptive.
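
One way to keep to these conventions is a small set of path helpers, sketched here with the standard library only. The function names (prepare_dirs, promote, clean_tmp) and the root parameter are assumptions made for illustration; only the two directory names come from the conventions above.

```python
import shutil
from pathlib import Path

TMP_DIR = Path("tmp/pdfs")       # intermediate files, removed when done
OUTPUT_DIR = Path("output/pdf")  # final artifacts


def prepare_dirs(root: Path = Path(".")) -> tuple[Path, Path]:
    """Create both directories under root if they do not exist yet."""
    tmp_dir = root / TMP_DIR
    out_dir = root / OUTPUT_DIR
    tmp_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    return tmp_dir, out_dir


def promote(tmp_file: Path, out_dir: Path, name: str) -> Path:
    """Copy a finished file into the output directory under a stable name."""
    target = out_dir / name
    shutil.copy2(tmp_file, target)
    return target


def clean_tmp(tmp_dir: Path) -> None:
    """Delete the intermediate directory and everything in it."""
    shutil.rmtree(tmp_dir, ignore_errors=True)
```

Copying rather than moving keeps the intermediate file available for a last re-render before clean_tmp is called.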

Dependencies (install if missing)

Prefer uv for dependency management.
Python packages:
uv pip install reportlab pdfplumber pypdf
If uv is unavailable:
python3 -m pip install reportlab pdfplumber pypdf
System tools (for rendering):

macOS (Homebrew)

brew install poppler

Ubuntu/Debian

sudo apt-get install -y poppler-utils

If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.
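
A check along these lines could produce the message for the user. It is a sketch that assumes the package and tool names listed above; the function name missing_dependencies and the message wording are hypothetical.

```python
import importlib.util
import shutil

PYTHON_PACKAGES = ("reportlab", "pdfplumber", "pypdf")
SYSTEM_TOOLS = {
    # tool name -> install hint, taken from the commands in this section
    "pdftoppm": "brew install poppler (macOS) or "
                "sudo apt-get install -y poppler-utils (Ubuntu/Debian)",
}


def missing_dependencies(packages=PYTHON_PACKAGES, tools=SYSTEM_TOOLS) -> list[str]:
    """Return one human-readable line per missing package or tool."""
    messages = []
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            messages.append(f"Missing Python package '{name}': uv pip install {name}")
    for tool, hint in tools.items():
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            messages.append(f"Missing system tool '{tool}': {hint}")
    return messages


if __name__ == "__main__":
    for line in missing_dependencies():
        print(line)
```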

Environment

No required environment variables.

Rendering command

pdftoppm -png "$INPUT_PDF" "$OUTPUT_PREFIX"
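
When rendering from a script, the same command can be wrapped as in this sketch. The function names and the dpi parameter are assumptions; -r is pdftoppm's resolution flag, and 150 matches its documented default.

```python
import shutil
import subprocess
from pathlib import Path


def build_render_command(input_pdf: Path, output_prefix: Path, dpi: int = 150) -> list[str]:
    """Build the pdftoppm argument list; -png selects PNG, -r sets the DPI."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(input_pdf), str(output_prefix)]


def render_pages(input_pdf: Path, output_prefix: Path, dpi: int = 150) -> list[Path]:
    """Render every page to PNG and return the generated files in page order."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install Poppler or review the PDF locally")
    output_prefix.parent.mkdir(parents=True, exist_ok=True)
    # Passing a list (no shell) keeps paths with spaces intact.
    subprocess.run(build_render_command(input_pdf, output_prefix, dpi), check=True)
    # pdftoppm writes one file per page, named <prefix>-<page number>.png
    return sorted(output_prefix.parent.glob(f"{output_prefix.name}-*.png"))
```

For example, render_pages(Path("output/pdf/report.pdf"), Path("tmp/pdfs/report")) would leave the page images under tmp/pdfs/ for inspection; both paths are illustrative.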

Quality expectations

  • Maintain polished visual design: consistent typography, spacing, margins, and section hierarchy.
  • Avoid rendering issues: clipped text, overlapping elements, broken tables, black squares, or unreadable glyphs.
  • Charts, tables, and images must be sharp, aligned, and clearly labeled.
  • Use ASCII hyphens only. Avoid U+2011 (non-breaking hyphen) and other Unicode dashes.
  • Citations and references must be human-readable; never leave tool tokens or placeholder strings.
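
The hyphen rule can be checked mechanically before text reaches the PDF. The sketch below uses only the standard library; the function names are hypothetical, and treating U+2212 (minus sign) as a dash is an assumption added here because it falls outside Unicode's dash-punctuation category.

```python
import unicodedata

EXTRA_DASHES = {"\u2212"}  # MINUS SIGN is category Sm, so it is listed explicitly


def _is_unicode_dash(char: str) -> bool:
    """True for any dash-like character other than the ASCII hyphen."""
    if char == "-":
        return False
    return unicodedata.category(char) == "Pd" or char in EXTRA_DASHES


def find_unicode_dashes(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) for every non-ASCII dash in text."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if _is_unicode_dash(char)
    ]


def normalize_dashes(text: str) -> str:
    """Replace every non-ASCII dash with an ASCII hyphen."""
    return "".join("-" if _is_unicode_dash(char) else char for char in text)
```

Blind replacement can change meaning, for instance where a dash separates clauses, so the reported positions are worth reviewing before normalizing.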

Final checks

  • Do not deliver until the latest PNG inspection shows zero visual or formatting defects.
  • Confirm headers/footers, page numbering, and section transitions look polished.
  • Keep intermediate files organized or remove them after final approval.
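
To make sure the inspection really covers the latest version, a pre-delivery check could compare file timestamps, as in this sketch. The function name and the use of modification times as a freshness signal are assumptions; it flags stale renders only and does not replace looking at the PNGs.

```python
from pathlib import Path


def stale_or_missing_renders(pdf_path: Path, png_paths: list[Path]) -> list[str]:
    """List reasons the rendered PNGs cannot be trusted for the current PDF."""
    problems = []
    if not png_paths:
        problems.append("no rendered pages found")
    pdf_mtime = pdf_path.stat().st_mtime
    for png in png_paths:
        if not png.exists():
            problems.append(f"{png} is missing")
        elif png.stat().st_mtime < pdf_mtime:
            # The PDF changed after this page was rendered.
            problems.append(f"{png} is older than {pdf_path}; re-render before inspecting")
    return problems
```

An empty list means the PNGs were produced after the last change to the PDF.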