pdf — PDF manipulation toolkit

The gateway image ships:

- `pdftotext`, `pdftoppm`, `pdfinfo` from `poppler-utils` (installed via apt)
- `pypdf` (Python) for programmatic merge / split / metadata edits
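
A script that depends on these tools can check for them up front. Below is a minimal, standard-library-only sketch, assuming the poppler binaries are expected on `PATH` and that `pypdf` only needs to be importable; `check_pdf_tools` is a hypothetical helper name, and it does not check versions.

```python
import importlib.util
import shutil

# The poppler-utils binaries listed above; assumed to be looked up on PATH.
POPPLER_TOOLS = ("pdftotext", "pdftoppm", "pdfinfo")


def check_pdf_tools() -> dict:
    """Report which of the shipped PDF tools are available (hypothetical helper)."""
    status = {tool: shutil.which(tool) is not None for tool in POPPLER_TOOLS}
    # pypdf is a Python package, so check that it is importable instead of PATH.
    status["pypdf"] = importlib.util.find_spec("pypdf") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_pdf_tools().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```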

Common operations

Extract text from a PDF

```bash
pdftotext /workspace/$(basename "$PWD")/input.pdf /workspace/$(basename "$PWD")/input.txt
```

Or stream to stdout:

```bash
pdftotext /workspace/$(basename "$PWD")/input.pdf -
```

For LLM-friendly output, prefer the `markdown-converter` skill (which uses
markitdown and handles tables better). Use `pdftotext` only when you need
raw text quickly.
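
When `pdftotext` is called from Python rather than the shell, passing an explicit argument list avoids quoting problems. A minimal sketch, assuming `pdftotext` is on `PATH`; `pdftotext_cmd` and `extract_text` are hypothetical names. The `-f`/`-l` (page range), `-layout` and `-enc` options are standard `pdftotext` flags not used in the commands above.

```python
import subprocess
from typing import Optional


def pdftotext_cmd(pdf: str, first: Optional[int] = None, last: Optional[int] = None,
                  layout: bool = False) -> list:
    """Build a pdftotext command that writes UTF-8 text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to convert (1-based)
    if last is not None:
        cmd += ["-l", str(last)]  # last page to convert (1-based)
    if layout:
        cmd.append("-layout")  # keep the physical column layout
    return cmd + [pdf, "-"]  # "-" sends the text to stdout


def extract_text(pdf: str, **kwargs) -> str:
    """Run pdftotext and return its output; raises CalledProcessError on failure."""
    result = subprocess.run(pdftotext_cmd(pdf, **kwargs), capture_output=True,
                            encoding="utf-8", errors="replace", check=True)
    return result.stdout
```

For example, `extract_text("input.pdf", first=1, last=1)` would return the text of the first page only.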

Count pages

```bash
pdfinfo /workspace/$(basename "$PWD")/input.pdf | grep Pages
```
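
From Python, the page count can be read without `grep` by parsing the `Key: value` lines that `pdfinfo` prints. A minimal sketch, assuming `pdfinfo` is on `PATH`; `parse_pdfinfo` and `page_count` are hypothetical names. Splitting on the first colon only matters because values such as `CreationDate` contain colons themselves.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Turn pdfinfo's "Key:   value" lines into a dict, splitting on the first colon."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def page_count(pdf: str) -> int:
    """Return the page count reported by pdfinfo."""
    out = subprocess.run(["pdfinfo", pdf], capture_output=True,
                         encoding="utf-8", errors="replace", check=True).stdout
    return int(parse_pdfinfo(out)["Pages"])
```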

Render a page to PNG (preview / OCR input)

```bash
# All pages → PNG at 150 dpi
pdftoppm -png -r 150 /workspace/$(basename "$PWD")/input.pdf /workspace/$(basename "$PWD")/page

# Just page 1
pdftoppm -png -r 150 -f 1 -l 1 /workspace/$(basename "$PWD")/input.pdf /workspace/$(basename "$PWD")/cover
```
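
The same renders can be driven from Python by building the argument list explicitly. A minimal sketch, assuming `pdftoppm` is on `PATH`; `pdftoppm_cmd` and `render` are hypothetical names. The `-singlefile` option is a standard `pdftoppm` flag not used above: it writes only the first selected page to `<out_root>.png`, without the page-number suffix `pdftoppm` normally appends.

```python
import subprocess
from typing import Optional


def pdftoppm_cmd(pdf: str, out_root: str, dpi: int = 150,
                 first: Optional[int] = None, last: Optional[int] = None,
                 single_file: bool = False) -> list:
    """Build a pdftoppm command that renders pages to PNG files named after out_root."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    if single_file:
        # Write only the first selected page, as out_root + ".png" (no page number).
        cmd.append("-singlefile")
    return cmd + [pdf, out_root]


def render(pdf: str, out_root: str, **kwargs) -> None:
    """Run pdftoppm; raises CalledProcessError if rendering fails."""
    subprocess.run(pdftoppm_cmd(pdf, out_root, **kwargs), check=True)
```

For example, `render("input.pdf", "cover", first=1, last=1, single_file=True)` would be expected to produce `cover.png`.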

Merge multiple PDFs

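```bash
python3 - <<'PY'
import os
from pypdf import PdfWriter

# The quoted 'PY' heredoc disables shell expansion, so $(basename "$PWD") would
# reach Python literally; build the /workspace/<current dir name> path in Python.
ws = os.path.join("/workspace", os.path.basename(os.getcwd()))

w = PdfWriter()
for f in ["a.pdf", "b.pdf", "c.pdf"]:  # pages are appended in list order
    w.append(f)
w.write(os.path.join(ws, "merged.pdf"))
PY
```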
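
Because the merge follows list order, the order of the input list decides the page order. When the inputs are generated names, such as the `page_N.pdf` files written by the split recipe, a plain `sorted()` puts `page_10.pdf` before `page_2.pdf`. A minimal sketch of a natural sort key, standard library only; `natural_key` and `pdfs_in_order` are hypothetical helper names.

```python
import re
from pathlib import Path


def natural_key(name: str) -> list:
    """Sort key that compares digit runs numerically, so page_2 sorts before page_10."""
    # re.split with a capturing group alternates non-digit / digit parts,
    # so odd indices are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def pdfs_in_order(directory: str, pattern: str = "*.pdf") -> list:
    """Return matching PDF paths in natural order, ready to pass to PdfWriter.append."""
    paths = Path(directory).glob(pattern)
    return [str(p) for p in sorted(paths, key=lambda p: natural_key(p.name))]
```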

Split a PDF into per-page files

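```bash
python3 - <<'PY'
import os
from pypdf import PdfReader, PdfWriter

# Quoted heredoc: no shell expansion, so resolve the workspace path in Python.
ws = os.path.join("/workspace", os.path.basename(os.getcwd()))

r = PdfReader(os.path.join(ws, "input.pdf"))
for i, page in enumerate(r.pages):
    w = PdfWriter()
    w.add_page(page)
    w.write(os.path.join(ws, f"page_{i+1}.pdf"))  # 1-based file names
PY
```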
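
To pull out only some pages rather than every page, a range string such as `1-3,7` can be turned into zero-based indices into `r.pages`. A minimal sketch; `parse_page_ranges` and its `1-3,7` syntax are hypothetical (the poppler tools use `-f`/`-l` instead), and page numbers are 1-based, matching the `page_{i+1}.pdf` names above.

```python
def parse_page_ranges(spec: str, num_pages: int) -> list:
    """Convert a 1-based spec like "1-3,7" into sorted, de-duplicated 0-based indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        start_s, sep, end_s = part.partition("-")
        start = int(start_s)
        end = int(end_s) if sep else start
        if not 1 <= start <= end <= num_pages:
            raise ValueError(f"invalid range {part!r} for {num_pages} pages")
        indices.update(range(start - 1, end))  # convert to 0-based, end inclusive
    return sorted(indices)
```

With the reader `r` from the recipe above, each returned index `i` would then go into a single writer via `w.add_page(r.pages[i])`.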

Edit PDF metadata

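```bash
python3 - <<'PY'
import os
from pypdf import PdfReader, PdfWriter

# Quoted heredoc: no shell expansion, so resolve the workspace path in Python.
ws = os.path.join("/workspace", os.path.basename(os.getcwd()))

r = PdfReader(os.path.join(ws, "input.pdf"))
w = PdfWriter(clone_from=r)  # start from a full copy of the input document
w.add_metadata({"/Title": "New Title", "/Author": "Theo"})
w.write(os.path.join(ws, "output.pdf"))
PY
```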
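
The metadata keys above are PDF names with a leading slash (`/Title`, `/Author`), the form pypdf's own examples pass to `add_metadata`. If the fields come from a config file or user input with plain names, they can be normalized first. A minimal sketch; `to_info_dict` is a hypothetical helper, and the key list in the comment is the standard PDF document-information set (custom keys are also allowed).

```python
# Standard document-information keys: Title, Author, Subject, Keywords,
# Creator, Producer, CreationDate, ModDate, Trapped.


def to_info_dict(fields: dict) -> dict:
    """Prefix plain keys with "/" so they match the name form used by add_metadata."""
    out = {}
    for key, value in fields.items():
        name = key if key.startswith("/") else "/" + key  # leave "/Key" untouched
        out[name] = value
    return out
```

The result can then be passed straight to `w.add_metadata(...)` in the recipe above.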

When NOT to use this skill

- To read a PDF for context, use `markdown-converter` (cleaner output).
- To create a PDF from markdown, use `pandic-office` (pandoc).
- To create or edit Office docs (DOCX/XLSX/PPTX), use `officecli`.