pdf: PDF manipulation toolkit
The gateway image ships:
- `pdftotext`, `pdftoppm`, `pdfinfo` from `poppler-utils` (apt)
- `pypdf` (Python) for programmatic merge / split / metadata edits
Common operations
Extract text from a PDF
bash
pdftotext /workspace/$(basename "$PWD")/input.pdf /workspace/$(basename "$PWD")/input.txt
Or stream to stdout:
pdftotext /workspace/$(basename "$PWD")/input.pdf -
For LLM-friendly output, prefer the `markdown-converter` skill (which uses
markitdown and handles tables better). Use `pdftotext` only when you need
raw text quickly.
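When the raw text is needed page by page, the stdout form can be wrapped in a few lines of Python. The sketch below relies on `pdftotext` separating pages with a form-feed character (`\f`), which is its default behaviour; the function names are illustrative and not part of this skill.

```python
import subprocess


def build_pdftotext_cmd(pdf_path: str) -> list[str]:
    """Argument list for `pdftotext <pdf> -`, which writes the text to stdout."""
    return ["pdftotext", pdf_path, "-"]


def split_pages(text: str) -> list[str]:
    """Split pdftotext output into pages on the form-feed separator."""
    pages = text.split("\f")
    # The output normally ends with a form feed, which leaves one empty chunk.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def extract_pages(pdf_path: str) -> list[str]:
    """Run pdftotext and return one string per page (requires poppler-utils)."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return split_pages(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.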
Count pages
bash
pdfinfo /workspace/$(basename "$PWD")/input.pdf | grep Pages
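If a script needs the page count as an integer rather than a grep'd line, the `pdfinfo` output can be parsed directly. This is a minimal sketch: it assumes the usual `Pages:` line followed by whitespace and a number, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_page_count(pdfinfo_output: str) -> Optional[int]:
    """Return the number on the `Pages:` line, or None if there is no such line."""
    # Anchoring at line start avoids matching e.g. a Title that contains "Pages".
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def count_pages(pdf_path: str) -> Optional[int]:
    """Run pdfinfo on a file and parse its page count (requires poppler-utils)."""
    result = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_page_count(result.stdout)


# Illustrative sample of pdfinfo output, not taken from a real file.
SAMPLE = "Title:          Report\nPages:          12\nPage size:      612 x 792 pts\n"
```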
Render a page to PNG (preview / OCR input)
bash
# All pages → PNG at 150 dpi
pdftoppm -png -r 150 /workspace/$(basename "$PWD")/input.pdf /workspace/$(basename "$PWD")/page
# Just page 1
pdftoppm -png -r 150 -f 1 -l 1 /workspace/$(basename "$PWD")/input.pdf /workspace/$(basename "$PWD")/cover
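The two invocations differ only in the optional `-f` / `-l` page range, so a script that renders arbitrary ranges can build the argument list from parameters. The sketch below uses only the flags shown in this section; the function name and its defaults are assumptions for illustration.

```python
from typing import Optional


def build_pdftoppm_cmd(
    pdf_path: str,
    out_prefix: str,
    dpi: int = 150,
    first: Optional[int] = None,
    last: Optional[int] = None,
) -> list[str]:
    """Argument list for rendering PNGs; first/last are 1-based page numbers."""
    if first is not None and last is not None and first > last:
        raise ValueError("first page must not be greater than last page")
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    # Input file, then the output prefix that pdftoppm extends per page.
    cmd += [pdf_path, out_prefix]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `poppler-utils` is installed.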
Merge multiple PDFs
python
python3 - <<'PY'
from pathlib import Path
from pypdf import PdfWriter
# The quoted heredoc ('PY') disables shell expansion, so build
# /workspace/$(basename "$PWD") in Python instead.
workdir = Path("/workspace") / Path.cwd().name
w = PdfWriter()
for f in ["a.pdf", "b.pdf", "c.pdf"]:  # relative to the current directory
    w.append(f)
w.write(str(workdir / "merged.pdf"))
PY
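When the inputs come from a directory listing instead of a hand-written list, plain string sorting puts `page_10.pdf` before `page_2.pdf`. The sketch below shows one possible way to order files numerically before merging; `natural_key` and `merge_in_natural_order` are hypothetical helpers, and `pypdf` is imported lazily so the sort key can be used without it.

```python
import re


def natural_key(name: str) -> list:
    """Sort key that compares digit runs numerically, so page_2 < page_10."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group alternates text (even) and digits (odd).
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def merge_in_natural_order(paths: list[str], output_path: str) -> None:
    """Append the PDFs in natural order and write one merged file."""
    from pypdf import PdfWriter  # lazy import; requires pypdf

    writer = PdfWriter()
    for path in sorted(paths, key=natural_key):
        writer.append(path)
    writer.write(output_path)
```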
Split a PDF into per-page files
python
python3 - <<'PY'
from pathlib import Path
from pypdf import PdfReader, PdfWriter
# The quoted heredoc ('PY') disables shell expansion, so build
# /workspace/$(basename "$PWD") in Python instead.
workdir = Path("/workspace") / Path.cwd().name
r = PdfReader(str(workdir / "input.pdf"))
for i, page in enumerate(r.pages):
    w = PdfWriter()  # one writer per output page
    w.add_page(page)
    w.write(str(workdir / f"page_{i+1}.pdf"))
PY
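For longer documents it can help to zero-pad the page number so that the per-page files list in page order. This is a sketch under that assumption, not part of the skill itself; `page_filename` and `split_pdf` are illustrative names, and the `pypdf` import is kept inside the function that needs it.

```python
from pathlib import Path


def page_filename(index: int, total: int, stem: str = "page") -> str:
    """File name for the zero-based page `index`, padded to the width of `total`."""
    width = len(str(total))
    return f"{stem}_{index + 1:0{width}d}.pdf"


def split_pdf(input_path: str, out_dir: str) -> list[str]:
    """Write one PDF per page into out_dir and return the paths written."""
    from pypdf import PdfReader, PdfWriter  # lazy import; requires pypdf

    reader = PdfReader(input_path)
    total = len(reader.pages)
    written = []
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        target = str(Path(out_dir) / page_filename(i, total))
        writer.write(target)
        written.append(target)
    return written
```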
Edit PDF metadata
python
python3 - <<'PY'
from pathlib import Path
from pypdf import PdfReader, PdfWriter
# The quoted heredoc ('PY') disables shell expansion, so build
# /workspace/$(basename "$PWD") in Python instead.
workdir = Path("/workspace") / Path.cwd().name
r = PdfReader(str(workdir / "input.pdf"))
w = PdfWriter(clone_from=r)  # start from a copy of the input document
w.add_metadata({"/Title": "New Title", "/Author": "Theo"})
w.write(str(workdir / "output.pdf"))
PY
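Date fields such as `/ModDate` follow the PDF date-string layout `D:YYYYMMDDHHmmSS` plus a UTC offset, not ISO 8601. If an edit should also stamp a modification date, a helper along these lines could produce the value; `pdf_date` is a hypothetical name, and passing its result to `add_metadata` under `/ModDate` is a suggestion, not something the original snippet does.

```python
from datetime import datetime, timezone


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("pass a timezone-aware datetime")
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    stamp = moment.strftime("%Y%m%d%H%M%S")
    # The offset is written as HH'mm' with apostrophes, per the PDF date layout.
    return f"D:{stamp}{sign}{hours:02d}'{minutes:02d}'"
```

With this helper, a call such as `w.add_metadata({"/ModDate": pdf_date(datetime.now(timezone.utc))})` would sit next to the `/Title` and `/Author` update.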
When NOT to use this skill
- To read a PDF for context, use `markdown-converter` (cleaner output).
- To create a PDF from markdown, use `pandic-office` (pandoc).
- To create or edit Office docs (DOCX/XLSX/PPTX), use `officecli`.