pdf-factory
PDF Factory
Dependencies
Run before first use:
```bash
python3 scripts/install_deps.py
```
Required packages: xhtml2pdf, reportlab, pypdf, pyhanko, markdown, lxml,
pillow, html5lib, cssselect2, svglib, python-bidi, arabic-reshaper.
The installer uses `--no-deps` for svglib, rlpycairo, and xhtml2pdf to avoid
building the `pycairo` C extension (which requires the system `cairo` library).
If installation fails with cairo/meson errors, run manually:
```bash
uv pip install --no-deps rlpycairo svglib xhtml2pdf
```
Icons
Phosphor icons are fetched on demand (not bundled). To pre-download icons:
```bash
# Fetch specific icons
python3 scripts/fetch_icons.py arrow-right check-circle warning

# Fetch all 1,500+ icons for offline use
python3 scripts/fetch_icons.py --all
```
Pipeline
Generate PDFs by following these steps in order:
1. Resolve brand kit: locate the brand-{slug} skill, read manifest.json and zones.json
2. Parse markdown: convert source to HTML via the `markdown` library
3. Render content pages: run render.py to produce styled content pages
4. Compose document: run compose.py to merge content with template pages
5. Validate output: run validate_output.py, fix errors, repeat until all checks pass
6. Sign (optional): apply a digital signature via pyhanko
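
The ordering and stop-on-failure behavior of these steps can be sketched in Python. This is a minimal illustration, not part of pdf-factory: `run_steps` is a hypothetical helper, and the demo uses stand-in commands so it runs without the real scripts.

```python
import subprocess
import sys
from typing import Sequence


def run_steps(steps: Sequence[tuple[str, list[str]]]) -> list[str]:
    """Run (name, argv) steps in order; stop at the first failing step.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # Later steps depend on earlier output, so do not continue.
            print(f"{name} failed with exit code {result.returncode}", file=sys.stderr)
            break
        completed.append(name)
    return completed


# Stand-in commands (hypothetical). In practice each argv would be one of the
# script invocations from this document, e.g.
# ["python3", "scripts/render.py", "--input", "content.html", "--output", "content-pages.pdf"]
demo_steps = [
    ("render", [sys.executable, "-c", "pass"]),
    ("compose", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("validate", [sys.executable, "-c", "pass"]),
]
```

With the demo steps, `run_steps(demo_steps)` stops at the failing `compose` stand-in and never reaches `validate`.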
Rendering Internals
See `references/internals.md` for details on font registration, zone overlays,
section page breaks, orphan prevention, SVG rendering, image corner radius,
chart integration, image generation tokens, CSS specifics, and token resolution.
Read it when debugging rendering issues or understanding pipeline behavior.
Step 1: Resolve Brand Kit
Locate the brand kit skill directory and read:
- `assets/manifest.json`: font paths, logo paths, template paths, tokens (colors + type_scale)
- `assets/templates/pdf/zones.json`: content zones for each template page
The `manifest.json["tokens"]` section is the machine-readable source of truth for all
color and typography values used by the pipeline scripts.
If no brand kit is specified, use fallback assets from `assets/fallback/`.
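
A minimal sketch of this lookup, assuming the two paths above are relative to the brand kit directory. `load_brand_kit` is a hypothetical helper, and the demo kit contents are made up; the real scripts may resolve the kit differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_brand_kit(brand_dir: Optional[str]) -> Optional[dict]:
    """Read manifest.json and zones.json from a brand kit directory.

    Returns None when no brand kit is given, signalling fallback mode.
    """
    if brand_dir is None:
        return None
    root = Path(brand_dir)
    manifest = json.loads((root / "assets" / "manifest.json").read_text(encoding="utf-8"))
    zones = json.loads(
        (root / "assets" / "templates" / "pdf" / "zones.json").read_text(encoding="utf-8")
    )
    return {
        "manifest": manifest,
        "zones": zones,
        # manifest["tokens"] is the source of truth for colors and typography.
        "tokens": manifest["tokens"],
    }


# Demo with a throwaway kit; the file contents are placeholders for illustration.
with tempfile.TemporaryDirectory() as tmp:
    pdf_dir = Path(tmp) / "assets" / "templates" / "pdf"
    pdf_dir.mkdir(parents=True)
    (Path(tmp) / "assets" / "manifest.json").write_text(
        json.dumps({"tokens": {"colors": {}, "type_scale": {}}}), encoding="utf-8"
    )
    (pdf_dir / "zones.json").write_text(json.dumps({}), encoding="utf-8")
    kit = load_brand_kit(tmp)
```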
Step 2: Parse Markdown
Convert source markdown to HTML:
```python
import markdown

# `source` is the markdown document text (str), frontmatter included
html = markdown.markdown(source, extensions=[
    'tables', 'fenced_code', 'codehilite', 'toc', 'meta', 'attr_list'
])
```
Extract document metadata from markdown frontmatter:
- `title`, `subtitle`, `author`, `date` → cover page zones
- `sections` → section divider insertion
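
To illustrate what frontmatter extraction involves, here is a simplified standard-library stand-in. It is not the `meta` extension's parser: `split_frontmatter` is a hypothetical helper that only handles flat `key: value` lines between `---` delimiters.

```python
def split_frontmatter(source: str) -> tuple[dict, str]:
    """Split '---'-delimited frontmatter from the markdown body.

    Only flat 'key: value' lines are handled; this is not a YAML parser.
    """
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, source
    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:])
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip().lower()] = value.strip()
    # No closing delimiter: treat the whole input as body.
    return {}, source


source = """---
title: Q4 Performance Review
subtitle: Engineering Division
author: Jane Smith
date: January 2026
---
# Executive Summary
"""
meta, body = split_frontmatter(source)
```

The resulting `meta` dictionary carries the values destined for the cover page zones, and `body` is the markdown left to convert.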
Step 3: Render Content Pages
The `--brand` flag is optional. Without it, render.py uses fallback fonts and colors.
```bash
# With brand kit:
python3 scripts/render.py \
  --brand /path/to/brand-{slug} \
  --input content.html \
  --output content-pages.pdf \
  --sections '["Executive Summary", "Technical Achievements"]'

# Fallback (no brand kit):
python3 scripts/render.py \
  --input content.html \
  --output content-pages.pdf
```
The `--sections` flag accepts a JSON array of section titles. H1 headings matching
these titles are hidden from content pages (they appear on section divider pages instead).
Omit `--sections` when not using section dividers.
For composition rules (grid, spacing, page breaks, widows/orphans), load
[references/composition.md](references/composition.md).
For element rendering details (headings, code blocks, tables, lists), load
[references/elements.md](references/elements.md).
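
When calling render.py from Python, serializing the titles with `json.dumps` avoids hand-quoting the `--sections` array. The sketch below uses only the flags described in this step; `render_command` is a hypothetical helper, not part of the scripts.

```python
import json
from typing import Optional, Sequence


def render_command(
    input_html: str,
    output_pdf: str,
    brand: Optional[str] = None,
    sections: Optional[Sequence[str]] = None,
) -> list[str]:
    """Build the argv list for scripts/render.py."""
    argv = ["python3", "scripts/render.py"]
    if brand:
        argv += ["--brand", brand]
    argv += ["--input", input_html, "--output", output_pdf]
    if sections:
        # --sections takes a JSON array of section titles.
        argv += ["--sections", json.dumps(list(sections))]
    return argv


cmd = render_command(
    "content.html",
    "content-pages.pdf",
    brand="/path/to/brand-{slug}",
    sections=["Executive Summary", "Technical Achievements"],
)
```

Passing the list to `subprocess.run(cmd)` sidesteps shell quoting entirely, since each argument is delivered as-is.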
Step 4: Compose Document
The `--brand` flag is optional. Without it, compose.py produces content-only output
(no cover pages, section dividers, or zone overlays).
```bash
# With brand kit:
python3 scripts/compose.py \
  --brand /path/to/brand-{slug} \
  --content content-pages.pdf \
  --metadata metadata.json \
  --output final.pdf

# Fallback (no brand kit, metadata only):
python3 scripts/compose.py \
  --content content-pages.pdf \
  --metadata metadata.json \
  --output final.pdf
```
compose.py embeds title, author, and subtitle from metadata.json into the PDF info
dictionary.
Provide metadata.json with this structure:
```json
{
  "title": "Document Title",
  "subtitle": "Optional Subtitle",
  "author": "Author Name",
  "date": "February 9, 2026",
  "sections": [
    { "title": "Introduction", "page": 1 },
    { "title": "Analysis", "page": 5 }
  ]
}
```
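
A minimal sketch of assembling this structure from frontmatter values. `build_metadata` is a hypothetical helper; treating `title` and `author` as required mirrors the "Missing metadata" check in Step 5, and how section page numbers are obtained is outside the sketch.

```python
import json


def build_metadata(frontmatter: dict, sections: list[tuple[str, int]]) -> dict:
    """Assemble the metadata.json structure from frontmatter values.

    `sections` is a list of (title, page) pairs.
    """
    missing = [key for key in ("title", "author") if not frontmatter.get(key)]
    if missing:
        raise ValueError(f"metadata is missing required fields: {missing}")
    return {
        "title": frontmatter["title"],
        "subtitle": frontmatter.get("subtitle", ""),
        "author": frontmatter["author"],
        "date": frontmatter.get("date", ""),
        "sections": [{"title": title, "page": page} for title, page in sections],
    }


metadata = build_metadata(
    {"title": "Document Title", "author": "Author Name", "date": "February 9, 2026"},
    [("Introduction", 1), ("Analysis", 5)],
)
# This text is what would be written to metadata.json.
metadata_text = json.dumps(metadata, indent=2)
```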
Step 5: Validate Output
Run automated validation:
```bash
python3 scripts/validate_output.py final.pdf --brand /path/to/brand-{slug}
```
If validation fails, check these common issues:
- "Font not embedded": Verify font path in manifest matches actual file in assets/fonts/
- "Page count 0": Verify render.py produced output; check input HTML is not empty
- "Missing metadata": Ensure metadata.json contains title and author fields
- "Brand font missing": Confirm all fonts declared in manifest exist on disk
Fix errors and re-run validation. Only proceed when all checks pass.
Then perform manual QA:
- No H1 duplication: Section titles appear only on divider pages, not repeated on content pages (use `--sections` in Step 3 to prevent this)
- Images contextually relevant: Each image matches its section topic; no nonsensical compositions or artifacts
- Images brand-aligned: Style matches `tokens.imagery` guidelines (B&W for Wave Artisans, editorial for Bluewaves, reportage for Decathlon)
- Charts readable: Labels don't overlap, legends are clear, data is accurate, no clipping
- File size: Total PDF under 50MB; if over, regenerate images with JPEG format and 2K resolution
- Page flow: No orphaned headings at page bottoms, no widowed single lines at page tops
- Cover/divider zones: Title, subtitle, date, and author render correctly in their zones
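
Of the manual checks, the file size limit is the easiest to automate. The sketch below is an illustration only: `pdf_size_ok` is a hypothetical helper, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
import os
import tempfile

MAX_PDF_BYTES = 50 * 1024 * 1024  # 50MB limit; binary megabytes are an assumption


def pdf_size_ok(path: str, limit: int = MAX_PDF_BYTES) -> bool:
    """Return True when the file at `path` is under the size limit."""
    return os.path.getsize(path) < limit


# Demo on a small throwaway file standing in for final.pdf.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as handle:
    handle.write(b"%PDF-1.7\n")
    demo_path = handle.name
small_ok = pdf_size_ok(demo_path)
tiny_limit_ok = pdf_size_ok(demo_path, limit=4)
os.remove(demo_path)
```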
Step 6: Sign (Optional)
Apply digital signature via pyhanko when required:
```python
from pyhanko.sign import signers
from pyhanko.keys import load_cert_from_pemder

# Requires a PKCS#12 certificate file (.p12/.pfx)
```
Example
Input markdown:
```markdown
---
title: Q4 Performance Review
subtitle: Engineering Division
author: Jane Smith
date: January 2026
---

# Executive Summary

Revenue grew 23% year-over-year...

# Technical Achievements

## Infrastructure Scaling

We migrated 40 services to the new platform...
```
**Expected output (using brand-bluewaves):**
1. Cover page: `cover-front.pdf` template with "Q4 Performance Review" in title zone (display style, text-inverse), "Engineering Division" in subtitle zone, "JANUARY 2026" in date zone, Bluewaves logo in logo zone
2. Content pages: Body text on `page-content.pdf` template, heading font for h1/h2, body font for paragraphs, document title in header zone, page numbers in footer zone
3. Section dividers: `section-divider.pdf` template inserted before "Executive Summary" and "Technical Achievements" with section titles in zones
4. Back cover: `cover-back.pdf` template as final page
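
The page order implied by this list can be sketched as follows. This is an illustration, not compose.py's actual logic: `plan_page_order` is a hypothetical helper, the page numbers in the demo are invented, and `page` is assumed to be the 1-based content page on which a section starts.

```python
def plan_page_order(sections: list[dict], content_pages: int) -> list[str]:
    """List template names in output order for a branded document.

    `sections` uses the metadata.json shape: {"title": ..., "page": n}.
    """
    starts = {section["page"]: section["title"] for section in sections}
    order = ["cover-front.pdf"]
    for page in range(1, content_pages + 1):
        if page in starts:
            # A divider precedes the content page where its section begins.
            order.append(f"section-divider.pdf ({starts[page]})")
        order.append(f"page-content.pdf (content page {page})")
    order.append("cover-back.pdf")
    return order


plan = plan_page_order(
    [
        {"title": "Executive Summary", "page": 1},
        {"title": "Technical Achievements", "page": 2},
    ],
    content_pages=3,
)
```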
Fallback Mode
When no brand kit is specified, the pipeline uses `assets/fallback/` assets:
- Colors: text `#1A1A1A`, headings `#1A1A1A`, muted `#7A7A7A`
- Fonts: Noto Sans (body/heading), Fira Code (code); these are production fonts, not placeholders
- Layout: A4, 25mm margins, 11pt body
- No cover pages or section dividers in fallback mode
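
As a worked example of the layout values, the text area on a fallback page can be computed in millimetres and PDF points. The sketch assumes the 25mm margin applies to all four sides; the A4 dimensions are the standard 210 × 297 mm.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72

A4_MM = (210, 297)  # ISO 216 A4 size, width x height
MARGIN_MM = 25      # fallback margin, assumed equal on all four sides


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 pt = 1/72 inch)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


# Text area left after subtracting the margin on every side.
text_area_mm = (A4_MM[0] - 2 * MARGIN_MM, A4_MM[1] - 2 * MARGIN_MM)
text_area_pt = tuple(round(mm_to_points(value), 1) for value in text_area_mm)
```

Under these assumptions the text area is 160 × 247 mm, or roughly 453.5 × 700.2 points.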