pdf-factory

PDF Factory

Dependencies

Run before first use:
bash
python3 scripts/install_deps.py
Required packages: xhtml2pdf, reportlab, pypdf, pyhanko, markdown, lxml, pillow, html5lib, cssselect2, svglib, python-bidi, arabic-reshaper.
The installer uses `--no-deps` for svglib, rlpycairo, and xhtml2pdf to avoid building the `pycairo` C extension (which requires the system `cairo` library). If installation fails with cairo/meson errors, run manually:
bash
uv pip install --no-deps rlpycairo svglib xhtml2pdf
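
To see which of the required packages are already importable, a small check like the one below can help. It is a sketch, not part of the skill's scripts: the helper name `missing_packages` is hypothetical, and the mapping assumes the usual import names for the listed packages (`PIL` for pillow, `bidi` for python-bidi, `arabic_reshaper` for arabic-reshaper).

```python
from importlib.util import find_spec

# Distribution name -> top-level import name for the packages listed above.
REQUIRED = {
    "xhtml2pdf": "xhtml2pdf",
    "reportlab": "reportlab",
    "pypdf": "pypdf",
    "pyhanko": "pyhanko",
    "markdown": "markdown",
    "lxml": "lxml",
    "pillow": "PIL",
    "html5lib": "html5lib",
    "cssselect2": "cssselect2",
    "svglib": "svglib",
    "python-bidi": "bidi",
    "arabic-reshaper": "arabic_reshaper",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose import name cannot be located."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the modules can be found; it does not verify versions or that optional C extensions load.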

Icons

Phosphor icons are fetched on demand (not bundled). To pre-download icons:
bash
# Fetch specific icons
python3 scripts/fetch_icons.py arrow-right check-circle warning

# Fetch all 1,500+ icons for offline use
python3 scripts/fetch_icons.py --all

Pipeline

Generate PDFs by following these steps in order:
  1. Resolve brand kit: locate the brand-{slug} skill and read manifest.json and zones.json
  2. Parse markdown: convert the source to HTML via the `markdown` library
  3. Render content pages: run render.py to produce styled content pages
  4. Compose document: run compose.py to merge content with template pages
  5. Validate output: run validate_output.py, fix errors, and repeat until all checks pass
  6. Sign (optional): apply a digital signature via pyhanko
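
The script-driven steps (3 to 5) can be chained from Python. The sketch below only builds the argument lists, in order, using the flags and file names that appear in this document; the function name `pipeline_commands` is hypothetical, and running validate_output.py without `--brand` in fallback mode is an assumption.

```python
import json


def pipeline_commands(brand=None, sections=None):
    """Build argv lists for render, compose, and validate, in run order."""
    brand_args = ["--brand", brand] if brand else []

    render = ["python3", "scripts/render.py", *brand_args,
              "--input", "content.html", "--output", "content-pages.pdf"]
    if sections:
        # --sections takes a JSON array of section titles.
        render += ["--sections", json.dumps(sections)]

    compose = ["python3", "scripts/compose.py", *brand_args,
               "--content", "content-pages.pdf",
               "--metadata", "metadata.json", "--output", "final.pdf"]

    # Assumption: validation also works without --brand in fallback mode.
    validate = ["python3", "scripts/validate_output.py", "final.pdf", *brand_args]

    return [render, compose, validate]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, which stops the chain at the first non-zero exit status, assuming the scripts report failure that way.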

Rendering Internals

See `references/internals.md` for details on font registration, zone overlays, section page breaks, orphan prevention, SVG rendering, image corner radius, chart integration, image generation tokens, CSS specifics, and token resolution. Read it when debugging rendering issues or trying to understand pipeline behavior.

Step 1: Resolve Brand Kit

Locate the brand kit skill directory and read:
  • `assets/manifest.json`: font paths, logo paths, template paths, tokens (colors + type_scale)
  • `assets/templates/pdf/zones.json`: content zones for each template page
The `manifest.json["tokens"]` section is the machine-readable source of truth for all color and typography values used by the pipeline scripts.
If no brand kit is specified, use fallback assets from `assets/fallback/`.
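
Reading the two files could look roughly like the sketch below. The helper names `load_brand_kit` and `brand_tokens` are hypothetical, and treating a manifest without a `tokens` section as an error is an assumption; the pipeline scripts may handle that case differently.

```python
import json
from pathlib import Path


def load_brand_kit(brand_dir):
    """Read manifest.json and zones.json from a brand kit directory."""
    assets = Path(brand_dir) / "assets"
    manifest = json.loads((assets / "manifest.json").read_text(encoding="utf-8"))
    zones_path = assets / "templates" / "pdf" / "zones.json"
    zones = json.loads(zones_path.read_text(encoding="utf-8"))
    return manifest, zones


def brand_tokens(manifest):
    """Return the tokens section (colors and typography values)."""
    # Assumption: a missing "tokens" section is an error, not silently defaulted.
    if "tokens" not in manifest:
        raise ValueError("manifest.json has no 'tokens' section")
    return manifest["tokens"]
```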

Step 2: Parse Markdown

Convert source markdown to HTML:
python
import markdown

# `source` holds the markdown text as a string, frontmatter included
html = markdown.markdown(source, extensions=[
    'tables', 'fenced_code', 'codehilite', 'toc', 'meta', 'attr_list'
])
Extract document metadata from markdown frontmatter:
  • `title`, `subtitle`, `author`, `date` → cover page zones
  • `sections` → section divider insertion
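
With the `meta` extension, Python-Markdown exposes frontmatter on the `Meta` attribute of a `markdown.Markdown` instance after `convert()` has run, as a dict that maps lowercase keys to lists of strings; the one-shot `markdown.markdown()` call returns only the HTML. The sketch below flattens such a dict into single-string fields. The helper name `flatten_meta` is hypothetical, and the sample dict stands in for a real `md.Meta` value.

```python
def flatten_meta(meta, keys=("title", "subtitle", "author", "date")):
    """Collapse a Python-Markdown Meta dict into single-string fields."""
    fields = {}
    for key in keys:
        lines = meta.get(key, [])
        # Multi-line values arrive as several list items; join them with spaces.
        value = " ".join(line.strip() for line in lines).strip()
        if value:
            fields[key] = value
    return fields


# Stand-in for md.Meta after md.convert(source) on a document with frontmatter.
sample_meta = {"title": ["Q4 Performance Review"], "author": ["Jane Smith"]}
print(flatten_meta(sample_meta))
```

Keys that are absent or empty are simply left out, so the caller can decide which fields are mandatory.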

Step 3: Render Content Pages

The `--brand` flag is optional. Without it, render.py uses fallback fonts and colors.
bash
# With brand kit:
python3 scripts/render.py \
  --brand /path/to/brand-{slug} \
  --input content.html \
  --output content-pages.pdf \
  --sections '["Executive Summary", "Technical Achievements"]'

# Fallback (no brand kit):
python3 scripts/render.py \
  --input content.html \
  --output content-pages.pdf

The `--sections` flag accepts a JSON array of section titles. H1 headings matching
these titles are hidden from content pages (they appear on section divider pages instead).
Omit `--sections` when not using section dividers.
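
Because `--sections` carries JSON inside a shell argument, quoting is the usual stumbling block. A minimal sketch of producing a shell-safe value from a list of titles follows; the helper name `sections_flag` is hypothetical.

```python
import json
import shlex


def sections_flag(titles):
    """Return a shell-quoted `--sections` argument for a list of titles."""
    payload = json.dumps(list(titles), ensure_ascii=False)
    # shlex.quote wraps the JSON in single quotes and escapes embedded ones.
    return "--sections " + shlex.quote(payload)


print(sections_flag(["Executive Summary", "Technical Achievements"]))
```

When the command is built as an argument list for `subprocess.run` instead of a shell string, the `shlex.quote` step is unnecessary and the raw `json.dumps` output can be passed directly.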

For composition rules (grid, spacing, page breaks, widows/orphans), load
[references/composition.md](references/composition.md).

For element rendering details (headings, code blocks, tables, lists), load
[references/elements.md](references/elements.md).

Step 4: Compose Document

The `--brand` flag is optional. Without it, compose.py produces content-only output (no cover pages, section dividers, or zone overlays).
bash
# With brand kit:
python3 scripts/compose.py \
  --brand /path/to/brand-{slug} \
  --content content-pages.pdf \
  --metadata metadata.json \
  --output final.pdf

# Fallback (no brand kit, metadata only):
python3 scripts/compose.py \
  --content content-pages.pdf \
  --metadata metadata.json \
  --output final.pdf

compose.py embeds title, author, and subtitle from metadata.json into the PDF info
dictionary.

Provide metadata.json with this structure:

```json
{
  "title": "Document Title",
  "subtitle": "Optional Subtitle",
  "author": "Author Name",
  "date": "February 9, 2026",
  "sections": [
    { "title": "Introduction", "page": 1 },
    { "title": "Analysis", "page": 5 }
  ]
}
```
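
A quick pre-flight check on metadata.json can catch problems before compose.py runs. The sketch below is not how compose.py itself validates input; the function name `metadata_problems` is hypothetical, and it assumes that `title` and `author` are required (as the validation step suggests), that `sections` is optional, and that page numbers start at 1.

```python
def metadata_problems(doc):
    """Return a list of problems found in a metadata.json-style dict."""
    problems = []
    for field in ("title", "author"):
        value = doc.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    sections = doc.get("sections", [])  # assumption: sections may be omitted
    if not isinstance(sections, list):
        return problems + ["sections must be a list"]

    for i, section in enumerate(sections):
        if not isinstance(section, dict) or not isinstance(section.get("title"), str):
            problems.append(f"sections[{i}] needs a string title")
        elif not isinstance(section.get("page"), int) or section["page"] < 1:
            problems.append(f"sections[{i}] needs a page number >= 1")
    return problems
```

For a file on disk, load it with `json.load` first and pass the resulting dict; an empty list means nothing was flagged.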

Step 5: Validate Output

Run automated validation:
bash
python3 scripts/validate_output.py final.pdf --brand /path/to/brand-{slug}
If validation fails, check these common issues:
  • "Font not embedded": verify that the font path in the manifest matches the actual file in assets/fonts/
  • "Page count 0": verify that render.py produced output and that the input HTML is not empty
  • "Missing metadata": ensure metadata.json contains the title and author fields
  • "Brand font missing": confirm that all fonts declared in the manifest exist on disk
Fix errors and re-run validation. Only proceed when all checks pass.
Then perform manual QA:
  1. No H1 duplication: section titles appear only on divider pages, not repeated on content pages (use `--sections` in Step 3 to prevent this)
  2. Images contextually relevant: each image matches its section topic, with no nonsensical compositions or artifacts
  3. Images brand-aligned: style matches the `tokens.imagery` guidelines (B&W for Wave Artisans, editorial for Bluewaves, reportage for Decathlon)
  4. Charts readable: labels don't overlap, legends are clear, data is accurate, no clipping
  5. File size: total PDF under 50MB; if over, regenerate images in JPEG format at 2K resolution
  6. Page flow: no orphaned headings at page bottoms, no widowed single lines at page tops
  7. Cover/divider zones: title, subtitle, date, and author render correctly in their zones
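
The file-size item in the checklist is easy to automate. A minimal sketch, assuming "50MB" means 50 MiB (50 × 1024 × 1024 bytes); the name `size_report` is hypothetical and this check is not part of validate_output.py as far as this document states.

```python
from pathlib import Path

LIMIT_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB


def size_report(pdf_path, limit=LIMIT_BYTES):
    """Return (is_under_limit, size_in_mib) for the given file."""
    size = Path(pdf_path).stat().st_size
    return size < limit, size / (1024 * 1024)
```

Calling `size_report("final.pdf")` after Step 4 gives a pass/fail flag plus the measured size for the QA notes.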

Step 6: Sign (Optional)

Apply a digital signature via pyhanko when required:
python
from pyhanko.sign import signers
from pyhanko.keys import load_cert_from_pemder

# Requires a PKCS#12 certificate file (.p12/.pfx)

Example

Input markdown:
markdown
---
title: Q4 Performance Review
subtitle: Engineering Division
author: Jane Smith
date: January 2026
---

# Executive Summary

Revenue grew 23% year-over-year...

# Technical Achievements

## Infrastructure Scaling

We migrated 40 services to the new platform...

**Expected output (using brand-bluewaves):**

1. Cover page: `cover-front.pdf` template with "Q4 Performance Review" in title zone (display style, text-inverse), "Engineering Division" in subtitle zone, "JANUARY 2026" in date zone, Bluewaves logo in logo zone
2. Content pages: Body text on `page-content.pdf` template, heading font for h1/h2, body font for paragraphs, document title in header zone, page numbers in footer zone
3. Section dividers: `section-divider.pdf` template inserted before "Executive Summary" and "Technical Achievements" with section titles in zones
4. Back cover: `cover-back.pdf` template as final page

Fallback Mode

When no brand kit is specified, the pipeline uses the assets in `assets/fallback/`:
  • Colors: text `#1A1A1A`, headings `#1A1A1A`, muted `#7A7A7A`
  • Fonts: Noto Sans (body/heading), Fira Code (code) — production fonts, not placeholders
  • Layout: A4, 25mm margins, 11pt body
  • No cover pages or section dividers in fallback mode
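
For reference, the fallback layout translates into PDF points as follows. This is a worked conversion only, assuming the 25mm margin applies equally to all four sides; the function name `content_box_pt` is hypothetical.

```python
MM_TO_PT = 72 / 25.4  # 1 inch = 25.4 mm = 72 PDF points


def content_box_pt(page_mm=(210, 297), margin_mm=25):
    """Return (width, height) of the text area in points for an A4 page."""
    width_mm, height_mm = page_mm
    return ((width_mm - 2 * margin_mm) * MM_TO_PT,
            (height_mm - 2 * margin_mm) * MM_TO_PT)


width_pt, height_pt = content_box_pt()
print(f"{width_pt:.1f} x {height_pt:.1f} pt")  # prints "453.5 x 700.2 pt"
```

The 160mm × 247mm text area therefore comes to roughly 453.5 × 700.2 points, which is the space available to 11pt body text in fallback mode.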