course-ebook-publishing

Course Ebook Publishing

Schema authority: this skill reads the live `window.COURSE` object, whose shape is defined in `_shared/domain-primitives.md`. Quiz, pre-test, and post-test items are filtered OUT of the ebook (per the §10 quiz item rules and the ebook content policy).
Filename convention (English-first): outputs land in `dist/{name}.pdf` / `dist/{name}.docx`. Source markdown is composed under `dist/master.md`.
This skill turns a finished teaching website into a book: PDF (primary) and, optionally, DOCX (for editorial review or further authoring). It is a post-site step: it consumes `window.COURSE` from the live site and never re-authors content.

When to Invoke

  • Site is feature-complete (Stages 1–5 done) and content is stable.
  • Stakeholders ask for a printable hand-out, an archivable record, or a deliverable for non-web channels.
  • Sometimes invoked after `course-corporate-edition` to produce a client deliverable bundle.
Do NOT invoke when the site is mid-development. The ebook pipeline reads `window.COURSE`; if the site changes daily, the ebook drifts daily.

Architecture: Single Source → Two Outputs

```
window.COURSE  ─┐
asset folders  ─┼─→ compose-ebook.mjs ─→ master.md ─┬─→ pandoc ─→ ebook.docx
materials/     ─┘                                   └─→ Playwright page.pdf() ─→ ebook.pdf
```
Why a single composed markdown intermediate: keeps PDF and DOCX content identical. Pandoc handles markdown → DOCX cleanly; Playwright renders the same markdown via an HTML wrapper → PDF. Two outputs, one source of truth.
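Below is a minimal sketch of that single-source flow. It assumes the composer and the two renderers are passed in as functions; `buildFromSingleSource`, `composeMarkdown`, and the renderer names are hypothetical. Markdown is composed and written exactly once, and every renderer receives the same `master.md` path, so the outputs cannot drift apart.

```js
import fs from 'node:fs/promises';
import path from 'node:path';

// Hypothetical orchestrator: compose once, write master.md, then hand the SAME
// file to every enabled renderer. Renderers are injected so each can be swapped or skipped.
export async function buildFromSingleSource({ distDir, composeMarkdown, renderers }) {
  const markdown = await composeMarkdown(); // the only place content is produced
  await fs.mkdir(distDir, { recursive: true });
  const mdPath = path.join(distDir, 'master.md');
  await fs.writeFile(mdPath, markdown, 'utf8');

  const outputs = [];
  for (const [name, render] of Object.entries(renderers)) {
    outputs.push({ name, file: await render(mdPath) }); // each renderer reads master.md
  }
  return { mdPath, outputs };
}
```

Because a renderer is just a key in `renderers`, flags like `--no-docx` reduce to leaving that key out.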

File Layout (Standard)

```
scripts/
├── build-ebook.mjs              ← CLI entry point (--md-only, --no-docx, --no-pdf, --output, --keep-html)
├── render-pdf.mjs               ← markdown → HTML → PDF via Playwright
└── lib/
    ├── compose-ebook.mjs        ← loadSources + composeXxx functions (cover, TOC, chapters, appendix)
    └── reference.docx           ← pandoc style template (generated by gen-reference-docx.mjs)

style-ebook.css                   ← @page rules, cover, page-number footer, print-only styles
dist/                             ← output: ebook.md, ebook.pdf, ebook.docx
```

Loading Source Data: `window.COURSE` from index.html

When the site uses an external `course-data.js`, just import it. When it uses an inlined COURSE (corporate edition), extract it with a regex plus a vm sandbox:
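```js
import fs from 'node:fs/promises';
import vm from 'node:vm';

// Pull the inlined `window.COURSE = {...};` script out of the built index.html
// and evaluate it in an isolated vm context instead of the build process's globals.
async function loadCourseFromIndexHtml(htmlPath) {
  const html = await fs.readFile(htmlPath, 'utf8');
  // Expects a bare <script> tag (no attributes) whose body is the COURSE assignment.
  const match = html.match(/<script>\s*window\.COURSE\s*=\s*\{[\s\S]*?\};\s*<\/script>/);
  if (!match) throw new Error('window.COURSE block not found');
  const code = match[0].replace(/<\/?script>/g, '');
  const sandbox = { window: {}, console };
  vm.createContext(sandbox);
  vm.runInContext(code, sandbox);
  return sandbox.window.COURSE;
}
```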
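Why a vm sandbox rather than `eval`: vm runs the script against its own global object, so an accidental side effect (a stray global assignment, say) stays out of the build process. It is not a security boundary, though, so only run it on your own site's HTML.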
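Suppose `course-data.js` is a classic browser script that assigns `window.COURSE` instead of an ES module that exports it. A plain `import` then throws in Node, because `window` is not defined there. One workaround is to run the file's source through the same kind of vm context used for the inlined case. The sketch below assumes the file contains only that data assignment. `courseFromScriptSource` and `loadCourseFromScriptFile` are hypothetical names.

```js
import fs from 'node:fs/promises';
import vm from 'node:vm';

// Evaluate a browser-style data script (`window.COURSE = {...};`) in an isolated
// context that provides a stub `window`, then return what it assigned.
export function courseFromScriptSource(code, filename = 'course-data.js') {
  const sandbox = { window: {} };
  vm.createContext(sandbox);
  vm.runInContext(code, sandbox, { filename }); // filename only improves stack traces
  const course = sandbox.window.COURSE;
  if (course === undefined) throw new Error(`${filename} did not assign window.COURSE`);
  // JSON round-trip turns objects from the vm's realm into ordinary objects of this
  // realm (assumes COURSE is plain data: no functions, Dates, or cycles).
  return JSON.parse(JSON.stringify(course));
}

export async function loadCourseFromScriptFile(filePath) {
  return courseFromScriptSource(await fs.readFile(filePath, 'utf8'), filePath);
}
```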

PDF Rendering: Playwright `page.pdf()` (not Chrome CLI)

Original implementation: `pandoc → HTML → msedge --headless --print-to-pdf`. Switch to Playwright: the Chrome/Edge CLI doesn't support `footerTemplate`, so it can't produce a custom page-number footer.
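```js
import path from 'node:path';
import { pathToFileURL } from 'node:url';
import { chromium } from 'playwright';

// Render the HTML wrapper (built from master.md) to an A4 PDF with a page-number footer.
async function renderPdf(htmlPath, outputPath) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    // pathToFileURL builds a valid file:// URL on Windows too (drive letters, backslashes).
    await page.goto(pathToFileURL(path.resolve(htmlPath)).href, { waitUntil: 'networkidle' });
    await page.pdf({
      path: outputPath,
      format: 'A4',
      printBackground: true,
      displayHeaderFooter: true,
      headerTemplate: '<div></div>', // empty header; only the footer is wanted
      // Header/footer templates are rendered apart from the page's stylesheet, so style them inline.
      footerTemplate: `
        <div style="font-size:9pt; width:100%; text-align:center; color:#666;">
          <span class="pageNumber"></span> / <span class="totalPages"></span>
        </div>`,
      margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' },
    });
  } finally {
    await browser.close(); // release the browser even if rendering fails
  }
}
```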
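The document doesn't show the markdown → HTML half of `render-pdf.mjs`. Here is a minimal sketch that assumes pandoc builds the wrapper, as it did in the original implementation. `--standalone` emits a complete HTML document, and `--css` links `style-ebook.css`. The helper name, the `pagetitle` value, and writing the HTML next to `master.md` (so relative image paths keep resolving) are assumptions.

```js
import path from 'node:path';

// Hypothetical helper: pandoc arguments that turn master.md into a standalone HTML
// wrapper for Playwright. The HTML is written beside the markdown so that relative
// image paths such as ./assets/... resolve the same way in both files.
export function htmlWrapperArgs(mdPath, cssPath, title = 'Ebook') {
  const mdDir = path.dirname(mdPath);
  const htmlPath = path.join(mdDir, path.basename(mdPath, '.md') + '.html');
  // Link the stylesheet relative to the HTML file, with URL-style forward slashes.
  const cssHref = path.relative(mdDir, cssPath).split(path.sep).join('/');
  const args = [
    mdPath,
    '-f', 'gfm+attributes+raw_html', // same reader flags as the DOCX build
    '-t', 'html5',
    '--standalone',                  // full <html> document, not a fragment
    '--css', cssHref,
    '--metadata', `pagetitle=${title}`,
    '-o', htmlPath,
  ];
  return { htmlPath, args };
}
```

Run `pandoc` with `args`, then point the Playwright code above at `htmlPath`. Delete that file afterwards unless `--keep-html` was passed.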

Cover page without page number

The cover should carry no page number. CSS:
```css
@page { margin: 20mm; }
@page :first { margin: 0; }    /* full-bleed cover */
```
Playwright honors `@page :first`, so the footer doesn't print on the cover.

Book-Style Layout (style-ebook.css)

The example workshop's "book-grade typography" commit was driven by aesthetics, not just formality. Key rules worth porting:
```css
/* Start every chapter on a new page */
h1.chapter { page-break-before: always; }

/* Side-by-side figures (grid layout) */
.figure-grid {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 12mm;
}
/* Three-column variant: a modifier class on the grid element itself */
.figure-grid.three-up { grid-template-columns: repeat(3, 1fr); }

/* Prompt block: orange PROMPT badge + dark code background */
.prompt-block {
  background: #1a1a1a; color: #e0e0e0;
  border-left: 4px solid #ff8c00;
  padding: 12pt; margin: 8pt 0;
}
.prompt-block::before {
  content: 'PROMPT'; display: inline-block;
  background: #ff8c00; color: white;
  padding: 2pt 8pt; border-radius: 3pt;
  font-size: 8pt; margin-bottom: 6pt;
}

/* Task checkbox (print-only) */
.task::before {
  content: '☐'; margin-right: 8pt;
  font-size: 14pt; color: #888;
}

/* Learning-goal green check badge */
.goal::before {
  content: '✓'; margin-right: 6pt;
  color: #2e7d32; font-weight: bold;
}
```
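These rules only pay off if the compose functions emit matching markup. Below is a sketch of two small helpers for that. It assumes the markdown is read with pandoc's `gfm+attributes+raw_html` for both outputs, in which case `# Title {.chapter}` becomes `<h1 class="chapter">`. The helper names are hypothetical, and `.task` / `.goal` markup is not covered here.

```js
// Hypothetical markup helpers for compose-ebook.mjs, assuming pandoc's
// gfm+attributes+raw_html reader is used for both the HTML and DOCX builds.

// `{.chapter}` at the end of an ATX heading attaches the class: <h1 class="chapter">.
export function chapterHeading(title) {
  return `# ${title} {.chapter}`;
}

// Wrap a prompt in .prompt-block. The <div> lines are raw HTML (dropped in DOCX), but
// the prompt itself is a fenced code block, so its text survives in both formats.
export function promptBlock(text) {
  // The fence must be longer than any backtick run inside the prompt.
  const longestRun = Math.max(0, ...(text.match(/`+/g) ?? []).map((run) => run.length));
  const fence = '`'.repeat(Math.max(3, longestRun + 1));
  return [
    '<div class="prompt-block">',
    '', // blank line ends the raw HTML block, so the fence is parsed as markdown
    `${fence}text`,
    text,
    fence,
    '',
    '</div>',
  ].join('\n');
}
```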

DOCX via pandoc + reference template

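```js
import { execFile } from 'node:child_process';
import path from 'node:path';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Convert the composed markdown to DOCX, inheriting styles from reference.docx.
async function renderDocx(mdPath, outPath) {
  const args = [
    mdPath,
    '-f', 'gfm+attributes+raw_html',
    '-t', 'docx',
    '--standalone',
    '--toc', '--toc-depth=2',
    '--resource-path', path.dirname(mdPath), // resolve relative image paths from the markdown's folder
    '--reference-doc', 'scripts/lib/reference.docx',
    '-o', outPath,
  ];
  // Unlike a bare spawn(), awaiting execFile rejects if pandoc is missing or exits non-zero.
  await execFileAsync('pandoc', args);
}
```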
`reference.docx` is a one-time-generated template holding your chosen fonts (e.g. Microsoft JhengHei UI for Chinese), heading colors, and paragraph spacing. Generate it via `scripts/gen-reference-docx.mjs`; pandoc copies its styles into the output.
Caveat: pandoc's DOCX writer drops raw HTML. `.figure-grid` blocks won't render as grids in Word; they degrade to a vertical stack, and an image written as a raw `<img>` tag disappears entirely. Use markdown `![alt](path)` syntax for images so they survive both formats.
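One way to satisfy both formats is to put the opening and closing `.figure-grid` tags on their own lines as raw HTML, separated from the content by blank lines, and to write each image in markdown syntax. This sketch assumes a CommonMark-based reader such as pandoc's `gfm`. Under CommonMark rules, a raw `<div>` block ends at the first blank line, so the images in between are parsed as markdown. The HTML build gets its grid, and DOCX keeps the images and simply drops the wrapper. `figureGrid` is a hypothetical helper, and it treats `three-up` as a modifier class on the grid element itself.

```js
// Hypothetical helper: a .figure-grid whose images use markdown syntax.
// Raw <div> lines are separated from the images by blank lines, so a CommonMark
// reader treats each image as markdown rather than as part of the HTML block.
// Assumes alt text has no brackets and src has no spaces or parentheses.
export function figureGrid(images, { threeUp = false } = {}) {
  const cls = threeUp ? 'figure-grid three-up' : 'figure-grid';
  const body = images.map(({ alt, src }) => `![${alt}](${src})`);
  return [`<div class="${cls}">`, ...body, '</div>'].join('\n\n');
}
```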

Compose Module Anatomy

`compose-ebook.mjs` exports one function per chapter type:
```js
export async function loadSources() { /* read COURSE + materials + assets */ }
export async function composeCover(meta) { /* big title, day list, meta card */ }
export async function composeOverview(meta) { /* TOC + course overview */ }
export async function composeChapter(day, assetIndex) { /* hero image + units */ }
export async function composeAppendixMaterials(materials) { /* full-text material appendix */ }
export async function composeAppendixSkills(skills) { /* QR-coded skill links */ }
```
Each compose function resolves to a markdown string (`loadSources()` returns the raw inputs). `build-ebook.mjs` concatenates them with `\n\n---\n\n`.
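Here is a sketch of the concatenation step in `build-ebook.mjs`. It assumes the composer signatures above and that `sources` exposes `meta`, `days`, `assetIndex`, `materials`, and `skills`; those property names are guesses, not part of the original. The blank lines around `---` matter. Placed directly under a paragraph line, `---` would turn that paragraph into a setext heading instead of a thematic break.

```js
// Hypothetical assembly step: compose every part, then join with a thematic break.
// `composers` holds the compose-ebook.mjs functions; `sources` is what loadSources() returned.
export async function assembleBook(composers, sources) {
  const { meta, days, assetIndex, materials, skills } = sources;
  const parts = await Promise.all([
    composers.composeCover(meta),
    composers.composeOverview(meta),
    ...days.map((day) => composers.composeChapter(day, assetIndex)), // one chapter per day
    composers.composeAppendixMaterials(materials),
    composers.composeAppendixSkills(skills),
  ]);
  // Promise.all preserves input order, so chapters stay in course order.
  return parts.join('\n\n---\n\n');
}
```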

Asset Path Strategy

Inside the composed markdown, images use paths relative to the markdown file (`./assets/illustrations/foo.png`). Pandoc's `--resource-path` resolves them, and so does Playwright's `file://` loader, provided the intermediate HTML is written in the same folder as the markdown. No absolute paths.
For corporate editions with asset fallback, resolve the actual file location at compose time and embed the resolved path:
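```js
import fs from 'node:fs/promises';
import path from 'node:path';

// First existing copy of `name` under `roots`, relative to mdDir (the folder of the composed markdown).
async function pickAsset(name, roots, mdDir) {
  for (const root of roots) {
    const full = path.join(root, name);
    // Forward slashes keep the path valid in markdown/HTML on Windows too.
    try { await fs.access(full); return path.relative(mdDir, full).split(path.sep).join('/'); }
    catch { continue; }
  }
  return null;  // caller decides whether to drop the image or fall back to placeholder
}
```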

What to Filter Out

Some content lives on the website but should not be in the ebook:
  • Quiz items: leaking exam questions defeats the assessment. The example workshop filters out the entire quiz chapter.
  • Pre-test / post-test sections: same reason.
  • Interactive-only features: "點擊複製" (click-to-copy) buttons make no sense in print. Render the raw prompt text, not the button.
  • Dynamic illustrations: anything generated client-side via canvas/JS won't render.
Add filter conditions to the compose functions; don't try to render-then-strip.
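Below is a sketch of filtering at compose time. The real `window.COURSE` shape lives in `_shared/domain-primitives.md` and isn't reproduced here. The `kind` field, its values (`quiz`, `pre-test`, `post-test`), and the `figures[].rendersClientSide` flag are therefore assumptions; rename them to match the schema.

```js
// Assumed schema: each unit has a `kind` string, and client-rendered visuals are
// flagged with `rendersClientSide: true`. Rename to match domain-primitives.md.
const ASSESSMENT_KINDS = new Set(['quiz', 'pre-test', 'post-test']);

// Units that are safe to print, in their original order.
export function printableUnits(units) {
  return units
    .filter((unit) => !ASSESSMENT_KINDS.has(unit.kind)) // never leak exam items
    .map((unit) => ({
      ...unit,
      // Canvas/JS visuals can't render in print; keep only static figures.
      figures: (unit.figures ?? []).filter((fig) => !fig.rendersClientSide),
    }));
}
```

Calling this at the top of the chapter composer means excluded items never reach the markdown at all, which is the "filter, don't render-then-strip" rule in code form.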

Verification

After generating, verify:
```js
// scripts/verify-ebook.mjs
// - PDF file exists and is > 100KB
// - First page (cover) has no page number
// - TOC pages have entries
// - Every image reference resolved (no broken-image placeholders)
```
For DOCX, open in Word once and visually check: TOC linked, fonts applied, no missing-image squares.
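Here is a partial sketch of `scripts/verify-ebook.mjs` covering the two checks that need no PDF parser: the size threshold and unresolved image references in the composed markdown. Checking the cover's footer and the TOC entries would need a PDF text-extraction library, so those are left out. The function name is hypothetical. The regex is an assumption too: it handles only plain `![alt](path)` images, not raw `<img>` tags, titled images, or paths with spaces.

```js
import fs from 'node:fs/promises';
import path from 'node:path';

// Hypothetical checks: returns a list of problems (an empty list means pass).
export async function verifyEbook({ pdfPath, mdPath, minPdfBytes = 100 * 1024 }) {
  const problems = [];

  // 1. PDF exists and is above the size threshold (a tiny PDF usually means a failed render).
  try {
    const { size } = await fs.stat(pdfPath);
    if (size <= minPdfBytes) problems.push(`PDF too small: ${size} bytes`);
  } catch {
    problems.push(`PDF missing: ${pdfPath}`);
  }

  // 2. Every markdown image path resolves relative to the markdown file.
  const markdown = await fs.readFile(mdPath, 'utf8');
  const mdDir = path.dirname(mdPath);
  for (const [, src] of markdown.matchAll(/!\[[^\]]*\]\(([^)\s]+)\)/g)) {
    if (/^[a-z]+:/i.test(src)) continue; // skip URL-style references (http:, data:, ...)
    try {
      await fs.access(path.resolve(mdDir, src));
    } catch {
      problems.push(`broken image: ${src}`);
    }
  }
  return problems;
}
```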

CLI Conventions

The example workshop's `build:ebook` script supports:

| Flag | Behavior |
|---|---|
| (no flag) | Build both PDF + DOCX |
| `--md-only` | Stop after composing markdown (debug content) |
| `--no-docx` | Skip DOCX (faster iteration on PDF) |
| `--no-pdf` | Skip PDF (faster iteration on DOCX) |
| `--keep-html` | Don't delete intermediate HTML (debug CSS) |
| `--output PATH` | Custom PDF output path |

Provide these from day one; you'll iterate on the layout many times.
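Here is a sketch of parsing those flags with Node's built-in `util.parseArgs` (Node 18.3+). The option names come from the table; the returned plan object is a hypothetical convenience. `no-docx` and `no-pdf` are declared literally as option names, so no negation handling is involved.

```js
import { parseArgs } from 'node:util';

// Parse build-ebook.mjs flags into a plan of which stages to run.
export function buildPlan(argv = process.argv.slice(2)) {
  const { values } = parseArgs({
    args: argv,
    options: {
      'md-only': { type: 'boolean' },
      'no-docx': { type: 'boolean' },
      'no-pdf': { type: 'boolean' },
      'keep-html': { type: 'boolean' },
      output: { type: 'string' }, // custom PDF output path
    },
    strict: true, // unknown flags throw instead of being silently ignored
  });
  const mdOnly = Boolean(values['md-only']); // absent flags come back undefined
  return {
    docx: !mdOnly && !values['no-docx'],
    pdf: !mdOnly && !values['no-pdf'],
    keepHtml: Boolean(values['keep-html']),
    pdfOutput: values.output ?? null,
  };
}
```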

Anti-Patterns

  • Re-authoring content in the ebook builder: the ebook is a derivative. If you find yourself adding new prose in the compose functions, that prose belongs in `course-content-authoring` instead.
  • Using Chrome CLI for PDF: switch to Playwright early; you'll want footers eventually.
  • Hardcoding image paths: use `pickAsset()` with a list of roots, especially when supporting corporate-edition fallbacks.
  • Forgetting `@page :first { margin: 0 }`: the cover ends up with a stray page number in the footer.
  • Re-running the ebook build every time the site changes: only build at stable checkpoints; the intermediate `dist/` folder is large.

Hand-off

When this skill finishes:
  • `dist/{course-name}.pdf` and, optionally, `.docx` are produced.
  • The verify script passes.
  • If the ebook is for a client, the file goes into the deliverable bundle.
Tell the user: "Ebook ready. The site and ebook are now in sync. From now on, treat the site as canonical and rebuild the ebook only at release milestones. Don't edit the ebook directly; it'll get overwritten."