translate-book-parallel

Translate Book (Parallel Subagents)


Skill by ara.so — Daily 2026 Skills collection.
A Claude Code skill that translates entire books (PDF/DOCX/EPUB) into any language using parallel subagents. Each chunk gets an isolated context window — preventing truncation and context accumulation that plague single-session translation.

Pipeline Overview

Input (PDF/DOCX/EPUB)
  ↓
Calibre ebook-convert → HTMLZ → HTML → Markdown
  ↓
Split into chunks (~6000 chars each)
  │  manifest.json tracks SHA-256 hashes
  ↓
Parallel subagents (8 concurrent by default)
  │  each: read chunk → translate → write output_chunk*.md
  ↓
Validate (manifest hash check, 1:1 source↔output match)
  ↓
Merge → Pandoc → HTML (with TOC) → Calibre → DOCX / EPUB / PDF

Prerequisites

```bash
# 1. Calibre (provides ebook-convert)
# macOS
brew install --cask calibre
# Linux
sudo apt-get install calibre
# Or download from the official site: https://calibre-ebook.com/

# 2. Pandoc
brew install pandoc          # macOS
sudo apt-get install pandoc  # Linux

# 3. Python dependencies
pip install pypandoc beautifulsoup4
```

Verify all tools are available:

```bash
ebook-convert --version
pandoc --version
python3 -c "import pypandoc; print('pypandoc ok')"
```
Installation

Option A: npx (recommended)

```bash
npx skills add deusyu/translate-book -a claude-code -g
```

Option B: ClawHub

```bash
clawhub install translate-book
```

Option C: Git clone

```bash
git clone https://github.com/deusyu/translate-book.git ~/.claude/skills/translate-book
```

Usage in Claude Code


Once the skill is installed, use natural language inside Claude Code:
translate /path/to/book.pdf to Chinese
translate ~/Downloads/mybook.epub to Japanese
/translate-book translate /path/to/book.docx to French
The skill orchestrates the full pipeline automatically.

Supported Languages

| Code | Language |
|------|----------|
| zh   | Chinese  |
| en   | English  |
| ja   | Japanese |
| ko   | Korean   |
| fr   | French   |
| de   | German   |
| es   | Spanish  |

Language codes are extensible: add new ones in the skill definition.

Running Pipeline Steps Manually

Step 1: Convert to Markdown Chunks

```bash
python3 scripts/convert.py /path/to/book.pdf --olang zh
```

This produces inside `{book_name}_temp/`:

  • `chunk0001.md`, `chunk0002.md`, ... (source chunks, ~6000 chars each)
  • `manifest.json` (SHA-256 hashes for validation)

```bash
# For EPUB input
python3 scripts/convert.py /path/to/book.epub --olang ja

# For DOCX input
python3 scripts/convert.py /path/to/book.docx --olang fr
```
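
The chunk-and-hash idea behind this step can be pictured roughly as follows. This is a minimal sketch, not the actual `convert.py`: the helper names `split_markdown` and `build_manifest`, the paragraph-boundary packing rule, and the `sha256:` prefix on each hash are assumptions made for illustration. The real script may split and record hashes differently.

```python
import hashlib

CHUNK_SIZE = 6000  # approximate target from the text; the real splitting rule may differ


def split_markdown(text: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly chunk_size characters.

    A single paragraph longer than chunk_size is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # close the current chunk
            current = para          # start a new one with this paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_manifest(chunks: list[str]) -> dict[str, str]:
    """Map (hypothetical) chunk filenames to the SHA-256 of their contents."""
    return {
        f"chunk{i:04d}.md": "sha256:" + hashlib.sha256(c.encode("utf-8")).hexdigest()
        for i, c in enumerate(chunks, start=1)
    }
```

Splitting on paragraph boundaries rather than at a fixed character offset keeps sentences intact, which is why chunk sizes are only approximately 6000 characters in this sketch.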

Step 2: Translate (Parallel Subagents)

The skill handles this step: it launches 8 concurrent subagents per batch, each translating one chunk independently. Each subagent receives exactly this task:

Read `chunk0042.md` → translate to target language → write `output_chunk0042.md`

**Resumable:** Already-translated chunks (valid `output_chunk*.md` files) are skipped on re-run.
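
The skip-and-batch behaviour described here can be sketched in a few lines. The actual orchestration lives in the skill definition and runs through Claude Code subagents, so this is only an illustration: `pending_chunks` and `batches` are hypothetical helpers, and "valid" is approximated as "the output file exists and is non-empty".

```python
from pathlib import Path
from typing import Iterator


def pending_chunks(temp_dir: Path) -> list[Path]:
    """Return source chunks that lack a non-empty output_chunk*.md counterpart."""
    pending = []
    for src in sorted(temp_dir.glob("chunk*.md")):
        out = temp_dir / f"output_{src.name}"
        if not out.is_file() or out.stat().st_size == 0:
            pending.append(src)
    return pending


def batches(items: list[Path], size: int = 8) -> Iterator[list[Path]]:
    """Yield successive groups of at most `size` chunks, one group per batch."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

On a re-run, only the chunks returned by `pending_chunks` would be handed out, so finished work is never repeated.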

Step 3: Merge and Build All Formats

```bash
python3 scripts/merge_and_build.py \
  --temp-dir book_name_temp \
  --title "《Book Title in Target Language》"
```

Before merging, validation checks:

  • Every source chunk has a matching output file (1:1)
  • Source chunk hashes match `manifest.json` (no stale outputs)
  • No output files are empty

Outputs produced:

| File | Description |
|------|-------------|
| `output.md` | Merged translated Markdown |
| `book.html` | Web version with floating TOC |
| `book.docx` | Word document |
| `book.epub` | E-book format |
| `book.pdf` | Print-ready PDF |

Project Structure


translate-book/
├── SKILL.md                    # Claude Code skill definition (orchestrator)
├── scripts/
│   ├── convert.py              # PDF/DOCX/EPUB → Markdown chunks via Calibre HTMLZ
│   ├── manifest.py             # SHA-256 chunk tracking and merge validation
│   ├── merge_and_build.py      # Merge chunks → HTML → DOCX/EPUB/PDF
│   ├── calibre_html_publish.py # Calibre wrapper for format conversion
│   ├── template.html           # Web HTML template with floating TOC
│   └── template_ebook.html     # Ebook HTML template
└── README.md

How Manifest Validation Works

```python
# scripts/manifest.py (conceptual usage)

# During convert.py: records source hashes
manifest = {
    "chunk0001.md": "sha256:abc123...",
    "chunk0002.md": "sha256:def456...",
    # ...
}

# During merge_and_build.py: validates before merging
# 1. Check every chunk has a corresponding output_chunk
# 2. Re-hash source chunks and compare against manifest
# 3. Reject if any hash mismatches (stale/corrupt output)
# 4. Reject if any output file is empty
```

If validation fails, the script auto-deletes stale `output.md` and re-merges from valid chunk outputs.
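
The four checks can be written out as a short function. This is a sketch under stated assumptions rather than the real `manifest.py`: it assumes `manifest.json` is a flat mapping from chunk filename to `sha256:<hex>` as in the conceptual example, and the function name `validate` and its message strings are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def validate(temp_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the merge may proceed."""
    # Assumed layout: {"chunk0001.md": "sha256:<hex>", ...}
    manifest = json.loads((temp_dir / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for name, recorded in manifest.items():
        src = temp_dir / name
        out = temp_dir / f"output_{name}"
        if not src.is_file():
            problems.append(f"missing source chunk: {name}")
            continue
        # Re-hash the source chunk and compare with the recorded value
        actual = "sha256:" + hashlib.sha256(src.read_bytes()).hexdigest()
        if actual != recorded:
            problems.append(f"hash mismatch: {name}")
        # Every source chunk needs exactly one non-empty output file
        if not out.is_file():
            problems.append(f"missing output: {out.name}")
        elif out.stat().st_size == 0:
            problems.append(f"empty output: {out.name}")
    return problems
```

Collecting every problem instead of stopping at the first one makes it easier to see in a single pass which chunks need to be regenerated or retried.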

Real-World Example: Translate a Technical Book

```bash
# 1. Install the skill
npx skills add deusyu/translate-book -a claude-code -g

# 2. Open Claude Code in your working directory
cd ~/books

# 3. Say in Claude Code:
# "translate clean-code.pdf to Chinese"

# Claude Code will:
# - Run convert.py to split into chunks
# - Launch 8 parallel subagents per batch
# - Each subagent translates one chunk
# - Validate all outputs via manifest
# - Merge and build all formats

# 4. Outputs appear in:
ls clean-code_temp/
# chunk0001.md  chunk0002.md  ...  (source)
# output_chunk0001.md  ...         (translated)
# manifest.json
# output.md
# book.html
# book.docx
# book.epub
# book.pdf
```

Resuming an Interrupted Translation

If translation is interrupted, just re-run the same command in Claude Code:

"translate clean-code.pdf to Chinese"

The skill detects existing `output_chunk*.md` files and skips already-translated chunks automatically. Only missing or failed chunks are retried.

Changing Output Metadata After Translation

If you need to update the title, author, template, or image assets without re-translating:

```bash
# Delete only the final artifacts (keeps translated chunks)
cd book_name_temp/
rm -f output.md book*.html book.docx book.epub book.pdf

# Re-run merge step
python3 ../scripts/merge_and_build.py \
  --temp-dir . \
  --title "《New Title》"
```

**Do NOT delete chunk files.** Those are your translated content. Only delete final artifacts when changing metadata.

Troubleshooting

| Problem | Solution |
|---------|----------|
| `Calibre ebook-convert not found` | Install Calibre; ensure `ebook-convert` is in `$PATH` |
| `Manifest validation failed` | Source chunks changed; re-run `convert.py` |
| `Missing source chunk` | Source file deleted; re-run `convert.py` to regenerate |
| Incomplete translation | Re-run the skill; it resumes from the last valid chunk |
| Changed title/template but output unchanged | Delete `output.md`, `book*.html`, `book.docx`, `book.epub`, `book.pdf`, then re-run `merge_and_build.py` |
| `output.md exists but manifest invalid` | Script auto-deletes stale output and re-merges |
| PDF generation fails | Verify Calibre has PDF output support; try `ebook-convert --help` |
| Empty output chunks | Retry failed chunks; check API rate limits |

Diagnosing Chunk Issues

```bash
# Check which chunks are missing translation
ls book_temp/chunk*.md | wc -l          # total source chunks
ls book_temp/output_chunk*.md | wc -l   # translated chunks so far

# Find missing (or empty) output chunks
for f in book_temp/chunk*.md; do
  base=$(basename "$f" .md)
  out="book_temp/output_${base}.md"
  if [ ! -f "$out" ] || [ ! -s "$out" ]; then
    echo "Missing: $out"
  fi
done

# Check manifest
cat book_temp/manifest.json | python3 -m json.tool | head -30
```

Configuration Tips

  • Chunk size: ~6000 chars per chunk is the default. Smaller chunks allow more parallelism but require more API calls.
  • Concurrency: Default is 8 parallel subagents per batch. Adjust in `SKILL.md` if hitting rate limits.
  • Languages: Add new language codes to the skill triggers and translation prompt in `SKILL.md`.
  • Templates: Customize `scripts/template.html` and `scripts/template_ebook.html` for different HTML/ebook styling.
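
To get a feel for the chunk-size and concurrency trade-off, a back-of-the-envelope estimate helps. The function name `estimate_workload` is hypothetical, and the sketch assumes exact-size chunks and one translation call per chunk. Real chunk counts will differ somewhat because chunks are only approximately 6000 characters.

```python
import math


def estimate_workload(total_chars: int, chunk_size: int = 6000, concurrency: int = 8) -> tuple[int, int]:
    """Return (chunks, batches) for a book of total_chars characters."""
    chunks = math.ceil(total_chars / chunk_size)
    batches = math.ceil(chunks / concurrency)
    return chunks, batches


print(estimate_workload(600_000))                   # (100, 13)
print(estimate_workload(600_000, chunk_size=3000))  # (200, 25)
```

Halving the chunk size doubles the number of chunks, and with concurrency fixed at 8 it roughly doubles the number of batches as well.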

Key Design Principles


  1. Isolated context per chunk — each subagent starts fresh, preventing context overflow on long books
  2. Hash-based integrity — SHA-256 tracking catches stale or corrupt translated chunks before merging
  3. Resumable at chunk granularity — never re-translate what's already done
  4. Format-agnostic input — Calibre handles PDF/DOCX/EPUB normalization before the pipeline begins
  5. Multiple output formats — single pipeline produces HTML, DOCX, EPUB, and PDF simultaneously