translate-book-parallel
Translate Book (Parallel Subagents)
Skill by ara.so — Daily 2026 Skills collection.
A Claude Code skill that translates entire books (PDF/DOCX/EPUB) into any language using parallel subagents. Each chunk gets an isolated context window — preventing truncation and context accumulation that plague single-session translation.
Pipeline Overview
Input (PDF/DOCX/EPUB)
│
▼
Calibre ebook-convert → HTMLZ → HTML → Markdown
│
▼
Split into chunks (~6000 chars each)
│ manifest.json tracks SHA-256 hashes
▼
Parallel subagents (8 concurrent by default)
│ each: read chunk → translate → write output_chunk*.md
▼
Validate (manifest hash check, 1:1 source↔output match)
│
▼
Merge → Pandoc → HTML (with TOC) → Calibre → DOCX / EPUB / PDF
Prerequisites
```bash
# 1. Calibre (provides ebook-convert)
# macOS
brew install --cask calibre
# Linux
sudo apt-get install calibre
# Or download from https://calibre-ebook.com/

# 2. Pandoc
brew install pandoc            # macOS
sudo apt-get install pandoc    # Linux

# 3. Python dependencies
pip install pypandoc beautifulsoup4
```
Verify all tools are available:
```bash
ebook-convert --version
pandoc --version
python3 -c "import pypandoc; print('pypandoc ok')"
```
Installation
Option A: npx (recommended)
```bash
npx skills add deusyu/translate-book -a claude-code -g
```
Option B: ClawHub
```bash
clawhub install translate-book
```
Option C: Git clone
```bash
git clone https://github.com/deusyu/translate-book.git ~/.claude/skills/translate-book
```
Usage in Claude Code
Once the skill is installed, use natural language inside Claude Code:
```
translate /path/to/book.pdf to Chinese
translate ~/Downloads/mybook.epub to Japanese
/translate-book translate /path/to/book.docx to French
```
The skill orchestrates the full pipeline automatically.
Supported Languages
| Code | Language |
|---|---|
| `zh` | Chinese |
| | English |
| `ja` | Japanese |
| | Korean |
| `fr` | French |
| | German |
| | Spanish |
Language codes are extensible: add new ones in the skill definition.
Running Pipeline Steps Manually
Step 1: Convert to Markdown Chunks
```bash
python3 scripts/convert.py /path/to/book.pdf --olang zh
```
This produces inside `{book_name}_temp/`:
- `chunk0001.md`, `chunk0002.md`, ... (source chunks, ~6000 chars each)
- `manifest.json` (SHA-256 hashes for validation)
```bash
# For EPUB input
python3 scripts/convert.py /path/to/book.epub --olang ja
# For DOCX input
python3 scripts/convert.py /path/to/book.docx --olang fr
```
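As a rough illustration of what Step 1 leaves on disk, the sketch below splits Markdown text into chunks of about 6000 characters and records a SHA-256 hash for each chunk in `manifest.json`. It is not the code in `scripts/convert.py`. The function names `split_into_chunks` and `write_chunks`, the rule of splitting on paragraph boundaries, and the flat name-to-hash manifest layout with a `sha256:` prefix are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def split_into_chunks(text: str, target_chars: int = 6000) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly target_chars characters."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        # Close the current chunk before it would exceed the target size.
        if current and size + len(para) > target_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2  # account for the paragraph separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def write_chunks(text: str, temp_dir: Path, target_chars: int = 6000) -> dict[str, str]:
    """Write chunkNNNN.md files plus manifest.json and return the manifest mapping."""
    temp_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for i, chunk in enumerate(split_into_chunks(text, target_chars), start=1):
        name = f"chunk{i:04d}.md"
        data = chunk.encode("utf-8")
        (temp_dir / name).write_bytes(data)
        # Hash the exact bytes written so a later re-hash is comparable.
        manifest[name] = "sha256:" + hashlib.sha256(data).hexdigest()
    (temp_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

A single paragraph longer than `target_chars` becomes its own oversized chunk in this sketch. A real splitter would likely need a fallback for that case.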
Step 2: Translate (Parallel Subagents)
The skill handles this step: it launches 8 concurrent subagents per batch, each translating one chunk independently. Each subagent receives exactly this task:
```
Read chunk0042.md → translate to target language → write output_chunk0042.md
```
**Resumable:** Already-translated chunks (valid `output_chunk*.md` files) are skipped on re-run.
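The fan-out and resume behaviour described in this step can be approximated outside Claude Code with a thread pool. This is only a sketch under stated assumptions: `translate` is a hypothetical callable standing in for a subagent, `pending_chunks` and `translate_pending` are illustrative names, and a pool with 8 workers keeps at most 8 tasks in flight rather than reproducing the skill's exact batching. It also treats any non-empty output file as done, which is weaker than whatever "valid" means in the skill itself.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def pending_chunks(temp_dir: Path) -> list[Path]:
    """Return source chunks whose output file is missing or empty."""
    todo = []
    for src in sorted(temp_dir.glob("chunk*.md")):
        out = temp_dir / f"output_{src.name}"
        if not out.exists() or out.stat().st_size == 0:
            todo.append(src)
    return todo


def translate_pending(temp_dir: Path, translate: Callable[[str], str], workers: int = 8) -> int:
    """Translate every pending chunk with at most `workers` tasks in flight."""
    def run(src: Path) -> None:
        # Each task sees only its own chunk, mirroring the isolated-context idea.
        translated = translate(src.read_text(encoding="utf-8"))
        (temp_dir / f"output_{src.name}").write_text(translated, encoding="utf-8")

    todo = pending_chunks(temp_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(run, todo))
    return len(todo)
```

Calling `translate_pending` a second time after an interruption only processes the chunks that still lack output, which is the chunk-level resume property this step relies on.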
Step 3: Merge and Build All Formats
```bash
python3 scripts/merge_and_build.py \
  --temp-dir book_name_temp \
  --title "《Book Title in Target Language》"
```
Before merging, validation checks:
- Every source chunk has a matching output file (1:1)
- Source chunk hashes match `manifest.json` (no stale outputs)
- No output files are empty
Outputs produced:
| File | Description |
|---|---|
| `output.md` | Merged translated Markdown |
| `book.html` | Web version with floating TOC |
| `book.docx` | Word document |
| `book.epub` | E-book format |
| `book.pdf` | Print-ready PDF |
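The merge half of this step amounts to concatenating the translated chunks in order. The sketch below shows one way that could look. It is not taken from `scripts/merge_and_build.py`: the function name `merge_outputs` and the blank-line separator between chunks are assumptions, and the Pandoc and Calibre conversions that follow are left out.

```python
from pathlib import Path


def merge_outputs(temp_dir: Path) -> Path:
    """Concatenate output_chunk*.md files in order into output.md."""
    # Zero-padded names (output_chunk0001.md, ...) sort correctly as plain strings.
    parts = sorted(temp_dir.glob("output_chunk*.md"))
    merged = "\n\n".join(p.read_text(encoding="utf-8").strip("\n") for p in parts)
    target = temp_dir / "output.md"
    target.write_text(merged + "\n", encoding="utf-8")
    return target
```

Sorting by file name is safe here only because the chunk numbers are zero-padded to a fixed width.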
Project Structure
translate-book/
├── SKILL.md # Claude Code skill definition (orchestrator)
├── scripts/
│ ├── convert.py # PDF/DOCX/EPUB → Markdown chunks via Calibre HTMLZ
│ ├── manifest.py # SHA-256 chunk tracking and merge validation
│ ├── merge_and_build.py # Merge chunks → HTML → DOCX/EPUB/PDF
│ ├── calibre_html_publish.py # Calibre wrapper for format conversion
│ ├── template.html # Web HTML template with floating TOC
│ └── template_ebook.html # Ebook HTML template
└── README.md
How Manifest Validation Works
```python
# scripts/manifest.py (conceptual usage)

# During convert.py: records source hashes
manifest = {
    "chunk0001.md": "sha256:abc123...",
    "chunk0002.md": "sha256:def456...",
    # ...
}

# During merge_and_build.py: validates before merging
# 1. Check every chunk has a corresponding output_chunk
# 2. Re-hash source chunks and compare against manifest
# 3. Reject if any hash mismatches (stale/corrupt output)
# 4. Reject if any output file is empty
```
If validation fails, the script auto-deletes stale `output.md` and re-merges from valid chunk outputs.
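The four checks can be written out as a small function. This is a minimal sketch, assuming `manifest.json` is a flat mapping from chunk file name to a `sha256:`-prefixed hex digest as in the conceptual example. The real `scripts/manifest.py` may store more fields or behave differently, and `validate` is an illustrative name rather than its API.

```python
import hashlib
import json
from pathlib import Path


def validate(temp_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the merge may proceed."""
    manifest = json.loads((temp_dir / "manifest.json").read_text(encoding="utf-8"))
    problems: list[str] = []
    for name, recorded in manifest.items():
        src = temp_dir / name
        out = temp_dir / f"output_{name}"
        if not src.exists():
            problems.append(f"{name}: source chunk missing")
            continue
        # Re-hash the source chunk and compare against the recorded value.
        actual = "sha256:" + hashlib.sha256(src.read_bytes()).hexdigest()
        if actual != recorded:
            problems.append(f"{name}: hash mismatch (stale output)")
        if not out.exists():
            problems.append(f"{name}: no output file")
        elif out.stat().st_size == 0:
            problems.append(f"{name}: output file is empty")
    return problems
```

Returning every problem instead of stopping at the first one makes it easier to see which chunks need to be retried.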
Real-World Example: Translate a Technical Book
```bash
# 1. Install the skill
npx skills add deusyu/translate-book -a claude-code -g

# 2. Open Claude Code in your working directory
cd ~/books

# 3. Say in Claude Code:
#    "translate clean-code.pdf to Chinese"

# Claude Code will:
# - Run convert.py to split into chunks
# - Launch 8 parallel subagents per batch
# - Each subagent translates one chunk
# - Validate all outputs via manifest
# - Merge and build all formats

# 4. Outputs appear in:
ls clean-code_temp/
#   chunk0001.md  chunk0002.md  ...  (source)
#   output_chunk0001.md  ...         (translated)
#   manifest.json
#   output.md
#   book.html
#   book.docx
#   book.epub
#   book.pdf
```
Resuming an Interrupted Translation
If translation is interrupted, just re-run the same command in Claude Code:
```
translate clean-code.pdf to Chinese
```
The skill detects existing `output_chunk*.md` files and skips already-translated chunks automatically. Only missing or failed chunks are retried.
Changing Output Metadata After Translation
If you need to update the title, author, template, or image assets without re-translating:
```bash
# Delete only the final artifacts (keeps translated chunks)
cd book_name_temp/
rm -f output.md book*.html book.docx book.epub book.pdf

# Re-run merge step
python3 ../scripts/merge_and_build.py \
  --temp-dir . \
  --title "《New Title》"
```
**Do NOT delete chunk files**: those are your translated content. Only delete final artifacts when changing metadata.
Troubleshooting
| Problem | Solution |
|---|---|
| | Install Calibre; ensure |
| | Source chunks changed; re-run |
| | Source file deleted; re-run |
| Incomplete translation | Re-run the skill; it resumes from the last valid chunk |
| Changed title/template but output unchanged | Delete |
| | Script auto-deletes stale output and re-merges |
| PDF generation fails | Verify Calibre has PDF output support; try |
| Empty output chunks | Retry failed chunks; check API rate limits |
Diagnosing Chunk Issues
```bash
# Check which chunks are missing translation
ls book_temp/chunk*.md | wc -l          # total source chunks
ls book_temp/output_chunk*.md | wc -l   # translated chunks so far

# Find missing or empty output chunks
for f in book_temp/chunk*.md; do
  base=$(basename "$f" .md)
  out="book_temp/output_${base}.md"
  if [ ! -f "$out" ] || [ ! -s "$out" ]; then
    echo "Missing: $out"
  fi
done

# Check manifest
python3 -m json.tool book_temp/manifest.json | head -30
```
Configuration Tips
- Chunk size: ~6000 chars per chunk is the default. Smaller chunks = more parallelism but more API calls.
- Concurrency: Default is 8 parallel subagents per batch. Adjust in `SKILL.md` if hitting rate limits.
- Languages: Add new language codes to the skill triggers and translation prompt in `SKILL.md`.
- Templates: Customize `scripts/template.html` and `scripts/template_ebook.html` for different HTML/ebook styling.
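To make the chunk-size and concurrency trade-off concrete, the following back-of-the-envelope helper estimates how many chunks and batches a book needs. The function name `estimate_workload` and the 600,000-character book length in the usage note are illustrative assumptions. The defaults of 6000 characters and 8 subagents come from the tips above.

```python
import math


def estimate_workload(total_chars: int, chunk_chars: int = 6000, concurrency: int = 8) -> tuple[int, int]:
    """Return (number of chunks, number of batches) for a book of total_chars."""
    chunks = math.ceil(total_chars / chunk_chars)
    batches = math.ceil(chunks / concurrency)
    return chunks, batches
```

For a hypothetical 600,000-character book, `estimate_workload(600_000)` gives 100 chunks in 13 batches. Halving the chunk size to 3000 characters doubles that to 200 chunks in 25 batches, so more, smaller chunks mean more subagent calls overall.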
Key Design Principles
- Isolated context per chunk — each subagent starts fresh, preventing context overflow on long books
- Hash-based integrity — SHA-256 tracking catches stale or corrupt translated chunks before merging
- Resumable at chunk granularity — never re-translate what's already done
- Format-agnostic input — Calibre handles PDF/DOCX/EPUB normalization before the pipeline begins
- Multiple output formats — single pipeline produces HTML, DOCX, EPUB, and PDF simultaneously