translate-book-parallel

Translate Book (Parallel Subagents)

Skill by ara.so — Daily 2026 Skills collection.

A Claude Code skill that translates entire books (PDF/DOCX/EPUB) into any language using parallel subagents. Each chunk gets an isolated context window — preventing truncation and context accumulation that plague single-session translation.

Pipeline Overview

Input (PDF/DOCX/EPUB)
  │
  ▼
Calibre ebook-convert → HTMLZ → HTML → Markdown
  │
  ▼
Split into chunks (~6000 chars each)
  │  manifest.json tracks SHA-256 hashes
  ▼
Parallel subagents (8 concurrent by default)
  │  each: read chunk → translate → write output_chunk*.md
  ▼
Validate (manifest hash check, 1:1 source↔output match)
  │
  ▼
Merge → Pandoc → HTML (with TOC) → Calibre → DOCX / EPUB / PDF
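
The splitting and hashing stage can be sketched as follows. This is a minimal illustration, not the actual `convert.py` logic: it assumes chunks are packed greedily on paragraph boundaries and that the manifest maps file names to `sha256:`-prefixed digests. The function names `split_into_chunks` and `write_chunks` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def split_into_chunks(text: str, target_chars: int = 6000) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly target_chars characters."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        # Start a new chunk once adding this paragraph would exceed the target.
        if current and size + len(para) > target_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2  # account for the paragraph separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def write_chunks(text: str, temp_dir: Path, target_chars: int = 6000) -> dict[str, str]:
    """Write chunkNNNN.md files plus a manifest.json of SHA-256 hashes (assumed layout)."""
    temp_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for i, chunk in enumerate(split_into_chunks(text, target_chars), start=1):
        name = f"chunk{i:04d}.md"
        data = chunk.encode("utf-8")
        (temp_dir / name).write_bytes(data)
        manifest[name] = "sha256:" + hashlib.sha256(data).hexdigest()
    (temp_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Splitting on paragraph boundaries keeps sentences intact, so a single oversized paragraph can push a chunk past the target size.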

Prerequisites

```bash
# 1. Calibre (provides ebook-convert)
# macOS
brew install --cask calibre
# Linux
sudo apt-get install calibre
# Or download from https://calibre-ebook.com/

# 2. Pandoc
brew install pandoc          # macOS
sudo apt-get install pandoc  # Linux

# 3. Python dependencies
pip install pypandoc beautifulsoup4
```

Verify all tools are available:

```bash
ebook-convert --version
pandoc --version
python3 -c "import pypandoc; print('pypandoc ok')"
```
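
The same availability check can be scripted in Python. This is a minimal sketch, assuming the two command-line tools are expected on `PATH` and that `beautifulsoup4` is imported under its usual module name `bs4`. The helper `missing_prerequisites` is illustrative and not part of the skill.

```python
import importlib.util
import shutil

REQUIRED_TOOLS = ["ebook-convert", "pandoc"]
REQUIRED_MODULES = ["pypandoc", "bs4"]


def missing_prerequisites(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES) -> list[str]:
    """Return the names of CLI tools and Python modules that cannot be found."""
    missing = [t for t in tools if shutil.which(t) is None]
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("all prerequisites found" if not problems else f"missing: {problems}")
```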

Installation

Option A: npx (recommended)

```bash
npx skills add deusyu/translate-book -a claude-code -g
```

Option B: ClawHub

```bash
clawhub install translate-book
```

Option C: Git clone

```bash
git clone https://github.com/deusyu/translate-book.git ~/.claude/skills/translate-book
```

Usage in Claude Code

Once the skill is installed, use natural language inside Claude Code:

```
translate /path/to/book.pdf to Chinese

translate ~/Downloads/mybook.epub to Japanese

/translate-book translate /path/to/book.docx to French
```

The skill orchestrates the full pipeline automatically.

Supported Languages

| Code | Language |
|------|----------|
| `zh` | Chinese |
| `en` | English |
| `ja` | Japanese |
| `ko` | Korean |
| `fr` | French |
| `de` | German |
| `es` | Spanish |

Language codes are extensible — add new ones in the skill definition.

Running Pipeline Steps Manually

Step 1: Convert to Markdown Chunks

```bash
python3 scripts/convert.py /path/to/book.pdf --olang zh
```

This produces, inside `{book_name}_temp/`:

`chunk0001.md`, `chunk0002.md`, ... (source chunks, ~6000 chars each)
`manifest.json` (SHA-256 hashes for validation)

```bash
# For EPUB input
python3 scripts/convert.py /path/to/book.epub --olang ja

# For DOCX input
python3 scripts/convert.py /path/to/book.docx --olang fr
```

Step 2: Translate (Parallel Subagents)

The skill handles this step: it launches 8 concurrent subagents per batch, each translating one chunk independently. Each subagent receives exactly this task:

```
Read chunk0042.md → translate to target language → write output_chunk0042.md
```


**Resumable:** Already-translated chunks (valid `output_chunk*.md` files) are skipped on re-run.
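
The batch-and-resume behavior can be approximated outside Claude Code with a thread pool. This is only a sketch: the real skill dispatches Claude Code subagents, not threads, and the `translate` callable here is a hypothetical stand-in for whatever performs the translation. It also assumes that a "valid" output simply means a non-empty `output_chunk*.md` file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def pending_chunks(temp_dir: Path) -> list[Path]:
    """Source chunks whose output file is missing or empty."""
    pending = []
    for src in sorted(temp_dir.glob("chunk*.md")):
        out = temp_dir / f"output_{src.name}"
        if not out.exists() or out.stat().st_size == 0:
            pending.append(src)
    return pending


def translate_all(temp_dir: Path, translate: Callable[[str], str], workers: int = 8) -> int:
    """Translate pending chunks concurrently; return how many were processed."""
    todo = pending_chunks(temp_dir)

    def work(src: Path) -> None:
        # Each worker sees only its own chunk, mirroring the isolated-context idea.
        text = translate(src.read_text(encoding="utf-8"))
        (temp_dir / f"output_{src.name}").write_text(text, encoding="utf-8")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, todo))  # list() surfaces any worker exception
    return len(todo)
```

Calling `translate_all` a second time on the same directory processes nothing, because every chunk already has a non-empty output.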

Step 3: Merge and Build All Formats

```bash
python3 scripts/merge_and_build.py \
  --temp-dir book_name_temp \
  --title "《Book Title in Target Language》"
```

Before merging, validation checks:

Every source chunk has a matching output file (1:1)
Source chunk hashes match `manifest.json` (no stale outputs)
No output files are empty

Outputs produced:

| File | Description |
|------|-------------|
| `output.md` | Merged translated Markdown |
| `book.html` | Web version with floating TOC |
| `book.docx` | Word document |
| `book.epub` | E-book format |
| `book.pdf` | Print-ready PDF |
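
The merge itself amounts to concatenating the translated chunks in order. Here is a rough sketch, assuming zero-padded chunk numbers (so a plain sort gives the right order) and a blank line between chunks. The helper `merge_outputs` is hypothetical, and the Pandoc and Calibre conversion steps are left out.

```python
from pathlib import Path


def merge_outputs(temp_dir: Path) -> str:
    """Join output_chunk*.md files in name order and write output.md."""
    parts = [
        p.read_text(encoding="utf-8").strip("\n")
        for p in sorted(temp_dir.glob("output_chunk*.md"))
    ]
    merged = "\n\n".join(parts) + "\n"
    (temp_dir / "output.md").write_text(merged, encoding="utf-8")
    return merged
```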

Project Structure

translate-book/
├── SKILL.md                    # Claude Code skill definition (orchestrator)
├── scripts/
│   ├── convert.py              # PDF/DOCX/EPUB → Markdown chunks via Calibre HTMLZ
│   ├── manifest.py             # SHA-256 chunk tracking and merge validation
│   ├── merge_and_build.py      # Merge chunks → HTML → DOCX/EPUB/PDF
│   ├── calibre_html_publish.py # Calibre wrapper for format conversion
│   ├── template.html           # Web HTML template with floating TOC
│   └── template_ebook.html     # Ebook HTML template
└── README.md

How Manifest Validation Works

```python
# scripts/manifest.py (conceptual usage)

# During convert.py: record source hashes
manifest = {
    "chunk0001.md": "sha256:abc123...",
    "chunk0002.md": "sha256:def456...",
    # ...
}

# During merge_and_build.py: validate before merging
# 1. Check every chunk has a corresponding output_chunk
# 2. Re-hash source chunks and compare against manifest
# 3. Reject if any hash mismatches (stale/corrupt output)
# 4. Reject if any output file is empty
```


If validation fails, the script auto-deletes stale `output.md` and re-merges from valid chunk outputs.
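
The four checks can be written out as a small function. This is a sketch under stated assumptions, not the contents of `scripts/manifest.py`: it assumes `manifest.json` maps chunk file names to `sha256:`-prefixed hex digests, as in the conceptual snippet, and the function name `validate` is illustrative.

```python
import hashlib
import json
from pathlib import Path


def validate(temp_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the merge may proceed."""
    manifest = json.loads((temp_dir / "manifest.json").read_text(encoding="utf-8"))
    problems: list[str] = []
    for name, expected in manifest.items():
        src = temp_dir / name
        out = temp_dir / f"output_{name}"
        if not src.exists():
            problems.append(f"missing source chunk: {name}")
            continue
        # Re-hash the source chunk and compare with the recorded digest.
        actual = "sha256:" + hashlib.sha256(src.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"hash mismatch: {name}")
        if not out.exists():
            problems.append(f"missing output: {out.name}")
        elif out.stat().st_size == 0:
            problems.append(f"empty output: {out.name}")
    return problems
```

Returning every problem at once, instead of stopping at the first, makes it easier to see which chunks need to be retried.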


Real-World Example: Translate a Technical Book

```bash
# 1. Install the skill
npx skills add deusyu/translate-book -a claude-code -g

# 2. Open Claude Code in your working directory
cd ~/books

# 3. Say in Claude Code:
# "translate clean-code.pdf to Chinese"

# Claude Code will:
# - Run convert.py to split into chunks
# - Launch 8 parallel subagents per batch
# - Each subagent translates one chunk
# - Validate all outputs via manifest
# - Merge and build all formats

# 4. Outputs appear in:
ls clean-code_temp/
# chunk0001.md chunk0002.md ... (source)
# output_chunk0001.md ... (translated)
# manifest.json
# output.md
# book.html
# book.docx
# book.epub
# book.pdf
```

Resuming an Interrupted Translation

If translation is interrupted, just re-run the same command:

```
translate clean-code.pdf to Chinese
```

The skill detects existing `output_chunk*.md` files and skips already-translated chunks automatically. Only missing or failed chunks are retried.

Changing Output Metadata After Translation

If you need to update the title, author, template, or image assets without re-translating:

```bash
# Delete only the final artifacts (keeps translated chunks)
cd book_name_temp/
rm -f output.md book*.html book.docx book.epub book.pdf

# Re-run merge step
python3 ../scripts/merge_and_build.py \
  --temp-dir . \
  --title "《New Title》"
```


**Do NOT delete chunk files** — those are your translated content. Only delete final artifacts when changing metadata.

Troubleshooting

| Problem | Solution |
|---------|----------|
| `Calibre ebook-convert not found` | Install Calibre; ensure `ebook-convert` is in `$PATH` |
| `Manifest validation failed` | Source chunks changed; re-run `convert.py` |
| `Missing source chunk` | Source file deleted; re-run `convert.py` to regenerate |
| Incomplete translation | Re-run the skill; it resumes from the last valid chunk |
| Changed title/template but output unchanged | Delete `output.md`, `book*.html`, `book.docx`, `book.epub`, `book.pdf`, then re-run `merge_and_build.py` |
| `output.md exists but manifest invalid` | Script auto-deletes stale output and re-merges |
| PDF generation fails | Verify Calibre has PDF output support; try `ebook-convert --help` |
| Empty output chunks | Retry failed chunks; check API rate limits |

Diagnosing Chunk Issues

```bash
# Check which chunks are missing translation
ls book_temp/chunk*.md | wc -l          # total source chunks
ls book_temp/output_chunk*.md | wc -l   # translated chunks so far

# Find missing output chunks
for f in book_temp/chunk*.md; do
  base=$(basename "$f" .md)
  out="book_temp/output_${base}.md"
  if [ ! -f "$out" ] || [ ! -s "$out" ]; then
    echo "Missing: $out"
  fi
done

# Check manifest
cat book_temp/manifest.json | python3 -m json.tool | head -30
```

Configuration Tips

Chunk size: ~6000 chars per chunk is the default. Smaller chunks mean more parallelism but more API calls.
Concurrency: Default is 8 parallel subagents per batch. Adjust in `SKILL.md` if hitting rate limits.
Languages: Add new language codes to the skill triggers and translation prompt in `SKILL.md`.
Templates: Customize `scripts/template.html` and `scripts/template_ebook.html` for different HTML/ebook styling.
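
To get a feel for the chunk-size and concurrency trade-off, a back-of-the-envelope estimate helps. The sketch below assumes chunks are exactly `chunk_chars` long and that batches run one after another. The 600,000-character book and the helper `estimate_batches` are illustrative only.

```python
import math


def estimate_batches(total_chars: int, chunk_chars: int = 6000, concurrency: int = 8) -> tuple[int, int]:
    """Return (number of chunks, number of sequential batches)."""
    chunks = math.ceil(total_chars / chunk_chars)
    batches = math.ceil(chunks / concurrency)
    return chunks, batches


print(estimate_batches(600_000))        # (100, 13) with the defaults
print(estimate_batches(600_000, 3000))  # (200, 25) with half-size chunks
```

Halving the chunk size doubles the number of subagent calls, so a lower concurrency setting may be needed if rate limits are hit.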

Key Design Principles

Isolated context per chunk — each subagent starts fresh, preventing context overflow on long books
Hash-based integrity — SHA-256 tracking catches stale or corrupt translated chunks before merging
Resumable at chunk granularity — never re-translate what's already done
Format-agnostic input — Calibre handles PDF/DOCX/EPUB normalization before the pipeline begins
Multiple output formats — single pipeline produces HTML, DOCX, EPUB, and PDF simultaneously
