book-converter

Book Converter Skill

Convert EPUB books into professionally formatted Markdown books with AI-assisted quality improvements.

Overview

This skill converts EPUB files into high-quality Markdown documents by:

1. Using pandoc to extract raw Markdown from the EPUB
2. Creating a structured project directory
3. Planning and executing AI-driven formatting fixes
4. Producing chapter-by-chapter formatted output
5. Generating a merged book file with a Table of Contents

Quick Start

User provides an EPUB file path:

/Users/username/Downloads/Book.Name.2024.epub

Execute the conversion workflow:

```bash
python3 scripts/convert_book.py "/path/to/book.epub"
```

This initiates the complete conversion process.

Workflow

CRITICAL: Use subagents for all formatting work to avoid polluting main context.

Phase 1: Setup and Extraction (Main Agent)

Run the conversion script:

```bash
python3 scripts/convert_book.py "/path/to/book.epub"
```

This script:

1. Verifies that the EPUB file exists
2. Creates the project structure:
   - `books/book-name/` - Main directory
   - `books/book-name/raw/` - Pandoc output
   - `books/book-name/chapters/` - Formatted chapters
   - `books/book-name/images/` - Extracted images
3. Runs pandoc to extract Markdown
4. Copies formatting standards to the project directory

Output: Raw Markdown in `books/book-name/raw/book-parsed.md`
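
The script itself is not reproduced here. As a rough sketch of what Phase 1 involves, the following assumes a hypothetical `slugify` helper for deriving `book-name` from the EPUB filename and a plain `pandoc` invocation. The real `convert_book.py` may name directories and call pandoc differently.

```python
import re
import subprocess
from pathlib import Path


def slugify(name: str) -> str:
    """Hypothetical helper: 'Book.Name.2024' -> 'book-name-2024'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def create_project(epub_path: str, books_root: str = "books") -> Path:
    """Create the raw/, chapters/, and images/ directories for one book."""
    epub = Path(epub_path)
    if not epub.is_file():
        raise FileNotFoundError(f"EPUB not found: {epub}")
    project = Path(books_root) / slugify(epub.stem)
    for sub in ("raw", "chapters", "images"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


def pandoc_command(epub_path: str, project: Path) -> list[str]:
    """Build a pandoc call that writes Markdown and extracts embedded media."""
    return [
        "pandoc", str(epub_path),
        "--to", "markdown",
        "--wrap=none",  # keep each paragraph on one line
        f"--extract-media={project / 'images'}",
        "--output", str(project / "raw" / "book-parsed.md"),
    ]


def extract(epub_path: str) -> Path:
    """Set up the project, run pandoc, and return the raw Markdown path."""
    project = create_project(epub_path)
    subprocess.run(pandoc_command(epub_path, project), check=True)
    return project / "raw" / "book-parsed.md"
```

Building the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems with EPUB paths that contain spaces.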

Phase 2: Analysis and Planning (Script + Subagent)

Step 1: Run the structure analysis script (Main Agent):

```bash
python3 books/book-name/analyze_structure.py books/book-name
```

This script:

1. Extracts all headers with line numbers
2. Detects formatting issues by sampling
3. Suggests chapter boundaries
4. Creates a `STRUCTURE_ANALYSIS.md` report (about 5-10 KB), so nobody has to read the full 35k+ lines of raw Markdown
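
To illustrate the header-extraction step, here is a minimal sketch. It is not the actual `analyze_structure.py`; the `find_headers` name and the choice to report bold-only lines as level 0 are assumptions made for this example.

```python
import re

ATX_HEADER = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
BOLD_ONLY = re.compile(r"^\*\*(.+?)\*\*\s*$")


def find_headers(lines):
    """Return (line_number, level, title) for each header-like line.

    Level 0 marks a bold-only line, which is a probable header that
    was not converted to # syntax.
    """
    found = []
    in_fence = False
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' inside code fences is not a header
            continue
        if in_fence:
            continue
        atx = ATX_HEADER.match(line)
        if atx:
            found.append((number, len(atx.group(1)), atx.group(2)))
            continue
        bold = BOLD_ONLY.match(line)
        if bold:
            found.append((number, 0, bold.group(1)))
    return found
```

Recording line numbers alongside each header is what lets the later planning step work from a short report instead of the raw file.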

Step 2: Launch a general subagent to create mapping files:

```python
Task(
  subagent_type="general",
  description="Create chapter map and formatting plan",
  prompt="""Create CHAPTER_MAP.md and FORMATTING_PLAN.md:

1. Read books/book-name/STRUCTURE_ANALYSIS.md (concise report with headers and issues)
2. Read books/book-name/references/chapter-map-template.md for format
3. Read books/book-name/references/formatting-plan-template.md for format
4. Create books/book-name/CHAPTER_MAP.md:
   - Use suggested chapter boundaries from analysis
   - Verify line ranges make sense
   - Create proper slugged filenames
5. Create books/book-name/FORMATTING_PLAN.md:
   - Document issues found in analysis
   - Add severity and priority
   - Note book-specific patterns
6. Update books/book-name/progress.md to mark Phase 2 complete

Return: Summary of chapters found and major issues identified."""
)
```

Output: `CHAPTER_MAP.md`, `FORMATTING_PLAN.md`, and updated `progress.md`
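
The chapter map pairs each chapter with a line range and a slugged filename. The sketch below shows one way such ranges could be derived from chapter start lines. The function names are hypothetical, and the actual layout of `CHAPTER_MAP.md` is defined by `chapter-map-template.md`, which is not shown here.

```python
import re


def slug(title: str) -> str:
    """Lowercase the title and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def chapter_ranges(chapter_starts, total_lines):
    """chapter_starts: list of (start_line, title), sorted by line number.

    Each chapter runs from its own start line to the line before the
    next chapter starts; the last chapter runs to the end of the file.
    """
    rows = []
    for index, (start, title) in enumerate(chapter_starts):
        if index + 1 < len(chapter_starts):
            end = chapter_starts[index + 1][0] - 1
        else:
            end = total_lines
        filename = f"chapter-{index + 1:02d}-{slug(title)}.md"
        rows.append((filename, start, end))
    return rows


def extract_chapter(lines, start, end):
    """Slice a 1-indexed, inclusive line range out of the raw file."""
    return lines[start - 1:end]
```

A quick sanity check on the result is that the ranges are contiguous and that the last range ends at the final line of `raw/book-parsed.md`.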

Phase 3: Chapter Formatting (Use Subagents)

For EACH chapter, launch a separate general subagent:

```python
# Example for Chapter 1
Task(
  subagent_type="general",
  description="Format Chapter 1",
  prompt="""Format Chapter 1 following the chapter formatting workflow.

Critical Instructions:

1. Read and follow ALL steps in books/book-name/references/chapter-workflow.md
2. Apply formatting rules from books/book-name/references/formatting-standards.md
3. Use books/book-name/CHAPTER_MAP.md to find line ranges for Chapter 1
4. Read books/book-name/FORMATTING_PLAN.md for known issues to watch for

Workflow Summary (see chapter-workflow.md for complete details):

Step 1: Read Standards and Chapter Map
- Read references/formatting-standards.md
- Read CHAPTER_MAP.md for your chapter's line ranges
- Read FORMATTING_PLAN.md for known issues

Step 2: Extract Chapter Content
- Extract Chapter 1 from raw/book-parsed.md using line ranges

Step 3: Identify Issues (following the standards)
- Headers using bold instead of #
- Shattered code blocks
- Split paragraphs
- Missing code language identifiers
- Emphasis artifacts [word]
- Corrupted footnotes
- Missing image alt text
- Broken links

Step 4: Apply Formatting Fixes
Follow the three-pass approach in chapter-workflow.md:
- First pass: Structure (headers, code blocks)
- Second pass: Content (paragraphs, emphasis)
- Third pass: Details (footnotes, images, links)

Step 5: Create Output File
- Write to books/book-name/chapters/chapter-01-title.md
- Use structure from chapter-workflow.md

Step 6: Update Progress
- Update books/book-name/progress.md with completion status
- Document fixes applied

Quality Checklist (from chapter-workflow.md):
- All headers use proper # syntax
- All code blocks have language identifiers
- No shattered code blocks remain
- Text flows naturally without mid-sentence breaks
- All footnotes have [^N] format with definitions
- Images have descriptive alt text

Return: Confirmation with summary of fixes applied."""
)
```


**Important**: 
- Launch subagents in parallel batches (3-5 at a time) for efficiency
- Each subagent must read chapter-workflow.md and formatting-standards.md
- Follow the systematic workflow to ensure consistent quality

**Output**: Formatted chapters in `books/book-name/chapters/`
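
Grouping chapters into batches of 3-5 can be sketched as follows. The `batches` helper and the batch size of 4 are illustrative assumptions; the launching itself is done with `Task(...)` calls as shown above, not by this code.

```python
def batches(items, size=4):
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


chapters = [f"Chapter {n}" for n in range(1, 16)]  # e.g. a 15-chapter book
for group in batches(chapters, size=4):
    # Launch one Task(...) per chapter in `group`, then wait for all of
    # them to return before starting the next group.
    print(group)
```

Waiting for a whole batch before starting the next keeps `progress.md` updates easy to follow.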

Phase 4: Book Assembly (Main Agent)

The `merge_book.py` script is already copied to your project directory. Simply run it:

```bash
python3 books/book-name/merge_book.py books/book-name
```

The script will:

1. Read `CHAPTER_MAP.md` for chapter order
2. Load all formatted chapters from `chapters/`
3. Extract headers for the Table of Contents
4. Fix image paths (relative to the final location)
5. Combine all chapters in order
6. Generate a comprehensive TOC
7. Write the result to `books/book-name-book.md`

Output: `books/book-name-book.md` with the complete formatted book

Note: The merge script is reusable, so there is no need to create it per book.
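
As an illustration of the TOC step only, the sketch below builds a nested list of links from Markdown headers. It is not the code in `merge_book.py`: the `anchor` function merely approximates GitHub-style heading anchors, and limiting the TOC to header levels 1-3 is an assumption.

```python
import re

HEADER = re.compile(r"^(#{1,3})\s+(.+?)\s*$")


def anchor(title: str) -> str:
    """Approximate a GitHub-style heading anchor (assumption)."""
    kept = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"\s+", "-", kept.strip())


def build_toc(markdown: str) -> str:
    """Return a nested bullet list linking to each level 1-3 header."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '#' lines inside code blocks
            continue
        match = None if in_fence else HEADER.match(line)
        if match:
            indent = "  " * (len(match.group(1)) - 1)
            title = match.group(2)
            entries.append(f"{indent}- [{title}](#{anchor(title)})")
    return "\n".join(entries)
```

Skipping fenced code matters for programming books, where shell comments and Python comments would otherwise show up as spurious TOC entries.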

Critical: Chapter Formatting Requirements

Every subagent in Phase 3 MUST:

1. Read chapter-workflow.md first - it contains the complete step-by-step process
2. Read formatting-standards.md - it contains all formatting rules (678 lines)
3. Follow the workflow systematically - don't skip steps
4. Use the three-pass approach:
   - First pass: Fix structure (headers, code blocks)
   - Second pass: Fix content (paragraphs, emphasis)
   - Third pass: Fix details (footnotes, images, links)
5. Complete the quality checklist - verify all items before finishing

Why this matters:

- Ensures consistent quality across all chapters
- Prevents common mistakes (skipped issues, inconsistent style)
- The process was proven on the Clean Code Collection (35k+ lines)
- Each chapter is formatted only once, so the pass must be thorough

The workflow documents are your complete instructions. Trust them!

Subagent Usage Principles

Never process book content in main context. Always use subagents to:

- Keep main context clean: Book content is large and pollutes context
- Enable parallelization: Format multiple chapters simultaneously
- Isolate formatting work: Each chapter gets fresh context
- Avoid token limits: Raw content can exceed context windows

Subagent Selection: Always use `subagent_type="general"` for all book processing tasks.

Progress Tracking

Create and maintain `books/book-name/progress.md`:

```markdown
Book Name - Conversion Progress

Phase 1: Setup ✓
- EPUB extracted
- Project structure created

Phase 2: Planning ✓
- Chapter map created (15 chapters identified)
- Formatting plan documented

Phase 3: Chapter Formatting (5/15 complete)

Phase 4: Assembly
- Merge script created
- Final book generated
```

Update after each subagent completes.

Quality Standards

All formatted output must meet these criteria:

- Headers: Use proper `#` syntax, not bold text
- Code Blocks: Include language identifiers, merge shattered blocks
- Text Flow: Join split sentences into natural paragraphs
- Emphasis: Use `*italic*` and `**bold**`, not `[brackets]`
- Footnotes: Standard `[^1]` format with definitions
- Images: Descriptive alt text, not generic filenames
- Links: Clean anchors, no PDF conversion artifacts

Complete standards reference: references/formatting-standards.md
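
Some of these criteria can be spot-checked mechanically after a chapter is formatted. The sketch below is a hypothetical checker, not part of the skill. It only reports likely problems and leaves every fix to the subagent's judgment; the `check_chapter` name and the specific heuristics are assumptions.

```python
import re


def check_chapter(markdown: str) -> list[str]:
    """Return human-readable problems found in one formatted chapter."""
    problems = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_fence and stripped == "```":
                problems.append(f"line {number}: code block has no language identifier")
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if re.fullmatch(r"\*\*[^*]+\*\*", stripped):
            problems.append(f"line {number}: bold line may be a header")
        if re.search(r"!\[\s*\]\(", line):
            problems.append(f"line {number}: image has empty alt text")
    if in_fence:
        problems.append("unclosed code fence")

    # Every footnote reference [^N] needs a matching '[^N]:' definition.
    defined = set(re.findall(r"^\[\^([^\]]+)\]:", markdown, flags=re.MULTILINE))
    referenced = set(re.findall(r"\[\^([^\]]+)\](?!:)", markdown))
    for label in sorted(referenced - defined):
        problems.append(f"footnote [^{label}] has no definition")
    return problems
```

An empty result does not prove a chapter is well formatted, since text flow and alt-text quality still need a reader, but a non-empty one is a cheap signal that a checklist item was missed.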

Example Usage

User Request:

"Convert this EPUB to Markdown: /Users/john/Downloads/Effective.Java.3rd.Edition.epub"

Skill Execution:

1. Run the conversion script to extract content
2. Analyze the structure and create the chapter map
3. Format each chapter using AI subagents
4. Merge into the final book with a TOC
5. Provide the user with `books/effective-java-final.md`

Scripts

convert_book.py: Main conversion script (Phase 1) - Extracts EPUB and sets up project
analyze_structure.py: Structure analyzer (Phase 2) - Extracts headers and detects issues efficiently
merge_book.py: Reusable merge script (Phase 4) - Combines all chapters into final book

References

formatting-standards.md: Complete formatting rules (loaded as needed during formatting)
chapter-workflow.md: Detailed chapter formatting workflow (loaded as needed)
progress-template.md: Template for progress tracking file
chapter-map-template.md: Template for chapter mapping
formatting-plan-template.md: Template for formatting issue documentation

Notes

High Quality Focus: AI-driven formatting, applied chapter by chapter, ensures prose flows naturally
No Automated Formatting Scripts: Joining lines requires human-like judgment, so formatting fixes are not scripted
Preserve Content: Never alter meaning or remove content
Code Accuracy: Ensure code blocks are syntactically complete
