book-converter

Book Converter Skill

Convert EPUB books into professionally formatted Markdown books with AI-assisted quality improvements.

Overview

This skill converts EPUB files into high-quality Markdown documents by:
  1. Using pandoc to extract raw Markdown from EPUB
  2. Creating a structured project directory
  3. Planning and executing AI-driven formatting fixes
  4. Producing chapter-by-chapter formatted output
  5. Generating merged book file with Table of Contents

Quick Start

User provides an EPUB file path:
/Users/username/Downloads/Book.Name.2024.epub
Execute the conversion workflow:
bash
python3 scripts/convert_book.py "/path/to/book.epub"
This initiates the complete conversion process.

Workflow

工作流

CRITICAL: Use subagents for all formatting work to avoid polluting main context.

Phase 1: Setup and Extraction (Main Agent)

Run the conversion script:
bash
python3 scripts/convert_book.py "/path/to/book.epub"
This script:
  1. Verifies EPUB file exists
  2. Creates project structure:
    • books/book-name/
      - Main directory
    • books/book-name/raw/
      - Pandoc output
    • books/book-name/chapters/
      - Formatted chapters
    • books/book-name/images/
      - Extracted images
  3. Runs pandoc to extract Markdown
  4. Copies formatting standards to project directory
Output: Raw Markdown in books/book-name/raw/book-parsed.md
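
The setup steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of convert_book.py: the function names, the slug rule, and the exact pandoc options are assumptions. It builds the directory layout and the pandoc command, and leaves running pandoc to a separate call.

```python
import re
import subprocess
from pathlib import Path


def slugify(name: str) -> str:
    """Lowercase a name and collapse runs of non-alphanumerics into hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def setup_project(epub_path: str, books_root: str = "books"):
    """Verify the EPUB, create the project layout, and return (project_dir, pandoc_command)."""
    epub = Path(epub_path)
    if not epub.is_file():
        raise FileNotFoundError(f"EPUB not found: {epub}")

    project = Path(books_root) / slugify(epub.stem)
    for sub in ("raw", "chapters", "images"):
        (project / sub).mkdir(parents=True, exist_ok=True)

    # Hypothetical command line; the real script may pass different options.
    command = [
        "pandoc", str(epub),
        "-t", "markdown",
        f"--extract-media={project / 'images'}",
        "-o", str(project / "raw" / "book-parsed.md"),
    ]
    return project, command


def extract(command) -> None:
    """Run pandoc; raises CalledProcessError if pandoc exits with an error."""
    subprocess.run(command, check=True)
```

Keeping the command construction separate from its execution makes the layout logic easy to check without pandoc installed.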

Phase 2: Analysis and Planning (Script + Subagent)

Step 1: Run the structure analysis script (Main Agent):
bash
python3 books/book-name/analyze_structure.py books/book-name
This script:
  • Extracts all headers with line numbers
  • Detects formatting issues by sampling
  • Suggests chapter boundaries
  • Creates a concise STRUCTURE_ANALYSIS.md report (~5-10 KB), so the subagent does not have to read the full raw file (35k+ lines)
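
As a rough illustration of the first two bullets, the sketch below collects headers with their line numbers and flags lines that consist only of bold text, a common sign of a header that pandoc emitted as bold. It is not the code of analyze_structure.py; the function names and regular expressions are assumptions.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
HEADER_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
BOLD_LINE_RE = re.compile(r"^\*\*(.+?)\*\*$")


def extract_headers(lines):
    """Return (line_number, level, title) for each '#' header, 1-indexed, skipping code fences."""
    headers = []
    in_fence = False
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADER_RE.match(line)
        if match:
            headers.append((number, len(match.group(1)), match.group(2)))
    return headers


def bold_only_lines(lines):
    """Return line numbers of lines that are nothing but bold text (possible fake headers)."""
    return [n for n, line in enumerate(lines, start=1) if BOLD_LINE_RE.match(line.strip())]
```

Recording line numbers alongside each header is what lets the later chapter map refer to line ranges in raw/book-parsed.md.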
Step 2: Launch a general subagent to create mapping files:
python
Task(
  subagent_type="general",
  description="Create chapter map and formatting plan",
  prompt="""Create CHAPTER_MAP.md and FORMATTING_PLAN.md:

1. Read books/book-name/STRUCTURE_ANALYSIS.md (concise report with headers and issues)
2. Read books/book-name/references/chapter-map-template.md for format
3. Read books/book-name/references/formatting-plan-template.md for format
4. Create books/book-name/CHAPTER_MAP.md:
   - Use suggested chapter boundaries from analysis
   - Verify line ranges make sense
   - Create proper slugged filenames
5. Create books/book-name/FORMATTING_PLAN.md:
   - Document issues found in analysis
   - Add severity and priority
   - Note book-specific patterns
6. Update books/book-name/progress.md to mark Phase 2 complete

Return: Summary of chapters found and major issues identified."""
)
Output: CHAPTER_MAP.md, FORMATTING_PLAN.md, and updated progress.md
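
Two of the prompt's instructions, "Create proper slugged filenames" and "Verify line ranges make sense", could be checked mechanically along these lines. This is only a sketch: the helper names are hypothetical, and the filename pattern is inferred from the chapter-01-title.md example used in Phase 3.

```python
import re


def chapter_filename(number: int, title: str) -> str:
    """Build a slugged filename such as chapter-01-introduction.md (assumed pattern)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"chapter-{number:02d}-{slug}.md"


def ranges_are_valid(ranges) -> bool:
    """True if every (start, end) pair is well-formed and the pairs are ordered without overlap."""
    previous_end = 0
    for start, end in ranges:
        if start <= previous_end or end < start:
            return False
        previous_end = end
    return True
```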

Phase 3: Chapter Formatting (Use Subagents)

For EACH chapter, launch a separate general subagent:
python
# Example for Chapter 1
Task(
  subagent_type="general",
  description="Format Chapter 1",
  prompt="""Format Chapter 1 following the chapter formatting workflow.

Critical Instructions:
  1. Read and follow ALL steps in books/book-name/references/chapter-workflow.md
  2. Apply formatting rules from books/book-name/references/formatting-standards.md
  3. Use books/book-name/CHAPTER_MAP.md to find line ranges for Chapter 1
  4. Read books/book-name/FORMATTING_PLAN.md for known issues to watch for
Workflow Summary (see chapter-workflow.md for complete details):
Step 1: Read Standards and Chapter Map
  • Read references/formatting-standards.md
  • Read CHAPTER_MAP.md for your chapter's line ranges
  • Read FORMATTING_PLAN.md for known issues
Step 2: Extract Chapter Content
  • Extract Chapter 1 from raw/book-parsed.md using line ranges
Step 3: Identify Issues (following the standards)
  • Headers using bold instead of #
  • Shattered code blocks
  • Split paragraphs
  • Missing code language identifiers
  • Emphasis artifacts [word]
  • Corrupted footnotes
  • Missing image alt text
  • Broken links
Step 4: Apply Formatting Fixes
  • Follow the three-pass approach in chapter-workflow.md:
    • First pass: Structure (headers, code blocks)
    • Second pass: Content (paragraphs, emphasis)
    • Third pass: Details (footnotes, images, links)
Step 5: Create Output File
  • Write to books/book-name/chapters/chapter-01-title.md
  • Use structure from chapter-workflow.md
Step 6: Update Progress
  • Update books/book-name/progress.md with completion status
  • Document fixes applied
Quality Checklist (from chapter-workflow.md):
  • All headers use proper # syntax
  • All code blocks have language identifiers
  • No shattered code blocks remain
  • Text flows naturally without mid-sentence breaks
  • All footnotes have [^N] format with definitions
  • Images have descriptive alt text

Return: Confirmation with summary of fixes applied."""
)

Important:
  • Launch subagents in parallel batches (3-5 at a time) for efficiency
  • Each subagent must read chapter-workflow.md and formatting-standards.md
  • Follow the systematic workflow to ensure consistent quality

Output: Formatted chapters in books/book-name/chapters/
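
The batching advice can be pictured with a small helper that splits the chapter list into groups. This is a sketch only: the name `batches` and the default size of 4 are illustrative choices within the 3-5 range, and the main agent would still issue one Task call per chapter in each group.

```python
from itertools import islice


def batches(items, size=4):
    """Yield consecutive groups of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example: ten chapters split into groups that are launched one group at a time.
for group in batches(range(1, 11), size=4):
    print("launch in parallel:", group)
```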

Phase 4: Book Assembly (Main Agent)

The merge_book.py script is already copied to your project directory. Simply run it:
bash
python3 books/book-name/merge_book.py books/book-name
The script will:
  1. Read CHAPTER_MAP.md for chapter order
  2. Load all formatted chapters from chapters/
  3. Extract headers for Table of Contents
  4. Fix image paths (relative to final location)
  5. Combine all chapters in order
  6. Generate comprehensive TOC
  7. Output to books/book-name-book.md
Output: books/book-name-book.md with complete formatted book
Note: The merge script is reusable - no need to create it per book!
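
The header extraction and TOC steps might look roughly like the sketch below. It does not reproduce merge_book.py: the anchor rule is an assumption modelled on GitHub-style anchors, only the top three header levels are listed, and duplicate titles are not disambiguated.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
HEADER_RE = re.compile(r"^(#{1,3})\s+(.+?)\s*$")


def anchor_for(title: str) -> str:
    """Assumed anchor rule: lowercase, drop punctuation, turn whitespace into hyphens."""
    cleaned = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"\s+", "-", cleaned.strip())


def build_toc(markdown: str) -> str:
    """Return a nested bullet list linking to every level 1-3 header outside code fences."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADER_RE.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            entries.append(f"{'  ' * (level - 1)}- [{title}](#{anchor_for(title)})")
    return "\n".join(entries)
```

Skipping fenced blocks matters here because shell and Python comments inside code would otherwise be mistaken for headers.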

Critical: Chapter Formatting Requirements

Every subagent in Phase 3 MUST:
  1. Read chapter-workflow.md first - Contains the complete step-by-step process
  2. Read formatting-standards.md - Contains all formatting rules (678 lines)
  3. Follow the workflow systematically - Don't skip steps
  4. Use the three-pass approach:
    • First pass: Fix structure (headers, code blocks)
    • Second pass: Fix content (paragraphs, emphasis)
    • Third pass: Fix details (footnotes, images, links)
  5. Complete the quality checklist - Verify all items before finishing
Why this matters:
  • Ensures consistent quality across all chapters
  • Prevents common mistakes (skipped issues, inconsistent style)
  • Proven process from Clean Code Collection (35k+ lines)
  • Each chapter is only formatted once - must be thorough
The workflow documents are your complete instructions - trust them!

Subagent Usage Principles

Never process book content in main context. Always use subagents to:
  1. Keep main context clean: Book content is large and pollutes context
  2. Enable parallelization: Format multiple chapters simultaneously
  3. Isolate formatting work: Each chapter gets fresh context
  4. Avoid token limits: Raw content can exceed context windows
Subagent Selection: Always use subagent_type="general" for all book processing tasks.
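
One way a subagent can stay within its own context budget is to read only its chapter's line range from raw/book-parsed.md rather than the whole file. A minimal sketch, assuming 1-indexed inclusive ranges as recorded in CHAPTER_MAP.md (the helper name is hypothetical):

```python
from itertools import islice


def read_line_range(path, start, end):
    """Return lines start..end (1-indexed, inclusive) without loading the whole file."""
    with open(path, encoding="utf-8") as handle:
        return "".join(islice(handle, start - 1, end))
```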

Progress Tracking

Create and maintain books/book-name/progress.md:
markdown
# Book Name - Conversion Progress

## Phase 1: Setup ✓

- EPUB extracted
- Project structure created

## Phase 2: Planning ✓

- Chapter map created (15 chapters identified)
- Formatting plan documented

## Phase 3: Chapter Formatting (5/15 complete)

- Front Matter
- Chapter 1: Introduction
- Chapter 2: Getting Started
- Chapter 3: Advanced Topics
- Chapter 4: Best Practices
- Chapter 5: Performance
- ...

## Phase 4: Assembly

- Merge script created
- Final book generated

Update after each subagent completes.

Quality Standards

All formatted output must meet these criteria:
  • Headers: Use proper # syntax, not bold text
  • Code Blocks: Include language identifiers, merge shattered blocks
  • Text Flow: Join split sentences into natural paragraphs
  • Emphasis: Use *italic* and **bold**, not [brackets]
  • Footnotes: Standard [^1] format with definitions
  • Images: Descriptive alt text, not generic filenames
  • Links: Clean anchors, no PDF conversion artifacts
Complete standards reference: references/formatting-standards.md
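
Some of these criteria lend themselves to a mechanical spot check after a chapter has been formatted. The sketch below covers two of them, code blocks without a language identifier and footnote references without a definition. It is an illustrative aid with hypothetical function names, not part of the skill's scripts, and it does not replace the judgment-based review the workflow calls for.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker


def fences_missing_language(markdown: str):
    """Return 1-indexed line numbers of opening code fences that have no language tag."""
    missing = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # this line closes the current block
        else:
            in_fence = True
            if stripped == FENCE:
                missing.append(number)
    return missing


def undefined_footnotes(markdown: str):
    """Return footnote labels that are referenced as [^N] but never defined as [^N]: ..."""
    refs = set(re.findall(r"\[\^([^\]]+)\](?!:)", markdown))
    defs = set(re.findall(r"^\[\^([^\]]+)\]:", markdown, flags=re.MULTILINE))
    return sorted(refs - defs)
```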

Example Usage

User Request:
"Convert this EPUB to Markdown: /Users/john/Downloads/Effective.Java.3rd.Edition.epub"
Skill Execution:
  1. Run conversion script to extract content
  2. Analyze structure and create chapter map
  3. Format each chapter using AI subagents
  4. Merge into final book with TOC
  5. Provide user with books/effective-java-final.md

Scripts

  • convert_book.py: Main conversion script (Phase 1) - Extracts EPUB and sets up project
  • analyze_structure.py: Structure analyzer (Phase 2) - Extracts headers and detects issues efficiently
  • merge_book.py: Reusable merge script (Phase 4) - Combines all chapters into final book

References

  • formatting-standards.md: Complete formatting rules (loaded as needed during formatting)
  • chapter-workflow.md: Detailed chapter formatting workflow (loaded as needed)
  • progress-template.md: Template for progress tracking file
  • chapter-map-template.md: Template for chapter mapping
  • formatting-plan-template.md: Template for formatting issue documentation

Notes

  • High Quality Focus: Chapter-by-chapter AI-driven formatting ensures prose flows naturally
  • No Automated Formatting Scripts: Formatting is done by subagents rather than scripts, because joining lines requires human-like judgment
  • Preserve Content: Never alter meaning or remove content
  • Code Accuracy: Ensure code blocks are syntactically complete