book-converter
Book Converter Skill
Convert EPUB books into professionally formatted Markdown books with AI-assisted quality improvements.
Overview
This skill converts EPUB files into high-quality Markdown documents by:
- Using pandoc to extract raw Markdown from EPUB
- Creating a structured project directory
- Planning and executing AI-driven formatting fixes
- Producing chapter-by-chapter formatted output
- Generating merged book file with Table of Contents
Quick Start
User provides an EPUB file path:
```
/Users/username/Downloads/Book.Name.2024.epub
```
Execute the conversion workflow:
```bash
python3 scripts/convert_book.py "/path/to/book.epub"
```
This initiates the complete conversion process.
Workflow
CRITICAL: Use subagents for all formatting work to avoid polluting main context.
Phase 1: Setup and Extraction (Main Agent)
Run the conversion script:
```bash
python3 scripts/convert_book.py "/path/to/book.epub"
```
This script:
- Verifies EPUB file exists
- Creates project structure:
  - books/book-name/ - Main directory
  - books/book-name/raw/ - Pandoc output
  - books/book-name/chapters/ - Formatted chapters
  - books/book-name/images/ - Extracted images
- Runs pandoc to extract Markdown
- Copies formatting standards to project directory
Output: Raw Markdown in books/book-name/raw/book-parsed.md
Phase 2: Analysis and Planning (Script + Subagent)
Step 1: Run the structure analysis script (Main Agent):
```bash
python3 books/book-name/analyze_structure.py books/book-name
```
This script:
- Extracts all headers with line numbers
- Detects formatting issues by sampling
- Suggests chapter boundaries
- Creates the STRUCTURE_ANALYSIS.md report (~5-10 KB instead of 35k+ lines)
Step 2: Launch a general subagent to create mapping files:
```python
Task(
    subagent_type="general",
    description="Create chapter map and formatting plan",
    prompt="""Create CHAPTER_MAP.md and FORMATTING_PLAN.md:
1. Read books/book-name/STRUCTURE_ANALYSIS.md (concise report with headers and issues)
2. Read books/book-name/references/chapter-map-template.md for format
3. Read books/book-name/references/formatting-plan-template.md for format
4. Create books/book-name/CHAPTER_MAP.md:
   - Use suggested chapter boundaries from analysis
   - Verify line ranges make sense
   - Create proper slugged filenames
5. Create books/book-name/FORMATTING_PLAN.md:
   - Document issues found in analysis
   - Add severity and priority
   - Note book-specific patterns
6. Update books/book-name/progress.md to mark Phase 2 complete
Return: Summary of chapters found and major issues identified."""
)
```
Output: CHAPTER_MAP.md, FORMATTING_PLAN.md, and updated progress.md
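To make the analysis step concrete, here is a minimal sketch of header extraction with line numbers and a simple chapter-boundary suggestion. It is an illustration under assumptions, not the real analyze_structure.py. The function names are hypothetical, only ATX (`#`) headers are recognized, and chapters are assumed to start at level-1 headers.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
HEADER_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def extract_headers(text: str) -> list:
    """Return (line_number, level, title) for each ATX header outside fenced code."""
    headers = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADER_RE.match(line)
        if match:
            headers.append((number, len(match.group(1)), match.group(2)))
    return headers


def suggest_chapters(headers: list, total_lines: int, level: int = 1) -> list:
    """Turn headers of one level into (title, start_line, end_line) ranges.

    Each chapter runs until the line before the next header of the same level;
    the last chapter runs to the end of the file.
    """
    starts = [(number, title) for number, lvl, title in headers if lvl == level]
    chapters = []
    for index, (start, title) in enumerate(starts):
        if index + 1 < len(starts):
            end = starts[index + 1][0] - 1
        else:
            end = total_lines
        chapters.append((title, start, end))
    return chapters
```

Because only headers and ranges are reported, the result stays small even when the raw file runs to tens of thousands of lines.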
Phase 3: Chapter Formatting (Use Subagents)
For EACH chapter, launch a separate general subagent:
```python
# Example for Chapter 1
Task(
    subagent_type="general",
    description="Format Chapter 1",
    prompt="""Format Chapter 1 following the chapter formatting workflow.
Critical Instructions:
- Read and follow ALL steps in books/book-name/references/chapter-workflow.md
- Apply formatting rules from books/book-name/references/formatting-standards.md
- Use books/book-name/CHAPTER_MAP.md to find line ranges for Chapter 1
- Read books/book-name/FORMATTING_PLAN.md for known issues to watch for
Workflow Summary (see chapter-workflow.md for complete details):
Step 1: Read Standards and Chapter Map
- Read references/formatting-standards.md
- Read CHAPTER_MAP.md for your chapter's line ranges
- Read FORMATTING_PLAN.md for known issues
Step 2: Extract Chapter Content
- Extract Chapter 1 from raw/book-parsed.md using line ranges
Step 3: Identify Issues (follow the standards)
- Headers using bold instead of #
- Shattered code blocks
- Split paragraphs
- Missing code language identifiers
- Emphasis artifacts [word]
- Corrupted footnotes
- Missing image alt text
- Broken links
Step 4: Apply Formatting Fixes
- Follow the three-pass approach in chapter-workflow.md:
  - First pass: Structure (headers, code blocks)
  - Second pass: Content (paragraphs, emphasis)
  - Third pass: Details (footnotes, images, links)
Step 5: Create Output File
- Write to books/book-name/chapters/chapter-01-title.md
- Use structure from chapter-workflow.md
Step 6: Update Progress
- Update books/book-name/progress.md with completion status
- Document fixes applied
Quality Checklist (from chapter-workflow.md):
- All headers use proper # syntax
- All code blocks have language identifiers
- No shattered code blocks remain
- Text flows naturally without mid-sentence breaks
- All footnotes have [^N] format with definitions
- Images have descriptive alt text
Return: Confirmation with summary of fixes applied."""
)
```
Important:
- Launch subagents in parallel batches (3-5 at a time) for efficiency
- Each subagent must read chapter-workflow.md and formatting-standards.md
- Follow the systematic workflow to ensure consistent quality
Output: Formatted chapters in books/book-name/chapters/
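Two mechanical pieces of this phase are slicing a chapter out of the raw file by its line range and grouping chapters into parallel batches. A minimal sketch follows. Both helper names are hypothetical. The line ranges are assumed to be 1-based and inclusive, and the default batch size of 4 is just one value within the 3-5 range mentioned above.

```python
def extract_range(lines: list, start: int, end: int) -> list:
    """Return lines start..end (1-based, inclusive), as a chapter map might record them."""
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {start}-{end}")
    return lines[start - 1:end]


def batches(items: list, size: int = 4):
    """Yield consecutive groups of at most `size` items, one group per parallel launch."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for offset in range(0, len(items), size):
        yield list(items[offset:offset + size])
```

For example, seven chapters with `size=3` produce two full batches and a final batch of one, so no chapter is skipped when the count is not a multiple of the batch size.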
Phase 4: Book Assembly (Main Agent)
The merge_book.py script is already copied to your project directory. Simply run it:
```bash
python3 books/book-name/merge_book.py books/book-name
```
The script will:
- Read CHAPTER_MAP.md for chapter order
- Load all formatted chapters from chapters/
- Extract headers for Table of Contents
- Fix image paths (relative to final location)
- Combine all chapters in order
- Generate comprehensive TOC
- Output to books/book-name-book.md
Output: books/book-name-book.md with complete formatted book
Note: The merge script is reusable - no need to create it per book!
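The TOC step can be illustrated with a short sketch. It is not the code in merge_book.py, which the source does not show. The names `slugify` and `build_toc` are hypothetical, and the anchor format (lowercase, punctuation dropped, spaces turned into hyphens) is an assumption modeled on common Markdown renderers.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
HEADER_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def slugify(title: str) -> str:
    """Build an anchor: lowercase, drop punctuation, replace whitespace with hyphens."""
    slug = re.sub(r"[^\w\s-]", "", title.strip().lower())
    return re.sub(r"\s+", "-", slug)


def build_toc(markdown: str, max_level: int = 2) -> str:
    """Return a nested bullet list linking to every header up to max_level."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' lines inside code blocks are comments, not headers
        match = HEADER_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            level, title = len(match.group(1)), match.group(2)
            indent = "  " * (level - 1)
            entries.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(entries)
```

Skipping fenced code matters for books about programming, where shell or Python comments would otherwise show up as chapter entries.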
Critical: Chapter Formatting Requirements
Every subagent in Phase 3 MUST:
- Read chapter-workflow.md first - Contains the complete step-by-step process
- Read formatting-standards.md - Contains all formatting rules (678 lines)
- Follow the workflow systematically - Don't skip steps
- Use the three-pass approach:
  - First pass: Fix structure (headers, code blocks)
  - Second pass: Fix content (paragraphs, emphasis)
  - Third pass: Fix details (footnotes, images, links)
- Complete the quality checklist - Verify all items before finishing
Why this matters:
- Ensures consistent quality across all chapters
- Prevents common mistakes (skipped issues, inconsistent style)
- Proven process from Clean Code Collection (35k+ lines)
- Each chapter is only formatted once - must be thorough
The workflow documents are your complete instructions - trust them!
Subagent Usage Principles
Never process book content in main context. Always use subagents to:
- Keep main context clean: Book content is large and pollutes context
- Enable parallelization: Format multiple chapters simultaneously
- Isolate formatting work: Each chapter gets fresh context
- Avoid token limits: Raw content can exceed context windows
Subagent Selection: Always use subagent_type="general" for all book processing tasks.
Progress Tracking
Create and maintain books/book-name/progress.md:
```markdown
# Book Name - Conversion Progress

## Phase 1: Setup ✓
- EPUB extracted
- Project structure created

## Phase 2: Planning ✓
- Chapter map created (15 chapters identified)
- Formatting plan documented

## Phase 3: Chapter Formatting (5/15 complete)
- Front Matter
- Chapter 1: Introduction
- Chapter 2: Getting Started
- Chapter 3: Advanced Topics
- Chapter 4: Best Practices
- Chapter 5: Performance
- ...

## Phase 4: Assembly
- Merge script created
- Final book generated
```
Update after each subagent completes.
Quality Standards
All formatted output must meet these criteria:
- Headers: Use proper # syntax, not bold text
- Code Blocks: Include language identifiers, merge shattered blocks
- Text Flow: Join split sentences into natural paragraphs
- Emphasis: Use *italic* and **bold**, not [brackets]
- Footnotes: Standard [^1] format with definitions
- Images: Descriptive alt text, not generic filenames
- Links: Clean anchors, no PDF conversion artifacts
Complete standards reference: references/formatting-standards.md
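Two of these criteria lend themselves to a mechanical spot check after a chapter has been formatted. The sketch below flags code fences that open without a language identifier and lines that consist solely of bold text, which are likely headers. The function name and the issue messages are hypothetical. It is meant as a verification aid only, not a replacement for the subagent's judgment-based formatting.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
BOLD_ONLY_RE = re.compile(r"\*\*[^*]+\*\*")


def find_issues(markdown: str) -> list:
    """Return (line_number, message) pairs for two easily detected problems."""
    issues = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Only an opening fence needs a language; a bare closing fence is fine.
            if not in_fence and stripped == FENCE:
                issues.append((number, "code block without language identifier"))
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if BOLD_ONLY_RE.fullmatch(stripped):
            issues.append((number, "bold text used as header"))
    return issues
```

An empty result does not prove a chapter is clean, since split paragraphs and corrupted footnotes still need a reader's eye.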
Example Usage
User Request:
"Convert this EPUB to Markdown: /Users/john/Downloads/Effective.Java.3rd.Edition.epub"
Skill Execution:
- Run conversion script to extract content
- Analyze structure and create chapter map
- Format each chapter using AI subagents
- Merge into final book with TOC
- Provide user with books/effective-java-final.md
Scripts
- convert_book.py: Main conversion script (Phase 1) - Extracts EPUB and sets up project
- analyze_structure.py: Structure analyzer (Phase 2) - Extracts headers and detects issues efficiently
- merge_book.py: Reusable merge script (Phase 4) - Combines all chapters into final book
References
- formatting-standards.md: Complete formatting rules (loaded as needed during formatting)
- chapter-workflow.md: Detailed chapter formatting workflow (loaded as needed)
- progress-template.md: Template for progress tracking file
- chapter-map-template.md: Template for chapter mapping
- formatting-plan-template.md: Template for formatting issue documentation
Notes
- High Quality Focus: Manual AI-driven formatting ensures prose flows naturally
- No Automated Formatting Scripts: Scripts handle only extraction, analysis, and merging; the formatting itself requires human-like judgment, for example when joining split lines
- Preserve Content: Never alter meaning or remove content
- Code Accuracy: Ensure code blocks are syntactically complete