document-docx

Document DOCX Skill - Quick Reference

This skill enables creation, editing, and analysis of `.docx` files for reports, contracts, proposals, documentation, and template-driven outputs.
Modern best practices (2026):
  • Prefer templates + styles over manual formatting.
  • Treat `.docx` as the editable source; treat PDF as a release artifact.
  • If distributing externally, include basic accessibility hygiene (headings, table headers, alt text).

Quick Reference

| Task | Tool/Library | Language | When to Use |
|------|--------------|----------|-------------|
| Create DOCX | python-docx | Python | Reports, contracts, proposals |
| Create DOCX | docx | Node.js | Server-side document generation |
| Convert to HTML | mammoth.js | Node.js | Web display, content extraction |
| Parse DOCX | python-docx | Python | Extract text, tables, metadata |
| Template fill | docxtpl | Python | Mail merge, template-based generation |
| Review workflow | Word compare, comments/highlights | Any | Human review without OOXML surgery |
| Tracked changes | OOXML inspection, docx4j/OpenXML SDK/Aspose | Any | True redlines or parsing tracked changes |

Tool Selection

  • Prefer `docxtpl` when non-developers must edit layout/design in Word.
  • Prefer `python-docx` for structural edits (paragraphs/tables/headers/footers) when formatting complexity is moderate.
  • Prefer `docx` (Node.js) for server-side generation in TypeScript-heavy stacks.
  • Prefer `mammoth` for text-first extraction or DOCX-to-HTML (best effort; may drop some layout fidelity).

Known Limits (Plan Around These)

  • `.doc` (legacy) is not supported by these libraries; convert to `.docx` first (e.g., LibreOffice).
  • `python-docx` cannot reliably create true tracked changes; use Word compare or specialized OOXML tooling.
  • Tables of Contents and many fields are placeholders until opened/updated in Word.

Core Operations

Create Document (Python - python-docx)

```python
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

doc = Document()

# Title
title = doc.add_heading('Document Title', 0)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Paragraph with formatting
para = doc.add_paragraph()
run = para.add_run('Bold and ')
run.bold = True
run = para.add_run('italic text.')
run.italic = True

# Table
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid'
for i, row in enumerate(table.rows):
    for j, cell in enumerate(row.cells):
        cell.text = f'Row {i+1}, Col {j+1}'

# Image
doc.add_picture('image.png', width=Inches(4))

# Save
doc.save('output.docx')
```

Create Document (Node.js - docx)

```typescript
import { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell } from 'docx';
import * as fs from 'fs';

const doc = new Document({
  sections: [{
    properties: {},
    children: [
      new Paragraph({
        children: [
          new TextRun({ text: 'Bold text', bold: true }),
          new TextRun({ text: ' and normal text.' }),
        ],
      }),
      new Table({
        rows: [
          new TableRow({
            children: [
              new TableCell({ children: [new Paragraph('Cell 1')] }),
              new TableCell({ children: [new Paragraph('Cell 2')] }),
            ],
          }),
        ],
      }),
    ],
  }],
});

Packer.toBuffer(doc).then((buffer) => {
  fs.writeFileSync('output.docx', buffer);
});
```

Template-Based Generation (Python - docxtpl)

```python
from docxtpl import DocxTemplate

doc = DocxTemplate('template.docx')
context = {
    'company_name': 'Acme Corp',
    'date': '2025-01-15',
    'items': [
        {'name': 'Widget A', 'price': 100},
        {'name': 'Widget B', 'price': 200},
    ]
}
doc.render(context)
doc.save('filled_template.docx')
```

Extract Content (Python - python-docx)

```python
from docx import Document

doc = Document('input.docx')

# Extract all text
full_text = []
for para in doc.paragraphs:
    full_text.append(para.text)

# Extract tables
for table in doc.tables:
    for row in table.rows:
        row_data = [cell.text for cell in row.cells]
        print(row_data)
```

Styling Reference

| Element | Python Method | Node.js Class |
|---------|---------------|---------------|
| Heading 1 | `add_heading(text, 1)` | `HeadingLevel.HEADING_1` |
| Bold | `run.bold = True` | `TextRun({ bold: true })` |
| Italic | `run.italic = True` | `TextRun({ italics: true })` |
| Font size | `run.font.size = Pt(12)` | `TextRun({ size: 24 })` (half-points) |
| Alignment | `WD_ALIGN_PARAGRAPH.CENTER` | `AlignmentType.CENTER` |
| Page break | `doc.add_page_break()` | `new PageBreak()` |

Do / Avoid (Dec 2025)

Do

  • Use consistent heading levels and a table of contents for long docs.
  • Capture decisions and action items with owners and due dates.
  • Store docs in a versioned, searchable system.

Avoid

  • Manual formatting instead of styles (breaks consistency).
  • Docs with no owner or review cadence (stale quickly).
  • Copy/pasting without updating definitions and links.

Output Quality Checklist

  • Structure: consistent heading hierarchy, styles, and (when needed) an auto-generated table of contents.
  • Decisions: decisions/actions captured with owner + due date (not buried in prose).
  • Versioning: doc ID + version + change summary; review cadence defined.
  • Accessibility hygiene: headings/reading order are correct; table headers are marked; alt text for non-decorative images.
  • Reuse: use `assets/doc-template-pack.md` for decision logs and recurring doc types.

Optional: AI / Automation

Use only when explicitly requested and policy-compliant.
  • Summarize meeting notes into decisions/actions; humans verify accuracy.
  • Draft first-pass docs from outlines; do not invent facts or quotes.

Navigation

Resources
  • references/docx-patterns.md - Advanced formatting, styles, headers/footers
  • references/template-workflows.md - Mail merge, batch generation
  • references/tracked-changes.md - Tracked changes: what is feasible, and what is not
  • data/sources.json - Library documentation links
Scripts
  • `scripts/docx_inspect_ooxml.py` - Dependency-free OOXML inspection (including tracked changes signals)
  • `scripts/docx_extract.py` - Extract text/tables to JSON (requires `python-docx`)
  • `scripts/docx_render_template.py` - Render a `docxtpl` template (requires `docxtpl`)
  • `scripts/docx_to_html.mjs` - Convert `.docx` to HTML (requires `mammoth`)
Templates
  • assets/report-template.md - Standard report structure
  • assets/contract-template.md - Legal document structure
  • assets/doc-template-pack.md - Decision log, meeting notes, changelog templates
Related Skills
  • ../document-pdf/SKILL.md - PDF generation and conversion
  • ../docs-codebase/SKILL.md - Technical writing patterns