document-docx
Document DOCX Skill - Quick Reference
This skill enables creation, editing, and analysis of `.docx` files for reports, contracts, proposals, documentation, and template-driven outputs.
Modern best practices (2026):
- Prefer templates + styles over manual formatting.
- Treat `.docx` as the editable source; treat PDF as a release artifact.
- If distributing externally, include basic accessibility hygiene (headings, table headers, alt text).
Quick Reference
| Task | Tool/Library | Language | When to Use |
|---|---|---|---|
| Create DOCX | python-docx | Python | Reports, contracts, proposals |
| Create DOCX | docx | Node.js | Server-side document generation |
| Convert to HTML | mammoth.js | Node.js | Web display, content extraction |
| Parse DOCX | python-docx | Python | Extract text, tables, metadata |
| Template fill | docxtpl | Python | Mail merge, template-based generation |
| Review workflow | Word compare, comments/highlights | Any | Human review without OOXML surgery |
| Tracked changes | OOXML inspection, docx4j/OpenXML SDK/Aspose | Any | True redlines or parsing tracked changes |
Tool Selection
- Prefer `docxtpl` when non-developers must edit layout/design in Word.
- Prefer `python-docx` for structural edits (paragraphs/tables/headers/footers) when formatting complexity is moderate.
- Prefer `docx` (Node.js) for server-side generation in TypeScript-heavy stacks.
- Prefer `mammoth` for text-first extraction or DOCX-to-HTML (best effort; may drop some layout fidelity).
Known Limits (Plan Around These)
- `.doc` (legacy) is not supported by these libraries; convert to `.docx` first (e.g., LibreOffice).
- `python-docx` cannot reliably create true tracked changes; use Word compare or specialized OOXML tooling.
- Tables of Contents and many fields are placeholders until opened/updated in Word.
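Before choosing between Word compare and specialized tooling, it can help to check whether a file already carries tracked changes. The sketch below is one dependency-free way to look for such signals, assuming that counting `w:ins` and `w:del` elements in `word/document.xml` is a good enough indicator. The function name `count_tracked_changes` is hypothetical, and headers, footers, footnotes, and comments are not inspected.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tracked_changes(docx_source):
    """Count w:ins / w:del elements in the main document part.

    `docx_source` may be a path or a binary file-like object.
    Hypothetical helper: only word/document.xml is read.
    """
    with zipfile.ZipFile(docx_source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    # iter() walks the whole tree, so nested revisions are counted too.
    insertions = sum(1 for _ in root.iter(f"{{{W_NS}}}ins"))
    deletions = sum(1 for _ in root.iter(f"{{{W_NS}}}del"))
    return {"insertions": insertions, "deletions": deletions}
```

A result with both counts at zero suggests the main body has no recorded revisions; it says nothing about comments or about changes that were already accepted.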
Core Operations
Create Document (Python - python-docx)
```python
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

doc = Document()

# Title
title = doc.add_heading('Document Title', 0)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Paragraph with formatting
para = doc.add_paragraph()
run = para.add_run('Bold and ')
run.bold = True
run = para.add_run('italic text.')
run.italic = True

# Table
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid'
for i, row in enumerate(table.rows):
    for j, cell in enumerate(row.cells):
        cell.text = f'Row {i+1}, Col {j+1}'

# Image (image.png must exist in the working directory)
doc.add_picture('image.png', width=Inches(4))

# Save
doc.save('output.docx')
```
Create Document (Node.js - docx)
```typescript
import { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell } from 'docx';
import * as fs from 'fs';

const doc = new Document({
  sections: [{
    properties: {},
    children: [
      // Paragraph with one bold run and one plain run
      new Paragraph({
        children: [
          new TextRun({ text: 'Bold text', bold: true }),
          new TextRun({ text: ' and normal text.' }),
        ],
      }),
      // Single-row table with two cells
      new Table({
        rows: [
          new TableRow({
            children: [
              new TableCell({ children: [new Paragraph('Cell 1')] }),
              new TableCell({ children: [new Paragraph('Cell 2')] }),
            ],
          }),
        ],
      }),
    ],
  }],
});

// Serialize the document and write it to disk
Packer.toBuffer(doc).then((buffer) => {
  fs.writeFileSync('output.docx', buffer);
});
```
Template-Based Generation (Python - docxtpl)
```python
from docxtpl import DocxTemplate

# template.docx is expected to contain Jinja2 tags such as {{ company_name }}
doc = DocxTemplate('template.docx')

context = {
    'company_name': 'Acme Corp',
    'date': '2025-01-15',
    'items': [
        {'name': 'Widget A', 'price': 100},
        {'name': 'Widget B', 'price': 200},
    ]
}

doc.render(context)
doc.save('filled_template.docx')
```
Extract Content (Python - python-docx)
```python
from docx import Document

doc = Document('input.docx')

# Extract all text
full_text = []
for para in doc.paragraphs:
    full_text.append(para.text)

# Extract tables
for table in doc.tables:
    for row in table.rows:
        row_data = [cell.text for cell in row.cells]
        print(row_data)
```
Styling Reference
| Element | Python Method | Node.js Class |
|---|---|---|
| Heading 1 | | |
| Bold | | |
| Italic | | |
| Font size | | |
| Alignment | | |
| Page break | | |
Do / Avoid (Dec 2025)
Do
- Use consistent heading levels and a table of contents for long docs.
- Capture decisions and action items with owners and due dates.
- Store docs in a versioned, searchable system.
Avoid
- Manual formatting instead of styles (breaks consistency).
- Docs with no owner or review cadence (stale quickly).
- Copy/pasting without updating definitions and links.
Output Quality Checklist
- Structure: consistent heading hierarchy, styles, and (when needed) an auto-generated table of contents.
- Decisions: decisions/actions captured with owner + due date (not buried in prose).
- Versioning: doc ID + version + change summary; review cadence defined.
- Accessibility hygiene: headings/reading order are correct; table headers are marked; alt text for non-decorative images.
- Reuse: use `assets/doc-template-pack.md` for decision logs and recurring doc types.
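The structure item in the checklist above can be partly automated. The sketch below is one possible check, assuming that heading paragraphs use style names of the form `Heading N` and that a jump of more than one level (for example Heading 1 straight to Heading 3) counts as a hierarchy problem. Both function names are hypothetical.

```python
import re


def heading_level(style_name):
    """Return the level for style names like 'Heading 2', else None."""
    match = re.fullmatch(r"Heading (\d+)", style_name)
    return int(match.group(1)) if match else None


def find_heading_skips(style_names):
    """Report places where the heading level jumps by more than one.

    Returns a list of (index, previous_level, level) tuples, where index
    is the position of the offending entry in `style_names`.
    """
    skips = []
    previous = 0  # treat the start of the document as level 0
    for index, name in enumerate(style_names):
        level = heading_level(name)
        if level is None:
            continue  # not a numbered heading style
        if level > previous + 1:
            skips.append((index, previous, level))
        previous = level
    return skips
```

With python-docx, the input could be gathered as `[p.style.name for p in Document('output.docx').paragraphs]`. An empty result only means no levels were skipped; it does not cover reading order, table headers, or alt text.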
Optional: AI / Automation
Use only when explicitly requested and policy-compliant.
- Summarize meeting notes into decisions/actions; humans verify accuracy.
- Draft first-pass docs from outlines; do not invent facts or quotes.
Navigation
Resources
- references/docx-patterns.md - Advanced formatting, styles, headers/footers
- references/template-workflows.md - Mail merge, batch generation
- references/tracked-changes.md - Tracked changes: what is feasible, and what is not
- data/sources.json - Library documentation links
Scripts
- `scripts/docx_inspect_ooxml.py` - Dependency-free OOXML inspection (including tracked changes signals)
- `scripts/docx_extract.py` - Extract text/tables to JSON (requires `python-docx`)
- `scripts/docx_render_template.py` - Render a `docxtpl` template (requires `docxtpl`)
- `scripts/docx_to_html.mjs` - Convert `.docx` to HTML (requires `mammoth`)
Templates
- assets/report-template.md - Standard report structure
- assets/contract-template.md - Legal document structure
- assets/doc-template-pack.md - Decision log, meeting notes, changelog templates
Related Skills
- ../document-pdf/SKILL.md - PDF generation and conversion
- ../docs-codebase/SKILL.md - Technical writing patterns