docx
DOCX creation, editing, and analysis
Overview
A .docx file is a ZIP archive containing XML files.
Quick Reference
| Task | Approach |
|---|---|
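| Read/analyze content | `pandoc` for text, or unpack for raw XML - see Reading Content below |
| Create new document | Use `docx` (docx-js) - see Creating New Documents below |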
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |
Converting .doc to .docx
Legacy `.doc` files must be converted before editing:

```bash
python scripts/office/soffice.py --headless --convert-to docx document.doc
```

Reading Content
```bash
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```

Converting to Images
```bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
```

Accepting Tracked Changes
To produce a clean document with all tracked changes accepted (requires LibreOffice):
```bash
python scripts/accept_changes.py input.docx output.docx
```

Creating New Documents
Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`
Setup
```javascript
const fs = require('fs'); // needed for writeFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```

Validation
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
```bash
python scripts/office/validate.py doc.docx
```

Page Size
```javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
    }
  },
  children: [/* content */]
}]
```

Common page sizes (DXA units, 1440 DXA = 1 inch):

| Paper | Width | Height | Content Width (1" margins) |
|---|---|---|---|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |

Landscape orientation: docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:

```javascript
size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
```
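The content-width arithmetic used for the page sizes and the landscape case can be sketched as a small helper. This is only an illustration: `PAPER` and `contentWidthDxa` are hypothetical names, not part of docx-js, and the dimensions are the portrait values listed for US Letter and A4.

```javascript
// DXA units: 1440 DXA = 1 inch.
const DXA_PER_INCH = 1440;

// Portrait dimensions in DXA (hypothetical lookup table for this sketch).
const PAPER = {
  letter: { width: 12240, height: 15840 },
  a4: { width: 11906, height: 16838 },
};

// Hypothetical helper: usable width between the left and right margins.
// In landscape the long edge runs horizontally, so it becomes the page width.
function contentWidthDxa(paper, { left = DXA_PER_INCH, right = DXA_PER_INCH, landscape = false } = {}) {
  const pageWidth = landscape ? paper.height : paper.width;
  const usable = pageWidth - left - right;
  if (usable <= 0) throw new RangeError("margins exceed page width");
  return usable;
}
```

With 1 inch margins, `contentWidthDxa(PAPER.letter)` gives 9360 and `contentWidthDxa(PAPER.letter, { landscape: true })` gives 12960; either value can then serve as a full-width table width.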
Styles (Override Built-in Headings)
Use Arial as the default font (universally supported). Keep titles black for readability.
```javascript
const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
    ]
  }]
});
```
Lists (NEVER use unicode bullets)
```javascript
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] })  // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] })  // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ numbering: { reference: "bullets", level: 0 },
        children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 },
        children: [new TextRun("Numbered item")] }),
    ]
  }]
});

// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
```
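Because each reference keeps its own counter, one way to make several numbered lists restart at 1 is to generate one config entry per list. The sketch below assumes the config shape shown in the list example; `numberedListConfigs` is a hypothetical helper, and the `LevelFormat` and `AlignmentType` enums are passed in (for instance from `require('docx')`) so the snippet itself needs no imports.

```javascript
// Hypothetical helper: build one numbering config entry per reference name,
// so paragraphs using different references are numbered independently.
function numberedListConfigs(references, { LevelFormat, AlignmentType }) {
  return references.map((reference) => ({
    reference,
    levels: [{
      level: 0,
      format: LevelFormat.DECIMAL,
      text: "%1.",
      alignment: AlignmentType.LEFT,
      style: { paragraph: { indent: { left: 720, hanging: 360 } } },
    }],
  }));
}

// Possible usage (assumes the docx package is installed):
//   const docx = require('docx');
//   numbering: { config: numberedListConfigs(["steps-a", "steps-b"], docx) }
// Paragraphs with reference "steps-a" and "steps-b" then each start at 1.
```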
Tables
CRITICAL: Tables need dual widths - set both `columnWidths` on the table AND `width` on each cell. Without both, tables render incorrectly on some platforms.

```javascript
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
  columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})
```

Table width calculation:

Always use `WidthType.DXA`; `WidthType.PERCENTAGE` breaks in Google Docs.

```javascript
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360] // Must sum to table width
```

Width rules:
- Always use `WidthType.DXA` - never `WidthType.PERCENTAGE` (incompatible with Google Docs)
- Table width must equal the sum of `columnWidths`
- Cell `width` must match corresponding `columnWidth`
- Cell `margins` are internal padding - they reduce content area, not add to cell width
- For full-width tables: use content width (page width minus left and right margins)
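The rule that column widths must add up exactly to the table width is easy to break when dividing by three or more columns. A minimal sketch of one way to keep the sum exact is shown here; `splitColumnWidths` is a hypothetical helper, not a docx-js function, and giving the rounding remainder to the last column is just one possible choice.

```javascript
// Hypothetical helper: split a table width (in DXA) into integer column widths
// proportional to `weights`, guaranteeing the widths sum to tableWidth exactly.
function splitColumnWidths(tableWidth, weights) {
  if (!Number.isInteger(tableWidth) || tableWidth <= 0) {
    throw new RangeError("tableWidth must be a positive integer");
  }
  if (weights.length === 0 || weights.some((w) => !(w > 0))) {
    throw new RangeError("weights must be a non-empty list of positive numbers");
  }
  const total = weights.reduce((a, b) => a + b, 0);
  const widths = weights.map((w) => Math.floor((tableWidth * w) / total));
  // Rounding down can leave a few DXA unassigned; add them to the last column.
  const assigned = widths.reduce((a, b) => a + b, 0);
  widths[widths.length - 1] += tableWidth - assigned;
  return widths;
}
```

The returned array can be passed as `columnWidths`, with each entry reused as the `width` of the cells in that column so the two settings always match.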
Images
```javascript
// CRITICAL: type parameter is REQUIRED
new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" } // All three required
  })]
})
```
Page Breaks
```javascript
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })

// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
```

Table of Contents
```javascript
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
```

Headers/Footers
```javascript
sections: [{
  properties: {
    page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
  },
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]
```

Critical Rules for docx-js
- Set page size explicitly - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- Landscape: pass portrait dimensions - docx-js swaps width/height internally; pass short edge as `width`, long edge as `height`, and set `orientation: PageOrientation.LANDSCAPE`
- Never use `\n` - use separate Paragraph elements
- Never use unicode bullets - use `LevelFormat.BULLET` with numbering config
- PageBreak must be in Paragraph - standalone creates invalid XML
- ImageRun requires `type` - always specify png/jpg/etc
- Always set table `width` with DXA - never use `WidthType.PERCENTAGE` (breaks in Google Docs)
- Tables need dual widths - `columnWidths` array AND cell `width`, both must match
- Table width = sum of columnWidths - for DXA, ensure they add up exactly
- Always add cell margins - use `margins: { top: 80, bottom: 80, left: 120, right: 120 }` for readable padding
- Use `ShadingType.CLEAR` - never SOLID for table shading
- TOC requires HeadingLevel only - no custom styles on heading paragraphs
- Override built-in styles - use exact IDs: "Heading1", "Heading2", etc.
- Include `outlineLevel` - required for TOC (0 for H1, 1 for H2, etc.)
Editing Existing Documents
Follow all 3 steps in order.
Step 1: Unpack
```bash
python scripts/office/unpack.py document.docx unpacked/
```

Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.

Step 2: Edit XML
Edit files in `unpacked/word/`. See XML Reference below for patterns.

Use "Claude" as the author for tracked changes and comments, unless the user explicitly requests use of a different name.

Use the Edit tool directly for string replacement. Do not write Python scripts. Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.

CRITICAL: Use smart quotes for new content. When adding text with apostrophes or quotes, use XML entities to produce smart quotes:

```xml
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
```

| Entity | Character |
|---|---|
| `&#x2018;` | ‘ (left single) |
| `&#x2019;` | ’ (right single / apostrophe) |
| `&#x201C;` | “ (left double) |
| `&#x201D;` | ” (right double) |

Adding comments: Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):

```bash
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name
```

Then add markers to document.xml (see Comments in XML Reference).
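To make "pre-escaped XML" concrete, the sketch below shows the transformation a comment string would need before being passed on the command line. It is only an illustration of the escaping rules, not a replacement for editing the XML directly; `escapeForWordXml` is a hypothetical name and is not part of the scripts described here.

```javascript
// Hypothetical helper: escape XML metacharacters and turn smart quotes into
// numeric character references, so the result is safe to place in <w:t>.
function escapeForWordXml(text) {
  return text
    .replace(/&/g, "&amp;")          // first, so entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\u2018/g, "&#x2018;")  // left single quote
    .replace(/\u2019/g, "&#x2019;")  // right single quote / apostrophe
    .replace(/\u201C/g, "&#x201C;")  // left double quote
    .replace(/\u201D/g, "&#x201D;"); // right double quote
}
```

The input is assumed to be raw text; running the function on text that is already escaped would double-escape the ampersands.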
Step 3: Pack
```bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
```

Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.

Auto-repair will fix:
- `durableId` >= 0x7FFFFFFF (regenerates valid ID)
- Missing `xml:space="preserve"` on `<w:t>` with whitespace

Auto-repair won't fix:
- Malformed XML, invalid element nesting, missing relationships, schema violations
Common Pitfalls
- Replace entire `<w:r>` elements: When adding tracked changes, replace the whole `<w:r>...</w:r>` block with `<w:del>...<w:ins>...` as siblings. Don't inject tracked change tags inside a run.
- Preserve `<w:rPr>` formatting: Copy the original run's `<w:rPr>` block into your tracked change runs to maintain bold, font size, etc.
XML Reference
Schema Compliance
- Element order in `<w:pPr>`: `<w:pStyle>`, `<w:numPr>`, `<w:spacing>`, `<w:ind>`, `<w:jc>`, `<w:rPr>` last
- Whitespace: Add `xml:space="preserve"` to `<w:t>` with leading/trailing spaces
- RSIDs: Must be 8-digit hex (e.g., `00AB1234`)
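The RSID rule can be expressed as a one-line check. This is a sketch of the stated format only (exactly eight hexadecimal digits); `isValidRsid` is a hypothetical name, and the validator script may well apply further checks that are not shown here.

```javascript
// Hypothetical helper: true when the value is exactly eight hexadecimal digits.
function isValidRsid(value) {
  return typeof value === "string" && /^[0-9A-Fa-f]{8}$/.test(value);
}
```

For example, `isValidRsid("00AB1234")` is true, while a six-digit value such as `"AB1234"` is rejected.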
Tracked Changes
Insertion:

```xml
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>
```

Deletion:

```xml
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
```

Inside `<w:del>`: Use `<w:delText>` instead of `<w:t>`, and `<w:delInstrText>` instead of `<w:instrText>`.

Minimal edits - only mark what changes:

```xml
<!-- Change "30 days" to "60 days" -->
<!-- xml:space="preserve" keeps the trailing/leading spaces in the unchanged runs -->
<w:r><w:t xml:space="preserve">The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t xml:space="preserve"> days.</w:t></w:r>
```

Deleting entire paragraphs/list items - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add `<w:del/>` inside `<w:pPr><w:rPr>`:

```xml
<w:p>
  <w:pPr>
    <w:numPr>...</w:numPr>  <!-- list numbering if present -->
    <w:rPr>
      <w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
  </w:del>
</w:p>
```

Without the `<w:del/>` in `<w:pPr><w:rPr>`, accepting changes leaves an empty paragraph/list item.

Rejecting another author's insertion - nest deletion inside their insertion:

```xml
<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Claude" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>
```

Restoring another author's deletion - add insertion after (don't modify their deletion):

```xml
<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>
```
Comments
After running `comment.py` (see Step 2), add markers to document.xml. For replies, use the `--parent` flag and nest the reply's markers inside the parent's.

CRITICAL: `<w:commentRangeStart>` and `<w:commentRangeEnd>` are siblings of `<w:r>`, never inside `<w:r>`.

```xml
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t xml:space="preserve"> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
  <w:commentRangeStart w:id="1"/>
  <w:r><w:t>text</w:t></w:r>
  <w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
```
Images
- Add image file to `word/media/`
- Add relationship to `word/_rels/document.xml.rels`:

```xml
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
```

- Add content type to `[Content_Types].xml`:

```xml
<Default Extension="png" ContentType="image/png"/>
```

- Reference in document.xml:

```xml
<w:drawing>
  <wp:inline>
    <wp:extent cx="914400" cy="914400"/>  <!-- EMUs: 914400 = 1 inch -->
    <a:graphic>
      <a:graphicData uri=".../picture">
        <pic:pic>
          <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>
```
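The `cx` and `cy` values in `<wp:extent>` are in EMUs, so a pixel size has to be converted before it is written into the XML. The sketch below assumes 96 pixels per inch, which is a common default but not something this document specifies; `pixelsToEmu` and `extentFor` are hypothetical helper names.

```javascript
// 914400 EMUs = 1 inch.
const EMU_PER_INCH = 914400;

// Hypothetical helper: convert a pixel length to EMUs.
// The 96 dpi default is an assumption; pass the image's real dpi if known.
function pixelsToEmu(pixels, dpi = 96) {
  if (!(pixels >= 0) || !(dpi > 0)) throw new RangeError("invalid pixels or dpi");
  return Math.round((pixels / dpi) * EMU_PER_INCH);
}

// Hypothetical helper: values for the cx/cy attributes of <wp:extent>.
function extentFor(widthPx, heightPx, dpi = 96) {
  return { cx: pixelsToEmu(widthPx, dpi), cy: pixelsToEmu(heightPx, dpi) };
}
```

Under that assumption, a 96 x 96 pixel image maps to `cx="914400" cy="914400"`, matching the one-inch extent in the XML.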
Dependencies
- pandoc: Text extraction
- docx: `npm install -g docx` (new documents)
- LibreOffice: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- Poppler: `pdftoppm` for images