pr-docx
Comprehensive DOCX import/export handling for Plate editor with tracked changes and comments. Use when implementing or debugging DOCX file operations, mammoth.js modifications, suggestion/comment import from Word, or export of Plate content to Word format. Triggers on requests involving DOCX import, export, tracked changes, Word comments, mammoth.js, or round-trip document fidelity. This skill ensures the agent understands the NON-NEGOTIABLE requirements for seamless Word ↔ Plate integration.
Source: arthrod/conejo-skills
Install: npx skill4agent add arthrod/conejo-skills pr-docx
DOCX Import/Export for Plate Editor
CRITICAL UNDERSTANDING
This skill provides comprehensive guidance for DOCX import/export with absolute requirements that must be followed. Read this entire skill before making any changes to DOCX-related code.
The Non-Negotiable Principle: "NO MATTER WHAT"
"NO MATTER WHAT" is an absolutist requirement. Content is SACRED. Metadata is secondary.
Priority Hierarchy
PRIORITY 1 (REQUIRED): LOCATION
→ We MUST know WHERE the comment/change applies
→ Without location, we cannot place the annotation - ONLY valid skip
PRIORITY 2 (REQUIRED): CONTENT
→ The comment TEXT or changed TEXT must be preserved
PRIORITY 3 (BEST EFFORT): METADATA
→ Author → Use if available, else "imported-unknown"
→ Date → Use if available, else Date.now()
The Golden Rules
| Scenario | Action | Skip? |
|---|---|---|
| Has location, has author, has date | Import fully | NO |
| Has location, NO author | Import with "imported-unknown" | NO |
| Has location, NO date | Import with Date.now() | NO |
| Has location, NO text (comment) | Import with empty text | NO |
| Tracked change: NO start OR end | Log warning, clean up | YES |
| Comment: NO start and NO end | Log warning, clean up | YES |
| Comment: Has start, NO end | Use start as point comment | NO - infer end |
Special: Comments With Partial Markers (Golden Rule)
A comment needs ANY location marker to be preserved. The Golden Rule:
| Scenario | Action |
|---|---|
| Has start, no end | end = start → point comment |
| Has end, no start | start = end → point comment |
| Has neither | Skip (only valid skip) |
typescript
// Golden rule: if we have ANY location marker, preserve the comment
if (!startTokenRange && !endTokenRange) {
  // Only skip when we have NO markers at all
  if (process.env.NODE_ENV !== "production") {
    console.warn("[DOCX Import] Skipping comment with no location markers:", comment.id);
  }
  continue;
}
// Use whichever marker we have, fallback to the other for point comments
const effectiveStartTokenRange = startTokenRange ?? endTokenRange;
const effectiveEndTokenRange = endTokenRange ?? startTokenRange;
In Code Terms
typescript
// COMMENTS: Always import if we have location
if (!startTokenRange && !endTokenRange) {
  // NO LOCATION (neither marker found) = only valid skip
  console.warn("Skipping - no location:", comment.id);
  continue;
}
// Everything else? IMPORT with defaults
const userId = comment.authorName ?? "imported-unknown";
const date = comment.date ? Date.parse(comment.date) : Date.now();
// ... create discussion
// TRACKED CHANGES: Same principle
if (!changeRange) {
  console.warn("Skipping - no location:", change.id);
  continue;
}
const suggestion = {
  userId: change.author ?? "imported-unknown",
  createdAt: change.date ? Date.parse(change.date) : Date.now(),
  // ... rest
};
What This Means
- Every tracked change MUST be preserved - if we have location
- Every comment MUST be preserved - if we have location
- Authors do NOT need to exist - use names directly, no lookup
- Dates do NOT need to exist - use current timestamp as fallback
- No silent failures - log warnings but STILL import with defaults
- Round-trip fidelity - Import → Export → Import must preserve every comment and tracked change
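The metadata defaults listed above can be collected in one small helper. This is a minimal sketch, not the importer's actual code: the names `RawAnnotationMeta`, `ResolvedMeta`, and `resolveMeta` are hypothetical, and treating a whitespace-only author as missing is an assumption. It also guards against `Date.parse` returning `NaN` for an unparseable date string, which a plain truthiness check on the string would not catch.

```typescript
// Hypothetical shapes for illustration; the real importer's types may differ.
interface RawAnnotationMeta {
  id: string;
  author?: string | null;
  date?: string | null;
}

interface ResolvedMeta {
  userId: string;
  createdAt: number;
  warnings: string[];
}

const FALLBACK_AUTHOR = "imported-unknown";

function resolveMeta(
  raw: RawAnnotationMeta,
  now: () => number = Date.now
): ResolvedMeta {
  const warnings: string[] = [];

  // Author: use the name directly, with no user lookup.
  // Assumption: a whitespace-only name counts as missing.
  const trimmed = raw.author ? raw.author.trim() : "";
  const userId = trimmed !== "" ? trimmed : FALLBACK_AUTHOR;
  if (trimmed === "") {
    warnings.push("[DOCX Import] " + raw.id + ": missing author, using fallback");
  }

  // Date: Date.parse returns NaN for unparseable input, so check before trusting it.
  const parsed = raw.date ? Date.parse(raw.date) : NaN;
  const createdAt = isNaN(parsed) ? now() : parsed;
  if (isNaN(parsed)) {
    warnings.push("[DOCX Import] " + raw.id + ": missing or invalid date, using current time");
  }

  // The annotation is always returned; warnings are for logging only.
  return { userId, createdAt, warnings };
}
```

The caller would log each entry in `warnings` and still import the annotation, which keeps the "no silent failures" rule without ever dropping content.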
CRITICAL: Precision vs. Preservation
┌─────────────────────────────────────────────────────────┐
│ PRESERVATION > PRECISION │
│ │
│ Better to import with imperfect metadata │
│ than lose content for "cleaner" code. │
└─────────────────────────────────────────────────────────┘
Any change that increases risk of losing comments/tracked changes for precision MUST:
- Have mandatory fallback logging:
typescript
if (!meetsStrictCriteria(change)) {
  console.warn("[DOCX Import] Precision check failed, using fallback:", {
    id: change.id,
    reason: "...",
    originalData: change,
  });
  importWithDefaults(change); // STILL IMPORT IT
}
- Be implemented ONLY after careful research:
  - Review DOCX specification for edge cases
  - Test with Word, LibreOffice, AND Google Docs exports
  - Verify NO content is lost in any scenario
  - Document WHY the precision is needed
- Never skip without logging:
typescript
// ❌ WRONG - Silent skip
if (!valid) continue;
// ✅ CORRECT - Fallback with logging
if (!valid) {
  console.warn("[DOCX Import] Using fallback for:", id);
  importWithDefaults(item);
}
Review checklist for precision changes:
- Has fallback that preserves content?
- Logs when fallback is used?
- Tested with malformed documents?
- Is precision necessary or just "nice to have"?
- Could this cause silent data loss?
Architecture Overview
IMPORT FLOW
┌──────────┐ ┌─────────────┐ ┌────────────────┐ ┌─────────────┐
│ .docx │───►│ mammoth.js │───►│ HTML + Tokens │───►│Plate Editor │
│ file │ │ body-reader │ │ [[DOCX_*:...]] │ │ Suggestions │
└──────────┘ │ doc-to-html │ └────────────────┘ │ Comments │
└─────────────┘ └─────────────┘
EXPORT FLOW
┌─────────────┐ ┌────────────────┐ ┌───────────────┐ ┌──────────┐
│Plate Editor │───►│ Serialize to │───►│ docx-export │───►│ .docx │
│ Suggestions │ │ Word-safe HTML │ │ kit │ │ file │
│ Comments │ │ <ins>/<del> │ └───────────────┘ └──────────┘
└─────────────┘ │ Word comments │
└────────────────┘
Token System
mammoth.js emits tokens that import-toolbar-button.tsx parses:
CRITICAL: Token Positioning with findHtmlPath().wrap()
Tokens MUST be positioned inline with text for Plate to find them.
In mammoth.js, element handlers should use findHtmlPath(element, htmlPaths.empty).wrap() to ensure tokens are emitted in the correct position within the document structure:
javascript
// ✅ CORRECT - Token positioned inline with content
commentRangeStart: function (element, messages, options) {
  return findHtmlPath(element, htmlPaths.empty).wrap(function () {
    var token = DOCX_COMMENT_START_TOKEN_PREFIX + payload + DOCX_COMMENT_TOKEN_SUFFIX;
    return [Html.text(token)];
  });
},
// ❌ WRONG - Token may appear outside paragraph structure
commentRangeStart: function (element, messages, options) {
  var token = DOCX_COMMENT_START_TOKEN_PREFIX + payload + DOCX_COMMENT_TOKEN_SUFFIX;
  return [Html.text(token)]; // No wrap = wrong position
},
Why this matters for Plate:
WITHOUT wrap():
<p>Hello</p>[[DOCX_CMT_START:...]]<p>world</p>
└─ Token outside paragraph
└─ After deserialization: token in wrong node or lost
└─ searchRange() fails → comment not imported
WITH wrap():
<p>Hello[[DOCX_CMT_START:...]]world</p>
└─ Token inline with text
└─ After deserialization: token in same text node as content
└─ searchRange() succeeds → comment imported correctly
The Flow:
- mammoth.js emits token [[DOCX_CMT_START:{...}]] inline with text
- cleanDocx() + html.deserialize() creates Plate nodes
- Token text is in the same node as the annotated content
- searchRange() finds the token boundaries
- Comment marks are applied to the correct range
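The last two steps of this flow (find the token boundaries, then apply marks to the range between them) can be illustrated on a plain string. This is a toy sketch under stated assumptions, not the real searchRange(): it works on a single string rather than Plate nodes, the function name `extractCommentRanges` is hypothetical, and the token payload is a bare id instead of URL-encoded JSON. It also applies the point-comment fallback when only one marker is present.

```typescript
interface AnnotatedRange {
  id: string;
  start: number; // offset in the cleaned text
  end: number;   // offset in the cleaned text
}

function extractCommentRanges(text: string): { clean: string; ranges: AnnotatedRange[] } {
  // Toy token format: [[DOCX_CMT_START:<id>]] and [[DOCX_CMT_END:<id>]]
  const tokenRe = /\[\[DOCX_CMT_(START|END):(.*?)\]\]/g;
  const starts: { [id: string]: number } = Object.create(null);
  const ends: { [id: string]: number } = Object.create(null);
  const order: string[] = [];
  let clean = "";
  let last = 0;
  let m: RegExpExecArray | null;

  while ((m = tokenRe.exec(text)) !== null) {
    // Copy the text before the token, then skip the token itself.
    clean += text.slice(last, m.index);
    last = m.index + m[0].length;
    const id = m[2];
    if (starts[id] === undefined && ends[id] === undefined) order.push(id);
    // The boundary is the current length of the cleaned text.
    if (m[1] === "START") starts[id] = clean.length;
    else ends[id] = clean.length;
  }
  clean += text.slice(last);

  const ranges = order.map(function (id): AnnotatedRange {
    // Point-comment fallback: a missing marker takes the position of the other one.
    const start = starts[id] !== undefined ? starts[id] : ends[id];
    const end = ends[id] !== undefined ? ends[id] : starts[id];
    return { id: id, start: start, end: end };
  });
  return { clean: clean, ranges: ranges };
}
```

For the input `Hello [[DOCX_CMT_START:7]]brave[[DOCX_CMT_END:7]] world` the cleaned text is `Hello brave world` and the single range covers `brave`. If a token sat outside the paragraph it annotates, the equivalent lookup inside that paragraph's text node would find nothing, which is the failure mode described above.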
Rule: All token-emitting handlers must use findHtmlPath().wrap()
- commentRangeStart → findHtmlPath(element, htmlPaths.empty).wrap()
- commentRangeEnd → findHtmlPath(element, htmlPaths.empty).wrap()
- inserted → Already wraps children correctly
- deleted → Already wraps children correctly
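| Token | Purpose |
|---|---|
| [[DOCX_INS_START:...]] | Start of insertion (tracked change) |
| [[DOCX_INS_END:...]] | End of insertion |
| [[DOCX_DEL_START:...]] | Start of deletion (tracked change) |
| [[DOCX_DEL_END:...]] | End of deletion |
| [[DOCX_CMT_START:...]] | Start of comment range |
| [[DOCX_CMT_END:...]] | End of comment range |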
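Each of these tokens carries a JSON payload that is URL-encoded between the prefix and the closing `]]`. A minimal sketch of locating tokens in a string and decoding their payloads follows. The names `TokenPayload`, `ParsedToken`, and `parseTokens` are hypothetical, and the sketch assumes the payload was produced with `encodeURIComponent`, so it never contains a literal `]`. A malformed payload does not drop the token: its position is still known, so it is returned with empty metadata.

```typescript
// Every field is optional because metadata is best effort.
interface TokenPayload {
  id?: string;
  author?: string;
  authorName?: string;
  authorInitials?: string;
  date?: string;
  text?: string;
}

interface ParsedToken {
  kind: "INS" | "DEL" | "CMT";
  edge: "START" | "END";
  payload: TokenPayload;
  index: number;  // offset of the token in the source string
  length: number; // length of the full token text
}

function parseTokens(text: string): ParsedToken[] {
  const tokenRe = /\[\[DOCX_(INS|DEL|CMT)_(START|END):(.*?)\]\]/g;
  const tokens: ParsedToken[] = [];
  let m: RegExpExecArray | null;

  while ((m = tokenRe.exec(text)) !== null) {
    let payload: TokenPayload = {};
    try {
      payload = JSON.parse(decodeURIComponent(m[3])) as TokenPayload;
    } catch {
      // Malformed payload: keep the token (location is known), fall back to empty metadata.
    }
    tokens.push({
      kind: m[1] as ParsedToken["kind"],
      edge: m[2] as ParsedToken["edge"],
      payload: payload,
      index: m.index,
      length: m[0].length,
    });
  }
  return tokens;
}
```

Keeping `index` and `length` on each parsed token lets a caller remove the token text later without searching for it again.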
Payload structure (JSON, URL-encoded):
json
{
  "id": "unique-id",
  "author": "Author Name",
  "date": "2024-01-15T10:30:00Z"
}
For comments, additional fields:
json
{
  "id": "0",
  "authorName": "John Doe",
  "authorInitials": "JD",
  "date": "2024-01-15T10:30:00Z",
  "text": "Comment content here"
}
Packages Overview
The codebase uses multiple packages for DOCX handling. Understanding their roles prevents conflicts:
| Package | Location | Purpose | Direction |
|---|---|---|---|
| mammoth.js | | DOCX → HTML conversion | Import |
| html-to-docx | | HTML → DOCX conversion | Export |
| docxjs | | DOCX preview/rendering | Preview |
| @platejs/docx | | Plate DOCX utilities | Both |
mammoth.js (Custom Fork)
Purpose: Convert DOCX to HTML with embedded tokens for tracked changes and comments.
Key modifications:
- lib/docx/body-reader.js - Parses w:ins, w:del, w:commentRangeStart, w:commentRangeEnd
- lib/document-to-html.js - Emits [[DOCX_*:...]] tokens
- lib/documents.js - Document model with inserted, deleted, commentRangeStart types
Does NOT support export - only import.
html-to-docx
Purpose: Convert HTML to DOCX format for export.
Current limitations:
- No <w:ins> / <w:del> generation (tracked changes)
- No comments.xml generation (Word comments)
- Basic HTML → Word conversion only
Future enhancement needed: Add tracked changes and comments support for round-trip fidelity.
docxjs (docx-preview)
Purpose: Render DOCX files for preview in browser.
Key options:
typescript
{
  renderChanges: false, // Can render tracked changes
  renderComments: false, // Can render comments
  breakPages: true,
  // ...
}
Does NOT modify files - read-only preview.
Package Interaction
IMPORT: .docx ──mammoth.js──► HTML+tokens ──Plate──► Editor
EXPORT: Editor ──serialize──► HTML ──html-to-docx──► .docx
PREVIEW: .docx ──docxjs──► DOM (read-only)
No conflicts: Each package has a distinct role. Modifications to one don't affect others.
Key Files
Import Pipeline
- packages/mammoth.js/lib/docx/body-reader.js - Parses DOCX XML elements
- packages/mammoth.js/lib/document-to-html.js - Emits tokens for tracked changes/comments
- src/components/editor/ui/import-toolbar-button.tsx - Parses tokens, creates suggestions/comments
- src/components/editor/utils/searchRanges.ts - Finds token boundaries in editor content
Export Pipeline
- src/registry/components/editor/plugins/docx-export-kit.tsx - DOCX blob generation
- src/components/editor/ui/docx-export-toolbar-button.tsx - Export button UI
- src/components/editor/ui/export-toolbar-button-fixed.tsx - Multi-format export
Plate Plugins
- src/components/editor/plugins/suggestion-kit-app.tsx - Suggestion system
- src/components/editor/plugins/comment-kit-app.tsx - Comment/discussion system
Logging
- src/lib/logger.ts - Unified Logfire logger (prod: Logfire only, dev: Logfire + console)
Implementation Rules
Rule 1: Always Handle Orphan Tokens
typescript
if (!startTokenRange || !endTokenRange) {
  // MUST clean up orphan tokens
  if (startTokenRange) editor.tf.delete({ at: startTokenRange });
  if (endTokenRange) editor.tf.delete({ at: endTokenRange });
  continue; // But don't fail the whole import
}
Rule 2: Never Require User Lookup
typescript
// CORRECT - Use author name directly
const userId = authorName ?? "imported-unknown";
// WRONG - Don't do this
const user = await findUserByEmail(authorEmail);
const userId = user?.id; // ❌ Will fail for external authors
Rule 3: Always Use rangeRef for Node Operations
typescript
// Ranges can become stale after node-splitting operations
const startTokenRef = editor.api.rangeRef(startTokenRange);
const endTokenRef = editor.api.rangeRef(endTokenRange);
// After operations, get current ranges
const currentStart = startTokenRef.current;
const currentEnd = endTokenRef.current;
// Always unref when done
startTokenRef.unref();
endTokenRef.unref();
Rule 4: Check for Null Comments in mammoth.js
javascript
var comment = comments[reference.commentId];
if (!comment) {
  messages.push(results.warning("Comment not found: " + reference.commentId));
  comment = { commentId: reference.commentId, body: [], authorInitials: "" };
}
Rule 5: Always Log with Logfire, Never Crash
Use src/lib/logger.ts, which wraps Logfire with environment-aware console output.
typescript
import { logger } from "@/lib/logger";
// For warnings (e.g., skipped items, fallbacks used)
// Uses logfire.warning() under the hood
logger.warning("[DOCX Import] Failed to parse token", {
  rawPayload,
  error: e,
});
// For errors (e.g., failed operations)
// Uses logfire.error() under the hood
logger.error("[DOCX Import] Failed to create comment", e, {
  commentId: comment.id,
  documentId,
});
// For info (e.g., successful operations)
// Uses logfire.info() under the hood
logger.info("[DOCX Import] Import completed", {
  commentsCreated,
  insertions,
  deletions,
});
Logfire API Reference:
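| Logger Method | Logfire Method | Use Case |
|---|---|---|
| logger.warning() | logfire.warning() | Recoverable issues, fallbacks used |
| logger.error() | logfire.error() | Failed operations, exceptions |
| logger.info() | logfire.info() | Success messages, metrics |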
Logging behavior:
- Production: Logs to Logfire only (no console spam)
- Development: Logs to both Logfire AND console (for debugging)
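The production/development split can be sketched as a small wrapper around two sinks. This is an illustration only and makes no claim about how src/lib/logger.ts is actually written: `LogSink` and `createLogger` are hypothetical names, and the remote sink stands in for the Logfire client rather than calling any real Logfire API. The method signatures mirror the logger calls shown in Rule 5.

```typescript
type Level = "info" | "warning" | "error";
type Attributes = { [key: string]: unknown };

// Hypothetical sink interface; in the real module the remote sink would wrap Logfire.
interface LogSink {
  log(level: Level, message: string, attributes: Attributes): void;
}

function createLogger(remote: LogSink, isProduction: boolean, consoleSink?: LogSink) {
  function emit(level: Level, message: string, attributes: Attributes = {}): void {
    // Production: remote only. Development: remote and console.
    const sinks = !isProduction && consoleSink ? [remote, consoleSink] : [remote];
    for (let i = 0; i < sinks.length; i++) {
      try {
        sinks[i].log(level, message, attributes);
      } catch {
        // A failing sink must never crash the import.
      }
    }
  }

  return {
    info: (message: string, attributes?: Attributes) => emit("info", message, attributes),
    warning: (message: string, attributes?: Attributes) => emit("warning", message, attributes),
    error: (message: string, error: unknown, attributes?: Attributes) =>
      emit("error", message, { ...attributes, error }),
  };
}
```

Passing the sinks in as arguments keeps the wrapper testable: a test can supply recording sinks and assert that production mode never reaches the console sink.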
Plate Mark Structures
Suggestion Marks
typescript
{
  [KEYS.suggestion]: true,
  [getSuggestionKey(id)]: {
    id: string,
    type: "insert" | "remove",
    userId: string,
    createdAt: number
  }
}
Comment Marks
typescript
{
  [KEYS.comment]: true,
  [getCommentKey(discussionId)]: true,
  [getTransientCommentKey()]: true // During creation
}
Common Debugging Scenarios
Tokens Not Being Parsed
- Check mammoth.js output in browser console
- Verify token prefixes match exactly between files
- Check that cleanDocx() isn't stripping tokens
Suggestions Not Appearing
- Verify KEYS.suggestion is set to true
- Check getSuggestionKey(id) contains full object, not just ID
- Ensure type is exactly "insert" or "remove"
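These three checks can be run mechanically against a leaf while debugging. The sketch below is self-contained and therefore uses stand-ins: `SUGGESTION_KEY` and `suggestionKeyFor` are assumed equivalents of KEYS.suggestion and getSuggestionKey(id), and their concrete string values here are an assumption, so swap in the real Plate exports when using it. `checkSuggestionLeaf` is a hypothetical helper name.

```typescript
// Assumed stand-ins for Plate's KEYS.suggestion and getSuggestionKey(id).
const SUGGESTION_KEY = "suggestion";
const suggestionKeyFor = (id: string): string => SUGGESTION_KEY + "_" + id;

interface SuggestionData {
  id: string;
  type: "insert" | "remove";
  userId: string;
  createdAt: number;
}

type Leaf = { text: string; [key: string]: unknown };

// Returns a list of problems; an empty list means the leaf matches the documented structure.
function checkSuggestionLeaf(leaf: Leaf, id: string): string[] {
  const problems: string[] = [];

  if (leaf[SUGGESTION_KEY] !== true) {
    problems.push("suggestion flag is not true");
  }

  const data = leaf[suggestionKeyFor(id)];
  if (typeof data !== "object" || data === null) {
    // A bare ID string or a missing key lands here.
    problems.push("suggestion key does not hold an object");
    return problems;
  }

  const d = data as Partial<SuggestionData>;
  if (d.id !== id) problems.push("id does not match");
  if (d.type !== "insert" && d.type !== "remove") {
    problems.push('type must be exactly "insert" or "remove"');
  }
  if (typeof d.userId !== "string") problems.push("userId is not a string");
  if (typeof d.createdAt !== "number" || isNaN(d.createdAt)) {
    problems.push("createdAt is not a valid number");
  }
  return problems;
}
```

Calling it on each text leaf of an imported document and logging any non-empty result narrows down which of the three conditions is failing.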
Comments Not Saved
- Check createDiscussionWithComment API call
- Verify documentId is passed correctly
- Check TRPC mutation response for errors
Ranges Becoming Stale
- Use rangeRef before any node-modifying operations
- Call unref() after operations complete
- Re-fetch ranges after setNodes with split: true
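Why a stored range goes stale is easiest to see on a flat string. The toy model below is not Plate's rangeRef and shares no code with it: `ToyBuffer`, `spanRef`, and `shiftOffset` are hypothetical names, and offsets into a string stand in for Plate paths and points. It shows a plain offset pair pointing at the wrong text after an earlier deletion, while a tracked reference is adjusted by the same deletion.

```typescript
interface Span { start: number; end: number; }
interface SpanRef { current: Span | null; unref(): void; }

// Moves one offset to account for the removal of [start, end).
function shiftOffset(offset: number, start: number, end: number): number {
  if (offset <= start) return offset;              // before the deletion: unchanged
  if (offset >= end) return offset - (end - start); // after it: shift left
  return start;                                     // inside it: collapse to the start
}

class ToyBuffer {
  text: string;
  private refs: SpanRef[] = [];

  constructor(text: string) {
    this.text = text;
  }

  // Registers a span that is kept up to date across deletions.
  spanRef(span: Span): SpanRef {
    const ref: SpanRef = {
      current: { start: span.start, end: span.end },
      unref: () => {
        const i = this.refs.indexOf(ref);
        if (i >= 0) this.refs.splice(i, 1);
        ref.current = null;
      },
    };
    this.refs.push(ref);
    return ref;
  }

  delete(span: Span): void {
    // Copy the bounds first: the span passed in may itself be a tracked ref's current value.
    const start = span.start;
    const end = span.end;
    this.text = this.text.slice(0, start) + this.text.slice(end);
    for (let i = 0; i < this.refs.length; i++) {
      const cur = this.refs[i].current;
      if (!cur) continue;
      cur.start = shiftOffset(cur.start, start, end);
      cur.end = shiftOffset(cur.end, start, end);
    }
  }
}
```

With the text `[[A]]Hello[[B]]`, deleting the first token makes the original offsets of `[[B]]` (10 to 15) point past the end of the string, while a `spanRef` taken beforehand now reports 5 to 10 and can be deleted safely.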
Testing Checklist
When modifying DOCX import/export:
- Test with Word document containing only insertions
- Test with Word document containing only deletions
- Test with Word document containing mixed tracked changes
- Test with Word document containing single comment
- Test with Word document containing multiple comments
- Test with Word document containing both tracked changes AND comments
- Test with document from different sources (Word, LibreOffice, Google Docs)
- Test round-trip: Import → Make changes → Export → Import again
- Verify no tokens are visible in final editor content
- Verify all authors are attributed correctly
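The "no tokens are visible" item can be automated with a small check over the editor's plain text. This is a minimal sketch: `findLeftoverTokens` and `stripTokens` are hypothetical helper names, the single regular expression merges the six token prefixes used in this document, and how the plain text is obtained from the editor is left to the caller.

```typescript
// One pattern covering INS/DEL/CMT start and end tokens.
const LEFTOVER_TOKEN_RE = /\[\[DOCX_(?:INS|DEL|CMT)_(?:START|END):.*?\]\]/g;

// Returns every token still present in the text (empty array when clean).
function findLeftoverTokens(text: string): string[] {
  const found = text.match(LEFTOVER_TOKEN_RE);
  return found ? found.slice() : [];
}

// Removes all tokens from a plain string; useful for asserting on expected content.
function stripTokens(text: string): string {
  return text.replace(LEFTOVER_TOKEN_RE, "");
}
```

In a test, asserting that `findLeftoverTokens(plainText)` is empty after import catches orphan tokens that the cleanup path missed.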
Detailed References
- Import implementation: See references/import-pipeline.md
- Export implementation: See references/export-pipeline.md
- mammoth.js modifications: See references/mammoth-modifications.md
- Packages overview: See references/packages-overview.md
Emergency Fixes
If tokens appear in editor content:
typescript
// Force cleanup of all remaining tokens
const tokenPatterns = [
  /\[\[DOCX_INS_START:.*?\]\]/g,
  /\[\[DOCX_INS_END:.*?\]\]/g,
  /\[\[DOCX_DEL_START:.*?\]\]/g,
  /\[\[DOCX_DEL_END:.*?\]\]/g,
  /\[\[DOCX_CMT_START:.*?\]\]/g,
  /\[\[DOCX_CMT_END:.*?\]\]/g,
];
// Search and delete each match
If suggestions aren't displaying:
- Check suggestion plugin is configured correctly
- Verify SuggestionLeaf is rendering