Comprehensive HWPX (Korean Hancom Office) document creation, editing, and analysis. When Claude needs to work with Korean word processor documents (.hwpx files) for: (1) Reading and extracting content, (2) Creating new documents, (3) Modifying or editing content, (4) Extracting tables to CSV, (5) Modifying tables or table cells, or any other HWPX document tasks. MANDATORY TRIGGERS: hwpx, hwp, 한글, 한컴, Hancom, Korean document
```bash
npx skill4agent add iamseungpil/claude-for-dslab hwpx
```

| Task | Approach |
|---|---|
| Read/analyze content | `npx hwpxjs txt` / `HwpxReader`, or unpack for raw XML |
| Create new document | Use `HwpxWriter` from `@ssabrojs/hwpxjs` |
| Edit existing document | Unpack → edit XML → repack (see the editing workflow below) |
To convert a legacy `.hwp` file to `.hwpx`:

```bash
# Using hwpxjs CLI (pure TypeScript, no external dependencies)
npx hwpxjs convert:hwp document.hwp output.hwpx
# Or using LibreOffice as fallback
python scripts/office/soffice.py --headless --convert-to hwpx document.hwp
```

To read content from the command line:

```bash
# Text extraction via CLI
npx hwpxjs txt document.hwpx
# HTML conversion (includes images/styles)
npx hwpxjs html document.hwpx > output.html
```
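```bash
# Raw XML access
python scripts/unpack.py document.hwpx unpacked/
```

To check the rendered layout visually, convert the document to PDF with LibreOffice:

```bash
python scripts/office/soffice.py --headless --convert-to pdf document.hwpx
```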
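Then rasterize the PDF pages to JPEG images at 150 dpi, with file names prefixed `page`:

```bash
pdftoppm -jpeg -r 150 document.pdf page
```

Install hwpxjs to create and read documents from Node.js:

```bash
npm install @ssabrojs/hwpxjs
```

Creating a document:

```javascript
const { HwpxWriter, HwpxReader } = require("@ssabrojs/hwpxjs");
const fs = require("fs");

async function main() {
  // Create document from plain text
  // (sample text: "Document title", "This is the first paragraph.", "This is the second paragraph.")
  const writer = new HwpxWriter();
  const content = `문서 제목
첫 번째 문단입니다.
두 번째 문단입니다.`;
  const buffer = await writer.createFromPlainText(content);
  fs.writeFileSync("output.hwpx", buffer);
}

main().catch(console.error);
```

Reading a document:

```javascript
const { HwpxReader } = require("@ssabrojs/hwpxjs");
const fs = require("fs");

async function main() {
  const reader = new HwpxReader();
  const fileBuffer = fs.readFileSync("document.hwpx");
  // Copy out exactly this file's bytes: a Node Buffer may be a view
  // into a larger, shared ArrayBuffer
  const arrayBuffer = fileBuffer.buffer.slice(
    fileBuffer.byteOffset,
    fileBuffer.byteOffset + fileBuffer.byteLength
  );
  await reader.loadFromArrayBuffer(arrayBuffer);
  // Extract text
  const text = await reader.extractText();
  console.log(text);
  // Get document info
  const info = await reader.getDocumentInfo();
  console.log(info);
  // List images
  const images = await reader.listImages();
  console.log(images);
}

main().catch(console.error);
```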
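`listImages()` returns entries such as `[{ binPath: "BinData/0.jpg", width: 200, height: 150, format: "jpg" }]`.

HTML conversion:

```javascript
// Inside an async function, with `reader` loaded as shown above
// Basic HTML conversion
const html = await reader.extractHtml();
// With all options
const fullHtml = await reader.extractHtml({
  paragraphTag: "p",
  tableClassName: "hwpx-table",
  renderImages: true, // Include images
  renderTables: true, // Include tables
  renderStyles: true, // Apply styles (bold, italic, color)
  embedImages: true, // Base64 embed images
  tableHeaderFirstRow: true, // First row as <th>
});
```

Converting HWP from Node.js:

```javascript
const { HwpConverter } = require("@ssabrojs/hwpxjs");

async function convert() {
  const converter = new HwpConverter({ verbose: true });
  // Check availability
  if (converter.isAvailable()) {
    // Convert HWP to HWPX
    const result = await converter.convertHwpToHwpx("input.hwp", "output.hwpx");
    if (result.success) {
      console.log(`Converted: ${result.processingTime}ms`);
    }
    // Or extract text only
    const text = await converter.convertHwpToText("input.hwp");
    console.log(text);
  }
}

convert().catch(console.error);
```

Templates:

```javascript
// hwpxjs supports {{key}} template replacement
const fs = require("fs");
const { HwpxReader } = require("@ssabrojs/hwpxjs");

async function renderTemplate() {
  const file = fs.readFileSync("template.hwpx"); // placeholder path
  const templateBuffer = file.buffer.slice(file.byteOffset, file.byteOffset + file.byteLength);
  const reader = new HwpxReader();
  await reader.loadFromArrayBuffer(templateBuffer);
  // Apply template replacements (to the HTML output; the .hwpx file is not modified)
  const html = await reader.extractHtml();
  return html
    .replace(/\{\{name\}\}/g, "홍길동") // sample Korean name
    .replace(/\{\{date\}\}/g, "2025-01-01");
}
```

Notes:

- Save generated documents with `fs.writeFileSync(path, buffer)`.
- Pass `loadFromArrayBuffer` an `ArrayBuffer` derived from `fileBuffer.buffer`, not the Node `Buffer` (`fileBuffer`) itself.

To edit an existing document, unpack it with `python scripts/unpack.py document.hwpx unpacked/`, then edit the XML files in `unpacked/Contents/`. When you change a paragraph's text, delete that paragraph's `<hp:linesegarray>` layout cache:

```xml
<!-- BEFORE: paragraph with stale layout cache -->
<hp:p id="0" paraPrIDRef="0" styleIDRef="0">
  <hp:run charPrIDRef="19">
    <hp:t>Original text</hp:t>
  </hp:run>
  <hp:linesegarray>
    <hp:lineseg textpos="0" vertpos="0" vertsize="1000" horzsize="5000" .../>
  </hp:linesegarray>
</hp:p>
<!-- AFTER: remove linesegarray entirely -->
<hp:p id="0" paraPrIDRef="0" styleIDRef="0">
  <hp:run charPrIDRef="19">
    <hp:t>New longer text that exceeds original width</hp:t>
  </hp:run>
</hp:p>
```

Edit the text inside the existing `<hp:run>` elements and remove only the `<hp:linesegarray>`. Then repack with `python scripts/pack.py unpacked/ output.hwpx`.

Tips:

- `<hp:linesegarray>` is a child of `<hp:p>` and follows its `<hp:run>` elements. Remove it from every paragraph you edit.
- Table cells: `<hp:cellAddr>` comes after the cell content, so find the cell by its `<hp:cellAddr>` and read the lines before it: `grep -B20 'colAddr="2" rowAddr="0"' section0.xml`
- Character formatting of a run is referenced through `charPrIDRef`.
- Sizes such as `width="180000"` or `width="3400"` are in HWP units. To convert from millimetres: `mm × (7200 ÷ 25.4) = HWP units`.

| Element | Purpose |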
|---|---|
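| `<hp:p>` | Paragraph |
| `<hp:run>` | Text run with formatting |
| `<hp:t>` | Text content |
| `<hp:tbl>` | Table |
| `<hp:tc>` | Table cell |
| `<hp:cellAddr>` | Cell position (AFTER content) |
| `<hp:pic>` | Image |
| `<hp:linesegarray>` | Layout cache (remove when editing) |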
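Deleting `<hp:linesegarray>` by hand gets tedious in a long section file. Below is a minimal sketch of a helper called `stripLineSegArrays`. This name is hypothetical: the helper is not part of hwpxjs or the bundled scripts. It removes every layout cache from a section's XML string using regular expressions. It assumes the element is never nested and appears either self-closing or as an open/close pair; an XML parser would be more robust.

```javascript
const fs = require("fs");

// Hypothetical helper: remove every <hp:linesegarray> layout cache from an
// HWPX section XML string. Regex-based; assumes the element is never nested.
function stripLineSegArrays(xml) {
  return xml
    // Self-closing form first, so the paired pattern below cannot start on it
    .replace(/<hp:linesegarray\b[^>]*\/>/g, "")
    // Paired form: opening tag, lazily up to the nearest closing tag
    .replace(/<hp:linesegarray\b[^>]*>[\s\S]*?<\/hp:linesegarray>/g, "");
}

// Hypothetical convenience wrapper: rewrite one unpacked section file in place
function stripFile(path) {
  const xml = fs.readFileSync(path, "utf8");
  fs.writeFileSync(path, stripLineSegArrays(xml), "utf8");
}
```

This strips every cache in the file, which is broader than the per-paragraph rule above. It is useful after editing many paragraphs, e.g. `stripFile("unpacked/Contents/section0.xml")` before repacking. That path assumes the unpack layout described earlier.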
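Because `<hp:cellAddr>` comes after the cell content, looking up a cell in code means scanning each `<hp:tc>` and checking its address. Below is a minimal regex-based sketch using a hypothetical `getCellText`. It makes these assumptions:

- Tables are not nested.
- `colAddr` is written before `rowAddr`, as in the cell example.
- Each `<hp:t>` holds plain text.

Entities such as `&amp;` are returned undecoded.

```javascript
// Hypothetical helper: return the concatenated <hp:t> text of the table cell
// whose <hp:cellAddr> matches (col, row), or null if no such cell exists.
// Regex-based; assumes no nested tables and plain-text <hp:t> elements.
function getCellText(sectionXml, col, row) {
  const cellRe = /<hp:tc\b[^>]*>([\s\S]*?)<\/hp:tc>/g;
  for (const match of sectionXml.matchAll(cellRe)) {
    const body = match[1];
    // The address element sits after the cell's subList content
    const addr = body.match(/<hp:cellAddr\b[^>]*colAddr="(\d+)"[^>]*rowAddr="(\d+)"/);
    if (addr && Number(addr[1]) === col && Number(addr[2]) === row) {
      // Join every text run in the cell, in document order
      const texts = [...body.matchAll(/<hp:t>([^<]*)<\/hp:t>/g)].map((m) => m[1]);
      return texts.join("");
    }
  }
  return null;
}
```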
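The millimetre conversion `mm × (7200 ÷ 25.4) = HWP units` follows from 1 inch = 7200 HWP units and 25.4 mm per inch. Below is a small sketch of both directions. Rounding to whole units is an assumption, based on the integer size attributes in the examples.

```javascript
// 1 inch = 7200 HWP units (as used in the image example); 1 inch = 25.4 mm
const HWP_UNITS_PER_INCH = 7200;
const MM_PER_INCH = 25.4;

// Millimetres to HWP units, rounded to a whole number for XML attributes
function mmToHwpUnits(mm) {
  return Math.round((mm * HWP_UNITS_PER_INCH) / MM_PER_INCH);
}

// HWP units back to millimetres (not rounded)
function hwpUnitsToMm(units) {
  return (units * MM_PER_INCH) / HWP_UNITS_PER_INCH;
}
```

For example, `mmToHwpUnits(10)` gives 2835, and `mmToHwpUnits(210)` (the A4 page width) gives 59528.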
Paragraph structure:

```xml
<hp:p id="0" paraPrIDRef="0" styleIDRef="0" pageBreak="0">
  <hp:run charPrIDRef="0">
    <hp:t>Text content</hp:t>
  </hp:run>
  <hp:linesegarray> <!-- Remove this when editing text -->
    <hp:lineseg textpos="0" vertpos="0" vertsize="1000" .../>
  </hp:linesegarray>
</hp:p>
```

Table cell structure (`<hp:cellAddr>` follows the content):

```xml
<hp:tc borderFillIDRef="5">
  <hp:subList textDirection="HORIZONTAL" vertAlign="CENTER">
    <hp:p paraPrIDRef="20">
      <hp:run charPrIDRef="19">
        <hp:t>Cell content</hp:t>
      </hp:run>
    </hp:p>
  </hp:subList>
  <hp:cellAddr colAddr="0" rowAddr="0"/> <!-- Position identifier -->
  <hp:cellSpan colSpan="1" rowSpan="1"/>
  <hp:cellSz width="5136" height="4179"/>
</hp:tc>
```

Images: place `<hp:pic>` inside an `<hp:run>` and follow it with an empty `<hp:t/>`. Store the image file under `BinData/` and register it in `Contents/content.hpf`:

```xml
<opf:item id="image1" href="BinData/image1.png" media-type="image/png" isEmbeded="1"/>
```

The picture then refers to that item through `binaryItemIDRef`:

```xml
<hp:p id="0" paraPrIDRef="38" styleIDRef="41">
  <hp:run charPrIDRef="0">
    <hp:pic id="12345" zOrder="0" numberingType="PICTURE" textWrap="TOP_AND_BOTTOM">
      <hp:orgSz width="7200" height="7200"/> <!-- 1 inch = 7200 HWP units -->
      <hp:curSz width="3600" height="3600"/> <!-- Display: 0.5 inch -->
      <hc:img binaryItemIDRef="image1" bright="0" contrast="0" effect="REAL_PIC" alpha="0"/>
      <hp:sz width="3600" widthRelTo="ABSOLUTE" height="3600" heightRelTo="ABSOLUTE"/>
      <hp:pos treatAsChar="1" horzRelTo="COLUMN" horzAlign="CENTER" vertRelTo="PARA" vertAlign="TOP"/>
    </hp:pic>
    <hp:t/> <!-- REQUIRED: empty text element after hp:pic -->
  </hp:run>
</hp:p>
```

Page breaks:

```xml
<hp:p pageBreak="1" ...> <!-- pageBreak="1" inserts break before paragraph -->
```

HWPX compared with DOCX:

| Aspect | HWPX | DOCX |
|---|---|---|
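| Text element | `<hp:t>` | `<w:t>` |
| Paragraph | `<hp:p>` | `<w:p>` |
| Run | `<hp:run>` | `<w:r>` |
| Layout cache | `<hp:linesegarray>` | None |
| Content location | `Contents/section0.xml` | `word/document.xml` |
| Cell identifier | `<hp:cellAddr>` | implicit order |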
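When adding an image, the new file has to be listed in `Contents/content.hpf` as well as referenced by `binaryItemIDRef` in `<hp:pic>`. Below is a minimal string-based sketch using a hypothetical `addManifestItem`. It inserts an `<opf:item>` with the same attributes as the manifest example above. It assumes the manifest element is closed by `</opf:manifest>`; check your own `content.hpf` before relying on that.

```javascript
// Hypothetical helper: register a BinData file in the content.hpf manifest.
// Assumes the manifest is closed by "</opf:manifest>".
function addManifestItem(hpfXml, { id, href, mediaType }) {
  // Refuse duplicate ids, since binaryItemIDRef must resolve to one item
  if (hpfXml.includes(` id="${id}"`)) {
    throw new Error(`manifest already has an item with id="${id}"`);
  }
  const closing = "</opf:manifest>";
  const at = hpfXml.indexOf(closing);
  if (at === -1) {
    throw new Error("no </opf:manifest> found");
  }
  // Same attribute layout as the example item, including the "isEmbeded" spelling
  const item = `<opf:item id="${id}" href="${href}" media-type="${mediaType}" isEmbeded="1"/>`;
  return hpfXml.slice(0, at) + item + hpfXml.slice(at);
}
```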
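The `{{key}}` template example earlier fills placeholders in extracted HTML, so it never produces a new `.hwpx`. Below is a sketch of an XML-level alternative, applied to an unpacked section file before repacking, using a hypothetical `fillPlaceholders`. It assumes each placeholder lies entirely inside one `<hp:t>`, which may not hold because text is split into separate runs wherever formatting changes. Values are escaped so that characters such as `&` and `<` keep the XML valid.

```javascript
// Escape characters that are special in XML text and attribute values
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: replace {{key}} placeholders in a section XML string.
// Keys without a value are left untouched.
function fillPlaceholders(sectionXml, values) {
  return sectionXml.replace(/\{\{(\w+)\}\}/g, (whole, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? escapeXml(values[key]) : whole
  );
}
```

The replaced text usually changes length, so also remove the affected paragraphs' `<hp:linesegarray>` as described above.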
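Dependencies:

- hwpxjs: `npm install @ssabrojs/hwpxjs`
- LibreOffice, driven through `scripts/office/soffice.py` (HWP fallback conversion, PDF rendering)
- `pdftoppm` from Poppler (PDF to page images)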