This skill should be used when the user needs to convert documents between formats (Office to PDF, PDF to images, image to PDF), perform PDF operations (merge, split, rotate, encrypt, decrypt), or run OCR on scanned documents. Uses local free tools — LibreOffice, ghostscript, pdftk, tesseract, and imagemagick — with no API key required. Trigger when the user says "convert this document", "export to PDF", "merge PDFs", "split PDF", "rotate PDF", "OCR this scan", "convert PPTX to PDF", "convert DOCX to PDF", or any document format conversion request.
npx skill4agent add ericgandrade/claude-superskills document-converter
docling-converter
# Detect OS
uname -s # Darwin = macOS, Linux = Linux; on Windows use 'ver' or check $OS
# Check each tool
libreoffice --version 2>/dev/null || echo "NOT FOUND"
gs --version 2>/dev/null || echo "NOT FOUND"
pdftk --version 2>/dev/null || echo "NOT FOUND"
tesseract --version 2>/dev/null || echo "NOT FOUND"
convert -version 2>/dev/null || echo "NOT FOUND" # ImageMagick

| Tool | macOS | Linux (apt) | Windows |
|---|---|---|---|
| LibreOffice | | | |
| Ghostscript | | | |
| pdftk | | | |
| Tesseract | | | |
| ImageMagick | | | |
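
The individual checks above can also be folded into one loop. The following is a minimal sketch, not part of the original skill: `check_tools` is a hypothetical helper name, and the extra binary names `soffice` and `magick` are assumptions (LibreOffice is often installed as `soffice`, and ImageMagick 7 ships `magick` as its main command).

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not part of the original skill.
# It prints one "name: found" or "name: NOT FOUND" line per command name.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: NOT FOUND"
        fi
    done
}

# Assumed binary names: adjust to match the local installation.
check_tools libreoffice soffice gs pdftk tesseract convert magick
```

Using `command -v` rather than running each tool with a version flag avoids depending on how each tool spells that flag.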
# Convert single file to PDF in the same directory
libreoffice --headless --convert-to pdf "/path/to/file.pptx" --outdir "/path/to/output/"
# Convert to HTML
libreoffice --headless --convert-to html "/path/to/file.docx" --outdir "/path/to/output/"
# Batch: convert all PPTX in a directory
libreoffice --headless --convert-to pdf /path/to/folder/*.pptx --outdir /path/to/output/
--outdir, libreoffice, --norestore
# PDF pages → PNG (one file per page)
convert -density 150 "/path/to/doc.pdf" "/path/to/output/page_%03d.png"
# PDF pages → JPG with quality control
convert -density 150 "/path/to/doc.pdf" -quality 85 "/path/to/output/page_%03d.jpg"
# Single image → PDF
convert "/path/to/image.png" "/path/to/output/document.pdf"
# Multiple images → single PDF
convert img1.png img2.jpg img3.tiff "/path/to/output/combined.pdf"
/etc/ImageMagick-*/policy.xml
pdftk, ghostscript
# pdftk (preferred)
pdftk file1.pdf file2.pdf file3.pdf cat output merged.pdf
# ghostscript (fallback)
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf file1.pdf file2.pdf file3.pdf
# pdftk: extract pages 1-3 and 5
pdftk input.pdf cat 1-3 5 output extracted.pdf
# ghostscript — extract pages 2 to 5
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dFirstPage=2 -dLastPage=5 -sOutputFile=extracted.pdf input.pdf
# pdftk: rotate all pages 90° clockwise
pdftk input.pdf rotate 1-endeast output rotated.pdf
# pdftk — rotate specific page (page 3) 180°
pdftk input.pdf rotate 3south output rotated.pdf
pdftk input.pdf output secured.pdf user_pw "userpass" owner_pw "ownerpass"
pdftk secured.pdf input_pw "password" output decrypted.pdf
# OCR a single image → searchable text file
tesseract "/path/to/scan.png" "/path/to/output/result"
# Output: result.txt
# OCR with language specification (Portuguese example)
tesseract "/path/to/scan.png" output -l por
# OCR → searchable PDF (requires tesseract with pdf support)
tesseract "/path/to/scan.png" output -l eng pdf
# Output: output.pdf
# OCR a scanned PDF: rasterize first, then OCR
convert -density 300 "scan.pdf" "scan_page_%03d.tiff"
tesseract "scan_page_000.tiff" output -l eng pdf
# Rasterize all pages
convert -density 300 "scan.pdf" "page_%03d.tiff"
# OCR each page and merge results
for f in page_*.tiff; do
tesseract "$f" "${f%.tiff}" -l eng pdf
done
pdftk page_*.pdf cat output final_ocr.pdf

| Task | Primary tool | Fallback |
|---|---|---|
| Office → PDF | LibreOffice | None (LibreOffice is the standard) |
| Office → HTML | LibreOffice | None |
| PDF → images | ImageMagick | ghostscript |
| Images → PDF | ImageMagick | ghostscript |
| Merge PDFs | pdftk | ghostscript |
| Split PDF | pdftk | ghostscript |
| Rotate PDF | pdftk | ghostscript |
| Encrypt PDF | pdftk | None |
| Decrypt PDF | pdftk | None |
| OCR | tesseract | None |
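
The primary/fallback split in the table above can be expressed as a small wrapper. This is a sketch for the merge case only, reusing the two merge commands shown earlier; `merge_pdfs` is a hypothetical function name, and the return codes are an arbitrary choice for illustration.

```shell
#!/bin/sh
# merge_pdfs is a hypothetical helper, not part of the original skill.
# Usage: merge_pdfs OUTPUT INPUT1 INPUT2 [...]
# Tries pdftk first and falls back to ghostscript if pdftk is missing.
merge_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_pdfs OUTPUT INPUT1 INPUT2 [...]" >&2
        return 2
    fi
    merge_out=$1
    shift   # the remaining arguments are the input files, in merge order
    if command -v pdftk >/dev/null 2>&1; then
        pdftk "$@" cat output "$merge_out"
    elif command -v gs >/dev/null 2>&1; then
        gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
           -sOutputFile="$merge_out" "$@"
    else
        echo "merge_pdfs: neither pdftk nor gs found" >&2
        return 127
    fi
}
```

A call such as `merge_pdfs merged.pdf file1.pdf file2.pdf` would then work on machines that have only one of the two tools installed.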

| Tool | Best for |
|---|---|
| | Office → PDF, PDF operations (merge/split/rotate/encrypt), OCR, image ↔ PDF |
| | PDF/Office → structured Markdown or JSON; layout-aware content extraction |
| | Markdown ↔ HTML ↔ LaTeX ↔ DOCX; lightweight text format conversions |
| | Translating PowerPoint files between languages |
libreoffice --headless --convert-to pdf presentation.pptx --outdir ./output/
pdftk report.pdf appendix.pdf cover.pdf cat output final.pdf
tesseract scan.png resultado -l por
convert -density 150 document.pdf page_%03d.png
pdftk sensitive.pdf output protected.pdf user_pw "secret123"
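
The Office → PDF example above can be wrapped so that a missing input or a silently failed conversion is reported. This is a minimal sketch under assumptions: `office_to_pdf` is a hypothetical name, the binary is assumed to be on PATH as `libreoffice`, and the output file is assumed to take the input's base name with a `.pdf` extension.

```shell
#!/bin/sh
# office_to_pdf is a hypothetical helper, not part of the original skill.
# Usage: office_to_pdf INPUT OUTDIR
# Prints the path of the generated PDF on success.
office_to_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: office_to_pdf INPUT OUTDIR" >&2
        return 2
    fi
    src=$1
    outdir=$2
    if [ ! -f "$src" ]; then
        echo "office_to_pdf: input not found: $src" >&2
        return 1
    fi
    mkdir -p "$outdir" || return 1
    libreoffice --headless --convert-to pdf "$src" --outdir "$outdir" || return 1
    # Assumption: the output keeps the input's base name, with .pdf appended.
    base=$(basename "$src")
    expected="$outdir/${base%.*}.pdf"
    if [ -f "$expected" ]; then
        echo "$expected"
    else
        echo "office_to_pdf: expected output missing: $expected" >&2
        return 1
    fi
}
```

Checking for the output file, rather than trusting the exit status alone, is a defensive choice: the exit status may not reflect a failed conversion in every setup.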