Extract text/tables from PDFs, create formatted PDFs, merge/split/rotate, handle forms and metadata. Supports pdf-lib/pdfkit (Node.js) and pypdf/pdfplumber/ReportLab (Python).
npx skill4agent add vasilyu1983/ai-agents-public document-pdf

references/pdf-extraction-patterns.md
scripts/

| Task | Tool/Library | Language | When to Use |
|---|---|---|---|
| Create PDF | pdfkit | Node.js | Reports, invoices, certificates |
| Create PDF | ReportLab | Python | Complex layouts, tables |
| Create PDF | FPDF2 | Python | Simple PDFs with Unicode support |
| Create PDF | Borb | Python | Interactive elements, pure Python |
| Edit PDF | pdf-lib | Node.js | Modify existing PDFs, add pages |
| Extract text | pdfplumber | Python | Text-based PDFs with selectable text (no OCR needed) |
| OCR scanned PDF | PyMuPDF + Tesseract | Python | Scanned PDFs (no selectable text) |
| Extract tables | Camelot | Python | Tables with borders (Lattice mode) |
| Extract tables | Camelot/tabula-py | Python | Tables without borders (Stream mode) |
| Parse/merge/split/rotate | pypdf | Python | Deterministic PDF manipulation |
| Fill forms | pdf-lib | Node.js | Form automation |
| HTML to PDF | Puppeteer/Playwright | Node.js | High-fidelity web page rendering |
| HTML to PDF | WeasyPrint | Python | CSS3-based, no browser needed |
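
As a rough illustration of the pypdf row in the table above, the following sketch concatenates several PDFs and optionally rotates every page. It is not the code of the skill's own scripts; the function names `normalize_rotation` and `merge_pdfs` are hypothetical, and it assumes a recent pypdf release is installed (`pip install pypdf`).

```python
from typing import Iterable


def normalize_rotation(degrees: int) -> int:
    """Map a rotation onto 0/90/180/270; pypdf rotates pages in 90-degree steps."""
    if degrees % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {degrees}")
    return degrees % 360


def merge_pdfs(output: str, inputs: Iterable[str], degrees: int = 0) -> int:
    """Concatenate inputs in order, optionally rotating every page clockwise.

    Returns the number of pages written. Hypothetical helper, not the
    implementation behind scripts/merge_pdfs.py.
    """
    # Imported lazily so the pure helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    angle = normalize_rotation(degrees)
    writer = PdfWriter()
    for path in inputs:
        for page in PdfReader(path).pages:
            added = writer.add_page(page)  # returns the page as stored in the writer
            if angle:
                added.rotate(angle)
    with open(output, "wb") as fh:
        writer.write(fh)
    return len(writer.pages)
```

A call such as `merge_pdfs("merged.pdf", ["a.pdf", "b.pdf"])` would write the pages of `a.pdf` followed by those of `b.pdf`. Validating the angle up front keeps a bad `degrees` value from producing a half-written output file.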

pdfkit
ReportLab

assets/invoice-template.md
assets/report-template.md
references/pdf-generation-patterns.md
references/pdf-extraction-patterns.md
assets/pdf-release-checklist.md

python3 scripts/merge_pdfs.py merged.pdf a.pdf b.pdf
python3 scripts/split_pdf.py in.pdf out_dir --each-page
python3 scripts/rotate_pdf.py in.pdf out.pdf --degrees 90
python3 scripts/scrub_metadata.py in.pdf out.pdf

INVOICE STRUCTURE
├── Header (logo, company info, invoice #)
├── Bill To / Ship To blocks
├── Line items table
│ ├── Description | Qty | Unit Price | Total
│ └── Subtotal, Tax, Total
├── Payment terms
└── Footer (contact, thank you)

REPORT PDF STRUCTURE
├── Cover page (title, author, date)
├── Table of contents
├── Body sections with page numbers
├── Charts/images with captions
├── Appendices
└── Running header/footer

PDF Task: [What do you need?]
├─ Create new PDF?
│ ├─ Simple text/tables → pdfkit (Node) or ReportLab (Python)
│ ├─ Complex layouts → ReportLab with Platypus
│ └─ From HTML → Puppeteer/Playwright (Node) or WeasyPrint (Python)
│
├─ Extract from PDF?
│ ├─ Text only → pdfplumber (Python)
│ ├─ Tables → pdfplumber or camelot (Python)
│ └─ Images → PyMuPDF/fitz (Python)
│
├─ Modify existing PDF?
│ ├─ Add text/images → pdf-lib (Node)
│ ├─ Merge/split → pypdf or pdf-lib
│ └─ Fill forms → pdf-lib
│
└─ Batch processing?
   └─ pypdf + pdfplumber pipeline

assets/pdf-release-checklist.md
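
The "Batch processing" branch of the decision tree could look roughly like the sketch below: pypdf reads the page count, pdfplumber extracts the text, and files that yield almost no text are flagged as probably scanned. The names `looks_scanned` and `extract_folder`, the result layout, and the 20-character threshold are all assumptions made for illustration; both libraries must be installed (`pip install pypdf pdfplumber`).

```python
from pathlib import Path
from typing import Iterable, Optional


def looks_scanned(page_texts: Iterable[Optional[str]], min_chars: int = 20) -> bool:
    """Guess that a PDF is scanned when its pages yield almost no text.

    min_chars is an arbitrary illustrative threshold, not a library default.
    """
    total = sum(len((text or "").strip()) for text in page_texts)
    return total < min_chars


def extract_folder(folder: str) -> dict:
    """Return {filename: {"pages", "text", "needs_ocr"}} for each PDF in folder."""
    # Lazy imports: both packages are third-party.
    import pdfplumber
    from pypdf import PdfReader

    results = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        # pypdf: structural information such as the page count
        page_count = len(PdfReader(str(path)).pages)
        # pdfplumber: per-page text; extract_text() may return None or ""
        with pdfplumber.open(str(path)) as pdf:
            texts = [page.extract_text() for page in pdf.pages]
        results[path.name] = {
            "pages": page_count,
            "text": "\n".join(t or "" for t in texts),
            "needs_ocr": looks_scanned(texts),
        }
    return results
```

Files flagged with `needs_ocr` are candidates for the OCR route (PyMuPDF + Tesseract) rather than a second pdfplumber pass.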