Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.
npx skill4agent add ypares/agent-skills read-bin-docs

from pypdf import PdfReader
reader = PdfReader("document.pdf")
# Join pages with newlines so text from adjacent pages does not run together
text = "\n".join(page.extract_text() for page in reader.pages)
print(text)

uvx --with pypdf python /path/to/extract_pdf_text.py document.pdf

from pypdf import PdfReader
# Read all pages
reader = PdfReader("file.pdf")
for page in reader.pages:
    text = page.extract_text()
    print(text)

from pypdf import PdfReader
reader = PdfReader("file.pdf")
# Get the first five pages (slice indices 0-4, i.e. pages 1-5)
for page in reader.pages[0:5]:
    print(page.extract_text())

scripts/extract_pdf_text.py

# Extract all pages to stdout
python extract_pdf_text.py document.pdf
# Extract to file
python extract_pdf_text.py document.pdf --output text.txt
# Extract specific pages
python extract_pdf_text.py document.pdf --pages 1-5
python extract_pdf_text.py document.pdf --pages 1,3,5

uvx --with pypdf python <script>

page.extract_text(extraction_mode="layout")
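
The `--pages 1-5` and `--pages 1,3,5` forms above imply that the script converts a page specification into page indices. The source does not show how `extract_pdf_text.py` does this, so the following is only a minimal sketch: `parse_page_spec` is a hypothetical helper, and it assumes that page numbers in the spec are 1-based and that ranges are inclusive.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Turn a spec such as "1-5" or "1,3,5" into sorted 0-based page indices.

    Assumption (not confirmed by the source): page numbers are 1-based
    and ranges include both endpoints.
    """
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start or end > page_count:
            raise ValueError(
                f"invalid page range {part!r} for a {page_count}-page document"
            )
        # Convert the inclusive 1-based range to 0-based indices.
        indices.update(range(start - 1, end))
    return sorted(indices)


print(parse_page_spec("1-5", 10))    # [0, 1, 2, 3, 4]
print(parse_page_spec("1,3,5", 10))  # [0, 2, 4]
```

The resulting indices could then be used with pypdf as `reader.pages[i].extract_text()` for each `i`, with `page_count` taken from `len(reader.pages)`.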