Search Results: text-extraction

Found 54 Skills

ebook-extractor

Use when user wants to extract text from ebooks (EPUB, MOBI, PDF). Use for converting ebooks to plain text for analysis, processing, or reading. Handles all common ebook formats.

🇺🇸 English | Translated

5 scripts/Attention

AI & Machine Learning | pchalasani/claude-code-to...

recover-context

Extract full context of the last task from the most recent parent session shown in the session lineage. Strategically uses sub-agents to avoid bloating your own context.

🇺🇸 English | Translated

Document Processing | team-commonly/commonly

pdf

Manipulate PDF files — extract text, count pages, render thumbnails, merge or split documents. Use for PDF-specific operations that don't fit `markdown-converter` (general read) or `pandic-office` (write from markdown).

🇺🇸 English | Translated

Data Processing | tiangong-ai/skills

eceee-news-fulltext-fetch

Discover article URLs from https://www.eceee.org/all-news/ and extract/persist full article text into SQLite with retry-safe incremental sync. Use when building or maintaining an eceee news fulltext corpus for downstream search, indexing, or summarization.

🇺🇸 English | Translated

1 scripts/Checked
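
The eceee-news-fulltext-fetch description above mentions a retry-safe incremental sync into SQLite. The following is a minimal sketch of that general pattern, not the skill's actual code: the table layout, the function names `init_db` and `sync_articles`, and the `fetch` callable are all assumptions made for illustration.

```python
import sqlite3

def init_db(conn):
    # Hypothetical schema: one row per article, keyed by URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.commit()

def sync_articles(conn, urls, fetch):
    """Store only URLs not yet in the table; return (stored, failed) lists.

    `fetch` is a caller-supplied function (assumed here) that takes a URL
    and returns the article text, raising an exception on failure.
    """
    stored, failed = [], []
    for url in urls:
        # Incremental: skip anything a previous run already persisted.
        if conn.execute("SELECT 1 FROM articles WHERE url = ?", (url,)).fetchone():
            continue
        try:
            body = fetch(url)
        except Exception:
            # Retry-safe: a failure leaves no row, so the next run tries again.
            failed.append(url)
            continue
        conn.execute(
            "INSERT OR IGNORE INTO articles (url, body) VALUES (?, ?)", (url, body)
        )
        conn.commit()  # Commit per article so an interruption loses at most one item.
        stored.append(url)
    return stored, failed
```

Running `sync_articles` a second time with the same URL list fetches only the URLs that failed or were never reached, because successful rows are already present.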

Document Processing | willoscar/research-units-...

pdf-text-extractor

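Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, full-text extraction (全文抽取), download PDF (下载PDF). **Use when**: `queries.md` sets `evidence_mode: fulltext` (or you explicitly need full-text evidence) and you want stronger evidence for paper notes/claims. **Skip if**: `evidence_mode: abstract` (the default), or you do not want to download or extract (cost, permissions, time). **Network**: full-text download usually requires network access (unless you manually provide cached PDFs in `papers/pdfs/`). **Guardrail**: downloads are cached in `papers/pdfs/`; existing extracted text is not overwritten by default (unless re-extraction is explicitly requested).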

🇺🇸 English | Translated

1 scripts/Checked

Document Processing | kunhai-88/skills

docx

Creation, editing, and analysis of Word documents, supporting track changes, comments, format retention, and text extraction. Use this when you need to create .docx files, modify content, handle track changes/comments, or perform other document tasks.

🇨🇳 Chinese | Translated

Tools & Utilities | membranedev/application-s...

ocr-web-service

OCR Web Service integration. Manage Documents. Use when the user wants to interact with OCR Web Service data.

🇺🇸 English | Translated

AI & Machine Learning | framersai/agentos-skills

vision-ocr

Extract text from images using OCR and vision AI with the performOCR() high-level API or the full VisionPipeline.

🇺🇸 English | Translated

Document Processing | dawiddutoit/custom-claude

pptx

Comprehensive PowerPoint presentation creation, editing, and analysis using OOXML manipulation including slides, layouts, speaker notes, comments, and formatting. Use when asked to "create a presentation", "edit this PowerPoint", "add slides to .pptx", "extract presentation text", or "analyze slide structure". Provides raw OOXML access for advanced formatting, python-pptx for programmatic slide generation, and markitdown for text extraction. Works with .pptx files through ZIP archive extraction and XML manipulation for professional presentation workflows.

🇺🇸 English | Translated

13 scripts/Attention
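
The pptx description above says the skill works through ZIP archive extraction and XML manipulation. As a rough illustration of that idea only (the skill's own scripts are not shown here), the sketch below reads slide text from a .pptx using just the Python standard library. The function name `extract_slide_text` is hypothetical, and slides are ordered by part number, which may differ from the order shown in the presentation.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs in slide XML live in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_PART = re.compile(r"ppt/slides/slide(\d+)\.xml")

def extract_slide_text(pptx_source):
    """Return {slide part name: [text runs]} for a .pptx path or file-like object."""
    result = {}
    with zipfile.ZipFile(pptx_source) as archive:
        # Keep only slide parts, skipping relationship files and other entries.
        slides = [n for n in archive.namelist() if SLIDE_PART.fullmatch(n)]
        # Sort numerically so slide10 comes after slide2.
        slides.sort(key=lambda n: int(SLIDE_PART.fullmatch(n).group(1)))
        for name in slides:
            root = ET.fromstring(archive.read(name))
            result[name] = [t.text for t in root.iter(A_NS + "t") if t.text]
    return result
```

Calling `extract_slide_text("deck.pptx")` (with `deck.pptx` standing in for any presentation file) returns one list of text runs per slide part. Speaker notes and comments live in other parts of the archive and are not covered by this sketch.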

Document Processing | winsorllc/upgraded-carniv...

pdf-read

Extract text and metadata from PDF files using pdf-parse. Use when: user uploads a PDF or asks to read/analyze PDF content. NOT for: creating PDFs, editing PDFs, or OCR on scanned documents.

🇺🇸 English | Translated

1 scripts/Checked

Data Processing | trpc-group/trpc-agent-go

ocr

Extract text from images using Tesseract OCR.

🇺🇸 English | Translated

2 scripts/Checked

Document Processing | ypares/agent-skills

read-bin-docs

Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.

🇺🇸 English | Translated

1 scripts/Checked