Loading...
Loading...
Found 112 Skills
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.
Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.
Extract contact information from business card images using OCR - name, company, email, phone, address.
[QwenCloud] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos, OCR text extraction, chart/table reading, visual reasoning, multi-image comparison, screenshot understanding, video comprehension, or explicitly invokes this skill by name (e.g. use qwencloud-vision). DO NOT TRIGGER when: user wants to generate/create images (use qwencloud-image-generation), generate videos (use qwencloud-video-generation), text-only tasks without visual input, or non-Qwen vision tasks.
OCRNet for scene text recognition. Recognizes text content from cropped text-region images and supports CTC and attention-based decoders. Use when training, evaluating, exporting, pruning, quantizing, retraining, or running inference for a TAO OCRNet model. Trigger phrases include "train OCRNet", "scene text recognition", "OCR cropped text", "CTC / attention text decoder".
Extract vendor, date, items, amounts, and total from receipt images using OCR and pattern matching with structured JSON output.
Analyzes and processes images using Claude's vision capabilities. Supports OCR, image classification, diagram comparison, chart analysis, visual Q&A, and more. Use when users need to understand, extract, or analyze visual content.
Parse PDF into Markdown/JSON/DOCX using MinerU API. Extract text, tables, formulas with OCR support. Use when converting PDF documents, extracting content from scanned papers, or batch processing PDF files.
Convert files between 140+ formats using the ConversionTools MCP server. Use when the user needs to convert documents (Word, PDF, Excel, PowerPoint), data formats (JSON, CSV, XML, YAML, Parquet), images (PNG, JPG, WebP, AVIF, HEIC, JXL, SVG), audio (MP3, WAV, FLAC), video (MOV, MKV, AVI to MP4), e-books (EPUB, MOBI, AZW), OCR text extraction, AI-powered data extraction, AI text-to-speech (TTS), AI speech-to-text transcription (STT), subtitle conversion (SRT, VTT, ASS), or website screenshots.
[QianWen] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos, OCR text extraction, chart/table reading, visual reasoning, multi-image comparison, screenshot understanding, video comprehension, or explicitly invokes this skill by name (e.g. use qianwen-vision). DO NOT TRIGGER when: user wants to generate/create images (use qianwen-image-generation), generate videos (use qianwen-video-generation), text-only tasks without visual input, or non-Qwen vision tasks.
Verifies identity documents via the Didit standalone API. Use when verifying a passport, ID card, driver's license, or residence permit, performing OCR extraction, MRZ parsing, document authenticity checks, or KYC document validation. Supports 4000+ document types across 220+ countries.
7 ocr & translation skills. Trigger: scanning documents, recognizing formulas, translating academic papers. Design: specialized OCR (LaTeX, handwriting) and translation for scholarly content.