Loading...
Loading...
Found 203 Skills
图片分析与识别,可分析本地图片、网络图片、视频、文件。适用于 OCR、物体识别、场景理解等。当用户发送图片或要求分析图片时必须使用此技能。
Use when the user needs PDF generation, manipulation, form filling, table extraction, OCR, merging, splitting, watermarking, or metadata handling. Trigger conditions: generate PDF reports, extract text or tables from PDFs, fill PDF forms programmatically, merge or split PDF files, add watermarks, OCR scanned documents, read or write PDF metadata, convert HTML to PDF.
Extract frames from video files using ffmpeg for AI/LLM analysis. Use when (1) the user asks to analyze, describe, or summarize a video file, (2) the user wants to extract frames or screenshots from a video, (3) the user provides a video file (.mp4, .mov, .avi, .mkv, .webm, etc.) and asks questions about its visual content, (4) the user wants to identify scenes, objects, or events in a video, (5) the user wants timestamps overlaid on extracted frames for temporal reference. Converts video into JPEG frames that can be attached to LLM prompts as images. Requires ffmpeg on PATH. Supports scene-change detection, model-aware optimization (Claude/OpenAI/Gemini), quality presets (efficient/balanced/detailed/ocr), grayscale and high-contrast OCR mode, and automatic FPS calculation via --max-frames.
Analyze images using AI — segment objects, detect objects, extract text (OCR), describe images, ask questions about images. Use when the user requests "Segment image", "Detect objects", "OCR", "Extract text from image", "Describe image", "What's in this image", "Image analysis".
Socratic book-learning tutor for any book or course. Teaches chapter-by-chapter using guided questioning, ~200-word explanations, and comprehension checks. Tracks progress and writes durable concept notes to a vault. Reads book config from project CLAUDE.md. Use when the user says "chapter N", "let's study", "teach me X", or when a project CLAUDE.md declares a learning context.
Analyze images — segment objects, detect, run OCR, describe, and answer visual questions via fal.ai vision models.
AI screen memory — search everything you've seen or heard on your computer. Integrates with Screenpipe's local MCP server for OCR text, audio transcripts, and app usage history.
This skill should be used when the user needs to convert documents between formats (Office to PDF, PDF to images, image to PDF), perform PDF operations (merge, split, rotate, encrypt, decrypt), or run OCR on scanned documents. Uses local free tools — LibreOffice, ghostscript, pdftk, tesseract, and imagemagick — with no API key required. Trigger when the user says "convert this document", "export to PDF", "merge PDFs", "split PDF", "rotate PDF", "OCR this scan", "convert PPTX to PDF", "convert DOCX to PDF", or any document format conversion request.
Analyze images using Dashscope (Qwen) Vision models for detailed description, OCR text extraction, object recognition, and visual Q&A. Use when the user needs to understand image content via Alibaba Cloud Dashscope API, especially for Chinese-language image analysis and documents.
Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.
Analyze images using GPT-4 Vision for detailed description, OCR text extraction, object recognition, and visual Q&A. Use when the user needs to understand image content, extract text from screenshots, identify objects in photos, or ask questions about images via OpenAI GPT-4 Vision API.
Use when user says "create workflow", "create a workflow", "design workflow", "orchestrate", "automate multiple steps", "coordinate agents", "multi-agent workflow". Creates orchestration workflows from natural language using Socratic questioning to plan multi-agent workflows with visualization.