Search Results: ocr

Found 203 Skills

AI & Machine Learning | mrgoonie/claudekit-skills

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.

🇺🇸 | English | Translated

6 scripts/Attention

Documentation & Writing | zpankz/mcp-skillset

dialectical

Compose intellectually sophisticated persuasive essays using tripartite dialectical structure (establish-critique-synthesize), paradox accumulation, conversational register calibration, and strategic humility. Supports three atomic writing primitives (AGONAL α, MAIEUTIC β, APOPHATIC γ) with hypersoft plithogenic composition, plus legacy style modes and hybrid combinations. Triggers on requests for persuasive writing to mixed/skeptical audiences, defending counterintuitive claims, Socratic pedagogical dialogue, editorial first-person essays, or writing that must balance accessibility with depth. Implements recursive thematic anchoring, forced dilemma construction, and transformed return closure. Use when linear argumentation is insufficient and accumulated tension resolves through synthesis.

🇺🇸 | English | Translated

AI & Machine Learning | cinience/alicloud-skills

alicloud-ai-multimodal-qwen-vl

Understand images with Alibaba Cloud Model Studio Qwen VL models (qwen3-vl-plus/qwen3-vl-flash and latest aliases). Use when building image Q&A, visual analysis, OCR-like extraction, chart/table reading, or screenshot understanding workflows.

🇺🇸 | English | Translated

1 scripts/Checked

Tools & Utilities | maravilla-labs/maravilla-...

maravilla-media-transforms

Async media + document derivations via `platform.media.transforms` and the declarative `transforms` block in `maravilla.config.ts`. Media: transcode video, thumbnail extraction, image resize/variants, OCR. Documents (.docx/.odt/.pptx/.xlsx/...): convert to PDF, render page thumbnails, generic format conversion, Markdown extraction (RAG-ready), single-file HTML with inlined images, image-replacement templating ({{TAG}} swap + named-object swap), QR-code injection. Use when ingesting user uploads that need normalised renditions, generating contracts/invoices from templates, or extracting structured content for LLMs. Critical: derived keys are content-addressed — `keyFor(srcKey, spec)` is known up front, before the worker starts, so clients can render placeholder UI without round-trips. Declarative config is the default; imperative `transforms.*` calls are for one-offs.

🇺🇸 | English | Translated
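
The content-addressed key idea in the maravilla-media-transforms description can be illustrated with a minimal sketch. This is not Maravilla's actual `keyFor` implementation: the `key_for` function, the `derived/` prefix, the hashing scheme, and the spec fields are all assumptions made for illustration. It only shows why a key derived from the source key and the transform spec can be computed before any worker runs.

```python
import hashlib
import json


def key_for(src_key: str, spec: dict) -> str:
    """Hypothetical stand-in for keyFor(srcKey, spec); not the library's real algorithm."""
    # Serialize the spec canonically (sorted keys, no extra whitespace)
    # so that equal specs always produce the same bytes.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    # Hash the source key together with the spec; the digest depends only on
    # these inputs, so it is known before the transform executes.
    digest = hashlib.sha256(f"{src_key}\n{canonical}".encode("utf-8")).hexdigest()
    return f"derived/{digest[:32]}"


# Hypothetical thumbnail spec; field names are illustrative only.
thumb_spec = {"op": "thumbnail", "width": 320, "format": "webp"}

key_a = key_for("uploads/report.docx", thumb_spec)
# Same spec with keys in a different order yields the same derived key.
key_b = key_for("uploads/report.docx", {"format": "webp", "width": 320, "op": "thumbnail"})
```

Because the key is a pure function of its inputs, a client could compute it locally and render a placeholder for the derived asset without asking the server where the result will live.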

AI & Machine Learning | el-frontend/software-engi...

ai-driven-prd

Use when the user wants to author, refine, or audit a Product Requirements Document for AI coding agents. Walks through an 8-phase pipeline (Socratic discovery → PRD draft → acceptance criteria → adversarial review → task decomposition → AI-readiness gate → test generation → handoff). Triggers on "write a PRD", "spec this feature", "draft requirements", "prepare X for Claude/Cursor/Copilot/Windsurf/Aider to build", "audit my PRD", "is this PRD AI-ready", "score this spec".

🇺🇸 | English | Translated

Product & Design | founderjourney/claude-ski...

pathfinders-labs-brand-guidelines

Applies Pathfinders Labs' official brand identity to artifacts including landing pages, presentations, social media content, and documents. Use when creating content that represents Pathfinders Labs' mission of web democratization, technical expertise, and nomadic lifestyle. Ensures visual consistency and authentic voice across all platforms.

🇺🇸 | English | Translated

Platform Services | vm0-ai/vm0-skills

kommo

Kommo (formerly amoCRM) API. Use when user mentions "Kommo", "amoCRM", "CRM", or sales pipeline management.

🇺🇸 | English | Translated

Document Processing | winsorllc/upgraded-carniv...

pdf-read

Extract text and metadata from PDF files using pdf-parse. Use when: user uploads a PDF or asks to read/analyze PDF content. NOT for: creating PDFs, editing PDFs, or OCR on scanned documents.

🇺🇸 | English | Translated

1 scripts/Checked

Document Processing | wulaosiji/skills

document-hub

Unified document processing hub supporting creation, conversion, editing, and batch processing of formats including Word, Excel, PDF, and Markdown. Use when creating Word/Excel/PDF documents, converting document formats, batch-processing documents, applying document templates, editing document content, or converting media files. Cross-references: pdf, content-extractor, email-sender, long-form-writer, md-to-wechat, image-ocr. Part of the UniqueClub toolkit. Learn more: https://uniqueclub.ai

🇨🇳 | Chinese | Translated

4 scripts/Attention

AI & Machine Learning | glebis/claude-skills

vision-bench

Score and compare images using vision LLMs as judges. YAML-defined criteria presets for 11 use cases (text-to-image, photorealism, document OCR, charts, UI, portrait, product, scientific, invoice, alt-text, artistic style). Supports OpenAI, Anthropic, Gemini, Mistral, and OpenRouter as judge providers. Keys auto-decrypted via SOPS + age.

🇺🇸 | English | Translated

4 scripts/Checked

AI & Machine Learning | bbuf/sglang-auto-driven-s...

model-architecture-diagram

Return public original model architecture diagrams for user-specified LLM, VLM, MoE, diffusion, OCR, and SGLang/sgl-cookbook model families. Use when the user asks for a model structure chart, architecture diagram, or rendered image link for a specific model such as DeepSeek, GLM, Qwen, Kimi, MiniMax, Step, Hunyuan, or Qwen3-VL.

🇺🇸 | English | Translated

1 scripts/Checked

Document Processing | ypares/agent-skills

read-bin-docs

Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.

🇺🇸 | English | Translated

1 scripts/Checked