Search Results: ocr

Found 203 Skills

Document Processing | devskale/skale-skills

markdown-converter

Convert documents to Markdown using markitdown. Use when you need to extract text and convert PDF, Word, PowerPoint, Excel, HTML, CSV, JSON, XML, images (with EXIF/OCR), audio, ZIP archives, YouTube URLs, or EPUBs to Markdown format for LLM processing or text analysis.

🇺🇸 | English | Translated

2 scripts/Checked

AI & Machine Learning | jst-well-dan/skill-box

deep-reading

Deep Reading Collaborative System: A system leveraging multi-layered AI Agents to help transform articles from "read" to "understood" to "mastered", and convert knowledge into actionable plans. Use this system when you need to deeply understand complex articles/papers, systematically organize reading notes, think critically about content, discover hidden logical issues and assumptions, or turn knowledge into action plans. Trigger keywords: deep reading, critical thinking, reading notes, article analysis, Socratic questioning, action plan

🇨🇳 | Chinese | Translated

Mobile Development | software-mansion-labs/rea...

react-native-executorch

Build on-device AI into React Native apps using ExecuTorch. Provides hooks for LLMs, computer vision, OCR, audio processing, and embeddings without cloud dependencies. Use when building AI features into mobile apps - AI chatbots, image recognition, speech processing, or text search.

🇺🇸 | English | Translated

Automation | davila7/claude-code-templ...

zapier-make-patterns

No-code automation democratizes workflow building. Zapier and Make (formerly Integromat) let non-developers automate business processes without writing code. But no-code doesn't mean no-complexity - these platforms have their own patterns, pitfalls, and breaking points. This skill covers when to use which platform, how to build reliable automations, and when to graduate to code-based solutions. Key insight: Zapier optimizes for simplicity and integrations (7000+ apps), Make optimizes for power

🇺🇸 | English | Translated

Project Management | joelhooks/joelclaw

adr-skill

Create and maintain Architecture Decision Records (ADRs) optimized for agentic coding workflows. Use when you need to propose, write, update, accept/reject, deprecate, or supersede an ADR; bootstrap an adr folder and index; consult existing ADRs before implementing changes; or enforce ADR conventions. This skill uses Socratic questioning to capture intent before drafting, and validates output against an agent-readiness checklist.

🇺🇸 | English | Translated

3 scripts/Checked

AI & Machine Learning | qianwen-ai/qianwen-ai

qianwen-image-generation

[QianWen] Generate and edit images using Wan and Qwen Image models. Supports text-to-image, image editing (style transfer, subject consistency, text rendering), and interleaved text-image output. TRIGGER when: user wants to create illustrations, product images, artistic designs, posters, text-to-image generation, edit/transform existing images, apply style transfer, generate images based on reference photos, interleaved text-image content, mentions Wan/Qwen Image models/AI art creation, or explicitly invokes this skill by name (e.g. use qianwen-image-generation). DO NOT TRIGGER when: user wants to understand/analyze existing images or OCR (use qianwen-vision), video generation (use qianwen-video-generation), text-only tasks.

🇺🇸 | English | Translated

4 scripts/Checked

Tools & Utilities | numman-ali/zai-cli

zai-cli

Z.AI CLI providing:
- Vision: image/video analysis, OCR, UI-to-code, error diagnosis (GLM-4.6V)
- Search: real-time web search with domain/recency filtering
- Reader: web page to markdown extraction
- Repo: GitHub code search and reading via ZRead
- Tools: MCP tool discovery and raw calls
- Code: TypeScript tool chaining

Use for visual content analysis, web search, page reading, or GitHub exploration. Requires Z_AI_API_KEY.

🇺🇸 | English | Translated

AI & Machine Learning | d-o-hub/rust-self-learnin...

analysis-swarm

Multi-perspective code analysis using three AI personas (RYAN, FLASH, SOCRATES) for comprehensive decision-making. Use when complex code decisions need analysis from multiple viewpoints, or when avoiding single-perspective blind spots is critical.

🇺🇸 | English | Translated

AI & Machine Learning | samhvw8/dot-claude

ai-multimodal

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

🇺🇸 | English | Translated

6 scripts/Attention

AI & Machine Learning | imbad0202/academic-resear...

deep-research

Universal deep research agent team. 13-agent pipeline for rigorous academic research on any topic. 7 modes: full research, quick brief, paper review, lit-review, fact-check, Socratic guided research dialogue, and systematic review with optional meta-analysis. Covers research question formulation, Socratic mentoring, methodology design, systematic literature search, source verification, cross-source synthesis, risk of bias assessment, meta-analysis, APA 7.0 report compilation, editorial review, devil's advocate challenges, ethics review, and post-research literature monitoring. Triggers on: research, deep research, literature review, systematic review, meta-analysis, PRISMA, evidence synthesis, fact-check, guide my research, help me think through, 研究, 深度研究, 文獻回顧, 文獻探討, 系統性回顧, 後設分析, 事實查核, 引導我的研究, 幫我釐清, 幫我想想, 我不確定要研究什麼, 研究方向, 研究主題.

🇺🇸 | English | Translated

Tools & Utilities | akillness/oh-my-skills

ooo

Run the Ouroboros specification-first development loop: reduce ambiguity with a Socratic interview, freeze an immutable seed/spec, execute against that contract, verify before claiming success, and keep looping until completion is actually verified. Use when the user wants spec-first clarification, immutable requirements, drift-aware implementation, or a persistent completion loop that should keep going until tests / checks / acceptance criteria pass. Triggers on: ooo, ouroboros, interview, seed, run workflow, evaluate, evolve, ooo ralph, specification first, socratic interview, ambiguity reduction, persistent completion.

🇺🇸 | English | Translated

Tools & Utilities | jonbasse/adhd-assistant

adhd-assistant

ADHD-friendly life management assistant providing external scaffolding for executive function challenges. Use when the user asks for help with daily planning, task breakdown, time management, prioritization, body doubling, dopamine regulation, or maintaining routines. Triggers on requests about organizing life, staying on top of tasks, beating procrastination, planning day/week, managing overwhelm, or ADHD-related challenges like time blindness, forgetfulness, difficulty starting tasks, emotional dysregulation, shame/guilt about productivity, or feeling stuck/paralyzed.

🇺🇸 | English | Translated