Search Results: ai-vision

Found 8 Skills

android-adb

Android device control and UI automation via ADB using a TypeScript helper CLI. Use for device/emulator discovery, USB or Wi-Fi connection, app launch/force-stop, tap/swipe/keyevent/text input, screenshots, APK install handling, device reset for app, and ADB troubleshooting. Use with ai-vision for screenshot-based UI recognition and coordinate decisions.

🇺🇸 | English | Translated

1 script/Checked

AI & Machine Learning | jimliu/baoyu-skills

baoyu-danger-gemini-web

Generates images and text via a reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need an image generation backend, or when the user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.

🇺🇸 | English | Translated

23.4k

24 scripts/Attention

Automation | httprunner/skills

wechat-search-collector

Automated collection process for WeChat Channels search and result traversal (Android), supporting scenarios such as comprehensive page search and personal page search.

🇨🇳 | Chinese | Translated

AI & Machine Learning | rotoslider/choom

image-analysis

Analyzes images using a vision-capable LLM (Optic). Can read workspace images, URLs, base64 data, or previously generated images by ID.

🇺🇸 | English | Translated

2 scripts/Checked

AI & Machine Learning | fal-ai-community/skills

fal-vision

Analyze images using AI — segment objects, detect objects, extract text (OCR), describe images, ask questions about images. Use when the user requests "Segment image", "Detect objects", "OCR", "Extract text from image", "Describe image", "What's in this image", "Image analysis".

🇺🇸 | English | Translated

1 script/Attention

Tools & Utilities | absolutelyskilled/absolut...

video-analyzer

Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.

🇺🇸 | English | Translated

Automation | jmsktm/claude-settings

omni-vu

Screen capture, AI vision analysis, and GUI automation for macOS. Use when you need to see what's on screen, analyze UI state, detect changes, or automate mouse/keyboard actions.

🇺🇸 | English | Translated

AI & Machine Learning | httprunner/skills

ai-vision

Multimodal UI understanding and single-step planning via OpenAI-compatible Responses APIs. Use when you need AIQuery/AIAssert and plan-next to extract UI element coordinates, validate UI assertions, summarize screenshots, or decide the next UI action from an image. External agents handle execution via adb/hdc and multi-step loops. Defaults to Doubao models but can be pointed at other multimodal providers via base URL, API key, and model name.

🇺🇸 | English | Translated

1 script/Attention