Provides image recognition for non-multimodal models (text-only models such as deepseek-v4-pro, GLM-5.1, and mimo-v2.5-pro). The skill triggers automatically when the main model cannot recognize images, when users send screenshots, design drafts, or UI screenshots for analysis, or when users say things like 'Look at this image', 'Analyze this screenshot', or 'What's wrong with this image'. It also applies whenever a user pastes an image but the current model does not support image input.

Multiple images can be recognized in one call, and configuring multiple image recognition models provides primary-backup fallback. The skill can also be triggered manually with /skill:vision-support or /vision.

Iron Rule: The models configured for this skill are used only for image content recognition and never participate in the main logical reasoning.

Note: If the current model is itself multimodal (for example Claude Sonnet 4, GPT-4o, or Gemini, which can recognize images directly), do not use this skill; let the main model read the image directly.
npx skill4agent add penfick/skills vision-support

Iron Rule: All models configured for this skill are used only for image content recognition and never participate in the main logical reasoning. They do not replace the main model for any decision-making, analysis, or coding; they are only responsible for "seeing" images and describing their content in text.
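The primary-backup fallback and the Iron Rule can be illustrated with a minimal sketch. This is not the skill's actual implementation: `describeWithFallback`, the `models` array, and the `callModel` callback are hypothetical names introduced here, and the only assumption is that each configured model is tried in order until one returns a text description.

```javascript
// Hypothetical sketch: try each configured vision model in order (primary first).
// `models` and `callModel` are illustrative names, not the skill's real internals.
async function describeWithFallback(models, image, prompt, callModel) {
  const errors = [];
  for (const model of models) {
    try {
      const text = await callModel(model, image, prompt);
      // Only a text description is returned; all reasoning stays with the main model.
      return { model: model.name, description: text };
    } catch (err) {
      // Remember why this model failed, then move on to the next backup.
      errors.push(`${model.name}: ${err.message}`);
    }
  }
  throw new Error(`All vision models failed: ${errors.join("; ")}`);
}
```

The returned `description` would then be handed to the main model as plain text, so the vision model never takes part in the decision-making itself.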
/vision
/skill:vision-support

node SKILL_DIR/scripts/vision.mjs init

| Category | Platforms |
|---|---|
| International | OpenAI, Google Gemini, Anthropic Claude, DeepSeek, Groq, Mistral, xAI (Grok), OpenRouter, Fireworks AI |
| Domestic (China) | Tongyi Qianwen (Qwen VL), Zhipu GLM (GLM-4V), Moonshot (Kimi), StepFun (Jieyue Xingchen), MiniMax, SiliconFlow, Xiaomi MiMo |
| Local | Ollama, LM Studio |
| Custom | Any third-party platform with an OpenAI-compatible API (set baseUrl manually) |
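For the Custom category, a request to an OpenAI-compatible endpoint might look roughly like the sketch below. It assumes the platform accepts the standard chat-completions message shape with an `image_url` content part; `buildVisionRequest` is a hypothetical helper, and the base URL, API key, and model name are placeholders rather than values taken from this skill.

```javascript
// Build a chat-completions request for an OpenAI-compatible endpoint.
// The helper name and all sample values below are placeholders.
function buildVisionRequest({ baseUrl, apiKey, model }, imageUrl, prompt) {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: prompt },
              { type: "image_url", image_url: { url: imageUrl } },
            ],
          },
        ],
      }),
    },
  };
}

const req = buildVisionRequest(
  { baseUrl: "https://api.example.com/v1/", apiKey: "sk-placeholder", model: "example-vision-model" },
  "https://example.com/img.png",
  "Describe this image"
);
console.log(req.url); // https://api.example.com/v1/chat/completions
```

The result can be passed to `fetch(req.url, req.options)`. A local file would typically be sent as a base64 `data:` URL in the same `image_url` field, although whether a given platform accepts that is something to verify against its documentation.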
node SKILL_DIR/scripts/vision.mjs config add

# Interactive
node SKILL_DIR/scripts/vision.mjs init # Initialize primary model
node SKILL_DIR/scripts/vision.mjs config add # Add fallback model
node SKILL_DIR/scripts/vision.mjs config edit [name] # Edit model
# Quick Commands
node SKILL_DIR/scripts/vision.mjs config list # List all models
node SKILL_DIR/scripts/vision.mjs config primary [name] # Set primary model
node SKILL_DIR/scripts/vision.mjs config remove <name> # Delete model
node SKILL_DIR/scripts/vision.mjs config set-key <name> <key> # Set API Key
node SKILL_DIR/scripts/vision.mjs config set-url <name> <url> # Set API URL
node SKILL_DIR/scripts/vision.mjs config test [name] # Test connectivity

node SKILL_DIR/scripts/vision.mjs ./screenshot.png
node SKILL_DIR/scripts/vision.mjs ./ui.png "What's wrong with the layout of this interface?"
node SKILL_DIR/scripts/vision.mjs "https://example.com/img.png" "Describe this image"

node SKILL_DIR/scripts/vision.mjs img1.png img2.png "Compare the differences between these two images"
node SKILL_DIR/scripts/vision.mjs ./screenshots/*.png "Analyze these interface screenshots"
node SKILL_DIR/scripts/vision.mjs ./local.png https://example.com/remote.jpg "Describe these two images"

find . -name "*.png" -o -name "*.jpg" -o -name "*.webp" | head -20
ls -lt *.png *.jpg *.webp 2>/dev/null

config list

| Variable | Description |
|---|---|
| | Custom configuration file path |
| | Temporarily override the primary model (matched by name) |
| | Global API Key fallback |
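One plausible reading of the three settings above is sketched here. The skill's real environment variable names are not shown in this text, so the sketch takes them as a `names` argument and the names in the usage example are placeholders only. The precedence order (a name override beats the configured primary, and a per-model key beats the global key) is likewise an assumption, as are `resolveSettings` and the `config` shape.

```javascript
// Hypothetical sketch of override precedence. Variable names are passed in
// because the skill's real environment variable names are not known here.
function resolveSettings(env, names, config) {
  const override = env[names.modelOverride];
  const chosen =
    config.models.find((m) => m.name === override) ?? // temporary override, matched by name
    config.models.find((m) => m.name === config.primary) ?? // configured primary
    config.models[0]; // otherwise the first configured model
  if (!chosen) throw new Error("No vision model configured");
  return {
    configPath: env[names.configPath] ?? config.defaultPath,
    model: chosen.name,
    // A model-specific key wins; the global key is only a fallback.
    apiKey: chosen.apiKey ?? env[names.globalKey] ?? null,
  };
}

// Placeholder names, for illustration only.
const placeholderNames = {
  configPath: "EXAMPLE_CONFIG_PATH",
  modelOverride: "EXAMPLE_MODEL_OVERRIDE",
  globalKey: "EXAMPLE_API_KEY",
};
const settings = resolveSettings(
  { EXAMPLE_MODEL_OVERRIDE: "backup", EXAMPLE_API_KEY: "global-key" },
  placeholderNames,
  {
    defaultPath: "./config.json",
    primary: "main",
    models: [{ name: "main", apiKey: "main-key" }, { name: "backup" }],
  }
);
console.log(settings.model, settings.apiKey); // backup global-key
```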