vision-support

5 scripts

Provides image recognition for non-multimodal models (text-only models such as deepseek-v4-pro, GLM-5.1, and mimo-v2.5-pro). The skill triggers automatically when the main model cannot read an image, when users send screenshots, design drafts, or UI captures for analysis, or when users say things like 'Look at this image', 'Analyze this screenshot', or 'What's wrong with this image'. It also applies whenever a user pastes an image and the current model does not accept image input. Multiple images can be recognized in one call, and configuring several image recognition models provides primary-backup fallback. To trigger it manually, use /skill:vision-support or /vision. Iron Rule: the models configured for this skill are used only for image content recognition and never take part in the main logical reasoning. Note: if the current model is itself multimodal (for example Claude Sonnet 4, GPT-4o, or Gemini, which can read images directly), do not use this skill; let the main model read the image itself.

7 installs

Source: penfick/skills

Added on: 2026-05-19

NPX Install

npx skill4agent add penfick/skills vision-support


Vision Support — Image Recognition Bridge for Non-Multimodal Models

Iron Rule: All models configured for this skill are used only for image content recognition and never take part in the main logical reasoning. They do not stand in for the main model on any decision, analysis, or coding task; their only job is to "see" the image and describe its content in text.

When to Use This Skill

Users attach images in conversations, but the current model does not support image understanding
Users mention screenshots/images/interfaces/designs: "Look at this screenshot", "There's a problem with the interface", "This design draft"
Users describe a visual problem but can't explain clearly: "The webpage displays incorrectly", "The layout is messed up"
Agents encounter image files during work (PNG/JPG/WebP, etc.)
Users trigger the skill manually with the /vision or /skill:vision-support command

First Use — One-Click Initialization

bash

node SKILL_DIR/scripts/vision.mjs init

Interactive guidance in just three steps:

Select Provider: Choose from the preset list of mainstream platforms
Fill in API Key: Enter the API Key (or the name of an environment variable that holds it)
Select Model: The available models are fetched from the provider's API for you to choose from (a recommended list is shown if the fetch fails)

Supported platforms cover mainstream domestic and international options:

Category	Platforms
International	OpenAI, Google Gemini, Anthropic Claude, DeepSeek, Groq, Mistral, xAI (Grok), OpenRouter, Fireworks AI
Domestic (China)	Tongyi Qianwen (Qwen VL), Zhipu GLM (GLM-4V), Moonshot (Kimi), Step (Jieyue Xingchen), MiniMax, SiliconFlow, Xiaomi MiMo
Local	Ollama, LM Studio
Custom	Any third-party platform compatible with OpenAI (fill in baseUrl manually)

Add Backup Models

bash

node SKILL_DIR/scripts/vision.mjs config add

This uses the same interactive guidance as initialization. Models added this way serve as fallbacks and are tried automatically if the primary model fails.

All Configuration Commands

bash

# Interactive
node SKILL_DIR/scripts/vision.mjs init                    # Initialize primary model
node SKILL_DIR/scripts/vision.mjs config add              # Add fallback model
node SKILL_DIR/scripts/vision.mjs config edit [name]      # Edit model

# Quick Commands
node SKILL_DIR/scripts/vision.mjs config list             # List all models
node SKILL_DIR/scripts/vision.mjs config primary [name]   # Set primary model
node SKILL_DIR/scripts/vision.mjs config remove <name>    # Delete model
node SKILL_DIR/scripts/vision.mjs config set-key <name> <key>   # Set API Key
node SKILL_DIR/scripts/vision.mjs config set-url <name> <url>   # Set API URL
node SKILL_DIR/scripts/vision.mjs config test [name]      # Test connectivity

Usage — Image Recognition

Single Image

bash

node SKILL_DIR/scripts/vision.mjs ./screenshot.png
node SKILL_DIR/scripts/vision.mjs ./ui.png "What's wrong with the layout of this interface?"
node SKILL_DIR/scripts/vision.mjs "https://example.com/img.png" "Describe this image"

Multiple Images

bash

node SKILL_DIR/scripts/vision.mjs img1.png img2.png "Compare the differences between these two images"
node SKILL_DIR/scripts/vision.mjs ./screenshots/*.png "Analyze these interface screenshots"
node SKILL_DIR/scripts/vision.mjs ./local.png https://example.com/remote.jpg "Describe these two images"

Find Images

If users mention images but don't provide paths, search first:

bash

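# Search recursively for common image types (first 20 matches)
find . -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.webp" \) | head -20
# List images in the current directory, newest first
ls -lt *.png *.jpg *.webp 2>/dev/null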
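
The search step above can also be wrapped in a small reusable function. This is a minimal sketch, not part of the skill: the name `find_recent_images` is hypothetical, and it assumes a `find` that supports `-iname` (GNU and BSD versions do). It matches extensions case-insensitively and lists newer files first.

```shell
# find_recent_images [DIR] [LIMIT]
# Hypothetical helper: prints up to LIMIT image paths under DIR.
find_recent_images() {
  local dir="${1:-.}"
  local limit="${2:-20}"
  # -iname matches extensions case-insensitively (e.g. .PNG and .png).
  # "ls -t" orders each batch of matches by modification time, newest first.
  find "$dir" -type f \
    \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.webp' \) \
    -exec ls -t {} + 2>/dev/null | head -n "$limit"
}
```

For example, `find_recent_images ./screenshots 5` would print at most five image paths, which can then be passed to the recognition script.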

Workflow After Obtaining Results

When the script succeeds, the plain text it writes to stdout is the recognition result. stderr carries only logs and does not affect the result.

Read recognition result: The content in stdout is the image description
Combine with user's question: Integrate the description with user's requirements
Main model continues working: Use the recognition result as context for the main model to complete subsequent tasks
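
The split between stdout and stderr makes the first step straightforward to script. The following is a minimal sketch under that assumption; the wrapper name `capture_description` is hypothetical and not part of the skill.

```shell
# capture_description CMD [ARGS...]
# Hypothetical wrapper: runs a recognition command and returns only its stdout.
capture_description() {
  local description
  # Command substitution captures stdout only; stderr (the logs) passes through.
  if description="$("$@")"; then
    printf '%s\n' "$description"
  else
    # A non-zero exit code means no usable description was produced.
    echo "image recognition failed; no description available" >&2
    return 1
  fi
}

# Usage, with the path and question from the single-image example:
#   description="$(capture_description node SKILL_DIR/scripts/vision.mjs ./ui.png "What's wrong with the layout of this interface?")"
```

Checking the exit code before using the text keeps a failed run from being mistaken for an empty description.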

Fallback Mechanism

The primary model, marked with ★ and listed first in config list, is called first. If it fails, the subsequent models are tried automatically. If all models fail, the script exits with a non-zero exit code.
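
The behavior described above follows a common try-in-order pattern. The sketch below illustrates that pattern only; it is not the script's actual implementation, and both `try_in_order` and the recognizer callback are hypothetical names.

```shell
# try_in_order RECOGNIZER MODEL...
# Calls RECOGNIZER with each model name in turn and stops at the first success.
try_in_order() {
  local recognizer="$1"
  shift
  local model output
  for model in "$@"; do
    if output="$("$recognizer" "$model")"; then
      printf '%s\n' "$output"
      return 0
    fi
    echo "model '$model' failed, trying the next one" >&2
  done
  # Every model failed: signal it with a non-zero exit code.
  return 1
}
```

Here the order of the arguments plays the role of the order in config list: the first name is the primary model, and the rest are fallbacks.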

Environment Variables

Variable	Description
`VISION_CONFIG_PATH`	Custom configuration file path
`VISION_DEFAULT_MODEL`	Temporarily override the primary model (matched by name)
`VISION_API_KEY`	Global API Key fallback
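
Because `VISION_DEFAULT_MODEL` is a temporary override, it is natural to set it for a single invocation rather than export it. A minimal sketch, assuming standard shell behavior for per-command variable assignments; the helper name `with_model` and the model name in the usage line are placeholders.

```shell
# with_model MODEL CMD [ARGS...]
# Hypothetical helper: runs CMD with VISION_DEFAULT_MODEL set for that run only.
with_model() {
  local model="$1"
  shift
  # The assignment prefix applies to this one command, not to the calling shell.
  VISION_DEFAULT_MODEL="$model" "$@"
}

# Usage ("my-backup-model" is a placeholder for a name shown by config list):
#   with_model my-backup-model node SKILL_DIR/scripts/vision.mjs ./screenshot.png
```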
