vision-support
Vision Support — Image Recognition Bridge for Non-Multimodal Models
Iron rule: every model configured for this skill is used only for image content recognition and never takes part in the main reasoning. These models do not make decisions, perform analysis, or write code on behalf of the main model; they only "look" at images and describe what they see in text.
When to Use This Skill
- The user attaches an image to the conversation, but the current model does not support image understanding
- The user mentions a screenshot, image, interface, or design: "Look at this screenshot", "There's a problem with the interface", "This design draft"
- The user describes a visual problem but cannot explain it clearly: "The webpage displays incorrectly", "The layout is messed up"
- The agent encounters image files while working (PNG, JPG, WebP, etc.)
- Manual trigger via the `/vision` or `/skill:vision-support` command
First Use — One-Click Initialization
```bash
node SKILL_DIR/scripts/vision.mjs init
```
Interactive setup takes three steps:
- Select a provider: choose from the preset list of mainstream platforms
- Enter the API key: type the API key (or the name of an environment variable)
- Select a model: the available models are fetched from the API automatically; if the fetch fails, a recommended list is shown instead
The supported platforms cover the mainstream options in China and internationally:
| Category | Platforms |
|---|---|
| International | OpenAI, Google Gemini, Anthropic Claude, DeepSeek, Groq, Mistral, xAI (Grok), OpenRouter, Fireworks AI |
| China | Tongyi Qianwen (Qwen VL), Zhipu GLM (GLM-4V), Moonshot (Kimi), StepFun (Step), MiniMax, SiliconFlow, Xiaomi MiMo |
| Local | Ollama, LM Studio |
| Custom | Any OpenAI-compatible third-party platform (enter the baseUrl manually) |
Add Backup Models
```bash
node SKILL_DIR/scripts/vision.mjs config add
```
This uses the same interactive setup. Models added this way serve as fallbacks and are tried automatically if the primary model fails.
All Configuration Commands
```bash
# Interactive
node SKILL_DIR/scripts/vision.mjs init                  # Initialize primary model
node SKILL_DIR/scripts/vision.mjs config add            # Add fallback model
node SKILL_DIR/scripts/vision.mjs config edit [name]    # Edit model

# Quick commands
node SKILL_DIR/scripts/vision.mjs config list                     # List all models
node SKILL_DIR/scripts/vision.mjs config primary [name]           # Set primary model
node SKILL_DIR/scripts/vision.mjs config remove <name>            # Delete model
node SKILL_DIR/scripts/vision.mjs config set-key <name> <key>     # Set API key
node SKILL_DIR/scripts/vision.mjs config set-url <name> <url>     # Set API URL
node SKILL_DIR/scripts/vision.mjs config test [name]              # Test connectivity
```
Usage — Image Recognition
Single Image
```bash
node SKILL_DIR/scripts/vision.mjs ./screenshot.png
node SKILL_DIR/scripts/vision.mjs ./ui.png "What's wrong with the layout of this interface?"
node SKILL_DIR/scripts/vision.mjs "https://example.com/img.png" "Describe this image"
```
Multiple Images
```bash
node SKILL_DIR/scripts/vision.mjs img1.png img2.png "Compare the differences between these two images"
node SKILL_DIR/scripts/vision.mjs ./screenshots/*.png "Analyze these interface screenshots"
node SKILL_DIR/scripts/vision.mjs ./local.png https://example.com/remote.jpg "Describe these two images"
```
Find Images
If the user mentions an image but does not give a path, search first:
```bash
find . -name "*.png" -o -name "*.jpg" -o -name "*.webp" | head -20
ls -lt *.png *.jpg *.webp 2>/dev/null
```
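A slightly more defensive variant of the search above, offered as a sketch rather than as part of the skill: it groups the name tests so that `-type f` applies to all of them, matches extensions case-insensitively, and also picks up `.jpeg`. The helper name `find_images` is illustrative, and `-iname` is assumed to be available (it is in GNU and BSD `find`).

```shell
# find_images is a hypothetical helper name, not something the skill ships.
# Lists up to 20 image files under the given directory (default: current dir).
find_images() {
    dir=${1:-.}
    # The parentheses group the -o alternatives so -type f covers every pattern.
    find "$dir" -type f \( \
        -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.webp' \
    \) | head -n 20
}
```

Calling `find_images ./screenshots` prints one path per line, which can then be passed to the recognition script.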
Workflow After Obtaining Results
When the script succeeds, the plain text it writes to stdout is the recognition result. stderr carries only logs and does not affect the result.
- Read the recognition result: the stdout content is the image description
- Combine it with the user's question: relate the description to what the user is asking for
- Let the main model continue: the main model uses the recognition result as context to complete the remaining tasks
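As a rough sketch of this workflow, the wrapper below captures stdout as the description, appends stderr to a log file, and checks the exit status before passing the text on. The names `vision_cmd`, `describe_image`, and `VISION_LOG` are illustrative and not part of the skill; `SKILL_DIR` is assumed to point at the skill's install directory.

```shell
# Hypothetical helper names: vision_cmd, describe_image, VISION_LOG.
# SKILL_DIR is assumed to point at the skill's install directory.
vision_cmd() {
    node "${SKILL_DIR:-.}/scripts/vision.mjs" "$@"
}

describe_image() {
    # Capture stdout (the recognition result); append stderr (logs) to a file.
    description=$(vision_cmd "$@" 2>>"${VISION_LOG:-vision.log}") || return 1
    # Treating empty output as a failure is an assumption of this sketch.
    [ -n "$description" ] || return 1
    printf '%s\n' "$description"
}

# Usage (commented out so the sketch only defines functions):
# if text=$(describe_image ./ui.png "What's wrong with the layout?"); then
#     printf 'Context for the main model:\n%s\n' "$text"
# else
#     echo "Image recognition failed; see vision.log" >&2
# fi
```

Keeping the two streams apart means log lines never leak into the description that the main model reads.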
Fallback Mechanism
The ★ primary model, listed first in `config list`, is called first. If it fails, the subsequent models are tried automatically. If all models fail, the script exits with a non-zero exit code.
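The fallback behavior can be pictured with a small shell sketch. This is not the script's actual implementation: `recognize_with` is a hypothetical stand-in for a single model call, and the model names are placeholders chosen so that only one of them succeeds.

```shell
# Hypothetical stand-in for one model call; a real version would send a request.
# Only the placeholder model "backup-b" succeeds here, to show the fall-through.
recognize_with() {
    name=$1
    [ "$name" = "backup-b" ] && printf 'description from %s\n' "$name"
}

# Try each model in the order given; stop at the first success.
recognize_with_fallback() {
    for model in "$@"; do
        if recognize_with "$model"; then
            return 0
        fi
        echo "model $model failed, trying next" >&2
    done
    echo "all models failed" >&2
    return 1   # non-zero status, mirroring the documented exit behavior
}
```

With this stub, `recognize_with_fallback primary-a backup-b backup-c` skips the failing first model and prints the result from `backup-b`, while a list with no working model returns a non-zero status.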
Environment Variables
| Variable | Description |
|---|---|
| | Custom configuration file path |
| | Temporarily override the primary model (matched by name) |
| | Global API key fallback |