vision-support


Vision Support: Image Recognition Bridge for Non-Multimodal Models

Iron rule: every model configured for this skill is used only to recognize image content and never takes part in the main reasoning. These models do not make decisions, perform analysis, or write code on behalf of the main model; they only "look at" an image and describe what they see in text.

When to Use This Skill

  • The user attaches an image to the conversation, but the current model does not support image understanding
  • The user mentions a screenshot, image, interface, or design: "Look at this screenshot", "There's a problem with the interface", "This design draft"
  • The user describes a visual problem but cannot explain it clearly: "The webpage displays incorrectly", "The layout is messed up"
  • The agent encounters image files (PNG, JPG, WebP, etc.) while working
  • The skill is triggered manually with the command /vision or /skill:vision-support
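The trigger conditions above can be approximated with a small check. This is a minimal sketch, not the skill's actual logic: the helper names (`isImagePath`, `isManualTrigger`, `shouldUseVision`) and the exact extension list are hypothetical, since the source names only PNG, JPG, and WebP "etc." plus the two commands. Letting a manual command win even when the model supports images is also an assumption.

```javascript
// Hypothetical helpers: decide whether a message should be routed to the vision skill.
// The extension list is an assumption; the source only names PNG/JPG/WebP "etc.".
const IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp"];
const MANUAL_COMMANDS = ["/vision", "/skill:vision-support"];

// True when the path ends with a known image extension (case-insensitive).
function isImagePath(path) {
  const lower = path.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// True when the first word of the message is one of the manual commands.
function isManualTrigger(message) {
  const firstWord = message.trim().split(/\s+/)[0];
  return MANUAL_COMMANDS.includes(firstWord);
}

function shouldUseVision({ message, attachments = [], modelSupportsImages }) {
  // Assumption: an explicit command always triggers the skill.
  if (isManualTrigger(message)) return true;
  // A multimodal main model can read the image itself, so no bridge is needed.
  if (modelSupportsImages) return false;
  return attachments.some(isImagePath);
}
```

Detecting vague visual complaints such as "the layout is messed up" needs judgment from the agent and is deliberately left out of this sketch.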

First Use: One-Click Initialization

bash
node SKILL_DIR/scripts/vision.mjs init
The command starts an interactive guide with three steps:
  1. Select a provider: choose from the preset list of mainstream platforms
  2. Enter the key: provide the API key (or the name of an environment variable that holds it)
  3. Select a model: the available models are fetched from the API for selection (if the fetch fails, a recommended list is shown instead)
The supported platforms cover mainstream international and Chinese providers:
| Category | Platforms |
| --- | --- |
| International | OpenAI, Google Gemini, Anthropic Claude, DeepSeek, Groq, Mistral, xAI (Grok), OpenRouter, Fireworks AI |
| Domestic (China) | Tongyi Qianwen (Qwen VL), Zhipu GLM (GLM-4V), Moonshot (Kimi), StepFun (Step), MiniMax, SiliconFlow, Xiaomi MiMo |
| Local | Ollama, LM Studio |
| Custom | Any OpenAI-compatible third-party platform (enter the baseUrl manually) |

Add Backup Models

bash
node SKILL_DIR/scripts/vision.mjs config add
This runs the same interactive guide. Models added here serve as fallbacks and are tried automatically when the primary model fails.

All Configuration Commands

bash
# Interactive
node SKILL_DIR/scripts/vision.mjs init                          # Initialize the primary model
node SKILL_DIR/scripts/vision.mjs config add                    # Add a fallback model
node SKILL_DIR/scripts/vision.mjs config edit [name]            # Edit a model

# Quick commands
node SKILL_DIR/scripts/vision.mjs config list                   # List all models
node SKILL_DIR/scripts/vision.mjs config primary [name]         # Set the primary model
node SKILL_DIR/scripts/vision.mjs config remove <name>          # Delete a model
node SKILL_DIR/scripts/vision.mjs config set-key <name> <key>   # Set the API key
node SKILL_DIR/scripts/vision.mjs config set-url <name> <url>   # Set the API URL
node SKILL_DIR/scripts/vision.mjs config test [name]            # Test connectivity

Usage: Image Recognition

Single Image

bash
node SKILL_DIR/scripts/vision.mjs ./screenshot.png
node SKILL_DIR/scripts/vision.mjs ./ui.png "What's wrong with the layout of this interface?"
node SKILL_DIR/scripts/vision.mjs "https://example.com/img.png" "Describe this image"

Multiple Images

bash
node SKILL_DIR/scripts/vision.mjs img1.png img2.png "Compare the differences between these two images"
node SKILL_DIR/scripts/vision.mjs ./screenshots/*.png "Analyze these interface screenshots"
node SKILL_DIR/scripts/vision.mjs ./local.png https://example.com/remote.jpg "Describe these two images"
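When an agent calls the script from code instead of a shell, the argument order from the examples above can be captured in a helper. This is a sketch under assumptions: `buildVisionArgs` is a hypothetical name, and the rule that the question comes last is inferred from the examples, not from documented argument parsing.

```javascript
// Hypothetical helper: builds the argument list for vision.mjs as the examples show it:
// one or more image paths or URLs first, then an optional question as the last argument.
function buildVisionArgs(scriptPath, images, question) {
  if (!Array.isArray(images) || images.length === 0) {
    throw new Error("at least one image path or URL is required");
  }
  const args = [scriptPath, ...images];
  // The question is optional; the single-image example runs without one.
  if (typeof question === "string" && question.trim() !== "") {
    args.push(question);
  }
  return args;
}

const args = buildVisionArgs(
  "SKILL_DIR/scripts/vision.mjs",
  ["img1.png", "img2.png"],
  "Compare the differences between these two images"
);
console.log(args.join(" "));
```

Passing such an array to `execFile("node", args)` from `node:child_process` avoids shell quoting problems with questions that contain quotes or spaces. `execFile` does not run a shell by default, so a glob such as `./screenshots/*.png` would have to be expanded before the call.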

Find Images

If the user mentions an image but gives no path, search for it first:
bash
find . -name "*.png" -o -name "*.jpg" -o -name "*.webp" | head -20
ls -lt *.png *.jpg *.webp 2>/dev/null

Workflow After Obtaining Results

When the script succeeds, the plain text written to stdout is the recognition result. stderr carries only logs and does not affect the result.
  1. Read the recognition result: the stdout content is the image description
  2. Combine it with the user's question: relate the description to what the user asked for
  3. Let the main model continue: the main model uses the recognition result as context to complete the remaining task
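The three steps above can be sketched as two small functions. This is an illustration, not the skill's own code: `interpretVisionResult` and `buildContext` are hypothetical names, and the `result` object is assumed to have the shape that Node's `child_process.spawnSync` returns when called with `{ encoding: "utf8" }`.

```javascript
// Step 1: read the recognition result.
// `result` is assumed to look like a spawnSync result: { status, stdout, stderr }.
function interpretVisionResult(result) {
  if (result.status !== 0) {
    // Non-zero (or null) exit status: treat as failure and surface the stderr logs.
    return { ok: false, error: (result.stderr || "").trim() };
  }
  // On success, stdout alone is the image description; stderr is ignored.
  return { ok: true, description: (result.stdout || "").trim() };
}

// Steps 2 and 3: combine the description with the user's question so the
// main model can continue with the description as context.
// The layout of the context text is an assumption made for this sketch.
function buildContext(description, userQuestion) {
  return [
    "Image description (from the vision model):",
    description,
    "",
    "User request:",
    userQuestion,
  ].join("\n");
}
```

Keeping the description clearly labeled as vision-model output fits the iron rule: the main model reasons over the text, and the vision model contributes nothing beyond the description.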

Fallback Mechanism

The ★ primary model, listed first in config list, is called first. If it fails, the remaining models are tried automatically in order. If every model fails, the script exits with a non-zero exit code.
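The fallback order described above amounts to a simple loop. The sketch below assumes a hypothetical `recognize` function that calls one model and throws on failure, and a `models` array ordered as in `config list`. It is synchronous for brevity; real API calls would be asynchronous, and the same loop works with `await`.

```javascript
// Sketch of the fallback order. `models` is ordered as in `config list`, primary first.
// `recognize` is a hypothetical function that calls one model and throws on failure.
function recognizeWithFallback(models, recognize) {
  const errors = [];
  for (const model of models) {
    try {
      // The first model that succeeds wins; later models are never called.
      return { model: model.name, text: recognize(model) };
    } catch (err) {
      // Remember why this model failed, then move on to the next one.
      errors.push(`${model.name}: ${err.message}`);
    }
  }
  // Every model failed; a caller would turn this into a non-zero exit code.
  throw new Error(`all models failed\n${errors.join("\n")}`);
}
```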

Environment Variables

| Variable | Description |
| --- | --- |
| VISION_CONFIG_PATH | Custom configuration file path |
| VISION_DEFAULT_MODEL | Temporarily overrides the primary model (matched by name) |
| VISION_API_KEY | Global API key fallback |
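One plausible reading of how these variables interact is sketched below. All of it is assumption: the function names, the `name` and `apiKey` fields on a model entry, the `defaultPath` parameter, and the choice to fall back to the first configured model when `VISION_DEFAULT_MODEL` matches nothing. The source states only what each variable is for.

```javascript
// Hypothetical resolution helpers; field names and precedence are assumptions.

// VISION_CONFIG_PATH replaces the default configuration file location.
function resolveConfigPath(env, defaultPath) {
  return env.VISION_CONFIG_PATH || defaultPath;
}

// VISION_DEFAULT_MODEL temporarily overrides the primary model, matched by name.
// Assumption: an unknown name falls back to the first configured model.
function pickPrimary(env, models) {
  const override = env.VISION_DEFAULT_MODEL;
  return models.find((m) => m.name === override) || models[0];
}

// VISION_API_KEY is a global fallback, used only when the model has no key of its own.
function resolveApiKey(env, model) {
  return model.apiKey || env.VISION_API_KEY;
}

// Placeholder model entries for illustration only.
const exampleModels = [{ name: "model-a", apiKey: "key-a" }, { name: "model-b" }];
const chosen = pickPrimary({ VISION_DEFAULT_MODEL: "model-b" }, exampleModels);
console.log(chosen.name, resolveApiKey({ VISION_API_KEY: "global-key" }, chosen));
```

In a real process, `env` would be `process.env`; passing it in as a parameter keeps the helpers easy to test.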