glm-understand-image

Perform image understanding and analysis with the GLM Vision MCP server.

Execution Flow (installation is needed only on first use; afterwards, go directly to Step 6)

Step 1: Check and Install Dependencies

1.1 Check if mcporter is available

```bash
npx -y mcporter --version
```
If the command succeeds, mcporter is available; go to Step 2.
mcporter runs directly through npx, so no separate installation is needed. The later steps call `mcporter` directly. If it is not installed globally, run those commands as `npx -y mcporter ...` instead (for example, `npx -y mcporter list`).
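Step 1 relies on `npx`, and the Step 2 check uses `python3`, so both need to be on `PATH`. Below is a minimal pre-check sketch in POSIX shell. `require_cmd` is a hypothetical helper and is not part of mcporter.

```shell
# Hypothetical helper: succeed if every named command is on PATH;
# otherwise report each missing one on stderr and return 1.
require_cmd() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# npx (shipped with Node.js/npm) runs mcporter; python3 reads glm.json in Step 2.
require_cmd npx python3 || echo "install Node.js and Python 3 before continuing" >&2
```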

Step 2: Check API Key Configuration

```bash
cat ~/.openclaw/config/glm.json 2>/dev/null | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('api_key', ''))"
```
If a non-empty API key is returned, skip to Step 4.
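When `glm.json` does not exist yet, the command above prints a Python traceback on stderr, because `json.load` receives empty input. Stdout stays empty, so the check itself still works. The sketch below treats a missing or malformed file as "no key" without the traceback. `read_glm_key` is a hypothetical helper name, and its default path is the one this skill uses.

```shell
# Hypothetical helper: print the api_key stored in the GLM config file,
# or print nothing if the file is missing, unreadable, or not valid JSON.
read_glm_key() {
  config=${1:-"$HOME/.openclaw/config/glm.json"}
  python3 - "$config" <<'PY'
import json, sys
try:
    with open(sys.argv[1], encoding="utf-8") as f:
        data = json.load(f)
except (OSError, ValueError):  # missing file, or invalid JSON/encoding
    data = {}
key = data.get("api_key", "") if isinstance(data, dict) else ""
print(key if isinstance(key, str) else "")
PY
}

key=$(read_glm_key)
if [ -n "$key" ]; then
  echo "API key found; skip to Step 4."
else
  echo "No API key configured; continue with Step 3."
fi
```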

Step 3: Configure API Key (if not configured)

3.2 Request Key from User if Not Found

Ask the user to provide a Zhipu API key.
If the user does not have a Zhipu API key, they can purchase one at https://www.bigmodel.cn/glm-coding?ic=OOKF4KGGTW.

3.3 Save API Key

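```bash
mkdir -p ~/.openclaw/config
# Replace API_KEY with the user's actual Zhipu API key.
# Quoting 'EOF' stops the shell from expanding characters such as $ in the key.
cat > ~/.openclaw/config/glm.json << 'EOF'
{
  "api_key": "API_KEY"
}
EOF
# The file holds a secret, so restrict it to the current user.
chmod 600 ~/.openclaw/config/glm.json
```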
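The heredoc above copies the key into the JSON text verbatim. A pasted key with stray spaces, quotes, or backslashes could therefore produce a wrong or invalid file. The sketch below assumes Python 3 is available, as in Step 2, and lets the `json` module handle escaping while keeping the file private. `save_glm_key` is a hypothetical helper.

```shell
# Hypothetical helper: save_glm_key API_KEY [CONFIG_PATH]
# Writes {"api_key": "..."} as valid JSON, readable only by the current user.
save_glm_key() {
  key=$1
  config=${2:-"$HOME/.openclaw/config/glm.json"}
  if [ -z "$key" ]; then
    echo "usage: save_glm_key API_KEY [CONFIG_PATH]" >&2
    return 1
  fi
  mkdir -p "$(dirname "$config")" || return 1
  (
    # A restrictive umask makes a newly created file mode 600.
    umask 077
    # The key is passed through an environment variable, not the command line.
    GLM_KEY=$key python3 - "$config" <<'PY'
import json, os, sys
key = os.environ["GLM_KEY"].strip()  # drop stray whitespace from pasting
with open(sys.argv[1], "w", encoding="utf-8") as f:
    json.dump({"api_key": key}, f, indent=2)
    f.write("\n")
PY
  ) || return 1
  # chmod also covers a file that already existed with looser permissions.
  chmod 600 "$config"
}
```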

Step 4: Add MCP Server

Use mcporter to add the GLM Vision MCP server:
```bash
mcporter config add glm-vision \
  --command "npx -y @z_ai/mcp-server" \
  --env Z_AI_API_KEY="your-key" \
  --env Z_AI_MODE="ZHIPU" \
  --env HOME="$PWD"
```
Note: replace `your-key` with the actual Zhipu API key from Step 2 or Step 3. `HOME` is set to the current working directory to avoid log-file permission issues. Because `$PWD` is expanded when the command runs, the server's `HOME` becomes the directory in which you ran this command.
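Instead of editing `your-key` by hand, you can pass the key in as an argument. The sketch below reuses exactly the mcporter options shown above. The function name and the `DRY_RUN` switch, which only prints the command, are hypothetical additions.

```shell
# Hypothetical wrapper around the Step 4 command: add_glm_vision API_KEY
# Assumes mcporter is on PATH (otherwise use `npx -y mcporter`, as in Step 1).
# With DRY_RUN=1 it prints the command, one argument per line, and runs nothing.
add_glm_vision() {
  if [ -z "$1" ]; then
    echo "usage: add_glm_vision API_KEY" >&2
    return 1
  fi
  # Rebuild the positional parameters as the full mcporter command line.
  set -- mcporter config add glm-vision \
    --command "npx -y @z_ai/mcp-server" \
    --env "Z_AI_API_KEY=$1" \
    --env "Z_AI_MODE=ZHIPU" \
    --env "HOME=$PWD"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}

# Example, reading the key saved in Step 3:
#   add_glm_vision "$(python3 -c 'import json,os; print(json.load(open(os.path.expanduser("~/.openclaw/config/glm.json")))["api_key"])')"
```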

Step 5: Test Connection

```bash
mcporter list
```
Confirm that the `glm-vision` server has been added successfully.
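The Step 5 check can also be scripted by searching the `mcporter list` output for the server name. The sketch below assumes the name `glm-vision` appears verbatim in that output; this guide does not document the exact output format. `has_server` is hypothetical.

```shell
# Hypothetical check: succeed if the server name occurs in `mcporter list`
# output read from stdin (matched as a fixed string, not a regex).
has_server() {
  grep -F -q -- "$1"
}

# Real usage (requires mcporter):
#   if mcporter list | has_server glm-vision; then
#     echo "glm-vision is configured"
#   else
#     echo "glm-vision not found; repeat Step 4" >&2
#   fi
```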

Step 6: Use MCP to Process Images

6.1 Prepare Images

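Place the image at an accessible path, for example:
  • ~/.openclaw/workspace/images/image-name.jpg
  • Or use a URL
When passing a local file as `image_source`, prefer an absolute path (for example `"$HOME/..."` rather than `"~/..."`). The shell does not expand `~` inside quotes, and Step 4 sets the server's `HOME` to the working directory.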
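Before you call a tool, you can check a local path and turn it into an absolute one. The sketch below does this and passes URLs through unchanged. `resolve_image_source` and the example path `images/photo.jpg` are hypothetical.

```shell
# Hypothetical helper: resolve_image_source PATH_OR_URL
# URLs are printed unchanged; local files are checked and printed as absolute paths.
resolve_image_source() {
  src=$1
  case $src in
    http://*|https://*)
      printf '%s\n' "$src"
      return 0
      ;;
    "~/"*)
      # Expand a literal leading ~/ using the caller's HOME.
      src="$HOME/${src#"~/"}"
      ;;
  esac
  if [ ! -f "$src" ]; then
    echo "image not found: $src" >&2
    return 1
  fi
  case $src in
    /*) printf '%s\n' "$src" ;;
    *)  printf '%s/%s\n' "$(pwd)" "$src" ;;
  esac
}

# Example:
#   img=$(resolve_image_source images/photo.jpg) &&
#     mcporter call glm-vision.analyze_image prompt="Describe this image" image_source="$img"
```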

6.2 Call MCP Tool Using mcporter

Call the MCP service with mcporter:
```bash
mcporter call glm-vision.analyze_image prompt="<question about the image>" image_source="<image path or URL>"
```
Examples:
```bash
# Describe the image content
mcporter call glm-vision.analyze_image prompt="Describe the content of this image in detail" image_source="$HOME/image.jpg"

# Use a URL
mcporter call glm-vision.analyze_image prompt="What does this image show?" image_source="https://example.com/image.jpg"

# Extract text from an image
mcporter call glm-vision.extract_text_from_screenshot image_source="$HOME/screenshot.png"

# Diagnose an error screenshot
mcporter call glm-vision.diagnose_error_screenshot prompt="Analyze this error" image_source="$HOME/error.png"
```
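Every example above follows one pattern: `mcporter call glm-vision.<tool>` with `key=value` arguments, and `extract_text_from_screenshot` is called without a `prompt`. The sketch below wraps that pattern and omits `prompt` when it is empty. `glm_vision_call` and `DRY_RUN` are hypothetical names.

```shell
# Hypothetical wrapper: glm_vision_call TOOL IMAGE_SOURCE [PROMPT]
# Builds `mcporter call glm-vision.TOOL [prompt=...] image_source=...`.
# With DRY_RUN=1 it prints the arguments, one per line, instead of running mcporter.
glm_vision_call() {
  tool=$1 src=$2 prompt=${3:-}
  if [ -z "$tool" ] || [ -z "$src" ]; then
    echo "usage: glm_vision_call TOOL IMAGE_SOURCE [PROMPT]" >&2
    return 1
  fi
  set -- mcporter call "glm-vision.$tool"
  # Pass prompt only when one was given (the OCR example above has none).
  if [ -n "$prompt" ]; then
    set -- "$@" "prompt=$prompt"
  fi
  set -- "$@" "image_source=$src"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}

# Example:
#   glm_vision_call diagnose_error_screenshot "$HOME/error.png" "Analyze this error"
```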

6.3 API Parameter Description

| Parameter | Description | Type |
|---|---|---|
| image_source | Image path or URL | string (required) |
| prompt | Question about the image | string (required) |

Supported Tools

Important: if you run into problems, follow the official documentation, which takes precedence over this guide: https://docs.bigmodel.cn/cn/coding-plan/mcp/vision-mcp-server
The GLM Vision MCP server provides the following tools:
  • ui_to_artifact
    - Convert UI screenshots into code, prompts, design specifications, or natural language descriptions
  • extract_text_from_screenshot
    - Extract and recognize text from screenshots using advanced OCR capabilities
  • diagnose_error_screenshot
    - Parse screenshots of error pop-ups, stack traces, and logs to locate the problem and suggest fixes
  • understand_technical_diagram
    - Generate structured interpretations of technical diagrams such as architecture diagrams, flowcharts, UML, and ER diagrams
  • analyze_data_visualization
    - Read dashboards and statistical charts to extract trends, anomalies, and key business insights
  • ui_diff_check
    - Compare two UI screenshots to identify visual differences and implementation deviations
  • analyze_image
    - General-purpose image understanding for visual content not covered by the specialized tools
  • video_analysis
    - Analyze video scenes in formats such as MP4/MOV/M4V, capturing key frames, events, and key points
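For each request, the agent has to pick one of these tools. The sketch below shows one way to encode that choice. The task keywords (`ocr`, `error`, `chart`, and so on) are hypothetical labels; only the tool names come from the list above.

```shell
# Hypothetical routing helper: glm_tool_for KEYWORD
# The keywords are illustrative; the tool names come from the list above.
# Any unrecognized keyword falls back to the general-purpose analyze_image.
glm_tool_for() {
  case $1 in
    ui-to-code)        echo ui_to_artifact ;;
    ocr|text)          echo extract_text_from_screenshot ;;
    error|stacktrace)  echo diagnose_error_screenshot ;;
    diagram|uml)       echo understand_technical_diagram ;;
    chart|dashboard)   echo analyze_data_visualization ;;
    ui-diff)           echo ui_diff_check ;;
    video)             echo video_analysis ;;
    *)                 echo analyze_image ;;
  esac
}

# Example: glm_tool_for ocr prints extract_text_from_screenshot
```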

MCP Configuration

MCP server name: `glm-vision`
MCP server package (run with npx): `@z_ai/mcp-server`
Environment variables:
  • Z_AI_API_KEY
    - Zhipu API key (required)
  • Z_AI_MODE
    - Service platform; defaults to ZHIPU
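MCP clients that read a JSON file instead of using mcporter commonly express these same settings as an `mcpServers` entry. The sketch below prints such an entry from the values above. It assumes the client uses that common layout, so check the client's own documentation and the official page linked above. `print_mcp_json` is hypothetical.

```shell
# Hypothetical helper: print_mcp_json API_KEY [MODE]   (MODE defaults to ZHIPU)
# Prints an mcpServers JSON entry for glm-vision built from the settings above.
print_mcp_json() {
  GLM_KEY=$1 GLM_MODE=${2:-ZHIPU} python3 - <<'PY'
import json, os
config = {
    "mcpServers": {
        "glm-vision": {
            "command": "npx",
            "args": ["-y", "@z_ai/mcp-server"],
            "env": {
                "Z_AI_API_KEY": os.environ["GLM_KEY"],
                "Z_AI_MODE": os.environ["GLM_MODE"],
            },
        }
    }
}
print(json.dumps(config, indent=2))
PY
}
```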