glm-understand-image
Perform image understanding and analysis using the GLM Vision MCP server.
Execution Flow (installation is required on first use; afterwards, skip directly to Step 6)
Step 1: Check and Install Dependencies
1.1 Check if mcporter is available
bash
npx -y mcporter --version
If the command succeeds, mcporter is available; skip to Step 2.
mcporter can be used directly via npx without installation.
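Step 1 assumes that npx is already on the PATH, and Step 2 also relies on python3. A minimal sketch of a prerequisite check is shown here; the helper names have_cmd and check_prereqs are hypothetical and not part of mcporter.

```shell
# Return success if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report every missing prerequisite on stderr; succeed only if all are present.
check_prereqs() {
  missing=""
  for cmd in "$@"; do
    have_cmd "$cmd" || missing="$missing $cmd"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

# Example: check_prereqs npx python3 && npx -y mcporter --version
```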
Step 2: Check API Key Configuration
bash
python3 -c "import json,os; p=os.path.expanduser('~/.openclaw/config/glm.json'); print(json.load(open(p)).get('api_key','') if os.path.isfile(p) else '')"
If a non-empty API Key is returned, skip to Step 4.
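The branch between Step 3 and Step 4 can be sketched as a pair of shell functions. This is an illustration under assumptions: read_glm_key and next_step_after_key_check are hypothetical helper names, and the config path is the one used in Step 2. A missing file, invalid JSON, or a missing field is treated as "no key".

```shell
# Print the stored API key, or nothing if the file or the field is absent.
# The first argument overrides the config path (default: the path from Step 2).
read_glm_key() {
  config_file="${1:-$HOME/.openclaw/config/glm.json}"
  [ -f "$config_file" ] || return 0
  python3 -c 'import json, sys
try:
    with open(sys.argv[1]) as f:
        print(json.load(f).get("api_key") or "")
except (OSError, ValueError, AttributeError):
    pass' "$config_file"
}

# Print which step to run next, depending on whether a key was found.
next_step_after_key_check() {
  if [ -n "$(read_glm_key "$1")" ]; then
    echo "Step 4"
  else
    echo "Step 3"
  fi
}
```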
Step 3: Configure API Key (if not configured)
3.2 Request Key from User if Not Found
Ask the user to provide a Zhipu API Key.
If the user does not have a Zhipu API Key, they can purchase one at
https://www.bigmodel.cn/glm-coding?ic=OOKF4KGGTW.
3.3 Save API Key
bash
mkdir -p ~/.openclaw/config
# Replace API_KEY below with the key provided by the user.
cat > ~/.openclaw/config/glm.json << EOF
{
  "api_key": "API_KEY"
}
EOF
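The save step can also be wrapped in a function so that the key is passed as an argument instead of being edited into the heredoc. This is a sketch, not part of the original procedure: save_glm_key is a hypothetical name, and the chmod 600 call is an optional hardening that the steps above do not include.

```shell
# Write the key to glm.json inside the given directory
# (default: ~/.openclaw/config, as in Step 3.3).
save_glm_key() {
  key="$1"
  config_dir="${2:-$HOME/.openclaw/config}"
  if [ -z "$key" ]; then
    echo "API key is empty" >&2
    return 1
  fi
  mkdir -p "$config_dir" || return 1
  # json.dump handles quoting, so the file stays valid JSON.
  python3 -c 'import json, sys
with open(sys.argv[1], "w") as f:
    json.dump({"api_key": sys.argv[2]}, f, indent=2)
    f.write("\n")' "$config_dir/glm.json" "$key" || return 1
  # Assumption: restrict the file to its owner, since it holds a secret.
  chmod 600 "$config_dir/glm.json"
}
```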
Step 4: Add MCP Server
Use mcporter to add the GLM Vision MCP server:
bash
mcporter config add glm-vision \
--command "npx -y @z_ai/mcp-server" \
--env Z_AI_API_KEY="your-key" \
--env Z_AI_MODE="ZHIPU" \
--env HOME="$PWD"
Note: Replace your-key with the actual Zhipu API Key. Set the HOME environment variable to the current working directory to avoid log file permission issues.
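To avoid registering the server with the placeholder left in place, the command can be wrapped in a small guard. This is a sketch under assumptions: add_glm_vision is a hypothetical function name, and it uses only the mcporter flags that already appear in Step 4.

```shell
# Register the glm-vision server with the key given as the first argument.
# Refuses to run with an empty key or with the "your-key" placeholder.
add_glm_vision() {
  key="$1"
  if [ -z "$key" ] || [ "$key" = "your-key" ]; then
    echo "refusing to register glm-vision without a real API key" >&2
    return 1
  fi
  mcporter config add glm-vision \
    --command "npx -y @z_ai/mcp-server" \
    --env Z_AI_API_KEY="$key" \
    --env Z_AI_MODE="ZHIPU" \
    --env HOME="$PWD"
}
```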
Step 5: Test Connection
Confirm that the glm-vision server has been successfully added.
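One possible way to run this check is sketched here. It assumes that the mcporter list subcommand prints the names of the configured servers. That assumption is not confirmed by this document, so verify it against the mcporter help output first. glm_vision_registered is a hypothetical helper name.

```shell
# Succeed if the server listing mentions glm-vision.
# Assumption: `mcporter list` prints the configured server names.
glm_vision_registered() {
  mcporter list 2>/dev/null | grep -q 'glm-vision'
}

# Example: glm_vision_registered && echo "glm-vision is configured"
```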
Step 6: Use MCP to Process Images
6.1 Prepare Images
- Place images in an accessible path, for example: ~/.openclaw/workspace/images/image-name.jpg
- Or use a URL
6.2 Call MCP Tool Using mcporter
Call the MCP service using mcporter:
bash
mcporter call glm-vision.analyze_image prompt="<Question about the image>" image_source="<Image path or URL>"
Examples:
bash
# Local paths use $HOME because the shell does not expand ~ inside double quotes
# Describe image content
mcporter call glm-vision.analyze_image prompt="Describe the content of this image in detail" image_source="$HOME/image.jpg"
# Use URL
mcporter call glm-vision.analyze_image prompt="What does this image show?" image_source="https://example.com/image.jpg"
# Extract text from image
mcporter call glm-vision.extract_text_from_screenshot image_source="$HOME/screenshot.png"
# Diagnose error screenshot
mcporter call glm-vision.diagnose_error_screenshot prompt="Analyze this error" image_source="$HOME/error.png"
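Because image_source accepts either a local path or a URL, it can help to normalize the value before calling the tool. The sketch below passes URLs through unchanged, rewrites a leading ~/ to $HOME/ (the shell does not expand a tilde inside quotes), and rejects local paths that do not exist. resolve_image_source is a hypothetical helper, not part of mcporter or the MCP server.

```shell
# Print a usable image_source value, or fail if a local file is missing.
resolve_image_source() {
  src="$1"
  case "$src" in
    http://*|https://*)
      printf '%s\n' "$src"
      return 0
      ;;
    "~/"*)
      # Drop the two leading characters ("~/") and prepend $HOME.
      src="$HOME/${src#??}"
      ;;
  esac
  if [ ! -f "$src" ]; then
    echo "image not found: $src" >&2
    return 1
  fi
  printf '%s\n' "$src"
}

# Example:
#   img="$(resolve_image_source '~/image.jpg')" &&
#     mcporter call glm-vision.analyze_image prompt="Describe this image" image_source="$img"
```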
6.3 API Parameter Description
| Parameter | Description | Type |
|---|---|---|
| image_source | Image path or URL | string (Required) |
| prompt | Question about the image | string (Required) |
Supported Tools
Important Note: Refer to the official documentation if issues occur.
Official Documentation:
https://docs.bigmodel.cn/cn/coding-plan/mcp/vision-mcp-server
The GLM Vision MCP server provides the following tools:
- Convert UI screenshots into code, prompts, design specifications, or natural language descriptions
- extract_text_from_screenshot - Extract and recognize text from screenshots using advanced OCR capabilities
- diagnose_error_screenshot - Parse error pop-ups, stack traces, and log screenshots to locate the fault and suggest fixes
- understand_technical_diagram - Generate structured interpretations of technical diagrams such as architecture diagrams, flowcharts, UML, and ER diagrams
- analyze_data_visualization - Read dashboards and statistical charts to extract trends, anomalies, and key business points
- Compare two UI screenshots to identify visual differences and implementation deviations
- General image understanding, suitable for visual content not covered by the specialized tools
- Analyze video scenes in formats such as MP4/MOV/M4V, capturing key frames, events, and key points
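Choosing a tool for a given task can be sketched as a simple lookup. The task keywords (ocr, error, diagram, chart) are hypothetical labels invented for this illustration. The tool names are limited to those that appear by name in this document, with analyze_image, as used in Step 6.2, as the fallback.

```shell
# Map a short task keyword to one of the tool names used in this document.
pick_glm_tool() {
  case "$1" in
    ocr)     echo "extract_text_from_screenshot" ;;
    error)   echo "diagnose_error_screenshot" ;;
    diagram) echo "understand_technical_diagram" ;;
    chart)   echo "analyze_data_visualization" ;;
    *)       echo "analyze_image" ;;
  esac
}

# Example: mcporter call "glm-vision.$(pick_glm_tool ocr)" image_source="$HOME/screenshot.png"
```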
MCP Configuration
MCP Server Configuration:
Environment Variables:
- Z_AI_API_KEY - Zhipu API Key (Required)
- Z_AI_MODE - Service platform selection, default is