glm-understand-image

Perform image understanding and analysis using GLM Vision MCP. Trigger conditions: (1) the user requests image analysis, image understanding, or a description of image content; (2) objects, text, or scenes in an image need to be identified; (3) the user wants to use GLM's visual understanding capabilities.

4 installs

Source: thincher/awsome_skills

Added on: 2026-03-01

NPX Install

npx skill4agent add thincher/awsome_skills glm-understand-image

SKILL.md Content (translated from Chinese)

glm-understand-image

Perform image understanding and analysis using GLM Vision MCP server.

Execution Flow (setup is required on first use; afterwards, go directly to Step 6)

Step 1: Check and Install Dependencies

1.1 Check if mcporter is available

bash

npx -y mcporter --version

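If the command succeeds, mcporter is available; continue with Step 2.

mcporter runs directly via npx, so no separate installation is needed. The later steps invoke it as mcporter; if it is not on your PATH, prefix those commands with npx -y.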

Step 2: Check API Key Configuration

bash

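# Prints the stored key, or an empty line if the config file does not exist
python3 -c "import json,os; p=os.path.expanduser('~/.openclaw/config/glm.json'); print(json.load(open(p)).get('api_key', '') if os.path.exists(p) else '')"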

If the command prints a non-empty API key, skip to Step 4; otherwise continue with Step 3.
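The check above can also be sketched without Python. This is a minimal sketch, assuming the config file uses the flat single-key layout that Step 3.3 writes; the function name `glm_api_key` is hypothetical and not part of the skill.

```shell
# Print the api_key value from a flat glm.json, or nothing if the file is missing.
# Assumes the simple one-key layout written in Step 3.3 (no nested objects, no escapes).
glm_api_key() {
  config="${1:-$HOME/.openclaw/config/glm.json}"
  [ -r "$config" ] || return 0
  sed -n 's/.*"api_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1
}
```

For example, `key=$(glm_api_key)` followed by `[ -n "$key" ]` decides between Step 4 (key present) and Step 3 (key missing).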

Step 3: Configure API Key (if not configured)

3.2 Request Key from User if Not Found

Ask the user to provide their Zhipu API key.

If the user does not have a Zhipu API key, they can obtain one at https://www.bigmodel.cn/glm-coding?ic=OOKF4KGGTW.

3.3 Save API Key

bash

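mkdir -p ~/.openclaw/config
# Replace API_KEY with the key provided by the user
cat > ~/.openclaw/config/glm.json << 'EOF'
{
  "api_key": "API_KEY"
}
EOF
# The file holds a secret, so restrict it to the current user
chmod 600 ~/.openclaw/config/glm.json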
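To avoid editing the placeholder by hand, the save step can be wrapped in a function that takes the key as an argument. This is a sketch only: `save_glm_key` is a hypothetical helper, and it assumes the key contains no double quotes or backslashes, which would need JSON escaping.

```shell
# Write the key to CONFIG_DIR/glm.json (default: ~/.openclaw/config), readable by the owner only.
# Assumes the key needs no JSON escaping.
save_glm_key() {
  key="$1"
  dir="${2:-$HOME/.openclaw/config}"
  if [ -z "$key" ]; then
    echo "usage: save_glm_key KEY [CONFIG_DIR]" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  printf '{\n  "api_key": "%s"\n}\n' "$key" > "$dir/glm.json" || return 1
  chmod 600 "$dir/glm.json"
}
```

Calling `save_glm_key "the-key-from-the-user"` writes the same JSON layout as the command above.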

Step 4: Add MCP Server

Use mcporter to add the GLM Vision MCP server:

bash

mcporter config add glm-vision \
  --command "npx -y @z_ai/mcp-server" \
  --env Z_AI_API_KEY="your-key" \
  --env Z_AI_MODE="ZHIPU" \
  --env HOME="$PWD"

Note: Replace `your-key` with the actual Zhipu API key. The `HOME` environment variable is set to the current working directory to avoid log file permission issues.
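The registration can be wrapped so the key is passed as an argument rather than pasted into the command. This is a minimal sketch that reuses exactly the flags shown above; the function name `add_glm_vision` and the `MCPORTER` override variable are assumptions added for illustration and dry runs, not part of mcporter.

```shell
# Register the glm-vision server with the given key.
# MCPORTER (hypothetical override) lets you substitute "npx -y mcporter" or "echo" for a dry run.
add_glm_vision() {
  key="$1"
  if [ -z "$key" ]; then
    echo "usage: add_glm_vision KEY" >&2
    return 1
  fi
  ${MCPORTER:-mcporter} config add glm-vision \
    --command "npx -y @z_ai/mcp-server" \
    --env Z_AI_API_KEY="$key" \
    --env Z_AI_MODE="ZHIPU" \
    --env HOME="$PWD"
}
```

Setting `MCPORTER=echo` before calling `add_glm_vision "the-key"` prints the arguments instead of running mcporter, which is a quick way to check them.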

Step 5: Test Connection

bash

mcporter list

Confirm that the `glm-vision` server appears in the output, which shows it was added successfully.
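The confirmation can be scripted. This sketch assumes only that the server name appears somewhere in the output of `mcporter list`; the exact output format is not specified here, and both `glm_vision_registered` and the `MCPORTER` override are hypothetical names.

```shell
# Succeed if the list output mentions glm-vision.
# Assumes the server name appears verbatim in the output of "mcporter list".
glm_vision_registered() {
  ${MCPORTER:-mcporter} list 2>/dev/null | grep -q 'glm-vision'
}
```

For example, `glm_vision_registered && echo "glm-vision is configured"` can gate Step 6.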

Step 6: Use MCP to Process Images

6.1 Prepare Images

Place the image at an accessible path, for example ~/.openclaw/workspace/images/image-name.jpg, or use an image URL.
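Because the shell does not expand `~` inside double quotes, it can help to normalize the image source before passing it on. The following is a sketch of one way to do that; `resolve_image_source` is a hypothetical helper, and whether the server itself expands `~` is not assumed either way.

```shell
# Print a usable image source: URLs pass through, local paths become absolute.
# Fails if a local file does not exist.
resolve_image_source() {
  src="$1"
  case "$src" in
    http://*|https://*)
      printf '%s\n' "$src"
      return 0
      ;;
    "~/"*)
      rest=${src#\~/}          # strip the literal "~/" prefix
      src="$HOME/$rest"
      ;;
  esac
  case "$src" in
    /*) ;;                      # already absolute
    *) src="$PWD/$src" ;;       # make relative paths absolute
  esac
  if [ ! -f "$src" ]; then
    echo "image not found: $src" >&2
    return 1
  fi
  printf '%s\n' "$src"
}
```

The result can then be used as the `image_source` value in Step 6.2.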

6.2 Call MCP Tool Using mcporter

Call the MCP service using mcporter:

bash

mcporter call glm-vision.analyze_image prompt="<Question about the image>" image_source="<Image path or URL>"

Examples:

bash

# Describe image content ($HOME is used because ~ is not expanded inside double quotes)
mcporter call glm-vision.analyze_image prompt="Describe the content of this image in detail" image_source="$HOME/image.jpg"

# Use URL
mcporter call glm-vision.analyze_image prompt="What does this image show?" image_source="https://example.com/image.jpg"

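# Extract text from image (a prompt is included because 6.3 lists it as required)
mcporter call glm-vision.extract_text_from_screenshot prompt="Extract all text from this screenshot" image_source="$HOME/screenshot.png"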

# Diagnose error screenshot
mcporter call glm-vision.diagnose_error_screenshot prompt="Analyze this error" image_source="$HOME/error.png"

6.3 API Parameter Description

Parameter	Type	Required	Description
image_source	string	Yes	Image path or URL
prompt	string	Yes	Question about the image
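Since both parameters are required, a thin wrapper can reject incomplete calls before they reach the server. This is a sketch under the assumption that every tool takes the `prompt` and `image_source` pair listed above; `glm_call` and the `MCPORTER` override are hypothetical names.

```shell
# Call a glm-vision tool after checking that both required parameters are present.
# MCPORTER (hypothetical override) can be set to "echo" to print the call instead of running it.
glm_call() {
  tool="$1"
  question="$2"
  image="$3"
  if [ -z "$tool" ] || [ -z "$question" ] || [ -z "$image" ]; then
    echo "usage: glm_call TOOL PROMPT IMAGE_SOURCE" >&2
    return 2
  fi
  ${MCPORTER:-mcporter} call "glm-vision.$tool" "prompt=$question" "image_source=$image"
}
```

For example, `glm_call analyze_image "Describe this image" "https://example.com/image.jpg"` issues the same call as the URL example in 6.2.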

Supported Tools

Important: if issues occur, refer to the official documentation: https://docs.bigmodel.cn/cn/coding-plan/mcp/vision-mcp-server

The GLM Vision MCP server provides the following tools:

- `ui_to_artifact`: Convert UI screenshots into code, prompts, design specifications, or natural language descriptions
- `extract_text_from_screenshot`: Extract and recognize text from screenshots using OCR
- `diagnose_error_screenshot`: Parse error pop-ups, stack traces, and log screenshots to locate the problem and suggest fixes
- `understand_technical_diagram`: Generate structured interpretations of technical diagrams such as architecture diagrams, flowcharts, UML, and ER diagrams
- `analyze_data_visualization`: Read dashboards and statistical charts to extract trends, anomalies, and key business points
- `ui_diff_check`: Compare two UI screenshots to identify visual differences and implementation deviations
- `analyze_image`: General image understanding, suitable for visual content not covered by the specialized tools
- `video_analysis`: Analyze video scenes in formats such as MP4, MOV, and M4V, capturing key frames, events, and key points
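A script that accepts a tool name from a caller can check it against the eight tools listed above before invoking mcporter. This is a small sketch; `is_glm_vision_tool` is a hypothetical helper, and the list is copied from this section rather than queried from the server.

```shell
# Succeed only for tool names listed in this section.
is_glm_vision_tool() {
  case "$1" in
    ui_to_artifact|extract_text_from_screenshot|diagnose_error_screenshot) return 0 ;;
    understand_technical_diagram|analyze_data_visualization|ui_diff_check) return 0 ;;
    analyze_image|video_analysis) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_glm_vision_tool "$tool" || tool=analyze_image` falls back to the general-purpose tool when the name is not recognized.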

MCP Configuration

MCP Server Name: `glm-vision`

MCP Server Configuration: `@z_ai/mcp-server`

Environment Variables:

- `Z_AI_API_KEY`: Zhipu API key (required)
- `Z_AI_MODE`: Service platform selection; the default is `ZHIPU`
