minimax-image-understanding

MiniMax Image Understanding Skill

Use this skill when you need to analyze, describe, or extract information from images.

How to Use

Call the `understand_image` tool directly with a prompt and an image URL:

```
understand_image({
  prompt: "Your question about the image",
  image_url: "https://example.com/image.png"
})
```

When to Use

Use `understand_image` for:

- Screenshots: error messages, UI issues, code in screenshots
- Visual content: photos, diagrams, charts, graphs
- Documents: extracting text from images (OCR), understanding layouts
- UI/UX analysis: evaluating designs, identifying components
- Visual debugging: understanding visual bugs or layout issues

When NOT to Use

Do NOT use `understand_image` when:

- The image has already been described in the conversation
- The image is a simple icon or emoji you already recognize
- No image is provided, or the image URL is inaccessible
- The result would be redundant with existing context (e.g., the file contents are already visible)

Usage

```
understand_image({
  prompt: "What do you see in this image?",
  image_url: "https://example.com/screenshot.png"
})
```

API Details

Endpoint:

POST {api_host}/v1/coding_plan/vlm

Request Body:

```json
{
  "prompt": "Your question about the image",
  "image_url": "data:image/jpeg;base64,/9j/4AAQ..."
}
```

Response Format:

```json
{
  "content": "AI analysis of the image...",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
```
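
If you call the endpoint directly instead of going through the tool, the sketch below shows one way to send the documented request body and check `base_resp`. It is a minimal TypeScript illustration, not an official client. This page does not document the auth scheme, so the `Authorization: Bearer` header is an assumption. `VlmRequest`, `VlmResponse`, `FetchLike`, and `callVlm` are hypothetical names, and `apiHost` stands in for `{api_host}`. The fetch function is passed in so the HTTP call can be stubbed. In Node.js 18+ you can pass the global `fetch`.

```typescript
// Request and response shapes from the endpoint documentation above.
interface VlmRequest {
  prompt: string;
  image_url: string; // base64 data URL, e.g. "data:image/jpeg;base64,..."
}

interface VlmResponse {
  content: string;
  base_resp: { status_code: number; status_msg: string };
}

// Minimal fetch-compatible signature so the HTTP call can be stubbed.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POST to {apiHost}/v1/coding_plan/vlm and return the
// analysis text, throwing on HTTP errors or a non-zero base_resp.status_code.
async function callVlm(
  apiHost: string,
  apiKey: string,
  request: VlmRequest,
  fetchImpl: FetchLike,
): Promise<string> {
  const res = await fetchImpl(`${apiHost}/v1/coding_plan/vlm`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);

  const data = (await res.json()) as VlmResponse;
  // status_code 0 means success; anything else is an API-level error.
  const { status_code, status_msg } = data.base_resp;
  if (status_code !== 0) throw new Error(`VLM error ${status_code}: ${status_msg}`);
  return data.content;
}
```

The sketch treats a non-zero `status_code` as a failure even when the HTTP status is 200. The response format puts API-level errors in `base_resp`, not in the HTTP status.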

Image Processing

The tool automatically handles three types of image inputs:

- HTTP/HTTPS URLs: downloads the image and converts it to base64
  - Example: `https://example.com/image.jpg`
- Local file paths: reads the file and converts it to base64
  - Absolute: `/Users/username/Documents/image.png`
  - Relative: `images/photo.png`
  - A leading `@` prefix, if present, is removed
- Base64 data URLs: passes existing base64 data through unchanged
  - Example: `data:image/png;base64,iVBORw0KGgo...`
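
The three input kinds above can all be normalized into the base64 data URL that the request body expects. This TypeScript sketch illustrates the idea under stated assumptions and is not the tool's actual implementation. `toDataUrl` and `MIME_BY_EXT` are hypothetical names. Relative paths are assumed to resolve against the current working directory. Downloads use the global `fetch` available in Node.js 18+.

```typescript
import { readFile } from "node:fs/promises";
import * as path from "node:path";

// MIME types for the supported formats, keyed by lowercase file extension.
const MIME_BY_EXT: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

// Hypothetical helper: normalize a URL, local path, or data URL into a
// base64 data URL for the request body's image_url field.
async function toDataUrl(input: string): Promise<string> {
  // Base64 data URLs are passed through unchanged.
  if (input.startsWith("data:")) return input;

  let bytes: Buffer;
  let mime: string | undefined;
  if (/^https?:\/\//i.test(input)) {
    // HTTP/HTTPS URL: download with the global fetch (Node.js 18+).
    const res = await fetch(input);
    if (!res.ok) throw new Error(`Cannot download ${input}: HTTP ${res.status}`);
    bytes = Buffer.from(await res.arrayBuffer());
    // Prefer the Content-Type header; fall back to the URL path's extension.
    mime =
      res.headers.get("content-type")?.split(";")[0].trim() ||
      MIME_BY_EXT[path.extname(new URL(input).pathname).toLowerCase()];
  } else {
    // Local path: strip a leading "@", then resolve against the working directory.
    const filePath = path.resolve(input.startsWith("@") ? input.slice(1) : input);
    bytes = await readFile(filePath);
    // Unsupported extensions (e.g. .gif) have no entry and stay undefined.
    mime = MIME_BY_EXT[path.extname(filePath).toLowerCase()];
  }
  if (!mime) throw new Error(`Unsupported or unknown image type: ${input}`);
  return `data:${mime};base64,${bytes.toString("base64")}`;
}
```

For example, `await toDataUrl("@images/photo.png")` reads `images/photo.png` and returns a string beginning with `data:image/png;base64,`.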

Image Formats

Supported:

- JPEG (.jpg, .jpeg)
- PNG (.png)
- WebP (.webp)

Not supported:

- PDF, GIF, PSD, SVG, and other formats
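
A file's extension does not guarantee its contents. One way to enforce the supported list before uploading is to check each file's leading signature bytes, as in this TypeScript sketch. `sniffImageType` is a hypothetical helper, and this page does not say whether the service performs a similar check. The signatures themselves are the standard ones. JPEG files start with `FF D8 FF`, PNG files start with the 8-byte PNG signature, and WebP files start with a RIFF header whose form type is `WEBP`.

```typescript
// The three formats the documentation lists as supported.
type SupportedMime = "image/jpeg" | "image/png" | "image/webp";

// Hypothetical helper: identify a supported format from the file's leading
// signature bytes; returns null for anything else (GIF, PDF, PSD, SVG, ...).
function sniffImageType(bytes: Uint8Array): SupportedMime | null {
  // True if `sig` appears in `bytes` starting at `offset`.
  const matchesAt = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);

  // JPEG: FF D8 FF
  if (matchesAt([0xff, 0xd8, 0xff])) return "image/jpeg";
  // PNG: 89 50 4E 47 0D 0A 1A 0A
  if (matchesAt([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  // WebP: "RIFF" at offset 0, then "WEBP" at offset 8 (bytes 4-7 hold the chunk size)
  if (matchesAt([0x52, 0x49, 0x46, 0x46]) && matchesAt([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return null;
}
```

Reading the first 12 bytes of a file is enough for all three checks.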

Crafting Effective Prompts

For Descriptions

- "Describe what's in this image in detail"
- "What is the main subject of this image?"
- "Describe the visual style and composition"

For Code/Technical

- "What code is shown in this screenshot?"
- "Extract all text from this image"
- "Identify the UI framework/components used"

For Analysis

- "Analyze this UI design. What is working well and what could be improved?"
- "What emotions or mood does this image convey?"
- "Compare this design to Material Design principles"

For OCR/Text Extraction

- "Extract all text from this image"
- "Read the error message in this screenshot"
- "What does the label say in this image?"

Examples

Error Analysis

```
understand_image({
  prompt: "What is the error message and where is it located in this screenshot?",
  image_url: "./error-screenshot.png"
})
```

Code Screenshot

```
understand_image({
  prompt: "What code is shown in this screenshot? Please transcribe it exactly.",
  image_url: "https://example.com/code.png"
})
```

Design Review

```
understand_image({
  prompt: "Analyze this UI design. What is working well and what could be improved?",
  image_url: "https://example.com/mockup.png"
})
```

OCR

```
understand_image({
  prompt: "Extract all text from this image",
  image_url: "/Users/username/Documents/scan.png"
})
```

Tips

- Be specific in your prompt about what you want to know.
- Specify the output format if you need structured output (e.g., "list all elements").
- Include context if the image is part of a larger task.
- For screenshots, specify whether you need the full page analyzed or only a specific area.
- Prompts that request complex analysis (using verbs such as analyze, extract, describe, recognize, transcribe, or read) may trigger a confirmation prompt.

Error Handling

- Status code 1004: authentication error; check your API key and region.
- Status code 2038: real-name verification is required.
- Invalid image: the file does not exist or the URL is inaccessible.
- Unsupported format: the image is not JPEG, PNG, or WebP.
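
A client can translate the documented status codes into actionable messages before showing them to a user. In this TypeScript sketch, `BaseResp` and `describeVlmError` are hypothetical names. Only codes 0, 1004, and 2038 come from this page. Any other code falls back to the server's `status_msg`.

```typescript
// Shape of the base_resp object in the documented response format.
interface BaseResp {
  status_code: number;
  status_msg: string;
}

// Hypothetical helper: return null on success, otherwise a readable message.
function describeVlmError(resp: BaseResp): string | null {
  switch (resp.status_code) {
    case 0:
      return null; // success
    case 1004:
      return "Authentication error: check your API key and region.";
    case 2038:
      return "Real-name verification is required.";
    default:
      // Undocumented codes: surface the server's own message.
      return `Request failed (${resp.status_code}): ${resp.status_msg}`;
  }
}
```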
