minimax-image-understanding
MiniMax Image Understanding Skill
Use this skill when you need to analyze, describe, or extract information from images.
How to Use
Call the understand_image tool directly with a prompt and image URL:
understand_image({
  prompt: "Your question about the image",
  image_url: "https://example.com/image.png"
})
When to Use
Use understand_image when:
- Screenshots: Error messages, UI issues, code in screenshots
- Visual content: Photos, diagrams, charts, graphs
- Documents: Extracting text from images (OCR), understanding layouts
- UI/UX analysis: Evaluating designs, identifying components
- Visual debugging: Understanding visual bugs or layout issues
When NOT to Use
Do NOT use understand_image when:
- The image is already described in the conversation
- The image is a simple icon or emoji you recognize
- No image is provided or the image URL is inaccessible
- The analysis would be redundant with existing context (e.g., file contents are already visible)
Usage
understand_image({
  prompt: "What do you see in this image?",
  image_url: "https://example.com/screenshot.png"
})
API Details
Endpoint:
POST {api_host}/v1/coding_plan/vlm
Request Body (JSON):
{
  "prompt": "Your question about the image",
  "image_url": "data:image/jpeg;base64,/9j/4AAQ..."
}
Response Format (JSON):
{
  "content": "AI analysis of the image...",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
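The request and response shapes above can be exercised without any network access. The following is a minimal Python sketch, not the skill's actual implementation. It assumes that a status_code of 0 signals success (as in the sample response). The helper names build_request and extract_content are hypothetical, and authentication headers are left out because the source does not specify them.

```python
import json
from typing import Tuple


def build_request(api_host: str, prompt: str, image_data_url: str) -> Tuple[str, str]:
    """Return the endpoint URL and JSON body for one image-understanding call."""
    url = f"{api_host.rstrip('/')}/v1/coding_plan/vlm"
    body = json.dumps({"prompt": prompt, "image_url": image_data_url})
    return url, body


def extract_content(response: dict) -> str:
    """Return the analysis text, or raise if base_resp reports a failure."""
    base = response.get("base_resp", {})
    # Assumption: status_code 0 means success, as in the sample response.
    if base.get("status_code") != 0:
        raise RuntimeError(
            f"API error {base.get('status_code')}: {base.get('status_msg')}"
        )
    return response["content"]
```

Keeping request construction and response parsing in separate functions lets each be checked on its own before an HTTP client is wired in.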
Image Processing
The tool automatically handles three types of image inputs:
- HTTP/HTTPS URLs: Downloads the image and converts to base64
  - Example: https://example.com/image.jpg
- Local file paths: Reads local files and converts to base64
  - Absolute: /Users/username/Documents/image.png
  - Relative: images/photo.png
  - Removes the @ prefix if present
- Base64 data URLs: Passes through existing base64 data
  - Example: data:image/png;base64,iVBORw0KGgo...
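The three input types can be told apart by their prefix. The sketch below is an illustration in Python under stated assumptions, not the tool's real code. The function names are hypothetical, the MIME type is guessed from the file extension, and the HTTP download step is left out so the example needs no network access.

```python
import base64
import mimetypes
from pathlib import Path


def classify_image_source(source: str) -> str:
    """Label an input as 'data_url', 'http_url', or 'local_path'."""
    if source.startswith("data:"):
        return "data_url"
    if source.startswith(("http://", "https://")):
        return "http_url"
    return "local_path"


def local_file_to_data_url(source: str) -> str:
    """Read a local image and return it as a base64 data URL."""
    # Drop a single leading "@", mirroring the prefix removal described above.
    path = Path(source[1:] if source.startswith("@") else source)
    # Assumption: the MIME type is inferred from the file extension.
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"
```

A data URL would be passed through unchanged, and an HTTP/HTTPS URL would be downloaded first and then encoded the same way as a local file.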
Image Formats
Supported:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- WebP (.webp)
Not supported:
- PDF, GIF, PSD, SVG, and other formats
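A caller can filter out unsupported files before invoking the tool. The following Python sketch checks only the file extension and treats it case-insensitively. Both choices are assumptions, since the source does not say how the tool detects formats, and is_supported_image is a hypothetical helper name. It covers file paths and HTTP/HTTPS URLs but not base64 data URLs.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the "Supported" list above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def is_supported_image(source: str) -> bool:
    """Return True if a path or URL ends in a supported image extension."""
    # urlparse drops any query string or fragment from a URL; a plain
    # file path is returned unchanged in the .path attribute.
    path = urlparse(source).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```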
Crafting Effective Prompts
For Descriptions
- "Describe what's in this image in detail"
- "What is the main subject of this image?"
- "Describe the visual style and composition"
For Code/Technical
- "What code is shown in this screenshot?"
- "Extract all text from this image"
- "Identify the UI framework/components used"
For Analysis
- "Analyze this UI design. What is working well and what could be improved?"
- "What emotions or mood does this image convey?"
- "Compare this design to Material Design principles"
For OCR/Text Extraction
- "Extract all text from this image"
- "Read the error message in this screenshot"
- "What does the label say in this image?"
Examples
Error Analysis
understand_image({
  prompt: "What is the error message and where is it located in this screenshot?",
  image_url: "./error-screenshot.png"
})
Code Screenshot
understand_image({
  prompt: "What code is shown in this screenshot? Please transcribe it exactly.",
  image_url: "https://example.com/code.png"
})
Design Review
understand_image({
  prompt: "Analyze this UI design. What is working well and what could be improved?",
  image_url: "https://example.com/mockup.png"
})
OCR
understand_image({
  prompt: "Extract all text from this image",
  image_url: "/Users/username/Documents/scan.png"
})
Tips
- Be specific in your prompt about what you want to know
- Mention format if you need structured output (e.g., "list all elements")
- Include context if the image is part of a larger task
- For screenshots, specify whether you need the full page analyzed or just a specific area
- Complex analysis may trigger a confirmation prompt (e.g., prompts that ask to analyze, extract, describe, recognize, transcribe, or read)
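The tips on specificity, output format, and context can be combined into one small prompt builder. This is a sketch only: build_prompt and its parameters are hypothetical and not part of the skill, and the wording it produces is one possible phrasing rather than a required format.

```python
from typing import Optional


def build_prompt(
    question: str,
    context: Optional[str] = None,
    output_format: Optional[str] = None,
) -> str:
    """Assemble a prompt from a question plus optional context and format."""
    parts = []
    if context:
        # Context first, so the question is read in light of the larger task.
        parts.append(f"Context: {context.strip()}")
    parts.append(question.strip())
    if output_format:
        parts.append(f"Format the answer as {output_format.strip()}.")
    return "\n".join(parts)
```

For example, build_prompt("What is the error message?", context="Login page after submitting the form", output_format="a bulleted list") yields a three-line prompt that can be passed as the prompt argument of understand_image.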
Error Handling
- Status code 1004: Authentication error - check API key and region
- Status code 2038: Real-name verification required
- Invalid image: File doesn't exist or URL is inaccessible
- Unsupported format: The image format is not JPEG, PNG, or WebP
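The two documented status codes can be turned into readable hints on the caller's side. The Python sketch below assumes that errors arrive in the base_resp object shown under API Details and that 0 means success. The describe_status helper is hypothetical, and any code not listed above is reported as unrecognized rather than guessed at.

```python
# Hints taken from the documented status codes above.
ERROR_HINTS = {
    1004: "Authentication error: check the API key and region",
    2038: "Real-name verification required",
}


def describe_status(base_resp: dict) -> str:
    """Return a human-readable summary of a base_resp object."""
    code = base_resp.get("status_code")
    # Assumption: 0 is the success code, as in the sample response.
    if code == 0:
        return "success"
    hint = ERROR_HINTS.get(code)
    if hint is not None:
        return f"{hint} (status {code})"
    return f"Unrecognized status {code}: {base_resp.get('status_msg', '')}"
```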