openai-image-vision

OpenAI Image Vision

Analyze images using OpenAI's GPT-4 Vision API. The model can understand visual elements including objects, shapes, colors, textures, and text within images.

Setup

This skill requires an OpenAI API key. If not configured:
  1. Get your API key from https://platform.openai.com/api-keys
  2. Set the key using:
    env_config(action="set", key="OPENAI_API_KEY", value="your-key")
Optional: Set custom API base URL (default: https://api.openai.com/v1):
bash
env_config(action="set", key="OPENAI_API_BASE", value="your-base-url")
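
A quick pre-flight check can catch a missing key before any API call is made. The sketch below assumes that env_config exposes OPENAI_API_KEY and OPENAI_API_BASE to the shell as environment variables, which the steps above do not state explicitly; check_openai_env is a hypothetical helper name, not part of the skill.

```shell
# Hypothetical helper (not part of the skill).
# Assumption: env_config makes these values visible as environment variables.
check_openai_env() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; see the Setup steps." >&2
    return 1
  fi
  # Fall back to the documented default when no custom base URL is configured.
  echo "${OPENAI_API_BASE:-https://api.openai.com/v1}"
}
```

Calling check_openai_env prints the base URL that would be used, or fails with a message on stderr when the key is missing.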

Usage

Important: Scripts are located relative to this skill's base directory.
When you see this skill in <available_skills>, note the <base_dir> path.
CRITICAL: Always use the bash command to execute the script.

General pattern (MUST start with bash):

bash "<base_dir>/scripts/vision.sh" "<image_path_or_url>" "<question>" [model]

DO NOT execute the script directly like this (WRONG):

"<base_dir>/scripts/vision.sh" ...

Parameters:

- image_path_or_url: Local image file path or HTTP(S) URL (required)

- question: Question to ask about the image (required)

- model: OpenAI model to use (default: gpt-4.1-mini)

Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4-turbo
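
The calling convention and parameters above can be wrapped in a small shell function so that every call goes through bash and the required arguments are checked first. This is only a sketch: run_vision and the BASE_DIR variable are hypothetical names, and BASE_DIR is assumed to hold the skill's <base_dir> path.

```shell
# Hypothetical wrapper (not part of the skill).
# Assumption: BASE_DIR holds the <base_dir> path noted from <available_skills>.
run_vision() {
  local image="$1" question="$2" model="${3:-}"
  if [ -z "${BASE_DIR:-}" ]; then
    echo "BASE_DIR is not set" >&2
    return 2
  fi
  if [ -z "$image" ] || [ -z "$question" ]; then
    echo "usage: run_vision <image_path_or_url> <question> [model]" >&2
    return 2
  fi
  if [ -n "$model" ]; then
    # Explicit model, e.g. gpt-4o-mini.
    bash "$BASE_DIR/scripts/vision.sh" "$image" "$question" "$model"
  else
    # Omit the third argument so the script applies its own default model.
    bash "$BASE_DIR/scripts/vision.sh" "$image" "$question"
  fi
}
```

For example, run_vision "/path/to/image.jpg" "What's in this image?" leaves the model choice to the script, while a third argument selects a model explicitly.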

Examples

Analyze a local image

bash
bash "<base_dir>/scripts/vision.sh" "/path/to/image.jpg" "What's in this image?"

Analyze an image from URL

bash
bash "<base_dir>/scripts/vision.sh" "https://example.com/image.jpg" "Describe this image in detail"

Use specific model

bash
bash "<base_dir>/scripts/vision.sh" "/path/to/photo.png" "What colors are prominent?" "gpt-4o-mini"

Extract text from image

bash
bash "<base_dir>/scripts/vision.sh" "/path/to/document.jpg" "Extract all text from this image"

Analyze multiple aspects

bash
bash "<base_dir>/scripts/vision.sh" "image.jpg" "List all objects you can see and describe the overall scene"

Supported Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • WebP (.webp)
Performance optimization: Files larger than 1MB are automatically resized so that the longest side is 800px, which keeps the request within command-line argument length limits. This happens transparently and is intended not to affect analysis quality.
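
To check a file against the format list and the 1MB threshold before calling the script, something like the following could be used. It is a sketch under stated assumptions: both function names are hypothetical, 1MB is taken to mean 1048576 bytes, and the check looks only at the file extension (it does not inspect file contents or handle URLs with query strings).

```shell
# Hypothetical helpers (not part of the skill).
is_supported_image() {
  local name="$1" ext
  case "$name" in
    *.*) ext="${name##*.}" ;;   # text after the last dot
    *) return 1 ;;              # no extension at all
  esac
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    jpg|jpeg|png|gif|webp) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: "1MB" means 1048576 bytes; the script's exact cutoff is not documented here.
exceeds_resize_threshold() {
  local size
  [ -f "$1" ] || return 2
  size="$(wc -c < "$1" | tr -d ' ')"
  [ "$size" -gt 1048576 ]
}
```

A caller might skip unsupported files early and log which inputs are expected to be resized.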

Response Format

The script returns a JSON response:
json
{
  "model": "gpt-4.1-mini",
  "content": "The image shows...",
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 567,
    "total_tokens": 1801
  }
}
Or in case of error:
json
{
  "error": "Error description",
  "details": "Additional error information"
}
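
Because success and failure share the same output channel, a caller has to inspect the JSON to tell them apart. The sketch below does this with jq, which is an assumption: the skill does not say jq is installed, and handle_vision_response is a hypothetical name. It relies only on the two response shapes shown above.

```shell
# Hypothetical helper (not part of the skill). Assumption: jq is available on PATH.
handle_vision_response() {
  local response="$1"
  # An "error" key marks the failure shape; report it on stderr.
  if printf '%s' "$response" | jq -e 'has("error")' >/dev/null 2>&1; then
    printf '%s' "$response" \
      | jq -r '"vision error: \(.error) (\(.details // "no details"))"' >&2
    return 1
  fi
  # Otherwise print only the model's answer.
  printf '%s' "$response" | jq -r '.content'
}
```

A typical use would be to capture the script's output in a variable and pass it to handle_vision_response, checking the return status before using the text.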

Notes

  • Image size: Files larger than 1MB are automatically resized (see Supported Image Formats)
  • Timeout: 60 seconds for API calls
  • Rate limits: Subject to your OpenAI API plan limits
  • Privacy: Images are sent to OpenAI's servers for processing
  • Local files: Automatically converted to base64 for API submission
  • URLs: Can be passed directly to the API without downloading
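
To illustrate the local-file note, the sketch below shows one way a file can be turned into a base64 data URL of the form data:<mime>;base64,<payload>. The script's actual implementation is not shown in this document, so this is an assumption about the general technique rather than a description of vision.sh; to_data_url is a hypothetical name.

```shell
# Hypothetical helper (not part of the skill); illustrates base64 data URLs only.
to_data_url() {
  local file="$1" mime
  # Pick a MIME type from the (case-insensitive) file extension.
  case "$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')" in
    jpg|jpeg) mime="image/jpeg" ;;
    png)      mime="image/png" ;;
    gif)      mime="image/gif" ;;
    webp)     mime="image/webp" ;;
    *) echo "unsupported image format: $file" >&2; return 1 ;;
  esac
  # Strip newlines so the encoded payload is a single line.
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$file" | tr -d '\n')"
}
```

Since the encoded payload grows with file size, a sketch like this also shows why large files are resized before submission.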