alicloud-ai-multimodal-qwen-vl

Category: provider

Model Studio Qwen VL (Image Understanding)

Use Qwen VL models for image-understanding tasks (image input, text output) through the DashScope compatible-mode API.

Prerequisites

  • Install dependencies (recommended in a venv):
bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
  • Set DASHSCOPE_API_KEY in the environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
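
The lookup order described above (environment variable first, then the credentials file) could be sketched as follows. This is a minimal sketch, not the skill's actual loader: the function name `resolve_api_key` is hypothetical, and the credentials file is assumed to be INI-style with the key stored under any section.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    credentials_path: Path = Path.home() / ".alibabacloud" / "credentials",
) -> Optional[str]:
    """Return the key from the environment, else from the credentials file."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key
    if not credentials_path.is_file():
        return None
    # Assumption: the file is INI-style; interpolation is disabled so that
    # a '%' character inside a key is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(credentials_path, encoding="utf-8")
    except configparser.Error:
        return None
    for section in parser.sections():
        value = parser.get(section, "dashscope_api_key", fallback="").strip()
        if value:
            return value
    return None
```

Passing `env` and `credentials_path` as parameters keeps the lookup easy to exercise without touching the real environment.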

Critical model names

Prefer the Qwen3 VL family:
  • qwen3-vl-plus
  • qwen3-vl-flash
When you need explicit "latest" routing or reproducible snapshots, use supported aliases/snapshots from the official model list, such as:
  • qwen3-vl-plus-latest
  • qwen3-vl-plus-2025-12-19
  • qwen3-vl-flash-latest
Legacy names still seen in some workloads:
  • qwen-vl-max-latest
  • qwen-vl-plus-latest
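
To tell the three kinds of model ID apart in code, a small classifier may help. It is a sketch that infers the naming pattern only from the examples listed above (a `-latest` suffix, or a `-YYYY-MM-DD` date suffix); the official model list remains the authority, and `model_id_kind` is a hypothetical helper name.

```python
import re

# Assumption: snapshot IDs end in a date, as in qwen3-vl-plus-2025-12-19.
_SNAPSHOT_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def model_id_kind(model_id: str) -> str:
    """Classify a model ID as 'snapshot', 'latest', or 'base'."""
    if _SNAPSHOT_SUFFIX.search(model_id):
        return "snapshot"
    if model_id.endswith("-latest"):
        return "latest"
    return "base"
```

A deployment check could, for example, warn whenever the configured model is not a snapshot.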

Normalized interface (multimodal.chat)

Request

  • prompt (string, required): the user's question or instruction about the image.
  • image (string, required): HTTPS URL, local path, or data: URL.
  • model (string, optional): default qwen3-vl-plus.
  • max_tokens (int, optional): default 512.
  • temperature (float, optional): default 0.2.
  • detail (string, optional): auto / low / high, default auto.
  • json_mode (bool, optional): return a JSON-only response when possible.
  • schema (object, optional): JSON Schema for structured extraction.
  • max_retries (int, optional): retry count for 429/5xx responses, default 2.
  • retry_backoff_s (float, optional): base delay in seconds for exponential backoff, default 1.5.
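
The request fields and defaults listed above can be captured in a small normalizer. This is a sketch under stated assumptions: `normalize_request`, `DEFAULTS`, and `ALLOWED_DETAIL` are hypothetical names, and json_mode and schema are left unset because no default is documented for them.

```python
from typing import Any, Dict

# Defaults taken from the field list above.
DEFAULTS: Dict[str, Any] = {
    "model": "qwen3-vl-plus",
    "max_tokens": 512,
    "temperature": 0.2,
    "detail": "auto",
    "max_retries": 2,
    "retry_backoff_s": 1.5,
}
ALLOWED_DETAIL = {"auto", "low", "high"}


def normalize_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Validate required fields and fill in documented defaults."""
    for field in ("prompt", "image"):
        value = request.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' is required and must be a non-empty string")
    merged = {**DEFAULTS, **request}  # caller-supplied values win
    if merged["detail"] not in ALLOWED_DETAIL:
        raise ValueError(f"'detail' must be one of {sorted(ALLOWED_DETAIL)}")
    return merged
```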

Response

  • text (string): primary model answer.
  • model (string): model actually used.
  • usage (object): token usage, if returned by the backend.
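
Mapping a backend reply onto these three fields might look like the sketch below. It assumes the compatible-mode endpoint returns the usual OpenAI-style chat completion shape (`choices[0].message.content`, `model`, `usage`); `normalize_response` is a hypothetical name, not necessarily what the skill's script uses.

```python
from typing import Any, Dict


def normalize_response(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a chat-completion payload to text, model, and usage."""
    choices = raw.get("choices") or []
    message = choices[0].get("message", {}) if choices else {}
    content = message.get("content") or ""
    # Defensive: if content arrives as a list of parts, join the text parts.
    if isinstance(content, list):
        content = "".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        )
    return {
        "text": content,
        "model": raw.get("model"),
        "usage": raw.get("usage"),  # None when the backend omits it
    }
```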

Quickstart

bash
# Prompt (in Chinese): "Please summarize the main content of this image."
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"请概括这张图里的主要内容","image":"https://example.com/demo.jpg"}' \
  --print-response
Using a local image:
bash
# Prompt (in Chinese): "Extract the key information from the image."
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"提取图片中的关键信息","image":"./samples/invoice.png","model":"qwen3-vl-plus"}' \
  --print-response
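
How analyze_image.py turns a local path into something the API can read is not shown here. Since the image field also accepts data: URLs, one plausible approach is to inline the file as a base64 data URL, as in this sketch (`to_image_url` is a hypothetical helper, and the MIME type is guessed from the file extension):

```python
import base64
import mimetypes
from pathlib import Path


def to_image_url(image: str) -> str:
    """Pass HTTPS and data: URLs through; inline local files as data URLs."""
    if image.startswith(("https://", "data:")):
        return image
    path = Path(image)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {image!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Base64 inflates the payload by roughly a third, which is one more reason to compress large images before sending them.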
Structured extraction (JSON mode):
bash
# Prompt (in Chinese): "Extract fields: title, amount, date"
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"提取字段: title, amount, date","image":"./samples/invoice.png"}' \
  --json-mode \
  --print-response
Structured extraction (JSON Schema):
bash
# Prompt (in Chinese): "Extract the invoice fields."
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"提取发票字段","image":"./samples/invoice.png"}' \
  --schema skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/references/examples/invoice.schema.json \
  --print-response
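
Because JSON mode returns JSON only "when possible", the caller may still want to parse the answer defensively. The sketch below strips an optional Markdown code fence, parses the text, and checks for required fields; `parse_json_answer` is a hypothetical helper, and the field names are borrowed from the JSON-mode example above.

```python
import json
from typing import Any, Dict, Iterable

FENCE = "`" * 3  # Markdown code-fence marker


def parse_json_answer(text: str, required: Iterable[str] = ()) -> Dict[str, Any]:
    """Parse a model answer as a JSON object and check required fields."""
    body = text.strip()
    if body.startswith(FENCE):
        lines = body.splitlines()
        # Drop the opening fence line and, if present, the closing one.
        lines = lines[1:-1] if lines[-1].strip() == FENCE else lines[1:]
        body = "\n".join(lines)
    data = json.loads(body)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

This only checks field presence; full JSON Schema validation would need a dedicated validator.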

cURL (compatible mode)

bash
# The text part (in Chinese) reads: "Please describe this image and list actionable steps."
curl -sS https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"qwen3-vl-plus",
    "messages":[
      {
        "role":"user",
        "content":[
          {"type":"image_url","image_url":{"url":"https://example.com/demo.jpg"}},
          {"type":"text","text":"请描述这张图并列出可执行动作"}
        ]
      }
    ],
    "max_tokens":512,
    "temperature":0.2
  }'
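
The same call can be made from Python with requests, the dependency installed under Prerequisites. This is a rough equivalent of the cURL command above rather than the skill's own client: `build_payload` and `chat` are hypothetical names, and the 60-second timeout is an arbitrary choice.

```python
import os
from typing import Any, Dict

ENDPOINT = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_payload(
    prompt: str,
    image_url: str,
    model: str = "qwen3-vl-plus",
    max_tokens: int = 512,
    temperature: float = 0.2,
) -> Dict[str, Any]:
    """Build the same JSON body as the cURL example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(payload: Dict[str, Any], timeout_s: float = 60.0) -> Dict[str, Any]:
    """POST the payload; raises on HTTP errors."""
    import requests  # third-party; imported here so build_payload works without it

    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}"},
        json=payload,  # also sets Content-Type: application/json
        timeout=timeout_s,
    )
    response.raise_for_status()
    return response.json()
```

Usage would be along the lines of `chat(build_payload("Describe this image", "https://example.com/demo.jpg"))`.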

Output location

  • If --output is set, the JSON response is saved to that file.
  • Default output dir convention: output/ai-multimodal-qwen-vl/.
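
A helper that follows this convention could look like the sketch below. The file name `response.json` and the function `save_response` are hypothetical; whether the script writes anything when --output is omitted is not stated here, so the fallback to the default directory is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

DEFAULT_OUTPUT_DIR = Path("output/ai-multimodal-qwen-vl")


def save_response(
    response: Dict[str, Any],
    output: Optional[str] = None,
    default_name: str = "response.json",  # hypothetical file name
) -> Path:
    """Write the response as UTF-8 JSON and return the path used."""
    target = Path(output) if output else DEFAULT_OUTPUT_DIR / default_name
    target.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Chinese text readable in the saved file.
    target.write_text(
        json.dumps(response, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return target
```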

Smoke test

bash
python tests/ai/multimodal/alicloud-ai-multimodal-qwen-vl-test/scripts/smoke_test_qwen_vl.py \
  --image output/ai-image-qwen-image/images/vl_test_cat.png

Error handling

| Error | Likely cause | Action |
| --- | --- | --- |
| 401/403 | Missing or invalid key | Check DASHSCOPE_API_KEY and account permissions. |
| 400 | Invalid request schema or unsupported image source | Validate messages content and image URL/path format. |
| 429 | Rate limit | Retry with exponential backoff and lower concurrency. |
| 5xx | Temporary backend issue | Retry with backoff and idempotent request design. |
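
The retry advice for 429 and 5xx can be sketched as a small wrapper around whatever function sends the request. The schedule used here, `retry_backoff_s * 2**attempt`, is an assumption: the skill only says that retry_backoff_s is the exponential backoff base, so the real script may compute delays differently. `call_with_retry` and `is_retryable` are hypothetical names.

```python
import time
from typing import Any, Callable, Tuple


def is_retryable(status: int) -> bool:
    """429 and 5xx are retried; 400/401/403 are not."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_retries: int = 2,
    retry_backoff_s: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() until it succeeds, fails permanently, or retries run out."""
    attempt = 0
    while True:
        status, body = send()
        if not is_retryable(status) or attempt >= max_retries:
            return status, body
        # Assumed schedule: 1.5 s, then 3 s, with the default settings.
        sleep(retry_backoff_s * (2 ** attempt))
        attempt += 1
```

Injecting `sleep` makes the schedule observable in tests without real waiting; adding random jitter to each delay is a common refinement when many clients retry at once.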

Operational guidance

  • For stable production behavior, pin snapshot model IDs instead of pure -latest aliases.
  • Compress very large images before upload to reduce latency and cost.
  • Add explicit extraction constraints in prompt (fields, JSON shape, language).
  • For OCR-like output, ask for confidence notes and unresolved text markers.
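
The advice on explicit extraction constraints can be made concrete with a small prompt builder. This is only one possible wording: `build_extraction_prompt` is a hypothetical helper, and using null as the marker for unreadable fields is an illustrative choice rather than something the skill prescribes.

```python
from typing import Sequence


def build_extraction_prompt(
    task: str, fields: Sequence[str], language: str = "English"
) -> str:
    """Compose a prompt that states the fields, JSON shape, and language."""
    shape = "{" + ", ".join(f'"{name}": string or null' for name in fields) + "}"
    return "\n".join(
        [
            task.strip(),
            f"Return only a JSON object with this shape: {shape}",
            f"Write all values in {language}.",
            "Use null for any field that is missing or unreadable; do not guess.",
        ]
    )
```

For example, `build_extraction_prompt("Extract the invoice fields.", ["title", "amount", "date"])` yields a four-line prompt that can be passed as the prompt field of a request.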

References

  • Source list: references/sources.md
  • API notes: references/api_reference.md