aliyun-qwen-ocr

Category: provider

Model Studio Qwen OCR

Validation


```bash
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo "py_compile_ok" > output/aliyun-qwen-ocr/validate.txt
```

Pass criteria: the command exits 0 and `output/aliyun-qwen-ocr/validate.txt` is generated.

Output and evidence


  • Save request payloads, the selected OCR task name, and normalized output expectations under `output/aliyun-qwen-ocr/`.
  • Keep the exact model, image source, and task configuration with each saved run.

Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.
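The evidence-keeping rule above can be sketched as a small run-record writer. This is a minimal illustration: `save_run`, the `run.json` filename, and the record fields are assumptions, not part of the skill's scripts.

```python
import json
import os
import time


def save_run(payload, out_dir="output/aliyun-qwen-ocr"):
    """Persist one OCR run: the request payload plus its model/image/task context."""
    os.makedirs(out_dir, exist_ok=True)
    record = {
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model": payload.get("model", "qwen-vl-ocr"),
        "image": payload.get("image"),
        "task": payload.get("task"),
        "task_config": payload.get("task_config"),
        "request": payload,
    }
    path = os.path.join(out_dir, "run.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return path
```

Keeping the full request alongside the model/image/task summary makes each saved run reproducible on its own.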

Critical model names


Use one of these exact model strings:
  • qwen-vl-ocr
  • qwen-vl-ocr-latest
  • qwen-vl-ocr-2025-11-20
  • qwen-vl-ocr-2025-08-28
  • qwen-vl-ocr-2025-04-13
  • qwen-vl-ocr-2024-10-28

Selection guidance:
  • Use `qwen-vl-ocr` for the stable channel.
  • Use `qwen-vl-ocr-latest` only when you explicitly want the newest OCR behavior.
  • Pin `qwen-vl-ocr-2025-11-20` when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
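The selection guidance can be encoded as a fail-fast helper. This is a sketch: `resolve_model` and its `reproducible` flag are hypothetical names, but the model strings are exactly the ones listed above.

```python
# Exact model strings from the list above; anything else fails fast.
QWEN_OCR_MODELS = {
    "qwen-vl-ocr",
    "qwen-vl-ocr-latest",
    "qwen-vl-ocr-2025-11-20",
    "qwen-vl-ocr-2025-08-28",
    "qwen-vl-ocr-2025-04-13",
    "qwen-vl-ocr-2024-10-28",
}


def resolve_model(name=None, reproducible=False):
    """Validate an explicit model string, or pick one per the guidance above."""
    if name is not None:
        if name not in QWEN_OCR_MODELS:
            raise ValueError("unknown Qwen OCR model: %r" % name)
        return name
    # Pin the snapshot for reproducible parsing; otherwise use the stable channel.
    return "qwen-vl-ocr-2025-11-20" if reproducible else "qwen-vl-ocr"
```

Failing on an unknown string catches typos like `qwen-ocr` before any API call is made.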

Prerequisites


  • Install dependencies (a venv is recommended):

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```

  • Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
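The two credential sources can be resolved in order like this. This sketch assumes the credentials file is INI-style with a `dashscope_api_key` option; verify the actual file format before relying on it.

```python
import configparser
import os


def resolve_api_key(cred_path="~/.alibabacloud/credentials"):
    """Prefer DASHSCOPE_API_KEY; fall back to the credentials file, else None."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = os.path.expanduser(cred_path)
    if os.path.exists(path):
        # Assumed INI layout: scan every section for a dashscope_api_key option.
        parser = configparser.ConfigParser()
        parser.read(path)
        for section in parser.sections():
            if parser.has_option(section, "dashscope_api_key"):
                return parser.get(section, "dashscope_api_key")
    return None
```

Returning `None` instead of raising lets the caller decide how to report a missing credential.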

Normalized interface (ocr.extract)


Request


  • `image` (string, required): HTTPS URL, local path, or `data:` URL.
  • `model` (string, optional): defaults to `qwen-vl-ocr`.
  • `prompt` (string, optional): use when you want custom extraction instructions.
  • `task` (string, optional): built-in OCR task.
  • `task_config` (object, optional): configuration for the built-in task, such as extraction fields.
  • `enable_rotate` (bool, optional): defaults to `false`.
  • `min_pixels` (int, optional)
  • `max_pixels` (int, optional)
  • `max_tokens` (int, optional)
  • `temperature` (float, optional): recommended to keep near default/low values.

Response


  • `text` (string): extracted text or structured markdown/HTML-style output.
  • `model` (string)
  • `usage` (object, optional)
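Assembling a normalized `ocr.extract` request from the parameters above might look like this. This is a sketch: `build_ocr_request` is a hypothetical helper, not the skill's `prepare_ocr_request.py`.

```python
def build_ocr_request(image, model="qwen-vl-ocr", prompt=None,
                      task=None, task_config=None, enable_rotate=False):
    """Assemble an ocr.extract request dict, omitting unset optional fields."""
    if not image:
        raise ValueError("image is required")
    request = {
        "image": image,           # HTTPS URL, local path, or data: URL
        "model": model,           # defaults to the stable channel
        "enable_rotate": enable_rotate,
    }
    if prompt is not None:
        request["prompt"] = prompt
    if task is not None:
        request["task"] = task
    if task_config is not None:
        request["task_config"] = task_config
    return request
```

Omitting unset optional fields keeps the saved payload minimal and makes diffs between runs meaningful.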

Built-in OCR tasks


Use one of these values in `task`:
  • text_recognition
  • key_information_extraction
  • document_parsing
  • table_parsing
  • formula_recognition
  • multi_lan
  • advanced_recognition
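A membership check keeps misspelled task names from ever reaching the API. This is a sketch; `check_task` is a hypothetical helper, but the task values are exactly those listed above.

```python
# The built-in OCR task values listed above.
BUILT_IN_TASKS = {
    "text_recognition",
    "key_information_extraction",
    "document_parsing",
    "table_parsing",
    "formula_recognition",
    "multi_lan",
    "advanced_recognition",
}


def check_task(task):
    """Fail fast on an unknown task value before sending a request."""
    if task not in BUILT_IN_TASKS:
        raise ValueError("unknown built-in OCR task: %r" % task)
    return task
```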

Quick start


Custom prompt:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/invoice.png" \
  --prompt "Extract seller name, invoice date, amount, and tax number in JSON."
```

Built-in task:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/table.png" \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```

Operational guidance


  • Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
  • For critical business fields, add downstream validation rules after OCR.
  • `qwen-vl-ocr` and older snapshots default to 4096 max output tokens unless Alibaba Cloud approves a higher limit; `qwen-vl-ocr-2025-11-20` follows the model maximum.
  • Increase `max_pixels` only when small text is missed; doing so raises token cost.
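The downstream-validation advice might look like this in practice. The rule set is purely illustrative: the field names and patterns are assumptions for an invoice scenario, not part of the skill.

```python
import re

# Hypothetical post-OCR checks for invoice-style fields.
RULES = {
    "invoice_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "amount": re.compile(r"\d+(\.\d{1,2})?"),
    "tax_number": re.compile(r"[0-9A-Z]{15,20}"),
}


def validate_fields(fields):
    """Return the names of fields that are missing or fail their rule."""
    errors = []
    for name, pattern in RULES.items():
        value = str(fields.get(name, ""))
        if not pattern.fullmatch(value):
            errors.append(name)
    return errors
```

Running such checks on extracted critical fields turns silent OCR mistakes into explicit errors.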

Output location


  • Default output: `output/aliyun-qwen-ocr/request.json`
  • Override the base directory with `OUTPUT_DIR`.
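Resolving the output path with the `OUTPUT_DIR` override can be sketched as follows; `default_request_path` is a hypothetical helper mirroring the documented default.

```python
import os


def default_request_path():
    """Resolve the request.json path, honoring the OUTPUT_DIR override."""
    base = os.environ.get("OUTPUT_DIR", "output")
    return os.path.join(base, "aliyun-qwen-ocr", "request.json")
```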

References


  • references/api_reference.md
  • references/sources.md