aliyun-qwen-ocr

Category: provider

Model Studio Qwen OCR

Validation


```bash
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo "py_compile_ok" > output/aliyun-qwen-ocr/validate.txt
```

Pass criteria: the command exits 0 and `output/aliyun-qwen-ocr/validate.txt` is generated.

Output and evidence


  • Save request payloads, the selected OCR task name, and normalized output expectations under `output/aliyun-qwen-ocr/`.
  • Keep the exact model, image source, and task configuration with each saved run.

Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.
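The evidence-keeping rule above can be sketched as a small run-record writer. This is a minimal illustration: `save_run`, the `run.json` filename, and the record fields are assumptions, not part of the skill's scripts.

```python
import json
import os
import time


def save_run(payload, out_dir="output/aliyun-qwen-ocr"):
    """Persist one OCR run: the request payload plus its model/image/task context."""
    os.makedirs(out_dir, exist_ok=True)
    record = {
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model": payload.get("model", "qwen-vl-ocr"),
        "image": payload.get("image"),
        "task": payload.get("task"),
        "task_config": payload.get("task_config"),
        "request": payload,
    }
    path = os.path.join(out_dir, "run.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return path
```

Keeping the full request alongside the model/image/task summary makes each saved run reproducible on its own.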

Critical model names


Use one of these exact model strings:
  • qwen-vl-ocr
  • qwen-vl-ocr-latest
  • qwen-vl-ocr-2025-11-20
  • qwen-vl-ocr-2025-08-28
  • qwen-vl-ocr-2025-04-13
  • qwen-vl-ocr-2024-10-28

Selection guidance:
  • Use `qwen-vl-ocr` for the stable channel.
  • Use `qwen-vl-ocr-latest` only when you explicitly want the newest OCR behavior.
  • Pin `qwen-vl-ocr-2025-11-20` when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
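The selection guidance can be encoded as a fail-fast helper. This is a sketch: `resolve_model` and its `reproducible` flag are hypothetical names, but the model strings are exactly the ones listed above.

```python
# Exact model strings from the list above; anything else fails fast.
QWEN_OCR_MODELS = {
    "qwen-vl-ocr",
    "qwen-vl-ocr-latest",
    "qwen-vl-ocr-2025-11-20",
    "qwen-vl-ocr-2025-08-28",
    "qwen-vl-ocr-2025-04-13",
    "qwen-vl-ocr-2024-10-28",
}


def resolve_model(name=None, reproducible=False):
    """Validate an explicit model string, or pick one per the guidance above."""
    if name is not None:
        if name not in QWEN_OCR_MODELS:
            raise ValueError("unknown Qwen OCR model: %r" % name)
        return name
    # Pin the snapshot for reproducible parsing; otherwise use the stable channel.
    return "qwen-vl-ocr-2025-11-20" if reproducible else "qwen-vl-ocr"
```

Failing on an unknown string catches typos like `qwen-ocr` before any API call is made.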

Prerequisites


  • Install dependencies (a venv is recommended):

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```

  • Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
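The two credential sources can be resolved in order like this. This sketch assumes the credentials file is INI-style with a `dashscope_api_key` option; verify the actual file format before relying on it.

```python
import configparser
import os


def resolve_api_key(cred_path="~/.alibabacloud/credentials"):
    """Prefer DASHSCOPE_API_KEY; fall back to the credentials file, else None."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = os.path.expanduser(cred_path)
    if os.path.exists(path):
        # Assumed INI layout: scan every section for a dashscope_api_key option.
        parser = configparser.ConfigParser()
        parser.read(path)
        for section in parser.sections():
            if parser.has_option(section, "dashscope_api_key"):
                return parser.get(section, "dashscope_api_key")
    return None
```

Returning `None` instead of raising lets the caller decide how to report a missing credential.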

Normalized interface (ocr.extract)


Request


  • `image` (string, required): HTTPS URL, local path, or `data:` URL.
  • `model` (string, optional): defaults to `qwen-vl-ocr`.
  • `prompt` (string, optional): use when you want custom extraction instructions.
  • `task` (string, optional): built-in OCR task.
  • `task_config` (object, optional): configuration for the built-in task, such as extraction fields.
  • `enable_rotate` (bool, optional): defaults to `false`.
  • `min_pixels` (int, optional)
  • `max_pixels` (int, optional)
  • `max_tokens` (int, optional)
  • `temperature` (float, optional): recommended to keep near default/low values.

Response


  • `text` (string): extracted text or structured markdown/HTML-style output.
  • `model` (string)
  • `usage` (object, optional)
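Assembling a normalized `ocr.extract` request from the parameters above might look like this. This is a sketch: `build_ocr_request` is a hypothetical helper, not the skill's `prepare_ocr_request.py`.

```python
def build_ocr_request(image, model="qwen-vl-ocr", prompt=None,
                      task=None, task_config=None, enable_rotate=False):
    """Assemble an ocr.extract request dict, omitting unset optional fields."""
    if not image:
        raise ValueError("image is required")
    request = {
        "image": image,           # HTTPS URL, local path, or data: URL
        "model": model,           # defaults to the stable channel
        "enable_rotate": enable_rotate,
    }
    if prompt is not None:
        request["prompt"] = prompt
    if task is not None:
        request["task"] = task
    if task_config is not None:
        request["task_config"] = task_config
    return request
```

Omitting unset optional fields keeps the saved payload minimal and makes diffs between runs meaningful.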

Built-in OCR tasks


Use one of these values in `task`:
  • text_recognition
  • key_information_extraction
  • document_parsing
  • table_parsing
  • formula_recognition
  • multi_lan
  • advanced_recognition
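A membership check keeps misspelled task names from ever reaching the API. This is a sketch; `check_task` is a hypothetical helper, but the task values are exactly those listed above.

```python
# The built-in OCR task values listed above.
BUILT_IN_TASKS = {
    "text_recognition",
    "key_information_extraction",
    "document_parsing",
    "table_parsing",
    "formula_recognition",
    "multi_lan",
    "advanced_recognition",
}


def check_task(task):
    """Fail fast on an unknown task value before sending a request."""
    if task not in BUILT_IN_TASKS:
        raise ValueError("unknown built-in OCR task: %r" % task)
    return task
```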

Quick start


Custom prompt:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/invoice.png" \
  --prompt "Extract seller name, invoice date, amount, and tax number in JSON."
```

Built-in task:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/table.png" \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```

Operational guidance


  • Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
  • For critical business fields, add downstream validation rules after OCR.
  • `qwen-vl-ocr` and older snapshots default to 4096 max output tokens unless Alibaba Cloud approves a higher limit; `qwen-vl-ocr-2025-11-20` follows the model maximum.
  • Increase `max_pixels` only when small text is missed; doing so raises token cost.
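The downstream-validation advice might look like this in practice. The rule set is purely illustrative: the field names and patterns are assumptions for an invoice scenario, not part of the skill.

```python
import re

# Hypothetical post-OCR checks for invoice-style fields.
RULES = {
    "invoice_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "amount": re.compile(r"\d+(\.\d{1,2})?"),
    "tax_number": re.compile(r"[0-9A-Z]{15,20}"),
}


def validate_fields(fields):
    """Return the names of fields that are missing or fail their rule."""
    errors = []
    for name, pattern in RULES.items():
        value = str(fields.get(name, ""))
        if not pattern.fullmatch(value):
            errors.append(name)
    return errors
```

Running such checks on extracted critical fields turns silent OCR mistakes into explicit errors.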

Output location


  • Default output: `output/aliyun-qwen-ocr/request.json`
  • Override the base directory with `OUTPUT_DIR`.
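Resolving the output path with the `OUTPUT_DIR` override can be sketched as follows; `default_request_path` is a hypothetical helper mirroring the documented default.

```python
import os


def default_request_path():
    """Resolve the request.json path, honoring the OUTPUT_DIR override."""
    base = os.environ.get("OUTPUT_DIR", "output")
    return os.path.join(base, "aliyun-qwen-ocr", "request.json")
```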

References


  • references/api_reference.md
  • references/sources.md