aliyun-qwen-ocr
Category: provider

Model Studio Qwen OCR
Validation
```bash
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo "py_compile_ok" > output/aliyun-qwen-ocr/validate.txt
```

Pass criteria: the command exits 0 and output/aliyun-qwen-ocr/validate.txt is generated.

Output And Evidence
- Save request payloads, the selected OCR task name, and normalized output expectations under output/aliyun-qwen-ocr/.
- Keep the exact model, image source, and task configuration with each saved run.

Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.
Critical model names
Use one of these exact model strings:

- qwen-vl-ocr
- qwen-vl-ocr-latest
- qwen-vl-ocr-2025-11-20
- qwen-vl-ocr-2025-08-28
- qwen-vl-ocr-2025-04-13
- qwen-vl-ocr-2024-10-28

Selection guidance:

- Use qwen-vl-ocr for the stable channel.
- Use qwen-vl-ocr-latest only when you explicitly want the newest OCR behavior.
- Pin qwen-vl-ocr-2025-11-20 when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
Prerequisites
- Install dependencies (recommended in a venv):

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```

- Set DASHSCOPE_API_KEY in the environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
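The credential lookup above can be sketched as follows. This is a minimal illustration, assuming the credentials file is INI-style and that the key lives under an option named dashscope_api_key; the function name and the env-dict parameter are for demonstration only, not part of the script's actual API.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(env=None, credentials_path="~/.alibabacloud/credentials"):
    """Return the DashScope API key, preferring the environment variable."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key
    # Fallback: look for dashscope_api_key in any section of the
    # ~/.alibabacloud/credentials file (assumed INI format).
    path = Path(credentials_path).expanduser()
    if path.exists():
        parser = configparser.ConfigParser()
        parser.read(path)
        for section in parser.sections():
            if parser.has_option(section, "dashscope_api_key"):
                return parser.get(section, "dashscope_api_key")
    return None
```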
Normalized interface (ocr.extract)
Request

- image (string, required): HTTPS URL, local path, or data: URL.
- model (string, optional): default qwen-vl-ocr.
- prompt (string, optional): use when you want custom extraction instructions.
- task (string, optional): built-in OCR task.
- task_config (object, optional): configuration for the built-in task, such as extraction fields.
- enable_rotate (bool, optional): default false.
- min_pixels (int, optional)
- max_pixels (int, optional)
- max_tokens (int, optional)
- temperature (float, optional): recommended to keep near default/low values.
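The parameter list above can be assembled into a payload like this. A minimal sketch under the stated defaults (model qwen-vl-ocr, enable_rotate false); the function name and dict shape are illustrative and do not describe prepare_ocr_request.py's internals.

```python
def build_ocr_request(image, model="qwen-vl-ocr", prompt=None, task=None,
                      task_config=None, enable_rotate=False, min_pixels=None,
                      max_pixels=None, max_tokens=None, temperature=None):
    """Assemble a normalized ocr.extract request dict from the fields above."""
    if not image:
        raise ValueError("image is required")
    request = {"image": image, "model": model, "enable_rotate": enable_rotate}
    optional = {
        "prompt": prompt, "task": task, "task_config": task_config,
        "min_pixels": min_pixels, "max_pixels": max_pixels,
        "max_tokens": max_tokens, "temperature": temperature,
    }
    # Drop unset optional fields so the saved payload stays minimal.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request
```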
Response
Response

- text (string): extracted text or structured markdown/html-style output.
- model (string)
- usage (object, optional)
Built-in OCR tasks
Use one of these values in task:

- text_recognition
- key_information_extraction
- document_parsing
- table_parsing
- formula_recognition
- multi_lan
- advanced_recognition
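Since the task field only accepts the exact strings listed, a small guard catches typos before a request is saved. A hedged sketch; the constant and function names are illustrative.

```python
# The built-in task names listed in this section.
BUILT_IN_TASKS = {
    "text_recognition", "key_information_extraction", "document_parsing",
    "table_parsing", "formula_recognition", "multi_lan",
    "advanced_recognition",
}


def validate_task(task):
    """Raise if task is not one of the documented built-in OCR tasks."""
    if task not in BUILT_IN_TASKS:
        raise ValueError(
            f"unknown OCR task: {task!r}; expected one of {sorted(BUILT_IN_TASKS)}"
        )
    return task
```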
Quick start
Custom prompt:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/invoice.png" \
  --prompt "Extract seller name, invoice date, amount, and tax number in JSON."
```

Built-in task:

```bash
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/table.png" \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```

Operational guidance
- Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
- For critical business fields, add downstream validation rules after OCR.
- qwen-vl-ocr and older snapshots default to 4096 max output tokens unless higher limits are approved by Alibaba Cloud; qwen-vl-ocr-2025-11-20 follows the model maximum.
- Increase max_pixels only when small text is missed; this raises token cost.
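The "downstream validation rules" bullet can be as simple as regex checks on the extracted fields. A minimal sketch; the field names and patterns below are illustrative assumptions for an invoice workflow, not part of the OCR response format.

```python
import re

# Hypothetical rules for critical invoice fields returned by OCR.
RULES = {
    "invoice_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "amount": re.compile(r"\d+(\.\d{1,2})?"),
}


def validate_fields(fields):
    """Return the names of fields whose extracted values fail their rule."""
    errors = []
    for name, pattern in RULES.items():
        value = fields.get(name, "")
        if not pattern.fullmatch(value):
            errors.append(name)
    return errors
```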
Output location
- Default output: output/aliyun-qwen-ocr/request.json
- Override the base directory with the OUTPUT_DIR environment variable.
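Before reusing a saved request.json, it can help to confirm the pinned model and task survived into the file. A hedged sketch: the top-level "model"/"task" keys are an assumption about the script's output shape, not documented behavior.

```python
import json


def check_request(path, expected_model=None, expected_task=None):
    """Load a prepared request file and verify the pinned fields (assumed keys)."""
    with open(path, encoding="utf-8") as fh:
        request = json.load(fh)
    if expected_model and request.get("model") != expected_model:
        raise ValueError(f"model mismatch: {request.get('model')!r}")
    if expected_task and request.get("task") != expected_task:
        raise ValueError(f"task mismatch: {request.get('task')!r}")
    return request
```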
References
- references/api_reference.md
- references/sources.md