silicon-paddle-ocr
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseOCR - Image Text Recognition
OCR - 图片文字识别
Use PaddleOCR to extract text content from images. Supports single image or batch processing.
使用PaddleOCR从图片中提取文本内容,支持单张图片或批量处理。
Overview
概述
This skill provides optical character recognition (OCR) capabilities using the PaddlePaddle/PaddleOCR-VL-1.5 model via the SiliconFlow API. Extract text from JPG, PNG, WebP, BMP, and GIF images.
本技能通过SiliconFlow API调用PaddlePaddle/PaddleOCR-VL-1.5模型,提供光学字符识别(OCR)能力。可从JPG、PNG、WebP、BMP和GIF格式的图片中提取文字。
When to Use
使用场景
Invoke this skill when:
- User wants to extract text from an image
- User asks to OCR a screenshot or photo
- User needs to read text from an image file
- User mentions text recognition from images
在以下情况调用本技能:
- 用户需要从图片中提取文字
- 用户要求对截图或照片进行OCR识别
- 用户需要读取图片文件中的文字
- 用户提及图片文字识别
How to Use
使用方法
Prerequisites
前置条件
Ensure the environment variable is set:
SILICONFLOW_API_KEYbash
export SILICONFLOW_API_KEY="your_api_key"确保已设置环境变量:
SILICONFLOW_API_KEYbash
export SILICONFLOW_API_KEY="your_api_key"Basic Usage
基础用法
Execute the OCR script:
bash
python3 scripts/ocr_skill.py [options] image_path执行OCR脚本:
bash
python3 scripts/ocr_skill.py [options] image_pathArguments
参数说明
| Argument | Description |
|---|---|
| Image file path(s) or glob pattern (required) |
| API key (default: from SILICONFLOW_API_KEY env) |
| OCR model name (default: PaddlePaddle/PaddleOCR-VL-1.5) |
| Recognition prompt for custom behavior |
| Output results in JSON format |
| Save results to specified file |
| Maximum tokens in response (default: 2000) |
| 参数 | 描述 |
|---|---|
| 图片文件路径或通配符模式(必填) |
| API密钥(默认:从SILICONFLOW_API_KEY环境变量读取) |
| OCR模型名称(默认:PaddlePaddle/PaddleOCR-VL-1.5) |
| 用于自定义识别行为的提示词 |
| 以JSON格式输出结果 |
| 将结果保存到指定文件 |
| 响应的最大令牌数(默认:2000) |
Examples
示例
Single image:
bash
python3 scripts/ocr_skill.py /path/to/image.jpgMultiple images with glob:
bash
python3 scripts/ocr_skill.py /path/to/images/*.pngJSON output format:
bash
python3 scripts/ocr_skill.py --json /path/to/image.jpgCustom prompt for table extraction:
bash
python3 scripts/ocr_skill.py -p "Please identify and format table content as Markdown" /path/to/table.jpgSave to file:
bash
python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpg单张图片识别:
bash
python3 scripts/ocr_skill.py /path/to/image.jpg通配符批量处理图片:
bash
python3 scripts/ocr_skill.py /path/to/images/*.pngJSON格式输出:
bash
python3 scripts/ocr_skill.py --json /path/to/image.jpg自定义提示词提取表格内容:
bash
python3 scripts/ocr_skill.py -p "请识别表格内容并格式化为Markdown格式" /path/to/table.jpg保存结果到文件:
bash
python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpgOutput Format
输出格式
Text output (default):
--- image.jpg ---
识别到的文字内容
识别到 X 处文字区域JSON output:
json
{
"image.jpg": {
"image_path": "/path/to/image.jpg",
"image_size": [width, height],
"texts": [
{
"text": "识别的文字",
"box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
}
],
"full_text": "所有文本的组合"
},
"image2.png": { ... }
}Coordinates Explanation:
- LOC values are normalized coordinates converted to pixel coordinates
- Conversion: pixel = LOC × (image_size / LOC_max_value)
- LOC max_value is approximately 972 (may vary by model/image)
- The field provides the four corner coordinates of each text region in pixel format
box
文本输出(默认):
--- image.jpg ---
识别到的文字内容
识别到 X 处文字区域JSON输出:
json
{
"image.jpg": {
"image_path": "/path/to/image.jpg",
"image_size": [width, height],
"texts": [
{
"text": "识别的文字",
"box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
}
],
"full_text": "所有文本的组合"
},
"image2.png": { ... }
}坐标说明:
- LOC值为归一化坐标,已转换为像素坐标
- 转换公式:像素坐标 = LOC × (图片尺寸 / LOC最大值)
- LOC最大值约为972(可能因模型/图片而异)
- 字段提供每个文字区域的四个角点像素坐标
box
Supported Image Formats
支持的图片格式
- JPG/JPEG
- PNG
- WebP
- BMP
- GIF
- JPG/JPEG
- PNG
- WebP
- BMP
- GIF
Error Handling
错误处理
If processing fails:
- Check that the image file exists
- Verify the SILICONFLOW_API_KEY is valid
- Ensure the API endpoint is reachable
Images that fail to process will show an error message, and other images will continue processing.
若处理失败:
- 检查图片文件是否存在
- 验证SILICONFLOW_API_KEY是否有效
- 确保API端点可访问
处理失败的图片会显示错误信息,其他图片将继续处理。
Additional Resources
额外资源
Reference Files
参考文件
- - API configuration details
references/api-configuration.md
- - API配置详情
references/api-configuration.md
Example Files
示例文件
- - Example usage script
examples/sample-usage.sh
- - 示例使用脚本
examples/sample-usage.sh
Scripts
脚本文件
- - The main OCR implementation
scripts/ocr_skill.py
- - OCR核心实现脚本
scripts/ocr_skill.py