ocr-super-surya
OCR Super Surya
When to Use
- OCR, extract text from image, text recognition, 画像から文字
- Extracting text from screenshots, photos, or scanned images
- Processing PDFs with embedded images
- Multi-language document OCR (90+ languages including Japanese)
Features
| Feature | Description |
|---|---|
| Accuracy | Higher than Tesseract (0.97 vs 0.88) |
| GPU | PyTorch-based, CUDA optimized |
| Languages | 90+ including CJK |
| Layout | Document layout, table recognition |
Quick Start
Installation
```bash
# 1. Check GPU
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# 2. Install (with CUDA if GPU available)
pip install surya-ocr

# If CUDA=False but you have a GPU, reinstall PyTorch:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
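The GPU check in step 1 can also be run as a small script that distinguishes "PyTorch is not installed" from "PyTorch is installed but cannot see CUDA". This is a minimal sketch; the function name `cuda_status` is illustrative and is not part of Surya or this skill.

```python
import importlib.util


def cuda_status() -> str:
    """Return 'cuda', 'cpu', or 'torch-missing' for the current environment."""
    # Avoid an ImportError when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch

    # Same check as the one-liner in step 1.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Device status: {cuda_status()}")
```

A result of `cpu` on a machine that has a GPU is the case where reinstalling PyTorch with a CUDA build is worth trying.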
Usage
```bash
# CLI
python scripts/ocr_helper.py image.png
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

# Or use surya directly
surya_ocr image.png --output_dir ./results
```
Python API
```python
from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor

image = Image.open("document.png")

# Load the models once and reuse them across images.
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()

# One prediction per input image; each holds the recognized text lines.
predictions = recognition_predictor([image], det_predictor=detection_predictor)

for page in predictions:
    for line in page.text_lines:
        print(line.text)
```
GPU Configuration
| Variable | Default | Description |
|---|---|---|
| `RECOGNITION_BATCH_SIZE` | 512 | Reduce for lower VRAM |
| `DETECTOR_BATCH_SIZE` | 36 | Reduce if OOM |
```bash
export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png
```
Scripts
| Script | Description |
|---|---|
| `scripts/ocr_helper.py` | Helper with OOM auto-retry, batch support |
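To illustrate what an OOM auto-retry can look like, the sketch below reruns a Surya command with `RECOGNITION_BATCH_SIZE` halved after each out-of-memory failure. It is not the implementation in the helper script: the names `run_surya` and `ocr_with_retry`, the `min_batch_size` floor, and the assumption that an OOM failure appears as "out of memory" on stderr are all hypothetical.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence


def run_surya(cmd: Sequence[str], batch_size: int) -> subprocess.CompletedProcess:
    """Run a command with RECOGNITION_BATCH_SIZE set for that process only."""
    env = dict(os.environ, RECOGNITION_BATCH_SIZE=str(batch_size))
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


def ocr_with_retry(
    cmd: Sequence[str],
    batch_size: int = 512,
    min_batch_size: int = 16,
    runner: Callable[[Sequence[str], int], subprocess.CompletedProcess] = run_surya,
) -> Optional[int]:
    """Halve the batch size after each out-of-memory failure.

    Returns the batch size that succeeded, or None if every attempt ran out
    of memory. Any other failure is raised immediately.
    """
    while batch_size >= min_batch_size:
        result = runner(cmd, batch_size)
        if result.returncode == 0:
            return batch_size
        # Assumption: OOM failures mention "out of memory" on stderr.
        if "out of memory" not in (result.stderr or "").lower():
            raise RuntimeError(f"command failed: {(result.stderr or '').strip()}")
        batch_size //= 2
    return None
```

A call such as `ocr_with_retry(["surya_ocr", "image.png", "--output_dir", "./results"])` would start at the default of 512 and step down through 256, 128, and so on. Passing a custom `runner` keeps the retry logic testable without a GPU.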
Done Criteria
- CUDA available (if GPU present)
- Text extracted from target image
- Output saved to specified file
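The last two criteria can be met from the Python API with a few lines. The sketch below assumes only the structure already used in the Python API example (each prediction has `text_lines`, each line has `text`); the function name `write_ocr_text` is illustrative.

```python
from pathlib import Path
from typing import Iterable


def write_ocr_text(predictions: Iterable, out_path: str) -> int:
    """Write every recognized line to out_path; return the number of lines written."""
    lines = [line.text for page in predictions for line in page.text_lines]
    # One recognized line per output line, with a trailing newline if non-empty.
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(out_path).write_text(text, encoding="utf-8")
    return len(lines)
```

A return value of 0 is a quick signal that no text was extracted, so the second criterion has not been met even though a file exists.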
License
- This skill: CC BY-NC 4.0
- Surya: GPL-3.0 (code); a commercial license is required above $2M in revenue