image-to-text

Image to Text

Extract all readable text from an image using OCR (Tesseract). Returns the full text content along with word-level bounding boxes and confidence scores.

When to Use

  • Reading text content from a screenshot or design mockup
  • Extracting UI copy (labels, buttons, headings) so you don't have to retype it
  • Getting text positions and bounding boxes from a design image

How It Works

  1. The image is passed to Tesseract.js for optical character recognition
  2. Tesseract segments the image into lines and words
  3. Returns the full text plus word-level details (position, confidence)
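The relationship between the word-level and line-level boxes returned in step 3 can be illustrated with a short Python sketch. It assumes that a line's bounding box is the smallest rectangle enclosing the boxes of its words, which is consistent with the sample in the Output section but is not a description of Tesseract's internal segmentation. `union_bbox` is a hypothetical helper name, not part of the skill.

```python
from typing import Dict, Iterable

BBox = Dict[str, int]  # keys: x0, y0 (top-left corner) and x1, y1 (bottom-right corner)


def union_bbox(boxes: Iterable[BBox]) -> BBox:
    """Return the smallest box that encloses every box in `boxes`."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one bounding box")
    return {
        "x0": min(b["x0"] for b in boxes),
        "y0": min(b["y0"] for b in boxes),
        "x1": max(b["x1"] for b in boxes),
        "y1": max(b["y1"] for b in boxes),
    }


# Word boxes for "Request" and "work", copied from the sample output.
words = [
    {"x0": 142, "y0": 180, "x1": 268, "y1": 204},
    {"x0": 274, "y0": 180, "x1": 332, "y1": 204},
]
line_box = union_bbox(words)
```

For the two sample words, the enclosing box matches the bounding box reported for the line "Request work".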

Usage

bash
bash <skill-path>/scripts/image-to-text.sh <image-path> [language]
Arguments:
  • image-path: Path to the image file (required)
  • language: OCR language code (optional, defaults to eng). Common: eng, fra, deu, spa, chi_sim, jpn
Examples:

Extract text from a screenshot

bash <skill-path>/scripts/image-to-text.sh ./screenshot.png

Extract French text

bash <skill-path>/scripts/image-to-text.sh ./mockup.png fra
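
If the script needs to be called from another program, a thin wrapper along these lines may be enough. This is a sketch under one important assumption: that the script prints its JSON result to standard output, which this document does not state explicitly. `build_command` and `run_ocr` are hypothetical helper names, and the skill path is left for the caller to supply.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def build_command(skill_path: str, image_path: str, language: str = "eng") -> List[str]:
    """Build the argument list for image-to-text.sh (language defaults to eng)."""
    script = Path(skill_path) / "scripts" / "image-to-text.sh"
    return ["bash", str(script), image_path, language]


def run_ocr(skill_path: str, image_path: str, language: str = "eng") -> dict:
    """Run the script and parse its output.

    Assumption: the script writes the JSON result to stdout.
    """
    if not Path(image_path).is_file():
        raise FileNotFoundError(image_path)
    completed = subprocess.run(
        build_command(skill_path, image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the image path contains spaces.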

Output

json
{
  "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
  "confidence": 87.4,
  "words": [
    {
      "text": "Request",
      "confidence": 94.2,
      "bbox": { "x0": 142, "y0": 180, "x1": 268, "y1": 204 }
    },
    {
      "text": "work",
      "confidence": 96.1,
      "bbox": { "x0": 274, "y0": 180, "x1": 332, "y1": 204 }
    }
  ],
  "lines": [
    {
      "text": "Request work",
      "confidence": 95.1,
      "bbox": { "x0": 142, "y0": 180, "x1": 332, "y1": 204 }
    }
  ]
}

Field | Type | Description
text | String | Full extracted text, newline-separated
confidence | Number | Overall confidence score (0-100)
words | Array | Each word with text, confidence, and bounding box
lines | Array | Each line with text, confidence, and bounding box
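
When positioning elements from a design, corner coordinates are often less convenient than an origin plus a size. The following Python sketch converts each word's `bbox` into `x`, `y`, `width`, and `height`. It assumes the output has already been parsed into a dictionary (for example with `json.loads`) and that `x0`/`y0` is the top-left corner; `to_rect` and `word_rects` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def to_rect(bbox: Dict[str, int]) -> Dict[str, int]:
    """Convert corner coordinates (x0, y0, x1, y1) to origin plus size."""
    return {
        "x": bbox["x0"],
        "y": bbox["y0"],
        "width": bbox["x1"] - bbox["x0"],
        "height": bbox["y1"] - bbox["y0"],
    }


def word_rects(result: dict) -> List[Tuple[str, Dict[str, int]]]:
    """Pair each recognized word with its rectangle."""
    return [(w["text"], to_rect(w["bbox"])) for w in result.get("words", [])]


# The two words from the sample output above.
result = {
    "words": [
        {"text": "Request", "confidence": 94.2,
         "bbox": {"x0": 142, "y0": 180, "x1": 268, "y1": 204}},
        {"text": "work", "confidence": 96.1,
         "bbox": {"x0": 274, "y0": 180, "x1": 332, "y1": 204}},
    ]
}

for text, rect in word_rects(result):
    print(text, rect)
```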

Present Results to User

After extracting text, present the content grouped by lines:
Extracted text (87.4% confidence):

  Request work
  Suggestions
  Plumbing
  HVAC
  Cleaning
  Electrical

Found 6 lines, 7 words.
Use the extracted text directly when implementing UI copy from a design.
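
The presentation format above could be produced with a small formatter such as the Python sketch below. `format_results` is a hypothetical helper, not part of the skill. It derives the line and word counts from the `text` field by splitting on newlines and whitespace, which is an assumption made here so the sketch also works on abbreviated samples; with full output, the lengths of the `lines` and `words` arrays could be used instead.

```python
def format_results(result: dict) -> str:
    """Render OCR output as an indented, line-grouped summary."""
    lines = [ln for ln in result["text"].splitlines() if ln.strip()]
    word_count = sum(len(ln.split()) for ln in lines)
    body = "\n".join(f"  {ln}" for ln in lines)
    return (
        f"Extracted text ({result['confidence']}% confidence):\n\n"
        f"{body}\n\n"
        f"Found {len(lines)} lines, {word_count} words."
    )


# Top-level fields from the sample output in this document.
sample = {
    "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
    "confidence": 87.4,
}
print(format_results(sample))
```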

Troubleshooting

Low confidence / garbled text — Tesseract works best with clean, high-contrast text. Screenshots of rendered UI work well. Photos of text at angles or with noise may produce poor results.
Wrong language — Pass the correct language code as the second argument. Tesseract needs the right language model to recognize characters.
First run is slow — Tesseract downloads language data (~4MB for English) on the first run. Subsequent runs are faster.
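
When only part of an image is recognized poorly, it can help to look at per-word confidence rather than the overall score. The sketch below lists the words that fall under a threshold so they can be checked by hand. The default threshold of 80 is an arbitrary assumption, not a value recommended by Tesseract, and `low_confidence_words` is a hypothetical helper name.

```python
from typing import List


def low_confidence_words(result: dict, threshold: float = 80.0) -> List[dict]:
    """Return the word entries whose confidence is below `threshold`."""
    return [w for w in result.get("words", []) if w["confidence"] < threshold]


# The two words from the sample output in this document.
result = {
    "words": [
        {"text": "Request", "confidence": 94.2,
         "bbox": {"x0": 142, "y0": 180, "x1": 268, "y1": 204}},
        {"text": "work", "confidence": 96.1,
         "bbox": {"x0": 274, "y0": 180, "x1": 332, "y1": 204}},
    ]
}

# A deliberately strict threshold, used only to show the filter selecting a word.
for word in low_confidence_words(result, threshold=95.0):
    print(f"check manually: {word['text']} ({word['confidence']})")
```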