code-from-image
Code From Image
Overview
This skill provides guidance for extracting code or pseudocode from images and implementing it correctly. It covers OCR tool selection, handling ambiguous text extraction, and verification strategies to ensure accurate implementation.
Workflow
Step 1: Environment Preparation
Before attempting to read an image, check available tools and packages:
- Check what package managers are available (`pip`, `pip3`, `uv`, `conda`)
- Check what image processing tools are installed (`tesseract`, `pytesseract`, `PIL/Pillow`)
- Install missing dependencies before proceeding
This avoids wasted attempts with unavailable tools.
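The checks above could be scripted along these lines. This is a minimal sketch using only the standard library; the function name `check_environment` and the `cmd:`/`module:` key prefixes are illustrative choices, not part of the skill itself.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report which OCR-related tools are available on this machine."""
    # Command-line tools, looked up on PATH.
    commands = ["pip", "pip3", "uv", "conda", "tesseract"]
    # Importable Python modules (Pillow is imported as "PIL").
    modules = ["pytesseract", "PIL"]

    status = {f"cmd:{name}": shutil.which(name) is not None for name in commands}
    for name in modules:
        status[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for tool, available in check_environment().items():
        print(f"{tool:20} {'found' if available else 'missing'}")
```

Anything reported as missing can then be installed before the first OCR attempt.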
Step 2: Image Analysis
Examine the image before OCR extraction:
- Use `file <image>` to verify the file type and ensure it's a valid image
- Open the image visually if possible to understand content structure
- Note the image quality, contrast, and text clarity
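Where the `file` command is unavailable, a rough equivalent is to inspect the leading bytes of the file. The sketch below covers only four common formats; the helper names `sniff_image_type` and `check_image_file` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes for a few common image formats.
SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a format name if the bytes start with a known signature."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def check_image_file(path: str) -> Optional[str]:
    """Read the first bytes of a file and identify its image type."""
    with Path(path).open("rb") as handle:
        return sniff_image_type(handle.read(16))
```

A `None` result suggests the file is not one of these image types (or is corrupted), which is worth knowing before spending time on OCR.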
Step 3: OCR Extraction with Multiple Attempts
OCR is inherently error-prone. To maximize accuracy:
- First attempt: Use standard OCR (pytesseract with default settings)
- If output is garbled: Apply image preprocessing:
  - Increase contrast
  - Convert to grayscale
  - Apply binarization (threshold)
  - Resize the image (2x or 3x upscaling can help)
- Compare outputs: If multiple OCR attempts yield different results, cross-reference them
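One way to cross-reference two OCR attempts is to diff them line by line. In the sketch below, `run_ocr_variants` assumes Pillow, `pytesseract`, and the Tesseract binary are installed, while `differing_lines` needs only the standard library; both function names are hypothetical.

```python
import difflib


def run_ocr_variants(path: str) -> dict:
    """Run OCR on the raw image and on a 2x upscaled grayscale copy."""
    # Imported lazily so the comparison helper below works without them.
    from PIL import Image
    import pytesseract

    img = Image.open(path)
    gray = img.convert("L")
    upscaled = gray.resize((gray.width * 2, gray.height * 2))
    return {
        "default": pytesseract.image_to_string(img),
        "upscaled": pytesseract.image_to_string(upscaled),
    }


def differing_lines(first: str, second: str) -> list:
    """Pair up the lines that differ between two OCR outputs."""
    a, b = first.splitlines(), second.splitlines()
    pairs = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append(("\n".join(a[i1:i2]), "\n".join(b[j1:j2])))
    return pairs
```

The pairs returned by `differing_lines` are the spots where the two runs disagree, so they are the first places to check against the image.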
Example preprocessing with PIL:
```python
from PIL import Image, ImageEnhance

img = Image.open("code.png")

# Convert to grayscale
img = img.convert("L")

# Increase contrast
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2.0)

# Apply threshold for binarization
img = img.point(lambda x: 0 if x < 128 else 255, "1")
img.save("preprocessed.png")
```
Step 4: Interpreting OCR Output
OCR frequently produces character substitution errors. Document all interpretations explicitly:
Common OCR Misreadings:
- `0` (zero) vs `O` (letter O) vs `o` (lowercase o)
- `1` (one) vs `l` (lowercase L) vs `I` (uppercase i)
- `S` vs `5` vs `$`
- `G` vs `6`
- `B` vs `8`
- `:` vs `;`
- `sha256` may appear as `cha256` or `sha2S6`
- Variable names may have incorrect characters (e.g., `GALT` instead of `SALT`)
- Quote characters may be mangled (`6"` instead of `b"` for byte strings)
- Array slicing may be garbled (`h0[:10]` appearing as `hof:10]`)
Process for interpretation:
- List each unclear portion of the OCR output
- Document the most likely correct interpretation
- Explain reasoning for each interpretation
- Flag any interpretations with high uncertainty
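The interpretation process can be kept honest by recording each decision in a small structure. This is only a sketch: the `Interpretation` dataclass, its field names, and the report format are assumptions, and the sample entries reuse the misreadings listed above.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    ocr_text: str            # what the OCR engine produced
    reading: str             # the interpretation chosen
    reason: str              # why this reading is the most likely
    uncertain: bool = False  # flag readings that need re-checking


def format_report(items: list) -> str:
    """Render the interpretation log, marking uncertain entries."""
    lines = []
    for item in items:
        flag = " [UNCERTAIN]" if item.uncertain else ""
        lines.append(f"{item.ocr_text!r} -> {item.reading!r}{flag}: {item.reason}")
    return "\n".join(lines)


log = [
    Interpretation("GALT", "SALT", "G/S confusion is common"),
    Interpretation('6"', 'b"', "byte-string prefix misread as a digit"),
    Interpretation("hof:10]", "h0[:10]", "garbled slice syntax", uncertain=True),
]
print(format_report(log))
```

Entries flagged as uncertain are the natural starting point if verification later fails.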
Step 5: Implementation
When implementing the extracted code:
- Preserve the algorithm structure: Follow the logic as written; don't optimize prematurely
- Handle encoding explicitly: For cryptographic operations, be explicit about string vs bytes encoding
- Add basic error handling: Include try/except for file operations and external calls
- Log intermediate values: Print or log intermediate results for debugging
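These points might look as follows in practice. The algorithm here (salted SHA-256, re-hashed for a few rounds) is purely hypothetical and stands in for whatever the image actually contains; `SALT`, `ROUNDS`, `read_input`, and `salted_digest` are illustrative names.

```python
import hashlib
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("extracted")

# Hypothetical values standing in for whatever the image actually contains.
SALT = b"0000"
ROUNDS = 3


def read_input(path: str) -> bytes:
    """Read the input file as bytes, failing with a clear message."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError as exc:
        raise RuntimeError(f"Cannot read {path}: {exc}") from exc


def salted_digest(message: str, salt: bytes = SALT, rounds: int = ROUNDS) -> str:
    """Hash salt + message, then re-hash the hex digest `rounds - 1` times."""
    data = salt + message.encode("utf-8")  # explicit str -> bytes conversion
    digest = hashlib.sha256(data).hexdigest()
    log.debug("round 1: %s", digest)
    for i in range(2, rounds + 1):
        digest = hashlib.sha256(digest.encode("ascii")).hexdigest()
        log.debug("round %d: %s", i, digest)
    return digest
```

Logging each round makes it possible to see at which stage the output first diverges when a different reading of the OCR text is tried.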
Step 6: Verification
Verify the implementation systematically:
- If a hint is provided (e.g., expected output prefix): Use it to validate, but don't rely on it exclusively
- Trace through the algorithm manually: Verify your understanding matches the implementation
- Test with known inputs: If possible, create test cases with predictable outputs
- Check edge cases: Empty inputs, special characters, boundary conditions
Warning: Using hints as the sole validation is brittle. A correct output prefix doesn't guarantee the algorithm is fully correct for all inputs.
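To illustrate the difference between a hint check and a known-input test, the sketch below uses two published SHA-256 test vectors (the empty string and `abc`). The helper names are assumptions, and SHA-256 is used only as an example of a building block an extracted algorithm might rely on.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Published SHA-256 test vectors: known inputs with known outputs.
KNOWN_VECTORS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}


def matches_hint(output: str, hint_prefix: str) -> bool:
    """Weak check: only confirms that the output starts with the hint."""
    return output.startswith(hint_prefix)


def verify_building_block() -> bool:
    """Stronger check: the primitive reproduces every known vector in full."""
    return all(sha256_hex(data) == expected for data, expected in KNOWN_VECTORS.items())
```

A hint check such as `matches_hint(result, "ba78")` narrows things down, but only the full-vector comparison confirms that each building block behaves as expected.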
Common Pitfalls
OCR-Related
- Accepting first OCR output without verification: Always cross-check unclear characters
- Not documenting assumptions: When interpreting garbled text, explicitly state what you're assuming
- Skipping preprocessing: Image enhancement often improves OCR accuracy, especially on low-contrast or low-resolution images
Implementation-Related
- String vs bytes confusion: In Python, cryptographic functions often require bytes (`b"string"`) not strings
- Missing imports: Ensure all required modules are imported before running
- Silent failures: Add explicit error messages for file operations
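The string-vs-bytes pitfall is easy to reproduce with `hashlib`. In this short sketch the input value `"password"` is just a placeholder.

```python
import hashlib

text = "password"

try:
    hashlib.sha256(text)  # str is rejected: hashing operates on bytes
except TypeError as exc:
    print(f"TypeError: {exc}")

# Encode explicitly, and state the encoding rather than relying on a default.
digest = hashlib.sha256(text.encode("utf-8")).hexdigest()

# A bytes literal needs no encoding step.
same = hashlib.sha256(b"password").hexdigest()
print(digest == same)
```

If the image shows a `b"..."` literal, keep it as bytes; if it shows a plain string, decide on the encoding and write it down as one of your documented assumptions.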
Verification-Related
- Over-relying on partial hints: A matching prefix doesn't mean the full output is correct
- Not validating intermediate steps: Check values at each stage, not just the final output
- Assuming OCR was correct: If output doesn't match expectations, revisit OCR interpretation
Fallback Strategy
If the initial interpretation produces incorrect results:
- Re-examine the original image, focusing on unclear characters
- Try alternative OCR preprocessing techniques
- List all ambiguous characters and test alternative interpretations systematically
- If multiple interpretations exist, implement and test each one
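Testing alternative interpretations systematically can be sketched as an enumeration over confusable characters. The `CONFUSIONS` table below is drawn from the common misreadings listed in Step 4; the function names and the idea of passing in a `compute` callable are assumptions made for illustration.

```python
import itertools

# Confusable characters taken from the misreading list in Step 4.
CONFUSIONS = {
    "0": "0Oo", "O": "O0o", "o": "o0O",
    "1": "1lI", "l": "l1I", "I": "I1l",
    "S": "S5$", "5": "5S$",
    "G": "G6", "6": "6G",
    "B": "B8", "8": "8B",
}


def candidate_readings(token: str, ambiguous_positions: list) -> list:
    """List every reading of `token` obtained by swapping the flagged characters."""
    options = [
        CONFUSIONS.get(ch, ch) if i in ambiguous_positions else ch
        for i, ch in enumerate(token)
    ]
    return ["".join(combo) for combo in itertools.product(*options)]


def first_match(candidates, compute, hint_prefix):
    """Return the first candidate whose computed output starts with the hint."""
    for candidate in candidates:
        if compute(candidate).startswith(hint_prefix):
            return candidate
    return None
```

Because the number of readings grows multiplicatively with each flagged position, it helps to flag only the characters that are genuinely unclear in the image.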
Example Workflow
For a task like "Extract pseudocode from image and compute hash":
1. Check environment: `which tesseract`, `pip3 list | grep -i pil`
2. Install if needed: `pip3 install pillow pytesseract`
3. Analyze image: `file code.png`
4. Extract text with OCR
5. If garbled, preprocess image and retry OCR
6. Document interpretations: "OCR shows `GALT = 6"0000...`; interpreting as `SALT = b"0000..."` because G/S confusion is common and `6"` likely represents `b"` for bytes"
7. Implement the algorithm
8. Verify output against any provided hints
9. If verification fails, revisit steps 5-6 with alternative interpretations