code-from-image

Original🇺🇸 English
Translated

Guide for extracting code or pseudocode from images using OCR and implementing it correctly. This skill should be used when tasks involve reading code, pseudocode, or algorithms from images (PNG, JPG, screenshots) and executing or implementing the extracted logic.

3installs
Added on

NPX Install

npx skill4agent add letta-ai/skills code-from-image

Code From Image

Overview

This skill provides guidance for extracting code or pseudocode from images and implementing it correctly. It covers OCR tool selection, handling ambiguous text extraction, and verification strategies to ensure accurate implementation.

Workflow

Step 1: Environment Preparation

Before attempting to read an image, check available tools and packages:
  1. Check what package managers are available (
    pip
    ,
    pip3
    ,
    uv
    ,
    conda
    )
  2. Check what image processing tools are installed (
    tesseract
    ,
    pytesseract
    ,
    PIL/Pillow
    )
  3. Install missing dependencies before proceeding
This avoids wasted attempts with unavailable tools.

Step 2: Image Analysis

Examine the image before OCR extraction:
  1. Use
    file <image>
    to verify the file type and ensure it's a valid image
  2. Open the image visually if possible to understand content structure
  3. Note the image quality, contrast, and text clarity

Step 3: OCR Extraction with Multiple Attempts

OCR is inherently error-prone. To maximize accuracy:
  1. First attempt: Use standard OCR (pytesseract with default settings)
  2. If output is garbled: Apply image preprocessing:
    • Increase contrast
    • Convert to grayscale
    • Apply binarization (threshold)
    • Resize the image (2x or 3x upscaling can help)
  3. Compare outputs: If multiple OCR attempts yield different results, cross-reference them
Example preprocessing with PIL:
python
from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("code.png")
# Convert to grayscale
img = img.convert("L")
# Increase contrast
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2.0)
# Apply threshold for binarization
img = img.point(lambda x: 0 if x < 128 else 255, '1')
img.save("preprocessed.png")

Step 4: Interpreting OCR Output

OCR frequently produces character substitution errors. Document all interpretations explicitly:
Common OCR Misreadings:
  • 0
    (zero) vs
    O
    (letter O) vs
    o
    (lowercase o)
  • 1
    (one) vs
    l
    (lowercase L) vs
    I
    (uppercase i)
  • S
    vs
    5
    vs
    $
  • G
    vs
    6
  • B
    vs
    8
  • :
    vs
    ;
  • sha256
    may appear as
    cha256
    or
    sha2S6
  • Variable names may have incorrect characters (e.g.,
    GALT
    instead of
    SALT
    )
  • Quote characters may be mangled (
    6"
    instead of
    b"
    for byte strings)
  • Array slicing may be garbled (
    h0[:10]
    appearing as
    hof:10]
    )
Process for interpretation:
  1. List each unclear portion of the OCR output
  2. Document the most likely correct interpretation
  3. Explain reasoning for each interpretation
  4. Flag any interpretations with high uncertainty

Step 5: Implementation

When implementing the extracted code:
  1. Preserve the algorithm structure: Follow the logic as written, don't optimize prematurely
  2. Handle encoding explicitly: For cryptographic operations, be explicit about string vs bytes encoding
  3. Add basic error handling: Include try/except for file operations and external calls
  4. Log intermediate values: Print or log intermediate results for debugging

Step 6: Verification

Verify the implementation systematically:
  1. If a hint is provided (e.g., expected output prefix): Use it to validate, but don't rely on it exclusively
  2. Trace through the algorithm manually: Verify your understanding matches the implementation
  3. Test with known inputs: If possible, create test cases with predictable outputs
  4. Check edge cases: Empty inputs, special characters, boundary conditions
Warning: Using hints as the sole validation is brittle. A correct output prefix doesn't guarantee the algorithm is fully correct for all inputs.

Common Pitfalls

OCR-Related

  • Accepting first OCR output without verification: Always cross-check unclear characters
  • Not documenting assumptions: When interpreting garbled text, explicitly state what you're assuming
  • Skipping preprocessing: Image enhancement significantly improves OCR accuracy

Implementation-Related

  • String vs bytes confusion: In Python, cryptographic functions often require bytes (
    b"string"
    ) not strings
  • Missing imports: Ensure all required modules are imported before running
  • Silent failures: Add explicit error messages for file operations

Verification-Related

  • Over-relying on partial hints: A matching prefix doesn't mean the full output is correct
  • Not validating intermediate steps: Check values at each stage, not just the final output
  • Assuming OCR was correct: If output doesn't match expectations, revisit OCR interpretation

Fallback Strategy

If the initial interpretation produces incorrect results:
  1. Re-examine the original image, focusing on unclear characters
  2. Try alternative OCR preprocessing techniques
  3. List all ambiguous characters and test alternative interpretations systematically
  4. If multiple interpretations exist, implement and test each one

Example Workflow

For a task like "Extract pseudocode from image and compute hash":
  1. Check environment:
    which tesseract
    ,
    pip3 list | grep -i pil
  2. Install if needed:
    pip3 install pillow pytesseract
  3. Analyze image:
    file code.png
  4. Extract text with OCR
  5. If garbled, preprocess image and retry OCR
  6. Document interpretations: "OCR shows
    GALT = 6"0000...
    - interpreting as
    SALT = b"0000..."
    because G/S confusion is common and 6" likely represents b" for bytes"
  7. Implement the algorithm
  8. Verify output against any provided hints
  9. If verification fails, revisit step 5-6 with alternative interpretations