smolvlm

Original：🇺🇸 English

Translated

1 scriptsChecked / no sensitive code detected

Local vision-language model for image analysis using SmolVLM-2B

10installs

Sourcetdimino/claude-code-minoan

Added on2026-02-22

NPX Install

npx skill4agent add tdimino/claude-code-minoan smolvlm

SKILL.md Content

View Translation Comparison →

SmolVLM - Local Image Analysis

Analyze images locally using SmolVLM-2B, a state-of-the-art compact vision-language model optimized for Apple Silicon via mlx-vlm.

Quick Usage

Describe an Image

bash

python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png

Ask a Question About an Image

bash

python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png "What text is visible?"

Specific Tasks

bash

# Extract text (OCR)
python ~/.claude/skills/smolvlm/scripts/view_image.py screenshot.png "Extract all text"

# UI analysis
python ~/.claude/skills/smolvlm/scripts/view_image.py ui.png "Describe the UI elements"

# Detailed description
python ~/.claude/skills/smolvlm/scripts/view_image.py photo.jpg --detailed

Effective Prompts

General Description

```
"Describe this image"
```
- Basic description

"Describe this image in detail, including colors, composition, and any text"

- Comprehensive

Text Extraction (OCR)

"Extract all visible text from this image"

```
"What text appears in this screenshot?"
```
```
"Read the text in this document"
```

UI/Screenshot Analysis

```
"Describe the user interface elements"
```

"What buttons and controls are visible?"

"Identify the application and its current state"

Visual Question Answering

```
"How many [objects] are in this image?"
```
```
"What color is the [object]?"
```
```
"Is there a [object] in this image?"
```

Code/Technical

```
"What programming language is shown?"
```
```
"Describe what this code does"
```

"Identify any errors in this code screenshot"

Model Details

Spec	Value
Model	SmolVLM-2B-Instruct
Size	~4GB
Peak Memory	5.8GB
Speed	~94 tok/s (M-series)
Supported Formats	PNG, JPG, JPEG, GIF, WebP

Requirements

macOS with Apple Silicon (M1/M2/M3)
Python 3.10+
mlx-vlm package:
```
uv pip install mlx-vlm --system
```

Troubleshooting

"Model not found": First run downloads the model (~4GB). Wait for completion.

Out of memory: Close other applications. Model needs ~6GB free RAM.

Slow first inference: Model loading takes 10-15s on first use, subsequent calls are faster.

smolvlm

NPX Install

Tags

SKILL.md Content

SmolVLM - Local Image Analysis

Quick Usage

Describe an Image

Ask a Question About an Image

Specific Tasks

Effective Prompts

General Description

Text Extraction (OCR)

UI/Screenshot Analysis

Visual Question Answering

Code/Technical

Model Details

Requirements

Troubleshooting