smolvlm

SmolVLM - Local Image Analysis

Analyze images locally using SmolVLM-2B, a state-of-the-art compact vision-language model optimized for Apple Silicon via mlx-vlm.

Quick Usage

Describe an Image

bash
python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png

Ask a Question About an Image

bash
python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png "What text is visible?"
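To call the script from other Python code, the two invocations above can be wrapped in a small helper. This is a minimal sketch, not part of the skill: `build_command` and `analyze` are hypothetical names, and it assumes the script accepts the image path first, an optional question second, and the `--detailed` flag used elsewhere in this document, and that it prints its answer to standard output.

```python
import subprocess
import sys
from pathlib import Path

# Script location as given in this document.
SCRIPT = Path("~/.claude/skills/smolvlm/scripts/view_image.py").expanduser()


def build_command(image, question=None, detailed=False):
    """Build the argv list for view_image.py (hypothetical helper)."""
    cmd = [sys.executable, str(SCRIPT), str(image)]
    if question:
        cmd.append(question)  # optional free-form question about the image
    if detailed:
        cmd.append("--detailed")  # flag taken from the examples in this document
    return cmd


def analyze(image, question=None, detailed=False):
    """Run the script and return whatever it wrote to stdout (assumed to be the answer)."""
    result = subprocess.run(
        build_command(image, question, detailed),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout.strip()
```

For example, `analyze("/path/to/image.png", "What text is visible?")` would run the second command shown above. Passing the arguments as a list avoids shell quoting problems with questions that contain spaces or quotes.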

Specific Tasks

bash
# Extract text (OCR)
python ~/.claude/skills/smolvlm/scripts/view_image.py screenshot.png "Extract all text"

# UI analysis
python ~/.claude/skills/smolvlm/scripts/view_image.py ui.png "Describe the UI elements"

# Detailed description
python ~/.claude/skills/smolvlm/scripts/view_image.py photo.jpg --detailed

Effective Prompts

General Description

  • "Describe this image": basic description
  • "Describe this image in detail, including colors, composition, and any text": comprehensive description

Text Extraction (OCR)

  • "Extract all visible text from this image"
  • "What text appears in this screenshot?"
  • "Read the text in this document"

UI/Screenshot Analysis

  • "Describe the user interface elements"
  • "What buttons and controls are visible?"
  • "Identify the application and its current state"

Visual Question Answering

  • "How many [objects] are in this image?"
  • "What color is the [object]?"
  • "Is there a [object] in this image?"

Code/Technical

  • "What programming language is shown?"
  • "Describe what this code does"
  • "Identify any errors in this code screenshot"
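The bracketed placeholders in the visual question answering prompts can be filled programmatically. The sketch below is illustrative only: the dictionary keys and the `make_prompt` helper are hypothetical names, and the template strings are copied from the lists above with `[object]` rewritten as a format field.

```python
# Templates copied from the prompt lists above; bracketed slots become format fields.
VQA_TEMPLATES = {
    "count": "How many {objects} are in this image?",
    "color": "What color is the {object}?",
    "presence": "Is there a {object} in this image?",
}


def make_prompt(kind, **slots):
    """Fill one template; raises KeyError for an unknown kind or a missing slot."""
    return VQA_TEMPLATES[kind].format(**slots)
```

`make_prompt("count", objects="buttons")` produces a question that can be passed to the script as its second argument.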

Model Details

| Spec | Value |
|---|---|
| Model | SmolVLM-2B-Instruct |
| Size | ~4GB |
| Peak Memory | 5.8GB |
| Speed | ~94 tok/s (M-series) |
| Supported Formats | PNG, JPG, JPEG, GIF, WebP |
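Before handing a file to the script, it can be useful to reject formats outside the supported list. The following is a minimal sketch; `is_supported_image` is a hypothetical helper, and it checks only the file extension, not the file contents.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def is_supported_image(path):
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```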

Requirements

  • macOS with Apple Silicon (M1/M2/M3)
  • Python 3.10+
  • mlx-vlm package:
    uv pip install mlx-vlm --system
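The requirements above can be verified with a short preflight check. This is a sketch under stated assumptions: `check_requirements` is a hypothetical helper, and it assumes the `mlx-vlm` distribution is importable under the module name `mlx_vlm`.

```python
import importlib.util
import platform
import sys


def check_requirements():
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    # Apple Silicon Macs report system "Darwin" and machine "arm64".
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("macOS on Apple Silicon is required")
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    # Assumption: the mlx-vlm package is imported as `mlx_vlm`.
    if importlib.util.find_spec("mlx_vlm") is None:
        problems.append("mlx-vlm is not installed (uv pip install mlx-vlm --system)")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every unmet requirement at once.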

Troubleshooting

"Model not found": First run downloads the model (~4GB). Wait for completion.
Out of memory: Close other applications. Model needs ~6GB free RAM.
Slow first inference: Model loading takes 10-15 seconds on first use; subsequent calls are faster.
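As a rough guide to expected latency, the figures quoted in this document (about 94 tok/s, and 10-15 seconds of model loading on first use) can be combined into a simple estimate. The sketch below is back-of-the-envelope arithmetic only: `estimate_seconds` is a hypothetical helper, and it ignores image preprocessing, prompt processing, and the one-time model download, so real timings will be higher.

```python
TOKENS_PER_SECOND = 94          # approximate throughput quoted under Model Details
FIRST_LOAD_SECONDS = (10, 15)   # model loading range quoted for first use


def estimate_seconds(num_tokens, first_call=False):
    """Return a (low, high) estimate in seconds for generating num_tokens."""
    generation = num_tokens / TOKENS_PER_SECOND
    if first_call:
        low, high = FIRST_LOAD_SECONDS
        return (low + generation, high + generation)
    return (generation, generation)
```

For a 188-token answer, generation alone takes about 2 seconds, so a first call lands in the 12-17 second range under these assumptions.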