smolvlm
SmolVLM - Local Image Analysis
Analyze images locally using SmolVLM-2B, a state-of-the-art compact vision-language model optimized for Apple Silicon via mlx-vlm.
Quick Usage
Describe an Image
    python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png
Ask a Question About an Image
    python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png "What text is visible?"
Specific Tasks
Extract text (OCR)
    python ~/.claude/skills/smolvlm/scripts/view_image.py screenshot.png "Extract all text"
UI analysis
    python ~/.claude/skills/smolvlm/scripts/view_image.py ui.png "Describe the UI elements"
Detailed description
    python ~/.claude/skills/smolvlm/scripts/view_image.py photo.jpg --detailed
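To call the script from Python rather than from the shell, the commands above can be assembled as an argument list. The following is a minimal sketch, not part of the skill itself: `build_command` and `run_view_image` are hypothetical helpers, and it assumes the script writes its answer to standard output. The script path, the optional prompt argument, and the `--detailed` flag come from the commands above; whether `--detailed` can be combined with a prompt is not stated, so the sketch treats them as alternatives.

```python
import subprocess
import sys
from pathlib import Path

# Script location as used in the commands above.
SCRIPT = Path("~/.claude/skills/smolvlm/scripts/view_image.py").expanduser()


def build_command(image, prompt=None, detailed=False):
    """Return the argv list for one view_image.py call (hypothetical helper)."""
    cmd = [sys.executable, str(SCRIPT), str(image)]
    if prompt is not None:
        cmd.append(prompt)          # free-form question or instruction
    elif detailed:
        cmd.append("--detailed")    # flag shown in the "Detailed description" command
    return cmd


def run_view_image(image, prompt=None, detailed=False):
    """Run the script and return its standard output (assumed to hold the answer)."""
    result = subprocess.run(
        build_command(image, prompt, detailed),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains spaces or quotation marks.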
Effective Prompts
General Description
- "Describe this image" - Basic description
- "Describe this image in detail, including colors, composition, and any text" - Comprehensive
Text Extraction (OCR)
- "Extract all visible text from this image"
- "What text appears in this screenshot?"
- "Read the text in this document"
UI/Screenshot Analysis
- "Describe the user interface elements"
- "What buttons and controls are visible?"
- "Identify the application and its current state"
Visual Question Answering
- "How many [objects] are in this image?"
- "What color is the [object]?"
- "Is there a [object] in this image?"
Code/Technical
- "What programming language is shown?"
- "Describe what this code does"
- "Identify any errors in this code screenshot"
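The bracketed placeholders in the prompts above, such as `[objects]`, are meant to be filled in before the prompt is sent. One way to keep the prompts reusable is a small template table. This is only an illustrative sketch: `PROMPTS` and `make_prompt` are hypothetical names, and the table holds a subset of the prompts listed above with the brackets rewritten as Python format fields.

```python
# Prompt templates taken from the lists above; {objects}/{object} replace
# the bracketed placeholders. Names here are illustrative, not part of the skill.
PROMPTS = {
    "describe": "Describe this image",
    "describe_detailed": (
        "Describe this image in detail, including colors, composition, and any text"
    ),
    "ocr": "Extract all visible text from this image",
    "ui": "Describe the user interface elements",
    "count": "How many {objects} are in this image?",
    "color": "What color is the {object}?",
    "presence": "Is there a {object} in this image?",
    "code": "Describe what this code does",
}


def make_prompt(task, **fields):
    """Look up a template by task name and fill in its placeholders."""
    return PROMPTS[task].format(**fields)


question = make_prompt("count", objects="cars")
```

The resulting string can be passed as the second argument to `view_image.py`.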
Model Details
| Spec | Value |
|---|---|
| Model | SmolVLM-2B-Instruct |
| Size | ~4GB |
| Peak Memory | 5.8GB |
| Speed | ~94 tok/s (M-series) |
| Supported Formats | PNG, JPG, JPEG, GIF, WebP |
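Because only the formats in the table are listed as supported, it can help to check a file's extension before starting the model. A minimal sketch, assuming the format is judged by file extension alone (the script itself may check differently); `is_supported_image` is a hypothetical helper:

```python
from pathlib import Path

# Extensions corresponding to the "Supported Formats" row above.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def is_supported_image(path):
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

This only inspects the file name; it does not verify that the file contents are a valid image.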
Requirements
- macOS with Apple Silicon (M1/M2/M3)
- Python 3.10+
- mlx-vlm package:
uv pip install mlx-vlm --system
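The requirements above can be checked programmatically before the first run. The sketch below is an assumption-laden illustration rather than part of the skill: `check_requirements` and `check_this_machine` are hypothetical helpers, Apple Silicon is detected by `platform.machine()` reporting `arm64` (a Python running under Rosetta would report `x86_64`), and the package is assumed to be importable as `mlx_vlm`.

```python
import importlib.util
import platform
import sys


def check_requirements(system, machine, version, has_mlx_vlm):
    """Return a list of unmet requirements; an empty list means all are met."""
    problems = []
    if system != "Darwin" or machine != "arm64":
        problems.append("macOS on Apple Silicon is required")
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    if not has_mlx_vlm:
        problems.append("mlx-vlm is not installed")
    return problems


def check_this_machine():
    """Apply the checks to the interpreter and machine running this code."""
    return check_requirements(
        platform.system(),
        platform.machine(),
        sys.version_info,
        importlib.util.find_spec("mlx_vlm") is not None,  # assumed import name
    )
```

Keeping the checks in a function that takes plain values makes them easy to test on machines that do not meet the requirements.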
Troubleshooting
"Model not found": The first run downloads the model (~4GB). Wait for the download to complete.
Out of memory: Close other applications. The model needs ~6GB of free RAM.
Slow first inference: Loading the model takes 10-15s on first use; subsequent calls are faster.
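The figures quoted in this document (~94 tok/s and a 10-15s load on first use) allow a rough estimate of how long a call should take, which helps distinguish a slow first run from a genuine hang. This is back-of-the-envelope arithmetic only: `estimate_seconds` is a hypothetical helper, and it ignores image preprocessing, prompt processing, and any model download.

```python
TOKENS_PER_SECOND = 94.0     # approximate speed quoted under Model Details
LOAD_SECONDS = (10.0, 15.0)  # first-use model loading range quoted above


def estimate_seconds(output_tokens, first_call=False):
    """Return a (low, high) estimate in seconds for generating output_tokens."""
    generation = output_tokens / TOKENS_PER_SECOND
    if first_call:
        return (LOAD_SECONDS[0] + generation, LOAD_SECONDS[1] + generation)
    return (generation, generation)
```

For example, a 188-token answer works out to about 2 seconds of generation, so on first use nearly all of the wait is model loading.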