vision
Vision Image Processing Skill
Overview
This skill leverages Claude's multimodal vision capabilities to analyze, process, and extract insights from images. It supports a wide range of visual understanding tasks including optical character recognition (OCR), image classification, diagram analysis, chart interpretation, and visual comparison.
When to Use This Skill
Activate this skill when users need to:
- Extract text from images, screenshots, or scanned documents (OCR)
- Classify or categorize images based on visual content
- Analyze charts, graphs, or data visualizations to extract insights
- Compare multiple images (diagrams, screenshots, designs)
- Describe or caption images in detail
- Answer questions about visual content
- Detect objects, people, or elements within images
- Analyze UI/UX from screenshots or mockups
- Read handwritten text or notes
- Process receipts, invoices, or forms for data extraction
Core Capabilities
1. Optical Character Recognition (OCR)
Extract text from images with high accuracy:
Instructions:
- Use the Read tool to load the image file
- Analyze the image and extract all visible text
- Preserve formatting, layout, and structure when possible
- Handle multiple languages and fonts
- Identify and extract text from challenging contexts (handwriting, artistic fonts, rotated text)
Output Format:
- Provide extracted text in markdown format
- Include confidence notes for challenging sections
- Maintain document structure (headings, paragraphs, lists)
Example Use Cases:
- Screenshot text extraction
- Scanned document digitization
- Receipt and invoice processing
- Handwritten note transcription
- Sign and label reading
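One way to assemble the output format described above is sketched below. The `TextBlock` structure, its `kind` values, and the confidence labels are assumptions made for illustration; the skill itself does not prescribe a data model.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """One region of extracted text (hypothetical structure, not part of the skill)."""
    kind: str                 # "heading", "paragraph", or "list_item"
    text: str
    confidence: str = "high"  # "high", "medium", or "low"


def render_markdown(blocks: list[TextBlock]) -> str:
    """Render extracted blocks as markdown and append notes for uncertain sections."""
    lines: list[str] = []
    notes: list[str] = []
    for block in blocks:
        if block.kind == "heading":
            lines.append(f"## {block.text}")
        elif block.kind == "list_item":
            lines.append(f"- {block.text}")
        else:
            lines.append(block.text)
        # Anything below high confidence gets a note so the reader can double-check it
        if block.confidence != "high":
            notes.append(f'Note: "{block.text}" was extracted with {block.confidence} confidence.')
    return "\n".join(lines + notes)


blocks = [
    TextBlock("heading", "Receipt"),
    TextBlock("list_item", "Coffee 3.50"),
    TextBlock("list_item", "Tota1 3.50", confidence="low"),
]
print(render_markdown(blocks))
```

Collecting the notes at the end keeps the document structure intact while still flagging the sections a reader should verify against the image.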
2. Image Classification and Categorization
Identify and classify image content:
Instructions:
- Analyze the overall subject and context
- Identify primary objects, scenes, or themes
- Provide classification labels with confidence levels
- Detect style, mood, and artistic elements
- Categorize by industry-relevant taxonomies when applicable
Output Format:
Primary Classification
- Category: [main category]
- Confidence: [High/Medium/Low]
Detected Elements
- Object 1: [description]
- Object 2: [description] ...
Additional Attributes
- Style: [style description]
- Setting: [environment/context]
- Colors: [dominant colors]
3. Chart and Graph Analysis
Extract insights from data visualizations:
Instructions:
- Identify chart type (bar, line, pie, scatter, etc.)
- Extract data points, values, and trends
- Read axes labels, legends, and annotations
- Summarize key insights and patterns
- Flag anomalies or notable data points
Output Format:
Chart Analysis
Type: [Chart Type]
Data Summary:
[Extracted data in table or structured format]
Key Insights:
- [Insight 1]
- [Insight 2]
- [Insight 3]
Trends:
- [Trend description]
Notable Points:
- [Anomalies or important observations]
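Once values have been read off a chart, the trend and anomaly steps in this section can be approximated numerically. The sketch below is one possible approach, not a method the skill specifies: it computes point-to-point percent change and flags outliers with a simple z-score. The threshold of 2.0 and the sample series are assumptions.

```python
from statistics import mean, pstdev


def percent_changes(values: list[float]) -> list[float]:
    """Percent change between each pair of consecutive data points."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]


def flag_anomalies(values: list[float], z_threshold: float = 2.0) -> list[int]:
    """Return indices of points more than z_threshold standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all points identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


# Hypothetical values read off a chart
series = [100.0, 110.0, 121.0, 115.0, 400.0, 118.0]
print([round(c, 1) for c in percent_changes(series)])
print(flag_anomalies(series))
```

A z-score is a blunt instrument on short series, so flagged points are best treated as candidates for the "Notable Points" list rather than confirmed anomalies.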
4. Diagram and Visual Comparison
Compare multiple images or diagrams:
Instructions:
- Load all images to be compared
- Identify similarities and differences
- Highlight structural, content, and style variations
- Create side-by-side comparison tables
- Note additions, deletions, or modifications
Output Format:
Visual Comparison
Image 1: [description]
Image 2: [description]
Similarities
- [Similarity 1]
- [Similarity 2]
Differences
| Aspect | Image 1 | Image 2 |
|---|---|---|
| [Aspect] | [Description] | [Description] |
Overall Assessment
[Summary of comparison]
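The additions, deletions, and modifications called for in this section amount to a keyed diff once each image has been summarized as a set of attributes. The following is a minimal sketch under that assumption; the attribute names and values are hypothetical, and the function names are illustrative only.

```python
def compare_elements(first: dict[str, str], second: dict[str, str]) -> dict[str, list]:
    """Split two attribute maps into added, removed, and modified entries."""
    added = sorted(k for k in second if k not in first)
    removed = sorted(k for k in first if k not in second)
    modified = sorted(
        (k, first[k], second[k]) for k in first if k in second and first[k] != second[k]
    )
    return {"added": added, "removed": removed, "modified": modified}


def differences_table(modified: list[tuple[str, str, str]]) -> str:
    """Render modified entries as a markdown table."""
    rows = ["| Aspect | Image 1 | Image 2 |", "|---|---|---|"]
    rows += [f"| {name} | {before} | {after} |" for name, before, after in modified]
    return "\n".join(rows)


# Hypothetical attributes noted while inspecting two mockups
image_1 = {"Primary Button": "Blue", "Card Spacing": "16px", "Sidebar": "present"}
image_2 = {"Primary Button": "Green", "Card Spacing": "16px", "Search Bar": "present"}

result = compare_elements(image_1, image_2)
print(result["added"], result["removed"])
print(differences_table(result["modified"]))
```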
5. Detailed Image Description
Generate comprehensive image descriptions:
Instructions:
- Describe the overall scene or subject
- Identify and describe all visible elements
- Note spatial relationships and composition
- Describe colors, lighting, and atmosphere
- Mention text, logos, or symbols if present
- Consider accessibility (generate alt-text compatible descriptions)
Output Format:
- Natural language description (paragraph form)
- Structured element list (bulleted)
- Technical details (dimensions, format, quality notes)
6. Visual Question Answering
Answer specific questions about image content:
Instructions:
- Carefully read the user's question
- Examine the relevant areas of the image
- Provide accurate, specific answers
- Reference visual evidence when answering
- Acknowledge uncertainty if details are unclear
Best Practices:
- Be precise and factual
- Avoid assumptions beyond what's visible
- Describe what you see, not what you infer (unless asked)
- Use spatial language (top-left, center, background, etc.)
7. UI/UX and Design Analysis
Analyze user interfaces and design elements:
Instructions:
- Identify UI components (buttons, forms, navigation)
- Assess layout and visual hierarchy
- Note design patterns and conventions
- Evaluate accessibility considerations
- Compare against design best practices
- Extract color schemes and typography
Output Format:
UI/UX Analysis
Component Inventory:
- [List of UI elements]
Layout Assessment:
- [Layout description and grid analysis]
Design Patterns:
- [Identified patterns]
Accessibility Notes:
- [Contrast, readability, touch targets]
Recommendations:
- [Improvement suggestions]
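The contrast item in the accessibility notes can be made measurable once foreground and background colors have been extracted from a screenshot. The skill does not name a specific metric; the sketch below assumes the WCAG 2.x contrast-ratio formula and its 4.5:1 AA threshold for body text as one reasonable choice.

```python
def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an sRGB color given as '#RRGGBB' (WCAG 2.x definition)."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4 for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


ratio = contrast_ratio("#FFFFFF", "#000000")
print(f"{ratio:.1f}:1", "passes AA for body text" if ratio >= 4.5 else "fails AA for body text")
```

Colors sampled from a compressed screenshot are approximate, so a ratio close to the threshold is worth reporting as uncertain rather than as a pass or fail.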
8. Document and Form Processing
Extract structured data from forms, receipts, and documents:
Instructions:
- Identify document type and structure
- Extract field names and values
- Organize data into structured format (JSON, CSV, tables)
- Handle multi-column layouts
- Preserve data relationships and hierarchies
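The steps above could be sketched as follows: extracted values are bundled into one nested record so that line items stay attached to their document, and the same record is then serialized as JSON or flattened to CSV. The line-item keys (`description`, `quantity`, `amount`) and all sample values are hypothetical.

```python
import csv
import io
import json


def to_record(document_type: str, fields: dict[str, str], line_items: list[dict[str, str]]) -> dict:
    """Bundle extracted values into one nested record, keeping line items under their document."""
    return {"document_type": document_type, "fields": fields, "line_items": line_items}


def line_items_to_csv(record: dict) -> str:
    """Flatten the line items into CSV, one row per item."""
    items = record["line_items"]
    if not items:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(items[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Hypothetical values transcribed from an invoice image
record = to_record(
    "invoice",
    {"invoice_number": "INV-001", "date": "2025-01-15", "total": "30.00"},
    [
        {"description": "Widget", "quantity": "2", "amount": "20.00"},
        {"description": "Gadget", "quantity": "1", "amount": "10.00"},
    ],
)
print(json.dumps(record, indent=2))
print(line_items_to_csv(record))
```

Values are kept as strings exactly as read from the image; converting them to numbers or dates is a separate step that can fail on misread characters.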
Output Format:
{
  "document_type": "invoice",
  "fields": {
    "invoice_number": "value",
    "date": "value",
    "total": "value"
  },
  "line_items": [...]
}
Workflow and Best Practices
Standard Vision Processing Workflow
1. Load the Image(s)
   - Use the Read tool to access image files
   - Supported formats: PNG, JPG, JPEG, GIF, WebP, PDF (single page)
2. Understand the Request
   - Identify the specific task (OCR, classification, analysis, etc.)
   - Note any special requirements or focus areas
3. Analyze the Visual Content
   - Apply Claude's vision capabilities to examine the image
   - Extract relevant information based on the task
4. Structure the Output
   - Format results according to the task type
   - Use markdown for readability
   - Include confidence indicators where appropriate
5. Validate and Refine
   - Check for completeness
   - Verify accuracy of extracted data
   - Provide follow-up options if needed
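Before loading, it can help to screen input paths against the supported-format list in the workflow above. This is a minimal sketch, not part of the skill: it checks file extensions only, so it neither inspects file contents nor verifies that a PDF has a single page.

```python
from pathlib import Path

# Extensions taken from the supported-format list in the workflow
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf"}


def is_supported(path: str) -> bool:
    """True when the file extension (case-insensitive) is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_inputs(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split paths into (loadable, rejected) so rejected files can be reported to the user."""
    loadable = [p for p in paths if is_supported(p)]
    rejected = [p for p in paths if not is_supported(p)]
    return loadable, rejected


loadable, rejected = partition_inputs(["scan.PNG", "logo.svg", "invoice.pdf"])
print(loadable, rejected)
```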
Quality Guidelines
- Accuracy First: Prioritize correct information over comprehensive coverage
- Structured Output: Use consistent formatting for similar tasks
- Confidence Indicators: Note when details are unclear or ambiguous
- Context Awareness: Consider the user's domain and use case
- Accessibility: Generate descriptions suitable for screen readers when appropriate
Limitations and Considerations
- Image Quality: Low resolution or blurry images may reduce accuracy
- Supported Formats: Primarily raster images; vector graphics may need conversion
- Privacy: Be cautious with sensitive information (PII, credentials, etc.)
- Complex Diagrams: Highly technical diagrams may require clarification from someone with domain expertise
- Real-Time Data: Cannot access live data or external resources not in the image
Advanced Features
Batch Processing
For multiple images:
Processing images in batch:
1. [Image1.png] - [Task result]
2. [Image2.png] - [Task result]
3. [Image3.png] - [Task result]
Summary: [Overall findings]
Multi-Modal Context
Combine visual analysis with code, documents, or data:
- Cross-reference image content with codebase files
- Validate design implementations against mockups
- Extract data and populate code templates
- Generate code from UI screenshots
Helper Script Integration
For advanced processing, the skill includes a Python helper script at scripts/image_processor.py:
- Image format conversion
- Metadata extraction
- Batch file operations
- Image preprocessing (resize, enhance, crop)
Usage:
python .claude/skills/vision/scripts/image_processor.py --operation [convert|metadata|batch] --input [path]
Examples
Example 1: OCR from Screenshot
User Request: "Extract the text from this screenshot"
Process:
- Load image with Read tool
- Identify all text regions
- Extract and format text
- Preserve structure
Output:
Extracted Text
Heading
Main paragraph text goes here...
List Items
- Item 1
- Item 2
- Item 3
Note: Text extracted with high confidence. Minor formatting adjustments made for readability.
Example 2: Chart Analysis
User Request: "Analyze this sales chart and tell me the key trends"
Process:
- Load chart image
- Identify chart type and data
- Extract values and labels
- Analyze trends
Output:
Sales Chart Analysis
Chart Type: Line graph with 3 series (2023-2025)
Data Points:
| Month | 2023 | 2024 | 2025 |
|---|---|---|---|
| Jan | 120K | 145K | 180K |
| Feb | 135K | 150K | 195K |
| ... | ... | ... | ... |
Key Insights:
- Consistent year-over-year growth of ~20%
- Seasonal peak in Q4 across all years
- 2025 showing accelerated growth trend
Recommendation: Prepare for continued high demand in Q4 2025
Example 3: UI Component Comparison
User Request: "Compare these two design mockups and highlight the differences"
Process:
- Load both images
- Identify UI components in each
- Compare layouts and elements
- Document changes
Output:
Design Mockup Comparison
Added Elements (Mockup 2)
- Search bar in header
- User avatar in top-right corner
- Breadcrumb navigation
Removed Elements
- Sidebar menu (replaced with top navigation)
Modified Elements
| Component | Mockup 1 | Mockup 2 |
|---|---|---|
| Primary Button | Blue (#007AFF) | Green (#34C759) |
| Typography | Roboto 16px | SF Pro 15px |
| Card Spacing | 16px | 24px |
Layout Changes
- Shifted from sidebar to top navigation
- Increased content area width by ~30%
- Changed from 3-column to 4-column grid
Integration with Claude Code
This skill works alongside other Claude Code tools:
- Read Tool: Load images from the filesystem
- Write Tool: Save processed results or extracted data
- Bash Tool: Run helper scripts for preprocessing
- Task Tool: Coordinate complex multi-image workflows
Quick Reference
| Task | Command Pattern | Output Type |
|---|---|---|
| OCR | "Extract text from [image]" | Markdown text |
| Classification | "Classify this image" | Category labels |
| Chart Analysis | "Analyze this chart" | Data + insights |
| Comparison | "Compare [img1] and [img2]" | Diff table |
| Description | "Describe this image" | Paragraph |
| Q&A | "What [question] in this image?" | Answer |
| UI Analysis | "Analyze this UI screenshot" | Component breakdown |
Tips for Best Results
- Provide Context: Mention the domain or purpose (e.g., "medical diagram," "e-commerce UI")
- Be Specific: Request specific information rather than general analysis
- Multiple Angles: For complex images, ask follow-up questions
- File Paths: Give the exact absolute or relative path to each image file
- Batch Operations: Process multiple similar images together for consistency
Support and Troubleshooting
Common Issues:
- "Cannot read image" → Verify file path and format
- "Low confidence extraction" → Image may be too low resolution
- "Unable to detect chart data" → Chart may be too complex or stylized
Getting Better Results:
- Use high-resolution images (300+ DPI for documents)
- Ensure good contrast and lighting
- Crop images to focus on relevant areas
- Provide context about the image content
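To make the 300 DPI guideline concrete, the arithmetic below converts a page size in inches to the pixel dimensions a scan would need, and estimates the effective DPI of an existing image. The function names are illustrative; US Letter (8.5 × 11 in) is used only as a sample page size.

```python
def required_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions needed to capture a page of the given size (in inches) at a target DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(width_px: int, width_in: float) -> float:
    """DPI implied by an image's pixel width for a page of known physical width."""
    return width_px / width_in


print(required_pixels(8.5, 11))   # US Letter at 300 DPI → (2550, 3300)
print(effective_dpi(1275, 8.5))   # a 1275 px wide scan of the same page → 150.0
```

A scan well below the target, such as the 150 DPI case here, is a likely cause of the "Low confidence extraction" issue listed above.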
License
This skill is licensed under Apache-2.0.
Version
Version: 1.0.0
Last Updated: 2025-11-18
Compatible with: Claude Code (all versions with vision support)