Vision Image Processing Skill

Overview

This skill leverages Claude's multimodal vision capabilities to analyze, process, and extract insights from images. It supports a wide range of visual understanding tasks including optical character recognition (OCR), image classification, diagram analysis, chart interpretation, and visual comparison.

When to Use This Skill

Activate this skill when users need to:
  • Extract text from images, screenshots, or scanned documents (OCR)
  • Classify or categorize images based on visual content
  • Analyze charts, graphs, or data visualizations to extract insights
  • Compare multiple images (diagrams, screenshots, designs)
  • Describe or caption images in detail
  • Answer questions about visual content
  • Detect objects, people, or elements within images
  • Analyze UI/UX from screenshots or mockups
  • Read handwritten text or notes
  • Process receipts, invoices, or forms for data extraction

Core Capabilities

1. Optical Character Recognition (OCR)

Extract text from images as accurately as the image quality allows:
Instructions:
  • Use the Read tool to load the image file
  • Analyze the image and extract all visible text
  • Preserve formatting, layout, and structure when possible
  • Handle multiple languages and fonts
  • Identify and extract text from challenging contexts (handwriting, artistic fonts, rotated text)
Output Format:
  • Provide extracted text in markdown format
  • Include confidence notes for challenging sections
  • Maintain document structure (headings, paragraphs, lists)
Example Use Cases:
  • Screenshot text extraction
  • Scanned document digitization
  • Receipt and invoice processing
  • Handwritten note transcription
  • Sign and label reading

2. Image Classification and Categorization

Identify and classify image content:
Instructions:
  • Analyze the overall subject and context
  • Identify primary objects, scenes, or themes
  • Provide classification labels with confidence levels
  • Detect style, mood, and artistic elements
  • Categorize by industry-relevant taxonomies when applicable
Output Format:
```markdown
## Primary Classification
- Category: [main category]
- Confidence: [High/Medium/Low]

## Detected Elements
- Object 1: [description]
- Object 2: [description]
- ...

## Additional Attributes
- Style: [style description]
- Setting: [environment/context]
- Colors: [dominant colors]
```

3. Chart and Graph Analysis

Extract insights from data visualizations:
Instructions:
  • Identify chart type (bar, line, pie, scatter, etc.)
  • Extract data points, values, and trends
  • Read axes labels, legends, and annotations
  • Summarize key insights and patterns
  • Flag anomalies or notable data points
Output Format:
```markdown
## Chart Analysis

Type: [Chart Type]
Data Summary: [Extracted data in table or structured format]

Key Insights:
1. [Insight 1]
2. [Insight 2]
3. [Insight 3]

Trends:
- [Trend description]

Notable Points:
- [Anomalies or important observations]
```
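After values have been read off a chart, part of flagging anomalies can be done mechanically. The sketch below shows one hypothetical rule that is not part of the skill. It marks any value whose z-score (its distance from the mean, measured in sample standard deviations) exceeds a threshold. The name `flag_outliers`, the default threshold of 2.0, and the sample values are all assumptions. With only a few points, a single extreme value inflates the standard deviation, so treat the result as a prompt for a closer look rather than a verdict.

```python
from statistics import mean, stdev

def flag_outliers(values: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Return labels whose z-score exceeds `threshold` (hypothetical helper)."""
    if len(values) < 3:
        return []  # too few points for a meaningful spread
    mu = mean(values.values())
    sigma = stdev(values.values())  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [label for label, v in values.items() if abs(v - mu) / sigma > threshold]

# Illustrative values only, not taken from any real chart.
monthly = {"Jan": 100, "Feb": 102, "Mar": 98, "Apr": 101, "May": 99, "Jun": 100, "Jul": 250}
print(flag_outliers(monthly))  # ['Jul']
```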

4. Diagram and Visual Comparison

Compare multiple images or diagrams:
Instructions:
  • Load all images to be compared
  • Identify similarities and differences
  • Highlight structural, content, and style variations
  • Create side-by-side comparison tables
  • Note additions, deletions, or modifications
Output Format:
```markdown
## Visual Comparison

Image 1: [description]
Image 2: [description]

### Similarities
- [Similarity 1]
- [Similarity 2]

### Differences
| Aspect | Image 1 | Image 2 |
|--------|---------|---------|
| [Aspect] | [Description] | [Description] |

### Overall Assessment
[Summary of comparison]
```

5. Detailed Image Description

Generate comprehensive image descriptions:
Instructions:
  • Describe the overall scene or subject
  • Identify and describe all visible elements
  • Note spatial relationships and composition
  • Describe colors, lighting, and atmosphere
  • Mention text, logos, or symbols if present
  • Consider accessibility (generate alt-text compatible descriptions)
Output Format:
  • Natural language description (paragraph form)
  • Structured element list (bulleted)
  • Technical details (dimensions, format, quality notes)

6. Visual Question Answering

Answer specific questions about image content:
Instructions:
  • Carefully read the user's question
  • Examine the relevant areas of the image
  • Provide accurate, specific answers
  • Reference visual evidence when answering
  • Acknowledge uncertainty if details are unclear
Best Practices:
  • Be precise and factual
  • Avoid assumptions beyond what's visible
  • Describe what you see, not what you infer (unless asked)
  • Use spatial language (top-left, center, background, etc.)
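An answer sometimes points to a specific spot, such as the center of a bounding box given in pixel coordinates. In that case it helps to map the coordinates onto the spatial terms listed above in a consistent way. The minimal sketch below assumes a 3x3 grid and image coordinates with the origin at the top-left and y increasing downward. `region_name` is a hypothetical helper, not part of the skill.

```python
def region_name(x: float, y: float, width: float, height: float) -> str:
    """Map a point to a 3x3 grid label such as 'top-left' or 'center'.

    Assumes (0, 0) is the top-left corner of the image and y grows downward.
    """
    cols = ("left", "center", "right")
    rows = ("top", "middle", "bottom")
    # min(..., 2) keeps points on the right or bottom edge inside the last cell
    col = cols[min(int(3 * x / width), 2)]
    row = rows[min(int(3 * y / height), 2)]
    if (row, col) == ("middle", "center"):
        return "center"
    return f"{row}-{col}"

print(region_name(40, 30, 1200, 800))     # top-left
print(region_name(600, 400, 1200, 800))   # center
print(region_name(1150, 420, 1200, 800))  # middle-right
```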

7. UI/UX and Design Analysis

Analyze user interfaces and design elements:
Instructions:
  • Identify UI components (buttons, forms, navigation)
  • Assess layout and visual hierarchy
  • Note design patterns and conventions
  • Evaluate accessibility considerations
  • Compare against design best practices
  • Extract color schemes and typography
Output Format:
```markdown
## UI/UX Analysis

Component Inventory:
- [List of UI elements]

Layout Assessment:
- [Layout description and grid analysis]

Design Patterns:
- [Identified patterns]

Accessibility Notes:
- [Contrast, readability, touch targets]

Recommendations:
- [Improvement suggestions]
```
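The Accessibility Notes entry covers contrast. If the text and background colors can be read from a screenshot (or are known from the design file), the contrast ratio can be computed with the WCAG 2.x relative-luminance formula instead of being judged by eye. This is a sketch under that assumption. The function names are hypothetical, and colors sampled from a compressed screenshot may be slightly off.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)

def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

ratio = contrast_ratio("#FFFFFF", "#34C759")  # white label on a green button
print(f"{ratio:.2f}:1", "passes AA" if ratio >= 4.5 else "fails AA for normal text")
```

WCAG 2.x level AA requires a ratio of at least 4.5:1 for normal-size text and 3:1 for large text.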

8. Document and Form Processing

Extract structured data from forms, receipts, and documents:
Instructions:
  • Identify document type and structure
  • Extract field names and values
  • Organize data into structured format (JSON, CSV, tables)
  • Handle multi-column layouts
  • Preserve data relationships and hierarchies
Output Format:
```json
{
  "document_type": "invoice",
  "fields": {
    "invoice_number": "value",
    "date": "value",
    "total": "value"
  },
  "line_items": [...]
}
```
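Extracted fields should be checked before they are saved or passed on. The minimal sketch below does this for the invoice schema above. It assumes that amounts are plain numeric strings (currency symbols and thousands separators already removed) and that each line item has an `amount` key. Because the schema leaves `line_items` unspecified, that key, the `validate_invoice` name, and the sample values are all hypothetical. The sketch uses `Decimal` so that money sums are not affected by binary floating-point rounding.

```python
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("invoice_number", "date", "total")  # from the schema above

def validate_invoice(doc: dict) -> list[str]:
    """Return a list of problems found in an extracted invoice (empty if none)."""
    problems = []
    if doc.get("document_type") != "invoice":
        problems.append("document_type is not 'invoice'")
    fields = doc.get("fields", {})
    for name in REQUIRED_FIELDS:
        if not fields.get(name):
            problems.append(f"missing field: {name}")
    items = doc.get("line_items", [])
    try:
        total = Decimal(str(fields.get("total", "")))
        # Hypothetical line-item shape: each item carries an "amount" string.
        items_sum = sum((Decimal(str(item["amount"])) for item in items), Decimal("0"))
    except (InvalidOperation, KeyError) as exc:
        problems.append(f"could not parse amounts: {exc!r}")
        return problems
    if items and items_sum != total:
        problems.append(f"line items sum to {items_sum}, but total is {total}")
    return problems

# Made-up extraction result for illustration.
extracted = {
    "document_type": "invoice",
    "fields": {"invoice_number": "INV-001", "date": "2025-01-31", "total": "30.00"},
    "line_items": [{"amount": "10.00"}, {"amount": "15.00"}],
}
print(validate_invoice(extracted))  # ['line items sum to 25.00, but total is 30.00']
```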

Workflow and Best Practices

Standard Vision Processing Workflow

  1. Load the Image(s)
    • Use the Read tool to access image files
    • Supported formats: PNG, JPG, JPEG, GIF, WebP, PDF (single page)
  2. Understand the Request
    • Identify the specific task (OCR, classification, analysis, etc.)
    • Note any special requirements or focus areas
  3. Analyze the Visual Content
    • Apply Claude's vision capabilities to examine the image
    • Extract relevant information based on the task
  4. Structure the Output
    • Format results according to the task type
    • Use markdown for readability
    • Include confidence indicators where appropriate
  5. Validate and Refine
    • Check for completeness
    • Verify accuracy of extracted data
    • Provide follow-up options if needed
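File extensions are sometimes wrong, so it can help to confirm that a file really is one of the supported formats listed above before loading it. This sketch assumes the check runs in Python before the file is handed to the Read tool. It identifies the format from the file's leading signature ("magic") bytes. `sniff_image_format` and `check_supported` are hypothetical names, and the signatures cover only the formats named in step 1.

```python
from pathlib import Path
from typing import Optional

def sniff_image_format(header: bytes) -> Optional[str]:
    """Return a format name based on a file's first bytes, or None if unrecognized."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"  # covers both .jpg and .jpeg files
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    if header.startswith(b"%PDF-"):
        return "PDF"
    return None

def check_supported(path: str) -> str:
    """Return the detected format, or raise ValueError for anything else."""
    with Path(path).open("rb") as f:
        fmt = sniff_image_format(f.read(12))  # 12 bytes cover every signature above
    if fmt is None:
        raise ValueError(f"{path}: unsupported or unrecognized image format")
    return fmt
```

For example, `check_supported("scan.png")` would return `"PNG"` for a genuine PNG and raise `ValueError` for a file in some other format, whatever its extension.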

Quality Guidelines

  • Accuracy First: Prioritize correct information over comprehensive coverage
  • Structured Output: Use consistent formatting for similar tasks
  • Confidence Indicators: Note when details are unclear or ambiguous
  • Context Awareness: Consider the user's domain and use case
  • Accessibility: Generate descriptions suitable for screen readers when appropriate

Limitations and Considerations

  • Image Quality: Low resolution or blurry images may reduce accuracy
  • Supported Formats: Primarily raster images; vector graphics may need conversion
  • Privacy: Be cautious with sensitive information (PII, credentials, etc.)
  • Complex Diagrams: Highly technical diagrams may need clarification from someone with domain expertise
  • Real-Time Data: Cannot access live data or external resources not in the image

Advanced Features

Batch Processing

For multiple images:
```markdown
Processing images in batch:
1. [Image1.png] - [Task result]
2. [Image2.png] - [Task result]
3. [Image3.png] - [Task result]

Summary: [Overall findings]
```
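A small loop can produce the batch report format above. This minimal sketch assumes Python is used to gather the files. `analyze` is a hypothetical callback standing in for whatever per-image task is being run; it is not a real API. The suffix list mirrors the raster formats listed under Supported formats.

```python
from pathlib import Path
from typing import Callable

# Raster formats from the Supported formats list (PDF handled separately).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def collect_images(folder: str) -> list[Path]:
    """Return supported image files in `folder`, sorted by name."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)

def batch_report(paths: list[Path], analyze: Callable[[Path], str], summary: str) -> str:
    """Format per-image results in the batch template shown above."""
    lines = ["Processing images in batch:"]
    for i, path in enumerate(paths, start=1):
        lines.append(f"{i}. [{path.name}] - {analyze(path)}")
    lines += ["", f"Summary: {summary}"]
    return "\n".join(lines)

report = batch_report(
    [Path("Image1.png"), Path("Image2.png")],
    analyze=lambda p: f"processed {p.name}",  # placeholder for the real task
    summary="2 images processed",
)
print(report)
```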

Multi-Modal Context

Combine visual analysis with code, documents, or data:
  • Cross-reference image content with codebase files
  • Validate design implementations against mockups
  • Extract data and populate code templates
  • Generate code from UI screenshots

Helper Script Integration

For advanced processing, the skill includes a Python helper script at `scripts/image_processor.py`:
  • Image format conversion
  • Metadata extraction
  • Batch file operations
  • Image preprocessing (resize, enhance, crop)
Usage:
```bash
python .claude/skills/vision/scripts/image_processor.py --operation [convert|metadata|batch] --input [path]
```
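The contents of `image_processor.py` are not shown here, so the following is not its implementation. As an illustration of what metadata extraction involves, this stdlib-only sketch reads a PNG's width, height, bit depth, and color type from its IHDR chunk. The PNG specification requires IHDR to be the first chunk after the 8-byte signature. `png_metadata` is a hypothetical name.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_metadata(data: bytes) -> dict:
    """Read basic PNG metadata from the first 26 bytes of a file.

    Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    then width and height (big-endian uint32), bit depth, color type.
    """
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height, "bit_depth": bit_depth, "color_type": color_type}

# Usage (the path is illustrative):
# with open("screenshot.png", "rb") as f:
#     print(png_metadata(f.read(26)))
```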

Examples

Example 1: OCR from Screenshot

User Request: "Extract the text from this screenshot"
Process:
  1. Load image with Read tool
  2. Identify all text regions
  3. Extract and format text
  4. Preserve structure
Output:
```markdown
## Extracted Text

### Heading
Main paragraph text goes here...

### List Items
- Item 1
- Item 2
- Item 3

Note: Text extracted with high confidence. Minor formatting adjustments made for readability.
```

Example 2: Chart Analysis

User Request: "Analyze this sales chart and tell me the key trends"
Process:
  1. Load chart image
  2. Identify chart type and data
  3. Extract values and labels
  4. Analyze trends
Output:
```markdown
## Sales Chart Analysis

Chart Type: Line graph with 3 series (2023-2025)

Data Points:
| Month | 2023 | 2024 | 2025 |
|-------|------|------|------|
| Jan   | 120K | 145K | 180K |
| Feb   | 135K | 150K | 195K |
| ...   | ...  | ...  | ...  |

Key Insights:
1. Consistent year-over-year growth of ~20%
2. Seasonal peak in Q4 across all years
3. 2025 showing accelerated growth trend

Recommendation: Prepare for continued high demand in Q4 2025
```
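Growth figures like the ~20% above can be recomputed from the extracted table rather than estimated from the slope of the lines. This sketch uses only the two rows shown in the example, since the remaining months are elided, with values in thousands. `yoy_growth` is a hypothetical helper.

```python
# Monthly sales in thousands, transcribed from the visible rows of the example table.
sales = {
    "Jan": {2023: 120, 2024: 145, 2025: 180},
    "Feb": {2023: 135, 2024: 150, 2025: 195},
}

def yoy_growth(by_year: dict[int, float]) -> dict[int, float]:
    """Percent change from the previous year, keyed by the later year."""
    years = sorted(by_year)
    return {
        cur: round((by_year[cur] - by_year[prev]) / by_year[prev] * 100, 1)
        for prev, cur in zip(years, years[1:])
    }

for month, by_year in sales.items():
    print(month, yoy_growth(by_year))
# Jan {2024: 20.8, 2025: 24.1}
# Feb {2024: 11.1, 2025: 30.0}
```

Running the same calculation over every extracted month grounds a summary figure in the data rather than in a visual impression.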

Example 3: UI Component Comparison

User Request: "Compare these two design mockups and highlight the differences"
Process:
  1. Load both images
  2. Identify UI components in each
  3. Compare layouts and elements
  4. Document changes
Output:
```markdown
## Design Mockup Comparison

### Added Elements (Mockup 2)
- Search bar in header
- User avatar in top-right corner
- Breadcrumb navigation

### Removed Elements
- Sidebar menu (replaced with top navigation)

### Modified Elements
| Component | Mockup 1 | Mockup 2 |
|-----------|----------|----------|
| Primary Button | Blue (#007AFF) | Green (#34C759) |
| Typography | Roboto 16px | SF Pro 15px |
| Card Spacing | 16px | 24px |

### Layout Changes
- Shifted from sidebar to top navigation
- Increased content area width by ~30%
- Changed from 3-column to 4-column grid
```
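Once each mockup's components have been identified, splitting them into Added, Removed, and Modified is simple bookkeeping. Assume those observations have already been transcribed into dictionaries mapping each component name to its observed value. A set-based diff then yields the three groups and the modified-elements table. `diff_components` and `modified_table` are hypothetical helpers, and "present" is a placeholder value for elements that have no other recorded attribute.

```python
def diff_components(before: dict[str, str], after: dict[str, str]) -> dict:
    """Split components into added, removed and modified groups."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    modified = {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }
    return {"added": added, "removed": removed, "modified": modified}

def modified_table(modified: dict[str, tuple[str, str]]) -> str:
    """Render modified components as a markdown table."""
    rows = ["| Component | Mockup 1 | Mockup 2 |", "|---|---|---|"]
    rows += [f"| {name} | {old} | {new} |" for name, (old, new) in modified.items()]
    return "\n".join(rows)

# Values transcribed from Example 3; "present" is a placeholder value.
mockup_1 = {"Primary Button": "Blue (#007AFF)", "Typography": "Roboto 16px",
            "Card Spacing": "16px", "Sidebar menu": "present"}
mockup_2 = {"Primary Button": "Green (#34C759)", "Typography": "SF Pro 15px",
            "Card Spacing": "24px", "Search bar": "present", "User avatar": "present",
            "Breadcrumb navigation": "present"}

result = diff_components(mockup_1, mockup_2)
print(result["added"])    # ['Breadcrumb navigation', 'Search bar', 'User avatar']
print(result["removed"])  # ['Sidebar menu']
print(modified_table(result["modified"]))
```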

Integration with Claude Code

This skill works alongside other Claude Code tools:
  • Read Tool: Load images from the filesystem
  • Write Tool: Save processed results or extracted data
  • Bash Tool: Run helper scripts for preprocessing
  • Task Tool: Coordinate complex multi-image workflows

Quick Reference

| Task | Command Pattern | Output Type |
|------|-----------------|-------------|
| OCR | "Extract text from [image]" | Markdown text |
| Classification | "Classify this image" | Category labels |
| Chart Analysis | "Analyze this chart" | Data + insights |
| Comparison | "Compare [img1] and [img2]" | Diff table |
| Description | "Describe this image" | Paragraph |
| Q&A | "What [question] in this image?" | Answer |
| UI Analysis | "Analyze this UI screenshot" | Component breakdown |

Tips for Best Results

  1. Provide Context: Mention the domain or purpose (e.g., "medical diagram," "e-commerce UI")
  2. Be Specific: Request specific information rather than general analysis
  3. Multiple Angles: For complex images, ask follow-up questions
  4. File Paths: Give the correct absolute or relative path to each image file
  5. Batch Operations: Process multiple similar images together for consistency

Support and Troubleshooting

Common Issues:
  • "Cannot read image" → Verify file path and format
  • "Low confidence extraction" → Image may be too low resolution
  • "Unable to detect chart data" → Chart may be too complex or stylized
Getting Better Results:
  • Use high-resolution images (300+ DPI for documents)
  • Ensure good contrast and lighting
  • Crop images to focus on relevant areas
  • Provide context about the image content
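The 300 DPI guideline converts directly into a minimum pixel size: pixels = inches × DPI on each side. A US Letter page (8.5 × 11 in) therefore needs 2550 × 3300 pixels. This small sketch performs that check and assumes the physical page size is known. `min_pixels` and `meets_dpi` are hypothetical names, and orientation is ignored by comparing short sides with short sides and long sides with long sides.

```python
import math

def min_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions needed to capture a page of the given size at `dpi`."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)

def meets_dpi(px_w: int, px_h: int, width_in: float, height_in: float, dpi: int = 300) -> bool:
    """True if an image is large enough for the page size at `dpi`, in either orientation."""
    short_px, long_px = sorted((px_w, px_h))
    need_short, need_long = sorted(min_pixels(width_in, height_in, dpi))
    return short_px >= need_short and long_px >= need_long

print(min_pixels(8.5, 11))             # (2550, 3300)
print(meets_dpi(1275, 1650, 8.5, 11))  # False: a 150 DPI scan of a Letter page
print(meets_dpi(3300, 2550, 8.5, 11))  # True: landscape orientation is accepted
```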

License

This skill is licensed under Apache-2.0.

Version

  • Version: 1.0.0
  • Last Updated: 2025-11-18
  • Compatible with: Claude Code (all versions with vision support)