vision

Vision Image Processing Skill

Overview

This skill leverages Claude's multimodal vision capabilities to analyze, process, and extract insights from images. It supports a wide range of visual understanding tasks including optical character recognition (OCR), image classification, diagram analysis, chart interpretation, and visual comparison.

When to Use This Skill

Activate this skill when users need to:

Extract text from images, screenshots, or scanned documents (OCR)
Classify or categorize images based on visual content
Analyze charts, graphs, or data visualizations to extract insights
Compare multiple images (diagrams, screenshots, designs)
Describe or caption images in detail
Answer questions about visual content
Detect objects, people, or elements within images
Analyze UI/UX from screenshots or mockups
Read handwritten text or notes
Process receipts, invoices, or forms for data extraction

Core Capabilities

1. Optical Character Recognition (OCR)

Extract text from images with high accuracy:

Instructions:

Use the Read tool to load the image file
Analyze the image and extract all visible text
Preserve formatting, layout, and structure when possible
Handle multiple languages and fonts
Identify and extract text from challenging contexts (handwriting, artistic fonts, rotated text)

Output Format:

Provide extracted text in markdown format
Include confidence notes for challenging sections
Maintain document structure (headings, paragraphs, lists)

Example Use Cases:

Screenshot text extraction
Scanned document digitization
Receipt and invoice processing
Handwritten note transcription
Sign and label reading
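
The output format above can be sketched as a small renderer. This sketch assumes the extracted text has already been split into regions, each tagged with one of the High/Medium/Low confidence labels used elsewhere in this skill. `TextRegion` and `render_ocr_markdown` are hypothetical names for illustration, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class TextRegion:
    """One block of extracted text (hypothetical structure for illustration)."""
    text: str
    confidence: str = "High"  # "High", "Medium", or "Low"


def render_ocr_markdown(regions: list[TextRegion]) -> str:
    """Join extracted regions in reading order and flag uncertain ones."""
    parts = []
    for index, region in enumerate(regions, start=1):
        parts.append(region.text.strip())
        if region.confidence != "High":
            # Attach a confidence note directly after the uncertain block.
            parts.append(f"> Note: region {index} extracted with "
                         f"{region.confidence.lower()} confidence; verify manually.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(render_ocr_markdown([
        TextRegion("# Meeting Notes"),
        TextRegion("- Budget review moved to Friday", "Low"),
    ]))
```

Placing each note directly after its block keeps the document structure intact while still telling the reader which lines to double-check.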

2. Image Classification and Categorization

Identify and classify image content:

Instructions:

Analyze the overall subject and context
Identify primary objects, scenes, or themes
Provide classification labels with confidence levels
Detect style, mood, and artistic elements
Categorize by industry-relevant taxonomies when applicable

Output Format:

```markdown
## Primary Classification
- Category: [main category]
- Confidence: [High/Medium/Low]

## Detected Elements
- Object 1: [description]
- Object 2: [description]
- ...

## Additional Attributes
- Style: [style description]
- Setting: [environment/context]
- Colors: [dominant colors]
```
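
The classification template can also be held as a typed structure before it is rendered. This minimal sketch assumes the three confidence levels above map to an enum. `Confidence`, `Classification`, and `to_markdown` are hypothetical names. An enum keeps the label set fixed, so outputs stay consistent across images.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class Classification:
    category: str
    confidence: Confidence
    elements: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the result in the Primary / Detected / Additional layout."""
        lines = ["## Primary Classification",
                 f"- Category: {self.category}",
                 f"- Confidence: {self.confidence.value}",
                 "",
                 "## Detected Elements"]
        lines += [f"- Object {i}: {desc}" for i, desc in enumerate(self.elements, start=1)]
        lines += ["", "## Additional Attributes"]
        lines += [f"- {name}: {value}" for name, value in self.attributes.items()]
        return "\n".join(lines)


if __name__ == "__main__":
    result = Classification("Landscape photograph", Confidence.HIGH,
                            ["Mountain range", "Lake"], {"Style": "Natural light"})
    print(result.to_markdown())
```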

3. Chart and Graph Analysis

Extract insights from data visualizations:

Instructions:

Identify chart type (bar, line, pie, scatter, etc.)
Extract data points, values, and trends
Read axes labels, legends, and annotations
Summarize key insights and patterns
Flag anomalies or notable data points

Output Format:

```markdown
## Chart Analysis

Type: [Chart Type]

Data Summary: [Extracted data in table or structured format]

Key Insights:
- [Insight 1]
- [Insight 2]
- [Insight 3]

Trends:
- [Trend description]

Notable Points:
- [Anomalies or important observations]
```
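
After values have been read off a chart, a simple statistical check can support the "Notable Points" section. This sketch uses a z-score rule: it flags points more than `threshold` sample standard deviations from the mean. That rule and the 2.0 default are assumptions chosen for illustration, not something the skill prescribes. Skewed or very short series may need other methods.

```python
import statistics


def flag_outliers(values: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Return labels whose value lies more than `threshold` std devs from the mean."""
    data = list(values.values())
    if len(data) < 3:
        return []  # too few points for a meaningful spread estimate
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [label for label, v in values.items() if abs(v - mean) / stdev > threshold]


if __name__ == "__main__":
    monthly = {"Jan": 10, "Feb": 11, "Mar": 9, "Apr": 10, "May": 11,
               "Jun": 10, "Jul": 9, "Aug": 10, "Sep": 40}
    print(flag_outliers(monthly))
```

One design caveat: with the sample standard deviation, no point can have a z-score above (n - 1)/√n. At the default threshold of 2.0, a series therefore needs at least six points before anything can be flagged.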

4. Diagram and Visual Comparison

Compare multiple images or diagrams:

Instructions:

Load all images to be compared
Identify similarities and differences
Highlight structural, content, and style variations
Create side-by-side comparison tables
Note additions, deletions, or modifications

Output Format:

```markdown
## Visual Comparison

Image 1: [description]
Image 2: [description]

### Similarities
- [Similarity 1]
- [Similarity 2]

### Differences
| Aspect | Image 1 | Image 2 |
|--------|---------|---------|
| [Aspect] | [Description] | [Description] |

### Overall Assessment
[Summary of comparison]
```
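
Once each image's elements have been listed, recording additions, deletions, and modifications reduces to a keyed diff. This sketch assumes each image has been summarized as a mapping from element name to a short description. The function name and sample data are hypothetical.

```python
def diff_elements(before: dict[str, str], after: dict[str, str]) -> dict[str, list]:
    """Classify elements as added, removed, or modified between two images."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    modified = sorted(
        (name, before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    image_1 = {"Logo": "top-left, blue", "Sidebar": "left, 5 links", "Footer": "gray"}
    image_2 = {"Logo": "top-left, blue", "Search bar": "header", "Footer": "dark gray"}
    print(diff_elements(image_1, image_2))
    # → {'added': ['Search bar'], 'removed': ['Sidebar'], 'modified': [('Footer', 'gray', 'dark gray')]}
```

Each `modified` tuple maps directly onto a row of the Differences table (Aspect, Image 1, Image 2).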

5. Detailed Image Description

Generate comprehensive image descriptions:

Instructions:

Describe the overall scene or subject
Identify and describe all visible elements
Note spatial relationships and composition
Describe colors, lighting, and atmosphere
Mention text, logos, or symbols if present
Consider accessibility (generate alt-text compatible descriptions)

Output Format:

Natural language description (paragraph form)
Structured element list (bulleted)
Technical details (dimensions, format, quality notes)

6. Visual Question Answering

Answer specific questions about image content:

Instructions:

Carefully read the user's question
Examine the relevant areas of the image
Provide accurate, specific answers
Reference visual evidence when answering
Acknowledge uncertainty if details are unclear

Best Practices:

Be precise and factual
Avoid assumptions beyond what's visible
Describe what you see, not what you infer (unless asked)
Use spatial language (top-left, center, background, etc.)
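
When an answer refers to a location, mapping positions onto a fixed vocabulary keeps the wording consistent. This minimal sketch assumes the position is known as pixel coordinates and divides the image into a 3x3 grid. The grid and its region names are illustrative choices, not part of the skill.

```python
# Region names for a 3x3 grid, indexed as GRID[row][column].
GRID = [
    ["top-left", "top-center", "top-right"],
    ["center-left", "center", "center-right"],
    ["bottom-left", "bottom-center", "bottom-right"],
]


def describe_position(x: float, y: float, width: float, height: float) -> str:
    """Name the grid region containing (x, y); origin is top-left, y grows downward."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if not (0 <= x <= width and 0 <= y <= height):
        raise ValueError("point lies outside the image")
    # Scale each coordinate to 0..2; min() keeps points on the right/bottom edge in range.
    column = min(int(3 * x / width), 2)
    row = min(int(3 * y / height), 2)
    return GRID[row][column]


if __name__ == "__main__":
    # Center of a bounding box spanning x 820-980, y 40-90 in a 1000x600 screenshot.
    print(describe_position((820 + 980) / 2, (40 + 90) / 2, 1000, 600))  # top-right
```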

7. UI/UX and Design Analysis

Analyze user interfaces and design elements:

Instructions:

Identify UI components (buttons, forms, navigation)
Assess layout and visual hierarchy
Note design patterns and conventions
Evaluate accessibility considerations
Compare against design best practices
Extract color schemes and typography

Output Format:

```markdown
## UI/UX Analysis

Component Inventory:
- [List of UI elements]

Layout Assessment:
- [Layout description and grid analysis]

Design Patterns:
- [Identified patterns]

Accessibility Notes:
- [Contrast, readability, touch targets]

Recommendations:
- [Improvement suggestions]
```

8. Document and Form Processing

Extract structured data from forms, receipts, and documents:

Instructions:

Identify document type and structure
Extract field names and values
Organize data into structured format (JSON, CSV, tables)
Handle multi-column layouts
Preserve data relationships and hierarchies

Output Format:

```json
{
  "document_type": "invoice",
  "fields": {
    "invoice_number": "value",
    "date": "value",
    "total": "value"
  },
  "line_items": [...]
}
```
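
Extracted form data is easier to trust when it is checked mechanically before being saved. This sketch parses a JSON result shaped like the template above and checks that the line items add up to the total. The template leaves `line_items` unspecified, so this sketch assumes each item carries an `amount` field and that monetary values are decimal strings or numbers. Both the field name and the `validate_invoice` helper are hypothetical.

```python
import json
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("invoice_number", "date", "total")


def validate_invoice(raw: str) -> list[str]:
    """Return a list of problems found in an extracted invoice (empty if none)."""
    doc = json.loads(raw)
    fields = doc.get("fields", {})
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    items = doc.get("line_items", [])
    try:
        total = Decimal(str(fields.get("total", "")))
        # Decimal avoids binary floating-point rounding when summing currency.
        items_sum = sum((Decimal(str(item["amount"])) for item in items), Decimal("0"))
    except (InvalidOperation, KeyError, TypeError):
        problems.append("total or line item amounts are not valid numbers")
        return problems
    if items and items_sum != total:
        problems.append(f"line items sum to {items_sum}, but total is {total}")
    return problems
```

A non-empty result is a cue to re-examine the relevant region of the image, not to silently "fix" the numbers.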

Workflow and Best Practices

Standard Vision Processing Workflow

1. Load the Image(s)
   - Use the Read tool to access image files
   - Supported formats: PNG, JPG, JPEG, GIF, WebP, PDF (single page)
2. Understand the Request
   - Identify the specific task (OCR, classification, analysis, etc.)
   - Note any special requirements or focus areas
3. Analyze the Visual Content
   - Apply Claude's vision capabilities to examine the image
   - Extract relevant information based on the task
4. Structure the Output
   - Format results according to the task type
   - Use markdown for readability
   - Include confidence indicators where appropriate
5. Validate and Refine
   - Check for completeness
   - Verify accuracy of extracted data
   - Provide follow-up options if needed
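
A quick signature check before loading catches the common "Cannot read image" case, where a file's extension does not match its contents. This minimal sketch assumes only the formats listed above need to be recognized. It compares a file's leading magic bytes against the standard signature for each format. JPG and JPEG are the same format, so both report "JPEG". The function names are hypothetical.

```python
from typing import Optional

# Leading "magic bytes" for the supported formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
]


def detect_format(header: bytes) -> Optional[str]:
    """Identify a supported format from a file's first bytes, or return None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # WebP is a RIFF container: b"RIFF", a 4-byte size, then b"WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    return None


def detect_file_format(path: str) -> Optional[str]:
    """Read just enough of the file to check its signature."""
    with open(path, "rb") as f:
        return detect_format(f.read(12))
```

If the result is `None`, or disagrees with the file extension, report the file as unreadable rather than guess at its contents.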

Quality Guidelines

Accuracy First: Prioritize correct information over comprehensive coverage
Structured Output: Use consistent formatting for similar tasks
Confidence Indicators: Note when details are unclear or ambiguous
Context Awareness: Consider the user's domain and use case
Accessibility: Generate descriptions suitable for screen readers when appropriate

Limitations and Considerations

Image Quality: Low resolution or blurry images may reduce accuracy
Supported Formats: Primarily raster images; vector graphics may need conversion
Privacy: Be cautious with sensitive information (PII, credentials, etc.)
Complex Diagrams: Highly technical diagrams may require domain expertise clarification
Real-Time Data: Cannot access live data or external resources not in the image

Advanced Features

Batch Processing

For multiple images:

```markdown
Processing images in batch:
1. [Image1.png] - [Task result]
2. [Image2.png] - [Task result]
3. [Image3.png] - [Task result]

Summary: [Overall findings]
```
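
A small driver can produce the batch layout above by filtering a folder for supported extensions and numbering the results. In this sketch, `analyze` is a hypothetical stand-in for whatever per-image task is being run. In practice, each image is loaded with the Read tool and analyzed. The extension set covers the image formats named in the workflow section; PDF is left out here.

```python
from pathlib import Path
from typing import Callable, Iterable

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def select_images(paths: Iterable[Path]) -> list[Path]:
    """Keep supported image files, sorted by name for a stable order."""
    return sorted((p for p in paths if p.suffix.lower() in IMAGE_EXTENSIONS),
                  key=lambda p: p.name)


def render_batch(images: list[Path], analyze: Callable[[Path], str], summary: str) -> str:
    """Produce the numbered batch report followed by a summary line."""
    lines = ["Processing images in batch:"]
    lines += [f"{i}. {p.name} - {analyze(p)}" for i, p in enumerate(images, start=1)]
    lines += ["", f"Summary: {summary}"]
    return "\n".join(lines)


if __name__ == "__main__":
    images = select_images(Path(".").iterdir())
    print(render_batch(images, lambda p: "pending analysis", f"{len(images)} image(s) queued"))
```

Sorting by name makes the numbering reproducible between runs, which helps when batch results are compared for consistency.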

Multi-Modal Context

Combine visual analysis with code, documents, or data:

Cross-reference image content with codebase files
Validate design implementations against mockups
Extract data and populate code templates
Generate code from UI screenshots

Helper Script Integration

For advanced processing, the skill includes a Python helper script at scripts/image_processor.py that handles:

Image format conversion
Metadata extraction
Batch file operations
Image preprocessing (resize, enhance, crop)

Usage:

```bash
python .claude/skills/vision/scripts/image_processor.py --operation [convert|metadata|batch] --input [path]
```
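
The helper script's source is not reproduced here. As a stdlib-only illustration of what a "metadata" operation can involve, this hypothetical sketch reads a PNG's pixel dimensions from its IHDR chunk. Per the PNG specification, that chunk immediately follows the 8-byte signature. This is not the implementation of scripts/image_processor.py.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    # Bytes 8-11 hold the chunk length and 12-15 the type "IHDR";
    # width and height are 4-byte big-endian unsigned integers at offsets 16 and 20.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def png_file_dimensions(path: str) -> tuple[int, int]:
    """Read only the first 24 bytes of the file; that is enough for IHDR."""
    with open(path, "rb") as f:
        return png_dimensions(f.read(24))
```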

Examples

Example 1: OCR from Screenshot

User Request: "Extract the text from this screenshot"

Process:

Load image with Read tool
Identify all text regions
Extract and format text
Preserve structure

Output:

```markdown
## Extracted Text

### Heading
Main paragraph text goes here...

### List Items
- Item 1
- Item 2
- Item 3

Note: Text extracted with high confidence. Minor formatting adjustments made for readability.
```

Example 2: Chart Analysis

User Request: "Analyze this sales chart and tell me the key trends"

Process:

Load chart image
Identify chart type and data
Extract values and labels
Analyze trends

Output:

```markdown
## Sales Chart Analysis

Chart Type: Line graph with 3 series (2023-2025)

Data Points:

| Month | 2023 | 2024 | 2025 |
|-------|------|------|------|
| Jan   | 120K | 145K | 180K |
| Feb   | 135K | 150K | 195K |
| ...   |      |      |      |

Key Insights:
- Consistent year-over-year growth of ~20%
- Seasonal peak in Q4 across all years
- 2025 showing accelerated growth trend

Recommendation: Prepare for continued high demand in Q4 2025
```
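
The "Verify accuracy of extracted data" step can include recomputing any summary figure from the extracted table. This sketch computes year-over-year growth for the rows shown in this example. The dictionary simply transcribes the table, with values in thousands. The two visible rows give about 21% and 11% growth into 2024, and about 24% and 30% into 2025. A single "~20%" figure should therefore be confirmed against the full set of months before it is reported.

```python
# Monthly sales in thousands, transcribed from the example table (elided months omitted).
sales = {
    "Jan": {2023: 120, 2024: 145, 2025: 180},
    "Feb": {2023: 135, 2024: 150, 2025: 195},
}


def yoy_growth(by_year: dict[int, float]) -> dict[int, float]:
    """Growth of each year relative to the previous year, as a fraction."""
    years = sorted(by_year)
    return {cur: by_year[cur] / by_year[prev] - 1 for prev, cur in zip(years, years[1:])}


for month, by_year in sales.items():
    growth = yoy_growth(by_year)
    print(month, ", ".join(f"{year}: {rate:+.1%}" for year, rate in growth.items()))
    # Jan 2024: +20.8%, 2025: +24.1%
    # Feb 2024: +11.1%, 2025: +30.0%
```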

Example 3: UI Component Comparison

User Request: "Compare these two design mockups and highlight the differences"

Process:

Load both images
Identify UI components in each
Compare layouts and elements
Document changes

Output:

```markdown
## Design Mockup Comparison

### Added Elements (Mockup 2)
- Search bar in header
- User avatar in top-right corner
- Breadcrumb navigation

### Removed Elements
- Sidebar menu (replaced with top navigation)

### Modified Elements
| Component      | Mockup 1       | Mockup 2        |
|----------------|----------------|-----------------|
| Primary Button | Blue (#007AFF) | Green (#34C759) |
| Typography     | Roboto 16px    | SF Pro 15px     |
| Card Spacing   | 16px           | 24px            |

### Layout Changes
- Shifted from sidebar to top navigation
- Increased content area width by ~30%
- Changed from 3-column to 4-column grid
```
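
The primary-button color change in this example is a natural place for the contrast check listed under Accessibility Notes. This sketch implements the WCAG 2.x contrast-ratio formula: each sRGB channel is linearized, relative luminance is computed, and the ratio is (L1 + 0.05) / (L2 + 0.05). The mockups do not give text colors, so comparing each button color against white text is an assumption.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color such as '#007AFF'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo sRGB gamma encoding. WCAG 2.x cites 0.03928 as the threshold;
    # no 8-bit channel value falls between that and 0.04045, so results match.
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio from 1.0 (identical colors) to 21.0 (black on white)."""
    lighter, darker = sorted((relative_luminance(color_a), relative_luminance(color_b)),
                             reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    for button in ("#007AFF", "#34C759"):
        ratio = contrast_ratio(button, "#FFFFFF")
        # WCAG AA asks for at least 4.5:1 for normal text and 3:1 for large text.
        print(f"{button} vs white: {ratio:.2f}:1, meets AA for normal text: {ratio >= 4.5}")
```

With these inputs, the blue comes out at about 4.0:1 and the green at about 2.2:1. White text on either button would fall short of the 4.5:1 AA threshold for normal-size text, so the change from blue to green lowers contrast further. That is the kind of finding that belongs under Recommendations.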

Integration with Claude Code

This skill works alongside other Claude Code tools:

Read Tool: Load images from the filesystem
Write Tool: Save processed results or extracted data
Bash Tool: Run helper scripts for preprocessing
Task Tool: Coordinate complex multi-image workflows

Quick Reference

| Task           | Command Pattern                   | Output Type         |
|----------------|-----------------------------------|---------------------|
| OCR            | "Extract text from [image]"       | Markdown text       |
| Classification | "Classify this image"             | Category labels     |
| Chart Analysis | "Analyze this chart"              | Data + insights     |
| Comparison     | "Compare [img1] and [img2]"       | Diff table          |
| Description    | "Describe this image"             | Paragraph           |
| Q&A            | "What [question] in this image?"  | Answer              |
| UI Analysis    | "Analyze this UI screenshot"      | Component breakdown |

Tips for Best Results

Provide Context: Mention the domain or purpose (e.g., "medical diagram," "e-commerce UI")
Be Specific: Request specific information rather than general analysis
Multiple Angles: For complex images, ask follow-up questions
File Paths: Double-check that absolute or relative paths point to the intended image files
Batch Operations: Process multiple similar images together for consistency

Support and Troubleshooting

Common Issues:

"Cannot read image" → Verify file path and format
"Low confidence extraction" → Image may be too low resolution
"Unable to detect chart data" → Chart may be too complex or stylized

Getting Better Results:

Use high-resolution images (300+ DPI for documents)
Ensure good contrast and lighting
Crop images to focus on relevant areas
Provide context about the image content
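
The 300 DPI guideline translates directly into pixel dimensions: pixels = inches x DPI. This small sketch assumes the physical page size is known. For example, a US Letter page (8.5 x 11 in) needs at least 2550 x 3300 pixels at 300 DPI. The function names are hypothetical.

```python
import math


def min_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Smallest pixel size that reaches `dpi` for a page of the given size in inches."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


def effective_dpi(pixel_width: int, width_in: float) -> float:
    """Resolution a scan actually achieves across the page width."""
    return pixel_width / width_in


if __name__ == "__main__":
    print(min_pixels(8.5, 11))               # (2550, 3300): US Letter at 300 DPI
    print(round(effective_dpi(1275, 8.5)))   # 150: a 1275-px-wide Letter scan
```

A scan whose effective DPI falls well below 300 is a likely cause of the "Low confidence extraction" issue listed above.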

License

This skill is licensed under Apache-2.0.

Version

Version: 1.0.0
Last Updated: 2025-11-18
Compatible with: Claude Code (all versions with vision support)