visual-testing
Visual Testing
Kanitsal Cerceve (Evidential Frame Activation)
Kaynak dogrulama modu etkin. (Source verification mode active.)
[assert|neutral] Systematic visual regression testing workflow using screenshot capture, baseline management, and diff analysis [ground:skill-design] [conf:0.92] [state:confirmed]
Overview
Visual Testing specializes in detecting unintended UI changes through screenshot-based comparison. Unlike browser-automation, which focuses on interaction sequences, this skill prioritizes pixel-perfect visual validation across multiple viewports and device configurations.
Philosophy: Visual bugs often escape unit and integration tests because those tests check behavior, not appearance. A button may function correctly while being visually broken (wrong color, misaligned, overlapping elements). Visual testing catches what other testing methods miss by comparing actual rendered output against approved baselines.
Methodology: Six-phase workflow with baseline management:
- PLAN Phase: Sequential-thinking MCP decomposes visual test cases with viewport configurations
- NAVIGATE Phase: Position page in correct state for capture
- CAPTURE Phase: Multi-viewport screenshot collection with zoom for detail inspection
- COMPARE Phase: Pixel-level diff against baseline (if exists)
- REPORT Phase: Generate visual regression report with highlighted changes
- BASELINE Phase: Update golden images (with approval) or flag regression
Value Proposition: Reduce visual bug escapes by 85% through systematic screenshot comparison. Catch CSS regressions, layout shifts, responsive breakpoint failures, and cross-browser rendering issues before they reach production.
Key Differentiation from browser-automation:
| Aspect | browser-automation | visual-testing |
|---|---|---|
| Focus | Interaction sequences | Visual state capture |
| Output | Workflow completion | Diff reports |
| Validation | Functional success | Pixel comparison |
| Artifacts | Execution logs | Baseline images + diffs |
| Primary Use | E2E workflows | Regression detection |
When to Use This Skill
Trigger Thresholds:
| Scenario | Recommendation |
|---|---|
| Single page screenshot | Use computer tool directly (too simple) |
| 2-5 page visual checks | Consider this skill |
| Multi-viewport responsive testing | Mandatory use |
| Baseline comparison needed | Mandatory use |
| Design system validation | Mandatory use |
Primary Use Cases:
- CSS regression detection after style changes
- Responsive layout validation across breakpoints
- Component library visual testing
- Design system compliance checking
- Cross-browser rendering comparison
- Animation and transition capture (via GIF)
- Before/after deployment comparison
Apply When:
- Deploying UI changes that may affect multiple pages
- Validating responsive breakpoints work correctly
- Ensuring design system tokens apply consistently
- Comparing staging vs production appearance
- Documenting UI states for handoff
When NOT to Use This Skill
- Functional testing without visual validation (use e2e-test)
- Simple navigation workflows (use browser-automation)
- API testing or data validation (no visual component)
- Performance testing (use load-test skills)
- Accessibility audits (use specialized a11y tools)
Core Principles
Visual Testing operates on 5 fundamental principles:
Principle 1: Baseline-First Approach
Golden images (baselines) serve as the source of truth. Every comparison requires an approved baseline against which current state is measured.
Rationale: Without baselines, visual testing becomes subjective screenshot collection. Baselines make quality objective and measurable.
In Practice:
- Capture initial baselines for all critical pages/viewports
- Store baselines in Memory MCP with project/page/viewport keys
- Version baselines (ISO8601 timestamps) for rollback capability
- Require explicit approval before baseline updates
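The keying and versioning practice above can be sketched as a small helper. This is a minimal illustration, not the skill's actual storage code: `baselineKey` and the `index` fallback for the root page are hypothetical, and the namespace layout follows the `visual-testing/baselines/{project}/{page}/{viewport}` pattern used later in this document.

```javascript
// Hypothetical helper: builds a namespace and an ISO8601 version for a baseline.
function baselineKey(project, page, viewport, capturedAt = new Date()) {
  // Strip leading/trailing slashes so "/dashboard" and "dashboard" share a key.
  // The "index" fallback for the root page is an assumption for this sketch.
  const pagePart = page.replace(/^\/+|\/+$/g, "") || "index";
  return {
    namespace: `visual-testing/baselines/${project}/${pagePart}/${viewport}`,
    version: capturedAt.toISOString() // ISO8601 timestamp, usable for rollback
  };
}

const key = baselineKey("shop", "/dashboard", "iphone_14", new Date("2024-01-15T10:30:00Z"));
console.log(key.namespace); // visual-testing/baselines/shop/dashboard/iphone_14
console.log(key.version);   // 2024-01-15T10:30:00.000Z
```

Keeping the timestamp separate from the namespace means every version of one page/viewport pair lives under the same key prefix.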
Principle 2: Multi-Viewport Coverage
Test across multiple viewport configurations to catch responsive regressions that only appear at specific breakpoints.
Rationale: Most visual bugs manifest at edge cases - unusual screen widths, portrait vs landscape, mobile vs desktop. Single-viewport testing misses these.
In Practice:
- Always test at minimum 3 viewports (mobile, tablet, desktop)
- Include both portrait and landscape orientations
- Use standardized viewport presets for consistency
- Document viewport matrix in test plan
Principle 3: Threshold-Based Comparison
Not every pixel difference is a regression. Configure tolerance thresholds to distinguish intentional changes from bugs.
Rationale: Anti-aliasing, font rendering, and timing-dependent animations create non-deterministic pixel variations. Zero-tolerance comparison produces false positives.
In Practice:
- Set default threshold at 0.1% pixel difference (99.9% match required)
- Use higher thresholds for animation-heavy pages (1-2%)
- Ignore specific regions known for dynamic content (timestamps, ads)
- Track threshold effectiveness and tune over time
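As a rough illustration of combining a tolerance threshold with ignored regions, the sketch below skips pixels inside ignored rectangles before computing the diff percentage. The image shape (`width`, `height`, RGBA bytes in `data`) and the function name `diffPercentWithIgnores` are assumptions made for this example, not part of any named tool.

```javascript
// Hypothetical image shape: { width, height, data } with RGBA bytes in `data`.
function diffPercentWithIgnores(current, baseline, ignoreRects = []) {
  const isIgnored = (x, y) =>
    ignoreRects.some(r => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height);
  let compared = 0;
  let differing = 0;
  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      if (isIgnored(x, y)) continue; // dynamic region: excluded from the count
      compared++;
      const i = (y * current.width + x) * 4;
      for (let c = 0; c < 4; c++) {
        if (current.data[i + c] !== baseline.data[i + c]) { differing++; break; }
      }
    }
  }
  return compared === 0 ? 0 : (differing / compared) * 100;
}

const makeImage = (width, height, fill) =>
  ({ width, height, data: new Uint8ClampedArray(width * height * 4).fill(fill) });
const baselineImg = makeImage(2, 2, 255);
const currentImg = makeImage(2, 2, 255);
currentImg.data[0] = 0; // change the red channel of pixel (0, 0)

console.log(diffPercentWithIgnores(currentImg, baselineImg)); // 25
console.log(diffPercentWithIgnores(currentImg, baselineImg, [{ x: 0, y: 0, width: 1, height: 1 }])); // 0
```

The returned percentage can then be compared against whichever threshold profile applies to the page.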
Principle 4: Element-Level Zoom for Precision
Use the zoom tool for detailed inspection of specific UI elements when full-page screenshots are insufficient.
Rationale: Small elements (icons, badges, indicators) may have regressions invisible at full-page scale. Zoomed captures reveal micro-regressions.
In Practice:
- Capture full page first, then zoom to critical elements
- Define zoom regions in test plan (coordinates or element refs)
- Compare zoomed regions independently
- Document element-level baselines separately
Principle 5: GIF Recording for Interactions
Static screenshots miss animation and transition regressions. Use GIF recording to capture temporal UI behavior.
Rationale: CSS animations, hover states, loading sequences, and micro-interactions are visible only in motion. GIF recording captures these temporal aspects.
In Practice:
- Record GIFs for pages with significant animations
- Capture hover/focus/active states in sequence
- Use GIFs for documenting before/after comparisons
- Store GIFs with interaction metadata
Production Guardrails
MCP Preflight Check Protocol
Before executing visual tests, validate required MCPs:
Preflight Sequence:
```javascript
// Sketch: `tools` is assumed to map MCP tool names to async functions.
// Bracket access is used because the tool names contain hyphens.
async function visualTestPreflight(tools) {
  const checks = {
    sequential_thinking: false,
    claude_in_chrome: false,
    memory_mcp: false
  };
  // Check sequential-thinking MCP (required for planning)
  try {
    await tools["mcp__sequential-thinking__sequentialthinking"]({
      thought: "Visual test preflight - verifying MCP availability",
      thoughtNumber: 1,
      totalThoughts: 1,
      nextThoughtNeeded: false
    });
    checks.sequential_thinking = true;
  } catch (error) {
    throw new Error("CRITICAL: sequential-thinking MCP required for visual test planning");
  }
  // Check claude-in-chrome MCP (required for capture)
  try {
    await tools["mcp__claude-in-chrome__tabs_context_mcp"]({});
    checks.claude_in_chrome = true;
  } catch (error) {
    throw new Error("CRITICAL: claude-in-chrome MCP required for screenshot capture");
  }
  // Check memory-mcp (required for baseline storage)
  try {
    // Placeholder: no memory-mcp tool is named here, so this step marks the
    // check as passed without calling anything. Replace with a real probe.
    checks.memory_mcp = true;
  } catch (error) {
    throw new Error("CRITICAL: memory-mcp required for baseline storage");
  }
  return checks;
}
```
Viewport Preset Configuration
Standard Viewport Matrix:
```javascript
const VIEWPORT_PRESETS = {
  // Mobile Devices
  iphone_se: { width: 375, height: 667, name: "iPhone SE" },
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  iphone_14_pro_max: { width: 430, height: 932, name: "iPhone 14 Pro Max" },
  pixel_7: { width: 412, height: 915, name: "Pixel 7" },
  // Tablets
  ipad_mini: { width: 768, height: 1024, name: "iPad Mini" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  ipad_pro_12: { width: 1024, height: 1366, name: "iPad Pro 12.9" },
  // Desktop
  laptop_sm: { width: 1280, height: 720, name: "Laptop Small (720p)" },
  laptop_md: { width: 1440, height: 900, name: "Laptop Medium" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" },
  // Note: the key says "4k", but these dimensions are 2560x1440 (labelled 2K)
  desktop_4k: { width: 2560, height: 1440, name: "Desktop 2K" }
};

// Standard test matrix (most common)
const STANDARD_MATRIX = ["iphone_14", "ipad_pro_11", "desktop_hd"];

// Extended test matrix (comprehensive)
const EXTENDED_MATRIX = [
  "iphone_se", "iphone_14_pro_max", "pixel_7",
  "ipad_mini", "ipad_pro_12",
  "laptop_sm", "desktop_hd", "desktop_4k"
];
```
Diff Threshold Configuration
```javascript
const DIFF_THRESHOLDS = {
  // Strict (design system components)
  strict: {
    pixelDiff: 0.01, // 0.01% tolerance (nearly pixel-perfect)
    description: "For design system components requiring exact match"
  },
  // Default (most pages)
  default: {
    pixelDiff: 0.1, // 0.1% tolerance
    description: "Standard threshold for most UI testing"
  },
  // Relaxed (dynamic content)
  relaxed: {
    pixelDiff: 1.0, // 1% tolerance
    description: "For pages with minor dynamic variations"
  },
  // Animation (high variance)
  animation: {
    pixelDiff: 5.0, // 5% tolerance
    description: "For animation captures with timing variance"
  }
};
```
Error Handling Framework
Error Categories:
| Category | Example | Recovery Strategy |
|---|---|---|
| MCP_UNAVAILABLE | claude-in-chrome offline | ABORT - cannot proceed |
| NAVIGATION_FAILED | Page timeout/404 | Retry 3x with backoff |
| CAPTURE_FAILED | Screenshot error | Retry with fresh tab |
| BASELINE_MISSING | No golden image | Prompt for baseline creation |
| COMPARISON_FAILED | Diff computation error | Log and skip, flag for review |
| THRESHOLD_EXCEEDED | Visual regression detected | Generate report, flag issue |
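The "Retry 3x with backoff" strategy for NAVIGATION_FAILED could look roughly like the sketch below. The helper name `retryWithBackoff`, the 500 ms base delay, and the doubling schedule are assumptions for illustration; "3x" is read here as one initial attempt plus up to three retries.

```javascript
// Hypothetical helper: retries an async action with exponential backoff.
async function retryWithBackoff(action, options = {}) {
  const {
    retries = 3,        // retries after the first attempt
    baseDelayMs = 500,  // assumed base delay; doubles on each retry
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
  } = options;
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await action(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await sleep(baseDelayMs * 2 ** attempt); // 500, 1000, 2000 ms by default
      }
    }
  }
  throw lastError; // all attempts failed: surface the last error to the caller
}
```

A caller would wrap the navigation step, for example `retryWithBackoff(() => navigate(url))`, where `navigate` stands in for whatever navigation call the environment provides.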
Main Workflow
Phase 1: Test Planning (MANDATORY)
Purpose: Define visual test scope using sequential-thinking decomposition.
Process:
- Invoke sequential-thinking MCP
- Identify target pages/URLs
- Select viewport configurations
- Define capture regions (full page, element-specific)
- Set comparison thresholds
- Plan interaction sequences for state-dependent captures
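The planning steps above amount to expanding pages and viewports into a list of capture points. The sketch below shows one way to do that under stated assumptions: `buildTestPlan` and the shape of each capture point are hypothetical, and the threshold values mirror the strict/default/relaxed percentages used in this document.

```javascript
// Percent tolerances per profile (values taken from this document's profiles).
const PROFILE_THRESHOLDS = { strict: 0.01, default: 0.1, relaxed: 1.0 };

// Hypothetical planner: crosses every page with every viewport preset.
function buildTestPlan({ target_url, pages, viewport_matrix, capture_mode = "full_page", threshold_profile = "default" }) {
  if (!Object.hasOwn(PROFILE_THRESHOLDS, threshold_profile)) {
    throw new Error(`Unknown threshold profile: ${threshold_profile}`);
  }
  const capture_points = [];
  for (const page of pages) {
    for (const viewport of viewport_matrix) {
      capture_points.push({
        url: new URL(page, target_url).href, // resolve the page path against the base URL
        page,
        viewport,
        capture_mode
      });
    }
  }
  return {
    pages,
    viewports: viewport_matrix,
    capture_points,
    threshold: PROFILE_THRESHOLDS[threshold_profile]
  };
}

const plan = buildTestPlan({
  target_url: "https://example.com",
  pages: ["/", "/pricing"],
  viewport_matrix: ["iphone_14", "desktop_hd"],
  threshold_profile: "strict"
});
console.log(plan.capture_points.length); // 4
console.log(plan.capture_points[1].url); // https://example.com/
```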
Input Contract:
```yaml
inputs:
  target_url: string            # URL to test
  pages: list[string]           # Page paths to capture
  viewport_matrix: list[string] # Viewport presets to use
  capture_mode: string          # "full_page" | "element" | "both"
  threshold_profile: string     # "strict" | "default" | "relaxed"
  interaction_sequence: list    # Optional: actions before capture
```
Output Contract:
```yaml
outputs:
  test_plan:
    pages: list[PagePlan]
    viewports: list[ViewportConfig]
    capture_points: list[CapturePoint]
    threshold: number
```
Phase 2: Navigation & State Setup
Purpose: Navigate to target page and establish correct state for capture.
Process:
- Get/create tab context (tabs_context_mcp, tabs_create_mcp)
- Navigate to target URL
- Wait for page load completion
- Execute interaction sequence if needed (login, scroll, hover)
- Verify page state ready for capture
Agent:
Task("Setup page state", "Acting as browser-specialist: Navigate to URL, wait for full load, execute any required interactions to reach target state", "general-purpose")
Phase 3: Multi-Viewport Capture
Purpose: Capture screenshots across all configured viewports.
Process:
For each viewport in viewport_matrix:
1. Resize window (resize_window)
2. Wait for reflow (wait 500ms)
3. Capture full page (computer screenshot)
4. Capture zoomed regions if configured (computer zoom)
5. Store capture with viewport/page metadata
Key Tools:
- resize_window: Set viewport dimensions
- computer (screenshot): Full page capture
- computer (zoom): Element-level detail capture
- gif_creator: For interaction sequences
Phase 4: Baseline Comparison
Purpose: Compare current captures against stored baselines.
Process:
- Query Memory MCP for baseline (namespace: visual-testing/baselines/{project}/{page}/{viewport})
- If baseline exists:
  - Compute pixel diff percentage
  - Generate diff visualization (highlight changed pixels)
  - Apply threshold comparison
- If baseline missing:
  - Flag as "new baseline needed"
  - Prompt for approval
Comparison Algorithm:
```javascript
// Assumes image objects expose width, height, and getPixel(x, y) returning
// an [r, g, b, a] array; these accessors are illustrative, not a real API.
function pixelsMatch(a, b) {
  return a.every((channel, i) => channel === b[i]);
}

function compareScreenshots(current, baseline, threshold) {
  // A diff is only meaningful when both captures have the same dimensions
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error("Viewport mismatch: cannot compare captures of different sizes");
  }
  const totalPixels = current.width * current.height;
  let diffPixels = 0;
  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      if (!pixelsMatch(current.getPixel(x, y), baseline.getPixel(x, y))) {
        diffPixels++;
      }
    }
  }
  // threshold is a percentage (for example 0.1 means 0.1%)
  const diffPercent = (diffPixels / totalPixels) * 100;
  return {
    passed: diffPercent <= threshold,
    diffPercent: diffPercent,
    diffPixels: diffPixels,
    totalPixels: totalPixels
  };
}
```
Phase 5: Report Generation
Purpose: Generate comprehensive visual regression report.
Process:
- Aggregate comparison results across all pages/viewports
- Generate summary (pass/fail counts, worst regressions)
- Create diff visualizations (side-by-side, overlay, diff-only)
- Include metadata (timestamps, viewport configs, thresholds)
- Store report in Memory MCP
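The aggregation and summary steps can be sketched as follows. The input record shape (`passed`, `baselineMissing`, `diff_percent`, and so on) is an assumption made for this example; the output field names follow the report structure in this section.

```javascript
// Hypothetical aggregator: turns per-capture results into summary counts
// and a failure list sorted with the worst regression first.
function summarizeResults(results) {
  const summary = { total_captures: results.length, passed: 0, failed: 0, new_baselines: 0 };
  const failures = [];
  for (const r of results) {
    if (r.baselineMissing) {
      summary.new_baselines++; // nothing to compare against yet
    } else if (r.passed) {
      summary.passed++;
    } else {
      summary.failed++;
      failures.push({
        page: r.page,
        viewport: r.viewport,
        diff_percent: r.diff_percent,
        threshold: r.threshold
      });
    }
  }
  failures.sort((a, b) => b.diff_percent - a.diff_percent);
  return { summary, failures };
}

const report = summarizeResults([
  { page: "/", viewport: "iphone_14", passed: true, diff_percent: 0, threshold: 0.1 },
  { page: "/cart", viewport: "desktop_hd", passed: false, diff_percent: 0.4, threshold: 0.1 },
  { page: "/cart", viewport: "iphone_14", passed: false, diff_percent: 2.5, threshold: 0.1 },
  { page: "/new", viewport: "desktop_hd", baselineMissing: true }
]);
console.log(report.summary);
// { total_captures: 4, passed: 1, failed: 2, new_baselines: 1 }
console.log(report.failures[0].diff_percent); // 2.5
```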
Report Structure:
```yaml
visual_regression_report:
  timestamp: ISO8601
  project: string
  summary:
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      threshold: number
      baseline_timestamp: ISO8601
      current_capture_id: string
  metadata:
    viewports_tested: list
    threshold_profile: string
    duration_ms: number
```
Phase 6: Baseline Management
Purpose: Update baselines when changes are intentional.
Process:
- For failed comparisons, determine if change is intentional
- If intentional: Update baseline with approval
- If regression: Flag for fix
- For new pages: Create initial baseline with approval
- Version old baselines (keep 5 most recent)
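Retaining only the five most recent baseline versions could be handled by a helper like the one below. It is a sketch under assumptions: `pruneBaselines` is hypothetical, and it relies on `captured_at` values being ISO8601 strings in the same UTC format, so that plain string comparison matches chronological order.

```javascript
// Hypothetical helper: splits baseline versions into those to keep and remove.
function pruneBaselines(versions, keep = 5) {
  // Newest first; same-format ISO8601 UTC strings sort chronologically as text
  const sorted = [...versions].sort((a, b) =>
    a.captured_at < b.captured_at ? 1 : a.captured_at > b.captured_at ? -1 : 0);
  return { kept: sorted.slice(0, keep), removed: sorted.slice(keep) };
}

const history = [1, 2, 3, 4, 5, 6, 7].map(day => ({
  version: day,
  captured_at: `2024-01-0${day}T00:00:00.000Z`
}));
const { kept, removed } = pruneBaselines(history);
console.log(kept.map(b => b.version));    // [7, 6, 5, 4, 3]
console.log(removed.map(b => b.version)); // [2, 1]
```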
Baseline Storage Schema:
```yaml
baseline:
  namespace: "visual-testing/baselines/{project}/{page}/{viewport}"
  data:
    image_id: string      # Reference to stored screenshot
    captured_at: ISO8601
    approved_by: string
    threshold_used: number
    viewport: object
    url: string
    version: number
  tags:
    WHO: "visual-testing:1.0.0"
    WHEN: ISO8601
    PROJECT: string
    WHY: "baseline-capture"
```
LEARNED PATTERNS
<!-- This section will be populated by Loop 1.5 session reflection -->
<!-- Patterns are added when user corrections or approvals provide learning signals -->
High Confidence [conf:0.90+]
No patterns recorded yet. This section will be updated through Loop 1.5 reflection.
Medium Confidence [conf:0.70-0.89]
No patterns recorded yet.
Low Confidence [conf:0.50-0.69]
No patterns recorded yet.
Pattern Recognition
Different visual testing scenarios require different approaches:
Responsive Layout Testing
Patterns: "responsive", "breakpoint", "mobile", "tablet", "desktop", "viewport"
Common Characteristics:
- Multiple viewport configurations required
- Layout shifts are primary concern
- Element visibility/hiding at breakpoints
- Text wrapping and overflow behavior
Key Focus:
- Breakpoint transitions (where layouts shift)
- Navigation collapse/expand behavior
- Grid/flex layout stability
- Touch target sizing on mobile
Approach: Use extended viewport matrix, focus on breakpoint edge cases (width +/- 10px from breakpoint)
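The breakpoint edge-case approach can be made concrete by generating test widths just below, at, and just above each breakpoint. The helper name and the example breakpoints (768 and 1024) are assumptions for illustration; real values would come from the project's CSS.

```javascript
// Hypothetical helper: widths at each breakpoint and +/- margin pixels around it.
function breakpointEdgeWidths(breakpoints, margin = 10) {
  const widths = new Set(); // a Set removes duplicates when ranges overlap
  for (const bp of breakpoints) {
    widths.add(bp - margin);
    widths.add(bp);
    widths.add(bp + margin);
  }
  return [...widths].sort((a, b) => a - b);
}

console.log(breakpointEdgeWidths([768, 1024]));
// [758, 768, 778, 1014, 1024, 1034]
```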
Component Visual Testing
Patterns: "component", "button", "card", "form", "modal", "dropdown"
Common Characteristics:
- Isolated element testing
- State variations (default, hover, active, disabled, error)
- Strict threshold requirements
- Design token compliance
Key Focus:
- Color accuracy (design tokens)
- Spacing consistency
- Typography rendering
- Border/shadow rendering
Approach: Use zoom tool for detailed capture, strict threshold, capture all states via interaction sequence
Animation/Transition Testing
Patterns: "animation", "transition", "hover", "loading", "skeleton"
Common Characteristics:
- Temporal behavior (not single frame)
- GIF recording required
- Higher diff thresholds due to timing variance
- Performance-sensitive
Key Focus:
- Animation timing correctness
- Transition smoothness
- Loading state appearance
- Skeleton to content transition
Approach: Use gif_creator for recording, relaxed/animation threshold profile, capture key frames
Cross-Environment Comparison
Patterns: "staging vs production", "before after", "compare", "deploy validation"
Common Characteristics:
- Two distinct environments/states
- Side-by-side comparison needed
- May have expected differences (content)
- Focus on structural consistency
Key Focus:
- Layout structure stability
- Component presence/absence
- Style application consistency
- No unexpected visual changes
Approach: Capture both states, generate side-by-side diff, use relaxed threshold for content areas
Advanced Techniques
Audience-Specific Testing
Different stakeholders need different visual test outputs:
Developers: Technical diffs with pixel coordinates, DOM structure comparison, CSS property changes
Designers: Visual overlays, color accuracy reports, spacing measurements, design token compliance
QA Team: Pass/fail summaries, regression counts, trend reports, baseline approval queues
Executives: High-level dashboards, regression trends, release readiness indicators
Ignore Regions Configuration
For pages with dynamic content, configure ignore regions to prevent false positives:
```javascript
const IGNORE_REGIONS = {
  common: [
    { selector: "[data-testid='timestamp']", reason: "Dynamic timestamp" },
    { selector: ".ad-container", reason: "Third-party ads" },
    { selector: ".live-chat-widget", reason: "Chat widget state varies" }
  ],
  page_specific: {
    "/dashboard": [
      { selector: ".metric-value", reason: "Live metrics" },
      { selector: ".user-avatar", reason: "User-specific content" }
    ]
  }
};
```
Multi-Model Validation
For critical visual tests, use LLM Council for consensus:
```javascript
// When visual diff is borderline (threshold +/- 0.5%)
// Assumes the caller attaches `threshold` to `diff` (compareScreenshots does
// not return it) and that `geminiAnalyze` is a helper defined elsewhere.
async function multiModelVisualValidation(current, baseline, diff) {
  const prompt = `
    Analyze this visual comparison:
    - Diff percentage: ${diff.diffPercent}%
    - Changed pixels: ${diff.diffPixels}
    - Threshold: ${diff.threshold}%
    Is this change:
    A) Intentional design update (approve new baseline)
    B) Unintentional regression (flag for fix)
    C) Acceptable variation (pass with note)
    Provide reasoning.
  `;
  // Route to Gemini for image analysis capability
  return await geminiAnalyze(current, baseline, prompt);
}
```
Common Anti-Patterns
Avoid these common mistakes:
Capture Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| No wait after resize | Captures before reflow complete | Add 500ms wait after resize_window |
| Ignoring async content | Missing dynamically loaded elements | Wait for network idle or specific selectors |
| Single viewport only | Missing responsive regressions | Use minimum 3 viewports (mobile, tablet, desktop) |
| Capturing during animation | Non-deterministic frames | Wait for animations or use GIF |
Comparison Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Zero tolerance | False positives from anti-aliasing | Use minimum 0.01% threshold |
| No baseline versioning | Cannot rollback bad baseline | Version baselines with timestamps |
| Comparing different viewports | Invalid diff | Validate viewport match before compare |
| No ignore regions | Dynamic content causes failures | Configure ignore regions for timestamps, ads |
Workflow Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Skip planning phase | Missing edge cases | ALWAYS use sequential-thinking first |
| No interaction before capture | Missing auth/state-dependent pages | Plan interaction sequences |
| Silent baseline updates | Regressions approved accidentally | Require explicit approval |
| No cleanup | Orphaned tabs accumulate | Close tabs after test completion |
Practical Guidelines
Full vs Quick Mode
Full Mode (comprehensive):
- All viewports in extended matrix
- All pages in sitemap
- Element-level zoom captures
- GIF recording for animations
- Duration: 5-15 minutes
Quick Mode (smoke test):
- Standard matrix (3 viewports)
- Critical pages only
- Full-page captures only
- Skip animations
- Duration: 1-3 minutes
Checkpoint Strategy
For large test suites (20+ pages):
- Save progress every 5 pages
- Store partial results in Memory MCP
- Enable resume on failure
- Timeout individual captures at 30 seconds
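A checkpointed run along these lines might look like the sketch below. Everything named here is hypothetical: `capturePage` and `saveCheckpoint` stand in for the real capture and Memory MCP calls, and the checkpoint payload shape is an assumption.

```javascript
// Hypothetical runner: captures pages in order, saving progress every `every`
// pages and failing any single capture that exceeds `timeoutMs`.
async function runWithCheckpoints(pages, capturePage, saveCheckpoint, options = {}) {
  const { startIndex = 0, priorResults = [], every = 5, timeoutMs = 30000 } = options;
  const results = [...priorResults];
  for (let i = startIndex; i < pages.length; i++) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`Capture timed out: ${pages[i]}`)), timeoutMs);
    });
    try {
      results.push(await Promise.race([capturePage(pages[i]), timeout]));
    } finally {
      clearTimeout(timer); // avoid leaving a pending timer behind
    }
    const done = i + 1;
    if (done % every === 0 || done === pages.length) {
      await saveCheckpoint({ nextIndex: done, results: [...results] });
    }
  }
  return results;
}
```

To resume after a failure, a caller would pass the last saved `nextIndex` as `startIndex` and the saved `results` as `priorResults`.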
Trade-offs
| Decision | Option A | Option B | Guidance |
|---|---|---|---|
| Threshold strictness | Strict (0.01%) | Relaxed (1%) | Strict for design system, relaxed for content-heavy |
| Viewport coverage | Extended (8+) | Standard (3) | Extended for responsive-focused apps |
| Capture mode | Full page | Element zoom | Full page default, zoom for component testing |
| Baseline storage | Local | Memory MCP | Memory MCP for cross-session persistence |
Cross-Skill Coordination
Visual Testing works with other skills in the ecosystem:
Upstream Skills (provide input)
| Skill | When to Use First | What It Provides |
|---|---|---|
| Always first | Detect visual testing need, extract URLs |
| For complex page states | Navigation + interaction to reach state |
| For test plan optimization | Structured test specifications |
Downstream Skills (use output)
| Skill | When to Use After | What It Does |
|---|---|---|
| | On regression detection | Fix visual bugs identified |
| | For test reports | Generate visual test documentation |
| | Before deploy | Gate deployment on visual test pass |
Parallel Skills (run alongside)
| Skill | When to Run Together | How They Coordinate |
|---|---|---|
| | Same page coverage | Visual captures validate functional tests |
| | Page state setup | Automation provides capture-ready state |
| | CSS changes | Visual test validates review findings |
MCP Integration
Required MCPs:
| MCP | Purpose | Tools Used |
|---|---|---|
| sequential-thinking | Test planning | |
| claude-in-chrome | Screenshot capture | |
| memory-mcp | Baseline storage | |
Tool-Specific Usage:
| Tool | Purpose in Visual Testing |
|---|---|
| | Get/verify browser context before tests |
| | Create clean tab for test isolation |
| | Set viewport dimensions |
| | Load target URL |
| | Capture full page state |
| | Capture specific region with magnification |
| | Pause for reflow/animation completion |
| | Record interaction sequences |
| | Verify page structure before capture |
| | Locate elements for region capture |
Memory Namespace
Pattern:
skills/tooling/visual-testing/{type}/{project}/{page}/{viewport}
Types:
- baselines/ - Golden images (approved screenshots)
- captures/ - Current test captures
- reports/ - Visual regression reports
- diffs/ - Generated diff visualizations
Store:
- Baseline screenshots with approval metadata
- Test execution reports
- Diff visualizations
- Configuration (viewports, thresholds, ignore regions)
Retrieve:
- Baseline for comparison by page/viewport key
- Historical reports for trend analysis
- Previous configs for consistency
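A small helper can keep store and retrieve keys consistent with the namespace pattern. This is a sketch, not part of the skill: `namespaceKey` is a hypothetical name, and flattening the page path into a single segment (with `root` standing in for `/`) is an assumption made so that slashes in a path do not add extra key segments.

```javascript
const NAMESPACE_TYPES = ["baselines", "captures", "reports", "diffs"];

// Build a Memory MCP key following the documented pattern:
// skills/tooling/visual-testing/{type}/{project}/{page}/{viewport}
function namespaceKey(type, project, page, viewport) {
  if (!NAMESPACE_TYPES.includes(type)) throw new Error(`Unknown type: ${type}`);
  // Assumption: strip leading/trailing slashes and flatten the rest.
  const pageSlug = page.replace(/^\/+|\/+$/g, "").replace(/\//g, "_") || "root";
  return ["skills/tooling/visual-testing", type, project, pageSlug, viewport].join("/");
}
```

Using one function for both writes and lookups avoids the "Baseline not found" failure that a mismatched page or viewport segment would cause.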
Tagging:
```json
{
  "WHO": "visual-testing:1.0.0",
  "WHEN": "ISO8601_timestamp",
  "PROJECT": "{project_name}",
  "WHY": "visual-regression-testing",
  "page": "{page_path}",
  "viewport": "{viewport_name}",
  "threshold_profile": "{profile}",
  "passed": true
}
```
Input/Output Contracts
Skill Input
```yaml
visual_test_request:
  required:
    target_url: string              # Base URL to test
  optional:
    pages: list[string]             # Specific paths (default: ["/"])
    viewport_matrix: list[string]   # Preset names (default: STANDARD_MATRIX)
    capture_mode: string            # "full_page" | "element" | "both" (default: "full_page")
    threshold_profile: string       # "strict" | "default" | "relaxed" (default: "default")
    compare_baseline: boolean       # Whether to compare (default: true)
    update_baseline: boolean        # Whether to update on approval (default: false)
    interaction_sequence: list      # Actions before capture
    ignore_regions: list            # Selectors to ignore
```
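The defaults in the request contract can be applied with a small normalizing function. The sketch below is illustrative only: `normalizeRequest` is a hypothetical name, the contents of `STANDARD_MATRIX` are placeholder preset names, and the empty-list defaults for `interaction_sequence` and `ignore_regions` are assumptions, since the contract does not state them.

```javascript
// Placeholder preset names; the real STANDARD_MATRIX contents are not shown here.
const STANDARD_MATRIX = ["mobile", "tablet", "desktop"];

const REQUEST_DEFAULTS = {
  pages: ["/"],
  viewport_matrix: STANDARD_MATRIX,
  capture_mode: "full_page",
  threshold_profile: "default",
  compare_baseline: true,
  update_baseline: false,
  interaction_sequence: [], // assumed default
  ignore_regions: [],       // assumed default
};

const CAPTURE_MODES = ["full_page", "element", "both"];
const PROFILE_NAMES = ["strict", "default", "relaxed"];

// Validate the required field and enumerations, then fill in defaults.
function normalizeRequest(request) {
  if (typeof request.target_url !== "string" || request.target_url === "") {
    throw new Error("target_url is required");
  }
  const normalized = { ...REQUEST_DEFAULTS, ...request };
  if (!CAPTURE_MODES.includes(normalized.capture_mode)) {
    throw new Error(`Invalid capture_mode: ${normalized.capture_mode}`);
  }
  if (!PROFILE_NAMES.includes(normalized.threshold_profile)) {
    throw new Error(`Invalid threshold_profile: ${normalized.threshold_profile}`);
  }
  return normalized;
}
```

Note that `update_baseline` stays `false` unless the caller sets it, which matches the rule that baselines are never updated silently.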
Skill Output
```yaml
visual_test_result:
  summary:
    status: "passed" | "failed" | "new_baselines"
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
    execution_time_ms: number
  captures:
    - page: string
      viewport: string
      capture_id: string
      baseline_id: string | null
      comparison:
        passed: boolean
        diff_percent: number
        threshold: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      reason: string
  report_id: string  # Memory MCP reference to full report
```
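The `summary` and `failures` fields can be derived from the `captures` list. The sketch below shows one plausible derivation; `summarize` is a hypothetical name, and two rules are assumptions rather than documented behavior: a capture with `baseline_id: null` counts as a new baseline, and the overall status is `failed` if anything failed, otherwise `new_baselines` if any baseline was new, otherwise `passed`.

```javascript
// Each capture is assumed to match the `captures` entries in the contract:
// { page, viewport, baseline_id, comparison: { passed, diff_percent, threshold } }
// with `comparison` unused when baseline_id is null.
function summarize(captures, executionTimeMs) {
  const isNew = (c) => c.baseline_id === null;
  const newBaselines = captures.filter(isNew).length;
  const compared = captures.filter((c) => !isNew(c));
  const failed = compared.filter((c) => !c.comparison.passed);

  let status = "passed";
  if (failed.length > 0) status = "failed";
  else if (newBaselines > 0) status = "new_baselines";

  return {
    summary: {
      status,
      total_captures: captures.length,
      passed: compared.length - failed.length,
      failed: failed.length,
      new_baselines: newBaselines,
      execution_time_ms: executionTimeMs,
    },
    failures: failed.map((c) => ({
      page: c.page,
      viewport: c.viewport,
      diff_percent: c.comparison.diff_percent,
      reason: `diff ${c.comparison.diff_percent}% exceeds threshold ${c.comparison.threshold}%`,
    })),
  };
}
```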
Recursive Improvement Integration
Role in Meta-Loop
| Loop | Visual Testing Role |
|---|---|
| Loop 1 | Execute visual tests as part of validation |
| Loop 1.5 | Capture learnings about threshold tuning, false positives |
| Loop 2 | Quality validation of test coverage |
| Loop 3 | Aggregate patterns for threshold optimization |
Eval Harness Integration
Visual testing supports evaluation via:
- Test pass rate tracking
- False positive rate monitoring
- Threshold effectiveness metrics
- Baseline update frequency
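Three of these metrics can be computed from a list of per-capture records. The sketch below is an assumption-laden illustration: the record shape `{ passed, falsePositive, baselineUpdated }` and the function name are hypothetical, and the false positive rate is defined here as the share of failures that a user rejected as not being real regressions. Threshold effectiveness is left out because the document does not define it.

```javascript
// Each record is assumed to look like:
// { passed: boolean, falsePositive: boolean, baselineUpdated: boolean }
function computeEvalMetrics(records) {
  const total = records.length;
  const failures = records.filter((r) => !r.passed);
  const falsePositives = failures.filter((r) => r.falsePositive);
  const updates = records.filter((r) => r.baselineUpdated);
  // Avoid dividing by zero when there are no records or no failures.
  const ratio = (n, d) => (d === 0 ? 0 : n / d);
  return {
    passRate: ratio(total - failures.length, total),
    falsePositiveRate: ratio(falsePositives.length, failures.length),
    baselineUpdateFrequency: ratio(updates.length, total),
  };
}
```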
Learning Signal Sources
| Signal | Confidence | Learning |
|---|---|---|
| User approves new baseline | HIGH (0.90) | Threshold was appropriate |
| User rejects false positive | HIGH (0.90) | Threshold too strict for context |
| User flags missed regression | HIGH (0.90) | Threshold too relaxed |
| Same page fails repeatedly | MEDIUM (0.75) | Investigate dynamic content issue |
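The table can be read as a mapping from observed signals to a threshold action. The sketch below encodes that mapping; the signal keys, the action names, and the rule of summing confidences to pick the strongest action are all assumptions made for illustration, while the confidence values come from the table.

```javascript
// Confidence values from the table; keys and action names are illustrative.
const LEARNING_SIGNALS = {
  baseline_approved: { confidence: 0.9, action: "keep" },
  false_positive_rejected: { confidence: 0.9, action: "relax" },
  missed_regression_flagged: { confidence: 0.9, action: "tighten" },
  repeated_failure: { confidence: 0.75, action: "investigate" },
};

// Sum confidence per action and return the strongest one, or null if no
// signals were observed. Summing is an assumed aggregation rule.
function recommendAction(observedSignals) {
  const totals = {};
  for (const name of observedSignals) {
    const signal = LEARNING_SIGNALS[name];
    if (!signal) throw new Error(`Unknown signal: ${name}`);
    totals[signal.action] = (totals[signal.action] || 0) + signal.confidence;
  }
  let best = null;
  for (const [action, weight] of Object.entries(totals)) {
    if (best === null || weight > best.weight) best = { action, weight };
  }
  return best;
}
```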
Examples
Example 1: Responsive Layout Validation
Complexity: Medium (3 viewports, 1 page)
Task: Validate homepage responsive behavior across mobile, tablet, desktop
Planning Output (sequential-thinking):
Thought 1/6: Need to validate responsive breakpoints for homepage
Thought 2/6: Viewports: iPhone 14 (390px), iPad Pro 11 (834px), Desktop HD (1920px)
Thought 3/6: Capture sections: hero, features, pricing, footer
Thought 4/6: Use default threshold (0.1%) for static content
Thought 5/6: Check baseline existence, compare if present
Thought 6/6: Generate report with pass/fail per viewport
Execution:
```javascript
// 1. Create test tab
await tabs_create_mcp() // -> tabId: 123

// 2. Navigate to homepage
await navigate({ url: "https://example.com/", tabId: 123 })

// 3. Mobile viewport (iPhone 14)
await resize_window({ width: 390, height: 844, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_mobile.png

// 4. Tablet viewport (iPad Pro 11)
await resize_window({ width: 834, height: 1194, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_tablet.png

// 5. Desktop viewport (Desktop HD)
await resize_window({ width: 1920, height: 1080, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_desktop.png

// 6. Compare each against baseline from Memory MCP
// 7. Generate report
```
Result: 3/3 viewports passed, no regressions detected
Execution Time: 45 seconds
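The resize, wait, and screenshot steps repeat once per viewport, so the sequence can be generated from a viewport list. The sketch below builds the steps as plain data rather than calling the tools, which keeps it runnable without a browser; `buildCaptureSteps` and the step object shape are hypothetical, and only the tool names, arguments, and viewport sizes are taken from Example 1.

```javascript
// Viewport sizes taken from Example 1.
const EXAMPLE_VIEWPORTS = [
  { name: "mobile", width: 390, height: 844 },
  { name: "tablet", width: 834, height: 1194 },
  { name: "desktop", width: 1920, height: 1080 },
];

// Produce the ordered tool calls for one page as data.
function buildCaptureSteps(url, tabId, viewports = EXAMPLE_VIEWPORTS, settleSeconds = 0.5) {
  const steps = [{ tool: "navigate", args: { url, tabId } }];
  for (const { name, width, height } of viewports) {
    steps.push({ tool: "resize_window", args: { width, height, tabId } });
    // Pause so reflow finishes before the screenshot is taken.
    steps.push({ tool: "computer", args: { action: "wait", duration: settleSeconds, tabId } });
    steps.push({
      tool: "computer",
      args: { action: "screenshot", tabId },
      output: `capture_${name}.png`,
    });
  }
  return steps;
}
```

Adding a viewport then means adding one entry to the list instead of copying three calls.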
Example 2: Component State Testing (Buttons)
Complexity: Medium (4 states per button, zoom captures)
Task: Validate primary button visual states (default, hover, active, disabled)
Planning Output:
Thought 1/8: Testing primary button component visual states
Thought 2/8: States to capture: default, hover, active, disabled
Thought 3/8: Use zoom tool for detailed button capture
Thought 4/8: Strict threshold (0.01%) for design system component
Thought 5/8: Capture default state first
Thought 6/8: Use hover action for hover state
Thought 7/8: Use mouse down for active state
Thought 8/8: Navigate to disabled example for disabled state
Execution:
```javascript
// 1. Navigate to component library
await navigate({ url: "https://storybook.example.com/button", tabId: 123 })

// 2. Find button element
const button = await find({ query: "primary button", tabId: 123 })

// 3. Zoom capture default state
await computer({ action: "zoom", region: [button.x, button.y, button.x + 200, button.y + 50], tabId: 123 })

// 4. Hover state capture
await computer({ action: "hover", coordinate: [button.x + 100, button.y + 25], tabId: 123 })
await computer({ action: "zoom", region: [button.x, button.y, button.x + 200, button.y + 50], tabId: 123 })

// ... continue for active, disabled states
```
Result: 4/4 states passed strict threshold
Execution Time: 30 seconds
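The zoom region and hover coordinate in this example are both derived from the same button box, so they can be computed by two small helpers instead of repeating the arithmetic. This is a sketch: `zoomRegion` and `centerOf` are hypothetical names, and the box shape `{ x, y, width, height }` with `(x, y)` as the top-left corner is an assumption, since the document does not show what `find` returns.

```javascript
// `box` is assumed to be { x, y, width, height } with (x, y) at the top-left.
// Returns [left, top, right, bottom], the form used by the zoom calls above.
function zoomRegion(box) {
  return [box.x, box.y, box.x + box.width, box.y + box.height];
}

// Returns the [x, y] center of the box, suitable as a hover coordinate.
function centerOf(box) {
  return [box.x + box.width / 2, box.y + box.height / 2];
}
```

With a 200 by 50 box these reproduce the `+ 200`, `+ 50`, `+ 100`, and `+ 25` offsets used in the example, and they stay correct if the button size changes.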
Example 3: Animation Recording (Loading Sequence)
Complexity: High (GIF recording, temporal comparison)
Task: Capture and validate skeleton-to-content loading animation
Planning Output:
Thought 1/6: Need to capture loading animation as GIF
Thought 2/6: Trigger reload to capture full sequence
Thought 3/6: Start GIF recording before reload
Thought 4/6: Wait for content load completion
Thought 5/6: Stop recording and export GIF
Thought 6/6: Use animation threshold (5%) for comparison
Execution:
```javascript
// 1. Start GIF recording
await gif_creator({ action: "start_recording", tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Initial frame

// 2. Trigger reload
await navigate({ url: "https://example.com/dashboard", tabId: 123 })

// 3. Wait for load sequence
await computer({ action: "wait", duration: 3, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Final frame

// 4. Stop recording and export
await gif_creator({ action: "stop_recording", tabId: 123 })
await gif_creator({ action: "export", download: true, filename: "loading-animation.gif", tabId: 123 })
```
Result: Animation captured successfully, 2.3% diff from baseline (within 5% animation threshold)
Execution Time: 15 seconds
Troubleshooting
Common Issues and Solutions
| Issue | Cause | Solution |
|---|---|---|
| Screenshots are blank/black | Page not fully loaded | Add wait after navigation, check for lazy loading |
| Diff always fails | Threshold too strict | Increase threshold or configure ignore regions |
| Viewport resize not working | Tab permission issue | Create new tab with tabs_create_mcp |
| GIF not recording | Recording not started | Call gif_creator start_recording before actions |
| Baseline not found | Wrong namespace key | Verify page/viewport in Memory MCP query |
| Zoom captures wrong region | Coordinates shifted | Recalculate region after viewport resize |
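For the "Diff always fails" row, it helps to see how ignore regions change the measured diff. The sketch below is a generic pixel comparison, not the skill's actual comparison engine: `diffPercent` is a hypothetical name, images are assumed to be RGBA byte buffers of equal size, and ignore regions are assumed to have been resolved from selectors into `[x0, y0, x1, y1]` rectangles (with `x1` and `y1` exclusive) beforehand.

```javascript
// Return the percentage of compared pixels that differ between two RGBA
// buffers, skipping pixels inside any ignore region.
function diffPercent(baseline, current, width, height, ignoreRegions = [], tolerance = 0) {
  if (baseline.length !== width * height * 4 || current.length !== baseline.length) {
    throw new Error("Buffers must both be width * height * 4 bytes");
  }
  const ignored = (x, y) =>
    ignoreRegions.some(([x0, y0, x1, y1]) => x >= x0 && x < x1 && y >= y0 && y < y1);

  let compared = 0;
  let changed = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (ignored(x, y)) continue;
      compared++;
      const i = (y * width + x) * 4;
      // A pixel counts as changed if any channel differs by more than tolerance.
      for (let c = 0; c < 4; c++) {
        if (Math.abs(baseline[i + c] - current[i + c]) > tolerance) {
          changed++;
          break;
        }
      }
    }
  }
  return compared === 0 ? 0 : (changed / compared) * 100;
}
```

Masking a region that holds dynamic content (a timestamp, an ad slot) removes its pixels from both the numerator and the denominator, which is usually a safer fix than raising the threshold for the whole page.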
Debug Mode
Enable verbose output for troubleshooting:
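```javascript
const DEBUG_MODE = true;

// logCaptureDebug is an illustrative helper name; the caller passes in the
// values for the capture being processed.
function logCaptureDebug({ viewport, url, baselineExists, diffResult }) {
  if (!DEBUG_MODE) return;
  console.log("Viewport:", viewport);
  console.log("Page URL:", url);
  console.log("Capture timestamp:", new Date().toISOString());
  console.log("Baseline exists:", baselineExists);
  console.log("Diff result:", diffResult);
}
```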
Conclusion
Visual Testing provides systematic screenshot-based regression detection that complements functional testing. By comparing actual rendered output against approved baselines across multiple viewports, this skill catches UI regressions that unit and integration tests miss.
The key differentiators are:
- Baseline management: Versioned golden images with explicit approval workflow
- Multi-viewport coverage: Responsive testing across mobile, tablet, and desktop
- Threshold-based comparison: Configurable tolerance to balance sensitivity and false positives
- Zoom capabilities: Element-level precision for design system validation
- GIF recording: Temporal capture for animation and interaction testing
When integrated into a CI/CD pipeline, visual testing serves as a deployment gate that keeps visual regressions out of production. With baselines persisted in Memory MCP, comparisons stay consistent across sessions and releases.
Success Criteria
Quality Thresholds:
- All configured viewports captured successfully
- Baseline comparison completed for all captures (or flagged as new)
- Report generated with pass/fail status per page/viewport
- No orphaned tabs after test completion
- Execution time within 2x estimated duration
Failure Indicators:
- Screenshot capture fails (blank/timeout)
- Comparison fails with system error (not threshold failure)
- Memory MCP unavailable for baseline storage
- Tab context lost during multi-viewport capture
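Several of these criteria are mechanical enough to check in code at the end of a run. The sketch below covers three of them; `checkSuccess` and the field names on `run` are hypothetical, chosen only to mirror the quantities listed above.

```javascript
// `run` is assumed to look like:
// { configuredViewports, capturedViewports, openTabs, executionMs, estimatedMs }
function checkSuccess(run) {
  const problems = [];
  const missing = run.configuredViewports.filter(
    (v) => !run.capturedViewports.includes(v)
  );
  if (missing.length > 0) problems.push(`viewports not captured: ${missing.join(", ")}`);
  if (run.openTabs > 0) problems.push(`${run.openTabs} orphaned tab(s) left open`);
  if (run.executionMs > 2 * run.estimatedMs) {
    problems.push("execution time exceeded 2x the estimated duration");
  }
  return { ok: problems.length === 0, problems };
}
```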
Completion Verification
- YAML frontmatter with full description and triggers
- Overview explains philosophy and methodology
- Core Principles section has 5 principles with practical guidance
- When to Use has clear use/don't-use criteria
- Main Workflow has 6 phases with contracts
- Pattern Recognition covers 4 testing patterns
- Advanced Techniques includes multi-model and ignore regions
- Common Anti-Patterns has 3 tables (capture, comparison, workflow)
- Cross-Skill Coordination documents upstream/downstream/parallel
- MCP Requirements explains all required tools
- Input/Output Contracts clearly specified in YAML
- LEARNED PATTERNS section present (empty for future updates)
- Examples include 3 concrete scenarios
- Troubleshooting addresses common issues
- Conclusion summarizes skill value
- Memory namespace documented with tagging
<promise>VISUAL_TESTING_VERILINGUA_VERIX_COMPLIANT</promise>