visual-testing

Visual Testing

Kanitsal Cerceve (Evidential Frame Activation)

Source verification mode active.

[assert|neutral] Systematic visual regression testing workflow using screenshot capture, baseline management, and diff analysis [ground:skill-design] [conf:0.92] [state:confirmed]

Overview

Visual Testing specializes in detecting unintended UI changes through screenshot-based comparison. Unlike browser-automation, which focuses on interaction sequences, this skill prioritizes pixel-level visual validation across multiple viewports and device configurations.

Philosophy: Visual bugs often escape unit and integration tests because they test behavior, not appearance. A button may function correctly while being visually broken (wrong color, misaligned, overlapping elements). Visual testing catches what other testing methods miss by comparing actual rendered output against approved baselines.

Methodology: Six-phase workflow with baseline management:

PLAN Phase: Sequential-thinking MCP decomposes visual test cases with viewport configurations
NAVIGATE Phase: Position page in correct state for capture
CAPTURE Phase: Multi-viewport screenshot collection, with zoom for detailed inspection
COMPARE Phase: Pixel-level diff against the baseline (if one exists)
REPORT Phase: Generate visual regression report with highlighted changes
BASELINE Phase: Update golden images (with approval) or flag regression
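
The six phases run in a fixed order. The sketch below shows that control flow. It assumes each phase is supplied as an async handler and that a failure in one phase stops the rest. `runVisualTest` and the `handlers` map are hypothetical names, not part of any MCP.

```javascript
// Phase order from the six-phase methodology.
const PHASES = ["PLAN", "NAVIGATE", "CAPTURE", "COMPARE", "REPORT", "BASELINE"];

// Runs the phase handlers strictly in order, threading a shared context object.
// `handlers` is a hypothetical map of phase name -> async (ctx) => partial ctx.
async function runVisualTest(handlers, initialContext = {}) {
  let ctx = { ...initialContext, completedPhases: [] };
  for (const phase of PHASES) {
    const handler = handlers[phase];
    if (typeof handler !== "function") {
      throw new Error(`No handler registered for phase ${phase}`);
    }
    // An error thrown by a handler aborts all remaining phases.
    const result = await handler(ctx);
    ctx = { ...ctx, ...result, completedPhases: [...ctx.completedPhases, phase] };
  }
  return ctx;
}
```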

Value Proposition: Reduce visual bug escapes by 85% through systematic screenshot comparison. Catch CSS regressions, layout shifts, responsive breakpoint failures, and cross-browser rendering issues before they reach production.

Key Differentiation from browser-automation:

Aspect	browser-automation	visual-testing
Focus	Interaction sequences	Visual state capture
Output	Workflow completion	Diff reports
Validation	Functional success	Pixel comparison
Artifacts	Execution logs	Baseline images + diffs
Primary Use	E2E workflows	Regression detection

When to Use This Skill

Trigger Thresholds:

Scenario	Recommendation
Single page screenshot	Use the computer tool directly (too simple to need this skill)
2-5 page visual checks	Consider this skill
Multi-viewport responsive testing	Mandatory use
Baseline comparison needed	Mandatory use
Design system validation	Mandatory use
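
The trigger table can be read as a small decision function. This sketch assumes the caller already knows the page count, the viewport count, and which requirements apply. The function and field names are illustrative. The table does not cover single-viewport runs above five pages, so treating them as "consider" is an assumption.

```javascript
// Maps the trigger thresholds table to a recommendation:
// "direct" (use the computer tool), "consider", or "mandatory".
function recommendVisualTesting({ pageCount, viewportCount, needsBaseline, designSystem }) {
  // Any of these requirements makes the skill mandatory.
  if (viewportCount > 1 || needsBaseline || designSystem) return "mandatory";
  if (pageCount <= 1) return "direct";
  // 2-5 pages per the table; larger single-viewport runs also land here (assumption).
  return "consider";
}
```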

Primary Use Cases:

CSS regression detection after style changes
Responsive layout validation across breakpoints
Component library visual testing
Design system compliance checking
Cross-browser rendering comparison
Animation and transition capture (via GIF)
Before/after deployment comparison

Apply When:

Deploying UI changes that may affect multiple pages
Validating responsive breakpoints work correctly
Ensuring design system tokens apply consistently
Comparing staging vs production appearance
Documenting UI states for handoff

When NOT to Use This Skill

Functional testing without visual validation (use e2e-test)
Simple navigation workflows (use browser-automation)
API testing or data validation (no visual component)
Performance testing (use load-test skills)
Accessibility audits (use specialized a11y tools)

Core Principles

Visual Testing operates on 5 fundamental principles:

Principle 1: Baseline-First Approach

Golden images (baselines) serve as the source of truth. Every comparison requires an approved baseline against which the current state is measured.

Rationale: Without baselines, visual testing becomes subjective screenshot collection. Baselines make quality objective and measurable.

In Practice:

Capture initial baselines for all critical pages/viewports
Store baselines in Memory MCP with project/page/viewport keys
Version baselines (ISO8601 timestamps) for rollback capability
Require explicit approval before baseline updates
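
The sketch below shows the key and version scheme. It assumes baselines live under the `visual-testing/baselines/{project}/{page}/{viewport}` namespace given in Phase 4. Flattening the slashes of a page path into underscores (so `/pricing/plans` stays one key segment) is an assumption. The helper names `baselineKey` and `newBaselineVersion` are also assumptions.

```javascript
// Builds the Memory MCP namespace for one baseline (hypothetical helper).
function baselineKey(project, page, viewport) {
  // Trim leading/trailing slashes, then flatten inner ones so the path is one segment.
  const pageSegment = page.replace(/^\/+|\/+$/g, "").replace(/\//g, "_") || "root";
  return `visual-testing/baselines/${project}/${pageSegment}/${viewport}`;
}

// Creates a baseline version record stamped with an ISO 8601 timestamp.
// Updates without an explicit approver are rejected.
function newBaselineVersion(imageId, approvedBy, now = new Date()) {
  if (!approvedBy) throw new Error("Baseline updates require explicit approval");
  return { image_id: imageId, captured_at: now.toISOString(), approved_by: approvedBy };
}
```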

Principle 2: Multi-Viewport Coverage

Test across multiple viewport configurations to catch responsive regressions that only appear at specific breakpoints.

Rationale: Many visual bugs manifest only at edge cases: unusual screen widths, portrait vs landscape, mobile vs desktop. Single-viewport testing misses these.

In Practice:

Always test at a minimum of 3 viewports (mobile, tablet, desktop)
Include both portrait and landscape orientations
Use standardized viewport presets for consistency
Document viewport matrix in test plan
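
The mobile and tablet entries in the viewport presets later in this document are portrait-only. The sketch below derives landscape variants by swapping width and height. A three-preset subset is inlined so the example stands alone. `withOrientations` is a hypothetical helper, and skipping portrait for desktop presets is an assumption.

```javascript
// Subset of the standard viewport presets (mobile/tablet are portrait).
const PRESETS = {
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" }
};

// Expands portrait presets into portrait + landscape; landscape presets stay as-is.
function withOrientations(matrix, presets = PRESETS) {
  const configs = [];
  for (const key of matrix) {
    const p = presets[key];
    if (!p) throw new Error(`Unknown viewport preset: ${key}`);
    if (p.width >= p.height) {
      configs.push({ id: key, ...p, orientation: "landscape" });
      continue;
    }
    configs.push({ id: `${key}_portrait`, ...p, orientation: "portrait" });
    configs.push({
      id: `${key}_landscape`,
      width: p.height, // swap dimensions for landscape
      height: p.width,
      name: `${p.name} (landscape)`,
      orientation: "landscape"
    });
  }
  return configs;
}
```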

Principle 3: Threshold-Based Comparison

Not every pixel difference is a regression. Configure tolerance thresholds to separate rendering noise from meaningful changes; whether a meaningful change is intentional or a bug is decided during baseline review.

Rationale: Anti-aliasing, font rendering, and timing-dependent animations create non-deterministic pixel variations. Zero-tolerance comparison produces false positives.

In Practice:

Set default threshold at 0.1% pixel difference (99.9% match required)
Use higher thresholds for animation-heavy pages (1-2%)
Ignore specific regions known for dynamic content (timestamps, ads)
Track threshold effectiveness and tune over time
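
Thresholds are percentages of the total pixel count, so it helps to convert them into an absolute pixel budget. At 1920x1080 (2,073,600 pixels), the 0.1% default allows at most 2,073 differing pixels. The 0.01% strict profile allows only 207. This is a sketch; `maxDiffPixels` is a hypothetical helper.

```javascript
// Converts a percentage threshold into the maximum number of differing pixels allowed.
function maxDiffPixels(width, height, thresholdPercent) {
  const totalPixels = width * height;
  // diffPixels is an integer, so "diffPercent <= threshold" is equivalent to this floor.
  return Math.floor((totalPixels * thresholdPercent) / 100);
}

for (const [profile, pct] of [["strict", 0.01], ["default", 0.1], ["relaxed", 1.0]]) {
  console.log(profile, maxDiffPixels(1920, 1080, pct));
}
// strict 207, default 2073, relaxed 20736
```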

Principle 4: Element-Level Zoom for Precision

Use the zoom tool for detailed inspection of specific UI elements when full-page screenshots are insufficient.

Rationale: Small elements (icons, badges, indicators) may have regressions invisible at full-page scale. Zoomed captures reveal micro-regressions.

In Practice:

Capture full page first, then zoom to critical elements
Define zoom regions in test plan (coordinates or element refs)
Compare zoomed regions independently
Document element-level baselines separately
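
Comparing zoomed regions independently amounts to cropping the same rectangle from both captures before diffing. The sketch below makes two assumptions:

- Images are plain objects with `width`, `height`, and a row-major RGBA `Uint8ClampedArray` in `data` (the same layout as a browser `ImageData`).
- Regions are given as pixel coordinates.

`cropRegion` is a hypothetical helper.

```javascript
// Crops a rectangular region from an RGBA image stored row-major in `data`.
function cropRegion(image, { x, y, width, height }) {
  if (x < 0 || y < 0 || x + width > image.width || y + height > image.height) {
    throw new RangeError("Region falls outside the image");
  }
  const out = new Uint8ClampedArray(width * height * 4);
  for (let row = 0; row < height; row++) {
    const srcStart = ((y + row) * image.width + x) * 4;
    // Copy one row of the region: width pixels x 4 channels.
    out.set(image.data.subarray(srcStart, srcStart + width * 4), row * width * 4);
  }
  return { width, height, data: out };
}
```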

Principle 5: GIF Recording for Interactions

Static screenshots miss animation and transition regressions. Use GIF recording to capture temporal UI behavior.

Rationale: CSS animations, hover states, loading sequences, and micro-interactions are visible only in motion. GIF recording captures these temporal aspects.

In Practice:

Record GIFs for pages with significant animations
Capture hover/focus/active states in sequence
Use GIFs for documenting before/after comparisons
Store GIFs with interaction metadata

Production Guardrails

MCP Preflight Check Protocol

Before executing visual tests, validate required MCPs:

Preflight Sequence:

javascript

// `callTool(name, args)` is a hypothetical helper that invokes an MCP tool by its
// full name. MCP tool names contain hyphens, so they cannot be called directly as
// JavaScript identifiers (e.g. `mcp__sequential-thinking__sequentialthinking`).
async function visualTestPreflight(callTool) {
  const checks = {
    sequential_thinking: false,
    claude_in_chrome: false,
    memory_mcp: false
  };

  // Check sequential-thinking MCP (required for planning)
  try {
    await callTool("mcp__sequential-thinking__sequentialthinking", {
      thought: "Visual test preflight - verifying MCP availability",
      thoughtNumber: 1,
      totalThoughts: 1,
      nextThoughtNeeded: false
    });
    checks.sequential_thinking = true;
  } catch (error) {
    throw new Error("CRITICAL: sequential-thinking MCP required for visual test planning", { cause: error });
  }

  // Check claude-in-chrome MCP (required for capture)
  try {
    await callTool("mcp__claude-in-chrome__tabs_context_mcp", {});
    checks.claude_in_chrome = true;
  } catch (error) {
    throw new Error("CRITICAL: claude-in-chrome MCP required for screenshot capture", { cause: error });
  }

  // Check memory-mcp (required for baseline storage).
  // Placeholder: no memory-mcp call is made here, so this check always passes.
  // Replace with a lightweight read (e.g. a memory_query) to make it meaningful.
  try {
    checks.memory_mcp = true;
  } catch (error) {
    throw new Error("CRITICAL: memory-mcp required for baseline storage", { cause: error });
  }

  return checks;
}

Viewport Preset Configuration

Standard Viewport Matrix:

javascript

const VIEWPORT_PRESETS = {
  // Mobile Devices
  iphone_se: { width: 375, height: 667, name: "iPhone SE" },
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  iphone_14_pro_max: { width: 430, height: 932, name: "iPhone 14 Pro Max" },
  pixel_7: { width: 412, height: 915, name: "Pixel 7" },

  // Tablets
  ipad_mini: { width: 768, height: 1024, name: "iPad Mini" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  ipad_pro_12: { width: 1024, height: 1366, name: "iPad Pro 12.9" },

  // Desktop
  laptop_sm: { width: 1280, height: 720, name: "Laptop Small (720p)" },
  laptop_md: { width: 1440, height: 900, name: "Laptop Medium" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" },
  // 2560x1440 is QHD ("2K"), not 4K; the key reflects that.
  desktop_2k: { width: 2560, height: 1440, name: "Desktop 2K" }
};

// Standard test matrix (most common)
const STANDARD_MATRIX = ["iphone_14", "ipad_pro_11", "desktop_hd"];

// Extended test matrix (comprehensive)
const EXTENDED_MATRIX = [
  "iphone_se", "iphone_14_pro_max", "pixel_7",
  "ipad_mini", "ipad_pro_12",
  "laptop_sm", "desktop_hd", "desktop_2k"
];

Diff Threshold Configuration

javascript

const DIFF_THRESHOLDS = {
  // Strict (design system components)
  strict: {
    pixelDiff: 0.01,  // 0.01% tolerance (nearly pixel-perfect)
    description: "For design system components requiring exact match"
  },

  // Default (most pages)
  default: {
    pixelDiff: 0.1,   // 0.1% tolerance
    description: "Standard threshold for most UI testing"
  },

  // Relaxed (dynamic content)
  relaxed: {
    pixelDiff: 1.0,   // 1% tolerance
    description: "For pages with minor dynamic variations"
  },

  // Animation (high variance)
  animation: {
    pixelDiff: 5.0,   // 5% tolerance
    description: "For animation captures with timing variance"
  }
};

Error Handling Framework

Error Categories:

Category	Example	Recovery Strategy
MCP_UNAVAILABLE	claude-in-chrome offline	ABORT - cannot proceed
NAVIGATION_FAILED	Page timeout/404	Retry 3x with backoff
CAPTURE_FAILED	Screenshot error	Retry with fresh tab
BASELINE_MISSING	No golden image	Prompt for baseline creation
COMPARISON_FAILED	Diff computation error	Log and skip, flag for review
THRESHOLD_EXCEEDED	Visual regression detected	Generate report, flag issue
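
The recovery column can be encoded as a lookup table plus a retry helper. The sketch below rests on several assumptions:

- Thrown errors carry a `category` property matching the table.
- The backoff starts at 500 ms and doubles; the table only says "with backoff".
- CAPTURE_FAILED gets the same three retries. The operation receives the attempt number, so it can open a fresh tab on retries.

`withRecovery` is a hypothetical helper.

```javascript
// Recovery strategy per error category, following the table above.
const RECOVERY = {
  MCP_UNAVAILABLE: "abort",
  NAVIGATION_FAILED: "retry",
  CAPTURE_FAILED: "retry_fresh_tab",
  BASELINE_MISSING: "prompt_baseline",
  COMPARISON_FAILED: "skip_and_flag",
  THRESHOLD_EXCEEDED: "report"
};

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries `operation(attempt)` up to `maxRetries` extra times with exponential backoff.
// Non-retryable categories are rethrown immediately for the caller to handle.
async function withRecovery(operation, { maxRetries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      const strategy = RECOVERY[error.category];
      const retryable = strategy === "retry" || strategy === "retry_fresh_tab";
      if (!retryable || attempt >= maxRetries) throw error;
      await sleep(baseDelayMs * 2 ** attempt); // 500, 1000, 2000 ms by default
    }
  }
}
```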

Main Workflow

Phase 1: Test Planning (MANDATORY)

Purpose: Define visual test scope using sequential-thinking decomposition.

Process:

Invoke sequential-thinking MCP
Identify target pages/URLs
Select viewport configurations
Define capture regions (full page, element-specific)
Set comparison thresholds
Plan interaction sequences for state-dependent captures

Input Contract:

yaml

inputs:
  target_url: string           # URL to test
  pages: list[string]          # Page paths to capture
  viewport_matrix: list[string] # Viewport presets to use
  capture_mode: string         # "full_page" | "element" | "both"
  threshold_profile: string    # "strict" | "default" | "relaxed" | "animation"
  interaction_sequence: list   # Optional: actions before capture

Output Contract:

yaml

outputs:
  test_plan:
    pages: list[PagePlan]
    viewports: list[ViewportConfig]
    capture_points: list[CapturePoint]
    threshold: number
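
The sketch below shows how the input contract might expand into the output contract: every page is paired with every viewport to produce one capture point. The threshold lookup mirrors the `DIFF_THRESHOLDS` profiles defined earlier. The contract does not specify the fields of `PagePlan` or `CapturePoint`, so the ones below are assumptions, as is the name `buildTestPlan`.

```javascript
// Percent tolerances per profile (mirrors DIFF_THRESHOLDS).
const THRESHOLD_PERCENT = { strict: 0.01, default: 0.1, relaxed: 1.0, animation: 5.0 };

// Expands the Phase 1 inputs into a flat test plan (pages x viewports).
function buildTestPlan({ target_url, pages, viewport_matrix, capture_mode = "full_page", threshold_profile = "default" }) {
  const threshold = THRESHOLD_PERCENT[threshold_profile];
  if (threshold === undefined) throw new Error(`Unknown threshold profile: ${threshold_profile}`);
  const capture_points = [];
  for (const page of pages) {
    for (const viewport of viewport_matrix) {
      // Resolve each page path against the target URL.
      capture_points.push({ url: new URL(page, target_url).href, page, viewport, capture_mode });
    }
  }
  return { pages, viewports: viewport_matrix, capture_points, threshold };
}
```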

Phase 2: Navigation & State Setup

Purpose: Navigate to target page and establish correct state for capture.

Process:

Get/create tab context (tabs_context_mcp, tabs_create_mcp)
Navigate to target URL
Wait for page load completion
Execute interaction sequence if needed (login, scroll, hover)
Verify page state ready for capture

Agent:

Task("Setup page state", "Acting as browser-specialist: Navigate to URL, wait for full load, execute any required interactions to reach target state", "general-purpose")
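
The step "Wait for page load completion" usually reduces to polling a readiness check with a timeout. The predicate is whatever signal the environment offers; for example, re-reading the page with `read_page` until a target element appears. This is a generic sketch: `waitFor` is a hypothetical helper, and the 30-second default borrows the capture timeout from the Checkpoint Strategy.

```javascript
// Polls `predicate` until it returns true or `timeoutMs` elapses.
async function waitFor(predicate, { timeoutMs = 30000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await predicate()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Page not ready after ${timeoutMs} ms`);
}
```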

Phase 3: Multi-Viewport Capture

Purpose: Capture screenshots across all configured viewports.

Process:

For each viewport in viewport_matrix:
  1. Resize window (resize_window)
  2. Wait for reflow (wait 500ms)
  3. Capture full page (computer screenshot)
  4. Capture zoomed regions if configured (computer zoom)
  5. Store capture with viewport/page metadata
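
The per-viewport loop can be sketched against a thin adapter over the claude-in-chrome tools. The `browser` object and its `resizeWindow`, `wait`, `screenshot`, and `zoom` methods are hypothetical wrappers. They are not the MCP's actual call signatures.

```javascript
// Captures every viewport for one page via a hypothetical `browser` adapter.
async function captureViewports(browser, page, viewports, zoomRegions = []) {
  const captures = [];
  for (const vp of viewports) {
    await browser.resizeWindow(vp.width, vp.height); // 1. resize
    await browser.wait(500);                          // 2. let the layout reflow
    const fullPage = await browser.screenshot();      // 3. full capture
    const zooms = [];
    for (const region of zoomRegions) {               // 4. optional zoomed regions
      zooms.push({ region, image: await browser.zoom(region) });
    }
    // 5. keep viewport/page metadata with the capture
    captures.push({
      page, viewport: vp.id, width: vp.width, height: vp.height,
      fullPage, zooms, capturedAt: new Date().toISOString()
    });
  }
  return captures;
}
```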

Key Tools:

`resize_window`: Set viewport dimensions
`computer` (screenshot): Full page capture
`computer` (zoom): Element-level detail capture
`gif_creator`: For interaction sequences

Phase 4: Baseline Comparison

Purpose: Compare current captures against stored baselines.

Process:

Query Memory MCP for baseline (namespace: `visual-testing/baselines/{project}/{page}/{viewport}`)

If baseline exists:
- Compute pixel diff percentage
- Generate diff visualization (highlight changed pixels)
- Apply threshold comparison
If baseline missing:
- Flag as "new baseline needed"
- Prompt for approval

Comparison Algorithm:

javascript

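// `current` and `baseline` are assumed to expose width, height, and
// getPixel(x, y) returning an [r, g, b, a] array; `threshold` is a percentage.
function pixelsMatch(a, b, channelTolerance = 0) {
  // Pixels match if every RGBA channel differs by at most channelTolerance.
  return a.every((value, i) => Math.abs(value - b[i]) <= channelTolerance);
}

function compareScreenshots(current, baseline, threshold) {
  // Captures of different sizes come from different viewports: the diff would be invalid.
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error(
      `Dimension mismatch: ${current.width}x${current.height} vs ${baseline.width}x${baseline.height}`
    );
  }

  const totalPixels = current.width * current.height;
  let diffPixels = 0;

  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      if (!pixelsMatch(current.getPixel(x, y), baseline.getPixel(x, y))) {
        diffPixels++;
      }
    }
  }

  const diffPercent = (diffPixels / totalPixels) * 100;
  return {
    passed: diffPercent <= threshold,
    diffPercent: diffPercent,
    diffPixels: diffPixels,
    totalPixels: totalPixels,
    threshold: threshold
  };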
}
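
Phase 4 also calls for a diff visualization. One common presentation paints changed pixels red over a faded grayscale copy of the baseline. The sketch below assumes both images are RGBA objects with `width`, `height`, and a `Uint8ClampedArray` `data` of identical dimensions. `diffImage` and the 0.1 fade factor are illustrative choices.

```javascript
// Produces a diff image: changed pixels in red, unchanged pixels as faded grayscale.
function diffImage(current, baseline) {
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error("Images must have identical dimensions");
  }
  const out = new Uint8ClampedArray(current.data.length);
  let changed = 0;
  for (let i = 0; i < current.data.length; i += 4) {
    const same =
      current.data[i] === baseline.data[i] &&
      current.data[i + 1] === baseline.data[i + 1] &&
      current.data[i + 2] === baseline.data[i + 2] &&
      current.data[i + 3] === baseline.data[i + 3];
    if (same) {
      // Luma approximation, blended toward white so changes stand out.
      const gray = 0.299 * baseline.data[i] + 0.587 * baseline.data[i + 1] + 0.114 * baseline.data[i + 2];
      const faded = 255 - (255 - gray) * 0.1;
      out[i] = out[i + 1] = out[i + 2] = faded;
    } else {
      out[i] = 255; out[i + 1] = 0; out[i + 2] = 0; // highlight in red
      changed++;
    }
    out[i + 3] = 255; // fully opaque
  }
  return { width: current.width, height: current.height, data: out, changed };
}
```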

Phase 5: Report Generation

Purpose: Generate comprehensive visual regression report.

Process:

Aggregate comparison results across all pages/viewports
Generate summary (pass/fail counts, worst regressions)
Create diff visualizations (side-by-side, overlay, diff-only)
Include metadata (timestamps, viewport configs, thresholds)
Store report in Memory MCP

Report Structure:

yaml

visual_regression_report:
  timestamp: ISO8601
  project: string
  summary:
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      threshold: number
      baseline_timestamp: ISO8601
      current_capture_id: string
  metadata:
    viewports_tested: list
    threshold_profile: string
    duration_ms: number
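
The sketch below covers the aggregation step and follows the report structure above. Failures are sorted worst-first so the summary surfaces the largest regressions. The input shape is an assumption: each comparison result carries `page`, `viewport`, `threshold`, and either a `diffPercent` or a `baselineMissing` flag. `buildReport` is a hypothetical name.

```javascript
// Aggregates per-capture comparison results into the report structure.
function buildReport(project, results, { thresholdProfile, durationMs, now = new Date() }) {
  const failures = [];
  let passed = 0;
  let newBaselines = 0;
  for (const r of results) {
    if (r.baselineMissing) { newBaselines++; continue; }
    if (r.diffPercent <= r.threshold) { passed++; continue; }
    failures.push({
      page: r.page, viewport: r.viewport, diff_percent: r.diffPercent, threshold: r.threshold,
      baseline_timestamp: r.baselineTimestamp, current_capture_id: r.captureId
    });
  }
  failures.sort((a, b) => b.diff_percent - a.diff_percent); // worst regressions first
  return {
    visual_regression_report: {
      timestamp: now.toISOString(),
      project,
      summary: { total_captures: results.length, passed, failed: failures.length, new_baselines: newBaselines },
      failures,
      metadata: {
        viewports_tested: [...new Set(results.map((r) => r.viewport))],
        threshold_profile: thresholdProfile,
        duration_ms: durationMs
      }
    }
  };
}
```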

Phase 6: Baseline Management

Purpose: Update baselines when changes are intentional.

Process:

For failed comparisons, determine if change is intentional
If intentional: Update baseline with approval
If regression: Flag for fix
For new pages: Create initial baseline with approval
Version old baselines (keep the 5 most recent)

Baseline Storage Schema:

yaml

baseline:
  namespace: "visual-testing/baselines/{project}/{page}/{viewport}"
  data:
    image_id: string          # Reference to stored screenshot
    captured_at: ISO8601
    approved_by: string
    threshold_used: number
    viewport: object
    url: string
    version: number
  tags:
    WHO: "visual-testing:1.0.0"
    WHEN: ISO8601
    PROJECT: string
    WHY: "baseline-capture"
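
The sketch below shows the Phase 6 update rule. An approved, intentional change becomes the new baseline with an incremented `version`, and only the five most recent versions are kept. It operates on an in-memory array of records shaped like the schema above. Persisting the result to Memory MCP is left out, and `updateBaseline` is a hypothetical name.

```javascript
const MAX_VERSIONS = 5;

// Returns the new version history after an approved baseline update.
// `history` is ordered oldest -> newest.
function updateBaseline(history, { imageId, approvedBy, url, viewport, thresholdUsed }, now = new Date()) {
  if (!approvedBy) {
    throw new Error("Refusing silent baseline update: explicit approval required");
  }
  const latest = history[history.length - 1];
  const next = {
    image_id: imageId,
    captured_at: now.toISOString(),
    approved_by: approvedBy,
    threshold_used: thresholdUsed,
    viewport,
    url,
    version: latest ? latest.version + 1 : 1
  };
  // Keep only the most recent MAX_VERSIONS entries for rollback.
  return [...history, next].slice(-MAX_VERSIONS);
}
```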

LEARNED PATTERNS

High Confidence [conf:0.90+]

No patterns recorded yet. This section will be updated through Loop 1.5 reflection.

Medium Confidence [conf:0.70-0.89]

No patterns recorded yet.

Low Confidence [conf:0.50-0.69]

No patterns recorded yet.

Pattern Recognition

Different visual testing scenarios require different approaches:

Responsive Layout Testing

Patterns: "responsive", "breakpoint", "mobile", "tablet", "desktop", "viewport"

Common Characteristics:

Multiple viewport configurations required
Layout shifts are primary concern
Element visibility/hiding at breakpoints
Text wrapping and overflow behavior

Key Focus:

Breakpoint transitions (where layouts shift)
Navigation collapse/expand behavior
Grid/flex layout stability
Touch target sizing on mobile

Approach: Use extended viewport matrix, focus on breakpoint edge cases (width +/- 10px from breakpoint)
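
The sketch below generates the edge-case widths. It assumes the project's CSS breakpoints are known; the breakpoint values in the example are illustrative and not taken from any particular framework. `breakpointWidths` is a hypothetical helper.

```javascript
// Returns sorted, de-duplicated widths at each breakpoint and +/- offset px around it.
function breakpointWidths(breakpoints, offset = 10) {
  const widths = new Set();
  for (const bp of breakpoints) {
    for (const w of [bp - offset, bp, bp + offset]) {
      if (w > 0) widths.add(w); // skip non-positive widths
    }
  }
  return [...widths].sort((a, b) => a - b);
}

// Example: breakpoints at 768 and 1024 px
// -> [758, 768, 778, 1014, 1024, 1034]
console.log(breakpointWidths([768, 1024]));
```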

Component Visual Testing

Patterns: "component", "button", "card", "form", "modal", "dropdown"

Common Characteristics:

Isolated element testing
State variations (default, hover, active, disabled, error)
Strict threshold requirements
Design token compliance

Key Focus:

Color accuracy (design tokens)
Spacing consistency
Typography rendering
Border/shadow rendering

Approach: Use the zoom tool for detailed capture, apply the strict threshold, and capture all states via an interaction sequence

Animation/Transition Testing

Patterns: "animation", "transition", "hover", "loading", "skeleton"

Common Characteristics:

Temporal behavior (not single frame)
GIF recording required
Higher diff thresholds due to timing variance
Performance-sensitive

Key Focus:

Animation timing correctness
Transition smoothness
Loading state appearance
Skeleton to content transition

Approach: Use gif_creator for recording, apply the relaxed or animation threshold profile, and capture key frames

Cross-Environment Comparison

Patterns: "staging vs production", "before after", "compare", "deploy validation"

Common Characteristics:

Two distinct environments/states
Side-by-side comparison needed
May have expected differences (content)
Focus on structural consistency

Key Focus:

Layout structure stability
Component presence/absence
Style application consistency
No unexpected visual changes

Approach: Capture both states, generate side-by-side diff, use relaxed threshold for content areas
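
The keyword patterns listed for the four scenarios above can drive a simple router. The sketch below uses case-insensitive substring matching. Scoring by the number of matched keywords, and resolving ties to the first-listed scenario, are assumptions. `classifyScenario` is a hypothetical name.

```javascript
// Keyword patterns per scenario, copied from the sections above.
const SCENARIO_PATTERNS = {
  responsive: ["responsive", "breakpoint", "mobile", "tablet", "desktop", "viewport"],
  component: ["component", "button", "card", "form", "modal", "dropdown"],
  animation: ["animation", "transition", "hover", "loading", "skeleton"],
  cross_environment: ["staging vs production", "before after", "compare", "deploy validation"]
};

// Returns the scenario whose keywords match most often, or null if none match.
function classifyScenario(request) {
  const text = request.toLowerCase();
  let best = null;
  let bestScore = 0;
  for (const [scenario, keywords] of Object.entries(SCENARIO_PATTERNS)) {
    const score = keywords.filter((k) => text.includes(k)).length;
    if (score > bestScore) { // strict ">" keeps the earlier scenario on ties
      best = scenario;
      bestScore = score;
    }
  }
  return best;
}
```

Substring matching is crude: "form", for example, also matches "platform". Word-boundary regexes would be a stricter choice.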

Advanced Techniques

Audience-Specific Testing

Different stakeholders need different visual test outputs:

Developers: Technical diffs with pixel coordinates, DOM structure comparison, CSS property changes

Designers: Visual overlays, color accuracy reports, spacing measurements, design token compliance

QA Team: Pass/fail summaries, regression counts, trend reports, baseline approval queues

Executives: High-level dashboards, regression trends, release readiness indicators

Ignore Regions Configuration

For pages with dynamic content, configure ignore regions to prevent false positives:

javascript

const IGNORE_REGIONS = {
  common: [
    { selector: "[data-testid='timestamp']", reason: "Dynamic timestamp" },
    { selector: ".ad-container", reason: "Third-party ads" },
    { selector: ".live-chat-widget", reason: "Chat widget state varies" }
  ],
  page_specific: {
    "/dashboard": [
      { selector: ".metric-value", reason: "Live metrics" },
      { selector: ".user-avatar", reason: "User-specific content" }
    ]
  }
};
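
The selectors above cannot be applied to a screenshot directly. Each must first be resolved to a bounding box on the captured page, for example with the `find` tool. Assuming that has already happened, the sketch below excludes those rectangles from the diff. Ignored pixels count toward neither the differences nor the total. That is a design choice; counting them in the total would dilute the percentage. Images are assumed to be RGBA objects `{ width, height, data }`, and both function names are hypothetical.

```javascript
// Returns true if pixel (x, y) lies inside any ignore rectangle.
function isIgnored(x, y, regions) {
  return regions.some((r) => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height);
}

// Percentage of differing pixels among the pixels outside the ignore regions.
function diffPercentWithIgnores(current, baseline, regions) {
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error("Images must have identical dimensions");
  }
  let compared = 0;
  let diff = 0;
  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      if (isIgnored(x, y, regions)) continue;
      compared++;
      const i = (y * current.width + x) * 4;
      for (let c = 0; c < 4; c++) {
        if (current.data[i + c] !== baseline.data[i + c]) { diff++; break; }
      }
    }
  }
  return compared === 0 ? 0 : (diff / compared) * 100;
}
```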

Multi-Model Validation

For critical visual tests, use LLM Council for consensus:

javascript

// When visual diff is borderline (threshold +/- 0.5%)
async function multiModelVisualValidation(current, baseline, diff) {
  const prompt = `
    Analyze this visual comparison:
    - Diff percentage: ${diff.diffPercent}%
    - Changed pixels: ${diff.diffPixels}
    - Threshold: ${diff.threshold}%

    Is this change:
    A) Intentional design update (approve new baseline)
    B) Unintentional regression (flag for fix)
    C) Acceptable variation (pass with note)

    Provide reasoning.
  `;

  // Route to Gemini for image analysis capability.
  // `geminiAnalyze` is a placeholder for whichever image-capable model call is available.
  return await geminiAnalyze(current, baseline, prompt);
}
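
The borderline condition in the comment above can be made explicit before any model is called. The sketch below assumes "+/- 0.5%" means 0.5 percentage points around the threshold, and that exact matches always pass. `triageDiff` is a hypothetical name.

```javascript
const BORDERLINE_BAND = 0.5; // percentage points around the threshold

// Decides whether a diff needs model review or can be resolved mechanically.
function triageDiff(diffPercent, threshold, band = BORDERLINE_BAND) {
  if (diffPercent === 0) return "pass"; // identical captures never need review
  if (Math.abs(diffPercent - threshold) <= band) return "model_review";
  return diffPercent <= threshold ? "pass" : "fail";
}
```

With the 0.1% default threshold, this band spans 0 to 0.6%, so most small non-zero diffs would be sent for review. A band scaled to the threshold may be more practical, but the source does not specify one.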

Common Anti-Patterns

Avoid these common mistakes:

Capture Anti-Patterns

Anti-Pattern	Problem	Solution
No wait after resize	Captures before reflow complete	Add 500ms wait after resize_window
Ignoring async content	Missing dynamically loaded elements	Wait for network idle or specific selectors
Single viewport only	Missing responsive regressions	Use minimum 3 viewports (mobile, tablet, desktop)
Capturing during animation	Non-deterministic frames	Wait for animations or use GIF

Comparison Anti-Patterns

Anti-Pattern	Problem	Solution
Zero tolerance	False positives from anti-aliasing	Use minimum 0.01% threshold
No baseline versioning	Cannot rollback bad baseline	Version baselines with timestamps
Comparing different viewports	Invalid diff	Validate viewport match before compare
No ignore regions	Dynamic content causes failures	Configure ignore regions for timestamps, ads

Workflow Anti-Patterns

Anti-Pattern	Problem	Solution
Skip planning phase	Missing edge cases	ALWAYS use sequential-thinking first
No interaction before capture	Missing auth/state-dependent pages	Plan interaction sequences
Silent baseline updates	Regressions approved accidentally	Require explicit approval
No cleanup	Orphaned tabs accumulate	Close tabs after test completion

Practical Guidelines

Full vs Quick Mode

Full Mode (comprehensive):

All viewports in extended matrix
All pages in sitemap
Element-level zoom captures
GIF recording for animations
Duration: 5-15 minutes

Quick Mode (smoke test):

Standard matrix (3 viewports)
Critical pages only
Full-page captures only
Skip animations
Duration: 1-3 minutes

Checkpoint Strategy

For large test suites (20+ pages):

Save progress every 5 pages
Store partial results in Memory MCP
Enable resume on failure
Timeout individual captures at 30 seconds
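
The sketch below shows the checkpoint loop. Progress is saved every five pages, and a rerun resumes at the first page after the last checkpoint. It rests on these assumptions:

- A `store` object with async `load(key)` and `save(key, value)` methods stands in for Memory MCP calls. The real tools are `memory_store` and `memory_query`, whose arguments are not specified here.
- The checkpoint namespace `visual-testing/checkpoints/{runId}` is hypothetical.
- The 30-second timeout is applied per page call, a simplification of the per-capture timeout.

```javascript
const CHECKPOINT_EVERY = 5;
const CAPTURE_TIMEOUT_MS = 30000;

// Rejects if `promise` does not settle within `ms`; always clears its timer.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Capture timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Tests pages in order, checkpointing results so a failed run can resume.
async function runWithCheckpoints(pages, testPage, store, runId) {
  const key = `visual-testing/checkpoints/${runId}`;
  const saved = (await store.load(key)) || { nextIndex: 0, results: [] };
  const results = [...saved.results];
  for (let i = saved.nextIndex; i < pages.length; i++) {
    results.push(await withTimeout(testPage(pages[i]), CAPTURE_TIMEOUT_MS));
    const done = i + 1;
    if (done % CHECKPOINT_EVERY === 0 || done === pages.length) {
      await store.save(key, { nextIndex: done, results: [...results] });
    }
  }
  return results;
}
```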

Trade-offs

Decision	Option A	Option B	Guidance
Threshold strictness	Strict (0.01%)	Relaxed (1%)	Strict for design system, relaxed for content-heavy
Viewport coverage	Extended (8+)	Standard (3)	Extended for responsive-focused apps
Capture mode	Full page	Element zoom	Full page default, zoom for component testing
Baseline storage	Local	Memory MCP	Memory MCP for cross-session persistence

Cross-Skill Coordination

Visual Testing works with other skills in the ecosystem:

Upstream Skills (provide input)

Skill	When to Use First	What It Provides
`intent-analyzer`	Always first	Detect visual testing need, extract URLs
`browser-automation`	For complex page states	Navigation + interaction to reach state
`prompt-architect`	For test plan optimization	Structured test specifications

Downstream Skills (use output)

Skill	When to Use After	What It Does
`fix-bug`	On regression detection	Fix visual bugs identified
`documenter`	For test reports	Generate visual test documentation
`deployment`	Before deploy	Gate deployment on visual test pass

Parallel Skills (run alongside)

Skill	When to Run Together	How They Coordinate
`e2e-test`	Same page coverage	Visual captures validate the same pages functional tests cover
`browser-automation`	Page state setup	Automation provides capture-ready state
`code-review-assistant`	CSS changes	Visual test validates review findings

MCP Integration

Required MCPs:

MCP	Purpose	Tools Used
sequential-thinking	Test planning	`sequentialthinking`
claude-in-chrome	Screenshot capture	`navigate`, `resize_window`, `computer` (screenshot, zoom, wait, hover), `gif_creator`, `read_page`, `find`, `tabs_context_mcp`, `tabs_create_mcp`
memory-mcp	Baseline storage	`memory_store`, `vector_search`, `memory_query`

Tool-Specific Usage:

Tool	Purpose in Visual Testing
`tabs_context_mcp`	Get/verify browser context before tests
`tabs_create_mcp`	Create clean tab for test isolation
`resize_window`	Set viewport dimensions
`navigate`	Load target URL
`computer` (screenshot)	Capture full page state
`computer` (zoom)	Capture specific region with magnification
`computer` (wait)	Pause for reflow/animation completion
`gif_creator`	Record interaction sequences
`read_page`	Verify page structure before capture
`find`	Locate elements for region capture

Memory Namespace

Pattern:

skills/tooling/visual-testing/{type}/{project}/{page}/{viewport}

Types:

- `baselines/`: Golden images (approved screenshots)
- `captures/`: Current test captures
- `reports/`: Visual regression reports
- `diffs/`: Generated diff visualizations
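
Page paths themselves contain `/`, so substituting them literally into the pattern would add extra levels to the key. One way to build keys, assuming each segment is percent-encoded with `encodeURIComponent` (the document does not say how page paths are encoded), is sketched below; `memoryKey` and the example project and viewport names are hypothetical.

```javascript
const NAMESPACE_ROOT = "skills/tooling/visual-testing";
const NAMESPACE_TYPES = new Set(["baselines", "captures", "reports", "diffs"]);

// Build {root}/{type}/{project}/{page}/{viewport}. Encoding each segment is an
// assumption so that a page path such as "/pricing" stays a single key level.
function memoryKey(type, project, page, viewport) {
  if (!NAMESPACE_TYPES.has(type)) {
    throw new Error(`unknown namespace type: ${type}`);
  }
  const segments = [project, page, viewport].map((s) => encodeURIComponent(s));
  return [NAMESPACE_ROOT, type, ...segments].join("/");
}

// memoryKey("baselines", "shop", "/pricing", "iphone-14")
// -> "skills/tooling/visual-testing/baselines/shop/%2Fpricing/iphone-14"
```

Because the encoding is reversible with `decodeURIComponent`, the original page path can be recovered when listing keys for trend analysis.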

Store:

Baseline screenshots with approval metadata
Test execution reports
Diff visualizations
Configuration (viewports, thresholds, ignore regions)

Retrieve:

Baseline for comparison by page/viewport key
Historical reports for trend analysis
Previous configs for consistency

Tagging:

```json
{
  "WHO": "visual-testing:1.0.0",
  "WHEN": "ISO8601_timestamp",
  "PROJECT": "{project_name}",
  "WHY": "visual-regression-testing",
  "page": "{page_path}",
  "viewport": "{viewport_name}",
  "threshold_profile": "{profile}",
  "passed": true
}
```

Input/Output Contracts

Skill Input

```yaml
visual_test_request:
  required:
    target_url: string              # Base URL to test
  optional:
    pages: list[string]             # Specific paths (default: ["/"])
    viewport_matrix: list[string]   # Preset names (default: STANDARD_MATRIX)
    capture_mode: string            # "full_page" | "element" | "both" (default: "full_page")
    threshold_profile: string       # "strict" | "default" | "relaxed" (default: "default")
    compare_baseline: boolean       # Whether to compare (default: true)
    update_baseline: boolean        # Whether to update on approval (default: false)
    interaction_sequence: list      # Actions before capture
    ignore_regions: list            # Selectors to ignore
```
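
A request can be validated and filled with the contract's defaults before any capture starts. In this sketch, `normalizeRequest` and `requestDefaults` are hypothetical names, the concrete `STANDARD_MATRIX` preset names are assumed (borrowed from Example 1's three viewports, since the document does not list them), and the empty defaults for `interaction_sequence` and `ignore_regions` are also assumptions because the contract states none.

```javascript
// Assumed preset names for STANDARD_MATRIX (taken from Example 1's viewports).
const STANDARD_MATRIX = ["iphone-14", "ipad-pro-11", "desktop-hd"];
const CAPTURE_MODES = ["full_page", "element", "both"];
const THRESHOLD_NAMES = ["strict", "default", "relaxed"];

// Fresh defaults per call so callers never share (and mutate) the same arrays.
function requestDefaults() {
  return {
    pages: ["/"],
    viewport_matrix: [...STANDARD_MATRIX],
    capture_mode: "full_page",
    threshold_profile: "default",
    compare_baseline: true,
    update_baseline: false,
    interaction_sequence: [], // not stated in the contract; assumed empty
    ignore_regions: [],       // not stated in the contract; assumed empty
  };
}

// Validate a visual_test_request and fill in its optional fields.
function normalizeRequest(request) {
  if (!request || typeof request.target_url !== "string") {
    throw new Error("target_url is required");
  }
  new URL(request.target_url); // throws TypeError for a malformed URL
  const merged = requestDefaults();
  for (const [key, value] of Object.entries(request)) {
    if (value !== undefined) merged[key] = value; // explicit undefined keeps the default
  }
  if (!CAPTURE_MODES.includes(merged.capture_mode)) {
    throw new Error(`unsupported capture_mode: ${merged.capture_mode}`);
  }
  if (!THRESHOLD_NAMES.includes(merged.threshold_profile)) {
    throw new Error(`unsupported threshold_profile: ${merged.threshold_profile}`);
  }
  return merged;
}
```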

Skill Output

```yaml
visual_test_result:
  summary:
    status: "passed" | "failed" | "new_baselines"
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
    execution_time_ms: number
  captures:
    - page: string
      viewport: string
      capture_id: string
      baseline_id: string | null
      comparison:
        passed: boolean
        diff_percent: number
        threshold: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      reason: string
  report_id: string  # Memory MCP reference to full report
```
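
The `summary` and `failures` blocks can be derived mechanically from the `captures` list. This sketch assumes a capture with `baseline_id: null` is a new baseline (no comparison was run) and that any failed comparison makes the overall status `"failed"` even when new baselines were also recorded; that precedence and the `reason` wording are assumptions, and `summarize` is a hypothetical helper.

```javascript
// Derive visual_test_result.summary and failures from per-capture results.
function summarize(captures, executionTimeMs) {
  let passed = 0;
  let failed = 0;
  let newBaselines = 0;
  const failures = [];
  for (const c of captures) {
    if (c.baseline_id === null || !c.comparison) {
      newBaselines += 1; // nothing to compare against yet
      continue;
    }
    if (c.comparison.passed) {
      passed += 1;
      continue;
    }
    failed += 1;
    failures.push({
      page: c.page,
      viewport: c.viewport,
      diff_percent: c.comparison.diff_percent,
      reason: `diff ${c.comparison.diff_percent}% exceeds threshold ${c.comparison.threshold}%`,
    });
  }
  // Assumed precedence: failed > new_baselines > passed.
  const status = failed > 0 ? "failed" : newBaselines > 0 ? "new_baselines" : "passed";
  return {
    summary: {
      status,
      total_captures: captures.length,
      passed,
      failed,
      new_baselines: newBaselines,
      execution_time_ms: executionTimeMs,
    },
    failures,
  };
}
```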

Recursive Improvement Integration

Role in Meta-Loop

Loop	Visual Testing Role
Loop 1	Execute visual tests as part of validation
Loop 1.5	Capture learnings about threshold tuning, false positives
Loop 2	Quality validation of test coverage
Loop 3	Aggregate patterns for threshold optimization

Eval Harness Integration

Visual testing supports evaluation via:

Test pass rate tracking
False positive rate monitoring
Threshold effectiveness metrics
Baseline update frequency
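
These signals can be computed from per-run counts. The record shape (`passed`, `failed`, `falsePositives`, `baselineUpdates`) and the definitions below are assumptions for illustration: pass rate as passed comparisons over all comparisons, false positive rate as user-rejected failures over all failures, and baseline update frequency as updates per run. Threshold effectiveness is left out because the document does not define how it is measured; `evalMetrics` is a hypothetical name.

```javascript
// Aggregate hypothetical per-run records into eval-harness metrics.
// A metric is null when there is no data to compute it from.
function evalMetrics(runs) {
  const totals = runs.reduce(
    (acc, r) => ({
      passed: acc.passed + r.passed,
      failed: acc.failed + r.failed,
      falsePositives: acc.falsePositives + r.falsePositives,
      baselineUpdates: acc.baselineUpdates + r.baselineUpdates,
    }),
    { passed: 0, failed: 0, falsePositives: 0, baselineUpdates: 0 }
  );
  const compared = totals.passed + totals.failed;
  return {
    passRate: compared === 0 ? null : totals.passed / compared,
    falsePositiveRate: totals.failed === 0 ? null : totals.falsePositives / totals.failed,
    baselineUpdatesPerRun: runs.length === 0 ? null : totals.baselineUpdates / runs.length,
  };
}
```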

Learning Signal Sources

Signal	Confidence	Learning
User approves new baseline	HIGH (0.90)	Threshold was appropriate
User rejects false positive	HIGH (0.90)	Threshold too strict for context
User flags missed regression	HIGH (0.90)	Threshold too relaxed
Same page fails repeatedly	MEDIUM (0.75)	Investigate dynamic content issue
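
The table can be read as a tuning rule over the ordered profiles strict, default, relaxed. The sketch below encodes it under two assumptions: the signal identifiers are illustrative, and each signal moves the profile by at most one step. `SIGNAL_RULES` and `suggestProfile` are hypothetical; the confidences and notes come from the table.

```javascript
// Threshold profiles ordered from tightest to loosest.
const PROFILE_ORDER = ["strict", "default", "relaxed"];

// Hypothetical encoding of the learning-signal table.
const SIGNAL_RULES = {
  baseline_approved: { confidence: 0.9, step: 0, note: "threshold was appropriate" },
  false_positive_rejected: { confidence: 0.9, step: 1, note: "threshold too strict for context" },
  missed_regression_flagged: { confidence: 0.9, step: -1, note: "threshold too relaxed" },
  repeated_failure: { confidence: 0.75, step: 0, note: "investigate dynamic content issue" },
};

function suggestProfile(currentProfile, signal) {
  const index = PROFILE_ORDER.indexOf(currentProfile);
  if (index === -1) throw new Error(`unknown profile: ${currentProfile}`);
  if (!Object.prototype.hasOwnProperty.call(SIGNAL_RULES, signal)) {
    throw new Error(`unknown signal: ${signal}`);
  }
  const rule = SIGNAL_RULES[signal];
  // Clamp so "strict" cannot get tighter and "relaxed" cannot get looser.
  const next = Math.min(PROFILE_ORDER.length - 1, Math.max(0, index + rule.step));
  return { profile: PROFILE_ORDER[next], confidence: rule.confidence, note: rule.note };
}
```

A repeated failure leaves the profile unchanged on purpose: per the table it points at dynamic content (often better handled with ignore regions) rather than at the threshold.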

Examples

Example 1: Responsive Layout Validation

Complexity: Medium (3 viewports, 5 pages)

Task: Validate homepage responsive behavior across mobile, tablet, desktop

Planning Output (sequential-thinking):

Thought 1/6: Need to validate responsive breakpoints for homepage
Thought 2/6: Viewports: iPhone 14 (390px), iPad Pro 11 (834px), Desktop HD (1920px)
Thought 3/6: Capture sections: hero, features, pricing, footer
Thought 4/6: Use default threshold (0.1%) for static content
Thought 5/6: Check baseline existence, compare if present
Thought 6/6: Generate report with pass/fail per viewport

Execution:

```javascript
// 1. Create test tab
await tabs_create_mcp() // -> tabId: 123

// 2. Navigate to homepage
await navigate({ url: "https://example.com/", tabId: 123 })

// 3. Mobile viewport (iPhone 14)
await resize_window({ width: 390, height: 844, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_mobile.png

// 4. Tablet viewport (iPad Pro 11)
await resize_window({ width: 834, height: 1194, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_tablet.png

// 5. Desktop viewport (Desktop HD)
await resize_window({ width: 1920, height: 1080, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_desktop.png

// 6. Compare each against baseline from Memory MCP
// 7. Generate report
```

Result: 3/3 viewports passed, no regressions detected

Execution Time: 45 seconds
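
Step 6 of the execution above is only a comment. A minimal sketch of a pixel-level diff, assuming both screenshots are already decoded into RGBA byte arrays of identical dimensions (PNG decoding is out of scope): `pixelDiffPercent`, `channelTolerance`, and rectangle-shaped `ignoreRegions` are hypothetical. The skill's own `ignore_regions` are selectors, which would first have to be resolved to rectangles. A pixel counts as different when any channel differs by more than the tolerance, and the result is the percentage of compared pixels that differ.

```javascript
// Percentage of non-ignored pixels whose RGBA values differ beyond a tolerance.
function pixelDiffPercent(baseline, capture, width, height, options = {}) {
  const { channelTolerance = 0, ignoreRegions = [] } = options;
  if (baseline.length !== width * height * 4 || capture.length !== baseline.length) {
    throw new Error("images must be RGBA buffers of the same dimensions");
  }
  const isIgnored = (x, y) =>
    ignoreRegions.some((r) => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height);
  let compared = 0;
  let differing = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (isIgnored(x, y)) continue;
      compared += 1;
      const i = (y * width + x) * 4; // offset of this pixel's R byte
      for (let c = 0; c < 4; c++) {
        if (Math.abs(baseline[i + c] - capture[i + c]) > channelTolerance) {
          differing += 1;
          break; // one differing channel is enough
        }
      }
    }
  }
  return compared === 0 ? 0 : (differing / compared) * 100;
}
```

In this example the percentage would be judged against the default 0.1% threshold: for a 1920x1080 capture (2,073,600 pixels), 2,074 or more differing pixels would fail.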

Example 2: Component State Testing (Buttons)

Complexity: Medium (4 states per button, zoom captures)

Task: Validate primary button visual states (default, hover, active, disabled)

Planning Output:

Thought 1/8: Testing primary button component visual states
Thought 2/8: States to capture: default, hover, active, disabled
Thought 3/8: Use zoom tool for detailed button capture
Thought 4/8: Strict threshold (0.01%) for design system component
Thought 5/8: Capture default state first
Thought 6/8: Use hover action for hover state
Thought 7/8: Use mouse down for active state
Thought 8/8: Navigate to disabled example for disabled state

Execution:

```javascript
// 1. Navigate to component library
await navigate({ url: "https://storybook.example.com/button", tabId: 123 })

// 2. Find button element
const button = await find({ query: "primary button", tabId: 123 })

// 3. Zoom capture default state
await computer({ action: "zoom", region: [button.x, button.y, button.x + 200, button.y + 50], tabId: 123 })

// 4. Hover state capture
await computer({ action: "hover", coordinate: [button.x + 100, button.y + 25], tabId: 123 })
await computer({ action: "zoom", region: [button.x, button.y, button.x + 200, button.y + 50], tabId: 123 })

// ... continue for active, disabled states
```

Result: 4/4 states passed strict threshold

Execution Time: 30 seconds
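
The zoom calls above use fixed offsets (`+ 200`, `+ 50`), which fit only one button size and go stale after a resize (the "coordinates shifted" issue under Troubleshooting). A sketch of deriving the region from the element's bounding box instead, assuming a box `{ x, y, width, height }` in viewport pixels is available for the element: the exact shape returned by `find` is not documented here, so `box` is hypothetical, as are `zoomRegion`, `centerOf`, and the 8 px padding.

```javascript
// Build a [x0, y0, x1, y1] zoom region around an element, clamped to the viewport.
// Re-read the box after every resize_window call, since layout can shift.
function zoomRegion(box, viewport, padding = 8) {
  const x0 = Math.max(0, Math.floor(box.x - padding));
  const y0 = Math.max(0, Math.floor(box.y - padding));
  const x1 = Math.min(viewport.width, Math.ceil(box.x + box.width + padding));
  const y1 = Math.min(viewport.height, Math.ceil(box.y + box.height + padding));
  if (x1 <= x0 || y1 <= y0) throw new Error("element is outside the viewport");
  return [x0, y0, x1, y1];
}

// Center point for hover / mouse-down actions on the same element.
function centerOf(box) {
  return [Math.round(box.x + box.width / 2), Math.round(box.y + box.height / 2)];
}
```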

Example 3: Animation Recording (Loading Sequence)

Complexity: High (GIF recording, temporal comparison)

Task: Capture and validate skeleton-to-content loading animation

Planning Output:

Thought 1/6: Need to capture loading animation as GIF
Thought 2/6: Trigger reload to capture full sequence
Thought 3/6: Start GIF recording before reload
Thought 4/6: Wait for content load completion
Thought 5/6: Stop recording and export GIF
Thought 6/6: Use animation threshold (5%) for comparison

Execution:

```javascript
// 1. Start GIF recording
await gif_creator({ action: "start_recording", tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Initial frame

// 2. Trigger reload
await navigate({ url: "https://example.com/dashboard", tabId: 123 })

// 3. Wait for load sequence
await computer({ action: "wait", duration: 3, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Final frame

// 4. Stop recording and export
await gif_creator({ action: "stop_recording", tabId: 123 })
await gif_creator({ action: "export", download: true, filename: "loading-animation.gif", tabId: 123 })
```

Result: Animation captured successfully, 2.3% diff from baseline (within 5% animation threshold)

Execution Time: 15 seconds
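
This example reports a single 2.3% figure without saying how per-frame differences are combined. One hedged way to do the temporal comparison: pair each captured frame with the baseline frame at the same relative time (frame counts can differ between recordings), diff each pair, and judge on the worst frame while also reporting the mean. `compareAnimation` and the worst-frame rule are assumptions, and `frameDiff` is any per-frame function returning a percentage.

```javascript
// Compare two frame sequences by relative time; pass if the worst frame is within threshold.
function compareAnimation(baselineFrames, capturedFrames, frameDiff, thresholdPercent = 5) {
  if (baselineFrames.length === 0 || capturedFrames.length === 0) {
    throw new Error("both sequences need at least one frame");
  }
  const diffs = capturedFrames.map((frame, i) => {
    // Relative position of this frame in [0, 1], mapped onto the baseline sequence.
    const t = capturedFrames.length === 1 ? 0 : i / (capturedFrames.length - 1);
    const j = Math.round(t * (baselineFrames.length - 1));
    return frameDiff(baselineFrames[j], frame);
  });
  const worst = Math.max(...diffs);
  const mean = diffs.reduce((sum, d) => sum + d, 0) / diffs.length;
  return { passed: worst <= thresholdPercent, worst_frame_diff: worst, mean_frame_diff: mean, frame_diffs: diffs };
}
```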

Troubleshooting

Common Issues and Solutions

Issue	Cause	Solution
Screenshots are blank/black	Page not fully loaded	Add wait after navigation, check for lazy loading
Diff always fails	Threshold too strict	Increase threshold or configure ignore regions
Viewport resize not working	Tab permission issue	Create new tab with tabs_create_mcp
GIF not recording	Recording not started	Call gif_creator start_recording before actions
Baseline not found	Wrong namespace key	Verify page/viewport in Memory MCP query
Zoom captures wrong region	Coordinates shifted	Recalculate region after viewport resize
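
For blank or black screenshots, the examples above add a fixed wait (0.5 s or 3 s). A hedged alternative is to poll until the page looks settled: take screenshots until two consecutive ones match and the latest is not blank. In the sketch below, `takeScreenshot`, `sleep`, `isSame`, and `isBlank` are injected placeholders rather than real tool calls (in practice they might wrap the `computer` screenshot and wait actions and a pixel comparison); `waitForStableScreenshot` and its defaults (500 ms interval, 10 attempts) are hypothetical.

```javascript
// Poll screenshots until two consecutive frames match and the frame is not blank.
async function waitForStableScreenshot(takeScreenshot, isSame, sleep, options = {}) {
  const { intervalMs = 500, maxAttempts = 10, isBlank = () => false } = options;
  let previous = await takeScreenshot();
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    await sleep(intervalMs);
    const current = await takeScreenshot();
    // Settled: nothing changed since the last frame and the frame has content.
    if (isSame(previous, current) && !isBlank(current)) return current;
    previous = current;
  }
  throw new Error(`page did not settle after ${maxAttempts} screenshots`);
}
```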

Debug Mode

Enable verbose output for troubleshooting:

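```javascript
const DEBUG_MODE = true;

// Log the state of one capture when debugging; call it once per page/viewport.
function logCaptureDebug({ viewport, url, baselineExists, diffResult }) {
  if (!DEBUG_MODE) return;
  console.log("Viewport:", viewport);
  console.log("Page URL:", url);
  console.log("Capture timestamp:", new Date().toISOString());
  console.log("Baseline exists:", baselineExists);
  console.log("Diff result:", diffResult);
}
```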

Conclusion

Visual Testing provides systematic screenshot-based regression detection that complements functional testing. By comparing actual rendered output against approved baselines across multiple viewports, this skill catches UI regressions that unit and integration tests miss.

The key differentiators are:

Baseline management: Versioned golden images with explicit approval workflow
Multi-viewport coverage: Responsive testing across mobile, tablet, and desktop
Threshold-based comparison: Configurable tolerance to balance sensitivity and false positives
Zoom capabilities: Element-level precision for design system validation
GIF recording: Temporal capture for animation and interaction testing

When integrated with the CI/CD pipeline, visual testing serves as a deployment gate that prevents visual regressions from reaching production. Combined with Memory MCP for persistent baselines, the system maintains consistent quality across releases.
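
A deployment gate reduces to a decision over the output contract's `summary`. The sketch below is hypothetical: `deployGate` and its `approvedNewBaselines` option are illustrative, and blocking on unapproved new baselines is an assumed policy that follows the "require explicit approval" guidance from the anti-patterns table.

```javascript
// Decide whether a deploy may proceed, given a visual_test_result object.
function deployGate(result, { approvedNewBaselines = false } = {}) {
  const { status, failed, new_baselines: newBaselines } = result.summary;
  if (status === "failed" || failed > 0) {
    return { allow: false, reason: `${failed} visual regression(s) detected` };
  }
  if (newBaselines > 0 && !approvedNewBaselines) {
    return { allow: false, reason: `${newBaselines} new baseline(s) awaiting approval` };
  }
  return { allow: true, reason: "visual tests passed" };
}

// Possible CI usage (Node): a non-zero exit code stops the pipeline.
// const verdict = deployGate(JSON.parse(require("fs").readFileSync("result.json", "utf8")));
// if (!verdict.allow) { console.error(verdict.reason); process.exitCode = 1; }
```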

Success Criteria

Quality Thresholds:

All configured viewports captured successfully
Baseline comparison completed for all captures (or flagged as new)
Report generated with pass/fail status per page/viewport
No orphaned tabs after test completion
Execution time within 2x estimated duration

Failure Indicators:

Screenshot capture fails (blank/timeout)
Comparison fails with system error (not threshold failure)
Memory MCP unavailable for baseline storage
Tab context lost during multi-viewport capture
