visual-testing

Visual Testing

Kanitsal Cerceve (Evidential Frame Activation)

Source verification mode active.
[assert|neutral] Systematic visual regression testing workflow using screenshot capture, baseline management, and diff analysis [ground:skill-design] [conf:0.92] [state:confirmed]

Overview

Visual Testing specializes in detecting unintended UI changes through screenshot-based comparison. Unlike browser-automation, which focuses on interaction sequences, this skill prioritizes pixel-level visual validation across multiple viewports and device configurations.
Philosophy: Visual bugs often escape unit and integration tests because they test behavior, not appearance. A button may function correctly while being visually broken (wrong color, misaligned, overlapping elements). Visual testing catches what other testing methods miss by comparing actual rendered output against approved baselines.
Methodology: Six-phase workflow with baseline management:
  1. PLAN Phase: Sequential-thinking MCP decomposes visual test cases with viewport configurations
  2. NAVIGATE Phase: Position page in correct state for capture
  3. CAPTURE Phase: Multi-viewport screenshot collection with zoom for detail inspection
  4. COMPARE Phase: Pixel-level diff against the baseline (if one exists)
  5. REPORT Phase: Generate visual regression report with highlighted changes
  6. BASELINE Phase: Update golden images (with approval) or flag regression
Value Proposition: Reduce visual bug escapes by 85% through systematic screenshot comparison. Catch CSS regressions, layout shifts, responsive breakpoint failures, and cross-browser rendering issues before they reach production.
Key Differentiation from browser-automation:
| Aspect | browser-automation | visual-testing |
|---|---|---|
| Focus | Interaction sequences | Visual state capture |
| Output | Workflow completion | Diff reports |
| Validation | Functional success | Pixel comparison |
| Artifacts | Execution logs | Baseline images + diffs |
| Primary Use | E2E workflows | Regression detection |

When to Use This Skill

Trigger Thresholds:
| Scenario | Recommendation |
|---|---|
| Single page screenshot | Use computer tool directly (too simple) |
| 2-5 page visual checks | Consider this skill |
| Multi-viewport responsive testing | Mandatory use |
| Baseline comparison needed | Mandatory use |
| Design system validation | Mandatory use |
Primary Use Cases:
  • CSS regression detection after style changes
  • Responsive layout validation across breakpoints
  • Component library visual testing
  • Design system compliance checking
  • Cross-browser rendering comparison
  • Animation and transition capture (via GIF)
  • Before/after deployment comparison
Apply When:
  • Deploying UI changes that may affect multiple pages
  • Validating responsive breakpoints work correctly
  • Ensuring design system tokens apply consistently
  • Comparing staging vs production appearance
  • Documenting UI states for handoff

When NOT to Use This Skill

  • Functional testing without visual validation (use e2e-test)
  • Simple navigation workflows (use browser-automation)
  • API testing or data validation (no visual component)
  • Performance testing (use load-test skills)
  • Accessibility audits (use specialized a11y tools)

Core Principles

Visual Testing operates on 5 fundamental principles:

Principle 1: Baseline-First Approach

Golden images (baselines) serve as the source of truth. Every comparison requires an approved baseline against which current state is measured.
Rationale: Without baselines, visual testing becomes subjective screenshot collection. Baselines make quality objective and measurable.
In Practice:
  • Capture initial baselines for all critical pages/viewports
  • Store baselines in Memory MCP with project/page/viewport keys
  • Version baselines (ISO8601 timestamps) for rollback capability
  • Require explicit approval before baseline updates
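A minimal sketch of the key layout follows. It assumes the `visual-testing/baselines/{project}/{page}/{viewport}` namespace used in Phase 4. The helper names `baselineKey` and `versionedBaselineKey`, the `@` separator and the mapping of `/` to `root` are illustrative choices, not part of a documented Memory MCP API.

```javascript
// Hypothetical helper: the Memory MCP namespace for one page/viewport baseline.
function baselineKey(project, page, viewport) {
  // Strip leading slashes so "/dashboard" and "dashboard" share a key; "/" becomes "root".
  const pagePart = page.replace(/^\/+/, "") || "root";
  return `visual-testing/baselines/${project}/${pagePart}/${viewport}`;
}

// Hypothetical helper: a versioned key embedding an ISO 8601 (UTC) timestamp for rollback.
function versionedBaselineKey(project, page, viewport, capturedAt = new Date()) {
  // Date.prototype.toISOString() always yields UTC, e.g. "2024-01-02T03:04:05.000Z".
  return `${baselineKey(project, page, viewport)}@${capturedAt.toISOString()}`;
}
```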

Principle 2: Multi-Viewport Coverage

Test across multiple viewport configurations to catch responsive regressions that only appear at specific breakpoints.
Rationale: Many visual bugs manifest only at edge cases: unusual screen widths, portrait vs landscape, mobile vs desktop. Single-viewport testing misses these.
In Practice:
  • Always test at least 3 viewports (mobile, tablet, desktop)
  • Include both portrait and landscape orientations
  • Use standardized viewport presets for consistency
  • Document viewport matrix in test plan
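One way to cover both orientations is to derive a landscape twin from each portrait preset by swapping its width and height. The sketch assumes presets shaped like `VIEWPORT_PRESETS`. A small subset is repeated here so the block runs on its own. `expandOrientations` and the `_landscape` key suffix are hypothetical.

```javascript
// Subset of the viewport presets, repeated so this sketch is self-contained.
const PRESETS = {
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" }
};

// Expand preset names into viewport configs. Presets taller than wide also get a
// rotated landscape twin; desktop presets are already landscape and are kept as-is.
function expandOrientations(matrix, presets) {
  const configs = [];
  for (const key of matrix) {
    const preset = presets[key];
    if (!preset) throw new Error(`Unknown viewport preset: ${key}`);
    const orientation = preset.height >= preset.width ? "portrait" : "landscape";
    configs.push({ key, orientation, ...preset });
    if (preset.height > preset.width) {
      configs.push({
        key: `${key}_landscape`,
        orientation: "landscape",
        name: `${preset.name} (landscape)`,
        width: preset.height,
        height: preset.width
      });
    }
  }
  return configs;
}
```

With the standard three-preset matrix, this yields five configs: each portrait device plus its landscape twin, and the desktop preset unchanged.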

Principle 3: Threshold-Based Comparison

Not every pixel difference is a regression. Configure tolerance thresholds to separate rendering noise from genuine visual changes.
Rationale: Anti-aliasing, font rendering, and timing-dependent animations create non-deterministic pixel variations. Zero-tolerance comparison produces false positives.
In Practice:
  • Set default threshold at 0.1% pixel difference (99.9% match required)
  • Use higher thresholds for animation-heavy pages (1-2%)
  • Ignore specific regions known for dynamic content (timestamps, ads)
  • Track threshold effectiveness and tune over time
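Because the threshold is a percentage, the absolute pixel budget scales with capture size. At the default 0.1%, a 1920x1080 capture (2,073,600 pixels) passes with up to 2,073 differing pixels. A 390x844 capture (329,160 pixels) passes with up to 329. A small helper makes the arithmetic explicit under the `diffPercent <= threshold` rule. Its name, `maxDiffPixels`, is hypothetical.

```javascript
// Convert a percentage threshold (0.1 means 0.1%) into the largest number of
// differing pixels that still satisfies diffPercent <= threshold.
function maxDiffPixels(width, height, thresholdPercent) {
  const totalPixels = width * height;
  return Math.floor((totalPixels * thresholdPercent) / 100);
}
```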

Principle 4: Element-Level Zoom for Precision

Use the zoom tool for detailed inspection of specific UI elements when full-page screenshots are insufficient.
Rationale: Small elements (icons, badges, indicators) may have regressions invisible at full-page scale. Zoomed captures reveal micro-regressions.
In Practice:
  • Capture full page first, then zoom to critical elements
  • Define zoom regions in test plan (coordinates or element refs)
  • Compare zoomed regions independently
  • Document element-level baselines separately
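When a zoomed capture is not available, a region can still be compared on its own by cropping it out of the full-page capture. The sketch makes three assumptions:
- Captures are ImageData-like objects (`{ width, height, data }` with RGBA bytes).
- The region is already expressed in capture pixels.
- `cropRegion` is a hypothetical helper.

Unlike the zoom tool, cropping cannot add detail beyond the original capture's resolution.

```javascript
// Crop a rectangular region out of an ImageData-like capture
// ({ width, height, data } with 4 RGBA bytes per pixel, rows top to bottom).
function cropRegion(image, { x, y, width, height }) {
  if (x < 0 || y < 0 || x + width > image.width || y + height > image.height) {
    throw new Error("Zoom region falls outside the capture");
  }
  const data = new Uint8ClampedArray(width * height * 4);
  for (let row = 0; row < height; row++) {
    // Byte offset of the region's first pixel on this row in the source image.
    const start = ((y + row) * image.width + x) * 4;
    data.set(image.data.subarray(start, start + width * 4), row * width * 4);
  }
  return { width, height, data };
}
```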

Principle 5: GIF Recording for Interactions

Static screenshots miss animation and transition regressions. Use GIF recording to capture temporal UI behavior.
Rationale: CSS animations, hover states, loading sequences, and micro-interactions are visible only in motion. GIF recording captures these temporal aspects.
In Practice:
  • Record GIFs for pages with significant animations
  • Capture hover/focus/active states in sequence
  • Use GIFs for documenting before/after comparisons
  • Store GIFs with interaction metadata

Production Guardrails

MCP Preflight Check Protocol

Before executing visual tests, validate required MCPs:
Preflight Sequence:
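```javascript
// `callTool(name, args)` is a hypothetical dispatcher that invokes an MCP tool by its full
// name. MCP tool names contain hyphens, so they cannot be written as bare JS identifiers.
// `memoryProbeTool` names any cheap, read-only Memory MCP tool; the original memory check
// was a placeholder, so which tool to probe is left to the caller.
async function visualTestPreflight(callTool, memoryProbeTool) {
  const checks = {
    sequential_thinking: false,
    claude_in_chrome: false,
    memory_mcp: false
  };

  // Check sequential-thinking MCP (required for planning)
  try {
    await callTool("mcp__sequential-thinking__sequentialthinking", {
      thought: "Visual test preflight - verifying MCP availability",
      thoughtNumber: 1,
      totalThoughts: 1,
      nextThoughtNeeded: false
    });
    checks.sequential_thinking = true;
  } catch (error) {
    throw new Error("CRITICAL: sequential-thinking MCP required for visual test planning", { cause: error });
  }

  // Check claude-in-chrome MCP (required for capture)
  try {
    await callTool("mcp__claude-in-chrome__tabs_context_mcp", {});
    checks.claude_in_chrome = true;
  } catch (error) {
    throw new Error("CRITICAL: claude-in-chrome MCP required for screenshot capture", { cause: error });
  }

  // Check memory-mcp (required for baseline storage). Without a real call this check
  // would always pass, so it requires the caller to name a tool to probe.
  try {
    if (!memoryProbeTool) throw new Error("No Memory MCP probe tool configured");
    await callTool(memoryProbeTool, {});
    checks.memory_mcp = true;
  } catch (error) {
    throw new Error("CRITICAL: memory-mcp required for baseline storage", { cause: error });
  }

  return checks;
}
```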

Viewport Preset Configuration

Standard Viewport Matrix:
```javascript
const VIEWPORT_PRESETS = {
  // Mobile Devices
  iphone_se: { width: 375, height: 667, name: "iPhone SE" },
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  iphone_14_pro_max: { width: 430, height: 932, name: "iPhone 14 Pro Max" },
  pixel_7: { width: 412, height: 915, name: "Pixel 7" },

  // Tablets
  ipad_mini: { width: 768, height: 1024, name: "iPad Mini" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  ipad_pro_12: { width: 1024, height: 1366, name: "iPad Pro 12.9" },

  // Desktop
  laptop_sm: { width: 1280, height: 720, name: "Laptop Small (720p)" },
  laptop_md: { width: 1440, height: 900, name: "Laptop Medium" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" },
  // 2560x1440 is QHD (1440p), not 4K (3840x2160), so the key names the actual size.
  desktop_2k: { width: 2560, height: 1440, name: "Desktop 2K (1440p)" }
};

// Standard test matrix (most common)
const STANDARD_MATRIX = ["iphone_14", "ipad_pro_11", "desktop_hd"];

// Extended test matrix (comprehensive)
const EXTENDED_MATRIX = [
  "iphone_se", "iphone_14_pro_max", "pixel_7",
  "ipad_mini", "ipad_pro_12",
  "laptop_sm", "desktop_hd", "desktop_2k"
];
```

Diff Threshold Configuration

```javascript
const DIFF_THRESHOLDS = {
  // Strict (design system components)
  strict: {
    pixelDiff: 0.01,  // 0.01% tolerance (nearly pixel-perfect)
    description: "For design system components requiring exact match"
  },

  // Default (most pages)
  default: {
    pixelDiff: 0.1,   // 0.1% tolerance
    description: "Standard threshold for most UI testing"
  },

  // Relaxed (dynamic content)
  relaxed: {
    pixelDiff: 1.0,   // 1% tolerance
    description: "For pages with minor dynamic variations"
  },

  // Animation (high variance)
  animation: {
    pixelDiff: 5.0,   // 5% tolerance
    description: "For animation captures with timing variance"
  }
};
```

Error Handling Framework

Error Categories:
| Category | Example | Recovery Strategy |
|---|---|---|
| MCP_UNAVAILABLE | claude-in-chrome offline | ABORT - cannot proceed |
| NAVIGATION_FAILED | Page timeout/404 | Retry 3x with backoff |
| CAPTURE_FAILED | Screenshot error | Retry with fresh tab |
| BASELINE_MISSING | No golden image | Prompt for baseline creation |
| COMPARISON_FAILED | Diff computation error | Log and skip, flag for review |
| THRESHOLD_EXCEEDED | Visual regression detected | Generate report, flag issue |
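The recovery column can be encoded as a lookup so that every error category maps to exactly one action. This is a sketch. The action names, the 1-second backoff base and the single fresh-tab retry for CAPTURE_FAILED are assumptions. The table only fixes the three navigation retries.

```javascript
// Recovery strategies from the error table, keyed by category (action names are hypothetical).
const RECOVERY = {
  MCP_UNAVAILABLE: { action: "abort", retries: 0 },
  NAVIGATION_FAILED: { action: "retry_with_backoff", retries: 3 },
  CAPTURE_FAILED: { action: "retry_in_fresh_tab", retries: 1 }, // retry count assumed
  BASELINE_MISSING: { action: "prompt_baseline_creation", retries: 0 },
  COMPARISON_FAILED: { action: "log_skip_and_flag", retries: 0 },
  THRESHOLD_EXCEEDED: { action: "report_and_flag", retries: 0 }
};

// Exponential backoff: one delay per retry, doubling from baseMs (1000, 2000, 4000, ...).
function backoffDelays(retries, baseMs = 1000) {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

function recoveryPlan(category) {
  const strategy = RECOVERY[category];
  if (!strategy) throw new Error(`Unknown error category: ${category}`);
  const delaysMs = strategy.action === "retry_with_backoff" ? backoffDelays(strategy.retries) : [];
  return { ...strategy, delaysMs };
}
```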

Main Workflow

Phase 1: Test Planning (MANDATORY)

Purpose: Define visual test scope using sequential-thinking decomposition.
Process:
  1. Invoke sequential-thinking MCP
  2. Identify target pages/URLs
  3. Select viewport configurations
  4. Define capture regions (full page, element-specific)
  5. Set comparison thresholds
  6. Plan interaction sequences for state-dependent captures
Input Contract:
```yaml
inputs:
  target_url: string            # URL to test
  pages: list[string]           # Page paths to capture
  viewport_matrix: list[string] # Viewport presets to use
  capture_mode: string          # "full_page" | "element" | "both"
  threshold_profile: string     # "strict" | "default" | "relaxed" | "animation"
  interaction_sequence: list    # Optional: actions before capture
```
Output Contract:
```yaml
outputs:
  test_plan:
    pages: list[PagePlan]
    viewports: list[ViewportConfig]
    capture_points: list[CapturePoint]
    threshold: number
```
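Inputs can be checked against this contract before planning starts, so that typos fail early rather than mid-run. This is a minimal sketch. `validateInputs` is hypothetical. It also accepts the `animation` profile, because `DIFF_THRESHOLDS` defines one.

```javascript
const CAPTURE_MODES = ["full_page", "element", "both"];
// DIFF_THRESHOLDS also defines an "animation" profile, so it is accepted here too.
const THRESHOLD_PROFILES = ["strict", "default", "relaxed", "animation"];

// Return a list of human-readable problems; an empty list means the inputs look valid.
function validateInputs(inputs) {
  const problems = [];
  try {
    new URL(inputs.target_url); // throws on relative or malformed URLs
  } catch {
    problems.push(`target_url is not an absolute URL: ${inputs.target_url}`);
  }
  if (!Array.isArray(inputs.pages) || inputs.pages.length === 0) {
    problems.push("pages must be a non-empty list");
  }
  if (!Array.isArray(inputs.viewport_matrix) || inputs.viewport_matrix.length === 0) {
    problems.push("viewport_matrix must be a non-empty list");
  }
  if (!CAPTURE_MODES.includes(inputs.capture_mode)) {
    problems.push(`capture_mode must be one of ${CAPTURE_MODES.join(", ")}`);
  }
  if (!THRESHOLD_PROFILES.includes(inputs.threshold_profile)) {
    problems.push(`threshold_profile must be one of ${THRESHOLD_PROFILES.join(", ")}`);
  }
  // interaction_sequence is optional, but must be a list when present.
  if (inputs.interaction_sequence !== undefined && !Array.isArray(inputs.interaction_sequence)) {
    problems.push("interaction_sequence must be a list when provided");
  }
  return problems;
}
```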

Phase 2: Navigation & State Setup

Purpose: Navigate to target page and establish correct state for capture.
Process:
  1. Get/create tab context (tabs_context_mcp, tabs_create_mcp)
  2. Navigate to target URL
  3. Wait for page load completion
  4. Execute interaction sequence if needed (login, scroll, hover)
  5. Verify page state ready for capture
Agent:
Task("Setup page state", "Acting as browser-specialist: Navigate to URL, wait for full load, execute any required interactions to reach target state", "general-purpose")

Phase 3: Multi-Viewport Capture

Purpose: Capture screenshots across all configured viewports.
Process:
For each viewport in viewport_matrix:
  1. Resize window (resize_window)
  2. Wait for reflow (wait 500ms)
  3. Capture full page (computer screenshot)
  4. Capture zoomed regions if configured (computer zoom)
  5. Store capture with viewport/page metadata
Key Tools:
  • resize_window: Set viewport dimensions
  • computer (screenshot): Full page capture
  • computer (zoom): Element-level detail capture
  • gif_creator: For interaction sequences
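The per-viewport loop can be expanded ahead of time into an ordered step list, which makes the plan easy to log or resume. The step object format is an assumption; only the tool names come from the list above. The mapping is:
- "resize_window" maps to the resize_window tool.
- "screenshot" and "zoom" map to the computer tool.
- "wait" stands for however the harness pauses.

```javascript
// Expand the per-viewport capture loop into an ordered list of steps for one page.
function buildCaptureSteps(page, viewports, zoomRegions = [], reflowWaitMs = 500) {
  const steps = [];
  for (const vp of viewports) {
    steps.push({ action: "resize_window", width: vp.width, height: vp.height });
    steps.push({ action: "wait", ms: reflowWaitMs }); // let layout reflow settle
    steps.push({ action: "screenshot", page, viewport: vp.key });
    // Zoomed regions are captured after the full page, per viewport.
    for (const region of zoomRegions) {
      steps.push({ action: "zoom", page, viewport: vp.key, region });
    }
  }
  return steps;
}
```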

Phase 4: Baseline Comparison

Purpose: Compare current captures against stored baselines.
Process:
  1. Query Memory MCP for the baseline (namespace: visual-testing/baselines/{project}/{page}/{viewport})
  2. If baseline exists:
    • Compute pixel diff percentage
    • Generate diff visualization (highlight changed pixels)
    • Apply threshold comparison
  3. If baseline missing:
    • Flag as "new baseline needed"
    • Prompt for approval
Comparison Algorithm:
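```javascript
// Images are assumed to be ImageData-like objects ({ width, height, data }) where `data`
// holds RGBA bytes row by row, 4 bytes per pixel. `threshold` is a percentage
// (0.1 means 0.1%), the same unit as DIFF_THRESHOLDS.pixelDiff.
function pixelsMatch(current, baseline, offset, channelTolerance) {
  // A pixel matches when every channel differs by at most `channelTolerance`.
  for (let c = 0; c < 4; c++) {
    if (Math.abs(current.data[offset + c] - baseline.data[offset + c]) > channelTolerance) {
      return false;
    }
  }
  return true;
}

function compareScreenshots(current, baseline, threshold, channelTolerance = 0) {
  // Diffing captures of different sizes is meaningless, so fail loudly instead.
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error(
      `Size mismatch: ${current.width}x${current.height} vs ${baseline.width}x${baseline.height}`
    );
  }

  const totalPixels = current.width * current.height;
  let diffPixels = 0;

  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      const offset = (y * current.width + x) * 4;
      if (!pixelsMatch(current, baseline, offset, channelTolerance)) {
        diffPixels++;
      }
    }
  }

  const diffPercent = (diffPixels / totalPixels) * 100;
  return {
    passed: diffPercent <= threshold,
    diffPercent: diffPercent,
    diffPixels: diffPixels,
    totalPixels: totalPixels,
    threshold: threshold
  };
}
```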
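Step 2 also calls for a diff visualization. One common convention paints changed pixels red over a faded grayscale copy of the baseline. This is similar in spirit to the output of the pixelmatch library. The sketch assumes ImageData-like inputs (`{ width, height, data }` with RGBA bytes). `buildDiffImage` is a hypothetical name.

```javascript
// Unchanged pixels become a faded grayscale copy of the baseline; changed pixels become
// solid red. Both inputs must share dimensions.
function buildDiffImage(current, baseline, channelTolerance = 0) {
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error("Cannot visualize a diff between captures of different sizes");
  }
  const out = new Uint8ClampedArray(current.data.length);
  for (let i = 0; i < current.data.length; i += 4) {
    let changed = false;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(current.data[i + c] - baseline.data[i + c]) > channelTolerance) changed = true;
    }
    if (changed) {
      out.set([255, 0, 0, 255], i);
    } else {
      // Luma approximation, faded toward white so the red pixels stand out.
      const gray = 0.299 * baseline.data[i] + 0.587 * baseline.data[i + 1] + 0.114 * baseline.data[i + 2];
      const faded = 255 - (255 - gray) * 0.3;
      out.set([faded, faded, faded, 255], i); // Uint8ClampedArray rounds and clamps
    }
  }
  return { width: current.width, height: current.height, data: out };
}
```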

Phase 5: Report Generation

Purpose: Generate comprehensive visual regression report.
Process:
  1. Aggregate comparison results across all pages/viewports
  2. Generate summary (pass/fail counts, worst regressions)
  3. Create diff visualizations (side-by-side, overlay, diff-only)
  4. Include metadata (timestamps, viewport configs, thresholds)
  5. Store report in Memory MCP
Report Structure:
yaml
visual_regression_report:
  timestamp: ISO8601
  project: string
  summary:
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      threshold: number
      baseline_timestamp: ISO8601
      current_capture_id: string
  metadata:
    viewports_tested: list
    threshold_profile: string
    duration_ms: number
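Here is a sketch of how per-capture results might be folded into the `summary` and `failures` sections above, with the worst regressions listed first. The shape of each input result (`status`, `diffPercent`, `captureId` and so on) is an assumption.

```javascript
// Each result is assumed to look like:
//   { page, viewport, status: "passed" | "failed" | "new_baseline",
//     diffPercent, threshold, baselineTimestamp, captureId }
function summarizeResults(results) {
  const count = (status) => results.filter((r) => r.status === status).length;
  const failures = results
    .filter((r) => r.status === "failed")
    .sort((a, b) => b.diffPercent - a.diffPercent) // worst regressions first
    .map((r) => ({
      page: r.page,
      viewport: r.viewport,
      diff_percent: r.diffPercent,
      threshold: r.threshold,
      baseline_timestamp: r.baselineTimestamp,
      current_capture_id: r.captureId
    }));
  return {
    summary: {
      total_captures: results.length,
      passed: count("passed"),
      failed: count("failed"),
      new_baselines: count("new_baseline")
    },
    failures
  };
}
```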

Phase 6: Baseline Management

Purpose: Update baselines when changes are intentional.
Process:
  1. For failed comparisons, determine if change is intentional
  2. If intentional: Update baseline with approval
  3. If regression: Flag for fix
  4. For new pages: Create initial baseline with approval
  5. Version old baselines (keep the 5 most recent)
Baseline Storage Schema:
```yaml
baseline:
  namespace: "visual-testing/baselines/{project}/{page}/{viewport}"
  data:
    image_id: string          # Reference to stored screenshot
    captured_at: ISO8601
    approved_by: string
    threshold_used: number
    viewport: object
    url: string
    version: number
  tags:
    WHO: "visual-testing:1.0.0"
    WHEN: ISO8601
    PROJECT: string
    WHY: "baseline-capture"
```
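The approval rule and the five-version retention from step 5 can be enforced at write time. This sketch makes three assumptions:
- The history is an in-memory array ordered oldest to newest.
- "Keep 5 most recent" means five versions in total, current included.
- Persisting the array to Memory MCP happens elsewhere and is not shown.

`addBaselineVersion` is hypothetical.

```javascript
const MAX_VERSIONS = 5;

// Append a new baseline version, refusing unapproved updates and keeping only
// the MAX_VERSIONS most recent entries.
function addBaselineVersion(history, entry) {
  if (!entry.approved_by) {
    throw new Error("Baseline updates require explicit approval (approved_by is empty)");
  }
  const latest = history.length ? history[history.length - 1].version : 0;
  const next = [...history, { ...entry, version: latest + 1 }];
  // History is ordered oldest to newest, so drop entries from the front.
  return next.slice(-MAX_VERSIONS);
}
```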

LEARNED PATTERNS

<!-- This section will be populated by Loop 1.5 session reflection --> <!-- Patterns are added when user corrections or approvals provide learning signals -->

High Confidence [conf:0.90+]

No patterns recorded yet. This section will be updated through Loop 1.5 reflection.

Medium Confidence [conf:0.70-0.89]

No patterns recorded yet.

Low Confidence [conf:0.50-0.69]

No patterns recorded yet.

Pattern Recognition

Different visual testing scenarios require different approaches:

Responsive Layout Testing

Patterns: "responsive", "breakpoint", "mobile", "tablet", "desktop", "viewport"
Common Characteristics:
  • Multiple viewport configurations required
  • Layout shifts are primary concern
  • Element visibility/hiding at breakpoints
  • Text wrapping and overflow behavior
Key Focus:
  • Breakpoint transitions (where layouts shift)
  • Navigation collapse/expand behavior
  • Grid/flex layout stability
  • Touch target sizing on mobile
Approach: Use extended viewport matrix, focus on breakpoint edge cases (width +/- 10px from breakpoint)
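Generating the edge widths is simple arithmetic. The breakpoints in the usage below (768 and 1024) are example values, not taken from any particular project. `breakpointEdgeWidths` is a hypothetical helper.

```javascript
// For each breakpoint, produce the widths just below, at, and just above it,
// then deduplicate and sort them ascending.
function breakpointEdgeWidths(breakpoints, offset = 10) {
  const widths = new Set();
  for (const bp of breakpoints) {
    for (const w of [bp - offset, bp, bp + offset]) {
      if (w > 0) widths.add(w);
    }
  }
  return [...widths].sort((a, b) => a - b);
}

// breakpointEdgeWidths([768, 1024]) returns [758, 768, 778, 1014, 1024, 1034]
```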

Component Visual Testing

Patterns: "component", "button", "card", "form", "modal", "dropdown"
Common Characteristics:
  • Isolated element testing
  • State variations (default, hover, active, disabled, error)
  • Strict threshold requirements
  • Design token compliance
Key Focus:
  • Color accuracy (design tokens)
  • Spacing consistency
  • Typography rendering
  • Border/shadow rendering
Approach: Use zoom tool for detailed capture, strict threshold, capture all states via interaction sequence

Animation/Transition Testing

Patterns: "animation", "transition", "hover", "loading", "skeleton"
Common Characteristics:
  • Temporal behavior (not single frame)
  • GIF recording required
  • Higher diff thresholds due to timing variance
  • Performance-sensitive
Key Focus:
  • Animation timing correctness
  • Transition smoothness
  • Loading state appearance
  • Skeleton to content transition
Approach: Use gif_creator for recording, relaxed/animation threshold profile, capture key frames

Cross-Environment Comparison

Patterns: "staging vs production", "before after", "compare", "deploy validation"
Common Characteristics:
  • Two distinct environments/states
  • Side-by-side comparison needed
  • May have expected differences (content)
  • Focus on structural consistency
Key Focus:
  • Layout structure stability
  • Component presence/absence
  • Style application consistency
  • No unexpected visual changes
Approach: Capture both states, generate side-by-side diff, use relaxed threshold for content areas

Advanced Techniques

Audience-Specific Testing

Different stakeholders need different visual test outputs:
Developers: Technical diffs with pixel coordinates, DOM structure comparison, CSS property changes
Designers: Visual overlays, color accuracy reports, spacing measurements, design token compliance
QA Team: Pass/fail summaries, regression counts, trend reports, baseline approval queues
Executives: High-level dashboards, regression trends, release readiness indicators

Ignore Regions Configuration

For pages with dynamic content, configure ignore regions to prevent false positives:
```javascript
const IGNORE_REGIONS = {
  common: [
    { selector: "[data-testid='timestamp']", reason: "Dynamic timestamp" },
    { selector: ".ad-container", reason: "Third-party ads" },
    { selector: ".live-chat-widget", reason: "Chat widget state varies" }
  ],
  page_specific: {
    "/dashboard": [
      { selector: ".metric-value", reason: "Live metrics" },
      { selector: ".user-avatar", reason: "User-specific content" }
    ]
  }
};
```
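Selectors cannot be compared against pixels directly. They first have to be resolved to rectangles in the page, for example with `Element.getBoundingClientRect()` scaled by `window.devicePixelRatio` to match screenshot pixels. The sketch below assumes that resolution has already happened and shows the two remaining steps:
- merging common and page-specific selectors;
- skipping ignored pixels during the diff.

Both helper names are hypothetical.

```javascript
// Merge common selectors with any page-specific ones for a given path.
function selectorsForPage(config, path) {
  return [...config.common, ...(config.page_specific[path] || [])].map((r) => r.selector);
}

// Count differing pixels while skipping ignore rectangles { x, y, width, height }
// (already resolved to capture pixels). Inputs are ImageData-like RGBA captures.
function countDiffPixelsIgnoring(current, baseline, ignoreRects) {
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error("Captures must share dimensions");
  }
  const inIgnored = (x, y) =>
    ignoreRects.some((r) => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height);
  let diffPixels = 0;
  let comparedPixels = 0;
  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      if (inIgnored(x, y)) continue;
      comparedPixels++;
      const i = (y * current.width + x) * 4;
      for (let c = 0; c < 4; c++) {
        if (current.data[i + c] !== baseline.data[i + c]) {
          diffPixels++;
          break; // one differing channel is enough to count the pixel
        }
      }
    }
  }
  return { diffPixels, comparedPixels };
}
```

The function returns both counts because it is a design choice whether the percentage is computed against the compared pixels or the full capture.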

Multi-Model Validation

For critical visual tests, use LLM Council for consensus:
```javascript
// Use when the visual diff is borderline (threshold +/- 0.5%).
// `geminiAnalyze(current, baseline, prompt)` is a placeholder for an image-capable model
// client; it is not defined by this skill. `threshold` falls back to diff.threshold when
// the diff result carries it.
async function multiModelVisualValidation(current, baseline, diff, threshold = diff.threshold) {
  const prompt = `
    Analyze this visual comparison:
    - Diff percentage: ${diff.diffPercent}%
    - Changed pixels: ${diff.diffPixels}
    - Threshold: ${threshold}%

    Is this change:
    A) Intentional design update (approve new baseline)
    B) Unintentional regression (flag for fix)
    C) Acceptable variation (pass with note)

    Provide reasoning.
  `;

  // Route to Gemini for image analysis capability
  return await geminiAnalyze(current, baseline, prompt);
}
```
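The "threshold +/- 0.5%" trigger can be read two ways: as percentage points or as a relative band. The difference matters. With the 0.1% default, a band of 0.5 percentage points covers every diff from 0% to 0.6%, so even an identical capture would be sent for review. The sketch supports both readings. `isBorderline` and its options are hypothetical.

```javascript
// Absolute mode: |diff - threshold| <= band percentage points.
// Relative mode: |diff - threshold| <= band * threshold (band 0.5 means +/- 50%).
function isBorderline(diffPercent, threshold, { band = 0.5, relative = false } = {}) {
  const width = relative ? threshold * band : band;
  return Math.abs(diffPercent - threshold) <= width;
}
```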

Common Anti-Patterns

Avoid these common mistakes:

Capture Anti-Patterns

| Anti-Pattern | Problem | Solution |
|---|---|---|
| No wait after resize | Captures before reflow completes | Add 500ms wait after resize_window |
| Ignoring async content | Missing dynamically loaded elements | Wait for network idle or specific selectors |
| Single viewport only | Missing responsive regressions | Use at least 3 viewports (mobile, tablet, desktop) |
| Capturing during animation | Non-deterministic frames | Wait for animations or use GIF |

Comparison Anti-Patterns

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Zero tolerance | False positives from anti-aliasing | Use a threshold of at least 0.01% |
| No baseline versioning | Cannot roll back a bad baseline | Version baselines with timestamps |
| Comparing different viewports | Invalid diff | Validate viewport match before comparing |
| No ignore regions | Dynamic content causes failures | Configure ignore regions for timestamps, ads |

Workflow Anti-Patterns

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Skip planning phase | Missing edge cases | ALWAYS use sequential-thinking first |
| No interaction before capture | Missing auth/state-dependent pages | Plan interaction sequences |
| Silent baseline updates | Regressions approved accidentally | Require explicit approval |
| No cleanup | Orphaned tabs accumulate | Close tabs after test completion |

Practical Guidelines

Full vs Quick Mode

Full Mode (comprehensive):
  • All viewports in extended matrix
  • All pages in sitemap
  • Element-level zoom captures
  • GIF recording for animations
  • Duration: 5-15 minutes
Quick Mode (smoke test):
  • Standard matrix (3 viewports)
  • Critical pages only
  • Full-page captures only
  • Skip animations
  • Duration: 1-3 minutes

Checkpoint Strategy

For large test suites (20+ pages):
  • Save progress every 5 pages
  • Store partial results in Memory MCP
  • Enable resume on failure
  • Timeout individual captures at 30 seconds
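Resuming after a failure only needs the list of pages already saved. The sketch assumes those page paths have been read back from Memory MCP; that call is not shown. `pendingBatches` is a hypothetical helper.

```javascript
// Split the remaining (not yet completed) pages into checkpoint batches of batchSize.
function pendingBatches(pages, completedPages, batchSize = 5) {
  const done = new Set(completedPages);
  const remaining = pages.filter((p) => !done.has(p));
  const batches = [];
  for (let i = 0; i < remaining.length; i += batchSize) {
    batches.push(remaining.slice(i, i + batchSize));
  }
  return batches;
}
```

For a 12-page suite where the first five pages were saved before a failure, this yields two batches: five pages, then the last two.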

Trade-offs

| Decision | Option A | Option B | Guidance |
|---|---|---|---|
| Threshold strictness | Strict (0.01%) | Relaxed (1%) | Strict for design systems, relaxed for content-heavy pages |
| Viewport coverage | Extended (8+) | Standard (3) | Extended for responsive-focused apps |
| Capture mode | Full page | Element zoom | Full page by default, zoom for component testing |
| Baseline storage | Local | Memory MCP | Memory MCP for cross-session persistence |
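
The threshold values used across this document can be collected into one lookup keyed by the `threshold_profile` input:

- 0.01% (strict)
- 0.1% (default, as in Example 1)
- 1% (relaxed)
- 5% for animation captures (as in Example 3)

Treating `animation` as a named profile is an assumption, as is the `thresholdFor` helper; the input contract only lists `strict`, `default`, and `relaxed`.

```javascript
// Sketch: threshold profiles (values are percent of compared pixels).
// The "animation" profile and thresholdFor() are illustrative assumptions.
const THRESHOLD_PROFILES = Object.freeze({
  strict: 0.01, // design system components
  default: 0.1, // mostly static content
  relaxed: 1, // content-heavy pages
  animation: 5, // GIF / loading-sequence captures
});

function thresholdFor(profile = "default") {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(THRESHOLD_PROFILES, profile)) {
    throw new Error(`Unknown threshold profile: ${profile}`);
  }
  return THRESHOLD_PROFILES[profile];
}
```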

Cross-Skill Coordination

Visual Testing works with other skills in the ecosystem:

Upstream Skills (provide input)

| Skill | When to Use First | What It Provides |
|---|---|---|
| intent-analyzer | Always first | Detects the visual testing need and extracts URLs |
| browser-automation | For complex page states | Navigation and interaction to reach the target state |
| prompt-architect | For test plan optimization | Structured test specifications |

Downstream Skills (use output)

| Skill | When to Use After | What It Does |
|---|---|---|
| fix-bug | On regression detection | Fixes the identified visual bugs |
| documenter | For test reports | Generates visual test documentation |
| deployment | Before deploy | Gates deployment on a visual test pass |

Parallel Skills (run alongside)

| Skill | When to Run Together | How They Coordinate |
|---|---|---|
| e2e-test | Same page coverage | Visual captures complement functional tests |
| browser-automation | Page state setup | Automation provides a capture-ready state |
| code-review-assistant | CSS changes | Visual tests validate review findings |

MCP Integration

Required MCPs:

| MCP | Purpose | Tools Used |
|---|---|---|
| sequential-thinking | Test planning | `sequentialthinking` |
| claude-in-chrome | Screenshot capture | `navigate`, `resize_window`, `computer` (screenshot, zoom), `gif_creator`, `tabs_context_mcp`, `tabs_create_mcp` |
| memory-mcp | Baseline storage | `memory_store`, `vector_search`, `memory_query` |

Tool-Specific Usage:

| Tool | Purpose in Visual Testing |
|---|---|
| `tabs_context_mcp` | Get/verify browser context before tests |
| `tabs_create_mcp` | Create a clean tab for test isolation |
| `resize_window` | Set viewport dimensions |
| `navigate` | Load the target URL |
| `computer` (screenshot) | Capture the full page state |
| `computer` (zoom) | Capture a specific region with magnification |
| `computer` (wait) | Pause for reflow/animation completion |
| `gif_creator` | Record interaction sequences |
| `read_page` | Verify page structure before capture |
| `find` | Locate elements for region capture |

Memory Namespace

Pattern:
skills/tooling/visual-testing/{type}/{project}/{page}/{viewport}
Types:
  • baselines/
    - Golden images (approved screenshots)
  • captures/
    - Current test captures
  • reports/
    - Visual regression reports
  • diffs/
    - Generated diff visualizations
Store:
  • Baseline screenshots with approval metadata
  • Test execution reports
  • Diff visualizations
  • Configuration (viewports, thresholds, ignore regions)
Retrieve:
  • Baseline for comparison by page/viewport key
  • Historical reports for trend analysis
  • Previous configs for consistency
Tagging:
```json
{
  "WHO": "visual-testing:1.0.0",
  "WHEN": "ISO8601_timestamp",
  "PROJECT": "{project_name}",
  "WHY": "visual-regression-testing",
  "page": "{page_path}",
  "viewport": "{viewport_name}",
  "threshold_profile": "{profile}",
  "passed": true
}
```
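
A small helper can build keys that follow this namespace pattern and attach the tagging metadata. This is a sketch. The helper names `namespaceKey` and `buildTags` are assumptions, as is percent-encoding the page path (so `/pricing` does not add extra path segments); neither is part of the documented contract.

```javascript
// Sketch: build Memory MCP keys and tags for visual-testing records.
// The encoding scheme and helper names are assumptions.
const RECORD_TYPES = new Set(["baselines", "captures", "reports", "diffs"]);

function namespaceKey(type, project, page, viewport) {
  if (!RECORD_TYPES.has(type)) throw new Error(`Unknown record type: ${type}`);
  // encodeURIComponent keeps "/" inside a page path from creating extra segments.
  const parts = [project, page, viewport].map(encodeURIComponent);
  return `skills/tooling/visual-testing/${type}/${parts.join("/")}`;
}

function buildTags({ project, page, viewport, thresholdProfile, passed }) {
  return {
    WHO: "visual-testing:1.0.0",
    WHEN: new Date().toISOString(),
    PROJECT: project,
    WHY: "visual-regression-testing",
    page,
    viewport,
    threshold_profile: thresholdProfile,
    passed,
  };
}
```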

Input/Output Contracts

Skill Input

```yaml
visual_test_request:
  required:
    target_url: string          # Base URL to test
  optional:
    pages: list[string]         # Specific paths (default: ["/"])
    viewport_matrix: list[string] # Preset names (default: STANDARD_MATRIX)
    capture_mode: string        # "full_page" | "element" | "both" (default: "full_page")
    threshold_profile: string   # "strict" | "default" | "relaxed" (default: "default")
    compare_baseline: boolean   # Whether to compare (default: true)
    update_baseline: boolean    # Whether to update on approval (default: false)
    interaction_sequence: list  # Actions before capture
    ignore_regions: list        # Selectors to ignore
```
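
The defaults in this contract can be applied and checked before any browser work starts. Two parts of the sketch below are assumptions. The expansion of `STANDARD_MATRIX` into three preset names is hypothetical; the document only says the standard matrix has three viewports (mobile, tablet, desktop). Defaulting `interaction_sequence` and `ignore_regions` to empty lists is also an assumption.

```javascript
// Sketch: normalize a visual_test_request using the documented defaults.
// STANDARD_MATRIX preset names are hypothetical.
const STANDARD_MATRIX = ["mobile", "tablet", "desktop"];
const CAPTURE_MODES = ["full_page", "element", "both"];
const THRESHOLD_PROFILE_NAMES = ["strict", "default", "relaxed"];

function normalizeRequest(request) {
  if (!request || typeof request.target_url !== "string" || request.target_url === "") {
    throw new Error("target_url is required");
  }
  // ?? keeps explicit false values (e.g. compare_baseline: false).
  const normalized = {
    target_url: request.target_url,
    pages: request.pages ?? ["/"],
    viewport_matrix: request.viewport_matrix ?? STANDARD_MATRIX,
    capture_mode: request.capture_mode ?? "full_page",
    threshold_profile: request.threshold_profile ?? "default",
    compare_baseline: request.compare_baseline ?? true,
    update_baseline: request.update_baseline ?? false,
    interaction_sequence: request.interaction_sequence ?? [],
    ignore_regions: request.ignore_regions ?? [],
  };
  if (!CAPTURE_MODES.includes(normalized.capture_mode)) {
    throw new Error(`Invalid capture_mode: ${normalized.capture_mode}`);
  }
  if (!THRESHOLD_PROFILE_NAMES.includes(normalized.threshold_profile)) {
    throw new Error(`Invalid threshold_profile: ${normalized.threshold_profile}`);
  }
  return normalized;
}
```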

Skill Output

```yaml
visual_test_result:
  summary:
    status: "passed" | "failed" | "new_baselines"
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
    execution_time_ms: number
  captures:
    - page: string
      viewport: string
      capture_id: string
      baseline_id: string | null
      comparison:
        passed: boolean
        diff_percent: number
        threshold: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      reason: string
  report_id: string  # Memory MCP reference to full report
```
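
A result in this shape can be assembled from per-capture comparisons, and the same summary can drive the deployment gate mentioned under Downstream Skills. The sketch makes three assumptions:

- A capture with no baseline has `comparison: null` and counts toward `new_baselines`.
- Status precedence is: any failure wins, then new baselines, then passed.
- `new_baselines` blocks deployment until the new baselines are approved.

```javascript
// Sketch: build visual_test_result summary/failures and a deploy gate.
// comparison === null for new baselines and the status precedence are assumptions.
function summarize(captures, executionTimeMs) {
  const failures = captures
    .filter((c) => c.comparison && !c.comparison.passed)
    .map((c) => ({
      page: c.page,
      viewport: c.viewport,
      diff_percent: c.comparison.diff_percent,
      reason: `diff ${c.comparison.diff_percent}% exceeds threshold ${c.comparison.threshold}%`,
    }));
  const newBaselines = captures.filter((c) => c.comparison === null).length;
  const passed = captures.length - failures.length - newBaselines;
  let status = "passed";
  if (failures.length > 0) status = "failed";
  else if (newBaselines > 0) status = "new_baselines";
  return {
    summary: {
      status,
      total_captures: captures.length,
      passed,
      failed: failures.length,
      new_baselines: newBaselines,
      execution_time_ms: executionTimeMs,
    },
    captures,
    failures,
  };
}

// Only a clean pass lets a deployment proceed; new baselines need human approval first.
function canDeploy(result) {
  return result.summary.status === "passed";
}
```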

Recursive Improvement Integration

Role in Meta-Loop

| Loop | Visual Testing Role |
|---|---|
| Loop 1 | Execute visual tests as part of validation |
| Loop 1.5 | Capture learnings about threshold tuning and false positives |
| Loop 2 | Validate the quality of test coverage |
| Loop 3 | Aggregate patterns for threshold optimization |

Eval Harness Integration

Visual testing supports evaluation via:
  • Test pass rate tracking
  • False positive rate monitoring
  • Threshold effectiveness metrics
  • Baseline update frequency
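
These metrics can be computed from a history of run records. Two things in the sketch are assumptions for illustration. The first is the record shape, with `captures`, `failures`, `falsePositives`, and `baselineUpdates` counts per run. The second is the definition of false positive rate as reviewer-rejected failures divided by all failures.

```javascript
// Sketch: evaluation metrics over a list of run records.
// Record shape is hypothetical: { captures, failures, falsePositives, baselineUpdates }.
function evalMetrics(runs) {
  const totals = runs.reduce(
    (acc, r) => ({
      captures: acc.captures + r.captures,
      failures: acc.failures + r.failures,
      falsePositives: acc.falsePositives + r.falsePositives,
      baselineUpdates: acc.baselineUpdates + r.baselineUpdates,
    }),
    { captures: 0, failures: 0, falsePositives: 0, baselineUpdates: 0 }
  );
  const ratio = (num, den) => (den === 0 ? 0 : num / den);
  return {
    passRate: ratio(totals.captures - totals.failures, totals.captures),
    falsePositiveRate: ratio(totals.falsePositives, totals.failures),
    // Share of failures that were real regressions: a rough threshold-effectiveness signal.
    truePositiveShare: ratio(totals.failures - totals.falsePositives, totals.failures),
    baselineUpdatesPerRun: ratio(totals.baselineUpdates, runs.length),
  };
}
```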

Learning Signal Sources

| Signal | Confidence | Learning |
|---|---|---|
| User approves new baseline | HIGH (0.90) | Threshold was appropriate |
| User rejects false positive | HIGH (0.90) | Threshold too strict for context |
| User flags missed regression | HIGH (0.90) | Threshold too relaxed |
| Same page fails repeatedly | MEDIUM (0.75) | Investigate dynamic content issue |

Examples

Example 1: Responsive Layout Validation

Complexity: Medium (3 viewports, 5 pages)
Task: Validate homepage responsive behavior across mobile, tablet, desktop
Planning Output (sequential-thinking):
Thought 1/6: Need to validate responsive breakpoints for homepage
Thought 2/6: Viewports: iPhone 14 (390px), iPad Pro 11 (834px), Desktop HD (1920px)
Thought 3/6: Capture sections: hero, features, pricing, footer
Thought 4/6: Use default threshold (0.1%) for static content
Thought 5/6: Check baseline existence, compare if present
Thought 6/6: Generate report with pass/fail per viewport
Execution:
```javascript
// 1. Create test tab
await tabs_create_mcp() // -> tabId: 123

// 2. Navigate to homepage
await navigate({ url: "https://example.com/", tabId: 123 })

// 3. Mobile viewport (iPhone 14)
await resize_window({ width: 390, height: 844, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_mobile.png

// 4. Tablet viewport (iPad Pro 11)
await resize_window({ width: 834, height: 1194, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_tablet.png

// 5. Desktop viewport (Desktop HD)
await resize_window({ width: 1920, height: 1080, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_desktop.png

// 6. Compare each against baseline from Memory MCP
// 7. Generate report
```
Result: 3/3 viewports passed, no regressions detected
Execution Time: 45 seconds
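
The three resize/wait/screenshot blocks in the execution above repeat the same steps, so a data-driven loop over the viewport matrix keeps them consistent. This sketch takes the MCP tools as an injected `tools` object so it can run without a browser. The argument shapes mirror the calls above; the `captureMatrix` name and its return shape are hypothetical.

```javascript
// Sketch: capture one page across a viewport matrix using injected MCP tools.
// tools = { navigate, resize_window, computer }, mirroring the calls in Example 1.
const VIEWPORTS = [
  { name: "mobile", width: 390, height: 844 }, // iPhone 14
  { name: "tablet", width: 834, height: 1194 }, // iPad Pro 11
  { name: "desktop", width: 1920, height: 1080 }, // Desktop HD
];

async function captureMatrix(tools, url, tabId, viewports = VIEWPORTS) {
  await tools.navigate({ url, tabId });
  const captures = [];
  for (const vp of viewports) {
    await tools.resize_window({ width: vp.width, height: vp.height, tabId });
    await tools.computer({ action: "wait", duration: 0.5, tabId }); // let reflow finish
    const screenshot = await tools.computer({ action: "screenshot", tabId });
    captures.push({ viewport: vp.name, screenshot });
  }
  return captures;
}
```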

Example 2: Component State Testing (Buttons)

Complexity: Medium (4 states per button, zoom captures)
Task: Validate primary button visual states (default, hover, active, disabled)
Planning Output:
Thought 1/8: Testing primary button component visual states
Thought 2/8: States to capture: default, hover, active, disabled
Thought 3/8: Use zoom tool for detailed button capture
Thought 4/8: Strict threshold (0.01%) for design system component
Thought 5/8: Capture default state first
Thought 6/8: Use hover action for hover state
Thought 7/8: Use mouse down for active state
Thought 8/8: Navigate to disabled example for disabled state
Execution:
```javascript
// 1. Navigate to component library
await navigate({ url: "https://storybook.example.com/button", tabId: 123 })

// 2. Find button element (assumes the result exposes x/y page coordinates)
const button = await find({ query: "primary button", tabId: 123 })

// Zoom region [x0, y0, x1, y1] around the button, reused for every state
const region = [button.x, button.y, button.x + 200, button.y + 50]

// 3. Zoom capture default state
await computer({ action: "zoom", region, tabId: 123 })

// 4. Hover over the button's center, then capture the hover state
await computer({ action: "hover", coordinate: [button.x + 100, button.y + 25], tabId: 123 })
await computer({ action: "zoom", region, tabId: 123 })

// ... continue for active, disabled states
```
Result: 4/4 states passed strict threshold
Execution Time: 30 seconds
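
The hard-coded 200x50 region only works if the button has exactly that size and the page has not shifted. A more robust option derives the region from the element's bounding box, adds padding, and clamps it to the viewport. This sketch assumes the bounding box is available as `{ x, y, width, height }` in viewport pixels; `zoomRegion` and the padding default are hypothetical. Recomputing the region after every resize also addresses the "Zoom captures wrong region" issue listed under Troubleshooting.

```javascript
// Sketch: derive a zoom region [x0, y0, x1, y1] from an element's bounding box.
// The box shape and the 8 px padding default are assumptions.
function zoomRegion(box, viewport, padding = 8) {
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);
  return [
    clamp(Math.floor(box.x - padding), 0, viewport.width),
    clamp(Math.floor(box.y - padding), 0, viewport.height),
    clamp(Math.ceil(box.x + box.width + padding), 0, viewport.width),
    clamp(Math.ceil(box.y + box.height + padding), 0, viewport.height),
  ];
}
```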

Example 3: Animation Recording (Loading Sequence)

Complexity: High (GIF recording, temporal comparison)
Task: Capture and validate skeleton-to-content loading animation
Planning Output:
Thought 1/6: Need to capture loading animation as GIF
Thought 2/6: Trigger reload to capture full sequence
Thought 3/6: Start GIF recording before reload
Thought 4/6: Wait for content load completion
Thought 5/6: Stop recording and export GIF
Thought 6/6: Use animation threshold (5%) for comparison
Execution:
```javascript
// 1. Start GIF recording
await gif_creator({ action: "start_recording", tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Initial frame

// 2. Trigger reload
await navigate({ url: "https://example.com/dashboard", tabId: 123 })

// 3. Wait for load sequence
await computer({ action: "wait", duration: 3, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Final frame

// 4. Stop recording and export
await gif_creator({ action: "stop_recording", tabId: 123 })
await gif_creator({ action: "export", download: true, filename: "loading-animation.gif", tabId: 123 })
```
Result: Animation captured successfully, 2.3% diff from baseline (within 5% animation threshold)
Execution Time: 15 seconds

Troubleshooting

Common Issues and Solutions

| Issue | Cause | Solution |
|---|---|---|
| Screenshots are blank/black | Page not fully loaded | Add a wait after navigation; check for lazy loading |
| Diff always fails | Threshold too strict | Increase the threshold or configure ignore regions |
| Viewport resize not working | Tab permission issue | Create a new tab with `tabs_create_mcp` |
| GIF not recording | Recording not started | Call `gif_creator` `start_recording` before actions |
| Baseline not found | Wrong namespace key | Verify page/viewport in the Memory MCP query |
| Zoom captures wrong region | Coordinates shifted | Recalculate the region after viewport resize |
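
For the first row, a cheap guard is to reject a capture whose pixels are all, or almost all, one color before it reaches comparison, and then wait and retry. This sketch assumes the capture is decoded into RGBA bytes (4 per pixel). The `isBlankCapture` name and the 99.9% uniformity cutoff are hypothetical choices.

```javascript
// Sketch: detect blank/black screenshots before comparing them.
// Expects { width, height, data } with RGBA bytes; the 0.999 cutoff is an assumption.
function isBlankCapture(capture, uniformShare = 0.999) {
  const { data } = capture;
  const pixelCount = data.length / 4;
  if (pixelCount === 0) return true;
  // Count how often each RGBA color occurs and track the largest count.
  const counts = new Map();
  let dominant = 0;
  for (let i = 0; i < data.length; i += 4) {
    const key = ((data[i] << 24) | (data[i + 1] << 16) | (data[i + 2] << 8) | data[i + 3]) >>> 0;
    const n = (counts.get(key) || 0) + 1;
    counts.set(key, n);
    if (n > dominant) dominant = n;
  }
  return dominant / pixelCount >= uniformShare;
}
```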

Debug Mode

Enable verbose output for troubleshooting:
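```javascript
const DEBUG_MODE = true;

// Log the inputs and outcome of one capture/compare step when debugging.
// Values are passed in so the snippet does not rely on undefined globals.
function debugLog({ viewport, url, baselineExists, diffResult }) {
  if (!DEBUG_MODE) return;
  console.log("Viewport:", viewport);
  console.log("Page URL:", url);
  console.log("Capture timestamp:", new Date().toISOString());
  console.log("Baseline exists:", baselineExists);
  console.log("Diff result:", diffResult);
}
```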

Conclusion

Visual Testing provides systematic screenshot-based regression detection that complements functional testing. By comparing actual rendered output against approved baselines across multiple viewports, this skill catches UI regressions that unit and integration tests miss.
The key differentiators are:
  1. Baseline management: Versioned golden images with explicit approval workflow
  2. Multi-viewport coverage: Responsive testing across mobile, tablet, and desktop
  3. Threshold-based comparison: Configurable tolerance to balance sensitivity and false positives
  4. Zoom capabilities: Element-level precision for design system validation
  5. GIF recording: Temporal capture for animation and interaction testing
When integrated into a CI/CD pipeline, visual testing serves as a deployment gate that prevents visual regressions from reaching production. Because Memory MCP persists baselines across sessions, every release is compared against the same approved reference images.

Success Criteria

Quality Thresholds:
  • All configured viewports captured successfully
  • Baseline comparison completed for all captures (or flagged as new)
  • Report generated with pass/fail status per page/viewport
  • No orphaned tabs after test completion
  • Execution time within 2x estimated duration
Failure Indicators:
  • Screenshot capture fails (blank/timeout)
  • Comparison fails with system error (not threshold failure)
  • Memory MCP unavailable for baseline storage
  • Tab context lost during multi-viewport capture
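
The quality thresholds can be checked mechanically at the end of a run. The run record fields used here (`expectedCaptures`, `openTabs`, `estimatedMs`, and so on) are hypothetical. The rules themselves come from the list above: every viewport is captured, every capture is compared or flagged as new, a report is generated, no tabs are orphaned, and execution stays within 2x the estimate.

```javascript
// Sketch: evaluate the success criteria for a finished run.
// Field names on `run` are hypothetical.
function checkSuccess(run) {
  const problems = [];
  if (run.captures.length !== run.expectedCaptures) {
    problems.push(`captured ${run.captures.length} of ${run.expectedCaptures} viewports`);
  }
  // Every capture must have a comparison result or be flagged as a new baseline.
  const unresolved = run.captures.filter((c) => !c.comparison && !c.newBaseline);
  if (unresolved.length > 0) problems.push(`${unresolved.length} captures neither compared nor flagged new`);
  if (!run.reportId) problems.push("no report generated");
  if (run.openTabs.length > 0) problems.push(`${run.openTabs.length} orphaned tabs`);
  if (run.executionMs > 2 * run.estimatedMs) problems.push("execution time exceeded 2x estimate");
  return { ok: problems.length === 0, problems };
}
```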

Completion Verification

  • YAML frontmatter with full description and triggers
  • Overview explains philosophy and methodology
  • Core Principles section has 5 principles with practical guidance
  • When to Use has clear use/don't-use criteria
  • Main Workflow has 6 phases with contracts
  • Pattern Recognition covers 4 testing patterns
  • Advanced Techniques includes multi-model and ignore regions
  • Common Anti-Patterns has 3 tables (capture, comparison, workflow)
  • Cross-Skill Coordination documents upstream/downstream/parallel
  • MCP Requirements explains all required tools
  • Input/Output Contracts clearly specified in YAML
  • LEARNED PATTERNS section present (empty for future updates)
  • Examples include 3 concrete scenarios
  • Troubleshooting addresses common issues
  • Conclusion summarizes skill value
  • Memory namespace documented with tagging
<promise>VISUAL_TESTING_VERILINGUA_VERIX_COMPLIANT</promise>