multi-agent-e2e-validation


Multi-Agent E2E Validation


Overview


A prescriptive workflow for spawning parallel validation agents to comprehensively test database refactors. In a QuestDB migration, it identified 5 critical bugs (each causing 100% system failure) that would otherwise have shipped to production.

When to Use This Skill


Use this skill when:
  • Database refactors (e.g., v3.x file-based → v4.x QuestDB)
  • Schema migrations requiring validation
  • Bulk data ingestion pipeline testing
  • System migrations with multiple validation layers
  • Pre-release validation for database-centric systems
Key outcomes:
  • Parallel agent execution for comprehensive coverage
  • Structured validation reporting (VALIDATION_FINDINGS.md)
  • Bug discovery with severity classification (Critical/Medium/Low)
  • Release readiness assessment

Core Methodology


1. Validation Architecture (3-Layer Model)


Layer 1: Environment Setup
  • Container orchestration (Colima/Docker)
  • Database deployment and schema application
  • Connectivity validation (ILP, PostgreSQL, HTTP ports)
  • Configuration file creation and validation
Layer 2: Data Flow Validation
  • Bulk ingestion testing (CloudFront → QuestDB)
  • Performance benchmarking against SLOs
  • Multi-month data ingestion
  • Deduplication testing (re-ingestion scenarios)
  • Type conversion validation (FLOAT→LONG casts)
Layer 3: Query Interface Validation
  • High-level query methods (get_latest, get_range, execute_sql)
  • Edge cases (limit=1, cross-month boundaries)
  • Error handling (invalid symbols, dates, parameters)
  • Gap detection SQL compatibility
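Layer 1's connectivity check can be automated with a short probe script. A minimal sketch, assuming the QuestDB default ports listed above (host and timeout are illustrative):

```python
import socket

# QuestDB default ports for the three interfaces named above.
QUESTDB_PORTS = {"ILP": 9009, "PostgreSQL": 8812, "HTTP": 9000}

def check_port(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def validate_connectivity(host: str = "localhost") -> dict:
    """Layer 1 check: probe all three QuestDB interfaces, return per-port status."""
    return {name: check_port(host, port) for name, port in QUESTDB_PORTS.items()}
```

Agent 1 can fail fast on the first `False` entry instead of letting Agents 2-3 discover a dead database later.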

2. Agent Orchestration Pattern


Sequential vs Parallel Execution:
Agent 1 (Environment) → [SEQUENTIAL - prerequisite]
Agent 2 (Bulk Loader) → [PARALLEL with Agent 3]
Agent 3 (Query Interface) → [PARALLEL with Agent 2]
Dependency Rule: Environment validation must pass before data flow/query validation
Dynamic Todo Management:
  • Start with high-level plan (ADR-defined phases)
  • Prune completed agents from todo list
  • Grow todo list when bugs discovered (e.g., Bug #5 found by Agent 3)
  • Update VALIDATION_FINDINGS.md incrementally
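The dependency rule maps naturally onto `concurrent.futures`. A sketch of the orchestration, where `run_agent` is a placeholder for launching the real validation scripts (e.g., via subprocess):

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(name: str) -> dict:
    """Placeholder: the real version would launch the agent's test script
    and parse its summary output into a pass/fail result."""
    return {"agent": name, "passed": True}

def orchestrate() -> list:
    # Dependency rule: environment validation is a sequential prerequisite.
    env_result = run_agent("agent-1-env")
    if not env_result["passed"]:
        return [env_result]  # hard stop before data-flow/query validation

    # Agents 2 and 3 have no mutual dependency, so they run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_agent, name)
                   for name in ("agent-2-bulk", "agent-3-query")]
        return [env_result] + [f.result() for f in futures]
```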

3. Validation Script Structure


Each agent produces:
  1. Test Script (e.g., test_bulk_loader.py)
     • 5+ test functions with clear pass/fail criteria
     • Structured output (test name, result, details)
     • Summary report at end
  2. Artifacts (logs, config files, evidence)
  3. Findings Report (bugs, severity, fix proposals)
Example Test Structure:

```python
def test_feature(conn):
    """Test 1: Feature description"""
    print("=" * 80)
    print("TEST 1: Feature description")
    print("=" * 80)

    results = {}

    # Test 1a: Subtest name
    print("\n1a. Testing subtest:")
    result_1a = perform_test()
    print(f"   Result: {result_1a}")
    results["subtest_1a"] = result_1a == expected_1a

    # Summary
    print("\n" + "-" * 80)
    all_passed = all(results.values())
    print(f"Test 1 Results: {'✓ PASS' if all_passed else '✗ FAIL'}")
    for test_name, passed in results.items():
        print(f"  - {test_name}: {'✓' if passed else '✗'}")

    return {"success": all_passed, "details": results}
```

4. Bug Classification and Tracking


Severity Levels:
  • 🔴 Critical: 100% system failure (e.g., API mismatch, timestamp corruption)
  • 🟡 Medium: Degraded functionality (e.g., below SLO performance)
  • 🟢 Low: Minor issues, edge cases
Bug Report Format:

```markdown
## Bug N: Descriptive Name (SEVERITY - Status)

**Location**: file/path.py:line
**Issue**: One-sentence description
**Impact**: Quantified impact (e.g., "100% ingestion failure")
**Root Cause**: Technical explanation
**Fix Applied**: Code changes with before/after
**Verification**: Test results proving fix
**Status**: ✅ FIXED / ⚠️ PARTIAL / ❌ OPEN
```

5. Release Readiness Decision Framework


Go/No-Go Criteria:
BLOCKER = Any Critical bug unfixed
SHIP = All Critical bugs fixed + (Medium bugs acceptable OR fixed)
DEFER = >3 Medium bugs unfixed
Example Decision:
  • 5 Critical bugs found → all fixed ✅
  • 1 Medium bug (performance 55% below SLO) → acceptable ✅
  • Verdict: RELEASE READY
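The criteria reduce to a small pure function, which keeps the go/no-go call mechanical rather than ad hoc. A sketch, where the bug counts would come from VALIDATION_FINDINGS.md:

```python
def release_decision(critical_open: int, medium_open: int) -> str:
    """Mechanical Go/No-Go call per the criteria above (counts of unfixed bugs)."""
    if critical_open > 0:
        return "BLOCKED"   # any Critical bug unfixed blocks release
    if medium_open > 3:
        return "DEFER"     # too many unfixed Medium bugs
    return "SHIP"          # all Criticals fixed; remaining Mediums acceptable
```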

Workflow: Step-by-Step


Step 1: Create Validation Plan (ADR-Driven)


Input: ADR document (e.g., ADR-0002 QuestDB Refactor)
Output: Validation plan with 3-7 agents

Plan Structure:

```markdown
## Validation Agents

### Agent 1: Environment Setup
- Deploy QuestDB via Docker
- Apply schema.sql
- Validate connectivity (ILP, PG, HTTP)
- Create .env configuration

### Agent 2: Bulk Loader Validation
- Test CloudFront → QuestDB ingestion
- Benchmark performance (target: >100K rows/sec)
- Validate deduplication (re-ingestion test)
- Multi-month ingestion test

### Agent 3: Query Interface Validation
- Test get_latest() with various limits
- Test get_range() with date boundaries
- Test execute_sql() with parameterized queries
- Test detect_gaps() SQL compatibility
- Test error handling (invalid inputs)
```

Step 2: Execute Agent 1 (Environment)


Directory Structure:
tmp/e2e-validation/
  agent-1-env/
    test_environment_setup.py
    questdb.log
    config.env
    schema-check.txt
Validation Checklist:
  • ✅ Container running
  • ✅ Ports accessible (9009 ILP, 8812 PG, 9000 HTTP)
  • ✅ Schema applied without errors
  • ✅ .env file created

Step 3: Execute Agents 2-3 in Parallel


Agent 2: Bulk Loader
tmp/e2e-validation/
  agent-2-bulk/
    test_bulk_loader.py
    ingestion_benchmark.txt
    deduplication_test.txt
Agent 3: Query Interface
tmp/e2e-validation/
  agent-3-query/
    test_query_interface.py
    gap_detection_test.txt
Execution:

```bash
# Terminal 1
cd tmp/e2e-validation/agent-2-bulk
uv run python test_bulk_loader.py

# Terminal 2
cd tmp/e2e-validation/agent-3-query
uv run python test_query_interface.py
```

Step 4: Document Findings in VALIDATION_FINDINGS.md


Template:

```markdown
# E2E Validation Findings Report

**Validation ID**: ADR-XXXX
**Branch**: feat/database-refactor
**Date**: YYYY-MM-DD
**Target Release**: vX.Y.Z
**Status**: [BLOCKED / READY / IN_PROGRESS]

## Executive Summary

E2E validation discovered N critical bugs that would have caused [impact]:

| Finding | Severity | Status | Impact       | Agent   |
|---------|----------|--------|--------------|---------|
| Bug 1   | Critical | Fixed  | 100% failure | Agent 2 |

**Recommendation**: [RELEASE READY / BLOCKED / DEFER]

## Agent 1: Environment Setup - [STATUS]

...

## Agent 2: [Name] - [STATUS]

...
```

Step 5: Iterate on Fixes


For each bug:
  1. Document in VALIDATION_FINDINGS.md with 🔴/🟡/🟢 severity
  2. Apply fix to source code
  3. Re-run failing test
  4. Update bug status to ✅ FIXED
  5. Commit with semantic message (e.g., fix: correct timestamp parsing in CSV ingestion)

Example Fix Commit:

```bash
git add src/gapless_crypto_clickhouse/collectors/questdb_bulk_loader.py
git commit -m "fix: prevent pandas from treating first CSV column as index

BREAKING CHANGE: All timestamps were defaulting to epoch 0 (1970-01)
due to pandas read_csv() auto-indexing. Added index_col=False to
preserve first column as data.

Fixes #ABC-123"
```

Step 6: Final Validation and Release Decision


Run all tests:

```bash
cd tmp/e2e-validation
for agent in agent-*; do
    echo "=== Running $agent ==="
    cd "$agent"
    uv run python test_*.py
    cd ..
done
```
Update VALIDATION_FINDINGS.md status:
  • Count Critical bugs: X fixed, Y open
  • Count Medium bugs: X fixed, Y open
  • Apply decision framework
  • Update Status field to ✅ RELEASE READY or ❌ BLOCKED
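The bug counts for the final decision can be derived from the findings file itself rather than tallied by hand. A sketch that scans for the exact status markers used in the bug report format (assumes each marker appears once per bug entry):

```python
def summarize_statuses(findings: str) -> dict:
    """Tally bug statuses by the markers used in VALIDATION_FINDINGS.md."""
    return {
        "fixed": findings.count("✅ FIXED"),
        "partial": findings.count("⚠️ PARTIAL"),
        "open": findings.count("❌ OPEN"),
    }
```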

Real-World Example: QuestDB Refactor Validation


Context: Migrating from file-based storage (v3.x) to QuestDB (v4.0.0)
Bugs Found:
  1. 🔴 Sender API mismatch - Used non-existent Sender.from_uri() instead of Sender.from_conf()
  2. 🔴 Type conversion - number_of_trades sent as FLOAT, schema expects LONG
  3. 🔴 Timestamp parsing - pandas treating first column as index → epoch 0 timestamps
  4. 🔴 Deduplication - WAL mode doesn't provide UPSERT semantics (needed DEDUP ENABLE UPSERT KEYS)
  5. 🔴 SQL incompatibility - detect_gaps() used nested window functions (QuestDB unsupported)
Impact: Without this validation, v4.0.0 would have shipped with 100% data corruption and 100% ingestion failure
Outcome: All 5 bugs fixed, system validated, v4.0.0 released successfully
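Bug #3 is the classic pandas indexing trap: when `read_csv()` is told (or left to infer) that the first column is an index, the timestamp column silently disappears from the data. A minimal sketch of the failure mode and the `index_col=False` fix from the commit above (the CSV columns here are hypothetical, not the loader's real schema):

```python
import io
import pandas as pd

csv = "timestamp,open\n1700000000000,1.0\n"

# Buggy path: column 0 is consumed as the DataFrame index, so downstream
# code sees no timestamp column and timestamps default to epoch 0.
df_bug = pd.read_csv(io.StringIO(csv), index_col=0)
assert "timestamp" not in df_bug.columns

# Fix: force pandas to keep column 0 as data.
df_fix = pd.read_csv(io.StringIO(csv), index_col=False)
assert list(df_fix.columns) == ["timestamp", "open"]
```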

Common Pitfalls


1. Skipping Environment Validation

❌ Bad: Assume Docker/database is working and jump straight to data ingestion tests
✅ Good: Agent 1 validates the environment first, catching port conflicts and schema errors early

2. Serial Agent Execution

❌ Bad: Run Agent 2, wait for completion, then run Agent 3
✅ Good: Run Agents 2 and 3 in parallel (no dependency between them)

3. Manual Test Reporting

❌ Bad: Copy/paste test output into Slack/email
✅ Good: Structured VALIDATION_FINDINGS.md with severity, status, and fix tracking

4. Ignoring Medium Bugs

❌ Bad: "Performance is 55% below SLO, but we'll fix it later"
✅ Good: Document it in VALIDATION_FINDINGS.md and make an explicit go/no-go decision

5. No Re-validation After Fixes

❌ Bad: Apply fix, assume it works, move on
✅ Good: Re-run the failing test and update the status in VALIDATION_FINDINGS.md

Resources


scripts/


Not applicable - validation scripts are project-specific (stored in tmp/e2e-validation/)
references/


  • example_validation_findings.md - Complete VALIDATION_FINDINGS.md template
  • agent_test_template.py - Template for creating validation test scripts
  • bug_severity_classification.md - Detailed severity criteria and examples

assets/


Not applicable - validation artifacts are project-specific


Troubleshooting


| Issue | Cause | Solution |
|-------|-------|----------|
| Container not starting | Colima/Docker not running | Run colima start before Agent 1 |
| Port conflicts | Ports already in use | Stop conflicting containers or use different ports |
| Schema application fails | Invalid SQL syntax | Check schema.sql for database-specific compatibility |
| Agent 2/3 fail without Agent 1 | Environment not validated | Ensure Agent 1 completes before starting Agents 2/3 |
| Test script import errors | Missing dependencies | Run uv pip install in the agent directory |
| Bug status not updating | VALIDATION_FINDINGS.md stale | Manually refresh status after each fix |
| Parallel agents interfere | Shared resource conflict | Ensure agents use isolated directories |
| Decision unclear | Mixed Critical/Medium severities | Apply the Go/No-Go criteria strictly |