# Multi-Agent E2E Validation
## Overview
Prescriptive workflow for spawning parallel validation agents to comprehensively test database refactors. In a QuestDB migration, this workflow identified 5 critical bugs (100% system failure rate) that would otherwise have shipped to production.
## When to Use This Skill
Use this skill when:
- Database refactors (e.g., v3.x file-based → v4.x QuestDB)
- Schema migrations requiring validation
- Bulk data ingestion pipeline testing
- System migrations with multiple validation layers
- Pre-release validation for database-centric systems
Key outcomes:
- Parallel agent execution for comprehensive coverage
- Structured validation reporting (VALIDATION_FINDINGS.md)
- Bug discovery with severity classification (Critical/Medium/Low)
- Release readiness assessment
## Core Methodology
### 1. Validation Architecture (3-Layer Model)
Layer 1: Environment Setup
- Container orchestration (Colima/Docker)
- Database deployment and schema application
- Connectivity validation (ILP, PostgreSQL, HTTP ports)
- Configuration file creation and validation
Layer 2: Data Flow Validation
- Bulk ingestion testing (CloudFront → QuestDB)
- Performance benchmarking against SLOs
- Multi-month data ingestion
- Deduplication testing (re-ingestion scenarios)
- Type conversion validation (FLOAT→LONG casts)
Layer 3: Query Interface Validation
- High-level query methods (get_latest, get_range, execute_sql)
- Edge cases (limit=1, cross-month boundaries)
- Error handling (invalid symbols, dates, parameters)
- Gap detection SQL compatibility
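Layer 1's connectivity check can be sketched with a plain TCP probe; a minimal sketch, assuming the QuestDB default ports referenced in this document:

```python
import socket

# QuestDB's three interfaces as referenced in this skill (assumed defaults)
PORTS = {"ILP": 9009, "PostgreSQL": 8812, "HTTP": 9000}

def check_port(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def validate_connectivity(host: str = "localhost") -> dict:
    """Probe all three interfaces; Layer 1 passes only if every probe does."""
    return {name: check_port(host, port) for name, port in PORTS.items()}
```

A failed probe here fails Agent 1 fast, before any data-flow test wastes time against an unreachable database.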
### 2. Agent Orchestration Pattern
Sequential vs Parallel Execution:

```
Agent 1 (Environment)     → [SEQUENTIAL - prerequisite]
        ↓
Agent 2 (Bulk Loader)     → [PARALLEL with Agent 3]
Agent 3 (Query Interface) → [PARALLEL with Agent 2]
```

Dependency Rule: Environment validation must pass before data flow/query validation.
Dynamic Todo Management:
- Start with high-level plan (ADR-defined phases)
- Prune completed agents from todo list
- Grow todo list when bugs discovered (e.g., Bug #5 found by Agent 3)
- Update VALIDATION_FINDINGS.md incrementally
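A minimal sketch of this prune-and-grow pattern (agent and bug names are illustrative):

```python
# Dynamic todo list: completed agents are pruned, newly discovered
# bugs grow the list so every fix is tracked to completion.
todos = ["agent-1-env", "agent-2-bulk", "agent-3-query"]

def complete(task: str) -> None:
    """Prune a finished agent from the todo list."""
    todos.remove(task)

def discovered_bug(bug_id: str) -> None:
    """Grow the todo list when a validation agent surfaces a bug."""
    todos.append(f"fix-{bug_id}")

complete("agent-1-env")             # Agent 1 passed
discovered_bug("bug-5-sql-compat")  # e.g., Bug #5 found by Agent 3
```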
### 3. Validation Script Structure
Each agent produces:
- Test Script (e.g., `test_bulk_loader.py`)
  - 5+ test functions with clear pass/fail criteria
  - Structured output (test name, result, details)
  - Summary report at end
- Artifacts (logs, config files, evidence)
- Findings Report (bugs, severity, fix proposals)
Example Test Structure:

```python
def test_feature(conn):
    """Test 1: Feature description"""
    print("=" * 80)
    print("TEST 1: Feature description")
    print("=" * 80)
    results = {}

    # Test 1a: Subtest name
    print("\n1a. Testing subtest:")
    result_1a = perform_test()
    print(f"  Result: {result_1a}")
    results["subtest_1a"] = result_1a == expected_1a

    # Summary
    print("\n" + "-" * 80)
    all_passed = all(results.values())
    print(f"Test 1 Results: {'✓ PASS' if all_passed else '✗ FAIL'}")
    for test_name, passed in results.items():
        print(f"  - {test_name}: {'✓' if passed else '✗'}")
    return {"success": all_passed, "details": results}
```
### 4. Bug Classification and Tracking
Severity Levels:
- 🔴 Critical: 100% system failure (e.g., API mismatch, timestamp corruption)
- 🟡 Medium: Degraded functionality (e.g., below-SLO performance)
- 🟢 Low: Minor issues, edge cases

Bug Report Format:

```markdown
Bug N: Descriptive Name (SEVERITY - Status)

Location: file/path.py:line
Issue: One-sentence description
Impact: Quantified impact (e.g., "100% ingestion failure")
Root Cause: Technical explanation
Fix Applied: Code changes with before/after
Verification: Test results proving fix
Status: ✅ FIXED / ⚠️ PARTIAL / ❌ OPEN
```
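The report format can also be generated programmatically; a minimal sketch whose fields follow the template above (the class itself is illustrative, not part of the skill):

```python
from dataclasses import dataclass

@dataclass
class BugReport:
    number: int
    name: str
    severity: str   # "Critical" / "Medium" / "Low"
    status: str     # "FIXED" / "PARTIAL" / "OPEN"
    location: str
    issue: str
    impact: str

    def to_markdown(self) -> str:
        """Render the bug as a VALIDATION_FINDINGS.md entry."""
        return "\n".join([
            f"Bug {self.number}: {self.name} ({self.severity.upper()} - {self.status})",
            f"Location: {self.location}",
            f"Issue: {self.issue}",
            f"Impact: {self.impact}",
        ])
```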
### 5. Release Readiness Decision Framework
Go/No-Go Criteria:

```
BLOCKER = Any Critical bug unfixed
SHIP    = All Critical bugs fixed + (Medium bugs acceptable OR fixed)
DEFER   = >3 Medium bugs unfixed OR any High-severity bug
```

Example Decision:
- 5 Critical bugs found → all fixed ✅
- 1 Medium bug (performance 55% below SLO) → acceptable ✅
- Verdict: RELEASE READY
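The criteria translate directly into code; a sketch (the function name and open-bug-count interface are illustrative):

```python
def release_decision(critical_open: int, medium_open: int, high_open: int = 0) -> str:
    """Apply the Go/No-Go criteria: any open Critical blocks the release;
    more than 3 open Medium bugs or any open High bug defers; otherwise ship."""
    if critical_open > 0:
        return "BLOCKER"
    if medium_open > 3 or high_open > 0:
        return "DEFER"
    return "SHIP"

# QuestDB example: 5 Critical bugs all fixed, 1 acceptable Medium bug remains
decision = release_decision(critical_open=0, medium_open=1)
```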
## Workflow: Step-by-Step

### Step 1: Create Validation Plan (ADR-Driven)
Input: ADR document (e.g., ADR-0002 QuestDB Refactor)
Output: Validation plan with 3-7 agents

Plan Structure:

```markdown
## Validation Agents

### Agent 1: Environment Setup
- Deploy QuestDB via Docker
- Apply schema.sql
- Validate connectivity (ILP, PG, HTTP)
- Create .env configuration

### Agent 2: Bulk Loader Validation
- Test CloudFront → QuestDB ingestion
- Benchmark performance (target: >100K rows/sec)
- Validate deduplication (re-ingestion test)
- Multi-month ingestion test

### Agent 3: Query Interface Validation
- Test get_latest() with various limits
- Test get_range() with date boundaries
- Test execute_sql() with parameterized queries
- Test detect_gaps() SQL compatibility
- Test error handling (invalid inputs)
```

### Step 2: Execute Agent 1 (Environment)
Directory Structure:

```
tmp/e2e-validation/
  agent-1-env/
    test_environment_setup.py
    questdb.log
    config.env
    schema-check.txt
```

Validation Checklist:
- ✅ Container running
- ✅ Ports accessible (9009 ILP, 8812 PG, 9000 HTTP)
- ✅ Schema applied without errors
- ✅ .env file created
### Step 3: Execute Agents 2-3 in Parallel
Agent 2: Bulk Loader

```
tmp/e2e-validation/
  agent-2-bulk/
    test_bulk_loader.py
    ingestion_benchmark.txt
    deduplication_test.txt
```

Agent 3: Query Interface

```
tmp/e2e-validation/
  agent-3-query/
    test_query_interface.py
    gap_detection_test.txt
```

Execution:

```bash
# Terminal 1
cd tmp/e2e-validation/agent-2-bulk
uv run python test_bulk_loader.py

# Terminal 2
cd tmp/e2e-validation/agent-3-query
uv run python test_query_interface.py
```
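The two-terminal run can also be driven from one script that enforces the dependency rule; a sketch under stated assumptions (the script paths are illustrative, and `sys.executable` stands in for `uv run python`):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_agent(script: str) -> int:
    """Run one agent's test script and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode

def orchestrate() -> bool:
    # Agent 1 is a prerequisite: stop if environment validation fails
    if run_agent("agent-1-env/test_environment_setup.py") != 0:
        return False
    # Agents 2 and 3 have no mutual dependency, so run them in parallel
    with ThreadPoolExecutor(max_workers=2) as pool:
        codes = list(pool.map(run_agent, [
            "agent-2-bulk/test_bulk_loader.py",
            "agent-3-query/test_query_interface.py",
        ]))
    return all(code == 0 for code in codes)
```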
### Step 4: Document Findings in VALIDATION_FINDINGS.md
Template:

```markdown
E2E Validation Findings Report

Validation ID: ADR-XXXX
Branch: feat/database-refactor
Date: YYYY-MM-DD
Target Release: vX.Y.Z
Status: [BLOCKED / READY / IN_PROGRESS]

Executive Summary

E2E validation discovered N critical bugs that would have caused [impact]:

| Finding | Severity | Status | Impact | Agent |
|---|---|---|---|---|
| Bug 1 | Critical | Fixed | 100% failure | Agent 2 |

Recommendation: [RELEASE READY / BLOCKED / DEFER]

Agent 1: Environment Setup - [STATUS]
...

Agent 2: [Name] - [STATUS]
...
```

### Step 5: Iterate on Fixes
For each bug:
- Document in VALIDATION_FINDINGS.md with 🔴/🟡/🟢 severity
- Apply fix to source code
- Re-run failing test
- Update bug status to ✅ FIXED
- Commit with a semantic message (e.g., `fix: correct timestamp parsing in CSV ingestion`)

Example Fix Commit:

```bash
git add src/gapless_crypto_clickhouse/collectors/questdb_bulk_loader.py
git commit -m "fix: prevent pandas from treating first CSV column as index

BREAKING CHANGE: All timestamps were defaulting to epoch 0 (1970-01)
due to pandas read_csv() auto-indexing. Added index_col=False to
preserve first column as data.

Fixes #ABC-123"
```
### Step 6: Final Validation and Release Decision
Run all tests:

```bash
/usr/bin/env bash << 'SKILL_SCRIPT_EOF'
cd tmp/e2e-validation
for agent in agent-*; do
  echo "=== Running $agent ==="
  cd "$agent"
  uv run python test_*.py
  cd ..
done
SKILL_SCRIPT_EOF
```

Update VALIDATION_FINDINGS.md status:
- Count Critical bugs: X fixed, Y open
- Count Medium bugs: X fixed, Y open
- Apply decision framework
- Update Status field to ✅ RELEASE READY or ❌ BLOCKED
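The counting step can be automated by scanning the summary table; a hypothetical helper assuming rows shaped like the Executive Summary table in the findings template:

```python
def count_open_bugs(markdown: str) -> dict:
    """Count non-Fixed bugs per severity from table rows shaped like
    | Bug 1 | Critical | Fixed | 100% failure | Agent 2 |"""
    counts = {"Critical": 0, "Medium": 0, "Low": 0}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 3 and cells[1] in counts and cells[2].lower() != "fixed":
            counts[cells[1]] += 1
    return counts
```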
## Real-World Example: QuestDB Refactor Validation
Context: Migrating from file-based storage (v3.x) to QuestDB (v4.0.0)

Bugs Found:
- 🔴 Sender API mismatch - used the non-existent `Sender.from_uri()` instead of `Sender.from_conf()`
- 🔴 Type conversion - `number_of_trades` sent as FLOAT, but the schema expects LONG
- 🔴 Timestamp parsing - pandas treated the first CSV column as an index → epoch-0 timestamps
- 🔴 Deduplication - WAL mode doesn't provide UPSERT semantics (needed `DEDUP ENABLE UPSERT KEYS`)
- 🔴 SQL incompatibility - detect_gaps() used nested window functions, which QuestDB doesn't support

Impact: Without this validation, v4.0.0 would have shipped with 100% data corruption and 100% ingestion failure.

Outcome: All 5 bugs fixed, system validated, v4.0.0 released successfully.
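The timestamp bug illustrates why the fix works: on headerless CSVs, pandas can absorb the first column into the index unless told otherwise. A minimal sketch with made-up kline-style data:

```python
import io

import pandas as pd

# Headerless CSV with an epoch-ms timestamp first (values are made up)
csv = "1609459200000,100.5,101.0\n1609459260000,101.0,101.5\n"
cols = ["open_time", "open", "high"]

# index_col=False forces pandas to keep the first column as data,
# preventing the epoch-0 timestamp corruption described above.
df = pd.read_csv(io.StringIO(csv), names=cols, index_col=False)
```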
## Common Pitfalls

### 1. Skipping Environment Validation
❌ Bad: Assume Docker/database is working, jump to data ingestion tests
✅ Good: Agent 1 validates environment first, catches port conflicts, schema errors early
### 2. Serial Agent Execution
❌ Bad: Run Agent 2, wait for completion, then run Agent 3
✅ Good: Run Agent 2 & 3 in parallel (no dependency between them)
### 3. Manual Test Reporting
❌ Bad: Copy/paste test output into Slack/email
✅ Good: Structured VALIDATION_FINDINGS.md with severity, status, fix tracking
### 4. Ignoring Medium Bugs
❌ Bad: "Performance is 55% below SLO, but we'll fix it later"
✅ Good: Document in VALIDATION_FINDINGS.md, make explicit go/no-go decision
### 5. No Re-validation After Fixes
❌ Bad: Apply fix, assume it works, move on
✅ Good: Re-run failing test, update status in VALIDATION_FINDINGS.md
## Resources
### scripts/

Not applicable - validation scripts are project-specific (stored in `tmp/e2e-validation/`)

### references/

- `example_validation_findings.md` - Complete VALIDATION_FINDINGS.md template
- `agent_test_template.py` - Template for creating validation test scripts
- `bug_severity_classification.md` - Detailed severity criteria and examples

### assets/

Not applicable - validation artifacts are project-specific
## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Container not starting | Colima/Docker not running | Start Colima/Docker before running Agent 1 |
| Port conflicts | Ports already in use | Stop conflicting containers or use different ports |
| Schema application fails | Invalid SQL syntax | Check schema.sql for database-specific compatibility |
| Agent 2/3 fail without Agent 1 | Environment not validated | Ensure Agent 1 completes before starting Agents 2/3 |
| Test script import errors | Missing dependencies | Install the missing dependencies in the agent directory |
| Bug status not updating | VALIDATION_FINDINGS.md stale | Manually refresh status after each fix |
| Parallel agents interfere | Shared resource conflict | Ensure agents use isolated directories |
| Decision unclear | Mixed Critical/Medium severities | Apply the Go/No-Go criteria strictly per the framework above |