# Multi-Agent E2E Validation
## Overview
Prescriptive workflow for spawning parallel validation agents to comprehensively test database refactors. In a QuestDB migration, this workflow identified 5 critical bugs (100% system failure rate) that would otherwise have shipped to production.
## When to Use This Skill
Use this skill when:
- Database refactors (e.g., v3.x file-based → v4.x QuestDB)
- Schema migrations requiring validation
- Bulk data ingestion pipeline testing
- System migrations with multiple validation layers
- Pre-release validation for database-centric systems
Key outcomes:
- Parallel agent execution for comprehensive coverage
- Structured validation reporting (VALIDATION_FINDINGS.md)
- Bug discovery with severity classification (Critical/Medium/Low)
- Release readiness assessment
## Core Methodology
### 1. Validation Architecture (3-Layer Model)
Layer 1: Environment Setup
- Container orchestration (Colima/Docker)
- Database deployment and schema application
- Connectivity validation (ILP, PostgreSQL, HTTP ports)
- Configuration file creation and validation
Layer 2: Data Flow Validation
- Bulk ingestion testing (CloudFront → QuestDB)
- Performance benchmarking against SLOs
- Multi-month data ingestion
- Deduplication testing (re-ingestion scenarios)
- Type conversion validation (FLOAT→LONG casts)
Layer 3: Query Interface Validation
- High-level query methods (get_latest, get_range, execute_sql)
- Edge cases (limit=1, cross-month boundaries)
- Error handling (invalid symbols, dates, parameters)
- Gap detection SQL compatibility
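Layer 1's connectivity check can be sketched with a plain TCP probe; a minimal sketch, assuming the QuestDB default ports referenced in this document:

```python
import socket

# QuestDB's three interfaces as referenced in this skill (assumed defaults)
PORTS = {"ILP": 9009, "PostgreSQL": 8812, "HTTP": 9000}

def check_port(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def validate_connectivity(host: str = "localhost") -> dict:
    """Probe all three interfaces; Layer 1 passes only if every probe does."""
    return {name: check_port(host, port) for name, port in PORTS.items()}
```

A failed probe here fails Agent 1 fast, before any data-flow test wastes time against an unreachable database.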
### 2. Agent Orchestration Pattern
Sequential vs Parallel Execution:

```
Agent 1 (Environment)     → [SEQUENTIAL - prerequisite]
        ↓
Agent 2 (Bulk Loader)     → [PARALLEL with Agent 3]
Agent 3 (Query Interface) → [PARALLEL with Agent 2]
```

Dependency Rule: Environment validation must pass before data flow/query validation.
Dynamic Todo Management:
- Start with high-level plan (ADR-defined phases)
- Prune completed agents from todo list
- Grow todo list when bugs discovered (e.g., Bug #5 found by Agent 3)
- Update VALIDATION_FINDINGS.md incrementally
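A minimal sketch of this prune-and-grow pattern (agent and bug names are illustrative):

```python
# Dynamic todo list: completed agents are pruned, newly discovered
# bugs grow the list so every fix is tracked to completion.
todos = ["agent-1-env", "agent-2-bulk", "agent-3-query"]

def complete(task: str) -> None:
    """Prune a finished agent from the todo list."""
    todos.remove(task)

def discovered_bug(bug_id: str) -> None:
    """Grow the todo list when a validation agent surfaces a bug."""
    todos.append(f"fix-{bug_id}")

complete("agent-1-env")             # Agent 1 passed
discovered_bug("bug-5-sql-compat")  # e.g., Bug #5 found by Agent 3
```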
### 3. Validation Script Structure
Each agent produces:
- Test Script (e.g., `test_bulk_loader.py`)
  - 5+ test functions with clear pass/fail criteria
  - Structured output (test name, result, details)
  - Summary report at end
- Artifacts (logs, config files, evidence)
- Findings Report (bugs, severity, fix proposals)
Example Test Structure:

```python
def test_feature(conn):
    """Test 1: Feature description"""
    print("=" * 80)
    print("TEST 1: Feature description")
    print("=" * 80)
    results = {}

    # Test 1a: Subtest name
    print("\n1a. Testing subtest:")
    result_1a = perform_test()
    print(f"  Result: {result_1a}")
    results["subtest_1a"] = result_1a == expected_1a

    # Summary
    print("\n" + "-" * 80)
    all_passed = all(results.values())
    print(f"Test 1 Results: {'✓ PASS' if all_passed else '✗ FAIL'}")
    for test_name, passed in results.items():
        print(f"  - {test_name}: {'✓' if passed else '✗'}")
    return {"success": all_passed, "details": results}
```
### 4. Bug Classification and Tracking
Severity Levels:
- 🔴 Critical: 100% system failure (e.g., API mismatch, timestamp corruption)
- 🟡 Medium: Degraded functionality (e.g., below-SLO performance)
- 🟢 Low: Minor issues, edge cases

Bug Report Format:

```markdown
Bug N: Descriptive Name (SEVERITY - Status)

Location: file/path.py:line
Issue: One-sentence description
Impact: Quantified impact (e.g., "100% ingestion failure")
Root Cause: Technical explanation
Fix Applied: Code changes with before/after
Verification: Test results proving fix
Status: ✅ FIXED / ⚠️ PARTIAL / ❌ OPEN
```
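The report format can also be generated programmatically; a minimal sketch whose fields follow the template above (the class itself is illustrative, not part of the skill):

```python
from dataclasses import dataclass

@dataclass
class BugReport:
    number: int
    name: str
    severity: str   # "Critical" / "Medium" / "Low"
    status: str     # "FIXED" / "PARTIAL" / "OPEN"
    location: str
    issue: str
    impact: str

    def to_markdown(self) -> str:
        """Render the bug as a VALIDATION_FINDINGS.md entry."""
        return "\n".join([
            f"Bug {self.number}: {self.name} ({self.severity.upper()} - {self.status})",
            f"Location: {self.location}",
            f"Issue: {self.issue}",
            f"Impact: {self.impact}",
        ])
```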
### 5. Release Readiness Decision Framework
Go/No-Go Criteria:

```
BLOCKER = Any Critical bug unfixed
SHIP    = All Critical bugs fixed + (Medium bugs acceptable OR fixed)
DEFER   = >3 Medium bugs unfixed OR any High-severity bug
```

Example Decision:
- 5 Critical bugs found → all fixed ✅
- 1 Medium bug (performance 55% below SLO) → acceptable ✅
- Verdict: RELEASE READY
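The criteria translate directly into code; a sketch (the function name and open-bug-count interface are illustrative):

```python
def release_decision(critical_open: int, medium_open: int, high_open: int = 0) -> str:
    """Apply the Go/No-Go criteria: any open Critical blocks the release;
    more than 3 open Medium bugs or any open High bug defers; otherwise ship."""
    if critical_open > 0:
        return "BLOCKER"
    if medium_open > 3 or high_open > 0:
        return "DEFER"
    return "SHIP"

# QuestDB example: 5 Critical bugs all fixed, 1 acceptable Medium bug remains
decision = release_decision(critical_open=0, medium_open=1)
```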
## Workflow: Step-by-Step

### Step 1: Create Validation Plan (ADR-Driven)
Input: ADR document (e.g., ADR-0002 QuestDB Refactor)
Output: Validation plan with 3-7 agents

Plan Structure:

```markdown
## Validation Agents

### Agent 1: Environment Setup
- Deploy QuestDB via Docker
- Apply schema.sql
- Validate connectivity (ILP, PG, HTTP)
- Create .env configuration

### Agent 2: Bulk Loader Validation
- Test CloudFront → QuestDB ingestion
- Benchmark performance (target: >100K rows/sec)
- Validate deduplication (re-ingestion test)
- Multi-month ingestion test

### Agent 3: Query Interface Validation
- Test get_latest() with various limits
- Test get_range() with date boundaries
- Test execute_sql() with parameterized queries
- Test detect_gaps() SQL compatibility
- Test error handling (invalid inputs)
```

### Step 2: Execute Agent 1 (Environment)
Directory Structure:

```
tmp/e2e-validation/
  agent-1-env/
    test_environment_setup.py
    questdb.log
    config.env
    schema-check.txt
```

Validation Checklist:
- ✅ Container running
- ✅ Ports accessible (9009 ILP, 8812 PG, 9000 HTTP)
- ✅ Schema applied without errors
- ✅ .env file created
### Step 3: Execute Agents 2-3 in Parallel
Agent 2: Bulk Loader

```
tmp/e2e-validation/
  agent-2-bulk/
    test_bulk_loader.py
    ingestion_benchmark.txt
    deduplication_test.txt
```

Agent 3: Query Interface

```
tmp/e2e-validation/
  agent-3-query/
    test_query_interface.py
    gap_detection_test.txt
```

Execution:

```bash
# Terminal 1
cd tmp/e2e-validation/agent-2-bulk
uv run python test_bulk_loader.py

# Terminal 2
cd tmp/e2e-validation/agent-3-query
uv run python test_query_interface.py
```
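The two-terminal run can also be driven from one script that enforces the dependency rule; a sketch under stated assumptions (the script paths are illustrative, and `sys.executable` stands in for `uv run python`):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_agent(script: str) -> int:
    """Run one agent's test script and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode

def orchestrate() -> bool:
    # Agent 1 is a prerequisite: stop if environment validation fails
    if run_agent("agent-1-env/test_environment_setup.py") != 0:
        return False
    # Agents 2 and 3 have no mutual dependency, so run them in parallel
    with ThreadPoolExecutor(max_workers=2) as pool:
        codes = list(pool.map(run_agent, [
            "agent-2-bulk/test_bulk_loader.py",
            "agent-3-query/test_query_interface.py",
        ]))
    return all(code == 0 for code in codes)
```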
### Step 4: Document Findings in VALIDATION_FINDINGS.md
Template:

```markdown
E2E Validation Findings Report

Validation ID: ADR-XXXX
Branch: feat/database-refactor
Date: YYYY-MM-DD
Target Release: vX.Y.Z
Status: [BLOCKED / READY / IN_PROGRESS]

Executive Summary

E2E validation discovered N critical bugs that would have caused [impact]:

| Finding | Severity | Status | Impact | Agent |
|---|---|---|---|---|
| Bug 1 | Critical | Fixed | 100% failure | Agent 2 |

Recommendation: [RELEASE READY / BLOCKED / DEFER]

Agent 1: Environment Setup - [STATUS]
...

Agent 2: [Name] - [STATUS]
...
```

### Step 5: Iterate on Fixes
For each bug:
- Document in VALIDATION_FINDINGS.md with 🔴/🟡/🟢 severity
- Apply fix to source code
- Re-run failing test
- Update bug status to ✅ FIXED
- Commit with a semantic message (e.g., `fix: correct timestamp parsing in CSV ingestion`)

Example Fix Commit:

```bash
git add src/gapless_crypto_clickhouse/collectors/questdb_bulk_loader.py
git commit -m "fix: prevent pandas from treating first CSV column as index

BREAKING CHANGE: All timestamps were defaulting to epoch 0 (1970-01)
due to pandas read_csv() auto-indexing. Added index_col=False to
preserve first column as data.

Fixes #ABC-123"
```
### Step 6: Final Validation and Release Decision
Run all tests:

```bash
/usr/bin/env bash << 'SKILL_SCRIPT_EOF'
cd tmp/e2e-validation
for agent in agent-*; do
  echo "=== Running $agent ==="
  cd "$agent"
  uv run python test_*.py
  cd ..
done
SKILL_SCRIPT_EOF
```

Update VALIDATION_FINDINGS.md status:
- Count Critical bugs: X fixed, Y open
- Count Medium bugs: X fixed, Y open
- Apply decision framework
- Update Status field to ✅ RELEASE READY or ❌ BLOCKED
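The counting step can be automated by scanning the summary table; a hypothetical helper assuming rows shaped like the Executive Summary table in the findings template:

```python
def count_open_bugs(markdown: str) -> dict:
    """Count non-Fixed bugs per severity from table rows shaped like
    | Bug 1 | Critical | Fixed | 100% failure | Agent 2 |"""
    counts = {"Critical": 0, "Medium": 0, "Low": 0}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 3 and cells[1] in counts and cells[2].lower() != "fixed":
            counts[cells[1]] += 1
    return counts
```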
## Real-World Example: QuestDB Refactor Validation
Context: Migrating from file-based storage (v3.x) to QuestDB (v4.0.0)

Bugs Found:
- 🔴 Sender API mismatch - used the non-existent `Sender.from_uri()` instead of `Sender.from_conf()`
- 🔴 Type conversion - `number_of_trades` sent as FLOAT, but the schema expects LONG
- 🔴 Timestamp parsing - pandas treated the first CSV column as an index → epoch-0 timestamps
- 🔴 Deduplication - WAL mode doesn't provide UPSERT semantics (needed `DEDUP ENABLE UPSERT KEYS`)
- 🔴 SQL incompatibility - detect_gaps() used nested window functions, which QuestDB doesn't support

Impact: Without this validation, v4.0.0 would have shipped with 100% data corruption and 100% ingestion failure.

Outcome: All 5 bugs fixed, system validated, v4.0.0 released successfully.
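The timestamp bug illustrates why the fix works: on headerless CSVs, pandas can absorb the first column into the index unless told otherwise. A minimal sketch with made-up kline-style data:

```python
import io

import pandas as pd

# Headerless CSV with an epoch-ms timestamp first (values are made up)
csv = "1609459200000,100.5,101.0\n1609459260000,101.0,101.5\n"
cols = ["open_time", "open", "high"]

# index_col=False forces pandas to keep the first column as data,
# preventing the epoch-0 timestamp corruption described above.
df = pd.read_csv(io.StringIO(csv), names=cols, index_col=False)
```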
## Common Pitfalls

### 1. Skipping Environment Validation
❌ Bad: Assume Docker/database is working, jump to data ingestion tests
✅ Good: Agent 1 validates environment first, catches port conflicts, schema errors early
### 2. Serial Agent Execution
❌ Bad: Run Agent 2, wait for completion, then run Agent 3
✅ Good: Run Agent 2 & 3 in parallel (no dependency between them)
### 3. Manual Test Reporting
❌ Bad: Copy/paste test output into Slack/email
✅ Good: Structured VALIDATION_FINDINGS.md with severity, status, fix tracking
### 4. Ignoring Medium Bugs
❌ Bad: "Performance is 55% below SLO, but we'll fix it later"
✅ Good: Document in VALIDATION_FINDINGS.md, make explicit go/no-go decision
### 5. No Re-validation After Fixes
❌ Bad: Apply fix, assume it works, move on
✅ Good: Re-run failing test, update status in VALIDATION_FINDINGS.md
## Resources
### scripts/

Not applicable - validation scripts are project-specific (stored in `tmp/e2e-validation/`)

### references/

- `example_validation_findings.md` - Complete VALIDATION_FINDINGS.md template
- `agent_test_template.py` - Template for creating validation test scripts
- `bug_severity_classification.md` - Detailed severity criteria and examples

### assets/

Not applicable - validation artifacts are project-specific
## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Container not starting | Colima/Docker not running | Start Colima/Docker before running Agent 1 |
| Port conflicts | Ports already in use | Stop conflicting containers or use different ports |
| Schema application fails | Invalid SQL syntax | Check schema.sql for database-specific compatibility |
| Agent 2/3 fail without Agent 1 | Environment not validated | Ensure Agent 1 completes before starting Agents 2/3 |
| Test script import errors | Missing dependencies | Install the missing dependencies in the agent directory |
| Bug status not updating | VALIDATION_FINDINGS.md stale | Manually refresh status after each fix |
| Parallel agents interfere | Shared resource conflict | Ensure agents use isolated directories |
| Decision unclear | Mixed Critical/Medium severities | Apply the Go/No-Go criteria strictly per the framework above |