qe-pentest-validation
name: pentest-validation
description: "Orchestrate security finding validation through graduated exploitation. 4-phase pipeline: recon (SAST/DAST), analysis (code review), validation (exploit proof), report (No Exploit, No Report gate). Eliminates false positives by proving exploitability."
category: specialized-testing
priority: critical
tokenEstimate: 1500
agents: [qe-pentest-validator, qe-security-scanner, qe-security-reviewer, qe-security-auditor, qe-quality-gate]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2026-02-08
dependencies: [security-testing]
quick_reference_card: true
tags: [pentest, exploitation, security-validation, shannon, no-exploit-no-report, graduated-exploitation]
trust_tier: 3
validation:
  schema_path: schemas/output.json
  validator_path: scripts/validate-config.json
  eval_path: evals/pentest-validation.yaml
Pentest Validation
<default_to_action>
When validating security findings:
- REQUIRE explicit authorization for target URL
- SCAN with qe-security-scanner (SAST + dependency + secrets)
- ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
- VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
- REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
- UPDATE exploit playbook with new patterns
Quality Gates:
- Authorization confirmed before ANY exploitation
- Target URL is staging/dev (NOT production)
- Budget cap enforced ($15 default)
- Time cap enforced (30 min default)
- All exploitation attempts logged
</default_to_action>
Quick Reference Card
The 4-Phase Pipeline
| Phase | Agent(s) | Purpose | Parallelism |
|---|---|---|---|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |
Graduated Exploitation Tiers
| Tier | Handler | Cost | Latency | Use When |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
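
One plausible reading of the tier table is "try the cheapest tier first and escalate only while the finding stays unconfirmed". The sketch below illustrates that reading under stated assumptions: `Tier`, `TierHandler`, `TierResult`, and `validateGraduated` are hypothetical names, the handlers are injected stubs rather than real model or network calls, and the escalation order itself is an assumption rather than documented behaviour.

```typescript
// Hypothetical types: none of these names come from the skill itself.
type Tier = 1 | 2 | 3;

interface TierResult {
  confirmed: boolean; // true when this tier produced proof of exploitability
  costUsd: number;    // what the attempt cost
}

type TierHandler = (findingId: string) => TierResult;

interface Outcome {
  findingId: string;
  confirmedAtTier: Tier | null; // null when no permitted tier confirmed the finding
  totalCostUsd: number;
}

// Try the cheapest tier first; escalate only while the finding is unconfirmed
// and the configured exploitation_tier (maxTier) allows it.
function validateGraduated(
  findingId: string,
  handlers: Record<Tier, TierHandler>,
  maxTier: Tier,
): Outcome {
  let totalCostUsd = 0;
  for (const tier of [1, 2, 3] as const) {
    if (tier > maxTier) break;
    const result = handlers[tier](findingId);
    totalCostUsd += result.costUsd;
    if (result.confirmed) {
      return { findingId, confirmedAtTier: tier, totalCostUsd };
    }
  }
  return { findingId, confirmedAtTier: null, totalCostUsd };
}
```

With `maxTier` set to 1, the function never reaches the payload or full-exploit handlers, which mirrors the source-only PR review scenario.
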
When to Use This Skill
| Scenario | Tier | Estimated Cost |
|---|---|---|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |
Configuration
```yaml
pentest:
  target_url: https://staging.app.com  # REQUIRED for Tier 2-3
  source_repo: ./src                   # REQUIRED for Tier 1+
  exploitation_tier: 2                 # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                          # Which pipelines to run
    - injection                        # SQL, NoSQL, command injection
    - xss                              # Reflected, stored, DOM XSS
    - auth                             # Auth bypass, session, JWT
    - ssrf                             # URL scheme abuse, metadata
  max_cost_usd: 15                     # Budget cap per run
  timeout_minutes: 30                  # Time cap per run
  require_authorization: true          # MUST confirm target ownership
  no_production: true                  # Block production URLs
  production_patterns:                 # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
```
Safeguards (Mandatory)
Authorization Gate
Every pentest validation run MUST:
- Display target URL and exploitation tier to user
- Require explicit confirmation: "I own/authorized testing of this target"
- Log authorization with timestamp
- Block if target URL matches production patterns
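
The gate above can be sketched as a single pure function. This is a minimal illustration, not the skill's actual implementation: `AuthorizationRequest`, `AuthorizationDecision`, `globToRegExp`, and `authorize` are hypothetical names, and matching the production patterns as globs against the hostname is an assumption about how they are applied. Displaying the target URL and tier to the user is left to the caller.

```typescript
// Hypothetical sketch of the authorization gate; all names are illustrative.
interface AuthorizationRequest {
  targetUrl: string;
  exploitationTier: 1 | 2 | 3;
  userConfirmed: boolean;       // user accepted "I own/authorized testing of this target"
  productionPatterns: string[]; // glob-style host patterns, e.g. "*.prod.*"
}

interface AuthorizationDecision {
  allowed: boolean;
  reason: string;
  loggedAt: string; // ISO-8601 timestamp for the authorization log
}

// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*");                 // "*" matches any run of characters
  return new RegExp(`^${escaped}$`, "i");
}

function authorize(
  req: AuthorizationRequest,
  now: Date = new Date(),
): AuthorizationDecision {
  const loggedAt = now.toISOString();
  const host = new URL(req.targetUrl).hostname;

  // Production targets are blocked even when the user has confirmed ownership.
  const hit = req.productionPatterns.find((p) => globToRegExp(p).test(host));
  if (hit !== undefined) {
    return { allowed: false, reason: `host ${host} matches production pattern ${hit}`, loggedAt };
  }
  if (!req.userConfirmed) {
    return { allowed: false, reason: "explicit authorization not confirmed", loggedAt };
  }
  return { allowed: true, reason: `authorized for tier ${req.exploitationTier}`, loggedAt };
}
```

Checking the production patterns before the confirmation flag means a mistaken confirmation cannot unlock a production host.
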
What This Skill Does NOT Do
- Full autonomous reconnaissance (Nmap, Subfinder)
- Zero-day exploit development
- Attack targets without explicit authorization
- Test production systems
- Store actual exfiltrated data (only proof of access)
- Social engineering or phishing simulation
- Port scanning or service discovery
Validation Pipelines
Injection Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| SQL injection | String concat in query | | UNION SELECT data extraction |
| NoSQL injection | | Operator injection test | Collection enumeration |
| Command injection | | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
XSS Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Reflected XSS | No output encoding | | Browser JS execution via Playwright |
| Stored XSS | | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | | Fragment injection | DOM manipulation proof |
Auth Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |
SSRF Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Internal URL | User-controlled URL fetch | | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | | File content in response |
Agent Coordination
Orchestration Pattern
```typescript
// Assumes each Task call resolves to that phase's results (arrays of findings
// for phases 1 and 2); the original snippet did not show how results are captured.

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review)
const [reviewFindings, auditFindings] = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),
  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);
const phase2Results = [...reviewFindings, ...auditFindings];

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");
```
Finding Classification
| Status | Meaning | Action |
|---|---|---|
| | Exploitation succeeded with PoC | Report with evidence |
| | Partial exploitation, defenses detected | Report with caveats |
| | All exploitation attempts failed | Filter from report |
| | WAF/defense blocked, unclear if vulnerable | Report for manual review |
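
The classification can be read as a mapping from validation outcome to report action. The sketch below is an illustration only: the table's status labels are not shown here, so the `FindingStatus` values are placeholders chosen for this example, and `ValidatedFinding`, `actionFor`, and `applyGate` are hypothetical names. Routing a "confirmed" finding without a PoC to manual review is also an assumption.

```typescript
// Placeholder status labels; the real labels are not given in the table.
type FindingStatus = "confirmed" | "partial" | "failed" | "blocked";

interface ValidatedFinding {
  id: string;
  status: FindingStatus;
  poc?: string; // reproducible proof-of-concept, expected for "confirmed"
}

type ReportAction =
  | "report-with-evidence"
  | "report-with-caveats"
  | "filter"
  | "manual-review";

// Map each outcome to the action listed in the classification table.
function actionFor(finding: ValidatedFinding): ReportAction {
  switch (finding.status) {
    case "confirmed":
      // Assumption: a confirmed claim without a PoC cannot pass the gate as-is.
      return finding.poc ? "report-with-evidence" : "manual-review";
    case "partial":
      return "report-with-caveats";
    case "failed":
      return "filter";
    case "blocked":
      return "manual-review";
  }
}

// "No Exploit, No Report": drop every finding whose action is "filter".
function applyGate(findings: ValidatedFinding[]): ValidatedFinding[] {
  return findings.filter((f) => actionFor(f) !== "filter");
}
```
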
Exploit Playbook Memory
Namespace Structure
```
aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
```
Learning Loop
- Before validation: Query playbook for known patterns matching findings
- During validation: Try known payloads first (higher success rate)
- After validation: Store new successful patterns with confidence scores
- Over time: Agent converges on most effective payloads per tech stack
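
The loop above can be sketched with an in-memory stand-in for the playbook. Everything here is hypothetical: the real memory backend and its API are not described in this document, `ExploitPlaybook` and its methods are illustrative names, and treating the confidence score as a running success rate is an assumption. Only the key layout is taken from the namespace structure.

```typescript
// In-memory stand-in for the playbook store; not the real memory backend.
interface PlaybookEntry {
  payload: string;
  successes: number;
  attempts: number;
}

// Assumption: confidence is the share of attempts that succeeded.
function confidence(entry: PlaybookEntry): number {
  return entry.attempts === 0 ? 0 : entry.successes / entry.attempts;
}

class ExploitPlaybook {
  private entries = new Map<string, PlaybookEntry[]>();

  // Key layout: aqe/pentest/playbook/exploit/{vuln_type}/{tech_stack}/{technique}
  static key(vulnType: string, techStack: string, technique: string): string {
    return `aqe/pentest/playbook/exploit/${vulnType}/${techStack}/${technique}`;
  }

  // Before and during validation: known payloads, highest confidence first.
  query(key: string): PlaybookEntry[] {
    const known = this.entries.get(key) ?? [];
    return [...known].sort((a, b) => confidence(b) - confidence(a));
  }

  // After validation: record the outcome so later runs can rank payloads.
  record(key: string, payload: string, succeeded: boolean): void {
    const list = this.entries.get(key) ?? [];
    let entry = list.find((e) => e.payload === payload);
    if (!entry) {
      entry = { payload, successes: 0, attempts: 0 };
      list.push(entry);
    }
    entry.attempts += 1;
    if (succeeded) entry.successes += 1;
    this.entries.set(key, list);
  }
}
```

Because `query` sorts by confidence, repeated runs against the same tech stack try the historically most successful payloads first, which is the convergence the last step describes.
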
Cost Optimization
Estimated Cost by Scenario
| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|---|---|---|---|---|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |
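
The "Full pentest" estimate exceeds the default caps from the configuration (`max_cost_usd: 15`, `timeout_minutes: 30`), so a run of that size would presumably need raised limits or stop early. A minimal sketch of how such caps might be enforced per attempt follows; `RunBudget` and `tryCharge` are hypothetical names, and checking the caps before each exploitation attempt is an assumption about where enforcement happens.

```typescript
// Hypothetical guard for the per-run caps (max_cost_usd, timeout_minutes).
class RunBudget {
  private spentUsd = 0;

  constructor(
    private readonly maxCostUsd: number = 15,
    private readonly timeoutMinutes: number = 30,
    private readonly startedAtMs: number = Date.now(),
  ) {}

  // Records the spend and returns true if the attempt fits both caps;
  // otherwise returns false and leaves the budget unchanged.
  tryCharge(costUsd: number, nowMs: number = Date.now()): boolean {
    const elapsedMinutes = (nowMs - this.startedAtMs) / 60000;
    if (elapsedMinutes >= this.timeoutMinutes) return false;
    if (this.spentUsd + costUsd > this.maxCostUsd) return false;
    this.spentUsd += costUsd;
    return true;
  }

  get remainingUsd(): number {
    return this.maxCostUsd - this.spentUsd;
  }
}
```

A caller would invoke `tryCharge` with the estimated cost before each Tier 2 or Tier 3 attempt and stop escalating once it returns false.
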
Cost vs Shannon Comparison
| Metric | Shannon | AQE Pentest Validation |
|---|---|---|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |
Related Skills
- security-testing - OWASP vulnerability scanning
- qe-security-compliance - SAST/DAST automation
- compliance-testing - Regulatory compliance
- api-testing-patterns - API security testing
- chaos-engineering-resilience - Security under chaos
Remember
"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.
Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.