qe-pentest-validation


---
name: pentest-validation
description: "Orchestrate security finding validation through graduated exploitation. 4-phase pipeline: recon (SAST/DAST), analysis (code review), validation (exploit proof), report (No Exploit, No Report gate). Eliminates false positives by proving exploitability."
category: specialized-testing
priority: critical
tokenEstimate: 1500
agents: [qe-pentest-validator, qe-security-scanner, qe-security-reviewer, qe-security-auditor, qe-quality-gate]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2026-02-08
dependencies: [security-testing]
quick_reference_card: true
tags: [pentest, exploitation, security-validation, shannon, no-exploit-no-report, graduated-exploitation]
trust_tier: 3
validation:
  schema_path: schemas/output.json
  validator_path: scripts/validate-config.json
  eval_path: evals/pentest-validation.yaml
---

Pentest Validation

<default_to_action> When validating security findings:
  1. REQUIRE explicit authorization for target URL
  2. SCAN with qe-security-scanner (SAST + dependency + secrets)
  3. ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
  4. VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
  5. REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
  6. UPDATE exploit playbook with new patterns
Quality Gates:
  • Authorization confirmed before ANY exploitation
  • Target URL is staging/dev (NOT production)
  • Budget cap enforced ($15 default)
  • Time cap enforced (30 min default)
  • All exploitation attempts logged </default_to_action>

Quick Reference Card

The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|---|---|---|---|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |

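The table implies an escalation rule. Start at the cheapest tier, stop as soon as a tier gives a conclusive answer, and never go above the configured `exploitation_tier` or use a live-target tier without a `target_url`. The sketch below is a minimal version of that rule under stated assumptions. The `Finding` shape, the `runTier` callback and the `escalate` name are hypothetical. The per-tier costs are the table's figures, using the upper bound for Tier 3.

```typescript
// Hypothetical types for illustration; not the qe-pentest-validator API.
type Tier = 1 | 2 | 3;
type TierOutcome = "confirmed" | "refuted" | "inconclusive";

interface Finding {
  id: string;
  vulnType: string;
}

// Per-attempt cost figures from the tier table (Tier 3 uses the $0.015 upper bound).
const TIER_COST_USD: Record<Tier, number> = { 1: 0, 2: 0.0002, 3: 0.015 };

interface EscalationResult {
  outcome: TierOutcome;
  tierUsed: Tier;
  costUsd: number;
}

// Try Tier 1 first and escalate only while the answer stays inconclusive.
// Tiers 2-3 need a live target, so they are skipped when no target URL is configured.
function escalate(
  finding: Finding,
  maxTier: Tier,
  hasTargetUrl: boolean,
  runTier: (finding: Finding, tier: Tier) => TierOutcome,
): EscalationResult {
  const tiers: Tier[] = [1, 2, 3];
  let outcome: TierOutcome = "inconclusive";
  let tierUsed: Tier = 1;
  let costUsd = 0;
  for (const tier of tiers) {
    if (tier > maxTier) break;
    if (tier >= 2 && !hasTargetUrl) break;
    tierUsed = tier;
    costUsd += TIER_COST_USD[tier];
    outcome = runTier(finding, tier);
    if (outcome !== "inconclusive") break; // conclusive: stop paying for higher tiers
  }
  return { outcome, tierUsed, costUsd };
}
```

With `exploitation_tier: 2`, a finding that Tier 1 cannot settle is escalated once to Tier 2 and never reaches Tier 3.
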
When to Use This Skill

| Scenario | Tier | Estimated Cost |
|---|---|---|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |

Configuration

```yaml
pentest:
  target_url: https://staging.app.com    # REQUIRED for Tier 2-3
  source_repo: ./src                      # REQUIRED for Tier 1+
  exploitation_tier: 2                    # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                             # Which pipelines to run
    - injection                           # SQL, NoSQL, command injection
    - xss                                 # Reflected, stored, DOM XSS
    - auth                                # Auth bypass, session, JWT
    - ssrf                                # URL scheme abuse, metadata
  max_cost_usd: 15                        # Budget cap per run
  timeout_minutes: 30                     # Time cap per run
  require_authorization: true             # MUST confirm target ownership
  no_production: true                     # Block production URLs
  production_patterns:                    # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
```

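A minimal sketch of checking this configuration before a run follows. It assumes the YAML has already been parsed into a plain object. The `PentestConfig` interface mirrors the keys above. The `validateConfig` name, the error messages and the rule that the two safeguard flags may not be switched off are illustrative assumptions, based on the "Safeguards (Mandatory)" section.

```typescript
// Mirrors the YAML keys above; the interface itself is illustrative.
interface PentestConfig {
  target_url?: string;
  source_repo?: string;
  exploitation_tier: number;
  vuln_types: string[];
  max_cost_usd: number;
  timeout_minutes: number;
  require_authorization: boolean;
  no_production: boolean;
  production_patterns: string[];
}

const KNOWN_VULN_TYPES = new Set(["injection", "xss", "auth", "ssrf"]);

// Returns a list of problems; an empty list means the config is usable.
function validateConfig(cfg: PentestConfig): string[] {
  const errors: string[] = [];
  if (![1, 2, 3].includes(cfg.exploitation_tier)) {
    errors.push("exploitation_tier must be 1, 2 or 3");
  }
  if (!cfg.source_repo) {
    errors.push("source_repo is required for every tier");
  }
  if (cfg.exploitation_tier >= 2 && !cfg.target_url) {
    errors.push("target_url is required for Tier 2-3");
  }
  for (const t of cfg.vuln_types) {
    if (!KNOWN_VULN_TYPES.has(t)) errors.push(`unknown vuln_type: ${t}`);
  }
  if (!(cfg.max_cost_usd > 0)) errors.push("max_cost_usd must be positive");
  if (!(cfg.timeout_minutes > 0)) errors.push("timeout_minutes must be positive");
  // Safeguards are mandatory, so refuse configs that switch them off.
  if (!cfg.require_authorization) errors.push("require_authorization must stay true");
  if (!cfg.no_production) errors.push("no_production must stay true");
  return errors;
}
```
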
Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:
  1. Display target URL and exploitation tier to user
  2. Require explicit confirmation: "I own/authorized testing of this target"
  3. Log authorization with timestamp
  4. Block if target URL matches production patterns

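The four steps can be combined into a single gate function, sketched here under stated assumptions. To keep the sketch testable, the user prompt and the audit log are passed in as callbacks. The record shape and all function names are hypothetical. Three further details are assumptions as well: `production_patterns` are matched as shell-style globs against the URL hostname, the confirmation phrase must match exactly, and the production check runs before the prompt.

```typescript
interface AuthorizationRecord {
  targetUrl: string;
  tier: number;
  authorizedAt: string; // ISO-8601 timestamp
}

const REQUIRED_CONFIRMATION = "I own/authorized testing of this target";

// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`, "i");
}

function authorizationGate(
  targetUrl: string,
  tier: number,
  productionPatterns: string[],
  ask: (prompt: string) => string,
  log: (record: AuthorizationRecord) => void,
): AuthorizationRecord {
  const host = new URL(targetUrl).hostname;
  // Step 4, checked first: never prompt for a production-looking host.
  if (productionPatterns.some((p) => globToRegExp(p).test(host))) {
    throw new Error(`Blocked: ${host} matches a production pattern`);
  }
  // Steps 1-2: show target and tier, then require the exact confirmation phrase.
  const answer = ask(
    `Target: ${targetUrl}\nExploitation tier: ${tier}\nType "${REQUIRED_CONFIRMATION}" to continue:`,
  );
  if (answer.trim() !== REQUIRED_CONFIRMATION) {
    throw new Error("Authorization not confirmed; aborting");
  }
  // Step 3: log the authorization with a timestamp.
  const record: AuthorizationRecord = { targetUrl, tier, authorizedAt: new Date().toISOString() };
  log(record);
  return record;
}
```

With the default patterns, `https://staging.app.com` passes the host check. `https://api.app.com` is rejected because it matches `api.*`.
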
What This Skill Does NOT Do

  • Full autonomous reconnaissance (Nmap, Subfinder)
  • Zero-day exploit development
  • Attack targets without explicit authorization
  • Test production systems
  • Store actual exfiltrated data (only proof of access)
  • Social engineering or phishing simulation
  • Port scanning or service discovery

Validation Pipelines

Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| SQL injection | String concat in query | `' OR '1'='1` response diff | UNION SELECT data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |

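At Tier 1 this pipeline only matches patterns in source code. The rough sketch below expresses the table's four Tier 1 checks as line-by-line regular expressions. The patterns and the `scanForInjection` helper are simplified assumptions, not the Agent Booster rules. They miss many real cases and also flag some safe code (for example `RegExp.prototype.exec`). That is why Tier 2 payload tests exist to confirm them.

```typescript
type InjectionKind = "sql" | "nosql" | "command" | "ldap";

interface PatternHit {
  kind: InjectionKind;
  line: number; // 1-based line number in the scanned source
  text: string;
}

// Simplified heuristics for the Tier 1 column of the injection table.
const TIER1_PATTERNS: Array<[InjectionKind, RegExp]> = [
  // SQL keyword in a string literal followed by "+" concatenation, or inside a template with ${...}.
  ["sql", /["'`]\s*(SELECT|INSERT|UPDATE|DELETE)\b[^"'`]*["'`]\s*\+|`\s*(SELECT|INSERT|UPDATE|DELETE)\b[^`]*\$\{/i],
  // MongoDB operators named in the table.
  ["nosql", /\$where|\$gt\b/],
  // Shell execution helpers such as child_process.exec() or system().
  ["command", /\b(exec|system)\s*\(/],
  // LDAP filter built by concatenation, e.g. "(uid=" + user + ")".
  ["ldap", /["']\([a-zA-Z]+=["']\s*\+/],
];

function scanForInjection(source: string): PatternHit[] {
  const hits: PatternHit[] = [];
  source.split("\n").forEach((text, i) => {
    for (const [kind, re] of TIER1_PATTERNS) {
      if (re.test(text)) hits.push({ kind, line: i + 1, text: text.trim() });
    }
  });
  return hits;
}
```
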
XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |

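For Reflected XSS at Tier 2, the core question is whether a uniquely tagged `<img onerror>` payload comes back in the response verbatim or HTML-escaped. The sketch below uses hypothetical helpers (`makeProbe`, `classifyReflection`) and only inspects a response body that an HTTP client has already fetched. Even a verbatim reflection remains a candidate until Tier 3 shows the script executing in a real browser.

```typescript
type ReflectionResult = "reflected-unencoded" | "reflected-encoded" | "not-reflected";

// HTML-escape the characters that let input break out of markup or attributes.
function htmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A unique token keeps unrelated page content from producing a false match.
function makeProbe(token: string): string {
  return `<img src=x onerror=alert('${token}')>`;
}

function classifyReflection(probe: string, responseBody: string): ReflectionResult {
  if (responseBody.includes(probe)) return "reflected-unencoded";
  if (responseBody.includes(htmlEscape(probe))) return "reflected-encoded";
  return "not-reflected";
}
```
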
Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |

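For the `JWT none` row, the Tier 2 test takes a token the application issued and rewrites its header to `"alg": "none"` with an empty signature. It then checks whether the server still accepts the token. Overriding a claim such as a role is the step toward the Tier 3 "admin access" proof. The sketch below covers only the token-forging half, with hypothetical helper names. It assumes Node.js 16 or later for `Buffer`'s `base64url` encoding. Sending the token and judging the response are left to the caller.

```typescript
// Decode one base64url JWT segment into a JSON object.
function decodeSegment(segment: string): Record<string, unknown> {
  return JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
}

function encodeSegment(obj: Record<string, unknown>): string {
  return Buffer.from(JSON.stringify(obj), "utf8").toString("base64url");
}

// Rewrite a compact JWS token with alg "none", optional claim overrides and no signature.
// A server that accepts the result is not enforcing the signing algorithm.
function forgeNoneAlgToken(token: string, claimOverrides: Record<string, unknown> = {}): string {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a compact JWS token");
  const header = { ...decodeSegment(parts[0]), alg: "none" };
  const payload = { ...decodeSegment(parts[1]), ...claimOverrides };
  // The trailing "." keeps the three-segment shape with an empty signature.
  return `${encodeSegment(header)}.${encodeSegment(payload)}.`;
}
```
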
SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |

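At Tier 2 the validator must decide whether a probe actually reached something internal. The sketch below makes that decision for the table's two payloads using simple content markers, and all of them are assumptions: a `root:...:0:0:` line for `file:///etc/passwd`, and AWS IMDSv1-style listing keys (`ami-id`, `instance-id`, `iam/`) for the metadata endpoint. The `/latest/meta-data/` path is also assumed. Other clouds, IMDSv2 and hardened hosts return different content. A missing marker therefore means "no evidence", not "safe".

```typescript
type SsrfProbe = "file-read" | "cloud-metadata";

// Probe URLs to submit through the user-controlled URL parameter (AWS-style metadata path assumed).
const SSRF_PROBES: Record<SsrfProbe, string> = {
  "file-read": "file:///etc/passwd",
  "cloud-metadata": "http://169.254.169.254/latest/meta-data/",
};

// Content that should only appear if the server-side fetch really succeeded.
const EVIDENCE_MARKERS: Record<SsrfProbe, RegExp> = {
  "file-read": /^root:[^:\n]*:0:0:/m,
  "cloud-metadata": /^(instance-id|ami-id|iam\/?)$/m,
};

function ssrfEvidence(probe: SsrfProbe, responseBody: string): "confirmed" | "no-evidence" {
  return EVIDENCE_MARKERS[probe].test(responseBody) ? "confirmed" : "no-evidence";
}
```
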
Agent Coordination

Orchestration Pattern

```typescript
// Assumes each Task(...) call resolves to that agent's results;
// they are captured here so later phases can consume them.

// Phase 1: Recon (the scanner runs its layers in parallel internally)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (reviewer and auditor run in parallel)
const [reviewResults, auditResults] = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),

  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);
const phase2Results = [...reviewResults, ...auditResults];

// Phase 3: Validation (graduated exploitation, parallel per vuln type)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");
```

Finding Classification

| Status | Meaning | Action |
|---|---|---|
| `confirmed-exploitable` | Exploitation succeeded with PoC | Report with evidence |
| `likely-exploitable` | Partial exploitation, defenses detected | Report with caveats |
| `not-exploitable` | All exploitation attempts failed | Filter from report |
| `inconclusive` | WAF/defense blocked, unclear if vulnerable | Report for manual review |

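The table defines four statuses but not how a set of exploitation attempts maps onto them. The sketch below uses one plausible precedence, which is an assumption. A successful attempt with a PoC wins. Next comes partial success, or success without a PoC. Then comes defensive blocking. Only a clean set of failures counts as `not-exploitable`. The `Attempt` shape and the function names are hypothetical. The last function applies the Phase 4 filter, dropping only `not-exploitable` findings as the Action column specifies.

```typescript
type Status = "confirmed-exploitable" | "likely-exploitable" | "not-exploitable" | "inconclusive";
type Action = "report-with-evidence" | "report-with-caveats" | "filter" | "manual-review";

interface Attempt {
  succeeded: boolean;
  poc?: string;               // reproducible proof, present only when exploitation succeeded
  partial?: boolean;          // e.g. payload reached the sink but a defense interfered
  blockedByDefense?: boolean; // e.g. a WAF rejected the request
}

const ACTION: Record<Status, Action> = {
  "confirmed-exploitable": "report-with-evidence",
  "likely-exploitable": "report-with-caveats",
  "not-exploitable": "filter",
  "inconclusive": "manual-review",
};

function classify(attempts: Attempt[]): Status {
  if (attempts.some((a) => a.succeeded && Boolean(a.poc))) return "confirmed-exploitable";
  // Success without a PoC is treated as partial proof, not confirmation.
  if (attempts.some((a) => a.partial || a.succeeded)) return "likely-exploitable";
  if (attempts.some((a) => a.blockedByDefense)) return "inconclusive";
  return "not-exploitable";
}

// "No Exploit, No Report": keep everything whose action is not "filter".
function noExploitNoReport<T extends { attempts: Attempt[] }>(
  findings: T[],
): Array<T & { status: Status; action: Action }> {
  return findings
    .map((f) => {
      const status = classify(f.attempts);
      return { ...f, status, action: ACTION[status] };
    })
    .filter((f) => f.action !== "filter");
}
```
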
Exploit Playbook Memory

Namespace Structure

```
aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
```

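Keys in this namespace are plain slash-separated paths. The small helper sketched below builds keys that follow the tree. It rejects empty segments and segments containing `/`, since a stray `/` would silently shift a key into the wrong level of the hierarchy. The helper names are hypothetical and not part of the memory API.

```typescript
const ROOT = "aqe/pentest";

// Reject segments that would break the path structure.
function segment(value: string): string {
  if (value.length === 0 || value.includes("/")) {
    throw new Error(`invalid key segment: "${value}"`);
  }
  return value;
}

const pentestKeys = {
  exploit: (vulnType: string, techStack: string, technique: string) =>
    `${ROOT}/playbook/exploit/${segment(vulnType)}/${segment(techStack)}/${segment(technique)}`,
  bypass: (defenseType: string, technique: string) =>
    `${ROOT}/playbook/bypass/${segment(defenseType)}/${segment(technique)}`,
  payload: (vulnType: string, variant: string) =>
    `${ROOT}/playbook/payload/${segment(vulnType)}/${segment(variant)}`,
  result: (timestamp: string) => `${ROOT}/results/validation-${segment(timestamp)}`,
  poc: (findingId: string) => `${ROOT}/poc/${segment(findingId)}-poc`,
};
```
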
Learning Loop

  1. Before validation: Query playbook for known patterns matching findings
  2. During validation: Try known payloads first (higher success rate)
  3. After validation: Store new successful patterns with confidence scores
  4. Over time: Agent converges on most effective payloads per tech stack

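Steps 1-3 of the loop amount to keeping per-payload success counts and trying the best-scoring payloads first. The sketch below keeps those counts in an in-memory `Map` keyed by playbook path. It scores payloads with a Laplace-smoothed success rate, `(successes + 1) / (attempts + 2)`, so a payload tried once does not jump straight to 0 or 1. The storage and the scoring rule are both assumptions, standing in for whatever ReasoningBank actually persists.

```typescript
interface PayloadStats {
  payload: string;
  attempts: number;
  successes: number;
}

class PlaybookMemory {
  // key: playbook path, e.g. aqe/pentest/playbook/payload/{vuln_type}/{variant}
  private stats = new Map<string, Map<string, PayloadStats>>();

  // Laplace smoothing: an untried payload would score 0.5.
  static confidence(s: PayloadStats): number {
    return (s.successes + 1) / (s.attempts + 2);
  }

  // Step 3: record the outcome of one validation attempt.
  record(key: string, payload: string, succeeded: boolean): void {
    let byPayload = this.stats.get(key);
    if (!byPayload) {
      byPayload = new Map<string, PayloadStats>();
      this.stats.set(key, byPayload);
    }
    const s = byPayload.get(payload) ?? { payload, attempts: 0, successes: 0 };
    s.attempts += 1;
    if (succeeded) s.successes += 1;
    byPayload.set(payload, s);
  }

  // Steps 1-2: known payloads for this key, most promising first.
  ranked(key: string): Array<{ payload: string; confidence: number }> {
    const byPayload = this.stats.get(key);
    if (!byPayload) return [];
    return Array.from(byPayload.values())
      .map((s) => ({ payload: s.payload, confidence: PlaybookMemory.confidence(s) }))
      .sort((a, b) => b.confidence - a.confidence);
  }
}
```
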
Cost Optimization

Estimated Cost by Scenario

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|---|---|---|---|---|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |

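The `max_cost_usd` and `timeout_minutes` caps are most useful when checked before each paid attempt rather than after the run. The minimal guard sketched below assumes that each attempt's cost is estimated up front (for example from the tier table). It uses an injectable clock so it can be tested. The class name and API are hypothetical. The Full pentest row above exceeds both default caps ($15, 30 min), so that scenario can only finish if the caps are raised.

```typescript
class RunBudget {
  private spentUsd = 0;
  private readonly startedAt: number;

  constructor(
    private readonly maxCostUsd: number,
    private readonly timeoutMinutes: number,
    private readonly now: () => number = Date.now,
  ) {
    this.startedAt = now();
  }

  // Returns false, and charges nothing, if the attempt would break either cap.
  tryCharge(estimatedCostUsd: number): boolean {
    const elapsedMin = (this.now() - this.startedAt) / 60_000;
    if (elapsedMin >= this.timeoutMinutes) return false;
    if (this.spentUsd + estimatedCostUsd > this.maxCostUsd) return false;
    this.spentUsd += estimatedCostUsd;
    return true;
  }

  get spent(): number {
    return this.spentUsd;
  }
}
```
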
Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|---|---|---|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |

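The first two targets are ratios that can be computed directly from run records. The sketch below assumes that four counts are available per run: raw findings from Phases 1-2, findings left after the gate, findings marked `confirmed-exploitable`, and confirmed findings that a manual PoC check reproduced. The field names are hypothetical.

```typescript
interface RunCounts {
  rawFindings: number;        // Phase 1-2 output before validation
  reportedFindings: number;   // left after the "No Exploit, No Report" gate
  confirmedFindings: number;  // status confirmed-exploitable
  manuallyReproduced: number; // confirmed findings a human PoC check reproduced
}

function successMetrics(c: RunCounts) {
  const fpReduction = c.rawFindings === 0 ? 0 : 1 - c.reportedFindings / c.rawFindings;
  const confirmationRate = c.confirmedFindings === 0 ? 0 : c.manuallyReproduced / c.confirmedFindings;
  return {
    fpReduction,
    confirmationRate,
    meetsFpTarget: fpReduction > 0.6,                // target: >60% of findings eliminated
    meetsConfirmationTarget: confirmationRate > 0.8, // target: >80% truly exploitable
  };
}
```

For example, 50 raw findings reduced to 15 reported gives a false positive reduction of 0.7, which clears the 60% target.
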
Related Skills

  • security-testing - OWASP vulnerability scanning
  • qe-security-compliance - SAST/DAST automation
  • compliance-testing - Regulatory compliance
  • api-testing-patterns - API security testing
  • chaos-engineering-resilience - Security under chaos

Remember

"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.
Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.