pentest-validation

Pentest Validation

<default_to_action> When validating security findings:
  1. REQUIRE explicit authorization for target URL
  2. SCAN with qe-security-scanner (SAST + dependency + secrets)
  3. ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
  4. VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
  5. REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
  6. UPDATE exploit playbook with new patterns
Quality Gates:
  • Authorization confirmed before ANY exploitation
  • Target URL is staging/dev (NOT production)
  • Budget cap enforced ($15 default)
  • Time cap enforced (30 min default)
  • All exploitation attempts logged </default_to_action>

Quick Reference Card

The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|-------|----------|---------|-------------|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|------|---------|------|---------|----------|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
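
The "Use When" column can be read as a small decision rule. The sketch below is one possible reading, not the skill's actual selection logic: the `FindingContext` shape and the `chooseTier` function are hypothetical names introduced for illustration, and falling back to Tier 1 when no live target exists is an assumption drawn from the rule that a target URL is required for Tier 2-3.

```typescript
// Hypothetical types: the skill document does not define a finding schema.
type Tier = 1 | 2 | 3;

interface FindingContext {
  patternIsConclusive: boolean; // e.g. eval, innerHTML, hardcoded creds
  hasLiveTarget: boolean;       // a staging/dev target URL is configured
  needsDataProof: boolean;      // the report needs a full exploit chain
}

// Pick the cheapest tier that can settle the finding, capped by maxTier.
function chooseTier(ctx: FindingContext, maxTier: Tier): Tier {
  // A conclusive code pattern needs no payload; without a target, Tier 1 is all we can do.
  if (ctx.patternIsConclusive || !ctx.hasLiveTarget) return 1;
  const wanted: Tier = ctx.needsDataProof ? 3 : 2;
  return Math.min(wanted, maxTier) as Tier;
}
```

Capping at `maxTier` mirrors the `exploitation_tier` setting: a run configured for Tier 2 never escalates to a full exploit chain, even when a finding would benefit from one.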

When to Use This Skill

| Scenario | Tier | Estimated Cost |
|----------|------|----------------|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |

Configuration

yaml
pentest:
  target_url: https://staging.app.com    # REQUIRED for Tier 2-3
  source_repo: ./src                      # REQUIRED for Tier 1+
  exploitation_tier: 2                    # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                             # Which pipelines to run
    - injection                           # SQL, NoSQL, command injection
    - xss                                 # Reflected, stored, DOM XSS
    - auth                                # Auth bypass, session, JWT
    - ssrf                                # URL scheme abuse, metadata
  max_cost_usd: 15                        # Budget cap per run
  timeout_minutes: 30                     # Time cap per run
  require_authorization: true             # MUST confirm target ownership
  no_production: true                     # Block production URLs
  production_patterns:                    # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
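
The `production_patterns` entries look like glob patterns. A minimal sketch of how they might be evaluated follows, assuming the patterns are matched against the hostname of `target_url` with `*` as the only wildcard. The source does not specify the matching rules, so `globToRegExp`, `hostnameOf`, and `isProductionUrl` are hypothetical helpers, and the hostname parser is deliberately simple (it does not handle IPv6 literals).

```typescript
// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
function globToRegExp(glob: string): RegExp {
  // Escape every regex metacharacter except "*", then expand "*" to ".*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Extract the hostname from an absolute URL, skipping any userinfo part.
function hostnameOf(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/(?:[^@\/?#]*@)?([^:\/?#]+)/i.exec(url);
  if (!match) throw new Error(`Not an absolute URL: ${url}`);
  return match[1].toLowerCase();
}

// True when the URL's hostname matches any configured production pattern.
function isProductionUrl(url: string, patterns: string[]): boolean {
  const host = hostnameOf(url);
  return patterns.some((pattern) => globToRegExp(pattern).test(host));
}
```

Under this reading, `https://staging.app.com` passes the default patterns while `https://api.app.com` is blocked. A host such as `staging-api.app.com` would also pass, so the pattern list is worth reviewing per project.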

Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:
  1. Display target URL and exploitation tier to user
  2. Require explicit confirmation: "I own/authorized testing of this target"
  3. Log authorization with timestamp
  4. Block if target URL matches production patterns
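
The four requirements above could be combined into a single gate function. This is a sketch under stated assumptions rather than the skill's implementation: `describeRun`, `authorizeRun`, and `AuthorizationRecord` are hypothetical names, the production check is passed in as a predicate, and the "log" is a plain in-memory array standing in for whatever audit store a real run would use.

```typescript
interface AuthorizationRecord {
  targetUrl: string;
  exploitationTier: number;
  confirmation: string;
  authorizedAt: string; // ISO 8601 timestamp
}

// The exact phrase the user must confirm (taken from requirement 2).
const REQUIRED_CONFIRMATION = "I own/authorized testing of this target";

// Requirement 1: text shown to the user before asking for confirmation.
function describeRun(targetUrl: string, exploitationTier: number): string {
  return `Target: ${targetUrl} | Exploitation tier: ${exploitationTier}`;
}

// Requirements 2-4: block production, require the phrase, log with a timestamp.
function authorizeRun(
  targetUrl: string,
  exploitationTier: number,
  userConfirmation: string,
  isProduction: (url: string) => boolean,
  log: AuthorizationRecord[],
  now: Date = new Date()
): AuthorizationRecord {
  if (isProduction(targetUrl)) {
    throw new Error(`Blocked: ${targetUrl} matches a production pattern`);
  }
  if (userConfirmation.trim() !== REQUIRED_CONFIRMATION) {
    throw new Error("Blocked: explicit authorization was not confirmed");
  }
  const record: AuthorizationRecord = {
    targetUrl,
    exploitationTier,
    confirmation: userConfirmation.trim(),
    authorizedAt: now.toISOString(),
  };
  log.push(record);
  return record;
}
```

Throwing on failure, instead of returning a boolean, makes it harder for a caller to ignore a blocked run by accident.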

What This Skill Does NOT Do

  • Full autonomous reconnaissance (Nmap, Subfinder)
  • Zero-day exploit development
  • Attack targets without explicit authorization
  • Test production systems
  • Store actual exfiltrated data (only proof of access)
  • Social engineering or phishing simulation
  • Port scanning or service discovery

Validation Pipelines

Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| SQL injection | String concat in query | `' OR '1'='1` response diff | UNION SELECT data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
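
The Tier 1 column describes source-level patterns that can be checked without touching a live target. The sketch below shows what such a check might look like for three of the rows. It is a deliberately crude, line-based heuristic with hypothetical names (`PatternRule`, `INJECTION_RULES`, `scanLine`), not the scanner's real rule set; the regular expressions are assumptions and would produce both false positives and false negatives on real code.

```typescript
interface PatternRule {
  attack: string;
  pattern: RegExp;
}

// Illustrative rules only; none of these use the "g" flag, so test() is stateless.
const INJECTION_RULES: PatternRule[] = [
  // A quoted string starting with a SQL keyword, followed by "+" concatenation.
  { attack: "SQL injection", pattern: /["'`]\s*(?:SELECT|INSERT|UPDATE|DELETE)\b[^"'`]*["'`]\s*\+/i },
  // MongoDB-style operators appearing in a query.
  { attack: "NoSQL injection", pattern: /\$where|\$gt/ },
  // Direct calls to exec() or system().
  { attack: "Command injection", pattern: /\b(?:exec|system)\s*\(/ },
];

// Return the attack classes whose Tier 1 pattern appears on this source line.
function scanLine(line: string): string[] {
  return INJECTION_RULES.filter((rule) => rule.pattern.test(line)).map((rule) => rule.attack);
}
```

A match here is only a candidate: under the "No Exploit, No Report" rule it still has to be judged conclusive or escalated to a Tier 2 payload test.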

XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |

Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |

SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |

Agent Coordination

Orchestration Pattern

typescript
// Task(...) is assumed to be supplied by the agent orchestration runtime.
// This sketch also assumes each Task resolves to an array of findings,
// except the validator, which resolves to an object with confirmedFindings.

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review)
const phase2Results = (await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),

  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
])).flat(); // merge reviewer and auditor findings into one list

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");

Finding Classification

| Status | Meaning | Action |
|--------|---------|--------|
| `confirmed-exploitable` | Exploitation succeeded with PoC | Report with evidence |
| `likely-exploitable` | Partial exploitation, defenses detected | Report with caveats |
| `not-exploitable` | All exploitation attempts failed | Filter from report |
| `inconclusive` | WAF/defense blocked, unclear if vulnerable | Report for manual review |
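
The status-to-action mapping above can be expressed as a typed lookup plus a report filter. The four status strings come from the table; everything else in this sketch (`ValidatedFinding`, `ReportAction`, `actionFor`, `buildReport`) is a hypothetical shape chosen for illustration.

```typescript
type FindingStatus =
  | "confirmed-exploitable"
  | "likely-exploitable"
  | "not-exploitable"
  | "inconclusive";

type ReportAction =
  | "report-with-evidence"
  | "report-with-caveats"
  | "filter"
  | "manual-review";

// Hypothetical finding shape; only `status` is taken from the document.
interface ValidatedFinding {
  id: string;
  status: FindingStatus;
}

// Map each status to the action listed in the classification table.
function actionFor(status: FindingStatus): ReportAction {
  switch (status) {
    case "confirmed-exploitable":
      return "report-with-evidence";
    case "likely-exploitable":
      return "report-with-caveats";
    case "not-exploitable":
      return "filter";
    case "inconclusive":
      return "manual-review";
  }
}

// Drop findings whose action is "filter"; keep the rest in their original order.
function buildReport(findings: ValidatedFinding[]): ValidatedFinding[] {
  return findings.filter((finding) => actionFor(finding.status) !== "filter");
}
```

Because `FindingStatus` is a closed union, the compiler flags the `switch` if a fifth status is ever added without a matching action.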

Exploit Playbook Memory

Namespace Structure

aqe/pentest/
 playbook/
  exploit/{vuln_type}/{tech_stack}/{technique}
  bypass/{defense_type}/{technique}
  payload/{vuln_type}/{variant}
 results/
  validation-{timestamp}
 poc/
  {finding_id}-poc
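
Keys in this namespace could be assembled by small builder functions so that every agent spells them the same way. The path templates below follow the structure shown above; the `segment` normalisation rule and the ISO 8601 form used for `{timestamp}` are assumptions, and all function names are hypothetical.

```typescript
const ROOT = "aqe/pentest";

// Normalise one path segment: lower-case, with runs of other characters collapsed to "-".
function segment(value: string): string {
  return value.trim().toLowerCase().replace(/[^a-z0-9._-]+/g, "-");
}

function exploitKey(vulnType: string, techStack: string, technique: string): string {
  return `${ROOT}/playbook/exploit/${segment(vulnType)}/${segment(techStack)}/${segment(technique)}`;
}

function bypassKey(defenseType: string, technique: string): string {
  return `${ROOT}/playbook/bypass/${segment(defenseType)}/${segment(technique)}`;
}

function payloadKey(vulnType: string, variant: string): string {
  return `${ROOT}/playbook/payload/${segment(vulnType)}/${segment(variant)}`;
}

// Assumption: {timestamp} is rendered as an ISO 8601 string.
function resultKey(timestamp: Date): string {
  return `${ROOT}/results/validation-${timestamp.toISOString()}`;
}

function pocKey(findingId: string): string {
  return `${ROOT}/poc/${segment(findingId)}-poc`;
}
```

For example, `exploitKey("injection", "Node Express", "union-select")` yields `aqe/pentest/playbook/exploit/injection/node-express/union-select`.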

Learning Loop

  1. Before validation: Query playbook for known patterns matching findings
  2. During validation: Try known payloads first (higher success rate)
  3. After validation: Store new successful patterns with confidence scores
  4. Over time: Agent converges on most effective payloads per tech stack
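
The loop above can be sketched with an in-memory store. This is an illustration only: the document does not say how confidence scores are computed, so the starting value of 0.5 and the exponential moving average used here are assumptions, and `Playbook`, `PlaybookEntry`, `known`, and `record` are hypothetical names rather than the ReasoningBank API.

```typescript
interface PlaybookEntry {
  key: string;        // e.g. a playbook namespace key
  payload: string;
  confidence: number; // 0..1, higher means more often successful
}

class Playbook {
  private entries = new Map<string, PlaybookEntry[]>();

  // Steps 1-2: known payloads for a key, highest confidence first.
  known(key: string): PlaybookEntry[] {
    const list = this.entries.get(key) ?? [];
    return [...list].sort((a, b) => b.confidence - a.confidence);
  }

  // Step 3: record an outcome and nudge the confidence toward 1 or 0.
  record(key: string, payload: string, succeeded: boolean): void {
    const list = this.entries.get(key) ?? [];
    let entry = list.find((e) => e.payload === payload);
    if (!entry) {
      entry = { key, payload, confidence: 0.5 }; // assumed neutral prior
      list.push(entry);
    }
    // Assumed update rule: exponential moving average with weight 0.2.
    entry.confidence = 0.8 * entry.confidence + 0.2 * (succeeded ? 1 : 0);
    this.entries.set(key, list);
  }
}
```

Step 4 then falls out of repeated use: payloads that keep succeeding for a given key drift toward the front of `known(key)`, while ones that keep failing sink.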

Cost Optimization

Estimated Cost by Scenario

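| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|----------|----------|----------|-----------|-----------|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |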
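
Because some of these estimates reach or exceed the default caps ($15 and 30 minutes), a run needs a guard that refuses further exploitation attempts once either cap is hit. The sketch below is one way such a guard might look; `RunBudget` and `tryCharge` are hypothetical names, and the per-attempt cost is assumed to be known before the attempt is made.

```typescript
class RunBudget {
  private spentUsd = 0;
  private readonly maxCostUsd: number;
  private readonly timeoutMinutes: number;
  private readonly startedAtMs: number;

  // Defaults mirror the documented caps: $15 and 30 minutes per run.
  constructor(maxCostUsd = 15, timeoutMinutes = 30, startedAtMs = Date.now()) {
    this.maxCostUsd = maxCostUsd;
    this.timeoutMinutes = timeoutMinutes;
    this.startedAtMs = startedAtMs;
  }

  // Record the spend and return true only if the attempt fits within both caps.
  tryCharge(costUsd: number, nowMs: number = Date.now()): boolean {
    const elapsedMinutes = (nowMs - this.startedAtMs) / 60000;
    if (elapsedMinutes >= this.timeoutMinutes) return false;
    if (this.spentUsd + costUsd > this.maxCostUsd) return false;
    this.spentUsd += costUsd;
    return true;
  }

  get spent(): number {
    return this.spentUsd;
  }
}
```

A validator would call `tryCharge` before each Tier 2 or Tier 3 attempt and stop escalating as soon as it returns `false`, leaving the remaining findings unvalidated instead of overrunning the cap.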

Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|--------|---------|------------------------|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |

Related Skills

  • security-testing - OWASP vulnerability scanning
  • qe-security-compliance - SAST/DAST automation
  • compliance-testing - Regulatory compliance
  • api-testing-patterns - API security testing
  • chaos-engineering-resilience - Security under chaos

Remember

"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.
Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.