pentest-whitebox-code-review
Pentest Whitebox Code Review
Purpose
Perform a systematic white-box source code security audit using Shannon's backward taint analysis methodology. The audit traces from dangerous sinks back to user-controlled sources, classifies injection contexts by slot type, verifies XSS render contexts, and produces a prioritized exploitation queue for downstream proof-driven exploitation.
Prerequisites
Authorization Requirements
- Written authorization with explicit scope for source code review
- Source code access — full repository with version control history
- Architecture documentation if available (data flow diagrams, API specs)
- Deployment configuration access (environment variables, secrets management)
Environment Setup
- semgrep with custom rules for taint analysis
- CodeQL database built for target language
- ripgrep for fast pattern searching
- jadx for Android APK decompilation (if applicable)
- Source map extraction tools for minified JavaScript
- AST parsing tools for target language (tree-sitter, babel, etc.)
Core Workflow
Phase 1: Discovery
- Architecture Mapping: Identify application layers (routing, controllers, services, data access, templates). Map data flow from HTTP entry points through business logic to database/file/external sinks.
- Entry Point Enumeration: Catalog all user-controlled input sources — HTTP parameters, headers, cookies, file uploads, WebSocket messages, environment variables, database reads of user-stored data.
- Security Pattern Inventory: Identify existing security controls — input validation functions, output encoding helpers, parameterized query patterns, CSRF protections, authentication middleware, rate limiters.
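
As a rough illustration of entry point enumeration, the sketch below scans source text for reads from user-controlled sources and tags each hit with a category. The regex patterns assume a Flask-style `request` object and are hypothetical; a real audit would tailor them to the target framework and would likely rely on semgrep or CodeQL rather than line-level regexes.

```python
import re
from typing import List, NamedTuple

# Hypothetical source patterns for a Flask-style codebase; adjust per framework.
SOURCE_PATTERNS = {
    "http_param": re.compile(r"request\.(args|form|values|json)\b"),
    "header": re.compile(r"request\.headers\b"),
    "cookie": re.compile(r"request\.cookies\b"),
    "file_upload": re.compile(r"request\.files\b"),
    "env_var": re.compile(r"os\.environ\b|os\.getenv\("),
}


class EntryPoint(NamedTuple):
    line_no: int
    category: str
    snippet: str


def enumerate_entry_points(source: str) -> List[EntryPoint]:
    """Return every line that reads from a known user-controlled source."""
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for category, pattern in SOURCE_PATTERNS.items():
            if pattern.search(line):
                found.append(EntryPoint(line_no, category, line.strip()))
    return found


# Illustrative input only; not taken from any real target.
SAMPLE = '''
name = request.args.get("name")
token = request.headers.get("X-Token")
sid = request.cookies.get("sid")
debug = os.getenv("DEBUG")
total = 1 + 2
'''
```

Calling `enumerate_entry_points(SAMPLE)` yields one record per matching line, which can seed the source catalog used during backward taint tracing.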
Phase 2: Vulnerability Analysis (5 Parallel Tracks)
- Injection Sink Hunting: Backward taint from SQL/command/file/template sinks to sources. Classify each sink by slot type: SQL-val, SQL-ident, CMD-argument, FILE-path, TEMPLATE-expr. Verify whether parameterization or sanitization breaks the taint chain.
- XSS Render Context Analysis: Identify all dynamic output points in templates/responses. Classify each by render context: HTML_BODY, HTML_ATTRIBUTE, JAVASCRIPT_STRING, URL_PARAM, CSS_VALUE. Verify context-appropriate encoding is applied at each output point.
- Authentication Checklist (9-point): Transport security, rate limiting, session management, token properties, session fixation resistance, password policy enforcement, login response uniformity, account recovery security, SSO/OAuth implementation.
- Authorization Model Review (3-type): Horizontal (same-role cross-user access), vertical (privilege escalation across roles), context-workflow (state-dependent authorization bypass).
- SSRF Sink Hunting: Identify all outbound request sinks. Classify by type: classic (direct URL), blind (no response), semi-blind (partial response), stored (deferred execution). Trace URL construction from user input to request dispatch.
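
The backward direction of the injection track can be sketched with Python's standard `ast` module: start at each sink call, then walk assignments backward until a source is reached or the chain ends. This is a minimal, flow-insensitive sketch, not the methodology's actual implementation. It assumes the sink is a DB-API style `.execute()` call and the source is a Flask-style `request` object, and it ignores function boundaries, sanitizers, and control flow.

```python
import ast

SINK_METHODS = {"execute"}   # assumption: DB-API style cursor.execute(sql, params)
SOURCE_ROOTS = {"request"}   # assumption: Flask-style request object


def _names(node):
    """Collect every variable name referenced inside an expression."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_tainted_sinks(source):
    """Return line numbers of sink calls whose query text traces back to a source."""
    tree = ast.parse(source)

    # Map each variable to the expressions assigned to it (flow-insensitive).
    assignments = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assignments.setdefault(target.id, []).append(node.value)

    def is_tainted(expr, seen):
        for name in _names(expr):
            if name in SOURCE_ROOTS:
                return True
            if name in seen:
                continue
            seen.add(name)
            if any(is_tainted(value, seen) for value in assignments.get(name, [])):
                return True
        return False

    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in SINK_METHODS
                and node.args
                # Only the query text (first argument) is a SQL slot; bound
                # parameters in later arguments do not continue the taint chain.
                and is_tainted(node.args[0], set())):
            hits.append(node.lineno)
    return sorted(hits)


# Illustrative input only.
SAMPLE = '''
uid = request.args["id"]
query = "SELECT * FROM users WHERE id = " + uid
cursor.execute(query)
cursor.execute("SELECT * FROM users WHERE id = ?", (uid,))
'''
```

On `SAMPLE`, only the concatenated query is reported; the parameterized call is skipped because its query text is a constant, which is the "parameterization breaks the taint chain" check in miniature.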
Phase 3: Synthesis
- Confidence Scoring & Exploitation Queue: Score each finding by taint chain completeness, sanitization bypass likelihood, and impact severity. Generate exploitation queue JSON for downstream exploit validation.
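
One possible shape for the scoring and queue step is sketched below. The source does not specify a formula or a JSON schema, so the weights, the 0 to 1 scales, and every field name here are assumptions chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical weights; the methodology does not prescribe specific values.
WEIGHTS = {"chain_completeness": 0.4, "bypass_likelihood": 0.3, "impact": 0.3}


@dataclass
class Finding:
    id: str
    slot_type: str
    chain_completeness: float  # 0.0 (broken chain) to 1.0 (full source-to-sink path)
    bypass_likelihood: float   # 0.0 (robust sanitization) to 1.0 (none present)
    impact: float              # 0.0 (negligible) to 1.0 (critical)


def score(finding):
    """Weighted sum of the three factors, rounded for stable output."""
    total = sum(getattr(finding, name) * weight for name, weight in WEIGHTS.items())
    return round(total, 3)


def build_queue(findings):
    """Serialize findings as JSON, highest score first."""
    entries = [{**asdict(f), "score": score(f)} for f in findings]
    entries.sort(key=lambda entry: entry["score"], reverse=True)
    return json.dumps({"exploitation_queue": entries}, indent=2)
```

A downstream exploitation step could then consume the queue in order, starting with the entry whose taint chain is most complete and least defended.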
Slot Type Classification
| Slot Type | Sink Pattern | Sanitization Required |
|---|---|---|
| SQL-val | Query parameter value position | Parameterized query / prepared statement |
| SQL-ident | Table name, column name, ORDER BY | Allowlist validation |
| CMD-argument | Shell command argument | Argument escaping + allowlist |
| FILE-path | File read/write path construction | Path canonicalization + allowlist |
| TEMPLATE-expr | Template engine expression | Pass input as template data, never as template source; context-aware auto-escaping |
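
The difference between the SQL-val and SQL-ident rows is easiest to see side by side. In the sketch below, which uses the standard `sqlite3` module, the value slot is bound with a `?` placeholder while the identifier slot (which cannot be bound) is checked against an allowlist. The table, column names, and the `list_users` helper are hypothetical.

```python
import sqlite3

# Hypothetical allowlist of columns a caller may sort by.
ALLOWED_SORT_COLUMNS = {"name", "created_at"}


def list_users(conn, name_prefix, sort_by):
    """Query users by name prefix, sorted by an allowlisted column."""
    # SQL-ident slot: placeholders cannot bind identifiers, so validate
    # against an allowlist before interpolating into the query text.
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    sql = f"SELECT name FROM users WHERE name LIKE ? ORDER BY {sort_by}"
    # SQL-val slot: the driver binds the value, so it never becomes SQL text.
    return [row[0] for row in conn.execute(sql, (name_prefix + "%",))]
```

During review, a sink that interpolates an identifier without an equivalent allowlist check is a SQL-ident finding even when every value in the same query is parameterized.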
Render Context Classification
| Context | Output Location | Encoding Required |
|---|---|---|
| HTML_BODY | Between HTML tags | HTML entity encoding |
| HTML_ATTRIBUTE | Inside attribute values | Attribute encoding + quoting |
| JAVASCRIPT_STRING | Inside JS string literals | JavaScript Unicode escaping |
| URL_PARAM | URL query parameter values | URL percent encoding |
| CSS_VALUE | Inside CSS property values | CSS hex encoding |
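
To make the table concrete, the sketch below maps each render context to one encoder built from the Python standard library. It is an illustrative baseline under simplifying assumptions (attribute values are always quoted by the template, and JS and CSS encoders escape every non-alphanumeric character); production code would normally use the framework's own context-aware encoders.

```python
import html
from urllib.parse import quote


def encode_js_string(value):
    """Escape every non-alphanumeric character as \\uXXXX (UTF-16 code units)."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            units = ch.encode("utf-16-be")
            out.extend(
                f"\\u{int.from_bytes(units[i:i + 2], 'big'):04X}"
                for i in range(0, len(units), 2)
            )
    return "".join(out)


def encode_css_value(value):
    """Escape every non-alphanumeric character as a CSS hex escape."""
    # The trailing space terminates the escape so following hex digits are not absorbed.
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"\\{ord(ch):X} " for ch in value
    )


ENCODERS = {
    "HTML_BODY": lambda v: html.escape(v, quote=False),
    # Assumes the template wraps the attribute value in quotes.
    "HTML_ATTRIBUTE": lambda v: html.escape(v, quote=True),
    "JAVASCRIPT_STRING": encode_js_string,
    "URL_PARAM": lambda v: quote(v, safe=""),
    "CSS_VALUE": encode_css_value,
}


def encode_for(context, value):
    """Apply the encoder that matches the render context."""
    return ENCODERS[context](value)
```

When auditing an output point, the question is whether the encoder actually applied matches the context the value lands in; for example, HTML entity encoding applied inside a JS string literal is a mismatch worth queuing.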
Tool Categories
| Category | Tools | Purpose |
|---|---|---|
| Taint Analysis | semgrep, CodeQL | Automated sink-to-source taint tracing |
| Pattern Search | ripgrep, ast-grep | Fast code pattern matching |
| Decompilation | jadx, sourcemap-extract | Recover source from compiled artifacts |
| AST Parsing | tree-sitter, babel | Language-aware code structure analysis |
| Dependency Audit | npm audit, pip-audit, snyk | Known vulnerability detection |
References
- references/tools.md - Tool function signatures and parameters
- references/workflows.md - Taint analysis workflows and vulnerability patterns