break-filter-js-from-html

Break Filter JS From HTML

Overview

This skill provides a systematic methodology for analyzing and bypassing HTML sanitization filters that attempt to prevent JavaScript execution. The focus is on understanding filter mechanics deeply before attempting bypasses, and on robust verification of solutions.

When to Use This Skill

  • Analyzing HTML sanitization filters to find bypass vectors
  • CTF challenges involving XSS filter evasion
  • Authorized security testing of web application input sanitization
  • Understanding parser differentials between server-side parsers and browsers

Phase 1: Environment and Filter Analysis

Before attempting any bypass, thoroughly understand the test environment and filter mechanics.

Environment Reconnaissance

  1. Identify all relevant file locations - Locate the filter implementation, test harness, and any configuration files
  2. Understand the test verification process - Determine how success is measured (browser alert, DOM inspection, etc.)
  3. Verify path dependencies - Check if tests expect files at specific paths; create symlinks or copies if needed
  4. Document the execution flow - Trace how input flows from your payload through the filter to the browser
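
Step 3 can be scripted so the check is repeatable. The following is a minimal sketch, not part of any harness: the helper name `ensure_at` and both paths are hypothetical, and it copies rather than symlinks so it behaves the same on every platform.

```python
import shutil
from pathlib import Path


def ensure_at(expected: Path, actual: Path) -> bool:
    """Make sure a file exists at `expected`, copying it from `actual` if needed.

    Returns True if a copy was made, False if `expected` was already present.
    """
    if expected.exists():
        return False
    if not actual.exists():
        raise FileNotFoundError(f"neither {expected} nor {actual} exists")
    # Create any missing parent directories before copying.
    expected.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(actual, expected)
    return True
```

Calling it with the path the test harness expects and the path where the file was actually found leaves an existing file untouched, so it is safe to run before every test.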

Filter Mechanism Analysis

Examine the filter code to understand:
  1. Parsing library used - Different parsers (BeautifulSoup, DOMPurify, html-sanitizer, etc.) have different behaviors
  2. What elements are removed - Script tags, iframes, objects, embeds, etc.
  3. What attributes are stripped - Event handlers (on*), href with javascript:, etc.
  4. Processing order - Does the filter run once or recursively? Are there multiple passes?
  5. Output encoding - Is the output HTML-encoded, or passed through raw?

Create a Filter Output Test

Before running browser tests, create a quick method to see the filter's output directly:

```bash
# Example: Check what the filter outputs for a given input
echo '<script>alert(1)</script>' > /tmp/test.html && python filter.py /tmp/test.html && cat /tmp/test.html
```

This allows rapid iteration without slow browser-based testing.
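
The same check can be scripted for several inputs at once. This is a sketch under one assumption taken from the shell command in this section: the filter receives a file path as its final argument and rewrites that file in place. The function names `run_filter` and `show` are illustrative.

```python
import subprocess
import tempfile
from pathlib import Path


def run_filter(payload: str, filter_cmd: list[str]) -> str:
    """Return what the filter leaves of `payload`."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "test.html"
        target.write_text(payload, encoding="utf-8")
        # Assumption: the filter takes the file path as its final argument
        # and rewrites that file in place.
        subprocess.run([*filter_cmd, str(target)], check=True, timeout=30)
        return target.read_text(encoding="utf-8")


def show(payloads: list[str], filter_cmd: list[str]) -> None:
    """Print each input next to its filtered output."""
    for payload in payloads:
        print(f"IN : {payload!r}")
        print(f"OUT: {run_filter(payload, filter_cmd)!r}")
```

For example, `show(["<script>alert(1)</script>", "<b>hello</b>"], ["python", "filter.py"])` prints each input and its filtered form. Using a fresh temporary directory per payload also avoids stale state from earlier attempts.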

Phase 2: Bypass Strategy Selection

Based on the filter analysis, select appropriate bypass strategies. Order these by likelihood of success given the specific filter.

Parser Differential Exploits

Parser differentials occur when the server-side filter parses HTML differently than browsers. This is often the most effective approach for library-based filters.
Key concept: The filter's parser may interpret certain HTML constructs differently than browsers, allowing tags that appear "safe" to the filter to execute JavaScript in browsers.
Elements that commonly cause parser differentials:
  • <noscript> - Parsed differently with/without JavaScript enabled
  • <template> - Content may not be parsed as HTML by some libraries
  • <textarea> and <title> - RCDATA parsing contexts
  • Comments and CDATA sections
  • Malformed or nested tags
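
To look for a differential, first record what a server-side parser reports for a given piece of markup, then compare that with the DOM a browser builds from the same markup. The sketch below uses Python's standard `html.parser` purely as a stand-in; the filter under test may use a different parser, and results for the elements listed above can vary between parsers and parser versions. `TagCollector` and `tags_seen` are illustrative names.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag the parser reports, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased.
        self.tags.append(tag)


def tags_seen(markup: str) -> list[str]:
    """Return the start tags this parser sees in `markup`."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

Any tag that the browser's DOM contains but this list does not (or the reverse) marks a spot where the two parsers disagree and is worth investigating.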

Encoding and Obfuscation

  • HTML entity encoding (decimal, hex, named entities)
  • Unicode normalization issues
  • Double encoding
  • Null bytes and other special characters
  • Case variations (if filter is case-sensitive)
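
Entity encoding matters because several spellings decode to the same character, and a double-encoded string needs more than one decoding round. The sketch below uses the standard `html.unescape` to show how many rounds a string takes to reach a stable form; `decode_layers` is an illustrative helper, not something a filter is assumed to contain.

```python
import html


def decode_layers(text: str, max_rounds: int = 5) -> list[str]:
    """Return `text` followed by each successive entity-decoded form."""
    layers = [text]
    for _ in range(max_rounds):
        decoded = html.unescape(layers[-1])
        if decoded == layers[-1]:
            break  # nothing left to decode
        layers.append(decoded)
    return layers


# Named, decimal, and hexadecimal spellings of the same character.
for form in ["&lt;", "&#60;", "&#x3c;"]:
    print(form, "->", decode_layers(form)[-1])

# A double-encoded string takes two rounds to reach the raw character.
print(decode_layers("&amp;lt;"))
```

When reading the filter code, note at which point (if any) it decodes entities and how many times, then compare that with how many rounds the browser context will apply.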

DOM Clobbering and Indirect Execution

  • Creating elements that shadow built-in properties
  • Exploiting existing JavaScript that reads from DOM
  • CSS-based attacks (if JavaScript reads computed styles)

Lesser-Known Vectors

  • SVG with embedded scripts or event handlers
  • MathML elements
  • XML processing instructions (if XHTML mode)
  • Data URIs in appropriate contexts

Phase 3: Systematic Testing

Testing Methodology

  1. Test filter output first - Before browser testing, verify the filter passes your payload through
  2. Use a minimal payload - Start with the simplest possible XSS (alert(1)) before complex payloads
  3. Document each attempt - Record what was tried, filter output, and browser result
  4. Understand failures - When a technique fails, determine if it was filtered or if the browser didn't execute it
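
Steps 3 and 4 are easier to follow with a structured log. The sketch below is one possible shape, not a prescribed format: the `Attempt` fields, the outcome labels, and the idea of a `marker` substring that must survive filtering are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Attempt:
    technique: str       # short label for what was tried
    payload: str         # exact input given to the filter
    filter_output: str   # what the filter produced
    executed: bool       # whether the browser ran the script

    def outcome(self, marker: str) -> str:
        """Classify the attempt; `marker` is a substring that must survive filtering."""
        if marker not in self.filter_output:
            return "filtered"
        return "executed" if self.executed else "survived-but-not-executed"


def append_log(path: Path, attempt: Attempt, marker: str) -> None:
    """Append one attempt to a JSON Lines file."""
    record = {**asdict(attempt), "outcome": attempt.outcome(marker)}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Separating "filtered" from "survived-but-not-executed" in the log answers the question in step 4 directly and shows at a glance which techniques failed at which stage.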

Efficient Iteration Pattern

1. Hypothesize a bypass based on filter analysis
2. Test against filter directly (fast)
3. If filter passes payload through, test in browser
4. If browser doesn't execute, investigate why
5. If filter blocks, analyze how and adjust approach
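
The pattern above can be expressed as a loop that only pays for a browser run when the fast check passes. This is a sketch: `filter_fn` and `browser_fn` are hypothetical callables standing in for the real filter and the real browser harness, and the substring `marker` test is a deliberately crude way of deciding whether a payload survived.

```python
from typing import Callable, Iterable


def iterate(
    candidates: Iterable[str],
    marker: str,
    filter_fn: Callable[[str], str],
    browser_fn: Callable[[str], bool],
) -> list[tuple[str, str]]:
    """Try candidates in order and stop at the first one that executes."""
    results = []
    for payload in candidates:
        filtered = filter_fn(payload)        # step 2: fast check
        if marker not in filtered:
            results.append((payload, "blocked by filter"))        # step 5
            continue
        if browser_fn(filtered):             # step 3: slow check, only when needed
            results.append((payload, "executed"))
            break
        results.append((payload, "passed filter, not executed"))  # step 4
    return results
```

Each result label maps to a follow-up action: "blocked by filter" sends you back to the filter code, while "passed filter, not executed" sends you to the browser's parsing of the filtered output.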

Avoid These Inefficiencies

  • Running slow browser tests for payloads that don't survive the filter
  • Moving to new techniques without understanding why previous ones failed
  • Trying browser-incompatible techniques (e.g., deprecated HTML features)

Phase 4: Verification

Robust Solution Verification

A single passing test is insufficient. Verify solutions thoroughly:
  1. Run multiple times - Ensure the solution works consistently, not just once
  2. Test filter idempotency - Run the filtered output through the filter again and confirm that the result is unchanged and the solution still works
  3. Check for timing issues - Browser-based tests may have race conditions
  4. Verify in isolation - Test the filtered HTML directly in a browser outside the test harness
  5. Document exact steps - Record the precise sequence to reproduce the successful bypass
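
Checks 1 and 2 are simple to automate. In this sketch, `filter_fn` and `test_fn` are hypothetical callables wrapping the real filter and the real test harness; the default of five runs is an arbitrary choice.

```python
from typing import Callable


def is_idempotent(filter_fn: Callable[[str], str], markup: str) -> bool:
    """True if filtering the already-filtered output changes nothing."""
    once = filter_fn(markup)
    return filter_fn(once) == once


def passes_consistently(test_fn: Callable[[], bool], runs: int = 5) -> bool:
    """True only if every one of `runs` consecutive test runs passes."""
    return all(test_fn() for _ in range(runs))
```

If `is_idempotent` returns False for a candidate solution, a harness that happens to run the filter twice will see different HTML than the one you verified.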

Before Declaring Success

  • Confirm the test passes multiple consecutive runs
  • Verify no pending file modifications could invalidate the solution
  • Ensure the solution doesn't depend on test environment quirks
  • Check that the final state of all files is correct
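
One way to confirm that nothing changed after the last passing run is to record file digests at that moment and compare them later. This is a minimal sketch; `snapshot` and `changed_since` are illustrative names, and which files to track is up to you.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def _digest(path: Path) -> Optional[str]:
    """SHA-256 of the file's bytes, or None if the file does not exist."""
    return hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None


def snapshot(paths: Iterable[Path]) -> dict[str, Optional[str]]:
    """Record a digest for every tracked file."""
    return {str(p): _digest(Path(p)) for p in paths}


def changed_since(before: dict[str, Optional[str]]) -> list[str]:
    """Return tracked files whose content differs from the snapshot."""
    return sorted(p for p, digest in before.items() if _digest(Path(p)) != digest)
```

Take the snapshot immediately after a passing run; an empty list from `changed_since` just before you report success means the verified state is still the state on disk.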

Common Pitfalls

Environment Issues

  • Path mismatches - Test harnesses may expect files at specific locations different from where you found them
  • Stale state - Previous failed attempts may leave files in unexpected states
  • Permission issues - Filters may fail silently if they can't write output files

Analysis Mistakes

  • Assuming filter behavior - Always verify by reading the code; don't guess what's filtered
  • Ignoring processing order - A filter that removes <script> then <iframe> may be bypassed differently than one that does it in reverse
  • Missing recursive filtering - Some filters process until no more matches; others run once
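
The difference between a single pass and repeat-until-stable processing is easy to see on a toy string filter. The sketch below uses a neutral placeholder token, `<x>`, and is not modeled on any particular library; real filters that operate on a parsed tree behave differently.

```python
def strip_once(text: str, token: str) -> str:
    """Remove every occurrence of `token` found in a single pass."""
    return text.replace(token, "")


def strip_until_stable(text: str, token: str) -> str:
    """Keep removing `token` until none is left."""
    if not token:
        raise ValueError("token must be non-empty")
    while token in text:
        text = text.replace(token, "")
    return text


sample = "a<x<x>>b"
print(strip_once(sample, "<x>"))          # a<x>b  (removal created a new match)
print(strip_until_stable(sample, "<x>"))  # ab
```

When reading the filter code, check which of these two shapes it has; a single-pass filter is not idempotent on inputs like the sample above.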

Testing Mistakes

  • Browser-specific payloads - Techniques that work in one browser may fail in another
  • Deprecated HTML - Many classic XSS vectors no longer work in modern browsers
  • Premature optimization - Effort spent getting a complex payload through is wasted if a simpler one works; try the simple payload first

Verification Mistakes

  • Single test run - Flaky tests can pass once then fail
  • Modifying files after success - Any changes after a successful test may invalidate it
  • Ignoring test harness quirks - The test may measure success differently than expected