break-filter-js-from-html

Break Filter JS From HTML

Overview

This skill provides a systematic methodology for analyzing and bypassing HTML sanitization filters that attempt to prevent JavaScript execution. The focus is on understanding filter mechanics deeply before attempting bypasses, and on robust verification of solutions.

When to Use This Skill

Analyzing HTML sanitization filters to find bypass vectors
CTF challenges involving XSS filter evasion
Authorized security testing of web application input sanitization
Understanding parser differentials between server-side parsers and browsers

Phase 1: Environment and Filter Analysis

Before attempting any bypass, thoroughly understand the test environment and filter mechanics.

Environment Reconnaissance

Identify all relevant file locations - Locate the filter implementation, test harness, and any configuration files
Understand the test verification process - Determine how success is measured (browser alert, DOM inspection, etc.)
Verify path dependencies - Check if tests expect files at specific paths; create symlinks or copies if needed
Document the execution flow - Trace how input flows from your payload through the filter to the browser
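
The path-dependency step can be sketched as a small helper. This is a minimal sketch, assuming a POSIX-like environment; the function name `ensure_at_expected_path` and both paths are hypothetical, chosen for illustration only.

```python
import os
import shutil
from pathlib import Path


def ensure_at_expected_path(found: Path, expected: Path) -> Path:
    """Make `found` reachable at `expected` via a symlink, or a copy as fallback."""
    if expected.exists():
        return expected  # nothing to do; leave existing files untouched
    expected.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.symlink(found.resolve(), expected)
    except OSError:
        # Symlinks may be unavailable on some filesystems; copy instead.
        shutil.copy2(found, expected)
    return expected
```

A symlink keeps a single source of truth, so later edits to the original file are visible at the path the test harness reads. A copy can silently go stale.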

Filter Mechanism Analysis

Examine the filter code to understand:

Parsing library used - Different parsers (BeautifulSoup, DOMPurify, html-sanitizer, etc.) have different behaviors
What elements are removed - Script tags, iframes, objects, embeds, etc.
What attributes are stripped - Event handlers (on*), href with javascript:, etc.
Processing order - Does the filter run once or recursively? Are there multiple passes?
Output encoding - Is the output HTML-encoded, or passed through raw?

Create a Filter Output Test

Before running browser tests, create a quick method to see the filter's output directly:

```bash
# Example: Check what the filter outputs for a given input
echo '<script>alert(1)</script>' > /tmp/test.html && python filter.py /tmp/test.html && cat /tmp/test.html
```

This allows rapid iteration without slow browser-based testing.
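
The same check can be wrapped in a reusable function. The sketch below assumes, as in the shell command above, that the filter takes a file path as its final argument and rewrites that file in place. The name `run_filter` and the 30-second timeout are illustrative choices, not part of any existing harness.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_filter(payload: str, filter_cmd: list[str]) -> str:
    """Write payload to a temp file, run the filter on it, return the result.

    Assumes the filter accepts the file path as its last argument and
    modifies that file in place.
    """
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "test.html"
        target.write_text(payload, encoding="utf-8")
        subprocess.run([*filter_cmd, str(target)], check=True, timeout=30)
        return target.read_text(encoding="utf-8")


# Usage, with filter.py in the current directory:
#   print(run_filter("<script>alert(1)</script>", [sys.executable, "filter.py"]))
```

Using a fresh temporary directory per call avoids the stale-state problem of reusing `/tmp/test.html` across attempts.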

Phase 2: Bypass Strategy Selection

Based on the filter analysis, select appropriate bypass strategies. Order these by likelihood of success given the specific filter.

Parser Differential Exploits

Parser differentials occur when the server-side filter parses HTML differently than browsers. This is often the most effective approach for library-based filters.

Key concept: The filter's parser may interpret certain HTML constructs differently than browsers, allowing tags that appear "safe" to the filter to execute JavaScript in browsers.

Elements that commonly cause parser differentials:

`<noscript>` - Parsed differently depending on whether scripting is enabled (as raw text when enabled, as markup when disabled)
`<template>` - Content may not be parsed as HTML by some libraries
`<textarea>` and `<title>` - RCDATA parsing contexts
Comments and CDATA sections
Malformed or nested tags
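
One way to look for a differential is to dump what the server-side parser actually sees and compare it with the DOM the browser builds from the same markup. The sketch below uses Python's built-in `html.parser` purely as a stand-in. The real filter's parser (and backend, if it uses BeautifulSoup) should be substituted. `TagRecorder` and `parser_view` are hypothetical names.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Record the start tags and text nodes that html.parser reports."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text nodes
            self.events.append(("text", data))


def parser_view(markup: str) -> list[tuple[str, str]]:
    """Return the sequence of events the parser emits for `markup`."""
    recorder = TagRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.events
```

Calling `parser_view` on markup nested inside `<textarea>` or `<title>` shows whether the parser reports inner tags as elements or as text. Browsers treat that content as text, and the result from `html.parser` may vary between Python versions, so it is worth checking on the exact interpreter the filter runs under.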

Encoding and Obfuscation

HTML entity encoding (decimal, hex, named entities)
Unicode normalization issues
Double encoding
Null bytes and other special characters
Case variations (if filter is case-sensitive)
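
To find out which decoding steps a filter applies, it can help to feed it the same harmless probe string in several equivalent encodings and compare the outputs. The helpers below are a minimal sketch using only the standard library. The function names are illustrative.

```python
import html


def decimal_entities(text: str) -> str:
    """Encode every character as a decimal character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def hex_entities(text: str) -> str:
    """Encode every character as a hexadecimal character reference."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


def double_encoded(text: str) -> str:
    """Entity-encode once, then escape the ampersands of that encoding."""
    return html.escape(decimal_entities(text), quote=False)


def case_variants(tag: str) -> list[str]:
    """Lower, upper, and alternating-case forms of a tag name."""
    mixed = "".join(c.upper() if i % 2 else c.lower() for i, c in enumerate(tag))
    return [tag.lower(), tag.upper(), mixed]
```

If the filter's output for the decimal form matches its output for the plain form, the filter decodes entities before matching. If the double-encoded form comes back singly encoded, some layer is decoding exactly once.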

DOM Clobbering and Indirect Execution

Creating elements that shadow built-in properties
Exploiting existing JavaScript that reads from DOM
CSS-based attacks (if JavaScript reads computed styles)

Lesser-Known Vectors

SVG with embedded scripts or event handlers
MathML elements
XML processing instructions (if XHTML mode)
Data URIs in appropriate contexts

Phase 3: Systematic Testing

Testing Methodology

Test filter output first - Before browser testing, verify the filter passes your payload through
Use a minimal payload - Start with the simplest possible XSS (`alert(1)`) before complex payloads
Document each attempt - Record what was tried, filter output, and browser result
Understand failures - When a technique fails, determine if it was filtered or if the browser didn't execute it
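
Documenting attempts is easier with a fixed record shape. The sketch below is one possible layout, not a prescribed format. The `Attempt` class, its field names, and the `marker` convention (a substring that must survive filtering for the attempt to be worth a browser test) are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Attempt:
    technique: str      # short label for the idea being tested
    payload: str        # input given to the filter
    filter_output: str  # what the filter produced
    marker: str         # substring that must survive the filter
    browser_executed: Optional[bool] = None  # None until tried in a browser

    @property
    def outcome(self) -> str:
        """Separate 'filtered' from 'survived but the browser did not run it'."""
        if self.marker not in self.filter_output:
            return "filtered"
        if self.browser_executed is None:
            return "survived, untested"
        return "executed" if self.browser_executed else "survived, not executed"

    def to_json_line(self) -> str:
        """Serialize the attempt and its outcome as one JSON line."""
        return json.dumps({**asdict(self), "outcome": self.outcome})
```

Appending one JSON line per attempt to a log file gives a searchable history and makes the two failure modes easy to tell apart later.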

Efficient Iteration Pattern

1. Hypothesize a bypass based on filter analysis
2. Test against filter directly (fast)
3. If filter passes payload through, test in browser
4. If browser doesn't execute, investigate why
5. If filter blocks, analyze how and adjust approach
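
Steps 2 and 3 of this loop can be sketched as a triage function that runs every candidate through the filter and keeps only those worth a browser test. This is a minimal sketch: `triage` is a hypothetical name, and `filter_fn` stands for any callable that wraps the real filter.

```python
from typing import Callable, Iterable


def triage(
    candidates: Iterable[str],
    filter_fn: Callable[[str], str],
    marker: str,
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split candidates by whether `marker` survives the filter.

    Returns (survivors, blocked); each entry pairs a payload with the
    filter's output so blocked cases can be analyzed (step 5).
    """
    survivors, blocked = [], []
    for payload in candidates:
        output = filter_fn(payload)
        (survivors if marker in output else blocked).append((payload, output))
    return survivors, blocked
```

Marker survival is necessary but not sufficient: a surviving marker may be inert text, so the survivors still need the browser test in step 3.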

Avoid These Inefficiencies

Running slow browser tests for payloads that don't survive the filter
Moving to new techniques without understanding why previous ones failed
Trying browser-incompatible techniques (e.g., deprecated HTML features)

Phase 4: Verification

Robust Solution Verification

A single passing test is insufficient. Verify solutions thoroughly:

Run multiple times - Ensure the solution works consistently, not just once
Test filter idempotency - Run the filtered output through the filter again and confirm that the result is unchanged and the solution still works
Check for timing issues - Browser-based tests may have race conditions
Verify in isolation - Test the filtered HTML directly in a browser outside the test harness
Document exact steps - Record the precise sequence to reproduce the successful bypass
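
The repeat-run and idempotency checks above can be sketched as two small helpers. Both names are hypothetical; `filter_fn` wraps the filter under test, and `check` is any zero-argument callable that returns `True` when the end-to-end test passes.

```python
from typing import Callable


def output_is_stable(filter_fn: Callable[[str], str], payload: str) -> bool:
    """True if filtering the already-filtered output changes nothing."""
    once = filter_fn(payload)
    return filter_fn(once) == once


def passes_consistently(check: Callable[[], bool], runs: int = 5) -> bool:
    """True only if `check` succeeds on every one of `runs` consecutive calls."""
    return all(check() for _ in range(runs))
```

The default of five runs is arbitrary. For browser tests with suspected race conditions, a higher count gives more confidence at the cost of time.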

Before Declaring Success

Confirm the test passes multiple consecutive runs
Verify no pending file modifications could invalidate the solution
Ensure the solution doesn't depend on test environment quirks
Check that the final state of all files is correct
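
One way to confirm that nothing changed after the last passing run is to hash the relevant files right after success and compare again before finishing. This is a minimal sketch. `snapshot` and `changed_since` are illustrative names, and the choice of which files to track is left to the reader.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def snapshot(paths: Iterable[str]) -> dict[str, str]:
    """Map each path to the SHA-256 hex digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_since(before: dict[str, str]) -> list[str]:
    """Return the paths whose contents differ from the earlier snapshot."""
    after = snapshot(before.keys())
    return sorted(p for p in before if before[p] != after[p])
```

An empty list from `changed_since` means the files are byte-for-byte what they were when the test passed. Any path that appears calls for a re-run.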

Common Pitfalls

Environment Issues

Path mismatches - Test harnesses may expect files at specific locations different from where you found them
Stale state - Previous failed attempts may leave files in unexpected states
Permission issues - Filters may fail silently if they can't write output files
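
A short preflight check can surface path and permission problems before they show up as silent failures. The sketch below is illustrative only: `preflight` is a hypothetical helper, and `os.access` reports permissions for the current user, which may differ from the user the test harness runs as.

```python
import os
from pathlib import Path
from typing import Iterable


def preflight(paths_to_read: Iterable[str], paths_to_write: Iterable[str]) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    for p in paths_to_read:
        if not Path(p).is_file():
            problems.append(f"missing: {p}")
        elif not os.access(p, os.R_OK):
            problems.append(f"not readable: {p}")
    for p in paths_to_write:
        target = Path(p)
        # An existing file must be writable; otherwise its directory must be.
        probe = target if target.exists() else target.parent
        if not os.access(probe, os.W_OK):
            problems.append(f"not writable: {p}")
    return problems
```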

Analysis Mistakes

Assuming filter behavior - Always verify by reading the code; don't guess what's filtered
Ignoring processing order - A filter that removes `<script>` then `<iframe>` may be bypassed differently than one that does it in reverse
Missing recursive filtering - Some filters process until no more matches; others run once
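
The difference between a single-pass and a run-until-stable filter shows up in a toy example. Both functions below are deliberately naive string filters written for illustration. They are not taken from any real sanitization library.

```python
def strip_once(markup: str, needle: str = "<script>") -> str:
    """Toy single-pass filter: remove every occurrence found in one scan."""
    return markup.replace(needle, "")


def strip_until_stable(markup: str, needle: str = "<script>") -> str:
    """Toy recursive filter: repeat the removal until nothing matches."""
    while needle in markup:
        markup = markup.replace(needle, "")
    return markup


nested = "<scr<script>ipt>"
print(strip_once(nested))          # prints: <script>
print(strip_until_stable(nested))  # prints an empty line
```

The single pass removes the inner match, and the surrounding fragments join into a new match that is never rescanned. Reading the filter code to see which of these two shapes it has is quicker than discovering it by trial and error.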

Testing Mistakes

Browser-specific payloads - Techniques that work in one browser may fail in another
Deprecated HTML - Many classic XSS vectors no longer work in modern browsers
Premature optimization - Getting a complex payload through is worthless if a simpler one works

Verification Mistakes

Single test run - Flaky tests can pass once then fail
Modifying files after success - Any changes after a successful test may invalidate it
Ignoring test harness quirks - The test may measure success differently than expected
