fuzzing-dictionary
Fuzzing Dictionary
A fuzzing dictionary provides domain-specific tokens to guide the fuzzer toward interesting inputs. Instead of purely random mutations, the fuzzer incorporates known keywords, magic numbers, protocol commands, and format-specific strings that are more likely to reach deeper code paths in parsers, protocol handlers, and file format processors.
Overview
Dictionaries are text files containing quoted strings that represent meaningful tokens for your target. They help fuzzers bypass early validation checks and explore code paths that would be difficult to reach through blind mutation alone.
Key Concepts
| Concept | Description |
|---|---|
| Dictionary Entry | A quoted string (e.g., `"keyword"`) or a key-value pair (e.g., `kw="value"`) |
| Hex Escapes | Byte sequences like `"\xF7\xF8"` for non-printable characters |
| Token Injection | The fuzzer inserts dictionary entries into generated inputs |
| Cross-Fuzzer Format | The same dictionary file works with libFuzzer, AFL++, and cargo-fuzz |
When to Apply
Apply this technique when:
- Fuzzing parsers (JSON, XML, config files)
- Fuzzing protocol implementations (HTTP, DNS, custom protocols)
- Fuzzing file format handlers (PNG, PDF, media codecs)
- Coverage plateaus early without reaching deeper logic
- Target code checks for specific keywords or magic values
Skip this technique when:
- Fuzzing pure algorithms without format expectations
- Target has no keyword-based parsing
- Corpus already achieves high coverage
Quick Reference
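| Task | Command/Pattern |
|---|---|
| Use with libFuzzer | `./fuzz -dict=./dictionary.dict corpus/` |
| Use with AFL++ | `afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@` |
| Use with cargo-fuzz | `cargo fuzz run fuzz_target -- -dict=./dictionary.dict` |
| Extract from header | `grep -o '".*"' header.h > header.dict` |
| Generate from binary | `strings ./binary \| sed 's/^/"&/; s/$/&"/' > strings.dict` |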
Step-by-Step
Step 1: Create Dictionary File
Create a text file with one quoted string per line. Use comments (`#`) for documentation.
Example dictionary format:
```conf
# Lines starting with '#' and empty lines are ignored.
# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"
```
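A malformed line is easier to catch before a fuzzing run than during one. The sketch below checks a dictionary against the rules listed in Step 1 (comments, blank lines, an optional `name=` prefix, and backslash escapes inside the quotes). `lint_dict` is a hypothetical helper name, and its pattern is an approximation: it accepts any single character after a backslash, so it is looser than a real fuzzer's parser.

```bash
#!/usr/bin/env bash
# lint_dict (hypothetical helper): read a dictionary on stdin and print every
# line that is not a comment, a blank line, "token", or name="token".
# Returns 0 when all lines look well formed, 1 otherwise.
lint_dict() {
  # Comment lines and blank lines are skipped.
  local skip='^[[:space:]]*(#.*)?$'
  # Inside the quotes: any character except a bare quote or backslash,
  # or a backslash followed by one character (covers \\, \" and \xAB).
  local entry='^[[:space:]]*([A-Za-z0-9_]+=)?"([^"\\]|\\.)*"[[:space:]]*$'
  local line n=0 bad=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if [[ $line =~ $skip ]]; then
      continue
    fi
    if ! [[ $line =~ $entry ]]; then
      printf 'line %d: %s\n' "$n" "$line"
      bad=1
    fi
  done
  return "$bad"
}
```

A typical call would be `lint_dict < dictionary.dict`, which prints nothing when every line matches.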
Step 2: Generate Dictionary Content
Choose a generation method based on what's available:
From LLM: Prompt ChatGPT or Claude with:
```text
A dictionary can be used to guide the fuzzer. Write me a dictionary file for fuzzing a <PNG parser>. Each line should be a quoted string or key-value pair like kw="value". Include magic bytes, chunk types, and common header values. Use hex escapes like "\xF7\xF8" for binary values.
```
From header files:
```bash
grep -o '".*"' header.h > header.dict
```
From man pages (for CLI tools):
```bash
man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict
```
From binary strings:
```bash
strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict
```
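Wrapping raw strings in quotes produces invalid entries whenever a string itself contains a quote or a backslash. As a minimal sketch, the filter below escapes those two characters before quoting, following the escape rules from Step 1. `dict_quote` is a hypothetical name, and the filter assumes one printable string per input line, as `strings` produces.

```bash
#!/usr/bin/env bash
# dict_quote (hypothetical helper): turn raw strings on stdin (one per line)
# into quoted dictionary entries.
dict_quote() {
  # 1. drop empty lines
  # 2. escape backslashes first, so later escapes are not doubled
  # 3. escape double quotes
  # 4. wrap each line in quotes, then sort and de-duplicate
  sed -e '/^$/d' \
      -e 's/\\/\\\\/g' \
      -e 's/"/\\"/g' \
      -e 's/^/"/' \
      -e 's/$/"/' | LC_ALL=C sort -u
}
```

For example, `strings ./binary | dict_quote > strings.dict` would replace the plain `sed` quoting step.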
Step 3: Pass Dictionary to Fuzzer
Use the appropriate flag for your fuzzer (see Quick Reference above).
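For scripts that drive more than one fuzzer, the flag choice can be kept in one place. The sketch below maps a fuzzer name to its dictionary argument. `dict_args` and the fuzzer labels (`libfuzzer`, `aflpp`, `cargo-fuzz`) are illustrative names chosen here, not part of any tool.

```bash
#!/usr/bin/env bash
# dict_args (hypothetical helper): print the dictionary argument for a fuzzer.
#   $1 = fuzzer label, $2 = path to the dictionary file
dict_args() {
  local fuzzer=$1 dict=$2
  case "$fuzzer" in
    # cargo-fuzz forwards arguments after "--" to libFuzzer, so the flag is the same
    libfuzzer | cargo-fuzz) printf -- '-dict=%s\n' "$dict" ;;
    aflpp) printf -- '-x %s\n' "$dict" ;;
    *)
      printf 'unknown fuzzer: %s\n' "$fuzzer" >&2
      return 1
      ;;
  esac
}
```

The output is meant to be expanded unquoted so that `-x` and the path become separate words, for example `afl-fuzz $(dict_args aflpp ./dictionary.dict) -i input/ -o output/ -- ./target @@`. That usage assumes the dictionary path contains no spaces.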
Common Patterns
Pattern: Protocol Keywords
Use Case: Fuzzing HTTP or custom protocol handlers
Dictionary content:
```conf
# HTTP methods
"GET"
"POST"
"PUT"
"DELETE"
"HEAD"
# Headers
"Content-Type"
"Authorization"
"Host"
# Protocol markers
"HTTP/1.1"
"HTTP/2.0"
```
Pattern: Magic Bytes and File Format Headers
Use Case: Fuzzing image parsers, media decoders, archive handlers
Dictionary content:
```conf
# PNG magic bytes and chunks
# (CR and LF are written as \x0d and \x0a; the format only defines \\, \" and \xAB escapes)
png_magic="\x89PNG\x0d\x0a\x1a\x0a"
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"
# JPEG markers
jpeg_soi="\xFF\xD8"
jpeg_eoi="\xFF\xD9"
```
Pattern: Configuration File Keywords
Use Case: Fuzzing config file parsers (YAML, TOML, INI)
Dictionary content:
```conf
# Common config keywords
"true"
"false"
"null"
"version"
"enabled"
"disabled"
# Section headers
"[general]"
"[network]"
"[security]"
```
Advanced Usage
Tips and Tricks
| Tip | Why It Helps |
|---|---|
| Combine multiple generation methods | LLM-generated keywords plus strings from the binary cover a broader surface |
| Include boundary values | Edge values such as `"\x00"` and `"\xFF"` exercise boundary checks |
| Add format delimiters | Delimiters such as `":"`, `"="`, and `"{"` help the fuzzer assemble structured inputs |
| Keep dictionaries focused | 50-200 entries perform better than thousands |
| Test dictionary effectiveness | Run with and without the dictionary, then compare coverage |
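One rough way to compare runs with and without a dictionary is to read the final coverage counter from each log. libFuzzer status lines include a `cov: N` field, and the sketch below extracts the last one. `final_cov` is a hypothetical helper name, and the approach assumes both runs start from equal copies of the corpus and run for the same duration.

```bash
#!/usr/bin/env bash
# final_cov (hypothetical helper): print the last "cov: N" value seen in a
# libFuzzer log read from stdin.
final_cov() {
  grep -o 'cov: [0-9][0-9]*' | tail -n 1 | cut -d' ' -f2
}
```

A comparison might capture each run's log with `./fuzz -max_total_time=60 corpus_a/ 2>&1 | final_cov` and `./fuzz -dict=./dictionary.dict -max_total_time=60 corpus_b/ 2>&1 | final_cov`, where `corpus_a/` and `corpus_b/` are placeholder names for two copies of the same seed corpus. Single runs are noisy, so repeating each configuration several times gives a fairer picture.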
Auto-Generated Dictionaries (AFL++)
When using the `afl-clang-lto` compiler, AFL++ automatically extracts dictionary entries from string comparisons in the binary. This happens at compile time via the AUTODICTIONARY feature.
Enable auto-dictionary:
```bash
export AFL_LLVM_DICT2FILE=auto.dict
afl-clang-lto++ target.cc -o target
# Dictionary saved to auto.dict
afl-fuzz -x auto.dict -i in -o out -- ./target
```
Combining Multiple Dictionaries
Some fuzzers support multiple dictionary files:
```bash
# AFL++ with multiple dictionaries
afl-fuzz -x keywords.dict -x formats.dict -i in -o out -- ./target
```
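When an invocation takes only one dictionary file, the files can be merged beforehand. The sketch below concatenates several dictionaries, drops comments and blank lines, and removes exact duplicate lines while keeping first-seen order. `merge_dicts` is a hypothetical helper name. It compares whole lines, so `"GET"` and `method="GET"` count as different entries.

```bash
#!/usr/bin/env bash
# merge_dicts (hypothetical helper): merge the dictionary files given as
# arguments into one de-duplicated stream on stdout.
merge_dicts() {
  # Skip lines that are blank or start with '#', then print each remaining
  # line only the first time it is seen.
  awk '!/^[ \t]*(#|$)/ && !seen[$0]++' "$@"
}
```

For example, `merge_dicts keywords.dict formats.dict > combined.dict` writes a single file that can then be passed with `-dict=` or `-x`.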
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Including full sentences | Fuzzer needs atomic tokens, not prose | Break into individual keywords |
| Duplicating entries | Wastes mutation budget | Use `sort -u` to remove duplicates |
| Over-sized dictionaries | Slows fuzzer, dilutes useful tokens | Keep focused: 50-200 most relevant entries |
| Missing hex escapes | Non-printable bytes become mangled | Use `\xAB` escapes for binary values |
| No comments | Hard to maintain and audit | Document sections with `#` comments |
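To check a dictionary against the 50-200 entry guideline, it is enough to count the lines that are neither comments nor blank. This is a minimal sketch, and `count_entries` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# count_entries (hypothetical helper): count dictionary entries on stdin,
# ignoring comment lines and blank lines.
count_entries() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the helper
  # usable in scripts that run with "set -e".
  grep -c -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' || true
}
```

Running `count_entries < dictionary.dict` prints the number of entries the fuzzer would load, assuming every remaining line is a valid entry.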
Tool-Specific Guidance
libFuzzer
```bash
clang++ -fsanitize=fuzzer,address harness.cc -o fuzz
./fuzz -dict=./dictionary.dict corpus/
```
Integration tips:
- Dictionary tokens are inserted/replaced during mutations
- Combine with `-max_len` to control input size
- Use `-print_final_stats=1` to see dictionary effectiveness metrics
- Dictionary entries longer than `-max_len` are ignored
AFL++
```bash
afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@
```
Integration tips:
- AFL++ supports multiple `-x` flags for multiple dictionaries
- Use `AFL_LLVM_DICT2FILE` with `afl-clang-lto` for auto-generated dictionaries
- Dictionary effectiveness is shown in the fuzzer stats UI
- Tokens are used during deterministic and havoc stages
cargo-fuzz (Rust)
```bash
cargo fuzz run fuzz_target -- -dict=./dictionary.dict
```
Integration tips:
- cargo-fuzz uses the libFuzzer backend, so all libFuzzer dict flags work
- Place the dictionary file in the `fuzz/` directory alongside the harness
- Reference from harness directory: `cargo fuzz run target -- -dict=../dictionary.dict`
go-fuzz (Go)
go-fuzz does not have built-in dictionary support, but you can manually seed the corpus with dictionary entries:
```bash
# Convert dictionary to corpus files
mkdir -p corpus
grep -o '".*"' dict.txt | while IFS= read -r line; do
  # -r keeps backslash escapes intact; the file name is the MD5 of the entry
  printf '%s' "$line" | base64 > "corpus/$(printf '%s\n' "$line" | md5sum | cut -d' ' -f1)"
done
go-fuzz -bin=./target-fuzz.zip -workdir=.
```
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Dictionary file not loaded | Wrong path or format error | Check fuzzer output for dict parsing errors; verify file format |
| No coverage improvement | Dictionary tokens not relevant | Analyze target code for actual keywords; try a different generation method |
| Syntax errors in dict file | Unescaped quotes or invalid escapes | Use `\\` for backslash and `\"` for quotes |
| Fuzzer ignores long entries | Entries exceed `-max_len` | Keep entries under max input length, or increase `-max_len` |
| Too many entries slow fuzzer | Dictionary too large | Prune to 50-200 most relevant entries |
Related Skills
Tools That Use This Technique
| Skill | How It Applies |
|---|---|
| libfuzzer | Native dictionary support via the `-dict=` flag |
| aflpp | Native dictionary support via the `-x` flag |
| cargo-fuzz | Uses libFuzzer backend, inherits the `-dict=` flag |
Related Techniques
| Skill | Relationship |
|---|---|
| fuzzing-corpus | Dictionaries complement corpus: corpus provides structure, dictionary provides keywords |
| coverage-analysis | Use coverage data to validate dictionary effectiveness |
| harness-writing | Harness structure determines which dictionary tokens are useful |
Resources
Key External Resources
AFL++ Dictionaries
Pre-built dictionaries for common formats (HTML, XML, JSON, SQL, etc.). Good starting point for format-specific fuzzing.
libFuzzer Dictionary Documentation
Official libFuzzer documentation on dictionary format and usage. Explains token insertion strategy and performance implications.
Additional Examples
OSS-Fuzz Dictionaries
Real-world dictionaries from Google's continuous fuzzing service. Search project directories for `*.dict` files to see production examples.