fuzzing-dictionary

Fuzzing Dictionary

A fuzzing dictionary provides domain-specific tokens to guide the fuzzer toward interesting inputs. Instead of purely random mutations, the fuzzer incorporates known keywords, magic numbers, protocol commands, and format-specific strings that are more likely to reach deeper code paths in parsers, protocol handlers, and file format processors.

Overview

Dictionaries are text files containing quoted strings that represent meaningful tokens for your target. They help fuzzers bypass early validation checks and explore code paths that would be difficult to reach through blind mutation alone.

Key Concepts

Concept	Description
Dictionary Entry	A quoted string (e.g., `"keyword"`) or key-value pair (e.g., `kw="value"`)
Hex Escapes	Byte sequences like `"\xF7\xF8"` for non-printable characters
Token Injection	Fuzzer inserts dictionary entries into generated inputs
Cross-Fuzzer Format	Dictionary files work with libFuzzer, AFL++, and cargo-fuzz

When to Apply

Apply this technique when:

Fuzzing parsers (JSON, XML, config files)
Fuzzing protocol implementations (HTTP, DNS, custom protocols)
Fuzzing file format handlers (PNG, PDF, media codecs)
Coverage plateaus early without reaching deeper logic
Target code checks for specific keywords or magic values

Skip this technique when:

Fuzzing pure algorithms without format expectations
Target has no keyword-based parsing
Corpus already achieves high coverage

Quick Reference

Task	Command/Pattern
Use with libFuzzer	`./fuzz -dict=./dictionary.dict ...`
Use with AFL++	`afl-fuzz -x ./dictionary.dict ...`
Use with cargo-fuzz	`cargo fuzz run fuzz_target -- -dict=./dictionary.dict`
Extract from header	`grep -o '".*"' header.h > header.dict`
Generate from binary	`strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict`

Step-by-Step

Step 1: Create Dictionary File

Create a text file with one quoted string per line. Use comments (lines starting with `#`) for documentation.

Example dictionary format:

conf

# Lines starting with '#' and empty lines are ignored.

# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"
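The format rules above can be checked mechanically before a fuzzing run. The following is a minimal sketch, not part of any fuzzer: `check_dict` is a hypothetical helper name, and the pattern assumes the only accepted escapes are `\\`, `\"`, and `\xNN`, as described in this step.

```bash
#!/usr/bin/env bash
# check_dict FILE (hypothetical helper): print every line that is neither
# blank, a '#' comment, nor a well-formed entry. Returns 0 if all lines pass.
check_dict() {
    # Blank lines and comment lines are always acceptable.
    local skip='^[[:space:]]*(#.*)?$'
    # Optional name and '=', then a quoted string whose only escapes are
    # \\ , \" and \xNN. Nothing may follow the closing quote.
    local entry='^[[:space:]]*([^"=[:space:]]+[[:space:]]*=[[:space:]]*)?"([^"\\]|\\\\|\\"|\\x[0-9A-Fa-f]{2})*"[[:space:]]*$'
    # With -v and two patterns, grep prints lines matching neither one.
    if grep -Evn -e "$skip" -e "$entry" "$1"; then
        return 1
    fi
    return 0
}
```

Running `check_dict dictionary.dict` prints offending lines with their line numbers. Several entries placed on a single line are reported too, since nothing may follow the closing quote.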

Step 2: Generate Dictionary Content

Choose a generation method based on what's available:

From LLM: Prompt ChatGPT or Claude with:

text

A dictionary can be used to guide the fuzzer. Write me a dictionary file for fuzzing a <PNG parser>. Each line should be a quoted string or key-value pair like kw="value". Include magic bytes, chunk types, and common header values. Use hex escapes like "\xF7\xF8" for binary values.

From header files:

bash

grep -o '".*"' header.h > header.dict

From man pages (for CLI tools):

bash

man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict

From binary strings:

bash

strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict
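The `sed` wrappers above add quotes but do not escape quotes or backslashes that already appear in a token, which can yield entries the fuzzer rejects. A minimal sketch of a safer wrapper follows; `to_dict` is a hypothetical helper name, and it assumes one raw printable token per input line.

```bash
#!/usr/bin/env bash
# to_dict (hypothetical helper): read raw tokens from stdin, one per line,
# and print escaped, quoted, de-duplicated dictionary entries.
to_dict() {
    # Escape backslashes first, then double quotes, then wrap in quotes.
    # LC_ALL=C keeps the sort order independent of the user's locale.
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/^/"/' -e 's/$/"/' | LC_ALL=C sort -u
}
```

For example, `strings ./binary | to_dict > strings.dict` would replace the last pipeline above. Filtering the result down to relevant tokens is still a manual step.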

Step 3: Pass Dictionary to Fuzzer

Use the appropriate flag for your fuzzer (see Quick Reference above).

Common Patterns

Pattern: Protocol Keywords

Use Case: Fuzzing HTTP or custom protocol handlers

Dictionary content:

conf

# HTTP methods
"GET"
"POST"
"PUT"
"DELETE"
"HEAD"

# Headers
"Content-Type"
"Authorization"
"Host"

# Protocol markers
"HTTP/1.1"
"HTTP/2.0"

Pattern: Magic Bytes and File Format Headers

Use Case: Fuzzing image parsers, media decoders, archive handlers

Dictionary content:

conf

# PNG magic bytes and chunks
# (CR, LF and other control bytes are written as \xNN; \r and \n are not valid escapes)
png_magic="\x89PNG\x0D\x0A\x1A\x0A"
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"

# JPEG markers
jpeg_soi="\xFF\xD8"
jpeg_eoi="\xFF\xD9"
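Magic-byte entries like the ones above can be derived from a known-good sample file rather than typed by hand. The sketch below assumes standard `head`, `od`, `tr`, and `sed`; `magic_entry` is a hypothetical helper name, and the default of 8 bytes is an arbitrary choice that happens to match the PNG signature length.

```bash
#!/usr/bin/env bash
# magic_entry NAME FILE [COUNT] (hypothetical helper): print a dictionary
# entry NAME="\xNN..." built from the first COUNT bytes of FILE (default 8).
magic_entry() {
    local name=$1 file=$2 count=${3:-8}
    local hex
    # Dump the leading bytes as plain hex digits with no offsets or spaces.
    hex=$(head -c "$count" "$file" | od -An -v -tx1 | tr -d ' \n')
    # Prefix every two-digit pair with \x to form the escaped string.
    printf '%s="%s"\n' "$name" "$(printf '%s' "$hex" | sed 's/../\\x&/g')"
}
```

For instance, `magic_entry png_magic sample.png >> formats.dict` appends a fully hex-escaped signature. Escaping every byte is more verbose than mixing printable characters with escapes, but it avoids any question about which characters need escaping.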

Pattern: Configuration File Keywords

Use Case: Fuzzing config file parsers (YAML, TOML, INI)

Dictionary content:

conf

# Common config keywords
"true"
"false"
"null"
"version"
"enabled"
"disabled"

# Section headers
"[general]"
"[network]"
"[security]"

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Combine multiple generation methods	LLM-generated keywords plus strings extracted from the binary cover a broader surface
Include boundary values	`"0"`, `"-1"`, `"2147483647"` trigger edge cases
Add format delimiters	`:`, `=`, `{`, `}` help the fuzzer construct valid structures
Keep dictionaries focused	50-200 entries perform better than thousands
Test dictionary effectiveness	Run with and without dict, compare coverage
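The last tip, comparing runs with and without a dictionary, can be sketched for libFuzzer by reading the final `cov:` value from each run's log. This is a rough sketch under assumptions: `last_cov` and `compare_cov` are hypothetical helper names, and the logs are assumed to contain libFuzzer's usual status lines of the form `#N NEW cov: X ft: Y ...`.

```bash
#!/usr/bin/env bash
# last_cov LOG (hypothetical helper): print the final "cov:" value in a log.
last_cov() {
    grep -o 'cov: [0-9]*' "$1" | tail -n 1 | cut -d' ' -f2
}

# compare_cov BASELINE_LOG DICT_LOG (hypothetical helper): report both
# coverage values and the difference between them.
compare_cov() {
    local base dict
    base=$(last_cov "$1")
    dict=$(last_cov "$2")
    # Treat a log without any status line as zero coverage.
    base=${base:-0}
    dict=${dict:-0}
    echo "without dict: $base, with dict: $dict, delta: $((dict - base))"
}
```

The logs could come from runs such as `./fuzz -max_total_time=60 corpus_a/ 2> nodict.log` and `./fuzz -dict=./dictionary.dict -max_total_time=60 corpus_b/ 2> dict.log` (libFuzzer writes its status lines to stderr). Because fuzzing is randomized, a single pair of runs is only a hint; repeating the comparison a few times gives a more reliable picture.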

Auto-Generated Dictionaries (AFL++)

When using the `afl-clang-lto` compiler, AFL++ automatically extracts dictionary entries from string comparisons in the binary. This happens at compile time via the AUTODICTIONARY feature.

Enable auto-dictionary:

bash

# AFL_LLVM_DICT2FILE should be an absolute path
export AFL_LLVM_DICT2FILE="$PWD/auto.dict"
afl-clang-lto++ target.cc -o target
# Dictionary saved to auto.dict
afl-fuzz -x auto.dict -i in -o out -- ./target

Combining Multiple Dictionaries

Some fuzzers support multiple dictionary files:

bash

# AFL++ with multiple dictionaries
afl-fuzz -x keywords.dict -x formats.dict -i in -o out -- ./target
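For a fuzzer that is given a single dictionary file, as in the libFuzzer and cargo-fuzz invocations in this document, several dictionaries can be merged into one beforehand. A minimal sketch, with `merge_dicts` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# merge_dicts FILE... (hypothetical helper): concatenate dictionaries,
# drop comments and blank lines, and remove exact duplicate entries.
merge_dicts() {
    # LC_ALL=C keeps the sort order independent of the user's locale.
    cat "$@" | grep -Ev '^[[:space:]]*(#.*)?$' | LC_ALL=C sort -u
}
```

For example, `merge_dicts keywords.dict formats.dict > merged.dict` followed by `./fuzz -dict=./merged.dict corpus/`. Comments are lost in the merged file, so it is best treated as a build artifact while the commented source dictionaries stay under version control.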

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Including full sentences	Fuzzer needs atomic tokens, not prose	Break into individual keywords
Duplicating entries	Wastes mutation budget	Use `sort -u` to deduplicate
Over-sized dictionaries	Slows fuzzer, dilutes useful tokens	Keep focused: 50-200 most relevant entries
Missing hex escapes	Non-printable bytes become mangled	Use `\xXX` for binary values
No comments	Hard to maintain and audit	Document sections with `#` comments
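Two of the anti-patterns above, duplicated entries and over-sized dictionaries, are easy to detect with standard tools. The sketch below is one way to do it; `dict_report` is a hypothetical helper name, and it only catches byte-identical lines (it will not notice `a="GET"` duplicating `"GET"`).

```bash
#!/usr/bin/env bash
# dict_report FILE (hypothetical helper): print how many entries a
# dictionary holds and how many distinct entries appear more than once.
dict_report() {
    local entries dupes
    # Entries are all lines that are neither blank nor '#' comments.
    entries=$(grep -Ev '^[[:space:]]*(#.*)?$' "$1" || true)
    # grep -c . counts the non-empty lines it is given.
    echo "entries: $(printf '%s\n' "$entries" | grep -c .)"
    dupes=$(printf '%s\n' "$entries" | sort | uniq -d)
    echo "duplicates: $(printf '%s\n' "$dupes" | grep -c .)"
}
```

An entry count far above the 50-200 range suggested in the table, or a non-zero duplicate count, is a cue to prune the file.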

Tool-Specific Guidance

libFuzzer

bash

clang++ -fsanitize=fuzzer,address harness.cc -o fuzz
./fuzz -dict=./dictionary.dict corpus/

Integration tips:

Dictionary tokens are inserted/replaced during mutations
Combine with `-max_len` to control input size
Use `-print_final_stats=1` to see dictionary effectiveness metrics
Dictionary entries longer than `-max_len` are ignored

AFL++

bash

afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@

Integration tips:

AFL++ supports multiple `-x` flags for multiple dictionaries
Use `AFL_LLVM_DICT2FILE` with `afl-clang-lto` for auto-generated dictionaries
Dictionary effectiveness shown in fuzzer stats UI
Tokens are used during deterministic and havoc stages

cargo-fuzz (Rust)

bash

cargo fuzz run fuzz_target -- -dict=./dictionary.dict

Integration tips:

cargo-fuzz uses the libFuzzer backend, so all libFuzzer dict flags work
Place dictionary file in `fuzz/` directory alongside harness

Reference from harness directory:

cargo fuzz run target -- -dict=../dictionary.dict

go-fuzz (Go)

go-fuzz does not have built-in dictionary support, but you can manually seed the corpus with dictionary entries:

bash

# Convert dictionary to corpus files (one raw token per file, named by its MD5).
# Escapes such as \xNN are written literally, not decoded.
mkdir -p corpus
grep -o '".*"' dict.txt | sed 's/^"//; s/"$//' | while IFS= read -r token; do
    printf '%s' "$token" > "corpus/$(printf '%s' "$token" | md5sum | cut -d' ' -f1)"
done

go-fuzz -bin=./target-fuzz.zip -workdir=.

Troubleshooting

Issue	Cause	Solution
Dictionary file not loaded	Wrong path or format error	Check fuzzer output for dict parsing errors; verify file format
No coverage improvement	Dictionary tokens not relevant	Analyze target code for actual keywords; try different generation method
Syntax errors in dict file	Unescaped quotes or invalid escapes	Use `\\` for backslash, `\"` for quotes; validate with test run
Fuzzer ignores long entries	Entries exceed `-max_len`	Keep entries under max input length, or increase `-max_len`
Too many entries slow fuzzer	Dictionary too large	Prune to 50-200 most relevant entries

Related Skills

Skill	How It Applies
libfuzzer	Native dictionary support via `-dict=` flag
aflpp	Native dictionary support via `-x` flag; auto-generation with AUTODICTIONARY
cargo-fuzz	Uses libFuzzer backend, inherits `-dict=` support

Skill	Relationship
fuzzing-corpus	Dictionaries complement corpus: corpus provides structure, dictionary provides keywords
coverage-analysis	Use coverage data to validate dictionary effectiveness
harness-writing	Harness structure determines which dictionary tokens are useful

Key External Resources

AFL++ Dictionaries Pre-built dictionaries for common formats (HTML, XML, JSON, SQL, etc.). Good starting point for format-specific fuzzing.

libFuzzer Dictionary Documentation Official libFuzzer documentation on dictionary format and usage. Explains token insertion strategy and performance implications.

Additional Examples

OSS-Fuzz Dictionaries Real-world dictionaries from Google's continuous fuzzing service. Search project directories for `*.dict` files to see production examples.
