fuzzing-dictionary

Fuzzing Dictionary

A fuzzing dictionary provides domain-specific tokens to guide the fuzzer toward interesting inputs. Instead of purely random mutations, the fuzzer incorporates known keywords, magic numbers, protocol commands, and format-specific strings that are more likely to reach deeper code paths in parsers, protocol handlers, and file format processors.

Overview

Dictionaries are text files containing quoted strings that represent meaningful tokens for your target. They help fuzzers bypass early validation checks and explore code paths that would be difficult to reach through blind mutation alone.

Key Concepts

| Concept | Description |
|---|---|
| Dictionary Entry | A quoted string (e.g., `"keyword"`) or key-value pair (e.g., `kw="value"`) |
| Hex Escapes | Byte sequences like `"\xF7\xF8"` for non-printable characters |
| Token Injection | Fuzzer inserts dictionary entries into generated inputs |
| Cross-Fuzzer Format | Dictionary files work with libFuzzer, AFL++, and cargo-fuzz |
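
To make token injection concrete, the sketch below splices a token into an input string at a chosen offset, once by inserting and once by overwriting. It is a toy illustration in bash and not the mutator of libFuzzer or AFL++; the function names `inject_token` and `overwrite_token` are hypothetical, and offsets are counted in characters.

```bash
#!/usr/bin/env bash
# Toy illustration only: real fuzzers pick tokens and positions themselves.

# Insert TOKEN into INPUT at offset POS, shifting the rest of the input right.
inject_token() {
    local input=$1 token=$2 pos=$3
    printf '%s%s%s\n' "${input:0:pos}" "$token" "${input:pos}"
}

# Overwrite the characters at offset POS with TOKEN instead of shifting them.
overwrite_token() {
    local input=$1 token=$2 pos=$3
    local end=$((pos + ${#token}))
    printf '%s%s%s\n' "${input:0:pos}" "$token" "${input:end}"
}
```

For example, `inject_token 'AAAA /x' 'GET' 0` prints `GETAAAA /x`, while `overwrite_token 'AAAA /x' 'GET' 0` prints `GETA /x`. Either result starts with a keyword that blind byte mutation would rarely assemble by chance.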

When to Apply

Apply this technique when:
  • Fuzzing parsers (JSON, XML, config files)
  • Fuzzing protocol implementations (HTTP, DNS, custom protocols)
  • Fuzzing file format handlers (PNG, PDF, media codecs)
  • Coverage plateaus early without reaching deeper logic
  • Target code checks for specific keywords or magic values
Skip this technique when:
  • Fuzzing pure algorithms without format expectations
  • Target has no keyword-based parsing
  • Corpus already achieves high coverage

Quick Reference

Task and command/pattern:
  • Use with libFuzzer: `./fuzz -dict=./dictionary.dict ...`
  • Use with AFL++: `afl-fuzz -x ./dictionary.dict ...`
  • Use with cargo-fuzz: `cargo fuzz run fuzz_target -- -dict=./dictionary.dict`
  • Extract from header: `grep -o '"[^"]*"' header.h > header.dict`
  • Generate from binary: `strings ./binary | sed 's/[\\"]/\\&/g; s/^/"/; s/$/"/' > strings.dict`

Step-by-Step

Step 1: Create Dictionary File

Create a text file with one quoted string per line. Use comments (`#`) for documentation.
Example dictionary format:
```conf
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"
```
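
The format rules above can be checked mechanically before a fuzzing run. The sketch below is a deliberately conservative bash check and not the parser of any fuzzer: it accepts blank lines, `#` comments, and `[name=]"value"` entries whose value uses only plain characters, `\\`, `\"`, or `\xNN`. The function name `check_dict` is hypothetical.

```bash
#!/usr/bin/env bash
# Print (with line numbers) every line that does not look like a valid
# dictionary entry. Returns 1 if any suspicious line was printed, else 0.
check_dict() {
    # One or more of: a plain character, \\, \", or \xNN
    local token='([^\\"]|\\\\|\\"|\\x[0-9A-Fa-f]{2})+'
    # Either a blank/comment line, or an optional name= followed by a quoted value
    local valid="^[[:space:]]*(#.*)?\$|^[[:space:]]*([A-Za-z0-9_]+=)?\"${token}\"[[:space:]]*\$"
    if grep -n -v -E "$valid" "$1"; then
        return 1
    fi
}
```

Running `check_dict dictionary.dict` on the example file prints nothing. An entry such as `"\x89PNG\r\n"` is reported, because `\r` and `\n` are not among the escapes listed above.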

Step 2: Generate Dictionary Content

Choose a generation method based on what's available:
From LLM: Prompt ChatGPT or Claude with:
```text
A dictionary can be used to guide the fuzzer. Write me a dictionary file for fuzzing a <PNG parser>. Each line should be a quoted string or key-value pair like kw="value". Include magic bytes, chunk types, and common header values. Use hex escapes like "\xF7\xF8" for binary values.
```
From header files:
```bash
# Match one literal at a time so several strings on a line stay separate entries
grep -o '"[^"]*"' header.h > header.dict
```
From man pages (for CLI tools):
```bash
man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict
```
From binary strings:
```bash
# Escape backslashes and quotes, then wrap each string in quotes
strings ./binary | sed 's/[\\"]/\\&/g; s/^/"/; s/$/"/' > strings.dict
```
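
Raw output from `strings` or similar tools tends to be noisy, so a small filter between the tool and the dictionary file can help. The sketch below reads lines on stdin, keeps those within a length window, escapes them, and removes duplicates. The function name `lines_to_dict` and the default bounds (2 and 32 characters) are illustrative assumptions, not values taken from any fuzzer's documentation.

```bash
#!/usr/bin/env bash
# Turn raw lines on stdin into quoted, escaped, de-duplicated dictionary entries.
# Usage: some_command | lines_to_dict [min_len] [max_len]
lines_to_dict() {
    local min=${1:-2} max=${2:-32}
    # 1. keep lines whose length is within [min, max]
    # 2. escape backslashes and quotes, then wrap the line in quotes
    # 3. sort bytewise and drop duplicates
    awk -v min="$min" -v max="$max" 'length($0) >= min && length($0) <= max' |
        sed 's/[\\"]/\\&/g; s/^/"/; s/$/"/' |
        LC_ALL=C sort -u
}
```

A possible invocation is `strings ./binary | lines_to_dict 3 16 > strings.dict`. Forcing `LC_ALL=C` for the sort keeps the output order independent of the local collation settings.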

Step 3: Pass Dictionary to Fuzzer

Use the appropriate flag for your fuzzer (see Quick Reference above).

Common Patterns

Pattern: Protocol Keywords

Use Case: Fuzzing HTTP or custom protocol handlers
Dictionary content:
```conf
# HTTP methods
"GET"
"POST"
"PUT"
"DELETE"
"HEAD"

# Headers
"Content-Type"
"Authorization"
"Host"

# Protocol markers
"HTTP/1.1"
"HTTP/2.0"
```

Pattern: Magic Bytes and File Format Headers

Use Case: Fuzzing image parsers, media decoders, archive handlers
Dictionary content:
```conf
# PNG magic bytes and chunks
# CR and LF are written as \x0d and \x0a: dictionary files accept only \\, \" and \xNN escapes
png_magic="\x89PNG\x0d\x0a\x1a\x0a"
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"

# JPEG markers
jpeg_soi="\xFF\xD8"
jpeg_eoi="\xFF\xD9"
```
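
Magic-byte entries can also be derived from a known-good sample file instead of being typed by hand. The sketch below hex-escapes the first bytes of a file into a `name="\x.."` entry. The function name `magic_entry` and the default of 8 bytes are assumptions for illustration; the right prefix length depends on the format.

```bash
#!/usr/bin/env bash
# Emit NAME="\xNN\xNN..." for the first COUNT bytes of FILE (default: 8).
# Usage: magic_entry NAME FILE [COUNT]
magic_entry() {
    local name=$1 file=$2 count=${3:-8}
    local hex
    # od prints the bytes as two-digit hex; strip whitespace, then prefix each pair with \x
    hex=$(head -c "$count" "$file" | od -An -v -tx1 | tr -d '[:space:]' | sed 's/../\\x&/g')
    printf '%s="%s"\n' "$name" "$hex"
}
```

For a valid PNG file, `magic_entry png_magic sample.png` prints `png_magic="\x89\x50\x4e\x47\x0d\x0a\x1a\x0a"`, which encodes the same bytes as the PNG signature in all-hex form.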

Pattern: Configuration File Keywords

Use Case: Fuzzing config file parsers (YAML, TOML, INI)
Dictionary content:
```conf
# Common config keywords
"true"
"false"
"null"
"version"
"enabled"
"disabled"

# Section headers
"[general]"
"[network]"
"[security]"
```

Advanced Usage

Tips and Tricks

| Tip | Why It Helps |
|---|---|
| Combine multiple generation methods | LLM-generated keywords plus strings from the binary cover a broader surface |
| Include boundary values | `"0"`, `"-1"`, `"2147483647"` trigger edge cases |
| Add format delimiters | `:`, `=`, `{`, `}` help the fuzzer construct valid structures |
| Keep dictionaries focused | 50-200 entries perform better than thousands |
| Test dictionary effectiveness | Run with and without the dictionary and compare coverage |
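
One way to run the with/without comparison from the last tip is to capture the fuzzer log for each run and read the final coverage counter. The sketch below assumes libFuzzer-style status lines that contain `cov: N`; the helper name `last_cov` and the log file names are hypothetical, and the fuzzer invocations are shown only as comments.

```bash
#!/usr/bin/env bash
# Extract the last "cov: N" value from a libFuzzer-style log on stdin.
last_cov() {
    grep -o 'cov: [0-9]*' | tail -n 1 | cut -d' ' -f2
}

# Possible usage (libFuzzer writes its status lines to stderr):
#   ./fuzz -runs=100000 corpus_nodict/ 2> nodict.log
#   ./fuzz -dict=./dictionary.dict -runs=100000 corpus_dict/ 2> dict.log
#   echo "without: $(last_cov < nodict.log)  with: $(last_cov < dict.log)"
```

Use separate, identically seeded corpus directories for the two runs, so that neither run benefits from inputs the other one discovered. A single pair of runs is noisy, so repeating the comparison a few times gives a more trustworthy picture.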

Auto-Generated Dictionaries (AFL++)

When using the `afl-clang-lto` compiler, AFL++ automatically extracts dictionary entries from string comparisons in the binary. This happens at compile time via the AUTODICTIONARY feature.
Enable auto-dictionary:
```bash
# AFL_LLVM_DICT2FILE should be an absolute path
export AFL_LLVM_DICT2FILE="$PWD/auto.dict"
afl-clang-lto++ target.cc -o target
# Dictionary saved to auto.dict
afl-fuzz -x auto.dict -i in -o out -- ./target
```

Combining Multiple Dictionaries

Some fuzzers support multiple dictionary files:
```bash
# AFL++ with multiple dictionaries
afl-fuzz -x keywords.dict -x formats.dict -i in -o out -- ./target
```

Anti-Patterns

| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Including full sentences | Fuzzer needs atomic tokens, not prose | Break into individual keywords |
| Duplicating entries | Wastes mutation budget | Use `sort -u` to deduplicate |
| Over-sized dictionaries | Slows fuzzer, dilutes useful tokens | Keep focused: 50-200 most relevant entries |
| Missing hex escapes | Non-printable bytes become mangled | Use `\xXX` for binary values |
| No comments | Hard to maintain and audit | Document sections with `#` comments |
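
The deduplication and size advice can be combined into one merge step. The sketch below concatenates several dictionary files, drops comments and blank lines, strips the optional `name=` prefix so that `kw1="blah"` and `"blah"` count as the same token, and removes duplicates. The function name `merge_dicts` is hypothetical. The merged output loses its `#` comments, so treat it as a build artifact and keep the commented source files.

```bash
#!/usr/bin/env bash
# Merge dictionary files into a de-duplicated list of bare "value" entries.
# Usage: merge_dicts a.dict b.dict ... > merged.dict
merge_dicts() {
    # 1. drop comment and blank lines
    # 2. drop an optional leading name= so identical values compare equal
    # 3. sort bytewise and drop duplicates
    cat "$@" |
        grep -v -E '^[[:space:]]*(#|$)' |
        sed 's/^[[:space:]]*[A-Za-z0-9_]*=//' |
        LC_ALL=C sort -u
}
```

Piping the result through `wc -l` gives a quick check against the 50-200 entry guideline, for example `merge_dicts keywords.dict formats.dict | wc -l`.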

Tool-Specific Guidance

libFuzzer

```bash
clang++ -fsanitize=fuzzer,address harness.cc -o fuzz
./fuzz -dict=./dictionary.dict corpus/
```
Integration tips:
  • Dictionary tokens are inserted/replaced during mutations
  • Combine with `-max_len` to control input size
  • Use `-print_final_stats=1` to see dictionary effectiveness metrics
  • Dictionary entries longer than `-max_len` are ignored
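
Because over-long entries are skipped, it can be worth listing them before a run. The sketch below estimates the decoded length of each entry by counting every `\xNN`, `\\`, or `\"` escape as one byte and prints entries that exceed a limit. The function name `long_entries` is hypothetical, and the length estimate assumes single-byte characters.

```bash
#!/usr/bin/env bash
# Print dictionary entries whose decoded value is longer than MAX bytes.
# Usage: long_entries MAX FILE
long_entries() {
    local max=$1 file=$2
    grep -v -E '^[[:space:]]*(#|$)' "$file" |
        awk -v max="$max" '{
            v = $0
            sub(/^[^"]*"/, "", v)      # drop everything up to the opening quote
            sub(/"[ \t]*$/, "", v)     # drop the closing quote
            # each escape sequence decodes to a single byte
            gsub(/\\x[0-9A-Fa-f][0-9A-Fa-f]|\\\\|\\"/, "X", v)
            if (length(v) > max) print $0
        }'
}
```

For example, `long_entries 64 dictionary.dict` lists the entries to shorten or drop when fuzzing with `-max_len=64`.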

AFL++

```bash
afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@
```
Integration tips:
  • AFL++ supports multiple `-x` flags for multiple dictionaries
  • Use `AFL_LLVM_DICT2FILE` with `afl-clang-lto` for auto-generated dictionaries
  • Dictionary effectiveness shown in fuzzer stats UI
  • Tokens are used during deterministic and havoc stages

cargo-fuzz (Rust)

```bash
cargo fuzz run fuzz_target -- -dict=./dictionary.dict
```
Integration tips:
  • cargo-fuzz uses libFuzzer backend, so all libFuzzer dict flags work
  • Place dictionary file in `fuzz/` directory alongside harness
  • Reference from harness directory: `cargo fuzz run target -- -dict=../dictionary.dict`

go-fuzz (Go)

go-fuzz does not have built-in dictionary support, but you can manually seed the corpus with dictionary entries:
```bash
# Convert dictionary to corpus files (one raw token per file)
mkdir -p corpus
grep -o '"[^"]*"' dict.txt | sed 's/^"//; s/"$//' | while IFS= read -r token; do
    # printf '%b' decodes \xNN escapes so binary tokens are written as raw bytes
    printf '%b' "$token" > "corpus/$(printf '%s' "$token" | md5sum | cut -d' ' -f1)"
done
go-fuzz -bin=./target-fuzz.zip -workdir=.
```

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| Dictionary file not loaded | Wrong path or format error | Check fuzzer output for dict parsing errors; verify file format |
| No coverage improvement | Dictionary tokens not relevant | Analyze target code for actual keywords; try different generation method |
| Syntax errors in dict file | Unescaped quotes or invalid escapes | Use `\\` for backslash, `\"` for quotes; validate with test run |
| Fuzzer ignores long entries | Entries exceed `-max_len` | Keep entries under max input length, or increase `-max_len` |
| Too many entries slow fuzzer | Dictionary too large | Prune to 50-200 most relevant entries |
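
When tokens turn out not to be relevant, one rough way to analyze the target for actual keywords is to list the string literals it passes to comparison functions. The sketch below does this for C/C++ sources with plain `grep`; the function name `compare_literals` and the list of comparison functions are assumptions, and a text search like this will miss comparisons done through macros or helpers.

```bash
#!/usr/bin/env bash
# List string literals that appear in calls to common comparison functions.
# Usage: compare_literals file.c [more files...] > compare.dict
compare_literals() {
    # 1. keep the text from each comparison call up to the end of the statement
    # 2. pull out the quoted literals
    # 3. drop empty literals, sort bytewise, and de-duplicate
    grep -h -o -E '(strcmp|strncmp|strcasecmp|memcmp)[[:space:]]*\([^;]*' "$@" |
        grep -o '"[^"]*"' |
        grep -v '^""$' |
        LC_ALL=C sort -u
}
```

Literals that contain C escapes such as `\n` still need converting to `\xNN` by hand before they are valid dictionary entries.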

Related Skills

Tools That Use This Technique

| Skill | How It Applies |
|---|---|
| libfuzzer | Native dictionary support via `-dict=` flag |
| aflpp | Native dictionary support via `-x` flag; auto-generation with AUTODICTIONARY |
| cargo-fuzz | Uses libFuzzer backend, inherits `-dict=` support |

Related Techniques

| Skill | Relationship |
|---|---|
| fuzzing-corpus | Dictionaries complement corpus: corpus provides structure, dictionary provides keywords |
| coverage-analysis | Use coverage data to validate dictionary effectiveness |
| harness-writing | Harness structure determines which dictionary tokens are useful |

Resources

Key External Resources

  • AFL++ Dictionaries: Pre-built dictionaries for common formats (HTML, XML, JSON, SQL, etc.). Good starting point for format-specific fuzzing.
  • libFuzzer Dictionary Documentation: Official libFuzzer documentation on dictionary format and usage. Explains token insertion strategy and performance implications.

Additional Examples

  • OSS-Fuzz Dictionaries: Real-world dictionaries from Google's continuous fuzzing service. Search project directories for `*.dict` files to see production examples.