harness-writing
Writing Fuzzing Harnesses
A fuzzing harness is the entrypoint function that receives random data from the fuzzer and routes it to your system under test (SUT). The quality of your harness directly determines which code paths get exercised and whether critical bugs are found. A poorly written harness can miss entire subsystems or produce non-reproducible crashes.
Overview
The harness is the bridge between the fuzzer's random byte generation and your application's API. It must parse raw bytes into meaningful inputs, call target functions, and handle edge cases gracefully. The most important part of any fuzzing setup is the harness—if written poorly, critical parts of your application may not be covered.
Key Concepts
| Concept | Description |
|---|---|
| Harness | Function that receives fuzzer input and calls target code under test |
| SUT | System Under Test—the code being fuzzed |
| Entry point | Function signature required by the fuzzer (e.g., |
| FuzzedDataProvider | Helper class for structured extraction of typed data from raw bytes |
| Determinism | Property that ensures same input always produces same behavior |
| Interleaved fuzzing | Single harness that exercises multiple operations based on input |
When to Apply
Apply this technique when:
- Creating a new fuzz target for the first time
- Fuzz campaign has low code coverage or isn't finding bugs
- Crashes found during fuzzing are not reproducible
- Target API requires complex or structured inputs
- Multiple related functions should be tested together
Skip this technique when:
- Using existing well-tested harnesses from your project
- Tool provides automatic harness generation that meets your needs
- Target already has comprehensive fuzzing infrastructure
Quick Reference
| Task | Pattern |
|---|---|
| Minimal C++ harness | |
| Minimal Rust harness | `fuzz_target!( |
| Size validation | |
| Cast to integers | |
| Use FuzzedDataProvider | |
| Extract typed data (C++) | |
| Extract string (C++) | |
Step-by-Step
Step 1: Identify Entry Points
Find functions in your codebase that:
- Accept external input (parsers, validators, protocol handlers)
- Parse complex data formats (JSON, XML, binary protocols)
- Perform security-critical operations (authentication, cryptography)
- Have high cyclomatic complexity or many branches
Good targets are typically:
- Protocol parsers
- File format parsers
- Serialization/deserialization functions
- Input validation routines
Step 2: Write Minimal Harness
Start with the simplest possible harness that calls your target function:
C/C++:
```cpp
#include <cstddef>
#include <cstdint>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    target_function(data, size);
    return 0;
}
```
Rust:
```rust
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    target_function(data);
});
```
Step 3: Add Input Validation
Reject inputs that are too small or too large to be meaningful:
```cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Reject inputs outside the range the target can meaningfully process.
    // MIN_INPUT_SIZE and MAX_INPUT_SIZE are placeholders to define per target.
    if (size < MIN_INPUT_SIZE || size > MAX_INPUT_SIZE) {
        return 0;
    }
    target_function(data, size);
    return 0;
}
```
Rationale: The fuzzer generates random inputs of all sizes. Your harness must handle empty, tiny, huge, or malformed inputs without causing unexpected issues in the harness itself. Crashes in the SUT are fine; they are what the campaign is looking for.
Step 4: Structure the Input
For APIs that require typed data (integers, strings, etc.), use casting or helpers like `FuzzedDataProvider`:
Simple casting:
```cpp
#include <cstring>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }
    // memcpy avoids the unaligned, aliasing-violating read that
    // *(uint32_t*)(data) would perform on a byte buffer.
    uint32_t numerator, denominator;
    memcpy(&numerator, data, sizeof(uint32_t));
    memcpy(&denominator, data + sizeof(uint32_t), sizeof(uint32_t));
    divide(numerator, denominator);
    return 0;
}
```
Using FuzzedDataProvider:
```cpp
#include "FuzzedDataProvider.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fuzzed_data(data, size);
    size_t allocation_size = fuzzed_data.ConsumeIntegral<size_t>();
    std::vector<char> str1 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    std::vector<char> str2 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    concat(&str1[0], str1.size(), &str2[0], str2.size(), allocation_size);
    return 0;
}
```
Step 5: Test and Iterate
Run the fuzzer and monitor:
- Code coverage (are all interesting paths reached?)
- Executions per second (is it fast enough?)
- Crash reproducibility (can you reproduce crashes with saved inputs?)
Iterate on the harness to improve these metrics.
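Crash reproducibility can be checked outside the fuzzer by feeding a saved input straight back into the harness. The sketch below is one possible way to do that; `ReplayFile` is a hypothetical helper and the harness body is a stand-in, not part of any fuzzer API. Only `LLVMFuzzerTestOneInput` is the real entry point name.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Stand-in harness for illustration; a real one would call the SUT.
static size_t g_last_size = 0;
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    g_last_size = size;
    return 0;
}

// Hypothetical helper: read a saved input and run it through the harness once.
// Returns -1 if the file cannot be opened, otherwise the harness return value.
int ReplayFile(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return -1;
    }
    std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return LLVMFuzzerTestOneInput(
        reinterpret_cast<const uint8_t *>(bytes.data()), bytes.size());
}
```

A libFuzzer binary already replays files passed on its command line, so a driver like this is mainly useful when the harness is built without the fuzzing runtime, for example under a debugger.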
Common Patterns
Pattern: Beyond Byte Arrays—Casting to Integers
Use Case: When target expects primitive types like integers or floats
Implementation:
```cpp
#include <cstring>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Ensure exactly 2 4-byte numbers
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }
    // Split input into two integers; memcpy avoids the unaligned,
    // aliasing-violating read that *(uint32_t*)(data) would perform.
    uint32_t numerator, denominator;
    memcpy(&numerator, data, sizeof(uint32_t));
    memcpy(&denominator, data + sizeof(uint32_t), sizeof(uint32_t));
    divide(numerator, denominator);
    return 0;
}
```
Rust equivalent:
```rust
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    if data.len() != 2 * std::mem::size_of::<i32>() {
        return;
    }
    let numerator = i32::from_ne_bytes([data[0], data[1], data[2], data[3]]);
    let denominator = i32::from_ne_bytes([data[4], data[5], data[6], data[7]]);
    divide(numerator, denominator);
});
```
Why it works: Any 8-byte input is valid. The fuzzer learns that inputs must be exactly 8 bytes, and every bit flip produces a new, potentially interesting input.
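The use case above also mentions floats, which can be decoded the same way. The following is a minimal sketch under stated assumptions: `scale` is a hypothetical target, `DecodeTwoDoubles` is an illustrative helper, and skipping non-finite values is only appropriate if the target documents them as out of contract.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical SUT, defined here only so the sketch compiles.
double scale(double value, double factor) {
    return value * factor;
}

// Illustrative helper: decode two doubles from exactly 16 bytes.
// memcpy keeps the read well-defined regardless of buffer alignment.
static bool DecodeTwoDoubles(const uint8_t *data, size_t size,
                             double *a, double *b) {
    if (size != 2 * sizeof(double)) {
        return false;
    }
    std::memcpy(a, data, sizeof(double));
    std::memcpy(b, data + sizeof(double), sizeof(double));
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    double value = 0.0, factor = 0.0;
    if (!DecodeTwoDoubles(data, size, &value, &factor)) {
        return 0;
    }
    // Assumption: NaN and infinity are outside the target's contract.
    if (!std::isfinite(value) || !std::isfinite(factor)) {
        return 0;
    }
    scale(value, factor);
    return 0;
}
```

If the target is expected to cope with NaN or infinity, the `isfinite` filter should be removed so the fuzzer can reach those cases.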
Pattern: FuzzedDataProvider for Complex Inputs
Use Case: When target requires multiple strings, integers, or variable-length data
Implementation:
```cpp
#include <cstdlib>
#include "FuzzedDataProvider.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fuzzed_data(data, size);
    // Extract different types of data
    size_t allocation_size = fuzzed_data.ConsumeIntegral<size_t>();
    // Consume variable-length strings with terminator
    std::vector<char> str1 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    std::vector<char> str2 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    char* result = concat(&str1[0], str1.size(), &str2[0], str2.size(), allocation_size);
    // Free the result so long campaigns do not exhaust memory
    if (result != NULL) {
        free(result);
    }
    return 0;
}
```
Why it helps: `FuzzedDataProvider` handles the complexity of extracting structured data from a byte stream. It's particularly useful for APIs that need multiple parameters of different types.
Pattern: Interleaved Fuzzing
Use Case: When multiple related operations should be tested in a single harness
Implementation:
```cpp
#include <cstdio>
#include <cstring>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1 + 2 * sizeof(int32_t)) {
        return 0;
    }
    // First byte selects operation
    uint8_t mode = data[0];
    // Next bytes are operands
    int32_t numbers[2];
    memcpy(numbers, data + 1, 2 * sizeof(int32_t));
    int32_t result = 0;
    switch (mode % 4) {
        case 0:
            result = add(numbers[0], numbers[1]);
            break;
        case 1:
            result = subtract(numbers[0], numbers[1]);
            break;
        case 2:
            result = multiply(numbers[0], numbers[1]);
            break;
        case 3:
            result = divide(numbers[0], numbers[1]);
            break;
    }
    // Prevent compiler from optimizing away the calls
    printf("%d", result);
    return 0;
}
```
Advantages:
- Faster to write one harness than multiple individual harnesses
- Single shared corpus means interesting inputs for one operation may be interesting for others
- Can discover bugs in interactions between operations
When to use:
- Operations share similar input types
- Operations are logically related (e.g., arithmetic operations, CRUD operations)
- Single corpus makes sense across all operations
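Interaction bugs tend to surface when one input drives a sequence of operations against shared state rather than a single call. The sketch below illustrates that idea under stated assumptions: `BoundedStack` is a hypothetical SUT, and the `std::vector` reference model is one possible oracle, not a prescribed one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical SUT: a fixed-capacity stack of bytes.
class BoundedStack {
public:
    static constexpr size_t kCapacity = 8;
    bool Push(uint8_t v) {
        if (count_ == kCapacity) return false;
        items_[count_++] = v;
        return true;
    }
    bool Pop(uint8_t *out) {
        if (count_ == 0) return false;
        *out = items_[--count_];
        return true;
    }
    size_t Size() const { return count_; }
private:
    uint8_t items_[kCapacity] = {};
    size_t count_ = 0;
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    BoundedStack stack;          // fresh state every iteration
    std::vector<uint8_t> model;  // reference model used as an oracle
    size_t i = 0;
    while (i < size) {
        uint8_t op = data[i++] % 2;  // even opcode: push, odd opcode: pop
        if (op == 0) {
            if (i >= size) break;    // push needs one operand byte
            uint8_t v = data[i++];
            bool ok = stack.Push(v);
            bool expected = model.size() < BoundedStack::kCapacity;
            if (ok != expected) std::abort();
            if (ok) model.push_back(v);
        } else {
            uint8_t got = 0;
            bool ok = stack.Pop(&got);
            if (ok != !model.empty()) std::abort();
            if (ok) {
                if (got != model.back()) std::abort();
                model.pop_back();
            }
        }
        // Any divergence from the model is reported as a crash.
        if (stack.Size() != model.size()) std::abort();
    }
    return 0;
}
```

Because the stack and the model are both created inside the harness, every iteration starts from the same state and a crashing input replays deterministically.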
Pattern: Structure-Aware Fuzzing with Arbitrary (Rust)
Use Case: When fuzzing Rust code that uses custom structs
Implementation:
```rust
use arbitrary::Arbitrary;
use std::process;

#[derive(Debug, Arbitrary)]
pub struct Name {
    data: String
}

impl Name {
    pub fn check_buf(&self) {
        let data = self.data.as_bytes();
        if data.len() > 0 && data[0] == b'a' {
            if data.len() > 1 && data[1] == b'b' {
                if data.len() > 2 && data[2] == b'c' {
                    process::abort();
                }
            }
        }
    }
}
```
Harness with arbitrary:
```rust
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: your_project::Name| {
    data.check_buf();
});
```
Add to Cargo.toml:
```toml
[dependencies]
arbitrary = { version = "1", features = ["derive"] }
```
Why it helps: The `arbitrary` crate automatically handles deserialization of raw bytes into your Rust structs, reducing boilerplate and ensuring valid struct construction.
Limitation: The arbitrary crate doesn't offer reverse serialization, so you can't manually construct byte arrays that map to specific structs. This works best when starting from an empty corpus (fine for libFuzzer, problematic for AFL++, which requires at least one seed input).
Advanced Usage
Tips and Tricks
| Tip | Why It Helps |
|---|---|
| Start with parsers | High bug density, clear entry points, easy to harness |
| Mock I/O operations | Prevents hangs from blocking I/O, enables determinism |
| Use FuzzedDataProvider | Simplifies extraction of structured data from raw bytes |
| Reset global state | Ensures each iteration is independent and reproducible |
| Free resources in harness | Prevents memory exhaustion during long campaigns |
| Avoid logging in harness | Logging is slow—fuzzing needs 100s-1000s exec/sec |
| Test harness manually first | Run harness with known inputs before starting campaign |
| Check coverage early | Ensure harness reaches expected code paths |
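The "Free resources in harness" tip is easy to break when the harness has early returns. A minimal sketch of one way to make cleanup automatic is shown below; the `parser_create`, `parser_feed`, and `parser_destroy` functions are a hypothetical C-style API defined here only so the example compiles.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical C-style SUT API (not from any real library).
struct Parser { size_t bytes_seen; };
static int g_live_parsers = 0;  // counts outstanding parsers for illustration

Parser *parser_create() {
    ++g_live_parsers;
    return new Parser{0};
}
void parser_destroy(Parser *p) {
    --g_live_parsers;
    delete p;
}
bool parser_feed(Parser *p, const uint8_t *data, size_t size) {
    (void)data;
    p->bytes_seen += size;
    return size % 2 == 0;  // pretend odd-length input is a parse error
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // unique_ptr with a custom deleter releases the parser on every path.
    std::unique_ptr<Parser, decltype(&parser_destroy)> parser(parser_create(),
                                                              &parser_destroy);
    if (!parser_feed(parser.get(), data, size)) {
        return 0;  // early return: the deleter still runs
    }
    // Further calls on parser.get() would go here.
    return 0;
}
```

The design choice is to tie the resource to scope rather than to a specific `return`, so adding new exit paths later does not introduce a leak.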
Structure-Aware Fuzzing with Protocol Buffers
For highly structured input formats, consider using Protocol Buffers as an intermediate format with custom mutators:
```cpp
// Define your input format in .proto file
// Use libprotobuf-mutator to generate valid mutations
// This ensures fuzzer mutates message contents, not the protobuf encoding itself
```
This approach requires more setup but prevents the fuzzer from wasting time on unparseable inputs. See structure-aware fuzzing documentation for details.
Handling Non-Determinism
Problem: Random values or timing dependencies cause non-reproducible crashes.
Solutions:
- Replace `rand()` with deterministic PRNG seeded from fuzzer input:
  ```cpp
  uint32_t seed = fuzzed_data.ConsumeIntegral<uint32_t>();
  srand(seed);
  ```
- Mock system calls that return time, PIDs, or random data
- Avoid reading from `/dev/random` or `/dev/urandom`
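The solutions above can be combined by injecting both the random source and the clock into the code under test. This is a minimal sketch, assuming a hypothetical `make_token` target that accepts its PRNG and timestamp as parameters; the fixed timestamp and the `g_last_token` variable exist only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

// Hypothetical SUT that needs randomness and a timestamp. Both are passed in
// instead of being read from the operating system.
uint64_t make_token(std::mt19937 &rng, uint64_t now_seconds) {
    return (static_cast<uint64_t>(rng()) << 32) ^ now_seconds;
}

static uint64_t g_last_token = 0;  // exposed only so the sketch can be checked

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < sizeof(uint32_t)) {
        return 0;
    }
    uint32_t seed;
    std::memcpy(&seed, data, sizeof(seed));
    std::mt19937 rng(seed);                // seeded from the input only
    const uint64_t fake_now = 1700000000;  // fixed stand-in for time()
    g_last_token = make_token(rng, fake_now);
    return 0;
}
```

With this shape, replaying the same input always yields the same token, which is the property a crash reproducer depends on.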
Resetting Global State
If your SUT uses global state (singletons, static variables), reset it between iterations:
```cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Reset global state before each iteration
    global_reset();
    target_function(data, size);
    // Clean up resources
    global_cleanup();
    return 0;
}
```
Rationale: Global state can cause crashes after N iterations rather than on a specific input, making bugs non-reproducible.
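To make the failure mode concrete, the sketch below uses a hypothetical target whose result depends on everything it has seen so far. The `g_history` buffer, this `target_function` body, and `global_reset` are all illustrative stand-ins, not code from a real SUT.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SUT with hidden global state: it remembers every byte seen.
static std::vector<uint8_t> g_history;

bool target_function(const uint8_t *data, size_t size) {
    g_history.insert(g_history.end(), data, data + size);
    return g_history.size() > 4;  // behavior depends on accumulated history
}

void global_reset() {
    g_history.clear();
}

static bool g_last_result = false;  // exposed only so the sketch can be checked

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    global_reset();  // without this, the second run of a 3-byte input differs
    g_last_result = target_function(data, size);
    return 0;
}
```

Running the same 3-byte input twice returns the same result only because of the reset; without it, the second call would see six accumulated bytes and take the other branch.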
Practical Harness Rules
Follow these rules to ensure effective fuzzing harnesses:
| Rule | Rationale |
|---|---|
| Handle all input sizes | Fuzzer generates empty, tiny, huge inputs—harness must handle gracefully |
| Never call | Calling |
| Join all threads | Each iteration must run to completion before next iteration starts |
| Be fast | Aim for 100s-1000s executions/sec. Avoid logging, high complexity, excess memory |
| Maintain determinism | Same input must always produce same behavior for reproducibility |
| Avoid global state | Global state reduces reproducibility—reset between iterations if unavoidable |
| Use narrow targets | Don't fuzz PNG and TCP in same harness—different formats need separate targets |
| Free resources | Prevent memory leaks that cause resource exhaustion during long campaigns |
Note: These guidelines apply not just to harness code, but to the entire SUT. If the SUT violates these rules, consider patching it (see the fuzzing obstacles technique).
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Global state without reset | Non-deterministic crashes | Reset all globals at start of harness |
| Blocking I/O or network calls | Hangs fuzzer, wastes time | Mock I/O, use in-memory buffers |
| Memory leaks in harness | Resource exhaustion kills campaign | Free all allocations before returning |
| Calling | Stops entire fuzzing process | Use |
| Heavy logging in harness | Reduces exec/sec by orders of magnitude | Disable logging during fuzzing |
| Too many operations per iteration | Slows down fuzzer | Keep iterations fast and focused |
| Mixing unrelated input formats | Corpus entries not useful across formats | Separate harnesses for different formats |
| Not validating input size | Harness crashes on edge cases | Check |
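For the blocking I/O anti-pattern, one way to use an in-memory buffer is to wrap the fuzzer input in a stream when the target insists on a `FILE*`. The sketch below assumes a POSIX platform, where `fmemopen` is available; `count_lines` is a hypothetical target and `g_last_lines` exists only so the result can be observed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical SUT that only accepts a FILE* (for example, a config reader).
size_t count_lines(FILE *f) {
    size_t lines = 0;
    int c;
    while ((c = std::fgetc(f)) != EOF) {
        if (c == '\n') ++lines;
    }
    return lines;
}

static size_t g_last_lines = 0;  // exposed only so the sketch can be checked

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size == 0) {
        return 0;  // some fmemopen implementations reject zero-length buffers
    }
    // POSIX fmemopen wraps the fuzzer's buffer in a read-only stream, so no
    // temporary file or socket is needed. The buffer is never written to.
    FILE *f = fmemopen(const_cast<uint8_t *>(data), size, "r");
    if (f == nullptr) {
        return 0;
    }
    g_last_lines = count_lines(f);
    std::fclose(f);  // release the stream before returning
    return 0;
}
```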
Tool-Specific Guidance
libFuzzer
Harness signature:
```cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Your code here
    return 0;  // -1 keeps the input out of the corpus; other non-zero values are reserved
}
```
Compilation:
```bash
clang++ -fsanitize=fuzzer,address -g harness.cc -o fuzz_target
```
Integration tips:
- Use `FuzzedDataProvider.h` for structured input extraction
- Compile with `-fsanitize=fuzzer` to link the fuzzing runtime
- Add sanitizers (`-fsanitize=address,undefined`) to detect more bugs
- Use `-g` for better stack traces when crashes occur
- libFuzzer can start with an empty corpus; no seed inputs are required
Running:
```bash
./fuzz_target corpus_dir/
```
AFL++
AFL++ supports multiple harness styles. For best performance, use persistent mode:
Persistent mode harness:
```cpp
#include <unistd.h>

// MAX_SIZE is a placeholder for the largest input the target should receive.
int main(int argc, char **argv) {
#ifdef __AFL_HAVE_MANUAL_CONTROL
    __AFL_INIT();
#endif
    unsigned char buf[MAX_SIZE];
    while (__AFL_LOOP(10000)) {
        // Read input from stdin
        ssize_t len = read(0, buf, sizeof(buf));
        if (len <= 0) break;
        // Call target function
        target_function(buf, len);
    }
    return 0;
}
```
Compilation:
```bash
afl-clang-fast++ -g harness.cc -o fuzz_target
```
Integration tips:
- Use persistent mode (`__AFL_LOOP`) for 10-100x speedup
- Consider deferred initialization (`__AFL_INIT()`) to skip setup overhead
- AFL++ requires at least one seed input in the corpus directory
- Use `AFL_USE_ASAN=1` or `AFL_USE_UBSAN=1` for sanitizer builds
Running:
```bash
afl-fuzz -i seeds/ -o findings/ -- ./fuzz_target
```
cargo-fuzz (Rust)
Harness signature:
```rust
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Your code here
});
```
With structured input (arbitrary crate):
```rust
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: YourStruct| {
    data.check();
});
```
Creating harness:
```bash
cargo fuzz init
cargo fuzz add my_target
```
Integration tips:
- Use the `arbitrary` crate for automatic struct deserialization
- cargo-fuzz wraps libFuzzer, so all libFuzzer features work
- cargo-fuzz compiles with sanitizers automatically
- Harnesses go in the `fuzz/fuzz_targets/` directory
Running:
```bash
cargo +nightly fuzz run my_target
```
go-fuzz
Harness signature:
```go
// +build gofuzz

package mypackage

func Fuzz(data []byte) int {
	// Call target function
	target(data)
	// Return codes:
	//  -1: do not add the input to the corpus, even if it gives new coverage
	//   0: no preference
	//   1: raise this input's priority (e.g., it parsed successfully)
	return 0
}
```
Building:
```bash
go-fuzz-build
```
Integration tips:
- Return 1 to raise the priority of an input, for example one that parsed successfully (optional; coverage is tracked automatically)
- Return -1 to keep an input out of the corpus even if it adds coverage
- go-fuzz handles persistence automatically
Running:
```bash
go-fuzz -bin=./mypackage-fuzz.zip -workdir=fuzz
```
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Low executions/sec | Harness is too slow (logging, I/O, complexity) | Profile harness, remove bottlenecks, mock I/O |
| No crashes found | Coverage not reaching buggy code | Check coverage, improve harness to reach more paths |
| Non-reproducible crashes | Non-determinism or global state | Remove randomness, reset globals between iterations |
| Fuzzer exits immediately | Harness calls | Replace |
| Out of memory errors | Memory leaks in harness or SUT | Free allocations, use leak sanitizer to find leaks |
| Crashes on empty input | Harness doesn't validate size | Add |
| Corpus not growing | Inputs too constrained or format too strict | Use FuzzedDataProvider or structure-aware fuzzing |
Related Skills
Tools That Use This Technique
| Skill | How It Applies |
|---|---|
| libfuzzer | Uses |
| aflpp | Supports persistent mode harnesses with |
| cargo-fuzz | Uses Rust-specific |
| atheris | Python harness takes bytes, calls Python functions |
| ossfuzz | Requires harnesses in specific directory structure for cloud fuzzing |
Related Techniques
| Skill | Relationship |
|---|---|
| coverage-analysis | Measure harness effectiveness—are you reaching target code? |
| address-sanitizer | Detects bugs found by harness (buffer overflows, use-after-free) |
| fuzzing-dictionary | Provide tokens to help fuzzer pass format checks in harness |
| fuzzing-obstacles | Patch SUT when it violates harness rules (exit, non-determinism) |
Resources
Key External Resources
Split Inputs in libFuzzer - Google Fuzzing Docs
Explains techniques for handling multiple input parameters in a single fuzzing harness, including use of magic separators and FuzzedDataProvider.
Structure-Aware Fuzzing with Protocol Buffers
Advanced technique using protobuf as intermediate format with custom mutators to ensure fuzzer mutates message contents rather than format encoding.
libFuzzer Documentation
Official LLVM documentation covering harness requirements, best practices, and advanced features.
cargo-fuzz Book
Comprehensive guide to writing Rust fuzzing harnesses with cargo-fuzz and the arbitrary crate.
Video Resources
- Effective File Format Fuzzing - Conference talk on writing harnesses for file format parsers
- Modern Fuzzing of C/C++ Projects - Tutorial covering harness design patterns