Hegel: Property-Based Testing
Hegel is a universal family of property-based testing, supporting a variety of
languages all powered by Hypothesis. Tests integrate with standard test runners
(e.g.
,
,
, etc.). Hegel generates random inputs
for your code and automatically shrinks failing cases to minimal counterexamples.
Workflow
Follow these steps when writing property-based tests.
1. Detect Language and Load References
Identify the project language from build files:
| File | Language | Reference |
|---|
| Rust | |
Load the corresponding reference file for API details and idiomatic patterns.
2. Explore the Code Under Test
Before writing any test, understand what you're testing:
- Read the source code of the function/module under test
- Read existing tests to understand expected behavior and edge cases
- Read docstrings, comments, and type signatures for documented contracts
- Read usage sites to see how callers use the code and what they expect
The goal is to find evidence for properties, not to invent them.
3. Identify Valuable Properties
Look for properties that are:
- Grounded in evidence from the code, docs, or usage patterns
- Non-trivial — they test real behavior, not tautologies, and do not duplicate the code being tested
- Falsifiable — a buggy implementation could actually violate them
Write one test per property. Don't cram multiple properties into one test.
4. Check for Existing Tests to Evolve or Port
Before writing tests from scratch, always check existing tests:
- Existing PBTs in another framework (proptest, quickcheck, etc.) should be
ported to hegel. Load for general guidance and the
language-specific porting reference (e.g.,
references/porting-rust.md
).
Don't carry over narrow generator bounds from the old framework — use broader
generators unless bounds are justified by the function's contract.
- Unit tests and example-based tests can often be evolved into PBTs. Load
references/evolving-tests.md
for guidance. Tests with hardcoded seeds,
parameterized examples, or multiple similar test cases are prime candidates.
- Tests that use with fixed seeds are especially good candidates —
the randomness should come from hegel instead so failures produce shrinkable
counterexamples.
When you evolve an existing test,
modify the existing test file rather than
creating a new one. Add hegel tests alongside (or replacing) the existing tests
in the same file where the original tests live. Do not create a separate
or similar — property-based tests are tests like any other and
belong with the code they're testing.
5. Write the Tests
For each property:
- Add tests to the appropriate existing test file. If there's already a
covering the module, add hegel tests there. Only create a new
file if no relevant test file exists.
- Choose the simplest possible generators — start with no bounds, unless
bounds are logically necessary (e.g. if a number has to be non-zero it's
fine to force it to be, but lists should not have set unless there
is a compelling correctness reason to set them or poor performance has been
observed when actually running the test)
- Draw values using
- Run the code under test
- Assert the property
6. Run and Reflect
Run the tests. When a test fails, ask:
- Is this a real bug? If the code violates its own contract, flag the bug to the user and ask what to do, or fix the code if instructed to do so.
- Is the property unsound? If you asserted something the code never promised, fix the test.
- Is the generator too broad? Only if the failing input is genuinely outside the function's domain, add constraints. Investigate before constraining.
Property Categories
Use this taxonomy to identify what to test. Not every category applies to every
function — pick the ones supported by evidence.
| Category | Description | Example |
|---|
| Round-trip | encode then decode recovers the original | deserialize(serialize(x)) == x
|
| Idempotence | applying twice equals applying once | sort(sort(xs)) == sort(xs)
|
| Commutativity | order of operations doesn't matter | or |
| Invariant preservation | an operation maintains a structural property | insert into BST preserves ordering
|
| Oracle / reference impl | compare against a known-correct implementation | , or comparing against an unoptimised implementation |
| Monotonicity | more input means more (or equal) output | |
| Bounds / contracts | output stays within documented limits | is in |
| No-crash / robustness | function handles all valid inputs without panicking | doesn't panic |
| Equivalence | two implementations produce the same result | iterative_fib(n) == recursive_fib(n)
|
| Model-based | operations on real system match a simplified model | HashMap ops match Vec<(K,V)> model
|
| Consistency | related APIs in the same library agree | string_width(s) == sum(char_width(c) for c in s)
|
| Precision preservation | numeric values survive format conversions | for all |
High-Value Patterns (Field-Tested)
These patterns are ranked by how often they found real bugs when tested across
many popular Rust crate libraries. See
references/field-tested-patterns.md
for detailed examples.
1. Model Tests (Highest Value for Data Structures)
For any data structure, the highest-value first test is a model test — run
the same operations on the library under test and a known-good reference (usually
a std type), then assert they agree after every operation.
Choose the right oracle:
- for sequential containers (fixed-capacity vecs, small vecs)
- for hash maps (alternative/concurrent hash maps)
- for ordered maps (tree maps, persistent maps)
- for ordered sets / bitmaps (compressed bitmaps, tree sets)
- for unordered sets (indexed sets, bit sets)
2. Idempotence Tests (Highest Value for String/Text Processing)
Any normalization, case conversion, or formatting function should be idempotent:
. Use
(not ASCII-only generators) because
Unicode edge cases like
→
and combining characters are where bugs hide.
3. Parse Robustness (Universal — Test Every Parser)
Every
,
, or
function should be tested with
. The property is simple: it should never panic. Parsers
that delegate to constructors which panic on invalid values (instead of returning
errors) are a common source of bugs.
4. Roundtrip Tests (High Value for Serialization)
for any serialize/deserialize pair. Test with the full
input domain — don't restrict to "reasonable" values. Bugs hide at boundaries
like zero (e.g. scientific notation missing the coefficient), large integers
(precision loss through f64 intermediaries for values > 2^53), and unusual
string content (double slashes in paths, control characters).
5. Boundary Value Tests (High Value for Numeric Code)
Integer operations should be tested with
,
,
, and unconstrained
ranges. Negating
overflows, dividing by
overflows, and
many libraries forget to handle these. Don't add
.min_value(-100).max_value(100)
— those bounds hide real bugs.
Choosing Properties
Properties must be evidence-based. Find evidence in:
- Names and Type signatures: A function
fn merge(a: Vec<T>, b: Vec<T>) -> Vec<T>
implies the output length might equal the sum of input lengths.
- Docstrings and comments: "Returns a sorted list" directly gives you an invariant.
- Assertions and debug_asserts in the source: These are properties the author already identified, and do not need to be duplicated in the tests, but may suggest other invariants.
- Usage patterns: If callers always assume a result is non-empty, assert that the result is always non-empty.
- Existing tests: Unit tests often encode specific instances of general properties.
Err on the side of creating more properties rather than fewer, and if they fail investigate whether the failure is legitimate behaviour or not.
Beware of properties that seem universal but aren't. Read the docs carefully
before asserting a property. Examples from real testing:
- Grapheme-based string reverse is NOT an involution (
reverse(reverse("\n\r")) ≠ "\n\r"
because is one grapheme cluster while is two).
- A method called might mean symmetric difference (A △ B), not set
difference (A \ B) — check the docs.
- A function documented as "returns the largest key ≤ k" means ≤, not <.
When a property fails, investigate whether it's a real bug or a genuine edge case
in the domain. A weaker property often still holds.
Generator Discipline
A common mistake agents make when writing property-based tests is over-constraining generators.
This leads to tests that are weaker than they need to be.
Start With No Bounds
If the function accepts any
, use:
rust
generators::integers::<i32>() // no min_value, no max_value
Do NOT preemptively write:
rust
generators::integers::<i32>().min_value(0).max_value(100) // WRONG unless justified
Edge Cases Are the Point
Don't narrow ranges to "avoid edge cases." Edge cases are exactly what PBT is for. If a function claims to work on all
values, test it on all
values — including
,
,
,
, and
.
Don't Add by Default
Unless the function's contract explicitly requires non-empty input, test with empty collections too. If a function panics on an empty vec, that might be a bug worth knowing about.
When a Test Fails on Extreme Values
Your first reaction should be: is this a real bug?
You should assume that it is unless you have strong evidence that it is not. If in doubt, ask the user.
- If the function's documentation says it handles all integers but it overflows on , that's a bug in the code, not in your test.
- Only add bounds after investigating and confirming the input is outside the function's documented domain.
When to Add Constraints
Add generator bounds only when:
- The function's contract explicitly excludes some inputs. For example, documents that is required.
- You need to avoid undefined behavior. For example, division by zero.
- A test failure has been investigated and confirmed to be outside the function's domain.
Avoid rejection sampling where possible
When a constraint involves relationships between multiple generated values, you may use
:
rust
let a = tc.draw(generators::integers::<i32>());
let b = tc.draw(generators::integers::<i32>());
tc.assume(a != b); // constraint relates two values
This example is perfectly fine, but it is better to avoid
altogether when you can:
e.g.
rust
let a = tc.draw(generators::integers::<i32>());
let b = tc.draw(generators::integers::<i32>().min_value(a));
is better than
rust
let a = tc.draw(generators::integers::<i32>());
let b = tc.draw(generators::integers::<i32>());
tc.assume(a <= b)
Even better is:
rust
let mut a = tc.draw(generators::integers::<i32>());
let mut b = tc.draw(generators::integers::<i32>());
if (a > b) {
(a, b) = (b, a);
}
It is particularly important to avoid rejection sampling in cases where the rejection rate is likely to be high.
For example
st.integers().map(|n| n * 2)
is much better than
st.integers().filter(|n| n % 2 == 0)
, as the former constructs an even number directly, while the latter throws away around 50% of test cases.
Getting Large Collections
Hegel's default collection size is small. If you need large collections (e.g.,
to exercise deep tree paths), draw the size separately and pass it as
:
rust
// GOOD — can generate large collections, shrinks well
let n = tc.draw(generators::integers::<usize>().max_value(300));
let keys: Vec<i32> = tc.draw(generators::vecs(generators::integers())
.min_size(n)); // no max_size — let hegel go bigger if it wants
// BAD — hegel's default size distribution rarely produces 100+ elements
let keys: Vec<i32> = tc.draw(generators::vecs(generators::integers()));
Setting
but
not is a shrinking optimization: hegel can
shrink
to find the minimal collection size that triggers the bug, while
still being able to add extra elements if needed.
Use for Key Generation
When testing maps/sets that need unique keys:
rust
let keys: Vec<i32> = tc.draw(generators::vecs(generators::integers::<i32>())
.max_size(30).unique());
This avoids confusion about which value wins for duplicate keys.
Handling Randomness in Code Under Test
When the code under test requires an RNG (e.g.,
fn sample(&self, rng: &mut impl Rng)
),
do not create a seeded RNG like
ChaCha8Rng::seed_from_u64(seed)
with a
hegel-generated seed. This defeats shrinking — hegel can only shrink the seed
integer, not the actual random decisions the RNG makes.
Instead, use hegel's
feature to get a hegel-controlled RNG. See the
language-specific reference for API details.
Rand version mismatch
Hegel's
feature uses rand 0.9. If the project uses an older version of
rand (e.g., 0.8), the RNG traits will be incompatible. In this case, ask the
user whether they'd like to upgrade the project's rand dependency to 0.9. This
is usually straightforward (the main API changes are
->
,
->
,
->
,
->
). Do not silently fall back to seeded ChaCha —
that defeats the purpose.
Two modes: artificial vs true randomness
-
Default (artificial randomness): Every random decision goes through hegel,
enabling fine-grained shrinking of individual random values. This is the best
option for most code.
-
generators::randoms().use_true_random(true)
: Generates a single seed via
hegel, then creates a real
from it. Hegel can only shrink the seed,
not individual random decisions. Use this when the code under test does
rejection sampling or otherwise depends on the RNG producing
statistically random-looking output. Artificial randomness can cause rejection
loops to hang because the controlled byte sequences don't look random enough.
How to choose: Start with the default. If tests hang or time out because the
code does rejection sampling internally, switch to
.
Refactoring concrete RNG types
If the code under test takes a concrete RNG type (e.g.,
)
rather than a trait bound, consider whether it should be refactored to accept
or
instead. This is both better API design and
makes the code testable with hegel's random generator. Suggest this refactoring
to the user.
Common Mistakes
- Over-constraining generators — Adding bounds "just in case." This hides bugs and makes tests less valuable. See Generator Discipline above.
- Testing trivial properties — or test nothing. Every property should be falsifiable by a buggy implementation.
- Using the implementation as the oracle — If your test calls the same function to compute the expected result, it can never fail. Use an independent reference implementation (do not just copy the code to write this!), a simpler algorithm, or a structural property.
- Generating too broadly then filtering almost everything — If or rejects most inputs, Hegel will give up. Restructure your generators instead (e.g., use or dependent generation).
- Creating a separate test file for hegel tests — Property-based tests belong alongside the existing tests for the same code. Don't put them in or — add them to the existing test files.
- Using manually seeded RNGs — Don't generate a seed with hegel then create
ChaCha8Rng::seed_from_u64(seed)
. Use with the feature so hegel controls the random decisions and can shrink them. See "Handling Randomness" above.
- Overflowing in test code — When computing values from generated data (e.g., ), your test code itself can overflow before the library has a chance to be buggy. Use wrapping arithmetic () or smaller intermediate types (draw , cast to for multiplication) to prevent this. Distinguish "this constraint protects the library's contract" (keep it) from "this constraint prevents my test from overflowing" (use wrapping arithmetic instead).
- Adding for performance — If a test is slow with large collections, lower rather than restricting the input space. A slow test that finds bugs beats a fast test that can't. Many tree/trie bugs only manifest at 50-200+ elements.
Quick Setup
Rust
toml
# Cargo.toml
[dev-dependencies]
hegel = { git = "https://github.com/hegeldev/hegel-rust" }
If the code under test uses
:
toml
[dev-dependencies]
hegel = { git = "https://github.com/hegeldev/hegel-rust", features = ["rand"] }
Requires
on PATH. Run with
.