Test-Driven Development: Red-Green-Refactor
Core Principle
Tests verify behavior through public interfaces, not implementation details.
A test that breaks when you refactor internals -- but behavior is unchanged --
is testing implementation, not behavior. Good tests survive refactors.
When fixing a bug: prove it exists with a failing test before touching
production code. The test is evidence. The fix is the response to that evidence.
Workflow Overview
RED -> Write a test that fails (proves the bug or defines missing behavior)
GREEN -> Write the minimum code to make the test pass
REFACTOR -> Improve structure, naming, duplication -- tests stay green
One cycle per behavior. Vertical slices, not horizontal.
Phase 1: RED -- Establish Failure
For Bug Fixes
- Reproduce the bug -- identify the exact input, state, or sequence that
triggers the defect.
- Write a test that exercises the buggy code path with the offending input.
- Assert the correct (expected) behavior, not the current broken output.
- Run the test -- it must fail. If it passes, your test is not capturing the
bug. Rethink your assertion or test setup.
- Name the test descriptively -- include the bug/ticket reference if one
  exists (e.g., `test_bug_1234_negative_balance_rejected`).
For New Features
- Define one behavior the feature should exhibit.
- Write a test for that single behavior using the public API/interface.
- Run the test -- confirm it fails (the feature does not exist yet).
RED Phase Rules
- The test must fail for the right reason (missing behavior, not a compile
error or import failure).
- If you cannot write a test, that is a design signal: the code is not testable
enough. Address testability first.
- Do not write multiple tests at once. One test, one behavior.
Hypothesis-Driven Bug Investigation
When the bug's root cause is unclear:
- Brainstorm multiple hypotheses about what causes the defect.
- Prioritize by likelihood and cost to falsify.
- Write a test targeting the top hypothesis.
- Timebox investigation -- if a hypothesis does not pan out within the timebox,
move to the next one.
- A test that passes unexpectedly is useful data: it eliminates a hypothesis.
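One such probe might look like this (both the `parse_amount` helper and the trailing-whitespace hypothesis are hypothetical):

```python
# Hypothesis-driven probe: suppose the suspected root cause is that
# parse_amount mishandles trailing whitespace. One cheap test falsifies
# exactly that hypothesis and nothing else.
def parse_amount(text):
    # Stand-in implementation for illustration.
    return float(text.strip())

def test_hypothesis_trailing_whitespace_breaks_parsing():
    # If this PASSES, whitespace handling is NOT the cause: hypothesis
    # eliminated, move to the next one on the list.
    assert parse_amount("42.50 ") == 42.5
```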
Phase 2: GREEN -- Minimal Implementation
- Write the smallest, simplest code that makes the failing test pass.
- Do not add features, abstractions, or optimizations not required by the test.
- Do not anticipate future tests -- solve only the current one.
- Run all tests -- the new test passes and no existing tests broke.
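For a hypothetical overdraft bug, the GREEN step might be a single guard clause -- the smallest change that satisfies the failing test:

```python
# GREEN: minimal fix; no refactoring, no extra features.
def apply_withdrawal(balance, amount):
    if amount > balance:  # the one guard clause the failing test demanded
        raise ValueError("insufficient funds")
    return balance - amount

def test_negative_balance_rejected():
    try:
        apply_withdrawal(balance=10, amount=25)
        raise AssertionError("expected ValueError")
    except ValueError:
        pass  # correct behavior: overdraft rejected

def test_normal_withdrawal_still_works():
    assert apply_withdrawal(balance=10, amount=4) == 6
```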
GREEN Phase Rules
- Minimal is enough: ugly code is fine at this stage. Correctness over elegance.
- If an existing test breaks, your change introduced a regression. Fix it before
proceeding.
- If you find yourself writing significant code, consider whether you skipped a
smaller intermediate test.
Phase 3: REFACTOR -- Improve Structure
Only enter this phase when all tests are green.
- Look for duplication, unclear naming, or structural issues.
- Apply one refactoring at a time.
- Run tests after each change -- they must remain green.
- Common refactorings at this stage:
- Extract shared logic into functions/methods
- Rename for clarity
- Simplify conditionals
- Move code to more appropriate modules
- Deepen modules (smaller public interface, richer implementation)
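For example, extracting duplicated validation (all function names here are hypothetical) changes structure without changing behavior, so the existing tests stay green:

```python
# REFACTOR sketch: two operations shared a duplicated guard; extract it.
def _require_funds(balance, amount):
    # Extracted shared validation; behavior is identical to before.
    if amount > balance:
        raise ValueError("insufficient funds")

def apply_withdrawal(balance, amount):
    _require_funds(balance, amount)
    return balance - amount

def apply_transfer(balance, amount):
    _require_funds(balance, amount)
    return balance - amount  # crediting the destination omitted for brevity

def test_behavior_unchanged():
    assert apply_withdrawal(10, 3) == 7
    assert apply_transfer(10, 3) == 7
```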
REFACTOR Phase Rules
- Never refactor while RED. Get to GREEN first.
- If a refactoring breaks a test, undo and take a smaller step.
- Do not add new behavior during refactoring. That is a new RED phase.
- Refactoring is optional per cycle -- skip if the code is clean enough.
Bug Fix Workflow (Detailed)
This is the primary use case. When encountering a bug:
1. UNDERSTAND -> Reproduce and isolate the defect
2. RED -> Write a test asserting correct behavior (test fails)
3. GREEN -> Fix the bug with minimal code (test passes)
4. REFACTOR -> Clean up if needed (tests stay green)
5. VERIFY -> Run full test suite; confirm no regressions
Separation of Concerns in PRs
For team workflows, consider splitting into two commits or PRs:
Commit/PR 1 -- Expose the bug:
- Add the failing test that demonstrates the defect
- Assert the correct expected behavior (test will fail)
- This proves the bug is real and reproducible
Commit/PR 2 -- Fix the bug:
- Change production code to fix the defect
- The previously failing test now passes
- This proves the fix addresses the exact bug
This separation provides auditable evidence that the test actually catches the
defect, rather than being written after the fact to rubber-stamp a fix.
Anti-Patterns
Horizontal Slicing (write all tests, then all code)
Tests written in bulk describe imagined behavior. You end up testing shapes and
signatures instead of actual behavior, and the tests become insensitive to real
changes.
WRONG:
RED: test1, test2, test3, test4, test5
GREEN: impl1, impl2, impl3, impl4, impl5
RIGHT:
RED->GREEN: test1 -> impl1
RED->GREEN: test2 -> impl2
RED->GREEN: test3 -> impl3
Testing Implementation Instead of Behavior
Bad signals:
- Test mocks internal collaborators
- Test accesses private methods or fields
- Test verifies internal state (e.g., querying a database directly instead of
using the public interface)
- Test breaks when you rename an internal function
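A side-by-side sketch using a hypothetical `Counter` class -- the first test exercises only the public interface and survives internal refactors; the second couples itself to a private field and breaks on a harmless rename:

```python
class Counter:
    def __init__(self):
        self._n = 0  # internal detail, free to change

    def increment(self):
        self._n += 1

    def value(self):
        return self._n

def test_counter_behavior():  # GOOD: public interface only
    c = Counter()
    c.increment()
    assert c.value() == 1

def test_counter_internals():  # BAD: breaks if _n is renamed internally
    c = Counter()
    c.increment()
    assert c._n == 1
```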
Skipping RED
Writing tests after the implementation ("test-after") does not provide the
design feedback that TDD gives. If the test never failed, you have no proof it
can catch regressions.
Gold-Plating in GREEN
Adding abstractions, optimizations, or extra features during the GREEN phase.
The GREEN phase is about correctness, not elegance. Save structural improvements
for REFACTOR.
Refactoring While RED
Changing structure while tests are failing makes it impossible to distinguish
between test failures from the original defect and new failures from your
refactoring.
Per-Cycle Checklist
Use this mental checklist for each RED-GREEN-REFACTOR cycle:
[ ] Test describes behavior, not implementation
[ ] Test uses the public interface only
[ ] Test would survive an internal refactor
[ ] Test fails for the right reason (RED)
[ ] Implementation is minimal for this test (GREEN)
[ ] No speculative features added (GREEN)
[ ] All tests pass after refactoring (REFACTOR)
[ ] No new behavior introduced during refactor
Language-Specific Guidance
Rust
- Use `#[test]` and `#[cfg(test)]` for unit tests
- Place integration tests in the `tests/` directory
- Use `cargo test` to run; `cargo test -- --nocapture` for stdout
- Consider `mod tests` blocks for test modules alongside source
- Use the `assert!`, `assert_eq!`, `assert_ne!` macros
- For async tests: `#[tokio::test]` with the tokio runtime
Go
- Use the `_test.go` file suffix and the `func TestXxx(t *testing.T)` signature
- Run with `go test ./...`
- Use `t.Errorf` / `t.Fatalf` (or testify's `assert` / `require`) for assertions
- Table-driven tests are idiomatic for testing multiple inputs
- Use `t.Run` for subtests
TypeScript
- Use test frameworks like vitest, jest, or node:test
- Run with the framework's runner (e.g., `vitest run`, `npx jest`, `node --test`)
- Use the `describe` / `it` pattern
- For async: return promises or use `async` / `await` in test functions
Solidity
- Use Foundry's `forge test` with `test_*` function naming
- Use `assertEq`, `assertTrue`, and `vm.expectRevert` for assertions
- Fork tests with `vm.createSelectFork` for mainnet state
- Use `setUp()` for test fixtures
- Fuzz tests: `function testFuzz_*(uint256 x)` for property-based testing
When the Bug is Hard to Test
If writing a test is difficult or the environment lacks test infrastructure:
- Write a test that fails with an explicit message explaining the bug and why
testing is hard.
- Fix the bug.
- Replace the explicit failure with a proper assertion once testability
improves.
- Invest in making the code more testable -- this is a design improvement.
Additional Resources
- For concrete examples per language, see references/examples.md