Spec-Driven Test-Driven Development
Transform requests into verified implementations through a structured pipeline
that adapts to the project context.
Workflow Router
Determine the workflow type before starting. This drives which phases apply and
which templates to use.
    Does a codebase already exist?
      |
      NO --> GREENFIELD (new project from scratch)
      |      Phases: Clarify -> Specify -> Test Plan -> TDD -> Verify
      |
      YES --> What type of change?
              |
              +-> Bug report / error / regression
              |   --> BUGFIX workflow
              |       Phases: Explore -> Bugfix Spec -> TDD -> Verify
              |
              +-> Add new capability not in codebase
              |   --> Does test infrastructure exist?
              |       |
              |       YES --> ENHANCEMENT workflow
              |       |       Phases: Explore -> Clarify -> Specify (delta) -> Test Plan -> TDD -> Verify
              |       |
              |       NO --> ENHANCEMENT (NO TESTS) workflow
              |              Phases: Explore -> Clarify -> Specify (delta) -> Bootstrap Tests -> Test Plan -> TDD -> Verify
              |
              +-> Change behavior without adding capability
              |   --> Does test infrastructure exist?
              |       |
              |       YES --> REFACTOR workflow
              |       |       Phases: Explore -> Clarify -> Specify (delta) -> Test Plan -> TDD -> Verify
              |       |
              |       NO --> REFACTOR (NO TESTS) workflow
              |              Phases: Explore -> Clarify -> Specify (delta) -> Bootstrap Tests -> Test Plan -> TDD -> Verify
              |
              +-> Simple, well-understood change (one-sentence diff)
                  --> DIRECT (skip this skill)
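The decision tree can also be read as a small function. The sketch below is only an illustration: the function name, the `change_type` labels, and the returned strings are assumptions chosen for this example, not part of the skill itself.

```python
def route_workflow(codebase_exists: bool, change_type: str = "", has_tests: bool = False) -> str:
    """Map a request onto a workflow name, mirroring the decision tree.

    change_type is a hypothetical label: "bugfix", "enhancement",
    "refactor", or "trivial" (a one-sentence diff).
    """
    if not codebase_exists:
        return "GREENFIELD"
    if change_type == "bugfix":
        return "BUGFIX"
    if change_type == "trivial":
        return "DIRECT"
    if change_type in ("enhancement", "refactor"):
        name = change_type.upper()
        # Missing test infrastructure adds the Bootstrap Tests phase.
        return name if has_tests else f"{name} (NO TESTS)"
    raise ValueError(f"unknown change type: {change_type!r}")
```

For example, `route_workflow(True, "refactor", has_tests=False)` selects the no-tests refactor workflow, which adds the Bootstrap Tests phase.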
Signal detection:
| Signal in request | Likely type |
|---|---|
| Error message, stack trace, "doesn't work", "broken" | Bugfix |
| "Add", "create", "new", "build", "implement" | Enhancement |
| "Improve", "enhance", "extend", "also support" | Enhancement |
| "Clean up", "modernize", "consolidate", "simplify" | Refactor |
| "Slow", "optimize", "performance" | Refactor (optimization) |
| No existing code, "start fresh", "new project" | Greenfield |
| "Add tests", "no tests", "untested" | Enhancement/Refactor (no tests) |
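As a rough illustration of how the signal table might be applied, the sketch below matches whole phrases in the request text. The `SIGNALS` list and `likely_type` function are hypothetical names, and the ordering (specific phrases before generic ones) is an assumption of this example. Keyword matching only suggests a likely type; it does not replace reading the request.

```python
import re
from typing import Optional

# Order matters: more specific phrases are checked before generic ones
# ("add tests" before "add", "new project" before "new").
SIGNALS = [
    ("Enhancement/Refactor (no tests)", ["add tests", "no tests", "untested"]),
    ("Greenfield", ["start fresh", "new project"]),
    ("Bugfix", ["error", "stack trace", "doesn't work", "broken"]),
    ("Refactor (optimization)", ["slow", "optimize", "performance"]),
    ("Refactor", ["clean up", "modernize", "consolidate", "simplify"]),
    ("Enhancement", ["add", "create", "new", "build", "implement",
                     "improve", "enhance", "extend", "also support"]),
]


def likely_type(request: str) -> Optional[str]:
    """Return the first matching label from the signal table, or None."""
    text = request.lower()
    for label, phrases in SIGNALS:
        for phrase in phrases:
            # Word boundaries avoid matching "add" inside "address".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return label
    return None
```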
Test infrastructure detection:
Existing test infrastructure = ANY of these found:
- Test files (for example, files matching *.test.ts or test_*.py)
- Test runner config (jest.config.*, vitest.config.*, pytest.ini, conftest.py, .mocharc.*)
- Test script in manifest (test script in package.json, test target in Makefile)
- CI config that runs tests
No test infrastructure = NONE of the above found.
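A minimal sketch of this check, assuming a local checkout. The function name and the test-file patterns in `TEST_FILE_GLOBS` are assumptions for this example. It covers only test files, runner configs, and a `test` script in package.json; Makefile targets and CI configs are left out for brevity.

```python
import json
from pathlib import Path

# Runner config names taken from the list above.
CONFIG_GLOBS = ["jest.config.*", "vitest.config.*", "pytest.ini", "conftest.py", ".mocharc.*"]
# Assumed test-file naming patterns; adjust to the project's conventions.
TEST_FILE_GLOBS = ["*.test.ts", "test_*.py"]


def has_test_infrastructure(root: str) -> bool:
    """Return True if ANY known indicator is found under root."""
    base = Path(root)
    for pattern in CONFIG_GLOBS + TEST_FILE_GLOBS:
        # rglob searches recursively; the first hit is enough.
        if next(base.rglob(pattern), None) is not None:
            return True
    manifest = base / "package.json"
    if manifest.is_file():
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return False
        scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
        return "test" in scripts
    return False
```

In a real project you would likely exclude dependency directories such as node_modules from the search, since they often contain test files of their own.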
Phase 0: Explore Existing Codebase (Brownfield Only)
Skip this phase for greenfield projects.
Before asking the user anything or writing any spec, understand what already
exists. Explore in read-only mode.
What to discover
- Project structure: Directory layout, where source and tests live
- Tech stack: Framework, language, versions, build system, package manager
- Architectural patterns: MVC, layered, event-driven, microservices, etc.
- Naming conventions: File naming, function naming, variable casing
- Existing patterns: Find the closest analog to the requested change — how
is similar functionality already implemented?
- Data models: Key entities, relationships, schemas
- Test infrastructure: Test framework, test runner command, fixture patterns,
approximate coverage, where test files live
- Existing spec artifacts: Check for prior requirements.md, design.md,
tasks.md, CLAUDE.md, or similar documentation
- Affected area: Which files and modules will this change touch?
How to explore
- Read the project's CLAUDE.md or README first (if they exist)
- Use glob/grep to find relevant source files and test files
- Read 2-3 existing files in the affected area to absorb patterns
- If prior spec files exist, read them to understand accumulated requirements
Test infrastructure assessment
Determine the test state by scanning for:
| Indicator | What it tells you |
|---|---|
| Test files exist in affected area | Tests cover the code you'll change |
| Test files exist elsewhere but not here | Project has tests, but not for this area |
| Test framework in dependencies | Framework chosen but maybe unused |
| Test runner config exists | Infrastructure is set up |
| test script in manifest | Runner command is established |
| CI config runs tests | Tests are part of the workflow |
| No test files, no config, no scripts | No test infrastructure at all |
Classify the result:
| Test State | Definition | Workflow Impact |
|---|---|---|
| Tests exist | Test files + runner + config all present | Standard brownfield |
| Partial infra | Framework in deps or config exists, but no/few test files | Bootstrap: write tests, skip framework selection |
| No tests | No test files, no config, no runner | Bootstrap: full test infrastructure setup |
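The classification can be sketched as a function of the indicators gathered during the scan. The function and parameter names are hypothetical. The table does not cover every mixed case (for example, test files without a runner), so this sketch assumes such cases count as partial infrastructure.

```python
def classify_test_state(test_file_count: int, has_runner: bool, has_config: bool,
                        framework_in_deps: bool = False) -> str:
    """Classify the test state as 'tests exist', 'partial infra', or 'no tests'."""
    if test_file_count > 0 and has_runner and has_config:
        return "tests exist"
    if test_file_count == 0 and not (has_runner or has_config or framework_in_deps):
        return "no tests"
    # Anything in between: some pieces exist, but the setup is incomplete.
    return "partial infra"
```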
Output: Codebase Context Summary
Produce a brief mental model (do not write a file unless the user requests it):
Stack: [framework, language, test runner]
Patterns: [architecture, naming, file organization]
Affected area: [files/modules this change touches]
Existing tests: [relevant test files, approximate count, runner command]
OR: No test infrastructure found
Test state: [tests exist | partial infra | no tests]
Prior specs: [any existing requirements.md/design.md or none]
Closest analog: [existing feature most similar to requested change]
This context informs every subsequent phase.
Phase 1: Clarify Requirements
Evaluate the request's clarity before starting any work.
Greenfield
Ask about the blank slate: inputs/outputs, data formats, error handling,
edge cases, performance constraints, tech stack preferences.
Brownfield (tests exist)
Use codebase context from Phase 0. Do NOT ask questions the codebase
already answers. Focus questions on:
- What should the NEW behavior be? (the codebase shows current behavior)
- Where does new behavior differ from existing patterns?
- What existing behavior must NOT change?
- Are there constraints the existing architecture imposes?
Brownfield (no tests)
In addition to the standard brownfield questions, clarify:
- Test framework preference: Does the user have a preferred test framework?
If not, recommend one based on the stack (see Bootstrap Tests phase).
- Test scope: Should tests cover ONLY the new/changed code, or also
establish baseline coverage for existing code in the affected area?
- Test location: Co-located with source or in a separate
  directory? Infer from project conventions if possible.
Ambiguity signals (if ANY present, ask before proceeding):
- Multiple valid interpretations producing different implementations
- Missing I/O specifications, pre/post conditions, or data formats
- Unspecified error handling, edge cases, or boundary conditions
- Unclear performance, security, or environmental constraints
- Change touches shared/critical code paths
When to proceed without asking:
- Request is well-constrained with one obvious implementation
- Ambiguity is purely cosmetic (naming, formatting)
- Clarification loop has reached 3 rounds (proceed with stated assumptions)
Interview approach:
- Ask up to 3 focused questions per round
- Each question should target a decision that changes the implementation
- State the default assumption alongside each question
- After each round, summarize resolved items and remaining unknowns
Phase 2: Write Specification
Use the templates in references/spec-template.md.
Which template depends on the workflow type:
| Workflow | Spec Template | Key Difference |
|---|---|---|
| Greenfield | Feature (requirements.md + design.md) | Written from scratch |
| Enhancement | Enhancement (requirements.md + design-delta.md) | Extends existing specs, documents only what changes |
| Enhancement (no tests) | Enhancement (requirements.md + design-delta.md) | Same as above, plus test infrastructure decisions |
| Refactor | Refactor (refactor.md) | Documents behavior preservation + structural changes |
| Refactor (no tests) | Refactor (refactor.md) | Same as above, plus test infrastructure decisions |
| Bugfix | Bugfix (bugfix.md) | Current/Expected/Unchanged behavior |
EARS notation (all workflows)
| Pattern | Syntax | Use Case |
|---|---|---|
| Ubiquitous | THE SYSTEM SHALL [behavior] | Always-on requirements |
| Event-Driven | WHEN [event] THE SYSTEM SHALL [behavior] | Triggered by events |
| State-Driven | WHILE [state] THE SYSTEM SHALL [behavior] | During a state |
| Unwanted | IF [condition] THEN THE SYSTEM SHALL [behavior] | Exception handling |
| Complex | WHILE [state] WHEN [event] THE SYSTEM SHALL [behavior] | Combined |
Every requirement must be atomic, testable, and have a unique ID.
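To make the five patterns concrete, here is a small sketch that recognizes which EARS pattern a requirement follows. The function name and the regular expressions are assumptions of this example. It checks only the keyword skeleton, not whether the requirement is atomic or testable.

```python
import re
from typing import Optional

# Checked in order: the Complex pattern must come before State-Driven,
# because both start with WHILE.
EARS_PATTERNS = [
    ("Complex", re.compile(r"^WHILE .+ WHEN .+ THE SYSTEM SHALL .+")),
    ("Event-Driven", re.compile(r"^WHEN .+ THE SYSTEM SHALL .+")),
    ("State-Driven", re.compile(r"^WHILE .+ THE SYSTEM SHALL .+")),
    ("Unwanted", re.compile(r"^IF .+ THEN THE SYSTEM SHALL .+")),
    ("Ubiquitous", re.compile(r"^THE SYSTEM SHALL .+")),
]


def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS pattern name, or None if the text fits no pattern."""
    text = requirement.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):
            return name
    return None
```

A requirement that returns `None` here is a candidate for rewriting before it enters the spec.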
Requirement ID namespacing (brownfield)
If prior specs exist with REQ-001 through REQ-N, continue numbering from REQ-(N+1).
If no prior specs exist, start fresh. Use a distinct prefix for each kind of
requirement (enhancement, refactor, bugfix) so they can be told apart; for
example, enhancement requirements use REQ-E- (as in REQ-E-001).
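The numbering rule can be sketched as follows. The helper name is hypothetical, and the three-digit, hyphen-separated ID format is an assumption based on IDs such as REQ-E-001 used elsewhere in this document.

```python
import re
from typing import Iterable


def next_requirement_id(existing_ids: Iterable[str], prefix: str = "REQ-E") -> str:
    """Return the next free ID for prefix, continuing from the highest existing number."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    numbers = []
    for req_id in existing_ids:
        match = pattern.match(req_id)
        if match:
            numbers.append(int(match.group(1)))
    # With no prior IDs for this prefix, numbering starts at 001.
    return f"{prefix}-{max(numbers, default=0) + 1:03d}"
```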
Unchanged Behavior section (critical for brownfield)
Every brownfield spec MUST include an Unchanged Behavior section:
- INV-001: WHEN [condition] THE SYSTEM SHALL CONTINUE TO [existing behavior]
This is the primary regression prevention mechanism.
When no tests exist, invariants are your ONLY regression protection. Be
more thorough than usual: list every behavior in the affected area that must
survive the change. Read the source code and its callers to identify these.
Greenfield: requirements.md + design.md
Write both from scratch per the Feature template. Full architecture, data models,
API contracts, error handling, testing strategy.
Enhancement: requirements.md + design-delta.md
requirements.md: Add only NEW requirements. Reference existing spec if present.
Include Unchanged Behavior invariants for every existing behavior the change
could affect.
design-delta.md: Document only what changes:
- Modified components (what exists → what changes)
- New components (what's added)
- Integration points (how new connects to existing)
- Files to modify vs. files to create
Do NOT rewrite the entire architecture. Reference existing design.
No-tests addition to design-delta.md: Include a Test Infrastructure section:
```markdown
## Test Infrastructure (New)
- **Framework:** [chosen framework and rationale]
- **Runner command:** [command to run tests]
- **Config file:** [path to test config]
- **Test location:** [co-located | separate directory]
- **Naming convention:** [pattern, e.g., *.test.ts, test_*.py]
```
Bugfix: bugfix.md
Three-section format:
- Current Behavior (defect): WHEN [x] THEN the system [incorrect behavior]
- Expected Behavior (correct): WHEN [x] THEN the system SHALL [correct behavior]
- Unchanged Behavior (regression): WHEN [y] THE SYSTEM SHALL CONTINUE TO [existing behavior]
Derive three tests: reproduce bug, validate fix, confirm no regressions.
Phase 3: Bootstrap Test Infrastructure (No-Tests Workflows Only)
Skip this phase if test infrastructure already exists.
Before deriving a test plan, establish the test infrastructure. You cannot
write tests without a framework, runner, and conventions.
Step 1: Select test framework
If the user specified a preference in Phase 1, use it. Otherwise, select
based on the stack:
| Stack | Recommended Framework | Rationale |
|---|---|---|
| TypeScript / ESM | vitest | Native ESM, fast, compatible with Jest API |
| TypeScript / CJS | jest + ts-jest | Mature ecosystem, wide adoption |
| JavaScript / Node | jest or vitest | Either works; match project's module system |
| JavaScript / Browser | vitest or jest + jsdom | DOM testing support |
| Python 3 | pytest | De facto standard, fixtures, parametrize |
| Python (Django) | pytest-django | Django integration with pytest |
| Go | testing (stdlib) | Built-in, no external dependency needed |
| Rust | cargo test (built-in) | Built-in test framework |
| Java / Spring | JUnit 5 + Mockito | Industry standard |
| Ruby / Rails | RSpec or Minitest | RSpec for BDD style; Minitest for minimal |
| C# / .NET | xUnit or NUnit | xUnit for modern .NET |
| Elixir | ExUnit (built-in) | Built-in test framework |
Principle: Choose the most conventional option. The goal is not the "best"
framework but the one the team (or a future developer) will expect.
Step 2: Install and configure
- Add the framework as a dev dependency:

      npm install --save-dev vitest   # Node/TS
      pip install pytest              # Python
      # Go and Rust have built-in testing
- Create test config (if the framework needs one):
  - For example, a jest.config.* or vitest.config.* file, or pytest.ini.
  - Match the project's existing config patterns (TypeScript for TS projects,
    YAML if project uses YAML configs, etc.)
- Add test runner script to the project manifest:

  ```json
  // package.json
  { "scripts": { "test": "vitest run" } }
  ```

  ```toml
  # pyproject.toml
  [tool.pytest.ini_options]
  testpaths = ["tests"]
  ```
- Create test directory (if using separate directory convention):

      mkdir tests/   # or __tests__/ or test/
- Verify the framework works by writing a trivial smoke test:

  ```typescript
  // tests/smoke.test.ts
  import { describe, it, expect } from 'vitest'

  describe('test infrastructure', () => {
    it('works', () => {
      expect(1 + 1).toBe(2)
    })
  })
  ```

  Run it. If it passes, infrastructure is ready. Delete or keep the smoke
  test as appropriate.
Step 3: Establish test conventions
Derive conventions from the project's CODE conventions:
| Code Convention | Test Convention |
|---|---|
| Files in a source directory | Tests co-located with the source or in a separate test directory |
| Functions use camelCase | Test names use camelCase: it('createsUserWithEmail') |
| Functions use snake_case | Test names use snake_case: test_creates_user_with_email |
| Modules organized by feature | Test directories mirror source directories |
| ES modules (import/export) | Tests use same import style |
| CommonJS (require) | Tests use same require style |
Document the conventions in design-delta.md (or design.md for greenfield)
so future tests stay consistent.
Step 4: Write characterization tests (critical)
Before changing any existing code, write characterization tests that capture
the current behavior of the code you're about to modify. These are not tests
you want to pass — they are tests that document what the code ACTUALLY does,
so you can detect when your changes break something.
See references/test-plan.md for the full
characterization test method.
What to characterize
Focus on the affected area identified in Phase 0:
- Public functions/methods you'll call or modify: test their current
inputs → outputs
- API endpoints you'll change: test their current request → response
- Critical code paths through the module: test the happy path and
major error paths
- Integration points between the module and its callers: test that
callers get what they expect
How many characterization tests
- Enhancement: Characterize the specific functions/endpoints you'll modify
or call. Not the entire codebase — just the affected surface.
- Refactor: More extensive. Characterize ALL externally observable behavior
of the code being refactored. This is your safety net.
- Bugfix: Characterize the behavior around the bug. The reproduction test
IS a characterization test (it documents the current broken behavior).
Characterization test naming
Use a distinct naming pattern so these are recognizable:
test_CHAR_[function]_[scenario]_[current_behavior]
Examples:
```python
# Names only; each body asserts what the current code actually does.
def test_CHAR_create_user_with_valid_data_returns_user_object(): ...
def test_CHAR_create_user_with_duplicate_email_raises_conflict(): ...
def test_CHAR_get_user_nonexistent_returns_none(): ...
```
Run characterization tests
All characterization tests MUST pass against the UNCHANGED code. If a
characterization test fails, your test is wrong — fix the test, not the code.
These characterization tests become your regression suite. They replace
the "existing tests" that the standard brownfield workflow relies on.
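As an illustration of what a characterization test body looks like, the sketch below pins down the current behavior of a small function. `slugify` is a hypothetical stand-in for the legacy code being characterized. The point is that each assertion records what the code does today, even where that behavior looks questionable.

```python
def slugify(title: str) -> str:
    """Hypothetical legacy function, standing in for the code being characterized."""
    return title.strip().lower().replace(" ", "-")


def test_CHAR_slugify_simple_title_returns_hyphenated_lowercase():
    assert slugify("Hello World") == "hello-world"


def test_CHAR_slugify_double_space_returns_double_hyphen():
    # Possibly surprising, but this is what the code does today: record it as-is.
    assert slugify("a  b") == "a--b"


def test_CHAR_slugify_empty_string_returns_empty_string():
    assert slugify("") == ""
```

If the double-hyphen behavior is something the spec intends to change, it belongs in the spec as a modified behavior; until then the test stays as written.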
Deliverable: Test infrastructure ready
After this phase:
- Framework installed and configured
- Runner command works
- Test conventions documented
- Characterization tests pass against unchanged code
- You can now proceed to Phase 4 (Test Plan) with confidence
Phase 4: Derive Test Plan
From the spec, derive tests and write tasks.md. See
references/test-plan.md for templates.
Brownfield: Discover existing tests first
Before writing any new tests:
- Find existing test files in the affected area (glob for the project's test-file naming patterns)
- Read relevant existing tests to understand patterns (framework, assertions,
fixtures, naming conventions)
- Check for existing coverage of the behaviors you're about to change
- Match existing patterns in all new tests (same framework, same style,
same file location conventions)
Brownfield (no tests): Use characterization tests as baseline
If you bootstrapped tests in Phase 3:
- Your characterization tests ARE the existing tests. They serve the same
role as pre-existing tests in the standard brownfield workflow.
- Match the conventions you established in Phase 3 for all new tests.
- Do not add more characterization tests at this point — Phase 3 covered
the affected area. Focus on test derivation from the spec.
- Map invariants to characterization tests: Each INV-* should correspond
to at least one characterization test that already passes.
Test derivation rules
- Each EARS requirement → at least one acceptance test (Given/When/Then)
- Each data model/interface → unit tests for validation, transformation, edge cases
- Each integration point → integration test using real services where possible
- Each invariant → regression test confirming unchanged behavior
  - If tests already exist: existing tests cover this
  - If no tests existed: characterization tests from Phase 3 cover this
- Identify properties (invariants for all inputs) → property-based tests
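To show the first derivation rule in practice, here is a sketch of one acceptance test derived from a hypothetical requirement: "REQ-E-001: WHEN a user registers with an email that is already in use THE SYSTEM SHALL reject the registration." The requirement, the `UserRegistry` class, and the test name are all invented for this example.

```python
class UserRegistry:
    """Hypothetical in-memory implementation, present only to make the test runnable."""

    def __init__(self) -> None:
        self._emails = set()

    def register(self, email: str) -> bool:
        key = email.strip().lower()
        if key in self._emails:
            return False
        self._emails.add(key)
        return True


def test_TC_E_001_duplicate_email_is_rejected():
    # Given a registry that already contains an email
    registry = UserRegistry()
    assert registry.register("ada@example.com") is True
    # When the same email is registered again (with different casing)
    result = registry.register("Ada@Example.com")
    # Then the registration is rejected
    assert result is False
```

The test name carries the test case ID so that it can be listed against REQ-E-001 in the traceability matrix.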
Traceability
Maintain a mapping in tasks.md:
| Req ID | Test Case IDs | Status |
|-----------|-------------------|-------------|
| REQ-E-001 | TC-E-001, TC-E-002| Not Started |
| INV-001 | TC-REG-001 | Not Started |
Brownfield traceability rules:
- Every NEW requirement must have >= 1 test
- Every INVARIANT must have >= 1 regression test
- Existing tests from prior iterations are NOT orphans — only flag tests from
the current iteration that don't map to a requirement
- The traceability matrix covers only the current iteration's scope
No-tests traceability addition:
- Characterization tests (CHAR-*) map to INV-* invariants
- Include them in the matrix with their CHAR prefix:
| Req ID | Test Case IDs | Status |
|-----------|----------------------|-------------|
| REQ-E-001 | TC-E-001, TC-E-002 | Not Started |
| INV-001 | CHAR-create-user-001 | Passing (baseline) |
| INV-002 | CHAR-get-user-001 | Passing (baseline) |
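The traceability rules lend themselves to a mechanical check. The sketch below assumes the matrix has been loaded into a dictionary from requirement ID to test case IDs; the function name and data shapes are assumptions for this example.

```python
from typing import Dict, List, Tuple


def check_traceability(requirement_ids: List[str], test_ids: List[str],
                       matrix: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (requirements with no test, current-iteration tests mapped to no requirement)."""
    untested = [req for req in requirement_ids if not matrix.get(req)]
    mapped = {test for tests in matrix.values() for test in tests}
    orphans = [test for test in test_ids if test not in mapped]
    return untested, orphans
```

Pass only the current iteration's tests as `test_ids`, since tests from prior iterations are not treated as orphans.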
tasks.md
Break implementation into discrete, sequenced tasks. Each task:
- Maps to one or more requirements
- Has clear acceptance criteria
- Follows dependency order
- Includes "Write tests" as the FIRST subtask (TDD)
- Brownfield: Specifies which files are MODIFIED vs. CREATED
No-tests task ordering:
For brownfield-no-tests workflows, tasks.md should include the bootstrap
work as Task 0:
## Task 0: Bootstrap test infrastructure
- Status: [ ] Not Started
- Requirements: (infrastructure — no REQ mapping)
- Subtasks:
1. Install [framework], create config
2. Add test runner script
3. Write characterization tests for affected area
4. Verify all characterization tests pass
- Acceptance: `[test command]` runs and all characterization tests pass
Principle: no big jumps in complexity.
Phase 5: TDD Implementation Loop
Execute tasks from tasks.md using strict TDD. This is NON-NEGOTIABLE.
For each task:
1. RED - Write failing test(s) for the task's requirements
2. RUN - Execute test, confirm it FAILS (if it passes, test is wrong)
3. GREEN - Write MINIMAL code to make the test pass
4. RUN - Execute ALL tests (new + existing), confirm ALL pass
5. REFACTOR - Clean up, ensure no test breakage
6. COMMIT - Mark task complete in tasks.md
Running tests (brownfield, tests exist)
- Inner loop: Run only the new/affected tests during Red-Green iterations
(for speed)
- Task boundary: Run the FULL test suite after completing each task
(for regression safety)
- Final verification: Run full suite + linters + type checks at the end
Running tests (brownfield, no tests — after bootstrap)
- Inner loop: Run new tests + characterization tests during Red-Green
iterations. The characterization tests are your regression guardrail.
- Task boundary: Run ALL tests (characterization + new) after each task.
- Final verification: Run all tests + linters + type checks at the end.
Critical: If a characterization test fails during implementation, you broke
existing behavior. This is the same signal as "existing test breaks" in the
standard brownfield workflow. Fix your new code first.
Rules
- Never write implementation before its test.
- Never alter the spec to satisfy a test. Spec-derived tests are authoritative.
- Minimal code only. Add nothing beyond what makes the current test pass.
- All tests green before moving to next task.
- Use real dependencies where feasible. Mocks only for external services
outside your control.
- Decompose classes by method dependency. Generate in dependency order,
test each method individually.
- Bounded repair. 3 fix attempts max, then reassess or ask user.
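The bounded-repair rule can be pictured as a loop with a hard cap. In this sketch `run_tests` and `attempt_fix` are hypothetical callables supplied by the caller; a `False` result means it is time to reassess or ask the user.

```python
from typing import Callable


def bounded_repair(run_tests: Callable[[], bool],
                   attempt_fix: Callable[[int], None],
                   max_attempts: int = 3) -> bool:
    """Try up to max_attempts fixes; return True as soon as the tests pass."""
    if run_tests():
        return True
    for attempt in range(1, max_attempts + 1):
        attempt_fix(attempt)
        if run_tests():
            return True
    # Budget exhausted: stop patching and escalate.
    return False
```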
Brownfield-specific rules (all brownfield workflows)
- Match existing patterns. New code must follow the conventions discovered
in Phase 0 (naming, file structure, import style, error handling).
- Refactor only what you wrote. Do NOT refactor existing code unless the
task explicitly requires it. Existing code is assumed correct until proven otherwise.
- Read before calling. Before calling existing functions, read their actual
signatures. Do not assume existing interfaces — verify them.
- If an existing test breaks, your new code caused a regression. Fix your
new code first (existing passing tests are authoritative). Only modify an
existing test if the spec explicitly changes that behavior.
- If a characterization test breaks (no-tests workflow), the same rule
applies: your new code caused a regression. Fix your new code. The
characterization test documents real behavior that something depends on.
Hallucination prevention
- Verify external APIs/libraries exist and check current interfaces
- Chain-of-thought: reason step-by-step before coding
- Run static analysis after generation
- Use execution traceback (not just re-reading) to fix failures
- Brownfield: Read existing code before calling it; verify signatures
Phase 6: Verification
After all tasks complete, verify the full delivery.
Checklist
Additional no-tests verification
Deliverables
Greenfield:
- requirements.md: Full specification
- design.md: Full technical design
- tasks.md: Task list with traceability matrix
- Test suite — All passing
- Implementation code
Brownfield enhancement/refactor (tests exist):
- requirements.md: New/changed requirements only (or appended to existing)
- design-delta.md: What changed in the design
- tasks.md: Task list with traceability matrix for this iteration
- New/modified tests
- Implementation changes
- Change summary: Files modified, files created, behaviors added/changed
Brownfield enhancement/refactor (no tests):
- requirements.md: New/changed requirements only (or appended to existing)
- design-delta.md: What changed in the design, INCLUDING test infrastructure
  decisions (framework, conventions, directory structure)
- tasks.md: Task list with traceability matrix (includes CHAR-* mappings)
- Test infrastructure: config, runner script, directory structure
- Characterization tests for affected area
- New tests derived from spec
- Implementation changes
- Change summary: Files modified, files created, behaviors added/changed,
test infrastructure established
Bugfix:
- bugfix.md: Bug analysis with Current/Expected/Unchanged
- Tests: reproduction, fix validation, regression
- Fix implementation
- Change summary
Failure Recovery
    Test fails
      |
      +-> Is it a NEW test that fails?
      |     +-> Code bug: fix implementation
      |     +-> Test wrong: does it match spec?
      |           +-> Yes: fix code (spec is authoritative)
      |           +-> No: fix test (or revisit spec with user)
      |
      +-> Is it an EXISTING test that fails? (standard brownfield)
      |     +-> Your new code caused a regression
      |     +-> Fix your new code (existing tests are authoritative)
      |     +-> Do NOT modify the existing test unless the spec
      |     |   explicitly changes that behavior
      |     +-> If the existing test seems wrong, confirm with user
      |         before changing it
      |
      +-> Is it a CHARACTERIZATION test that fails? (no-tests brownfield)
      |     +-> Your new code caused a behavioral regression
      |     +-> Fix your new code (characterization tests document real behavior)
      |     +-> Do NOT modify the characterization test unless the spec
      |     |   explicitly changes that behavior (listed in Modified Behavior
      |     |   with Was/Now)
      |     +-> If the characterization test documents behavior the spec
      |         INTENDS to change, update the test to match the new spec
      |
      +-> Test infrastructure won't set up? (no-tests bootstrap)
            +-> Check framework compatibility with project's Node/Python/etc. version
            +-> Check for conflicting config (e.g., module type mismatches)
            +-> Try the next framework in the recommendation table
            +-> If stuck after 2 frameworks, ask user for guidance
Never silently change the spec. Confirm with user first.
If stuck after 3 attempts, ask user for guidance.
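The recovery tree can be condensed into a lookup, sketched below. The function name, the `test_kind` labels, and the returned action strings are assumptions of this example. It covers only the main branches and leaves the "confirm with user" steps to the caller.

```python
def triage_failure(test_kind: str, test_matches_spec: bool = True,
                   spec_changes_behavior: bool = False) -> str:
    """Suggest the next action for a failing test, following the recovery tree."""
    if test_kind == "new":
        if test_matches_spec:
            return "fix the implementation"
        return "fix the test, or revisit the spec with the user"
    if test_kind in ("existing", "characterization"):
        if spec_changes_behavior:
            return "update the test to match the spec"
        # Passing tests and characterization tests are authoritative.
        return "fix the new code"
    if test_kind == "infrastructure":
        return "check compatibility and config, then try the next framework"
    raise ValueError(f"unknown test kind: {test_kind!r}")
```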
Reference Files
- Spec templates: See references/spec-template.md
for all templates: Feature, Enhancement, Refactor, Bugfix, and Codebase Context
- Test plan guide: See references/test-plan.md
for test derivation, existing test discovery, characterization tests,
traceability, and Given/When/Then
- Edge case catalog: See references/edge-cases.md
for edge case categories to check during specification