dev-tdd

Overview

No production code without a failing test first.

This is a discipline overlay, not a workflow step. It fires during ANY coding task - whether you're working from a plan, fixing a bug, adding a feature, or refactoring. There is no entry gate and no exit gate. If you're writing code, TDD rules are active.

The cycle is simple. The discipline is what matters.

Announcement

When this skill is active, announce:

"I'm using enggenie:dev-tdd to enforce test-driven development."

Hard Rule: No Production Code Without a Failing Test First

If you catch yourself writing production code before a test: stop. Delete it. Write the test first.

No exceptions. No "just this once." No "I'll circle back." The test comes first or the code doesn't get written.

Violating the Letter of TDD Is Violating the Spirit

There is no distinction between the letter and the spirit of TDD. The spirit IS the letter.

Writing a test that you know will pass is not TDD. Writing two tests before implementing is not TDD. Writing production code and then backfilling a test is not TDD. These aren't "close enough." They're a different practice entirely.

The value of TDD comes from the discipline of the cycle. Skip a step and you lose the feedback loop that makes it work.

The Cycle: RED -> GREEN -> REFACTOR

Every piece of new behavior follows this cycle. Every time. No shortcuts.

RED - Write a Failing Test

- Write ONE minimal test that describes the next piece of behavior.
- Run the test.
- It MUST fail (not error - fail). A compilation error or import error is not RED. RED means the test ran and the assertion failed.
- Read the failure message. It should describe the missing behavior clearly. If the failure message is confusing, fix the test before proceeding.

Verification: You saw a clear, expected failure message. The test ran. The assertion failed for the right reason.

GREEN - Make It Pass

- Write the simplest code that makes the failing test pass. Not the clever code. Not the complete code. The simplest.
- Run the test. It MUST pass.
- Run ALL tests. They MUST all pass. If something else broke, fix it before moving on.

Verification: The new test passes. All existing tests still pass. You wrote minimal code - nothing beyond what the test demanded.
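
As a rough sketch of what "simplest" can look like, assume a single existing test for a hypothetical `slugify` function (an illustrative name, not from this skill). The implementation below does only what that one test demands.

```python
# Hypothetical example: `slugify` and its test are illustrative names.

def slugify(title: str) -> str:
    # Simplest code that satisfies the one existing test.
    # No lowercasing and no punctuation stripping: no test demands them yet.
    return title.replace(" ", "-")


def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"


test_slugify_replaces_spaces_with_hyphens()  # passes silently
```

Lowercasing or stripping punctuation would each be new behavior, so under the rules above each would start with its own failing test.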

REFACTOR - Clean Up

- Look at the code you just wrote. Look at the test you just wrote. Is there duplication? Poor naming? Unnecessary complexity?
- Clean it up. Extract. Rename. Simplify.
- Run ALL tests after every change. They MUST stay green.
- No new behavior during refactor. If you want new behavior, start a new RED.

Verification: Tests are still green. Code is cleaner. No new behavior was added.
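
A behavior-preserving refactor might look like the sketch below. The functions and the name-formatting scenario are hypothetical, chosen only to show duplication being extracted while the same test stays green before and after.

```python
# Hypothetical example: the functions below are illustrative, not from the skill.

def format_name_before(first: str, last: str) -> str:
    # Passes its test, but the capitalization logic is duplicated.
    return (
        first.strip()[0].upper() + first.strip()[1:].lower()
        + " "
        + last.strip()[0].upper() + last.strip()[1:].lower()
    )


def _capitalize(part: str) -> str:
    # Extracted helper: same logic, written once.
    part = part.strip()
    return part[0].upper() + part[1:].lower()


def format_name(first: str, last: str) -> str:
    # Refactored version. Empty input is still unhandled, exactly as before:
    # handling it would be new behavior and would need its own RED.
    return f"{_capitalize(first)} {_capitalize(last)}"


def check_formats_name(fn) -> None:
    assert fn(" ada ", "LOVELACE") == "Ada Lovelace"


check_formats_name(format_name_before)  # green before the refactor
check_formats_name(format_name)         # still green after it
```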

Then start the cycle again.

Worked Example: Bug Fix with TDD

Scenario: Email validation accepts "user@" as valid.

RED - Write the failing test:

```python
def test_rejects_email_without_domain():
    assert validate_email("user@") == False
```

Run:

```
pytest tests/test_email.py::test_rejects_email_without_domain

FAILED - assert True == False
```

Good. The test fails because the current code does not check for a domain.

GREEN - Write the simplest fix:

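```python
def validate_email(email: str) -> bool:
    # No "@" at all: cannot be an email address.
    if "@" not in email:
        return False
    # Split on the last "@" and require both parts to be non-empty.
    local, domain = email.rsplit("@", 1)
    return len(local) > 0 and len(domain) > 0
```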

Run:

```
pytest tests/test_email.py

4 passed
```

All tests pass. Do NOT add more validation yet.

REFACTOR - Clean up while tests stay green:

No refactoring needed - the code is already clean.

Good test vs Bad test:

| Aspect | Good | Bad |
| --- | --- | --- |
| Name | `test_rejects_email_without_domain` | `test_email_validation` |
| Assertion | `assert validate_email("user@") == False` | `assert result is not None` |
| Scope | Tests ONE behavior | Tests multiple behaviors |
| Failure message | "assert True == False" tells you what broke | "AssertionError" tells you nothing |
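
The Assertion row can be made concrete with a short sketch. `validate_email` is the fixed implementation from the worked example; `validate_email_before_fix` is an assumed stand-in for the buggy version, which the source does not show, and `bad_test`, `good_test`, and `catches_bug` are hypothetical helpers.

```python
def validate_email(email: str) -> bool:
    # Fixed implementation, copied from the worked example.
    if "@" not in email:
        return False
    local, domain = email.rsplit("@", 1)
    return len(local) > 0 and len(domain) > 0


def validate_email_before_fix(email: str) -> bool:
    # ASSUMPTION: a stand-in for the buggy version; the source does not show it.
    return "@" in email


def bad_test(validate) -> None:
    result = validate("user@")
    assert result is not None  # True and False both satisfy this


def good_test(validate) -> None:
    assert validate("user@") == False


def catches_bug(test) -> bool:
    """Return True if the test fails against the buggy implementation."""
    try:
        test(validate_email_before_fix)
    except AssertionError:
        return True
    return False


print(catches_bug(bad_test))   # False
print(catches_bug(good_test))  # True
good_test(validate_email)      # passes against the fixed implementation
```

The bad assertion stays green whether or not the bug is present, so it could never have been RED.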

The Shortcut Tax

Every shortcut has a cost. Here's what you're actually paying.

| Shortcut | What it costs you |
| --- | --- |
| "I'll write tests after" | Tests pass immediately - you've proved nothing. Bugs ship. |
| "Too simple to test" | Simple code breaks. 30 seconds to test. 30 minutes to debug in prod. |
| "Already manually tested" | No record. Can't re-run. You'll re-test every change by hand forever. |
| "TDD slows me down" | TDD is faster than debugging. Systematic beats ad-hoc. |
| "Just this once" | That's what you said last time. Discipline compounds. |
| "Keep the code as reference" | You'll adapt it instead of writing tests first. Delete means delete. |
| "Need to explore first" | Fine. Explore. Then throw it away. Start fresh with TDD. |
| "The test is hard to write" | Hard to test = hard to use = bad design. Simplify the interface. |
| "Tests after achieve same goals" | After answers "what does this do?" First answers "what should it do?" |
| "Existing code has no tests" | You're improving it. Add tests for what you touch. |
| "I already see the problem" | Seeing symptoms does not equal understanding root cause. Write the test. |

Deep Rebuttals -- When the Shortcut Tax Table Is Not Enough

The table above is quick reference. Below are the full arguments for the 5 most dangerous rationalizations. Use these when you (or the user) need the complete reasoning.

"I'll write tests after to verify it works"

Tests written after code pass immediately. Passing immediately proves nothing:

- You might test the wrong thing (testing what you built, not what was required)
- You might test implementation details instead of behavior
- You might miss the edge cases you forgot about during implementation
- You never saw the test catch the bug -- so you cannot trust it catches anything

Test-first forces you to see the test fail, proving it actually tests something. A test that never failed is a test you cannot trust.

"I already manually tested all the edge cases"

Manual testing is ad-hoc. You think you tested everything but:

- There is no record of what you tested
- You cannot re-run the same tests when code changes next week
- Under pressure, you forget cases
- "It worked when I tried it" is anecdote, not evidence

Automated tests are systematic. They run the same way every time. They catch regressions you forgot to re-test. They document what was verified.
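
One way to picture "the same way every time" is a recorded list of cases that is re-run after every change. The sketch below reuses `validate_email` from the worked example; the case list and `run_regression_suite` are illustrative assumptions (the source only specifies the `"user@"` case), and in practice each case would have been added through its own RED.

```python
def validate_email(email: str) -> bool:
    # Implementation copied from the worked example.
    if "@" not in email:
        return False
    local, domain = email.rsplit("@", 1)
    return len(local) > 0 and len(domain) > 0


# ASSUMPTION: illustrative cases; only "user@" appears in the source.
CASES = [
    ("user@example.com", True),
    ("user@", False),
    ("@example.com", False),
    ("user-at-example.com", False),
]


def run_regression_suite() -> list[str]:
    """Return a description of every case whose result differs from its expectation."""
    failures = []
    for email, expected in CASES:
        actual = validate_email(email)
        if actual != expected:
            failures.append(f"{email!r}: expected {expected}, got {actual}")
    return failures


print(run_regression_suite())  # []
```

The list doubles as the record of what was verified, which a manual session does not leave behind.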

"Deleting X hours of work is wasteful"

Sunk cost fallacy. The time is already spent. Your choice now:

- Delete and rewrite with TDD (X more hours, but high confidence the code works)
- Keep it and add tests after (30 minutes, but low confidence -- likely bugs hiding)

The "waste" is keeping code you cannot trust. Working code without real tests is technical debt that compounds every sprint. The faster path is the disciplined path.

"TDD is dogmatic -- being pragmatic means adapting"

TDD IS pragmatic:

- Finds bugs before commit (faster than debugging after deployment)
- Prevents regressions (tests catch breaks immediately, not in QA or production)
- Documents behavior (tests show how to use the code -- they are living documentation)
- Enables fearless refactoring (change anything, tests catch breaks)

"Pragmatic" shortcuts lead to debugging in production, which is slower, more expensive, and more stressful. Discipline IS pragmatism.

"Tests after achieve the same goals -- it's the spirit, not the ritual"

No. Tests-after answer "What does this code do?" Tests-first answer "What should this code do?"

Tests-after are biased by your implementation. You test what you built, not what was required. You verify the edge cases you remembered, not the ones you would have discovered by writing the test first.

Tests-first force edge case discovery BEFORE implementation. You think about failure modes while designing the interface, not while verifying the output. 30 minutes of tests-after gives you coverage. TDD gives you proof that the tests actually work.

Gut Check

STOP and start over if any of these are true:

- You wrote production code before writing a test
- Your test passed immediately (you never saw RED)
- You're writing multiple tests at once before implementing
- You're "just going to quickly add" something without a test
- You're thinking "this case is different because..."
- You're keeping deleted code "as reference"
- You wrote tests after implementation → Delete the code. Write the test. Watch it fail. Rewrite.
- You can't explain why a test failed → You don't understand the code. Investigate before proceeding.
- You're thinking "I already spent X hours on this code, deleting is wasteful" → Sunk cost fallacy. Delete it. Rewriting with TDD is faster than debugging untested code.
- You're thinking "TDD is dogmatic, I'm being pragmatic" → Pragmatic means following processes that work. TDD works. Skipping it is not pragmatic, it's reckless.

If you hit any of these: stop. Delete the production code. Go back to RED.

Exceptions (Require Explicit User Permission)

These are the ONLY acceptable reasons to skip TDD. Each requires the user to explicitly say "skip TDD for this":

- Throwaway prototypes - Code that will be deleted before merge. Not "might be deleted" - WILL be deleted.
- Generated code - Auto-generated files (migrations, scaffolds, codegen output). Not hand-written code that "feels generated."
- Configuration files - Pure config with no logic (JSON, YAML, env files). Not config that contains conditional logic.

If you catch yourself thinking "this is basically an exception" - it's not. Ask the user.

When Stuck

| Problem | Solution |
| --- | --- |
| Don't know how to test | Write the wished-for API. Write the assertion first. Work backwards. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. Reduce dependencies. |
| Test setup huge | Extract helpers. Simplify the design. If setup is painful, the API is painful. |
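
The "Must mock everything" row can be illustrated with a small dependency-injection sketch. The `greeting` function and its clock parameter are hypothetical; the point is only that passing the dependency in as an argument lets the test supply a fixed value without a mocking library.

```python
from datetime import datetime, timezone
from typing import Callable


def _utc_now() -> datetime:
    # Default dependency: the real clock.
    return datetime.now(timezone.utc)


def greeting(now: Callable[[], datetime] = _utc_now) -> str:
    # The clock is injected, so callers (and tests) can replace it.
    return "Good morning" if now().hour < 12 else "Good afternoon"


def test_greets_with_good_morning_before_noon():
    # A fixed clock stands in for the real one; no patching required.
    nine_am = lambda: datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    assert greeting(now=nine_am) == "Good morning"


test_greets_with_good_morning_before_noon()  # passes silently
```

Production callers keep calling `greeting()` with no arguments and get the real clock.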

Good Tests

A good test has three qualities:

Minimal - Tests one thing. One behavior. One assertion where possible. When it fails, you know exactly what broke.

Clear - The test name describes the behavior, not the implementation. `test_returns_empty_list_when_no_items` tells you more than `test_get_items`.

Shows intent - The test demonstrates the desired API. Reading the test tells you how the code should be used. It's the first consumer of your design.

Verification Checklist

Before marking ANY coding task complete, verify:

- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing (saw RED)
- [ ] Each test failed for the expected reason (correct failure message)
- [ ] Wrote minimal code to pass (no speculative generality)
- [ ] All tests pass (full suite, not just the new test)
- [ ] Tests use real code (mocks only when unavoidable - external services, file systems, network)
- [ ] Edge cases and error paths are covered
If any box is unchecked, you're not done.

Supporting References

- `references/testing-anti-patterns.md` - Never test mock behavior, never add test-only methods to production code, mock COMPLETE data structures not partial ones
- `references/defense-in-depth.md` - Validate at every layer, make bugs structurally impossible through types and constraints

These references contain detailed patterns. Read them when you need specifics on test structure or validation strategy.

Relationship to enggenie:dev-implement

`enggenie:dev-implement` orchestrates implementation work - it manages plans, dispatches subagents, and coordinates multi-step tasks.

`enggenie:dev-tdd` is a discipline overlay that fires during ANY coding, whether or not `dev-implement` is active. It is not a step in a workflow. It is the way code gets written.

When `dev-implement` is active, `dev-tdd`'s rules are enforced through the implementer subagent prompt. The subagent follows the RED-GREEN-REFACTOR cycle for every piece of code it writes.

When `dev-implement` is NOT active (ad-hoc coding, quick fixes, explorations that turn into real code), `dev-tdd` still fires. The discipline does not depend on having a plan.

Recommended Model

Primary: sonnet

Why: TDD requires balanced speed and code quality. Sonnet writes good tests and clean implementations without the latency of opus. For complex domain logic, override to opus.

This is a recommendation. Ask the user: "Confirm model selection or override?" Do not proceed until the user responds.

Entry / Exit

Entry: None. This skill fires during any coding task. No trigger required.

Exit: None. This is a discipline overlay, not a workflow step. It remains active as long as code is being written.
