TDD (Test-Driven Development)
Guide the Red-Green-Refactor cycle and pragmatic testing strategy. "Write tests. Not too many. Mostly integration."
Initial context received via slash: $ARGUMENTS
If it is filled (e.g., module name, feature description), use it as the starting point.
If empty, ask what will be tested.
Language
Write artifacts and test descriptions in the user's language. When in doubt, ask. Test code itself (function names, assertions) stays in English.
Project root
This skill writes artifacts at paths relative to the project root (the repo where the work happens), not the agent's current working directory.
- If invoked from inside the project, use the relative paths shown in this skill.
- If invoked from another directory (e.g., a sibling repo, or when the project lives elsewhere), prepend the project root to every artifact path.
- When the project root is ambiguous, confirm with the user via the harness question tool before writing.
Prompting
Follow the project-wide convention in
/
("Skill Prompting Conventions"). Use the harness's structured-question tool —
(Claude Code),
(Codex), or
(OpenCode) — for the decision points below. Use free-form text only where a path/name/value cannot be enumerated.
| Decision point | Why structured | Suggested options |
|---|---|---|
| Enforcement mode (when installing) | Hard-to-undo policy choice | warn · block · keep current |
| Test strategy (when ambiguous) | Affects file layout | sibling · sibling_dir · tests_root |
| Exempt a specific path | Edits guardrails config | yes · no · review later |
Free-form prompts (no structured tool):
- Test descriptions
- Exemption rationale
No-pause mode: if the user has explicitly disabled mid-skill clarification, convert every structured prompt into an entry under Open questions (or equivalent) and proceed without blocking.
When to use
- Starting a new feature with TDD
- Adding tests to existing code
- Establishing test coverage for a module
- Unclear whether something needs unit, integration, or E2E tests
When NOT to use
- Quick prototypes where tests add no value -- use
- Throwaway scripts
- Pure documentation changes
TDD cycle
- Red -- write a failing test that describes the desired behavior
- Green -- write the minimum code to make it pass
- Refactor -- improve structure without changing behavior
- Repeat
Present each step explicitly. Do not skip Red -- the test must fail first.
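A minimal sketch of one pass through the cycle, assuming a hypothetical `slugify` function and a small local `expectEqual` helper that stands in for the test runner's assertion API (neither comes from this skill):

```typescript
// Tiny stand-in for a test runner's assertion; a real project would use
// the runner's own `expect` instead.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Red: this test is written first and fails while `slugify` is missing
// or returns the wrong value.
function slugifyLowercasesAndJoinsWordsWithHyphens(): void {
  expectEqual(slugify("Hello World"), "hello-world");
}

// Green: the minimum implementation that makes the test pass.
// Refactor: later passes may restructure this body, but the test above
// must keep passing unchanged.
function slugify(input: string): string {
  return input.trim().toLowerCase().split(/\s+/).join("-");
}

slugifyLowercasesAndJoinsWordsWithHyphens();
```

The test function is named as a sentence and exists before the implementation is considered done; running it against an empty `slugify` is the Red step.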
Test pyramid (pragmatic)
| Layer | Target | Focus |
|---|---|---|
| Unit | 60% | Pure functions, transformers, utils |
| Integration | 30% | Services, DB interactions, API routes |
| E2E | 10% | Critical user flows |
Overall coverage target: 75%+.
For front-end work, treat these percentages as risk guidance, not quotas. Prefer integration tests that exercise user behavior, validation, local state, API contracts, permissions, offline/sync behavior, and critical flows. Avoid tests that only assert static text, that a button rendered, or implementation details of a design-system component.
When a project keeps business rules in
planning/<initiative>/business/*.md
, use those rule IDs to decide what deserves tests. Tests should prove behavior behind important rules, not restate the rule text.
File structure
- Unit: co-located with source ( beside )
- Integration/E2E: with , , , ,
- Naming: (unit/integration), (E2E)
- Never
Rules
- AAA pattern (Arrange / Act / Assert)
- One concept per test
- Descriptive names that read as sentences
- Always use factories (e.g., ) over hardcoded data
- Isolate with -- no shared state between tests
- Test behavior, not implementation details
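The rules above can be sketched in one small test. Everything named here (`User`, `createTestUser`, `canDeleteProject`) is hypothetical and only illustrates the shape: a factory with overridable defaults, Arrange / Act / Assert sections, and one concept per test.

```typescript
interface User {
  id: number;
  email: string;
  role: "admin" | "member";
}

// Hypothetical factory: sensible defaults, overridable per test, and a
// fresh object on every call so tests never share an instance.
function createTestUser(overrides: Partial<User> = {}): User {
  return { id: 1, email: "user@example.com", role: "member", ...overrides };
}

// Hypothetical unit under test.
function canDeleteProject(user: User): boolean {
  return user.role === "admin";
}

// The name reads as a sentence and the test covers a single concept.
function adminsCanDeleteProjects(): void {
  // Arrange
  const admin = createTestUser({ role: "admin" });
  // Act
  const allowed = canDeleteProject(admin);
  // Assert
  if (allowed !== true) throw new Error("admin should be allowed to delete");
}

adminsCanDeleteProjects();
```

Only the field that matters to the behavior (`role`) appears in the test body; the rest comes from the factory defaults.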
Anti-patterns (avoid)
- Interdependent tests (test A depends on test B running first)
- Arbitrary -- use proper waits
- Testing private methods -- test through public API
- in tests -- use proper assertions
- Order-dependent tests
- Mocking what you own (mock external dependencies, not your own code)
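As an illustration of the last point, the sketch below replaces only the external boundary with a fake and lets the project's own logic run for real. `PaymentGateway`, `checkout`, and `applyDiscount` are hypothetical names, and the gateway is kept synchronous purely to keep the example short.

```typescript
// External dependency boundary (hypothetical): the only thing replaced in tests.
interface PaymentGateway {
  charge(amountCents: number): { ok: boolean };
}

// Own code: exercised for real, never mocked.
function applyDiscount(amountCents: number, percent: number): number {
  return Math.round(amountCents * (1 - percent / 100));
}

// Own code that depends on the external gateway through injection.
function checkout(gateway: PaymentGateway, amountCents: number, percent: number): boolean {
  const total = applyDiscount(amountCents, percent);
  return gateway.charge(total).ok;
}

// Fake for the external gateway that records what it was asked to charge.
function createFakeGateway(): PaymentGateway & { charged: number[] } {
  const charged: number[] = [];
  return {
    charged,
    charge(amountCents: number) {
      charged.push(amountCents);
      return { ok: true };
    },
  };
}
```

A test can then assert on `charged` to check the behavior of `checkout` and `applyDiscount` together, without stubbing either of them.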
Coverage targets (granular)
| Area | Target |
|---|---|
| Transformers / pure functions | 90%+ |
| Utils | 85%+ |
| Services | 80%+ |
| Routes / handlers | 70%+ |
Commands (Bun)
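```bash
bun test                               # run the whole suite
bun test --watch                       # re-run on file changes
bun test --coverage                    # print a coverage report
bun test --test-name-pattern "name"    # run only tests whose name matches
bun test src/dir/                      # run tests under one directory
```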
Adjust for other runtimes (vitest, jest) as needed. Detect the project's test runner from
or config files before suggesting commands.
Process
1. Understand what to test
Explore the code to understand:
- What module or feature needs tests
- What behaviors are critical
- What is already covered (check existing tests)
- Which business rule IDs, acceptance criteria, or prototype flows the change must satisfy
2. Choose the right test type
Use the test pyramid as guide:
- Pure function with no side effects? Unit test.
- Service that talks to DB or external API? Integration test.
- Critical user flow that spans multiple systems? E2E test.
- Front-end behavior with validation, API contract, permission, optimistic update, or offline/sync state? Integration test.
- Static copy, simple rendering, or visual-only detail with no rule? Usually no test unless it protects a known regression.
For local-first products, give priority to tests that cover command validation, optimistic state, offline queue persistence, reconciliation, conflict handling, permissions, and audit events.
3. Execute the TDD cycle
For each behavior:
- Write the failing test (Red)
- Implement the minimum code (Green)
- Refactor if needed
- Verify the test still passes
Record the business rule ID or acceptance criterion in the test description or surrounding story artifact when that mapping helps future refinement.
4. Verify coverage
Run coverage and check against targets. Fill gaps in critical areas first.
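One way to make that check mechanical is sketched below. The targets mirror the granular table earlier in this skill; the `measured` input is a hypothetical per-area summary that would have to be derived from the runner's coverage report, which this skill does not specify.

```typescript
// Granular targets from the coverage table, expressed as fractions.
const coverageTargets = {
  transformers: 0.9,
  utils: 0.85,
  services: 0.8,
  routes: 0.7,
} as const;

type Area = keyof typeof coverageTargets;

// Returns the areas whose measured coverage is below target, in table order.
function findCoverageGaps(measured: Record<Area, number>): Area[] {
  return (Object.keys(coverageTargets) as Area[]).filter(
    (area) => measured[area] < coverageTargets[area],
  );
}
```

An area that sits exactly on its target is not reported as a gap.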
Chaining
- During feature implementation: work inside the checklist
- After implementation: to review test quality
- Before closing: ensure tests are part of (closure mode) verification
- If the TDD workflow exposes repeated friction, missing guidance, weak templates, or unclear verification, capture a concise skill feedback note with the affected skill/template, evidence, proposed change, and validation artifact.
- If repeated TDD friction suggests a skill/template change, use before editing the process library.
Enforcement (optional, opt-in per project)
Beyond advisory guidance, this skill ships hook templates that turn the TDD rule into a project-level guardrail. When a project enables them, every implementation session is checked at the file-write level — without the agent having to remember to invoke this skill.
What gets enforced
- PreToolUse on — if the target file matches in and a companion test does not exist, the hook warns () or blocks the tool call ().
- Stop hook — at session end, scans the git diff and reports source files touched without a companion test.
- SessionStart hook — announces "TDD enforcement active" so the agent knows the rule is in force.
The hook templates live at
skills/agile-tdd/templates/hooks/*.sh.tmpl
and the config schema at
skills/agile-tdd/templates/tdd-guardrails.yml.tmpl
.
Config ()
| Key | Meaning |
|---|---|
| Global on/off |
| (stderr only) or (PreToolUse rejects the call) |
| Globs that require a companion test |
| ( → ), (), or (<app>/tests/integration/foo.test.ts) |
| Globs allowed without a test (entry points, generated files, UI primitives) |
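To make the strategy setting concrete, here is a rough sketch of how a companion test path could be derived from a source path. The strategy names come from this skill; the layouts used for `sibling` and `sibling_dir` (including the `__tests__` folder name) are assumptions for illustration only, and `tests_root` simply follows the example path in the table. The shipped hook scripts may resolve paths differently.

```typescript
type Strategy = "sibling" | "sibling_dir" | "tests_root";

// Candidate companion-test path for a source file.
// ASSUMPTION: the "sibling" and "sibling_dir" layouts are illustrative guesses.
function companionTestPath(source: string, strategy: Strategy, appRoot: string): string {
  const slash = source.lastIndexOf("/");
  const dir = slash >= 0 ? source.slice(0, slash) : ".";
  const file = slash >= 0 ? source.slice(slash + 1) : source;
  const dot = file.lastIndexOf(".");
  const stem = dot > 0 ? file.slice(0, dot) : file;
  const ext = dot > 0 ? file.slice(dot) : "";
  const testFile = `${stem}.test${ext}`;

  switch (strategy) {
    case "sibling":
      return `${dir}/${testFile}`; // next to the source file
    case "sibling_dir":
      return `${dir}/__tests__/${testFile}`; // hypothetical folder name
    case "tests_root":
      return `${appRoot}/tests/integration/${testFile}`;
  }
}
```

A pre-write check in this style would compute the candidate path and only test for its existence, which is why the caveats about test quality still apply.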
Pattern semantics: the hook scripts use
bash globs, not extended globstar. In bash
patterns,
matches any sequence of characters including
; there is
no . So
matches both
apps/server/src/handler.ts
and
apps/server/src/auth/handler.ts
. Do not use
in
or
.
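The matching behavior described above can be modeled in a few lines. This is a sketch of bash `case`-style pattern semantics, not the hook's actual code, and the pattern `apps/server/src/*.ts` used in the usage note is a hypothetical example.

```typescript
// Converts a bash `case`-style pattern to a RegExp. Only `*` and `?` are
// handled; bracket expressions are out of scope for this sketch.
function bashPatternToRegExp(pattern: string): RegExp {
  const source = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return "[\\s\\S]*"; // any run of characters, "/" included
      if (ch === "?") return "[\\s\\S]"; // exactly one character
      return ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    })
    .join("");
  return new RegExp(`^${source}$`);
}
```

With the hypothetical pattern `apps/server/src/*.ts`, both `apps/server/src/handler.ts` and `apps/server/src/auth/handler.ts` match, because `*` is not stopped by the path separator.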
Enforcement caveats
The hook checks file-pair existence — it does not, and cannot, verify:
- That the test was written before the source (no Red-before-Green order check).
- That the test actually exercises the source (no semantic match).
- That the test currently passes (no test execution).
Semantic discipline (one behavior per test, factories over hardcoded data, descriptive names, AAA) still belongs to the agent. The hooks are guardrails, not a guarantee.
Manual install (until a script exists)
Per-harness mechanics differ — Claude Code and Codex run shell hooks directly; OpenCode runs a JS plugin that orchestrates the same shell scripts.
-
Copy
templates/tdd-guardrails.yml.tmpl
to
<project-root>/.tdd-guardrails.yml
and edit
and
to match the repo layout.
Then run the project-type detectors to populate
:
```bash
# Run every project-type detector against the current repo.
for type in .claude/skills/agile-tdd/templates/project-types/*/; do
  name="$(basename "$type")"
  [ -f "$type/detect.sh" ] || continue
  bash "$type/detect.sh" "$PWD" && echo "detected: $name"
done
```
For each detected type, append its
to the
newly-created
(or merge if the block already
exists) and add the type slug to the
list.
-
Copy the hook templates. From
to
<project-root>/.claude/hooks/
,
<project-root>/.codex/hooks/
,
and
<project-root>/.opencode/hooks/
(the OpenCode plugin invokes the
same scripts via
). Drop the
suffix and
.
Also copy the shared helpers:
templates/lib/audit-helpers.sh.tmpl → .claude/hooks/lib/audit-helpers.sh
and the project-types directory:
templates/project-types/ → .claude/hooks/project-types/
The session-audit script searches both locations. Codex/OpenCode
should mirror under
and
.
-
Register the hooks in each harness config:
- Claude Code (): add a entry matching calling
$CLAUDE_PROJECT_DIR/.claude/hooks/tdd-pre-write.sh
, a entry calling , and a entry calling . Merge with existing hooks (e.g. wiki-init) — do not replace.
- Codex (): add matching
apply_patch|Edit|Write|MultiEdit
, , and entries. Use bash "$(git rev-parse --show-toplevel)/.codex/hooks/<script>.sh"
as the command form (Codex pattern). Include field for each.
- OpenCode: copy
templates/opencode-plugin.js.tmpl
to <project-root>/.opencode/plugins/tdd-guardrails.js
. The plugin subscribes to (PreToolUse equivalent), (SessionStart equivalent), and (closest to Stop — the audit shell is idempotent via a tmp state file so multi-firing is safe). The plugin spawns the same scripts via . OpenCode does not invoke shell scripts directly; the plugin is the entry point.
-
Append the contents of
templates/agents-block.md.tmpl
to
and
so the agent is told the project has TDD enforcement.
For each detected project-type, also append its
templates/project-types/<type>/agents-block.partial.md
to the same
files (preserving the
<!-- agile-tdd:<type>:start --> / :end
markers so re-installs can update in place).
-
Register the type-specific matchers in
:
- For : a entry matching
mcp__tauri__webview_screenshot|mcp__tauri__webview_execute_js|mcp__tauri__webview_dom_snapshot|mcp__tauri__manage_window
calling $CLAUDE_PROJECT_DIR/.claude/hooks/tdd-record-mcp.sh
.
The Codex equivalent goes in
; for OpenCode,
subscribe to
filtered by tool name and spawn
the same shell via
.
The hooks are guardrails, not a guarantee — they check file-pair existence, not test quality. Semantic discipline (one behavior per test, factories over hardcoded data) still belongs to the agent.
Harness compatibility matrix
| Harness | Entry point | Pre-write event | Stop equivalent | Session start |
|---|---|---|---|---|
| Claude Code | hooks | (matcher ) | | |
| Codex | hooks | (matcher apply_patch\|Edit\|Write\|MultiEdit) | | |
| OpenCode | .opencode/plugins/tdd-guardrails.js JS plugin | | (closest available; audit is idempotent) | |
Bypassing intentionally
- For one path: add it to
.tdd-guardrails.yml → exemptions
.
- For one session: temporarily set (and revert before commit).
- Never delete the test file just to silence the hook — that defeats the point.
Project-type templates
Beyond the base companion-test rule,
ships
project-type
templates under
templates/project-types/<type>/
. Each template
opts into project-specific evidence that the Stop hook checks in
addition to companion tests. The session-audit script composes the
base check with one fragment per active type — modes are independent
per type, so you can run TDD in
and a stricter type in
.
Available types
| Type | Detects | Enforces (in addition to companion tests) |
|---|---|---|
| (single-repo or monorepo) | fresh + fresh + ≥1 mcp__tauri__webview_screenshot per affected app after the last edit |
| (planned) + service worker | service worker rebuilt + offline smoke test |
| (planned) (Expo) / / | platform build green + simulator screenshot |
| (planned) electron/forge config | electron-builder validate + window screenshot |
Active types are listed in
.tdd-guardrails.yml → project_types
. At
install, each
templates/project-types/<type>/detect.sh
runs against
the repo; types that succeed get appended to the config along with
their
block.
Anatomy of a type
A type is a folder with:
- — exits 0 when the type applies to the repo.
- — block appended to .
- — section appended to /
inside
<!-- agile-tdd:<type>:start -->
markers.
- — sourced by . Exports
check_<type>_evidence "$ROOT"
returning textual violations. Sets
(uppercase) so the framework can decide block vs warn.
- — optional extra hooks the type registers (e.g.
PostToolUse evidence recorder).
- — human description of the type.
The framework calls helpers from
templates/lib/audit-helpers.sh
for
yml parsing, glob matching, app-root discovery, and freshness checks.
Adding a new type does not require touching the framework script.
Installing types
The base
install already follows the "Manual install" steps
above. For each detected type:
- Append
templates/project-types/<type>/guardrails.partial.yml
to
(or merge if the block already exists).
- Append to / between
the marker pair.
- Copy to , , and
, dropping the suffix and -ing.
- Add the type's matcher to each harness config:
- Claude Code ( → ): add a
matcher entry for the type's PostToolUse hooks. For :
mcp__tauri__webview_screenshot|mcp__tauri__webview_execute_js|mcp__tauri__webview_dom_snapshot|mcp__tauri__manage_window
calling $CLAUDE_PROJECT_DIR/.claude/hooks/tdd-record-mcp.sh
.
- Codex (): same matcher pattern; command
uses
bash "$(git rev-parse --show-toplevel)/.codex/hooks/tdd-record-mcp.sh"
.
- OpenCode: extend
.opencode/plugins/tdd-guardrails.js
to
subscribe to with a filter on tool name,
spawning the same shell script via .
- Append the type slug (e.g. ) to in
. The session-audit script reads this list at run time.
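The "append between the marker pair" step can be sketched as an idempotent upsert. This is an illustration rather than a shipped installer: the function name is hypothetical, and the end marker is assumed to mirror the start marker with `end` in place of `start`.

```typescript
// Inserts or replaces the block for one project type inside a document.
// ASSUMPTION: end marker is `<!-- agile-tdd:<type>:end -->`.
function upsertMarkedBlock(doc: string, type: string, body: string): string {
  const start = `<!-- agile-tdd:${type}:start -->`;
  const end = `<!-- agile-tdd:${type}:end -->`;
  const block = `${start}\n${body.trim()}\n${end}`;

  const from = doc.indexOf(start);
  const to = from >= 0 ? doc.indexOf(end, from + start.length) : -1;
  if (from >= 0 && to > from) {
    // Re-install: replace the existing block in place.
    return doc.slice(0, from) + block + doc.slice(to + end.length);
  }

  // First install: append after the existing content, separated by a blank line.
  const base = doc.replace(/\s+$/, "");
  return (base === "" ? "" : base + "\n\n") + block + "\n";
}
```

Running it twice with different bodies leaves exactly one block for that type, which is what lets a re-install update the section without duplicating it.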
Tauri MCP validation (project type)
Know-how consolidated from real Tauri debug/test sessions. Read this
before touching
or any
src/{routes,components,hooks}/**
in a Tauri-detected repo.
Toolbox (canonical references)
mcp__tauri__driver_session(start)
— always invoke first; subsequent
webview tools assume an active session.
mcp__tauri__webview_screenshot
— the canonical "I opened the
screen" signal. Pass (Tauri apps usually have
+ ).
mcp__tauri__webview_execute_js
— for invokes that take longer than
the JS executor timeout (≈seconds), fire-and-forget via
window.__TAURI_INTERNALS__.invoke(cmd, args).then(r => window.__last = r)
;
poll DB or screenshot instead of -ing.
mcp__tauri__webview_interact
/ / refs in
— these depend on window.__MCP__.resolveRef
which is undefined after dev-server HMR reload. Prefer
document.querySelector(...).click()
via .
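The fire-and-forget advice above follows a general pattern: start the slow call, stash its result somewhere observable, and poll with a deadline instead of blocking on the call. The sketch below shows that pattern in isolation; `pollUntil`, `slowInvoke`, and `lastResult` are hypothetical stand-ins and do not touch the Tauri or MCP APIs.

```typescript
// Polls `check` until it returns a defined value or the deadline passes.
async function pollUntil<T>(
  check: () => T | undefined,
  timeoutMs: number,
  intervalMs: number,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = check();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) throw new Error("pollUntil: timed out");
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Stand-in for a slow command; a real session would call the webview's invoke.
function slowInvoke(): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("done"), 50));
}

// Fire-and-forget: start the call, stash the result, and return immediately.
const lastResult: { value?: string } = {};
void slowInvoke().then((r) => {
  lastResult.value = r;
});
```

The caller then polls the stashed value (or, in a real session, a DB row or screenshot) with an explicit timeout, so a stuck command surfaces as a clear error rather than a hung executor.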
Rebuild flow
- Rust change →
touch src-tauri/src/<file.rs>
forces to
rebuild (cargo's mtime watcher).
- Monitor the dev process output for +
MCP Bridge plugin initialized
before re-running MCP calls; HMR only covers the
frontend.
- In Claude Code, prefer with the filter
tail -f tauri.log | grep --line-buffered -E "Finished|error\\[|MCP Bridge plugin initialized"
.
Validation patterns
- DB direct read — cheaper than for state assertions:
sqlite3 <workspace>/<app>.db "SELECT ..."
.
- GPU offload check —
ps -p $(pgrep -x <app-name>) -o pcpu,etime
.
Apps using Metal/CUDA show low CPU (5-15%) when the GPU is doing the
work; CPU-only fallback shows 200-400%.
- Visual check —
location.assign('/route/...')
via
, wait ~500ms, then .
Known gotchas (cross-project patterns)
Each project keeps its own list of in-tree fixes (e.g. in a
wiki/technical/tauri-gotchas.md
for projects that follow the LLM
wiki pattern). The items below describe the
patterns — the actual
file paths vary per project.
-
whisper-rs 0.14.x type confusion — in
v0.14.4 the trampoline is instantiated with the original closure
type, while
is actually written as
. The callback dereferences the wrong
memory layout and returns garbage bools, aborting whisper at 0–5%
with error
. Workaround: use the
variant
+
set_abort_callback_user_data
with a
hand-written trampoline that matches the stored type:
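```rust
// Trampoline whose cast matches the type actually stored behind `ud`.
unsafe extern "C" fn abort_trampoline(ud: *mut std::ffi::c_void) -> bool {
    let f = &mut *(ud as *mut Box<dyn FnMut() -> bool>);
    f()
}

let closure: Box<dyn FnMut() -> bool> = Box::new({
    let cancel = cancel.clone();
    move || cancel.load(Ordering::Relaxed)
});
// Double-box so the fat trait-object pointer sits behind a thin pointer.
// The allocation is not freed in this snippet and must outlive the run.
let ud = Box::into_raw(Box::new(closure)) as *mut std::ffi::c_void;
unsafe {
    params.set_abort_callback(Some(abort_trampoline));
    params.set_abort_callback_user_data(ud);
}
```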
-
Background jobs stuck after binary kill — a tokio task
that dies mid-flight (cargo restart, app crash) never updates its
DB row to a terminal state. Add a reconcile pass at pool-open:
```rust
// db::open_pool, after schema migration
sqlx::query(
    "UPDATE jobs SET status='cancelled',
     error_message=COALESCE(error_message,'interrupted by app restart'),
     finished_at=?
     WHERE status IN ('running','pending')"
).bind(now_iso()).execute(&pool).await?;
```
-
Modal libraries that couple to data state (e.g. Base UI
Dialog, some Radix patterns) — when the data prop goes
in
the same render that
flips false, the modal unmounts before
the close animation finishes; pointer-events stay trapped on
and the page becomes unclickable. Decouple
(boolean) from the data, then defer the data clear with
(or use an
callback when
the library exposes one).
-
CREATE TABLE IF NOT EXISTS
is a no-op on existing tables —
reused workspaces / DBs do NOT pick up new columns added in the
schema file. Symptom: runtime
after upgrade. Fix
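options: explicit
ALTER TABLE … ADD COLUMN IF NOT EXISTS
per drift where the engine supports it (PostgreSQL does; SQLite does not, so guard a plain ADD COLUMN with a PRAGMA table_info check there); or, if the DB has no production data, drop and re-bootstrap.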
-
only decodes WAV — for
/
/
/
, pipe through ffmpeg to
before
feeding Whisper:
```rust
// Equivalent shell command:
//   ffmpeg -hide_banner -loglevel error -nostdin -i <input>
//          -ac 1 -ar 16000 -f f32le -
let mut child = Command::new(ffmpeg)
    .args(["-hide_banner", "-loglevel", "error", "-nostdin", "-i"])
    .arg(input)
    .args(["-ac", "1", "-ar", "16000", "-f", "f32le", "-"])
    .stdout(Stdio::piped())
    .spawn()?;
// read stdout, reinterpret bytes as &[f32]
```
Definition of Done (Tauri)
For each change that touches
in
every affected
app (monorepo: per-app):
- cargo check --manifest-path <app>/src-tauri/Cargo.toml --lib
green.
- (or ) green.
- showed after the last in
.
- Companion test exists (base TDD rule, if applicable).
- ≥1
mcp__tauri__webview_screenshot
call after the last edit
under .
- Post-operation state confirmed: DB query, DOM snapshot, or
evidence visible in the screenshot.
The
Stop hook checks (1), (2), and (5)
automatically via the
template. Steps (3), (4), and (6) remain
agent responsibility.
Relationship with the flow
```mermaid
flowchart LR
    A["/agile-story"] --> B[TDD cycle]
    B --> C[Red: failing test]
    C --> D[Green: minimum code]
    D --> E[Refactor]
    E --> F{More?}
    F -->|Yes| C
    F -->|No| G["/agile-refinement"]
```
This skill operates during execution. It pairs with
(which defines what to build) and feeds into
(which validates the result). When the optional enforcement is installed, the rule is also applied automatically at every
tool call.