Who you are: If … exists, read it — it defines your character.
No MCP? Use … instead of MCP tools.
# Tests — Write, Generate, Fix
## Orient First (Always)
Before doing anything, check what already exists:

```
helpmetest_status()
helpmetest_search_artifacts({ query: "" })
helpmetest_search_artifacts({ type: "Tasks" })
```
- Tests already failing? → that's the priority, not creating new ones
- Tasks artifact in progress? → resume it, don't start over
- Feature artifacts exist? → use them, don't re-discover
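If orientation surfaces an in-progress Tasks artifact, pull it up before creating anything new. A minimal sketch, with an illustrative id:

```
helpmetest_get_artifact({ id: "tasks-checkout-flow" })
```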
## Use Cases
"I need to build something" (TDD)
New feature, bug fix, or refactor. Tests come first — they define what "done" means.
1. Create a Tasks artifact to track the work:

```json
{
  "id": "tasks-[feature-name]",
  "type": "Tasks",
  "content": {
    "overview": "What this implements and why",
    "tasks": [
      { "id": "1.0", "title": "Write all tests first", "status": "pending", "priority": "critical" },
      { "id": "2.0", "title": "Implement to make tests pass", "status": "pending" },
      { "id": "3.0", "title": "All green — review for gaps", "status": "pending" }
    ]
  }
}
```
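As work proceeds, keep each task's status current in this artifact. Only "pending" appears above; the "in_progress" value in this fragment is an assumption, shown as a sketch:

```json
{ "id": "1.0", "title": "Write all tests first", "status": "in_progress", "priority": "critical" }
```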
2. Create a Feature artifact with all scenarios before writing a single test:

```json
{
  "id": "feature-[name]",
  "type": "Feature",
  "content": {
    "goal": "What this feature does",
    "functional": [
      { "name": "User can do X", "given": "...", "when": "...", "then": "...", "tags": ["priority:critical"], "test_ids": [] }
    ],
    "edge_cases": [],
    "bugs": []
  }
}
```
3. Write ALL tests — happy paths, edge cases, errors — before implementing anything. Failing tests are your spec.
4. Implement incrementally — pick the highest-priority failing test, make it pass, move to the next.
5. Done when all tests are green and you've reviewed for missing edge cases.
"Write tests for an existing feature"
Feature exists (or was just built by someone else). Your job is tests only.
1. Read the Feature artifact — `helpmetest_get_artifact({ id: "feature-X" })`. If none exists, create one first based on what you know.
2. Explore interactively before writing — run the scenario step by step using `helpmetest_run_interactive_command`. A test written after seeing real behavior uses real selectors and reflects actual timing. A test written from a description is a guess.

```
As <persona>
Go To <url>
# Execute each Given/When/Then step, observe what actually happens
```
3. Write tests for critical scenarios first, then high, then medium. For each:
   - 5+ meaningful steps
   - Verify business outcomes (data saved, state changed) — not just that an element is visible
   - Use `Create Fake Email` for any registration/email fields — never hardcode addresses
4. Link tests back — add each test ID to `test_ids` in the Feature artifact.
5. Validate each test before running. Validation catches tests that pass even when the feature is broken.
6. Run and fix — see "Fix broken tests" below if a newly-written test fails.
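Pulled together, a test for an email-verification scenario might look like the sketch below. The persona, URL, selectors, and the `Click` and `Check Text` step names are illustrative assumptions; `Create Fake Email`, `Fill Text`, and `Get Email Verification Code` are the keywords shown under "Emails" in this document:

```
As guest
Go To https://dev.local/register
${email}= Create Fake Email
Fill Text input[name=email] ${email}
Fill Text input[name=password] S3cret!pass
Click button[type=submit]
${code}= Get Email Verification Code ${email}
Fill Text input[name=code] ${code}
Click button[type=submit]
# Then: verify the business outcome, not just that a page rendered
Check Text .account-status Verified
```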
"Fix broken tests" / "Tests are failing"
#### First: understand the failure pattern

Check recent code changes:

```bash
git diff --stat HEAD
git log --oneline -5
```
Map changed files to likely causes:
- UI components, templates, styles → selector changes
- Auth/session code → auth state issues
- API routes, serializers → backend errors or changed response shapes
Then get test history:

```
helpmetest_status({ id: "test-id", testRunLimit: 10 })
```
Classify:
- Consistent failure after a code change → selector/behavior changed
- Intermittent PASS/FAIL with changing errors → isolation issue (shared state, test order dependency)
- Timeout / element not visible → timing issue
- Auth/session error → state not restored correctly
- Backend error in test output → real bug, not a test issue
#### Reproduce interactively — always do this before fixing

Run the failing steps one at a time:

```
As <persona>
Go To <url>
# Execute each step, observe what actually happens at the point of failure
```
For "element not found": list all elements of that type, try alternate selectors.
For "wrong value": check what's actually displayed vs what the test expected.
For timeouts: try longer waits, check whether the element ever appears.
#### Decide: test issue or app bug?
- Test issue (selector changed, timing, wrong expectation) → fix the test, validate the fix interactively before saving
- App bug (feature is actually broken) → document in `bugs`, update Feature.status to "broken" or "partial"
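For the common test-issue case of a changed selector, the fix is usually a one-line swap, verified interactively before saving. Both selectors here are illustrative, and `Click` is an assumed step name:

```
# Before: the class was renamed in a UI change, so this step fails
Click .btn-submit
# After: a selector confirmed interactively against the live page
Click button[type=submit]
```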
#### Many tests broke after a UI change?
Work through them systematically one by one. For each:
- Classify the failure (usually selector or timing)
- Reproduce interactively
- Fix
- Run to confirm
Don't shotgun-fix by guessing — one wrong fix creates two broken tests.
#### Tests are out of date after a refactor?
- Get the test list: `helpmetest_status()`
- For each failing test, check whether the Feature artifact scenario still matches intended behavior
- If the code is the source of truth → update the test
- If the test was right and the refactor broke behavior → document the regression
## Writing Tests
### Structure

```
As <persona> # auth state — always first
Go To <url>
# Given — establish precondition
<steps>
# When — perform the action
<steps>
# Then — verify the outcome
<assertions>
# Persistence check (if relevant)
Reload
<re-assert that state survived>
```
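A concrete instance of this skeleton for a profile-update scenario. The persona, URL, and selectors are illustrative, and `Click` and `Check Text` are assumed step names (`Fill Text` and `Reload` appear elsewhere in this document):

```
As registered_user
Go To https://dev.local/profile
# Given: the profile page shows the current email
# When: the user updates it
Fill Text input[name=email] updated@example.test
Click button[type=submit]
# Then: verify the outcome, not just element presence
Check Text .flash-message Profile updated
# Persistence check: the new value survives a reload
Reload
Check Text input[name=email] updated@example.test
```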
### What makes a good test
✅ Verifies a business outcome — data saved, filter applied, order created
✅ Would FAIL if the feature is broken
✅ 5+ meaningful steps
✅ Checks state change, not just that a button exists
❌ Just navigates to a page and counts elements
❌ Clicks something without checking what happened
❌ Passes when the feature is broken
### Test naming

- ✅ `User can update profile email`
- ✅ `Cart total updates when quantity changes`
- ❌ Vague names like `Test 1` or `Check page works`
### Auth

Capture auth state once as a persona. Reuse it with `As <persona>` in every test — never re-authenticate inside tests.
### Emails

Use `Create Fake Email` — never hardcode email addresses. Hardcoded emails break on second run.

```
${email}= Create Fake Email
Fill Text input[name=email] ${email}
${code}= Get Email Verification Code ${email}
```
### Localhost

If testing a local server, set up the proxy first:

```
helpmetest_proxy({ action: "start", domain: "dev.local", sourcePort: 3000 })
```

Verify it works before writing any tests. See the corresponding skill for details.
### Tags

```
priority:critical|high|medium|low
type:e2e|smoke|regression
```
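Applied to a Feature scenario, the two tag dimensions combine. This fragment is illustrative:

```json
{ "name": "User can update profile email", "tags": ["priority:high", "type:regression"], "test_ids": [] }
```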
### Done means

- ✅ All tests passing
- ✅ All scenarios have `test_ids` linked
- ✅ Bugs documented in `bugs`
- ✅ Feature.status updated (e.g. "partial" or "broken")
- ✅ Tasks artifact all done