groundwork-verify
Generates Cucumber.js step definitions that test the system as a black box through its external interface.
Step definitions live in
— a self-contained layer with its own
, independent of the project's stack.
CONSTRAINT — Epistemic perimeter
This skill NEVER reads files outside docs/specs/.
No access to src/, app/, lib/, or any application directory.
If information is missing, ASK the user — never infer it from code.
This is non-negotiable. The skill must have exactly the same information as the Gherkin — no more, no less.
Reading application code introduces cognitive bias: the skill learns patterns from the implementation (including bugs) and replicates them in step definitions. Tests end up validating the implementation instead of the contract.
The constraint prevents the bias vector — it doesn't matter if the dev knows the code. What matters is that the skill doesn't use it as source of truth.
Clarification: Information provided verbally by the dev (e.g. "the endpoint changed to POST /refunds") is legitimate input. You MAY update
with information the dev tells you directly — the constraint forbids reading application files, not listening to the dev. The interview mechanism works the same way: the dev describes the interface, you write it down.
When to use
- After implementation completes a feature and you need to verify it matches the spec
- User asks to generate tests from files
- User asks "does the code do what we specified?"
- New scenarios were added via and step definitions need updating
When NOT to use
- User wants to define or design a feature → use groundwork-discovery
- User wants unit tests or internal code tests — this skill is black-box only
- No files exist yet
Quick reference
| Command | Purpose |
|---|
/groundwork verify <feature>
| Generate/update step definitions for one feature |
| Generate/update step definitions for all features |
| Interface type | Tooling | Assertions via |
|---|
| / | | HTTP status, JSON body |
| | stdout, stderr, exit code |
| / | Playwright | Page elements, navigation |
First action
Read
docs/specs/CONVENTIONS.md
if it exists.
Then read existing step definitions in
docs/specs/step-definitions/
to understand the style already in use.
Do NOT read any file outside .
SPEC-INTERFACE.md — Interface contract
lives in
and is the
only source from which this skill gets information about the system under test.
It is written by the dev, not inferred from code. This is the critical point: describing the system explicitly breaks implicit bias — the dev reasons about the contract, not the implementation.
Structure
markdown
# Interface Contract
## Type
rest | web | rest+web | cli
## Base URLs
api: http://localhost:3000/api/v1
web: http://localhost:5173
## Auth
type: bearer | cookie | none
token_env: TEST_AUTH_TOKEN # env var name, not the value
## Endpoint map
# Format: METHOD /path → expected response (success | error)
POST /auth/login → 200 {token} | 401
GET /users/:id → 200 {user} | 404
POST /orders → 201 {order} | 422
## Web entry points
# Format: <feature> → <path>
login → /login
dashboard → /dashboard
checkout → /checkout
## Notes
# Any contextual information useful for tests, not inferable from scenarios
The Endpoint map section is the most important: it explicitly maps scenarios → interface, forcing the dev to declare the contract instead of letting the skill deduce it from code.
Architecture
Prerequisite
The system under test exposes a standard external interface. Step definitions never touch internal code — they interact exclusively through:
- REST/GraphQL — endpoints
- CLI — command-line process
- Web — browser UI
Fixed toolchain
Cucumber.js for all cases. Playwright added if any feature uses the
interface type.
docs/
specs/
package.json ← cucumber.js (+ playwright if needed)
cucumber.js ← cucumber config
CONVENTIONS.md
SPEC-INTERFACE.md ← interface contract (only source of system info)
features/
<domain>/<feature>.feature
step-definitions/
<domain>/<feature>.steps.js
support/
hooks.js ← Before/After setup and teardown
world.js ← shared state across steps
.env.test.example ← required env vars, no secrets
Interface → tooling map
| Interface type | HTTP client | Notes |
|---|
| | JSON request/response assertions |
| | spawn process, assert stdout/exit code |
| Playwright | headless browser, behavioural assertions |
A single feature can mix
and
steps if the interface type is
.
Command
/groundwork verify [feature-name]
Generate or update step definitions. Without argument, processes all
files.
Phase 1 — Interface discovery
SPEC-INTERFACE.md exists?
├── YES → read the file
│ ↓
│ all necessary information present?
│ ├── YES → proceed to Phase 2
│ └── NO → ask only the missing gaps, update the file
│
└── NO → enter interview mode
generate SPEC-INTERFACE.md at the end
proceed to Phase 2
Interview (only if the file doesn't exist or is incomplete)
Ask questions in order, one at a time:
- "Interface type: REST API, Web UI, or both?"
- "Base URL of the system in test environment?"
- "How does it authenticate? Bearer token, session cookie, or none?"
- "For each scenario in the , which endpoint or page is involved?"
Question 4 is the core anti-bias mechanism: the dev manually maps scenarios → interface, without the skill looking at code.
Incremental update
If
exists but has gaps (e.g. a new scenario references an unmapped endpoint):
- Ask only for the missing information
- Update the file with append, without touching existing sections
- Explicitly report what was added
Phase 2 — Gap analysis
Compare
files against existing step definitions in
docs/specs/step-definitions/
:
| Status | Action |
|---|
| Missing — scenario with no step definition | Generate |
| Broken — step definition referencing a changed endpoint or selector | Regenerate |
| Obsolete — step definition with no matching scenario | Report to user, do not delete |
Only generate what is missing or broken. Never rewrite working step definitions.
Report the gap analysis before generating:
Found 3 scenarios without step definitions:
- auth/login.feature: "login with expired token"
- payments/checkout.feature: "checkout with empty cart" (new)
- payments/checkout.feature: "duplicate order submission" (new)
Generating...
Phase 3 — Generate step definitions
Use information from
(base URLs, endpoints, auth type) to generate step definitions.
Never look at application code to determine URLs, selectors, or configuration.
REST pattern:
javascript
// Scenario: <scenario name from .feature>
Given('a registered user exists with email {string}', async function(email) {
// TODO: adjust to your seed/fixture mechanism
this.user = await createTestUser({ email });
});
When('they attempt login with incorrect credentials', async function() {
// URL from SPEC-INTERFACE.md endpoint map
this.response = await fetch(`${process.env.BASE_URL}/auth/login`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ email: this.user.email, password: 'wrong' })
});
});
Then('the operation fails with an authentication error', async function() {
assert.strictEqual(this.response.status, 401);
});
Web pattern:
javascript
// Scenario: <scenario name from .feature>
Given('a registered user exists', async function() {
// TODO: adjust to your seed/fixture mechanism
this.user = await createTestUser();
});
When('they attempt login with incorrect credentials', async function() {
// Path from SPEC-INTERFACE.md web entry points
await this.page.goto(`${process.env.WEB_URL}/login`);
await this.page.fill('[data-testid="login-email"]', this.user.email);
await this.page.fill('[data-testid="login-password"]', 'wrong');
await this.page.click('[data-testid="login-submit"]');
});
Then('the operation fails with an authentication error', async function() {
await expect(this.page.locator('[data-testid="login-error"]')).toBeVisible();
});
Selector naming convention: Use
data-testid="<page>-<element>"
(e.g.
,
). Choose names that describe the UI concept from the scenario, not the implementation. These selectors are declared in the step definitions first — the dev adds them to the markup after.
CLI pattern:
javascript
// Scenario: <scenario name from .feature>
When('they run the export command', async function() {
this.cliOutput = await exec(`node cli.js export --user=${this.user.id}`);
});
Then('the export succeeds', async function() {
assert.strictEqual(this.cliOutput.exitCode, 0);
});
Rules for all patterns:
- Add comment at the top referencing the origin
- Use URLs and paths from — comment the source (e.g.
// URL from SPEC-INTERFACE.md endpoint map
)
- Mark with anything requiring manual configuration (seed mechanisms, selectors)
- For scenarios, generate a stub with — do not skip silently
- For scenarios, generate nothing
Phase 4 — Setup (first run only)
- Generate :
json
{
"name": "specs",
"private": true,
"scripts": {
"test": "cucumber-js",
"test:tag": "cucumber-js --tags"
},
"dependencies": {
"@cucumber/cucumber": "^10.0.0",
"node-fetch": "^3.0.0"
}
}
Add
"@playwright/test": "^1.0.0"
and
if any
interface type exists.
- Generate :
javascript
module.exports = {
default: {
paths: ['features/**/*.feature'],
require: ['step-definitions/**/*.steps.js', 'support/**/*.js'],
format: ['progress-bar', 'html:reports/cucumber-report.html'],
parallel: 2
}
}
-
Generate
docs/specs/support/world.js
with shared state (page instance for Playwright, auth tokens, last response)
-
Generate
docs/specs/support/hooks.js
with
/
lifecycle hooks for setup and teardown
-
Generate
docs/specs/.env.test.example
based on
:
BASE_URL=http://localhost:3000
# Add other required env vars from SPEC-INTERFACE.md here
- Add to the project root (if it exists):
json
"test:bdd": "cd docs/specs && npm test"
- Add to
Drift handling
can become stale if the API changes without updating the file. The skill cannot detect this automatically — that would require reading application code, violating the constraint.
Mitigation strategy: CI failure is the drift signal.
CI red for 404 / element not found
→ dev updates SPEC-INTERFACE.md
→ /groundwork verify <feature> (regenerates step definitions)
→ CI green
Drift is detected at runtime, not compile time. This is an acceptable trade-off: the cost of a false negative in CI is much lower than the cost of tests that validate bugs.
Output summary
After generation, always report:
Generated:
docs/specs/step-definitions/auth/login.steps.js (3 scenarios)
docs/specs/step-definitions/payments/checkout.steps.js (2 scenarios)
Stubs (pending):
docs/specs/step-definitions/auth/login.steps.js → "login with biometric" (@pending)
Skipped:
payments/checkout.feature → "guest checkout" (@out-of-scope)
Obsolete (no matching scenario — not deleted):
docs/specs/step-definitions/auth/register.steps.js → "register with invite code"
Next steps:
1. Copy docs/specs/.env.test.example → docs/specs/.env.test and fill in values
2. Review // TODO: comments in generated files
3. Add the required selectors (see prompt below)
4. cd docs/specs && npm install && npm test
Selector prompt (only for / features)
After the output summary, if any step definition uses Playwright selectors, generate a
ready-to-copy prompt listing every
the step definitions expect. The dev pastes this prompt into their coding agent to add the selectors to the markup.
Format:
---
The following data-testid attributes are required by the BDD step definitions.
Add them to the corresponding UI elements in the application code.
Page: /login
- data-testid="login-email" → email input field
- data-testid="login-password" → password input field
- data-testid="login-submit" → submit button
- data-testid="login-error" → error message container (visible on auth failure)
Page: /checkout
- data-testid="checkout-card" → credit card input
- data-testid="checkout-submit" → place order button
- data-testid="checkout-success" → order confirmation message
---
Rules for the selector prompt:
- Group by page (from web entry points)
- Each line: the exact value + a short description of the expected element
- Include selectors for assertions too (error messages, success states, content checks)
- Only list selectors that appear in the generated step definitions — no extras
Common mistakes
| Mistake | Fix |
|---|
| Reading application code (src/, app/, lib/) to determine URLs or config | NEVER. Use only . If info is missing, ask the user. |
| Rewriting working step definitions | Always run gap analysis first — only generate what is missing or broken |
| Assuming the interface type without checking | Read the file or ask the user — never infer from code |
| Importing or calling internal project code in step definitions | Step definitions are black-box — interact only through the external interface |
| Skipping scenarios silently | Generate a stub with so they show up in test reports |
| Deleting obsolete step definitions without asking | Report obsolete definitions to the user — never delete automatically |
| Forgetting markers on config-dependent code | Mark seed mechanisms, selectors, and URLs that need manual adjustment |
| Inferring endpoints from code when has gaps | Ask the user to update — never fill gaps from code |