spec-kitty-checklist

Checklist Purpose: "Unit Tests for English"

CRITICAL CONCEPT: Checklists are UNIT TESTS FOR REQUIREMENTS WRITING - they validate the quality, clarity, and completeness of requirements in a given domain.

NOT for verification/testing:

❌ NOT "Verify the button clicks correctly"
❌ NOT "Test error handling works"
❌ NOT "Confirm the API returns 200"
❌ NOT checking if code/implementation matches the spec

FOR requirements quality validation:

✅ "Are visual hierarchy requirements defined for all card types?" (completeness)
✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity)
✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
✅ "Does the spec define what happens when logo image fails to load?" (edge cases)

Metaphor: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.

User Input

text

$ARGUMENTS

You MUST consider the user input before proceeding (if not empty).

Execution Steps

Setup: Run
```
spec-kitty agent feature check-prerequisites --json
```
from repo root and parse JSON for feature_dir and available_docs list.
- All file paths must be absolute.
Clarify intent (dynamic): Derive up to THREE initial contextual clarifying questions (no pre-baked catalog). They MUST:
- Be generated from the user's phrasing + extracted signals from spec/plan/tasks
- Only ask about information that materially changes checklist content
- Be skipped individually if already unambiguous in
```
$ARGUMENTS
```
- Prefer precision over breadth
Generation algorithm:
1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts").
2. Cluster signals into candidate focus areas (max 4) ranked by relevance.
3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit.
4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria.
5. Formulate questions chosen from these archetypes:
  - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?")
  - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?")
  - Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?")
  - Audience framing (e.g., "Will this be used by the author only or peers during PR review?")
  - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?")
  - Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?")
Question formatting rules:
- If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters
- Limit to A–E options maximum; omit table if a free-form answer is clearer
- Never ask the user to restate what they already said
- Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope."
Defaults when interaction impossible:
- Depth: Standard
- Audience: Reviewer (PR) if code-related; Author otherwise
- Focus: Top 2 relevance clusters
Output the questions (label Q1/Q2/Q3). After answers: if ≥2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more.
Understand user request: Combine
```
$ARGUMENTS
```
+ clarifying answers:
- Derive checklist theme (e.g., security, review, deploy, ux)
- Consolidate explicit must-have items mentioned by user
- Map focus selections to category scaffolding
- Infer any missing context from spec/plan/tasks (do NOT hallucinate)
Load feature context: Read from feature_dir:
- spec.md: Feature requirements and scope
- plan.md (if exists): Technical details, dependencies
- tasks.md (if exists): Implementation tasks
Context Loading Strategy:
- Load only necessary portions relevant to active focus areas (avoid full-file dumping)
- Prefer summarizing long sections into concise scenario/requirement bullets
- Use progressive disclosure: add follow-on retrieval only if gaps detected
- If source docs are large, generate interim summary items instead of embedding raw text
Generate checklist - Create "Unit Tests for Requirements":
- Create
```
feature_dir/checklists/
```
  directory if it doesn't exist
- Generate unique checklist filename:
  - Use short, descriptive name based on domain (e.g.,
    ux.md
    ,
    api.md
    ,
    security.md
    )
  - Format:
    [domain].md
  - If file exists, append to existing file
- Number items sequentially starting from CHK001
- Each
```
/spec-kitty.checklist
```
  run creates a NEW file (never overwrites existing checklists)
CORE PRINCIPLE - Test the Requirements, Not the Implementation: Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for:
- Completeness: Are all necessary requirements present?
- Clarity: Are requirements unambiguous and specific?
- Consistency: Do requirements align with each other?
- Measurability: Can requirements be objectively verified?
- Coverage: Are all scenarios/edge cases addressed?
Category Structure - Group items by requirement quality dimensions:
- Requirement Completeness (Are all necessary requirements documented?)
- Requirement Clarity (Are requirements specific and unambiguous?)
- Requirement Consistency (Do requirements align without conflicts?)
- Acceptance Criteria Quality (Are success criteria measurable?)
- Scenario Coverage (Are all flows/cases addressed?)
- Edge Case Coverage (Are boundary conditions defined?)
- Non-Functional Requirements (Performance, Security, Accessibility, etc. - are they specified?)
- Dependencies & Assumptions (Are they documented and validated?)
- Ambiguities & Conflicts (What needs clarification?)
HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English":

❌ WRONG (Testing implementation):
- "Verify landing page displays 3 episode cards"
- "Test hover states work on desktop"
- "Confirm logo click navigates home"
✅ CORRECT (Testing requirements quality):
- "Are the exact number and layout of featured episodes specified?" [Completeness]
- "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity]
- "Are hover state requirements consistent across all interactive elements?" [Consistency]
- "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
- "Is the fallback behavior specified when logo image fails to load?" [Edge Cases]
- "Are loading states defined for asynchronous episode data?" [Completeness]
- "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
ITEM STRUCTURE: Each item should follow this pattern:
- Question format asking about requirement quality
- Focus on what's WRITTEN (or not written) in the spec/plan
- Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.]
- Reference spec section
```
[Spec §X.Y]
```
  when checking existing requirements
- Use
```
[Gap]
```
  marker when checking for missing requirements
EXAMPLES BY QUALITY DIMENSION:

Completeness:
- "Are error handling requirements defined for all API failure modes? [Gap]"
- "Are accessibility requirements specified for all interactive elements? [Completeness]"
- "Are mobile breakpoint requirements defined for responsive layouts? [Gap]"
Clarity:
- "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]"
- "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
- "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]"
Consistency:
- "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]"
- "Are card component requirements consistent between landing and detail pages? [Consistency]"
Coverage:
- "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]"
- "Are concurrent user interaction scenarios addressed? [Coverage, Gap]"
- "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]"
Measurability:
- "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
- "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"
Scenario Classification & Coverage (Requirements Quality Focus):
- Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
- For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
- If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]"
- Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]"
Traceability Requirements:
- MINIMUM: ≥80% of items MUST include at least one traceability reference
- Each item should reference: spec section
```
[Spec §X.Y]
```
  , or use markers:
```
[Gap]
```
  ,
```
[Ambiguity]
```
  ,
```
[Conflict]
```
  ,
```
[Assumption]
```
- If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
Surface & Resolve Issues (Requirements Quality Problems): Ask questions about the requirements themselves:
- Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]"
- Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]"
- Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
- Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
- Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
Content Consolidation:
- Soft cap: If raw candidate items > 40, prioritize by risk/impact
- Merge near-duplicates checking the same requirement aspect
- If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]"
🚫 ABSOLUTELY PROHIBITED - These make it an implementation test, not a requirements test:
- ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior
- ❌ References to code execution, user actions, system behavior
- ❌ "Displays correctly", "works properly", "functions as expected"
- ❌ "Click", "navigate", "render", "load", "execute"
- ❌ Test cases, test plans, QA procedures
- ❌ Implementation details (frameworks, APIs, algorithms)
✅ REQUIRED PATTERNS - These test requirements quality:
- ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
- ✅ "Is [vague term] quantified/clarified with specific criteria?"
- ✅ "Are requirements consistent between [section A] and [section B]?"
- ✅ "Can [requirement] be objectively measured/verified?"
- ✅ "Are [edge cases/scenarios] addressed in requirements?"
- ✅ "Does the spec define [missing aspect]?"
Structure Reference: Generate the checklist following the canonical template in
```
.kittify/templates/checklist-template.md
```
for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines,
```
##
```
category sections containing
```
- [ ] CHK### <requirement item>
```
lines with globally incrementing IDs starting at CHK001.
Report: Output full path to created checklist, item count, and remind user that each run creates a new file. Summarize:
- Focus areas selected
- Depth level
- Actor/timing
- Any explicit user-specified must-have items incorporated

Important: Each

/spec-kitty.checklist

command invocation creates a checklist file using short, descriptive names unless file already exists. This allows:

Multiple checklists of different types (e.g.,
```
ux.md
```
,
```
test.md
```
,
```
security.md
```
)
Simple, memorable filenames that indicate checklist purpose
Easy identification and navigation in the
```
checklists/
```
folder

To avoid clutter, use descriptive types and clean up obsolete checklists when done.

Example Checklist Types & Sample Items

UX Requirements Quality:

ux.md

Sample items (testing the requirements, NOT the implementation):

"Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]"
"Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]"
"Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]"
"Are accessibility requirements specified for all interactive elements? [Coverage, Gap]"
"Is fallback behavior defined when images fail to load? [Edge Case, Gap]"
"Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]"

API Requirements Quality:

api.md

Sample items:

"Are error response formats specified for all failure scenarios? [Completeness]"
"Are rate limiting requirements quantified with specific thresholds? [Clarity]"
"Are authentication requirements consistent across all endpoints? [Consistency]"
"Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]"
"Is versioning strategy documented in requirements? [Gap]"

Performance Requirements Quality:

performance.md

Sample items:

"Are performance requirements quantified with specific metrics? [Clarity]"
"Are performance targets defined for all critical user journeys? [Coverage]"
"Are performance requirements under different load conditions specified? [Completeness]"
"Can performance requirements be objectively measured? [Measurability]"
"Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]"

Security Requirements Quality:

security.md

Sample items:

"Are authentication requirements specified for all protected resources? [Coverage]"
"Are data protection requirements defined for sensitive information? [Completeness]"
"Is the threat model documented and requirements aligned to it? [Traceability]"
"Are security requirements consistent with compliance obligations? [Consistency]"
"Are security failure/breach response requirements defined? [Gap, Exception Flow]"

Anti-Examples: What NOT To Do

❌ WRONG - These test implementation, not requirements:

markdown

- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001]
- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003]
- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010]
- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005]

✅ CORRECT - These test requirements quality:

markdown

- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001]
- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003]
- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010]
- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005]
- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]

Key Differences:

Wrong: Tests if the system works correctly
Correct: Tests if the requirements are written correctly
Wrong: Verification of behavior
Correct: Validation of requirement quality
Wrong: "Does it do X?"
Correct: "Is X clearly specified?"

spec-kitty-checklist

NPX Install

Tags

SKILL.md Content

Checklist Purpose: "Unit Tests for English"

User Input

Execution Steps

Example Checklist Types & Sample Items

Anti-Examples: What NOT To Do