<intake>
Create, Audit, or Consolidate Skills
Create agent skills following the Agent Skills open standard.
What do you need to do?
1. Audit an existing skill: review, improve, or debug a SKILL.md
2. Create a new skill: interview, draft, and review from scratch
3. Consolidate skills: merge multiple skills into fewer
Wait for response before proceeding.
</intake>
<routing>
| Response | Workflow |
|---|---|
| 1, "audit", "review", "check", "fix", "improve" | Audit Workflow (Steps 1–4 in this file) |
| 2, "create", "write", "build", "new", "draft" | Phases 1–5 (Interview → Draft → Description → Scripts → Review) in this file |
| 3, "consolidate", "merge", "combine" | `references/consolidation-guide.md`, then return to Phase 5 for the final checklist |
</routing>
Audit Workflow
Use this workflow when reviewing, improving, or debugging an existing skill.
Step 1: Locate and read the skill
Read the full SKILL.md and list all files in the skill directory, including its subdirectories.
Step 2: Run the audit checklist
Check each category. Note issues as you go.
Frontmatter:
Structure:
Content quality:
Router pattern (if applicable):
Scripts (if present):
Read `references/anti-patterns.md` for the full catalog of common failures.
Step 3: Generate the report
Present findings grouped by severity:
- Critical — skill won't trigger or produces wrong output
- Important — structural issues, missing files, spec violations
- Minor — style, conciseness, optimization opportunities
For each finding, state the issue, cite the specific line or section, and recommend a fix.
Step 4: Offer fixes
Ask the user which findings to fix. Apply changes surgically — don't rewrite sections that aren't broken. Run the Phase 5 review checklist on the modified skill before finishing.
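The report format from Step 3 can be sketched in code. This is a minimal illustration, not part of the skill itself: the `Finding` class and `render_report` function are hypothetical names, and only the three severity levels and the issue/location/fix fields come from the text above.

```python
from dataclasses import dataclass

# Severity groups from Step 3, in the order they should be presented.
SEVERITIES = ("Critical", "Important", "Minor")


@dataclass
class Finding:
    severity: str  # one of SEVERITIES
    issue: str     # what is wrong
    location: str  # the specific line or section
    fix: str       # the recommended change


def render_report(findings: list[Finding]) -> str:
    """Group findings by severity and render them as plain text."""
    lines = []
    for severity in SEVERITIES:
        group = [f for f in findings if f.severity == severity]
        if not group:
            continue  # omit empty severity groups
        lines.append(f"{severity} ({len(group)})")
        for f in group:
            lines.append(f"- {f.issue} [{f.location}] -> {f.fix}")
    return "\n".join(lines)
```

Keeping the location and fix on every finding makes it easy for the user to pick which ones to apply in Step 4.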
Phase 1: Interview
Interview the user about every aspect of this skill until reaching shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
Interview cadence
Ask one question at a time. Wait for the answer before asking the next. Adapt follow-ups based on what you learn. Each question should provide clear benefit toward building a better skill — cut questions the codebase can answer for you.
If a question can be answered by exploring the codebase, explore the codebase instead of asking.
Focus areas, roughly in order:
- Purpose and audience. What task does this skill cover? What specific problem does it solve? What does the user do today without it?
- Scope boundaries. What should this skill NOT do? What adjacent tasks belong to other skills?
- Input/output. What does the user provide? What does the skill produce? Specific formats?
- Edge cases. What goes wrong? Common mistakes? Gotchas for new users?
- Success criteria. How do you know the skill worked correctly?
- What can be scripted? Look for deterministic operations that should be code, not LLM instructions. Scripts are cheaper, faster, and more reliable.
- References needed? Domain knowledge too large for SKILL.md that should live in separate files?
- Existing patterns. Similar skills or workflows to draw from? Check the codebase.
- Platform constraints. macOS, Windows, and Linux? Scripts must handle path separators, temp directories, and shell differences.
- External services and APIs. Does the skill call external APIs or services? If yes, read `references/api-skill-patterns.md`. It covers credential handling, schema discovery, instance-specific values, and error placement.
Architecture decision tree
After the interview questions above, decide the architecture. Most skills are simple — only escalate when the answers demand it.
Question 1: How many distinct things can a user want to do?
- One specific thing → Simple skill (single SKILL.md, under 200 lines)
- Multiple things with shared principles → continue to Q2
Question 2: Is there shared domain knowledge across those operations?
- No, each operation is self-contained → Simple skill (or multiple separate simple skills)
- Yes, multiple operations share knowledge → Router skill (SKILL.md plus reference files)
Question 3: Does it cover a full lifecycle (build, debug, test, ship)?
- No → Router skill is sufficient
- Yes → Domain expertise skill (exhaustive references, full lifecycle workflows)
| What you're building | Pattern |
|---|---|
| "A skill that commits with a conventional message" | Simple |
| "A skill that manages PRs — create, review, merge, close" | Router |
| "A skill for building and shipping macOS apps" | Domain expertise |
| "A skill that audits other skills" | Simple (upgrade to Router if it grows) |
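The three questions above reduce to a small decision function. The sketch below is only an illustration of that logic; `choose_pattern` and its boolean parameters are hypothetical names, and the real decision still depends on the interview answers.

```python
def choose_pattern(multiple_operations: bool,
                   shared_knowledge: bool,
                   full_lifecycle: bool) -> str:
    """Map the three architecture questions to a skill pattern."""
    # Question 1: one specific thing means a simple skill.
    if not multiple_operations:
        return "Simple"
    # Question 2: self-contained operations stay simple
    # (possibly as several separate simple skills).
    if not shared_knowledge:
        return "Simple"
    # Question 3: a full lifecycle escalates to domain expertise.
    return "Domain expertise" if full_lifecycle else "Router"
```

For example, a PR-management skill with create, review, merge, and close operations that share knowledge but do not span a full lifecycle comes out as `"Router"`.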
For Router and Domain expertise patterns, also ask:
- Does the skill need project-level context? If every command needs the same background, design a context file pattern with a loader script.
- Are there mandatory setup gates? Steps that must pass before any work begins. Gates prevent generic output.
- Does behavior vary by task type? If so, design a register/mode system that classifies the task first, then loads different references.
Read `references/architecture-patterns.md` for implementation details of each pattern.
Consolidation signal check: If the interview reveals the new skill overlaps significantly with existing skills (shared scripts, cross-references, linear pipeline), consider consolidating instead of creating. Read `references/consolidation-guide.md` for the signals and workflow.
Do not proceed to Phase 2 until the user confirms the scope is complete.
Phase 2: Draft the SKILL.md
Write the skill following the spec. Read the full format reference before drafting.
Starter templates: Use `templates/simple-skill.md` for single-purpose skills, `templates/router-skill.md` for multi-command skills using markdown headings, or `templates/router-skill-xml.md` for multi-command skills using XML structure. Copy the template as a starting point, then customize.
Frontmatter
```yaml
---
name: skill-name  # lowercase, hyphens, max 64 chars
description: |  # max 1024 chars; this is the ONLY triggering mechanism
  What the skill does. Use when [specific triggers].
  Also use when [additional triggers].
---
```
The description must be slightly "pushy" — agents tend to undertrigger. Include both what the skill does AND specific phrases/contexts that should activate it.
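The frontmatter limits above are deterministic, so they can be checked by a script. The following is a minimal sketch under stated assumptions: `check_frontmatter` is a hypothetical helper, and the name pattern assumes digits are allowed alongside lowercase letters and hyphens, which the text above does not confirm.

```python
import re

# Assumption: lowercase alphanumeric words joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

MAX_NAME_CHARS = 64          # limit stated in the frontmatter comments
MAX_DESCRIPTION_CHARS = 1024  # limit stated in the frontmatter comments


def check_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means the fields pass."""
    problems = []
    if len(name) > MAX_NAME_CHARS:
        problems.append("name exceeds 64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be lowercase words separated by hyphens")
    if not description.strip():
        problems.append("description is empty")
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append("description exceeds 1024 characters")
    return problems
```

A check like this covers only the mechanical limits. Whether the description is "pushy" enough to trigger still needs human or agent judgment.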
Body structure
Follow progressive disclosure — three loading levels:
- Metadata (~100 tokens): the frontmatter fields, loaded at startup for all skills
- Instructions (< 500 lines): Full SKILL.md body loaded when skill activates
- Resources (as needed): supporting files, loaded only when required
Keep the SKILL.md body under 500 lines. If approaching this limit, split domain-specific content into reference files with clear pointers about when to read them.
Deduplication check
Before writing domain knowledge into a new reference file, check if it already exists in another reference. Shared data (exit criteria, field mappings, workflow rules) must live in exactly one file. New references should point to the existing source — not embed a copy.
Common trap: a new sub-command reference duplicates tables from an existing reference because it "needs them for context." Instead, add a one-line pointer such as "Load [the existing reference] for exit criteria per status."
Exception: intentional duplication. When two sub-commands need the same query pattern but referencing each other would create a transitive loading chain (A → B → C), duplicate the pattern and add a note: "Same query pattern as X.md Step N — duplicated here to avoid transitive loading." This is cheaper than forcing the agent to load an unrelated file.
Writing patterns
- Imperative form: "Run the command" not "You should run the command"
- Explain WHY, not just what: Avoid rigid ALWAYS/NEVER rules without reasoning. Agents generalize from principles better than from rigid rules. Instead of "ALWAYS use pdfplumber. NEVER use PyPDF2," write "Use pdfplumber over PyPDF2 — it handles malformed PDFs more gracefully and preserves layout metadata needed for table extraction." Principles adapt to edge cases; rigid rules break.
- Don't explain what the agent already knows: Skip basic programming concepts, standard library usage, and well-known tool behavior. Only add context the agent doesn't have — project-specific conventions, non-obvious behavior, domain-specific gotchas. A 30-token code example beats a 150-token explanation of what a library is.
- Output templates: Define exact formats when the output structure matters
- Concrete examples: Show input → output for non-obvious workflows
- Gotchas sections: Common mistakes the agent should avoid
- Checklists: Multi-step workflows with validation gates
- Conditional loading: "Read [specific reference file] if the API returns a non-200 status code", not "see references/ for details"
- Absolute bans: When certain patterns are always wrong, use match-and-refuse lists. "If you're about to write X, stop and do Y instead." More effective than vague "be careful" guidance.
- Avoid hardcoded thresholds: Don't write arbitrary numbers as rules (e.g., "when you have 3+ sub-commands" or "if more than 5 issues") unless the threshold comes from a real constraint (API limit, spec requirement). Instead, describe the signal that triggers the behavior (e.g., "when you're copying the same text into another sub-command"). Hardcoded numbers feel authoritative but are usually guesses that don't generalize.
Read
references/anti-patterns.md
during drafting to avoid known pitfalls.
XML structure (router and domain expertise skills)
Agents parse XML tags more reliably than markdown headings when a skill has semantically distinct sections (principles, intake, routing, references). XML tags create unambiguous containers; markdown headings blend together in long prompts.
Read
references/xml-structure-guide.md
for suggested patterns and anti-patterns.
When XML helps:
- Skills with an intake question + routing table + essential principles
- Skills where an agent needs to quickly locate a specific section
- Skills with inline workflows that need clear start/end boundaries
When markdown is enough:
- Simple skills with a single linear workflow
- Sequential instructional content (phases, steps) where order matters more than section lookup
Sub-command router (when applicable)
For skills with multiple distinct operations, use a router table in SKILL.md.
```xml
<intake>
## What would you like to do?
1. **Craft a feature**: Build end-to-end
2. **Audit code**: Technical quality checks
**Wait for response before proceeding.**
</intake>
<routing>
| Response | Workflow |
|----------|----------|
| 1, "craft", "build" | `references/craft.md` |
| 2, "audit", "check" | `references/audit.md` |
</routing>
```
Back the router with a `scripts/command-metadata.json` as the single source of truth:
```json
{
  "craft": {
    "description": "Full build flow. Use when building a new feature end-to-end.",
    "argumentHint": "[feature description]"
  }
}
```
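One way to treat the metadata file as a single source of truth is to generate any command listing from it rather than maintaining a second copy by hand. The sketch below assumes the JSON shape shown above; `routing_rows` is a hypothetical helper, and the three-column layout is an illustrative choice.

```python
import json


def routing_rows(metadata_json: str) -> list[str]:
    """Build markdown table rows from command metadata JSON text."""
    commands = json.loads(metadata_json)
    rows = ["| Command | Description | Argument hint |", "|---|---|---|"]
    for name, meta in commands.items():
        # argumentHint is treated as optional in this sketch.
        hint = meta.get("argumentHint", "")
        rows.append(f"| {name} | {meta['description']} | {hint} |")
    return rows
```

Passing the contents of the metadata file to `routing_rows` yields one row per command, so a description only ever needs to be edited in the JSON.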
Setup gates (when applicable)
Non-negotiable checks before any file edits. Gates prevent generic output from missing context.
```markdown
## Setup (non-optional)
| Gate | Required check | If it fails |
|---|---|---|
| Context | Project config loaded via `python scripts/load_context.py` | Run the loader first |
| Config | Config file exists and is valid | Run `skill-name setup` |
| Command | Sub-command reference is loaded | Load the reference |
| Mutation | All gates above pass | Do not edit project files |
```
Register/mode system (when applicable)
When behavior varies by task type, classify first, then load different references:
```markdown
## Register
Every task is **library** (published, API-stable) or **application** (internal, can break).
Identify before acting. Load the matching reference: [references/library.md] or [references/application.md].
```
Capability-gating
Steps that depend on optional environment capabilities (browser automation, specific CLI tools) must degrade gracefully:
```markdown
### Automated Scan (Capability-Gated)
Run the automated scanner when ALL of these are true:
- The target files exist and are readable
- The required CLI tool is installed
If unavailable, state in one line that the step is skipped and why. Do not ask the user to install tooling.
```
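If the capability check itself lives in a script, it might look like the sketch below. `scan_gate` is a hypothetical helper, and the tool name passed to it is whatever CLI the skill depends on. The function returns a flag plus the one-line reason the agent can repeat to the user.

```python
import os
import shutil
from pathlib import Path


def scan_gate(target_files: list[str], tool: str) -> tuple[bool, str]:
    """Return (should_run, one_line_reason) for a capability-gated step."""
    for target in target_files:
        path = Path(target)
        # Gate 1: every target file must exist and be readable.
        if not path.is_file() or not os.access(path, os.R_OK):
            return False, f"Automated scan skipped: {target} is missing or unreadable."
    # Gate 2: the required CLI tool must be on PATH.
    if shutil.which(tool) is None:
        return False, f"Automated scan skipped: {tool} is not installed."
    return True, "Automated scan will run."
```

The function only reports the skip reason and never tries to install anything, which matches the rule of not asking the user to install tooling.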
Structured artifacts as handoffs
When one command produces output that another consumes, define the artifact structure explicitly. The producing command's reference defines the format; the consuming command's reference says what it expects:
```markdown
### Plan Structure
**1. Summary** (2-3 sentences)
**2. Primary Goal**
**3. Approach**
...
```
Self-critique loops
For build/implementation commands, mandate inspect-and-fix passes with explicit exit bars:
```markdown
### Critique and fix loop
After the first pass, write a short self-critique and patch. Repeat until no material issues remain:
1. Does it match the requirements?
2. Does it pass the [quality test]?
3. Check every expected scenario.
4. Check edge cases.
The exit bar is not "it works." It is: [explicit quality threshold].
```
Phase 3: Description Optimization
The description is the only thing agents see at startup. Read `references/description-guide.md` for the full optimization process.
Quick validation:
- Write 5 should-trigger queries (different phrasings, including ones that don't name the skill directly)
- Write 5 should-not-trigger queries (near-misses that share keywords but need different skills)
- Check: would the description correctly distinguish these?
- Revise if needed — broaden for missed triggers, narrow for false triggers
- Verify under 1024 characters
For skills with sub-commands, the main description covers the skill broadly. Each sub-command's description is optimized separately for auto-trigger keyword matching.
Phase 4: Scripts
Read `references/scripts-guide.md` for the full guide.
Bias toward scripts. Every deterministic operation should be a script, not an instruction. Scripts are cheaper (no LLM tokens), faster (no reasoning), and more reliable (no hallucination).
For each piece of the skill's workflow, ask: "Could a script do this?" If yes, write the script.
Should be scripts:
- Validation (input format, required fields, schema compliance)
- File generation from templates
- Data extraction and transformation
- API calls with structured responses
- Setup and environment checks
- Output formatting
- Context loading (read project files, resolve paths, return JSON)
- Pin/unpin shortcuts (create/remove command aliases)
- Cleanup (remove deprecated files after skill updates)
Should stay as instructions:
- Deciding between architectural approaches
- Reviewing code for quality or style
- Explaining tradeoffs to the user
- Creative writing or design decisions
- Interview/discovery conversations
Key patterns:
- Python without dependencies: stdlib only, including the standard library's CLI parsing
- Python with dependencies: PEP 723 inline script metadata
- All scripts: Structured output (JSON when piped), clear exit codes, descriptive
Context loader pattern
For skills that need project-level context, write a loader script.
The script should follow all standard patterns: CLI argument parsing with help text, structured JSON output (pretty when interactive, compact when piped), clear exit codes (0 = found, 1 = missing), cross-platform path handling, and stdlib-only imports. See the "Context File System" section in `references/architecture-patterns.md` for a skeleton.
The SKILL.md references it: "Load context via `python scripts/load_context.py`. Consume the full JSON output. Never pipe it through commands that truncate or filter it."
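A loader along these lines might look like the sketch below. It is not the skeleton from `references/architecture-patterns.md`: the context file name `skill-context.json`, the `--root` flag, and the output keys are all assumptions made for illustration, and the choice of `argparse` and `pathlib` is simply the usual stdlib approach.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical context file name; a real skill would define its own.
CONTEXT_FILE = "skill-context.json"


def load_context(project_root: Path) -> Optional[dict]:
    """Read the context file under project_root, or return None if absent."""
    path = project_root / CONTEXT_FILE
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def format_output(payload: dict, interactive: bool) -> str:
    """Pretty JSON for a terminal, compact JSON when piped."""
    if interactive:
        return json.dumps(payload, indent=2)
    return json.dumps(payload, separators=(",", ":"))


def main(argv: Optional[list] = None) -> int:
    parser = argparse.ArgumentParser(description="Load project context for the skill.")
    parser.add_argument("--root", type=Path, default=Path.cwd(),
                        help="project root to search (hypothetical flag)")
    args = parser.parse_args(argv)

    context = load_context(args.root)
    if context is None:
        payload = {"found": False, "root": str(args.root)}
        exit_code = 1  # 1 = missing
    else:
        payload = {"found": True, "context": context}
        exit_code = 0  # 0 = found
    print(format_output(payload, sys.stdout.isatty()))
    return exit_code
```

To run it as a script, call `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. Returning the exit code from `main` instead of exiting inside it keeps the function easy to test.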
Phase 5: Review
Before presenting the final skill, verify against this checklist:
Basics
Architecture (if applicable)
References
Scripts
API/Service Skills (if applicable)
Consolidation (if merging existing skills)
Quality
<reference_index>
Reference Index
| Reference | Load when... |
|---|---|
| | Drafting a SKILL.md (Phase 2): full format reference |
| `references/description-guide.md` | Optimizing the description (Phase 3) |
| `references/scripts-guide.md` | Writing scripts (Phase 4) |
| `references/anti-patterns.md` | Drafting or auditing: common failures to avoid |
| `references/architecture-patterns.md` | Choosing between simple, router, and domain expertise patterns |
| `references/api-skill-patterns.md` | Skill calls external APIs or services |
| `references/consolidation-guide.md` | Merging multiple skills into fewer |
| `references/xml-structure-guide.md` | Deciding on XML vs markdown structure |
</reference_index>