Found 14 Skills
Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.
Find and evaluate Claude skills for specific use cases using semantic search, Anthropic best practices assessment, and fitness scoring. Use when the user asks to find skills for a particular task (e.g., "find me a skill for pitch decks"), not for generic "show all skills" requests.
After an agentic task completes, perform a retrospective analysis across 6 dimensions (goal alignment, efficiency, decision quality, error handling, communication, reusability). Score performance, identify inefficiency patterns, evaluate skill usage, and produce actionable improvement recommendations. Triggers on "how did it go", "retrospective", "review performance", "what could be better", or after any long agentic task completes.
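For a sense of what "score performance across 6 dimensions" could look like in practice, here is a minimal sketch. The dimension names come from the entry above; the 0-10 scale, the unweighted mean, and the sample scores are assumptions, not the skill's actual rubric.

```python
from dataclasses import dataclass

# Dimension names taken from the skill description above.
DIMENSIONS = (
    "goal_alignment",
    "efficiency",
    "decision_quality",
    "error_handling",
    "communication",
    "reusability",
)

@dataclass
class Retrospective:
    scores: dict[str, float]  # each dimension scored 0-10 (assumed scale)

    def overall(self) -> float:
        # Unweighted mean; a real rubric might weight dimensions differently.
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

review = Retrospective(scores={
    "goal_alignment": 9, "efficiency": 6, "decision_quality": 8,
    "error_handling": 7, "communication": 8, "reusability": 5,
})
print(f"overall: {review.overall():.1f}/10")  # -> overall: 7.2/10
```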
Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.
Comprehensive security and safety evaluation system for agent skills (.skill files). Use when users provide GitHub URLs, website links, or .skill files for download and request security assessment, safety evaluation, or ask "is this skill safe to use." Evaluates prompt injection risks, malicious code patterns, hidden instructions, data exfiltration attempts, and provides actionable recommendations with risk scoring.
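As an illustration of the risk categories this entry names, a heuristic scan over a skill's text might look like the sketch below. The regexes, labels, and file path are assumptions chosen for the example; a real assessment would go well beyond pattern matching.

```python
import re

# Illustrative-only heuristics for the risk categories named above;
# these patterns are assumptions, not the skill's actual detection rules.
RISK_PATTERNS = {
    "possible data exfiltration": re.compile(r"curl\s+.*https?://", re.I),
    "encoded payload":            re.compile(r"base64\s+(-d|--decode)", re.I),
    "hidden instructions":        re.compile(r"<!--.*?(ignore|override).*?-->", re.I | re.S),
}

def scan_skill(text: str) -> list[str]:
    """Return the risk labels whose pattern matches the skill text."""
    return [label for label, pat in RISK_PATTERNS.items() if pat.search(text)]

with open("SKILL.md") as f:  # path is an assumption
    findings = scan_skill(f.read())
print(findings or "no heuristic matches")
```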
Define the design rules (Skill Laws) that all Skills must follow, including core principles such as AI-first, human-centric, and ready-to-use. Use when users create a new Skill, optimize an existing Skill, ask about Skill design specifications, or need to evaluate Skill quality.
Evaluate and improve skills through measured testing. Run trigger evaluations to test whether skill descriptions cause correct activation, optimize descriptions via automated train/test loops, benchmark skill output quality with A/B comparisons, and validate skill structure. Use when user says "improve skill", "test skill triggers", "optimize description", "benchmark skill", "eval skill", or "skill quality". Do NOT use for creating new skills (use skill-creator-engineer).
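A trigger evaluation of the kind this entry describes reduces to checking labeled prompts against whatever decides activation. In the toy sketch below, a keyword predicate stands in for the real model call, and the labeled prompts are made up for illustration.

```python
# A toy trigger evaluation: labeled prompts are checked against whatever
# predicate decides activation. A keyword check stands in for the real
# model call, which this sketch does not make.
LABELED_PROMPTS = [  # (prompt, should_trigger) -- contents are made up
    ("improve skill foo's description", True),
    ("test skill triggers for bar",     True),
    ("create a brand-new skill",        False),  # belongs to the creator skill
    ("summarize this PDF",              False),
]

def would_trigger(prompt: str) -> bool:
    # Stand-in predicate; a real harness would ask the model instead.
    keywords = ("improve skill", "test skill", "optimize description",
                "benchmark skill", "eval skill", "skill quality")
    return any(k in prompt.lower() for k in keywords)

hits = sum(would_trigger(p) == label for p, label in LABELED_PROMPTS)
print(f"trigger accuracy: {hits}/{len(LABELED_PROMPTS)}")  # -> 4/4
```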
Run isolated eval and grading calls using CC 2.1.81 --bare mode. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.
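A constructed invocation of the kind this entry describes might look like the sketch below. The `-p` and `--bare` flags are taken from the description above; the grading prompt and surrounding code are assumptions. Shelling out to a fresh process is what keeps the grading call isolated from plugins and hooks in the parent session.

```python
import subprocess

# Constructs an isolated grading call; "-p" and "--bare" come from the
# entry above, everything else (prompt text, parsing) is an assumption.
def grade(output_to_grade: str) -> str:
    prompt = (
        "Grade the following skill output from 1-10 for correctness "
        "and reply with only the number:\n\n" + output_to_grade
    )
    result = subprocess.run(
        ["claude", "-p", "--bare", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print(grade("Example skill output to be graded."))
```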
Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch for Claude Code or Cursor, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
Audit existing skills using Tessl scoring, metadata and trigger-coverage checks, repo conventions, and skill-authoring best practices. Use when creating or revising a skill, triaging weak self-activation, or comparing a skill against source-repo guidance such as `AGENTS.md`, `CLAUDE.md`, or repo rules, as well as external skill guidance. Do not use to verify general application code or to rewrite unrelated docs.
Analyzes and compares existing skills from any source (skills.sh, GitHub, Claude marketplace, or local files) against a target skill or requirement. Fetches skill content, evaluates it across 10 dimensions, produces a structured comparison table, identifies gaps, and recommends whether to adopt, adapt, or build from scratch. Trigger when the user says "analyze this skill", "compare skills", "is this skill good enough", "what does this skill do", "skill evaluation", "should I use this skill", or "skill gap analysis", pastes a skills.sh or GitHub skill URL, or uploads a SKILL.md file for review.
Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or iterate on skill quality. Triggers: "create a skill", "make a new skill", "build a skill for", "write a skill that", "skill for doing X", "I want a skill to", "new skill", "design a skill", "scaffold a skill", "improve this skill", "optimize this skill", "this skill isn't working well", "evaluate this skill", "score this skill", "how good is this skill", "run evals on", "benchmark this skill", "test this skill's quality", "skill quality", "skill performance". Also triggers when a user describes a repeatable workflow they want to automate, says "I keep doing X manually", "can you remember how to do X", or "turn this into a skill".
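The creator-style entries above all revolve around producing a skill package: a directory holding a SKILL.md whose YAML frontmatter carries the name and the trigger-bearing description. The sketch below scaffolds that minimal layout; the field contents and paths are placeholders, and real skills typically add more instructions and supporting files.

```python
from pathlib import Path

# Scaffolds a minimal skill package: a directory with a SKILL.md whose
# frontmatter holds the name and description. Contents are placeholders.
def scaffold_skill(root: Path, name: str, description: str) -> Path:
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {name}\n\n"
        "Step-by-step instructions for the agent go here.\n"
    )
    return skill_md

path = scaffold_skill(
    Path("skills"),
    "example-skill",  # placeholder name
    "Does X. Use when the user asks for X.",  # trigger phrasing matters most
)
print(f"wrote {path}")
```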