Found 74 Skills
Generates eval test cases from an eval suite plan (output of /eval-suite-planner) or a plain-English agent description. Supports both single-response and conversation (multi-turn) evaluation modes. Outputs a Copilot Studio test set table, a CSV file for import (single-response only), and a docx report for human review.
Use when working with TDD workflows and the TDD cycle.
Analyze gaps between implementation plans and the actual codebase implementation for the Rust self-learning memory project.
Help users ship products faster and with higher quality. Use when someone is planning a launch, struggling to release features, dealing with shipping velocity issues, or trying to establish better release practices.
Internal sub-skill: agentic review of a printed CLI's sampled command output for plausibility issues that rule-based checks can't encode (substring-match relevance, format bugs, silent source drops, ranking failures). Invoked via the Skill tool by main printing-press SKILL.md (Phase 4.85) and printing-press-polish SKILL.md during the diagnostic loop. Not for direct user invocation — its actionable wrappers are /printing-press and /printing-press-polish.
Build comprehensive, mobile-compatible Obsidian study vaults from academic course materials with a checkpoint-based workflow, error pattern recognition, and quality assurance. Battle-tested patterns from 828KB, 37-file projects. Works across all subjects: CS, medicine, business, and self-study.
A comprehensive verification system for Claude Code sessions.
Reviews code, plans, and scope from multiple angles. The guardian of quality has arrived. Use when the user mentions reviews, code review, plan review, scope analysis, security, performance, quality checks, PRs, diffs, or change review. Do NOT load for: implementation work, new feature development, bug fixes, or setup.
Conduct an architecture health check on a design, either by verifying that the design is internally consistent (no conflicts between terminology, contracts, and implementation steps) or by checking that the design aligns with the code (everything promised in the design is actually implemented). This skill only outputs issue lists and repair suggestions and makes no modifications. It focuses on exactly one target per run; "checking the other one too while you're at it" is not allowed. Trigger scenarios: the user says "perform an architecture check", "is the design internally consistent?", or "does the plan align with the code?", or wants to run a health check before proceeding to the implement/acceptance phase.
Amazon Bedrock AgentCore Evaluations for testing and monitoring AI agent quality. 13 built-in evaluators plus custom LLM-as-Judge patterns. Use when testing agents, monitoring production quality, setting up alerts, or validating agent behavior.
Comprehensive autonomous development strategies, including milestone planning, incremental implementation, auto-debugging, and continuous quality assurance, for full development lifecycle management.
After building a feature, verify it matches what was planned, respects the system architecture and design standards, and is ready for production. Reports issues clearly so the developer decides what to fix.