Found 141 Skills
Quality review of test files and manual evidence documents. Goes beyond existence checks — evaluates assertion coverage, edge case handling, naming conventions, and evidence completeness. Produces ADEQUATE/INCOMPLETE/MISSING verdict per story. Run before QA sign-off or on demand.
Investigate LLM analytics evaluations of both types — `hog` (deterministic code-based) and `llm_judge` (LLM-prompt-based). Find existing evaluations, inspect their configuration, run them against specific generations, query individual pass/fail results, and generate AI-powered summaries of patterns across many runs. Use when the user asks to debug why an evaluation is failing, surface common failure modes, compare results across filters, dry-run a Hog evaluator, prototype a new LLM-judge prompt, or manage the evaluation lifecycle (create, update, enable/disable, delete).
Novel Chapter Review - Triggered when novel chapters need proofreading, review, or quality evaluation. Keywords: review, proofread, manuscript evaluation, inspection, quality assessment.
This skill should be used when the user asks to "repair an agent", "audit an agent", "fix my agent", "review agent quality", "check if my agent is well-written", "diagnose agent problems", "what's wrong with this agent", "improve this agent", or "what's wrong with this agent file". Not for skills — use repair-skill.
Meta-prompting framework for critiquing responses, analyzing solution trajectories, and evaluating AI-generated content quality
WeChat Pay Score integration solution covering the full order flow (creation, confirmation, completion, deduction, refund). Provides five capabilities: product selection, sample code, quick business reference, quality assessment, and troubleshooting. Use when user mentions "WeChat Pay Score", "Pay Score", "credit score", "deposit-free rental", "deposit-free", "enjoy first pay later", "first deposit-free mode", "first enjoy mode", "order requiring confirmation", "Pay Score service order", or asks to "access WeChat Pay Score", "request Pay Score interface sample code", "troubleshoot Pay Score issues".
Evaluate the reproducibility of technical articles. Dispatch a subagent to simulate a first-time reader reproducing the work locally and list missing information. Use as the final check on a draft before publication.
Bootstrap evaluators from production traces — emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with ml_app and optional RCA report or failure hypothesis.
Detects entropy signals in a codebase: stale TODOs, disabled tests, lint suppressions, commented-out code, dead imports, empty catch blocks, and deprecated API usage. Designed for daily runs to catch quality erosion early. Do NOT use for feature work, refactoring planning, or security audits.