Total 31,304 skills; AI & Machine Learning has 5,072 skills
Showing 12 of 5,072 skills
Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).
Help the user systematically identify and categorize failure modes in an LLM pipeline by reading traces. Use when starting a new eval project, after significant pipeline changes (new features, model switches, prompt rewrites), when production metrics drop, or after incidents.
Generates AI images using the nano-banana CLI (Gemini 3.1 Flash default, Pro available). Handles multi-resolution (512-4K), aspect ratios, reference images for style transfer, green screen workflow for transparent assets, cost tracking, and exact dimension control. Use when asked to "generate an image", "create a sprite", "make an asset", "generate artwork", or any image generation task for UI mockups, game assets, videos, or marketing materials.
Use when reviewing SKILL.md files for structure and trigger quality.
Guidelines for deep learning development with PyTorch, Transformers, Diffusers, and Gradio for LLM and diffusion model work.
Research any topic online and create learning guides. Use when user asks to 'learn about', 'research topic', 'create learning guide', 'build knowledge base', or 'study subject'.
Sample skill for testing the skill-tester validation pipeline. Demonstrates proper skill structure with scripts, references, and assets.
Designs multi-agent system architectures with orchestration patterns, tool schemas, and performance evaluation. Use when building AI agent systems, designing agent workflows, creating tool schemas, or evaluating agent performance.
This skill should be used when inspecting, analyzing, or querying Claude Code session logs. Use when users ask about session history, want to find sessions, analyze context usage, extract tool call patterns, debug agent execution, or understand what happened in previous sessions. Essential for understanding Claude Code's ~/.claude/projects/ structure, JSONL session format, and the erk extraction pipeline.
Global agent rules covering language, response style, debugging priority, engineering quality baseline, mandatory code metric limits, security baseline, test verification standards, and the skills routing table. Applies to all programming tasks.
Hand off a task to Codex CLI for autonomous execution. Use when a task would benefit from a capable subagent to implement, fix, investigate, or review code. Codex has full codebase access and can make changes.
Think carefully before answering any question. Before answering or performing any task, conduct in-depth analysis and reasoning first.