Loading...
Loading...
Found 17 Skills
Agent skill for performance-monitor - invoke with $agent-performance-monitor
Agent skill for performance-optimizer - invoke with $agent-performance-optimizer
Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker
Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.
Agent harness performance system for Claude Code and other AI coding agents — skills, instincts, memory, hooks, commands, and security scanning
Audits Claude Code context window consumption across agents, skills, MCP servers, and rules. Identifies bloat, redundant components, and produces prioritized token-savings recommendations.
Apply optimization techniques to extend effective context capacity. Use when context limits constrain agent performance, when optimizing for cost or latency, or when implementing long-running agent systems.
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
Expert data analysis and manipulation for customer support operations using pandas
Query cross-project usage analytics. Use when reviewing agent, skill, hook, or team performance across OrchestKit projects. Also replay sessions, estimate costs, and view model delegation trends.
Use this when you need to EVALUATE OR IMPROVE or OPTIMIZE an existing LLM agent's output quality - including improving tool selection accuracy, answer quality, reducing costs, or fixing issues where the agent gives wrong/incomplete responses. Evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. Covers end-to-end evaluation workflow or individual components (tracing setup, dataset creation, scorer definition, evaluation execution).
Designs multi-agent system architectures with orchestration patterns, tool schemas, and performance evaluation. Use when building AI agent systems, designing agent workflows, creating tool schemas, or evaluating agent performance.