Search Results: agent-performance

Use this skill when the user's Copilot Studio agent evaluations have come back and they need to interpret scores, diagnose root causes of underperforming test cases, find remediation steps, or analyze patterns to improve their agent. Always use this skill when the user mentions: "eval failed", "why did this fail", "triage", "diagnose failure", "low pass rate", "fix evaluation results", "not passing", "failing test cases", "evaluation results", "improve my eval scores", or any situation where eval scores need interpretation and action.

AI & Machine Learning | affaan-m/everything-claud...

context-budget

Audits Claude Code context window consumption across agents, skills, MCP servers, and rules. Identifies bloat, redundant components, and produces prioritized token-savings recommendations.

AI & Machine Learning | shipshitdev/library

context-optimization

Apply optimization techniques to extend effective context capacity. Use when context limits constrain agent performance, when optimizing for cost or latency, or when implementing long-running agent systems.

1 script | Checked

Data Processing | manutej/luxor-claude-mark...

pandas

Expert data analysis and manipulation for customer support operations, using pandas.

AI & Machine Learning | p47phoenix/claude-plugins

prompt-engineer

Expert prompt optimization for LLMs and AI systems. Use PROACTIVELY when building AI features, improving agent performance, or crafting system prompts. Masters prompt patterns and techniques.

AI & Machine Learning | aradotso/trending-skills

everything-claude-code-harness

Agent harness performance system for Claude Code and other AI coding agents: skills, instincts, memory, hooks, commands, and security scanning.

AI & Machine Learning | try-works/recursive-mode

recursive-benchmark

Paired benchmark orchestration for comparing coding-agent performance with recursive-mode off and on. Use when the user wants to benchmark recursive-mode, compare recursive vs non-recursive execution on the same project, generate disposable benchmark repos, capture timing/build-test logs, or write a benchmark report.

AI & Machine Learning | mlflow/skills

agent-evaluation

Use this when you need to EVALUATE, IMPROVE, or OPTIMIZE an existing LLM agent's output quality, including improving tool selection accuracy and answer quality, reducing costs, or fixing issues where the agent gives wrong or incomplete responses. Evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. Covers the end-to-end evaluation workflow or its individual components (tracing setup, dataset creation, scorer definition, evaluation execution).

12 scripts | Attention