Search Results: agent-evaluation

Found 24 Skills

AI & Machine Learningshipshitdev/library

evaluation

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningneolabhq/context-engineer...

sadd:tree-of-thoughts

Execute tasks through systematic exploration, pruning, and expansion using Tree of Thoughts methodology with multi-agent evaluation

🇺🇸|EnglishTranslated

AI & Machine Learningthe-perfect-developer/the...

google-adk

This skill should be used when the user asks to "build an agent with Google ADK", "use the Agent Development Kit", "create a Google ADK agent", "set up ADK tools", or needs guidance on Google's Agent Development Kit best practices, multi-agent systems, or agent evaluation.

🇺🇸|EnglishTranslated

AI & Machine Learningalirezarezvani/claude-ski...

eval

Evaluate and rank agent results by metric or LLM judge for an AgentHub session.

🇺🇸|EnglishTranslated

AI & Machine Learningaffaan-m/everything-claud...

agent-eval

Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics

🇺🇸|EnglishTranslated

AI & Machine Learningsammcj/agentic-coding

deepeval

Use when discussing or working with DeepEval (the python AI evaluation framework)

🇺🇸|EnglishTranslated

AI & Machine Learninggotalab/skillport

skill-evaluator

Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.

🇺🇸|EnglishTranslated

1 scripts/Attention

AI & Machine Learninglangchain-ai/lca-skills

langsmith-code-eval

Create code-based evaluators for LangSmith-traced agents with step-by-step collaborative guidance through inspection, evaluation logic, and testing.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learninglubu-labs/langchain-agent...

langgraph-testing-evaluation

Use this skill when you need to test or evaluate LangGraph/LangChain agents: writing unit or integration tests, generating test scaffolds, mocking LLM/tool behavior, running trajectory evaluation (match or LLM-as-judge), running LangSmith dataset evaluations, and comparing two agent versions with A/B-style offline analysis. Use it for Python and JavaScript/TypeScript workflows, evaluator design, experiment setup, regression gates, and debugging flaky/incorrect evaluation results.

🇺🇸|EnglishTranslated

11 scripts/Attention

AI & Machine Learningtyler-r-kendrick/agent-sk...

microsoft-foundry

Use this skill to work with Microsoft Foundry (Azure AI Foundry): deploy AI models from catalog, build RAG applications with knowledge indexes, create and evaluate AI agents. USE FOR: Microsoft Foundry, AI Foundry, deploy model, model catalog, RAG, knowledge index, create agent, evaluate agent, agent monitoring. DO NOT USE FOR: Azure Functions (use azure-functions), App Service (use azure-create-app).

🇺🇸|EnglishTranslated

AI & Machine Learningbagelhole/devops-security...

agent-evals

Build automated evaluation suites for AI agents using golden datasets, rubrics, and regression gates.

🇺🇸|EnglishTranslated

AI & Machine Learningjackjin1997/clawforge

langsmith-dataset

Use this skill for ANY question about creating test or evaluation datasets for LangChain agents. Covers generating datasets from traces (final_response, single_step, trajectory, RAG types), uploading to LangSmith, and managing evaluation data.

🇺🇸|EnglishTranslated

2 scripts/Checked