Search Results: uat

Found 1,744 Skills

AI & Machine Learningdaymade/claude-code-skill...

promptfoo-evaluation

Configures and runs LLM evaluation using Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing llm-rubric for LLM-as-judge, or managing few-shot examples in prompts. Triggers on keywords like "promptfoo", "eval", "LLM evaluation", "prompt testing", or "model comparison".

🇺🇸|EnglishTranslated

Backend Developmentgiuseppe-trisciuoglio/dev...

spring-boot-actuator

Configure Spring Boot Actuator for production-grade monitoring, health probes, secured management endpoints, and Micrometer metrics across JVM services.

🇺🇸|EnglishTranslated

AI & Machine Learninglangchain-ai/langsmith-sk...

langsmith-evaluator

INVOKE THIS SKILL when building evaluation pipelines for LangSmith. Covers three core components: (1) Creating Evaluators - LLM-as-Judge, custom code; (2) Defining Run Functions - how to capture outputs and trajectories from your agent; (3) Running Evaluations - locally with evaluate() or auto-run via LangSmith. Uses the langsmith CLI tool.

🇺🇸|EnglishTranslated

Tools & Utilitiesalirezarezvani/claude-ski...

tech-stack-evaluator

Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating technology stacks, calculating total cost of ownership, assessing migration paths, or analyzing ecosystem viability.

🇺🇸|EnglishTranslated

7 scripts/Checked

AI & Machine Learningrysweet/amplihack

model-evaluation-benchmark

Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.

🇺🇸|EnglishTranslated

AI & Machine Learninggonglingrui/screen-creati...

novel-evaluator

Strictly and meticulously judge and score story texts, analyze quality from the dimensions of market potential, innovation attributes, and content highlights. Suitable for initial novel screening and multi-dimensional evaluation and scoring

🇨🇳|ChineseTranslated

Product & Designvangongwanxiaowan/screen-...

drama-evaluator

Evaluate and score based on the evaluation criteria for vertical short dramas, covering dimensions such as core appealing points and story types. It is suitable for assessing the potential of adapting stories into vertical short dramas and analyzing market competitiveness

🇨🇳|ChineseTranslated

AI & Machine Learningorq-ai/assistant-plugins

build-evaluator

Create validated LLM-as-a-Judge evaluators following best practices — binary Pass/Fail judges with TPR/TNR validation for measuring specific failure modes. Use when you need to automate quality checks, build guardrails, or measure a specific failure mode identified during trace analysis. Do NOT use when failures are fixable with prompt changes (use optimize-prompt) or when failure modes are unknown (use analyze-trace-failures first).

🇺🇸|EnglishTranslated

Testing & QAdavila7/claude-code-templ...

agent-evaluation

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

🇺🇸|EnglishTranslated

Project Managementoldwinter/skills

evaluating-new-technology

Create a Technology Evaluation Pack (problem framing, options matrix, build vs buy, pilot plan, risk review, decision memo). Use for evaluating new tech, emerging technology, AI tools, vendor selection, and tech stack decisions.

🇺🇸|EnglishTranslated

Testing & QAmikeyobrien/ralph-orchest...

evaluate-presets

Use when testing Ralph's hat collection presets, validating preset configurations, or auditing the preset library for bugs and UX issues.

🇺🇸|EnglishTranslated

Project Managementlyndonkl/claude

evaluation-rubrics

Use when need explicit quality criteria and scoring scales to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, reduce subjective bias, or when user mentions rubric, scoring criteria, quality standards, evaluation framework, inter-rater reliability, or grade/assess work.

🇺🇸|EnglishTranslated