Search Results: error-analysis

Found 10 Skills

AI & Machine Learning | hamelsmu/evals-skills

eval-audit

Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).

AI & Machine Learning | oldwinter/skills

ai-evals

Create an AI Evals Pack (eval PRD, test set, rubric, judge plan, results + iteration loop). Use for LLM evaluation, benchmarks, rubrics, error analysis/open coding, and ship/no-ship quality gates for AI features.

AI & Machine Learning | maragudk/evals-skills

trace-annotation-tool

Generate a custom trace annotation web app for open coding during LLM error analysis. Use when the user wants to review LLM traces, annotate failures with freeform comments, and do first-pass qualitative labeling (open coding). Also use when the user mentions "annotate traces", "trace review tool", "open coding tool", "label traces", "build an annotation interface", "review LLM outputs", or wants to manually inspect pipeline traces before building a failure taxonomy. This skill produces a tailored Python web application using FastHTML, TailwindCSS, and HTMX.

Code Quality | zhiruifeng/localagentcrew

debug

Identifies bugs, analyzes errors, performs root cause analysis, and proposes fixes.

AI & Machine Learning | ascend/agent-skills

ascendc-operator-precision-eval

AscendC Operator Precision Evaluation. Generate a comprehensive precision test case set (≥30 cases) for the compiled and installed operator, run the tests, and generate a precision verification report. Keywords: precision test, precision evaluation, precision report, accuracy, error analysis. After execution, YOU MUST display the overview, failure summary, and key findings in the current conversation; do not merely attach the report path.

AI & Machine Learning | 0xdarkmatter/claude-mods

introspect

Analyze Claude Code session logs: extract thinking blocks, tool usage stats, error patterns, and debug trajectories. Triggers on: introspect, session logs, trajectory, analyze sessions, what went wrong, tool usage, thinking blocks, session history, my reasoning, past sessions, what did I do.

Tools & Utilities | robzolkos/appsignal-cli

appsignal

Fetch and analyze AppSignal error incidents. Use when debugging errors, investigating exceptions, or when the user mentions AppSignal, incidents, or error monitoring.

Testing & QA | buildrtech/dotagents

sentry-issue

Use when given a Sentry issue URL and you need to fetch exception details, stacktrace, and request context using sentry-cli (and Sentry API fallback when needed).

AI & Machine Learning | maragudk/evals-skills

failure-taxonomy

Build a structured taxonomy of failure modes from open-coded trace annotations. Use this skill whenever the user has freeform annotations from reviewing LLM traces and wants to cluster them into a coherent, non-overlapping set of binary failure categories (axial coding). Also use when the user mentions "failure modes", "error taxonomy", "axial coding", "cluster annotations", "categorize errors", "failure analysis", or wants to go from raw observation notes to structured evaluation criteria. This skill covers the full pipeline: grouping open codes, defining failure modes, re-labeling traces, and quantifying error rates.

Backend Development | majesticlabs-dev/majestic...

rails-debugger

Use proactively when encountering Rails errors, test failures, build issues, or unexpected behavior. Analyzes errors, reproduces issues, and identifies root causes.
