Search Results: rca

Found 17 Skills

DevOps & Cloud Services · clickhouse/agent-skills

clickhouse-managed-postgres-rca

MUST USE when investigating performance issues on a ClickHouse-managed Postgres instance. Provides an evidence-based RCA workflow that scrapes the Prometheus endpoint for system signal, pulls per-digest evidence from the Slow Query Patterns API, and recommends (does not apply) a fix.

Tools & Utilities · mohitagw15856/pm-claude-s...

incident-postmortem

Write a structured incident postmortem or post-incident review. Use when asked to write a postmortem, incident report, P1/P2 review, outage report, or RCA (root cause analysis). Generates a blameless postmortem with timeline, root cause, contributing factors, impact summary, and action items.

DevOps & Cloud Services · gemini-cli-extensions/dat...

gcp-composer-troubleshooting

Provides expert guidance for troubleshooting Cloud Composer (Apache Airflow) and orchestration pipelines. Use this skill when the user asks to troubleshoot or fix a failed pipeline or DAG in a Composer environment, or to generate a Root Cause Analysis (RCA) report.

Code Quality · thananon/9arm-skills

post-mortem

Write the canonical engineering record of a fixed bug — root cause, mechanism, fix, validation, and how it slipped through. Engineer-audience, code identifiers welcome. Use after a debug session lands a fix, before closing the ticket. Trigger on /post-mortem, when the user says "write the post-mortem / postmortem / RCA / root cause analysis", "document this fix", "write up the root cause", "close out this bug with a writeup", or hands you a fixed-and-validated bug and asks for the writeup.

Tools & Utilities · affaan-m/everything-claud...

code-tour

Create CodeTour `.tour` files — persona-targeted, step-by-step walkthroughs with real file and line anchors. Use for onboarding tours, architecture walkthroughs, PR tours, RCA tours, and structured "explain how this works" requests.

Tools & Utilities · alirezarezvani/claude-ski...

code-tour

Use when the user asks to create a CodeTour .tour file — persona-targeted, step-by-step walkthroughs that link to real files and line numbers. Trigger for: create a tour, onboarding tour, architecture tour, PR review tour, explain how X works, vibe check, RCA tour, contributor guide, or any structured code walkthrough request.

DevOps & Cloud Services · grafana/skills

ml-ai

Grafana Cloud AI and ML features — Grafana Assistant (natural language queries, dashboard generation, incident investigations), Dynamic Alerting (ML forecasting and outlier detection), Sift (automated root cause analysis with 8 analysis types), Knowledge Graph (entity discovery and RCA Workbench), and the LLM Plugin (OpenAI/Anthropic/Azure integration). Use when setting up AI-powered alerting, using natural language to query metrics/logs, automating incident investigation, or integrating LLMs with Grafana panels and workflows.

AI & Machine Learning · nvidia/skills

tao-analyze-changenet-rca

Performs deep Root Cause Analysis (RCA) on NVIDIA TAO Visual ChangeNet classification experiments with image-evidence-driven investigation. Use when analyzing ChangeNet model failures, investigating poor recall / FAR / PASS-NO_PASS metrics, auditing visual inspection pipeline quality, or running an RCA report for an AOI defect-detection model. Trigger phrases include "RCA on my ChangeNet model", "why is my AOI model failing", "audit ChangeNet predictions", "investigate FAR regressions", "root cause analysis on visual-changenet".

7 scripts/Attention

DevOps & Cloud Services · bmad-labs/skills

rca-report

Use when investigating and documenting a production incident, outage, data corruption event, or post-mortem — guides evidence collection during the investigation AND produces a rich, reproducible Root Cause Analysis report. Trigger on phrases like "write an RCA", "post-mortem for X", "document this incident", "what went wrong with...", "the pipeline broke yesterday, help me investigate", or any time the user is debugging a recently-resolved incident and wants a writeup. Also use proactively when the user finishes resolving an incident in-session and the resolution context is fresh — offer to capture it as an RCA before details fade.

AI & Machine Learning · datadog-labs/agent-skills

llm-obs-eval-bootstrap

Bootstrap evaluators from production traces: emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when the user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with ml_app and an optional RCA report or failure hypothesis.

AI & Machine Learning · nvidia/skills

tao-run-deft-aoi

Run the full DEFT AOI improvement loop for NVIDIA TAO VisualChangeNet / ChangeNet PCB inspection models: baseline evaluate, RCA, ingestion of customer-supplied pre-generated AnomalyGen images, k-NN mining, retraining, and deployment gating until FAR / recall KPI targets are met. EA variant — does not run AnomalyGen inline; the customer pre-generates synthetic NG/OK pairs out-of-band and the loop ingests them. Use for prompts like "run the DEFT loop", "fine-tune until FAR below 0.1% at recall=100%", or "improve my AOI ChangeNet model with RCA and pre-generated synthetic defects"; do not use for standalone TAO training, one-off inference, generic anomaly generation, or RCA-only analysis.

8 scripts/Checked

Code Quality · a-tokyo/agent-skills

production-grade

Principle-engineering posture for production-grade code: reads the repo first, plans before code, matches conventions, pulls latest docs over training recall, and ships the simplest correct change that holds the bar — proper algorithms and data structures, idempotent writes, schema+queries+indexes as one artefact, typed errors, tests in the same diff. Substrate-agnostic; defers to peer skills on their lanes. Use for non-trivial planning, design, implementation, review, or refactoring; RCA and debugging; performance and optimization work; changes touching a database schema, security, infrastructure, or a public API; hardening inherited, vibe-coded, or LLM-generated code (dependency/CVE and migration audits); and over-engineering cleanup ("simplest solution," "YAGNI," "what can we delete").
