Loading...
Loading...
Found 13 Skills
Write a structured incident postmortem or post-incident review. Use when asked to write a postmortem, incident report, P1/P2 review, outage report, or RCA (root cause analysis). Generates a blameless postmortem with timeline, root cause, contributing factors, impact summary, and action items.
Write the canonical engineering record of a fixed bug — root cause, mechanism, fix, validation, and how it slipped through. Engineer-audience, code identifiers welcome. Use after a debug session lands a fix, before closing the ticket. Trigger on /post-mortem, when the user says "write the post-mortem / postmortem / RCA / root cause analysis", "document this fix", "write up the root cause", "close out this bug with a writeup", or hands you a fixed-and-validated bug and asks for the writeup.
Grafana Cloud AI and ML features — Grafana Assistant (natural language queries, dashboard generation, incident investigations), Dynamic Alerting (ML forecasting and outlier detection), Sift (automated root cause analysis with 8 analysis types), Knowledge Graph (entity discovery and RCA Workbench), and the LLM Plugin (OpenAI/Anthropic/Azure integration). Use when setting up AI-powered alerting, using natural language to query metrics/logs, automating incident investigation, or integrating LLMs with Grafana panels and workflows.
Use when investigating and documenting a production incident, outage, data corruption event, or post-mortem — guides evidence collection during the investigation AND produces a rich, reproducible Root Cause Analysis report. Trigger on phrases like "write an RCA", "post-mortem for X", "document this incident", "what went wrong with...", "the pipeline broke yesterday, help me investigate", or any time the user is debugging a recently-resolved incident and wants a writeup. Also use proactively when the user finishes resolving an incident in-session and the resolution context is fresh — offer to capture it as an RCA before details fade.
Create CodeTour `.tour` files — persona-targeted, step-by-step walkthroughs with real file and line anchors. Use for onboarding tours, architecture walkthroughs, PR tours, RCA tours, and structured "explain how this works" requests.
Provides expert guidance for troubleshooting Cloud Composer (Apache Airflow) and Orchestration pipelines. Use this skill when the user asks to generate Root Cause Analysis (RCA), troubleshoot or fix a failed pipeline, DAG in Composer environment and generate RCA report.
Use when the user asks to create a CodeTour .tour file — persona-targeted, step-by-step walkthroughs that link to real files and line numbers. Trigger for: create a tour, onboarding tour, architecture tour, PR review tour, explain how X works, vibe check, RCA tour, contributor guide, or any structured code walkthrough request.
Run a structured after-action review (postmortem, retrospective) on a launch, incident, or completed project to capture timeline, root cause analysis, contributing factors, and actionable lessons. Use this skill whenever the user wants to run a postmortem, retrospective, AAR, or after-action review on any past event. Triggers on after-action report, AAR, postmortem, retrospective, retro, post-incident review, what went well what didn't, lessons learned, blameless postmortem, root cause analysis, RCA, five whys. Also triggers when the user has just shipped something or just resolved an incident and wants to capture learnings.
Bootstrap evaluators from production traces — emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with ml_app and optional RCA report or failure hypothesis.
End-to-end pipeline from unlabeled ml_app traces to a bootstrapped evaluator suite. Runs trace classification → root cause analysis → eval bootstrap in sequence with user checkpoints. Use when user says "run the eval pipeline", "go from traces to evals", "bootstrap evals end to end", "classify then RCA then bootstrap", "build an eval set from scratch", or wants a guided walkthrough from production data to evaluator code.
Bug investigation and fix workflow. Triggers: 'debug', 'fix bug', 'investigate issue', 'something is broken', or /debug. Hotfix track for quick fixes, thorough track for root cause analysis. Do NOT use for feature development or refactoring. Do NOT escalate to /ideate unless the fix requires architectural redesign.
Conduct systematic root cause analysis to identify underlying problems. Use structured methodologies to prevent recurring issues and drive improvements.