Loading...
Loading...
Found 5 Skills
Monitoring guidelines for applications and infrastructure including metrics collection, alerting strategies, and SLO-based monitoring
Comprehensive toolkit for generating best practice PromQL (Prometheus Query Language) queries following current standards and conventions. Use this skill when creating new PromQL queries, implementing monitoring and alerting rules, or building observability dashboards.
Grafana Alerting, Incident Response Management (IRM), and SLOs. Covers Grafana-managed and data source-managed alert rules, notification policies, contact points (Slack/PagerDuty/email/webhook), silences, muting, on-call scheduling, incident management workflows, and SLO configuration with burn-rate alerts. Use when configuring alerts, debugging notification routing, setting up on-call rotations, managing incidents, defining SLOs, or provisioning alerting via YAML/API.
Use this skill when the user asks to "investigate incident", "triage this alert", "what's firing", "who got paged", "incident response", "check incident status", "SLO breaching", "error budget burned", "check service level", "SLI status", "who was notified", "check notification delivery", "verify alert routing", "MTTR", "incident severity", "error budget", "burn rate", "acknowledge incident", "resolve incident", "production incident", "what alerts are active", "incident timeline", "on-call triage", or wants to triage, manage, or respond to incidents using alerts, SLOs, and notifications.
Assess APM service health using SLOs, alerts, ML, throughput, latency, error rate, and dependencies. Use when checking service status, performance, or when the user asks about service health.