Search Results: slo-monitoring

Found 5 Skills

DevOps & Cloud Services | akin-ozer/cc-devops-skill...

promql-generator

Comprehensive toolkit for generating best-practice PromQL (Prometheus Query Language) queries that follow current standards and conventions. Use this skill when creating new PromQL queries, implementing monitoring and alerting rules, or building observability dashboards.

DevOps & Cloud Services | mindrally/skills

monitoring-guidelines

Monitoring guidelines for applications and infrastructure, including metrics collection, alerting strategies, and SLO-based monitoring.

DevOps & Cloud Services | elastic/agent-skills

observability-service-health

Assess APM service health using SLOs, alerts, ML, throughput, latency, error rate, and dependencies. Use when checking service status, performance, or when the user asks about service health.

2 scripts/Attention

DevOps & Cloud Services | grafana/skills

alerting-irm

Grafana Alerting, Incident Response Management (IRM), and SLOs. Covers Grafana-managed and data source-managed alert rules, notification policies, contact points (Slack/PagerDuty/email/webhook), silences, muting, on-call scheduling, incident management workflows, and SLO configuration with burn-rate alerts. Use when configuring alerts, debugging notification routing, setting up on-call rotations, managing incidents, defining SLOs, or provisioning alerting via YAML/API.

DevOps & Cloud Services | coralogix/cx-cli

cx-incident-management

Use this skill when the user asks to "investigate incident", "triage this alert", "what's firing", "who got paged", "incident response", "check incident status", "SLO breaching", "error budget burned", "check service level", "SLI status", "who was notified", "check notification delivery", "verify alert routing", "MTTR", "incident severity", "error budget", "burn rate", "acknowledge incident", "resolve incident", "production incident", "what alerts are active", "incident timeline", "on-call triage", or wants to triage, manage, or respond to incidents using alerts, SLOs, and notifications.
