Loading...
Loading...
Found 5 Skills
Implement OpenTelemetry logs/metrics/traces, SLI/SLO gates, burn-rate alerts, and APM integrations. Use when adding or validating observability.
Grafana Alerting, Incident Response Management (IRM), and SLOs. Covers Grafana-managed and data source-managed alert rules, notification policies, contact points (Slack/PagerDuty/email/webhook), silences, muting, on-call scheduling, incident management workflows, and SLO configuration with burn-rate alerts. Use when configuring alerts, debugging notification routing, setting up on-call rotations, managing incidents, defining SLOs, or provisioning alerting via YAML/API.
Use when defining, reviewing, or operating SLOs/SLIs/error budgets. Triggers on "define an SLO", "what should our SLO be", "error budget", "burn rate", "SLI", "service level objective", "Google SRE workbook", "multi-window burn-rate alert", or any reliability-target question. Ships SLO designer, error-budget calculator with multi-window burn-rate thresholds, and SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references on SLO principles + SLI design + error budget math + composition with feature-flags-architect/chaos-engineering/kubernetes-operator. NOT a generic observability skill — specifically the SLO discipline.
Guides Site Reliability Engineering—SLI/SLO and error budgets, reliability dashboards and burn-rate alerting, production readiness reviews, capacity planning for availability, toil reduction, dependency and failure-mode analysis, release reliability (canaries, rollback criteria), and service-owner incident mitigation tied to customer impact. Use when defining or operating SLOs, measuring error budget burn, improving service reliability, running PRRs before launch, planning scalable resilient capacity, or leading technical mitigation during outages—not for CI/CD pipeline implementation (devops), incident program and paging policy design (incident-management-engineer), cloud access and patch tickets (cloud-system-administrator), load-test profiling (performance-engineer), rollout cutover strategy (deployment-strategist), or greenfield cloud build-out (cloud-engineer).
Use when modeling unit economics, calculating burn rate, building financial projections, pricing analysis, revenue forecasting, or any CFO-level financial decisions