Loading...
Loading...
Found 109 Skills
Create and manage Kibana alerting rules via REST API or Terraform. Use when creating, updating, or managing rule lifecycle (enable, disable, mute, snooze) or rules-as-code workflows.
Grafana Alerting, Incident Response Management (IRM), and SLOs. Covers Grafana-managed and data source-managed alert rules, notification policies, contact points (Slack/PagerDuty/email/webhook), silences, muting, on-call scheduling, incident management workflows, and SLO configuration with burn-rate alerts. Use when configuring alerts, debugging notification routing, setting up on-call rotations, managing incidents, defining SLOs, or provisioning alerting via YAML/API.
Create and manage Axiom monitors and notifiers via the v2 public API. Use when building alerting, routing notifications, validating monitor behavior, and maintaining alert configurations end-to-end.
Creates SLO-based alerts and operational dashboards with key charts, alert thresholds, and runbook links. Use for "alerting", "dashboards", "SLO", or "monitoring".
Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.
Manage Oodle notifiers, notification policies, and muting rules — routing alerts to the right channels and silencing during maintenance.
Set up monitoring, logging, and observability for applications and infrastructure. Use when implementing health checks, metrics collection, log aggregation, or alerting systems. Handles Prometheus, Grafana, ELK Stack, Datadog, and monitoring best practices.
Design and run a monitoring system for a website or web app. Use this skill when setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, or fixing alert fatigue. Triggers on monitoring, alerts, uptime, SLO, SLA, error rate, on-call, pager, alert fatigue, observability, dashboards, what should we monitor. Also triggers when an incident reveals a gap in monitoring.
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
Expert-level monitoring and observability with Prometheus, Grafana, logging, and alerting
Know when your AI breaks in production. Use when you need to monitor AI quality, track accuracy over time, detect model degradation, set up alerts for AI failures, log predictions, measure production quality, catch when a model provider changes behavior, build an AI monitoring dashboard, or prove your AI is still working for compliance. Covers DSPy evaluation for ongoing monitoring, prediction logging, drift detection, and alerting.
Grafana OSS core features — dashboards, panels, visualization types, data sources, template variables, alerting, annotations, provisioning, RBAC, service accounts, and configuration. Use when building dashboards, configuring data sources, setting up provisioning YAML, managing users and permissions, writing PromQL/LogQL/TraceQL in panels, or configuring Grafana server settings.