Found 72 Skills
Create and manage Kibana alerting rules via REST API or Terraform. Use when creating, updating, or managing rule lifecycle (enable, disable, mute, snooze) or rules-as-code workflows.
Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.
Set up monitoring, logging, and observability for applications and infrastructure. Use when implementing health checks, metrics collection, log aggregation, or alerting systems. Handles Prometheus, Grafana, ELK Stack, Datadog, and monitoring best practices.
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
Monitoring guidelines for applications and infrastructure including metrics collection, alerting strategies, and SLO-based monitoring
Expert-level Prometheus monitoring, metrics collection, PromQL queries, alerting, and production operations
Observability visualization with Grafana and LGTM stack. Dashboard design, panel configuration, alerting, variables/templating, and data sources. USE WHEN: Creating Grafana dashboards, configuring panels and visualizations, writing LogQL/TraceQL queries, setting up Grafana data sources, configuring dashboard variables and templates, building Grafana alerts. DO NOT USE: For writing PromQL queries (use /prometheus), for alerting rule strategy (use /prometheus), for general observability architecture (use senior-software-engineer with infrastructure focus). TRIGGERS: grafana, dashboard, panel, visualization, logql, traceql, loki, tempo, mimir, data source, annotation, variable, template, row, stat, graph, table, heatmap, gauge, bar chart, pie chart, time series, logs panel, traces panel, LGTM stack.
Drift detection and baseline integrity guard for agent workspace files, with automatic alerting support
Expert-level Grafana dashboards, visualization, data sources, alerting, and production operations
Production observability with structured logging, metrics collection, distributed tracing, and alerting
Monitor management: create, update, and mute monitors, plus alerting best practices.
OpenTelemetry, structured logging, distributed tracing, alerting, and dashboards