Loading...
Loading...
Found 95 Skills
Prometheus monitoring expert for PromQL, alerting rules, Grafana dashboards, and observability
Prometheus monitoring and alerting for cloud-native observability. USE WHEN: Writing PromQL queries, configuring Prometheus scrape targets, creating alerting rules, setting up recording rules, instrumenting applications with Prometheus metrics, configuring service discovery. DO NOT USE: For building dashboards (use /grafana), for log analysis (use /logging-observability), for general observability architecture (use senior-software-engineer with infrastructure focus). TRIGGERS: metrics, prometheus, promql, counter, gauge, histogram, summary, alert, alertmanager, alerting rule, recording rule, scrape, target, label, service discovery, relabeling, exporter, instrumentation, slo, error budget.
Golang everyday observability — the always-on signals in production. Covers structured logging with slog, Prometheus metrics, OpenTelemetry distributed tracing, continuous profiling with pprof/Pyroscope, server-side RUM event tracking, alerting, and Grafana dashboards. Apply when instrumenting Go services for production monitoring, setting up metrics or alerting, adding OpenTelemetry tracing, correlating logs with traces, migrating legacy loggers (zap/logrus/zerolog) to slog, adding observability to new features, or implementing GDPR/CCPA-compliant tracking with Customer Data Platforms (CDP). Not for temporary deep-dive performance investigation (→ See golang-benchmark and golang-performance skills).
In-memory caching in Golang using samber/hot — eviction algorithms (LRU, LFU, TinyLFU, W-TinyLFU, S3FIFO, ARC, TwoQueue, SIEVE, FIFO), TTL, cache loaders, sharding, stale-while-revalidate, missing key caching, and Prometheus metrics. Apply when using or adopting samber/hot, when the codebase imports github.com/samber/hot, or when the project repeatedly loads the same medium-to-low cardinality resources at high frequency and needs to reduce latency or backend pressure.
CLI for querying Prometheus and PromQL-compatible engines (Thanos, Cortex, VictoriaMetrics, Grafana Mimir, Grafana Tempo...) — instant queries, range queries, metric discovery (metrics/labels/meta subcommands), output formats (table/csv/json/graph). Apply when executing PromQL queries, troubleshooting performance issues on a software having observability, investigating latency/error rates/saturation, or analyzing time series data.
Guide for creating GreptimeDB triggers, by which we can trigger external webhook like Alertmanager. This feature can be used as alternative to Prometheus alerting rule.
Set up comprehensive infrastructure monitoring with Prometheus, Grafana, and alerting systems for metrics, health checks, and performance tracking.
Use when setting up monitoring systems, logging, metrics, tracing, or alerting. Invoke for dashboards, Prometheus/Grafana, load testing, profiling, capacity planning.
Instrumenting Go applications with OpenTelemetry for distributed tracing, Prometheus for metrics, and structured logging with slog
Expert-level monitoring and observability with Prometheus, Grafana, logging, and alerting
Author monitoring resources: PrometheusRules, ServiceMonitors, PodMonitors, AlertmanagerConfig, Silence CRs, and canary-checker health checks. Use when: (1) Creating or modifying alert rules (PrometheusRule), (2) Adding scrape targets (ServiceMonitor/PodMonitor), (3) Configuring Alertmanager routing or silences, (4) Writing canary-checker health checks, (5) Creating recording rules, (6) Adding monitoring for a new application or platform component. Triggers: "create alert", "add alerting", "PrometheusRule", "ServiceMonitor", "PodMonitor", "AlertmanagerConfig", "silence alert", "canary check", "recording rule", "add monitoring", "scrape target", "alert rule", "prometheus rule", "health check canary"
Guides Qdrant monitoring and observability setup. Use when someone asks 'how to monitor Qdrant', 'what metrics to track', 'is Qdrant healthy', 'optimizer stuck', 'why is memory growing', 'requests are slow', or needs to set up Prometheus, Grafana, or health checks. Also use when debugging production issues that require metric analysis.