Loading...
Loading...
Found 13 Skills
Set up comprehensive infrastructure monitoring with Prometheus, Grafana, and alerting systems for metrics, health checks, and performance tracking.
Host and process metrics including CPU, memory, disk, network, containers, and process-level telemetry. Monitor infrastructure health and resource utilization.
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
Full-stack observability with Datadog APM, logs, metrics, synthetics, and RUM. Use when implementing monitoring, tracing, alerting, or cost optimization for production systems.
Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
Author monitoring resources: PrometheusRules, ServiceMonitors, PodMonitors, AlertmanagerConfig, Silence CRs, and canary-checker health checks. Use when: (1) Creating or modifying alert rules (PrometheusRule), (2) Adding scrape targets (ServiceMonitor/PodMonitor), (3) Configuring Alertmanager routing or silences, (4) Writing canary-checker health checks, (5) Creating recording rules, (6) Adding monitoring for a new application or platform component. Triggers: "create alert", "add alerting", "PrometheusRule", "ServiceMonitor", "PodMonitor", "AlertmanagerConfig", "silence alert", "canary check", "recording rule", "add monitoring", "scrape target", "alert rule", "prometheus rule", "health check canary"
Grafana Beyla eBPF auto-instrumentation for application observability without code changes. Covers supported languages/runtimes, requirements, installation, configuration (discovery, eBPF settings, OTLP traces export, Prometheus metrics export), Kubernetes deployment, and integration with Grafana Cloud. Use when setting up zero-code instrumentation, configuring eBPF probes, deploying Beyla to Kubernetes, connecting to Tempo/Prometheus, or troubleshooting instrumentation issues.
监控与告警
Guide for implementing Grafana Tempo - a high-scale distributed tracing backend for OpenTelemetry traces. Use when configuring Tempo deployments, setting up storage backends (S3, Azure Blob, GCS), writing TraceQL queries, deploying via Helm, understanding trace structure, or troubleshooting Tempo issues on Kubernetes.
Use when the user wants to analyze EKS application logs during or after a FIS experiment. Triggers on "analyze app logs", "application log analysis", "check application behavior", "分析应用日志", "查看应用表现", "应用日志分析". Supports two modes: real-time monitoring (during experiment) and post-hoc analysis (after experiment). Reads experiment context from aws-fis-experiment-prepare/execute outputs.
Grafana Cloud infrastructure monitoring — Kubernetes monitoring, cloud provider integrations (AWS, Azure, GCP), host and container monitoring, infrastructure dashboards, and collector setup. Use when setting up Kubernetes monitoring, connecting cloud provider metrics, configuring node exporter or cAdvisor, setting up infrastructure dashboards, or using the k8s-monitoring Helm chart.
Expert-level Prometheus monitoring, metrics collection, PromQL queries, alerting, and production operations