Loading...
Loading...
Found 13 Skills
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
Set up comprehensive infrastructure monitoring with Prometheus, Grafana, and alerting systems for metrics, health checks, and performance tracking.
Full-stack observability with Datadog APM, logs, metrics, synthetics, and RUM. Use when implementing monitoring, tracing, alerting, or cost optimization for production systems.
Host and process metrics including CPU, memory, disk, network, containers, and process-level telemetry. Monitor infrastructure health and resource utilization.
Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
Grafana Beyla eBPF auto-instrumentation for application observability without code changes. Covers supported languages/runtimes, requirements, installation, configuration (discovery, eBPF settings, OTLP traces export, Prometheus metrics export), Kubernetes deployment, and integration with Grafana Cloud. Use when setting up zero-code instrumentation, configuring eBPF probes, deploying Beyla to Kubernetes, connecting to Tempo/Prometheus, or troubleshooting instrumentation issues.
Expert-level Prometheus monitoring, metrics collection, PromQL queries, alerting, and production operations
监控与告警
Use when the user wants to analyze EKS application logs during or after a FIS experiment. Triggers on "analyze app logs", "application log analysis", "check application behavior", "分析应用日志", "查看应用表现", "应用日志分析". Supports two modes: real-time monitoring (during experiment) and post-hoc analysis (after experiment). Reads experiment context from aws-fis-experiment-prepare/execute outputs.
Grafana Cloud infrastructure monitoring — Kubernetes monitoring, cloud provider integrations (AWS, Azure, GCP), host and container monitoring, infrastructure dashboards, and collector setup. Use when setting up Kubernetes monitoring, connecting cloud provider metrics, configuring node exporter or cAdvisor, setting up infrastructure dashboards, or using the k8s-monitoring Helm chart.
Author monitoring resources: PrometheusRules, ServiceMonitors, PodMonitors, AlertmanagerConfig, Silence CRs, and canary-checker health checks. Use when: (1) Creating or modifying alert rules (PrometheusRule), (2) Adding scrape targets (ServiceMonitor/PodMonitor), (3) Configuring Alertmanager routing or silences, (4) Writing canary-checker health checks, (5) Creating recording rules, (6) Adding monitoring for a new application or platform component. Triggers: "create alert", "add alerting", "PrometheusRule", "ServiceMonitor", "PodMonitor", "AlertmanagerConfig", "silence alert", "canary check", "recording rule", "add monitoring", "scrape target", "alert rule", "prometheus rule", "health check canary"
Design, refactor, and validate Grafana dashboards for OpenShift/Kubernetes platform operations. Use when users ask to improve platform health dashboards, prioritize critical tenant-impacting signals, filter noise (for example ArgoCD), add Crossplane/Keycloak health panels, validate PromQL programmatically, or apply GrafanaDashboard CR changes live then promote to GitOps.