Found 377 Skills
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
Implement uptime monitoring and status page systems for tracking service availability. Use when monitoring application uptime, creating status pages, or implementing health checks.
Implement canary deployment strategies to gradually roll out new versions to a subset of users, with automatic rollback based on metrics.
Full-stack observability with Datadog APM, logs, metrics, synthetics, and RUM. Use when implementing monitoring, tracing, alerting, or cost optimization for production systems.
Expert guidance for emitting high-quality, cost-efficient OpenTelemetry telemetry. Use when instrumenting applications with traces, metrics, or logs. Triggers on requests for observability, telemetry, tracing, metrics collection, logging integration, or OTel setup.
Implement blue-green deployment strategies for zero-downtime releases with instant rollback capability and traffic switching between environments.
Implement graceful shutdown procedures to handle SIGTERM signals, drain connections, complete in-flight requests, and clean up resources properly. Use when deploying containerized applications, handling server restarts, or ensuring zero-downtime deployments.
Implement backup strategies, disaster recovery plans, and data restoration procedures to protect critical infrastructure and data.
Author monitoring resources: PrometheusRules, ServiceMonitors, PodMonitors, AlertmanagerConfig, Silence CRs, and canary-checker health checks. Use when: (1) Creating or modifying alert rules (PrometheusRule), (2) Adding scrape targets (ServiceMonitor/PodMonitor), (3) Configuring Alertmanager routing or silences, (4) Writing canary-checker health checks, (5) Creating recording rules, (6) Adding monitoring for a new application or platform component. Triggers: "create alert", "add alerting", "PrometheusRule", "ServiceMonitor", "PodMonitor", "AlertmanagerConfig", "silence alert", "canary check", "recording rule", "add monitoring", "scrape target", "alert rule", "prometheus rule", "health check canary"
Use when: user asks to create a Grafana app, initialize a grafana-app-sdk project, set up a Grafana App Platform app, scaffold a new app, or asks about deployment modes (standalone operator, grafana/apps, frontend-only), how grafana-app-sdk works, or the overall development workflow. Provides foundational knowledge of the grafana-app-sdk CLI, project structure, deployment modes, and overall workflow.
Implement Linkerd service mesh patterns for lightweight, security-focused service mesh deployments. Use when setting up Linkerd, configuring traffic policies, or implementing zero-trust networking with minimal overhead.
Optimize cloud infrastructure costs through resource rightsizing, reserved instances, spot instances, and waste reduction strategies.