Loading...
Loading...
Found 95 Skills
Comprehensive logging and observability patterns for production systems including structured logging, distributed tracing, metrics collection, log aggregation, and alerting. Triggers for this skill - log, logging, logs, trace, tracing, traces, metrics, observability, OpenTelemetry, OTEL, Jaeger, Zipkin, structured logging, log level, debug, info, warn, error, fatal, correlation ID, span, spans, ELK, Elasticsearch, Loki, Datadog, Prometheus, Grafana, distributed tracing, log aggregation, alerting, monitoring, JSON logs, telemetry.
Grafana Alloy OpenTelemetry collector and telemetry pipeline configuration. Covers the Alloy configuration language (blocks, attributes, expressions), components for collecting metrics/logs/traces/profiles, sending data to Grafana Cloud/Prometheus/Loki/Tempo, clustering, Fleet Management remote config, and building telemetry pipelines. Use when configuring Alloy, writing Alloy config files (.alloy), building data collection pipelines, setting up scraping, or troubleshooting Alloy deployments.
Expert guidance for building production-ready FastAPI applications with modular architecture where each business domain is an independent module with own routes, models, schemas, services, cache, and migrations. Uses UV + pyproject.toml for modern Python dependency management, project name subdirectory for clean workspace organization, structlog (JSON+colored logging), pydantic-settings configuration, auto-discovery module loader, async SQLAlchemy with PostgreSQL, per-module Alembic migrations, Redis/memory cache with module-specific namespaces, central httpx client, OpenTelemetry/Prometheus observability, conversation ID tracking (X-Conversation-ID header+cookie), conditional Keycloak/app-based RBAC authentication, DDD/clean code principles, and automation scripts for rapid module development. Use when user requests FastAPI project setup, modular architecture, independent module development, microservice architecture, async database operations, caching strategies, logging patterns, configuration management, authentication systems, observability implementation, or enterprise Python web services. Supports max 3-4 route nesting depth, cache invalidation patterns, inter-module communication via service layer, and comprehensive error handling workflows.
Redis observability guidance — which metrics to monitor (memory, connections, hit ratio, ops/sec, rejected connections), which built-in commands to reach for during incident triage (SLOWLOG, INFO, MEMORY DOCTOR, CLIENT LIST, FT.PROFILE), and when to use the Redis Insight GUI. Use when setting up monitoring or alerts for a Redis instance, diagnosing a performance regression, profiling a slow FT.SEARCH query, or wiring Redis metrics into Prometheus, Datadog, or similar.
Automatically discover observability and monitoring skills when working with Prometheus, Grafana, distributed tracing, structured logging, metrics, alerting, dashboards, or monitoring. Activates for observability development tasks.
Use when adding logging to services, setting up monitoring, creating alerts, debugging production issues, designing SLIs/SLOs, or implementing structured logging (Pino, Winston), metrics (Prometheus, DataDog, CloudWatch), or distributed tracing (OpenTelemetry).
Three-lens code review using parallel subagents: Epimetheus (hindsight — bugs, debt, fragility), Metis (craft — clarity, idiom, fit-for-purpose), Prometheus (foresight — vision, extensibility, future-Claude). Triggers on /titans, /review, 'review this code', 'what did I miss', 'before I ship this'. Use after completing substantial work, before /close. (user)
Build PromQL, LogQL, TraceQL queries for Perses panels. Validate query syntax, suggest optimizations, handle variable templating with Perses interpolation formats. Integrates with prometheus-grafana-engineer for deep PromQL expertise. Use for "perses query", "promql perses", "logql perses", "perses panel query". Do NOT use for datasource configuration (use perses-datasource-manage).
Use when operating production Kubernetes — Helm, autoscaling (HPA/VPA), resource management, StatefulSets, external-secrets, observability (Prometheus/Grafana/Loki), RBAC, Pod Security Standards, NetworkPolicies, admission control, backup (Velero), and cost control.
CI/CD pipeline design, containerization, and infrastructure management. Handles Docker, Kubernetes, monitoring setup (Prometheus/Grafana), and infrastructure-as-code (Terraform/Pulumi).
Guide for implementing HolmesGPT - an AI agent for troubleshooting cloud-native environments. Use when investigating Kubernetes issues, analyzing alerts from Prometheus/AlertManager/PagerDuty, performing root cause analysis, configuring HolmesGPT installations (CLI/Helm/Docker), setting up AI providers (OpenAI/Anthropic/Azure), creating custom toolsets, or integrating with observability platforms (Grafana, Loki, Tempo, DataDog).
Observability visualization with Grafana and LGTM stack. Dashboard design, panel configuration, alerting, variables/templating, and data sources. USE WHEN: Creating Grafana dashboards, configuring panels and visualizations, writing LogQL/TraceQL queries, setting up Grafana data sources, configuring dashboard variables and templates, building Grafana alerts. DO NOT USE: For writing PromQL queries (use /prometheus), for alerting rule strategy (use /prometheus), for general observability architecture (use senior-software-engineer with infrastructure focus). TRIGGERS: grafana, dashboard, panel, visualization, logql, traceql, loki, tempo, mimir, data source, annotation, variable, template, row, stat, graph, table, heatmap, gauge, bar chart, pie chart, time series, logs panel, traces panel, LGTM stack.