Found 356 Skills
Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
Cloudflare Workers observability with logging, Analytics Engine, Tail Workers, metrics, and alerting. Use for monitoring, debugging, or tracing, or when encountering log parsing, metric aggregation, or alert configuration errors.
Use when building or reviewing service, job, or CLI runtime behavior in Python — designing startup validation, shutdown sequences, observability, and structured logging. Also use when startup crashes from late config, shutdown leaves orphaned processes, terminal states are implicit, or logs lack structure.
World-class application logging - structured logs, correlation IDs, log aggregation, and the battle scars from debugging production without proper logs. Use when "log, logging, logger, debug, trace, audit, structured log, correlation id, request id, log level, winston, pino, bunyan, log4j, logging, observability, debugging, monitoring, tracing, structured-logs, correlation, aggregation" is mentioned.
Architect a full-stack application on Eve Horizon — manifest-driven services, managed databases, build pipelines, deployment strategies, secrets, and observability. Use when designing a new app, planning a migration, or evaluating your architecture.
Use when you need to implement or improve Java logging and observability, including selecting SLF4J with Logback/Log4j2, applying proper log levels (ERROR, WARN, INFO, DEBUG, TRACE), parameterized logging, secure logging without sensitive data exposure, environment-specific configuration, log aggregation and monitoring, or validating logging through tests. Part of the skills-for-java project.
Use when building comprehensive monitoring and observability systems.
OpenTelemetry with Grafana stack. Covers OTel SDK instrumentation for Go/Java/Python/Node.js/.NET, OTLP protocol and endpoint configuration, sending telemetry to Grafana Cloud via OTLP endpoint, Grafana Alloy as OTel collector, sampling strategies, Kubernetes OTel Operator, and migration from other observability tools. Use when instrumenting apps with OTel, configuring OTLP endpoints, setting up collectors, or migrating to OpenTelemetry.
This skill should be used when the user asks to "investigate an issue", "debug a problem", "find out why something is slow", "check error rates", "analyze user behavior", "understand a production incident", "query telemetry data", "look at logs", "check traces", "examine spans", "analyze RUM data", "check frontend performance", "investigate backend latency", "find transaction data", "check payment metrics", "analyze user journeys", or wants to answer questions using observability data from logs, metrics, traces, RUM, or APM - this is the gateway skill for deciding where to look first.
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Instrument applications with OpenTelemetry SDK and validate telemetry using Kopai. Use when setting up observability, adding tracing/logging/metrics, testing instrumentation, or debugging missing telemetry data.