Loading...
Loading...
Found 277 Skills
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Use when building or reviewing service, job, or CLI runtime behavior in Python — designing startup validation, shutdown sequences, observability, and structured logging. Also use when startup crashes from late config, shutdown leaves orphaned processes, terminal states are implicit, or logs lack structure.
Set up monitoring, logging, and alerting for infrastructure and applications. Use when implementing observability, creating dashboards, or configuring alerts.
Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
Cilium and Hubble network observability for Kubernetes. Use when managing network policies, observing traffic flows, or troubleshooting connectivity with eBPF-based networking.
Application monitoring and observability setup for Python/React projects. Use when configuring logging, metrics collection, health checks, alerting rules, or dashboard creation. Covers structured logging with structlog, Prometheus metrics for FastAPI, health check endpoints, alert threshold design, Grafana dashboard patterns, error tracking with Sentry, and uptime monitoring. Does NOT cover incident response procedures (use incident-response) or deployment (use deployment-pipeline).
Structured observability with Pydantic Logfire and OpenTelemetry. Use when: (1) Adding traces/logs to Python APIs, (2) Instrumenting FastAPI, HTTPX, SQLAlchemy, or LLMs, (3) Setting up service metadata, (4) Configuring sampling or scrubbing sensitive data, (5) Testing observability code.
Principal backend engineering intelligence for TypeScript services. Actions: plan, design, build, implement, review, fix, optimize, refactor, debug, secure, scale backend code and architectures. Focus: correctness, reliability, performance, security, observability, scalability, operability, cost.
Deploy Tinybird pipes and datasources for enter.pollinations.ai observability. Validates and pushes changes to Tinybird Cloud.
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.
Production Python engineering patterns covering architecture, observability, testing, performance/concurrency, and core practices. Use when designing Python systems, implementing async/sync APIs, setting up monitoring, structuring tests, optimizing performance, or following Python best practices.
PostgreSQL monitoring - metrics, alerting, observability