Loading...
Loading...
Found 356 Skills
Comprehensive logging and observability patterns for production systems including structured logging, distributed tracing, metrics collection, log aggregation, and alerting. Triggers for this skill - log, logging, logs, trace, tracing, traces, metrics, observability, OpenTelemetry, OTEL, Jaeger, Zipkin, structured logging, log level, debug, info, warn, error, fatal, correlation ID, span, spans, ELK, Elasticsearch, Loki, Datadog, Prometheus, Grafana, distributed tracing, log aggregation, alerting, monitoring, JSON logs, telemetry.
Comprehensive infrastructure engineering covering DevOps, cloud platforms, FinOps, and DevSecOps. Platforms: AWS (EC2, Lambda, S3, ECS, EKS, RDS, CloudFormation), Azure basics, Cloudflare (Workers, R2, D1, Pages), GCP (GKE, Cloud Run, Cloud Storage), Docker, Kubernetes. Capabilities: CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins), GitOps, infrastructure as code (Terraform, CloudFormation), container orchestration, cost optimization, security scanning, vulnerability management, secrets management, compliance (SOC2, HIPAA). Actions: deploy, configure, manage, scale, monitor, secure, optimize cloud infrastructure. Keywords: AWS, EC2, Lambda, S3, ECS, EKS, RDS, CloudFormation, Azure, Kubernetes, k8s, Docker, Terraform, CI/CD, GitHub Actions, GitLab CI, Jenkins, ArgoCD, Flux, cost optimization, FinOps, reserved instances, spot instances, security scanning, SAST, DAST, vulnerability management, secrets management, Vault, compliance, monitoring, observability. Use when: deploying to AWS/Azure/GCP/Cloudflare, setting up CI/CD pipelines, implementing GitOps workflows, managing Kubernetes clusters, optimizing cloud costs, implementing security best practices, managing infrastructure as code, container orchestration, compliance requirements, cost analysis and optimization.
Application monitoring and observability setup for Python/React projects. Use when configuring logging, metrics collection, health checks, alerting rules, or dashboard creation. Covers structured logging with structlog, Prometheus metrics for FastAPI, health check endpoints, alert threshold design, Grafana dashboard patterns, error tracking with Sentry, and uptime monitoring. Does NOT cover incident response procedures (use incident-response) or deployment (use deployment-pipeline).
Set up Apollo.io monitoring and observability. Use when implementing logging, metrics, tracing, and alerting for Apollo integrations. Trigger with phrases like "apollo monitoring", "apollo metrics", "apollo observability", "apollo logging", "apollo alerts".
MUST READ before setting up observability for ADK agents or when analyzing production traffic, debugging agent behavior, or improving agent performance. ADK observability guide — Cloud Trace, prompt-response logging, BigQuery Agent Analytics, third-party integrations, and troubleshooting. Use when configuring monitoring, tracing, or logging for agents, or when understanding how a deployed agent handles real traffic.
Implement OpenAI Harness Engineering practices in any repository. Use when setting up or refactoring agent-first workflows, writing or upgrading AGENTS.md and PLANS.md, creating deterministic smoke/test/lint/typecheck harness commands, defining strict architecture boundaries and data-shape contracts, wiring observability from day 1, and adding entropy-control checks plus CI automation for reliable autonomous runs.
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.
Grafana Beyla eBPF auto-instrumentation for application observability without code changes. Covers supported languages/runtimes, requirements, installation, configuration (discovery, eBPF settings, OTLP traces export, Prometheus metrics export), Kubernetes deployment, and integration with Grafana Cloud. Use when setting up zero-code instrumentation, configuring eBPF probes, deploying Beyla to Kubernetes, connecting to Tempo/Prometheus, or troubleshooting instrumentation issues.
Set up orq.ai observability for LLM applications. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.
Comprehensive testing doctrine for software and AI systems — covers positive patterns, anti-patterns, gates for coding agents writing tests, CI discipline, and an LLM/agent evaluation primer. Use when authoring or reviewing tests, adding mocks, deciding test placement, generating tests via agents, debugging flaky CI, designing eval suites for LLM features, or rebuilding a brittle test suite. Contains 12 positive patterns (selector hierarchy, table-driven, builders, real-system gates), 25 anti-patterns across Brittleness, Flakiness, Mock-misuse, Process, and AI-specific families, 7 mandatory gates for agents writing tests, flaky-test taxonomy with quarantine workflow, contract / property / mutation testing patterns, and an oracle-ladder primer for LLM-as-judge and agent eval. Language-agnostic — pseudo-code only. Don't use for general code review, library-specific debugging unrelated to tests, non-testing CI pipeline design, or production observability.
Expert in Cilium eBPF-based networking and security for Kubernetes. Use for CNI setup, network policies (L3/L4/L7), service mesh, Hubble observability, zero-trust security, and cluster-wide network troubleshooting. Specializes in high-performance, secure cluster networking.
Risk-based quality engineering test strategy for software delivery. Use when defining or updating test strategy, selecting unit/integration/contract/E2E/performance/security coverage, setting CI quality gates and suite budgets, managing flaky tests and test data, and operationalizing observability-first debugging and release criteria.