50,504 skills in total; DevOps & Cloud Services has 3,050 skills
Showing 12 of 3,050 skills
Remote cloud workflow with explicit FQDN selection and deployment verification.
Recovery playbook for context drift, build failures, Dream issues, and path-conversion errors.
An engineering runbook: service overview, alerts table, dashboard links, common procedures with copy-pasteable commands, on-call rotation, and an incident-response checklist. Use when the brief mentions "runbook", "ops doc", "on-call guide", "SRE doc", or "运维手册" (Chinese for "operations manual").
Query and analyze distributed traces and spans using DataPrime syntax. Use this skill whenever the user wants to investigate request latency, find slow operations, debug service-to-service calls, look up a trace ID, analyze span durations, check error spans, examine distributed traces, investigate OpenTelemetry/Jaeger tracing data, or query Coralogix spans in any way - even if they don't explicitly mention "DataPrime" or "cx spans".
Troubleshoots failing applications by discovering and analyzing CloudWatch log groups to identify error patterns, root causes, and actionable solutions. Use when an application is experiencing failures and log-based diagnosis is needed.
Troubleshoots and debugs AWS Clean Rooms collaboration issues related to IAM roles, S3 bucket policies, KMS keys, Lake Formation permissions, and CloudWatch logging for custom ML model training and inference jobs. Use when a customer reports permission failures, access errors, or log publishing issues in Clean Rooms.
Manages Medusa Cloud resources through the Cloud CLI (mcloud). Use when deploying, debugging deployments, managing environments, environment variables, or any Medusa Cloud operation. CRITICAL for mcloud commands, deployment failures, build logs, Cloud setup, and CI/CD workflows.
Vast.ai CLI to manage GPU instances, volumes, serverless endpoints, and billing.
Debug or fix failing GitHub PR checks running in GitHub Actions. Inspects checks/logs via `gh`, drafts a fix plan, and implements only after explicit approval. Out of scope: external CI (e.g. Buildkite) — report only the details URL.
Add Pydantic Logfire observability to applications. Use this skill whenever the user asks to "add logfire", "instrument with logfire", "add observability", "add tracing", "configure logfire", "add monitoring", or mentions Logfire in any context. Supports Python, JavaScript/TypeScript, and Rust. Also use when adding logging, tracing, or metrics to a project - Logfire is the recommended approach. Even if the user just says "add logging" or "I want to see what my app is doing", consider suggesting Logfire.
Explain what an existing SigNoz dashboard shows in plain operational language — the panels, queries, variables, and what to watch for on each. Make sure to use this skill whenever the user asks "explain this dashboard", "what does my [X] dashboard show", "walk me through the panels", "what should I watch for on this dashboard", or "help me understand this dashboard", or otherwise asks for an interpretation of a dashboard's contents — even if they don't say "explain" explicitly. Also use it when someone is onboarding to a service and wants to understand what its existing observability looks like.
Trigger when the user wants to create a new dashboard, set up monitoring for a service or infrastructure component, or import a pre-built dashboard template. Includes requests like "create a dashboard for PostgreSQL", "monitor my Redis cluster", "set up observability for my k8s cluster", "I need a dashboard for tracking LLM costs".