Total 50,473 skills; DevOps & Cloud Services has 3,048 skills
Showing 12 of 3,048 skills
Implements zero-downtime deployments on GKE using rolling updates, blue-green strategies, and health checks. Use when deploying new versions, rolling back failed deployments, configuring Spring Boot health probes (liveness/readiness), managing rollout status, or implementing progressive rollout patterns. Includes automated health verification and rollback procedures.
Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems.
Comprehensive toolkit for validating, linting, testing, and analyzing Helm charts and their rendered Kubernetes resources. Use this skill when working with Helm charts, validating templates, debugging chart issues, working with Custom Resource Definitions (CRDs) that require documentation lookup, or checking Helm best practices.
Deployment & Operations Expert responsible for deploying builds that pass the Reviewer and QA gates to servers (PM2 3-process cluster + Nginx reverse proxy + BT Panel) securely, observably, and with rollback support. Adheres to engineering baselines including zero-downtime deployment, health checks, rollback within 3 minutes, and post-release smoke testing. Handles deployment orchestration, configuration management, traffic management, and monitoring & alerting. Applicable when receiving task cards from the Deploy department or needing to release to production.
Use this skill for Makefile audit and optimization. Use when auditing Makefiles, reviewing the build system, checking portability, or eliminating recipe duplication. Do not use when creating new Makefiles; use abstract:make-dogfood instead. Do not use for architecture review; use architecture-review instead.
Manage Alibaba Cloud ApsaraVideo Live resources and workflows via OpenAPI/SDK. Use for live domain configuration, stream ingest and playback setup, recording/transcoding templates, monitoring queries, and live stream operations.
CI/CD Pipelines, Versioning & Release Management
Builds and deploys Firebase SQL Connect (aka Firebase Data Connect) backends with PostgreSQL securely. Use when designing schemas with tables and relations, writing authorized queries and mutations, configuring real-time data updates, or generating type-safe SDKs. Use when you need a relational database with Firebase, or when the user mentions SQL Connect or Data Connect.
Logging, testing, cost hygiene, incident triage, and usage metrics for PubNub apps. Covers the correlation fields every send/receive must log, the test pyramid for real-time apps, payload + fan-out cost hygiene, the incident triage runbook, and PubNub usage metrics for billing reconciliation. Use during code reviews, when planning monitoring, when triaging incidents, or when investigating PubNub cost overruns.
Instruments code so production behavior is visible and diagnosable. Use when adding logging, metrics, tracing, or alerting. Use when shipping any feature that runs in production and you need evidence it works. Use when production issues are reported but you can't tell what happened from the available data.
Brev managed GPU instances with Docker support. Use when running TAO training, evaluation, or inference on Brev GPU instances, managing Brev deployments, or dispatching TAO jobs through the Brev CLI. Trigger phrases include "run on Brev", "Brev GPU instance", "submit job to Brev", "Brev CLI deployment".
OpenTelemetry, distributed tracing, structured logging, metrics (Prometheus, Grafana, Datadog). Use when implementing monitoring, tracing, or debugging production issues.