Total: 50,553 skills; DevOps & Cloud Services has 3,053 skills
Guides Site Reliability Engineering—SLI/SLO and error budgets, reliability dashboards and burn-rate alerting, production readiness reviews, capacity planning for availability, toil reduction, dependency and failure-mode analysis, release reliability (canaries, rollback criteria), and service-owner incident mitigation tied to customer impact. Use when defining or operating SLOs, measuring error budget burn, improving service reliability, running PRRs before launch, planning scalable resilient capacity, or leading technical mitigation during outages—not for CI/CD pipeline implementation (devops), incident program and paging policy design (incident-management-engineer), cloud access and patch tickets (cloud-system-administrator), load-test profiling (performance-engineer), rollout cutover strategy (deployment-strategist), or greenfield cloud build-out (cloud-engineer).
Guides enterprise data center portfolio planning and execution—multi-site capacity roadmaps, investment prioritization (build, expand, refresh, exit, colo vs owned), portfolio RAID and dependency management across DC programs, stage-gate governance, capex/opex alignment, regional and resiliency strategy, and steering-committee reporting. Use when prioritizing several DC initiatives, harmonizing site plans over 3–5 years, tracking a portfolio of hall builds and refreshes, or aligning facilities/IT/finance on DC investments—not for a single hall MEP design (data-center-design-execution-lead), host-level utilization (data-center-compute-supply-efficiency), generic software programs (technical-program-manager), or cloud IaC (infrastructure-engineer). For executing approved MW/rack delivery on schedule, use senior-data-center-capacity-delivery-manager.
Bump a pinned dependency (TransformerEngine, Megatron-LM, NRX, etc.), regenerate the lockfile, open a PR, and drive it to green by attaching a watchdog to the "CICD NeMo" workflow and quarantining failing functional tests as flaky until the run is green.
Structured workflows for investigating production issues in Honeycomb — the sequence of tool calls (context priming, broad query, BubbleUp, trace analysis, verification) and how to chain results between steps to reach root causes. Trigger phrases: "investigate production issue", "debug latency spike", "find root cause", "use BubbleUp", "analyze traces", "debug an outage", "why is my API slow", "errors are increasing", "health check", "SLO burning", or any request to investigate or debug production problems.
Unity Cloud Build integration. Manage data and records, and automate workflows. Use when the user wants to interact with Unity Cloud Build data.
Alicloud CMS Dataset lifecycle management and querying skill. Covers listing, inspecting, creating, updating, and deleting datasets, and executing dataset queries via the aliyun CLI (CMS API version 2024-03-30). Triggers: "CMS dataset", "数据集" (dataset), "创建数据集" (create dataset), "查询数据集" (query dataset), "dataset 查询" (dataset query), "ExecuteQuery", "CreateDataset", "GetDataset", "ListDatasets", "UpdateDataset", "DeleteDataset".
Cloudflare Workers performance optimization covering CPU, memory, caching, and bundle size. Use for slow workers, high latency, or cold starts, or when encountering CPU limits, memory issues, or timeout errors.
Cloudflare Email Routing for receiving and sending emails via Workers. Use for email workers, forwarding, or allowlists, or when encountering Email Trigger errors, worker call failures, or SPF issues.
Cloudflare Workers KV global key-value storage. Use for namespaces, caching, or TTL, or when encountering KV_ERROR, 429 rate limits, or consistency issues.
This skill should be used when the user asks to "upload images to Cloudflare", "implement direct creator upload", "configure image transformations", "optimize WebP/AVIF", "create image variants", "generate signed URLs", "add image watermarks", "integrate with Next.js/Remix", "configure webhooks", "debug CORS errors", "troubleshoot error 5408/9401-9413", or "build responsive images with Cloudflare Images API".
Manage Harness users, user groups, and service accounts via MCP. List and search users, create and manage user groups for team-based access, create service accounts for API automation, and view available permissions. Use when asked to list users, create a user group, manage service accounts, check who has access, or set up team permissions. Do NOT use for role assignments or resource groups (use manage-roles instead). Trigger phrases: list users, user group, service account, who has access, team permissions, add user to group, create service account.
Generate Harness Environment YAML for deployment targets and create the environments via MCP. Supports PreProduction and Production types with environment variables, manifest overrides, and multi-environment setup (dev, staging, prod). Use when asked to create an environment, set up staging, configure production, define deployment targets, or manage environment overrides. Trigger phrases: create environment, deployment environment, setup dev, setup staging, setup production, environment variables, environment overrides.