Loading...
Loading...
Found 26 Skills
Orquestração Docker Swarm, gerenciamento de cluster e implantações em produção
Deploy vLLM to Kubernetes (K8s) with GPU support, health probes, and OpenAI-compatible API endpoint. Use this skill whenever the user wants to deploy, run, or serve vLLM on a Kubernetes cluster, including creating deployments, services, checking existing deployments, or managing vLLM on K8s.
Kubernetes container orchestration with Helm, operators, and service mesh. Use for cluster management.
Guides Qdrant scaling decisions. Use when someone asks 'how many nodes do I need', 'data doesn't fit on one node', 'need more throughput', 'cluster is slow', 'too many tenants', 'vertical or horizontal', 'how to shard', or 'need to add capacity'.
Manage the full lifecycle of Alibaba Cloud E-MapReduce (EMR) ECS clusters—creation, scaling, renewal, and status queries. Use this Skill when users want to set up big data clusters, view cluster status, add nodes, release nodes, configure auto-scaling, check cluster and node states, or diagnose creation failures. Also applicable for scenarios like "create a Hadoop cluster", "data lake cluster", "running out of resources", "check my cluster", "renew", etc. NOTE: This Skill does NOT support cluster deletion, release, or termination under any circumstances. Any request to delete or terminate a cluster will be refused and redirected to the EMR console.
Use when contributing to the GLIDE client - Rust core internals, language bindings (PyO3/JNI/NAPI/CGO/FFI), protocol layer, PubSub synchronizer, cluster topology, and build system. For using GLIDE in apps, see valkey-glide instead.
OpenSearch development best practices for indexing, querying, search optimization, vector search, and cluster management
Expert knowledge for Azure Service Fabric development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when building Service Fabric clusters, Reliable Actors/Collections, reverse proxy, remoting, or Azure-integrated apps, and other Azure Service Fabric related development tasks. Not for Azure Kubernetes Service (AKS) (use azure-kubernetes-service), Azure App Service (use azure-app-service), Azure Container Apps (use azure-container-apps), Azure Red Hat OpenShift (use azure-redhat-openshift).
Remote command execution and file transfer on SageMaker HyperPod cluster nodes via AWS Systems Manager (SSM). This is the primary interface for accessing HyperPod nodes — direct SSH is not available. Use when any skill, workflow, or user request needs to execute commands on cluster nodes, upload files to nodes, read/download files from nodes, run diagnostics, install packages, or perform any operation requiring shell access to HyperPod instances. Other HyperPod skills depend on this skill for all node-level operations.
Check and compare software component versions on SageMaker HyperPod cluster nodes - NVIDIA drivers, CUDA toolkit, cuDNN, NCCL, EFA, AWS OFI NCCL, GDRCopy, MPI, Neuron SDK (Trainium/Inferentia), Python, and PyTorch. Use when checking component versions, verifying CUDA/driver compatibility, detecting version mismatches across nodes, planning upgrades, documenting cluster configuration, or troubleshooting version-related issues on HyperPod. Triggers on requests about versions, compatibility, component checks, or upgrade planning for HyperPod clusters.
GitOps — el estado del clúster Kubernetes refleja siempre el estado del repositorio Git
Complete ArgoCD cluster bootstrapping skill for multi-repository GitOps environments. Use when provisioning new Kubernetes clusters, registering clusters with ArgoCD, configuring ApplicationSets, setting up cluster secrets, or troubleshooting cluster connectivity issues.