kubernetes-specialist

Kubernetes Specialist

When to Use This Skill

  • Deploying workloads (Deployments, StatefulSets, DaemonSets, Jobs)
  • Configuring networking (Services, Ingress, NetworkPolicies)
  • Managing configuration (ConfigMaps, Secrets, environment variables)
  • Setting up persistent storage (PV, PVC, StorageClasses)
  • Creating Helm charts for application packaging
  • Troubleshooting cluster and workload issues
  • Implementing security best practices

Core Workflow

  1. Analyze requirements — Understand workload characteristics, scaling needs, security requirements
  2. Design architecture — Choose workload types, networking patterns, storage solutions
  3. Implement manifests — Create declarative YAML with proper resource limits, health checks
  4. Secure — Apply RBAC, NetworkPolicies, Pod Security Standards, least privilege
  5. Validate — Run
    kubectl rollout status, kubectl get pods -w, and kubectl describe pod <name> to confirm health; roll back with kubectl rollout undo if needed

Reference Guide

Load detailed guidance based on context:

| Topic | Reference | Load When |
|-------|-----------|-----------|
| Workloads | references/workloads.md | Deployments, StatefulSets, DaemonSets, Jobs, CronJobs |
| Networking | references/networking.md | Services, Ingress, NetworkPolicies, DNS |
| Configuration | references/configuration.md | ConfigMaps, Secrets, environment variables |
| Storage | references/storage.md | PV, PVC, StorageClasses, CSI drivers |
| Helm Charts | references/helm-charts.md | Chart structure, values, templates, hooks, testing, repositories |
| Troubleshooting | references/troubleshooting.md | kubectl debug, logs, events, common issues |
| Custom Operators | references/custom-operators.md | CRD, Operator SDK, controller-runtime, reconciliation |
| Service Mesh | references/service-mesh.md | Istio, Linkerd, traffic management, mTLS, canary |
| GitOps | references/gitops.md | ArgoCD, Flux, progressive delivery, sealed secrets |
| Cost Optimization | references/cost-optimization.md | VPA, HPA tuning, spot instances, quotas, right-sizing |
| Multi-Cluster | references/multi-cluster.md | Cluster API, federation, cross-cluster networking, DR |

Constraints

MUST DO

  • Use declarative YAML manifests (avoid imperative kubectl commands)
  • Set resource requests and limits on all containers
  • Include liveness and readiness probes
  • Use secrets for sensitive data (never hardcode credentials)
  • Apply least privilege RBAC permissions
  • Implement NetworkPolicies for network segmentation
  • Use namespaces for logical isolation
  • Label resources consistently for organization
  • Document configuration decisions in annotations
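
Several of the rules above (namespaces, consistent labels, annotations that record decisions) can be sketched in a single Namespace manifest. This is only an illustration under stated assumptions: the `example.com/...` annotation key and its text are hypothetical, and the Pod Security level is set to enforce `baseline` while only warning on `restricted`, since whether a workload passes `restricted` depends on its full security context.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    # Kubernetes recommended label keys, applied consistently across resources
    app.kubernetes.io/part-of: my-app
    app.kubernetes.io/managed-by: kubectl
    # Pod Security Admission: reject pods below "baseline", warn below "restricted"
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
  annotations:
    # Hypothetical annotation key used to record a configuration decision
    example.com/decision-notes: "Isolated namespace for my-app; default-deny NetworkPolicy applied"
```

Reusing the same label keys on the Deployment, ServiceAccount, and NetworkPolicies keeps selectors such as `kubectl get all -l app.kubernetes.io/part-of=my-app` useful.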

MUST NOT DO

  • Deploy to production without resource limits
  • Store secrets in ConfigMaps or as literal environment variable values in manifests
  • Use default ServiceAccount for application pods
  • Allow unrestricted network access (default allow-all)
  • Run containers as root without justification
  • Skip health checks (liveness/readiness probes)
  • Use latest tag for production images
  • Expose unnecessary ports or services
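
One way to guard against the first item (workloads without resource limits) is a namespace-level LimitRange that supplies defaults for containers that omit them. The sketch below is an assumption about how a team might do this, not a requirement of this skill; the object name is hypothetical and the values simply mirror the Deployment example in this document.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits # hypothetical name
  namespace: my-namespace
spec:
  limits:
    - type: Container
      # Applied as requests when a container specifies none
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      # Applied as limits when a container specifies none
      default:
        cpu: "500m"
        memory: "512Mi"
```

Defaults are a safety net only; explicit requests and limits on each container remain the clearer choice.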

Common YAML Patterns

Deployment with resource limits, probes, and security context

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
    version: "1.2.3"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        version: "1.2.3"
    spec:
      serviceAccountName: my-app-sa # never use default SA
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: my-app
          image: my-registry/my-app:1.2.3 # never use latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          envFrom:
            - secretRef:
                name: my-app-secret # pull credentials from Secret, not ConfigMap
```
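
Because the container above sets `readOnlyRootFilesystem: true`, an application that writes temporary files will need a writable volume. The fragment below is a sketch of one common approach, assuming the app writes to `/tmp`; the volume name and size limit are illustrative. It shows only the fields to add to the Deployment and is not a complete manifest.

```yaml
# Fields to merge into the Deployment above (not a standalone manifest)
spec:
  template:
    spec:
      containers:
        - name: my-app
          volumeMounts:
            - name: tmp # illustrative volume name
              mountPath: /tmp # assumes the app writes scratch files here
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi # illustrative cap on scratch space
```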

Minimal RBAC (least privilege)

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"] # grant only what is needed
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-rolebinding
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-namespace
roleRef:
  kind: Role
  name: my-app-role
  apiGroup: rbac.authorization.k8s.io
```
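
If the application only needs one known ConfigMap, the Role can be narrowed further with `resourceNames`. The variant below is a sketch under that assumption; the ConfigMap name `my-app-config` is hypothetical. It grants `get` only, since name-based restrictions generally do not apply to `list` requests unless the request selects on `metadata.name`.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["my-app-config"] # hypothetical ConfigMap name
    verbs: ["get"] # read a single named object only
```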

NetworkPolicy (default-deny + explicit allow)

```yaml
# Deny all ingress and egress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# Allow only specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-my-app
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```
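
A default-deny policy that covers egress also blocks DNS lookups, so pods usually need an explicit DNS allowance alongside it. The policy below is a sketch, not part of the original pattern: it assumes the cluster DNS pods run in `kube-system` with the label `k8s-app: kube-dns` (the usual labeling for CoreDNS, but worth verifying) and that the namespace carries the automatic `kubernetes.io/metadata.name` label. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress # hypothetical name
  namespace: my-namespace
spec:
  podSelector: {} # applies to every pod in the namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        # namespaceSelector and podSelector in one entry are ANDed together
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns # assumed label on the cluster DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Clusters that use a node-local DNS cache or a differently labeled DNS deployment would need different selectors.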

Validation Commands

After deploying, verify health and security posture:

```bash
# Watch rollout complete
kubectl rollout status deployment/my-app -n my-namespace

# Watch pod status changes to catch crash loops or image pull errors
kubectl get pods -n my-namespace -w

# Inspect a specific pod for failures
kubectl describe pod <pod-name> -n my-namespace

# Check container logs
kubectl logs <pod-name> -n my-namespace --previous # use --previous for crashed containers

# Verify resource usage vs. limits (requires metrics-server)
kubectl top pods -n my-namespace

# Audit RBAC permissions for a service account in its namespace
kubectl auth can-i --list --as=system:serviceaccount:my-namespace:my-app-sa -n my-namespace

# Roll back a failed deployment
kubectl rollout undo deployment/my-app -n my-namespace
```

Output Templates

When implementing Kubernetes resources, provide:
  1. Complete YAML manifests with proper structure
  2. RBAC configuration if needed (ServiceAccount, Role, RoleBinding)
  3. NetworkPolicy for network isolation
  4. Brief explanation of design decisions and security considerations