kubernetes

Kubernetes & OpenShift Cluster Management

Comprehensive skill for Kubernetes and OpenShift clusters covering operations, troubleshooting, manifests, security, and GitOps.

Current Versions (January 2026)

Key Tools

| Tool | Version | Purpose |
|------|---------|---------|
| ArgoCD | v2.13.x | GitOps deployments |
| Flux | v2.4.x | GitOps toolkit |
| Kustomize | v5.5.x | Manifest customization |
| Helm | v3.16.x | Package management |
| Velero | 1.15.x | Backup/restore |
| Trivy | 0.58.x | Security scanning |
| Kyverno | 1.13.x | Policy engine |

Command Convention

IMPORTANT: Use `kubectl` for standard Kubernetes. Use `oc` for OpenShift/ARO.
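
Scripts that must run against both kinds of cluster can pick the CLI at runtime. The sketch below assumes that a cluster serving the `route.openshift.io` API group is OpenShift; `pick_cli` is a hypothetical helper name, not part of either CLI.

```shell
# pick_cli: read `kubectl api-versions` output on stdin and print the CLI to use.
# Assumption: a cluster serving the route.openshift.io API group is OpenShift.
pick_cli() {
  if grep -q '^route\.openshift\.io/'; then
    echo oc
  else
    echo kubectl
  fi
}

# Usage (needs cluster access):
#   CLI=$(kubectl api-versions | pick_cli)
#   "$CLI" get nodes -o wide
```

Detecting the API group avoids relying on whether the `oc` binary happens to be installed on the workstation.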

1. CLUSTER OPERATIONS

Node Management

```bash
# View nodes
kubectl get nodes -o wide

# Drain node for maintenance
kubectl drain ${NODE} --ignore-daemonsets --delete-emptydir-data --grace-period=60

# Uncordon after maintenance
kubectl uncordon ${NODE}

# View node resources
kubectl top nodes
```

Cluster Upgrades

AKS:

```bash
az aks get-upgrades -g ${RG} -n ${CLUSTER} -o table
az aks upgrade -g ${RG} -n ${CLUSTER} --kubernetes-version ${VERSION}
```

EKS:

```bash
aws eks update-cluster-version --name ${CLUSTER} --kubernetes-version ${VERSION}
```

GKE:

```bash
gcloud container clusters upgrade ${CLUSTER} --master --cluster-version ${VERSION}
```

OpenShift:

```bash
oc adm upgrade --to=${VERSION}
oc get clusterversion
```
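
Kubernetes control planes are upgraded one minor version at a time, and managed services generally reject a target that skips a minor. A small pre-check, sketched below under that assumption, can catch this before an upgrade command is issued. `minor_of` and `can_upgrade` are hypothetical helper names, and OpenShift is out of scope here because it follows its own update graph.

```shell
# minor_of: print the minor component of a version such as 1.30.2 or v1.30.2.
minor_of() {
  echo "${1#v}" | cut -d. -f2
}

# can_upgrade CURRENT TARGET: succeed when TARGET has the same major version and
# is either the same minor or the next one. Downgrades and skipped minors fail.
can_upgrade() {
  cur_major=$(echo "${1#v}" | cut -d. -f1)
  tgt_major=$(echo "${2#v}" | cut -d. -f1)
  [ "$cur_major" = "$tgt_major" ] || return 1
  skew=$(( $(minor_of "$2") - $(minor_of "$1") ))
  [ "$skew" -ge 0 ] && [ "$skew" -le 1 ]
}

# Usage:
#   can_upgrade "$CURRENT" "$VERSION" || echo "upgrade in steps of one minor version"
```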

Backup with Velero

```bash
# Install Velero (--plugins names the provider plugin image,
# for example velero/velero-plugin-for-aws:<tag>)
velero install --provider ${PROVIDER} --plugins ${PLUGIN_IMAGE} --bucket ${BUCKET} --secret-file ${CREDS}

# Create backup
velero backup create ${BACKUP_NAME} --include-namespaces ${NS}

# Restore
velero restore create --from-backup ${BACKUP_NAME}
```

---

2. TROUBLESHOOTING

Health Assessment

Run the bundled script for a comprehensive health check:

```bash
bash scripts/cluster-health-check.sh
```

Pod Status Interpretation

| Status | Meaning | Action |
|--------|---------|--------|
| `Pending` | Scheduling issue | Check resources, nodeSelector, tolerations |
| `CrashLoopBackOff` | Container crashing | Check logs: `kubectl logs ${POD} --previous` |
| `ImagePullBackOff` | Image unavailable | Verify image name, registry access |
| `OOMKilled` | Out of memory | Increase memory limits |
| `Evicted` | Node pressure | Check node resources |
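
The same mapping can be carried in a script so that a triage tool prints the first step next to each failing pod. This is a sketch only: `suggest_action` is a hypothetical helper, `ErrImagePull` is added here as the status that precedes `ImagePullBackOff`, and the fallback message is an assumption rather than part of the table.

```shell
# suggest_action STATUS: print a first troubleshooting step for a pod status.
# The mapping mirrors the table above; unknown statuses fall back to `describe`.
suggest_action() {
  case "$1" in
    Pending)          echo "Check resources, nodeSelector, tolerations" ;;
    CrashLoopBackOff) echo "Check logs: kubectl logs \${POD} --previous" ;;
    ImagePullBackOff|ErrImagePull) echo "Verify image name, registry access" ;;
    OOMKilled)        echo "Increase memory limits" ;;
    Evicted)          echo "Check node resources" ;;
    *)                echo "Inspect events: kubectl describe pod \${POD}" ;;
  esac
}
```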

Debugging Commands

```bash
# Pod logs (current and previous)
kubectl logs ${POD} -c ${CONTAINER}
kubectl logs ${POD} -c ${CONTAINER} --previous

# Multi-pod logs with stern
stern ${LABEL_SELECTOR} -n ${NS}

# Exec into pod
kubectl exec -it ${POD} -- /bin/sh

# Pod events
kubectl describe pod ${POD} | grep -A 20 Events

# Cluster events (sorted by time)
kubectl get events -A --sort-by='.lastTimestamp' | tail -50
```
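
Before running these commands pod by pod, it can help to list only the pods that need attention. The filter below is a minimal sketch that assumes the default column layout of `kubectl get pods -A --no-headers`; `unhealthy_pods` is a hypothetical helper name.

```shell
# unhealthy_pods: filter `kubectl get pods -A --no-headers` output (stdin) down to
# pods whose STATUS column is neither Running nor Completed.
# With -A the columns are NAMESPACE NAME READY STATUS RESTARTS AGE, so STATUS is $4.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}

# Usage (needs cluster access):
#   kubectl get pods -A --no-headers | unhealthy_pods
```

A pod that is `Running` but not ready (for example `0/1`) passes this filter, so the READY column still deserves a look.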

Network Troubleshooting

```bash
# Test DNS (--restart=Never runs a one-shot pod that --rm can clean up)
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default

# Test service connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl -v http://${SVC}.${NS}:${PORT}

# Check endpoints
kubectl get endpoints ${SVC}
```
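
An empty endpoints list usually means the Service selector matches no ready pods, which is a different problem from a blocked network path. The sketch below turns that check into a hint; `count_endpoints` and `diagnose_service` are hypothetical helpers, and the hint wording is an assumption.

```shell
# count_endpoints: count the whitespace-separated IPs given as arguments.
# Intended input:
#   kubectl get endpoints ${SVC} -o jsonpath='{.subsets[*].addresses[*].ip}'
count_endpoints() {
  # Unquoted on purpose: word splitting turns "10.0.0.1 10.0.0.2" into two arguments.
  set -- $*
  echo "$#"
}

# diagnose_service COUNT: print a hint based on the number of ready endpoints.
diagnose_service() {
  if [ "$1" -eq 0 ]; then
    echo "no ready endpoints: check the Service selector and pod readiness"
  else
    echo "$1 ready endpoint(s): the Service has backends, look at network policy or the app"
  fi
}
```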

---

3. MANIFEST GENERATION

Production Deployment Template

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
  labels:
    app.kubernetes.io/name: ${APP_NAME}
    app.kubernetes.io/version: "${VERSION}"
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${APP_NAME}
    spec:
      serviceAccountName: ${APP_NAME}
      securityContext:
        runAsNonRoot: true
        # On OpenShift, omit runAsUser and fsGroup: the restricted-v2 SCC assigns
        # them from the namespace's UID range and rejects fixed values outside it.
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ${APP_NAME}
          image: ${IMAGE}:${TAG}
          ports:
            - name: http
              containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          # readOnlyRootFilesystem is set, so writable paths need their own volume.
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: ${APP_NAME}
                topologyKey: kubernetes.io/hostname
```
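
The `${...}` placeholders in the template have to be filled in before the manifest can be applied. A minimal sketch using `sed` follows; `render` is a hypothetical helper, and `deployment.yaml` in the usage comment is an assumed file name. `envsubst` from gettext is a common alternative where it is installed.

```shell
# render: substitute ${APP_NAME}, ${NAMESPACE}, ${VERSION}, ${IMAGE} and ${TAG}
# in a template read from stdin. Each variable must be set, otherwise the shell
# aborts with the message after ":?". Values must not contain "|" or "&",
# which sed would interpret.
render() {
  sed -e "s|\${APP_NAME}|${APP_NAME:?set APP_NAME}|g" \
      -e "s|\${NAMESPACE}|${NAMESPACE:?set NAMESPACE}|g" \
      -e "s|\${VERSION}|${VERSION:?set VERSION}|g" \
      -e "s|\${IMAGE}|${IMAGE:?set IMAGE}|g" \
      -e "s|\${TAG}|${TAG:?set TAG}|g"
}

# Usage (needs cluster access); the server-side dry run validates the rendered
# manifest without creating anything:
#   render < deployment.yaml | kubectl apply --dry-run=server -f -
```

Substituting only named variables leaves any other `$` in the manifest untouched, which matters once container args or scripts appear in the template.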

Service & Ingress

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ${APP_NAME}
spec:
  selector:
    app.kubernetes.io/name: ${APP_NAME}
  ports:
    - name: http
      port: 80
      targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${APP_NAME}
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ${HOST}
      secretName: ${APP_NAME}-tls
  rules:
    - host: ${HOST}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${APP_NAME}
                port:
                  name: http
```

OpenShift Route

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: ${APP_NAME}
spec:
  to:
    kind: Service
    name: ${APP_NAME}
  port:
    targetPort: http
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
```

Use the bundled script for manifest generation:

```bash
bash scripts/generate-manifest.sh deployment myapp production
```

4. SECURITY

Security Audit

Run the bundled script:

```bash
bash scripts/security-audit.sh [namespace]
```

Pod Security Standards

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ${NAMESPACE}
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/warn: restricted
```
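
For a namespace that already runs workloads, the labels can be trialled with a server-side dry run before they are enforced. The helper below is a sketch: `pss_label_args` is a hypothetical name, and unlike the manifest above it sets all three modes to the same level for simplicity.

```shell
# pss_label_args LEVEL: print the three Pod Security Admission label assignments
# for LEVEL, or fail if LEVEL is not one of the standard profiles.
pss_label_args() {
  case "$1" in
    privileged|baseline|restricted) ;;
    *) echo "unknown level: $1" >&2; return 1 ;;
  esac
  printf 'pod-security.kubernetes.io/enforce=%s pod-security.kubernetes.io/audit=%s pod-security.kubernetes.io/warn=%s\n' "$1" "$1" "$1"
}

# Usage (needs cluster access); the server-side dry run reports pods that would
# violate the new level without changing the namespace:
#   kubectl label --dry-run=server --overwrite ns "${NAMESPACE}" $(pss_label_args restricted)
```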

NetworkPolicy (Zero Trust)

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ${APP_NAME}-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: database
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS (UDP, plus TCP for large responses and retries)
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```
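
The policy above restricts only the pods it selects; other pods in the namespace stay open unless a default-deny policy covers them. A minimal sketch of such a baseline follows. `default_deny` is a hypothetical helper and `default-deny-all` is an assumed policy name.

```shell
# default_deny NAMESPACE: print a NetworkPolicy that selects every pod in the
# namespace and allows no ingress or egress. Per-application policies then add
# the specific flows each workload needs.
default_deny() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ${1:?namespace is required}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Usage (needs cluster access):
#   default_deny production | kubectl apply -f -
```

Because network policies are additive, applying the baseline first and the allow rules afterwards never widens access beyond what the allow rules state.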

RBAC Best Practices

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${APP_NAME}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${APP_NAME}-role
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${APP_NAME}-binding
subjects:
  - kind: ServiceAccount
    name: ${APP_NAME}
    # ServiceAccount subjects must name their namespace.
    namespace: ${NAMESPACE}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${APP_NAME}-role
```
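
Least-privilege bindings are easier to trust once they have been checked from the ServiceAccount's point of view. The sketch below builds the user name that Kubernetes assigns to a ServiceAccount so it can be passed to `kubectl auth can-i --as`; `sa_subject` is a hypothetical helper.

```shell
# sa_subject NAMESPACE NAME: print the user name Kubernetes assigns to a ServiceAccount.
sa_subject() {
  echo "system:serviceaccount:${1:?namespace is required}:${2:?name is required}"
}

# Usage (needs cluster access and permission to impersonate):
#   kubectl auth can-i get configmaps -n "${NAMESPACE}" --as "$(sa_subject "${NAMESPACE}" "${APP_NAME}")"
#   kubectl auth can-i delete configmaps -n "${NAMESPACE}" --as "$(sa_subject "${NAMESPACE}" "${APP_NAME}")"
```

With only the Role above bound, the intended answers are `yes` for the first check and `no` for the second; a different result points to another binding in the namespace or cluster.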

Image Scanning

```bash
# Scan image with Trivy
trivy image ${IMAGE}:${TAG}

# Scan with severity filter
trivy image --severity HIGH,CRITICAL ${IMAGE}:${TAG}

# Generate SBOM
trivy image --format spdx-json -o sbom.json ${IMAGE}:${TAG}
```
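
By default a scan exits successfully even when it finds vulnerabilities, so a pipeline needs `--exit-code` to act on the result. The wrapper below is a sketch of such a gate; `scan_gate` and the `TRIVY` override variable are hypothetical, and ignoring unfixed findings is an assumed policy choice.

```shell
# scan_gate IMAGE: fail when the scanner reports HIGH or CRITICAL findings that
# have a fix available. TRIVY can be overridden, for example to point at a wrapper.
scan_gate() {
  "${TRIVY:-trivy}" image --exit-code 1 --severity HIGH,CRITICAL --ignore-unfixed "$1"
}

# Usage:
#   scan_gate "${IMAGE}:${TAG}" || { echo "blocking deploy: vulnerable image" >&2; exit 1; }
```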

---

5. GITOPS

ArgoCD Application

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${APP_NAME}
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: ${GIT_REPO}
    targetRevision: main
    path: k8s/overlays/${ENV}
  destination:
    server: https://kubernetes.default.svc
    namespace: ${NAMESPACE}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

Kustomize Structure

```
k8s/
├── base/
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   └── service.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml
    ├── staging/
    │   └── kustomization.yaml
    └── prod/
        └── kustomization.yaml
```

base/kustomization.yaml:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
```

overlays/prod/kustomization.yaml:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
namespace: production
replicas:
  - name: myapp
    count: 5
images:
  - name: myregistry/myapp
    newTag: v1.2.3
```

GitHub Actions CI/CD

```yaml
name: Build and Deploy
on:
  push:
    branches: [main]

# The final step pushes a commit, so the job token needs write access.
permissions:
  contents: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Registry login (for example docker/login-action) is omitted here and is
      # required before the push step in a real pipeline.
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}

      - name: Update Kustomize image
        run: |
          cd k8s/overlays/prod
          # The name before "=" must match the image name used in the manifests
          # (myregistry/myapp in the prod overlay).
          kustomize edit set image myregistry/myapp=${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}

      - name: Commit and push
        run: |
          git config user.name "github-actions"
          git config user.email "github-actions@github.com"
          git add .
          git commit -m "Update image to ${{ github.sha }}"
          git push
```

Use the bundled script for ArgoCD sync:

```bash
bash scripts/argocd-app-sync.sh ${APP_NAME} --prune
```

Helper Scripts

This skill includes automation scripts in the `scripts/` directory:

| Script | Purpose |
|--------|---------|
| `cluster-health-check.sh` | Comprehensive cluster health assessment with scoring |
| `security-audit.sh` | Security posture audit (privileged, root, RBAC, NetworkPolicy) |
| `node-maintenance.sh` | Safe node drain and maintenance prep |
| `pre-upgrade-check.sh` | Pre-upgrade validation checklist |
| `generate-manifest.sh` | Generate production-ready K8s manifests |
| `argocd-app-sync.sh` | ArgoCD application sync helper |

Run any script:

```bash
bash scripts/<script-name>.sh [arguments]
```