kubernetes
Kubernetes & OpenShift Cluster Management
Comprehensive skill for Kubernetes and OpenShift clusters covering operations, troubleshooting, manifests, security, and GitOps.
Current Versions (January 2026)
| Platform | Version | Documentation |
|---|---|---|
| Kubernetes | 1.31.x | https://kubernetes.io/docs/ |
| OpenShift | 4.17.x | https://docs.openshift.com/ |
| EKS | 1.31 | https://docs.aws.amazon.com/eks/ |
| AKS | 1.31 | https://learn.microsoft.com/azure/aks/ |
| GKE | 1.31 | https://cloud.google.com/kubernetes-engine/docs |
Key Tools
| Tool | Version | Purpose |
|---|---|---|
| ArgoCD | v2.13.x | GitOps deployments |
| Flux | v2.4.x | GitOps toolkit |
| Kustomize | v5.5.x | Manifest customization |
| Helm | v3.16.x | Package management |
| Velero | 1.15.x | Backup/restore |
| Trivy | 0.58.x | Security scanning |
| Kyverno | 1.13.x | Policy engine |
Command Convention
IMPORTANT: Use `kubectl` for standard Kubernetes. Use `oc` for OpenShift/ARO.
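The convention above can be wrapped in a small helper so that later commands do not hard-code the CLI. This is a minimal sketch: `select_cli` and the `PLATFORM` variable are hypothetical names introduced for illustration, not something the skill defines.

```bash
# Sketch: choose the CLI by platform.
# select_cli and PLATFORM are illustrative names assumed for this example.
select_cli() {
  case "${1:-kubernetes}" in
    openshift|aro) echo "oc" ;;       # OpenShift / ARO clusters
    *)             echo "kubectl" ;;  # standard Kubernetes (EKS, AKS, GKE, ...)
  esac
}

CLI="$(select_cli "${PLATFORM:-kubernetes}")"
echo "Using ${CLI}"
```

With `CLI` set, commands in the rest of this document could be written as `${CLI} get nodes -o wide`.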
1. CLUSTER OPERATIONS
Node Management
```bash
# View nodes
kubectl get nodes -o wide

# Drain node for maintenance
kubectl drain ${NODE} --ignore-daemonsets --delete-emptydir-data --grace-period=60

# Uncordon after maintenance
kubectl uncordon ${NODE}

# View node resources
kubectl top nodes
```
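Before draining another node, it can help to confirm that no node is already cordoned or not ready. The sketch below filters the output of `kubectl get nodes --no-headers`, assuming the default column layout (NAME, STATUS, ROLES, AGE, VERSION). The function name `nodes_needing_attention` is hypothetical.

```bash
# Sketch: list nodes whose STATUS is anything other than plain "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Assumes the default columns: NAME STATUS ROLES AGE VERSION.
nodes_needing_attention() {
  awk '$2 != "Ready" { print $1 }'
}
```

A possible use is `kubectl get nodes --no-headers | nodes_needing_attention`, which would print cordoned nodes (`Ready,SchedulingDisabled`) as well as `NotReady` ones.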
Cluster Upgrades
AKS:
```bash
az aks get-upgrades -g ${RG} -n ${CLUSTER} -o table
az aks upgrade -g ${RG} -n ${CLUSTER} --kubernetes-version ${VERSION}
```
EKS:
```bash
aws eks update-cluster-version --name ${CLUSTER} --kubernetes-version ${VERSION}
```
GKE:
```bash
gcloud container clusters upgrade ${CLUSTER} --master --cluster-version ${VERSION}
```
OpenShift:
```bash
oc adm upgrade --to=${VERSION}
oc get clusterversion
```
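Kubernetes control planes are upgraded one minor version at a time, so a quick check on `${VERSION}` before running any of the commands above can catch an attempt to skip a release. This is a rough sketch with hypothetical function names. It only compares version numbers and does not replace the provider's own upgrade-path checks (OpenShift in particular follows its own update graph).

```bash
# Sketch: refuse upgrades that skip a minor version or go backwards.
# Function names are illustrative.
version_part() {
  # Print field $2 (1 = major, 2 = minor) of a version like "1.31.2" or "v1.31.2".
  echo "${1#v}" | cut -d. -f"$2"
}

is_single_minor_step() {
  current="$1"
  target="$2"
  # Major versions must match.
  [ "$(version_part "$current" 1)" = "$(version_part "$target" 1)" ] || return 1
  # Target minor may be equal (patch upgrade) or exactly one higher.
  step=$(( $(version_part "$target" 2) - $(version_part "$current" 2) ))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}
```

For example, `is_single_minor_step 1.30.4 1.31.2 && az aks upgrade ...` would proceed, while a jump from 1.29 to 1.31 would be rejected.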
Backup with Velero
```bash
# Install Velero
velero install --provider ${PROVIDER} --bucket ${BUCKET} --secret-file ${CREDS}

# Create backup
velero backup create ${BACKUP_NAME} --include-namespaces ${NS}

# Restore
velero restore create --from-backup ${BACKUP_NAME}
```
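Backup names must be unique, so one common approach is to derive `${BACKUP_NAME}` from the namespace and a UTC timestamp. The helper below is a sketch; the name `backup_name` and the naming scheme are assumptions, not a Velero convention.

```bash
# Sketch: build a unique, lowercase backup name such as "prod-20260115-093000".
# backup_name is an illustrative helper; the naming scheme is an assumption.
backup_name() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%d-%H%M%S)"
}
```

It could be used as `velero backup create "$(backup_name "${NS}")" --include-namespaces "${NS}"`.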
---
2. TROUBLESHOOTING
Health Assessment
Run the bundled script for a comprehensive health check:
```bash
bash scripts/cluster-health-check.sh
```
Pod Status Interpretation
| Status | Meaning | Action |
|---|---|---|
| `Pending` | Scheduling issue | Check resources, nodeSelector, tolerations |
| `CrashLoopBackOff` | Container crashing | Check logs (`kubectl logs --previous`) |
| `ImagePullBackOff` | Image unavailable | Verify image name, registry access |
| `OOMKilled` | Out of memory | Increase memory limits |
| `Evicted` | Node pressure | Check node resources |
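The status-to-action mapping above lends itself to a small lookup for use in triage scripts. The sketch below assumes the standard status strings shown by `kubectl get pods` (`Pending`, `CrashLoopBackOff`, `ImagePullBackOff`, `OOMKilled`, `Evicted`). The function name `status_hint` is hypothetical.

```bash
# Sketch: map a pod status string to the suggested first action.
# status_hint is an illustrative name; status strings are the usual kubectl ones.
status_hint() {
  case "$1" in
    Pending)                       echo "Check resources, nodeSelector, tolerations" ;;
    CrashLoopBackOff)              echo "Check logs of the previous container" ;;
    ImagePullBackOff|ErrImagePull) echo "Verify image name, registry access" ;;
    OOMKilled)                     echo "Increase memory limits" ;;
    Evicted)                       echo "Check node resources" ;;
    *)                             echo "No hint for status: $1" ;;
  esac
}
```

For instance, `status_hint OOMKilled` prints the memory-limit suggestion, and unknown statuses fall through to a neutral message.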
Debugging Commands
```bash
# Pod logs (current, then previous container instance)
kubectl logs ${POD} -c ${CONTAINER}
kubectl logs ${POD} -c ${CONTAINER} --previous

# Multi-pod logs with stern
stern ${LABEL_SELECTOR} -n ${NS}

# Exec into pod
kubectl exec -it ${POD} -- /bin/sh

# Pod events
kubectl describe pod ${POD} | grep -A 20 Events

# Cluster events (sorted by time)
kubectl get events -A --sort-by='.lastTimestamp' | tail -50
```
Network Troubleshooting
```bash
# Test DNS
kubectl run -it --rm debug --image=busybox -- nslookup kubernetes.default

# Test service connectivity
kubectl run -it --rm debug --image=curlimages/curl -- curl -v http://${SVC}.${NS}:${PORT}

# Check endpoints
kubectl get endpoints ${SVC}
```
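When the short `${SVC}.${NS}` name fails to resolve, testing the fully qualified service name helps separate DNS search-path problems from connectivity problems. The sketch below builds that name. `service_url` and `CLUSTER_DOMAIN` are hypothetical names, and `cluster.local` is only the common default cluster domain.

```bash
# Sketch: build the fully qualified in-cluster URL for a Service.
# $1 = service, $2 = namespace, $3 = port.
# CLUSTER_DOMAIN is an assumed variable; cluster.local is the usual default.
service_url() {
  printf 'http://%s.%s.svc.%s:%s\n' "$1" "$2" "${CLUSTER_DOMAIN:-cluster.local}" "$3"
}
```

The result can be passed to the curl test above, for example `curl -v "$(service_url "${SVC}" "${NS}" "${PORT}")"`.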
---
3. MANIFEST GENERATION
Production Deployment Template
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
  labels:
    app.kubernetes.io/name: ${APP_NAME}
    app.kubernetes.io/version: "${VERSION}"
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${APP_NAME}
    spec:
      serviceAccountName: ${APP_NAME}
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ${APP_NAME}
          image: ${IMAGE}:${TAG}
          ports:
            - name: http
              containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: ${APP_NAME}
                topologyKey: kubernetes.io/hostname
```
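The template relies on `${APP_NAME}`, `${NAMESPACE}`, `${VERSION}`, `${IMAGE}` and `${TAG}` being substituted before it is applied, and an unset variable would silently render an empty value. The sketch below checks that the variables are set first. `require_vars` is a hypothetical helper, and how the template is rendered afterwards is left open.

```bash
# Sketch: fail early if any template variable is unset or empty.
# require_vars is an illustrative helper, not part of the skill.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible flow, assuming `envsubst` (from gettext) is installed, the variables are exported, and the template is saved as `deployment.yaml`: `require_vars APP_NAME NAMESPACE VERSION IMAGE TAG && envsubst < deployment.yaml | kubectl apply -f -`.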
Service & Ingress
```yaml
apiVersion: v1
kind: Service
metadata:
  name: ${APP_NAME}
spec:
  selector:
    app.kubernetes.io/name: ${APP_NAME}
  ports:
    - name: http
      port: 80
      targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${APP_NAME}
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ${HOST}
      secretName: ${APP_NAME}-tls
  rules:
    - host: ${HOST}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${APP_NAME}
                port:
                  name: http
```
OpenShift Route
```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: ${APP_NAME}
spec:
  to:
    kind: Service
    name: ${APP_NAME}
  port:
    targetPort: http
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
```
Use the bundled script for manifest generation:
```bash
bash scripts/generate-manifest.sh deployment myapp production
```
4. SECURITY
Security Audit
Run the bundled script:
```bash
bash scripts/security-audit.sh [namespace]
```
Pod Security Standards
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ${NAMESPACE}
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/warn: restricted
```
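Before enforcing a stricter level on an existing namespace, the same labels can be tried with a server-side dry run, which reports workloads that would violate the policy. The sketch below only builds and validates the label string. `pss_label` is a hypothetical helper name.

```bash
# Sketch: build a Pod Security Admission label, validating mode and level.
# $1 = mode (enforce|audit|warn), $2 = level (privileged|baseline|restricted).
# pss_label is an illustrative name.
pss_label() {
  case "$1" in
    enforce|audit|warn) ;;
    *) echo "invalid mode: $1" >&2; return 1 ;;
  esac
  case "$2" in
    privileged|baseline|restricted) ;;
    *) echo "invalid level: $2" >&2; return 1 ;;
  esac
  printf 'pod-security.kubernetes.io/%s=%s\n' "$1" "$2"
}
```

A dry run could then look like `kubectl label --dry-run=server --overwrite ns "${NAMESPACE}" "$(pss_label enforce restricted)"`.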
NetworkPolicy (Zero Trust)
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ${APP_NAME}-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: database
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```
RBAC Best Practices
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${APP_NAME}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${APP_NAME}-role
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${APP_NAME}-binding
subjects:
  - kind: ServiceAccount
    name: ${APP_NAME}
    namespace: ${NAMESPACE}  # required for ServiceAccount subjects
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${APP_NAME}-role
```
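After applying the Role and RoleBinding, the granted permissions can be verified by impersonating the ServiceAccount with `kubectl auth can-i`. The sketch below builds the impersonation username, which follows the `system:serviceaccount:<namespace>:<name>` form. `sa_subject` is a hypothetical helper name.

```bash
# Sketch: build the username Kubernetes uses for a ServiceAccount.
# $1 = namespace, $2 = ServiceAccount name. sa_subject is an illustrative name.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}
```

For the Role above, `kubectl auth can-i get configmaps --as="$(sa_subject "${NAMESPACE}" "${APP_NAME}")" -n "${NAMESPACE}"` should answer `yes`, while the same check for `delete` should answer `no`.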
Image Scanning
```bash
# Scan image with Trivy
trivy image ${IMAGE}:${TAG}

# Scan with severity filter
trivy image --severity HIGH,CRITICAL ${IMAGE}:${TAG}

# Generate SBOM
trivy image --format spdx-json -o sbom.json ${IMAGE}:${TAG}
```
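In a pipeline, the severity filter is usually combined with Trivy's `--exit-code` flag so that findings fail the job. The wrapper below is a minimal sketch. `scan_gate` is a hypothetical name, and a non-zero exit can also mean that the scan itself failed, not only that vulnerabilities were found.

```bash
# Sketch: fail when HIGH or CRITICAL vulnerabilities are reported.
# scan_gate is an illustrative wrapper; $1 is the image reference.
scan_gate() {
  if trivy image --exit-code 1 --severity HIGH,CRITICAL "$1"; then
    echo "PASS: $1"
  else
    echo "FAIL: $1"
    return 1
  fi
}
```

It could be called as `scan_gate "${IMAGE}:${TAG}"` before pushing or deploying the image.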
---
5. GITOPS
ArgoCD Application
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${APP_NAME}
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: ${GIT_REPO}
    targetRevision: main
    path: k8s/overlays/${ENV}
  destination:
    server: https://kubernetes.default.svc
    namespace: ${NAMESPACE}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
Kustomize Structure
```
k8s/
├── base/
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   └── service.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml
    ├── staging/
    │   └── kustomization.yaml
    └── prod/
        └── kustomization.yaml
```
base/kustomization.yaml:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
```
overlays/prod/kustomization.yaml:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
namespace: production
replicas:
  - name: myapp
    count: 5
images:
  - name: myregistry/myapp
    newTag: v1.2.3
```
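With this layout, each environment is rendered from its own overlay directory. The sketch below maps an environment name to its overlay path and rejects names that have no directory in the tree above. `overlay_path` is a hypothetical helper name.

```bash
# Sketch: resolve the overlay directory for an environment.
# Only the three overlays from the tree above are accepted.
# overlay_path is an illustrative name.
overlay_path() {
  case "$1" in
    dev|staging|prod) printf 'k8s/overlays/%s\n' "$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

The rendered manifests can then be previewed with `kubectl kustomize "$(overlay_path prod)"` before anything is applied.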
GitHub Actions CI/CD
```yaml
name: Build and Deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
      - name: Update Kustomize image
        run: |
          cd k8s/overlays/prod
          kustomize edit set image myapp=${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
      - name: Commit and push
        run: |
          git config user.name "github-actions"
          git config user.email "github-actions@github.com"
          git add .
          git commit -m "Update image to ${{ github.sha }}"
          git push
```
Use the bundled script for ArgoCD sync:
```bash
bash scripts/argocd-app-sync.sh ${APP_NAME} --prune
```
Helper Scripts
This skill includes automation scripts in the `scripts/` directory:
| Script | Purpose |
|---|---|
| `cluster-health-check.sh` | Comprehensive cluster health assessment with scoring |
| `security-audit.sh` | Security posture audit (privileged, root, RBAC, NetworkPolicy) |
| | Safe node drain and maintenance prep |
| | Pre-upgrade validation checklist |
| `generate-manifest.sh` | Generate production-ready K8s manifests |
| `argocd-app-sync.sh` | ArgoCD application sync helper |
Run any script:
```bash
bash scripts/<script-name>.sh [arguments]
```