kubernetes-operations

Kubernetes Operations

Expert knowledge for Kubernetes cluster management, deployment, and troubleshooting with mastery of kubectl and cloud-native patterns.

Core Expertise

Kubernetes Operations
  • Workload Management: Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs
  • Networking: Services, Ingress, NetworkPolicies, and DNS configuration
  • Configuration & Storage: ConfigMaps, Secrets, PersistentVolumes, and PersistentVolumeClaims
  • Troubleshooting: Debugging pods, analyzing logs, and inspecting cluster events

Cluster Operations Process

  1. Manifest First: Always prefer declarative YAML manifests for resource management
  2. Validate & Dry-Run: Use `kubectl apply --dry-run=client` to validate changes
  3. Inspect & Verify: After applying changes, verify with `kubectl get`, `kubectl describe`, and `kubectl logs`
  4. Monitor Health: Continuously check status of nodes, pods, and services
  5. Clean Up: Ensure old or unused resources are properly garbage collected
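
Steps 1 to 3 above can be chained into a single helper. This is a minimal sketch, not a prescribed workflow: the function name `apply_checked` and the `KUBE_CONTEXT` variable are hypothetical, chosen only for illustration.

```bash
# Hypothetical helper: validate, apply, then verify a manifest.
# KUBE_CONTEXT is an assumed variable name, not a kubectl built-in.
apply_checked() {
  local manifest="$1"
  # Aborts with an error message if KUBE_CONTEXT is unset or empty.
  local ctx="${KUBE_CONTEXT:?set KUBE_CONTEXT to the target context}"

  # Validate: client-side dry run; stop if the manifest is rejected.
  kubectl --context="$ctx" apply --dry-run=client -f "$manifest" || return 1

  # Apply the declarative manifest.
  kubectl --context="$ctx" apply -f "$manifest" || return 1

  # Verify: show the objects named in the manifest as the cluster reports them.
  kubectl --context="$ctx" get -f "$manifest" -o wide
}
```

Usage might look like `KUBE_CONTEXT=staging-cluster; apply_checked deployment.yaml`. A client-side dry run only checks the manifest locally; `--dry-run=server` asks the API server to validate it without persisting anything.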

Essential Commands

```bash
# Resource management
kubectl apply -f manifest.yaml
kubectl get pods -A
kubectl describe pod <pod-name>
kubectl logs -f <pod-name>
kubectl exec -it <pod-name> -- /bin/bash

# Debugging
kubectl get events --sort-by='.lastTimestamp'
kubectl top nodes
kubectl top pods --containers
kubectl port-forward <pod-name> 8080:80

# Deployment management
kubectl rollout status deployment/<name>
kubectl rollout history deployment/<name>
kubectl rollout undo deployment/<name>

# Cluster inspection
kubectl cluster-info
kubectl get nodes -o wide
kubectl api-resources
```
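
The rollout commands are often combined so that a failed rollout is reverted automatically. The sketch below shows one way to do that; the function name, argument order, and the 120s default timeout are assumptions for illustration.

```bash
# Hypothetical helper: wait for a Deployment rollout and undo it on failure.
# Arguments: <context> <namespace> <deployment-name> [timeout]
rollout_or_undo() {
  local ctx="$1" ns="$2" name="$3" timeout="${4:-120s}"

  # "rollout status" exits non-zero if the rollout fails or the timeout expires.
  if kubectl --context="$ctx" -n "$ns" rollout status "deployment/$name" --timeout="$timeout"; then
    echo "rollout of $name completed"
  else
    echo "rollout of $name failed; rolling back" >&2
    kubectl --context="$ctx" -n "$ns" rollout undo "deployment/$name"
    return 1
  fi
}
```

For example, `rollout_or_undo staging-cluster web api 60s` would wait up to 60 seconds before rolling back.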

Key Debugging Patterns

**Pod Debugging**
```bash
# Pod inspection
kubectl describe pod <pod-name>
kubectl get pod <pod-name> -o yaml
kubectl logs <pod-name> --previous

# Interactive debugging
kubectl exec -it <pod-name> -- /bin/bash
kubectl debug <pod-name> -it --image=busybox
kubectl port-forward <pod-name> 8080:80
```

**Networking Troubleshooting**
```bash
# Service debugging
kubectl get svc -o wide
kubectl get endpoints
kubectl describe svc <service>

# Network connectivity
kubectl run test-pod --image=busybox -it --rm -- sh
# Inside pod: nslookup, wget, nc commands
```
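
For a non-interactive variant of the connectivity test, a throwaway pod can run a single lookup and exit. This is a sketch under assumptions: the function name is hypothetical, and it assumes the default cluster domain `cluster.local`.

```bash
# Hypothetical helper: resolve a Service name from inside the cluster using a
# one-shot busybox pod. Assumes the cluster domain is "cluster.local".
# Arguments: <context> <namespace> <service>
svc_dns_check() {
  local ctx="$1" ns="$2" svc="$3"
  # --rm deletes the pod when the command exits; it requires an attached
  # session, hence -i. The shell PID ($$) keeps the pod name unique per shell.
  kubectl --context="$ctx" -n "$ns" run "dns-check-$$" \
    --image=busybox --restart=Never --rm -i \
    -- nslookup "$svc.$ns.svc.cluster.local"
}
```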


**Common Issues**
```bash
# CrashLoopBackOff debugging
kubectl logs <pod> --previous
kubectl describe pod <pod>
kubectl get events --field-selector involvedObject.name=<pod>

# Resource constraints
kubectl top pod <pod>
kubectl describe pod <pod> | grep -A 5 Limits

# State inspection: list resources, then show one in full
kubectl get all -n <namespace>
kubectl get <resource> <name> -o yaml
```

Best Practices

**Context Safety (CRITICAL)**
- Always specify `--context` explicitly in every kubectl command
- Never rely on the current context; it may have been changed by another process
- Use the `kubectl --context=<context-name> get pods` format for all operations
- This prevents accidental operations on the wrong cluster (e.g., running production commands against staging)

```bash
# CORRECT: Explicit context
kubectl --context=gke_myproject_us-central1_prod get pods
kubectl --context=staging-cluster apply -f deployment.yaml

# WRONG: Relying on current context
kubectl get pods # Which cluster is this targeting?
```
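
One way to enforce the explicit-context rule in scripts is a thin wrapper that refuses to run without a named context. The sketch below is illustrative only; `kctx` and `KUBE_CONTEXT` are hypothetical names, not part of kubectl.

```bash
# Hypothetical wrapper: refuse to run kubectl unless a context is named.
# KUBE_CONTEXT is an assumed variable name, not a kubectl built-in.
kctx() {
  if [ -z "${KUBE_CONTEXT:-}" ]; then
    echo "kctx: KUBE_CONTEXT is not set; refusing to use the current context" >&2
    return 1
  fi
  # Pass every remaining argument through to kubectl unchanged.
  kubectl --context="$KUBE_CONTEXT" "$@"
}
```

With `KUBE_CONTEXT=staging-cluster` set, `kctx get pods -n web` runs against that context; with it unset, the call fails before kubectl is invoked.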

**Resource Definitions**
- Use declarative YAML manifests
- Implement proper labels and selectors
- Define resource requests and limits
- Configure health checks (liveness/readiness probes)
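
As an illustration of these four points, the sketch below writes a small Deployment manifest with matching labels and selectors, resource requests and limits, and both probe types. All values are placeholders chosen for the example (the name `web`, the `nginx:1.27` image, port 80, and the resource figures), not recommendations from this guide.

```bash
# Writes an illustrative Deployment manifest to the given path
# (default: deployment.yaml). Every value below is a placeholder.
write_example_deployment() {
  cat > "${1:-deployment.yaml}" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web          # must match the pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:   # gates traffic until the container responds
            httpGet:
              path: /
              port: 80
            periodSeconds: 10
          livenessProbe:    # restarts the container if it stops responding
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 20
EOF
}
```

The generated file can then be checked with `kubectl --context=<context-name> apply --dry-run=client -f deployment.yaml` before applying it.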

**Security**
- Use NetworkPolicies to restrict traffic
- Implement RBAC for access control
- Store sensitive data in Secrets
- Run containers as non-root users
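
Two of these practices lend themselves to short examples: a default-deny ingress NetworkPolicy and an RBAC permission check. Both helpers below are sketches with hypothetical names. Note that a NetworkPolicy only takes effect if the cluster's network plugin enforces it.

```bash
# Hypothetical helper: write a NetworkPolicy that denies all ingress traffic
# to every pod in a namespace. Arguments: <namespace> [output-file]
write_default_deny() {
  local ns="$1" out="${2:-default-deny.yaml}"
  cat > "$out" <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ${ns}
spec:
  podSelector: {}      # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress          # no ingress rules are listed, so all ingress is denied
EOF
}

# Hypothetical helper: ask the API server whether an identity may do something.
# Arguments: <context> <namespace> <user> <verb> <resource>
can_i() {
  local ctx="$1" ns="$2" user="$3"; shift 3
  kubectl --context="$ctx" -n "$ns" auth can-i "$@" --as "$user"
}
```

For example, `can_i staging-cluster shop system:serviceaccount:shop:web list secrets` prints `yes` or `no` depending on the RBAC rules bound to that service account.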

**Monitoring**
- Configure proper logging and metrics
- Set up alerts for critical conditions
- Use health checks and readiness probes
- Monitor resource usage and quotas
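
A simple health check that fits into a monitoring script is listing nodes that are not Ready. This is a minimal sketch with a hypothetical function name; it parses the default `kubectl get nodes` columns, where the second column is the node status.

```bash
# Hypothetical helper: print the names of nodes whose STATUS column does not
# start with "Ready" (so "Ready,SchedulingDisabled" still counts as Ready).
# Argument: <context>
not_ready_nodes() {
  kubectl --context="$1" get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}
```

An empty result means every node reports Ready; any output can be fed into an alert.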

Agentic Optimizations

| Context | Command |
|---|---|
| Pod status (structured) | `kubectl get pods -n <ns> -o json \| jq '.items[] \| {name:.metadata.name, status:.status.phase}'` |
| Quick overview | `kubectl get pods -n <ns> -o wide` |
| Events (compact) | `kubectl get events -n <ns> --sort-by='.lastTimestamp' -o json` |
| Resource details | `kubectl get <resource> -o json` |
| Logs (bounded) | `kubectl logs <pod> -n <ns> --tail=50` |

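
In the same spirit of keeping output small, the server can filter pods before anything is printed. The sketch below uses a field selector and custom columns instead of `jq`; the function name is hypothetical.

```bash
# Hypothetical helper: list only pods that are not in the Running phase,
# printing just name and phase. Arguments: <context> <namespace>
pods_not_running() {
  local ctx="$1" ns="$2"
  kubectl --context="$ctx" get pods -n "$ns" \
    --field-selector='status.phase!=Running' \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase --no-headers
}
```

Pods in the Succeeded phase (for example, completed Jobs) also appear in this list, so an empty result is not required for a healthy namespace.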
For detailed debugging commands, troubleshooting patterns, Helm workflows, and advanced K8s operations, see REFERENCE.md.