kubernetes-best-practices
Kubernetes Best Practices
This skill provides guidance for writing production-ready Kubernetes manifests and managing cloud-native applications.
Resource Management
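Memory: Set the memory request and limit to the same value. The container then never uses more memory than the scheduler reserved for it, which avoids node-level overcommit and makes the pod a low-priority candidate for eviction under node memory pressure. The container is still OOM-killed if it exceeds its own limit.
CPU: Set a request only and omit the limit. The request guarantees the container's share of CPU, and the missing limit lets it burst into idle capacity instead of being throttled.
Because there is no CPU limit, the pod lands in the Burstable QoS class; the Guaranteed class requires requests to equal limits for both CPU and memory in every container.
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    # No CPU limit
```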
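As an illustration, the two rules can be encoded as a small lint check. This is a minimal sketch that assumes the manifest has already been parsed into Python dicts (for example by a YAML loader, not shown). `check_resources` is a hypothetical helper and is not part of any Kubernetes tool. It compares quantities as plain strings rather than normalizing units, and it needs Python 3.9+.

```python
from typing import Any


def check_resources(container: dict[str, Any]) -> list[str]:
    """Return rule violations for one container's `resources` block.

    Rules checked: memory request equals memory limit, a CPU request is
    present, and no CPU limit is set.
    """
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    problems: list[str] = []

    if "memory" not in requests or "memory" not in limits:
        problems.append("memory request and limit must both be set")
    elif requests["memory"] != limits["memory"]:
        # Plain string comparison: "256Mi" and "268435456" count as different.
        problems.append("memory request must equal memory limit")

    if "cpu" not in requests:
        problems.append("cpu request is missing")
    if "cpu" in limits:
        problems.append("cpu limit should be omitted to avoid throttling")
    return problems
```

Running it over every container in a pod spec before applying gives a quick, local signal that complements the dry-run validation described later.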
Image Versioning
Always pin a specific version; never use the `:latest` tag unless explicitly requested:
```yaml
# Good
image: nginx:1.25.3

# Bad
image: nginx:latest
```
For immutability, consider pinning to a specific digest (`image: nginx@sha256:<digest>`), because a tag can be re-pushed to point at a different image.
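The pinning rule can be checked mechanically. The sketch below uses a hypothetical helper, `is_pinned`, with deliberately simplified parsing: it does not validate the full image-reference grammar. It treats any `@` digest as pinned. Otherwise it reads the tag from the last path component only, so a registry port such as `localhost:5000` is not mistaken for a tag.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference names a fixed version.

    Simplified parsing (an assumption of this sketch, not the full
    reference grammar): a digest always counts as pinned; otherwise the
    tag is whatever follows the last ':' in the final path component.
    """
    if "@" in image:
        return True  # digest reference, e.g. nginx@sha256:<digest>
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False  # no tag at all: the runtime falls back to :latest
    tag = last_component.rsplit(":", 1)[1]
    return tag != "latest"
```

For example, `is_pinned("localhost:5000/nginx")` is `False`, because the only colon belongs to the registry host, not to a tag.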
Configuration Management
Secrets: Sensitive data (passwords, tokens, certificates)
ConfigMaps: Non-sensitive configuration (feature flags, URLs, settings)
```yaml
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: database-url
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: log-level
```
Best practices:
- Never hardcode secrets in manifests (Secret `data` is only base64-encoded, not encrypted)
- Use external secret management (Sealed Secrets, External Secrets Operator)
- Rotate secrets regularly
- Limit access with RBAC
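A cheap guard against the first practice is to flag environment variables that look sensitive but carry a literal `value` instead of a `valueFrom.secretKeyRef`. This is a hypothetical heuristic, not an established tool. The name pattern in `SENSITIVE_NAME` is an assumption, and it will miss secrets hidden in innocuous names such as a `DATABASE_URL` that embeds a password.

```python
import re
from typing import Any

# Hypothetical heuristic: env var names that usually carry credentials.
SENSITIVE_NAME = re.compile(
    r"PASSWORD|PASSWD|TOKEN|SECRET|API_?KEY|PRIVATE_KEY|CREDENTIAL",
    re.IGNORECASE,
)


def hardcoded_secrets(env: list[dict[str, Any]]) -> list[str]:
    """Return names of env vars that look sensitive but set a literal `value`."""
    flagged = []
    for var in env:
        # Variables sourced via valueFrom (secretKeyRef etc.) have no "value" key.
        if "value" in var and SENSITIVE_NAME.search(var["name"]):
            flagged.append(var["name"])
    return flagged
```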
Workload Selection
Choose the appropriate workload type:
- Deployment: Stateless applications (web servers, APIs, microservices)
- StatefulSet: Stateful applications (databases, message queues)
- DaemonSet: Node-level services (log collectors, monitoring agents)
- Job/CronJob: Batch processing and scheduled tasks
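The selection rules above can be folded into a small decision function. This is an illustrative sketch only. `choose_workload` and its flags are hypothetical, and the precedence order (scheduled, then run-to-completion, then per-node, then stateful) is an assumption of this sketch rather than a Kubernetes rule.

```python
def choose_workload(*, stateful: bool = False, per_node: bool = False,
                    run_to_completion: bool = False, scheduled: bool = False) -> str:
    """Map workload characteristics to a controller kind (rule of thumb)."""
    if scheduled:
        return "CronJob"      # runs a Job on a schedule
    if run_to_completion:
        return "Job"          # batch work that finishes
    if per_node:
        return "DaemonSet"    # one pod per (eligible) node
    if stateful:
        return "StatefulSet"  # stable identity and per-pod storage
    return "Deployment"       # stateless, interchangeable replicas
```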
Security Context
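Always implement security best practices. `runAsNonRoot`, `runAsUser`, and `fsGroup` can be set once for the whole pod, while `capabilities`, `readOnlyRootFilesystem`, and `allowPrivilegeEscalation` are only accepted in each container's `securityContext`:
```yaml
spec:
  securityContext:            # pod level
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
    - name: app
      securityContext:        # container level
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
```
Security checklist:
- Run as a non-root user
- Drop all capabilities by default
- Use a read-only root filesystem
- Disable privilege escalation
- Implement network policies
- Scan images for vulnerabilities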
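The manifest-level items of the checklist can be verified against a parsed pod spec. This is a minimal sketch with a hypothetical function name. It checks only the `containers` list, not `initContainers`. Network policies and image scanning live outside the pod spec, so it cannot check them. It honours the fact that a container-level `runAsNonRoot` overrides the pod-level value.

```python
from typing import Any


def security_findings(pod_spec: dict[str, Any]) -> list[str]:
    """Check each container in a pod spec against the checklist above."""
    findings: list[str] = []
    pod_ctx = pod_spec.get("securityContext", {})
    for c in pod_spec.get("containers", []):
        ctx = c.get("securityContext", {})
        name = c.get("name", "<unnamed>")
        # Container-level runAsNonRoot takes precedence over the pod-level one.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot is not true")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities do not drop ALL")
        if ctx.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: root filesystem is writable")
        # Unset means escalation is allowed, so it must be explicitly false.
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: privilege escalation not disabled")
    return findings
```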
Health Checks
Implement all three probe types:
Liveness: Restart the container if it becomes unhealthy
Readiness: Remove the pod from Service endpoints while it is not ready
Startup: Give slow-starting containers time to initialize; liveness and readiness probes only begin once the startup probe succeeds
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
startupProbe:
  httpGet:
    path: /startup
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # up to 30 x 10 s = 300 s to start
```
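The startup window is simple arithmetic over the probe fields, and the "all three probes" rule is a membership check. The sketch below uses hypothetical helper names and assumes a parsed container dict. Omitted fields fall back to the Kubernetes defaults (`initialDelaySeconds` 0, `periodSeconds` 10, `failureThreshold` 3). The result is approximate because it ignores `timeoutSeconds`.

```python
from typing import Any

PROBES = ("livenessProbe", "readinessProbe", "startupProbe")


def missing_probes(container: dict[str, Any]) -> list[str]:
    """List the probe types the container does not define."""
    return [p for p in PROBES if p not in container]


def startup_budget_seconds(container: dict[str, Any]) -> int:
    """Approximate time the startup probe allows before the container is
    restarted: initialDelaySeconds + failureThreshold * periodSeconds."""
    probe = container["startupProbe"]
    return (probe.get("initialDelaySeconds", 0)
            + probe.get("failureThreshold", 3) * probe.get("periodSeconds", 10))
```

For the manifest above, the startup probe allows roughly 300 seconds, which is the budget to compare against the application's real worst-case start time.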
High Availability
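Replica counts: Run at least 2 replicas for production workloads
Pod Disruption Budgets: Maintain availability during voluntary disruptions such as node drains. Keep `minAvailable` below the replica count, or use `maxUnavailable` instead; if `minAvailable` equals the replica count, no pod can be evicted and node drains stall. The example below therefore suits a workload running at least 3 replicas:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
```
Additional HA considerations:
- Use pod anti-affinity rules to spread pods across nodes
- Configure graceful shutdown periods
- Implement horizontal pod autoscaling
- Set appropriate resource requests for scheduling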
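The interaction between replica count and budget is easiest to see as arithmetic. This sketch uses a hypothetical helper, `allowed_disruptions`. It assumes every replica is currently healthy and accepts only integer budgets; percentage values are not handled. It computes how many pods a PDB lets voluntary evictions remove.

```python
from typing import Optional


def allowed_disruptions(replicas: int, *, min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Pods a PDB allows to be evicted when all `replicas` are healthy."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        # Healthy pods beyond the floor may be evicted.
        return max(0, replicas - min_available)
    # maxUnavailable caps evictions, but never beyond the replica count.
    return max(0, min(max_unavailable, replicas))
```

With 2 replicas and `minAvailable: 2` the result is 0, which is why a drain of a node hosting one of those pods cannot make progress.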
Namespace Organization
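Use namespaces for environment isolation and apply resource quotas:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    persistentvolumeclaims: "10"
```
Once a quota covers `requests.cpu` or `requests.memory`, new pods in that namespace that do not specify those requests are rejected unless a LimitRange supplies defaults.
Benefits: logical separation, resource limits, RBAC boundaries, cost tracking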
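Quota enforcement is done by the API server, but the underlying arithmetic is useful to understand. The sketch below shows it with hypothetical helpers. `parse_quantity` handles only a subset of Kubernetes quantity suffixes (`m`, `k`/`M`/`G`/`T`, `Ki`/`Mi`/`Gi`/`Ti`) and uses exact fractions so that values like `250m` compare without rounding error.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes (an assumption of this sketch).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Fraction(1, 1000),
}


def parse_quantity(q: str) -> Fraction:
    """Convert a quantity such as "250m", "256Mi" or "100" to an exact number."""
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(q)


def fits_quota(used: dict[str, str], extra: dict[str, str], hard: dict[str, str]) -> bool:
    """True if adding `extra` to `used` stays within every `hard` limit."""
    for key, limit in hard.items():
        total = parse_quantity(used.get(key, "0")) + parse_quantity(extra.get(key, "0"))
        if total > parse_quantity(limit):
            return False
    return True
```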
Labels and Annotations
Use consistent, recommended labels:
```yaml
metadata:
  labels:
    app.kubernetes.io/name: myapp
    app.kubernetes.io/instance: myapp-prod
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/component: backend
    app.kubernetes.io/part-of: ecommerce
    app.kubernetes.io/managed-by: helm
```
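Consistency is easiest to keep when it is checked. This minimal sketch uses a hypothetical `missing_labels` helper and assumes the object's `metadata` has been parsed into a dict. It reports which of the recommended `app.kubernetes.io/*` labels shown above are absent.

```python
RECOMMENDED_LABELS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/instance",
    "app.kubernetes.io/version",
    "app.kubernetes.io/component",
    "app.kubernetes.io/part-of",
    "app.kubernetes.io/managed-by",
)


def missing_labels(metadata: dict) -> list[str]:
    """Return the recommended label keys not present on the object."""
    labels = metadata.get("labels") or {}
    return [key for key in RECOMMENDED_LABELS if key not in labels]
```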
Service Types
- ClusterIP: Internal cluster communication (default)
- NodePort: External access via node ports (dev/test)
- LoadBalancer: Cloud provider load balancer (production)
- ExternalName: DNS CNAME record (external services)
Storage
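Choose an appropriate storage class and access mode.
Access Modes:
- ReadWriteOnce (RWO): Read-write by a single node (pods on that node can share the volume)
- ReadOnlyMany (ROX): Read-only by many nodes
- ReadWriteMany (RWX): Read-write by many nodes

Which modes are available depends on the volume plugin behind the storage class. `fast-ssd` below is an example name; substitute a StorageClass that exists in your cluster.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
```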
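The access-mode choice reduces to two questions: do pods on more than one node need the volume, and do they need to write? The sketch below is illustrative; `required_access_mode` is hypothetical. It assumes single-node access should use ReadWriteOnce even for read-only use, since RWO is the mode most volume plugins support. Whether the chosen StorageClass actually offers the returned mode still has to be checked separately.

```python
def required_access_mode(multi_node: bool, needs_write: bool) -> str:
    """Pick an access mode for a PersistentVolumeClaim (rule of thumb)."""
    if not multi_node:
        # RWO covers reads and writes from one node and is widely supported.
        return "ReadWriteOnce"
    return "ReadWriteMany" if needs_write else "ReadOnlyMany"
```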
Validation and Testing
Always validate before applying to production:
- Client-side validation: `kubectl apply --dry-run=client -f manifest.yaml`
- Server-side validation: `kubectl apply --dry-run=server -f manifest.yaml` (the API server runs validation and admission without persisting anything)
- Test in staging: Deploy to a non-production environment first
- Monitor metrics: Watch resource usage and application health
- Gradual rollout: Use rolling updates with health checks
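The two dry-run steps can be scripted so they run the same way every time, for example in CI. This is a minimal sketch with hypothetical helper names. It uses only the `kubectl apply --dry-run=client|server -f` flags shown above and assumes `kubectl` is on `PATH`. The server-side step also needs access to a cluster.

```python
import shutil
import subprocess

DRY_RUN_MODES = ("client", "server")


def dry_run_command(manifest: str, mode: str = "server") -> list[str]:
    """Build the kubectl dry-run command for a manifest file."""
    if mode not in DRY_RUN_MODES:
        raise ValueError(f"mode must be one of {DRY_RUN_MODES}")
    return ["kubectl", "apply", f"--dry-run={mode}", "-f", manifest]


def validate(manifest: str) -> bool:
    """Run client-side, then server-side dry runs; True if both succeed."""
    if shutil.which("kubectl") is None:
        raise RuntimeError("kubectl not found on PATH")
    for mode in DRY_RUN_MODES:
        result = subprocess.run(dry_run_command(manifest, mode),
                                capture_output=True, text=True)
        if result.returncode != 0:
            print(result.stderr)  # surface kubectl's error message
            return False
    return True
```

Running the client step first catches malformed files cheaply before the cluster is contacted.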
Application Checklist
When creating or reviewing Kubernetes manifests:
- Resource requests configured (memory limit equal to its request, no CPU limit)
- Specific image version pinned (not :latest)
- Secrets and ConfigMaps used for configuration
- Security context implemented (non-root, dropped capabilities)
- Health checks configured (liveness, readiness, startup)
- Pod Disruption Budget defined for HA workloads
- Consistent labels applied
- Appropriate workload type selected
- Namespace and resource quotas configured
- Validated with dry-run before applying