helm-debugging
Helm Debugging & Troubleshooting
Comprehensive guidance for diagnosing and fixing Helm deployment failures, template errors, and configuration issues.
When to Use
Use this skill automatically when:
- User reports Helm deployment failures or errors
- User mentions debugging, troubleshooting, or fixing Helm issues
- Template rendering problems occur
- Value validation or type errors
- Resource conflicts or API errors
- Image pull failures or pod crashes
- User needs to inspect deployed resources
Context Safety (CRITICAL)

Always pass the cluster context explicitly in every kubectl and helm command (`--context` for kubectl, `--kube-context` for helm). Never rely on the current context.

```bash
# CORRECT: Explicit context
kubectl --context=prod-cluster get pods -n prod
helm --kube-context=prod-cluster status myapp -n prod

# WRONG: Relying on current context
kubectl get pods -n prod  # Which cluster?
```

This prevents accidental operations on the wrong cluster.
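The rule above can also be enforced mechanically. A minimal sketch, assuming bash and two hypothetical wrapper names, `kctl` and `hlm` (neither is part of kubectl or Helm): each wrapper takes the context as its first argument, checks that it names a context in the kubeconfig via `kubectl config get-contexts -o name`, and only then calls the real tool with the context pinned.

```bash
# Hypothetical wrappers (kctl, hlm): the cluster context is a mandatory first
# argument and must exist in the kubeconfig; it is then pinned on every call.

_require_context() {
  local ctx="$1"
  if [[ -z "$ctx" ]]; then
    echo "error: explicit context required as first argument" >&2
    return 2
  fi
  # Reject typos and forgotten contexts (e.g. `kctl get pods`) up front.
  if ! kubectl config get-contexts -o name | grep -qxF -- "$ctx"; then
    echo "error: unknown context '$ctx'" >&2
    return 2
  fi
}

kctl() {
  _require_context "${1:-}" || return
  local ctx="$1"
  shift
  kubectl --context="$ctx" "$@"
}

hlm() {
  _require_context "${1:-}" || return
  local ctx="$1"
  shift
  helm --kube-context="$ctx" "$@"
}
```

For example, `hlm prod-cluster status myapp -n prod` runs `helm --kube-context=prod-cluster status myapp -n prod`, while `kctl get pods` fails fast because `get` is not a known context, instead of silently using whatever context happens to be current.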

---

Layered Validation Approach

ALWAYS follow this progression for robust deployments:

```bash
# 1. LINT - Static analysis (local charts only)
helm lint ./mychart --strict

# 2. TEMPLATE - Render templates locally
helm template myapp ./mychart \
  --debug \
  --values values.yaml

# 3. DRY-RUN - Server-side validation
#    (Helm 3.13+: use --dry-run=server if templates rely on lookup)
helm install myapp ./mychart \
  --namespace prod \
  --values values.yaml \
  --dry-run --debug

# 4. INSTALL - Actual deployment
helm install myapp ./mychart \
  --namespace prod \
  --values values.yaml \
  --atomic --wait

# 5. TEST - Post-deployment validation (if the chart has tests)
helm test myapp --namespace prod --logs
```
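The layers can be chained so that each one gates the next. This is a sketch under stated assumptions: `layered_install` is a hypothetical bash function, it assumes a local chart and a single values file, and it returns a distinct exit code per layer (1 = lint, 2 = template, 3 = dry-run, 4 = install) so a CI job can report where it stopped. The `helm test` layer is left out because not every chart ships tests.

```bash
# Hypothetical helper: run lint -> template -> dry-run -> install in order,
# stopping at the first failing layer. The context is always explicit.
layered_install() {
  local ctx="$1" release="$2" chart="$3" ns="$4" values="$5"
  local h=(helm --kube-context="$ctx")

  echo "==> lint"
  "${h[@]}" lint "$chart" --strict --values "$values" || return 1

  echo "==> template"
  "${h[@]}" template "$release" "$chart" --values "$values" >/dev/null || return 2

  echo "==> dry-run"
  "${h[@]}" install "$release" "$chart" --namespace "$ns" --values "$values" \
    --dry-run --debug >/dev/null || return 3

  echo "==> install"
  "${h[@]}" install "$release" "$chart" --namespace "$ns" --values "$values" \
    --atomic --wait || return 4
}
```

Usage: `layered_install prod-cluster myapp ./mychart prod values.yaml`. Discarding the template and dry-run output keeps logs short; drop the `>/dev/null` when you need to inspect the rendered manifests.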
Core Debugging Commands

Template Rendering & Inspection

```bash
# Render all templates locally
helm template myapp ./mychart \
  --debug \
  --values values.yaml

# Render a specific template file
helm template myapp ./mychart \
  --show-only templates/deployment.yaml \
  --values values.yaml

# Render with debug output (--debug still prints output that is not valid
# YAML, which helps locate the broken line)
helm template myapp ./mychart \
  --debug \
  --values values.yaml \
  2>&1 | less

# Validate against the Kubernetes API (dry-run); with --debug this also
# prints USER-SUPPLIED VALUES and COMPUTED VALUES
helm install myapp ./mychart \
  --namespace prod \
  --values values.yaml \
  --dry-run \
  --debug
```
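When `helm template` fails and the error does not make the culprit obvious, rendering one file at a time narrows it down. A minimal sketch, assuming a hypothetical `find_failing_templates` bash function and templates stored directly under `templates/` (subdirectories and `.tpl` helpers are not scanned):

```bash
# Hypothetical helper: render each top-level template on its own with
# --show-only and print the ones that fail. Extra args are passed to helm.
find_failing_templates() {
  local release="$1" chart="$2"
  shift 2
  local f rel failed=0
  for f in "$chart"/templates/*.yaml; do
    [[ -e "$f" ]] || continue            # glob matched nothing
    rel="templates/${f##*/}"
    if ! helm template "$release" "$chart" --show-only "$rel" "$@" >/dev/null 2>&1; then
      echo "$rel"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `find_failing_templates myapp ./mychart --values values.yaml`. One caveat: `--show-only` may also fail for a template that renders to nothing (for instance, one wrapped in a disabled `if`), so such files can be reported too and deserve a quick manual check.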
Inspect Deployed Resources

```bash
# Get the deployed manifest (as stored in the release; live objects may have drifted)
helm get manifest myapp --namespace prod

# Get the user-supplied values for the release
helm get values myapp --namespace prod

# Get ALL values (including chart defaults)
helm get values myapp --namespace prod --all

# Get release status with resources
helm status myapp --namespace prod --show-resources

# Get release metadata
helm get metadata myapp --namespace prod

# Get release hooks
helm get hooks myapp --namespace prod

# Get everything about a release
helm get all myapp --namespace prod
```
Chart Validation

```bash
# Lint chart structure and templates
helm lint ./mychart

# Lint with strict mode (treats warnings as errors)
helm lint ./mychart --strict

# Lint with specific values
helm lint ./mychart --values values.yaml --strict

# Validate rendered manifests against the Kubernetes API
# (--validate is a helm template flag, not a helm install flag)
helm template myapp ./mychart \
  --validate \
  --namespace prod
```
Verbose Debugging

```bash
# Enable Helm debug logging
helm install myapp ./mychart \
  --namespace prod \
  --debug \
  --dry-run

# Enable Kubernetes client logging
helm install myapp ./mychart \
  --namespace prod \
  --v=6  # Verbosity level 0-9

# Combine debug and verbose
helm upgrade myapp ./mychart \
  --namespace prod \
  --debug \
  --v=6 \
  --wait
```
Common Failure Scenarios

1. YAML Parse Errors

Symptom:
`Error: YAML parse error on <file>: error converting YAML to JSON`

Causes:
- Template whitespace issues (extra spaces, tabs mixed with spaces)
- Incorrect indentation
- Malformed YAML syntax
- Template rendering issues

Debugging Steps:

```bash
# 1. Render the templates locally to see the output
helm template myapp ./mychart --debug 2>&1 | grep -A 10 "error"

# 2. Render the specific problematic template
helm template myapp ./mychart \
  --show-only templates/deployment.yaml \
  --debug

# 3. Check for whitespace issues
helm template myapp ./mychart | cat -A  # Shows tabs (^I) and line ends ($)

# 4. Validate YAML syntax
helm template myapp ./mychart | yq eval '.' -
```

**Common Fixes:**
```yaml
# ❌ WRONG: Inconsistent whitespace
spec:
  containers:
  - name: {{ .Values.name }}
      image: {{ .Values.image }}  # Too much indent

# ✅ CORRECT: Consistent 2-space indent
spec:
  containers:
  - name: {{ .Values.name }}
    image: {{ .Values.image }}

# ❌ WRONG: No whitespace control or re-indentation
labels:
  {{ toYaml .Values.labels }}  # Only the first line gets indented

# ✅ CORRECT: Chomp the preceding whitespace, then indent every line
labels:
  {{- toYaml .Values.labels | nindent 2 }}

# ❌ WRONG: Without chomping, the conditional leaves blank lines
{{ if .Values.enabled }}
enabled: true
{{ end }}

# ✅ CORRECT: Chomp the whitespace before each action
{{- if .Values.enabled }}
enabled: true
{{- end }}
# Avoid {{- end -}} mid-document: it also removes the newline after the
# block and can glue the next key onto "enabled: true"
```
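Step 3 relies on reading `cat -A` output for `^I`. A small filter can list the offending line numbers directly. A sketch, assuming bash (for the `$'\t'` quoting) and a hypothetical helper name `find_tabs`:

```bash
# Hypothetical helper: print the line numbers of stdin lines that contain a
# literal tab. YAML forbids tabs for indentation, so each hit is suspect.
find_tabs() {
  grep -n $'\t' | cut -d: -f1
}
```

Usage: `helm template myapp ./mychart --show-only templates/deployment.yaml | find_tabs` prints one number per tab-containing line of that rendered file. Tabs usually come from the template source, so the fix typically belongs in the `.yaml` or `.tpl` file rather than in the values.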
2. Template Rendering Errors

Symptom:
`Error: template: mychart/templates/deployment.yaml:15:8: executing "mychart/templates/deployment.yaml" at <.Values.foo>: nil pointer evaluating interface {}.foo`

Causes:
- Accessing undefined values
- Incorrect value path
- Missing required values
- Type mismatches

Debugging Steps:

```bash
# 1. Check which values the chart defines
helm show values ./mychart

# 2. Verify the values being passed (COMPUTED VALUES is printed by a
#    --dry-run --debug install, not by helm template)
helm install myapp ./mychart \
  --dry-run \
  --debug \
  --values values.yaml \
  2>&1 | sed -n '/^COMPUTED VALUES:/,/^HOOKS:/p'

# 3. Test with minimal values
helm template myapp ./mychart \
  --set foo=test \
  --debug
```

**Common Fixes:**
```yaml
# ❌ WRONG: No default or check
image: {{ .Values.image.tag }}  # Fails if .Values.image is nil

# ✅ CORRECT: Use default (covers a missing tag only)
image: {{ .Values.image.tag | default "latest" }}  # Still fails if .Values.image is nil

# ✅ CORRECT: Check the parent before accessing the child
{{- if .Values.image }}
image: {{ .Values.image.tag | default "latest" }}
{{- end }}

# ✅ CORRECT: Use required for mandatory values
image: {{ required "image.repository is required" .Values.image.repository }}

# ❌ WRONG: Assuming type
replicas: {{ .Values.replicaCount }}  # May be string "3"

# ✅ CORRECT: Ensure int type
replicas: {{ .Values.replicaCount | int }}
```
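The template error in the symptom already encodes file, line and column (`deployment.yaml:15:8`). A sketch, assuming bash, the message format shown above, and that the command runs from the directory containing the chart folder (the path in the message starts with the chart name, which usually matches that folder): `parse_template_error` and `show_template_error_line` are hypothetical helpers that extract the location and print the offending source line.

```bash
# Hypothetical helper: turn "template: <file>:<line>:<col>: ..." into
# "<file> <line> <col>". The greedy .* picks the last (innermost) location.
parse_template_error() {
  sed -n 's/.*template: \([^:]*\):\([0-9][0-9]*\):\([0-9][0-9]*\):.*/\1 \2 \3/p'
}

# Hypothetical helper: read a Helm error from stdin and print the location
# plus the template source line it points at.
show_template_error_line() {
  local file line col
  read -r file line col <<< "$(parse_template_error)"
  if [[ -z "$file" || ! -f "$file" ]]; then
    echo "no template location found" >&2
    return 1
  fi
  echo "$file:$line:$col"
  sed -n "${line}p" "$file"
}
```

Usage: `helm template myapp ./mychart 2>&1 | show_template_error_line`. The line number refers to the template source, not to the rendered output, which is why the helper reads the original file.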
3. Value Type Errors

Symptom:
`Error: json: cannot unmarshal string into Go value of type int`

Causes:
- String passed where a number is expected
- Boolean passed as a string
- Incorrect YAML parsing

Debugging Steps:

```bash
# 1. Check how the value renders (look for the manifest field, e.g. replicas:)
helm template myapp ./mychart --debug | grep -A 5 "replicas:"

# 2. Check the value's type in the values file (yq v4 prints e.g. !!int or !!str)
yq eval '.replicaCount | type' values.yaml

# 3. Test with explicit type conversion
helm template myapp ./mychart --set-string name="value"
```

**Common Fixes:**
```yaml
# ❌ WRONG: String in values.yaml
replicaCount: "3"  # String

# ✅ CORRECT: Number in values.yaml
replicaCount: 3  # Int

# Template: always convert to the expected type
replicas: {{ .Values.replicaCount | int }}
port: {{ .Values.service.port | int }}
enabled: {{ .Values.feature.enabled | ternary "true" "false" }}
```

```bash
# Use --set-string to force a value to be a string
helm install myapp ./chart --set-string version="1.0"
```
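To catch the `replicaCount: "3"` case before rendering, a rough line-based check can flag quoted integers in a values file. This is a heuristic sketch under stated assumptions (hypothetical helper `find_quoted_numbers`, block-style YAML with one key per line); it is not a YAML parser, and some quoted numbers are intentionally strings, so treat hits as candidates to review.

```bash
# Hypothetical heuristic: print "lineno:line" for keys whose value is a
# double-quoted integer, e.g. replicaCount: "3". Versions like "1.0" are not
# matched because they contain a dot.
find_quoted_numbers() {
  local file="$1"
  grep -nE '^[[:space:]]*[A-Za-z0-9_.-]+:[[:space:]]*"[0-9]+"[[:space:]]*(#.*)?$' "$file"
}
```

Usage: `find_quoted_numbers values.yaml`.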
4. Resource Already Exists

Symptom:
`Error: rendered manifests contain a resource that already exists`

Causes:
- Resource left over from a previous failed install
- Resource managed by another release
- Conflict with a manually created resource

Debugging Steps:

```bash
# 1. Check if the resource exists
kubectl get <resource-type> <name> -n <namespace>

# 2. Check resource ownership (Helm records it in annotations and a label)
kubectl get <resource-type> <name> -n <namespace> -o yaml \
  | grep -E "meta.helm.sh/release-(name|namespace)|app.kubernetes.io/managed-by"

# 3. Find the owning release reported in step 2
helm list --all-namespaces | grep <release-name>

# 4. Check for stuck releases
helm list --all-namespaces --failed
helm list --all-namespaces --pending
```

**Solutions:**
```bash
# Option 1: Uninstall the conflicting release
helm uninstall <release> --namespace <namespace>

# Option 2: Delete the specific resource manually
kubectl delete <resource-type> <name> -n <namespace>

# Option 3: Use a different release name
helm install myapp-v2 ./chart --namespace prod

# Option 4: Adopt existing resources (advanced); Helm can take over an object
# carrying these annotations and label on the next install/upgrade
kubectl annotate <resource-type> <name> \
  meta.helm.sh/release-name=<release> \
  meta.helm.sh/release-namespace=<namespace> \
  -n <namespace>
kubectl label <resource-type> <name> \
  app.kubernetes.io/managed-by=Helm \
  -n <namespace>
```
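Option 4 can be wrapped so the annotations and the label are always applied together and with an explicit context. A minimal sketch: `adopt_resource` is a hypothetical bash helper, and its use of `--overwrite` is a design choice that lets it reassign an object already annotated for another release, so run it only after confirming ownership with the debugging steps above.

```bash
# Hypothetical helper: mark an existing object as owned by a Helm release so a
# later install/upgrade can adopt it. All five arguments are required.
adopt_resource() {
  local ctx="$1" kind="$2" name="$3" release="$4" ns="$5"
  if [[ -z "$ctx" || -z "$kind" || -z "$name" || -z "$release" || -z "$ns" ]]; then
    echo "usage: adopt_resource <context> <kind> <name> <release> <namespace>" >&2
    return 2
  fi
  kubectl --context="$ctx" annotate "$kind" "$name" -n "$ns" --overwrite \
    "meta.helm.sh/release-name=$release" \
    "meta.helm.sh/release-namespace=$ns" || return 1
  kubectl --context="$ctx" label "$kind" "$name" -n "$ns" --overwrite \
    "app.kubernetes.io/managed-by=Helm"
}
```

Usage: `adopt_resource prod-cluster configmap app-config myapp prod`, followed by the usual `helm upgrade --install`.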
5. Image Pull Failures

Symptom:
Pod status: `ImagePullBackOff` or `ErrImagePull`

Causes:
- Wrong image name/tag
- Missing registry credentials
- Private registry authentication
- Network/registry issues

Debugging Steps:

```bash
# 1. Check pod events
kubectl describe pod <pod-name> -n <namespace>

# 2. Verify the image in the manifest
helm get manifest myapp -n prod | grep "image:"

# 3. Check image pull secrets
kubectl get secrets -n <namespace>
kubectl get sa default -n <namespace> -o yaml | grep -A 2 imagePullSecrets

# 4. Test the image pull manually
docker pull <image>:<tag>
```

**Solutions:**
```bash
# Option 1: Fix the image name/tag in values
helm upgrade myapp ./chart \
  --namespace prod \
  --set image.repository=myregistry.io/myapp \
  --set image.tag=v1.0.0

# Option 2: Create an image pull secret
kubectl create secret docker-registry regcred \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<pass> \
  --namespace <namespace>
```

Reference it in values.yaml (this works when the chart's templates consume `.Values.imagePullSecrets`, as the `helm create` scaffold does):

```yaml
imagePullSecrets:
  - name: regcred
```

```bash
# Option 3: Update the service account
# (only affects pods that run as the default service account)
kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```
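Step 2 greps for `image:` by hand. A small filter can turn a manifest into a de-duplicated list of image references to try pulling one by one. A sketch, assuming block-style manifests as emitted by Helm; `list_images` is a hypothetical helper.

```bash
# Hypothetical helper: read a manifest on stdin and print each unique image
# reference. Matches "image:" and "- image:" keys, not "imagePullPolicy:".
list_images() {
  grep -E '^[[:space:]]*(- )?image:' \
    | sed -E 's/^[[:space:]]*(- )?image:[[:space:]]*//' \
    | tr -d "\"'" \
    | sort -u
}
```

Usage: `helm get manifest myapp -n prod | list_images`, then `docker pull` each reference that is listed.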
6. CRD Issues

Symptom:
`Error: unable to recognize "": no matches for kind "MyCustomResource" in version "mygroup/v1"`

Causes:
- CRD not installed
- CRD installed in the wrong order
- CRD version mismatch
- API version not supported by the cluster

Debugging Steps:

```bash
# 1. Check if the CRD exists
kubectl get crds | grep myresource

# 2. Check which versions the CRD serves
kubectl get crd myresource.mygroup.io -o jsonpath='{.spec.versions[*].name}'

# 3. Check the API resources the cluster supports
kubectl api-resources | grep mygroup

# 4. Verify the templates use a supported API version
helm template myapp ./chart | grep "apiVersion:"
```

**Solutions:**
```bash
# Option 1: Install CRDs first (if they ship as a separate chart)
helm install myapp-crds ./crds --namespace prod
helm install myapp ./chart --namespace prod

# Option 2: Skip CRDs when they are already installed
#    (--skip-crds applies when upgrade --install performs a fresh install)
helm upgrade --install myapp ./chart \
  --namespace prod \
  --skip-crds

# Option 3: Manually install or update CRDs
#    (Helm installs files in crds/ only on first install and never upgrades them)
kubectl apply -f crds/

# Option 4: Update the chart to use the correct API version
#    Edit the templates to use an apiVersion the cluster supports
```
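Steps 3 and 4 can be combined into one check: compare the `apiVersion` values in the rendered chart with what the cluster serves (`kubectl api-versions` prints one `group/version` per line). A sketch, assuming bash (for process substitution) and a hypothetical helper `missing_api_versions` that takes both lists as files, so the comparison itself needs no cluster access.

```bash
# Hypothetical helper: print top-level apiVersions found in a rendered
# manifest file that are absent from a file of served group/versions.
missing_api_versions() {
  local rendered="$1" served="$2"
  comm -23 \
    <(grep -E '^apiVersion:' "$rendered" | awk '{print $2}' | tr -d "\"'" | sort -u) \
    <(sort -u "$served")
}
```

Usage: `helm template myapp ./chart > rendered.yaml`, `kubectl --context=prod-cluster api-versions > served.txt`, then `missing_api_versions rendered.yaml served.txt`. CRDs shipped in the chart's `crds/` directory are not part of `helm template` output unless `--include-crds` is set.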
7. Timeout Errors

Symptom:
`Error: timed out waiting for the condition`

Causes:
- Pods not becoming ready (failing health checks)
- Resource limits too low
- Image pull taking too long
- Init containers failing

Debugging Steps:

```bash
# 1. Check pod status
kubectl get pods -n <namespace> -l app.kubernetes.io/instance=myapp

# 2. Check pod events and logs
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>

# 3. Check init containers
kubectl logs <pod-name> -n <namespace> -c <init-container-name>

# 4. Increase the timeout, and watch pods from a second terminal
helm upgrade myapp ./chart \
  --namespace prod \
  --wait \
  --timeout 15m \
  --debug
kubectl get pods -n prod --watch  # run in another terminal
```

**Solutions:**
```bash
# Option 1: Increase the timeout
helm upgrade myapp ./chart \
  --namespace prod \
  --timeout 10m \
  --wait

# Option 2: Don't wait (verify manually)
helm upgrade myapp ./chart \
  --namespace prod
# Then check manually: kubectl get pods -n prod
```

Option 3: Fix the readiness probe. Adjust it in values.yaml or the chart templates:

```yaml
readinessProbe:
  initialDelaySeconds: 30  # Give more time to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6      # Allow more failures
```

Option 4: Increase resource limits:

```yaml
resources:
  limits:
    memory: "512Mi"  # Was too low at 128Mi
    cpu: "1000m"
```
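For step 1, a filter can reduce the pod list to the pods that are holding up `--wait`. A sketch, assuming the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE) and a hypothetical helper `not_ready_pods`; pods in `Completed` state are skipped because finished Job pods never report full readiness.

```bash
# Hypothetical helper: read `kubectl get pods --no-headers` on stdin and print
# name, ready count and status for pods whose READY a/b has a != b.
not_ready_pods() {
  awk '{
    split($2, r, "/")
    if (r[1] != r[2] && $3 != "Completed") print $1, $2, $3
  }'
}
```

Usage: `kubectl --context=prod-cluster get pods -n prod -l app.kubernetes.io/instance=myapp --no-headers | not_ready_pods`, then `kubectl describe` each pod it prints.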
8. Hook Failures

Symptom:
`Error: pre-upgrade hooks failed: job failed`

Causes:
- Hook job failing
- Hook timing issues
- Hook dependencies not met
- Hook timeout

Debugging Steps:

```bash
# 1. Check hook jobs and their pods
#    (helm.sh/hook is an annotation, so it cannot be used as a label selector)
kubectl get jobs -n <namespace>
kubectl get pods -n <namespace> -l job-name=<hook-job-name>

# 2. Check hook logs
kubectl logs job/<hook-job-name> -n <namespace>

# 3. Get hook definitions
helm get hooks myapp -n <namespace>

# 4. Check hook annotations (hooks appear in helm get hooks, not helm get manifest)
helm get hooks myapp -n <namespace> | grep "helm.sh/hook"
```
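Steps 3 and 4 both read `helm get hooks` output. A sketch of a filter that summarizes it as one line per hook (source template and hook event), assuming the `# Source:` comments Helm prints before each document; `list_hooks` is a hypothetical helper.

```bash
# Hypothetical helper: read `helm get hooks` output on stdin and print
# "<source template>: <hook events>" for each helm.sh/hook annotation.
# hook-weight and hook-delete-policy keys are deliberately not matched.
list_hooks() {
  awk '
    /^# Source:/ { src = $3 }
    /^[ \t]*"?helm\.sh\/hook"?[ \t]*:/ {
      events = $0
      sub(/^[^:]*:[ \t]*/, "", events)   # drop the annotation key
      gsub(/"/, "", events)              # drop surrounding quotes
      print src ": " events
    }
  '
}
```

Usage: `helm get hooks myapp -n <namespace> | list_hooks`.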
**Solutions:**
```bash
# Option 1: Delete failed hook resources, then retry
kubectl delete job <hook-job> -n <namespace>
helm upgrade myapp ./chart --namespace prod

# Option 2: Skip hooks temporarily (debugging only)
helm upgrade myapp ./chart \
  --namespace prod \
  --no-hooks
```

Option 3: Fix the hook in the template by adjusting its annotations:

```yaml
annotations:
  "helm.sh/hook": pre-upgrade
  "helm.sh/hook-weight": "0"  # Order of execution
  "helm.sh/hook-delete-policy": hook-succeeded,hook-failed  # Cleanup
```

While debugging, consider `before-hook-creation` instead of `hook-failed`, so a failed hook Job stays around for `kubectl logs` until the hook runs again.
Debugging Workflow

Step-by-Step Debugging Process

```bash
# 1. IDENTIFY THE PROBLEM
# Check release status
helm status myapp --namespace prod --show-resources
# Check release history
helm history myapp --namespace prod

# 2. INSPECT CONFIGURATION
# What values were used? (-o yaml omits the "COMPUTED VALUES:" header line)
helm get values myapp --namespace prod --all -o yaml > actual-values.yaml
# What manifests were deployed?
helm get manifest myapp --namespace prod > actual-manifests.yaml

# 3. CHECK KUBERNETES RESOURCES
# Are pods running?
kubectl get pods -n prod -l app.kubernetes.io/instance=myapp
# Any events?
kubectl get events -n prod --sort-by='.lastTimestamp' | tail -20
# Pod details
kubectl describe pod <pod-name> -n prod
kubectl logs <pod-name> -n prod

# 4. VALIDATE LOCALLY
# Re-render templates with the same values and namespace
helm template myapp ./chart --namespace prod -f actual-values.yaml > local-manifests.yaml
# Compare deployed vs local
diff actual-manifests.yaml local-manifests.yaml

# 5. TEST FIX
# Dry-run with the fix
helm upgrade myapp ./chart \
  --namespace prod \
  --set fix.value=true \
  --dry-run --debug
# Apply the fix
helm upgrade myapp ./chart \
  --namespace prod \
  --set fix.value=true \
  --atomic --wait
```
Best Practices for Debugging

Enable Debug Output

✅ DO: Use `--debug` to see what's happening

```bash
helm install myapp ./chart --namespace prod --debug
```

Dry-Run Everything

✅ DO: Always dry-run before applying changes

```bash
helm upgrade myapp ./chart -n prod --dry-run --debug
```

Layer Your Validation

✅ DO: Progress through the validation layers

```bash
helm lint ./chart --strict
helm template myapp ./chart -f values.yaml
helm install myapp ./chart -n prod --dry-run --debug
helm install myapp ./chart -n prod --atomic --wait
```

Capture State

✅ DO: Save release state before changes

```bash
# Before upgrade
helm get values myapp -n prod --all -o yaml > values-before.yaml
helm get manifest myapp -n prod > manifest-before.yaml
kubectl get pods -n prod -o yaml > pods-before.yaml
```
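The capture step can be made repeatable with a small function that writes everything into one timestamped directory. A sketch, assuming bash, the standard `app.kubernetes.io/instance` label used elsewhere in this guide, and a hypothetical helper name `snapshot_release`; it also records `helm history`, which the manual commands above omit.

```bash
# Hypothetical helper: snapshot a release into a directory (default:
# snapshots/<release>-<timestamp>) and print the directory path.
snapshot_release() {
  local ctx="$1" release="$2" ns="$3"
  local dir="${4:-snapshots/${release}-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$dir" || return 1

  helm --kube-context="$ctx" get values "$release" -n "$ns" --all -o yaml \
    > "$dir/values.yaml" || return 1
  helm --kube-context="$ctx" get manifest "$release" -n "$ns" \
    > "$dir/manifest.yaml" || return 1
  helm --kube-context="$ctx" history "$release" -n "$ns" \
    > "$dir/history.txt" || return 1
  kubectl --context="$ctx" get pods -n "$ns" \
    -l "app.kubernetes.io/instance=$release" -o yaml > "$dir/pods.yaml" || return 1

  echo "$dir"
}
```

Usage: `snapshot_release prod-cluster myapp prod` before `helm upgrade`; the printed directory can later be diffed against a post-upgrade snapshot.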
Use Atomic Deployments

✅ DO: Enable automatic rollback

```bash
helm upgrade myapp ./chart -n prod --atomic --wait
```

Check Kubernetes Resources

✅ DO: Inspect deployed resources directly

```bash
kubectl get all -n prod -l app.kubernetes.io/instance=myapp
kubectl describe pod <pod> -n prod
kubectl logs <pod> -n prod
```

Understand Value Precedence

✅ DO: Know the override order

```bash
# Lowest to highest precedence:
# 1. Chart defaults (values.yaml)
# 2. --reuse-values (previous release)
# 3. -f values1.yaml
# 4. -f values2.yaml (overrides values1.yaml)
# 5. --set key=value (overrides everything)
```
Debugging Tools & Utilities

yq - YAML Processor

```bash
# Validate YAML syntax
helm template myapp ./chart | yq eval '.' -

# Extract specific values
helm get values myapp -n prod -o yaml | yq eval '.image.tag' -

# Pretty print
helm get manifest myapp -n prod | yq eval '.' -
```

kubectl Plugin: stern

```bash
# Tail logs from multiple pods
stern -n prod myapp

# Follow logs with timestamps
stern -n prod myapp --timestamps
```

kubectl Plugin: neat

```bash
# Clean kubectl output (remove clutter)
kubectl get pod <pod> -n prod -o yaml | kubectl neat
```

k9s - Kubernetes CLI

```bash
# Interactive cluster management
k9s -n prod
```

Features:
- Live resource updates
- Log viewing
- Resource editing
- Port forwarding
Integration with Other Tools

ArgoCD Debugging

When the chart is managed by Argo CD:

```bash
# 1. Check the Argo CD Application status
argocd app get <app-name>

# 2. Inspect the rendered output. Argo CD renders charts with helm template
#    and applies them itself, so there is normally no Helm release for
#    helm get values/manifest to read; use Argo CD's views instead
argocd app manifests <app-name>
argocd app diff <app-name>

# 3. Sync with debugging
argocd app sync <app-name> --dry-run
argocd app sync <app-name> --prune --force
```
CI/CD Debugging

```yaml
# Add debugging to the pipeline
- name: Debug Helm Install
  run: |
    set -x  # Enable bash debugging
    helm template myapp ./chart \
      -f values.yaml \
      --debug
    helm install myapp ./chart \
      --namespace prod \
      --dry-run \
      --debug
  continue-on-error: true  # Don't fail the pipeline

- name: Capture State on Failure
  if: failure()
  run: |
    helm list --all-namespaces
    kubectl get all -n prod
    kubectl describe pods -n prod
    kubectl logs -n prod -l app.kubernetes.io/instance=myapp --all-containers --tail=100
```
Agentic Optimizations

| Context | Command |
|---|---|
| Release status (JSON) | `helm status <release> -n <namespace> -o json` |
| All values (JSON) | `helm get values <release> -n <namespace> --all -o json` |
| Pod status (compact) | |
| Events (sorted) | `kubectl get events -n <namespace> --sort-by='.lastTimestamp'` |
| Render + validate | |
Related Skills
- Helm Release Management - Install, upgrade, uninstall operations
- Helm Values Management - Advanced configuration management
- Helm Release Recovery - Rollback and recovery strategies
- Kubernetes Operations - Managing and debugging K8s resources
- ArgoCD CLI Login - GitOps debugging with ArgoCD