helm-debugging


Helm Debugging & Troubleshooting

Comprehensive guidance for diagnosing and fixing Helm deployment failures, template errors, and configuration issues.

When to Use

Use this skill automatically when:
  • The user reports Helm deployment failures or errors
  • The user mentions debugging, troubleshooting, or fixing Helm issues
  • Templates fail to render
  • Values fail validation or have the wrong type
  • Resource conflicts or API errors occur
  • Images fail to pull or pods crash
  • The user needs to inspect deployed resources

Context Safety (CRITICAL)

Always specify the context explicitly in all kubectl and helm commands: kubectl takes --context, helm takes --kube-context. Never rely on the current context.

    # CORRECT: Explicit context
    kubectl --context=prod-cluster get pods -n prod
    helm --kube-context=prod-cluster status myapp -n prod

    # WRONG: Relying on current context
    kubectl get pods -n prod  # Which cluster?

This prevents accidental operations on the wrong cluster.

---
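In scripts, one way to enforce this rule is a small helper that emits the right flag for each tool and fails when no context is supplied. The following is a minimal sketch; `ctx_flag` is a hypothetical name introduced here for illustration, not a kubectl or helm feature.

```shell
# ctx_flag TOOL CONTEXT
# Prints the explicit-context flag for TOOL (hypothetical helper).
# kubectl takes --context, helm takes --kube-context.
ctx_flag() {
  if [ -z "$2" ]; then
    echo "error: no context given; refusing to fall back to the current context" >&2
    return 1
  fi
  case "$1" in
    kubectl) printf '%s\n' "--context=$2" ;;
    helm)    printf '%s\n' "--kube-context=$2" ;;
    *)       echo "error: unsupported tool: $1" >&2; return 1 ;;
  esac
}

# Usage (prod-cluster is the example context used in this section):
#   flag=$(ctx_flag kubectl prod-cluster) || exit 1
#   kubectl "$flag" get pods -n prod
```

Resolving the flag into a variable first means a missing context stops the script before any cluster command runs.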

Layered Validation Approach

ALWAYS follow this progression for robust deployments:

    # 1. LINT - Static analysis (local charts only)
    helm lint ./mychart --strict

    # 2. TEMPLATE - Render templates locally
    helm template myapp ./mychart \
      --debug \
      --values values.yaml

    # 3. DRY-RUN - Server-side validation
    helm install myapp ./mychart \
      --namespace prod \
      --values values.yaml \
      --dry-run --debug

    # 4. INSTALL - Actual deployment
    helm install myapp ./mychart \
      --namespace prod \
      --values values.yaml \
      --atomic --wait

    # 5. TEST - Post-deployment validation (if chart has tests)
    helm test myapp --namespace prod --logs
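The progression only helps if a failed layer actually stops the later ones. A minimal sketch of that gating logic follows; `run_layers` is a hypothetical helper name, and each stage is passed as a full command string.

```shell
# run_layers STAGE...
# Runs each stage (a full command string) in order and stops at the first
# failure, so a later layer never runs on top of a failed earlier one.
# run_layers is a hypothetical helper used for illustration.
run_layers() {
  for rl_stage in "$@"; do
    echo "==> $rl_stage"
    if ! sh -c "$rl_stage"; then
      echo "stage failed, stopping: $rl_stage" >&2
      return 1
    fi
  done
}

# Usage with the commands from the progression above:
#   run_layers \
#     "helm lint ./mychart --strict" \
#     "helm template myapp ./mychart --values values.yaml > /dev/null" \
#     "helm install myapp ./mychart --namespace prod --values values.yaml --dry-run" \
#     "helm install myapp ./mychart --namespace prod --values values.yaml --atomic --wait"
```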

Core Debugging Commands

Template Rendering & Inspection

    # Render all templates locally
    helm template myapp ./mychart \
      --debug \
      --values values.yaml

    # Render specific template file
    helm template myapp ./mychart \
      --show-only templates/deployment.yaml \
      --values values.yaml

    # Render with debug output (shows computed values)
    helm template myapp ./mychart \
      --debug \
      --values values.yaml \
      2>&1 | less

    # Validate against Kubernetes API (dry-run)
    helm install myapp ./mychart \
      --namespace prod \
      --values values.yaml \
      --dry-run \
      --debug

Inspect Deployed Resources

    # Get deployed manifest (actual YAML in cluster)
    helm get manifest myapp --namespace prod

    # Get deployed values (what was actually used)
    helm get values myapp --namespace prod

    # Get ALL values (including defaults)
    helm get values myapp --namespace prod --all

    # Get release status with resources
    helm status myapp --namespace prod --show-resources

    # Get release metadata
    helm get metadata myapp --namespace prod

    # Get release hooks
    helm get hooks myapp --namespace prod

    # Get everything about a release
    helm get all myapp --namespace prod

Chart Validation

    # Lint chart structure and templates
    helm lint ./mychart

    # Lint with strict mode (treats warnings as errors)
    helm lint ./mychart --strict

    # Lint with specific values
    helm lint ./mychart --values values.yaml --strict

    # Validate chart against Kubernetes API
    # (--validate is a helm template flag; helm install does not accept it)
    helm template myapp ./mychart \
      --validate \
      --namespace prod

Verbose Debugging

    # Enable Helm debug logging
    helm install myapp ./mychart \
      --namespace prod \
      --debug \
      --dry-run

    # Enable Kubernetes client logging
    helm install myapp ./mychart \
      --namespace prod \
      --v=6  # Verbosity level 0-9

    # Combine debug and verbose
    helm upgrade myapp ./mychart \
      --namespace prod \
      --debug \
      --v=6 \
      --wait

Common Failure Scenarios

1. YAML Parse Errors

Symptom:
Error: YAML parse error on <file>: error converting YAML to JSON
Causes:
  • Template whitespace issues (extra spaces, tabs mixed with spaces)
  • Incorrect indentation
  • Malformed YAML syntax
  • Template rendering issues
Debugging Steps:

    # 1. Render template locally to see output
    helm template myapp ./mychart --debug 2>&1 | grep -A 10 "error"

    # 2. Render specific problematic template
    helm template myapp ./mychart \
      --show-only templates/deployment.yaml \
      --debug

    # 3. Check for whitespace issues
    helm template myapp ./mychart | cat -A  # Shows tabs/spaces

    # 4. Validate YAML syntax
    helm template myapp ./mychart | yq eval '.' -

Common Fixes:

    # ❌ WRONG: Inconsistent whitespace
    spec:
      containers:
      - name: {{ .Values.name }}
          image: {{ .Values.image }}  # Too much indent

    # ✅ CORRECT: Consistent 2-space indent
    spec:
      containers:
      - name: {{ .Values.name }}
        image: {{ .Values.image }}

    # ❌ WRONG: Missing whitespace chomping
    labels:
      {{ toYaml .Values.labels }}  # Adds extra newlines

    # ✅ CORRECT: Chomp whitespace
    labels:
      {{- toYaml .Values.labels | nindent 2 }}

    # ❌ WRONG: Conditional creates empty lines
    {{- if .Values.enabled }}
    enabled: true
    {{- end }}

    # ✅ CORRECT: Chomp trailing whitespace
    {{- if .Values.enabled }}
    enabled: true
    {{- end -}}
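Tabs mixed with spaces are hard to spot in `cat -A` output on a large chart. As a rough sketch, a filter like the one below prints only the offending lines; `find_tab_lines` is a hypothetical helper name, and it assumes the rendered YAML arrives on stdin.

```shell
# find_tab_lines
# Reads rendered YAML on stdin and prints "LINE:content" for every line that
# contains a tab character. YAML does not allow tabs for indentation, so any
# hit is worth inspecting. Hypothetical helper for illustration.
find_tab_lines() {
  grep -n "$(printf '\t')" || true
}

# Usage:
#   helm template myapp ./mychart | find_tab_lines
```

The `|| true` keeps the exit status at zero when nothing matches, so the filter is safe to use in scripts that run with `set -e`.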

2. Template Rendering Errors

Symptom:
Error: template: mychart/templates/deployment.yaml:15:8: executing "mychart/templates/deployment.yaml" at <.Values.foo>: nil pointer evaluating interface {}.foo
Causes:
  • Accessing undefined values
  • Incorrect value path
  • Missing required values
  • Type mismatches
Debugging Steps:

    # 1. Check what values are available
    helm show values ./mychart

    # 2. Verify values being passed
    helm template myapp ./mychart \
      --debug \
      --values values.yaml \
      2>&1 | grep "COMPUTED VALUES"

    # 3. Test with minimal values
    helm template myapp ./mychart \
      --set foo=test \
      --debug

Common Fixes:

    # ❌ WRONG: No default or check
    image: {{ .Values.image.tag }}  # Fails if .Values.image is nil

    # ✅ CORRECT: Use default (covers a missing tag; .Values.image itself must still exist)
    image: {{ .Values.image.tag | default "latest" }}

    # ✅ CORRECT: Check before accessing
    {{- if .Values.image }}
    image: {{ .Values.image.tag | default "latest" }}
    {{- end }}

    # ✅ CORRECT: Use required for mandatory values
    image: {{ required "image.repository is required" .Values.image.repository }}

    # ❌ WRONG: Assuming type
    replicas: {{ .Values.replicaCount }}  # May be string "3"

    # ✅ CORRECT: Ensure int type
    replicas: {{ .Values.replicaCount | int }}

3. Value Type Errors

Symptom:
Error: json: cannot unmarshal string into Go value of type int
Causes:
  • String passed where number expected
  • Boolean as string
  • Incorrect YAML parsing
Debugging Steps:

    # 1. Check value types in rendered output
    helm template myapp ./mychart --debug | grep -A 5 "replicaCount"

    # 2. Verify values file syntax
    yq eval '.replicaCount' values.yaml

    # 3. Test with explicit type conversion
    helm template myapp ./mychart --set-string name="value"

Common Fixes:

    # ❌ WRONG: String in values.yaml
    replicaCount: "3"  # String

    # ✅ CORRECT: Number in values.yaml
    replicaCount: 3  # Int

    # Template: Always convert to correct type
    replicas: {{ .Values.replicaCount | int }}
    port: {{ .Values.service.port | int }}
    enabled: {{ .Values.feature.enabled | ternary "true" "false" }}

    # Use --set-string for forcing strings
    helm install myapp ./chart --set-string version="1.0"
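Quoted integers in a values file are a common source of this error. The sketch below is a rough heuristic that flags them; `find_quoted_numbers` is a hypothetical helper name, and it only looks at simple single-line `key: "123"` entries. Some quoted numbers are intentional (for example a value a template expects as a string), so treat hits as candidates for review.

```shell
# find_quoted_numbers
# Reads a values file on stdin and prints "LINE:content" for each simple
# key whose value is an integer wrapped in double quotes, e.g. replicaCount: "3".
# Heuristic only: it ignores decimals, single quotes, and multi-line values.
find_quoted_numbers() {
  grep -nE '^[[:space:]]*[A-Za-z0-9_.-]+:[[:space:]]*"[0-9]+"[[:space:]]*(#.*)?$' || true
}

# Usage:
#   find_quoted_numbers < values.yaml
```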

4. Resource Already Exists

Symptom:
Error: rendered manifests contain a resource that already exists
Causes:
  • Resource from previous failed install
  • Resource managed by another release
  • Manual resource creation conflict
Debugging Steps:

    # 1. Check if resource exists
    kubectl get <resource-type> <name> -n <namespace>

    # 2. Check resource ownership
    kubectl get <resource-type> <name> -n <namespace> -o yaml | grep -A 5 "labels:"

    # 3. Check which Helm release owns it
    helm list --all-namespaces | grep <resource-name>

    # 4. Check for stuck releases
    helm list --all-namespaces --failed
    helm list --all-namespaces --pending

Solutions:

    # Option 1: Uninstall conflicting release
    helm uninstall <release> --namespace <namespace>

    # Option 2: Delete specific resource manually
    kubectl delete <resource-type> <name> -n <namespace>

    # Option 3: Use different release name
    helm install myapp-v2 ./chart --namespace prod

    # Option 4: Adopt existing resources (advanced)
    kubectl annotate <resource-type> <name> \
      meta.helm.sh/release-name=<release> \
      meta.helm.sh/release-namespace=<namespace> \
      -n <namespace>
    kubectl label <resource-type> <name> \
      app.kubernetes.io/managed-by=Helm \
      -n <namespace>
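Adoption needs both the two annotations and the label, and it is easy to apply one without the other. As a sketch, the helper below prints the pair of commands for review instead of running them; `adopt_cmds` is a hypothetical name, and the resource, release, and namespace in the usage line are example values.

```shell
# adopt_cmds TYPE NAME RELEASE NAMESPACE
# Prints (does not run) the annotate and label commands that mark an existing
# resource as owned by a Helm release. Hypothetical helper for illustration.
adopt_cmds() {
  if [ "$#" -ne 4 ]; then
    echo "usage: adopt_cmds TYPE NAME RELEASE NAMESPACE" >&2
    return 1
  fi
  printf 'kubectl annotate %s %s meta.helm.sh/release-name=%s meta.helm.sh/release-namespace=%s -n %s\n' \
    "$1" "$2" "$3" "$4" "$4"
  printf 'kubectl label %s %s app.kubernetes.io/managed-by=Helm -n %s\n' "$1" "$2" "$4"
}

# Usage (example values):
#   adopt_cmds configmap app-config myapp prod
```

Printing first keeps the operation reviewable; the output can be piped to `sh` once it looks right.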

5. Image Pull Failures

Symptom:
Pod status: ImagePullBackOff or ErrImagePull
Causes:
  • Wrong image name/tag
  • Missing registry credentials
  • Private registry authentication
  • Network/registry issues
Debugging Steps:

    # 1. Check pod events
    kubectl describe pod <pod-name> -n <namespace>

    # 2. Verify image in manifest
    helm get manifest myapp -n prod | grep "image:"

    # 3. Check image pull secrets
    kubectl get secrets -n <namespace>
    kubectl get sa default -n <namespace> -o yaml | grep imagePullSecrets

    # 4. Test image pull manually
    docker pull image:tag

Solutions:

    # Option 1: Fix image name/tag in values
    helm upgrade myapp ./chart \
      --namespace prod \
      --set image.repository=myregistry.io/myapp \
      --set image.tag=v1.0.0

    # Option 2: Create image pull secret
    kubectl create secret docker-registry regcred \
      --docker-server=<registry> \
      --docker-username=<user> \
      --docker-password=<pass> \
      --namespace <namespace>

    # Reference in values.yaml:
    imagePullSecrets:
      - name: regcred

    # Option 3: Update service account
    kubectl patch serviceaccount default -n <namespace> \
      -p '{"imagePullSecrets": [{"name": "regcred"}]}'
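When a release has many containers, a de-duplicated list of image references is easier to check than raw `grep` output. A minimal sketch, assuming the manifest is piped in on stdin; `list_images` is a hypothetical helper name.

```shell
# list_images
# Reads a rendered manifest on stdin and prints each distinct image reference
# once, sorted. Handles "image: x" and "- image: x", strips double quotes and
# trailing comments. Hypothetical helper for illustration.
list_images() {
  grep -E '^[[:space:]]*(- )?image:' \
    | sed -E 's/^[[:space:]]*(- )?image:[[:space:]]*//; s/[[:space:]]*#.*$//; s/"//g' \
    | sort -u
}

# Usage:
#   helm get manifest myapp -n prod | list_images
```

Each printed reference can then be tried with `docker pull` as in step 4 above.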

6. CRD Issues

Symptom:
Error: unable to recognize "": no matches for kind "MyCustomResource" in version "mygroup/v1"
Causes:
  • CRD not installed
  • CRD installed in wrong order
  • CRD version mismatch
  • API version not supported in cluster
Debugging Steps:

    # 1. Check if CRD exists
    kubectl get crds | grep myresource

    # 2. Check CRD version
    kubectl get crd myresource.mygroup.io -o yaml | grep "version:"

    # 3. Check API versions supported
    kubectl api-resources | grep mygroup

    # 4. Verify template uses correct API version
    helm template myapp ./chart | grep "apiVersion:"

Solutions:

    # Option 1: Install CRDs first (if separate chart)
    helm install myapp-crds ./crds --namespace prod
    helm install myapp ./chart --namespace prod

    # Option 2: Use --skip-crds if reinstalling
    helm upgrade myapp ./chart \
      --namespace prod \
      --skip-crds

    # Option 3: Manually install CRDs
    kubectl apply -f crds/

    # Option 4: Update chart to use correct API version
    # Edit templates to use supported apiVersion

7. Timeout Errors

Symptom:
Error: timed out waiting for the condition
Causes:
  • Pods not becoming ready (failing health checks)
  • Resource limits too low
  • Image pull taking too long
  • Init containers failing
Debugging Steps:

    # 1. Check pod status
    kubectl get pods -n <namespace> -l app.kubernetes.io/instance=myapp

    # 2. Check pod events and logs
    kubectl describe pod <pod-name> -n <namespace>
    kubectl logs <pod-name> -n <namespace>

    # 3. Check init containers
    kubectl logs <pod-name> -n <namespace> -c <init-container-name>

    # 4. Increase timeout and watch
    helm upgrade myapp ./chart \
      --namespace prod \
      --wait \
      --timeout 15m \
      --debug &
    watch kubectl get pods -n prod

Solutions:

    # Option 1: Increase timeout
    helm upgrade myapp ./chart \
      --namespace prod \
      --timeout 10m \
      --wait

    # Option 2: Don't wait (manual verification)
    helm upgrade myapp ./chart \
      --namespace prod
    # Then manually check: kubectl get pods -n prod

    # Option 3: Fix readiness probe
    # Adjust in values.yaml or chart templates:
    readinessProbe:
      initialDelaySeconds: 30  # Give more time to start
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 6      # Allow more failures

    # Option 4: Increase resource limits
    resources:
      limits:
        memory: "512Mi"  # Was too low at 128Mi
        cpu: "1000m"
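A timeout usually comes down to one or two pods that never report ready. As a sketch, the filter below reads the default `kubectl get pods` table on stdin and prints the names of pods whose READY column is not fully satisfied; `not_ready_pods` is a hypothetical helper name, and it assumes the default column order (NAME, READY, STATUS, ...).

```shell
# not_ready_pods
# Reads "kubectl get pods" table output on stdin, skips the header row, and
# prints the name of every pod whose READY column (e.g. 0/1) shows fewer ready
# containers than total. Note: completed Job pods also show 0/1 and are listed.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage:
#   kubectl get pods -n prod -l app.kubernetes.io/instance=myapp | not_ready_pods
```

The resulting names can be fed to `kubectl describe pod` and `kubectl logs` from the debugging steps above.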

8. Hook Failures

Symptom:
Error: pre-upgrade hooks failed: job failed
Causes:
  • Hook job failing
  • Hook timing issues
  • Hook dependencies not met
  • Hook timeout
Debugging Steps:

    # 1. Check hook jobs/pods
    # (helm.sh/hook is an annotation, not a label, so select pods by their job name)
    kubectl get jobs -n <namespace>
    kubectl get pods -n <namespace> -l job-name=<hook-job-name>

    # 2. Check hook logs
    kubectl logs job/<hook-job-name> -n <namespace>

    # 3. Get hook definitions
    helm get hooks myapp -n <namespace>

    # 4. Check hook annotations in the release
    # (hooks are stored separately from the release manifest)
    helm get hooks myapp -n <namespace> | grep -A 10 "helm.sh/hook"

Solutions:

    # Option 1: Delete failed hook resources
    kubectl delete job <hook-job> -n <namespace>
    helm upgrade myapp ./chart --namespace prod

    # Option 2: Skip hooks temporarily (debugging only)
    helm upgrade myapp ./chart \
      --namespace prod \
      --no-hooks

    # Option 3: Fix hook in template
    # Adjust hook annotations:
    annotations:
      "helm.sh/hook": pre-upgrade
      "helm.sh/hook-weight": "0"  # Order of execution
      "helm.sh/hook-delete-policy": hook-succeeded,hook-failed  # Cleanup

Debugging Workflow

Step-by-Step Debugging Process

    # 1. IDENTIFY THE PROBLEM
    # Check release status
    helm status myapp --namespace prod --show-resources

    # Check release history
    helm history myapp --namespace prod

    # 2. INSPECT CONFIGURATION
    # What values were used?
    helm get values myapp --namespace prod --all > actual-values.yaml

    # What manifests were deployed?
    helm get manifest myapp --namespace prod > actual-manifests.yaml

    # 3. CHECK KUBERNETES RESOURCES
    # Are pods running?
    kubectl get pods -n prod -l app.kubernetes.io/instance=myapp

    # Any events?
    kubectl get events -n prod --sort-by='.lastTimestamp' | tail -20

    # Pod details
    kubectl describe pod <pod-name> -n prod
    kubectl logs <pod-name> -n prod

    # 4. VALIDATE LOCALLY
    # Re-render templates with same values
    helm template myapp ./chart -f actual-values.yaml > local-manifests.yaml

    # Compare deployed vs local
    diff actual-manifests.yaml local-manifests.yaml

    # 5. TEST FIX
    # Dry-run with fix
    helm upgrade myapp ./chart \
      --namespace prod \
      --set fix.value=true \
      --dry-run --debug

    # Apply fix
    helm upgrade myapp ./chart \
      --namespace prod \
      --set fix.value=true \
      --atomic --wait

Best Practices for Debugging

Enable Debug Output

DO: Use --debug to see what's happening

    helm install myapp ./chart --namespace prod --debug

Dry-Run Everything

DO: Always dry-run before applying changes

    helm upgrade myapp ./chart -n prod --dry-run --debug

Layer Your Validation

DO: Progress through validation layers

    helm lint ./chart --strict
    helm template myapp ./chart -f values.yaml
    helm install myapp ./chart -n prod --dry-run --debug
    helm install myapp ./chart -n prod --atomic --wait

Capture State

DO: Save release state before changes

    # Before upgrade
    helm get values myapp -n prod --all > values-before.yaml
    helm get manifest myapp -n prod > manifest-before.yaml
    kubectl get pods -n prod -o yaml > pods-before.yaml

Use Atomic Deployments

DO: Enable automatic rollback

    helm upgrade myapp ./chart -n prod --atomic --wait

Check Kubernetes Resources

DO: Inspect deployed resources directly

    kubectl get all -n prod -l app.kubernetes.io/instance=myapp
    kubectl describe pod <pod> -n prod
    kubectl logs <pod> -n prod

Understand Value Precedence

DO: Know override order

    # Lowest to highest precedence:
    # 1. Chart defaults (values.yaml)
    # 2. --reuse-values (previous release)
    # 3. -f values1.yaml
    # 4. -f values2.yaml (overrides values1.yaml)
    # 5. --set key=value (overrides everything)
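The override order can be pictured as "last layer wins, key by key". The sketch below is a toy model of that rule for flat `key=value` pairs only; it is not how Helm merges values internally (Helm merges nested maps), and `merge_layers` is a hypothetical name. Values must not contain spaces, `=`, or glob characters.

```shell
# merge_layers LAYER...
# Each LAYER is a string of space-separated key=value pairs, given from lowest
# to highest precedence. Prints the effective value of each key, keeping the
# order in which keys first appeared. Toy model for illustration only.
merge_layers() {
  for ml_layer in "$@"; do
    # Intentionally unquoted: split the layer into one key=value per line.
    printf '%s\n' $ml_layer
  done | awk -F= '
    NF > 0 { if (!($1 in val)) order[++n] = $1; val[$1] = $2 }
    END    { for (i = 1; i <= n; i++) print order[i] "=" val[order[i]] }
  '
}

# Usage, one layer per source in the list above (example values):
#   merge_layers \
#     "replicaCount=1 image.tag=latest" \
#     "replicaCount=2" \
#     "image.tag=v1.0.0" \
#     "replicaCount=5"
```

In the usage example the last layer stands in for `--set`, so its `replicaCount` overrides both values files and the chart default.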

Debugging Tools & Utilities

yq - YAML Processor

    # Validate YAML syntax
    helm template myapp ./chart | yq eval '.' -

    # Extract specific values
    helm get values myapp -n prod -o yaml | yq eval '.image.tag' -

    # Pretty print
    helm get manifest myapp -n prod | yq eval '.' -

kubectl Plugin: stern

    # Tail logs from multiple pods
    stern -n prod myapp

    # Follow logs with timestamps
    stern -n prod myapp --timestamps

kubectl Plugin: neat

    # Clean kubectl output (remove clutter)
    kubectl get pod <pod> -n prod -o yaml | kubectl neat

k9s - Kubernetes CLI

    # Interactive cluster management
    k9s -n prod

    # Features:
    # - Live resource updates
    # - Log viewing
    # - Resource editing
    # - Port forwarding

Integration with Other Tools

ArgoCD Debugging

    # When managed by ArgoCD:

    # 1. Check ArgoCD Application status
    argocd app get <app-name>

    # 2. Still use helm for inspection
    helm get values <release> -n <namespace> --all
    helm get manifest <release> -n <namespace>

    # 3. Sync with debugging
    argocd app sync <app-name> --dry-run
    argocd app sync <app-name> --prune --force

CI/CD Debugging

    # Add debugging to pipeline
    - name: Debug Helm Install
      run: |
        set -x  # Enable bash debugging
        helm template myapp ./chart \
          -f values.yaml \
          --debug
        helm install myapp ./chart \
          --namespace prod \
          --dry-run \
          --debug
      continue-on-error: true  # Don't fail pipeline

    - name: Capture State on Failure
      if: failure()
      run: |
        helm list --all-namespaces
        kubectl get all -n prod
        kubectl describe pods -n prod
        # kubectl logs needs a pod name or a selector
        kubectl logs -n prod -l app.kubernetes.io/instance=myapp --all-containers --tail=100

Agentic Optimizations

| Context | Command |
|---------|---------|
| Release status (JSON) | `helm status <release> -n <ns> -o json` |
| All values (JSON) | `helm get values <release> -n <ns> --all -o json` |
| Pod status (compact) | `kubectl get pods -n <ns> -l app.kubernetes.io/instance=<release> -o wide` |
| Events (sorted) | `kubectl get events -n <ns> --sort-by='.lastTimestamp' -o json` |
| Render + validate | `helm template <release> ./chart --debug 2>&1 \| head -100` |

Related Skills

  • Helm Release Management - Install, upgrade, uninstall operations
  • Helm Values Management - Advanced configuration management
  • Helm Release Recovery - Rollback and recovery strategies
  • Kubernetes Operations - Managing and debugging K8s resources
  • ArgoCD CLI Login - GitOps debugging with ArgoCD

References
