terraform-troubleshooting

Terraform Troubleshooting Skill

Table of Contents

How to Implement: Step-by-Step | Examples
Help: Requirements | See Also

Purpose

Troubleshoot Terraform errors efficiently using systematic debugging workflows, detailed error analysis, and proven solutions for common problems.

When to Use

Use this skill when you encounter:
  • Terraform failures - Apply or plan commands fail with errors
  • State lock issues - Lock timeout or concurrent modification errors
  • Provider errors - GCP API errors, authentication failures
  • Syntax problems - Invalid HCL, type mismatches, missing arguments
  • Unexpected infrastructure changes - State drift, unintended modifications
  • Version conflicts - Provider or Terraform version incompatibilities
  • Permission errors - GCP IAM or service account issues
Error Categories:
  1. Language Errors - Syntax, configuration, type mismatches
  2. State Errors - Lock timeouts, corruption, concurrent access
  3. Core Errors - Terraform version, plugin issues
  4. Provider Errors - GCP API, permissions, authentication
Trigger Phrases:
  • "Terraform apply failed"
  • "Fix state lock error"
  • "Debug Terraform syntax error"
  • "Resolve GCP permission denied"
  • "Fix state drift"
  • "Terraform version incompatibility"

Quick Start

Diagnose a Terraform error in 5 minutes:

```bash
# 1. Enable debug logging
export TF_LOG=DEBUG
export TF_LOG_PATH=/tmp/terraform.log

# 2. Validate syntax
terraform validate

# 3. Run plan with detailed output
terraform plan -out=tfplan

# 4. Review logs for errors
grep -i error /tmp/terraform.log

# 5. Disable logging when done
unset TF_LOG
unset TF_LOG_PATH
```

Instructions

Step 1: Categorize the Error

Terraform errors fall into four categories. Identify which type you're dealing with:
1. Language Errors (Syntax, configuration)
  • Invalid HCL syntax
  • Type mismatches
  • Missing required arguments
  • Example:
    Error: Invalid value for module argument
2. State Errors (State lock, corruption)
  • Lock timeouts
  • Concurrent modifications
  • State file corruption
  • Example:
    Error: Error acquiring the state lock
3. Core Errors (Terraform version, plugins)
  • Version incompatibility
  • Missing plugins
  • Initialization issues
  • Example:
    Error: Unsupported Terraform version
4. Provider Errors (GCP API, permissions)
  • GCP API errors
  • Authentication issues
  • Permission denied
  • Example:
    Error: Error creating PubSub topic: googleapi: Error 403
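
The four categories above can be sketched as a small shell function that sorts an error message into one of them. This is only an illustration: `categorize_tf_error` is a hypothetical helper, and its patterns cover the example messages from this section rather than every error Terraform can emit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a Terraform error message to one of the four
# categories from Step 1. The patterns are illustrative, not exhaustive.
categorize_tf_error() {
  case "$1" in
    *"state lock"*)                            echo "state" ;;
    *"googleapi:"*)                            echo "provider" ;;
    *"Unsupported Terraform"*|*"plugin"*)      echo "core" ;;
    *"Invalid"*|*"Missing required argument"*) echo "language" ;;
    *)                                         echo "unknown" ;;
  esac
}

categorize_tf_error "Error: Error acquiring the state lock"  # prints: state
```

An `unknown` result simply means the message needs a closer look in the debug log from Step 2.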

Step 2: Enable Detailed Logging

```bash
# Set debug logging
export TF_LOG=DEBUG
export TF_LOG_PATH=/tmp/terraform.log

# Available levels: TRACE, DEBUG, INFO, WARN, ERROR
# TRACE: Most verbose, includes all operations
# DEBUG: Detailed, good for troubleshooting
# INFO: General information
```

**Reading Logs**:
```bash
# Filter for errors
grep -i error /tmp/terraform.log

# Filter for specific resource
grep "google_pubsub" /tmp/terraform.log

# Filter for timestamps
grep "2025-11-14" /tmp/terraform.log
```
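
Before filtering a large log, it can help to see how many lines each level produced. The sketch below assumes each log line carries a bracketed level tag such as `[ERROR]`. The function name `count_log_levels` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count log lines per level in a Terraform log file.
# Assumes each line carries a bracketed level tag such as "[ERROR]".
count_log_levels() {
  local log_file level count
  log_file="$1"
  for level in ERROR WARN INFO DEBUG TRACE; do
    # grep -c prints the number of matching lines (0 when none match);
    # "|| true" is needed because grep exits non-zero on zero matches.
    count=$(grep -c -F "[$level]" "$log_file" || true)
    printf '%s=%s\n' "$level" "$count"
  done
}
```

Calling `count_log_levels /tmp/terraform.log` after a run prints one `LEVEL=count` line per level, which shows quickly whether there are any `[ERROR]` lines worth grepping for.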

Step 3: Execute Troubleshooting Workflow

Follow this sequence for systematic debugging:

```bash
# 1. Validate HCL syntax
terraform validate
# ✓ Catches syntax, type, and required argument errors
# ✗ Does NOT validate against actual cloud state

# 2. Format code (catches formatting issues)
terraform fmt -check -recursive
terraform fmt -recursive  # Fix formatting

# 3. Refresh state (sync with actual infrastructure)
terraform refresh
# Deprecated since Terraform 0.15.4; `terraform apply -refresh-only` does the same with a review step
# ✓ Updates Terraform state to match real infrastructure
# ✗ Does NOT change infrastructure; it only reads it and rewrites state

# 4. Re-initialize (if provider issues)
terraform init -upgrade
# ✓ Updates provider versions to the latest allowed by your version constraints
# ✗ Requires time for downloads

# 5. Plan with detailed output
terraform plan -out=tfplan
# ✓ Shows exactly what will change
# ✗ Does NOT make changes

# 6. Check logs
grep -i error /tmp/terraform.log
```
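
The sequence above can be wrapped in a function that stops at the first failing step. This is a minimal sketch, not a definitive runner: `run_tf_checks` and the `TF_BIN` override are hypothetical names, and the refresh step is left out because it rewrites state and is better run by hand.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the troubleshooting sequence. TF_BIN is an
# assumed override so the terraform binary can be swapped out.
run_tf_checks() {
  local tf step
  tf="${TF_BIN:-terraform}"
  for step in "validate" "fmt -check -recursive" "init -upgrade" "plan -out=tfplan"; do
    echo "==> $tf $step"
    # $step is left unquoted on purpose so its flags split into arguments.
    if ! "$tf" $step; then
      echo "FAILED at: $tf $step" >&2
      return 1
    fi
  done
  echo "All checks passed"
}
```

Running `run_tf_checks` from the configuration directory reports the first failing command on stderr, which is where to start reading the debug log.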

Step 4: Handle Specific Error Types

State Lock Errors

Problem: Another Terraform operation is running or left a stale lock.

```bash
# Option 1: Wait for lock (if operation is legitimately running)
terraform apply -lock-timeout=10m

# Option 2: Force unlock (use with caution!)
terraform force-unlock LOCK_ID
# Get LOCK_ID from error message

# Option 3: Manual recovery (last resort)
# Delete lock file from GCS backend
gsutil rm gs://bucket/prefix/default.tflock
```

**Prevention**:
- Use CI/CD with job queuing (prevents concurrent runs)
- Communicate with team before applying
- Use Terraform Cloud/Enterprise for automatic locking
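
Copying the LOCK_ID by hand is error-prone, so a small filter can pull it out of the error text. The sketch below assumes the error contains a line of the form `ID: <value>`, as in the Lock Info block Terraform prints. `extract_lock_id` is a hypothetical helper, and the pattern may need adjusting for your Terraform version's output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read error text on stdin and print the first lock ID.
# Assumes a line of the form "ID: <value>", optionally preceded by
# whitespace or other non-alphanumeric decoration.
extract_lock_id() {
  sed -n 's/^[^[:alnum:]]*ID:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Possible usage (not run here):
#   lock_id=$(terraform plan -no-color 2>&1 | extract_lock_id)
#   terraform force-unlock "$lock_id"
```

Treat the result as a convenience only, and confirm that no other run holds the lock before passing the ID to `terraform force-unlock`.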

Cycle Errors (Circular Dependencies)

Problem: Resources depend on each other in a circle.

Error: Cycle: resource_a, resource_b, resource_a

Solution: Break the cycle by removing one of the references, for example by replacing it with an explicit value. Adding `depends_on` only creates more dependency edges, so it cannot resolve a cycle on its own.

```hcl
# Other required arguments are omitted for brevity.

# ❌ BAD: Circular reference
resource "google_compute_firewall" "allow_app" {
  source_tags = [google_compute_instance.app.tags[0]]
}

resource "google_compute_instance" "app" {
  tags = [google_compute_firewall.allow_app.name]
}

# ✅ GOOD: Break dependency
resource "google_compute_firewall" "allow_app" {
  source_tags = ["app"]  # Use explicit string instead
}

resource "google_compute_instance" "app" {
  tags = ["app"]  # Explicit value
}
```

Provider Version Conflicts

Problem: Provider version constraint conflict.

Error: Incompatible provider version

Terraform requires >= 5.26.0, < 5.27.0
You have 6.0.0 installed

Solution:

```bash
# 1. Check current version
terraform version
```

```hcl
# 2. Lock to compatible version
# In main.tf
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.26.0"  # Allows 5.26.x, not 5.27.0
    }
  }
}
```

```bash
# 3. Re-initialize
terraform init -upgrade

# 4. Commit .terraform.lock.hcl
git add .terraform.lock.hcl
git commit -m "lock: pin Google provider to 5.26.x"
```
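
The `~> 5.26.0` constraint means "at least 5.26.0 and below 5.27.0", so only the patch number may grow. The sketch below spells that rule out for plain numeric `X.Y.Z` versions. `satisfies_pessimistic` is a hypothetical helper that does not handle pre-release suffixes or two-part constraints such as `~> 5.26`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does VERSION satisfy "~> BASE" for a three-part BASE?
# "~> X.Y.Z" means >= X.Y.Z and < X.(Y+1).0, so only the patch may grow.
satisfies_pessimistic() {
  local version base v_major v_rest v_minor v_patch b_major b_rest b_minor b_patch
  version="$1"
  base="$2"
  # Split "X.Y.Z" into its three numeric parts.
  v_major=${version%%.*}; v_rest=${version#*.}
  v_minor=${v_rest%%.*};  v_patch=${v_rest#*.}
  b_major=${base%%.*};    b_rest=${base#*.}
  b_minor=${b_rest%%.*};  b_patch=${b_rest#*.}
  [ "$v_major" -eq "$b_major" ] &&
    [ "$v_minor" -eq "$b_minor" ] &&
    [ "$v_patch" -ge "$b_patch" ]
}

if satisfies_pessimistic "6.0.0" "5.26.0"; then
  echo "6.0.0 is allowed"
else
  echo "6.0.0 is rejected"  # prints: 6.0.0 is rejected
fi
```

This mirrors the error message in this section: an installed 6.0.0 falls outside `>= 5.26.0, < 5.27.0`, while any 5.26.x release would pass.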

GCP Permission Errors

Problem: Service account lacks required GCP permissions.

Error: Error creating PubSub topic: googleapi: Error 403:
The caller does not have permission

Solution:

```bash
# 1. Check current authentication
gcloud auth list
gcloud config get-value project

# 2. Verify service account permissions
gcloud projects get-iam-policy ecp-wtr-supplier-charges-prod \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:app-runtime@*"

# 3. Grant required role
gcloud projects add-iam-policy-binding ecp-wtr-supplier-charges-prod \
  --member="serviceAccount:app-runtime@project.iam.gserviceaccount.com" \
  --role="roles/pubsub.editor"

# 4. Re-plan
terraform plan
```

State Out of Sync

Problem: Terraform state doesn't match actual infrastructure.

```bash
# Detect drift
terraform plan
# Shows changes made outside Terraform that differ from your .tf files

# Sync state
terraform refresh
# Updates state to match real infrastructure
# (deprecated since Terraform 0.15.4; `terraform apply -refresh-only` adds a review step)

# Manual fix (if refresh fails or the resource is missing from state)
terraform import google_pubsub_topic.incoming \
  projects/ecp-wtr-supplier-charges-prod/topics/my-topic

# Remove from state (if resource manually deleted)
terraform state rm google_pubsub_topic.incoming
```
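
For scripted drift detection, `terraform plan -detailed-exitcode` reports the result through its exit status: 0 for no changes, 1 for an error, 2 for pending changes. The sketch below turns that status into a label. `describe_plan_exit` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate the exit status of
# "terraform plan -detailed-exitcode" into a short label.
describe_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;  # plan succeeded, nothing to change
    2) echo "changes" ;;  # plan succeeded, changes (or drift) are pending
    *) echo "error" ;;    # 1 (or anything else): the plan itself failed
  esac
}

# Possible usage (not run here):
#   terraform plan -detailed-exitcode >/dev/null 2>&1
#   describe_plan_exit "$?"
```

A `changes` result does not say whether the difference comes from edited `.tf` files or from drift, so the plan output still needs a read.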

Step 5: Review and Recover

```bash
# Inspect the resources currently tracked in state
terraform state list
terraform state show google_pubsub_topic.incoming

# Review the saved plan (what the last plan proposed)
terraform show tfplan | head -50

# Rollback by re-applying previous configuration
git checkout HEAD~1  # Go back one commit
terraform plan
terraform apply
```

Examples

Example 1: Debugging State Lock

```bash
# Error occurs
# Error: Error acquiring the state lock
# Lock Info:
#   ID: abc123def456
#   Path: gs://terraform-state-prod/supplier-charges-hub/default.tflock
#   Created: 2025-11-14 10:30:00 UTC

# Step 1: Check if operation is running
gcloud compute operations list --filter="status:RUNNING"

# Step 2: If no running operation, force unlock
terraform force-unlock abc123def456

# Step 3: If force-unlock fails, delete lock file
gsutil rm gs://terraform-state-prod/supplier-charges-hub/default.tflock

# Step 4: Re-plan to verify state is correct
terraform refresh
terraform plan
```

Example 2: Fixing Syntax Error

```bash
# Error occurs
# Error: Invalid value for module argument

# Step 1: Validate syntax
terraform validate

# Output shows exactly what's wrong:
# Error: Missing required argument
#   on pubsub.tf line 5, in resource "google_pubsub_topic" "topics":
#   5: resource "google_pubsub_topic" "topics" {
# The argument "name" is required, but was not set.

# Step 2: Review and fix the file
# Add missing argument (in pubsub.tf):
#   resource "google_pubsub_topic" "topics" {
#     name = "my-topic"  # Add this
#   }

# Step 3: Validate again
terraform validate
```

Example 3: GCP Permission Recovery

```bash
# Error occurs
# Error creating PubSub topic: googleapi: Error 403

# Step 1: Check authentication
gcloud auth list
gcloud config get-value project

# Step 2: Get current IAM bindings
gcloud projects get-iam-policy ecp-wtr-supplier-charges-prod

# Step 3: Add Pub/Sub Editor role
gcloud projects add-iam-policy-binding ecp-wtr-supplier-charges-prod \
  --member="serviceAccount:terraform@project.iam.gserviceaccount.com" \
  --role="roles/pubsub.editor"

# Step 4: Re-run Terraform
terraform plan
terraform apply
```

Requirements

  • Terraform 1.x+ installed
  • GCP credentials configured (gcloud auth or service account key)
  • Writable location for log files (for TF_LOG_PATH)
  • GCP CLI tools installed (`gcloud`, `gsutil`)

See Also

  • terraform-basics - General Terraform reference
  • terraform-state-management - Advanced state patterns
  • terraform-gcp-integration - GCP-specific issues