terraform-troubleshooting
Terraform Troubleshooting Skill
Table of Contents
How to Implement → Step-by-Step | Examples
Help → Requirements | See Also
Purpose
Troubleshoot Terraform errors efficiently using systematic debugging workflows, detailed error analysis, and proven solutions for common problems.
When to Use
Use this skill when you encounter:
- Terraform failures - Apply or plan commands fail with errors
- State lock issues - Lock timeout or concurrent modification errors
- Provider errors - GCP API errors, authentication failures
- Syntax problems - Invalid HCL, type mismatches, missing arguments
- Unexpected infrastructure changes - State drift, unintended modifications
- Version conflicts - Provider or Terraform version incompatibilities
- Permission errors - GCP IAM or service account issues
Error Categories:
- Language Errors - Syntax, configuration, type mismatches
- State Errors - Lock timeouts, corruption, concurrent access
- Core Errors - Terraform version, plugin issues
- Provider Errors - GCP API, permissions, authentication
Trigger Phrases:
- "Terraform apply failed"
- "Fix state lock error"
- "Debug Terraform syntax error"
- "Resolve GCP permission denied"
- "Fix state drift"
- "Terraform version incompatibility"
Quick Start
Diagnose a Terraform error in 5 minutes:
```bash
# 1. Enable debug logging
export TF_LOG=DEBUG
export TF_LOG_PATH=/tmp/terraform.log

# 2. Validate syntax
terraform validate

# 3. Run plan with detailed output
terraform plan -out=tfplan

# 4. Review logs for errors
grep -i error /tmp/terraform.log

# 5. Disable logging when done
unset TF_LOG
unset TF_LOG_PATH
```
Instructions
Step 1: Categorize the Error
Terraform errors fall into four categories. Identify which type you're dealing with:
1. Language Errors (Syntax, configuration)
   - Invalid HCL syntax
   - Type mismatches
   - Missing required arguments
   - Example: `Error: Invalid value for module argument`
2. State Errors (State lock, corruption)
   - Lock timeouts
   - Concurrent modifications
   - State file corruption
   - Example: `Error: Error acquiring the state lock`
3. Core Errors (Terraform version, plugins)
   - Version incompatibility
   - Missing plugins
   - Initialization issues
   - Example: `Error: Unsupported Terraform version`
4. Provider Errors (GCP API, permissions)
   - GCP API errors
   - Authentication issues
   - Permission denied
   - Example: `Error: Error creating PubSub topic: googleapi: Error 403`
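The four categories can be turned into a quick triage helper. The sketch below is only an illustration: `classify_tf_error` is a hypothetical function name, and its patterns are guesses derived from the example messages in this document, not an exhaustive or official mapping.

```bash
#!/usr/bin/env bash
# classify_tf_error MESSAGE
# Prints one of: state, provider, core, language, unknown.
# The patterns are illustrative and based only on the example
# messages listed in this document.
classify_tf_error() {
  local msg="$1"
  case "$msg" in
    *"state lock"*)                                  echo "state" ;;
    *googleapi*|*"does not have permission"*)        echo "provider" ;;
    *"Unsupported Terraform"*|*"provider version"*)  echo "core" ;;
    *"Invalid value"*|*"Missing required argument"*) echo "language" ;;
    *)                                               echo "unknown" ;;
  esac
}
```

For example, `classify_tf_error "Error: Error acquiring the state lock"` prints `state`. Anything that falls through to `unknown` still needs a manual read of the full error text.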
Step 2: Enable Detailed Logging
```bash
# Set debug logging
export TF_LOG=DEBUG
export TF_LOG_PATH=/tmp/terraform.log

# Available levels: TRACE, DEBUG, INFO, WARN, ERROR
# TRACE: Most verbose, includes all operations
# DEBUG: Detailed, good for troubleshooting
# INFO: General information
```
**Reading Logs**:
```bash
# Filter for errors
grep -i error /tmp/terraform.log

# Filter for specific resource
grep "google_pubsub" /tmp/terraform.log

# Filter for timestamps
grep "2025-11-14" /tmp/terraform.log
```
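Before filtering a large log line by line, it can help to see how many entries each level produced. The helper below is a minimal sketch; `tf_log_level_counts` is a hypothetical name, and it assumes each log line carries its level in square brackets (for example `[DEBUG]`).

```bash
#!/usr/bin/env bash
# tf_log_level_counts FILE
# Prints "LEVEL count" for each log level.
# Assumption: every log line contains its level as "[LEVEL]".
tf_log_level_counts() {
  local file="$1" level
  for level in TRACE DEBUG INFO WARN ERROR; do
    # grep -c prints the number of matching lines (0 when none match)
    printf '%s %s\n' "$level" "$(grep -c -F "[$level]" "$file")"
  done
}
```

Calling `tf_log_level_counts /tmp/terraform.log` gives a five-line summary; a non-zero `ERROR` count is a good cue to run the error filter next.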
Step 3: Execute Troubleshooting Workflow
Follow this sequence for systematic debugging:
```bash
# 1. Validate HCL syntax
terraform validate
# ✓ Catches syntax, type, and required argument errors
# ✗ Does NOT validate against actual cloud state

# 2. Format code (catches formatting issues)
terraform fmt -check -recursive
terraform fmt -recursive  # Fix formatting

# 3. Refresh state (sync with actual infrastructure)
terraform apply -refresh-only  # replaces the deprecated `terraform refresh`
# ✓ Updates Terraform state to match real infrastructure
# ✗ Does NOT change infrastructure; only the state is updated

# 4. Re-initialize (if provider issues)
terraform init -upgrade
# ✓ Updates providers to the newest versions your constraints allow
# ✗ Requires time for downloads

# 5. Plan with detailed output
terraform plan -out=tfplan
# ✓ Shows exactly what will change
# ✗ Does NOT make changes

# 6. Check logs
grep -i error /tmp/terraform.log
```
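The read-only part of this sequence can be wrapped in one function that stops at the first failing step. This is a sketch under assumptions: `tf_workflow` and the `TF_BIN` override variable are hypothetical names introduced here so the function can be dry-run without Terraform installed.

```bash
#!/usr/bin/env bash
# tf_workflow
# Runs validate, fmt check, and plan in order; stops at the first failure.
# TF_BIN is a hypothetical override for the terraform binary.
# Return codes: 0 = all steps passed, 1 = validate, 2 = fmt, 3 = plan.
tf_workflow() {
  local tf="${TF_BIN:-terraform}"
  "$tf" validate             || { echo "validate failed" >&2; return 1; }
  "$tf" fmt -check -recursive || { echo "fmt check failed" >&2; return 2; }
  "$tf" plan -out=tfplan     || { echo "plan failed" >&2; return 3; }
  echo "workflow ok"
}
```

The refresh and init steps are left out on purpose because they modify state or downloaded providers; run those by hand when the function reports a failure you cannot explain.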
Step 4: Handle Specific Error Types
State Lock Errors
Problem: Another Terraform operation is running or left a stale lock.
```bash
# Option 1: Wait for lock (if operation is legitimately running)
terraform apply -lock-timeout=10m

# Option 2: Force unlock (use with caution!)
# Get LOCK_ID from error message
terraform force-unlock LOCK_ID

# Option 3: Manual recovery (last resort)
# Delete lock file from GCS backend
gsutil rm gs://bucket/prefix/default.tflock
```
**Prevention**:
- Use CI/CD with job queuing (prevents concurrent runs)
- Communicate with team before applying
- Use Terraform Cloud/Enterprise for automatic locking
Cycle Errors (Circular Dependencies)
Problem: Resources depend on each other in a circle.
```
Error: Cycle: resource_a, resource_b, resource_a
```
Solution: Break the cycle by removing one of the references (for example, replacing it with an explicit value) or by restructuring the resources. Adding `depends_on` cannot remove a cycle, but an unnecessary `depends_on` can create one, so review those as well:
```hcl
# ❌ BAD: Circular reference
resource "google_compute_firewall" "allow_app" {
  source_tags = [google_compute_instance.app.tags[0]]
}
resource "google_compute_instance" "app" {
  tags = [google_compute_firewall.allow_app.name]
}

# ✅ GOOD: Break dependency
resource "google_compute_firewall" "allow_app" {
  source_tags = ["app"]  # Use explicit string instead
}
resource "google_compute_instance" "app" {
  tags = ["app"]  # Explicit value
}
```
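When a cycle involves many resources, listing the distinct addresses named in the error gives a checklist of blocks to inspect. The helper below is a small sketch; `cycle_members` is a hypothetical name, and it assumes the message has the `Error: Cycle: a, b, a` shape shown in this section.

```bash
#!/usr/bin/env bash
# cycle_members
# Reads Terraform output on stdin and prints each distinct resource
# named in an "Error: Cycle: ..." line, one per line, sorted.
cycle_members() {
  sed -n 's/^.*Error: Cycle: //p' |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    sort -u
}
```

Usage: `terraform plan 2>&1 | cycle_members`. Each printed address is a resource whose references (or `depends_on` entries) participate in the loop.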
Provider Version Conflicts
Problem: Provider version constraint conflict.
```
Error: Incompatible provider version
Terraform requires >= 5.26.0, < 5.27.0
You have 6.0.0 installed
```
Solution:
```bash
# 1. Check current version
terraform version
```
```hcl
# 2. Lock to compatible version
# In main.tf
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.26.0"  # Allows 5.26.x, not 5.27.0
    }
  }
}
```
```bash
# 3. Re-initialize
terraform init -upgrade

# 4. Commit .terraform.lock.hcl
git add .terraform.lock.hcl
git commit -m "lock: pin Google provider to 5.26.0"
```
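The `~> 5.26.0` constraint means "same major and minor version, patch at or above 0". To make that rule concrete, here is a minimal sketch of the check in bash; `satisfies_patch_pin` is a hypothetical helper, and it assumes plain three-part numeric versions (no pre-release suffixes).

```bash
#!/usr/bin/env bash
# satisfies_patch_pin VERSION BASE
# Returns 0 when VERSION satisfies "~> BASE" for a three-part BASE (X.Y.Z):
# same major, same minor, and patch >= Z.
# Assumption: both arguments are plain numeric X.Y.Z strings.
satisfies_patch_pin() {
  local v_major v_minor v_patch b_major b_minor b_patch
  IFS=. read -r v_major v_minor v_patch <<< "$1"
  IFS=. read -r b_major b_minor b_patch <<< "$2"
  [ "$v_major" -eq "$b_major" ] &&
    [ "$v_minor" -eq "$b_minor" ] &&
    [ "$v_patch" -ge "$b_patch" ]
}
```

With a base of `5.26.0`, version `5.26.3` passes while `5.27.0` and `6.0.0` are rejected, which matches the error shown above for an installed `6.0.0`.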
GCP Permission Errors
Problem: Service account lacks required GCP permissions.
```
Error: Error creating PubSub topic: googleapi: Error 403:
The caller does not have permission
```
Solution:
```bash
# 1. Check current authentication
gcloud auth list
gcloud config get-value project

# 2. Verify service account permissions
gcloud projects get-iam-policy ecp-wtr-supplier-charges-prod \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:app-runtime@*"

# 3. Grant required role
gcloud projects add-iam-policy-binding ecp-wtr-supplier-charges-prod \
  --member="serviceAccount:app-runtime@project.iam.gserviceaccount.com" \
  --role="roles/pubsub.editor"

# 4. Re-plan
terraform plan
```
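To avoid granting a role that is already bound, the policy query can be reduced to a yes/no check. The sketch below assumes the IAM query is run with `--format="value(bindings.role)"` so that it prints one role per line; `has_role` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# has_role ROLE
# Reads role names on stdin (one per line) and returns 0 when ROLE
# is present as an exact, whole-line match.
has_role() {
  grep -qxF -- "$1"
}

# Usage (requires gcloud; PROJECT_ID and SA_EMAIL are placeholders):
# gcloud projects get-iam-policy PROJECT_ID \
#   --flatten="bindings[].members" \
#   --filter="bindings.members:serviceAccount:SA_EMAIL" \
#   --format="value(bindings.role)" | has_role roles/pubsub.editor
```

The match is exact, so a broader role that also grants the permission (such as a project-level owner role) will not be reported as a match for `roles/pubsub.editor`.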
State Out of Sync
Problem: Terraform state doesn't match actual infrastructure.
```bash
# Detect drift
terraform plan
# Drift shows up as proposed changes you did not make in your .tf files

# Sync state
terraform apply -refresh-only  # replaces the deprecated `terraform refresh`
# Updates state to match real infrastructure

# Manual fix (if refresh fails)
terraform import google_pubsub_topic.incoming \
  projects/ecp-wtr-supplier-charges-prod/topics/my-topic

# Remove from state (if resource manually deleted)
terraform state rm google_pubsub_topic.incoming
```
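Drift detection is easier to automate with `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The helper below simply translates that code into a message; `describe_plan_exit` is a hypothetical name.

```bash
#!/usr/bin/env bash
# describe_plan_exit CODE
# Translates the exit code of `terraform plan -detailed-exitcode`.
describe_plan_exit() {
  case "$1" in
    0) echo "in sync: no changes" ;;
    1) echo "error: plan failed" ;;
    2) echo "drift or pending changes" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage (requires terraform in an initialized directory):
# terraform plan -detailed-exitcode >/dev/null; describe_plan_exit "$?"
```

Exit code 2 does not distinguish drift from changes you made in configuration, so compare the plan output against your recent commits before acting on it.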
Step 5: Review and Recover
```bash
# Inspect the current state
terraform state list
terraform state show google_pubsub_topic.incoming

# Review the saved plan file
terraform show tfplan | head -50

# Rollback by re-applying previous configuration
git checkout HEAD~1  # Go back one commit
terraform plan
terraform apply
```
Examples
Example 1: Debugging State Lock
```bash
# Error occurs
# Error: Error acquiring the state lock
# Lock Info:
#   ID:      abc123def456
#   Path:    gs://terraform-state-prod/supplier-charges-hub/default.tflock
#   Created: 2025-11-14 10:30:00 UTC

# Step 1: Check if operation is running
gcloud compute operations list --filter="status:RUNNING"

# Step 2: If no running operation, force unlock
terraform force-unlock abc123def456

# Step 3: If force-unlock fails, delete lock file
gsutil rm gs://terraform-state-prod/supplier-charges-hub/default.tflock

# Step 4: Re-plan to verify state is correct
terraform apply -refresh-only  # replaces the deprecated `terraform refresh`
terraform plan
```
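Copying the lock ID by hand is error-prone, so it can be pulled out of the error text instead. This is a minimal sketch: `extract_lock_id` is a hypothetical helper, and it assumes the output contains an `ID:` label followed by the lock ID on the same line, as in the Lock Info block of this example.

```bash
#!/usr/bin/env bash
# extract_lock_id
# Reads Terraform output on stdin and prints the value that follows
# the first "ID:" label. Scans every field so that leading decoration
# (indentation or border characters) does not matter.
extract_lock_id() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") { print $(i + 1); exit } }'
}

# Usage (requires terraform):
# lock_id="$(terraform plan 2>&1 | extract_lock_id)"
# [ -n "$lock_id" ] && echo "Stale lock candidate: $lock_id"
```

The function only reports the ID; confirm that no operation is still running before passing it to `terraform force-unlock`.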
Example 2: Fixing Syntax Error
```bash
# Error occurs
# Error: Invalid value for module argument

# Step 1: Validate syntax
terraform validate

# Output shows exactly what's wrong:
# Error: Missing required argument
#   on pubsub.tf line 5, in resource "google_pubsub_topic" "topics":
#   5: resource "google_pubsub_topic" "topics" {
# The argument "name" is required, but was not set.
```
```hcl
# Step 2: Review and fix the file
# Add missing argument:
resource "google_pubsub_topic" "topics" {
  name = "my-topic"  # Add this
}
```
```bash
# Step 3: Validate again
terraform validate
```
Example 3: GCP Permission Recovery
```bash
# Error occurs
# Error creating PubSub topic: googleapi: Error 403

# Step 1: Check authentication
gcloud auth list
gcloud config get-value project

# Step 2: Get current IAM bindings
gcloud projects get-iam-policy ecp-wtr-supplier-charges-prod

# Step 3: Add Pub/Sub Editor role
gcloud projects add-iam-policy-binding ecp-wtr-supplier-charges-prod \
  --member="serviceAccount:terraform@project.iam.gserviceaccount.com" \
  --role="roles/pubsub.editor"

# Step 4: Re-run Terraform
terraform plan
terraform apply
```
Requirements
- Terraform 1.x+ installed
- GCP credentials configured (gcloud auth or service account key)
- A writable location for log files (for TF_LOG_PATH)
- GCP CLI tools installed (`gcloud`, `gsutil`)
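A quick preflight check can confirm the command-line tools are on the `PATH` before you start debugging. The function below is a sketch; `missing_tools` is a hypothetical name, and it only checks that each command exists, not its version or credentials.

```bash
#!/usr/bin/env bash
# missing_tools TOOL...
# Prints the name of every listed tool that is not found on PATH.
# Prints nothing when all tools are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage:
# missing="$(missing_tools terraform gcloud gsutil)"
# [ -z "$missing" ] || echo "Install before continuing: $missing"
```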
See Also
- terraform-basics - General Terraform reference
- terraform-state-management - Advanced state patterns
- terraform-gcp-integration - GCP-specific issues