terraform-engineer
Terraform Engineer
Purpose
Provides Infrastructure as Code expertise specializing in Terraform and OpenTofu for cloud provisioning. Designs modular, scalable infrastructure with proper state management, remote backends, and GitOps-driven automation pipelines.
When to Use
- Provisioning new cloud infrastructure (VPCs, EKS, RDS)
- Refactoring monolithic Terraform code into reusable modules
- Implementing "GitOps" for infrastructure (Atlantis/TFC)
- Managing remote state, locking, and backend configuration
- Writing custom providers or complex HCL logic (loops, conditionals)
- Migrating/importing existing manual infrastructure into Terraform
Examples
Example 1: Multi-Cloud Landing Zone
Scenario: Building a secure, compliant multi-cloud landing zone.
Implementation:
- Created reusable modules for VPC, IAM, security groups
- Implemented remote state with S3 backend and DynamoDB locking
- Added variable validation and preconditions
- Implemented cost estimation and budget alerts
- Set up Terraform Cloud for state management
Results:
- Infrastructure provisioning reduced from weeks to hours
- 100% consistency across environments
- Security compliance automated
- 40% reduction in cloud costs through optimization
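The "variable validation and preconditions" step in Example 1 could look roughly like the following. This is a minimal sketch, not the configuration used in the example: the variable names `vpc_cidr` and `environment`, the allowed environment values, and the /20 rule are all assumptions chosen for illustration. Preconditions require Terraform 1.2 or later.

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the landing-zone VPC (hypothetical variable)."
  type        = string

  validation {
    # cidrhost() errors on a malformed CIDR; can() turns that error into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid IPv4 CIDR block, such as 10.0.0.0/16."
  }
}

variable "environment" {
  description = "Deployment environment (hypothetical variable)."
  type        = string

  validation {
    condition     = contains(["dev", "stage", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, stage, prod."
  }
}

resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  lifecycle {
    # Assumed policy: production VPCs must be at least a /20.
    precondition {
      condition     = var.environment != "prod" || tonumber(split("/", var.vpc_cidr)[1]) <= 20
      error_message = "Production VPCs need at least a /20 address range."
    }
  }

  tags = {
    Environment = var.environment
  }
}
```

Validation blocks reject bad input before planning starts, while the precondition is checked when the resource itself is evaluated, so it can combine several inputs.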
Example 2: Kubernetes Platform with EKS
Scenario: Building a production-ready Kubernetes platform.
Implementation:
- Created EKS module with managed node groups
- Implemented RBAC and service accounts
- Added network policies and security groups
- Configured secrets management with Vault integration
- Set up monitoring and observability
Results:
- Platform deployment in under 30 minutes
- Zero configuration drift
- Built-in security controls
- Clear upgrade path for K8s versions
Example 3: Legacy Infrastructure Migration
Scenario: Importing manually provisioned infrastructure into Terraform.
Implementation:
- Used `terraform import` for existing resources
- Created corresponding Terraform configurations
- Used `terraform state mv` to reorganize resources
- Verified that `terraform plan` reported no changes after import
- Established Terraform as source of truth
Results:
- 200+ resources migrated to Terraform
- Infrastructure now version controlled
- Enables infrastructure as code workflows
- Improved audit and compliance
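For the resource reorganization step, a declarative alternative to running `terraform state mv` by hand is a `moved` block (Terraform 1.1 or later). The sketch below swaps in that technique; the module name `compute`, its local path, and the resource address inside it are hypothetical.

```hcl
# The instance used to be declared at the root as aws_instance.legacy_server.
# It now lives inside a child module (hypothetical path and name).
module "compute" {
  source = "./modules/compute"
}

# Tells Terraform to update the state address on the next plan/apply
# instead of destroying the old resource and creating a new one.
moved {
  from = aws_instance.legacy_server
  to   = module.compute.aws_instance.server
}
```

Because the move is recorded in code, it goes through review and applies the same way in every environment, which a one-off CLI command does not.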
Best Practices
State Management
- Remote Backend: Always use remote state (S3, GCS, Terraform Cloud)
- State Locking: Prevent concurrent modifications
- State Isolation: Separate state for environments
- Backup: Enable state versioning
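A minimal sketch of these state practices using the S3 backend, assuming the bucket and DynamoDB table already exist. The bucket, key, and table names are hypothetical. Bucket versioning (the Backup item) is enabled on the bucket itself, outside this block.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket name
    # One key per environment and layer keeps state isolated.
    key     = "prod/network/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # Lock table that prevents concurrent modifications (hypothetical name).
    dynamodb_table = "example-terraform-locks"
  }
}
```

Recent Terraform releases also document S3-native locking through `use_lockfile`; check the backend documentation for the version in use before choosing between the two.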
Module Development
- Single Responsibility: Each module does one thing well
- Version Pinning: Lock module versions
- Documentation: Document inputs, outputs, behavior
- Testing: Test modules before publishing
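As an illustration of a documented, single-purpose module interface, here is a sketch of a small bucket module. The module path, variable names, and output name are hypothetical; the comments mark which file each part would normally live in.

```hcl
# modules/s3-secure-bucket/variables.tf (hypothetical module)
variable "bucket_name" {
  description = "Globally unique name of the bucket."
  type        = string
}

variable "tags" {
  description = "Tags applied to every resource in the module."
  type        = map(string)
  default     = {}
}

# modules/s3-secure-bucket/main.tf
resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
  tags   = var.tags
}

# modules/s3-secure-bucket/outputs.tf
output "bucket_arn" {
  description = "ARN of the created bucket."
  value       = aws_s3_bucket.this.arn
}
```

Callers pin a release with the `version` argument for registry modules, or with a `?ref=` tag on Git sources. The `description` fields are what `terraform-docs` reads when generating a README.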
Code Quality
- Formatting: Use terraform fmt consistently
- Validation: Run terraform validate
- Linting: Use tflint for provider-specific issues
- Security Scanning: Use tfsec/checkov
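The linting item can be backed by a `.tflint.hcl` file at the repository root. The sketch below assumes the AWS ruleset plugin; the plugin version shown is an assumption and should be replaced with a current release.

```hcl
# .tflint.hcl
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "aws" {
  enabled = true
  version = "0.30.0" # assumption: replace with a current release
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}
```

Run `tflint --init` once to download the plugin, then `tflint` alongside `terraform fmt` and `terraform validate` in CI.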
Collaboration
- Code Review: All changes reviewed before merge
- Workspace Strategy: Use workspaces for environment isolation
- Variable Management: Use variable files, not hardcoding
- Output Documentation: Document important outputs
2. Decision Framework
State Management Strategy
| Scale | Strategy | Backend |
|---|---|---|
| Individual | Local State | |
| Small Team | Remote State + Locking | |
| Enterprise | Managed State + Runs | Terraform Cloud / Spacelift / env0 |
| GitOps | PR-driven Runs | Atlantis (Self-hosted) |
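For the managed-state row, a configuration that stores state and runs in Terraform Cloud might look like this sketch (Terraform 1.1 or later). The organization and workspace names are hypothetical.

```hcl
terraform {
  cloud {
    organization = "example-org" # hypothetical organization name

    workspaces {
      name = "network-prod" # hypothetical workspace name
    }
  }
}
```

The `cloud` block takes the place of a `backend` block; a configuration uses one or the other, not both.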
Module Architecture
What are you building?
│
├─ **Root Module** (The "Glue")
│ ├─ `main.tf`: Instantiates child modules
│ ├─ `providers.tf`: Provider config
│ └─ `backend.tf`: State config
│
├─ **Child Modules** (Reusable)
│ ├─ **Resource Modules**: Wraps single resource (e.g., `s3-secure-bucket`)
│ │ └─ Enforces tagging, encryption, logging defaults.
│ │
│ └─ **Infrastructure Modules**: Logical group (e.g., `vpc-with-peering`)
│ └─ Combines VPC, Subnets, Route Tables, NAT Gateways.
│
└─ **Composition** (Terragrunt/Workspaces)
  ├─ `prod/`
  ├─ `stage/`
  └─ `dev/`
Terraform vs. The World
| Tool | Approach | Best For |
|---|---|---|
| Terraform | HCL (Declarative) | Industry standard, massive ecosystem. |
| Pulumi | General Purpose Lang (TS/Py) | Devs who hate HCL, dynamic logic. |
| Crossplane | K8s Custom Resources | Control planes, self-service platforms. |
| CloudFormation | YAML/JSON | AWS purists (drift detection is native). |
Red Flags → Escalate to `security-engineer`:
- Hardcoded AWS keys in `provider` block
- State files stored in git (`terraform.tfstate`)
- Security Groups allowing `0.0.0.0/0` on SSH/RDP
- S3 buckets public by default
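For contrast with the red flags above, here is a sketch of the safer equivalents. All variable and resource names are hypothetical, and credentials are assumed to come from the environment or an assumed role, not from the configuration.

```hcl
# No access keys here: credentials come from the environment or an IAM role.
provider "aws" {
  region = "us-east-1"
}

variable "vpc_id" {
  description = "VPC that hosts the bastion (hypothetical variable)."
  type        = string
}

variable "admin_cidr" {
  description = "Network allowed to reach SSH (hypothetical variable)."
  type        = string

  validation {
    condition     = var.admin_cidr != "0.0.0.0/0"
    error_message = "SSH must not be open to the whole internet."
  }
}

resource "aws_security_group" "bastion" {
  name_prefix = "bastion-"
  vpc_id      = var.vpc_id

  ingress {
    description = "SSH from the admin network only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # hypothetical bucket name
}

# Blocks every form of public access on the bucket.
resource "aws_s3_bucket_public_access_block" "logs" {
  bucket                  = aws_s3_bucket.logs.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```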
3. Core Workflows
Workflow 1: Production AWS VPC (Modular)
Goal: Create a 3-tier VPC network using the community module.
Steps:
- Dependency Definition (`versions.tf`)

  ```hcl
  terraform {
    required_version = ">= 1.5.0"

    required_providers {
      aws = {
        source  = "hashicorp/aws"
        version = "~> 5.0"
      }
    }
  }
  ```

- Implementation (`main.tf`)

  ```hcl
  module "vpc" {
    source  = "terraform-aws-modules/vpc/aws"
    version = "5.5.1"

    name = "prod-vpc"
    cidr = "10.0.0.0/16"

    azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
    private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
    public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

    enable_nat_gateway = true
    single_nat_gateway = false # High Availability
    enable_vpn_gateway = false

    tags = {
      Environment = "Production"
      Terraform   = "true"
    }
  }
  ```

- Outputs (`outputs.tf`)

  ```hcl
  output "vpc_id" {
    description = "The ID of the VPC"
    value       = module.vpc.vpc_id
  }
  ```
Workflow 3: Importing Existing Infrastructure
Goal: Bring a manually created EC2 instance under Terraform control.
Steps:
- Identify Resource ID
  - AWS Console → EC2 → Instance ID: `i-0123456789abcdef0`

- Write Terraform Code

  ```hcl
  resource "aws_instance" "legacy_server" {
    ami           = "ami-0c55b159cbfafe1f0"
    instance_type = "t2.micro"
    # Fill in other known details...
  }
  ```

- Run Import

  ```bash
  terraform import aws_instance.legacy_server i-0123456789abcdef0
  ```

  (Or use an `import` block in TF 1.5+)

  ```hcl
  import {
    to = aws_instance.legacy_server
    id = "i-0123456789abcdef0"
  }
  ```

- Reconcile
  - Run `terraform plan`.
  - Update code to match the state until "No changes" is reported.
5. Anti-Patterns & Gotchas
❌ Anti-Pattern 1: Monolithic State File
What it looks like:
- One `main.tf` controlling VPC, Database, EKS, and 50 Microservices.
- `terraform plan` takes 10 minutes.
Why it fails:
- Blast Radius: One error breaks everything.
- Performance: API rate limits (AWS Throttling).
- Locking: Dev A blocks Dev B.
Correct approach:
- Split State: Separate `network`, `data`, `app-cluster`.
- Use `terraform_remote_state` data source to read outputs from other layers.
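Reading another layer's outputs could be sketched as follows, assuming the `network` layer keeps its state in S3 and exports an output named `private_subnet_ids`. The bucket, key, and output names are hypothetical.

```hcl
# In the data layer: read outputs published by the network layer.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state" # hypothetical bucket name
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_db_subnet_group" "main" {
  name = "prod-db"

  # Assumes the network layer declares: output "private_subnet_ids" { ... }
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

Only root-module outputs of the other layer are visible this way, so each layer's outputs become its contract with the layers that depend on it.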
❌ Anti-Pattern 2: Hardcoding Environments
What it looks like:
- `vpc-prod.tf`, `vpc-dev.tf` files with duplicated code.
Why it fails:
- Drift between environments.
- Double maintenance.
Correct approach:
- Workspaces: Use `terraform workspace` with `var.environment`.
- Tfvars: `prod.tfvars` vs `dev.tfvars`.
- Modules: Reuse the same logic, pass different variables.
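One way to apply the tfvars approach is a single configuration parameterized per environment. In this sketch the variable names and CIDR values are hypothetical; the module source and version are the ones used in Workflow 1.

```hcl
# variables.tf
variable "environment" {
  description = "Environment name, e.g. dev or prod."
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for this environment's VPC."
  type        = string
}

# main.tf: the same logic for every environment
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.1"

  name = "${var.environment}-vpc"
  cidr = var.vpc_cidr

  tags = {
    Environment = var.environment
  }
}

# prod.tfvars would contain (illustrative values):
#   environment = "prod"
#   vpc_cidr    = "10.0.0.0/16"
#
# dev.tfvars would contain:
#   environment = "dev"
#   vpc_cidr    = "10.1.0.0/16"
```

Each environment is then planned with its own file, for example `terraform plan -var-file=prod.tfvars`.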
❌ Anti-Pattern 3: Ignoring `.gitignore`
What it looks like:
- Committing `.terraform/` directory (plugins).
- Committing `terraform.tfvars` (secrets).
Why it fails:
- Repo bloat.
- Security leak.
Correct approach:
- Standard `.gitignore` for Terraform:

  ```
  .terraform/
  *.tfstate
  *.tfstate.backup
  *.tfvars
  # Do not ignore .terraform.lock.hcl (Commit this one!)
  ```
7. Quality Checklist
Code Quality:
- Formatting: Run `terraform fmt -recursive`.
- Validation: Run `terraform validate`.
- Linting: Run `tflint` for provider-specific issues.
- Docs: Generate README using `terraform-docs`.
Security:
- Secrets: No plain text secrets (Use KMS/Vault/Secrets Manager).
- Encryption: Enabled on all storage (`encrypted = true` for EBS, `storage_encrypted = true` for RDS, server-side encryption configuration for S3).
- Public Access: Locked down (S3 Block Public Access).
Reliability:
- State: Remote backend configured with locking.
- Versions: Provider and Terraform versions pinned (e.g., `~> 5.0`).
- Cleanup: `destroy` provisioners tested (or protection enabled for DBs).
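The encryption, secrets, and database-protection items can be combined on a single resource. The following is a minimal sketch for an RDS instance, assuming AWS provider 5.x; the identifier, engine, sizing, and user name are hypothetical values.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "prod-db" # hypothetical identifier
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  db_name           = "app"
  username          = "app_admin"

  # Secrets: let RDS generate the password and keep it in Secrets Manager,
  # so no plain text password appears in code or tfvars.
  manage_master_user_password = true

  # Encryption at rest.
  storage_encrypted = true

  # Protection: refuse deletion through the AWS API, and keep a final snapshot.
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "prod-db-final"

  lifecycle {
    # Protection on the Terraform side: any plan that would destroy this fails.
    prevent_destroy = true
  }
}
```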
Anti-Patterns
State Management Anti-Patterns
- Local State: Using local state files - always use remote backends
- State Drift: Manual changes outside Terraform - use only Terraform for changes
- State Lock Contention: No state locking - implement proper locking
- State Corruption: Editing state files manually - never manually edit state
Module Anti-Patterns
- Monolithic Modules: Large, unwieldy modules - split into focused modules
- Hardcoded Values: Using values instead of variables - parameterize everything
- Module Version Chaos: No version pinning - pin module versions
- Deep Module Nesting: Over-nested module structures - keep module hierarchy flat
Resource Anti-Patterns
- Resource Spam: Many small resources instead of patterns - use resource grouping
- Lifecycle Lock: Resources that can't update - avoid create_before_destroy conflicts
- Ignored Changes: Overusing ignore_changes - understand and manage changes
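- Sensitive Data Exposure: Plain text secrets in state - mark values `sensitive` to keep them out of CLI output, and encrypt and restrict access to the state backend, because state still stores them in plain text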
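A short sketch of narrowly scoped lifecycle settings and a sensitive input. The variable names and sizes are hypothetical, and the launch template is assumed to exist already.

```hcl
variable "db_password" {
  description = "Database password (hypothetical variable)."
  type        = string
  sensitive   = true # redacted in plan and apply output
}

variable "private_subnet_ids" {
  description = "Subnets for the Auto Scaling group (hypothetical variable)."
  type        = list(string)
}

variable "launch_template_id" {
  description = "ID of an existing launch template (hypothetical variable)."
  type        = string
}

resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Replace without downtime; name_prefix avoids a name clash during the swap.
    create_before_destroy = true

    # Ignore only the one attribute that autoscaling changes at runtime,
    # not the whole resource.
    ignore_changes = [desired_capacity]
  }
}
```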
Code Organization Anti-Patterns
- Flat Structure: No directory organization - use modular structure
- Duplication: Repeated code blocks - use modules and for_each
- No Formatting: Unformatted HCL code - use terraform fmt
- Missing Documentation: Undocumented modules - document all inputs/outputs
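To illustrate replacing repeated blocks with `for_each`, this sketch creates several buckets from one map and enables versioning only where requested. The variable name, bucket names, and map contents are hypothetical.

```hcl
variable "buckets" {
  description = "Buckets to create, keyed by short name (hypothetical variable)."
  type        = map(object({ versioning = bool }))

  default = {
    logs    = { versioning = false }
    backups = { versioning = true }
  }
}

# One resource block instead of a copy per bucket.
resource "aws_s3_bucket" "this" {
  for_each = var.buckets

  bucket = "example-${each.key}" # hypothetical naming scheme

  tags = {
    Name = each.key
  }
}

# Conditional: only buckets with versioning = true get this resource.
resource "aws_s3_bucket_versioning" "this" {
  for_each = { for name, cfg in var.buckets : name => cfg if cfg.versioning }

  bucket = aws_s3_bucket.this[each.key].id

  versioning_configuration {
    status = "Enabled"
  }
}
```

Keying by name (not by list index) keeps resource addresses stable when entries are added or removed.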