terragrunt

Terragrunt Infrastructure Skill

Manage bare-metal Kubernetes infrastructure from PXE boot to running clusters.
For architecture overview (units vs modules, config centralization), see infrastructure/CLAUDE.md. For detailed unit patterns, see infrastructure/units/CLAUDE.md.

Task Commands (Always Use These)

bash
# Validation (run in order)
task tg:fmt                 # Format HCL files
task tg:test-<module>       # Test specific module (e.g., task tg:test-config)
task tg:validate-<stack>    # Validate stack (e.g., task tg:validate-integration)

# Operations
task tg:list                # List available stacks
task tg:plan-<stack>        # Plan (e.g., task tg:plan-integration)
task tg:apply-<stack>       # Apply (REQUIRES HUMAN APPROVAL)
task tg:gen-<stack>         # Generate stack files
task tg:clean-<stack>       # Clean generated files

**NEVER** run `terragrunt` or `tofu` directly; always use `task` commands.
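
The validation order can be wrapped in a small helper so the three steps always run in sequence and stop at the first failure. This is a minimal sketch, not part of the repository: the function name `tg_validate` and the `TASK_BIN` override are hypothetical, and it assumes the `tg:fmt`, `tg:test-<module>`, and `tg:validate-<stack>` tasks behave as listed.

```bash
# Hypothetical helper: run the three validation tasks in the documented order
# and stop at the first failure. TASK_BIN is an assumed override that makes
# the function easy to dry-run; it defaults to the real `task` binary.
tg_validate() {
  local module="$1" stack="$2"
  local runner="${TASK_BIN:-task}"

  if [ -z "$module" ] || [ -z "$stack" ]; then
    echo "usage: tg_validate <module> <stack>" >&2
    return 2
  fi

  # && short-circuits, so a failing step prevents the later ones from running.
  "$runner" tg:fmt \
    && "$runner" "tg:test-${module}" \
    && "$runner" "tg:validate-${stack}"
}
```

For example, `tg_validate config integration` would run `task tg:fmt`, then `task tg:test-config`, then `task tg:validate-integration`.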

How to Add a Machine

  1. Edit `inventory.hcl`:
hcl
node50 = {
  cluster = "live"
  type    = "worker"
  install = {
    selector     = "disk.model == 'Samsung'"
    architecture = "amd64"
  }
  interfaces = [{
    id           = "eth0"
    hardwareAddr = "aa:bb:cc:dd:ee:ff"  # VERIFY correct
    addresses    = [{ ip = "192.168.10.50" }]  # VERIFY available
  }]
}
  2. Run `task tg:plan-live`
  3. Review the plan; the config module auto-includes machines where `cluster == "live"`
  4. Request human approval before apply
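
The two `VERIFY` comments in the example can be partly automated before editing the file. The sketch below is a hypothetical pre-flight check (the function name `check_new_machine` is not from the repository). It assumes MAC and IP values appear in `inventory.hcl` as bare quoted strings, as in the example, and it only detects duplicates inside the file; whether the address is free on the network still has to be confirmed separately.

```bash
# Hypothetical pre-flight check before adding a machine to inventory.hcl.
# Fails if the MAC is malformed, or if the MAC or IP already appears in the file.
check_new_machine() {
  local inventory="$1" mac="$2" ip="$3"
  # Six colon-separated hex octets, e.g. aa:bb:cc:dd:ee:ff
  local mac_re='^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'

  if ! [[ "$mac" =~ $mac_re ]]; then
    echo "invalid MAC address: $mac" >&2
    return 1
  fi

  # -F: fixed string, -i: MACs may differ in case, -q: no output
  if grep -Fiq -- "\"$mac\"" "$inventory"; then
    echo "MAC already in inventory: $mac" >&2
    return 1
  fi

  # Match the IP as a complete quoted value so 192.168.10.5 does not
  # collide with 192.168.10.50.
  if grep -Fq -- "\"$ip\"" "$inventory"; then
    echo "IP already in inventory: $ip" >&2
    return 1
  fi
}
```

A call such as `check_new_machine inventory.hcl aa:bb:cc:dd:ee:ff 192.168.10.50` returns zero only when neither value is already present.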

How to Add a Feature Flag

  1. Add version to `versions.hcl` if needed
  2. Add feature detection in `modules/config/main.tf`:
hcl
locals {
  new_feature_enabled = contains(var.features, "new-feature")
}
  3. Enable in the stack's features list:
hcl
features = ["gateway-api", "longhorn", "new-feature"]

How to Create a New Unit

  1. Create `units/new-unit/terragrunt.hcl`:
hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../../.././/modules/new-unit"
}

dependency "config" {
  config_path = "../config"
  mock_outputs = { new_unit = {} }
}

inputs = dependency.config.outputs.new_unit
  2. Create corresponding `modules/new-unit/` with `variables.tf`, `main.tf`, `outputs.tf`, `versions.tf`
  3. Add output from config module
  4. Add `unit` block to stacks that need it
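
The module directory from step 2 can be scaffolded with a few lines of shell. This is a sketch under assumptions: `scaffold_module` is a hypothetical name, and the first argument is whichever directory contains `modules/` in your checkout. It creates empty files only; the contents of each file still have to be written by hand.

```bash
# Hypothetical scaffold for step 2: create modules/<name>/ with the four
# files listed above. Existing files are left untouched.
scaffold_module() {
  local root="$1" name="$2"
  local dir="${root}/modules/${name}"
  local file

  mkdir -p "$dir" || return 1
  for file in variables.tf main.tf outputs.tf versions.tf; do
    # Create the file only if it is missing, so reruns never clobber work.
    [ -e "${dir}/${file}" ] || : > "${dir}/${file}" || return 1
  done
}
```

For example, `scaffold_module . new-unit` run from the directory that holds `modules/`.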

How to Write Module Tests

Tests use OpenTofu native testing in `modules/<name>/tests/*.tftest.hcl`:
hcl
# Top-level variables set defaults for ALL run blocks
variables {
  name     = "test-cluster"
  features = ["gateway-api"]
  machines = {
    node1 = {
      cluster = "test-cluster"
      type    = "controlplane"
      # ... complete machine definition
    }
  }
}

run "feature_enabled" {
  command = plan

  variables {
    features = ["prometheus"] # Only override what differs
  }

  assert {
    condition     = output.prometheus_enabled == true
    error_message = "Prometheus should be enabled"
  }
}

Run with `task tg:test-config`, or `task tg:test` for all modules.

Safety Rules

  • NEVER run apply without explicit human approval
  • NEVER use `--auto-approve` flags
  • NEVER guess MAC addresses or IPs; verify against `inventory.hcl`
  • NEVER commit `.terragrunt-cache/` or `.terragrunt-stack/`
  • NEVER manually edit Terraform state
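
The rule about generated directories lends itself to a mechanical check. The following is a minimal sketch of a guard that could be fed a list of staged paths; the function name `reject_generated_paths` is hypothetical, and wiring it into a hook is left to the reader.

```bash
# Hypothetical guard: read file paths on stdin (one per line) and fail if any
# of them sits inside .terragrunt-cache/ or .terragrunt-stack/.
reject_generated_paths() {
  local path status=0
  while IFS= read -r path; do
    case "$path" in
      .terragrunt-cache/*|*/.terragrunt-cache/*|.terragrunt-stack/*|*/.terragrunt-stack/*)
        echo "generated path must not be committed: $path" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

One way to use it is `git diff --cached --name-only | reject_generated_paths`, which exits non-zero if any staged file lives under either directory.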

State Operations

When removing state entries with indexed resources (e.g., `this["rpi4"]`), `xargs` strips the quotes, causing errors. Use a `while` loop instead:
bash
# WRONG - xargs mangles quotes in resource names
terragrunt state list | xargs -n 1 terragrunt state rm

# CORRECT - while loop preserves quotes
terragrunt state list | while read -r resource; do terragrunt state rm "$resource"; done

This applies to any state operation on resources with map keys like `data.talos_machine_configuration.this["rpi4"]`.
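
The difference is easy to reproduce without touching any state. In this sketch, `echo` and `printf` stand in for `terragrunt state rm`, and the function names `via_xargs` and `via_while` are made up for the demonstration.

```bash
# Stand-ins for `terragrunt state rm`: print each argument as it is received.

# xargs applies its own quote processing to its input, so the double quotes
# around the map key are consumed before the command ever sees them.
via_xargs() {
  xargs -n 1 echo
}

# read -r hands each line over unchanged, and "$resource" keeps it intact.
via_while() {
  local resource
  while IFS= read -r resource; do
    printf '%s\n' "$resource"
  done
}
```

Piping `data.talos_machine_configuration.this["rpi4"]` through `via_xargs` prints `data.talos_machine_configuration.this[rpi4]`, while `via_while` prints the address unchanged.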

Validation Checklist

Before requesting apply approval:
  • `task tg:fmt` passes
  • `task tg:test` passes (if module tests exist)
  • `task tg:validate` passes for ALL stacks
  • `task tg:plan-<stack>` reviewed
  • No unexpected destroys in plan
  • Network changes won't break connectivity
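
The destroy check can be given a first pass in shell before a human reads the plan. This sketch assumes the plan output contains the usual OpenTofu summary line (`Plan: N to add, N to change, N to destroy.`) and that the `task` wrapper passes that output through unchanged; `plan_has_no_destroys` is a hypothetical name. It flags every destroy, so deciding whether one is expected remains a human judgment.

```bash
# Hypothetical check: read plan output on stdin and fail if any summary line
# reports one or more resources to destroy.
plan_has_no_destroys() {
  local line count=0
  local re='([0-9]+) to destroy'

  while IFS= read -r line; do
    # Summary format: "Plan: 1 to add, 0 to change, 2 to destroy."
    # A stack plan may print one summary per unit, so the counts are summed.
    if [[ "$line" =~ $re ]]; then
      count=$(( count + 10#${BASH_REMATCH[1]} ))
    fi
  done

  if [ "$count" -gt 0 ]; then
    echo "plan destroys $count resource(s); review before requesting approval" >&2
    return 1
  fi
}
```

For example, `task tg:plan-integration | plan_has_no_destroys` would exit non-zero when the plan removes anything.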

References

  • stacks.md - Detailed Terragrunt stacks documentation
  • units.md - Detailed Terragrunt units documentation