DevOps IaC Engineer
This skill provides expertise in designing and managing cloud infrastructure using Infrastructure as Code (IaC) and DevOps/SRE best practices.
When to Use
Designing cloud architecture (AWS, GCP, Azure)
Implementing or refactoring CI/CD pipelines
Setting up observability (logging, metrics, tracing)
Creating Kubernetes clusters and container orchestration strategies
Implementing security controls and compliance checks
Improving system reliability (SLO/SLA, Disaster Recovery)
Infrastructure as Code (IaC) Principles
Declarative Code: Use Terraform/OpenTofu to define the desired state.
GitOps: The code repository is the single source of truth; changes are applied via PRs and automated pipelines.
Immutable Infrastructure: Replace servers/containers rather than patching them in place.
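The declarative model can be illustrated with a minimal Python sketch of a plan step: compare a desired state against the observed state and derive the actions needed to converge. The flat dictionary representation and the resource names are hypothetical simplifications; real tools such as Terraform track much richer state and dependency graphs.

```python
from typing import Dict, List, Tuple

# Hypothetical flat representation: resource name -> attribute dict.
State = Dict[str, Dict[str, str]]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return (action, resource) pairs that move `actual` to `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            # Immutable infrastructure: a changed resource is replaced,
            # never patched in place.
            actions.append(("replace", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {
        "vm-web": {"image": "web-v2", "size": "small"},
        "bucket-logs": {"versioning": "on"},
    }
    actual = {
        "vm-web": {"image": "web-v1", "size": "small"},
        "vm-legacy": {"image": "old", "size": "large"},
    }
    for action, name in plan(desired, actual):
        print(action, name)
```

Treating every drift as a replacement is a deliberate simplification that mirrors the immutable-infrastructure principle; Terraform itself can update some attributes in place.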
Core Domains
1. Terraform & IaC
Use modules for reusability.
Separate state by environment (dev, stage, prod) and region.
Automate `plan` and `apply` in CI/CD.
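Separating state and automating plan and apply can be sketched in Python: a helper derives a distinct state key per environment and region, and another builds the Terraform command sequence a CI job might run. The key layout (`<env>/<region>/terraform.tfstate`) and the `tfplan` file name are assumptions; `-backend-config`, `-input=false`, and `-out` are standard Terraform CLI options, but the `key` setting applies only to backends that use it, such as the S3 backend.

```python
from typing import List

ENVIRONMENTS = ("dev", "stage", "prod")


def state_key(env: str, region: str) -> str:
    """Hypothetical layout: one isolated state file per environment and region."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return f"{env}/{region}/terraform.tfstate"


def terraform_commands(env: str, region: str) -> List[List[str]]:
    """Commands a CI job could run; apply consumes the saved plan file."""
    key = state_key(env, region)
    return [
        ["terraform", "init", "-input=false", f"-backend-config=key={key}"],
        ["terraform", "plan", "-input=false", "-out=tfplan"],
        ["terraform", "apply", "-input=false", "tfplan"],
    ]


if __name__ == "__main__":
    for cmd in terraform_commands("stage", "eu-west-1"):
        print(" ".join(cmd))
```

Applying the saved `tfplan` file ensures that the changes reviewed at plan time are exactly the ones applied. A CI runner would typically execute each list with `subprocess.run(cmd, check=True)` so that a failing step stops the pipeline.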
2. Kubernetes & Containers
Build small, stateless containers.
Use Helm or Kustomize for resource management.
Set resource requests and limits for every container.
Use namespaces for isolation.
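Requests and limits are declared per container under `resources.requests` and `resources.limits` in the pod spec. The Python sketch below builds such a container fragment as a dictionary and rejects any request that exceeds its limit, a rule the Kubernetes API server also enforces. The quantity parser handles only the `m` CPU suffix and the binary memory suffixes `Ki`, `Mi`, and `Gi`, a deliberate subset; the container and image names are hypothetical.

```python
from typing import Dict

# Subset of Kubernetes quantity suffixes; real quantities support more forms.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_quantity(q: str) -> float:
    """Convert '250m' (millicores) or '128Mi' (bytes) to base units."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return float(q)


def container(name: str, image: str, requests: Dict[str, str],
              limits: Dict[str, str]) -> dict:
    """Build a container fragment, rejecting any request above its limit."""
    for resource, req in requests.items():
        lim = limits.get(resource)
        if lim is not None and parse_quantity(req) > parse_quantity(lim):
            raise ValueError(f"{resource} request {req} exceeds limit {lim}")
    return {
        "name": name,
        "image": image,
        "resources": {"requests": requests, "limits": limits},
    }


if __name__ == "__main__":
    spec = container(
        "web", "example.com/web:1.0",  # hypothetical image
        requests={"cpu": "250m", "memory": "128Mi"},
        limits={"cpu": "500m", "memory": "256Mi"},
    )
    print(spec)
```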
3. CI/CD Pipelines
CI: Lint, test, build, and run security scans on every commit.
CD: Deploy automatically to lower environments; require manual approval for production.
Use tools such as GitHub Actions, Cloud Build, or ArgoCD.
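The CI/CD policy can be expressed as a small promotion gate. This Python sketch uses hypothetical names throughout: an ordered tuple of environments, a per-commit record of CI stage results, and an `approved_by` argument standing in for whatever approval mechanism the pipeline tool provides (for example, required reviewers on a protected GitHub Actions environment).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical policy: only "prod" requires a manual approval.
ENVIRONMENTS = ("dev", "stage", "prod")
MANUAL_APPROVAL = {"prod"}
CI_STAGES = ("lint", "test", "build", "security-scan")


@dataclass
class Commit:
    sha: str
    ci_results: Dict[str, bool] = field(default_factory=dict)


def can_deploy(commit: Commit, env: str, approved_by: Optional[str] = None) -> bool:
    """Allow deployment only if every CI stage passed and approvals are met."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    # A missing stage result counts as a failure.
    if not all(commit.ci_results.get(stage, False) for stage in CI_STAGES):
        return False
    if env in MANUAL_APPROVAL:
        return approved_by is not None
    return True  # lower environments deploy automatically


if __name__ == "__main__":
    c = Commit("abc123", {s: True for s in CI_STAGES})
    print(can_deploy(c, "stage"))
    print(can_deploy(c, "prod"))
    print(can_deploy(c, "prod", approved_by="alice"))
```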
4. Observability
Logs: Centralized logging (e.g., Cloud Logging, ELK).
Metrics: Prometheus/Grafana or Cloud Monitoring.
Tracing: OpenTelemetry for distributed tracing.
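Centralized logging works best when services emit structured records that a collector can parse. The Python sketch below uses only the standard `logging` and `json` modules to write one JSON object per line, including a `trace_id` field so logs can be correlated with distributed traces. The field names are assumptions; in practice an OpenTelemetry SDK would supply the active trace ID rather than the hard-coded value shown here.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line for log collectors."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical correlation field, supplied via `extra=`.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    # Hard-coded trace ID for illustration only.
    log.info("order placed", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})
```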
5. Security (DevSecOps)
Scan IaC for misconfigurations (e.g., Checkov, Trivy).
Manage secrets with Secret Manager or Vault (never in code).
Grant least-privilege IAM roles.
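Least privilege can be checked mechanically before a policy is applied. The Python sketch below flags `Allow` statements in an AWS-style IAM policy document that grant wildcard actions or resources. It is a simplified illustration of the kind of rule that scanners such as Checkov apply, not a substitute for them; it ignores `NotAction`, conditions, and partial wildcards, and the example policy is hypothetical.

```python
from typing import Any, Dict, List, Union

StrOrList = Union[str, List[str]]


def _as_list(value: StrOrList) -> List[str]:
    # IAM allows either a single string or a list for Action/Resource.
    return [value] if isinstance(value, str) else list(value)


def wildcard_findings(policy: Dict[str, Any]) -> List[str]:
    """Return findings for Allow statements with wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    findings: List[str] = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {i}: wildcard resource")
    return findings


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": "arn:aws:s3:::example-bucket/*"},
        ],
    }
    for finding in wildcard_findings(policy):
        print(finding)
```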
SRE Practices
SLI/SLO: Define Service Level Indicators and Objectives for critical user journeys.
Error Budgets: Use error budgets to balance innovation and reliability.
Post-Mortems: Conduct blameless post-mortems for incidents.
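The error-budget idea reduces to simple arithmetic: for an SLO target expressed as a fraction, the budget over a window is the number of failures the target tolerates, and the remaining budget is what is left after observed failures. The Python sketch assumes a request-based availability SLI (successful requests divided by total requests); the 99.9% target and the request counts in the example are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    target: float         # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int
    failed_requests: int

    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    def budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1 - self.target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        allowed = self.budget()
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / allowed


if __name__ == "__main__":
    w = SloWindow(target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"SLI={w.sli():.4%} budget={w.budget():.0f} remaining={w.budget_remaining():.0%}")
```

A remaining budget near or below zero is the usual signal to slow feature releases in favor of reliability work, which is how the budget balances the two.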