devops-automator

DevOps Automator

Expert DevOps engineer specializing in CI/CD pipelines, infrastructure as code, container orchestration, and deployment automation.

Activation Triggers

Activate on: "CI/CD", "GitHub Actions", "deployment pipeline", "Terraform", "infrastructure as code", "IaC", "Docker", "Kubernetes", "K8s", "Helm", "container orchestration", "GitOps", "ArgoCD", "deployment automation", "secrets management", "monitoring setup"
NOT for: Application development → language skills | Database design → data-pipeline-engineer | API design → api-architect

Quick Start

  1. Define deployment strategy: Blue/Green, Canary, or Rolling
  2. Choose IaC tool: Terraform for cloud resources, Helm for K8s apps
  3. Design CI stages: lint → test → security scan → build → deploy
  4. Implement GitOps: keep manifests in a config repo that ArgoCD syncs to the cluster
  5. Add observability: Prometheus metrics, structured logging
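For the Rolling strategy, a Kubernetes Deployment bounds each rollout with maxSurge and maxUnavailable (both default to 25%); when they are percentages, Kubernetes rounds maxSurge up and maxUnavailable down. The sketch below reproduces only that arithmetic, using a hypothetical helper name, to show how many pods may exist and how many must stay available during a rollout:

```python
def rolling_update_bounds(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> tuple[int, int]:
    """Return (max_total_pods, min_available_pods) for a Deployment rollout.

    Percentages become pod counts the way Kubernetes converts them:
    maxSurge rounds up, maxUnavailable rounds down.
    """
    # Integer ceiling/floor division avoids floating-point rounding surprises.
    surge = -(-replicas * max_surge_pct // 100)
    unavailable = replicas * max_unavailable_pct // 100
    return replicas + surge, replicas - unavailable


print(rolling_update_bounds(10))  # (13, 8)
print(rolling_update_bounds(1))   # (2, 1)
```

With 10 replicas and the defaults, the rollout may run up to 13 pods while never dropping below 8 available.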

Core Capabilities

Domain     | Tools & Technologies
CI/CD      | GitHub Actions, GitLab CI, Jenkins
IaC        | Terraform, AWS CDK, Pulumi
Containers | Docker, Kubernetes, Helm
GitOps     | ArgoCD, Flux, Kustomize
Monitoring | Prometheus, Grafana, ELK/EFK

Architecture Patterns

CI/CD Pipeline Flow

Code Commit → Build → Test → Security Scan → Package
                                               ↓
Monitor ← Release Staging ← Smoke Tests ← Deploy Dev
                 ↓
         Manual Approval
                ↓
         Deploy Production
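The flow above amounts to ordered, fail-fast stages with a human gate in front of production. A minimal Python sketch of that control flow follows; the stage names mirror the diagram, the callables are hypothetical stand-ins for real CI jobs, and monitoring is omitted because it is not a gating stage here.

```python
from typing import Callable, Dict, List

# Ordered stages from the diagram; each maps to a hypothetical callable returning True on success.
PRE_PRODUCTION = ["build", "test", "security_scan", "package",
                  "deploy_dev", "smoke_tests", "release_staging"]


def run_pipeline(stages: Dict[str, Callable[[], bool]],
                 approve: Callable[[], bool]) -> List[str]:
    """Run stages in order and return the names of those that succeeded."""
    completed: List[str] = []
    for name in PRE_PRODUCTION:
        if not stages[name]():
            return completed  # fail fast: later stages never run
        completed.append(name)
    # Manual approval gate: production is deployed only after an explicit yes.
    if approve() and stages["deploy_production"]():
        completed.append("deploy_production")
    return completed


ok = {name: (lambda: True) for name in PRE_PRODUCTION + ["deploy_production"]}
print(run_pipeline(ok, approve=lambda: False)[-1])  # release_staging
```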

GitOps Architecture

App Repo ──CI──▶ Config Repo ──ArgoCD──▶ K8s Cluster
                     ▲                        │
                     └────Continuous Sync─────┘
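In this layout CI does not deploy to the cluster itself: it writes a new image reference into the config repo, and ArgoCD reconciles the cluster from there. A sketch of that CI step on a parsed Deployment manifest; the function and argument names are hypothetical, and in practice the same step is often done with kustomize edit set image or Argo CD Image Updater.

```python
import copy


def bump_image(manifest: dict, container: str, image_repo: str, git_sha: str) -> dict:
    """Return a copy of a Deployment manifest with one container's image set to image_repo:git_sha."""
    updated = copy.deepcopy(manifest)  # never mutate the manifest that was read from the repo
    for c in updated["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            c["image"] = f"{image_repo}:{git_sha}"
            return updated
    raise KeyError(f"container {container!r} not found in manifest")


deployment = {"spec": {"template": {"spec": {"containers": [
    {"name": "api", "image": "registry.example.com/api:old"}]}}}}
new = bump_image(deployment, "api", "registry.example.com/api", "3f2c1ab")
print(new["spec"]["template"]["spec"]["containers"][0]["image"])  # registry.example.com/api:3f2c1ab
```

CI would then commit the serialized manifest to the config repo, and ArgoCD applies it on its next sync. Tagging images with the commit SHA keeps every deployed version traceable to source.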

Reference Files

Full working examples are in ./references/:

File                             | Description                 | Lines
github-actions-patterns.yaml     | Complete CI/CD pipeline     | 217
terraform-eks-module.tf          | Production EKS cluster      | 282
kubernetes-deployment.yaml       | Deployment + HPA + ArgoCD   | 200
dockerfile-multistage.dockerfile | Optimized multi-stage build | 51

Anti-Patterns (AVOID These)

1. YAML Copy-Paste Proliferation

Symptom: Nearly identical workflow files duplicated across repositories
Fix: Reusable workflows, Helm charts, Kustomize bases, Terraform modules
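The idea behind a Kustomize base (or a shared Helm values file) is one definition plus small per-environment overrides. The sketch below shows that idea with a plain recursive dictionary merge under a hypothetical helper name; it is a simplification, not Kustomize's strategic merge, which for example merges container lists by name instead of replacing them.

```python
def merge_overlay(base: dict, overlay: dict) -> dict:
    """Return base with overlay applied: nested dicts merge, other values (lists included) are replaced."""
    result = dict(base)  # shallow copy; nested dicts are rebuilt below, so base is never mutated
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_overlay(result[key], value)
        else:
            result[key] = value
    return result


base = {"spec": {"replicas": 1, "strategy": {"type": "RollingUpdate"}}}
prod = merge_overlay(base, {"spec": {"replicas": 3}})
print(prod)  # {'spec': {'replicas': 3, 'strategy': {'type': 'RollingUpdate'}}}
```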

2. Hardcoded Secrets in Code

Symptom: API keys and passwords committed to git
Fix: Secret managers (Vault, AWS Secrets Manager), Sealed Secrets, environment variables injected from secure sources
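Whichever secret store is used, application code should read secrets injected at runtime (for example by a CI secret store or a Vault agent) and fail fast when one is missing, rather than falling back to a value baked into the repo. A minimal sketch assuming the secret arrives as an environment variable; the helper, exception, and variable names are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not injected into the environment."""


def require_secret(name: str) -> str:
    """Return the secret in environment variable `name`, failing fast if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # No hardcoded fallback: a missing secret is a deployment error, not something to paper over.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Hypothetical usage: DB_PASSWORD is populated by the platform, never committed to git.
# db_password = require_secret("DB_PASSWORD")
```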

3. No Rollback Strategy

Symptom: No plan for deployment failure; recovery requires manual intervention
Fix: Blue/green or canary deployments with automated rollback on failed health checks; ArgoCD rollback to a previously synced revision
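Automated canary rollback is a loop: shift a slice of traffic, check a health metric, then either continue or send all traffic back to the stable version. Tools such as Argo Rollouts and Flagger implement this for real; the sketch below only illustrates the control loop, and set_weight and error_rate are hypothetical hooks (for example a service-mesh route weight and a Prometheus error-rate query).

```python
from typing import Callable, Sequence


def canary_rollout(set_weight: Callable[[int], None],
                   error_rate: Callable[[], float],
                   steps: Sequence[int] = (10, 25, 50, 100),
                   max_error_rate: float = 0.01) -> bool:
    """Promote a canary step by step; return False after rolling back on a bad error rate."""
    for weight in steps:
        set_weight(weight)  # percent of traffic sent to the canary
        if error_rate() > max_error_rate:
            set_weight(0)  # automated rollback: all traffic returns to the stable version
            return False
    return True
```

A real controller would also wait an analysis interval between steps and typically require several healthy measurements in a row; those details are left out here.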

4. Monolithic CI Pipeline

Symptom: A single 45-minute pipeline that rebuilds everything on every commit
Fix: Parallel jobs, caching, incremental builds, path-based triggers
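Path-based triggers keep one commit from rebuilding everything: each job declares the paths it cares about and runs only when one of them changed. A sketch of that selection logic with a hypothetical job-to-path mapping. Note that Python's fnmatch lets * match across /, whereas GitHub Actions path filters need ** for that.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical mapping from CI job to the path globs that should trigger it.
JOB_PATHS: Dict[str, List[str]] = {
    "backend-tests": ["services/api/*", "libs/*"],
    "frontend-tests": ["web/*"],
    "terraform-plan": ["infra/*"],
}


def jobs_to_run(changed_files: List[str],
                job_paths: Dict[str, List[str]] = JOB_PATHS) -> List[str]:
    """Return, sorted, the jobs whose path globs match at least one changed file."""
    return sorted(
        job for job, patterns in job_paths.items()
        if any(fnmatch(path, pattern) for path in changed_files for pattern in patterns)
    )


print(jobs_to_run(["web/src/app.tsx", "infra/eks/main.tf"]))  # ['frontend-tests', 'terraform-plan']
```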

5. No Resource Limits

Symptom: K8s pods without CPU/memory limits consuming all node resources
Fix: Always set requests and limits; use LimitRanges for per-container defaults and ResourceQuotas for namespace totals
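Enforcing this in CI can start as a simple check over a parsed Deployment (for example, the dict returned by a YAML loader). The helper below is hypothetical and covers only regular containers, not initContainers.

```python
from typing import List


def containers_missing_resources(deployment: dict) -> List[str]:
    """Return names of containers lacking a CPU or memory value under requests or limits."""
    missing = []
    for c in deployment["spec"]["template"]["spec"]["containers"]:
        resources = c.get("resources", {})
        for section in ("requests", "limits"):
            if not {"cpu", "memory"} <= set(resources.get(section, {})):
                missing.append(c["name"])
                break  # one finding per container is enough
    return missing
```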

6. Running as Root in Containers

Symptom: Dockerfile without a USER instruction; pods running as root or privileged
Fix: Add a USER instruction with a non-root user; set securityContext.runAsNonRoot: true
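A rough Dockerfile check: track the last USER instruction in the final build stage. This is a line-based sketch with a hypothetical function name and simplifying assumptions (no line continuations, and a USER set inside the base image is ignored), not a Dockerfile parser.

```python
def final_stage_runs_as_root(dockerfile: str) -> bool:
    """True if the final build stage never switches to a non-root USER."""
    user = "root"
    for raw in dockerfile.splitlines():
        line = raw.strip()
        parts = line.split(None, 1)
        keyword = parts[0].upper() if parts else ""
        if keyword == "FROM":
            user = "root"  # each stage starts over; conservatively assume the base image runs as root
        elif keyword == "USER" and len(parts) == 2:
            user = parts[1].strip()
    # USER may be "name", "uid", or "name:group"; only the user part matters here.
    return user.split(":")[0] in ("root", "0")
```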

7. Using :latest Tags

Symptom: "FROM node:latest" or "image: app:latest" in production
Fix: Pin specific versions; for immutability, reference images by SHA-256 digest (name@sha256:<digest>)
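A sketch of how a CI check might classify image references; the helper name is hypothetical. It treats a missing tag as :latest (Docker's default) and skips registry ports such as registry:5000 when looking for the tag.

```python
import re

# An immutable reference ends in @sha256:<64 lowercase hex chars>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_pin_status(image: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'unpinned'."""
    if DIGEST_RE.search(image):
        return "digest"
    # The tag follows a ':' in the last path segment, so a registry port is never mistaken for it.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        tag = last_segment.split(":", 1)[1]
        return "unpinned" if tag == "latest" else "tag"
    return "unpinned"  # no tag at all means Docker pulls :latest
```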

8. No Health Checks

Symptom: Missing HEALTHCHECK in the Dockerfile, no liveness/readiness probes
Fix: Add health endpoints; configure probes with appropriate timeouts
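Kubernetes does not use the Dockerfile HEALTHCHECK instruction, so inside a cluster the probes are what count. A minimal manifest check for missing probes, with a hypothetical helper name:

```python
from typing import Dict, List

REQUIRED_PROBES = ("livenessProbe", "readinessProbe")


def missing_probes(deployment: dict) -> Dict[str, List[str]]:
    """Map each container name to the required probes it does not define."""
    report: Dict[str, List[str]] = {}
    for c in deployment["spec"]["template"]["spec"]["containers"]:
        absent = [probe for probe in REQUIRED_PROBES if probe not in c]
        if absent:
            report[c["name"]] = absent
    return report
```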

9. Single Point of Failure

Symptom: replicas: 1, no pod anti-affinity, a single availability zone
Fix: Multiple replicas, pod anti-affinity, topology spread constraints
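The first two symptoms can be checked directly on a Deployment manifest. A sketch with a hypothetical helper; it relies on replicas defaulting to 1 when omitted (as Kubernetes does), but it cannot tell how many zones the nodes actually span.

```python
from typing import List


def availability_issues(deployment: dict) -> List[str]:
    """List single-point-of-failure findings for a Deployment manifest."""
    spec = deployment["spec"]
    pod_spec = spec["template"]["spec"]
    issues = []
    if spec.get("replicas", 1) < 2:  # Kubernetes defaults replicas to 1
        issues.append("fewer than 2 replicas")
    has_anti_affinity = "podAntiAffinity" in pod_spec.get("affinity", {})
    has_spread = bool(pod_spec.get("topologySpreadConstraints"))
    if not (has_anti_affinity or has_spread):
        issues.append("no pod anti-affinity or topology spread constraints")
    return issues
```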

10. Terraform State in Local File

Symptom: terraform.tfstate committed to git or stored locally
Fix: Remote backend with state locking (S3 + DynamoDB, Terraform Cloud, GCS)
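A quick CI guard can flag configurations that would keep state locally. The regex sketch below (hypothetical helper, not an HCL parser) reports the declared backend type, treats a cloud block (Terraform Cloud) as remote, and otherwise returns "local", which is what Terraform uses when no backend is configured.

```python
import re

# A backend block looks like: backend "s3" { ... } inside the terraform { } block.
BACKEND_RE = re.compile(r'^\s*backend\s+"([\w-]+)"', re.MULTILINE)
# Terraform Cloud is configured with a cloud { ... } block instead.
CLOUD_RE = re.compile(r"^\s*cloud\s*\{", re.MULTILINE)


def state_backend(tf_source: str) -> str:
    """Return the declared backend type, 'cloud', or 'local' when none is declared."""
    match = BACKEND_RE.search(tf_source)
    if match:
        return match.group(1)
    if CLOUD_RE.search(tf_source):
        return "cloud"
    return "local"  # no backend block: state lands in terraform.tfstate next to the config
```

A pipeline could fail whenever this returns "local" for a shared environment.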

11. No Concurrency Control

Symptom: Multiple CI runs for the same branch, causing deployment race conditions
Fix: Use concurrency groups and deployment locks
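GitHub Actions concurrency groups allow at most one running and one pending run per group key: a newer run either cancels the running one (cancel-in-progress: true) or waits, replacing any older pending run. The toy model below (hypothetical class, plain Python) reproduces those rules to make the behavior concrete.

```python
from typing import List, Optional


class ConcurrencyGroup:
    """Toy model of one concurrency group, e.g. keyed by workflow and branch."""

    def __init__(self, cancel_in_progress: bool) -> None:
        self.cancel_in_progress = cancel_in_progress
        self.active: Optional[str] = None   # the run currently executing
        self.pending: Optional[str] = None  # at most one run waits its turn
        self.cancelled: List[str] = []

    def submit(self, run_id: str) -> None:
        """Register a new run for this group."""
        if self.active is None:
            self.active = run_id
        elif self.cancel_in_progress:
            self.cancelled.append(self.active)  # newest run wins immediately
            self.active = run_id
        else:
            if self.pending is not None:
                self.cancelled.append(self.pending)  # an older waiting run is superseded
            self.pending = run_id

    def finish(self) -> None:
        """Mark the active run as done and promote the pending run, if any."""
        self.active, self.pending = self.pending, None
```

For production deploys, waiting is usually the safer setting, since cancelling could interrupt a half-applied deployment; cancelling suits test runs on feature branches.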

12. Ignoring Security Scanning

Symptom: No vulnerability scanning and no secret detection in CI
Fix: Trivy, Snyk, or Grype for vulnerabilities; TruffleHog for secrets
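To show what secret detection looks for, here is a tiny pattern-based scanner covering two well-known formats: AWS access key IDs (AKIA or ASIA followed by 16 uppercase letters or digits) and PEM private-key headers. It is an illustration with a hypothetical function name; real scanners such as TruffleHog ship far larger rule sets and can verify whether a found credential is live.

```python
import re
from typing import List, Tuple

PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a secret pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_secrets(sample))  # [(2, 'aws_access_key_id')]
```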

13. No Drift Detection

Symptom: Manual changes to infrastructure; config diverges from code
Fix: ArgoCD diff detection, terraform plan in CI, regular audits
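At its core, drift detection compares the desired state from Git with the live state and reports every field that differs. A recursive sketch over parsed manifests (hypothetical helper); real tools also normalize fields the API server fills in with defaults, which this version would report as drift.

```python
from typing import Any, Iterator, Tuple


def diff(desired: Any, live: Any, path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, desired_value, live_value) for every leaf where live differs from desired."""
    if isinstance(desired, dict) and isinstance(live, dict):
        for key in sorted(set(desired) | set(live)):
            child = f"{path}.{key}" if path else key
            yield from diff(desired.get(key), live.get(key), child)
    elif desired != live:
        yield path, desired, live


desired = {"spec": {"replicas": 3, "image": "app:1.2.0"}}
live = {"spec": {"replicas": 5, "image": "app:1.2.0"}}  # e.g. someone scaled by hand
print(list(diff(desired, live)))  # [('spec.replicas', 3, 5)]
```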

14. Overly Permissive IAM

Symptom: IAM roles with "*" actions, Kubernetes service accounts bound to cluster-admin
Fix: Principle of least privilege, IRSA (IAM Roles for Service Accounts) for pods, regular permission audits
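Wildcard grants are easy to catch mechanically. A sketch that flags Allow statements in an IAM policy document whose Action is "*" or a service-wide "service:*"; the helper is hypothetical and ignores NotAction and Resource wildcards, which a fuller audit would also consider.

```python
import json
from typing import List


def wildcard_statements(policy_json: str) -> List[int]:
    """Return indexes of Allow statements granting '*' or 'service:*' actions."""
    statements = json.loads(policy_json)["Statement"]
    if isinstance(statements, dict):  # IAM allows a single statement object instead of a list
        statements = [statements]
    flagged = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue  # broad Deny statements do not grant privileges
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a single string or a list
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            flagged.append(i)
    return flagged
```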

15. No Observability

Symptom: No metrics, only unstructured logs on stdout, no alerting
Fix: Export metrics, use structured logging, define SLOs, configure alerts
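Structured logging means emitting machine-parseable records instead of free text, so a log pipeline such as ELK/EFK can index fields without regex parsing. A minimal sketch using only Python's standard logging module; the formatter class, logger name, and field names are hypothetical, and exception tracebacks are not included.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} become top-level keys.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("order placed", extra={"fields": {"order_id": "A123", "latency_ms": 42}})
```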

Validation Script

Run ./scripts/validate-devops-skill.sh to check:
  • GitHub Actions workflows for deprecated actions, missing caching
  • Dockerfiles for security best practices
  • Kubernetes manifests for resource limits, security contexts
  • Terraform for version constraints, sensitive defaults

Quality Checklist

[ ] All secrets in secret management (not in code)
[ ] Resource limits defined for all containers
[ ] Health checks configured (liveness, readiness)
[ ] Horizontal pod autoscaling enabled
[ ] Security contexts set (non-root, read-only root filesystem)
[ ] Monitoring and alerting configured
[ ] Rollback strategy documented
[ ] Multi-environment support (dev, staging, prod)
[ ] Concurrency controls in CI pipelines
[ ] Remote state backend for Terraform
[ ] Vulnerability scanning in pipeline
[ ] Version pinning for all dependencies

Output Artifacts

  1. CI/CD Workflows - GitHub Actions, GitLab CI configs
  2. Terraform Modules - Reusable infrastructure components
  3. Kubernetes Manifests - Deployments, services, configs
  4. Helm Charts - Packaged applications
  5. Docker Configurations - Optimized multi-stage builds
  6. ArgoCD Applications - GitOps deployment definitions

Tools Available

  • Read, Write, Edit - File operations for configs and manifests
  • Bash(docker:*) - Build and manage containers
  • Bash(kubectl:*) - Kubernetes operations
  • Bash(terraform:*) - Infrastructure provisioning
  • Bash(helm:*) - Helm chart management
  • Bash(gh:*) - GitHub CLI operations