devops-automator
DevOps Automator
Expert DevOps engineer specializing in CI/CD pipelines, infrastructure as code, container orchestration, and deployment automation.
Activation Triggers
Activate on: "CI/CD", "GitHub Actions", "deployment pipeline", "Terraform", "infrastructure as code", "IaC", "Docker", "Kubernetes", "K8s", "Helm", "container orchestration", "GitOps", "ArgoCD", "deployment automation", "secrets management", "monitoring setup"
NOT for: Application development → language skills | Database design → data-pipeline-engineer | API design → api-architect
Quick Start
- Define deployment strategy: Blue/Green, Canary, or Rolling
- Choose IaC tool: Terraform for cloud resources, Helm for K8s apps
- Design CI stages: lint → test → security scan → build → deploy
- Implement GitOps: Config repo synced by ArgoCD
- Add observability: Prometheus metrics, structured logging
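The CI stage order above (lint → test → security scan → build → deploy) could be expressed as a GitHub Actions workflow roughly like this. It is a minimal sketch: the `make` targets, the job names, and the `dev` environment are hypothetical placeholders, not commands defined by this skill.

```yaml
# .github/workflows/ci.yml (illustrative; every make target is an assumption)
name: ci
on:
  push:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint
  test:
    needs: lint            # runs only after lint succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
  security-scan:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make scan
  build:
    needs: security-scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build
  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: dev       # hypothetical environment name
    steps:
      - uses: actions/checkout@v4
      - run: make deploy
```

Chaining jobs with `needs` keeps the stages strictly ordered; stages with no dependency on each other can drop the `needs` link to run in parallel.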
Core Capabilities
| Domain | Tools & Technologies |
|---|---|
| CI/CD | GitHub Actions, GitLab CI, Jenkins |
| IaC | Terraform, AWS CDK, Pulumi |
| Containers | Docker, Kubernetes, Helm |
| GitOps | ArgoCD, Flux, Kustomize |
| Monitoring | Prometheus, Grafana, ELK/EFK |
Architecture Patterns
CI/CD Pipeline Flow
Code Commit → Build → Test → Security Scan → Package
↓
Monitor ← Release Staging ← Smoke Tests ← Deploy Dev
↓
Manual Approval
↓
Deploy Production
GitOps Architecture
App Repo ──CI──▶ Config Repo ──ArgoCD──▶ K8s Cluster
▲ │
└────Continuous Sync─────┘
Reference Files
Full working examples are in ./references/:
| File | Description | Lines |
|---|---|---|
| | Complete CI/CD pipeline | 217 |
| | Production EKS cluster | 282 |
| | Deployment + HPA + ArgoCD | 200 |
| | Optimized multi-stage build | 51 |
Anti-Patterns (AVOID These)
1. YAML Copy-Paste Proliferation
Symptom: Nearly identical workflow files duplicated across repositories
Fix: Reusable workflows, Helm charts, Kustomize bases, Terraform modules
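One way to apply the reusable-workflow fix in GitHub Actions is a shared workflow triggered by `workflow_call`, which each repository then calls. The sketch below shows two separate files in one listing; the organization name, repository name, file paths, and the `v1` ref are hypothetical.

```yaml
# File 1 (shared repo): .github/workflows/reusable-ci.yml
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: "20"
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm ci && npm test
---
# File 2 (each consuming repo): .github/workflows/ci.yml
on: [push]
jobs:
  ci:
    # example-org/shared-workflows is an assumed location for File 1
    uses: example-org/shared-workflows/.github/workflows/reusable-ci.yml@v1
    with:
      node-version: "20"
```

Each consuming repository then carries only a few lines, and a pipeline change is made once in the shared file.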
2. Hardcoded Secrets in Code
Symptom: API keys, passwords committed to git
Fix: Secret managers (Vault, AWS SM), sealed secrets, env vars from secure sources
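For the "env vars from secure sources" part of this fix, a Kubernetes workload can read a value from a Secret object instead of embedding it in the manifest or image. This is a minimal sketch; the Secret name `app-secrets`, its key, and the image are assumptions, and the Secret itself would be populated by whichever secret manager is in use.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.4.2   # hypothetical image
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets    # assumed Secret, created outside git
              key: db-password
```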
3. No Rollback Strategy
Symptom: No plan for deployment failure, manual intervention required
Fix: Blue/green, canary with automated rollback, ArgoCD auto-revert
4. Monolithic CI Pipeline
Symptom: Single 45-minute pipeline rebuilding everything on every commit
Fix: Parallel jobs, caching, incremental builds, path-based triggers
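Path-based triggers and dependency caching from this fix might look as follows in GitHub Actions. The `services/api` directory layout is a hypothetical monorepo structure used only for illustration.

```yaml
# .github/workflows/api.yml (runs only when the API service changes)
name: api
on:
  push:
    paths:
      - "services/api/**"
      - ".github/workflows/api.yml"
jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: services/api
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: npm   # restores the npm cache keyed on the lockfile
          cache-dependency-path: services/api/package-lock.json
      - run: npm ci
      - run: npm test
```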
5. No Resource Limits
Symptom: K8s pods without CPU/memory limits consuming all host resources
Fix: Always set requests/limits, use LimitRanges and ResourceQuotas
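A sketch of the three pieces named in this fix: per-container requests/limits, a LimitRange that supplies defaults, and a ResourceQuota that caps the namespace. The namespace `demo`, the image, and all numeric values are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  namespace: demo
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.4.2   # hypothetical image
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: demo
spec:
  limits:
    - type: Container
      defaultRequest:    # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:           # applied when a container sets no limits
        cpu: 500m
        memory: 256Mi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-cap
  namespace: demo
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```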
6. Running as Root in Containers
Symptom: Dockerfile without USER instruction, pods running privileged
Fix: Add USER instruction, set securityContext.runAsNonRoot: true
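On the Kubernetes side, the non-root setting from this fix is usually paired with a few related hardening fields. The following is a sketch under assumptions: the UID `10001` and the image are hypothetical, and the image must actually be able to run as that user.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  securityContext:
    runAsNonRoot: true     # kubelet refuses to start containers running as UID 0
    runAsUser: 10001       # assumed unprivileged UID
  containers:
    - name: api
      image: registry.example.com/api:1.4.2   # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```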
7. Using :latest Tags
Symptom: FROM node:latest or image: app:latest in production
Fix: Pin specific versions, use immutable tags with SHA digests
8. No Health Checks
Symptom: Missing HEALTHCHECK in Dockerfile, no liveness/readiness probes
Fix: Add health endpoints, configure probes with appropriate timeouts
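Probe configuration for this fix could be sketched as below. The `/readyz` and `/healthz` paths, port 8080, and the timing values are assumptions; the application has to expose matching endpoints, and the timeouts should be tuned to its real startup and response times.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.4.2   # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:          # gates traffic until the app can serve
        httpGet:
          path: /readyz
          port: 8080
        periodSeconds: 5
        timeoutSeconds: 2
        failureThreshold: 3
      livenessProbe:           # restarts the container if it stops responding
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 2
        failureThreshold: 3
```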
9. Single Point of Failure
Symptom: replicas: 1, no pod anti-affinity, single availability zone
Fix: Multiple replicas, pod anti-affinity, topology spread constraints
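The three remedies in this fix can be combined in a single Deployment. This is a minimal sketch; the `app: web` label and the image are hypothetical, and whether to use soft (`ScheduleAnyway`, `preferred...`) or hard constraints depends on cluster capacity.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread across zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname  # avoid sharing a node
                labelSelector:
                  matchLabels:
                    app: web
      containers:
        - name: web
          image: registry.example.com/web:1.4.2     # hypothetical image
```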
10. Terraform State in Local File
Symptom: terraform.tfstate committed to git or stored locally
Fix: Remote backend (S3+DynamoDB, Terraform Cloud, GCS)
11. No Concurrency Control
Symptom: Multiple CI runs for same branch, deployment race conditions
Fix: Use concurrency groups, implement deployment locks
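In GitHub Actions, a concurrency group is one way to serialize runs for the same branch. The sketch below assumes a hypothetical `./scripts/deploy.sh`; the group name is arbitrary.

```yaml
# .github/workflows/deploy.yml
name: deploy
on:
  push:
    branches: [main]
concurrency:
  group: deploy-${{ github.ref }}   # one group per branch
  cancel-in-progress: false         # let the running deploy finish; newer run waits
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh    # hypothetical deploy script
```

For plain CI jobs (not deployments), setting `cancel-in-progress: true` is a common alternative, since only the newest commit's result matters.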
12. Ignoring Security Scanning
Symptom: No vulnerability scanning, no secret detection in CI
Fix: Trivy, Snyk, or Grype for vulnerabilities; TruffleHog for secrets
13. No Drift Detection
Symptom: Manual changes to infrastructure, config diverges from code
Fix: ArgoCD diff detection, terraform plan in CI, regular audits
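For the ArgoCD side of this fix, an Application with automated sync lets ArgoCD both report and revert drift in the cluster. This is a sketch under assumptions: the repository URL, path, application name, and namespaces are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/config-repo.git   # assumed config repo
    targetRevision: main
    path: apps/web
  destination:
    server: https://kubernetes.default.svc
    namespace: web
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from git
      selfHeal: true   # revert manual changes made directly in the cluster
```

Without the `automated` block, ArgoCD still marks the application OutOfSync when live state diverges, which supports detection without automatic correction.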
14. Overly Permissive IAM
Symptom: IAM roles with * actions, service accounts with cluster-admin
Fix: Principle of least privilege, IRSA for pods, audit permissions
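A sketch of least privilege on EKS: a service account bound to a narrowly scoped IAM role through IRSA, plus a namespaced Role in place of cluster-admin. The account ID, role name, namespace, and the permissions shown are all hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app
  namespace: demo
  annotations:
    # IRSA: pods using this service account assume the (assumed) IAM role below
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/app-s3-read
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-config-reader
  namespace: demo
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]   # read-only, single resource type
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-config-reader
  namespace: demo
subjects:
  - kind: ServiceAccount
    name: app
    namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-config-reader
```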
15. No Observability
Symptom: No metrics, logs only on stdout, no alerting
Fix: Export metrics, structured logging, define SLOs, configure alerts
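An alert tied to an SLO could be written as a Prometheus rule file like the one below. It is a sketch only: the metric `http_requests_total`, its `status` label, the 1% threshold, and the `severity` label value are assumptions about the application and alert routing.

```yaml
groups:
  - name: api-slo
    rules:
      - alert: HighErrorRate
        # fraction of 5xx responses over the last 5 minutes
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 10m                 # must hold for 10 minutes before firing
        labels:
          severity: page
        annotations:
          summary: "API 5xx error ratio above 1% for 10 minutes"
```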
Validation Script
Run ./scripts/validate-devops-skill.sh to check:
- GitHub Actions workflows for deprecated actions, missing caching
- Dockerfiles for security best practices
- Kubernetes manifests for resource limits, security contexts
- Terraform for version constraints, sensitive defaults
Quality Checklist
[ ] All secrets in secret management (not in code)
[ ] Resource limits defined for all containers
[ ] Health checks configured (liveness, readiness)
[ ] Horizontal pod autoscaling enabled
[ ] Security contexts set (non-root, read-only)
[ ] Monitoring and alerting configured
[ ] Rollback strategy documented
[ ] Multi-environment support (dev, staging, prod)
[ ] Concurrency controls in CI pipelines
[ ] Remote state backend for Terraform
[ ] Vulnerability scanning in pipeline
[ ] Version pinning for all dependencies
Output Artifacts
- CI/CD Workflows - GitHub Actions, GitLab CI configs
- Terraform Modules - Reusable infrastructure components
- Kubernetes Manifests - Deployments, services, configs
- Helm Charts - Packaged applications
- Docker Configurations - Optimized multi-stage builds
- ArgoCD Applications - GitOps deployment definitions
Tools Available
- Read, Write, Edit - File operations for configs and manifests
- Bash(docker:*) - Build and manage containers
- Bash(kubectl:*) - Kubernetes operations
- Bash(terraform:*) - Infrastructure provisioning
- Bash(helm:*) - Helm chart management
- Bash(gh:*) - GitHub CLI operations