agentic-engineering
Agentic Engineering
Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.
Operating Principles
- Define completion criteria before execution.
- Decompose work into agent-sized units.
- Route model tiers by task complexity.
- Measure with evals and regression checks.
Eval-First Loop
- Define a capability eval and a regression eval.
- Run a baseline and capture failure signatures.
- Execute the implementation.
- Re-run the evals and compare deltas against the baseline.
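The comparison step of this loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EvalDelta` and `compare_runs` names, the case names, and the representation of a run as a dict of case name to pass/fail are all hypothetical, not part of any particular eval framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalDelta:
    fixed: list[str]          # failed at baseline, pass now
    regressed: list[str]      # passed at baseline, fail now
    still_failing: list[str]  # failed in both runs


def compare_runs(baseline: dict[str, bool], current: dict[str, bool]) -> EvalDelta:
    """Compare two eval runs keyed by case name (True means the case passed)."""
    fixed, regressed, still_failing = [], [], []
    for name in sorted(baseline):
        before = baseline[name]
        # Assumption: a case missing from the re-run counts as failing.
        after = current.get(name, False)
        if not before and after:
            fixed.append(name)
        elif before and not after:
            regressed.append(name)
        elif not before and not after:
            still_failing.append(name)
    return EvalDelta(fixed, regressed, still_failing)


# Hypothetical case names, for illustration only.
baseline = {"parses_empty_input": True, "handles_unicode": False, "rejects_bad_auth": False}
current = {"parses_empty_input": True, "handles_unicode": True, "rejects_bad_auth": False}
delta = compare_runs(baseline, current)
print(delta)
```

A non-empty `regressed` list is the signal to stop and investigate before moving to the next unit; `still_failing` shows which baseline failure signatures the implementation has not yet addressed.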
- 定义能力评估和回归评估规则
- 运行基准测试并记录失败特征
- 执行代码实现
- 重新运行评估并对比差异
Task Decomposition
Apply the 15-minute unit rule:
- each unit should be independently verifiable
- each unit should have a single dominant risk
- each unit should expose a clear done condition
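One way to make the three conditions above checkable is to record them as fields on each unit. The sketch below is a minimal illustration; `WorkUnit`, its field names, and `unit_problems` are hypothetical, and the check only confirms that each field is filled in. Whether the risk named is truly the single dominant one still needs human judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnit:
    name: str
    verify_command: str  # how the unit is verified on its own
    dominant_risk: str   # the one main risk this unit carries
    done_condition: str  # observable statement of "done"


def unit_problems(unit: WorkUnit) -> list[str]:
    """Return the reasons a unit is not ready to hand to an agent."""
    problems = []
    if not unit.verify_command.strip():
        problems.append("no independent verification")
    if not unit.dominant_risk.strip():
        problems.append("no dominant risk named")
    if not unit.done_condition.strip():
        problems.append("no done condition")
    return problems


# Hypothetical unit, for illustration only.
unit = WorkUnit(
    name="add-retry-to-fetch",
    verify_command="pytest tests/test_fetch.py",
    dominant_risk="retry loop hides permanent failures",
    done_condition="fetch retries 3 times on timeout, then raises",
)
print(unit_problems(unit))
```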
Model Routing
- Haiku: classification, boilerplate transforms, narrow edits
- Sonnet: implementation and refactors
- Opus: architecture, root-cause analysis, multi-file invariants
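The routing list above maps naturally onto a lookup table. The following is a sketch, assuming tasks are labeled with one of a fixed set of kinds; the task-kind strings, `Tier`, `ROUTES`, and `route` are hypothetical names chosen for illustration.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Task kinds mirror the routing list; the string labels are assumptions.
ROUTES = {
    "classification": Tier.HAIKU,
    "boilerplate_transform": Tier.HAIKU,
    "narrow_edit": Tier.HAIKU,
    "implementation": Tier.SONNET,
    "refactor": Tier.SONNET,
    "architecture": Tier.OPUS,
    "root_cause_analysis": Tier.OPUS,
    "multi_file_invariant": Tier.OPUS,
}


def route(task_kind: str) -> Tier:
    """Return the tier for a task kind; unknown kinds are rejected, not guessed."""
    try:
        return ROUTES[task_kind]
    except KeyError:
        raise ValueError(f"unknown task kind: {task_kind!r}") from None


print(route("refactor"))
```

Raising on an unknown kind, rather than falling back to a default tier, forces new kinds of work to be classified explicitly before they are routed.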
Session Strategy
- Continue session for closely-coupled units.
- Start fresh session after major phase transitions.
- Compact after milestone completion, not during active debugging.
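These three rules can be read as a small decision function. The sketch below is one possible encoding, not a prescribed one: `SessionAction`, `next_session_action`, and the three boolean flags are hypothetical, and the precedence order (active debugging first, then phase transition, then milestone) is an assumption the rules do not spell out.

```python
from enum import Enum


class SessionAction(Enum):
    CONTINUE = "continue"
    COMPACT = "compact"
    FRESH = "fresh"


def next_session_action(*, debugging: bool, phase_changed: bool, milestone_done: bool) -> SessionAction:
    """Pick a session action; the precedence order here is an assumption."""
    if debugging:
        # Never compact or reset in the middle of active debugging.
        return SessionAction.CONTINUE
    if phase_changed:
        return SessionAction.FRESH
    if milestone_done:
        return SessionAction.COMPACT
    # Default: the next unit is assumed to be closely coupled to the current one.
    return SessionAction.CONTINUE


print(next_session_action(debugging=False, phase_changed=False, milestone_done=True))
```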
Review Focus for AI-Generated Code
Prioritize:
- invariants and edge cases
- error boundaries
- security and auth assumptions
- hidden coupling and rollout risk
Do not waste review cycles on style-only disagreements when automated formatting and linting already enforce style.
Cost Discipline
Track per task:
- model
- token estimate
- retries
- wall-clock time
- success/failure
Escalate the model tier only when a lower tier fails because of a clear reasoning gap.
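The tracked fields and the escalation rule can be sketched together as a record plus a small function. This is an illustration under assumptions: `TaskRecord`, `next_tier`, and the `reasoning_gap` flag (set by whoever reviews the failure) are hypothetical names, and the tier order is taken from the Model Routing list.

```python
from dataclasses import dataclass

# Tier order, lowest to highest, following the Model Routing list.
TIER_ORDER = ["haiku", "sonnet", "opus"]


@dataclass(frozen=True)
class TaskRecord:
    task: str
    model: str
    token_estimate: int
    retries: int
    wall_clock_seconds: float
    success: bool
    reasoning_gap: bool = False  # reviewer judgment: the failure was a reasoning gap


def next_tier(record: TaskRecord) -> str:
    """Escalate one tier only after a failure with a clear reasoning gap."""
    if record.success or not record.reasoning_gap:
        return record.model
    index = TIER_ORDER.index(record.model)
    # Stay at the top tier if there is nothing higher to escalate to.
    return TIER_ORDER[min(index + 1, len(TIER_ORDER) - 1)]


# Hypothetical record, for illustration only.
record = TaskRecord(
    task="fix-cache-invalidation",
    model="sonnet",
    token_estimate=42_000,
    retries=2,
    wall_clock_seconds=310.0,
    success=False,
    reasoning_gap=True,
)
print(next_tier(record))
```

A failure without `reasoning_gap` set (for example a flaky test or a missing file) keeps the same tier, since a stronger model would not address the cause.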