Optimization Goal?
├── Cost Reduction
│   ├── Token usage → Prompt optimization
│   └── API calls → Caching, batching
├── Latency
│   ├── Time to first token → Streaming
│   └── Total response time → Model selection
├── Quality
│   ├── Accuracy → Evals with ground truth
│   └── Consistency → Multiple run analysis
└── Reliability
    └── Error rates, retry patterns

| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| No token tracking | Surprise costs | Instrument all calls |
| Optimizing without evals | Quality regression | Measure before optimizing |
| Average-only latency | Hides tail latency | Use percentiles |
| No prompt versioning | Can't correlate changes | Version and track |
| Ignoring caching | Repeated costs | Cache stable responses |
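
The "Average-only latency" row can be illustrated with a small sketch. It assumes latencies have already been collected in milliseconds; the function name `latency_summary` and the sample data are hypothetical, chosen only to show how a mean can hide a slow tail that percentiles expose.

```python
import statistics


def latency_summary(latencies_ms):
    """Return the mean plus p50/p95/p99 for latencies given in milliseconds."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Hypothetical sample: 90 fast calls (200 ms) and 10 slow calls (5000 ms).
samples = [200.0] * 90 + [5000.0] * 10
summary = latency_summary(samples)
print(summary)
```

For this made-up sample the mean is 680 ms, which describes no real request: the median is 200 ms while p95 and p99 are both 5000 ms. Reporting the percentiles alongside the mean makes the slow tail visible.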
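
One possible way to act on the "Cache stable responses" and "Version and track" rows is sketched below. Everything here is an assumption for illustration: `ResponseCache`, `fake_model`, the model name `model-a`, and the version labels are hypothetical, and the in-memory dictionary stands in for whatever store a real system would use. The sketch only makes sense for responses that are expected to be stable for a given input.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache for stable model responses, keyed by model, prompt version, and prompt."""

    def __init__(self, call_model):
        # call_model is a hypothetical callable: (model, prompt) -> response text.
        self._call_model = call_model
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, prompt_version, prompt):
        # Including the prompt version means a prompt change never reuses old entries.
        payload = json.dumps([model, prompt_version, prompt], ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model, prompt_version, prompt):
        key = self._key(model, prompt_version, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response


def fake_model(model, prompt):
    # Stand-in for a real API call so the sketch runs offline.
    return f"[{model}] echo: {prompt}"


cache = ResponseCache(fake_model)
first = cache.get("model-a", "v1", "Summarize the report.")
second = cache.get("model-a", "v1", "Summarize the report.")  # served from cache
third = cache.get("model-a", "v2", "Summarize the report.")   # new version, new call
print(cache.hits, cache.misses)
```

Tracking `hits` and `misses` gives a rough measure of how many repeated calls the cache avoids, and keying on the prompt version keeps results attributable to a specific prompt revision.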