performance-monitor
Purpose
Provides expertise in monitoring, benchmarking, and optimizing AI agent performance. Specializes in token usage tracking, latency analysis, cost optimization, and implementing quality evaluation metrics (evals) for AI systems.
When to Use
- Tracking token usage and costs for AI agents
- Measuring and optimizing agent latency
- Implementing evaluation metrics (evals)
- Benchmarking agent quality and accuracy
- Optimizing agent cost efficiency
- Building observability for AI pipelines
- Analyzing agent conversation patterns
- Setting up A/B testing for agents
Quick Start
Invoke this skill when:
- Optimizing AI agent costs and token usage
- Measuring agent latency and performance
- Implementing evaluation frameworks
- Building observability for AI systems
- Benchmarking agent quality
Do NOT invoke when:
- General application performance → use /performance-engineer
- Infrastructure monitoring → use /sre-engineer
- ML model training optimization → use /ml-engineer
- Prompt design → use /prompt-engineer
Decision Framework
Optimization Goal?
├── Cost Reduction
│ ├── Token usage → Prompt optimization
│ └── API calls → Caching, batching
├── Latency
│ ├── Time to first token → Streaming
│ └── Total response time → Model selection
├── Quality
│ ├── Accuracy → Evals with ground truth
│ └── Consistency → Multiple run analysis
└── Reliability
└── Error rates, retry patterns

Core Workflows
1. Token Usage Tracking
- Instrument API calls to capture usage
- Track input vs output tokens separately
- Aggregate by agent, task, user
- Calculate costs per operation
- Build dashboards for visibility
- Set alerts for anomalous usage
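The steps above can be sketched as a minimal in-process usage ledger. The model name, the pricing table, and the per-1K-token rates below are illustrative assumptions, not real provider prices; in production you would emit each record to a metrics backend and key it by task and user as well.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative (input, output) USD prices per 1K tokens; real prices
# vary by provider and model, so treat this table as a placeholder.
PRICES = {"small-model": (0.0005, 0.0015)}

@dataclass
class UsageTracker:
    # agent name -> running totals of tokens and cost
    totals: dict = field(default_factory=lambda: defaultdict(
        lambda: {"in": 0, "out": 0, "cost": 0.0}))

    def record(self, agent: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        """Record one API call; return the cost of that call."""
        in_price, out_price = PRICES[model]
        cost = (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price
        bucket = self.totals[agent]
        bucket["in"] += input_tokens
        bucket["out"] += output_tokens
        bucket["cost"] += cost
        return cost

tracker = UsageTracker()
tracker.record("summarizer", "small-model", input_tokens=1200, output_tokens=300)
tracker.record("summarizer", "small-model", input_tokens=800, output_tokens=200)
```

Aggregations like `tracker.totals["summarizer"]` then feed dashboards and anomaly alerts directly.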
2. Eval Framework Setup
- Define evaluation criteria
- Create test dataset with expected outputs
- Implement scoring functions
- Run automated eval pipeline
- Track scores over time
- Use for regression testing
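A minimal harness for this workflow might look like the sketch below. The dataset, the `fake_agent` stand-in, and exact-match scoring are illustrative assumptions; real evals often need fuzzier scorers (string similarity, rubric grading, model-as-judge).

```python
# Tiny labeled dataset with expected outputs for regression testing.
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def fake_agent(prompt: str) -> str:
    # Stand-in for a real model call so the harness runs offline.
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")

def exact_match(output: str, expected: str) -> float:
    # Scoring function: 1.0 for an exact match, 0.0 otherwise.
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_evals(agent, cases, scorer) -> float:
    """Score every case and return the mean score."""
    scores = [scorer(agent(case["input"]), case["expected"]) for case in cases]
    return sum(scores) / len(scores)

score = run_evals(fake_agent, dataset, exact_match)
```

Logging `score` per commit or per prompt version is what turns this into regression testing.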
3. Latency Optimization
- Measure baseline latency
- Identify bottlenecks (model, network, parsing)
- Implement streaming where applicable
- Optimize prompt length
- Consider model size tradeoffs
- Add caching for repeated queries
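One of the cheaper wins above, caching repeated queries, can be sketched with `functools.lru_cache`. Here `slow_model_call` is a hypothetical stand-in for a real API call, and the approach is only safe when identical prompts should yield identical responses (e.g. temperature 0, no time-sensitive context).

```python
import functools

CALLS = {"count": 0}  # counts how often the "expensive" call actually runs

def slow_model_call(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    CALLS["count"] += 1
    return f"response to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_call(prompt: str) -> str:
    # Repeated identical prompts hit the cache instead of the API.
    return slow_model_call(prompt)

for _ in range(3):
    cached_call("same question")
# Despite three requests, the underlying call ran only once.
```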
Best Practices
- Track tokens separately from API call counts
- Implement evals before optimizing
- Use percentiles (p50, p95, p99) not averages for latency
- Log prompt and response for debugging
- Set cost budgets and alerts
- Version prompts and track performance per version
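The percentile rule can be illustrated with a small nearest-rank helper over synthetic latency samples: a uniform 1–100 ms distribution has a p95 of 95 ms, which a ~50 ms average would hide entirely.

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest sample >= p% of the data."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Synthetic latency samples in milliseconds.
latencies_ms = list(range(1, 101))
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```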
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| No token tracking | Surprise costs | Instrument all calls |
| Optimizing without evals | Quality regression | Measure before optimizing |
| Average-only latency | Hides tail latency | Use percentiles |
| No prompt versioning | Can't correlate changes | Version and track |
| Ignoring caching | Repeated costs | Cache stable responses |