Performance Engineering
Evidence-based performance optimization: measure → profile → optimize → validate.
<when_to_use>
- Profiling slow code paths or bottlenecks
- Identifying memory leaks or excessive allocations
- Optimizing latency-critical operations (P95, P99)
- Benchmarking competing implementations
- Database query optimization
- Reducing CPU usage in hot paths
- Improving throughput (RPS, ops/sec)
NOT for: premature optimization, optimization without measurement, guessing at bottlenecks
</when_to_use>
<iron_law>
NO OPTIMIZATION WITHOUT MEASUREMENT
Required workflow:
- Measure baseline performance with realistic workload
- Profile to identify actual bottleneck
- Optimize the bottleneck (not what you think is slow)
- Measure again to verify improvement
- Document gains and tradeoffs
Optimizing unmeasured code wastes time and introduces bugs.
</iron_law>
<stages>
Load the maintain-tasks skill for stage tracking:
Stage 1: Establishing baseline
- content: "Establish performance baseline with realistic workload"
- activeForm: "Establishing performance baseline"
Stage 2: Profiling bottlenecks
- content: "Profile code to identify actual bottlenecks"
- activeForm: "Profiling code to identify bottlenecks"
Stage 3: Analyzing root cause
- content: "Analyze profiling data to determine root cause"
- activeForm: "Analyzing profiling data"
Stage 4: Implementing optimization
- content: "Implement targeted optimization for identified bottleneck"
- activeForm: "Implementing optimization"
Stage 5: Validating improvement
- content: "Measure performance gains and verify no regressions"
- activeForm: "Validating performance improvement"
</stages>
Key Performance Indicators
Latency (response time):
- P50 (median) — typical case
- P95 — most users
- P99 — tail latency
- P99.9 — outliers
- TTFB — time to first byte
- TTLB — time to last byte
Throughput:
- RPS — requests per second
- ops/sec — operations per second
- bytes/sec — data transfer rate
- queries/sec — database throughput
Memory:
- Heap usage — allocated memory
- GC frequency — garbage collection pauses
- GC duration — stop-the-world time
- Allocation rate — memory churn
- Resident set size (RSS) — total memory
CPU:
- CPU time — total compute
- Wall time — elapsed time
- Hot paths — frequently executed code
- Time complexity — algorithmic efficiency
- CPU utilization — percentage used
Always measure:
- Before optimization (baseline)
- After optimization (improvement)
- Under realistic load (not toy data)
- Multiple runs (account for variance)
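Tail-latency metrics like P95 and P99 come straight from recorded samples; a minimal sketch using the nearest-rank method (real harnesses also report variance across runs):

```typescript
// Nearest-rank percentile over recorded latency samples (ms):
// the smallest value with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples")
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(0, rank - 1)]
}

const latencies = [12, 15, 11, 90, 14, 13, 250, 16, 12, 14]
console.log(`P50=${percentile(latencies, 50)}ms P95=${percentile(latencies, 95)}ms`)
```

Note how the P95 (250ms here) can be an order of magnitude above the median, which is exactly why averages alone hide tail latency.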
<profiling_tools>
TypeScript/Bun
Built-in timing:

```typescript
console.time('operation')
// ... code to measure
console.timeEnd('operation')

// High precision (Bun)
const start = Bun.nanoseconds()
// ... code to measure
const elapsed = Bun.nanoseconds() - start
console.log(`Took ${elapsed / 1_000_000}ms`)
```

Performance API:

```typescript
performance.mark('start')
// ... code to measure
performance.mark('end')
performance.measure('operation', 'start', 'end')
const measure = performance.getEntriesByName('operation')[0]
console.log(`Duration: ${measure.duration}ms`)
```

Memory profiling:
- Chrome DevTools → Memory tab → heap snapshots
- Node.js --inspect flag + Chrome DevTools
- process.memoryUsage() — for RSS/heap tracking

CPU profiling:
- Chrome DevTools → Performance tab → record session
- Node.js --prof flag + node --prof-process
- Flamegraphs for visualization
Rust
Benchmarking:

```rust
// In benches/my_bench.rs — criterion benchmarks live in the benches/
// directory, not inside a #[cfg(test)] module (criterion_main! generates
// the binary's main fn). `my_function` is the code under test.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| {
        b.iter(|| my_function(black_box(42)))
    });
}

criterion_group!(benches, benchmark_function);
criterion_main!(benches);
```

Profiling:
- cargo bench — criterion benchmarks
- perf record + perf report — Linux profiling
- cargo flamegraph — visual flamegraphs
- cargo bloat — binary size analysis
- valgrind --tool=callgrind — detailed profiling
- heaptrack — memory profiling

Instrumentation:

```rust
use std::time::Instant;

let start = Instant::now();
// ... code to measure
let duration = start.elapsed();
println!("Took: {:?}", duration);
```

</profiling_tools>
<optimization_patterns>
Algorithm Improvements
Time complexity:
- O(n²) → O(n log n) — sorting, searching
- O(n) → O(log n) — binary search, trees
- O(n) → O(1) — hash maps, memoization
Space-time tradeoffs:
- Cache computed results (memoization)
- Precompute expensive operations
- Index data for faster lookup
- Use hash maps for O(1) access
Memory Optimization
Reduce allocations:

```typescript
// Bad: creates a new array each iteration
for (const item of items) {
  const results = []
  results.push(process(item))
}

// Good: reuse the array
const results = []
for (const item of items) {
  results.push(process(item))
}
```

```rust
// Bad: allocates a String on every call
fn format_user(name: &str) -> String {
    format!("User: {}", name)
}

// Good: reuses a caller-provided buffer
fn format_user(name: &str, buf: &mut String) {
    buf.clear();
    buf.push_str("User: ");
    buf.push_str(name);
}
```

Memory pooling:
- Reuse expensive objects (connections, buffers)
- Object pools for frequently allocated types
- Arena allocators for batch allocations
Lazy evaluation:
- Compute only when needed
- Stream processing vs loading all data
- Iterators over materialized collections
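The lazy-evaluation idea can be sketched with a generator, which yields items on demand instead of materializing the whole collection (a sketch; the numbers stand in for whatever records you would stream):

```typescript
// Lazily produce values instead of building a full array up front:
// only as many items as the consumer pulls are ever computed.
function* squares(limit: number): Generator<number> {
  for (let i = 1; i <= limit; i++) {
    yield i * i
  }
}

// Consume only the first 3 of a potentially enormous sequence
const firstThree: number[] = []
for (const sq of squares(1_000_000_000)) {
  firstThree.push(sq)
  if (firstThree.length === 3) break // no further values are computed
}
console.log(firstThree)
```

The equivalent eager version would allocate a billion-element array before the loop even starts; the generator allocates nothing beyond the three values actually consumed.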
I/O Optimization
Batching:
- Batch API calls (1 request vs 100)
- Batch database writes (bulk insert)
- Batch file operations (single write vs many)
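Batching usually starts with chunking: group pending items and issue one call per chunk instead of one per item (a sketch; `sendBatch` is a hypothetical stand-in for your bulk API or bulk-insert call):

```typescript
// Split items into fixed-size chunks so N items become ceil(N / size) calls.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Hypothetical bulk endpoint: one round trip per chunk instead of per item
async function sendAll(items: number[], sendBatch: (batch: number[]) => Promise<void>) {
  for (const batch of chunk(items, 100)) {
    await sendBatch(batch) // 1,000 items -> 10 requests, not 1,000
  }
}
```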
Caching:
- Cache expensive computations
- Cache database queries (Redis, in-memory)
- Cache API responses (HTTP caching)
- Invalidate stale cache entries
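Invalidation can be as simple as a TTL check on read; a minimal in-memory sketch (Redis and HTTP caching follow the same expire-then-refetch idea; the injected clock is just to make expiry testable):

```typescript
// In-memory cache where each entry expires ttlMs after being set.
class TtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>()
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: K): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() > entry.expiresAt) {
      this.entries.delete(key) // stale: invalidate on read
      return undefined
    }
    return entry.value
  }

  set(key: K, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }
}

// Fake clock so expiry is deterministic
let t = 0
const cache = new TtlCache<string, number>(1000, () => t)
cache.set("answer", 42)
console.log(cache.get("answer")) // within TTL
t = 2000
console.log(cache.get("answer")) // expired: undefined
```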
Async I/O:
- Non-blocking operations (async/await)
- Concurrent requests (Promise.all, tokio::spawn)
- Connection pooling (reuse connections)
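The concurrent-requests point, sketched with Promise.all: independent awaits overlap instead of serializing, so total time approaches the slowest single call rather than the sum (`fetchUser` is a hypothetical async call):

```typescript
// Sequential: total time is roughly the SUM of per-call latencies.
// Concurrent: total time is roughly the MAX single-call latency.
async function loadDashboard(
  ids: number[],
  fetchUser: (id: number) => Promise<string>,
): Promise<string[]> {
  // Start all requests before awaiting any of them
  return Promise.all(ids.map((id) => fetchUser(id)))
}

// Hypothetical async call standing in for an HTTP request
const fakeFetch = (id: number) =>
  new Promise<string>((resolve) => setTimeout(() => resolve(`user-${id}`), 10))

loadDashboard([1, 2, 3], fakeFetch).then((users) => console.log(users))
```

With unbounded input, cap concurrency (a pool or semaphore) rather than firing thousands of requests at once.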
Database Optimization
Query optimization:
- Add indexes for common queries
- Use EXPLAIN/EXPLAIN ANALYZE
- Avoid N+1 queries (use joins or batch loading)
- Select only needed columns
- Filter at database level (WHERE vs client filter)
Schema design:
- Normalize to reduce duplication
- Denormalize for read-heavy workloads
- Partition large tables
- Use appropriate data types
Connection management:
- Connection pooling (don't create per request)
- Prepared statements (avoid SQL parsing)
- Transaction batching (reduce round trips)
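The N+1 fix above can be sketched against an in-memory store: collect the distinct foreign keys, then do one batched lookup instead of one query per row (hypothetical `authors` data; in SQL this is a JOIN or a `WHERE id IN (...)`):

```typescript
// Hypothetical in-memory "table" standing in for a real database
const authors = new Map<number, string>([[1, "Ada"], [2, "Linus"]])

// N+1 shape: 1 query for posts, then 1 author lookup PER post.
// Batched shape: distinct author ids, then a single IN-style lookup.
function resolveAuthorsBatched(posts: { authorId: number }[]): string[] {
  const ids = [...new Set(posts.map((p) => p.authorId))]
  // One round trip: SELECT id, name FROM authors WHERE id IN (...ids)
  const byId = new Map(ids.map((id) => [id, authors.get(id)!]))
  return posts.map((p) => byId.get(p.authorId)!)
}

const posts = [{ authorId: 1 }, { authorId: 2 }, { authorId: 1 }]
console.log(resolveAuthorsBatched(posts)) // 2 author lookups total, not 3
```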
</optimization_patterns>
<workflow>
Loop: Measure → Profile → Analyze → Optimize → Validate
- Define performance goal — target metric (e.g., P95 < 100ms)
- Establish baseline — measure current performance under realistic load
- Profile systematically — identify actual bottleneck (not guesses)
- Analyze root cause — understand why code is slow
- Design optimization — plan targeted improvement
- Implement optimization — make focused change
- Measure improvement — verify gains, check for regressions
- Document results — record baseline, optimization, gains, tradeoffs
At each step:
- Document measurements with methodology
- Note profiling tool output
- Track optimization attempts (what worked/failed)
- Update performance documentation
Before declaring optimization complete:
Check gains:
- ✓ Measured improvement meets target?
- ✓ Improvement statistically significant?
- ✓ Tested under realistic load?
- ✓ Multiple runs confirm consistency?
Check regressions:
- ✓ No degradation in other metrics?
- ✓ Memory usage still acceptable?
- ✓ Code complexity still manageable?
- ✓ Tests still pass?
Check documentation:
- ✓ Baseline measurements recorded?
- ✓ Optimization approach explained?
- ✓ Gains quantified with numbers?
- ✓ Tradeoffs documented?
ALWAYS:
- Measure before optimizing (baseline)
- Profile to find actual bottleneck
- Use realistic workload (not toy data)
- Measure multiple runs (account for variance)
- Document baseline and improvements
- Check for regressions in other metrics
- Consider readability vs performance tradeoff
- Verify statistical significance
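The multiple-runs rule in practice: time the same operation several times and report mean plus standard deviation rather than a single number (a minimal harness sketch; a single run can be skewed by JIT warmup, GC pauses, or machine noise):

```typescript
// Summarize repeated timings: mean and population standard deviation.
function summarize(durationsMs: number[]): { mean: number; stddev: number } {
  const n = durationsMs.length
  const mean = durationsMs.reduce((a, b) => a + b, 0) / n
  const variance = durationsMs.reduce((a, b) => a + (b - mean) ** 2, 0) / n
  return { mean, stddev: Math.sqrt(variance) }
}

// Run fn `runs` times and report the spread, not a single sample.
function benchmark(fn: () => void, runs: number): { mean: number; stddev: number } {
  const durations: number[] = []
  for (let i = 0; i < runs; i++) {
    const start = performance.now()
    fn()
    durations.push(performance.now() - start)
  }
  return summarize(durations)
}

const { mean, stddev } = benchmark(() => {
  let s = 0
  for (let i = 0; i < 1e5; i++) s += i
}, 20)
console.log(`mean=${mean.toFixed(3)}ms stddev=${stddev.toFixed(3)}ms`)
```

If an "improvement" is smaller than the baseline's stddev, it is noise, not a gain.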
NEVER:
- Optimize without measuring first
- Guess at bottleneck without profiling
- Benchmark with unrealistic data
- Trust single-run measurements
- Skip documentation of results
- Sacrifice correctness for speed
- Optimize without clear performance goal
- Ignore algorithmic improvements
Methodology:
- benchmarking.md — rigorous benchmarking methodology
Related skills:
- codebase-recon — evidence-based investigation (foundation)
- debugging — structured bug investigation
- typescript-dev — correctness before performance
</workflow>