m10-performance

Performance Optimization

Layer 2: Design Choices

Core Question

What's the bottleneck, and is optimization worth it?

Before optimizing:

- Have you measured? (Don't guess.)
- What's the acceptable performance?
- Will optimization add complexity?

Performance Decision → Implementation

| Goal | Design Choice | Implementation |
|------|---------------|----------------|
| Reduce allocations | Pre-allocate, reuse | `with_capacity`, object pools |
| Improve cache locality | Contiguous data | `Vec`, `SmallVec` |
| Parallelize | Data parallelism | `rayon`, threads |
| Avoid copies | Zero-copy | References, `Cow<T>` |
| Reduce indirection | Inline data | `smallvec`, arrays |
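
The table lists `rayon` for data parallelism. Because `rayon` is an external crate, here is a std-only sketch of the same idea using `std::thread::scope` (stable since Rust 1.63). It splits a slice into chunks, sums each chunk on its own thread, then combines the partial sums. `parallel_sum` and its chunking scheme are hypothetical and serve only as an illustration. With `rayon`, the rough equivalent is `data.par_iter().sum::<u64>()`.

```rust
use std::thread;

/// Hypothetical helper (illustration only): sums `data` by splitting it into
/// roughly equal chunks and summing each chunk on its own scoped thread.
fn parallel_sum(data: &[u64], n_threads: usize) -> u64 {
    let n_threads = n_threads.max(1);
    // Ceiling division so every element lands in a chunk. `chunks(0)` would
    // panic, so the chunk size is kept at 1 or more.
    let chunk_size = ((data.len() + n_threads - 1) / n_threads).max(1);
    // Scoped threads may borrow `data` because the scope joins them all
    // before it returns.
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join every thread and combine the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

For small inputs, the cost of starting threads can exceed the gain, so measure before adopting this.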

Thinking Prompt

Before optimizing:

1. Have you measured?
   - Profile first → flamegraph, perf
   - Benchmark → criterion, cargo bench
   - Identify the actual hotspots
2. What's the priority? (rough, typical gains)
   - Algorithm (10x-1000x)
   - Data structure (2x-10x)
   - Allocation (2x-5x)
   - Cache (1.5x-3x)
3. What's the trade-off?
   - Complexity vs. speed
   - Memory vs. CPU
   - Latency vs. throughput
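
Real benchmarks belong in criterion. For a quick sanity check, a rough std-only timer could look like the sketch below. `time_best` is a hypothetical helper. It keeps the fastest of several runs to reduce scheduler noise, and it passes results through `std::hint::black_box` (stable since Rust 1.66) so the optimizer is less likely to remove the measured work. The numbers only mean something from a `--release` build.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper (illustration only): runs `f` `iters` times and returns
/// the fastest single run together with the last result. This is a rough
/// sanity check, not a replacement for criterion's statistics.
fn time_best<T, F: FnMut() -> T>(iters: u32, mut f: F) -> (Duration, T) {
    assert!(iters > 0, "need at least one iteration");
    let mut best = Duration::MAX;
    let mut last = None;
    for _ in 0..iters {
        let start = Instant::now();
        // black_box hides the result from the optimizer so the call is not
        // discarded as dead code.
        let out = black_box(f());
        best = best.min(start.elapsed());
        last = Some(out);
    }
    // `iters > 0`, so at least one result was recorded.
    (best, last.expect("iters > 0"))
}
```

A typical call wraps the input in `black_box` as well, for example `time_best(100, || (0..black_box(10_000u64)).sum::<u64>())`.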

Trace Up ↑

To domain constraints (Layer 3):

"How fast does this need to be?"
    ↑ Ask: What's the performance SLA?
    ↑ Check: domain-* (latency requirements)
    ↑ Check: Business requirements (acceptable response time)

| Question | Trace To | Ask |
|----------|----------|-----|
| Latency requirements | domain-* | What's the acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |
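
A latency requirement is usually checked against a tail percentile of measured latencies, not against the average. The sketch below assumes latencies have already been collected as `Duration` samples and uses the nearest-rank definition of a percentile. `percentile`, `meets_p99_budget`, and any budget value passed to them are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical helper (illustration only): nearest-rank percentile
/// (`p` in 0.0..=100.0) of latency samples. Returns None for no samples.
fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: ceil(p / 100 * n), clamped to the valid range 1..=n.
    let n = sorted.len();
    let rank = ((p / 100.0) * n as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, n) - 1])
}

/// Hypothetical helper (illustration only): true if the p99 latency fits
/// within `budget`.
fn meets_p99_budget(samples: &[Duration], budget: Duration) -> bool {
    percentile(samples, 99.0).map_or(false, |p99| p99 <= budget)
}
```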

Trace Down ↓

To implementation (Layer 1):

"Need to reduce allocations"
    ↓ m01-ownership: Use references, avoid clone
    ↓ m02-resource: Pre-allocate with_capacity

"Need to parallelize"
    ↓ m07-concurrency: Choose rayon or threads
    ↓ m07-concurrency: Consider async for I/O-bound

"Need cache efficiency"
    ↓ Data layout: Prefer Vec over HashMap when possible
    ↓ Access patterns: Sequential over random access
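
Here is one illustration of "prefer Vec over HashMap". When keys are small, dense integers such as byte values, a flat `Vec` indexed by the key removes hashing and keeps all counters in one contiguous table that the loop updates in a single pass. Both function names are hypothetical. Whether the `Vec` version is faster for your data should be confirmed with a benchmark.

```rust
use std::collections::HashMap;

/// Hypothetical helper (illustration only): byte histogram in a HashMap.
/// Every byte is hashed, and its counter sits at a hash-determined slot.
fn byte_counts_hashmap(data: &[u8]) -> HashMap<u8, u32> {
    let mut counts = HashMap::new();
    for &b in data {
        *counts.entry(b).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical helper (illustration only): the same histogram in a flat
/// `Vec` indexed by byte value. There is no hashing, and the counters form
/// one contiguous 1 KiB table (256 * 4 bytes).
fn byte_counts_vec(data: &[u8]) -> Vec<u32> {
    let mut counts = vec![0u32; 256];
    for &b in data {
        counts[usize::from(b)] += 1;
    }
    counts
}
```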

Quick Reference

| Tool | Purpose |
|------|---------|
| `cargo bench` | Micro-benchmarks |
| `criterion` | Statistical benchmarks |
| `perf` / `flamegraph` | CPU profiling |
| `heaptrack` | Allocation tracking |
| `valgrind` / `cachegrind` | Cache analysis |

Optimization Priority

1. Algorithm choice     (10x - 1000x)
2. Data structure       (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization   (1.5x - 3x)
5. SIMD/Parallelism     (2x - 8x)
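
Algorithm choice comes first because it determines how cost grows with input size. In this small illustration the function names are hypothetical. Detecting a duplicate by comparing every pair is O(n²), while a single pass that records seen values in a `HashSet` is O(n) expected. Sorting first, at O(n log n), is another option when the items cannot be hashed.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper (illustration only), O(n^2): compares every pair.
/// Fine for tiny inputs, slow for large ones.
fn has_duplicate_quadratic<T: PartialEq>(items: &[T]) -> bool {
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            if items[i] == items[j] {
                return true;
            }
        }
    }
    false
}

/// Hypothetical helper (illustration only), O(n) expected: one pass that
/// remembers what it has seen.
fn has_duplicate_hashed<T: Eq + Hash>(items: &[T]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    // `insert` returns false when the value was already present.
    items.iter().any(|x| !seen.insert(x))
}
```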

Common Techniques

| Technique | When | How |
|-----------|------|-----|
| Pre-allocation | Known size | `Vec::with_capacity(n)` |
| Avoid cloning | Hot paths | Use references or `Cow<T>` |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | `smallvec::SmallVec<[T; N]>` |
| Inline buffers | Fixed-size data | Arrays over `Vec` |
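
This sketch shows pre-allocation and batch processing with a reused buffer. `squares` and `for_each_joined` are hypothetical helpers. `squares` sizes a `Vec` up front with `with_capacity`, so its pushes never reallocate. `for_each_joined` keeps one scratch `String` across batches, because `String::clear` empties the string but keeps its allocated capacity.

```rust
/// Hypothetical helper (illustration only): collects i*i for i in 0..n into
/// a Vec sized up front, so the pushes never trigger a reallocation.
fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    for i in 0..n as u64 {
        out.push(i * i);
    }
    out
}

/// Hypothetical helper (illustration only): joins each batch of words with
/// spaces and hands the line to `emit`, reusing one buffer for all batches.
fn for_each_joined<F: FnMut(&str)>(batches: &[&[&str]], mut emit: F) {
    let mut line = String::new(); // scratch buffer, reused across batches
    for batch in batches {
        line.clear(); // drops the contents, keeps the capacity
        for (i, word) in batch.iter().enumerate() {
            if i > 0 {
                line.push(' ');
            }
            line.push_str(word);
        }
        emit(&line);
    }
}
```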

Common Mistakes

| Mistake | Why Wrong | Better |
|---------|-----------|--------|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Unoptimized code gives misleading numbers | Always use `--release` |
| Use `LinkedList` | Cache unfriendly | `Vec` or `VecDeque` |
| Hidden `.clone()` | Unnecessary allocations | Use references |
| Premature optimization | Wasted effort | Make it work first |
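
Hidden clones often come from functions that always return an owned value. With `Cow<str>`, a function can borrow in the common case and allocate only when it actually has to change the data. `expand_tabs` is a hypothetical example: input without tabs comes back as `Cow::Borrowed`, so nothing is copied.

```rust
use std::borrow::Cow;

/// Hypothetical helper (illustration only): replaces each tab with four
/// spaces. It allocates a new String only when the input contains a tab;
/// otherwise it returns the input borrowed, with no copy.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', "    "))
    } else {
        Cow::Borrowed(input)
    }
}
```

Callers that only read the result can use it like a `&str` through `Deref`. They should call `.into_owned()` only when they really need an owned `String`.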

Anti-Patterns

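| Anti-Pattern | Why Bad | Better |
|--------------|---------|--------|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| `HashMap` for small sets | Overhead | `Vec` with linear search |
| String concat in loop | Building a new `String` each iteration (e.g. `s = format!("{s}{x}")`) recopies everything: O(n²) | `String::with_capacity` + `push_str`, or `concat()` / `join()` |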
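
Here are two ways to build a string in a loop; the function names are hypothetical. `join_quadratic` creates a new `String` on every iteration and copies everything accumulated so far, so the total work grows as O(n²). `join_linear` sizes the buffer once and appends in place. Even without `with_capacity`, appending with `push_str` is amortized linear overall, and pre-sizing only removes the intermediate reallocations. For plain concatenation, the standard `parts.concat()` or `parts.join(sep)` does the same job.

```rust
/// Hypothetical helper (illustration only), quadratic: every iteration builds
/// a brand-new String and copies the accumulated text into it.
fn join_quadratic(parts: &[&str]) -> String {
    let mut s = String::new();
    for p in parts {
        s = format!("{s}{p}");
    }
    s
}

/// Hypothetical helper (illustration only), linear: size the buffer once,
/// then append in place.
fn join_linear(parts: &[&str]) -> String {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut s = String::with_capacity(total);
    for p in parts {
        s.push_str(p);
    }
    s
}
```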
