rust-profiling
Rust Profiling
Purpose
Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.
Triggers
- "How do I generate a flamegraph for a Rust program?"
- "My Rust binary is huge — how do I find what's causing it?"
- "How do I write Criterion benchmarks?"
- "How do I measure monomorphization bloat?"
- "Rust performance is worse than expected — how do I profile it?"
- "How do I use perf with Rust?"
Workflow
1. Build for profiling
```bash
# Release with debug symbols (needed for readable profiles)
# Add a custom profile to Cargo.toml:
#   [profile.release-with-debug]
#   inherits = "release"
#   debug = true
cargo build --profile release-with-debug   # binary lands in target/release-with-debug/

# Or quick: release + debug info via an environment override
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
```
2. Flamegraphs with cargo-flamegraph
```bash
# Install
cargo install flamegraph

# Linux: uses perf (requires perf_event_paranoid ≤ 1)
sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
cargo flamegraph --bin myapp -- arg1 arg2

# macOS: uses DTrace (requires sudo)
sudo cargo flamegraph --bin myapp -- arg1 arg2

# Profile tests
cargo flamegraph --test mytest -- test_filter

# Profile benchmarks
cargo flamegraph --bench mybench -- --bench

# Output: writes flamegraph.svg to the current directory
# Open it in a browser (the SVG is interactive): firefox flamegraph.svg
```

Custom flamegraph options:

```bash
# More samples: raise the sampling frequency (Hz)
cargo flamegraph --freq 4999 --bin myapp

# Discard stderr (cargo build output, perf messages and the program's own stderr)
cargo flamegraph --bin myapp -- args 2>/dev/null

# Using perf directly for more control
# (stackcollapse-perf.pl and flamegraph.pl come from Brendan Gregg's FlameGraph repository)
perf record --call-graph dwarf -F 999 ./target/release-with-debug/myapp args
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
```
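To confirm the flamegraph pipeline works before profiling real code, it can help to profile a tiny workload whose hot spots are known in advance. The sketch below is hypothetical: `parse_numbers`, `sum_of_squares` and `workload` are illustrative names, not part of any tool. `#[inline(never)]` keeps each function as its own frame so it shows up as a separate tower instead of being folded into its caller by the optimizer. Forcing non-inlining changes the code being measured, so keep it to diagnostic builds.

```rust
use std::hint::black_box;

/// Hypothetical hot spot 1: parse whitespace-separated integers.
/// `#[inline(never)]` keeps this as its own frame in release builds.
#[inline(never)]
fn parse_numbers(input: &str) -> Vec<u64> {
    input
        .split_whitespace()
        .filter_map(|tok| tok.parse().ok())
        .collect()
}

/// Hypothetical hot spot 2: sum the squares of the parsed values.
#[inline(never)]
fn sum_of_squares(values: &[u64]) -> u64 {
    values.iter().map(|v| v * v).sum()
}

/// Run both hot spots `iterations` times over a fixed 200,000-number input.
fn workload(iterations: u32) -> u64 {
    let input: String = (0..200_000u64).map(|i| format!("{} ", i % 1000)).collect();
    let mut total = 0u64;
    for _ in 0..iterations {
        // black_box stops the optimizer from hoisting the work out of the loop.
        let values = parse_numbers(black_box(&input));
        total = total.wrapping_add(sum_of_squares(black_box(&values)));
    }
    total
}
```

Call it from a scratch binary, e.g. `fn main() { println!("{}", workload(50)); }`, then run `cargo flamegraph --bin <that binary>`. Both functions should appear as distinct frames.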
3. Binary size analysis with cargo-bloat
```bash
# Install
cargo install cargo-bloat

# Show the top 20 functions by size
cargo bloat --release -n 20

# Show per-crate size breakdown
cargo bloat --release --crates

# Include only a specific crate
cargo bloat --release --filter myapp

# Compare before/after a change
cargo bloat --release --crates > before.txt
# ... make changes ...
cargo bloat --release --crates > after.txt
diff before.txt after.txt
```

Typical output:

```text
 File  .text     Size Crate Name
 2.4%   3.0%  47.0KiB std   <std macros>
 1.8%   2.3%  35.5KiB myapp myapp::heavy_module::process
 1.2%   1.5%  23.1KiB serde serde::de::...
```
4. Monomorphization bloat with cargo-llvm-lines
```bash
# Install
cargo install cargo-llvm-lines

# Show LLVM IR line counts (proxy for monomorphization)
cargo llvm-lines --release | head -40

# Filter to your crate only (function names are in the last column, not at line start)
cargo llvm-lines --release | grep 'myapp::'
```

Typical output:

```text
 Lines  Copies  Function name
 85330       1  [LLVM passes]
  7761      92  core::fmt::write
  4672      11  myapp::process::<impl MyTrait for T>
  3201      47  <alloc::vec::Vec<T> as core::ops::Drop>::drop
```

A high `Copies` count means a generic function was instantiated many times, once per distinct set of type arguments (monomorphization expansion). Fix it by keeping the generic part thin and moving the body into a non-generic inner function:

```rust
// Before: the whole body is generic, so it is duplicated for every T.
fn process_before<T: AsRef<[u8]>>(data: T) -> usize {
    let data = data.as_ref();
    // ... imagine a large body here; counting newlines stands in for it ...
    data.iter().filter(|&&b| b == b'\n').count()
}

// After: thin generic wrapper + concrete inner. Only the `as_ref()` call
// is instantiated per T; `inner` is compiled once.
fn process_after<T: AsRef<[u8]>>(data: T) -> usize {
    fn inner(data: &[u8]) -> usize {
        data.iter().filter(|&&b| b == b'\n').count()
    }
    inner(data.as_ref())
}
```
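The inner-function pattern is one fix. Another, when an indirect call per method is acceptable, is dynamic dispatch: a `&mut dyn Trait` parameter is compiled once no matter how many concrete types callers use. The minimal sketch below combines both ideas around `std::io::Read`. The names `count_bytes` and `count_bytes_dyn` are hypothetical; only `Read` and its `read` method come from the standard library.

```rust
use std::io::{self, Read};

/// Non-generic worker: compiled exactly once. Each `read` call goes through
/// the trait object's vtable instead of being specialized per reader type.
fn count_bytes_dyn(reader: &mut dyn Read) -> io::Result<usize> {
    let mut buf = [0u8; 4096];
    let mut total = 0;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(total), // end of input
            Ok(n) => total += n,
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Generic convenience wrapper: only this one-line shim is monomorphized per `R`.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<usize> {
    count_bytes_dyn(&mut reader)
}
```

After a change like this, re-running `cargo llvm-lines` should show the loop with a single copy. Whether the vtable calls matter is worth checking with a benchmark.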
5. Criterion microbenchmarks
```toml
# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_bench"
harness = false
```

```rust
// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};

// Stand-in for the code under test. In a real project, import it instead
// (e.g. `use myapp::process;`); benches can only import from a library target.
fn process(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x)).sum()
}

fn bench_process(c: &mut Criterion) {
    // Simple benchmark: build the input once, outside the timed closure.
    let data: Vec<i32> = (0..1000).collect();
    c.bench_function("process 1000 items", |b| {
        // black_box hides the input from the optimizer so the call
        // is not constant-folded or removed.
        b.iter(|| process(black_box(&data)))
    });
}

fn bench_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("process_sizes");
    for size in [100, 1_000, 10_000] {
        let data: Vec<i32> = (0..size).collect();
        group.bench_with_input(BenchmarkId::from_parameter(size), &data, |b, data| {
            b.iter(|| process(black_box(data)))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_process, bench_sizes);
criterion_main!(benches);
```

```bash
# Run all benchmarks
cargo bench

# Run a specific benchmark target
cargo bench --bench my_bench

# Run only benchmarks whose ID matches a filter
cargo bench -- process_sizes

# Compare with baseline (save/load)
cargo bench -- --save-baseline before
# ... make changes ...
cargo bench -- --baseline before

# View HTML report (`open` is macOS; use `xdg-open` on Linux)
open target/criterion/report/index.html
```
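Criterion handles warm-up, outlier detection and statistics. For a rough one-off check without adding a dependency, a plain loop around `std::hint::black_box` (stable in the standard library) shows the same principle: the optimizer must treat the input and the result as opaque, so it cannot precompute or delete the work. This is a sketch, not a substitute for Criterion; `process` and `time_process` are hypothetical names.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test.
fn process(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x)).sum()
}

/// Time `iters` calls of `process`; returns (mean time per call, last result).
fn time_process(data: &[i32], iters: u32) -> (Duration, i64) {
    assert!(iters > 0, "iters must be positive");
    let mut result = 0;
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the input hides its contents from the optimizer;
        // black_box on the output keeps the call from being removed as dead code.
        result = black_box(process(black_box(data)));
    }
    (start.elapsed() / iters, result)
}
```

Numbers from a loop like this are sensitive to CPU frequency scaling and background load. Use Criterion's baselines when a decision depends on the result.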
6. perf with Rust (Linux)
```bash
# Record (--call-graph dwarf unwinds Rust stacks even without frame pointers)
perf record --call-graph dwarf ./target/release-with-debug/myapp args
perf record --call-graph dwarf -F 999 ./target/release-with-debug/myapp args   # sample at 999 Hz

# Report
perf report                                      # interactive TUI
perf report --stdio --no-call-graph | head -40   # text

# Annotate specific function
perf annotate myapp::hot_function

# stat (quick counters)
perf stat ./target/release/myapp args
```

Rust-specific perf tips:
- Build with `debug = "line-tables-only"` for faster builds that still attribute samples to source lines (`debug = 1` means "limited" debug info, which includes more than line tables)
- Use `RUSTFLAGS="-C force-frame-pointers=yes"` so plain `perf record -g` (frame-pointer unwinding) produces usable call graphs without DWARF unwinding
- Disable ASLR for reproducible addresses across runs: `setarch $(uname -m) -R ./myapp`
7. heaptrack / DHAT for allocations
```bash
# heaptrack (Linux)
heaptrack ./target/release/myapp args
heaptrack_print heaptrack.myapp.*.zst | head -50

# DHAT via Valgrind
valgrind --tool=dhat ./target/debug/myapp args
# Open the resulting dhat.out.<pid> file in dh_view.html (shipped with Valgrind)
```
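When heaptrack or Valgrind is unavailable (for example on macOS or in a restricted CI runner), a rough allocation count can come from inside the program by wrapping the system allocator. This is a minimal std-only sketch of that technique, not how heaptrack or DHAT work internally; `CountingAlloc` and `allocations_during` are hypothetical names. It counts allocation calls and requested bytes across all threads, not live memory.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts allocation calls and bytes requested.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `realloc` and `alloc_zeroed` use the trait's default implementations,
    // which call `self.alloc`, so Vec growth is counted too.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `f` and return (allocation calls, bytes requested) made while it ran.
/// Counts are process-wide, so allocations from other threads are included.
fn allocations_during<F: FnOnce()>(f: F) -> (usize, usize) {
    let a0 = ALLOCATIONS.load(Ordering::Relaxed);
    let b0 = BYTES.load(Ordering::Relaxed);
    f();
    (
        ALLOCATIONS.load(Ordering::Relaxed) - a0,
        BYTES.load(Ordering::Relaxed) - b0,
    )
}
```

Wrapping a hot path in `allocations_during` before and after a change gives a quick regression signal. Use heaptrack or DHAT to find *where* the allocations come from.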
For flamegraph setup and Criterion configuration, see [references/cargo-flamegraph-setup.md](references/cargo-flamegraph-setup.md).
Related skills
- `skills/rust/rustc-basics` - Use for build configuration (debug symbols, profiles)
- `skills/profilers/linux-perf` - Use for perf fundamentals
- `skills/profilers/flamegraphs` - Use for reading and interpreting flamegraph SVGs
- `skills/profilers/valgrind` - Use for allocation profiling with massif/DHAT