rust-profiling

Rust Profiling

Purpose

Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.

Triggers

  • "How do I generate a flamegraph for a Rust program?"
  • "My Rust binary is huge — how do I find what's causing it?"
  • "How do I write Criterion benchmarks?"
  • "How do I measure monomorphization bloat?"
  • "Rust performance is worse than expected — how do I profile it?"
  • "How do I use perf with Rust?"

Workflow

1. Build for profiling

Release with debug symbols (needed for readable profiles)

Cargo.toml:

[profile.release-with-debug]
inherits = "release"
debug = true

Build with the custom profile (the binary lands in target/release-with-debug/):

cargo build --profile release-with-debug

Or, as a quick alternative, enable debug info for the release profile through an environment variable:

CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release

2. Flamegraphs with cargo-flamegraph

Install

cargo install flamegraph

Linux: uses perf (requires perf_event_paranoid ≤ 1)

sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
cargo flamegraph --bin myapp -- arg1 arg2

macOS: uses DTrace (requires sudo)

sudo cargo flamegraph --bin myapp -- arg1 arg2

Profile tests

cargo flamegraph --test mytest -- test_filter

Profile benchmarks

cargo flamegraph --bench mybench -- --bench

Output

Generates flamegraph.svg in the current directory

Open in browser: firefox flamegraph.svg

Custom flamegraph options:
```bash
# Set the sampling frequency in Hz (raise it for more samples)
cargo flamegraph --freq 1000 --bin myapp

# Discard stderr noise from cargo and the profiled program
cargo flamegraph --bin myapp -- args 2>/dev/null

# Using perf directly for more control
# (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph repository)
perf record -g -F 999 ./target/release-with-debug/myapp args
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
```
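To have something predictable to point the commands above at, a small workload with one deliberate hotspot can help. The sketch below is hypothetical (the names `slow_fib`, `checksum`, and `run_workload` are not from this guide). It marks the hot function `#[inline(never)]` so that it keeps its own frame in the flamegraph, whereas the small `checksum` helper will likely be inlined into its caller in an optimized build.

```rust
use std::hint::black_box;

/// Deliberately expensive: naive recursive Fibonacci.
/// `#[inline(never)]` keeps it visible as its own frame in a profile.
#[inline(never)]
fn slow_fib(n: u32) -> u64 {
    if n < 2 {
        u64::from(n)
    } else {
        slow_fib(n - 1) + slow_fib(n - 2)
    }
}

/// Cheap helper. Small functions like this are usually inlined in release
/// builds and then appear only as part of their caller.
fn checksum(values: &[u64]) -> u64 {
    values
        .iter()
        .fold(0u64, |acc, v| acc.wrapping_mul(31).wrapping_add(*v))
}

/// Runs `rounds` Fibonacci computations and folds the results together.
#[inline(never)]
fn run_workload(rounds: u32) -> u64 {
    let values: Vec<u64> = (0..rounds)
        // black_box hides the argument so the call cannot be constant-folded.
        .map(|i| slow_fib(black_box(20 + i % 5)))
        .collect();
    checksum(&values)
}

fn main() {
    // Raise the round count if the run is too short to collect enough samples.
    println!("checksum = {}", run_workload(200));
}
```

In the resulting SVG, `slow_fib` should dominate as a tall recursive tower under `run_workload`. If a function you expect is missing, inlining is the usual reason.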

3. Binary size analysis with cargo-bloat

Install

cargo install cargo-bloat

Show the 20 largest functions

cargo bloat --release -n 20

Show per-crate size breakdown

cargo bloat --release --crates

Include only a specific crate

cargo bloat --release --filter myapp

Compare before/after a change

cargo bloat --release --crates > before.txt

make changes

cargo bloat --release --crates > after.txt
diff before.txt after.txt

Typical output:

File   .text     Size  Crate  Name
2.4%    3.0%  47.0KiB  std    <std macros>
1.8%    2.3%  35.5KiB  myapp  myapp::heavy_module::process
1.2%    1.5%  23.1KiB  serde  serde::de::...
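A plain `diff` of the two `--crates` reports shows which lines changed but not by how many bytes. The following is a rough sketch of computing per-crate deltas from the saved text. It assumes each data row has the columns `File%  .text%  Size  Crate` with sizes written like `47.0KiB`, and that summary rows start their last column with `[` or `.`. The function names are hypothetical and the parser is not part of cargo-bloat.

```rust
use std::collections::BTreeMap;

/// Parses a size such as "47.0KiB" into bytes; returns None for other formats.
fn parse_size(text: &str) -> Option<u64> {
    // Longer suffixes first, because "KiB" also ends in "B".
    let units = [("KiB", 1024.0), ("MiB", 1024.0 * 1024.0), ("B", 1.0)];
    for (suffix, factor) in units {
        if let Some(number) = text.strip_suffix(suffix) {
            return number
                .parse::<f64>()
                .ok()
                .map(|n| (n * factor).round() as u64);
        }
    }
    None
}

/// Extracts crate name -> size in bytes from a `--crates` report (assumed layout).
fn parse_crates_report(report: &str) -> BTreeMap<String, u64> {
    let mut sizes = BTreeMap::new();
    for line in report.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Data rows start with a percentage; the header row does not.
        if cols.len() < 4 || !cols[0].ends_with('%') {
            continue;
        }
        // Skip summary rows, assumed to look like "[N Others]" or ".text section size".
        if cols[3].starts_with('[') || cols[3].starts_with('.') {
            continue;
        }
        if let Some(bytes) = parse_size(cols[2]) {
            sizes.insert(cols[3].to_string(), bytes);
        }
    }
    sizes
}

/// Returns (crate, byte delta) for crates in `after`, largest change first.
/// Crates that appear only in `before` are ignored in this sketch.
fn size_deltas(before: &str, after: &str) -> Vec<(String, i64)> {
    let before = parse_crates_report(before);
    let after = parse_crates_report(after);
    let mut deltas: Vec<(String, i64)> = after
        .iter()
        .map(|(name, &new)| {
            let old = before.get(name).copied().unwrap_or(0);
            (name.clone(), new as i64 - old as i64)
        })
        .filter(|(_, delta)| *delta != 0)
        .collect();
    deltas.sort_by_key(|(_, delta)| std::cmp::Reverse(delta.abs()));
    deltas
}

fn main() {
    let before = " File  .text     Size Crate\n 2.0%   4.0%  10.0KiB serde\n 1.0%   2.0%   5.0KiB myapp\n";
    let after = " File  .text     Size Crate\n 2.0%   4.0%  10.0KiB serde\n 1.5%   3.0%   8.0KiB myapp\n";
    for (name, delta) in size_deltas(before, after) {
        println!("{name}: {delta:+} bytes");
    }
}
```

Because cargo-bloat rounds sizes to one decimal place, small deltas computed this way are approximate.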

4. Monomorphization bloat with cargo-llvm-lines

Install

cargo install cargo-llvm-lines

Show LLVM IR line counts (proxy for monomorphization)

cargo llvm-lines --release | head -40

Filter to your crate only (rows begin with the line count, so match the path rather than anchoring at the start of the line)

cargo llvm-lines --release | grep 'myapp::'

Typical output:

Lines  Copies  Function name
85330       1  [LLVM passes]
 7761      92  core::fmt::write
 4672      11  myapp::process::<impl MyTrait for T>
 3201      47  <alloc::vec::Vec<T> as core::ops::Drop>::drop

A high `Copies` count indicates monomorphization expansion: the function is instantiated once for every concrete type it is used with. Fix:
```rust
// Before: generic, the whole body is monomorphized for every T
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    do_work(data.as_ref())
}

// After: thin generic wrapper + concrete inner
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    // `inner` is not generic, so its IR is generated only once
    fn inner(data: &[u8]) -> usize { do_work(data) }
    inner(data.as_ref())
}
```
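The wrapper pattern above leaves `do_work` undefined. As a self-contained sketch, the version below supplies a hypothetical `do_work` (counting ASCII letters, chosen only for illustration) and calls `process` with four different argument types. Only the thin outer shell is instantiated per type, and the non-generic `inner` contributes its IR once.

```rust
/// Hypothetical stand-in for the real work: counts ASCII letters.
fn do_work(data: &[u8]) -> usize {
    data.iter().filter(|b| b.is_ascii_alphabetic()).count()
}

/// Generic shell: only the `as_ref()` call and the jump to `inner`
/// are duplicated for each `T`.
pub fn process<T: AsRef<[u8]>>(data: T) -> usize {
    // Non-generic inner function shared by every instantiation.
    fn inner(data: &[u8]) -> usize {
        do_work(data)
    }
    inner(data.as_ref())
}

fn main() {
    // Four distinct `T`s: &str, String, Vec<u8>, and [u8; 2].
    let from_str = process("abc123");
    let from_string = process(String::from("abc123"));
    let from_vec = process(vec![b'a', b'b', b'1']);
    let from_array = process([b'x', b'9']);
    println!("{from_str} {from_string} {from_vec} {from_array}");
}
```

Running `cargo llvm-lines` before and after such a refactor should show the `Lines` total for `process` shrinking while `Copies` stays the same, since each copy is now only a few lines long.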

5. Criterion microbenchmarks

Cargo.toml

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_bench"
harness = false

```rust
// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
// Assumed: the crate under test is named `myapp` and exports `process(&[i32])`.
use myapp::process;

fn bench_process(c: &mut Criterion) {
    // Simple benchmark
    c.bench_function("process 1000 items", |b| {
        let data: Vec<i32> = (0..1000).collect();
        // black_box keeps the compiler from optimizing the call away
        b.iter(|| process(black_box(&data)))
    });
}

fn bench_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("process_sizes");

    // One benchmark per input size, reported as a group
    for size in [100, 1000, 10000].iter() {
        let data: Vec<i32> = (0..*size).collect();
        group.bench_with_input(
            BenchmarkId::from_parameter(size),
            &data,
            |b, data| b.iter(|| process(black_box(data))),
        );
    }
    group.finish();
}

criterion_group!(benches, bench_process, bench_sizes);
criterion_main!(benches);
```

Run all benchmarks

cargo bench

Run specific benchmark

cargo bench --bench my_bench

Run with filter

cargo bench -- process_sizes

Compare with baseline (save/load); --bench limits the run to the Criterion target, because the default test harness of other targets rejects these flags

cargo bench --bench my_bench -- --save-baseline before

make changes

cargo bench --bench my_bench -- --baseline before

View HTML report (open is the macOS command; use xdg-open on Linux)

open target/criterion/report/index.html
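The benchmark code wraps inputs in `black_box` so that the optimizer cannot remove the work being measured. To see the idea without Criterion, here is a minimal hand-rolled timing loop using only the standard library's `std::hint::black_box`. The `process` body and `time_process` helper are hypothetical, and the loop has no warm-up or statistics, so treat it as an illustration rather than a replacement for Criterion.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test: doubles and sums the input.
fn process(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x) * 2).sum()
}

/// Times `iters` calls of `process` on the same input.
fn time_process(data: &[i32], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the input hides the value from constant folding;
        // black_box on the output marks the result as used, so the call
        // cannot be discarded as dead code.
        black_box(process(black_box(data)));
    }
    start.elapsed()
}

fn main() {
    let data: Vec<i32> = (0..1000).collect();
    let elapsed = time_process(&data, 10_000);
    println!("10000 iterations took {elapsed:?}");
}
```

Without the two `black_box` calls, an optimized build is free to hoist or delete the loop body, which tends to produce implausibly small timings.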

6. perf with Rust (Linux)

Record

perf record -g ./target/release-with-debug/myapp args
perf record -g -F 999 ./target/release-with-debug/myapp args   # explicit sampling frequency (Hz)

Report

perf report                                      # interactive TUI
perf report --stdio --no-call-graph | head -40   # text

Annotate specific function

perf annotate myapp::hot_function

stat (quick counters)

perf stat ./target/release/myapp args

Rust-specific perf tips:
- Build with `debug = 1` (limited debug info, without type or variable data) for faster builds with line-level attribution; `debug = "line-tables-only"` is leaner still
- Use `RUSTFLAGS="-C force-frame-pointers=yes"` for better call graphs without DWARF unwinding
- Disable ASLR for reproducible addresses: `setarch $(uname -m) -R ./myapp`

7. heaptrack / DHAT for allocations

heaptrack (Linux)

heaptrack ./target/release/myapp args
heaptrack_print heaptrack.myapp.*.zst | head -50

DHAT via Valgrind

valgrind --tool=dhat ./target/debug/myapp args

Open the resulting dhat.out.<pid> file with dh_view.html (shipped with Valgrind)
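heaptrack and DHAT observe the process from outside and attribute allocations to call stacks. For a quick in-process sanity check, a much cruder and different technique is to wrap the system allocator and count calls. The sketch below uses only `std::alloc`. `CountingAlloc` and `count_allocations` are hypothetical names, the counter is process-wide (so allocations from other threads are included), and it records no stack traces.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts calls to `alloc` and `realloc`.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the number of allocator calls
/// observed process-wide while it ran.
fn count_allocations<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}

fn main() {
    // Growing a Vec by pushing reallocates as capacity runs out.
    let (grown, grow_allocs) = count_allocations(|| {
        let mut v = Vec::new();
        for i in 0..1000u32 {
            v.push(i);
        }
        v
    });
    // Reserving up front needs a single allocation for the buffer.
    let (sized, sized_allocs) = count_allocations(|| {
        let mut v = Vec::with_capacity(1000);
        for i in 0..1000u32 {
            v.push(i);
        }
        v
    });
    assert_eq!(grown, sized);
    println!("growing: {grow_allocs} allocator calls, pre-sized: {sized_allocs}");
}
```

If the count for a hot path is surprisingly high, that is the cue to reach for heaptrack or DHAT to find out where the allocations come from.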


For flamegraph setup and Criterion configuration, see [references/cargo-flamegraph-setup.md](references/cargo-flamegraph-setup.md).

Related skills

  • Use skills/rust/rustc-basics for build configuration (debug symbols, profiles)
  • Use skills/profilers/linux-perf for perf fundamentals
  • Use skills/profilers/flamegraphs for reading and interpreting flamegraph SVGs
  • Use skills/profilers/valgrind for allocation profiling with massif/DHAT