rust-profiling

Rust Profiling


Purpose


Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.


Triggers

- "How do I generate a flamegraph for a Rust program?"
- "My Rust binary is huge. How do I find what's causing it?"
- "How do I write Criterion benchmarks?"
- "How do I measure monomorphization bloat?"
- "Rust performance is worse than expected. How do I profile it?"
- "How do I use perf with Rust?"

Workflow


1. Build for profiling

```bash
# Release with debug symbols (needed for readable profiles).
# Add to Cargo.toml:
#   [profile.release-with-debug]
#   inherits = "release"
#   debug = true
cargo build --profile release-with-debug

# Or quick: enable debug info for the release profile via the environment
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
```

2. Flamegraphs with cargo-flamegraph

```bash
# Install
cargo install flamegraph

# Linux: uses perf (requires perf_event_paranoid ≤ 1)
sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
cargo flamegraph --bin myapp -- arg1 arg2

# macOS: uses DTrace (requires sudo)
sudo cargo flamegraph --bin myapp -- arg1 arg2

# Profile tests
cargo flamegraph --test mytest -- test_filter

# Profile benchmarks
cargo flamegraph --bench mybench -- --bench

# Output
# Generates flamegraph.svg in current directory
# Open in browser: firefox flamegraph.svg
```

Custom flamegraph options:
```bash
# More samples: raise the sampling frequency with --freq (the value is illustrative)
cargo flamegraph --freq 1000 --bin myapp -- args

# Discard the profiled program's stderr output
cargo flamegraph --bin myapp -- args 2>/dev/null

# Using perf directly for more control
# (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph scripts
# and are assumed to be on PATH)
perf record -g -F 999 ./target/release-with-debug/myapp args
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
```
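When reading a flamegraph of an optimized build, small functions are often inlined into their callers and no longer appear as frames of their own. The sketch below is one way to make a suspected hot path visible while investigating: it marks the function `#[inline(never)]` so it keeps its own symbol. The names `checksum` and `run_workload` are hypothetical, chosen only for illustration, and the attribute changes the generated code, so it is best removed once the measurement is done.

```rust
use std::hint::black_box;

/// Hypothetical hot function. `#[inline(never)]` keeps it as a separate
/// symbol so it can show up as its own frame in the flamegraph.
#[inline(never)]
fn checksum(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(u64::from(b)))
}

/// Hypothetical driver: calls the hot function repeatedly so the sampler
/// has enough stacks to collect.
#[inline(never)]
fn run_workload(rounds: usize) -> u64 {
    let data: Vec<u8> = (0..=255u8).cycle().take(64 * 1024).collect();
    let mut total = 0u64;
    for _ in 0..rounds {
        // black_box discourages the compiler from hoisting the call out of the loop.
        total = total.wrapping_add(checksum(black_box(&data)));
    }
    total
}

fn main() {
    println!("{}", run_workload(2_000));
}
```

Built with the `release-with-debug` profile from step 1 and run under `cargo flamegraph`, `checksum` should then be attributable as a distinct frame under `run_workload`.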

3. Binary size analysis with cargo-bloat

```bash
# Install
cargo install cargo-bloat

# Show the 20 largest functions
cargo bloat --release -n 20

# Show per-crate size breakdown
cargo bloat --release --crates

# Include only specific crate
cargo bloat --release --filter myapp

# Compare before/after a change
cargo bloat --release --crates > before.txt
# make changes
cargo bloat --release --crates > after.txt
diff before.txt after.txt
```

Typical output:
```
File  .text     Size Crate Name
2.4%   3.0%  47.0KiB std   <std macros>
1.8%   2.3%  35.5KiB myapp myapp::heavy_module::process
1.2%   1.5%  23.1KiB serde serde::de::...
```

4. Monomorphization bloat with cargo-llvm-lines

```bash
# Install
cargo install cargo-llvm-lines

# Show LLVM IR line counts (proxy for monomorphization)
cargo llvm-lines --release | head -40

# Filter to your crate only
cargo llvm-lines --release | grep '^myapp'
```

Typical output:
```
  Lines  Copies  Function name
  85330       1  [LLVM passes]
   7761      92  core::fmt::write
   4672      11  myapp::process::<impl MyTrait for T>
   3201      47  <alloc::vec::Vec<T> as core::ops::Drop>::drop
```

A high `Copies` count means the function was instantiated once per concrete type, which is where monomorphization bloat comes from. One fix is to move the body into a non-generic inner function:
```rust
// Before: generic, gets monomorphized for every T
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    do_work(data.as_ref())
}

// After: thin generic wrapper + concrete inner
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    fn inner(data: &[u8]) -> usize { do_work(data) }
    inner(data.as_ref())
}
```
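The wrapper-plus-inner pattern is easier to see with a body that does real work. The following is a minimal, self-contained sketch; `count_nonzero` is a hypothetical function invented for illustration, not part of any crate mentioned here.

```rust
/// Generic entry point: only this thin shim is instantiated per caller type.
pub fn count_nonzero<T: AsRef<[u8]>>(data: T) -> usize {
    // Non-generic inner function: compiled once and shared by every
    // instantiation of the outer function.
    fn inner(bytes: &[u8]) -> usize {
        bytes.iter().filter(|&&b| b != 0).count()
    }
    inner(data.as_ref())
}

fn main() {
    // Three distinct `T`s: up to three copies of the shim, one copy of `inner`.
    let from_vec = count_nonzero(vec![0u8, 1, 2]);
    let from_array = count_nonzero([0u8, 0, 7]);
    let from_str = count_nonzero("ab\0");
    println!("{from_vec} {from_array} {from_str}");
}
```

The saving grows with the size of the inner body: the per-type cost shrinks to a single `as_ref()` call plus a jump, which `cargo llvm-lines` should report as far fewer lines per copy.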

5. Criterion microbenchmarks

```toml
# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_bench"
harness = false
```

```rust
// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
// Assumption: `process` is the function under test, exported by your crate
// (named `myapp` here, as in the other examples).
use myapp::process;

fn bench_process(c: &mut Criterion) {
    // Simple benchmark
    c.bench_function("process 1000 items", |b| {
        let data: Vec<i32> = (0..1000).collect();
        // black_box keeps the compiler from optimizing the input away
        b.iter(|| process(black_box(&data)))
    });
}

fn bench_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("process_sizes");

    // One benchmark per input size, labelled by the size
    for size in [100, 1000, 10000] {
        let data: Vec<i32> = (0..size).collect();
        group.bench_with_input(
            BenchmarkId::from_parameter(size),
            &data,
            |b, data| b.iter(|| process(black_box(data))),
        );
    }
    group.finish();
}

criterion_group!(benches, bench_process, bench_sizes);
criterion_main!(benches);
```

```bash
# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench --bench my_bench

# Run with filter
cargo bench -- process_sizes

# Compare with baseline (save/load)
cargo bench -- --save-baseline before
# make changes
cargo bench -- --baseline before

# View HTML report (`open` is the macOS launcher; on Linux use xdg-open)
open target/criterion/report/index.html
```
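For intuition about what `b.iter` and `black_box` do, here is a hand-rolled timing loop that uses only the standard library. It is a rough sketch, not a replacement for Criterion: it has no warm-up, no statistics, and no outlier handling. The `process` body and the `time_per_call` helper are hypothetical stand-ins for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test.
fn process(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x)).sum()
}

/// Times `iters` calls of `f` and returns the mean time per call.
/// `iters` must be non-zero, since the total is divided by it.
fn time_per_call<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    let data: Vec<i32> = (0..1000).collect();
    let mean = time_per_call(10_000, || {
        // black_box on the input hides its value from the optimizer;
        // black_box on the result keeps the call from being discarded.
        black_box(process(black_box(&data)));
    });
    println!("process 1000 items: {mean:?} per call");
}
```

A single mean like this is noisy, which is the main reason to prefer Criterion's repeated sampling and baseline comparison for anything beyond a quick sanity check.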

6. perf with Rust (Linux)

```bash
# Record
perf record -g ./target/release-with-debug/myapp args
perf record -g -F 999 ./target/release-with-debug/myapp args  # higher freq

# Report
perf report                                     # interactive TUI
perf report --stdio --no-call-graph | head -40  # text

# Annotate specific function
perf annotate myapp::hot_function

# stat (quick counters)
perf stat ./target/release/myapp args
```

Rust-specific perf tips:
- Build with `debug = "line-tables-only"` for faster builds that still give line-level attribution (`debug = 1` is the slightly richer "limited" level)
- Use `RUSTFLAGS="-C force-frame-pointers=yes"` for better call graphs without DWARF unwinding
- Disable ASLR for reproducible addresses: `setarch $(uname -m) -R ./myapp`

7. heaptrack / DHAT for allocations

```bash
# heaptrack (Linux)
heaptrack ./target/release/myapp args
heaptrack_print heaptrack.myapp.*.zst | head -50

# DHAT via Valgrind
valgrind --tool=dhat ./target/debug/myapp args
# Open dhat.out.* (DHAT's default output file name) with dh_view.html
```
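Before reaching for heaptrack or DHAT, it can help to confirm that a code path allocates at all. The sketch below wraps the system allocator in a counter using only the standard library. `CountingAlloc` and `count_allocations` are hypothetical names, and the counts are process-wide, so allocations made by other threads are included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts allocations.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        // Forward the request unchanged to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the number of allocations
/// observed process-wide while it ran.
fn count_allocations<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}

fn main() {
    let (v, n) = count_allocations(|| {
        let mut v = Vec::new();
        for i in 0..1000u32 {
            v.push(i); // growth reallocations are counted as well
        }
        v
    });
    let bytes = BYTES.load(Ordering::Relaxed);
    println!("{} items, {} allocations, {} bytes requested so far", v.len(), n, bytes);
}
```

This only says how often the allocator was called, not where from. Attribution to call sites is what heaptrack and DHAT add.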


For flamegraph setup and Criterion configuration, see [references/cargo-flamegraph-setup.md](references/cargo-flamegraph-setup.md).

