Linux perf
Purpose
Guide agents through `perf` for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation.
Triggers
- "Which function is consuming the most CPU?"
- "How do I measure cache misses / IPC?"
- "How do I use `perf` to find hotspots?"
- "How do I generate a flamegraph from perf data?"
- "perf shows `[unknown]` or `[kernel]` frames"
Workflow
1. Prerequisites
```bash
# Install
sudo apt install linux-perf                                   # Debian
sudo apt install linux-tools-common linux-tools-$(uname -r)   # Ubuntu (version-matched)
sudo dnf install perf                                         # Fedora/RHEL

# Check permissions
# Without root, profiling kernel code requires paranoid level ≤ 1
cat /proc/sys/kernel/perf_event_paranoid
# 2 = user space only (no kernel profiling), 1 = user+kernel, 0 = also CPU-wide events, -1 = no restrictions

# Temporarily lower (lasts until reboot)
sudo sysctl -w kernel.perf_event_paranoid=1

# Persistent
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl -p /etc/sysctl.d/99-perf.conf
```
Compile the target with debug symbols for useful frame data:
```bash
gcc -g -O2 -fno-omit-frame-pointer -o prog main.c
# -fno-omit-frame-pointer: essential for frame-pointer-based unwinding
# Alternative: compile with DWARF CFI and use --call-graph=dwarf
```
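The permission levels above can be turned into a quick self-check. The following is a minimal sketch, not part of the original skill: `describe_paranoid` is a hypothetical helper name, the descriptions paraphrase the kernel documentation, and the branch for levels 3 and above assumes a Debian/Ubuntu-patched kernel.

```bash
#!/usr/bin/env bash
# Hypothetical helper: describe what an unprivileged user may do
# at a given perf_event_paranoid level.
describe_paranoid() {
    local level="$1"
    if [ "$level" -ge 3 ]; then
        # Assumption: levels >= 3 only exist on distribution-patched kernels.
        echo "perf_event_open disabled for unprivileged users"
    elif [ "$level" -eq 2 ]; then
        echo "user-space profiling of own processes only"
    elif [ "$level" -eq 1 ]; then
        echo "user and kernel profiling of own processes"
    elif [ "$level" -eq 0 ]; then
        echo "adds CPU-wide events; raw tracepoints still restricted"
    else
        echo "no restrictions"
    fi
}

# Report the live setting when the sysctl file is readable.
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    level="$(cat /proc/sys/kernel/perf_event_paranoid)"
    echo "perf_event_paranoid=$level: $(describe_paranoid "$level")"
fi
```

If the reported level is too restrictive for the intended measurement, either lower it with `sysctl` as shown earlier or run `perf` under `sudo`.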
2. perf stat: quick counters
```bash
# Basic hardware counters
perf stat ./prog

# With specific events
perf stat -e cache-misses,cache-references,instructions,cycles,branch-misses ./prog

# Repeat 5 runs; reports the mean and standard deviation, including elapsed time
perf stat -r 5 ./prog

# Attach to existing process
perf stat -p 12345 sleep 10
```
Interpret `perf stat` output (rules of thumb; thresholds vary by CPU and workload):
- **IPC** (instructions per cycle) < 1.0: likely memory-bound or a stalled pipeline
- **cache-miss rate** > 5%: significant cache pressure
- **branch-miss rate** > 5%: branch predictor struggling
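The ratios used in the interpretation rules can be computed from the machine-readable output of `perf stat -x,`. This is a minimal sketch under stated assumptions: `perf_stat_ratios` is a hypothetical helper name, the counter value is taken from CSV field 1 and the event name from field 3, and event names match the generic aliases exactly (hybrid CPUs may report names such as `cpu_core/cycles/`, which this sketch does not handle).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `perf stat -x,` CSV on stdin and print IPC
# and miss rates for whichever counter pairs are present.
perf_stat_ratios() {
    awk -F, '
        { sub(/:[a-zA-Z]+$/, "", $3) }           # drop modifiers such as ":u"
        $1 ~ /^[0-9]+$/ { v[$3] = $1 + 0 }       # skips comments and <not supported>
        END {
            if (v["cycles"] > 0)
                printf "IPC %.2f\n", v["instructions"] / v["cycles"]
            if (v["cache-references"] > 0)
                printf "cache-miss-rate %.2f%%\n", 100 * v["cache-misses"] / v["cache-references"]
            if (v["branch-instructions"] > 0)
                printf "branch-miss-rate %.2f%%\n", 100 * v["branch-misses"] / v["branch-instructions"]
        }'
}

# Usage (writes the CSV to stat.csv, then summarizes it):
#   perf stat -x, -o stat.csv \
#     -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses ./prog
#   perf_stat_ratios < stat.csv
```

Note that the branch-miss rate needs `branch-instructions` in the event list as the denominator; a ratio is simply omitted when its denominator was not counted.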
3. perf record: sampling
```bash
# Default: sample the cycles event (4000 Hz in current perf; older releases defaulted to 1000 Hz)
perf record -g ./prog

# Specify frequency (999 rather than 1000 avoids sampling in lockstep with periodic timers)
perf record -F 999 -g ./prog

# Specific event
perf record -e cache-misses -g ./prog

# Attach to running process
perf record -F 999 -g -p 12345 sleep 30

# Off-CPU profiling (time spent waiting); system-wide, so typically needs root
perf record -e sched:sched_switch -ag sleep 10

# DWARF call graphs (better for binaries without frame pointers)
perf record -F 999 --call-graph=dwarf ./prog

# Save to named file
perf record -o myapp.perf.data -g ./prog
```
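The attach and named-output options combine naturally when profiling a running service. The following is a small sketch, not part of the original skill: `profile_pid`, the `DRY_RUN` switch, and the `perf.<pid>.data` naming scheme are all hypothetical choices made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: profile a running PID for a fixed duration.
# Set DRY_RUN=1 to print the command instead of running it.
profile_pid() {
    local pid="$1" seconds="${2:-30}" freq="${3:-999}"
    local out="perf.${pid}.data"          # assumed naming scheme
    # Build the command as positional parameters (local to this function).
    set -- perf record -F "$freq" -g -p "$pid" -o "$out" -- sleep "$seconds"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage:
#   DRY_RUN=1 profile_pid 12345 10    # show what would run
#   profile_pid 12345 10              # record for 10 seconds
```

Printing the command first makes it easy to confirm the PID, duration, and output file before a recording starts.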
4. perf report: interactive analysis
```bash
perf report                      # reads perf.data
perf report -i myapp.perf.data
perf report --no-children        # self time only (not cumulative)
perf report --sort comm,dso,sym  # sort by fields
perf report --stdio              # non-interactive text output
```
Navigation in TUI:
- `Enter`: expand a symbol
- `a`: annotate (show assembly with hit counts)
- `s`: show source in the annotate view (needs debug info)
- `d`: filter by DSO (library)
- `t`: filter by thread
- `?`: help
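The non-interactive output is convenient for scripted hotspot checks. This is a minimal sketch, assuming the text layout produced by `perf report --stdio --no-children --sort comm,dso,sym` (one leading overhead percentage, then a `[.]` or `[k]` marker before the symbol); `top_symbols` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the N hottest symbols from
# `perf report --stdio --no-children` text read on stdin.
top_symbols() {
    local n="${1:-5}"
    # Keep only entry lines (leading percentage plus a symbol marker),
    # then reduce each to "<overhead> <symbol>". Header lines start with '#'
    # and call-chain lines lack the marker, so neither matches.
    sed -nE 's/^ *([0-9.]+%) .*\[[.kgHu]\] (.*[^ ]) *$/\1 \2/p' | head -n "$n"
}

# Usage:
#   perf report --stdio --no-children --sort comm,dso,sym | top_symbols 10
```

With the default cumulative ("children") view there are two percentage columns, and this sketch would report the first one, so `--no-children` is the safer pairing.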
5. perf annotate: hot instructions
```bash
# Show assembly with hit percentages
perf annotate sym_name

# From report: press 'a' on a symbol
# Or directly:
perf annotate -i perf.data --symbol=hot_function --stdio
```
High hit count on a `mov` or `vmovdqa` suggests a cache miss at that load. Because of sampling skid, the hits may be attributed to the instruction just after the one that actually stalled.
6. perf top: live profiling
```bash
# Live top, like 'top' but for functions
sudo perf top -g

# Filter by process
sudo perf top -p 12345
```
7. Feed into flamegraphs
```bash
# Generate perf script output
perf script > out.perf

# Use Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg

# Open flamegraph.svg in browser
```
See `skills/profilers/flamegraphs` for reading flamegraphs and interpreting results.
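The folded file is plain text, one line per unique stack in the form `frame1;frame2;...;leaf count`, so it can also be summarized without rendering an SVG. The sketch below is an illustration rather than part of the FlameGraph toolkit: `folded_self_time` is a hypothetical helper that totals samples per leaf frame, which approximates self time per function.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum sample counts per leaf frame from folded
# stacks ("a;b;c 42") read on stdin, hottest first.
folded_self_time() {
    awk '
        NF {
            count = $NF                      # last field is the sample count
            stack = $0
            sub(/ [0-9]+$/, "", stack)       # drop the trailing count
            n = split(stack, frames, ";")
            self[frames[n]] += count         # attribute samples to the leaf
        }
        END { for (f in self) printf "%d %s\n", self[f], f }
    ' | sort -k1,1nr -k2
}

# Usage:
#   folded_self_time < out.folded | head
```

This gives a quick textual ranking to compare against the widest plateaus at the top of the flamegraph.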
8. Common issues
| Problem | Cause | Fix |
|---|---|---|
| … | … | Lower paranoid level or run with … |
| … | Missing frame pointers or debug info | Recompile with … |
| All … | Kernel symbols not visible | Use … |
| … | Kernel symbols unavailable | `echo 0 \| …` |
| Empty report for short program | Program exits too fast | Use … |
| DWARF unwinding slow | Large DWARF stack | Limit with … |
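The "program exits too fast" case comes down to simple arithmetic: a sampling profiler collects at most roughly frequency × runtime samples per busy thread. A rough sketch of that estimate follows; `expected_samples` is a hypothetical helper, and real counts will differ because frequency-based sampling is adaptive.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough upper bound on samples for one busy thread.
# Runtime is given in milliseconds to stay within shell integer arithmetic.
expected_samples() {
    local freq_hz="$1" runtime_ms="$2"
    echo $(( freq_hz * runtime_ms / 1000 ))
}

expected_samples 999 5      # a 5 ms run at -F 999
expected_samples 20000 5    # the same run at -F 20000
```

A handful of samples is too few to rank functions, so very short programs need either a higher `-F` value (bounded by the `kernel.perf_event_max_sample_rate` sysctl) or a longer-running workload.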
9. Useful events
```bash
# List all available events
perf list

# Common hardware events
cycles
instructions
cache-references
cache-misses
branch-instructions
branch-misses
stalled-cycles-frontend
stalled-cycles-backend

# Software events
context-switches
cpu-migrations
page-faults

# Tracepoints (requires root)
sched:sched_switch
syscalls:sys_enter_read
```
For a counter reference and interpretation guide, see [references/events.md](references/events.md).
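Not every CPU or virtual machine exposes all of the hardware events listed here. One way to check is to count them once and look for placeholder values in the CSV output. This is a minimal sketch, assuming `perf stat -x,` writes `<not supported>` or `<not counted>` in the value field (field 1) for such events; `unavailable_events` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list events that `perf stat -x,` could not measure,
# reading the CSV on stdin.
unavailable_events() {
    awk -F, '$1 == "<not supported>" || $1 == "<not counted>" { print $3 ": " $1 }'
}

# Usage:
#   perf stat -x, -o stat.csv \
#     -e cycles,instructions,stalled-cycles-frontend,stalled-cycles-backend true
#   unavailable_events < stat.csv
```

Events reported here are best dropped from later `-e` lists, since ratios derived from them would be meaningless.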
Related skills
- Use `skills/profilers/flamegraphs` for SVG flamegraph generation and reading
- Use `skills/profilers/valgrind` for cache simulation and memory profiling
- Use `skills/compilers/gcc` or `skills/compilers/clang` for PGO from perf data (AutoFDO)