linux-perf

Linux perf

Purpose

Guide agents through `perf` for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation.

Triggers

  • "Which function is consuming the most CPU?"
  • "How do I measure cache misses / IPC?"
  • "How do I use `perf` to find hotspots?"
  • "How do I generate a flamegraph from perf data?"
  • "perf shows `[unknown]` or `[kernel]` frames"

Workflow

1. Prerequisites

```bash
# Install
# Debian/Ubuntu (version-matched; on many Ubuntu releases the package is linux-tools-$(uname -r))
sudo apt install linux-perf
sudo dnf install perf         # Fedora/RHEL

# Check permissions: without root, kernel profiling requires paranoid level ≤ 1
cat /proc/sys/kernel/perf_event_paranoid
# 2 = user-space only (no kernel profiling), 1 = user + kernel,
# 0 = also CPU-wide events, -1 = no restrictions

# Temporarily lower (lasts until reboot)
sudo sysctl -w kernel.perf_event_paranoid=1

# Persistent
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl -p /etc/sysctl.d/99-perf.conf
```

Compile the target with debug symbols for useful frame data:

```bash
gcc -g -O2 -fno-omit-frame-pointer -o prog main.c
# -fno-omit-frame-pointer: essential for frame-pointer-based unwinding
# Alternative: compile with DWARF CFI and use --call-graph=dwarf
```
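To make the `perf_event_paranoid` levels from this step easier to act on, here is a minimal sketch that turns the numeric value into a short description. The function name `describe_paranoid` is hypothetical, the wording for values above 2 reflects distribution patches rather than mainline behaviour, and the fallback value of 2 is an assumption for systems where the file cannot be read.

```bash
# Hypothetical helper: translate a perf_event_paranoid value into plain words.
describe_paranoid() {
    local level="$1"
    if [ "$level" -le -1 ]; then
        echo "no restrictions"
    elif [ "$level" -eq 0 ]; then
        echo "user + kernel + CPU-wide events"
    elif [ "$level" -eq 1 ]; then
        echo "user + kernel profiling"
    elif [ "$level" -eq 2 ]; then
        echo "user-space profiling only"
    else
        # Values above 2 come from distribution patches (assumption: access is usually denied).
        echo "distribution-specific, often no unprivileged access"
    fi
}

# Use the live value when the file is readable; fall back to 2 (assumed default).
level="$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo 2)"
echo "perf_event_paranoid=$level: $(describe_paranoid "$level")"
```

If the description shows that kernel profiling is blocked, either lower the level as shown in this step or run `perf` with `sudo`.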

2. perf stat — quick counters

```bash
# Basic hardware counters
perf stat ./prog

# With specific events
perf stat -e cache-misses,cache-references,instructions,cycles,branch-misses ./prog

# Wall-clock comparison: repeat 5 runs and report mean and variation
perf stat -r 5 ./prog

# Attach to an existing process for 10 seconds
perf stat -p 12345 sleep 10
```

Interpret `perf stat` output (rules of thumb, not hard limits):

- **IPC** (instructions per cycle) < 1.0: memory-bound or stalled pipeline
- **cache-miss rate** (cache-misses / cache-references) > 5%: significant cache pressure
- **branch-miss rate** (branch-misses / branch-instructions) > 5%: branch predictor struggling
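The three ratios above are simple divisions over the raw counters. The sketch below shows the arithmetic; `perf_ratios` is a hypothetical helper, the argument order is an assumption of this example, and the numbers are illustrative rather than measurements from a real run. It does not guard against zero denominators.

```bash
# Hypothetical helper: derive IPC and miss rates from raw perf stat counts.
# Arguments: instructions cycles cache_misses cache_refs branch_misses branches
perf_ratios() {
    awk -v ins="$1" -v cyc="$2" -v cm="$3" -v cr="$4" -v bm="$5" -v br="$6" 'BEGIN {
        printf "ipc=%.2f cache_miss_pct=%.2f branch_miss_pct=%.2f\n",
               ins / cyc, 100 * cm / cr, 100 * bm / br
    }'
}

# Illustrative numbers only, not output from a real run.
perf_ratios 8000000 10000000 30000 400000 12000 1500000
# prints: ipc=0.80 cache_miss_pct=7.50 branch_miss_pct=0.80
```

In this made-up case the IPC is below 1.0 and the cache-miss rate is above 5%, so the thresholds above would point toward memory behaviour rather than branch prediction.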

3. perf record — sampling

```bash
# Default: sample the cycles event (4000 Hz in recent perf versions)
perf record -g ./prog

# Specify frequency
perf record -F 999 -g ./prog

# Specific event
perf record -e cache-misses -g ./prog

# Attach to a running process for 30 seconds
perf record -F 999 -g -p 12345 sleep 30

# Off-CPU profiling (time spent waiting)
perf record -e sched:sched_switch -ag sleep 10

# DWARF call graphs (better for binaries without frame pointers)
perf record -F 999 --call-graph=dwarf ./prog

# Save to named file
perf record -o myapp.perf.data -g ./prog
```
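When choosing a value for `-F`, it can help to estimate how many samples a run will produce. As a rough sketch, the count is about the frequency times the seconds the target spends on-CPU times the number of busy CPUs. The helper name `estimate_samples` is hypothetical, and the estimate ignores throttling and idle time.

```bash
# Hypothetical helper: expected samples ~ frequency (Hz) x on-CPU seconds x busy CPUs.
estimate_samples() {
    awk -v hz="$1" -v secs="$2" -v cpus="${3:-1}" 'BEGIN { printf "%d\n", hz * secs * cpus }'
}

estimate_samples 999 30       # single-threaded target profiled for 30 s -> 29970
estimate_samples 999 0.005    # program that exits after about 5 ms      -> 4
```

The second call illustrates why very short programs tend to yield nearly empty profiles: a handful of samples is too few to rank functions reliably.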

4. perf report — interactive analysis

```bash
perf report                          # reads perf.data
perf report -i myapp.perf.data
perf report --no-children            # self time only (not cumulative)
perf report --sort comm,dso,sym      # sort by fields
perf report --stdio                  # non-interactive text output
```

Navigation in TUI:

  • `Enter`: expand a symbol
  • `a`: annotate (show assembly with hit counts)
  • `s`: toggle source lines inside the annotate view (needs debug info)
  • `d`: filter by DSO (library)
  • `t`: filter by thread
  • `?`: help
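For scripted use, the `--stdio` output can be filtered with standard text tools. The sketch below keeps rows at or above a percentage threshold. It assumes the single-percentage layout produced with `--no-children` and the default sort, and that symbol names contain no spaces; `hot_symbols` is a hypothetical name, and the sample text is illustrative rather than captured from a real run.

```bash
# Hypothetical helper: keep rows at or above a threshold percentage.
# Reads `perf report --stdio --no-children` style text on stdin.
hot_symbols() {
    awk -v min="${1:-5}" '
        /^#/ || NF == 0 { next }          # skip header comments and blank lines
        $1 ~ /%$/ {
            pct = $1; sub(/%/, "", pct)   # strip the percent sign for comparison
            if (pct + 0 >= min) print $1, $NF
        }'
}

# Illustrative sample, not output from a real run.
hot_symbols 10 <<'EOF'
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#
    62.50%  prog     prog           [.] compute
    20.10%  prog     libc.so.6      [.] memcpy
     3.40%  prog     [kernel]       [k] page_fault
EOF
```

With real data, the same filter could be used as `perf report --stdio --no-children | hot_symbols 5`.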

5. perf annotate — hot instructions

```bash
# Show assembly with hit percentages
perf annotate sym_name

# From report: press 'a' on a symbol

# Or directly:
perf annotate -i perf.data --symbol=hot_function --stdio
```

High hit count on a `mov` or `vmovdqa` suggests a cache miss at that load. Because of sampling skid, the samples may be attributed to the instruction just after the one that actually stalled.

6. perf top — live profiling

```bash
# Live top, like 'top' but for functions
sudo perf top -g

# Filter by process
sudo perf top -p 12345
```

7. Feed into flamegraphs

```bash
# Generate perf script output
perf script > out.perf

# Use Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg

# Open flamegraph.svg in browser
```

See `skills/profilers/flamegraphs` for reading flamegraphs and interpreting results.
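The folded file written by `stackcollapse-perf.pl` is plain text: each line is a semicolon-separated stack followed by a sample count. That makes it possible to get a quick self-time ranking without opening the SVG. The sketch below sums counts by leaf frame; `self_by_leaf` is a hypothetical name and the input is made-up data.

```bash
# Hypothetical helper: sum sample counts by leaf (innermost) frame.
# Input lines look like "frame1;frame2;leaf COUNT".
self_by_leaf() {
    awk '{
        count = $NF                       # last field is the sample count
        $NF = ""; sub(/ +$/, "")          # what remains is the stack
        n = split($0, frames, ";")
        self[frames[n]] += count
    }
    END { for (f in self) print self[f], f }' | sort -rn
}

# Illustrative folded stacks, not real profile data.
self_by_leaf <<'EOF'
prog;main;parse 10
prog;main;compute;dot 70
prog;main;compute 15
prog;main;render;dot 5
EOF
# prints:
# 75 dot
# 15 compute
# 10 parse
```

With real data, the same ranking could be obtained with `self_by_leaf < out.folded`.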

8. Common issues

| Problem | Cause | Fix |
|---|---|---|
| Permission denied | `perf_event_paranoid` too high | Lower paranoid level or run with `sudo` |
| `[unknown]` frames | Missing frame pointers or debug info | Recompile with `-fno-omit-frame-pointer` or use `--call-graph=dwarf` |
| `[kernel]` everywhere | Kernel symbols not visible | Use `sudo perf record`; install `linux-image-$(uname -r)-dbgsym` |
| No kallsyms | Kernel symbols unavailable | `echo 0 …` |
| Empty report for short program | Program exits too fast | Use `-F 9999` or instrument longer workload |
| DWARF unwinding slow | Large DWARF stack | Limit with `--call-graph dwarf,512` |

9. Useful events

```bash
# List all available events
perf list

# Common hardware events:
#   cycles, instructions, cache-references, cache-misses,
#   branch-instructions, branch-misses,
#   stalled-cycles-frontend, stalled-cycles-backend

# Software events:
#   context-switches, cpu-migrations, page-faults

# Tracepoints (requires root):
#   sched:sched_switch, syscalls:sys_enter_read
```

For a counter reference and interpretation guide, see [references/events.md](references/events.md).
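The `-e` option takes a comma-separated list, so a set of event names kept in a shell array has to be joined before it is passed to `perf`. A minimal sketch, with `join_events` as a hypothetical helper name:

```bash
# Hypothetical helper: join event names with commas for `perf stat -e`.
join_events() {
    local IFS=,
    echo "$*"
}

events=(cycles instructions cache-references cache-misses)
echo "perf stat -e $(join_events "${events[@]}") ./prog"
# prints: perf stat -e cycles,instructions,cache-references,cache-misses ./prog
```

Not every event exists on every CPU or virtual machine, so it is worth checking names against `perf list` before relying on them.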

Related skills

  • Use `skills/profilers/flamegraphs` for SVG flamegraph generation and reading
  • Use `skills/profilers/valgrind` for cache simulation and memory profiling
  • Use `skills/compilers/gcc` or `skills/compilers/clang` for PGO from perf data (AutoFDO)