
# Memory Benchmarking & Analysis

The `perf/memory` crate benchmarks the memory usage of SQL workloads under the WAL and MVCC journal modes. It uses `dhat` as the global allocator to track every heap allocation, and `memory-stats` for process-level RSS snapshots.

## Location

- Benchmark crate: `perf/memory/`
- Analysis script: `perf/memory/analyze-dhat.py`
- dhat output: `dhat-heap.json` (written to CWD after each run)

## Running Benchmarks

Always run in release mode — debug builds have wildly different allocation patterns and the results are not representative of real-world usage.

### Basic: single connection, WAL mode, insert-heavy workload

```bash
cargo run --release -p memory-benchmark -- --mode wal --workload insert-heavy -i 100 -b 100
```

### MVCC with concurrent connections

```bash
cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 100 -b 100 --connections 4
```

### All CLI options

```bash
cargo run --release -p memory-benchmark --
  --mode wal|mvcc
  --workload insert-heavy|read-heavy|mixed|scan-heavy
  -i <iterations>
  -b <batch-size>
  --connections <N>
  --timeout <ms>
  --cache-size <pages>
  --format human|json|csv
```

Every run produces a `dhat-heap.json` in the current directory. This file contains per-allocation-site data for the entire run.

## Built-in Workload Profiles

| Profile | Description | Setup |
|---|---|---|
| `insert-heavy` | 100% INSERT statements | Creates table |
| `read-heavy` | 90% SELECT by id / 10% INSERT | Seeds 10k rows |
| `mixed` | 50% SELECT / 50% INSERT | Seeds 10k rows |
| `scan-heavy` | Full table scans with LIKE | Seeds 10k rows |

Profiles implement the `Profile` trait in `perf/memory/src/profile/`. To add a new workload, create a new file implementing the trait and wire it into the `WorkloadProfile` enum in `main.rs`.

## Understanding the Output

The benchmark reports three categories of metrics:

### RSS (process-level)

Measured via the `memory-stats` crate. Includes everything: heap, mmap'd files (WAL, DB pages pulled into the OS page cache), the tokio runtime, etc. Snapshots are taken at phase transitions (setup -> run) and after each batch.

- **Baseline:** RSS before any DB work (runtime overhead)
- **Peak:** highest RSS observed during the run
- **Net growth:** final RSS minus baseline — the memory attributable to the workload
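The three RSS figures are simple arithmetic over the snapshot series. A minimal sketch of that arithmetic, using made-up snapshot values rather than real benchmark output:

```python
def rss_summary(snapshots: list[int]) -> dict[str, int]:
    """Summarize a series of RSS snapshots (in bytes).

    The first snapshot is taken before any DB work, so it serves
    as the baseline (runtime overhead).
    """
    baseline = snapshots[0]
    peak = max(snapshots)
    net_growth = snapshots[-1] - baseline  # memory attributable to the workload
    return {"baseline": baseline, "peak": peak, "net_growth": net_growth}

# Illustrative run: baseline 40 MiB, spike to 120 MiB, settle at 90 MiB
stats = rss_summary([40 << 20, 80 << 20, 120 << 20, 90 << 20])
```

Note that net growth (50 MiB here) can be far below the peak: transient spikes show up in **Peak** but not in **Net growth**.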

### Heap (dhat)

Precise allocation tracking via the `dhat` global allocator. Only counts explicit heap allocations (malloc/alloc), not mmap.

- **Current:** bytes still allocated at measurement time
- **Peak:** highest simultaneous live allocation during the entire run
- **Total allocs:** number of individual allocation calls
- **Total bytes:** cumulative bytes allocated (includes freed memory) — measures allocation pressure
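Peak and total bytes diverge sharply under allocation churn: a loop that allocates and immediately frees a 1 KiB buffer a thousand times peaks at 1 KiB but accumulates about 1 MiB of total bytes. A toy counter (purely illustrative, not dhat's implementation) makes the distinction concrete:

```python
class AllocTracker:
    """Toy allocator statistics in the spirit of dhat's counters."""

    def __init__(self):
        self.live = 0          # current bytes still allocated
        self.peak = 0          # max simultaneous live bytes
        self.total_bytes = 0   # cumulative bytes, including freed memory
        self.total_allocs = 0  # number of allocation calls

    def alloc(self, n: int):
        self.live += n
        self.total_bytes += n
        self.total_allocs += 1
        self.peak = max(self.peak, self.live)

    def free(self, n: int):
        self.live -= n

t = AllocTracker()
for _ in range(1000):  # churn: 1000 alloc/free pairs of 1 KiB each
    t.alloc(1024)
    t.free(1024)
# peak stays at 1 KiB while total_bytes reaches 1000 KiB
```

High total bytes with a low peak is the signature of allocation pressure: nothing leaks, but the allocator is kept busy.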

### Disk

File sizes after the benchmark completes:

- **DB file:** the `.db` file
- **WAL file:** the `.db-wal` file (WAL mode only)
- **Log file:** the `.db-log` file (MVCC logical log only)

## Analyzing dhat Output

After running a benchmark, use the analysis script to produce a readable report from `dhat-heap.json`:

### Overview: top allocation sites by bytes live at global peak

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --top 15 --modules
```

### Focus on a specific subsystem

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter mvcc --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter btree --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter page_cache --stacks
```

### Sort by different metrics

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb  # bytes at exit (leaks)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by tb  # total bytes (pressure)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by mb  # max live bytes per site
```

### JSON output for programmatic use

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --json
```

### Sort Metrics

| Flag | Metric | Use when |
|---|---|---|
| `gb` | Bytes live at global peak (default) | Finding what dominates memory at the high-water mark |
| `eb` | Bytes live at exit | Finding memory leaks or things that never get freed |
| `tb` | Total bytes allocated | Finding allocation pressure hotspots (GC churn) |
| `mb` | Max bytes live per site | Finding per-site high-water marks |
| `tbk` | Total allocation count | Finding chatty allocators (many small allocs) |
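These flag names mirror the per-program-point counter names in dhat's JSON output format. As a sketch of what the sorting step amounts to, assuming dhat's standard layout (a top-level `pps` list of allocation sites, each carrying integer counters named `tb`, `tbk`, `mb`, `gb`, `eb`) — treat this as an illustration, not a specification of how `analyze-dhat.py` is implemented:

```python
import json

def top_sites(dhat_data: dict, sort_by: str = "gb", top: int = 15) -> list:
    """Return the top allocation sites sorted by a dhat metric.

    Assumes dhat's standard JSON layout: dhat_data["pps"] is a list of
    program points, each a dict of integer counters (tb, tbk, mb, gb, eb).
    """
    sites = sorted(dhat_data["pps"],
                   key=lambda pp: pp.get(sort_by, 0),
                   reverse=True)
    return sites[:top]

# Hypothetical usage against a real dump:
# with open("dhat-heap.json") as f:
#     for pp in top_sites(json.load(f), sort_by="eb"):
#         print(pp)
```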

### Analysis Flags

- `--top N` — show the top N sites (default 15)
- `--filter PATTERN` — filter to sites/stacks containing a substring (e.g. `mvcc`, `btree`, `wal`, `pager`)
- `--stacks` — show full callstacks for top allocation sites
- `--modules` — aggregate by crate/module for a high-level breakdown
- `--json` — machine-readable aggregated output

## Typical Workflow

When investigating memory usage or a suspected regression:

1. **Run the benchmark** with parameters matching the scenario:
   ```bash
   cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 500 -b 100 --connections 4
   ```
2. **Get the high-level picture** — which modules use the most memory:
   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --modules --top 20
   ```
3. **Drill into the hot module** — e.g. if `turso_core` dominates:
   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --filter turso_core --stacks --top 10
   ```
4. **Check for leaks** — anything still alive at exit that shouldn't be:
   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb --top 10
   ```
5. **Compare modes** — run the same workload under WAL and MVCC and compare the reports to see the memory cost of MVCC versioning.

## Concurrency Details

When `--connections > 1`:

- The setup phase (schema creation, seeding) always runs sequentially on a single connection
- The run phase spawns one tokio task per connection, each executing its batch concurrently
- Each connection gets `busy_timeout` set (default 30s, configurable via `--timeout`)
- WAL mode uses `BEGIN`; MVCC uses `BEGIN CONCURRENT`
- The `Profile` trait's `next_batch(connections)` returns one batch per connection with non-overlapping row IDs
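One simple way to hand each connection a disjoint set of row IDs is round-robin interleaving. This is an illustrative sketch of the invariant `next_batch` must uphold, not necessarily the scheme the crate actually uses:

```python
def partition_row_ids(next_id: int, batch_size: int, connections: int) -> list[list[int]]:
    """Give connection c the ids next_id + c, next_id + c + connections, ...

    so that concurrent batches never touch the same row.
    """
    return [
        [next_id + c + k * connections for k in range(batch_size)]
        for c in range(connections)
    ]

batches = partition_row_ids(0, 3, 4)
# batches[0] == [0, 4, 8], batches[1] == [1, 5, 9], ...
```

Whatever scheme is used, the key property is that the union of all batches contains no duplicate IDs, so concurrent transactions cannot conflict on the same row.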

## Adding a New Profile

1. Create `perf/memory/src/profile/your_profile.rs` implementing the `Profile` trait
2. Add `pub mod your_profile;` to `perf/memory/src/profile/mod.rs`
3. Add a variant to the `WorkloadProfile` enum in `main.rs`
4. Wire it into `create_profile()` in `main.rs`

The `Profile` trait:

```rust
pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}
```

Return `Phase::Setup` for schema/seeding (single batch), `Phase::Run` for measured work (one batch per connection), and `Phase::Done` when finished.
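As a sketch, a minimal insert-only profile might look like the following. `Phase` and `WorkItem` are shown with assumed shapes (the crate's real definitions live under `perf/memory/src/`), and `InsertOnly` is a hypothetical name — treat everything except the trait signature as illustrative:

```rust
// Assumed shapes for illustration; the crate's real definitions may differ.
#[derive(Debug, PartialEq)]
pub enum Phase { Setup, Run, Done }

pub type WorkItem = String; // e.g. a SQL statement

pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}

/// Hypothetical profile: one CREATE TABLE batch, then `iterations` insert batches.
pub struct InsertOnly {
    pub iterations: usize,
    pub emitted: usize,
    pub set_up: bool,
}

impl Profile for InsertOnly {
    fn name(&self) -> &str { "insert-only" }

    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>) {
        if !self.set_up {
            self.set_up = true;
            // Setup always runs on a single connection: exactly one batch.
            return (Phase::Setup,
                    vec![vec!["CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)".into()]]);
        }
        if self.emitted == self.iterations {
            return (Phase::Done, vec![]);
        }
        self.emitted += 1;
        // Run phase: one batch per connection, with non-overlapping row ids.
        let batches = (0..connections)
            .map(|c| vec![format!("INSERT INTO t VALUES ({}, 'x')",
                                  self.emitted * connections + c)])
            .collect();
        (Phase::Run, batches)
    }
}
```

Once wired into `WorkloadProfile` and `create_profile()`, the new profile becomes selectable via a new `--workload` value.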

## Keeping This Skill Up to Date

This skill document is the source of truth for how agents use the memory benchmark tooling. If you modify the `perf/memory` crate — adding profiles, changing CLI flags, altering the output format, updating the analysis script, changing the `Profile` trait, etc. — update this SKILL.md to match. Specifically:

- New CLI flags: add them to the "Running Benchmarks" section
- New profiles: add them to the "Built-in Workload Profiles" table
- Changed output metrics: update the "Understanding the Output" section
- New analyze-dhat.py flags or sort metrics: update the "Analyzing dhat Output" section
- Changed `Profile` trait signature: update "Adding a New Profile"

Future agents rely on this document being accurate. Stale instructions cause wasted work.