
# Memory Benchmarking & Analysis

The `perf/memory` crate benchmarks the memory usage of SQL workloads under the WAL and MVCC journal modes. It uses `dhat` as the global allocator to track every heap allocation, and `memory-stats` for process-level RSS snapshots.

## Location

- Benchmark crate: `perf/memory/`
- Analysis script: `perf/memory/analyze-dhat.py`
- dhat output: `dhat-heap.json` (written to CWD after each run)

## Running Benchmarks

Always run in release mode — debug builds have wildly different allocation patterns and the results are not representative of real-world usage.

### Basic: single connection, WAL mode, insert-heavy workload

```bash
cargo run --release -p memory-benchmark -- --mode wal --workload insert-heavy -i 100 -b 100
```

### MVCC with concurrent connections

```bash
cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 100 -b 100 --connections 4
```

### All CLI options

```bash
cargo run --release -p memory-benchmark --
  --mode wal|mvcc
  --workload insert-heavy|read-heavy|mixed|scan-heavy
  -i <iterations>
  -b <batch-size>
  --connections <N>
  --timeout <ms>
  --cache-size <pages>
  --format human|json|csv
```

Every run produces a `dhat-heap.json` in the current directory. This file contains per-allocation-site data for the entire run.

## Built-in Workload Profiles

| Profile | Description | Setup |
|---|---|---|
| `insert-heavy` | 100% INSERT statements | Creates table |
| `read-heavy` | 90% SELECT by id / 10% INSERT | Seeds 10k rows |
| `mixed` | 50% SELECT / 50% INSERT | Seeds 10k rows |
| `scan-heavy` | Full table scans with LIKE | Seeds 10k rows |

Profiles implement the `Profile` trait in `perf/memory/src/profile/`. To add a new workload, create a new file implementing the trait and wire it into the `WorkloadProfile` enum in `main.rs`.

## Understanding the Output

The benchmark reports three categories of metrics:

### RSS (process-level)

Measured via the `memory-stats` crate. Includes everything: heap, mmap'd files (WAL, DB pages pulled into the OS page cache), the tokio runtime, etc. Snapshots are taken at phase transitions (setup -> run) and after each batch.

- **Baseline:** RSS before any DB work (runtime overhead)
- **Peak:** highest RSS observed during the run
- **Net growth:** final RSS minus baseline — the memory attributable to the workload
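The three RSS figures are simple arithmetic over the snapshot series. A minimal sketch of that arithmetic, using made-up snapshot values rather than real benchmark output:

```python
def rss_summary(snapshots: list[int]) -> dict[str, int]:
    """Summarize a series of RSS snapshots (in bytes).

    The first snapshot is taken before any DB work, so it serves
    as the baseline (runtime overhead).
    """
    baseline = snapshots[0]
    peak = max(snapshots)
    net_growth = snapshots[-1] - baseline  # memory attributable to the workload
    return {"baseline": baseline, "peak": peak, "net_growth": net_growth}

# Illustrative run: baseline 40 MiB, spike to 120 MiB, settle at 90 MiB
stats = rss_summary([40 << 20, 80 << 20, 120 << 20, 90 << 20])
```

Note that net growth (50 MiB here) can be far below the peak: transient spikes show up in **Peak** but not in **Net growth**.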

### Heap (dhat)

Precise allocation tracking via the `dhat` global allocator. Only counts explicit heap allocations (malloc/alloc), not mmap.

- **Current:** bytes still allocated at measurement time
- **Peak:** highest simultaneous live allocation during the entire run
- **Total allocs:** number of individual allocation calls
- **Total bytes:** cumulative bytes allocated (includes freed memory) — measures allocation pressure
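Peak and total bytes diverge sharply under allocation churn: a loop that allocates and immediately frees a 1 KiB buffer a thousand times peaks at 1 KiB but accumulates about 1 MiB of total bytes. A toy counter (purely illustrative, not dhat's implementation) makes the distinction concrete:

```python
class AllocTracker:
    """Toy allocator statistics in the spirit of dhat's counters."""

    def __init__(self):
        self.live = 0          # current bytes still allocated
        self.peak = 0          # max simultaneous live bytes
        self.total_bytes = 0   # cumulative bytes, including freed memory
        self.total_allocs = 0  # number of allocation calls

    def alloc(self, n: int):
        self.live += n
        self.total_bytes += n
        self.total_allocs += 1
        self.peak = max(self.peak, self.live)

    def free(self, n: int):
        self.live -= n

t = AllocTracker()
for _ in range(1000):  # churn: 1000 alloc/free pairs of 1 KiB each
    t.alloc(1024)
    t.free(1024)
# peak stays at 1 KiB while total_bytes reaches 1000 KiB
```

High total bytes with a low peak is the signature of allocation pressure: nothing leaks, but the allocator is kept busy.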

### Disk

File sizes after the benchmark completes:

- **DB file:** the `.db` file
- **WAL file:** the `.db-wal` file (WAL mode only)
- **Log file:** the `.db-log` file (MVCC logical log only)

## Analyzing dhat Output

After running a benchmark, use the analysis script to produce a readable report from `dhat-heap.json`:

### Overview: top allocation sites by bytes live at global peak

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --top 15 --modules
```

### Focus on a specific subsystem

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter mvcc --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter btree --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter page_cache --stacks
```

### Sort by different metrics

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb  # bytes at exit (leaks)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by tb  # total bytes (pressure)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by mb  # max live bytes per site
```

### JSON output for programmatic use

```bash
python3 perf/memory/analyze-dhat.py dhat-heap.json --json
```

### Sort Metrics

| Flag | Metric | Use when |
|---|---|---|
| `gb` | Bytes live at global peak (default) | Finding what dominates memory at the high-water mark |
| `eb` | Bytes live at exit | Finding memory leaks or things that never get freed |
| `tb` | Total bytes allocated | Finding allocation pressure hotspots (GC churn) |
| `mb` | Max bytes live per site | Finding per-site high-water marks |
| `tbk` | Total allocation count | Finding chatty allocators (many small allocs) |
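These flag names mirror the per-program-point counter names in dhat's JSON output format. As a sketch of what the sorting step amounts to, assuming dhat's standard layout (a top-level `pps` list of allocation sites, each carrying integer counters named `tb`, `tbk`, `mb`, `gb`, `eb`) — treat this as an illustration, not a specification of how `analyze-dhat.py` is implemented:

```python
import json

def top_sites(dhat_data: dict, sort_by: str = "gb", top: int = 15) -> list:
    """Return the top allocation sites sorted by a dhat metric.

    Assumes dhat's standard JSON layout: dhat_data["pps"] is a list of
    program points, each a dict of integer counters (tb, tbk, mb, gb, eb).
    """
    sites = sorted(dhat_data["pps"],
                   key=lambda pp: pp.get(sort_by, 0),
                   reverse=True)
    return sites[:top]

# Hypothetical usage against a real dump:
# with open("dhat-heap.json") as f:
#     for pp in top_sites(json.load(f), sort_by="eb"):
#         print(pp)
```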

### Analysis Flags

- `--top N` — show the top N sites (default 15)
- `--filter PATTERN` — filter to sites/stacks containing a substring (e.g. `mvcc`, `btree`, `wal`, `pager`)
- `--stacks` — show full callstacks for top allocation sites
- `--modules` — aggregate by crate/module for a high-level breakdown
- `--json` — machine-readable aggregated output

## Typical Workflow

When investigating memory usage or a suspected regression:

1. **Run the benchmark** with parameters matching the scenario:
   ```bash
   cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 500 -b 100 --connections 4
   ```
2. **Get the high-level picture** — which modules use the most memory:
   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --modules --top 20
   ```
3. **Drill into the hot module** — e.g. if `turso_core` dominates:
   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --filter turso_core --stacks --top 10
   ```
4. **Check for leaks** — anything still alive at exit that shouldn't be:
   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb --top 10
   ```
5. **Compare modes** — run the same workload under WAL and MVCC and compare the reports to see the memory cost of MVCC versioning.

## Concurrency Details

When `--connections > 1`:

- The setup phase (schema creation, seeding) always runs sequentially on a single connection
- The run phase spawns one tokio task per connection, each executing its batch concurrently
- Each connection gets `busy_timeout` set (default 30s, configurable via `--timeout`)
- WAL mode uses `BEGIN`; MVCC uses `BEGIN CONCURRENT`
- The `Profile` trait's `next_batch(connections)` returns one batch per connection with non-overlapping row IDs
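One simple way to hand each connection a disjoint set of row IDs is round-robin interleaving. This is an illustrative sketch of the invariant `next_batch` must uphold, not necessarily the scheme the crate actually uses:

```python
def partition_row_ids(next_id: int, batch_size: int, connections: int) -> list[list[int]]:
    """Give connection c the ids next_id + c, next_id + c + connections, ...

    so that concurrent batches never touch the same row.
    """
    return [
        [next_id + c + k * connections for k in range(batch_size)]
        for c in range(connections)
    ]

batches = partition_row_ids(0, 3, 4)
# batches[0] == [0, 4, 8], batches[1] == [1, 5, 9], ...
```

Whatever scheme is used, the key property is that the union of all batches contains no duplicate IDs, so concurrent transactions cannot conflict on the same row.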

## Adding a New Profile

1. Create `perf/memory/src/profile/your_profile.rs` implementing the `Profile` trait
2. Add `pub mod your_profile;` to `perf/memory/src/profile/mod.rs`
3. Add a variant to the `WorkloadProfile` enum in `main.rs`
4. Wire it into `create_profile()` in `main.rs`

The `Profile` trait:

```rust
pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}
```

Return `Phase::Setup` for schema/seeding (single batch), `Phase::Run` for measured work (one batch per connection), and `Phase::Done` when finished.
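As a sketch, a minimal insert-only profile might look like the following. `Phase` and `WorkItem` are shown with assumed shapes (the crate's real definitions live under `perf/memory/src/`), and `InsertOnly` is a hypothetical name — treat everything except the trait signature as illustrative:

```rust
// Assumed shapes for illustration; the crate's real definitions may differ.
#[derive(Debug, PartialEq)]
pub enum Phase { Setup, Run, Done }

pub type WorkItem = String; // e.g. a SQL statement

pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}

/// Hypothetical profile: one CREATE TABLE batch, then `iterations` insert batches.
pub struct InsertOnly {
    pub iterations: usize,
    pub emitted: usize,
    pub set_up: bool,
}

impl Profile for InsertOnly {
    fn name(&self) -> &str { "insert-only" }

    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>) {
        if !self.set_up {
            self.set_up = true;
            // Setup always runs on a single connection: exactly one batch.
            return (Phase::Setup,
                    vec![vec!["CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)".into()]]);
        }
        if self.emitted == self.iterations {
            return (Phase::Done, vec![]);
        }
        self.emitted += 1;
        // Run phase: one batch per connection, with non-overlapping row ids.
        let batches = (0..connections)
            .map(|c| vec![format!("INSERT INTO t VALUES ({}, 'x')",
                                  self.emitted * connections + c)])
            .collect();
        (Phase::Run, batches)
    }
}
```

Once wired into `WorkloadProfile` and `create_profile()`, the new profile becomes selectable via a new `--workload` value.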

## Keeping This Skill Up to Date

This skill document is the source of truth for how agents use the memory benchmark tooling. If you modify the `perf/memory` crate — adding profiles, changing CLI flags, altering the output format, updating the analysis script, changing the `Profile` trait, etc. — update this SKILL.md to match. Specifically:

- New CLI flags: add them to the "Running Benchmarks" section
- New profiles: add them to the "Built-in Workload Profiles" table
- Changed output metrics: update the "Understanding the Output" section
- New analyze-dhat.py flags or sort metrics: update the "Analyzing dhat Output" section
- Changed `Profile` trait signature: update "Adding a New Profile"

Future agents rely on this document being accurate. Stale instructions cause wasted work.