# Memory Benchmarking & Analysis

The `perf/memory` crate benchmarks memory usage of SQL workloads under WAL and MVCC journal modes. It uses `dhat` as the global allocator to track every heap allocation, and the `memory-stats` crate for process-level RSS snapshots.

## Location
- Benchmark crate: `perf/memory/`
- Analysis script: `perf/memory/analyze-dhat.py`
- dhat output: `dhat-heap.json` (written to CWD after each run)
## Running Benchmarks
Always run in release mode — debug builds have wildly different allocation patterns and the results are not representative of real-world usage.
```bash
# Basic: single connection, WAL mode, insert-heavy workload
cargo run --release -p memory-benchmark -- --mode wal --workload insert-heavy -i 100 -b 100

# MVCC with concurrent connections
cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 100 -b 100 --connections 4
```
### All CLI options

```bash
cargo run --release -p memory-benchmark --
  --mode wal|mvcc
  --workload insert-heavy|read-heavy|mixed|scan-heavy
  -i <iterations>
  -b <batch-size>
  --connections <N>
  --timeout <ms>
  --cache-size <pages>
  --format human|json|csv
```
Every run produces a `dhat-heap.json` in the current directory. This file contains per-allocation-site data for the entire run.

## Built-in Workload Profiles
| Profile | Description | Setup |
|---|---|---|
| `insert-heavy` | 100% INSERT statements | Creates table |
| `read-heavy` | 90% SELECT by id / 10% INSERT | Seeds 10k rows |
| `mixed` | 50% SELECT / 50% INSERT | Seeds 10k rows |
| `scan-heavy` | Full table scans with LIKE | Seeds 10k rows |

Profiles implement the `Profile` trait in `perf/memory/src/profile/`. To add a new workload, create a new file implementing the trait and wire it into the `WorkloadProfile` enum in `main.rs`.

## Understanding the Output
The benchmark reports three categories of metrics:
### RSS (process-level)

Measured via the `memory-stats` crate. Includes everything: heap, mmap'd files (the WAL, DB pages pulled into the OS page cache), the tokio runtime, etc. Snapshots are taken at phase transitions (setup -> run) and after each batch.

- Baseline: RSS before any DB work (runtime overhead)
- Peak: highest RSS observed during the run
- Net growth: final RSS minus baseline — the memory attributable to the workload
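The three RSS numbers reduce to simple arithmetic over the snapshot series. A minimal sketch (the snapshot values below are invented for illustration):

```python
def rss_metrics(snapshots: list[int]) -> dict[str, int]:
    """Reduce a series of RSS snapshots (bytes) to the reported metrics.

    snapshots[0] is the baseline taken before any DB work;
    the remaining entries are taken after each batch.
    """
    baseline = snapshots[0]
    peak = max(snapshots)
    net_growth = snapshots[-1] - baseline
    return {"baseline": baseline, "peak": peak, "net_growth": net_growth}

# Illustrative series: baseline, then three per-batch snapshots.
m = rss_metrics([50_000_000, 62_000_000, 71_000_000, 68_000_000])
print(m)  # {'baseline': 50000000, 'peak': 71000000, 'net_growth': 18000000}
```

Note that peak and net growth can disagree: RSS may spike mid-run and then settle, so a high peak with low net growth points at transient pressure rather than retained memory.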
### Heap (dhat)

Precise allocation tracking via the `dhat` global allocator. Only counts explicit heap allocations (malloc/alloc), not mmap.

- Current: bytes still allocated at measurement time
- Peak: highest simultaneous live allocation during the entire run
- Total allocs: number of individual allocation calls
- Total bytes: cumulative bytes allocated (includes freed memory) — measures allocation pressure
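The difference between the four heap counters is easiest to see by replaying a toy alloc/free trace. This sketch only illustrates the metric definitions; it is not how `dhat` is implemented:

```python
def heap_metrics(events: list[tuple[str, int]]) -> dict[str, int]:
    """Replay ('alloc'|'free', size) events and track the same four
    numbers the benchmark reports: current, peak, total allocs, total bytes."""
    current = peak = total_allocs = total_bytes = 0
    for op, size in events:
        if op == "alloc":
            current += size
            total_allocs += 1
            total_bytes += size          # cumulative, never decreases
            peak = max(peak, current)    # high-water mark of live bytes
        else:
            current -= size              # frees reduce only the live count
    return {"current": current, "peak": peak,
            "total_allocs": total_allocs, "total_bytes": total_bytes}

m = heap_metrics([("alloc", 100), ("alloc", 50), ("free", 100), ("alloc", 30)])
print(m)  # {'current': 80, 'peak': 150, 'total_allocs': 3, 'total_bytes': 180}
```

A workload can have a small peak but a huge total-bytes figure: that is allocation churn, not retention.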
### Disk

File sizes after the benchmark completes:

- DB file: the `.db` file
- WAL file: the `.db-wal` file (WAL mode only)
- Log file: the `.db-log` file (MVCC logical log only)
## Analyzing dhat Output

After running a benchmark, use the analysis script to produce a readable report from `dhat-heap.json`:

```bash
# Overview: top allocation sites by bytes live at global peak
python3 perf/memory/analyze-dhat.py dhat-heap.json --top 15 --modules
```
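If you need a number the script doesn't surface, `dhat-heap.json` can be read directly. A minimal sketch, assuming the standard DHAT file layout (`pps` site records carrying a `gb` bytes-at-global-peak counter and `fs` frame indices into the `ftbl` frame table); this helper is not part of the shipped tooling:

```python
import json

def top_sites_at_peak(path: str, n: int = 5) -> list[tuple[int, str]]:
    """Return the n allocation sites holding the most bytes at the
    global heap peak, as (bytes, leaf frame) pairs."""
    with open(path) as f:
        data = json.load(f)
    ftbl = data["ftbl"]  # frame strings, indexed by each pp's "fs" list
    sites = sorted(data["pps"], key=lambda pp: pp.get("gb", 0), reverse=True)
    return [(pp.get("gb", 0), ftbl[pp["fs"][-1]]) for pp in sites[:n]]
```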
```bash
# Focus on a specific subsystem
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter mvcc --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter btree --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter page_cache --stacks
```
```bash
# Sort by different metrics
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb  # bytes at exit (leaks)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by tb  # total bytes (pressure)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by mb  # max live bytes per site
```
```bash
# JSON output for programmatic use
python3 perf/memory/analyze-dhat.py dhat-heap.json --json
```

### Sort Metrics
| Flag | Metric | Use when |
|---|---|---|
| (default) | Bytes live at global peak | Finding what dominates memory at the high-water mark |
| `eb` | Bytes live at exit | Finding memory leaks or things that never get freed |
| `tb` | Total bytes allocated | Finding allocation pressure hotspots (GC churn) |
| `mb` | Max bytes live per site | Finding per-site high-water marks |
| | Total allocation count | Finding chatty allocators (many small allocs) |
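Conceptually, each sort flag just re-orders the same site records by a different counter. A sketch assuming DHAT-style per-site keys (the records below are invented):

```python
def sort_sites(pps: list[dict], metric: str = "gb") -> list[dict]:
    """Sort allocation-site records by one counter, descending:
    'gb' bytes at global peak, 'eb' bytes at exit,
    'tb' total bytes, 'mb' max live bytes per site."""
    return sorted(pps, key=lambda pp: pp.get(metric, 0), reverse=True)

pps = [{"name": "btree", "eb": 0, "tb": 900, "mb": 40},
       {"name": "pager", "eb": 64, "tb": 300, "mb": 200}]
print([pp["name"] for pp in sort_sites(pps, "tb")])  # ['btree', 'pager']
print([pp["name"] for pp in sort_sites(pps, "eb")])  # ['pager', 'btree']
```

The two orderings disagree on purpose: `btree` churns the most bytes, but only `pager` still holds memory at exit.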
### Analysis Flags

- `--top N` — Show top N sites (default 15)
- `--filter PATTERN` — Filter to sites/stacks containing a substring (e.g. `mvcc`, `btree`, `wal`, `pager`)
- `--stacks` — Show full callstacks for top allocation sites
- `--modules` — Aggregate by crate/module for a high-level breakdown
- `--json` — Machine-readable aggregated output
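`--filter` keeps the sites whose stacks contain the given substring. A sketch of one way that matching can work, assuming DHAT-style `fs`/`ftbl` fields (the data is invented):

```python
def filter_sites(pps: list[dict], ftbl: list[str], pattern: str) -> list[dict]:
    """Keep sites whose callstack contains `pattern` in any frame,
    not only the leaf."""
    return [pp for pp in pps
            if any(pattern in ftbl[i] for i in pp["fs"])]

ftbl = ["[root]", "turso_core::mvcc::commit", "alloc::vec::Vec::push"]
pps = [{"fs": [1, 2], "gb": 100}, {"fs": [2], "gb": 50}]
print(len(filter_sites(pps, ftbl, "mvcc")))  # 1
```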
## Typical Workflow

When investigating memory usage or a suspected regression:

1. Run the benchmark with parameters matching the scenario:

   ```bash
   cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 500 -b 100 --connections 4
   ```

2. Get the high-level picture — which modules use the most memory:

   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --modules --top 20
   ```

3. Drill into the hot module — e.g. if `turso_core` dominates:

   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --filter turso_core --stacks --top 10
   ```

4. Check for leaks — anything still alive at exit that shouldn't be:

   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb --top 10
   ```

5. Compare modes — run the same workload under WAL and MVCC and compare the reports to see the memory cost of MVCC versioning.
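The mode comparison can be scripted by renaming each run's `dhat-heap.json` and diffing totals. A sketch, again assuming a DHAT-style `gb` counter per site (the file names are illustrative, not produced by the tooling):

```python
import json

def peak_bytes(path: str) -> int:
    """Sum bytes-at-global-peak over all allocation sites in one dhat file."""
    with open(path) as f:
        return sum(pp.get("gb", 0) for pp in json.load(f)["pps"])

# After each run, rename the output before the next run overwrites it, e.g.:
#   mv dhat-heap.json dhat-wal.json    (after the WAL run)
#   mv dhat-heap.json dhat-mvcc.json   (after the MVCC run)
# then:
#   delta = peak_bytes("dhat-mvcc.json") - peak_bytes("dhat-wal.json")
```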
## Concurrency Details

When `--connections > 1`:

- The setup phase (schema creation, seeding) always runs sequentially on a single connection
- The run phase spawns one tokio task per connection, each executing its batch concurrently
- Each connection gets `busy_timeout` set (default 30s, configurable via `--timeout`)
- WAL mode uses `BEGIN`; MVCC uses `BEGIN CONCURRENT`
- The `Profile` trait's `next_batch(connections)` returns one batch per connection with non-overlapping row IDs
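One way to produce non-overlapping row IDs is contiguous striding per connection; a hypothetical sketch of the idea, not the crate's actual code:

```python
def partition_ids(next_id: int, batch_size: int, connections: int) -> list[list[int]]:
    """Give each connection a contiguous, non-overlapping ID range,
    so concurrent inserts never contend on the same rows."""
    return [list(range(next_id + c * batch_size,
                       next_id + (c + 1) * batch_size))
            for c in range(connections)]

print(partition_ids(0, 3, 2))  # [[0, 1, 2], [3, 4, 5]]
```

After each call, the profile would advance `next_id` by `batch_size * connections` so the next batch starts past every range just handed out.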
## Adding a New Profile

1. Create `perf/memory/src/profile/your_profile.rs` implementing the `Profile` trait
2. Add `pub mod your_profile;` to `perf/memory/src/profile/mod.rs`
3. Add a variant to the `WorkloadProfile` enum in `main.rs`
4. Wire it into `create_profile()` in `main.rs`

The `Profile` trait:

```rust
pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}
```

Return `Phase::Setup` for schema/seeding (single batch), `Phase::Run` for measured work (one batch per connection), and `Phase::Done` when finished.

## Keeping This Skill Up to Date
This skill document is the source of truth for how agents use the memory benchmark tooling. If you modify the `perf/memory` crate — adding profiles, changing CLI flags, altering the output format, updating the analysis script, changing the `Profile` trait, etc. — update this SKILL.md to match. Specifically:

- New CLI flags: add them to the "Running Benchmarks" section
- New profiles: add them to the "Built-in Workload Profiles" table
- Changed output metrics: update the "Understanding the Output" section
- New analyze-dhat.py flags or sort metrics: update the "Analyzing dhat Output" section
- Changed `Profile` trait signature: update "Adding a New Profile"

Future agents rely on this document being accurate. Stale instructions cause wasted work.