
Transaction Correctness Guide

Turso uses WAL (Write-Ahead Logging) mode exclusively.
Files: .db, .db-wal (no .db-shm; Turso uses an in-memory WAL index)

WAL Mechanics

Write Path

  1. Writer appends frames (page data) to WAL file (sequential I/O)
  2. COMMIT = frame with non-zero db_size in header (marks transaction end)
  3. Original DB unchanged until checkpoint
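
The write path above can be modelled in a few lines. This is a toy in-memory sketch, not Turso's code: the Frame and ToyWal classes and their method names are hypothetical, and only the db_size convention comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page_no: int
    data: bytes
    db_size: int = 0  # non-zero only on the frame that commits a transaction


class ToyWal:
    """Hypothetical append-only frame log standing in for the .db-wal file."""

    def __init__(self):
        self.frames = []

    def write_transaction(self, pages, db_size_after):
        """Append one frame per dirty page; the last frame carries db_size."""
        items = list(pages.items())
        for i, (page_no, data) in enumerate(items):
            is_last = i == len(items) - 1
            self.frames.append(Frame(page_no, data, db_size_after if is_last else 0))

    def last_commit_frame(self):
        """1-based number of the newest commit frame, or 0 if there is none."""
        for frame_no in range(len(self.frames), 0, -1):
            if self.frames[frame_no - 1].db_size != 0:
                return frame_no
        return 0


wal = ToyWal()
wal.write_transaction({1: b"a", 2: b"b"}, db_size_after=2)
wal.frames.append(Frame(3, b"c"))  # frame of a transaction that has not committed
```

Frames after the last commit frame belong to an unfinished transaction, which is why last_commit_frame() ignores the trailing frame for page 3.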

Read Path

  1. Reader acquires read mark (mxFrame = last valid commit frame)
  2. For each page: check WAL up to mxFrame, fall back to main DB
  3. Reader sees consistent snapshot at its read mark
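
As a rough illustration of the read path, the sketch below resolves a page against a read mark. The read_page function and the tuple layout of wal_frames are assumptions made for the example; a real implementation would use an index instead of scanning.

```python
def read_page(page_no, wal_frames, mx_frame, main_db):
    """Return the newest version of page_no visible at read mark mx_frame.

    wal_frames is a list of (page_no, data) tuples; frame numbers are 1-based.
    """
    for frame_no in range(mx_frame, 0, -1):  # newest visible frame first
        frame_page, data = wal_frames[frame_no - 1]
        if frame_page == page_no:
            return data
    return main_db[page_no]  # not in the WAL snapshot: fall back to main DB


main_db = {1: b"old-1", 2: b"old-2"}
wal_frames = [(1, b"new-1"), (2, b"newer-2")]
```

A reader whose mark is 1 gets page 1 from the WAL and page 2 from the main DB, even though frame 2 already exists; a reader with mark 2 sees both new versions.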

Checkpointing

Transfers WAL content back to main DB.
WAL grows → checkpoint triggered (default: 1000 pages) → pages copied to DB → WAL reused
Checkpoint types:
  • PASSIVE: Non-blocking, stops at pages needed by active readers
  • FULL: Waits for readers, checkpoints everything
  • RESTART: Like FULL, also resets WAL to beginning
  • TRUNCATE: Like RESTART, also truncates WAL file to zero length
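
To make the PASSIVE behaviour concrete, here is a minimal sketch under simplifying assumptions: frames are (page_no, data) tuples, the main DB is a dict, and reader_marks lists the read marks of active readers. The function name and signature are hypothetical.

```python
def passive_checkpoint(wal_frames, main_db, nbackfills, max_frame, reader_marks):
    """Copy committed frames into main_db without waiting for readers.

    Returns the new backfill count. Frames beyond the lowest active read mark
    are left alone, because copying them would change pages under a reader
    that falls back to the main DB.
    """
    safe_frame = min([max_frame] + list(reader_marks))
    for frame_no in range(nbackfills + 1, safe_frame + 1):
        page_no, data = wal_frames[frame_no - 1]
        main_db[page_no] = data
    return max(nbackfills, safe_frame)


wal_frames = [(1, b"v1"), (2, b"v2"), (1, b"v3")]
main_db = {1: b"v0", 2: b"v0"}

# A reader pinned at frame 2 limits the first pass to frames 1..2.
backfilled = passive_checkpoint(wal_frames, main_db, 0, 3, reader_marks=[2])
db_after_first_pass = dict(main_db)

# Once the reader is gone, a second pass finishes the job.
backfilled_later = passive_checkpoint(wal_frames, main_db, backfilled, 3, reader_marks=[])
```

FULL, RESTART and TRUNCATE differ mainly in waiting for readers and in what happens to the WAL afterwards, which this sketch does not model.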

WAL-Index

SQLite uses a shared memory file (-shm) for the WAL index. Turso does not: it uses in-memory data structures (frame_cache hashmap, atomic read marks), since multi-process access is not supported.
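
One way to picture an in-memory page-to-frame index is a hashmap from page number to the ascending list of frames that contain that page. The layout and the helper names record_frame and find_frame below are assumptions for illustration; only the name frame_cache comes from the text.

```python
from bisect import bisect_right

frame_cache = {}  # page number -> ascending list of frame numbers holding that page


def record_frame(page_no, frame_no):
    """Note that frame_no contains a new version of page_no."""
    frame_cache.setdefault(page_no, []).append(frame_no)


def find_frame(page_no, max_frame):
    """Newest frame for page_no that is <= max_frame, or None if there is none."""
    frames = frame_cache.get(page_no, [])
    pos = bisect_right(frames, max_frame)
    return frames[pos - 1] if pos else None


record_frame(1, 1)
record_frame(2, 2)
record_frame(1, 3)
```

A reader passes its own max_frame to find_frame, so two readers with different snapshots can share the same index and still resolve different frames for the same page.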

Concurrency Rules

  • One writer at a time
  • Readers don't block writer, writer doesn't block readers
  • Checkpoint must stop at pages needed by active readers
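
The single-writer rule can be sketched with a non-blocking lock: a second writer is refused instead of queued. The helper names are hypothetical, and a standard-library lock stands in for whatever primitive the real write_lock uses.

```python
import threading

write_lock = threading.Lock()


def try_begin_write():
    """Return True if this caller became the writer, False if one is active."""
    return write_lock.acquire(blocking=False)


def end_write():
    """Release the writer slot so another connection can write."""
    write_lock.release()


first = try_begin_write()
second = try_begin_write()  # refused while the first writer is still active
end_write()
third = try_begin_write()   # succeeds once the first writer has finished
end_write()
```

Readers never touch this lock in the sketch, which mirrors the rule that readers and the writer do not block each other.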

Recovery

After a crash:
  1. First connection acquires exclusive lock
  2. Replays valid commits from WAL
  3. Releases lock, normal operation resumes
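
Step 2 can be read as a scan that keeps only frames covered by a commit. The sketch below is one plausible reading, not Turso's recovery code: frames are simplified to (page_no, db_size, checksum_ok) tuples, and the recover function is hypothetical.

```python
def recover(wal_frames):
    """Rebuild the page-to-frame index from frames, keeping only committed ones.

    wal_frames: list of (page_no, db_size, checksum_ok) tuples, frame 1 first.
    Returns (max_frame, index), where index maps page number to newest frame.
    """
    max_frame, index, pending = 0, {}, {}
    for frame_no, (page_no, db_size, checksum_ok) in enumerate(wal_frames, start=1):
        if not checksum_ok:
            break  # torn or stale frame: nothing after it can be trusted
        pending[page_no] = frame_no
        if db_size != 0:  # commit frame: the pending transaction becomes visible
            index.update(pending)
            pending.clear()
            max_frame = frame_no
    return max_frame, index


frames = [
    (1, 0, True),   # first transaction, page 1
    (2, 2, True),   # first transaction, commit frame
    (1, 0, True),   # second transaction, never committed
    (3, 0, False),  # torn write at the moment of the crash
]
recovered_max_frame, recovered_index = recover(frames)
```

Frame 3 is intact but has no commit frame after it, so it stays invisible; this is what keeps a partial transaction from surfacing after a crash.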

Turso Implementation

Key files:
  • WAL implementation
  • Page management, transactions

Connection-Private vs Shared

Per-Connection (private):
  • Pager - page cache, dirty pages, savepoints, commit state
  • WalFile - connection's snapshot view:
    • max_frame / min_frame - frame range for this connection's snapshot
    • max_frame_read_lock_index - which read lock slot this connection holds
    • last_checksum - rolling checksum state
Shared across connections:
  • WalFileShared - global WAL state:
    • frame_cache - page-to-frame index (replaces .shm file)
    • max_frame / nbackfills - global WAL progress
    • read_locks[5] - read mark slots (TursoRwLock with embedded frame values)
    • write_lock - exclusive writer lock
    • checkpoint_lock - checkpoint serialization
    • file - WAL file handle
  • DatabaseStorage - main .db file
  • BufferPool - shared memory allocation
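
The split between private and shared state can be pictured with two small classes. This is a Python model for illustration only: the field names follow the list above, while the types, the begin_read method, and the rule that min_frame starts just past nbackfills are assumptions.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class WalFileShared:
    """One instance shared by every connection."""
    frame_cache: dict = field(default_factory=dict)  # page -> frame numbers
    max_frame: int = 0                               # last committed frame
    nbackfills: int = 0                              # frames already checkpointed
    read_locks: list = field(default_factory=lambda: [0] * 5)  # read mark slots
    write_lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass
class WalFile:
    """One instance per connection; holds that connection's snapshot view."""
    shared: WalFileShared
    max_frame: int = 0
    min_frame: int = 0

    def begin_read(self):
        """Capture a private snapshot range from the shared state."""
        self.min_frame = self.shared.nbackfills + 1  # assumed rule, see lead-in
        self.max_frame = self.shared.max_frame


shared = WalFileShared()
conn_a, conn_b = WalFile(shared), WalFile(shared)

shared.max_frame = 4
conn_a.begin_read()
shared.max_frame = 7  # another transaction commits after conn_a took its snapshot
conn_b.begin_read()
```

Here conn_a keeps seeing frames up to 4 while conn_b sees up to 7, even though both point at the same WalFileShared.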

Correctness Invariants

  1. Durability: COMMIT record must be fsynced before returning success
  2. Atomicity: Partial transactions never visible to readers
  3. Isolation: Each reader sees consistent snapshot
  4. No lost updates: Checkpoint can't overwrite uncommitted changes
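
The durability invariant comes down to ordering: write the commit record, fsync, and only then report success. A minimal sketch using the standard library follows; the append_and_sync helper and the toy.db-wal file name are hypothetical.

```python
import os
import tempfile


def append_and_sync(path, payload):
    """Append payload and return only after the OS reports it durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # durability point: success may be reported after this
    finally:
        os.close(fd)
    return len(payload)


with tempfile.TemporaryDirectory() as tmp:
    wal_path = os.path.join(tmp, "toy.db-wal")
    written = append_and_sync(wal_path, b"commit-frame")
    size_on_disk = os.path.getsize(wal_path)
```

If the process reported success before os.fsync returned, a power loss could drop a transaction the caller already believes is committed.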
