Spice Data Accelerators
Accelerators materialize data locally from connected sources for faster queries and reduced load on source systems.
Basic Configuration
```yaml
datasets:
  - from: postgres:my_table
    name: my_table
    acceleration:
      enabled: true
      engine: duckdb # arrow, duckdb, sqlite, cayenne, postgres, turso
      mode: memory # memory or file
      refresh_check_interval: 1h
```

Choosing an Accelerator
| Use Case | Engine | Why |
|---|---|---|
| Small datasets (<1 GB), max speed | `arrow` | In-memory, lowest latency |
| Medium datasets (1-100 GB), complex SQL | `duckdb` | Mature SQL, memory management |
| Large datasets (100 GB-1+ TB), analytics | `cayenne` | Built on Vortex (Linux Foundation), 10-20x faster scans |
| Point lookups on large datasets | `cayenne` | 100x faster random access vs Parquet |
| Simple queries, low resource usage | `sqlite` | Lightweight, minimal overhead |
| Async operations, concurrent workloads | `turso` | Native async, modern connection pooling |
| External database integration | `postgres` | Leverage existing PostgreSQL infra |
Cayenne vs DuckDB
Choose Cayenne when datasets exceed ~1 TB, multi-file ingestion is needed, or point lookups are common.
Choose DuckDB when datasets are under ~1 TB, complex SQL (window functions, CTEs) is needed, or DuckDB tooling is beneficial.
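For the large-dataset case, a minimal Cayenne acceleration might look like the sketch below. The S3 source path and dataset name are hypothetical placeholders, and `mode: file` is assumed since Cayenne is listed as a file-mode engine:

```yaml
datasets:
  - from: s3://my_bucket/events/ # hypothetical source path
    name: events
    acceleration:
      enabled: true
      engine: cayenne
      mode: file # assumed: cayenne is listed as file-mode only
```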
Supported Engines
| Engine | Mode | Status |
|---|---|---|
| arrow | memory | Stable |
| duckdb | memory, file | Stable |
| sqlite | memory, file | Release Candidate |
| cayenne | file | Beta |
| postgres | N/A (attached) | Release Candidate |
| turso | memory, file | Beta |
Refresh Modes
| Mode | Description | Use Case |
|---|---|---|
| `full` | Complete dataset replacement on each refresh | Small, slowly-changing datasets |
| `append` | Adds new records based on a `time_column` | Append-only logs, time-series data |
| `append` (streaming) | Continuous streaming without a time column | Real-time event streams (Kafka, Debezium) |
| `changes` | CDC-based incremental updates via Debezium or DynamoDB Streams | Frequently updated transactional data |
| | Request-based row-level caching | API responses, HTTP endpoints |
Full refresh every 8 hours
```yaml
acceleration:
  refresh_mode: full
  refresh_check_interval: 8h
```
Append mode: check for new records from the last day every 10 minutes
```yaml
acceleration:
  refresh_mode: append
  time_column: created_at
  refresh_check_interval: 10m
  refresh_data_window: 1d
```
Continuous ingestion using Kafka
```yaml
acceleration:
  refresh_mode: append
```
CDC with Debezium or DynamoDB Streams
```yaml
acceleration:
  refresh_mode: changes
```

Common Configurations
In-Memory with Interval Refresh
```yaml
acceleration:
  enabled: true
  engine: arrow
  refresh_check_interval: 5m
```

File-Based with Append and Time Window
```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append
      refresh_check_interval: 1h
      refresh_data_window: 7d
```

With Retention Policy
Retention policies prevent unbounded growth of accelerated datasets. Spice supports time-based and custom SQL-based retention strategies:
```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      retention_check_enabled: true
      retention_period: 30d
      retention_check_interval: 1h
```

With SQL-Based Retention
```yaml
acceleration:
  retention_check_enabled: true
  retention_check_interval: 1h
  retention_sql: "DELETE FROM logs WHERE status = 'archived'"
```

With Indexes (DuckDB, SQLite, Turso)
```yaml
acceleration:
  enabled: true
  engine: sqlite
  indexes:
    user_id: enabled
    '(created_at, status)': unique
  primary_key: id
```

Engine-Specific Parameters
DuckDB
```yaml
acceleration:
  engine: duckdb
  mode: file
  params:
    duckdb_file: ./data/cache.db
```

SQLite
```yaml
acceleration:
  engine: sqlite
  mode: file
  params:
    sqlite_file: ./data/cache.sqlite
```

Constraints and Indexes
Accelerated datasets support primary key constraints and indexes:
```yaml
acceleration:
  enabled: true
  engine: duckdb
  primary_key: order_id # Creates non-null unique index
  indexes:
    customer_id: enabled # Single column index
    '(created_at, status)': unique # Multi-column unique index
```

Snapshots (DuckDB, SQLite & Cayenne file mode)
Bootstrap file-based accelerations from S3 or filesystem snapshots on startup. This dramatically reduces cold-start latency in distributed deployments.
Snapshot triggers vary by refresh mode:
- `refresh_complete`: Creates snapshots after each refresh (full and batch-append modes)
- `time_interval`: Creates snapshots on a fixed schedule (all refresh modes)
- `stream_batches`: Creates snapshots after every N batches (streaming modes: Kafka, Debezium, DynamoDB Streams)
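A sketch of how selecting one of these triggers might look in configuration; note the `trigger` and `interval` key names here are assumptions for illustration, not confirmed by this page:

```yaml
snapshots:
  enabled: true
  # Hypothetical keys: the exact schema for choosing a trigger is assumed
  trigger: time_interval
  interval: 1h
```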
```yaml
snapshots:
  enabled: true
  location: s3://my_bucket/snapshots/
  bootstrap_on_failure_behavior: warn # warn | retry | fallback
  params:
    s3_auth: iam_role
```

Per-dataset opt-in:

```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file
  snapshots:
    enabled: true
```

Memory Considerations
When using `mode: memory` (the default), the dataset is loaded into RAM. Ensure sufficient memory is available, including overhead for queries and the runtime. Mitigate with `mode: file` for the duckdb, sqlite, turso, or cayenne accelerators.
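For example, a DuckDB acceleration that would otherwise exhaust RAM can be moved to disk, reusing the file path pattern from the engine-specific examples above:

```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file # persists to disk instead of holding the dataset in RAM
  params:
    duckdb_file: ./data/cache.db
```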