spice-accelerators
Configure data accelerators for local materialization and caching in Spice (Arrow, DuckDB, SQLite, Cayenne, PostgreSQL, Turso). Use when asked to "accelerate data", "enable caching", "materialize dataset", "configure refresh", "set up local storage", "improve query performance", "choose an accelerator", or "configure snapshots".
Source: spiceai/skills

Install: `npx skill4agent add spiceai/skills spice-accelerators`

# Spice Data Accelerators
Accelerators materialize data locally from connected sources for faster queries and reduced load on source systems.
## Basic Configuration
```yaml
datasets:
  - from: postgres:my_table
    name: my_table
    acceleration:
      enabled: true
      engine: duckdb # arrow, duckdb, sqlite, cayenne, postgres, turso
      mode: memory # memory or file
      refresh_check_interval: 1h
```

## Choosing an Accelerator
| Use Case | Engine | Why |
|---|---|---|
| Small datasets (<1 GB), max speed | `arrow` | In-memory, lowest latency |
| Medium datasets (1-100 GB), complex SQL | `duckdb` | Mature SQL engine, good memory management |
| Large datasets (100 GB-1+ TB), analytics | `cayenne` | Built on Vortex (Linux Foundation), 10-20x faster scans |
| Point lookups on large datasets | `cayenne` | 100x faster random access vs Parquet |
| Simple queries, low resource usage | `sqlite` | Lightweight, minimal overhead |
| Async operations, concurrent workloads | `turso` | Native async, modern connection pooling |
| External database integration | `postgres` | Leverages existing PostgreSQL infrastructure |
### Cayenne vs DuckDB
Choose Cayenne when datasets exceed ~1 TB, multi-file ingestion is needed, or point lookups are common.
Choose DuckDB when datasets are under ~1 TB, complex SQL (window functions, CTEs) is needed, or DuckDB tooling is beneficial.
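For example, a large multi-file analytics dataset could be accelerated with Cayenne. A minimal sketch, assuming an S3 source (the bucket and dataset names are illustrative):

```yaml
datasets:
  - from: s3://my_bucket/events/   # illustrative multi-file source
    name: events
    acceleration:
      enabled: true
      engine: cayenne
      mode: file                   # cayenne supports file mode only
      refresh_check_interval: 1h
```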
## Supported Engines
| Engine | Mode | Status |
|---|---|---|
| `arrow` | memory | Stable |
| `duckdb` | memory, file | Stable |
| `sqlite` | memory, file | Release Candidate |
| `cayenne` | file | Beta |
| `postgres` | N/A (attached) | Release Candidate |
| `turso` | memory, file | Beta |
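The `postgres` engine does not manage a local file; it attaches an existing PostgreSQL database as the acceleration target. A minimal sketch, assuming connection parameters that mirror Spice's PostgreSQL connector (the host, database, and credentials below are placeholders):

```yaml
acceleration:
  enabled: true
  engine: postgres
  params:
    pg_host: localhost    # placeholder host
    pg_port: 5432
    pg_db: spice_cache    # placeholder database used as the acceleration target
    pg_user: spice
    pg_pass: ${secrets:pg_pass}   # resolved from a configured secret store
```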
## Refresh Modes
| Mode | Description | Use Case |
|---|---|---|
| `full` | Complete dataset replacement on each refresh | Small, slowly-changing datasets |
| `append` | Appends new records based on a `time_column` | Append-only logs, time-series data |
| `append` (streaming) | Continuous streaming without a time column | Real-time event streams (Kafka, Debezium) |
| `changes` | CDC-based incremental updates via Debezium or DynamoDB Streams | Frequently updated transactional data |
| | Request-based row-level caching | API responses, HTTP endpoints |
```yaml
# Full refresh every 8 hours
acceleration:
  refresh_mode: full
  refresh_check_interval: 8h

# Append mode: check for new records from the last day every 10 minutes
acceleration:
  refresh_mode: append
  time_column: created_at
  refresh_check_interval: 10m
  refresh_data_window: 1d

# Continuous ingestion using Kafka
acceleration:
  refresh_mode: append

# CDC with Debezium or DynamoDB Streams
acceleration:
  refresh_mode: changes
```

## Common Configurations
### In-Memory with Interval Refresh
```yaml
acceleration:
  enabled: true
  engine: arrow
  refresh_check_interval: 5m
```

### File-Based with Append and Time Window
```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append
      refresh_check_interval: 1h
      refresh_data_window: 7d
```

### With Retention Policy
Retention policies prevent unbounded growth of accelerated datasets. Spice supports time-based and custom SQL-based retention strategies:
```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      retention_check_enabled: true
      retention_period: 30d
      retention_check_interval: 1h
```

### With SQL-Based Retention
```yaml
acceleration:
  retention_check_enabled: true
  retention_check_interval: 1h
  retention_sql: "DELETE FROM logs WHERE status = 'archived'"
```

### With Indexes (DuckDB, SQLite, Turso)
```yaml
acceleration:
  enabled: true
  engine: sqlite
  indexes:
    user_id: enabled
    '(created_at, status)': unique
  primary_key: id
```

## Engine-Specific Parameters
### DuckDB
```yaml
acceleration:
  engine: duckdb
  mode: file
  params:
    duckdb_file: ./data/cache.db
```

### SQLite
```yaml
acceleration:
  engine: sqlite
  mode: file
  params:
    sqlite_file: ./data/cache.sqlite
```

## Constraints and Indexes
Accelerated datasets support primary key constraints and indexes:
```yaml
acceleration:
  enabled: true
  engine: duckdb
  primary_key: order_id # Creates a non-null unique index
  indexes:
    customer_id: enabled # Single-column index
    '(created_at, status)': unique # Multi-column unique index
```

## Snapshots (DuckDB, SQLite & Cayenne file mode)
Bootstrap file-based accelerations from S3 or filesystem snapshots on startup. This dramatically reduces cold-start latency in distributed deployments.
Snapshot triggers vary by refresh mode:
- `refresh_complete`: Creates snapshots after each refresh (full and batch-append modes)
- `time_interval`: Creates snapshots on a fixed schedule (all refresh modes)
- `stream_batches`: Creates snapshots after every N batches (streaming modes: Kafka, Debezium, DynamoDB Streams)
```yaml
snapshots:
  enabled: true
  location: s3://my_bucket/snapshots/
  bootstrap_on_failure_behavior: warn # warn | retry | fallback
  params:
    s3_auth: iam_role
```

Per-dataset opt-in:
```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file
  snapshots:
    enabled: true
```

## Memory Considerations
When using `mode: memory` (the default), the dataset is loaded into RAM. Ensure sufficient memory, including overhead for queries and the runtime. Mitigate with `mode: file` for the duckdb, sqlite, turso, or cayenne accelerators.
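As a sketch of that mitigation, a memory-bound acceleration can be switched to a file-backed engine (the dataset name and file path below are illustrative):

```yaml
datasets:
  - from: postgres:large_table     # illustrative source
    name: large_table
    acceleration:
      enabled: true
      engine: duckdb
      mode: file                   # persist to disk instead of holding the dataset in RAM
      params:
        duckdb_file: ./data/large_table.db
```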