Configure data accelerators for local materialization and caching in Spice (Arrow, DuckDB, SQLite, Cayenne, PostgreSQL, Turso). Use when asked to "accelerate data", "enable caching", "materialize dataset", "configure refresh", "set up local storage", "improve query performance", "choose an accelerator", or "configure snapshots".
```shell
npx skill4agent add spiceai/skills spice-accelerators
```

```yaml
datasets:
  - from: postgres:my_table
    name: my_table
    acceleration:
      enabled: true
      engine: duckdb # arrow, duckdb, sqlite, cayenne, postgres, turso
      mode: memory # memory or file
      refresh_check_interval: 1h
```

| Use Case | Engine | Why |
|---|---|---|
| Small datasets (<1 GB), max speed | arrow | In-memory, lowest latency |
| Medium datasets (1-100 GB), complex SQL | duckdb | Mature SQL, memory management |
| Large datasets (100 GB-1+ TB), analytics | cayenne | Built on Vortex (Linux Foundation), 10-20x faster scans |
| Point lookups on large datasets | cayenne | 100x faster random access vs Parquet |
| Simple queries, low resource usage | sqlite | Lightweight, minimal overhead |
| Async operations, concurrent workloads | turso | Native async, modern connection pooling |
| External database integration | postgres | Leverage existing PostgreSQL infra |
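For example, a large analytics table could be accelerated with the `cayenne` engine, which supports `file` mode only. A minimal sketch — the S3 source path and dataset name are illustrative, not from the skill above:

```yaml
datasets:
  - from: s3://my_bucket/analytics/clicks/
    name: clicks
    acceleration:
      enabled: true
      engine: cayenne
      mode: file # cayenne is file-mode only
```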
| Engine | Mode | Status |
|---|---|---|
| arrow | memory | Stable |
| duckdb | memory, file | Stable |
| sqlite | memory, file | Release Candidate |
| cayenne | file | Beta |
| postgres | N/A (attached) | Release Candidate |
| turso | memory, file | Beta |
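Because the `postgres` engine attaches to an external database rather than embedding one, it takes connection parameters instead of a mode. A sketch, assuming the accelerator uses the same `pg_*` parameter names as the PostgreSQL data connector; the connection values are placeholders:

```yaml
acceleration:
  enabled: true
  engine: postgres
  params:
    pg_host: localhost
    pg_port: "5432"
    pg_db: spice_cache
    pg_user: spice
    pg_pass: ${secrets:PG_PASS}
```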
| Mode | Description | Use Case |
|---|---|---|
| full | Complete dataset replacement on each refresh | Small, slowly-changing datasets |
| append | Adds new records based on a time_column | Append-only logs, time-series data |
| append (no time_column) | Continuous streaming without time column | Real-time event streams (Kafka, Debezium) |
| changes | CDC-based incremental updates via Debezium or DynamoDB Streams | Frequently updated transactional data |
|  | Request-based row-level caching | API responses, HTTP endpoints |
```yaml
# Full refresh every 8 hours
acceleration:
  refresh_mode: full
  refresh_check_interval: 8h
```
```yaml
# Append mode: check for new records from the last day every 10 minutes
time_column: created_at # dataset-level field
acceleration:
  refresh_mode: append
  refresh_check_interval: 10m
  refresh_data_window: 1d
```
```yaml
# Continuous ingestion using Kafka
acceleration:
  refresh_mode: append
```
```yaml
# CDC with Debezium or DynamoDB Streams
acceleration:
  refresh_mode: changes
```

```yaml
acceleration:
  enabled: true
  engine: arrow
  refresh_check_interval: 5m
```

```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append
      refresh_check_interval: 1h
      refresh_data_window: 7d
```

```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      retention_check_enabled: true
      retention_period: 30d
      retention_check_interval: 1h
```

```yaml
acceleration:
  retention_check_enabled: true
  retention_check_interval: 1h
  retention_sql: "DELETE FROM logs WHERE status = 'archived'"
```

```yaml
acceleration:
  enabled: true
  engine: sqlite
  indexes:
    user_id: enabled
    '(created_at, status)': unique
  primary_key: id
```

```yaml
acceleration:
  engine: duckdb
  mode: file
  params:
    duckdb_file: ./data/cache.db
```

```yaml
acceleration:
  engine: sqlite
  mode: file
  params:
    sqlite_file: ./data/cache.sqlite
```

```yaml
acceleration:
  enabled: true
  engine: duckdb
  primary_key: order_id # Creates non-null unique index
  indexes:
    customer_id: enabled # Single column index
    '(created_at, status)': unique # Multi-column unique index
```

```yaml
snapshots:
  enabled: true
  location: s3://my_bucket/snapshots/
  bootstrap_on_failure_behavior: warn # warn | retry | fallback
  params:
    s3_auth: iam_role
```

```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file
  snapshots:
    enabled: true
```
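Putting the pieces together, a single dataset entry can combine acceleration, append refresh, retention, constraints, and snapshots. A sketch — the dataset name, intervals, and columns are illustrative, not prescribed values:

```yaml
datasets:
  - from: postgres:orders
    name: orders
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append
      refresh_check_interval: 15m
      refresh_data_window: 30d
      retention_check_enabled: true
      retention_period: 90d
      retention_check_interval: 1h
      primary_key: order_id
      indexes:
        customer_id: enabled
      snapshots:
        enabled: true
        location: s3://my_bucket/snapshots/
```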