# Turbo Pipeline Configuration Reference

YAML configuration reference for Turbo pipelines. This is a lookup reference — for interactive pipeline building, use the interactive builder skill; for pipeline troubleshooting, use the troubleshooting skill (see Related at the end of this document).

**CRITICAL:** Always validate YAML with `goldsky turbo validate <file.yaml>` before showing complete pipeline YAML to the user or deploying.
## Quick Start

Deploy a minimal pipeline:

```yaml
name: my-first-pipeline
resource_size: s

sources:
  transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms: {}

sinks:
  output:
    type: blackhole
    from: transfers
```

```bash
# Validate first:
goldsky turbo validate pipeline.yaml

# Then deploy:
goldsky turbo apply pipeline.yaml -i
```
## Prerequisites

### Discovering Available Data Sources

For dataset discovery, invoke the dataset discovery skill.

Quick reference for common datasets:

| What They Want | Dataset to Use |
|---|---|
| Token transfers (fungible) | `<chain>.erc20_transfers` |
| NFT transfers | |
| All contract events | `<chain>.raw_logs` |
| Block data | |
| Transaction data | |

For full chain prefixes, dataset types, and version discovery, use the dataset discovery skill.
## Quick Reference

### Installation Commands

| Action | Command |
|---|---|
| Install Goldsky CLI | `curl https://goldsky.com \| sh` |
| Install Turbo extension | `curl https://install-turbo.goldsky.com \| sh` |
| Verify Turbo installed | |

### Pipeline Commands

| Action | Command | Notes |
|---|---|---|
| List datasets | | ⚠️ Slow (30-60s) |
| Validate (REQUIRED) | `goldsky turbo validate pipeline.yaml` | ✓ Fast (3s) |
| Deploy/Update | `goldsky turbo apply pipeline.yaml` | |
| Deploy + Inspect | `goldsky turbo apply pipeline.yaml -i` | |
| List pipelines | | |
| View live data | `goldsky turbo inspect <name>` | |
| Inspect node | `goldsky turbo inspect <name> -n <node>` | |
| View logs | `goldsky turbo logs <name>` | |
| Follow logs | `goldsky turbo logs <name> --follow` | |
| List secrets | | |

For pause, resume, restart, and delete commands, see the related references at the end of this document.
## Configuration Reference

### Pipeline Structure

Every Turbo pipeline YAML has this structure:

```yaml
name: my-pipeline             # Required: unique identifier
resource_size: s              # Required: s, m, or l
description: "Optional desc"  # Optional: what the pipeline does

sources:
  source_name:                # Define data inputs
    type: dataset
    # ... source config

transforms:                   # Optional: process data
  transform_name:
    type: sql
    # ... transform config

sinks:
  sink_name:                  # Define data outputs
    type: postgres
    # ... sink config
```

### Top-Level Fields

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique pipeline identifier (lowercase, hyphens) |
| `resource_size` | Yes | Worker allocation: `s`, `m`, or `l` |
| `description` | No | Human-readable description |
| `job` | No | `true` for one-time batch jobs (default: `false` = streaming) |
| `sources` | Yes | Data input definitions |
| `transforms` | No | Data processing definitions |
| `sinks` | Yes | Data output definitions |
### Job Mode

Set `job: true` for one-time batch processing (historical backfills, data exports):

```yaml
name: backfill-usdc-history
resource_size: l
job: true

sources:
  logs:
    type: dataset
    dataset_name: ethereum.raw_logs
    version: 1.0.0
    start_at: earliest
    end_block: 19000000
    filter: >-
      address = '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'

transforms: {}

sinks:
  output:
    type: s3_sink
    from: logs
    endpoint: https://s3.amazonaws.com
    bucket: my-backfill-bucket
    prefix: usdc/
    secret_name: MY_S3
```

Job mode rules:

- Runs to completion and auto-cleans up ~1 hour after finishing
- Must be deleted before redeploying; a job cannot be updated in-place (use delete + apply instead)
- Use `end_block` to bound the range (otherwise the job processes to chain tip and stops)
- Best with `resource_size: l` for faster backfills

For architecture guidance on when to use job vs streaming mode, see the related references.
### Resource Sizes

| Size | Workers | Use Case |
|---|---|---|
| `s` | 1 | Testing, low-volume data |
| `m` | 2 | Production, moderate volume |
| `l` | 4 | High-volume, multi-chain pipelines |
## Source Configuration

### Dataset Source

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: <version>
    start_at: latest | earliest  # EVM chains
    # OR
    start_block: <slot_number>   # Solana only
```

### Source Fields

| Field | Required | Description |
|---|---|---|
| `type` | Yes | `dataset` for blockchain data |
| `dataset_name` | Yes | Format: `<chain>.<dataset_type>` |
| `version` | Yes | Dataset version (e.g., `1.2.0`) |
| `start_at` | EVM | `latest` or `earliest` |
| `start_block` | Solana | Specific slot number (omit for latest) |
| `end_block` | No | Stop processing at this block (for bounded backfills) |
| `filter` | No | SQL WHERE clause to pre-filter at source level (efficient) |
### Source-Level Filtering

Use `filter` to reduce data volume before it reaches transforms. This is significantly more efficient than filtering in SQL transforms because it eliminates data at the ingestion layer:

```yaml
sources:
  usdc_logs:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: earliest
    filter: >-
      address = lower('0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913')
      AND block_number >= 10000000
```

Best practices:

- Use `filter` for contract addresses and block ranges (coarse pre-filtering)
- Use SQL transforms for event types, parameter values, exclusions (fine-grained)
- `filter` uses standard SQL WHERE syntax (same as DataFusion)
- Combine `filter` with `start_at` + `end_block` for precise bounded backfills
### Chains and Dataset Types

For the full list of chains, prefixes, and dataset types, use the dataset discovery skill. Key points:

- EVM chains include `ethereum` and `base`; some chain prefixes differ from the common chain name (e.g., Polygon), so confirm the prefix before use
- Non-EVM chains include `solana`, which uses `start_block` rather than `start_at`
- EVM dataset types include `erc20_transfers` and `raw_logs`; confirm exact dataset type names with the discovery skill, since several differ from their intuitive names
## Transform Configuration

### Transform Types

| Type | Use Case |
|---|---|
| `sql` | Filtering, projections, SQL functions |
| `script` | Custom TypeScript/WASM logic |
| `handler` | Call external HTTP APIs to enrich data |
| `dynamic_table` | Lookup tables backed by a database |

### SQL Transform

Most common transform type:

```yaml
transforms:
  filtered:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        sender,
        recipient,
        amount
      FROM source_name
      WHERE amount > 1000
```

| Field | Required | Description |
|---|---|---|
| `type` | Yes | `sql` |
| `primary_key` | Yes | Column for uniqueness/ordering |
| `sql` | Yes | SQL query (reference sources by name) |
| `from` | No | Override default source (for chaining) |
### TypeScript Transform

For complex logic that SQL can't handle (runs in a WASM sandbox):

```yaml
transforms:
  custom:
    type: script
    primary_key: id
    language: typescript
    from: source_name
    schema:
      id: string
      sender: string
      amount: string
      processed_at: string
    script: |
      function invoke(data) {
        if (data.amount < 1000) return null; // Filter out
        return {
          id: data.id,
          sender: data.sender,
          amount: data.amount,
          processed_at: new Date().toISOString()
        };
      }
```

For full TypeScript transform documentation, schema types, and examples, see the related references.
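Outside the pipeline, an `invoke` function like the one above can be sanity-checked as plain TypeScript before deploying (a local sketch; the WASM sandbox is not needed for this, and the explicit `Number()` conversion reflects that `amount` arrives as a string per the schema):

```typescript
// Local copy of the transform's invoke function, for testing only.
type Row = { id: string; sender: string; amount: string };

function invoke(data: Row) {
  // amount is declared as string in the schema, so convert before comparing.
  if (Number(data.amount) < 1000) return null; // filter out small transfers
  return {
    id: data.id,
    sender: data.sender,
    amount: data.amount,
    processed_at: new Date().toISOString(),
  };
}

console.log(invoke({ id: "1", sender: "0xabc", amount: "5000" })?.id); // "1"
console.log(invoke({ id: "2", sender: "0xdef", amount: "10" }));       // null
```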
### Dynamic Table Transform

Updatable lookup tables for runtime filtering (allowlists, blocklists, enrichment):

```yaml
transforms:
  tracked_wallets:
    type: dynamic_table
    backend_type: Postgres  # or: InMemory
    backend_entity_name: tracked_wallets
    secret_name: MY_DB      # required for Postgres
```

Use `dynamic_table_check` in SQL transforms:

```sql
WHERE dynamic_table_check('tracked_wallets', sender)
```

For full dynamic table documentation, backend options, and examples, see the related references.
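Semantically, `dynamic_table_check` behaves like a membership test against the backing table's current rows, which you can update at runtime without redeploying the pipeline. A rough Python analogy (illustrative only, not the actual implementation):

```python
# Illustrative analogy: the dynamic table acts like a mutable set of keys.
tracked_wallets = {"0xabc", "0xdef"}  # rows in the backing table

def dynamic_table_check(table: set, key: str) -> bool:
    """True if key is currently present in the lookup table."""
    return key in table

rows = [{"sender": "0xabc"}, {"sender": "0x999"}]
kept = [r for r in rows if dynamic_table_check(tracked_wallets, r["sender"])]
print(kept)  # [{'sender': '0xabc'}]

# Inserting a row into the backing table changes results going forward:
tracked_wallets.add("0x999")
```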
### Handler Transform

Call external HTTP APIs to enrich data:

```yaml
transforms:
  enriched:
    type: handler
    primary_key: id
    from: my_source
    url: https://my-api.example.com/enrich
    headers:
      Authorization: Bearer my-token
    batch_size: 100
    timeout_ms: 5000
```

For full handler transform documentation, see the related references.
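On the receiving side, the endpoint applies some enrichment to each row it is sent. The exact request/response contract is defined in the full handler documentation; the sketch below assumes the pipeline POSTs a JSON array of up to `batch_size` rows and expects an equally sized array back, and the `label` field is a made-up enrichment for illustration:

```typescript
type Row = { id: string; sender: string };
type EnrichedRow = Row & { label: string };

// Hypothetical enrichment logic an /enrich endpoint might apply per batch.
function enrichBatch(rows: Row[]): EnrichedRow[] {
  return rows.map((row) => ({
    ...row,
    // Made-up classification: tag EVM-style addresses as wallets.
    label: row.sender.startsWith("0x") ? "wallet" : "unknown",
  }));
}

const out = enrichBatch([
  { id: "1", sender: "0xabc" },
  { id: "2", sender: "bank:123" },
]);
console.log(out.map((r) => r.label)); // [ 'wallet', 'unknown' ]
```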
### Transform Chaining

```yaml
transforms:
  step1:
    type: sql
    primary_key: id
    sql: SELECT * FROM source WHERE amount > 100
  step2:
    type: sql
    primary_key: id
    from: step1
    sql: SELECT *, 'processed' as status FROM step1
```
## Sink Configuration

### Common Sink Fields

| Field | Required | Description |
|---|---|---|
| `type` | Yes | Sink type |
| `from` | Yes | Source or transform to read from |
| `secret_name` | Varies | Secret for credentials (most sinks) |
| `primary_key` | Varies | Column for upserts (database sinks) |
### Blackhole Sink (Testing)

```yaml
sinks:
  test_output:
    type: blackhole
    from: my_transform
```

### PostgreSQL Sink

```yaml
sinks:
  postgres_output:
    type: postgres
    from: my_transform
    schema: public
    table: my_table
    secret_name: MY_POSTGRES_SECRET
    primary_key: id
```

Secret format: a PostgreSQL connection string:

```
postgres://username:password@host:port/database
```
### PostgreSQL Aggregate Sink

Real-time aggregations in PostgreSQL using database triggers. Data flows into a landing table, and a trigger maintains aggregated values in a separate table.

```yaml
sinks:
  balances:
    type: postgres_aggregate
    from: transfers
    schema: public
    landing_table: transfer_log
    agg_table: account_balances
    primary_key: transfer_id
    secret_name: MY_POSTGRES
    group_by:
      account:
        type: text
    aggregate:
      balance:
        from: amount
        fn: sum
```

Supported aggregation functions include `sum`; see the related references for the full list.
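Conceptually, the trigger folds every row that lands in `transfer_log` into `account_balances`, keyed by the `group_by` column. In Python terms (an illustrative sketch of the semantics, not the generated trigger):

```python
from collections import defaultdict

# account -> running sum(amount), as the trigger maintains in account_balances
account_balances: dict = defaultdict(int)

def on_insert(row: dict) -> None:
    """What the trigger does for each row inserted into transfer_log."""
    account_balances[row["account"]] += row["amount"]

for row in [
    {"transfer_id": 1, "account": "alice", "amount": 100},
    {"transfer_id": 2, "account": "bob", "amount": 40},
    {"transfer_id": 3, "account": "alice", "amount": -25},
]:
    on_insert(row)

print(dict(account_balances))  # {'alice': 75, 'bob': 40}
```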
### ClickHouse Sink

```yaml
sinks:
  clickhouse_output:
    type: clickhouse
    from: my_transform
    table: my_table
    secret_name: MY_CLICKHOUSE_SECRET
    primary_key: id
```

Secret format: a ClickHouse connection string:

```
https://username:password@host:port/database
```
### Kafka Sink

```yaml
sinks:
  kafka_output:
    type: kafka
    from: my_transform
    topic: my-topic
    topic_partitions: 10
    data_format: avro  # or: json
    schema_registry_url: http://schema-registry:8081  # required for avro
```
### Webhook Sink

Note: Turbo webhook sinks do not support Goldsky's native secrets management. Include auth headers directly in the pipeline config.

```yaml
sinks:
  webhook_output:
    type: webhook
    from: my_transform
    url: https://api.example.com/webhook
    one_row_per_request: true
    headers:
      Authorization: Bearer your-token
      Content-Type: application/json
```
### S3 Sink

```yaml
sinks:
  s3_output:
    type: s3_sink
    from: my_transform
    endpoint: https://s3.amazonaws.com
    bucket: my-bucket
    prefix: data/
    secret_name: MY_S3_SECRET
```

Secret format: `access_key_id:secret_access_key` (or `access_key_id:secret_access_key:session_token` for temporary credentials)
### S2 Sink

Publish to S2.dev streams — a serverless alternative to Kafka.

```yaml
sinks:
  s2_output:
    type: s2_sink
    from: my_transform
    access_token: your_access_token
    basin: your-basin-name
    stream: your-stream-name
```
## Starter Templates

Template files are available in the `templates/` folder. Copy and customize these for your pipelines.

| Template | Description | Use Case |
|---|---|---|
| `minimal-erc20-blackhole.yaml` | Simplest pipeline, no credentials | Quick testing |
| `filtered-transfers-sql.yaml` | Filter by contract address | USDC, specific tokens |
| | Write to PostgreSQL | Production data storage |
| `multi-chain-pipeline.yaml` | Combine multiple chains | Cross-chain analytics |
| | Solana SPL tokens | Non-EVM chains |
| | Multiple outputs | Archive + alerts + streaming |

To use a template:

```bash
# Copy template to your project
cp templates/minimal-erc20-blackhole.yaml my-pipeline.yaml

# Customize as needed, then validate
goldsky turbo validate my-pipeline.yaml

# Deploy
goldsky turbo apply my-pipeline.yaml -i
```

Template location: `templates/` (relative to this skill's directory)
## Common Update Patterns

### Adding a SQL Transform

Before:

```yaml
transforms: {}

sinks:
  output:
    type: blackhole
    from: transfers
```

After:

```yaml
transforms:
  filtered:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM transfers WHERE amount > 1000000

sinks:
  output:
    type: blackhole
    from: filtered  # Changed from 'transfers'
```

### Adding a PostgreSQL Sink

```yaml
sinks:
  existing_sink:
    type: blackhole
    from: my_transform
  # Add new sink
  postgres_output:
    type: postgres
    from: my_transform
    schema: public
    table: my_data
    secret_name: MY_POSTGRES_SECRET
    primary_key: id
```

### Changing Resource Size

```yaml
resource_size: m  # was: s
```

### Adding a New Source

```yaml
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest
  # Add new source
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest
```
## Checkpoint Behavior

### Understanding Checkpoints

When you update a pipeline:

- Checkpoints are preserved by default: processing continues from where it left off
- Source checkpoints are tied to source names: renaming a source resets its checkpoint
- Pipeline checkpoints are tied to pipeline names: renaming the pipeline resets all checkpoints

### Resetting Checkpoints

Option 1: Rename the source

```yaml
sources:
  transfers_v2:  # Changed from 'transfers'
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: earliest  # Will process from beginning
```

Option 2: Rename the pipeline

```yaml
name: my-pipeline-v2  # Changed from 'my-pipeline'
```

Warning: Resetting checkpoints means reprocessing all historical data.
## Troubleshooting

See `references/troubleshooting.md` for:

- CLI hanging / Turbo binary not found fixes
- Common validation errors (unknown dataset, missing `primary_key`, bad source reference)
- Common runtime errors (auth failed, connection refused, Neon size limit)
- Quick troubleshooting table

Also see the related references for error patterns and log analysis.
## Related

- Interactive pipeline builder skill: interactive wizard to build pipelines step-by-step
- Troubleshooting skill: diagnose and fix pipeline issues
- Dataset discovery skill: dataset names and chain prefixes