# Pipeline Builder
## Boundaries
- Build NEW pipelines. Do not diagnose broken pipelines — that belongs to .
- Do not serve as a YAML reference. If the user only needs to look up a field or syntax, use the skill instead.
- For dataset lookups, use .
Walk the user through building a complete pipeline from scratch, step by step. Generate a valid YAML configuration, validate it, and deploy it.
## Builder Workflow

### Step 1: Verify Authentication

Run `goldsky project list 2>&1` to check login status.

- If logged in: Note the current project and continue.
- If not logged in: Use the skill for guidance.
### Step 2: Understand the Goal
Ask the user what they want to index. Good questions:
- What blockchain/chain? (Ethereum, Base, Polygon, Solana, etc.)
- What data? (transfers, swaps, events from a specific contract, all transactions, etc.)
- Where should the data go? (PostgreSQL, ClickHouse, Kafka, S3, etc.)
- Do they need transforms? (filtering, aggregation, enrichment)
- One-time backfill or continuous streaming?
If the user already described their goal, extract answers from their description.
### Step 3: Choose the Dataset

Use the skill to find the right dataset.
Key points:
- Common datasets: , , ,
- For decoded contract events, use with a filter on and
- For Solana: use , , etc.
Present the dataset choice to the user for confirmation.
### Step 4: Configure the Source

Build the source section of the YAML:

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset>
    version: 1.0.0
    start_at: earliest # or a specific block number
```
Ask about:
- Start block: (from genesis), (from now), or a specific block number
- End block: Only for job-mode/backfill pipelines. Omit for streaming.
- Source-level filter: Optional filter to reduce data at the source (e.g., specific contract address)
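For a job-mode backfill over a fixed block range, the source might be pinned like the sketch below. Only `start_at` appears in the scaffold above; the `end_at` and `filter` key names here are assumptions, so confirm the exact field names against the YAML reference skill before using them.

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset>
    version: 1.0.0
    start_at: 18000000          # begin at a specific block
    end_at: 18100000            # hypothetical key: end block, job-mode/backfill only
    filter: "address = '0x...'" # hypothetical key: source-level filter
```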
### Step 5: Configure Transforms (if needed)

If the user needs transforms, use the skill to help:
- SQL transforms — filter, aggregate, join, or reshape data using DataFusion SQL
- TypeScript transforms — custom logic, external API calls, complex processing
- Dynamic tables — join with a PostgreSQL table or in-memory allowlist
Build the transforms section:

```yaml
transforms:
  my_transform:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM my_source
      WHERE <conditions>
```
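As a concrete, illustrative instance of the scaffold above, a SQL transform that narrows a source to events from a single contract might look like this. The column names (`id`, `block_number`, `address`, `data`) are placeholders, not a guaranteed dataset schema:

```yaml
transforms:
  filtered_events:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        block_number,
        address,
        data
      FROM my_source
      WHERE lower(address) = lower('0xYourContractAddress')
```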
### Step 6: Configure the Sink

Ask where the data should go. Use the skill for sink configuration:

| Sink | Key config |
|---|---|
| PostgreSQL | , , , |
| ClickHouse | , , |
| Kafka | , |
| S3 | , , , |
| Webhook | , |
For sinks requiring a secret, check whether the secret exists. If it doesn't, help create it using the skill.
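For example, a PostgreSQL sink typically references a stored secret rather than inline credentials. The key names below (`from`, `secret_name`, `schema`, `table`) are a sketch of the usual shape, not a verified spec; confirm them with the sink configuration skill:

```yaml
sinks:
  my_postgres_sink:
    type: postgres             # hypothetical key names; verify via the sink skill
    from: my_transform         # upstream source or transform to write from
    secret_name: MY_PG_SECRET  # credential created in this step
    schema: public
    table: erc20_transfers
```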
### Step 7: Choose Mode

Use the skill for guidance:
- Streaming (default) — continuous processing, no , runs indefinitely
- Job mode — one-time backfill, set and
### Step 8: Generate, Validate, and Present

Assemble the complete pipeline YAML. Use a descriptive name following the convention (e.g., `base-erc20-transfers-postgres`).

- Write the YAML file to disk (e.g., ).
- Run validation BEFORE showing the YAML to the user:

```bash
goldsky turbo validate -f <pipeline-name>.yaml
```
- If validation fails, fix the issues and re-validate. Do NOT present the YAML until validation passes. Common fixes:
  - Missing field on dataset source
  - Invalid dataset name (check chain prefix)
  - Missing for database sinks
  - SQL syntax errors in transforms
- Once validation passes, present the full YAML to the user for review.
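A fully assembled file, built from the scaffolds in the steps above, might look like this end-to-end sketch. The top-level `name` key and all sink key names are assumptions to be confirmed against the reference skills; dataset names and SQL conditions are placeholders:

```yaml
name: base-erc20-transfers-postgres  # hypothetical top-level key
sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset>
    version: 1.0.0
    start_at: earliest
transforms:
  my_transform:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM my_source
      WHERE <conditions>
sinks:
  my_sink:
    type: postgres             # hypothetical sink keys; confirm via the sink skill
    from: my_transform
    secret_name: MY_PG_SECRET
    schema: public
    table: my_table
```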
### Step 9: Deploy

After the user confirms the YAML looks good:

```bash
goldsky turbo apply <pipeline-name>.yaml
```
### Step 10: Verify

After deployment, suggest running inspect to verify data flow:

```bash
goldsky turbo inspect <pipeline-name>
```
Present a summary:
## Pipeline Deployed
**Name:** [name]
**Chain:** [chain]
**Dataset:** [dataset]
**Sink:** [sink type]
**Mode:** [streaming/job]
**Next steps:**
- Monitor with `goldsky turbo inspect <name>`
- Check logs with `goldsky turbo logs <name>`
- Use /turbo-doctor if you run into issues
## Important Rules
- Always validate before presenting complete YAML to the user. Never show unvalidated complete pipeline YAML.
- Always validate before deploying.
- Always show the user the complete YAML before deploying.
- For job-mode pipelines, remind the user that the pipeline auto-cleans up ~1 hour after completion.
- Use sink for testing pipelines without writing to a real destination.
- If the user wants to modify an existing pipeline, check if it's streaming (update in place) or job-mode (must delete first).
- Default to unless the user specifies otherwise.
- Always include on dataset sources.
## Related
- — YAML configuration and architecture reference
- — Diagnose and fix pipeline issues
- — Lifecycle commands and monitoring reference
- — SQL and TypeScript transform reference
- — Dataset names and chain prefixes
- — Sink credential management