turbo-architecture


Turbo Pipeline Architecture

Help users make architecture decisions for Turbo pipelines — source types, data flow patterns, resource sizing, sink strategies, streaming vs job mode, dynamic table design, and multi-chain deployment.

Agent Instructions

Step 1: Understand the Requirements

Ask the user about:
  1. What data? — Which chain(s), which events/datasets, historical or real-time only?
  2. Where does it go? — Database, webhook, streaming topic, multiple destinations?
  3. How much volume? — Single contract or all chain activity? How many events/sec?
  4. Latency needs? — Real-time dashboards (sub-second) or analytics (minutes OK)?

Step 2: Recommend an Architecture

Use the patterns and decision guides below to recommend a pipeline architecture. Reference the templates in templates/ as starting points:
  • templates/linear-pipeline.yaml — Simple decode → filter → sink
  • templates/fan-out-pipeline.yaml — One source → multiple sinks
  • templates/fan-in-pipeline.yaml — Multiple events → UNION ALL → sink
  • templates/multi-chain-templated.yaml — Per-chain pipeline pattern

Step 3: Hand Off to Implementation Skills

After the architecture is decided, direct the user to:
  • /turbo-pipelines — To build and deploy the YAML
  • /turbo-transforms — To write the SQL transforms
  • /secrets — To set up sink credentials
Reminder: When presenting complete pipeline YAML as part of architecture recommendations, validate it first with goldsky turbo validate. Templates in templates/ are structural patterns — any customized version must be validated before presenting to the user.


Source Types

Dataset Sources — Historical + Real-Time

Use type: dataset when you need to process historical blockchain data and/or continue streaming new data.

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: earliest  # or: latest
```

Best for: Raw logs, transactions, transfers, blocks — anything where you need historical backfill or decoded event processing.
Available EVM datasets: raw_logs, blocks, raw_transactions, raw_traces, erc20_transfers, erc721_transfers, erc1155_transfers, decoded_logs
Non-EVM chains also supported: Solana (solana.*), Bitcoin (bitcoin.raw.*), Stellar (stellar_mainnet.*)
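Dataset sources also accept a filter field (used in the bounded-range job example later in this document) to scope a stream to a single contract. A sketch, reusing the USDT contract address from that example; the source key is illustrative:

```yaml
sources:
  usdt_logs:
    type: dataset
    dataset_name: ethereum.raw_logs
    version: 1.0.0
    start_at: earliest
    # Limit the stream to one contract's logs (USDT on Ethereum)
    filter: >-
      address = '0xdac17f958d2ee523a2206206994597c13d831ec7'
```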

Kafka Sources — Real-Time Streaming

Note: Kafka sources are used in production pipelines but are not documented in the official Goldsky docs. Use with caution and contact Goldsky support for topic names.
Use type: kafka when consuming from a Goldsky-managed Kafka topic, typically for continuously-updated state like balances.

```yaml
sources:
  my_source:
    type: kafka
    topic: base.raw.latest_balances_v2
```

Best for: Balance snapshots, latest state data, high-volume continuous streams.
Key differences from dataset sources:
  • No start_at or version fields
  • Optional fields: filter, include_metadata, starting_offsets
  • Delivers the latest state rather than historical event logs
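A Kafka source using the optional fields might look like the sketch below. The field names come from the list above, but the values and value syntax are illustrative assumptions, not documented behavior:

```yaml
sources:
  my_source:
    type: kafka
    topic: base.raw.latest_balances_v2
    # Optional fields (values are illustrative assumptions):
    filter: token_type = 'ERC_20'   # drop non-matching records at the source
    include_metadata: true          # attach topic/partition/offset metadata
    starting_offsets: latest        # where in the topic to begin consuming
```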

When to Use Which

| Scenario | Source Type | Why |
|---|---|---|
| Decode contract events from logs | dataset | Need raw_logs + _gs_log_decode() |
| Track token transfers | dataset | erc20_transfers has structured data |
| Historical backfill + live | dataset | start_at: earliest processes history |
| Live token balances | kafka | latest_balances_v2 is a streaming topic |
| Real-time state snapshots | kafka | Kafka delivers latest state continuously |
| Only need new data going forward | Either | Dataset with start_at: latest, or Kafka |


Data Flow Patterns

Linear Pipeline

The simplest pattern. One source → one or more chained transforms → one sink.

source → transform_a → transform_b → sink

Use when: You have a single data source, single destination, and straightforward processing (decode, filter, reshape).
Example: templates/linear-pipeline.yaml — raw logs → decode → extract trade events → postgres
Resource size: s or m
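A minimal linear pipeline skeleton following that shape. The field names all appear elsewhere in this document; the transform name, sink name, dataset label, and secret name are illustrative assumptions, and the SQL is a placeholder for the template's decode/filter logic:

```yaml
name: my-linear-pipeline
resource_size: s

sources:
  logs:
    type: dataset
    dataset_name: base.raw_logs   # illustrative dataset label
    version: 1.0.0
    start_at: earliest

transforms:
  decoded:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM logs  -- decode/filter/reshape logic goes here

sinks:
  db:
    type: postgres
    from: decoded
    secret_name: MY_POSTGRES      # illustrative secret name
```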

Fan-Out (One Source → Multiple Sinks)

One source feeds multiple transforms, each writing to a different sink.

             ┌→ transform_a → sink_1 (clickhouse)
source ──────┤
             └→ transform_b → sink_2 (webhook)

Use when: You need different views or subsets of the same data going to different destinations — e.g., balances to a warehouse AND token metadata to a webhook.
Example: templates/fan-out-pipeline.yaml — one Kafka source → fungible balances to ClickHouse + all tokens to a webhook

```yaml
transforms:
  fungible_balances:
    type: sql
    primary_key: id
    sql: |
      SELECT ... FROM latest_balances balance
      WHERE balance.token_type = 'ERC_20' OR balance.token_type IS NULL

  all_tokens:
    type: sql
    primary_key: id
    sql: |
      SELECT ... FROM latest_balances balance
      WHERE balance.token_type IN ('ERC_20', 'ERC_721', 'ERC_1155')

sinks:
  warehouse:
    type: clickhouse
    from: fungible_balances
    # ...
  webhook:
    type: webhook
    from: all_tokens
    # ...
```

Resource size: m (multiple output paths)

Fan-In (Multiple Events → One Output)

Multiple event types decoded from the same source, normalized to a common schema, then combined with UNION ALL into a single sink.

                ┌→ event_type_a ──┐
source → decode ┤                 ├→ UNION ALL → sink
                └→ event_type_b ──┘

Use when: You want a unified activity feed, combining trades, deposits, withdrawals, transfers, etc. into one table.
Example: templates/fan-in-pipeline.yaml — one raw_logs source → decode → multiple event-type transforms → UNION ALL → ClickHouse
Resource size: l (complex processing with many transforms)
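A sketch of the fan-in transform stage, assuming two event-type transforms normalized to the same columns before the UNION ALL (transform names, the upstream source key, and the column list are illustrative):

```yaml
transforms:
  trades:
    type: sql
    primary_key: id
    sql: |
      SELECT id, 'trade' AS activity_type, ... FROM decoded_logs_src
  deposits:
    type: sql
    primary_key: id
    sql: |
      SELECT id, 'deposit' AS activity_type, ... FROM decoded_logs_src
  combined:
    type: sql
    primary_key: id
    sql: |
      -- Both inputs share a schema, so UNION ALL is safe
      SELECT * FROM trades
      UNION ALL
      SELECT * FROM deposits
```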

Multi-Chain Fan-In

Multiple sources from different chains combined into a single output.

source_chain_a ──┐
source_chain_b ──┼→ UNION ALL → sink
source_chain_c ──┘

Use when: You want cross-chain analytics or a unified view across chains.

```yaml
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms:
  combined:
    type: sql
    primary_key: id
    sql: |
      SELECT *, 'ethereum' AS chain FROM eth_transfers
      UNION ALL
      SELECT *, 'base' AS chain FROM base_transfers
```

Resource size: m or l depending on chain count


Templated Multi-Chain Deployment

When you need the same pipeline logic across multiple chains, create separate pipeline files per chain rather than one multi-source pipeline. This gives you:
  • Independent lifecycle (deploy/delete per chain)
  • Independent checkpointing (one chain failing doesn't block others)
  • Clearer monitoring per chain

Pattern: Copy the pipeline YAML and swap the chain-specific values:

| Field | Chain A (base) | Chain B (arbitrum) |
|---|---|---|
| name | base-balance-streaming | arbitrum-balance-streaming |
| topic | base.raw.latest_balances_v2 | arbitrum.raw.latest_balances_v2 |
| Source key | base_latest_balances_v2 | arbitrum_latest_balances_v2 |
| Transform SQL | 'base' AS chain | 'arbitrum' AS chain |
| Sink table | base_token_balances | arbitrum_token_balances |

Example: templates/multi-chain-templated.yaml — shows the base chain version; duplicate for each chain.

When to use templated vs multi-source:

| Approach | Pros | Cons |
|---|---|---|
| Templated (per-chain) | Independent lifecycle, clear monitoring | More files to manage |
| Multi-source (one file) | Single deployment, cross-chain UNION possible | Coupled lifecycle, harder to debug |
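Applying the swap table above, the arbitrum copy of the templated pipeline might start like the sketch below. The chain-specific values are taken from the table; the transform name, sink name, and resource size are illustrative:

```yaml
name: arbitrum-balance-streaming
resource_size: m

sources:
  arbitrum_latest_balances_v2:
    type: kafka
    topic: arbitrum.raw.latest_balances_v2

transforms:
  balances:
    type: sql
    primary_key: id
    sql: |
      SELECT *, 'arbitrum' AS chain FROM arbitrum_latest_balances_v2

sinks:
  warehouse:
    type: clickhouse
    from: balances
    # writes to arbitrum_token_balances; credentials via secret_name
```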


Resource Sizing

Each size doubles the previous tier's CPU and memory:

| Size | Workers | CPU Request | Memory | When to Use |
|---|---|---|---|---|
| xs | — | 0.4 | 0.5 Gi | Small datasets, light testing |
| s | 1 | 0.8 | 1.0 Gi | Testing, simple filters, single source/sink, low volume (default) |
| m | 4 | 1.6 | 2.0 Gi | Multiple sinks, Kafka streaming, moderate transform complexity |
| l | 10 | 3.2 | 4.0 Gi | Multi-event decoding with UNION ALL, high-volume historical backfill |
| xl | 20 | 6.4 | 8.0 Gi | Large chain backfills, complex JOINs (e.g. Solana accounts + transactions) |
| xxl | 40 | 12.8 | 16.0 Gi | Highest throughput needs; up to 6.3M rows/min |

Rules of thumb from production pipelines:
  • Simple filter + single sink → s (default, try this first)
  • Kafka source + multiple sinks OR multiple transforms → m
  • Raw log decoding + 5+ event types + UNION ALL → l
  • Historical backfill of high-volume data → l or xl (can downsize after catch-up)
  • Start small and scale up — defensive sizing avoids wasted resources


Sink Selection

Quick Reference

| Destination | Sink Type | Best For |
|---|---|---|
| Application DB | postgres | Row-level lookups, joins, application serving |
| Real-time aggregates | postgres_aggregate | Balances, counters, running totals via triggers |
| Analytics queries | clickhouse | Large-scale aggregations, time-series data |
| Event processing | kafka | Downstream consumers, event-driven systems |
| Serverless streaming | s2_sink | S2.dev streams, alternative to Kafka |
| Notifications | webhook | Lambda functions, API callbacks, alerts |
| Data lake | s3_sink | Long-term archival, batch processing |
| Testing | blackhole | Validate pipeline without writing data |

Decision Flowchart


What's your primary use case?
├─ Application serving (REST/GraphQL API)
│  └─ PostgreSQL ← row-level lookups, joins, strong consistency
├─ Analytics / dashboards
│  ├─ Time-series queries → ClickHouse ← columnar, fast aggregations
│  └─ Full-text search → Elasticsearch / OpenSearch
├─ Real-time aggregations (balances, counters)
│  └─ PostgreSQL Aggregate ← trigger-based running totals
├─ Event-driven downstream processing
│  ├─ Need Kafka ecosystem → Kafka
│  └─ Serverless / simpler → S2 (s2.dev)
├─ Notifications / webhooks
│  └─ Webhook ← HTTP POST per event
├─ Long-term archival
│  └─ S3 ← object storage, cheapest for bulk data
├─ Just testing
│  └─ Blackhole ← validates pipeline without writing
└─ Multiple of the above
   └─ Use multiple sinks in the same pipeline (fan-out pattern)

PostgreSQL Aggregate Sink

The postgres_aggregate sink is uniquely suited for real-time running aggregations (balances, counters, totals). It uses a two-table pattern: a landing table that receives raw events, and an aggregation table maintained by a database trigger.

```yaml
sinks:
  token_balances:
    type: postgres_aggregate
    from: transfers
    schema: public
    landing_table: transfer_events
    agg_table: account_balances
    primary_key: id
    secret_name: MY_POSTGRES
    group_by:
      account:
        type: text
      token_address:
        type: text
    aggregate:
      balance:
        from: amount
        fn: sum
      transfer_count:
        from: id
        fn: count
```

Supported aggregation functions: sum, count, avg, min, max

Multi-Sink Considerations

  • Each sink reads from a from: field — different sinks can read from different transforms
  • Sinks are independent — one failing doesn't block others
  • Use different batch_size / batch_flush_interval per sink based on latency needs
  • ClickHouse supports parallelism: N for concurrent writers (default 1)
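A sketch of per-sink tuning under those guidelines, reusing the fan-out sink names from earlier. The batch values and the duration syntax for batch_flush_interval are illustrative assumptions:

```yaml
sinks:
  warehouse:
    type: clickhouse
    from: fungible_balances
    parallelism: 4              # concurrent writers (default 1)
    batch_size: 10000           # large batches favor throughput over latency
    batch_flush_interval: 60s   # assumed duration syntax
  webhook:
    type: webhook
    from: all_tokens
    batch_size: 1               # flush per event for low-latency notifications
```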

Webhook Sinks Without Secrets

Webhooks can use a direct URL instead of a secret when no auth headers are needed:

```yaml
sinks:
  my_webhook:
    type: webhook
    from: my_transform
    url: https://my-lambda.us-west-2.on.aws/
```


Pipeline Splitting Decisions

One Pipeline vs. Multiple

Use one pipeline when:
  • All data comes from the same source
  • Transforms share intermediate results (e.g., a shared decode step)
  • You want atomic deployment of the whole flow

Split into multiple pipelines when:
  • Different data sources with no shared transforms
  • Different lifecycle needs (one is stable, another changes frequently)
  • Different resource requirements (one needs l, another needs s)
  • Different chains with independent processing (templated pattern)

Keeping Pipelines Focused


A pipeline should ideally do one logical thing:

| Pipeline | Focus |
|---|---|
| dex-trades | Trade events → Postgres |
| dex-activities | All activity types → ClickHouse DWH |
| token-balances | Token balances → Postgres |
| base-balance-streaming | Base balances → ClickHouse + webhook |

Even though trades are a subset of activities, they're separate pipelines because they serve different consumers (application DB vs data warehouse).


Streaming vs Job Mode

Turbo pipelines have two execution modes:

Streaming Mode (Default)

```yaml
name: my-streaming-pipeline
resource_size: s
```

job: false is the default — omit this field.

  • Runs continuously, processing data as it arrives
  • Maintains checkpoints for exactly-once processing
  • Use for real-time feeds, dashboards, APIs

Job Mode (One-Time Batch)

```yaml
name: my-backfill-job
resource_size: l
job: true
```

  • Runs to completion and stops automatically
  • Auto-deletes resources ~1 hour after completion
  • Must delete before redeploying — cannot update a job pipeline, must goldsky turbo delete first
  • Cannot use restart — use delete + apply instead
  • Use for historical backfills, one-time data migrations, snapshot exports


When to Use Which

| Scenario | Mode | Why |
|---|---|---|
| Real-time dashboard | Streaming | Continuous updates needed |
| Backfill 6 months of history | Job | One-time, stops when done |
| Real-time + catch-up on deploy | Streaming | start_at: earliest does backfill then streams |
| Export data to S3 once | Job | No need for continuous processing |
| Webhook notifications on events | Streaming | Needs to react as events happen |
| Load test with historical data | Job | Process and inspect, then discard |

Job Mode with Bounded Ranges

Combine job mode with start_at: earliest and an end_block to process a specific range:

```yaml
name: historical-export
resource_size: l
job: true

sources:
  logs:
    type: dataset
    dataset_name: ethereum.raw_logs
    version: 1.0.0
    start_at: earliest
    end_block: 19000000
    filter: >-
      address = '0xdac17f958d2ee523a2206206994597c13d831ec7'
```


Dynamic Table Architecture

Dynamic tables enable runtime-updatable lookup data within a pipeline. They're the Turbo answer to the "no joins in streaming SQL" limitation.

Pattern: Dynamic Allowlist/Blocklist

                    ┌──────────────────────┐
                    │  External Updates     │
                    │  (Postgres / REST)    │
                    └──────────┬───────────┘
source ──→ sql transform ──→ [dynamic_table_check()] ──→ sink
The SQL transform filters records against the dynamic table. The table contents can be updated externally without pipeline restart.
Pattern: Lookup Enrichment

source ──→ decode ──→ filter ──→ sql (with dynamic_table_check) ──→ sink
                              [token_metadata table]
                              (postgres-backed)
Store metadata (token symbols, decimals, protocol names) in a PostgreSQL table. Reference it in transforms for enrichment.

Backend Decisions

| Backend | backend_type | When to Use |
|---|---|---|
| PostgreSQL | Postgres | Data managed by external systems, shared across pipeline restarts |
| In-memory | InMemory | Auto-populated from pipeline data, ephemeral, fastest lookups |

Sizing Considerations

  • Dynamic tables add memory overhead proportional to table size
  • For large lookup tables (>100K rows), use the Postgres backend
  • For small, frequently-changing lists (<10K rows), InMemory is faster
  • Dynamic table queries are async — they add slight latency per record

For full dynamic table configuration syntax and examples, see /turbo-transforms.


Related

  • /turbo-builder — Build and deploy pipelines interactively using these architecture patterns
  • /turbo-doctor — Diagnose and fix pipeline issues
  • /turbo-pipelines — Pipeline YAML configuration reference
  • /turbo-transforms — SQL, TypeScript, and dynamic table transform reference
  • /datasets — Blockchain dataset and chain prefix reference
  • /secrets — Sink credential management
  • /turbo-monitor-debug — Monitoring and debugging reference
  • /turbo-lifecycle — Pipeline lifecycle command reference