turbo-architecture


Turbo Pipeline Architecture

Help users make architecture decisions for Turbo pipelines — source types, data flow patterns, resource sizing, sink strategies, streaming vs job mode, dynamic table design, and multi-chain deployment.

Agent Instructions

Step 1: Understand the Requirements

Ask the user about:
  1. What data? — Which chain(s), which events/datasets, historical or real-time only?
  2. Where does it go? — Database, webhook, streaming topic, multiple destinations?
  3. How much volume? — Single contract or all chain activity? How many events/sec?
  4. Latency needs? — Real-time dashboards (sub-second) or analytics (minutes OK)?

Step 2: Recommend an Architecture

Use the patterns and decision guides below to recommend a pipeline architecture. Reference the templates in templates/ as starting points:
  • templates/linear-pipeline.yaml — Simple decode → filter → sink
  • templates/fan-out-pipeline.yaml — One source → multiple sinks
  • templates/fan-in-pipeline.yaml — Multiple events → UNION ALL → sink
  • templates/multi-chain-templated.yaml — Per-chain pipeline pattern

Step 3: Hand Off to Implementation Skills

After the architecture is decided, direct the user to:
  • /turbo-pipelines — To build and deploy the YAML
  • /turbo-transforms — To write the SQL transforms
  • /secrets — To set up sink credentials
Reminder: When presenting complete pipeline YAML as part of architecture recommendations, validate it first with goldsky turbo validate. Templates in templates/ are structural patterns — any customized version must be validated before presenting to the user.


Source Types

Dataset Sources — Historical + Real-Time

Use type: dataset when you need to process historical blockchain data and/or continue streaming new data.

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: earliest  # or: latest
```

Best for: Raw logs, transactions, transfers, blocks — anything where you need historical backfill or decoded event processing.
Available EVM datasets: raw_logs, blocks, raw_transactions, raw_traces, erc20_transfers, erc721_transfers, erc1155_transfers, decoded_logs
Non-EVM chains also supported: Solana (solana.*), Bitcoin (bitcoin.raw.*), Stellar (stellar_mainnet.*)
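Dataset sources also accept a filter field (used in the bounded-range job example later in this document) to scope a stream to a single contract. A sketch, reusing the USDT contract address from that example; the source key is illustrative:

```yaml
sources:
  usdt_logs:
    type: dataset
    dataset_name: ethereum.raw_logs
    version: 1.0.0
    start_at: earliest
    # Limit the stream to one contract's logs (USDT on Ethereum)
    filter: >-
      address = '0xdac17f958d2ee523a2206206994597c13d831ec7'
```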

Kafka Sources — Real-Time Streaming

Note: Kafka sources are used in production pipelines but are not documented in the official Goldsky docs. Use with caution and contact Goldsky support for topic names.
Use type: kafka when consuming from a Goldsky-managed Kafka topic, typically for continuously-updated state like balances.

```yaml
sources:
  my_source:
    type: kafka
    topic: base.raw.latest_balances_v2
```

Best for: Balance snapshots, latest state data, high-volume continuous streams.
Key differences from dataset sources:
  • No start_at or version fields
  • Optional fields: filter, include_metadata, starting_offsets
  • Delivers the latest state rather than historical event logs
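A Kafka source using the optional fields might look like the sketch below. The field names come from the list above, but the values and value syntax are illustrative assumptions, not documented behavior:

```yaml
sources:
  my_source:
    type: kafka
    topic: base.raw.latest_balances_v2
    # Optional fields (values are illustrative assumptions):
    filter: token_type = 'ERC_20'   # drop non-matching records at the source
    include_metadata: true          # attach topic/partition/offset metadata
    starting_offsets: latest        # where in the topic to begin consuming
```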

When to Use Which

| Scenario | Source Type | Why |
|---|---|---|
| Decode contract events from logs | dataset | Need raw_logs + _gs_log_decode() |
| Track token transfers | dataset | erc20_transfers has structured data |
| Historical backfill + live | dataset | start_at: earliest processes history |
| Live token balances | kafka | latest_balances_v2 is a streaming topic |
| Real-time state snapshots | kafka | Kafka delivers latest state continuously |
| Only need new data going forward | Either | Dataset with start_at: latest, or Kafka |


Data Flow Patterns

Linear Pipeline

The simplest pattern. One source → one or more chained transforms → one sink.

source → transform_a → transform_b → sink

Use when: You have a single data source, single destination, and straightforward processing (decode, filter, reshape).
Example: templates/linear-pipeline.yaml — raw logs → decode → extract trade events → postgres
Resource size: s or m
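A minimal linear pipeline skeleton following that shape. The field names all appear elsewhere in this document; the transform name, sink name, dataset label, and secret name are illustrative assumptions, and the SQL is a placeholder for the template's decode/filter logic:

```yaml
name: my-linear-pipeline
resource_size: s

sources:
  logs:
    type: dataset
    dataset_name: base.raw_logs   # illustrative dataset label
    version: 1.0.0
    start_at: earliest

transforms:
  decoded:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM logs  -- decode/filter/reshape logic goes here

sinks:
  db:
    type: postgres
    from: decoded
    secret_name: MY_POSTGRES      # illustrative secret name
```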

Fan-Out (One Source → Multiple Sinks)

One source feeds multiple transforms, each writing to a different sink.

             ┌→ transform_a → sink_1 (clickhouse)
source ──────┤
             └→ transform_b → sink_2 (webhook)

Use when: You need different views or subsets of the same data going to different destinations — e.g., balances to a warehouse AND token metadata to a webhook.
Example: templates/fan-out-pipeline.yaml — one Kafka source → fungible balances to ClickHouse + all tokens to a webhook

```yaml
transforms:
  fungible_balances:
    type: sql
    primary_key: id
    sql: |
      SELECT ... FROM latest_balances balance
      WHERE balance.token_type = 'ERC_20' OR balance.token_type IS NULL

  all_tokens:
    type: sql
    primary_key: id
    sql: |
      SELECT ... FROM latest_balances balance
      WHERE balance.token_type IN ('ERC_20', 'ERC_721', 'ERC_1155')

sinks:
  warehouse:
    type: clickhouse
    from: fungible_balances
    # ...
  webhook:
    type: webhook
    from: all_tokens
    # ...
```

Resource size: m (multiple output paths)

Fan-In (Multiple Events → One Output)

Multiple event types decoded from the same source, normalized to a common schema, then combined with UNION ALL into a single sink.

                ┌→ event_type_a ──┐
source → decode ┤                 ├→ UNION ALL → sink
                └→ event_type_b ──┘

Use when: You want a unified activity feed, combining trades, deposits, withdrawals, transfers, etc. into one table.
Example: templates/fan-in-pipeline.yaml — one raw_logs source → decode → multiple event-type transforms → UNION ALL → ClickHouse
Resource size: l (complex processing with many transforms)
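A sketch of the fan-in transform stage, assuming two event-type transforms normalized to the same columns before the UNION ALL (transform names, the upstream source key, and the column list are illustrative):

```yaml
transforms:
  trades:
    type: sql
    primary_key: id
    sql: |
      SELECT id, 'trade' AS activity_type, ... FROM decoded_logs_src
  deposits:
    type: sql
    primary_key: id
    sql: |
      SELECT id, 'deposit' AS activity_type, ... FROM decoded_logs_src
  combined:
    type: sql
    primary_key: id
    sql: |
      -- Both inputs share a schema, so UNION ALL is safe
      SELECT * FROM trades
      UNION ALL
      SELECT * FROM deposits
```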

Multi-Chain Fan-In

Multiple sources from different chains combined into a single output.

source_chain_a ──┐
source_chain_b ──┼→ UNION ALL → sink
source_chain_c ──┘

Use when: You want cross-chain analytics or a unified view across chains.

```yaml
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms:
  combined:
    type: sql
    primary_key: id
    sql: |
      SELECT *, 'ethereum' AS chain FROM eth_transfers
      UNION ALL
      SELECT *, 'base' AS chain FROM base_transfers
```

Resource size: m or l depending on chain count


Templated Multi-Chain Deployment

When you need the same pipeline logic across multiple chains, create separate pipeline files per chain rather than one multi-source pipeline. This gives you:
  • Independent lifecycle (deploy/delete per chain)
  • Independent checkpointing (one chain failing doesn't block others)
  • Clearer monitoring per chain

Pattern: Copy the pipeline YAML and swap the chain-specific values:

| Field | Chain A (base) | Chain B (arbitrum) |
|---|---|---|
| name | base-balance-streaming | arbitrum-balance-streaming |
| topic | base.raw.latest_balances_v2 | arbitrum.raw.latest_balances_v2 |
| Source key | base_latest_balances_v2 | arbitrum_latest_balances_v2 |
| Transform SQL | 'base' AS chain | 'arbitrum' AS chain |
| Sink table | base_token_balances | arbitrum_token_balances |

Example: templates/multi-chain-templated.yaml — shows the base chain version; duplicate for each chain.

When to use templated vs multi-source:

| Approach | Pros | Cons |
|---|---|---|
| Templated (per-chain) | Independent lifecycle, clear monitoring | More files to manage |
| Multi-source (one file) | Single deployment, cross-chain UNION possible | Coupled lifecycle, harder to debug |
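Applying the swap table above, the arbitrum copy of the templated pipeline might start like the sketch below. The chain-specific values are taken from the table; the transform name, sink name, and resource size are illustrative:

```yaml
name: arbitrum-balance-streaming
resource_size: m

sources:
  arbitrum_latest_balances_v2:
    type: kafka
    topic: arbitrum.raw.latest_balances_v2

transforms:
  balances:
    type: sql
    primary_key: id
    sql: |
      SELECT *, 'arbitrum' AS chain FROM arbitrum_latest_balances_v2

sinks:
  warehouse:
    type: clickhouse
    from: balances
    # writes to arbitrum_token_balances; credentials via secret_name
```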


Resource Sizing

Each size doubles the previous tier's CPU and memory:

| Size | Workers | CPU Request | Memory | When to Use |
|---|---|---|---|---|
| xs | — | 0.4 | 0.5 Gi | Small datasets, light testing |
| s | 1 | 0.8 | 1.0 Gi | Testing, simple filters, single source/sink, low volume (default) |
| m | 4 | 1.6 | 2.0 Gi | Multiple sinks, Kafka streaming, moderate transform complexity |
| l | 10 | 3.2 | 4.0 Gi | Multi-event decoding with UNION ALL, high-volume historical backfill |
| xl | 20 | 6.4 | 8.0 Gi | Large chain backfills, complex JOINs (e.g. Solana accounts + transactions) |
| xxl | 40 | 12.8 | 16.0 Gi | Highest throughput needs; up to 6.3M rows/min |

Rules of thumb from production pipelines:
  • Simple filter + single sink → s (default, try this first)
  • Kafka source + multiple sinks OR multiple transforms → m
  • Raw log decoding + 5+ event types + UNION ALL → l
  • Historical backfill of high-volume data → l or xl (can downsize after catch-up)
  • Start small and scale up — defensive sizing avoids wasted resources


Sink Selection

Quick Reference

| Destination | Sink Type | Best For |
|---|---|---|
| Application DB | postgres | Row-level lookups, joins, application serving |
| Real-time aggregates | postgres_aggregate | Balances, counters, running totals via triggers |
| Analytics queries | clickhouse | Large-scale aggregations, time-series data |
| Event processing | kafka | Downstream consumers, event-driven systems |
| Serverless streaming | s2_sink | S2.dev streams, alternative to Kafka |
| Notifications | webhook | Lambda functions, API callbacks, alerts |
| Data lake | s3_sink | Long-term archival, batch processing |
| Testing | blackhole | Validate pipeline without writing data |

Decision Flowchart


What's your primary use case?
├─ Application serving (REST/GraphQL API)
│  └─ PostgreSQL ← row-level lookups, joins, strong consistency
├─ Analytics / dashboards
│  ├─ Time-series queries → ClickHouse ← columnar, fast aggregations
│  └─ Full-text search → Elasticsearch / OpenSearch
├─ Real-time aggregations (balances, counters)
│  └─ PostgreSQL Aggregate ← trigger-based running totals
├─ Event-driven downstream processing
│  ├─ Need Kafka ecosystem → Kafka
│  └─ Serverless / simpler → S2 (s2.dev)
├─ Notifications / webhooks
│  └─ Webhook ← HTTP POST per event
├─ Long-term archival
│  └─ S3 ← object storage, cheapest for bulk data
├─ Just testing
│  └─ Blackhole ← validates pipeline without writing
└─ Multiple of the above
   └─ Use multiple sinks in the same pipeline (fan-out pattern)

PostgreSQL Aggregate Sink

The postgres_aggregate sink is uniquely suited for real-time running aggregations (balances, counters, totals). It uses a two-table pattern: a landing table that receives raw events, and an aggregation table maintained by a database trigger.

```yaml
sinks:
  token_balances:
    type: postgres_aggregate
    from: transfers
    schema: public
    landing_table: transfer_events
    agg_table: account_balances
    primary_key: id
    secret_name: MY_POSTGRES
    group_by:
      account:
        type: text
      token_address:
        type: text
    aggregate:
      balance:
        from: amount
        fn: sum
      transfer_count:
        from: id
        fn: count
```

Supported aggregation functions: sum, count, avg, min, max

Multi-Sink Considerations

  • Each sink reads from a from: field — different sinks can read from different transforms
  • Sinks are independent — one failing doesn't block others
  • Use different batch_size / batch_flush_interval per sink based on latency needs
  • ClickHouse supports parallelism: N for concurrent writers (default 1)
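A sketch of per-sink tuning under those guidelines, reusing the fan-out sink names from earlier. The batch values and the duration syntax for batch_flush_interval are illustrative assumptions:

```yaml
sinks:
  warehouse:
    type: clickhouse
    from: fungible_balances
    parallelism: 4              # concurrent writers (default 1)
    batch_size: 10000           # large batches favor throughput over latency
    batch_flush_interval: 60s   # assumed duration syntax
  webhook:
    type: webhook
    from: all_tokens
    batch_size: 1               # flush per event for low-latency notifications
```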

Webhook Sinks Without Secrets

Webhooks can use a direct URL instead of a secret when no auth headers are needed:

```yaml
sinks:
  my_webhook:
    type: webhook
    from: my_transform
    url: https://my-lambda.us-west-2.on.aws/
```


Pipeline Splitting Decisions

One Pipeline vs. Multiple

Use one pipeline when:
  • All data comes from the same source
  • Transforms share intermediate results (e.g., a shared decode step)
  • You want atomic deployment of the whole flow

Split into multiple pipelines when:
  • Different data sources with no shared transforms
  • Different lifecycle needs (one is stable, another changes frequently)
  • Different resource requirements (one needs l, another needs s)
  • Different chains with independent processing (templated pattern)

Keeping Pipelines Focused


A pipeline should ideally do one logical thing:

| Pipeline | Focus |
|---|---|
| dex-trades | Trade events → Postgres |
| dex-activities | All activity types → ClickHouse DWH |
| token-balances | Token balances → Postgres |
| base-balance-streaming | Base balances → ClickHouse + webhook |

Even though trades are a subset of activities, they're separate pipelines because they serve different consumers (application DB vs data warehouse).


Streaming vs Job Mode

Turbo pipelines have two execution modes:

Streaming Mode (Default)

```yaml
name: my-streaming-pipeline
resource_size: s
```

job: false is the default — omit this field.

  • Runs continuously, processing data as it arrives
  • Maintains checkpoints for exactly-once processing
  • Use for real-time feeds, dashboards, APIs

Job Mode (One-Time Batch)

```yaml
name: my-backfill-job
resource_size: l
job: true
```

  • Runs to completion and stops automatically
  • Auto-deletes resources ~1 hour after completion
  • Must delete before redeploying — cannot update a job pipeline, must goldsky turbo delete first
  • Cannot use restart — use delete + apply instead
  • Use for historical backfills, one-time data migrations, snapshot exports


When to Use Which

| Scenario | Mode | Why |
|---|---|---|
| Real-time dashboard | Streaming | Continuous updates needed |
| Backfill 6 months of history | Job | One-time, stops when done |
| Real-time + catch-up on deploy | Streaming | start_at: earliest does backfill then streams |
| Export data to S3 once | Job | No need for continuous processing |
| Webhook notifications on events | Streaming | Needs to react as events happen |
| Load test with historical data | Job | Process and inspect, then discard |

Job Mode with Bounded Ranges

Combine job mode with start_at: earliest and an end_block to process a specific range:

```yaml
name: historical-export
resource_size: l
job: true

sources:
  logs:
    type: dataset
    dataset_name: ethereum.raw_logs
    version: 1.0.0
    start_at: earliest
    end_block: 19000000
    filter: >-
      address = '0xdac17f958d2ee523a2206206994597c13d831ec7'
```


Dynamic Table Architecture

Dynamic tables enable runtime-updatable lookup data within a pipeline. They're the Turbo answer to the "no joins in streaming SQL" limitation.

Pattern: Dynamic Allowlist/Blocklist

                    ┌──────────────────────┐
                    │  External Updates     │
                    │  (Postgres / REST)    │
                    └──────────┬───────────┘
source ──→ sql transform ──→ [dynamic_table_check()] ──→ sink
The SQL transform filters records against the dynamic table. The table contents can be updated externally without pipeline restart.
Pattern: Lookup Enrichment

source ──→ decode ──→ filter ──→ sql (with dynamic_table_check) ──→ sink
                              [token_metadata table]
                              (postgres-backed)
Store metadata (token symbols, decimals, protocol names) in a PostgreSQL table. Reference it in transforms for enrichment.

Backend Decisions

| Backend | backend_type | When to Use |
|---|---|---|
| PostgreSQL | Postgres | Data managed by external systems, shared across pipeline restarts |
| In-memory | InMemory | Auto-populated from pipeline data, ephemeral, fastest lookups |

Sizing Considerations

  • Dynamic tables add memory overhead proportional to table size
  • For large lookup tables (>100K rows), use the Postgres backend
  • For small, frequently-changing lists (<10K rows), InMemory is faster
  • Dynamic table queries are async — they add slight latency per record

For full dynamic table configuration syntax and examples, see /turbo-transforms.


Related

  • /turbo-builder — Build and deploy pipelines interactively using these architecture patterns
  • /turbo-doctor — Diagnose and fix pipeline issues
  • /turbo-pipelines — Pipeline YAML configuration reference
  • /turbo-transforms — SQL, TypeScript, and dynamic table transform reference
  • /datasets — Blockchain dataset and chain prefix reference
  • /secrets — Sink credential management
  • /turbo-monitor-debug — Monitoring and debugging reference
  • /turbo-lifecycle — Pipeline lifecycle command reference