turbo-architecture
# Turbo Pipeline Architecture

Help users make architecture decisions for Turbo pipelines — source types, data flow patterns, resource sizing, sink strategies, streaming vs job mode, dynamic table design, and multi-chain deployment.
## Agent Instructions
### Step 1: Understand the Requirements
Ask the user about:

- **What data?** — Which chain(s), which events/datasets, historical or real-time only?
- **Where does it go?** — Database, webhook, streaming topic, multiple destinations?
- **How much volume?** — Single contract or all chain activity? How many events/sec?
- **Latency needs?** — Real-time dashboards (sub-second) or analytics (minutes OK)?
### Step 2: Recommend an Architecture
Use the patterns and decision guides below to recommend a pipeline architecture. Reference the templates in `templates/` as starting points:

- `templates/linear-pipeline.yaml` — Simple decode → filter → sink
- `templates/fan-out-pipeline.yaml` — One source → multiple sinks
- `templates/fan-in-pipeline.yaml` — Multiple events → UNION ALL → sink
- `templates/multi-chain-templated.yaml` — Per-chain pipeline pattern
### Step 3: Hand Off to Implementation Skills
After the architecture is decided, direct the user to:

- `/turbo-pipelines` — To build and deploy the YAML
- `/turbo-transforms` — To write the SQL transforms
- `/secrets` — To set up sink credentials

Reminder: When presenting complete pipeline YAML as part of architecture recommendations, validate it first with `goldsky turbo validate`. Templates in `templates/` are structural patterns — any customized version must be validated before presenting to the user.
## Source Types
### Dataset Sources — Historical + Real-Time
Use `type: dataset` when you need to process historical blockchain data and/or continue streaming new data.

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: earliest # or: latest
```

Best for: Raw logs, transactions, transfers, blocks — anything where you need historical backfill or decoded event processing.

Available EVM datasets: `raw_logs`, `blocks`, `raw_transactions`, `raw_traces`, `erc20_transfers`, `erc721_transfers`, `erc1155_transfers`, `decoded_logs`

Non-EVM chains also supported: Solana (`solana.*`), Bitcoin (`bitcoin.raw.*`), Stellar (`stellar_mainnet.*`)
### Kafka Sources — Real-Time Streaming
Note: Kafka sources are used in production pipelines but are not documented in the official Goldsky docs. Use with caution and contact Goldsky support for topic names.

Use `type: kafka` when consuming from a Goldsky-managed Kafka topic, typically for continuously-updated state like balances.

```yaml
sources:
  my_source:
    type: kafka
    topic: base.raw.latest_balances_v2
```

Best for: Balance snapshots, latest state data, high-volume continuous streams.

Key differences from dataset sources:

- No `start_at`, `fields`, or `version`
- Optional fields: `filter`, `include_metadata`, `starting_offsets`
- Delivers the latest state rather than historical event logs
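The optional fields can be combined on a single Kafka source. A hedged sketch — the option names come from the list above, but the value formats (`starting_offsets` values and the filter expression syntax) are assumptions, not confirmed by Goldsky docs:

```yaml
sources:
  balances:
    type: kafka
    topic: base.raw.latest_balances_v2
    starting_offsets: earliest        # value format is an assumption
    include_metadata: true
    filter: token_type = 'ERC_20'     # assumed to match dataset filter syntax
```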
### When to Use Which
| Scenario | Source Type | Why |
|---|---|---|
| Decode contract events from logs | Dataset | Need `raw_logs` history to decode |
| Track token transfers | Dataset | `erc20_transfers` is pre-decoded |
| Historical backfill + live | Dataset | `start_at: earliest` covers both |
| Live token balances | Kafka | Latest-state balance topic |
| Real-time state snapshots | Kafka | Kafka delivers latest state continuously |
| Only need new data going forward | Either | Dataset with `start_at: latest` works too |
## Data Flow Patterns
### Linear Pipeline
The simplest pattern. One source → one or more chained transforms → one sink.

```
source → transform_a → transform_b → sink
```

Use when: You have a single data source, single destination, and straightforward processing (decode, filter, reshape).

Example: `templates/linear-pipeline.yaml` — raw logs → decode → extract trade events → postgres

Resource size: `s` or `m`
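The linear pattern can be sketched end to end. This is an illustrative skeleton, not the template itself; the pipeline name, transform SQL, and secret name are placeholders:

```yaml
name: trades-to-postgres          # hypothetical pipeline name
resource_size: s
sources:
  logs:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0                # confirm the current dataset version
    start_at: earliest
transforms:
  decoded:
    type: sql
    primary_key: id
    sql: |
      SELECT ...                  -- decode step elided; see /turbo-transforms
  trades:
    type: sql
    primary_key: id
    sql: |
      SELECT ... FROM decoded     -- extract trade events
sinks:
  db:
    type: postgres
    from: trades
    secret_name: MY_POSTGRES      # placeholder secret; see /secrets
```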
### Fan-Out (One Source → Multiple Sinks)
One source feeds multiple transforms, each writing to a different sink.

```
             ┌→ transform_a → sink_1 (clickhouse)
source ──────┤
             └→ transform_b → sink_2 (webhook)
```

Use when: You need different views or subsets of the same data going to different destinations — e.g., balances to a warehouse AND token metadata to a webhook.

Example: `templates/fan-out-pipeline.yaml` — one Kafka source → fungible balances to ClickHouse + all tokens to a webhook

```yaml
transforms:
  fungible_balances:
    type: sql
    primary_key: id
    sql: |
      SELECT ... FROM latest_balances balance
      WHERE balance.token_type = 'ERC_20' OR balance.token_type IS NULL
  all_tokens:
    type: sql
    primary_key: id
    sql: |
      SELECT ... FROM latest_balances balance
      WHERE balance.token_type IN ('ERC_20', 'ERC_721', 'ERC_1155')
sinks:
  warehouse:
    type: clickhouse
    from: fungible_balances
    # ...
  webhook:
    type: webhook
    from: all_tokens
    # ...
```

Resource size: `m` (multiple output paths)
### Fan-In (Multiple Events → One Output)
Multiple event types decoded from the same source, normalized to a common schema, then combined with UNION ALL into a single sink.

```
                  ┌→ event_type_a ──┐
source → decode ──┤                 ├→ UNION ALL → sink
                  └→ event_type_b ──┘
```

Use when: You want a unified activity feed, combining trades, deposits, withdrawals, transfers, etc. into one table.

Example: `templates/fan-in-pipeline.yaml` — one raw_logs source → decode → multiple event-type transforms → UNION ALL → ClickHouse

Resource size: `l` (complex processing with many transforms)
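The fan-in step itself is just a final SQL transform that UNION ALLs the normalized event transforms. An illustrative sketch — transform names and columns are hypothetical, and each SELECT must emit the same column set for UNION ALL to work:

```yaml
transforms:
  deposits:
    type: sql
    primary_key: id
    sql: |
      SELECT id, block_number, 'deposit' AS activity_type, ...
      FROM decoded_deposits
  withdrawals:
    type: sql
    primary_key: id
    sql: |
      SELECT id, block_number, 'withdrawal' AS activity_type, ...
      FROM decoded_withdrawals
  activity:                       # unified feed written to the sink
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM deposits
      UNION ALL
      SELECT * FROM withdrawals
```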
### Multi-Chain Fan-In
Multiple sources from different chains combined into a single output.

```
source_chain_a ──┐
source_chain_b ──┼→ UNION ALL → sink
source_chain_c ──┘
```

Use when: You want cross-chain analytics or a unified view across chains.

```yaml
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest
transforms:
  combined:
    type: sql
    primary_key: id
    sql: |
      SELECT *, 'ethereum' AS chain FROM eth_transfers
      UNION ALL
      SELECT *, 'base' AS chain FROM base_transfers
```

Resource size: `m` or `l` depending on chain count
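A sink attaches to the combined transform like any other. A sketch assuming a ClickHouse destination — the `table` option name is an assumption for illustration; check `/turbo-pipelines` for the exact sink fields:

```yaml
sinks:
  cross_chain:
    type: clickhouse
    from: combined                    # reads the UNION ALL transform above
    secret_name: MY_CLICKHOUSE        # placeholder secret; see /secrets
    table: erc20_transfers_all_chains # hypothetical option/table name
```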
### Templated Multi-Chain Deployment
When you need the same pipeline logic across multiple chains, create separate pipeline files per chain rather than one multi-source pipeline. This gives you:

- Independent lifecycle (deploy/delete per chain)
- Independent checkpointing (one chain failing doesn't block others)
- Clearer monitoring per chain

Pattern: Copy the pipeline YAML and swap the chain-specific values: the pipeline `name`, the chain prefix in `dataset_name` (e.g. `base.` vs `arbitrum.`), the source key, chain references in the transform SQL, and the sink table name.

Example: `templates/multi-chain-templated.yaml` — shows the base chain version; duplicate for each chain.

When to use templated vs multi-source:

| Approach | Pros | Cons |
|---|---|---|
| Templated (per-chain) | Independent lifecycle, clear monitoring | More files to manage |
| Multi-source (one file) | Single deployment, cross-chain UNION possible | Coupled lifecycle, harder to debug |
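For example, the arbitrum copy would differ from the base version only in the chain-specific values. A hypothetical sketch (names, versions, and the sink are placeholders, not the actual template contents):

```yaml
name: arbitrum-transfers            # per-chain pipeline name (assumed scheme)
resource_size: s
sources:
  arbitrum_transfers:               # source key renamed per chain
    type: dataset
    dataset_name: arbitrum.erc20_transfers
    version: 1.0.0                  # check the available version per chain
    start_at: earliest
transforms:
  tagged:
    type: sql
    primary_key: id
    sql: |
      SELECT *, 'arbitrum' AS chain FROM arbitrum_transfers
sinks:
  db:
    type: postgres                  # sink type assumed for illustration
    from: tagged
    secret_name: MY_POSTGRES
```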
## Resource Sizing
Each size doubles the previous tier's CPU and memory:

| Size | Workers | CPU Request | Memory | When to Use |
|---|---|---|---|---|
| `xs` | — | 0.4 | 0.5 Gi | Small datasets, light testing |
| `s` | 1 | 0.8 | 1.0 Gi | Testing, simple filters, single source/sink, low volume (default) |
| `m` | 4 | 1.6 | 2.0 Gi | Multiple sinks, Kafka streaming, moderate transform complexity |
| `l` | 10 | 3.2 | 4.0 Gi | Multi-event decoding with UNION ALL, high-volume historical backfill |
| `xl` | 20 | 6.4 | 8.0 Gi | Large chain backfills, complex JOINs (e.g. Solana accounts+transactions) |
| `xxl` | 40 | 12.8 | 16.0 Gi | Highest throughput needs; up to 6.3M rows/min |

Rules of thumb from production pipelines:

- Simple filter + single sink → `s` (default, try this first)
- Kafka source + multiple sinks OR multiple transforms → `m`
- Raw log decoding + 5+ event types + UNION ALL → `l`
- Historical backfill of high-volume data → `l` or `xl` (can downsize after catch-up)
- Start small and scale up — defensive sizing avoids wasted resources
## Sink Selection

### Quick Reference
| Destination | Sink Type | Best For |
|---|---|---|
| Application DB | `postgres` | Row-level lookups, joins, application serving |
| Real-time aggregates | `postgres_aggregate` | Balances, counters, running totals via triggers |
| Analytics queries | `clickhouse` | Large-scale aggregations, time-series data |
| Event processing | `kafka` | Downstream consumers, event-driven systems |
| Serverless streaming | `s2` | S2.dev streams, alternative to Kafka |
| Notifications | `webhook` | Lambda functions, API callbacks, alerts |
| Data lake | `s3` | Long-term archival, batch processing |
| Testing | `blackhole` | Validate pipeline without writing data |
### Decision Flowchart
```
What's your primary use case?
│
├─ Application serving (REST/GraphQL API)
│   └─ PostgreSQL ← row-level lookups, joins, strong consistency
│
├─ Analytics / dashboards
│   ├─ Time-series queries → ClickHouse ← columnar, fast aggregations
│   └─ Full-text search → Elasticsearch / OpenSearch
│
├─ Real-time aggregations (balances, counters)
│   └─ PostgreSQL Aggregate ← trigger-based running totals
│
├─ Event-driven downstream processing
│   ├─ Need Kafka ecosystem → Kafka
│   └─ Serverless / simpler → S2 (s2.dev)
│
├─ Notifications / webhooks
│   └─ Webhook ← HTTP POST per event
│
├─ Long-term archival
│   └─ S3 ← object storage, cheapest for bulk data
│
├─ Just testing
│   └─ Blackhole ← validates pipeline without writing
│
└─ Multiple of the above
    └─ Use multiple sinks in the same pipeline (fan-out pattern)
```
### PostgreSQL Aggregate Sink
The `postgres_aggregate` sink is uniquely suited for real-time running aggregations (balances, counters, totals). It uses a two-table pattern: a landing table that receives raw events, and an aggregation table maintained by a database trigger.

```yaml
sinks:
  token_balances:
    type: postgres_aggregate
    from: transfers
    schema: public
    landing_table: transfer_events
    agg_table: account_balances
    primary_key: id
    secret_name: MY_POSTGRES
    group_by:
      account:
        type: text
      token_address:
        type: text
    aggregate:
      balance:
        from: amount
        fn: sum
      transfer_count:
        from: id
        fn: count
```

Supported aggregation functions: `sum`, `count`, `avg`, `min`, `max`
### Multi-Sink Considerations
- Each sink reads from a `from:` field — different sinks can read from different transforms
- Sinks are independent — one failing doesn't block others
- Use different `batch_size` / `batch_flush_interval` per sink based on latency needs
- ClickHouse supports `parallelism: N` for concurrent writers (default `1`)
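Per-sink tuning from the bullets above might look like the following sketch. The option names (`batch_size`, `batch_flush_interval`, `parallelism`) come from this document, but the value formats are assumptions:

```yaml
sinks:
  alerts:
    type: webhook
    from: big_transfers
    url: https://example.com/hook   # placeholder endpoint
    batch_size: 1                   # flush per event for lowest latency
  warehouse:
    type: clickhouse
    from: all_transfers
    secret_name: MY_CLICKHOUSE
    batch_size: 10000               # large batches for throughput
    batch_flush_interval: 30s       # interval string format is an assumption
    parallelism: 4                  # concurrent ClickHouse writers (default 1)
```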
### Webhook Sinks Without Secrets
Webhooks can use a direct URL instead of a secret when no auth headers are needed:

```yaml
sinks:
  my_webhook:
    type: webhook
    from: my_transform
    url: https://my-lambda.us-west-2.on.aws/
```
## Pipeline Splitting Decisions

### One Pipeline vs. Multiple
Use one pipeline when:

- All data comes from the same source
- Transforms share intermediate results (e.g., a shared decode step)
- You want atomic deployment of the whole flow

Split into multiple pipelines when:

- Different data sources with no shared transforms
- Different lifecycle needs (one is stable, another changes frequently)
- Different resource requirements (one needs `l`, another needs `s`)
- Different chains with independent processing (templated pattern)
### Keeping Pipelines Focused
A pipeline should ideally do one logical thing. Some production examples:

- Trade events → Postgres
- All activity types → ClickHouse DWH
- Token balances → Postgres
- Base balances → ClickHouse + webhook

Even though trades are a subset of activities, they're separate pipelines because they serve different consumers (application DB vs data warehouse).
## Streaming vs Job Mode

Turbo pipelines have two execution modes:
### Streaming Mode (Default)
```yaml
name: my-streaming-pipeline
resource_size: s
```

`job: false` (default — omit this field)

- Runs continuously, processing data as it arrives
- Maintains checkpoints for exactly-once processing
- Use for real-time feeds, dashboards, APIs

### Job Mode (One-Time Batch)
```yaml
name: my-backfill-job
resource_size: l
job: true
```

- Runs to completion and stops automatically
- Auto-deletes resources ~1 hour after completion
- Cannot be updated in place — must run `goldsky turbo delete` first, then redeploy
- Cannot use `restart` — use delete + apply instead
- Use for historical backfills, one-time data migrations, snapshot exports

### When to Use Which
| Scenario | Mode | Why |
|---|---|---|
| Real-time dashboard | Streaming | Continuous updates needed |
| Backfill 6 months of history | Job | One-time, stops when done |
| Real-time + catch-up on deploy | Streaming | `start_at: earliest` backfills, then follows live |
| Export data to S3 once | Job | No need for continuous processing |
| Webhook notifications on events | Streaming | Needs to react as events happen |
| Load test with historical data | Job | Process and inspect, then discard |
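The "real-time + catch-up on deploy" case pairs streaming mode with an earliest start. A sketch with placeholder names:

```yaml
# streaming mode (no job: true): backfills from the earliest block,
# then keeps following new data
name: catchup-then-follow
resource_size: m
sources:
  transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: earliest
```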
### Job Mode with Bounded Ranges
Combine job mode with `start_at: earliest` and an `end_block` to process a specific range:

```yaml
name: historical-export
resource_size: l
job: true
sources:
  logs:
    type: dataset
    dataset_name: ethereum.raw_logs
    version: 1.0.0
    start_at: earliest
    end_block: 19000000
    filter: >-
      address = '0xdac17f958d2ee523a2206206994597c13d831ec7'
```
## Dynamic Table Architecture

Dynamic tables enable runtime-updatable lookup data within a pipeline. They're the Turbo answer to the "no joins in streaming SQL" limitation.
### Pattern: Dynamic Allowlist/Blocklist
```
┌──────────────────────┐
│   External Updates   │
│  (Postgres / REST)   │
└──────────┬───────────┘
           ▼
source ──→ sql transform ──→ [dynamic_table_check()] ──→ sink
```

The SQL transform filters records against the dynamic table. The table contents can be updated externally without pipeline restart.
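A filtering transform for this pattern might look like the following sketch. The `dynamic_table_check` call is shown with an assumed signature (table name, checked value); see `/turbo-transforms` for the real configuration syntax:

```yaml
transforms:
  allowed_transfers:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM transfers t
      -- signature assumed for illustration; confirm in /turbo-transforms
      WHERE dynamic_table_check('token_allowlist', t.token_address)
```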
### Pattern: Lookup Enrichment
```
source ──→ decode ──→ filter ──→ sql (with dynamic_table_check) ──→ sink
                                  ▲
                       [token_metadata table]
                        (postgres-backed)
```

Store metadata (token symbols, decimals, protocol names) in a PostgreSQL table. Reference it in transforms for enrichment.
### Backend Decisions
| Backend | When to Use |
|---|---|
| PostgreSQL (`Postgres`) | Data managed by external systems, shared across pipeline restarts |
| In-memory (`InMemory`) | Auto-populated from pipeline data, ephemeral, fastest lookups |
### Sizing Considerations
- Dynamic tables add memory overhead proportional to table size
- For large lookup tables (>100K rows), use the `Postgres` backend
- For small, frequently-changing lists (<10K rows), `InMemory` is faster
- Dynamic table queries are async — they add slight latency per record

For full dynamic table configuration syntax and examples, see `/turbo-transforms`.
## Related
- `/turbo-builder` — Build and deploy pipelines interactively using these architecture patterns
- `/turbo-doctor` — Diagnose and fix pipeline issues
- `/turbo-pipelines` — Pipeline YAML configuration reference
- `/turbo-transforms` — SQL, TypeScript, and dynamic table transform reference
- `/datasets` — Blockchain dataset and chain prefix reference
- `/secrets` — Sink credential management
- `/turbo-monitor-debug` — Monitoring and debugging reference
- `/turbo-lifecycle` — Pipeline lifecycle command reference