turbo-builder

Pipeline Builder

Boundaries

  • Build NEW pipelines. Do not diagnose broken pipelines; that belongs to /turbo-doctor.
  • Do not serve as a YAML reference. If the user only needs to look up a field or syntax, use the /turbo-pipelines skill instead.
  • For dataset lookups, use /datasets.

Walk the user through building a complete pipeline from scratch, step by step. Generate a valid YAML configuration, validate it, and deploy it.

Builder Workflow

Step 1: Verify Authentication

Run `goldsky project list 2>&1` to check login status.
  • If logged in: Note the current project and continue.
  • If not logged in: Use the /auth-setup skill for guidance.

Step 2: Understand the Goal

Ask the user what they want to index. Good questions:
  • What blockchain/chain? (Ethereum, Base, Polygon, Solana, etc.)
  • What data? (transfers, swaps, events from a specific contract, all transactions, etc.)
  • Where should the data go? (PostgreSQL, ClickHouse, Kafka, S3, etc.)
  • Do they need transforms? (filtering, aggregation, enrichment)
  • One-time backfill or continuous streaming?
If the user already described their goal, extract answers from their description.

Step 3: Choose the Dataset

Use the /datasets skill to find the right dataset.
Key points:
  • Common datasets: `<chain>.decoded_logs`, `<chain>.raw_transactions`, `<chain>.erc20_transfers`, `<chain>.raw_traces`
  • For decoded contract events, use `<chain>.decoded_logs` with a filter on `address` and `topic0`
  • For Solana: use `solana.transactions`, `solana.token_transfers`, etc.
Present the dataset choice to the user for confirmation.

Step 4: Configure the Source

Build the source section of the YAML:

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset>
    version: 1.0.0
    start_at: earliest  # or a specific block number
```

Ask about:
  • Start block: `earliest` (from genesis), `latest` (from now), or a specific block number
  • End block: Only for job-mode/backfill pipelines. Omit for streaming.
  • Source-level filter: Optional filter to reduce data at the source (e.g., a specific contract address)
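As a sketch, a source scoped to a single contract might look like the fragment below. The `filter` key and its expression syntax are assumptions, not confirmed fields; verify the real key and expression form with the /turbo-pipelines skill before using:

```yaml
sources:
  my_source:
    type: dataset
    dataset_name: base.decoded_logs
    version: 1.0.0
    start_at: earliest
    # Assumed source-level filter syntax; confirm the actual key and form.
    filter: address = '0x1234...'   # hypothetical contract address
```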

Step 5: Configure Transforms (if needed)

If the user needs transforms, use the /turbo-transforms skill to help:
  • SQL transforms: filter, aggregate, join, or reshape data using DataFusion SQL
  • TypeScript transforms: custom logic, external API calls, complex processing
  • Dynamic tables: join with a PostgreSQL table or in-memory allowlist
Build the transforms section:

```yaml
transforms:
  my_transform:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM my_source
      WHERE <conditions>
```
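For instance, a SQL transform that keeps only large transfers might be sketched as follows. The column names (`amount`, `from_address`, `to_address`) are assumptions about the dataset schema; check the actual columns with the /datasets skill:

```yaml
transforms:
  large_transfers:
    type: sql
    primary_key: id
    sql: |
      -- Column names below are assumed; confirm against the dataset schema.
      SELECT id, from_address, to_address, amount
      FROM my_source
      WHERE amount > 1000000
```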

Step 6: Configure the Sink

Ask where the data should go. Use the /turbo-pipelines skill for sink configuration:

| Sink       | Key config                                      |
|------------|-------------------------------------------------|
| PostgreSQL | `secret_name`, `schema`, `table`, `primary_key` |
| ClickHouse | `secret_name`, `table`, `order_by`              |
| Kafka      | `secret_name`, `topic`                          |
| S3         | `bucket`, `region`, `prefix`, `format`          |
| Webhook    | `url`, `format`                                 |

For sinks requiring `secret_name`, check if the secret exists:

```bash
goldsky secret list
```

If it doesn't exist, help create it using the /secrets skill.
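Using the PostgreSQL keys listed above, a sink section might be sketched as follows. The `type` value and the overall `sinks` shape are assumptions; confirm them with the /turbo-pipelines skill:

```yaml
sinks:
  my_sink:
    type: postgres            # assumed type identifier
    secret_name: MY_PG_SECRET # must exist in `goldsky secret list`
    schema: public
    table: erc20_transfers
    primary_key: id
```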

Step 7: Choose Mode

Use the /turbo-pipelines skill for guidance:
  • Streaming (default): continuous processing, no `end_block`, runs indefinitely
  • Job mode: one-time backfill; set `job: true` and `end_block`
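A minimal job-mode sketch, assuming `job: true` sits at the pipeline's top level and `end_block` on the source (both placements are assumptions; confirm with the /turbo-pipelines skill):

```yaml
job: true                 # run once, then stop
sources:
  my_source:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.0.0
    start_at: earliest
    end_block: 20000000   # assumed placement; backfill stops here
```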

Step 8: Generate, Validate, and Present

Assemble the complete pipeline YAML. Use a descriptive name following the convention `<chain>-<data>-<sink>` (e.g., `base-erc20-transfers-postgres`).
  1. Write the YAML file to disk (e.g., `<pipeline-name>.yaml`).
  2. Run validation BEFORE showing the YAML to the user:

     ```bash
     goldsky turbo validate -f <pipeline-name>.yaml
     ```

  3. If validation fails, fix the issues and re-validate. Do NOT present the YAML until validation passes. Common fixes:
     • Missing `version` field on the dataset source
     • Invalid dataset name (check the chain prefix)
     • Missing `secret_name` for database sinks
     • SQL syntax errors in transforms
  4. Once validation passes, present the full YAML to the user for review.
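Putting the steps together, an assembled pipeline could look like the sketch below. The `name` field, the sink `type`, and the sink's `from` wiring are assumptions; the dataset and column names are illustrative:

```yaml
name: base-erc20-transfers-postgres
sources:
  my_source:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.0.0
    start_at: earliest
transforms:
  my_transform:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM my_source
      WHERE amount > 0      -- column name is assumed
sinks:
  my_sink:
    type: postgres          # assumed type identifier
    from: my_transform      # assumed field linking the sink to the transform
    secret_name: MY_PG_SECRET
    schema: public
    table: erc20_transfers
    primary_key: id
```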

Step 9: Deploy

After the user confirms the YAML looks good:

```bash
goldsky turbo apply <pipeline-name>.yaml
```

Step 10: Verify

After deployment:

```bash
goldsky turbo list
```

Suggest running inspect to verify data flow:

```bash
goldsky turbo inspect <pipeline-name>
```

Present a summary:

Pipeline Deployed

Name: [name]
Chain: [chain]
Dataset: [dataset]
Sink: [sink type]
Mode: [streaming/job]

Next steps:
  • Monitor with `goldsky turbo inspect <name>`
  • Check logs with `goldsky turbo logs <name>`
  • Use /turbo-doctor if you run into issues

Important Rules

  • Always validate before presenting complete YAML to the user. Never show unvalidated complete pipeline YAML.
  • Always validate before deploying.
  • Always show the user the complete YAML before deploying.
  • For job-mode pipelines, remind the user that they auto-clean up about an hour after completion.
  • Use the `blackhole` sink for testing pipelines without writing to a real destination.
  • If the user wants to modify an existing pipeline, check whether it is streaming (update in place) or job-mode (must delete first).
  • Default to `start_at: earliest` unless the user specifies otherwise.
  • Always include `version: 1.0.0` on dataset sources.
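For testing, a blackhole sink might be sketched like this. Only the `blackhole` name comes from the rules above; the exact config shape is an assumption to confirm with the /turbo-pipelines skill:

```yaml
sinks:
  test_sink:
    type: blackhole   # discards output; swap in a real sink once validated
```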

Related

  • /turbo-pipelines: YAML configuration and architecture reference
  • /turbo-doctor: Diagnose and fix pipeline issues
  • /turbo-operations: Lifecycle commands and monitoring reference
  • /turbo-transforms: SQL and TypeScript transform reference
  • /datasets: Dataset names and chain prefixes
  • /secrets: Sink credential management