carto-create-workflow

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

carto-create-workflow

carto-create-workflow

CARTO Workflows is a visual DAG builder that compiles to warehouse SQL. Each workflow runs inside a connected warehouse — no CARTO compute is involved at execution time. This skill covers the full lifecycle: building the DAG (the bulk of this file), operating it via the CLI (CRUD, schedules), and cross-profile copy (
dev → prod
promotion, customer-segregated workspaces via
carto workflows copy
) — see the references below.
For one-off ad-hoc SQL, use
carto-query-datawarehouse
— workflows are for repeatable, scheduled, multi-step DAGs.
Bundle structure, component schemas, input formats, and gotchas are all served by the CLI — never hardcode or assume them. The CLI is the source of truth.
Live introspection commands (use these before reaching for any reference file):
CommandWhat it serves
carto workflows schema
Index of all bundle/DAG schema sections
carto workflows schema bundle
Top-level bundle shape (id, title, connectionId, config, privacy, tags).
privacy
is a
$ref
— fetch its shape with
carto workflows schema privacy
. Minimal valid form:
"privacy": { "privacy": "private" }
(the inner string is not a bare
"private"
enums
lists the allowed values for that inner field).
carto workflows schema config
Full DAG config (schemaVersion, connectionProvider enum, nodes, edges, variables, viewport, useCache, executionSettings, schedule)
carto workflows schema node
Generic node shape, including
data.version
requirement and
data.title
vs
data.label
carto workflows schema node.source
Source/
ReadTable
node shape and the
data.id == data.inputs[0].value
invariant
carto workflows schema node.customsql
Full customsql node spec
carto workflows schema customsql
Copy-paste customsql node template (with
version: "2.0.0"
)
carto workflows schema edge
Edge shape
carto workflows schema handles
Edge handle naming reference — sourceHandle/targetHandle by node type, by operator, by component. Critical for valid edges.
carto workflows schema variable
Variable (parameter) shape —
{ order, name, type, value, public }
carto workflows schema schedule
Declarative schedule metadata fields
carto workflows schema enums
All valid enums (node types, providers, privacies, schedule frequencies)
carto workflows components list --connection <conn> --json
Component catalog for the connected warehouse
carto workflows components get <names> --connection <conn> --json
Per-component
inputs
,
outputs
,
notes
carto workflows components get <names> --connection <conn> --input-formats --json
Input-type
format
,
examples
,
pitfalls
carto workflows --help
Full command reference, including schedule-expression dialects per engine
References (only for what the CLI doesn't serve):
  • references/providers/
    — per-warehouse details (BigQuery, Snowflake, Databricks): identifier quoting, column casing, AT path.
  • references/scheduling.md
    add
    vs
    update
    semantics, bundle-level schedule warning, activity-log verification.
  • references/mcp-and-api-publish.md
    — publishing a workflow as an MCP tool or callable API endpoint: bundle requirements (
    native.mcptooloutput
    + scoped variables + draft descriptions),
    {{@var}}
    vs
    @var
    substitution syntax,
    Number → FLOAT64
    LIMIT
    gotcha, post-publish verification.
  • references/cross-profile-copy.md
    workflows copy
    mechanics, connection mapping (
    --connection-mapping
    /
    --connection
    ),
    --skip-source-validation
    , why copies are always new workflows.
  • references/schedule-readd.md
    — schedules don't transfer across
    workflows copy
    ; how to re-add them, including dialect translation when source and destination engines differ.
connectionProvider
must match the connection.
config.connectionProvider
(enum in
schema enums
) must match the connection's actual provider — mismatches generate the wrong SQL dialect and error at runtime. Look it up with
carto connections list --search <name> --json
(
connections get
requires a UUID).

CARTO Workflows是一款可视化DAG构建工具,可编译为数据仓库SQL。每个工作流都在已连接的数据仓库内运行——执行阶段不涉及CARTO计算。本技能覆盖完整生命周期:构建DAG(本文档核心内容)、通过CLI操作工作流(增删改查、调度),以及跨配置文件复制
dev → prod
升级、通过
carto workflows copy
实现客户隔离工作区)——详见下方参考资料。
对于一次性临时SQL,请使用
carto-query-datawarehouse
——工作流适用于可重复、可调度的多步骤DAG。
Bundle结构、组件Schema、输入格式及注意事项均由CLI提供——切勿硬编码或主观假设。CLI是唯一可信来源。
实时自省命令(在查阅任何参考文件前先使用这些命令):
命令用途
carto workflows schema
所有bundle/DAG schema章节的索引
carto workflows schema bundle
顶层bundle结构(id、title、connectionId、config、privacy、tags)。
privacy
$ref
——使用
carto workflows schema privacy
获取其结构。最小有效格式:
"privacy": { "privacy": "private" }
(内部字符串并非单纯的
"private"
——
enums
列出了该内部字段的允许值)。
carto workflows schema config
完整DAG配置(schemaVersion、connectionProvider枚举、nodes、edges、variables、viewport、useCache、executionSettings、schedule)
carto workflows schema node
通用节点结构,包括
data.version
要求以及
data.title
data.label
的区别
carto workflows schema node.source
Source/
ReadTable
节点结构,以及
data.id == data.inputs[0].value
的约束
carto workflows schema node.customsql
完整customsql节点规范
carto workflows schema customsql
可直接复制的customsql节点模板(含
version: "2.0.0"
carto workflows schema edge
边的结构
carto workflows schema handles
边句柄命名参考——按节点类型、操作符、组件划分的sourceHandle/targetHandle。对创建有效边至关重要。
carto workflows schema variable
变量(参数)结构——
{ order, name, type, value, public }
carto workflows schema schedule
声明式调度元数据字段
carto workflows schema enums
所有有效枚举值(节点类型、提供商、隐私设置、调度频率)
carto workflows components list --connection <conn> --json
已连接数据仓库的组件目录
carto workflows components get <names> --connection <conn> --json
单个组件的
inputs
outputs
notes
carto workflows components get <names> --connection <conn> --input-formats --json
输入类型的
format
examples
pitfalls
carto workflows --help
完整命令参考,包括各引擎对应的调度表达式语法
参考资料(仅用于CLI未覆盖的内容):
  • references/providers/
    ——各数据仓库细节(BigQuery、Snowflake、Databricks):标识符引用、列大小写、AT路径。
  • references/scheduling.md
    ——
    add
    update
    的语义差异、bundle级调度警告、活动日志验证。
  • references/mcp-and-api-publish.md
    ——将工作流发布为MCP工具或可调用API端点:bundle要求(
    native.mcptooloutput
    + 作用域变量 + 草稿描述)、
    {{@var}}
    @var
    替换语法、
    Number → FLOAT64
    LIMIT
    注意事项、发布后验证。
  • references/cross-profile-copy.md
    ——
    workflows copy
    机制、连接映射(
    --connection-mapping
    /
    --connection
    )、
    --skip-source-validation
    、复制为何始终生成新工作流。
  • references/schedule-readd.md
    ——调度不会随
    workflows copy
    转移;如何重新添加调度,包括源引擎与目标引擎不同时的语法转换。
connectionProvider
必须与连接匹配。
config.connectionProvider
schema enums
中的枚举值)必须与连接的实际提供商匹配——不匹配会生成错误的SQL方言并在运行时出错。使用
carto connections list --search <name> --json
查询(
connections get
需要UUID)。

Development process

开发流程

Follow these 6 phases in order for every workflow request. Do not skip or reorder them.
每个工作流请求都必须按以下6个阶段依次执行,不得跳过或调整顺序。

Phase 1 — Gather information

阶段1 — 收集信息

  1. Identify data sources. If the user named tables, note them. Otherwise discover what's available with
    carto connections list
    and
    carto connections describe <connection> "<fqn>"
    .
  2. Clarify the goal. What transformation? What output? What filters/conditions?
  3. Determine the connection.
    carto connections list | head -n 20
    . Note its
    provider
    (
    bigquery
    /
    snowflake
    /
    databricks
    ) — you will need it for the next step.
  4. Read the provider reference.
    <critical-rule id="read-provider-reference"> Before writing any node, you MUST open `references/providers/<provider>.md` (e.g. `references/providers/bigquery.md`) and read it end-to-end. This is non-negotiable.
    <why>It contains identifier-quoting rules, column-casing behaviour, Analytics Toolbox path, schedule-expression dialect, and customsql
    $a
    /
    $b
    placeholder requirements that
    validate
    cannot catch. These only surface later as
    verify-remote
    failures or runtime SQL errors, and are the single most common cause of late-stage rework.</why>
    <do-not>Do not skip this step because the next phases look concrete. Do not rely on memory of a previous run — provider files change.</do-not> </critical-rule>
  5. Fetch the component catalog.
    carto workflows components list --connection <connection> --json
    — your only source of truth for component names.
  1. 确定数据源。如果用户指定了表,记录下来。否则使用
    carto connections list
    carto connections describe <connection> "<fqn>"
    发现可用数据源。
  2. 明确目标。要进行什么转换?输出是什么?有哪些过滤/条件?
  3. 确定连接。执行
    carto connections list | head -n 20
    ,记录其
    provider
    bigquery
    /
    snowflake
    /
    databricks
    )——下一步会用到。
  4. 阅读提供商参考文档
    <critical-rule id="read-provider-reference"> 在编写任何节点之前,你必须打开`references/providers/<provider>.md`(例如`references/providers/bigquery.md`)并完整阅读。这是硬性要求。
    <why>文档包含标识符引用规则、列大小写行为、Analytics Toolbox路径、调度表达式语法以及customsql的
    $a
    /
    $b
    占位符要求,这些都是
    validate
    无法检测到的。这些问题只会在后续
    verify-remote
    失败或运行时SQL错误中暴露,是后期返工最常见的原因。</why>
    <do-not>不要因为后续阶段看似具体就跳过此步骤。不要依赖之前的记忆——提供商文档可能会更新。</do-not> </critical-rule>
  5. 获取组件目录。执行
    carto workflows components list --connection <connection> --json
    ——这是组件名称的唯一可信来源。

Phase 2 — Design the approach

阶段2 — 设计实现方案

  1. Select components from the catalog you fetched.
  2. Fetch schemas for every component you plan to use.
    carto workflows components get <name1>,<name2>,<name3> --connection <connection> --json
    returns
    inputs
    ,
    outputs
    , and
    notes
    . Read the
    notes
    array carefully — it contains gotchas.
  3. Fetch input type formats.
    carto workflows components get <component1>,<component2> --connection <connection> --input-formats --json
    returns
    format
    ,
    examples
    , and
    pitfalls
    for each input/output type. Pass component names (e.g.
    native.buffer
    ), NOT input-type names.
  4. Design principles:
    • Preserve identifier and spatial columns throughout.
    • Prefer native components over
      native.customsql
      . This is not a soft preference.
      See Native-first rule.
    • H3/Quadbin columns work for visualization without geometry extraction.
    • Use standard names for visualization:
      geom
      ,
      h3
      ,
      quadbin
      .
  1. 从目录中选择组件
  2. 获取所有计划使用的组件的Schema。执行
    carto workflows components get <name1>,<name2>,<name3> --connection <connection> --json
    返回
    inputs
    outputs
    notes
    。仔细阅读
    notes
    数组——其中包含注意事项。
  3. 获取输入类型格式。执行
    carto workflows components get <component1>,<component2> --connection <connection> --input-formats --json
    返回各输入/输出类型的
    format
    examples
    pitfalls
    。传入组件名称(例如
    native.buffer
    ),而非输入类型名称。
  4. 设计原则
    • 全程保留标识符和空间列。
    • 优先使用原生组件而非
      native.customsql
      。这不是软性偏好。
      详见原生优先规则
    • H3/Quadbin列无需提取几何即可用于可视化。
    • 使用标准名称用于可视化:
      geom
      h3
      quadbin

Phase 3 — Present plan, surface gaps, confirm

阶段3 — 展示方案、暴露缺口、确认需求

Present the workflow plan (components, data flow, decisions). Then explicitly enumerate every gap before building:
  • Unresolved parameters — thresholds, radii, filter values, time windows, k for k-NN, aggregation columns, output table names, etc.
  • Analytical decisions left to the user — significance levels, distance metrics, join types, null-handling, dedup keys, CRS, H3/quadbin resolution.
  • Ambiguities in the request — anything where you had to guess intent.
For each gap, propose a sensible default with its rationale (e.g. "p-value threshold: suggest
0.05
— conventional significance level", "buffer distance: suggest
1000m
— matches the city-block scale of the input"), and ask the user to confirm or override. Never silently pick a value for a user-facing analytical parameter. Wait for confirmation before building.
展示工作流方案(组件、数据流、决策)。然后明确列出所有未解决的缺口再开始构建:
  • 未解决的参数——阈值、半径、过滤值、时间窗口、k-NN的k值、聚合列、输出表名等。
  • 留给用户的分析决策——显著性水平、距离度量、连接类型、空值处理、去重键、CRS、H3/quadbin分辨率。
  • 需求中的歧义——任何你不得不猜测意图的内容。
对于每个缺口,提出合理的默认值并说明理由(例如“p值阈值:建议
0.05
——常规显著性水平”,“缓冲区距离:建议
1000m
——匹配输入数据的城市街区尺度”),并请求用户确认或修改。切勿擅自为面向用户的分析参数选择值。等待确认后再开始构建

Phase 4 — Build the workflow

阶段4 — 构建工作流

  1. Create the workflow file. Get the bundle/node/edge/variable shapes from
    carto workflows schema [section]
    (start with
    bundle
    , then
    node
    ,
    node.source
    ,
    node.customsql
    ,
    edge
    ,
    handles
    ). For customsql nodes, copy the template from
    carto workflows schema customsql
    .
    If you set the optional top-level
    privacy
    , it must be an object, not a string:
    "privacy": { "privacy": "private" }
    (the field name nests). Omit the field entirely if you don't need it —
    "privacy": "private"
    will fail
    validate
    .
    Source nodes (
    type: "source"
    ) — treat
    ReadTable
    like any other component: fetch its spec with
    carto workflows components get ReadTable --connection <conn> --json
    to get the canonical
    inputs[*].title
    and
    inputs[*].description
    . (
    ReadTable
    is hidden from
    components list
    because it's grouped
    __internal
    , but
    get
    returns it normally.) Two source-only rules
    get
    cannot tell you, both from
    schema node.source
    :
    • The canvas display name lives in
      data.label
      , NOT
      data.title
      . Generic nodes use
      title
      ; source nodes use
      label
      .
    • data.id
      and
      data.inputs[0].value
      must be the same FQN.
    Canvas layout & naming — apply on every node, every workflow. None of this affects execution, but the user opens the DAG in Builder and a sloppy canvas reads as low quality. The numbers are small and stable; just apply them.
    • Snap grid is 16 px. Every
      x
      and
      y
      you write must be
      % 16 == 0
      . Builder snaps drags to this grid; off-grid values look subtly misaligned next to anything the user nudged.
    • Card widths are fixed by node type: source nodes render at 192 px (12 cells), generic components at 64 px (4 cells). Knowing this is what lets you reason about gaps.
    • Card heights are fixed: every component card and source card is 80 px (5 cells) tall, with a 16 px label rendered below the card body. The label is not part of the card — it lives in the gap to the next card.
    • Canonical inter-card gap (right edge → next left edge): 80 px (5 cells) for tight linear placement; 128 px (8 cells) at a fan-in (a join's left input, where an edge from another row needs room). The gap is the constant; left-edge-to-left-edge Δx differs across patterns only because cards have different widths. So a generic→generic linear step is Δx=144 (9 cells); a source→generic step at the same gap is Δx=272 (17 cells); a generic→generic fan-in step is Δx=192 (12 cells).
    • Canonical vertical gap (card body bottom → next card body top): 80 px (5 cells), of which the first 16 px is the card's label and the remaining 64 px is whitespace. The label always sits inside the gap, never inside the card. So a stacked-card step is top-to-top Δy = 160 px (10 cells) — 80 (body) + 16 (label) + 64 (whitespace).
    • Layout. Source nodes stack at the leftmost column with the same
      x
      , Δy = 144 px (9 cells). The main pipeline runs at the y-midline of the source rows — e.g. sources at y=80 and y=224 → pipeline at y=160. Joins on the midline visually receive both inputs symmetrically.
    • data.title
      and
      data.label
      are different fields
      — never duplicate.
      title
      = short instance-specific verb (≤ 15 chars) describing what this node does in this DAG (
      "Rank"
      ,
      "Join to score"
      ,
      "To H3"
      ).
      label
      = the component's canonical type name as Builder shows it on a fresh drop (
      "Join"
      ,
      "Create Column"
      ,
      "H3 from GeoPoint"
      ) — read from
      carto workflows components get <name> --json
      components[0].title
      . Source nodes only render
      data.label
      on canvas (treat it as a short alias for the table:
      "Candidates"
      ,
      "Score grid C"
      ).
  2. Run
    validate
    after every write to the file.
    It's offline, fast, and catches structural errors immediately:
    bash
    carto workflows validate workflow.json --json
    Treat any save without a passing
    validate
    as broken — fix before continuing to the next node/edge.
    validate
    is authoritative.
    If a component schema from
    components get
    disagrees with what
    validate
    accepts, trust
    validate
    and adjust the bundle to satisfy it. Do not "fix" the bundle to match the schema if it's already passing validation.
  3. Run
    verify
    at branch boundaries
    , not on every save. It hits the warehouse (slower, requires auth), so reserve it for whole sub-DAGs once their structure validates clean, and once at the end before presenting:
    bash
    carto workflows verify-remote workflow.json --connection <connection-name> --json
    verify
    is what catches column-type mismatches, missing tables, and AT resolution — things
    validate
    cannot see.
  4. Fix errors silently — don't expose implementation details to the user.
  5. Iterate until complete, with both
    validate
    and a final
    verify
    clean.
  1. 创建工作流文件。从
    carto workflows schema [section]
    获取bundle/node/edge/variable结构(先从
    bundle
    开始,然后是
    node
    node.source
    node.customsql
    edge
    handles
    )。对于customsql节点,从
    carto workflows schema customsql
    复制模板。
    如果设置可选的顶层
    privacy
    ,它必须是一个对象,而非字符串:
    "privacy": { "privacy": "private" }
    (字段名称嵌套)。如果不需要则完全省略该字段——
    "privacy": "private"
    会导致
    validate
    失败。
    Source节点
    type: "source"
    )——将
    ReadTable
    视为普通组件:使用
    carto workflows components get ReadTable --connection <conn> --json
    获取规范的
    inputs[*].title
    inputs[*].description
    。(
    ReadTable
    components list
    中隐藏,因为它属于
    __internal
    分组,但
    get
    命令可正常返回。)有两个
    get
    无法告知的Source节点专属规则,均来自
    schema node.source
    • 画布显示名称位于
      data.label
      ,而非
      data.title
      。通用节点使用
      title
      ;Source节点使用
      label
    • data.id
      data.inputs[0].value
      必须是相同的FQN。
    画布布局与命名——每个节点、每个工作流都需遵循。这些不影响执行,但用户会在Builder中打开DAG,混乱的画布会显得质量低下。数值较小且固定,直接应用即可。
    • 对齐网格为16 px。你编写的每个
      x
      y
      都必须满足
      % 16 == 0
      。Builder会将拖拽操作对齐到此网格;非网格值与用户微调过的元素相邻时会显得轻微错位。
    • 卡片宽度由节点类型决定:Source节点渲染宽度为192 px(12格),通用组件为64 px(4格)。了解这一点才能合理规划间距。
    • 卡片高度固定:每个组件卡片和Source卡片高度均为80 px(5格),卡片下方会渲染16 px的标签。标签不属于卡片——位于下一张卡片的间距中。
    • 标准卡片间距(右边缘→下一张左边缘):线性紧凑布局为80 px(5格);扇入处(连接的左输入,需要为另一行的边留出空间)为128 px(8格)。间距是固定值;左边缘到左边缘的Δx因卡片宽度不同而变化。因此通用→通用的线性步骤Δx=144(9格);Source→通用的同间距步骤Δx=272(17格);通用→通用的扇入步骤Δx=192(12格)。
    • 标准垂直间距(卡片主体底部→下一张卡片主体顶部):80 px(5格),其中前16 px是卡片标签,剩余64 px是空白。标签始终位于间距内,而非卡片内。因此堆叠卡片的顶部到顶部Δy=160 px(10格)——80(主体)+16(标签)+64(空白)。
    • 布局:Source节点堆叠在最左侧列,
      x
      相同,Δy=144 px(9格)。主流水线位于Source行的y中线——例如Source位于y=80和y=224 → 流水线位于y=160。连接位于中线时,视觉上能对称接收两个输入。
    • data.title
      data.label
      是不同字段
      ——切勿重复。
      title
      = 简短的实例化动词(≤15字符),描述此节点在当前DAG中的作用(
      "Rank"
      "Join to score"
      "To H3"
      )。
      label
      = Builder中拖放新组件时显示的规范类型名称(
      "Join"
      "Create Column"
      "H3 from GeoPoint"
      )——从
      carto workflows components get <name> --json
      components[0].title
      获取。Source节点仅在画布上渲染
      data.label
      (将其视为表的简短别名:
      "Candidates"
      "Score grid C"
      )。
  2. 每次保存文件后都运行
    validate
    。它是离线的,速度快,可立即捕获结构错误:
    bash
    carto workflows validate workflow.json --json
    任何未通过
    validate
    的保存都视为损坏——修复后再继续下一个节点/边。
    validate
    具有权威性
    。如果
    components get
    返回的组件Schema与
    validate
    接受的内容不一致,请信任
    validate
    并调整bundle以满足要求。如果bundle已通过验证,请勿为匹配Schema而“修改”bundle。
  3. 在分支节点处运行
    verify
    ,而非每次保存都运行。它会访问数据仓库(速度较慢,需要授权),因此仅在子DAG结构验证通过后,以及最终提交前使用:
    bash
    carto workflows verify-remote workflow.json --connection <connection-name> --json
    verify
    会捕获列类型不匹配、表缺失、AT解析问题——这些都是
    validate
    无法检测到的。
  4. 静默修复错误——不要向用户暴露实现细节。
  5. 迭代至完成,确保
    validate
    和最终
    verify
    均通过。

Phase 5 — Present result

阶段5 — 提交成果

Summarize what was built. Confirm validation success. Wait for user confirmation.
总结构建内容,确认验证成功,等待用户确认。

Phase 6 — Upload to CARTO

阶段6 — 上传至CARTO

  1. Ask if the user wants to upload.
  2. Upload and provide the URL:
    bash
    carto workflows create --file workflow.json --verify
    The connection comes from
    connectionId
    inside the bundle — no
    --connection
    flag here.
  3. Do NOT auto-execute unless explicitly requested.

  1. 询问用户是否需要上传。
  2. 上传并提供URL:
    bash
    carto workflows create --file workflow.json --verify
    连接信息来自bundle内的
    connectionId
    ——无需
    --connection
    参数。
  3. 除非明确要求,否则不要自动执行。

Native-first rule

原生优先规则

native.customsql
is the last tool to reach for, not the first. Before writing a customsql node, attempt the native chain. Fall back to customsql only if at least one of these is true:
  • The native chain would require more than ~4-5 nodes to express the same logic.
  • A specific operation has no native equivalent at all (verified via
    carto workflows components list
    ).
  • The expression genuinely needs raw warehouse SQL (e.g.,
    ST_UNION_AGG
    ,
    LOGICAL_OR
    ,
    ML.PREDICT
    , last-N windowing).
Common operations and their native equivalents — try these first:
If you'd write SQL like…Use natives
WHERE x = …
/ multi-condition filter
native.where
(predicate),
native.wheresimplified
(UI builder),
native.spatialfilter
(geometry-based match/unmatch split),
native.select
(column projection)
SELECT a, b, c FROM t
(multi-column projection / rename / multi-expression)
native.select
(one node, free-form SELECT body)
SELECT ..., expr AS c FROM t
(add one computed column)
native.selectexpression
(one column + one expression per node)
GROUP BY k, SUM(x), AVG(y), COUNT(*)
(single key)
native.groupby
groupby
input is a single
Column
, not multi-column. For multi-key grouping use
native.customsql
.
JOIN ... ON a.k = b.k
(any join type)
native.joinv2
JOIN ... ON ST_INTERSECTS / ST_CONTAINS / ST_WITHIN
native.spatialjoin
MIN(ST_DISTANCE(a.geom, b.geom))
across two tables
native.distance
(augments the main table in place with
nearest_id
+
nearest_distance
— rename them per source if you chain two
native.distance
nodes for two reference tables)
ST_BUFFER(geom, d)
native.buffer
H3 binning / boundary / center / polyfill
native.h3frompoint
,
native.h3boundary
(output geometry column is named
<h3col>_geo
, e.g.
index_geo
not
geom
),
native.h3center
,
native.h3polyfill
ORDER BY ... LIMIT n
native.orderby
+
native.limit
z-score / standardization
native.normalize
weighted composite score
native.spatialcompositeunsupervised
(weighted/PCA),
native.spatialcompositesupervised
(target-driven)
Getis-Ord Gi*, GWR, isolines
native.getisord
,
native.gwr
,
native.isolines
Save final node to a table
native.saveastable
Signals you're reaching for customsql too early — stop and look for a native chain instead:
  • The customsql is just a
    WHERE
    clause, a single
    JOIN
    , a
    GROUP BY
    with one or two aggregates, or a column projection.
  • It wraps a single warehouse function (
    ST_BUFFER
    ,
    H3_FROMGEOGPOINT
    , etc.) for which a dedicated native exists.
  • Its only purpose is to project/rename/re-cast columns — use
    native.select
    (free-form SELECT body, one node) for multiple columns;
    native.selectexpression
    is for adding a single computed column.
  • You're chaining customsql outputs through more customsql nodes — chain natives instead.
When customsql is genuinely the right call, the per-warehouse SQL-dialect footguns live in the matching
references/providers/*.md
(BigQuery backticks, Snowflake casing, Databricks identifiers).

native.customsql
是最后的选择,而非首选。在编写customsql节点之前,尝试使用原生组件链。仅当满足以下至少一个条件时才退而求其次使用customsql:
  • 使用原生组件链表达相同逻辑需要约4-5个以上节点
  • 特定操作完全没有原生等效组件(通过
    carto workflows components list
    验证)。
  • 表达式确实需要原始数据仓库SQL(例如
    ST_UNION_AGG
    LOGICAL_OR
    ML.PREDICT
    、最近N条数据窗口函数)。
常见操作及其原生等效组件——优先尝试这些:
若你要编写此类SQL…使用原生组件
WHERE x = …
/ 多条件过滤
native.where
(谓词)、
native.wheresimplified
(UI构建器)、
native.spatialfilter
(基于几何的匹配/不匹配拆分)、
native.select
(列投影)
SELECT a, b, c FROM t
(多列投影/重命名/多表达式)
native.select
(单个节点,自由形式SELECT主体)
SELECT ..., expr AS c FROM t
(添加一个计算列)
native.selectexpression
(每个节点对应一列+一个表达式)
GROUP BY k, SUM(x), AVG(y), COUNT(*)
(单键)
native.groupby
——
groupby
输入是单个
Column
,而非多列。多键分组请使用
native.customsql
JOIN ... ON a.k = b.k
(任何连接类型)
native.joinv2
JOIN ... ON ST_INTERSECTS / ST_CONTAINS / ST_WITHIN
native.spatialjoin
跨两张表计算
MIN(ST_DISTANCE(a.geom, b.geom))
native.distance
(在主表中添加
nearest_id
+
nearest_distance
字段——如果为两个参考表链式调用两个
native.distance
节点,请根据源表重命名这些字段)
ST_BUFFER(geom, d)
native.buffer
H3分箱/边界/中心/填充
native.h3frompoint
native.h3boundary
(输出几何列名为
<h3col>_geo
,例如
index_geo
——不是
geom
)、
native.h3center
native.h3polyfill
ORDER BY ... LIMIT n
native.orderby
+
native.limit
z-score / 标准化
native.normalize
加权综合评分
native.spatialcompositeunsupervised
(加权/PCA)、
native.spatialcompositesupervised
(目标驱动)
Getis-Ord Gi*、GWR、等值线
native.getisord
native.gwr
native.isolines
将最终节点保存到表
native.saveastable
以下信号表明你过早使用customsql——停止操作并寻找原生组件链:
  • customsql仅包含
    WHERE
    子句、单个
    JOIN
    、带一两个聚合函数的
    GROUP BY
    或列投影。
  • 它封装了单个数据仓库函数(
    ST_BUFFER
    H3_FROMGEOGPOINT
    等),而已有专门的原生组件。
  • 其唯一目的是投影/重命名/转换列——多列使用
    native.select
    (自由形式SELECT主体,单个节点);添加单个计算列使用
    native.selectexpression
  • 你正在将customsql输出链式传递给更多customsql节点——改为链式调用原生组件。
当确实需要使用customsql时,各数据仓库的SQL方言陷阱在对应的
references/providers/*.md
中(BigQuery反引号、Snowflake大小写、Databricks标识符)。

Fetching component & input information

获取组件与输入信息

Do not rely on memorized component schemas or input formats. Always fetch live data from the CLI.
CommandPurpose
carto workflows components list --connection <conn> --json
List all available components
carto workflows components get <names> --connection <conn> --json
Component schemas with
inputs
,
outputs
, and
notes
carto workflows components get <names> --connection <conn> --input-formats --json
Input type
format
,
examples
,
pitfalls
for the types those components use
What to look for in the response:
  • Component
    notes
    — gotcha strings: non-obvious behavior, deprecated status, output column naming.
  • Input
    format
    — prose describing the expected value shape.
  • Input
    examples
    — concrete JSON snippets showing correct usage.
  • Input
    pitfalls
    — common mistakes, evaluation order, format quirks.
  • Component
    version
    — copy verbatim into the authored node's
    data.version
    (string). Generic nodes without it are flagged OUTDATED in Builder.
  • Input
    options
    (Selection / Enum)
    — the engine matches values exactly. Copy each option string verbatim — preserve case, never paraphrase or Title-Case (e.g. spatialjoin's
    jointype
    accepts
    "inner"
    , not
    "Inner"
    ).
For values that may evolve over time (component versions, bundle/config defaults, enum option lists), treat the CLI's
components get
/
schema
output as the single source of truth — never hardcode values in your own templates. Specifically:
  • config.schemaVersion
    — read the current default from
    carto workflows schema config --json
    properties.schemaVersion.default
    . Today it's
    "1.0.0"
    (string), but resolve at author time so future bumps don't require a skill update.

不要依赖记忆中的组件Schema或输入格式。始终从CLI获取实时数据。
命令用途
carto workflows components list --connection <conn> --json
列出所有可用组件
carto workflows components get <names> --connection <conn> --json
组件Schema,包含
inputs
outputs
notes
carto workflows components get <names> --connection <conn> --input-formats --json
组件所用输入类型的
format
examples
pitfalls
响应中需要关注的内容:
  • 组件
    notes
    ——注意事项:非直观行为、已弃用状态、输出列命名。
  • 输入
    format
    ——描述预期值结构的说明文字。
  • 输入
    examples
    ——展示正确用法的具体JSON片段。
  • 输入
    pitfalls
    ——常见错误、求值顺序、格式 quirks。
  • 组件
    version
    ——原样复制到创建的节点的
    data.version
    (字符串)。没有该字段的通用节点在Builder中会被标记为OUTDATED。
  • 输入
    options
    (选择/枚举)
    ——引擎会精确匹配值。原样复制每个选项字符串——保留大小写,切勿改写或转为标题大小写(例如spatialjoin的
    jointype
    接受
    "inner"
    ,而非
    "Inner"
    )。
对于可能随时间演变的值(组件版本、bundle/config默认值、枚举选项列表),将CLI的
components get
/
schema
输出视为唯一可信来源——切勿在自己的模板中硬编码值。具体来说:
  • config.schemaVersion
    ——从
    carto workflows schema config --json
    properties.schemaVersion.default
    读取当前默认值。目前是
    "1.0.0"
    (字符串),但在创建时解析,以便未来版本升级无需更新技能。

Provider-specific notes

提供商特定说明

Different warehouses have different SQL dialects, table-naming conventions, and column-casing rules. Always check the matching provider guide:
  • references/providers/bigquery.md
  • references/providers/snowflake.md
  • references/providers/databricks.md
Input-type formats (
Table
,
Column
,
ColumnsForJoin
,
SelectColumnAggregation
, etc.) and per-component gotchas (including the "AT components need
verify
, not
validate
" rule) are served by the CLI itself — see Fetching component & input information.

不同数据仓库有不同的SQL方言、表命名约定和列大小写规则。务必查看对应的提供商指南:
  • references/providers/bigquery.md
  • references/providers/snowflake.md
  • references/providers/databricks.md
输入类型格式(
Table
Column
ColumnsForJoin
SelectColumnAggregation
等)和组件特定注意事项(包括“AT组件需要
verify
,而非
validate
”规则)由CLI直接提供——详见获取组件与输入信息

Operating a workflow (after it's built)

工作流操作(构建完成后)

Once a workflow exists in CARTO, the CLI exposes CRUD and schedule management. Quick reference:
bash
undefined
工作流在CARTO中创建后,CLI提供增删改查和调度管理功能。快速参考:
bash
undefined

List / inspect

列出 / 查看详情

carto workflows list --json carto workflows get <id>
carto workflows list --json carto workflows get <id>

Update with edited JSON

使用编辑后的JSON更新

carto workflows update <id> --file workflow.json
carto workflows update <id> --file workflow.json

Add / remove a schedule

添加 / 删除调度

carto workflows schedule add <id> --expression "every day 08:00" carto workflows schedule remove <id>

Always-on guidance:

- **Workflows run on the connection's warehouse.** A workflow with a BigQuery connection cannot use Snowflake-specific SQL.
- **Schedule expression syntax depends on the engine** — natural-language for BQ/CARTO DW (`"every day 08:00"`), cron for Snowflake/Postgres (`"0 8 * * *"`), Quartz cron for Databricks (`"0 0 8 * * ?"`). See [`references/scheduling.md`](references/scheduling.md). Picking the wrong dialect fails at schedule-add time.
- **Copying a workflow across profiles** (dev → prod, customer-segregated workspaces) is covered in [`references/cross-profile-copy.md`](references/cross-profile-copy.md). Schedules don't transfer — see [`references/schedule-readd.md`](references/schedule-readd.md).
- **Deleting a workflow doesn't delete its outputs.** Tables/views the workflow created in the warehouse persist; clean them up with `carto sql job` if needed.
- **`workflows update` replaces the whole DAG.** There's no per-node patch. Always `get` first, edit, then `update`.
- **Workflow execution status** lives in the activity log (`WorkflowRun`, `WorkflowExecutionComplete` event types). For health monitoring of scheduled workflows, query that log via [`carto-query-datawarehouse`](../carto-query-datawarehouse) — see `references/activity-queries.md` in that skill.
carto workflows schedule add <id> --expression "every day 08:00" carto workflows schedule remove <id>

通用指导:

- **工作流在连接的数据仓库上运行**。使用BigQuery连接的工作流无法使用Snowflake专属SQL。
- **调度表达式语法取决于引擎**——BQ/CARTO DW使用自然语言(`"every day 08:00"`),Snowflake/Postgres使用cron(`"0 8 * * *"`),Databricks使用Quartz cron(`"0 0 8 * * ?"`)。详见[`references/scheduling.md`](references/scheduling.md)。选择错误语法会在添加调度时失败。
- **跨配置文件复制工作流**(dev → prod、客户隔离工作区)详见[`references/cross-profile-copy.md`](references/cross-profile-copy.md)。调度不会随复制转移——详见[`references/schedule-readd.md`](references/schedule-readd.md)。
- **删除工作流不会删除其输出**。工作流在数据仓库中创建的表/视图会保留;如需清理请使用`carto sql job`。
- **`workflows update`会替换整个DAG**。不支持单个节点补丁。始终先`get`、编辑、再`update`。
- **工作流执行状态**位于活动日志中(`WorkflowRun`、`WorkflowExecutionComplete`事件类型)。如需监控调度工作流的健康状态,请通过[`carto-query-datawarehouse`](../carto-query-datawarehouse)查询该日志——详见该技能中的`references/activity-queries.md`。