carto-create-workflow

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

carto-create-workflow

CARTO Workflows is a visual DAG builder that compiles to warehouse SQL. Each workflow runs inside a connected warehouse — no CARTO compute is involved at execution time. This skill covers the full lifecycle: building the DAG (the bulk of this file), operating it via the CLI (CRUD, schedules), and cross-profile copy (

dev → prod

promotion, customer-segregated workspaces via

carto workflows copy

) — see the references below.

For one-off ad-hoc SQL, use

carto-query-datawarehouse

— workflows are for repeatable, scheduled, multi-step DAGs.

Bundle structure, component schemas, input formats, and gotchas are all served by the CLI — never hardcode or assume them. The CLI is the source of truth.

Live introspection commands (use these before reaching for any reference file):

Command	What it serves
`carto workflows schema`	Index of all bundle/DAG schema sections
`carto workflows schema bundle`	Top-level bundle shape (id, title, connectionId, config, privacy, tags). `privacy` is a `$ref` — fetch its shape with `carto workflows schema privacy` . Minimal valid form: `"privacy": { "privacy": "private" }` (the inner string is not a bare `"private"` — `enums` lists the allowed values for that inner field).
`carto workflows schema config`	Full DAG config (schemaVersion, connectionProvider enum, nodes, edges, variables, viewport, useCache, executionSettings, schedule)
`carto workflows schema node`	Generic node shape, including `data.version` requirement and `data.title` vs `data.label`
`carto workflows schema node.source`	Source/ `ReadTable` node shape and the `data.id == data.inputs[0].value` invariant
`carto workflows schema node.customsql`	Full customsql node spec
`carto workflows schema customsql`	Copy-paste customsql node template (with `version: "2.0.0"` )
`carto workflows schema edge`	Edge shape
`carto workflows schema handles`	Edge handle naming reference — sourceHandle/targetHandle by node type, by operator, by component. Critical for valid edges.
`carto workflows schema variable`	Variable (parameter) shape — `{ order, name, type, value, public }`
`carto workflows schema schedule`	Declarative schedule metadata fields
`carto workflows schema enums`	All valid enums (node types, providers, privacies, schedule frequencies)
`carto workflows components list --connection <conn> --json`	Component catalog for the connected warehouse
`carto workflows components get <names> --connection <conn> --json`	Per-component `inputs` , `outputs` , `notes`
`carto workflows components get <names> --connection <conn> --input-formats --json`	Input-type `format` , `examples` , `pitfalls`
`carto workflows --help`	Full command reference, including schedule-expression dialects per engine

References (only for what the CLI doesn't serve):

```
references/providers/
```
— per-warehouse details (BigQuery, Snowflake, Databricks): identifier quoting, column casing, AT path.
```
references/scheduling.md
```
—
```
add
```
vs
```
update
```
semantics, bundle-level schedule warning, activity-log verification.
```
references/mcp-and-api-publish.md
```
— publishing a workflow as an MCP tool or callable API endpoint: bundle requirements (
```
native.mcptooloutput
```
+ scoped variables + draft descriptions),
```
{{@var}}
```
vs
```
@var
```
substitution syntax,
```
Number → FLOAT64
```
```
LIMIT
```
gotcha, post-publish verification.

references/cross-profile-copy.md

—

workflows copy

mechanics, connection mapping (

--connection-mapping

--connection

--skip-source-validation

, why copies are always new workflows.

```
references/schedule-readd.md
```
— schedules don't transfer across
```
workflows copy
```
; how to re-add them, including dialect translation when source and destination engines differ.

connectionProvider
must match the connection.
config.connectionProvider
(enum in
schema enums
) must match the connection's actual provider — mismatches generate the wrong SQL dialect and error at runtime. Look it up with
carto connections list --search <name> --json
(
connections get
requires a UUID).

CARTO Workflows是一款可视化DAG构建工具，可编译为数据仓库SQL。每个工作流都在已连接的数据仓库内运行——执行阶段不涉及CARTO计算。本技能覆盖完整生命周期：构建DAG（本文档核心内容）、通过CLI操作工作流（增删改查、调度），以及跨配置文件复制（

dev → prod

升级、通过

carto workflows copy

实现客户隔离工作区）——详见下方参考资料。

对于一次性临时SQL，请使用

carto-query-datawarehouse

——工作流适用于可重复、可调度的多步骤DAG。

Bundle结构、组件Schema、输入格式及注意事项均由CLI提供——切勿硬编码或主观假设。CLI是唯一可信来源。

实时自省命令（在查阅任何参考文件前先使用这些命令）：

命令	用途
`carto workflows schema`	所有bundle/DAG schema章节的索引
`carto workflows schema bundle`	顶层bundle结构（id、title、connectionId、config、privacy、tags）。 `privacy` 是 `$ref` ——使用 `carto workflows schema privacy` 获取其结构。最小有效格式： `"privacy": { "privacy": "private" }` （内部字符串并非单纯的 `"private"` —— `enums` 列出了该内部字段的允许值）。
`carto workflows schema config`	完整DAG配置（schemaVersion、connectionProvider枚举、nodes、edges、variables、viewport、useCache、executionSettings、schedule）
`carto workflows schema node`	通用节点结构，包括 `data.version` 要求以及 `data.title` 与 `data.label` 的区别
`carto workflows schema node.source`	Source/ `ReadTable` 节点结构，以及 `data.id == data.inputs[0].value` 的约束
`carto workflows schema node.customsql`	完整customsql节点规范
`carto workflows schema customsql`	可直接复制的customsql节点模板（含 `version: "2.0.0"` ）
`carto workflows schema edge`	边的结构
`carto workflows schema handles`	边句柄命名参考——按节点类型、操作符、组件划分的sourceHandle/targetHandle。对创建有效边至关重要。
`carto workflows schema variable`	变量（参数）结构—— `{ order, name, type, value, public }`
`carto workflows schema schedule`	声明式调度元数据字段
`carto workflows schema enums`	所有有效枚举值（节点类型、提供商、隐私设置、调度频率）
`carto workflows components list --connection <conn> --json`	已连接数据仓库的组件目录
`carto workflows components get <names> --connection <conn> --json`	单个组件的 `inputs` 、 `outputs` 、 `notes`
`carto workflows components get <names> --connection <conn> --input-formats --json`	输入类型的 `format` 、 `examples` 、 `pitfalls`
`carto workflows --help`	完整命令参考，包括各引擎对应的调度表达式语法

参考资料（仅用于CLI未覆盖的内容）：

```
references/providers/
```
——各数据仓库细节（BigQuery、Snowflake、Databricks）：标识符引用、列大小写、AT路径。
```
references/scheduling.md
```
——
```
add
```
与
```
update
```
的语义差异、bundle级调度警告、活动日志验证。
```
references/mcp-and-api-publish.md
```
——将工作流发布为MCP工具或可调用API端点：bundle要求（
```
native.mcptooloutput
```
+ 作用域变量 + 草稿描述）、
```
{{@var}}
```
与
```
@var
```
替换语法、
```
Number → FLOAT64
```
```
LIMIT
```
注意事项、发布后验证。

references/cross-profile-copy.md

——

workflows copy

机制、连接映射（

--connection-mapping

--connection

）、

--skip-source-validation

、复制为何始终生成新工作流。

```
references/schedule-readd.md
```
——调度不会随
```
workflows copy
```
转移；如何重新添加调度，包括源引擎与目标引擎不同时的语法转换。

connectionProvider
必须与连接匹配。
config.connectionProvider
（
schema enums
中的枚举值）必须与连接的实际提供商匹配——不匹配会生成错误的SQL方言并在运行时出错。使用
carto connections list --search <name> --json
查询（
connections get
需要UUID）。

Development process

开发流程

Follow these 6 phases in order for every workflow request. Do not skip or reorder them.

每个工作流请求都必须按以下6个阶段依次执行，不得跳过或调整顺序。

Phase 1 — Gather information

阶段1 — 收集信息

Identify data sources. If the user named tables, note them. Otherwise discover what's available with
```
carto connections list
```
and
```
carto connections describe <connection> "<fqn>"
```
.
Clarify the goal. What transformation? What output? What filters/conditions?
Determine the connection.
```
carto connections list | head -n 20
```
. Note its
```
provider
```
(
```
bigquery
```
/
```
snowflake
```
/
```
databricks
```
) — you will need it for the next step.
Read the provider reference.
<critical-rule id="read-provider-reference"> Before writing any node, you MUST open `references/providers/<provider>.md` (e.g. `references/providers/bigquery.md`) and read it end-to-end. This is non-negotiable.
<why>It contains identifier-quoting rules, column-casing behaviour, Analytics Toolbox path, schedule-expression dialect, and customsql
```
$a
```
/
```
$b
```
placeholder requirements that
```
validate
```
cannot catch. These only surface later as
```
verify-remote
```
failures or runtime SQL errors, and are the single most common cause of late-stage rework.</why>
<do-not>Do not skip this step because the next phases look concrete. Do not rely on memory of a previous run — provider files change.</do-not> </critical-rule>
Fetch the component catalog.
```
carto workflows components list --connection <connection> --json
```
— your only source of truth for component names.

确定数据源。如果用户指定了表，记录下来。否则使用
```
carto connections list
```
和
```
carto connections describe <connection> "<fqn>"
```
发现可用数据源。
明确目标。要进行什么转换？输出是什么？有哪些过滤/条件？

确定连接。执行

carto connections list | head -n 20

，记录其

provider

（

bigquery

snowflake

databricks

）——下一步会用到。

阅读提供商参考文档
<critical-rule id="read-provider-reference"> 在编写任何节点之前，你必须打开`references/providers/<provider>.md`（例如`references/providers/bigquery.md`）并完整阅读。这是硬性要求。
<why>文档包含标识符引用规则、列大小写行为、Analytics Toolbox路径、调度表达式语法以及customsql的
```
$a
```
/
```
$b
```
占位符要求，这些都是
```
validate
```
无法检测到的。这些问题只会在后续
```
verify-remote
```
失败或运行时SQL错误中暴露，是后期返工最常见的原因。</why>
<do-not>不要因为后续阶段看似具体就跳过此步骤。不要依赖之前的记忆——提供商文档可能会更新。</do-not> </critical-rule>
获取组件目录。执行
```
carto workflows components list --connection <connection> --json
```
——这是组件名称的唯一可信来源。

Phase 2 — Design the approach

阶段2 — 设计实现方案

Select components from the catalog you fetched.
Fetch schemas for every component you plan to use.
```
carto workflows components get <name1>,<name2>,<name3> --connection <connection> --json
```
returns
```
inputs
```
,
```
outputs
```
, and
```
notes
```
. Read the
```
notes
```
array carefully — it contains gotchas.

Fetch input type formats.

carto workflows components get <component1>,<component2> --connection <connection> --input-formats --json

returns

format

examples

, and

pitfalls

for each input/output type. Pass component names (e.g.

native.buffer

), NOT input-type names.

Design principles:
- Preserve identifier and spatial columns throughout.
- Prefer native components over
  native.customsql
  . This is not a soft preference. See Native-first rule.
- H3/Quadbin columns work for visualization without geometry extraction.
- Use standard names for visualization:
```
geom
```
  ,
```
h3
```
  ,
```
quadbin
```
  .

从目录中选择组件
获取所有计划使用的组件的Schema。执行
```
carto workflows components get <name1>,<name2>,<name3> --connection <connection> --json
```
返回
```
inputs
```
、
```
outputs
```
和
```
notes
```
。仔细阅读
```
notes
```
数组——其中包含注意事项。

获取输入类型格式。执行

carto workflows components get <component1>,<component2> --connection <connection> --input-formats --json

返回各输入/输出类型的

format

、

examples

和

pitfalls

。传入组件名称（例如

native.buffer

），而非输入类型名称。

设计原则：
- 全程保留标识符和空间列。
- 优先使用原生组件而非
  native.customsql
  。这不是软性偏好。详见原生优先规则。
- H3/Quadbin列无需提取几何即可用于可视化。
- 使用标准名称用于可视化：
```
geom
```
  、
```
h3
```
  、
```
quadbin
```
  。

Phase 3 — Present plan, surface gaps, confirm

阶段3 — 展示方案、暴露缺口、确认需求

Present the workflow plan (components, data flow, decisions). Then explicitly enumerate every gap before building:

Unresolved parameters — thresholds, radii, filter values, time windows, k for k-NN, aggregation columns, output table names, etc.
Analytical decisions left to the user — significance levels, distance metrics, join types, null-handling, dedup keys, CRS, H3/quadbin resolution.
Ambiguities in the request — anything where you had to guess intent.

For each gap, propose a sensible default with its rationale (e.g. "p-value threshold: suggest

0.05

— conventional significance level", "buffer distance: suggest

1000m

— matches the city-block scale of the input"), and ask the user to confirm or override. Never silently pick a value for a user-facing analytical parameter. Wait for confirmation before building.

展示工作流方案（组件、数据流、决策）。然后明确列出所有未解决的缺口再开始构建：

未解决的参数——阈值、半径、过滤值、时间窗口、k-NN的k值、聚合列、输出表名等。
留给用户的分析决策——显著性水平、距离度量、连接类型、空值处理、去重键、CRS、H3/quadbin分辨率。
需求中的歧义——任何你不得不猜测意图的内容。

对于每个缺口，提出合理的默认值并说明理由（例如“p值阈值：建议

0.05

——常规显著性水平”，“缓冲区距离：建议

1000m

——匹配输入数据的城市街区尺度”），并请求用户确认或修改。切勿擅自为面向用户的分析参数选择值。等待确认后再开始构建。

Phase 4 — Build the workflow

阶段4 — 构建工作流

Create the workflow file. Get the bundle/node/edge/variable shapes from
```
carto workflows schema [section]
```
(start with
```
bundle
```
, then
```
node
```
,
```
node.source
```
,
```
node.customsql
```
,
```
edge
```
,
```
handles
```
). For customsql nodes, copy the template from
```
carto workflows schema customsql
```
.
If you set the optional top-level
```
privacy
```
, it must be an object, not a string:
```
"privacy": { "privacy": "private" }
```
(the field name nests). Omit the field entirely if you don't need it —
```
"privacy": "private"
```
will fail
```
validate
```
.
Source nodes (
```
type: "source"
```
) — treat
```
ReadTable
```
like any other component: fetch its spec with
```
carto workflows components get ReadTable --connection <conn> --json
```
to get the canonical
```
inputs[*].title
```
and
```
inputs[*].description
```
. (
```
ReadTable
```
is hidden from
```
components list
```
because it's grouped
```
__internal
```
, but
```
get
```
returns it normally.) Two source-only rules
```
get
```
cannot tell you, both from
```
schema node.source
```
:
- The canvas display name lives in
```
data.label
```
  , NOT
```
data.title
```
  . Generic nodes use
```
title
```
  ; source nodes use
```
label
```
  .
- ```
data.id
```
  and
```
data.inputs[0].value
```
  must be the same FQN.
Canvas layout & naming — apply on every node, every workflow. None of this affects execution, but the user opens the DAG in Builder and a sloppy canvas reads as low quality. The numbers are small and stable; just apply them.
- Snap grid is 16 px. Every
```
x
```
  and
```
y
```
  you write must be
```
% 16 == 0
```
  . Builder snaps drags to this grid; off-grid values look subtly misaligned next to anything the user nudged.
- Card widths are fixed by node type: source nodes render at 192 px (12 cells), generic components at 64 px (4 cells). Knowing this is what lets you reason about gaps.
- Card heights are fixed: every component card and source card is 80 px (5 cells) tall, with a 16 px label rendered below the card body. The label is not part of the card — it lives in the gap to the next card.
- Canonical inter-card gap (right edge → next left edge): 80 px (5 cells) for tight linear placement; 128 px (8 cells) at a fan-in (a join's left input, where an edge from another row needs room). The gap is the constant; left-edge-to-left-edge Δx differs across patterns only because cards have different widths. So a generic→generic linear step is Δx=144 (9 cells); a source→generic step at the same gap is Δx=272 (17 cells); a generic→generic fan-in step is Δx=192 (12 cells).
- Canonical vertical gap (card body bottom → next card body top): 80 px (5 cells), of which the first 16 px is the card's label and the remaining 64 px is whitespace. The label always sits inside the gap, never inside the card. So a stacked-card step is top-to-top Δy = 160 px (10 cells) — 80 (body) + 16 (label) + 64 (whitespace).
- Layout. Source nodes stack at the leftmost column with the same
```
x
```
  , Δy = 144 px (9 cells). The main pipeline runs at the y-midline of the source rows — e.g. sources at y=80 and y=224 → pipeline at y=160. Joins on the midline visually receive both inputs symmetrically.
- data.title
  and
  data.label
  are different fields — never duplicate.
```
title
```
  = short instance-specific verb (≤ 15 chars) describing what this node does in this DAG (
```
"Rank"
```
  ,
```
"Join to score"
```
  ,
```
"To H3"
```
  ).
```
label
```
  = the component's canonical type name as Builder shows it on a fresh drop (
```
"Join"
```
  ,
```
"Create Column"
```
  ,
```
"H3 from GeoPoint"
```
  ) — read from
```
carto workflows components get <name> --json
```
  →
```
components[0].title
```
  . Source nodes only render
```
data.label
```
  on canvas (treat it as a short alias for the table:
```
"Candidates"
```
  ,
```
"Score grid C"
```
  ).
Run
validate
after every write to the file. It's offline, fast, and catches structural errors immediately:
bash
```
carto workflows validate workflow.json --json
```
Treat any save without a passing
```
validate
```
as broken — fix before continuing to the next node/edge.
validate
is authoritative. If a component schema from
```
components get
```
disagrees with what
```
validate
```
accepts, trust
```
validate
```
and adjust the bundle to satisfy it. Do not "fix" the bundle to match the schema if it's already passing validation.
Run
verify
at branch boundaries, not on every save. It hits the warehouse (slower, requires auth), so reserve it for whole sub-DAGs once their structure validates clean, and once at the end before presenting:
bash
```
carto workflows verify-remote workflow.json --connection <connection-name> --json
```
```
verify
```
is what catches column-type mismatches, missing tables, and AT resolution — things
```
validate
```
cannot see.
Fix errors silently — don't expose implementation details to the user.
Iterate until complete, with both
```
validate
```
and a final
```
verify
```
clean.

创建工作流文件。从
```
carto workflows schema [section]
```
获取bundle/node/edge/variable结构（先从
```
bundle
```
开始，然后是
```
node
```
、
```
node.source
```
、
```
node.customsql
```
、
```
edge
```
、
```
handles
```
）。对于customsql节点，从
```
carto workflows schema customsql
```
复制模板。
如果设置可选的顶层
```
privacy
```
，它必须是一个对象，而非字符串：
```
"privacy": { "privacy": "private" }
```
（字段名称嵌套）。如果不需要则完全省略该字段——
```
"privacy": "private"
```
会导致
```
validate
```
失败。
Source节点（
```
type: "source"
```
）——将
```
ReadTable
```
视为普通组件：使用
```
carto workflows components get ReadTable --connection <conn> --json
```
获取规范的
```
inputs[*].title
```
和
```
inputs[*].description
```
。（
```
ReadTable
```
在
```
components list
```
中隐藏，因为它属于
```
__internal
```
分组，但
```
get
```
命令可正常返回。）有两个
```
get
```
无法告知的Source节点专属规则，均来自
```
schema node.source
```
：
- 画布显示名称位于
```
data.label
```
  ，而非
```
data.title
```
  。通用节点使用
```
title
```
  ；Source节点使用
```
label
```
  。
- ```
data.id
```
  和
```
data.inputs[0].value
```
  必须是相同的FQN。
画布布局与命名——每个节点、每个工作流都需遵循。这些不影响执行，但用户会在Builder中打开DAG，混乱的画布会显得质量低下。数值较小且固定，直接应用即可。
- 对齐网格为16 px。你编写的每个
```
x
```
  和
```
y
```
  都必须满足
```
% 16 == 0
```
  。Builder会将拖拽操作对齐到此网格；非网格值与用户微调过的元素相邻时会显得轻微错位。
- 卡片宽度由节点类型决定：Source节点渲染宽度为192 px（12格），通用组件为64 px（4格）。了解这一点才能合理规划间距。
- 卡片高度固定：每个组件卡片和Source卡片高度均为80 px（5格），卡片下方会渲染16 px的标签。标签不属于卡片——位于下一张卡片的间距中。
- 标准卡片间距（右边缘→下一张左边缘）：线性紧凑布局为80 px（5格）；扇入处（连接的左输入，需要为另一行的边留出空间）为128 px（8格）。间距是固定值；左边缘到左边缘的Δx因卡片宽度不同而变化。因此通用→通用的线性步骤Δx=144（9格）；Source→通用的同间距步骤Δx=272（17格）；通用→通用的扇入步骤Δx=192（12格）。
- 标准垂直间距（卡片主体底部→下一张卡片主体顶部）：80 px（5格），其中前16 px是卡片标签，剩余64 px是空白。标签始终位于间距内，而非卡片内。因此堆叠卡片的顶部到顶部Δy=160 px（10格）——80（主体）+16（标签）+64（空白）。
- 布局：Source节点堆叠在最左侧列，
```
x
```
  相同，Δy=144 px（9格）。主流水线位于Source行的y中线——例如Source位于y=80和y=224 → 流水线位于y=160。连接位于中线时，视觉上能对称接收两个输入。
- data.title
  和
  data.label
  是不同字段——切勿重复。
```
title
```
  = 简短的实例化动词（≤15字符），描述此节点在当前DAG中的作用（
```
"Rank"
```
  、
```
"Join to score"
```
  、
```
"To H3"
```
  ）。
```
label
```
  = Builder中拖放新组件时显示的规范类型名称（
```
"Join"
```
  、
```
"Create Column"
```
  、
```
"H3 from GeoPoint"
```
  ）——从
```
carto workflows components get <name> --json
```
  →
```
components[0].title
```
  获取。Source节点仅在画布上渲染
```
data.label
```
  （将其视为表的简短别名：
```
"Candidates"
```
  、
```
"Score grid C"
```
  ）。
每次保存文件后都运行
validate
。它是离线的，速度快，可立即捕获结构错误：
bash
```
carto workflows validate workflow.json --json
```
任何未通过
```
validate
```
的保存都视为损坏——修复后再继续下一个节点/边。
validate
具有权威性。如果
```
components get
```
返回的组件Schema与
```
validate
```
接受的内容不一致，请信任
```
validate
```
并调整bundle以满足要求。如果bundle已通过验证，请勿为匹配Schema而“修改”bundle。
在分支节点处运行
verify
，而非每次保存都运行。它会访问数据仓库（速度较慢，需要授权），因此仅在子DAG结构验证通过后，以及最终提交前使用：
bash
```
carto workflows verify-remote workflow.json --connection <connection-name> --json
```
```
verify
```
会捕获列类型不匹配、表缺失、AT解析问题——这些都是
```
validate
```
无法检测到的。
静默修复错误——不要向用户暴露实现细节。
迭代至完成，确保
```
validate
```
和最终
```
verify
```
均通过。

Phase 5 — Present result

阶段5 — 提交成果

Summarize what was built. Confirm validation success. Wait for user confirmation.

总结构建内容，确认验证成功，等待用户确认。

Phase 6 — Upload to CARTO

阶段6 — 上传至CARTO

Ask if the user wants to upload.
Upload and provide the URL:
bash
```
carto workflows create --file workflow.json --verify
```
The connection comes from
```
connectionId
```
inside the bundle — no
```
--connection
```
flag here.
Do NOT auto-execute unless explicitly requested.

询问用户是否需要上传。

上传并提供URL：

bash

carto workflows create --file workflow.json --verify

连接信息来自bundle内的

connectionId

——无需

--connection

参数。

除非明确要求，否则不要自动执行。

Native-first rule

原生优先规则

native.customsql

is the last tool to reach for, not the first. Before writing a customsql node, attempt the native chain. Fall back to customsql only if at least one of these is true:

The native chain would require more than ~4-5 nodes to express the same logic.
A specific operation has no native equivalent at all (verified via
```
carto workflows components list
```
).
The expression genuinely needs raw warehouse SQL (e.g.,
```
ST_UNION_AGG
```
,
```
LOGICAL_OR
```
,
```
ML.PREDICT
```
, last-N windowing).

Common operations and their native equivalents — try these first:

If you'd write SQL like…	Use natives
`WHERE x = …` / multi-condition filter	`native.where` (predicate), `native.wheresimplified` (UI builder), `native.spatialfilter` (geometry-based match/unmatch split), `native.select` (column projection)
`SELECT a, b, c FROM t` (multi-column projection / rename / multi-expression)	`native.select` (one node, free-form SELECT body)
`SELECT ..., expr AS c FROM t` (add one computed column)	`native.selectexpression` (one column + one expression per node)
`GROUP BY k, SUM(x), AVG(y), COUNT(*)` (single key)	`native.groupby` — `groupby` input is a single `Column` , not multi-column. For multi-key grouping use `native.customsql` .
`JOIN ... ON a.k = b.k` (any join type)	`native.joinv2`
`JOIN ... ON ST_INTERSECTS / ST_CONTAINS / ST_WITHIN`	`native.spatialjoin`
`MIN(ST_DISTANCE(a.geom, b.geom))` across two tables	`native.distance` (augments the main table in place with `nearest_id` + `nearest_distance` — rename them per source if you chain two `native.distance` nodes for two reference tables)
`ST_BUFFER(geom, d)`	`native.buffer`
H3 binning / boundary / center / polyfill	`native.h3frompoint` , `native.h3boundary` (output geometry column is named `<h3col>_geo` , e.g. `index_geo` — not `geom` ), `native.h3center` , `native.h3polyfill`
`ORDER BY ... LIMIT n`	`native.orderby` + `native.limit`
z-score / standardization	`native.normalize`
weighted composite score	`native.spatialcompositeunsupervised` (weighted/PCA), `native.spatialcompositesupervised` (target-driven)
Getis-Ord Gi*, GWR, isolines	`native.getisord` , `native.gwr` , `native.isolines`
Save final node to a table	`native.saveastable`

Signals you're reaching for customsql too early — stop and look for a native chain instead:

The customsql is just a
```
WHERE
```
clause, a single
```
JOIN
```
, a
```
GROUP BY
```
with one or two aggregates, or a column projection.
It wraps a single warehouse function (
```
ST_BUFFER
```
,
```
H3_FROMGEOGPOINT
```
, etc.) for which a dedicated native exists.
Its only purpose is to project/rename/re-cast columns — use
```
native.select
```
(free-form SELECT body, one node) for multiple columns;
```
native.selectexpression
```
is for adding a single computed column.
You're chaining customsql outputs through more customsql nodes — chain natives instead.

When customsql is genuinely the right call, the per-warehouse SQL-dialect footguns live in the matching

references/providers/*.md

(BigQuery backticks, Snowflake casing, Databricks identifiers).

native.customsql

是最后的选择，而非首选。在编写customsql节点之前，尝试使用原生组件链。仅当满足以下至少一个条件时才退而求其次使用customsql：

使用原生组件链表达相同逻辑需要约4-5个以上节点。
特定操作完全没有原生等效组件（通过
```
carto workflows components list
```
验证）。
表达式确实需要原始数据仓库SQL（例如
```
ST_UNION_AGG
```
、
```
LOGICAL_OR
```
、
```
ML.PREDICT
```
、最近N条数据窗口函数）。

常见操作及其原生等效组件——优先尝试这些：

若你要编写此类SQL…	使用原生组件
`WHERE x = …` / 多条件过滤	`native.where` （谓词）、 `native.wheresimplified` （UI构建器）、 `native.spatialfilter` （基于几何的匹配/不匹配拆分）、 `native.select` （列投影）
`SELECT a, b, c FROM t` （多列投影/重命名/多表达式）	`native.select` （单个节点，自由形式SELECT主体）
`SELECT ..., expr AS c FROM t` （添加一个计算列）	`native.selectexpression` （每个节点对应一列+一个表达式）
`GROUP BY k, SUM(x), AVG(y), COUNT(*)` （单键）	`native.groupby` —— `groupby` 输入是单个 `Column` ，而非多列。多键分组请使用 `native.customsql` 。
`JOIN ... ON a.k = b.k` （任何连接类型）	`native.joinv2`
`JOIN ... ON ST_INTERSECTS / ST_CONTAINS / ST_WITHIN`	`native.spatialjoin`
跨两张表计算 `MIN(ST_DISTANCE(a.geom, b.geom))`	`native.distance` （在主表中添加 `nearest_id` + `nearest_distance` 字段——如果为两个参考表链式调用两个 `native.distance` 节点，请根据源表重命名这些字段）
`ST_BUFFER(geom, d)`	`native.buffer`
H3分箱/边界/中心/填充	`native.h3frompoint` 、 `native.h3boundary` （输出几何列名为 `<h3col>_geo` ，例如 `index_geo` ——不是 `geom` ）、 `native.h3center` 、 `native.h3polyfill`
`ORDER BY ... LIMIT n`	`native.orderby` + `native.limit`
z-score / 标准化	`native.normalize`
加权综合评分	`native.spatialcompositeunsupervised` （加权/PCA）、 `native.spatialcompositesupervised` （目标驱动）
Getis-Ord Gi*、GWR、等值线	`native.getisord` 、 `native.gwr` 、 `native.isolines`
将最终节点保存到表	`native.saveastable`

以下信号表明你过早使用customsql——停止操作并寻找原生组件链：

customsql仅包含
```
WHERE
```
子句、单个
```
JOIN
```
、带一两个聚合函数的
```
GROUP BY
```
或列投影。
它封装了单个数据仓库函数（
```
ST_BUFFER
```
、
```
H3_FROMGEOGPOINT
```
等），而已有专门的原生组件。
其唯一目的是投影/重命名/转换列——多列使用
```
native.select
```
（自由形式SELECT主体，单个节点）；添加单个计算列使用
```
native.selectexpression
```
。
你正在将customsql输出链式传递给更多customsql节点——改为链式调用原生组件。

当确实需要使用customsql时，各数据仓库的SQL方言陷阱在对应的

references/providers/*.md

中（BigQuery反引号、Snowflake大小写、Databricks标识符）。

Fetching component & input information

获取组件与输入信息

Do not rely on memorized component schemas or input formats. Always fetch live data from the CLI.

Command Purpose

Command	Purpose
`carto workflows components list --connection <conn> --json`	List all available components
`carto workflows components get <names> --connection <conn> --json`	Component schemas with `inputs` , `outputs` , and `notes`
`carto workflows components get <names> --connection <conn> --input-formats --json`	Input type `format` , `examples` , `pitfalls` for the types those components use

carto workflows components list --connection <conn> --json

List all available components

carto workflows components get <names> --connection <conn> --json

Component schemas with

inputs

outputs

, and

notes

carto workflows components get <names> --connection <conn> --input-formats --json

Input type

format

examples

pitfalls

for the types those components use

What to look for in the response:

Component
notes
— gotcha strings: non-obvious behavior, deprecated status, output column naming.
Input
format
— prose describing the expected value shape.
Input
examples
— concrete JSON snippets showing correct usage.
Input
pitfalls
— common mistakes, evaluation order, format quirks.
Component
version
— copy verbatim into the authored node's
```
data.version
```
(string). Generic nodes without it are flagged OUTDATED in Builder.
Input
options
(Selection / Enum) — the engine matches values exactly. Copy each option string verbatim — preserve case, never paraphrase or Title-Case (e.g. spatialjoin's
```
jointype
```
accepts
```
"inner"
```
, not
```
"Inner"
```
).

For values that may evolve over time (component versions, bundle/config defaults, enum option lists), treat the CLI's

components get

schema

output as the single source of truth — never hardcode values in your own templates. Specifically:

config.schemaVersion
— read the current default from
```
carto workflows schema config --json
```
→
```
properties.schemaVersion.default
```
. Today it's
```
"1.0.0"
```
(string), but resolve at author time so future bumps don't require a skill update.

不要依赖记忆中的组件Schema或输入格式。始终从CLI获取实时数据。

命令用途

命令	用途
`carto workflows components list --connection <conn> --json`	列出所有可用组件
`carto workflows components get <names> --connection <conn> --json`	组件Schema，包含 `inputs` 、 `outputs` 和 `notes`
`carto workflows components get <names> --connection <conn> --input-formats --json`	组件所用输入类型的 `format` 、 `examples` 、 `pitfalls`

carto workflows components list --connection <conn> --json

列出所有可用组件

carto workflows components get <names> --connection <conn> --json

组件Schema，包含

inputs

、

outputs

和

notes

carto workflows components get <names> --connection <conn> --input-formats --json

组件所用输入类型的

format

、

examples

、

pitfalls

响应中需要关注的内容：

组件
notes
——注意事项：非直观行为、已弃用状态、输出列命名。
输入
format
——描述预期值结构的说明文字。
输入
examples
——展示正确用法的具体JSON片段。
输入
pitfalls
——常见错误、求值顺序、格式 quirks。
组件
version
——原样复制到创建的节点的
```
data.version
```
（字符串）。没有该字段的通用节点在Builder中会被标记为OUTDATED。
输入
options
（选择/枚举）——引擎会精确匹配值。原样复制每个选项字符串——保留大小写，切勿改写或转为标题大小写（例如spatialjoin的
```
jointype
```
接受
```
"inner"
```
，而非
```
"Inner"
```
）。

对于可能随时间演变的值（组件版本、bundle/config默认值、枚举选项列表），将CLI的

components get

schema

输出视为唯一可信来源——切勿在自己的模板中硬编码值。具体来说：

config.schemaVersion
——从
```
carto workflows schema config --json
```
→
```
properties.schemaVersion.default
```
读取当前默认值。目前是
```
"1.0.0"
```
（字符串），但在创建时解析，以便未来版本升级无需更新技能。

Provider-specific notes

提供商特定说明

Different warehouses have different SQL dialects, table-naming conventions, and column-casing rules. Always check the matching provider guide:

```
references/providers/bigquery.md
```
```
references/providers/snowflake.md
```
```
references/providers/databricks.md
```

Input-type formats (

Table

Column

ColumnsForJoin

SelectColumnAggregation

, etc.) and per-component gotchas (including the "AT components need

verify

, not

validate

" rule) are served by the CLI itself — see Fetching component & input information.

不同数据仓库有不同的SQL方言、表命名约定和列大小写规则。务必查看对应的提供商指南：

```
references/providers/bigquery.md
```
```
references/providers/snowflake.md
```
```
references/providers/databricks.md
```

输入类型格式（

Table

、

Column

、

ColumnsForJoin

、

SelectColumnAggregation

等）和组件特定注意事项（包括“AT组件需要

verify

，而非

validate

”规则）由CLI直接提供——详见获取组件与输入信息。

Operating a workflow (after it's built)

工作流操作（构建完成后）

Once a workflow exists in CARTO, the CLI exposes CRUD and schedule management. Quick reference:

bash

undefined

工作流在CARTO中创建后，CLI提供增删改查和调度管理功能。快速参考：

bash

undefined

List / inspect

列出 / 查看详情

carto workflows list --json carto workflows get <id>

Update with edited JSON

使用编辑后的JSON更新

carto workflows update <id> --file workflow.json

Add / remove a schedule

添加 / 删除调度

carto workflows schedule add <id> --expression "every day 08:00" carto workflows schedule remove <id>


Always-on guidance:

- **Workflows run on the connection's warehouse.** A workflow with a BigQuery connection cannot use Snowflake-specific SQL.
- **Schedule expression syntax depends on the engine** — natural-language for BQ/CARTO DW (`"every day 08:00"`), cron for Snowflake/Postgres (`"0 8 * * *"`), Quartz cron for Databricks (`"0 0 8 * * ?"`). See [`references/scheduling.md`](references/scheduling.md). Picking the wrong dialect fails at schedule-add time.
- **Copying a workflow across profiles** (dev → prod, customer-segregated workspaces) is covered in [`references/cross-profile-copy.md`](references/cross-profile-copy.md). Schedules don't transfer — see [`references/schedule-readd.md`](references/schedule-readd.md).
- **Deleting a workflow doesn't delete its outputs.** Tables/views the workflow created in the warehouse persist; clean them up with `carto sql job` if needed.
- **`workflows update` replaces the whole DAG.** There's no per-node patch. Always `get` first, edit, then `update`.
- **Workflow execution status** lives in the activity log (`WorkflowRun`, `WorkflowExecutionComplete` event types). For health monitoring of scheduled workflows, query that log via [`carto-query-datawarehouse`](../carto-query-datawarehouse) — see `references/activity-queries.md` in that skill.

carto workflows schedule add <id> --expression "every day 08:00" carto workflows schedule remove <id>


通用指导：

- **工作流在连接的数据仓库上运行**。使用BigQuery连接的工作流无法使用Snowflake专属SQL。
- **调度表达式语法取决于引擎**——BQ/CARTO DW使用自然语言（`"every day 08:00"`），Snowflake/Postgres使用cron（`"0 8 * * *"`），Databricks使用Quartz cron（`"0 0 8 * * ?"`）。详见[`references/scheduling.md`](references/scheduling.md)。选择错误语法会在添加调度时失败。
- **跨配置文件复制工作流**（dev → prod、客户隔离工作区）详见[`references/cross-profile-copy.md`](references/cross-profile-copy.md)。调度不会随复制转移——详见[`references/schedule-readd.md`](references/schedule-readd.md)。
- **删除工作流不会删除其输出**。工作流在数据仓库中创建的表/视图会保留；如需清理请使用`carto sql job`。
- **`workflows update`会替换整个DAG**。不支持单个节点补丁。始终先`get`、编辑、再`update`。
- **工作流执行状态**位于活动日志中（`WorkflowRun`、`WorkflowExecutionComplete`事件类型）。如需监控调度工作流的健康状态，请通过[`carto-query-datawarehouse`](../carto-query-datawarehouse)查询该日志——详见该技能中的`references/activity-queries.md`。