carto-spatial-enrichment
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseSpatial Enrichment in CARTO Workflows
CARTO工作流中的空间富集
Prerequisites: Load for the development process.
carto-create-workflowThis skill covers the universal pattern for enriching spatial data with demographics, risk scores, or any variable from a spatial features dataset.
前置条件:开发过程中需加载。
carto-create-workflow本技能涵盖了使用人口统计数据、风险评分或空间特征数据集中的任意变量来富集空间数据的通用模式。
Instructions
操作步骤
Follow the 5-step universal enrichment pattern. Each step maps to one or more workflow components.
遵循以下5步通用富集模式,每个步骤对应一个或多个工作流组件。
Step 1: Load source data
步骤1:加载源数据
Load the business entities (stores, points, polygons) that need enrichment.
- Points or polygons from a table: use a source node
ReadTable - Custom geometry: use to inline a GeoJSON polygon
native.tablefromgeojson
加载需要进行富集的业务实体(门店、点、多边形)。
- 来自表格的点或多边形:使用源节点
ReadTable - 自定义几何图形:使用嵌入GeoJSON多边形
native.tablefromgeojson
Step 2: Define target area
步骤2:定义目标区域
Choose one method based on the use case:
| Method | Component | When to use |
|---|---|---|
| Buffer | | Distance-based area (e.g. 1km around each store) |
| Isochrones | | Drive-time or walk-time areas |
| Direct polygon | (none needed) | Source data is already polygons |
| Direct points | (none needed) | Skip to Step 4 with |
根据使用场景选择以下方法之一:
| 方法 | 组件 | 使用场景 |
|---|---|---|
| 缓冲区 | | 基于距离的区域(例如每个门店周边1公里范围) |
| 等时线 | | 驾车或步行可达区域 |
| 直接使用多边形 | (无需组件) | 源数据本身已是多边形 |
| 直接使用点 | (无需组件) | 跳过步骤3,直接使用 |
Step 3: Spatial indexing (polyfill)
步骤3:空间索引(多边形填充)
Convert areas into a grid for enrichment. This step is required when using grid-based enrichment () or manual JOIN.
native.h3enrich- H3 grid (most common): -- set
native.h3polyfillto match the enrichment datasetresolution - Quadbin grid: -- set
native.quadbinpolyfillto match the enrichment datasetresolution
Key decision -- index type: Use H3 unless the enrichment data is natively in Quadbin.
将区域转换为网格以进行富集。当使用基于网格的富集()或手动JOIN时,此步骤为必填项。
native.h3enrich- H3网格(最常用):—— 设置
native.h3polyfill以匹配富集数据集resolution - Quadbin网格:—— 设置
native.quadbinpolyfill以匹配富集数据集resolution
关键决策——索引类型:除非富集数据原生为Quadbin格式,否则默认使用H3。
Step 4: Enrich
步骤4:执行富集
Two approaches, each with different column naming:
A) CARTO ENRICH procedures (recommended for Data Observatory or spatial features data):
- -- enrich an H3 grid
native.h3enrich - -- enrich points directly (skip Step 3)
native.enrichpoints - -- enrich polygons directly (skip Step 3)
native.enrichpolygons - Output columns are named (e.g.
{variable}_{aggregation},population_sum)air_quality_avg
B) Manual JOIN on the spatial index column:
- Use with the H3/Quadbin column as the join key
native.join - Output columns from the secondary table get a suffix
_joined - Default is INNER JOIN (silently drops unmatched cells)
Aggregation method guidance:
- -- population counts, totals
SUM - /
MAX-- risk scores, thresholdsMIN - -- quality metrics, indices
AVG
有两种方法,各自的列命名规则不同:
A) CARTO ENRICH 流程(推荐用于数据观测站或空间特征数据):
- —— 富集H3网格
native.h3enrich - —— 直接富集点数据(跳过步骤3)
native.enrichpoints - —— 直接富集多边形数据(跳过步骤3)
native.enrichpolygons - 输出列命名格式为(例如
{variable}_{aggregation}、population_sum)air_quality_avg
B) 基于空间索引列的手动JOIN:
- 使用,以H3/Quadbin列作为连接键
native.join - 从次级表输出的列会添加后缀
_joined - 默认使用INNER JOIN(自动丢弃未匹配的单元格)
聚合方法指南:
- —— 人口统计、总计类数据
SUM - /
MAX—— 风险评分、阈值类数据MIN - —— 质量指标、指数类数据
AVG
Step 5: Save results
步骤5:保存结果
Use to persist the enriched output.
native.saveastableIf the goal is per-entity enrichment (e.g. population per store), add a second JOIN + GROUP BY to aggregate grid-level results back to the source entity level.
Success: The workflow loads source data, defines areas, indexes to a grid (if needed), enriches with the target variables using the correct aggregation, and saves the result. Column names in downstream references match the enrichment method used.
使用保存富集后的输出结果。
native.saveastable如果目标是实现按实体富集(例如每个门店的人口数据),需添加第二次JOIN + GROUP BY操作,将网格级别的结果聚合回源实体级别。
成功标准:工作流需完成源数据加载、区域定义、网格索引(如需要)、使用正确聚合方式富集目标变量,并保存结果。下游引用中的列名需与所使用的富集方法匹配。
Gotchas
注意事项
- Provider casing & SQL dialect. This skill documents output column names in lowercase (,
population_sum,air_quality_avg, etc.) — BigQuery / Databricks / Postgres / Redshift convention. On Snowflake, unquoted identifiers surface UPPERCASE — reference them as<column>_joined,POPULATION_SUM,AIR_QUALITY_AVG. See<COLUMN>_JOINEDfor casing rules and SQL dialect equivalents.carto-create-workflow/references/providers/<provider>.md - Resolution alignment is critical. The polyfill resolution MUST match the enrichment dataset's native resolution (e.g. H3 resolution 8 with resolution 8 spatial features). A mismatch produces zero JOIN matches with NO error.
- Manual JOIN drops unmatched cells. defaults to INNER JOIN, silently dropping cells with no enrichment data. Use LEFT JOIN if completeness matters.
native.join - Deduplicate after polyfill. Use or GROUP BY on the index column to remove duplicate cells. If you need to preserve source identity (e.g. which store each cell came from), set
SELECT DISTINCTin the polyfill node.includecols: true - Column naming differs by method. ENRICH procedures produce columns. Manual JOIN produces
{variable}_{aggregation}columns. Plan downstream SQL references accordingly.{column}_joined - Buffer distance is in meters. Isoline range units depend on type: seconds for time-based, meters for distance-based.
- Re-aggregation needed for entity-level results. After grid enrichment, data is at the cell level. To get per-store or per-location totals, add a second JOIN + GROUP BY step to roll cell-level values back to the source entity.
- 提供商大小写规则与SQL方言:本技能文档中的输出列名为小写格式(、
population_sum、air_quality_avg等)—— 这是BigQuery / Databricks / Postgres / Redshift的约定。在Snowflake中,未加引号的标识符会显示为大写——需以<column>_joined、POPULATION_SUM、AIR_QUALITY_AVG的形式引用。请查看<COLUMN>_JOINED获取大小写规则和SQL方言对应说明。carto-create-workflow/references/providers/<provider>.md - 分辨率对齐至关重要:多边形填充的分辨率必须与富集数据集的原生分辨率匹配(例如H3分辨率8对应分辨率8的空间特征数据)。不匹配会导致JOIN无匹配结果且无错误提示。
- 手动JOIN会丢弃未匹配单元格:默认使用INNER JOIN,会自动丢弃无富集数据的单元格。如果需要保证完整性,请使用LEFT JOIN。
native.join - 多边形填充后需去重:对索引列使用或GROUP BY来移除重复单元格。如果需要保留源实体标识(例如每个单元格来自哪个门店),请在多边形填充节点中设置
SELECT DISTINCT。includecols: true - 列命名因方法而异:ENRICH流程生成格式的列,手动JOIN生成
{variable}_{aggregation}格式的列。请据此规划下游SQL引用。{column}_joined - 缓冲区距离单位为米:等时线范围单位取决于类型:基于时间的为秒,基于距离的为米。
- 实体级结果需重新聚合:网格富集后的数据为单元格级别。如需获取每个门店或位置的总计数据,需添加第二次JOIN + GROUP BY步骤,将单元格级别的值汇总回源实体级别。
Reference Templates
参考模板
Templates included in this skill folder (from the CARTO Workflows template repository):
| File | Pattern | Description |
|---|---|---|
| enrich_grid.json | GeoJSON polygon -> H3 polyfill -> ENRICH_GRID | Enrich a custom area with sociodemographic H3 data |
| enrich_points.json | Filter points -> ENRICH_POINTS | Enrich point locations with polygon-based risk data |
| estimate_population_around_retail_stores.json | Points -> Buffer -> H3 polyfill -> JOIN -> GROUP BY | Full entity-level enrichment with re-aggregation |
本技能文件夹中包含以下模板(来自CARTO工作流模板仓库):
| 文件 | 模式 | 描述 |
|---|---|---|
| enrich_grid.json | GeoJSON多边形 -> H3多边形填充 -> ENRICH_GRID | 使用社会人口统计H3数据富集自定义区域 |
| enrich_points.json | 过滤点 -> ENRICH_POINTS | 使用基于多边形的风险数据富集点位置 |
| estimate_population_around_retail_stores.json | 点 -> 缓冲区 -> H3多边形填充 -> JOIN -> GROUP BY | 包含重新聚合的完整实体级富集流程 |
Common Variations
常见变体
| Variation | Steps used | Key differences |
|---|---|---|
| Enrich points directly | 1 -> 4 -> 5 | Skip grid; use |
| Enrich polygons directly | 1 -> 4 -> 5 | Skip grid; use |
| Buffer + grid enrichment | 1 -> 2 -> 3 -> 4 -> 5 | |
| Isochrone + grid enrichment | 1 -> 2 -> 3 -> 4 -> 5 | |
| Re-aggregate to source entity | 1 -> 2 -> 3 -> 4 -> JOIN + GROUP BY -> 5 | Add second JOIN to map cells back to source entities |
| 变体 | 使用步骤 | 核心差异 |
|---|---|---|
| 直接富集点数据 | 1 -> 4 -> 5 | 跳过网格;使用 |
| 直接富集多边形数据 | 1 -> 4 -> 5 | 跳过网格;使用 |
| 缓冲区+网格富集 | 1 -> 2 -> 3 -> 4 -> 5 | 先使用 |
| 等时线+网格富集 | 1 -> 2 -> 3 -> 4 -> 5 | 先使用 |
| 重新聚合至源实体 | 1 -> 2 -> 3 -> 4 -> JOIN + GROUP BY -> 5 | 添加第二次JOIN,将单元格映射回源实体 |