carto-hotspot-analysis

Hotspot Analysis with Getis-Ord Gi*

Builds CARTO Workflows that identify statistically significant spatial clusters (hotspots and coldspots) using the Getis-Ord Gi* statistic.
Prerequisites: Load `carto-create-workflow` for the development process, JSON structure, and validation commands.

Instructions

A hotspot workflow always follows this pipeline:
Source Data → (Filter) → Spatial Indexing → Aggregation → Getis-Ord Gi* → (Filter Significant) → Save

Step 1: Load Source Data

Use `native.gettablebyname`. The input table typically contains point geometries.
Success: The node outputs a table with a geometry column (e.g. `geom`).

Step 2: Filter (if needed)

Use `native.wheresimplified` or `native.where` to narrow the dataset before analysis (e.g. filter by category, date range, or non-null values).
Success: The output contains only the subset relevant to the analysis.

Step 3: Build a Complete Grid

Preferred approach: First polyfill the study area boundary (e.g. district polygons) with `native.h3polyfill` to create a complete, gap-free grid. Then enrich this grid with the data to analyze (e.g. count points per cell via `native.h3enrich` or a manual join + group by). This ensures every cell in the study area has a value (even if 0), which Getis-Ord needs: gaps in the grid distort the neighborhood calculations and can produce misleading results.
Simpler alternative (when no study area boundary is available): Convert point geometries directly to grid cells with `native.h3frompoint` or `native.quadbinfromgeopoint`. Be aware that this only produces cells where data exists, leaving gaps that may affect the statistic.
Resolution guidance: higher resolution = smaller cells = more local patterns. Cell sizes below are approximate H3 averages.

| Resolution | Avg. cell area | Approx. edge length | Use case |
|---|---|---|---|
| H3 res 7 | ~5.2 km² | ~1.2 km | District/city-level patterns |
| H3 res 8 | ~0.74 km² | ~0.5 km | Neighborhood-level patterns |
| H3 res 9 | ~0.1 km² | ~0.2 km | Street-level patterns |

Success: A contiguous grid covering the study area, with every cell assigned a spatial index column (e.g. `h3`).
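
To gauge how many cells a resolution will produce before running the workflow, divide the study area by the average cell area. A minimal Python sketch, assuming the published H3 average hexagon areas; real counts differ somewhat because of boundary cells, pentagons, and latitude, and the 100 km² study area and the `estimate_cell_count` helper are hypothetical:

```python
# Approximate average H3 cell areas in km² (from the published H3 cell statistics).
AVG_CELL_AREA_KM2 = {7: 5.161, 8: 0.737, 9: 0.105}


def estimate_cell_count(study_area_km2: float, resolution: int) -> int:
    """Rough number of H3 cells needed to cover a study area (ignores boundary effects)."""
    return round(study_area_km2 / AVG_CELL_AREA_KM2[resolution])


# Hypothetical 100 km² study area.
for res in sorted(AVG_CELL_AREA_KM2):
    print(f"res {res}: ~{estimate_cell_count(100.0, res)} cells")
```

Each step up in H3 resolution multiplies the cell count by roughly 7, which is why a resolution that is too fine over a large area quickly becomes expensive.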

Step 4: Aggregate per Cell

Use `native.groupby` to produce one row per cell with a numeric value:
  • Group by: the spatial index column (`h3`)
  • Aggregation: `h3,count` (or `value_col,sum` / `value_col,avg`)
If using the polyfill approach, cells with no data should have a value of 0: after joining, apply `COALESCE(h3_count, 0)` via `native.selectexpression` (substitute the name of your aggregated column).
Success: The output has exactly one row per unique cell with a count/sum column, and no gaps.
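
The count-then-fill logic of this step can be sketched in plain Python. This is an illustration only, assuming cells are represented by hypothetical string IDs; `counts_per_cell` is a hypothetical helper, not a CARTO component:

```python
from collections import Counter


def counts_per_cell(point_cells, grid_cells):
    """Count points per grid cell, giving every grid cell a value.

    point_cells: the cell ID of each input point (one entry per point).
    grid_cells:  every cell ID covering the study area (the polyfilled grid).
    Points whose cell is not in the grid are dropped, as in a LEFT JOIN from the grid.
    """
    counts = Counter(point_cells)  # like GROUP BY cell with COUNT(*)
    # Like LEFT JOIN grid -> counts, then COALESCE(h3_count, 0).
    return {cell: counts.get(cell, 0) for cell in grid_cells}


grid = ["cell_a", "cell_b", "cell_c", "cell_d"]  # hypothetical cell IDs
points = ["cell_a", "cell_a", "cell_c", "cell_x"]
print(counts_per_cell(points, grid))
# {'cell_a': 2, 'cell_b': 0, 'cell_c': 1, 'cell_d': 0}
```

Starting from the grid rather than from the points is what guarantees the zero-valued cells that Getis-Ord needs.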

Step 5: Run Getis-Ord Gi*

Use `native.getisord` with:

| Input | Description | Default |
|---|---|---|
| `indexcol` | Column with H3/Quadbin indexes | `h3` |
| `valuecol` | Numeric column to analyze | `h3_count` |
| `kernel` | Weighting function for neighbors | `uniform` |
| `size` | K-ring size (neighborhood radius in hops) | `3` |

Kernel options: `uniform`, `triangular`, `quadratic`, `quartic`, `gaussian`. Default to `uniform` (equal weight to all neighbors) unless the user has a reason to decay weight with distance.
K-ring size: Larger = smoother, broader patterns. Smaller = more localized clusters.
Success: The output contains `index`, `gi` (z-score), and `p_value` columns for every cell. (See the Provider casing note in Gotchas: Snowflake surfaces these in UPPERCASE.)
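
For intuition about what `gi` and `p_value` mean, here is a minimal pure-Python sketch of the textbook Getis-Ord Gi* z-score with a uniform (binary) kernel. It is not CARTO's implementation: it assumes a small square grid whose Chebyshev-distance rings stand in for H3/Quadbin k-rings, and a two-tailed p-value from the standard normal distribution; the Analytics Toolbox may differ in details such as kernel weighting and p-value convention. `k_ring` and `getis_ord_gi_star` are hypothetical helper names.

```python
import math


def k_ring(cell, k, grid_size):
    """Cells within k steps (Chebyshev distance) of `cell`, including itself,
    on a grid_size x grid_size square grid. Stand-in for an H3/Quadbin k-ring."""
    r, c = cell
    return [
        (i, j)
        for i in range(max(0, r - k), min(grid_size, r + k + 1))
        for j in range(max(0, c - k), min(grid_size, c + k + 1))
    ]


def getis_ord_gi_star(values, k, grid_size):
    """Textbook Gi* with uniform (binary) weights; returns {cell: (gi, p_value)}.

    gi_i = (sum_j w_ij x_j - mean * sum_j w_ij)
           / (s * sqrt((n * sum_j w_ij**2 - (sum_j w_ij)**2) / (n - 1)))
    where j ranges over the k-ring of i (i included), and mean and s are the
    mean and population standard deviation of all n cell values.
    `values` must contain every cell of the grid (gap-free, as in Step 3);
    a missing neighbor raises KeyError here. Assumes values are not all equal (s > 0).
    """
    n = len(values)
    mean = sum(values.values()) / n
    s = math.sqrt(sum(v * v for v in values.values()) / n - mean ** 2)
    result = {}
    for cell in values:
        neighbors = k_ring(cell, k, grid_size)
        w_sum = len(neighbors)     # sum of weights (every weight is 1)
        w_sq_sum = len(neighbors)  # sum of squared weights (1**2 == 1)
        local_sum = sum(values[nb] for nb in neighbors)
        denom = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        gi = (local_sum - mean * w_sum) / denom
        p_value = math.erfc(abs(gi) / math.sqrt(2))  # two-tailed, standard normal
        result[cell] = (gi, p_value)
    return result


# Hypothetical 5x5 grid: all zeros except a 2x2 block of 10s in one corner.
grid = {(r, c): 0 for r in range(5) for c in range(5)}
for cell in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    grid[cell] = 10
scores = getis_ord_gi_star(grid, k=1, grid_size=5)
```

With `k=1`, the corner cell inside the block of 10s gets a large positive `gi` (a hotspot at p < 0.05), while the opposite corner gets a small negative `gi` that is not significant.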

Step 6: Filter Significant Results (optional)

Use `native.where` to keep only statistically significant cells:
  • `p_value < 0.05`: significant at 95% confidence
  • `p_value < 0.05 AND gi > 0`: hotspots only
  • `p_value < 0.05 AND gi < 0`: coldspots only
Success: Only cells with statistically significant clustering remain.
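
The three conditions above split cells into hotspots, coldspots, and non-significant cells. A minimal Python sketch of the same logic on hypothetical `(index, gi, p_value)` rows; `classify_cell` and its label strings are illustrative, not output of the component:

```python
def classify_cell(gi: float, p_value: float, alpha: float = 0.05) -> str:
    """Label a cell the way the Step 6 filters split it."""
    if p_value >= alpha:
        return "not significant"
    # A significant cell has a nonzero z-score, so the sign decides the label.
    return "hotspot" if gi > 0 else "coldspot"


rows = [("cell_a", 2.8, 0.005), ("cell_b", -2.1, 0.036), ("cell_c", 0.4, 0.69)]  # hypothetical
hotspots = [idx for idx, gi, p in rows if classify_cell(gi, p) == "hotspot"]
print(hotspots)  # ['cell_a']
```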

Step 7: Save

Use `native.saveastable` to persist results. The H3/Quadbin column can be visualized directly in CARTO Builder without converting it to geometry.
Success: A validated workflow that can be uploaded via `carto workflows create`.

Output Columns

| Column | Meaning |
|---|---|
| `index` | Spatial index cell ID (H3 or Quadbin) |
| `gi` | Gi* z-score: positive = hotspot, negative = coldspot |
| `p_value` | Statistical significance: lower = more confident |

The engine declares these in lowercase. See the Provider casing note in Gotchas for Snowflake.

Gotchas

  • Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface in UPPERCASE, so reference `H3`, `INDEX`, `GI`, `P_VALUE`, and `H3_COUNT` in expressions. For dialect-specific SQL fragments (e.g. `DATETIME_TRUNC` below), see the equivalents table in `carto-create-workflow/references/providers/<provider>.md`.
  • The Getis-Ord component requires the Analytics Toolbox (AT). Always run `carto workflows verify-remote --connection <conn>` to ensure the AT path is resolved; `carto workflows validate` runs offline and cannot resolve the AT location.
  • The output column is named `index`, not `h3` or `quadbin`. If you need to join back to the original data, rename it (e.g. with `native.renamecolumn`).
  • If you call `native.h3boundary` to materialize cell geometries for visualization, the new column is named `<h3col>_geo` (e.g. `index_geo`), not `geom`. Reference it accordingly in downstream nodes.
  • The `valuecol` must be numeric. If you're counting features, the group-by step must produce a count column; don't pass the raw index column as the value.
  • A high resolution over a large area produces a very large number of cells (each H3 resolution step multiplies the cell count by roughly 7), which can be slow or hit memory limits. Start with a moderate resolution and refine.
  • An empty result from the filter step (Step 6) usually means the k-ring size is too small or the data is too sparse for significant clustering. Try increasing `size` or lowering the resolution.
  • Date columns must be DATETIME type for spacetime Getis-Ord. CAST if your data has DATE or TIMESTAMP.
  • The temporal bandwidth strongly affects results: `bandwidth=1` detects rapid changes, while `bandwidth=3` or more smooths over longer trends.
  • For time-series clustering, pre-filter to significant cells only (the 60% heuristic under Time Series Clustering) to avoid clustering noise.
  • The spacetime classification component consumes the full spacetime Gi* output. Do NOT filter by `p_value` before classification, or the trend test will run on incomplete time series.
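
The `index` naming gotcha mostly matters when joining results back to other tables. A minimal Python sketch of the rename-then-join step, treating tables as lists of dicts; the rows and the helpers `rename_column` and `inner_join` are hypothetical, and in a workflow the rename would be `native.renamecolumn` followed by a join node:

```python
def rename_column(rows, old, new):
    """Return copies of dict rows with column `old` renamed to `new`."""
    return [{(new if key == old else key): value for key, value in row.items()} for row in rows]


def inner_join(left, right, key):
    """Inner-join two lists of dict rows on a shared column name."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


gi_output = [{"index": "cell_a", "gi": 2.8, "p_value": 0.005}]  # hypothetical Gi* rows
attributes = [{"h3": "cell_a", "district": "Centre"}, {"h3": "cell_b", "district": "North"}]

# Joining on "h3" only works after the Gi* output's "index" column is renamed.
joined = inner_join(rename_column(gi_output, "index", "h3"), attributes, "h3")
print(joined)  # [{'h3': 'cell_a', 'gi': 2.8, 'p_value': 0.005, 'district': 'Centre'}]
```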

Spacetime Variants

Getis-Ord Spacetime (`native.getisordspacetime`):
  • Extends basic Gi* to detect clusters in both space AND time.
  • Additional inputs: `kerneltime` (uniform/gaussian), `bandwidth` (number of time steps), `timeinterval` (week/month/day).
  • Data must be pre-aggregated into time bins (e.g. weekly counts per H3 cell).
  • Pipeline: points -> H3 -> create time column (BigQuery: `DATETIME_TRUNC(CAST(datetime AS TIMESTAMP), WEEK)`; Snowflake / Databricks / Postgres: `DATE_TRUNC('WEEK', datetime)`) -> GROUP BY (h3, time_bin) -> Getis-Ord Spacetime -> filter `p_value < 0.05 AND gi > 0`.
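
The time-binning and GROUP BY steps of that pipeline can be sketched in plain Python. This sketch assumes weeks start on Monday (the convention of Postgres `date_trunc('week', ...)`); BigQuery's `WEEK` granularity starts on Sunday, so check your provider before comparing results. `week_start` and `weekly_counts` are hypothetical helpers and the events are made up:

```python
from collections import Counter
from datetime import datetime, timedelta


def week_start(ts: datetime) -> datetime:
    """Truncate a timestamp to 00:00 on the Monday of its week."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(days=day.weekday())  # weekday(): Monday == 0


def weekly_counts(events):
    """events: iterable of (cell_id, timestamp) pairs.
    Returns a Counter keyed by (cell_id, week_start), i.e. GROUP BY (h3, time_bin)."""
    return Counter((cell, week_start(ts)) for cell, ts in events)


events = [  # hypothetical (cell, timestamp) pairs
    ("cell_a", datetime(2024, 5, 13, 8, 0)),   # Monday
    ("cell_a", datetime(2024, 5, 19, 23, 0)),  # Sunday of the same week
    ("cell_a", datetime(2024, 5, 20, 1, 0)),   # following Monday
    ("cell_b", datetime(2024, 5, 14, 12, 0)),
]
for (cell, week), count in sorted(weekly_counts(events).items()):
    print(cell, week.date(), count)
```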
Spacetime Hotspot Classification (`native.spacetimehotspotsclassification`):
  • Chains AFTER the Getis-Ord Spacetime output.
  • Classifies each cell's temporal trend: new hotspot, consecutive, intensifying, diminishing, sporadic, oscillating, historical.
  • Uses the Modified Mann-Kendall trend test with a significance threshold (default 0.05).
  • Pipeline: ... -> Getis-Ord Spacetime -> Spacetime Hotspots Classification.
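
To show what a Mann-Kendall trend test measures, here is a minimal sketch of the plain (unmodified) test applied to one cell's Gi* series. The Modified version named above additionally corrects the variance for autocorrelation, and this sketch also omits the tie correction, so treat it as an illustration only; `mann_kendall` is a hypothetical helper and the series is made up:

```python
import math


def mann_kendall(series):
    """Plain Mann-Kendall trend test (no tie or autocorrelation correction).

    Returns (s, z, p_value): s > 0 suggests an increasing trend, s < 0 a decreasing one.
    """
    n = len(series)
    # S = sum of sign(x_j - x_i) over all pairs i < j
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed, standard normal
    return s, z, p_value


print(mann_kendall([0.5, 1.1, 1.4, 2.0, 2.2, 2.9, 3.1, 3.8]))  # hypothetical rising Gi* series
```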
Time Series Clustering (`native.timeseriesclustering`):
  • Groups locations by the similarity of their temporal Gi* patterns.
  • Chain: Getis-Ord Spacetime -> filter significant cells -> Cluster Time Series.
  • Method: `profile` (shape-based) or `value` (magnitude-based).
  • Filtering heuristic from the template: keep cells where at least 60% of time steps have `p_value < 0.05`.
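
The 60% rule above can be expressed as a short Python sketch over hypothetical per-cell `p_value` series. `persistent_significant_cells` is an illustrative helper, and its 0.05 and 0.6 defaults mirror the thresholds stated above rather than any component parameter:

```python
def persistent_significant_cells(p_values_by_cell, alpha=0.05, min_share=0.6):
    """Keep cells whose p_value is below alpha in at least min_share of their time steps."""
    keep = []
    for cell, p_values in p_values_by_cell.items():
        share = sum(p < alpha for p in p_values) / len(p_values)
        if share >= min_share:
            keep.append(cell)
    return keep


series = {  # hypothetical per-cell p_values over five weeks
    "cell_a": [0.01, 0.02, 0.03, 0.40, 0.60],  # 3/5 significant -> kept
    "cell_b": [0.01, 0.50, 0.70, 0.40, 0.90],  # 1/5 significant -> dropped
}
print(persistent_significant_cells(series))  # ['cell_a']
```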

Reference Templates

These files are working examples (skill-local files in `hotspot-analysis/`, others in the project root):

| File | Description |
|---|---|
| `poi_hotspot.json` | Stockholm amenity POIs: H3 res 9, uniform kernel, k=3 |
| `space_time_hotspot.json` | Barcelona accidents: spacetime Gi*, H3 res 9, weekly bins |
| `spacetime_hotspot_classification.json` | London collisions: spacetime Gi* + classification, gaussian kernel |

Common Variations

| Variant | How |
|---|---|
| Polygon input instead of points | Use `native.h3polyfill` instead of `native.h3frompoint` |
| Enrich existing grid | Use `native.h3enrich` to count points into a grid (avoids manual group-by + join) |
| Combine with other data | Join Getis-Ord output with enrichment or attribute tables before saving |
| Spacetime hotspots | Use `native.getisordspacetime` (see the Spacetime Variants section above) |
| Classify hotspot trends | Use `native.spacetimehotspotsclassification`, chained after the spacetime Gi* output |