carto-hotspot-analysis

Hotspot Analysis with Getis-Ord Gi*

Builds CARTO Workflows that identify statistically significant spatial clusters (hotspots and coldspots) using the Getis-Ord Gi* statistic.

Prerequisites: Load `carto-create-workflow` for the development process, JSON structure, and validation commands.

Instructions

A hotspot workflow always follows this pipeline:

Source Data → (Filter) → Spatial Indexing → Aggregation → Getis-Ord Gi* → (Filter Significant) → Save

Step 1: Load Source Data

Use `native.gettablebyname`. The input table typically contains point geometries.

Success: Node outputs a table with a geometry column (e.g. `geom`).

Step 2: Filter (if needed)

Use `native.wheresimplified` or `native.where` to narrow the dataset before analysis (e.g. filter by category, date range, non-null values).

Success: Output contains only the subset relevant to the analysis.

Step 3: Build a Complete Grid

Preferred approach: First polyfill the study area boundary (e.g. district polygons) with `native.h3polyfill` to create a complete, gap-free grid. Then enrich this grid with the data to analyze (e.g. count points per cell via `native.h3enrich` or a manual join + group by). This ensures every cell in the study area has a value (even if 0), which Getis-Ord needs: gaps in the grid distort the neighborhood calculations and can produce misleading results.

Simpler alternative (when no study area boundary is available): Convert point geometries directly to grid cells with `native.h3frompoint` or `native.quadbinfromgeopoint`. Be aware this only produces cells where data exists, leaving gaps that may affect the statistic.

Resolution guidance (higher resolution = smaller cells = more local patterns):

Resolution	Cell size (average)	Use case
H3 res 7	~1.4 km edge (~5.2 km² area)	District/city-level patterns
H3 res 8	~0.5 km edge (~0.7 km² area)	Neighborhood-level
H3 res 9	~0.2 km edge (~0.1 km² area)	Street-level

Success: A contiguous grid covering the study area, with every cell assigned a spatial index column (e.g. `h3`).

Step 4: Aggregate per Cell

Use `native.groupby` to produce one row per cell with a numeric value:

- Group by: the spatial index column (`h3`)
- Aggregation: `h3,count` (or `value_col,sum` / `value_col,avg`)

If using the polyfill approach, cells with no data should have a value of 0 (use `COALESCE(count, 0)` via `native.selectexpression` after joining).

Success: Output has exactly one row per unique cell with a count/sum column — no gaps.
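
To illustrate what the zero-fill achieves, here is a minimal Python sketch of the same logic outside of CARTO. The function name `fill_missing_cells` and the cell IDs are hypothetical; in a real workflow the join and `COALESCE` run in the warehouse.

```python
def fill_missing_cells(grid_cells, counts):
    """Return one (cell, count) row per grid cell, using 0 where no data exists.

    grid_cells: every cell of the polyfilled study area.
    counts:     per-cell counts from the group-by; cells without points are absent.
    """
    return [(cell, counts.get(cell, 0)) for cell in grid_cells]


# Hypothetical cell IDs for illustration (real H3 indexes look different).
grid_cells = ["cell_a", "cell_b", "cell_c", "cell_d"]
counts = {"cell_a": 12, "cell_c": 3}

rows = fill_missing_cells(grid_cells, counts)
print(rows)
```

Every cell of the study area appears exactly once in `rows`, which is the shape the Getis-Ord step expects.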

Step 5: Run Getis-Ord Gi*

Use `native.getisord` with:

Input	Description	Default
`indexcol`	Column with H3/Quadbin indexes	`h3`
`valuecol`	Numeric column to analyze	`h3_count`
`kernel`	Weighting function for neighbors	`uniform`
`size`	K-ring size (neighborhood radius in hops)	`3`

Kernel options: `uniform`, `triangular`, `quadratic`, `quartic`, `gaussian`. Default to `uniform` (equal weight to all neighbors) unless the user has a reason to decay weight with distance.

K-ring size: Larger = smoother, broader patterns. Smaller = more localized clusters.

Success: Output contains `index`, `gi` (z-score), and `p_value` columns for every cell. (See the Provider casing note in Gotchas; Snowflake surfaces these in UPPERCASE.)
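
To make the roles of `size` and the `uniform` kernel concrete, the following is a minimal pure-Python sketch of the standard Gi* formula on a small square grid. It is not CARTO's implementation: the square grid, the Chebyshev-distance neighborhood (a stand-in for the k-ring), and the names `gi_star` and `grid` are assumptions made for illustration, and the Analytics Toolbox may differ in details such as edge handling.

```python
import math


def gi_star(values, size=1):
    """Compute Gi* z-scores and two-sided p-values on a square grid.

    values: dict mapping (row, col) -> numeric value, one entry per cell.
    size:   neighborhood radius in cells (Chebyshev distance), standing in
            for the k-ring size on an H3/Quadbin grid.
    Uniform kernel: every cell in the neighborhood, including the cell
    itself, gets weight 1. Assumes the values are not all identical and
    that no neighborhood covers the whole grid (both would divide by zero).
    """
    n = len(values)
    mean = sum(values.values()) / n
    std = math.sqrt(sum(v * v for v in values.values()) / n - mean * mean)

    result = {}
    for (r, c) in values:
        neighborhood = [
            values[(r + dr, c + dc)]
            for dr in range(-size, size + 1)
            for dc in range(-size, size + 1)
            if (r + dr, c + dc) in values
        ]
        w_sum = len(neighborhood)     # sum of weights (all equal to 1)
        w_sq_sum = len(neighborhood)  # sum of squared weights (all equal to 1)
        numerator = sum(neighborhood) - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        gi = numerator / denominator
        # Two-sided p-value of a standard normal z-score.
        p_value = math.erfc(abs(gi) / math.sqrt(2))
        result[(r, c)] = (gi, p_value)
    return result


# 7x7 grid of background value 1 with a 3x3 block of value 10 in the middle.
grid = {(r, c): 1 for r in range(7) for c in range(7)}
for r in range(2, 5):
    for c in range(2, 5):
        grid[(r, c)] = 10

scores = gi_star(grid, size=1)
center_gi, center_p = scores[(3, 3)]
corner_gi, corner_p = scores[(0, 0)]
print(f"center: gi={center_gi:.2f}, p={center_p:.4f}")
print(f"corner: gi={corner_gi:.2f}, p={corner_p:.4f}")
```

The center of the high-value block gets a large positive `gi` with a small `p_value`, while a far corner gets a mildly negative, non-significant score. Raising `size` widens each neighborhood, which is why larger k-rings give smoother, broader patterns.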

Step 6: Filter Significant Results (optional)

Use `native.where` to keep only statistically significant cells:

- `p_value < 0.05`: 95% confidence
- `p_value < 0.05 AND gi > 0`: hotspots only
- `p_value < 0.05 AND gi < 0`: coldspots only

Success: Only cells with statistically meaningful clustering remain.
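
The three filter expressions above amount to a simple per-cell classification. A minimal Python sketch of that logic follows; `classify_cell` and the sample rows are hypothetical and only mirror the expressions, they are not part of CARTO.

```python
def classify_cell(gi, p_value, alpha=0.05):
    """Label a cell from its Gi* z-score and p-value.

    Mirrors the filters `p_value < alpha AND gi > 0` (hotspot) and
    `p_value < alpha AND gi < 0` (coldspot).
    """
    if p_value < alpha and gi > 0:
        return "hotspot"
    if p_value < alpha and gi < 0:
        return "coldspot"
    return "not significant"


# Hypothetical (gi, p_value) pairs for illustration.
samples = [(3.2, 0.001), (-2.5, 0.012), (0.8, 0.42)]
labels = [classify_cell(gi, p) for gi, p in samples]
print(labels)
```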

Step 7: Save

Use `native.saveastable` to persist results. The H3/Quadbin column is directly visualizable in CARTO Builder without geometry conversion.

Success: Validated workflow that can be uploaded via `carto workflows create`.

Output Columns

Column	Meaning
`index`	Spatial index cell ID (H3 or Quadbin)
`gi`	Gi* z-score — positive = hotspot, negative = coldspot
`p_value`	Statistical significance — lower = more confident

The engine declares these lowercase. See the Provider casing note in Gotchas for Snowflake.
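
Because the output key is called `index` rather than `h3`, joining results back to per-cell data needs a rename first. The sketch below shows that idea in plain Python; `join_back` and the sample rows are hypothetical, and in a workflow the rename and join would be done with workflow components instead.

```python
def join_back(gi_rows, cell_rows, index_col="h3"):
    """Rename `index` to `index_col` in the Gi* rows, then join on that column.

    gi_rows:   dicts with keys `index`, `gi`, `p_value`.
    cell_rows: dicts keyed by `index_col` plus any other attributes.
    """
    by_cell = {row[index_col]: row for row in cell_rows}
    joined = []
    for row in gi_rows:
        renamed = {(index_col if key == "index" else key): value
                   for key, value in row.items()}
        # Attributes from the original table first, then the Gi* columns.
        joined.append({**by_cell.get(renamed[index_col], {}), **renamed})
    return joined


# Hypothetical rows for illustration.
gi_rows = [{"index": "cell_a", "gi": 2.7, "p_value": 0.007}]
cell_rows = [{"h3": "cell_a", "h3_count": 12}]

joined = join_back(gi_rows, cell_rows)
print(joined)
```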

Gotchas

- Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE: reference `H3`, `INDEX`, `GI`, `P_VALUE`, `H3_COUNT` in expressions. For dialect-specific SQL fragments (e.g. `DATETIME_TRUNC` below), see `carto-create-workflow/references/providers/<provider>.md` for the equivalents table.
- The Getis-Ord component requires the Analytics Toolbox. Always run `carto workflows verify-remote --connection <conn>` to ensure the AT path is resolved. `carto workflows validate` is offline and cannot resolve AT location.
- The output column is named `index`, not `h3` or `quadbin`. If you need to join back to original data, rename it (e.g. with `native.renamecolumn`).
- If you call `native.h3boundary` to materialize cell geometries for visualization, the new column is named `<h3col>_geo` (e.g. `index_geo`), not `geom`. Reference it accordingly in downstream nodes.
- The `valuecol` must be numeric. If you're counting features, the group-by step must produce a count column; don't pass the raw index column as the value.
- Resolution too high + large area = very many cells, which can be slow or hit memory limits. Start with a moderate resolution and refine.
- An empty result from the filter step (Step 6) usually means the k-ring size is too small or the data is too sparse for significant clustering. Try increasing `size` or lowering the resolution.
- Date columns must be DATETIME type for spacetime Getis-Ord. CAST if your data has DATE or TIMESTAMP.
- Temporal bandwidth choice dramatically affects results. `bandwidth=1` detects rapid changes; `bandwidth=3+` smooths over longer trends.
- For time-series clustering, pre-filter to only significant cells (the 60% heuristic) to avoid clustering noise.
- The spacetime classification component runs internally on the Gi* output. Do NOT filter by p_value before classification, or the trend test will have incomplete data.
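
The casing rule in the first gotcha can be captured in a small helper when filter expressions are generated programmatically. This is a sketch under assumptions: `column_ref` and `hotspot_filter` are hypothetical names, not part of any CARTO tooling, and the only rule encoded is the one stated above (uppercase on Snowflake, lowercase elsewhere).

```python
def column_ref(name, provider):
    """Return the column name as it surfaces on the given provider."""
    return name.upper() if provider.lower() == "snowflake" else name.lower()


def hotspot_filter(provider, alpha=0.05):
    """Build the 'hotspots only' filter expression for a provider."""
    p_value = column_ref("p_value", provider)
    gi = column_ref("gi", provider)
    return f"{p_value} < {alpha} AND {gi} > 0"


print(hotspot_filter("bigquery"))
print(hotspot_filter("snowflake"))
```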

Spacetime Variants

Getis-Ord Spacetime (`native.getisordspacetime`):

- Extends basic Gi* to detect clusters in both space AND time.
- Additional inputs: `kerneltime` (uniform/gaussian), `bandwidth` (number of time steps), `timeinterval` (week/month/day).
- Data must be pre-aggregated into time bins (e.g. weekly counts per H3 cell).
- Pipeline: points -> H3 -> create time column (BigQuery: `DATETIME_TRUNC(CAST(datetime AS TIMESTAMP), WEEK)`; Snowflake / Databricks / Postgres: `DATE_TRUNC('WEEK', datetime)`) -> GROUP BY (h3, time_bin) -> Getis-Ord Spacetime -> filter `p_value < 0.05 AND gi > 0`.
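
The pre-aggregation into time bins can be sketched in Python as follows. This only illustrates the "truncate to week, then group by (cell, week)" step; the function names and cell IDs are hypothetical, and the Monday week start is an assumption of this sketch (the week boundary used by the warehouse depends on the SQL dialect).

```python
from collections import Counter
from datetime import datetime, timedelta


def week_start(ts):
    """Truncate a datetime to 00:00 on the Monday of its week."""
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def weekly_counts(events):
    """Count events per (cell, week bin); events are (cell, datetime) pairs."""
    return Counter((cell, week_start(ts)) for cell, ts in events)


# Hypothetical events for illustration.
events = [
    ("cell_a", datetime(2024, 1, 3, 14, 30)),
    ("cell_a", datetime(2024, 1, 7, 9, 0)),
    ("cell_a", datetime(2024, 1, 8, 0, 5)),
    ("cell_b", datetime(2024, 1, 2, 23, 59)),
]

bins = weekly_counts(events)
for (cell, week), count in sorted(bins.items()):
    print(cell, week.date(), count)
```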

Spacetime Hotspot Classification (`native.spacetimehotspotsclassification`):

- Chains AFTER Getis-Ord Spacetime output.
- Classifies each cell's temporal trend: new hotspot, consecutive, intensifying, diminishing, sporadic, oscillating, historical.
- Uses Modified Mann-Kendall trend test with a significance threshold (default 0.05).
- Pipeline: ... -> Getis-Ord Spacetime -> Spacetime Hotspots Classification.

Time Series Clustering (`native.timeseriesclustering`):

- Groups locations by similarity of their temporal Gi* pattern.
- Chain: Getis-Ord Spacetime -> filter significant cells -> Cluster Time Series.
- Method: `profile` (shape-based) or `value` (magnitude-based).
- Filtering heuristic from the template: keep cells where >=60% of time steps have `p_value < 0.05`.
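
The 60% heuristic can be expressed as a short Python sketch. The function name `significant_cells` and the sample values are hypothetical; the template applies the equivalent filter inside the workflow.

```python
from collections import defaultdict


def significant_cells(rows, alpha=0.05, min_share=0.6):
    """Keep cells where at least `min_share` of time steps have p_value < alpha.

    rows: iterable of (cell, p_value) pairs, one per cell and time step.
    """
    total = defaultdict(int)
    significant = defaultdict(int)
    for cell, p_value in rows:
        total[cell] += 1
        if p_value < alpha:
            significant[cell] += 1
    return {cell for cell in total if significant[cell] / total[cell] >= min_share}


# Hypothetical spacetime Gi* output: five time steps per cell.
rows = (
    [("cell_a", p) for p in (0.01, 0.02, 0.04, 0.30, 0.50)]    # 3 of 5 significant
    + [("cell_b", p) for p in (0.01, 0.03, 0.20, 0.40, 0.90)]  # 2 of 5 significant
)

kept = significant_cells(rows)
print(kept)
```

Only `cell_a` reaches the 60% share, so only its time series would be passed on to clustering.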

Reference Templates

These files are working examples (skill-local files in `hotspot-analysis/`, others in the project root):

File	Description
`poi_hotspot.json`	Stockholm amenity POIs — H3 res 9, uniform kernel, k=3
`space_time_hotspot.json`	Barcelona accidents — spacetime Gi*, H3 res 9, weekly bins
`spacetime_hotspot_classification.json`	London collisions — spacetime Gi* + classification, gaussian kernel

Common Variations

Variant	How
Polygon input instead of points	Use `native.h3polyfill` instead of `native.h3frompoint`
Enrich existing grid	Use `native.h3enrich` to count points into a grid (avoids manual group-by + join)
Combine with other data	Join Getis-Ord output with enrichment or attribute tables before saving
Spacetime hotspots	Use `native.getisordspacetime` — see Spacetime Variants section above
Classify hotspot trends	Use `native.spacetimehotspotsclassification` — chains after spacetime Gi* output
