carto-hotspot-analysis
Hotspot Analysis with Getis-Ord Gi*
Builds CARTO Workflows that identify statistically significant spatial clusters (hotspots and coldspots) using the Getis-Ord Gi* statistic.
Prerequisites: Load `carto-create-workflow` for the development process, JSON structure, and validation commands.
Instructions
A hotspot workflow always follows this pipeline:
Source Data → (Filter) → Spatial Indexing → Aggregation → Getis-Ord Gi* → (Filter Significant) → Save
Step 1: Load Source Data
Use `native.gettablebyname`. The input table typically contains point geometries.
Success: Node outputs a table with a geometry column (e.g. `geom`).
Step 2: Filter (if needed)
Use `native.wheresimplified` or `native.where` to narrow the dataset before analysis (e.g. filter by category, date range, non-null values).
Success: Output contains only the subset relevant to the analysis.
Step 3: Build a Complete Grid
Preferred approach: First polyfill the study area boundary (e.g. district polygons) with `native.h3polyfill` to create a complete, gap-free grid. Then enrich this grid with the data to analyze (e.g. count points per cell via `native.h3enrich` or a manual join + group by). This ensures every cell in the study area has a value (even if 0), which Getis-Ord needs: gaps in the grid distort the neighborhood calculations and can produce misleading results.
Simpler alternative (when no study area boundary is available): Convert point geometries directly to grid cells with `native.h3frompoint` or `native.quadbinfromgeopoint`. Be aware this only produces cells where data exists, leaving gaps that may affect the statistic.
Resolution guidance (higher resolution = smaller cells = more local patterns):
| Resolution | Cell size | Use case |
|---|---|---|
| H3 res 7 | ~1.2 km edge (~5 km² area) | District/city-level patterns |
| H3 res 8 | ~0.5 km edge (~0.7 km² area) | Neighborhood-level |
| H3 res 9 | ~0.2 km edge (~0.1 km² area) | Street-level |
Success: A contiguous grid covering the study area, with every cell assigned a spatial index column (e.g. `h3`).
Step 4: Aggregate per Cell
Use `native.groupby` to produce one row per cell with a numeric value:
- Group by: the spatial index column (`h3`)
- Aggregation: `h3,count` (or `value_col,sum` / `value_col,avg`)
If using the polyfill approach, cells with no data should have a value of 0 (use `COALESCE(count, 0)` via `native.selectexpression` after joining).
Success: Output has exactly one row per unique cell with a count/sum column, with no gaps.
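The zero-fill requirement can be illustrated outside the workflow engine. The sketch below is plain Python, not a CARTO API: `zero_fill`, the cell IDs, and the counts are hypothetical, and it only mirrors what a left join from the polyfilled grid followed by `COALESCE(count, 0)` is meant to achieve.

```python
def zero_fill(grid_cells, counts):
    """Return one (cell, value) row per grid cell, using 0 where no data joined."""
    rows = []
    for cell in grid_cells:
        value = counts.get(cell)  # None plays the role of SQL NULL after a left join
        rows.append((cell, 0 if value is None else value))
    return rows


# Hypothetical cell IDs standing in for H3 indexes from the polyfilled study area.
grid_cells = ["cell_a", "cell_b", "cell_c", "cell_d"]
# Aggregated counts exist only for cells that contained at least one point.
counts = {"cell_a": 12, "cell_c": 3}

rows = zero_fill(grid_cells, counts)
print(rows)
```

Every grid cell appears exactly once in `rows`, so the neighborhood of each cell is complete when the statistic is computed.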
Step 5: Run Getis-Ord Gi*
Use `native.getisord` with:
| Input | Description | Default |
|---|---|---|
| | Column with H3/Quadbin indexes | |
| `valuecol` | Numeric column to analyze | |
| | Weighting function for neighbors | |
| `size` | K-ring size (neighborhood radius in hops) | |
Kernel options: `uniform`, `triangular`, `quadratic`, `quartic`, `gaussian`. Default to `uniform` (equal weight to all neighbors) unless the user has a reason to decay weight with distance.
K-ring size: Larger = smoother, broader patterns. Smaller = more localized clusters.
Success: Output contains `index`, `gi` (z-score), and `p_value` columns for every cell. (See the Provider casing note in Gotchas: Snowflake surfaces these UPPERCASE.)
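To make the `gi` and `p_value` outputs concrete, here is a minimal sketch of the textbook Gi* z-score with a uniform kernel. It is an illustration under stated assumptions, not the Analytics Toolbox implementation: it uses a square grid with (x, y) tuples as cells instead of H3, and `k_ring` and `gi_star` are hypothetical helper names.

```python
import math


def k_ring(cell, size):
    """Cells within `size` hops of `cell` on a square grid, including the cell itself."""
    x, y = cell
    return [(x + dx, y + dy)
            for dx in range(-size, size + 1)
            for dy in range(-size, size + 1)]


def gi_star(values, size=1):
    """Return {cell: (gi, p_value)} using uniform weights inside the k-ring."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cells")
    mean = sum(values.values()) / n
    variance = sum(v * v for v in values.values()) / n - mean * mean
    s = math.sqrt(max(variance, 0.0))
    results = {}
    for cell in values:
        neighbors = [c for c in k_ring(cell, size) if c in values]
        w_sum = float(len(neighbors))     # uniform kernel: every weight is 1
        w_sq_sum = float(len(neighbors))  # sum of squared weights (1 ** 2 each)
        local_sum = sum(values[c] for c in neighbors)
        denom = s * math.sqrt(max(n * w_sq_sum - w_sum ** 2, 0.0) / (n - 1))
        gi = (local_sum - mean * w_sum) / denom if denom > 0 else 0.0
        p_value = math.erfc(abs(gi) / math.sqrt(2))  # two-sided normal p-value
        results[cell] = (gi, p_value)
    return results


# A complete 7x7 grid of zeros with a 3x3 block of high values in the middle.
grid = {(x, y): 0.0 for x in range(7) for y in range(7)}
for x in range(2, 5):
    for y in range(2, 5):
        grid[(x, y)] = 10.0

scores = gi_star(grid, size=1)
center_gi, center_p = scores[(3, 3)]
corner_gi, corner_p = scores[(0, 0)]
print(f"center: gi={center_gi:.2f}, p={center_p:.4f}")
print(f"corner: gi={corner_gi:.2f}, p={corner_p:.4f}")
```

The center of the block gets a large positive z-score, while the corner gets a small negative one that is not significant. Raising `size` widens each neighborhood, which is the smoothing effect described above.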
Step 6: Filter Significant Results (optional)
Use `native.where` to keep only statistically significant cells:
- `p_value < 0.05`: 95% confidence
- `p_value < 0.05 AND gi > 0`: hotspots only
- `p_value < 0.05 AND gi < 0`: coldspots only
Success: Only cells with statistically meaningful clustering remain.
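The three filter expressions amount to a simple classification rule. The following sketch applies that rule to rows already fetched into Python; `classify` and the sample rows are hypothetical and only illustrate how `gi` and `p_value` combine.

```python
def classify(row, alpha=0.05):
    """Label one output row as hotspot, coldspot, or not significant."""
    if row["p_value"] >= alpha:
        return "not significant"
    if row["gi"] > 0:
        return "hotspot"
    if row["gi"] < 0:
        return "coldspot"
    return "not significant"


# Hypothetical rows shaped like the component output (index, gi, p_value).
rows = [
    {"index": "cell_a", "gi": 3.2, "p_value": 0.001},
    {"index": "cell_b", "gi": -2.6, "p_value": 0.009},
    {"index": "cell_c", "gi": 0.8, "p_value": 0.42},
]

labels = {row["index"]: classify(row) for row in rows}
print(labels)
```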
Step 7: Save
Use `native.saveastable` to persist results. The H3/Quadbin column is directly visualizable in CARTO Builder without geometry conversion.
Success: Validated workflow that can be uploaded via `carto workflows create`.
Output Columns
| Column | Meaning |
|---|---|
| `index` | Spatial index cell ID (H3 or Quadbin) |
| `gi` | Gi* z-score: positive = hotspot, negative = coldspot |
| `p_value` | Statistical significance: lower = more confident |
The engine declares these lowercase. See the Provider casing note in Gotchas for Snowflake.
Gotchas
- Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE: reference `H3`, `INDEX`, `GI`, `P_VALUE`, `H3_COUNT` in expressions. For dialect-specific SQL fragments (e.g. `DATETIME_TRUNC` below), see `carto-create-workflow/references/providers/<provider>.md` for the equivalents table.
- The Getis-Ord component requires the Analytics Toolbox. Always run `carto workflows verify-remote --connection <conn>` to ensure the AT path is resolved. `carto workflows validate` is offline and cannot resolve AT location.
- The output column is named `index`, not `h3` or `quadbin`. If you need to join back to original data, rename it (e.g. with `native.renamecolumn`).
- If you call `native.h3boundary` to materialize cell geometries for visualization, the new column is named `<h3col>_geo` (e.g. `index_geo`), not `geom`. Reference it accordingly in downstream nodes.
- The `valuecol` must be numeric. If you're counting features, the group-by step must produce a count column; don't pass the raw index column as the value.
- Resolution too high + large area = very many cells, which can be slow or hit memory limits. Start with a moderate resolution and refine.
- An empty result from the filter step (Step 6) usually means the k-ring size is too small or the data is too sparse for significant clustering. Try increasing `size` or lowering the resolution.
- Date columns must be DATETIME type for spacetime Getis-Ord. CAST if your data has DATE or TIMESTAMP.
- Temporal bandwidth choice dramatically affects results. `bandwidth=1` detects rapid changes; `bandwidth=3+` smooths over longer trends.
- For time-series clustering, pre-filter to only significant cells (the 60% heuristic) to avoid clustering noise.
- The spacetime classification component runs internally on the Gi* output. Do NOT filter by `p_value` before classification, or the trend test will have incomplete data.
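The provider casing note can be captured in a small helper when filter expressions are generated programmatically. This is a sketch only: `column_ref` and `significance_filter` are hypothetical names, not part of the CARTO CLI or the workflow schema, and the only rule encoded is the one stated above (Snowflake surfaces unquoted identifiers in uppercase).

```python
# Providers that surface unquoted identifiers in UPPERCASE, per the note above.
UPPERCASE_PROVIDERS = {"snowflake"}


def column_ref(name, provider):
    """Return the column name in the casing the provider is expected to surface."""
    if provider.lower() in UPPERCASE_PROVIDERS:
        return name.upper()
    return name.lower()


def significance_filter(provider, kind="hotspot", alpha=0.05):
    """Build a filter expression for hotspots or coldspots in the provider's casing."""
    p_value = column_ref("p_value", provider)
    gi = column_ref("gi", provider)
    sign = {"hotspot": ">", "coldspot": "<"}[kind]
    return f"{p_value} < {alpha} AND {gi} {sign} 0"


print(significance_filter("bigquery"))
print(significance_filter("snowflake", kind="coldspot"))
```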
Spacetime Variants
Getis-Ord Spacetime (`native.getisordspacetime`):
- Extends basic Gi* to detect clusters in both space AND time.
- Additional inputs: `kerneltime` (uniform/gaussian), `bandwidth` (number of time steps), `timeinterval` (week/month/day).
- Data must be pre-aggregated into time bins (e.g. weekly counts per H3 cell).
- Pipeline: points -> H3 -> create time column (BigQuery: `DATETIME_TRUNC(CAST(datetime AS TIMESTAMP), WEEK)`; Snowflake / Databricks / Postgres: `DATE_TRUNC('WEEK', datetime)`) -> GROUP BY (h3, time_bin) -> Getis-Ord Spacetime -> filter `p_value < 0.05 AND gi > 0`.
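The pre-aggregation step (truncate to a week, then group by cell and time bin) can be sketched in plain Python to show the expected shape of the input. `week_start` and `weekly_counts` are hypothetical helpers, the cell IDs are made up, and weeks are assumed to start on Monday here; warehouses differ on this (BigQuery's `WEEK` part starts on Sunday by default), so bin boundaries may not match exactly.

```python
from collections import Counter
from datetime import datetime, timedelta


def week_start(ts):
    """Truncate a datetime to 00:00 on the Monday of its week (assumed convention)."""
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def weekly_counts(events):
    """Count events per (h3_cell, time_bin), like GROUP BY (h3, time_bin)."""
    return Counter((cell, week_start(ts)) for cell, ts in events)


# Hypothetical events: (cell ID, event timestamp).
events = [
    ("cell_a", datetime(2024, 1, 3, 14, 30)),  # Wednesday
    ("cell_a", datetime(2024, 1, 7, 23, 59)),  # Sunday of the same week
    ("cell_a", datetime(2024, 1, 8, 0, 5)),    # Monday of the next week
    ("cell_b", datetime(2024, 1, 2, 9, 0)),
]

counts = weekly_counts(events)
for (cell, time_bin), count in sorted(counts.items()):
    print(cell, time_bin.date(), count)
```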
Spacetime Hotspot Classification (`native.spacetimehotspotsclassification`):
- Chains AFTER Getis-Ord Spacetime output.
- Classifies each cell's temporal trend: new hotspot, consecutive, intensifying, diminishing, sporadic, oscillating, historical.
- Uses Modified Mann-Kendall trend test with a significance threshold (default 0.05).
- Pipeline: ... -> Getis-Ord Spacetime -> Spacetime Hotspots Classification.
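For intuition about the trend test, the sketch below implements the plain Mann-Kendall test on one cell's Gi* series. It is not the modified variant the component uses: it applies no autocorrelation or tie correction, and `mann_kendall` and the sample series are hypothetical.

```python
import math


def mann_kendall(series):
    """Plain Mann-Kendall trend test: returns (S, z, two-sided p-value)."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S without tie correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


# Hypothetical Gi* z-scores for one cell over eight weekly time steps.
rising = [0.5, 0.9, 1.4, 1.8, 2.3, 2.9, 3.1, 3.6]
flat = [1.0, 1.2, 0.9, 1.1, 1.0, 1.2, 0.9, 1.1]

rising_s, rising_z, rising_p = mann_kendall(rising)
flat_s, flat_z, flat_p = mann_kendall(flat)
print(rising_s, round(rising_z, 2), rising_p < 0.05)
print(flat_s, round(flat_z, 2), flat_p < 0.05)
```

A significant positive trend in a cell's Gi* series corresponds to the intensifying pattern, and a significant negative trend to the diminishing pattern. This also shows why dropping time steps by `p_value` before classification would leave the test with an incomplete series.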
Time Series Clustering (`native.timeseriesclustering`):
- Groups locations by similarity of their temporal Gi* pattern.
- Chain: Getis-Ord Spacetime -> filter significant cells -> Cluster Time Series.
- Method: `profile` (shape-based) or `value` (magnitude-based).
- Filtering heuristic from the template: keep cells where >=60% of time steps have `p_value < 0.05`.
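The 60% heuristic can be expressed as a per-cell share of significant time steps. The sketch below assumes rows of (cell, time bin, p-value) already pulled from the spacetime output; `significant_cells` and the sample data are hypothetical.

```python
def significant_cells(rows, alpha=0.05, min_share=0.6):
    """Return cells where at least `min_share` of time steps have p_value < alpha."""
    totals = {}
    hits = {}
    for cell, _time_bin, p_value in rows:
        totals[cell] = totals.get(cell, 0) + 1
        if p_value < alpha:
            hits[cell] = hits.get(cell, 0) + 1
    return {cell for cell, total in totals.items()
            if hits.get(cell, 0) / total >= min_share}


# Hypothetical spacetime output: five weekly steps for each of two cells.
rows = [
    ("cell_a", 1, 0.01), ("cell_a", 2, 0.03), ("cell_a", 3, 0.20),
    ("cell_a", 4, 0.02), ("cell_a", 5, 0.04),   # 4 of 5 significant
    ("cell_b", 1, 0.30), ("cell_b", 2, 0.01), ("cell_b", 3, 0.50),
    ("cell_b", 4, 0.04), ("cell_b", 5, 0.70),   # 2 of 5 significant
]

kept = significant_cells(rows)
print(kept)
```

Only cells in `kept` would then be passed on to time series clustering, with all of their time steps retained.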
Reference Templates
These files are working examples (skill-local files in `hotspot-analysis/`, others in the project root):
| File | Description |
|---|---|
| | Stockholm amenity POIs: H3 res 9, uniform kernel, k=3 |
| | Barcelona accidents: spacetime Gi*, H3 res 9, weekly bins |
| | London collisions: spacetime Gi* + classification, gaussian kernel |
Common Variations
| Variant | How |
|---|---|
| Polygon input instead of points | Use |
| Enrich existing grid | Use |
| Combine with other data | Join Getis-Ord output with enrichment or attribute tables before saving |
| Spacetime hotspots | Use |
| Classify hotspot trends | Use |