Hotspot Analysis with Getis-Ord Gi*
Builds CARTO Workflows that identify statistically significant spatial clusters (hotspots and coldspots) using the Getis-Ord Gi* statistic.
Prerequisites: Load
for the development process, JSON structure, and validation commands.
Instructions
A hotspot workflow always follows this pipeline:
Source Data → (Filter) → Spatial Indexing → Aggregation → Getis-Ord Gi* → (Filter Significant) → Save
Step 1: Load Source Data
Use
. The input table typically contains point geometries.
Success: Node outputs a table with a geometry column (e.g.
).
Step 2: Filter (if needed)
Use
or
to narrow the dataset before analysis (e.g. filter by category, date range, non-null values).
Success: Output contains only the subset relevant to the analysis.
Step 3: Build a Complete Grid
Preferred approach: First polyfill the study area boundary (e.g. district polygons) with
to create a complete, gap-free grid. Then enrich this grid with the data to analyze (e.g. count points per cell via
or a manual join + group by). This ensures every cell in the study area has a value (even if 0), which Getis-Ord needs — gaps in the grid distort the neighborhood calculations and can produce misleading results.
Simpler alternative (when no study area boundary is available): Convert point geometries directly to grid cells with
or
native.quadbinfromgeopoint
. Be aware this only produces cells where data exists, leaving gaps that may affect the statistic.
Resolution guidance — higher resolution = smaller cells = more local patterns:
| Resolution | Cell size | Use case |
|---|
| H3 res 7 | ~5 km edge | District/city-level patterns |
| H3 res 8 | ~2 km edge | Neighborhood-level |
| H3 res 9 | ~500m edge | Street-level |
Success: A contiguous grid covering the study area, with every cell assigned a spatial index column (e.g.
).
Step 4: Aggregate per Cell
Use
to produce one row per cell with a numeric value:
- Group by: the spatial index column ()
- Aggregation: (or / )
If using the polyfill approach, cells with no data should have a value of 0 (use
via
after joining).
Success: Output has exactly one row per unique cell with a count/sum column — no gaps.
Step 5: Run Getis-Ord Gi*
| Input | Description | Default |
|---|
| Column with H3/Quadbin indexes | |
| Numeric column to analyze | |
| Weighting function for neighbors | |
| K-ring size (neighborhood radius in hops) | |
Kernel options:
,
,
,
,
. Default to
(equal weight to all neighbors) unless the user has a reason to decay weight with distance.
K-ring size: Larger = smoother, broader patterns. Smaller = more localized clusters.
Success: Output contains
,
(z-score), and
columns for every cell. (See the Provider casing note in Gotchas — Snowflake surfaces these UPPERCASE.)
Step 6: Filter Significant Results (optional)
Use
to keep only statistically significant cells:
- — 95% confidence
p_value < 0.05 AND gi > 0
— hotspots only
p_value < 0.05 AND gi < 0
— coldspots only
Success: Only cells with statistically meaningful clustering remain.
Step 7: Save
Use
to persist results. The H3/Quadbin column is directly visualizable in CARTO Builder without geometry conversion.
Success: Validated workflow that can be uploaded via
.
Output Columns
| Column | Meaning |
|---|
| Spatial index cell ID (H3 or Quadbin) |
| Gi* z-score — positive = hotspot, negative = coldspot |
| Statistical significance — lower = more confident |
The engine declares these lowercase. See the Provider casing note in Gotchas for Snowflake.
Gotchas
- Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE — reference , , , , in expressions. For dialect-specific SQL fragments (e.g. below), see
carto-create-workflow/references/providers/<provider>.md
for the equivalents table.
- The Getis-Ord component requires the Analytics Toolbox. Always run
carto workflows verify-remote --connection <conn>
to ensure the AT path is resolved. is offline and cannot resolve AT location.
- The output column is named , not or . If you need to join back to original data, rename it (e.g. with ).
- If you call to materialize cell geometries for visualization, the new column is named (e.g. ), not . Reference it accordingly in downstream nodes.
- The must be numeric. If you're counting features, the group-by step must produce a count column — don't pass the raw index column as the value.
- Resolution too high + large area = very many cells, which can be slow or hit memory limits. Start with a moderate resolution and refine.
- An empty result from the filter step (Step 6) usually means the k-ring size is too small or the data is too sparse for significant clustering. Try increasing or lowering the resolution.
- Date columns must be DATETIME type for spacetime Getis-Ord. CAST if your data has DATE or TIMESTAMP.
- Temporal bandwidth choice dramatically affects results. detects rapid changes; smooths over longer trends.
- For time-series clustering, pre-filter to only significant cells (the 60% heuristic) to avoid clustering noise.
- The spacetime classification component runs internally on the Gi* output -- do NOT filter by p_value before classification, or the trend test will have incomplete data.
Spacetime Variants
- Extends basic Gi* to detect clusters in both space AND time.
- Additional inputs: (uniform/gaussian), (number of time steps), (week/month/day).
- Data must be pre-aggregated into time bins (e.g. weekly counts per H3 cell).
- Pipeline: points -> H3 -> create time column (BigQuery:
DATETIME_TRUNC(CAST(datetime AS TIMESTAMP), WEEK)
; Snowflake / Databricks / Postgres: DATE_TRUNC('WEEK', datetime)
) -> GROUP BY (h3, time_bin) -> Getis-Ord Spacetime -> filter p_value < 0.05 AND gi > 0
.
Spacetime Hotspot Classification (
native.spacetimehotspotsclassification
):
- Chains AFTER Getis-Ord Spacetime output.
- Classifies each cell's temporal trend: new hotspot, consecutive, intensifying, diminishing, sporadic, oscillating, historical.
- Uses Modified Mann-Kendall trend test with a significance threshold (default 0.05).
- Pipeline: ... -> Getis-Ord Spacetime -> Spacetime Hotspots Classification.
Time Series Clustering (
native.timeseriesclustering
):
- Groups locations by similarity of their temporal Gi* pattern.
- Chain: Getis-Ord Spacetime -> filter significant cells -> Cluster Time Series.
- Method: (shape-based) or (magnitude-based).
- Filtering heuristic from the template: keep cells where >=60% of time steps have .
Reference Templates
These files are working examples (skill-local files in
, others in the project root):
| File | Description |
|---|
| Stockholm amenity POIs — H3 res 9, uniform kernel, k=3 |
| Barcelona accidents — spacetime Gi*, H3 res 9, weekly bins |
spacetime_hotspot_classification.json
| London collisions — spacetime Gi* + classification, gaussian kernel |
Common Variations
| Variant | How |
|---|
| Polygon input instead of points | Use instead of |
| Enrich existing grid | Use to count points into a grid (avoids manual group-by + join) |
| Combine with other data | Join Getis-Ord output with enrichment or attribute tables before saving |
| Spacetime hotspots | Use — see Spacetime Variants section above |
| Classify hotspot trends | Use native.spacetimehotspotsclassification
— chains after spacetime Gi* output |