carto-gwr


Geographically Weighted Regression (GWR)

Builds CARTO Workflows that model spatially varying relationships between a dependent variable and one or more independent variables using GWR. Unlike global regression (one set of coefficients for the entire study area), GWR produces local coefficients per spatial unit, revealing how relationships change across space. Example: "bedrooms add $50k to price downtown but only $20k in the suburbs."

Prerequisites: Load `carto-create-workflow` for the development process, JSON structure, and validation commands.

Instructions

A GWR workflow follows this pipeline:

Source Data -> (Filter) -> Spatial Indexing (H3/Quadbin) -> Aggregation (dependent + independent vars per cell) -> GWR -> Save

Step 1: Load Source Data

Use `native.gettablebyname`. The input table must contain at least one numeric dependent variable and one or more numeric independent (predictor) variables.

Success: Node outputs a table with the necessary numeric columns.

Step 2: Filter (if needed)

Use `native.wheresimplified` or `native.where` to narrow the dataset (e.g. remove nulls from key columns, filter by category or date range).

Success: Output contains only rows with valid, non-null values for the dependent and all independent variables.
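When the goal is null removal, the filter condition is a conjunction of `IS NOT NULL` checks over the dependent variable and every predictor. A minimal Python sketch that builds that condition string, assuming `native.where` accepts a plain SQL boolean expression; the helper `not_null_condition` is hypothetical and not part of CARTO:

```python
def not_null_condition(columns):
    """Build a SQL boolean expression that keeps rows where every column is non-null.

    Hypothetical helper: assumes the filter node accepts a plain SQL expression.
    """
    if not columns:
        raise ValueError("at least one column is required")
    return " AND ".join(f"{col} IS NOT NULL" for col in columns)


# Dependent variable first, then all predictors.
condition = not_null_condition(["price", "bedrooms", "bathrooms"])
print(condition)  # → price IS NOT NULL AND bedrooms IS NOT NULL AND bathrooms IS NOT NULL
```

On Snowflake, pass the uppercase column names instead (see the casing note in Gotchas).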

Step 3: Spatial Indexing

If the data is not already indexed, convert point geometries to spatial index cells:

`native.h3frompoint` for H3
`native.quadbinfromgeopoint` for Quadbin

If the data already contains an H3 or Quadbin column (common for pre-aggregated datasets), skip this step.

Resolution guidance:

Resolution	Avg. cell area (avg. edge)	Use case
H3 res 7	~5.2 km² (~1.4 km edge)	City-level relationships
H3 res 8	~0.74 km² (~530 m edge)	Neighborhood-level
H3 res 9	~0.11 km² (~200 m edge)	Street-level (needs dense data)

Success: Every row has a spatial index column (e.g. `h3`).
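Before committing to a resolution, a back-of-the-envelope density check shows whether the data is likely to be too sparse. The sketch below divides a study area by the average H3 cell area (rounded from the published H3 resolution statistics) to estimate the cell count and the average number of points per cell. The 900 km² area and 20,000 points are illustrative placeholders, and `density_check` is a hypothetical helper:

```python
# Average H3 cell areas in km², rounded from the published H3 resolution table.
H3_AVG_AREA_KM2 = {7: 5.161, 8: 0.737, 9: 0.105}


def density_check(study_area_km2, n_points, resolution):
    """Estimate how many H3 cells cover an area and the average points per cell.

    Assumes points are spread evenly; with clustered data many cells will hold
    far fewer points than this average.
    """
    n_cells = study_area_km2 / H3_AVG_AREA_KM2[resolution]
    return n_cells, n_points / n_cells


for res in (7, 8, 9):
    cells, per_cell = density_check(study_area_km2=900, n_points=20_000, resolution=res)
    print(f"res {res}: ~{cells:,.0f} cells, ~{per_cell:.1f} points per cell")
```

Each finer resolution has roughly one seventh of the cell area, so the average points per cell drops about sevenfold per step.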

Step 4: Aggregate per Cell

Use `native.groupby` to produce one row per cell with aggregated values for the dependent and all independent variables:

Group by: the spatial index column (`h3`)
Aggregation: `price,avg,bedrooms,avg,bathrooms,avg` (adapt to the actual columns)

Aggregate the dependent variable with `avg` or `sum`, depending on what it measures: `avg` for per-item values such as price, `sum` for counts or totals. Independent variables are typically averaged.

Success: Output has exactly one row per unique cell, with numeric columns for the target and all predictors.
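The aggregation string pairs each column with its aggregation function. As an offline sanity check, the stdlib sketch below parses that string and reproduces the per-cell aggregation in plain Python. It is hypothetical throughout: the output names such as `price_avg` are an assumed naming scheme, so confirm the column names `native.groupby` actually emits before passing them to GWR as `label_column` and `features_columns`.

```python
from collections import defaultdict


def parse_agg_spec(spec):
    """Split a 'col,agg,col,agg' string into (column, aggregation) pairs."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) % 2:
        raise ValueError("spec must contain column,aggregation pairs")
    return list(zip(parts[0::2], parts[1::2]))


def group_by_cell(rows, index_col, spec):
    """Aggregate rows to one dict per cell; supports 'avg' and 'sum' only."""
    pairs = parse_agg_spec(spec)
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[index_col]].append(row)
    out = []
    for cell, members in buckets.items():
        agg_row = {index_col: cell}
        for col, how in pairs:
            values = [m[col] for m in members]
            if how == "sum":
                agg_row[f"{col}_{how}"] = sum(values)  # naming is an assumption
            elif how == "avg":
                agg_row[f"{col}_{how}"] = sum(values) / len(values)
            else:
                raise ValueError(f"unsupported aggregation: {how}")
        out.append(agg_row)
    return out


# "cell_a"/"cell_b" are placeholders for real H3 ids.
rows = [
    {"h3": "cell_a", "price": 100.0, "bedrooms": 1},
    {"h3": "cell_a", "price": 200.0, "bedrooms": 3},
    {"h3": "cell_b", "price": 50.0, "bedrooms": 2},
]
result = group_by_cell(rows, "h3", "price,avg,bedrooms,avg")
print(result)
```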

Step 5: Run GWR

Use `native.gwr` with:

Input	Description	Default
`index_column`	Column with H3/Quadbin indexes	`h3`
`label_column`	Target / dependent variable to model (must be numeric)	-
`features_columns`	Predictor / independent variable columns (array of strings)	-
`kernel_function`	Weighting function for neighbors	`gaussian`
`kring_distance`	K-ring size (neighborhood radius in hops)	`3`
`fit_intercept`	Whether to fit an intercept term	`true`
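Several of the Gotchas below trace back to malformed inputs. A hypothetical Python helper that assembles these inputs as a plain dict and rejects common mistakes: a comma-separated string for `features_columns`, an unknown kernel, or a non-positive k-ring. The dict shape is illustrative only; the real node JSON layout is defined in `carto-create-workflow`.

```python
ALLOWED_KERNELS = {"gaussian", "uniform", "triangular", "quadratic", "quartic"}


def gwr_inputs(label_column, features_columns, index_column="h3",
               kernel_function="gaussian", kring_distance=3, fit_intercept=True):
    """Assemble native.gwr inputs as a plain dict, rejecting common mistakes.

    Hypothetical helper: the dict mirrors the input table above, not the exact
    workflow JSON layout.
    """
    if isinstance(features_columns, str):
        raise TypeError("features_columns must be a list of column names, "
                        "not a comma-separated string")
    if not features_columns or not all(isinstance(c, str) for c in features_columns):
        raise ValueError("features_columns must be a non-empty list of strings")
    if label_column in features_columns:
        raise ValueError("the dependent variable cannot also be a predictor")
    if kernel_function not in ALLOWED_KERNELS:
        raise ValueError(f"unknown kernel_function: {kernel_function}")
    if isinstance(kring_distance, bool) or not isinstance(kring_distance, int) or kring_distance < 1:
        raise ValueError("kring_distance must be a positive integer")
    return {
        "index_column": index_column,
        "label_column": label_column,
        "features_columns": list(features_columns),
        "kernel_function": kernel_function,
        "kring_distance": kring_distance,
        "fit_intercept": fit_intercept,
    }


inputs = gwr_inputs("price", ["bedrooms", "bathrooms"])
print(inputs["features_columns"])  # → ['bedrooms', 'bathrooms']
```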

Kernel options: `gaussian` (recommended: smooth distance decay), `uniform`, `triangular`, `quadratic`, `quartic`.

K-ring size: controls the neighborhood radius.

Too small (1-2): noisy, unstable coefficients.
Too large (5+): over-smoothed, approaches global regression.
Start with `3` as a balanced default.
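To see how the kernel and k-ring interact, the sketch below computes an unnormalized weight for a neighbor a given number of hops from the focal cell, using textbook kernel shapes (quadratic as Epanechnikov, quartic as biweight). The scaling `u = hops / (kring_distance + 1)` is an assumption made for illustration; CARTO's internal distance scaling may differ. Normalizing constants are omitted because they cancel out in weighted least squares.

```python
import math


def kernel_weight(kernel, hops, kring_distance):
    """Unnormalized kernel weight for a neighbor `hops` rings from the focal cell.

    Assumption (illustration only): distance is scaled as
    u = hops / (kring_distance + 1), so every ring inside the neighborhood keeps
    a positive weight. CARTO's internal scaling may differ.
    """
    if hops > kring_distance:
        return 0.0  # outside the k-ring: not a neighbor at all
    u = hops / (kring_distance + 1)
    if kernel == "uniform":
        return 1.0
    if kernel == "triangular":
        return 1.0 - u
    if kernel == "quadratic":  # Epanechnikov shape
        return 1.0 - u ** 2
    if kernel == "quartic":  # biweight shape
        return (1.0 - u ** 2) ** 2
    if kernel == "gaussian":
        return math.exp(-0.5 * u ** 2)
    raise ValueError(f"unknown kernel: {kernel}")


# Weights for rings 0..4 with the default kring_distance of 3.
for name in ("uniform", "triangular", "quadratic", "quartic", "gaussian"):
    print(name, [round(kernel_weight(name, h, 3), 3) for h in range(5)])
```

A larger `kring_distance` gives more cells substantial weight, which is why each local fit drifts toward the global regression as the ring grows.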

Success: Output contains per-cell columns: `index`, `intercept`, one coefficient column per independent variable, `r_squared`, and `residual`. (See the Provider casing note in Gotchas: Snowflake surfaces these UPPERCASE.)

Step 6: Save

Use `native.saveastable` to persist results. The spatial index column is directly visualizable in CARTO Builder; style the map by coefficient columns to create coefficient maps showing spatial variation.

Success: A validated workflow that can be uploaded via `carto workflows create`.

Output Columns

Column	Meaning
`index`	Spatial index cell ID (H3 or Quadbin)
`intercept`	Local intercept term
`<variable_name>`	Local coefficient for each independent variable
`r_squared`	Local model fit (0-1) -- higher = better local explanation
`residual`	Difference between observed and predicted value

The engine declares these lowercase. See the Provider casing note in Gotchas for Snowflake.
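Conceptually, each output row comes from a weighted least-squares fit over the focal cell's k-ring neighbors. A minimal pure-Python sketch of one such local fit, assuming kernel weights have already been computed. It returns the local intercept, one coefficient per predictor, a weighted local R², and the focal cell's residual (observed minus predicted). This is a textbook GWR formulation, not CARTO's implementation, and `solve` and `local_fit` are hypothetical helpers.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear in this neighborhood")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_fit(features, target, weights):
    """Weighted least squares for one focal cell (focal cell listed first).

    Returns (intercept, coefficients, r_squared, residual_of_focal_cell).
    """
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    p = len(X[0])
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, X)) for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, X, target)) for i in range(p)]
    beta = solve(xtwx, xtwy)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_bar = sum(w * y for w, y in zip(weights, target)) / sum(weights)
    ss_res = sum(w * (y - yh) ** 2 for w, y, yh in zip(weights, target, predicted))
    ss_tot = sum(w * (y - y_bar) ** 2 for w, y in zip(weights, target))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan
    return beta[0], beta[1:], r_squared, target[0] - predicted[0]


# Focal cell first, then neighbors; columns are (bedrooms, bathrooms) averages.
features = [[2, 1], [1, 1], [3, 2], [2, 2], [4, 1], [3, 3]]
# Synthetic prices built as 100 + 40*bedrooms + 20*bathrooms, so the fit is exact.
target = [200, 160, 260, 220, 280, 280]
hops = [0, 1, 1, 2, 2, 3]
weights = [math.exp(-0.5 * (h / 4) ** 2) for h in hops]  # Gaussian decay by ring

intercept, coefs, r2, resid = local_fit(features, target, weights)
print(round(intercept, 3), [round(c, 3) for c in coefs], round(r2, 3), round(resid, 3))
```

A singular system in this sketch is the local symptom of the multicollinearity problem in Gotchas: when two predictors move together within a neighborhood, their coefficients cannot be separated.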

Gotchas

Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE: reference `H3`, `INDEX`, `PRICE`, `R_SQUARED`, `INTERCEPT`, etc. in expressions. See `carto-create-workflow/references/providers/<provider>.md` for casing rules and SQL dialect equivalents.
The GWR component requires the Analytics Toolbox (AT). Always run `carto workflows verify-remote --connection <conn>` to ensure the AT path is resolved; `carto workflows validate` is offline and cannot resolve the AT location.
The dependent variable must be continuous and numeric. Categorical targets need a different approach (e.g. classification).
Cells with null values in ANY variable (dependent or independent) will be excluded from the model. Pre-filter or impute nulls before running GWR.
Multicollinearity between independent variables degrades results. If two predictors are highly correlated (e.g. `bedrooms` and `total_rooms`), drop one or combine them. Check correlations before including multiple similar variables.
K-ring size matters significantly: too small = noisy, unstable coefficients; too large = over-smoothed results that approach a global regression. Start with `3` and adjust.
`r_squared` per cell indicates local model fit. Very low values across many cells suggest important predictors are missing from the model.
The `features_columns` input is an array of column names (e.g. `["bedrooms", "bathrooms"]`), not a comma-separated string.
The output column is named `index`, not the original spatial index column name. If joining back to the original data, rename it with `native.renamecolumn`.
Sparse data at high resolutions leads to unreliable coefficients. Ensure enough cells have data for all variables before choosing a high resolution.

Reference Templates

Resource	Description
BQ Tutorial: Airbnb Listings Prices (GWR)	BigQuery step-by-step: Berlin Airbnb price vs bedrooms/bathrooms, H3 res 7, kring 3, Gaussian kernel
SF Tutorial: Airbnb Listings Prices (GWR)	Snowflake step-by-step: same analysis adapted for Snowflake

Workflow template (available in CARTO Workspace): "Applying Geographical Weighted Regression (GWR) to model the local spatial relationships in your data"

Builder use case: Analyzing Airbnb ratings in Los Angeles. Models the relationship between `overall_rating` and `value_review`, `cleanliness`, and `location`, enriched with Data Observatory sociodemographics. Uses H3 res 7, kring 3, Gaussian kernel.

Common Variations

Variant	How
Pre-aggregated data (already one row per cell)	Skip Steps 3-4, go directly to GWR
Enrich with Data Observatory	Add `native.enrichgrid` before GWR to include sociodemographic predictors
Coefficient comparison	Save results, then use Builder to style map by each coefficient column separately
Filter by model fit	Add `native.where` after GWR to keep only cells with `r_squared > 0.5` (or another threshold)
Combine with hotspot analysis	Run GWR first, then use residuals as input to Getis-Ord to find clusters of under/over-prediction
