carto-gwr

Geographically Weighted Regression (GWR)

Builds CARTO Workflows that model spatially varying relationships between a dependent variable and one or more independent variables using GWR. Unlike global regression (one set of coefficients for the entire study area), GWR produces local coefficients per spatial unit, revealing how relationships change across space. Example: "bedrooms add $50k to price in downtown but only $20k in suburbs."
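In the standard textbook formulation of GWR, each cell i gets its own coefficient vector. That vector is estimated by weighted least squares, with kernel weights that decay with distance from i. The notation below is a generic sketch. CARTO's component may define the weights or distances differently.

```latex
y_i = \beta_0(i) + \sum_{k=1}^{p} \beta_k(i)\, x_{ik} + \varepsilon_i,
\qquad
\hat{\boldsymbol{\beta}}(i) = \left(X^{\top} W_i X\right)^{-1} X^{\top} W_i\, \mathbf{y},
\qquad
W_i = \operatorname{diag}\left(w_{i1}, \dots, w_{in}\right)
```

Here X holds a column of ones (when fit_intercept is true) plus one column per independent variable. Each w_ij is the kernel weight of cell j relative to cell i, and it is zero outside the neighborhood. Setting every w_ij = 1 recovers ordinary global least squares, with one set of coefficients for the whole area.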
Prerequisites: Load carto-create-workflow for the development process, JSON structure, and validation commands.

Instructions

A GWR workflow follows this pipeline:
Source Data -> (Filter) -> Spatial Indexing (H3/Quadbin) -> Aggregation (dependent + independent vars per cell) -> GWR -> Save

Step 1: Load Source Data

Use native.gettablebyname. The input table must contain at least one numeric dependent variable and one or more numeric independent (predictor) variables.
Success: Node outputs a table with the necessary numeric columns.

Step 2: Filter (if needed)

Use native.wheresimplified or native.where to narrow the dataset (e.g. remove nulls from key columns, filter by category or date range).
Success: Output contains only rows with valid, non-null values for the dependent and all independent variables.
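The success criterion above is a null check across the target and every predictor. The following is a minimal Python sketch of that logic. It runs on in-memory rows and assumes hypothetical columns price, bedrooms, and bathrooms. In a workflow, the same check is a native.where condition, not Python.

```python
from typing import Optional

# A listing row; None stands in for a SQL NULL.
Row = dict[str, Optional[float]]


def drop_incomplete(rows: list[Row], columns: list[str]) -> list[Row]:
    """Keep rows where every listed column is non-null.

    Mirrors a condition such as
    price IS NOT NULL AND bedrooms IS NOT NULL AND bathrooms IS NOT NULL.
    """
    return [row for row in rows if all(row.get(col) is not None for col in columns)]


rows = [
    {"price": 120.0, "bedrooms": 2.0, "bathrooms": 1.0},
    {"price": None, "bedrooms": 1.0, "bathrooms": 1.0},   # null target: dropped
    {"price": 95.0, "bedrooms": None, "bathrooms": 1.0},  # null predictor: dropped
    {"price": 200.0, "bedrooms": 3.0, "bathrooms": 2.0},
]
clean = drop_incomplete(rows, ["price", "bedrooms", "bathrooms"])
print(len(clean))
```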

Step 3: Spatial Indexing

If the data is not already indexed, convert point geometries to spatial index cells:
  • native.h3frompoint for H3
  • native.quadbinfromgeopoint for Quadbin
If the data already contains an H3 or Quadbin column (common for pre-aggregated datasets), skip this step.
Resolution guidance (H3 global averages; actual cell size varies by location):
Resolution | Avg. cell area | Avg. edge length | Use case
--- | --- | --- | ---
H3 res 7 | ~5.2 km² | ~1.4 km | City-level relationships
H3 res 8 | ~0.74 km² | ~0.5 km | Neighborhood-level relationships
H3 res 9 | ~0.1 km² | ~0.2 km | Street-level relationships (needs dense data)
Success: Every row has a spatial index column (e.g. h3).
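The H3 sizes follow from the grid's aperture-7 hierarchy. Each finer resolution has about 1/7 the cell area, so edge lengths shrink by √7 per level. The Python check below assumes an average resolution-7 edge length of about 1.406 km, which is H3's published average. It uses a planar regular-hexagon area, so the result lands slightly below H3's published ~5.16 km² at res 7. It also ignores the 12 pentagons per resolution.

```python
import math

EDGE_RES7_KM = 1.406  # ASSUMPTION: H3's published average hexagon edge length at res 7
APERTURE = 7          # each finer H3 resolution has ~1/7 the area of the previous one


def hexagon_area(edge_km: float) -> float:
    """Planar area of a regular hexagon with the given edge length."""
    return 3 * math.sqrt(3) / 2 * edge_km ** 2


sizes = {}
for res in (7, 8, 9):
    # Edge length shrinks by sqrt(7) per resolution step, so area shrinks by 7.
    edge = EDGE_RES7_KM / math.sqrt(APERTURE) ** (res - 7)
    sizes[res] = (edge, hexagon_area(edge))
    print(f"H3 res {res}: edge ~{edge:.2f} km, area ~{sizes[res][1]:.2f} km^2")
```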

Step 4: Aggregate per Cell

Use native.groupby to produce one row per cell with aggregated values for the dependent and all independent variables:
  • Group by: the spatial index column (h3)
  • Aggregation: price,avg,bedrooms,avg,bathrooms,avg (adapt to the actual columns)
Aggregate the dependent variable with avg or sum, depending on what it measures: avg for per-unit values such as price, sum for totals such as counts. Independent variables are typically averaged.
Success: Output has exactly one row per unique cell, with numeric columns for the target and all predictors.
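The aggregation string pairs each column with an aggregation function. The following is a plain-Python sketch of the group-by it describes, using hypothetical cell IDs (cell_a, cell_b) in place of real H3 indexes. The output keeps the source column names. That is an illustrative choice, and the real node may name aggregated columns differently.

```python
from collections import defaultdict
from statistics import mean

AGGREGATORS = {"avg": mean, "sum": sum}


def parse_aggregation(spec: str) -> list[tuple[str, str]]:
    """Split 'col,agg,col,agg,...' into (column, aggregation) pairs."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) % 2:
        raise ValueError("aggregation spec must alternate column,aggregation")
    return list(zip(parts[0::2], parts[1::2]))


def group_by_cell(rows: list[dict], index_col: str, spec: str) -> list[dict]:
    """Return exactly one aggregated row per distinct value of index_col."""
    pairs = parse_aggregation(spec)
    buckets: dict = defaultdict(list)
    for row in rows:
        buckets[row[index_col]].append(row)
    out = []
    for cell, members in buckets.items():
        agg_row = {index_col: cell}
        for column, agg in pairs:
            agg_row[column] = AGGREGATORS[agg]([m[column] for m in members])
        out.append(agg_row)
    return out


listings = [
    {"h3": "cell_a", "price": 100.0, "bedrooms": 1.0, "bathrooms": 1.0},
    {"h3": "cell_a", "price": 200.0, "bedrooms": 3.0, "bathrooms": 1.0},
    {"h3": "cell_b", "price": 150.0, "bedrooms": 2.0, "bathrooms": 2.0},
]
for cell in group_by_cell(listings, "h3", "price,avg,bedrooms,avg,bathrooms,avg"):
    print(cell)
```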

Step 5: Run GWR

Use native.gwr with:
Input | Description | Default
--- | --- | ---
index_column | Column with H3/Quadbin indexes | h3
label_column | Target / dependent variable to model (must be numeric) | -
features_columns | Predictor / independent variable columns (array of strings) | -
kernel_function | Weighting function for neighbors | gaussian
kring_distance | K-ring size (neighborhood radius in hops) | 3
fit_intercept | Whether to fit an intercept term | true
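The inputs in the table can be collected as plain key/value pairs. The Python helper below, gwr_inputs, is hypothetical. It only shows the expected value types: features_columns is a JSON array and kring_distance is an integer. It does not reproduce the actual node JSON, whose structure is covered by carto-create-workflow.

```python
import json
from typing import Sequence

KERNELS = {"gaussian", "uniform", "triangular", "quadratic", "quartic"}


def gwr_inputs(label: str, features: Sequence[str], index_column: str = "h3",
               kernel: str = "gaussian", kring: int = 3,
               fit_intercept: bool = True) -> dict:
    """Hypothetical helper: collect native.gwr inputs with the types the table describes."""
    if isinstance(features, str):
        # "bedrooms,bathrooms" is a common mistake: features_columns must be an array.
        raise TypeError("features_columns must be a list of column names, not a string")
    if kernel not in KERNELS:
        raise ValueError(f"unknown kernel_function: {kernel}")
    return {
        "index_column": index_column,
        "label_column": label,
        "features_columns": list(features),
        "kernel_function": kernel,
        "kring_distance": kring,
        "fit_intercept": fit_intercept,
    }


print(json.dumps(gwr_inputs("price", ["bedrooms", "bathrooms"])))
```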
Kernel options: gaussian (recommended; smooth distance decay), uniform, triangular, quadratic, quartic.
K-ring size: controls the neighborhood radius.
  • Too small (1-2): noisy, unstable coefficients.
  • Too large (5+): over-smoothed, approaches global regression.
  • Start with 3 as a balanced default.
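The k-ring size determines how many cells feed each local regression, and the kernel determines how much each of them counts. The following Python sketch counts cells per ring size for a hexagonal grid (H3, ignoring pentagons) and for a filled square ring (Quadbin-style, assumed to include diagonals). The kernel shapes are textbook forms, with hop distance scaled by k + 1 so the outermost ring keeps a non-zero weight. That scaling and the exact formulas are assumptions, not CARTO's documented definitions.

```python
import math


def hex_kring_size(k: int) -> int:
    """Cells in a hexagonal k-ring (H3, ignoring pentagons): 1 + 3k(k + 1)."""
    return 1 + 3 * k * (k + 1)


def square_kring_size(k: int) -> int:
    """Cells in a filled square k-ring (Quadbin-style, diagonals included): (2k + 1)^2."""
    return (2 * k + 1) ** 2


def kernel_weight(kernel: str, hops: int, k: int) -> float:
    """Weight of a neighbor `hops` rings from the center cell.

    ASSUMPTION: textbook kernel shapes with distance scaled by (k + 1);
    the component's exact formulas may differ.
    """
    if hops > k:
        return 0.0  # outside the k-ring: excluded from the local fit
    u = hops / (k + 1)
    shapes = {
        "uniform": 1.0,
        "triangular": 1.0 - u,
        "quadratic": 1.0 - u ** 2,       # Epanechnikov-style
        "quartic": (1.0 - u ** 2) ** 2,  # biweight
        "gaussian": math.exp(-0.5 * u ** 2),
    }
    if kernel not in shapes:
        raise ValueError(f"unknown kernel: {kernel}")
    return shapes[kernel]


for k in (1, 3, 5):
    print(f"k={k}: up to {hex_kring_size(k)} H3 cells, {square_kring_size(k)} Quadbin cells")
for kernel in ("uniform", "triangular", "quadratic", "quartic", "gaussian"):
    print(kernel, [round(kernel_weight(kernel, d, 3), 3) for d in range(5)])
```

At the default k = 3, each local regression draws on up to 37 H3 cells (49 Quadbin cells). At k = 5 that grows to 91 (121), which is why large rings drift toward a global fit.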
Success: Output contains per-cell columns: index, intercept, one coefficient column per independent variable, r_squared, and residual. (See the Provider casing note in Gotchas; Snowflake surfaces these in UPPERCASE.)
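To make the per-cell outputs concrete, here is a minimal pure-Python sketch of the GWR technique. It is not CARTO's implementation. It makes the following assumptions:
  • It uses a square grid with Chebyshev hop distance as a stand-in for a Quadbin k-ring.
  • It applies an assumed Gaussian kernel exp(-0.5 (hops / (k + 1))²).
  • It always fits an intercept.
  • It reports a kernel-weighted local R², which may differ from how the component computes r_squared.
  • It takes the residual as observed minus predicted.

```python
import math


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear predictors or too few neighbors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gwr_grid(cells: dict, k: int = 3) -> dict:
    """cells maps (col, row) -> (y, [x1, x2, ...]); returns one local fit per cell."""
    fits = {}
    for (ci, ri), (y_i, x_i) in cells.items():
        local = []  # (kernel weight, design row with leading 1.0, observed y)
        for (cj, rj), (y_j, x_j) in cells.items():
            hops = max(abs(ci - cj), abs(ri - rj))  # square k-ring distance
            if hops <= k:
                w = math.exp(-0.5 * (hops / (k + 1)) ** 2)  # assumed Gaussian kernel
                local.append((w, [1.0, *x_j], y_j))
        p = len(local[0][1])
        # Weighted normal equations: (X^T W X) beta = X^T W y.
        xtwx = [[sum(w * xr[a] * xr[b] for w, xr, _ in local) for b in range(p)]
                for a in range(p)]
        xtwy = [sum(w * xr[a] * y for w, xr, y in local) for a in range(p)]
        beta = solve(xtwx, xtwy)

        def predict(row: list[float]) -> float:
            return sum(coef * value for coef, value in zip(beta, row))

        y_bar = sum(w * y for w, _, y in local) / sum(w for w, _, _ in local)
        ss_res = sum(w * (y - predict(xr)) ** 2 for w, xr, y in local)
        ss_tot = sum(w * (y - y_bar) ** 2 for w, _, y in local)
        fits[(ci, ri)] = {
            "intercept": beta[0],
            "coefficients": beta[1:],
            "r_squared": 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0,
            "residual": y_i - predict([1.0, *x_i]),  # observed minus predicted
        }
    return fits


# Toy 5x5 grid with one predictor x = row; the true slope grows from west (col 0) to east.
cells = {(c, r): (2.0 + (1.0 + 0.5 * c) * r, [float(r)]) for c in range(5) for r in range(5)}
fits = gwr_grid(cells, k=1)
print(round(fits[(0, 2)]["coefficients"][0], 2), round(fits[(4, 2)]["coefficients"][0], 2))  # 1.25 2.75
```

The western cell recovers a smaller local slope than the eastern one, which is the kind of spatial variation a coefficient map makes visible.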

Step 6: Save

Use native.saveastable to persist results. The spatial index column is directly visualizable in CARTO Builder: style the map by the coefficient columns to create coefficient maps that show spatial variation.
Success: A validated workflow that can be uploaded via carto workflows create.

Output Columns

Column | Meaning
--- | ---
index | Spatial index cell ID (H3 or Quadbin)
intercept | Local intercept term
<variable_name> | Local coefficient for each independent variable
r_squared | Local model fit (0-1); higher means the local model explains more of the variation
residual | Difference between observed and predicted value
The engine declares these column names in lowercase. See the Provider casing note in Gotchas for Snowflake.
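Joining results back to the input means renaming index to the original key and then joining on it. The following plain-Python sketch uses hypothetical cell IDs and values and assumes both sides fit in memory. The Output Columns table suggests that coefficient columns reuse the predictor names. If so, joining against a table that still holds the raw predictor columns causes name clashes. The sketch lets the right-hand side win so the clash is visible. A real workflow would rename one side first.

```python
def rename_column(rows: list[dict], old: str, new: str) -> list[dict]:
    """Return copies of rows with column `old` renamed to `new`."""
    return [{(new if key == old else key): value for key, value in row.items()} for row in rows]


def inner_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner join on `key`; on a column-name clash the right-hand value wins."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Hypothetical values; "cell_a"/"cell_b" stand in for real H3 IDs.
aggregated = [{"h3": "cell_a", "price": 150.0, "bedrooms": 2.0},
              {"h3": "cell_b", "price": 90.0, "bedrooms": 1.0}]
gwr_output = [{"index": "cell_a", "intercept": 60.0, "bedrooms": 42.0,
               "r_squared": 0.71, "residual": 6.0}]  # 150 - (60 + 42 * 2) = 6

joined = inner_join(aggregated, rename_column(gwr_output, "index", "h3"), "h3")
print(joined)  # "bedrooms" now holds the coefficient, not the raw average
```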

Gotchas

  • Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface in UPPERCASE, so reference H3, INDEX, PRICE, R_SQUARED, INTERCEPT, etc. in expressions. See carto-create-workflow/references/providers/<provider>.md for casing rules and SQL dialect equivalents.
  • The GWR component requires the Analytics Toolbox (AT). Always run carto workflows verify-remote --connection <conn> to ensure the AT path is resolved; carto workflows validate runs offline and cannot resolve the AT location.
  • The dependent variable must be continuous and numeric. Categorical targets need a different approach (e.g. classification).
  • Cells with null values in ANY variable (dependent or independent) are excluded from the model. Pre-filter or impute nulls before running GWR.
  • Multicollinearity between independent variables degrades results. If two predictors are highly correlated (e.g. bedrooms and total_rooms), drop one or combine them. Check correlation before including multiple similar variables.
  • K-ring size matters significantly: too small gives noisy, unstable coefficients; too large gives over-smoothed results that approach a global regression. Start with 3 and adjust.
  • r_squared per cell indicates local model fit. Very low values across many cells suggest that important predictors are missing from the model.
  • The features_columns input is an array of column names (e.g. ["bedrooms", "bathrooms"]), not a comma-separated string.
  • The output column is named index, not the original spatial index column name. If joining back to the original data, rename it with native.renamecolumn.
  • Sparse data at high resolutions leads to unreliable coefficients. Ensure enough cells have data for all variables before choosing a high resolution.
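A quick pre-check for the multicollinearity gotcha is a pairwise Pearson correlation over the aggregated predictors. This sketch uses made-up per-cell values. The |r| > 0.8 cut-off is a common rule of thumb, not a CARTO requirement.

```python
import math
from statistics import fmean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant column: correlation is undefined")
    return cov / (sx * sy)


def collinear_pairs(columns: dict[str, list[float]],
                    threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Flag predictor pairs whose absolute correlation exceeds threshold (rule of thumb)."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up per-cell averages for three candidate predictors.
predictors = {
    "bedrooms": [1.0, 2.0, 2.0, 3.0, 4.0, 4.0],
    "total_rooms": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "bathrooms": [1.0, 1.0, 2.0, 1.0, 2.0, 1.0],
}
print(collinear_pairs(predictors))
```

Only bedrooms and total_rooms are flagged here, so one of them would be dropped or combined before running GWR.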

Reference Templates

Resource | Description
--- | ---
BQ Tutorial: Airbnb Listings Prices (GWR) | BigQuery step-by-step: Berlin Airbnb price vs bedrooms/bathrooms, H3 res 7, k-ring 3, Gaussian kernel
SF Tutorial: Airbnb Listings Prices (GWR) | Snowflake step-by-step: the same analysis adapted for Snowflake
Workflow template (available in CARTO Workspace) | "Applying Geographical Weighted Regression (GWR) to model the local spatial relationships in your data"
Builder use case: Analyzing Airbnb ratings in Los Angeles | Models overall_rating vs value_review, cleanliness, and location, enriched with Data Observatory sociodemographics; uses H3 res 7, k-ring 3, Gaussian kernel

Common Variations

Variant | How
--- | ---
Pre-aggregated data (already one row per cell) | Skip Steps 3-4 and go directly to GWR
Enrich with Data Observatory | Add native.enrichgrid before GWR to include sociodemographic predictors
Coefficient comparison | Save results, then use Builder to style the map by each coefficient column separately
Filter by model fit | Add native.where after GWR to keep only cells with r_squared > 0.5 (or another threshold)
Combine with hotspot analysis | Run GWR first, then use the residuals as input to Getis-Ord to find clusters of under- or over-prediction
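The "Filter by model fit" variant applies a single condition to the GWR output. In a workflow it is a native.where condition such as r_squared > 0.5. The Python below mirrors that logic on in-memory rows. The provider argument and its string values are hypothetical and only illustrate the Snowflake casing note from Gotchas.

```python
def keep_well_fit(rows: list[dict], threshold: float = 0.5,
                  provider: str = "bigquery") -> list[dict]:
    """Keep GWR output rows whose local r_squared exceeds `threshold`.

    Snowflake surfaces unquoted identifiers in UPPERCASE, so the column name
    depends on the provider (provider strings here are illustrative).
    """
    column = "R_SQUARED" if provider == "snowflake" else "r_squared"
    # Like SQL, a NULL r_squared never satisfies the comparison.
    return [row for row in rows if row.get(column) is not None and row[column] > threshold]


results = [{"index": "cell_a", "r_squared": 0.72},
           {"index": "cell_b", "r_squared": 0.31},
           {"index": "cell_c", "r_squared": None}]
print([row["index"] for row in keep_well_fit(results)])
```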