carto-gwr
Geographically Weighted Regression (GWR)
Builds CARTO Workflows that model spatially varying relationships between a dependent variable and one or more independent variables using GWR. Unlike global regression (one set of coefficients for the entire study area), GWR produces local coefficients per spatial unit, revealing how relationships change across space. Example: "bedrooms add $50k to price in downtown but only $20k in suburbs."
Prerequisites: Load `carto-create-workflow` for the development process, JSON structure, and validation commands.
Instructions
A GWR workflow follows this pipeline:
Source Data -> (Filter) -> Spatial Indexing (H3/Quadbin) -> Aggregation (dependent + independent vars per cell) -> GWR -> Save
Step 1: Load Source Data
Use `native.gettablebyname`. The input table must contain at least one numeric dependent variable and one or more numeric independent (predictor) variables.
Success: The node outputs a table with the necessary numeric columns.
Step 2: Filter (if needed)
Use `native.wheresimplified` or `native.where` to narrow the dataset (e.g. remove nulls from key columns, filter by category or date range).
Success: The output contains only rows with valid, non-null values for the dependent and all independent variables.
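The null-removal rule above can be sketched in plain Python to make the condition concrete. This is an illustration only, not `native.where` syntax. The column names `price`, `bedrooms`, and `bathrooms` are hypothetical. NaN and infinite values are treated as invalid here on the assumption that GWR cannot use them either.

```python
import math
from typing import Any

# Hypothetical column names; adapt to your table.
TARGET = "price"
FEATURES = ["bedrooms", "bathrooms"]


def is_valid_number(value: Any) -> bool:
    """True for int/float values that are not None, NaN, or infinite."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def drop_incomplete_rows(rows: list[dict], target: str, features: list[str]) -> list[dict]:
    """Keep only rows where the target and every feature hold a finite number."""
    required = [target, *features]
    return [row for row in rows if all(is_valid_number(row.get(col)) for col in required)]


rows = [
    {"price": 120.0, "bedrooms": 2, "bathrooms": 1},
    {"price": None, "bedrooms": 3, "bathrooms": 2},             # null target: dropped
    {"price": 90.0, "bedrooms": float("nan"), "bathrooms": 1},  # NaN feature: dropped
    {"price": 150.0, "bedrooms": 3},                            # missing column: dropped
]
clean = drop_incomplete_rows(rows, TARGET, FEATURES)
print(len(clean))
```

The check covers the dependent variable and every independent variable together. Any single missing value removes the whole row, which matches how GWR excludes incomplete cells.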
Step 3: Spatial Indexing
If the data is not already indexed, convert point geometries to spatial index cells:
- `native.h3frompoint` for H3
- `native.quadbinfromgeopoint` for Quadbin
If the data already contains an H3 or Quadbin column (common for pre-aggregated datasets), skip this step.
Resolution guidance:
| Resolution | Approx. cell size | Use case |
|---|---|---|
| H3 res 7 | ~1.2 km edge (~5 km² area) | City-level relationships |
| H3 res 8 | ~460 m edge (~0.7 km² area) | Neighborhood-level |
| H3 res 9 | ~175 m edge (~0.1 km² area) | Street-level (needs dense data) |
Success: Every row has a spatial index column (e.g. `h3`).
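For intuition about the Quadbin option: a Quadbin cell corresponds to a Web Mercator (XYZ) tile, with the Quadbin resolution acting as the tile zoom level. The sketch below uses the standard slippy-map formula to find which tile contains a point. `lonlat_to_tile` is a hypothetical helper for illustration only. It does not produce the packed 64-bit Quadbin ID that `native.quadbinfromgeopoint` returns.

```python
import math

# Web Mercator cannot represent the poles; tiles stop at this latitude.
MAX_LAT = 85.05112878


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) XYZ tile that contains a WGS84 point at `zoom`."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 2 ** zoom  # tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge (lon=180, lat=-MAX_LAT) stay in range.
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))


# Each extra zoom level splits a tile into 2 x 2 children, so cells shrink 4x in area.
print(lonlat_to_tile(13.40, 52.52, 10))
```

The 4x area step per level is the same trade-off the H3 resolution table describes: finer cells capture more local detail but need denser data per cell.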
Step 4: Aggregate per Cell
Use `native.groupby` to produce one row per cell with aggregated values for the dependent and all independent variables:
- Group by: the spatial index column (`h3`)
- Aggregation: `price,avg,bedrooms,avg,bathrooms,avg` (adapt to the actual columns)
Aggregate the dependent variable with `avg` or `sum`, whichever matches what it measures: use `avg` for per-item values such as a listing price, and `sum` for totals such as counts. Independent variables are typically averaged.
Success: The output has exactly one row per unique cell, with numeric columns for the target and all predictors.
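The per-cell aggregation can be sketched in plain Python to show what "one row per cell" means. This sketch assumes the aggregation string is a flat list of `column,function` pairs, as in the example above, and supports only `avg` and `sum`. The helper names and the `cell_a`/`cell_b` IDs are hypothetical stand-ins for real H3 indexes. This is not how `native.groupby` is implemented.

```python
from collections import defaultdict


def parse_agg_spec(spec: str) -> dict[str, str]:
    """Turn 'price,avg,bedrooms,avg' into {'price': 'avg', 'bedrooms': 'avg'}."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) % 2:
        raise ValueError("expected column,function pairs")
    return dict(zip(parts[0::2], parts[1::2]))


def aggregate_per_cell(rows: list[dict], cell_col: str, agg: dict[str, str]) -> list[dict]:
    """Return one row per distinct cell with each column aggregated by avg or sum."""
    for col, how in agg.items():
        if how not in ("avg", "sum"):
            raise ValueError(f"unsupported aggregation {how!r} for {col!r}")
    totals: dict = defaultdict(lambda: defaultdict(float))
    counts: dict = defaultdict(int)
    for row in rows:
        cell = row[cell_col]
        counts[cell] += 1
        cell_totals = totals[cell]  # created even if `agg` is empty
        for col in agg:
            cell_totals[col] += row[col]
    result = []
    for cell, cell_totals in totals.items():
        out = {cell_col: cell}
        for col, how in agg.items():
            out[col] = cell_totals[col] / counts[cell] if how == "avg" else cell_totals[col]
        result.append(out)
    return result


listings = [
    {"h3": "cell_a", "price": 100.0, "bedrooms": 1, "bathrooms": 1},
    {"h3": "cell_a", "price": 200.0, "bedrooms": 3, "bathrooms": 1},
    {"h3": "cell_b", "price": 50.0, "bedrooms": 2, "bathrooms": 2},
]
per_cell = aggregate_per_cell(listings, "h3", parse_agg_spec("price,avg,bedrooms,avg,bathrooms,avg"))
print(per_cell)
```

Switching `price` to `sum` would turn the target into total listing value per cell. That is a different question from typical price per cell, which is why the choice of aggregation should follow what the target measures.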
Step 5: Run GWR
Use `native.gwr` with:
| Input | Description | Default |
|---|---|---|
| Column with H3/Quadbin indexes | |
| Target / dependent variable to model (must be numeric) | - |
| `features_columns` | Predictor / independent variable columns (array of strings) | - |
| Weighting function for neighbors | |
| K-ring size (neighborhood radius in hops) | |
| Whether to fit an intercept term | |
Kernel options: `gaussian` (recommended -- smooth distance decay), `uniform`, `triangular`, `quadratic`, `quartic`.
K-ring size: Controls the neighborhood radius.
- Too small (1-2): noisy, unstable coefficients.
- Too large (5+): over-smoothed, approaches global regression.
- Start with `3` as a balanced default.
Success: The output contains per-cell columns: `index`, `intercept`, one coefficient column per independent variable, `r_squared`, and `residual`. (See the Provider casing note in Gotchas; Snowflake surfaces these in UPPERCASE.)
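To make the kernel choice concrete, here is a sketch of the five kernels in their common textbook forms. Each is evaluated on a neighbour's distance (e.g. in ring hops) divided by a bandwidth. The source does not state how CARTO normalises distances or maps the K-ring size to a bandwidth. `bandwidth` is therefore an assumed free parameter, and the engine's exact definitions may differ.

```python
import math

KERNELS = ("uniform", "triangular", "quadratic", "quartic", "gaussian")


def kernel_weight(kernel: str, distance: float, bandwidth: float) -> float:
    """Unnormalised weight of a neighbour at `distance` from the focal cell.

    Constant factors are omitted: weighted least squares returns the same
    coefficients when every weight is multiplied by the same constant.
    """
    if kernel not in KERNELS:
        raise ValueError(f"unknown kernel: {kernel!r}")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    u = distance / bandwidth
    if kernel == "gaussian":
        return math.exp(-0.5 * u * u)  # smooth decay, never exactly zero
    if u > 1.0:
        return 0.0  # the other kernels ignore neighbours beyond the bandwidth
    if kernel == "uniform":
        return 1.0
    if kernel == "triangular":
        return 1.0 - u
    if kernel == "quadratic":  # also known as Epanechnikov
        return 1.0 - u * u
    return (1.0 - u * u) ** 2  # quartic, also known as biweight


for name in KERNELS:
    print(name, [round(kernel_weight(name, d, bandwidth=4), 3) for d in range(4)])
```

The loop prints weights for ring distances 0-3. Gaussian weights fade gradually and never hit zero, while the compact kernels cut neighbours off sharply at the bandwidth. That gradual fade is the "smooth distance decay" behind the Gaussian recommendation.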
Step 6: Save
Use `native.saveastable` to persist results. The spatial index column is directly visualizable in CARTO Builder -- style the map by coefficient columns to create coefficient maps showing spatial variation.
Success: A validated workflow that can be uploaded via `carto workflows create`.
Output Columns
| Column | Meaning |
|---|---|
| `index` | Spatial index cell ID (H3 or Quadbin) |
| `intercept` | Local intercept term |
| (one per independent variable) | Local coefficient for each independent variable |
| `r_squared` | Local model fit (0-1) -- higher = better local explanation |
| `residual` | Difference between observed and predicted value |
The engine declares these column names in lowercase. See the Provider casing note in Gotchas for Snowflake.
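To show where these columns come from, here is a from-scratch sketch of GWR on a square grid. For each cell, it fits a weighted least-squares regression over that cell's neighbours. It then reports a local intercept, the coefficients, a weighted local R², and the residual at the cell itself. Several assumptions may not match CARTO's engine:
- Neighbours are cells within Chebyshev distance `kring`, a square-grid stand-in for an H3/Quadbin k-ring.
- Weights come from a Gaussian kernel with bandwidth equal to `kring`.
- Local R² uses one common weighted definition from the GWR literature.

`solve` and `gwr_grid` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (column, row) on a square grid


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or collinear neighbours")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gwr_grid(data: Dict[Cell, Tuple[float, List[float]]], kring: int = 3) -> Dict[Cell, dict]:
    """Fit one local weighted regression per cell; `data` maps cell -> (target, [features])."""
    if kring < 1:
        raise ValueError("kring must be at least 1")
    results = {}
    for (ci, cj), (y_here, x_here) in data.items():
        # Collect (weight, design row with leading 1.0 for the intercept, target).
        obs = []
        for (ni, nj), (y, xs) in data.items():
            hops = max(abs(ni - ci), abs(nj - cj))
            if hops <= kring:
                obs.append((math.exp(-0.5 * (hops / kring) ** 2), [1.0, *xs], y))
        p = len(x_here) + 1
        # Normal equations of weighted least squares: (X'WX) beta = X'Wy.
        xtwx = [[sum(w * r[a] * r[b] for w, r, _ in obs) for b in range(p)] for a in range(p)]
        xtwy = [sum(w * r[a] * y for w, r, y in obs) for a in range(p)]
        beta = solve(xtwx, xtwy)
        fitted = [sum(bk * v for bk, v in zip(beta, r)) for _, r, _ in obs]
        w_total = sum(w for w, _, _ in obs)
        y_mean = sum(w * y for w, _, y in obs) / w_total
        ss_res = sum(w * (y - f) ** 2 for (w, _, y), f in zip(obs, fitted))
        ss_tot = sum(w * (y - y_mean) ** 2 for w, _, y in obs)
        pred_here = sum(bk * v for bk, v in zip(beta, [1.0, *x_here]))
        results[(ci, cj)] = {
            "intercept": beta[0],
            "coefficients": beta[1:],
            "r_squared": 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan,
            "residual": y_here - pred_here,  # observed minus predicted
        }
    return results


# Synthetic data: slope 3 in the west half (column < 3), slope 5 in the east half.
grid = {}
for i in range(6):
    for j in range(6):
        x = float(i + 2 * j)
        grid[(i, j)] = (2.0 + (3.0 if i < 3 else 5.0) * x, [x])

local = gwr_grid(grid, kring=1)
print(local[(0, 0)]["coefficients"], local[(5, 5)]["coefficients"])
```

Cells deep in each half recover slopes near 3 and 5 respectively. Cells near the boundary blend the two. Raising `kring` widens that blend, which is the over-smoothing the K-ring guidance warns about.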
Gotchas
- Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface in UPPERCASE, so reference `H3`, `INDEX`, `PRICE`, `R_SQUARED`, `INTERCEPT`, etc. in expressions. See `carto-create-workflow/references/providers/<provider>.md` for casing rules and SQL dialect equivalents.
- The GWR component requires the Analytics Toolbox (AT). Always run `carto workflows verify-remote --connection <conn>` to ensure the AT path is resolved; `carto workflows validate` is offline and cannot resolve the AT location.
- The dependent variable must be continuous and numeric. Categorical targets need a different approach (e.g. classification).
- Cells with null values in ANY variable (dependent or independent) are excluded from the model. Pre-filter or impute nulls before running GWR.
- Multicollinearity between independent variables degrades results. If two predictors are highly correlated (e.g. `bedrooms` and `total_rooms`), drop one or combine them. Check correlations before including multiple similar variables.
- K-ring size matters significantly: too small = noisy, unstable coefficients; too large = over-smoothed results that approach a global regression. Start with `3` and adjust.
- `r_squared` per cell indicates local model fit. Very low values across many cells suggest important predictors are missing from the model.
- The `features_columns` input is an array of column names (e.g. `["bedrooms", "bathrooms"]`), not a comma-separated string.
- The output column is named `index`, not the original spatial index column name. If joining back to the original data, rename it with `native.renamecolumn`.
- Sparse data at high resolutions leads to unreliable coefficients. Ensure enough cells have data for all variables before choosing a high resolution.
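The correlation check in the multicollinearity gotcha can be done with a plain Pearson coefficient over the aggregated per-cell predictor columns. The 0.8 threshold is a common rule of thumb, not a CARTO setting. The column values below are made up for illustration.

```python
import math
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns: dict[str, list[float]], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (a, b, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


per_cell_predictors = {
    "bedrooms": [1.0, 2.0, 3.0, 4.0],
    "bathrooms": [1.0, 1.0, 2.0, 1.0],
    "total_rooms": [3.0, 5.0, 7.0, 9.0],
}
print(correlated_pairs(per_cell_predictors))
```

Here `total_rooms` is an exact linear function of `bedrooms`, so only that pair is flagged. One of the two should be dropped, or they should be combined.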
Reference Templates
| Resource | Description |
|---|---|
| BQ Tutorial: Airbnb Listings Prices (GWR) | BigQuery step-by-step: Berlin Airbnb price vs bedrooms/bathrooms, H3 res 7, kring 3, Gaussian kernel |
| SF Tutorial: Airbnb Listings Prices (GWR) | Snowflake step-by-step: same analysis adapted for Snowflake |
Workflow template (available in CARTO Workspace): "Applying Geographical Weighted Regression (GWR) to model the local spatial relationships in your data"
Builder use case: Analyzing Airbnb ratings in Los Angeles -- models `overall_rating` vs `value_review`, `cleanliness`, and `location`, enriched with Data Observatory sociodemographics. Uses H3 res 7, kring 3, Gaussian kernel.
Common Variations
| Variant | How |
|---|---|
| Pre-aggregated data (already one row per cell) | Skip Steps 3-4, go directly to GWR |
| Enrich with Data Observatory | Add an enrichment step before GWR |
| Coefficient comparison | Save results, then use Builder to style map by each coefficient column separately |
| Filter by model fit | Add a filter on `r_squared` after GWR |
| Combine with hotspot analysis | Run GWR first, then use residuals as input to Getis-Ord to find clusters of under/over-prediction |
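Both the model-fit filter and the residual-based hotspot variation depend on reading the output columns correctly. This sketch keeps cells whose `r_squared` meets a threshold and labels each cell by the sign of `residual`. Because `residual` is observed minus predicted, a positive value means the model under-predicts. The 0.5 threshold, the `prediction` field, and the sample rows are hypothetical. In a workflow, the filter would be a component placed after GWR rather than Python.

```python
import math


def label_well_fit_cells(rows: list[dict], min_r_squared: float = 0.5) -> list[dict]:
    """Keep cells with r_squared >= threshold and tag the residual direction."""
    kept = []
    for row in rows:
        r2 = row.get("r_squared")
        if r2 is None or math.isnan(r2) or r2 < min_r_squared:
            continue  # poorly fit or undefined: drop the cell
        residual = row["residual"]
        if residual > 0:
            label = "under-predicted"  # observed value is above the prediction
        elif residual < 0:
            label = "over-predicted"  # observed value is below the prediction
        else:
            label = "exact"
        kept.append({**row, "prediction": label})
    return kept


cells = [
    {"index": "cell_a", "r_squared": 0.72, "residual": 15.0},
    {"index": "cell_b", "r_squared": 0.31, "residual": -40.0},
    {"index": "cell_c", "r_squared": 0.64, "residual": -8.5},
]
well_fit = label_well_fit_cells(cells)
print(well_fit)
```

Clusters of same-signed residuals are what a Getis-Ord hotspot analysis on the residual column would surface as areas of systematic under- or over-prediction.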