gnnwr

GNNWR - Geographically Neural Network Weighted Regression

Quick Reference

```python
from gnnwr import models, datasets, utils
import pandas as pd

data = pd.read_csv("data.csv")

train, val, test = datasets.init_dataset(
    data=data, test_ratio=0.2, valid_ratio=0.1,
    x_column=["x1", "x2", "x3"], y_column=["y"],
    spatial_column=["lon", "lat"],  # REQUIRED: geographic coords
    batch_size=32, process_fn="minmax_scale"
)

model = models.GNNWR(train, val, test, use_gpu=True, optimizer="Adam", start_lr=0.01)
model.run(max_epoch=200, early_stop=30)

result = model.reg_result(only_return=True)  # DataFrame: coef_x1, coef_x2, ..., Pred_y
print(model.result())                         # R², AIC, RMSE, F-tests summary
```

Spatiotemporal (GTNNWR)

```python
train, val, test = datasets.init_dataset(
    data=data,
    # ... remaining arguments as in the Quick Reference ...
    spatial_column=["lon", "lat"],
    temp_column=["year", "month"],  # add temporal coords
    use_model="gtnnwr"
)
model = models.GTNNWR(train, val, test, use_gpu=True)
```

Large-Scale (N > 10k) — KNN Mode

```python
train, val, test = datasets.init_dataset(
    data=data,
    # ... remaining arguments as in the Quick Reference ...
    knn_k=500  # keep only the k nearest-neighbor distances
)
```

Memory: for N = 100k, the full distance matrix takes about 55 GB, whereas knn_k=2000 needs only about 763 MB.
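
The KNN saving can be checked with back-of-the-envelope arithmetic. The sketch below assumes distances are stored as 4-byte float32 values, which is an assumption and not something this reference states; `distance_matrix_bytes` is a hypothetical helper, not part of gnnwr. Under that assumption the knn_k=2000 case reproduces the 763 MB figure. A single raw N×N matrix comes out smaller than the quoted 55 GB, which presumably includes additional buffers that are not modeled here.

```python
def distance_matrix_bytes(n_points, n_neighbors=None, itemsize=4):
    """Bytes for an n_points x n_neighbors array (full N x N if n_neighbors is None).

    itemsize=4 assumes float32 storage (an assumption, not a documented fact).
    """
    cols = n_points if n_neighbors is None else min(n_neighbors, n_points)
    return n_points * cols * itemsize


MIB = 1024 ** 2
GIB = 1024 ** 3

full = distance_matrix_bytes(100_000)                 # dense N x N
knn = distance_matrix_bytes(100_000, n_neighbors=2000)  # N x k

print(f"full matrix: {full / GIB:.1f} GiB")
print(f"knn_k=2000:  {knn / MIB:.0f} MiB")  # 763 MiB
```

Memory grows linearly in k, so halving `knn_k` halves the footprint.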

Key Classes

| Class | Purpose |
|---|---|
| `models.GNNWR` | Spatial regression with neural network geographic weighting |
| `models.GTNNWR` | Spatiotemporal regression with temporal + spatial weighting |
| `datasets.init_dataset` | Data splitting, normalization, distance matrix construction |
| `utils.Visualize` | Built-in folium interactive maps for coefficients and predictions |
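
For orientation, the weighting idea behind `models.GNNWR` can be written as one equation. This is a sketch based on the published GNNWR formulation rather than on anything stated in this reference; the symbols are notation introduced here, and the library's internals may differ in detail.

```latex
\hat{y}_i = \sum_{k=0}^{p} w_k(u_i, v_i)\,\hat{\beta}_k^{\mathrm{OLS}}\,x_{ik}, \qquad x_{i0} = 1
```

Here $(u_i, v_i)$ are the coordinates of sample $i$, $\hat{\beta}_k^{\mathrm{OLS}}$ is the global OLS coefficient of variable $k$, and $w_k(u_i, v_i)$ is the weight the neural network produces from the spatial distances of sample $i$. The product $w_k \hat{\beta}_k^{\mathrm{OLS}}$ is the spatially varying coefficient at that location. GTNNWR follows the same pattern with weights that also depend on time.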

Essential Operations

init_dataset Parameters

| Parameter | Default | Notes |
|---|---|---|
| `knn_k` | `None` | KNN sparse distance; `None` = full matrix |
| `process_fn` | `"minmax_scale"` | or `"standard_scale"` |
| `spatial_fun` | `BasicDistance` | Euclidean; or `ManhattanDistance` |
| `Reference` | `None` | `"train"`, `"train_val"`, or custom DataFrame |
| `sample_seed` | `42` | Reproducibility |
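
To make the `spatial_fun` and `knn_k` options concrete, here is a minimal NumPy sketch of Euclidean versus Manhattan distance matrices and of keeping only the k nearest distances per point. It is an illustration of the concepts, not the library's implementation; `pairwise_distances` and `knn_distances` are hypothetical helpers.

```python
import numpy as np


def pairwise_distances(coords, ref, metric="euclidean"):
    """Distance from every row of coords to every row of ref."""
    diff = coords[:, None, :] - ref[None, :, :]  # shape (n, m, dims)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def knn_distances(dist, k):
    """Keep the k smallest distances per row, sorted ascending."""
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
euclid = pairwise_distances(coords, coords)                      # row 0: 0, 5, 10
manhat = pairwise_distances(coords, coords, metric="manhattan")  # row 0: 0, 7, 14
near, near_idx = knn_distances(euclid, k=2)                      # shape (3, 2)
```

The full matrix has shape (n, m), while the KNN version has shape (n, k), which is where the memory saving comes from.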

Model Hyperparameters

| Parameter | Recommended | Notes |
|---|---|---|
| `optimizer` | `"Adam"` | Also: SGD, AdamW, Adagrad, RMSprop |
| `start_lr` | 0.01–0.1 | Critical tuning point |
| `drop_out` | 0.2 | 0.0–0.5 |
| `dense_layers` | `None` (auto) | Auto: power-of-2 sequence from input_dim to n_coef |
| `early_stop` | 20–50 | Patience; -1 = disabled |
| `batch_norm` | `True` | Stabilizes training |
| `use_ols` | `True` | OLS-initialized output layer |
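
The `early_stop` patience setting can be illustrated with a small, library-independent sketch. `stop_epoch_with_patience` is a hypothetical helper, not part of gnnwr, and the library may count epochs slightly differently. The convention that -1 disables stopping follows the table above.

```python
def stop_epoch_with_patience(val_losses, patience):
    """Return (best_epoch, stop_epoch) for per-epoch validation losses.

    Training stops once `patience` epochs pass without a new best loss.
    patience=-1 disables early stopping (all epochs run).
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif patience >= 0 and epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74]
print(stop_epoch_with_patience(losses, patience=3))   # (2, 5)
print(stop_epoch_with_patience(losses, patience=-1))  # (2, 6)
```

A larger patience tolerates longer plateaus at the cost of extra epochs, which is why the 20–50 range is a trade-off and not a fixed rule.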

Diagnostics

```python
diag = model._test_diagnosis
diag.R2()           # always available
diag.RMSE()         # always available
diag.AIC()          # needs lite=False (auto for N<10k)
diag.AICc()         # corrected AIC
diag.F1_Global()    # GNNWR vs OLS significance
diag.F2_Global()    # spatial weight significance
diag.F3_Local()     # per-variable significance → (dict1, dict2)
```

`lite=True` (auto when N > 10k): only R²/RMSE are computed; hat-matrix diagnostics are skipped.
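
For reference, the two diagnostics that are always available have simple textbook definitions. The sketch below computes them with NumPy on plain arrays; `r_squared` and `rmse` are hypothetical helpers written for illustration, not the methods of the library's diagnosis object.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y_obs, y_hat))  # ≈ 0.98
print(rmse(y_obs, y_hat))       # ≈ 0.158
```

Neither quantity needs the hat matrix, which is consistent with both remaining available in lite mode.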

Visualization Patterns

Folium Interactive Maps (built-in)

```python
viz = utils.Visualize(model, lon_lat_columns=["lon", "lat"], zoom=5)
m1 = viz.display_dataset(name="all", y_column="y")
m1.save("dataset_map.html")

for col in [c for c in result.columns if c.startswith("coef_")]:
    m = viz.coefs_heatmap(data_column=col, steps=20)
    m.save(f"map_{col}.html")
```

Matplotlib Static Maps (publication-ready)

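```python
import matplotlib.pyplot as plt

# Assumes `result` carries "lon" and "lat" columns alongside the coef_* columns.
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
coef_cols = [c for c in result.columns if c.startswith("coef_")]

for ax, col in zip(axes.flat, coef_cols):
    sc = ax.scatter(
        result["lon"], result["lat"],
        c=result[col], cmap="RdYlBu_r", s=5, alpha=0.8,
        # Clip the color range to the 2nd-98th percentiles to limit outlier influence
        vmin=result[col].quantile(0.02), vmax=result[col].quantile(0.98)
    )
    ax.set_title(col.replace("coef_", "β_"), fontsize=14)
    plt.colorbar(sc, ax=ax, shrink=0.8)

# Hide any panels left over when there are fewer than six coefficients
for ax in axes.ravel()[len(coef_cols):]:
    ax.set_visible(False)

plt.suptitle("Spatially Varying Coefficients (GNNWR)", fontsize=16)
plt.tight_layout()
plt.savefig("coefficients_map.png", dpi=300, bbox_inches="tight")
```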

GeoPandas + Contextily (with basemap)

```python
import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt

# Build point geometries from lon/lat (WGS84), then reproject to Web Mercator for the basemap
gdf = gpd.GeoDataFrame(result, geometry=gpd.points_from_xy(result.lon, result.lat), crs="EPSG:4326")
gdf_web = gdf.to_crs(epsg=3857)

fig, ax = plt.subplots(figsize=(12, 10))
gdf_web.plot(column="coef_x1", ax=ax, cmap="RdYlBu_r", legend=True,
             markersize=5, alpha=0.7, legend_kwds={"shrink": 0.6})
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
ax.set_title("β_x1 Spatial Variation")
ax.set_axis_off()
plt.savefig("coef_basemap.png", dpi=300, bbox_inches="tight")
```

When to Use vs Alternatives

| Use Case | Tool | Why |
|---|---|---|
| Spatially varying coefficients (neural net) | GNNWR | Non-linear weighting, scalable, coefficient maps |
| Classical geographically weighted regression | mgwr / GWR4 | Traditional bandwidth-based, well-established theory |
| Spatial interpolation (no covariates) | verde / scikit-gstat | Gridding / kriging without regression |
| Global regression baseline | statsmodels / scikit-learn | No spatial non-stationarity assumed |
| Spatiotemporal varying coefficients | GTNNWR | GNNWR extended with temporal dimension |
| Large-scale spatial regression (N > 100k) | GNNWR + `knn_k` | Sparse distance matrix, O(n·k²) diagnostics |
| Geostatistical simulation | geostatspy / SGeMS | Stochastic realizations, uncertainty quantification |

Choose GNNWR when: You need spatially varying regression coefficients with neural network-based geographic weighting, especially for large datasets where classical GWR is computationally infeasible.

Choose classical GWR when: You need well-established inferential statistics, bandwidth-based weighting, and simpler model interpretation.

Choose verde/kriging when: You need spatial interpolation without explanatory variables, that is, pure spatial prediction from observed values.

Common Workflows

Spatial Regression Analysis

  • EDA: Check spatial distribution, feature correlations, OLS baseline
  • Data split: `init_dataset` with appropriate ratios and `sample_seed=42`
  • Train: Start with defaults, tune `start_lr` and `early_stop`
  • Diagnose: R², RMSE, F1 (GNNWR vs OLS), F2 (spatial weight significance)
  • Visualize: Coefficient maps, residual spatial distribution, pred vs obs
  • Interpret: Where do coefficients vary most? Which variables show strongest non-stationarity? (F3_Local)
  • Report: Model summary table + coefficient maps + diagnostic statistics
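
The OLS baseline mentioned in the EDA step can be sketched with NumPy alone. The data below is synthetic and generated purely for illustration, and `ols_fit` is a hypothetical helper. In practice the feature matrix would come from the same `x_column` and `y_column` fields that are passed to `init_dataset`.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients with an intercept prepended: [b0, b1, ..., bp]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


# Synthetic example: three features with known global coefficients plus small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.5, 2.0, -1.0, 0.5])  # intercept, x1, x2, x3
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)

beta = ols_fit(X, y)
residuals = y - (beta[0] + X @ beta[1:])
print(np.round(beta, 2))
```

Mapping these OLS residuals by location is a quick way to see whether there is spatial structure left for GNNWR to explain.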

Common Issues

| Issue | Solution |
|---|---|
| Model degenerates to global regression | Forgot `spatial_column`; always pass it |
| OOM on distance matrix | N > 10k without `knn_k`; use `knn_k` of 500–2000 |
| Loss explodes during training | `start_lr` too high; start with 0.01 |
| Overfitting | No `early_stop`; always set 20–50 |
| Coefficients on wrong scale | Use `reg_result()` for denormalized predictions |
| GTNNWR behaves like GNNWR | Missing `temp_column`; silently falls back |
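
The scale issue arises because `process_fn` rescales the data before training, so values read in the scaled space do not match the original units. The sketch below shows min-max scaling and its inverse in plain NumPy. `minmax_scale` and `minmax_inverse` are hypothetical helpers for illustration and are not the library's own routines.

```python
import numpy as np


def minmax_scale(values):
    """Scale to [0, 1]; also return the min and max needed to undo the scaling."""
    lo, hi = values.min(axis=0), values.max(axis=0)
    return (values - lo) / (hi - lo), lo, hi


def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return scaled * (hi - lo) + lo


y = np.array([10.0, 20.0, 40.0])
y_scaled, lo, hi = minmax_scale(y)        # 0, 1/3, 1
y_back = minmax_inverse(y_scaled, lo, hi)  # 10, 20, 40 again
```

A prediction of 0.5 in scaled space corresponds to 25.0 in original units here, which is the kind of mismatch the denormalized output avoids.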

References

  • Diagnostics — DIAGNOSIS methods, F-tests, residual analysis
  • Visualization — Detailed visualization patterns and publication figures