gnnwr

GNNWR - Geographically Neural Network Weighted Regression

Quick Reference

```python
from gnnwr import models, datasets, utils
import pandas as pd

data = pd.read_csv("data.csv")

train, val, test = datasets.init_dataset(
    data=data, test_ratio=0.2, valid_ratio=0.1,
    x_column=["x1", "x2", "x3"], y_column=["y"],
    spatial_column=["lon", "lat"],  # REQUIRED: geographic coords
    batch_size=32, process_fn="minmax_scale"
)

model = models.GNNWR(train, val, test, use_gpu=True, optimizer="Adam", start_lr=0.01)
model.run(max_epoch=200, early_stop=30)

result = model.reg_result(only_return=True)  # DataFrame: coef_x1, coef_x2, ..., Pred_y
print(model.result())                         # R², AIC, RMSE, F-tests summary
```

Spatiotemporal (GTNNWR)

```python
train, val, test = datasets.init_dataset(
    data=data,
    # ... remaining arguments as in the Quick Reference
    spatial_column=["lon", "lat"],
    temp_column=["year", "month"],  # add temporal coords
    use_model="gtnnwr"
)
model = models.GTNNWR(train, val, test, use_gpu=True)
```

Large-Scale (N > 10k) — KNN Mode

```python
train, val, test = datasets.init_dataset(
    data=data,
    # ... remaining arguments as in the Quick Reference
    knn_k=500  # keep only the k nearest-neighbor distances
)
```

Memory: at N = 100k the full distance matrix takes 55 GB, whereas `knn_k=2000` takes only 763 MB.
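
As a rough check of these figures, the sketch below estimates the storage for the distance values alone, assuming 4-byte float32 entries. That dtype is an assumption; the library's actual storage layout is not stated here. Under it, the KNN case reproduces the 763 MB figure, while the full-matrix estimate comes out lower than 55 GB, which suggests the quoted number includes storage beyond a single float32 matrix.

```python
def distance_storage_bytes(n_points, n_neighbors=None, bytes_per_value=4):
    """Bytes needed to hold one distance value per (point, neighbor) pair.

    n_neighbors=None models the full N x N matrix; otherwise only the
    k nearest neighbors per point are stored. float32 (4 bytes) is assumed.
    """
    cols = n_points if n_neighbors is None else min(n_neighbors, n_points)
    return n_points * cols * bytes_per_value


full = distance_storage_bytes(100_000)
sparse = distance_storage_bytes(100_000, n_neighbors=2000)

print(f"full matrix: {full / 2**30:.1f} GiB")   # 37.3 GiB
print(f"knn_k=2000:  {sparse / 2**20:.0f} MiB")  # 763 MiB
```

Storage grows with N² for the full matrix but only with N·k in KNN mode, which is why `knn_k` matters most for large N.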

Key Classes

Class	Purpose
`models.GNNWR`	Spatial regression with neural network geographic weighting
`models.GTNNWR`	Spatiotemporal regression with temporal + spatial weighting
`datasets.init_dataset`	Data splitting, normalization, distance matrix construction
`utils.Visualize`	Built-in folium interactive maps for coefficients and predictions

Essential Operations

init_dataset Parameters

Parameter	Default	Notes
`knn_k`	None	KNN sparse distance; None=full matrix
`process_fn`	"minmax_scale"	or "standard_scale"
`spatial_fun`	BasicDistance	Euclidean; or ManhattanDistance
`Reference`	None	"train", "train_val", or custom DataFrame
`sample_seed`	42	Reproducibility
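
To make `spatial_fun` and `knn_k` concrete, here is a minimal pure-Python sketch of the two distance metrics and of keeping only the k smallest distances per point. The function names are illustrative and are not the library's API; `init_dataset` builds its distance matrices internally, and its exact behavior may differ.

```python
import heapq
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def knn_distances(points, reference, k, metric=euclidean):
    """For each point, keep only the k smallest distances to the reference set."""
    return [heapq.nsmallest(k, (metric(p, r) for r in reference)) for p in points]


coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
rows = knn_distances(coords, coords, k=2)
for point, dists in zip(coords, rows):
    print(point, [round(d, 3) for d in dists])
```

Each row has k entries instead of one per reference point, which is the source of the memory savings in KNN mode.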

Model Hyperparameters

Parameter	Recommended	Notes
`optimizer`	"Adam"	Also: SGD, AdamW, Adagrad, RMSprop
`start_lr`	0.01–0.1	Critical tuning point
`drop_out`	0.2	0.0–0.5
`dense_layers`	None (auto)	Auto: power-of-2 sequence from input_dim to n_coef
`early_stop`	20–50	Patience; -1=disabled
`batch_norm`	True	Stabilizes training
`use_ols`	True	OLS-initialized output layer
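
The `early_stop` patience can be pictured as the loop below: training stops once the validation loss has failed to improve for that many consecutive epochs. This is a generic sketch of the technique, assuming lower validation loss is better; the library's internal stopping criterion may differ, and the function name is illustrative.

```python
def epochs_until_stop(val_losses, patience):
    """Return how many epochs run before early stopping triggers.

    patience=-1 disables early stopping, so every epoch runs.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if patience != -1 and stale >= patience:
            return epoch
    return len(val_losses)


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70, 0.70]
print(epochs_until_stop(losses, patience=2))   # 5
print(epochs_until_stop(losses, patience=3))   # 9
print(epochs_until_stop(losses, patience=-1))  # 10
```

A small patience stops at the first plateau (epoch 5 here) and misses the later improvement at epoch 6, which is one reason the recommended range is 20–50 rather than a handful of epochs.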

Diagnostics

```python
diag = model._test_diagnosis
diag.R2()           # always available
diag.RMSE()         # always available
diag.AIC()          # needs lite=False (auto for N<10k)
diag.AICc()         # corrected AIC
diag.F1_Global()    # GNNWR vs OLS significance
diag.F2_Global()    # spatial weight significance
diag.F3_Local()     # per-variable significance → (dict1, dict2)
```

`lite=True` (auto when N > 10k): only R² and RMSE are computed; hat-matrix diagnostics are skipped.
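
R² and RMSE need only observed and predicted values, not the hat matrix, which is why they can be computed in every mode. A minimal sketch of the two standard formulas, independent of the library (the function names and sample values are illustrative):

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_obs = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 9.0]
print(f"R2   = {r2_score(y_obs, y_hat):.3f}")  # 0.975
print(f"RMSE = {rmse(y_obs, y_hat):.3f}")      # 0.354
```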

Visualization Patterns

Folium Interactive Maps (built-in)

```python
viz = utils.Visualize(model, lon_lat_columns=["lon", "lat"], zoom=5)
m1 = viz.display_dataset(name="all", y_column="y")
m1.save("dataset_map.html")

for col in [c for c in result.columns if c.startswith("coef_")]:
    m = viz.coefs_heatmap(data_column=col, steps=20)
    m.save(f"map_{col}.html")
```

Matplotlib Static Maps (publication-ready)

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 3, figsize=(18, 12))
coef_cols = [c for c in result.columns if c.startswith("coef_")]

for ax, col in zip(axes.flat, coef_cols):
    sc = ax.scatter(
        result["lon"], result["lat"],
        c=result[col], cmap="RdYlBu_r", s=5, alpha=0.8,
        vmin=result[col].quantile(0.02), vmax=result[col].quantile(0.98)
    )
    ax.set_title(col.replace("coef_", "β_"), fontsize=14)
    plt.colorbar(sc, ax=ax, shrink=0.8)

plt.suptitle("Spatially Varying Coefficients (GNNWR)", fontsize=16)
plt.tight_layout()
plt.savefig("coefficients_map.png", dpi=300, bbox_inches="tight")
```

GeoPandas + Contextily (with basemap)

```python
import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt

# Build point geometries in WGS84, then reproject to Web Mercator for the basemap tiles
gdf = gpd.GeoDataFrame(result, geometry=gpd.points_from_xy(result.lon, result.lat), crs="EPSG:4326")
gdf_web = gdf.to_crs(epsg=3857)

fig, ax = plt.subplots(figsize=(12, 10))
gdf_web.plot(column="coef_x1", ax=ax, cmap="RdYlBu_r", legend=True,
             markersize=5, alpha=0.7, legend_kwds={"shrink": 0.6})
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
ax.set_title("β_x1 Spatial Variation")
ax.set_axis_off()
plt.savefig("coef_basemap.png", dpi=300, bbox_inches="tight")
```

When to Use vs Alternatives

Use Case	Tool	Why
Spatially varying coefficients (neural net)	GNNWR	Non-linear weighting, scalable, coefficient maps
Classical geographically weighted regression	mgwr / GWR4	Traditional bandwidth-based, well-established theory
Spatial interpolation (no covariates)	verde / scikit-gstat	Gridding / kriging without regression
Global regression baseline	statsmodels / scikit-learn	No spatial non-stationarity assumed
Spatiotemporal varying coefficients	GTNNWR	GNNWR extended with temporal dimension
Large-scale spatial regression (N > 100k)	GNNWR + knn_k	Sparse distance matrix, O(n·k²) diagnostics
Geostatistical simulation	geostatspy / SGeMS	Stochastic realizations, uncertainty quantification

Choose GNNWR when: You need spatially varying regression coefficients with neural network-based geographic weighting, especially for large datasets where classical GWR is computationally infeasible.

Choose classical GWR when: You need well-established inferential statistics, bandwidth-based weighting, and simpler model interpretation.

Choose verde/kriging when: You need spatial interpolation without explanatory variables — pure spatial prediction from observed values.

Common Workflows

Spatial Regression Analysis

1. EDA: Check spatial distribution, feature correlations, OLS baseline
2. Data split: `init_dataset` with appropriate ratios and `sample_seed=42`
3. Train: Start with defaults, tune `start_lr` and `early_stop`
4. Diagnose: R², RMSE, F1 (GNNWR vs OLS), F2 (spatial weight significance)
5. Visualize: Coefficient maps, residual spatial distribution, pred vs obs
6. Interpret: Where do coefficients vary most? Which variables show strongest non-stationarity? (F3_Local)
7. Report: Model summary table + coefficient maps + diagnostic statistics
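
For the Interpret step, one simple first look at where coefficients vary most is to rank the `coef_*` columns by their spread. The sketch below uses a small hand-made table in place of the real `reg_result` output. The rows and values are hypothetical, and standard deviation is only one possible measure; it depends on each variable's scale and does not replace the `F3_Local` test.

```python
import statistics

# Hypothetical stand-in for rows of the reg_result DataFrame.
rows = [
    {"coef_x1": 0.10, "coef_x2": 1.0, "Pred_y": 3.2},
    {"coef_x1": 0.90, "coef_x2": 1.1, "Pred_y": 4.1},
    {"coef_x1": -0.40, "coef_x2": 0.9, "Pred_y": 2.7},
]


def rank_by_spread(rows, prefix="coef_"):
    """Return (column, population std) pairs, most variable first."""
    cols = [c for c in rows[0] if c.startswith(prefix)]
    spread = {c: statistics.pstdev([r[c] for r in rows]) for c in cols}
    return sorted(spread.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_spread(rows)
for col, sd in ranking:
    print(f"{col}: std = {sd:.3f}")
```

With a real DataFrame the same ranking could be read from the standard deviations of the coefficient columns, then checked against the coefficient maps.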

Common Issues

Issue	Solution
Model degenerates to global regression	Forgot `spatial_column`; always pass it
OOM on distance matrix	N > 10k without `knn_k`; use `knn_k=500–2000`
Loss explodes during training	`start_lr` too high; start with 0.01
Overfitting	No `early_stop`; always set 20–50
Coefficients on wrong scale	Use `reg_result()` for denormalized predictions
GTNNWR behaves like GNNWR	Missing `temp_column`; silently falls back
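
Several of these issues can be caught before training with a small pre-flight check on the arguments intended for `init_dataset`. The helper below is hypothetical (it is not part of gnnwr) and encodes only the rules of thumb from the table above.

```python
def preflight(columns, n_rows, kwargs):
    """Return a list of warnings for a planned init_dataset call.

    columns: column names present in the data
    n_rows:  number of samples
    kwargs:  keyword arguments that would be passed to init_dataset
    """
    warnings = []
    spatial = kwargs.get("spatial_column")
    if not spatial:
        warnings.append("spatial_column missing: model may degenerate to global regression")
    else:
        missing = [c for c in spatial if c not in columns]
        if missing:
            warnings.append(f"spatial_column not in data: {missing}")
    if n_rows > 10_000 and kwargs.get("knn_k") is None:
        warnings.append("N > 10k without knn_k: distance matrix may not fit in memory")
    if kwargs.get("use_model") == "gtnnwr" and not kwargs.get("temp_column"):
        warnings.append("use_model='gtnnwr' without temp_column: behaves like GNNWR")
    return warnings


issues = preflight(
    columns=["x1", "x2", "y", "lon", "lat"],
    n_rows=50_000,
    kwargs={"spatial_column": ["lon", "lat"], "use_model": "gtnnwr"},
)
for msg in issues:
    print(msg)
```

In this example the check reports the missing `knn_k` and the missing `temp_column`; a call with all three settings in place returns an empty list.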

References

Diagnostics — DIAGNOSIS methods, F-tests, residual analysis
Visualization — Detailed visualization patterns and publication figures
