gnnwr
GNNWR - Geographically Neural Network Weighted Regression
Quick Reference
```python
from gnnwr import models, datasets, utils
import pandas as pd

data = pd.read_csv("data.csv")
train, val, test = datasets.init_dataset(
    data=data, test_ratio=0.2, valid_ratio=0.1,
    x_column=["x1", "x2", "x3"], y_column=["y"],
    spatial_column=["lon", "lat"],  # REQUIRED: geographic coords
    batch_size=32, process_fn="minmax_scale"
)
model = models.GNNWR(train, val, test, use_gpu=True, optimizer="Adam", start_lr=0.01)
model.run(max_epoch=200, early_stop=30)
result = model.reg_result(only_return=True)  # DataFrame: coef_x1, coef_x2, ..., Pred_y
print(model.result())  # R², AIC, RMSE, F-tests summary
```
Spatiotemporal (GTNNWR)
```python
train, val, test = datasets.init_dataset(
    data=data,  # ... other arguments as in Quick Reference
    spatial_column=["lon", "lat"],
    temp_column=["year", "month"],  # add temporal coords
    use_model="gtnnwr"
)
model = models.GTNNWR(train, val, test, use_gpu=True)
```
Large-Scale (N > 10k): KNN Mode
```python
train, val, test = datasets.init_dataset(
    data=data,  # ... other arguments as in Quick Reference
    knn_k=500  # only k nearest neighbor distances
)
```
Memory: for N = 100k, the full distance matrix takes 55 GB; with knn_k=2000 it takes only 763 MB.
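The 763 MB figure is consistent with storing k distances per sample as 4-byte floats. The sketch below reproduces that arithmetic under the assumption of float32 storage, counting only the raw distance values. The helper names are illustrative and are not part of gnnwr. The dense estimate is a lower bound, since the 55 GB quoted above is larger than the raw value count alone.

```python
def knn_distance_bytes(n_samples, knn_k, bytes_per_value=4):
    """Bytes needed to keep only the k nearest-neighbour distances per sample."""
    return n_samples * knn_k * bytes_per_value


def dense_distance_bytes(n_samples, bytes_per_value=4):
    """Lower bound for a full n x n distance matrix (values only, no overhead)."""
    return n_samples * n_samples * bytes_per_value


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / 2**20


n, k = 100_000, 2000
knn_mib = to_mib(knn_distance_bytes(n, k))
dense_gib = to_mib(dense_distance_bytes(n)) / 1024

print(f"knn_k={k}: about {knn_mib:.0f} MiB")
print(f"full matrix: at least {dense_gib:.1f} GiB")
```

Under these assumptions the saving factor is simply N / knn_k, so halving `knn_k` halves the memory needed for distances.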
Key Classes
| Class | Purpose |
|---|---|
| `models.GNNWR` | Spatial regression with neural network geographic weighting |
| `models.GTNNWR` | Spatiotemporal regression with temporal + spatial weighting |
| `datasets.init_dataset` | Data splitting, normalization, distance matrix construction |
| `utils.Visualize` | Built-in folium interactive maps for coefficients and predictions |
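To make "distance matrix construction" concrete, here is a minimal pure-Python sketch of a pairwise distance matrix with Euclidean and Manhattan metrics, plus a helper that keeps only the k nearest entries of a row. It treats coordinates as planar and is an illustration of the idea only; the function names are hypothetical and do not reflect how gnnwr implements it internally.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two coordinate tuples."""
    return math.dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(points, references, metric=euclidean):
    """Distance from every point to every reference point."""
    return [[metric(p, r) for r in references] for p in points]


def k_nearest(row, k):
    """Keep the k smallest distances of a row as (reference_index, distance)."""
    order = sorted(range(len(row)), key=row.__getitem__)[:k]
    return [(j, row[j]) for j in order]


coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
full = distance_matrix(coords, coords)
sparse_row = k_nearest(full[0], k=2)
print(full[0])
print(sparse_row)
```

The full matrix grows with the square of the number of points, which is why keeping only k entries per row matters for large N.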
Essential Operations
init_dataset Parameters
| Parameter | Default | Notes |
|---|---|---|
| `knn_k` | None | KNN sparse distance; None = full matrix |
| `process_fn` | "minmax_scale" | or "standard_scale" |
| `spatial_fun` | BasicDistance | Euclidean; or ManhattanDistance |
| `Reference` | None | "train", "train_val", or custom DataFrame |
| `sample_seed` | 42 | Reproducibility |
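The two normalization options named in the table can be illustrated with their conventional definitions. This is a sketch under the assumption that "minmax_scale" maps each column to [0, 1] and "standard_scale" centers it to zero mean and unit (population) standard deviation; gnnwr's own implementation may differ in detail, and the functions below are stand-ins, not library calls.

```python
from statistics import fmean, pstdev


def minmax_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Center values to zero mean and divide by the population standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


x = [2.0, 4.0, 6.0, 8.0]
x_minmax = minmax_scale(x)
x_standard = standard_scale(x)
print(x_minmax)
print(x_standard)
```

Min-max scaling is sensitive to outliers because a single extreme value sets the range, while standard scaling leaves values unbounded.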
Model Hyperparameters
| Parameter | Recommended | Notes |
|---|---|---|
| `optimizer` | "Adam" | Also: SGD, AdamW, Adagrad, RMSprop |
| `start_lr` | 0.01–0.1 | Critical tuning point |
| `drop_out` | 0.2 | 0.0–0.5 |
| `dense_layers` | None (auto) | Auto: power-of-2 sequence from input_dim to n_coef |
| `early_stop` | 20–50 | Patience, passed to `run`; -1 = disabled |
| `batch_norm` | True | Stabilizes training |
| `use_ols` | True | OLS-initialized output layer |
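The "power-of-2 sequence from input_dim to n_coef" can be read in more than one way. The sketch below shows one plausible interpretation: start at the largest power of two not exceeding the input dimension and halve until the width would drop to the number of coefficients or below. The function name is hypothetical and the library's actual rule may differ.

```python
def power_of_two_layers(input_dim, n_coef):
    """Hidden-layer widths: descending powers of two between input_dim and n_coef."""
    if input_dim < 1 or n_coef < 1:
        raise ValueError("input_dim and n_coef must be positive")
    # Largest power of two that is <= input_dim.
    size = 1 << (input_dim.bit_length() - 1)
    layers = []
    while size > n_coef:
        layers.append(size)
        size //= 2
    return layers


layers = power_of_two_layers(input_dim=500, n_coef=4)
print(layers)
```

With this reading, a wider input (more reference points in the distance vector) yields a deeper network, which is one reason to pass an explicit layer list when training time matters.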
Diagnostics
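As a reminder of what the two always-available diagnostics in this section measure, here is a minimal sketch of R² and RMSE using their textbook definitions. The helper names `r2` and `rmse` are illustrative and are not gnnwr functions; the library may compute these on differently scaled values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(f"R2={r2(y_obs, y_hat):.3f}  RMSE={rmse(y_obs, y_hat):.3f}")
```

Computing the same two numbers for a global OLS fit gives a quick baseline against which the spatially weighted model can be judged.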
```python
diag = model._test_diagnosis
diag.R2()         # always available
diag.RMSE()       # always available
diag.AIC()        # needs lite=False (auto for N<10k)
diag.AICc()       # corrected AIC
diag.F1_Global()  # GNNWR vs OLS significance
diag.F2_Global()  # spatial weight significance
diag.F3_Local()   # per-variable significance → (dict1, dict2)
```
With `lite=True`, `AIC()` is unavailable; it needs `lite=False`, which is selected automatically for N < 10k.
Visualization Patterns
Folium Interactive Maps (built-in)
```python
viz = utils.Visualize(model, lon_lat_columns=["lon", "lat"], zoom=5)
m1 = viz.display_dataset(name="all", y_column="y")
m1.save("dataset_map.html")

# One heatmap per coefficient column in the regression result
for col in [c for c in result.columns if c.startswith("coef_")]:
    m = viz.coefs_heatmap(data_column=col, steps=20)
    m.save(f"map_{col}.html")
```
Matplotlib Static Maps (publication-ready)
```python
import matplotlib.pyplot as plt

# A 2x3 grid holds up to six coefficient maps; unused axes stay empty
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
coef_cols = [c for c in result.columns if c.startswith("coef_")]
for ax, col in zip(axes.flat, coef_cols):
    sc = ax.scatter(
        result["lon"], result["lat"],
        c=result[col], cmap="RdYlBu_r", s=5, alpha=0.8,
        # Clip the color range to the 2nd-98th percentiles to limit outlier influence
        vmin=result[col].quantile(0.02), vmax=result[col].quantile(0.98)
    )
    ax.set_title(col.replace("coef_", "β_"), fontsize=14)
    plt.colorbar(sc, ax=ax, shrink=0.8)
plt.suptitle("Spatially Varying Coefficients (GNNWR)", fontsize=16)
plt.tight_layout()
plt.savefig("coefficients_map.png", dpi=300, bbox_inches="tight")
```
GeoPandas + Contextily (with basemap)
```python
import geopandas as gpd
import contextily as ctx
import matplotlib.pyplot as plt

# Build point geometries in WGS84, then reproject to Web Mercator for the basemap tiles
gdf = gpd.GeoDataFrame(result, geometry=gpd.points_from_xy(result.lon, result.lat), crs="EPSG:4326")
gdf_web = gdf.to_crs(epsg=3857)
fig, ax = plt.subplots(figsize=(12, 10))
gdf_web.plot(column="coef_x1", ax=ax, cmap="RdYlBu_r", legend=True,
             markersize=5, alpha=0.7, legend_kwds={"shrink": 0.6})
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
ax.set_title("β_x1 Spatial Variation")
ax.set_axis_off()
plt.savefig("coef_basemap.png", dpi=300, bbox_inches="tight")
```
When to Use vs Alternatives
| Use Case | Tool | Why |
|---|---|---|
| Spatially varying coefficients (neural net) | GNNWR | Non-linear weighting, scalable, coefficient maps |
| Classical geographically weighted regression | mgwr / GWR4 | Traditional bandwidth-based, well-established theory |
| Spatial interpolation (no covariates) | verde / scikit-gstat | Gridding / kriging without regression |
| Global regression baseline | statsmodels / scikit-learn | No spatial non-stationarity assumed |
| Spatiotemporal varying coefficients | GTNNWR | GNNWR extended with temporal dimension |
| Large-scale spatial regression (N > 100k) | GNNWR + knn_k | Sparse distance matrix, O(n·k²) diagnostics |
| Geostatistical simulation | geostatspy / SGeMS | Stochastic realizations, uncertainty quantification |
Choose GNNWR when: You need spatially varying regression coefficients with neural
network-based geographic weighting, especially for large datasets where classical GWR
is computationally infeasible.
Choose classical GWR when: You need well-established inferential statistics,
bandwidth-based weighting, and simpler model interpretation.
Choose verde/kriging when: You need spatial interpolation without explanatory
variables — pure spatial prediction from observed values.
Common Workflows
Spatial Regression Analysis
- EDA: Check spatial distribution, feature correlations, OLS baseline
- Data split: `init_dataset` with appropriate ratios and `sample_seed=42`
- Train: Start with defaults, tune `start_lr` and `early_stop`
- Diagnose: R², RMSE, F1 (GNNWR vs OLS), F2 (spatial weight significance)
- Visualize: Coefficient maps, residual spatial distribution, pred vs obs
- Interpret: Where do coefficients vary most? Which variables show strongest non-stationarity? (F3_Local)
- Report: Model summary table + coefficient maps + diagnostic statistics
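The role of `early_stop` in the training step can be illustrated with a generic patience loop: training halts once the monitored validation loss has not improved for a given number of consecutive epochs, and a value of -1 disables the check. This is a sketch of the general pattern only; the function name is hypothetical, and gnnwr may monitor a different metric or apply different tie-breaking.

```python
def run_with_patience(val_losses, early_stop):
    """Return (best_epoch, last_epoch_run) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            # early_stop <= 0 (for example -1) means never stop early
            if early_stop > 0 and stale >= early_stop:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
with_patience = run_with_patience(losses, early_stop=3)
without_patience = run_with_patience(losses, early_stop=-1)
print(with_patience, without_patience)
```

A small patience stops noisy runs too early, while a large one wastes epochs after the model has plateaued, which is why the 20–50 range is paired with tuning `start_lr`.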
Common Issues
| Issue | Solution |
|---|---|
| Model degenerates to global regression | Forgot |
| OOM on distance matrix | N > 10k without `knn_k` |
| Loss explodes during training | |
| Overfitting | No |
| Coefficients on wrong scale | Use |
| GTNNWR behaves like GNNWR | Missing |
References
- Diagnostics — DIAGNOSIS methods, F-tests, residual analysis
- Visualization — Detailed visualization patterns and publication figures