geomaster

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

GeoMaster

GeoMaster

Comprehensive geospatial science skill covering GIS, remote sensing, spatial analysis, and ML for Earth observation across 70+ topics with 500+ code examples in 8 programming languages.
这是一项全面的地理空间科学技能,涵盖GIS、遥感、空间分析以及地球观测机器学习,包含70余个主题和8种编程语言的500余个代码示例。

Installation

安装

bash
undefined
bash
undefined

Core Python stack (conda recommended)

Core Python stack (conda recommended)

conda install -c conda-forge gdal rasterio fiona shapely pyproj geopandas
conda install -c conda-forge gdal rasterio fiona shapely pyproj geopandas

Remote sensing & ML

Remote sensing & ML

uv pip install rsgislib torchgeo earthengine-api uv pip install scikit-learn xgboost torch-geometric
uv pip install rsgislib torchgeo earthengine-api uv pip install scikit-learn xgboost torch-geometric

Network & visualization

Network & visualization

uv pip install osmnx networkx folium keplergl uv pip install cartopy contextily mapclassify
uv pip install osmnx networkx folium keplergl uv pip install cartopy contextily mapclassify

Big data & cloud

Big data & cloud

uv pip install xarray rioxarray dask-geopandas uv pip install pystac-client planetary-computer
uv pip install xarray rioxarray dask-geopandas uv pip install pystac-client planetary-computer

Point clouds

Point clouds

uv pip install laspy pylas open3d pdal
uv pip install laspy pylas open3d pdal

Databases

Databases

conda install -c conda-forge postgis spatialite
undefined
conda install -c conda-forge postgis spatialite
undefined

Quick Start

快速开始

NDVI from Sentinel-2

基于Sentinel-2计算NDVI

python
import rasterio
import numpy as np

with rasterio.open('sentinel2.tif') as src:
    red = src.read(4).astype(float)   # B04
    nir = src.read(8).astype(float)   # B08
    ndvi = (nir - red) / (nir + red + 1e-8)
    ndvi = np.nan_to_num(ndvi, nan=0)

    profile = src.profile
    profile.update(count=1, dtype=rasterio.float32)

    with rasterio.open('ndvi.tif', 'w', **profile) as dst:
        dst.write(ndvi.astype(rasterio.float32), 1)
python
import rasterio
import numpy as np

with rasterio.open('sentinel2.tif') as src:
    red = src.read(4).astype(float)   # B04
    nir = src.read(8).astype(float)   # B08
    ndvi = (nir - red) / (nir + red + 1e-8)
    ndvi = np.nan_to_num(ndvi, nan=0)

    profile = src.profile
    profile.update(count=1, dtype=rasterio.float32)

    with rasterio.open('ndvi.tif', 'w', **profile) as dst:
        dst.write(ndvi.astype(rasterio.float32), 1)

Spatial Analysis with GeoPandas

使用GeoPandas进行空间分析

python
import geopandas as gpd
python
import geopandas as gpd

Load and ensure same CRS

Load and ensure same CRS

zones = gpd.read_file('zones.geojson') points = gpd.read_file('points.geojson')
if zones.crs != points.crs: points = points.to_crs(zones.crs)
zones = gpd.read_file('zones.geojson') points = gpd.read_file('points.geojson')
if zones.crs != points.crs: points = points.to_crs(zones.crs)

Spatial join and statistics

Spatial join and statistics

joined = gpd.sjoin(points, zones, how='inner', predicate='within') stats = joined.groupby('zone_id').agg({ 'value': ['count', 'mean', 'std', 'min', 'max'] }).round(2)
undefined
joined = gpd.sjoin(points, zones, how='inner', predicate='within') stats = joined.groupby('zone_id').agg({ 'value': ['count', 'mean', 'std', 'min', 'max'] }).round(2)
undefined

Google Earth Engine Time Series

Google Earth Engine时间序列分析

python
import ee
import pandas as pd

ee.Initialize(project='your-project')
roi = ee.Geometry.Point([-122.4, 37.7]).buffer(10000)

s2 = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
      .filterBounds(roi)
      .filterDate('2020-01-01', '2023-12-31')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))

def add_ndvi(img):
    return img.addBands(img.normalizedDifference(['B8', 'B4']).rename('NDVI'))

s2_ndvi = s2.map(add_ndvi)

def extract_series(image):
    stats = image.reduceRegion(ee.Reducer.mean(), roi.centroid(), scale=10, maxPixels=1e9)
    return ee.Feature(None, {'date': image.date().format('YYYY-MM-dd'), 'ndvi': stats.get('NDVI')})

series = s2_ndvi.map(extract_series).getInfo()
df = pd.DataFrame([f['properties'] for f in series['features']])
df['date'] = pd.to_datetime(df['date'])
python
import ee
import pandas as pd

ee.Initialize(project='your-project')
roi = ee.Geometry.Point([-122.4, 37.7]).buffer(10000)

s2 = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
      .filterBounds(roi)
      .filterDate('2020-01-01', '2023-12-31')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))

def add_ndvi(img):
    return img.addBands(img.normalizedDifference(['B8', 'B4']).rename('NDVI'))

s2_ndvi = s2.map(add_ndvi)

def extract_series(image):
    stats = image.reduceRegion(ee.Reducer.mean(), roi.centroid(), scale=10, maxPixels=1e9)
    return ee.Feature(None, {'date': image.date().format('YYYY-MM-dd'), 'ndvi': stats.get('NDVI')})

series = s2_ndvi.map(extract_series).getInfo()
df = pd.DataFrame([f['properties'] for f in series['features']])
df['date'] = pd.to_datetime(df['date'])

Core Concepts

核心概念

Data Types

数据类型

TypeExamplesLibraries
VectorShapefile, GeoJSON, GeoPackageGeoPandas, Fiona, GDAL
RasterGeoTIFF, NetCDF, COGRasterio, Xarray, GDAL
Point CloudLAS, LAZLaspy, PDAL, Open3D
类型示例
矢量Shapefile、GeoJSON、GeoPackageGeoPandas、Fiona、GDAL
栅格GeoTIFF、NetCDF、COGRasterio、Xarray、GDAL
点云LAS、LAZLaspy、PDAL、Open3D

Coordinate Systems

坐标系

  • EPSG:4326 (WGS 84) - Geographic, lat/lon, use for storage
  • EPSG:3857 (Web Mercator) - Web maps only (don't use for area/distance!)
  • EPSG:326xx/327xx (UTM) - Metric calculations, <1% distortion per zone
  • Use
    gdf.estimate_utm_crs()
    for automatic UTM detection
python
undefined
  • EPSG:4326(WGS 84)- 地理坐标系,经纬度,用于数据存储
  • EPSG:3857(Web墨卡托)- 仅用于Web地图(不要用于面积/距离计算!)
  • EPSG:326xx/327xx(UTM)- 米制计算,每个区域的变形<1%
  • 使用
    gdf.estimate_utm_crs()
    自动检测UTM坐标系
python
undefined

Always check CRS before operations

Always check CRS before operations

assert gdf1.crs == gdf2.crs, "CRS mismatch!"
assert gdf1.crs == gdf2.crs, "CRS mismatch!"

For area/distance calculations, use projected CRS

For area/distance calculations, use projected CRS

gdf_metric = gdf.to_crs(gdf.estimate_utm_crs()) area_sqm = gdf_metric.geometry.area
undefined
gdf_metric = gdf.to_crs(gdf.estimate_utm_crs()) area_sqm = gdf_metric.geometry.area
undefined

OGC Standards

OGC标准

  • WMS: Web Map Service - raster maps
  • WFS: Web Feature Service - vector data
  • WCS: Web Coverage Service - raster coverage
  • STAC: Spatiotemporal Asset Catalog - modern metadata
  • WMS: Web地图服务 - 栅格地图
  • WFS: Web要素服务 - 矢量数据
  • WCS: Web覆盖服务 - 栅格覆盖数据
  • STAC: 时空资产目录 - 现代元数据标准

Common Operations

常见操作

Spectral Indices

光谱指数

python
def calculate_indices(image_path):
    """NDVI, EVI, SAVI, NDWI from Sentinel-2."""
    with rasterio.open(image_path) as src:
        B02, B03, B04, B08, B11 = [src.read(i).astype(float) for i in [1,2,3,4,5]]

    ndvi = (B08 - B04) / (B08 + B04 + 1e-8)
    evi = 2.5 * (B08 - B04) / (B08 + 6*B04 - 7.5*B02 + 1)
    savi = ((B08 - B04) / (B08 + B04 + 0.5)) * 1.5
    ndwi = (B03 - B08) / (B03 + B08 + 1e-8)

    return {'NDVI': ndvi, 'EVI': evi, 'SAVI': savi, 'NDWI': ndwi}
python
def calculate_indices(image_path):
    """NDVI, EVI, SAVI, NDWI from Sentinel-2."""
    with rasterio.open(image_path) as src:
        B02, B03, B04, B08, B11 = [src.read(i).astype(float) for i in [1,2,3,4,5]]

    ndvi = (B08 - B04) / (B08 + B04 + 1e-8)
    evi = 2.5 * (B08 - B04) / (B08 + 6*B04 - 7.5*B02 + 1)
    savi = ((B08 - B04) / (B08 + B04 + 0.5)) * 1.5
    ndwi = (B03 - B08) / (B03 + B08 + 1e-8)

    return {'NDVI': ndvi, 'EVI': evi, 'SAVI': savi, 'NDWI': ndwi}

Vector Operations

矢量操作

python
undefined
python
undefined

Buffer (use projected CRS!)

Buffer (use projected CRS!)

gdf_proj = gdf.to_crs(gdf.estimate_utm_crs()) gdf['buffer_1km'] = gdf_proj.geometry.buffer(1000)
gdf_proj = gdf.to_crs(gdf.estimate_utm_crs()) gdf['buffer_1km'] = gdf_proj.geometry.buffer(1000)

Spatial relationships

Spatial relationships

intersects = gdf[gdf.geometry.intersects(other_geometry)] contains = gdf[gdf.geometry.contains(point_geometry)]
intersects = gdf[gdf.geometry.intersects(other_geometry)] contains = gdf[gdf.geometry.contains(point_geometry)]

Geometric operations

Geometric operations

gdf['centroid'] = gdf.geometry.centroid gdf['simplified'] = gdf.geometry.simplify(tolerance=0.001)
gdf['centroid'] = gdf.geometry.centroid gdf['simplified'] = gdf.geometry.simplify(tolerance=0.001)

Overlay operations

Overlay operations

intersection = gpd.overlay(gdf1, gdf2, how='intersection') union = gpd.overlay(gdf1, gdf2, how='union')
undefined
intersection = gpd.overlay(gdf1, gdf2, how='intersection') union = gpd.overlay(gdf1, gdf2, how='union')
undefined

Terrain Analysis

地形分析

python
def terrain_metrics(dem_path):
    """Calculate slope, aspect, hillshade from DEM."""
    with rasterio.open(dem_path) as src:
        dem = src.read(1)

    dy, dx = np.gradient(dem)
    slope = np.arctan(np.sqrt(dx**2 + dy**2)) * 180 / np.pi
    aspect = (90 - np.arctan2(-dy, dx) * 180 / np.pi) % 360

    # Hillshade
    az_rad, alt_rad = np.radians(315), np.radians(45)
    hillshade = (np.sin(alt_rad) * np.sin(np.radians(slope)) +
                 np.cos(alt_rad) * np.cos(np.radians(slope)) *
                 np.cos(np.radians(aspect) - az_rad))

    return slope, aspect, hillshade
python
def terrain_metrics(dem_path):
    """Calculate slope, aspect, hillshade from DEM."""
    with rasterio.open(dem_path) as src:
        dem = src.read(1)

    dy, dx = np.gradient(dem)
    slope = np.arctan(np.sqrt(dx**2 + dy**2)) * 180 / np.pi
    aspect = (90 - np.arctan2(-dy, dx) * 180 / np.pi) % 360

    # Hillshade
    az_rad, alt_rad = np.radians(315), np.radians(45)
    hillshade = (np.sin(alt_rad) * np.sin(np.radians(slope)) +
                 np.cos(alt_rad) * np.cos(np.radians(slope)) *
                 np.cos(np.radians(aspect) - az_rad))

    return slope, aspect, hillshade

Network Analysis

网络分析

python
import osmnx as ox
import networkx as nx
python
import osmnx as ox
import networkx as nx

Download and analyze street network

Download and analyze street network

G = ox.graph_from_place('San Francisco, CA', network_type='drive') G = ox.add_edge_speeds(G).add_edge_travel_times(G)
G = ox.graph_from_place('San Francisco, CA', network_type='drive') G = ox.add_edge_speeds(G).add_edge_travel_times(G)

Shortest path

Shortest path

orig = ox.distance.nearest_nodes(G, -122.4, 37.7) dest = ox.distance.nearest_nodes(G, -122.3, 37.8) route = nx.shortest_path(G, orig, dest, weight='travel_time')
undefined
orig = ox.distance.nearest_nodes(G, -122.4, 37.7) dest = ox.distance.nearest_nodes(G, -122.3, 37.8) route = nx.shortest_path(G, orig, dest, weight='travel_time')
undefined

Image Classification

影像分类

python
from sklearn.ensemble import RandomForestClassifier
import rasterio
from rasterio.features import rasterize

def classify_imagery(raster_path, training_gdf, output_path):
    """Train RF and classify imagery."""
    with rasterio.open(raster_path) as src:
        image = src.read()
        profile = src.profile
        transform = src.transform

    # Extract training data
    X_train, y_train = [], []
    for _, row in training_gdf.iterrows():
        mask = rasterize([(row.geometry, 1)],
                        out_shape=(profile['height'], profile['width']),
                        transform=transform, fill=0, dtype=np.uint8)
        pixels = image[:, mask > 0].T
        X_train.extend(pixels)
        y_train.extend([row['class_id']] * len(pixels))

    # Train and predict
    rf = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1)
    rf.fit(X_train, y_train)

    prediction = rf.predict(image.reshape(image.shape[0], -1).T)
    prediction = prediction.reshape(profile['height'], profile['width'])

    profile.update(dtype=rasterio.uint8, count=1)
    with rasterio.open(output_path, 'w', **profile) as dst:
        dst.write(prediction.astype(rasterio.uint8), 1)

    return rf
python
from sklearn.ensemble import RandomForestClassifier
import rasterio
from rasterio.features import rasterize

def classify_imagery(raster_path, training_gdf, output_path):
    """Train RF and classify imagery."""
    with rasterio.open(raster_path) as src:
        image = src.read()
        profile = src.profile
        transform = src.transform

    # Extract training data
    X_train, y_train = [], []
    for _, row in training_gdf.iterrows():
        mask = rasterize([(row.geometry, 1)],
                        out_shape=(profile['height'], profile['width']),
                        transform=transform, fill=0, dtype=np.uint8)
        pixels = image[:, mask > 0].T
        X_train.extend(pixels)
        y_train.extend([row['class_id']] * len(pixels))

    # Train and predict
    rf = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1)
    rf.fit(X_train, y_train)

    prediction = rf.predict(image.reshape(image.shape[0], -1).T)
    prediction = prediction.reshape(profile['height'], profile['width'])

    profile.update(dtype=rasterio.uint8, count=1)
    with rasterio.open(output_path, 'w', **profile) as dst:
        dst.write(prediction.astype(rasterio.uint8), 1)

    return rf

Modern Cloud-Native Workflows

现代云原生工作流

STAC + Planetary Computer

STAC + Planetary Computer

python
import pystac_client
import planetary_computer
import odc.stac
python
import pystac_client
import planetary_computer
import odc.stac

Search Sentinel-2 via STAC

Search Sentinel-2 via STAC

catalog = pystac_client.Client.open( "https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace, )
search = catalog.search( collections=["sentinel-2-l2a"], bbox=[-122.5, 37.7, -122.3, 37.9], datetime="2023-01-01/2023-12-31", query={"eo:cloud_cover": {"lt": 20}}, )
catalog = pystac_client.Client.open( "https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace, )
search = catalog.search( collections=["sentinel-2-l2a"], bbox=[-122.5, 37.7, -122.3, 37.9], datetime="2023-01-01/2023-12-31", query={"eo:cloud_cover": {"lt": 20}}, )

Load as xarray (cloud-native!)

Load as xarray (cloud-native!)

data = odc.stac.load( list(search.get_items())[:5], bands=["B02", "B03", "B04", "B08"], crs="EPSG:32610", resolution=10, )
data = odc.stac.load( list(search.get_items())[:5], bands=["B02", "B03", "B04", "B08"], crs="EPSG:32610", resolution=10, )

Calculate NDVI on xarray

Calculate NDVI on xarray

ndvi = (data.B08 - data.B04) / (data.B08 + data.B04)
undefined
ndvi = (data.B08 - data.B04) / (data.B08 + data.B04)
undefined

Cloud-Optimized GeoTIFF (COG)

云优化GeoTIFF(COG)

python
import rasterio
from rasterio.session import AWSSession
python
import rasterio
from rasterio.session import AWSSession

Read COG directly from cloud (partial reads)

Read COG directly from cloud (partial reads)

session = AWSSession(aws_access_key_id=..., aws_secret_access_key=...) with rasterio.open('s3://bucket/path.tif', session=session) as src: # Read only window of interest window = ((1000, 2000), (1000, 2000)) subset = src.read(1, window=window)
session = AWSSession(aws_access_key_id=..., aws_secret_access_key=...) with rasterio.open('s3://bucket/path.tif', session=session) as src: # Read only window of interest window = ((1000, 2000), (1000, 2000)) subset = src.read(1, window=window)

Write COG

Write COG

with rasterio.open('output.tif', 'w', **profile, tiled=True, blockxsize=256, blockysize=256, compress='DEFLATE', predictor=2) as dst: dst.write(data)
with rasterio.open('output.tif', 'w', **profile, tiled=True, blockxsize=256, blockysize=256, compress='DEFLATE', predictor=2) as dst: dst.write(data)

Validate COG

Validate COG

from rio_cogeo.cogeo import cog_validate cog_validate('output.tif')
undefined
from rio_cogeo.cogeo import cog_validate cog_validate('output.tif')
undefined

Performance Tips

性能优化技巧

python
undefined
python
undefined

1. Spatial indexing (10-100x faster queries)

1. Spatial indexing (10-100x faster queries)

gdf.sindex # Auto-created by GeoPandas
gdf.sindex # Auto-created by GeoPandas

2. Chunk large rasters

2. Chunk large rasters

with rasterio.open('large.tif') as src: for i, window in src.block_windows(1): block = src.read(1, window=window)
with rasterio.open('large.tif') as src: for i, window in src.block_windows(1): block = src.read(1, window=window)

3. Dask for big data

3. Dask for big data

import dask.array as da dask_array = da.from_rasterio('large.tif', chunks=(1, 1024, 1024))
import dask.array as da dask_array = da.from_rasterio('large.tif', chunks=(1, 1024, 1024))

4. Use Arrow for I/O

4. Use Arrow for I/O

gdf.to_file('output.gpkg', use_arrow=True)
gdf.to_file('output.gpkg', use_arrow=True)

5. GDAL caching

5. GDAL caching

from osgeo import gdal gdal.SetCacheMax(2**30) # 1GB cache
from osgeo import gdal gdal.SetCacheMax(2**30) # 1GB cache

6. Parallel processing

6. Parallel processing

rf = RandomForestClassifier(n_jobs=-1) # All cores
undefined
rf = RandomForestClassifier(n_jobs=-1) # All cores
undefined

Best Practices

最佳实践

  1. Always check CRS before spatial operations
  2. Use projected CRS for area/distance calculations
  3. Validate geometries:
    gdf = gdf[gdf.is_valid]
  4. Handle missing data:
    gdf['geometry'] = gdf['geometry'].fillna(None)
  5. Use efficient formats: GeoPackage > Shapefile, Parquet for large data
  6. Apply cloud masking to optical imagery
  7. Preserve lineage for reproducible research
  8. Use appropriate resolution for your analysis scale
  1. 空间操作前务必检查CRS
  2. 使用投影坐标系进行面积/距离计算
  3. 验证几何有效性
    gdf = gdf[gdf.is_valid]
  4. 处理缺失数据
    gdf['geometry'] = gdf['geometry'].fillna(None)
  5. 使用高效格式:GeoPackage优于Shapefile,大数据使用Parquet
  6. 对光学影像应用云掩膜
  7. 保留数据溯源信息以实现可复现研究
  8. 根据分析尺度选择合适的分辨率

Detailed Documentation

详细文档

  • Coordinate Systems - CRS fundamentals, UTM, transformations
  • Core Libraries - GDAL, Rasterio, GeoPandas, Shapely
  • Remote Sensing - Satellite missions, spectral indices, SAR
  • Machine Learning - Deep learning, CNNs, GNNs for RS
  • GIS Software - QGIS, ArcGIS, GRASS integration
  • Scientific Domains - Marine, hydrology, agriculture, forestry
  • Advanced GIS - 3D GIS, spatiotemporal, topology
  • Big Data - Distributed processing, GPU acceleration
  • Industry Applications - Urban planning, disaster management
  • Programming Languages - Python, R, Julia, JS, C++, Java, Go, Rust
  • Data Sources - Satellite catalogs, APIs
  • Troubleshooting - Common issues, debugging, error reference
  • Code Examples - 500+ examples

GeoMaster covers everything from basic GIS operations to advanced remote sensing and machine learning.
  • 坐标系 - CRS基础、UTM、坐标转换
  • 核心库 - GDAL、Rasterio、GeoPandas、Shapely
  • 遥感 - 卫星任务、光谱指数、SAR
  • 机器学习 - 深度学习、CNN、GNN在遥感中的应用
  • GIS软件 - QGIS、ArcGIS、GRASS集成
  • 科学领域 - 海洋、水文、农业、林业
  • 高级GIS - 3D GIS、时空分析、拓扑
  • 大数据 - 分布式处理、GPU加速
  • 行业应用 - 城市规划、灾害管理
  • 编程语言 - Python、R、Julia、JS、C++、Java、Go、Rust
  • 数据源 - 卫星目录、API
  • 故障排除 - 常见问题、调试、错误参考
  • 代码示例 - 500+示例

GeoMaster涵盖了从基础GIS操作到高级遥感和机器学习的所有内容。