geopandas

<!-- Adapted from: claude-scientific-skills/scientific-skills/geopandas -->

GeoPandas Geospatial Data Analysis

GeoPandas is a Python library for geospatial vector data. It extends pandas with geometry-aware columns and spatial operations.

When to Use

  • Working with geographic/spatial data (shapefiles, GeoJSON, GeoPackage)
  • Spatial analysis (buffer, intersection, spatial joins)
  • Coordinate transformations and projections
  • Creating choropleth maps
  • Processing geographic boundaries, points, lines, polygons

Quick Start

```python
import geopandas as gpd

# Read spatial data
gdf = gpd.read_file("data.geojson")

# Basic exploration
print(gdf.head())
print(gdf.crs)  # Coordinate Reference System
print(gdf.geometry.geom_type)

# Simple plot
gdf.plot()

# Reproject to different CRS
gdf_projected = gdf.to_crs("EPSG:3857")

# Calculate area (use projected CRS)
gdf_projected['area'] = gdf_projected.geometry.area

# Save to file
gdf.to_file("output.gpkg")
```
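
The quick start assumes a file named `data.geojson` already exists. For experimenting without any file, here is a minimal sketch that builds a GeoDataFrame in memory and reprojects it. The city labels and coordinates are illustrative values, not part of the original guide.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data: three points given as (longitude, latitude).
cities = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[Point(13.40, 52.52), Point(2.35, 48.86), Point(-0.13, 51.51)],
    crs="EPSG:4326",
)

# to_crs returns a new frame; the original keeps its lon/lat coordinates.
cities_3857 = cities.to_crs("EPSG:3857")

print(cities.crs.to_epsg())
print(cities_3857.crs.to_epsg())
print(cities_3857.geometry.geom_type.unique())
```

Because `to_crs` returns a new object, `cities` can still be used wherever longitude and latitude are expected.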

Reading/Writing Data

```python
# Read various formats
gdf = gpd.read_file("data.shp")      # Shapefile
gdf = gpd.read_file("data.geojson")  # GeoJSON
gdf = gpd.read_file("data.gpkg")     # GeoPackage

# Read with spatial filter (faster for large files)
# xmin, ymin, xmax, ymax are placeholders in the CRS of the file
gdf = gpd.read_file("data.gpkg", bbox=(xmin, ymin, xmax, ymax))

# Write to file
gdf.to_file("output.gpkg")
gdf.to_file("output.geojson", driver="GeoJSON")

# PostGIS database
from sqlalchemy import create_engine
engine = create_engine("postgresql://user:pass@localhost/db")
# my_table is a placeholder name (TABLE itself is a reserved word in SQL)
gdf = gpd.read_postgis("SELECT * FROM my_table", con=engine, geom_col='geom')
```

Coordinate Reference Systems

```python
# Check CRS
print(gdf.crs)

# Set CRS (when metadata missing); this labels the data, it does not move it
gdf = gdf.set_crs("EPSG:4326")

# Reproject (transforms coordinates)
gdf_projected = gdf.to_crs("EPSG:3857")   # Web Mercator
gdf_projected = gdf.to_crs("EPSG:32633")  # UTM zone 33N

# Common CRS codes:
# EPSG:4326  - WGS84 (lat/lon)
# EPSG:3857  - Web Mercator
# EPSG:326XX - UTM zones, northern hemisphere (327XX for southern)
```
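
The difference between `set_crs` and `to_crs` is easy to mix up, so here is a small sketch under stated assumptions: a single hypothetical point at longitude 15, latitude 52. It also uses `estimate_utm_crs()`, which GeoPandas provides for suggesting a UTM zone from the data extent, instead of hard-coding an EPSG code.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point at longitude 15, latitude 52, with no CRS attached.
gdf = gpd.GeoDataFrame({"name": ["site"]}, geometry=[Point(15.0, 52.0)])

# set_crs only labels the data; the coordinates stay the same.
gdf = gdf.set_crs("EPSG:4326")

# estimate_utm_crs suggests a UTM zone that covers the data extent.
utm_crs = gdf.estimate_utm_crs()

# to_crs transforms the coordinates into that zone (units become metres).
gdf_utm = gdf.to_crs(utm_crs)

print(utm_crs.to_epsg())
print(gdf_utm.geometry.iloc[0])
```

Longitude 15 is the central meridian of UTM zone 33, so the projected easting lands at the zone's false easting of 500000 m.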

Geometric Operations

```python
# Buffer (expand/shrink geometries)
buffered = gdf.geometry.buffer(100)  # 100 units buffer

# Centroid
centroids = gdf.geometry.centroid

# Simplify (reduce vertices)
simplified = gdf.geometry.simplify(tolerance=5, preserve_topology=True)

# Convex hull
hull = gdf.geometry.convex_hull

# Boundary
boundary = gdf.geometry.boundary

# Area and length (use projected CRS!)
gdf['area'] = gdf.geometry.area
gdf['length'] = gdf.geometry.length
```
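
The buffer distance and the resulting areas are expressed in the units of the CRS. The sketch below makes that concrete with hypothetical geometries in UTM zone 33N, where one unit is one metre; the coordinates are invented for the example.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical data in a projected CRS (UTM zone 33N), so units are metres.
gdf = gpd.GeoDataFrame(
    {"name": ["point", "square"]},
    geometry=[
        Point(500000, 5000000),
        Polygon([(0, 0), (200, 0), (200, 200), (0, 200)]),
    ],
    crs="EPSG:32633",
)

buffered = gdf.geometry.buffer(100)   # grow every geometry by 100 m
centroids = gdf.geometry.centroid     # one point per geometry
hulls = gdf.geometry.convex_hull      # smallest convex shape around each

# A buffered point approximates a circle, so its area is close to pi * r**2.
print(round(buffered.iloc[0].area))
print(centroids.iloc[1])
print(gdf.geometry.iloc[1].area, gdf.geometry.iloc[1].length)
```

The buffered point is a polygon approximation of a circle, which is why its area comes out slightly below pi times 100 squared.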

Spatial Analysis

Spatial Joins

```python
# Join based on spatial relationship
joined = gpd.sjoin(gdf1, gdf2, predicate='intersects')
joined = gpd.sjoin(gdf1, gdf2, predicate='within')
joined = gpd.sjoin(gdf1, gdf2, predicate='contains')

# Nearest neighbor join
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)
```
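
As a rough illustration of how `sjoin` and `sjoin_nearest` differ, the following sketch joins three points to two square zones. All names and coordinates are hypothetical, and `distance_col` is used to expose the distance that the nearest join computed.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: two square zones and three points, in a projected CRS.
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10)],
    crs="EPSG:3857",
)
points = gpd.GeoDataFrame(
    {"label": ["p1", "p2", "p3"]},
    geometry=[Point(5, 5), Point(25, 5), Point(14, 5)],
    crs="EPSG:3857",
)

# Inner join by default: p3 lies in neither zone, so it is dropped.
inside = gpd.sjoin(points, zones, predicate="within")

# Nearest join keeps p3 and attaches the closest zone plus the distance.
nearest = gpd.sjoin_nearest(points, zones, distance_col="dist")

print(inside[["label", "zone"]])
print(nearest[["label", "zone", "dist"]])
```

Passing `how="left"` to `sjoin` would keep `p3` with missing zone attributes instead of dropping it.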

Overlay Operations

```python
# Intersection
intersection = gpd.overlay(gdf1, gdf2, how='intersection')

# Union
union = gpd.overlay(gdf1, gdf2, how='union')

# Difference
difference = gpd.overlay(gdf1, gdf2, how='difference')
```
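
The three overlay modes are easiest to compare by area. This sketch uses two hypothetical 2x2 squares that overlap in a 1x1 square; the column names `a` and `b` are placeholders.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: two 2x2 squares that overlap in a 1x1 square.
left = gpd.GeoDataFrame({"a": [1]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:3857")
right = gpd.GeoDataFrame({"b": [2]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:3857")

intersection = gpd.overlay(left, right, how="intersection")
union = gpd.overlay(left, right, how="union")
difference = gpd.overlay(left, right, how="difference")

print(intersection.area.sum())  # shared part
print(union.area.sum())         # everything covered by either layer
print(difference.area.sum())    # part of `left` not covered by `right`
```

Unlike `sjoin`, which only attaches attributes, `overlay` cuts the geometries themselves, so the union result contains separate rows for the shared part and for each layer's exclusive part.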

Dissolve (Aggregate by Attribute)

```python
# Merge geometries by attribute
dissolved = gdf.dissolve(by='region', aggfunc='sum')
```
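
A small worked case may help show what `dissolve` does to both geometry and attributes. The `region` and `population` values below are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: three adjacent unit squares in two regions.
gdf = gpd.GeoDataFrame(
    {"region": ["north", "north", "south"], "population": [100, 250, 75]},
    geometry=[box(0, 1, 1, 2), box(1, 1, 2, 2), box(0, 0, 1, 1)],
    crs="EPSG:3857",
)

# One row per region: geometries are unioned, numeric columns are summed.
dissolved = gdf.dissolve(by="region", aggfunc="sum")

print(dissolved[["population"]])
print(dissolved.geometry.area)
```

The result is indexed by `region`, so values can be looked up with `dissolved.loc["north"]`.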

Clip

```python
# Clip data to boundary
clipped = gpd.clip(gdf, boundary_gdf)
```
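
Clipping cuts geometries at the mask boundary rather than just selecting features. The sketch below shows this with one hypothetical line that extends past a square mask on both sides; the `road` column is a placeholder.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Hypothetical data: one long line and a square boundary crossing it.
lines = gpd.GeoDataFrame(
    {"road": ["main"]},
    geometry=[LineString([(0, 5), (20, 5)])],
    crs="EPSG:3857",
)
boundary_gdf = gpd.GeoDataFrame(geometry=[box(5, 0, 15, 10)], crs="EPSG:3857")

# Geometries are cut at the boundary; attributes are kept.
clipped = gpd.clip(lines, boundary_gdf)

print(clipped.geometry.length.iloc[0])
print(clipped["road"].iloc[0])
```

The 20-unit line is trimmed to the 10 units that fall inside the mask.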

Visualization

```python
import matplotlib.pyplot as plt

# Basic plot
gdf.plot()

# Choropleth map
gdf.plot(column='population', cmap='YlOrRd', legend=True)

# Multi-layer map
fig, ax = plt.subplots(figsize=(10, 10))
gdf1.plot(ax=ax, color='blue', alpha=0.5)
gdf2.plot(ax=ax, color='red', alpha=0.5)
plt.savefig('map.png', dpi=300, bbox_inches='tight')

# Interactive map (requires folium)
gdf.explore(column='population', legend=True)
```
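
For scripts that run without a display, such as on a server, one possible approach is to select matplotlib's non-interactive `Agg` backend before plotting. The sketch below layers a choropleth and an outline on one axis; the `population` values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: three squares with a made-up population column.
gdf = gpd.GeoDataFrame(
    {"population": [100, 400, 900]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:3857",
)

fig, ax = plt.subplots(figsize=(6, 3))
gdf.plot(column="population", cmap="YlOrRd", legend=True, ax=ax)
gdf.boundary.plot(ax=ax, color="black", linewidth=0.5)  # outline layer on top
ax.set_axis_off()
```

Calling `fig.savefig(...)` afterwards writes the figure to disk; it is left out here so the sketch does not create files.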

Common Workflows

Spatial Join and Aggregate

```python
# Join points to polygons
points_in_polygons = gpd.sjoin(points_gdf, polygons_gdf, predicate='within')

# Aggregate by polygon
# Named aggregation: the dict form {'count': 'size'} would look for an
# existing column called 'count' and fail if there is none
aggregated = points_in_polygons.groupby('index_right').agg(
    value=('value', 'sum'),
    count=('value', 'size'),
)

# Merge back to polygons (polygons without points are dropped)
result = polygons_gdf.merge(aggregated, left_index=True, right_index=True)
```
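
The workflow above can be sketched end to end with in-memory data. Everything here is hypothetical: three polygons, four points with a `value` column, and the output column names `value_sum` and `point_count`. It uses pandas named aggregation and a left join so that polygons without points are kept.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: three polygons and four points carrying a `value` column.
polygons_gdf = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10), box(20, 0, 30, 10)],
    crs="EPSG:3857",
)
points_gdf = gpd.GeoDataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    geometry=[Point(1, 1), Point(2, 2), Point(15, 5), Point(50, 50)],
    crs="EPSG:3857",
)

joined = gpd.sjoin(points_gdf, polygons_gdf, predicate="within")

# Named aggregation: total of `value` and number of points per polygon.
stats = joined.groupby("index_right").agg(
    value_sum=("value", "sum"),
    point_count=("value", "size"),
)

# Left join keeps polygons without points; fill their statistics with 0.
result = polygons_gdf.join(stats).fillna({"value_sum": 0, "point_count": 0})
print(result[["name", "value_sum", "point_count"]])
```

Polygon `C` contains no points and the point at (50, 50) falls in no polygon, which shows how both kinds of unmatched rows are handled.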

Buffer Analysis

```python
# Create buffers around points
gdf_projected = points_gdf.to_crs("EPSG:3857")  # Project first!
gdf_projected['buffer'] = gdf_projected.geometry.buffer(1000)  # 1000 CRS units (nominally 1 km)
gdf_projected = gdf_projected.set_geometry('buffer')

# Find features within buffer (both layers must share the same CRS)
other_projected = other_gdf.to_crs(gdf_projected.crs)
within_buffer = gpd.sjoin(other_projected, gdf_projected, predicate='within')
```
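
Here is a self-contained sketch of the same buffer analysis, assuming hypothetical stations and shops that are already in a metric CRS (UTM zone 33N). The layer and column names are invented for the example.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data in a metric CRS (UTM zone 33N): two stations, three shops.
stations = gpd.GeoDataFrame(
    {"station": ["s1", "s2"]},
    geometry=[Point(500000, 5000000), Point(510000, 5000000)],
    crs="EPSG:32633",
)
shops = gpd.GeoDataFrame(
    {"shop": ["near_s1", "near_s2", "far"]},
    geometry=[
        Point(500300, 5000400),
        Point(510900, 5000000),
        Point(505000, 5000000),
    ],
    crs="EPSG:32633",
)

# Replace each station point with its 1 km buffer polygon.
zones = stations.copy()
zones["geometry"] = zones.geometry.buffer(1000)

# Both layers must share a CRS before joining.
shops = shops.to_crs(zones.crs)
within_buffer = gpd.sjoin(shops, zones, predicate="within")
print(within_buffer[["shop", "station"]])
```

A UTM zone is used instead of Web Mercator because Web Mercator distances are stretched away from the equator, so a 1000-unit buffer there covers less than 1 km on the ground at higher latitudes.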

Best Practices

  1. Always check CRS before spatial operations
  2. Use projected CRS for area/distance calculations
  3. Match CRS before spatial joins or overlays
  4. Validate geometries with `.is_valid` before operations
  5. Prefer GeoPackage over Shapefile (single file, no 10-character limit on field names)
  6. Use `.copy()` when modifying geometry to avoid side effects
  7. Filter during read with `bbox` for large files
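
Items 4 and 6 can be sketched together: check validity first, then repair on a copy. This example assumes a GeoPandas version that provides `GeoSeries.make_valid()` (0.12 or later); the self-intersecting "bowtie" polygon is a made-up test case.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical data: a valid square and a self-intersecting "bowtie".
gdf = gpd.GeoDataFrame(
    {"name": ["square", "bowtie"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(0, 0), (2, 2), (2, 0), (0, 2)]),
    ],
    crs="EPSG:3857",
)

print(gdf.is_valid.tolist())

# Work on a copy so the original frame keeps its raw geometries.
repaired = gdf.copy()
repaired["geometry"] = repaired.geometry.make_valid()

print(repaired.is_valid.tolist())
```

Repairing may change the geometry type (here the bowtie becomes a multi-part geometry), so it is worth checking `geom_type` afterwards.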

vs Alternatives

| Tool | Best For |
|------|----------|
| GeoPandas | Vector data analysis, spatial operations |
| Rasterio | Raster data (satellite imagery, DEMs) |
| Shapely | Low-level geometry operations |
| Folium | Interactive web maps |

Resources
