geopandas
<!-- Adapted from: claude-scientific-skills/scientific-skills/geopandas -->
GeoPandas Geospatial Data Analysis
GeoPandas is a Python library for geospatial vector data. It extends pandas with geometry columns and spatial operations.
When to Use
- Working with geographic/spatial data (shapefiles, GeoJSON, GeoPackage)
- Spatial analysis (buffer, intersection, spatial joins)
- Coordinate transformations and projections
- Creating choropleth maps
- Processing geographic boundaries, points, lines, polygons
Quick Start
```python
import geopandas as gpd

# Read spatial data
gdf = gpd.read_file("data.geojson")

# Basic exploration
print(gdf.head())
print(gdf.crs)  # Coordinate Reference System
print(gdf.geometry.geom_type)

# Simple plot
gdf.plot()

# Reproject to a different CRS
gdf_projected = gdf.to_crs("EPSG:3857")

# Calculate area (use a projected CRS). Web Mercator distorts area away from
# the equator, so prefer an equal-area or local UTM CRS when accuracy matters.
gdf_projected['area'] = gdf_projected.geometry.area

# Save to file
gdf.to_file("output.gpkg")
```
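To try the steps above without a data file, a GeoDataFrame can also be built in memory. The following is a minimal sketch. The city names and coordinates are illustrative values and are not part of the original example.

```python
import geopandas as gpd
import pandas as pd

# Illustrative data: three cities with longitude/latitude in degrees
df = pd.DataFrame({
    "city": ["Berlin", "Paris", "Rome"],
    "lon": [13.40, 2.35, 12.50],
    "lat": [52.52, 48.86, 41.90],
})

# Build point geometries and declare the CRS the coordinates are in
cities = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)

print(cities.crs)
print(cities.geometry.geom_type.unique())
```

Note that `points_from_xy` takes x (longitude) first and y (latitude) second.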
Reading/Writing Data
```python
import geopandas as gpd

# Read various formats
gdf = gpd.read_file("data.shp")      # Shapefile
gdf = gpd.read_file("data.geojson")  # GeoJSON
gdf = gpd.read_file("data.gpkg")     # GeoPackage

# Read with spatial filter (faster for large files).
# xmin, ymin, xmax, ymax are placeholder bounds that you define yourself.
gdf = gpd.read_file("data.gpkg", bbox=(xmin, ymin, xmax, ymax))

# Write to file
gdf.to_file("output.gpkg")
gdf.to_file("output.geojson", driver="GeoJSON")

# PostGIS database
from sqlalchemy import create_engine
engine = create_engine("postgresql://user:pass@localhost/db")
# "my_table" is a placeholder; the bare word "table" is reserved in SQL
gdf = gpd.read_postgis("SELECT * FROM my_table", con=engine, geom_col='geom')
```
Coordinate Reference Systems
```python
# Check CRS
print(gdf.crs)

# Set CRS (when metadata is missing; does not change the coordinates)
gdf = gdf.set_crs("EPSG:4326")

# Reproject (transforms coordinates)
gdf_projected = gdf.to_crs("EPSG:3857")   # Web Mercator
gdf_projected = gdf.to_crs("EPSG:32633")  # UTM zone 33N

# Common CRS codes:
# EPSG:4326 - WGS84 (lat/lon)
# EPSG:3857 - Web Mercator
# EPSG:326XX - UTM zones, northern hemisphere (EPSG:327XX for southern)
```
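The difference between `set_crs` and `to_crs` is easy to mix up, so here is a small sketch that contrasts them. The single point is an illustrative value chosen because it sits on the central meridian of UTM zone 33N.

```python
import geopandas as gpd
from shapely.geometry import Point

# A point on the equator at 15°E, with no CRS metadata attached
raw = gpd.GeoSeries([Point(15.0, 0.0)])

# set_crs only labels the data; the coordinates are unchanged
labelled = raw.set_crs("EPSG:4326")

# to_crs recomputes the coordinates in the target CRS (UTM zone 33N, metres)
utm = labelled.to_crs("EPSG:32633")

print(labelled.iloc[0])
print(utm.iloc[0])
```

Because 15°E is the central meridian of UTM zone 33N, the reprojected point should land at an easting of about 500,000 m and a northing of about 0 m.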
Geometric Operations
```python
# Buffer (expand/shrink geometries)
buffered = gdf.geometry.buffer(100)  # 100 units buffer

# Centroid
centroids = gdf.geometry.centroid

# Simplify (reduce vertices)
simplified = gdf.geometry.simplify(tolerance=5, preserve_topology=True)

# Convex hull
hull = gdf.geometry.convex_hull

# Boundary
boundary = gdf.geometry.boundary

# Area and length (use projected CRS!)
gdf['area'] = gdf.geometry.area
gdf['length'] = gdf.geometry.length
```
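A few of these operations can be checked by hand on simple shapes. The sketch below uses a made-up 10 x 10 square and a point. The coordinates are illustrative and are only meant to make the results easy to verify.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# A 10 m x 10 m square and a point, in a projected CRS (units: metres)
shapes = gpd.GeoSeries(
    [Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]), Point(20, 20)],
    crs="EPSG:32633",
)

square_area = shapes.area.iloc[0]          # side * side
square_perimeter = shapes.length.iloc[0]   # 4 * side
square_centroid = shapes.centroid.iloc[0]  # centre of the square

# Buffering a point yields a polygon that approximates a circle
circle = shapes.buffer(1).iloc[1]

print(square_area, square_perimeter, square_centroid, circle.area)
```

The buffered point is a many-sided polygon rather than a true circle, so its area comes out slightly below pi.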
Spatial Analysis
Spatial Joins
```python
# Join based on spatial relationship (pick the predicate you need)
joined = gpd.sjoin(gdf1, gdf2, predicate='intersects')
joined = gpd.sjoin(gdf1, gdf2, predicate='within')
joined = gpd.sjoin(gdf1, gdf2, predicate='contains')

# Nearest neighbor join
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)
```
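To make the join behaviour concrete, here is a small sketch with two square zones and three points. All names (`zones`, `points`, `pid`, `zone`, `dist`) and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box

crs = "EPSG:3857"  # any projected CRS; coordinates below are illustrative
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"]},
    geometry=[box(0, 0, 2, 2), box(3, 0, 5, 2)],
    crs=crs,
)
points = gpd.GeoDataFrame(
    {"pid": [1, 2, 3]},
    geometry=[Point(1, 1), Point(4, 1), Point(10, 10)],
    crs=crs,
)

# Inner join (the default): the point outside both zones is dropped
inside = gpd.sjoin(points, zones, predicate="within")

# Nearest join: every point gets its closest zone plus the distance to it
nearest = gpd.sjoin_nearest(points, zones, distance_col="dist")

# With max_distance, points farther than the limit find no match
nearby = gpd.sjoin_nearest(points, zones, max_distance=5, distance_col="dist")

print(inside[["pid", "zone"]])
print(nearest[["pid", "zone", "dist"]])
```

Passing `how="left"` to `sjoin` would instead keep the unmatched point, with missing values in the columns from `zones`.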
Overlay Operations
```python
# Intersection
intersection = gpd.overlay(gdf1, gdf2, how='intersection')

# Union
union = gpd.overlay(gdf1, gdf2, how='union')

# Difference
difference = gpd.overlay(gdf1, gdf2, how='difference')
```
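The three overlay modes are easiest to compare on two overlapping squares whose areas can be worked out by hand. This is a sketch with illustrative geometry. The column names `a` and `b` are placeholders.

```python
import geopandas as gpd
from shapely.geometry import box

crs = "EPSG:3857"
gdf1 = gpd.GeoDataFrame({"a": [1]}, geometry=[box(0, 0, 2, 2)], crs=crs)
gdf2 = gpd.GeoDataFrame({"b": [2]}, geometry=[box(1, 1, 3, 3)], crs=crs)

intersection = gpd.overlay(gdf1, gdf2, how="intersection")
union = gpd.overlay(gdf1, gdf2, how="union")
difference = gpd.overlay(gdf1, gdf2, how="difference")

print(intersection.area.sum())  # the shared 1 x 1 square
print(union.area.sum())         # both squares, overlap counted once
print(difference.area.sum())    # gdf1 minus the overlap
```

Unlike a spatial join, an overlay produces new geometries, and the result carries attribute columns from both inputs where they apply.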
Dissolve (Aggregate by Attribute)
```python
# Merge geometries by attribute
dissolved = gdf.dissolve(by='region', aggfunc='sum')
```
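As an illustration of what dissolve returns, the sketch below merges three unit squares into two regions and sums a numeric column. The `population` values are invented for the example.

```python
import geopandas as gpd
from shapely.geometry import box

gdf = gpd.GeoDataFrame(
    {
        "region": ["north", "north", "south"],
        "population": [100, 250, 400],
    },
    geometry=[box(0, 1, 1, 2), box(1, 1, 2, 2), box(0, 0, 1, 1)],
    crs="EPSG:3857",
)

# Geometries sharing a region are unioned; other columns are aggregated
dissolved = gdf.dissolve(by="region", aggfunc="sum")

print(dissolved[["population"]])
print(dissolved.area)
```

The grouping column becomes the index of the result. An aggregation such as `'sum'` is only meaningful for numeric columns, so it may help to select the relevant columns before dissolving.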
Clip
```python
# Clip data to boundary
clipped = gpd.clip(gdf, boundary_gdf)
```
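Clipping cuts geometries at the boundary rather than just selecting the features that touch it. A minimal sketch, assuming a made-up line and a square boundary:

```python
import geopandas as gpd
from shapely.geometry import LineString, box

crs = "EPSG:3857"
# A line running from x=-5 to x=15 and a 10 x 10 boundary square
lines = gpd.GeoDataFrame(
    {"id": [1]}, geometry=[LineString([(-5, 5), (15, 5)])], crs=crs
)
boundary_gdf = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10)], crs=crs)

clipped = gpd.clip(lines, boundary_gdf)

print(lines.length.iloc[0], clipped.length.iloc[0])
```

Only the part of the line inside the square survives, and the attribute columns of the clipped layer are kept.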
Visualization
```python
import matplotlib.pyplot as plt

# Basic plot
gdf.plot()

# Choropleth map
gdf.plot(column='population', cmap='YlOrRd', legend=True)

# Multi-layer map
fig, ax = plt.subplots(figsize=(10, 10))
gdf1.plot(ax=ax, color='blue', alpha=0.5)
gdf2.plot(ax=ax, color='red', alpha=0.5)
plt.savefig('map.png', dpi=300, bbox_inches='tight')

# Interactive map (requires folium)
gdf.explore(column='population', legend=True)
```
Common Workflows
Spatial Join and Aggregate
```python
# Join points to polygons
points_in_polygons = gpd.sjoin(points_gdf, polygons_gdf, predicate='within')

# Aggregate by polygon (named aggregation: sum of 'value' and number of points)
aggregated = points_in_polygons.groupby('index_right').agg(
    value=('value', 'sum'),
    count=('value', 'size'),
)

# Merge back to polygons (inner join: polygons without points are dropped)
result = polygons_gdf.merge(aggregated, left_index=True, right_index=True)
```
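The workflow above can be run end to end on toy data. In this sketch the polygons, points, and the output column names `value_sum` and `n_points` are all hypothetical. It also assumes the polygon layer has a default unnamed index, so the joined column is called `index_right`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

crs = "EPSG:3857"
polygons_gdf = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2), box(4, 0, 6, 2)],
    crs=crs,
)
points_gdf = gpd.GeoDataFrame(
    {"value": [10, 20, 5]},
    geometry=[Point(1, 1), Point(1.5, 0.5), Point(3, 1)],
    crs=crs,
)

points_in_polygons = gpd.sjoin(points_gdf, polygons_gdf, predicate="within")

aggregated = points_in_polygons.groupby("index_right").agg(
    value_sum=("value", "sum"),
    n_points=("value", "size"),
)

# Left join keeps polygon C, which contains no points
result = polygons_gdf.join(aggregated, how="left")
result[["value_sum", "n_points"]] = result[["value_sum", "n_points"]].fillna(0)

print(result[["name", "value_sum", "n_points"]])
```

Using a left join with a fill value is one way to keep empty polygons in the output with zero counts, which is often what a choropleth map needs.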
Buffer Analysis
```python
# Create buffers around points
gdf_projected = points_gdf.to_crs("EPSG:3857")  # Project first!
# 1000 CRS units, roughly 1 km; Web Mercator stretches distances away from
# the equator, so a local UTM CRS gives more accurate buffers
gdf_projected['buffer'] = gdf_projected.geometry.buffer(1000)
gdf_projected = gdf_projected.set_geometry('buffer')

# Find features within buffer (both layers must share the same CRS)
other_projected = other_gdf.to_crs(gdf_projected.crs)
within_buffer = gpd.sjoin(other_projected, gdf_projected, predicate='within')
```
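Here is a self-contained version of the buffer workflow on invented data. The `stations` and `shops` layers and their coordinates are hypothetical, and both are created directly in a metre-based UTM CRS so that the buffer distance is in metres.

```python
import geopandas as gpd
from shapely.geometry import Point

crs = "EPSG:32633"  # UTM zone 33N, units in metres
stations = gpd.GeoDataFrame(
    {"station": ["S1"]}, geometry=[Point(500000, 5000000)], crs=crs
)
shops = gpd.GeoDataFrame(
    {"shop": ["near", "far"]},
    geometry=[Point(500500, 5000000), Point(502000, 5000000)],
    crs=crs,
)

# Replace the station points with 1 km buffer polygons, on a copy
buffers = stations.copy()
buffers["geometry"] = stations.geometry.buffer(1000)

# Both layers share a CRS, so the join compares like with like
within_buffer = gpd.sjoin(shops, buffers, predicate="within")

print(within_buffer[["shop", "station"]])
```

Only the shop 500 m from the station falls inside the buffer. The one 2 km away is dropped by the inner join.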
Best Practices
- Always check CRS before spatial operations
- Use projected CRS for area/distance calculations
- Match CRS before spatial joins or overlays
- Validate geometries with `.is_valid` before operations
- Use GeoPackage format over Shapefile (single file, no 10-character field-name limit, no 2 GB size limit)
- Use `.copy()` when modifying geometry to avoid side effects
- Filter during read with `bbox` for large files
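Two of the practices above, validating geometries and working on a copy, can be sketched as follows. The "bowtie" polygon is an illustrative invalid geometry. The repair step uses `GeoSeries.make_valid()`, which is assumed to be available in the installed GeoPandas version.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A self-intersecting "bowtie" polygon is a classic invalid geometry
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
gdf = gpd.GeoDataFrame({"id": [1]}, geometry=[bowtie], crs="EPSG:3857")

print(gdf.geometry.is_valid.tolist())

# Work on a copy so the original frame is left untouched
fixed = gdf.copy()
fixed["geometry"] = fixed.geometry.make_valid()

print(fixed.geometry.is_valid.tolist())
```

Checking validity before overlays or joins can help avoid errors that are otherwise hard to trace back to a single bad geometry.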
vs Alternatives
| Tool | Best For |
|---|---|
| GeoPandas | Vector data analysis, spatial operations |
| Rasterio | Raster data (satellite imagery, DEMs) |
| Shapely | Low-level geometry operations |
| Folium | Interactive web maps |