geospatial-data-pipeline
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseGeospatial Data Pipeline
地理空间数据处理流水线
Expert in processing, optimizing, and visualizing geospatial data at scale.
规模化处理、优化和可视化地理空间数据的专家方案。
When to Use
适用场景
✅ Use for:
- Drone imagery processing and annotation
- GPS track analysis and visualization
- Location-based search (find nearby X)
- Map tile generation for web/mobile
- Coordinate system transformations
- Geofencing and spatial queries
- GeoJSON optimization for web
❌ NOT for:
- Simple address validation (use address APIs)
- Basic distance calculations (use Haversine formula)
- Static map embeds (use Mapbox Static API)
- Geocoding (use Nominatim or Google Geocoding API)
✅ 适用场景:
- 无人机影像处理与标注
- GPS轨迹分析与可视化
- 基于位置的搜索(查找附近的X)
- 为Web/移动端生成地图瓦片
- 坐标系转换
- 地理围栏与空间查询
- 面向Web的GeoJSON优化
❌ 不适用场景:
- 简单地址验证(使用地址API)
- 基础距离计算(使用Haversine公式)
- 静态地图嵌入(使用Mapbox Static API)
- 地理编码(使用Nominatim或Google Geocoding API)
Technology Selection
技术选型
Database: PostGIS vs MongoDB Geospatial
数据库:PostGIS vs MongoDB地理空间功能
| Feature | PostGIS | MongoDB |
|---|---|---|
| Spatial indexes | GiST, SP-GiST | 2dsphere |
| Query language | SQL + spatial functions | Aggregation pipeline |
| Geometry types | 20+ (full OGC support) | Basic (Point, Line, Polygon) |
| Coordinate systems | 6000+ via EPSG | WGS84 only |
| Performance (10M points) | <100ms | <200ms |
| Best for | Complex spatial analysis | Document-centric apps |
Timeline:
- 2005: PostGIS 1.0 released
- 2012: MongoDB adds geospatial indexes
- 2020: PostGIS 3.0 with improved performance
- 2024: PostGIS remains gold standard for GIS workloads
| 特性 | PostGIS | MongoDB |
|---|---|---|
| 空间索引 | GiST, SP-GiST | 2dsphere |
| 查询语言 | SQL + 空间函数 | 聚合管道 |
| 几何类型 | 20+种(完整支持OGC标准) | 基础类型(点、线、面) |
| 坐标系 | 支持6000+种(通过EPSG) | 仅支持WGS84 |
| 性能(1000万条点数据) | <100ms | <200ms |
| 最佳适用场景 | 复杂空间分析 | 文档中心型应用 |
时间线:
- 2005年:PostGIS 1.0版本发布
- 2012年:MongoDB新增空间索引功能
- 2020年:PostGIS 3.0版本性能提升
- 2024年:PostGIS仍是GIS工作负载的黄金标准
Common Anti-Patterns
常见反模式
Anti-Pattern 1: Storing Coordinates as Strings
反模式1:以字符串存储坐标
Novice thinking: "I'll just store lat/lon as text, it's simple"
Problem: Can't use spatial indexes, queries are slow, no validation.
Wrong approach:
typescript
// ❌ String storage, no spatial features
interface Location {
id: string;
name: string;
latitude: string; // "37.7749"
longitude: string; // "-122.4194"
}
// Linear scan for "nearby" queries
async function findNearby(lat: string, lon: string): Promise<Location[]> {
const all = await db.locations.findAll();
return all.filter(loc => {
const distance = calculateDistance(
parseFloat(lat),
parseFloat(lon),
parseFloat(loc.latitude),
parseFloat(loc.longitude)
);
return distance < 5000; // 5km
});
}Why wrong: O(N) linear scan, no spatial index, string parsing overhead.
Correct approach:
typescript
// ✅ PostGIS GEOGRAPHY type with spatial index
CREATE TABLE locations (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
location GEOGRAPHY(POINT, 4326) -- WGS84 coordinates
);
-- Spatial index (GiST)
CREATE INDEX idx_locations_geography ON locations USING GIST(location);
-- TypeScript query
async function findNearby(lat: number, lon: number, radiusMeters: number): Promise<Location[]> {
const query = `
SELECT id, name, ST_AsGeoJSON(location) as geojson
FROM locations
WHERE ST_DWithin(
location,
ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography,
$3
)
ORDER BY location <-> ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography
LIMIT 100
`;
return db.query(query, [lon, lat, radiusMeters]); // <10ms with index
}Timeline context:
- 2000s: Stored lat/lon as FLOAT columns, did math in app code
- 2010s: PostGIS adoption, spatial indexes
- 2024: type handles Earth curvature automatically
GEOGRAPHY
新手误区:「我直接把经纬度存成文本就行,简单省事」
问题:无法使用空间索引,查询速度慢,无数据校验。
错误示例:
typescript
// ❌ String storage, no spatial features
interface Location {
id: string;
name: string;
latitude: string; // "37.7749"
longitude: string; // "-122.4194"
}
// Linear scan for "nearby" queries
async function findNearby(lat: string, lon: string): Promise<Location[]> {
const all = await db.locations.findAll();
return all.filter(loc => {
const distance = calculateDistance(
parseFloat(lat),
parseFloat(lon),
parseFloat(loc.latitude),
parseFloat(loc.longitude)
);
return distance < 5000; // 5km
});
}错误原因:O(N)线性扫描,无空间索引,存在字符串解析开销。
正确方案:
typescript
// ✅ PostGIS GEOGRAPHY type with spatial index
CREATE TABLE locations (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
location GEOGRAPHY(POINT, 4326) -- WGS84 coordinates
);
-- Spatial index (GiST)
CREATE INDEX idx_locations_geography ON locations USING GIST(location);
-- TypeScript query
async function findNearby(lat: number, lon: number, radiusMeters: number): Promise<Location[]> {
const query = `
SELECT id, name, ST_AsGeoJSON(location) as geojson
FROM locations
WHERE ST_DWithin(
location,
ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography,
$3
)
ORDER BY location <-> ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography
LIMIT 100
`;
return db.query(query, [lon, lat, radiusMeters]); // <10ms with index
}时间线背景:
- 2000年代:将经纬度存储为FLOAT列,在应用代码中计算
- 2010年代:PostGIS普及,空间索引开始使用
- 2024年:类型可自动处理地球曲率
GEOGRAPHY
Anti-Pattern 2: Not Using Spatial Indexes
反模式2:未使用空间索引
Problem: Proximity queries do full table scans.
Wrong approach:
sql
-- ❌ No index, sequential scan
CREATE TABLE drone_images (
id SERIAL PRIMARY KEY,
image_url VARCHAR(255),
location GEOGRAPHY(POINT, 4326)
);
-- This query scans ALL rows
SELECT * FROM drone_images
WHERE ST_DWithin(
location,
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
1000 -- 1km
);EXPLAIN output:
Seq Scan on drone_images (cost=0.00..1234.56 rows=1 width=123)Correct approach:
sql
-- ✅ GiST index for spatial queries
CREATE INDEX idx_drone_images_location ON drone_images USING GIST(location);
-- Same query, now uses index
SELECT * FROM drone_images
WHERE ST_DWithin(
location,
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
1000
);EXPLAIN output:
Bitmap Index Scan on idx_drone_images_location (cost=4.30..78.30 rows=50 width=123)Performance impact: 10M points, 5km radius query
- Without index: 3.2 seconds (full scan)
- With GiST index: 12ms (99.6% faster)
问题:邻近查询会执行全表扫描。
错误示例:
sql
-- ❌ No index, sequential scan
CREATE TABLE drone_images (
id SERIAL PRIMARY KEY,
image_url VARCHAR(255),
location GEOGRAPHY(POINT, 4326)
);
-- This query scans ALL rows
SELECT * FROM drone_images
WHERE ST_DWithin(
location,
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
1000 -- 1km
);执行计划输出:
Seq Scan on drone_images (cost=0.00..1234.56 rows=1 width=123)正确方案:
sql
-- ✅ GiST index for spatial queries
CREATE INDEX idx_drone_images_location ON drone_images USING GIST(location);
-- Same query, now uses index
SELECT * FROM drone_images
WHERE ST_DWithin(
location,
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
1000
);执行计划输出:
Bitmap Index Scan on idx_drone_images_location (cost=4.30..78.30 rows=50 width=123)性能影响:1000万条点数据,5公里半径查询
- 无索引:3.2秒(全表扫描)
- 有GiST索引:12毫秒(速度提升99.6%)
Anti-Pattern 3: Mixing Coordinate Systems
反模式3:混用坐标系
Novice thinking: "Coordinates are just numbers, I can mix them"
Problem: Incorrect distances, misaligned map features.
Wrong approach:
typescript
// ❌ Mixing EPSG:4326 (WGS84) and EPSG:3857 (Web Mercator)
const userLocation = {
lat: 37.7749, // WGS84
lon: -122.4194
};
const droneImage = {
x: -13634876, // Web Mercator (EPSG:3857)
y: 4545684
};
// Comparing apples to oranges!
const distance = Math.sqrt(
Math.pow(userLocation.lon - droneImage.x, 2) +
Math.pow(userLocation.lat - droneImage.y, 2)
);Result: Wildly incorrect distance (millions of "units").
Correct approach:
sql
-- ✅ Transform to common coordinate system
SELECT ST_Distance(
ST_Transform(
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326), -- WGS84
3857 -- Transform to Web Mercator
),
ST_SetSRID(ST_MakePoint(-13634876, 4545684), 3857) -- Already Web Mercator
) AS distance_meters;Or better: Always store in one system (WGS84), transform on display only.
Timeline:
- 2005: Web Mercator (EPSG:3857) introduced by Google Maps
- 2010: Confusion peaks as apps mix WGS84 data with Web Mercator tiles
- 2024: Best practice: Store WGS84, transform to 3857 only for tile rendering
新手误区:「坐标只是数字,我可以混用」
问题:距离计算错误,地图要素错位。
错误示例:
typescript
// ❌ Mixing EPSG:4326 (WGS84) and EPSG:3857 (Web Mercator)
const userLocation = {
lat: 37.7749, // WGS84
lon: -122.4194
};
const droneImage = {
x: -13634876, // Web Mercator (EPSG:3857)
y: 4545684
};
// Comparing apples to oranges!
const distance = Math.sqrt(
Math.pow(userLocation.lon - droneImage.x, 2) +
Math.pow(userLocation.lat - droneImage.y, 2)
);结果:距离计算完全错误(数值达数百万“单位”)。
正确方案:
sql
-- ✅ Transform to common coordinate system
SELECT ST_Distance(
ST_Transform(
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326), -- WGS84
3857 -- Transform to Web Mercator
),
ST_SetSRID(ST_MakePoint(-13634876, 4545684), 3857) -- Already Web Mercator
) AS distance_meters;更优方案:始终在统一系统中存储坐标(推荐WGS84),仅在展示时转换。
时间线:
- 2005年:Google Maps引入Web Mercator(EPSG:3857)
- 2010年:应用混用WGS84数据与Web Mercator瓦片的混乱达到顶峰
- 2024年:最佳实践:存储WGS84坐标,仅在瓦片渲染时转换为3857
Anti-Pattern 4: Loading Huge GeoJSON Files
反模式4:加载超大GeoJSON文件
Problem: 50MB GeoJSON file crashes browser.
Wrong approach:
typescript
// ❌ Load entire file into memory
const geoJson = await fetch('/drone-survey-data.geojson').then(r => r.json());
// 50MB of GeoJSON = browser freeze
map.addSource('drone-data', {
type: 'geojson',
data: geoJson // All 10,000 polygons loaded at once
});Correct approach 1: Vector tiles (pre-chunked)
typescript
// ✅ Serve as vector tiles (MBTiles or PMTiles)
map.addSource('drone-data', {
type: 'vector',
tiles: ['https://api.example.com/tiles/{z}/{x}/{y}.pbf'],
minzoom: 10,
maxzoom: 18
});
// Browser only loads visible tilesCorrect approach 2: GeoJSON simplification + chunking
bash
undefined问题:50MB的GeoJSON文件会导致浏览器崩溃。
错误示例:
typescript
// ❌ Load entire file into memory
const geoJson = await fetch('/drone-survey-data.geojson').then(r => r.json());
// 50MB of GeoJSON = browser freeze
map.addSource('drone-data', {
type: 'geojson',
data: geoJson // All 10,000 polygons loaded at once
});正确方案1:矢量瓦片(预分块)
typescript
// ✅ Serve as vector tiles (MBTiles or PMTiles)
map.addSource('drone-data', {
type: 'vector',
tiles: ['https://api.example.com/tiles/{z}/{x}/{y}.pbf'],
minzoom: 10,
maxzoom: 18
});
// Browser only loads visible tiles正确方案2:GeoJSON简化 + 分块
bash
undefinedSimplify geometry (reduce points)
Simplify geometry (reduce points)
npm install -g @mapbox/geojson-precision
geojson-precision -p 5 input.geojson output.geojson
npm install -g @mapbox/geojson-precision
geojson-precision -p 5 input.geojson output.geojson
Split into tiles
Split into tiles
npm install -g geojson-vt
npm install -g geojson-vt
Generate tiles programmatically (see scripts/tile_generator.ts)
Generate tiles programmatically (see scripts/tile_generator.ts)
**Correct approach 3**: Server-side filtering
```typescript
// ✅ Only fetch visible bounds
async function fetchVisibleFeatures(bounds: Bounds): Promise<GeoJSON> {
const response = await fetch(
`/api/features?bbox=${bounds.west},${bounds.south},${bounds.east},${bounds.north}`
);
return response.json();
}
map.on('moveend', async () => {
const bounds = map.getBounds();
const geojson = await fetchVisibleFeatures(bounds);
map.getSource('dynamic-data').setData(geojson);
});
**正确方案3**:服务端过滤
```typescript
// ✅ Only fetch visible bounds
async function fetchVisibleFeatures(bounds: Bounds): Promise<GeoJSON> {
const response = await fetch(
`/api/features?bbox=${bounds.west},${bounds.south},${bounds.east},${bounds.north}`
);
return response.json();
}
map.on('moveend', async () => {
const bounds = map.getBounds();
const geojson = await fetchVisibleFeatures(bounds);
map.getSource('dynamic-data').setData(geojson);
});Anti-Pattern 5: Euclidean Distance on Spherical Earth
反模式5:在球面上使用欧几里得距离
Novice thinking: "Distance is just Pythagorean theorem"
Problem: Incorrect at scale, worse near poles.
Wrong approach:
typescript
// ❌ Flat Earth distance (wrong!)
function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
const dx = lon2 - lon1;
const dy = lat2 - lat1;
return Math.sqrt(dx * dx + dy * dy) * 111.32; // 111.32 km/degree (WRONG)
}
// Example: San Francisco to New York
const distance = distanceKm(37.7749, -122.4194, 40.7128, -74.0060);
// Returns: ~55 km (WRONG! Actual: ~4,130 km)Why wrong: Earth is a sphere, not a flat plane.
Correct approach 1: Haversine formula (great circle distance)
typescript
// ✅ Haversine formula (spherical Earth)
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
const R = 6371; // Earth radius in km
const dLat = toRadians(lat2 - lat1);
const dLon = toRadians(lon2 - lon1);
const a =
Math.sin(dLat / 2) * Math.sin(dLat / 2) +
Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
Math.sin(dLon / 2) * Math.sin(dLon / 2);
const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
return R * c;
}
// San Francisco to New York
const distance = haversineKm(37.7749, -122.4194, 40.7128, -74.0060);
// Returns: ~4,130 km ✅Correct approach 2: PostGIS (handles curvature automatically)
sql
-- ✅ PostGIS ST_Distance with GEOGRAPHY
SELECT ST_Distance(
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography
) / 1000 AS distance_km;
-- Returns: 4130.137 km ✅Accuracy comparison:
| Method | SF to NYC | Error |
|---|---|---|
| Euclidean (flat) | 55 km | 98.7% wrong |
| Haversine (sphere) | 4,130 km | ✅ Correct |
| PostGIS (ellipsoid) | 4,135 km | Most accurate |
新手误区:「距离就是勾股定理计算的结果」
问题:在大尺度场景下计算错误,在极地附近误差更大。
错误示例:
typescript
// ❌ Flat Earth distance (wrong!)
function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
const dx = lon2 - lon1;
const dy = lat2 - lat1;
return Math.sqrt(dx * dx + dy * dy) * 111.32; // 111.32 km/degree (WRONG)
}
// Example: San Francisco to New York
const distance = distanceKm(37.7749, -122.4194, 40.7128, -74.0060);
// Returns: ~55 km (WRONG! Actual: ~4,130 km)错误原因:地球是球体,不是平面。
正确方案1:Haversine公式(大圆距离)
typescript
// ✅ Haversine formula (spherical Earth)
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
const R = 6371; // Earth radius in km
const dLat = toRadians(lat2 - lat1);
const dLon = toRadians(lon2 - lon1);
const a =
Math.sin(dLat / 2) * Math.sin(dLat / 2) +
Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
Math.sin(dLon / 2) * Math.sin(dLon / 2);
const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
return R * c;
}
// San Francisco to New York
const distance = haversineKm(37.7749, -122.4194, 40.7128, -74.0060);
// Returns: ~4,130 km ✅正确方案2:PostGIS(自动处理地球曲率)
sql
-- ✅ PostGIS ST_Distance with GEOGRAPHY
SELECT ST_Distance(
ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography
) / 1000 AS distance_km;
-- Returns: 4130.137 km ✅精度对比:
| 方法 | 旧金山到纽约的距离 | 误差 |
|---|---|---|
| 欧几里得(平面) | 55公里 | 错误率98.7% |
| Haversine(球体) | 4130公里 | ✅ 正确 |
| PostGIS(椭球体) | 4135公里 | 精度最高 |
Production Checklist
生产环境检查清单
□ PostGIS extension installed and spatial indexes created
□ All coordinates stored in consistent SRID (recommend: 4326)
□ GeoJSON files optimized (<1MB) or served as vector tiles
□ Coordinate transformations use ST_Transform, not manual math
□ Distance calculations use ST_Distance with GEOGRAPHY type
□ Bounding box queries use ST_MakeEnvelope + ST_Intersects
□ Large geometries chunked (not >100KB per feature)
□ Map tiles pre-generated for common zoom levels
□ CORS configured for tile servers
□ Rate limiting on geocoding/reverse geocoding endpoints□ PostGIS扩展已安装并创建空间索引
□ 所有坐标存储在统一的SRID中(推荐:4326)
□ GeoJSON文件已优化(<1MB)或通过矢量瓦片提供
□ 坐标系转换使用ST_Transform,而非手动计算
□ 距离计算使用GEOGRAPHY类型的ST_Distance
□ 边界框查询使用ST_MakeEnvelope + ST_Intersects
□ 大型几何数据已分块(单个要素不超过100KB)
□ 已预生成常用缩放级别的地图瓦片
□ 瓦片服务器已配置CORS
□ 地理编码/逆地理编码端点已配置速率限制When to Use vs Avoid
适用与规避场景对比
| Scenario | Appropriate? |
|---|---|
| Drone imagery annotation and search | ✅ Yes - process survey data |
| GPS track visualization | ✅ Yes - optimize paths |
| Find nearest coffee shops | ✅ Yes - spatial queries |
| Jurisdiction boundary lookups | ✅ Yes - point-in-polygon |
| Simple address autocomplete | ❌ No - use Mapbox/Google |
| Embed static map on page | ❌ No - use Static API |
| Geocode single address | ❌ No - use geocoding API |
| 场景 | 是否适用 |
|---|---|
| 无人机影像标注与搜索 | ✅ 是 - 处理勘测数据 |
| GPS轨迹可视化 | ✅ 是 - 优化路径 |
| 查找最近的咖啡店 | ✅ 是 - 空间查询 |
| 辖区边界查询 | ✅ 是 - 点-in-面查询 |
| 简单地址自动补全 | ❌ 否 - 使用Mapbox/Google服务 |
| 在页面嵌入静态地图 | ❌ 否 - 使用静态API |
| 单地址地理编码 | ❌ 否 - 使用地理编码API |
References
参考资料
- - EPSG codes, transformations, Web Mercator vs WGS84
/references/coordinate-systems.md - - PostGIS setup, spatial indexes, common queries
/references/postgis-guide.md - - Simplification, chunking, vector tiles
/references/geojson-optimization.md
- - EPSG编码、坐标系转换、Web Mercator vs WGS84
/references/coordinate-systems.md - - PostGIS安装、空间索引、常用查询
/references/postgis-guide.md - - 几何简化、分块、矢量瓦片
/references/geojson-optimization.md
Scripts
脚本工具
- - Process drone imagery, GPS tracks, GeoJSON validation
scripts/geospatial_processor.ts - - Generate vector tiles (MBTiles/PMTiles) from GeoJSON
scripts/tile_generator.ts
This skill guides: Geospatial data | PostGIS | GeoJSON | Map tiles | Coordinate systems | Drone data processing | Spatial queries
- - 处理无人机影像、GPS轨迹、GeoJSON验证
scripts/geospatial_processor.ts - - 从GeoJSON生成矢量瓦片(MBTiles/PMTiles)
scripts/tile_generator.ts
本技能覆盖:地理空间数据 | PostGIS | GeoJSON | 地图瓦片 | 坐标系 | 无人机数据处理 | 空间查询