postgis-skill

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Documentation

文档规范

  • Make sure every create statement or CTE has descriptive comment
    --
    in front of it.
  • Write enough comments so you can deduce what was a requirement in the future and not walk in circles.
  • Every feature needs to have comprehensive up-to-date documentation near it.
  • 确保每个CREATE语句或CTE前都有描述性的
    --
    注释。
  • 撰写足够的注释,以便未来能推断出当初的需求,避免走弯路。
  • 每个功能都需要在附近配有全面且最新的文档。

Style

编码风格

  • PostGIS functions follow their spelling from the manual (
    st_segmentize
    ->
    ST_Segmentize
    ).
  • SQL is lowercase unless instructed otherwise.
  • Values in databases and layers should be absolute as much as possible: store "birthday" or "construction date" instead of "age".
  • Do not mix tabs and spaces in code.
  • Add empty lines between logical blocks.
  • Format the code nicely and consistently.
  • Call geometry column
    geom
    ; geography column
    geog
    .
  • PostGIS函数需遵循官方手册中的拼写(如
    st_segmentize
    应改为
    ST_Segmentize
    )。
  • SQL语句除特殊要求外均使用小写。
  • 数据库和图层中的值应尽可能采用绝对形式:存储"生日"或"建造日期"而非"年龄"。
  • 代码中不要混合使用制表符和空格。
  • 在逻辑块之间添加空行。
  • 代码格式要美观且统一。
  • 几何列命名为
    geom
    ;地理列命名为
    geog

Indexing

索引优化

  • Create brin for all columns when creating large table that will be used for ad-hoc queries.
  • If you have cache table that has a primary key, it makes sense to add values into
    including
    on same index for faster lookup.
  • 创建用于即席查询的大型表时,为所有列创建BRIN索引。
  • 若缓存表有主键,在同一索引的
    including
    子句中添加相关值可加快查找速度。

Debugging

调试技巧

  • Make sure that error messages towards developer are better than just "500 Internal server error".
  • Don't stub stuff out with insane fallbacks (like lat/lon=0) - instead make the rest of the code work around data absence and inform user.
  • SQL files should to be idempotent: drop table if exists + create table as; add some comments to make people grasp queries faster.
  • Create both "up' and "down/rollback" migration when creating new migrations for ease of iteration.
  • Check
    select postgis_full_version();
    to see if all upgrades happened successfully.
  • Don't run one SQL file from other SQL file - this quickly becomes a mess with relative file paths.
  • 确保返回给开发者的错误信息比单纯的"500 Internal Server Error"更具信息量。
  • 不要用极端的 fallback 方式(如设置经纬度为0)来掩盖问题——而是让代码适配数据缺失的情况并告知用户。
  • SQL文件应具备幂等性:先执行
    drop table if exists
    再执行
    create table as
    ;添加注释以帮助他人快速理解查询语句。
  • 创建新的迁移脚本时,同时编写"升级"和"回滚"脚本,以便迭代。
  • 执行
    select postgis_full_version();
    检查所有升级是否成功完成。
  • 不要从一个SQL文件中调用另一个SQL文件——相对路径会很快让代码变得混乱。

Raster

栅格数据处理

  • Do not work with GDAL on the filesystem. Import things into database and deal with data there.
  • 不要在文件系统层面使用GDAL处理数据。应将数据导入数据库后再进行操作。

SQL gotchas

SQL常见陷阱

  • sum(case when A then 1 else 0 end)
    is just
    count() filter (where A)
  • row_number() ...  = 1
    can likely be redone as
    order by + limit 1
    (possibly with
    distinct on
    or
    lateral
    )
  • exists(select 1 from ...)
    is just
    exists(select from ...)
  • tags ->> 'key' = 'value'
    is just
    tags @> '{"key": "value"}
    - works faster for indexes
  • you can't just create ordered table and then rely on it to be ordered on scan without
    order by
  • sum(case when A then 1 else 0 end)
    等价于
    count() filter (where A)
  • row_number() ...  = 1
    通常可以改写为
    order by + limit 1
    (可能结合
    distinct on
    lateral
  • exists(select 1 from ...)
    等价于
    exists(select from ...)
  • tags ->> 'key' = 'value'
    等价于
    tags @> '{"key": "value"}'
    ——后者在使用索引时速度更快
  • 不能仅依赖创建时的有序表来保证扫描时的顺序,必须显式添加
    order by
    子句

PostGIS gotchas

PostGIS常见陷阱

  • Do not use geometry typmod unless requested (things like
    geometry(multilinestring, 4326)
    ) - use plain
    geometry
    or
    geography
    instead. This removes clutter of
    ST_Multi
    and errors via
    ST_SetSRID
    .
  • ST_UnaryUnion(ST_Collect(geom))
    is just
    ST_Union(geom)
  • ST_Buffer(geom, 0)
    should be
    ST_MakeValid(geom)
  • select min(ST_Distance(..))
    should be
    select ST_Distance() ... order by a <-> b limit 1
    to enable knn gist
  • order by ST_Distance(c.geog, t.geog)
    should be
    order by c.geog <-> t.geog
  • ST_UnaryUnion
    is a sign you're doing something wrong
  • ST_MakeValid
    is a sign you're doing something wrong on the previous step
  • be extra attintive when calling
    ST_SetSRID
    : check the actual projection of input data, check if it can be set correctly during input (
    ST_GeomFromGeoJSON
    ,
    EWKT
    -style
    SRID=4326;POINT(...
    ,
    EWKB
    allow that). Check if
    ST_Transform
    is needed instead.
  • when looking for relation between point and polygon, prefer
    ST_Intersects
    to other topology predicates
  • when generating complex geometry by walking raster or grid, may make sense to
    ST_Simplify(geom, 0)
  • to generate neighbourhoods of predictable size, use
    ST_ClusterKMeans
    with k=2 and
    max_radius
    set to your distance.
  • use
    ST_AsEWKB
    for binary representation instead of
    ST_AsWKB
    to keep SRID.
  • Choosing projection: SRID=4326 (2D longlat) when input or output is longitude and latitude and coordinate value is to be shown to user. SRID=3857 (2D Spherical Mercator) when output will be shown on web map, ST_AsMVT, or 2D KNN requests of short distance are to be executed. SRID=4978 (3D XYZ) when performing internal computations, line-of-sight, clustering and averaging across antimeridian. Beware: only use 3D-aware operations, ST_Force3DZ on 2D CRS data before calling ST_Transform to it.
  • Instead of using
    ST_Hexagon
    /
    ST_HexagonGrid
    use
    h3
    extension.
  • When you know the data is going to be dumped in binary form, gzipped and moved around, consider using
    ST_QuantizeCoordinates
    if precision is known.
  • 除非有明确要求,否则不要使用几何类型修饰符(如
    geometry(multilinestring, 4326)
    )——应使用普通的
    geometry
    geography
    类型。这可以避免
    ST_Multi
    的冗余以及
    ST_SetSRID
    引发的错误。
  • ST_UnaryUnion(ST_Collect(geom))
    等价于
    ST_Union(geom)
  • ST_Buffer(geom, 0)
    应替换为
    ST_MakeValid(geom)
  • select min(ST_Distance(..))
    应改写为
    select ST_Distance() ... order by a <-> b limit 1
    ,以启用KNN GiST索引
  • order by ST_Distance(c.geog, t.geog)
    应改为
    order by c.geog <-> t.geog
  • 使用
    ST_UnaryUnion
    通常意味着你的实现方式存在问题
  • 使用
    ST_MakeValid
    通常意味着上一步的处理存在问题
  • 调用
    ST_SetSRID
    时要格外注意:检查输入数据的实际投影,确认是否可以在输入阶段就正确设置(
    ST_GeomFromGeoJSON
    、EWKT格式
    SRID=4326;POINT(...
    、EWKB都支持)。考虑是否需要改用
    ST_Transform
  • 查找点与多边形的关系时,优先使用
    ST_Intersects
    而非其他拓扑谓词
  • 通过栅格或网格生成复杂几何图形时,使用
    ST_Simplify(geom, 0)
    可能会有帮助
  • 要生成大小可预测的邻域,可结合使用
    ST_ClusterKMeans
    (设置k=2)和
    max_radius
    参数指定距离。
  • 如需保留SRID,使用
    ST_AsEWKB
    而非
    ST_AsWKB
    来生成二进制表示。
  • 投影选择指南:
  • 当输入或输出为经纬度且坐标值需展示给用户时,使用SRID=4326(二维经纬度)。
  • 当输出将用于Web地图、ST_AsMVT或执行短距离二维KNN查询时,使用SRID=3857(二维球面墨卡托)。
  • 进行内部计算、视线分析、跨日界线聚类与平均时,使用SRID=4978(三维XYZ)。注意:仅使用支持三维的操作,在调用ST_Transform转换到该坐标系前,需对二维CRS数据执行ST_Force3DZ。
  • 不要使用
    ST_Hexagon
    /
    ST_HexagonGrid
    ,应使用
    h3
    扩展。
  • 当已知数据将以二进制形式导出、压缩并传输时,若精度已知,可考虑使用
    ST_QuantizeCoordinates