carto-geocoding

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Geocoding Addresses in CARTO Workflows

在CARTO Workflows中进行地址Geocoding

Converts street addresses or place names into geographic coordinates (point geometries). This is an essential first step when working with tabular data that has an address column but no spatial column.
Prerequisites: Load
carto-create-workflow
for the development process, JSON structure, and validation commands.

将街道地址或地名转换为地理坐标(点几何图形)。这是处理包含地址列但无空间列的表格数据时必不可少的第一步。
前提条件:加载
carto-create-workflow
以获取开发流程、JSON结构和验证命令。

Instructions

操作步骤

A geocoding workflow follows this pipeline:
Source Table (with address column) -> Geocode -> (Filter successful) -> Save
Geocoding工作流遵循以下流程:
Source Table (with address column) -> Geocode -> (Filter successful) -> Save

Step 1: Load Source Data

步骤1:加载源数据

Use
native.gettablebyname
. The input table must contain a column with address strings (e.g.
address
,
full_address
,
location
).
Success: Node outputs a table with at least one text column containing address data.
使用
native.gettablebyname
。输入表必须包含一个存储地址字符串的列(例如
address
full_address
location
)。
成功标志:节点输出至少包含一个含地址数据的文本列的表格。

Step 2: Geocode

步骤2:执行Geocoding

Use
native.geocode
with:
InputDescriptionRequired
source
Table with address dataYes
address
Column containing the address stringYes
country
Country filter to improve accuracy (e.g.
"United States"
,
"United Kingdom"
)
No, but strongly recommended
The address column can contain full addresses (
"123 Main St, Springfield, IL 60001"
) or composite values built from multiple columns (concatenate street + city + postal code in a prior
native.selectexpression
step).
Two output handles: The geocode component produces two separate outputs:
  • match
    : Rows where geocoding succeeded -- a
    geom
    column with point geometry is added, plus a
    CARTO_GEOCODE_METADATA
    JSON column with quality info (confidence score, match type).
  • unmatch
    : Rows where geocoding failed --
    geom
    is NULL.
Connect downstream nodes to the correct handle based on your needs.
Success: The geocode node is configured with the address column and (ideally) a country filter. Edges connect to the
match
and/or
unmatch
output handles.
使用
native.geocode
,参数如下:
输入项描述是否必填
source
包含地址数据的表格
address
存储地址字符串的列
country
用于提高精度的国家筛选器(例如
"United States"
"United Kingdom"
否,但强烈建议填写
地址列可以包含完整地址(
"123 Main St, Springfield, IL 60001"
),也可以包含通过多列拼接而成的复合值(在之前的
native.selectexpression
步骤中拼接街道+城市+邮政编码)。
两个输出分支:Geocode组件会生成两个独立输出:
  • match
    :Geocoding成功的行——新增一个带点几何图形的
    geom
    列,以及一个包含质量信息(置信度分数、匹配类型)的
    CARTO_GEOCODE_METADATA
    JSON列。
  • unmatch
    :Geocoding失败的行——
    geom
    为NULL。
根据需求将下游节点连接到正确的分支。
成功标志:Geocode节点已配置地址列(理想情况下还配置了国家筛选器),且已将连线连接到
match
和/或
unmatch
输出分支。

Step 3: Filter or Review Results (optional)

步骤3:筛选或审核结果(可选)

For the match output:
  • Optionally filter by confidence using the metadata column (e.g. extract confidence from
    CARTO_GEOCODE_METADATA
    via
    native.selectexpression
    ).
For the unmatch output:
  • Save to a separate table for review and manual correction.
  • Common failure causes: typos, incomplete addresses, PO boxes, ambiguous place names.
Success: High-confidence geocoded rows are isolated; failed rows are captured for review.
对于**
match
**输出:
  • 可选:使用元数据列按置信度筛选(例如通过
    native.selectexpression
    CARTO_GEOCODE_METADATA
    中提取置信度)。
对于**
unmatch
**输出:
  • 保存到单独的表格中以便审核和手动修正。
  • 常见失败原因:拼写错误、地址不完整、邮政信箱、模糊地名。
成功标志:已分离出高置信度的Geocoding结果行;已捕获失败行以便审核。

Step 4: Save Results

步骤4:保存结果

Use
native.saveastable
to persist the geocoded output. The
geom
column contains WGS84 (EPSG:4326) point geometries, ready for visualization in CARTO Builder or further spatial analysis.
Success: Validated workflow that can be uploaded via
carto workflows create
.

使用
native.saveastable
持久化Geocoding输出。
geom
列包含WGS84(EPSG:4326)点几何图形,可直接在CARTO Builder中可视化或用于进一步的空间分析。
成功标志:可通过
carto workflows create
上传的已验证工作流。

Geocoding Under the Hood

Geocoding底层原理

The workflow component wraps the CARTO Analytics Toolbox function
GEOCODE_TABLE
, which:
  • Adds a
    geom
    column with point geometry to each row
  • Adds a
    CARTO_GEOCODE_METADATA
    JSON column with quality information (confidence, match type)
  • Uses CARTO Location Data Services (LDS) -- each geocoded row consumes LDS quota
Check available quota by querying the Analytics Toolbox
LDS_QUOTA_INFO()
function. The fully-qualified name is provider-specific (BigQuery:
`carto-un.carto`.LDS_QUOTA_INFO()
; Snowflake:
CARTO.CARTO.LDS_QUOTA_INFO()
; Databricks: stored procedure in the dedicated AT schema). Load
carto-create-workflow
and consult
references/providers/<provider>.md
for the AT path on your warehouse.

该工作流组件封装了CARTO Analytics Toolbox函数
GEOCODE_TABLE
,该函数:
  • 为每行新增一个带点几何图形的
    geom
  • 新增一个包含质量信息(置信度、匹配类型)的
    CARTO_GEOCODE_METADATA
    JSON列
  • 使用CARTO Location Data Services (LDS)——每一行Geocoding都会消耗LDS配额
可通过查询Analytics Toolbox的
LDS_QUOTA_INFO()
函数查看可用配额。该函数的全限定名称因提供商而异(BigQuery:
`carto-un.carto`.LDS_QUOTA_INFO()
;Snowflake:
CARTO.CARTO.LDS_QUOTA_INFO()
;Databricks:专用AT schema中的存储过程)。加载
carto-create-workflow
并查阅
references/providers/<provider>.md
以获取您的数据仓库对应的AT路径。

Gotchas

注意事项

  • Geocoding consumes LDS quota. Each row geocoded counts against the account's Location Data Services quota. Check quota availability before bulk operations, especially on large tables.
  • Two output handles:
    match
    and
    unmatch
    .
    Don't connect to the wrong one --
    match
    has geometries,
    unmatch
    has NULLs. If you connect the
    unmatch
    handle to a spatial operation, it will fail.
  • Country filter is strongly recommended. Without it, ambiguous addresses may resolve to the wrong country (e.g. "Springfield" exists in 30+ US states and in other countries). The country parameter improves both accuracy and speed.
  • Address formatting matters. Well-formatted addresses produce better results:
    "123 Main St, Springfield, IL 60001"
    works better than
    "123 main street springfield"
    . Include city, state/region, and postal code when available.
  • Provider casing & SQL dialect. Examples in this skill use lowercase column names (BigQuery / Databricks / Postgres / Redshift convention); on Snowflake unquoted identifiers surface UPPERCASE (e.g.
    CARTO_GEOCODE_METADATA
    ,
    GEOM
    ). When writing dialect-specific SQL or referencing the AT path, see
    carto-create-workflow/references/providers/<provider>.md
    .
  • For large tables, consider batching. Geocoding hundreds of thousands of rows in a single run can exhaust quota or time out. Split into batches if needed.
  • Output geometry is always WGS84 points. The
    geom
    column contains EPSG:4326 point geometries regardless of the input address format or country.
  • Failed geocodes deserve review. The
    unmatch
    output is not just noise -- it often reveals data quality issues (missing postal codes, abbreviated city names, non-standard formatting) that can be fixed and re-geocoded.

  • Geocoding会消耗LDS配额。每一行Geocoding都会计入账户的Location Data Services配额。在批量操作前,尤其是处理大型表格时,请检查配额可用性。
  • 两个输出分支:
    match
    unmatch
    。请勿连接到错误的分支——
    match
    包含几何图形,
    unmatch
    为NULL。如果将
    unmatch
    分支连接到空间操作,会导致失败。
  • 强烈建议使用国家筛选器。如果不设置,模糊地址可能会解析到错误的国家(例如“Springfield”存在于美国30多个州及其他国家)。国家参数可同时提高精度和速度。
  • 地址格式很重要。格式规范的地址会产生更好的结果:
    "123 Main St, Springfield, IL 60001"
    "123 main street springfield"
    效果更好。请尽可能包含城市、州/地区和邮政编码。
  • 提供商大小写与SQL方言。本技能中的示例使用小写列名(BigQuery / Databricks / Postgres / Redshift约定);在Snowflake中,未加引号的标识符会显示为大写(例如
    CARTO_GEOCODE_METADATA
    GEOM
    )。编写特定方言的SQL或引用AT路径时,请查看
    carto-create-workflow/references/providers/<provider>.md
  • 处理大型表格时请考虑分批处理。单次运行Geocoding数十万行可能会耗尽配额或超时。如有需要,请拆分成分批处理。
  • 输出几何图形始终为WGS84点。无论输入地址格式或国家如何,
    geom
    列都包含EPSG:4326点几何图形。
  • 失败的Geocoding值得审核
    unmatch
    输出并非无用数据——它通常会揭示数据质量问题(缺少邮政编码、城市名称缩写、非标准格式),这些问题可以修复后重新进行Geocoding。

Reference Templates

参考模板

Academy tutorials and workflow templates covering geocoding:
ResourceDescriptionURL
Geocoding (BigQuery AT)Step-by-step geocoding with Analytics Toolbox for BigQueryAcademy link
Geocoding (Snowflake AT)Step-by-step geocoding with Analytics Toolbox for SnowflakeAcademy link
Workflow template: Geocode street addressesGenerating new spatial data from addressesAcademy link

涵盖Geocoding的学院教程和工作流模板:
资源描述链接
Geocoding (BigQuery AT)使用BigQuery版Analytics Toolbox进行Geocoding的分步教程学院链接
Geocoding (Snowflake AT)使用Snowflake版Analytics Toolbox进行Geocoding的分步教程学院链接
Workflow template: Geocode street addresses从地址生成新的空间数据学院链接

Common Variations

常见变体

VariationHow
Composite address from multiple columnsAdd a
native.selectexpression
step before geocoding to concatenate street, city, state, zip into one column
Geocode + spatial joinChain: Geocode -> match -> Spatial Join (e.g. point-in-polygon to assign regions)
Geocode + enrichmentChain: Geocode -> match -> Buffer/Isochrone -> Enrich (add demographics around each geocoded point)
Capture failures for re-processingConnect both
match
and
unmatch
handles to separate
native.saveastable
nodes
Filter by confidenceAfter geocode, use
native.selectexpression
to extract confidence from the metadata JSON, then filter
变体实现方式
从多列拼接复合地址在Geocoding前添加
native.selectexpression
步骤,将街道、城市、州、邮政编码拼接成一列
Geocoding + 空间连接链式流程:Geocode -> match -> 空间连接(例如点面匹配以分配区域)
Geocoding + 数据增强链式流程:Geocode -> match -> 缓冲区/等时线 -> 数据增强(为每个Geocoding点添加周边人口统计数据)
捕获失败结果以便重新处理
match
unmatch
分支分别连接到不同的
native.saveastable
节点
按置信度筛选Geocoding后,使用
native.selectexpression
从元数据JSON中提取置信度,然后进行筛选