carto-site-selection

Site Selection and Cannibalization Analysis

Builds CARTO Workflows that identify optimal locations for new facilities (stores, stations, offices) by combining spatial criteria, and that quantify cannibalization risk from overlapping catchment areas. Also covers twin-area and similar-location discovery.
Prerequisites: Load carto-create-workflow for the development process, JSON structure, and validation commands. Load carto-trade-area-analysis if the workflow involves isochrones, buffers, or catchment enrichment; that skill covers the catchment pipeline in detail.

Decision Tree

User intent | Pattern
--- | ---
"Where should I open a new store?" | Site Selection (scoring + ranking)
"Will a new store hurt existing ones?" | Cannibalization Analysis
"Find locations similar to my best performers" | Twin Areas / Similar Locations

Instructions

Pattern A: Site Selection (Scoring + Ranking)

Existing locations + Target area -> Spatial indexing -> Enrich with demographics/POIs -> Score/Rank -> Filter top candidates -> Save

Step 1: Load Data

Load two datasets with native.gettablebyname:
  • Existing locations (current stores/facilities)
  • Target area (e.g. city boundary, district polygons, or a grid covering the study area)
Success: Both tables loaded with geometry columns and unique identifiers.

Step 2: Build Candidate Grid

Polyfill the target area into H3 or Quadbin cells using native.h3polyfill or native.quadbinpolyfill. Each cell is a candidate micro-location.
Success: A contiguous grid of cells covering the study area.

Step 3: Enrich Candidates

Attach demand signals to each cell (population, income, foot traffic, POI density) using native.h3enrich, native.joinv2, or the Data Observatory.
Success: Each grid cell has numeric columns representing demand/suitability factors.

Step 4: Filter by Proximity to Existing Locations

Use native.h3distance to compute the hop distance from each candidate cell to the nearest existing location. Filter out cells that are too close (cannibalization risk) or too far (logistics cost).
  • native.h3distance returns a hop count (grid steps), not a physical distance. Convert using the approximate edge length for the resolution (e.g. H3 res 8 ~ 460m edge, so 3 hops ~ 1.4 km). Treat this as a rough lower bound: neighbouring cell centres sit about 1.7 edge lengths apart (~ 0.8 km per hop at res 8).
Success: Candidate cells are within a sensible distance band from existing locations.
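
The hop-to-distance conversion and the distance band can be sketched in plain Python. This is an illustration only, not part of the workflow JSON: the edge lengths are the approximate figures quoted in this skill, the cell names are placeholders rather than real H3 indexes, and the helper names and band limits are hypothetical.

```python
import math

# Approximate H3 edge lengths in km, as quoted in this skill (assumed values).
EDGE_KM = {8: 0.46, 9: 0.174}


def hops_to_km(hops, resolution, centre_to_centre=False):
    """Rough physical distance for a hop count.

    By default this multiplies hops by the edge length (a lower bound).
    With centre_to_centre=True it uses the spacing between neighbouring
    cell centres, which is about sqrt(3) edge lengths for a regular hexagon.
    """
    edge = EDGE_KM[resolution]
    factor = math.sqrt(3) if centre_to_centre else 1.0
    return hops * edge * factor


def within_band(candidates, min_hops, max_hops):
    """Keep candidates whose hop distance falls inside the band (inclusive)."""
    return [c for c in candidates if min_hops <= c["hops"] <= max_hops]


# Placeholder cells with a precomputed hop distance to the nearest store.
candidates = [
    {"h3": "cell_a", "hops": 1},   # too close: cannibalization risk
    {"h3": "cell_b", "hops": 3},   # inside the band
    {"h3": "cell_c", "hops": 12},  # too far: logistics cost
]
kept = within_band(candidates, min_hops=2, max_hops=8)
```

Choosing the band in hops and only converting to kilometres for reporting keeps the filter independent of how the conversion is approximated.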

Step 5: Score and Rank

Use the scoring pattern from trade-area-analysis:
  1. Normalize each variable to [0,1] with native.normalize
  2. Compute a composite score via native.selectexpression with user-defined weights
  3. Rank with native.orderby (descending) + native.limit (top N)
Success: A ranked shortlist of candidate cells with composite scores and contributing variables.
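
The arithmetic behind the three scoring steps can be sketched as follows. This is a minimal illustration assuming min-max scaling and a weighted sum; it does not reproduce the internals of native.normalize, and the column names, weights, and cell names are hypothetical.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_candidates(cells, weights, top_n):
    """Normalize each weighted column, compute a weighted sum, return the top N."""
    columns = list(weights)
    normalized = {col: min_max([c[col] for c in cells]) for col in columns}
    scored = []
    for i, cell in enumerate(cells):
        score = sum(weights[col] * normalized[col][i] for col in columns)
        scored.append({**cell, "score": score})
    scored.sort(key=lambda c: c["score"], reverse=True)  # descending order
    return scored[:top_n]


# Placeholder candidate cells with two demand variables.
cells = [
    {"h3": "cell_a", "population": 1000, "income": 30000},
    {"h3": "cell_b", "population": 4000, "income": 45000},
    {"h3": "cell_c", "population": 2500, "income": 60000},
]
top = rank_candidates(cells, {"population": 0.6, "income": 0.4}, top_n=2)
```

Keeping the weights in a single mapping makes it easy to let the user adjust them and rerun the ranking.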

Step 6: Save

Use native.saveastable. The H3/Quadbin column is directly visualizable in CARTO Builder.
Success: Validated workflow ready to upload.

Pattern B: Cannibalization Analysis

Existing + Proposed locations -> Trade areas (isoline/buffer) -> Polyfill to grid -> Intersect/Join -> Measure overlap -> Save

Step 1: Load Data

Load existing locations and proposed locations (or a single table with a flag column distinguishing them).
Success: Both sets loaded with geometry and unique identifiers.

Step 2: Generate Trade Areas

Create catchment areas around both existing and proposed locations using native.isolines (realistic) or native.buffer (simple). Use the same parameters for both sets to ensure comparability.
Success: Every location has a catchment polygon with consistent parameters.

Step 3: Polyfill to Spatial Index

Convert all catchment polygons to H3 or Quadbin cells with native.h3polyfill (or native.quadbinpolyfill for Quadbin). Preserve the location identifier and an is_proposed flag.
Success: One row per cell per location, with location ID and type flag.

Step 4: Find Overlap

Use native.joinv2 (inner join on the spatial index column) between existing-location cells and proposed-location cells. The result contains cells shared by at least one existing and one proposed location.
Success: Output contains only cells that fall in both an existing and a proposed catchment.

Step 5: Measure Impact

Use native.groupby to aggregate overlap:
  • Per existing location: count of overlapping cells / total cells in that location's catchment = overlap percentage (count distinct cells, since a cell covered by several proposed catchments appears once per match in the join)
  • Enrich overlap cells with population or revenue to quantify shared demand
Use native.selectexpression to compute the overlap ratio.
Success: Each existing location has an overlap metric showing how much of its catchment is shared with proposed locations.
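
The overlap metric from Steps 4 and 5 can be sketched with sets in plain Python. This is a minimal illustration, assuming the polyfilled output has one row per cell per location; the row layout, cell names, and function name are hypothetical.

```python
from collections import defaultdict


def overlap_ratios(rows):
    """Share of each existing catchment that also falls in a proposed catchment.

    rows: iterable of (location_id, cell, is_proposed) tuples.
    Using sets counts each shared cell once, even when several proposed
    catchments cover it.
    """
    existing = defaultdict(set)
    proposed_cells = set()
    for location_id, cell, is_proposed in rows:
        if is_proposed:
            proposed_cells.add(cell)
        else:
            existing[location_id].add(cell)
    return {
        loc: len(cells & proposed_cells) / len(cells)
        for loc, cells in existing.items()
    }


# Placeholder cells: store_1 shares c3 and c4 with proposed sites, store_2 none.
rows = [
    ("store_1", "c1", False), ("store_1", "c2", False),
    ("store_1", "c3", False), ("store_1", "c4", False),
    ("store_2", "c7", False), ("store_2", "c8", False),
    ("new_1", "c3", True), ("new_1", "c4", True), ("new_1", "c5", True),
    ("new_2", "c4", True),
]
ratios = overlap_ratios(rows)
```

Weighting each cell by population or revenue instead of counting cells would turn the same ratio into a shared-demand estimate.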

Step 6: Save

Use native.saveastable.
Success: Validated workflow with per-location cannibalization metrics.

Pattern C: Twin Areas / Similar Locations

Top-performing locations -> Trade areas -> Enrich -> Build similarity model -> Score all candidate areas -> Rank -> Save

Step 1: Identify Reference Locations

Load the full location dataset. Filter to top performers (e.g. top quartile by revenue) using native.wheresimplified or native.orderby + native.limit.
Success: A subset of high-performing locations isolated as the reference set.
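
The order-then-limit variant of this filter can be sketched in plain Python. The revenue figures, store IDs, and the choice to round the quartile size up are assumptions made for illustration.

```python
import math


def top_quartile(locations, metric):
    """Sort descending by the metric and keep the top 25% (at least one row)."""
    ranked = sorted(locations, key=lambda loc: loc[metric], reverse=True)
    keep = max(1, math.ceil(len(ranked) / 4))
    return ranked[:keep]


# Placeholder stores with made-up revenue values.
revenues = [120, 340, 90, 410, 275, 60, 330, 150]
stores = [{"id": f"s{i}", "revenue": r} for i, r in enumerate(revenues, start=1)]
reference = top_quartile(stores, "revenue")
```

The resulting row count is the value that would go into the limit step when the workflow uses ordering plus a limit.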

Step 2: Generate and Enrich Trade Areas

Create isochrone or buffer trade areas around reference locations. Polyfill to H3/Quadbin. Enrich with demographics, POIs, and any relevant variables.
Success: Each reference location has a rich demographic profile.

Step 3: Build Twin Areas Model

Use native.buildtwinareasmodel (BUILD_TWIN_AREAS_MODEL) to create a PCA-based similarity model from the enriched reference locations.
  • Input: enriched reference locations with numeric feature columns
  • The model captures the multivariate "signature" of successful locations
Success: A model artifact that encodes the demographic profile of top performers.

Step 4: Find Similar Locations

Use native.findsimilarlocations (FIND_SIMILAR_LOCATIONS) to score all candidate areas against the twin-areas model.
  • Input: candidate areas enriched with the same variables used to build the model
  • Output: similarity score per candidate
Success: Every candidate area has a similarity score relative to the reference set.
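
To build intuition for what a similarity score expresses, here is a deliberately simplified stand-in. It is not the PCA-based method used by the twin-areas components and does not reproduce their output: it standardizes each feature and scores candidates by distance to the average reference profile. All names and figures are hypothetical.

```python
import math
import statistics


def similarity_scores(reference, candidates, features):
    """Score candidates by closeness to the mean reference profile.

    Features are z-scored over references plus candidates so that variables
    on different scales contribute comparably. Returns {candidate_id: score}
    with scores in (0, 1], where 1 means identical to the reference mean.
    """
    rows = list(reference) + list(candidates.values())
    stats = {}
    for f in features:
        values = [r[f] for r in rows]
        std = statistics.pstdev(values) or 1.0  # guard against constant columns
        stats[f] = (statistics.fmean(values), std)

    def z(row):
        return [(row[f] - stats[f][0]) / stats[f][1] for f in features]

    ref_z = [z(r) for r in reference]
    centroid = [statistics.fmean(col) for col in zip(*ref_z)]
    return {
        cid: 1.0 / (1.0 + math.dist(z(row), centroid))
        for cid, row in candidates.items()
    }


# Placeholder profiles: the same variables for references and candidates.
reference = [
    {"population": 1000, "income": 50},
    {"population": 1200, "income": 60},
]
candidates = {
    "area_a": {"population": 1100, "income": 55},
    "area_b": {"population": 200, "income": 20},
}
scores = similarity_scores(reference, candidates, ["population", "income"])
```

Even in this toy version the score is only meaningful when references and candidates carry the same feature columns, which mirrors the requirement for the real model.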

Step 5: Rank and Save

Rank by similarity score descending. Save top candidates.
Success: A ranked list of areas most similar to top-performing locations.

Commercial Hotspots Variant

For demand-driven site selection (e.g. "where is unmet demand highest?"), use native.commercialhotspots:
  1. Build an H3 grid over the study area
  2. Enrich with the target demand variable (e.g. population aged 15-34)
  3. Run native.commercialhotspots with variablecolumns and weights
  4. Filter results by significance (p_value < 0.05)
  5. Optionally filter by native.h3distance from existing locations to focus on underserved areas
Note: variablecolumns uses Python-style list syntax (['col1', 'col2']), and weights is comma-separated; see the trade-area-analysis gotchas for details.
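
The two parameter formats in the note can be produced with small helpers like the ones below. This is a sketch: the helper names are hypothetical, and both the requirement of one weight per column and the absence of spaces between weights are assumptions to confirm against the trade-area-analysis gotchas.

```python
def format_variablecolumns(columns):
    """Render column names in Python-style list syntax, e.g. ['col1', 'col2']."""
    return "[" + ", ".join(f"'{c}'" for c in columns) + "]"


def format_weights(weights):
    """Render weights as a comma-separated string, e.g. 0.7,0.3."""
    return ",".join(str(w) for w in weights)


def hotspot_inputs(columns, weights):
    """Build both parameter strings, assuming one weight per column."""
    if len(columns) != len(weights):
        raise ValueError("expected one weight per variable column")
    return {
        "variablecolumns": format_variablecolumns(columns),
        "weights": format_weights(weights),
    }


inputs = hotspot_inputs(["col1", "col2"], [0.7, 0.3])
```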

Gotchas

  • Provider casing & SQL dialect. This skill uses lowercase column names (h3, is_proposed, population, etc.), following the BigQuery / Databricks / Postgres / Redshift convention. On Snowflake, unquoted identifiers surface as UPPERCASE; reference them as H3, IS_PROPOSED, POPULATION. See carto-create-workflow/references/providers/<provider>.md for casing rules and SQL dialect equivalents.
  • native.commercialhotspots requires the Retail module of the Analytics Toolbox. Validate with --connection to confirm availability.
  • Twin Areas and Similar Locations use PCA internally — results are sensitive to variable selection and scaling. Include only relevant, non-redundant variables. Normalize inputs if scales differ widely.
  • Cannibalization overlap depends heavily on trade area definition (buffer radius, isoline time). Small changes in parameters can flip results. Document the chosen parameters and rationale.
  • native.h3distance returns hop count, not physical distance. Multiply by the approximate cell edge length for the resolution to get a rough metric distance (e.g. res 8 ~ 460m, res 9 ~ 174m per hop). This estimate is conservative: neighbouring cell centres are about 1.7 edge lengths apart.
  • When comparing across regions of different sizes, normalize demographics to per-capita or per-area values to avoid size bias (e.g. population density instead of total population).
  • The "best" location depends entirely on the criteria and weights chosen — there is no objectively correct answer. Always document assumptions and let the user adjust weights.
  • For the twin-areas model, use the same set of enrichment variables for both the reference locations and the candidates. Mismatched variables will cause the model to fail or produce meaningless scores.
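
The size-bias gotcha above comes down to dividing totals by area (or by population) before comparing regions. A minimal sketch with made-up figures and hypothetical column names:

```python
def per_area(total, area_km2):
    """Convert a regional total into a per-square-kilometre value."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return total / area_km2


# Placeholder districts: district_b has more people but is far less dense.
regions = [
    {"name": "district_a", "population": 50000, "area_km2": 10.0},
    {"name": "district_b", "population": 80000, "area_km2": 40.0},
]
for region in regions:
    region["population_density"] = per_area(region["population"], region["area_km2"])

densest = max(regions, key=lambda r: r["population_density"])
```

Ranking on total population would favour district_b here, while ranking on density favours district_a.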

Reference Templates

Academy Tutorials

Tutorial | Provider | URL
--- | --- | ---
Pizza Hut Honolulu - site selection with commercial hotspots | BigQuery | Link
Pizza Hut Honolulu - site selection with commercial hotspots | Snowflake | Link
Store cannibalization - quantifying new store impact | BigQuery | Link
Starbucks cannibalization - H3 grid overlap analysis | BigQuery | Link
Store cannibalization - Quadkey grid overlap | Snowflake | Link
Find twin areas of top-performing stores | BigQuery | Link
Find similar locations based on trade areas | BigQuery | Link
EV charging station site selection | Workflows | Link

Common Variations

Variant | How
--- | ---
Retail expansion | Isochrones -> enrich with demographics + competitor density -> composite score -> top N
Franchise territory planning | Cannibalization pattern to ensure non-overlapping catchments before awarding territories
EV charging / public services | Grid-based demand (population, traffic) + distance-from-existing filter -> rank underserved cells
Billboard / OOH placement | Buffers -> audience enrichment -> normalize + weight -> top N (see trade-area-analysis)
Bank branch optimization | Twin areas from top branches -> find similar underserved areas -> propose new branches
Competitor proximity analysis | H3 distance to competitor locations -> filter cells far from competitors but near demand