carto-site-selection

Site Selection and Cannibalization Analysis

Builds CARTO Workflows that identify optimal locations for new facilities (stores, stations, offices) by combining spatial criteria, and that quantify cannibalization risk from overlapping catchment areas. Also covers twin-area and similar-location discovery.

Prerequisites: Load `carto-create-workflow` for the development process, JSON structure, and validation commands. Load `carto-trade-area-analysis` if the workflow involves isochrones, buffers, or catchment enrichment; that skill covers the catchment pipeline in detail.

Decision Tree

User intent	Pattern
"Where should I open a new store?"	Site Selection (scoring + ranking)
"Will a new store hurt existing ones?"	Cannibalization Analysis
"Find locations similar to my best performers"	Twin Areas / Similar Locations

Instructions

Pattern A: Site Selection (Scoring + Ranking)

Existing locations + Target area -> Spatial indexing -> Enrich with demographics/POIs -> Score/Rank -> Filter top candidates -> Save

Step 1: Load Data

Load two datasets with `native.gettablebyname`:

- Existing locations (current stores/facilities)
- Target area (e.g. city boundary, district polygons, or a grid covering the study area)

Success: Both tables loaded with geometry columns and unique identifiers.

Step 2: Build Candidate Grid

Polyfill the target area into H3 or Quadbin cells using `native.h3polyfill` or `native.quadbinpolyfill`. Each cell is a candidate micro-location.

Success: A contiguous grid of cells covering the study area.

Step 3: Enrich Candidates

Attach demand signals to each cell (population, income, foot traffic, POI density) using `native.h3enrich`, `native.joinv2`, or the Data Observatory.

Success: Each grid cell has numeric columns representing demand/suitability factors.

Step 4: Filter by Proximity to Existing Locations

Use `native.h3distance` to compute hop distance from each candidate cell to the nearest existing location. Filter out cells that are too close (cannibalization risk) or too far (logistics cost).

`native.h3distance` returns hop count, not physical distance. Convert using the approximate edge length for the resolution (e.g. H3 res 8 ~ 460 m edge, so 3 hops ~ 1.4 km).

Success: Candidate cells are within a sensible distance band from existing locations.
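
The hop-to-distance rule of thumb above can be sketched in plain Python. This is only an illustration of the arithmetic, not part of any CARTO component: the function names and the `APPROX_EDGE_KM` table are hypothetical, and the edge lengths are the approximate figures quoted in this document. Because the centers of adjacent regular hexagons sit about √3 edge lengths apart, the edge-based estimate is best read as a conservative (low) figure.

```python
import math

# Approximate H3 edge lengths in km, taken from the figures quoted in this
# document (res 8 ~ 460 m, res 9 ~ 174 m). Not an authoritative table.
APPROX_EDGE_KM = {8: 0.46, 9: 0.174}


def hops_to_km(hops: int, resolution: int) -> float:
    """Rule-of-thumb conversion from the text: hop count x edge length."""
    return hops * APPROX_EDGE_KM[resolution]


def km_to_hops(distance_km: float, resolution: int) -> int:
    """Smallest hop count whose rule-of-thumb distance reaches distance_km."""
    return math.ceil(distance_km / APPROX_EDGE_KM[resolution])


def in_distance_band(hops: int, min_hops: int, max_hops: int) -> bool:
    """True for cells neither too close to nor too far from existing sites."""
    return min_hops <= hops <= max_hops


# Example: turn a 1.4 km to 5 km band into hop thresholds at resolution 8.
min_hops = km_to_hops(1.4, 8)
max_hops = km_to_hops(5.0, 8)
print(min_hops, max_hops, round(hops_to_km(3, 8), 2))
```

Working in hops and converting only the thresholds keeps the filter itself a cheap integer comparison on the `native.h3distance` output.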

Step 5: Score and Rank

Use the scoring pattern from `trade-area-analysis`:

- Normalize each variable to [0,1] with `native.normalize`
- Composite score via `native.selectexpression` with user-defined weights
- Rank with `native.orderby` (descending) + `native.limit` (top N)

Success: A ranked shortlist of candidate cells with composite scores and contributing variables.
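
To make the normalize, weight, and rank sequence concrete, here is a minimal plain-Python sketch of the same arithmetic. It assumes min-max normalization and a simple weighted sum; the cell IDs, column names, and weights are invented for illustration, and the actual workflow components may behave differently in edge cases.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_and_rank(cells, weights, top_n):
    """Return the top_n cells by weighted sum of min-max normalized variables."""
    normalized = {var: min_max([c[var] for c in cells]) for var in weights}
    scored = []
    for i, cell in enumerate(cells):
        score = sum(w * normalized[var][i] for var, w in weights.items())
        scored.append({**cell, "score": score})
    # Descending order mirrors orderby (descending) followed by limit.
    scored.sort(key=lambda c: c["score"], reverse=True)
    return scored[:top_n]


# Hypothetical candidate cells and user-defined weights.
cells = [
    {"h3": "cell_a", "population": 1000, "income": 30000},
    {"h3": "cell_b", "population": 4000, "income": 50000},
    {"h3": "cell_c", "population": 2500, "income": 70000},
]
weights = {"population": 0.6, "income": 0.4}
shortlist = score_and_rank(cells, weights, top_n=2)
print([(c["h3"], round(c["score"], 2)) for c in shortlist])
```

Keeping the raw variables alongside the score, as the dictionaries do here, lets a reviewer see which factor drove each ranking.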

Step 6: Save

Use `native.saveastable`. The H3/Quadbin column is directly visualizable in CARTO Builder.

Success: Validated workflow ready to upload.

Pattern B: Cannibalization Analysis

Existing + Proposed locations -> Trade areas (isoline/buffer) -> Polyfill to grid -> Intersect/Join -> Measure overlap -> Save

Step 1: Load Data

Load existing locations and proposed locations (or a single table with a flag column distinguishing them).

Success: Both sets loaded with geometry and unique identifiers.

Step 2: Generate Trade Areas

Create catchment areas around both existing and proposed locations using `native.isolines` (realistic) or `native.buffer` (simple). Use the same parameters for both sets to ensure comparability.

Success: Every location has a catchment polygon with consistent parameters.

Step 3: Polyfill to Spatial Index

Convert all catchment polygons to H3 or Quadbin cells with `native.h3polyfill` (or `native.quadbinpolyfill` for Quadbin). Preserve the location identifier and an `is_proposed` flag.

Success: One row per cell per location, with location ID and type flag.

Step 4: Find Overlap

Use `native.joinv2` (inner join on the spatial index column) between existing-location cells and proposed-location cells. The result contains cells shared by at least one existing and one proposed location.

Success: Output contains only cells that fall in both an existing and a proposed catchment.

Step 5: Measure Impact

Use `native.groupby` to aggregate overlap:

- Per existing location: count of overlapping cells / total cells in that location's catchment = overlap percentage
- Enrich overlap cells with population or revenue to quantify shared demand

Use `native.selectexpression` to compute the overlap ratio.

Success: Each existing location has an overlap metric showing how much of its catchment is shared with proposed locations.
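
The overlap ratio from Steps 4 and 5 can be illustrated with a small plain-Python sketch. It assumes each catchment has already been polyfilled into `(location_id, cell)` rows; the store IDs and cell IDs below are made up, and the function is a stand-in for the join and group-by logic rather than the workflow components themselves.

```python
from collections import defaultdict


def overlap_by_location(existing_rows, proposed_rows):
    """Share of each existing catchment's cells also covered by a proposed one."""
    proposed_cells = {cell for _, cell in proposed_rows}
    total = defaultdict(set)
    shared = defaultdict(set)
    for location_id, cell in existing_rows:
        total[location_id].add(cell)
        if cell in proposed_cells:  # equivalent to the inner join on the cell
            shared[location_id].add(cell)
    return {loc: len(shared[loc]) / len(cells) for loc, cells in total.items()}


# Hypothetical polyfilled catchments: (location_id, cell_id) pairs.
existing = [
    ("store_1", "a"), ("store_1", "b"), ("store_1", "c"), ("store_1", "d"),
    ("store_2", "x"), ("store_2", "y"),
]
proposed = [("new_1", "c"), ("new_1", "d"), ("new_1", "e")]

ratios = overlap_by_location(existing, proposed)
print(ratios)
```

Using sets means a cell covered by several proposed catchments is counted once, so the ratio stays between 0 and 1.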

Step 6: Save

Use `native.saveastable`.

Success: Validated workflow with per-location cannibalization metrics.

Pattern C: Twin Areas / Similar Locations

Top-performing locations -> Trade areas -> Enrich -> Build similarity model -> Score all candidate areas -> Rank -> Save

Step 1: Identify Reference Locations

Load the full location dataset. Filter to top performers (e.g. top quartile by revenue) using `native.wheresimplified`, or `native.orderby` + `native.limit`.

Success: A subset of high-performing locations isolated as the reference set.
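
As a rough illustration of the order-then-limit route to a top quartile, the sketch below sorts by a metric and keeps the top 25% of rows. The store records and the `revenue` column name are hypothetical, and rounding the cut-off up to at least one row is an assumption made for this example.

```python
import math


def top_quartile(locations, metric):
    """Sort descending by metric and keep the top 25% (at least one row)."""
    ranked = sorted(locations, key=lambda loc: loc[metric], reverse=True)
    keep = max(1, math.ceil(len(ranked) / 4))
    return ranked[:keep]


# Hypothetical store revenues.
stores = [
    {"id": f"store_{i}", "revenue": revenue}
    for i, revenue in enumerate([120, 340, 90, 410, 275, 60, 305, 150], start=1)
]
reference_set = top_quartile(stores, "revenue")
print([s["id"] for s in reference_set])
```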

Step 2: Generate and Enrich Trade Areas

Create isochrone or buffer trade areas around reference locations. Polyfill to H3/Quadbin. Enrich with demographics, POIs, and any relevant variables.

Success: Each reference location has a rich demographic profile.

Step 3: Build Twin Areas Model

Use `native.buildtwinareasmodel` (BUILD_TWIN_AREAS_MODEL) to create a PCA-based similarity model from the enriched reference locations.

- Input: enriched reference locations with numeric feature columns
- The model captures the multivariate "signature" of successful locations

Success: A model artifact that encodes the demographic profile of top performers.

Step 4: Find Similar Locations

Use `native.findsimilarlocations` (FIND_SIMILAR_LOCATIONS) to score all candidate areas against the twin-areas model.

- Input: candidate areas enriched with the same variables used to build the model
- Output: similarity score per candidate

Success: Every candidate area has a similarity score relative to the reference set.
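
For intuition only, the sketch below scores candidates against a reference set using a deliberately simplified stand-in: it z-scores each feature and measures Euclidean distance to the reference centroid. This is not the PCA-based method the components use, and the function name, feature names, and the `1 / (1 + distance)` score are all assumptions. It does show why candidates and references need the same variables on comparable scales.

```python
import math
from statistics import mean, pstdev


def similarity_scores(reference, candidates, features):
    """Score candidates by closeness to the reference centroid in z-score space."""
    everything = reference + candidates
    stats = {}
    for f in features:
        values = [row[f] for row in everything]  # KeyError if a variable is missing
        sd = pstdev(values)
        stats[f] = (mean(values), sd if sd > 0 else 1.0)

    def z(row):
        return [(row[f] - stats[f][0]) / stats[f][1] for f in features]

    centroid = [mean(col) for col in zip(*(z(r) for r in reference))]
    # Map distance to (0, 1]: identical profile -> 1.0, farther -> closer to 0.
    return {row["id"]: 1.0 / (1.0 + math.dist(z(row), centroid)) for row in candidates}


# Hypothetical enriched areas.
reference = [
    {"id": "r1", "pop_density": 5000, "income": 60000},
    {"id": "r2", "pop_density": 5400, "income": 64000},
]
candidates = [
    {"id": "c1", "pop_density": 5200, "income": 62000},
    {"id": "c2", "pop_density": 1000, "income": 30000},
]
scores = similarity_scores(reference, candidates, ["pop_density", "income"])
print(sorted(scores, key=scores.get, reverse=True))
```

Without the z-scoring step, the income column would dominate the distance simply because its numbers are larger.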

Step 5: Rank and Save

Rank by similarity score descending. Save top candidates.

Success: A ranked list of areas most similar to top-performing locations.

Commercial Hotspots Variant

For demand-driven site selection (e.g. "where is unmet demand highest?"), use `native.commercialhotspots`:

1. Build an H3 grid over the study area
2. Enrich with the target demand variable (e.g. population aged 15-34)
3. Run `native.commercialhotspots` with `variablecolumns` and `weights`
4. Filter results by significance (`p_value < 0.05`)
5. Optionally filter by `native.h3distance` from existing locations to focus on underserved areas

Note: `variablecolumns` uses Python-style list syntax (`['col1', 'col2']`), and `weights` is comma-separated; see the `trade-area-analysis` gotchas for details.
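
The two parameter formats in the note are easy to mix up, so here is a small hypothetical helper that renders both strings. The helper names are invented, and two details are assumptions rather than documented behavior: that weights are joined without spaces, and that exactly one weight per column is expected. Defer to the `trade-area-analysis` gotchas for the authoritative rules.

```python
def format_variable_columns(columns):
    """Render column names as a Python-style list literal, e.g. ['a', 'b']."""
    return "[" + ", ".join(f"'{c}'" for c in columns) + "]"


def format_weights(weights):
    """Render weights as a comma-separated string, e.g. 0.7,0.3."""
    return ",".join(str(w) for w in weights)


def hotspot_params(columns, weights):
    """Build both parameter strings, assuming one weight per column."""
    if len(columns) != len(weights):
        raise ValueError("need exactly one weight per variable column")
    return {
        "variablecolumns": format_variable_columns(columns),
        "weights": format_weights(weights),
    }


# Hypothetical column names and weights.
params = hotspot_params(["pop_15_34", "income"], [0.7, 0.3])
print(params)
```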

Gotchas

- Provider casing & SQL dialect. This skill uses lowercase column names (`h3`, `is_proposed`, `population`, etc.), following the BigQuery / Databricks / Postgres / Redshift convention. On Snowflake, unquoted identifiers surface UPPERCASE; reference them as `H3`, `IS_PROPOSED`, `POPULATION`. See `carto-create-workflow/references/providers/<provider>.md` for casing rules and SQL dialect equivalents.
- `native.commercialhotspots` requires the Retail module of the Analytics Toolbox. Validate with `--connection` to confirm availability.
- Twin Areas and Similar Locations use PCA internally, so results are sensitive to variable selection and scaling. Include only relevant, non-redundant variables. Normalize inputs if scales differ widely.
- Cannibalization overlap depends heavily on trade area definition (buffer radius, isoline time). Small changes in parameters can flip results. Document the chosen parameters and rationale.
- `native.h3distance` returns hop count, not physical distance. Multiply by the approximate cell edge length for the resolution to get a rough metric distance (e.g. res 8 ~ 460 m, res 9 ~ 174 m per hop).
- When comparing across regions of different sizes, normalize demographics to per-capita or per-area values to avoid size bias (e.g. population density instead of total population).
- The "best" location depends entirely on the criteria and weights chosen; there is no objectively correct answer. Always document assumptions and let the user adjust weights.
- For the twin-areas model, use the same set of enrichment variables for both the reference locations and the candidates. Mismatched variables will cause the model to fail or produce meaningless scores.

Reference Templates

Academy Tutorials

Tutorial	Provider	URL
Pizza Hut Honolulu — site selection with commercial hotspots	BigQuery	Link
Pizza Hut Honolulu — site selection with commercial hotspots	Snowflake	Link
Store cannibalization — quantifying new store impact	BigQuery	Link
Starbucks cannibalization — H3 grid overlap analysis	BigQuery	Link
Store cannibalization — Quadkey grid overlap	Snowflake	Link
Find twin areas of top-performing stores	BigQuery	Link
Find similar locations based on trade areas	BigQuery	Link
EV charging station site selection	Workflows	Link

Common Variations

Variant	How
Retail expansion	Isochrones -> enrich with demographics + competitor density -> composite score -> top N
Franchise territory planning	Cannibalization pattern to ensure non-overlapping catchments before awarding territories
EV charging / public services	Grid-based demand (population, traffic) + distance-from-existing filter -> rank underserved cells
Billboard / OOH placement	Buffers -> audience enrichment -> normalize + weight -> top N (see `trade-area-analysis`)
Bank branch optimization	Twin areas from top branches -> find similar underserved areas -> propose new branches
Competitor proximity analysis	H3 distance to competitor locations -> filter cells far from competitors but near demand
