Site Selection and Cannibalization Analysis
Builds CARTO Workflows that identify optimal locations for new facilities (stores, stations, offices) by combining spatial criteria, and that quantify cannibalization risk from overlapping catchment areas. Also covers twin-area and similar-location discovery.
Prerequisites: Load
for the development process, JSON structure, and validation commands. Load
carto-trade-area-analysis
if the workflow involves isochrones, buffers, or catchment enrichment — that skill covers the catchment pipeline in detail.
Decision Tree
| User intent | Pattern |
|---|
| "Where should I open a new store?" | Site Selection (scoring + ranking) |
| "Will a new store hurt existing ones?" | Cannibalization Analysis |
| "Find locations similar to my best performers" | Twin Areas / Similar Locations |
Instructions
Pattern A: Site Selection (Scoring + Ranking)
Existing locations + Target area -> Spatial indexing -> Enrich with demographics/POIs -> Score/Rank -> Filter top candidates -> Save
Step 1: Load Data
- Existing locations (current stores/facilities)
- Target area (e.g. city boundary, district polygons, or a grid covering the study area)
Success: Both tables loaded with geometry columns and unique identifiers.
Step 2: Build Candidate Grid
Polyfill the target area into H3 or Quadbin cells using
or
. Each cell is a candidate micro-location.
Success: A contiguous grid of cells covering the study area.
Step 3: Enrich Candidates
Attach demand signals to each cell — population, income, foot traffic, POI density — using
,
, or the Data Observatory.
Success: Each grid cell has numeric columns representing demand/suitability factors.
Step 4: Filter by Proximity to Existing Locations
Use
to compute hop distance from each candidate cell to the nearest existing location. Filter out cells that are too close (cannibalization risk) or too far (logistics cost).
- returns hop count, not physical distance. Convert using the approximate edge length for the resolution (e.g. H3 res 8 ~ 460m edge, so 3 hops ~ 1.4 km).
Success: Candidate cells are within a sensible distance band from existing locations.
Step 5: Score and Rank
Use the scoring pattern from
:
- Normalize each variable to [0,1] with
- Composite score via with user-defined weights
- Rank with (descending) + (top N)
Success: A ranked shortlist of candidate cells with composite scores and contributing variables.
Step 6: Save
Use
. The H3/Quadbin column is directly visualizable in CARTO Builder.
Success: Validated workflow ready to upload.
Pattern B: Cannibalization Analysis
Existing + Proposed locations -> Trade areas (isoline/buffer) -> Polyfill to grid -> Intersect/Join -> Measure overlap -> Save
Step 1: Load Data
Load existing locations and proposed locations (or a single table with a flag column distinguishing them).
Success: Both sets loaded with geometry and unique identifiers.
Step 2: Generate Trade Areas
Create catchment areas around both existing and proposed locations using
(realistic) or
(simple). Use the same parameters for both sets to ensure comparability.
Success: Every location has a catchment polygon with consistent parameters.
Step 3: Polyfill to Spatial Index
Convert all catchment polygons to H3 or Quadbin cells with
. Preserve the location identifier and an
flag.
Success: One row per cell per location, with location ID and type flag.
Step 4: Find Overlap
Use
(inner join on the spatial index column) between existing-location cells and proposed-location cells. The result contains cells shared by at least one existing and one proposed location.
Success: Output contains only cells that fall in both an existing and a proposed catchment.
Step 5: Measure Impact
Use
to aggregate overlap:
- Per existing location: count of overlapping cells / total cells in that location's catchment = overlap percentage
- Enrich overlap cells with population or revenue to quantify shared demand
Use
to compute the overlap ratio.
Success: Each existing location has an overlap metric showing how much of its catchment is shared with proposed locations.
Step 6: Save
Success: Validated workflow with per-location cannibalization metrics.
Pattern C: Twin Areas / Similar Locations
Top-performing locations -> Trade areas -> Enrich -> Build similarity model -> Score all candidate areas -> Rank -> Save
Step 1: Identify Reference Locations
Load the full location dataset. Filter to top performers (e.g. top quartile by revenue) using
or
+
.
Success: A subset of high-performing locations isolated as the reference set.
Step 2: Generate and Enrich Trade Areas
Create isochrone or buffer trade areas around reference locations. Polyfill to H3/Quadbin. Enrich with demographics, POIs, and any relevant variables.
Success: Each reference location has a rich demographic profile.
Step 3: Build Twin Areas Model
Use
native.buildtwinareasmodel
(BUILD_TWIN_AREAS_MODEL) to create a PCA-based similarity model from the enriched reference locations.
- Input: enriched reference locations with numeric feature columns
- The model captures the multivariate "signature" of successful locations
Success: A model artifact that encodes the demographic profile of top performers.
Step 4: Find Similar Locations
Use
native.findsimilarlocations
(FIND_SIMILAR_LOCATIONS) to score all candidate areas against the twin-areas model.
- Input: candidate areas enriched with the same variables used to build the model
- Output: similarity score per candidate
Success: Every candidate area has a similarity score relative to the reference set.
Step 5: Rank and Save
Rank by similarity score descending. Save top candidates.
Success: A ranked list of areas most similar to top-performing locations.
Commercial Hotspots Variant
For demand-driven site selection (e.g. "where is unmet demand highest?"), use
native.commercialhotspots
:
- Build an H3 grid over the study area
- Enrich with the target demand variable (e.g. population aged 15-34)
- Run
native.commercialhotspots
with and
- Filter results by significance ()
- Optionally filter by from existing locations to focus on underserved areas
Note:
uses Python-style list syntax (
), and
is comma-separated — see the
gotchas for details.
Gotchas
- Provider casing & SQL dialect. This skill uses lowercase column names (, , , etc.) — BigQuery / Databricks / Postgres / Redshift convention. On Snowflake, unquoted identifiers surface UPPERCASE — reference them as , , . See
carto-create-workflow/references/providers/<provider>.md
for casing rules and SQL dialect equivalents.
native.commercialhotspots
requires the Retail module of the Analytics Toolbox. Validate with to confirm availability.
- Twin Areas and Similar Locations use PCA internally — results are sensitive to variable selection and scaling. Include only relevant, non-redundant variables. Normalize inputs if scales differ widely.
- Cannibalization overlap depends heavily on trade area definition (buffer radius, isoline time). Small changes in parameters can flip results. Document the chosen parameters and rationale.
- returns hop count, not physical distance. Multiply by the approximate cell edge length for the resolution to get a rough metric distance (e.g. res 8 ~ 460m, res 9 ~ 174m per hop).
- When comparing across regions of different sizes, normalize demographics to per-capita or per-area values to avoid size bias (e.g. population density instead of total population).
- The "best" location depends entirely on the criteria and weights chosen — there is no objectively correct answer. Always document assumptions and let the user adjust weights.
- For the twin-areas model, use the same set of enrichment variables for both the reference locations and the candidates. Mismatched variables will cause the model to fail or produce meaningless scores.
Reference Templates
Academy Tutorials
| Tutorial | Provider | URL |
|---|
| Pizza Hut Honolulu — site selection with commercial hotspots | BigQuery | Link |
| Pizza Hut Honolulu — site selection with commercial hotspots | Snowflake | Link |
| Store cannibalization — quantifying new store impact | BigQuery | Link |
| Starbucks cannibalization — H3 grid overlap analysis | BigQuery | Link |
| Store cannibalization — Quadkey grid overlap | Snowflake | Link |
| Find twin areas of top-performing stores | BigQuery | Link |
| Find similar locations based on trade areas | BigQuery | Link |
| EV charging station site selection | Workflows | Link |
Common Variations
| Variant | How |
|---|
| Retail expansion | Isochrones -> enrich with demographics + competitor density -> composite score -> top N |
| Franchise territory planning | Cannibalization pattern to ensure non-overlapping catchments before awarding territories |
| EV charging / public services | Grid-based demand (population, traffic) + distance-from-existing filter -> rank underserved cells |
| Billboard / OOH placement | Buffers -> audience enrichment -> normalize + weight -> top N (see ) |
| Bank branch optimization | Twin areas from top branches -> find similar underserved areas -> propose new branches |
| Competitor proximity analysis | H3 distance to competitor locations -> filter cells far from competitors but near demand |