Agent Skill · Carto

carto-site-selection

Builds site selection and cannibalization analysis workflows in CARTO. Triggers when the user mentions site selection, cannibalization, cannibalizing, new store location, where to open, optimal location, facility placement, network impact, overlapping catchments, twin areas, similar locations, look-alike areas, find locations like my best, store overlap, revenue impact of new store, commercial hotspots, demand hotspots, location scoring, location ranking, expand network, new branch, franchise placement, EV charging siting, or wants to evaluate candidate sites, quantify overlap between trade areas, or find areas that resemble top-performing locations.

Provider: Carto Path in repo: skills/carto-site-selection/SKILL.md

Skill body

Site Selection and Cannibalization Analysis

Builds CARTO Workflows that identify optimal locations for new facilities (stores, stations, offices) by combining spatial criteria, and that quantify cannibalization risk from overlapping catchment areas. Also covers twin-area and similar-location discovery.

Prerequisites: Load carto-create-workflow for the development process, JSON structure, and validation commands. Load carto-trade-area-analysis if the workflow involves isochrones, buffers, or catchment enrichment — that skill covers the catchment pipeline in detail.


Decision Tree

User intent Pattern
“Where should I open a new store?” Site Selection (scoring + ranking)
“Will a new store hurt existing ones?” Cannibalization Analysis
“Find locations similar to my best performers” Twin Areas / Similar Locations

Instructions

Pattern A: Site Selection (Scoring + Ranking)

Existing locations + Target area -> Spatial indexing -> Enrich with demographics/POIs -> Score/Rank -> Filter top candidates -> Save

Step 1: Load Data

Load two datasets with native.gettablebyname:

Splitting candidates from existing network when both sit in one table. If the user’s source is a single table where “candidates” are defined by a numeric threshold (e.g. revenue ≥ X, size ≥ Y), use one native.where with that predicate and wire its match output to the candidate pipeline and its unmatch output to the existing-network pipeline. Two streams from one node — dramatically cleaner than orderby + limit + customsql NOT IN, and avoids a customsql entirely. The threshold approximates “top N”; if the user explicitly says “exactly top 25”, orderby + limit is still the right call but pay the extra-node cost knowingly.

Success: Both tables loaded with geometry columns and unique identifiers.

Step 2: Build Candidate Grid

Polyfill the target area into H3 or Quadbin cells using native.h3polyfill or native.quadbinpolyfill. Each cell is a candidate micro-location.

Success: A contiguous grid of cells covering the study area.

Step 3: Enrich Candidates

Attach demand signals to each cell — population, income, foot traffic, POI density — using native.h3enrich, native.joinv2, or the Data Observatory.

If candidates are continuous polygons (catchments / districts) rather than a grid, the choice between native.enrichpolygons and native.spatialjoin + native.groupby matters — see carto-spatial-enrichment Step 4 for the trade-off (AT dependency vs node count vs predicate / join control).

Success: Each grid cell has numeric columns representing demand/suitability factors.

Step 4: Filter by Proximity to Existing Locations

Use native.h3distance to compute hop distance from each candidate cell to the nearest existing location. Filter out cells that are too close (cannibalization risk) or too far (logistics cost).

Distance semantics — measure cannibalization from the catchment polygon, not the candidate point. When candidates are points with a generated trade area (isoline / buffer — see carto-trade-area-analysis), compute native.distance from the candidate’s catchment polygon to the existing competitor points, not from the candidate point itself. Reason: ST_DISTANCE(polygon, point) = 0 whenever the competitor sits inside the trade area — exactly the cannibalization signal you want. Point-to-point distance only reports “how far the nearest competitor is” and hides overlap. Pick the radius consistent with the trade-area size (e.g. ~5 km for a 10-minute walk catchment); large radii dilute the signal. This polygon-to-point pattern is the middle ground between Pattern A’s H3 hop filter and Pattern B’s full grid-overlap analysis — use it whenever the candidate already has a continuous catchment geometry.

Success: Candidate cells are within a sensible distance band from existing locations, and any cannibalization signal is measured from the catchment polygon when one exists.

Step 5: Score and Rank

Use the scoring pattern from trade-area-analysis:

  1. Normalize each variable to [0,1] with native.normalize
  2. Composite score via native.selectexpression with user-defined weights
  3. Rank with native.orderby (descending) + native.limit (top N)

Success: A ranked shortlist of candidate cells with composite scores and contributing variables.

Step 6: Save

Use native.saveastable. The H3/Quadbin column is directly visualizable in CARTO Builder.

Success: Validated workflow ready to upload.


Pattern B: Cannibalization Analysis

Existing + Proposed locations -> Trade areas (isoline/buffer) -> Polyfill to grid -> Intersect/Join -> Measure overlap -> Save

Step 1: Load Data

Load existing locations and proposed locations (or a single table with a flag column distinguishing them). If both sets sit in a single table and the split is defined by a numeric threshold (e.g. revenue ≥ X), see Pattern A Step 1’s native.where match/unmatch split — it produces both streams from one node and avoids customsql.

Success: Both sets loaded with geometry and unique identifiers.

Step 2: Generate Trade Areas

Create catchment areas around both existing and proposed locations using native.isolines (realistic) or native.buffer (simple). Use the same parameters for both sets to ensure comparability.

Success: Every location has a catchment polygon with consistent parameters.

Step 3: Polyfill to Spatial Index

Convert all catchment polygons to H3 or Quadbin cells with native.h3polyfill. Preserve the location identifier and an is_proposed flag.

Success: One row per cell per location, with location ID and type flag.

Step 4: Find Overlap

Use native.joinv2 (inner join on the spatial index column) between existing-location cells and proposed-location cells. The result contains cells shared by at least one existing and one proposed location.

Success: Output contains only cells that fall in both an existing and a proposed catchment.

Step 5: Measure Impact

Use native.groupby to aggregate overlap:

Use native.selectexpression to compute the overlap ratio.

Success: Each existing location has an overlap metric showing how much of its catchment is shared with proposed locations.

Step 6: Save

Use native.saveastable.

Success: Validated workflow with per-location cannibalization metrics.


Pattern C: Twin Areas / Similar Locations

Top-performing locations -> Trade areas -> Enrich -> Build similarity model -> Score all candidate areas -> Rank -> Save

Step 1: Identify Reference Locations

Load the full location dataset. Filter to top performers (e.g. top quartile by revenue) using native.wheresimplified or native.orderby + native.limit.

Success: A subset of high-performing locations isolated as the reference set.

Step 2: Generate and Enrich Trade Areas

Create isochrone or buffer trade areas around reference locations. Polyfill to H3/Quadbin. Enrich with demographics, POIs, and any relevant variables.

Success: Each reference location has a rich demographic profile.

Step 3: Build Twin Areas Model

Use native.buildtwinareasmodel (BUILD_TWIN_AREAS_MODEL) to create a PCA-based similarity model from the enriched reference locations.

Success: A model artifact that encodes the demographic profile of top performers.

Step 4: Find Similar Locations

Use native.findsimilarlocations (FIND_SIMILAR_LOCATIONS) to score all candidate areas against the twin-areas model.

Success: Every candidate area has a similarity score relative to the reference set.

Step 5: Rank and Save

Rank by similarity score descending. Save top candidates.

Success: A ranked list of areas most similar to top-performing locations.


Commercial Hotspots Variant

For demand-driven site selection (e.g. “where is unmet demand highest?”), use native.commercialhotspots:

  1. Build an H3 grid over the study area
  2. Enrich with the target demand variable (e.g. population aged 15-34)
  3. Run native.commercialhotspots with variablecolumns and weights
  4. Filter results by significance (p_value < 0.05)
  5. Optionally filter by native.h3distance from existing locations to focus on underserved areas

Note: variablecolumns uses Python-style list syntax (['col1', 'col2']), and weights is comma-separated — see the trade-area-analysis gotchas for details.


Gotchas


Reference Templates

Academy Tutorials

Tutorial Provider URL
Pizza Hut Honolulu — site selection with commercial hotspots BigQuery Link
Pizza Hut Honolulu — site selection with commercial hotspots Snowflake Link
Store cannibalization — quantifying new store impact BigQuery Link
Starbucks cannibalization — H3 grid overlap analysis BigQuery Link
Store cannibalization — Quadkey grid overlap Snowflake Link
Find twin areas of top-performing stores BigQuery Link
Find similar locations based on trade areas BigQuery Link
EV charging station site selection Workflows Link

Common Variations

Variant How
Retail expansion Isochrones -> enrich with demographics + competitor density -> composite score -> top N
Franchise territory planning Cannibalization pattern to ensure non-overlapping catchments before awarding territories
EV charging / public services Grid-based demand (population, traffic) + distance-from-existing filter -> rank underserved cells
Billboard / OOH placement Buffers -> audience enrichment -> normalize + weight -> top N (see trade-area-analysis)
Bank branch optimization Twin areas from top branches -> find similar underserved areas -> propose new branches
Competitor proximity analysis H3 distance to competitor locations -> filter cells far from competitors but near demand

Skill frontmatter

license: MIT