carto-spatial-autocorrelation
Builds Moran's I spatial autocorrelation workflows in CARTO. Triggers when the user mentions spatial autocorrelation, Moran's I, spatial dependency, spatial correlation, spatial outliers, HH HL LH LL quadrants, high-high clusters, low-low clusters, spatial weight matrix, "is there clustering", "are values spatially correlated", local indicators of spatial association, LISA, spatial randomness test, or wants to determine whether a variable exhibits spatial clustering, dispersion, or randomness across a gridded dataset. Also relevant when the user needs to classify locations into cluster types (HH, HL, LH, LL) rather than just identifying hotspots and coldspots.
Skill body
Spatial Autocorrelation with Moran’s I
Builds CARTO Workflows that measure spatial autocorrelation using Moran’s I, determining whether a variable exhibits clustering, dispersion, or randomness, and classifying each location into HH/HL/LH/LL quadrants.
Prerequisites: Load carto-create-workflow for the development process, JSON structure, and validation commands.
When to use Moran’s I vs Getis-Ord Gi*:
- Moran’s I: “Is there clustering?” + classify into cluster types (HH, HL, LH, LL) + identify spatial outliers (HL, LH)
- Getis-Ord Gi*: “Where are the hotspots/coldspots?” + magnitude of clustering (z-scores)
Instructions
A Moran’s I workflow follows this pipeline:
Source Data -> (Filter) -> Spatial Indexing (H3) -> Aggregation -> Moran's I -> (Filter Significant) -> Save
Step 1: Load Source Data
Use native.gettablebyname. The input table typically contains point geometries or pre-indexed grid data.
Success: Node outputs a table with a geometry column (e.g. geom) or an existing spatial index column.
Step 2: Filter (if needed)
Use native.wheresimplified or native.where to narrow the dataset (e.g. filter by category, date range, non-null values).
Success: Output contains only the subset relevant to the analysis.
Step 3: Spatial Indexing
Convert point geometries to H3 cells using native.h3frompoint.
Resolution guidance – higher resolution = smaller cells = more local patterns:
| Resolution | Cell size | Use case |
|---|---|---|
| H3 res 7 | ~1.2 km edge (~5 km² area) | District/city-level patterns |
| H3 res 8 | ~0.5 km edge (~0.7 km² area) | Neighborhood-level |
| H3 res 9 | ~0.2 km edge (~0.1 km² area) | Street-level (used in Berlin POI tutorial) |
Success: Every row has a spatial index column (e.g. h3).
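As a rough way to judge a resolution before running the workflow, the sketch below estimates how many cells a study area produces. The average cell areas are rounded figures from the H3 resolution table; the function name and the 900 km² study area are hypothetical and serve only as illustration.

```python
# Approximate average H3 hexagon areas in km^2, rounded from the H3 resolution table.
AVG_CELL_AREA_KM2 = {7: 5.16, 8: 0.737, 9: 0.105}


def estimate_cell_count(study_area_km2: float, resolution: int) -> int:
    """Rough number of H3 cells needed to tile an area at the given resolution."""
    return round(study_area_km2 / AVG_CELL_AREA_KM2[resolution])


# Hypothetical 900 km^2 study area (an assumption, not taken from the tutorial).
for res in (7, 8, 9):
    print(f"res {res}: about {estimate_cell_count(900.0, res)} cells")
```

Each step up in resolution multiplies the cell count by roughly seven, which is worth keeping in mind when the study area is large.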
Step 4: Aggregate per Cell
Use native.groupby to produce one row per cell with a numeric value:
- Group by: the spatial index column (h3)
- Aggregation: geoid, count (or value_col, sum / value_col, avg)
Success: Output has exactly one row per unique cell with a numeric column (e.g. geoid_count).
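To make the expected shape of the aggregated table concrete, here is a minimal Python sketch of the same group-by-and-count logic. It does not call any CARTO component; the rows and cell IDs are hypothetical placeholders, not real H3 indexes.

```python
from collections import Counter

# Hypothetical rows after spatial indexing: (geoid, h3) pairs.
# The cell IDs are placeholders, not real H3 indexes.
rows = [
    ("poi_1", "cell_a"),
    ("poi_2", "cell_a"),
    ("poi_3", "cell_b"),
    ("poi_4", "cell_c"),
    ("poi_5", "cell_c"),
    ("poi_6", "cell_c"),
]

# Equivalent of grouping by h3 and counting geoid.
counts = Counter(h3 for _, h3 in rows)

# One row per unique cell, with a numeric geoid_count column.
aggregated = [
    {"h3": cell, "geoid_count": n} for cell, n in sorted(counts.items())
]

for row in aggregated:
    print(row)
```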
Step 5: Run Moran’s I
Use native.moransi with:
| Input | Description | Default |
|---|---|---|
| indexcol | Column with H3/Quadbin indexes | h3 |
| valuecol | Numeric column to test for autocorrelation | geoid_count |
| size | K-ring neighborhood radius (in hops) | 3 |
| decay | Distance decay function for spatial weights | uniform |
Decay options: uniform, inverse, inverse_square, exponential.
- uniform: Equal weight to all neighbors within the k-ring
- exponential: Weight decreases exponentially with distance (used in Berlin POI tutorial)
K-ring size: Larger = broader neighborhood = smoother global patterns. Smaller = more localized assessment. The choice of neighborhood size significantly affects results.
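The sketch below illustrates how the four decay options could translate into neighbor weights as a function of hop distance. The formulas are common textbook forms and are an assumption here; the Analytics Toolbox may define them differently, so check the component documentation before relying on exact values. The function name is hypothetical.

```python
import math


def neighbor_weight(distance_hops: int, decay: str = "uniform") -> float:
    """Illustrative spatial weight for a neighbor at a given k-ring distance.

    Assumed formulas (not confirmed against the CARTO implementation):
    uniform = 1, inverse = 1/d, inverse_square = 1/d^2, exponential = exp(-d).
    """
    if distance_hops < 1:
        raise ValueError("distance_hops must be at least 1")
    if decay == "uniform":
        return 1.0
    if decay == "inverse":
        return 1.0 / distance_hops
    if decay == "inverse_square":
        return 1.0 / distance_hops**2
    if decay == "exponential":
        return math.exp(-distance_hops)
    raise ValueError(f"unknown decay: {decay}")


# Compare how quickly each option discounts neighbors out to size = 3.
for decay in ("uniform", "inverse", "inverse_square", "exponential"):
    weights = [round(neighbor_weight(d, decay), 3) for d in (1, 2, 3)]
    print(decay, weights)
```

With uniform weights a larger size simply adds more equally weighted neighbors, while the decaying options keep the outer rings from dominating the local average.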
Success: Output contains index, morans_i, p_value, and quadrant columns for every cell. (See the Provider casing note in Gotchas — Snowflake surfaces these UPPERCASE.)
Step 6: Filter Significant Results (recommended)
Use native.where to keep only statistically significant cells. Quadrant classification is only meaningful for significant cells.
Common filters:
- p_value < 0.05 – all significant cells (95% confidence)
- p_value < 0.05 AND quadrant = 'HH' – high-value clusters only
- p_value < 0.05 AND (quadrant = 'HL' OR quadrant = 'LH') – spatial outliers only
Success: Only cells with statistically meaningful spatial patterns remain.
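The filters above can be sketched in plain Python to show which rows each one keeps. The result rows and their values are made up for illustration, and keep_significant is a hypothetical helper rather than part of any CARTO API.

```python
ALPHA = 0.05  # significance threshold (95% confidence)

# Hypothetical Moran's I output rows; the numbers are invented for illustration.
results = [
    {"index": "cell_a", "morans_i": 1.9, "p_value": 0.01, "quadrant": "HH"},
    {"index": "cell_b", "morans_i": -0.8, "p_value": 0.03, "quadrant": "HL"},
    {"index": "cell_c", "morans_i": 0.2, "p_value": 0.40, "quadrant": "LL"},
    {"index": "cell_d", "morans_i": -0.6, "p_value": 0.02, "quadrant": "LH"},
]


def keep_significant(rows, quadrants=None, alpha=ALPHA):
    """Keep rows below the p-value threshold, optionally limited to some quadrants."""
    return [
        r
        for r in rows
        if r["p_value"] < alpha and (quadrants is None or r["quadrant"] in quadrants)
    ]


all_significant = keep_significant(results)
high_clusters = keep_significant(results, {"HH"})
outliers = keep_significant(results, {"HL", "LH"})

print([r["index"] for r in all_significant])
print([r["index"] for r in high_clusters])
print([r["index"] for r in outliers])
```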
Step 7: Save
Use native.saveastable to persist results. The H3/Quadbin column is directly visualizable in CARTO Builder without geometry conversion.
Success: Validated workflow that can be uploaded via carto workflows create.
Output Columns
| Column | Meaning |
|---|---|
| index | Spatial index cell ID (H3 or Quadbin) |
| morans_i | Local Moran’s I value – positive = similar neighbors, negative = dissimilar neighbors |
| p_value | Statistical significance – lower = more confident |
| quadrant | Cluster classification: HH, HL, LH, or LL |
The engine declares these lowercase. See the Provider casing note in Gotchas for Snowflake.
Interpreting Results
Global Moran’s I (overall pattern):
- > 0 = spatial clustering (similar values near each other)
- < 0 = spatial dispersion (dissimilar values near each other)
- Near 0 = spatial randomness
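For reference, the standard textbook definition of global Moran’s I over n cells with values x_i, mean x̄, and spatial weights w_ij is given below. How the component builds w_ij from the k-ring size and decay is not specified here, so treat the weights as an assumption.

```latex
$$
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i} (x_i - \bar{x})^2}
$$
```

Under the null hypothesis of spatial randomness the expected value is -1/(n-1), which approaches 0 as the number of cells grows.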
Local quadrants (per-cell classification):
| Quadrant | Meaning | Interpretation |
|---|---|---|
| HH | High value surrounded by high values | Cluster core |
| LL | Low value surrounded by low values | Low-value cluster |
| HL | High value surrounded by low values | Spatial outlier (high anomaly) |
| LH | Low value surrounded by high values | Spatial outlier (low anomaly) |
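To show how a quadrant label arises, the following sketch applies the textbook local Moran’s I (Anselin) with uniform, row-standardized weights to a toy chain of six cells. It is an illustration under those assumptions, not the component’s implementation, and it omits the significance test. Cell names, values, and the neighbor map are hypothetical.

```python
def local_moran(values, neighbors):
    """Textbook local Moran's I with uniform, row-standardized weights."""
    n = len(values)
    mean = sum(values.values()) / n
    deviations = {cell: v - mean for cell, v in values.items()}
    # Second moment of the deviations, used to scale the statistic.
    m2 = sum(d * d for d in deviations.values()) / n

    out = {}
    for cell, dev in deviations.items():
        nbrs = neighbors[cell]
        # Spatial lag: average deviation of the neighboring cells.
        lag = sum(deviations[nb] for nb in nbrs) / len(nbrs)
        # First letter: the cell itself; second letter: its neighborhood.
        quadrant = ("H" if dev > 0 else "L") + ("H" if lag > 0 else "L")
        out[cell] = {"morans_i": dev * lag / m2, "quadrant": quadrant}
    return out


# Hypothetical chain of cells: high values on the left, low values on the right.
values = {"a": 10, "b": 9, "c": 8, "d": 1, "e": 2, "f": 1}
neighbors = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}

local = local_moran(values, neighbors)
for cell, row in local.items():
    print(cell, row["quadrant"], round(row["morans_i"], 3))
```

Cells inside a run of similar values get a positive statistic (HH or LL), while a cell whose neighborhood average falls on the other side of the mean gets a negative one (HL or LH).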
Gotchas
- Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE; reference H3, INDEX, MORANS_I, P_VALUE, QUADRANT, GEOID_COUNT in expressions. See carto-create-workflow/references/providers/<provider>.md for casing rules and SQL dialect equivalents.
- The Moran’s I component requires the Analytics Toolbox. Always run carto workflows verify-remote --connection <conn> to ensure the AT path is resolved. carto workflows validate is offline and cannot resolve AT location.
- The output column is named index, not h3 or quadbin. If you need to join back to original data, rename it (e.g. with native.renamecolumn). This is the same behavior as Getis-Ord.
- The valuecol must be numeric. If you are counting features, the group-by step must produce a count column – do not pass the raw index column as the value.
- Resolution too high + large area = very many cells, which can be slow or hit memory limits. Start with a moderate resolution and refine.
- Moran’s I is sensitive to the definition of neighborhood. Both k-ring size and decay function choice materially affect results. Document your choices and consider testing alternatives.
- Quadrant classification is only meaningful for statistically significant cells. Always filter by p_value before interpreting quadrants – non-significant cells may show any quadrant label by chance.
- The decay input parameter is named decay (not kernel). Check the component schema if unsure.
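The provider casing note can be sketched as a small helper that picks the right column casing when building a filter expression. Both function names and the provider strings are hypothetical; the only rule encoded is the one stated above, that Snowflake surfaces unquoted identifiers in uppercase.

```python
def provider_column(name: str, provider: str) -> str:
    """Return a column name cased for the given provider.

    Assumption: only Snowflake needs uppercasing, as the casing note describes.
    """
    return name.upper() if provider.lower() == "snowflake" else name


def significance_filter(provider: str, alpha: float = 0.05) -> str:
    """Build the p-value filter expression with provider-appropriate casing."""
    return f"{provider_column('p_value', provider)} < {alpha}"


print(significance_filter("bigquery"))
print(significance_filter("snowflake"))
```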
Reference Templates
| Resource | Description |
|---|---|
| BQ Tutorial | Computing spatial autocorrelation of POI locations in Berlin (BigQuery) |
| SF Tutorial | Same tutorial for Snowflake |
| Workflow template | “Computing the spatial auto-correlation of point of interest locations” (available in CARTO Workspace) |
Common Variations
| Variant | How |
|---|---|
| Pre-indexed data | Skip Step 3 if data already has H3/Quadbin column |
| Polygon input instead of points | Use native.h3polyfill instead of native.h3frompoint |
| Complete grid (no gaps) | Polyfill study area boundary first, then enrich with data (same approach as hotspot analysis) |
| Combine with Getis-Ord | Run both analyses on the same aggregated grid, then join results for a richer picture |
| Filter to outliers only | Keep HL and LH quadrants to find anomalous locations |