carto-composite-scoring
Guides the user through building composite score workflows when they ask about composite scores, indexes, multi-variable scores, ranking areas, site scoring, market potential, resilience indexes, risk indexes, weighted scores, PCA, or supervised/unsupervised scoring.
Skill body
Use this skill whenever the user wants to create a composite score, index, or multi-variable ranking in a CARTO Workflow.
Prerequisites: Load carto-create-workflow for the development process.
Instructions
Step 1: Determine the scoring approach
Ask the user the following decision tree:
- “Do you have a target/outcome variable?” (e.g. revenue, sales, crime rate)
- Yes → Supervised method using
native.spatialcompositesupervised
- Yes → Supervised method using
- “No target, but do you have expert knowledge of variable importance?” (i.e. custom weights)
- 2a. Transparency matters more than node count →
native.normalizeper variable +native.selectexpressionwith the weighted-sum literally in SQL (a_norm * 0.5 + b_norm * 0.2 + c_norm * 0.3). Inputs and weights are inspectable inline in Workflows; each normalized column is materialized; no Analytics Toolbox dependency. Use when stakeholders need to see/audit the formula, or when you’re not running on AT. - 2b. Want the AT pipeline (scaling, aggregation, bucketing) handled for you →
native.spatialcompositeunsupervisedwithscoring_method: CUSTOM_WEIGHTS. One node, but the formula is hidden inside the component.
- 2a. Transparency matters more than node count →
- “No target, no weights?”
- → Unsupervised method with
ENTROPYorFIRST_PCusingnative.spatialcompositeunsupervised
- → Unsupervised method with
Success: You have identified which component and scoring method to use before designing the pipeline.
Step 2: Build the pipeline
Supervised pipeline (native.spatialcompositesupervised)
- Load the spatial features dataset (pre-indexed at H3 or Quadbin)
- Load the target/outcome dataset
- Join both datasets on the spatial index column
- Select only the relevant feature columns (drop spatial index column and geometry from feature selection — pass only actual feature variables)
- Run
native.spatialcompositesupervisedwith parameters:model_type:LINEAR_REGbucketize:EQUAL_INTERVALS_ZERO_CENTEREDn_buckets:5outlier_removal:truer_squared_threshold:0.4
- Output: composite score based on regression residuals (identifies areas that over/under-perform relative to the model)
Success: The workflow joins features with the target variable, selects only numeric feature columns, and produces a residual-based score per spatial cell.
Unsupervised pipeline (native.spatialcompositeunsupervised)
- Load the spatial features dataset
- Select only the relevant feature columns
- Encode any categorical/ordinal variables to numeric using
native.casewhen(e.g. “Low_density_urban” → 4, “High_density_urban” → 2) - Optionally reverse variables where higher = worse by multiplying by -1 in the SELECT query passed to the component
- Run
native.spatialcompositeunsupervisedwith parameters:scoring_method:CUSTOM_WEIGHTS/ENTROPY/FIRST_PCweights: required ifCUSTOM_WEIGHTS. Wire format is a JSON-encoded string of triples[[<column>, <weight>, <reverse>], ...]—columnis the feature column name,weightis a numeric weight (normalized internally to sum to 1),reverseis a boolean (trueflips the variable so higher = worse becomes higher = better without needing a CASE WHEN). Example:'[["population_density", 0.5, false], ["accident_rate", 0.5, true]]'. Not an object map.scaling:RANKINGaggregation:LINEARoutput:RETURN_RANGEwith range[0, 1]
- Output: composite index score per location
Success: All input variables are numeric, variable directions are aligned (higher = better for the score), and the output is a normalized score per spatial cell.
Gotchas
- Provider casing & SQL dialect. This skill uses lowercase column names (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE — reference
POPULATION_DENSITY,ACCIDENT_RATE, etc. in weights, expressions, and downstream SQL. Seecarto-create-workflow/references/providers/<provider>.mdfor casing rules and SQL dialect equivalents. - All input variables must be numeric. Ordinal strings (e.g. “low”/”medium”/”high”) must be manually encoded via CASE WHEN before passing to the component.
- Variable direction matters. If “higher is worse” for a variable, multiply by -1 before scoring. Forgetting this inverts the score meaning.
- Supervised R-squared threshold (default 0.4) is permissive. If model fit is poor, the residual-based score is mostly noise. Inspect model diagnostics.
- Custom weights are normalized internally to sum to 1. The absolute values do not matter, only the ratios.
- Supervised scores are residuals, not raw values. The score identifies areas that DEVIATE from the model, not areas with the highest raw values.
- Drop the spatial index column and geometry from the feature selection — only pass actual feature variables to the scoring component.
weightstype label is misreported by the CLI.carto workflows components get native.spatialcompositeunsupervised --jsonlistsweightsastype: "ColumnNumber", but the actual wire shape is aColumnAndNumberJSON-encoded string of triples[[<column>, <weight>, <reverse>], ...](see Step 2 of the unsupervised pipeline above). Trust this skill and the reference templates over the CLI’s reported type label for this input.indexcol.allowedColumnsis a UI hint, not a validator. The schema lists["geoid", "h3", "quadbin"]fornative.spatialcompositeunsupervised.indexcol, but it’s a Workflows dropdown hint — any unique identifier column works at runtime (e.g.store_id,cell_id).
Reference Templates
Both examples use Milan spatial features data at Quadbin resolution 18.
| Template | Component | File |
|---|---|---|
| Supervised — Identifying resilient neighbourhoods | native.spatialcompositesupervised |
composite-score-supervised.json |
| Unsupervised — Market potential scoring | native.spatialcompositeunsupervised |
composite-score-unsupervised.json |
Common Variations
| Variation | Approach |
|---|---|
| Risk index (flood, crime, etc.) | Unsupervised with CUSTOM_WEIGHTS; reverse variables where higher = safer |
| Market potential / site scoring | Unsupervised with CUSTOM_WEIGHTS or ENTROPY; weight demand-side variables higher |
| Resilience index | Supervised with outcome variable (e.g. revenue change); residuals reveal over/under-performers |
| Data-driven index (no domain expertise) | Unsupervised with FIRST_PC or ENTROPY to let variance drive the weights |
| Human development / composite indicator | Unsupervised with CUSTOM_WEIGHTS and RANKING scaling for ordinal-safe aggregation |