Agent Skill · Carto

carto-composite-scoring

Guides the user through building composite score workflows when they ask about composite scores, indexes, multi-variable scores, ranking areas, site scoring, market potential, resilience indexes, risk indexes, weighted scores, PCA, or supervised/unsupervised scoring.

View SKILL.md on GitHub → Source repository Provider profile

Provider: Carto Path in repo: skills/carto-composite-scoring/SKILL.md

Skill body

Use this skill whenever the user wants to create a composite score, index, or multi-variable ranking in a CARTO Workflow.

Prerequisites: Load carto-create-workflow for the development process.

Instructions

Step 1: Determine the scoring approach

Ask the user the following decision tree:

“Do you have a target/outcome variable?” (e.g. revenue, sales, crime rate)
- Yes → Supervised method using native.spatialcompositesupervised
“No target, but do you have expert knowledge of variable importance?” (i.e. custom weights)
- 2a. Transparency matters more than node count → native.normalize per variable + native.selectexpression with the weighted-sum literally in SQL (a_norm * 0.5 + b_norm * 0.2 + c_norm * 0.3). Inputs and weights are inspectable inline in Workflows; each normalized column is materialized; no Analytics Toolbox dependency. Use when stakeholders need to see/audit the formula, or when you’re not running on AT.
- 2b. Want the AT pipeline (scaling, aggregation, bucketing) handled for you → native.spatialcompositeunsupervised with scoring_method: CUSTOM_WEIGHTS. One node, but the formula is hidden inside the component.
“No target, no weights?”
- → Unsupervised method with ENTROPY or FIRST_PC using native.spatialcompositeunsupervised

Success: You have identified which component and scoring method to use before designing the pipeline.

Step 2: Build the pipeline

Supervised pipeline (`native.spatialcompositesupervised`)

Load the spatial features dataset (pre-indexed at H3 or Quadbin)
Load the target/outcome dataset
Join both datasets on the spatial index column
Select only the relevant feature columns (drop spatial index column and geometry from feature selection — pass only actual feature variables)
Run native.spatialcompositesupervised with parameters:
- model_type: LINEAR_REG
- bucketize: EQUAL_INTERVALS_ZERO_CENTERED
- n_buckets: 5
- outlier_removal: true
- r_squared_threshold: 0.4
Output: composite score based on regression residuals (identifies areas that over/under-perform relative to the model)

Success: The workflow joins features with the target variable, selects only numeric feature columns, and produces a residual-based score per spatial cell.

Unsupervised pipeline (`native.spatialcompositeunsupervised`)

Load the spatial features dataset
Select only the relevant feature columns
Encode any categorical/ordinal variables to numeric using native.casewhen (e.g. “Low_density_urban” → 4, “High_density_urban” → 2)
Optionally reverse variables where higher = worse by multiplying by -1 in the SELECT query passed to the component
Run native.spatialcompositeunsupervised with parameters:
- scoring_method: CUSTOM_WEIGHTS / ENTROPY / FIRST_PC
- weights: required if CUSTOM_WEIGHTS. Wire format is a JSON-encoded string of triples [[<column>, <weight>, <reverse>], ...] — column is the feature column name, weight is a numeric weight (normalized internally to sum to 1), reverse is a boolean (true flips the variable so higher = worse becomes higher = better without needing a CASE WHEN). Example: '[["population_density", 0.5, false], ["accident_rate", 0.5, true]]'. Not an object map.
- scaling: RANKING
- aggregation: LINEAR
- output: RETURN_RANGE with range [0, 1]
Output: composite index score per location

Success: All input variables are numeric, variable directions are aligned (higher = better for the score), and the output is a normalized score per spatial cell.

Gotchas

Provider casing & SQL dialect. This skill uses lowercase column names (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE — reference POPULATION_DENSITY, ACCIDENT_RATE, etc. in weights, expressions, and downstream SQL. See carto-create-workflow/references/providers/<provider>.md for casing rules and SQL dialect equivalents.
All input variables must be numeric. Ordinal strings (e.g. “low”/”medium”/”high”) must be manually encoded via CASE WHEN before passing to the component.
Variable direction matters. If “higher is worse” for a variable, multiply by -1 before scoring. Forgetting this inverts the score meaning.
Supervised R-squared threshold (default 0.4) is permissive. If model fit is poor, the residual-based score is mostly noise. Inspect model diagnostics.
Custom weights are normalized internally to sum to 1. The absolute values do not matter, only the ratios.
Supervised scores are residuals, not raw values. The score identifies areas that DEVIATE from the model, not areas with the highest raw values.
Drop the spatial index column and geometry from the feature selection — only pass actual feature variables to the scoring component.
weights type label is misreported by the CLI. carto workflows components get native.spatialcompositeunsupervised --json lists weights as type: "ColumnNumber", but the actual wire shape is a ColumnAndNumber JSON-encoded string of triples [[<column>, <weight>, <reverse>], ...] (see Step 2 of the unsupervised pipeline above). Trust this skill and the reference templates over the CLI’s reported type label for this input.
indexcol.allowedColumns is a UI hint, not a validator. The schema lists ["geoid", "h3", "quadbin"] for native.spatialcompositeunsupervised.indexcol, but it’s a Workflows dropdown hint — any unique identifier column works at runtime (e.g. store_id, cell_id).

Reference Templates

Both examples use Milan spatial features data at Quadbin resolution 18.

Template	Component	File
Supervised — Identifying resilient neighbourhoods	`native.spatialcompositesupervised`	composite-score-supervised.json
Unsupervised — Market potential scoring	`native.spatialcompositeunsupervised`	composite-score-unsupervised.json

Common Variations

Variation	Approach
Risk index (flood, crime, etc.)	Unsupervised with `CUSTOM_WEIGHTS`; reverse variables where higher = safer
Market potential / site scoring	Unsupervised with `CUSTOM_WEIGHTS` or `ENTROPY`; weight demand-side variables higher
Resilience index	Supervised with outcome variable (e.g. revenue change); residuals reveal over/under-performers
Data-driven index (no domain expertise)	Unsupervised with `FIRST_PC` or `ENTROPY` to let variance drive the weights
Human development / composite indicator	Unsupervised with `CUSTOM_WEIGHTS` and `RANKING` scaling for ordinal-safe aggregation

Skill frontmatter

license: MIT