carto-composite-scoring

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese
Use this skill whenever the user wants to create a composite score, index, or multi-variable ranking in a CARTO Workflow.
Prerequisites: Load
carto-create-workflow
for the development process.
当用户想要在CARTO Workflow中创建综合得分、指数或多变量排名时,即可使用本技能。
前提条件:开发过程中需加载
carto-create-workflow

Instructions

操作步骤

Step 1: Determine the scoring approach

步骤1:确定评分方法

Ask the user the following decision tree:
  1. "Do you have a target/outcome variable?" (e.g. revenue, sales, crime rate)
    • Yes → Supervised method using
      native.spatialcompositesupervised
  2. "No target, but do you have expert knowledge of variable importance?"
    • Yes → Unsupervised method with
      CUSTOM_WEIGHTS
      using
      native.spatialcompositeunsupervised
  3. "No target, no weights?"
    • Unsupervised method with
      ENTROPY
      or
      FIRST_PC
      using
      native.spatialcompositeunsupervised
Success: You have identified which component and scoring method to use before designing the pipeline.
向用户询问以下决策树问题:
  1. "您是否有目标/结果变量?"(例如收入、销售额、犯罪率)
    • 是 → 使用
      native.spatialcompositesupervised
      有监督方法
  2. "无目标变量,但您是否具备变量重要性的专业知识?"
    • 是 → 使用
      native.spatialcompositeunsupervised
      并搭配
      CUSTOM_WEIGHTS
      无监督方法
  3. "无目标变量,也无权重?"
    • → 使用
      native.spatialcompositeunsupervised
      并搭配
      ENTROPY
      FIRST_PC
      无监督方法
成功标志:在设计流水线前,您已确定要使用的组件和评分方法。

Step 2: Build the pipeline

步骤2:构建流水线

Supervised pipeline (
native.spatialcompositesupervised
)

有监督流水线(
native.spatialcompositesupervised

  1. Load the spatial features dataset (pre-indexed at H3 or Quadbin)
  2. Load the target/outcome dataset
  3. Join both datasets on the spatial index column
  4. Select only the relevant feature columns (drop spatial index column and geometry from feature selection — pass only actual feature variables)
  5. Run
    native.spatialcompositesupervised
    with parameters:
    • model_type
      :
      LINEAR_REG
    • bucketize
      :
      EQUAL_INTERVALS_ZERO_CENTERED
    • n_buckets
      :
      5
    • outlier_removal
      :
      true
    • r_squared_threshold
      :
      0.4
  6. Output: composite score based on regression residuals (identifies areas that over/under-perform relative to the model)
Success: The workflow joins features with the target variable, selects only numeric feature columns, and produces a residual-based score per spatial cell.
  1. 加载空间要素数据集(已预先以H3或Quadbin索引)
  2. 加载目标/结果数据集
  3. 基于空间索引列关联两个数据集
  4. 仅选择相关要素列(从要素选择中移除空间索引列和几何列——仅传入实际要素变量)
  5. 使用以下参数运行
    native.spatialcompositesupervised
    • model_type
      :
      LINEAR_REG
    • bucketize
      :
      EQUAL_INTERVALS_ZERO_CENTERED
    • n_buckets
      :
      5
    • outlier_removal
      :
      true
    • r_squared_threshold
      :
      0.4
  6. 输出:基于回归残差的综合得分(识别相对于模型表现超出/低于预期的区域)
成功标志:工作流将要素与目标变量关联,仅选择数值型要素列,并为每个空间单元生成基于残差的得分。

Unsupervised pipeline (
native.spatialcompositeunsupervised
)

无监督流水线(
native.spatialcompositeunsupervised

  1. Load the spatial features dataset
  2. Select only the relevant feature columns
  3. Encode any categorical/ordinal variables to numeric using
    native.casewhen
    (e.g. "Low_density_urban" → 4, "High_density_urban" → 2)
  4. Optionally reverse variables where higher = worse by multiplying by -1 in the SELECT query passed to the component
  5. Run
    native.spatialcompositeunsupervised
    with parameters:
    • scoring_method
      :
      CUSTOM_WEIGHTS
      /
      ENTROPY
      /
      FIRST_PC
    • weights
      : required if
      CUSTOM_WEIGHTS
      . Wire format is a JSON-encoded string of triples
      [[<column>, <weight>, <reverse>], ...]
      column
      is the feature column name,
      weight
      is a numeric weight (normalized internally to sum to 1),
      reverse
      is a boolean (
      true
      flips the variable so higher = worse becomes higher = better without needing a CASE WHEN). Example:
      '[["population_density", 0.5, false], ["accident_rate", 0.5, true]]'
      . Not an object map.
    • scaling
      :
      RANKING
    • aggregation
      :
      LINEAR
    • output
      :
      RETURN_RANGE
      with range
      [0, 1]
  6. Output: composite index score per location
Success: All input variables are numeric, variable directions are aligned (higher = better for the score), and the output is a normalized score per spatial cell.
  1. 加载空间要素数据集
  2. 仅选择相关要素列
  3. 使用
    native.casewhen
    将任何分类/有序变量编码为数值(例如 "Low_density_urban" → 4,"High_density_urban" → 2)
  4. (可选)在传入组件的SELECT查询中,将数值越高代表结果越差的变量乘以-1进行反转
  5. 使用以下参数运行
    native.spatialcompositeunsupervised
    • scoring_method
      :
      CUSTOM_WEIGHTS
      /
      ENTROPY
      /
      FIRST_PC
    • weights
      : 若使用
      CUSTOM_WEIGHTS
      则为必填项。格式为JSON编码的三元组字符串
      [[<column>, <weight>, <reverse>], ...]
      ——
      column
      为要素列名称,
      weight
      为数值权重(内部会归一化至总和为1),
      reverse
      为布尔值(
      true
      会反转变量,无需CASE WHEN即可将“数值越高越差”转为“数值越高越好”)。示例:
      '[["population_density", 0.5, false], ["accident_rate", 0.5, true]]'
      注意不是对象映射
    • scaling
      :
      RANKING
    • aggregation
      :
      LINEAR
    • output
      :
      RETURN_RANGE
      ,范围为
      [0, 1]
  6. 输出:每个位置的综合指数得分
成功标志:所有输入变量均为数值型,变量方向已对齐(得分越高代表结果越好),且输出为每个空间单元的归一化得分。

Gotchas

注意事项

  • Provider casing & SQL dialect. This skill uses lowercase column names (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE — reference
    POPULATION_DENSITY
    ,
    ACCIDENT_RATE
    , etc. in weights, expressions, and downstream SQL. See
    carto-create-workflow/references/providers/<provider>.md
    for casing rules and SQL dialect equivalents.
  • All input variables must be numeric. Ordinal strings (e.g. "low"/"medium"/"high") must be manually encoded via CASE WHEN before passing to the component.
  • Variable direction matters. If "higher is worse" for a variable, multiply by -1 before scoring. Forgetting this inverts the score meaning.
  • Supervised R-squared threshold (default 0.4) is permissive. If model fit is poor, the residual-based score is mostly noise. Inspect model diagnostics.
  • Custom weights are normalized internally to sum to 1. The absolute values do not matter, only the ratios.
  • Supervised scores are residuals, not raw values. The score identifies areas that DEVIATE from the model, not areas with the highest raw values.
  • Drop the spatial index column and geometry from the feature selection — only pass actual feature variables to the scoring component.
  • 服务商大小写规则与SQL方言:本技能采用小写列名(BigQuery / Databricks / Postgres / Redshift的惯例)。在Snowflake中,未加引号的标识符会显示为大写——在权重、表达式和下游SQL中需引用
    POPULATION_DENSITY
    ACCIDENT_RATE
    等名称。可查看
    carto-create-workflow/references/providers/<provider>.md
    获取大小写规则和SQL方言对应关系。
  • 所有输入变量必须为数值型:有序字符串(例如"low"/"medium"/"high")必须在传入组件前通过CASE WHEN手动编码。
  • 变量方向至关重要:若某个变量“数值越高代表结果越差”,需在评分前乘以-1。遗漏此步骤会导致得分含义反转。
  • 有监督R平方阈值(默认0.4)较为宽松。若模型拟合度差,基于残差的得分大多为噪声。需检查模型诊断信息。
  • 自定义权重会在内部归一化至总和为1。绝对值无关紧要,仅比例有效。
  • 有监督得分是残差,而非原始数值。得分识别的是与模型偏离的区域,而非原始数值最高的区域。
  • 从要素选择中移除空间索引列和几何列——仅将实际要素变量传入评分组件。

Reference Templates

参考模板

Both examples use Milan spatial features data at Quadbin resolution 18.
TemplateComponentFile
Supervised — Identifying resilient neighbourhoods
native.spatialcompositesupervised
composite-score-supervised.json
Unsupervised — Market potential scoring
native.spatialcompositeunsupervised
composite-score-unsupervised.json
两个示例均使用Quadbin分辨率18的米兰空间要素数据。
模板组件文件
有监督——识别韧性社区
native.spatialcompositesupervised
composite-score-supervised.json
无监督——市场潜力评分
native.spatialcompositeunsupervised
composite-score-unsupervised.json

Common Variations

常见变体

VariationApproach
Risk index (flood, crime, etc.)Unsupervised with
CUSTOM_WEIGHTS
; reverse variables where higher = safer
Market potential / site scoringUnsupervised with
CUSTOM_WEIGHTS
or
ENTROPY
; weight demand-side variables higher
Resilience indexSupervised with outcome variable (e.g. revenue change); residuals reveal over/under-performers
Data-driven index (no domain expertise)Unsupervised with
FIRST_PC
or
ENTROPY
to let variance drive the weights
Human development / composite indicatorUnsupervised with
CUSTOM_WEIGHTS
and
RANKING
scaling for ordinal-safe aggregation
变体实现方法
风险指数(洪水、犯罪等)使用
CUSTOM_WEIGHTS
的无监督方法;反转“数值越高越安全”的变量
市场潜力/场地评分使用
CUSTOM_WEIGHTS
ENTROPY
的无监督方法;提高需求侧变量的权重
韧性指数使用带结果变量(例如收入变化)的有监督方法;残差可揭示表现超出/低于预期的对象
数据驱动型指数(无领域专业知识)使用
FIRST_PC
ENTROPY
的无监督方法,让方差决定权重
人类发展/综合指标使用
CUSTOM_WEIGHTS
RANKING
缩放的无监督方法,实现有序安全聚合