building-dbt-semantic-layer

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Building the dbt Semantic Layer

构建dbt Semantic Layer

This skill guides the creation and modification of dbt Semantic Layer components: semantic models, entities, dimensions, and metrics.
  • Semantic models - Metadata configurations that define how dbt models map to business concepts
  • Entities - Keys that identify the grain of your data and enable joins between semantic models
  • Dimensions - Attributes used to filter or group metrics (categorical or time-based)
  • Metrics - Business calculations defined on top of semantic models (e.g., revenue, order count)
本技能指导dbt Semantic Layer组件(语义模型、实体、维度和指标)的创建与修改。
  • 语义模型 - 定义dbt模型如何映射到业务概念的元数据配置
  • 实体 - 标识数据粒度并支持语义模型之间关联的键
  • 维度 - 用于过滤或分组指标的属性(分类或基于时间)
  • 指标 - 基于语义模型定义的业务计算(例如:收入、订单数量)

Additional Resources

额外资源

  • Time Spine Setup - Required for time-based metrics and aggregations
  • Best Practices - Design patterns and recommendations for semantic models and metrics
[!NOTE] This skill contains guidance for the new dbt semantic layer YAML spec, valid for dbt 1.12.0 and above. If the user is using a different version of dbt, you can use the migration guide to help them migrate to the new spec and add new components to their semantic layer. Ask the user if they want to migrate to the new spec before proceeding.
  • 时间轴设置 - 基于时间的指标和聚合所需的配置
  • 最佳实践 - 语义模型和指标的设计模式与建议
[!NOTE] 本技能包含针对新版dbt语义层YAML规范的指导,适用于dbt 1.12.0及以上版本。如果用户使用的是其他版本的dbt,你可以使用迁移指南帮助他们迁移到新版规范,并在语义层中添加新组件。在继续操作前,请询问用户是否需要迁移到新版规范。

Entry Points

入口场景

Users may ask questions related to building metrics with the semantic layer in a few different ways. Here are the common entry points to look out for:
用户可能会通过几种不同的方式询问与使用语义层构建指标相关的问题。以下是需要注意的常见入口场景:

Business Question First

先提出业务问题

When the user describes a metric or analysis need (e.g., "I need to track customer lifetime value by segment"):
  1. Search project models or existing semantic models by name, description, and column names for relevant candidates
  2. Present top matches with brief context (model name, description, key columns)
  3. User confirms which model(s) / semantic models to build on / extend / update
  4. Work backwards from users need to define entities, dimensions, and metrics
当用户描述指标或分析需求时(例如:"我需要按细分群体跟踪客户生命周期价值"):
  1. 按名称、描述和列名搜索项目模型或现有语义模型,找到相关候选
  2. 展示匹配度最高的结果及简要上下文(模型名称、描述、关键列)
  3. 用户确认要基于/扩展/更新的模型/语义模型
  4. 从用户需求倒推,定义实体、维度和指标

Model First

先指定模型

When the user specifies a model to expose (e.g., "Add semantic layer to
customers
model"):
  1. Read the model SQL and existing YAML config
  2. Identify the grain (primary key / entity)
  3. Suggest dimensions based on column types and names
  4. Ask what metrics the user wants to define
Both paths converge on the same implementation workflow.
当用户指定要暴露的模型时(例如:"为
customers
模型添加语义层"):
  1. 读取模型SQL和现有YAML配置
  2. 确定数据粒度(主键/实体)
  3. 根据列类型和名称建议维度
  4. 询问用户要定义哪些指标
两种路径最终都会汇聚到相同的实现流程中。

Open Ended

开放式需求

User asks to build the semantic layer for a project or models that are not specified. ("Build the semantic layer for my project")
  1. Identify high importance models in the project
  2. Suggest some metrics and dimensions for those models
  3. Ask the user if they want to create more metrics and dimensions or if there are any other models they want to build the semantic layer on
用户要求为未指定的项目或模型构建语义层(例如:"为我的项目构建语义层"):
  1. 确定项目中高优先级的模型
  2. 为这些模型建议一些指标和维度
  3. 询问用户是否需要创建更多指标和维度,或者是否有其他模型需要构建语义层

Implementation Workflow

实现流程

Step 1: Enable Semantic Model

步骤1:启用语义模型

Decide which dbt model to build the semantic model on. Add
semantic_model:
block to the model's YAML with
enabled: true
. Set
agg_time_dimension
to the primary time column. If the model does not have a time column, warn user that the model cannot contain metrics that are time-based. Ask the user if they want to create a derived time dimension.
Example YAML:
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
确定要构建语义模型的dbt模型。在模型的YAML中添加
semantic_model:
块并设置
enabled: true
。将
agg_time_dimension
设置为主要时间列。如果模型没有时间列,需提醒用户该模型无法包含基于时间的指标,并询问用户是否要创建派生时间维度。
示例YAML:
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)

Step 2: Define Entities

步骤2:定义实体

Identify the primary key column (check for
_id
suffix, uniqueness tests, or explicit config). Add
entity: \n\t type: primary
block to that column's entry. If the model has foreign keys, define those as
entity: type: foreign
.
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:
      - name: order_id # this is the primary key column of the model
        entity: 
          type: primary 
          name: order
      - name: customer_id # this is a foreign key column of the model
        entity: 
          type: foreign 
          name: customer
识别主键列(检查
_id
后缀、唯一性测试或显式配置)。在该列的配置中添加
entity: \n\t type: primary
块。如果模型有外键,将其定义为
entity: type: foreign
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:
      - name: order_id # this is the primary key column of the model
        entity: 
          type: primary 
          name: order
      - name: customer_id # this is a foreign key column of the model
        entity: 
          type: foreign 
          name: customer

Step 3: Define Dimensions

步骤3:定义维度

Scan columns for dimension candidates. These would be useful columns to group by when querying a metrics:
  • Time columns →
    dimension: type: time
    with appropriate
    granularity
    (set at the column level)
  • Categorical columns (strings, booleans) →
    dimension: type: categorical
Present suggested dimensions to user for confirmation.
Example YAML:
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:
      
      - name: order_id
        entity: 
          type: primary 
          name: order
      
      - name: customer_id
        entity: 
          type: foreign 
          name: customer
      
      - name: ordered_at
        granularity: day # set the granularity of the time column
        dimension:
          type: time

      - name: order_status
        dimension:
          type: categorical
扫描列以寻找维度候选。这些列是查询指标时用于分组的有用列:
  • 时间列 →
    dimension: type: time
    并设置合适的
    granularity
    (在列级别设置)
  • 分类列(字符串、布尔值)→
    dimension: type: categorical
将建议的维度展示给用户确认。
示例YAML:
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:
      
      - name: order_id
        entity: 
          type: primary 
          name: order
      
      - name: customer_id
        entity: 
          type: foreign 
          name: customer
      
      - name: ordered_at
        granularity: day # set the granularity of the time column
        dimension:
          type: time

      - name: order_status
        dimension:
          type: categorical

Step 4: Define Metrics

步骤4:定义指标

Create some simple metrics for the model. For each metric, collect: name, description, label, aggregation type, and expression. Support metric types:
simple
,
derived
,
cumulative
,
conversion
,
ratio
.
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:
      
      - name: order_id
        entity: 
          type: primary 
          name: order
      
      - name: customer_id
        entity: 
          type: foreign 
          name: customer
      
      - name: ordered_at
        granularity: day # set the granularity of the time column
        dimension:
          type: time

      - name: order_status
        dimension:
          type: categorical

    metrics:
      - name: order_count
        type: simple
        agg: count
        expr: 1

      - name: total_revenue
        type: simple
        agg: sum
        expr: amount

      - name: average_order_value
        type: simple
        agg: average
        expr: amount
为模型创建一些简单指标。每个指标需收集:名称、描述、标签、聚合类型和表达式。支持的指标类型:
simple
,
derived
,
cumulative
,
conversion
,
ratio
yaml
models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:
      
      - name: order_id
        entity: 
          type: primary 
          name: order
      
      - name: customer_id
        entity: 
          type: foreign 
          name: customer
      
      - name: ordered_at
        granularity: day # set the granularity of the time column
        dimension:
          type: time

      - name: order_status
        dimension:
          type: categorical

    metrics:
      - name: order_count
        type: simple
        agg: count
        expr: 1

      - name: total_revenue
        type: simple
        agg: sum
        expr: amount

      - name: average_order_value
        type: simple
        agg: average
        expr: amount

Validation

验证

After writing YAML, validate in two stages:
  1. Parse Validation: Run
    dbt parse
    to confirm YAML syntax and references
  2. Semantic Layer Validation:
    • dbt sl validate
      (dbt Cloud CLI / dbt Fusion)
    • mf validate-configs
      (MetricFlow CLI)
Do not consider work complete until both validations pass.
编写YAML后,分两个阶段进行验证:
  1. 解析验证:运行
    dbt parse
    确认YAML语法和引用正确
  2. 语义层验证:
    • dbt sl validate
      (dbt Cloud CLI / dbt Fusion)
    • mf validate-configs
      (MetricFlow CLI)
只有当两次验证都通过后,才算工作完成。

Editing Existing Components

编辑现有组件

When modifying existing semantic layer config:
  • Check if the model's YAML already has
    semantic_model:
    block
  • Read existing entities, dimensions, and metrics before making changes
  • Preserve all existing YAML content not being modified
  • After edits, run full validation to ensure nothing broke
修改现有语义层配置时:
  • 检查模型的YAML是否已包含
    semantic_model:
  • 在进行更改前,先读取现有实体、维度和指标
  • 保留所有未被修改的现有YAML内容
  • 编辑完成后,运行完整验证确保没有问题

YAML Format Reference

YAML格式参考

Derived Dimensions and Entities

派生维度和实体

if the user wants to create a derived dimension or entity that is not a column within the dbt model, then we can use the
derived_semantics
block.
yaml
    derived_semantics:
      dimensions:
        - name: order_size_bucket
          type: categorical
          expr: case when amount > 100 then 'large' else 'small' end
          label: "Order Size"

      entities:
        - name: order_customer_key
          type: foreign
          expr: "order_id || '-' || customer_id"
如果用户想要创建dbt模型中不存在的列作为派生维度或实体,可以使用
derived_semantics
块。
yaml
    derived_semantics:
      dimensions:
        - name: order_size_bucket
          type: categorical
          expr: case when amount > 100 then 'large' else 'small' end
          label: "Order Size"

      entities:
        - name: order_customer_key
          type: foreign
          expr: "order_id || '-' || customer_id"

Advanced Metric Examples

高级指标示例

All simple metrics are defined at the model level under the
metrics
key. Advanced metrics that refer to simple metrics within the same model are defined within a model's YAML entry the
models.metrics
key. Advanced metrics that refer to simple metrics across different models are defined at the top level under the
metrics
key.
所有简单指标都在模型级别下的
metrics
键中定义。引用同一模型内简单指标的高级指标在模型的YAML条目中的
models.metrics
键下定义。引用不同模型中简单指标的高级指标在顶级
metrics
键下定义。

Derived Metrics

派生指标

yaml
      - name: revenue_per_order
        type: derived
        description: Average revenue per order
        label: Revenue per Order
        expr: total_revenue / total_orders
        input_metrics:
          - name: total_revenue
          - name: total_orders

      # With offset window
      - name: revenue_growth
        type: derived
        expr: total_revenue - revenue_last_week
        input_metrics:
          - name: total_revenue
          - name: total_revenue
            alias: revenue_last_week
            offset_window: 1 week
            filter: "{{ Dimension('order__status') }} = 'completed'"
yaml
      - name: revenue_per_order
        type: derived
        description: Average revenue per order
        label: Revenue per Order
        expr: total_revenue / total_orders
        input_metrics:
          - name: total_revenue
          - name: total_orders

      # With offset window
      - name: revenue_growth
        type: derived
        expr: total_revenue - revenue_last_week
        input_metrics:
          - name: total_revenue
          - name: total_revenue
            alias: revenue_last_week
            offset_window: 1 week
            filter: "{{ Dimension('order__status') }} = 'completed'"

Cumulative Metrics

累积指标

yaml
      - name: cumulative_revenue
        type: cumulative
        description: Running total of revenue
        label: Cumulative Revenue
        input_metric: total_revenue
        grain_to_date: week
        period_agg: first

      # With window
      - name: trailing_7d_revenue
        type: cumulative
        input_metric: total_revenue
        window: 7 days
yaml
      - name: cumulative_revenue
        type: cumulative
        description: Running total of revenue
        label: Cumulative Revenue
        input_metric: total_revenue
        grain_to_date: week
        period_agg: first

      # With window
      - name: trailing_7d_revenue
        type: cumulative
        input_metric: total_revenue
        window: 7 days

Ratio Metrics

比率指标

yaml
      - name: conversion_rate
        type: ratio
        description: Orders divided by visits
        label: Conversion Rate
        numerator: total_orders
        denominator: total_visits

      # With filters
      - name: premium_conversion_rate
        type: ratio
        numerator:
          name: total_orders
          filter: "{{ Dimension('order__customer_segment') }} = 'premium'"
          alias: premium_orders
        denominator: total_visits
yaml
      - name: conversion_rate
        type: ratio
        description: Orders divided by visits
        label: Conversion Rate
        numerator: total_orders
        denominator: total_visits

      # With filters
      - name: premium_conversion_rate
        type: ratio
        numerator:
          name: total_orders
          filter: "{{ Dimension('order__customer_segment') }} = 'premium'"
          alias: premium_orders
        denominator: total_visits

Conversion Metrics

转化指标

yaml
      - name: signup_to_purchase
        type: conversion
        description: Rate of signups converting to purchase
        label: Signup to Purchase
        entity: customer
        calculation: conversion_rate
        base_metric: signups
        conversion_metric: purchases
        window: 7 days
        constant_properties:
          - base_property: signup_channel
            conversion_property: purchase_channel
yaml
      - name: signup_to_purchase
        type: conversion
        description: Rate of signups converting to purchase
        label: Signup to Purchase
        entity: customer
        calculation: conversion_rate
        base_metric: signups
        conversion_metric: purchases
        window: 7 days
        constant_properties:
          - base_property: signup_channel
            conversion_property: purchase_channel

Top-level Metrics (Cross-model)

顶级指标(跨模型)

yaml
undefined
yaml
undefined

For metrics depending on multiple semantic models

For metrics depending on multiple semantic models

metrics:
  • name: cross_model_ratio type: ratio numerator: name: metric_from_model_a filter: "{{ Dimension('entity__dim') }} > 10" denominator: name: metric_from_model_b config: group: example_group tags: - example_tag meta: owner: "@someone"
undefined
metrics:
  • name: cross_model_ratio type: ratio numerator: name: metric_from_model_a filter: "{{ Dimension('entity__dim') }} > 10" denominator: name: metric_from_model_b config: group: example_group tags: - example_tag meta: owner: "@someone"
undefined

Filtering Metrics

指标过滤

Filters can be added to simple metrics or metric inputs to advanced metrics. The format of a filters is a Jinja template that can reference entities, dimensions, and metrics, a boolean operator, and a value. Ensure the value matches the type of the column being filtered.
Examples
filter: | {{ Entity('entity_name') }} = 'value'
filter: |
{{ Dimension('primary_entity__dimension_name') }} > 100
filter: |
{{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01'
filter: |
{{ Metric('metric_name', group_by=['entity_name']) }} > 100
可以为简单指标或高级指标的输入添加过滤条件。过滤条件的格式是一个Jinja模板,可以引用实体、维度和指标,以及布尔运算符和值。确保值与被过滤列的类型匹配。
示例
filter: | {{ Entity('entity_name') }} = 'value'
filter: |
{{ Dimension('primary_entity__dimension_name') }} > 100
filter: |
{{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01'
filter: |
{{ Metric('metric_name', group_by=['entity_name']) }} > 100

Key Formatting Rules

关键格式规则

  • semantic_model:
    block at model level with
    enabled: true
  • agg_time_dimension:
    at model level (not nested under
    semantic_model
    )
  • entity:
    and
    dimension:
    on columns (can use shorthand or full form)
  • granularity:
    required at column level for time dimensions
  • metrics:
    array at model level for single-model metrics
  • Top-level
    metrics:
    key for cross-model metrics (derived, ratio, cumulative, conversion only)
  • semantic_model:
    块位于模型级别,需设置
    enabled: true
  • agg_time_dimension:
    位于模型级别(不嵌套在
    semantic_model
    下)
  • entity:
    dimension:
    定义在列上(可使用简写或完整形式)
  • 时间维度需在列级别设置
    granularity:
  • 单模型指标在模型级别下的
    metrics:
    数组中定义
  • 跨模型指标(派生、比率、累积、转化类型)在顶级
    metrics:
    键下定义

Best Practices

最佳实践

  • Start with entities - Identify the grain before defining dimensions or metrics
  • Use shorthand where possible -
    entity: primary
    instead of full nested form for simple cases
  • Name metrics for business users - Use clear
    label
    values for non-technical users
  • Keep metrics close to their data - Simple metrics on their semantic model; top-level only for cross-model
  • Set appropriate granularity - Match the actual data grain (usually
    day
    )
  • 从实体开始 - 在定义维度或指标前先确定数据粒度
  • 尽可能使用简写 - 简单场景下使用
    entity: primary
    而非完整嵌套形式
  • 为业务用户命名指标 - 使用清晰的
    label
    值面向非技术用户
  • 将指标靠近其数据 - 简单指标放在对应的语义模型中;仅跨模型指标放在顶级
  • 设置合适的粒度 - 匹配实际数据粒度(通常为
    day

Common Pitfalls

常见陷阱

PitfallFix
Missing
agg_time_dimension
Every semantic model needs a default time dimension
granularity
inside
dimension:
block
Must be at column level
Defining a column as both an entity and a dimensionA column can only be one or the other
Simple metrics in top-level
metrics:
Top-level is only for cross-model metrics
Using
window
and
grain_to_date
together
Cumulative metrics can only have one
Missing
input_metrics
on derived metrics
Must list metrics used in
expr
陷阱解决方法
缺少
agg_time_dimension
每个语义模型都需要一个默认时间维度
granularity
dimension:
块内
必须在列级别设置
将列同时定义为实体和维度一列只能是其中之一
简单指标放在顶级
metrics:
顶级仅用于跨模型指标
同时使用
window
grain_to_date
累积指标只能使用其中一个
派生指标缺少
input_metrics
必须列出
expr
中使用的指标