building-dbt-semantic-layer
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseBuilding the dbt Semantic Layer
构建dbt Semantic Layer
This skill guides the creation and modification of dbt Semantic Layer components: semantic models, entities, dimensions, and metrics.
- Semantic models - Metadata configurations that define how dbt models map to business concepts
- Entities - Keys that identify the grain of your data and enable joins between semantic models
- Dimensions - Attributes used to filter or group metrics (categorical or time-based)
- Metrics - Business calculations defined on top of semantic models (e.g., revenue, order count)
本技能指导dbt Semantic Layer组件(语义模型、实体、维度和指标)的创建与修改。
- 语义模型 - 定义dbt模型如何映射到业务概念的元数据配置
- 实体 - 标识数据粒度并支持语义模型之间关联的键
- 维度 - 用于过滤或分组指标的属性(分类或基于时间)
- 指标 - 基于语义模型定义的业务计算(例如:收入、订单数量)
Additional Resources
额外资源
- Time Spine Setup - Required for time-based metrics and aggregations
- Best Practices - Design patterns and recommendations for semantic models and metrics
[!NOTE] This skill contains guidance for the new dbt semantic layer YAML spec, valid for dbt 1.12.0 and above. If the user is using a different version of dbt, you can use the migration guide to help them migrate to the new spec and add new components to their semantic layer. Ask the user if they want to migrate to the new spec before proceeding.
- 时间轴设置 - 基于时间的指标和聚合所需的配置
- 最佳实践 - 语义模型和指标的设计模式与建议
[!NOTE] 本技能包含针对新版dbt语义层YAML规范的指导,适用于dbt 1.12.0及以上版本。如果用户使用的是其他版本的dbt,你可以使用迁移指南帮助他们迁移到新版规范,并在语义层中添加新组件。在继续操作前,请询问用户是否需要迁移到新版规范。
Entry Points
入口场景
Users may ask questions related to building metrics with the semantic layer in a few different ways. Here are the common entry points to look out for:
用户可能会通过几种不同的方式询问与使用语义层构建指标相关的问题。以下是需要注意的常见入口场景:
Business Question First
先提出业务问题
When the user describes a metric or analysis need (e.g., "I need to track customer lifetime value by segment"):
- Search project models or existing semantic models by name, description, and column names for relevant candidates
- Present top matches with brief context (model name, description, key columns)
- User confirms which model(s) / semantic models to build on / extend / update
- Work backwards from users need to define entities, dimensions, and metrics
当用户描述指标或分析需求时(例如:"我需要按细分群体跟踪客户生命周期价值"):
- 按名称、描述和列名搜索项目模型或现有语义模型,找到相关候选
- 展示匹配度最高的结果及简要上下文(模型名称、描述、关键列)
- 用户确认要基于/扩展/更新的模型/语义模型
- 从用户需求倒推,定义实体、维度和指标
Model First
先指定模型
When the user specifies a model to expose (e.g., "Add semantic layer to model"):
customers- Read the model SQL and existing YAML config
- Identify the grain (primary key / entity)
- Suggest dimensions based on column types and names
- Ask what metrics the user wants to define
Both paths converge on the same implementation workflow.
当用户指定要暴露的模型时(例如:"为模型添加语义层"):
customers- 读取模型SQL和现有YAML配置
- 确定数据粒度(主键/实体)
- 根据列类型和名称建议维度
- 询问用户要定义哪些指标
两种路径最终都会汇聚到相同的实现流程中。
Open Ended
开放式需求
User asks to build the semantic layer for a project or models that are not specified. ("Build the semantic layer for my project")
- Identify high importance models in the project
- Suggest some metrics and dimensions for those models
- Ask the user if they want to create more metrics and dimensions or if there are any other models they want to build the semantic layer on
用户要求为未指定的项目或模型构建语义层(例如:"为我的项目构建语义层"):
- 确定项目中高优先级的模型
- 为这些模型建议一些指标和维度
- 询问用户是否需要创建更多指标和维度,或者是否有其他模型需要构建语义层
Implementation Workflow
实现流程
Step 1: Enable Semantic Model
步骤1:启用语义模型
Decide which dbt model to build the semantic model on. Add block to the model's YAML with . Set to the primary time column. If the model does not have a time column, warn user that the model cannot contain metrics that are time-based. Ask the user if they want to create a derived time dimension.
semantic_model:enabled: trueagg_time_dimensionExample YAML:
yaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)确定要构建语义模型的dbt模型。在模型的YAML中添加块并设置。将设置为主要时间列。如果模型没有时间列,需提醒用户该模型无法包含基于时间的指标,并询问用户是否要创建派生时间维度。
semantic_model:enabled: trueagg_time_dimension示例YAML:
yaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)Step 2: Define Entities
步骤2:定义实体
Identify the primary key column (check for suffix, uniqueness tests, or explicit config). Add block to that column's entry. If the model has foreign keys, define those as .
_identity: \n\t type: primaryentity: type: foreignyaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
columns:
- name: order_id # this is the primary key column of the model
entity:
type: primary
name: order
- name: customer_id # this is a foreign key column of the model
entity:
type: foreign
name: customer识别主键列(检查后缀、唯一性测试或显式配置)。在该列的配置中添加块。如果模型有外键,将其定义为。
_identity: \n\t type: primaryentity: type: foreignyaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
columns:
- name: order_id # this is the primary key column of the model
entity:
type: primary
name: order
- name: customer_id # this is a foreign key column of the model
entity:
type: foreign
name: customerStep 3: Define Dimensions
步骤3:定义维度
Scan columns for dimension candidates. These would be useful columns to group by when querying a metrics:
- Time columns → with appropriate
dimension: type: time(set at the column level)granularity - Categorical columns (strings, booleans) →
dimension: type: categorical
Present suggested dimensions to user for confirmation.
Example YAML:
yaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
columns:
- name: order_id
entity:
type: primary
name: order
- name: customer_id
entity:
type: foreign
name: customer
- name: ordered_at
granularity: day # set the granularity of the time column
dimension:
type: time
- name: order_status
dimension:
type: categorical扫描列以寻找维度候选。这些列是查询指标时用于分组的有用列:
- 时间列 → 并设置合适的
dimension: type: time(在列级别设置)granularity - 分类列(字符串、布尔值)→
dimension: type: categorical
将建议的维度展示给用户确认。
示例YAML:
yaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
columns:
- name: order_id
entity:
type: primary
name: order
- name: customer_id
entity:
type: foreign
name: customer
- name: ordered_at
granularity: day # set the granularity of the time column
dimension:
type: time
- name: order_status
dimension:
type: categoricalStep 4: Define Metrics
步骤4:定义指标
Create some simple metrics for the model. For each metric, collect: name, description, label, aggregation type, and expression. Support metric types: , , , , .
simplederivedcumulativeconversionratioyaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
columns:
- name: order_id
entity:
type: primary
name: order
- name: customer_id
entity:
type: foreign
name: customer
- name: ordered_at
granularity: day # set the granularity of the time column
dimension:
type: time
- name: order_status
dimension:
type: categorical
metrics:
- name: order_count
type: simple
agg: count
expr: 1
- name: total_revenue
type: simple
agg: sum
expr: amount
- name: average_order_value
type: simple
agg: average
expr: amount为模型创建一些简单指标。每个指标需收集:名称、描述、标签、聚合类型和表达式。支持的指标类型:, , , , 。
simplederivedcumulativeconversionratioyaml
models:
- name: orders
semantic_model:
enabled: true # enable the semantic model
agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
columns:
- name: order_id
entity:
type: primary
name: order
- name: customer_id
entity:
type: foreign
name: customer
- name: ordered_at
granularity: day # set the granularity of the time column
dimension:
type: time
- name: order_status
dimension:
type: categorical
metrics:
- name: order_count
type: simple
agg: count
expr: 1
- name: total_revenue
type: simple
agg: sum
expr: amount
- name: average_order_value
type: simple
agg: average
expr: amountValidation
验证
After writing YAML, validate in two stages:
- Parse Validation: Run to confirm YAML syntax and references
dbt parse - Semantic Layer Validation:
- (dbt Cloud CLI / dbt Fusion)
dbt sl validate - (MetricFlow CLI)
mf validate-configs
Do not consider work complete until both validations pass.
编写YAML后,分两个阶段进行验证:
- 解析验证:运行确认YAML语法和引用正确
dbt parse - 语义层验证:
- (dbt Cloud CLI / dbt Fusion)
dbt sl validate - (MetricFlow CLI)
mf validate-configs
只有当两次验证都通过后,才算工作完成。
Editing Existing Components
编辑现有组件
When modifying existing semantic layer config:
- Check if the model's YAML already has block
semantic_model: - Read existing entities, dimensions, and metrics before making changes
- Preserve all existing YAML content not being modified
- After edits, run full validation to ensure nothing broke
修改现有语义层配置时:
- 检查模型的YAML是否已包含块
semantic_model: - 在进行更改前,先读取现有实体、维度和指标
- 保留所有未被修改的现有YAML内容
- 编辑完成后,运行完整验证确保没有问题
YAML Format Reference
YAML格式参考
Derived Dimensions and Entities
派生维度和实体
if the user wants to create a derived dimension or entity that is not a column within the dbt model, then we can use the block.
derived_semanticsyaml
derived_semantics:
dimensions:
- name: order_size_bucket
type: categorical
expr: case when amount > 100 then 'large' else 'small' end
label: "Order Size"
entities:
- name: order_customer_key
type: foreign
expr: "order_id || '-' || customer_id"如果用户想要创建dbt模型中不存在的列作为派生维度或实体,可以使用块。
derived_semanticsyaml
derived_semantics:
dimensions:
- name: order_size_bucket
type: categorical
expr: case when amount > 100 then 'large' else 'small' end
label: "Order Size"
entities:
- name: order_customer_key
type: foreign
expr: "order_id || '-' || customer_id"Advanced Metric Examples
高级指标示例
All simple metrics are defined at the model level under the key. Advanced metrics that refer to simple metrics within the same model are defined within a model's YAML entry the key. Advanced metrics that refer to simple metrics across different models are defined at the top level under the key.
metricsmodels.metricsmetrics所有简单指标都在模型级别下的键中定义。引用同一模型内简单指标的高级指标在模型的YAML条目中的键下定义。引用不同模型中简单指标的高级指标在顶级键下定义。
metricsmodels.metricsmetricsDerived Metrics
派生指标
yaml
- name: revenue_per_order
type: derived
description: Average revenue per order
label: Revenue per Order
expr: total_revenue / total_orders
input_metrics:
- name: total_revenue
- name: total_orders
# With offset window
- name: revenue_growth
type: derived
expr: total_revenue - revenue_last_week
input_metrics:
- name: total_revenue
- name: total_revenue
alias: revenue_last_week
offset_window: 1 week
filter: "{{ Dimension('order__status') }} = 'completed'"yaml
- name: revenue_per_order
type: derived
description: Average revenue per order
label: Revenue per Order
expr: total_revenue / total_orders
input_metrics:
- name: total_revenue
- name: total_orders
# With offset window
- name: revenue_growth
type: derived
expr: total_revenue - revenue_last_week
input_metrics:
- name: total_revenue
- name: total_revenue
alias: revenue_last_week
offset_window: 1 week
filter: "{{ Dimension('order__status') }} = 'completed'"Cumulative Metrics
累积指标
yaml
- name: cumulative_revenue
type: cumulative
description: Running total of revenue
label: Cumulative Revenue
input_metric: total_revenue
grain_to_date: week
period_agg: first
# With window
- name: trailing_7d_revenue
type: cumulative
input_metric: total_revenue
window: 7 daysyaml
- name: cumulative_revenue
type: cumulative
description: Running total of revenue
label: Cumulative Revenue
input_metric: total_revenue
grain_to_date: week
period_agg: first
# With window
- name: trailing_7d_revenue
type: cumulative
input_metric: total_revenue
window: 7 daysRatio Metrics
比率指标
yaml
- name: conversion_rate
type: ratio
description: Orders divided by visits
label: Conversion Rate
numerator: total_orders
denominator: total_visits
# With filters
- name: premium_conversion_rate
type: ratio
numerator:
name: total_orders
filter: "{{ Dimension('order__customer_segment') }} = 'premium'"
alias: premium_orders
denominator: total_visitsyaml
- name: conversion_rate
type: ratio
description: Orders divided by visits
label: Conversion Rate
numerator: total_orders
denominator: total_visits
# With filters
- name: premium_conversion_rate
type: ratio
numerator:
name: total_orders
filter: "{{ Dimension('order__customer_segment') }} = 'premium'"
alias: premium_orders
denominator: total_visitsConversion Metrics
转化指标
yaml
- name: signup_to_purchase
type: conversion
description: Rate of signups converting to purchase
label: Signup to Purchase
entity: customer
calculation: conversion_rate
base_metric: signups
conversion_metric: purchases
window: 7 days
constant_properties:
- base_property: signup_channel
conversion_property: purchase_channelyaml
- name: signup_to_purchase
type: conversion
description: Rate of signups converting to purchase
label: Signup to Purchase
entity: customer
calculation: conversion_rate
base_metric: signups
conversion_metric: purchases
window: 7 days
constant_properties:
- base_property: signup_channel
conversion_property: purchase_channelTop-level Metrics (Cross-model)
顶级指标(跨模型)
yaml
undefinedyaml
undefinedFor metrics depending on multiple semantic models
For metrics depending on multiple semantic models
metrics:
- name: cross_model_ratio type: ratio numerator: name: metric_from_model_a filter: "{{ Dimension('entity__dim') }} > 10" denominator: name: metric_from_model_b config: group: example_group tags: - example_tag meta: owner: "@someone"
undefinedmetrics:
- name: cross_model_ratio type: ratio numerator: name: metric_from_model_a filter: "{{ Dimension('entity__dim') }} > 10" denominator: name: metric_from_model_b config: group: example_group tags: - example_tag meta: owner: "@someone"
undefinedFiltering Metrics
指标过滤
Filters can be added to simple metrics or metric inputs to advanced metrics. The format of a filters is a Jinja template that can reference entities, dimensions, and metrics, a boolean operator, and a value. Ensure the value matches the type of the column being filtered.
Examples
filter: |
{{ Entity('entity_name') }} = 'value'
filter: |
{{ Dimension('primary_entity__dimension_name') }} > 100
{{ Dimension('primary_entity__dimension_name') }} > 100
filter: |
{{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01'
{{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01'
filter: |
{{ Metric('metric_name', group_by=['entity_name']) }} > 100
{{ Metric('metric_name', group_by=['entity_name']) }} > 100
可以为简单指标或高级指标的输入添加过滤条件。过滤条件的格式是一个Jinja模板,可以引用实体、维度和指标,以及布尔运算符和值。确保值与被过滤列的类型匹配。
示例
filter: |
{{ Entity('entity_name') }} = 'value'
filter: |
{{ Dimension('primary_entity__dimension_name') }} > 100
{{ Dimension('primary_entity__dimension_name') }} > 100
filter: |
{{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01'
{{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01'
filter: |
{{ Metric('metric_name', group_by=['entity_name']) }} > 100
{{ Metric('metric_name', group_by=['entity_name']) }} > 100
Key Formatting Rules
关键格式规则
- block at model level with
semantic_model:enabled: true - at model level (not nested under
agg_time_dimension:)semantic_model - and
entity:on columns (can use shorthand or full form)dimension: - required at column level for time dimensions
granularity: - array at model level for single-model metrics
metrics: - Top-level key for cross-model metrics (derived, ratio, cumulative, conversion only)
metrics:
- 块位于模型级别,需设置
semantic_model:enabled: true - 位于模型级别(不嵌套在
agg_time_dimension:下)semantic_model - 和
entity:定义在列上(可使用简写或完整形式)dimension: - 时间维度需在列级别设置
granularity: - 单模型指标在模型级别下的数组中定义
metrics: - 跨模型指标(派生、比率、累积、转化类型)在顶级键下定义
metrics:
Best Practices
最佳实践
- Start with entities - Identify the grain before defining dimensions or metrics
- Use shorthand where possible - instead of full nested form for simple cases
entity: primary - Name metrics for business users - Use clear values for non-technical users
label - Keep metrics close to their data - Simple metrics on their semantic model; top-level only for cross-model
- Set appropriate granularity - Match the actual data grain (usually )
day
- 从实体开始 - 在定义维度或指标前先确定数据粒度
- 尽可能使用简写 - 简单场景下使用而非完整嵌套形式
entity: primary - 为业务用户命名指标 - 使用清晰的值面向非技术用户
label - 将指标靠近其数据 - 简单指标放在对应的语义模型中;仅跨模型指标放在顶级
- 设置合适的粒度 - 匹配实际数据粒度(通常为)
day
Common Pitfalls
常见陷阱
| Pitfall | Fix |
|---|---|
Missing | Every semantic model needs a default time dimension |
| Must be at column level |
| Defining a column as both an entity and a dimension | A column can only be one or the other |
Simple metrics in top-level | Top-level is only for cross-model metrics |
Using | Cumulative metrics can only have one |
Missing | Must list metrics used in |
| 陷阱 | 解决方法 |
|---|---|
缺少 | 每个语义模型都需要一个默认时间维度 |
| 必须在列级别设置 |
| 将列同时定义为实体和维度 | 一列只能是其中之一 |
简单指标放在顶级 | 顶级仅用于跨模型指标 |
同时使用 | 累积指标只能使用其中一个 |
派生指标缺少 | 必须列出 |