documenting-dbt-models

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

dbt Documentation

dbt 文档编写

Document the WHY, not just the WHAT. Include grain, business rules, and caveats.
要记录原因,而非仅仅记录内容。需包含数据粒度、业务规则和注意事项。

Workflow

工作流程

1. Study Existing Documentation Patterns

1. 研究现有文档模式

CRITICAL: Match the project's documentation style before adding new docs.
bash
undefined
重要提示:添加新文档前需匹配项目的现有文档风格。
bash
undefined

Find all schema.yml files with documentation

查找所有包含文档的schema.yml文件

find . -name "schema.yml" | head -5
find . -name "schema.yml" | head -5

Read well-documented models to learn patterns

阅读编写完善的模型以学习模式

cat models/marts/schema.yml | head -150 cat models/staging/schema.yml | head -150

**Extract from existing documentation:**
- Description length (brief vs detailed)
- Formatting style (plain text vs markdown with headers)
- Information included (grain? business rules? caveats?)
- Column description depth (all columns vs key columns)
- Use of meta tags or custom properties
cat models/marts/schema.yml | head -150 cat models/staging/schema.yml | head -150

**从现有文档中提取以下信息:**
- 描述长度(简洁型 vs 详细型)
- 格式风格(纯文本 vs 带标题的Markdown)
- 包含的信息(数据粒度?业务规则?注意事项?)
- 字段描述的详细程度(所有字段 vs 关键字段)
- 元标签或自定义属性的使用情况

2. Read Model SQL

2. 阅读模型SQL代码

bash
cat models/<path>/<model_name>.sql
Understand: transformations, business logic, joins, filters.
bash
cat models/<path>/<model_name>.sql
理解:数据转换逻辑、业务规则、关联操作、过滤条件。

3. Check Existing Documentation for This Model

3. 检查该模型的现有文档

bash
undefined
bash
undefined

Find existing schema.yml

查找现有的schema.yml

find . -name "schema.yml" -exec grep -l "<model_name>" {} ;
find . -name "schema.yml" -exec grep -l "<model_name>" {} ;

Read existing docs

阅读现有文档

cat models/<path>/schema.yml | grep -A 100 "<model_name>"
undefined
cat models/<path>/schema.yml | grep -A 100 "<model_name>"
undefined

4. Identify Documentation Needs

4. 确定文档需求

For each model, document:
  • Model description: Purpose, grain, key business rules
  • Column descriptions: Business meaning, not just data type
For each column, consider:
  • What business concept does this represent?
  • Are there any caveats or special values?
  • What is the source of this data?
针对每个模型,需记录:
  • 模型描述:用途、数据粒度、核心业务规则
  • 字段描述:业务含义,而非仅仅数据类型
针对每个字段,需考虑:
  • 它代表什么业务概念?
  • 是否有任何注意事项或特殊值?
  • 该数据的来源是什么?

5. Write Documentation

5. 编写文档

Match the style discovered in step 1. Example format (adapt to project):
yaml
version: 2

models:
  - name: orders
    description: |
      Order transactions at the order line item grain.
      Each row represents one product in one order.

      **Business Rules:**
      - Revenue recognized on ship_date, not order_date
      - Cancelled orders excluded (status != 'cancelled')
      - Returns processed as negative line items

      **Grain:** One row per order_id + product_id combination

    columns:
      - name: order_id
        description: |
          Unique identifier for the order.
          Source: orders.id from Stripe webhook

      - name: customer_id
        description: |
          Foreign key to customers table.
          NULL for guest checkouts (pre-2023 only)

      - name: revenue
        description: |
          Net revenue for this line item in USD.
          Calculation: unit_price * quantity - discount_amount
          Excludes tax and shipping

      - name: order_status
        description: |
          Current status of the order.
          Values: pending, processing, shipped, delivered, cancelled, returned
需匹配步骤1中发现的风格。示例格式(请根据项目调整):
yaml
version: 2

models:
  - name: orders
    description: |
      订单交易数据,粒度为订单行项目。
      每一行代表一个订单中的一件商品。

      **业务规则:**
      - 收入确认以发货日期(ship_date)为准,而非下单日期(order_date)
      - 已取消的订单被排除(status != 'cancelled')
      - 退货记录以负金额行项目处理

      **数据粒度:** 每一行对应一个order_id + product_id的组合

    columns:
      - name: order_id
        description: |
          订单的唯一标识符。
          来源:Stripe webhook中的orders.id

      - name: customer_id
        description: |
          关联至customers表的外键。
          仅2023年之前的访客结账记录为NULL

      - name: revenue
        description: |
          该行项目的净收入,单位为美元。
          计算方式:unit_price * quantity - discount_amount
          不包含税费和运费

      - name: order_status
        description: |
          订单的当前状态。
          可选值:pending, processing, shipped, delivered, cancelled, returned

6. Generate Docs

6. 生成文档

bash
dbt docs generate
dbt docs serve  # Optional: preview locally
bash
dbt docs generate
dbt docs serve  # 可选:本地预览

Documentation Patterns

文档模式

Note: These are default templates. Always adapt to match project's existing style.
注意:以下为默认模板,请务必根据项目现有风格调整。

Model Description Template

模型描述模板

yaml
description: |
  [One sentence: what this model contains]

  **Grain:** [What does one row represent?]

  **Business Rules:**
  - [Key rule 1]
  - [Key rule 2]

  **Caveats:**
  - [Important limitation or edge case]
yaml
description: |
  [一句话说明:该模型包含什么内容]

  **数据粒度:** [每一行代表什么?]

  **业务规则:**
  - [核心规则1]
  - [核心规则2]

  **注意事项:**
  - [重要限制或边缘情况]

Column Description Patterns

字段描述模式

Column TypeDocumentation Focus
Primary keySource system, uniqueness guarantee
Foreign keyWhat it joins to, NULL handling
MetricCalculation formula, units, exclusions
DateTimezone, what event it represents
Status/CategoryAll possible values, business meaning
Boolean/FlagWhat true/false means in business terms
字段类型文档重点
主键来源系统、唯一性保证
外键关联目标、NULL值处理方式
指标计算公式、单位、排除规则
日期时区、代表的事件
状态/分类所有可选值、业务含义
布尔/标记字段真/假对应的业务含义

Documenting Calculated Fields

计算字段的文档编写

yaml
- name: gross_margin
  description: |
    Gross margin percentage.
    Calculation: (revenue - cogs) / revenue * 100
    NULL when revenue = 0 to avoid division by zero
yaml
- name: gross_margin
  description: |
    毛利率百分比。
    计算方式:(revenue - cogs) / revenue * 100
    当revenue = 0时为NULL,以避免除零错误

Anti-Patterns

反模式

  • Adding documentation without checking existing project patterns
  • Using different formatting style than existing documentation
  • Describing WHAT (e.g., "The order ID") instead of WHY/context
  • Missing grain documentation
  • Not documenting NULL handling
  • Leaving columns undocumented
  • Copy-pasting column names as descriptions
  • 未检查现有项目模式就添加文档
  • 使用与现有文档不同的格式风格
  • 仅描述内容(如“订单ID”)而非原因/背景
  • 缺失数据粒度文档
  • 未记录NULL值处理方式
  • 留下未文档化的字段
  • 直接复制字段名作为描述