documenting-dbt-models
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
Chinesedbt Documentation
dbt 文档编写
Document the WHY, not just the WHAT. Include grain, business rules, and caveats.
要记录原因,而非仅仅记录内容。需包含数据粒度、业务规则和注意事项。
Workflow
工作流程
1. Study Existing Documentation Patterns
1. 研究现有文档模式
CRITICAL: Match the project's documentation style before adding new docs.
bash
undefined重要提示:添加新文档前需匹配项目的现有文档风格。
bash
undefinedFind all schema.yml files with documentation
查找所有包含文档的schema.yml文件
find . -name "schema.yml" | head -5
find . -name "schema.yml" | head -5
Read well-documented models to learn patterns
阅读编写完善的模型以学习模式
cat models/marts/schema.yml | head -150
cat models/staging/schema.yml | head -150
**Extract from existing documentation:**
- Description length (brief vs detailed)
- Formatting style (plain text vs markdown with headers)
- Information included (grain? business rules? caveats?)
- Column description depth (all columns vs key columns)
- Use of meta tags or custom propertiescat models/marts/schema.yml | head -150
cat models/staging/schema.yml | head -150
**从现有文档中提取以下信息:**
- 描述长度(简洁型 vs 详细型)
- 格式风格(纯文本 vs 带标题的Markdown)
- 包含的信息(数据粒度?业务规则?注意事项?)
- 字段描述的详细程度(所有字段 vs 关键字段)
- 元标签或自定义属性的使用情况2. Read Model SQL
2. 阅读模型SQL代码
bash
cat models/<path>/<model_name>.sqlUnderstand: transformations, business logic, joins, filters.
bash
cat models/<path>/<model_name>.sql理解:数据转换逻辑、业务规则、关联操作、过滤条件。
3. Check Existing Documentation for This Model
3. 检查该模型的现有文档
bash
undefinedbash
undefinedFind existing schema.yml
查找现有的schema.yml
find . -name "schema.yml" -exec grep -l "<model_name>" {} ;
find . -name "schema.yml" -exec grep -l "<model_name>" {} ;
Read existing docs
阅读现有文档
cat models/<path>/schema.yml | grep -A 100 "<model_name>"
undefinedcat models/<path>/schema.yml | grep -A 100 "<model_name>"
undefined4. Identify Documentation Needs
4. 确定文档需求
For each model, document:
- Model description: Purpose, grain, key business rules
- Column descriptions: Business meaning, not just data type
For each column, consider:
- What business concept does this represent?
- Are there any caveats or special values?
- What is the source of this data?
针对每个模型,需记录:
- 模型描述:用途、数据粒度、核心业务规则
- 字段描述:业务含义,而非仅仅数据类型
针对每个字段,需考虑:
- 它代表什么业务概念?
- 是否有任何注意事项或特殊值?
- 该数据的来源是什么?
5. Write Documentation
5. 编写文档
Match the style discovered in step 1. Example format (adapt to project):
yaml
version: 2
models:
- name: orders
description: |
Order transactions at the order line item grain.
Each row represents one product in one order.
**Business Rules:**
- Revenue recognized on ship_date, not order_date
- Cancelled orders excluded (status != 'cancelled')
- Returns processed as negative line items
**Grain:** One row per order_id + product_id combination
columns:
- name: order_id
description: |
Unique identifier for the order.
Source: orders.id from Stripe webhook
- name: customer_id
description: |
Foreign key to customers table.
NULL for guest checkouts (pre-2023 only)
- name: revenue
description: |
Net revenue for this line item in USD.
Calculation: unit_price * quantity - discount_amount
Excludes tax and shipping
- name: order_status
description: |
Current status of the order.
Values: pending, processing, shipped, delivered, cancelled, returned需匹配步骤1中发现的风格。示例格式(请根据项目调整):
yaml
version: 2
models:
- name: orders
description: |
订单交易数据,粒度为订单行项目。
每一行代表一个订单中的一件商品。
**业务规则:**
- 收入确认以发货日期(ship_date)为准,而非下单日期(order_date)
- 已取消的订单被排除(status != 'cancelled')
- 退货记录以负金额行项目处理
**数据粒度:** 每一行对应一个order_id + product_id的组合
columns:
- name: order_id
description: |
订单的唯一标识符。
来源:Stripe webhook中的orders.id
- name: customer_id
description: |
关联至customers表的外键。
仅2023年之前的访客结账记录为NULL
- name: revenue
description: |
该行项目的净收入,单位为美元。
计算方式:unit_price * quantity - discount_amount
不包含税费和运费
- name: order_status
description: |
订单的当前状态。
可选值:pending, processing, shipped, delivered, cancelled, returned6. Generate Docs
6. 生成文档
bash
dbt docs generate
dbt docs serve # Optional: preview locallybash
dbt docs generate
dbt docs serve # 可选:本地预览Documentation Patterns
文档模式
Note: These are default templates. Always adapt to match project's existing style.
注意:以下为默认模板,请务必根据项目现有风格调整。
Model Description Template
模型描述模板
yaml
description: |
[One sentence: what this model contains]
**Grain:** [What does one row represent?]
**Business Rules:**
- [Key rule 1]
- [Key rule 2]
**Caveats:**
- [Important limitation or edge case]yaml
description: |
[一句话说明:该模型包含什么内容]
**数据粒度:** [每一行代表什么?]
**业务规则:**
- [核心规则1]
- [核心规则2]
**注意事项:**
- [重要限制或边缘情况]Column Description Patterns
字段描述模式
| Column Type | Documentation Focus |
|---|---|
| Primary key | Source system, uniqueness guarantee |
| Foreign key | What it joins to, NULL handling |
| Metric | Calculation formula, units, exclusions |
| Date | Timezone, what event it represents |
| Status/Category | All possible values, business meaning |
| Boolean/Flag | What true/false means in business terms |
| 字段类型 | 文档重点 |
|---|---|
| 主键 | 来源系统、唯一性保证 |
| 外键 | 关联目标、NULL值处理方式 |
| 指标 | 计算公式、单位、排除规则 |
| 日期 | 时区、代表的事件 |
| 状态/分类 | 所有可选值、业务含义 |
| 布尔/标记字段 | 真/假对应的业务含义 |
Documenting Calculated Fields
计算字段的文档编写
yaml
- name: gross_margin
description: |
Gross margin percentage.
Calculation: (revenue - cogs) / revenue * 100
NULL when revenue = 0 to avoid division by zeroyaml
- name: gross_margin
description: |
毛利率百分比。
计算方式:(revenue - cogs) / revenue * 100
当revenue = 0时为NULL,以避免除零错误Anti-Patterns
反模式
- Adding documentation without checking existing project patterns
- Using different formatting style than existing documentation
- Describing WHAT (e.g., "The order ID") instead of WHY/context
- Missing grain documentation
- Not documenting NULL handling
- Leaving columns undocumented
- Copy-pasting column names as descriptions
- 未检查现有项目模式就添加文档
- 使用与现有文档不同的格式风格
- 仅描述内容(如“订单ID”)而非原因/背景
- 缺失数据粒度文档
- 未记录NULL值处理方式
- 留下未文档化的字段
- 直接复制字段名作为描述