migrating-sql-to-dbt

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

dbt Migration

dbt 迁移

Don't convert everything at once. Build and validate layer by layer.
不要一次性转换所有内容。逐层构建并验证。

Workflow

工作流

1. Analyze Legacy SQL

1. 分析遗留SQL

bash
cat <legacy_sql_file>
Identify all tables referenced in the query.
bash
cat <legacy_sql_file>
识别查询中引用的所有表。

2. Check What Already Exists

2. 检查现有资源

bash
undefined
bash
undefined

Search for existing models/sources that reference the table

搜索引用该表的现有模型/数据源

grep -r "<table_name>" models/ --include=".sql" --include=".yml" find models/ -name "*.sql" | xargs grep -l "<table_name>"

For each table referenced in the legacy SQL:
1. Check if an existing model already references this table
2. Check if a source definition exists
3. If neither exists, ask user: "Table X not found - should I create it as a source?"

Only proceed to intermediate/mart layers after all dependencies exist.
grep -r "<table_name>" models/ --include=".sql" --include=".yml" find models/ -name "*.sql" | xargs grep -l "<table_name>"

对于遗留SQL中引用的每个表:
1. 检查是否已有引用该表的模型
2. 检查是否存在数据源定义
3. 如果两者都不存在,询问用户:“未找到表X - 是否需要将其创建为数据源?”

只有在所有依赖项都存在后,再进行中间层/集市层的工作。

3. Create Missing Sources

3. 创建缺失的数据源

yaml
undefined
yaml
undefined

models/staging/sources.yml

models/staging/sources.yml

version: 2
sources:
  • name: raw_database schema: raw_schema tables:
    • name: orders description: Raw orders from source system
    • name: customers description: Raw customer records
undefined
version: 2
sources:
  • name: raw_database schema: raw_schema tables:
    • name: orders description: Raw orders from source system
    • name: customers description: Raw customer records
undefined

4. Build Staging Layer

4. 构建 staging 层

One staging model per source table. Follow existing project naming conventions.
Build before proceeding:
bash
dbt build --select <staging_model>
每个源表对应一个staging模型。遵循现有项目的命名约定。
构建完成后再继续:
bash
dbt build --select <staging_model>

5. Build Intermediate Layer (if needed)

5. 构建中间层(如有需要)

Extract complex joins/logic into intermediate models.
Build incrementally:
bash
dbt build --select <intermediate_model>
将复杂的关联/逻辑提取到中间模型中。
增量构建:
bash
dbt build --select <intermediate_model>

6. Build Mart Layer

6. 构建集市层

Final business-facing model with aggregations.
面向业务的最终模型,包含聚合逻辑。

7. Validate Migration

7. 验证迁移结果

bash
undefined
bash
undefined

Build entire lineage

构建整个血缘关系

dbt build --select +<final_model> dbt show --select <final_model>
undefined
dbt build --select +<final_model> dbt show --select <final_model>
undefined

Migration Checklist

迁移检查清单

  • All source tables identified and documented
  • Sources.yml created with descriptions
  • Staging models: 1:1 with sources, renamed columns
  • Intermediate models: business logic extracted
  • Mart models: final aggregations
  • Each layer compiles successfully
  • Each layer builds successfully
  • Row counts match original (manual validation)
  • Tests added for key constraints
  • 已识别并记录所有源表
  • 已创建包含描述的sources.yml
  • Staging模型:与源表一一对应,已重命名列
  • 中间模型:已提取业务逻辑
  • 集市模型:已完成最终聚合
  • 每个层级编译成功
  • 每个层级构建成功
  • 行数与原数据匹配(手动验证)
  • 已为关键约束添加测试

Common Migration Patterns

常见迁移模式

  • Nested subqueries → Separate models (staging → intermediate → mart)
  • Temp tables → Ephemeral materialization
    {{ config(materialized='ephemeral') }}
  • Hardcoded values → Variables
    {{ var("name") }}
  • 嵌套子查询 → 拆分模型(staging → 中间层 → 集市层)
  • 临时表 → 临时物化
    {{ config(materialized='ephemeral') }}
  • 硬编码值 → 变量
    {{ var("name") }}

Anti-Patterns

反模式

  • Converting entire legacy query to single dbt model
  • Skipping the staging layer
  • Not validating each layer before proceeding
  • Keeping hardcoded values instead of using variables
  • Not documenting business logic during migration
  • 将整个遗留查询转换为单个dbt模型
  • 跳过staging层
  • 未验证当前层级就直接进行下一步
  • 保留硬编码值而非使用变量
  • 迁移过程中未记录业务逻辑