migrating-sql-to-dbt
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
Chinesedbt Migration
dbt 迁移
Don't convert everything at once. Build and validate layer by layer.
不要一次性转换所有内容。逐层构建并验证。
Workflow
工作流
1. Analyze Legacy SQL
1. 分析遗留SQL
bash
cat <legacy_sql_file>Identify all tables referenced in the query.
bash
cat <legacy_sql_file>识别查询中引用的所有表。
2. Check What Already Exists
2. 检查现有资源
bash
undefinedbash
undefinedSearch for existing models/sources that reference the table
搜索引用该表的现有模型/数据源
grep -r "<table_name>" models/ --include=".sql" --include=".yml"
find models/ -name "*.sql" | xargs grep -l "<table_name>"
For each table referenced in the legacy SQL:
1. Check if an existing model already references this table
2. Check if a source definition exists
3. If neither exists, ask user: "Table X not found - should I create it as a source?"
Only proceed to intermediate/mart layers after all dependencies exist.grep -r "<table_name>" models/ --include=".sql" --include=".yml"
find models/ -name "*.sql" | xargs grep -l "<table_name>"
对于遗留SQL中引用的每个表:
1. 检查是否已有引用该表的模型
2. 检查是否存在数据源定义
3. 如果两者都不存在,询问用户:“未找到表X - 是否需要将其创建为数据源?”
只有在所有依赖项都存在后,再进行中间层/集市层的工作。3. Create Missing Sources
3. 创建缺失的数据源
yaml
undefinedyaml
undefinedmodels/staging/sources.yml
models/staging/sources.yml
version: 2
sources:
- name: raw_database
schema: raw_schema
tables:
- name: orders description: Raw orders from source system
- name: customers description: Raw customer records
undefinedversion: 2
sources:
- name: raw_database
schema: raw_schema
tables:
- name: orders description: Raw orders from source system
- name: customers description: Raw customer records
undefined4. Build Staging Layer
4. 构建 staging 层
One staging model per source table. Follow existing project naming conventions.
Build before proceeding:
bash
dbt build --select <staging_model>每个源表对应一个staging模型。遵循现有项目的命名约定。
构建完成后再继续:
bash
dbt build --select <staging_model>5. Build Intermediate Layer (if needed)
5. 构建中间层(如有需要)
Extract complex joins/logic into intermediate models.
Build incrementally:
bash
dbt build --select <intermediate_model>将复杂的关联/逻辑提取到中间模型中。
增量构建:
bash
dbt build --select <intermediate_model>6. Build Mart Layer
6. 构建集市层
Final business-facing model with aggregations.
面向业务的最终模型,包含聚合逻辑。
7. Validate Migration
7. 验证迁移结果
bash
undefinedbash
undefinedBuild entire lineage
构建整个血缘关系
dbt build --select +<final_model>
dbt show --select <final_model>
undefineddbt build --select +<final_model>
dbt show --select <final_model>
undefinedMigration Checklist
迁移检查清单
- All source tables identified and documented
- Sources.yml created with descriptions
- Staging models: 1:1 with sources, renamed columns
- Intermediate models: business logic extracted
- Mart models: final aggregations
- Each layer compiles successfully
- Each layer builds successfully
- Row counts match original (manual validation)
- Tests added for key constraints
- 已识别并记录所有源表
- 已创建包含描述的sources.yml
- Staging模型:与源表一一对应,已重命名列
- 中间模型:已提取业务逻辑
- 集市模型:已完成最终聚合
- 每个层级编译成功
- 每个层级构建成功
- 行数与原数据匹配(手动验证)
- 已为关键约束添加测试
Common Migration Patterns
常见迁移模式
- Nested subqueries → Separate models (staging → intermediate → mart)
- Temp tables → Ephemeral materialization
{{ config(materialized='ephemeral') }} - Hardcoded values → Variables
{{ var("name") }}
- 嵌套子查询 → 拆分模型(staging → 中间层 → 集市层)
- 临时表 → 临时物化
{{ config(materialized='ephemeral') }} - 硬编码值 → 变量
{{ var("name") }}
Anti-Patterns
反模式
- Converting entire legacy query to single dbt model
- Skipping the staging layer
- Not validating each layer before proceeding
- Keeping hardcoded values instead of using variables
- Not documenting business logic during migration
- 将整个遗留查询转换为单个dbt模型
- 跳过staging层
- 未验证当前层级就直接进行下一步
- 保留硬编码值而非使用变量
- 迁移过程中未记录业务逻辑