dbt

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

dbt Expert Engineer Skill

dbt资深工程师技能

This skill provides a comprehensive guide for dbt development.
本技能提供dbt开发的全面指南。

1. dbt Command Basics

1. dbt命令基础

1.1. Required Options

1.1. 必备选项

Always specify these options with dbt commands:
sh
--profiles-dir ~/.dbt --no-use-colors
执行dbt命令时请始终指定以下选项:
sh
--profiles-dir ~/.dbt --no-use-colors

1.2. Connection Verification

1.2. 连接验证

Always verify connection at work start:
sh
dbt debug --profiles-dir ~/.dbt --no-use-colors
每日工作开始时请务必验证连接:
sh
dbt debug --profiles-dir ~/.dbt --no-use-colors

1.3. Ad-hoc Query Execution

1.3. 临时查询执行

Use
dbt show
command for ad-hoc queries in dbt:
sh
undefined
使用
dbt show
命令在dbt中执行临时查询:
sh
undefined

Basic query execution

基础查询执行

dbt show --inline "select 1 as test, current_timestamp() as now" --profiles-dir ~/.dbt --no-use-colors
dbt show --inline "select 1 as test, current_timestamp() as now" --profiles-dir ~/.dbt --no-use-colors

Specify row limit (default is 5)

指定行限制(默认值为5)

dbt show --inline "select * from table_name" --limit 10 --profiles-dir ~/.dbt --no-use-colors
dbt show --inline "select * from table_name" --limit 10 --profiles-dir ~/.dbt --no-use-colors

Reference dbt models

引用dbt模型

dbt show --inline "select * from {{ ref('model_name') }}" --profiles-dir ~/.dbt --no-use-colors
dbt show --inline "select * from {{ ref('model_name') }}" --profiles-dir ~/.dbt --no-use-colors

Direct reference using catalog.schema.table format

使用catalog.schema.table格式直接引用

dbt show --inline "select * from catalog_name.schema_name.table_name" --limit 3 --profiles-dir ~/.dbt --no-use-colors

Notes:

- Explicit LIMIT in query conflicts with `--limit` option and causes error
- DDL commands (SHOW statements, etc.) cause syntax error due to auto-LIMIT
dbt show --inline "select * from catalog_name.schema_name.table_name" --limit 3 --profiles-dir ~/.dbt --no-use-colors

注意事项:

- 查询中显式的LIMIT与`--limit`选项冲突,会导致错误
- DDL命令(如SHOW语句等)会因自动添加LIMIT而引发语法错误

2. Verification Procedures

2. 验证流程

2.1. Verification When dbt run is Prohibited

2.1. 禁止执行dbt run时的验证

Verification steps when
dbt run
cannot be executed to avoid production impact:
  1. Edit model
  2. Generate SQL with
    dbt compile --profiles-dir ~/.dbt --no-use-colors
  3. Get generated SQL from
    target/compiled/
  4. Verify with
    bq query
    or
    databricks
    command (recommend using LIMIT)
当无法执行
dbt run
以避免影响生产环境时的验证步骤:
  1. 编辑模型
  2. 使用
    dbt compile --profiles-dir ~/.dbt --no-use-colors
    生成SQL
  3. target/compiled/
    目录获取生成的SQL
  4. 使用
    bq query
    databricks
    命令验证(建议使用LIMIT)

2.2. Verification When dbt run is Allowed

2.2. 允许执行dbt run时的验证

Verification steps when
dbt run
is allowed in development/sandbox environments:
  1. Edit model
  2. Execute
    dbt run --select +model_name --profiles-dir ~/.dbt --no-use-colors
  3. Execute
    dbt test --select +model_name --profiles-dir ~/.dbt --no-use-colors
  4. Query generated table directly if needed
Notes:
  • Use
    --select
    option to limit scope
  • For model AND tag conditions, use
    --select "staging.target,tag:tag_name"
在开发/沙箱环境中允许执行
dbt run
时的验证步骤:
  1. 编辑模型
  2. 执行
    dbt run --select +model_name --profiles-dir ~/.dbt --no-use-colors
  3. 执行
    dbt test --select +model_name --profiles-dir ~/.dbt --no-use-colors
  4. 如有需要,直接查询生成的表
注意事项:
  • 使用
    --select
    选项限制范围
  • 如需同时满足模型和标签条件,使用
    --select "staging.target,tag:tag_name"

3. Issue Work Target Setup

3. 问题专属Target配置

Always set up Issue-specific target before
dbt run
during Issue work.
处理问题时,执行
dbt run
前请务必配置问题专属的target。

3.1. Setup Procedure

3.1. 配置步骤

  1. Read
    ~/.dbt/profiles.yml
    and check existing settings
  2. Add Issue-specific target if not exists, based on existing
    dev
    target
yaml
my_databricks_dbt:
  outputs:
    dev:
      # Existing settings...
    issue_123: # Name based on issue number
      catalog: dbt_dev_{username} # Same as dev
      host: dbc-xxxxx.cloud.databricks.com # Same as dev
      http_path: /sql/1.0/warehouses/xxxxx # Same as dev
      schema: dwh_issue_123 # Include issue number in schema name
      threads: 1
      token: dapixxxxx # Same as dev
      type: databricks
  target: dev
Then switch with
--target
option when executing dbt commands
sh
undefined
  1. 读取
    ~/.dbt/profiles.yml
    并检查现有配置
  2. 基于现有的
    dev
    target添加问题专属的target(如不存在)
yaml
my_databricks_dbt:
  outputs:
    dev:
      # 现有配置...
    issue_123: # 基于问题编号命名
      catalog: dbt_dev_{username} # 与dev一致
      host: dbc-xxxxx.cloud.databricks.com # 与dev一致
      http_path: /sql/1.0/warehouses/xxxxx # 与dev一致
      schema: dwh_issue_123 # 架构名称中包含问题编号
      threads: 1
      token: dapixxxxx # 与dev一致
      type: databricks
  target: dev
然后在执行dbt命令时使用
--target
选项切换
sh
undefined

Execute with issue_123 target

使用issue_123 target执行

dbt run --select +model_name --target issue_123 --profiles-dir ~/.dbt --no-use-colors
dbt run --select +model_name --target issue_123 --profiles-dir ~/.dbt --no-use-colors

Verify connection

验证连接

dbt debug --target issue_123 --profiles-dir ~/.dbt --no-use-colors
undefined
dbt debug --target issue_123 --profiles-dir ~/.dbt --no-use-colors
undefined

3.2. Notes

3.2. 注意事项

  • Keep target name and schema name consistent with issue number
  • Manually delete unused schemas after work completion
  • Intermediate layer auto-generates as
    {schema}_dbt_intermediates
  • target名称与架构名称需与问题编号保持一致
  • 工作完成后手动删除未使用的架构
  • 中间层会自动生成为
    {schema}_dbt_intermediates

4. Databricks SQL Dialect

4. Databricks SQL方言

  • Full-width column names require backticks
  • Column names and catalog names with hyphens require backticks
sql
-- Reference catalog name with hyphen
select * from `catalog-name`.schema_name.table_name;

-- Reference full-width column name
select `full-width column` from table_name;
  • 全角列名需要使用反引号包裹
  • 包含连字符的列名和catalog名称需要使用反引号包裹
sql
-- 引用包含连字符的catalog名称
select * from `catalog-name`.schema_name.table_name;

-- 引用全角列名
select `full-width column` from table_name;

5. Response Format

5. 响应格式

text
[Documentation-based response]

Source: [source_url]
Fetched: [fetched_at]
text
[基于文档的响应]

来源:[source_url]
获取时间:[fetched_at]