using-dbt-for-analytics-engineering

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Using dbt for Analytics Engineering

使用dbt进行分析工程

Core principle: Apply software engineering discipline (DRY, modularity, testing) to data transformation work through dbt's abstraction layer.

核心原则： 通过dbt的抽象层，将软件工程准则（DRY、模块化、测试）应用于数据转换工作。

When to Use

适用场景

Building new dbt models, sources, or tests
Modifying existing model logic or configurations
Refactoring a dbt project structure
Creating analytics pipelines or data transformations
Working with warehouse data that needs modeling

Do NOT use for:

Querying the semantic layer (use the

answering-natural-language-questions-with-dbt

skill)

构建新的dbt模型、数据源或测试
修改现有模型的逻辑或配置
重构dbt项目结构
创建分析管道或数据转换流程
处理需要建模的数仓数据

不适用场景：

查询语义层（请使用

answering-natural-language-questions-with-dbt

技能）

Reference Guides

参考指南

This skill includes detailed reference guides for specific techniques. Read the relevant guide when needed:

Guide	Use When
references/planning-dbt-models.md	Building new models - work backwards from desired output and use `dbt show` to validate results
references/discovering-data.md	Exploring unfamiliar sources or onboarding to a project
references/writing-data-tests.md	Adding tests - prioritize high-value tests over exhaustive coverage
references/debugging-dbt-errors.md	Fixing project parsing, compilation, or database errors
references/evaluating-impact-of-a-dbt-model-change.md	Assessing downstream effects before modifying models
references/writing-documentation.md	Write documentation that doesn't just restate the column name
references/managing-packages.md	Installing and managing dbt packages

本技能包含针对特定技术的详细参考指南。按需阅读相关指南：

指南	适用场景
references/planning-dbt-models.md	构建新模型——从期望的输出倒推，并使用 `dbt show` 验证结果
references/discovering-data.md	探索不熟悉的数据源或参与新项目
references/writing-data-tests.md	添加测试——优先覆盖高价值场景，而非追求全面覆盖
references/debugging-dbt-errors.md	修复项目解析、编译或数据库错误
references/evaluating-impact-of-a-dbt-model-change.md	修改模型前评估下游影响
references/writing-documentation.md	编写不只是重复列名的文档
references/managing-packages.md	安装和管理dbt包

DAG building guidelines

DAG构建准则

Conform to the existing style of a project (medallion layers, stage/intermediate/mart, etc)
Focus heavily on DRY principles.
- Before adding a new model or column, always be sure that the same logic isn't already defined elsewhere that can be used.
- Prefer a change that requires you to add one column to an existing intermediate model over adding an entire additional model to the project.

When users request new models: Always ask "why a new model vs extending existing?" before proceeding. Legitimate reasons exist (different grain, precalculation for performance), but users often request new models out of habit. Your job is to surface the tradeoff, not blindly comply.

遵循项目现有风格（如medallion分层、stage/intermediate/mart结构等）
严格遵循DRY原则
- 添加新模型或列之前，务必确认相同逻辑未在其他地方定义
- 优先选择在现有中间模型中添加一列，而非在项目中新增完整模型

当用户请求新模型时： 开始前务必询问“为什么需要新模型而非扩展现有模型？”。确实存在合理理由（如不同粒度、性能预计算），但用户常因习惯提出请求。你的职责是明确权衡利弊，而非盲目执行。

Model building guidelines

模型构建准则

Always use data modelling best practices when working in a project
Follow dbt best practices in code:
- Always use
```
{{ ref }}
```
  and
```
{{ source }}
```
  over hardcoded table names
- Use CTEs over subqueries
Before building a model, follow references/planning-dbt-models.md to plan your approach.
Before modifying or building on existing models, read their YAML documentation:
- Find the model's YAML file (can be any
```
.yml
```
  or
```
.yaml
```
  file in the models directory, but normally colocated with the SQL file)
- Check the model's
```
description
```
  to understand its purpose
- Read column-level
```
description
```
  fields to understand what each column represents
- Review any
```
meta
```
  properties that document business logic or ownership
- This context prevents misusing columns or duplicating existing logic

在项目中工作时，始终遵循数据建模最佳实践
代码中遵循dbt最佳实践：
- 始终使用
```
{{ ref }}
```
  和
```
{{ source }}
```
  ，而非硬编码表名
- 使用CTE而非子查询
构建模型前，遵循references/planning-dbt-models.md规划方法
修改或基于现有模型开发前，阅读其YAML文档：
- 找到模型的YAML文件（通常位于models目录下的
```
.yml
```
  或
```
.yaml
```
  文件，一般与SQL文件同目录）
- 查看模型的
```
description
```
  以了解其用途
- 阅读列级
```
description
```
  字段，理解每列的含义
- 查看任何记录业务逻辑或归属的
```
meta
```
  属性
- 这些上下文可避免误用列或重复现有逻辑

You must look at the data to be able to correctly model the data

必须查看数据才能正确建模

When implementing a model, you must use

dbt show

regularly to:

preview the input data you will work with, so that you use relevant columns and values
preview the results of your model, so that you know your work is correct
run basic data profiling (counts, min, max, nulls) of input and output data, to check for misconfigured joins or other logic errors

实现模型时，必须定期使用

dbt show

：

预览要处理的输入数据，确保使用相关列和值
预览模型结果，确认工作正确性
对输入和输出数据进行基础数据探查（计数、最小值、最大值、空值），检查连接配置错误或其他逻辑问题

Cost management best practices

成本管理最佳实践

Use
```
--limit
```
with
```
dbt show
```
and insert limits early into CTEs when exploring data
Use deferral (
```
--defer --state path/to/prod/artifacts
```
) to reuse production objects
Use
```
dbt clone
```
to produce zero-copy clones
Avoid large unpartitioned table scans in BigQuery
Always use
```
--select
```
instead of running the entire project

探索数据时，在
```
dbt show
```
中使用
```
--limit
```
，并在CTE早期添加限制条件
使用延迟加载（
```
--defer --state path/to/prod/artifacts
```
）复用生产对象
使用
```
dbt clone
```
生成零拷贝克隆
在BigQuery中避免大型未分区表扫描
始终使用
```
--select
```
而非运行整个项目

Interacting with the CLI

与CLI交互

You will be working in a terminal environment where you have access to the dbt CLI, and potentially the dbt MCP server. The MCP server may include access to the dbt Cloud platform's APIs if relevant.
You should prefer working with the dbt MCP server's tools, and help the user install and onboard the MCP when appropriate.

你将在终端环境中工作，可访问dbt CLI，可能还有dbt MCP服务器。MCP服务器可能包含对dbt Cloud平台API的访问权限（如相关）
优先使用dbt MCP服务器的工具，并在适当时帮助用户安装和接入MCP

Common Mistakes

常见错误

Mistake	Why It's Wrong	Fix
One-shotting models	Data work requires validation; schemas are unknown	Follow references/planning-dbt-models.md, iterate with `dbt show`
Not working iteratively	Changes to multiple models at once makes it hard to debug	Run `dbt build --select changed_model` on each model after modifying it
Assuming schema knowledge	Column names, types, and values vary across warehouses	Follow references/discovering-data.md before writing SQL
Not reading existing model documentation	Column names don't reveal business meaning	Read YAML descriptions before modifying models
Creating unnecessary models	Warehouse compute has real costs	Extend existing models when possible
Hardcoding table names	Breaks dbt's dependency graph	Always use `{{ ref() }}` and `{{ source() }}`
Global config changes	Configuration cascades unexpectedly	Change surgically, match existing patterns
Running DDL directly	Bypasses dbt's abstraction and tracking	Use dbt commands exclusively

错误	原因	解决方法
一次性构建模型	数据工作需要验证， schema未知	遵循references/planning-dbt-models.md，使用 `dbt show` 迭代开发
非迭代开发	同时修改多个模型会增加调试难度	修改后对每个模型运行 `dbt build --select changed_model`
假设已知schema	不同数仓的列名、类型和值存在差异	编写SQL前遵循references/discovering-data.md
未阅读现有模型文档	列名无法体现业务含义	修改模型前阅读YAML描述
创建不必要的模型	数仓计算有实际成本	尽可能扩展现有模型
硬编码表名	破坏dbt的依赖关系图	始终使用 `{{ ref() }}` 和 `{{ source() }}`
全局配置变更	配置会意外级联影响	精准修改，匹配现有模式
直接运行DDL	绕过dbt的抽象和跟踪	仅使用dbt命令

Rationalizations to Resist

需要抵制的合理化借口

Excuse	Reality
"User explicitly asked for a new model"	Users request out of habit. Ask why before complying.
"I've done this pattern hundreds of times"	This project's schema may differ. Verify with `dbt show` .
"User is senior / knows what they're doing"	Seniority doesn't change compute costs. Surface tradeoffs.
"It's just a small change"	Small changes compound. Follow DRY principles.

借口	实际情况
“用户明确要求新模型”	用户常因习惯提出请求，执行前先询问原因
“我已经做过几百次这个模式”	不同项目的schema可能不同，需用 `dbt show` 验证
“用户是资深人员/知道自己在做什么”	资历不改变计算成本，需明确权衡利弊
“只是一个小改动”	小改动会累积，遵循DRY原则

Red Flags - STOP and Reconsider

危险信号——停止并重新考虑

About to write SQL without checking actual column names
Modifying a model without reading its YAML documentation first
Creating a new model when a column addition would suffice
User gave table names as "the usual columns" - verify anyway
Skipping
```
dbt show
```
validation because "it's straightforward"
Running DDL or queries directly against the warehouse

未检查实际列名就准备编写SQL
未阅读YAML文档就修改模型
可通过添加列实现却要创建新模型
用户称表名是“常用列”——仍需验证
因“逻辑简单”跳过
```
dbt show
```
验证
直接对数仓运行DDL或查询