using-dbt-for-analytics-engineering
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseUsing dbt for Analytics Engineering
使用dbt进行分析工程
Core principle: Apply software engineering discipline (DRY, modularity, testing) to data transformation work through dbt's abstraction layer.
核心原则: 通过dbt的抽象层,将软件工程准则(DRY、模块化、测试)应用于数据转换工作。
When to Use
适用场景
- Building new dbt models, sources, or tests
- Modifying existing model logic or configurations
- Refactoring a dbt project structure
- Creating analytics pipelines or data transformations
- Working with warehouse data that needs modeling
Do NOT use for:
- Querying the semantic layer (use the skill)
answering-natural-language-questions-with-dbt
- 构建新的dbt模型、数据源或测试
- 修改现有模型的逻辑或配置
- 重构dbt项目结构
- 创建分析管道或数据转换流程
- 处理需要建模的数仓数据
不适用场景:
- 查询语义层(请使用技能)
answering-natural-language-questions-with-dbt
Reference Guides
参考指南
This skill includes detailed reference guides for specific techniques. Read the relevant guide when needed:
| Guide | Use When |
|---|---|
| references/planning-dbt-models.md | Building new models - work backwards from desired output and use |
| references/discovering-data.md | Exploring unfamiliar sources or onboarding to a project |
| references/writing-data-tests.md | Adding tests - prioritize high-value tests over exhaustive coverage |
| references/debugging-dbt-errors.md | Fixing project parsing, compilation, or database errors |
| references/evaluating-impact-of-a-dbt-model-change.md | Assessing downstream effects before modifying models |
| references/writing-documentation.md | Write documentation that doesn't just restate the column name |
| references/managing-packages.md | Installing and managing dbt packages |
本技能包含针对特定技术的详细参考指南。按需阅读相关指南:
| 指南 | 适用场景 |
|---|---|
| references/planning-dbt-models.md | 构建新模型——从期望的输出倒推,并使用 |
| references/discovering-data.md | 探索不熟悉的数据源或参与新项目 |
| references/writing-data-tests.md | 添加测试——优先覆盖高价值场景,而非追求全面覆盖 |
| references/debugging-dbt-errors.md | 修复项目解析、编译或数据库错误 |
| references/evaluating-impact-of-a-dbt-model-change.md | 修改模型前评估下游影响 |
| references/writing-documentation.md | 编写不只是重复列名的文档 |
| references/managing-packages.md | 安装和管理dbt包 |
DAG building guidelines
DAG构建准则
- Conform to the existing style of a project (medallion layers, stage/intermediate/mart, etc)
- Focus heavily on DRY principles.
- Before adding a new model or column, always be sure that the same logic isn't already defined elsewhere that can be used.
- Prefer a change that requires you to add one column to an existing intermediate model over adding an entire additional model to the project.
When users request new models: Always ask "why a new model vs extending existing?" before proceeding. Legitimate reasons exist (different grain, precalculation for performance), but users often request new models out of habit. Your job is to surface the tradeoff, not blindly comply.
- 遵循项目现有风格(如medallion分层、stage/intermediate/mart结构等)
- 严格遵循DRY原则
- 添加新模型或列之前,务必确认相同逻辑未在其他地方定义
- 优先选择在现有中间模型中添加一列,而非在项目中新增完整模型
当用户请求新模型时: 开始前务必询问“为什么需要新模型而非扩展现有模型?”。确实存在合理理由(如不同粒度、性能预计算),但用户常因习惯提出请求。你的职责是明确权衡利弊,而非盲目执行。
Model building guidelines
模型构建准则
- Always use data modelling best practices when working in a project
- Follow dbt best practices in code:
- Always use and
{{ ref }}over hardcoded table names{{ source }} - Use CTEs over subqueries
- Always use
- Before building a model, follow references/planning-dbt-models.md to plan your approach.
- Before modifying or building on existing models, read their YAML documentation:
- Find the model's YAML file (can be any or
.ymlfile in the models directory, but normally colocated with the SQL file).yaml - Check the model's to understand its purpose
description - Read column-level fields to understand what each column represents
description - Review any properties that document business logic or ownership
meta - This context prevents misusing columns or duplicating existing logic
- Find the model's YAML file (can be any
- 在项目中工作时,始终遵循数据建模最佳实践
- 代码中遵循dbt最佳实践:
- 始终使用和
{{ ref }},而非硬编码表名{{ source }} - 使用CTE而非子查询
- 始终使用
- 构建模型前,遵循references/planning-dbt-models.md规划方法
- 修改或基于现有模型开发前,阅读其YAML文档:
- 找到模型的YAML文件(通常位于models目录下的或
.yml文件,一般与SQL文件同目录).yaml - 查看模型的以了解其用途
description - 阅读列级字段,理解每列的含义
description - 查看任何记录业务逻辑或归属的属性
meta - 这些上下文可避免误用列或重复现有逻辑
- 找到模型的YAML文件(通常位于models目录下的
You must look at the data to be able to correctly model the data
必须查看数据才能正确建模
When implementing a model, you must use regularly to:
dbt show- preview the input data you will work with, so that you use relevant columns and values
- preview the results of your model, so that you know your work is correct
- run basic data profiling (counts, min, max, nulls) of input and output data, to check for misconfigured joins or other logic errors
实现模型时,必须定期使用:
dbt show- 预览要处理的输入数据,确保使用相关列和值
- 预览模型结果,确认工作正确性
- 对输入和输出数据进行基础数据探查(计数、最小值、最大值、空值),检查连接配置错误或其他逻辑问题
Cost management best practices
成本管理最佳实践
- Use with
--limitand insert limits early into CTEs when exploring datadbt show - Use deferral () to reuse production objects
--defer --state path/to/prod/artifacts - Use to produce zero-copy clones
dbt clone - Avoid large unpartitioned table scans in BigQuery
- Always use instead of running the entire project
--select
- 探索数据时,在中使用
dbt show,并在CTE早期添加限制条件--limit - 使用延迟加载()复用生产对象
--defer --state path/to/prod/artifacts - 使用生成零拷贝克隆
dbt clone - 在BigQuery中避免大型未分区表扫描
- 始终使用而非运行整个项目
--select
Interacting with the CLI
与CLI交互
- You will be working in a terminal environment where you have access to the dbt CLI, and potentially the dbt MCP server. The MCP server may include access to the dbt Cloud platform's APIs if relevant.
- You should prefer working with the dbt MCP server's tools, and help the user install and onboard the MCP when appropriate.
- 你将在终端环境中工作,可访问dbt CLI,可能还有dbt MCP服务器。MCP服务器可能包含对dbt Cloud平台API的访问权限(如相关)
- 优先使用dbt MCP服务器的工具,并在适当时帮助用户安装和接入MCP
Common Mistakes
常见错误
| Mistake | Why It's Wrong | Fix |
|---|---|---|
| One-shotting models | Data work requires validation; schemas are unknown | Follow references/planning-dbt-models.md, iterate with |
| Not working iteratively | Changes to multiple models at once makes it hard to debug | Run |
| Assuming schema knowledge | Column names, types, and values vary across warehouses | Follow references/discovering-data.md before writing SQL |
| Not reading existing model documentation | Column names don't reveal business meaning | Read YAML descriptions before modifying models |
| Creating unnecessary models | Warehouse compute has real costs | Extend existing models when possible |
| Hardcoding table names | Breaks dbt's dependency graph | Always use |
| Global config changes | Configuration cascades unexpectedly | Change surgically, match existing patterns |
| Running DDL directly | Bypasses dbt's abstraction and tracking | Use dbt commands exclusively |
| 错误 | 原因 | 解决方法 |
|---|---|---|
| 一次性构建模型 | 数据工作需要验证, schema未知 | 遵循references/planning-dbt-models.md,使用 |
| 非迭代开发 | 同时修改多个模型会增加调试难度 | 修改后对每个模型运行 |
| 假设已知schema | 不同数仓的列名、类型和值存在差异 | 编写SQL前遵循references/discovering-data.md |
| 未阅读现有模型文档 | 列名无法体现业务含义 | 修改模型前阅读YAML描述 |
| 创建不必要的模型 | 数仓计算有实际成本 | 尽可能扩展现有模型 |
| 硬编码表名 | 破坏dbt的依赖关系图 | 始终使用 |
| 全局配置变更 | 配置会意外级联影响 | 精准修改,匹配现有模式 |
| 直接运行DDL | 绕过dbt的抽象和跟踪 | 仅使用dbt命令 |
Rationalizations to Resist
需要抵制的合理化借口
| Excuse | Reality |
|---|---|
| "User explicitly asked for a new model" | Users request out of habit. Ask why before complying. |
| "I've done this pattern hundreds of times" | This project's schema may differ. Verify with |
| "User is senior / knows what they're doing" | Seniority doesn't change compute costs. Surface tradeoffs. |
| "It's just a small change" | Small changes compound. Follow DRY principles. |
| 借口 | 实际情况 |
|---|---|
| “用户明确要求新模型” | 用户常因习惯提出请求,执行前先询问原因 |
| “我已经做过几百次这个模式” | 不同项目的schema可能不同,需用 |
| “用户是资深人员/知道自己在做什么” | 资历不改变计算成本,需明确权衡利弊 |
| “只是一个小改动” | 小改动会累积,遵循DRY原则 |
Red Flags - STOP and Reconsider
危险信号——停止并重新考虑
- About to write SQL without checking actual column names
- Modifying a model without reading its YAML documentation first
- Creating a new model when a column addition would suffice
- User gave table names as "the usual columns" - verify anyway
- Skipping validation because "it's straightforward"
dbt show - Running DDL or queries directly against the warehouse
- 未检查实际列名就准备编写SQL
- 未阅读YAML文档就修改模型
- 可通过添加列实现却要创建新模型
- 用户称表名是“常用列”——仍需验证
- 因“逻辑简单”跳过验证
dbt show - 直接对数仓运行DDL或查询