motherduck-model-data

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Model Data in MotherDuck

在MotherDuck中建模数据

Use this skill when creating data models, tables, designing schemas, choosing data types, defining relationships between tables, or restructuring data for analytical workloads.
当你需要创建数据模型、表、设计模式、选择数据类型、定义表间关系或为分析工作负载重构数据时,可以使用此技能。

Core Behavior

核心行为

When a user asks questions like "build a data model", "model my data", or "create a transformation layer", the default output is a file-based project scaffold — not just SQL executed directly in the warehouse.
The project scaffold includes:
  • SQL files organized by lifecycle stage (
    raw/
    ,
    staging/
    ,
    analytics/
    )
  • A manifest (
    model_manifest.yml
    ) defining the DAG: model names, dependencies, materialization strategy, and target database
This is a lightweight framework-agnostic convention for organizing SQL transformations that can be reviewed, versioned, and rerun.
当用户提出诸如“构建数据模型”、“为我的数据建模”或“创建转换层”之类的请求时,默认输出是基于文件的项目脚手架——而非直接在数据仓库中执行的SQL语句。
项目脚手架包含:
  • SQL文件:按生命周期阶段(
    raw/
    staging/
    analytics/
    )组织
  • 清单文件
    model_manifest.yml
    ):定义DAG(有向无环图),包括模型名称、依赖关系、物化策略和目标数据库
这是一个轻量级、与框架无关的SQL转换组织规范,支持代码评审、版本控制和重复运行。

Prerequisites

前置条件

  • MotherDuck connection established via
    motherduck-connect
  • Existing source shape understood via
    motherduck-explore
  • DuckDB SQL syntax available via
    motherduck-duckdb-sql
  • 已通过
    motherduck-connect
    建立MotherDuck连接
  • 已通过
    motherduck-explore
    了解现有数据源结构
  • 已掌握
    motherduck-duckdb-sql
    提供的DuckDB SQL语法

Default Posture

默认准则

  • Design for analytical reads, not transactional writes.
  • Prefer wide denormalized tables and pre-aggregated serving tables over highly normalized OLTP-style schemas.
  • Use fully qualified names and add comments to tables and columns.
  • Use
    NOT NULL
    aggressively; do not assume primary keys or foreign keys are enforced.
  • Reuse an existing dbt, SQLMesh, or repo-local modeling convention when one is already present; create the lightweight scaffold only when there is no established project shape.
  • Separate
    raw
    ,
    staging
    , and
    analytics
    lifecycle stages when the project is non-trivial.
  • Always produce SQL files — never execute transformations directly in the warehouse without first writing them to files.
  • Always produce a manifest — every model must declare its dependencies so the DAG is explicit and reproducible.
  • 针对分析型读取场景设计,而非事务型写入场景。
  • 优先选择宽表、预聚合的服务表,而非高度规范化的OLTP风格模式。
  • 使用全限定名称,并为表和字段添加注释。
  • 积极使用
    NOT NULL
    约束;不假设主键或外键会被强制生效。
  • 若已有dbt、SQLMesh或本地仓库建模规范,则复用现有规范;仅在无既定项目结构时创建轻量级脚手架。
  • 当项目非 trivial 时,分离
    raw
    staging
    analytics
    生命周期阶段。
  • 始终生成SQL文件——绝不直接在数据仓库中执行转换而不先写入文件。
  • 始终生成清单文件——每个模型必须声明其依赖关系,确保DAG清晰可复现。

Workflow

工作流程

  1. Inspect the current source tables and actual column types before designing new models.
  2. Choose the target lifecycle stage and grain for each modeled table. Map dependencies between models.
  3. Create the project directory structure with SQL files and manifest.
  4. Author each model as a standalone SQL file. Use explicit types, nullability, comments, and fully qualified names. Decide between a table, CTAS rebuild, or view based on freshness and cost.
  5. Fill in the manifest with model metadata: name, path, stage, materialization, database, and
    depends_on
    references.
  6. Run the models against the warehouse and verify the resulting tables match expected grain and row counts. If MCP is the runner, DDL or CTAS execution uses
    query_rw
    only after explicit user approval; the default deliverable remains checked-in SQL files plus the manifest.
  1. 在设计新模型前,检查当前源表和实际字段类型。
  2. 为每个建模表选择目标生命周期阶段和粒度,映射模型间的依赖关系。
  3. 创建包含SQL文件和清单文件的项目目录结构。
  4. 将每个模型编写为独立的SQL文件。使用显式类型、空值约束、注释和全限定名称。根据数据新鲜度和成本决定使用表、CTAS重建还是视图。
  5. 在清单文件中填写模型元数据:名称、路径、阶段、物化方式、数据库以及
    depends_on
    引用。
  6. 在数据仓库中运行模型并验证生成的表是否符合预期粒度和行数。若由MCP执行,DDL或CTAS操作仅在用户明确批准后使用
    query_rw
    ;默认交付物为已签入的SQL文件加清单文件。

Expected Project Structure

预期项目结构

<project-name>/
  models/
    raw/
      raw_<entity>.sql           -- DDL for raw landing tables
    staging/
      stg_<entity>.sql           -- Deduplicated, typed, filtered
    analytics/
      dim_<entity>.sql           -- Dimension tables
      fct_<entity>.sql           -- Fact / metric tables
  model_manifest.yml             -- DAG: names, deps, materialization
<project-name>/
  models/
    raw/
      raw_<entity>.sql           -- 原始落地表的DDL
    staging/
      stg_<entity>.sql           -- 去重、类型化、过滤后的表
    analytics/
      dim_<entity>.sql           -- 维度表
      fct_<entity>.sql           -- 事实/指标表
  model_manifest.yml             -- DAG:名称、依赖、物化方式

When to Skip the Scaffold

何时跳过脚手架

If the user explicitly asks for a single table, a quick DDL statement, or an ad-hoc exploration query, produce the SQL directly. The scaffold is the default for modeling work — multi-table, multi-stage transformations with dependencies.
如果用户明确要求单个表、快速DDL语句或临时探索查询,则直接生成SQL。脚手架是建模工作的默认交付物——即涉及多表、多阶段且存在依赖关系的转换工作。

Open Next

后续参考

  • references/MODELING_PLAYBOOK.md
    for schema patterns, data-type guidance, CTAS/view decisions, complex types, constraints, project scaffold conventions, and common modeling mistakes
  • references/MODELING_PLAYBOOK.md
    :包含模式模式、数据类型指南、CTAS/视图决策、复杂类型、约束、项目脚手架规范以及常见建模错误

Related Skills

相关技能

  • motherduck-duckdb-sql
    for type syntax and function details
  • motherduck-query
    for executing DDL, rebuilds, and validation queries
  • motherduck-explore
    for understanding the source schema before remodeling
  • motherduck-load-data
    for ingestion paths that feed the modeled tables
  • motherduck-duckdb-sql
    :获取类型语法和函数细节
  • motherduck-query
    :执行DDL、重建和验证查询
  • motherduck-explore
    :在重构模型前了解源模式
  • motherduck-load-data
    :为建模表提供数据 ingestion 路径