wren-dlt-connector

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

wren-dlt-connector

Connect SaaS data to Wren Engine for SQL analysis — from zero to a verified, queryable project in one conversation.

将SaaS数据连接至Wren Engine以进行SQL分析——只需一次对话，即可从零搭建出可验证、可查询的项目。

Who this is for

适用人群

Data analysts who know SQL and some Python, but may not have used dlt or Wren before. Explain concepts briefly when they first appear, but don't over-explain things a SQL-literate person would already know.

熟悉SQL和基础Python，但可能从未使用过dlt或Wren的数据分析师。首次出现相关概念时可简要解释，但无需过度讲解懂SQL的用户已了解的内容。

Overview

流程概述

This skill walks through a four-phase workflow:

Extract — Use dlt (data load tool) to pull data from a SaaS API into a local DuckDB file
Model — Introspect the DuckDB schema and auto-generate a Wren semantic project (YAML models, relationships, profile)
Build & Verify — Build the project and run actual SQL queries to confirm everything works end-to-end
Handoff — Show the user their data and next steps

The user might enter at any phase. Ask which phase they're starting from — they may already have a

.duckdb

file and just need phases 2–4.

The goal is a project that actually queries successfully, not just files that look correct. Always run the verification step before declaring success.

此技能会引导用户完成四阶段工作流：

提取 — 使用dlt（data load tool）从SaaS API拉取数据到本地DuckDB文件
建模 — 探查DuckDB的Schema并自动生成Wren语义项目（YAML模型、关联关系、配置文件）
构建与验证 — 构建项目并运行实际SQL查询，确认端到端流程正常运行
交付 — 向用户展示其数据及后续步骤

用户可能从任意阶段开始操作。询问用户的起始阶段——他们可能已经拥有

.duckdb

文件，只需完成第2至第4阶段。

**目标是打造一个能成功执行查询的项目，而非仅生成看似正确的文件。**在宣布成功前务必执行验证步骤。

Critical: DuckDB catalog naming

关键注意事项：DuckDB catalog命名规则

When wren engine connects to a DuckDB file, it ATTACHes it using the filename (without

.duckdb

extension) as the catalog alias:

ATTACH DATABASE 'stripe_data.duckdb' AS "stripe_data" (READ_ONLY)

This means every model's
table_reference.catalog
must equal the DuckDB filename stem. If the file is

hubspot.duckdb

, the catalog is

hubspot

. If it's

my_pipeline.duckdb

, the catalog is

my_pipeline

Getting this wrong causes "table not found" errors at query time. The

introspect_dlt.py

script handles this automatically.

当Wren Engine连接DuckDB文件时，会使用文件名（不含

.duckdb

扩展名）作为catalog别名来ATTACH数据库：

ATTACH DATABASE 'stripe_data.duckdb' AS "stripe_data" (READ_ONLY)

这意味着每个模型的
table_reference.catalog
必须与DuckDB文件名的主干部分一致。如果文件名为

hubspot.duckdb

，则catalog为

hubspot

；如果文件名为

my_pipeline.duckdb

，则catalog为

my_pipeline

。

此处设置错误会导致查询时出现“表未找到”错误。

introspect_dlt.py

脚本会自动处理此问题。

Critical: Type normalization

关键注意事项：类型标准化

Column types must be normalized using wren SDK's

type_mapping.parse_type()

function, which uses sqlglot to convert database-specific types (like DuckDB's

HUGEINT

TIMESTAMP WITH TIME ZONE

) into canonical SQL types that wren-core understands. Do not hardcode type mappings — always delegate to

parse_type(raw_type, "duckdb")

The

introspect_dlt.py

script does this automatically when wren SDK is installed.

必须使用Wren SDK的

type_mapping.parse_type()

函数对列类型进行标准化，该函数通过sqlglot将数据库特定类型（如DuckDB的

HUGEINT

、

TIMESTAMP WITH TIME ZONE

）转换为Wren-core可识别的标准SQL类型。请勿硬编码类型映射——始终委托给

parse_type(raw_type, "duckdb")

处理。

安装Wren SDK后，

introspect_dlt.py

脚本会自动完成此操作。

Phase 1: Extract — dlt Pipeline Setup

阶段1：提取 — dlt管道设置

Step 1: Pick the SaaS source

步骤1：选择SaaS数据源

Ask the user which SaaS service they want to connect. Read

references/dlt_sources.md

for a list of popular verified sources and their auth requirements. If the source isn't listed, check whether dlt has a verified source for it by searching

dlthub.com/docs/dlt-ecosystem/verified-sources

询问用户想要连接的SaaS服务。查看

references/dlt_sources.md

获取热门已验证数据源及其认证要求的列表。如果该数据源未在列表中，可通过搜索

dlthub.com/docs/dlt-ecosystem/verified-sources

确认dlt是否有对应的已验证数据源。

Step 2: Install dlt

步骤2：安装dlt

bash

pip install "dlt[duckdb]" --break-system-packages

bash

pip install "dlt[duckdb]" --break-system-packages

Step 3: Write the pipeline script

步骤3：编写管道脚本

Create a Python script that:

Imports the dlt source function for the chosen SaaS
Configures the pipeline with
```
destination='duckdb'
```
and a local file path
Runs the pipeline with
```
pipeline.run(source)
```

Here's the general pattern — adapt it per source (check

references/dlt_sources.md

for source-specific templates):

python

import dlt

pipeline = dlt.pipeline(
    pipeline_name="<source>_pipeline",
    destination="duckdb",
    dataset_name="<source>_data",
)

创建Python脚本，完成以下操作：

导入所选SaaS对应的dlt源函数
配置管道，设置
```
destination='duckdb'
```
及本地文件路径
运行
```
pipeline.run(source)
```
执行管道

以下是通用模板——可根据数据源调整（查看

references/dlt_sources.md

获取数据源特定模板）：

python

import dlt

pipeline = dlt.pipeline(
    pipeline_name="<source>_pipeline",
    destination="duckdb",
    dataset_name="<source>_data",
)

Source-specific: check references/dlt_sources.md for auth patterns

数据源特定：查看references/dlt_sources.md获取认证模式

source = <source_function>(api_key=dlt.secrets.value)

info = pipeline.run(source) print(info)

undefined

source = <source_function>(api_key=dlt.secrets.value)

info = pipeline.run(source) print(info)

undefined

Step 4: Set up credentials

步骤4：配置凭证

dlt reads credentials from environment variables or

.dlt/secrets.toml

. The simplest approach for a one-time run:

bash

undefined

dlt从环境变量或

.dlt/secrets.toml

读取凭证。一次性运行的最简方式：

bash

undefined

Set the credential as an environment variable

将凭证设置为环境变量

The exact variable name depends on the source — check references/dlt_sources.md

具体变量名取决于数据源——查看references/dlt_sources.md

export SOURCES__<SOURCE>__API_KEY="the-actual-key"


Ask the user for their API key or token. Remind them:
- Never commit credentials to git
- Environment variables are the simplest way for a one-time run
- For repeated use, they can create `.dlt/secrets.toml`

export SOURCES__<SOURCE>__API_KEY="the-actual-key"


向用户索要其API密钥或令牌。提醒用户：
- 切勿将凭证提交至git
- 环境变量是一次性运行的最简方式
- 如需重复使用，可创建`.dlt/secrets.toml`文件

Step 5: Run the pipeline

步骤5：运行管道

bash

python <pipeline_script>.py

After the run, confirm:

The pipeline completed without errors
A
```
.duckdb
```
file was created (usually at
```
<pipeline_name>.duckdb
```
)
Print discovered tables and their column counts

python

import duckdb
con = duckdb.connect("<pipeline_name>.duckdb", read_only=True)
for row in con.execute("""
    SELECT table_schema, table_name,
           (SELECT COUNT(*) FROM information_schema.columns c
            WHERE c.table_schema = t.table_schema AND c.table_name = t.table_name) as col_count
    FROM information_schema.tables t
    WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
      AND table_name NOT LIKE '_dlt_%'
    ORDER BY table_schema, table_name
""").fetchall():
    print(f"  {row[0]}.{row[1]} ({row[2]} columns)")
con.close()

bash

python <pipeline_script>.py

运行完成后，确认：

管道无错误完成运行
已创建
```
.duckdb
```
文件（通常位于
```
<pipeline_name>.duckdb
```
路径）
打印已发现的表及其列数

python

import duckdb
con = duckdb.connect("<pipeline_name>.duckdb", read_only=True)
for row in con.execute("""
    SELECT table_schema, table_name,
           (SELECT COUNT(*) FROM information_schema.columns c
            WHERE c.table_schema = t.table_schema AND c.table_name = t.table_name) as col_count
    FROM information_schema.tables t
    WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
      AND table_name NOT LIKE '_dlt_%'
    ORDER BY table_schema, table_name
""").fetchall():
    print(f"  {row[0]}.{row[1]} ({row[2]} columns)")
con.close()

Phase 2: Model — Generate Wren Project

阶段2：建模 — 生成Wren项目

Run the introspection script to auto-generate a complete Wren project from the DuckDB file:

bash

python <path-to-this-skill>/scripts/introspect_dlt.py \
    --duckdb-path <path-to-duckdb-file> \
    --output-dir <project-directory> \
    --project-name <name>

This script:

Connects to the DuckDB file (read-only)
Sets
table_reference.catalog
to the DuckDB filename stem (matching wren engine's ATTACH behavior)
Discovers all tables and columns via
```
information_schema
```
Filters out dlt internal tables (
```
_dlt_loads
```
,
```
_dlt_pipeline_state
```
, etc.)
Filters out dlt metadata columns (
```
_dlt_id
```
,
```
_dlt_load_id
```
,
```
_dlt_list_idx
```
) from model definitions
Detects parent-child relationships from
```
_dlt_parent_id
```
columns and table naming conventions
Normalizes column types using
wren.type_mapping.parse_type()
(sqlglot-based)
Generates a complete v2 YAML project (wren_project.yml, models/, relationships.yml, instructions.md)

After running, show the user what was generated:

bash

undefined

运行探查脚本，从DuckDB文件自动生成完整的Wren项目：

bash

python <path-to-this-skill>/scripts/introspect_dlt.py \
    --duckdb-path <path-to-duckdb-file> \
    --output-dir <project-directory> \
    --project-name <name>

此脚本会：

以只读模式连接DuckDB文件
将
table_reference.catalog
设置为DuckDB文件名的主干部分（匹配Wren Engine的ATTACH行为）
通过
```
information_schema
```
发现所有表和列
过滤掉dlt内部表（
```
_dlt_loads
```
、
```
_dlt_pipeline_state
```
等）
从模型定义中过滤掉dlt元数据列（
```
_dlt_id
```
、
```
_dlt_load_id
```
、
```
_dlt_list_idx
```
）
从
```
_dlt_parent_id
```
列和表命名规则中检测父子关联关系
使用
wren.type_mapping.parse_type()
标准化列类型（基于sqlglot）
生成完整的v2 YAML项目（wren_project.yml、models/、relationships.yml、instructions.md）

运行完成后，向用户展示生成的内容：

bash

undefined

Show project summary

展示项目摘要

cat <project-directory>/wren_project.yml echo "---" ls <project-directory>/models/ echo "---" cat <project-directory>/relationships.yml

undefined

cat <project-directory>/wren_project.yml echo "---" ls <project-directory>/models/ echo "---" cat <project-directory>/relationships.yml

undefined

Verify model correctness

验证模型正确性

Spot-check one generated model to confirm:

table_reference.catalog

matches the DuckDB filename (e.g.,

stripe_data

for

stripe_data.duckdb

)

```
table_reference.schema
```
matches the DuckDB schema (usually
```
main
```
)
No
```
_dlt_*
```
columns appear in the columns list
Column types look reasonable (VARCHAR, BIGINT, BOOLEAN, TIMESTAMP, etc.)

抽查一个生成的模型，确认：

table_reference.catalog

与DuckDB文件名匹配（例如

stripe_data.duckdb

对应

stripe_data

）

```
table_reference.schema
```
与DuckDB的schema匹配（通常为
```
main
```
）
列列表中无
```
_dlt_*
```
列
列类型合理（VARCHAR、BIGINT、BOOLEAN、TIMESTAMP等）

Set up the connection profile

设置连接配置文件

Create a Wren profile so the user can query without specifying connection details every time. The

url

must point to the directory containing the

.duckdb

file (not the file itself):

python

import yaml
from pathlib import Path

wren_home = Path.home() / ".wren"
wren_home.mkdir(exist_ok=True)
profiles_file = wren_home / "profiles.yml"

existing = (yaml.safe_load(profiles_file.read_text()) or {}) if profiles_file.exists() else {}
existing.setdefault("profiles", {})

profile_name = "<source>_dlt"
existing["profiles"][profile_name] = {
    "datasource": "duckdb",
    "url": str(Path("<duckdb-path>").resolve().parent),
    "format": "duckdb",
}
existing["active"] = profile_name

profiles_file.write_text(yaml.dump(existing, default_flow_style=False, sort_keys=False))

创建Wren配置文件，让用户无需每次查询都指定连接详情。

url

必须指向包含
.duckdb
文件的目录（而非文件本身）：

python

import yaml
from pathlib import Path

wren_home = Path.home() / ".wren"
wren_home.mkdir(exist_ok=True)
profiles_file = wren_home / "profiles.yml"

existing = (yaml.safe_load(profiles_file.read_text()) or {}) if profiles_file.exists() else {}
existing.setdefault("profiles", {})

profile_name = "<source>_dlt"
existing["profiles"][profile_name] = {
    "datasource": "duckdb",
    "url": str(Path("<duckdb-path>").resolve().parent),
    "format": "duckdb",
}
existing["active"] = profile_name

profiles_file.write_text(yaml.dump(existing, default_flow_style=False, sort_keys=False))

Phase 3: Build & Verify — The Project Must Actually Work

阶段3：构建与验证 — 项目必须能实际运行

This phase is not optional. A project that generates YAML but fails at query time is not a success.

此阶段为必填项。仅生成YAML但查询失败的项目不算成功。

Step 1: Build the MDL

步骤1：构建MDL

bash

cd <project-directory>
wren context build

This compiles the YAML models into

target/mdl.json

. If this fails, fix the issues before proceeding (see Troubleshooting below).

bash

cd <project-directory>
wren context build

此操作会将YAML模型编译为

target/mdl.json

。如果编译失败，需先修复问题再继续（见下方故障排除）。

Step 2: Validate with a real query

步骤2：通过真实查询验证

Run at least one query per generated model to confirm the project is functional:

bash

undefined

为每个生成的模型至少运行一次查询，确认项目可用：

bash

undefined

For each model, verify it resolves correctly

针对每个模型，验证其可正确解析

wren --sql 'SELECT COUNT(*) as total FROM "<table_name>"'


If any query fails, debug and fix the model before moving on. Common issues:
- Wrong catalog in table_reference → "table not found"
- Type mismatch → fix the column type in metadata.yml
- Missing profile → check `wren profile list`

wren --sql 'SELECT COUNT(*) as total FROM "<table_name>"'


如果任何查询失败，需先调试并修复模型再继续。常见问题：
- `table_reference.catalog`设置错误 → “表未找到”
- 类型不匹配 → 在metadata.yml中修复列类型
- 缺少配置文件 → 检查`wren profile list`

Step 3: Run interesting queries

步骤3：运行实用查询

Once basic queries pass, run 2–3 more interesting queries to show the user what their data looks like:

bash

undefined

基础查询通过后，运行2-3个更实用的查询，向用户展示其数据情况：

bash

undefined

Preview data

预览数据

wren --sql 'SELECT * FROM "<table_name>" LIMIT 5'

If there's a relationship, verify both models are queryable

如果有关联关系，验证两个模型均可查询

wren --sql 'SELECT * FROM "<parent>" LIMIT 5' wren --sql 'SELECT * FROM "<child>" LIMIT 5'


Show the results to the user and explain what they're seeing. This is their first look at the data through Wren — make it count.

wren --sql 'SELECT * FROM "<parent>" LIMIT 5' wren --sql 'SELECT * FROM "<child>" LIMIT 5'


向用户展示结果并解释内容。这是用户首次通过Wren查看数据，需确保体验良好。

Step 4: Confirm success

步骤4：确认成功

Only after queries return real data, tell the user the setup is complete. Summarize:

How many models were created
What relationships were detected
Which profile is active
Example queries they can try next

只有当查询返回真实数据后，才可告知用户设置已完成。总结：

创建的模型数量
检测到的关联关系
当前激活的配置文件
可供尝试的示例查询

Troubleshooting

故障排除

wren context build

fails:

Check that
```
data_source: duckdb
```
is set in
```
wren_project.yml
```
Verify the DuckDB file path in the profile is correct
Run
```
wren context validate
```
for detailed error messages

If queries fail with "table not found":

Most likely cause:
```
table_reference.catalog
```
doesn't match the DuckDB filename. If the file is
```
pipeline.duckdb
```
, the catalog must be
```
pipeline
```
, not empty string.
Check the profile's
```
url
```
points to the directory containing the
```
.duckdb
```
file
Table names with double underscores need quoting:
```
"hubspot__contacts"
```

If queries fail with type errors:

Check column types in the model YAML — they should be canonical SQL types (VARCHAR, BIGINT, etc.)
Re-run
```
introspect_dlt.py
```
with wren SDK installed to get proper type normalization

General:

Check that the profile is active:
```
wren profile list
```
The DuckDB file might be locked if a dlt pipeline is running — wait for it to finish

如果

wren context build

失败：

检查
```
wren_project.yml
```
中是否设置了
```
data_source: duckdb
```
验证配置文件中的DuckDB文件路径是否正确
运行
```
wren context validate
```
获取详细错误信息

如果查询失败并提示“表未找到”：

最可能原因：
```
table_reference.catalog
```
与DuckDB文件名不匹配。如果文件名为
```
pipeline.duckdb
```
，则catalog必须为
```
pipeline
```
，而非空字符串。
检查配置文件的
```
url
```
是否指向包含
```
.duckdb
```
文件的目录
含双下划线的表名需加引号：
```
"hubspot__contacts"
```

如果查询失败并提示类型错误：

检查模型YAML中的列类型——应为标准SQL类型（VARCHAR、BIGINT等）
安装Wren SDK后重新运行
```
introspect_dlt.py
```
以获取正确的类型标准化结果

通用问题：

检查配置文件是否激活：
```
wren profile list
```
如果dlt管道正在运行，DuckDB文件可能被锁定——等待管道运行完成

Important notes

重要说明

dlt's
```
_dlt_parent_id
```
/
```
_dlt_id
```
columns are kept in the actual DuckDB tables but hidden from Wren model definitions. They're only used in relationship conditions.
DuckDB has a single-writer limitation. Don't run a dlt sync while querying. For concurrent access, dlt should write to a separate file and swap atomically.
The generated models use
```
table_reference
```
(not
```
ref_sql
```
) since they map directly to DuckDB tables created by dlt.
Column types are normalized using wren SDK's
```
parse_type()
```
with sqlglot's DuckDB dialect. If a type looks wrong, the user can edit the model's
```
metadata.yml
```
directly.

dlt的
```
_dlt_parent_id
```
/
```
_dlt_id
```
列会保留在实际DuckDB表中，但会从Wren模型定义中隐藏。它们仅用于关联关系条件。
DuckDB存在单写入者限制。查询时请勿运行dlt同步。如需并发访问，dlt应写入单独文件并以原子方式替换。
生成的模型使用
```
table_reference
```
（而非
```
ref_sql
```
），因为它们直接映射到dlt创建的DuckDB表。
列类型通过Wren SDK的
```
parse_type()
```
结合sqlglot的DuckDB方言进行标准化。如果类型显示错误，用户可直接编辑模型的
```
metadata.yml
```
文件。