metabase-database-metadata

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Metabase Database Metadata Format

Metabase数据库元数据格式

Metabase represents database metadata — synced databases, their tables, and their fields — as a tree of YAML files. Files are diff-friendly: numeric IDs are omitted entirely, and foreign keys use natural-key tuples like
["Sample Database", "PUBLIC", "ORDERS"]
instead of database identifiers.
The format is defined by a specification bundled alongside this file as
spec.md
(upstream source: metabase/database-metadata). The same project ships a CLI (
@metabase/database-metadata
on npm) that converts the raw JSON exported from a Metabase instance into the YAML tree described by the spec.
Metabase将数据库元数据(同步的数据库、其包含的表及字段)以YAML文件树的形式存储。文件具备差异友好性:完全省略了数字ID,外键使用
["Sample Database", "PUBLIC", "ORDERS"]
这类自然键元组,而非数据库标识符。
该格式由与本文档捆绑的
spec.md
定义(上游源码:metabase/database-metadata)。同一项目提供了一个CLI工具(npm上的
@metabase/database-metadata
),可将从Metabase实例导出的原始JSON转换为该规范描述的YAML文件树。

Canonical layout

标准目录结构

All metadata for a project lives under a top-level
.metadata/
directory:
  • .metadata/databases/
    — the YAML tree. This is the canonical source for the agent. Read these files to understand the schema, columns, types, and FK relationships.
  • .metadata/table_metadata.json
    — the raw JSON exported from the Metabase instance. Potentially multi-megabyte (or multi-gigabyte) JSON with flat
    databases
    /
    tables
    /
    fields
    arrays. Never open, grep, or pass it to tools. It exists only as input to the extractor.
The
.metadata/
directory should be gitignored. On large warehouses the extracted metadata can reach gigabytes — committing it would make the repo painful or unusable.
项目的所有元数据都存储在顶级目录
.metadata/
下:
  • .metadata/databases/
    —— YAML文件树。这是Agent的标准数据源。读取这些文件可了解数据库的模式、列、类型及外键关系。
  • .metadata/table_metadata.json
    —— 从Metabase实例导出的原始JSON。可能是数兆字节(甚至数吉字节)的大文件,包含扁平化的
    databases
    /
    tables
    /
    fields
    数组。切勿打开、搜索或传递给工具。它仅作为提取器的输入存在。
.metadata/
目录应添加到.gitignore中。在大型数据仓库中,提取后的元数据可能达到吉字节级别——提交到仓库会导致仓库臃肿甚至无法使用。

First-time setup

首次设置

Do not run any of the steps below proactively at session start. Only run them when the user explicitly asks to fetch metadata, set up the workflow, or requests something that plainly requires knowledge of the database schema (e.g. "write a query against ORDERS", "describe what tables exist").
When setup is triggered:
会话开始时请勿主动执行以下任何步骤。仅当用户明确要求获取元数据、设置工作流,或请求明显需要了解数据库模式的内容(例如“编写针对ORDERS表的查询”、“描述现有表的信息”)时,才执行这些步骤。
触发设置流程时:

1. Ensure
.metadata/
is gitignored

1. 确保
.metadata/
已加入.gitignore

Read the repo's
.gitignore
and confirm
.metadata/
is listed. If it isn't, ask the user before modifying
.gitignore
— e.g.:
.metadata/
is not in
.gitignore
. Committing it would bloat the repo (metadata can be gigabytes). Shall I add it?
Only edit
.gitignore
after the user confirms.
读取仓库的
.gitignore
文件,确认
.metadata/
已被列出。若未列出,修改前需询问用户,例如:
.metadata/
未在.gitignore中。提交它会导致仓库臃肿(元数据可达吉字节级别)。是否需要我将其添加进去?
仅在用户确认后编辑
.gitignore

2. Export the metadata from Metabase

2. 从Metabase导出元数据

Fetch
table_metadata.json
by calling
POST /api/ee/serialization/metadata/export
on the Metabase instance and writing the response to
.metadata/table_metadata.json
. The endpoint accepts three boolean query parameters that opt sections in or out — they all default to
false
, so requests must explicitly set the sections they want:
  • with-databases
    — include the
    databases
    array.
  • with-tables
    — include the
    tables
    array.
  • with-fields
    — include the
    fields
    array.
A typical full export sets all three to
true
. The user supplies the base URL and an API key (e.g. via
METABASE_URL
and
METABASE_API_KEY
env vars):
sh
mkdir -p .metadata
curl -sf -X POST "$METABASE_URL/api/ee/serialization/metadata/export?with-databases=true&with-tables=true&with-fields=true" \
  -H "X-API-Key: $METABASE_API_KEY" \
  -o .metadata/table_metadata.json
If the user has not provided credentials, ask for them before running the call.
调用Metabase实例的
POST /api/ee/serialization/metadata/export
接口获取
table_metadata.json
,并将响应写入
.metadata/table_metadata.json
。该接口接受三个布尔型查询参数,用于选择是否包含对应部分——默认均为
false
,因此请求需显式设置所需的部分:
  • with-databases
    —— 包含
    databases
    数组。
  • with-tables
    —— 包含
    tables
    数组。
  • with-fields
    —— 包含
    fields
    数组。
典型的完整导出会将三者均设为
true
。用户需提供基础URL和API密钥(例如通过
METABASE_URL
METABASE_API_KEY
环境变量):
sh
mkdir -p .metadata
curl -sf -X POST "$METABASE_URL/api/ee/serialization/metadata/export?with-databases=true&with-tables=true&with-fields=true" \
  -H "X-API-Key: $METABASE_API_KEY" \
  -o .metadata/table_metadata.json
若用户未提供凭据,执行调用前需向用户索要。

3. Extract

3. 提取元数据

Once
.metadata/table_metadata.json
is in place:
sh
rm -rf .metadata/databases
npx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases
Then read the YAML tree under
.metadata/databases/
to answer the user's question.
.metadata/table_metadata.json
准备就绪后:
sh
rm -rf .metadata/databases
npx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases
随后读取
.metadata/databases/
下的YAML文件树,以回答用户的问题。

Session start behaviour

会话启动行为

At the start of a session, do not run any fetch commands. Just observe what's on disk:
  • If
    .metadata/table_metadata.json
    and
    .metadata/databases/
    both exist, assume the tree is sufficiently up to date and use it directly. Do not refetch.
  • If the tree is missing or only partial, do nothing until the user asks for something that needs it — then fall into the first-time-setup flow above.
If something in the tree looks stale or inconsistent while you're using it, mention it to the user and let them decide whether to refetch. Never refresh silently.
会话开始时,请勿执行任何获取命令。仅检查磁盘上的现有内容:
  • .metadata/table_metadata.json
    .metadata/databases/
    均存在,默认文件树已足够新,直接使用即可。无需重新获取。
  • 若文件树缺失或不完整,无需操作,直到用户请求需要它的内容时,再进入上述首次设置流程。
使用过程中若发现文件树看起来过时或不一致,需告知用户并让其决定是否重新获取。切勿静默刷新。

Refreshing (user-initiated only)

刷新元数据(仅用户主动触发)

If the user explicitly asks to refresh metadata, re-run the export call to overwrite
.metadata/table_metadata.json
, then re-run the extract step. Always remove
.metadata/databases
before re-extracting so stale files are not left behind.
若用户明确要求刷新元数据,重新执行导出调用以覆盖
.metadata/table_metadata.json
,然后重新执行提取步骤。重新提取前务必删除
.metadata/databases
,避免遗留过时文件。

Entities

实体类型

Three entity types, two file types:
EntityFileDescription
Database
.metadata/databases/{db}/{db}.yaml
A connected data source (Postgres, MySQL, BigQuery, etc.). Identified by name.
Table
.metadata/databases/{db}/schemas/{schema}/tables/{table}.yaml
(or
.../tables/{table}.yaml
for schemaless DBs)
A physical table or view. Contains a
fields
array with all its columns nested inline.
Field(nested inside a Table YAML, no separate file)A column. Includes
base_type
,
database_type
, and optionally
effective_type
,
semantic_type
,
coercion_strategy
,
parent_id
,
fk_target_field_id
.
三种实体类型,对应两种文件类型:
实体文件描述
Database
.metadata/databases/{db}/{db}.yaml
已连接的数据源(Postgres、MySQL、BigQuery等)。按名称标识。
Table
.metadata/databases/{db}/schemas/{schema}/tables/{table}.yaml
(无模式数据库为
.../tables/{table}.yaml
物理表或视图。包含一个
fields
数组,内嵌所有列的信息。
Field(嵌套在Table的YAML文件中,无单独文件)列。包含
base_type
database_type
,可选包含
effective_type
semantic_type
coercion_strategy
parent_id
fk_target_field_id

Foreign keys

外键

Foreign keys use natural-key tuples, not numeric IDs:
  • Database FK: the database name (string) — e.g.
    "Sample Database"
  • Table FK:
    [database, schema_or_null, table]
    — e.g.
    ["Sample Database", "PUBLIC", "ORDERS"]
  • Field FK:
    [database, schema_or_null, table, field, ...nested_field_names]
    — e.g.
    ["Sample Database", "PUBLIC", "EVENTS", "DATA", "user", "name"]
    for a JSON-unfolded column
    DATA.user.name
Field-level FKs show up as
parent_id
(nested field parent) and
fk_target_field_id
(referenced PK for FK columns).
外键使用自然键元组,而非数字ID:
  • 数据库外键:数据库名称(字符串)——例如
    "Sample Database"
  • 表外键
    [database, schema_or_null, table]
    ——例如
    ["Sample Database", "PUBLIC", "ORDERS"]
  • 字段外键
    [database, schema_or_null, table, field, ...nested_field_names]
    ——例如针对JSON展开列
    DATA.user.name
    ["Sample Database", "PUBLIC", "EVENTS", "DATA", "user", "name"]
字段级外键以
parent_id
(嵌套字段的父字段)和
fk_target_field_id
(外键列引用的主键)的形式呈现。

Type attributes on fields

字段的类型属性

  • database_type
    — the raw native type string from the driver (
    BIGINT
    ,
    VARCHAR
    ,
    JSONB
    , etc.). Database-specific.
  • base_type
    — the Metabase type matching the native type (
    type/BigInteger
    ,
    type/Text
    ,
    type/Structured
    , etc.).
  • effective_type
    — the type Metabase treats the column as at query time. Only emitted when it differs from
    base_type
    (i.e. coercion is configured).
  • coercion_strategy
    — the rule producing
    effective_type
    from
    base_type
    (e.g.
    Coercion/ISO8601->DateTime
    ,
    Coercion/UNIXMilliSeconds->DateTime
    ).
  • semantic_type
    — business-domain label (
    type/PK
    ,
    type/FK
    ,
    type/Email
    ,
    type/Category
    ,
    type/Latitude
    , etc.). Drives UI and some analytical behavior.
See the bundled spec for the full type hierarchy and available coercion strategies.
  • database_type
    —— 驱动返回的原始原生类型字符串(
    BIGINT
    VARCHAR
    JSONB
    等)。与数据库相关。
  • base_type
    —— 与原生类型匹配的Metabase类型(
    type/BigInteger
    type/Text
    type/Structured
    等)。
  • effective_type
    —— Metabase在查询时将列视为的类型。仅当与
    base_type
    不同时才会输出(即配置了类型转换)。
  • coercion_strategy
    —— 将
    base_type
    转换为
    effective_type
    的规则(例如
    Coercion/ISO8601->DateTime
    Coercion/UNIXMilliSeconds->DateTime
    )。
  • semantic_type
    —— 业务领域标签(
    type/PK
    type/FK
    type/Email
    type/Category
    type/Latitude
    等)。用于驱动UI和部分分析行为。
完整的类型层级和可用转换策略请参考捆绑的规范文档。

Reading the spec

阅读规范文档

This skill ships with a local snapshot of the spec as
spec.md
, alongside
SKILL.md
.
Read it on demand, not eagerly. Open
spec.md
only when you actually need detail beyond what
SKILL.md
summarizes — e.g. the full base-type / semantic-type hierarchy, the complete list of coercion strategies, or the exact folder-path rules. Do not open it at session start, and do not open it for tasks unrelated to the metadata tree.
If the bundled copy looks out of date with the upstream package, the skill's own
README.md
documents how to refresh it with
extract-spec
.
本技能附带了一份本地快照版的规范文档
spec.md
,与
SKILL.md
放在一起。
按需阅读,无需提前阅读。仅当需要
SKILL.md
摘要之外的详细信息时,才打开
spec.md
——例如完整的基础类型/语义类型层级、转换策略的完整列表,或精确的文件夹路径规则。会话开始时请勿打开,与元数据文件树无关的任务也请勿打开。
若本地快照版与上游包不一致,技能自身的
README.md
文档说明了如何使用
extract-spec
命令刷新它。