metabase-database-metadata
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseMetabase Database Metadata Format
Metabase数据库元数据格式
Metabase represents database metadata — synced databases, their tables, and their fields — as a tree of YAML files. Files are diff-friendly: numeric IDs are omitted entirely, and foreign keys use natural-key tuples like instead of database identifiers.
["Sample Database", "PUBLIC", "ORDERS"]The format is defined by a specification bundled alongside this file as (upstream source: metabase/database-metadata). The same project ships a CLI ( on npm) that converts the raw JSON exported from a Metabase instance into the YAML tree described by the spec.
spec.md@metabase/database-metadataMetabase将数据库元数据(同步的数据库、其包含的表及字段)以YAML文件树的形式存储。文件具备差异友好性:完全省略了数字ID,外键使用这类自然键元组,而非数据库标识符。
["Sample Database", "PUBLIC", "ORDERS"]该格式由与本文档捆绑的定义(上游源码:metabase/database-metadata)。同一项目提供了一个CLI工具(npm上的),可将从Metabase实例导出的原始JSON转换为该规范描述的YAML文件树。
spec.md@metabase/database-metadataCanonical layout
标准目录结构
All metadata for a project lives under a top-level directory:
.metadata/- — the YAML tree. This is the canonical source for the agent. Read these files to understand the schema, columns, types, and FK relationships.
.metadata/databases/ - — the raw JSON exported from the Metabase instance. Potentially multi-megabyte (or multi-gigabyte) JSON with flat
.metadata/table_metadata.json/databases/tablesarrays. Never open, grep, or pass it to tools. It exists only as input to the extractor.fields
The directory should be gitignored. On large warehouses the extracted metadata can reach gigabytes — committing it would make the repo painful or unusable.
.metadata/项目的所有元数据都存储在顶级目录下:
.metadata/- —— YAML文件树。这是Agent的标准数据源。读取这些文件可了解数据库的模式、列、类型及外键关系。
.metadata/databases/ - —— 从Metabase实例导出的原始JSON。可能是数兆字节(甚至数吉字节)的大文件,包含扁平化的
.metadata/table_metadata.json/databases/tables数组。切勿打开、搜索或传递给工具。它仅作为提取器的输入存在。fields
.metadata/First-time setup
首次设置
Do not run any of the steps below proactively at session start. Only run them when the user explicitly asks to fetch metadata, set up the workflow, or requests something that plainly requires knowledge of the database schema (e.g. "write a query against ORDERS", "describe what tables exist").
When setup is triggered:
会话开始时请勿主动执行以下任何步骤。仅当用户明确要求获取元数据、设置工作流,或请求明显需要了解数据库模式的内容(例如“编写针对ORDERS表的查询”、“描述现有表的信息”)时,才执行这些步骤。
触发设置流程时:
1. Ensure .metadata/
is gitignored
.metadata/1. 确保.metadata/
已加入.gitignore
.metadata/Read the repo's and confirm is listed. If it isn't, ask the user before modifying — e.g.:
.gitignore.metadata/.gitignoreis not in.metadata/. Committing it would bloat the repo (metadata can be gigabytes). Shall I add it?.gitignore
Only edit after the user confirms.
.gitignore读取仓库的文件,确认已被列出。若未列出,修改前需询问用户,例如:
.gitignore.metadata/未在.gitignore中。提交它会导致仓库臃肿(元数据可达吉字节级别)。是否需要我将其添加进去?.metadata/
仅在用户确认后编辑。
.gitignore2. Export the metadata from Metabase
2. 从Metabase导出元数据
Fetch by calling on the Metabase instance and writing the response to . The endpoint accepts three boolean query parameters that opt sections in or out — they all default to , so requests must explicitly set the sections they want:
table_metadata.jsonPOST /api/ee/serialization/metadata/export.metadata/table_metadata.jsonfalse- — include the
with-databasesarray.databases - — include the
with-tablesarray.tables - — include the
with-fieldsarray.fields
A typical full export sets all three to . The user supplies the base URL and an API key (e.g. via and env vars):
trueMETABASE_URLMETABASE_API_KEYsh
mkdir -p .metadata
curl -sf -X POST "$METABASE_URL/api/ee/serialization/metadata/export?with-databases=true&with-tables=true&with-fields=true" \
-H "X-API-Key: $METABASE_API_KEY" \
-o .metadata/table_metadata.jsonIf the user has not provided credentials, ask for them before running the call.
调用Metabase实例的接口获取,并将响应写入。该接口接受三个布尔型查询参数,用于选择是否包含对应部分——默认均为,因此请求需显式设置所需的部分:
POST /api/ee/serialization/metadata/exporttable_metadata.json.metadata/table_metadata.jsonfalse- —— 包含
with-databases数组。databases - —— 包含
with-tables数组。tables - —— 包含
with-fields数组。fields
典型的完整导出会将三者均设为。用户需提供基础URL和API密钥(例如通过和环境变量):
trueMETABASE_URLMETABASE_API_KEYsh
mkdir -p .metadata
curl -sf -X POST "$METABASE_URL/api/ee/serialization/metadata/export?with-databases=true&with-tables=true&with-fields=true" \
-H "X-API-Key: $METABASE_API_KEY" \
-o .metadata/table_metadata.json若用户未提供凭据,执行调用前需向用户索要。
3. Extract
3. 提取元数据
Once is in place:
.metadata/table_metadata.jsonsh
rm -rf .metadata/databases
npx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databasesThen read the YAML tree under to answer the user's question.
.metadata/databases/当准备就绪后:
.metadata/table_metadata.jsonsh
rm -rf .metadata/databases
npx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases随后读取下的YAML文件树,以回答用户的问题。
.metadata/databases/Session start behaviour
会话启动行为
At the start of a session, do not run any fetch commands. Just observe what's on disk:
- If and
.metadata/table_metadata.jsonboth exist, assume the tree is sufficiently up to date and use it directly. Do not refetch..metadata/databases/ - If the tree is missing or only partial, do nothing until the user asks for something that needs it — then fall into the first-time-setup flow above.
If something in the tree looks stale or inconsistent while you're using it, mention it to the user and let them decide whether to refetch. Never refresh silently.
会话开始时,请勿执行任何获取命令。仅检查磁盘上的现有内容:
- 若和
.metadata/table_metadata.json均存在,默认文件树已足够新,直接使用即可。无需重新获取。.metadata/databases/ - 若文件树缺失或不完整,无需操作,直到用户请求需要它的内容时,再进入上述首次设置流程。
使用过程中若发现文件树看起来过时或不一致,需告知用户并让其决定是否重新获取。切勿静默刷新。
Refreshing (user-initiated only)
刷新元数据(仅用户主动触发)
If the user explicitly asks to refresh metadata, re-run the export call to overwrite , then re-run the extract step. Always remove before re-extracting so stale files are not left behind.
.metadata/table_metadata.json.metadata/databases若用户明确要求刷新元数据,重新执行导出调用以覆盖,然后重新执行提取步骤。重新提取前务必删除,避免遗留过时文件。
.metadata/table_metadata.json.metadata/databasesEntities
实体类型
Three entity types, two file types:
| Entity | File | Description |
|---|---|---|
| Database | | A connected data source (Postgres, MySQL, BigQuery, etc.). Identified by name. |
| Table | | A physical table or view. Contains a |
| Field | (nested inside a Table YAML, no separate file) | A column. Includes |
三种实体类型,对应两种文件类型:
| 实体 | 文件 | 描述 |
|---|---|---|
| Database | | 已连接的数据源(Postgres、MySQL、BigQuery等)。按名称标识。 |
| Table | | 物理表或视图。包含一个 |
| Field | (嵌套在Table的YAML文件中,无单独文件) | 列。包含 |
Foreign keys
外键
Foreign keys use natural-key tuples, not numeric IDs:
- Database FK: the database name (string) — e.g.
"Sample Database" - Table FK: — e.g.
[database, schema_or_null, table]["Sample Database", "PUBLIC", "ORDERS"] - Field FK: — e.g.
[database, schema_or_null, table, field, ...nested_field_names]for a JSON-unfolded column["Sample Database", "PUBLIC", "EVENTS", "DATA", "user", "name"]DATA.user.name
Field-level FKs show up as (nested field parent) and (referenced PK for FK columns).
parent_idfk_target_field_id外键使用自然键元组,而非数字ID:
- 数据库外键:数据库名称(字符串)——例如
"Sample Database" - 表外键:——例如
[database, schema_or_null, table]["Sample Database", "PUBLIC", "ORDERS"] - 字段外键:——例如针对JSON展开列
[database, schema_or_null, table, field, ...nested_field_names]的DATA.user.name["Sample Database", "PUBLIC", "EVENTS", "DATA", "user", "name"]
字段级外键以(嵌套字段的父字段)和(外键列引用的主键)的形式呈现。
parent_idfk_target_field_idType attributes on fields
字段的类型属性
- — the raw native type string from the driver (
database_type,BIGINT,VARCHAR, etc.). Database-specific.JSONB - — the Metabase type matching the native type (
base_type,type/BigInteger,type/Text, etc.).type/Structured - — the type Metabase treats the column as at query time. Only emitted when it differs from
effective_type(i.e. coercion is configured).base_type - — the rule producing
coercion_strategyfromeffective_type(e.g.base_type,Coercion/ISO8601->DateTime).Coercion/UNIXMilliSeconds->DateTime - — business-domain label (
semantic_type,type/PK,type/FK,type/Email,type/Category, etc.). Drives UI and some analytical behavior.type/Latitude
See the bundled spec for the full type hierarchy and available coercion strategies.
- —— 驱动返回的原始原生类型字符串(
database_type、BIGINT、VARCHAR等)。与数据库相关。JSONB - —— 与原生类型匹配的Metabase类型(
base_type、type/BigInteger、type/Text等)。type/Structured - —— Metabase在查询时将列视为的类型。仅当与
effective_type不同时才会输出(即配置了类型转换)。base_type - —— 将
coercion_strategy转换为base_type的规则(例如effective_type、Coercion/ISO8601->DateTime)。Coercion/UNIXMilliSeconds->DateTime - —— 业务领域标签(
semantic_type、type/PK、type/FK、type/Email、type/Category等)。用于驱动UI和部分分析行为。type/Latitude
完整的类型层级和可用转换策略请参考捆绑的规范文档。
Reading the spec
阅读规范文档
This skill ships with a local snapshot of the spec as , alongside .
spec.mdSKILL.mdRead it on demand, not eagerly. Open only when you actually need detail beyond what summarizes — e.g. the full base-type / semantic-type hierarchy, the complete list of coercion strategies, or the exact folder-path rules. Do not open it at session start, and do not open it for tasks unrelated to the metadata tree.
spec.mdSKILL.mdIf the bundled copy looks out of date with the upstream package, the skill's own documents how to refresh it with .
README.mdextract-spec本技能附带了一份本地快照版的规范文档,与放在一起。
spec.mdSKILL.md按需阅读,无需提前阅读。仅当需要摘要之外的详细信息时,才打开——例如完整的基础类型/语义类型层级、转换策略的完整列表,或精确的文件夹路径规则。会话开始时请勿打开,与元数据文件树无关的任务也请勿打开。
SKILL.mdspec.md若本地快照版与上游包不一致,技能自身的文档说明了如何使用命令刷新它。
README.mdextract-spec