Metabase Database Metadata Format
Metabase represents database metadata — synced databases, their tables, and their fields — as a tree of YAML files. Files are
diff-friendly: numeric IDs are omitted entirely, and foreign keys use natural-key tuples like
["Sample Database", "PUBLIC", "ORDERS"]
instead of database identifiers.
The format is defined by a specification bundled alongside this file as
(upstream source:
metabase/database-metadata). The same project ships a CLI (
@metabase/database-metadata
on npm) that converts the raw JSON exported from a Metabase instance into the YAML tree described by the spec.
Canonical layout
All metadata for a project lives under a top-level
directory:
- — the YAML tree. This is the canonical source for the agent. Read these files to understand the schema, columns, types, and FK relationships.
.metadata/table_metadata.json
— the raw JSON exported from the Metabase instance. Potentially multi-megabyte (or multi-gigabyte) JSON with flat / / arrays. Never open, grep, or pass it to tools. It exists only as input to the extractor.
The
directory should be gitignored. On large warehouses the extracted metadata can reach gigabytes — committing it would make the repo painful or unusable.
First-time setup
Do not run any of the steps below proactively at session start. Only run them when the user explicitly asks to fetch metadata, set up the workflow, or requests something that plainly requires knowledge of the database schema (e.g. "write a query against ORDERS", "describe what tables exist").
When setup is triggered:
1. Ensure is gitignored
Read the repo's
and confirm
is listed. If it isn't,
ask the user before modifying — e.g.:
is not in
. Committing it would bloat the repo (metadata can be gigabytes). Shall I add it?
Only edit
after the user confirms.
2. Export the metadata from Metabase
Fetch
by calling
POST /api/ee/serialization/metadata/export
on the Metabase instance and writing the response to
.metadata/table_metadata.json
. The endpoint accepts three boolean query parameters that opt sections in or out — they all default to
, so requests must explicitly set the sections they want:
- — include the array.
- — include the array.
- — include the array.
A typical full export sets all three to
. The user supplies the base URL and an API key (e.g. via
and
env vars):
sh
mkdir -p .metadata
curl -sf -X POST "$METABASE_URL/api/ee/serialization/metadata/export?with-databases=true&with-tables=true&with-fields=true" \
-H "X-API-Key: $METABASE_API_KEY" \
-o .metadata/table_metadata.json
If the user has not provided credentials, ask for them before running the call.
3. Extract
Once
.metadata/table_metadata.json
is in place:
sh
rm -rf .metadata/databases
npx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases
Then read the YAML tree under
to answer the user's question.
Session start behaviour
At the start of a session, do not run any fetch commands. Just observe what's on disk:
- If
.metadata/table_metadata.json
and both exist, assume the tree is sufficiently up to date and use it directly. Do not refetch.
- If the tree is missing or only partial, do nothing until the user asks for something that needs it — then fall into the first-time-setup flow above.
If something in the tree looks stale or inconsistent while you're using it, mention it to the user and let them decide whether to refetch. Never refresh silently.
Refreshing (user-initiated only)
If the user explicitly asks to refresh metadata, re-run the export call to overwrite
.metadata/table_metadata.json
, then re-run the extract step. Always remove
before re-extracting so stale files are not left behind.
Entities
Three entity types, two file types:
| Entity | File | Description |
|---|
| Database | .metadata/databases/{db}/{db}.yaml
| A connected data source (Postgres, MySQL, BigQuery, etc.). Identified by name. |
| Table | .metadata/databases/{db}/schemas/{schema}/tables/{table}.yaml
(or for schemaless DBs) | A physical table or view. Contains a array with all its columns nested inline. |
| Field | (nested inside a Table YAML, no separate file) | A column. Includes , , and optionally , , , , . |
Foreign keys
Foreign keys use natural-key tuples, not numeric IDs:
- Database FK: the database name (string) — e.g.
- Table FK:
[database, schema_or_null, table]
— e.g. ["Sample Database", "PUBLIC", "ORDERS"]
- Field FK:
[database, schema_or_null, table, field, ...nested_field_names]
— e.g. ["Sample Database", "PUBLIC", "EVENTS", "DATA", "user", "name"]
for a JSON-unfolded column
Field-level FKs show up as
(nested field parent) and
(referenced PK for FK columns).
Type attributes on fields
- — the raw native type string from the driver (, , , etc.). Database-specific.
- — the Metabase type matching the native type (, , , etc.).
- — the type Metabase treats the column as at query time. Only emitted when it differs from (i.e. coercion is configured).
- — the rule producing from (e.g.
Coercion/ISO8601->DateTime
, Coercion/UNIXMilliSeconds->DateTime
).
- — business-domain label (, , , , , etc.). Drives UI and some analytical behavior.
See the bundled spec for the full type hierarchy and available coercion strategies.
Reading the spec
This skill ships with a local snapshot of the spec as
, alongside
.
Read it on demand, not eagerly. Open
only when you actually need detail beyond what
summarizes — e.g. the full base-type / semantic-type hierarchy, the complete list of coercion strategies, or the exact folder-path rules. Do not open it at session start, and do not open it for tasks unrelated to the metadata tree.
If the bundled copy looks out of date with the upstream package, the skill's own
documents how to refresh it with
.