metabase-database-metadata

Metabase Database Metadata Format

Metabase represents database metadata — synced databases, their tables, and their fields — as a tree of YAML files. Files are diff-friendly: numeric IDs are omitted entirely, and foreign keys use natural-key tuples like

["Sample Database", "PUBLIC", "ORDERS"]

instead of database identifiers.

The format is defined by a specification bundled alongside this file as

spec.md

(upstream source: metabase/database-metadata). The same project ships a CLI (

@metabase/database-metadata

on npm) that converts the raw JSON exported from a Metabase instance into the YAML tree described by the spec.

Canonical layout

All metadata for a project lives under a top-level

.metadata/

directory:

.metadata/databases/
— the YAML tree. This is the canonical source for the agent. Read these files to understand the schema, columns, types, and FK relationships.
.metadata/table_metadata.json
— the raw JSON exported from the Metabase instance. Potentially multi-megabyte (or multi-gigabyte) JSON with flat
```
databases
```
/
```
tables
```
/
```
fields
```
arrays. Never open, grep, or pass it to tools. It exists only as input to the extractor.

The

.metadata/

directory should be gitignored. On large warehouses the extracted metadata can reach gigabytes — committing it would make the repo painful or unusable.

First-time setup

Do not run any of the steps below proactively at session start. Only run them when the user explicitly asks to fetch metadata, set up the workflow, or requests something that plainly requires knowledge of the database schema (e.g. "write a query against ORDERS", "describe what tables exist").

When setup is triggered:

1. Ensure

.metadata/

is gitignored

Read the repo's

.gitignore

and confirm

.metadata/

is listed. If it isn't, ask the user before modifying
.gitignore
— e.g.:

.metadata/
is not in
.gitignore
. Committing it would bloat the repo (metadata can be gigabytes). Shall I add it?

Only edit

.gitignore

after the user confirms.

2. Export the metadata from Metabase

Fetch

table_metadata.json

by calling

POST /api/ee/serialization/metadata/export

on the Metabase instance and writing the response to

.metadata/table_metadata.json

. The endpoint accepts three boolean query parameters that opt sections in or out — they all default to

false

, so requests must explicitly set the sections they want:

```
with-databases
```
— include the
```
databases
```
array.
```
with-tables
```
— include the
```
tables
```
array.
```
with-fields
```
— include the
```
fields
```
array.

A typical full export sets all three to

true

. The user supplies the base URL and an API key (e.g. via

METABASE_URL

and

METABASE_API_KEY

env vars):

mkdir -p .metadata
curl -sf -X POST "$METABASE_URL/api/ee/serialization/metadata/export?with-databases=true&with-tables=true&with-fields=true" \
  -H "X-API-Key: $METABASE_API_KEY" \
  -o .metadata/table_metadata.json

If the user has not provided credentials, ask for them before running the call.

3. Extract

Once

.metadata/table_metadata.json

is in place:

rm -rf .metadata/databases
npx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases

Then read the YAML tree under

.metadata/databases/

to answer the user's question.

Session start behaviour

At the start of a session, do not run any fetch commands. Just observe what's on disk:

If
```
.metadata/table_metadata.json
```
and
```
.metadata/databases/
```
both exist, assume the tree is sufficiently up to date and use it directly. Do not refetch.
If the tree is missing or only partial, do nothing until the user asks for something that needs it — then fall into the first-time-setup flow above.

If something in the tree looks stale or inconsistent while you're using it, mention it to the user and let them decide whether to refetch. Never refresh silently.

Refreshing (user-initiated only)

If the user explicitly asks to refresh metadata, re-run the export call to overwrite

.metadata/table_metadata.json

, then re-run the extract step. Always remove

.metadata/databases

before re-extracting so stale files are not left behind.

Entities

Three entity types, two file types:

Entity	File	Description
Database	`.metadata/databases/{db}/{db}.yaml`	A connected data source (Postgres, MySQL, BigQuery, etc.). Identified by name.
Table	`.metadata/databases/{db}/schemas/{schema}/tables/{table}.yaml` (or `.../tables/{table}.yaml` for schemaless DBs)	A physical table or view. Contains a `fields` array with all its columns nested inline.
Field	(nested inside a Table YAML, no separate file)	A column. Includes `base_type` , `database_type` , and optionally `effective_type` , `semantic_type` , `coercion_strategy` , `parent_id` , `fk_target_field_id` .

Foreign keys

Foreign keys use natural-key tuples, not numeric IDs:

Database FK: the database name (string) — e.g.
```
"Sample Database"
```

Table FK:

[database, schema_or_null, table]

— e.g.

["Sample Database", "PUBLIC", "ORDERS"]

Field FK:

[database, schema_or_null, table, field, ...nested_field_names]

— e.g.

["Sample Database", "PUBLIC", "EVENTS", "DATA", "user", "name"]

for a JSON-unfolded column

DATA.user.name

Field-level FKs show up as

parent_id

(nested field parent) and

fk_target_field_id

(referenced PK for FK columns).

Type attributes on fields

database_type
— the raw native type string from the driver (
```
BIGINT
```
,
```
VARCHAR
```
,
```
JSONB
```
, etc.). Database-specific.
base_type
— the Metabase type matching the native type (
```
type/BigInteger
```
,
```
type/Text
```
,
```
type/Structured
```
, etc.).
effective_type
— the type Metabase treats the column as at query time. Only emitted when it differs from
```
base_type
```
(i.e. coercion is configured).

coercion_strategy
— the rule producing

effective_type

from

base_type

(e.g.

Coercion/ISO8601->DateTime

Coercion/UNIXMilliSeconds->DateTime

semantic_type
— business-domain label (
```
type/PK
```
,
```
type/FK
```
,
```
type/Email
```
,
```
type/Category
```
,
```
type/Latitude
```
, etc.). Drives UI and some analytical behavior.

See the bundled spec for the full type hierarchy and available coercion strategies.

Reading the spec

This skill ships with a local snapshot of the spec as

spec.md

, alongside

SKILL.md

Read it on demand, not eagerly. Open

spec.md

only when you actually need detail beyond what

SKILL.md

summarizes — e.g. the full base-type / semantic-type hierarchy, the complete list of coercion strategies, or the exact folder-path rules. Do not open it at session start, and do not open it for tasks unrelated to the metadata tree.

If the bundled copy looks out of date with the upstream package, the skill's own

README.md

documents how to refresh it with

extract-spec

metabase-database-metadata

NPX Install

Tags

SKILL.md Content