metabase-semantic-checker

Metabase semantic checker

The semantic checker validates a tree of Metabase Representation Format YAML files for referential integrity. Schema-level validation (the shape of each file, required fields, enum values) is handled separately by npx @metabase/representations validate-schema; the semantic checker runs after schema validation and focuses on cross-file and cross-system consistency.
It answers questions like:
  • Does every collection_id, parent_id, dashboard_id, document_id, based_on_card_id, transform tag, snippet name, etc. resolve to an entity that actually exists in the tree?
  • For each MBQL query, does every source-table, field reference, join target, segment, measure, and expression resolve against the database schema?
  • For each native query, do the referenced tables, columns, and snippets exist?
  • Do dashboards' and documents' embedded card references point at real cards?
The checker ships inside the Metabase Enterprise JAR and is invoked via --mode checker. The default Docker image is metabase/metabase-enterprise:latest. Use metabase/metabase-enterprise-head:latest only when the user explicitly wants the in-development build, e.g. to test unreleased checker changes.
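The image rule above can be captured in a small shell helper. This is a sketch, not part of the Metabase tooling: the function name checker_image and the USE_HEAD_BUILD variable are hypothetical, and the caller is assumed to set USE_HEAD_BUILD=1 only after the user has explicitly asked for the in-development build.

```sh
# Hypothetical helper (not shipped with Metabase): choose the checker image.
# USE_HEAD_BUILD=1 is an assumed opt-in flag, set only when the user explicitly
# wants the in-development build; any other value selects the default image.
checker_image() {
  if [ "${USE_HEAD_BUILD:-0}" = "1" ]; then
    echo "metabase/metabase-enterprise-head:latest"
  else
    echo "metabase/metabase-enterprise:latest"
  fi
}
```

For example, docker pull "$(checker_image)" pulls whichever image applies to the current request.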

Inputs

Two inputs, both required:
  • The representation tree: the repo root containing collections/, databases/, transforms/, and python_libraries/. This is what gets checked.
  • The database metadata: a JSON file produced by GET /api/database/metadata, located by default at .metabase/metadata.json. The checker uses it to resolve table and column references inside queries; without it, query-level checks cannot run.
If .metabase/metadata.json is missing, do not run the checker. Instead, tell the user it needs to be fetched first and defer to the metabase-database-metadata skill (which handles .env, credentials, and the fetch). Only run the checker once the metadata file is present on disk.

When to run

Run the semantic checker once, after you are done making changes to representation YAML files: editing a card's query, renaming a collection, re-parenting entities, adding or removing snippets or transform tags, etc. A passing schema check does not catch broken cross-entity references or query columns that no longer exist; the semantic checker does. Treat it as the second half of local validation, paired with npx @metabase/representations validate-schema.
Batch it — don't run between edits. Each invocation spins up the Metabase JVM and loads the database metadata, which takes roughly a minute of fixed overhead before any checks run. Running it after every individual edit wastes that minute on each edit and bogs the session down. Make all the YAML changes you intend to make, then run the checker once at the end. If it surfaces issues, fix them and re-run — but again, fix everything you can see in one pass before re-running.
Outside of that, do not run it proactively. At session start, just observe what's on disk: do not refresh metadata and do not pull the Docker image. Only run the checker when the user explicitly asks, or once you have finished a batch of YAML edits.
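The end-of-batch order described above (schema validation first, then a single semantic-check run) can be sketched as a wrapper. Assumptions: validate_batch is a hypothetical name, the semantic checker command is passed as arguments (for example the docker run invocation shown under "Running the checker"), and SCHEMA_CHECK_CMD is a hypothetical override that defaults to the real npx @metabase/representations validate-schema. Skipping the slow check when schema validation fails is a design choice of this sketch, based on the checker assuming schema-valid input.

```sh
# Hypothetical wrapper: call once after a batch of YAML edits, not after each edit.
# Usage: validate_batch <semantic checker command...>
# SCHEMA_CHECK_CMD is an assumed override hook; it is expanded unquoted on purpose
# so the default splits into the command and its arguments.
validate_batch() {
  # Fast, local schema check first: it costs almost nothing.
  if ! ${SCHEMA_CHECK_CMD:-npx @metabase/representations validate-schema}; then
    echo "Schema validation failed; fix structural errors before the semantic check." >&2
    return 1
  fi
  # Only now pay the roughly one-minute JVM startup of the semantic checker.
  "$@"
}
```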

Running the checker

Once .metabase/metadata.json exists and Docker is available:
```sh
docker pull metabase/metabase-enterprise:latest

docker run --rm \
  -v "$PWD:/workspace" \
  --entrypoint "" \
  -w /app \
  metabase/metabase-enterprise:latest \
  java -jar metabase.jar \
    --mode checker \
    --export /workspace \
    --schema-dir /workspace/.metabase/metadata.json \
    --schema-format concise
```
Flag reference:
  • --mode checker: selects semantic-check mode (skips server startup, import, etc.).
  • --export /workspace: the path inside the container to the representation tree root. With the -v "$PWD:/workspace" mount above, this maps to the current repo root on the host.
  • --schema-dir /workspace/.metabase/metadata.json: the path to the database metadata JSON. Despite the -dir suffix, the flag accepts a single JSON file. Point it elsewhere only if the user has stored metadata at a non-default path.
  • --schema-format concise: the format of the input metadata. concise matches what @metabase/database-metadata and GET /api/database/metadata produce. Do not change it unless the user explicitly has a different dump format.
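When metadata lives at a non-default path, note that only $PWD is mounted at /workspace, so the file must sit inside the repo (or be mounted separately) for the container to see it. A sketch under those assumptions: run_checker_with_metadata is a hypothetical helper that takes a repo-relative (or in-repo absolute) path, rejects paths outside the repo, and passes the matching container path to --schema-dir. DOCKER and CHECKER_IMAGE are hypothetical overrides; DOCKER=echo prints the command instead of running it. The checker flags are exactly the documented ones.

```sh
# Hypothetical helper: run the documented checker command with a metadata file
# at a non-default location. Only $PWD is mounted, so the file must be in the repo.
run_checker_with_metadata() {
  local metadata="${1:-.metabase/metadata.json}"
  # Turn an absolute path inside the repo into a repo-relative one.
  case "$metadata" in
    "$PWD"/*) metadata="${metadata#"$PWD"/}" ;;
  esac
  # Anything still absolute, or climbing out with "..", is not visible in the container.
  case "$metadata" in
    /*|..|../*|*/../*|*/..)
      echo "Metadata must live inside the repo to be mounted: $metadata" >&2
      return 2 ;;
  esac
  ${DOCKER:-docker} run --rm \
    -v "$PWD:/workspace" \
    --entrypoint "" \
    -w /app \
    "${CHECKER_IMAGE:-metabase/metabase-enterprise:latest}" \
    java -jar metabase.jar \
      --mode checker \
      --export /workspace \
      --schema-dir "/workspace/$metadata" \
      --schema-format concise
}
```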
The check itself needs no network access inside the container; if the host's connectivity is unreliable, pull the image ahead of time.
The exit code is non-zero when the checker reports findings. Surface the checker's stdout/stderr verbatim to the user; do not summarize away specific paths or entity names, since those are how the user locates the broken reference.
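Surfacing the output verbatim while keeping the exit status can be sketched as follows. report_checker_run is a hypothetical helper: it takes a log file path and the checker command (for example the docker run invocation above), captures stdout and stderr together so their relative order is kept, prints that output unchanged once the run finishes, and returns the checker's own exit code. Printing after completion rather than streaming is a simplification: piping through tee would stream, but in POSIX sh the pipeline would then report tee's status instead of the checker's.

```sh
# Hypothetical helper: run the checker, keep a log, show the output unedited,
# and propagate the checker's exit code (non-zero means findings).
# Usage: report_checker_run <logfile> <checker command...>
report_checker_run() {
  local log="$1" status
  shift
  "$@" >"$log" 2>&1
  status=$?
  cat "$log"   # verbatim: no filtering or summarizing of paths and entity names
  if [ "$status" -ne 0 ]; then
    echo "Checker exited with status $status." >&2
  fi
  return "$status"
}
```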

Common failure modes

  • "Database metadata not found" / schema load errors: .metabase/metadata.json is missing, stale, or malformed. Refer the user to the metabase-database-metadata skill for a fresh fetch.
  • Unknown collection / card / dashboard / snippet / tag reference: the referenced entity_id or name does not exist in the tree. Either the target YAML is missing or the reference is a typo; grep the tree for the id or name to confirm which.
  • Unknown table or field inside a query: the query references a table or column that the database metadata doesn't know about. Either the warehouse schema has drifted (refetch metadata) or the query itself is wrong.
  • Docker image missing / not pulled: run docker pull metabase/metabase-enterprise:latest first. On slow networks, warn the user; the image is several hundred megabytes.
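For unknown-reference findings, the grep step can be sketched as a fixed-string search over the tree's YAML files. find_reference is a hypothetical helper that assumes representation files use the .yaml or .yml extension. Every hit still has to be read by hand: if the only hits are the references themselves, the target YAML is likely missing; searching for a shorter fragment of the id or name can reveal a near-miss spelling.

```sh
# Hypothetical helper: list every line in the tree's YAML files containing the id or name.
# Usage: find_reference <entity_id-or-name> [tree-root]
# Output is empty when nothing matches. /dev/null is passed as an extra file so
# grep always prefixes each match with its file name and line number.
find_reference() {
  find "${2:-.}" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -nF -- "$1" /dev/null {} +
}
```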

Relationship to other skills

  • metabase-representation-format: defines the YAML shape the checker reads. Use it when the user is editing or creating representation files.
  • metabase-database-metadata: owns the .metabase/metadata.json file and the fetch/refresh flow. Invoke it whenever the metadata file is missing or stale, or when the user explicitly asks to refresh it before re-running the checker.
  • Schema-level validation (npx @metabase/representations validate-schema): the fast, local-only check that runs in the Schema Check CI workflow and does not need database metadata. It is essentially instant, so run it freely between edits. The semantic checker assumes schema-valid input, so run schema validation first if a file looks structurally wrong.