table-validation

Table Validation

Verify the column contract of a table that was just created or written: columns present, declared types correct, nullability correct. This skill is deliberately narrow.

Target shape (important)

ValidationHook.on_end invokes this skill with the whole session, not a single table. The target you receive is a SessionTarget whose .targets is a list of table records matching this skill's targets: [{type: table}] filter. When a node writes multiple tables (CTAS scaffolding, layered ETL), loop over session.targets and run the checks below independently for each TableTarget. Transfer targets are covered by transfer-reconciliation. Emit one CheckResult per (target, check) pair so the retry prompt can tell the agent which specific table failed.
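
The per-target loop described above can be sketched as follows. This is a minimal illustration, not the framework's implementation: the three dataclasses are hypothetical stand-ins for SessionTarget, TableTarget and CheckResult (whose real fields are defined elsewhere), and run_checks is a made-up helper name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins: the real SessionTarget, TableTarget and CheckResult
# types come from the framework and may carry different fields.
@dataclass
class TableTarget:
    name: str


@dataclass
class SessionTarget:
    targets: List[TableTarget]


@dataclass
class CheckResult:
    target: str
    check: str
    passed: bool


Check = Callable[[TableTarget], bool]


def run_checks(session: SessionTarget, checks: Dict[str, Check]) -> List[CheckResult]:
    """Emit one CheckResult per (target, check) pair."""
    results: List[CheckResult] = []
    for target in session.targets:              # every table the node wrote
        for check_name, check in checks.items():
            results.append(CheckResult(target.name, check_name, check(target)))
    return results
```

Because each result names both the table and the check, a retry prompt built from this list can point at the specific table that failed. Check ordering and early stopping are left out of this sketch.
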
Explicitly out of scope:
  • Object exists and row count > 0: already checked by the builtin validation layer before this skill runs. The hook supplies you with those results in the precheck context; do not re-run describe_table just to confirm the table exists.
  • Data-content assertions (null ratios, value ranges, accepted values, regex format, duplicates, uniqueness). CTAS from an empty source, idempotent upserts, schema-only bootstrapping, and partition scaffolding are legitimate patterns that produce zero-row tables; blocking on those would cause false positives. If you need data-content rules for a specific table, author a project-level validator skill under ./.datus/skills/ or ~/.datus/skills/ with a targets: filter.

Checks in scope

  1. Column set — every expected column name appears in the actual table, and (when strict match is requested) no unexpected columns appear.
  2. Types — each expected column's declared type matches.
  3. Nullability — each expected column's nullability matches.

Execution checklist

Run the column-contract checks in this order. Stop on the first blocking failure for a given table, then continue with the next target table if the session contains multiple table targets.
  1. Expected columns present: when the caller supplied an expected column set, every expected column must appear in describe_table output.
  2. No unexpected columns: when the caller requires exact matching, flag any actual column that is not in the contract.
  3. Types match: compare each expected column's declared type with the contract. Treat widening as acceptable only when the contract explicitly allows it.
  4. Nullability matches: compare each expected column's nullable / NOT NULL setting with the contract.
For every executed check, report the check name, observed value, expected value or threshold, pass/fail decision, and a short failure reason.
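
As a rough sketch of the ordered checklist for a single table, the function below stops at the first failing check. It rests on several assumptions that are not part of the skill: the contract and the describe_table output are both taken to be already normalized to {column_name: (declared_type, nullable)}, type names are compared case-insensitively, every failure is treated as blocking, and widening rules are ignored. The name check_table and the report keys are illustrative only.

```python
from typing import Dict, List, Tuple

# Assumption: contract and describe_table output are both normalized to
# {column_name: (declared_type, nullable)} before this function is called.
Columns = Dict[str, Tuple[str, bool]]


def check_table(expected: Columns, actual: Columns, strict: bool = False) -> List[dict]:
    """Run the column-contract checks in order, stopping at the first failure."""
    shared = [c for c in expected if c in actual]
    steps = [
        ("expected_columns_present",
         sorted(c for c in expected if c not in actual),
         "missing columns"),
    ]
    if strict:  # only executed when the caller requires exact matching
        steps.append(("no_unexpected_columns",
                      sorted(c for c in actual if c not in expected),
                      "columns not in contract"))
    steps.append(("types_match",
                  sorted(c for c in shared
                         if actual[c][0].upper() != expected[c][0].upper()),
                  "declared type differs"))
    steps.append(("nullability_matches",
                  sorted(c for c in shared if actual[c][1] != expected[c][1]),
                  "nullability differs"))

    results: List[dict] = []
    for name, offenders, reason in steps:
        passed = not offenders
        results.append({
            "check": name,
            "passed": passed,
            "offending_columns": offenders,
            "observed": {c: actual.get(c) for c in offenders},
            "expected": {c: expected.get(c) for c in offenders},
            "reason": "" if passed else reason,
        })
        if not passed:
            break  # later checks for this table are not executed
    return results
```

Calling check_table once per table keeps a failure on one table from suppressing the checks on the next.
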

When there is no explicit column contract

If the caller did not supply an expected column set / type map, there is nothing for this skill to check: emit the JSON block with "checks": [] and return without calling tools. The builtin layer has already confirmed existence and row count; duplicating that check here only produces spurious failures when catalog/database/schema identifiers are ambiguous.
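
A minimal sketch of that early exit is shown below. Only the "checks" key is taken from the text above; the full shape of the JSON block is defined by the hook's output contract, and early_exit_output is a hypothetical helper name. Treating an empty mapping the same as a missing contract is also an assumption.

```python
import json
from typing import Optional


def early_exit_output(expected_columns: Optional[dict]) -> Optional[str]:
    """Return the empty-checks JSON when there is no contract, else None."""
    if not expected_columns:          # None or {} both count as "no contract" here
        return json.dumps({"checks": []})
    return None                       # a contract exists: go on to the real checks
```

The caller would return the string immediately when it is not None, before any tool is invoked.
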

Tools

Use describe_table and get_table_ddl to introspect the target. Do not run read_query to count rows or sample data; that is out of scope.

Project-level validation examples

The following checks are intentionally not bundled here. When a table actually needs them, add them in a project-level validator skill under ./.datus/skills/<name>/ or ~/.datus/skills/<name>/ with kind: validator and a targets: filter:
  • null ratios per column
  • numeric ranges / min-max checks
  • accepted value sets / enum membership
  • regex / format validation
  • uniqueness / duplicate-key detection
  • cross-column assertions

Output

Emit the standard validator JSON block (see the output contract appended by the hook). Use severity: "blocking" only for column contract violations that would break downstream consumers. Mismatches that are cosmetic or widening-safe should be severity: "advisory".
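
One possible reading of the severity rule for type mismatches is sketched below. It assumes the contract lists its allowed widenings as (expected_type, actual_type) pairs in upper case; that representation and the function name type_mismatch_severity are illustrative, not something the skill defines.

```python
from typing import Set, Tuple


def type_mismatch_severity(expected: str, actual: str,
                           allowed_widenings: Set[Tuple[str, str]]) -> str:
    """Classify a type mismatch that has already been detected.

    Returns "advisory" when the contract explicitly allows this widening,
    otherwise "blocking".
    """
    pair = (expected.upper(), actual.upper())
    return "advisory" if pair in allowed_widenings else "blocking"
```

For example, with allowed_widenings = {("INT", "BIGINT")}, a column declared BIGINT where the contract says INT would be advisory, while the reverse (a narrowing) would stay blocking.
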