frontmatter-guard

Frontmatter Guard Skill

Convention: see `skills/conventions/quality.md` for citation rules; this skill is structural validation, not citation auditing.
…, nested quotes, slug mismatch, null bytes, empty frontmatter, YAML parse failure). Wraps the `gbrain frontmatter` CLI for agent-driven workflows.
triggers:
  • "validate frontmatter"
  • "check frontmatter"
  • "fix frontmatter"
  • "frontmatter audit"
  • "brain lint"
tools:
  • exec
mutating: true

Contract

This skill guarantees:
  • Every brain page is scanned against the seven canonical frontmatter validation classes
  • Mechanical errors (nested quotes, missing closing `---`, null bytes, slug mismatch) are auto-repairable on demand with `.bak` backups
  • Validation logic is shared with `gbrain doctor`'s `frontmatter_integrity` subcheck, so there is a single source of truth
  • Reports per source (gbrain has been multi-source since v0.18.0); never silently audits the wrong root

Why This Exists

Brain pages pile up over months. Agents write them with malformed frontmatter:
  • Missing closing `---` (entity detector bugs)
  • Unstructured YAML in meeting pages (ingestion bugs)
  • Slug mismatches (path renames not propagated)
  • Null bytes (binary corruption from copy-paste accidents)
  • Nested double quotes in titles (`title: "Phil "Nick" Last"`)
Without a guard, these accumulate silently until `gbrain sync` chokes or search returns garbage. The guard makes the failure visible at audit time and trivially fixable.

Validation classes

| Code | Meaning | Auto-fixable? |
|---|---|---|
| `MISSING_OPEN` | File doesn't start with `---` | No (needs human) |
| `MISSING_CLOSE` | No closing `---` before first heading | Yes |
| `YAML_PARSE` | YAML failed to parse | Sometimes (depends on cause) |
| `SLUG_MISMATCH` | Frontmatter `slug:` differs from path-derived slug | Yes (removes the field) |
| `NULL_BYTES` | Binary corruption (`\x00`) | Yes |
| `NESTED_QUOTES` | `title: "outer "inner" outer"` shape | Yes |
| `EMPTY_FRONTMATTER` | Open + close present but nothing between | No (needs human) |
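To make the classes concrete, here is a minimal TypeScript sketch of how the structural checks in the table might be ordered. It is an illustration, not gbrain's validator. Assumptions: the function name `classifyFrontmatter`, the path-to-slug rule (strip the `.md`/`.mdx` extension), the line-based `slug:` lookup, and a `NESTED_QUOTES` heuristic that only looks at values wrapped in double quotes. `YAML_PARSE` is left out because it needs a real YAML parser.

```typescript
// Hypothetical sketch of the table's structural checks; not gbrain's code.
// YAML_PARSE is omitted because it requires a real YAML parser.
type Code =
  | "MISSING_OPEN"
  | "MISSING_CLOSE"
  | "SLUG_MISMATCH"
  | "NULL_BYTES"
  | "NESTED_QUOTES"
  | "EMPTY_FRONTMATTER";

// Assumption: the path-derived slug is the relative path minus .md/.mdx.
function slugFromPath(relPath: string): string {
  return relPath.replace(/\.mdx?$/, "");
}

// Assumption: a value is malformed when it is wrapped in double quotes and
// also contains an unescaped double quote inside.
function hasNestedQuotes(value: string): boolean {
  const v = value.trim();
  if (v.length < 2 || !v.startsWith('"') || !v.endsWith('"')) return false;
  return /(^|[^\\])"/.test(v.slice(1, -1));
}

function classifyFrontmatter(relPath: string, text: string): Code[] {
  const codes: Code[] = [];
  if (text.includes("\u0000")) codes.push("NULL_BYTES");

  const lines = text.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") {
    codes.push("MISSING_OPEN");
    return codes; // nothing else can be checked without an opening fence
  }

  // Look for the closing fence; reaching a Markdown heading first means it is missing.
  // Caveat: a YAML comment line ("# ...") would also look like a heading here.
  let closeIdx = -1;
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === "---") { closeIdx = i; break; }
    if (/^#{1,6}\s/.test(lines[i])) break;
  }
  if (closeIdx === -1) {
    codes.push("MISSING_CLOSE");
    return codes;
  }

  const body = lines.slice(1, closeIdx);
  if (body.every((l) => l.trim() === "")) {
    codes.push("EMPTY_FRONTMATTER");
    return codes;
  }

  for (const line of body) {
    const m = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line);
    if (!m) continue;
    const [, key, value] = m;
    if (key === "slug") {
      const declared = value.trim().replace(/^['"]|['"]$/g, "");
      if (declared !== slugFromPath(relPath)) codes.push("SLUG_MISMATCH");
    }
    if (hasNestedQuotes(value) && !codes.includes("NESTED_QUOTES")) {
      codes.push("NESTED_QUOTES");
    }
  }
  return codes;
}
```

Returning early after `MISSING_OPEN`, `MISSING_CLOSE`, and `EMPTY_FRONTMATTER` reflects that the later checks need a well-delimited frontmatter body to inspect.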

Phases

Phase 1: Audit

Run a read-only scan across all registered sources (or one with `--source <id>`).
```bash
gbrain frontmatter audit --json
```
Reports:
  • Per-source counts grouped by error code
  • Sample of up to 20 affected pages per source
  • Total count
  • Scan timestamp
Output is JSON; agents parse `errors_by_code` and `per_source` to decide next steps.
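Here is a minimal TypeScript sketch of that parsing step. The `AuditEnvelope` type mirrors the example JSON envelope shown under Output Format. The helper name `summarizeAudit` and the wording of its lines are illustrative and not part of gbrain. The function turns the counts into plain-language lines, which is what the output rules ask agents to show instead of raw JSON.

```typescript
// Shape of `gbrain frontmatter audit --json`, following the example envelope.
interface SourceReport {
  source_id: string;
  source_path: string;
  total: number;
  errors_by_code: Record<string, number>;
  sample: { path: string; codes: string[] }[];
}

interface AuditEnvelope {
  ok: boolean;
  total: number;
  errors_by_code: Record<string, number>;
  per_source: SourceReport[];
  scanned_at: string;
}

// Hypothetical helper: one plain-language line per source that has issues.
function summarizeAudit(raw: string): string[] {
  const env = JSON.parse(raw) as AuditEnvelope;
  if (env.total === 0) return ["No frontmatter issues found."];

  const out: string[] = [];
  for (const src of env.per_source) {
    if (src.total === 0) continue;
    // Largest counts first, so the dominant problem is named first.
    const parts = Object.entries(src.errors_by_code)
      .sort((a, b) => b[1] - a[1])
      .map(([code, n]) => `${n} ${code}`);
    out.push(`${src.source_id} (${src.source_path}): ${src.total} issues (${parts.join(", ")})`);
  }
  return out;
}
```

For example, feed it the stdout of `gbrain frontmatter audit --json` and relay the returned lines to the user.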

Phase 2: Validate one path

Validate a single file or directory (does not require source registration):
```bash
gbrain frontmatter validate <path> --json
```
Exit code 0 = clean; 1 = errors found. Use this in CI pipelines or pre-commit hooks.
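A CI step only needs the exit code. The sketch below is a hypothetical TypeScript gate. Statuses `0` and `1` follow the contract above. Treating any other status as a tool failure is an assumption; this includes a `null` status, for example when the process was killed. The runner is injected so the logic can be tested without gbrain installed. `interpretValidateExit` and `frontmatterGate` are illustrative names.

```typescript
type ValidateOutcome = "clean" | "errors" | "tool_failure";

// Documented exit codes: 0 = clean, 1 = errors found.
// Assumption: any other status (including null) means the tool itself failed.
function interpretValidateExit(status: number | null): ValidateOutcome {
  if (status === 0) return "clean";
  if (status === 1) return "errors";
  return "tool_failure";
}

// Injected process runner, so the gate can be exercised without the real CLI.
type Runner = (cmd: string, args: string[]) => { status: number | null };

function frontmatterGate(run: Runner, target: string): ValidateOutcome {
  const result = run("gbrain", ["frontmatter", "validate", target, "--json"]);
  return interpretValidateExit(result.status);
}
```

In Node, `spawnSync` from `node:child_process` returns an object with a `status: number | null` field, so `(cmd, args) => spawnSync(cmd, args)` fits the `Runner` shape.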

Phase 3: Fix

When issues are found:
```bash
gbrain frontmatter validate <path> --fix
```
`--fix` writes `<file>.bak` for every modified file before mutating. The backup is the safety contract: it works whether the brain is a git repo or a plain directory.
`--dry-run` previews without writing. Use this before applying fixes in batch.
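The backup-before-write contract is short to state in code. Here is a minimal TypeScript sketch. It assumes a hypothetical `FileStore` interface, injected so the logic can run against an in-memory map or a thin wrapper over `node:fs`, and a hypothetical `applyFix` helper. gbrain's actual fix engine may be structured differently. With `dryRun` set, the sketch only reports what would change.

```typescript
// Minimal file-store abstraction (hypothetical), e.g. backed by node:fs or a Map.
interface FileStore {
  read(path: string): string;
  write(path: string, content: string): void;
}

interface FixResult {
  path: string;
  changed: boolean;
  backup?: string; // path of the .bak that was written, if any
}

// Applies `fix` to one file. The original content is written to `<file>.bak`
// before the file itself is overwritten, so it survives a failed second write.
function applyFix(
  store: FileStore,
  path: string,
  fix: (text: string) => string,
  dryRun = false,
): FixResult {
  const before = store.read(path);
  const after = fix(before);
  if (after === before) return { path, changed: false };
  if (dryRun) return { path, changed: true }; // preview only: nothing written
  const backup = `${path}.bak`;
  store.write(backup, before); // backup first...
  store.write(path, after); // ...then mutate
  return { path, changed: true, backup };
}
```

Files the fix leaves unchanged get no `.bak`, which matches "for every modified file" above.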

Phase 4: Pre-commit hook (optional)

For brain repos that ARE git repos, install the pre-commit hook to block malformed pages from being committed in the first place:
```bash
gbrain frontmatter install-hook [--source <id>]
```
The hook runs `gbrain frontmatter validate` against staged `.md`/`.mdx` files. Bypass with `git commit --no-verify`.
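This section does not say how the installed hook selects files. As one possible approach, a hook could list staged paths with `git diff --cached --name-only --diff-filter=ACM` (added, copied, or modified files) and keep only `.md`/`.mdx` before calling `gbrain frontmatter validate`. The TypeScript below sketches just the filtering step. `stagedMarkdownFiles` and `STAGED_ARGS` are hypothetical names.

```typescript
// Standard git arguments for listing staged added/copied/modified paths.
const STAGED_ARGS = ["diff", "--cached", "--name-only", "--diff-filter=ACM"];

// Keeps only the .md/.mdx paths from that command's output (one path per line).
// Hypothetical helper; unusual filenames that git quotes are not handled.
function stagedMarkdownFiles(gitOutput: string): string[] {
  return gitOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && /\.mdx?$/.test(line));
}
```

Since `validate` takes a single `<path>`, a hook built this way would call it once per returned file.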

Trigger words

When the user says any of these, route here:
  • "validate frontmatter"
  • "check frontmatter"
  • "fix frontmatter"
  • "frontmatter audit"
  • "brain lint"
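Here is a minimal TypeScript sketch of that routing rule: case-insensitive substring matching against the five phrases, with whitespace collapsed. Matching this way, rather than by intent classification, is an assumption about the agent runtime. `routesToFrontmatterGuard` is a hypothetical name.

```typescript
const TRIGGERS = [
  "validate frontmatter",
  "check frontmatter",
  "fix frontmatter",
  "frontmatter audit",
  "brain lint",
];

// True when the message contains any trigger phrase. Matching is
// case-insensitive, and runs of whitespace are collapsed first.
function routesToFrontmatterGuard(message: string): boolean {
  const normalized = message.toLowerCase().replace(/\s+/g, " ");
  return TRIGGERS.some((t) => normalized.includes(t));
}
```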

Output rules

  • Always run `gbrain frontmatter audit --json` first; never assume a brain is clean.
  • Surface counts to the user in plain language; do not dump raw JSON.
  • For `--fix` operations: state how many files will be modified BEFORE running, then confirm.
  • `SLUG_MISMATCH` fixes remove the frontmatter `slug:` field, because gbrain derives the slug from the path. Mention this when the user's page was intentionally renamed.
  • Never auto-fix `MISSING_OPEN` or `EMPTY_FRONTMATTER` without explicit user input; these usually mean a human author started a page and didn't finish.
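These rules can be sketched as a planning step that runs before anything is written. This TypeScript is an illustration under several assumptions. Input entries use the `{ path, codes }` shape of the audit's `sample` entries. Pages with `MISSING_OPEN` or `EMPTY_FRONTMATTER` are held back for the user, and so are pages with `YAML_PARSE`, which is only sometimes fixable. `planFix` is a hypothetical helper, not a gbrain command, and this section does not say whether `--fix` itself skips held-back pages.

```typescript
interface PageIssues {
  path: string;
  codes: string[];
}

// Codes the validation table marks as auto-fixable.
const AUTO_FIXABLE = new Set(["MISSING_CLOSE", "SLUG_MISMATCH", "NULL_BYTES", "NESTED_QUOTES"]);
// Codes that must never be auto-fixed without explicit user input.
const NEEDS_HUMAN = new Set(["MISSING_OPEN", "EMPTY_FRONTMATTER"]);

interface FixPlan {
  toModify: string[]; // candidates for --fix once the user confirms
  heldBack: string[]; // need a human decision (or review, for YAML_PARSE)
  prompt: string; // what to tell the user BEFORE running --fix
}

function planFix(pages: PageIssues[]): FixPlan {
  const toModify: string[] = [];
  const heldBack: string[] = [];
  let slugFixes = 0;
  for (const p of pages) {
    if (p.codes.some((c) => NEEDS_HUMAN.has(c) || c === "YAML_PARSE")) {
      heldBack.push(p.path);
    } else if (p.codes.some((c) => AUTO_FIXABLE.has(c))) {
      toModify.push(p.path);
      if (p.codes.includes("SLUG_MISMATCH")) slugFixes++;
    }
  }
  let prompt = `--fix will modify ${toModify.length} file(s) (each backed up as <file>.bak).`;
  if (slugFixes > 0) {
    prompt += ` ${slugFixes} of them will have their slug: field removed (gbrain derives the slug from the path).`;
  }
  if (heldBack.length > 0) {
    prompt += ` ${heldBack.length} file(s) need your input first.`;
  }
  return { toModify, heldBack, prompt: prompt + " Proceed?" };
}
```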

Chains with

  • `gbrain doctor`: the `frontmatter_integrity` subcheck reports the same counts as `audit`.
  • `skills/maintain/SKILL.md`: broader brain health audit; chain after this skill if other classes of issue are suspected.
  • `skills/lint/SKILL.md` (via `gbrain lint`): overlapping rules for skill-file lint; the `frontmatter-*` rule names in lint output come from this skill's validation surface.

Output Format

Audit summary (terse, agent-friendly):
```
Frontmatter audit — 17 issue(s) across 1 source(s)

[default] /Users/me/brain
  17 issue(s)
    MISSING_CLOSE: 8
    NESTED_QUOTES: 5
    NULL_BYTES: 4
  sample:
    people/jane.md — MISSING_CLOSE
    companies/acme.md — NESTED_QUOTES
    (+ 12 more)

Fix with: gbrain frontmatter validate /Users/me/brain --fix
```
JSON envelope (when `--json` is passed):
```json
{
  "ok": false,
  "total": 17,
  "errors_by_code": { "MISSING_CLOSE": 8, "NESTED_QUOTES": 5, "NULL_BYTES": 4 },
  "per_source": [
    {
      "source_id": "default",
      "source_path": "/Users/me/brain",
      "total": 17,
      "errors_by_code": { "MISSING_CLOSE": 8, "NESTED_QUOTES": 5, "NULL_BYTES": 4 },
      "sample": [{ "path": "people/jane.md", "codes": ["MISSING_CLOSE"] }]
    }
  ],
  "scanned_at": "2026-04-25T22:30:00.000Z"
}
```
`gbrain frontmatter validate <path> --json` returns a similar envelope keyed on per-file results instead of per-source.

Prevention — Writing Valid Frontmatter

This is the most important section. Fixing broken frontmatter is good. Not writing broken frontmatter in the first place is better.

YAML arrays (the historical #1 error source)

```yaml
# Correct: single-quoted YAML flow (canonical form gbrain emits)
tags: ['yc', 'w2025', 'ai']

# Correct: unquoted scalars (fine when values have no special chars)
tags: [yc, w2025, ai]

# Correct: block style
tags:
  - yc
  - w2025

# Tolerated post-v0.37.5.0 but non-canonical: JSON-style double quotes
tags: ["yc", "w2025"]

# Broken: mixed JSON objects and strings (syntactically valid YAML, but it
# yields a mix of mappings and strings instead of a list of tag strings)
tags: [{"name": "sports"}, "posterous"]
```

**Why this used to break:** before v0.37.5.0, the validator counted unescaped `"` characters and flagged any line with 3+. A flow sequence like `tags: ["yc", "w2025"]` has 4 unescaped `"` by design — it's valid YAML, but the dumb counter flagged it anyway. One brain saw 6,981 of these on a single doctor run. v0.37.5.0 parses suspicious values with `js-yaml.safeLoad` before flagging, so JSON-style arrays no longer trigger NESTED_QUOTES.
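The difference between the old counter and the parse-first check can be reproduced in a few lines. This TypeScript sketch is illustrative only. It substitutes the built-in `JSON.parse` for `js-yaml.safeLoad`. That works for JSON-style values like the ones here, but unlike a YAML parser it would still flag a valid single-quoted value containing three or more double quotes. The function names are hypothetical.

```typescript
// Counts double quotes that are not preceded by a backslash.
function unescapedQuotes(value: string): number {
  let n = 0;
  for (let i = 0; i < value.length; i++) {
    if (value[i] === '"' && value[i - 1] !== "\\") n++;
  }
  return n;
}

// Pre-v0.37.5.0 style: flag any value with 3+ unescaped double quotes.
function naiveNestedQuotes(value: string): boolean {
  return unescapedQuotes(value) >= 3;
}

// Parse-first style: flag only when the value is suspicious AND fails to parse.
// JSON.parse stands in for a YAML parser here (assumption, see above).
function parseFirstNestedQuotes(value: string): boolean {
  if (!naiveNestedQuotes(value)) return false;
  try {
    JSON.parse(value);
    return false; // parses cleanly, so the quotes are structural, not nested
  } catch {
    return true;
  }
}
```

On `["yc", "w2025"]`, the naive check fires (four quotes), but the parse-first check does not. `"Phil "Nick" Last"` is flagged by both.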

**Why you should still write the canonical form:** the auto-fix engine (`gbrain frontmatter validate --fix`) and the inferred-frontmatter serializer both emit single-quoted YAML for `tags:` / `aliases:`. Writing the canonical form in new content keeps the source files stylistically consistent and makes diffs against `--fix` runs empty.

**The classic LLM trap:** code like `tags: [${items.map(t => JSON.stringify(t)).join(', ')}]` produces `tags: ["yc", "w2025"]`. Use single quotes with an apostrophe fallback: `tags: [${items.map(t => t.includes("'") ? JSON.stringify(t) : "'" + t + "'").join(', ')}]`. Or use a YAML library that knows how to emit canonical YAML.
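Here is the single-quote-with-fallback rule from the paragraph above, wrapped as a function. This is a minimal TypeScript sketch. `toTagsLine` is a hypothetical name. The fallback follows the one-liner above, using a `JSON.stringify` double-quoted string when a tag contains an apostrophe. YAML's other option would be to double the apostrophe inside single quotes (`'men''s'`).

```typescript
// Emits `<key>: [...]` as a single-quoted YAML flow sequence (the canonical
// form). Tags containing an apostrophe fall back to a double-quoted,
// JSON-escaped scalar, which is also valid YAML.
function toTagsLine(items: string[], key = "tags"): string {
  const quoted = items.map((t) => (t.includes("'") ? JSON.stringify(t) : `'${t}'`));
  return `${key}: [${quoted.join(", ")}]`;
}
```

`toTagsLine(["yc", "w2025", "ai"])` produces `tags: ['yc', 'w2025', 'ai']`. Passing `"aliases"` as the second argument covers the other field the serializer emits.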

Quoted scalars


```yaml
# Correct: single quotes for values with special chars
title: 'My "Quoted" Title'

# Correct: double quotes when value has apostrophes
title: "Men's Fashion Guide"

# Broken: double quotes wrapping inner double quotes
title: "My "Quoted" Title"
```

When to quote at all

  • Unquoted is fine for simple values: `type: person`, `batch: w2025`
  • Quote when the value contains `: " ' # [ ] { } | > & * ! ? ,` or starts with `@`
  • Single quotes are the default safe choice
  • Double quotes only when the value itself contains apostrophes
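The quoting rules above translate into a small decision function. This TypeScript sketch rests on a few assumptions. The special-character set is exactly the list above. An empty string is quoted so that it does not load as null, which is an addition to the rules. `quoteScalar` is a hypothetical name. The function does not cover every YAML corner; for example, unquoted `true`, `null`, or `42` still load as a boolean, null, and a number.

```typescript
// Characters from the list above that force quoting.
const SPECIAL = /[:"'#\[\]{}|>&*!?,]/;

// Chooses a quoting style: unquoted when simple, single quotes by default,
// double quotes only when the value contains an apostrophe.
function quoteScalar(value: string): string {
  const needsQuotes = value === "" || SPECIAL.test(value) || value.startsWith("@");
  if (!needsQuotes) return value;
  if (value.includes("'")) return JSON.stringify(value); // also escapes any inner "
  return `'${value}'`;
}
```

For example, `title: ${quoteScalar('My "Quoted" Title')}` yields `title: 'My "Quoted" Title'`, and an apostrophe switches it to the double-quoted form.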

Anti-Patterns

**Don't auto-fix `MISSING_OPEN` or `EMPTY_FRONTMATTER` without user input.** These usually mean a human author started a page and didn't finish; silently inserting `---` markers around an unfinished draft is wrong.

**Don't use `--fix` to "make doctor green" without reading the audit first.** `SLUG_MISMATCH` cases are surfaced for manual review specifically because gbrain derives the slug from the path. A mismatch usually means the user renamed a file intentionally; auto-removing the slug field is the right outcome only when you've confirmed the rename was deliberate.

**Don't skip the `.bak` backups.** The `.bak` is the safety contract for non-git brain repos. If `.bak` files accumulate after a fix run, that's a feature, not a bug: the user can review the diffs and delete the backups when satisfied.

**Don't run `audit` on a brain where sources aren't registered.** The CLI returns "no registered sources to audit" gracefully, but the migration emits a `skipped: no_sources` phase result. Don't paper over this with a manual path-walk; the right fix is to register the source via `gbrain sources add`.

**Don't install the pre-commit hook on non-git brain dirs.** The install-hook command skips them automatically with a one-line note. If you see "skipped — not a git repo" and still want regular validation, run the `audit` command on a cron schedule instead; it checks periodically rather than at write time.