web-content-audit

Web Content Audit

Schema authority: cross-file consistency rules (3-place sync for materials, task ID stability, quiz renumber trap) are codified in `_shared/domain-primitives.md` §13. Audit scripts in this skill operationalise those rules.
Filename convention (English-first): audit reports land under `data/audit-*.md`; scripts under `scripts/audit-*.mjs`.

This skill produces audit scripts that read source files and deployed data, compare them, and emit a human-readable report. The goal is not to gatekeep deployment (that's verification's job) — it's to surface drift that humans should review.

When to Invoke

Before a major content release (catch outline ↔ course-data ↔ visuals drift before students see it).
After adding/removing units / materials / quiz items (re-check the three-place sync rule).
Before handing off a corporate edition to a client (confirm the condensed COURSE matches its source).
When the user senses "something's off" but can't see a runtime bug — audit is for the invisible drift.
When planning the next iteration (e.g. "which sections need new illustrations?").

Do NOT invoke when the question is "does the page render correctly?"; that's the job of `web-visual-verification`.

Audit vs Verify — Operating Mode Difference

Trait	Audit	Verify
Output	Markdown report (human reads)	Pass/fail (CI gates)
Tools	File reads, regex, JSON diffs, optionally Playwright	Playwright + asserts
Failure mode	Always exits 0; report tells human what's worth fixing	Exits non-zero on bad state
When run	Manually, at milestones	On every PR / pre-deploy
Mental model	"Show me what's drifted"	"Don't let this regress"

If your script ends with `assert.*` or `process.exit(1)`, it's not an audit; move it to `web-visual-verification`.

The Five Audit Categories

Audit 1: Cross-Artifact Reference Resolution

The classic teaching-site bug: `course-data.js:materials[]` references a file the disk doesn't have, or `getMaterialUrl()` is missing a rule.

// scripts/audit-material-references.mjs
import fs from 'node:fs/promises';
import path from 'node:path';

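// Read course-data.js (via vm or regex extract).
// loadCourse, checkRouterCoverage, checkFileExists, and findReferencingUnits
// are project-specific helpers (not shown here).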
const COURSE = await loadCourse('./course-data.js');

const report = ['# Material reference audit\n'];
for (const m of COURSE.materials) {
  // 1. Does the file router know about it?
  const routerCovers = await checkRouterCoverage(m.name);
  // 2. Does the actual file exist?
  const fileExists = await checkFileExists(m);
  // 3. Is it referenced in any unit's materials[]?
  const referencedInUnit = findReferencingUnits(COURSE, m.id);

  if (!routerCovers || !fileExists || referencedInUnit.length === 0) {
    report.push(`## ⚠️ ${m.id}: ${m.name}`);
    report.push(`- Router: ${routerCovers ? '✅' : '❌'}`);
    report.push(`- File: ${fileExists ? '✅' : '❌'}`);
    report.push(`- Referenced in units: ${referencedInUnit.join(', ') || '❌ NONE'}`);
  }
}
await fs.writeFile('data/audit-materials.md', report.join('\n'));

The output goes to `data/audit-*.md`; the human opens it and decides what to fix.
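
The script above leans on helpers it does not define. As a minimal sketch, `findReferencingUnits` could walk the day/unit structure and collect the IDs of units that list a given material. The data shape used here (`day1`, `day2`, ... each holding `units[]`, with `unit.materials` as an array of material IDs or `{ id }` objects) is an assumption borrowed from the Audit 3 example, not a documented schema.

```javascript
// Hypothetical helper: returns the IDs of units whose materials[] mention materialId.
// Assumes course has day keys (day1, day2, ...) each holding { units: [...] },
// and that unit.materials contains material IDs or objects with an id field.
function findReferencingUnits(course, materialId) {
  const hits = [];
  for (const [key, day] of Object.entries(course)) {
    if (!/^day\d+$/.test(key) || !Array.isArray(day?.units)) continue;
    for (const unit of day.units) {
      const ids = (unit.materials ?? []).map((m) => (typeof m === 'string' ? m : m?.id));
      if (ids.includes(materialId)) hits.push(unit.id);
    }
  }
  return hits;
}

// Usage with a tiny in-memory course (sample data, not from a real project):
const sampleCourse = {
  materials: [{ id: 'm1', name: 'Worksheet' }, { id: 'm2', name: 'Slides' }],
  day1: { units: [{ id: 'd1-u1', materials: ['m1'] }] },
  day2: { units: [{ id: 'd2-u1', materials: [{ id: 'm1' }] }] },
};
console.log(findReferencingUnits(sampleCourse, 'm1')); // → ['d1-u1', 'd2-u1']
console.log(findReferencingUnits(sampleCourse, 'm2')); // → [] (orphaned material)
```

An empty result is exactly the "❌ NONE" case the report flags: the material is declared but no unit links to it.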

Audit 2: Asset Coverage / Gap Analysis

"Which sections still have no illustration?" — guide future work, don't fail anything.

// scripts/audit-illustrations.mjs
// Walks every .chapter / .accordion-content; counts text length, image count, list density
// For each section: if (textLength > 500 && imageCount === 0) → candidate for illustration
// Output: data/audit-illustrations.md (a markdown list with section titles + screenshots)

This may use Playwright for screenshots (capture mode, no assertions), but the product is the gap list, not the screenshots. Don't conflate this with the `capture-*.mjs` scripts in `web-visual-verification`, which have no analytical output.
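
The pseudocode above reduces to a filter plus a report renderer. The sketch below assumes the per-section stats (`title`, `textLength`, `imageCount`) have already been collected, for example inside a Playwright `page.evaluate()` call; those field names and the report wording are illustrative, while the 500-character threshold comes from the outline above.

```javascript
// Hypothetical input shape: one entry per .chapter / .accordion-content section.
function findIllustrationGaps(sections, minTextLength = 500) {
  // Long text with zero images is a candidate, not a failure.
  return sections.filter((s) => s.textLength > minTextLength && s.imageCount === 0);
}

function renderGapReport(sections) {
  const gaps = findIllustrationGaps(sections);
  const lines = ['# Illustration gap audit', ''];
  lines.push(`${gaps.length} of ${sections.length} sections are candidates for an illustration.`, '');
  for (const s of gaps) {
    lines.push(`- ${s.title} (${s.textLength} chars, no images)`);
  }
  return lines.join('\n');
}

// Sample data for illustration only.
const sections = [
  { title: 'Intro', textLength: 320, imageCount: 0 },
  { title: 'Prompt patterns', textLength: 1240, imageCount: 0 },
  { title: 'Case study', textLength: 900, imageCount: 2 },
];
console.log(renderGapReport(sections));
```

Keeping the classification separate from the DOM walk means the threshold can be tuned without re-running the browser.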

Audit 3: Inlined Data ↔ Source Drift (Corporate Edition)

When `window.COURSE` is inlined in `index.html` (corporate edition), it can drift from the source files it was condensed from. Audit the diff:

// scripts/audit-corp-content.mjs
import fs from 'node:fs/promises';

// loadCourseFromIndexHtml, loadSourceMarkdown, and diff are project-specific helpers (not shown here).
const inlined = await loadCourseFromIndexHtml('corporate-editions/client_6h/index.html');
const source = await loadSourceMarkdown('course-package/');

const report = ['# Corporate edition content audit\n'];

// Compare meta
report.push(`## Meta`);
report.push(diff(inlined.meta, source.meta));

// Per-unit: prompts / tasks / materials counts
for (const day of ['day1', 'day2']) {
  for (const unit of inlined[day].units) {
    report.push(`## ${unit.id}: ${unit.title}`);
    report.push(`- Prompts: ${unit.prompts?.length || 0}`);
    report.push(`- Tasks: ${unit.tasks?.length || 0}`);
    report.push(`- Materials: ${unit.materials?.length || 0}`);
  }
}
await fs.writeFile('data/audit-corp-content.md', report.join('\n'));

The example workshop's `audit-corp-6h-content.mjs` does exactly this: it produces a markdown dump for the instructor to skim before each corporate session.
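
The `diff()` call in the script above is not defined in this document. One plausible stand-in, assuming `meta` is a flat object of JSON-serialisable values, is a shallow key-by-key comparison that returns markdown lines. The function body and the emoji wording below are illustrative, not the workshop's actual implementation.

```javascript
// Hypothetical stand-in for diff(): shallow, key-by-key comparison.
// Values are compared via JSON.stringify, which is adequate for flat meta
// objects but sensitive to key order inside nested objects.
function diff(inlined, source) {
  const keys = [...new Set([...Object.keys(inlined ?? {}), ...Object.keys(source ?? {})])].sort();
  const lines = [];
  for (const key of keys) {
    const a = JSON.stringify(inlined?.[key]); // undefined when the key is absent
    const b = JSON.stringify(source?.[key]);
    if (a === b) continue;
    if (a === undefined) lines.push(`- ⚠️ \`${key}\`: only in source (${b})`);
    else if (b === undefined) lines.push(`- ⚠️ \`${key}\`: only in inlined (${a})`);
    else lines.push(`- ❌ \`${key}\`: inlined ${a} vs source ${b}`);
  }
  return lines.length ? lines.join('\n') : '- ✅ no differences';
}

// Sample data for illustration only.
const metaDiff = diff(
  { title: 'AI Workshop', hours: 6 },
  { title: 'AI Workshop', hours: 12, level: 'intro' },
);
console.log(metaDiff);
```

Returning a string rather than throwing keeps the helper in audit territory: a mismatch becomes a line for the instructor to judge.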

Audit 4: ID Stability & Reuse

`task.id`, `quiz.id`, and `unit.id` are stable contracts (localStorage keys, internal references). Audit them:

// scripts/audit-id-stability.mjs
// 1. Collect all task IDs across course-data.js — check no duplicates
// 2. Read git log for course-data.js — flag any task ID that was renamed (not just deleted)
// 3. Cross-check quiz[N].sourceUnit references actual unit IDs
// 4. Cross-check materials[].id appears in at least one unit's materials[]
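
Checks 1 and 2 in the outline above can be sketched as pure functions over lists of IDs. How the previous list is obtained is left open (one option is parsing the output of `git show <rev>:course-data.js`). The rename detector is a heuristic and assumes IDs follow a `day-unit-task` pattern such as `d2-u3-t1`; that pattern is inferred from this document's example rather than from a documented schema.

```javascript
// Returns every ID that appears more than once.
function findDuplicateIds(ids) {
  const seen = new Set();
  const dupes = new Set();
  for (const id of ids) {
    if (seen.has(id)) dupes.add(id);
    seen.add(id);
  }
  return [...dupes];
}

// Heuristic: an ID vanished and a new one appeared under the same unit prefix.
// A removed ID with no same-unit replacement is treated as a plain deletion.
function findLikelyRenames(previousIds, currentIds) {
  const unitOf = (id) => id.split('-').slice(0, 2).join('-'); // 'd2-u3-t1' -> 'd2-u3'
  const removed = previousIds.filter((id) => !currentIds.includes(id));
  const added = currentIds.filter((id) => !previousIds.includes(id));
  const pairs = [];
  for (const oldId of removed) {
    const candidate = added.find((newId) => unitOf(newId) === unitOf(oldId));
    if (candidate) pairs.push({ from: oldId, to: candidate });
  }
  return pairs;
}

// Sample data for illustration only.
const previous = ['d2-u3-t1', 'd2-u3-t2', 'd2-u4-t1'];
const current = ['d2-u3-task1', 'd2-u3-t2', 'd2-u3-t2'];
console.log(findDuplicateIds(current));            // → ['d2-u3-t2']
console.log(findLikelyRenames(previous, current)); // → [{ from: 'd2-u3-t1', to: 'd2-u3-task1' }]
```

Because the pairing is a guess, the report should present each pair as a question for the maintainer rather than as a confirmed rename.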

This is the audit that catches "we accidentally renamed `d2-u3-t1` to `d2-u3-task1`", a change that silently wipes localStorage progress for every student.

Audit 5: Produced Artifact Inspection

When the pipeline produces a binary deliverable (DOCX, PDF, zip), audit its inner structure to catch broken builds early.

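// scripts/inspect-docx-images.mjs
import { readFile } from 'node:fs/promises';
import JSZip from 'jszip';

// A DOCX is a zip archive; embedded images live under word/media/.
const zip = await JSZip.loadAsync(await readFile('dist/ebook.docx'));
const images = Object.keys(zip.files).filter(f => f.startsWith('word/media/'));
console.log(`📷 Embedded images: ${images.length}`);
for (const f of images) {
  // Read the size through the public API instead of the private _data field.
  const bytes = await zip.files[f].async('uint8array');
  console.log(`   ${f}  ${(bytes.length / 1024).toFixed(1)} KB`);
}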

If you expect ~30 images and the DOCX has 3, something in the build went wrong silently. This is faster than opening Word and scrolling.
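
The "expected about 30, found 3" comparison can itself be a report line rather than a failure. A minimal sketch, assuming the maintainer supplies the expected count and a tolerance; the function name, the 20% default, and the wording are hypothetical:

```javascript
// Hypothetical helper: turns an image count into a report line instead of an exit code.
// Nothing here reads the DOCX; `actual` would come from the inspection script.
function describeImageCount(actual, expected, tolerance = 0.2) {
  const lowest = Math.floor(expected * (1 - tolerance));
  if (actual >= lowest) {
    return `✅ ${actual} embedded images (expected about ${expected})`;
  }
  return `❌ ${actual} embedded images, expected about ${expected}: the build may have dropped some`;
}

console.log(describeImageCount(29, 30)); // within tolerance
console.log(describeImageCount(3, 30));  // flagged for a human to look at
```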

Output Convention

data/
├── audit-materials.md            ← Audit 1
├── audit-illustrations.md         ← Audit 2 (gap analysis)
├── audit/section-*.png            ←   ↳ supporting screenshots
├── audit-corp-content.md          ← Audit 3
├── audit-id-stability.md          ← Audit 4
└── audit-ebook-inspection.txt     ← Audit 5

All audit reports are versionable text: markdown by default, JSON for tooling, and plain text for the ebook inspection. Commit them at release milestones; the diff between releases tells you "what content actually changed".

Format the Report for Human Skimming

Auditors will skim, not read. Structure for skimming:

# {Audit name} - {timestamp}

## Summary
- ✅ 24 of 30 materials fully wired
- ⚠️ 4 materials missing router rules
- ❌ 2 materials referenced but file missing

## ⚠️ Warnings (need attention but not blocking)
...

## ❌ Errors (likely broken in production)
...

## ℹ️ Notes (FYI)
...


Lead with the summary and emoji-prefix every section. The audit's job is to be the world's most efficient code review.
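
A renderer for this layout might look like the sketch below. The `findings` shape (`{ level, text }` with levels `ok`, `warn`, `error`, `note`) and the function name are assumptions made for illustration; only the section titles follow the template above.

```javascript
// Hypothetical renderer: summary counts first, then one emoji-prefixed section per level.
function renderReport(name, findings, now = new Date()) {
  const by = (level) => findings.filter((f) => f.level === level);
  // Empty sections are omitted so the reader never scrolls past blank headings.
  const section = (title, items) =>
    items.length ? [`## ${title}`, ...items.map((f) => `- ${f.text}`), ''] : [];

  return [
    `# ${name} - ${now.toISOString()}`,
    '',
    '## Summary',
    `- ✅ ${by('ok').length} OK`,
    `- ⚠️ ${by('warn').length} warnings`,
    `- ❌ ${by('error').length} errors`,
    '',
    ...section('⚠️ Warnings (need attention but not blocking)', by('warn')),
    ...section('❌ Errors (likely broken in production)', by('error')),
    ...section('ℹ️ Notes (FYI)', by('note')),
  ].join('\n');
}

// Sample findings and a fixed timestamp, for illustration only.
const report = renderReport('Material reference audit', [
  { level: 'ok', text: 'm1 fully wired' },
  { level: 'warn', text: 'm2 missing router rule' },
  { level: 'error', text: 'm3 referenced but file missing' },
], new Date('2025-01-01T00:00:00Z'));
console.log(report);
```

Putting the ISO timestamp in the title line also covers the "which run is newer" question when two reports are compared.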

The "Three-Place Sync" Audit (Critical for Teaching Sites)

The pattern from `static-spa-conversion`: every material must appear in (1) the filesystem, (2) `course-data.js:materials[]`, and (3) the `getMaterialUrl()` router. Audit script:

const fsFiles = new Set(await fs.readdir('course-package/materials/'));
const dataMaterials = new Set(COURSE.materials.map(m => routerNameToFile(m.name)));
const routerCoverage = extractRouterRules(indexHtmlContent);

const inFsNotInData = [...fsFiles].filter(f => !dataMaterials.has(f));
const inDataNotInFs = [...dataMaterials].filter(f => !fsFiles.has(f));
const inDataNotInRouter = [...dataMaterials].filter(f => !routerCoverage.has(f));

// Emit a 3-column markdown table showing each file's coverage in each of the three places.

This single script catches 90% of "the material link is broken" bugs.
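
The final "emit a table" step from the script above could be sketched as follows: one row per file, with a column for the file name plus one per place. The function name and column headings are illustrative, and the three inputs are assumed to be `Set`s of comparable file names, however the script derives them.

```javascript
// Hypothetical table renderer for the three-place sync check.
function renderSyncTable(fsFiles, dataFiles, routerFiles) {
  // Union of all names, so a file missing from any one place still gets a row.
  const all = [...new Set([...fsFiles, ...dataFiles, ...routerFiles])].sort();
  const mark = (set, file) => (set.has(file) ? '✅' : '❌');
  const rows = all.map(
    (f) => `| ${f} | ${mark(fsFiles, f)} | ${mark(dataFiles, f)} | ${mark(routerFiles, f)} |`,
  );
  return [
    '| File | Filesystem | course-data.js | Router |',
    '| --- | --- | --- | --- |',
    ...rows,
  ].join('\n');
}

// Sample sets for illustration only.
const table = renderSyncTable(
  new Set(['intro.pdf', 'orphan.pdf']),
  new Set(['intro.pdf', 'ghost.pdf']),
  new Set(['intro.pdf']),
);
console.log(table);
```

A row with anything other than three ticks is the drift worth reviewing: `orphan.pdf` exists on disk but is never declared, while `ghost.pdf` is declared but has no file and no router rule.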

Audit Doesn't Block — It Informs

The strongest discipline: audit scripts always exit 0, even when they find problems. Why:

Audits run regularly; you don't want CI red because of a known "we'll fix in next release" issue.
Audits surface judgement calls (e.g. "section has no illustration — is that intentional?") that automation shouldn't decide.
A failing audit feels like a verify; humans then treat audits as gates and the report falls into "didn't read it" hell.

If an issue MUST block release, promote it to a verify script (`web-visual-verification`) with `assert.*`. Audits inform; verifies gate.
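
One way to enforce the exit-0 discipline is a small wrapper that every audit script goes through. This is a sketch under assumptions: `runAudit` is a hypothetical name, and `auditFn` is assumed to return an array of markdown lines synchronously.

```javascript
// Hypothetical wrapper: runs an audit and guarantees a zero exit code.
// A crash inside the audit becomes an ❌ line in the report, not a red build.
function runAudit(name, auditFn) {
  const lines = [`# ${name}`, ''];
  try {
    lines.push(...auditFn());
  } catch (err) {
    lines.push(`- ❌ audit crashed: ${err.message}`);
  } finally {
    process.exitCode = 0; // audits inform; they never gate
  }
  return lines.join('\n');
}

// Sample audits for illustration only.
const okReport = runAudit('Material reference audit', () => ['- ⚠️ m2 missing router rule']);
const crashedReport = runAudit('ID stability audit', () => {
  throw new Error('course-data.js not found');
});
console.log(okReport);
console.log(crashedReport);
```

Setting `process.exitCode` instead of calling `process.exit()` lets pending file writes finish before Node exits.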

When to Audit (Cadence)

Trigger	Audits to run
Adding/removing a unit	id-stability, material-references
Adding a material file	material-references (esp. the three-place sync)
Renumbering quiz	id-stability + manual check of hardcoded `(N題)` strings
Before corporate edition handoff	corp-content (inline ↔ source) + material-references
Before publishing ebook	inspect-docx-images / inspect-pdf-pagecount
Quarterly maintenance	illustrations (gap analysis) → plan next content cycle

Anti-Patterns

Audit scripts with `assert.*` or non-zero exits: that's a verify, not an audit. Move it.
Audit reports that nobody reads: keep them short, lead with summary counts, use emojis aggressively for scannability.
One audit doing five jobs: split it. `audit-materials.mjs` and `audit-illustrations.mjs` produce different reports to different folders for different decisions.
Running audits in CI as if they were verifies: they become noise, get muted, become useless. Run on demand or at release milestones.
No timestamp / version in the report: when comparing two audit runs, you need to know which is newer.

Hand-off

When this skill finishes:

A set of `audit-*.mjs` (and supporting `inspect-*.mjs`) scripts is in `scripts/`.
Their outputs land in `data/audit-*.md` (or `.json` for tooling consumption).
A README section in the project explains "how to read an audit report" (link to the convention above).
The user knows to run audits at release milestones, not on every commit.
