mdx-sanitizer

MDX Sanitizer

Comprehensive MDX content sanitizer that prevents JSX parsing errors caused by angle brackets, generics, and other conflicting patterns.

The Problem

MDX 2.x treats unescaped `<` and `{` as JSX syntax. This causes build failures when content contains:
  • TypeScript generics: `Promise<T>`, `Array<string>`, `Map<K, V>`
  • Comparisons: `<100ms`, `<=`, `>=`
  • Arrows: `-->`, `<--`, `->`
  • Invalid tags: `<link>` in prose, `<tag>` placeholders
  • Empty brackets: `<>`

Solution Architecture

This skill implements a three-layer defense:

1. Sync-Time Sanitization (Proactive)

Content is sanitized when syncing from `.claude/skills/` to `website/docs/`:
  • `syncSkillDocs.ts` - Main skill files
  • `syncSkillSubpages.ts` - Reference files
  • `doc-generator.ts` - Generated docs

2. Pre-Commit Validation (Reactive)

The git pre-commit hook validates files before commit using `validate-brackets.js`.
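
As a rough illustration of what such a validation pass could look like, the sketch below scans a document line by line, skips fenced code blocks and inline code, and reports lines where `<` is followed by a digit, whitespace, `=`, `>`, or `-`. The names `findBracketIssues` and `BracketIssue` and the detection rule are assumptions made for this example; `validate-brackets.js` itself may work differently.

```typescript
interface BracketIssue {
  file: string;
  line: number; // 1-based line number
  text: string; // offending line, trimmed
}

// Hypothetical validator sketch, not the actual validate-brackets.js logic.
function findBracketIssues(content: string, file: string): BracketIssue[] {
  const fence = '`'.repeat(3);
  const issues: BracketIssue[] = [];
  let insideFence = false;

  content.split('\n').forEach((rawLine, index) => {
    if (rawLine.trimStart().startsWith(fence)) {
      insideFence = !insideFence; // toggle on opening and closing fences
      return;
    }
    if (insideFence) return;

    // Drop inline code spans so their contents are never flagged.
    const prose = rawLine.replace(/`[^`]*`/g, '');

    // Assumed rule: "<" followed by a digit, whitespace, "=", ">", or "-".
    if (/<(?=[\d\s=>-])/.test(prose)) {
      issues.push({ file, line: index + 1, text: rawLine.trim() });
    }
  });

  return issues;
}
```

A hook built on this could print each issue and exit with a non-zero status when the list is non-empty, which is what makes git abort the commit.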

3. Build-Time Validation (Final Check)

`npm run validate:all` runs as part of `prebuild` to catch any issues.

Usage

Check for Issues (Dry Run)

```bash
cd website
npm run sanitize:mdx

# or with verbose output
npm run sanitize:mdx -- --verbose
```

Fix All Issues

```bash
cd website
npm run sanitize:mdx -- --fix

# or shorthand
npm run fix:mdx
```

Programmatic API

```typescript
import * as fs from 'node:fs';
import { sanitizeForMdx, validateMdxSafety, isMdxSafe } from './lib/mdx-sanitizer';

// Example input; point this at the file you want to process
const filePath = 'path/to/file.md';
const content = fs.readFileSync(filePath, 'utf8');

// Sanitize content and write it back only if something changed
const result = sanitizeForMdx(content, { useHtmlEntities: true });
if (result.modified) {
  console.log(`Fixed ${result.issues.length} issues`);
  fs.writeFileSync(filePath, result.content);
}

// Validate without modifying
const issues = validateMdxSafety(content, filePath);

// Quick check
if (!isMdxSafe(content)) {
  // Handle issues
}
```

Escaping Strategies

The sanitizer uses HTML entities for maximum compatibility:

| Pattern | Original | Escaped |
| --- | --- | --- |
| Less-than | `<` | `&lt;` |
| Greater-than | `>` | `&gt;` |
| Generics | `<T>` | `&lt;T&gt;` |
| Comparison | `<=` | `&lt;=` |

Content inside fenced code blocks or inline code spans (backticks) is automatically protected and never escaped.
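
One way code protection and entity escaping might fit together is sketched below. This is not the module's implementation: `escapeOutsideCode` is a hypothetical name, the sketch recognizes only triple-backtick fences and single-backtick spans, and it escapes every `<` and `>` outside them, whereas the real sanitizer presumably leaves valid JSX and HTML tags alone.

```typescript
// Hypothetical sketch: escape angle brackets in prose, leave code untouched.
const FENCE = '`'.repeat(3);

// One capture group, so String.prototype.split keeps each code segment
// in the result at the odd indexes.
const CODE_PATTERN = new RegExp(
  '(' + FENCE + '[\\s\\S]*?' + FENCE + '|`[^`\\n]*`)',
  'g',
);

function escapeOutsideCode(markdown: string): string {
  return markdown
    .split(CODE_PATTERN)
    .map((segment, index) =>
      index % 2 === 1
        ? segment // fenced block or inline code span: never escaped
        : segment.replace(/</g, '&lt;').replace(/>/g, '&gt;'),
    )
    .join('');
}
```

For example, `escapeOutsideCode` turns the prose fragment `Latency <100ms` into `Latency &lt;100ms`, while a backtick-wrapped `Promise<T>` in the same string passes through unchanged. Splitting on the code pattern first keeps the escaping step simple, because it never has to reason about whether a given bracket sits inside code.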

Files Modified

  • `website/scripts/lib/mdx-sanitizer.ts` - Core sanitizer module
  • `website/scripts/sanitize-mdx.ts` - CLI wrapper
  • `website/scripts/syncSkillDocs.ts` - Integration
  • `website/scripts/syncSkillSubpages.ts` - Integration
  • `website/scripts/lib/doc-generator.ts` - Integration
  • `website/package.json` - npm scripts

Patterns Detected

  1. Less-than before digit: `<100`, `<0.5ms`
  2. Comparison operators: `<=`, `>=`
  3. Empty brackets: `<>`
  4. Arrows: `<--`, `-->`
  5. Generic types: `Promise<T>`, `Array<string>`
  6. Space after less-than: `< value`
  7. Invalid pseudo-tags: `<link>`, `<tag>` (not valid HTML)
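
The first six categories above could plausibly be expressed as a small table of regular expressions. The expressions below are illustrative guesses, not the ones in `mdx-sanitizer.ts`, and `PatternRule`, `RULES`, and `detectPatterns` are hypothetical names. The pseudo-tag rule is left out because it needs a list of valid element names.

```typescript
interface PatternRule {
  name: string;
  regex: RegExp;
}

// Illustrative rules only; the real sanitizer's expressions may differ.
// None of them use the "g" flag, so RegExp.test keeps no state between calls.
const RULES: PatternRule[] = [
  { name: 'less-than-before-digit', regex: /<(?=\d)/ },
  { name: 'comparison-operator', regex: /<=|>=/ },
  { name: 'empty-brackets', regex: /<>/ },
  { name: 'arrow', regex: /<--|-->/ },
  { name: 'generic-type', regex: /\b[A-Z]\w*<[A-Za-z][\w\s,]*>/ },
  { name: 'space-after-less-than', regex: /<\s/ },
];

// Returns the names of every rule that matches somewhere in the text.
function detectPatterns(text: string): string[] {
  return RULES.filter((rule) => rule.regex.test(text)).map((rule) => rule.name);
}

console.log(detectPatterns('Responds in <100ms')); // ['less-than-before-digit']
console.log(detectPatterns('returns Map<K, V>')); // ['generic-type']
```

Keeping the rules in a data table rather than a chain of conditionals makes it easy to report which category triggered for a given line.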

Troubleshooting

Build Still Fails After Running Sanitizer

  1. Clear Docusaurus cache: `npm run clear`
  2. Re-run sanitizer: `npm run sanitize:mdx -- --fix`
  3. Rebuild: `npm run build`

False Positives

If valid JSX components are being escaped:
  • Ensure component names use PascalCase (e.g., `<MyComponent>`)
  • For lowercase tags, check that they are valid HTML5 elements
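
These two checks suggest the sanitizer decides what to keep based on the tag name. A minimal sketch of that decision follows, assuming PascalCase names are treated as components and lowercase names are looked up in an allowlist. `shouldPreserveTag` is a hypothetical name, and `HTML_ELEMENTS` is a deliberately tiny sample, not the list the sanitizer uses.

```typescript
// Hypothetical, deliberately incomplete allowlist for illustration.
const HTML_ELEMENTS = new Set([
  'a', 'br', 'code', 'details', 'div', 'em',
  'img', 'p', 'span', 'strong', 'summary',
]);

// Assumed rule: keep PascalCase component names and allowlisted HTML
// elements; anything else would be escaped as a pseudo-tag.
function shouldPreserveTag(tagName: string): boolean {
  const isPascalCase = /^[A-Z][A-Za-z0-9]*$/.test(tagName);
  return isPascalCase || HTML_ELEMENTS.has(tagName);
}

console.log(shouldPreserveTag('MyComponent')); // true
console.log(shouldPreserveTag('div')); // true
console.log(shouldPreserveTag('tag')); // false
console.log(shouldPreserveTag('myComponent')); // false
```

Under this assumption, a component written as `<myComponent>` would be escaped, which is why renaming it to PascalCase resolves the false positive.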

Manual Escaping

For edge cases, manually escape in source:
  • Use backticks for inline code: `<T>`
  • Use fenced code blocks for multi-line content
  • Use HTML entities: `&lt;` and `&gt;`
