indexion refactor: Codebase Refactoring
Detect and eliminate duplication at three levels — textual, structural, and conceptual —
using indexion's analysis commands, then verify SoT is enforced.
When to Use
- After adding a new abstraction (type, module, API layer)
- After introducing a new file format or I/O boundary
- When a fix required touching 3+ files for the same reason
- When a "guard" or "skip" was added to work around a structural problem
- When `ENOENT`, `opendir`, or similar filesystem errors appear from unexpected paths
- When extracting shared code across packages
- When cleaning up after a refactor (removing trivial wrapper functions)
- Periodic SoT health check on a codebase
Three Levels of Duplication
| Level | What it is | Tool | Example |
|---|---|---|---|
| Textual | Copy-pasted code blocks, identical functions | `plan refactor` | |
| Structural | Same logic structure with different names | `plan solid`, `plan unwrap` | Cross-package extraction candidates, trivial wrappers |
| Conceptual | Same domain concept implemented independently | `explore` + analysis | Three modules each deciding "is this file an archive?" |
Textual duplication is easy to find and fix. Conceptual duplication is the hardest and
most dangerous — it produces no copy-paste matches but means changing one concept requires
updating every scattered implementation.
Workflow
Phase 1: Clear textual duplication (`plan refactor`)
Start with high-confidence matches and work down.
```bash
# Step 1: Find 90%+ duplicates (high confidence)
indexion plan refactor --threshold=0.9 \
  --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' \
  cmd/indexion/

indexion plan refactor --threshold=0.9 \
  --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' \
  src/
```
**Read the output in three sections:**
| Section | What it finds | Action |
|---------|--------------|--------|
| Similar Files | Files with high overall similarity | Investigate for structural consolidation |
| Duplicate Code Blocks | Line-level identical code between files | Extract to `@common` or shared module |
| Function-Level Duplicates | Structurally similar functions (TF-IDF on bodies) | Unify into single SoT function |
**Same-file duplicates** (functions within one file at 90%+) are the highest-value
targets — easiest to fix, clearest wins. Example: `get_global_data_dir` and
`get_global_cache_dir` share 95% structure, extracted into `resolve_os_dir`.
```bash
# Step 2: Use grep to trace references before consolidating
indexion grep "TypeIdent:TfidfEmbeddingProvider" src/
indexion grep --semantic=name:is_whitespace src/

# Step 3: Fix, then re-run to confirm duplicates are gone
indexion plan refactor --threshold=0.9 --include='*.mbt' ...

# Step 4: Lower threshold and iterate
indexion plan refactor --threshold=0.85 --include='*.mbt' ...
```
**`plan refactor` options:**
| Option | Default | Description |
|--------|---------|-------------|
| `--threshold=FLOAT` | 0.7 | Minimum similarity threshold |
| `--strategy=NAME` | hybrid | Similarity: hybrid, tfidf, bm25, jsd, ncd |
| `--fdr=FLOAT` | 0 | FDR correction (0=disabled) |
| `--style=STYLE` | raw | Output: raw, structured |
| `--format=FORMAT` | md | Output: md, json, text, github-issue |
| `--name=NAME` | -- | Project name (for structured style) |
| `--include=PATTERN` | -- | Include pattern (repeatable) |
| `--exclude=PATTERN` | -- | Exclude pattern (repeatable) |
| `-o, --output=FILE` | stdout | Output file path |
| `--specs-dir=DIR` | kgfs | KGF specs directory |
**What remains after cleanup (stop signals):**
- **Platform stubs** (`native.mbt` / `stub.mbt`): intentional platform branching
- **Type method similarity** (`to_string` on different types): different types, same pattern
- **CLI command boilerplate** (`command()` functions): @argparse API pattern, not duplication
- **Semantic-but-different** functions (`is_disqualifying_keyword` vs `is_skip_token`): different purpose
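Steps 3 and 4 lend themselves to a small loop. The following is a minimal sketch of a threshold sweep that writes one report per pass. The function name `sweep_thresholds` and the `INDEXION` and `OUT_DIR` variables are assumptions made for illustration and are not part of indexion; the flags themselves are taken from the options table above.

```bash
#!/usr/bin/env bash
# Sketch only: run plan refactor at each given threshold, one report per pass.
# INDEXION, OUT_DIR, and sweep_thresholds are illustrative names, not indexion features.
INDEXION="${INDEXION:-indexion}"          # assumption: indexion is on PATH
OUT_DIR="${OUT_DIR:-refactor-reports}"    # hypothetical report directory

sweep_thresholds() {
  local target="$1"
  shift
  local t
  mkdir -p "$OUT_DIR" || return 1
  for t in "$@"; do
    # Separate files per threshold make successive passes easy to diff.
    "$INDEXION" plan refactor --threshold="$t" \
      --include='*.mbt' --exclude='*_wbtest.mbt' \
      --output="$OUT_DIR/refactor-$t.md" \
      "$target" || return 1
  done
}
```

A call such as `sweep_thresholds src/ 0.9 0.85` produces the reports in one go. The workflow still applies: work through the strictest report first, fix, and re-run before moving to a looser threshold.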
Phase 2: Extract cross-package shared code (`plan solid`)
After cleaning within each directory, find code that should be shared across packages.
```bash
# Find overlap between two packages
indexion plan solid --from=src/a,src/b

# Specify extraction target
indexion plan solid --from=src/a,src/b --to=src/common

# Use tree edit distance for precise function-level matching
indexion plan solid --from=src/a,src/b --strategy=apted

# Higher threshold for stricter matching
indexion plan solid --from=src/a,src/b --threshold=0.95

# Filter files
indexion plan solid --from=src/a,src/b --include='*.mbt' --exclude='*_test.mbt'
```
`plan solid` differs from `plan refactor`:
| | `plan refactor` | `plan solid` |
|---|-----------------|-------------|
| Scope | Internal duplication within directories | Cross-directory overlap |
| Goal | Consolidate within a codebase | Extract shared code into a new package |
| Input | `<path>` | `--from=dirA,dirB` |
**`plan solid` options:**
| Option | Default | Description |
|--------|---------|-------------|
| `--from=DIRS` | (required) | Source directories (comma-separated or repeatable) |
| `--to=DIR` | -- | Target directory for extraction |
| `--rules=FILE` | -- | Rules file (.solidrc) |
| `--rule=RULE` | -- | Inline rule (repeatable) |
| `--threshold=FLOAT` | 0.9 | Minimum similarity threshold |
| `--strategy=NAME` | tfidf | Similarity: tfidf, apted, tsed |
| `--include=PATTERN` | -- | Include pattern (repeatable) |
| `--exclude=PATTERN` | -- | Exclude pattern (repeatable) |
| `--format=FORMAT` | md | Output: md, json, github-issue |
| `-o, --output=FILE` | stdout | Output file path |
| `--specs-dir=DIR` | kgfs | KGF specs directory |
**Workflow:**
1. Run `plan refactor` on each directory individually first to clean internal duplication
2. Run `plan solid --from=dirA,dirB` to find cross-directory extraction candidates
3. Extract shared code following the plan's recommendations
4. Use `indexion grep "TypeIdent:SharedType"` to verify all references are updated
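The first two workflow steps can be strung together in a script. This is a rough sketch under stated assumptions: `join_by_comma` and `solid_workflow` are hypothetical helper names, the `INDEXION` variable is an illustrative override, and only flags listed in the tables above are used. Steps 3 and 4 remain manual.

```bash
#!/usr/bin/env bash
# Sketch only: clean each directory first, then look for cross-directory overlap.
# join_by_comma and solid_workflow are illustrative helpers, not indexion features.
INDEXION="${INDEXION:-indexion}"   # assumption: indexion is on PATH

# Join arguments with commas: a b c -> a,b,c (the comma-separated form --from accepts).
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

solid_workflow() {
  local target="$1"
  shift
  local dir
  # Step 1: report internal duplication, one directory at a time.
  for dir in "$@"; do
    "$INDEXION" plan refactor --threshold=0.9 "$dir" || return 1
  done
  # Step 2: report cross-directory extraction candidates.
  "$INDEXION" plan solid --from="$(join_by_comma "$@")" --to="$target"
}
```

For example, `solid_workflow src/common src/a src/b` reports on `src/a` and `src/b` individually before comparing them. In practice the internal findings would be fixed before the `plan solid` report is acted on.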
Phase 3: Remove unnecessary wrappers (`plan unwrap`)
After consolidation, clean up trivial delegation functions that add indirection without value.
```bash
# Step 1: Quick check
indexion grep --semantic=proxy src/

# Step 2: Detailed report
indexion plan unwrap --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' src/

# Step 3: Preview changes (safe: no files modified)
indexion plan unwrap --dry-run --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' src/

# Step 4: Apply fixes
indexion plan unwrap --fix --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' src/

# Step 5: Run tests
moon test --target native
```
**What gets detected:** Functions whose body is a single function call with
all arguments forwarded as simple identifiers, with no control flow and no transforms.
```moonbit
// Detected (default): trivial delegation
fn matches_pattern(text : String, pat : String) -> Bool {
  @glob.glob_match(text, pat)
}

// Excluded by default (use --all to include)
fn length(self : MyList) -> Int {
  self.items.length() // self-delegation (encapsulation)
}

fn emit(value : String) -> Action {
  Emit(value) // bare constructor
}
```
**`plan unwrap` modes:**
| Mode | Flag | Description |
|---|---|---|
| Report | (default) | List wrappers found |
| Preview | `--dry-run` | Show all edits without modifying files |
| Fix | `--fix` | Apply edits to files |
**`plan unwrap` options:**
| Option | Default | Description |
|---|---|---|
| `--dry-run` | -- | Preview edits |
| `--fix` | -- | Apply edits |
| `--all` | -- | Include self-delegation and bare constructor wrappers |
| | -- | Include self-delegation wrappers |
| | -- | Include bare constructor wrappers |
| `--include=PATTERN` | -- | Include pattern (repeatable) |
| `--exclude=PATTERN` | -- | Exclude pattern (repeatable) |
| `--format=FORMAT` | md | Output: md, json, text |
| `-o, --output=FILE` | stdout | Output file path |
| `--specs-dir=DIR` | kgfs | KGF specs directory |
**Review before removing:**
- Platform wrappers (FFI, `@osenv_path`) are abstraction layers, not accidental indirection
- Public API wrappers used by external packages: removing them is a breaking change
- Always `--dry-run` first
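The preview-then-apply order above can be enforced with a small wrapper. The sketch below is one possible shape, not an indexion feature: `unwrap_safely` is a hypothetical function name, `INDEXION` and `MOON` are illustrative override variables, and the explicit "yes" prompt is an added assumption meant to leave room for reviewing public API and FFI wrappers.

```bash
#!/usr/bin/env bash
# Sketch only: preview, confirm, apply, then test.
# unwrap_safely, INDEXION, and MOON are illustrative names, not indexion features.
INDEXION="${INDEXION:-indexion}"   # assumption: indexion is on PATH
MOON="${MOON:-moon}"               # assumption: moon is on PATH

unwrap_safely() {
  local target="$1"
  local answer

  # Preview first: --dry-run shows the edits without modifying files.
  "$INDEXION" plan unwrap --dry-run \
    --include='*.mbt' --exclude='*_wbtest.mbt' "$target" || return 1

  # Require an explicit "yes" so the preview is actually reviewed.
  printf 'Apply these edits? (yes/no) ' >&2
  read -r answer || answer=""
  if [ "$answer" != "yes" ]; then
    echo "Aborted; no files modified."
    return 0
  fi

  "$INDEXION" plan unwrap --fix \
    --include='*.mbt' --exclude='*_wbtest.mbt' "$target" || return 1
  "$MOON" test --target native
}
```

Running `unwrap_safely src/` and answering anything other than `yes` leaves the tree untouched.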
Phase 4: Detect concept-level duplication (`explore` + analysis)
This is the hardest level. Textual and structural tools won't find it because the
code is different, but the concept is the same.
```bash
# Find which files share vocabulary (= work in the same concept domain)
indexion explore --threshold=0.4 \
  --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' \
  src/ cmd/
```
Files at 40-60% similarity without structural duplication are **concept neighbors** —
they use the same terms because they deal with the same domain.
For each high-similarity pair, ask: **"What concept do they share, and who owns it?"**
```bash
# Inspect shared vocabulary with tree structure comparison
indexion explore file_a.mbt file_b.mbt --threshold=0 --strategy=apted
```
**Common patterns of concept leakage:**
| Symptom | Concept leaked | Fix |
|---------|---------------|-----|
| Both files call `is_X(spec)` then `Y::from_spec(spec)` | "Determine if X and configure Y" | Extract `try_do_X(path, spec)` into the module that owns X |
| Both files `@fs.read_file_to_string(path)` when content is already loaded | "Read file content" | Pass content as argument, don't re-read |
| Both files `parent_dir(path)` then `@fs.read_dir(dir)` | "List sibling files" | Centralize directory walking into pipeline |
| Multiple `if is_virtual_path(x) { skip }` guards | "Real vs virtual path" | Make the type system prevent virtual paths from reaching here |
| Both files `buf.write_string("\n"); buf.write_string(x)` | "Join text entries" | Extract `join_text_entries()` into the owning module |
Phase 5: Consolidate into SoT
The module that defines the concept should be the only one that implements the logic.
Rules:
- **One concept, one module, one function.** If "extract text from archive" appears in `vfs.mbt`, `discover.mbt`, and `args.mbt`, it belongs in `vfs.mbt` only.
- **Callers receive results, not ingredients.** Don't export `is_archive_spec` + `ArchiveSpec::from_spec` + `expand_archive` separately. Export `try_extract_archive_text(path, spec) -> String?`.
- **Guards are symptoms, not fixes.** `if is_virtual_path(x) { skip }` means virtual paths shouldn't reach here at all. Fix the source, not the sink.
- **Re-reading from disk what's already in memory is a concept leak.** If `SupportedFile.content` holds the text, no downstream code should call `@fs.read_file_to_string(file.path)`.
Phase 6: Verify
```bash
# Confirm textual duplication is gone
indexion plan refactor --threshold=0.9 \
  --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' \
  src/ cmd/indexion/

# Confirm concept similarity is reduced
indexion explore file_a.mbt file_b.mbt --threshold=0

# Confirm wrappers are cleaned up
indexion plan unwrap --include='*.mbt' --exclude='*_wbtest.mbt' \
  --exclude='moon.pkg' --exclude='pkg.generated' src/

# Run tests
moon test --target native
```
After SoT consolidation:
- Textual similarity between the concept owner and its callers drops
- Callers become shorter (one API call instead of multi-step logic)
- The concept owner may grow, but it's the **single place to change**
Phase 7: Prove non-recurrence with tests
Write a test that structurally prevents the old pattern from recurring:
```moonbit
test "SoT: SupportedFile.path is always a real filesystem path" {
  // Create an archive, run load_supported_file_info
  // Assert: no path contains "!/"
  // Assert: every path passes @fs.path_exists
}
```
The test doesn't check behavior; it checks the SoT invariant.
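A similar invariant can also be watched from outside the test suite. The sketch below uses plain `grep`, not any indexion feature, to fail when a call that should live in one owning file shows up elsewhere. `check_single_owner` is a hypothetical helper, and the owner is matched as a substring of the file path, which is a simplification.

```bash
#!/usr/bin/env bash
# Sketch only: fail when a fixed string appears in any .mbt file other than its owner.
# check_single_owner is an illustrative helper built on plain grep.
check_single_owner() {
  local pattern="$1" owner="$2" root="$3"
  local offenders
  # -r: recurse, -l: print file names only, -F: treat the pattern as a fixed string.
  # The second grep drops the owning file (matched by path substring).
  offenders="$(grep -rlF --include='*.mbt' -- "$pattern" "$root" | grep -vF -- "$owner" || true)"
  if [ -n "$offenders" ]; then
    echo "SoT violation: '$pattern' used outside $owner:" >&2
    echo "$offenders" >&2
    return 1
  fi
  return 0
}
```

A call might look like `check_single_owner 'is_archive_spec' 'vfs.mbt' src/`, where the choice of `vfs.mbt` as owner follows the Phase 5 example. A text match is coarser than the MoonBit test above, so it complements the test without replacing it.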
Red Flags
"I need to add a guard here"
If you're adding `if is_special_case(x) { skip }` to a function that shouldn't receive
special cases, the problem is upstream. The function's caller should never pass that value.
"It works but prints errors to stderr"
Stderr messages from the C runtime (`opendir: No such file or directory`) mean invalid data
reached a system call. `catch` absorbs the error, but `perror()` already printed.
The only fix is preventing invalid data from reaching the call.
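The effect is easy to reproduce in a shell, as a rough analogue of the C runtime case: tolerating a failure does not retract what was already written to stderr. The function names `list_tolerant` and `real_dirs` are hypothetical and exist only for this illustration.

```bash
#!/usr/bin/env bash
# Sketch only: a shell analogue of "the error is absorbed, but stderr already printed".

# Sink-side tolerance: the exit status is swallowed, yet ls has already
# written its error message to stderr for a missing path.
list_tolerant() {
  ls "$1" || true
}

# Source-side fix: emit only paths that are real directories, so the
# listing step never receives an invalid one.
real_dirs() {
  local p
  for p in "$@"; do
    if [ -d "$p" ]; then
      printf '%s\n' "$p"
    fi
  done
}
```

Feeding `list_tolerant` only the output of `real_dirs` keeps stderr quiet, because the invalid path is dropped where the list is produced and no error has to be hidden where it is consumed.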
"I'll fix it in each command separately"
If the same fix is needed in explore, search, grep, reconcile, plan documentation...
the fix belongs in the shared pipeline, not in each command.
"The similarity is just shared vocabulary, not real duplication"
40-60% TF-IDF similarity between modules that aren't supposed to share concepts is a
warning. The vocabulary match IS the signal.
Quick Reference: Which Command When
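| Question | Command |
|---|---|
| "What files are similar?" | `indexion explore` |
| "What exactly is duplicated?" | `indexion plan refactor` |
| "What code overlaps between packages A and B?" | `indexion plan solid --from=A,B` |
| "Which functions are trivial wrappers?" | `indexion plan unwrap` |
| "What concept do these files share?" | `indexion explore file_a.mbt file_b.mbt --threshold=0 --strategy=apted` |
| "Has the duplication been fixed?" | Re-run the same command with the same threshold |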