indexion-explore
indexion explore
Analyze file similarity across a directory to find duplicates and related code.
When to Use
- User asks to find similar or duplicate files
- User wants to understand code overlap before refactoring
- User asks "what files are related to X?"
- User wants to detect copy-paste code
- Quick scan before detailed refactoring: use as a first pass, then follow up with `indexion plan refactor` for actionable detail
Usage
```bash
# Basic similarity matrix (default: tfidf strategy)
indexion explore <path>

# List format with threshold (most useful for finding duplicates)
indexion explore --format=list --threshold=0.7 <path>

# Cluster similar files together
indexion explore --format=cluster --threshold=0.6 <path>

# JSON output for further processing
indexion explore --format=json --threshold=0.5 <path>

# Filter by extension
indexion explore --ext=.mbt --ext=.ts <path>

# Include/exclude patterns
indexion explore --include='*.ts' --exclude='*_test.ts' src/

# Filter out config noise
indexion explore --format=list --threshold=0.7 \
  --include='*.mbt' --exclude='moon.pkg' cmd/indexion/

# Function-level tree edit distance (more precise, slower)
indexion explore --strategy=apted --format=list <path>
indexion explore --strategy=tsed --format=list <path>

# Hybrid strategy (auto-selects TF-IDF or APTED based on dataset size)
indexion explore --strategy=hybrid --format=list <path>
```
Strategies
| Strategy | Description | Speed |
|---|---|---|
| `tfidf` | TF-IDF token similarity | Fast |
| `hybrid` | Dynamic TF-IDF + APTED, auto-selects based on dataset size | Adaptive |
| | Normalized Compression Distance | Fast |
| `apted` | All-Path Tree Edit Distance (function-level) | Slow |
| `tsed` | Tree Structure Edit Distance (function-level) | Slow |
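
To give a feel for what Normalized Compression Distance measures, here is a rough sketch that approximates it with gzip, using the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. This is an illustration of the general technique, not indexion's implementation; the helper names `csize` and `ncd` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of Normalized Compression Distance with gzip as the compressor.
# Illustrative only: indexion's own NCD strategy may work differently.

# csize FILE...: compressed size in bytes of the concatenated inputs
csize() {
  cat "$@" | gzip -c | wc -c | tr -d ' '
}

# ncd FILE_A FILE_B: prints a distance, near 0 for near-identical files
# and near 1 for unrelated ones
ncd() {
  local ca cb cab min max
  ca=$(csize "$1")
  cb=$(csize "$2")
  cab=$(csize "$1" "$2")
  if [ "$ca" -lt "$cb" ]; then min=$ca; max=$cb; else min=$cb; max=$ca; fi
  awk -v cab="$cab" -v min="$min" -v max="$max" \
    'BEGIN { printf "%.3f\n", (cab - min) / max }'
}
```

Calling `ncd a.mbt b.mbt` on two copies of the same file should print a value close to 0, because compressing the concatenation costs little more than compressing one copy.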
Output Formats
- `matrix`: Full similarity matrix (default, good for small sets)
- `list`: Sorted pairs above threshold (best for finding duplicates)
- `cluster`: Groups of similar files
- `json`: Machine-readable output
Relationship to Other Commands
| Task | Use |
|---|---|
| "What files are similar?" | `indexion explore` |
| "Find nested for loops" | |
| "Find functions named sort" | |
| "What exactly is duplicated?" | `indexion plan refactor` |
| "Find code similar to a description" | |
Workflow: explore → plan refactor
- Run `indexion explore --format=list --threshold=0.7 <path>` for a quick scan
- If high-similarity pairs exist, run `indexion plan refactor --threshold=0.9 <path>` for details
- Fix duplicates, then re-run both to verify
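
The steps in this workflow can be chained in a small wrapper. The sketch below uses only the flags shown in this document; the function name `scan_then_plan` is hypothetical, and it assumes that `--format=list` prints nothing when no pair meets the threshold, which may not match the real output.

```bash
#!/usr/bin/env bash
# Sketch of the explore-then-plan-refactor workflow.
# Assumption: `indexion explore --format=list` prints nothing when no pair
# meets the threshold. Adjust the check if the real output includes headers.

# scan_then_plan TARGET: quick scan first, detailed plan only if pairs exist
scan_then_plan() {
  local target="$1" pairs
  pairs=$(indexion explore --format=list --threshold=0.7 "$target") || return 1
  if [ -z "$pairs" ]; then
    echo "No pairs at or above 0.7 in $target"
    return 0
  fi
  printf '%s\n' "$pairs"
  indexion plan refactor --threshold=0.9 "$target"
}
```

For example, run `scan_then_plan cmd/indexion/`, fix the reported duplicates, then run it again to verify.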
Dogfooding Lessons
- moon.pkg files inflate similarity scores (they all look alike); exclude them with `--exclude='*moon.pkg*'` for meaningful results
- 96%+ similarity between CLI files usually means duplicated utility functions
- 85-95% similarity is often structural (same CLI patterns) — not always actionable
- types.mbt files showing 100% similarity is normal — type definition files share structural patterns (pub struct + getters) that inflate TF-IDF scores
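
These rules of thumb can be encoded as a small triage helper when reading through a long list of pairs. This is a sketch only: the function name `triage` is hypothetical, scores are taken as percentages, and putting the uncovered 95-96% gap into the structural band is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: map a similarity percentage to the rule of thumb from these notes.
# Assumption: scores between 95 and 96 are treated as structural.
# Remember that 100% between types.mbt files is expected and usually benign.

# triage SCORE: prints a one-line interpretation of the score
triage() {
  awk -v s="$1" 'BEGIN {
    if (s >= 96)      print "likely duplicated utility functions"
    else if (s >= 85) print "often structural, not always actionable"
    else              print "below the ranges discussed"
  }'
}
```

For example, `triage 97` reports likely duplicated utility functions, while `triage 90` reports a probably structural match.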