index-knowledge

Generate hierarchical AGENTS.md files. Root + complexity-scored subdirectories.

Usage

--create-new   # Read existing → remove all → regenerate from scratch
--max-depth=2  # Limit directory depth (default: 5)
Default: Update mode (modify existing + create new where warranted)

Workflow (High-Level)

  1. Discovery + Analysis (concurrent)
    • Launch parallel explore agents (multiple Task calls in one message)
    • Main session: bash structure + LSP codemap + read existing AGENTS.md
  2. Score & Decide - Determine AGENTS.md locations from merged findings
  3. Generate - Root first, then subdirs in parallel
  4. Review - Deduplicate, trim, validate
<critical> **TodoWrite ALL phases. Mark in_progress → completed in real-time.**
TodoWrite([
  { id: "discovery", content: "Fire explore agents + LSP codemap + read existing", status: "pending", priority: "high" },
  { id: "scoring", content: "Score directories, determine locations", status: "pending", priority: "high" },
  { id: "generate", content: "Generate AGENTS.md files (root + subdirs)", status: "pending", priority: "high" },
  { id: "review", content: "Deduplicate, validate, trim", status: "pending", priority: "medium" }
])
</critical>

Phase 1: Discovery + Analysis (Concurrent)

Mark "discovery" as in_progress.

Launch Parallel Explore Agents

Multiple Task calls in a single message execute in parallel. Results return directly.
// All Task calls in ONE message = parallel execution

Task(
  description="project structure",
  subagent_type="explore",
  prompt="Project structure: PREDICT standard patterns for detected language → REPORT deviations only"
)

Task(
  description="entry points",
  subagent_type="explore",
  prompt="Entry points: FIND main files → REPORT non-standard organization"
)

Task(
  description="conventions",
  subagent_type="explore",
  prompt="Conventions: FIND config files (.eslintrc, pyproject.toml, .editorconfig) → REPORT project-specific rules"
)

Task(
  description="anti-patterns",
  subagent_type="explore",
  prompt="Anti-patterns: FIND 'DO NOT', 'NEVER', 'ALWAYS', 'DEPRECATED' comments → LIST forbidden patterns"
)

Task(
  description="build/ci",
  subagent_type="explore",
  prompt="Build/CI: FIND .github/workflows, Makefile → REPORT non-standard patterns"
)

Task(
  description="test patterns",
  subagent_type="explore",
  prompt="Test patterns: FIND test configs, test structure → REPORT unique conventions"
)
<dynamic-agents> **DYNAMIC AGENT SPAWNING**: After bash analysis, spawn ADDITIONAL explore agents based on project scale:
| Factor | Threshold | Additional Agents |
|--------|-----------|-------------------|
| Total files | >100 | +1 per 100 files |
| Total lines | >10k | +1 per 10k lines |
| Directory depth | ≥4 | +2 for deep exploration |
| Large files (>500 lines) | >10 files | +1 for complexity hotspots |
| Monorepo | detected | +1 per package/workspace |
| Multiple languages | >1 | +1 per language |

# Measure project scale first
total_files=$(find . -type f -not -path '*/node_modules/*' -not -path '*/.git/*' | wc -l)
total_lines=$(find . -type f \( -name "*.ts" -o -name "*.py" -o -name "*.go" \) -not -path '*/node_modules/*' -exec wc -l {} + 2>/dev/null | tail -1 | awk '{print $1}')
# Skip wc's "total" summary line so it is not counted as a large file
large_files=$(find . -type f \( -name "*.ts" -o -name "*.py" \) -not -path '*/node_modules/*' -exec wc -l {} + 2>/dev/null | awk '$2 != "total" && $1 > 500 {count++} END {print count+0}')
max_depth=$(find . -type d -not -path '*/node_modules/*' -not -path '*/.git/*' | awk -F/ '{print NF}' | sort -rn | head -1)

Example spawning (all in ONE message for parallel execution):
// 500 files, 50k lines, depth 6, 15 large files → spawn additional agents
Task(
  description="large files",
  subagent_type="explore",
  prompt="Large file analysis: FIND files >500 lines, REPORT complexity hotspots"
)

Task(
  description="deep modules",
  subagent_type="explore",
  prompt="Deep modules at depth 4+: FIND hidden patterns, internal conventions"
)

Task(
  description="cross-cutting",
  subagent_type="explore",
  prompt="Cross-cutting concerns: FIND shared utilities across directories"
)
// ... more based on calculation
</dynamic-agents>
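
The threshold table above can be turned into a small calculation. The following is a minimal sketch, not part of the original command: the `ProjectScale` type and `additional_agents` function are hypothetical names, and reading "+1 per 100 files" as floor division is an assumption, since the table does not say how partial hundreds are rounded.

```python
from dataclasses import dataclass


@dataclass
class ProjectScale:
    """Measurements gathered by the bash scale commands (hypothetical container)."""
    total_files: int
    total_lines: int
    max_depth: int
    large_files: int      # files with more than 500 lines
    packages: int = 0     # monorepo packages/workspaces; 0 means no monorepo detected
    languages: int = 1


def additional_agents(scale: ProjectScale) -> int:
    """Count extra explore agents per the threshold table (floor division is assumed)."""
    extra = 0
    if scale.total_files > 100:
        extra += scale.total_files // 100       # +1 per 100 files
    if scale.total_lines > 10_000:
        extra += scale.total_lines // 10_000    # +1 per 10k lines
    if scale.max_depth >= 4:
        extra += 2                              # +2 for deep exploration
    if scale.large_files > 10:
        extra += 1                              # +1 for complexity hotspots
    extra += scale.packages                     # +1 per package/workspace
    if scale.languages > 1:
        extra += scale.languages                # +1 per language
    return extra


# The example project from the text: 500 files, 50k lines, depth 6, 15 large files.
example = ProjectScale(total_files=500, total_lines=50_000, max_depth=6, large_files=15)
print(additional_agents(example))  # 5 + 5 + 2 + 1 = 13
```

Under these assumptions the example project gets 13 agents on top of the six baseline explore agents; a different rounding rule would change the count.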

Main Session: Concurrent Analysis

While Task agents execute, main session does:

1. Bash Structural Analysis

# Directory depth + file counts
find . -type d -not -path '*/.*' -not -path '*/node_modules/*' -not -path '*/venv/*' -not -path '*/dist/*' -not -path '*/build/*' | awk -F/ '{print NF-1}' | sort -n | uniq -c

# Files per directory (top 30)
find . -type f -not -path '*/.*' -not -path '*/node_modules/*' | sed 's|/[^/]*$||' | sort | uniq -c | sort -rn | head -30

# Code concentration by extension
find . -type f \( -name "*.py" -o -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.go" -o -name "*.rs" \) -not -path '*/node_modules/*' | sed 's|/[^/]*$||' | sort | uniq -c | sort -rn | head -20

# Existing AGENTS.md / CLAUDE.md
find . -type f \( -name "AGENTS.md" -o -name "CLAUDE.md" \) -not -path '*/node_modules/*' 2>/dev/null
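
For environments where `find` is unavailable, the "files per directory" count can be approximated in Python. This is a rough sketch under stated assumptions: `files_per_directory` and `EXCLUDED` are hypothetical names, and the exclusion set borrows `venv`, `dist`, and `build` from the directory-depth command even though the files-per-directory command only skips hidden paths and `node_modules`.

```python
from collections import Counter
from pathlib import Path

# Assumed exclusion list; adjust to match the project.
EXCLUDED = {"node_modules", "venv", "dist", "build"}


def files_per_directory(root: str, top: int = 30) -> list[tuple[str, int]]:
    """Return the `top` directories with the most files, skipping hidden and excluded paths."""
    base = Path(root)
    counts = Counter()
    for path in base.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(base).parts
        # Skip anything under a hidden or excluded directory, and hidden files themselves.
        if any(part.startswith(".") or part in EXCLUDED for part in parts):
            continue
        # Key by the containing directory, relative to the root ("." for top-level files).
        counts[str(Path(*parts[:-1]))] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    for directory, count in files_per_directory("."):
        print(f"{count:6d} {directory}")
```

The result is a list of `(directory, file_count)` pairs sorted by count, which feeds the "File count" factor in the scoring phase.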

2. Read Existing AGENTS.md

For each existing file found:
  Read(filePath=file)
  Extract: key insights, conventions, anti-patterns
  Store in EXISTING_AGENTS map
If --create-new: read all existing files first (preserve context) → then delete all → regenerate.

3. LSP Codemap (if available)

lsp_servers()  # Check availability

# Entry points (parallel)
lsp_document_symbols(filePath="src/index.ts")
lsp_document_symbols(filePath="main.py")

# Key symbols (parallel)
lsp_workspace_symbols(filePath=".", query="class")
lsp_workspace_symbols(filePath=".", query="interface")
lsp_workspace_symbols(filePath=".", query="function")

# Centrality for top exports
lsp_find_references(filePath="...", line=X, character=Y)

**LSP Fallback**: If unavailable, rely on explore agents + AST-grep.

**Merge: bash + LSP + existing + Task agent results. Mark "discovery" as completed.**

---

Phase 2: Scoring & Location Decision

Mark "scoring" as in_progress.

Scoring Matrix

| Factor | Weight | High Threshold | Source |
|--------|--------|----------------|--------|
| File count | 3x | >20 | bash |
| Subdir count | 2x | >5 | bash |
| Code ratio | 2x | >70% | bash |
| Unique patterns | 1x | Has own config | explore |
| Module boundary | 2x | Has index.ts/__init__.py | bash |
| Symbol density | 2x | >30 symbols | LSP |
| Export count | 2x | >10 exports | LSP |
| Reference centrality | 3x | >20 refs | LSP |

Decision Rules

| Score | Action |
|-------|--------|
| Root (.) | ALWAYS create |
| >15 | Create AGENTS.md |
| 8-15 | Create if distinct domain |
| <8 | Skip (parent covers) |

Output

AGENTS_LOCATIONS = [
  { path: ".", type: "root" },
  { path: "src/hooks", score: 18, reason: "high complexity" },
  { path: "src/api", score: 12, reason: "distinct domain" }
]
Mark "scoring" as completed.
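
The decision rules and the output list above can be sketched as two small functions. This is an illustration only: `decide` and `build_locations` are hypothetical names, the `distinct_domain` flag is assumed to come from the explore agents' findings, and the sketch takes scores as given because the text does not specify how the weighted factors combine into a score.

```python
def decide(path: str, score: int, distinct_domain: bool = False) -> str:
    """Apply the decision rules: root always, >15 create, 8-15 only if distinct, <8 skip."""
    if path == ".":
        return "create"
    if score > 15:
        return "create"
    if score >= 8:
        return "create" if distinct_domain else "skip"
    return "skip"


def build_locations(candidates: list) -> list:
    """Build an AGENTS_LOCATIONS-style list from scored candidate directories."""
    locations = [{"path": ".", "type": "root"}]
    for candidate in candidates:
        path, score = candidate["path"], candidate["score"]
        if path == ".":
            continue  # the root entry is always present and added above
        if decide(path, score, candidate.get("distinct_domain", False)) == "create":
            reason = "high complexity" if score > 15 else "distinct domain"
            locations.append({"path": path, "score": score, "reason": reason})
    return locations


candidates = [
    {"path": "src/hooks", "score": 18},
    {"path": "src/api", "score": 12, "distinct_domain": True},
    {"path": "src/utils", "score": 5},
]
for entry in build_locations(candidates):
    print(entry)
```

With these inputs the sketch reproduces the three entries shown in the output example, and `src/utils` is skipped because its parent covers it.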

Phase 3: Generate AGENTS.md

Mark "generate" as in_progress.

Root AGENTS.md (Full Treatment)

PROJECT KNOWLEDGE BASE

Generated: {TIMESTAMP}
Commit: {SHORT_SHA}
Branch: {BRANCH}

OVERVIEW

{1-2 sentences: what + core stack}

STRUCTURE

```
{root}/
├── {dir}/    # {non-obvious purpose only}
└── {entry}
```

WHERE TO LOOK

| Task | Location | Notes |

CODE MAP

{From LSP - skip if unavailable or project <10 files}
| Symbol | Type | Location | Refs | Role |

CONVENTIONS

{ONLY deviations from standard}

ANTI-PATTERNS (THIS PROJECT)

{Explicitly forbidden here}

UNIQUE STYLES

{Project-specific}

COMMANDS

```bash
{dev/test/build}
```

NOTES

{Gotchas}

**Quality gates**: 50-150 lines, no generic advice, no obvious info.

Subdirectory AGENTS.md (Parallel)

Launch general agents for each location in ONE message (parallel execution):
// All in single message = parallel
Task(
  description="AGENTS.md for src/hooks",
  subagent_type="general",
  prompt="Generate AGENTS.md for: src/hooks
    - Reason: high complexity
    - 30-80 lines max
    - NEVER repeat parent content
    - Sections: OVERVIEW (1 line), STRUCTURE (if >5 subdirs), WHERE TO LOOK, CONVENTIONS (if different), ANTI-PATTERNS
    - Write directly to src/hooks/AGENTS.md"
)

Task(
  description="AGENTS.md for src/api",
  subagent_type="general",
  prompt="Generate AGENTS.md for: src/api
    - Reason: distinct domain
    - 30-80 lines max
    - NEVER repeat parent content
    - Sections: OVERVIEW (1 line), STRUCTURE (if >5 subdirs), WHERE TO LOOK, CONVENTIONS (if different), ANTI-PATTERNS
    - Write directly to src/api/AGENTS.md"
)
// ... one Task per AGENTS_LOCATIONS entry
Results return directly. Mark "generate" as completed.
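
Since every subdirectory prompt follows the same shape, the Task arguments can be derived from the locations list. The sketch below only builds the argument dictionaries; `subdir_prompt` and `task_specs` are hypothetical helper names, and actually dispatching the Task calls in one message is left to the agent runtime.

```python
def subdir_prompt(path: str, reason: str) -> str:
    """Assemble the subdirectory prompt text used in the Task examples above."""
    return "\n".join([
        f"Generate AGENTS.md for: {path}",
        f"- Reason: {reason}",
        "- 30-80 lines max",
        "- NEVER repeat parent content",
        "- Sections: OVERVIEW (1 line), STRUCTURE (if >5 subdirs), WHERE TO LOOK, "
        "CONVENTIONS (if different), ANTI-PATTERNS",
        f"- Write directly to {path}/AGENTS.md",
    ])


def task_specs(locations: list) -> list:
    """Return one Task argument dict per non-root location."""
    return [
        {
            "description": f"AGENTS.md for {loc['path']}",
            "subagent_type": "general",
            "prompt": subdir_prompt(loc["path"], loc["reason"]),
        }
        for loc in locations
        if loc.get("type") != "root"  # the root file is generated first, separately
    ]


locations = [
    {"path": ".", "type": "root"},
    {"path": "src/hooks", "score": 18, "reason": "high complexity"},
    {"path": "src/api", "score": 12, "reason": "distinct domain"},
]
for spec in task_specs(locations):
    print(spec["description"])
```

Generating the prompts from one template keeps the size limit and the "never repeat parent" rule identical across all subdirectory agents.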

Phase 4: Review & Deduplicate

Mark "review" as in_progress.
For each generated file:
  • Remove generic advice
  • Remove parent duplicates
  • Trim to size limits
  • Verify telegraphic style
Mark "review" as completed.
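
The "trim to size limits" check can be automated with a line count against the limits stated earlier (50-150 lines for the root file, 30-80 for subdirectory files). This is a minimal sketch; `line_budget_status` is a hypothetical name, and it only reports whether a file is in range, since deciding which lines to cut still needs judgment.

```python
def line_budget_status(text: str, is_root: bool) -> str:
    """Report whether an AGENTS.md body fits the line limits given in this document."""
    low, high = (50, 150) if is_root else (30, 80)
    count = len(text.splitlines())
    if count > high:
        return f"too long: {count} lines (limit {high}), trim {count - high}"
    if count < low:
        return f"short: {count} lines (expected at least {low})"
    return f"ok: {count} lines"


sample = "\n".join(f"line {i}" for i in range(90))
print(line_budget_status(sample, is_root=True))   # within the root range
print(line_budget_status(sample, is_root=False))  # over the subdirectory limit
```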

Final Report

=== index-knowledge Complete ===

Mode: {update | create-new}

Files:
  ✓ ./AGENTS.md (root, {N} lines)
  ✓ ./src/hooks/AGENTS.md ({N} lines)

Dirs Analyzed: {N}
AGENTS.md Created: {N}
AGENTS.md Updated: {N}

Hierarchy:
  ./AGENTS.md
  └── src/hooks/AGENTS.md

Anti-Patterns

  • Static agent count: MUST vary agents based on project size/depth
  • Sequential execution: MUST parallel (multiple Task calls in one message)
  • Ignoring existing: ALWAYS read existing first, even with --create-new
  • Over-documenting: Not every dir needs AGENTS.md
  • Redundancy: Child never repeats parent
  • Generic content: Remove anything that applies to ALL projects
  • Verbose style: Telegraphic or die