codebase-understanding
Codebase Understanding
Overview
This skill provides a bottom-up method for analyzing a codebase: it generates a README.md for every directory, and together these documents form a tree-shaped map of the code.
Core Features:
- Bottom-up Analysis: Start analysis from leaf directories and summarize layer by layer upwards
- One-sentence Descriptions: Summarize the function of each class and function in one sentence
- State Persistence: Supports resumable analysis, allowing you to continue after interruption
- Incremental Updates: Only analyze modified files to improve efficiency
Usage Scenarios
1. Understanding New Projects
User Request Examples:
- "Help me understand the code structure of this project"
- "What does this codebase do? What are the main modules?"
- "I just took over this project and need to understand the overall architecture"
Operation Steps:
- Scan the source code directory structure using the Glob tool
- Initialize the state file .analysis-state.json
- Start analysis from leaf directories and generate README.md files
- Generate README.md files for parent directories layer by layer upwards
- Finally generate a source code overview README.md in the root directory
2. Generating Technical Documentation
User Request Examples:
- "Generate complete code documentation for this project"
- "Need a reference document for this codebase"
- "Generate API documentation and architecture descriptions"
Operation Steps:
- Check if analysis state exists; if so, continue from there
- Complete README.md generation for all directories
- Generate an overall architecture document in the root directory
- Provide indexes for key files and functions
3. Analyzing Feature Implementations
User Request Examples:
- "Where is this feature implemented?"
- "Find the code that handles user login"
- "Trace the complete order creation flow"
Operation Steps:
- Use the Grep tool to search for keywords (e.g., "login", "authentication")
- Read the relevant files to understand how the feature is implemented
- Trace the function call chain
- Mark key flows in the README.md of the corresponding directory
Workflow
Phase 1: Preparation and Scanning
```bash
# 1. Identify source code directories
src_dirs=(src lib app server)

# 2. Scan the directory tree structure
# Use the Bash tool:
find src -type d | sort

# 3. Initialize or load the state file
# If .analysis-state.json exists → load the state
# If it does not exist → create a new state file
```
**State File Structure**: See [references/state-management.md](references/state-management.md)
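The scan in step 2 feeds the later phases, which treat leaf and branch directories differently. A minimal Node.js sketch of that split follows; the helper name `classifyDirectories` is hypothetical, and it does not skip folders such as `node_modules` or `.git`, which a real run would probably want to exclude.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Walk `root` and split directories into leaves (no subdirectories)
// and branches (at least one subdirectory). Hypothetical helper.
function classifyDirectories(root) {
  const leaves = [];
  const branches = [];
  const visit = (dir) => {
    const subdirs = fs
      .readdirSync(dir, { withFileTypes: true })
      .filter((entry) => entry.isDirectory())
      .map((entry) => path.join(dir, entry.name));
    (subdirs.length === 0 ? leaves : branches).push(dir);
    subdirs.forEach(visit);
  };
  visit(root);
  return { leaves: leaves.sort(), branches: branches.sort() };
}
```

Calling `classifyDirectories("src")` returns two sorted path lists; the leaves are the starting points for Phase 2.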
Phase 2: Analyze Leaf Directories
Leaf directory = a directory that contains no subdirectories (only source code files)
Steps:
- List all files in the directory
  ```
  Glob: **/*.{js,ts,py,java,go,rs,cpp,c,h}
  ```
- Perform analysis on each file
  - Read file content: Read(path/to/file)
  - Identify class definitions and function definitions
  - Extract function signatures (function name, parameters, return type)
  - Analyze function body to understand functionality
  Detailed Methods: See references/file-analysis.md
- Generate one-sentence descriptions
  - Template: [Verb] + [Object] + [Method] + [Result]
  - Example: Validates user email and password and returns user object
- Generate README.md
  - Use template: assets/leaf-readme-template.md
  - Write to file: Write(path/to/README.md, content)
- Update state
  ```json
  {
    "src/utils/auth": {
      "status": "completed",
      "readmeGenerated": true,
      "files": ["login.js", "register.js"],
      "fileHashes": { "login.js": "abc123", ... }
    }
  }
  ```
Parallel Processing: Multiple leaf directories can be analyzed in parallel to improve efficiency.
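As a rough illustration of the "Generate README.md" step, the sketch below turns per-file analysis results into Markdown text. The function `renderLeafReadme`, the shape of its input, and the output layout are all assumptions; the actual layout is defined by assets/leaf-readme-template.md, which is not reproduced here.

```javascript
// Render a leaf-directory README from per-file analysis results.
// The layout is an assumption, not the skill's real template.
function renderLeafReadme(dirPath, files) {
  const lines = [`# ${dirPath}`, ""];
  for (const file of files) {
    lines.push(`## ${file.name}`, "");
    for (const symbol of file.symbols) {
      // One bullet per class or function: signature plus one-sentence description.
      lines.push(`- \`${symbol.signature}\`: ${symbol.description}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

const readme = renderLeafReadme("src/utils/auth", [
  {
    name: "login.js",
    symbols: [
      {
        signature: "login(email, password)",
        description: "Validates user email and password and returns user object",
      },
    ],
  },
]);
```

The resulting string is what would be passed as `content` to the Write step.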
Phase 3: Analyze Branch Directories
Branch directory = a directory that contains subdirectories
Steps:
- Read the README.md files of all subdirectories
  ```javascript
  for (const subdir of subdirs) {
    const readme = Read(`${subdir}/README.md`);
    // Extract the "Directory Overview" section from readme
  }
  ```
- Analyze files in the current directory (if any)
  - Same file analysis method as leaf directories
- Generate README.md
  - Use template: assets/branch-readme-template.md
  - Include summary of subdirectory overviews
  - Include analysis of current directory files
- Update state
  ```json
  {
    "src/services": {
      "status": "completed",
      "subdirs": ["user", "order", "payment"],
      "completedSubdirs": ["user", "order", "payment"]
    }
  }
  ```
Dependency: A parent directory can only be analyzed after all of its subdirectories are completed.
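The dependency rule can be checked directly against the state entries. The following is a sketch only: it assumes directory entries are keyed by path with the `subdirs` and `completedSubdirs` fields shown above, and the function name `readyBranches` and the `"pending"` status value are hypothetical.

```javascript
// Return branch directories whose subdirectories are all completed
// but which have not been analyzed themselves.
function readyBranches(state) {
  return Object.entries(state)
    .filter(([, entry]) => Array.isArray(entry.subdirs))
    .filter(([, entry]) => entry.status !== "completed")
    .filter(([, entry]) =>
      entry.subdirs.every((name) => (entry.completedSubdirs ?? []).includes(name))
    )
    .map(([dir]) => dir);
}

const state = {
  "src/services": { status: "pending", subdirs: ["user", "order"], completedSubdirs: ["user", "order"] },
  "src/utils": { status: "pending", subdirs: ["auth", "date"], completedSubdirs: ["auth"] },
};
console.log(readyBranches(state)); // [ 'src/services' ]
```

Here `src/utils` is held back because `date` has not finished yet.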
Phase 4: Generate Root Directory Documentation
Steps:
- Collect overviews of all first-level directories
- Analyze tech stack
  - Read package.json / requirements.txt / pom.xml
  - Identify main dependencies and frameworks
- Generate architecture diagram
  - Identify layered structure
  - Draw module relationship diagram
- Generate README.md
  - Use template: assets/root-readme-template.md
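For the tech stack step, one possible approach for a Node.js project is to read the dependency names from package.json and match them against a small lookup table. This is a sketch under assumptions: `detectTechStack` is a hypothetical name, and the `KNOWN_FRAMEWORKS` table is illustrative and far from complete.

```javascript
// Map a few well-known npm package names to a framework label.
const KNOWN_FRAMEWORKS = new Map([
  ["react", "React"],
  ["vue", "Vue"],
  ["express", "Express"],
  ["next", "Next.js"],
]);

// Take the text of a package.json file and list its dependencies
// plus any frameworks recognized from the table above.
function detectTechStack(packageJsonText) {
  const pkg = JSON.parse(packageJsonText);
  const deps = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies }).sort();
  return {
    dependencies: deps,
    frameworks: deps
      .filter((name) => KNOWN_FRAMEWORKS.has(name))
      .map((name) => KNOWN_FRAMEWORKS.get(name)),
  };
}
```

requirements.txt and pom.xml would need their own parsers; they are left out to keep the sketch short.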
State Management and Resumable Analysis
State File Location
Project root directory/.analysis-state.json
State Check and Recovery
Before starting analysis:
```javascript
// exists, load, init and print are pseudocode helpers
let state;
if (exists('.analysis-state.json')) {
  state = load('.analysis-state.json');
  print("Existing analysis state found:");
  print(`Completed: ${state.stats.analyzedDirectories}/${state.stats.totalDirectories}`);
  print("Resuming from last interruption...");
} else {
  print("First-time analysis, initializing state file...");
  state = init();
}
```
Resume interrupted analysis:
```javascript
const pending_dirs = state.get_pending_directories();
for (const dir of pending_dirs) {
  if (state.should_analyze(dir)) {
    analyze_directory(dir);
  }
}
```
Incremental Update Strategy
When re-analyzing:
- Calculate MD5 hash of files
- Compare with hashes saved in the state file
- If hashes are different → File has been modified, re-analyze
- If hashes are the same → Skip, use existing results
Detailed Explanation: See references/state-management.md
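The hash comparison can be sketched with Node's built-in crypto module. The function names `md5` and `changedFiles` are hypothetical, and the sketch takes file contents as in-memory strings to stay self-contained; a real run would read them from disk.

```javascript
const { createHash } = require("node:crypto");

// MD5 is used here only as a change detector, as described above.
function md5(content) {
  return createHash("md5").update(content).digest("hex");
}

// Given current file contents and the hashes saved in the state file,
// return the names of files that are new or modified.
function changedFiles(currentContents, savedHashes) {
  return Object.keys(currentContents).filter(
    (name) => md5(currentContents[name]) !== savedHashes[name]
  );
}
```

A file missing from `savedHashes` compares against `undefined` and is therefore treated as changed, which covers newly added files.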
Language Support
JavaScript / TypeScript
Recognition Patterns:
```javascript
export class UserService { }
export function createUser() { }
export const validate = () => { }
```
Extraction: Class name, function name, parameters, return type, async marker
Python
Recognition Patterns:
```python
class UserService: ...
def create_user(): ...
async def fetch_data(): ...
```
Extraction: Class name, function name, parameters, return type, decorators
Java
Recognition Patterns:
```java
public class UserService {
    public void createUser() { }
}
```
Extraction: Class name, method name, parameters, return type, annotations
Go
Recognition Patterns:
```go
type UserService struct { }
func CreateUser() { }
func (s *UserService) Method() { }
```
Extraction: Type name, function name, method name, parameters, return type
Detailed Methods: See references/file-analysis.md
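To make the recognition step concrete, here is a regex-based sketch covering only the three JavaScript / TypeScript export forms listed earlier. The names `JS_PATTERNS` and `extractSymbols` are hypothetical, and regexes are a heuristic that misses many cases a real parser would handle; the skill's actual method is described in references/file-analysis.md.

```javascript
// Illustrative patterns for top-level JavaScript/TypeScript exports.
const JS_PATTERNS = [
  { kind: "class", regex: /^export\s+class\s+(\w+)/gm },
  { kind: "function", regex: /^export\s+(?:async\s+)?function\s+(\w+)/gm },
  { kind: "function", regex: /^export\s+const\s+(\w+)\s*=\s*(?:async\s*)?\(/gm },
];

// Return { kind, name } for every export matched by the patterns above.
function extractSymbols(source) {
  const symbols = [];
  for (const { kind, regex } of JS_PATTERNS) {
    for (const match of source.matchAll(regex)) {
      symbols.push({ kind, name: match[1] });
    }
  }
  return symbols;
}
```

Results are grouped by pattern rather than by position in the file; sorting by `match.index` would restore source order if that matters.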
Function Description Specifications
One-sentence Description Template
| Function Type | Template | Example |
|---|---|---|
| Data Processing | | Validates user email and returns the validation result |
| Query Retrieval | | Retrieves user information from the database |
| Operation Execution | | Sends a verification email to the user's mailbox |
| Utility / Helper | | Formats a date into a localized string |
Quality Standards
✅ Good Descriptions:
- Validates user login credentials and returns a JWT token
- Calculates the total discount amount of items in the shopping cart
- Retrieves user session information from Redis
❌ Poor Descriptions:
- Process data (too vague)
- Helper function (no functional explanation)
- get, set (lack of context)
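A simple automated check can catch the most obvious poor descriptions before they reach a README. The heuristics below are assumptions rather than part of the skill: the four-word minimum, the `VAGUE_WORDS` list, and the function name `describeProblems` are all made up for illustration.

```javascript
// Words that usually signal a description with no real content.
const VAGUE_WORDS = new Set(["helper", "stuff", "things", "misc"]);

// Return a list of problems found in a one-sentence description;
// an empty list means the description passed these basic checks.
function describeProblems(description) {
  const words = description.trim().split(/\s+/).filter(Boolean);
  const problems = [];
  if (words.length < 4) problems.push("too short to give context");
  if (words.some((word) => VAGUE_WORDS.has(word.toLowerCase()))) {
    problems.push("contains a vague word");
  }
  return problems;
}
```

All three poor examples above are flagged as too short, while the good ones pass. A check like this cannot judge accuracy, so it complements rather than replaces a manual review.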
Output Document Structure
After analysis is completed, each directory in the project has a README.md:
```
project/
├── README.md                  (Root directory overview)
├── src/
│   ├── README.md              (src overview)
│   ├── utils/
│   │   ├── README.md          (utils directory description)
│   │   ├── auth/
│   │   │   ├── README.md      (auth module detailed description)
│   │   │   ├── login.js
│   │   │   └── register.js
│   │   └── date/
│   │       ├── README.md      (date module detailed description)
│   │       └── helpers.js
│   └── services/
│       ├── README.md          (services description)
│       ├── user/
│       │   ├── README.md      (user service description)
│       │   └── service.js
│       └── order/
│           ├── README.md      (order service description)
│           └── service.js
```
Each README.md contains relevant information for that level, forming a complete documentation tree.
Best Practices
1. Parallel Processing
- Leaf directories can be analyzed in parallel
- Use Task tool to start multiple Explore agents working in parallel
- Parent directories must wait for subdirectories to complete
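The ordering constraint can be expressed with promises: run all leaves concurrently, then walk the parents from deepest to shallowest. This is a sketch, not how the Task tool schedules agents; `analyzeBottomUp` is a hypothetical name, and `analyzeLeaf` and `analyzeBranch` are async callbacks assumed to be supplied by the caller.

```javascript
// Analyze all leaf directories concurrently, then each branch
// directory in order of decreasing depth.
async function analyzeBottomUp(leafDirs, branchDirs, analyzeLeaf, analyzeBranch) {
  await Promise.all(leafDirs.map((dir) => analyzeLeaf(dir)));
  // Deeper branches first, so children are always done before parents.
  const depth = (dir) => dir.split("/").length;
  const ordered = [...branchDirs].sort((a, b) => depth(b) - depth(a));
  for (const dir of ordered) {
    await analyzeBranch(dir);
  }
}
```

Branches at the same depth are independent of each other and could also run concurrently; they are kept sequential here for simplicity.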
2. Progress Tracking
Use the TodoWrite tool to update progress in real time:
```javascript
TodoWrite([
  { content: "Analyze src/utils/auth/", status: "in_progress" },
  { content: "Analyze src/utils/date/", status: "pending" },
  { content: "Generate src/utils/ README.md", status: "pending" }
]);
```
3. Error Handling
When encountering errors:
- Record errors to state file
- Mark directory as "failed"
- Continue processing other directories
- Provide error report at the end
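The four steps above amount to a try/catch around each directory plus a final summary. In the sketch below, `analyzeAll` and the `analyze` callback are hypothetical, and the `error` field name is an assumption about the state file rather than its documented schema.

```javascript
// Run `analyze` on every directory, recording failures in `state`
// instead of aborting, and return an error report at the end.
function analyzeAll(dirs, state, analyze) {
  for (const dir of dirs) {
    try {
      analyze(dir);
      state[dir] = { ...state[dir], status: "completed" };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      state[dir] = { ...state[dir], status: "failed", error: message };
    }
  }
  // Error report: one line per directory that ended up "failed".
  return Object.entries(state)
    .filter(([, entry]) => entry.status === "failed")
    .map(([dir, entry]) => `${dir}: ${entry.error}`);
}
```

Because failed directories keep their entry in the state, a later run can retry only those.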
4. Performance Optimization
- Use Glob instead of find command to search files
- Batch read files to reduce I/O operations
- Process independent directories in parallel
- Use file hashes to avoid repeated analysis
5. Quality Check
After generating README.md, check:
- ✅ All files have been analyzed
- ✅ All classes and functions have descriptions
- ✅ Descriptions are concise and accurate
- ✅ Markdown format is correct
Execution Examples
Example 1: Analyze Small Project
```javascript
// 1. Scan directories
const dirs = Glob("src/**");
// ["src/utils", "src/services", "src/models"]
// 2. Identify leaf directories
const leaf_dirs = ["src/utils/auth", "src/utils/format", "src/services/user"];
// 3. Analyze leaf directories (they are independent, so they can run in parallel)
for (const dir of leaf_dirs) {
  analyze_leaf_directory(dir);
}
// 4. Analyze parent directories
analyze_branch_directory("src/utils");
analyze_branch_directory("src/services");
// 5. Generate root directory
generate_root_readme("src");
```
Example 2: Resume Interrupted Analysis
```javascript
// 1. Load state
const state = load_state(".analysis-state.json");
// 2. Get pending directories
const pending = state.get_pending_directories();
// ["src/services/order", "src/services/payment"]
// 3. Continue analysis
for (const dir of pending) {
  analyze_directory(dir);
}
// 4. Complete remaining parent directories
if (all_subdirs_completed("src/services")) {
  analyze_branch_directory("src/services");
}
```
Frequently Asked Questions
Q: What if the analysis is interrupted?
A: The state file saves all progress. The next run automatically resumes from where the analysis was interrupted.
Q: Do I need to re-analyze after code modifications?
A: The system compares file hashes and re-analyzes only modified files; unmodified files are skipped.
Q: How do I analyze only specific directories?
A: Mark the other directories as "skipped" in the state file, or directly specify the directory paths to analyze.
Q: Will the generated README.md overwrite existing files?
A: Yes. To avoid overwriting important hand-written documents, rename or back up existing README.md files before running the analysis.
Resources
references/
- state-management.md: Detailed implementation of state management and resumable analysis
- file-analysis.md: File analysis methods and language-specific patterns
assets/
- leaf-readme-template.md: Leaf directory README template
- branch-readme-template.md: Branch directory README template
- root-readme-template.md: Root directory README template