codebase-understanding

Codebase Understanding

Overview

This skill analyzes a codebase bottom-up: it generates a README.md for each directory, and together these files form a tree-structured map of the code.
Core Features:
  • Bottom-up Analysis: Start from leaf directories and summarize upwards, layer by layer
  • One-sentence Descriptions: Summarize each class and function in one sentence
  • State Persistence: Analysis is resumable, so an interrupted run can continue where it stopped
  • Incremental Updates: Only modified files are re-analyzed

Usage Scenarios

1. Understanding New Projects

User Request Examples:
  • "Help me understand the code structure of this project"
  • "What does this codebase do? What are the main modules?"
  • "I just took over this project and need to understand the overall architecture"
Operation Steps:
  1. Scan the source code directory structure using the Glob tool
  2. Initialize the .analysis-state.json state file
  3. Start analysis from leaf directories and generate README.md files
  4. Generate README.md files for parent directories layer by layer upwards
  5. Finally generate a source code overview README.md in the root directory

2. Generating Technical Documentation

User Request Examples:
  • "Generate complete code documentation for this project"
  • "I need a reference document for this codebase"
  • "Generate API documentation and architecture descriptions"
Operation Steps:
  1. Check if analysis state exists; if so, continue from there
  2. Complete README.md generation for all directories
  3. Generate an overall architecture document in the root directory
  4. Provide indexes for key files and functions

3. Analyzing Feature Implementations

User Request Examples:
  • "Where is this feature implemented?"
  • "Find the code that handles user login"
  • "Trace the complete order creation flow"
Operation Steps:
  1. Use the Grep tool to search for keywords (e.g., "login", "authentication")
  2. Read the relevant files to understand how the feature is implemented
  3. Trace the function call chain
  4. Mark key flows in the README.md of the corresponding directory

Workflow

Phase 1: Preparation and Scanning

1. Identify source code directories

src_dirs = ["src", "lib", "app", "server"]

2. Scan directory tree structure

Use Bash tool: find src -type d | sort

3. Initialize or load state file

If .analysis-state.json exists → load the state
If it does not exist → create a new state file

**State File Structure**: See [references/state-management.md](references/state-management.md)
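
The three preparation steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `scan_directories` and `new_state` are hypothetical helper names, the top-level `directories` key is an assumed layout (the real schema is described in references/state-management.md), and skipping hidden directories is an added convention. Only the `status` field and the `stats.totalDirectories` / `stats.analyzedDirectories` counters appear in this document's own examples.

```python
import os


def scan_directories(root):
    """Return every directory under root (root included) as a sorted list."""
    found = []
    for current, subdirs, _files in os.walk(root):
        # Assumption: hidden directories such as .git are not source code.
        subdirs[:] = [d for d in subdirs if not d.startswith(".")]
        found.append(current)
    return sorted(found)


def new_state(directories):
    """Build a fresh state with every directory marked as pending."""
    return {
        # Assumed layout: one entry per directory, keyed by its path.
        "directories": {d: {"status": "pending"} for d in directories},
        "stats": {
            "totalDirectories": len(directories),
            "analyzedDirectories": 0,
        },
    }
```

Calling `new_state(scan_directories("src"))` would give a starting point that the later phases update as directories are completed.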

Phase 2: Analyze Leaf Directories

Leaf directories = Directories that do not contain subdirectories (only source code files)
Steps:
  1. List all files in the directory
    javascript
    Glob: **/*.{js,ts,py,java,go,rs,cpp,c,h}
  2. Perform analysis on each file
    • Read file content:
      Read(path/to/file)
    • Identify class definitions and function definitions
    • Extract function signatures (function name, parameters, return type)
    • Analyze function body to understand functionality
    Detailed Methods: See references/file-analysis.md
  3. Generate one-sentence descriptions
    • Template:
      [Verb] + [Object] + [Method] + [Result]
    • Example:
      Validates user email and password and returns user object
  4. Generate README.md
    • Use template: assets/leaf-readme-template.md
    • Write to file:
      Write(path/to/README.md, content)
  5. Update state
    json
    {
      "src/utils/auth": {
        "status": "completed",
        "readmeGenerated": true,
        "files": ["login.js", "register.js"],
        "fileHashes": { "login.js": "abc123", ... }
      }
    }
Parallel Processing: Multiple leaf directories can be analyzed in parallel to improve efficiency.
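
The leaf test and the file listing in step 1 could look like the following in Python. This is only a sketch: `is_leaf_directory` and `list_source_files` are hypothetical names, and the extension set is copied from the Glob pattern above.

```python
from pathlib import Path

# Extensions copied from the Glob pattern in step 1.
SOURCE_EXTENSIONS = {".js", ".ts", ".py", ".java", ".go", ".rs", ".cpp", ".c", ".h"}


def is_leaf_directory(directory):
    """A leaf directory is one that contains no subdirectories."""
    return not any(child.is_dir() for child in Path(directory).iterdir())


def list_source_files(directory):
    """Return the names of source files directly inside directory, sorted."""
    return sorted(
        child.name
        for child in Path(directory).iterdir()
        if child.is_file() and child.suffix in SOURCE_EXTENSIONS
    )
```

The result of `list_source_files` corresponds to the `files` array in the state entry shown in step 5.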

Phase 3: Analyze Branch Directories

Branch directories = Directories that contain subdirectories
Steps:
  1. Read README.md files of all subdirectories
    javascript
    for (const subdir of subdirs) {
      const readme = Read(`${subdir}/README.md`);
      // Extract the "Directory Overview" section from readme
    }
  2. Analyze files in the current directory (if any)
    • Same file analysis method as leaf directories
  3. Generate README.md
    • Use template: assets/branch-readme-template.md
    • Include summary of subdirectory overviews
    • Include analysis of current directory files
  4. Update state
    json
    {
      "src/services": {
        "status": "completed",
        "subdirs": ["user", "order", "payment"],
        "completedSubdirs": ["user", "order", "payment"]
      }
    }
Dependency: A parent directory can be analyzed only after all of its subdirectories are completed.
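
One simple way to respect this dependency is to process directories deepest-first. The sketch below assumes "/"-separated relative paths; `bottom_up_order` and `ready_to_analyze` are hypothetical names, while `subdirs` and `completedSubdirs` are the fields from the state example above.

```python
def bottom_up_order(directories):
    """Order paths so every directory comes after all of its subdirectories."""
    # A deeper path contains more separators, so sorting deepest-first
    # guarantees that children precede their parents.
    return sorted(directories, key=lambda d: (-d.count("/"), d))


def ready_to_analyze(directory, state):
    """A branch directory is ready once every subdir is listed as completed."""
    entry = state[directory]
    return set(entry.get("subdirs", [])) <= set(entry.get("completedSubdirs", []))
```

For example, `bottom_up_order(["src", "src/services", "src/services/user"])` places `src/services/user` first and `src` last.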

Phase 4: Generate Root Directory Documentation

Steps:
  1. Collect overviews of all first-level directories
  2. Analyze tech stack
    • Read package.json / requirements.txt / pom.xml
    • Identify main dependencies and frameworks
  3. Generate architecture diagram
    • Identify layered structure
    • Draw module relationship diagram
  4. Generate README.md
    • Use template: assets/root-readme-template.md
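
Step 2 (tech stack analysis) can be approximated by reading dependency names from the manifest files. The sketch below is an assumption about how this might be done, with a hypothetical `detect_tech_stack` helper; it handles only package.json and requirements.txt and leaves pom.xml out for brevity.

```python
import json
import re
from pathlib import Path


def detect_tech_stack(project_root):
    """Collect dependency names from whichever manifest files exist."""
    root = Path(project_root)
    stack = {}

    package_json = root / "package.json"
    if package_json.exists():
        data = json.loads(package_json.read_text(encoding="utf-8"))
        stack["node"] = sorted(data.get("dependencies", {}))

    requirements = root / "requirements.txt"
    if requirements.exists():
        names = []
        for line in requirements.read_text(encoding="utf-8").splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments
            if line and not line.startswith("-"):  # skip options like -r
                # Keep only the package name, dropping versions and extras.
                names.append(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0])
        stack["python"] = sorted(names)

    return stack
```

The returned dictionary only lists names; deciding which of them count as the "main" frameworks is still a judgment made during analysis.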

State Management and Resumable Analysis

State File Location

<project root>/.analysis-state.json

State Check and Recovery

Before starting analysis:
javascript
// exists, load, and init are placeholder helpers
let state;
if (exists('.analysis-state.json')) {
  state = load('.analysis-state.json');
  console.log("Existing analysis state found:");
  console.log(`Completed: ${state.stats.analyzedDirectories}/${state.stats.totalDirectories}`);
  console.log("Resuming from last interruption...");
} else {
  console.log("First-time analysis, initializing state file...");
  state = init();
}
Resume interrupted analysis:
javascript
const pending_dirs = state.get_pending_directories();
for (const dir of pending_dirs) {
  if (state.should_analyze(dir)) {
    analyze_directory(dir);
  }
}
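
For resumption to be reliable, the state file should survive an interruption that happens while it is being written. The Python sketch below shows one way to do that, writing to a temporary file and then swapping it into place. The helper names, the `.tmp` suffix, and the top-level `directories` key are assumptions made for illustration; the actual schema is described in references/state-management.md.

```python
import json
import os


def load_state(path):
    """Read the state file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_state(path, state):
    """Write the state to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, ensure_ascii=False)
    # On POSIX systems os.replace is atomic, so an interruption leaves
    # either the old file or the new one, never a half-written file.
    os.replace(tmp_path, path)


def pending_directories(state):
    """Return directories that are neither completed nor skipped."""
    done = {"completed", "skipped"}
    return [
        path
        for path, entry in state["directories"].items()
        if entry.get("status") not in done
    ]
```

Saving after each directory, rather than once at the end, keeps the amount of repeated work after an interruption to at most one directory.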

Incremental Update Strategy

When re-analyzing:
  1. Calculate the MD5 hash of each file
  2. Compare it with the hash saved in the state file
  3. If the hashes differ → the file has been modified; re-analyze it
  4. If the hashes match → skip the file and reuse the existing results
Detailed Explanation: See references/state-management.md
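
The hash comparison can be sketched with Python's standard `hashlib` module. `file_md5` and `changed_files` are hypothetical helper names, `saved_hashes` stands for the `fileHashes` mapping from the state file, and skipping the generated README.md is an added assumption so that the skill's own output does not trigger re-analysis.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(directory, saved_hashes):
    """Return names of files that are new or whose hash differs from saved_hashes."""
    changed = []
    for child in sorted(Path(directory).iterdir()):
        if not child.is_file() or child.name == "README.md":
            continue  # assumption: the generated README is not an input
        if saved_hashes.get(child.name) != file_md5(child):
            changed.append(child.name)
    return changed
```

MD5 is used here only for change detection, as in the steps above, not for any security purpose.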

Language Support

JavaScript / TypeScript

Recognition Patterns:
javascript
export class UserService { }
export function createUser() { }
export const validate = () => { }
Extraction: Class name, function name, parameters, return type, async marker

Python

Recognition Patterns:
python
class UserService:
def create_user():
async def fetch_data():
Extraction: Class name, function name, parameters, return type, decorators

Java

Recognition Patterns:
java
public class UserService { }
public void createUser() { }
Extraction: Class name, method name, parameters, return type, annotations

Go

Recognition Patterns:
go
type UserService struct { }
func CreateUser() { }
func (s *UserService) Method() { }
Extraction: Type name, function name, method name, parameters, return type
Detailed Methods: See references/file-analysis.md
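
As a rough illustration, the recognition patterns above can be matched with regular expressions. The patterns below are deliberately simple assumptions that cover only the forms shown in this section for JavaScript, Python, and Go (Java is omitted); they are not the method described in references/file-analysis.md, and a real parser would be more robust.

```python
import re

# Simplified, illustrative patterns; each captures (keyword, name).
PATTERNS = {
    "javascript": re.compile(
        r"^\s*export\s+(?:async\s+)?(class|function|const)\s+(\w+)", re.MULTILINE
    ),
    "python": re.compile(r"^\s*(class|def|async def)\s+(\w+)", re.MULTILINE),
    # The optional group skips a method receiver such as (s *UserService).
    "go": re.compile(r"^\s*(type|func)\s+(?:\([^)]*\)\s*)?(\w+)", re.MULTILINE),
}


def find_definitions(source, language):
    """Return (keyword, name) pairs for each definition found in source."""
    return [(m.group(1), m.group(2)) for m in PATTERNS[language].finditer(source)]
```

Applied to `"export class UserService { }"` with language `"javascript"`, this yields the pair `("class", "UserService")`.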

Function Description Specifications

One-sentence Description Template

| Function Type | Template | Example |
| --- | --- | --- |
| Data Processing | [Verb] + [Object] + [Method] + [Result] | Validates user email and returns verification result |
| Query Retrieval | Query [conditions] from [data source] | Retrieves user information from the database |
| Operation Execution | [Verb] + [Object] + [Result] | Sends verification email to the user's mailbox |
| Utility Helpers | [Verb] + [Object] + [Conversion] | Formats date into a localized string |

Quality Standards

Good Descriptions:
  • Validates user login credentials and returns JWT token
  • Calculates total discount amount of items in shopping cart
  • Retrieves user session information from Redis
Poor Descriptions:
  • Process data (too vague)
  • Helper function (does not say what it does)
  • get, set (lacks context)
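
A crude automated check can catch the most obvious poor descriptions before a README is written. The sketch below is a heuristic of my own, not part of the skill: `is_poor_description` is a hypothetical name, and the four-word minimum is an arbitrary threshold chosen because it separates the examples in this section. It cannot judge accuracy, so it complements a manual review rather than replacing it.

```python
def is_poor_description(description, min_words=4):
    """Flag descriptions too short to name an action, an object, and a result."""
    return len(description.split()) < min_words
```

With the default threshold, "Process data" and "get, set" are flagged, while "Calculates total discount amount of items in shopping cart" passes.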

Output Document Structure

After analysis is completed, each directory in the project has a README.md:
project/
├── README.md (Root directory overview)
├── src/
│   ├── README.md (src overview)
│   ├── utils/
│   │   ├── README.md (utils directory description)
│   │   ├── auth/
│   │   │   ├── README.md (auth module detailed description)
│   │   │   ├── login.js
│   │   │   └── register.js
│   │   └── date/
│   │       ├── README.md (date module detailed description)
│   │       └── helpers.js
│   └── services/
│       ├── README.md (services description)
│       ├── user/
│       │   ├── README.md (user service description)
│       │   └── service.js
│       └── order/
│           ├── README.md (order service description)
│           └── service.js
Each README.md contains relevant information for that level, forming a complete documentation tree.

Best Practices

1. Parallel Processing

  • Leaf directories can be analyzed in parallel
  • Use Task tool to start multiple Explore agents working in parallel
  • Parent directories must wait for subdirectories to complete
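
The same scheduling idea can be illustrated outside the agent environment. The sketch below swaps the Task tool and Explore agents for a plain thread pool from Python's standard library, purely as a stand-in; `analyze_leaves_in_parallel` is a hypothetical name and `analyze` is whatever per-directory function the caller supplies.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_leaves_in_parallel(leaf_dirs, analyze, max_workers=4):
    """Run analyze(directory) for each leaf concurrently; return {directory: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results line up with leaf_dirs.
        results = list(pool.map(analyze, leaf_dirs))
    return dict(zip(leaf_dirs, results))
```

Because the `with` block only exits once every leaf has finished, any parent-directory analysis placed after this call automatically waits for its subdirectories.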

2. Progress Tracking

Use TodoWrite tool to update progress in real-time:
javascript
TodoWrite([
  { content: "Analyze src/utils/auth/", status: "in_progress" },
  { content: "Analyze src/utils/date/", status: "pending" },
  { content: "Generate src/utils/ README.md", status: "pending" }
]);

3. Error Handling

When encountering errors:
  1. Record errors to state file
  2. Mark directory as "failed"
  3. Continue processing other directories
  4. Provide error report at the end
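
These four steps amount to a loop that catches failures per directory instead of aborting. A minimal Python sketch, assuming a caller-supplied `analyze` function and a hypothetical `error` field alongside the `status` values used elsewhere in this document:

```python
def analyze_all(directories, analyze):
    """Analyze every directory, recording failures instead of stopping."""
    state = {}
    errors = []
    for directory in directories:
        try:
            analyze(directory)
            state[directory] = {"status": "completed"}
        except Exception as exc:
            # Record the error, mark the directory as failed, and move on.
            state[directory] = {"status": "failed", "error": str(exc)}
            errors.append((directory, str(exc)))
    return state, errors
```

The returned `errors` list is the raw material for the final error report.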

4. Performance Optimization

  • Use Glob instead of find command to search files
  • Batch read files to reduce I/O operations
  • Process independent directories in parallel
  • Use file hashes to avoid repeated analysis

5. Quality Check

After generating README.md, check:
  • ✅ All files have been analyzed
  • ✅ All classes and functions have descriptions
  • ✅ Descriptions are concise and accurate
  • ✅ Markdown format is correct

Execution Examples

Example 1: Analyze Small Project

javascript
// 1. Scan directories
dirs = Glob("src/**")
// e.g. ["src/utils", "src/services", "src/models", ...]

// 2. Identify leaf directories (those without subdirectories)
leaf_dirs = ["src/utils/auth", "src/utils/format", "src/services/user"]

// 3. Analyze leaf directories (they are independent, so they can run in parallel)
for (dir of leaf_dirs) {
  analyze_leaf_directory(dir);
}

// 4. Analyze parent directories
analyze_branch_directory("src/utils");
analyze_branch_directory("src/services");

// 5. Generate root directory
generate_root_readme("src");

Example 2: Resume Interrupted Analysis

javascript
// 1. Load state
state = load_state(".analysis-state.json");

// 2. Get pending directories
pending = state.get_pending_directories();
// ["src/services/order", "src/services/payment"]

// 3. Continue analysis
for (dir of pending) {
  analyze_directory(dir);
}

// 4. Complete remaining parent directories
if (all_subdirs_completed("src/services")) {
  analyze_branch_directory("src/services");
}

Frequently Asked Questions

Q: What if the analysis is interrupted?

A: The state file records all progress, so the next run automatically resumes where the previous one stopped.

Q: Do I need to re-analyze after code modifications?

A: The system compares file hashes and re-analyzes only modified files; unmodified files are skipped.

Q: How do I analyze only specific directories?

A: You can mark other directories as "skipped" in the state file, or directly specify the directory paths to analyze.

Q: Will the generated README.md overwrite existing files?

A: Yes. Back up or rename any existing hand-written README.md files before running the analysis so that they are not overwritten.
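
One way to protect a hand-written README is to copy it aside the first time a generated file is about to replace it. The sketch below is an illustration only: `write_readme_with_backup` and the `README.md.bak` name are hypothetical, not part of the skill.

```python
import shutil
from pathlib import Path


def write_readme_with_backup(directory, content):
    """Write README.md, preserving the first pre-existing version as README.md.bak."""
    readme = Path(directory) / "README.md"
    backup = readme.with_name("README.md.bak")  # hypothetical backup name
    # Back up only once, so later runs do not replace the original
    # hand-written file with an earlier generated one.
    if readme.exists() and not backup.exists():
        shutil.copy2(readme, backup)
    readme.write_text(content, encoding="utf-8")
    return backup
```

Backing up only when no backup exists is a deliberate choice: the oldest version is the one most likely to have been written by hand.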

Resources

references/

  • state-management.md: Detailed implementation of state management and resumable analysis
  • file-analysis.md: File analysis methods and language-specific patterns

assets/

  • leaf-readme-template.md: Leaf directory README template
  • branch-readme-template.md: Branch directory README template
  • root-readme-template.md: Root directory README template