Codebase Understanding

Overview

This skill provides a bottom-up method for analyzing a codebase: it generates a README.md for each directory, and together these documents form a tree that describes the code from leaf modules up to the project root.

Core Features:

Bottom-up Analysis: Start analysis from leaf directories and summarize layer by layer upwards
One-sentence Descriptions: Summarize the function of each class and function in one sentence
State Persistence: Supports resumable analysis, allowing you to continue after interruption
Incremental Updates: Only analyze modified files to improve efficiency

Usage Scenarios

1. Understanding New Projects

User Request Examples:

"Help me understand the code structure of this project"
"What does this codebase do? What are the main modules?"
"I just took over this project and need to understand the overall architecture"

Operation Steps:

Scan the source code directory structure using the Glob tool
Initialize the `.analysis-state.json` state file
Start analysis from leaf directories and generate README.md files
Generate README.md files for parent directories layer by layer upwards
Finally generate a source code overview README.md in the root directory

2. Generating Technical Documentation

User Request Examples:

"Generate complete code documentation for this project"
"Need a reference document for this codebase"
"Generate API documentation and architecture descriptions"

Operation Steps:

Check if analysis state exists; if so, continue from there
Complete README.md generation for all directories
Generate an overall architecture document in the root directory
Provide indexes for key files and functions

3. Analyzing Function Implementations

User Request Examples:

"Where is this function implemented?"
"Find the code that handles user login"
"Track the complete process of order creation"

Operation Steps:

Use the Grep tool to search for keywords (e.g., "login", "authentication")
Read relevant files to understand function implementations
Trace function call chains
Mark key processes in the README.md of the corresponding directory

Workflow

Phase 1: Preparation and Scanning

bash

# 1. Identify source code directories (common candidates)
for d in src lib app server; do
  [ -d "$d" ] && echo "source dir: $d"
done

# 2. Scan the directory tree structure (run with the Bash tool)
find src -type d | sort

# 3. Initialize or load the state file
if [ -f .analysis-state.json ]; then
  echo "state file exists: load state"
else
  echo "state file missing: create a new state file"
fi

State File Structure: See references/state-management.md

Phase 2: Analyze Leaf Directories

Leaf directories = Directories that do not contain subdirectories (only source code files)
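
The definition above can be turned into a small directory scan. The following is a minimal sketch, not part of the skill itself: the function name `find_leaf_directories` and the ignore list (`.git`, `node_modules`) are assumptions made for illustration.

```python
import os
from typing import FrozenSet, List


def find_leaf_directories(
    root: str,
    ignore: FrozenSet[str] = frozenset({".git", "node_modules"}),  # assumed ignore list
) -> List[str]:
    """Return the directories under root that contain no subdirectories, sorted."""
    leaves = []
    for current, subdirs, _files in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        subdirs[:] = [d for d in subdirs if d not in ignore]
        if not subdirs:
            leaves.append(current)
    return sorted(leaves)
```

Calling `find_leaf_directories("src")` returns the starting points for Phase 2; every other directory under `src` is a branch directory.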

Steps:

List all files in the directory

Glob: `**/*.{js,ts,py,java,go,rs,cpp,c,h}`

Perform analysis on each file
- Read file content: `Read(path/to/file)`
- Identify class definitions and function definitions
- Extract function signatures (function name, parameters, return type)
- Analyze function body to understand functionality
Detailed Methods: See references/file-analysis.md
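
For Python source files, the identification and signature-extraction steps can be sketched with the standard `ast` module. This is only an illustration of the idea, not the method described in references/file-analysis.md; the function name `extract_definitions` and the shape of the returned records are assumptions.

```python
import ast
from typing import Dict, List


def extract_definitions(source: str) -> List[Dict[str, object]]:
    """List the classes and functions in Python source with basic signature info."""
    found: List[Dict[str, object]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            found.append({"kind": "class", "name": node.name, "line": node.lineno})
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append({
                "kind": "function",
                "name": node.name,
                # Positional parameters only; *args and keyword-only ones are skipped here.
                "params": [a.arg for a in node.args.args],
                # ast.unparse requires Python 3.9 or later.
                "returns": ast.unparse(node.returns) if node.returns else None,
                "async": isinstance(node, ast.AsyncFunctionDef),
                "line": node.lineno,
            })
    # Report definitions in source order.
    return sorted(found, key=lambda d: d["line"])
```

Other languages need their own parser or pattern matching; the records produced here are the raw material for the one-sentence descriptions in the next step.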

Generate one-sentence descriptions

Template: `[Verb] + [Object] + [Method] + [Result]`

Example:

Validates user email and password and returns user object

Generate README.md
- Use template: assets/leaf-readme-template.md
- Write to file: `Write(path/to/README.md, content)`

Update state

json

{
  "src/utils/auth": {
    "status": "completed",
    "readmeGenerated": true,
    "files": ["login.js", "register.js"],
    "fileHashes": { "login.js": "abc123", ... }
  }
}

Parallel Processing: Multiple leaf directories can be analyzed in parallel to improve efficiency.

Phase 3: Analyze Branch Directories

Branch directories = Directories that contain subdirectories

Steps:

Read README.md files of all subdirectories

javascript

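for (const subdir of subdirs) {
  // Read is the agent's file-reading tool, not a JavaScript built-in
  const readme = Read(`${subdir}/README.md`);
  // Extract the "Directory Overview" section from readme
}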

Analyze files in the current directory (if any)
- Same file analysis method as leaf directories
Generate README.md
- Use template: assets/branch-readme-template.md
- Include summary of subdirectory overviews
- Include analysis of current directory files

Update state

json

{
  "src/services": {
    "status": "completed",
    "subdirs": ["user", "order", "payment"],
    "completedSubdirs": ["user", "order", "payment"]
  }
}

Dependency: Parent directories must wait for all subdirectories to be completed before analysis.
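
One simple way to honor this dependency is to process directories deepest first. The sketch below assumes directories are given as `/`-separated relative paths; the function name `bottom_up_order` is hypothetical.

```python
from typing import List


def bottom_up_order(directories: List[str]) -> List[str]:
    """Order directory paths so that each directory comes after all of its descendants."""
    # A descendant always has more path components than its ancestor,
    # so sorting by depth (deepest first) satisfies the dependency rule.
    return sorted(directories, key=lambda d: (-len(d.strip("/").split("/")), d))
```

Directories at the same depth do not depend on each other, so each depth level can be analyzed in parallel before moving one level up.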

Phase 4: Generate Root Directory Documentation

Steps:

Collect overviews of all first-level directories
Analyze tech stack
- Read package.json / requirements.txt / pom.xml
- Identify main dependencies and frameworks
Generate architecture diagram
- Identify layered structure
- Draw module relationship diagram
Generate README.md
- Use template: assets/root-readme-template.md
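
The tech stack step can be sketched as a reader for the manifest files named above. This is a minimal example under stated assumptions: the function name `detect_dependencies` is hypothetical, only `package.json` and `requirements.txt` are handled (`pom.xml` would need an XML parser and is left out), and only the `dependencies` key of `package.json` is read.

```python
import json
import os
import re
from typing import Dict, List


def detect_dependencies(project_root: str) -> Dict[str, List[str]]:
    """Collect declared dependency names from common manifest files, if present."""
    found: Dict[str, List[str]] = {}

    package_json = os.path.join(project_root, "package.json")
    if os.path.isfile(package_json):
        with open(package_json, encoding="utf-8") as fh:
            manifest = json.load(fh)
        found["package.json"] = sorted(manifest.get("dependencies", {}))

    requirements = os.path.join(project_root, "requirements.txt")
    if os.path.isfile(requirements):
        names = []
        with open(requirements, encoding="utf-8") as fh:
            for line in fh:
                line = line.split("#", 1)[0].strip()  # drop comments
                if line and not line.startswith("-"):  # skip options such as "-r other.txt"
                    # Keep only the package name, dropping version specifiers and extras.
                    names.append(re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0])
        found["requirements.txt"] = sorted(names)
    return found
```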

State Management and Resumable Analysis

State File Location

<project root directory>/.analysis-state.json

State Check and Recovery

Before starting analysis:

javascript

// exists, load, and init stand for state-file helpers (pseudocode)
let state;
if (exists('.analysis-state.json')) {
  state = load('.analysis-state.json');
  console.log("Existing analysis state found:");
  console.log(`Completed: ${state.stats.analyzedDirectories}/${state.stats.totalDirectories}`);
  console.log("Resuming from last interruption...");
} else {
  console.log("First-time analysis, initializing state file...");
  state = init();
}

Resume interrupted analysis:

javascript

pending_dirs = state.get_pending_directories();
for (dir of pending_dirs) {
  if (state.should_analyze(dir)) {
    analyze_directory(dir);
  }
}
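
How the pending list is computed depends on the state file layout, which is documented in references/state-management.md. As a rough sketch only, assume the state is a mapping from directory path to a record with a `status` field, as in the JSON examples earlier, and that anything not marked `completed` should be picked up again:

```python
from typing import Dict, List


def get_pending_directories(state: Dict[str, dict]) -> List[str]:
    """Return directories whose status is not "completed", deepest paths first."""
    pending = [
        path for path, record in state.items()
        if record.get("status") != "completed"
    ]
    # Deepest first, so subdirectories are resumed before their parents.
    return sorted(pending, key=lambda p: (-p.count("/"), p))
```

Under this assumption, directories marked `failed` are retried along with those never started; a real implementation may treat them differently.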

Incremental Update Strategy

When re-analyzing:

Calculate MD5 hash of files
Compare with hashes saved in the state file
If hashes are different → File has been modified, re-analyze
If hashes are the same → Skip, use existing results

Detailed Explanation: See references/state-management.md
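
The hash comparison can be sketched with the standard `hashlib` module. The function names are hypothetical, and `saved_hashes` is assumed to have the same shape as the `fileHashes` entry in the state example (file name mapped to hex digest).

```python
import hashlib
import os
from typing import Dict, List


def md5_of_file(path: str) -> str:
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_reanalyze(directory: str, saved_hashes: Dict[str, str]) -> List[str]:
    """Return file names that are new or whose MD5 differs from the saved hash."""
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # The generated README.md is output, not source, so it is skipped (assumption).
        if name == "README.md" or not os.path.isfile(path):
            continue
        if saved_hashes.get(name) != md5_of_file(path):
            changed.append(name)
    return changed
```

An empty result means the directory can be skipped and its existing README.md reused. MD5 is used here only for change detection, not for security.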

Language Support

JavaScript / TypeScript

Recognition Patterns:

javascript

export class UserService { }
export function createUser() { }
export const validate = () => { }

Extraction: Class name, function name, parameters, return type, async marker

Python

Recognition Patterns:

python

class UserService:
def create_user():
async def fetch_data():

Extraction: Class name, function name, parameters, return type, decorators

Java

Recognition Patterns:

java

public class UserService { }
public void createUser() { }

Extraction: Class name, method name, parameters, return type, annotations

Go

Recognition Patterns:

go

type UserService struct { }
func CreateUser() { }
func (s *UserService) Method() { }

Extraction: Type name, function name, method name, parameters, return type

Detailed Methods: See references/file-analysis.md
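
As an illustration of pattern-based recognition, the three JavaScript / TypeScript forms listed above can be matched with regular expressions. This is a simplified sketch: the names `JS_PATTERNS` and `recognize_js_definitions` are hypothetical, and line-based regexes miss many real-world forms (default exports, multi-line declarations, class methods), so a proper parser is preferable where one is available.

```python
import re
from typing import List, Tuple

# Simplified patterns for the three exported forms shown in the JavaScript section.
JS_PATTERNS = [
    ("class", re.compile(r"^\s*export\s+class\s+(\w+)")),
    ("function", re.compile(r"^\s*export\s+(?:async\s+)?function\s+(\w+)")),
    ("function", re.compile(r"^\s*export\s+const\s+(\w+)\s*=\s*(?:async\s*)?\(")),
]


def recognize_js_definitions(source: str) -> List[Tuple[str, str]]:
    """Return (kind, name) pairs for exported classes and functions, in source order."""
    found = []
    for line in source.splitlines():
        for kind, pattern in JS_PATTERNS:
            match = pattern.match(line)
            if match:
                found.append((kind, match.group(1)))
                break  # at most one definition is reported per line
    return found
```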

Function Description Specifications

One-sentence Description Template

| Function Type | Template | Example |
| --- | --- | --- |
| Data Processing | `[Verb] + [Object] + [Method] + [Result]` | Validates user email and returns verification result |
| Query Retrieval | `Query [conditions] from [data source]` | Retrieves user information from the database |
| Operation Execution | `[Verb] + [Object] + [Result]` | Sends a verification email to the user's mailbox |
| Tool Assistance | `[Verb] + [Object] + [Conversion]` | Formats a date into a localized string |

Quality Standards

✅ Good Descriptions:

Validates user login credentials and returns a JWT token
Calculates the total discount amount for items in the shopping cart
Retrieves user session information from Redis

❌ Poor Descriptions:

Process data (too vague)
Helper function (does not explain what the function does)
get, set (lacks context)

Output Document Structure

After analysis is completed, each directory in the project has a README.md:

project/
├── README.md (Root directory overview)
├── src/
│   ├── README.md (src overview)
│   ├── utils/
│   │   ├── README.md (utils directory description)
│   │   ├── auth/
│   │   │   ├── README.md (auth module detailed description)
│   │   │   ├── login.js
│   │   │   └── register.js
│   │   └── date/
│   │       ├── README.md (date module detailed description)
│   │       └── helpers.js
│   └── services/
│       ├── README.md (services description)
│       ├── user/
│       │   ├── README.md (user service description)
│       │   └── service.js
│       └── order/
│           ├── README.md (order service description)
│           └── service.js

Each README.md contains relevant information for that level, forming a complete documentation tree.

Best Practices

1. Parallel Processing

Leaf directories can be analyzed in parallel
Use Task tool to start multiple Explore agents working in parallel
Parent directories must wait for subdirectories to complete

2. Progress Tracking

Use the TodoWrite tool to update progress in real time:

javascript

TodoWrite([
  { content: "Analyze src/utils/auth/", status: "in_progress" },
  { content: "Analyze src/utils/date/", status: "pending" },
  { content: "Generate src/utils/ README.md", status: "pending" }
]);

3. Error Handling

When encountering errors:

Record errors to state file
Mark directory as "failed"
Continue processing other directories
Provide error report at the end
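
The four steps above amount to a loop that isolates failures per directory. A minimal sketch, assuming the state is a mapping from directory path to a record; the function name `analyze_all`, the `analyze` callback, and the `error` field are hypothetical:

```python
from typing import Callable, Dict, List


def analyze_all(
    directories: List[str],
    analyze: Callable[[str], None],
    state: Dict[str, dict],
) -> List[str]:
    """Analyze each directory, recording failures in state instead of stopping."""
    failed = []
    for path in directories:
        record = state.setdefault(path, {})
        try:
            analyze(path)
            record["status"] = "completed"
        except Exception as exc:  # keep going; failures are reported at the end
            record["status"] = "failed"
            record["error"] = str(exc)
            failed.append(path)
    return failed
```

The returned list is the basis for the final error report. A parent of a failed directory should not be treated as ready, since its subdirectory README.md may be missing.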

4. Performance Optimization

Use Glob instead of the find command to search for files
Batch read files to reduce I/O operations
Process independent directories in parallel
Use file hashes to avoid repeated analysis

5. Quality Check

After generating README.md, check:

✅ All files have been analyzed
✅ All classes and functions have descriptions
✅ Descriptions are concise and accurate
✅ Markdown format is correct
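
Part of this check can be automated. The sketch below covers only the first item in a coarse way, by listing directories that still lack a README.md; the function name `directories_missing_readme` is hypothetical, and skipping hidden directories such as `.git` is an assumption.

```python
import os
from typing import List


def directories_missing_readme(root: str) -> List[str]:
    """Return directories under root (inclusive) that have no README.md, sorted."""
    missing = []
    for current, subdirs, files in os.walk(root):
        # Skip hidden directories such as .git (assumption, not stated by the skill).
        subdirs[:] = [d for d in subdirs if not d.startswith(".")]
        if "README.md" not in files:
            missing.append(current)
    return sorted(missing)
```

An empty list means every directory has a document; whether the descriptions are concise and accurate still needs a manual review.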

Execution Examples

Example 1: Analyze Small Project

javascript

// 1. Scan directories
dirs = Glob("src/**")
// ["src/utils", "src/services", "src/models"]

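// 2. Identify leaf directories (src/models has no subdirectories, so it is a leaf too)
leaf_dirs = ["src/utils/auth", "src/utils/format", "src/services/user", "src/models"]

// 3. Analyze leaf directories (they are independent, so they can run in parallel)
for (dir of leaf_dirs) {
  analyze_leaf_directory(dir);
}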

// 4. Analyze parent directories
analyze_branch_directory("src/utils");
analyze_branch_directory("src/services");

// 5. Generate root directory
generate_root_readme("src");

Example 2: Resume Interrupted Analysis

javascript

// 1. Load state
state = load_state(".analysis-state.json");

// 2. Get pending directories
pending = state.get_pending_directories();
// ["src/services/order", "src/services/payment"]

// 3. Continue analysis
for (dir of pending) {
  analyze_directory(dir);
}

// 4. Complete remaining parent directories
if (all_subdirs_completed("src/services")) {
  analyze_branch_directory("src/services");
}

Frequently Asked Questions

Q: What if the analysis is interrupted?

A: The state file saves all progress. The next run automatically resumes from where the analysis was interrupted.

Q: Do I need to re-analyze after code modifications?

A: The system compares file hashes and re-analyzes only modified files; unmodified files are skipped.

Q: How do I analyze only specific directories?

A: You can mark other directories as "skipped" in the state file, or directly specify the directory paths to analyze.

Q: Will the generated README.md overwrite existing files?

A: Yes. Rename or back up existing README.md files before running the analysis to avoid overwriting important manually written documents.

Resources

references/

state-management.md: Detailed implementation of state management and resumable analysis
file-analysis.md: File analysis methods and language-specific patterns

assets/

leaf-readme-template.md: Leaf directory README template
branch-readme-template.md: Branch directory README template
root-readme-template.md: Root directory README template
