Codebase Understanding
Overview
This skill provides a bottom-up method for analyzing a codebase. It generates a README.md for every directory, building a tree of documentation that mirrors the directory tree.
Core Features:
- Bottom-up Analysis: Start from leaf directories and summarize upward, one level at a time
- One-sentence Descriptions: Summarize the purpose of each class and function in one sentence
- State Persistence: Save progress so an interrupted analysis can be resumed
- Incremental Updates: Re-analyze only modified files to save time
Usage Scenarios
1. Understanding New Projects
User Request Examples:
- "Help me understand the code structure of this project"
- "What does this codebase do? What are the main modules?"
- "I just took over this project and need to understand the overall architecture"
Operation Steps:
- Scan the source code directory structure using the Glob tool
- Initialize the state file
- Start analysis from leaf directories and generate README.md files
- Generate README.md files for parent directories layer by layer upwards
- Finally, generate a source code overview README.md in the root directory
2. Generating Technical Documentation
User Request Examples:
- "Generate complete code documentation for this project"
- "Need a reference document for this codebase"
- "Generate API documentation and architecture descriptions"
Operation Steps:
- Check whether an analysis state file exists; if so, resume from it
- Complete README.md generation for all directories
- Generate an overall architecture document in the root directory
- Provide indexes for key files and functions
3. Analyzing Function Implementations
User Request Examples:
- "Where is this function implemented?"
- "Find the code that handles user login"
- "Track the complete process of order creation"
Operation Steps:
- Use the Grep tool to search for keywords (e.g., "login", "authentication")
- Read relevant files to understand function implementations
- Track function call chains
- Mark key processes in the README.md of the corresponding directory
Workflow
Phase 1: Preparation and Scanning
```bash
# 1. Identify source code directories (common candidates)
for d in src lib app server; do
  [ -d "$d" ] && echo "$d"
done
# 2. Scan directory tree structure
find src -type d | sort
# 3. Initialize or load the state file
if [ -f .analysis-state.json ]; then
  echo "Load existing state"
else
  echo "Create new state file"
fi
```
State File Structure: See references/state-management.md
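As a rough Node.js equivalent of `find src -type d | sort`, the sketch below walks a tree and returns every directory in sorted order. The `scanDirectories` name and the excluded directory names (`node_modules`, `.git`, and so on) are assumptions for illustration, not part of this skill.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Directory names skipped during the scan (an assumption; adjust per project).
const EXCLUDED = new Set(["node_modules", ".git", "dist", "build", "__pycache__"]);

// Recursively collect `root` and every subdirectory beneath it, sorted by path.
// Symbolic links are not followed (Dirent.isDirectory() is false for them).
function scanDirectories(root) {
  const result = [root];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    if (entry.isDirectory() && !EXCLUDED.has(entry.name)) {
      result.push(...scanDirectories(path.join(root, entry.name)));
    }
  }
  return result.sort();
}
```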
Phase 2: Analyze Leaf Directories
Leaf directories = Directories that do not contain subdirectories (only source code files)
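The leaf/branch split and the bottom-up processing order can both be derived from the scanned directory list. This is a minimal sketch assuming `/`-separated relative paths like the state-file keys; `classifyDirectories` is a hypothetical helper.

```javascript
// Split a "/"-separated directory list into leaves and branches, ordered
// bottom-up (deepest first) so every directory comes after its children.
function classifyDirectories(dirs) {
  const known = new Set(dirs);
  const parents = new Set();
  for (const dir of dirs) {
    const cut = dir.lastIndexOf("/");
    // A directory whose parent is also listed makes that parent a branch.
    if (cut > 0 && known.has(dir.slice(0, cut))) parents.add(dir.slice(0, cut));
  }
  const depth = (d) => d.split("/").length;
  const byName = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  const order = [...dirs].sort((a, b) => depth(b) - depth(a) || byName(a, b));
  return {
    order,
    leaves: order.filter((d) => !parents.has(d)),
    branches: order.filter((d) => parents.has(d)),
  };
}
```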
Steps:
- List all files in the directory:
  ```text
  Glob: **/*.{js,ts,py,java,go,rs,cpp,c,h}
  ```
- Perform analysis on each file
  - Read the file content
  - Identify class definitions and function definitions
  - Extract function signatures (function name, parameters, return type)
  - Analyze the function body to understand its functionality
  - Detailed Methods: See references/file-analysis.md
- Generate one-sentence descriptions
  - Template: [Verb] + [Object] + [Method] + [Result]
  - Example: Validates user email and password and returns user object
- Generate README.md
  - Use template: assets/leaf-readme-template.md
  - Write to file: Write(path/to/README.md, content)
- Update state
  ```json
  {
    "src/utils/auth": {
      "status": "completed",
      "readmeGenerated": true,
      "files": ["login.js", "register.js"],
      "fileHashes": { "login.js": "abc123", ... }
    }
  }
  ```
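One way to assemble a leaf README from the per-symbol descriptions is sketched below. The real layout comes from assets/leaf-readme-template.md, which is not reproduced here, so the headings (including a `## Directory Overview` section that a parent directory can extract later) and the `{ file, name, kind, description }` input shape are assumptions.

```javascript
// Render a leaf-directory README from a summary and analyzed symbols.
// The section layout is hypothetical; the real one comes from the template.
function renderLeafReadme(dir, overview, symbols) {
  const lines = [`# ${dir}`, "", "## Directory Overview", "", overview, "", "## Files", ""];
  // Group the one-sentence descriptions by file, keeping first-seen order.
  const byFile = new Map();
  for (const s of symbols) {
    if (!byFile.has(s.file)) byFile.set(s.file, []);
    byFile.get(s.file).push(s);
  }
  for (const [file, items] of byFile) {
    lines.push(`### ${file}`, "");
    for (const s of items) lines.push(`- \`${s.name}\` (${s.kind}): ${s.description}`);
    lines.push("");
  }
  return lines.join("\n");
}
```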
Parallel Processing: Multiple leaf directories can be analyzed in parallel to improve efficiency.
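In the skill itself, parallelism comes from launching several agents with the Task tool. The scheduling idea behind it, a fixed number of workers pulling directories from a shared queue, can be sketched in plain JavaScript; `runWithConcurrency` and any particular limit value are illustrative assumptions.

```javascript
// Run async `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out (safe: JS is single-threaded)
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Capping concurrency, rather than starting every directory at once, keeps the number of simultaneous reads bounded on large trees.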
Phase 3: Analyze Branch Directories
Branch directories = Directories that contain subdirectories
Steps:
- Read README.md files of all subdirectories
  ```javascript
  for (const subdir of subdirs) {
    const readme = Read(`${subdir}/README.md`);
    // Extract the "Directory Overview" section from readme
  }
  ```
- Analyze files in the current directory (if any)
  - Same file analysis method as leaf directories
- Generate README.md
  - Use template: assets/branch-readme-template.md
  - Include a summary of the subdirectory overviews
  - Include the analysis of the current directory's files
- Update state
  ```json
  {
    "src/services": {
      "status": "completed",
      "subdirs": ["user", "order", "payment"],
      "completedSubdirs": ["user", "order", "payment"]
    }
  }
  ```
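Extracting the "Directory Overview" section from a child README can be done with a simple heading scan. This sketch assumes ATX-style headings (`## ...`) and does not skip `#` lines inside fenced code blocks; `extractSection` is a hypothetical helper.

```javascript
// Return the body of the first section whose heading text equals `title`,
// stopping at the next heading of the same or a higher level.
// Returns null when no such heading exists.
function extractSection(markdown, title) {
  let level = 0; // heading level of the matched section; 0 = not found yet
  const body = [];
  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (level === 0) {
      if (heading && heading[2].trim() === title) level = heading[1].length;
    } else if (heading && heading[1].length <= level) {
      break; // next section at the same or a higher level
    } else {
      body.push(line); // includes deeper subheadings
    }
  }
  return level === 0 ? null : body.join("\n").trim();
}
```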
Dependency: Parent directories must wait for all subdirectories to be completed before analysis.
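The dependency rule can be checked against state entries shaped like the example above. This sketch assumes the directory entries are available as a plain object keyed by path, and that a branch not yet finished carries a status other than "completed"; `readyBranchDirectories` is a hypothetical name.

```javascript
// Return branch directories that may be analyzed now: not yet "completed",
// and every name in `subdirs` also appears in `completedSubdirs`.
// Entries without a `subdirs` array (leaves, stats, etc.) are ignored.
function readyBranchDirectories(entries) {
  return Object.entries(entries)
    .filter(([, e]) => Array.isArray(e.subdirs) && e.status !== "completed")
    .filter(([, e]) => e.subdirs.every((name) => (e.completedSubdirs ?? []).includes(name)))
    .map(([dir]) => dir);
}
```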
Phase 4: Generate Root Directory Documentation
Steps:
- Collect overviews of all first-level directories
- Analyze tech stack
  - Read package.json / requirements.txt / pom.xml
  - Identify main dependencies and frameworks
- Generate architecture diagram
  - Identify layered structure
  - Draw module relationship diagram
- Generate README.md
  - Use template: assets/root-readme-template.md
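For the tech-stack step, a sketch that detects the manifest files named above and lists declared dependency names. Only package.json and requirements.txt are parsed (pom.xml is only detected), and the ecosystem labels and the requirements.txt parsing rule are simplifying assumptions.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Manifest files from the list above and the ecosystem each suggests.
const MANIFESTS = {
  "package.json": "Node.js (npm)",
  "requirements.txt": "Python (pip)",
  "pom.xml": "Java (Maven)",
};

// Report which manifests exist in `root`, with dependency names where parsed.
function detectTechStack(root) {
  const stack = [];
  for (const [file, ecosystem] of Object.entries(MANIFESTS)) {
    const full = path.join(root, file);
    if (!fs.existsSync(full)) continue;
    const entry = { file, ecosystem, dependencies: [] };
    if (file === "package.json") {
      const pkg = JSON.parse(fs.readFileSync(full, "utf8"));
      entry.dependencies = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies }).sort();
    } else if (file === "requirements.txt") {
      entry.dependencies = fs.readFileSync(full, "utf8").split(/\r?\n/)
        .map((l) => l.split("#")[0].trim()) // drop comments and whitespace
        .filter((l) => l && !l.startsWith("-")) // skip blanks and pip options
        .map((l) => l.split(/[<>=!~;\[\s]/)[0]); // keep the package name only
    }
    stack.push(entry);
  }
  return stack;
}
```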
State Management and Resumable Analysis
State File Location
<project root>/.analysis-state.json
State Check and Recovery
Before starting analysis:
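```javascript
const fs = require("node:fs");

let state;
if (fs.existsSync(".analysis-state.json")) {
  state = JSON.parse(fs.readFileSync(".analysis-state.json", "utf8"));
  console.log("Existing analysis state found:");
  console.log(`Completed: ${state.stats.analyzedDirectories}/${state.stats.totalDirectories}`);
  console.log("Resuming from last interruption...");
} else {
  console.log("First-time analysis, initializing state file...");
  state = init(); // builds the initial state (see references/state-management.md)
}
```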
Resume interrupted analysis:
```javascript
const pending_dirs = state.get_pending_directories();
for (const dir of pending_dirs) {
  if (state.should_analyze(dir)) {
    analyze_directory(dir);
  }
}
```
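`get_pending_directories` is not defined in this document; its actual behavior is described in references/state-management.md. A plausible sketch, assuming directory entries keyed by path with a `status` field, returns unfinished directories deepest first so children are handled before their parents. Treating "failed" entries as pending is an assumption; "skipped" comes from the FAQ.

```javascript
// Statuses that need no further work in this sketch.
const FINAL_STATUSES = new Set(["completed", "skipped"]);

// Unfinished directories, deepest first, ties broken by path.
function getPendingDirectories(entries) {
  const depth = (d) => d.split("/").length;
  return Object.keys(entries)
    .filter((d) => !FINAL_STATUSES.has(entries[d].status))
    .sort((a, b) => depth(b) - depth(a) || (a < b ? -1 : a > b ? 1 : 0));
}
```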
Incremental Update Strategy
When re-analyzing:
- Compute the MD5 hash of each file
- Compare with hashes saved in the state file
- If hashes are different → File has been modified, re-analyze
- If hashes are the same → Skip, use existing results
Detailed Explanation: See references/state-management.md
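The hash comparison can be sketched with Node's built-in `crypto` module. `md5File`, `diffFileHashes`, and the returned shape are assumptions; the saved hashes correspond to the `fileHashes` object in the state example. Files with no saved hash are counted as changed, which is also an assumption. MD5 is adequate for detecting changes, though not for security.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Hex MD5 digest of a file's bytes.
function md5File(file) {
  return crypto.createHash("md5").update(fs.readFileSync(file)).digest("hex");
}

// Split `files` in `dir` into changed/unchanged relative to `savedHashes`,
// and return the fresh hashes to store back into the state file.
function diffFileHashes(dir, files, savedHashes = {}) {
  const changed = [];
  const unchanged = [];
  const hashes = {};
  for (const name of files) {
    hashes[name] = md5File(path.join(dir, name));
    (hashes[name] === savedHashes[name] ? unchanged : changed).push(name);
  }
  return { changed, unchanged, hashes };
}
```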
Language Support
JavaScript / TypeScript
Recognition Patterns:
```javascript
export class UserService { }
export function createUser() { }
export const validate = () => { }
```
Extraction: Class name, function name, parameters, return type, async marker
Python
Recognition Patterns:
```python
class UserService:
def create_user():
async def fetch_data():
```
Extraction: Class name, function name, parameters, return type, decorators
Java
Recognition Patterns:
```java
public class UserService { }
public void createUser() { }
```
Extraction: Class name, method name, parameters, return type, annotations
Go
Recognition Patterns:
```go
type UserService struct { }
func CreateUser() { }
func (s *UserService) Method() { }
```
Extraction: Type name, function name, method name, parameters, return type
Detailed Methods: See references/file-analysis.md
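A line-based sketch of recognizing the definition styles listed above. It is a regex heuristic, not a parser: multi-line signatures, Java methods without an access modifier, and definitions inside strings or comments are handled poorly. The `javascript` patterns also cover the TypeScript forms shown; `extractDefinitions` and the pattern table are assumptions.

```javascript
// Per-language line patterns; capture group 1 is the definition name.
const PATTERNS = {
  javascript: [
    { kind: "class", re: /^\s*(?:export\s+)?(?:default\s+)?class\s+(\w+)/ },
    { kind: "function", re: /^\s*(?:export\s+)?(?:async\s+)?function\s*\*?\s*(\w+)\s*\(/ },
    { kind: "function", re: /^\s*(?:export\s+)?const\s+(\w+)\s*=\s*(?:async\s*)?(?:\([^)]*\)|\w+)\s*=>/ },
  ],
  python: [
    { kind: "class", re: /^\s*class\s+(\w+)/ },
    { kind: "function", re: /^\s*(?:async\s+)?def\s+(\w+)\s*\(/ },
  ],
  java: [
    { kind: "class", re: /^\s*(?:(?:public|private|protected)\s+)?(?:(?:abstract|final|static)\s+)*class\s+(\w+)/ },
    { kind: "method", re: /^\s*(?:public|private|protected)\s+(?:static\s+)?(?:final\s+)?[\w<>\[\],.?\s]+?\s+(\w+)\s*\(/ },
  ],
  go: [
    { kind: "type", re: /^type\s+(\w+)\s+(?:struct|interface)\b/ },
    { kind: "method", re: /^func\s+\([^)]*\)\s*(\w+)\s*\(/ },
    { kind: "function", re: /^func\s+(\w+)\s*\(/ },
  ],
};

// Return { kind, name, line } for each line matching a pattern (first match wins).
function extractDefinitions(source, language) {
  const patterns = PATTERNS[language] ?? [];
  const found = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { kind, re } of patterns) {
      const m = re.exec(text);
      if (m) {
        found.push({ kind, name: m[1], line: index + 1 });
        break;
      }
    }
  });
  return found;
}
```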
Function Description Specifications
One-sentence Description Template
| Function Type | Template | Example |
|---|---|---|
| Data Processing | [Verb] + [Object] + [Method] + [Result] | Validates user email and returns verification result |
| Query Retrieval | Query [conditions] from [data source] | Retrieves user information from database |
| Operation Execution | [Verb] + [Object] + [Result] | Sends verification email to user's mailbox |
| Tool Assistance | [Verb] + [Object] + [Conversion] | Formats date into localized string |
Quality Standards
✅ Good Descriptions:
- Validates user login credentials and returns JWT token
- Calculates total discount amount of items in shopping cart
- Retrieves user session information from Redis
❌ Poor Descriptions:
- Process data (too vague)
- Helper function (no functional explanation)
- get, set (lack of context)
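Some of these standards can be checked mechanically before a human review. The four-word minimum and the vague-phrase list in this sketch are assumptions derived from the poor examples above, not fixed rules; `reviewDescription` is a hypothetical helper.

```javascript
// Heuristic thresholds (assumptions based on the poor examples above).
const MIN_WORDS = 4;
const VAGUE = [/^process(es)?\s+data\b/i, /^helper\b/i, /^(get|set)\s*(,|$)/i];

// Return a list of problems found in a one-sentence description.
function reviewDescription(description) {
  const text = description.trim();
  const problems = [];
  if (text.split(/\s+/).filter(Boolean).length < MIN_WORDS) problems.push("too short");
  if (VAGUE.some((re) => re.test(text))) problems.push("vague wording");
  return problems;
}
```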
Output Document Structure
After analysis is completed, each directory in the project has a README.md:
```text
project/
├── README.md (Root directory overview)
├── src/
│ ├── README.md (src overview)
│ ├── utils/
│ │ ├── README.md (utils directory description)
│ │ ├── auth/
│ │ │ ├── README.md (auth module detailed description)
│ │ │ ├── login.js
│ │ │ └── register.js
│ │ └── date/
│ │ ├── README.md (date module detailed description)
│ │ └── helpers.js
│ └── services/
│ ├── README.md (services description)
│ ├── user/
│ │ ├── README.md (user service description)
│ │ └── service.js
│ └── order/
│ ├── README.md (order service description)
│ └── service.js
```
Each README.md contains relevant information for that level, forming a complete documentation tree.
Best Practices
1. Parallel Processing
- Leaf directories can be analyzed in parallel
- Use Task tool to start multiple Explore agents working in parallel
- Parent directories must wait for subdirectories to complete
2. Progress Tracking
Use the TodoWrite tool to update progress in real time:
```javascript
TodoWrite([
  { content: "Analyze src/utils/auth/", status: "in_progress" },
  { content: "Analyze src/utils/date/", status: "pending" },
  { content: "Generate src/utils/ README.md", status: "pending" }
]);
```
3. Error Handling
When encountering errors:
- Record errors to state file
- Mark directory as "failed"
- Continue processing other directories
- Provide error report at the end
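These error-handling steps might look like this in code, assuming a hypothetical async `analyze(dir)` function and directory entries keyed by path. The `error` field name and the `save` callback (for example, writing the entries back to .analysis-state.json) are assumptions.

```javascript
// Analyze each directory; a failure is recorded and the loop continues.
// Returns the collected failures for the final error report.
async function analyzeAll(dirs, entries, analyze, save = () => {}) {
  const failures = [];
  for (const dir of dirs) {
    try {
      await analyze(dir);
      entries[dir] = { ...entries[dir], status: "completed" };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      entries[dir] = { ...entries[dir], status: "failed", error };
      failures.push({ dir, error });
    }
    save(entries); // persist progress after every directory
  }
  return failures;
}
```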
4. Performance Optimization
- Use the Glob tool instead of the find command to search for files
- Batch read files to reduce I/O operations
- Process independent directories in parallel
- Use file hashes to avoid repeated analysis
5. Quality Check
After generating README.md, check:
- ✅ All files have been analyzed
- ✅ All classes and functions have descriptions
- ✅ Descriptions are concise and accurate
- ✅ Markdown format is correct
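Two of these checks are easy to automate: every analyzed file should be mentioned in the README, and code fences should be balanced (a cheap proxy for correct Markdown). Matching file names by plain substring is a simplifying assumption in this sketch; `checkReadme` is a hypothetical helper.

```javascript
// Return a list of quality issues for a generated README.
function checkReadme(readme, files) {
  const issues = [];
  for (const file of files) {
    if (!readme.includes(file)) issues.push(`missing description for ${file}`);
  }
  // Count lines that open or close a fenced code block (``` or ~~~).
  const fences = readme.split(/\r?\n/).filter((l) => /^\s*(`{3,}|~{3,})/.test(l)).length;
  if (fences % 2 !== 0) issues.push("unbalanced code fence");
  return issues;
}
```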
Execution Examples
Example 1: Analyze Small Project
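```javascript
// 1. Scan directories (Bash tool, as in Phase 1)
const dirs = Bash("find src -type d | sort").split("\n");
// ["src", "src/models", "src/services", "src/services/user", "src/utils", "src/utils/auth", "src/utils/format"]
// 2. Identify leaf directories (no subdirectories)
const leaf_dirs = ["src/models", "src/services/user", "src/utils/auth", "src/utils/format"];
// 3. Analyze leaf directories in parallel
await Promise.all(leaf_dirs.map((dir) => analyze_leaf_directory(dir)));
// 4. Analyze parent directories once all their subdirectories are done
await analyze_branch_directory("src/utils");
await analyze_branch_directory("src/services");
// 5. Generate root directory README
await generate_root_readme("src");
```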
Example 2: Resume Interrupted Analysis
```javascript
// 1. Load state
const state = load_state(".analysis-state.json");
// 2. Get pending directories
const pending = state.get_pending_directories();
// ["src/services/order", "src/services/payment"]
// 3. Continue analysis
for (const dir of pending) {
  analyze_directory(dir);
}
// 4. Complete remaining parent directories
if (all_subdirs_completed("src/services")) {
  analyze_branch_directory("src/services");
}
```
Frequently Asked Questions
Q: What if the analysis is interrupted?
A: The state file saves all progress. The next time you run it, it will automatically resume from the last interruption.
Q: Do I need to re-analyze after code modifications?
A: Only the modified files. The system compares each file's hash with the one saved in the state file and re-analyzes only files whose hashes differ; unmodified files are skipped.
Q: How to analyze only specific directories?
A: You can mark other directories as "skipped" in the state file, or directly specify the directory paths to analyze.
Q: Will the generated README.md overwrite existing files?
A: Yes. Before running the analysis, back up or rename any existing, manually written README.md files so they are not overwritten.
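A minimal backup sketch, run before generation, that copies each existing README.md to README.md.bak. The `.bak` naming and the `backupReadmes` helper are hypothetical; directories that already have a backup are skipped, and `COPYFILE_EXCL` additionally makes the copy fail rather than overwrite an existing file.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Copy README.md to README.md.bak in each directory that has one.
// Returns the paths of the backups created.
function backupReadmes(dirs) {
  const created = [];
  for (const dir of dirs) {
    const src = path.join(dir, "README.md");
    const dest = path.join(dir, "README.md.bak");
    if (fs.existsSync(src) && !fs.existsSync(dest)) {
      fs.copyFileSync(src, dest, fs.constants.COPYFILE_EXCL);
      created.push(dest);
    }
  }
  return created;
}
```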
Resources
references/
- state-management.md: Detailed implementation of state management and resumable analysis
- file-analysis.md: File analysis methods and language-specific patterns
assets/
- leaf-readme-template.md: Leaf directory README template
- branch-readme-template.md: Branch directory README template
- root-readme-template.md: Root directory README template