Code Deep Understanding Analyzer v2.3 (Chinese Version)
A professional code analysis tool grounded in cognitive-science research, supporting three analysis depths to ensure genuine code understanding rather than an illusion of fluency.
Three Analysis Modes
| User Intent | Recommended Mode | Trigger Word Examples | Analysis Duration |
|---|---|---|---|
| Quick browsing/code review | Quick Mode | "Take a quick look", "What does this code do", "Scan briefly" | 5-10 minutes |
| Learning comprehension/technical research | Standard Mode ⭐ | "Analyze this", "Help me understand", "Explain this", "What's the principle" | 15-20 minutes |
| In-depth mastery/large-scale projects | Deep Mode 🚀 | "Thorough analysis", "Complete mastery", "In-depth research", "Interview preparation", "Overall project analysis" | 30+ minutes |
Standard Mode is used by default, and the system will automatically select the most appropriate mode based on code scale and user intent.
🚀 Deep Mode Internal Intelligent Strategy:
- Code ≤ 2000 lines: Uses progressive generation (sequential chapter filling)
- Code > 2000 lines: Automatically enables parallel processing (sub-Agents analyze chapters in parallel)
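The mode/strategy dispatch described above can be sketched in a few lines (the function name and return labels are illustrative, not part of the tool):

```python
def select_strategy(total_lines: int, mode: str = "deep") -> str:
    """Pick the Deep Mode generation strategy by code scale.

    Hypothetical helper illustrating the thresholds above.
    """
    if mode != "deep":
        return mode  # quick/standard modes have a single strategy
    # Code <= 2000 lines: progressive (sequential) chapter generation
    # Code > 2000 lines: parallel sub-Agent processing
    return "progressive" if total_lines <= 2000 else "parallel"

print(select_strategy(1500))  # → progressive
print(select_strategy(5000))  # → parallel
```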
Core Philosophy: Understanding First, Memory Second
Combat Fluency Illusion
"Able to read code ≠ Able to write code"
"Able to understand explanations ≠ Able to implement independently"
"Feel like understanding ≠ Truly understand"
Core Principles:
- Understand the WHY, not just the WHAT
- Enforce self-explanation to verify true understanding
- Establish conceptual connections, not isolated memory
- Test transfer ability through application variants
Research Support:
- Dunlosky et al. - Elaborative interrogation is significantly more effective than passive reading
- Chi et al. - Self-explainers are more likely to acquire correct mental models
- Karpicke & Roediger - Retrieval practice is 250% better than repeated reading
Mandatory Pre-Analysis Check: Understanding Verification Checkpoint
Execute corresponding verification processes based on the selected mode:
Quick Mode - Simplified Verification
- Quickly identify code type and core functions
- List key concepts (no in-depth verification required)
Standard Mode - Standard Verification
- Conduct self-explanation tests on core concepts
- Verify ability to explain the "WHY"
Deep Mode - Complete Verification
- Full self-explanation test
- Application transfer ability verification
Output Format (at the beginning of the analysis document):
```markdown
## Understanding Verification Status [Standard/Deep Mode Only]
| Core Concept | Can Explain WHAT | Can Explain WHY | Can Apply/Transfer | Status |
|---------|---------|-------------|---------|------|
| User Authentication Flow | ✅ | ✅ | ✅ | Understood |
| JWT Token Mechanism | ✅ | ⚠️ | ❌ | ⚠️ Needs in-depth understanding |
| Password Hashing | ✅ | ✅ | ⚠️ | Basic understanding |
```
Output Structures for Three Modes
Quick Mode Output Structure (5-10 minutes)
```markdown
# [Code Name] Quick Analysis

## 1. Quick Overview
- Programming language and version
- Code scale and type
- Core dependencies

## 2. Function Description
- What the main function is (WHAT)
- Brief explanation of WHY it's needed

## 3. Core Algorithm/Design
- Algorithm complexity (if applicable)
- Design patterns used (if applicable)
- WHY this algorithm/pattern was chosen

## 4. Key Code Snippets
- 3-5 core code snippets
- Brief explanation of each snippet's role

## 5. Dependency Relationships
- List of external libraries and their uses

## 6. Quick Usage Example
- Simple runnable example
```
Standard Mode Output Structure (15-20 minutes) ⭐Recommended
```markdown
# [Code Name] Deep Understanding Analysis

## Understanding Verification Status
[Self-explanation test result table]

## 1. Quick Overview
- Programming language, scale, dependencies

## 2. Background and Motivation (Elaborative Interrogation)
- WHY this code is needed
- WHY this solution was chosen
- WHY other solutions were not chosen

## 3. Core Concept Explanation
- List key concepts
- Answer 2-3 WHY questions for each concept

## 4. Algorithms and Theory
- Complexity analysis
- WHY this algorithm was chosen
- Reference materials

## 5. Design Patterns
- Identified patterns
- WHY they are used

## 6. In-Depth Key Code Analysis
- Line-by-line WHY analysis
- Execution flow example

## 7. Dependencies and Usage Examples
- Detailed WHY comments
```
Deep Mode Output Structure (30+ minutes)
Deep Mode automatically selects the optimal strategy based on code scale to ensure sufficient depth for each chapter:
Strategy A: Progressive Generation (Code ≤ 2000 lines)
Suitable for small to medium code, generate chapters sequentially:
```markdown
# [Code Name] Complete Mastery Analysis
[Includes all content from Standard Mode, plus the following sections]

## 3+. Concept Network Diagram
- Core concept list (3 WHY questions each)
- Concept relationship matrix
- Connection to existing knowledge

## 6+. Complete Execution Example
- Multi-scenario execution flow
- Boundary condition explanation
- Error-prone point annotations

## 8. Test Case Analysis (if code includes tests)
- Test file list and coverage analysis
- Boundary conditions discovered from tests
- Test-driven understanding verification

## 9. Application Transfer Scenarios (at least 2)
- Scenario 1: Invariant principles + modified parts + WHY
- Scenario 2: Invariant principles + modified parts + WHY
- Extract general patterns

## 10. Dependency Relationships and Usage Examples
- Detailed WHY comments

## 11. Quality Verification Checklist
- Understanding depth verification
- Technical accuracy verification
- Practicality verification
- Final "Four Abilities" test
```
Strategy B: Parallel Processing (Code > 2000 lines) 🚀
Suitable for large projects, uses sub-Agent parallel architecture:
Core Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                   Main Coordinator Agent                    │
│  - Generates the analysis outline and directory framework   │
│  - Identifies the core concept list (shared with sub-Agents)│
│  - Assigns chapter tasks                                    │
│  - Aggregates sub-Agent results                             │
│  - Final quality verification                               │
└─────────────────────────────────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  Sub-Agent 1  │ │  Sub-Agent 2  │ │  Sub-Agent 3  │
│ Background &  │ │ Core Concepts │ │ Algorithms &  │
│  Motivation   │ │               │ │    Theory     │
└───────────────┘ └───────────────┘ └───────────────┘
        │                 │                 │
        └─────────────────┼─────────────────┘
                          ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  Sub-Agent 4  │ │  Sub-Agent 5  │ │  Sub-Agent 6  │
│    Design     │ │     Code      │ │  Application  │
│   Patterns    │ │   Analysis    │ │   Transfer    │
└───────────────┘ └───────────────┘ └───────────────┘
```
Parallel Execution Flow
| Phase | Executor | Operation | Output |
|---|---|---|---|
| 1. Framework Preparation | Main Agent | Quick overview of code, generates outline and core concept list | |
| 2. Task Distribution | Main Agent | Creates independent task descriptions for each chapter | Task list |
| 3. Parallel Processing | Sub-Agents | Each sub-Agent focuses on one chapter, generates in-depth content | |
| 4. Result Aggregation | Main Agent | Merges all chapters, unifies format | |
| 5. Quality Verification | Main Agent | Checks depth standards, supplements weak sections | Final document |
Chapter Task Definition (Instruction Template for Sub-Agents)
```markdown
# Sub-Agent Task: [Chapter Name]

## Context Information
- **Project/Code Name:** [Project/Code Name]
- **Programming Language:** [Language]
- **Code Scale:** [Line count]
- **Core Concepts:** [Concept list from Main Agent]

## Your Task
You are a specialized analysis expert responsible for the "**[Chapter Name]**" section. Please conduct an in-depth analysis of this section and generate detailed content.

## Output Requirements
1. **Content Depth:** This chapter must be at least [X] words
2. **WHY Analysis:** Each key point must answer 3 WHY questions
3. **Code Comments:** Use the scenario/step + WHY style
4. **Citation Sources:** Provide authoritative reference links
5. **Independence:** Generate complete, independent chapter content; no need to reference other chapters

## Output Format
Directly output Markdown-formatted chapter content, starting with `## [Chapter Name]`.

## Depth Standards
- [ ] All sub-items are covered (no "brief" or "same as above")
- [ ] Each WHY has at least 2-3 sentences of explanation
- [ ] Code examples have complete comments
- [ ] Execution flow has specific data tracking

Start analysis:
```
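As a sketch, the task-template rendering the Main Agent performs could look like this (the `TASK_TEMPLATE` string and `generate_task_template` helper are illustrative assumptions based on the template above, not a prescribed API):

```python
TASK_TEMPLATE = """# Sub-Agent Task: {chapter}
## Context Information
- **Project/Code Name:** {project}
- **Programming Language:** {language}
- **Code Scale:** {lines} lines
- **Core Concepts:** {concepts}
## Your Task
You are a specialized analysis expert responsible for the "{chapter}" section.
"""

def generate_task_template(chapter: str, framework: dict) -> str:
    # Fill in the shared context so every sub-Agent sees the same concept list
    return TASK_TEMPLATE.format(
        chapter=chapter,
        project=framework["project_name"],
        language=framework["language"],
        lines=framework["total_lines"],
        concepts=", ".join(framework["core_concepts"]),
    )

framework = {"project_name": "auth-service", "language": "Python",
             "total_lines": 3200, "core_concepts": ["JWT", "bcrypt"]}
print(generate_task_template("Core Concepts", framework))
```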
Main Agent Aggregation Logic
# Parallel Deep Mode Aggregation Specification
## Aggregation Steps
1. **Read All Sub-Chapters**
   ```
   chapter_1_background_and_motivation.md
   chapter_2_core_concepts.md
   chapter_3_algorithms_and_theory.md
   chapter_4_design_patterns.md
   chapter_5_code_analysis.md
   chapter_6_test_case_analysis.md   (if applicable)
   chapter_7_application_transfer.md
   chapter_8_dependency_relationships.md
   chapter_9_quality_verification.md
   ```
2. **Merge Order**
   ```markdown
   # [Project/Code Name] Complete Mastery Analysis (Parallel Deep Version)
   ## Understanding Verification Status
   [Generated from Main Agent's preliminary analysis]
   [Insert chapter content in order]
   ```
3. **Cross-Check**
   - Core concepts are consistently defined across chapters
   - WHY explanations have no contradictions
   - Cited code examples are consistent
4. **Depth Verification**
   - Each chapter meets word count requirements
   - WHY analysis is sufficient
   - Execution examples are complete
#### Implementation Pseudocode
```
Function: ParallelDeepMode(code, work_directory):
    // ========== Phase 1: Framework Preparation ==========
    framework = {
        "project_name": extract_name(code),
        "language": identify_language(code),
        "total_lines": count_lines(code),
        "core_concepts": extract_core_concepts(code),  // Shared with all sub-Agents
        "chapters": [
            "Background and Motivation",
            "Core Concepts",
            "Algorithms and Theory",
            "Design Patterns",
            "Key Code Analysis",
            "Test Case Analysis",
            "Application Transfer Scenarios",
            "Dependency Relationships",
            "Quality Verification"
        ]
    }
    write_file(f"{work_directory}/00-framework.json", framework)

    // ========== Phase 2: Create Sub-Tasks ==========
    subtask_list = []
    for each chapter in framework["chapters"]:
        task_description = generate_task_template(chapter, framework)
        task_file = f"{work_directory}/tasks/{chapter}-task.md"
        write_file(task_file, task_description)
        subtask_list.append(task_file)

    // ========== Phase 3: Execute Sub-Agents in Parallel ==========
    // Note: Actual execution uses the Task tool to create parallel sub-Agents
    chapter_file_list = []
    for each (chapter, task_file) in zip(framework["chapters"], subtask_list):
        // Create a sub-Agent (executed in parallel)
        sub_agent = create_agent(
            name: f"Analyst-{chapter}",
            task: read_file(task_file),
            code: code,
            output_file: f"{work_directory}/chapters/{chapter}.md"
        )
        // Start parallel execution
        sub_agent.start(parallel=True)
        chapter_file_list.append(sub_agent.output_file)

    // Wait for all sub-Agents to complete
    wait_for_all(chapter_file_list)

    // ========== Phase 4: Result Aggregation ==========
    complete_document = f"# {framework['project_name']} Complete Mastery Analysis\n\n"
    complete_document += "## Understanding Verification Status\n\n"
    complete_document += generate_verification_table(framework) + "\n\n"
    for each chapter_file in chapter_file_list:
        chapter_content = read_file(chapter_file)
        complete_document += chapter_content + "\n\n"

    // ========== Phase 5: Quality Verification ==========
    if not pass_depth_check(complete_document):
        weak_chapters = identify_weak_sections(complete_document)
        for each chapter in weak_chapters:
            // Re-execute the sub-Agent for this chapter, requiring deeper content
            re_execute(chapter)
            complete_document = update_chapter(complete_document, chapter)

    // ========== Final Output ==========
    final_file = f"{work_directory}/{framework['project_name']}-complete-mastery-analysis.md"
    write_file(final_file, complete_document)
    return final_file
```
---
## Analysis Process (Research-Driven)
**Depth Standards for Each Chapter:**
```markdown
## Depth Self-Check Checklist (Check after completing each chapter)
### Content Completeness
- [ ] All sub-items of the chapter are covered (no "brief" or "same as above")
- [ ] Each WHY has specific explanations (not just one sentence)
- [ ] Code examples have complete comments (scenario/step + WHY)
### Analysis Depth
- [ ] Each core concept has complete answers to 3 WHY questions
- [ ] Algorithms have complexity analysis + selection reasons
- [ ] Design patterns have WHY to use + consequences of not using
- [ ] Execution flow has specific data tracking
### Practicality
- [ ] Error-prone points are annotated
- [ ] Boundary conditions are explained
- [ ] At least 2 application transfer scenarios
```
Implementation Method (Pseudocode Flow):
```
Function: DeepModeProgressiveGeneration(code, file_path):
    // Phase 1: Generate Framework
    framework = generate_complete_outline("Standard structure + Deep extensions")
    write_file(file_path, framework)

    // Phase 2: Fill Chapters One by One
    chapter_list = [
        "1. Quick Overview",
        "2. Background and Motivation",
        "3. Core Concepts",
        "4. Algorithms and Theory",
        "5. Design Patterns",
        "6. In-Depth Key Code Analysis",
        "7. Test Case Analysis (if applicable)",
        "8. Application Transfer Scenarios",
        "9. Dependency Relationships",
        "10. Quality Verification"
    ]
    for each chapter in chapter_list:
        current_content = read_file(file_path)
        // Generate chapter content (focus on one task at a time to ensure depth)
        chapter_content = generate_deep_chapter(chapter, code)
        // Requirement: each chapter is at least 300-500 words; code snippets have complete comments

        // Depth Self-Check
        if not pass_depth_check(chapter_content):
            chapter_content = append_details(chapter_content)

        // Update File
        new_content = current_content.replace(chapter_placeholder, chapter_content)
        write_file(file_path, new_content)

    // Phase 3: Overall Verification
    complete_document = read_file(file_path)
    if not pass_overall_check(complete_document):
        weak_chapters = identify_weak_sections(complete_document)
        for chapter in weak_chapters:
            supplement_content(chapter)

    return file_path
```
### Step 1: Quick Overview
Goal: Establish an overall mental model
Must Identify:
- Programming Language and version
- File/project scale
- Core Dependencies
- Code type (algorithm, business logic, framework code, etc.)
### Step 2: Elaborative Interrogation - Background and Motivation
Core Questions (Must Answer):
1. **WHY: Why is this code needed?**
   - What practical problem does it solve?
   - What would happen if this code didn't exist?
2. **WHY: Why was this technical solution chosen?**
   - What alternative solutions are there?
   - Why weren't other solutions chosen?
   - What are the trade-offs of this solution?
3. **WHY: Why is it needed in this timing/scenario?**
   - In what business process is it used?
   - What are the preconditions and postconditions?
Output Format:
```markdown
## Background and Motivation Analysis

### Problem Essence
**Problem to Solve:** [Describe in one sentence]
**WHY It Needs to Be Solved:** [Consequences of not solving it]

### Solution Selection
**Selected Solution:** [Current implementation method]
**WHY This Solution Was Chosen:**
- Advantages: [List 2-3 key advantages]
- Disadvantages: [List 1-2 known limitations]
- Trade-offs: [Explain what trade-offs were made]

**Alternative Solution Comparison:**
- Solution A: [Brief description] - WHY not chosen: [Reason]
- Solution B: [Brief description] - WHY not chosen: [Reason]

### Application Scenarios
**Applicable Scenarios:** [Specific scenario description]
**WHY Applicable:** [Explain why this scenario is suitable]
**Inapplicable Scenarios:** [List boundary conditions]
**WHY Inapplicable:** [Explain why certain scenarios are not suitable]
```
### Step 3: Concept Network Construction
Goal: Establish connections between concepts, not isolated memory
Must Include:
1. **Core Concept Extraction**
   - Identify all key concepts (classes, functions, algorithms, data structures)
   - Each concept must answer 3 WHY questions
2. **Concept Relationship Mapping**
   - Dependency relationship: A depends on B - WHY?
   - Comparison relationship: A vs B - WHY choose A?
   - Combination relationship: A + B → C - WHY combine this way?
3. **Knowledge Connection**
   - Connect to known concepts
   - Connect to design patterns
   - Connect to theoretical foundations
Output Format:
```markdown
## Concept Network Diagram

### Core Concept List
**Concept 1: User Authentication**
- **What it is:** The process of verifying user identity
- **WHY needed:** Protect system resources from unauthorized access
- **WHY implemented this way:** Use JWT for stateless authentication to reduce server pressure
- **WHY not use other methods:** Session-based methods require server-side storage, which hinders horizontal scaling

**Concept 2: Password Hashing**
- **What it is:** Convert plaintext passwords into irreversible hash values
- **WHY needed:** Even if the database is compromised, attackers cannot obtain the original passwords
- **WHY use bcrypt:** Built-in salt and adjustable computational cost resist brute-force attacks
- **WHY not use MD5/SHA1:** Too fast to compute, vulnerable to brute-force attacks

### Concept Relationship Matrix
| Relationship Type | Concept A | Concept B | WHY Explanation |
|---------|--------|--------|-------------|
| Dependency | User Authentication | Password Hashing | Authentication requires password verification, which must be hashed first for comparison |
| Sequence | Password Hashing | Token Generation | An Access Token is only generated after password verification passes |
| Comparison | JWT | Session | JWT is stateless, suitable for distributed systems; Session is stateful and increases server pressure |

### Connection to Existing Knowledge
- **Connection to Design Patterns:** [Detailed below]
- **Connection to Algorithm Theory:** [Detailed below]
- **Connection to Security Principles:** Least-privilege principle, defense-in-depth principle
```
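The relationship matrix above can also be kept as structured data, which makes Deep Mode's cross-chapter consistency checks mechanical; the tuple schema below is an illustrative assumption, not a prescribed format:

```python
# Each edge: (relation_type, concept_a, concept_b, why)
concept_edges = [
    ("dependency", "User Authentication", "Password Hashing",
     "Authentication compares hashes, so hashing must exist first"),
    ("sequence", "Password Hashing", "Token Generation",
     "A token is only generated after password verification passes"),
    ("comparison", "JWT", "Session",
     "JWT is stateless and scales horizontally; Session is server-side state"),
]

def concepts_in_network(edges) -> set:
    """Collect every concept mentioned, for consistency checks across chapters."""
    return {c for (_, a, b, _) in edges for c in (a, b)}

print(sorted(concepts_in_network(concept_edges)))
```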
### Step 4: In-Depth Algorithm and Theory Analysis
Mandatory Requirements: All algorithms and core theories must:
- Mark time/space complexity
- Explain "WHY this complexity is acceptable"
- Provide authoritative reference materials
- Explain scenarios where performance degrades
Output Format:
```markdown
## Algorithm and Theory Analysis

### Algorithm: Quick Sort
**Basic Information:**
- **Time Complexity:** Average O(n log n), Worst O(n²)
- **Space Complexity:** O(log n)

**Elaborative Interrogation:**

**WHY Choose Quick Sort?**
- Excellent average performance, usually the fastest in practical applications
- In-place sorting, high space efficiency
- Cache-friendly, good access locality

**WHY Is Worst-Case O(n²) Acceptable?**
- The worst case has very low probability (can be avoided through randomization)
- Actual data is usually not fully sorted/reverse-sorted
- Can be optimized with the Median-of-Three method

**WHY Not Choose Other Sorting Algorithms?**
- Merge Sort: Requires O(n) additional space, not suitable for memory-constrained scenarios
- Heap Sort: Although a stable O(n log n), poor cache performance makes it slower than Quick Sort in practice
- Insertion Sort: Excellent for small datasets, but O(n²) is unsuitable for large-scale data

**When Does Performance Degrade?**
- Input is already sorted or reverse-sorted (solved with randomization)
- Poor pivot selection (solved with Median-of-Three)
- Large numbers of duplicate elements (optimized with three-way Quick Sort)

**Reference Materials:**
- [Quick Sort - Wikipedia](https://en.wikipedia.org/wiki/Quicksort)
- [Quick Sort Analysis - Princeton](https://algs4.cs.princeton.edu/23quicksort/)
- [Why is QuickSort better than MergeSort?](https://stackoverflow.com/questions/70402/why-is-quicksort-better-than-other-sorting-algorithms-in-practice)

### Theoretical Foundation: JWT (JSON Web Token)
**WHY Use JWT?**
- Stateless authentication, no need for the server to store Sessions
- Self-contained, the Token carries all necessary information
- Cross-domain friendly, suitable for microservice architectures

**WHY Is JWT Secure?**
- Uses a signature to verify integrity
- Cannot be forged (unless the private key is leaked)
- Can set an expiration time (exp)

**WHY Does JWT Have Limitations?**
- Cannot be invalidated proactively (unless a blacklist is maintained, which undermines the stateless advantage)
- Token size is relatively large (Base64 encoding increases size by about 33%)
- Sensitive information needs encryption; the signature alone does not provide confidentiality

**Reference Materials:**
- [JWT.io - Introduction](https://jwt.io/introduction)
- [RFC 7519 - JWT Specification](https://tools.ietf.org/html/rfc7519)
```
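The mitigations named in the Quick Sort discussion above (randomized pivot against sorted input, three-way partitioning for duplicates) can be sketched as follows; note this out-of-place version trades the O(log n) space advantage for clarity:

```python
import random

def quicksort(a):
    """Randomized three-way quicksort: expected O(n log n) even on sorted input."""
    if len(a) <= 1:
        return a
    # A random pivot defeats the sorted/reverse-sorted worst case
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]   # three-way split handles many duplicates
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```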
### Step 5: Design Pattern Identification and Interrogation
Mandatory Check: Each design pattern used in the code must:
- Clearly mark the pattern name
- Explain WHY this pattern is used
- Explain what would happen if this pattern was not used
- Provide standard references
Output Format:
## Design Pattern Analysis
### Pattern 1: Singleton Pattern
**Application Location:** `DatabaseConnection` class
**WHY Use Singleton?**
- Database connections have high overhead, reusing a single instance saves resources
- Avoids connection pool chaos, unified connection lifecycle management
- Global unique access point, easy to control concurrency
**WHY Not Use Singleton?**
- Creating new connections for each operation leads to resource exhaustion
- Multiple connection instances may cause transaction inconsistencies
- Difficult to control concurrent access
**Implementation Details:**
```python
class DatabaseConnection:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            # WHY create in __new__:
            # Ensures the singleton is enforced before object creation
        return cls._instance
```
**WHY Implement This Way?**
- Use `__new__` instead of `__init__`: Controls instance creation, not initialization
- Class variable `_instance`: Stores the unique instance
- Lazy Loading: The instance is only created on first use

**Potential Issues:**
- ⚠️ Not thread-safe (needs locking in multi-threaded environments)
- ⚠️ Difficult unit testing (global state is hard to isolate)
- ⚠️ Violates the single responsibility principle (the class manages its own instance)
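A minimal sketch of the double-checked-locking fix for the thread-safety issue flagged above (the class body is abbreviated; real code would also set up the connection here):

```python
import threading

class DatabaseConnection:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # First check without the lock avoids contention on the hot path
        if cls._instance is None:
            with cls._lock:
                # Second check: another thread may have created the instance
                # while we were waiting for the lock
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

a, b = DatabaseConnection(), DatabaseConnection()
print(a is b)  # → True
```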
**Better Alternative Solutions:**
- Dependency Injection: More flexible, easier to test
- Module-level variables: Python modules are naturally singletons

**Reference Materials:**
---
### Step 6: In-Depth Line-by-Line Analysis (Key Code Snippets)
**Core Principles:**
- Select 3-5 most critical code snippets
- Each line of code must explain "what it does" + "WHY it's done this way"
- Provide execution flow examples with specific data
- Annotate error-prone points and boundary conditions
**Output Format:**
## In-Depth Key Code Analysis

### Code Snippet 1: User Authentication Function
**Overall Role:** Verify username and password, return a JWT Token or None
**WHY This Function Is Needed:** Authentication is the first line of defense for system security; it must be reliable and efficient

**Original Code:**
```python
def authenticate_user(username, password):
    user = db.find_user(username)
    if not user:
        return None
    if verify_password(password, user.password_hash):
        return generate_token(user.id)
    return None
```
**In-Depth Line-by-Line Analysis (Recommended Comment Style): Scenario-Based + Execution Flow Tracking**

Comment Style Explanation:
- `# Scenario N: [Description]` / `// Scenario N: [Description]` — mark the different execution paths of conditional branches (if/else, switch, match, etc.)
- `# Step N:` / `// Step N:` — mark serial execution flows (initialization order, function call sequence, etc.)
- Comment symbols match the language: use `#` for Python, `//` for C++/Java
- Track execution flow with specific variable values (`# Current state:` / `// Current state:`)
- Note the iteration status of loops/recursion
- Mark the change trajectories of key data
```python
def authenticate_user(username, password):
    # Step 1: Query the user
    user = db.find_user(username)
    # WHY query the user first: Avoids password hashing for non-existent usernames (saves computation)

    # Scenario 1: If the user does not exist, immediately return None
    if not user:
        return None
        # WHY return None instead of throwing an exception: Authentication failure is a normal business flow, not an exception
        # WHY not distinguish "user does not exist" from "wrong password": Prevents username enumeration attacks

    # Scenario 2: If password verification passes, generate and return a Token
    if verify_password(password, user.password_hash):
        # verify_password internal flow:
        #   1. Extract the salt from password_hash
        #   2. Hash the plaintext password with the same salt
        #   3. Constant-time comparison of the two hash values (prevents timing attacks)
        return generate_token(user.id)
        # Current state: user.id = 42 (example)
        # generate_token(42) → "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

    # Scenario 3: Wrong password, return None
    return None
    # WHY the same return value as "user does not exist": Prevents attackers from distinguishing the two failure cases
```
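The constant-time comparison step mentioned in the comments can be sketched with the standard library; PBKDF2 stands in for bcrypt here so the example is self-contained:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a slow hash; PBKDF2 stands in for bcrypt in this sketch."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    # hmac.compare_digest runs in constant time, preventing timing attacks
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("Secret123!")
print(verify_password("Secret123!", salt, stored))  # → True
print(verify_password("WrongPass", salt, stored))   # → False
```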
Complete Execution Flow Example (Multi-Scenario Tracking):
```cpp
// Example: Trace the function that produces a tensor (typical compiler code style)
Value getProducerOfTensor(Value tensor) {
  Value opResult;
  while (true) {
    // Scenario 1: If tensor is defined by a LinalgOp, return directly
    if (auto linalgOp = tensor.getDefiningOp<LinalgOp>()) {
      opResult = cast<OpResult>(tensor);
      // The while loop runs only once
      return opResult;
    }
    // According to this section's example, first call to this function: tensor = %2_tile

    // Scenario 2: If tensor is linked via an ExtractSliceOp, continue tracing the source
    if (auto sliceOp = tensor.getDefiningOp<tensor::ExtractSliceOp>()) {
      tensor = sliceOp.getSource();
      // Current state: tensor = %2, defined by linalg.matmul
      // The second while iteration will enter the Scenario 1 branch (linalg.matmul is a LinalgOp)
      continue;
    }

    // Scenario 3: Via an scf.for iteration parameter
    // Example IR:
    //   %1 = linalg.generic ins(%A) outs(%init) { ... }
    //   %2 = scf.for %i = 0 to 10 iter_args(%arg = %1) {
    //     %3 = linalg.generic ins(%arg) outs(%init2) { ... }
    //     scf.yield %3
    //   }
    //   getProducerOfTensor(%arg)
    if (auto blockArg = dyn_cast<BlockArgument>(tensor)) {
      // First while iteration: tensor = %arg, which is a BlockArgument
      if (auto forOp = dyn_cast<scf::ForOp>(blockArg.getOwner()->getParentOp())) {
        // %arg is an iter_arg of scf.for; fetch the loop's initial value: %1
        // Region args of scf.for are [induction var, iter_args...], so the
        // matching init operand index is getArgNumber() - 1
        tensor = forOp.getInitArgs()[blockArg.getArgNumber() - 1];
        // Current state: tensor = %1, defined by linalg.generic
        // The second while iteration will enter the Scenario 1 branch
        continue;
      }
    }
    return Value();  // Not found (may be a function parameter)
  }
}
```
Recommended Execution Flow Example Style:

**Scenario 1: Authentication Success**
```
# Initial State
Input: username="alice", password="Secret123!"

# Execution Path
Step 1: db.find_user("alice")
  → Query database
  → Return User(id=42, username="alice", password_hash="$2b$12$KIX...")
  # Current state: user exists, skip the Scenario 1 return None

Step 2: Enter the Scenario 2 branch (password verification)
  → verify_password("Secret123!", "$2b$12$KIX...")
    → Extract salt: $2b$12$KIX...
    → Hash "Secret123!" with the salt
    → Constant-time comparison of hash values
    → Return True

Step 3: generate_token(42)
  → Create payload: {"user_id": 42, "exp": 1643723400}
  → Sign with the private key
  → Return "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX2lkIjo0Miwi..."

# Final return: Token string

# Performance Analysis
Time consumed: ~100ms (mainly bcrypt computation)
```
**Scenario 2: User Does Not Exist**
```
# Initial State
Input: username="bob", password="anything"

# Execution Path
Step 1: db.find_user("bob")
  → Query database
  → Return None
  # Current state: user = None, enter the Scenario 1 branch

Step 2: if not user:  # true
  → Directly return None
  # Scenarios 2 and 3 are not executed

# Performance Analysis
Time consumed: ~5ms (only the database query)
⚠️ Note: Much faster than a successful authentication, which may leak whether the user exists
# Security Recommendation: Add a fixed delay or a dummy hash computation so both cases have similar response times
```
**Scenario 3: Wrong Password**
```
# Initial State
Input: username="alice", password="WrongPass"

# Execution Path
Step 1: db.find_user("alice")
  → Return User(id=42, ...)
  # Current state: user exists, skip the Scenario 1 return None

Step 2: Enter the Scenario 2 branch (password verification)
  → verify_password("WrongPass", "$2b$12$KIX...")
    → Hash "WrongPass"
    → Compare hash values
    → Return False

Step 3: Password verification fails, generate_token is not executed
  → Fall through to the final return None
  # Scenario 3: Password verification fails, return None

# Performance Analysis
Time consumed: ~100ms (similar to a successful authentication)
✅ Advantage: Cannot determine whether the password is correct via response time
```
Key Takeaways Summary:
1. **Security Considerations:**
   - ✅ The plaintext password exists only briefly in memory and is immediately hashed for verification
   - ✅ Failure reasons are not disclosed (prevents username enumeration)
   - ✅ Constant-time comparison (prevents timing attacks)
   - ⚠️ Potential issue: Faster response when the user does not exist (needs optimization)
2. **Performance Optimization:**
   - ✅ Quick return when the user does not exist, no wasted hash computation
   - ⚠️ But this causes timing leakage; security and performance must be balanced
3. **Error Handling:**
   - ✅ Uses None to indicate failure, clear and consistent with Python conventions
   - ⚠️ The caller must check the return value, otherwise None may be misused
4. **Improvement Areas:**
   - Add logging for failed attempts (detect brute-force attacks)
   - Add Rate Limiting
   - Unify response times for failure scenarios
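Combining the improvement areas above (dummy hashing for timing uniformity, failed-attempt logging), a hedged sketch — the user store, hash helper, and token generator below are illustrative stand-ins for the ones assumed in the example:

```python
import hashlib
import hmac
import logging
import os

# --- illustrative stand-ins for the helpers assumed in the example ---
def _hash(pw, salt):
    return hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, 100_000)

DUMMY_SALT = os.urandom(16)
DUMMY_HASH = _hash("dummy", DUMMY_SALT)

class User:
    def __init__(self, uid, salt, password_hash):
        self.id, self.salt, self.password_hash = uid, salt, password_hash

_salt = os.urandom(16)
USERS = {"alice": User(42, _salt, _hash("Secret123!", _salt))}

def generate_token(uid):
    return f"token-for-{uid}"  # placeholder for real JWT generation

def authenticate_user(username, password):
    user = USERS.get(username)
    if not user:
        # Perform a dummy hash so "unknown user" takes about as long as
        # "wrong password", closing the timing side channel noted above
        hmac.compare_digest(_hash(password, DUMMY_SALT), DUMMY_HASH)
        logging.warning("auth failure: unknown user")
        return None
    if hmac.compare_digest(_hash(password, user.salt), user.password_hash):
        return generate_token(user.id)
    logging.warning("auth failure: bad password for user id=%s", user.id)
    return None

print(authenticate_user("alice", "Secret123!"))  # → token-for-42
print(authenticate_user("bob", "anything"))      # → None
```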
---
### Step 6.5: Reverse Understanding via Test Cases (If Tests Exist)
**Goal:** Reverse verify and deepen understanding of code functionality through test cases
**Why It's Important:**
- Test cases reflect the **expected behavior** of the code, making them the most accurate "user manual"
- Tests usually cover **boundary conditions** and **exception scenarios**, which are easily overlooked in the main code
- Tests can **verify if understanding is correct**, avoiding false assumptions
**Must execute this step when code contains test files.**
#### 6.5.1 Test File Identification
**Common Test File Patterns:**
| Language | Test File Patterns | Test Directory Structure |
|------|-------------|-------------|
| **Python** | `test_*.py`, `*_test.py` | `tests/`, `test/` |
| **JavaScript/TypeScript** | `*.test.ts`, `*.test.js` | `__tests__/`, `tests/` |
| **Go** | `*_test.go` | Same directory as source code, `*_test.go` |
| **Java** | `*Test.java`, `*Tests.java` | `src/test/java/` |
| **C++** | `*.cpp` (contains tests), gtest | `test/`, `tests/`, `unittest/` |
| **Rust** | `*_test.rs`, `tests/*.rs` | `tests/` |
| **MLIR/LLVM** | `*.mlir` (test files) | `test/Dialect/*/` |
**Large Project Test Directory Structure Example:**
```bash
# MLIR Style (independent test directory)
mlir/test/Dialect/Linalg/
├── ops.mlir # Linalg dialect operation tests
├── transformation.mlir # Transformation tests
├── interfaces.mlir # Interface tests
└── invalid.mlir # Error handling tests
# Traditional C++ Project Style
project/test/
├── unittest/ # Unit tests
├── integration/ # Integration tests
└── benchmark/         # Performance tests
```
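The file-pattern table above can be applied mechanically; a sketch using `fnmatch`, with the pattern set abridged from the table:

```python
from fnmatch import fnmatch

# Abridged from the table above: language → test-file glob patterns
TEST_PATTERNS = {
    "python": ["test_*.py", "*_test.py"],
    "typescript": ["*.test.ts", "*.test.js"],
    "go": ["*_test.go"],
    "java": ["*Test.java", "*Tests.java"],
    "rust": ["*_test.rs"],
}

def is_test_file(filename: str, language: str) -> bool:
    """Return True if the filename matches any test pattern for the language."""
    return any(fnmatch(filename, p) for p in TEST_PATTERNS.get(language, []))

print(is_test_file("test_auth.py", "python"))  # → True
print(is_test_file("auth.py", "python"))       # → False
print(is_test_file("map_test.go", "go"))       # → True
```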
#### 6.5.2 Test Coverage Analysis
Analyze Functionality Covered by Tests:
```markdown
## Test Case Coverage Analysis

### Test File List
| Test File | Test Target | Test Case Count |
|--------------|-----------|-------------|
| `test/Dialect/Linalg/ops.mlir` | Linalg Ops | 156 |
| `test/Dialect/Linalg/invalid.mlir` | Error Handling | 43 |
| `unittest/test_auth.cpp` | `authenticate_user()` | 12 |

### Function Coverage Matrix
| Function Point | Test Location | Coverage Status | Notes |
|---------|-----------|---------|-----------|
| linalg.matmul operation | `Dialect/Linalg/Ops/*` | ✅ Has tests | Covers normal + boundary cases |
| linalg.generic interface | `Interfaces/*` | ✅ Has tests | Fully covered |
| Tile transformation | `Transforms/Tiling.cpp` | ⚠️ Insufficient tests | Missing nested scenarios |
```
#### 6.5.3 Understanding Boundary Conditions Through Tests
Extract Key Boundary Conditions from Tests:
## Boundary Conditions Discovered from Tests
### MLIR Example: Understanding linalg.generic Region Constraints
#### Test File: test/Dialect/Linalg/invalid.mlir
```mlir
// Test: a generic region must have exactly one block
func.func @invalid_generic_empty_region(%arg0: tensor<10xf32>) -> tensor<10xf32> {
  %0 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>],
                       iterator_types = ["parallel"]}
      outs(%arg0 : tensor<10xf32>) {
    // Empty region - should report an error
  } -> tensor<10xf32>
  return %0 : tensor<10xf32>
}
```
WHY This Test Is Important:
- Reveals structural constraints of : Must have a block
- Clearly defines error conditions through negative testing (invalid test)
- Boundary condition: Number of region blocks must = 1
Test File: test/Dialect/Linalg/ops.mlir
mlir
// Test: Number of inputs and outputs must match indexing_maps
func.func @generic_mismatched_maps(%a: tensor<10xf32>, %b: tensor<10xf32>) -> tensor<10xf32> {
%0 = linalg.generic {
indexing_maps = [
affine_map<(d0) -> (d0)>, // Map for 1 input
affine_map<(d0) -> (d0)> // Map for 1 output
],
iterator_types = ["parallel"]
} ins(%a, %b : tensor<10xf32>, tensor<10xf32>) // But there are 2 inputs
outs(%0 : tensor<10xf32>) {
^bb0(%in: f32, %in_2: f32, %out: f32):
linalg.yield %in : f32
} -> tensor<10xf32>
return %0 : tensor<10xf32>
}
WHY This Is Handled This Way:
- Verifies type system constraints: Number of inputs/outputs must match maps
- Tests static verification logic, catches errors at compile time
- Illustrates MLIR's static strong typing feature
C++ Example: Understanding Concurrent Security Through Tests
Test File: unittest/concurrent_map_test.cpp
cpp
// Test: Concurrent insertion of the same key
TEST(ConcurrentMapTest, ConcurrentInsertSameKey) {
ConcurrentMap<int, int> map;
const int num_threads = 10;
const int key = 42;
std::vector<std::thread> threads;
for (int i = 0; i < num_threads; ++i) {
threads.emplace_back([&map, key, i]() {
map.Insert(key, i); // All threads insert the same key
});
}
for (auto& t : threads) t.join();
// Verify: Only one insertion succeeds
EXPECT_EQ(map.Size(), 1);
EXPECT_TRUE(map.Contains(key));
}
WHY This Test Exists:
- Verifies thread safety: Multi-threaded concurrent access does not cause crashes
- Illustrates conflict handling strategy: Later insertions overwrite earlier ones (or vice versa)
- Tests consistency guarantees: Final state meets expectations
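The same invariant the gtest case checks (N concurrent inserts of one key leave exactly one entry) can be expressed in any language. A Python sketch using a lock-guarded dict; `ConcurrentMap` here is illustrative, not a real library:

```python
# Sketch: a Python analogue of the concurrent-insert test above,
# with a first-insertion-wins policy.
import threading

class ConcurrentMap:
    """Illustrative lock-guarded map; not a production data structure."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def insert(self, key, value):
        # WHY hold the lock: makes the first-writer-wins policy explicit
        # even if the interpreter's own guarantees change.
        with self._lock:
            self._data.setdefault(key, value)  # first insertion wins

    def size(self):
        return len(self._data)

    def contains(self, key):
        return key in self._data

def run_concurrent_insert_test():
    m = ConcurrentMap()
    threads = [threading.Thread(target=m.insert, args=(42, i)) for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return m  # caller asserts: size() == 1, contains(42)
```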
#### 6.5.4 Test-Driven Understanding Example
**Complete Example: Understanding `linalg.tile` Transformation Through MLIR Tests**
```markdown
## Reverse Understanding via Test Cases: linalg.tile Transformation
### Question: Can we fully understand tile behavior just by reading documentation?
**Documentation Description (Simplified):**
> `linalg.tile` decomposes linalg operations into smaller fragments
**Potentially Missing Details:**
1. How is tile size determined?
2. Which operations support tiling?
3. What is the loop order after tiling?
4. How to handle remaining elements?
### Answers Discovered from Tests
#### Test 1: test/tile-mlir.mlir - Basic Tile Behavior
```mlir
// Original operation
%0 = linalg.matmul ins(%A: tensor<128x128xf32>, %B: tensor<128x128xf32>)
outs(%C: tensor<128x128xf32>)
// Tile size 32x32
%1 = linalg.tile %0 tile_sizes[32, 32]
Discovery: Tile size is specified directly, output contains nested loop structure
Test 2: test/tile-mlir.mlir - Handling Remaining Elements
mlir
// 127x127 matrix, tile size 32x32
%0 = linalg.matmul ins(%A: tensor<127x127xf32>, ...)
%1 = linalg.tile %0 tile_sizes[32, 32]
Discovery: Automatically generates boundary checks to handle uneven remaining elements
Test 3: test/tile-mlir.mlir - Operations That Cannot Be Tiled
mlir
// Attempt to tile unsupported operation
%0 = linalg.generic ...
%1 = linalg.tile %0 tile_sizes[16]
// Expected: Compilation error or runtime failure
Discovery: Not all operations support tiling, there are clear constraints
Understanding Comparison Before and After Tests
| Question | After Reading Documentation Only | After Reading Tests |
|---|
| How to specify tile size? | ⚠️ Unclear | ✅ Directly as parameter |
| How to handle remaining elements? | ❓ Not mentioned in documentation | ✅ Automatic boundary checks |
| Which operations are supported? | ❓ Incomplete list | ✅ Tests cover all supported operations |
| What is the loop order? | ⚠️ Vague description | ✅ Can see order from test IR |
Conclusion: Test cases supplement approximately 50% of implementation details!
#### 6.5.5 Key Points for Parsing Test Files in Different Languages
**Notes for Testing in Each Language:**
```markdown
## Key Points for Parsing Test Files in Different Languages
### Python (pytest/unittest)
- Look for `test_*.py` or `*_test.py`
- Pay attention to `@pytest.mark.parametrize` parameterized tests
- Focus on `pytest.raises` exception tests
- Look for fixtures (`conftest.py`) to understand test context
### C++ (gtest)
- Look for `*_test.cpp` or `test/*.cpp`
- `TEST_F` indicates fixture tests with preconditions
- `EXPECT_*` vs `ASSERT_*`: Whether to continue after failure
- `TEST_P` indicates parameterized tests
### MLIR/LLVM
- Test files are usually `.mlir` or `.td`
- `RUN:` lines specify how to execute the test
- FileCheck directives mark expected output: `CHECK:`, `CHECK-NOT:`, `CHECK-DAG:`
- `// expected-error` annotations mark expected compilation errors (used with `-verify-diagnostics`)
### JavaScript/TypeScript (Jest)
- `*.test.ts`, `*.spec.ts`
- `describe/it` nested structure
- `expect(...).toThrow()` exception tests
- `beforeEach/afterEach` hook functions
### Go
- Tests are in the same directory as source code: `*_test.go`
- `TestXxx(t *testing.T)` basic tests
- `TableDrivenTests` table-driven tests
- `TestMain` test entry point
### Rust
- Inline unit tests: `#[cfg(test)]` modules within source files
- `tests/` directory integration tests
- `#[should_panic]` exception tests
- `#[ignore]` skipped tests
```
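For MLIR/LLVM-style tests, the `RUN:` and `CHECK*` lines can be harvested programmatically so the analysis can report how each test runs and what it expects. A sketch following the lit/FileCheck conventions listed above:

```python
# Sketch: extract lit/FileCheck directives from an MLIR/LLVM test file.
import re

def extract_test_directives(text):
    """Return (run_lines, check_lines) from a lit-style test file."""
    run_lines, check_lines = [], []
    for line in text.splitlines():
        stripped = line.strip()
        m = re.match(r"//\s*RUN:\s*(.*)", stripped)
        if m:
            # How the test is executed (lit substitutes %s etc.)
            run_lines.append(m.group(1))
            continue
        m = re.match(r"//\s*(CHECK(?:-\w+)?):\s*(.*)", stripped)
        if m:
            # What output FileCheck expects (CHECK, CHECK-NOT, CHECK-DAG, ...)
            check_lines.append((m.group(1), m.group(2)))
    return run_lines, check_lines
```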
#### 6.5.6 Test Quality Evaluation

**Evaluate Whether Tests Are Sufficient:**

```markdown
## Test Quality Evaluation
### Covered Function Points
- ✅ Normal flow
- ✅ Boundary inputs
- ✅ Exception inputs
- ⚠️ Concurrent scenarios
- ❌ Performance tests
### MLIR-Specific Evaluation
- ✅ Positive tests (valid.mlir)
- ✅ Negative tests (invalid.mlir)
- ⚠️ Performance regression tests
- ❌ Cross-dialect interaction tests
### Test Deficiency Warnings
> ⚠️ **Warning: This module has insufficient test coverage**
> - Uncovered scenarios: [List specifically]
> - Recommended supplements: [Specific suggestions]
```
#### 6.5.7 Test Case Analysis Output Template

```markdown
## Test Case Analysis
### Test File Structure
[List test files/directories and their corresponding source code modules]
### Key Test Case Interpretation
[Select the 3-5 most valuable test cases]
### Hidden Behavior Discovered from Tests
[List details easily overlooked when reading only the main code]
### Test Coverage Evaluation
- Core function coverage: X%
- Boundary condition coverage: [Sufficient/Insufficient]
### Test Quality Recommendations
[If tests are insufficient, propose improvements]
```
---
### Step 9: Application Transfer Test (Verify True Understanding)
**Goal:** Test whether concepts can be applied to different scenarios
**Must Include:**
- At least 2 application scenarios in different domains
- An explanation of how to adjust the code to fit each new scenario
- Markers for which principles remain unchanged and which need modification
**Output Format:**
````markdown
## Application Transfer Scenarios
### Scenario 1: Apply User Authentication to API Key Verification
**Original Scenario:** Web user login authentication
**New Scenario:** Third-party API key verification
**Invariant Principles:**
- Core process of verifying "who is calling"
- Hash-stored credentials (API keys should also be hashed)
- Access token generation mechanism
**Modified Parts:**
```python
# Original: Username + Password
def authenticate_user(username, password):
    user = db.find_user(username)
    if not user:
        return None
    if verify_password(password, user.password_hash):
        return generate_token(user.id)
    return None

# Transferred: API Key
def authenticate_api_key(api_key):
    # WHY only one parameter: the API key itself is both identity and credential
    app = db.find_app_by_key_prefix(api_key[:8])
    # WHY query by prefix: avoid a full table scan; the key prefix serves as an index
    if not app:
        return None
    if verify_api_key(api_key, app.key_hash):
        # WHY hash here too: prevent key leakage if the database is compromised
        # WHY add scope: API keys usually carry different permission levels
        return generate_token(app.id, scope=app.permissions)
    return None
```
**WHY Transfer This Way:**
- Retain core security principles (hash storage, constant-time comparison)
- Adjust business logic (single parameter, permission scope)
- Optimize query performance (prefix index)
**Learned General Pattern:**
- A similar structure works in any scenario that needs to verify "who is calling"
- Core: Find entity → Verify credential → Generate token
- Variations: credential form, query method, token content
### Scenario 2: Apply Quick Sort to Log Analysis
**Original Scenario:** Sort a user list by ID
**New Scenario:** Sort millions of log entries by timestamp
**Invariant Principles:**
- Divide-and-conquer idea: recursively decompose the problem
- Pivot selection: the key factor affecting performance
- In-place sorting: saves space
**Adjusted Parts:**
```python
# Original: Simple Quick Sort
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

# Transferred: Log Sorting (External Sort + Optimization)
def quicksort_logs(log_file, output_file, memory_limit):
    # WHY external sort: the data volume exceeds memory and cannot be loaded at once
    # 1. Split and sort chunks
    chunks = split_file_into_chunks(log_file, memory_limit)
    # WHY split into chunks: each chunk fits in memory and can be sorted individually
    for chunk in chunks:
        logs = load_chunk(chunk)
        # WHY use timsort instead of quicksort:
        # - Logs are usually partially ordered (appended by time)
        # - Timsort approaches O(n) on partially ordered data
        # - Python's built-in sort() is timsort
        logs.sort(key=lambda log: log.timestamp)
        save_sorted_chunk(chunk, logs)
    # 2. Merge sorted chunks
    merge_sorted_chunks(chunks, output_file)
    # WHY merge: combine multiple sorted sequences into one sorted sequence
    return output_file
```
**WHY Not Use Quick Sort Directly:**
- Data volume exceeds memory: external sorting is needed
- Logs are partially ordered: timsort performs better
- Stable sorting is required: preserve the order of logs with the same timestamp
**Learned General Pattern:**
- Algorithm selection depends on data characteristics (scale, existing order, stability requirements)
- Basic principles transfer (divide and conquer, comparison), but the implementation needs adjustment
- Ultra-large data requires external algorithms (split + merge)
````
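The scenario above calls an unspecified `merge_sorted_chunks`. One plausible implementation, sketched with the standard library's `heapq.merge`; the chunks-as-in-memory-lists setup is a simplification of real file-backed chunks:

```python
# Sketch: a possible merge step for the external sort above, using
# heapq.merge to k-way-merge already-sorted chunk sequences.
import heapq

def merge_sorted_chunks(chunk_lists, key=None):
    """Lazily merge several sorted sequences into one sorted list.

    WHY heapq.merge: it streams from each input, so memory stays
    proportional to the number of chunks, not the total data size.
    """
    return list(heapq.merge(*chunk_lists, key=key))
```

Because `heapq.merge` is stable across equal keys, log entries with the same timestamp keep their within-chunk order, matching the stability requirement stated above.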
---
### Step 10: Dependency Relationships and Usage Examples
(Similar to original version, but with added WHY explanations)
````markdown
## Dependency Relationship Analysis
### External Libraries
**bcrypt (v5.1.0)**
- **Purpose:** Password hashing
- **WHY Choose bcrypt:**
  - Built-in salt, no manual management needed
  - Adjustable computational cost (cost factor)
  - Resists GPU/ASIC-accelerated attacks
- **WHY Not SHA256:** Too fast to compute, vulnerable to brute-force attacks
- **WHY Not scrypt/argon2:** bcrypt is more mature and has better compatibility
**jsonwebtoken (v9.0.0)**
- **Purpose:** JWT token generation and verification
- **WHY Choose JWT:** Stateless authentication, suitable for distributed systems
- **WHY Not Sessions:** Sessions require server-side storage, which hinders scaling
### Internal Module Dependencies
**database.js → auth.js**
- **Dependency Reason:** Authentication requires querying user data
- **WHY This Design:** Separate data access from business logic (single responsibility principle)
**utils/crypto.js → auth.js**
- **Dependency Reason:** Authentication requires password hashing and verification
- **WHY Encapsulate into a Utility Module:** Encryption logic is complex; centralized management is more secure

## Complete Usage Example
(Includes detailed WHY comments)
### Example 1: Standard User Login Flow
```javascript
// 1. Import authentication module
const auth = require('./auth');

// 2. Receive user input (from login form)
const username = req.body.username; // Example: "alice"
const password = req.body.password; // Example: "Secret123!"
// WHY not hash the password on the client:
// - After client-side hashing, the hash value itself becomes the "password"
// - Attackers who obtain the hash can log in directly
// - Hash with salt on the server; the client always sends plaintext (over HTTPS)

// 3. Call authentication function
const token = await auth.authenticate_user(username, password);

// 4. Respond based on result
if (token) {
  // Authentication success
  res.json({
    success: true,
    token: token,
    // WHY return the token: the client must carry it in subsequent requests
    message: 'Login successful'
  });
  // WHY set an HTTP-only cookie (optional):
  // res.cookie('auth_token', token, {
  //   httpOnly: true, // WHY: prevent XSS attacks from reading it
  //   secure: true    // WHY: only transmit over HTTPS
  // });
} else {
  // Authentication failure (user does not exist or wrong password)
  // WHY not distinguish failure reasons: prevent username enumeration
  res.status(401).json({
    success: false,
    message: 'Incorrect username or password' // Deliberately vague error message
  });
  // WHY return 401 instead of 403:
  // 401 = Unauthenticated (credentials needed)
  // 403 = Authenticated but lacking permission
}
```
**Execution Result Analysis:**
- **Success path:** Client request → Server verification → Return token
  - Time: ~100ms
  - Token example: `"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."`
- **Failure path:** Client request → Server verification → Return 401 error
  - Time: ~100ms (similar to success, prevents timing attacks)
````
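Two of the hardening points in the example (constant-time credential comparison and a single vague failure message) can be sketched in Python with the standard library's `hmac.compare_digest`. SHA-256 appears here only to keep the sketch stdlib-runnable; per the dependency analysis above, real password storage should use bcrypt:

```python
# Sketch: timing-safe comparison + uniform failure message.
# verify_password/login_response are illustrative stand-ins, and
# sha256 is a placeholder for a real password hash like bcrypt.
import hashlib
import hmac

def verify_password(password, stored_hash):
    # WHY hmac.compare_digest: runs in time independent of where the
    # inputs first differ, blunting timing attacks on the comparison.
    candidate = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)

def login_response(user, password_ok):
    # WHY one message for both failure causes: prevents username enumeration.
    if user is None or not password_ok:
        return 401, "Incorrect username or password"
    return 200, "Login successful"
```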
---
### Step 11: Self-Assessment Checklist
**After completing the analysis, you must verify the following items:**
```markdown
## Quality Verification Checklist
### Understanding Depth Verification
- [ ] **Each core concept answers 3 WHY questions**
- WHY this concept is needed
- WHY it's implemented this way
- WHY other methods are not used
- [ ] **Self-explanation test passed**
- [ ] Can explain each core concept without looking at code
- [ ] Can explain "why" instead of just "what"
- [ ] Can apply it in different scenarios (transfer test)
- [ ] **Concept connections established**
- [ ] Marked dependency/comparison/combination relationships between concepts
- [ ] Connected to existing knowledge (design patterns, algorithm theory)
- [ ] Explained reasons for each connection (WHY)
### Technical Accuracy Verification
- [ ] **Algorithm analysis complete**
- [ ] Time/space complexity marked
- [ ] WHY this algorithm was chosen
- [ ] WHY complexity is acceptable
- [ ] Provided authoritative reference materials
- [ ] **Design pattern identification**
- [ ] All patterns are marked
- [ ] WHY this pattern is used
- [ ] What would happen if not used
- [ ] Provided standard references
- [ ] **Code analysis detailed**
- [ ] Key code snippets have line-by-line analysis
- [ ] Each line includes "what it does" + "WHY it's done this way"
- [ ] Provided execution examples with specific data
- [ ] Annotated error-prone points and boundary conditions
### Practicality Verification
- [ ] **Application transfer test**
- [ ] At least 2 transfer examples in different scenarios
- [ ] Explained what remains unchanged and what needs to be changed
- [ ] Extracted general patterns
- [ ] **Usage examples are runnable**
- [ ] Example code is complete
- [ ] Includes detailed WHY comments
- [ ] Explained execution results
- [ ] **Issues and improvement suggestions**
- [ ] Pointed out potential issues
- [ ] WHY it's an issue
- [ ] Provided improvement solutions
- [ ] WHY the improvement solution is better
### Final Verification Questions
**If you don't look at the original code, based on this analysis document:**
1. ✅ Can you understand the code's design ideas?
2. ✅ Can you implement similar functions independently?
3. ✅ Can you apply it to different scenarios?
4. ✅ Can you explain it clearly to others?
If any answer is "No", the analysis is not deep enough and needs supplementation.
```

---
## Output Format Summary
**Complete Analysis Document Structure:**
```markdown
# [Code Name] Deep Understanding Analysis
## Understanding Verification Status
[Self-explanation test result table]
## 1. Quick Overview
- Programming language:
- Code scale:
- Core dependencies:
## 2. Background and Motivation Analysis (Elaborative Interrogation)
- Problem essence (WHY needed)
- Solution selection (WHY chosen + WHY other solutions not chosen)
- Application scenarios (WHY applicable + WHY not applicable)
## 3. Concept Network Diagram
- Core concept list (3 WHY questions per concept)
- Concept relationship matrix
- Connection to existing knowledge
## 4. In-Depth Algorithm and Theory Analysis
- Each algorithm: Complexity + WHY chosen + WHY acceptable + reference materials
- Each theory: WHY used + WHY effective + WHY limited
## 5. Design Pattern Analysis
- Each pattern: WHY used + WHY not used + implementation details + reference materials
## 6. In-Depth Key Code Analysis
- Each code snippet: Line-by-line analysis (what it does + WHY) + execution examples + key takeaways
## 7. Test Case Analysis (if applicable)
- Test file list and coverage analysis
- Boundary conditions discovered from tests
- Test-driven understanding verification
## 8. Application Transfer Scenarios (at least 2)
- Each scenario: Invariant principles + modified parts + WHY transferred this way
## 9. Dependency Relationships and Usage Examples
- Each dependency: WHY chosen + WHY other solutions not chosen
- Examples include detailed WHY comments
## 10. Quality Verification Checklist
[Check all verification items]
```
## Special Scenario Handling
### Multi-File Projects
1. **Overall Architecture Analysis**
   - Project structure tree + WHY organized this way
   - Entry file + WHY start here
   - Module division + WHY divided this way
2. **Inter-Module Relationships**
   - Dependency graph + WHY dependent this way
   - Data flow graph + WHY it flows this way
   - Call chain + WHY called this way
3. **Module-by-Module Analysis**
   - Analyze each core module according to the standard process
   - Emphasize WHY relationships between modules
### Complex Algorithms
1. **Layered Explanation**
   - First describe the idea in natural language
   - Then show the structure with pseudocode
   - Finally analyze the implementation line by line
2. **WHY Throughout**
   - WHY this algorithm was chosen
   - WHY each step is done this way
   - WHY the complexity is what it is
3. **Visualization Assistance**
   - Show the execution process with specific data
   - Explain WHY at each step
### Unfamiliar Technology Stacks
1. **Technology Background Explanation**
   - What this technology stack is
   - WHY this technology stack exists
   - WHY the project chose it
2. **Key Concept Explanation**
   - Concepts unique to this technology stack
   - WHY they are designed this way
   - Comparison with other technology stacks
3. **Learning Resources**
   - Official documentation links
   - WHY these resources are recommended
   - Learning path suggestions
### Final Pre-Analysis Check
Before starting analysis, confirm:
> **Remember:** The goal is not to "finish reading the code", but to "truly understand the code".
## 📤 Output Requirements (Token-Optimized Version)
**After completing the analysis, you must generate an independent Markdown document!**
### Document Generation Strategies for Three Modes
| Mode | Generation Method | Number of Files | Applicable Scenarios |
|---|---|---|---|
| Quick | Single write | 1 | Quick code review |
| Standard | Single write | 1 | Learning and understanding code |
| Deep | Strategy auto-selected by scale | 1-2 | In-depth mastery, large projects |
| → Code ≤ 2000 lines | Progressive write | 1-2 | Interview preparation, complete mastery |
| → Code > 2000 lines | Parallel processing + aggregation | Multiple temporary chapters → 1 final document | Large projects, complex codebases |
### ⚡ Token Saving Strategies
**Important Principle: Avoid duplicate output; write directly to files**
1. **Prohibit outputting the complete analysis in conversation**
   - The complete analysis is written directly to a file, not output to the conversation
   - Only output an analysis summary + file path in conversation
2. **Chunk processing for large projects**
   - Single-file analysis: generate a single document
   - Multi-file project: generate multiple documents by module
   - Ultra-long analysis: split into an overview document + `module-name-detailed-analysis.md`
3. **Progressive generation (for Deep Mode)**
   - First generate a framework document (table of contents + overview)
   - Fill content section by section, using Write to append updates each time
### Document Generation Rules
1. **File Naming Format**
   - Single file: `[code-name]-deep-analysis.md`
   - Multi-file project: `[project-name]-overview.md` + `[module-name]-analysis.md`
   - Examples: `jwt-authentication-deep-analysis.md`, `quicksort-deep-analysis.md`
2. **Generation Method (Token-Optimized Flow)**

   **Method 1: Direct Write (Recommended)**
   1. [Complete the analysis process without outputting the full content]
   2. Use the Write tool directly to generate the document:
      - File path: `[code-name]-deep-analysis.md`
      - Content: [Complete analysis content]
   3. Output a brief summary in conversation:
      - Mode: Standard/Deep
      - Key findings: 3-5 key points
      - File path: `[code-name]-deep-analysis.md`

   **Method 2: Chunk Generation for Multi-File Projects**
   1. [Complete the overall analysis]
   2. Generate the overview document:
      - Write: `[project-name]-overview.md` (overall architecture, module relationship diagram, analysis framework)
   3. Generate detailed documents by module:
      - Write: `[moduleA]-analysis.md`
      - Write: `[moduleB]-analysis.md`
      - Write: `[moduleC]-analysis.md`
   4. Output summary:
      - Generated 4 documents
      - List all file paths

   **Method 3: Deep Mode (strategy auto-selected by code scale)**
   Deep Mode automatically selects the optimal strategy based on code scale.
   - **[Strategy A: Progressive Generation]** when code ≤ 2000 lines
     - First generate a framework document (table of contents + overview)
     - Fill content section by section, using Write to append updates each time
     - Refer to the "Deep Mode Output Structure - Strategy A" section above
   - **[Strategy B: Parallel Processing]** when code > 2000 lines
     1. The main Agent generates the framework and task allocation
     2. Use the Task tool to create multiple parallel sub-Agents
     3. Each sub-Agent focuses on one chapter and generates an independent file
     4. The main Agent aggregates all chapters and generates the final document

   File structure:
   ```
   work/
   ├── 00-framework.json    # Framework generated by the main Agent
   ├── tasks/               # Sub-task description directory
   ├── chapters/            # Chapters generated by sub-Agents
   └── [project-name]-complete-mastery-analysis.md  # Final aggregated document
   ```
   Example Task call:
   ```
   Task(
     description: "In-depth analysis of [chapter-name] chapter",
     prompt: "You are a [chapter-name] analysis expert, please conduct in-depth analysis...[specific instructions]",
     subagent_type: "general-purpose"
   )
   ```
3. **Conversation Output Format (Simplified Version)**
   ```markdown
   ## Analysis Completed
   **Mode:** Standard Mode
   **Key Findings:**
   - Code implements [core function]
   - Uses [algorithm/pattern] to solve [problem]
   - Key optimization points: [optimization point 1], [optimization point 2]
   - Potential issues: [issue 1], [issue 2]
   **Complete Document:** `[code-name]-deep-analysis.md`
   ```
### Output Process Comparison
**❌ High Token Consumption (Avoid):**
1. Output a 5000-token complete analysis in conversation
2. Use the Write tool to write another 5000 tokens
→ Total: 10000+ tokens of output
**✅ Token-Optimized (Recommended):**
1. Use the Write tool directly to write 5000 tokens
2. Output a 200-token summary in conversation
→ Total: 5200 tokens of output (saves ~50%)
### Large Project Chunking Guide
| Project Scale | Recommended Mode | Generation Strategy | File Structure |
|---|---|---|---|
| < 500 lines | Quick/Standard | Single document | 1 file |
| 500-2000 lines | Standard | Single document (may be long) | 1 file |
| 2000-10000 lines | Deep (automatic parallel) | Parallel chapters | Multiple temporary chapters → 1 final document |
| > 10000 lines | Deep (automatic parallel) | Hierarchical parallel | Module-level parallel + chapter-level parallel |
**Important:** Do not output complete analysis results in conversation; write directly to the file and output only a summary!
## 🚀 Deep Mode Automatic Implementation Guide (Specific Instructions for Claude)
Deep Mode automatically selects the optimal strategy based on code scale. When parallel processing is needed:
### Step 1: Identify Whether Parallel Processing Is Needed
Automatic trigger conditions (use parallel processing if any are met):
- Number of code files > 10
- Total code lines > 2000
- User explicitly mentions "large project", "complete project", or "overall project analysis"
- User uses depth trigger words like "thoroughly", "complete mastery", or "in-depth research" and the code scale is large
### Step 2: Select Processing Strategy
```
if code_lines <= 2000:
    use Strategy A: Progressive Generation (sequential processing)
else:
    use Strategy B: Parallel Processing (detailed below)
```
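The trigger conditions and the threshold rule above can be collapsed into one selection function; a sketch (the function name and return values are illustrative):

```python
# Sketch: the Deep Mode strategy-selection rules above as code.
# Thresholds come from this document; names are illustrative.
def choose_deep_strategy(code_lines, file_count, user_request=""):
    """Return 'parallel' when any trigger condition is met, else 'progressive'."""
    large_project_words = ("large project", "complete project", "overall project analysis")
    if file_count > 10:
        return "parallel"            # many files: chapter-parallel analysis
    if code_lines > 2000:
        return "parallel"            # large codebase: sub-Agents per chapter
    if any(w in user_request for w in large_project_words):
        return "parallel"            # user explicitly asks for whole-project work
    return "progressive"             # small code: sequential chapter filling
```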
### Step 3: Parallel Processing Preparation (Strategy B)
```bash
# Create working directory
mkdir -p code-analysis/{tasks,chapters}
# Generate framework file
cat > code-analysis/00-framework.json << 'EOF'
{
  "project_name": "[project-name]",
  "language": "[language]",
  "total_lines": [line-count],
  "core_concepts": [concept-list],
  "chapters": [
    "Background and Motivation", "Core Concepts", "Algorithm Theory",
    "Design Patterns", "Code Analysis", "Application Transfer",
    "Dependency Relationships", "Quality Verification"
  ]
}
EOF
```
### Step 4: Create Parallel Sub-Agents
For each chapter, use the Task tool to create an independent sub-Agent:
```
Task(
  description: "In-depth analysis of [chapter-name] chapter",
  prompt: """
    You are a [chapter-name] analysis expert.
    ## Context
    - Project: {project_name}
    - Language: {language}
    - Core Concepts: {core_concepts}
    ## Task
    Conduct an in-depth analysis of the [chapter-name] section of the code and
    generate detailed chapter content (at least {min_words} words).
    ## Requirements
    - Use scenario/step + WHY style comments
    - Each key point answers 3 WHY questions
    - Provide specific execution examples
    - Cite authoritative sources
    ## Output
    Write the complete chapter content to the file:
    code-analysis/chapters/{chapter-name}.md
  """,
  subagent_type: "general-purpose"
)
```
### Step 5: Aggregate Results
After all sub-Agents complete, use the Read tool to read all chapter files and merge them in order:
1. Read `code-analysis/00-framework.json`
2. Read `code-analysis/chapters/*.md` (in order)
3. Merge into the final document
4. Write to `{project-name}-complete-mastery-analysis.md`
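The aggregation steps above can be sketched directly; paths follow the `code-analysis/` layout shown earlier, and the function name is illustrative:

```python
# Sketch: stitch chapter files together in the order declared by the
# framework file, then write the final document.
import json
import os

def aggregate_chapters(workdir, output_name):
    """Merge workdir/chapters/*.md per the framework's chapter order."""
    with open(os.path.join(workdir, "00-framework.json"), encoding="utf-8") as f:
        framework = json.load(f)
    parts = [f"# {framework['project_name']} Deep Understanding Analysis"]
    for chapter in framework["chapters"]:
        path = os.path.join(workdir, "chapters", f"{chapter}.md")
        with open(path, encoding="utf-8") as f:
            parts.append(f.read())
    out_path = os.path.join(workdir, output_name)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(parts))
    return out_path
```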
## 📋 Chapter Depth Self-Check Standards (Ensure Quality)
When generating in Deep Mode, each chapter must pass the following checks:
```markdown
## Chapter Depth Self-Check Checklist
### 1. Content Completeness (Mandatory)
- [ ] All sub-items of the chapter are covered (no "brief" or "same as above")
- [ ] Each WHY has a specific explanation (at least 2-3 sentences, not just one)
- [ ] Code examples have complete comments (scenario/step + WHY style)
- [ ] Citations have source links (algorithms/patterns/theories)
### 2. Analysis Depth (By Chapter Type)
**Concept Chapters (Chapter 3):**
- [ ] Each core concept has 3 WHY answers
  - WHY this concept is needed
  - WHY it's implemented this way
  - WHY other methods are not used
**Algorithm Chapters (Chapter 4):**
- [ ] Time/space complexity marked
- [ ] Explanation of WHY this algorithm was chosen
- [ ] Explanation of WHY the complexity is acceptable
- [ ] Explanation of degradation scenarios
**Design Pattern Chapters (Chapter 5):**
- [ ] Pattern name and standard reference provided
- [ ] Explanation of WHY this pattern is used
- [ ] Explanation of what would happen without it
**Code Analysis Chapters (Chapter 6):**
- [ ] Line-by-line analysis (what it does + WHY)
- [ ] Execution examples with specific data
- [ ] Multi-scenario tracking (at least 2 scenarios)
- [ ] Error-prone points and boundary conditions annotated
### 3. Practicality (Application Value)
- [ ] Error-prone points annotated
- [ ] Boundary conditions explained
- [ ] At least 2 application transfer scenarios
- [ ] Improvement suggestions have WHY explanations
### 4. Format Specification
- [ ] Uses Markdown format
- [ ] Code blocks have language annotations
- [ ] Tables are aligned correctly
- [ ] List levels are clear
### Handling of Unqualified Chapters
**Case A: Insufficient content (<300 words)**
→ Append details: add more explanations, examples, comparisons
**Case B: Insufficient WHY analysis**
→ Supplement WHY: ask "why" for each core point
**Case C: Incomplete code comments**
→ Add detailed comments: use scenario/step + WHY style
**Case D: Missing execution flow**
→ Add specific data examples: track variable change trajectories
```
**Quick Depth Evaluation Standards:**
| Chapter | Minimum Word Count | Mandatory Elements |
|---|---|---|
| 1. Quick Overview | 200 | Language, scale, dependencies, type |
| 2. Background and Motivation | 400 | Problem essence, solution selection, application scenarios |
| 3. Core Concepts | 600 | 3 WHYs per concept, relationship matrix |
| 4. Algorithms and Theory | 500 | Complexity, WHY, reference materials |
| 5. Design Patterns | 400 | Pattern name, WHY, standard reference |
| 6. In-Depth Key Code Analysis | 800 | Line-by-line analysis, execution examples, scenario tracking |
| 7. Test Case Analysis | 400 | Test coverage, boundary conditions, test findings |
| 8. Application Transfer | 500 | At least 2 scenarios, invariant principles, modified parts |
| 9. Dependency Relationships | 300 | WHY for each dependency, usage examples |
| 10. Quality Verification | 200 | Verification checklist, four-abilities test |
**Total:** A Deep Mode document should be ≥ 4300 words.
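The minimum word counts above lend themselves to an automated check. A sketch; whitespace-based word counting is a simplification (the original targets Chinese text, where character counts fit better):

```python
# Sketch: flag chapters that miss the minimum word counts in the table
# above. MIN_WORDS here lists only a subset of the chapters as an example.
MIN_WORDS = {
    "Quick Overview": 200,
    "Background and Motivation": 400,
    "Core Concepts": 600,
    "Algorithms and Theory": 500,
}

def check_chapter_depth(chapters):
    """chapters: {name: text}. Return names that miss their minimum."""
    failing = []
    for name, text in chapters.items():
        minimum = MIN_WORDS.get(name, 0)
        if len(text.split()) < minimum:
            failing.append(name)
    return failing
```

Chapters returned by this check would then go through the "Handling of Unqualified Chapters" cases above (append details, supplement WHYs, and so on).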