# Knowledge Builder - Parallel KB Generation Orchestrator

## Parameters

Extract these parameters from the user's input:

| Parameter | Required | Default | Description |
|---|---|---|---|
| FEATURE_ID | No | - | Feature ID to incorporate learnings from an archived feature into KB |

Environment values (resolve via shell):
- RP1_ROOT: run `rp1 agent-tools rp1-root-dir` (extract from JSON response)
This command orchestrates parallel knowledge base generation using a map-reduce architecture.

CRITICAL: This is an ORCHESTRATOR command, not a thin wrapper. It must handle parallel execution coordination, result aggregation, and state management.

## Architecture Overview

- Phase 1 (Sequential): Spatial Analyzer -> Categorized file lists
- Phase 2 (Parallel): 4 Analysis Agents -> JSON outputs (concept, arch, module, pattern)
- Phase 3 (Sequential): Command -> Merge JSON -> Generate index.md -> Write KB files

Key Design: The main orchestrator generates index.md directly (not via sub-agent) because:
- It has visibility into all 4 sub-agent outputs
- It can aggregate key facts into a "jump off" entry point
- index.md must contain a file manifest with accurate line counts from the generated files

## Execution Instructions

DO NOT ask for user approval. Execute immediately.
### Feature Learning Mode

If FEATURE_ID is provided, this is a feature learning build that captures knowledge from an archived feature. Skip Phase 0 entirely (no git commit parsing needed).

- Locate the archived feature:
  FEATURE_PATH = {{$RP1_ROOT}}/work/archives/features/{FEATURE_ID}/
  If not found, check active features:
  FEATURE_PATH = {{$RP1_ROOT}}/work/features/{FEATURE_ID}/
  If neither exists, error:
  Feature not found: {FEATURE_ID}
  Checked: {{$RP1_ROOT}}/work/archives/features/{FEATURE_ID}/
           {{$RP1_ROOT}}/work/features/{FEATURE_ID}/
- Read the feature documentation:
  - {FEATURE_PATH}/requirements.md - What was built
  - {FEATURE_PATH}/design.md - How it was designed
  - {FEATURE_PATH}/field-notes.md - Learnings and discoveries (if it exists)
  - {FEATURE_PATH}/tasks.md - Implementation details with files modified
- Extract files modified from tasks.md:
  Parse the implementation summaries to build the FILES_MODIFIED list.
  Look for patterns such as:
  - **Files**: `src/file1.ts`, `src/file2.ts`
  - **Files Modified**: ...
  Extract all file paths into the FILES_MODIFIED array.
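A minimal shell sketch of this extraction step (the `extract_files_modified` helper name is an assumption; the bullet format follows the patterns above):

```shell
# Hypothetical helper: collect backtick-quoted paths from "**Files**:" /
# "**Files Modified**:" lines in tasks.md into a deduplicated list.
extract_files_modified() {  # args: tasks_file
  grep -E '^[[:space:]]*-?[[:space:]]*\*\*Files( Modified)?\*\*:' "$1" \
    | grep -oE '`[^`]+`' \
    | tr -d '`' \
    | sort -u
}
```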
- Extract feature context:
  Build a feature_context object containing:
  - Feature ID and path
  - Key requirements (summarized)
  - Architectural decisions from design.md
  - All discoveries from field-notes.md
  - Implementation patterns used
  - Files modified: the FILES_MODIFIED array
- Jump directly to Phase 1 (Spatial Analysis):
  - Pass FILES_MODIFIED to the spatial analyzer instead of a git diff
  - The spatial analyzer categorizes these specific files
  - No git commit comparison is needed
- Spatial analyzer prompt (Feature Learning Mode):
  FEATURE_LEARNING mode. Categorize these files modified during feature implementation:
  FILES: {{stringify(FILES_MODIFIED)}}
  Rank each file 0-5, categorize by KB section (index_files, concept_files, arch_files, module_files).
  Return JSON with categorized files.
- Sub-agent prompts include:
  FEATURE_CONTEXT: {{stringify(feature_context)}}
  MODE: FEATURE_LEARNING
  Incorporate learnings from this completed feature:
  - Update patterns.md with implementation patterns discovered
  - Update architecture.md if new architectural patterns emerged
  - Update modules.md with new components/dependencies
  - Update concept_map.md with new domain concepts
  Focus on the files that were modified: {{stringify(FILES_MODIFIED)}}
## Phase 0: Change Detection and Diff Analysis
NOTE: Skip this phase entirely if FEATURE_ID is provided (Feature Learning Mode).
- Check for existing KB state:
  - Check if {{$RP1_ROOT}}/context/state.json exists
  - If it exists, read the git_commit field from state.json
- Check the current git commit:
  - Run `git rev-parse HEAD` to get the current commit hash
  - Compare with git_commit from state.json (if it exists)
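The commit comparison can be factored into a small predicate, sketched below (the helper name is an assumption; jq is already used elsewhere in this command):

```shell
# Returns success when state.json exists and its git_commit matches the
# current HEAD commit, i.e. the build can be skipped (CASE A).
kb_up_to_date() {  # args: state_file current_commit
  [ -f "$1" ] && [ "$(jq -r '.git_commit // empty' "$1")" = "$2" ]
}

# Typical call site (illustrative):
# current_commit=$(git rev-parse HEAD)
# kb_up_to_date "$RP1_ROOT/context/state.json" "$current_commit" && exit 0
```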
- Determine build strategy:

CASE A: No changes detected (state.json exists AND git commit unchanged):
- ACTION: Skip the build entirely (no-op)
- MESSAGE: "KB is up-to-date (commit {{commit_hash}}). No regeneration needed. KB is automatically loaded by agents when needed."

CASE A-MONOREPO: No changes in this service (monorepo: git commit changed but no changes in CODEBASE_ROOT):
- ACTION: Skip the build BUT update state.json with the new commit
- REASON: In a monorepo, the global commit moves even when this service is unchanged. Update the commit reference to avoid checking larger diff ranges in the future.
- Update state.json:
  - Read the existing state.json
  - Update only the git_commit field to the new commit hash
  - Keep all other fields unchanged (strategy, repo_type, files_analyzed, etc.)
  - Write the updated state.json
- MESSAGE: "No changes in this service since last build. Updated commit reference ({{old_commit}} -> {{new_commit}}). KB is automatically loaded by agents when needed."
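The CASE A-MONOREPO state update can be sketched with jq: rewrite only `.git_commit` and leave every other field intact (the helper name is an assumption):

```shell
# Update only the git_commit field of state.json in place; all other
# fields (strategy, repo_type, files_analyzed, ...) are preserved.
update_state_commit() {  # args: state_file new_commit
  tmp_state=$(mktemp)
  jq --arg c "$2" '.git_commit = $c' "$1" > "$tmp_state" && mv "$tmp_state" "$1"
}
```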
CASE B: First-time build (no state.json):
- ACTION: Full analysis mode - proceed to Phase 1
- MESSAGE: "First-time KB generation with parallel analysis (10-15 min)"
- MODE: Full scan (spatial analyzer processes all files)
CASE C: Incremental update (state.json exists AND commit changed AND files changed in CODEBASE_ROOT):
- ACTION: Incremental analysis mode - get changed files with diffs
- Read monorepo metadata from state.json AND local values from meta.json:

```bash
# Read shareable state
repo_type=$(jq -r '.repo_type // "single-project"' {{$RP1_ROOT}}/context/state.json)

# Read local values from meta.json (with fallback to state.json for backward compatibility)
if [ -f "{{$RP1_ROOT}}/context/meta.json" ]; then
  repo_root=$(jq -r '.repo_root // "."' {{$RP1_ROOT}}/context/meta.json)
  current_project_path=$(jq -r '.current_project_path // "."' {{$RP1_ROOT}}/context/meta.json)
else
  # Backward compatibility: read from state.json if meta.json doesn't exist
  repo_root=$(jq -r '.repo_root // "."' {{$RP1_ROOT}}/context/state.json)
  current_project_path=$(jq -r '.current_project_path // "."' {{$RP1_ROOT}}/context/state.json)
fi
```
- Get the changed files list:

```bash
# If monorepo, run git diff from the repo root and filter to the current project
if [ "$repo_type" = "monorepo" ]; then
  cd "$repo_root"
  # Get all changed files
  all_changes=$(git diff --name-only {{old_commit}} {{new_commit}})
  # Filter to the current project (skip filtering for the root project)
  if [ "$current_project_path" = "." ] || [ "$current_project_path" = "" ]; then
    # Root project - include all files
    echo "$all_changes"
  else
    # Subdirectory project - filter to the project path
    echo "$all_changes" | grep "^${current_project_path}"
  fi
else
  # Single-project - get all changes
  git diff --name-only {{old_commit}} {{new_commit}}
fi
```
- Check if any files changed in scope:
  - If NO changes found -> Go to CASE A-MONOREPO (update commit only)
  - If changes found -> Continue with incremental analysis
- Check the change set size (prevents token limit issues):

```bash
changed_file_count=$(echo "$changed_files" | wc -l)
if [ "$changed_file_count" -gt 50 ]; then
  echo "Large change set ($changed_file_count files changed). Using FULL mode for reliability."
  # Fall back to FULL mode (skip getting diffs)
  MODE="FULL"
else
  MODE="INCREMENTAL"
fi
```
- MESSAGE:
  - If MODE=FULL: "Large change set ({{changed_file_count}} files). Full analysis (10-15 min)"
  - If MODE=INCREMENTAL: "Changes detected since last build ({{old_commit}} -> {{new_commit}}). Analyzing {{changed_file_count}} changed files (2-5 min)"
- Get detailed diffs for each changed file (only if MODE=INCREMENTAL):

```bash
# Only in incremental mode (< 50 files)
git diff {{old_commit}} {{new_commit}} -- <filepath>
```

- Store diffs: Create a FILE_DIFFS JSON object mapping filepath -> diff content (only if MODE=INCREMENTAL)
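One way to sketch the FILE_DIFFS construction with jq (the helper name is an assumption; it assumes the working directory is inside the repository):

```shell
# Build a JSON object mapping each changed filepath to its diff text.
# Reads the changed-file list on stdin, one path per line.
build_file_diffs() {  # args: old_commit new_commit
  json='{}'
  while IFS= read -r f; do
    d=$(git diff "$1" "$2" -- "$f")
    json=$(printf '%s' "$json" | jq --arg k "$f" --arg v "$d" '. + {($k): $v}')
  done
  printf '%s\n' "$json"
}
```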
- Filter changed files: Apply EXCLUDE_PATTERNS and filter to relevant extensions
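A hedged sketch of this filter, assuming the default EXCLUDE_PATTERNS; the extension allowlist is purely illustrative:

```shell
# Hypothetical filter: drop excluded directory prefixes, then keep only
# source-like extensions. The extension list is an illustration, not a spec.
filter_changed_files() {  # reads one filepath per line on stdin
  grep -Ev '^(node_modules/|\.git/|build/|dist/)' \
    | grep -E '\.(ts|tsx|js|jsx|py|go|rs|java|md)$'
}
```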
- Store the changed files list: it will be passed to the spatial analyzer
- MODE: INCREMENTAL (< 50 files) or FULL (>= 50 files)
## Phase 1: Spatial Analysis (Sequential)
- Spawn the spatial analyzer agent:

  For a full build (CASE B), use the Task tool with:
  subagent_type: rp1-base:kb-spatial-analyzer
  prompt: "FULL SCAN mode. Scan all files in repository at {{CODEBASE_ROOT}}, rank files 0-5, categorize by KB section. Return JSON with index_files, concept_files, arch_files, module_files arrays."

  For an incremental build (CASE C), use the Task tool with:
  subagent_type: rp1-base:kb-spatial-analyzer
  prompt: "INCREMENTAL mode. Only categorize these changed files: {{changed_files_list}}. Rank each file 0-5, categorize by KB section (index_files, concept_files, arch_files, module_files). Return JSON with categorized changed files."
- Parse the spatial analyzer output:
  - Extract JSON from the agent response
  - Validate structure: must include index_files, concept_files, arch_files, module_files, files_scanned, repo_type, repo_root, current_project_path
  - Store shareable metadata: repo_type (and monorepo_projects, if present)
  - Store local metadata from the output: repo_root, current_project_path (will be written to meta.json)
  - For incremental: files_scanned should match changed_file_count
  - Check that at least one category has files (some categories may be empty in incremental mode)
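The structural check can be sketched with `jq -e` (the helper name is an assumption; the field names follow the prompts above):

```shell
# Sketch: verify the analyzer JSON has the four category arrays and that
# at least one of them is non-empty.
validate_spatial_output() {  # args: json_file
  for key in index_files concept_files arch_files module_files; do
    jq -e --arg k "$key" 'has($k) and (.[$k] | type == "array")' "$1" > /dev/null || return 1
  done
  jq -e '[.index_files, .concept_files, .arch_files, .module_files | length] | add > 0' "$1" > /dev/null
}
```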
- Handle spatial analyzer failure:
  - If the agent crashes or returns invalid JSON: log the error with details
  - If the categorization is completely empty: log an error
  - Provide troubleshooting guidance
## Phase 2: Map Phase (Parallel Execution)

- Spawn 4 analysis agents in parallel (CRITICAL: use a SINGLE message with 4 Task tool calls):
Agent 1 - Concept Extractor:
Use Task tool with:
subagent_type: rp1-base:kb-concept-extractor
prompt: "MODE={{mode}}. Extract domain concepts for concept_map.md. Repository type: {{repo_type}}. Files to analyze (JSON): {{stringify(concept_files)}}. {{if mode==INCREMENTAL}}File diffs (JSON): {{stringify(file_diffs_for_concept_files)}}{{endif}}. Return JSON with concepts, terminology, relationships."
Agent 2 - Architecture Mapper:
Use Task tool with:
subagent_type: rp1-base:kb-architecture-mapper
prompt: "MODE={{mode}}. Map system architecture for architecture.md. Repository type: {{repo_type}}. Files to analyze (JSON): {{stringify(arch_files)}}. {{if mode==INCREMENTAL}}File diffs (JSON): {{stringify(file_diffs_for_arch_files)}}{{endif}}. Return JSON with patterns, layers, diagram."
Agent 3 - Module Analyzer:
Use Task tool with:
subagent_type: rp1-base:kb-module-analyzer
prompt: "MODE={{mode}}. Analyze modules for modules.md. Repository type: {{repo_type}}. Files to analyze (JSON): {{stringify(module_files)}}. {{if mode==INCREMENTAL}}File diffs (JSON): {{stringify(file_diffs_for_module_files)}}{{endif}}. Return JSON with modules, components, dependencies."
Agent 4 - Pattern Extractor:
Use Task tool with:
subagent_type: rp1-base:kb-pattern-extractor
prompt: "MODE={{mode}}. Extract implementation patterns for patterns.md. Repository type: {{repo_type}}. Files to analyze (JSON): {{stringify(concept_files + module_files)}}. {{if mode==INCREMENTAL}}File diffs (JSON): {{stringify(file_diffs_for_pattern_files)}}{{endif}}. Return JSON with patterns (<=150 lines when rendered)."
- Collect agent outputs:
- Wait for all 4 agents to complete
- Parse JSON from each agent response
- Validate JSON structure for each output
- Handle partial failures:
If 1 agent fails:
- Continue with remaining 3 successful agents
- Generate placeholder content for failed section:
- concept_map.md failed -> "# Error extracting concepts - run full rebuild"
- architecture.md failed -> "# Error mapping architecture - see logs"
- modules.md failed -> "# Error analyzing modules - run full rebuild"
- patterns.md failed -> "# Error extracting patterns - run full rebuild"
- Include warning in final report: "Partial KB generated (1 agent failed: <agent-name>)"
- Write partial KB files (index.md always generated by orchestrator + 3 successful agent files + 1 placeholder)
- Exit with partial success (still usable KB)
If 2+ agents fail:
- Log all errors with specific agent names and error messages
- Do NOT write partial KB (too incomplete to be useful)
- Provide troubleshooting guidance:
- Check file permissions
- Verify git repository is valid
- Try running again (may be transient failure)
- Exit with error message: "ERROR: KB generation failed (X agents failed)"
- Exit code: 1
## Phase 3: Reduce Phase (Merge and Write)

- Load KB templates:
  Use the Skill tool with:
  skill: rp1-base:knowledge-base-templates
  - Load templates for: index.md, concept_map.md, architecture.md, modules.md, patterns.md
- Merge agent data into templates (concept_map, architecture, modules, patterns):
concept_map.md:
- Use concept-extractor JSON data
- Fill template sections: core concepts, terminology, relationships, patterns
- Add concept boundaries
architecture.md:
- Use architecture-mapper JSON data
- Fill template sections: patterns, layers, interactions, integrations
- Insert Mermaid diagram from JSON
modules.md:
- Use module-analyzer JSON data
- Fill template sections: modules, components, dependencies, metrics
- Add responsibility matrix
patterns.md:
- Use pattern-extractor JSON data
- Fill template sections: 6 core patterns, conditional patterns (if detected)
- Verify output is <=150 lines
- Omit conditional sections if not detected
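The 150-line budget check for patterns.md is cheap to automate; a minimal sketch (the helper name is an assumption):

```shell
# Warn and fail when patterns.md exceeds the 150-line template budget.
check_patterns_length() {  # args: patterns_file
  lines=$(wc -l < "$1" | tr -d ' ')
  if [ "$lines" -gt 150 ]; then
    echo "WARNING: patterns.md is $lines lines (budget: 150)" >&2
    return 1
  fi
}
```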
- Validate Mermaid diagrams:
Use Skill tool with:
skill: rp1-base:mermaid
- Validate diagram from architecture.md
- If invalid: Log warning, use fallback simple diagram or omit
- Generate index.md directly (orchestrator-owned, not via a sub-agent):
The orchestrator generates index.md as the "jump off" entry point by aggregating data from all 4 sub-agents.
Follow the index.md generation instructions in the knowledge-base-templates skill:
- See "Index.md Generation (Orchestrator-Owned)" section in SKILL.md
- Aggregation process: extract data from each sub-agent's JSON output
- Calculate file manifest: get line counts after writing other KB files
- Template placeholder mapping: fill template with aggregated data
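The file-manifest line counts mentioned above can be gathered with `wc` once the other KB files are written; a sketch assuming they all live in one directory (helper name assumed):

```shell
# Emit "name: N lines" for each KB file that exists, for the index.md
# file manifest. The directory layout is an assumption.
manifest_line_counts() {  # args: kb_dir
  for f in concept_map.md architecture.md modules.md patterns.md; do
    [ -f "$1/$f" ] || continue
    printf '%s: %s lines\n' "$f" "$(wc -l < "$1/$f" | tr -d ' ')"
  done
}
```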
- Write KB files:
Use Write tool to write:
- {{$RP1_ROOT}}/context/index.md
- {{$RP1_ROOT}}/context/concept_map.md
- {{$RP1_ROOT}}/context/architecture.md
- {{$RP1_ROOT}}/context/modules.md
- {{$RP1_ROOT}}/context/patterns.md
## Phase 4: State Management

- Aggregate metadata:
- Combine metadata from spatial analyzer + 4 analysis agents
- Calculate total files analyzed
- Extract languages and frameworks
- Calculate metrics (module count, component count, concept count)
- Generate state.json (shareable metadata - safe to commit/share):

```json
{
  "strategy": "parallel-map-reduce",
  "repo_type": "{{repo_type}}",
  "monorepo_projects": ["{{project1}}", "{{project2}}"],
  "generated_at": "{{ISO timestamp}}",
  "git_commit": "{{git rev-parse HEAD}}",
  "files_analyzed": {{total_files}},
  "languages": ["{{lang1}}", "{{lang2}}"],
  "metrics": {
    "modules": {{module_count}},
    "components": {{component_count}},
    "concepts": {{concept_count}}
  }
}
```
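For illustration, part of that document can be assembled with `jq -n` so the output is always valid JSON. This is a sketch, not the prescribed method: the variable names are assumptions and the field set is abbreviated:

```shell
# Sketch: emit a subset of state.json fields as guaranteed-valid JSON.
# Assumes repo_type and total_files are set and HEAD resolves.
generate_state_json() {
  jq -n \
    --arg repo_type "$repo_type" \
    --arg commit "$(git rev-parse HEAD)" \
    --argjson files "$total_files" \
    '{strategy: "parallel-map-reduce",
      repo_type: $repo_type,
      generated_at: (now | todate),
      git_commit: $commit,
      files_analyzed: $files}'
}
```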
- Generate meta.json (local values - should NOT be committed/shared):

```json
{
  "repo_root": "{{repo_root}}",
  "current_project_path": "{{current_project_path}}"
}
```

NOTE: meta.json contains local paths that may differ per team member. This file should be added to .gitignore.
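An idempotent way to apply the .gitignore recommendation (the helper name and the entry path are assumptions):

```shell
# Append an entry to .gitignore only when it is not already present.
ensure_gitignored() {  # args: gitignore_file entry
  grep -qxF "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}
```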
- Write state files:
Use Write tool to write:
- {{$RP1_ROOT}}/context/state.json
- {{$RP1_ROOT}}/context/meta.json
## Phase 5: Error Handling
Error Conditions:
- Spatial analyzer fails or returns invalid JSON
- 2 or more analysis agents fail
- Template loading fails
- Write operations fail repeatedly
- Git commands fail (unable to detect commit hash)
Error Handling Procedure:
- Log clear error message indicating which phase/component failed
- Provide specific details about what went wrong
- List attempted operations and their results
- Provide actionable guidance for resolution:
- Check git repository status if git commands failed
- Verify file permissions if write operations failed
- Check agent logs if spatial analyzer or analysis agents failed
- Report error to user with troubleshooting steps
### Final Report
Knowledge Base Generated Successfully
Strategy: Parallel map-reduce
Repository: {{repo_type}}
Files Analyzed: {{total_files}}
KB Files Written:
- {{$RP1_ROOT}}/context/index.md
- {{$RP1_ROOT}}/context/concept_map.md
- {{$RP1_ROOT}}/context/architecture.md
- {{$RP1_ROOT}}/context/modules.md
- {{$RP1_ROOT}}/context/patterns.md
- {{$RP1_ROOT}}/context/state.json (shareable metadata)
- {{$RP1_ROOT}}/context/meta.json (local paths - add to .gitignore)
Next steps:
- KB is automatically loaded by agents when needed (no manual /knowledge-load required)
- Subsequent full rebuilds use the same parallel approach (10-15 min)
- Incremental updates (changed files only) are faster (2-5 min)
- Add meta.json to .gitignore to prevent sharing local paths
### Final Report (Feature Learning Mode)
Feature Learnings Captured
Feature: {{FEATURE_ID}}
Source: {{FEATURE_PATH}}
Learnings Incorporated:
- patterns.md: {{N}} new patterns from implementation
- architecture.md: {{N}} architectural decisions
- modules.md: {{N}} new components/dependencies
- concept_map.md: {{N}} domain concepts
KB Files Updated:
- {{$RP1_ROOT}}/context/index.md
- {{$RP1_ROOT}}/context/concept_map.md
- {{$RP1_ROOT}}/context/architecture.md
- {{$RP1_ROOT}}/context/modules.md
- {{$RP1_ROOT}}/context/patterns.md
The knowledge from feature "{{FEATURE_ID}}" has been captured into the KB.
Future agents will benefit from these learnings.
## Additional Parameters

| Parameter | Default | Purpose |
|---|---|---|
| RP1_ROOT | | Root directory for KB artifacts |
| CODEBASE_ROOT | | Repository root to analyze |
| EXCLUDE_PATTERNS | node_modules/,.git/,build/,dist/ | Patterns to exclude from scanning |
## Critical Execution Notes
- Change detection first: Always check Phase 0 - compare git commit hash to skip if unchanged
- Do NOT iterate: Execute workflow ONCE, no refinement
- Parallel spawning: Spawn 4 agents in SINGLE message with multiple Task calls
- Index.md ownership: Orchestrator generates index.md directly (not via sub-agent)
- Error handling: Provide clear error messages with troubleshooting steps if failures occur
- No user interaction: Complete entire workflow autonomously
- Set expectations: Inform user builds take 10-15 minutes (or instant if no changes)
## Output Discipline
CRITICAL - Keep Output Concise:
- Do ALL internal work in <thinking> tags (NOT visible to user)
- Do NOT output verbose phase-by-phase progress ("Now doing Phase 1...", "Spawning agents...", etc.)
- Do NOT explain internal logic or decision-making process
- Only output 3 things:
  1. Initial status: build mode message (CASE A/B/C)
  2. High-level progress (optional): "Analyzing... (Phase X/5)" every 2-3 minutes
  3. Final report: success message with KB files written (see Final Report above)
Example of CORRECT output:

```
First-time KB generation with parallel analysis (10-15 min)
Analyzing... (Phase 2/5)
Knowledge Base Generated Successfully
[Final Report as shown above]
```
Example of INCORRECT output (DO NOT DO THIS):

```
Checking for state.json...
state.json not found, proceeding with first-time build
Running git rev-parse HEAD to get commit...
Commit is 475b03e...
Spawning kb-spatial-analyzer agent...
Parsing spatial analyzer output...
Found 90 files in index_files category...
Now spawning 4 parallel agents...
Spawning kb-concept-extractor...
Spawning kb-architecture-mapper...
Spawning kb-module-analyzer...
etc. (too verbose!)
```
## Expected Performance
No changes detected:
- Instant (no-op)
- Single-project: Commit unchanged -> Skip entirely
- Monorepo: Commit changed but no changes in this service -> Update state.json commit only
First-time build (no state.json - full analysis):
- 10-15 minutes
- Spatial analyzer scans all files
- 4 parallel analysis agents analyze all relevant files
- Generates complete KB
Incremental update (commit changed - changed files only):
- 2-5 minutes (much faster!)
- Git diff identifies changed files
- Spatial analyzer categorizes only changed files
- 4 parallel analysis agents load the existing KB + analyze only changed files
- Updates KB with changes only
- Preserves all existing good content