nexus-mapper — AI Project Probing Protocol
"You are not writing code documentation. You are building a cognitive foundation for the next AI that takes over this project."
This Skill guides AI agents to use the PROBE 5-phase protocol to perform systematic probing on any local Git repository, producing a layered knowledge base in `.nexus-map/`.
⚠️ CRITICAL — Do Not Skip Any of the Five Phases
[!IMPORTANT]
You must not produce the final `.nexus-map/` output until PROFILE, REASON, OBJECT, and BENCHMARK are all completed.
This is not about formal completeness; it prevents the AI from writing first-glance assumptions straight into the conclusions. The final output must be grounded in script outputs, repository structure, counterevidence challenges, and review verification.
❌ Prohibited Actions:
- Skipping OBJECT and generating output assets directly
- Generating `.nexus-map/` before completing BENCHMARK
- Proceeding to subsequent phases after a script failure in the PROFILE phase
✅ Required Actions:
- Explicitly confirm "✅ Phase Name Completed" after finishing each phase before moving to the next
- OBJECT must propose a minimal set of high-value questions sufficient to overturn current assumptions (usually 1-3); never pad the count
- The `code_path` of nodes must actually exist in the repository; nodes must not fake one (see Rule 2)
📌 When to Call / When Not to Call
| Scenario | Call |
|---|---|
| User provides a local repo path and wants the AI to understand its architecture | ✅ |
| Need to generate `.nexus-map/` for cold-starting subsequent AI sessions | ✅ |
| User says "help me analyze the project", "build a project knowledge base", "let the AI understand this repository" | ✅ |
| No shell execution capability in the runtime environment (pure API call mode, no shell tool) | ❌ |
| No local Python 3.10+ on the host machine | ❌ |
| No known-language source files in the target repository (no .py/.ts/.java/.go/.rs/.cpp, etc.) | ❌ |
| User only wants to query a specific file/function → use direct read/search tools instead | ❌ |
⚠️ Prerequisite Checks (Explicitly inform users of missing items; prefer downgrading over aborting when possible)
| Prerequisite | Check Method |
|---|---|
| Target path exists | `repo_path` is accessible |
| Python 3.10+ | `python --version` is >= 3.10 |
| Script dependencies installed | `python -c "import tree_sitter"` runs without errors |
| Shell execution capability | Agent environment supports shell tool calls |

Git history is a bonus, not a hard blocker. If there is no Git history, or the history is too thin, skip hotspot analysis and explicitly record in the output that this was a downgraded probing.
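The checks above can be collected into a small preflight helper. The sketch below is illustrative only (it is not one of the skill's shipped scripts, and the result keys are made up for the example); note that a missing `.git` directory is a soft check that triggers downgrade rather than abort.

```python
import importlib.util
import sys
from pathlib import Path


def preflight(repo_path: str) -> dict:
    """Run the prerequisite checks; only git history is a soft (downgrade) check."""
    repo = Path(repo_path)
    return {
        "repo_exists": repo.is_dir(),                # target path is accessible
        "python_ok": sys.version_info >= (3, 10),    # Python 3.10+ present
        "tree_sitter_ok": importlib.util.find_spec("tree_sitter") is not None,
        "git_history": (repo / ".git").exists(),     # missing → downgraded probing
    }


checks = preflight(".")
# git_history is excluded: its absence downgrades, it never blocks
hard_blocks = [k for k, ok in checks.items() if not ok and k != "git_history"]
```

A caller would abort only when `hard_blocks` is non-empty, and otherwise proceed, recording a downgraded probing if `git_history` is false.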
📥 Input Contract
repo_path: Local absolute path of the target repository (required)
Language Support: Dispatch is automatic by file extension. Language configurations (extension mappings + Tree-sitter queries) are bundled with the skill's scripts. Bundled structural queries are used first to extract modules/classes/functions; if a grammar loads but no structural query exists for the language, at least retain module-level nodes and mark this limitation in the output. Currently supported languages include Python/JavaScript/TypeScript/TSX/Bash/Java/Go/Rust/C#/C/C++/Kotlin/Ruby/Swift/Scala/PHP/Lua/Elixir/GDScript/Dart/Haskell/Clojure/SQL/Proto/Solidity/Vue/Svelte/R/Perl.
Unsupported Language Extensions: If the repository contains language files not supported by default, the agent can add support dynamically via command-line parameters:
- `--add-extension .templ=templ` adds a new file-extension mapping (repeatable)
- `--add-query templ struct "(component_declaration ...)"` adds a structural query for a language (repeatable)

Query parameter format: `--add-query <LANG> <TYPE> <QUERY_STRING>`, where `<LANG>` is the language name and `<TYPE>` names the kind of structural node the query captures (`struct` in the example above).

Advanced Usage: For complex configurations, explicitly specify a JSON configuration file with `--language-config <JSON_FILE>`, in the same format as the parameters above; it supports extension mappings, custom queries, and explicit marking of unsupported languages.
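To make the JSON option concrete, here is a hypothetical configuration mirroring the CLI flags above. The key names (`extensions`, `queries`, `unsupported`) are assumptions for illustration; the actual file format is defined by the script, so verify against `--language-config` before relying on it.

```python
import json

# Hypothetical shape of a --language-config file; key names are illustrative only.
language_config = {
    "extensions": {".templ": "templ"},  # like --add-extension .templ=templ
    "queries": {                        # like --add-query templ struct "..."
        "templ": {
            "struct": "(component_declaration name: (identifier) @class.name) @class.def"
        }
    },
    "unsupported": [".lock"],           # extensions to mark explicitly unsupported
}
print(json.dumps(language_config, indent=2))
```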
If the current task involves adding support for a language not yet adapted, or adding Tree-sitter support for a non-standard extension, continue reading references/05-language-customization.md. This file is not a phase-gated document; it is a special-operation guide for the command-line extensions and the optional JSON configuration.
📤 Output Format
After execution, the following will be generated in the root directory of the target repository:
```text
.nexus-map/
├── INDEX.md              ← AI cold-start main entry (< 2000 tokens)
├── arch/
│   ├── systems.md        ← System boundaries + code locations
│   ├── dependencies.md   ← Mermaid dependency graph + sequence diagram
│   └── test_coverage.md  ← Static test surface: test files, covered core modules, evidence gaps
├── concepts/
│   ├── concept_model.json ← Schema V1 machine-readable graph
│   └── domains.md        ← Core domain concept explanations
├── hotspots/
│   └── git_forensics.md  ← Git hotspots + coupling pair analysis
└── raw/
    ├── ast_nodes.json    ← Tree-sitter parsed raw data
    ├── git_stats.json    ← Git hotspot and coupling data
    └── file_tree.txt     ← Filtered file tree
```
All generated Markdown files must begin with a short header. The human-readable name field in generated assets must be used uniformly; do not introduce alternative variants. If a disallowed variant appears in any generated result, delete it and revert to the canonical semantics during the EMIT phase.
If known-but-unsupported language files are found during the PROFILE phase, the output must explicitly state which parts are based on manual inference or downgraded analysis.
If module-only coverage occurs during the PROFILE phase (grammar loaded but no structural query), it must also be clearly stated: these languages are included in AST file coverage, but there is no class/function-level structural guarantee.
If a language declared via coverage configuration still fails to load its parser during the PROFILE phase, state clearly that it is configured-but-unavailable; do not pretend it is covered.
🔍 On-Demand Query Tool
`scripts/query_graph.py` reads `.nexus-map/raw/ast_nodes.json` to provide precise local dependency queries. It assists cognition-building during each phase of PROBE and can also be used on demand during subsequent development.
Zero Extra Dependencies: pure Python standard library; just feed it `ast_nodes.json` to run.
Query Modes
```bash
# View the class/function structure and import list of a file
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --file <path>

# Reverse dependency query: who imports this module
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --who-imports <module_or_path>

# Impact radius: upstream dependencies + downstream dependents
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --impact <path>

# Overlay git risk and coupling data (optional)
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --impact <path> \
    --git-stats <git_stats.json>

# Identify core nodes with high fan-in/fan-out
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --hub-analysis [--top N]

# Directory-aggregated structural summary (data support for systems.md in the EMIT phase)
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --summary
```
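Conceptually, the reverse-dependency query walks the import lists recorded in `ast_nodes.json`. The sketch below assumes a hypothetical per-file schema of `{"path": ..., "imports": [...]}`; the real file's layout may differ, so treat this as an illustration of the idea rather than the script's implementation.

```python
def who_imports(ast_nodes: list[dict], target_module: str) -> list[str]:
    """Return files whose import list mentions target_module (schema assumed above)."""
    return sorted(
        node["path"]
        for node in ast_nodes
        if target_module in node.get("imports", [])
    )


# Tiny hand-made example in the assumed schema
nodes = [
    {"path": "app/cli.py", "imports": ["app.core"]},
    {"path": "app/web.py", "imports": ["app.core", "app.db"]},
    {"path": "app/core.py", "imports": []},
]
print(who_imports(nodes, "app.core"))  # → ['app/cli.py', 'app/web.py']
```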
Usage Timing
| Phase | Recommended Query | Purpose |
|---|---|---|
| REASON | `--hub-analysis` | Verify core-system hypotheses with fan-in/fan-out data instead of guessing from directory names |
| OBJECT | `--who-imports`, `--impact` | Verify boundary assumptions against real upstream and downstream dependencies; overlay git heat and coupling |
| EMIT | `--summary`, `--file` | Provide data support for generating systems.md / dependencies.md |
| During Development | `--impact`, `--who-imports`, `--file` | Bug investigation, change impact assessment, refactoring analysis |
Positioning: `.nexus-map/` is the "map" and `query_graph.py` is the "magnifying glass". The map locates the general direction; the magnifying glass shows local details clearly.
Detailed usage scenarios and practical cases for the five query modes can be found in references/06-query-guide.md.
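The "core node" idea behind `--hub-analysis` can be illustrated in a few lines. This is a conceptual sketch over `(importer, imported)` edges, not the script's actual implementation; the edge names below are made up.

```python
from collections import Counter


def hub_scores(edges: list[tuple[str, str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank nodes by fan-in + fan-out; high totals suggest core modules."""
    fan_in, fan_out = Counter(), Counter()
    for src, dst in edges:
        fan_out[src] += 1   # src imports dst
        fan_in[dst] += 1
    totals = Counter({n: fan_in[n] + fan_out[n] for n in set(fan_in) | set(fan_out)})
    return totals.most_common(top)


edges = [("cli", "core"), ("web", "core"), ("core", "db"), ("web", "db")]
print(hub_scores(edges, top=1))  # → [('core', 3)]
```

Here `core` wins because it is imported twice and imports once; that is exactly the kind of node the REASON phase should verify rather than assume from directory names.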
🧠 Persistent Instructions
To prevent new sessions from forgetting to read existing knowledge bases, write the following concise rules into the host tool's persistent instruction or memory file:
```md
If .nexus-map/INDEX.md exists in the repository, you must read it first to restore global context before starting any task.
If a task requires judging local structure, dependency relationships, impact radius, or boundary attribution, prioritize reviewing the on-demand query instructions of the nexus-mapper skill and use query_graph.py to verify based on .nexus-map/raw/ast_nodes.json; do not re-guess the structure.
When a task changes the project's structural cognition, evaluate whether to synchronously update .nexus-map before delivery. Structural cognition includes: system boundaries, entry points, dependency relationships, test surfaces, language support, roadmaps, or phased progress facts. Pure local implementation details do not require updates by default.
Do not treat .nexus-map as static documentation; it is part of the project's memory. New conversations should read it first, and it should be updated on demand after important changes.
```
These rules are recommended for the host's persistent memory so that the agent naturally remembers to read or update `.nexus-map/` when truly needed.
📋 PROBE Phase Hard Gates
[!IMPORTANT]
You must read the corresponding reference before entering each phase; do not skip it.
Detailed steps, completion checklists, and boundary scenario handling for each phase are defined in the references.
- [When Skill is activated] → read_file references/01-probe-protocol.md (phase step blueprint)
- [Before REASON] → read_file references/03-edge-cases.md (boundary-scenario check)
- [Before OBJECT] → read_file references/04-object-framework.md (3-dimensional questioning template)
- [Before EMIT] → read_file references/02-output-schema.md (schema verification specifications)
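The gating above amounts to a simple linear state machine over the five PROBE phases; a minimal sketch (phase names taken from this document, the helper itself is illustrative):

```python
PHASES = ["PROFILE", "REASON", "OBJECT", "BENCHMARK", "EMIT"]


def can_enter(phase: str, completed: set[str]) -> bool:
    """A phase may start only after every earlier phase is explicitly confirmed."""
    return all(p in completed for p in PHASES[:PHASES.index(phase)])


assert can_enter("PROFILE", set())                    # first phase always allowed
assert not can_enter("EMIT", {"PROFILE", "REASON"})   # BENCHMARK missing → no EMIT
```

This is why "✅ Phase Name Completed" confirmations matter: they are the entries in `completed` that unlock the next phase.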
🛡️ Execution Rules
Rule 1: OBJECT Rejects Formalism
The purpose of OBJECT is to break the survivorship bias of REASON. Many engineering facts hide behind directory names and git hotspots, and first intuitions are almost always wrong.
❌ Invalid Questions (Prohibited):
Q1: My grasp of the system structure is not solid enough
Q2: The responsibility of the xxx directory has no direct evidence for now
▲ The problem is not the particular wording, but that such statements carry no evidence clues and therefore cannot be verified in the BENCHMARK phase.
✅ Valid Question Format:
Q1: git_stats shows that tasks/analysis_tasks.py has been changed 21 times (high risk),
but the HYPOTHESIS considers evolution/detective_loop.py as the orchestration entry point.
Contradiction: If detective_loop is the entry point, why is analysis_tasks more frequently modified?
Evidence Clue: git_stats.json hotspots[0].path
Verification Plan: View the class definition + import tree of tasks/analysis_tasks.py
Rule 2: Nodes Must Have a Real `code_path`
[!IMPORTANT]
Before writing to concept_model.json, you must first distinguish whether a node is implemented, planned, or merely inferred.
Only implemented nodes are allowed to have a `code_path` written, and you must personally verify that the path exists.
```bash
# Verification method in BENCHMARK phase
ls $repo_path/src/nexus/application/weaving/   # ✅ Directory exists → node is valid
ls $repo_path/src/nexus/application/nonexist/  # ❌ Error → correct or delete this node
```
```json
{
  "implementation_status": "planned",
  "code_path": null,
  "evidence_path": "docs/architecture.md",
  "evidence_gap": "src/agents/monarch/ not found in the repository, only mentioned in design documents"
}
```
❌ Prohibited:
- Using a "barely related" file to pretend the node is implemented
- Writing a pseudo-precise directory when the node is only planned
- Putting a status value into the path field instead of a real path
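A BENCHMARK-style check for Rule 2 can be sketched as follows. Field names follow the JSON example above; the `"implemented"` status value is an assumption for illustration, since the document only shows `"planned"` explicitly.

```python
from pathlib import Path


def node_is_valid(node: dict, repo_path: str) -> bool:
    """Only implemented nodes may carry a code_path, and it must really exist."""
    status = node.get("implementation_status")
    code_path = node.get("code_path")
    if status != "implemented":
        return code_path is None  # planned/inferred nodes must keep code_path null
    return bool(code_path) and (Path(repo_path) / code_path).exists()


planned = {"implementation_status": "planned", "code_path": None}
print(node_is_valid(planned, "."))  # → True
```

Running a check like this over every node before EMIT catches both failure modes at once: a fake `code_path` on a planned node, and a dead path on a supposedly implemented one.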
Rule 3: EMIT Atomicity
First write all content to a temporary staging directory; after everything succeeds, move the whole directory to the official location and delete the staging directory.
Purpose: no half-finished products are left if execution fails midway. If a leftover staging directory is detected during the next execution → clean it up and regenerate.
✅ Idempotency Rules:
| Status | Handling Method |
|---|---|
| `.nexus-map/` does not exist | Proceed directly |
| `.nexus-map/` exists and is valid | Ask the user: "Overwrite? [y/n]" |
| `.nexus-map/` exists but files are incomplete | Announce "Incomplete analysis detected, will regenerate" and proceed directly |
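Rule 3's write-then-swap pattern can be sketched like this. The `.nexus-map.tmp` staging name is illustrative (the document does not mandate a specific name), and the overwrite confirmation from the table above is assumed to have happened before the call.

```python
import shutil
from pathlib import Path


def atomic_emit(repo: str, write_all) -> Path:
    """Stage everything in a temp dir; publish only after write_all fully succeeds."""
    tmp = Path(repo) / ".nexus-map.tmp"
    final = Path(repo) / ".nexus-map"
    if tmp.exists():
        shutil.rmtree(tmp)       # leftover from a failed run: clean up and regenerate
    tmp.mkdir(parents=True)
    try:
        write_all(tmp)           # write every asset into the staging directory
    except Exception:
        shutil.rmtree(tmp)       # failure midway leaves no half-finished products
        raise
    if final.exists():
        shutil.rmtree(final)     # caller has already confirmed the overwrite
    tmp.rename(final)
    return final
```

The `rename` at the end is the atomic step: readers only ever see either the old `.nexus-map/` or the complete new one, never a partial mix.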
Rule 4: INDEX.md is the Only Cold-Start Entry
The reader of INDEX.md is an AI that has never seen this repository before. Two hard constraints:
- < 2000 tokens: rewrite if exceeded, do not truncate
- Conclusions must be specific: no vague filler; when evidence is insufficient, explicitly write `unknown:` or `evidence gap:` and explain what evidence is missing
After writing, estimate tokens: number of lines × average 30 tokens/line = rough estimate.
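That rough estimate is trivial to compute; a one-function sketch of the rule stated above:

```python
def estimate_tokens(markdown: str, tokens_per_line: int = 30) -> int:
    """Rough INDEX.md budget check: line count x ~30 tokens/line."""
    return len(markdown.splitlines()) * tokens_per_line


draft = "\n".join(f"- fact {i}" for i in range(70))
print(estimate_tokens(draft))  # → 2100, over budget: rewrite, do not truncate
```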
🧭 Uncertainty Expression Specifications
Avoid writing bare hedges such as: pending · maybe · possibly · perhaps · suspected · unclear · TBD · to be confirmed · needs further investigation
If evidence is insufficient, you can write:
- unknown: No direct evidence found indicating api/ is the main entry; currently only confirmed that cli.py is referenced in README
- evidence gap: No git history in the repository, so the hotspots section is skipped
Principle: It is allowed to honestly write uncertainty, but you must explain which missing evidence leads to the uncertainty, instead of using vague words as conclusions.
Rule 5: Minimal Execution Surface and Sensitive Information Protection
[!IMPORTANT]
By default, run only the scripts bundled with this Skill plus necessary read-only checks. Do not execute build scripts, test scripts, or custom commands in the target repository just because "you want to understand the repository better".
- Allowed by default: `extract_ast.py`, `query_graph.py`, directory traversal, text search, read-only file viewing
- Prohibited by default: running build, install, or test commands in the target repository, unless explicitly requested by the user
- When encountering `.env` files, key files, or credential configurations: record only their existence and purpose; do not copy specific values
Rule 6: Downgrades and Manual Inferences Must Be Explicitly Visible
[!IMPORTANT]
If AST coverage is incomplete, or part of the dependency graph or system boundary comes from manual reading rather than script output, you must explicitly mark the provenance in the final files.
- In dependencies.md, any dependency relationship not directly supported by AST must be marked "inferred from file tree/manual inspection"
- If generated documents such as systems.md or dependencies.md cover unsupported-language areas, explicitly state "unsupported language downgrade"
- If writing progress snapshots, Sprint status, or roadmaps, attach a snapshot date so outdated information is not disguised as current fact
🛠️ Script Toolchain
```bash
# Set SKILL_DIR (based on actual installation path)
# Scenario A: Installed as .agent/skills
SKILL_DIR=".agent/skills/nexus-mapper"
# Scenario B: Independent repo (for development/debugging)
SKILL_DIR="/path/to/nexus-mapper"

# Call in PROFILE phase — Basic Usage
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    > <repo_path>/.nexus-map/raw/ast_nodes.json

# If the repository contains non-standard languages, add support via command-line parameters
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    --add-extension .templ=templ \
    --add-query templ struct "(component_declaration name: (identifier) @class.name) @class.def" \
    > <repo_path>/.nexus-map/raw/ast_nodes.json

# For complex configurations, use a JSON file (see --language-config description for format)
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    --language-config /custom/path/to/language-config.json \
    > <repo_path>/.nexus-map/raw/ast_nodes.json

# Generate the filtered file tree in the same PROFILE run
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    --file-tree-out .nexus-map/raw/file_tree.txt \
    > <repo_path>/.nexus-map/raw/ast_nodes.json
```
Dependency Installation (First Use):
```bash
pip install -r $SKILL_DIR/scripts/requirements.txt
```
✅ Quality Self-Inspection (Must Pass All Before EMIT)