nexus-mapper — AI Project Probing Protocol
"You are not writing code documentation. You are building a cognitive foundation for the next AI that takes over this project."
This Skill guides AI agents to use the PROBE 5-phase protocol to perform systematic probing on any local Git repository, producing a layered knowledge base in `.nexus-map/`.
⚠️ CRITICAL — Do Not Skip Any of the Five Phases
[!IMPORTANT]
You must not produce the final `.nexus-map/` output until PROFILE, REASON, OBJECT, and BENCHMARK are all completed.
This is not about formal completeness; it prevents the AI from writing first-glance assumptions straight into the conclusions. The final output must be grounded in script outputs, repository structure, counterevidence challenges, and review verification.
❌ Prohibited Actions:
- Skipping OBJECT and generating output assets directly
- Generating `.nexus-map/` before completing BENCHMARK
- Proceeding to subsequent phases after a script failure in the PROFILE phase
✅ Required Actions:
- Explicitly confirm "✅ Phase Name Completed" after finishing each phase before moving to the next
- OBJECT must propose a minimal set of high-value questions sufficient to overturn current assumptions (usually 1-3); never pad the count
- The `code_path` of nodes must actually exist in the repository; nodes must not fake one (see Rule 2)
📌 When to Call / When Not to Call
| Scenario | Call |
|---|---|
| User provides a local repo path and wants the AI to understand its architecture | ✅ |
| Need to generate `.nexus-map/` for cold-starting subsequent AI sessions | ✅ |
| User says "help me analyze the project", "build a project knowledge base", "let the AI understand this repository" | ✅ |
| No shell execution capability in the runtime environment (pure API call mode, no shell tool) | ❌ |
| No local Python 3.10+ on the host machine | ❌ |
| No known-language source files in the target repository (no .py/.ts/.java/.go/.rs/.cpp, etc.) | ❌ |
| User only wants to query a specific file/function → use direct read/search tools instead | ❌ |
⚠️ Prerequisite Checks (Explicitly inform users of missing items; prefer downgrading over aborting when possible)
| Prerequisite | Check Method |
|---|---|
| Target path exists | `repo_path` is accessible |
| Python 3.10+ | `python --version` is >= 3.10 |
| Script dependencies installed | `python -c "import tree_sitter"` runs without errors |
| Shell execution capability | Agent environment supports shell tool calls |

Git history is a bonus, not a hard blocker. If there is no Git history, or the history is too thin, skip hotspot analysis and explicitly record in the output that this was a downgraded probing.
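The checks above can be collected into a small preflight helper. The sketch below is illustrative only (it is not one of the skill's shipped scripts, and the result keys are made up for the example); note that a missing `.git` directory is a soft check that triggers downgrade rather than abort.

```python
import importlib.util
import sys
from pathlib import Path


def preflight(repo_path: str) -> dict:
    """Run the prerequisite checks; only git history is a soft (downgrade) check."""
    repo = Path(repo_path)
    return {
        "repo_exists": repo.is_dir(),                # target path is accessible
        "python_ok": sys.version_info >= (3, 10),    # Python 3.10+ present
        "tree_sitter_ok": importlib.util.find_spec("tree_sitter") is not None,
        "git_history": (repo / ".git").exists(),     # missing → downgraded probing
    }


checks = preflight(".")
# git_history is excluded: its absence downgrades, it never blocks
hard_blocks = [k for k, ok in checks.items() if not ok and k != "git_history"]
```

A caller would abort only when `hard_blocks` is non-empty, and otherwise proceed, recording a downgraded probing if `git_history` is false.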
📥 Input Contract
repo_path: Local absolute path of the target repository (required)
Language Support: Dispatch is automatic by file extension. Language configurations (extension mappings + Tree-sitter queries) are bundled with the skill's scripts. Bundled structural queries are used first to extract modules/classes/functions; if a grammar loads but no structural query exists for the language, at least retain module-level nodes and mark this limitation in the output. Currently supported languages include Python/JavaScript/TypeScript/TSX/Bash/Java/Go/Rust/C#/C/C++/Kotlin/Ruby/Swift/Scala/PHP/Lua/Elixir/GDScript/Dart/Haskell/Clojure/SQL/Proto/Solidity/Vue/Svelte/R/Perl.
Unsupported Language Extensions: If the repository contains language files not supported by default, the agent can add support dynamically via command-line parameters:
- `--add-extension .templ=templ` adds a new file-extension mapping (repeatable)
- `--add-query templ struct "(component_declaration ...)"` adds a structural query for a language (repeatable)

Query parameter format: `--add-query <LANG> <TYPE> <QUERY_STRING>`, where `<LANG>` is the language name and `<TYPE>` names the kind of structural node the query captures (`struct` in the example above).

Advanced Usage: For complex configurations, explicitly specify a JSON configuration file with `--language-config <JSON_FILE>`, in the same format as the parameters above; it supports extension mappings, custom queries, and explicit marking of unsupported languages.
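To make the JSON option concrete, here is a hypothetical configuration mirroring the CLI flags above. The key names (`extensions`, `queries`, `unsupported`) are assumptions for illustration; the actual file format is defined by the script, so verify against `--language-config` before relying on it.

```python
import json

# Hypothetical shape of a --language-config file; key names are illustrative only.
language_config = {
    "extensions": {".templ": "templ"},  # like --add-extension .templ=templ
    "queries": {                        # like --add-query templ struct "..."
        "templ": {
            "struct": "(component_declaration name: (identifier) @class.name) @class.def"
        }
    },
    "unsupported": [".lock"],           # extensions to mark explicitly unsupported
}
print(json.dumps(language_config, indent=2))
```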
If the current task involves adding support for a language not yet adapted, or adding Tree-sitter support for a non-standard extension, continue reading references/05-language-customization.md. This file is not a phase-gated document; it is a special-operation guide for the command-line extensions and the optional JSON configuration.
📤 Output Format
After execution, the following will be generated in the root directory of the target repository:
```text
.nexus-map/
├── INDEX.md              ← AI cold-start main entry (< 2000 tokens)
├── arch/
│   ├── systems.md        ← System boundaries + code locations
│   ├── dependencies.md   ← Mermaid dependency graph + sequence diagram
│   └── test_coverage.md  ← Static test surface: test files, covered core modules, evidence gaps
├── concepts/
│   ├── concept_model.json ← Schema V1 machine-readable graph
│   └── domains.md        ← Core domain concept explanations
├── hotspots/
│   └── git_forensics.md  ← Git hotspots + coupling pair analysis
└── raw/
    ├── ast_nodes.json    ← Tree-sitter parsed raw data
    ├── git_stats.json    ← Git hotspot and coupling data
    └── file_tree.txt     ← Filtered file tree
```
All generated Markdown files must begin with a short header. The human-readable name field in generated assets must be used uniformly; do not introduce alternative variants. If a disallowed variant appears in any generated result, delete it and revert to the canonical semantics during the EMIT phase.
If known-but-unsupported language files are found during the PROFILE phase, the output must explicitly state which parts are based on manual inference or downgraded analysis.
If module-only coverage occurs during the PROFILE phase (grammar loaded but no structural query), it must also be clearly stated: these languages are included in AST file coverage, but there is no class/function-level structural guarantee.
If a language declared via coverage configuration still fails to load its parser during the PROFILE phase, state clearly that it is configured-but-unavailable; do not pretend it is covered.
🔍 On-Demand Query Tool
`scripts/query_graph.py` reads `.nexus-map/raw/ast_nodes.json` to provide precise local dependency queries. It assists cognition-building during each phase of PROBE and can also be used on demand during subsequent development.
Zero Extra Dependencies: pure Python standard library; just feed it `ast_nodes.json` to run.
Query Modes
```bash
# View the class/function structure and import list of a file
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --file <path>

# Reverse dependency query: who imports this module
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --who-imports <module_or_path>

# Impact radius: upstream dependencies + downstream dependents
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --impact <path>

# Overlay git risk and coupling data (optional)
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --impact <path> \
    --git-stats <git_stats.json>

# Identify core nodes with high fan-in/fan-out
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --hub-analysis [--top N]

# Directory-aggregated structural summary (data support for systems.md in the EMIT phase)
python $SKILL_DIR/scripts/query_graph.py <ast_nodes.json> --summary
```
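Conceptually, the reverse-dependency query walks the import lists recorded in `ast_nodes.json`. The sketch below assumes a hypothetical per-file schema of `{"path": ..., "imports": [...]}`; the real file's layout may differ, so treat this as an illustration of the idea rather than the script's implementation.

```python
def who_imports(ast_nodes: list[dict], target_module: str) -> list[str]:
    """Return files whose import list mentions target_module (schema assumed above)."""
    return sorted(
        node["path"]
        for node in ast_nodes
        if target_module in node.get("imports", [])
    )


# Tiny hand-made example in the assumed schema
nodes = [
    {"path": "app/cli.py", "imports": ["app.core"]},
    {"path": "app/web.py", "imports": ["app.core", "app.db"]},
    {"path": "app/core.py", "imports": []},
]
print(who_imports(nodes, "app.core"))  # → ['app/cli.py', 'app/web.py']
```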
Usage Timing
| Phase | Recommended Query | Purpose |
|---|---|---|
| REASON | `--hub-analysis` | Verify core-system hypotheses with fan-in/fan-out data instead of guessing from directory names |
| OBJECT | `--who-imports`, `--impact` | Verify boundary assumptions against real upstream and downstream dependencies; overlay git heat and coupling |
| EMIT | `--summary`, `--file` | Provide data support for generating systems.md / dependencies.md |
| During Development | `--impact`, `--who-imports`, `--file` | Bug investigation, change impact assessment, refactoring analysis |
Positioning: `.nexus-map/` is the "map" and `query_graph.py` is the "magnifying glass". The map locates the general direction; the magnifying glass shows local details clearly.
Detailed usage scenarios and practical cases for the five query modes can be found in references/06-query-guide.md.
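The "core node" idea behind `--hub-analysis` can be illustrated in a few lines. This is a conceptual sketch over `(importer, imported)` edges, not the script's actual implementation; the edge names below are made up.

```python
from collections import Counter


def hub_scores(edges: list[tuple[str, str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank nodes by fan-in + fan-out; high totals suggest core modules."""
    fan_in, fan_out = Counter(), Counter()
    for src, dst in edges:
        fan_out[src] += 1   # src imports dst
        fan_in[dst] += 1
    totals = Counter({n: fan_in[n] + fan_out[n] for n in set(fan_in) | set(fan_out)})
    return totals.most_common(top)


edges = [("cli", "core"), ("web", "core"), ("core", "db"), ("web", "db")]
print(hub_scores(edges, top=1))  # → [('core', 3)]
```

Here `core` wins because it is imported twice and imports once; that is exactly the kind of node the REASON phase should verify rather than assume from directory names.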
🧠 Persistent Instructions
To prevent new sessions from forgetting to read existing knowledge bases, write the following concise rules into the host tool's persistent instruction or memory file:
```md
If .nexus-map/INDEX.md exists in the repository, you must read it first to restore global context before starting any task.
If a task requires judging local structure, dependency relationships, impact radius, or boundary attribution, prioritize reviewing the on-demand query instructions of the nexus-mapper skill and use query_graph.py to verify based on .nexus-map/raw/ast_nodes.json; do not re-guess the structure.
When a task changes the project's structural cognition, evaluate whether to synchronously update .nexus-map before delivery. Structural cognition includes: system boundaries, entry points, dependency relationships, test surfaces, language support, roadmaps, or phased progress facts. Pure local implementation details do not require updates by default.
Do not treat .nexus-map as static documentation; it is part of the project's memory. New conversations should read it first, and it should be updated on demand after important changes.
```
These rules are recommended for the host's persistent memory so that the agent naturally remembers to read or update `.nexus-map/` when truly needed.
📋 PROBE Phase Hard Gates
[!IMPORTANT]
You must read the corresponding reference before entering each phase; do not skip it.
Detailed steps, completion checklists, and boundary scenario handling for each phase are defined in the references.
- [When Skill is activated] → read_file references/01-probe-protocol.md (phase step blueprint)
- [Before REASON] → read_file references/03-edge-cases.md (boundary-scenario check)
- [Before OBJECT] → read_file references/04-object-framework.md (3-dimensional questioning template)
- [Before EMIT] → read_file references/02-output-schema.md (schema verification specifications)
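The gating above amounts to a simple linear state machine over the five PROBE phases; a minimal sketch (phase names taken from this document, the helper itself is illustrative):

```python
PHASES = ["PROFILE", "REASON", "OBJECT", "BENCHMARK", "EMIT"]


def can_enter(phase: str, completed: set[str]) -> bool:
    """A phase may start only after every earlier phase is explicitly confirmed."""
    return all(p in completed for p in PHASES[:PHASES.index(phase)])


assert can_enter("PROFILE", set())                    # first phase always allowed
assert not can_enter("EMIT", {"PROFILE", "REASON"})   # BENCHMARK missing → no EMIT
```

This is why "✅ Phase Name Completed" confirmations matter: they are the entries in `completed` that unlock the next phase.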
🛡️ Execution Rules
Rule 1: OBJECT Rejects Formalism
The purpose of OBJECT is to break the survivorship bias of REASON. Many engineering facts hide behind directory names and git hotspots, and first intuitions are almost always wrong.
❌ Invalid Questions (Prohibited):
Q1: My grasp of the system structure is not solid enough
Q2: The responsibility of the xxx directory has no direct evidence for now
▲ The problem is not the particular wording, but that such statements carry no evidence clues and therefore cannot be verified in the BENCHMARK phase.
✅ Valid Question Format:
Q1: git_stats shows that tasks/analysis_tasks.py has been changed 21 times (high risk),
but the HYPOTHESIS considers evolution/detective_loop.py as the orchestration entry point.
Contradiction: If detective_loop is the entry point, why is analysis_tasks more frequently modified?
Evidence Clue: git_stats.json hotspots[0].path
Verification Plan: View the class definition + import tree of tasks/analysis_tasks.py
Rule 2: Nodes Must Have a Real `code_path`
[!IMPORTANT]
Before writing to concept_model.json, you must first distinguish whether a node is implemented, planned, or merely inferred.
Only implemented nodes are allowed to have a `code_path` written, and you must personally verify that the path exists.
```bash
# Verification method in BENCHMARK phase
ls $repo_path/src/nexus/application/weaving/   # ✅ Directory exists → node is valid
ls $repo_path/src/nexus/application/nonexist/  # ❌ Error → correct or delete this node
```
```json
{
  "implementation_status": "planned",
  "code_path": null,
  "evidence_path": "docs/architecture.md",
  "evidence_gap": "src/agents/monarch/ not found in the repository, only mentioned in design documents"
}
```
❌ Prohibited:
- Using a "barely related" file to pretend the node is implemented
- Writing a pseudo-precise directory when the node is only planned
- Putting a status value into the path field instead of a real path
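A BENCHMARK-style check for Rule 2 can be sketched as follows. Field names follow the JSON example above; the `"implemented"` status value is an assumption for illustration, since the document only shows `"planned"` explicitly.

```python
from pathlib import Path


def node_is_valid(node: dict, repo_path: str) -> bool:
    """Only implemented nodes may carry a code_path, and it must really exist."""
    status = node.get("implementation_status")
    code_path = node.get("code_path")
    if status != "implemented":
        return code_path is None  # planned/inferred nodes must keep code_path null
    return bool(code_path) and (Path(repo_path) / code_path).exists()


planned = {"implementation_status": "planned", "code_path": None}
print(node_is_valid(planned, "."))  # → True
```

Running a check like this over every node before EMIT catches both failure modes at once: a fake `code_path` on a planned node, and a dead path on a supposedly implemented one.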
Rule 3: EMIT Atomicity
First write all content to a temporary staging directory; after everything succeeds, move the whole directory to the official location and delete the staging directory.
Purpose: no half-finished products are left if execution fails midway. If a leftover staging directory is detected during the next execution → clean it up and regenerate.
✅ Idempotency Rules:
| Status | Handling Method |
|---|---|
| `.nexus-map/` does not exist | Proceed directly |
| `.nexus-map/` exists and is valid | Ask the user: "Overwrite? [y/n]" |
| `.nexus-map/` exists but files are incomplete | Announce "Incomplete analysis detected, will regenerate" and proceed directly |
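Rule 3's write-then-swap pattern can be sketched like this. The `.nexus-map.tmp` staging name is illustrative (the document does not mandate a specific name), and the overwrite confirmation from the table above is assumed to have happened before the call.

```python
import shutil
from pathlib import Path


def atomic_emit(repo: str, write_all) -> Path:
    """Stage everything in a temp dir; publish only after write_all fully succeeds."""
    tmp = Path(repo) / ".nexus-map.tmp"
    final = Path(repo) / ".nexus-map"
    if tmp.exists():
        shutil.rmtree(tmp)       # leftover from a failed run: clean up and regenerate
    tmp.mkdir(parents=True)
    try:
        write_all(tmp)           # write every asset into the staging directory
    except Exception:
        shutil.rmtree(tmp)       # failure midway leaves no half-finished products
        raise
    if final.exists():
        shutil.rmtree(final)     # caller has already confirmed the overwrite
    tmp.rename(final)
    return final
```

The `rename` at the end is the atomic step: readers only ever see either the old `.nexus-map/` or the complete new one, never a partial mix.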
Rule 4: INDEX.md is the Only Cold-Start Entry
The reader of INDEX.md is an AI that has never seen this repository before. Two hard constraints:
- < 2000 tokens: rewrite if exceeded, do not truncate
- Conclusions must be specific: no vague filler; when evidence is insufficient, explicitly write `unknown:` or `evidence gap:` and explain what evidence is missing
After writing, estimate tokens: number of lines × average 30 tokens/line = rough estimate.
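That rough estimate is trivial to compute; a one-function sketch of the rule stated above:

```python
def estimate_tokens(markdown: str, tokens_per_line: int = 30) -> int:
    """Rough INDEX.md budget check: line count x ~30 tokens/line."""
    return len(markdown.splitlines()) * tokens_per_line


draft = "\n".join(f"- fact {i}" for i in range(70))
print(estimate_tokens(draft))  # → 2100, over budget: rewrite, do not truncate
```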
🧭 Uncertainty Expression Specifications
Avoid writing bare hedges such as: pending · maybe · possibly · perhaps · suspected · unclear · TBD · to be confirmed · needs further investigation
If evidence is insufficient, you can write:
- unknown: No direct evidence found indicating api/ is the main entry; currently only confirmed that cli.py is referenced in README
- evidence gap: No git history in the repository, so the hotspots section is skipped
Principle: It is allowed to honestly write uncertainty, but you must explain which missing evidence leads to the uncertainty, instead of using vague words as conclusions.
Rule 5: Minimal Execution Surface and Sensitive Information Protection
[!IMPORTANT]
By default, run only the scripts bundled with this Skill plus necessary read-only checks. Do not execute build scripts, test scripts, or custom commands in the target repository just because "you want to understand the repository better".
- Allowed by default: `extract_ast.py`, `query_graph.py`, directory traversal, text search, read-only file viewing
- Prohibited by default: running build, install, or test commands in the target repository, unless explicitly requested by the user
- When encountering `.env` files, key files, or credential configurations: record only their existence and purpose; do not copy specific values
Rule 6: Downgrades and Manual Inferences Must Be Explicitly Visible
[!IMPORTANT]
If AST coverage is incomplete, or part of the dependency graph or system boundary comes from manual reading rather than script output, you must explicitly mark the provenance in the final files.
- In dependencies.md, any dependency relationship not directly supported by AST must be marked "inferred from file tree/manual inspection"
- If generated documents such as systems.md or dependencies.md cover unsupported-language areas, explicitly state "unsupported language downgrade"
- If writing progress snapshots, Sprint status, or roadmaps, attach a snapshot date so outdated information is not disguised as current fact
🛠️ Script Toolchain
```bash
# Set SKILL_DIR (based on actual installation path)
# Scenario A: Installed as .agent/skills
SKILL_DIR=".agent/skills/nexus-mapper"
# Scenario B: Independent repo (for development/debugging)
SKILL_DIR="/path/to/nexus-mapper"

# Call in PROFILE phase — Basic Usage
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    > <repo_path>/.nexus-map/raw/ast_nodes.json

# If the repository contains non-standard languages, add support via command-line parameters
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    --add-extension .templ=templ \
    --add-query templ struct "(component_declaration name: (identifier) @class.name) @class.def" \
    > <repo_path>/.nexus-map/raw/ast_nodes.json

# For complex configurations, use a JSON file (see --language-config description for format)
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    --language-config /custom/path/to/language-config.json \
    > <repo_path>/.nexus-map/raw/ast_nodes.json

# Generate the filtered file tree in the same PROFILE run
python $SKILL_DIR/scripts/extract_ast.py <repo_path> [--max-nodes 500] \
    --file-tree-out .nexus-map/raw/file_tree.txt \
    > <repo_path>/.nexus-map/raw/ast_nodes.json
```
Dependency Installation (First Use):
```bash
pip install -r $SKILL_DIR/scripts/requirements.txt
```
✅ Quality Self-Inspection (Must Pass All Before EMIT)