kb CLI and Knowledge Base Pattern
Build and maintain a self-compiling Obsidian markdown knowledge base using the kb CLI. The LLM reads raw sources, writes cross-linked wiki articles, files Q&A results back into the corpus, and runs lint-and-heal passes. The CLI also supports codebase ingestion with deep inspection commands for code quality, architecture health, and symbol relationships.
Each topic lives in its own top-level folder with subtrees such as raw/, wiki/, and outputs/, plus a topic-level schema document and log.md. All topics share a single Obsidian vault at the repo root. Read references/architecture.md for the full rationale and the four-phase pipeline (ingest → compile → query → lint).
The topic's schema document (kept as a symlinked file) tells the LLM the scope, conventions, current articles, and research gaps for that topic. Co-evolve it as the topic matures.
Prerequisites
- Verify the kb binary is available on PATH.
- For search and index commands, verify QMD is installed:
bash
qmd --version
# If missing: npm install -g @tobilu/qmd
- Supported source languages for codebase analysis: TypeScript, TSX, JavaScript, JSX, Go.
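The prerequisite checks above can be scripted before a session starts. A minimal sketch: the helper name require_cmds is hypothetical, and it only tests whether each tool resolves on PATH, not whether its version is suitable.

```shell
# Report whether each named tool is on PATH.
# Returns non-zero if at least one tool is missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_cmds kb qmd || echo "install the missing tools first"
```

Only search and index need QMD, so a codebase-inspection session can check for kb alone.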
Pattern Overview
Based on Andrej Karpathy's LLM Wiki pattern, the KB treats the LLM as a compiler that reads raw source documents and produces a structured, cross-linked markdown wiki. The four-phase loop:
- Ingest -- Scrape/curate sources via the kb CLI → raw/ (immutable staging)
- Compile -- LLM reads raw/, writes wiki articles (3000-4000 words, dense wikilinks)
- Query -- Q&A against the wiki → file answers to outputs/queries/, promote strong answers to the wiki
- Lint -- Automated structural checks + LLM-driven semantic healing
Read references/architecture.md for the full rationale, context-window vs RAG tradeoffs, and multi-topic vault design.
Related Skills
This skill orchestrates several companion skills for the LLM-driven phases:
- obsidian-markdown -- author wiki articles with valid Obsidian Flavored Markdown (wikilinks, callouts, embeds, properties).
- obsidian-bases -- create .base files for dashboard views, filters, and formulas.
- obsidian-cli -- interact with the running Obsidian vault from the command line (open notes, search, refresh indexes).
kb CLI Quick Reference
Topic management
bash
kb topic new <slug> <title> <domain> # scaffold a new topic
kb topic list # list all topics in the vault
kb topic info <slug> # topic metadata (counts, last log entry)
Ingestion (auto-generates frontmatter, auto-appends to log.md)
bash
kb ingest url <url> --topic <slug> # scrape a web URL via Firecrawl
kb ingest file <path> --topic <slug> # convert local file (PDF, DOCX, EPUB, HTML, images w/OCR, etc.)
kb ingest youtube <url> --topic <slug> # extract YouTube transcript
kb ingest bookmarks <path> --topic <slug> # ingest a bookmark-cluster markdown file
kb ingest codebase <path> --topic <slug> # analyze a codebase into raw/codebase/
Codebase inspection
bash
kb inspect smells [--type <smell-type>] --format json
kb inspect dead-code --format json
kb inspect complexity [--top N] --format json
kb inspect blast-radius [--min N] [--top N] --format json
kb inspect coupling [--unstable] --format json
kb inspect circular-deps --format json
kb inspect symbol <name> --format json
kb inspect file <path> --format json
kb inspect backlinks <name-or-path> --format json
kb inspect deps <name-or-path> --format json
Structural linting
bash
kb lint [<slug>] [--save] # dead links, orphans, missing sources, format violations, stale content
Indexing and search (requires QMD)
bash
kb index --topic <slug> # create or update QMD collection
kb search "<query>" --topic <slug> # hybrid BM25 + vector search
kb search "<query>" --lex --topic <slug> # keyword-only search
kb search "<query>" --vec --topic <slug> # vector-only search
After running kb ingest or kb lint, the CLI auto-appends entries to log.md. Manual log entries are still needed for compile, query, promote, and split operations (Procedure 5).
Command Dispatch
Map the user's intent to the correct command:
| Intent | Command |
|---|---|
| Scaffold a new topic | kb topic new <slug> <title> <domain> |
| List all topics | kb topic list |
| Scrape a web URL | kb ingest url <url> --topic <slug> |
| Ingest a local file (PDF, DOCX, etc.) | kb ingest file <path> --topic <slug> |
| Extract a YouTube transcript | kb ingest youtube <url> --topic <slug> |
| Ingest bookmark clusters | kb ingest bookmarks <path> --topic <slug> |
| Analyze a codebase | kb ingest codebase <path> --topic <slug> --progress never |
| Find code smells | kb inspect smells --format json |
| Find dead exports and orphan files | kb inspect dead-code --format json |
| Rank functions by complexity | kb inspect complexity --format json |
| Find high-impact symbols (blast radius) | kb inspect blast-radius --min 5 --format json |
| Find unstable files (coupling) | kb inspect coupling --unstable --format json |
| Find circular imports | kb inspect circular-deps --format json |
| Look up a specific symbol | kb inspect symbol <name> --format json |
| Look up a specific file | kb inspect file <path> --format json |
| Find what depends on X (incoming refs) | kb inspect backlinks <name-or-path> --format json |
| Find what X depends on (outgoing deps) | kb inspect deps <name-or-path> --format json |
| Run structural lint | kb lint [<slug>] [--save] |
| Index vault for search | kb index --topic <slug> |
| Search the knowledge base | kb search "<query>" --topic <slug> --format json |
Codebase Analysis Workflow
For codebase-specific analysis, the kb ingest codebase command must run before any inspect command.
Workflow A -- Code Analysis (no QMD required):
kb ingest codebase <path> --topic <slug> --> kb inspect <subcommand>
Workflow B -- Full Pipeline (requires QMD):
kb ingest codebase <path> --topic <slug> --> kb index --> kb search <query>
On first run, kb ingest codebase bootstraps the topic under <path>/.kb/vault/<topic-slug>/ by default. Later commands auto-discover this vault only when they run from inside that repository tree; otherwise pass the vault root explicitly.
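To see why the working directory matters for auto-discovery, consider the sketch below. It assumes discovery means walking upward from the current directory until a .kb/vault directory is found; the CLI's actual logic is not documented here, and find_vault is a hypothetical helper, not a kb command.

```shell
# Walk upward from a starting directory (default: cwd) until a
# .kb/vault directory is found. Prints the vault path on success and
# returns non-zero when the filesystem root is reached without a match.
# ASSUMPTION: this mirrors, but is not, the kb CLI's own discovery.
find_vault() {
  local dir
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -d "$dir/.kb/vault" ]; then
      printf '%s\n' "$dir/.kb/vault"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}
```

Run from a nested source directory, the walk reaches the repository root and finds the vault; run from an unrelated directory, it finds nothing, which is the case where the vault root has to be passed explicitly.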
Ingest a Codebase
bash
kb ingest codebase <path> --topic <slug> --progress never
Always use --progress never in agent contexts to prevent TTY progress bars from corrupting stdout.
Use --title and --domain only when bootstrapping a missing topic.
Parse the JSON output from stdout to extract the key values (the field names are given in the output schema in references/cli-ingest-codebase.md):
- the topic identifier for later commands
- the absolute path to the vault root
- the absolute path to the topic directory
- three summary statistics
- a field to check for warnings or errors
Stderr carries structured stage logs. Do not treat stderr content as failure evidence.
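One way to honor the stdout/stderr split is to capture the two streams in separate files and judge success by exit status alone. A minimal sketch; run_split and the file names in the usage comment are hypothetical, and nothing here assumes particular JSON field names.

```shell
# Run a command, saving stdout and stderr to separate files.
# The function's exit status is the command's exit status, so callers
# decide success from that, never from whether stderr is empty.
run_split() {
  local out_file="$1" err_file="$2"
  shift 2
  "$@" >"$out_file" 2>"$err_file"
}

# Usage (requires kb; file names are arbitrary):
#   if run_split ingest.json ingest.log \
#        kb ingest codebase <path> --topic <slug> --progress never; then
#     : # parse ingest.json here; ingest.log holds the stage logs
#   fi
```

Keeping the stage logs in their own file also makes them available for debugging without ever mixing them into the JSON that gets parsed.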
Key flags:
- --topic -- target topic slug inside the vault
- --title -- bootstrap-only topic title override
- --domain -- bootstrap-only topic domain override
Further flags override the vault root location (one current flag plus a deprecated alias), re-include paths that would otherwise be ignored (repeatable), exclude additional paths from scanning (repeatable), and enable semantic analysis when adapters support it.
Read references/cli-ingest-codebase.md for the full flag table and output schema.
Inspect the Vault
Run inspect subcommands to analyze code quality and architecture.
Shared flags for all inspect subcommands:
- --format json -- always use JSON for programmatic parsing
- an explicit vault-root flag (omit to auto-discover from cwd)
- --topic <slug> -- explicit topic slug (omit if only one topic exists)
Tabular Subcommands
These return a list of rows sorted by the primary metric:
- smells -- List symbols and files with detected code smells.
kb inspect smells --format json
kb inspect smells --type high-complexity --format json
- dead-code -- List dead exports and orphan files.
kb inspect dead-code --format json
- complexity -- Rank functions/methods by cyclomatic complexity. Default top 20.
kb inspect complexity --format json
kb inspect complexity --top 50 --format json
- blast-radius -- Rank symbols by transitive dependent count.
kb inspect blast-radius --format json
kb inspect blast-radius --min 10 --top 20 --format json
- coupling -- Rank files by instability (Ce / (Ca + Ce)).
kb inspect coupling --format json
kb inspect coupling --unstable --format json
- circular-deps -- List files participating in circular import chains.
kb inspect circular-deps --format json
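The instability ratio used by the coupling subcommand, Ce / (Ca + Ce), can be checked by hand. A minimal sketch, assuming the usual reading in which Ca counts incoming (afferent) dependencies and Ce counts outgoing (efferent) ones. The function name and the zero-coupling convention are illustrative choices, not behavior confirmed for the kb CLI.

```shell
# Instability I = Ce / (Ca + Ce), printed to two decimals.
#   $1 = Ca, number of incoming (afferent) dependencies
#   $2 = Ce, number of outgoing (efferent) dependencies
instability() {
  awk -v ca="$1" -v ce="$2" 'BEGIN {
    total = ca + ce
    # A file with no couplings at all is reported as 0.00 here;
    # that convention is an assumption of this sketch.
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", ce / total
  }'
}

instability 3 1   # prints 0.25
instability 0 4   # prints 1.00
```

A value near 1 means the file depends on others while little depends on it, so it is cheap to change. A value near 0 means many files depend on it, so changes there ripple outward.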
Detail Lookup Subcommands
These return field-value pairs for a single matched entity:
- symbol <name> -- Case-insensitive substring match. Returns detail fields for a single match, or a summary table for multiple matches.
kb inspect symbol parseConfig --format json
- file <path> -- Exact source path lookup. Use the source-relative path as stored in vault frontmatter.
kb inspect file src/config.ts --format json
Relation Subcommands
These return relation edges:
- backlinks <name-or-path> -- Incoming references. Accepts a symbol name or file path.
kb inspect backlinks parseConfig --format json
- deps <name-or-path> -- Outgoing dependencies. Accepts a symbol name or file path.
kb inspect deps src/config.ts --format json
Read references/cli-inspect.md for all column schemas and flag details.
Index the Vault
Index the vault content into QMD for search. This step requires QMD on PATH.
The command is idempotent: it checks whether the collection already exists and then either creates or refreshes it automatically.
Key flags cover running embedding after syncing files (default true), forcing re-embedding of all documents, attaching human context to improve search relevance, and overriding the derived collection name.
Read references/cli-search-index.md for the full output schema.
Search the Vault
Search indexed vault content with QMD. Requires a prior kb index run.
bash
kb search "<query>" --topic <slug> --format json
Search modes:
- Hybrid (default) -- combines lexical and vector search
- Lexical (--lex) -- BM25 keyword search only
- Vector (--vec) -- embedding-based semantic search
The --lex and --vec flags are mutually exclusive. Omit both for hybrid mode.
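Because the two mode flags cannot be combined, a wrapper can take a single mode name and emit at most one flag. A minimal sketch; search_mode_flag and the mode names hybrid, lex, and vec are hypothetical conveniences layered over the --lex and --vec flags shown in the quick reference.

```shell
# Map a mode name to the matching kb search flag.
# Emits nothing for hybrid, because hybrid is selected by passing
# neither flag. Unknown modes are rejected with a non-zero status.
search_mode_flag() {
  case "${1:-hybrid}" in
    hybrid) ;;
    lex) printf '%s\n' --lex ;;
    vec) printf '%s\n' --vec ;;
    *) echo "unknown search mode: $1" >&2; return 1 ;;
  esac
}

# Usage (requires kb and an indexed topic):
#   kb search "<query>" --topic <slug> $(search_mode_flag lex) --format json
```

Taking one mode argument instead of two booleans makes the invalid both-flags combination impossible to express.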
Key flags cover the maximum number of results (default 10), a minimum relevance threshold, returning full document content instead of snippets, and returning all matches above the minimum score.
Read references/cli-search-index.md for full details.
KB Maintenance Procedures
Procedure 1: Compile a wiki article
- Read references/compilation-guide.md to anchor on length, style, wikilink density, and sourcing rules.
- Identify candidate sources via kb search "<topic phrase>" --topic <slug> or read <topic>/wiki/index/Source Index.md.
- Load the candidate raw sources fully into context.
- Load <topic>/wiki/index/Concept Index.md for orientation on existing articles and wikilink targets (including in other topics).
- Surface takeaways BEFORE drafting. Present to the user: 3-5 key takeaways from the sources, the entities/concepts this article will introduce or update, and anything that contradicts existing wiki articles. Ask: "Anything specific to emphasize or de-emphasize?" Wait for the response. Skip this step only if the user has explicitly asked for autonomous compilation.
- Write the article to <topic>/wiki/concepts/<Article Title>.md following the obsidian-markdown skill for wikilink, callout, and frontmatter syntax. Use the frontmatter schema from references/frontmatter-schemas.md. Target 3000-4000 words with a Sources section, wikilinks to related articles, and code or diagram blocks where applicable.
- Backlink audit -- do not skip. Grep every existing article in <topic>/wiki/concepts/ for mentions of the new article's title, aliases, or core entities. For each match, add a wikilink at the first mention (and one later occurrence). This is the step most commonly skipped -- a compounding wiki depends on bidirectional links.
bash
grep -rln "<new article title or key term>" <topic>/wiki/concepts/
- Update the topic's indexes (Procedure 2).
- Update the current-articles list in the topic's schema document.
- Re-index the topic's collection: kb index --topic <slug>.
- Append an entry to log.md (Procedure 5) -- e.g., ## [YYYY-MM-DD] compile | <Article Title> (<word_count> words, <N> sources).
When updating an existing article (rather than writing new), use the Current / Proposed / Reason / Source diff format and contradiction-sweep workflow described in references/compilation-guide.md.
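The backlink audit in Procedure 1 can be narrowed to the articles that still need a link. A minimal sketch: find_unlinked_mentions is a hypothetical helper that lists files mentioning a title in plain text while containing no wikilink that starts with that title.

```shell
# List files under a directory that mention a title but contain no
# wikilink beginning with it ([[Title]] or [[Title|alias]]).
#   $1 = directory to scan, e.g. <topic>/wiki/concepts
#   $2 = article title or key term (matched as a fixed string)
find_unlinked_mentions() {
  local dir="$1" title="$2" file
  grep -rlF -- "$title" "$dir" | while IFS= read -r file; do
    # Keep only files with no "[[Title" occurrence at all.
    if ! grep -qF -- "[[$title" "$file"; then
      printf '%s\n' "$file"
    fi
  done
}

# Usage:
#   find_unlinked_mentions "<topic>/wiki/concepts" "<new article title>"
```

The new article will normally list itself, since it mentions its own title without linking to it, so that entry can be ignored. Because the check is a prefix match, a link to a longer title that begins with the same words also counts as linked, so treat the result as a worklist rather than a verdict.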
Procedure 2: Maintain topic indexes
After adding, renaming, or removing any wiki article:
- <topic>/wiki/index/Dashboard.md -- update article count, total word count, featured sections, and any Obsidian Base embeds (use the obsidian-bases skill to author .base files and embed them).
- <topic>/wiki/index/Concept Index.md -- insert/update the article row alphabetically with its one-line summary.
- <topic>/wiki/index/Source Index.md -- for each new article, append rows for every source it cites, with a wikilink back to the article.
- Optionally refresh the live view in Obsidian with the obsidian-cli skill.
Procedure 3: Query the wiki and file back the answer
A query has two phases: Phase A produces the answer by reading the wiki (never from general knowledge); Phase B files the answer back so the exploration compounds.
Precondition: Identify which topic(s) the question belongs to. If the question spans topics, load each topic's Concept Index.
Phase A -- Answer from the wiki
- Read the topic's Concept Index first (<topic>/wiki/index/Concept Index.md). Scan the full index to identify candidate articles. Do NOT answer from general knowledge -- the wiki is the source of truth, even when the answer seems obvious. A contradiction between the wiki and general knowledge is itself valuable signal.
- Locate relevant articles. At small scale (<30 articles), the index is enough. At larger scale, supplement with kb search "<phrase>" --topic <slug>. Also grep the topic for keywords: grep -rl "<keyword>" <topic>/wiki/concepts/.
- Read the identified articles in full. Follow one level of wikilinks when targets look relevant to the question. Stop at one hop -- deeper traversal wastes context.
- (Optional) Pull in raw sources if an article's claim is ambiguous and its frontmatter points at a specific raw file worth verifying.
- Synthesize the answer with these properties:
  - Grounded in the wiki articles you just read -- every factual claim traces back to a citation.
  - Notes agreements and disagreements between articles when they exist.
  - Flags gaps explicitly: "The wiki has no article on X" or "[[Article Y]] does not yet cover Z".
  - Suggests follow-up ingest targets or open questions.
- Match format to question type:
  - Factual → prose with inline citations.
  - Comparison → table with rows per alternative, citations in cells.
  - How-it-works → numbered steps with citations.
  - What-do-we-know-about-X → structured summary with "Known", "Open questions", "Gaps".
  - Visual → ASCII/Mermaid diagram, Marp deck (see references/tooling-tips.md), or matplotlib chart.
Phase B -- File back the answer
- Save the answer to <topic>/outputs/queries/<YYYY-MM-DD> <Question Slug>.md with frontmatter that includes informed_by: ["[[Article 1]]", "[[Article 2]]"]. See references/frontmatter-schemas.md for the full schema.
- In the body, list which wiki articles informed the answer (as wikilinks) and call out new insights that should be absorbed back into those articles on the next compile pass.
- When a filed-back insight contradicts or extends an article's claims, recompile the affected articles (Procedure 1).
- Promote to wiki when the synthesis is durable. If the answer is a first-class reference (a comparison table, a trade-off analysis, a new concept synthesized from multiple articles), copy it to <topic>/wiki/concepts/<Title>.md following Procedure 1 standards and update the indexes (Procedure 2). Karpathy's pattern treats strong query answers as wiki citizens, not secondary artifacts.
- Append to log.md (Procedure 5) -- e.g., ## [YYYY-MM-DD] query | <Question Slug> plus a second line ## [YYYY-MM-DD] promote | <Title> if promoted.
Anti-patterns to avoid:
- Answering from memory -- always read the wiki pages. The wiki may contradict what you think you know.
- No citations -- every factual claim must trace back to a wikilinked article.
- Skipping the save -- good query answers compound the wiki's value. Always file to outputs/queries/; promote when durable.
- Silent gaps -- surface missing coverage explicitly so the next ingest pass can fill it.
Procedure 4: Lint and heal
Run structural lint via the kb CLI with kb lint <slug> --save.
This checks dead wikilinks, orphan articles, missing source references, format violations, and stale content, and saves a dated report. For each issue, propose the fix with a diff before applying -- do not batch-apply changes:
- Dead wikilink -- either create the missing article (Procedure 1) or rewrite the wikilink to point at an existing article.
- Orphan article -- add incoming wikilinks from at least one related article, or remove the article if it is outside the topic's scope.
- Missing source file -- an article's frontmatter references a file absent from raw/. Either re-ingest the source or correct the reference.
- Stale content -- the article's date is older than its source's date. Recompile with current sources.
- Format violation -- fix missing frontmatter fields, H1 title, lead paragraph, or Sources section.
For deeper LLM-driven self-healing checks (inconsistencies across articles, missing coverage, wikilink audits, filed-back query absorption), read references/lint-procedure.md.
After the heal pass, append ## [YYYY-MM-DD] lint | <N> issues found, <M> fixed to log.md.
Procedure 5: Append to log.md
The kb CLI auto-appends log entries for ingest and lint operations. Manual entries are needed for compile, query, promote, and split operations.
Format -- each entry is a single H2 heading with a consistent prefix so the log stays grep-able:
markdown
## [YYYY-MM-DD] <op> | <short description>
Where <op> is one of compile, query, promote, or split (ingest and lint are handled by kb).
Examples:
markdown
## [2026-04-04] compile | Transformer Architecture (3847 words, 6 sources)
## [2026-04-04] query | 2026-04-04 flash-attention-vs-paged-attention.md
## [2026-04-04] promote | FlashAttention vs PagedAttention (from query)
## [2026-04-05] split | "Inference Optimization" → KV Cache, Speculative Decoding
Optionally add a body paragraph under each entry with more context (key findings, source urls, decisions made). Keep entries terse -- the log is for skimming, not prose.
Quick recent-activity check -- the consistent prefix lets unix tools query the log:
bash
grep "^## \[" <topic>/log.md | tail -10 # last 10 events
grep "^## \[.*compile" <topic>/log.md | wc -l # total compiles
grep "^## \[2026-04" <topic>/log.md # April 2026 events
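Manual entries stay grep-able only if every one follows the same heading shape, so appending through a small helper can help. A minimal sketch; append_log_entry is hypothetical and not part of the kb CLI, and it accepts lint in addition to the four manual ops so the post-heal summary line from Procedure 4 can be written the same way.

```shell
# Append one H2 entry in the "## [YYYY-MM-DD] <op> | <description>"
# format, dated today.
#   $1 = path to log.md, $2 = op, $3 = short description
append_log_entry() {
  local log_file="$1" op="$2" description="$3"
  case "$op" in
    compile|query|promote|split|lint) ;;
    *) echo "unsupported op: $op" >&2; return 1 ;;
  esac
  printf '## [%s] %s | %s\n' "$(date +%Y-%m-%d)" "$op" "$description" >>"$log_file"
}

# Usage:
#   append_log_entry "<topic>/log.md" compile "Transformer Architecture (3847 words, 6 sources)"
```

Rejecting unknown ops keeps typos such as "compiled" out of the log, where they would silently drop out of the grep counts.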
Keep log.md at the topic root (not inside a subtree) so it sits alongside the schema document as a first-class topic artifact.
Output Format Selection
All kb inspect and kb search commands support --format:
- json -- always use for programmatic parsing
- table -- human-readable aligned columns (default)
- tsv -- tab-separated for piping to Unix tools
The kb ingest codebase and kb index commands always output JSON to stdout.
Read references/output-formats.md for format examples and empty result handling.
Error Handling
CLI Errors
| Error | Recovery |
|---|---|
| unable to find a vault from <path> | Run kb ingest codebase <path> --topic <slug> first, or re-run with an explicit vault root if the vault lives elsewhere |
| QMD not found on PATH | Run npm install -g @tobilu/qmd |
| | Run an ingest command to populate the vault |
| multiple topics were found | Re-run with --topic <slug> |
| --title and --domain are bootstrap-only | Remove those flags when re-ingesting an existing topic |
| no symbols matched "<query>" | Use other inspect subcommands to discover valid names |
| | Use the exact source-relative path from vault frontmatter |
KB Workflow Errors
| Error | Recovery |
|---|---|
| kb not found | Install the binary and ensure it is on PATH |
| Topic not found | Run kb topic list to see available topics, or scaffold with kb topic new |
| Article exceeds 4000 words | Extract a sub-topic into its own article and wikilink to it |
| Cross-topic wikilink ambiguity | Disambiguate with full path: [[other-topic/wiki/concepts/Article Name\|Display Name]] |
| log.md missing in existing topic | Create manually and backfill from git: git log --format='## [%ad] <op> \| %s' --date=short <topic>/ |
Read references/error-handling.md for the full error catalog with causes and recovery steps.
Constraints
MUST DO
- Run kb ingest codebase before any inspect command on that topic
- Use --format json when parsing output programmatically
- Use --progress never when running in a non-interactive context
- Parse stdout only for command output; treat stderr as diagnostics
- Use the vault and topic values from ingest output for subsequent flags
- Read references/compilation-guide.md before writing wiki articles
- Run backlink audits after every article compile (Procedure 1, step 7)
- File query answers to outputs/queries/ (Procedure 3)
- Append manual log entries for compile, query, promote, and split operations
MUST NOT DO
- Pass both --lex and --vec to kb search
- Pass with to
- Treat stderr content as failure evidence for kb ingest codebase
- Assume vault location without running ingest or checking for
- Use relative paths like for -- use instead
- Answer wiki queries from general knowledge -- the wiki is the source of truth
- Skip the backlink audit when compiling articles
- Batch-apply lint fixes without proposing diffs first