basalt-cortex

Basalt Cortex


Mine knowledge from multiple sources into Obsidian-compatible markdown files stored in `~/Documents/basalt-cortex/`. Each file has structured YAML frontmatter. Files auto-sync to basaltcortex.com via the `basalt-cortex` CLI tray daemon.

Read references/basalt-format.md before any file operations.

Modes

| Mode | Trigger | What it does |
|------|---------|--------------|
| init | "set up cortex", "cortex init" | Create vault structure, state.json, example notes |
| mine | "run the cortex", "mine emails", "mine slack" | Extract from a source, write Basalt files |
| query | "cortex search", "what do I know about" | Search across Basalt files |
| stats | "cortex stats" | Count files, show vault totals |
| sync | "cortex sync" | Push to Frond API (future — see references/sync-patterns.md) |


Init Mode

Create the vault structure. Run once before first mine.
Check for existing vault

```bash
ls ~/Documents/basalt-cortex/state.json 2>/dev/null && echo "EXISTS" || echo "FRESH"
```
If EXISTS: ask user — skip, reset (data loss warning), or continue (add missing folders only).

Create structure

```bash
mkdir -p ~/Documents/basalt-cortex/{clients,contacts,communications,knowledge,projects,notes,.obsidian}
```

Write state.json

```json
{
  "version": "1.0",
  "format": "basalt",
  "cursors": { "gmail": null, "google_chat": null, "slack": null, "calendar": null },
  "last_run": null,
  "totals": { "clients": 0, "contacts": 0, "communications": 0, "knowledge": 0 },
  "processed_source_ids": [],
  "runs": []
}
```

Write Obsidian config

Write to `~/Documents/basalt-cortex/.obsidian/app.json` (the path is noted here because JSON does not allow comments):

```json
{ "newFileLocation": "folder", "newFileFolderPath": "notes" }
```

Write 3 example notes

Create one example client, contact, and knowledge note in Basalt format so the user can see the structure. Use templates from references/basalt-format.md.

Report the created structure to the user. Tell them to open `~/Documents/basalt-cortex/` in Obsidian.


Mine Mode

Extract knowledge from a source and write Basalt-format files.

Source Selection

Ask or detect which source to mine:
| Source | Fetch method | Notes |
|--------|--------------|-------|
| gmail (default) | Gmail MCP (`gmail_messages`) | Use `extract_contacts` + `list` + `get` |
| google-chat | Google Chat MCP (`chat_messages`) | Mine space-by-space, NOT `search_active` |
| slack | Slack MCP or API token | MCP tools or `curl` with token |
| google-drive | Drive MCP or `gws` CLI | Metadata + summaries, don't copy full docs |
| local | Read tool + Glob | `find` + `cat` on local directories |
| mcp | MCP tool calls | Any connected MCP server with searchable data |
| web | WebFetch or browser | Firecrawl, Playwright, or WebFetch |
| calendar | Calendar MCP or `gws` CLI | Events, attendees, meeting notes |

Proven Extraction Workflow (Two-Phase)

Mining works in two phases. Phase 1 (reconnaissance) uses MCP tools interactively. Phase 2 (batch write) generates a Python script for efficiency.

Phase 1: Reconnaissance via MCP (interactive)

Use MCP tools to fetch raw data and identify entities. Claude does the AI extraction in-context — no external LLM call needed.
Gmail example:
1. extract_contacts — scan 100 recent emails, get deduplicated contacts with names/emails/counts
   - Use `field: "from"` for inbound contacts
   - Use `field: "to"` for outbound contacts (from sent mail)
   - Exclude automated domains: jezweb.net, google.com, github.com, cloudflare.com, etc.

2. list — fetch 30-50 emails per batch with bodyPreview
   - Query: `in:inbox -category:promotions -category:social -category:updates -category:forums after:YYYY/MM/DD`
   - Format: compact or full, bodyPreview: 1000-2000

3. get — fetch full content for significant threads (client conversations, support requests, decisions)

4. Pre-filter while scanning:
   - Skip: 2FA codes, domain expiry notices, payment receipts, Wordfence alerts, auto top-ups
   - Keep: Real human conversations, support requests, project discussions, business decisions
   - See references/prefilter-patterns.md for full skip/keep rules
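The skip/keep rules above can be sketched as a simple pattern filter. This is a hedged illustration: the pattern lists and the `prefilter` helper are hypothetical, and the authoritative rules live in references/prefilter-patterns.md.

```python
import re

# Illustrative patterns only -- the full skip/keep rules are in
# references/prefilter-patterns.md
SKIP_PATTERNS = [
    r"verification code", r"\b2fa\b", r"domain .* expir",
    r"payment receipt", r"wordfence", r"auto.?top.?up",
]
KEEP_PATTERNS = [
    r"\?",  # a question usually signals a real human conversation
    r"quote|proposal|support|project|decision|deadline",
]

def prefilter(subject: str, body_preview: str) -> bool:
    """Return True if an email looks worth mining, False if it should be skipped."""
    text = f"{subject} {body_preview}".lower()
    if any(re.search(p, text) for p in SKIP_PATTERNS):
        return False
    return any(re.search(p, text) for p in KEEP_PATTERNS)
```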
Google Chat example:
1. chat_spaces list — get all spaces with lastActiveTime
2. chat_messages list — fetch ONE space at a time, limit 25-50
   - NEVER use search_active for mining (times out on 50+ spaces)
   - Iterate space by space, save progress after each
3. Pre-filter: skip bot messages, join/leave events, webhook posts
From the fetched data, identify:
  • CLIENTS: businesses/organisations (name, domain, industry)
  • CONTACTS: people mentioned (name, email, role, company, phone if visible)
  • COMMUNICATIONS: the interaction itself (subject, participants, summary, type)
  • KNOWLEDGE: facts, decisions, preferences, commitments, relationships, deadlines
For each entity, craft a `summary` field: 1-3 sentences, dense with names and context. This is the Vectorize embedding input — make it specific and useful for semantic search.

Phase 2: Batch Write via Python Script

Once entities are identified, generate a Python script to write all Basalt files at once. This is dramatically faster than individual Write tool calls (55 files in one execution vs 8 tool calls for 17 files).
Script location: `.jez/scripts/mine-{source}-batch.py`

Script must include these helper functions:
  • `slugify(text)` — lowercase, hyphens, no special chars, max 60 chars
  • `write_client(domain, name, industry, summary, contacts, tags)`
  • `write_contact(name, email, role, company, company_domain, summary, phone, tags)`
  • `write_communication(date, subject_slug, subject, summary, participants, client_domain, comm_type, body, source_id)`
  • `write_knowledge(topic_slug, summary, kind, client_domain, contact_email, body, date)`

Key script behaviours:
  • Check `if path.exists(): return` — never overwrite existing files (dedup)
  • Use human-readable filenames (see basalt-format.md filename conventions)
  • Keep machine IDs in frontmatter (`id` field) for sync
  • Update `~/.cortex/state.json` totals and run history at the end
  • Print each file written for progress tracking
Data goes directly in the script as Python data structures — not loaded from a JSON file. Claude populates the data arrays from Phase 1 analysis:
```python
clients = [
    ("bigcolour.com.au", "Big Colour", "signage",
     "Signage company. Justin is director. Active client with L2Chat agent.",
     [("Justin Big Colour", "Director")]),
    # ... more clients
]

for domain, name, industry, summary, contacts in clients:
    write_client(domain, name, industry, summary, contacts)
```
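A minimal sketch of two of the helpers the script is expected to define, following the behaviours listed above (dedup via `path.exists()`, machine `id` kept in frontmatter). The exact frontmatter fields are assumptions here; the authoritative layout is in references/basalt-format.md.

```python
import re
from pathlib import Path

VAULT = Path.home() / "Documents" / "basalt-cortex"

def slugify(text):
    """Lowercase, hyphen-separated, no special chars, max 60 chars."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")[:60]

def write_client(domain, name, industry, summary, contacts, tags=()):
    """Write one client note; never overwrite an existing file (dedup)."""
    path = VAULT / "clients" / f"{domain}.md"
    if path.exists():
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    contact_lines = "\n".join(f"- {n} ({role})" for n, role in contacts)
    path.write_text(
        "---\n"
        f"id: client-{slugify(domain)}\n"  # machine ID kept for sync
        f"name: {name}\n"
        f"industry: {industry}\n"
        f"tags: [{', '.join(tags)}]\n"
        "---\n\n"
        f"{summary}\n\n## Contacts\n{contact_lines}\n"
    )
    print(f"wrote {path}")  # progress tracking
```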

Common Arguments

| Argument | Effect |
|----------|--------|
| `--dry-run` | Print what would be written, don't touch disk |
| `--from YYYY-MM-DD` | Only process items from this date onward |
| `--batch-size N` | Process N items per run (default: 50) |
| `--source SOURCE` | Which source to mine |
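If the batch script exposes these as real CLI flags (an assumption; in conversational use Claude parses them from the request), an argparse skeleton could look like this. Note that `--from` needs an explicit `dest` because `from` is a Python keyword.

```python
import argparse

parser = argparse.ArgumentParser(description="basalt-cortex mine (sketch)")
parser.add_argument("--dry-run", action="store_true",
                    help="print what would be written, don't touch disk")
parser.add_argument("--from", dest="from_date", metavar="YYYY-MM-DD",
                    help="only process items from this date onward")
parser.add_argument("--batch-size", type=int, default=50, metavar="N",
                    help="process N items per run")
parser.add_argument("--source", default="gmail",
                    help="which source to mine")

# Example invocation with sample arguments
args = parser.parse_args(["--dry-run", "--batch-size", "25"])
```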

Environment


| Variable | Default | Purpose |
|----------|---------|---------|
| `CORTEX_DIR` | `~/Documents/basalt-cortex` | Vault root (syncs to basaltcortex.com) |
| `CORTEX_STATE` | `~/.cortex/state.json` | Cursor + run history |
| `CORTEX_OWNER_EMAIL` | jeremy@jezweb.net | Your email — excluded from contacts |
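These can be resolved in the mining script with plain `os.environ` lookups and the documented defaults; the helper below is a sketch and its name is illustrative.

```python
import os
from pathlib import Path

def cortex_env(env=None):
    """Resolve cortex settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    home = Path.home()
    return {
        "dir": Path(env.get("CORTEX_DIR", home / "Documents" / "basalt-cortex")),
        "state": Path(env.get("CORTEX_STATE", home / ".cortex" / "state.json")),
        "owner_email": env.get("CORTEX_OWNER_EMAIL", "jeremy@jezweb.net"),
    }
```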


Query Mode

Search across Basalt files. Claude can do this natively — no script needed.

Commands

| What user says | Action |
|----------------|--------|
| "cortex search QUERY" | Grep frontmatter + content across all files |
| "what do I know about COMPANY" | Read `clients/{domain}.md` + find related comms and knowledge |
| "cortex contacts" | List all files in `contacts/` with name and email from frontmatter |
| "cortex client DOMAIN" | Full dossier — client file + linked contacts + recent comms + facts |
| "cortex export TYPE" | Export to CSV or JSON |

Search Pattern


Keyword search across all Basalt files:

```bash
grep -rl "QUERY" ~/Documents/basalt-cortex/ --include="*.md"
```

Frontmatter field search:

```bash
grep -rl "client_domain: example.com" ~/Documents/basalt-cortex/ --include="*.md"
```

For structured queries, read frontmatter with the Python `frontmatter` library or parse the YAML between `---` markers.
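When the `frontmatter` library isn't available, the YAML block can be read with a few lines of stdlib Python. This sketch handles flat `key: value` pairs only (no nested YAML), which matches the Basalt fields shown above.

```python
from pathlib import Path

def read_frontmatter(path):
    """Parse flat key: value pairs between the leading '---' markers."""
    lines = Path(path).read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter block
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta
```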

---

Stats Mode

```bash
echo "Clients:        $(find ~/Documents/basalt-cortex/clients -name '*.md' 2>/dev/null | wc -l)"
echo "Contacts:       $(find ~/Documents/basalt-cortex/contacts -name '*.md' 2>/dev/null | wc -l)"
echo "Communications: $(find ~/Documents/basalt-cortex/communications -name '*.md' 2>/dev/null | wc -l)"
echo "Knowledge:      $(find ~/Documents/basalt-cortex/knowledge -name '*.md' 2>/dev/null | wc -l)"
echo "Notes:          $(find ~/Documents/basalt-cortex/notes -name '*.md' 2>/dev/null | wc -l)"
```

Also read `state.json` for last run date, cursor positions, and run history.


Sync Mode

Files in `~/Documents/basalt-cortex/` auto-sync to basaltcortex.com via the `basalt-cortex` CLI tray daemon. No manual sync needed.

The daemon uses chokidar to watch for file changes and pushes to the API with content-hash comparison (skip unchanged files) and last-write-wins conflict resolution.

  • Start daemon: `basalt-cortex tray` (runs in system tray)
  • Manual push: `basalt-cortex push`
  • Manual pull: `basalt-cortex pull`
  • Bidirectional: `basalt-cortex sync` (watches local + polls remote every 30s)
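The daemon itself is Node (chokidar), but its skip-unchanged check can be sketched in a few lines of Python. The `needs_push` helper and the in-memory hash map are illustrative, not the daemon's actual code.

```python
import hashlib
from pathlib import Path

def content_hash(path):
    """sha256 of the file bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def needs_push(path, last_hashes):
    """True if the file changed since its last recorded hash; records the new hash."""
    h = content_hash(path)
    if last_hashes.get(str(path)) == h:
        return False  # unchanged -> skip the API push
    last_hashes[str(path)] = h
    return True
```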


Scheduling

| Method | How |
|--------|-----|
| Claude Code Cowork | Scheduled task: "Run basalt-cortex mine gmail" > Daily |
| Cron | `0 6 * * * ANTHROPIC_API_KEY=sk-... python3 ~/.jez/scripts/cortex-mine-gmail.py` |
| /loop | `/loop 24h basalt-cortex mine gmail` |

References

| When | Read |
|------|------|
| Before any file operations | references/basalt-format.md |
| When extracting semantic fields from threads | references/field-catalog.md |
| Per-source fetch and extract patterns | references/source-patterns.md |
| Before processing raw content | references/prefilter-patterns.md |
| When syncing to Frond/D1/Vectorize | references/sync-patterns.md |