basalt-cortex

Basalt Cortex


Mine knowledge from multiple sources into Obsidian-compatible markdown files stored in `~/Documents/basalt-cortex/`. Each file has structured YAML frontmatter. Files auto-sync to basaltcortex.com via the `basalt-cortex` CLI tray daemon.

Read references/basalt-format.md before any file operations.

Modes

| Mode | Trigger | What it does |
|------|---------|--------------|
| init | "set up cortex", "cortex init" | Create vault structure, state.json, example notes |
| mine | "run the cortex", "mine emails", "mine slack" | Extract from a source, write Basalt files |
| query | "cortex search", "what do I know about" | Search across Basalt files |
| stats | "cortex stats" | Count files, show vault totals |
| sync | "cortex sync" | Push to Frond API (future — see references/sync-patterns.md) |


Init Mode

Create the vault structure. Run once before first mine.
Check for existing vault

```bash
ls ~/Documents/basalt-cortex/state.json 2>/dev/null && echo "EXISTS" || echo "FRESH"
```
If EXISTS: ask user — skip, reset (data loss warning), or continue (add missing folders only).

Create structure

```bash
mkdir -p ~/Documents/basalt-cortex/{clients,contacts,communications,knowledge,projects,notes,.obsidian}
```

Write state.json

```json
{
  "version": "1.0",
  "format": "basalt",
  "cursors": { "gmail": null, "google_chat": null, "slack": null, "calendar": null },
  "last_run": null,
  "totals": { "clients": 0, "contacts": 0, "communications": 0, "knowledge": 0 },
  "processed_source_ids": [],
  "runs": []
}
```

Write Obsidian config

Write to `~/Documents/basalt-cortex/.obsidian/app.json` (the path is noted here because JSON does not allow comments):

```json
{ "newFileLocation": "folder", "newFileFolderPath": "notes" }
```

Write 3 example notes

Create one example client, contact, and knowledge note in Basalt format so the user can see the structure. Use templates from references/basalt-format.md.

Report the created structure to the user. Tell them to open `~/Documents/basalt-cortex/` in Obsidian.


Mine Mode

Extract knowledge from a source and write Basalt-format files.

Source Selection

Ask or detect which source to mine:
| Source | Fetch method | Notes |
|--------|--------------|-------|
| gmail (default) | Gmail MCP (`gmail_messages`) | Use `extract_contacts` + `list` + `get` |
| google-chat | Google Chat MCP (`chat_messages`) | Mine space-by-space, NOT `search_active` |
| slack | Slack MCP or API token | MCP tools or `curl` with token |
| google-drive | Drive MCP or `gws` CLI | Metadata + summaries, don't copy full docs |
| local | Read tool + Glob | `find` + `cat` on local directories |
| mcp | MCP tool calls | Any connected MCP server with searchable data |
| web | WebFetch or browser | Firecrawl, Playwright, or WebFetch |
| calendar | Calendar MCP or `gws` CLI | Events, attendees, meeting notes |

Proven Extraction Workflow (Two-Phase)

Mining works in two phases. Phase 1 (reconnaissance) uses MCP tools interactively. Phase 2 (batch write) generates a Python script for efficiency.

Phase 1: Reconnaissance via MCP (interactive)

Use MCP tools to fetch raw data and identify entities. Claude does the AI extraction in-context — no external LLM call needed.
Gmail example:
1. extract_contacts — scan 100 recent emails, get deduplicated contacts with names/emails/counts
   - Use `field: "from"` for inbound contacts
   - Use `field: "to"` for outbound contacts (from sent mail)
   - Exclude automated domains: jezweb.net, google.com, github.com, cloudflare.com, etc.

2. list — fetch 30-50 emails per batch with bodyPreview
   - Query: `in:inbox -category:promotions -category:social -category:updates -category:forums after:YYYY/MM/DD`
   - Format: compact or full, bodyPreview: 1000-2000

3. get — fetch full content for significant threads (client conversations, support requests, decisions)

4. Pre-filter while scanning:
   - Skip: 2FA codes, domain expiry notices, payment receipts, Wordfence alerts, auto top-ups
   - Keep: Real human conversations, support requests, project discussions, business decisions
   - See references/prefilter-patterns.md for full skip/keep rules
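The skip/keep rules above can be sketched as a simple pattern filter. This is a hedged illustration: the pattern lists and the `prefilter` helper are hypothetical, and the authoritative rules live in references/prefilter-patterns.md.

```python
import re

# Illustrative patterns only -- the full skip/keep rules are in
# references/prefilter-patterns.md
SKIP_PATTERNS = [
    r"verification code", r"\b2fa\b", r"domain .* expir",
    r"payment receipt", r"wordfence", r"auto.?top.?up",
]
KEEP_PATTERNS = [
    r"\?",  # a question usually signals a real human conversation
    r"quote|proposal|support|project|decision|deadline",
]

def prefilter(subject: str, body_preview: str) -> bool:
    """Return True if an email looks worth mining, False if it should be skipped."""
    text = f"{subject} {body_preview}".lower()
    if any(re.search(p, text) for p in SKIP_PATTERNS):
        return False
    return any(re.search(p, text) for p in KEEP_PATTERNS)
```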
Google Chat example:
1. chat_spaces list — get all spaces with lastActiveTime
2. chat_messages list — fetch ONE space at a time, limit 25-50
   - NEVER use search_active for mining (times out on 50+ spaces)
   - Iterate space by space, save progress after each
3. Pre-filter: skip bot messages, join/leave events, webhook posts
From the fetched data, identify:
  • CLIENTS: businesses/organisations (name, domain, industry)
  • CONTACTS: people mentioned (name, email, role, company, phone if visible)
  • COMMUNICATIONS: the interaction itself (subject, participants, summary, type)
  • KNOWLEDGE: facts, decisions, preferences, commitments, relationships, deadlines
For each entity, craft a `summary` field: 1-3 sentences, dense with names and context. This is the Vectorize embedding input — make it specific and useful for semantic search.

Phase 2: Batch Write via Python Script

Once entities are identified, generate a Python script to write all Basalt files at once. This is dramatically faster than individual Write tool calls (55 files in one execution vs 8 tool calls for 17 files).
Script location: `.jez/scripts/mine-{source}-batch.py`

Script must include these helper functions:
  • `slugify(text)` — lowercase, hyphens, no special chars, max 60 chars
  • `write_client(domain, name, industry, summary, contacts, tags)`
  • `write_contact(name, email, role, company, company_domain, summary, phone, tags)`
  • `write_communication(date, subject_slug, subject, summary, participants, client_domain, comm_type, body, source_id)`
  • `write_knowledge(topic_slug, summary, kind, client_domain, contact_email, body, date)`

Key script behaviours:
  • Check `if path.exists(): return` — never overwrite existing files (dedup)
  • Use human-readable filenames (see basalt-format.md filename conventions)
  • Keep machine IDs in frontmatter (`id` field) for sync
  • Update `~/.cortex/state.json` totals and run history at the end
  • Print each file written for progress tracking
Data goes directly in the script as Python data structures — not loaded from a JSON file. Claude populates the data arrays from Phase 1 analysis:
```python
clients = [
    ("bigcolour.com.au", "Big Colour", "signage",
     "Signage company. Justin is director. Active client with L2Chat agent.",
     [("Justin Big Colour", "Director")]),
    # ... more clients
]

for domain, name, industry, summary, contacts in clients:
    write_client(domain, name, industry, summary, contacts)
```
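A minimal sketch of two of the helpers the script is expected to define, following the behaviours listed above (dedup via `path.exists()`, machine `id` kept in frontmatter). The exact frontmatter fields are assumptions here; the authoritative layout is in references/basalt-format.md.

```python
import re
from pathlib import Path

VAULT = Path.home() / "Documents" / "basalt-cortex"

def slugify(text):
    """Lowercase, hyphen-separated, no special chars, max 60 chars."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")[:60]

def write_client(domain, name, industry, summary, contacts, tags=()):
    """Write one client note; never overwrite an existing file (dedup)."""
    path = VAULT / "clients" / f"{domain}.md"
    if path.exists():
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    contact_lines = "\n".join(f"- {n} ({role})" for n, role in contacts)
    path.write_text(
        "---\n"
        f"id: client-{slugify(domain)}\n"  # machine ID kept for sync
        f"name: {name}\n"
        f"industry: {industry}\n"
        f"tags: [{', '.join(tags)}]\n"
        "---\n\n"
        f"{summary}\n\n## Contacts\n{contact_lines}\n"
    )
    print(f"wrote {path}")  # progress tracking
```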

Common Arguments

| Argument | Effect |
|----------|--------|
| `--dry-run` | Print what would be written, don't touch disk |
| `--from YYYY-MM-DD` | Only process items from this date onward |
| `--batch-size N` | Process N items per run (default: 50) |
| `--source SOURCE` | Which source to mine |
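If the batch script exposes these as real CLI flags (an assumption; in conversational use Claude parses them from the request), an argparse skeleton could look like this. Note that `--from` needs an explicit `dest` because `from` is a Python keyword.

```python
import argparse

parser = argparse.ArgumentParser(description="basalt-cortex mine (sketch)")
parser.add_argument("--dry-run", action="store_true",
                    help="print what would be written, don't touch disk")
parser.add_argument("--from", dest="from_date", metavar="YYYY-MM-DD",
                    help="only process items from this date onward")
parser.add_argument("--batch-size", type=int, default=50, metavar="N",
                    help="process N items per run")
parser.add_argument("--source", default="gmail",
                    help="which source to mine")

# Example invocation with sample arguments
args = parser.parse_args(["--dry-run", "--batch-size", "25"])
```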

Environment


| Variable | Default | Purpose |
|----------|---------|---------|
| `CORTEX_DIR` | `~/Documents/basalt-cortex` | Vault root (syncs to basaltcortex.com) |
| `CORTEX_STATE` | `~/.cortex/state.json` | Cursor + run history |
| `CORTEX_OWNER_EMAIL` | jeremy@jezweb.net | Your email — excluded from contacts |
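These can be resolved in the mining script with plain `os.environ` lookups and the documented defaults; the helper below is a sketch and its name is illustrative.

```python
import os
from pathlib import Path

def cortex_env(env=None):
    """Resolve cortex settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    home = Path.home()
    return {
        "dir": Path(env.get("CORTEX_DIR", home / "Documents" / "basalt-cortex")),
        "state": Path(env.get("CORTEX_STATE", home / ".cortex" / "state.json")),
        "owner_email": env.get("CORTEX_OWNER_EMAIL", "jeremy@jezweb.net"),
    }
```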


Query Mode

Search across Basalt files. Claude can do this natively — no script needed.

Commands

| What user says | Action |
|----------------|--------|
| "cortex search QUERY" | Grep frontmatter + content across all files |
| "what do I know about COMPANY" | Read `clients/{domain}.md` + find related comms and knowledge |
| "cortex contacts" | List all files in `contacts/` with name and email from frontmatter |
| "cortex client DOMAIN" | Full dossier — client file + linked contacts + recent comms + facts |
| "cortex export TYPE" | Export to CSV or JSON |

Search Pattern


Keyword search across all Basalt files:

```bash
grep -rl "QUERY" ~/Documents/basalt-cortex/ --include="*.md"
```

Frontmatter field search:

```bash
grep -rl "client_domain: example.com" ~/Documents/basalt-cortex/ --include="*.md"
```

For structured queries, read frontmatter with the Python `frontmatter` library or parse the YAML between `---` markers.
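When the `frontmatter` library isn't available, the YAML block can be read with a few lines of stdlib Python. This sketch handles flat `key: value` pairs only (no nested YAML), which matches the Basalt fields shown above.

```python
from pathlib import Path

def read_frontmatter(path):
    """Parse flat key: value pairs between the leading '---' markers."""
    lines = Path(path).read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter block
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta
```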

---

Stats Mode

```bash
echo "Clients:        $(find ~/Documents/basalt-cortex/clients -name '*.md' 2>/dev/null | wc -l)"
echo "Contacts:       $(find ~/Documents/basalt-cortex/contacts -name '*.md' 2>/dev/null | wc -l)"
echo "Communications: $(find ~/Documents/basalt-cortex/communications -name '*.md' 2>/dev/null | wc -l)"
echo "Knowledge:      $(find ~/Documents/basalt-cortex/knowledge -name '*.md' 2>/dev/null | wc -l)"
echo "Notes:          $(find ~/Documents/basalt-cortex/notes -name '*.md' 2>/dev/null | wc -l)"
```

Also read `state.json` for last run date, cursor positions, and run history.


Sync Mode

Files in `~/Documents/basalt-cortex/` auto-sync to basaltcortex.com via the `basalt-cortex` CLI tray daemon. No manual sync needed.

The daemon uses chokidar to watch for file changes and pushes to the API with content-hash comparison (skip unchanged files) and last-write-wins conflict resolution.

  • Start daemon: `basalt-cortex tray` (runs in system tray)
  • Manual push: `basalt-cortex push`
  • Manual pull: `basalt-cortex pull`
  • Bidirectional: `basalt-cortex sync` (watches local + polls remote every 30s)
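The daemon itself is Node (chokidar), but its skip-unchanged check can be sketched in a few lines of Python. The `needs_push` helper and the in-memory hash map are illustrative, not the daemon's actual code.

```python
import hashlib
from pathlib import Path

def content_hash(path):
    """sha256 of the file bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def needs_push(path, last_hashes):
    """True if the file changed since its last recorded hash; records the new hash."""
    h = content_hash(path)
    if last_hashes.get(str(path)) == h:
        return False  # unchanged -> skip the API push
    last_hashes[str(path)] = h
    return True
```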


Scheduling

| Method | How |
|--------|-----|
| Claude Code Cowork | Scheduled task: "Run basalt-cortex mine gmail" > Daily |
| Cron | `0 6 * * * ANTHROPIC_API_KEY=sk-... python3 ~/.jez/scripts/cortex-mine-gmail.py` |
| /loop | `/loop 24h basalt-cortex mine gmail` |

References

| When | Read |
|------|------|
| Before any file operations | references/basalt-format.md |
| When extracting semantic fields from threads | references/field-catalog.md |
| Per-source fetch and extract patterns | references/source-patterns.md |
| Before processing raw content | references/prefilter-patterns.md |
| When syncing to Frond/D1/Vectorize | references/sync-patterns.md |