basalt-cortex
Basalt Cortex
Mine knowledge from multiple sources into Obsidian-compatible markdown files stored in `~/Documents/basalt-cortex/`. Each file has structured YAML frontmatter. Files auto-sync to basaltcortex.com via the `basalt-cortex` CLI tray daemon.
Read references/basalt-format.md before any file operations.
Modes
| Mode | Trigger | What it does |
|---|---|---|
| init | "set up cortex", "cortex init" | Create vault structure, state.json, example notes |
| mine | "run the cortex", "mine emails", "mine slack" | Extract from a source, write Basalt files |
| query | "cortex search", "what do I know about" | Search across Basalt files |
| stats | "cortex stats" | Count files, show vault totals |
| sync | "cortex sync" | Push to Frond API (future — see references/sync-patterns.md) |
Init Mode
Create the vault structure. Run once before first mine.
Check for existing vault
```bash
ls ~/Documents/basalt-cortex/state.json 2>/dev/null && echo "EXISTS" || echo "FRESH"
```

If EXISTS: ask user — skip, reset (data loss warning), or continue (add missing folders only).
Create structure
```bash
mkdir -p ~/Documents/basalt-cortex/{clients,contacts,communications,knowledge,projects,notes,.obsidian}
```

Write state.json
```json
{
  "version": "1.0",
  "format": "basalt",
  "cursors": { "gmail": null, "google_chat": null, "slack": null, "calendar": null },
  "last_run": null,
  "totals": { "clients": 0, "contacts": 0, "communications": 0, "knowledge": 0 },
  "processed_source_ids": [],
  "runs": []
}
```
Write Obsidian config
```json
// ~/Documents/basalt-cortex/.obsidian/app.json
{ "newFileLocation": "folder", "newFileFolderPath": "notes" }
```
Write 3 example notes
Create one example client, contact, and knowledge note in Basalt format so the user can see the structure. Use templates from references/basalt-format.md.
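For orientation, a client note might look roughly like this. The field names below are illustrative guesses assembled from the `write_client` parameters later in this document; the authoritative templates are in references/basalt-format.md.

```markdown
---
id: client-bigcolour-com-au
type: client
domain: bigcolour.com.au
name: Big Colour
industry: signage
tags: [client]
---

Signage company. Justin is director. Active client with L2Chat agent.
```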
Report the created structure to the user. Tell them to open `~/Documents/basalt-cortex/` in Obsidian.
Mine Mode
Extract knowledge from a source and write Basalt-format files.
Source Selection
Ask or detect which source to mine:
| Source | Fetch method | Notes |
|---|---|---|
| gmail (default) | Gmail MCP ( | Use |
| google-chat | Google Chat MCP ( | Mine space-by-space, NOT |
| slack | Slack MCP or API token | MCP tools or |
| google-drive | Drive MCP or | Metadata + summaries, don't copy full docs |
| local | Read tool + Glob | |
| mcp | MCP tool calls | Any connected MCP server with searchable data |
| web | WebFetch or browser | Firecrawl, Playwright, or WebFetch |
| calendar | Calendar MCP or | Events, attendees, meeting notes |
Proven Extraction Workflow (Two-Phase)
Mining works in two phases. Phase 1 (reconnaissance) uses MCP tools interactively. Phase 2 (batch write) generates a Python script for efficiency.
Phase 1: Reconnaissance via MCP (interactive)
Use MCP tools to fetch raw data and identify entities. Claude does the AI extraction in-context — no external LLM call needed.
Gmail example:
1. extract_contacts — scan 100 recent emails, get deduplicated contacts with names/emails/counts
- Use `field: "from"` for inbound contacts
- Use `field: "to"` for outbound contacts (from sent mail)
- Exclude automated domains: jezweb.net, google.com, github.com, cloudflare.com, etc.
2. list — fetch 30-50 emails per batch with bodyPreview
- Query: `in:inbox -category:promotions -category:social -category:updates -category:forums after:YYYY/MM/DD`
- Format: compact or full, bodyPreview: 1000-2000
3. get — fetch full content for significant threads (client conversations, support requests, decisions)
4. Pre-filter while scanning:
- Skip: 2FA codes, domain expiry notices, payment receipts, Wordfence alerts, auto top-ups
- Keep: Real human conversations, support requests, project discussions, business decisions
- See references/prefilter-patterns.md for full skip/keep rules

Google Chat example:
1. chat_spaces list — get all spaces with lastActiveTime
2. chat_messages list — fetch ONE space at a time, limit 25-50
- NEVER use search_active for mining (times out on 50+ spaces)
- Iterate space by space, save progress after each
3. Pre-filter: skip bot messages, join/leave events, webhook posts

From the fetched data, identify:
- CLIENTS: businesses/organisations (name, domain, industry)
- CONTACTS: people mentioned (name, email, role, company, phone if visible)
- COMMUNICATIONS: the interaction itself (subject, participants, summary, type)
- KNOWLEDGE: facts, decisions, preferences, commitments, relationships, deadlines
For each entity, craft a `summary` field: 1-3 sentences, dense with names and context. This is the Vectorize embedding input — make it specific and useful for semantic search.
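The skip/keep rules above amount to a predicate applied while scanning. A minimal sketch follows; the regex patterns here are illustrative only, and the authoritative list lives in references/prefilter-patterns.md.

```python
import re

# Illustrative skip patterns; the full list is in references/prefilter-patterns.md.
SKIP_PATTERNS = [
    r"\b(2fa|verification code|one-time code)\b",
    r"\bdomain (expiry|renewal) notice\b",
    r"\b(payment receipt|auto top-up)\b",
    r"\bwordfence\b",
]

def keep_for_mining(subject: str, body_preview: str) -> bool:
    """Return True if a message looks like a real human conversation
    rather than an automated notification."""
    text = f"{subject}\n{body_preview}".lower()
    return not any(re.search(p, text) for p in SKIP_PATTERNS)
```

Run the predicate over each fetched item before spending context on full-thread reads.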
Phase 2: Batch Write via Python Script
Once entities are identified, generate a Python script to write all Basalt files at once. This is dramatically faster than individual Write tool calls (55 files in one execution vs 8 tool calls for 17 files).
Script location: `.jez/scripts/mine-{source}-batch.py`
Script must include these helper functions:
- `slugify(text)` — lowercase, hyphens, no special chars, max 60 chars
- `write_client(domain, name, industry, summary, contacts, tags)`
- `write_contact(name, email, role, company, company_domain, summary, phone, tags)`
- `write_communication(date, subject_slug, subject, summary, participants, client_domain, comm_type, body, source_id)`
- `write_knowledge(topic_slug, summary, kind, client_domain, contact_email, body, date)`
Key script behaviours:
- Check `if path.exists(): return` — never overwrite existing files (dedup)
- Use human-readable filenames (see basalt-format.md filename conventions)
- Keep machine IDs in frontmatter (`id` field) for sync
- Update `~/.cortex/state.json` totals and run history at the end
- Print each file written for progress tracking
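A minimal sketch of `slugify` and `write_client` honouring these behaviours. The exact frontmatter layout and filename scheme below are assumptions; the real conventions live in references/basalt-format.md.

```python
import re
from pathlib import Path

VAULT = Path.home() / "Documents" / "basalt-cortex"

def slugify(text: str) -> str:
    """Lowercase, hyphens, no special chars, max 60 chars."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:60]

def write_client(domain, name, industry, summary, contacts, tags=()):
    """Write one client note; skip silently if it already exists (dedup)."""
    path = VAULT / "clients" / f"{slugify(name)}.md"  # human-readable filename
    if path.exists():
        return  # never overwrite existing files
    frontmatter = "\n".join([
        "---",
        f"id: client-{slugify(domain)}",  # machine ID kept for sync
        f"domain: {domain}",
        f"name: {name}",
        f"industry: {industry}",
        "tags: [" + ", ".join(tags) + "]",
        "---",
    ])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{frontmatter}\n\n{summary}\n")
    print(f"wrote {path}")  # progress tracking
```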
Data goes directly in the script as Python data structures — not loaded from a JSON file. Claude populates the data arrays from Phase 1 analysis:
```python
clients = [
    ("bigcolour.com.au", "Big Colour", "signage",
     "Signage company. Justin is director. Active client with L2Chat agent.",
     [("Justin Big Colour", "Director")]),
    # ... more clients
]

for domain, name, industry, summary, contacts in clients:
    write_client(domain, name, industry, summary, contacts)
```
Common Arguments
| Argument | Effect |
|---|---|
| Print what would be written, don't touch disk |
| Only process items from this date onward |
| Process N items per run (default: 50) |
| Which source to mine |
Environment
| Variable | Default | Purpose |
|---|---|---|
| | Vault root (syncs to basaltcortex.com) |
| | Cursor + run history |
| | Your email — excluded from contacts |
Query Mode
Search across Basalt files. Claude can do this natively — no script needed.
Commands
| What user says | Action |
|---|---|
| "cortex search QUERY" | Grep frontmatter + content across all files |
| "what do I know about COMPANY" | Read |
| "cortex contacts" | List all files in |
| "cortex client DOMAIN" | Full dossier — client file + linked contacts + recent comms + facts |
| "cortex export TYPE" | Export to CSV or JSON |
Search Pattern
```bash
# Keyword search across all Basalt files
grep -rl "QUERY" ~/Documents/basalt-cortex/ --include="*.md"

# Frontmatter field search
grep -rl "client_domain: example.com" ~/Documents/basalt-cortex/ --include="*.md"
```

For structured queries, read frontmatter with Python `frontmatter` library or parse YAML between `---` markers.
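For flat `key: value` frontmatter, a stdlib-only parser is enough; a sketch (the `client_domain` field name is taken from the grep example above, everything else is illustrative):

```python
from pathlib import Path

def read_frontmatter(path: Path) -> dict:
    """Parse simple 'key: value' pairs between the leading --- markers.
    For nested YAML, prefer the python-frontmatter or PyYAML packages."""
    lines = path.read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter block
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def notes_for_client(vault: Path, domain: str) -> list[Path]:
    """Every note whose frontmatter links it to the given client domain."""
    return [
        p for p in vault.rglob("*.md")
        if read_frontmatter(p).get("client_domain") == domain
    ]
```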
Stats Mode
```bash
echo "Clients: $(find ~/Documents/basalt-cortex/clients -name '*.md' 2>/dev/null | wc -l)"
echo "Contacts: $(find ~/Documents/basalt-cortex/contacts -name '*.md' 2>/dev/null | wc -l)"
echo "Communications: $(find ~/Documents/basalt-cortex/communications -name '*.md' 2>/dev/null | wc -l)"
echo "Knowledge: $(find ~/Documents/basalt-cortex/knowledge -name '*.md' 2>/dev/null | wc -l)"
echo "Notes: $(find ~/Documents/basalt-cortex/notes -name '*.md' 2>/dev/null | wc -l)"
```

Also read `state.json` for last run date, cursor positions, and run history.
Sync Mode
Files in `~/Documents/basalt-cortex/` auto-sync to basaltcortex.com via the `basalt-cortex` CLI tray daemon. No manual sync needed.
The daemon uses chokidar to watch for file changes and pushes to the API with content hash comparison (skip unchanged files) and last-write-wins conflict resolution.
Start daemon: `basalt-cortex tray` (runs in system tray)
Manual push: `basalt-cortex push`
Manual pull: `basalt-cortex pull`
Bidirectional: `basalt-cortex sync` (watches local + polls remote every 30s)
Scheduling
| Method | How |
|---|---|
| Claude Code Cowork | Scheduled task: "Run basalt-cortex mine gmail" > Daily |
| Cron | |
| |
References
| When | Read |
|---|---|
| Before any file operations | references/basalt-format.md |
| When extracting semantic fields from threads | references/field-catalog.md |
| Per-source fetch and extract patterns | references/source-patterns.md |
| Before processing raw content | references/prefilter-patterns.md |
| When syncing to Frond/D1/Vectorize | references/sync-patterns.md |