Cold Start — Day-One Brain Bootstrapping

You have a working brain. Search works. Now what?
An empty brain is a static database. A brain with your email history, calendar, contacts, conversations, and social media is a live context membrane that makes every future interaction smarter. This skill sequences the highest-leverage data sources to get you from zero to useful in one session.

Contract

  • Every import phase is gated on user consent (ask-user pattern) before proceeding.
  • Google/social API access goes through ClawVisor. The agent never holds raw OAuth tokens or API keys. This is a safety requirement, not a preference. ClawVisor vaults credentials, enforces task-scoped authorization, logs every API call, and requires human approval for destructive operations. If the user doesn't want ClawVisor, the only safe alternative is offline file exports (Google Takeout, Twitter archive download).
  • Each phase is independently valuable — the user can stop after any phase and still have a useful brain.
  • Progress is tracked in `~/.gbrain/cold-start-state.json` so interrupted sessions can resume.
  • Entity detection and cross-linking run on every import, not as a separate pass.

Prerequisites

  • GBrain installed and initialized (`gbrain doctor --json` all green)
  • Brain repo cloned and synced
  • Agent has terminal access and can run `gbrain` CLI commands

The Priority Stack

Data sources ranked by information density × ease of import:
| Priority | Source | Why | Time | Pages created |
|---|---|---|---|---|
| 1 | Existing markdown/Obsidian | Highest density: it's already structured | 5 min | 100s-1000s |
| 2 | Google Contacts | Seeds the `people/` directory: names, emails, companies | 10 min | 50-500 |
| 3 | Google Calendar (90 days) | Meeting history with attendee context | 15 min | 30-90 |
| 4 | Gmail (recent threads) | Relationship context, active threads, org-chart signals | 20 min | 50-200 |
| 5 | Conversations (ChatGPT/Claude exports) | Your thinking, questions, mental models | 15 min | 10-100 |
| 6 | X/Twitter archive | Your public positions, takes, engagement patterns | 20 min | 30-365 |
| 7 | File archives (Dropbox/Drive/local) | Historical documents, old writing, photos | 30+ min | Varies |
| 8 | Meeting transcripts (Circleback, etc.) | Deep relationship context from recorded calls | 20 min | 10-50 |

Phase 0: ClawVisor Setup (Required for API Access)

Safety boundary: An AI agent with raw OAuth tokens to your Gmail, Calendar, and Contacts is an uncontrolled attack surface. One prompt injection, one malicious tool call, and your entire Google account is exposed. ClawVisor eliminates this risk class entirely.
ClawVisor is a credential gateway that sits between the agent and your APIs. The agent never sees your credentials — ClawVisor injects them at request time, enforces policies, and logs everything.
What ClawVisor gives you:
  • Credential vaulting — agent sees shadow tokens, never real secrets
  • Task-scoped authorization — each workflow declares exactly what it needs
  • Audit trail — every API call logged with metadata (who, what, when)
  • Human approval gates — destructive operations (send email, modify calendar) require your explicit approval
  • Multi-service — Gmail, Calendar, Contacts, Drive, GitHub, iMessage from one gateway
  • Revocation — disable the agent's access in one click, no token rotation needed
Setup (15 min):
  1. Sign up at app.clawvisor.com
  2. Create an agent in the dashboard, copy the agent token
  3. Store the ClawVisor URL and agent token in the GBrain config:
    ```bash
    gbrain config set clawvisor_url "https://app.clawvisor.com"
    gbrain config set clawvisor_agent_token "<token>"
    ```
  4. Activate Google services (Gmail, Calendar, Contacts) in the dashboard
  5. Create a standing task with expansive scope:
    "Full brain bootstrapping: read emails, calendar events, and contacts to populate knowledge base. List, read, and search across all connected accounts."
  6. Save the standing task ID:
    ```bash
    gbrain config set clawvisor_task_id "<task_id>"
    ```
Critical scoping rule: Be expansive in task purposes. "Email triage" gets rejected by intent verification. "Full executive assistant email management including inbox triage, searching by any criteria, reading emails, tracking threads" works. The intent model uses the purpose to judge each request.

If the user declines ClawVisor

Do NOT fall back to direct OAuth. Instead, skip Phases 2-4 (Contacts, Calendar, Gmail) and proceed with offline-only imports:
  • Phase 1 (markdown/Obsidian) — works without any API access
  • Phase 5 (conversation exports) — works from downloaded JSON files
  • Phase 6 (X/Twitter) — works from downloaded archive
  • Phase 7 (file archives) — works from local files
  • Phase 8 (meeting transcripts) — works from exported transcripts
Tell the user:
"No problem. We'll skip the Google imports for now and work with file-based sources. You can set up ClawVisor anytime to unlock Contacts, Calendar, and Gmail imports safely."
Do NOT offer direct OAuth as an alternative. An agent holding raw Google tokens is a security liability. The skill should not teach agents to store credentials they shouldn't have.
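The phase gating above can be sketched as a small filter. This is a minimal illustration and not part of GBrain. `runnablePhases` is a hypothetical helper, and its phase numbers are taken from this document's phase list.

```javascript
// Phases 2-4 (Contacts, Calendar, Gmail) need Google API access through ClawVisor;
// phases 1 and 5-8 work from local files or downloaded exports.
const ALL_PHASES = [1, 2, 3, 4, 5, 6, 7, 8];
const API_PHASES = new Set([2, 3, 4]);

// Hypothetical helper: without ClawVisor, drop the API phases instead of
// falling back to raw OAuth.
function runnablePhases(hasClawVisor) {
  return ALL_PHASES.filter((phase) => hasClawVisor || !API_PHASES.has(phase));
}

console.log(runnablePhases(false)); // [ 1, 5, 6, 7, 8 ]
```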

Phase 1: Existing Markdown / Obsidian Import

The highest-leverage first import. If the user already has a notes system, this is hundreds or thousands of structured pages ready to go.

Discovery

```bash
echo "=== Markdown Repository Discovery ==="
# Globs that match nothing stay literal; the -d test below filters them out.
for dir in /data/* ~/git/* ~/Documents/* ~/notes/* ~/obsidian/*; do
  if [ -d "$dir" ]; then
    md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" \
      -not -path "*/.git/*" -not -path "*/.obsidian/*" 2>/dev/null | wc -l | tr -d ' ')
    # Only report directories with more than a handful of notes
    if [ "$md_count" -gt 5 ]; then
      total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
      echo "  $dir ($total_size, $md_count .md files)"
    fi
  fi
done
```

Import

```bash
# For Obsidian vaults, use the migrate skill for proper wikilink handling
gbrain migrate --from obsidian --path /path/to/vault

# For plain markdown directories
gbrain import /path/to/dir --no-embed --workers 4

# Verify
gbrain stats
gbrain search "<topic from the imported data>"
```

Post-import

  • Run link extraction: `gbrain extract links --source db`
  • Run timeline extraction: `gbrain extract timeline --source db`
  • Start embeddings: `gbrain embed --stale` (runs in the background)
Track progress:
```bash
echo '{"phase_1_complete": true, "pages_imported": N}' > ~/.gbrain/cold-start-state.json
```
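The `echo ... >` command above replaces the whole state file each time, so a later phase would erase earlier progress. Below is a minimal read-merge-write sketch. It assumes the field names from the state example in the Post-Bootstrap Checklist (`phases_completed`, `phases_skipped`, `total_pages_created`, `next_phase`). `markPhaseComplete` is a hypothetical helper, not a `gbrain` command.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const STATE_PATH = path.join(os.homedir(), '.gbrain', 'cold-start-state.json');
const PHASES = [1, 2, 3, 4, 5, 6, 7, 8];

// Hypothetical helper: record a finished phase by merging into the existing
// state instead of overwriting it.
function markPhaseComplete(phase, pagesCreated, statePath = STATE_PATH) {
  const state = fs.existsSync(statePath)
    ? JSON.parse(fs.readFileSync(statePath, 'utf8'))
    : { started: new Date().toISOString() };

  const done = new Set(state.phases_completed || []);
  done.add(phase);
  state.phases_completed = [...done].sort((a, b) => a - b);
  state.total_pages_created = (state.total_pages_created || 0) + pagesCreated;

  // next_phase = first phase neither completed nor skipped (null when finished)
  const skipped = new Set(state.phases_skipped || []);
  state.next_phase = PHASES.find((p) => !done.has(p) && !skipped.has(p)) ?? null;

  fs.mkdirSync(path.dirname(statePath), { recursive: true });
  fs.writeFileSync(statePath, JSON.stringify(state, null, 2) + '\n');
  return state;
}
```

After Phase 1, this would be called with the real page count, e.g. `markPhaseComplete(1, pagesImported)`.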

Phase 2: Google Contacts → People Pages

Seeds the people/ directory. Every person in your contacts becomes a brain page with name, email, phone, company, and notes. This is the foundation that all other imports build on — when Gmail references "john@acme.com", the brain already knows who John is.

Via ClawVisor

```javascript
// Fetch contacts via ClawVisor (capped at 1000 by `limit`)
const contacts = await clawvisor('google.contacts', 'list_contacts', {
  limit: 1000,
  fields: 'names,emailAddresses,phoneNumbers,organizations,biographies'
});
```

Via direct Google People API

```bash
curl -s -H "Authorization: Bearer $GOOGLE_TOKEN" \
  "https://people.googleapis.com/v1/people/me/connections?personFields=names,emailAddresses,phoneNumbers,organizations,biographies&pageSize=1000"
```

Processing rules

For each contact:
  1. Filter out noise — skip contacts with no name, no email, or that are clearly automated (noreply@, no-reply@, support@, notifications@)
  2. Check brain first: run `gbrain search "name"` to avoid duplicates
  3. Create people/ page with:
    • Name, email(s), phone(s), company, title
    • Source attribution: `[Source: Google Contacts, YYYY-MM-DD]`
    • Any notes from the contact as initial context
  4. Link to company — if the contact has an organization, create/update the company page and link the person to it
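Rules 1 and 3 can be sketched as pure functions. The sketch assumes contacts arrive in the Google People API `Person` shape (`names[].displayName`, `emailAddresses[].value`, `phoneNumbers[].value`, `organizations[].name`/`.title`, `biographies[].value`). Whether ClawVisor returns exactly this shape is an assumption. The page layout, `slugify`, and the other helper names are illustrative. The duplicate check (rule 2) and company linking (rule 4) are left to the brain tools.

```javascript
// Automated-sender prefixes from rule 1.
const AUTOMATED_PREFIXES = ['noreply@', 'no-reply@', 'support@', 'notifications@'];

// Rule 1: skip contacts with no name, no email, or only automated addresses.
function isNoiseContact(person) {
  const name = person.names?.[0]?.displayName;
  const emails = (person.emailAddresses || []).map((e) => e.value.toLowerCase());
  if (!name || emails.length === 0) return true;
  return emails.every((email) => AUTOMATED_PREFIXES.some((p) => email.startsWith(p)));
}

// "Jane Smith" -> "jane-smith" (accents folded, other punctuation collapsed to hyphens).
function slugify(name) {
  return name
    .toLowerCase()
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Rule 3: build the page body, ending with the source attribution line.
function contactToPage(person, importDate) {
  const name = person.names[0].displayName;
  const org = person.organizations?.[0];
  const lines = [`# ${name}`, ''];
  for (const e of person.emailAddresses || []) lines.push(`- Email: ${e.value}`);
  for (const p of person.phoneNumbers || []) lines.push(`- Phone: ${p.value}`);
  if (org?.name) lines.push(`- Company: ${org.name}`);
  if (org?.title) lines.push(`- Title: ${org.title}`);
  const notes = person.biographies?.[0]?.value;
  if (notes) lines.push('', notes);
  lines.push('', `[Source: Google Contacts, ${importDate}]`);
  return { path: `people/${slugify(name)}.md`, body: lines.join('\n') + '\n' };
}
```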

Quality gate

After importing 5 contacts, pause and show the user a sample page. Ask:
"Here's what a contact page looks like. Want me to continue with the rest, or adjust the format first?"

Phase 3: Google Calendar (Last 90 Days)

Meeting history with attendee context. Calendar events reveal who the user meets with, how often, and in what context. Combined with contacts, this builds a rich relationship map.

Fetch events

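```javascript
// Via ClawVisor: query ALL calendar accounts (example addresses; use the user's own)
const accounts = ['primary@gmail.com', 'work@company.com'];
const timeMin = new Date(Date.now() - 90 * 86400000).toISOString(); // 90 days ago
const timeMax = new Date().toISOString();
const eventsByAccount = {};
for (const account of accounts) {
  // Keep each account's results instead of discarding them at the end of the loop body
  eventsByAccount[account] = await clawvisor(`google.calendar:${account}`, 'list_events', {
    timeMin,
    timeMax,
    singleEvents: true,   // expand recurring events into individual instances
    orderBy: 'startTime'  // the Calendar API only allows this ordering with singleEvents
  });
}
```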

Brain structure

Follow the three-tier calendar architecture:
```text
brain/daily/calendar/
├── calendar-log.md              ← compiled truth (patterns, key people)
├── YYYY/
│   ├── YYYY-MM.md               ← monthly summary
│   └── YYYY-MM-DD.md            ← daily event log
```
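Here is a small sketch of how an event maps onto the three tiers. It assumes Google Calendar API event objects, where timed events carry `start.dateTime` and all-day events carry `start.date`. The helper names are hypothetical. Taking the first ten characters keeps the date in the event's own time zone instead of converting to UTC.

```javascript
// Date part of a Calendar API event: timed events carry start.dateTime
// (with the event's UTC offset), all-day events carry start.date.
function eventDate(event) {
  return (event.start.dateTime || event.start.date).slice(0, 10);
}

// Hypothetical helper: the three files an event on this YYYY-MM-DD date touches.
function calendarPaths(date) {
  const [yyyy, mm] = date.split('-');
  const root = 'brain/daily/calendar';
  return {
    compiled: `${root}/calendar-log.md`,         // patterns, key people
    monthly: `${root}/${yyyy}/${yyyy}-${mm}.md`, // monthly summary
    daily: `${root}/${yyyy}/${date}.md`,         // daily event log
  };
}
```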

Entity enrichment

For each event with attendees:
  1. Look up each attendee in the brain (they should exist from Phase 2)
  2. Add a timeline entry to their page: met at [event title] on [date]
  3. If an attendee has no brain page and appears in 3+ events, create one
  4. Link attendees who appear in the same meeting
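Steps 2-4 can be sketched in one pass over the events. The sketch assumes Calendar API attendee objects: `email`, plus `self` for the user and `resource` for rooms. It also assumes a set of lowercase addresses that already have pages. `enrichFromEvents` and its return shape are hypothetical, and writing the results to pages is left to the brain tools.

```javascript
// Sketch of steps 2-4 for Google Calendar API events. knownEmails holds the
// lowercase addresses of people who already have brain pages.
function enrichFromEvents(events, knownEmails, minEventsForNewPage = 3) {
  const timeline = [];           // step 2: timeline entries for existing pages
  const appearances = new Map(); // email -> number of events attended
  const coAttendees = new Set(); // step 4: "a|b" pairs who shared a meeting

  for (const event of events) {
    const date = (event.start.dateTime || event.start.date).slice(0, 10);
    // Skip the user (self) and rooms (resource); dedupe addresses per event.
    const emails = [...new Set((event.attendees || [])
      .filter((a) => a.email && !a.self && !a.resource)
      .map((a) => a.email.toLowerCase()))].sort();

    for (const email of emails) {
      appearances.set(email, (appearances.get(email) || 0) + 1);
      if (knownEmails.has(email)) {
        timeline.push({ email, entry: `met at ${event.summary} on ${date}` });
      }
    }
    for (let i = 0; i < emails.length; i++) {
      for (let j = i + 1; j < emails.length; j++) coAttendees.add(`${emails[i]}|${emails[j]}`);
    }
  }

  // Step 3: attendees without a page who appear in enough events to earn one.
  const newPages = [...appearances]
    .filter(([email, count]) => !knownEmails.has(email) && count >= minEventsForNewPage)
    .map(([email]) => email);

  return { timeline, newPages, coAttendees: [...coAttendees] };
}
```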

Phase 4: Gmail (Recent Threads)

Relationship context and active threads. Email reveals organizational relationships, ongoing conversations, and communication patterns.

Strategy: Smart sampling, not bulk import

Don't import every email. Import the signal:
  1. Sent mail (last 30 days) — who the user actively communicates with
  2. Starred/important emails — user-curated signal
  3. Threads with 3+ replies — active conversations worth tracking
  4. Emails from people already in the brain — enrichment, not cold import
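The four sampling rules can be sketched as a single predicate over a thread. The thread shape is a simplified assumption, not the Gmail API response. Each message has a bare `from` address, Gmail system `labelIds` (`SENT`, `STARRED`, `IMPORTANT`), and a `date` in epoch milliseconds. "3+ replies" is read as at least four messages in the thread.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Which sampling rule (1-4 above) a thread matches, or null to leave it out.
// Assumed shape: { messages: [{ from, labelIds, date }] }.
function samplingReason(thread, knownEmails, now = Date.now()) {
  const msgs = thread.messages;
  if (msgs.some((m) => m.labelIds.includes('SENT') && now - m.date <= 30 * DAY_MS)) {
    return 'sent-last-30-days';
  }
  if (msgs.some((m) => m.labelIds.includes('STARRED') || m.labelIds.includes('IMPORTANT'))) {
    return 'starred-or-important';
  }
  if (msgs.length - 1 >= 3) return 'active-thread'; // 3+ replies after the opener
  if (msgs.some((m) => knownEmails.has(m.from.toLowerCase()))) return 'known-person';
  return null;
}
```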

Processing

For each email thread:
  1. Entity detection — extract people, companies mentioned
  2. Update people pages — add communication context to timeline
  3. Create meeting pages — if the email is a meeting summary or follow-up
  4. Skip noise — newsletters, automated notifications, marketing

Filtering rules

Auto-skip (never import):
  • noreply@, no-reply@, notifications@, support@, mailer-daemon@
  • Unsubscribe-heavy senders (marketing)
  • GitHub/Jira/Linear notification emails
  • Calendar invites (already captured in Phase 3)
Always import:
  • Direct emails from people in the brain
  • Starred/flagged emails
  • Emails the user sent (their words are highest-value signal)
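Here is a sketch of the filtering rules as a per-message classifier. Sent mail is checked first, since the user is never an automated sender. The auto-skip checks then run before the remaining always-import rules, because the list says "never import". The "unsubscribe-heavy" rule is replaced by a simpler heuristic: a `hasListUnsubscribe` flag, for example derived from the `List-Unsubscribe` header. The tool domains, `isCalendarInvite`, and the other field names are assumptions.

```javascript
// Auto-skip address prefixes from the list above.
const SKIP_PREFIXES = ['noreply@', 'no-reply@', 'notifications@', 'support@', 'mailer-daemon@'];
// Example notification domains for GitHub/Jira/Linear (assumption; adjust per user).
const TOOL_DOMAINS = ['github.com', 'atlassian.net', 'linear.app'];

// Returns 'import', 'skip', or 'review' for one message.
// Assumed fields: from (bare address), labelIds (Gmail system labels),
// isCalendarInvite, hasListUnsubscribe.
function classifyEmail(msg, knownEmails) {
  if (msg.labelIds.includes('SENT')) return 'import'; // the user's own words

  const from = msg.from.toLowerCase();
  const domain = from.split('@')[1] || '';
  if (SKIP_PREFIXES.some((p) => from.startsWith(p))) return 'skip';
  if (TOOL_DOMAINS.some((d) => domain === d || domain.endsWith(`.${d}`))) return 'skip';
  if (msg.isCalendarInvite) return 'skip';   // already captured in Phase 3
  if (msg.hasListUnsubscribe) return 'skip'; // bulk/marketing mail (heuristic)

  if (msg.labelIds.includes('STARRED')) return 'import';
  if (knownEmails.has(from)) return 'import';
  return 'review';
}
```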

Phase 5: Conversation Exports (ChatGPT / Claude / Perplexity)

Your thinking, captured. AI conversation exports reveal what the user was researching, building, and thinking about. This is original thinking preserved in dialog form.

Supported formats

  • ChatGPT: Settings → Data Controls → Export → `conversations.json`
  • Claude: Download from claude.ai conversation history
  • Perplexity: Export from settings

Processing

For each conversation:
  1. Assess significance (1-5 scale):
    • 1 = Pure utility (how-tos, quick lookups) → skip or minimal page
    • 2 = Minor context → 1-paragraph note
    • 3 = Notable (reveals interests, building something) → full page
    • 4 = Important (deep personal processing, strategic thinking) → rich page
    • 5 = Defining (identity work, breakthrough insights) → full treatment
  2. Extract entities — people, companies, concepts discussed
  3. Capture original thinking — the user's exact phrasing is the signal. Never paraphrase.
  4. File by primary subject — not in a "conversations/" dump. A conversation about a person goes to people/, about a concept goes to concepts/, etc.
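For ChatGPT exports, rule 3 starts with pulling out the user's turns verbatim. This sketch assumes a `conversations.json` layout in which each conversation has a `mapping` of node IDs to `{ message, parent }` and a `current_node` pointing at the last message of the active branch. That layout is not specified in this skill and may change. Walking parent links from `current_node` keeps only the branch the user ended on, so edited-away branches are left out.

```javascript
// Collect the user's own messages, verbatim and in order, from one conversation
// object in ChatGPT's conversations.json (assumed layout: mapping + current_node).
function userMessages(conversation) {
  const texts = [];
  let nodeId = conversation.current_node;
  while (nodeId) {
    const node = conversation.mapping[nodeId];
    if (!node) break; // dangling parent reference
    const msg = node.message;
    if (msg && msg.author && msg.author.role === 'user') {
      // parts can hold non-text items (e.g. image references); keep only strings
      const parts = ((msg.content && msg.content.parts) || []).filter((p) => typeof p === 'string');
      const text = parts.join('\n').trim();
      if (text) texts.push(text); // exact phrasing, never paraphrased
    }
    nodeId = node.parent;
  }
  return texts.reverse(); // the walk goes leaf -> root, so flip to chronological order
}
```

Since the export holds an array of conversations, `JSON.parse(text).map(userMessages)` yields one list per conversation. Rating and filing come afterwards.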

Quality rule

Only import conversations rated 3+. The brain is for signal, not noise.
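The significance scale and the quality rule combine into a routing step. In this sketch the quality rule takes precedence, so levels 1 and 2 are skipped even though the scale lists lighter treatments for them. The subject-type keys and `planConversation` are hypothetical. The `companies/` directory comes from the sample output later in this skill.

```javascript
// Treatments for the levels that pass the quality rule (3+).
const TREATMENT = { 3: 'full page', 4: 'rich page', 5: 'full treatment' };
// Subject type -> brain directory (people/ and concepts/ from the rules above,
// companies/ from the sample output; other types are left to the caller).
const SUBJECT_DIRS = { person: 'people', company: 'companies', concept: 'concepts' };

function planConversation(significance, subjectType, slug) {
  if (!Number.isInteger(significance) || significance < 1 || significance > 5) {
    throw new RangeError(`significance must be an integer 1-5, got ${significance}`);
  }
  if (significance < 3) return { action: 'skip' }; // quality rule: 3+ only
  const dir = SUBJECT_DIRS[subjectType];
  if (!dir) throw new Error(`no directory mapping for subject type "${subjectType}"`);
  return { action: TREATMENT[significance], path: `${dir}/${slug}.md` };
}
```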

Phase 6: X/Twitter Archive

Your public positions and engagement patterns. Twitter reveals what the user thinks, who they engage with, and what ideas they're developing publicly.

Data sources

  1. Twitter data export (Settings → Your Account → Download Archive)
    • Contains all tweets, likes, DMs, bookmarks
  2. Live API (if available) — recent tweets and engagement
  3. Bookmarks — curated signal, high value

Brain structure

```text
brain/media/x/{handle}/
├── x-log.md                     ← compiled truth (themes, voice, key threads)
├── daily/YYYY-MM-DD.md          ← daily tweet log
├── monthly/YYYY-MM.md           ← monthly rollup
└── bookmarks/                   ← saved/bookmarked content
```

Processing

  • Original tweets → capture with full context, extract entities
  • Quote tweets → capture the user's commentary + the source tweet
  • Threads → reconstruct as a single narrative
  • Bookmarks → high-signal curation, import with tags
  • Likes → low signal, skip unless the user wants them
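Thread reconstruction can be sketched as following the user's self-replies. The sketch assumes the archive's tweet file (`tweets.js` in recent exports, to our knowledge) is a JavaScript assignment that wraps an array of `{ tweet: { id_str, full_text, in_reply_to_status_id_str, in_reply_to_user_id_str } }` entries. Check those field names against the actual download. `ownUserId` is the user's numeric account ID as a string.

```javascript
// The tweet file is a JS assignment (e.g. "window.YTD.tweets.part0 = [...]");
// drop everything up to the first "=" and parse the rest as JSON.
function parseTweetsJs(text) {
  return JSON.parse(text.slice(text.indexOf('=') + 1)).map((entry) => entry.tweet);
}

// Rebuild self-reply chains into one narrative per thread.
function reconstructThreads(tweets, ownUserId) {
  const byId = new Map(tweets.map((t) => [t.id_str, t]));
  const nextInThread = new Map(); // parent id -> the user's reply that continues it
  for (const t of tweets) {
    const parentId = t.in_reply_to_status_id_str;
    if (parentId && t.in_reply_to_user_id_str === ownUserId && byId.has(parentId)) {
      nextInThread.set(parentId, t);
    }
  }
  const continuations = new Set([...nextInThread.values()].map((t) => t.id_str));
  const threads = [];
  for (const t of tweets) {
    if (continuations.has(t.id_str)) continue; // only start walking from thread roots
    const chain = [t];
    for (let next = nextInThread.get(t.id_str); next; next = nextInThread.get(next.id_str)) {
      chain.push(next);
    }
    if (chain.length > 1) threads.push(chain.map((x) => x.full_text).join('\n\n'));
  }
  return threads;
}
```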

Phase 7: File Archives

Historical documents, old writing, photos with metadata. This is the long tail — less structured but potentially very high value (old journals, letters, early writing).
Delegate to the `archive-crawler` skill. It handles:
  • Crawling directory structures
  • Filtering for high-value content (user's own writing, not installers)
  • Text extraction from PDFs, images (OCR), documents
  • Entity extraction and brain page creation
Safety gate: Archive crawling can be slow and create many pages. Always start with a scan-only pass:
```bash
gbrain archive-crawler --scan-only --path /path/to/archive
```
Show the user the manifest before proceeding with full ingestion.
Supported sources:
  • Local directories (Dropbox sync folder, Google Drive, old hard drives)
  • Cloud storage (Backblaze B2, S3) via mounted paths
  • Email archives (PST, mbox, EML, Google Takeout)
  • Data exports (LinkedIn, Facebook, etc.)
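The archive-crawler skill owns the real logic. The sketch below only illustrates what a scan-only pass might produce: a list of files and per-category counts, with nothing read or ingested. The extension buckets are illustrative assumptions, not the crawler's rules.

```javascript
const fs = require('fs');
const path = require('path');

// Illustrative extension buckets; the archive-crawler skill's real rules may differ.
const BUCKETS = {
  writing: ['.md', '.txt', '.doc', '.docx', '.rtf'],
  document: ['.pdf'],
  image: ['.jpg', '.jpeg', '.png', '.heic'],
  email: ['.pst', '.mbox', '.eml'],
  skip: ['.dmg', '.exe', '.pkg', '.msi', '.iso'], // installers and disk images
};

function bucketFor(file) {
  const ext = path.extname(file).toLowerCase();
  for (const [bucket, exts] of Object.entries(BUCKETS)) {
    if (exts.includes(ext)) return bucket;
  }
  return 'other';
}

// Recursively list file paths without opening any file (scan only).
// Symlinks are neither followed nor listed.
function listFiles(dir) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      for (const f of listFiles(full)) out.push(f);
    } else if (entry.isFile()) {
      out.push(full);
    }
  }
  return out;
}

// Per-bucket counts to show the user before any ingestion starts.
function buildManifest(files) {
  const counts = {};
  for (const file of files) {
    const bucket = bucketFor(file);
    counts[bucket] = (counts[bucket] || 0) + 1;
  }
  return counts;
}
```

A scan of one archive would then be `buildManifest(listFiles('/path/to/archive'))`.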

Phase 8: Meeting Transcripts

Deep relationship context from recorded calls. If the user has a meeting recording service (Circleback, Otter, Fireflies, Read.ai), import recent transcripts.
Delegate to the `meeting-ingestion` skill. Key rules:
  • Always pull the complete transcript, not just the AI summary
  • Entity propagation is MANDATORY — every attendee gets a timeline update
  • A meeting is NOT fully ingested until all entity pages are updated
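The completeness rule can be expressed as a check that runs before a meeting is marked done. The `hasFullTranscript` and `attendeePages` fields are hypothetical. The meeting-ingestion skill defines the real record.

```javascript
// A meeting is only "ingested" once the full transcript was pulled and every
// attendee page has its timeline update. Field names are hypothetical.
function ingestionStatus(meeting, updatedPages) {
  if (!meeting.hasFullTranscript) {
    return { done: false, missing: [], reason: 'only the AI summary was pulled' };
  }
  const missing = meeting.attendeePages.filter((page) => !updatedPages.has(page));
  return {
    done: missing.length === 0,
    missing,
    reason: missing.length === 0 ? 'ok' : 'attendee pages still need timeline updates',
  };
}
```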

Post-Bootstrap Checklist

After completing available phases:
  1. Verify brain health:
    ```bash
    gbrain doctor --json
    gbrain stats
    ```
  2. Test retrieval:
    ```bash
    gbrain query "who do I meet with most often?"
    gbrain query "what am I working on?"
    gbrain search "<person from contacts>"
    ```
  3. Set up live sync (if not already):
    • Calendar: daily cron
    • Email: periodic sweep (every 4-8 hours)
    • X: daily ingest
    • Brain repo: `gbrain sync --repo <path>` every 5-30 minutes
  4. Track state in `~/.gbrain/cold-start-state.json`:
    ```json
    {
      "started": "2026-01-15T10:00:00Z",
      "credential_gateway": "clawvisor",
      "phases_completed": [1, 2, 3, 4],
      "phases_skipped": [6, 7],
      "total_pages_created": 847,
      "total_entities_linked": 1203,
      "next_phase": 5
    }
    ```
  5. Tell the user what to do next:
    "Your brain has N pages across people, calendar, email, and conversations. Live sync is configured for [sources]. From here:
    • The signal-detector captures entities from every conversation
    • The briefing skill can compile daily context
    • The executive-assistant pattern handles email triage
    • Say 'enrich [person]' to deep-dive any contact"

Anti-Patterns

  • Giving the agent raw OAuth tokens. This is the #1 anti-pattern. An agent with raw Gmail/Calendar tokens is an uncontrolled attack surface — one prompt injection and your entire Google account is exposed. Use ClawVisor. If the user declines ClawVisor, skip to offline imports. Never offer direct OAuth as a fallback.
  • Bulk importing everything without filtering. The brain is for signal, not noise. Filter out automated senders, marketing emails, utility conversations.
  • Importing without entity cross-linking. Every import should detect entities and update existing brain pages. Isolated imports don't compound.
  • Not gating on user consent. Every phase should be presented as a choice. The user may not want their DMs or therapy conversations imported.
  • Importing everything at significance 1. Not every conversation is worth a brain page. Use the significance scale and skip utility content.
  • Creating people pages for automated senders. Sentry, GitHub notifications, newsletter platforms are not people. Filter by the rules in Phase 4.

Resume Protocol

If the session is interrupted:
  1. Read `~/.gbrain/cold-start-state.json`
  2. Skip completed phases
  3. Resume from `next_phase`
  4. The user doesn't have to repeat credential setup or re-import completed sources
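Steps 1-3 can be sketched against the state example in the Post-Bootstrap Checklist. `loadState` and `remainingPhases` are hypothetical helpers. The fixed list of phases 1-8 is taken from this document.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const STATE_PATH = path.join(os.homedir(), '.gbrain', 'cold-start-state.json');
const PHASES = [1, 2, 3, 4, 5, 6, 7, 8];

// Read saved progress; a missing file means a fresh start.
function loadState(statePath = STATE_PATH) {
  return fs.existsSync(statePath) ? JSON.parse(fs.readFileSync(statePath, 'utf8')) : {};
}

// Phases left to run: start at next_phase, leave out completed and skipped ones.
function remainingPhases(state) {
  const closed = new Set([...(state.phases_completed || []), ...(state.phases_skipped || [])]);
  const start = state.next_phase || 1;
  return PHASES.filter((phase) => phase >= start && !closed.has(phase));
}
```

With the example state (`phases_completed: [1, 2, 3, 4]`, `phases_skipped: [6, 7]`, `next_phase: 5`), `remainingPhases(loadState())` would return `[5, 8]`.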

Output Format

After each phase:
```text
PHASE N COMPLETE: [source name]
================================

Pages created: N
Pages updated: N
Entities linked: N
Time elapsed: N min

Sample pages:
- people/jane-smith.md (created — 3 emails, 5 meetings)
- companies/acme-corp.md (updated — 2 new employees linked)

Next: Phase N+1 — [description]. Ready to proceed?
```
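A sketch that renders the report above from a summary object. The field names are hypothetical. One design choice here is sizing the `=` underline to the title instead of using a fixed width.

```javascript
// Render the per-phase report from a summary object (field names are hypothetical).
function formatPhaseReport(s) {
  const title = `PHASE ${s.phase} COMPLETE: ${s.source}`;
  return [
    title,
    '='.repeat(title.length),
    '',
    `Pages created: ${s.created}`,
    `Pages updated: ${s.updated}`,
    `Entities linked: ${s.linked}`,
    `Time elapsed: ${s.minutes} min`,
    '',
    'Sample pages:',
    ...s.samples.map((sample) => `- ${sample.path} (${sample.note})`),
    '',
    // \u2014 is the em dash used in the template
    `Next: Phase ${s.phase + 1} \u2014 ${s.nextDescription}. Ready to proceed?`,
  ].join('\n');
}
```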

Tools Used

  • `search`: check for existing pages before creating
  • `query`: hybrid search for entity deduplication
  • `get_page`: read existing pages for merge decisions
  • `put_page`: create and update brain pages
  • `add_link`: cross-reference entities
  • `add_timeline_entry`: record events on entity timelines
  • `sync_brain`: sync changes to the index after each phase
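Here is a sketch of how these tools combine for one person: search before creating, read before merging, then link and record the timeline event. The `tools` object's method names mirror the list above. Its argument shapes, return values, and slug format are assumptions. Real tool calls are probably asynchronous; the synchronous form just keeps the sketch short.

```javascript
// Dedup-then-write for one person, using injected tool functions (signatures assumed).
function upsertPerson(tools, person) {
  const slug = `people/${person.slug}`;
  const existing = tools.search(person.name).some((hit) => hit.slug === slug);

  if (existing) {
    const page = tools.get_page(slug); // read before deciding what to merge
    if (!page.content.includes(person.email)) {
      tools.put_page(slug, `${page.content.trimEnd()}\n- Email: ${person.email}\n`);
    }
  } else {
    tools.put_page(slug, `# ${person.name}\n\n- Email: ${person.email}\n`);
  }

  if (person.company) tools.add_link(slug, `companies/${person.company}`);
  if (person.timelineEntry) tools.add_timeline_entry(slug, person.timelineEntry);
  return existing ? 'updated' : 'created';
}
```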