swain-search
<!-- swain-model-hint: opus, effort: high -->
Collect, normalize, and cache source materials into reusable troves that swain-design artifacts can reference.
Mode detection
| Signal | Mode |
|---|---|
| No trove exists for the topic, or user says "research X" / "gather sources" | Create — new trove |
| Trove exists and user provides new sources or says "add to" / "extend" | Extend — add sources to existing trove |
| Trove exists and user says "refresh" or sources are past TTL | Refresh — re-fetch stale sources |
| User asks "what troves do we have" or "find sources about X" | Discover — search existing troves by tag |
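The table above can be read as a small decision function. The following is only a sketch: the function name, the boolean flags, and the order in which signals are checked are assumptions made for illustration, and the trigger phrases are taken loosely from the table.

```python
from enum import Enum


class Mode(Enum):
    CREATE = "create"
    EXTEND = "extend"
    REFRESH = "refresh"
    DISCOVER = "discover"


def detect_mode(request: str, trove_exists: bool,
                has_new_sources: bool = False,
                has_stale_sources: bool = False) -> Mode:
    """Map the signals from the table to a mode (illustrative heuristic)."""
    text = request.lower()
    # Discovery questions do not depend on one specific trove existing.
    if "what troves" in text or "find sources" in text:
        return Mode.DISCOVER
    # Without a trove there is nothing to extend or refresh.
    if not trove_exists:
        return Mode.CREATE
    # Explicit user wording is checked before inferred signals (assumption).
    if "refresh" in text:
        return Mode.REFRESH
    if has_new_sources or "add to" in text or "extend" in text:
        return Mode.EXTEND
    if has_stale_sources:
        return Mode.REFRESH
    # Assumption: "research X" with no stronger signal falls back to Create.
    return Mode.CREATE
```

For example, `detect_mode("add to the websocket trove", trove_exists=True)` yields `Mode.EXTEND`.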
Create mode
Build a new trove from scratch.
Step 1 — Gather inputs
Ask the user (or infer from context) for:
- Trove ID: a slug for the topic (e.g., `websocket-vs-sse`). Suggest one if the context is clear.
- Tags: keywords for discovery (e.g., `real-time`, `websocket`, `sse`)
- Sources: any combination of:
  - Web search queries ("search for WebSocket vs SSE comparisons")
  - URLs (web pages, forum threads, docs)
  - Video/audio URLs
  - Local file paths
- Freshness TTL overrides: optional; the defaults are fine for most troves
If invoked from swain-design (e.g., spike entering Active), the artifact context provides the topic, tags, and sometimes initial sources.
Step 2 — Collect and normalize
For each source, use the appropriate capability. Read `skills/swain-search/references/normalization-formats.md` for the exact markdown structure per source type.
Web search queries:
- Use a web search capability to find relevant results
- Select the top 3-5 most relevant results
- For each: fetch the page, normalize to markdown per the web page format
- If no web search capability is available, tell the user and skip
Web page URLs:
- Fetch the page using a browser or page-fetching capability
- Strip boilerplate (nav, ads, sidebars, cookie banners)
- Normalize to markdown per the web page format
- If fetch fails, record the URL in the manifest with a `failed: true` flag and move on
Video/audio URLs:
- Use a media transcription capability to get the transcript
- Normalize to markdown per the media format (timestamps, speaker labels, key points)
- If no transcription capability is available, tell the user and skip — or accept a pre-made transcript
Local files:
- Use a document conversion capability (PDF, DOCX, etc.) or read directly if already markdown
- Normalize per the document format
- For markdown files: add frontmatter only, preserve content
Forum threads / discussions:
- Fetch and normalize per the forum format (chronological, author-attributed)
- Flatten nested threads to chronological order with reply-to context
Repositories:
- Clone or read the repository contents
- Mirror the original directory tree under `sources/<source-id>/`
- Default: mirror the full tree. For large repositories (thousands of files), ingest selectively and set `selective: true` in the manifest entry
- Populate the `highlights` array with paths to the most important files (relative to the source-id directory)
Documentation sites:
- Crawl or fetch the documentation site
- Mirror the section hierarchy under `sources/<source-id>/`
- Default: mirror the full site. For large sites, ingest selectively and set `selective: true`
- Populate the `highlights` array with paths to the most important pages
- Preserve internal link structure where possible
Each normalized source gets a slug-based source ID and lives in a directory-per-source layout:
- Flat sources (web, forum, media, document, local): `sources/<source-id>/<source-id>.md`
- Hierarchical sources (repository, documentation-site): `sources/<source-id>/` with the original tree mirrored inside
Source ID generation:
- Derive the source ID as a slug from the source title or URL (e.g., `mdn-websocket-api`, `strangeloop-2025-realtime`)
- When a slug collides with an existing source ID: append `__word1-word2` using two random words from `skills/swain-search/references/wordlist.txt`
- If the wordlist is missing, append `__` followed by 4 hex characters (e.g., `__a3f8`) as a fallback
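The source ID rules can be sketched roughly as follows. The helper names `slugify` and `make_source_id` are hypothetical, the exact slug normalization is a guess, and the wordlist is assumed to hold whitespace-separated words; none of this is confirmed by the skill itself.

```python
import random
import re
import secrets
from pathlib import Path
from typing import Set

# Path named in the rules above; its file format is an assumption.
WORDLIST = Path("skills/swain-search/references/wordlist.txt")


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_source_id(title_or_url: str, existing: Set[str],
                   wordlist: Path = WORDLIST) -> str:
    """Return a slug, adding a disambiguating suffix only on collision."""
    base = slugify(title_or_url)
    candidate = base
    while candidate in existing:
        words = wordlist.read_text().split() if wordlist.is_file() else []
        if len(words) >= 2:
            # Two random words from the wordlist: __word1-word2
            suffix = "-".join(random.sample(words, 2))
        else:
            # Fallback when the wordlist is missing: 4 hex characters
            suffix = secrets.token_hex(2)
        candidate = f"{base}__{suffix}"
    return candidate
```

With no collision, `make_source_id("MDN WebSocket API", set())` returns `mdn-websocket-api`. On a collision the suffix is random, so the result differs between runs.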
Step 3 — Generate manifest
Create `manifest.yaml` following the schema in `skills/swain-search/references/manifest-schema.md`. Include:
- Trove metadata (id, created date, tags)
- Default freshness TTL per source type
- One entry per source with provenance (URL/path, fetch date, content hash, type)
Compute content hashes as bare hex SHA-256 digests (no prefix) of the normalized markdown content:
```bash
shasum -a 256 sources/mdn-websocket-api/mdn-websocket-api.md | cut -d' ' -f1
```
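Where `shasum` is not available, the same bare hex digest can be computed with Python's standard library. This is a minimal sketch; the function names are hypothetical and not part of the skill.

```python
import hashlib
from pathlib import Path


def content_hash_bytes(data: bytes) -> str:
    """Return the bare hex SHA-256 digest (no 'sha256:' style prefix)."""
    return hashlib.sha256(data).hexdigest()


def content_hash_file(path: Path) -> str:
    """Hash a normalized markdown file exactly as stored on disk."""
    return content_hash_bytes(path.read_bytes())
```

Hashing the raw bytes of the file should give the same value as the `shasum -a 256` pipeline shown earlier, since both digest the file contents without any transformation.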
Step 4 — Generate synthesis
Create `synthesis.md`, a structured distillation of key findings across all sources.
Structure the synthesis by theme, not by source. Group related findings together, cite sources by ID, and surface:
- Key findings — what the sources collectively say about the topic
- Points of agreement — where sources converge
- Points of disagreement — where sources conflict or present alternatives
- Gaps — what the sources don't cover that might matter
Keep it concise. The synthesis is a starting point, not a comprehensive report — the user or artifact author will refine it.
Step 5 — Commit and stamp
Use the dual-commit pattern (same as swain-design lifecycle stamps) to give the trove a reachable commit hash.
Before Commit A: append a `history` entry to `manifest.yaml` with a `--` placeholder for the commit hash:
```yaml
history:
  - event: created
    date: 2026-03-09
    commit: "--"
    sources: 3
```
Commit A: commit the trove content.
```bash
git add docs/troves/<trove-id>/
git commit -m "research(<trove-id>): create trove with N sources"
TROVE_HASH=$(git rev-parse HEAD)
```
Commit B: back-fill the commit hash into the history entry, then update the referencing artifact's frontmatter (if one exists):
```bash
# Replace "--" with the real hash in the history entry
# Update artifact frontmatter: trove: <trove-id>@<TROVE_HASH>
git add docs/troves/<trove-id>/manifest.yaml
git add docs/<artifact-type>/<phase>/<artifact-dir>/  # if artifact exists
git commit -m "docs(<trove-id>): stamp history hash ${TROVE_HASH:0:7}"
```
If no referencing artifact exists yet (standalone research), Commit B still stamps the history entry. Report the hash so it can be referenced later.
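The back-fill between Commit A and Commit B can be done by hand or with a small script. The sketch below assumes the placeholder appears on its own `commit: "--"` line and that the newest history entry is the last such line. The function name is hypothetical, and keeping the hash quoted is a choice made here so an all-digit hash is not read as a number.

```python
import re

# Matches a line such as:    commit: "--"
PLACEHOLDER = re.compile(r'^([ \t]*commit:[ \t]*)"--"[ \t]*$', re.MULTILINE)


def stamp_history(manifest_text: str, commit_hash: str) -> str:
    """Replace the last placeholder commit value with the real hash."""
    matches = list(PLACEHOLDER.finditer(manifest_text))
    if not matches:
        raise ValueError('no commit: "--" placeholder found')
    last = matches[-1]
    return (manifest_text[:last.start()]
            + f'{last.group(1)}"{commit_hash}"'
            + manifest_text[last.end():])
```

A text-level replacement is used rather than a YAML round-trip so that comments and key order in the manifest are left untouched.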
Step 6 — Report
Tell the user what was created:
Trove `<trove-id>` created with N sources, committed as `<TROVE_HASH:0:7>`.
- `docs/troves/<trove-id>/manifest.yaml`: provenance and metadata
- `docs/troves/<trove-id>/sources/`: N normalized source files
- `docs/troves/<trove-id>/synthesis.md`: thematic distillation
Reference from artifacts with: `trove: <trove-id>@<TROVE_HASH:0:7>`
Extend mode
Add new sources to an existing trove.
1. Read the existing `manifest.yaml`
2. Collect and normalize new sources (same as Create step 2)
3. Assign slug-based source IDs to new sources (following the same ID generation rules)
4. Append new entries to `manifest.yaml`
5. Update the `refreshed` date
6. Regenerate `synthesis.md` incorporating all sources (old + new)
7. Append a `history` entry with `event: extended` and a `commit: "--"` placeholder
8. Commit and stamp (same dual-commit pattern as Create step 5):
   - Commit A: `git commit -m "research(<trove-id>): extend with N new sources"`
   - Capture `TROVE_HASH=$(git rev-parse HEAD)`
   - Commit B: back-fill hash in history entry, update referencing artifact frontmatter (if artifact exists)
9. Report what was added, including the new commit hash
Refresh mode
Re-fetch stale sources and update changed content.
1. Read `manifest.yaml`
2. For each source, check if `fetched` date + `freshness-ttl` has elapsed
3. For stale sources:
   - Re-fetch the raw content
   - Re-normalize to markdown
   - Compute new content hash
   - If hash changed: replace the source file, update manifest entry
   - If hash unchanged: update only the `fetched` date
4. Update the `refreshed` date in manifest
5. If any content changed, regenerate `synthesis.md`
6. Append a `history` entry with `event: refreshed`, `sources-changed: M`, and a `commit: "--"` placeholder
7. Commit and stamp (same dual-commit pattern as Create step 5):
   - Commit A: `git commit -m "research(<trove-id>): refresh N sources (M changed)"`
   - Capture `TROVE_HASH=$(git rev-parse HEAD)`
   - Commit B: back-fill hash in history entry, update referencing artifact(s) frontmatter; check `referenced-by` in manifest for all dependents
8. Report: "Refreshed N sources. M had changed content, K were unchanged. New hash: `<TROVE_HASH:0:7>`."
For sources with `freshness-ttl: never`, skip them during refresh.
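The staleness check and the hash comparison can be sketched as below. The TTL encoding is an assumption (a number of days, or the string `never`), because the manifest schema is not reproduced here. The function names and the returned action labels are hypothetical.

```python
from datetime import date, timedelta
from typing import Union


def is_stale(fetched: date, freshness_ttl: Union[int, str], today: date) -> bool:
    """True once fetched + TTL has elapsed; 'never' sources are always fresh."""
    if freshness_ttl == "never":
        return False
    # Assumption: the TTL is a whole number of days.
    return today >= fetched + timedelta(days=int(freshness_ttl))


def refresh_action(old_hash: str, new_hash: str) -> str:
    """Decide what to update for a re-fetched source."""
    if old_hash != new_hash:
        return "replace-source-and-update-entry"
    return "update-fetched-date"
```

Treating the boundary day as stale (`>=`) is a design choice in this sketch; the text only says the TTL "has elapsed".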
Discover mode
Help the user find existing troves relevant to their topic.
1. Scan `docs/troves/*/manifest.yaml` for all troves
2. Match against the user's query by:
   - Tag match: trove tags contain query keywords
   - Title match: trove ID slug contains query keywords
3. For each match, show: trove ID, tags, source count, last refreshed date, referenced-by list
4. If no matches, suggest creating a new trove
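The matching step might look like the sketch below, which works on manifests that have already been parsed into dictionaries. The `id` and `tags` keys follow the trove metadata listed in Create step 3, but their exact spelling in the schema is an assumption, and `match_troves` is a hypothetical helper.

```python
from typing import Dict, List


def match_troves(query: str, troves: List[Dict]) -> List[Dict]:
    """Return troves whose tags or ID slug contain any query keyword."""
    keywords = query.lower().split()
    hits = []
    for trove in troves:
        tags = [tag.lower() for tag in trove.get("tags", [])]
        trove_id = trove.get("id", "").lower()
        tag_match = any(keyword in tags for keyword in keywords)
        id_match = any(keyword in trove_id for keyword in keywords)
        if tag_match or id_match:
            hits.append(trove)
    return hits
```

An empty result is the cue to suggest creating a new trove.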
Graceful degradation
The skill references capabilities generically. When a capability isn't available:
| Capability | Fallback |
|---|---|
| Web search | Skip search-based sources. Tell user: "No web search capability available — provide URLs directly or add a search MCP." |
| Browser / page fetcher | Try basic URL fetch. If that fails: "Can't fetch this URL — paste the content or provide a local file." |
| Media transcription | "No transcription capability available — provide a pre-made transcript file, or add a media conversion tool." |
| Document conversion | "Can't convert this file type — provide a markdown version, or add a document conversion tool." |
Never fail the entire run because one capability is missing. Collect what you can, skip what you can't, and report clearly.
Capability detection
Before collecting sources, check what's available. Look for tools matching these patterns — the exact tool names vary by installation:
- Web search: tools with "search" in the name (e.g., `brave_web_search`, `bing-search-to-markdown`)
- Page fetching: tools with "fetch", "webpage", "browser" in the name (e.g., `fetch_content`, `webpage-to-markdown`, `browser_navigate`)
- Media transcription: tools with "audio", "video", "youtube" in the name (e.g., `audio-to-markdown`, `youtube-to-markdown`)
- Document conversion: tools with "pdf", "docx", "pptx", "xlsx" in the name (e.g., `pdf-to-markdown`, `docx-to-markdown`)
Report available capabilities at the start of collection so the user knows what will and won't work.
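The name patterns above lend themselves to a simple substring check. The sketch below assumes the list of installed tool names is already available as strings. The capability keys and the function name are made up for illustration.

```python
from typing import Dict, Iterable, List

# Substring patterns taken from the list above; the dict keys are hypothetical.
CAPABILITY_PATTERNS = {
    "web_search": ("search",),
    "page_fetch": ("fetch", "webpage", "browser"),
    "media_transcription": ("audio", "video", "youtube"),
    "document_conversion": ("pdf", "docx", "pptx", "xlsx"),
}


def detect_capabilities(tool_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group tool names by the capability their name suggests."""
    names = list(tool_names)
    return {
        capability: [n for n in names
                     if any(p in n.lower() for p in patterns)]
        for capability, patterns in CAPABILITY_PATTERNS.items()
    }
```

A capability whose list comes back empty is one to report as unavailable, and its fallback message from the graceful degradation table applies.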
Linking from artifacts
Artifacts reference troves in frontmatter:
```yaml
trove: websocket-vs-sse@abc1234
```
The format is `<trove-id>@<commit-hash>`. The commit hash pins the trove to a specific version: troves evolve over time as sources are added or refreshed, and the hash ensures reproducibility.
The dual-commit workflow in Create step 5, Extend step 8, and Refresh step 7 handles this automatically: Commit A records the trove content and Commit B stamps the hash into the history entry and the referencing artifact's frontmatter. Do not defer this to the operator.
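A consumer of the `trove:` field could split the reference as sketched below. The accepted character sets (lowercase slug, 7 to 40 hex digits for the hash) are assumptions, and `TroveRef` and `parse_trove_ref` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed shape: lowercase slug, "@", abbreviated or full hex commit hash.
TROVE_REF = re.compile(
    r"^(?P<trove_id>[a-z0-9][a-z0-9-]*)@(?P<commit>[0-9a-f]{7,40})$"
)


class TroveRef(NamedTuple):
    trove_id: str
    commit: str


def parse_trove_ref(value: str) -> TroveRef:
    """Split '<trove-id>@<commit-hash>' into its two parts."""
    match = TROVE_REF.match(value.strip())
    if match is None:
        raise ValueError(f"not a <trove-id>@<commit-hash> reference: {value!r}")
    return TroveRef(match["trove_id"], match["commit"])
```

For the frontmatter value shown earlier, `parse_trove_ref("websocket-vs-sse@abc1234")` gives a `TroveRef` with `trove_id` `websocket-vs-sse` and `commit` `abc1234`.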