Search Layer v2.2 — Intent-Aware Multi-Source Retrieval Protocol

Four parallel sources: Brave (`web_search`) + Exa + Tavily + Grok. Automatically selects strategies, adjusts weights, and synthesizes results based on intent.

Execution Flow

User query
[Phase 1] Intent classification → determine the search strategy
[Phase 2] Query decomposition & expansion → generate sub-queries
[Phase 3] Multi-source parallel retrieval → Brave + search.py (Exa + Tavily + Grok)
[Phase 4] Result merging & ranking → dedup + intent-weighted scoring
[Phase 5] Knowledge synthesis → structured output

Phase 1: Intent Classification

After receiving a search request, first determine the intent type, then decide the search strategy. Do not ask users which mode to use.

| Intent | Recognition Signals | Mode | Freshness | Weight Bias |
|--------|---------------------|------|-----------|-------------|
| Factual | "what is X", "definition of X" | answer | — | Authority 0.5 |
| Status | "latest progress of X", "current status of X", "latest X" | deep | pw/pm | Freshness 0.5 |
| Comparison | "X vs Y", "difference between X and Y" | deep | py | Keyword 0.4 + Authority 0.4 |
| Tutorial | "how to do X", "X tutorial", "how to X" | answer | py | Authority 0.5 |
| Exploratory | "learn X in depth", "X ecosystem", "about X" | deep | — | Authority 0.5 |
| News | "X news", "X this week" | deep | pd/pw | Freshness 0.6 |
| Resource | "X official website", "X GitHub", "X documentation" | fast | — | Keyword 0.5 |

See references/intent-guide.md for detailed classification guidelines.

Judgment rules:
  1. Scan the query for signal words.
  2. When multiple types match, choose the most specific one.
  3. Default to `exploratory` if unable to judge.
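The judgment rules can be sketched as a first-match signal scanner. The `INTENT_SIGNALS` table and `classify_intent` helper below are illustrative assumptions, not the actual search.py implementation:

```python
import re

# Ordered most-specific → least-specific, so the first match wins (rule 2).
# Signal lists are illustrative, not exhaustive.
INTENT_SIGNALS = [
    ("comparison", [r"\bvs\.?\b", r"\bdifference between\b", r"区别"]),
    ("news",       [r"\bnews\b", r"\bthis week\b", r"新闻"]),
    ("status",     [r"\blatest\b", r"\bcurrent status\b", r"最新", r"现状"]),
    ("tutorial",   [r"\bhow to\b", r"\btutorial\b", r"教程", r"怎么"]),
    ("resource",   [r"\bofficial\b", r"\bdocumentation\b", r"github", r"官网", r"文档"]),
    ("factual",    [r"\bwhat is\b", r"\bdefinition\b", r"什么是", r"定义"]),
]

def classify_intent(query: str) -> str:
    """Scan for signal words; default to 'exploratory' when nothing matches (rule 3)."""
    q = query.lower()
    for intent, patterns in INTENT_SIGNALS:
        if any(re.search(p, q) for p in patterns):
            return intent
    return "exploratory"
```

The list ordering encodes rule 2: more distinctive signals (e.g. "vs") are tried before generic ones (e.g. "what is").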

Phase 2: Query Decomposition & Expansion

Expand the user query into a set of sub-queries based on the intent type:

General Rules

  • Automatic technical-synonym expansion: k8s→Kubernetes, JS→JavaScript, Go→Golang, Postgres→PostgreSQL
  • Chinese technical queries: also generate English variants (e.g. "Rust 异步编程" → additionally search "Rust async programming")

Expansion by Intent

| Intent | Expansion Strategy | Example |
|--------|--------------------|---------|
| Factual | Add "definition", "explained" | "WebTransport" → "WebTransport", "WebTransport explained overview" |
| Status | Add year, "latest", "update" | "Deno progress" → "Deno 2.0 latest 2026", "Deno update release" |
| Comparison | Split into 3 sub-queries | "Bun vs Deno" → "Bun vs Deno", "Bun advantages", "Deno advantages" |
| Tutorial | Add "tutorial", "guide", "step by step" | "Rust CLI" → "Rust CLI tutorial", "Rust CLI guide step by step" |
| Exploratory | Split into 2-3 perspectives | "RISC-V" → "RISC-V overview", "RISC-V ecosystem", "RISC-V use cases" |
| News | Add "news", "announcement", date | "AI news" → "AI news this week 2026", "AI announcement latest" |
| Resource | Add a specific resource type | "Anthropic MCP" → "Anthropic MCP official documentation" |
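A minimal sketch of the expansion step; the `EXPANSIONS` templates and `expand_query` helper are hypothetical names, and the comparison/exploratory splits follow the table above:

```python
# Suffix templates per intent (comparison and exploratory split instead; illustrative).
EXPANSIONS = {
    "factual":  ["{q}", "{q} explained overview"],
    "status":   ["{q} latest 2026", "{q} update release"],
    "tutorial": ["{q} tutorial", "{q} guide step by step"],
    "news":     ["{q} news this week 2026", "{q} announcement latest"],
    "resource": ["{q} official documentation"],
}

SYNONYMS = {"k8s": "Kubernetes", "JS": "JavaScript", "Go": "Golang", "Postgres": "PostgreSQL"}

def expand_query(query: str, intent: str) -> list[str]:
    """Expand a query into sub-queries according to its intent."""
    # General rule: normalize common technical abbreviations first.
    query = " ".join(SYNONYMS.get(w, w) for w in query.split())
    if intent == "comparison" and " vs " in query:
        a, b = query.split(" vs ", 1)
        return [query, f"{a} advantages", f"{b} advantages"]
    if intent == "exploratory":
        return [f"{query} overview", f"{query} ecosystem", f"{query} use cases"]
    # Other comparison phrasings ("difference between A and B") fall through unchanged.
    return [t.format(q=query) for t in EXPANSIONS.get(intent, ["{q}"])]
```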

Phase 3: Multi-source Parallel Retrieval

Step 1: Brave (All Modes)

Call `web_search` for each sub-query. If the intent has a freshness requirement, pass the `freshness` parameter:

web_search(query="Deno 2.0 latest 2026", freshness="pw")

Step 2: Exa + Tavily + Grok (Deep / Answer Mode)

Call search.py for each sub-query, passing intent and freshness:

```bash
python3 /home/node/.openclaw/workspace/skills/search-layer/scripts/search.py \
  --queries "sub-query 1" "sub-query 2" "sub-query 3" \
  --mode deep \
  --intent status \
  --freshness pw \
  --num 5
```

Source participation matrix by mode:

| Mode | Exa | Tavily | Grok | Notes |
|--------|-----|--------|----------|-------|
| fast | ✓ | | fallback | Exa first; Grok when no Exa key is available |
| deep | ✓ | ✓ | ✓ | Three sources run in parallel |
| answer | | ✓ | | Tavily only (includes AI answer) |

Parameters:

| Parameter | Description |
|-----------|-------------|
| `--queries` | Multiple sub-queries executed in parallel (a single query can also be passed as a positional argument) |
| `--mode` | fast / deep / answer |
| `--intent` | Intent type; affects scoring weights (omit to skip scoring, matching v1 behavior) |
| `--freshness` | pd (24h) / pw (week) / pm (month) / py (year) |
| `--domain-boost` | Comma-separated domains; matching results get +0.2 authority score |
| `--num` | Number of results per source per query |

Grok source notes:
  • Calls the Grok model (`grok-4.1-fast`) via the completions API, using its real-time knowledge to return structured search results
  • Automatically detects time-sensitive queries and injects current-time context
  • Runs in parallel with Exa and Tavily in deep mode
  • Requires Grok's `apiUrl`, `apiKey`, and `model` in `~/.openclaw/credentials/search.json` (or the environment variables `GROK_API_URL`, `GROK_API_KEY`, `GROK_MODEL`)
  • If the Grok configuration is missing, automatically degrades to the Exa + Tavily dual-source setup

Step 3: Merge

Merge Brave results with the search.py output. Deduplicate by canonical URL and tag each result with its sources.
If search.py returned a `score` field, sort by it; for Brave results without a score, backfill one using the same intent-weight formula.
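The merge step above could be sketched as follows, assuming each result is a dict with `url`, `sources`, and an optional `score` key (an assumed shape, not the real search.py output format):

```python
from urllib.parse import urlsplit

def canonical(url: str) -> str:
    """Normalize scheme/www differences and trailing slashes for dedup."""
    parts = urlsplit(url)
    return parts.netloc.removeprefix("www.") + parts.path.rstrip("/")

def merge_results(brave: list[dict], searchpy: list[dict], backfill) -> list[dict]:
    """Merge by canonical URL, union source tags, sort by score (backfilled when absent)."""
    merged: dict[str, dict] = {}
    for item in searchpy + brave:
        key = canonical(item["url"])
        if key in merged:
            merged[key]["sources"] = sorted(set(merged[key]["sources"]) | set(item["sources"]))
        else:
            item = dict(item)
            if "score" not in item:             # Brave results carry no score
                item["score"] = backfill(item)  # same intent-weight formula
            merged[key] = item
    return sorted(merged.values(), key=lambda r: r["score"], reverse=True)
```

`backfill` stands in for the Phase 4 scoring function applied to unscored Brave items.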

Phase 3.5: Reference Tracking (Thread Pulling)

When search results include GitHub issue/PR links and the intent is Status or Exploratory, automatically trigger reference tracking.

Automatic Trigger Conditions

  • Intent is `status` or `exploratory`
  • Search results include `github.com/.../issues/` or `github.com/.../pull/` URLs

Method 1: search.py --extract-refs (Batch)

Extract the reference graph directly from search results, without additional calls:

```bash
python3 search.py "OpenClaw config validation bug" --mode deep --intent status --extract-refs
```

The output gains an additional `refs` field containing the reference list for each result URL.
You can also skip the search and extract references directly from known URLs:

```bash
python3 search.py --extract-refs-urls "https://github.com/owner/repo/issues/123" "https://github.com/owner/repo/issues/456"
```

Method 2: fetch-thread (Single-URL Deep Crawl)

Pull the complete discussion thread plus structured references for a single URL:

```bash
python3 fetch_thread.py "https://github.com/owner/repo/issues/123" --format json
python3 fetch_thread.py "https://github.com/owner/repo/issues/123" --format markdown
python3 fetch_thread.py "https://github.com/owner/repo/issues/123" --extract-refs-only
```

GitHub case (issue/PR): pulls the body + all comments + timeline events (cross-references, commits) via the API and extracts:
  • Issue/PR references (#123, owner/repo#123)
  • Duplicate markers
  • Commit references
  • Linked PRs/issues (timeline cross-references)
  • External URLs
Generic web case: web fetch + regex extraction of reference links.
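For the generic web case, reference extraction can be approximated with regexes like these (the patterns and `extract_refs` helper are illustrative, not fetch_thread.py's actual implementation):

```python
import re

# owner/repo#123 references; [\w.-] covers typical GitHub name characters.
CROSS_REPO = re.compile(r"\b([\w.-]+/[\w.-]+)#(\d+)")
# Bare #123 form; the lookbehind rejects matches already inside owner/repo#123.
SAME_REPO = re.compile(r"(?<![\w/])#(\d+)\b")
ISSUE_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/(?:issues|pull)/\d+")

def extract_refs(text: str) -> dict[str, list[str]]:
    """Pull issue/PR references and GitHub URLs out of free-form discussion text."""
    return {
        "cross_repo": [f"{repo}#{num}" for repo, num in CROSS_REPO.findall(text)],
        "same_repo": ["#" + num for num in SAME_REPO.findall(text)],
        "urls": ISSUE_URL.findall(text),
    }
```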

Agent Execution Flow

Step 1: search-layer search → get initial results
Step 2: search.py --extract-refs or fetch-thread → extract the clue graph
Step 3: Agent filters high-value clues (the LLM judges which are worth following)
Step 4: fetch-thread deep-crawls each high-value clue
Step 5: Repeat Steps 2-4 until the information loop closes or the depth limit is reached (recommended max_depth=3)
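The five steps above can be sketched as a bounded loop; `search_layer`, `extract_refs`, `is_high_value`, and `fetch_thread` are stand-ins for the real tool calls and the LLM's judgment:

```python
def pull_threads(query: str, search_layer, extract_refs, is_high_value, fetch_thread,
                 max_depth: int = 3) -> list[dict]:
    """Follow reference threads breadth-first until no new clues or max_depth is hit."""
    threads: list[dict] = []
    seen: set[str] = set()
    frontier = [r["url"] for r in search_layer(query)]           # Step 1
    for _ in range(max_depth):
        clues = [u for url in frontier for u in extract_refs(url)]  # Step 2
        frontier = []
        for url in clues:
            if url in seen or not is_high_value(url):            # Step 3: LLM filter
                continue
            seen.add(url)
            threads.append(fetch_thread(url))                    # Step 4: deep crawl
            frontier.append(url)
        if not frontier:                                         # information loop closed
            break                                                # Step 5: stop condition
    return threads
```

The `seen` set prevents cycles in the reference graph (issue A → issue B → issue A), and the depth bound keeps the crawl from running away.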

Phase 4: Result Ranking

Scoring Formula

score = w_keyword × keyword_match + w_freshness × freshness_score + w_authority × authority_score

Weights are determined by intent (see the Phase 1 table). Components:
  • keyword_match (0-1): coverage of query terms in the title + snippet
  • freshness_score (0-1): based on publication date; newer scores higher (no date = 0.5)
  • authority_score (0-1): based on the domain authority tier
    • Tier 1 (1.0): github.com, stackoverflow.com, official documentation sites
    • Tier 2 (0.8): HN, dev.to, well-known technical blogs
    • Tier 3 (0.6): Medium, Juejin, InfoQ
    • Tier 4 (0.4): everything else
See references/authority-domains.json for the full domain rating table.
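The formula maps directly to code. The weight table below mirrors the Phase 1 biases, with the remaining weight mass split evenly as an illustrative assumption (the real search.py values may differ):

```python
# Biased component gets its Phase 1 weight; the rest is split evenly (assumption).
WEIGHTS = {
    "factual":     {"keyword": 0.25, "freshness": 0.25, "authority": 0.5},
    "status":      {"keyword": 0.25, "freshness": 0.5,  "authority": 0.25},
    "comparison":  {"keyword": 0.4,  "freshness": 0.2,  "authority": 0.4},
    "tutorial":    {"keyword": 0.25, "freshness": 0.25, "authority": 0.5},
    "exploratory": {"keyword": 0.25, "freshness": 0.25, "authority": 0.5},
    "news":        {"keyword": 0.2,  "freshness": 0.6,  "authority": 0.2},
    "resource":    {"keyword": 0.5,  "freshness": 0.25, "authority": 0.25},
}

def score(intent: str, keyword_match: float, freshness: float, authority: float) -> float:
    """score = w_keyword·keyword_match + w_freshness·freshness + w_authority·authority"""
    w = WEIGHTS.get(intent, WEIGHTS["exploratory"])
    return (w["keyword"] * keyword_match
            + w["freshness"] * freshness
            + w["authority"] * authority)
```

Each row sums to 1.0, so a perfect result scores 1.0 regardless of intent.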

Domain Boost

Manually specify domains to boost via the `--domain-boost` parameter (matching results get +0.2 authority score):

```bash
search.py "query" --mode deep --intent tutorial --domain-boost dev.to,freecodecamp.org
```

Recommended pairings:
  • Tutorial → dev.to, freecodecamp.org, realpython.com, baeldung.com
  • Resource → github.com
  • News → techcrunch.com, arstechnica.com, theverge.com

Phase 5: Knowledge Synthesis

Select a synthesis strategy based on the number of results:

Small result set (≤5 items)

Show results one by one, each with source tags and a score:

1. [Title](url) — snippet... `[brave, exa]` ⭐0.85
2. [Title](url) — snippet... `[tavily]` ⭐0.72

Medium result set (5-15 items)

Cluster by theme, with a summary per group:

**Theme A: [description]**
- [Result 1] — key points... `[source]`
- [Result 2] — key points... `[source]`

**Theme B: [description]**
- [Result 3] — key points... `[source]`

Large result set (15+ items)

High-level overview + Top 5 + a prompt to go deeper:

[One-paragraph overview summarizing the main findings]

**Top 5 most relevant results:**
1. ...
2. ...

Found N results in total, covering [source list]. Which aspect would you like to explore in depth?
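The size thresholds above can be captured in a small helper (strategy names are illustrative):

```python
def synthesis_strategy(n_results: int) -> str:
    """Pick a synthesis strategy by result count (≤5 / 6-15 / 15+)."""
    if n_results <= 5:
        return "itemized"   # show each result with source tags and score
    if n_results <= 15:
        return "clustered"  # group by theme with per-group summaries
    return "overview"       # high-level summary + Top 5 + follow-up prompt
```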

Synthesis Rules

  • Give the answer first, then list sources (don't open with "I searched for...")
  • Aggregate by theme, not by source (no "Brave results: ... Exa results: ...")
  • Flag conflicting information explicitly: when sources contradict each other, say so clearly
  • Express confidence:
    • Multiple sources agree + fresh → state it directly
    • Single source or older → "According to [source], ..."
    • Conflicting or uncertain → "Accounts differ: A says ..., B says ..."

Degradation Strategy

  • Exa 429/5xx → continue with Brave + Tavily + Grok
  • Tavily 429/5xx → continue with Brave + Exa + Grok
  • Grok timeout/error → continue with Brave + Exa + Tavily
  • search.py fails entirely → fall back to Brave `web_search` alone (always available)
  • Never block the main flow because a single source failed
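The per-source isolation can be sketched like this, assuming each source is an independent callable; a failing source yields an empty list instead of blocking the others:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_all(query: str, sources: dict) -> dict:
    """Run every source in parallel; a failing source degrades to [] instead of raising."""
    def safe(fn):
        try:
            return fn(query)
        except Exception:  # 429 / 5xx / timeout → degrade, don't block the main flow
            return []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(safe, fn) for name, fn in sources.items()}
        return {name: f.result() for name, f in futures.items()}
```

Because each callable is wrapped individually, any subset of Exa/Tavily/Grok can fail while the remaining sources still return results.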

Backward Compatibility

Without the `--intent` parameter, search.py behaves exactly as in v1 (no scoring; results in their original order).
Existing callers (e.g. github-explorer) need no modification.

Quick Reference

| Scenario | Command |
|----------|---------|
| Quick fact check | `web_search` + `search.py --mode answer --intent factual` |
| In-depth research | `web_search` + `search.py --mode deep --intent exploratory` |
| Latest updates | `web_search(freshness="pw")` + `search.py --mode deep --intent status --freshness pw` |
| Comparative analysis | `web_search` × 3 queries + `search.py --queries "A vs B" "A pros" "B pros" --intent comparison` |
| Find resources | `web_search` + `search.py --mode fast --intent resource` |