Search Layer v2.2 — Intent-Aware Multi-Source Retrieval Protocol
Four parallel sources of equal rank: Brave (`web_search`) + Exa + Tavily + Grok. Strategy selection, weighting, and synthesis are chosen automatically based on intent.
Execution Flow
User query
↓
[Phase 1] Intent classification → determine search strategy
↓
[Phase 2] Query decomposition & expansion → generate sub-queries
↓
[Phase 3] Multi-source parallel retrieval → Brave + search.py (Exa + Tavily + Grok)
↓
[Phase 4] Result merging & ranking → dedup + intent-weighted scoring
↓
[Phase 5] Knowledge synthesis → structured output
Phase 1: Intent Classification
After receiving a search request, first determine the intent type, then decide the search strategy. Do not ask users which mode to use.
| Intent | Recognition Signals | Mode | Freshness | Weight Bias |
|---|---|---|---|---|
| Factual | "what is X", "definition of X" | answer | — | Authority 0.5 |
| Status | "latest progress of X", "current status of X", "latest X" | deep | pw/pm | Freshness 0.5 |
| Comparison | "X vs Y", "difference between X and Y" | deep | py | Keyword 0.4 + Authority 0.4 |
| Tutorial | "how to do X", "X tutorial", "how to X" | answer | py | Authority 0.5 |
| Exploratory | "learn X in depth", "X ecosystem", "about X" | deep | — | Authority 0.5 |
| News | "X news", "X this week" | deep | pd/pw | Freshness 0.6 |
| Resource | "X official website", "X GitHub", "X documentation" | fast | — | Keyword 0.5 |
See references/intent-guide.md for detailed classification guidelines.
Judgment Rules:
- Scan for signal words in the query
- Select the most specific type when multiple types match
- Default to `exploratory` if the intent cannot be determined
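The judgment rules above can be sketched as a simple signal-word scan. This is a minimal illustration, assuming lowercase substring matching; the signal lists are abbreviated from the table, and real classification is done by the agent:

```python
# Signal-word scan sketch for Phase 1. Dict order encodes specificity:
# more specific intents are checked before broader ones.
SIGNALS = {
    "comparison": [" vs ", "difference between"],
    "news": ["news", "this week"],
    "status": ["latest", "current status", "progress"],
    "tutorial": ["how to", "tutorial"],
    "factual": ["what is", "definition of"],
    "resource": ["official website", "github", "documentation"],
}

def classify_intent(query: str) -> str:
    q = query.lower()
    for intent, words in SIGNALS.items():
        if any(w in q for w in words):
            return intent
    return "exploratory"  # default when no signal matches
```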
Phase 2: Query Decomposition & Expansion
Expand the user query into a set of sub-queries based on the intent type:
General Rules
- Automatic technical synonym expansion: k8s→Kubernetes, JS→JavaScript, Go→Golang, Postgres→PostgreSQL
- Chinese technical queries: generate English variants simultaneously (e.g. "Rust 异步编程" → additionally search for "Rust async programming")
Expansion by Intent
| Intent | Expansion Strategy | Example |
|---|---|---|
| Factual | Add "definition", "explained" | "WebTransport" → "WebTransport", "WebTransport explained overview" |
| Status | Add year, "latest", "update" | "Deno progress" → "Deno 2.0 latest 2026", "Deno update release" |
| Comparison | Split into 3 sub-queries | "Bun vs Deno" → "Bun vs Deno", "Bun advantages", "Deno advantages" |
| Tutorial | Add "tutorial", "guide", "step by step" | "Rust CLI" → "Rust CLI tutorial", "Rust CLI guide step by step" |
| Exploratory | Split into 2-3 perspectives | "RISC-V" → "RISC-V overview", "RISC-V ecosystem", "RISC-V use cases" |
| News | Add "news", "announcement", date | "AI news" → "AI news this week 2026", "AI announcement latest" |
| Resource | Add specific resource type | "Anthropic MCP" → "Anthropic MCP official documentation" |
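The two expansion mechanisms above (synonym normalization plus intent-specific suffixes) can be sketched as follows; the synonym map and suffix lists are abbreviated from the tables, and the Chinese-variant generation is omitted:

```python
# Phase 2 expansion sketch: normalize technical synonyms, then append
# intent-specific suffix queries (abbreviated lists, for illustration).
SYNONYMS = {"k8s": "Kubernetes", "JS": "JavaScript", "Go": "Golang", "Postgres": "PostgreSQL"}
SUFFIXES = {
    "factual": ["explained overview"],
    "tutorial": ["tutorial", "guide step by step"],
    "status": ["latest 2026", "update release"],
}

def expand(query: str, intent: str) -> list[str]:
    base = " ".join(SYNONYMS.get(w, w) for w in query.split())
    return [base] + [f"{base} {s}" for s in SUFFIXES.get(intent, [])]
```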
Phase 3: Multi-source Parallel Retrieval
Step 1: Brave (All Modes)
Call `web_search` for each sub-query. If the intent has a freshness requirement, pass the `freshness` parameter:

```
web_search(query="Deno 2.0 latest 2026", freshness="pw")
```
Step 2: Exa + Tavily + Grok (Deep / Answer Mode)
Call search.py with the sub-queries, passing the intent and freshness:

```bash
python3 /home/node/.openclaw/workspace/skills/search-layer/scripts/search.py \
  --queries "sub-query 1" "sub-query 2" "sub-query 3" \
  --mode deep \
  --intent status \
  --freshness pw \
  --num 5
```

Source Participation Matrix by Mode:
| Mode | Exa | Tavily | Grok | Description |
|---|---|---|---|---|
| fast | ✅ | ❌ | fallback | Exa first; use Grok when no Exa key is available |
| deep | ✅ | ✅ | ✅ | Three sources run in parallel |
| answer | ❌ | ✅ | ❌ | Tavily only (includes AI answer) |
Parameter Description:
| Parameter | Description |
|---|---|
| `--queries` | Multiple sub-queries executed in parallel (a single query can also be passed as a positional argument) |
| `--mode` | fast / deep / answer |
| `--intent` | Intent type; affects scoring weights (no scoring if omitted, matching v1 behavior) |
| `--freshness` | pd (24h) / pw (week) / pm (month) / py (year) |
| `--domain-boost` | Comma-separated domains; matching results get +0.2 authority score |
| `--num` | Number of results per source per query |
Grok Source Notes:
- Calls the Grok model (`grok-4.1-fast`) via the completions API, using its real-time knowledge to return structured search results
- Automatically detects time-sensitive queries and injects current-time context
- Runs in parallel with Exa and Tavily in deep mode
- Requires Grok's `apiUrl`, `apiKey`, and `model` to be configured in `~/.openclaw/credentials/search.json` (or via the environment variables `GROK_API_URL`, `GROK_API_KEY`, `GROK_MODEL`)
- Automatically degrades to the Exa + Tavily dual-source setup if the Grok configuration is missing
Step 3: Merge
Merge the Brave results with the search.py output. Deduplicate by canonical URL and tag each result with its sources.
If search.py returns a `score` field, sort by it; for Brave results without a score, compute one with the same intent-weight formula.
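A minimal sketch of this merge step, assuming each result is a dict with `url`, `sources`, and an optional `score`. The URL canonicalization shown (strip scheme, `www.`, trailing slash) is a simplification, not search.py's actual logic:

```python
# Merge sketch: dedup by canonical URL, union source tags, keep best score.
from urllib.parse import urlsplit

def canonical(url: str) -> str:
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")

def merge(results: list[dict]) -> list[dict]:
    merged: dict[str, dict] = {}
    for r in results:
        key = canonical(r["url"])
        if key in merged:
            merged[key]["sources"] = sorted(set(merged[key]["sources"]) | set(r["sources"]))
            merged[key]["score"] = max(merged[key]["score"], r.get("score", 0.0))
        else:
            merged[key] = {**r, "score": r.get("score", 0.0)}
    return sorted(merged.values(), key=lambda r: r["score"], reverse=True)
```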
Phase 3.5: Reference Tracking (Thread Pulling)
When search results include GitHub issue/PR links and the intent is Status or Exploratory, automatically trigger reference tracking.
Automatic Trigger Conditions
- Intent is `status` or `exploratory`
- Search results include `github.com/.../issues/` or `github.com/.../pull/` URLs
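The trigger check reduces to an intent test plus a URL pattern match; a minimal sketch (the regex here is illustrative, not the skill's actual matcher):

```python
# Phase 3.5 auto-trigger sketch: thread pulling fires when the intent
# qualifies and any result URL is a GitHub issue or PR.
import re

GITHUB_THREAD = re.compile(r"github\.com/[^/]+/[^/]+/(issues|pull)/\d+")

def should_pull_threads(intent: str, urls: list[str]) -> bool:
    return intent in ("status", "exploratory") and any(
        GITHUB_THREAD.search(u) for u in urls
    )
```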
Method 1: search.py --extract-refs (Batch)
Extract the reference graph directly from search results, with no additional calls:

```bash
python3 search.py "OpenClaw config validation bug" --mode deep --intent status --extract-refs
```

The output gains an additional `refs` field containing the reference list for each result URL.

You can also skip the search and extract references directly from known URLs:

```bash
python3 search.py --extract-refs-urls "https://github.com/owner/repo/issues/123" "https://github.com/owner/repo/issues/456"
```
Method 2: fetch-thread (Single URL Deep Crawl)
Pull the complete discussion thread + structured references for a single URL:

```bash
python3 fetch_thread.py "https://github.com/owner/repo/issues/123" --format json
python3 fetch_thread.py "https://github.com/owner/repo/issues/123" --format markdown
python3 fetch_thread.py "https://github.com/owner/repo/issues/123" --extract-refs-only
```

GitHub scenario (issue/PR): pulls the body + all comments + timeline events (cross-references, commits) via the API and extracts:
- Issue/PR references (#123, owner/repo#123)
- Duplicate markers
- Commit references
- Associated PR/issue (timeline cross-references)
- External URLs
General web scenario: web fetch + regex extraction of reference links.
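The reference patterns listed above can be approximated with regexes like these. They are illustrative only, not fetch_thread.py's actual implementation; the GitHub API timeline extracts cross-references more reliably than text matching:

```python
# Illustrative patterns for issue/PR refs (#123, owner/repo#123),
# commit SHAs, and external URLs found in thread text.
import re

ISSUE_REF = re.compile(r"(?:\b[\w.-]+/[\w.-]+)?#\d+")
COMMIT_REF = re.compile(r"\b[0-9a-f]{7,40}\b")  # may also hit plain hex words
URL_REF = re.compile(r"https?://\S+")

def extract_refs(text: str) -> dict[str, list[str]]:
    return {
        "issues": ISSUE_REF.findall(text),
        "commits": COMMIT_REF.findall(text),
        "urls": URL_REF.findall(text),
    }
```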
Agent Execution Flow
Step 1: search-layer search → get initial results
Step 2: search.py --extract-refs or fetch-thread → extract the clue graph
Step 3: Agent filters high-value clues (the LLM judges which are worth pursuing)
Step 4: fetch-thread deep-crawls each high-value clue
Step 5: Repeat Steps 2-4 until the information loop closes or the depth limit is reached (recommended max_depth=3)
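The Steps 1-5 loop can be sketched as a bounded clue-following iteration; `search`, `extract_refs_for`, and `select_high_value` are hypothetical stand-ins for the search-layer call, `search.py --extract-refs`, and the LLM filter:

```python
# Bounded thread-pulling loop: follow high-value clues until the clue
# graph closes or max_depth iterations have run.
def pull_threads(query, search, extract_refs_for, select_high_value, max_depth=3):
    results = search(query)                          # Step 1: initial results
    seen: set = set()
    frontier = [r["url"] for r in results]
    for _ in range(max_depth):                       # Steps 2-5, depth-capped
        refs = [ref for url in frontier for ref in extract_refs_for(url)]
        frontier = [r for r in select_high_value(refs) if r not in seen]
        if not frontier:                             # information loop closed
            break
        seen.update(frontier)
    return seen
```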
Phase 4: Result Sorting
Scoring Formula
```
score = w_keyword × keyword_match + w_freshness × freshness_score + w_authority × authority_score
```

Weights are determined by intent (see the Phase 1 table). Components:
- keyword_match (0-1): coverage of query terms in the title + snippet
- freshness_score (0-1): based on publication date; newer scores higher (no date = 0.5)
- authority_score (0-1): based on domain authority tier
  - Tier 1 (1.0): github.com, stackoverflow.com, official documentation sites
  - Tier 2 (0.8): HN, dev.to, well-known technical blogs
  - Tier 3 (0.6): Medium, Juejin, InfoQ
  - Tier 4 (0.4): others

See references/authority-domains.json for the full domain rating table.
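The formula transcribes directly into code. Note that the Phase 1 table only fixes the biased component of each intent's weights, so the remaining weights below are illustrative assumptions:

```python
# Intent-weighted scoring sketch. Each tuple is
# (w_keyword, w_freshness, w_authority); only the biased component comes
# from the Phase 1 table, the rest is assumed for illustration.
WEIGHTS = {
    "status": (0.3, 0.5, 0.2),    # freshness-biased (0.5 per the table)
    "tutorial": (0.3, 0.2, 0.5),  # authority-biased (0.5 per the table)
}

def score(intent, keyword_match, freshness_score, authority_score):
    wk, wf, wa = WEIGHTS[intent]
    return wk * keyword_match + wf * freshness_score + wa * authority_score
```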
Domain Boost
Manually specify domains to boost via the `--domain-boost` parameter (matching results get +0.2 authority score):

```bash
search.py "query" --mode deep --intent tutorial --domain-boost dev.to,freecodecamp.org
```

Recommended pairings:
- Tutorial → dev.to, freecodecamp.org, realpython.com, baeldung.com
- Resource → github.com
- News → techcrunch.com, arstechnica.com, theverge.com
Phase 5: Knowledge Synthesis
Select synthesis strategy based on the number of results:
Small result set (≤5 items)
Display results one by one, each with source tags and a score:

```
1. [Title](url) — snippet... `[brave, exa]` ⭐0.85
2. [Title](url) — snippet... `[tavily]` ⭐0.72
```
Medium result set (5-15 items)
Cluster by theme + summary per group:

```
**Theme A: [Description]**
- [Result 1] — key points... `[source]`
- [Result 2] — key points... `[source]`

**Theme B: [Description]**
- [Result 3] — key points... `[source]`
```
Large result set (15+ items)
High-level overview + Top 5 + in-depth prompt:

```
[An overview summarizing the main findings]

**Top 5 most relevant results:**
1. ...
2. ...

Found N results in total, covering [source list]. Which aspect do you want to explore in depth?
```
Synthesis Rules
- Give the answer first, then list sources (don't start with "I searched for...")
- Aggregate by theme, not by source (don't do "Brave results: ... Exa results: ...")
- Explicitly mark conflicting information: clearly point out when different sources have contradictory statements
- Confidence expression:
- Multi-source consistent + fresh → state directly
- Single source or older → "According to [source], ..."
- Conflicting or uncertain → "There are different opinions: A believes..., B believes..."
Degradation Strategy
- Exa 429/5xx → continue with Brave + Tavily + Grok
- Tavily 429/5xx → continue with Brave + Exa + Grok
- Grok timeout/error → continue with Brave + Exa + Tavily
- Overall search.py failure → fall back to Brave `web_search` alone (always available)
- Never block the main flow because a single source fails
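Per-source isolation can be sketched with independent futures, where a failing source only loses its own results; the `sources` mapping holds whatever client callables are configured:

```python
# Degradation sketch: run every source concurrently; a 429/5xx or timeout
# in one source drops only that source's results and never blocks the rest.
from concurrent.futures import ThreadPoolExecutor

def search_all(query: str, sources: dict) -> list[dict]:
    results = []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, fut in futures.items():
            try:
                results.extend(fut.result(timeout=10))
            except Exception:
                pass  # degrade: skip the failed source, keep the main flow
    return results
```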
Backward Compatibility
Without the `--intent` parameter, search.py behaves exactly as in v1 (no scoring; results in their original order).
Existing callers (e.g. github-explorer) require no changes.
Quick Reference
| Scenario | Command |
|---|---|
| Quick facts | `search.py "query" --mode answer --intent factual` |
| Deep research | `search.py "query" --mode deep --intent exploratory` |
| Latest updates | `search.py "query" --mode deep --intent status --freshness pw` |
| Comparison | `search.py "query" --mode deep --intent comparison` |
| Find resources | `search.py "query" --mode fast --intent resource` |