knowledge-synthesis

Knowledge Synthesis

Knowledge synthesis is the last mile of enterprise search: it takes raw results from multiple sources and turns them into a coherent, trustworthy answer.

The Goal

Transform this:

~~chat result: "Sarah said in #eng: 'let's go with REST, GraphQL is overkill for our use case'"
~~email result: "Subject: API Decision — Sarah's email confirming REST approach with rationale"
~~cloud storage result: "API Design Doc v3 — updated section 2 to reflect REST decision"
~~project tracker result: "Task: Finalize API approach — marked complete by Sarah"

Into this:

The team decided to go with REST over GraphQL for the API redesign. Sarah made the
call, noting that GraphQL was overkill for the current use case. This was discussed
in #engineering on Tuesday, confirmed via email Wednesday, and the design doc has
been updated to reflect the decision. The related ~~project tracker task is marked complete.

Sources:
- ~~chat: #engineering thread (Jan 14)
- ~~email: "API Decision" from Sarah (Jan 15)
- ~~cloud storage: "API Design Doc v3" (updated Jan 15)
- ~~project tracker: "Finalize API approach" (completed Jan 15)

Deduplication

Cross-Source Deduplication

The same information often appears in multiple places. Identify and merge duplicates:

Signals that results are about the same thing:

- Same or very similar text content
- Same author/sender
- Timestamps within a short window (same day or adjacent days)
- References to the same entity (project name, document, decision)
- One source references another ("as discussed in ~~chat", "per the email", "see the doc")

How to merge:

- Combine into a single narrative item
- Cite all sources where it appeared
- Use the most complete version as the primary text
- Add unique details from each source
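
The signals and merge steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the `Result` fields, the 0.8 text-similarity cutoff, and the "at least two signals" threshold are all assumptions, and the cross-reference signal ("per the email") is left out for brevity. The year in the usage example is arbitrary.

```python
from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Result:
    source: str   # hypothetical source label, e.g. "chat" or "email"
    author: str
    day: date
    text: str
    entities: set = field(default_factory=set)


def same_item(a: Result, b: Result, min_signals: int = 2) -> bool:
    """Count how many duplicate signals two results share (threshold is assumed)."""
    signals = 0
    # Signal: same or very similar text content.
    if SequenceMatcher(None, a.text.lower(), b.text.lower()).ratio() >= 0.8:
        signals += 1
    # Signal: same author/sender.
    if a.author == b.author:
        signals += 1
    # Signal: timestamps on the same day or adjacent days.
    if abs((a.day - b.day).days) <= 1:
        signals += 1
    # Signal: references to at least one shared entity.
    if a.entities & b.entities:
        signals += 1
    return signals >= min_signals


def merge(group: list) -> dict:
    """Use the longest text as the primary version and cite every source."""
    primary = max(group, key=lambda r: len(r.text))
    return {
        "text": primary.text,
        "sources": sorted({r.source for r in group}),
        "extra_details": [r.text for r in group if r is not primary],
    }


chat = Result("chat", "Sarah", date(2025, 1, 14),
              "let's go with REST, GraphQL is overkill", {"API redesign"})
email = Result("email", "Sarah", date(2025, 1, 15),
               "Confirming the REST approach for the API redesign, with rationale",
               {"API redesign"})
if same_item(chat, email):
    merged = merge([chat, email])
```

Text length stands in for "most complete version" here; a real system would likely need a better measure of completeness.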

Deduplication Priority

When the same information exists in multiple sources, prefer:

1. The most complete version (fullest context)
2. The most authoritative source (official doc > chat)
3. The most recent version (latest update wins for evolving info)
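
One way to read this list is as a strict tie-breaking order, which maps naturally onto a tuple sort key. The sketch below assumes that reading, and assumes each version already carries a precomputed `completeness` score; the `AUTHORITY` ranks and field names are hypothetical.

```python
from datetime import date

# Hypothetical authority ranks; higher means more authoritative.
AUTHORITY = {"official_doc": 2, "email": 1, "chat": 0}


def pick_primary(versions: list) -> dict:
    """Prefer completeness first, then authority, then recency."""
    return max(
        versions,
        key=lambda v: (
            v["completeness"],                 # 1. most complete version
            AUTHORITY.get(v["source"], 0),     # 2. most authoritative source
            v["updated"],                      # 3. most recent version
        ),
    )


versions = [
    {"source": "chat", "completeness": 2, "updated": date(2025, 1, 16)},
    {"source": "official_doc", "completeness": 2, "updated": date(2025, 1, 15)},
]
primary = pick_primary(versions)  # the official doc wins the authority tie-break
```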

What NOT to Deduplicate

Keep as separate items when:

- The same topic is discussed but with different conclusions
- Different people express different viewpoints
- The information evolved meaningfully between sources (v1 vs v2 of a decision)
- Different time periods are represented

Citation and Source Attribution

Every claim in the synthesized answer must be attributable to a source.

Attribution Format

Inline for direct references:

Sarah confirmed the REST approach in her email on Wednesday.
The design doc was updated to reflect this (~~cloud storage: "API Design Doc v3").

Source list at the end for completeness:

Sources:
- ~~chat: #engineering discussion (Jan 14) — initial decision thread
- ~~email: "API Decision" from Sarah Chen (Jan 15) — formal confirmation
- ~~cloud storage: "API Design Doc v3" last modified Jan 15 — updated specification

Attribution Rules

- Always name the source type (~~chat, ~~email, ~~cloud storage, etc.)
- Include the specific location (channel, folder, thread)
- Include the date or relative time
- Include the author when relevant
- Include document/thread titles when available
- For ~~chat, note the channel name
- For ~~email, note the subject line and sender
- For ~~cloud storage, note the document title
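
As an illustration, these rules could be enforced by a small formatter that refuses to emit a citation without a date and a location or title. The function name and parameters below are hypothetical; the output shape follows the source lists shown earlier.

```python
from typing import Optional


def format_citation(
    source_type: str,
    when: str,
    location: Optional[str] = None,
    title: Optional[str] = None,
    author: Optional[str] = None,
) -> str:
    """Build one source-list entry such as: ~~chat: #engineering thread (Jan 14)."""
    if not when:
        raise ValueError("a date or relative time is required")
    if not (location or title):
        raise ValueError("a location or a document/thread title is required")
    # Location comes first, then the quoted title when one is available.
    pieces = [p for p in (location, f'"{title}"' if title else None) if p]
    body = " ".join(pieces)
    if author:
        body += f" from {author}"
    return f"~~{source_type}: {body} ({when})"


email_citation = format_citation("email", "Jan 15", title="API Decision", author="Sarah")
chat_citation = format_citation("chat", "Jan 14", location="#engineering thread")
```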

Confidence Levels

Not all results are equally trustworthy. Assess confidence based on:

Freshness

Recency	Confidence impact
Today / yesterday	High confidence for current state
This week	Good confidence
This month	Moderate — things may have changed
Older than a month	Lower confidence — flag as potentially outdated

For status queries, heavily weight freshness. For policy/factual queries, freshness matters less.
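
A rough sketch of the freshness table as code, assuming "this week" means within 7 days and "this month" within 31 days. The level names and the per-query weights are illustrative values, not figures from this document.

```python
from datetime import date


def freshness_level(item_date: date, today: date) -> str:
    """Map the age of a result onto the four freshness tiers."""
    age_days = (today - item_date).days
    if age_days <= 1:
        return "high"      # today / yesterday
    if age_days <= 7:
        return "good"      # this week (assumed: within 7 days)
    if age_days <= 31:
        return "moderate"  # this month (assumed: within 31 days)
    return "lower"         # older than a month: flag as potentially outdated


def freshness_weight(query_kind: str) -> float:
    """Hypothetical weights: status queries lean on freshness far more."""
    return {"status": 0.6, "policy": 0.2}.get(query_kind, 0.4)


today = date(2025, 1, 16)
level = freshness_level(date(2025, 1, 10), today)  # six days old
```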

Authority

Source type	Authority level
Official wiki / knowledge base	Highest — curated, maintained
Shared documents (final versions)	High — intentionally published
Email announcements	High — formal communication
Meeting notes	Moderate-high — may be incomplete
Chat messages (thread conclusions)	Moderate — informal but real-time
Chat messages (mid-thread)	Lower — may not reflect final position
Draft documents	Low — not finalized
Task comments	Contextual — depends on commenter
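
The authority table could be represented as a simple lookup. The numeric ranks and key names below are assumptions chosen only to preserve the ordering in the table; task comments are treated as contextual, so the caller supplies a rank based on who commented.

```python
from typing import Optional

# Hypothetical numeric ranks that mirror the ordering in the table above.
AUTHORITY_RANK = {
    "official_wiki": 6,           # highest: curated, maintained
    "shared_doc_final": 5,        # high: intentionally published
    "email_announcement": 5,      # high: formal communication
    "meeting_notes": 4,           # moderate-high: may be incomplete
    "chat_thread_conclusion": 3,  # moderate: informal but real-time
    "chat_mid_thread": 2,         # lower: may not reflect final position
    "draft_doc": 1,               # low: not finalized
}


def authority_rank(source_kind: str, commenter_rank: Optional[int] = None) -> int:
    """Return the rank for a source kind; task comments depend on the commenter."""
    if source_kind == "task_comment":
        # Contextual: fall back to a lower rank when the commenter is unknown.
        return commenter_rank if commenter_rank is not None else 2
    return AUTHORITY_RANK[source_kind]
```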

Expressing Confidence

When confidence is high (multiple fresh, authoritative sources agree):

The team decided to use REST for the API redesign. [direct statement]

When confidence is moderate (single source or somewhat dated):

Based on the discussion in #engineering last month, the team was leaning
toward REST for the API redesign. This may have evolved since then.

When confidence is low (old data, informal source, or conflicting signals):

I found a reference to an API migration discussion from three months ago
in ~~chat, but I couldn't find a formal decision document. The information
may be outdated. You might want to check with the team for current status.
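
The three cases above can be approximated with a small decision function. This is a sketch under assumed inputs: the level names ("high", "good", "moderate", "lower" for freshness; "high", "moderate", "low" for authority) and the exact cutoffs are illustrative.

```python
def overall_confidence(
    n_sources: int,
    freshest: str,        # assumed values: "high", "good", "moderate", "lower"
    best_authority: str,  # assumed values: "high", "moderate", "low"
    conflicting: bool,
) -> str:
    """Pick the phrasing tier: direct statement, hedged, or explicit caveat."""
    # Low: old data, informal source, or conflicting signals.
    if conflicting or freshest == "lower" or best_authority == "low":
        return "low"
    # High: multiple fresh, authoritative sources agree.
    if n_sources >= 2 and freshest in ("high", "good") and best_authority == "high":
        return "high"
    # Moderate: single source or somewhat dated.
    return "moderate"
```

A "high" result would justify a direct statement, "moderate" a hedged one that names the source and its age, and "low" a suggestion to check with the team.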

Conflicting Information

When sources disagree:

I found conflicting information about the API approach:
- The ~~chat discussion on Jan 10 suggested GraphQL
- But Sarah's email on Jan 15 confirmed REST
- The design doc (updated Jan 15) reflects REST

The most recent sources indicate REST was the final decision,
but the earlier ~~chat discussion explored GraphQL first.

Always surface conflicts rather than silently picking one version.
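
A minimal sketch of conflict detection, assuming each result has already been reduced to a `(topic, position, source, day)` tuple. That reduction is the hard part and is not shown; the tuple layout and the year are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_conflicts(claims: list) -> dict:
    """Return topics with more than one distinct position, oldest claim first."""
    by_topic = defaultdict(list)
    for claim in claims:
        by_topic[claim[0]].append(claim)
    conflicts = {}
    for topic, items in by_topic.items():
        positions = {position for _, position, _, _ in items}
        if len(positions) > 1:
            # Chronological order lets the answer describe how the view evolved.
            conflicts[topic] = sorted(items, key=lambda c: c[3])
    return conflicts


claims = [
    ("API approach", "REST", "email", date(2025, 1, 15)),
    ("API approach", "GraphQL", "chat", date(2025, 1, 10)),
    ("API approach", "REST", "cloud storage", date(2025, 1, 15)),
    ("Release date", "March", "chat", date(2025, 1, 12)),
]
conflicts = find_conflicts(claims)
```

Only "API approach" is reported here, with the GraphQL discussion listed before the later REST confirmations, so the answer can present both rather than silently picking one.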

Summarization Strategies

For Small Result Sets (1-5 results)

Present each result with context. No summarization needed — give the user everything:

[Direct answer synthesized from results]

[Detail from source 1]
[Detail from source 2]

Sources: [full attribution]

For Medium Result Sets (6-15 results)

Group by theme and summarize each group:

[Overall answer]

Theme 1: [summary of related results]
Theme 2: [summary of related results]

Key sources: [top 3-5 most relevant sources]
Full results: [count] items found across [sources]

For Large Result Sets (more than 15 results)

Provide a high-level synthesis with the option to drill down:

[Overall answer based on most relevant results]

Summary:
- [Key finding 1] (supported by N sources)
- [Key finding 2] (supported by N sources)
- [Key finding 3] (supported by N sources)

Top sources:
- [Most authoritative/relevant source]
- [Second most relevant]
- [Third most relevant]

Found [total count] results across [source list].
Want me to dig deeper into any specific aspect?
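
Choosing among the three layouts comes down to a threshold check on the result count. In this sketch the boundary values 5 and 15 are assigned to the smaller tier, which is an assumption, and the returned labels are hypothetical names for the three templates.

```python
def choose_format(result_count: int) -> str:
    """Select a presentation strategy from the number of results."""
    if result_count < 1:
        raise ValueError("nothing to summarize")
    if result_count <= 5:
        return "full_detail"                # present every result with context
    if result_count <= 15:
        return "grouped_by_theme"           # summarize each theme, list key sources
    return "high_level_with_drilldown"      # key findings plus an offer to dig deeper
```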

Summarization Rules

- Lead with the answer, not the search process
- Do not list raw results; synthesize them into a narrative
- Group related items from different sources together
- Preserve important nuance and caveats
- Include enough detail that the user can decide whether to dig deeper
- Always offer to provide more detail if the result set was large

Synthesis Workflow

[Raw results from all sources]
          ↓
[1. Deduplicate — merge same info from different sources]
          ↓
[2. Cluster — group related results by theme/topic]
          ↓
[3. Rank — order clusters and items by relevance to query]
          ↓
[4. Assess confidence — freshness × authority × agreement]
          ↓
[5. Synthesize — produce narrative answer with attribution]
          ↓
[6. Format — choose appropriate detail level for result count]
          ↓
[Coherent answer with sources]
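
The six steps can be strung together as a single function. The version below is a deliberately simplified skeleton: deduplication is exact text matching, clustering uses a precomputed `topic` field, relevance is keyword overlap, and confidence looks only at source agreement. Every one of those choices, and all field names, are assumptions made to keep the sketch short.

```python
from collections import defaultdict


def synthesize(results: list, query_terms: list) -> dict:
    # 1. Deduplicate: merge results whose normalized text is identical.
    merged = {}
    for r in results:
        key = r["text"].strip().lower()
        entry = merged.setdefault(
            key, {"text": r["text"], "topic": r["topic"], "sources": []}
        )
        if r["source"] not in entry["sources"]:
            entry["sources"].append(r["source"])

    # 2. Cluster: group merged items by their (assumed precomputed) topic.
    clusters = defaultdict(list)
    for entry in merged.values():
        clusters[entry["topic"]].append(entry)

    # 3. Rank: order topics by how many query terms they mention.
    def relevance(topic: str) -> int:
        return sum(term.lower() in topic.lower() for term in query_terms)

    ranked = sorted(clusters, key=relevance, reverse=True)

    # 4. Assess confidence: agreement only; freshness and authority are omitted.
    def confidence(entries: list) -> str:
        sources = {s for e in entries for s in e["sources"]}
        return "high" if len(sources) >= 2 else "moderate"

    # 5. Synthesize: one section per topic, with attribution.
    sections = [
        {
            "topic": topic,
            "confidence": confidence(clusters[topic]),
            "summary": " ".join(e["text"] for e in clusters[topic]),
            "sources": sorted({s for e in clusters[topic] for s in e["sources"]}),
        }
        for topic in ranked
    ]

    # 6. Format: pick the detail level from the deduplicated result count.
    count = len(merged)
    detail = "full" if count <= 5 else "grouped" if count <= 15 else "high_level"
    return {"detail": detail, "sections": sections}


answer = synthesize(
    [
        {"source": "chat", "topic": "API approach", "text": "Team chose REST"},
        {"source": "email", "topic": "API approach", "text": "team chose REST"},
        {"source": "chat", "topic": "Lunch", "text": "Pizza on Friday"},
    ],
    query_terms=["API"],
)
```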

Anti-Patterns

Do not:

- List results source by source ("From ~~chat: ... From ~~email: ... From ~~cloud storage: ...")
- Include irrelevant results just because they matched a keyword
- Bury the answer under methodology explanation
- Present conflicting info without flagging the conflict
- Omit source attribution
- Present uncertain information with the same confidence as well-supported facts
- Summarize so aggressively that useful detail is lost

Do:

- Lead with the answer
- Group by topic, not by source
- Flag confidence levels when appropriate
- Surface conflicts explicitly
- Attribute all claims to sources
- Offer to go deeper when result sets are large
