knowledge-synthesis
Knowledge Synthesis
The last mile of enterprise search. Takes raw results from multiple sources and produces a coherent, trustworthy answer.
The Goal
Transform this:
~~chat result: "Sarah said in #eng: 'let's go with REST, GraphQL is overkill for our use case'"
~~email result: "Subject: API Decision — Sarah's email confirming REST approach with rationale"
~~cloud storage result: "API Design Doc v3 — updated section 2 to reflect REST decision"
~~project tracker result: "Task: Finalize API approach — marked complete by Sarah"
Into this:
The team decided to go with REST over GraphQL for the API redesign. Sarah made the
call, noting that GraphQL was overkill for the current use case. This was discussed
in #engineering on Tuesday, confirmed via email Wednesday, and the design doc has
been updated to reflect the decision. The related ~~project tracker task is marked complete.
Sources:
- ~~chat: #engineering thread (Jan 14)
- ~~email: "API Decision" from Sarah (Jan 15)
- ~~cloud storage: "API Design Doc v3" (updated Jan 15)
- ~~project tracker: "Finalize API approach" (completed Jan 15)
Deduplication
Cross-Source Deduplication
The same information often appears in multiple places. Identify and merge duplicates:
Signals that results are about the same thing:
- Same or very similar text content
- Same author/sender
- Timestamps within a short window (same day or adjacent days)
- References to the same entity (project name, document, decision)
- One source references another ("as discussed in ~~chat", "per the email", "see the doc")
How to merge:
- Combine into a single narrative item
- Cite all sources where it appeared
- Use the most complete version as the primary text
- Add unique details from each source
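The signals and merge steps above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `Result` fields, the word-overlap measure, and both thresholds are assumptions chosen for the example, and the sketch does not attempt the "add unique details from each source" step.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    source: str   # hypothetical label such as "chat" or "email"
    author: str
    day: date
    text: str


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the lowercase word sets of two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def same_item(a: Result, b: Result, min_overlap: float = 0.5, max_days: int = 1) -> bool:
    """Heuristic: same author, same or adjacent day, and similar text."""
    close_in_time = abs((a.day - b.day).days) <= max_days
    return a.author == b.author and close_in_time and word_overlap(a.text, b.text) >= min_overlap


def merge(group: list[Result]) -> dict:
    """Use the longest text as the primary version and cite every source."""
    primary = max(group, key=lambda r: len(r.text))
    return {"text": primary.text, "sources": sorted({r.source for r in group})}
```

Text length stands in for "most complete" here; a real system would likely need a better measure, plus entity and cross-reference checks.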
Deduplication Priority
When the same information exists in multiple sources, prefer:
1. The most complete version (fullest context)
2. The most authoritative source (official doc > chat)
3. The most recent version (latest update wins for evolving info)
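One literal reading of this ordering is a tuple sort key, where each later criterion only breaks ties in the earlier ones. The sketch below assumes that reading; the `Version` fields and the integer scores for completeness and authority are hypothetical and would have to come from elsewhere.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Version:
    label: str
    completeness: int   # hypothetical score, e.g. number of key details present
    authority: int      # hypothetical rank, higher means more official
    updated: date


def pick_primary(versions: list[Version]) -> Version:
    """Prefer completeness, then authority, then recency (tuple comparison)."""
    return max(versions, key=lambda v: (v.completeness, v.authority, v.updated))
```

A strict tie-break order is only one option; a weighted score would let a much more authoritative source beat a slightly more complete one.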
What NOT to Deduplicate
Keep as separate items when:
- The same topic is discussed but with different conclusions
- Different people express different viewpoints
- The information evolved meaningfully between sources (v1 vs v2 of a decision)
- Different time periods are represented
Citation and Source Attribution
Every claim in the synthesized answer must be attributable to a source.
Attribution Format
Inline for direct references:
Sarah confirmed the REST approach in her email on Wednesday.
The design doc was updated to reflect this (~~cloud storage: "API Design Doc v3").
Source list at the end for completeness:
Sources:
- ~~chat: #engineering discussion (Jan 14) — initial decision thread
- ~~email: "API Decision" from Sarah Chen (Jan 15) — formal confirmation
- ~~cloud storage: "API Design Doc v3" last modified Jan 15 — updated specification
Attribution Rules
- Always name the source type (~~chat, ~~email, ~~cloud storage, etc.)
- Include the specific location (channel, folder, thread)
- Include the date or relative time
- Include the author when relevant
- Include document/thread titles when available
- For ~~chat, note the channel name
- For ~~email, note the subject line and sender
- For ~~cloud storage, note the document title
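As an illustration of these rules, a source-list entry could be assembled from a small record like the one below. The `Source` class and its field names are assumptions made for the example; the output shape follows the source lists shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    kind: str                     # source type, e.g. "~~chat" or "~~email"
    title: str                    # channel name, subject line, or document title
    when: str                     # date or relative time, e.g. "Jan 15"
    author: Optional[str] = None  # included only when relevant


def format_source(s: Source) -> str:
    """Render one entry: type, location/title, optional author, then date."""
    line = f"- {s.kind}: {s.title}"
    if s.author:
        line += f" from {s.author}"
    return f"{line} ({s.when})"
```

For example, `format_source(Source("~~email", '"API Decision"', "Jan 15", author="Sarah Chen"))` yields `- ~~email: "API Decision" from Sarah Chen (Jan 15)`.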
Confidence Levels
Not all results are equally trustworthy. Assess confidence based on:
Freshness
| Recency | Confidence impact |
|---|---|
| Today / yesterday | High confidence for current state |
| This week | Good confidence |
| This month | Moderate — things may have changed |
| Older than a month | Lower confidence — flag as potentially outdated |
For status queries, heavily weight freshness. For policy/factual queries, freshness matters less.
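The freshness table can be approximated by bucketing on age in days. The cutoffs below (1, 7, and 30 days) are assumptions that stand in for "today / yesterday", "this week", and "this month"; calendar-based boundaries would be an equally valid reading.

```python
from datetime import date


def freshness(last_updated: date, today: date) -> str:
    """Map a result's age in days to a rough confidence label."""
    age_days = (today - last_updated).days
    if age_days <= 1:
        return "high"
    if age_days <= 7:
        return "good"
    if age_days <= 30:
        return "moderate"
    return "lower"  # flag as potentially outdated
```

How much weight this label gets would then depend on the query type, with status queries leaning on it far more than policy or factual ones.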
Authority
| Source type | Authority level |
|---|---|
| Official wiki / knowledge base | Highest — curated, maintained |
| Shared documents (final versions) | High — intentionally published |
| Email announcements | High — formal communication |
| Meeting notes | Moderate-high — may be incomplete |
| Chat messages (thread conclusions) | Moderate — informal but real-time |
| Chat messages (mid-thread) | Lower — may not reflect final position |
| Draft documents | Low — not finalized |
| Task comments | Contextual — depends on commenter |
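If the authority table needs to be used in ranking, one option is a plain lookup. The keys and the numeric ranks below are hypothetical; only their relative order is taken from the table. Task comments are handled separately because the table calls them contextual.

```python
from typing import Optional

# Hypothetical numeric ranks; only the ordering mirrors the table above.
AUTHORITY_RANK = {
    "wiki": 5,
    "final_document": 4,
    "email_announcement": 4,
    "meeting_notes": 3,
    "chat_thread_conclusion": 2,
    "chat_mid_thread": 1,
    "draft_document": 0,
}


def authority_rank(source_type: str, commenter_rank: Optional[int] = None) -> int:
    """Look up a source type; task comments take their rank from the commenter."""
    if source_type == "task_comment":
        if commenter_rank is None:
            raise ValueError("task comments need a rank for the commenter")
        return commenter_rank
    return AUTHORITY_RANK[source_type]
```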
Expressing Confidence
When confidence is high (multiple fresh, authoritative sources agree):
The team decided to use REST for the API redesign. [direct statement]
When confidence is moderate (single source or somewhat dated):
Based on the discussion in #engineering last month, the team was leaning
toward REST for the API redesign. This may have evolved since then.
When confidence is low (old data, informal source, or conflicting signals):
I found a reference to an API migration discussion from three months ago
in ~~chat, but I couldn't find a formal decision document. The information
may be outdated. You might want to check with the team for current status.
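The three tiers above could be selected with a small rule set. This is a sketch under assumed inputs: the parameter names and the 7-day and 30-day cutoffs are illustrative, while the caveat wording is copied from the examples.

```python
def confidence_level(agreeing_sources: int, age_days: int,
                     authoritative: bool, conflicting: bool) -> str:
    """Low on conflict, informal-only, or old data; high needs fresh agreement."""
    if conflicting or not authoritative or age_days > 30:
        return "low"
    if agreeing_sources >= 2 and age_days <= 7:
        return "high"
    return "moderate"


# Caveat text appended to a statement at each level.
CAVEAT = {
    "high": "",
    "moderate": " This may have evolved since then.",
    "low": " The information may be outdated. You might want to check with the team for current status.",
}


def with_caveat(statement: str, level: str) -> str:
    return statement + CAVEAT[level]
```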
Conflicting Information
When sources disagree:
I found conflicting information about the API approach:
- The ~~chat discussion on Jan 10 suggested GraphQL
- But Sarah's email on Jan 15 confirmed REST
- The design doc (updated Jan 15) reflects REST
The most recent sources indicate REST was the final decision,
but the earlier ~~chat discussion explored GraphQL first.
Always surface conflicts rather than silently picking one version.
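A simple way to make sure conflicts are surfaced is to group claims by topic and report any topic with more than one distinct position. The sketch assumes an earlier step has already extracted `(topic, position, source)` tuples, which is the hard part and is not shown.

```python
from collections import defaultdict


def find_conflicts(claims: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Return topics whose claims disagree, with every (position, source) pair."""
    by_topic: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for topic, position, source in claims:
        by_topic[topic].append((position, source))
    return {
        topic: items
        for topic, items in by_topic.items()
        if len({position for position, _ in items}) > 1
    }
```

Every pair is returned, including the agreeing ones, so the answer can state which sources back the final position and which one differed.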
Summarization Strategies
For Small Result Sets (1-5 results)
Present each result with context. No summarization needed — give the user everything:
[Direct answer synthesized from results]
[Detail from source 1]
[Detail from source 2]
Sources: [full attribution]
For Medium Result Sets (5-15 results)
Group by theme and summarize each group:
[Overall answer]
Theme 1: [summary of related results]
Theme 2: [summary of related results]
Key sources: [top 3-5 most relevant sources]
Full results: [count] items found across [sources]
For Large Result Sets (15+ results)
Provide a high-level synthesis with the option to drill down:
[Overall answer based on most relevant results]
Summary:
- [Key finding 1] (supported by N sources)
- [Key finding 2] (supported by N sources)
- [Key finding 3] (supported by N sources)
Top sources:
- [Most authoritative/relevant source]
- [Second most relevant]
- [Third most relevant]
Found [total count] results across [source list].
Want me to dig deeper into any specific aspect?
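Choosing among the three templates comes down to the result count. The size ranges as written share their boundary values (5 and 15), so the sketch below assumes those counts fall into the smaller bucket. The return labels are made up for the example, and the zero-result case is not defined by the ranges, so it raises.

```python
def detail_level(n_results: int) -> str:
    """Pick a presentation style from the number of results."""
    if n_results < 1:
        raise ValueError("no results to present")
    if n_results <= 5:
        return "full"        # present each result with context
    if n_results <= 15:
        return "grouped"     # group by theme, summarize each group
    return "high_level"      # synthesis plus an offer to drill down
```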
Summarization Rules
- Lead with the answer, not the search process
- Do not list raw results — synthesize them into narrative
- Group related items from different sources together
- Preserve important nuance and caveats
- Include enough detail that the user can decide whether to dig deeper
- Always offer to provide more detail if the result set was large
Synthesis Workflow
[Raw results from all sources]
↓
[1. Deduplicate — merge same info from different sources]
↓
[2. Cluster — group related results by theme/topic]
↓
[3. Rank — order clusters and items by relevance to query]
↓
[4. Assess confidence — freshness × authority × agreement]
↓
[5. Synthesize — produce narrative answer with attribution]
↓
[6. Format — choose appropriate detail level for result count]
↓
[Coherent answer with sources]
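To show how the stages chain together, here is a deliberately simplified end-to-end sketch. Everything in it is an assumption: results are plain dicts with hypothetical `source`, `topic`, `relevance`, and `text` keys, deduplication is exact match on normalized text, and confidence looks only at source agreement (freshness and authority are left out).

```python
from collections import defaultdict


def synthesize(results: list[dict]) -> str:
    # 1. Deduplicate: collapse identical normalized text, keeping every source.
    merged: dict[str, dict] = {}
    for r in results:
        key = r["text"].strip().lower()
        item = merged.setdefault(key, {**r, "sources": []})
        item["sources"].append(r["source"])

    # 2. Cluster: group merged items by topic.
    clusters: dict[str, list[dict]] = defaultdict(list)
    for item in merged.values():
        clusters[item["topic"]].append(item)

    # 3. Rank: order clusters by their most relevant item.
    ranked = sorted(clusters.items(), key=lambda kv: -max(i["relevance"] for i in kv[1]))

    lines = []
    for topic, items in ranked:
        top = max(items, key=lambda i: i["relevance"])
        # 4. Assess confidence: agreement only in this sketch.
        level = "high" if len(top["sources"]) >= 2 else "moderate"
        # 5-6. Synthesize and format: one attributed line per cluster.
        lines.append(f"{topic}: {top['text']} [{level}; sources: {', '.join(top['sources'])}]")
    return "\n".join(lines)
```

The point is the ordering of the stages; each one would be replaced by something more capable in practice.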
Anti-Patterns
Do not:
- List results source by source ("From ~~chat: ... From ~~email: ... From ~~cloud storage: ...")
- Include irrelevant results just because they matched a keyword
- Bury the answer under methodology explanation
- Present conflicting info without flagging the conflict
- Omit source attribution
- Present uncertain information with the same confidence as well-supported facts
- Summarize so aggressively that useful detail is lost
Do:
- Lead with the answer
- Group by topic, not by source
- Flag confidence levels when appropriate
- Surface conflicts explicitly
- Attribute all claims to sources
- Offer to go deeper when result sets are large