enrich

MANDATORY PREPARATION

Invoke {{command_prefix}}agent-workflow: it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding; if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first. Consult the knowledge-systems reference in the agent-workflow skill for RAG architecture, chunking strategies, and retrieval patterns.

Add knowledge sources to ground the workflow in facts. Without grounding, agents are prone to hallucinate; with grounding, they can cite their sources.

Knowledge Source Assessment

Identify what knowledge the workflow needs:

| Knowledge Type | Source | Update Frequency | Access Pattern |
|---|---|---|---|
| Domain docs | Internal docs, specs | Monthly | Semantic search |
| Code context | Codebase | Real-time | Code search |
| User data | Database, CRM | Real-time | Structured query |
| External data | APIs, web | Real-time | API call |
| Historical | Logs, past interactions | Daily | Time-range query |
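
One way to keep the assessment actionable is to record it as a small registry and derive the refresh schedule from the update-frequency column. This is a minimal sketch under stated assumptions: `KnowledgeSource` and `due_for_refresh` are hypothetical names, "Monthly" is approximated as 30 days, and the real-time rows are treated as live sources that are queried per request rather than re-indexed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class KnowledgeSource:
    name: str
    source: str
    access_pattern: str
    # None means the source is queried live on every request and never re-indexed.
    refresh_every: Optional[timedelta]
    last_refreshed: Optional[datetime] = None

def due_for_refresh(sources: list[KnowledgeSource], now: datetime) -> list[str]:
    """Return names of indexed sources whose copy is older than their refresh interval."""
    due = []
    for s in sources:
        if s.refresh_every is None:
            continue  # live sources: freshness is handled by caching, not re-indexing
        if s.last_refreshed is None or now - s.last_refreshed >= s.refresh_every:
            due.append(s.name)
    return due

# Hypothetical registry mirroring the table above; "Monthly" approximated as 30 days.
REGISTRY = [
    KnowledgeSource("Domain docs", "Internal docs, specs", "Semantic search", timedelta(days=30)),
    KnowledgeSource("Code context", "Codebase", "Code search", None),
    KnowledgeSource("User data", "Database, CRM", "Structured query", None),
    KnowledgeSource("External data", "APIs, web", "API call", None),
    KnowledgeSource("Historical", "Logs, past interactions", "Time-range query", timedelta(days=1)),
]
```

A scheduler could call `due_for_refresh(REGISTRY, now)` periodically and re-index whatever it returns.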

Add RAG Pipeline

For document-based knowledge (consult the knowledge-systems reference in the agent-workflow skill):
  1. Select documents: Identify the authoritative source documents
  2. Chunk strategy: Choose chunking per document type, preferring semantic boundaries over fixed token counts
  3. Embed: Use an embedding model suited to the domain
  4. Index: Store in vector database with metadata
  5. Retrieve: Implement hybrid search (semantic + keyword)
  6. Inject: Add retrieved context to the prompt with source attribution
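
The six steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: `embed()` is a hypothetical hashed bag-of-words stand-in for a real domain embedding model, the in-memory `Index` stands in for a vector database, packing whole paragraphs approximates semantic chunking, and simple keyword overlap stands in for a proper lexical ranker such as BM25. All names are illustrative.

```python
import math
import re
import zlib
from collections import Counter
from dataclasses import dataclass, field

DIM = 1024  # width of the toy hashed vector produced by embed()

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of up to max_chars, never splitting a
    paragraph (a cheap proxy for semantic boundaries); an oversized paragraph
    becomes a chunk of its own."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed term counts, L2-normalised.
    It is purely lexical and captures no meaning."""
    vec = [0.0] * DIM
    for tok, n in Counter(tokenize(text)).items():
        vec[zlib.crc32(tok.encode()) % DIM] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

@dataclass
class Chunk:
    text: str
    source: str  # metadata kept for attribution
    chunk_id: int
    vector: list[float] = field(repr=False)

class Index:
    """In-memory list standing in for a vector database."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add_document(self, text: str, source: str, max_chars: int = 500) -> None:
        for i, piece in enumerate(chunk(text, max_chars)):
            self.chunks.append(Chunk(piece, source, i, embed(piece)))

    def search(self, query: str, k: int = 3, alpha: float = 0.5) -> list[Chunk]:
        """Hybrid score: alpha * vector similarity + (1 - alpha) * keyword overlap."""
        q_vec, q_terms = embed(query), set(tokenize(query))

        def score(c: Chunk) -> float:
            vector_sim = sum(a * b for a, b in zip(q_vec, c.vector))  # cosine of unit vectors
            overlap = len(q_terms & set(tokenize(c.text))) / max(len(q_terms), 1)
            return alpha * vector_sim + (1 - alpha) * overlap

        return sorted(self.chunks, key=score, reverse=True)[:k]

def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Inject retrieved chunks, each tagged with its source so the answer can cite it."""
    context = "\n\n".join(f"[{c.source}#{c.chunk_id}]\n{c.text}" for c in hits)
    return ("Answer using only the context below and cite sources as [source#chunk].\n\n"
            f"{context}\n\nQuestion: {question}")
```

Because the toy `embed()` is lexical, both halves of the hybrid score here are keyword-driven. With a real embedding model, the vector term would catch paraphrases the keyword term misses, while the keyword term still rewards exact matches such as product codes or error strings.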

Add Structured Data

For database-backed knowledge:
  1. Define the query interface: Natural language → structured query
  2. Add guardrails: Read-only access, query complexity limits
  3. Format results: Transform raw data into context the model can use
  4. Attribute: Include data source and freshness in the context
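
The guardrail, formatting, and attribution steps can be sketched with Python's built-in `sqlite3` module. This assumes the SQL arrives from an upstream natural-language-to-SQL step that is not shown; `ReadOnlyQueryGuard` and `format_for_context` are hypothetical names, and the row and step limits are illustrative defaults. `PRAGMA query_only` makes SQLite reject writes on the connection, and `set_progress_handler` aborts a query after a budget of virtual-machine steps, which serves here as a rough complexity limit.

```python
import sqlite3
from datetime import datetime

class ReadOnlyQueryGuard:
    """Guardrails for model-generated SQL: read-only connection, SELECT/WITH only,
    a row cap, and a cap on SQLite VM steps as a rough query-complexity limit."""

    def __init__(self, conn: sqlite3.Connection, max_rows: int = 50,
                 max_steps: int = 100_000, check_every: int = 1_000):
        self.conn = conn
        self.max_rows = max_rows
        self.max_steps = max_steps
        self.check_every = check_every
        self._ticks = 0
        # From here on SQLite refuses CREATE/INSERT/UPDATE/DELETE/DROP on this connection.
        conn.execute("PRAGMA query_only = ON")
        # Called every `check_every` VM instructions; a truthy return aborts the query.
        conn.set_progress_handler(self._on_progress, check_every)

    def _on_progress(self) -> bool:
        self._ticks += 1
        return self._ticks * self.check_every > self.max_steps

    def query(self, sql: str, params: tuple = ()) -> tuple[list[str], list[tuple]]:
        if not sql.lstrip().upper().startswith(("SELECT", "WITH")):
            raise PermissionError("only SELECT/WITH statements are allowed")
        self._ticks = 0  # reset the step budget for each query
        cur = self.conn.execute(sql, params)
        try:
            columns = [d[0] for d in cur.description]
            rows = cur.fetchmany(self.max_rows)  # row cap
        finally:
            cur.close()
        return columns, rows

def format_for_context(columns: list[str], rows: list[tuple], source: str, as_of: datetime) -> str:
    """Render rows as plain text, prefixed with the data source and its freshness."""
    header = f"[source: {source}; data as of {as_of.isoformat()}; {len(rows)} rows]"
    lines = [header, " | ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)
```

The prefix check alone is not sufficient, since a `WITH ... DELETE` statement passes it; the `query_only` pragma is what actually blocks the write. In production, a database role with only read grants would be the stronger control.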

Add Real-Time Data

For live information:
  1. Identify APIs: Which external services provide the needed data?
  2. Cache strategy: How often does the data change? Set cache lifetimes accordingly
  3. Fallback: What happens when the API is down?
  4. Attribution: Include data timestamp and source
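
For the cache, fallback, and attribution steps, a minimal sketch might wrap each live source in a time-to-live (TTL) cache that serves the last good value, clearly flagged as stale, when the upstream call fails. `CachedLiveSource`, `Observation`, and the injected `fetch` and `clock` callables are hypothetical names; the TTL and the serve-stale policy are assumptions to tune per source.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional

@dataclass
class Observation:
    value: Any
    source: str
    fetched_at: datetime  # wall-clock time of the successful fetch, for attribution
    stale: bool = False   # True when served from cache after a failed refresh

class CachedLiveSource:
    def __init__(self, name: str, fetch: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.name = name
        self.fetch = fetch  # hypothetical API call supplied by the caller
        self.ttl = ttl_seconds
        self.clock = clock  # monotonic clock for TTL checks; injectable for tests
        self._cached: Optional[Observation] = None
        self._cached_at = 0.0

    def get(self) -> Observation:
        now = self.clock()
        if self._cached is not None and now - self._cached_at < self.ttl:
            return self._cached  # still fresh: no API call
        try:
            value = self.fetch()
        except Exception:
            if self._cached is not None:
                # Fallback: last known value, keeping its original timestamp, flagged stale.
                return Observation(self._cached.value, self.name,
                                   self._cached.fetched_at, stale=True)
            raise  # nothing cached: surface the outage to the caller
        self._cached = Observation(value, self.name, datetime.now(timezone.utc))
        self._cached_at = now
        return self._cached
```

Serving stale data is only one possible fallback; where staleness is risky (prices, availability), raising and telling the user the source is unavailable may be safer. Because a failed refresh does not reset the cache timestamp, every later call retries the API, so a production version would likely add backoff.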

Enrichment Checklist

  • Every knowledge source has attribution (source, date, confidence)
  • Retrieval quality tested independently of generation quality
  • Chunk sizes tested and optimized for the document types
  • Fallbacks exist for all external knowledge sources
  • Knowledge base has a refresh/update strategy
  • PII is handled appropriately in knowledge sources
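
To test retrieval independently of generation, one common approach is to score the retriever against a small hand-labeled set of queries and their relevant chunk ids, using standard information-retrieval metrics such as recall@k and mean reciprocal rank (MRR). The sketch below assumes `retrieve` is any function returning ranked chunk ids; the function names and the labeled set are hypothetical.

```python
from typing import Callable

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant chunk ids that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate_retriever(retrieve: Callable[[str], list[str]],
                       labeled: list[tuple[str, set[str]]], k: int = 5) -> dict[str, float]:
    """Score retrieval alone; no generation step is involved."""
    recalls, rrs = [], []
    for query, relevant in labeled:
        results = retrieve(query)
        recalls.append(recall_at_k(results, relevant, k))
        rrs.append(reciprocal_rank(results, relevant))
    n = len(labeled)
    return {f"recall@{k}": sum(recalls) / n, "mrr": sum(rrs) / n}
```

Tracking these numbers across chunk sizes or embedding models shows whether a change helped retrieval before any model output is judged.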

Recommended Next Step

After enrichment, run {{command_prefix}}evaluate to test retrieval quality, or {{command_prefix}}iterate to set up continuous monitoring of knowledge freshness.

NEVER:
  • Index everything without curation (garbage in = garbage out)
  • Skip source attribution (without attribution, hallucinations are hard to detect)
  • Build RAG without testing retrieval quality first
  • Use fixed chunk sizes for all document types
  • Assume embedding similarity equals relevance