rag-architect
RAG Architect
The agent designs, implements, and optimizes production-grade Retrieval-Augmented Generation pipelines, covering the full lifecycle from document chunking through evaluation.
Workflow
- Analyse corpus -- Profile the document collection: count, average length, format mix (PDF, HTML, Markdown), language(s), and domain. Validate that sample documents are accessible before proceeding.
- Select chunking strategy -- Choose from the Chunking Strategy Matrix based on corpus characteristics. Set chunk size, overlap, and boundary rules. Run a test split on 100 sample documents.
- Choose embedding model -- Select an embedding model from the Embedding Model table based on domain, latency budget, and cost constraints. Verify dimension compatibility with the target vector database.
- Select vector database -- Pick a vector store from the Vector Database Comparison based on scale, query patterns, and operational requirements.
- Design retrieval pipeline -- Configure retrieval strategy (dense, sparse, or hybrid). Add reranking if precision requirements exceed 0.85. Set the top-K parameter and similarity threshold.
- Implement query transformations -- If query-document style mismatch exists, enable HyDE. If queries are ambiguous, enable multi-query generation. Validate each transformation improves retrieval metrics on a held-out set.
- Configure guardrails -- Enable PII detection, toxicity filtering, hallucination detection, and source attribution. Set confidence score thresholds.
- Evaluate end-to-end -- Run the RAGAS evaluation framework. Verify faithfulness > 0.90, context relevance > 0.80, answer relevance > 0.85. Iterate on weak components.
Chunking Strategy Matrix
| Strategy | Best For | Chunk Size | Overlap | Pros | Cons |
|---|---|---|---|---|---|
| Fixed-size (token) | Uniform docs, consistent sizing | 512-2048 tokens | 10-20% | Predictable, simple | Breaks semantic units |
| Sentence-based | Narrative text, articles | 3-8 sentences | 1 sentence | Preserves language boundaries | Variable sizes |
| Paragraph-based | Structured docs, technical manuals | 1-3 paragraphs | 0-1 paragraph | Preserves topic coherence | Highly variable sizes |
| Semantic | Long-form, research papers | Dynamic | Topic-shift detection | Best coherence | Computationally expensive |
| Recursive | Mixed content types | Dynamic, multi-level | Per-level | Optimal utilization | Complex implementation |
| Document-aware | Multi-format collections | Format-specific | Section-level | Preserves metadata | Format-specific code required |
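
As a rough illustration of the fixed-size row above, the sketch below splits a token list into overlapping windows. The function name `chunk_fixed` and the whitespace "tokenizer" are assumptions made for this example; a real pipeline would count tokens with the tokenizer of the chosen embedding model.

```python
def chunk_fixed(tokens, chunk_size=512, overlap_ratio=0.1):
    """Split a token list into fixed-size windows that overlap by a ratio."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # always >= 1 because overlap < chunk_size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Whitespace splitting stands in for a real tokenizer (assumption).
tokens = "one two three four five six seven eight nine ten".split()
chunks = chunk_fixed(tokens, chunk_size=4, overlap_ratio=0.25)
for chunk in chunks:
    print(chunk)
```

Each window shares its first token with the end of the previous one, which is the "Breaks semantic units" trade-off in miniature: boundaries fall wherever the count says, not where a sentence ends.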
Embedding Model Comparison
| Model | Dimensions | Speed | Quality | Cost | Best For |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | ~14K tok/s | Good | Free (local) | Prototyping, low-latency |
| all-mpnet-base-v2 | 768 | ~2.8K tok/s | Better | Free (local) | Balanced production use |
| text-embedding-3-small | 1536 | API | High | $0.02/1M tokens | Cost-effective production |
| text-embedding-3-large | 3072 | API | Highest | $0.13/1M tokens | Maximum quality |
| Domain fine-tuned | Varies | Varies | Domain-best | Training cost | Specialized domains (legal, medical) |
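
The dimensions and per-token prices in the table translate directly into embedding cost and raw vector storage. The following is a back-of-the-envelope sketch: the function name, the chunk count, and the 4-byte float32 assumption are illustrative, and index overhead (for example HNSW graph links) is not included.

```python
def embedding_estimate(num_chunks, avg_tokens_per_chunk, dimensions,
                       price_per_million_tokens=0.0, bytes_per_float=4):
    """Estimate one-off embedding cost and raw vector storage for a corpus."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    cost_usd = total_tokens / 1_000_000 * price_per_million_tokens
    # Raw vectors only; the vector index adds overhead on top of this.
    storage_bytes = num_chunks * dimensions * bytes_per_float
    return {
        "total_tokens": total_tokens,
        "cost_usd": cost_usd,
        "storage_mb": storage_bytes / 1_000_000,
    }


# Hypothetical corpus: 100,000 chunks of ~500 tokens, embedded with a
# 1536-dimension model priced at $0.02 per 1M tokens (values from the table).
est = embedding_estimate(100_000, 500, 1536, price_per_million_tokens=0.02)
print(est)
```

Running the same estimate with 3072 dimensions doubles the storage figure, which is worth checking against the vector database's limits before committing to the larger model.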
Vector Database Comparison
| Database | Type | Scaling | Key Feature | Best For |
|---|---|---|---|---|
| Pinecone | Managed | Auto-scaling | Metadata filtering, hybrid search | Production, managed preference |
| Weaviate | Open source | Horizontal | GraphQL API, multi-modal | Complex data types |
| Qdrant | Open source | Distributed | High perf, low memory (Rust) | Performance-critical |
| Chroma | Embedded | Limited | Simple API, SQLite-backed | Prototyping, small-scale |
| pgvector | PostgreSQL ext | PostgreSQL scaling | ACID, SQL joins | Existing PostgreSQL infra |
Retrieval Strategies
| Strategy | When to Use | Implementation |
|---|---|---|
| Dense (vector similarity) | Default for semantic search | Cosine similarity with k-NN/ANN |
| Sparse (BM25/TF-IDF) | Exact keyword matching needed | Elasticsearch or inverted index |
| Hybrid (dense + sparse) | Best of both needed | Reciprocal Rank Fusion (RRF) with tuned weights |
| + Reranking | Precision must exceed 0.85 | Cross-encoder reranker after initial retrieval |
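
For the hybrid row, a minimal sketch of weighted Reciprocal Rank Fusion is shown below. The smoothing constant `k=60` is a commonly used default rather than something this document specifies, and the per-retriever weights are an illustrative extension of plain RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(weight / (k + rank)) over the lists it appears
    in, so a document returned by both retrievers is counted once, with a
    combined score.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("one weight per ranking is required")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["d1", "d2", "d3"]   # hypothetical dense (vector) results
sparse = ["d3", "d1", "d4"]  # hypothetical sparse (BM25) results
fused = reciprocal_rank_fusion([dense, sparse], weights=[0.7, 0.3])
print(fused)
```

Because the fused scores are keyed by document ID, overlapping results from the two retrievers are deduplicated as a side effect, before any reranker sees them.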
Query Transformation Techniques
| Technique | When to Use | How It Works |
|---|---|---|
| HyDE | Query/document style mismatch | LLM generates hypothetical answer; embed that instead of query |
| Multi-query | Ambiguous queries | Generate 3-5 query variations; retrieve for each; deduplicate |
| Step-back | Specific questions needing general context | Transform to broader query; retrieve general + specific |
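
The multi-query row can be sketched as a loop over query variations followed by deduplication. Here `generate_variations` and `retrieve` are hypothetical callables standing in for an LLM call and a vector store query; the stubs below exist only to make the example runnable.

```python
def multi_query_retrieve(query, generate_variations, retrieve, top_k=5):
    """Retrieve for the query and its variations, keeping each doc once."""
    queries = [query] + [v for v in generate_variations(query) if v != query]
    best = {}
    for q in queries:
        for doc_id, score in retrieve(q, top_k):
            # Deduplicate: keep the best score seen for each document.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Stubs for illustration only (assumptions, not a real LLM or index).
def fake_variations(query):
    return [query + " setup", query + " configuration"]


INDEX = {
    "vpn": [("doc-a", 0.80), ("doc-b", 0.60)],
    "vpn setup": [("doc-b", 0.90), ("doc-c", 0.50)],
    "vpn configuration": [("doc-a", 0.70)],
}


def fake_retrieve(query, top_k):
    return INDEX.get(query, [])[:top_k]


results = multi_query_retrieve("vpn", fake_variations, fake_retrieve, top_k=3)
print(results)
```

Keeping the maximum score per document is one simple merge rule; rank-based fusion is an alternative when scores from different queries are not comparable.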
Context Window Optimization
- Relevance ordering: Most relevant chunks first in the context window
- Diversity: Deduplicate semantically similar chunks
- Token budget: Fit within model context limit; reserve tokens for system prompt and answer
- Hierarchical inclusion: Include section summary before detailed chunks when available
- Compression: Summarize low-relevance chunks; extract key facts from verbose passages
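
The relevance-ordering and token-budget points above can be combined in a small greedy packer. This is a sketch under simple assumptions: each chunk is a dict with hypothetical `score` and `tokens` keys, and deduplication and compression are left out.

```python
def pack_context(chunks, context_limit, reserved_tokens):
    """Greedily select the most relevant chunks that fit the token budget."""
    budget = context_limit - reserved_tokens
    if budget <= 0:
        raise ValueError("reserved tokens leave no room for context")
    selected, used = [], 0
    # Most relevant first, so high-scoring chunks claim the budget first.
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        if used + chunk["tokens"] <= budget:
            selected.append(chunk)
            used += chunk["tokens"]
    return selected, used


chunks = [
    {"id": "a", "score": 0.9, "tokens": 300},
    {"id": "b", "score": 0.8, "tokens": 500},
    {"id": "c", "score": 0.7, "tokens": 200},
]
# Hypothetical limits: 1,000-token window, 400 reserved for prompt and answer.
selected, used = pack_context(chunks, context_limit=1000, reserved_tokens=400)
print([c["id"] for c in selected], used)
```

The packer skips a chunk that does not fit and keeps trying smaller ones, which favors filling the window over strictly preserving rank order.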
Evaluation Metrics (RAGAS Framework)
| Metric | Target | What It Measures |
|---|---|---|
| Faithfulness | > 0.90 | Answers grounded in retrieved context |
| Context Relevance | > 0.80 | Retrieved chunks relevant to query |
| Answer Relevance | > 0.85 | Answer addresses the original question |
| Precision@K | > 0.70 | % of top-K results that are relevant |
| Recall@K | > 0.80 | % of relevant docs found in top-K |
| MRR | > 0.75 | Reciprocal rank of first relevant result |
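
The three ranking metrics in the table have short standard definitions, sketched below in plain Python. The function names and the sample IDs are illustrative; this is not the implementation used by RAGAS or by the scripts described later.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
p5 = precision_at_k(retrieved, relevant, 5)
r5 = recall_at_k(retrieved, relevant, 5)
mrr = mean_reciprocal_rank([(retrieved, relevant), (["d4", "d8"], {"d4"})])
print(p5, r5, mrr)
```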
Guardrails
- PII detection: Scan retrieved chunks and generated responses for PII; redact or block
- Hallucination detection: Compare generated claims against source documents via NLI
- Source attribution: Every factual claim must cite a retrieved chunk
- Confidence scoring: Return confidence level; if below threshold, return "I don't have enough information"
- Injection prevention: Sanitize user queries; reject prompt injection attempts
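
Two of the guardrails above, confidence scoring and PII redaction, can be sketched as a post-processing step. The threshold of 0.6, the function names, and the email-only regular expression are assumptions for illustration; production PII detection needs to cover far more than email addresses.

```python
import re

# Deliberately simple pattern: covers common email shapes only (assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
FALLBACK = "I don't have enough information"


def redact_emails(text):
    """Replace email addresses with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def apply_guardrails(answer, confidence, sources, threshold=0.6):
    """Return the fallback when confidence is low or no source backs the answer."""
    if confidence < threshold or not sources:
        return {"answer": FALLBACK, "sources": []}
    return {"answer": redact_emails(answer), "sources": list(sources)}


ok = apply_guardrails(
    "Contact jane.doe@example.com for access.", 0.82, ["chunk-12"]
)
low = apply_guardrails("Probably port 8443.", 0.30, ["chunk-7"])
print(ok)
print(low)
```

Refusing when the source list is empty is one way to enforce the source-attribution rule at the response level.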
Example: Internal Knowledge Base RAG Pipeline
```yaml
corpus:
  documents: 12,000 Confluence pages + 3,000 PDFs
  avg_length: 2,400 tokens
  languages: [English]
  domain: internal engineering docs
pipeline:
  chunking:
    strategy: recursive
    max_tokens: 512
    overlap: 50 tokens
    boundary: paragraph
  embedding:
    model: text-embedding-3-small
    dimensions: 1536
    batch_size: 100
  vector_db:
    engine: pgvector
    index: HNSW (ef_construction=128, m=16)
    reason: "Existing PostgreSQL infra; ACID compliance for audit"
  retrieval:
    strategy: hybrid
    dense_weight: 0.7
    sparse_weight: 0.3
    top_k: 10
    reranker: cross-encoder/ms-marco-MiniLM-L-12-v2
    final_k: 5
evaluation_results:
  faithfulness: 0.93
  context_relevance: 0.84
  answer_relevance: 0.88
  precision_at_5: 0.76
  recall_at_10: 0.85
```
Production Patterns
- Caching: Query-level (exact match), semantic (similar queries via embedding distance < 0.05), chunk-level (embedding cache)
- Streaming: Start streaming generation tokens as soon as retrieval completes; show sources after generation
- Fallbacks: If primary vector DB is unavailable, serve from read-replica; if retrieval returns no results above threshold, say so explicitly
- Document refresh: Incremental re-embedding on change detection; full re-index weekly
- Cost control: Batch embeddings, cache aggressively, route simple queries to BM25 only
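
The semantic caching pattern above can be sketched with a small in-memory cache. This assumes the "embedding distance < 0.05" in the list means cosine distance, and it uses a linear scan for clarity; the class and method names are hypothetical, and a production cache would query an ANN index instead.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


class SemanticCache:
    """Return a cached answer when a new query embedding is close enough."""

    def __init__(self, max_distance=0.05):
        self.max_distance = max_distance
        self._entries = []  # list of (embedding, answer) pairs

    def get(self, embedding):
        best_answer, best_distance = None, self.max_distance
        for cached_embedding, answer in self._entries:
            distance = cosine_distance(embedding, cached_embedding)
            if distance < best_distance:
                best_answer, best_distance = answer, distance
        return best_answer

    def put(self, embedding, answer):
        self._entries.append((list(embedding), answer))


cache = SemanticCache(max_distance=0.05)
cache.put([1.0, 0.0], "Use the VPN profile from the IT portal.")
hit = cache.get([0.99, 0.05])   # nearly the same direction
miss = cache.get([0.0, 1.0])    # orthogonal query
print(hit, miss)
```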
Common Pitfalls
| Problem | Solution |
|---|---|
| Chunks break mid-sentence | Use boundary-aware chunking with sentence/paragraph overlap |
| Low retrieval precision | Add cross-encoder reranker; tune similarity threshold |
| High latency (> 2s) | Cache embeddings; use faster model; reduce top-K |
| Inconsistent quality | Implement RAGAS evaluation in CI; add quality scoring |
| Scalability bottleneck | Shard vector DB; implement auto-scaling; add caching layer |
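
For the "RAGAS evaluation in CI" fix, one possible shape of the gate is a threshold check over the evaluation output. The sketch below assumes the metrics arrive as a plain dict with hypothetical key names; the minimums are the targets from the Evaluation Metrics table, treated as strict lower bounds.

```python
THRESHOLDS = {
    "faithfulness": 0.90,
    "context_relevance": 0.80,
    "answer_relevance": 0.85,
}


def failed_metrics(results, thresholds=THRESHOLDS):
    """List (name, value, minimum) for every metric at or below its target."""
    failures = []
    for name, minimum in thresholds.items():
        value = results.get(name)
        # A missing metric counts as a failure so gaps cannot pass silently.
        if value is None or value <= minimum:
            failures.append((name, value, minimum))
    return failures


passing = {"faithfulness": 0.93, "context_relevance": 0.84,
           "answer_relevance": 0.88}
regressed = {"faithfulness": 0.93, "context_relevance": 0.78,
             "answer_relevance": 0.88}

ok_failures = failed_metrics(passing)
bad_failures = failed_metrics(regressed)
# In a CI job, a non-zero exit code would block the deploy.
exit_code = 1 if bad_failures else 0
print(ok_failures, bad_failures, exit_code)
```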
Scripts
Chunking Optimizer
Analyses corpus and recommends optimal chunking strategy with parameters.
Retrieval Evaluator
Runs evaluation suite (precision, recall, MRR, NDCG) against a test query set.
Pipeline Benchmarker
Measures end-to-end latency, throughput, and cost per query across configurations.
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Chunks contain incomplete sentences or broken code blocks | Fixed-size chunking ignoring semantic boundaries | Switch to sentence-based or semantic (heading-aware) chunking; enable boundary detection in |
| Retrieved context is relevant but answer is wrong | LLM hallucinating beyond retrieved chunks | Enable faithfulness evaluation via RAGAS; add source attribution guardrails; raise the confidence threshold so that low-confidence answers surface as "I don't know" responses |
| Precision@K below 0.50 despite relevant documents existing | Embedding model does not capture domain vocabulary | Fine-tune embedding model on domain data or switch to a domain-specific model; add cross-encoder reranking stage |
| Query latency exceeds 2 seconds | Large top-K, no caching, or unoptimized HNSW index | Reduce top-K, enable query-level and semantic caching, tune HNSW parameters (ef_search, m) |
| Recall drops after adding new documents | Stale embeddings or index fragmentation after incremental inserts | Trigger full re-index; verify new documents pass chunking pipeline; check embedding model version consistency |
| Hybrid retrieval returns duplicate chunks | Dense and sparse retrievers returning overlapping results without deduplication | Apply Reciprocal Rank Fusion (RRF) with deduplication before reranking; tune dense/sparse weight ratio |
| Evaluation metrics fluctuate across runs | Non-deterministic embedding batching or insufficient test query set | Fix random seeds, increase evaluation sample size, run evaluations on a frozen ground-truth set |
Success Criteria
- Faithfulness > 0.90 -- Generated answers are grounded in retrieved context as measured by the RAGAS faithfulness metric.
- Context Relevance > 0.80 -- At least 80% of retrieved chunks are relevant to the user query.
- Precision@5 > 0.70 -- On average, more than 70% of the top-5 results returned for a query are relevant documents.
- End-to-end latency < 500ms -- P95 query-to-response latency stays under 500 milliseconds for interactive workloads.
- Recall@10 > 0.85 -- The system retrieves at least 85% of relevant documents within the top 10 results.
- Chunk boundary quality > 0.80 -- At least 80% of chunks end on clean sentence or paragraph boundaries as reported by `chunking_optimizer.py`.
- Monthly cost within budget -- Total embedding, vector DB, and reranking costs stay within the budget ceiling defined in requirements.
Scope & Limitations
This skill covers:
- End-to-end RAG pipeline architecture design: chunking, embedding, vector storage, retrieval, reranking, and evaluation.
- Quantitative chunking analysis across four strategy families (fixed-size, sentence, paragraph, semantic).
- Retrieval quality evaluation using standard IR metrics (Precision@K, Recall@K, MRR, NDCG) with a built-in TF-IDF baseline.
- Automated pipeline design with component selection, cost projection, and Mermaid architecture diagrams.
This skill does NOT cover:
- LLM prompt engineering or generation-side optimization -- see `engineering/prompt-engineer-toolkit`.
- Database schema design for metadata stores alongside vector databases -- see `engineering/database-designer`.
- Production observability, alerting, and SLO dashboards for deployed pipelines -- see `engineering/observability-designer`.
- Agent orchestration or multi-step reasoning workflows that sit on top of RAG retrieval -- see `engineering/agent-workflow-designer`.
Integration Points
| Skill | Integration | Data Flow |
|---|---|---|
| | Optimize system prompts and few-shot examples fed alongside retrieved chunks | Pipeline design output --> prompt templates that reference chunk format and metadata |
| | Design relational metadata stores (tags, access control, source tracking) paired with the vector database | Vector DB recommendation --> metadata schema for hybrid storage |
| | Set up latency, throughput, and accuracy monitoring for the deployed RAG pipeline | Evaluation metrics and SLO targets --> dashboards and alerting rules |
| | Embed the RAG retrieval step inside multi-agent reasoning workflows | Retrieval config --> agent tool definition with top-K and threshold parameters |
| | Automate embedding re-indexing, evaluation regression tests, and deployment on document changes | Evaluation thresholds --> CI gate that blocks deploys when metrics regress |
| | Review the query and ingestion API surface exposed by the RAG service | Pipeline config --> OpenAPI spec review for search and ingest endpoints |
Tool Reference
chunking_optimizer.py
Purpose: Analyzes a document corpus and evaluates multiple chunking strategies (fixed-size, sentence-based, paragraph-based, semantic/heading-aware) to recommend the optimal approach with configuration parameters.
Usage:
```bash
python chunking_optimizer.py <directory> [options]
```
Flags / Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `<directory>` | positional, required | -- | Directory containing text/markdown documents to analyze |
| `--output` | string | None | Output file path for results in JSON format |
| | string | None | JSON configuration file to customize strategy parameters (fixed_sizes, overlaps, sentence_max_sizes, paragraph_max_sizes, semantic_max_sizes) |
| `--extensions` | string list | | File extensions to include when scanning the corpus |
| `--verbose` | flag | off | Print all strategy scores in addition to the recommendation |
Example:
```bash
python chunking_optimizer.py ./docs --output results.json --extensions .txt .md --verbose
```
Output Formats:
- Console -- Corpus summary, recommended strategy name, performance score, reasoning text, and two sample chunks. With `--verbose`, all strategy scores are listed.
- JSON (`--output`) -- Full results object containing `corpus_info`, `strategy_results` (per-strategy size statistics, boundary quality, semantic coherence, vocabulary statistics, performance score), `recommendation` (best strategy, all scores, reasoning), and `sample_chunks`.
retrieval_evaluator.py
Purpose: Evaluates retrieval system performance using a built-in TF-IDF baseline retriever and standard information retrieval metrics: Precision@K, Recall@K, MRR, and NDCG. Includes failure analysis and improvement recommendations.
Usage:
```bash
python retrieval_evaluator.py <queries> <corpus> <ground_truth> [options]
```
Flags / Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `<queries>` | positional, required | -- | JSON file containing queries (list of |
| `<corpus>` | positional, required | -- | Directory containing the document corpus |
| `<ground_truth>` | positional, required | -- | JSON file mapping query IDs to lists of relevant document IDs |
| `--output` | string | None | Output file path for results in JSON format |
| `--k-values` | int list | | K values used when computing Precision@K, Recall@K, and NDCG@K |
| | string list | | File extensions to include from the corpus directory |
| `--verbose` | flag | off | Print detailed per-metric values and failure analysis counts |
Example:
```bash
python retrieval_evaluator.py queries.json ./corpus ground_truth.json --output eval.json --k-values 1 5 10 --verbose
```
Output Formats:
- Console -- Evaluation summary table (Precision@1, Precision@5, Recall@5, MRR, NDCG@5) with performance assessment and numbered improvement recommendations. With `--verbose`, all aggregate metrics and failure analysis counts are printed.
- JSON (`--output`) -- Full results object containing `aggregate_metrics`, `query_results` (per-query metrics, retrieved docs, relevant docs), `failure_analysis` (poor precision/recall counts, zero-result counts, query length analysis, failure patterns), `evaluation_summary`, and `recommendations`.
rag_pipeline_designer.py
Purpose: Accepts a system requirements specification and generates a complete RAG pipeline design including component recommendations (chunking, embedding, vector DB, retrieval, reranking, evaluation), cost projections, a Mermaid architecture diagram, and deployment configuration templates.
Usage:
```bash
python rag_pipeline_designer.py <requirements> [options]
```
Flags / Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `<requirements>` | positional, required | -- | JSON file containing system requirements (document_types, document_count, avg_document_size, queries_per_day, query_patterns, latency_requirement, budget_monthly, accuracy_priority, cost_priority, maintenance_complexity) |
| `--output` | string | None | Output file path for the pipeline design in JSON format |
| `--verbose` | flag | off | Print full configuration templates for each component |
Example:
```bash
python rag_pipeline_designer.py requirements.json --output pipeline_design.json --verbose
```
Output Formats:
- Console -- Design summary with total monthly cost, per-component recommendations (name, rationale, cost), and a Mermaid architecture diagram. With `--verbose`, full JSON configuration templates for each component are printed.
- JSON (`--output`) -- Complete pipeline design object containing per-component `ComponentRecommendation` fields (name, type, config, rationale, pros, cons, cost_monthly), `total_cost`, `architecture_diagram` (Mermaid markup), and `config_templates` (per-component configs plus deployment/scaling/monitoring settings).