rag-architect

RAG Architect

The agent designs, implements, and optimizes production-grade Retrieval-Augmented Generation pipelines, covering the full lifecycle from document chunking through evaluation.

Workflow

Analyse corpus -- Profile the document collection: count, average length, format mix (PDF, HTML, Markdown), language(s), and domain. Validate that sample documents are accessible before proceeding.
Select chunking strategy -- Choose from the Chunking Strategy Matrix based on corpus characteristics. Set chunk size, overlap, and boundary rules. Run a test split on 100 sample documents.
Choose embedding model -- Select an embedding model from the Embedding Model table based on domain, latency budget, and cost constraints. Verify dimension compatibility with the target vector database.
Select vector database -- Pick a vector store from the Vector Database Comparison based on scale, query patterns, and operational requirements.
Design retrieval pipeline -- Configure retrieval strategy (dense, sparse, or hybrid). Add reranking if precision requirements exceed 0.85. Set the top-K parameter and similarity threshold.
Implement query transformations -- If query-document style mismatch exists, enable HyDE. If queries are ambiguous, enable multi-query generation. Validate each transformation improves retrieval metrics on a held-out set.
Configure guardrails -- Enable PII detection, toxicity filtering, hallucination detection, and source attribution. Set confidence score thresholds.
Evaluate end-to-end -- Run the RAGAS evaluation framework. Verify faithfulness > 0.90, context relevance > 0.80, answer relevance > 0.85. Iterate on weak components.
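
Step 5 pairs a top-K limit with a similarity threshold. The sketch below shows how the two interact in dense retrieval. It assumes chunk embeddings are already available as plain Python lists. The brute-force scan stands in for the vector database's ANN index, and `dense_search` is a hypothetical helper, not part of the bundled scripts.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_search(
    query_vec: Sequence[float],
    chunk_vecs: Dict[str, Sequence[float]],
    top_k: int = 10,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every chunk, drop those below `threshold`, return the best `top_k`."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    kept = [(cid, s) for cid, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Chunk "c" is orthogonal to the query, so the 0.5 threshold removes it.
chunks = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(dense_search([1.0, 0.0], chunks, top_k=2, threshold=0.5))
```

The threshold can return fewer than `top_k` results, or none. When nothing survives, the pipeline should take the "not enough information" path rather than fall back to weak matches.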

Chunking Strategy Matrix

Strategy	Best For	Chunk Size	Overlap	Pros	Cons
Fixed-size (token)	Uniform docs, consistent sizing	512-2048 tokens	10-20%	Predictable, simple	Breaks semantic units
Sentence-based	Narrative text, articles	3-8 sentences	1 sentence	Preserves sentence boundaries	Variable sizes
Paragraph-based	Structured docs, technical manuals	1-3 paragraphs	0-1 paragraph	Preserves topic coherence	Highly variable sizes
Semantic	Long-form, research papers	Dynamic	Topic-shift detection	Best coherence	Computationally expensive
Recursive	Mixed content types	Dynamic, multi-level	Per-level	Optimal utilization	Complex implementation
Document-aware	Multi-format collections	Format-specific	Section-level	Preserves metadata	Format-specific code required
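
A minimal sketch of the fixed-size row follows. It approximates tokens with whitespace-separated words; a production pipeline would count tokens with the embedding model's tokenizer. With `chunk_size=512` and `overlap=64`, consecutive chunks share 12.5% of their tokens, which falls inside the 10-20% range above. `fixed_size_chunks` is a hypothetical helper.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into windows of `chunk_size` tokens, where consecutive
    windows share `overlap` tokens. Tokens are whitespace-separated words
    (an approximation of real model tokens)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

This is the "predictable, simple" strategy from the table, and it shows the listed drawback. Windows cut at a token count regardless of where sentences end, which is the failure mode the sentence- and paragraph-based rows avoid.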

Embedding Model Comparison

Model	Dimensions	Speed	Quality	Cost	Best For
all-MiniLM-L6-v2	384	~14K sentences/s (GPU)	Good	Free (local)	Prototyping, low-latency
all-mpnet-base-v2	768	~2.8K sentences/s (GPU)	Better	Free (local)	Balanced production use
text-embedding-3-small	1536	API	High	$0.02/1M tokens	Cost-effective production
text-embedding-3-large	3072	API	Highest	$0.13/1M tokens	Maximum quality
Domain fine-tuned	Varies	Varies	Domain-best	Training cost	Specialized domains (legal, medical)

Vector Database Comparison

Database	Type	Scaling	Key Feature	Best For
Pinecone	Managed	Auto-scaling	Metadata filtering, hybrid search	Production, managed preference
Weaviate	Open source	Horizontal	GraphQL API, multi-modal	Complex data types
Qdrant	Open source	Distributed	High perf, low memory (Rust)	Performance-critical
Chroma	Embedded	Limited	Simple API, SQLite-backed	Prototyping, small-scale
pgvector	PostgreSQL ext	PostgreSQL scaling	ACID, SQL joins	Existing PostgreSQL infra

Retrieval Strategies

Strategy	When to Use	Implementation
Dense (vector similarity)	Default for semantic search	Cosine similarity with k-NN/ANN
Sparse (BM25/TF-IDF)	Exact keyword matching needed	Elasticsearch or inverted index
Hybrid (dense + sparse)	Best of both needed	Reciprocal Rank Fusion (RRF) with tuned weights
+ Reranking	Precision must exceed 0.85	Cross-encoder reranker after initial retrieval
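
Hybrid retrieval fuses the dense and sparse rankings. One common approach is weighted RRF, sketched below. The sketch assumes each retriever returns an ordered list of chunk IDs. `k=60` is the constant from the original RRF paper, and the 0.7/0.3 weights mirror the example configuration later in this document. Summing scores per chunk ID also deduplicates chunks that both retrievers return. `weighted_rrf` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_rrf(
    rankings: Dict[str, List[str]],
    weights: Dict[str, float],
    k: int = 60,
) -> List[Tuple[str, float]]:
    """Fuse ranked chunk-ID lists with weighted Reciprocal Rank Fusion.

    Each retriever adds weight / (k + rank) for every chunk it returns
    (rank is 1-based). Scores are summed per chunk ID, so a chunk found by
    both retrievers appears once in the fused list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        # dict.fromkeys drops repeats within one list while keeping order.
        for rank, chunk_id in enumerate(dict.fromkeys(ranked_ids), start=1):
            scores[chunk_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = weighted_rrf(
    {"dense": ["a", "b", "c"], "sparse": ["c", "a", "d"]},
    weights={"dense": 0.7, "sparse": 0.3},
)
print([chunk_id for chunk_id, _ in fused])  # ['a', 'c', 'b', 'd']
```

RRF uses only ranks, so the dense cosine scores and sparse BM25 scores never need to be put on a common scale. The fused list would then go to the cross-encoder reranker.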

Query Transformation Techniques

Technique	When to Use	How It Works
HyDE	Query/document style mismatch	LLM generates hypothetical answer; embed that instead of query
Multi-query	Ambiguous queries	Generate 3-5 query variations; retrieve for each; deduplicate
Step-back	Specific questions needing general context	Transform to broader query; retrieve general + specific
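
Multi-query retrieval ends with a merge step. The sketch below assumes an LLM has already generated the 3-5 query variations. It also assumes `retrieve` is a hypothetical callable that returns `(chunk_id, score)` pairs. Duplicates across variations keep their best score.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def multi_query_retrieve(
    variations: Iterable[str],
    retrieve: Callable[[str], List[Tuple[str, float]]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Retrieve once per query variation, deduplicate by chunk ID (keeping
    each chunk's highest score), and return the best `top_k`, best first."""
    best: Dict[str, float] = {}
    for query in variations:
        for chunk_id, score in retrieve(query):
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score
    merged = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return merged[:top_k]
```

Keeping the maximum raw score assumes scores from different variations share a scale. If they do not, rank-based fusion such as RRF is the safer merge.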

Context Window Optimization

Relevance ordering: Most relevant chunks first in the context window
Diversity: Deduplicate semantically similar chunks
Token budget: Fit within model context limit; reserve tokens for system prompt and answer
Hierarchical inclusion: Include section summary before detailed chunks when available
Compression: Summarize low-relevance chunks; extract key facts from verbose passages
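
The first three rules can be combined into a greedy packer, sketched below. Token counts are approximated by whitespace words, and "semantically similar" is approximated by word-set Jaccard overlap. Both stand in for a real tokenizer and embedding similarity. `pack_context` is a hypothetical helper, and hierarchical inclusion and compression are not shown.

```python
from typing import List, Set, Tuple


def pack_context(
    chunks: List[Tuple[str, float]],
    context_limit: int,
    reserved: int,
    max_similarity: float = 0.9,
) -> List[str]:
    """Greedily fill the context window with the most relevant distinct chunks.

    chunks: (text, relevance_score) pairs.
    reserved: tokens held back for the system prompt and the answer.
    """
    budget = context_limit - reserved
    selected: List[str] = []
    selected_words: List[Set[str]] = []
    used = 0
    # Relevance ordering: highest-scoring chunks are considered (and placed) first.
    for text, _score in sorted(chunks, key=lambda c: c[1], reverse=True):
        words = set(text.lower().split())
        # Diversity: skip near-duplicates of chunks already selected.
        if any(len(words & s) / max(len(words | s), 1) >= max_similarity for s in selected_words):
            continue
        cost = len(text.split())  # crude token estimate
        # Token budget: skip chunks that do not fit; a shorter one later might.
        if used + cost > budget:
            continue
        selected.append(text)
        selected_words.append(words)
        used += cost
    return selected
```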

Evaluation Metrics (RAGAS Framework)

Metric	Target	What It Measures
Faithfulness	> 0.90	Answers grounded in retrieved context
Context Relevance	> 0.80	Retrieved chunks relevant to query
Answer Relevance	> 0.85	Answer addresses the original question
Precision@K	> 0.70	Fraction of the top-K results that are relevant
Recall@K	> 0.80	Fraction of all relevant docs that appear in the top-K
MRR	> 0.75	Mean, across queries, of the reciprocal rank of the first relevant result
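
The three retrieval metrics above each take only a few lines to compute. The minimal sketch below assumes each query's results are a ranked list of document IDs and its ground truth is a set of IDs. Here Precision@K divides by K even when fewer than K results come back, which is one common convention.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant IDs that appear in the top-k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average over queries of 1/rank of the first relevant result (0 if none)."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)
```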

Guardrails

PII detection: Scan retrieved chunks and generated responses for PII; redact or block
Hallucination detection: Compare generated claims against source documents via NLI
Source attribution: Every factual claim must cite a retrieved chunk
Confidence scoring: Return confidence level; if below threshold, return "I don't have enough information"
Injection prevention: Sanitize user queries; reject prompt injection attempts
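
The minimal sketch below covers two of these guardrails: regex-based redaction, and a confidence gate that returns the fallback message. The patterns are illustrative only and miss many PII formats, such as names, addresses, and national IDs; a dedicated PII detector would replace them. The 0.7 threshold and the helper names are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs a dedicated detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

FALLBACK = "I don't have enough information"


def redact_pii(text: str) -> str:
    """Replace each pattern match with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_answer(answer: str, confidence: float, threshold: float = 0.7) -> str:
    """Apply the confidence gate first, then redact PII from whatever is returned."""
    if confidence < threshold:
        return FALLBACK
    return redact_pii(answer)


print(guarded_answer("Contact jane.doe@example.com or +1 (555) 123-4567.", confidence=0.92))
```

The same `redact_pii` pass can also run on retrieved chunks before they enter the prompt, so PII never reaches the model at all.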

Example: Internal Knowledge Base RAG Pipeline

```yaml
corpus:
  documents: 12,000 Confluence pages + 3,000 PDFs
  avg_length: 2,400 tokens
  languages: [English]
  domain: internal engineering docs

pipeline:
  chunking:
    strategy: recursive
    max_tokens: 512
    overlap: 50 tokens
    boundary: paragraph
  embedding:
    model: text-embedding-3-small
    dimensions: 1536
    batch_size: 100
  vector_db:
    engine: pgvector
    index: HNSW (ef_construction=128, m=16)
    reason: "Existing PostgreSQL infra; ACID compliance for audit"
  retrieval:
    strategy: hybrid
    dense_weight: 0.7
    sparse_weight: 0.3
    top_k: 10
    reranker: cross-encoder/ms-marco-MiniLM-L-12-v2
    final_k: 5

evaluation_results:
  faithfulness: 0.93
  context_relevance: 0.84
  answer_relevance: 0.88
  precision_at_5: 0.76
  recall_at_10: 0.85
```

Production Patterns

Caching: Query-level (exact match), semantic (similar queries via embedding distance < 0.05), chunk-level (embedding cache)
Streaming: Start streaming generated tokens to the client as soon as retrieval completes; show sources after generation finishes
Fallbacks: If primary vector DB is unavailable, serve from read-replica; if retrieval returns no results above threshold, say so explicitly
Document refresh: Incremental re-embedding on change detection; full re-index weekly
Cost control: Batch embeddings, cache aggressively, route simple queries to BM25 only
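
The query-level and semantic caches can be sketched together. An exact-match dictionary is checked first, followed by a scan over stored query embeddings. This sketch assumes "embedding distance" means cosine distance (1 minus cosine similarity). `SemanticCache` is a hypothetical class, and a vector index would replace the linear scan at scale.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


class SemanticCache:
    """Query-level (exact match) plus semantic (nearby embedding) answer cache."""

    def __init__(self, max_distance: float = 0.05) -> None:
        self.max_distance = max_distance
        self.exact: Dict[str, str] = {}
        self.entries: List[Tuple[Sequence[float], str]] = []

    @staticmethod
    def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
        """1 - cosine similarity; returns 1.0 (no similarity) for an all-zero vector."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm if norm else 1.0

    def get(self, query: str, embedding: Sequence[float]) -> Optional[str]:
        # Query-level hit: identical query text.
        if query in self.exact:
            return self.exact[query]
        # Semantic hit: the closest stored query strictly within max_distance.
        best_answer, best_distance = None, self.max_distance
        for cached_vec, answer in self.entries:
            distance = self.cosine_distance(embedding, cached_vec)
            if distance < best_distance:
                best_answer, best_distance = answer, distance
        return best_answer

    def put(self, query: str, embedding: Sequence[float], answer: str) -> None:
        self.exact[query] = answer
        self.entries.append((embedding, answer))
```

A semantic hit returns the answer to a slightly different question, so the 0.05 bound should be validated on real query pairs before it is trusted.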

Common Pitfalls

Problem	Solution
Chunks break mid-sentence	Use boundary-aware chunking with sentence/paragraph overlap
Low retrieval precision	Add cross-encoder reranker; tune similarity threshold
High latency (> 2s)	Cache embeddings; use faster model; reduce top-K
Inconsistent quality	Implement RAGAS evaluation in CI; add quality scoring
Scalability bottleneck	Shard vector DB; implement auto-scaling; add caching layer

Scripts

Chunking Optimizer

Analyses corpus and recommends optimal chunking strategy with parameters.

Retrieval Evaluator

Runs evaluation suite (precision, recall, MRR, NDCG) against a test query set.

Pipeline Benchmarker

Measures end-to-end latency, throughput, and cost per query across configurations.

Troubleshooting

Problem	Cause	Solution
Chunks contain incomplete sentences or broken code blocks	Fixed-size chunking ignoring semantic boundaries	Switch to sentence-based or semantic (heading-aware) chunking; enable boundary detection in `chunking_optimizer.py`
Retrieved context is relevant but answer is wrong	LLM hallucinating beyond retrieved chunks	Enable faithfulness evaluation via RAGAS; add source attribution guardrails; raise the confidence threshold so more low-confidence answers fall back to "I don't know" responses
Precision@K below 0.50 despite relevant documents existing	Embedding model does not capture domain vocabulary	Fine-tune embedding model on domain data or switch to a domain-specific model; add cross-encoder reranking stage
Query latency exceeds 2 seconds	Large top-K, no caching, or unoptimized HNSW index	Reduce top-K, enable query-level and semantic caching, tune HNSW parameters (ef_search, m)
Recall drops after adding new documents	Stale embeddings or index fragmentation after incremental inserts	Trigger full re-index; verify new documents pass chunking pipeline; check embedding model version consistency
Hybrid retrieval returns duplicate chunks	Dense and sparse retrievers returning overlapping results without deduplication	Apply Reciprocal Rank Fusion (RRF) with deduplication before reranking; tune dense/sparse weight ratio
Evaluation metrics fluctuate across runs	Non-deterministic embedding batching or insufficient test query set	Fix random seeds, increase evaluation sample size, run evaluations on a frozen ground-truth set

Success Criteria

Faithfulness > 0.90 -- Generated answers are grounded in retrieved context as measured by the RAGAS faithfulness metric.
Context Relevance > 0.80 -- At least 80% of retrieved chunks are relevant to the user query.
Precision@5 > 0.70 -- On average across test queries, more than 70% of the top-5 results are relevant.
End-to-end latency < 500ms -- P95 query-to-response latency stays under 500 milliseconds for interactive workloads.
Recall@10 > 0.85 -- The system retrieves at least 85% of relevant documents within the top 10 results.
Chunk boundary quality > 0.80 -- At least 80% of chunks end on clean sentence or paragraph boundaries as reported by `chunking_optimizer.py`.
Monthly cost within budget -- Total embedding, vector DB, and reranking costs stay within the budget ceiling defined in requirements.
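
The measurable criteria above can serve as a CI gate. The minimal sketch below assumes the evaluation step emits a flat dictionary of metrics; the key names are hypothetical. Monthly cost is left out because the budget ceiling is project-specific. Every bound is strict, so a metric sitting exactly on its threshold fails the gate. The example pipeline's `recall_at_10: 0.85` is one such case.

```python
from typing import Dict, List, Tuple

# (comparison, bound) for each measurable criterion above; key names are hypothetical.
SUCCESS_CRITERIA: Dict[str, Tuple[str, float]] = {
    "faithfulness": (">", 0.90),
    "context_relevance": (">", 0.80),
    "precision_at_5": (">", 0.70),
    "recall_at_10": (">", 0.85),
    "chunk_boundary_quality": (">", 0.80),
    "p95_latency_ms": ("<", 500),
}


def failed_criteria(metrics: Dict[str, float]) -> List[str]:
    """Return one message per criterion that is missing or not met."""
    failures: List[str] = []
    for name, (op, bound) in SUCCESS_CRITERIA.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif (op == ">" and not value > bound) or (op == "<" and not value < bound):
            failures.append(f"{name}: {value} (needs {op} {bound})")
    return failures
```

In a CI job, the step would exit non-zero whenever `failed_criteria(metrics)` returns a non-empty list, blocking the deploy.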

Scope & Limitations

This skill covers:

End-to-end RAG pipeline architecture design: chunking, embedding, vector storage, retrieval, reranking, and evaluation.
Quantitative chunking analysis across four strategy families (fixed-size, sentence, paragraph, semantic).
Retrieval quality evaluation using standard IR metrics (Precision@K, Recall@K, MRR, NDCG) with a built-in TF-IDF baseline.
Automated pipeline design with component selection, cost projection, and Mermaid architecture diagrams.

This skill does NOT cover:

LLM prompt engineering or generation-side optimization -- see `engineering/prompt-engineer-toolkit`.
Database schema design for metadata stores alongside vector databases -- see `engineering/database-designer`.
Production observability, alerting, and SLO dashboards for deployed pipelines -- see `engineering/observability-designer`.
Agent orchestration or multi-step reasoning workflows that sit on top of RAG retrieval -- see `engineering/agent-workflow-designer`.

Integration Points

Skill	Integration	Data Flow
`engineering/prompt-engineer-toolkit`	Optimize system prompts and few-shot examples fed alongside retrieved chunks	Pipeline design output --> prompt templates that reference chunk format and metadata
`engineering/database-designer`	Design relational metadata stores (tags, access control, source tracking) paired with the vector database	Vector DB recommendation --> metadata schema for hybrid storage
`engineering/observability-designer`	Set up latency, throughput, and accuracy monitoring for the deployed RAG pipeline	Evaluation metrics and SLO targets --> dashboards and alerting rules
`engineering/agent-workflow-designer`	Embed the RAG retrieval step inside multi-agent reasoning workflows	Retrieval config --> agent tool definition with top-K and threshold parameters
`engineering/ci-cd-pipeline-builder`	Automate embedding re-indexing, evaluation regression tests, and deployment on document changes	Evaluation thresholds --> CI gate that blocks deploys when metrics regress
`engineering/api-design-reviewer`	Review the query and ingestion API surface exposed by the RAG service	Pipeline config --> OpenAPI spec review for search and ingest endpoints

Tool Reference

chunking_optimizer.py

Purpose: Analyzes a document corpus and evaluates multiple chunking strategies (fixed-size, sentence-based, paragraph-based, semantic/heading-aware) to recommend the optimal approach with configuration parameters.

Usage:

```bash
python chunking_optimizer.py <directory> [options]
```

Flags / Parameters:

Flag	Type	Default	Description
`directory`	positional, required	--	Directory containing text/markdown documents to analyze
`--output` , `-o`	string	None	Output file path for results in JSON format
`--config` , `-c`	string	None	JSON configuration file to customize strategy parameters (fixed_sizes, overlaps, sentence_max_sizes, paragraph_max_sizes, semantic_max_sizes)
`--extensions`	string list	`.txt .md .markdown`	File extensions to include when scanning the corpus
`--verbose` , `-v`	flag	off	Print all strategy scores in addition to the recommendation

Example:

```bash
python chunking_optimizer.py ./docs --output results.json --extensions .txt .md --verbose
```

Output Formats:

Console -- Corpus summary, recommended strategy name, performance score, reasoning text, and two sample chunks. With `--verbose`, all strategy scores are listed.
JSON (`--output`) -- Full results object containing `corpus_info`, `strategy_results` (per-strategy size statistics, boundary quality, semantic coherence, vocabulary statistics, performance score), `recommendation` (best strategy, all scores, reasoning), and `sample_chunks`.
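
Boundary quality is one of the per-strategy statistics above, and it drives the 0.80 success criterion. The script's exact definition is not documented here, so the sketch below is a hypothetical stand-in. It counts a chunk as clean if it ends with sentence-final punctuation (optionally followed by closing quotes or brackets) or with a blank line.

```python
import re
from typing import List

# Assumed definition of a clean ending: ., ! or ? plus optional closing
# quotes/brackets and trailing whitespace; a trailing blank line also counts.
_SENTENCE_END = re.compile(r"[.!?][\"')\]]*\s*$")


def boundary_quality(chunks: List[str]) -> float:
    """Fraction of chunks that end on a sentence or paragraph boundary."""
    if not chunks:
        return 0.0
    clean = sum(1 for c in chunks if c.endswith("\n\n") or _SENTENCE_END.search(c))
    return clean / len(chunks)
```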

retrieval_evaluator.py

Purpose: Evaluates retrieval system performance using a built-in TF-IDF baseline retriever and standard information retrieval metrics: Precision@K, Recall@K, MRR, and NDCG. Includes failure analysis and improvement recommendations.
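
NDCG gives more credit to relevant documents that appear higher in the ranking. A common binary-relevance formulation with a log2 discount is sketched below. Whether `retrieval_evaluator.py` uses exactly this gain and discount is an assumption.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the actual ranking divided by the DCG
    of an ideal ranking that places all relevant documents first."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```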

Usage:

```bash
python retrieval_evaluator.py <queries> <corpus> <ground_truth> [options]
```

Flags / Parameters:

Flag	Type	Default	Description
`queries`	positional, required	--	JSON file containing queries (list of `{"id": ..., "query": ...}` objects, or `{"queries": [...]}` )
`corpus`	positional, required	--	Directory containing the document corpus
`ground_truth`	positional, required	--	JSON file mapping query IDs to lists of relevant document IDs
`--output` , `-o`	string	None	Output file path for results in JSON format
`--k-values`	int list	`1 3 5 10`	K values used when computing Precision@K, Recall@K, and NDCG@K
`--extensions`	string list	`.txt .md .markdown`	File extensions to include from the corpus directory
`--verbose` , `-v`	flag	off	Print detailed per-metric values and failure analysis counts
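
The two JSON inputs can be produced as shown below. The queries use the list-of-objects form from the table. The query texts and document IDs are hypothetical. How the evaluator maps corpus files to document IDs (for example, by filename) should be checked against the script.

```python
import json
from pathlib import Path

# Hypothetical test queries in the list-of-objects form.
queries = [
    {"id": "q1", "query": "How do I rotate the staging database credentials?"},
    {"id": "q2", "query": "What is the on-call escalation policy?"},
]

# Query ID -> relevant document IDs (assumed to match corpus document IDs).
ground_truth = {
    "q1": ["db-credentials-runbook"],
    "q2": ["oncall-policy", "incident-response"],
}

Path("queries.json").write_text(json.dumps(queries, indent=2))
Path("ground_truth.json").write_text(json.dumps(ground_truth, indent=2))
```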

Example:

```bash
python retrieval_evaluator.py queries.json ./corpus ground_truth.json --output eval.json --k-values 1 5 10 --verbose
```

Output Formats:

Console -- Evaluation summary table (Precision@1, Precision@5, Recall@5, MRR, NDCG@5) with performance assessment and numbered improvement recommendations. With `--verbose`, all aggregate metrics and failure analysis counts are printed.
JSON (`--output`) -- Full results object containing `aggregate_metrics`, `query_results` (per-query metrics, retrieved docs, relevant docs), `failure_analysis` (poor precision/recall counts, zero-result counts, query length analysis, failure patterns), `evaluation_summary`, and `recommendations`.

rag_pipeline_designer.py

Purpose: Accepts a system requirements specification and generates a complete RAG pipeline design including component recommendations (chunking, embedding, vector DB, retrieval, reranking, evaluation), cost projections, a Mermaid architecture diagram, and deployment configuration templates.

Usage:

```bash
python rag_pipeline_designer.py <requirements> [options]
```

Flags / Parameters:

Flag	Type	Default	Description
`requirements`	positional, required	--	JSON file containing system requirements (document_types, document_count, avg_document_size, queries_per_day, query_patterns, latency_requirement, budget_monthly, accuracy_priority, cost_priority, maintenance_complexity)
`--output` , `-o`	string	None	Output file path for the pipeline design in JSON format
`--verbose` , `-v`	flag	off	Print full configuration templates for each component
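
Below is a skeleton `requirements.json` containing the ten fields listed above. Only the field names come from this document. Every value, type, and unit below is an assumption to verify against `rag_pipeline_designer.py`, for example whether `avg_document_size` is measured in tokens or whether priorities are 0-1 floats. The document counts echo the internal knowledge base example.

```python
import json
from pathlib import Path

# Field names from the table above; all values, types, and units are assumptions.
requirements = {
    "document_types": ["confluence", "pdf"],
    "document_count": 15000,
    "avg_document_size": 2400,        # assumed: tokens
    "queries_per_day": 5000,          # hypothetical load
    "query_patterns": ["factual", "how-to"],
    "latency_requirement": 500,       # assumed: milliseconds
    "budget_monthly": 1000,           # assumed: USD
    "accuracy_priority": 0.8,         # assumed: 0-1 weight
    "cost_priority": 0.5,             # assumed: 0-1 weight
    "maintenance_complexity": "medium",
}

Path("requirements.json").write_text(json.dumps(requirements, indent=2))
```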

Example:

```bash
python rag_pipeline_designer.py requirements.json --output pipeline_design.json --verbose
```

Output Formats:

Console -- Design summary with total monthly cost, per-component recommendations (name, rationale, cost), and a Mermaid architecture diagram. With `--verbose`, full JSON configuration templates for each component are printed.
JSON (`--output`) -- Complete pipeline design object containing per-component `ComponentRecommendation` fields (name, type, config, rationale, pros, cons, cost_monthly), `total_cost`, `architecture_diagram` (Mermaid markup), and `config_templates` (per-component configs plus deployment/scaling/monitoring settings).
