rag-architect

RAG Architect

The agent designs, implements, and optimizes production-grade Retrieval-Augmented Generation pipelines, covering the full lifecycle from document chunking through evaluation.

Workflow

  1. Analyse corpus -- Profile the document collection: count, average length, format mix (PDF, HTML, Markdown), language(s), and domain. Validate that sample documents are accessible before proceeding.
  2. Select chunking strategy -- Choose from the Chunking Strategy Matrix based on corpus characteristics. Set chunk size, overlap, and boundary rules. Run a test split on 100 sample documents.
  3. Choose embedding model -- Select an embedding model from the Embedding Model table based on domain, latency budget, and cost constraints. Verify dimension compatibility with the target vector database.
  4. Select vector database -- Pick a vector store from the Vector Database Comparison based on scale, query patterns, and operational requirements.
  5. Design retrieval pipeline -- Configure retrieval strategy (dense, sparse, or hybrid). Add reranking if precision requirements exceed 0.85. Set the top-K parameter and similarity threshold.
  6. Implement query transformations -- If query-document style mismatch exists, enable HyDE. If queries are ambiguous, enable multi-query generation. Validate each transformation improves retrieval metrics on a held-out set.
  7. Configure guardrails -- Enable PII detection, toxicity filtering, hallucination detection, and source attribution. Set confidence score thresholds.
  8. Evaluate end-to-end -- Run the RAGAS evaluation framework. Verify faithfulness > 0.90, context relevance > 0.80, answer relevance > 0.85. Iterate on weak components.

Chunking Strategy Matrix

| Strategy | Best For | Chunk Size | Overlap | Pros | Cons |
|---|---|---|---|---|---|
| Fixed-size (token) | Uniform docs, consistent sizing | 512-2048 tokens | 10-20% | Predictable, simple | Breaks semantic units |
| Sentence-based | Narrative text, articles | 3-8 sentences | 1 sentence | Preserves language boundaries | Variable sizes |
| Paragraph-based | Structured docs, technical manuals | 1-3 paragraphs | 0-1 paragraph | Preserves topic coherence | Highly variable sizes |
| Semantic | Long-form, research papers | Dynamic | Topic-shift detection | Best coherence | Computationally expensive |
| Recursive | Mixed content types | Dynamic, multi-level | Per-level | Optimal utilization | Complex implementation |
| Document-aware | Multi-format collections | Format-specific | Section-level | Preserves metadata | Format-specific code required |
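
To make the fixed-size row concrete, here is a minimal sketch of sliding token windows with overlap. It assumes whitespace splitting as a stand-in for a real tokenizer; the function name `chunk_fixed` and its defaults are hypothetical and are not taken from chunking_optimizer.py.

```python
from typing import List


def chunk_fixed(tokens: List[str], chunk_size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


# Assumption: whitespace tokens approximate model tokens well enough for a dry run.
sample_tokens = "Retrieval quality depends heavily on how documents are split".split()
sample_chunks = chunk_fixed(sample_tokens, chunk_size=4, overlap=1)
```

With chunk_size=512 and overlap=64 the overlap is 12.5%, which sits inside the 10-20% range listed in the matrix.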

Embedding Model Comparison

| Model | Dimensions | Speed | Quality | Cost | Best For |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | ~14K tok/s | Good | Free (local) | Prototyping, low-latency |
| all-mpnet-base-v2 | 768 | ~2.8K tok/s | Better | Free (local) | Balanced production use |
| text-embedding-3-small | 1536 | API | High | $0.02/1M tokens | Cost-effective production |
| text-embedding-3-large | 3072 | API | Highest | $0.13/1M tokens | Maximum quality |
| Domain fine-tuned | Varies | Varies | Domain-best | Training cost | Specialized domains (legal, medical) |
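
The per-token prices in the table translate into a one-off indexing cost with simple arithmetic. The sketch below is an estimate only: the function name is hypothetical, the prices are copied from the table, and it ignores the extra tokens introduced by chunk overlap and by later re-indexing.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICE_PER_MILLION = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def embedding_cost(doc_count: int, avg_tokens: int, model: str) -> float:
    """Estimate the cost in USD of embedding a corpus once."""
    total_tokens = doc_count * avg_tokens
    return total_tokens / 1_000_000 * PRICE_PER_MILLION[model]


# Hypothetical corpus: 15,000 documents averaging 2,400 tokens (36M tokens in total).
small_cost = embedding_cost(15_000, 2_400, "text-embedding-3-small")
large_cost = embedding_cost(15_000, 2_400, "text-embedding-3-large")
```

For this corpus the estimate comes to about $0.72 with text-embedding-3-small and about $4.68 with text-embedding-3-large, so for a corpus of this size the choice between them is driven more by quality and storage (1536 vs 3072 dimensions) than by embedding price.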

Vector Database Comparison

| Database | Type | Scaling | Key Feature | Best For |
|---|---|---|---|---|
| Pinecone | Managed | Auto-scaling | Metadata filtering, hybrid search | Production, managed preference |
| Weaviate | Open source | Horizontal | GraphQL API, multi-modal | Complex data types |
| Qdrant | Open source | Distributed | High perf, low memory (Rust) | Performance-critical |
| Chroma | Embedded | Limited | Simple API, SQLite-backed | Prototyping, small-scale |
| pgvector | PostgreSQL ext | PostgreSQL scaling | ACID, SQL joins | Existing PostgreSQL infra |

Retrieval Strategies

| Strategy | When to Use | Implementation |
|---|---|---|
| Dense (vector similarity) | Default for semantic search | Cosine similarity with k-NN/ANN |
| Sparse (BM25/TF-IDF) | Exact keyword matching needed | Elasticsearch or inverted index |
| Hybrid (dense + sparse) | Best of both needed | Reciprocal Rank Fusion (RRF) with tuned weights |
| + Reranking | Precision must exceed 0.85 | Cross-encoder reranker after initial retrieval |
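
The hybrid row relies on Reciprocal Rank Fusion, which can be sketched in a few lines. This is a weighted variant written under stated assumptions: each retriever returns an ordered list of document IDs, `k=60` is the smoothing constant commonly used for RRF, and the function name and weight handling are illustrative rather than part of any script in this skill.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
    k: int = 60,
) -> List[str]:
    """Fuse ranked lists: each list adds weight / (k + rank) to a document's score."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by fused score (descending); break ties by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical dense retriever output
sparse_hits = ["b", "d", "a"]  # hypothetical BM25 output
fused = rrf_fuse([dense_hits, sparse_hits], weights=[0.7, 0.3])
```

Because scores are accumulated per document ID, a chunk returned by both retrievers appears once in the fused list, so deduplication comes for free.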

Query Transformation Techniques

| Technique | When to Use | How It Works |
|---|---|---|
| HyDE | Query/document style mismatch | LLM generates hypothetical answer; embed that instead of query |
| Multi-query | Ambiguous queries | Generate 3-5 query variations; retrieve for each; deduplicate |
| Step-back | Specific questions needing general context | Transform to broader query; retrieve general + specific |
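
The retrieve-and-deduplicate half of the multi-query technique might look like the sketch below. It assumes the query variations have already been generated by an LLM and that `retrieve` is a caller-supplied function returning (doc_id, similarity) pairs with higher scores meaning closer matches; both names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Hit = Tuple[str, float]  # (doc_id, similarity score)


def multi_query_retrieve(
    variants: Iterable[str],
    retrieve: Callable[[str], List[Hit]],
    top_k: int = 10,
) -> List[Hit]:
    """Run every query variant and keep each document once, at its best score."""
    best: Dict[str, float] = {}
    for query in variants:
        for doc_id, score in retrieve(query):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Stand-in retriever backed by a fixed dictionary, for illustration only.
fake_index = {
    "reset password": [("d1", 0.9), ("d2", 0.6)],
    "forgot login": [("d2", 0.8), ("d3", 0.5)],
}
merged = multi_query_retrieve(fake_index.keys(), lambda q: fake_index[q])
```

Keeping the maximum score per document is one simple merge rule; fusing the per-variant rankings with RRF is a reasonable alternative when scores are not comparable across queries.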

Context Window Optimization

  • Relevance ordering: Most relevant chunks first in the context window
  • Diversity: Deduplicate semantically similar chunks
  • Token budget: Fit within model context limit; reserve tokens for system prompt and answer
  • Hierarchical inclusion: Include section summary before detailed chunks when available
  • Compression: Summarize low-relevance chunks; extract key facts from verbose passages
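
The relevance-ordering and token-budget points can be combined into a small greedy packer. This is a sketch under assumptions: each chunk is a dict with hypothetical `id`, `score`, and `tokens` keys, and `reserved_tokens` covers the system prompt plus the expected answer length.

```python
from typing import Dict, List, Tuple

Chunk = Dict[str, object]  # expects keys "id", "score", "tokens"


def pack_context(chunks: List[Chunk], context_limit: int, reserved_tokens: int) -> Tuple[List[Chunk], int]:
    """Pick chunks in descending relevance until the remaining token budget is used up."""
    budget = context_limit - reserved_tokens
    selected: List[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        size = int(chunk["tokens"])
        if used + size <= budget:  # skip chunks that would overflow the budget
            selected.append(chunk)
            used += size
    return selected, used


candidates = [
    {"id": "a", "score": 0.9, "tokens": 300},
    {"id": "b", "score": 0.8, "tokens": 500},
    {"id": "c", "score": 0.7, "tokens": 200},
]
picked, tokens_used = pack_context(candidates, context_limit=1000, reserved_tokens=400)
```

In this run the budget is 600 tokens, so chunk "b" is skipped and the packer falls through to the smaller chunk "c". A stricter variant could stop at the first chunk that does not fit instead of skipping it.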

Evaluation Metrics (RAGAS Framework)

| Metric | Target | What It Measures |
|---|---|---|
| Faithfulness | > 0.90 | Answers grounded in retrieved context |
| Context Relevance | > 0.80 | Retrieved chunks relevant to query |
| Answer Relevance | > 0.85 | Answer addresses the original question |
| Precision@K | > 0.70 | % of top-K results that are relevant |
| Recall@K | > 0.80 | % of relevant docs found in top-K |
| MRR | > 0.75 | Reciprocal rank of first relevant result |
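
The three retrieval-side metrics in the table have short standard definitions. The sketch below implements them directly for illustration; the function names are hypothetical and this is not the code used by retrieval_evaluator.py.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of 1/rank of the first relevant result; a query with no hit contributes 0."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


# Two hypothetical queries with their retrieved lists and ground-truth sets.
run_1 = (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d9"})
run_2 = (["x", "d2"], {"d2"})
```

For `run_1`, Precision@5 is 2/5 and Recall@5 is 2/3; across both runs the first relevant results sit at ranks 1 and 2, giving an MRR of 0.75.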

Guardrails

  • PII detection: Scan retrieved chunks and generated responses for PII; redact or block
  • Hallucination detection: Compare generated claims against source documents via NLI
  • Source attribution: Every factual claim must cite a retrieved chunk
  • Confidence scoring: Return confidence level; if below threshold, return "I don't have enough information"
  • Injection prevention: Sanitize user queries; reject prompt injection attempts
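
Source attribution and confidence scoring can be combined into a single output gate. The sketch below is illustrative only: it assumes answers cite chunks with bracketed numbers such as [1] placed before the sentence-ending punctuation, that a confidence score is supplied by the caller, and that 0.7 is a placeholder threshold. None of these conventions come from the source.

```python
import re
from typing import List

FALLBACK = "I don't have enough information"
CITATION = re.compile(r"\[(\d+)\]")  # assumed citation format: [chunk number]


def uncited_sentences(answer: str, num_chunks: int) -> List[str]:
    """Return sentences with no citation, or with a citation to a chunk that does not exist."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    problems = []
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited or any(n < 1 or n > num_chunks for n in cited):
            problems.append(sentence)
    return problems


def gate_answer(answer: str, confidence: float, num_chunks: int, threshold: float = 0.7) -> str:
    """Return the answer only if it is confident enough and every sentence is attributed."""
    if confidence < threshold or uncited_sentences(answer, num_chunks):
        return FALLBACK
    return answer


good = "The index uses HNSW [1]. Chunk overlap is 50 tokens [2]."
bad = "The index uses HNSW [1]. It is very fast."
```

Here `gate_answer(good, 0.9, 2)` passes the answer through, while `bad` is rejected because its second sentence carries no citation. A production guardrail would usually verify the cited chunk actually supports the claim, for example with the NLI check listed above.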

Example: Internal Knowledge Base RAG Pipeline

yaml
corpus:
  documents: 12,000 Confluence pages + 3,000 PDFs
  avg_length: 2,400 tokens
  languages: [English]
  domain: internal engineering docs

pipeline:
  chunking:
    strategy: recursive
    max_tokens: 512
    overlap: 50 tokens
    boundary: paragraph
  embedding:
    model: text-embedding-3-small
    dimensions: 1536
    batch_size: 100
  vector_db:
    engine: pgvector
    index: HNSW (ef_construction=128, m=16)
    reason: "Existing PostgreSQL infra; ACID compliance for audit"
  retrieval:
    strategy: hybrid
    dense_weight: 0.7
    sparse_weight: 0.3
    top_k: 10
    reranker: cross-encoder/ms-marco-MiniLM-L-12-v2
    final_k: 5

evaluation_results:
  faithfulness: 0.93
  context_relevance: 0.84
  answer_relevance: 0.88
  precision_at_5: 0.76
  recall_at_10: 0.85
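
The evaluation_results above can be compared against the targets from the Evaluation Metrics table with a few lines of code, for example as a CI check. This is a sketch: the function name and the dictionary layout are assumptions, and the targets are treated as strict lower bounds because the table states them with ">".

```python
from typing import Dict, Optional, Tuple

# Targets taken from the Evaluation Metrics table.
TARGETS = {"faithfulness": 0.90, "context_relevance": 0.80, "answer_relevance": 0.85}


def failing_metrics(
    results: Dict[str, float],
    targets: Dict[str, float] = TARGETS,
) -> Dict[str, Tuple[Optional[float], float]]:
    """Return {metric: (observed, target)} for metrics that are missing or not above target."""
    failures = {}
    for name, target in targets.items():
        observed = results.get(name)
        if observed is None or observed <= target:
            failures[name] = (observed, target)
    return failures


# Values copied from the example's evaluation_results.
example_results = {"faithfulness": 0.93, "context_relevance": 0.84, "answer_relevance": 0.88}
failures = failing_metrics(example_results)
```

For the example pipeline `failures` is empty, so the gate would pass. A CI job could exit non-zero whenever the returned dictionary is non-empty.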

Production Patterns

  • Caching: Query-level (exact match), semantic (similar queries via embedding distance < 0.05), chunk-level (embedding cache)
  • Streaming: Start streaming generation tokens as soon as retrieval completes; show sources after generation
  • Fallbacks: If primary vector DB is unavailable, serve from read-replica; if retrieval returns no results above threshold, say so explicitly
  • Document refresh: Incremental re-embedding on change detection; full re-index weekly
  • Cost control: Batch embeddings, cache aggressively, route simple queries to BM25 only
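
The semantic cache in the first bullet can be sketched as a lookup over stored query embeddings. The source gives the 0.05 threshold but not the distance metric; cosine distance is assumed here, and the linear scan and class name are for illustration only (a real deployment would more likely query the vector index itself).

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class SemanticCache:
    """Return a cached answer when a stored query embedding is close enough."""

    def __init__(self, max_distance: float = 0.05) -> None:
        self.max_distance = max_distance
        self.entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], answer: str) -> None:
        self.entries.append((list(embedding), answer))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        best_answer, best_distance = None, self.max_distance
        for stored, answer in self.entries:
            distance = cosine_distance(embedding, stored)
            if distance < best_distance:  # strictly closer than the threshold or prior best
                best_answer, best_distance = answer, distance
        return best_answer


cache = SemanticCache()
cache.put([1.0, 0.0], "Use the VPN portal.")  # toy 2-d embedding for illustration
```

A lookup with a nearby vector such as [0.99, 0.05] hits the cache, while an orthogonal vector such as [0.0, 1.0] misses and falls through to the full pipeline.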

Common Pitfalls

| Problem | Solution |
|---|---|
| Chunks break mid-sentence | Use boundary-aware chunking with sentence/paragraph overlap |
| Low retrieval precision | Add cross-encoder reranker; tune similarity threshold |
| High latency (> 2s) | Cache embeddings; use faster model; reduce top-K |
| Inconsistent quality | Implement RAGAS evaluation in CI; add quality scoring |
| Scalability bottleneck | Shard vector DB; implement auto-scaling; add caching layer |

Scripts

Chunking Optimizer

Analyses corpus and recommends optimal chunking strategy with parameters.

Retrieval Evaluator

Runs evaluation suite (precision, recall, MRR, NDCG) against a test query set.

Pipeline Benchmarker

Measures end-to-end latency, throughput, and cost per query across configurations.

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| Chunks contain incomplete sentences or broken code blocks | Fixed-size chunking ignoring semantic boundaries | Switch to sentence-based or semantic (heading-aware) chunking; enable boundary detection in chunking_optimizer.py |
| Retrieved context is relevant but answer is wrong | LLM hallucinating beyond retrieved chunks | Enable faithfulness evaluation via RAGAS; add source attribution guardrails; raise the confidence threshold so that low-confidence answers fall back to "I don't know" responses |
| Precision@K below 0.50 despite relevant documents existing | Embedding model does not capture domain vocabulary | Fine-tune embedding model on domain data or switch to a domain-specific model; add cross-encoder reranking stage |
| Query latency exceeds 2 seconds | Large top-K, no caching, or unoptimized HNSW index | Reduce top-K, enable query-level and semantic caching, tune HNSW parameters (ef_search, m) |
| Recall drops after adding new documents | Stale embeddings or index fragmentation after incremental inserts | Trigger full re-index; verify new documents pass chunking pipeline; check embedding model version consistency |
| Hybrid retrieval returns duplicate chunks | Dense and sparse retrievers returning overlapping results without deduplication | Apply Reciprocal Rank Fusion (RRF) with deduplication before reranking; tune dense/sparse weight ratio |
| Evaluation metrics fluctuate across runs | Non-deterministic embedding batching or insufficient test query set | Fix random seeds, increase evaluation sample size, run evaluations on a frozen ground-truth set |

Success Criteria

  • Faithfulness > 0.90 -- Generated answers are grounded in retrieved context as measured by the RAGAS faithfulness metric.
  • Context Relevance > 0.80 -- At least 80% of retrieved chunks are relevant to the user query.
  • Precision@5 > 0.70 -- Averaged across test queries, more than 70% of the top-5 results are relevant.
  • End-to-end latency < 500ms -- P95 query-to-response latency stays under 500 milliseconds for interactive workloads.
  • Recall@10 > 0.85 -- The system retrieves at least 85% of relevant documents within the top 10 results.
  • Chunk boundary quality > 0.80 -- At least 80% of chunks end on clean sentence or paragraph boundaries as reported by chunking_optimizer.py.
  • Monthly cost within budget -- Total embedding, vector DB, and reranking costs stay within the budget ceiling defined in requirements.
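
The latency criterion is stated as a P95, which is easy to misread as an average. Below is a minimal sketch of a nearest-rank percentile check. The function names are hypothetical and the sample latencies are made up for illustration; they are not measurements from any pipeline.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(latencies_ms: Sequence[float], limit_ms: float = 500.0) -> bool:
    """True when P95 latency is strictly under the limit."""
    return percentile(latencies_ms, 95) < limit_ms


# Illustrative sample only: one slow request out of five pushes P95 over the limit.
sample_ms = [120, 480, 300, 510, 90]
```

With only five samples the nearest-rank P95 is the slowest request (510 ms), so this sample fails the check even though its mean is 300 ms. A meaningful P95 needs a much larger sample.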

Scope & Limitations

This skill covers:
  • End-to-end RAG pipeline architecture design: chunking, embedding, vector storage, retrieval, reranking, and evaluation.
  • Quantitative chunking analysis across four strategy families (fixed-size, sentence, paragraph, semantic).
  • Retrieval quality evaluation using standard IR metrics (Precision@K, Recall@K, MRR, NDCG) with a built-in TF-IDF baseline.
  • Automated pipeline design with component selection, cost projection, and Mermaid architecture diagrams.
This skill does NOT cover:
  • LLM prompt engineering or generation-side optimization -- see engineering/prompt-engineer-toolkit.
  • Database schema design for metadata stores alongside vector databases -- see engineering/database-designer.
  • Production observability, alerting, and SLO dashboards for deployed pipelines -- see engineering/observability-designer.
  • Agent orchestration or multi-step reasoning workflows that sit on top of RAG retrieval -- see engineering/agent-workflow-designer.

Integration Points

| Skill | Integration | Data Flow |
|---|---|---|
| engineering/prompt-engineer-toolkit | Optimize system prompts and few-shot examples fed alongside retrieved chunks | Pipeline design output --> prompt templates that reference chunk format and metadata |
| engineering/database-designer | Design relational metadata stores (tags, access control, source tracking) paired with the vector database | Vector DB recommendation --> metadata schema for hybrid storage |
| engineering/observability-designer | Set up latency, throughput, and accuracy monitoring for the deployed RAG pipeline | Evaluation metrics and SLO targets --> dashboards and alerting rules |
| engineering/agent-workflow-designer | Embed the RAG retrieval step inside multi-agent reasoning workflows | Retrieval config --> agent tool definition with top-K and threshold parameters |
| engineering/ci-cd-pipeline-builder | Automate embedding re-indexing, evaluation regression tests, and deployment on document changes | Evaluation thresholds --> CI gate that blocks deploys when metrics regress |
| engineering/api-design-reviewer | Review the query and ingestion API surface exposed by the RAG service | Pipeline config --> OpenAPI spec review for search and ingest endpoints |

Tool Reference

chunking_optimizer.py

Purpose: Analyzes a document corpus and evaluates multiple chunking strategies (fixed-size, sentence-based, paragraph-based, semantic/heading-aware) to recommend the optimal approach with configuration parameters.
Usage:
bash
python chunking_optimizer.py <directory> [options]
Flags / Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| directory | positional, required | -- | Directory containing text/markdown documents to analyze |
| --output, -o | string | None | Output file path for results in JSON format |
| --config, -c | string | None | JSON configuration file to customize strategy parameters (fixed_sizes, overlaps, sentence_max_sizes, paragraph_max_sizes, semantic_max_sizes) |
| --extensions | string list | .txt .md .markdown | File extensions to include when scanning the corpus |
| --verbose, -v | flag | off | Print all strategy scores in addition to the recommendation |
Example:
bash
python chunking_optimizer.py ./docs --output results.json --extensions .txt .md --verbose
Output Formats:
  • Console -- Corpus summary, recommended strategy name, performance score, reasoning text, and two sample chunks. With --verbose, all strategy scores are listed.
  • JSON (--output) -- Full results object containing corpus_info, strategy_results (per-strategy size statistics, boundary quality, semantic coherence, vocabulary statistics, performance score), recommendation (best strategy, all scores, reasoning), and sample_chunks.

retrieval_evaluator.py

Purpose: Evaluates retrieval system performance using a built-in TF-IDF baseline retriever and standard information retrieval metrics: Precision@K, Recall@K, MRR, and NDCG. Includes failure analysis and improvement recommendations.
Usage:
bash
python retrieval_evaluator.py <queries> <corpus> <ground_truth> [options]
Flags / Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| queries | positional, required | -- | JSON file containing queries (list of {"id": ..., "query": ...} objects, or {"queries": [...]}) |
| corpus | positional, required | -- | Directory containing the document corpus |
| ground_truth | positional, required | -- | JSON file mapping query IDs to lists of relevant document IDs |
| --output, -o | string | None | Output file path for results in JSON format |
| --k-values | int list | 1 3 5 10 | K values used when computing Precision@K, Recall@K, and NDCG@K |
| --extensions | string list | .txt .md .markdown | File extensions to include from the corpus directory |
| --verbose, -v | flag | off | Print detailed per-metric values and failure analysis counts |
Example:
bash
python retrieval_evaluator.py queries.json ./corpus ground_truth.json --output eval.json --k-values 1 5 10 --verbose
Output Formats:
  • Console -- Evaluation summary table (Precision@1, Precision@5, Recall@5, MRR, NDCG@5) with performance assessment and numbered improvement recommendations. With --verbose, all aggregate metrics and failure analysis counts are printed.
  • JSON (--output) -- Full results object containing aggregate_metrics, query_results (per-query metrics, retrieved docs, relevant docs), failure_analysis (poor precision/recall counts, zero-result counts, query length analysis, failure patterns), evaluation_summary, and recommendations.
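
The two JSON inputs described in the flag table can be produced from a set of labelled queries, as in the sketch below. The list-of-objects layout for queries and the ID-to-list mapping for ground truth follow the descriptions above; using file names as document IDs is an assumption, since the source does not say how retrieval_evaluator.py derives document IDs. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# query id -> (query text, relevant document ids)
Labelled = Dict[str, Tuple[str, List[str]]]


def build_eval_inputs(labelled: Labelled) -> Tuple[List[dict], Dict[str, List[str]]]:
    """Build the queries list and the ground-truth mapping from labelled queries."""
    queries = [{"id": qid, "query": text} for qid, (text, _) in labelled.items()]
    ground_truth = {qid: list(doc_ids) for qid, (_, doc_ids) in labelled.items()}
    return queries, ground_truth


def write_json(path: str, payload: object) -> None:
    """Write a JSON file, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")


# Assumption: document IDs are corpus file names.
labelled_queries: Labelled = {
    "q1": ("How do I rotate API keys?", ["security/key-rotation.md"]),
    "q2": ("What is the on-call escalation path?", ["ops/oncall.md", "ops/escalation.md"]),
}
queries, ground_truth = build_eval_inputs(labelled_queries)
```

Calling `write_json("queries.json", queries)` and `write_json("ground_truth.json", ground_truth)` then yields the two files passed as the first and third positional arguments in the example command.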

rag_pipeline_designer.py

Purpose: Accepts a system requirements specification and generates a complete RAG pipeline design including component recommendations (chunking, embedding, vector DB, retrieval, reranking, evaluation), cost projections, a Mermaid architecture diagram, and deployment configuration templates.
Usage:
bash
python rag_pipeline_designer.py <requirements> [options]
Flags / Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| requirements | positional, required | -- | JSON file containing system requirements (document_types, document_count, avg_document_size, queries_per_day, query_patterns, latency_requirement, budget_monthly, accuracy_priority, cost_priority, maintenance_complexity) |
| --output, -o | string | None | Output file path for the pipeline design in JSON format |
| --verbose, -v | flag | off | Print full configuration templates for each component |
Example:
bash
python rag_pipeline_designer.py requirements.json --output pipeline_design.json --verbose
Output Formats:
  • Console -- Design summary with total monthly cost, per-component recommendations (name, rationale, cost), and a Mermaid architecture diagram. With --verbose, full JSON configuration templates for each component are printed.
  • JSON (--output) -- Complete pipeline design object containing per-component ComponentRecommendation fields (name, type, config, rationale, pros, cons, cost_monthly), total_cost, architecture_diagram (Mermaid markup), and config_templates (per-component configs plus deployment/scaling/monitoring settings).