mcp-documentation-server
Skill by ara.so — MCP Skills collection.
MCP Documentation Server provides local-first document management with semantic search capabilities. It uses an embedded Orama vector database for hybrid full-text and vector search, intelligent parent-child chunking for better context, and optional Google Gemini AI integration for advanced document analysis.
Installation
Quick Start with MCP Client
Add to your MCP client configuration (e.g., Claude Desktop):
~/Library/Application Support/Claude/claude_desktop_config.json
json
{
  "mcpServers": {
    "documentation": {
      "command": "npx",
      "args": ["-y", "@andrea9293/mcp-documentation-server"]
    }
  }
}
With Environment Variables
json
{
  "mcpServers": {
    "documentation": {
      "command": "npx",
      "args": ["-y", "@andrea9293/mcp-documentation-server"],
      "env": {
        "MCP_BASE_DIR": "/path/to/workspace",
        "GEMINI_API_KEY": "your-gemini-api-key",
        "MCP_EMBEDDING_MODEL": "Xenova/paraphrase-multilingual-mpnet-base-v2",
        "START_WEB_UI": "true",
        "WEB_PORT": "3080"
      }
    }
  }
}
Development Installation
bash
git clone https://github.com/andrea9293/mcp-documentation-server.git
cd mcp-documentation-server
npm install
npm run build
Configuration
Environment Variables
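| Variable | Default | Description |
|---|---|---|
| MCP_BASE_DIR | ~/.mcp-documentation-server | Base directory for data storage |
| MCP_EMBEDDING_MODEL | Xenova/all-MiniLM-L6-v2 | Embedding model (384 dims) |
| GEMINI_API_KEY | — | Google Gemini API key for AI search |
| | | Enable LRU embedding cache |
| START_WEB_UI | true | Start built-in web interface |
| WEB_PORT | 3080 | Web UI port |
| MCP_STREAMING_ENABLED | | Stream large files |
| | 65536 | Streaming buffer (64KB) |
| MCP_STREAM_FILE_SIZE_LIMIT | 10485760 | Streaming threshold (10MB) |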
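As a rough illustration of how these settings might be resolved, the sketch below reads a plain key/value map and falls back to the defaults stated in this document (port 3080, the 10MB streaming threshold, the MiniLM model). The `loadConfig` function and the `ServerConfig` shape are hypothetical and not part of the server's API; only variable names that appear elsewhere in this document are used.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  baseDir: string | undefined;
  embeddingModel: string;
  geminiApiKey: string | undefined;
  startWebUi: boolean;
  webPort: number;
  streamFileSizeLimit: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when the value is missing or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Env): ServerConfig {
  return {
    baseDir: env.MCP_BASE_DIR,
    embeddingModel: env.MCP_EMBEDDING_MODEL ?? "Xenova/all-MiniLM-L6-v2",
    geminiApiKey: env.GEMINI_API_KEY,
    // Assumption: the Web UI is on unless explicitly set to "false".
    startWebUi: env.START_WEB_UI !== "false",
    webPort: intOr(env.WEB_PORT, 3080),
    streamFileSizeLimit: intOr(env.MCP_STREAM_FILE_SIZE_LIMIT, 10 * 1024 * 1024),
  };
}

const config = loadConfig({ WEB_PORT: "3081", START_WEB_UI: "false" });
console.log(config.webPort, config.startWebUi);
```

In a Node process the same function could be called as `loadConfig(process.env)`; taking the map as a parameter keeps the sketch easy to test.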
Embedding Models
Fast (default):
- Xenova/all-MiniLM-L6-v2: 384 dimensions, ~80MB
High Quality (recommended):
- Xenova/paraphrase-multilingual-mpnet-base-v2: 768 dimensions, ~420MB, multilingual
⚠️ Changing models requires re-indexing all documents (embeddings are incompatible).
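To see why vectors from different models cannot be mixed, it helps to look at how similarity is typically computed. The sketch below is a generic cosine similarity, not the server's actual code; it simply refuses to compare a 384-dimension vector with a 768-dimension one, which is the situation that arises when stored embeddings and query embeddings come from different models.

```typescript
// Generic cosine similarity; illustrative only, not taken from the server.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const stored = new Array<number>(384).fill(0.1); // indexed with the 384-dim model
const query = new Array<number>(768).fill(0.1);  // embedded with the 768-dim model

try {
  cosineSimilarity(stored, query);
} catch (err) {
  console.log((err as Error).message);
}
```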
Storage Structure
~/.mcp-documentation-server/
├── data/
│   ├── orama-chunks.msp          # Vector DB (child chunks + embeddings)
│   ├── orama-docs.msp            # Document DB (full content + metadata)
│   ├── orama-parents.msp         # Parent chunks DB (context sections)
│   ├── migration-complete.flag
│   └── *.md                      # Markdown document copies
└── uploads/                      # Drop files here for processing
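The layout above can be expressed as a small path helper. This is a sketch only: `storagePaths` is a hypothetical function, and it assumes that a custom base directory (for example one set through MCP_BASE_DIR) contains the same `data/` and `uploads/` subfolders as the default location.

```typescript
import { join } from "node:path";

// Hypothetical helper: maps a base directory to the files shown in the tree above.
function storagePaths(baseDir: string) {
  const dataDir = join(baseDir, "data");
  return {
    chunks: join(dataDir, "orama-chunks.msp"),
    docs: join(dataDir, "orama-docs.msp"),
    parents: join(dataDir, "orama-parents.msp"),
    migrationFlag: join(dataDir, "migration-complete.flag"),
    uploads: join(baseDir, "uploads"),
  };
}

const paths = storagePaths("/path/to/workspace");
console.log(paths.uploads);
```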
MCP Tools
Document Management
add_document
Add a new document to the knowledge base.
typescript
// Tool call
{
  "title": "API Reference",
  "content": "# Authentication\nUse Bearer tokens...",
  "metadata": {
    "category": "api",
    "version": "2.0",
    "author": "team"
  }
}
Response includes document ID, chunk count, and timing stats.
list_documents
List all documents with metadata and previews.
typescript
// Tool call (no parameters required)
// Returns array of documents:
[
  {
    "id": "doc_abc123",
    "title": "API Reference",
    "preview": "# Authentication\nUse Bearer...",
    "metadata": { "category": "api" },
    "created": "2025-01-15T10:30:00Z",
    "updated": "2025-01-15T10:30:00Z"
  }
]
get_document
Retrieve full document content by ID.
typescript
// Tool call
{
  "id": "doc_abc123"
}
// Returns complete document with metadata and full content
delete_document
Remove a document and all associated data.
typescript
// Tool call
{
  "id": "doc_abc123"
}
// Deletes document, chunks, embeddings, and backup files
File Processing
process_uploads
Process all files from the uploads folder.
typescript
// Tool call (no parameters)
// Processes .txt, .md, .pdf files from ~/.mcp-documentation-server/uploads/
// Returns:
{
  "processed": 3,
  "failed": 0,
  "results": [
    {
      "filename": "guide.md",
      "success": true,
      "documentId": "doc_xyz789",
      "chunks": 12
    }
  ]
}
get_uploads_path
Get the absolute path to the uploads folder.
typescript
// Tool call (no parameters)
// Returns: "/Users/username/.mcp-documentation-server/uploads"
list_uploads_files
List files in the uploads folder.
typescript
// Tool call (no parameters)
// Returns:
[
  {
    "name": "api-guide.md",
    "size": 45678,
    "sizeFormatted": "44.6 KB",
    "extension": ".md"
  }
]
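The relationship between `size` and `sizeFormatted` in the response above can be reproduced with a small formatter. The sketch below assumes 1024-byte units and one decimal place, which matches the sample values (45678 bytes shown as 44.6 KB); the server's exact rounding rules are not documented here, so `formatBytes` is a hypothetical helper.

```typescript
// Illustrative formatter; the server's exact rounding rules are an assumption.
function formatBytes(bytes: number): string {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit++;
  }
  // Whole bytes need no decimals; larger units keep one decimal place.
  return unit === 0 ? `${value} B` : `${value.toFixed(1)} ${units[unit]}`;
}

console.log(formatBytes(45678)); // "44.6 KB"
```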
get_ui_url
Get the Web UI URL.
typescript
// Tool call (no parameters)
// Returns: "http://localhost:3080"
Search Tools
search_documents
Semantic vector search within a specific document.
typescript
// Tool call
{
  "documentId": "doc_abc123",
  "query": "authentication methods",
  "limit": 5
}
// Returns:
{
  "results": [
    {
      "content": "Bearer token authentication is used...",
      "parentContent": "# Authentication\nBearer token authentication...",
      "score": 0.92,
      "documentId": "doc_abc123",
      "documentTitle": "API Reference"
    }
  ],
  "total": 5
}
search_all_documents
Hybrid full-text + vector search across all documents.
typescript
// Tool call
{
  "query": "rate limiting configuration",
  "limit": 10
}
// Returns deduplicated results sorted by relevance
// Includes both exact text matches and semantic similarity
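The combination of text matches, vector matches, deduplication, and relevance sorting can be pictured with a hand-rolled merge. This is only a sketch of the general idea: the `Hit` shape, the weighted-sum scoring, and the 0.5 default weight are assumptions made for illustration, and the server's actual ranking is performed by Orama rather than by code like this.

```typescript
// Hypothetical result shape for illustration.
interface Hit {
  chunkId: string;
  parentId: string;
  score: number;
}

// Combine text and vector hits with a weighted sum (weights are assumptions),
// keep the best-scoring chunk per parent, and sort by descending score.
function mergeHybrid(textHits: Hit[], vectorHits: Hit[], textWeight = 0.5, limit = 10): Hit[] {
  const combined = new Map<string, Hit>();
  const add = (hit: Hit, weight: number): void => {
    const prev = combined.get(hit.chunkId);
    const score = (prev?.score ?? 0) + hit.score * weight;
    combined.set(hit.chunkId, { ...hit, score });
  };
  textHits.forEach((h) => add(h, textWeight));
  vectorHits.forEach((h) => add(h, 1 - textWeight));

  // Deduplicate: only the strongest chunk of each parent section survives.
  const bestPerParent = new Map<string, Hit>();
  Array.from(combined.values()).forEach((hit) => {
    const best = bestPerParent.get(hit.parentId);
    if (!best || hit.score > best.score) bestPerParent.set(hit.parentId, hit);
  });

  return Array.from(bestPerParent.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const merged = mergeHybrid(
  [
    { chunkId: "c1", parentId: "p1", score: 0.8 },
    { chunkId: "c2", parentId: "p1", score: 0.4 },
  ],
  [
    { chunkId: "c1", parentId: "p1", score: 0.6 },
    { chunkId: "c3", parentId: "p2", score: 0.9 },
  ],
);
console.log(merged.map((h) => h.chunkId));
```

Here chunk c1 is found by both searches and so outranks c3, while c2 is dropped because it shares a parent with the stronger c1.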
get_context_window
Fetch surrounding chunks for richer context.
typescript
// Tool call
{
  "documentId": "doc_abc123",
  "chunkIndex": 5,
  "windowSize": 2
}
// Returns chunks [3, 4, 5, 6, 7] with the target chunk at index 5
// Useful for giving LLMs broader context around a search result
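The window arithmetic in the example above (index 5 with a window of 2 yields chunks 3 through 7) can be sketched as follows. `contextWindowIndices` is a hypothetical helper, and clamping at the start and end of the document is an assumption about how edge cases are handled.

```typescript
// Hypothetical helper: indices covered by a window around chunkIndex,
// clamped to the valid range [0, totalChunks - 1].
function contextWindowIndices(chunkIndex: number, windowSize: number, totalChunks: number): number[] {
  const start = Math.max(0, chunkIndex - windowSize);
  const end = Math.min(totalChunks - 1, chunkIndex + windowSize);
  const indices: number[] = [];
  for (let i = start; i <= end; i++) indices.push(i);
  return indices;
}

console.log(contextWindowIndices(5, 2, 20)); // [3, 4, 5, 6, 7]
console.log(contextWindowIndices(0, 2, 20)); // [0, 1, 2]
```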
search_documents_with_ai
AI-powered search using Google Gemini (requires GEMINI_API_KEY).
typescript
// Tool call
{
  "documentIds": ["doc_abc123", "doc_xyz789"],
  "query": "How do I implement rate limiting with Redis?",
  "conversationHistory": [
    {
      "role": "user",
      "content": "What caching options are available?"
    },
    {
      "role": "assistant",
      "content": "Redis and Memcached are supported..."
    }
  ]
}
// Returns AI-generated answer with context and sources
Common Patterns
Setting Up a Knowledge Base
typescript
// 1. Add documents programmatically
const apiDoc = await mcp.call("add_document", {
  title: "REST API Guide",
  content: "# REST API\n\n## Endpoints...",
  metadata: { type: "api", version: "1.0" }
});
// 2. Or drop files in uploads folder
const uploadsPath = await mcp.call("get_uploads_path");
// Copy files to uploadsPath, then:
const result = await mcp.call("process_uploads");
// 3. Verify documents
const docs = await mcp.call("list_documents");
console.log(`${docs.length} documents indexed`);
Semantic Search Workflow
typescript
// Search across all documents
const results = await mcp.call("search_all_documents", {
  query: "database connection pooling",
  limit: 5
});
// Get more context for top result
// (assumes each result exposes a chunkIndex field)
if (results.results.length > 0) {
  const topResult = results.results[0];
  const context = await mcp.call("get_context_window", {
    documentId: topResult.documentId,
    chunkIndex: topResult.chunkIndex,
    windowSize: 3
  });
  // context.chunks contains surrounding sections
}
AI-Assisted Research
typescript
// Use Gemini for complex queries
const answer = await mcp.call("search_documents_with_ai", {
  documentIds: ["doc_1", "doc_2"],
  query: "Compare authentication approaches and recommend best practices",
  conversationHistory: []
});
// answer contains:
// - AI-generated response
// - Source chunks with citations
// - Confidence scores
Batch Document Processing
typescript
// Process multiple files efficiently
const files = await mcp.call("list_uploads_files");
console.log(`Found ${files.length} files to process`);
const result = await mcp.call("process_uploads");
result.results.forEach(item => {
  if (item.success) {
    console.log(`✓ ${item.filename}: ${item.chunks} chunks`);
  } else {
    console.error(`✗ ${item.filename}: ${item.error}`);
  }
});
Managing Document Lifecycle
typescript
// List and filter documents
const allDocs = await mcp.call("list_documents");
const apiDocs = allDocs.filter(doc =>
  doc.metadata?.category === "api"
);
// Update a document (delete + re-add)
await mcp.call("delete_document", { id: "doc_old" });
await mcp.call("add_document", {
  title: "Updated API Docs",
  content: updatedContent,
  metadata: { category: "api", version: "2.0" }
});
Web UI
Access the built-in web interface at http://localhost:3080 (default).
Features:
- Dashboard with document statistics
- Upload drag & drop interface
- Visual search across all documents
- Document browser with delete/view
- AI search interface (if Gemini key configured)
- Context window explorer
Disable with START_WEB_UI=false or change port with WEB_PORT.
Troubleshooting
Embeddings Not Generated
Symptom: Search returns no results or poor results.
Solution:
bash
# Check if model is downloaded
ls ~/.cache/huggingface/
# Force re-download by clearing cache
rm -rf ~/.cache/huggingface/
# Restart server to re-download
Model Change Not Taking Effect
Symptom: Changed MCP_EMBEDDING_MODEL but search quality unchanged.
Solution:
bash
# Delete existing database
rm ~/.mcp-documentation-server/data/orama-*.msp
# Re-add all documents
# Server will recreate database with new model dimensions
Large File Processing Fails
Symptom: File upload times out or fails.
Solution:
json
{
  "env": {
    "MCP_STREAMING_ENABLED": "true",
    "MCP_STREAM_FILE_SIZE_LIMIT": "5242880"
  }
}
This lowers the streaming threshold from the 10MB default to 5MB (5242880 bytes), so streaming mode triggers earlier.
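The effect of lowering the threshold can be checked with a little arithmetic. The sketch below assumes that files strictly larger than the limit are streamed and smaller files are read whole; `shouldStream` is a hypothetical helper, and whether the server compares with `>` or `>=` is not stated in this document.

```typescript
const MB = 1024 * 1024;

// Assumption: files larger than the limit are streamed, smaller ones are read whole.
function shouldStream(fileSizeBytes: number, limitBytes: number, enabled = true): boolean {
  return enabled && fileSizeBytes > limitBytes;
}

const sevenMb = 7 * MB;
console.log(shouldStream(sevenMb, 10 * MB)); // false: below the 10MB default
console.log(shouldStream(sevenMb, 5 * MB));  // true: above the lowered 5MB limit
console.log(5 * MB);                         // 5242880
```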
AI Search Not Available
Symptom: search_documents_with_ai tool missing.
Solution:
- Verify GEMINI_API_KEY is set in configuration
- Get API key from https://aistudio.google.com/app/apikey
- Restart MCP server after adding key
Port Already in Use
Symptom: Web UI fails to start with EADDRINUSE.
Solution:
json
{
  "env": {
    "WEB_PORT": "3081"
  }
}
Or disable Web UI:
"START_WEB_UI": "false"
Memory Issues with Large Documents
Symptom: Server crashes with out-of-memory error.
Solution:
- Enable streaming: MCP_STREAMING_ENABLED=true
- Reduce chunk size: Process files individually
- Use more efficient model: Xenova/all-MiniLM-L6-v2 (384 dims)
Migration from Legacy JSON
Symptom: Old documents not appearing.
Solution:
The server automatically migrates legacy JSON documents on first startup. Check:
bash
ls ~/.mcp-documentation-server/data/migration-complete.flag
If the flag exists but documents are missing, check the logs for migration errors.
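The same check can be scripted. This is a minimal sketch using Node's standard library; `isMigrationComplete` is a hypothetical helper, and it assumes the default base directory shown in the storage layout unless another one is passed in.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: reports whether the migration flag file is present.
function isMigrationComplete(
  baseDir: string = join(homedir(), ".mcp-documentation-server"),
): boolean {
  return existsSync(join(baseDir, "data", "migration-complete.flag"));
}

console.log(isMigrationComplete() ? "migration flag present" : "migration flag missing");
```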
Development
Run in development mode:
bash
npm run dev      # MCP server with hot reload
npm run inspect  # FastMCP web UI for testing tools
npm run web      # Web UI only (development)
Build and run production:
bash
npm run build
npm start
Architecture
- FastMCP — MCP server framework with stdio transport
- Orama — Embedded vector database (3 instances: docs, chunks, parents)
- Transformers.js — Local embedding generation (@xenova/transformers)
- Parent-Child Chunking — Large context parents + small precise children
- LRU Cache — In-memory embedding cache for performance
- Streaming Reader — Handle large files without memory bloat
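To make the LRU cache component more concrete, here is a minimal sketch of the eviction policy using a `Map`, which preserves insertion order. It is a generic illustration, not the server's cache implementation; the `LruCache` class and its capacity of 2 are chosen only for the example.

```typescript
// Minimal LRU cache built on Map's insertion order; illustrative only.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new LruCache<string, number[]>(2);
cache.set("alpha", [0.1]);
cache.set("beta", [0.2]);
cache.get("alpha");        // touch "alpha" so "beta" becomes the oldest entry
cache.set("gamma", [0.3]); // exceeds capacity and evicts "beta"
console.log(cache.get("beta")); // undefined
```

Caching embeddings this way avoids recomputing vectors for text that was embedded recently, at the cost of a bounded amount of memory.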
Data flow:
- Document → IntelligentChunker → Parent chunks + Child chunks
- Child chunks → EmbeddingProvider → Vectors (384/768 dims)
- Vectors + chunks → OramaStore → Persisted to disk
- Query → Hybrid search (text + vector) → Deduplicated by parent → Results with context
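The first step of the data flow, splitting a document into parent and child chunks, can be sketched as below. This is a deliberately simplified stand-in for the chunker: it treats each heading-delimited section as a parent and cuts fixed-size children from it. The `chunkDocument` function, the ID format, and the chunk sizes are all assumptions for illustration; the real chunking logic is likely more sophisticated.

```typescript
interface ParentChunk {
  id: string;
  content: string;
}

interface ChildChunk {
  id: string;
  parentId: string;
  content: string;
}

// Simplified stand-in for the chunker: parents are heading-delimited sections,
// children are fixed-size slices of each parent. Sizes are arbitrary assumptions.
function chunkDocument(
  text: string,
  childSize = 200,
): { parents: ParentChunk[]; children: ChildChunk[] } {
  const sections = text
    .split(/\n(?=#{1,6} )/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const parents: ParentChunk[] = [];
  const children: ChildChunk[] = [];
  sections.forEach((content, p) => {
    const parentId = `parent_${p}`;
    parents.push({ id: parentId, content });
    for (let start = 0, c = 0; start < content.length; start += childSize, c++) {
      children.push({
        id: `${parentId}_child_${c}`,
        parentId,
        content: content.slice(start, start + childSize),
      });
    }
  });
  return { parents, children };
}

const { parents, children } = chunkDocument(
  "# Auth\nUse Bearer tokens.\n# Limits\nDefault is 100 requests.",
  12,
);
console.log(parents.length, children.length);
```

Only the small child chunks would be embedded and searched; each carries a `parentId` so the larger parent section can be returned as context.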