ChromaDB Integration Skills
Purpose: This skill teaches agents how to integrate ChromaDB for semantic search, persistent storage, and pattern matching across ANY domain - research, code, trading, legal, documentation, and more.
Critical Use Case: When agents need to work with large datasets (1000+ items), perform semantic search, maintain persistent knowledge, or learn from historical patterns, ChromaDB keeps the data outside the context window and returns only the most relevant items through vector-based retrieval, so dataset size is no longer bounded by token limits.
Used By: All agent types - researchers, developers, traders, legal analysts, documentation writers, QA testers, etc.
When to Use ChromaDB Integration
Use ChromaDB when:
- Large Datasets: Working with 1000+ items (documents, code files, bugs, trades, contracts, etc.)
- Semantic Search: Finding items by meaning, not just keywords
- Persistent Memory: Knowledge needs to survive across sessions, days, months
- Pattern Matching: Identifying similar historical cases/patterns for decision-making
- Cross-Session Learning: Building institutional knowledge over time
- Token Limits: Data too large to fit in context window (100K+ tokens)
- Aggregation: Combining results from multiple queries/sources
Core ChromaDB Concepts
Collections
Definition: Named containers within a ChromaDB database, each storing documents together with their embeddings and metadata
Naming Strategy:
- Domain-based: {domain}_{purpose}_{identifier}
- Examples:
  - Research: research_prior_art_blockchain_2024, research_literature_ml_transformers
  - Code: codebase_api_endpoints, codebase_bug_patterns_auth
  - Trading: backtest_results_sma_strategy, market_conditions_spy_2024
  - Legal: case_law_patent_eligibility, contracts_saas_clauses
  - Documentation: api_docs_v2, architecture_decisions_2024
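The naming pattern can be applied mechanically. The sketch below is one possible helper, not part of ChromaDB or the MCP server: `buildCollectionName` is a hypothetical name, and the 3 to 63 character check is an assumption about collection-name limits that should be verified against the ChromaDB version in use.

```javascript
// Hypothetical helper: builds a "{domain}_{purpose}_{identifier}" collection name.
// ASSUMPTION: names must be 3-63 characters; check the limits of your ChromaDB version.
function buildCollectionName(domain, purpose, identifier) {
  const slug = (part) =>
    String(part)
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "_") // collapse anything non-alphanumeric into "_"
      .replace(/^_+|_+$/g, "");    // trim leading and trailing underscores

  // Empty parts are dropped so an optional identifier does not leave a dangling "_".
  const name = [domain, purpose, identifier].map(slug).filter(Boolean).join("_");
  if (name.length < 3 || name.length > 63) {
    throw new Error(`Collection name "${name}" must be 3-63 characters long`);
  }
  return name;
}

console.log(buildCollectionName("research", "prior art", "Blockchain 2024"));
// research_prior_art_blockchain_2024
```

Normalizing names in one place keeps prefix-based lookups (for example, filtering collections by `research_literature_`) reliable.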
Documents
Definition: Text content to be searched semantically
Best Practices:
- Chunk Size: 200-500 words optimal (too small = context loss, too large = poor granularity)
- Content Format: Title + summary + key details (e.g., "Patent US10123456 - Blockchain Authentication. Abstract: A method for...")
- Deduplication: Use unique IDs to prevent duplicate storage
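As an illustration of the chunk-size and deduplication advice, here is a minimal sketch of word-based chunking with deterministic IDs. The function name `chunkByWords`, the 400-word default, and the 50-word overlap are all assumptions chosen for the example; the overlap in particular is not something the guidance above prescribes.

```javascript
// Hypothetical helper: splits text into chunks of at most `maxWords` words.
// ASSUMPTION: neighbouring chunks share `overlap` words so context is not cut abruptly.
function chunkByWords(text, baseId, maxWords = 400, overlap = 50) {
  if (overlap >= maxWords) throw new Error("overlap must be smaller than maxWords");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + maxWords);
    chunks.push({
      // Deterministic ID: re-ingesting the same source yields the same IDs,
      // which makes it possible to detect items that are already stored.
      id: `${baseId}_chunk_${chunks.length}`,
      document: slice.join(" "),
    });
    if (start + maxWords >= words.length) break; // this chunk reached the end of the text
  }
  return chunks;
}

// Usage with placeholder text (900 repeated words, purely for demonstration).
const sampleText = "lorem ".repeat(900);
const sampleChunks = chunkByWords(sampleText, "patent_US10123456");
console.log(sampleChunks.map((c) => c.id));
```

The resulting `id` and `document` fields map directly onto the `ids` and `documents` arrays used when adding documents.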
Metadata
Definition: Structured data for filtering, not semantic search
Strategy:
javascript
{
// Temporal filters
"date": "2024-11-14",
"year": 2024,
"month": 11,
// Categorical filters
"type": "bug_report",
"category": "authentication",
"severity": "high",
// Numeric filters
"citations": 42,
"price": 150.25,
"performance_score": 0.87,
// Source tracking
"source": "github_issue",
"author": "kim-asplund",
"url": "https://..."
}
Embeddings
Definition: Vector representations enabling semantic similarity
How It Works:
- ChromaDB automatically generates embeddings from document text
- Similar meanings → similar vectors → close in vector space
- Distance metrics (cosine, euclidean) measure similarity
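To make the distance metrics concrete, the following sketch computes cosine and Euclidean distance by hand. The vectors are made-up toy values, not real embeddings, and ChromaDB performs this computation internally; the functions only illustrate what the returned distances mean.

```javascript
// Cosine distance: 0 for vectors pointing the same way, 1 for orthogonal, 2 for opposite.
function cosineDistance(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("a zero vector has no direction");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Euclidean distance: straight-line distance between the two points.
function euclideanDistance(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

// Toy 3-dimensional "embeddings" (illustrative values only).
const loginFailure = [0.9, 0.1, 0.0];
const authError = [0.8, 0.2, 0.1];
const chartRendering = [0.0, 0.1, 0.9];

console.log(cosineDistance(loginFailure, authError));      // small: similar meaning
console.log(cosineDistance(loginFailure, chartRendering)); // large: unrelated meaning
```

Note that cosine distance ignores vector length while Euclidean distance does not, so thresholds tuned for one metric do not carry over to the other.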
Universal ChromaDB Workflow
Phase 1: Collection Design
javascript
// Step 1: Design collection strategy based on agent type
const collectionStrategy = {
research_agent: "One collection per research topic/question",
code_agent: "Collections by codebase module/feature",
trading_agent: "Collections by strategy/timeframe/symbol",
legal_agent: "Collections by practice area/jurisdiction",
documentation_agent: "Collections by project/version"
};
// Step 2: Create collection with descriptive metadata
mcp__chroma__create_collection({
collection_name: "{domain}_{purpose}_{identifier}",
embedding_function_name: "default", // Uses sentence transformers
metadata: {
created_date: "2024-11-14",
domain: "research|code|trading|legal|docs",
purpose: "Descriptive purpose",
total_items: 0, // Will update
last_updated: "2024-11-14"
}
});
Phase 2: Data Ingestion
javascript
// Step 1: Batch data collection (minimize API calls)
const items = collectAllItems(); // From API, files, database, etc.
// Step 2: Transform to ChromaDB format
const documents = items.map(item => formatDocument(item));
const ids = items.map(item => item.id || generateUniqueId());
const metadatas = items.map(item => extractMetadata(item));
// Step 3: Batch insert (documents are embedded as provided, so split long texts into chunks before this step)
mcp__chroma__add_documents({
collection_name: collectionName,
documents: documents,
ids: ids,
metadatas: metadatas
});
// Step 4: Update collection metadata
mcp__chroma__modify_collection({
collection_name: collectionName,
new_metadata: {
...existingMetadata,
total_items: items.length,
last_updated: new Date().toISOString()
}
});
Phase 3: Semantic Search
javascript
// Step 1: Formulate semantic query (natural language works!)
const query = "authentication failures in production environment";
// Step 2: Execute semantic search with filters
const results = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 20,
where: {
"$and": [
{ "environment": "production" },
{ "severity": { "$in": ["high", "critical"] } },
{ "year": { "$gte": 2024 } } // Range operators compare numbers, so filter on the numeric year field
]
},
include: ["documents", "metadatas", "distances"]
});
// Step 3: Filter by semantic similarity (distance threshold)
const highlyRelevant = results.ids[0].filter((id, idx) =>
results.distances[0][idx] < 0.3 // Adjust threshold based on use case
);
// Step 4: Retrieve full details if needed
const fullDetails = mcp__chroma__get_documents({
collection_name: collectionName,
ids: highlyRelevant,
include: ["documents", "metadatas"]
});
Phase 4: Pattern Matching
javascript
// Cross-collection pattern detection
const allCollections = mcp__chroma__list_collections();
const relevantCollections = allCollections.filter(c =>
c.startsWith(collectionPrefix)
);
const patterns = [];
for (const collection of relevantCollections) {
const matches = mcp__chroma__query_documents({
collection_name: collection,
query_texts: [patternQuery],
n_results: 10,
where: { "outcome": "success" } // Only successful cases
});
if (matches.ids[0].length > 0) {
patterns.push({
collection: collection,
matches: matches,
success_rate: calculateSuccessRate(matches)
});
}
}
// Identify best pattern
const bestPattern = patterns.sort((a, b) =>
b.success_rate - a.success_rate
)[0];
Use Case Templates
Template 1: Research Agent - Literature Review
Problem: Store 1000+ research papers, find semantically similar work
javascript
// Collection: research_literature_{topic}
const papers = fetchPapersFromAPI("machine learning transformers");
mcp__chroma__create_collection({
collection_name: "research_literature_ml_transformers",
metadata: { topic: "ML Transformers", papers_count: 0 }
});
// Store papers with rich metadata
papers.forEach(paper => {
mcp__chroma__add_documents({
collection_name: "research_literature_ml_transformers",
documents: [`${paper.title}. ${paper.abstract}`],
ids: [paper.doi || paper.id],
metadatas: [{
title: paper.title,
authors: paper.authors.join(", "),
year: paper.year,
citations: paper.citation_count,
venue: paper.venue,
url: paper.url
}]
});
});
// Semantic search: "Find papers about attention mechanisms for vision"
const relevant = mcp__chroma__query_documents({
collection_name: "research_literature_ml_transformers",
query_texts: ["attention mechanisms computer vision"],
n_results: 20,
where: { "$and": [ { "year": { "$gte": 2020 } }, { "citations": { "$gte": 50 } } ] } // Multiple conditions must be combined with $and
});
Benefits: No token limits, semantic discovery, citation filtering, persistent library
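Query results in the examples above are indexed as `results.ids[0][idx]`, `results.distances[0][idx]`, and so on: one inner array per query text, with matching positions across fields. A small adapter can turn that shape into one record per hit. This is a sketch under that assumed shape; `flattenQueryResult` is a hypothetical helper and the mock data is invented for illustration.

```javascript
// Hypothetical helper: converts parallel nested arrays into one object per hit.
// ASSUMPTION: the result has the shape { ids, documents, metadatas, distances },
// each holding one inner array per query text, as used in the templates.
function flattenQueryResult(result, queryIndex = 0) {
  const ids = result.ids?.[queryIndex] ?? [];
  return ids.map((id, i) => ({
    id,
    document: result.documents?.[queryIndex]?.[i] ?? null,
    metadata: result.metadatas?.[queryIndex]?.[i] ?? {},
    distance: result.distances?.[queryIndex]?.[i] ?? null,
  }));
}

// Mock result standing in for a real query response (all values are made up).
const mockResult = {
  ids: [["paper_a", "paper_b"]],
  documents: [["Paper A. Abstract...", "Paper B. Abstract..."]],
  metadatas: [[{ year: 2021, citations: 120 }, { year: 2023, citations: 64 }]],
  distances: [[0.21, 0.34]],
};

const papers = flattenQueryResult(mockResult);
console.log(papers.map((p) => `${p.id} (${p.metadata.year}) distance=${p.distance}`));
```

Working with flat records makes later steps such as sorting, thresholding, or merging hits from several collections less error-prone than juggling index positions.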
Template 2: Code Agent - Bug Pattern Recognition
Problem: Store bug reports, identify similar issues, suggest solutions
javascript
// Collection: codebase_bug_patterns_{module}
const bugs = fetchAllGitHubIssues("is:issue label:bug");
mcp__chroma__create_collection({
collection_name: "codebase_bug_patterns_auth",
metadata: { module: "authentication", total_bugs: 0 }
});
// Store bugs with solutions
bugs.forEach(bug => {
mcp__chroma__add_documents({
collection_name: "codebase_bug_patterns_auth",
documents: [`Bug #${bug.number}: ${bug.title}. ${bug.body}`],
ids: [`bug_${bug.number}`],
metadatas: [{
number: bug.number,
title: bug.title,
severity: bug.labels.find(l => l.startsWith("severity:"))?.split(":")[1] ?? "unknown", // Fall back when no severity label exists
status: bug.state,
solution: bug.resolution || "No solution yet",
created_at: bug.created_at,
resolved_at: bug.closed_at,
url: bug.html_url
}]
});
});
// New bug arrives - find similar historical bugs
const newBugDescription = "User login fails with 401 error after password reset";
const similarBugs = mcp__chroma__query_documents({
collection_name: "codebase_bug_patterns_auth",
query_texts: [newBugDescription],
n_results: 10,
where: { "$and": [ { "status": "closed" }, { "solution": { "$ne": "No solution yet" } } ] } // Multiple conditions must be combined with $and
});
// Extract solution from most similar resolved bug
const suggestedSolution = similarBugs.metadatas[0][0].solution;
Benefits: Instant bug pattern matching, solution reuse, similar issue detection
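Reading `metadatas[0][0]` directly fails when the query returns no hits, and it accepts the nearest bug even when that bug is not actually similar. A guarded variant might look like the sketch below. `suggestSolution` is a hypothetical helper, the 0.5 cutoff is an assumed starting point that needs tuning per embedding model and distance metric, and the mock data is invented.

```javascript
// Hypothetical helper: returns the solution of the closest resolved bug,
// or null when no hit is within `maxDistance`.
// ASSUMPTION: result has the nested shape { distances: [[...]], metadatas: [[...]] }.
function suggestSolution(result, maxDistance = 0.5) {
  const distances = result.distances?.[0] ?? [];
  const metadatas = result.metadatas?.[0] ?? [];
  let best = null;
  distances.forEach((distance, i) => {
    const solution = metadatas[i]?.solution;
    if (!solution || distance > maxDistance) return; // skip unusable or distant hits
    if (best === null || distance < best.distance) {
      best = { solution, distance, url: metadatas[i].url ?? null };
    }
  });
  return best;
}

// Mock query result (values are made up for illustration).
const mockSimilarBugs = {
  distances: [[0.72, 0.31, 0.44]],
  metadatas: [[
    { solution: "Rotate signing keys" },
    { solution: "Invalidate cached session after password reset" },
    { solution: "Increase token lifetime" },
  ]],
};

const suggestion = suggestSolution(mockSimilarBugs);
console.log(suggestion ? suggestion.solution : "No sufficiently similar resolved bug found");
```

Returning `null` instead of a weak match lets the calling agent fall back to a fresh investigation rather than reuse an unrelated fix.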
Template 3: Trading Agent - Backtest Results Database
Problem: Store 10,000+ backtest results, identify optimal parameter patterns
javascript
// Collection: backtest_results_{strategy_name}
const backtests = runParameterSweep(strategyCode, parameterRanges);
mcp__chroma__create_collection({
collection_name: "backtest_results_sma_crossover",
metadata: { strategy: "SMA Crossover", total_backtests: 0 }
});
// Store each backtest with parameters + results
backtests.forEach(backtest => {
const description = `
SMA Crossover strategy with fast=${backtest.params.fast_period},
slow=${backtest.params.slow_period}, stop_loss=${backtest.params.stop_loss}.
Market conditions: ${backtest.market_regime}, volatility=${backtest.avg_volatility}.
`;
mcp__chroma__add_documents({
collection_name: "backtest_results_sma_crossover",
documents: [description],
ids: [`backtest_${backtest.id}`],
metadatas: [{
fast_period: backtest.params.fast_period,
slow_period: backtest.params.slow_period,
stop_loss: backtest.params.stop_loss,
sharpe_ratio: backtest.sharpe_ratio,
max_drawdown: backtest.max_drawdown,
win_rate: backtest.win_rate,
total_return: backtest.total_return,
market_regime: backtest.market_regime,
symbol: backtest.symbol,
timeframe: backtest.timeframe,
start_date: backtest.start_date,
end_date: backtest.end_date
}]
});
});
// Find optimal parameters for current market conditions
const currentMarket = analyzeCurrentMarket();
const marketDescription = `
Market regime: ${currentMarket.regime}, volatility: ${currentMarket.volatility},
trend strength: ${currentMarket.trend_strength}
`;
const optimalBacktests = mcp__chroma__query_documents({
collection_name: "backtest_results_sma_crossover",
query_texts: [marketDescription],
n_results: 20,
where: {
"$and": [
{ "sharpe_ratio": { "$gte": 1.5 } },
{ "max_drawdown": { "$gte": -0.15 } }, // Drawdown no worse than 15%, assuming drawdowns are stored as negative fractions
{ "symbol": currentMarket.symbol }
]
}
});
// Extract best parameter set
const bestParams = optimalBacktests.metadatas[0][0];
Benefits: Parameter optimization, market regime matching, performance pattern discovery
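The first hit is the backtest whose description is semantically closest to the current market, which is not necessarily the one that performed best. One possible refinement, sketched below, keeps the closest few matches and then picks the highest Sharpe ratio among them. `pickBestParams`, the `topK` default, and the mock values are assumptions for illustration only.

```javascript
// Hypothetical helper: among the `topK` semantically closest backtests,
// return the parameters of the one with the highest Sharpe ratio.
// ASSUMPTION: result has the nested shape { metadatas: [[...]], distances: [[...]] }.
function pickBestParams(result, topK = 5) {
  const metadatas = result.metadatas?.[0] ?? [];
  const distances = result.distances?.[0] ?? [];
  const candidates = metadatas
    .map((metadata, i) => ({ metadata, distance: distances[i] ?? Number.MAX_VALUE }))
    .sort((a, b) => a.distance - b.distance) // closest market conditions first
    .slice(0, topK);
  if (candidates.length === 0) return null;

  const best = candidates.reduce((a, b) =>
    b.metadata.sharpe_ratio > a.metadata.sharpe_ratio ? b : a
  );
  const { fast_period, slow_period, stop_loss, sharpe_ratio } = best.metadata;
  return { fast_period, slow_period, stop_loss, sharpe_ratio };
}

// Mock query result (values are made up for illustration).
const mockBacktests = {
  distances: [[0.12, 0.18, 0.95]],
  metadatas: [[
    { fast_period: 10, slow_period: 50, stop_loss: 0.02, sharpe_ratio: 1.6 },
    { fast_period: 20, slow_period: 100, stop_loss: 0.03, sharpe_ratio: 2.1 },
    { fast_period: 5, slow_period: 20, stop_loss: 0.01, sharpe_ratio: 3.0 },
  ]],
};

console.log(pickBestParams(mockBacktests, 2));
```

With `topK` set to 2, the third backtest is ignored despite its higher Sharpe ratio because its market conditions are far from the current ones.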
Template 4: Documentation Agent - Style Guide Enforcement
Problem: Store API documentation examples, ensure consistent style
javascript
// Collection: api_docs_{project_version}
const existingDocs = parseAllApiDocs("./docs/api/");
mcp__chroma__create_collection({
collection_name: "api_docs_v2",
metadata: { version: "2.0", total_endpoints: 0 }
});
// Store documentation with style metadata
existingDocs.forEach(doc => {
mcp__chroma__add_documents({
collection_name: "api_docs_v2",
documents: [doc.fullContent],
ids: [doc.endpoint],
metadatas: [{
endpoint: doc.endpoint,
method: doc.method,
category: doc.category,
style_score: doc.styleScore, // Computed during ingestion
has_examples: doc.examples.length > 0,
has_error_codes: doc.errorCodes.length > 0,
last_updated: doc.lastModified
}]
});
});
// New endpoint documented - find similar endpoints for style consistency
const newEndpoint = "POST /api/v2/users/{id}/preferences";
const similarEndpoints = mcp__chroma__query_documents({
collection_name: "api_docs_v2",
query_texts: [`${newEndpoint} user preferences update`],
n_results: 5,
where: {
"$and": [
{ "method": "POST" },
{ "style_score": { "$gte": 0.9 } },
{ "has_examples": true }
]
}
});
// Use similar endpoint as template
const template = similarEndpoints.documents[0][0];
Benefits: Style consistency, template discovery, automated quality checks
Template 5: QA Testing Agent - Test Pattern Library
Problem: Store test cases, identify gaps, suggest new tests
javascript
// Collection: test_cases_{module}
const existingTests = parseTestFiles("./tests/");
mcp__chroma__create_collection({
collection_name: "test_cases_authentication",
metadata: { module: "authentication", total_tests: 0 }
});
// Store test cases with coverage metadata
existingTests.forEach(test => {
mcp__chroma__add_documents({
collection_name: "test_cases_authentication",
documents: [`${test.description}. Covers: ${test.coveredScenarios.join(", ")}`],
ids: [test.id],
metadatas: [{
test_type: test.type, // "unit", "integration", "e2e"
file_path: test.filePath,
line_number: test.lineNumber,
last_run: test.lastRun,
status: test.lastStatus,
execution_time_ms: test.executionTime,
assertions: test.assertionCount
}]
});
});
// New feature added - identify missing test coverage
const newFeature = "Password reset with 2FA verification";
const existingCoverage = mcp__chroma__query_documents({
collection_name: "test_cases_authentication",
query_texts: [newFeature],
n_results: 10
});
// If distance > 0.5, probably not covered
const isCovered = existingCoverage.distances[0][0] < 0.5;
if (!isCovered) {
// Suggest test cases based on similar features
const similarFeatures = mcp__chroma__query_documents({
collection_name: "test_cases_authentication",
query_texts: ["password reset", "2FA verification"],
n_results: 5
});
// Use similar tests as templates (results come back as one list per query text, so merge them)
const testTemplates = similarFeatures.documents.flat();
}
Benefits: Coverage gap detection, test template discovery, pattern-based test generation
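The coverage check above treats a feature as one unit. Breaking the feature into scenarios and checking each one separately gives a more specific gap report. The sketch below assumes each scenario has already been queried and applies the same 0.5 distance cutoff as the template; `findCoverageGaps` and the mock results are hypothetical.

```javascript
// Hypothetical helper: returns the scenarios whose nearest existing test
// is at or beyond `threshold`, or that have no matching test at all.
// ASSUMPTION: each value is a query result shaped like { distances: [[...]] }.
function findCoverageGaps(resultsByScenario, threshold = 0.5) {
  return Object.entries(resultsByScenario)
    .filter(([, result]) => {
      const nearest = result.distances?.[0]?.[0];
      return nearest === undefined || nearest >= threshold;
    })
    .map(([scenario]) => scenario);
}

// Mock per-scenario query results (values are made up for illustration).
const mockCoverage = {
  "password reset sends email": { distances: [[0.18, 0.4]] },
  "password reset requires 2FA code": { distances: [[0.63, 0.7]] },
  "expired 2FA code is rejected": { distances: [[]] },
};

console.log(findCoverageGaps(mockCoverage));
// [ 'password reset requires 2FA code', 'expired 2FA code is rejected' ]
```

Each reported scenario can then be used as a query text to look up similar existing tests as templates.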
Advanced Patterns
Pattern 1: Multi-Collection Aggregation
Use Case: Search across multiple related collections simultaneously
javascript
// Example: Search all research topics for a cross-cutting concept
const researchCollections = mcp__chroma__list_collections();
const topicCollections = researchCollections.filter(c =>
c.startsWith("research_literature_")
);
const crossTopicResults = [];
for (const collection of topicCollections) {
const results = mcp__chroma__query_documents({
collection_name: collection,
query_texts: ["transfer learning"],
n_results: 10
});
crossTopicResults.push({
topic: collection.replace("research_literature_", ""),
papers: results
});
}
// Aggregate and rank by relevance across topics
const allPapers = crossTopicResults.flatMap(r =>
r.papers.ids[0].map((id, idx) => ({
id: id,
topic: r.topic,
distance: r.papers.distances[0][idx],
metadata: r.papers.metadatas[0][idx]
}))
);
const rankedPapers = allPapers.sort((a, b) => a.distance - b.distance);
Pattern 2: Hierarchical Collections
Use Case: Parent-child relationship between collections
javascript
// Parent: codebase_architecture_decisions
// Children: codebase_architecture_decisions_{year}
// Create parent collection with aggregated data
mcp__chroma__create_collection({
collection_name: "codebase_architecture_decisions",
metadata: { type: "parent", child_collections: [] }
});
// Create child collections by year
[2022, 2023, 2024].forEach(year => {
mcp__chroma__create_collection({
collection_name: `codebase_architecture_decisions_${year}`,
metadata: { type: "child", parent: "codebase_architecture_decisions", year }
});
});
// Query strategy: Try child first (faster), fallback to parent
const queryYear = 2024;
let results = mcp__chroma__query_documents({
collection_name: `codebase_architecture_decisions_${queryYear}`,
query_texts: [query],
n_results: 10
});
if (results.ids[0].length < 5) {
// Not enough results in child, query parent
results = mcp__chroma__query_documents({
collection_name: "codebase_architecture_decisions",
query_texts: [query],
n_results: 10
});
}
Pattern 3: Temporal Decay
Use Case: Prioritize recent items while keeping historical context
javascript
// Store items with temporal metadata
mcp__chroma__add_documents({
collection_name: collectionName,
documents: documents,
ids: ids,
metadatas: metadatas.map(m => ({
...m,
timestamp: Date.now(),
age_days: 0 // Will be updated
}))
});
// Query with temporal boost
const results = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 50 // Get more results for re-ranking
});
// Re-rank with temporal decay
const now = Date.now();
const rankedResults = results.ids[0].map((id, idx) => {
const ageDays = (now - results.metadatas[0][idx].timestamp) / (1000 * 60 * 60 * 24);
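const decayFactor = Math.exp(-ageDays / 30); // Time constant of 30 days (half-life is about 21 days)
const semanticScore = 1 - results.distances[0][idx]; // Assumes a distance where 0 is identical and 1 is unrelated (e.g. cosine)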
const combinedScore = semanticScore * 0.7 + decayFactor * 0.3;
return {
id: id,
semantic_score: semanticScore,
decay_factor: decayFactor,
combined_score: combinedScore,
metadata: results.metadatas[0][idx]
};
}).sort((a, b) => b.combined_score - a.combined_score);
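With `Math.exp(-ageDays / 30)`, an item that is 30 days old keeps about 37% of its recency weight, and the weight reaches one half after roughly 21 days (30 × ln 2). If an exact half-life is wanted, the decay can be written with base 0.5 instead. The sketch below shows that variant; the function names, the 0.7 semantic weight (taken from the snippet above), and the assumption that distances fall between 0 and 1 are illustrative choices.

```javascript
// Exponential decay with a true half-life: the factor halves every `halfLifeDays`.
function decayFactor(ageDays, halfLifeDays = 30) {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Blend of semantic relevance and recency.
// ASSUMPTION: `distance` lies in [0, 1], where 0 means identical.
function combinedScore(distance, ageDays, { semanticWeight = 0.7, halfLifeDays = 30 } = {}) {
  const semanticScore = 1 - distance;
  const recencyWeight = 1 - semanticWeight;
  return semanticScore * semanticWeight + decayFactor(ageDays, halfLifeDays) * recencyWeight;
}

console.log(decayFactor(30)); // 0.5
console.log(Math.exp(-30 / 30)); // about 0.37, for comparison with the original formula

// A slightly less similar but fresh item can outrank an older, closer match.
console.log(combinedScore(0.25, 0) > combinedScore(0.2, 90)); // true
```

Exposing the half-life as a parameter makes it easier to tune per domain, for example a short half-life for market data and a long one for case law.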
Performance Optimization
Batching Strategy
javascript
// BAD: One document at a time (slow)
for (const item of items) {
mcp__chroma__add_documents({
collection_name: collectionName,
documents: [item.document],
ids: [item.id],
metadatas: [item.metadata]
});
}
// GOOD: Batch insert (one call per 100 items instead of one per item, so far fewer round trips)
const BATCH_SIZE = 100;
for (let i = 0; i < items.length; i += BATCH_SIZE) {
const batch = items.slice(i, i + BATCH_SIZE);
mcp__chroma__add_documents({
collection_name: collectionName,
documents: batch.map(item => item.document),
ids: batch.map(item => item.id),
metadatas: batch.map(item => item.metadata)
});
}
Caching Strategy
javascript
// Check collection exists before creating
const existingCollections = mcp__chroma__list_collections();
if (!existingCollections.includes(collectionName)) {
mcp__chroma__create_collection({ collection_name: collectionName });
}
// Check document exists before adding
const existing = mcp__chroma__get_documents({
collection_name: collectionName,
ids: [documentId]
});
if (!existing.ids || existing.ids.length === 0) {
// Document doesn't exist, add it
mcp__chroma__add_documents({ ... });
} else {
// Document exists, update instead
mcp__chroma__update_documents({ ... });
}
Query Optimization
javascript
// Use metadata filters to reduce search space
const results = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 20,
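where: {
// Pre-filter with metadata (faster than post-filtering semantic results).
// Multiple conditions are combined under $and, and range operators such as
// $gte compare numbers, so dates are stored as epoch milliseconds.
"$and": [
{ "timestamp": { "$gte": Date.parse("2024-01-01") } },
{ "category": { "$in": ["high_priority", "critical"] } }
]
}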
});
// Only include what you need
const minimalResults = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 10,
include: ["metadatas", "distances"] // Exclude documents if not needed
});
Success Criteria
ChromaDB integration is SUCCESSFUL when:
- ✅ Collections Created: Meaningful naming, appropriate metadata
- ✅ Data Ingested: Batched efficiently, deduplicated
- ✅ Semantic Search Works: Returns relevant results (distance < 0.4)
- ✅ Metadata Filters Applied: Correctly scopes search space
- ✅ Performance Optimized: Batching, caching, minimal queries
- ✅ Cross-Collection Queries: When appropriate for use case
- ✅ Persistent Knowledge: Data survives across sessions
- ✅ Pattern Matching: Identifies similar historical cases
- ✅ Token Limits Eliminated: Handles 1000+ items without context overflow
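The relevance criterion above (distance < 0.4) can be checked mechanically. The following is a minimal sketch, assuming query results arrive in the nested-array shape used elsewhere in this skill (`ids`, `distances`, `metadatas`, one inner array per query). `filterRelevant` and `sampleResults` are illustrative names, not part of the ChromaDB MCP tools, and the sample data is hand-written.

```javascript
// Keep only hits whose distance is below a cutoff.
// Assumed result shape: { ids: [[...]], distances: [[...]], metadatas: [[...]] }
function filterRelevant(results, maxDistance = 0.4) {
  const ids = results.ids?.[0] ?? [];
  const distances = results.distances?.[0] ?? [];
  const metadatas = results.metadatas?.[0] ?? [];
  return ids
    .map((id, idx) => ({ id, distance: distances[idx], metadata: metadatas[idx] ?? null }))
    .filter(hit => typeof hit.distance === 'number' && hit.distance < maxDistance);
}

// Hand-written example data (not real query output)
const sampleResults = {
  ids: [['doc1', 'doc2', 'doc3']],
  distances: [[0.12, 0.38, 0.61]],
  metadatas: [[{ category: 'a' }, { category: 'b' }, { category: 'c' }]]
};

const relevant = filterRelevant(sampleResults);
console.log(relevant.map(hit => hit.id)); // [ 'doc1', 'doc2' ]
```

If the filter returns nothing for queries that should match, that suggests loosening the cutoff or checking the collection name before concluding the data is missing.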
Skill Version: 1.0
Created: 2025-11-14
Purpose: Teach universal ChromaDB integration patterns for all agent types
Target Quality: 65/70
Dependencies: ChromaDB MCP (mcp__chroma__*)
Universal: Works for research, code, trading, legal, documentation, QA, and all other domains
🔴 Error Handling & Resilience (Priority 1)
Critical for Production: Prevent data loss, handle failures gracefully
Retry with Exponential Backoff
javascript
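// Promise-based delay helper; JavaScript has no built-in sleep()
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));
async function retryWithBackoff(operation, maxRetries = 3) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await operation();
} catch (error) {
// Out of attempts: surface the last error to the caller
if (attempt === maxRetries - 1) throw error;
const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, ... (doubles after each failure)
await sleep(delay);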
}
}
}
Get-or-Create Pattern
javascript
function getOrCreateCollection(collectionName, metadata = {}) {
const collections = mcp__chroma__list_collections();
if (collections.includes(collectionName)) {
return { created: false, collection_name: collectionName };
}
mcp__chroma__create_collection({
collection_name: collectionName,
embedding_function_name: "default",
metadata: metadata
});
return { created: true, collection_name: collectionName };
}
Document Validation
javascript
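function validateDocument(document, id, metadata) {
if (typeof document !== 'string' || document.length === 0) {
throw new Error('Document must be a non-empty string');
}
// Check the type first so non-string IDs fail with a clear message
if (typeof id !== 'string' || id.length === 0 || id.includes(' ')) {
throw new Error('ID must be a non-empty string without spaces');
}
// Arrays also report typeof 'object', so reject them explicitly
if (metadata && (typeof metadata !== 'object' || Array.isArray(metadata))) {
throw new Error('Metadata must be an object');
}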
}
See full patterns: Load sub-skill chromadb-error-handling
🧪 Testing Patterns (Priority 1)
Essential for Quality: Ensure ChromaDB integrations work correctly
Unit Tests (Mock ChromaDB)
javascript
// Mock ChromaDB for fast unit tests
class MockChromaDB {
constructor() {
this.collections = {};
}
create_collection({ collection_name, metadata }) {
this.collections[collection_name] = {
documents: [], ids: [], metadatas: [], metadata
};
}
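query_documents({ collection_name, query_texts, n_results }) {
const collection = this.collections[collection_name];
if (!collection) throw new Error(`Collection '${collection_name}' does not exist`);
// Stub behavior: ignores query_texts and returns the first n_results IDs with a fixed distance
return {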
ids: [collection.ids.slice(0, n_results)],
distances: [collection.ids.slice(0, n_results).map(() => 0.2)]
};
}
}
Integration Tests (Real ChromaDB)
javascript
// Assumes chromaClient and testCollection are set up before the suite runs (setup not shown)
describe('Semantic Search Integration', () => {
test('returns relevant documents', async () => {
await chromaClient.add({
collection_name: testCollection,
documents: [
'Machine learning uses neural networks',
'Python is a programming language'
],
ids: ['doc1', 'doc2']
});
const results = await chromaClient.query({
collection_name: testCollection,
query_texts: ['neural networks deep learning'],
n_results: 2
});
expect(results.ids[0]).toContain('doc1');
expect(results.distances[0][0]).toBeLessThan(0.4);
});
});
See full patterns: Load sub-skill chromadb-testing-patterns
🔐 Security & Privacy (Priority 2)
Critical for Compliance: Protect PII, sanitize data
PII Redaction
javascript
function redactPII(text) {
let redacted = text;
// Email redaction
redacted = redacted.replace(
/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
'[EMAIL_REDACTED]'
);
// Phone redaction
redacted = redacted.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]');
// SSN redaction
redacted = redacted.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');
return redacted;
}
Secret Sanitization
javascript
function redactSecrets(text) {
let redacted = text;
// GitHub tokens
redacted = redacted.replace(/ghp_[a-zA-Z0-9]{36}/g, '[GITHUB_TOKEN]');
// AWS keys
redacted = redacted.replace(/AKIA[0-9A-Z]{16}/g, '[AWS_KEY]');
// API keys
redacted = redacted.replace(
/api[_-]?key['\"]?\s*[:=]\s*['\"]?([a-zA-Z0-9_-]{20,})/gi,
'api_key: [REDACTED]'
);
return redacted;
}
Access Control
javascript
const collectionPermissions = {
'research_confidential': ['research_team', 'admin'],
'customer_pii': ['support_team', 'admin']
};
function checkAccess(collectionName, userRole) {
const allowedRoles = collectionPermissions[collectionName] || ['admin'];
if (!allowedRoles.includes(userRole)) {
throw new Error(`Access denied for role '${userRole}'`);
}
}
See full patterns: Load sub-skill chromadb-security-patterns
🗄️ Data Lifecycle Management (Priority 2)
Sustain Production: Version collections, archive old data
Schema Versioning
javascript
// Versioned collection names: {domain}_{purpose}_{version}
function createVersionedCollection(domain, purpose, version = 'v1') {
const collectionName = `${domain}_${purpose}_${version}`;
mcp__chroma__create_collection({
collection_name: collectionName,
metadata: {
version: version,
created_at: new Date().toISOString(),
schema_version: '1.0',
retention_days: 730, // 2 years
lifecycle_stage: 'active'
}
});
return collectionName;
}
Schema Migration
javascript
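// transformMetadata: caller-supplied function that maps each old metadata object to the new schema
async function migrateCollectionSchema(oldCollection, newVersion, transformMetadata = (m) => m) {
// Replace an existing _vN suffix instead of appending a second one
const newCollection = `${oldCollection.replace(/_v\d+$/, '')}_v${newVersion}`;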
// Create new collection
await mcp__chroma__create_collection({
collection_name: newCollection,
metadata: { migrated_from: oldCollection, version: newVersion }
});
// Copy all documents with transformed metadata
const allDocs = await mcp__chroma__get_documents({
collection_name: oldCollection,
limit: 100000
});
const transformedMetadatas = allDocs.metadatas.map(transformMetadata);
await mcp__chroma__add_documents({
collection_name: newCollection,
documents: allDocs.documents,
ids: allDocs.ids,
metadatas: transformedMetadatas
});
// Mark old as deprecated
await mcp__chroma__modify_collection({
collection_name: oldCollection,
new_metadata: { lifecycle_stage: 'deprecated', replacement: newCollection }
});
}
Retention Enforcement
javascript
async function enforceRetentionPolicies() {
const collections = await mcp__chroma__list_collections();
for (const collectionName of collections) {
const info = await mcp__chroma__get_collection_info({ collection_name: collectionName });
const metadata = info.metadata || {}; // Collections may have no metadata
const retentionDays = metadata.retention_days || 730;
const ageDays = calculateAgeDays(metadata.created_at);
if (ageDays > retentionDays) {
await archiveCollection(collectionName); // Backup first
await mcp__chroma__delete_collection({ collection_name: collectionName });
}
}
}
See full patterns: Load sub-skill chromadb-lifecycle-management
🐛 Debugging & Troubleshooting (Priority 1)
Common Issues
Issue: No results returned
- Cause: Distance threshold too strict, wrong collection
- Fix: Increase threshold (0.3 → 0.5), verify collection name
- Debug: Check results.distances[0] values
Issue: Poor semantic matches
- Cause: Document chunking too large/small
- Fix: Optimal chunk size 200-500 words
- Debug: Review document length, split long documents
Issue: Slow queries
- Cause: Large collection without metadata filters
- Fix: Add metadata pre-filters (where clause)
- Debug: Check collection size, add filters
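For the chunking fix above, here is one possible sketch of a word-based splitter that keeps each chunk within the 200-500 word range. `chunkByWords` and its parameters are hypothetical names. The overlap between neighbouring chunks is an added assumption (the guidance above only specifies chunk size), and whitespace-delimited words are a rough stand-in for whatever unit the embedding model actually counts.

```javascript
// Split text into chunks of at most maxWords words.
// overlapWords repeats the tail of each chunk at the start of the next one
// (an assumption for illustration; set it to 0 for disjoint chunks).
function chunkByWords(text, maxWords = 400, overlapWords = 50) {
  if (overlapWords < 0 || overlapWords >= maxWords) {
    throw new Error('overlapWords must be >= 0 and smaller than maxWords');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = maxWords - overlapWords;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // This chunk reached the end of the text
  }
  return chunks;
}
```

Each chunk would then be stored as its own document, with metadata pointing back to the source file and chunk index so results can be traced to their origin.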
Distance Threshold Guide
| Distance | Similarity | Use Case |
|---|---|---|
| < 0.2 | Almost exact | Duplicate detection |
| 0.2-0.3 | Very similar | High precision search |
| 0.3-0.5 | Moderately similar | Balanced search |
| 0.5-0.7 | Weakly similar | Broad exploration |
| > 0.7 | Different topics | Not relevant |
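The table can be encoded as a small lookup so that code and documentation stay in agreement. This is a sketch only: `classifyDistance` is a hypothetical helper, and because the table does not say which band the shared boundary values 0.2, 0.3 and 0.5 belong to, the code assigns each of them to the higher band (an assumption). The table also does not name the distance metric, so the cutoffs are best treated as starting points to calibrate against real queries.

```javascript
// Map a distance value to the similarity band from the table above.
function classifyDistance(distance) {
  if (typeof distance !== 'number' || Number.isNaN(distance) || distance < 0) {
    throw new RangeError('distance must be a non-negative number');
  }
  if (distance < 0.2) return { similarity: 'Almost exact', useCase: 'Duplicate detection' };
  if (distance < 0.3) return { similarity: 'Very similar', useCase: 'High precision search' };
  if (distance < 0.5) return { similarity: 'Moderately similar', useCase: 'Balanced search' };
  if (distance <= 0.7) return { similarity: 'Weakly similar', useCase: 'Broad exploration' };
  return { similarity: 'Different topics', useCase: 'Not relevant' };
}

console.log(classifyDistance(0.25).similarity); // Very similar
```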
Antipatterns to Avoid
❌ Storing entire files as single document
- Loses granularity, poor search relevance
- ✅ Fix: Chunk into 200-500 word sections
❌ No metadata filters on large collections
- Slow queries, high latency
- ✅ Fix: Always filter by date, category, type
❌ Not deduplicating documents
- Wasted storage, duplicate results
- ✅ Fix: Check existence before adding
❌ Ignoring connection failures
- Data loss, silent failures
- ✅ Fix: Implement retry logic, fallback
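One way to address the deduplication antipattern is to derive IDs from document content, so identical text always maps to the same ID and can be skipped before any insert call. The sketch below uses Node's built-in `crypto` module. `contentId` and `dedupeBatch` are hypothetical helpers, and the whitespace normalization and 16-character ID length are arbitrary illustrative choices.

```javascript
const { createHash } = require('crypto');

// Derive a stable ID from document text: identical content yields an identical ID.
function contentId(text, prefix = 'doc') {
  const normalized = text.trim().replace(/\s+/g, ' ');
  const digest = createHash('sha256').update(normalized, 'utf8').digest('hex');
  return `${prefix}_${digest.slice(0, 16)}`;
}

// Drop items whose content-derived ID was already seen,
// either earlier in this batch or in knownIds (IDs already stored).
function dedupeBatch(items, knownIds = new Set()) {
  const seen = new Set(knownIds);
  const unique = [];
  for (const item of items) {
    const id = contentId(item.document);
    if (seen.has(id)) continue;
    seen.add(id);
    unique.push({ ...item, id });
  }
  return unique;
}
```

The deduplicated batch can then go through the batched insert shown earlier. This catches only exact duplicates after whitespace normalization. Near-duplicates would need a similarity check instead.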
📚 Sub-Skill Reference
Load targeted sub-skills for deep dives:
- chromadb-error-handling: Retry patterns, validation, circuit breakers (~150 lines)
- chromadb-testing-patterns: Unit/integration tests, mocking, fixtures (~120 lines)
- chromadb-security-patterns: PII redaction, access control, GDPR compliance (~90 lines)
- chromadb-lifecycle-management: Versioning, migration, archival, retention (~100 lines)
Usage:
Skill({ skill: "chromadb-error-handling" })
Updated Success Criteria
ChromaDB integration is PRODUCTION-READY when:
Core Functionality (Original):
- ✅ Collections created with meaningful naming
- ✅ Data ingested efficiently (batching)
- ✅ Semantic search returns relevant results
- ✅ Metadata filters applied correctly
- ✅ Performance optimized
Production Readiness (New):
- ✅ Error Handling: Retry logic, validation, graceful degradation
- ✅ Testing: Unit tests (mocked), integration tests (real ChromaDB)
- ✅ Security: PII redacted, secrets sanitized, access control
- ✅ Lifecycle: Versioning strategy, retention policies, archival
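"Graceful degradation" in the checklist above can be as simple as returning a clearly marked fallback value when the primary call fails. The following sketch shows one possible shape. `withFallback` is a hypothetical helper, not part of the ChromaDB MCP tools, and the `degraded` flag is an illustrative convention so callers can tell a real answer from a fallback.

```javascript
// Run a primary async operation; if it throws, report the error
// and return the fallback's value, flagged as degraded.
async function withFallback(primary, fallback, onError = console.warn) {
  try {
    return { value: await primary(), degraded: false };
  } catch (error) {
    onError(`Primary operation failed: ${error.message}`);
    return { value: await fallback(error), degraded: true };
  }
}

// Hypothetical usage: fall back to an empty result set when a query fails.
// queryChroma is a placeholder for whatever performs the real query.
// const { value, degraded } = await withFallback(
//   () => queryChroma(query),
//   () => ({ ids: [[]], distances: [[]], metadatas: [[]] })
// );
```

Wrapping the primary operation in a retry helper first, and falling back only once retries are exhausted, keeps transient failures from degrading results unnecessarily.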
Quality Score: 65/70 → 85/100 (with all enhancements)
Skill Version: 2.0
Updated: 2025-11-14
Enhancements: Error handling, testing, security, lifecycle management
Quality Score: 85/100 (Production-Ready)
Dependencies: ChromaDB MCP (mcp__chroma__*)
Sub-Skills: 4 modular sub-skills for targeted loading