weaviate

Weaviate Database Operations

This skill provides comprehensive access to Weaviate vector databases including search operations, natural language queries, schema inspection, data exploration, filtered fetching, collection creation, and data imports.

Weaviate Cloud Instance

If the user does not have an instance yet, direct them to the Weaviate Cloud console to register and create a free sandbox.

Environment Variables

Required:
  • WEAVIATE_URL
    - Your Weaviate Cloud cluster URL
  • WEAVIATE_API_KEY
    - Your Weaviate API key
External Provider Keys (auto-detected): Set only the keys your collections use. Refer to Environment Requirements for more information.
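
As a quick pre-flight check, the two required variables can be verified before any script runs. This is a minimal sketch using only the standard library; the function name `missing_env_vars` is hypothetical and is not part of the skill's scripts.

```python
import os

# The two variables this document lists as required.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Weaviate connection settings are present.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise without touching the real environment.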

Script Index

Search & Query

  • Query Agent - Ask Mode: Use when the user wants a direct answer to a question based on collection data. The Query Agent synthesizes information from one or more collections and returns a structured response with source citations (collection name and object ID).
  • Query Agent - Search Mode: Use when the user wants to explore or browse raw objects across one or more collections. Unlike ask mode, this returns the actual data objects rather than a synthesized answer.
  • Hybrid Search: Default choice for most searches. Provides a good balance of semantic understanding and exact keyword matching. Use this when you are unsure which search type to pick.
  • Semantic Search: Use for finding conceptually similar content regardless of exact wording. Best when the intent matters more than specific keywords.
  • Keyword Search: Use for finding exact terms, IDs, SKUs, or specific text patterns. Best when precise keyword matching is needed rather than semantic similarity.
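
To illustrate what "a balance of semantic understanding and exact keyword matching" can mean, the sketch below blends a keyword score map and a vector score map with a single weight. Weaviate's hybrid search exposes a comparable `alpha` weight, but this is a simplified illustration under stated assumptions: the function names and sample scores are hypothetical, and it does not reproduce the exact algorithm used by Weaviate or by `hybrid_search.py`.

```python
def normalize(scores):
    """Min-max scale a {object_id: score} map into the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score maps: alpha=0 is keyword-only, alpha=1 is vector-only."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    ids = set(kw) | set(vec)
    # An object missing from one result set contributes 0 for that side.
    fused = {i: (1 - alpha) * kw.get(i, 0.0) + alpha * vec.get(i, 0.0) for i in ids}
    # Highest fused score first; ties are broken by object ID for stable output.
    return sorted(fused, key=lambda i: (-fused[i], i))
```

With `alpha` near 0 the ranking follows the keyword scores, and with `alpha` near 1 it follows the vector scores, which is why a middle value is a reasonable default when the right search type is unclear.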

Collection Management

  • List Collections: Use to discover what collections exist in the Weaviate instance. This should typically be the first step before performing any search or data operation.
  • Get Collection Details: Use to understand a collection's schema — its properties, data types, vectorizer configuration, replication factor, and multi-tenancy status. Helpful before running searches or imports.
  • Explore Collection: Use to analyze data distribution, top values, and inspect actual content in a collection. Helpful for understanding what data looks like before querying.
  • Create Collection: Use to create new collections with custom schemas before importing data. Do not specify a vectorizer unless the user explicitly requests one (the default, text2vec_weaviate, is used).
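
The `--properties` argument shown under Recommendations is a JSON array of objects with `name` and `data_type` keys. The sketch below builds that string from a plain mapping and shell-quotes the full command. The helper names are hypothetical, and no vectorizer option is added, in line with the guidance above.

```python
import json
import shlex


def properties_argument(fields):
    """Serialize {property_name: data_type} into a JSON array of property objects."""
    props = [{"name": name, "data_type": dtype} for name, dtype in fields.items()]
    return json.dumps(props)


def create_collection_command(collection, fields):
    """Build a shell-quoted command line; no vectorizer option is added."""
    parts = [
        "uv", "run", "scripts/create_collection.py", collection,
        "--properties", properties_argument(fields),
    ]
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    print(create_collection_command("Article", {"title": "text", "body": "text"}))
```

Generating the JSON with `json.dumps` avoids hand-written quoting mistakes, and `shlex.quote` keeps the array intact as a single shell argument.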

Data Operations

  • Fetch and Filter: Use to retrieve specific objects by ID or strictly filtered subsets of data. Best for precise data retrieval rather than search.
  • Import Data: Use to bulk import data into an existing collection from CSV, JSON, or JSONL files.
  • Create Example Data: Use to create example data so that other skills can be used right away when no data is available or the user asks for toy data.
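
Before a bulk import it can help to preview how a file will be read. The sketch below loads the three formats named above into a list of dictionaries using only the standard library. It is an assumption about a reasonable reading strategy, not the logic inside `import.py`, and `load_records` is a hypothetical name.

```python
import csv
import json
from pathlib import Path

SUPPORTED_SUFFIXES = {".csv", ".json", ".jsonl"}


def load_records(path):
    """Read a CSV, JSON, or JSONL file into a list of dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix}")
    with path.open(newline="", encoding="utf-8") as handle:
        if suffix == ".csv":
            # Every CSV value is read as a string.
            return [dict(row) for row in csv.DictReader(handle)]
        if suffix == ".jsonl":
            # One JSON object per non-empty line.
            return [json.loads(line) for line in handle if line.strip()]
        data = json.load(handle)
        # A single top-level object is wrapped so the result is always a list.
        return data if isinstance(data, list) else [data]
```

Note that CSV values arrive as strings, so numeric or boolean properties may need converting to match the collection schema.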

Recommendations

  1. Start by listing collections if you don't know what's available:
    uv run scripts/list_collections.py
  2. If no data is available, ask the user whether they want example data created, and create it only if they agree. Otherwise continue:
    uv run scripts/example_data.py
  3. Get collection details to understand the schema:
    uv run scripts/get_collection.py --name "COLLECTION_NAME"
  4. Explore collection data to see values and statistics:
    uv run scripts/explore_collection.py "COLLECTION_NAME"
  5. Import data to populate a new collection (if needed):
    uv run scripts/import.py "data.csv" --collection "CollectionName"
  6. Do not specify a vectorizer when creating collections unless requested:
    uv run scripts/create_collection.py Article \
      --properties '[{"name": "title", "data_type": "text"}, {"name": "body", "data_type": "text"}]'
  7. Choose the right search type:
    • Get AI-powered answers with source citations across multiple collections → ask.py
    • Get raw objects from multiple collections → query_search.py
    • General search → hybrid_search.py (default)
    • Conceptual similarity → semantic_search.py
    • Exact terms/IDs → keyword_search.py
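
The search-type choices listed above amount to a small lookup with hybrid search as the fallback. A minimal sketch follows. The intent labels (`answer`, `browse`, and so on) are hypothetical names introduced for illustration, and only the script names come from this document.

```python
# Hypothetical intent labels mapped to the scripts named in this document.
SEARCH_SCRIPTS = {
    "answer": "ask.py",                  # AI-powered answers with source citations
    "browse": "query_search.py",         # raw objects from multiple collections
    "general": "hybrid_search.py",       # general search (default)
    "conceptual": "semantic_search.py",  # conceptual similarity
    "exact": "keyword_search.py",        # exact terms or IDs
}


def choose_search_script(intent=None):
    """Return the script for an intent, falling back to hybrid search."""
    return SEARCH_SCRIPTS.get(intent, "hybrid_search.py")
```

Falling back to `hybrid_search.py` for an unknown or missing intent mirrors the advice to use hybrid search when unsure.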

Output Formats

All scripts support:
  • Markdown tables (default and recommended)
  • JSON (--json flag)
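
To show how the two output formats relate, the sketch below renders the same rows either as a Markdown table or as JSON. It is an illustrative assumption about the shape of the output, not the scripts' actual rendering code, and `render` is a hypothetical helper.

```python
import json


def render(rows, as_json=False):
    """Render a list of dicts as a Markdown table, or as JSON when as_json is True."""
    if as_json:
        return json.dumps(rows, indent=2)
    if not rows:
        return "No results."
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # Escape pipes so cell text cannot break the table layout.
        cells = [str(row.get(h, "")).replace("|", "\\|") for h in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

The Markdown form is easier to read in a chat or terminal, while the JSON form suits further processing by another program.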

Error Handling

Common errors:
  • WEAVIATE_URL not set → Set the environment variable
  • Collection not found → Use list_collections.py to see available collections
  • Authentication error → Check API keys for both Weaviate and vectorizer providers