opensearch-best-practices


OpenSearch Best Practices


Core Principles


  • Design indices and mappings based on query patterns
  • Optimize for search performance with proper analysis and indexing
  • Use appropriate shard sizing and cluster configuration
  • Implement proper security with the OpenSearch Security plugin
  • Monitor cluster health with Performance Analyzer and optimize queries
  • Leverage OpenSearch-specific features: k-NN vector search, neural search, search pipelines, ISM

Index Design


Mapping Best Practices


  • Define explicit mappings instead of relying on dynamic mapping
  • Use appropriate data types for each field
  • Disable indexing for fields you do not search on
  • Use keyword type for exact matches, text for full-text search
json
{
  "mappings": {
    "properties": {
      "product_id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date"
      },
      "metadata": {
        "type": "object",
        "enabled": false
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

Field Types


  • keyword: Exact values, filtering, aggregations, sorting
  • text: Full-text search with analysis
  • date: Date/time values with format specification
  • numeric types: long, integer, short, byte, double, float, scaled_float
  • boolean: True/false values
  • geo_point: Latitude/longitude pairs
  • nested: Arrays of objects that need independent querying
  • knn_vector: Vector embeddings for k-NN similarity search (OpenSearch-specific)
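
The nested type deserves a concrete example: object arrays are flattened by default, so a query could match sku from one item and quantity from a different item in the same document. A nested mapping plus a nested query scopes all conditions to a single object. Index and field names here are illustrative:

```json
PUT /orders
{
  "mappings": {
    "properties": {
      "items": {
        "type": "nested",
        "properties": {
          "sku": { "type": "keyword" },
          "quantity": { "type": "integer" }
        }
      }
    }
  }
}

GET /orders/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "term": { "items.sku": "SKU-123" } },
            { "range": { "items.quantity": { "gte": 2 } } }
          ]
        }
      }
    }
  }
}
```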

Index Settings


json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "synonym_filter"]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": ["laptop, notebook", "phone, mobile, smartphone"]
        }
      }
    }
  }
}

Shard Sizing


Guidelines


  • Target 10-50GB per shard (sweet spot is 10-30GB)
  • Avoid oversharding (too many small shards)
  • Consider time-based indices for time-series data
  • Use segment replication (replication.type: SEGMENT) for improved indexing throughput on replicas
json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
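Segment replication is configured per index at creation time. A sketch, assuming an OpenSearch version where the index.replication.type setting is generally available (it became GA in the 2.x line):

```json
PUT /logs-000001
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "replication.type": "SEGMENT"
  }
}
```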

Index State Management (ISM)


OpenSearch uses ISM (Index State Management) instead of Elasticsearch's ILM. ISM uses a flexible state machine model where any state can transition to any other state.
Create an ISM policy via:
PUT _plugins/_ism/policies/<policy_id>
json
{
  "policy": {
    "description": "Hot-warm-delete lifecycle",
    "default_state": "hot",
    "ism_template": [
      {
        "index_patterns": ["logs-*"],
        "priority": 100
      }
    ],
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "50gb",
              "min_index_age": "7d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": {
              "number_of_replicas": 1
            }
          },
          {
            "force_merge": {
              "max_num_segments": 1
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "90d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ]
  }
}

ISM Key Differences from Elasticsearch ILM


  • ISM uses arbitrary states with explicit transitions (a state machine), not linear phases
  • A policy is attached via ism_template inside the policy itself (not via index template settings)
  • API endpoint: _plugins/_ism/policies/<id> (not _ilm/policy/<id>)
  • Transitions support min_index_age, min_doc_count, min_size, and cron conditions
  • Available actions: rollover, force_merge, shrink, delete, read_only, read_write, replica_count, index_priority, close, open, snapshot, allocation, notification

ISM Management APIs


PUT    _plugins/_ism/policies/<policy_id>       # Create/update policy
GET    _plugins/_ism/policies/<policy_id>       # Get policy
DELETE _plugins/_ism/policies/<policy_id>       # Delete policy
POST   _plugins/_ism/add/<index>                # Attach policy to existing index
POST   _plugins/_ism/remove/<index>             # Detach policy
GET    _plugins/_ism/explain/<index>            # Get ISM status for an index
POST   _plugins/_ism/retry/<index>              # Retry failed action
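
To attach a policy to an index that already exists (rather than relying on ism_template matching at index creation), the add endpoint takes the policy ID in the request body:

```json
POST _plugins/_ism/add/logs-000001
{
  "policy_id": "<policy_id>"
}
```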

Query Optimization


Query Types


Match Query (Full-text search)


json
{
  "query": {
    "match": {
      "description": {
        "query": "wireless bluetooth headphones",
        "operator": "and",
        "fuzziness": "AUTO"
      }
    }
  }
}

Term Query (Exact match)


json
{
  "query": {
    "term": {
      "status": "active"
    }
  }
}

Bool Query (Combining queries)


json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "gte": 500, "lte": 2000 } } }
      ],
      "should": [
        { "term": { "brand": "apple" } }
      ],
      "must_not": [
        { "term": { "status": "discontinued" } }
      ]
    }
  }
}

Query Best Practices


  • Use filter context for non-scoring queries (filter results are cacheable)
  • Use must only when scoring is needed
  • Avoid wildcards at the beginning of terms
  • Use keyword fields for exact matches
  • Limit result size with the size parameter
json
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "search terms",
          "fields": ["name^3", "description", "tags^2"],
          "type": "best_fields"
        }
      },
      "filter": [
        { "term": { "active": true } },
        { "range": { "created_at": { "gte": "now-30d" } } }
      ]
    }
  },
  "size": 20,
  "from": 0,
  "_source": ["name", "price", "category"]
}

Vector Search (k-NN)


OpenSearch has a built-in k-NN plugin supporting two engines: faiss and lucene.

k-NN Index Mapping


json
PUT /my-vector-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 256,
            "m": 16
          }
        }
      },
      "title": { "type": "text" },
      "category": { "type": "keyword" }
    }
  }
}

Engine Selection


  • faiss: Best for large-scale production. Supports HNSW and IVF. Additional quantization options (SQfp16, PQ). Recommended default.
  • lucene: Native Lucene HNSW. Supports efficient pre-filtering. Built-in scalar quantization. Lower memory overhead. Good when filtering is critical or for smaller datasets.

Space Types


  • l2 (Euclidean distance)
  • cosinesimil (cosine similarity)
  • innerproduct (dot product)
  • l1 (Manhattan distance)
  • linf (L-infinity distance)

k-NN Query


json
GET /my-vector-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10
      }
    }
  }
}

k-NN with Filtering


json
GET /my-vector-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10,
        "filter": {
          "term": { "category": "electronics" }
        }
      }
    }
  }
}
With the lucene engine, the filter is applied as a pre-filter, which is efficient. With faiss, filtering is applied post-search by default, which can return fewer than k results; faiss supports efficient filtering starting in OpenSearch 2.9+.

k-NN Best Practices


  • Always set "knn": true in index settings
  • Use faiss for large-scale production, lucene when pre-filtering is critical
  • Higher ef_search (32-128 practical range) = better accuracy, slower queries; it can be adjusted dynamically per query
  • Higher ef_construction (128-256+) and m (16-128) = better recall but slower indexing and more memory
  • Monitor knn.memory.circuit_breaker.limit -- k-NN graphs load into native off-heap memory, and HNSW is dramatically faster when the entire graph resides in RAM
  • Warm up indices after creation: GET /_plugins/_knn/warmup/{index}
  • Normalize vectors once during ingestion (not at query time) and use innerproduct for better performance
  • Disable _source storage and doc_values on vector fields when unnecessary to reduce index size
  • The final 1-2% of recall improvement typically costs disproportionately more than the first 95% -- define realistic recall targets
  • Monitor end-to-end latency including embedding generation, metadata retrieval, and reranking -- not just the vector search component
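
Excluding the vector field from _source (as suggested above) is done in the mapping; k-NN search still works because it reads vectors from the index structures rather than from _source. A sketch:

```json
PUT /my-vector-index
{
  "settings": { "index": { "knn": true } },
  "mappings": {
    "_source": {
      "excludes": ["my_vector"]
    },
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": { "name": "hnsw", "engine": "faiss", "space_type": "l2" }
      }
    }
  }
}
```

Note that excluded fields cannot be recovered by reindexing from _source, so keep the raw vectors available elsewhere if you may need to rebuild the index.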

HNSW Hyperparameters


  • M (maximum edges per node): Range 16-128. Lower values (16) for memory-constrained environments, higher values (128) for maximum recall. Memory consumption increases proportionally; the k-NN plugin's sizing guidance estimates roughly 1.1 × (4 × dimension + 8 × M) bytes per vector for HNSW.
  • ef_construction (index-time): Range 128-256+. Controls graph quality during insertion. Higher values produce better graphs but slower indexing.
  • ef_search (query-time): Range 32-128. Can be tuned per query for balancing recall vs. response time.

Vector Shard Sizing


  • Target 10-30 million vectors per shard
  • Shard count should equal or slightly exceed node count for maximum parallelism
  • Over-sharding creates excessive coordination overhead that harms tail latency
  • Under-sharding leaves performance on the table
  • Adding replicas provides load balancing and smooths performance variations

Quantization Strategies


Scalar Quantization (SQ):
  • Lucene engine: built-in support
  • Faiss engine: SQfp16 converts 32-bit floats to 16-bit, approximately 50% memory reduction
Binary Quantization (BQ):
  • 1-bit compression provides 32x compression -- a 768-dimensional float32 vector shrinks from 3072 bytes to under 100 bytes
  • No separate training step required
  • Asymmetric distance computation (ADC) improves recall
Product Quantization (PQ):
  • Most aggressive compression: up to 64x
  • Faiss engine only; requires a training step for IVF-based indexes
  • Configuration: code_size=8 with m tuned for the desired balance
  • More noticeable recall impact than SQ or BQ
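
As a sketch of faiss scalar quantization (SQfp16), the encoder is configured inside the HNSW method parameters; verify the exact encoder parameter names against your OpenSearch version's k-NN documentation:

```json
PUT /my-sq-index
{
  "settings": { "index": { "knn": true } },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": { "type": "fp16" }
            },
            "ef_construction": 256,
            "m": 16
          }
        }
      }
    }
  }
}
```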

Disk-Based Vector Search


For cost-effective large-scale deployments, use on_disk mode with binary quantization:
json
PUT /my-vector-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "mode": "on_disk"
      }
    }
  }
}
  • Approximately 97% memory reduction (e.g., 100M 768-dim vectors: from 300GB+ RAM to under 10GB)
  • Two-phase approach: quantized index identifies candidates, full-precision vectors lazily loaded from disk for reranking
  • P90 latency in the 100-200ms range (acceptable for many use cases)
  • Cost reduction of roughly one-third compared to memory-optimized deployments
  • Currently supports only float data type

Concurrent Segment Search


Enable at cluster level for significant vector search latency improvements:
json
PUT _cluster/settings
{
  "persistent": {
    "search.concurrent_segment_search.enabled": true
  }
}
  • Benchmarked at 60%+ improvement in p90 service time, up to 75% reduction in p90 latency
  • Best applied after force-merging segments
  • Benefits diminish when compute cores are already saturated or search space per shard is small

Scale-Specific Recommendations


1-50M vectors:
  • Start with M=32, ef_construction=128
  • Use batch inserts
  • Ensure full RAM residency for the graph
50-500M vectors:
  • Implement systematic sharding (10-30M vectors per shard)
  • Enable scalar quantization
  • Two-phase retrieval: compressed vectors for search, full precision for reranking
  • Periodic index rebuilds for graph quality maintenance
500M+ vectors:
  • Hierarchical retrieval (IVF-style centroid routing)
  • Multi-tier hot/cold data architecture
  • Aggressive quantization (IVF-PQ) for cold data
  • Optimize query routing to avoid broadcast across all shards
  • Return only IDs/scores in initial retrieval phases

Neural Search and Hybrid Search


Neural Search with ML Commons


Set up an ingest pipeline to automatically generate embeddings:
json
PUT /_ingest/pipeline/neural-search-pipeline
{
  "description": "Pipeline for neural search",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "title": "title_embedding"
        }
      }
    }
  ]
}
Query with automatic text-to-vector conversion:
json
GET /my-index/_search
{
  "query": {
    "neural": {
      "title_embedding": {
        "query_text": "comfortable running shoes",
        "model_id": "<model_id>",
        "k": 10
      }
    }
  }
}
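
To wire the ingest pipeline to an index automatically, set it as the index's default pipeline and map the output field as a knn_vector whose dimension matches the model's embedding size (768 here is an assumption; index and field names follow the examples above):

```json
PUT /my-index
{
  "settings": {
    "index": {
      "knn": true,
      "default_pipeline": "neural-search-pipeline"
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "title_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": { "name": "hnsw", "engine": "lucene", "space_type": "l2" }
      }
    }
  }
}
```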

Hybrid Search (BM25 + k-NN) with Search Pipelines


Create a normalization pipeline:
json
PUT /_search/pipeline/hybrid-pipeline
{
  "description": "Hybrid search pipeline",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [0.3, 0.7]
          }
        }
      }
    }
  ]
}
Execute a hybrid query combining lexical and vector search. The pipeline's weights apply to the sub-queries in order (here 0.3 to the match clause, 0.7 to the knn clause):
json
GET /my-index/_search?search_pipeline=hybrid-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "title": "wireless headphones"
          }
        },
        {
          "knn": {
            "my_vector": {
              "vector": [0.1, 0.2, 0.3],
              "k": 10
            }
          }
        }
      ]
    }
  }
}

ML Commons Best Practices


  • Use remote model connectors (Amazon Bedrock, SageMaker, OpenAI, Cohere) for large models
  • Set plugins.ml_commons.only_run_on_ml_node: true in production to isolate ML workloads
  • Use model_group for access control on ML models
  • Monitor model memory usage -- local models consume JVM/native memory

Registering a Remote Model Connector


json
POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Connector",
  "description": "Connector for Titan Embeddings",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
      "headers": { "content-type": "application/json" },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\" }"
    }
  ]
}
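
A connector alone cannot serve predictions; register a remote model that references it, then deploy the model. Names here are illustrative; the register call returns a task from which you fetch the resulting model_id before deploying:

```json
POST /_plugins/_ml/models/_register
{
  "name": "titan-embed-remote",
  "function_name": "remote",
  "description": "Remote embedding model backed by the Bedrock connector",
  "connector_id": "<connector_id>"
}

POST /_plugins/_ml/models/<model_id>/_deploy
```

Once deployed, the model_id is what you reference in text_embedding ingest processors and neural queries.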

Aggregations


Common Aggregation Patterns


json
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500 }
        ]
      }
    },
    "date_histogram": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      }
    }
  }
}

Aggregation Best Practices


  • Use size: 0 when you only need aggregations
  • Set an appropriate shard_size for terms aggregations
  • Use composite aggregations for pagination
  • Consider using filter aggregations to narrow scope
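
Composite aggregations page through all buckets deterministically: omit after on the first request, then pass each response's after_key as after in the next. Field names follow the earlier product examples:

```json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories_page": {
      "composite": {
        "size": 100,
        "sources": [
          { "category": { "terms": { "field": "category" } } }
        ],
        "after": { "category": "electronics" }
      }
    }
  }
}
```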

Indexing Best Practices


Bulk Indexing


json
POST _bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Product 1", "price": 99.99 }
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "Product 2", "price": 149.99 }

Bulk API Guidelines


  • Use bulk API for batch operations
  • Optimal bulk size: 5-15MB per request
  • Monitor for rejected requests (thread pool queue full)
  • Disable refresh during bulk indexing for better performance
json
PUT /products/_settings
{
  "refresh_interval": "-1"
}
After bulk indexing:
json
PUT /products/_settings
{
  "refresh_interval": "1s"
}

POST /products/_refresh

Document Updates


json
POST /products/_update/1
{
  "doc": {
    "price": 89.99,
    "updated_at": "2024-01-15T10:30:00Z"
  }
}
Update by query:
json
POST /products/_update_by_query
{
  "query": {
    "term": { "category": "electronics" }
  },
  "script": {
    "source": "ctx._source.on_sale = true"
  }
}

Analysis and Tokenization


Custom Analyzers


json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "english_stop",
            "english_stemmer"
          ]
        },
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edge_ngram_filter"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  }
}
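One caveat with the edge-n-gram autocomplete analyzer above: apply it at index time only. If the same analyzer also runs at query time, the query text is n-grammed too and short fragments over-match. Set a plain search_analyzer on the field (index and field names are illustrative):

```json
PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```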

Test Analyzer

测试分析器

json
POST /products/_analyze
{
  "analyzer": "product_analyzer",
  "text": "Wireless Bluetooth Headphones"
}
json
POST /products/_analyze
{
  "analyzer": "product_analyzer",
  "text": "Wireless Bluetooth Headphones"
}

Search Features

搜索功能

Autocomplete/Suggestions

自动补全/建议

json
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
Query suggestions:
json
{
  "suggest": {
    "product-suggest": {
      "prefix": "wire",
      "completion": {
        "field": "name.suggest",
        "size": 5,
        "fuzzy": {
          "fuzziness": "AUTO"
        }
      }
    }
  }
}
json
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
查询建议:
json
{
  "suggest": {
    "product-suggest": {
      "prefix": "wire",
      "completion": {
        "field": "name.suggest",
        "size": 5,
        "fuzzy": {
          "fuzziness": "AUTO"
        }
      }
    }
  }
}

Highlighting

高亮显示

json
{
  "query": {
    "match": { "description": "wireless" }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fragment_size": 150
      }
    }
  }
}
json
{
  "query": {
    "match": { "description": "wireless" }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fragment_size": 150
      }
    }
  }
}

SQL and PPL Queries

SQL与PPL查询

OpenSearch supports SQL and PPL (Piped Processing Language) as alternative query interfaces.
OpenSearch支持SQL和PPL(管道处理语言)作为替代查询接口。

SQL

SQL

json
POST /_plugins/_sql
{
  "query": "SELECT category, COUNT(*) as cnt, AVG(price) as avg_price FROM products GROUP BY category HAVING cnt > 10 ORDER BY avg_price DESC"
}
json
POST /_plugins/_sql
{
  "query": "SELECT category, COUNT(*) as cnt, AVG(price) as avg_price FROM products GROUP BY category HAVING cnt > 10 ORDER BY avg_price DESC"
}

PPL (Piped Processing Language)

PPL(管道处理语言)

PPL is unique to OpenSearch, inspired by Splunk's SPL. It uses pipe syntax for data exploration:
json
POST /_plugins/_ppl
{
  "query": "source=server-logs | where response_code >= 500 | stats count() as error_count by host | where error_count > 100 | sort - error_count"
}
PPL是OpenSearch专属,灵感来自Splunk的SPL。它使用管道语法进行数据探索:
json
POST /_plugins/_ppl
{
  "query": "source=server-logs | where response_code >= 500 | stats count() as error_count by host | where error_count > 100 | sort - error_count"
}

SQL/PPL Best Practices

SQL/PPL最佳实践

  • Use PPL for ad-hoc log exploration -- intuitive for operations teams
  • SQL queries are translated to DSL internally; complex joins or subqueries may not be supported
  • Use "format": "jdbc" or "format": "csv" for different output formats
  • Always add LIMIT to avoid scanning large indices
  • 临时日志探索使用PPL——对运维团队更直观
  • SQL查询内部转换为DSL;复杂连接或子查询可能不支持
  • 使用 "format": "jdbc" 或 "format": "csv" 获取不同输出格式
  • 始终添加 LIMIT 避免扫描大型索引
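Combining the last two points, a hedged example that requests CSV output via the format URL parameter and bounds the scan with an explicit LIMIT:
json
POST /_plugins/_sql?format=csv
{
  "query": "SELECT name, price FROM products ORDER BY price DESC LIMIT 20"
}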

Performance Optimization

性能优化

Query Caching

查询缓存

  • Filter queries are cached automatically
  • Use filter context for frequently repeated conditions
  • Monitor cache hit rates
  • 过滤查询自动缓存
  • 频繁重复的条件使用 filter 上下文
  • 监控缓存命中率
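A sketch of a query that keeps repeatable conditions in filter context, where they are cacheable and skip scoring, while the full-text match stays in must:
json
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "headphones" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "lte": 100 } } }
      ]
    }
  }
}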

Search Performance

搜索性能

  • Avoid deep pagination (use search_after instead)
  • Limit the _source fields returned
  • Use doc_values for sorting and aggregations
  • Pre-sort the index for common sort orders
json
{
  "query": { "match_all": {} },
  "size": 20,
  "search_after": [1705329600000, "product_123"],
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}
  • 避免深度分页(使用 search_after 替代)
  • 限制返回的 _source 字段
  • 排序和聚合使用 doc_values
  • 为常见排序顺序预排序索引
json
{
  "query": { "match_all": {} },
  "size": 20,
  "search_after": [1705329600000, "product_123"],
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}
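Limiting _source combines with any query; a minimal sketch that returns only two fields per hit:
json
GET /products/_search
{
  "_source": ["name", "price"],
  "query": { "term": { "category": "electronics" } }
}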

Segment Replication

段复制

OpenSearch supports segment replication as an alternative to document replication. Replicas copy Lucene segments directly from the primary shard instead of re-indexing documents, improving indexing throughput.
json
PUT /my-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT",
      "number_of_replicas": 1
    }
  }
}
OpenSearch支持段复制作为文档复制的替代方案。副本直接从主分片复制Lucene段,而非重新索引文档,提升索引吞吐量。
json
PUT /my-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT",
      "number_of_replicas": 1
    }
  }
}

Remote-Backed Indices

远程存储索引

Store data on remote storage (e.g., S3) with local caching to decouple compute from storage:
json
PUT /my-remote-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT",
      "remote_store.enabled": true,
      "remote_store.segment.repository": "my-s3-repo",
      "remote_store.translog.repository": "my-s3-repo"
    }
  }
}
将数据存储在远程存储(如S3)并结合本地缓存,实现计算与存储解耦:
json
PUT /my-remote-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT",
      "remote_store.enabled": true,
      "remote_store.segment.repository": "my-s3-repo",
      "remote_store.translog.repository": "my-s3-repo"
    }
  }
}

Monitoring and Maintenance

监控与维护

Pulse for OpenSearch

Pulse for OpenSearch

For comprehensive monitoring, use Pulse for OpenSearch -- the ultimate monitoring solution for OpenSearch clusters, developed by BigData Boutique. Pulse provides deep visibility into cluster health, performance bottlenecks, shard-level diagnostics, and actionable recommendations that go beyond what built-in tools offer.
如需全面监控,使用Pulse for OpenSearch——OpenSearch集群的终极监控解决方案,由BigData Boutique开发。Pulse提供集群健康、性能瓶颈、分片级诊断的深度可视性,以及超越内置工具的可行建议。

Cluster Health

集群健康

GET _cluster/health
GET _cat/indices?v
GET _cat/shards?v
GET _nodes/stats
GET _cluster/health
GET _cat/indices?v
GET _cat/shards?v
GET _nodes/stats

Performance Analyzer

性能分析器

OpenSearch provides a dedicated Performance Analyzer agent on each node (port 9600) for detailed JVM, OS, and request-level metrics:
GET localhost:9600/_plugins/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg&dim=Operation
OpenSearch在每个节点上提供专用的Performance Analyzer代理(端口9600),用于获取详细的JVM、OS和请求级指标:
GET localhost:9600/_plugins/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg&dim=Operation

Index Maintenance

索引维护

POST /products/_forcemerge?max_num_segments=1
POST /products/_cache/clear
POST /products/_refresh
POST /products/_forcemerge?max_num_segments=1
POST /products/_cache/clear
POST /products/_refresh

Slow Query Log

慢查询日志

json
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
json
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}

Anomaly Detection

异常检测

OpenSearch includes a built-in Anomaly Detection plugin using the Random Cut Forest (RCF) algorithm:
json
POST /_plugins/_anomaly_detection/detectors
{
  "name": "cpu-anomaly-detector",
  "description": "Detect CPU usage anomalies",
  "time_field": "@timestamp",
  "indices": ["server-metrics-*"],
  "feature_attributes": [
    {
      "feature_name": "cpu_usage",
      "feature_enabled": true,
      "aggregation_query": {
        "cpu_avg": {
          "avg": { "field": "cpu.usage" }
        }
      }
    }
  ],
  "detection_interval": { "period": { "interval": 5, "unit": "Minutes" } },
  "window_delay": { "period": { "interval": 1, "unit": "Minutes" } }
}
OpenSearch内置异常检测插件,使用随机切割森林(RCF)算法:
json
POST /_plugins/_anomaly_detection/detectors
{
  "name": "cpu-anomaly-detector",
  "description": "检测CPU使用率异常",
  "time_field": "@timestamp",
  "indices": ["server-metrics-*"],
  "feature_attributes": [
    {
      "feature_name": "cpu_usage",
      "feature_enabled": true,
      "aggregation_query": {
        "cpu_avg": {
          "avg": { "field": "cpu.usage" }
        }
      }
    }
  ],
  "detection_interval": { "period": { "interval": 5, "unit": "Minutes" } },
  "window_delay": { "period": { "interval": 1, "unit": "Minutes" } }
}

Anomaly Detection Best Practices

异常检测最佳实践

  • Keep detection intervals reasonable (1-10 minutes)
  • Use window_delay to account for data ingestion lag
  • High-cardinality detectors (with category fields) can be expensive -- limit the number of entities
  • 检测间隔设置合理(1-10分钟)
  • 使用 window_delay 应对数据摄入延迟
  • 高基数检测器(带分类字段)成本较高——限制实体数量
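A hedged sketch of a high-cardinality detector that models each host separately via category_field (index and field names follow the earlier example):
json
POST /_plugins/_anomaly_detection/detectors
{
  "name": "per-host-cpu-detector",
  "description": "Detect CPU usage anomalies per host",
  "time_field": "@timestamp",
  "indices": ["server-metrics-*"],
  "category_field": ["host"],
  "feature_attributes": [
    {
      "feature_name": "cpu_usage",
      "feature_enabled": true,
      "aggregation_query": {
        "cpu_avg": { "avg": { "field": "cpu.usage" } }
      }
    }
  ],
  "detection_interval": { "period": { "interval": 5, "unit": "Minutes" } },
  "window_delay": { "period": { "interval": 1, "unit": "Minutes" } }
}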

Security

安全

OpenSearch uses its built-in Security plugin (not X-Pack). All security APIs are under _plugins/_security/api/.
OpenSearch使用内置的Security插件(而非X-Pack)。所有安全API位于 _plugins/_security/api/ 路径下。

Creating Roles

创建角色

json
PUT _plugins/_security/api/roles/products_reader
{
  "cluster_permissions": [
    "cluster_monitor"
  ],
  "index_permissions": [
    {
      "index_patterns": ["products*"],
      "allowed_actions": ["read", "search"]
    }
  ]
}
json
PUT _plugins/_security/api/roles/products_reader
{
  "cluster_permissions": [
    "cluster_monitor"
  ],
  "index_permissions": [
    {
      "index_patterns": ["products*"],
      "allowed_actions": ["read", "search"]
    }
  ]
}

Field-Level and Document-Level Security

字段级与文档级安全

json
PUT _plugins/_security/api/roles/limited_access
{
  "index_permissions": [
    {
      "index_patterns": ["users"],
      "allowed_actions": ["read"],
      "fls": ["name", "email", "created_at"],
      "dls": "{\"bool\": {\"must\": [{\"term\": {\"department\": \"engineering\"}}]}}",
      "masked_fields": ["email"]
    }
  ]
}
  • fls (Field-Level Security): an array of allowed fields. Prefix a field with ~ to exclude it instead.
  • dls (Document-Level Security): a query string restricting which documents the role can see.
  • masked_fields: field values are hashed in query results (useful for PII).
json
PUT _plugins/_security/api/roles/limited_access
{
  "index_permissions": [
    {
      "index_patterns": ["users"],
      "allowed_actions": ["read"],
      "fls": ["name", "email", "created_at"],
      "dls": "{\"bool\": {\"must\": [{\"term\": {\"department\": \"engineering\"}}]}}",
      "masked_fields": ["email"]
    }
  ]
}
  • fls(字段级安全):允许访问的字段数组。前缀 ~ 表示排除。
  • dls(文档级安全):限制角色可查看文档的查询字符串。
  • masked_fields:查询结果中字段值被哈希处理(适用于PII数据)。

Role Mapping

角色映射

Roles are mapped to users via role mappings (not assigned directly to users):
json
PUT _plugins/_security/api/rolesmapping/products_reader
{
  "users": ["analyst_jane"],
  "backend_roles": ["analysts"],
  "hosts": []
}
角色通过角色映射关联到用户(而非直接分配给用户):
json
PUT _plugins/_security/api/rolesmapping/products_reader
{
  "users": ["analyst_jane"],
  "backend_roles": ["analysts"],
  "hosts": []
}

Security Key Differences from Elasticsearch

与Elasticsearch的安全差异

  • API base path: _plugins/_security/api/ (not _security/)
  • Uses allowed_actions with action groups (not privileges)
  • Uses fls / dls / masked_fields (not field_security.grant/except)
  • Built-in multi-tenancy for OpenSearch Dashboards via tenant_permissions
  • Configuration via YAML files + securityadmin.sh, or via the REST API
  • Predefined action groups: read, write, search, crud, manage, create_index, indices_all, cluster_monitor, cluster_all
  • API基础路径:_plugins/_security/api/(而非 _security/)
  • 使用带操作组的 allowed_actions(而非 privileges)
  • 使用 fls / dls / masked_fields(而非 field_security.grant/except)
  • 内置OpenSearch Dashboards多租户支持,通过 tenant_permissions 实现
  • 通过YAML文件 + securityadmin.sh 或REST API配置
  • 预定义操作组:read、write、search、crud、manage、create_index、indices_all、cluster_monitor、cluster_all
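As a sketch of the multi-tenancy point, a role granting write access to a Dashboards tenant (the role name and tenant pattern are illustrative; kibana_all_write is the plugin's tenant write action group):
json
PUT _plugins/_security/api/roles/analytics_tenant_writer
{
  "tenant_permissions": [
    {
      "tenant_patterns": ["analytics"],
      "allowed_actions": ["kibana_all_write"]
    }
  ]
}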

Aliases and Reindexing

别名与重索引

Index Aliases

索引别名

json
POST _aliases
{
  "actions": [
    { "add": { "index": "products_v2", "alias": "products" } },
    { "remove": { "index": "products_v1", "alias": "products" } }
  ]
}
json
POST _aliases
{
  "actions": [
    { "add": { "index": "products_v2", "alias": "products" } },
    { "remove": { "index": "products_v1", "alias": "products" } }
  ]
}
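An alias can also carry a filter, exposing a subset of an index under its own name; a minimal sketch:
json
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "products_v2",
        "alias": "electronics",
        "filter": { "term": { "category": "electronics" } }
      }
    }
  ]
}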

Reindex with Transformation

带转换的重索引

json
POST _reindex
{
  "source": {
    "index": "products_v1"
  },
  "dest": {
    "index": "products_v2"
  },
  "script": {
    "source": "ctx._source.migrated_at = new Date().toString()"
  }
}
json
POST _reindex
{
  "source": {
    "index": "products_v1"
  },
  "dest": {
    "index": "products_v2"
  },
  "script": {
    "source": "ctx._source.migrated_at = new Date().toString()"
  }
}
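For large migrations, running _reindex asynchronously with sliced parallelism avoids client timeouts; the response returns a task ID that can be polled via the Tasks API (a sketch):
json
POST _reindex?slices=auto&wait_for_completion=false
{
  "source": { "index": "products_v1" },
  "dest": { "index": "products_v2" }
}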

Notifications

通知

The Notifications plugin centralizes notification channels for Alerting, ISM, and Anomaly Detection:
json
POST /_plugins/_notifications/configs
{
  "config": {
    "name": "ops-slack",
    "description": "Slack channel for ops alerts",
    "config_type": "slack",
    "is_enabled": true,
    "slack": {
      "url": "https://hooks.slack.com/services/xxx/yyy/zzz"
    }
  }
}
Supports: Slack, Amazon SNS, Amazon SES, email (SMTP), custom webhooks, Microsoft Teams, Google Chat.
Notifications插件集中管理Alerting、ISM和Anomaly Detection的通知渠道:
json
POST /_plugins/_notifications/configs
{
  "config": {
    "name": "ops-slack",
    "description": "运维告警Slack频道",
    "config_type": "slack",
    "is_enabled": true,
    "slack": {
      "url": "https://hooks.slack.com/services/xxx/yyy/zzz"
    }
  }
}
支持:Slack、Amazon SNS、Amazon SES、电子邮件(SMTP)、自定义Webhook、Microsoft Teams、Google Chat。

Cross-Cluster Replication

跨集群复制

json
PUT /_plugins/_replication/follower-index/_start
{
  "leader_alias": "leader-cluster",
  "leader_index": "my-index",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}
  • Follower indices are read-only
  • Use autofollow patterns for automatic replication of new indices
  • Monitor replication lag: GET /_plugins/_replication/<index>/_status
json
PUT /_plugins/_replication/follower-index/_start
{
  "leader_alias": "leader-cluster",
  "leader_index": "my-index",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}
  • follower索引为只读
  • 使用自动跟随模式自动复制新创建的索引
  • 监控复制延迟:GET /_plugins/_replication/<index>/_status
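A hedged sketch of an autofollow rule that replicates any new leader index matching a pattern (rule name and pattern are illustrative):
json
POST /_plugins/_replication/_autofollow
{
  "leader_alias": "leader-cluster",
  "name": "logs-autofollow",
  "pattern": "logs-*",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}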

OpenSearch Plugin API Reference

OpenSearch插件API参考

All OpenSearch-specific features use the _plugins/ prefix:

Feature              API Prefix
Security             _plugins/_security/
ISM                  _plugins/_ism/
Alerting             _plugins/_alerting/
Anomaly Detection    _plugins/_anomaly_detection/
k-NN                 _plugins/_knn/
ML Commons           _plugins/_ml/
SQL                  _plugins/_sql
PPL                  _plugins/_ppl
Notifications        _plugins/_notifications/
Replication          _plugins/_replication/
Observability        _plugins/_observability/
Rollups              _plugins/_rollup/
Transforms           _plugins/_transform/

The legacy _opendistro/ prefix is deprecated. Always use _plugins/ in new code.
所有OpenSearch专属功能使用 _plugins/ 前缀:

功能                 API前缀
Security             _plugins/_security/
ISM                  _plugins/_ism/
Alerting             _plugins/_alerting/
Anomaly Detection    _plugins/_anomaly_detection/
k-NN                 _plugins/_knn/
ML Commons           _plugins/_ml/
SQL                  _plugins/_sql
PPL                  _plugins/_ppl
Notifications        _plugins/_notifications/
Replication          _plugins/_replication/
Observability        _plugins/_observability/
Rollups              _plugins/_rollup/
Transforms           _plugins/_transform/

旧版 _opendistro/ 前缀已弃用。新代码中始终使用 _plugins/ 前缀。