opensearch-best-practices

OpenSearch Best Practices

Core Principles

Design indices and mappings based on query patterns
Optimize for search performance with proper analysis and indexing
Use appropriate shard sizing and cluster configuration
Implement proper security with the OpenSearch Security plugin
Monitor cluster health with Performance Analyzer and optimize queries
Leverage OpenSearch-specific features: k-NN vector search, neural search, search pipelines, ISM

Index Design

Mapping Best Practices

Define explicit mappings instead of relying on dynamic mapping
Use appropriate data types for each field
Disable indexing for fields you do not search on
Use keyword type for exact matches, text for full-text search

json

{
  "mappings": {
    "properties": {
      "product_id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date"
      },
      "metadata": {
        "type": "object",
        "enabled": false
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

Field Types

`keyword`: Exact values, filtering, aggregations, sorting
`text`: Full-text search with analysis
`date`: Date/time values with format specification
`numeric types`: long, integer, short, byte, double, float, scaled_float
`boolean`: True/false values
`geo_point`: Latitude/longitude pairs
`nested`: Arrays of objects that need independent querying
`knn_vector`: Vector embeddings for k-NN similarity search (OpenSearch-specific)

Index Settings

json

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "synonym_filter"]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": ["laptop, notebook", "phone, mobile, smartphone"]
        }
      }
    }
  }
}

Shard Sizing

Guidelines

Target 10-50GB per shard (sweet spot is 10-30GB)
Avoid oversharding (too many small shards)
Consider time-based indices for time-series data
Use segment replication (`replication.type: SEGMENT`) for improved indexing throughput on replicas

json

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
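
The 10-50GB guideline above can be turned into a rough primary shard count. The following is a minimal Python sketch under stated assumptions: `primary_shard_count` is a hypothetical helper written for this example (not an OpenSearch API), and the 30GB default target is simply the upper end of the suggested sweet spot.

```python
import math


def primary_shard_count(expected_index_gb, target_shard_gb=30.0):
    """Return the smallest primary shard count that keeps each shard
    at or below target_shard_gb (hypothetical helper for illustration)."""
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


if __name__ == "__main__":
    for size_gb in (5, 90, 400):
        shards = primary_shard_count(size_gb)
        print(f"{size_gb}GB index -> {shards} primary shard(s), "
              f"about {size_gb / shards:.1f}GB each")
```

An index expected to reach about 90GB would get 3 primaries at a 30GB target. The number of primary shards is fixed at index creation, so the estimate should be based on the size the index is expected to reach, not its size on day one.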

Index State Management (ISM)

OpenSearch uses ISM (Index State Management) instead of Elasticsearch's ILM. ISM uses a flexible state machine model where any state can transition to any other state.

Create an ISM policy via:

PUT _plugins/_ism/policies/<policy_id>

json

{
  "policy": {
    "description": "Hot-warm-delete lifecycle",
    "default_state": "hot",
    "ism_template": [
      {
        "index_patterns": ["logs-*"],
        "priority": 100
      }
    ],
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "50gb",
              "min_index_age": "7d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": {
              "number_of_replicas": 1
            }
          },
          {
            "force_merge": {
              "max_num_segments": 1
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "90d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ]
  }
}
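
Because ISM is a state machine, a policy is only coherent if `default_state` and every transition target name a state that the policy actually defines. The sketch below shows one way to check that locally before sending the policy; `validate_ism_policy` and the trimmed `POLICY` dictionary are hypothetical names made up for this example, not part of OpenSearch.

```python
def validate_ism_policy(policy):
    """Return a list of problems found in the state graph.
    An empty list means every referenced state is defined."""
    body = policy["policy"]
    state_names = {state["name"] for state in body["states"]}
    problems = []
    if body["default_state"] not in state_names:
        problems.append(f"default_state '{body['default_state']}' is not defined")
    for state in body["states"]:
        for transition in state.get("transitions", []):
            target = transition["state_name"]
            if target not in state_names:
                problems.append(
                    f"state '{state['name']}' transitions to undefined state '{target}'"
                )
    return problems


# Trimmed version of the hot-warm-delete policy shown above (actions omitted).
POLICY = {
    "policy": {
        "default_state": "hot",
        "states": [
            {"name": "hot", "transitions": [{"state_name": "warm"}]},
            {"name": "warm", "transitions": [{"state_name": "delete"}]},
            {"name": "delete", "transitions": []},
        ],
    }
}

if __name__ == "__main__":
    print(validate_ism_policy(POLICY) or "policy state graph is consistent")
```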

ISM Key Differences from Elasticsearch ILM

ISM uses arbitrary states with explicit transitions (state machine), not linear phases
Policy is attached via `ism_template` inside the policy itself (not via index template settings)
API endpoint: `_plugins/_ism/policies/<id>` (not `_ilm/policy/<id>`)
Transitions support `min_index_age`, `min_doc_count`, `min_size`, and `cron` conditions
Available actions: `rollover`, `force_merge`, `shrink`, `delete`, `read_only`, `read_write`, `replica_count`, `index_priority`, `close`, `open`, `snapshot`, `allocation`, `notification`

ISM Management APIs

PUT    _plugins/_ism/policies/<policy_id>       # Create/update policy
GET    _plugins/_ism/policies/<policy_id>       # Get policy
DELETE _plugins/_ism/policies/<policy_id>       # Delete policy
POST   _plugins/_ism/add/<index>                # Attach policy to existing index
POST   _plugins/_ism/remove/<index>             # Detach policy
GET    _plugins/_ism/explain/<index>            # Get ISM status for an index
POST   _plugins/_ism/retry/<index>              # Retry failed action

Query Optimization

Query Types

Match Query (Full-text search)

json

{
  "query": {
    "match": {
      "description": {
        "query": "wireless bluetooth headphones",
        "operator": "and",
        "fuzziness": "AUTO"
      }
    }
  }
}

Term Query (Exact match)

json

{
  "query": {
    "term": {
      "status": "active"
    }
  }
}

Bool Query (Combining queries)

json

{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "gte": 500, "lte": 2000 } } }
      ],
      "should": [
        { "term": { "brand": "apple" } }
      ],
      "must_not": [
        { "term": { "status": "discontinued" } }
      ]
    }
  }
}

Query Best Practices

Use `filter` context for non-scoring queries (cacheable)
Use `must` only when scoring is needed
Avoid wildcards at the beginning of terms
Use `keyword` fields for exact matches
Limit result size with the `size` parameter

json

{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "search terms",
          "fields": ["name^3", "description", "tags^2"],
          "type": "best_fields"
        }
      },
      "filter": [
        { "term": { "active": true } },
        { "range": { "created_at": { "gte": "now-30d" } } }
      ]
    }
  },
  "size": 20,
  "from": 0,
  "_source": ["name", "price", "category"]
}

Vector Search (k-NN)

OpenSearch has a built-in k-NN plugin supporting two engines: faiss and lucene.

k-NN Index Mapping

json

PUT /my-vector-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 256,
            "m": 16
          }
        }
      },
      "title": { "type": "text" },
      "category": { "type": "keyword" }
    }
  }
}

Engine Selection

faiss: Best for large-scale production. Supports HNSW and IVF. Additional quantization options (SQfp16, PQ). Recommended default.
lucene: Native Lucene HNSW. Supports efficient pre-filtering. Built-in scalar quantization. Lower memory overhead. Good when filtering is critical or for smaller datasets.

Space Types

`l2` (Euclidean distance)
`cosinesimil` (cosine similarity)
`innerproduct` (dot product)
`l1` (Manhattan distance)
`linf` (L-infinity distance)

k-NN Query

json

GET /my-vector-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10
      }
    }
  }
}

k-NN with Filtering

json

GET /my-vector-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10,
        "filter": {
          "term": { "category": "electronics" }
        }
      }
    }
  }
}

With the lucene engine, the filter is applied as a pre-filter, which is efficient. With faiss, filtering is a post-filter by default, which can return fewer than `k` results. Faiss supports efficient filtering starting in OpenSearch 2.9.
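
Why post-filtering can come up short is easy to see on a toy dataset. The sketch below is a brute-force illustration in plain Python, not how either engine is implemented: the `DOCS` list, the helper names, and the exact (non-HNSW) nearest-neighbor search are all assumptions made for this example.

```python
import math

# Toy corpus: five 2-d vectors with a category attribute (made-up data).
DOCS = [
    {"id": 1, "category": "electronics", "vec": [0.0, 0.0]},
    {"id": 2, "category": "books", "vec": [0.1, 0.0]},
    {"id": 3, "category": "books", "vec": [0.2, 0.0]},
    {"id": 4, "category": "electronics", "vec": [0.9, 0.0]},
    {"id": 5, "category": "electronics", "vec": [1.0, 0.0]},
]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(docs, query, k):
    """Exact k nearest neighbors by brute force."""
    return sorted(docs, key=lambda d: l2(d["vec"], query))[:k]


def post_filter(docs, query, k, category):
    """Find the k nearest first, then drop the ones that fail the filter."""
    return [d["id"] for d in nearest(docs, query, k) if d["category"] == category]


def pre_filter(docs, query, k, category):
    """Restrict to matching documents first, then find the k nearest among them."""
    candidates = [d for d in docs if d["category"] == category]
    return [d["id"] for d in nearest(candidates, query, k)]


if __name__ == "__main__":
    query, k = [0.0, 0.0], 3
    print("post-filter:", post_filter(DOCS, query, k, "electronics"))
    print("pre-filter: ", pre_filter(DOCS, query, k, "electronics"))
```

With `k = 3`, the post-filter path keeps only one document because two of the three nearest neighbors are books, while the pre-filter path still returns three electronics documents.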

k-NN Best Practices

Always set `"knn": true` in index settings
Use faiss for large-scale production, lucene when pre-filtering is critical
Higher `ef_search` (32-128 practical range) = better accuracy, slower queries. Can be adjusted dynamically per query.
Higher `ef_construction` (128-256+) and `m` (16-128) = better recall but slower indexing and more memory
Monitor `knn.memory.circuit_breaker.limit` -- k-NN graphs load into native off-heap memory. HNSW is dramatically faster when the entire graph resides in RAM.
Warm up indices after creation: `GET /_plugins/_knn/warmup/{index}`
Normalize vectors once during ingestion (not at query time) and use `innerproduct` for better performance
Disable `_source` storage and `doc_values` on vector fields when unnecessary to reduce index size
The final 1-2% of recall improvement typically costs disproportionately more than the first 95% -- define realistic recall targets
Monitor end-to-end latency including embedding generation, metadata retrieval, and reranking -- not just the vector search component
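
The advice to normalize at ingestion and then use `innerproduct` rests on a simple identity: for unit-length vectors, the dot product equals the cosine similarity, so the per-query norm computation disappears. A minimal Python check of that identity, with helper names chosen for this example:

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale v to unit length; done once, at ingestion time."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed from raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    print("cosine on raw vectors:        ", cosine(a, b))
    print("dot product on unit vectors:  ", dot(l2_normalize(a), l2_normalize(b)))
```

Both lines print the same value up to floating-point rounding. The query vector has to be normalized the same way as the indexed vectors for the equivalence to hold.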

HNSW Hyperparameters

M (maximum edges per node): Range 16-128. Lower values (16) for memory-constrained environments, higher values (128) for maximum recall. Memory consumption increases proportionally.
ef_construction (index-time): Range 128-256+. Controls graph quality during insertion. Higher values produce better graphs but slower indexing.
ef_search (query-time): Range 32-128. Can be tuned per query for balancing recall vs. response time.
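
To see how `M` feeds into memory, a rough estimate helps. The sketch below uses the approximation `1.1 * (4 * dimension + 8 * M)` bytes per vector, which is the rule of thumb the OpenSearch k-NN documentation gives for HNSW with float vectors, as far as I am aware. Treat it as an estimate under that assumption, not a measurement; the function name is made up for this example.

```python
def hnsw_memory_bytes(num_vectors, dimension, m):
    """Approximate native memory for an HNSW graph over float32 vectors.

    Assumes the rule of thumb 1.1 * (4 * dimension + 8 * m) bytes per vector;
    quantization and replicas are not accounted for.
    """
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


if __name__ == "__main__":
    for m in (16, 32, 128):
        gb = hnsw_memory_bytes(10_000_000, 768, m) / 1e9
        print(f"10M x 768-dim vectors, M={m:3d}: about {gb:.1f} GB")
```

For 768-dimensional vectors the raw floats dominate (3072 bytes per vector), so moving `M` from 16 to 128 adds roughly 900 bytes of graph links per vector under this estimate. Replicas multiply the total, since each copy holds its own graph.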

Vector Shard Sizing

Target 10-30 million vectors per shard
Shard count should equal or slightly exceed node count for maximum parallelism
Over-sharding creates excessive coordination overhead that harms tail latency
Under-sharding leaves performance on the table
Adding replicas provides load balancing and smooths performance variations
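
These guidelines pull in two directions: the per-shard vector ceiling sets a minimum shard count, and the node count sets another. A small Python sketch of one way to combine them, assuming the 30 million upper bound from the list above; `vector_shard_plan` is a hypothetical helper, not an OpenSearch API.

```python
import math


def vector_shard_plan(total_vectors, node_count, max_vectors_per_shard=30_000_000):
    """Pick a primary shard count that respects the per-shard ceiling
    and gives every node at least one shard (illustrative heuristic)."""
    if total_vectors <= 0 or node_count <= 0:
        raise ValueError("total_vectors and node_count must be positive")
    by_size = math.ceil(total_vectors / max_vectors_per_shard)
    shards = max(by_size, node_count)
    return {
        "shards": shards,
        "vectors_per_shard": math.ceil(total_vectors / shards),
    }


if __name__ == "__main__":
    print(vector_shard_plan(200_000_000, node_count=6))
    print(vector_shard_plan(20_000_000, node_count=3))
```

For small collections, matching the node count can push each shard below the 10 million mark; whether the extra parallelism is worth the added coordination overhead is a judgment call that benchmarks should settle.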

Quantization Strategies

Scalar Quantization (SQ):

Lucene engine: built-in support
Faiss engine: SQfp16 converts 32-bit to 16-bit, approximately 50% memory reduction

Binary Quantization (BQ):

1-bit compression provides 32x compression -- a 768-dimensional float32 vector shrinks from 3072 bytes to under 100 bytes
No separate training step required
Asymmetric distance computation (ADC) improves recall

Product Quantization (PQ):

Most aggressive compression: up to 64x
Faiss engine only, requires training step for IVF-based indexes
Configuration: `code_size=8` with `m` tuned for desired balance
More noticeable recall impact than SQ or BQ
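
The compression ratios quoted above follow from simple byte arithmetic. The Python sketch below reproduces them for a 768-dimensional vector. The PQ setting `m=48` is an assumption picked only because it yields the 64x figure with `code_size=8`; it is not a recommendation from the source.

```python
def vector_bytes(dimension, bits_per_value):
    """Bytes for one vector when every dimension is stored in bits_per_value bits."""
    return dimension * bits_per_value // 8


def pq_vector_bytes(m, code_size):
    """Bytes for one PQ-encoded vector: m sub-vector codes of code_size bits each."""
    return m * code_size // 8


if __name__ == "__main__":
    dim = 768
    full = vector_bytes(dim, 32)  # float32 baseline
    rows = [
        ("float32", full),
        ("SQ fp16", vector_bytes(dim, 16)),
        ("binary", vector_bytes(dim, 1)),
        ("PQ m=48", pq_vector_bytes(m=48, code_size=8)),  # illustrative m
    ]
    for name, size in rows:
        print(f"{name:8s} {size:5d} bytes  ({full // size}x compression)")
```

These figures cover only the stored vector codes; graph links and any full-precision copies kept for reranking come on top.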

Disk-Based Vector Search

For cost-effective large-scale deployments, use `on_disk` mode with binary quantization:

json

PUT /my-vector-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "mode": "on_disk"
      }
    }
  }
}

Approximately 97% memory reduction (e.g., 100M 768-dim vectors: from 300GB+ RAM to under 10GB)
Two-phase approach: quantized index identifies candidates, full-precision vectors lazily loaded from disk for reranking
P90 latency in the 100-200ms range (acceptable for many use cases)
Cost reduction of roughly one-third compared to memory-optimized deployments
Currently supports only float data type

Concurrent Segment Search

Enable at cluster level for significant vector search latency improvements:

json

PUT _cluster/settings
{
  "persistent": {
    "search.concurrent_segment_search.enabled": true
  }
}

Benchmarked at 60%+ improvement in p90 service time, up to 75% reduction in p90 latency
Best applied after force-merging segments
Benefits diminish when compute cores are already saturated or search space per shard is small

Scale-Specific Recommendations

1-50M vectors:

Start with M=32, ef_construction=128
Use batch inserts
Ensure full RAM residency for the graph

50-500M vectors:

Implement systematic sharding (10-30M vectors per shard)
Enable scalar quantization
Two-phase retrieval: compressed vectors for search, full precision for reranking
Periodic index rebuilds for graph quality maintenance

500M+ vectors:

Hierarchical retrieval (IVF-style centroid routing)
Multi-tier hot/cold data architecture
Aggressive quantization (IVF-PQ) for cold data
Optimize query routing to avoid broadcast across all shards
Return only IDs/scores in initial retrieval phases

Neural Search and Hybrid Search

Neural Search with ML Commons

Set up an ingest pipeline to automatically generate embeddings:

json

PUT /_ingest/pipeline/neural-search-pipeline
{
  "description": "Pipeline for neural search",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "title": "title_embedding"
        }
      }
    }
  ]
}

Query with automatic text-to-vector conversion:

json

GET /my-index/_search
{
  "query": {
    "neural": {
      "title_embedding": {
        "query_text": "comfortable running shoes",
        "model_id": "<model_id>",
        "k": 10
      }
    }
  }
}

Hybrid Search (BM25 + k-NN) with Search Pipelines

Create a normalization pipeline:

json

PUT /_search/pipeline/hybrid-pipeline
{
  "description": "Hybrid search pipeline",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [0.3, 0.7]
          }
        }
      }
    }
  ]
}

Execute a hybrid query combining lexical and vector search:

json

GET /my-index/_search?search_pipeline=hybrid-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "title": "wireless headphones"
          }
        },
        {
          "knn": {
            "my_vector": {
              "vector": [0.1, 0.2, 0.3],
              "k": 10
            }
          }
        }
      ]
    }
  }
}
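
To make the pipeline's job concrete, the sketch below imitates `min_max` normalization followed by a weighted `arithmetic_mean` on two small result sets. It is a simplified model written for illustration, not the processor's implementation: the scores are made up, a document missing from one sub-query is assumed to contribute 0, and the real processor's edge-case handling may differ. The weights apply in sub-query order, so `0.3` goes to the `match` query and `0.7` to the `knn` query.

```python
def min_max(scores):
    """Scale a {doc_id: score} map into [0, 1]; identical scores all map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine(result_sets, weights):
    """Weighted arithmetic mean of normalized scores, highest first.

    Assumption: a document absent from a result set scores 0 there.
    """
    normalized = [min_max(r) for r in result_sets]
    doc_ids = set().union(*[n.keys() for n in normalized])
    total = sum(weights)
    combined = {
        doc_id: sum(w * n.get(doc_id, 0.0) for n, w in zip(normalized, weights)) / total
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bm25 = {"a": 12.0, "b": 6.0, "c": 3.0}   # made-up lexical scores
    knn = {"b": 0.9, "c": 0.8, "d": 0.5}     # made-up vector scores
    for doc_id, score in combine([bm25, knn], weights=[0.3, 0.7]):
        print(f"{doc_id}: {score:.3f}")
```

Document `a` tops the lexical list but ends up third overall because it never appears in the vector results and the vector side carries 70% of the weight.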

ML Commons Best Practices

Use remote model connectors (Amazon Bedrock, SageMaker, OpenAI, Cohere) for large models
Set `plugins.ml_commons.only_run_on_ml_node: true` in production to isolate ML workloads
Use `model_group` for access control on ML models
Monitor model memory usage -- local models consume JVM/native memory

Registering a Remote Model Connector

json

POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Connector",
  "description": "Connector for Titan Embeddings",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
      "headers": { "content-type": "application/json" },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\" }"
    }
  ]
}

Aggregations

Common Aggregation Patterns

json

{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500 }
        ]
      }
    },
    "date_histogram": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      }
    }
  }
}

Aggregation Best Practices

Use `size: 0` when you only need aggregations
Set appropriate `shard_size` for terms aggregations
Use composite aggregations for pagination
Consider using filters inside `aggs` (filter aggregations) to narrow scope

Indexing Best Practices

Bulk Indexing

json

POST _bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Product 1", "price": 99.99 }
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "Product 2", "price": 149.99 }

Bulk API Guidelines

Use bulk API for batch operations
Optimal bulk size: 5-15MB per request
Monitor for rejected requests (thread pool queue full)
Disable refresh during bulk indexing for better performance

json

PUT /products/_settings
{
  "refresh_interval": "-1"
}

After bulk indexing:

json

PUT /products/_settings
{
  "refresh_interval": "1s"
}

POST /products/_refresh
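
Keeping each request in the 5-15MB range means splitting the document stream by serialized size, not by document count. The sketch below builds NDJSON bulk bodies with a byte budget using only the standard library. `bulk_batches` is a hypothetical helper and the 10MB default is an assumption inside the suggested range; sending each body to `POST _bulk` is left to whichever HTTP client is in use.

```python
import json


def bulk_batches(index, docs, max_bytes=10 * 1024 * 1024):
    """Yield NDJSON bulk bodies of at most max_bytes each.

    docs is an iterable of (doc_id, source_dict) pairs. A single document
    larger than max_bytes is still emitted, alone in its own batch.
    """
    lines, size = [], 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        pair = action + "\n" + json.dumps(source) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)


if __name__ == "__main__":
    docs = [(str(i), {"name": f"Product {i}", "price": 9.99}) for i in range(10)]
    # Tiny budget so the split is visible on toy data.
    for n, body in enumerate(bulk_batches("products", docs, max_bytes=200), start=1):
        print(f"batch {n}: {len(body.encode('utf-8'))} bytes, "
              f"{body.count(chr(10)) // 2} documents")
```

Each body ends with a newline, which the bulk API requires for the final line.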

Document Updates

json

POST /products/_update/1
{
  "doc": {
    "price": 89.99,
    "updated_at": "2024-01-15T10:30:00Z"
  }
}

Update by query:

json

POST /products/_update_by_query
{
  "query": {
    "term": { "category": "electronics" }
  },
  "script": {
    "source": "ctx._source.on_sale = true"
  }
}

Analysis and Tokenization

Custom Analyzers

json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "english_stop",
            "english_stemmer"
          ]
        },
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edge_ngram_filter"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  }
}
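
What `autocomplete_analyzer` emits at index time can be approximated in a few lines. The sketch below is a rough Python model under stated assumptions: whitespace splitting stands in for the `standard` tokenizer, the helper names are invented for this example, and the `_analyze` API remains the authoritative way to see real output.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Prefixes of token with lengths from min_gram to max_gram inclusive."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_terms(text, min_gram=2, max_gram=15):
    """Lowercase, split on whitespace, then expand each token into edge n-grams."""
    terms = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


if __name__ == "__main__":
    print(autocomplete_terms("Wireless Headphones"))
```

Tokens shorter than `min_gram` produce no terms in this model, so a one-character token would not be searchable through this analyzer. A common pattern is to apply the n-gram analyzer only at index time and a plain analyzer at search time, so that the query text itself is not expanded into prefixes.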

Test Analyzer

json

POST /products/_analyze
{
  "analyzer": "product_analyzer",
  "text": "Wireless Bluetooth Headphones"
}
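
The `_analyze` API also accepts an ad hoc tokenizer and filter chain, which can help when experimenting before committing settings to an index. The sketch below tries the edge n-gram settings from the previous section without referencing any index; the sample text is only an illustration:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "edge_ngram", "min_gram": 2, "max_gram": 15 }
  ],
  "text": "Wireless"
}
```

The response should list prefixes of the lowercased word, starting at `wi` and growing one character at a time up to `wireless`.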

Search Features

Autocomplete/Suggestions

json

{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}

Query suggestions:

json

{
  "suggest": {
    "product-suggest": {
      "prefix": "wire",
      "completion": {
        "field": "name.suggest",
        "size": 5,
        "fuzzy": {
          "fuzziness": "AUTO"
        }
      }
    }
  }
}

Highlighting

json

{
  "query": {
    "match": { "description": "wireless" }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fragment_size": 150
      }
    }
  }
}

SQL and PPL Queries

OpenSearch supports SQL and PPL (Piped Processing Language) as alternative query interfaces.

SQL

json

POST /_plugins/_sql
{
  "query": "SELECT category, COUNT(*) as cnt, AVG(price) as avg_price FROM products GROUP BY category HAVING cnt > 10 ORDER BY avg_price DESC"
}

PPL (Piped Processing Language)

PPL is unique to OpenSearch, inspired by Splunk's SPL. It uses pipe syntax for data exploration:

json

POST /_plugins/_ppl
{
  "query": "source=server-logs | where response_code >= 500 | stats count() as error_count by host | where error_count > 100 | sort - error_count"
}

SQL/PPL Best Practices

Use PPL for ad-hoc log exploration; its pipe syntax is intuitive for operations teams
SQL queries are translated to DSL internally; complex joins or subqueries may not be supported
Set the `format` parameter (for example `jdbc` or `csv`) to choose the output format
Always add `LIMIT` to avoid scanning large indices
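
The practices above can be sketched as follows. This assumes the SQL plugin reads `format` from the URL query string, as its documentation describes, and uses the `_explain` endpoint to inspect how a statement is translated. The queries reuse the `products` and `server-logs` indices from the earlier examples; the PPL `head` command plays the role of `LIMIT`:

```json
POST /_plugins/_sql?format=csv
{
  "query": "SELECT category, COUNT(*) AS cnt FROM products GROUP BY category LIMIT 50"
}

POST /_plugins/_sql/_explain
{
  "query": "SELECT category, COUNT(*) AS cnt FROM products GROUP BY category LIMIT 50"
}

POST /_plugins/_ppl
{
  "query": "source=server-logs | where response_code >= 500 | fields host, response_code | head 20"
}
```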

Performance Optimization

Query Caching

Clauses in filter context skip scoring and are eligible for automatic caching in the node query cache
Use `filter` context for frequently repeated conditions
Monitor cache hit rates
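
As an illustration of these points, the sketch below keeps the scored full-text clause in `must` and moves the exact-match and range conditions into `filter`, then reads cache statistics for the index. The field names come from the `products` mapping used earlier; the price threshold is an arbitrary example value:

```json
POST /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "wireless" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "lte": 200 } } }
      ]
    }
  }
}

GET /products/_stats/query_cache,request_cache
```

The statistics response reports hit and miss counts for both caches, from which a hit rate can be derived.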

Search Performance

Avoid deep pagination (use `search_after` instead)
Limit the `_source` fields returned
Use `doc_values` for sorting and aggregations
Pre-sort the index for common sort orders

json

{
  "query": { "match_all": {} },
  "size": 20,
  "search_after": [1705329600000, "product_123"],
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}
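
Pre-sorting and `_source` filtering can be sketched as below. Index sorting has to be defined when the index is created, so the example uses a separate, hypothetical index name (`products_sorted`). Disabling `track_total_hits` is an assumption about the use case: it only makes sense when the exact hit count is not needed.

```json
PUT /products_sorted
{
  "settings": {
    "index": {
      "sort.field": "created_at",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "created_at": { "type": "date" }
    }
  }
}

POST /products_sorted/_search
{
  "_source": ["name", "price"],
  "size": 20,
  "sort": [ { "created_at": "desc" } ],
  "track_total_hits": false
}
```

When the query sort matches the index sort and total hits are not tracked, the search can stop collecting early on each segment.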

Segment Replication

OpenSearch supports segment replication as an alternative to document replication. Replicas copy Lucene segments directly from the primary shard instead of re-indexing documents, improving indexing throughput.

json

PUT /my-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT",
      "number_of_replicas": 1
    }
  }
}

Remote-Backed Indices

Store data on remote storage (e.g., S3) with local caching to decouple compute from storage:

json

PUT /my-remote-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT",
      "remote_store.enabled": true,
      "remote_store.segment.repository": "my-s3-repo",
      "remote_store.translog.repository": "my-s3-repo"
    }
  }
}

Monitoring and Maintenance

Pulse for OpenSearch

For comprehensive monitoring, consider Pulse for OpenSearch, a monitoring product for OpenSearch clusters developed by BigData Boutique. Pulse provides visibility into cluster health, performance bottlenecks, and shard-level diagnostics, along with actionable recommendations that go beyond what the built-in tools offer.

Cluster Health

GET _cluster/health
GET _cat/indices?v
GET _cat/shards?v
GET _nodes/stats
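
When the health status is yellow or red, a few narrower requests usually help locate the cause. The sketch below is one possible sequence, not a fixed procedure. Note that `_cluster/allocation/explain` without a body explains the first unassigned shard it finds and returns an error when every shard is assigned:

```json
GET _cluster/health?level=indices
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason
GET _cluster/allocation/explain
```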

Performance Analyzer

OpenSearch provides a dedicated Performance Analyzer agent on each node (port 9600) for detailed JVM, OS, and request-level metrics:

GET localhost:9600/_plugins/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg,avg&dim=Operation

Index Maintenance

POST /products/_forcemerge?max_num_segments=1
POST /products/_cache/clear
POST /products/_refresh
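
Merging down to a single segment is generally best reserved for indices that no longer receive writes, and clearing every cache discards warm data. Two narrower variants are sketched below; whether they fit depends on the workload:

```json
POST /products/_forcemerge?only_expunge_deletes=true
POST /products/_cache/clear?request=true
```

The first reclaims space from deleted documents without forcing a full merge; the second clears only the shard request cache and leaves the query and field data caches intact.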

Slow Query Log

json

PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
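
Slow indexing operations can be logged in the same way through the indexing slow log. The thresholds below are illustrative values rather than recommendations; `index.indexing.slowlog.source` caps how many characters of the document source are written to the log:

```json
PUT /products/_settings
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s",
  "index.indexing.slowlog.source": "1000"
}
```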

Anomaly Detection

OpenSearch includes a built-in Anomaly Detection plugin using the Random Cut Forest (RCF) algorithm:

json

POST /_plugins/_anomaly_detection/detectors
{
  "name": "cpu-anomaly-detector",
  "description": "Detect CPU usage anomalies",
  "time_field": "@timestamp",
  "indices": ["server-metrics-*"],
  "feature_attributes": [
    {
      "feature_name": "cpu_usage",
      "feature_enabled": true,
      "aggregation_query": {
        "cpu_avg": {
          "avg": { "field": "cpu.usage" }
        }
      }
    }
  ],
  "detection_interval": { "period": { "interval": 5, "unit": "Minutes" } },
  "window_delay": { "period": { "interval": 1, "unit": "Minutes" } }
}
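
Creating a detector does not start it. A rough sketch of the follow-up calls is shown below, where `<detector_id>` is a placeholder for the `_id` returned by the create request:

```json
POST /_plugins/_anomaly_detection/detectors/<detector_id>/_start
GET /_plugins/_anomaly_detection/detectors/<detector_id>/_profile
POST /_plugins/_anomaly_detection/detectors/<detector_id>/_stop
```

The profile call reports the detector's current state, which is useful for confirming that it has finished initializing before relying on its results.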

Anomaly Detection Best Practices

Keep detection intervals reasonable (1-10 minutes)
Use `window_delay` to account for data ingestion lag
High-cardinality detectors (with category fields) can be expensive; limit the number of entities

Security

OpenSearch uses its built-in Security plugin (not X-Pack). All security APIs are under `_plugins/_security/api/`.

Creating Roles

json

PUT _plugins/_security/api/roles/products_reader
{
  "cluster_permissions": [
    "cluster_monitor"
  ],
  "index_permissions": [
    {
      "index_patterns": ["products*"],
      "allowed_actions": ["read", "search"]
    }
  ]
}

Field-Level and Document-Level Security

json

PUT _plugins/_security/api/roles/limited_access
{
  "index_permissions": [
    {
      "index_patterns": ["users"],
      "allowed_actions": ["read"],
      "fls": ["name", "email", "created_at"],
      "dls": "{\"bool\": {\"must\": [{\"term\": {\"department\": \"engineering\"}}]}}",
      "masked_fields": ["email"]
    }
  ]
}

`fls` (Field-Level Security): array of allowed fields. Prefix a field with `~` to exclude it instead.
`dls` (Document-Level Security): a query DSL clause, passed as an escaped JSON string, that restricts which documents the role can see.
`masked_fields`: field values are hashed in query results (useful for PII).
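
The exclusion form of `fls` can be sketched as follows. The role name and the `phone` field are hypothetical and used only for illustration; every field except the excluded ones remains visible:

```json
PUT _plugins/_security/api/roles/users_without_contact_details
{
  "index_permissions": [
    {
      "index_patterns": ["users"],
      "allowed_actions": ["read"],
      "fls": ["~email", "~phone"]
    }
  ]
}
```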

Role Mapping

Roles are mapped to users via role mappings (not assigned directly on users):

json

PUT _plugins/_security/api/rolesmapping/products_reader
{
  "users": ["analyst_jane"],
  "backend_roles": ["analysts"],
  "hosts": []
}
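
For the mapping to apply, the user or backend role it names has to exist. A minimal sketch, assuming the internal user database is the authentication backend, creates the user and then checks the result; the password value is a placeholder to replace:

```json
PUT _plugins/_security/api/internalusers/analyst_jane
{
  "password": "<a-strong-password>",
  "backend_roles": ["analysts"]
}

GET _plugins/_security/api/rolesmapping/products_reader
GET _plugins/_security/authinfo
```

Calling `authinfo` while authenticated as the new user shows the roles that were actually resolved for that user.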

Security Key Differences from Elasticsearch

API base path: `_plugins/_security/api/` (not `_security/`)
Uses `allowed_actions` with action groups (not `privileges`)
Uses `fls`, `dls`, and `masked_fields` (not `field_security.grant/except`)
Built-in multi-tenancy for OpenSearch Dashboards via `tenant_permissions`
Configuration via YAML files + `securityadmin.sh`, or the REST API
Predefined action groups: `read`, `write`, `search`, `crud`, `manage`, `create_index`, `indices_all`, `cluster_monitor`, `cluster_all`

Aliases and Reindexing

Index Aliases

json

POST _aliases
{
  "actions": [
    { "add": { "index": "products_v2", "alias": "products" } },
    { "remove": { "index": "products_v1", "alias": "products" } }
  ]
}
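
Because both actions are submitted in one request, the swap is applied atomically. Aliases can also carry a filter or designate a write index; the sketch below shows both, using hypothetical alias names (`products_electronics`, `products_write`):

```json
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "products_v2",
        "alias": "products_electronics",
        "filter": { "term": { "category": "electronics" } }
      }
    },
    {
      "add": {
        "index": "products_v2",
        "alias": "products_write",
        "is_write_index": true
      }
    }
  ]
}
```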

Reindex with Transformation

json

POST _reindex
{
  "source": {
    "index": "products_v1"
  },
  "dest": {
    "index": "products_v2"
  },
  "script": {
    "source": "ctx._source.migrated_at = new Date().toString()"
  }
}
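
For larger indices, one possible variant runs the reindex as a background task and passes the timestamp in as a script parameter, so that every document receives the same ISO 8601 value. The timestamp below is an example, and `<task_id>` stands for the task identifier returned by the first call:

```json
POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": { "index": "products_v1" },
  "dest": { "index": "products_v2", "op_type": "create" },
  "script": {
    "source": "ctx._source.migrated_at = params.migrated_at",
    "params": { "migrated_at": "2024-01-15T00:00:00Z" }
  }
}

GET _tasks/<task_id>
```

With `op_type` set to `create`, documents that already exist in the destination are skipped, and `conflicts: proceed` keeps those skips from aborting the task.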

Notifications

The Notifications plugin centralizes notification channels for Alerting, ISM, and Anomaly Detection:

json

POST /_plugins/_notifications/configs
{
  "config": {
    "name": "ops-slack",
    "description": "Slack channel for ops alerts",
    "config_type": "slack",
    "is_enabled": true,
    "slack": {
      "url": "https://hooks.slack.com/services/xxx/yyy/zzz"
    }
  }
}

Supports: Slack, Amazon Chime, Microsoft Teams, Amazon SNS, Amazon SES, email (SMTP), and custom webhooks.

Cross-Cluster Replication

json

PUT /_plugins/_replication/follower-index/_start
{
  "leader_alias": "leader-cluster",
  "leader_index": "my-index",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}

Follower indices are read-only
Use autofollow patterns for automatic replication of new indices

Monitor replication lag:

GET /_plugins/_replication/<index>/_status
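
An autofollow rule can be sketched as below. The rule name and index pattern are hypothetical; the connection alias and roles match the start request above:

```json
POST /_plugins/_replication/_autofollow
{
  "leader_alias": "leader-cluster",
  "name": "products-autofollow",
  "pattern": "products-*",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}
```

Indices on the leader cluster whose names match the pattern are then replicated to the follower cluster without a separate `_start` call per index.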

OpenSearch Plugin API Reference

All OpenSearch-specific features use the `_plugins/` prefix:

Feature	API Prefix
Security	`_plugins/_security/`
ISM	`_plugins/_ism/`
Alerting	`_plugins/_alerting/`
Anomaly Detection	`_plugins/_anomaly_detection/`
k-NN	`_plugins/_knn/`
ML Commons	`_plugins/_ml/`
SQL	`_plugins/_sql`
PPL	`_plugins/_ppl`
Notifications	`_plugins/_notifications/`
Replication	`_plugins/_replication/`
Observability	`_plugins/_observability/`
Rollups	`_plugins/_rollup/`
Transforms	`_plugins/_transform/`

The legacy `_opendistro/` prefix is deprecated. Always use `_plugins/` in new code.
