Elasticsearch Best Practices

Core Principles

  • Design indices and mappings based on query patterns
  • Optimize for search performance with proper analysis and indexing
  • Use appropriate shard sizing and cluster configuration
  • Implement proper security and access control
  • Monitor cluster health and optimize queries

Index Design

Mapping Best Practices

  • Define explicit mappings instead of relying on dynamic mapping
  • Use appropriate data types for each field
  • Disable indexing for fields you do not search on
  • Use keyword type for exact matches, text for full-text search
json
{
  "mappings": {
    "properties": {
      "product_id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date"
      },
      "metadata": {
        "type": "object",
        "enabled": false
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
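
The price field above uses scaled_float, which stores the number as an integer after multiplying by scaling_factor. The sketch below illustrates that arithmetic only; the helper names are hypothetical, and Python's round is an approximation of the rounding Elasticsearch applies internally.

```python
def to_scaled_long(value: float, scaling_factor: int = 100) -> int:
    """Approximate the stored form of a scaled_float: value * factor, rounded."""
    return round(value * scaling_factor)


def from_scaled_long(stored: int, scaling_factor: int = 100) -> float:
    """Recover the value that searches and aggregations would effectively see."""
    return stored / scaling_factor


if __name__ == "__main__":
    stored = to_scaled_long(99.99)
    print(stored, from_scaled_long(stored))
    # Precision finer than 1 / scaling_factor does not survive the round trip.
    print(from_scaled_long(to_scaled_long(19.999)))
```

With a factor of 100, anything beyond two decimal places is rounded away, so the factor should match the precision the data actually needs.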

Field Types

  • keyword: Exact values, filtering, aggregations, sorting
  • text: Full-text search with analysis
  • date: Date/time values with format specification
  • numeric types: long, integer, short, byte, double, float, scaled_float
  • boolean: True/false values
  • geo_point: Latitude/longitude pairs
  • nested: Arrays of objects that need independent querying

Index Settings

json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "synonym_filter"]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": ["laptop, notebook", "phone, mobile, smartphone"]
        }
      }
    }
  }
}

Shard Sizing

Guidelines

  • Target 20-40GB per shard
  • Stay at or below ~20 shards per GB of heap (an upper bound, not a target)
  • Avoid oversharding (too many small shards)
  • Consider time-based indices for time-series data
json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
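
The guidelines above reduce to simple arithmetic. The following is a minimal sketch, assuming a 30 GB target (the midpoint of the 20-40GB range) and the ~20 shards per GB of heap ceiling; the function names and defaults are illustrative, not part of any Elasticsearch API.

```python
import math


def primary_shard_count(total_data_gb: float, target_shard_gb: float = 30.0) -> int:
    """Number of primary shards needed to keep each shard near the target size."""
    if total_data_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(total_data_gb / target_shard_gb))


def max_shards_for_heap(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Rough ceiling on shards a node should hold for a given JVM heap size."""
    return int(heap_gb * shards_per_gb)


if __name__ == "__main__":
    primaries = primary_shard_count(90)      # about 90 GB of expected data
    total_shards = primaries * (1 + 1)       # primaries plus one replica each
    print(primaries, total_shards, max_shards_for_heap(8))
```

For roughly 90 GB of data this yields 3 primaries, which matches the number_of_shards value in the settings above.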

Index Lifecycle Management (ILM)

json
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Query Optimization

Query Types

Match Query (Full-text search)

json
{
  "query": {
    "match": {
      "description": {
        "query": "wireless bluetooth headphones",
        "operator": "and",
        "fuzziness": "AUTO"
      }
    }
  }
}

Term Query (Exact match)

json
{
  "query": {
    "term": {
      "status": "active"
    }
  }
}

Bool Query (Combining queries)

json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "gte": 500, "lte": 2000 } } }
      ],
      "should": [
        { "term": { "brand": "apple" } }
      ],
      "must_not": [
        { "term": { "status": "discontinued" } }
      ]
    }
  }
}

Query Best Practices

  • Use filter context for non-scoring queries (cacheable)
  • Use the must clause only when scoring is needed
  • Avoid wildcards at the beginning of terms
  • Use keyword fields for exact matches
  • Limit result size with the size parameter
json
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "search terms",
          "fields": ["name^3", "description", "tags^2"],
          "type": "best_fields"
        }
      },
      "filter": [
        { "term": { "active": true } },
        { "range": { "created_at": { "gte": "now-30d" } } }
      ]
    }
  },
  "size": 20,
  "from": 0,
  "_source": ["name", "price", "category"]
}
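
One way to keep these rules from being forgotten in application code is to build the request body through a small helper that always routes yes/no conditions into filter context. This is a sketch only: build_search_body and its parameters are hypothetical names, and the resulting dict would still need to be sent with whatever client you use.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_body(
    text: str,
    fields: List[str],
    filters: List[Dict[str, Any]],
    size: int = 20,
    source_fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Put the scored full-text clause in must and every exact condition in filter."""
    body: Dict[str, Any] = {
        "query": {
            "bool": {
                "must": {"multi_match": {"query": text, "fields": fields}},
                "filter": list(filters),  # not scored, eligible for caching
            }
        },
        "size": size,  # always bound the result size
    }
    if source_fields is not None:
        body["_source"] = list(source_fields)  # return only the fields needed
    return body


if __name__ == "__main__":
    body = build_search_body(
        "search terms",
        ["name^3", "description"],
        [{"term": {"active": True}}],
        source_fields=["name", "price"],
    )
    print(json.dumps(body, indent=2))
```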

Aggregations

Common Aggregation Patterns

json
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500 }
        ]
      }
    },
    "date_histogram": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      }
    }
  }
}

Aggregation Best Practices

  • Use size: 0 when you only need aggregations
  • Set an appropriate shard_size for terms aggregations
  • Use composite aggregations for pagination
  • Consider using filter aggregations to narrow the scope of an aggregation
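
Paginating with a composite aggregation means sending the after_key from one response back as the after parameter of the next request. The helper below sketches how such a request body might be built; the function name, the "pages" aggregation name, and the "by_<field>" source name are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def composite_page_body(
    field: str,
    page_size: int = 100,
    after_key: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one page of a composite terms aggregation over a single field."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"by_" + field: {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    # size: 0 because only the aggregation buckets are wanted, not hits.
    return {"size": 0, "aggs": {"pages": {"composite": composite}}}


if __name__ == "__main__":
    first = composite_page_body("category", page_size=2)
    # In a real run, after_key comes from the previous response's
    # aggregations.pages.after_key; this value is made up.
    second = composite_page_body("category", 2, {"by_category": "electronics"})
    print(first)
    print(second)
```

Paging stops when a response no longer contains an after_key or returns no buckets.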

Indexing Best Practices

Bulk Indexing

json
POST _bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Product 1", "price": 99.99 }
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "Product 2", "price": 149.99 }

Bulk API Guidelines

  • Use bulk API for batch operations
  • Optimal bulk size: 5-15MB per request
  • Monitor for rejected requests (thread pool queue full)
  • Disable refresh during bulk indexing for better performance
json
PUT /products/_settings
{
  "refresh_interval": "-1"
}

// After bulk indexing:
PUT /products/_settings
{
  "refresh_interval": "1s"
}

POST /products/_refresh
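
To stay inside the 5-15MB guideline, a client can group documents into bulk bodies by serialized size rather than by document count. The generator below is a minimal sketch of that idea, assuming documents arrive as (id, source) pairs; bulk_bodies and the 10 MB default are illustrative choices, and sending each body to the _bulk endpoint is left to the caller.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def bulk_bodies(
    index: str,
    docs: Iterable[Tuple[str, Dict[str, Any]]],
    max_bytes: int = 10 * 1024 * 1024,
) -> Iterator[str]:
    """Yield newline-delimited bulk bodies, each at most max_bytes when possible.

    A single document larger than max_bytes is still emitted, alone.
    """
    lines = []
    size = 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        pair = action + "\n" + json.dumps(source) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)  # every body ends with the required newline


if __name__ == "__main__":
    sample = [(str(i), {"name": f"Product {i}"}) for i in range(5)]
    for body in bulk_bodies("products", sample, max_bytes=150):
        print(len(body.encode("utf-8")), "bytes")
```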

Document Updates

json
POST /products/_update/1
{
  "doc": {
    "price": 89.99,
    "updated_at": "2024-01-15T10:30:00Z"
  }
}

// Update by query
POST /products/_update_by_query
{
  "query": {
    "term": { "category": "electronics" }
  },
  "script": {
    "source": "ctx._source.on_sale = true"
  }
}

Analysis and Tokenization

Custom Analyzers

json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "english_stop",
            "english_stemmer"
          ]
        },
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edge_ngram_filter"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  }
}
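
To make the autocomplete_analyzer above more concrete, the sketch below mimics what an edge_ngram filter with min_gram 2 and max_gram 15 emits for each token. It is an approximation for illustration only: whitespace splitting stands in for the standard tokenizer, and the function names are hypothetical.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> List[str]:
    """Prefixes of token with lengths from min_gram up to max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def autocomplete_terms(text: str) -> List[str]:
    """Lowercase, split on whitespace, then expand each token into prefixes."""
    terms: List[str] = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token))
    return terms


if __name__ == "__main__":
    print(edge_ngrams("wire"))
    print(autocomplete_terms("Wireless Headphones"))
```

Because every prefix is indexed as its own term, a typed prefix such as "wire" matches directly. Such fields are commonly paired with a plain search-time analyzer so that the query text itself is not expanded into n-grams.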

Test Analyzer

json
POST /products/_analyze
{
  "analyzer": "product_analyzer",
  "text": "Wireless Bluetooth Headphones"
}

Search Features

Autocomplete/Suggestions

json
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}

// Query suggestions
{
  "suggest": {
    "product-suggest": {
      "prefix": "wire",
      "completion": {
        "field": "name.suggest",
        "size": 5
      }
    }
  }
}

Highlighting

json
{
  "query": {
    "match": { "description": "wireless" }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fragment_size": 150
      }
    }
  }
}

Performance Optimization

Query Caching

  • Clauses in filter context are eligible for automatic caching
  • Use filter context for frequently repeated conditions
  • Monitor cache hit rates

Search Performance

  • Avoid deep pagination (use search_after instead)
  • Limit the _source fields returned
  • Use doc_values for sorting and aggregations
  • Pre-sort the index for common sort orders
json
{
  "query": { "match_all": {} },
  "size": 20,
  "search_after": [1705329600000, "product_123"],
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}
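
The search_after value in the request above is simply the sort array of the last hit on the previous page. A client-side loop might look like the sketch below, where search is a hypothetical callable that takes a request body and returns the parsed response; it stands in for whichever Elasticsearch client is in use.

```python
from typing import Any, Callable, Dict, Iterator


def iterate_hits(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    base_body: Dict[str, Any],
    page_size: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, paging with search_after instead of from/size."""
    search_after = None
    while True:
        body = dict(base_body, size=page_size)
        if search_after is not None:
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit mark where the next page starts.
        search_after = hits[-1]["sort"]
```

The base body must include a sort with a unique tiebreaker, as in the request above; without one, hits that share a sort value can be skipped or repeated between pages.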

Monitoring and Maintenance

Cluster Health

GET _cluster/health
GET _cat/indices?v
GET _cat/shards?v
GET _nodes/stats

Index Maintenance

POST /products/_forcemerge?max_num_segments=1
POST /products/_cache/clear
POST /products/_refresh

Slow Query Log

json
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}

Security

Index-Level Security

json
PUT _security/role/products_reader
{
  "indices": [
    {
      "names": ["products*"],
      "privileges": ["read"]
    }
  ]
}

Field-Level Security

json
PUT _security/role/limited_access
{
  "indices": [
    {
      "names": ["users"],
      "privileges": ["read"],
      "field_security": {
        "grant": ["name", "email", "created_at"]
      }
    }
  ]
}

Aliases and Reindexing

Index Aliases

json
POST _aliases
{
  "actions": [
    { "add": { "index": "products_v2", "alias": "products" } },
    { "remove": { "index": "products_v1", "alias": "products" } }
  ]
}

Reindex with Transformation

json
POST _reindex
{
  "source": {
    "index": "products_v1"
  },
  "dest": {
    "index": "products_v2"
  },
  "script": {
    "source": "ctx._source.migrated_at = new Date().toString()"
  }
}