kibana-alerting-rules

Kibana Alerting Rules

Core Concepts

A rule has three parts: conditions (what to detect), schedule (how often to check), and actions (what happens when conditions are met). When conditions are met, the rule creates alerts, which trigger actions via connectors.

Authentication

All alerting API calls require either API key auth or Basic auth. Every mutating request must include the `kbn-xsrf` header.

```http
kbn-xsrf: true
```
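
As a rough sketch of the header rule above, a small helper can decide which headers a request needs based on its HTTP method. The function name `kibana_headers` and the `KIBANA_API_KEY` variable are illustrative assumptions, not part of Kibana.

```bash
#!/usr/bin/env bash
# Print the headers a request needs, one per line.
# kibana_headers and KIBANA_API_KEY are hypothetical names for this sketch.
kibana_headers() {
  local method="${1:-GET}"
  printf 'Authorization: ApiKey %s\n' "${KIBANA_API_KEY:-<your-api-key>}"
  case "$method" in
    POST|PUT|DELETE|PATCH)
      # Mutating requests must also carry the kbn-xsrf header.
      printf 'kbn-xsrf: true\n'
      ;;
  esac
}

kibana_headers POST
```

Each printed line can be passed to curl with `-H`; read-only `GET` calls only get the `Authorization` line.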

Required Privileges

  • `all` privileges for the appropriate Kibana feature (e.g., Stack Rules, Observability, Security)
  • `read` privileges for Actions and Connectors (to attach actions to rules)

API Reference

Base path: `<kibana_url>/api/alerting` (or `/s/<space_id>/api/alerting` for non-default spaces).

| Operation | Method | Endpoint |
| --- | --- | --- |
| Create rule | POST | `/api/alerting/rule/{id}` |
| Update rule | PUT | `/api/alerting/rule/{id}` |
| Get rule | GET | `/api/alerting/rule/{id}` |
| Delete rule | DELETE | `/api/alerting/rule/{id}` |
| Find rules | GET | `/api/alerting/rules/_find` |
| List rule types | GET | `/api/alerting/rule_types` |
| Enable rule | POST | `/api/alerting/rule/{id}/_enable` |
| Disable rule | POST | `/api/alerting/rule/{id}/_disable` |
| Mute all alerts | POST | `/api/alerting/rule/{id}/_mute_all` |
| Unmute all alerts | POST | `/api/alerting/rule/{id}/_unmute_all` |
| Mute alert | POST | `/api/alerting/rule/{rule_id}/alert/{alert_id}/_mute` |
| Unmute alert | POST | `/api/alerting/rule/{rule_id}/alert/{alert_id}/_unmute` |
| Update API key | POST | `/api/alerting/rule/{id}/_update_api_key` |
| Create snooze | POST | `/api/alerting/rule/{id}/snooze_schedule` |
| Delete snooze | DELETE | `/api/alerting/rule/{ruleId}/snooze_schedule/{scheduleId}` |
| Health check | GET | `/api/alerting/_health` |
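
The space-aware base path described above can be sketched as a small helper. The function name `alerting_base` is a hypothetical choice for illustration; it simply prefixes `/s/<space_id>` for any space other than `default`.

```bash
#!/usr/bin/env bash
# Build the alerting API base path for a Kibana URL and an optional space.
# alerting_base is an illustrative helper, not a Kibana or curl feature.
alerting_base() {
  local kibana_url="${1%/}"        # drop one trailing slash, if present
  local space_id="${2:-default}"
  if [ "$space_id" = "default" ]; then
    printf '%s/api/alerting\n' "$kibana_url"
  else
    printf '%s/s/%s/api/alerting\n' "$kibana_url" "$space_id"
  fi
}

alerting_base "https://my-kibana:5601"
alerting_base "https://my-kibana:5601/" "team-platform"
```

Endpoint suffixes from the table (for example `/rule/{id}/_enable`) can then be appended to the printed base.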

Creating a Rule

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Display name (does not need to be unique) |
| `rule_type_id` | string | The rule type (e.g., `.es-query`, `.index-threshold`) |
| `consumer` | string | Owning app: `alerts`, `apm`, `discover`, `infrastructure`, `logs`, `metrics`, `ml`, `monitoring`, `securitySolution`, `siem`, `stackAlerts`, `uptime` |
| `params` | object | Rule-type-specific parameters |
| `schedule` | object | Check interval, e.g., `{"interval": "5m"}` |

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| `actions` | array | Actions to run when conditions are met (each references a connector) |
| `tags` | array | Tags for organizing rules |
| `enabled` | boolean | Whether the rule runs immediately (default: true) |
| `notify_when` | string | `onActionGroupChange`, `onActiveAlert`, or `onThrottleInterval` (prefer setting per-action instead) |
| `alert_delay` | object | Alert only after N consecutive matches, e.g., `{"active": 3}` |
| `flapping` | object/null | Override flapping detection settings |
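
To show how the optional fields sit next to the required ones, here is a minimal sketch of a request body that creates a rule in a disabled state with an alert delay. The helper name `rule_body_with_options` is hypothetical, and the field values are illustrative, reusing the Elasticsearch query parameters from this document.

```bash
#!/usr/bin/env bash
# Print a create-rule body that combines required fields with a few optional
# ones. rule_body_with_options is a hypothetical helper; values are examples.
rule_body_with_options() {
  cat <<'JSON'
{
  "name": "High error rate",
  "rule_type_id": ".es-query",
  "consumer": "stackAlerts",
  "schedule": { "interval": "5m" },
  "params": {
    "index": ["logs-*"],
    "timeField": "@timestamp",
    "esQuery": "{\"query\":{\"match\":{\"log.level\":\"error\"}}}",
    "threshold": [100],
    "thresholdComparator": ">",
    "timeWindowSize": 5,
    "timeWindowUnit": "m",
    "size": 100
  },
  "enabled": false,
  "tags": ["production", "errors"],
  "alert_delay": { "active": 3 }
}
JSON
}

# To send it (not executed here):
#   rule_body_with_options | curl -X POST "https://my-kibana:5601/api/alerting/rule" \
#     -H "kbn-xsrf: true" -H "Content-Type: application/json" \
#     -H "Authorization: ApiKey <your-api-key>" --data-binary @-
```

Creating the rule with `"enabled": false` leaves time to review it before it starts running; a later call to the `_enable` endpoint starts it.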

Example: Create an Elasticsearch Query Rule

```bash
curl -X POST "https://my-kibana:5601/api/alerting/rule/my-rule-id" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey <your-api-key>" \
  -d '{
    "name": "High error rate",
    "rule_type_id": ".es-query",
    "consumer": "stackAlerts",
    "schedule": { "interval": "5m" },
    "params": {
      "index": ["logs-*"],
      "timeField": "@timestamp",
      "esQuery": "{\"query\":{\"match\":{\"log.level\":\"error\"}}}",
      "threshold": [100],
      "thresholdComparator": ">",
      "timeWindowSize": 5,
      "timeWindowUnit": "m",
      "size": 100
    },
    "actions": [
      {
        "id": "my-slack-connector-id",
        "group": "query matched",
        "params": {
          "message": "Alert: {{rule.name}} - {{context.hits}} hits detected"
        },
        "frequency": {
          "summary": false,
          "notify_when": "onActionGroupChange"
        }
      }
    ],
    "tags": ["production", "errors"]
  }'
```

The same structure applies to other rule types: set the appropriate `rule_type_id` (e.g., `.index-threshold`, `.es-query`) and provide the matching `params` object. Use `GET /api/alerting/rule_types` to discover params schemas.

Updating a Rule

`PUT /api/alerting/rule/{id}`: send the complete rule body. `rule_type_id` and `consumer` are immutable after creation. Returns 409 Conflict if another user updated the rule concurrently; re-fetch and retry.
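
The re-fetch-and-retry advice for 409 responses might look like the sketch below. It assumes `KIBANA_URL` and `KIBANA_API_KEY` are set by the caller, and that `build_rule_body` is a caller-supplied function (hypothetical here) that fetches the latest rule and prints the complete updated body on stdout.

```bash
#!/usr/bin/env bash
# Retry a full-body PUT when Kibana answers 409 Conflict.
# Assumptions: KIBANA_URL and KIBANA_API_KEY are set by the caller, and
# build_rule_body is a caller-supplied function that re-fetches the rule and
# prints the complete, updated body on stdout.
update_rule_with_retry() {
  local rule_id="$1" max_attempts="${2:-3}" attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$(build_rule_body "$rule_id" | curl -s -o /dev/null -w '%{http_code}' \
      -X PUT "${KIBANA_URL}/api/alerting/rule/${rule_id}" \
      -H "kbn-xsrf: true" \
      -H "Content-Type: application/json" \
      -H "Authorization: ApiKey ${KIBANA_API_KEY}" \
      --data-binary @-)
    if [ "$status" != "409" ]; then
      echo "$status"   # report the final HTTP status to the caller
      return 0
    fi
    sleep 1            # brief pause before rebuilding the body and retrying
  done
  echo "409"
  return 1
}
```

Because the body is rebuilt on every attempt, each retry starts from the latest stored version of the rule rather than resending a stale copy.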

Finding Rules

```bash
curl -X GET "https://my-kibana:5601/api/alerting/rules/_find?per_page=20&page=1&search=cpu&sort_field=name&sort_order=asc" \
  -H "Authorization: ApiKey <your-api-key>"
```

Query parameters: `per_page`, `page`, `search`, `default_search_operator`, `search_fields`, `sort_field`, `sort_order`, `has_reference`, `fields`, `filter`, `filter_consumers`.

Use the `filter` parameter with KQL syntax for advanced queries:

```text
filter=alert.attributes.tags:"production"
```
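
Because a KQL filter contains quotes and colons, it usually needs URL encoding. One way to handle that, sketched here, is to let curl build the query string with `-G` and `--data-urlencode`. The function name `find_rules_by_tag` and the `KIBANA_URL` and `KIBANA_API_KEY` variables are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Find rules by tag, letting curl URL-encode the KQL filter.
# find_rules_by_tag is a hypothetical helper; KIBANA_URL and KIBANA_API_KEY
# are assumed to be set by the caller.
find_rules_by_tag() {
  local tag="$1" page="${2:-1}"
  curl -s -G "${KIBANA_URL}/api/alerting/rules/_find" \
    -H "Authorization: ApiKey ${KIBANA_API_KEY}" \
    --data-urlencode "filter=alert.attributes.tags:\"${tag}\"" \
    --data-urlencode "sort_field=name" \
    --data-urlencode "per_page=20" \
    --data-urlencode "page=${page}"
}
```

With `-G`, curl appends the encoded pairs to the URL and sends a GET request, so no `kbn-xsrf` header is needed for this read-only call.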

Lifecycle Operations

```bash
# Enable
curl -X POST ".../api/alerting/rule/{id}/_enable" -H "kbn-xsrf: true"

# Disable
curl -X POST ".../api/alerting/rule/{id}/_disable" -H "kbn-xsrf: true"

# Mute all alerts
curl -X POST ".../api/alerting/rule/{id}/_mute_all" -H "kbn-xsrf: true"

# Mute specific alert
curl -X POST ".../api/alerting/rule/{rule_id}/alert/{alert_id}/_mute" -H "kbn-xsrf: true"

# Delete
curl -X DELETE ".../api/alerting/rule/{id}" -H "kbn-xsrf: true"
```
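
The POST-based lifecycle calls share one shape, so they can be wrapped in a single helper. This is only a sketch: `rule_op` is a hypothetical name, and `KIBANA_URL` and `KIBANA_API_KEY` are assumed to be set by the caller.

```bash
#!/usr/bin/env bash
# Run one of the POST lifecycle operations against a rule.
# rule_op is a hypothetical helper; KIBANA_URL and KIBANA_API_KEY are
# assumed to be set by the caller.
rule_op() {
  local rule_id="$1" op="$2"
  case "$op" in
    enable|disable|mute_all|unmute_all) ;;
    *)
      echo "unsupported operation: $op" >&2
      return 2
      ;;
  esac
  curl -s -X POST "${KIBANA_URL}/api/alerting/rule/${rule_id}/_${op}" \
    -H "kbn-xsrf: true" \
    -H "Authorization: ApiKey ${KIBANA_API_KEY}"
}

# Example (not executed here): rule_op abc123 disable
```

Rejecting unknown operation names up front avoids sending a request to an endpoint that does not exist.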

Terraform Provider

Use the `elasticstack` provider resource `elasticstack_kibana_alerting_rule`.

```hcl
terraform {
  required_providers {
    elasticstack = {
      source = "elastic/elasticstack"
    }
  }
}

provider "elasticstack" {
  kibana {
    endpoints = ["https://my-kibana:5601"]
    api_key   = var.kibana_api_key
  }
}

resource "elasticstack_kibana_alerting_rule" "cpu_alert" {
  name         = "CPU usage critical"
  consumer     = "stackAlerts"
  rule_type_id = ".index-threshold"
  interval     = "1m"
  enabled      = true

  params = jsonencode({
    index               = ["metrics-*"]
    timeField           = "@timestamp"
    aggType             = "avg"
    aggField            = "system.cpu.total.pct"
    groupBy             = "top"
    termField           = "host.name"
    termSize            = 10
    threshold           = [0.9]
    thresholdComparator = ">"
    timeWindowSize      = 5
    timeWindowUnit      = "m"
  })

  tags = ["infrastructure", "production"]
}
```

Key Terraform notes:
  • `params` must be passed as a JSON-encoded string via `jsonencode()`
  • Use the `elasticstack_kibana_action_connector` data source or resource to reference connector IDs in actions
  • Import existing rules: `terraform import elasticstack_kibana_alerting_rule.my_rule <space_id>/<rule_id>` (use `default` for the default space)

Triggering Kibana Workflows from Rules

Preview feature: available from Elastic Stack 9.3 and Elastic Cloud Serverless. APIs may change.

Attach a workflow as a rule action using the workflow ID as the connector ID. Set `params: {}`; alert context flows automatically through the `event` object inside the workflow.

```bash
curl -X PUT "https://my-kibana:5601/api/alerting/rule/my-rule-id" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey <your-api-key>" \
  -d '{
    "name": "High error rate",
    "schedule": { "interval": "5m" },
    "params": { ... },
    "actions": [
      {
        "id": "<workflow-id>",
        "group": "query matched",
        "params": {},
        "frequency": { "summary": false, "notify_when": "onActionGroupChange" }
      }
    ]
  }'
```

In the UI: Stack Management > Rules > Actions > Workflows. Only `enabled: true` workflows appear in the picker.

For workflow YAML structure, `{{ event }}` context fields, step types, and patterns, refer to the `kibana-connectors` skill if available.

Connectors and Actions in Rules

Each action references a connector by ID and carries an action `group`, action `params` (using Mustache templates), and a per-action `frequency` object. Key fields:
  • `group`: which trigger state fires this action (e.g., `"query matched"`, `"Recovered"`). Discover valid groups via `GET /api/alerting/rule_types`.
  • `frequency.summary`: `true` for a digest of all alerts; `false` for per-alert.
  • `frequency.notify_when`: `onActionGroupChange` | `onActiveAlert` | `onThrottleInterval`.
  • `frequency.throttle`: minimum repeat interval (e.g., `"10m"`); only applies with `onThrottleInterval`.

For full reference on action structure, Mustache variables (`{{rule.name}}`, `{{context.*}}`, `{{alerts.new.count}}`), Mustache lambdas (`EvalMath`, `FormatDate`, `ParseHjson`), recovery actions, and multi-channel patterns, refer to the `kibana-connectors` skill if available.
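
The relationship between `notify_when` and `throttle` can be made concrete with a small builder for one action object. This is a sketch under assumptions: `action_json` is a hypothetical helper, the caller passes connector-specific `params` as ready-made JSON, and the `throttle` key is simply omitted unless `onThrottleInterval` is used.

```bash
#!/usr/bin/env bash
# Print one action object for a rule's "actions" array.
# action_json is a hypothetical helper; params_json must already be valid JSON.
action_json() {
  local connector_id="$1" group="$2" notify_when="$3" params_json="$4" throttle="${5:-}"
  local frequency="\"summary\":false,\"notify_when\":\"${notify_when}\""
  if [ "$notify_when" = "onThrottleInterval" ]; then
    if [ -z "$throttle" ]; then
      echo "onThrottleInterval needs a throttle such as 10m" >&2
      return 2
    fi
    frequency="${frequency},\"throttle\":\"${throttle}\""
  elif [ -n "$throttle" ]; then
    echo "throttle only applies with onThrottleInterval" >&2
    return 2
  fi
  printf '{"id":"%s","group":"%s","params":%s,"frequency":{%s}}\n' \
    "$connector_id" "$group" "$params_json" "$frequency"
}

action_json "my-slack-connector-id" "query matched" onThrottleInterval \
  '{"message":"Alert: {{rule.name}}"}' "10m"
```

The helper refuses a throttle paired with any other `notify_when` value, which mirrors the rule that `throttle` only applies with `onThrottleInterval`.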

Best Practices

  1. Set action frequency per action, not per rule. The `notify_when` field at the rule level is deprecated in favor of per-action `frequency` objects. If you set it at the rule level and later edit the rule in the Kibana UI, it is automatically converted to action-level values.
  2. Use alert summaries to reduce notification noise. Instead of sending one notification per alert, configure actions to send periodic summaries at a custom interval. Use `"summary": true` and set a `throttle` interval. This is especially valuable for rules that monitor many hosts or documents.
  3. Choose the right action frequency for each channel. Use `onActionGroupChange` for paging/ticketing systems (fire once, resolve once). Use `onActiveAlert` for audit logging to an Index connector. Use `onThrottleInterval` with a throttle like `"30m"` for dashboards or lower-priority notifications.
  4. Always add a recovery action. Rules without a recovery action leave incidents open in PagerDuty, Jira, and ServiceNow indefinitely. Use the connector's native close/resolve event action (e.g., `eventAction: "resolve"` for PagerDuty) in the `Recovered` action group.
  5. Set a reasonable check interval. The minimum recommended interval is `1m`. Very short intervals across many rules clog Task Manager throughput and increase schedule drift. The server setting `xpack.alerting.rules.minimumScheduleInterval.value` defines this minimum; it is only a hard limit when `xpack.alerting.rules.minimumScheduleInterval.enforce` is enabled, otherwise shorter intervals are accepted with a logged warning.
  6. Use `alert_delay` to suppress transient spikes. Setting `{"active": 3}` means the alert only fires after 3 consecutive runs match the condition, filtering out brief anomalies.
  7. Enable flapping detection. Alerts that rapidly switch between active and recovered are marked as "flapping" and notifications are suppressed. This is on by default but can be tuned per-rule with the `flapping` object.
  8. Use `server.publicBaseUrl` for deep links. Set `server.publicBaseUrl` in `kibana.yml` so that `{{rule.url}}` and `{{kibanaBaseUrl}}` variables resolve to valid URLs in notifications.
  9. Tag rules consistently. Use tags like `production`, `staging`, `team-platform` for filtering and organization in the Find API and UI.
  10. Use Kibana Spaces to isolate rules by team or environment. Prefix API paths with `/s/<space_id>/` for non-default spaces. Connectors are also space-scoped, so create matching connectors in each space.
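
The check-interval advice can be verified before a rule is submitted. The sketch below converts an interval such as `30s` or `5m` to seconds and compares it with a minimum. The function names are hypothetical, and the supported units (`s`, `m`, `h`, `d`) are an assumption about the interval format.

```bash
#!/usr/bin/env bash
# Convert an interval such as 30s, 5m, 1h, or 1d to seconds.
# interval_seconds and meets_minimum_interval are hypothetical helpers.
interval_seconds() {
  local interval="$1" number unit
  number="${interval%[smhd]}"      # strip one trailing unit letter
  unit="${interval#"$number"}"     # whatever is left is the unit
  case "$number" in
    ''|*[!0-9]*) echo "invalid interval: $interval" >&2; return 2 ;;
  esac
  case "$unit" in
    s) echo $((10#$number)) ;;
    m) echo $((10#$number * 60)) ;;
    h) echo $((10#$number * 3600)) ;;
    d) echo $((10#$number * 86400)) ;;
    *) echo "invalid interval: $interval" >&2; return 2 ;;
  esac
}

# Succeed when the interval is at least the minimum (default 1m).
meets_minimum_interval() {
  local actual minimum
  actual=$(interval_seconds "$1") || return 2
  minimum=$(interval_seconds "${2:-1m}") || return 2
  [ "$actual" -ge "$minimum" ]
}

if meets_minimum_interval "30s"; then
  echo "30s is acceptable"
else
  echo "30s is below the recommended 1m minimum"
fi
```

A check like this is cheap to run in a deployment script before any rule body is sent to Kibana.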

Common Pitfalls

  1. Missing `kbn-xsrf` header. All POST, PUT, DELETE requests require `kbn-xsrf: true` or any truthy value. Omitting it returns a 400 error.
  2. Wrong `consumer` value. Using an invalid consumer (e.g., `observability` instead of `infrastructure`) causes a 400 error. Check the rule type's supported consumers via `GET /api/alerting/rule_types`.
  3. Immutable fields on update. You cannot change `rule_type_id` or `consumer` with PUT. You must delete and recreate the rule.
  4. Rule-level `notify_when` and `throttle` are deprecated. Setting these at the rule level still works but conflicts with action-level frequency settings. Always use `frequency` inside each action object.
  5. Rule ID conflicts. POST to `/api/alerting/rule/{id}` with an existing ID returns 409. Either omit the ID to auto-generate, or check existence first.
  6. API key ownership. Rules run using the API key of the user who created or last updated them. If that user's permissions change or the user is deleted, the rule may fail silently. Use `_update_api_key` to re-associate.
  7. Too many actions per rule. Rules generating thousands of alerts with multiple actions can clog Task Manager. The server setting `xpack.alerting.rules.run.actions.max` (default varies) limits actions per run. Design rules to use alert summaries or limit term sizes.
  8. Long-running rules. Rules that run expensive queries are cancelled after `xpack.alerting.rules.run.timeout` (default `5m`). When cancelled, all alerts and actions from that run are discarded. Optimize queries or increase the timeout for specific rule types.
  9. Concurrent update conflicts. PUT returns 409 if the rule was modified by another user since you last read it. Always GET the latest version before updating.
  10. Import/export loses secrets. Rules exported via Saved Objects are disabled on import. Connectors lose their secrets and must be re-configured.
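
For the rule ID conflict pitfall, checking existence first could be sketched as follows. `rule_exists` is a hypothetical helper, `KIBANA_URL` and `KIBANA_API_KEY` are assumed to be set by the caller, and the sketch treats only an HTTP 200 from the Get rule endpoint as "exists".

```bash
#!/usr/bin/env bash
# Succeed when a rule with the given ID already exists.
# rule_exists is a hypothetical helper; KIBANA_URL and KIBANA_API_KEY are
# assumed to be set by the caller. Only HTTP 200 counts as "exists".
rule_exists() {
  local rule_id="$1" status
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    "${KIBANA_URL}/api/alerting/rule/${rule_id}" \
    -H "Authorization: ApiKey ${KIBANA_API_KEY}")
  [ "$status" = "200" ]
}

# Example (not executed here):
#   if rule_exists my-rule-id; then echo "use PUT to update"; else echo "safe to POST"; fi
```

The check is not atomic, so a 409 on the later POST is still possible if another client creates the same ID in between.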

Examples

Create a threshold alert: "Alert me when CPU exceeds 90% on any host for 5 minutes." Use `rule_type_id: ".index-threshold"`, `aggField: "system.cpu.total.pct"`, `threshold: [0.9]`, and `timeWindowSize: 5`. Attach a PagerDuty action on `"threshold met"` and a matching `Recovered` action to auto-close incidents.

Find rules by tag: "Show all production alerting rules." Call `GET /api/alerting/rules/_find` with `filter=alert.attributes.tags:"production"` and `sort_field=name`, then use `page` and `per_page` to page through results.

Pause a rule temporarily: "Disable rule abc123 until next Monday." Call `POST /api/alerting/rule/abc123/_disable`. Re-enable with `_enable` when ready; the rule retains all configuration while disabled.

Guidelines

  • Include `kbn-xsrf: true` on every POST, PUT, and DELETE; omitting it returns 400.
  • Set `frequency` inside each action object; rule-level `notify_when` and `throttle` are deprecated.
  • `rule_type_id` and `consumer` are immutable after creation; delete and recreate the rule to change them.
  • Prefix paths with `/s/<space_id>/api/alerting/` for non-default Kibana Spaces.
  • Always pair an active action with a `Recovered` action to auto-close PagerDuty, Jira, and ServiceNow incidents.
  • Run `GET /api/alerting/rule_types` first to discover valid `consumer` values and action group names.
  • Use `alert_delay` to suppress transient spikes; use the `flapping` object to reduce noise from unstable conditions.

Additional Resources
