axiom-alerting
Axiom Alerting
You manage alerting in Axiom end-to-end: notifiers for routing and monitors for detection.
API Overview
Base URL: `https://api.axiom.co/v2/` with Bearer token auth from `.axiom.toml` (project root or `~/.axiom.toml`).

Monitors (`/v2/monitors`)

| Operation | Method | Path |
|---|---|---|
| List | GET | |
| Get | GET | |
| History | GET | |
| Create | POST | |
| Update | PUT | |
| Delete | DELETE | |

Notifiers (`/v2/notifiers`)

| Operation | Method | Path |
|---|---|---|
| List | GET | |
| Get | GET | |
| Create | POST | |
| Update | PUT | |
| Delete | DELETE | |
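
The base URL and Bearer authentication described above can be sketched with Python's standard library. This is a minimal illustration, not the project's own client: `build_request` is a hypothetical helper, the `X-Axiom-Org-Id` header is an assumption about how `org_id` is passed, and the request is only constructed, never sent.

```python
import json
import urllib.request

BASE_URL = "https://api.axiom.co/v2/"  # base URL from the overview above


def build_request(token, method, path, body=None, org_id=None):
    """Build (but do not send) an authenticated request. Hypothetical helper."""
    url = BASE_URL + path.lstrip("/")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if org_id:
        # Assumption: the organization ID travels in this header.
        req.add_header("X-Axiom-Org-Id", org_id)
    return req


# List monitors: GET on the collection path named in the heading above.
req = build_request("xaat-your-token", "GET", "monitors")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access or a real token.
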
Prerequisites
- Run `scripts/setup`.
- Ensure `.axiom.toml` has a deployment:

```toml
[deployments.prod]
url = "https://api.axiom.co"
token = "xaat-your-token"
org_id = "your-org-id"
```

Scripts
Core:
- `scripts/axiom-api <deploy> <method> <path> [body]`

Monitor scripts:
- `scripts/monitor-list <deployment> [--json]`
- `scripts/monitor-get <deployment> <id>`
- `scripts/monitor-history <deployment> <id> <startTime> <endTime>`
- `scripts/monitor-create <deployment> <json-file>`
- `scripts/monitor-update <deployment> <id> <json-file>`
- `scripts/monitor-delete <deployment> <id>`

Notifier scripts:
- `scripts/notifier-list <deployment> [--json]`
- `scripts/notifier-get <deployment> <id>`
- `scripts/notifier-create <deployment> <json-file>`
- `scripts/notifier-update <deployment> <id> <json-file>`
- `scripts/notifier-delete <deployment> <id>`
Recommended Workflow
- Create the notifier first.
- Create the monitor and set `notifierIds`.
- Validate monitor behavior with `monitor-history`.
- Iterate on monitor thresholds and schedule.
Workflow: End-To-End Alerting
- Run `scripts/setup`.
- List existing notifiers with `scripts/notifier-list <deployment>` and reuse one if appropriate.
- If no suitable notifier exists, create one with `scripts/notifier-create`.
- Create or update the monitor with `notifierIds` attached.
- Validate with `scripts/monitor-history <deployment> <id> <startTime> <endTime>`.
- If behavior is noisy or silent, tune `threshold`, `rangeMinutes`, `intervalMinutes`, and the N-of-M trigger fields.
- Re-check history after each change.
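
The reuse-or-create decision in the steps above can be sketched as two small functions. This assumes that `scripts/notifier-list <deployment> --json` yields a list of objects with `id` and `name` keys, which the source does not confirm; `pick_notifier` and `attach_notifier` are hypothetical names.

```python
def pick_notifier(notifiers, wanted_name):
    """Return the ID of an existing notifier with this name, or None."""
    for notifier in notifiers:
        if notifier.get("name") == wanted_name:
            return notifier.get("id")
    return None


def attach_notifier(monitor, notifier_id):
    """Return a copy of the monitor payload with the notifier ID attached once."""
    ids = list(monitor.get("notifierIds", []))
    if notifier_id not in ids:
        ids.append(notifier_id)
    return {**monitor, "notifierIds": ids}


# Made-up listing standing in for parsed notifier-list output.
existing = [{"id": "ntf-1", "name": "Oncall Email"}]
monitor = {"name": "High Error Count", "type": "Threshold"}

notifier_id = pick_notifier(existing, "Oncall Email")
if notifier_id is None:
    print("no match: create one with scripts/notifier-create")
else:
    monitor = attach_notifier(monitor, notifier_id)
    print(monitor["notifierIds"])
```

Returning a copy instead of mutating the payload keeps the original definition available for comparison after each tuning pass.
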
Best Practices
- Configure one channel per notifier.
- Use `emails` (not `recipients`) for email notifier payloads.
- Prefer `triggerAfterNPositiveResults` / `triggerFromNRuns` for noisy signals.
- Use explicit `bin()` in monitor queries; avoid `bin_auto()` for alert logic.
- For metrics-backed monitors, prefer `mplQuery` for definitions; API responses may include both `aplQuery` and `mplQuery`.
Monitor Types And Operators
Monitor types:
- `Threshold`
- `MatchEvent`
- `AnomalyDetection`

Operators:
- `Above`
- `Below`
- `AboveOrEqual`
- `BelowOrEqual`
- `AboveOrBelow`
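
As a rough illustration of how the four directional operators compare a query result with a threshold, the mapping below uses plain comparison functions. The semantics are assumed from the operator names; `AboveOrBelow` is deliberately left out because the source only shows it on an `AnomalyDetection` monitor and does not define how it evaluates.

```python
import operator as op

# Assumed semantics for the four threshold comparisons listed above.
COMPARISONS = {
    "Above": op.gt,
    "Below": op.lt,
    "AboveOrEqual": op.ge,
    "BelowOrEqual": op.le,
}


def is_positive(operator_name, value, threshold):
    """Return True when `value` breaches `threshold` under the named operator."""
    try:
        compare = COMPARISONS[operator_name]
    except KeyError:
        raise ValueError(f"unsupported operator: {operator_name}") from None
    return compare(value, threshold)


print(is_positive("Above", 120, 100))
print(is_positive("Above", 100, 100))
```

The boundary case matters when tuning: with `Above`, a value exactly equal to the threshold is not a positive result under this reading, while `AboveOrEqual` would count it.
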
Monitor Field Reference
Core fields:
- `name`: Human-readable monitor name.
- `type`: `Threshold`, `MatchEvent`, or `AnomalyDetection`.
- `aplQuery` / `mplQuery`: Query evaluated by the monitor.
- `notifierIds`: Array of notifier IDs to notify.
- `disabled`: Whether the monitor is disabled.
- `disabledUntil`: Optional timestamp for temporary disable/snooze.
- `description`: Optional monitor description.

Threshold and evaluation fields:
- `operator`: Threshold comparison operator.
- `threshold`: Numeric threshold value.
- `rangeMinutes`: Query evaluation window in minutes.
- `intervalMinutes`: Evaluation cadence in minutes.
- `alertOnNoData`: Whether no-data should trigger alerting.
- `triggerAfterNPositiveResults`: Positive evaluations required before firing.
- `triggerFromNRuns`: Total evaluation runs considered for N-of-M logic.

Advanced behavior fields:
- `resolvable`: Whether alerts can resolve automatically.
- `notifyByGroup`: Notify per group key/value result.
- `notifyEveryRun`: Notify on every positive evaluation.
- `skipResolved`: Skip sending resolved notifications.
- `secondDelay`: Delay (seconds) to tolerate late-arriving data.

Type-specific fields:
- `columnName`: Field used by some anomaly/value-anomaly monitors.
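
The N-of-M fields can be read as "fire when at least `triggerAfterNPositiveResults` of the last `triggerFromNRuns` evaluations were positive". The simulation below follows that reading, which is an interpretation of the field descriptions rather than confirmed service behavior; `n_of_m_alerts` is a hypothetical name.

```python
from collections import deque


def n_of_m_alerts(results, n, m):
    """Yield True for each run where at least n of the last m results were positive."""
    window = deque(maxlen=m)  # keeps only the most recent m evaluations
    for positive in results:
        window.append(bool(positive))
        yield sum(window) >= n


# One boolean per evaluation run (True means the condition was met).
runs = [True, False, True, True, False, False]
fired = list(n_of_m_alerts(runs, n=2, m=3))
print(fired)  # [False, False, True, True, True, False]
```

With `n=2, m=3` the isolated positive in the first run does not fire on its own, which is the effect the best practices rely on for noisy signals.
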
Minimal Valid Monitor Examples
Threshold:

```json
{
  "name": "High Error Count",
  "type": "Threshold",
  "aplQuery": "['logs'] | where status >= 500 | summarize count()",
  "operator": "Above",
  "threshold": 100,
  "rangeMinutes": 5,
  "intervalMinutes": 5,
  "notifierIds": ["notifier-id"],
  "triggerAfterNPositiveResults": 2,
  "triggerFromNRuns": 3,
  "disabled": false
}
```

MatchEvent:

```json
{
  "name": "Error Event Match",
  "type": "MatchEvent",
  "aplQuery": "['logs'] | where level == 'error'",
  "rangeMinutes": 5,
  "intervalMinutes": 5,
  "notifierIds": ["notifier-id"],
  "disabled": false
}
```

AnomalyDetection:

```json
{
  "name": "CPU Anomaly",
  "type": "AnomalyDetection",
  "aplQuery": "['metrics'] | summarize avg(cpu_usage)",
  "columnName": "cpu_usage",
  "operator": "AboveOrBelow",
  "rangeMinutes": 5,
  "intervalMinutes": 5,
  "notifierIds": ["notifier-id"],
  "disabled": false
}
```
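
Before submitting a payload like the ones above, a local pre-check can catch missing fields early. The sketch below is not the API's schema: the common fields follow the troubleshooting checklist in this document (name, type, query field, schedule, `notifierIds`), while the per-type extras are an assumption inferred from the three examples. `monitor_problems` is a hypothetical helper.

```python
# Common fields: name, type, schedule, and notifierIds (query is checked separately).
COMMON_REQUIRED = ("name", "type", "rangeMinutes", "intervalMinutes", "notifierIds")

# Assumption: per-type extras inferred from the example payloads.
TYPE_REQUIRED = {
    "Threshold": ("operator", "threshold"),
    "MatchEvent": (),
    "AnomalyDetection": ("operator", "columnName"),
}


def monitor_problems(monitor):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing field: {key}" for key in COMMON_REQUIRED if key not in monitor]
    if "aplQuery" not in monitor and "mplQuery" not in monitor:
        problems.append("missing query: set aplQuery or mplQuery")
    monitor_type = monitor.get("type")
    if monitor_type is not None and monitor_type not in TYPE_REQUIRED:
        problems.append(f"unknown type: {monitor_type}")
    for key in TYPE_REQUIRED.get(monitor_type, ()):
        if key not in monitor:
            problems.append(f"missing field for {monitor_type}: {key}")
    return problems


threshold_monitor = {
    "name": "High Error Count",
    "type": "Threshold",
    "aplQuery": "['logs'] | where status >= 500 | summarize count()",
    "operator": "Above",
    "threshold": 100,
    "rangeMinutes": 5,
    "intervalMinutes": 5,
    "notifierIds": ["notifier-id"],
}
print(monitor_problems(threshold_monitor))
```

A passing check only means the payload has the expected shape locally; the API remains the authority on what it accepts.
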

Minimal Valid Notifier Examples
Email:

```json
{
  "name": "Oncall Email",
  "properties": {
    "email": {
      "emails": ["oncall@example.com"]
    }
  }
}
```

Slack:

```json
{
  "name": "Oncall Slack",
  "properties": {
    "slack": {
      "slackUrl": "https://hooks.slack.com/services/T.../B.../XXX"
    }
  }
}
```

Custom webhook:

```json
{
  "name": "Oncall Custom Webhook",
  "properties": {
    "customWebhook": {
      "url": "https://api.example.com/alerts",
      "body": "{\"action\":\"{{.Action}}\",\"monitorID\":\"{{.MonitorID}}\"}"
    }
  }
}
```
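
The two notifier rules stated in this document (one channel per notifier, and `emails` rather than `recipients` for email) can be checked locally before a create or update. This is a minimal sketch; `notifier_problems` is a hypothetical helper and it covers only those two rules, not the full payload schema.

```python
def notifier_problems(notifier):
    """Return a list of problems found in a notifier payload (empty if none)."""
    problems = []
    if not notifier.get("name"):
        problems.append("missing name")
    properties = notifier.get("properties") or {}
    if len(properties) != 1:
        problems.append(
            f"expected exactly one channel in properties, found {len(properties)}"
        )
    email = properties.get("email")
    if email is not None:
        if "recipients" in email:
            problems.append("email channel uses 'recipients'; use 'emails'")
        if not email.get("emails"):
            problems.append("email channel needs a non-empty 'emails' list")
    return problems


good = {
    "name": "Oncall Email",
    "properties": {"email": {"emails": ["oncall@example.com"]}},
}
bad = {
    "name": "Oncall Mixed",
    "properties": {
        "email": {"recipients": ["oncall@example.com"]},
        "slack": {"slackUrl": "https://hooks.slack.com/services/T.../B.../XXX"},
    },
}
print(notifier_problems(good))
print(len(notifier_problems(bad)))
```
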

Troubleshooting
`401 Unauthorized`:
- Cause: invalid or expired token.
- Fix:
  - Verify the token in `~/.axiom.toml`.
  - Re-run `scripts/setup` and retry: `scripts/notifier-list <deployment>`

`403 Forbidden`:
- Cause: token lacks required permissions.
- Fix:
  - Create or assign token scopes for monitor/notifier management and dataset query access.
  - Retry: `scripts/monitor-list <deployment>`

`404 Not Found` on get/update/delete:
- Cause: wrong monitor/notifier ID or wrong deployment/org.
- Fix:
  - Confirm the deployment in `.axiom.toml`.
  - Re-list objects and use exact IDs: `scripts/monitor-list <deployment> --json` and `scripts/notifier-list <deployment> --json`

`400 Bad Request` on notifier create/update:
- Cause: invalid notifier payload shape.
- Fix:
  - Use one notifier channel inside `properties`.
  - For email, use `emails` (not `recipients`).
  - Validate against a known-good example and retry: `scripts/notifier-create <deployment> <json-file>`

`400 Bad Request` on monitor create/update:
- Cause: invalid monitor schema, operator/type mismatch, or invalid query fields.
- Fix:
  - Validate required fields: `name`, `type`, query field, schedule, and `notifierIds`.
  - Confirm `operator` matches the monitor type and threshold logic.
  - Retry: `scripts/monitor-create <deployment> <json-file>` or `scripts/monitor-update <deployment> <id> <json-file>`

Monitor created but never alerts:
- Cause: threshold too strict, wrong query window, or not enough positive runs.
- Fix:
  - Inspect history over a known active period: `scripts/monitor-history <deployment> <id> <startTime> <endTime>`
  - Reduce the threshold or widen `rangeMinutes`.
  - Tune `triggerAfterNPositiveResults` / `triggerFromNRuns`.

Too many alerts (noisy monitor):
- Cause: threshold too low or interval too short.
- Fix:
  - Increase the threshold.
  - Increase `triggerAfterNPositiveResults` and/or `triggerFromNRuns`.
  - Increase `intervalMinutes` or narrow match conditions.

Notifier exists but no delivery:
- Cause: destination config invalid (URL/key/channel/email list), or destination-side rejection.
- Fix:
  - Fetch the notifier and verify destination fields: `scripts/notifier-get <deployment> <id>`
  - Recreate or update the notifier with corrected properties: `scripts/notifier-update <deployment> <id> <json-file>`
  - Confirm the monitor references the correct notifier IDs.
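
The status-code entries above can be condensed into a small lookup for use in wrapper tooling. This is only a restatement of the causes and fixes listed in this section; `hint_for` is a hypothetical helper, and the wording of the hints is a paraphrase, not output from any Axiom tool.

```python
# Hints paraphrased from the troubleshooting entries above.
HINTS = {
    401: "invalid or expired token: verify the token, then re-run scripts/setup",
    403: "token lacks permissions: add scopes for monitors, notifiers, and dataset queries",
    404: "wrong ID or deployment/org: confirm .axiom.toml and re-list objects with --json",
    400: "invalid payload: compare against a known-good example and check required fields",
}


def hint_for(status):
    """Return a short triage hint for an HTTP status code."""
    return HINTS.get(status, f"status {status} is not covered by this guide")


for code in (401, 404, 500):
    print(code, "->", hint_for(code))
```
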