Building Dashboards

You design dashboards that help humans make decisions quickly. Dashboards are products: audience, questions, and actions matter more than chart count.

Philosophy

  1. Decisions first. Every panel answers a question that leads to an action.
  2. Overview → drilldown → evidence. Start broad, narrow on click/filter, end with raw logs.
  3. Rates and percentiles over averages. Averages hide problems; p95/p99 expose them.
  4. Simple beats dense. One question per panel. No chart junk.
  5. Validate with data. Never guess fields—discover schema first.
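
Here is why averages hide problems, shown on a synthetic latency sample. This is a minimal Python sketch using a simple nearest-rank percentile. That method is an assumption for illustration; APL's percentile functions may estimate values differently.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # multiply before dividing to keep the rank exact
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 98 fast requests and 2 very slow ones.
latencies = [20] * 98 + [3000, 3000]

print(f"avg={mean(latencies):.1f} p50={percentile(latencies, 50)} p99={percentile(latencies, 99)}")
# prints avg=79.6 p50=20 p99=3000
```

The average blends both groups into 79.6 ms, a number that describes no real request. p99 shows that some users wait 3 seconds.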

Entry Points

Choose your starting point:

| Starting from | Workflow |
|---|---|
| Vague description | Intake → design blueprint → APL per panel → deploy |
| Template | Pick template → customize dataset/service/env → deploy |
| Splunk dashboard | Extract SPL → translate via spl-to-apl → map to chart types → deploy |
| Exploration | Use axiom-sre to discover schema/signals → productize into panels |

Intake: What to Ask First

Before designing, clarify:
  1. Audience & decision
    • Oncall triage? (fast refresh, error-focused)
    • Team health? (daily trends, SLO tracking)
    • Exec reporting? (weekly summaries, high-level)
  2. Scope
    • Service, environment, region, cluster, endpoint?
    • Single service or cross-service view?
  3. Datasets
    • Which Axiom datasets contain the data?
    • Run `getschema` to discover fields; never guess:
    ```apl
    ['dataset'] | where _time between (ago(1h) .. now()) | getschema
    ```
  4. Golden signals
    • Traffic: requests/sec, events/min
    • Errors: error rate, 5xx count
    • Latency: p50, p95, p99 duration
    • Saturation: CPU, memory, queue depth, connections
  5. Drilldown dimensions
    • What do users filter/group by? (service, route, status, pod, customer_id)
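
After running `getschema`, compare the fields your panels will reference against what the dataset actually has. This is a minimal Python sketch. It assumes you have copied the field names from the `getschema` result into a list; how you extract them is not covered here. Both lists below are hypothetical.

```python
def missing_fields(required, schema_fields):
    """Return the required fields that are absent from the discovered schema, sorted."""
    available = set(schema_fields)
    return sorted(f for f in required if f not in available)


# Field names the planned panels will reference (hypothetical; use your own).
required = ["service", "status", "route", "duration_ms"]
# Field names copied from a getschema result (hypothetical values).
discovered = ["_time", "service", "status", "route", "latency_ms"]

print(missing_fields(required, discovered))  # prints ['duration_ms']
```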

Dashboard Blueprint

Use this 4-section structure as the default:

1. At-a-Glance (Statistic panels)

Single numbers that answer "is it broken right now?"
  • Error rate (last 5m)
  • p95 latency (last 5m)
  • Request rate (last 5m)
  • Active alerts (if applicable)

2. Trends (TimeSeries panels)

Time-based patterns that answer "what changed?"
  • Traffic over time
  • Error rate over time
  • Latency percentiles over time
  • Stacked by status/service for comparison

3. Breakdowns (Table/Pie panels)

Top-N analysis that answers "where should I look?"
  • Top 10 failing routes
  • Top 10 error messages
  • Worst pods by error rate
  • Request distribution by status

4. Evidence (LogStream + SmartFilter)

Raw events that answer "what exactly happened?"
  • LogStream filtered to errors
  • SmartFilter for service/env/route
  • Key fields projected for readability

Chart Types

Note: Dashboard queries inherit their time range from the UI time picker, so no explicit `_time` filter is needed.
Validation: TimeSeries, Statistic, Table, Pie, LogStream, Note, and MonitorList are fully validated by `dashboard-validate`. Heatmap, ScatterPlot, and SmartFilter work but may trigger warnings.

Statistic

When: Single KPI, current value, threshold comparison.
```apl
['logs']
| where service == "api"
| summarize
    total = count(),
    errors = countif(status >= 500)
// Guard against total == 0 (e.g. a quiet time window), as in the Golden Signal queries
| extend error_rate = iff(total > 0, round(100.0 * errors / total, 2), 0.0)
| project error_rate
```
Pitfalls: Don't use for time series; ensure the query returns a single row.

TimeSeries

When: Trends over time, before/after comparison, rate changes.
```apl
// Single metric: bin_auto sizes bins to the time window,
// so this is a count per bin, not a fixed per-minute rate
['logs']
| summarize requests = count() by bin_auto(_time)

// Latency percentiles: percentiles_array renders as one overlaid chart
['logs']
| summarize percentiles_array(duration_ms, 50, 95, 99) by bin_auto(_time)
```
Best practices:
  • Use `bin_auto(_time)` instead of a fixed `bin(_time, 1m)`; it auto-adjusts to the time window
  • Use `percentiles_array()` instead of multiple `percentile()` calls; it renders as one chart
  • Too many series make a chart unreadable; use `top N` or a filter

Table

When: Top-N lists, detailed breakdowns, exportable data.
```apl
['logs']
| where status >= 500
| summarize errors = count() by route, error_message
| top 10 by errors
| project route, error_message, errors
```
Pitfalls:
  • Always use `top N` to prevent unbounded results
  • Use `project` to control column order and names
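
The Table query follows a count-then-bound shape. Here is the same shape as a minimal Python sketch over hypothetical in-memory events. `Counter.most_common` stands in for APL's `top`; the sketch makes no claim about how Axiom executes the query.

```python
from collections import Counter


def top_errors(events, n=10):
    """Count 5xx events per (route, error_message) and keep only the n largest groups."""
    counts = Counter(
        (e["route"], e["error_message"]) for e in events if e["status"] >= 500
    )
    return [(route, msg, errors) for (route, msg), errors in counts.most_common(n)]


# Hypothetical events.
events = [
    {"route": "/pay", "status": 503, "error_message": "upstream timeout"},
    {"route": "/pay", "status": 503, "error_message": "upstream timeout"},
    {"route": "/login", "status": 500, "error_message": "db error"},
    {"route": "/login", "status": 200, "error_message": ""},
]
print(top_errors(events, n=1))  # prints [('/pay', 'upstream timeout', 2)]
```

However many distinct groups exist, the result never exceeds `n` rows. That bound is what `top N` buys the panel.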

Pie

When: Share-of-total for LOW-cardinality dimensions (≤6 slices).
```apl
['logs']
| summarize count() by status_class = case(
    status < 300, "2xx",
    status < 400, "3xx",
    status < 500, "4xx",
    "5xx"
  )
```
Pitfalls:
  • Never use for high-cardinality dimensions (routes, user IDs)
  • Prefer tables for more than 6 categories
  • Always aggregate to reduce slices
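
The `case()` above evaluates its conditions in order and returns the value of the first one that matches; the last argument is the default. This minimal Python sketch mirrors that ordering on a hypothetical sample. As written, any status below 300 (including 1xx) lands in "2xx".

```python
from collections import Counter


def status_class(status):
    """Mirror the case() above: first matching branch wins; the final value is the default."""
    if status < 300:
        return "2xx"
    if status < 400:
        return "3xx"
    if status < 500:
        return "4xx"
    return "5xx"


statuses = [200, 201, 301, 404, 500, 503]  # hypothetical sample
slices = Counter(status_class(s) for s in statuses)
print(dict(slices))  # prints {'2xx': 2, '3xx': 1, '4xx': 1, '5xx': 2}
assert len(slices) <= 6  # stays within the Pie guidance
```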

LogStream

When: Raw event inspection, debugging, evidence gathering.
```apl
['logs']
| where service == "api" and status >= 500
| project-keep _time, trace_id, route, status, error_message, duration_ms
| take 100
```
Pitfalls:
  • Always include `take N` (100–500 max)
  • Use `project-keep` to show relevant fields only
  • Filter aggressively; raw logs are expensive

Heatmap

When: Distribution visualization, latency patterns, density analysis.
```apl
['logs']
| summarize histogram(duration_ms, 15) by bin_auto(_time)
```
Best for: Latency distributions, response time patterns, identifying outliers.

Scatter Plot

When: Correlation between two metrics, identifying patterns.
```apl
['logs']
| summarize avg(duration_ms), avg(resp_size_bytes) by route
```
Best for: Response size vs latency correlation, resource usage patterns.

SmartFilter (Filter Bar)

When: Interactive filtering for the entire dashboard.
SmartFilter is a chart type that creates dropdown/search filters. It requires:
  1. A `SmartFilter` chart with filter definitions
  2. `declare query_parameters` in each panel query
Filter types:
  • `selectType: "apl"`: dynamic dropdown populated by an APL query
  • `selectType: "list"`: static dropdown with predefined options
  • `type: "search"`: free-text input
Panel query pattern:
```apl
declare query_parameters (country_filter:string = "");
['logs'] | where isempty(country_filter) or ['geo.country'] == country_filter
```
See `reference/smartfilter.md` for the full JSON structure and cascading filter examples.
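
With several filters, writing the same `isempty(...) or ...` clause by hand in every panel is error-prone. Here is a minimal Python sketch that generates it. It makes several assumptions, all to verify against `reference/smartfilter.md`:

  • Multiple parameters can be declared in one `declare query_parameters (...)` statement, separated by commas (KQL-style syntax).
  • The parameter names are hypothetical and must match your SmartFilter definitions. The JSON wiring is not shown.
  • Field names are inserted as-is, so dotted label keys still need the escaping described under Field Escaping.

```python
def filtered_query(dataset, filters, tail=""):
    """Build a panel query with one optional string parameter per filter.

    filters maps a parameter name to the field it compares against. An empty
    parameter value means "don't filter", following the pattern above.
    """
    params = ", ".join(f'{name}:string = ""' for name in filters)
    conditions = " and ".join(
        f"(isempty({name}) or ['{field}'] == {name})" for name, field in filters.items()
    )
    query = f"declare query_parameters ({params});\n['{dataset}'] | where {conditions}"
    return f"{query}\n{tail}" if tail else query


print(filtered_query(
    "logs",
    {"service_filter": "service", "route_filter": "route"},  # hypothetical parameter names
    tail="| summarize count() by bin_auto(_time)",
))
```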

Monitor List

When: Display monitor status on operational dashboards.
No APL needed—select monitors from the UI. Shows:
  • Monitor status (normal/triggered/off)
  • Run history (green/red squares)
  • Dataset, type, notifiers

Note

When: Context, instructions, section headers.
Use GitHub Flavored Markdown for:
  • Dashboard purpose and audience
  • Runbook links
  • Section dividers
  • On-call instructions

Chart Configuration

Charts support JSON configuration options beyond the query. See `reference/chart-config.md` for full details.
Quick reference:

| Chart type | Key options |
|---|---|
| Statistic | `colorScheme`, `customUnits`, `unit`, `showChart` (sparkline), `errorThreshold`/`warningThreshold` |
| TimeSeries | `aggChartOpts`: `variant` (line/area/bars), `scaleDistr` (linear/log), `displayNull` |
| LogStream/Table | `tableSettings`: `columns`, `fontSize`, `highlightSeverity`, `wrapLines` |
| Pie | `hideHeader` |
| Note | `text` (markdown), `variant` |

Common options (all charts):
  • `overrideDashboardTimeRange`: boolean
  • `overrideDashboardCompareAgainst`: boolean
  • `hideHeader`: boolean
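
A misspelled option name is easy to miss in hand-written JSON. This minimal Python sketch flags top-level option names that don't appear in the quick reference above. It assumes a chart's options can be read as a flat dict of top-level keys; the real nesting is documented in `reference/chart-config.md` and not reproduced here. The allowlist contains only the names in the table, so an "unlisted" key may still be valid and should be checked, not deleted.

```python
# Option names copied from the quick-reference table above (not exhaustive).
KEY_OPTIONS = {
    "Statistic": {"colorScheme", "customUnits", "unit", "showChart",
                  "errorThreshold", "warningThreshold"},
    "TimeSeries": {"aggChartOpts"},
    "LogStream": {"tableSettings"},
    "Table": {"tableSettings"},
    "Pie": {"hideHeader"},
    "Note": {"text", "variant"},
}
COMMON_OPTIONS = {"overrideDashboardTimeRange", "overrideDashboardCompareAgainst", "hideHeader"}


def unlisted_options(chart_type, config):
    """Return top-level option names not in the quick reference for this chart type."""
    allowed = KEY_OPTIONS.get(chart_type, set()) | COMMON_OPTIONS
    return sorted(set(config) - allowed)


print(unlisted_options("Statistic", {"unit": "ms", "showChart": True, "colour": "red"}))
# prints ['colour']
```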

APL Patterns

Time Filtering in Dashboards vs Ad-hoc Queries

Dashboard panel queries do NOT need explicit time filters. The dashboard UI time picker automatically scopes all queries to the selected time window.
```apl
// DASHBOARD QUERY: no time filter needed
['logs']
| where service == "api"
| summarize count() by bin_auto(_time)
```
Ad-hoc queries (Axiom Query tab, axiom-sre exploration) MUST have explicit time filters:
```apl
// AD-HOC QUERY: always include a time filter
['logs']
| where _time between (ago(1h) .. now())
| where service == "api"
| summarize count() by bin_auto(_time)
```

Bin Size Selection

Prefer `bin_auto(_time)`; it automatically adjusts to the dashboard time window.
Manual bin sizes (only when auto doesn't fit your needs):

| Time window | Bin size |
|---|---|
| 15m | 10s–30s |
| 1h | 1m |
| 6h | 5m |
| 24h | 15m–1h |
| 7d | 1h–6h |
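
When you do need a manual bin, the table can be turned into a lookup. Here is a minimal Python sketch. Rounding an in-between window up to the next table row, and reusing the 7d row for longer windows, are assumptions; the table itself only lists these five windows.

```python
from datetime import timedelta

# Rows of the table above: (time window, suggested bin size).
BIN_TABLE = [
    (timedelta(minutes=15), "10s–30s"),
    (timedelta(hours=1), "1m"),
    (timedelta(hours=6), "5m"),
    (timedelta(hours=24), "15m–1h"),
    (timedelta(days=7), "1h–6h"),
]


def suggest_bin(window):
    """Return the suggestion for the smallest table window that covers `window`."""
    for limit, size in BIN_TABLE:
        if window <= limit:
            return size
    return BIN_TABLE[-1][1]  # beyond 7d, fall back to the widest row


print(suggest_bin(timedelta(hours=3)))  # prints 5m
```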

Cardinality Guardrails

Prevent query explosion:
```apl
// GOOD: bounded
| summarize count() by route | top 10 by count_

// BAD: unbounded high-cardinality grouping
| summarize count() by user_id  // can return millions of rows
```

Field Escaping

Fields with dots need bracket notation:
```apl
| where ['kubernetes.pod.name'] == "frontend"
```
Fields with literal dots IN a name segment (not hierarchy separators) need escaping:
```apl
| where ['kubernetes.labels.app\\.kubernetes\\.io/name'] == "frontend"
```
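
When generating queries, the escaping is easy to get wrong by hand. This minimal Python sketch reproduces the escaping shown in the example above. It treats each argument as one hierarchy segment and escapes any dot inside a segment. The helper is hypothetical and does not handle single quotes inside field names.

```python
def field_ref(*segments):
    """Build a bracket-quoted APL field reference from hierarchy segments.

    Dots inside one segment (e.g. a Kubernetes label key) get the doubled
    backslash escape used in the example above; dots between segments do not.
    """
    escaped = (seg.replace(".", "\\\\.") for seg in segments)
    return "['" + ".".join(escaped) + "']"


print(field_ref("kubernetes", "pod", "name"))
# prints ['kubernetes.pod.name']
print(field_ref("kubernetes", "labels", "app.kubernetes.io/name"))
# prints ['kubernetes.labels.app\\.kubernetes\\.io/name']
```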

Golden Signal Queries

Traffic:
```apl
| summarize requests = count() by bin_auto(_time)
```
Errors (as rate %):
```apl
| summarize total = count(), errors = countif(status >= 500) by bin_auto(_time)
| extend error_rate = iff(total > 0, round(100.0 * errors / total, 2), 0.0)
| project _time, error_rate
```
Latency (use percentiles_array for proper chart overlay):
```apl
| summarize percentiles_array(duration_ms, 50, 95, 99) by bin_auto(_time)
```
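
The `iff(total > 0, ...)` guard returns 0.0 instead of dividing by zero when a window has no events. Here is the same arithmetic as a small Python sketch with hypothetical per-window counts.

```python
def error_rate(total, errors):
    """Error rate in percent, rounded to 2 places; 0.0 when there were no events."""
    return round(100.0 * errors / total, 2) if total > 0 else 0.0


# Hypothetical (total, errors) counts per window, including one with no traffic.
windows = {"12:00": (200, 3), "12:05": (0, 0), "12:10": (150, 15)}
print({t: error_rate(*c) for t, c in windows.items()})
# prints {'12:00': 1.5, '12:05': 0.0, '12:10': 10.0}
```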

Layout Composition

Grid Principles

  • Dashboard width = 12 units
  • Typical panel: w=3 (quarter), w=4 (third), w=6 (half), w=12 (full)
  • Stats row: 4 panels × w=3, h=2
  • TimeSeries row: 2 panels × w=6, h=4
  • Tables: w=6 or w=12, h=4–6
  • LogStream: w=12, h=6–8

Section Layout Pattern

Row 0-1:  [Stat w=3] [Stat w=3] [Stat w=3] [Stat w=3]
Row 2-5:  [TimeSeries w=6, h=4] [TimeSeries w=6, h=4]
Row 6-9:  [Table w=6, h=4] [Pie w=6, h=4]
Row 10+:  [LogStream w=12, h=6]
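
The row positions above can be computed instead of placed by hand. Here is a minimal Python sketch. It assumes panels are positioned by `x`/`y` grid coordinates alongside `w`/`h`. The `x`/`y` key names and the output shape are hypothetical; check a dashboard fetched with `dashboard-get` for the real layout schema.

```python
def pack_rows(rows, grid_width=12):
    """Assign x/y grid positions to panels row by row.

    rows: list of rows, each a list of (title, w, h). Panels in a row go left
    to right; each row starts below the tallest panel of the previous row.
    """
    placed, y = [], 0
    for row in rows:
        if sum(w for _, w, _ in row) > grid_width:
            raise ValueError(f"row wider than {grid_width} units: {row}")
        x = 0
        for title, w, h in row:
            placed.append({"title": title, "x": x, "y": y, "w": w, "h": h})
            x += w
        y += max((h for _, _, h in row), default=0)
    return placed


layout = pack_rows([
    [("Error rate", 3, 2), ("p95 latency", 3, 2), ("Request rate", 3, 2), ("Active alerts", 3, 2)],
    [("Traffic over time", 6, 4), ("Latency percentiles", 6, 4)],
    [("Top failing routes", 6, 4), ("Status share", 6, 4)],
    [("Error logs", 12, 6)],
])
print([(p["x"], p["y"]) for p in layout])
```

The computed `y` values (0, 2, 6, 10) match the row numbers in the pattern above.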

Naming Conventions

  • Use question-style titles: "Error rate by route" not "Errors"
  • Prefix with context if multi-service: "[API] Error rate"
  • Include units: "Latency (ms)", "Traffic (req/s)"

Dashboard Settings

Refresh Rate

The dashboard auto-refreshes at the configured interval. Options: 15s, 30s, 1m, 5m, etc.
⚠️ Query cost warning: A short refresh interval (15s) combined with a long time range (90d) keeps expensive queries running constantly.
Recommendations:

| Use case | Refresh rate |
|---|---|
| Oncall/real-time | 15s–30s |
| Team health | 1m–5m |
| Executive/weekly | 5m–15m |
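
Here is the cost warning in numbers, as a back-of-the-envelope Python sketch. It assumes every query-backed panel re-runs once per refresh while the dashboard is open (Note and MonitorList panels excluded). It ignores any caching Axiom may do. Each of those runs presumably covers the full selected time range.

```python
def queries_per_hour(panels, refresh_seconds):
    """Panel queries issued per hour while the dashboard stays open."""
    return panels * 3600 // refresh_seconds


print(queries_per_hour(panels=12, refresh_seconds=15))   # prints 2880
print(queries_per_hour(panels=12, refresh_seconds=300))  # prints 144
```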

Sharing

  • Just Me: Private, only you can access
  • Group: Specific team/group in your org
  • Everyone: All users in your Axiom org
Data visibility is still governed by dataset permissions—users only see data from datasets they can access.

URL Time Range Parameters

`?t_qr=24h` (quick range), `?t_ts=...&t_te=...` (custom), `?t_against=-1d` (comparison)
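
The parameters can be appended to the URL that `dashboard-link` returns, rather than building the base URL by hand. Here is a minimal Python sketch using only the standard library. The link below is a hypothetical stand-in for real `dashboard-link` output. Only `t_qr` and `t_against` are shown, because the value format for `t_ts`/`t_te` is not specified here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_time_range(dashboard_url, **params):
    """Add or override time-range query parameters on an existing dashboard URL."""
    parts = urlsplit(dashboard_url)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL standing in for real dashboard-link output.
link = "https://example.invalid/dashboards/abc123"
print(with_time_range(link, t_qr="24h", t_against="-1d"))
# prints https://example.invalid/dashboards/abc123?t_qr=24h&t_against=-1d
```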

Setup

Run `scripts/setup` to check requirements (curl, jq, ~/.axiom.toml).
Config lives in `~/.axiom.toml` (shared with axiom-sre):
```toml
[deployments.prod]
url = "https://api.axiom.co"
token = "xaat-your-token"
org_id = "your-org-id"
```

Deployment

Scripts

| Script | Usage |
|---|---|
| `scripts/get-user-id <deploy>` | Get your user ID for the `owner` field |
| `scripts/dashboard-list <deploy>` | List all dashboards |
| `scripts/dashboard-get <deploy> <id>` | Fetch dashboard JSON |
| `scripts/dashboard-validate <file>` | Validate JSON structure |
| `scripts/dashboard-create <deploy> <file>` | Create dashboard |
| `scripts/dashboard-update <deploy> <id> <file>` | Update (needs version) |
| `scripts/dashboard-copy <deploy> <id>` | Clone dashboard |
| `scripts/dashboard-link <deploy> <id>` | Get shareable URL |
| `scripts/dashboard-delete <deploy> <id>` | Delete (with confirm) |
| `scripts/axiom-api <deploy> <method> <path>` | Low-level API calls |

Workflow

⚠️ CRITICAL: Always validate queries BEFORE deploying.
  1. Design the dashboard (sections + panels)
  2. Write APL for each panel
  3. Build the JSON (from a template or manually)
  4. Validate the queries using axiom-sre with an explicit time filter
  5. Run `dashboard-validate` to check structure
  6. Run `dashboard-create` or `dashboard-update` to deploy
  7. Run `dashboard-link` to get the URL. NEVER construct Axiom URLs manually (org IDs and base URLs vary per deployment)
  8. Share the link with the user
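
Steps 5 and 6 can be chained so that creation never runs on a file that failed validation. Here is a minimal Python sketch. It assumes it runs from the skill's root directory, so the relative `scripts/` paths resolve. The `run` parameter is a hypothetical seam for testing. Step 7 is left out because extracting the new dashboard ID depends on `dashboard-create`'s output format, which is not documented here.

```python
import subprocess


def validate_and_create(deploy, path, run=subprocess.run):
    """Run dashboard-validate, then dashboard-create, stopping at the first failure.

    check=True makes a failing script raise CalledProcessError, so the
    create step is skipped whenever validation fails.
    """
    run(["scripts/dashboard-validate", path], check=True)
    created = run(["scripts/dashboard-create", deploy, path],
                  check=True, capture_output=True, text=True)
    return created.stdout  # inspect this before parsing anything out of it


# Usage:
# print(validate_and_create("prod", "./dashboard.json"))
```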

Sibling Skill Integration

spl-to-apl: Translate Splunk SPL → APL. Map `timechart` → TimeSeries and `stats` → Statistic/Table. See `reference/splunk-migration.md`.
axiom-sre: Discover schema with `getschema`, explore baselines, identify dimensions, then productize into panels.

Templates

Pre-built templates in `reference/templates/`:

| Template | Use case |
|---|---|
| `service-overview.json` | Single-service oncall dashboard with Heatmap |
| `service-overview-with-filters.json` | Same, with SmartFilter (route/status dropdowns) |
| `api-health.json` | HTTP API with traffic/errors/latency |
| `blank.json` | Minimal skeleton |

Placeholders: `{{owner_id}}`, `{{service}}`, `{{dataset}}`
Usage:
```bash
USER_ID=$(scripts/get-user-id prod)
scripts/dashboard-from-template service-overview "my-service" "$USER_ID" "my-dataset" ./dashboard.json
scripts/dashboard-validate ./dashboard.json
scripts/dashboard-create prod ./dashboard.json
```
⚠️ Templates assume field names (`service`, `status`, `route`, `duration_ms`). Discover your schema first and use `sed` to fix mismatches.
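
As an alternative to `sed`, here is a minimal Python sketch that fills placeholders and renames fields in template text. It refuses to run if any placeholder has no value. It renames only whole words, so `status` is not changed inside `status_class`. The sketch is hypothetical: it does not know the template's JSON layout and does not replace `scripts/dashboard-from-template`. Values are inserted verbatim, so they must not contain JSON-special characters such as quotes. Renames also touch panel titles and other text, so review the diff before deploying.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(text, values, renames=None):
    """Substitute {{name}} placeholders, then rename whole-word field names."""
    missing = sorted(set(PLACEHOLDER.findall(text)) - set(values))
    if missing:
        raise KeyError(f"no value for placeholders: {missing}")
    text = PLACEHOLDER.sub(lambda m: values[m.group(1)], text)
    for old, new in (renames or {}).items():
        # A function replacement avoids backslash interpretation in `new`.
        text = re.sub(rf"\b{re.escape(old)}\b", lambda _m: new, text)
    return text


panel = "['{{dataset}}'] | where service == \"{{service}}\" | summarize percentiles_array(duration_ms, 95) by bin_auto(_time)"
print(fill_template(panel, {"dataset": "my-dataset", "service": "my-service"},
                    renames={"duration_ms": "latency_ms"}))
```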

Common Pitfalls

| Problem | Cause | Solution |
|---|---|---|
| "unable to find dataset" errors | Dataset name doesn't exist in your org | Check available datasets in the Axiom UI |
| "creating dashboards for other users" 403 | Owner ID doesn't match your token | Use `scripts/get-user-id prod` to get your UUID |
| All panels show errors | Field names don't match your schema | Discover the schema first; use `sed` to fix field names |
| Dashboard shows no data | Service filter too restrictive | Remove or adjust `where service == 'x'` filters |
| Queries time out | Missing time filter, or time range too broad | Dashboards inherit time from the picker; ad-hoc queries need an explicit time filter |
| Wrong org in dashboard URL | Manually constructed URL | Always use `dashboard-link <deploy> <id>`; never guess org IDs or base URLs |

Reference

  • `reference/chart-config.md`: all chart configuration options (JSON)
  • `reference/smartfilter.md`: SmartFilter/FilterBar full configuration
  • `reference/chart-cookbook.md`: APL patterns per chart type
  • `reference/layout-recipes.md`: grid layouts and section blueprints
  • `reference/splunk-migration.md`: Splunk panel → Axiom mapping
  • `reference/design-playbook.md`: decision-first design principles
  • `reference/templates/`: ready-to-use dashboard JSON files