grafana-platform-dashboard
Grafana Platform Dashboard
Design platform operations dashboards so operators see tenant-impacting risk first, then drill into service-specific health without overload.
Quick Start
Use this skill when the user asks for platform dashboard updates and reliability checks.
- Confirm dashboard target:
```bash
oc --context <ctx> get grafanadashboard -A | rg -i '<dashboard-name-or-theme>'
```
- Export dashboard and JSON:
```bash
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh export \
  --context <ctx> \
  --namespace <ns> \
  --name <grafanadashboard-name> \
  --out-dir /tmp/<workspace>
```
- Edit the JSON and validate all PromQL:
```bash
skills/grafana-platform-dashboard/scripts/promql_scan_thanos.sh \
  --context <ctx> \
  --dashboard-json /tmp/<workspace>/<name>.json
```
- Apply live safely:
```bash
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh apply \
  --context <ctx> \
  --namespace <ns> \
  --name <grafanadashboard-name> \
  --json /tmp/<workspace>/<name>.json
```
Workflow
1) Lock Scope From Platform Contracts
Use the platform contract in platform-contract.md before editing panels.
- Keep the L1 command view constrained to critical signals that precede tenant impact.
- Use gate-aligned components first: the critical ClusterOperator (CO) gate, nodes, MachineConfigPools (MCP), and core API/etcd/ingress.
- Keep service-specific sections (Crossplane, Keycloak) below L1.
2) Enforce Information Architecture
Use layout-guidelines.md:
- L1: critical-only, immediate action, minimal panel budget.
- L2: platform services by dependency domain.
- L3: deep dives (for example, a future GPU dashboard); keep these out of L1.
3) Build Queries From Known Library
Use promql-library.md:
- Start from known-good queries and adapt labels minimally.
- Prefer counts and action tables over decorative charts.
- Filter alert noise explicitly (for example ArgoCD/GitOps) when requested.
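As one illustration of explicit noise filtering, the sketch below builds a PromQL expression that counts firing critical alerts while excluding alert names that match a pattern. It is not taken from promql-library.md. The function name `build_alert_count_query`, the `severity="critical"` label, and the exclusion pattern are assumptions for illustration; the real ArgoCD/GitOps alert names depend on the rules installed in the cluster.

```bash
#!/usr/bin/env bash
# Sketch: emit a PromQL expression that counts firing critical alerts,
# dropping alert names that match a noise regex.
# ASSUMPTION: alerts carry a severity="critical" label, and the regex passed
# in is illustrative only; check the actual alert names in your cluster.

build_alert_count_query() {
  local exclude_regex="$1"   # regex of alert names to exclude
  # "or vector(0)" makes the panel show 0 instead of "No data" when nothing fires.
  printf 'count(ALERTS{alertstate="firing",severity="critical",alertname!~"%s"}) or vector(0)\n' \
    "$exclude_regex"
}

# Example call with a made-up exclusion pattern.
build_alert_count_query 'ArgoCD.*|GitOps.*'
```

A count with an explicit exclusion keeps the filter visible in the panel query itself, so a reviewer can see exactly which alerts were muted.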
4) Validate Before Apply
Always run the scan script after edits:
```bash
skills/grafana-platform-dashboard/scripts/promql_scan_thanos.sh \
  --context <ctx> \
  --dashboard-json <file.json> \
  --output <scan.tsv>
```
Pass criteria: all queries report `success`, with zero bad/parse errors.
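The pass criteria can be checked mechanically. The sketch below counts rows whose status is anything other than `success`. The layout of the scan output is not documented here, so this assumes a tab-separated file with one header row and a status column whose index you pass in; `scan_failures` and the demo rows are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count non-passing rows in a scan TSV.
# ASSUMPTION: the file has one header row, is tab-separated, and the status
# column (index given as $2) holds exactly "success" for passing queries.
# The real promql_scan_thanos.sh output may be laid out differently.

scan_failures() {
  local tsv="$1" status_col="${2:-1}"
  awk -F'\t' -v col="$status_col" \
    'NR > 1 && $col != "success" { n++ } END { print n + 0 }' "$tsv"
}

# Demo with made-up rows: one passing query, one parse error.
demo="$(mktemp)"
printf 'panel\tstatus\n'            >  "$demo"
printf 'API latency\tsuccess\n'     >> "$demo"
printf 'etcd health\tparse_error\n' >> "$demo"
scan_failures "$demo" 2   # prints 1
rm -f "$demo"
```

A result of 0 would satisfy the pass criteria; any other value means the apply step should wait.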
5) Apply and Verify Sync
Apply only after validation succeeds:
```bash
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh apply ...
oc --context <ctx> -n <ns> get grafanadashboard <name> \
  -o jsonpath='{.status.conditions[?(@.type=="DashboardSynchronized")].status}{"|"}{.status.conditions[?(@.type=="DashboardSynchronized")].reason}{"\n"}'
```
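The jsonpath query prints one line of the form `status|reason`. A minimal sketch of how that line could be turned into a pass/fail result follows. The function name `check_sync` is hypothetical, and the reason string in the demo is a sample value rather than an exhaustive list of what the operator reports.

```bash
#!/usr/bin/env bash
# Sketch: interpret the "status|reason" line printed by the jsonpath query.
# ASSUMPTION: a status of "True" on the DashboardSynchronized condition means
# the dashboard synced; the reason text is passed through unchanged.

check_sync() {
  local line="$1"
  local status="${line%%|*}"   # text before the first "|"
  local reason="${line#*|}"    # text after the first "|"
  if [ "$status" = "True" ]; then
    echo "synced ($reason)"
    return 0
  fi
  echo "not synced: status=${status:-<empty>} reason=${reason:-<empty>}" >&2
  return 1
}

# Demo with a sample line; in practice pass the captured oc output.
check_sync 'True|ApplySuccessful'
```

An empty status (a bare `|`) is treated as a failure, which covers the case where the condition has not been written yet.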
6) Close With Operator-Focused Summary
Report:
- What changed (panel names and intent).
- Validation result (query count and failures).
- Sync status and any residual risk.
- Next step: promote live changes into GitOps-managed source.
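The four report items can be emitted in a fixed shape so that summaries stay comparable across changes. The sketch below is one possible template; `print_summary`, its argument order, and the demo values are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the closing summary in a fixed order.
# ASSUMPTION: the caller already knows the changed panels, the query and
# failure counts from the scan, the sync line, and any residual risk.

print_summary() {
  local changed="$1" total="$2" failed="$3" sync="$4" risk="$5"
  printf 'Changed: %s\n' "$changed"
  printf 'Validation: %s queries scanned, %s failed\n' "$total" "$failed"
  printf 'Sync: %s\n' "$sync"
  printf 'Residual risk: %s\n' "$risk"
  printf 'Next step: promote live changes into GitOps-managed source\n'
}

# Demo with made-up values.
print_summary 'Renamed "platform pods" panel to state its namespace scope' \
  42 0 'True|ApplySuccessful' 'none identified'
```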
Design Rules
- Put critical tenant-impact predictors first.
- Every red panel must imply an action path.
- Avoid ambiguous panel names (for example, replace “platform pods” with a name that states the concrete namespace scope).
- Keep L1 low-noise; move detail below or to dedicated dashboards.
- Keep GPU deep diagnostics in a dedicated GPU dashboard, not mixed into L1.
References
- Platform Contract
- PromQL Panel Library
- Layout Guidelines