prometheus-skill
Basic Rules
When a user asks about a service's monitoring metrics, for example "What's wrong with my service with id:xxx name:gis?", "Is there any problem with the xxx_gis service?" or "Help me check the status of the service with projectId=xxx, projectName=gis", run the 7 checks below, then summarize the results and analyze the current state of the service. Replace the application label value in each query with that of the service in question.
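The example questions suggest that the application label is the project id and the project name joined by an underscore (id xxx and name gis give xxx_gis). A minimal sketch of that substitution, assuming this naming rule holds for every service. The names `application_label`, `build_queries` and the `__APP__` placeholder are hypothetical and not part of prometheus.py; the query text is copied from the checks in this section.

```python
# Hypothetical helper, not part of prometheus.py.
# "__APP__" is an illustrative placeholder for the application label value.
QUERY_TEMPLATES = {
    "cpu_usage": 'sum(rate(container_cpu_user_seconds_total{image!="",application="__APP__"}[30s])) by(application) * 100',
    "cpu_request": 'sum((container_spec_cpu_quota{application="__APP__", image!=""}) /1000) by (application)',
    "memory_usage": 'sum(container_memory_rss{image!="",application="__APP__"}) by (application)',
    "memory_request": 'sum(container_spec_memory_limit_bytes{image!="",application="__APP__"}) by (application)',
    "load": 'sum(container_cpu_load_average_10s{application="__APP__"}) by (application)',
    "heap_used": 'sum(jvm_memory_used_bytes{application="__APP__"}) by (application)',
    "heap_max": 'sum(jvm_memory_max_bytes{application="__APP__"}) by (application)',
}


def application_label(project_id: str, project_name: str) -> str:
    """Assumed naming rule: '<projectId>_<projectName>'."""
    return f"{project_id}_{project_name}"


def build_queries(project_id: str, project_name: str) -> dict[str, str]:
    """Return the seven PromQL queries with the application label filled in."""
    app = application_label(project_id, project_name)
    # str.replace is used instead of str.format because PromQL itself contains braces.
    return {name: tpl.replace("__APP__", app) for name, tpl in QUERY_TEMPLATES.items()}


if __name__ == "__main__":
    for name, query in build_queries("xxx", "gis").items():
        print(f"{name}: {query}")
```

Each resulting string can be passed as the single argument to `uv run prometheus.py`.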
- Query Current CPU Usage
Construct a PromQL statement and run it with prometheus.py, for example:
    uv run prometheus.py 'sum(rate(container_cpu_user_seconds_total{image!="",application="xxx_gis"}[30s])) by(application) * 100'
Example result: "Latest value queried: 32.3487092130255". This means the service is using about 32% of one CPU core. The maximum is not 100% but 100% times the number of requested cores; for example, if 3 cores are requested, the maximum is 300%.
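To compare results across checks, the number has to be extracted from the script's output line. A minimal sketch, assuming the output ends with a colon followed by a plain decimal number as in the example result; `parse_latest_value` is a hypothetical helper, and the real output format of prometheus.py may differ.

```python
import re

# Matches a colon (ASCII or full-width) followed by a number at the end of the text.
_VALUE_RE = re.compile(r"[::]\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$")


def parse_latest_value(output: str) -> float:
    """Return the number that follows the colon at the end of the output.

    Raises ValueError when no such number is present (for example, when
    the query returned no data).
    """
    match = _VALUE_RE.search(output.strip())
    if match is None:
        raise ValueError(f"no numeric value found in: {output!r}")
    return float(match.group(1))


if __name__ == "__main__":
    print(parse_latest_value("Latest value queried: 32.3487092130255"))
```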
- Query Current CPU Request Amount
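Construct a PromQL statement and run it with prometheus.py, for example:
    uv run prometheus.py 'sum((container_spec_cpu_quota{application="xxx_gis", image!=""}) /1000) by (application)'
Example result: "Latest value queried: 6181". This means the service has requested a total of 6181% CPU, on the same scale as the usage value (100% is one core), so roughly 61.8 cores across all of its containers. Dividing the quota by 1000 yields a percentage only when the CFS period is the default 100 ms.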
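Because usage and request are both expressed as a percentage of one core, dividing one by the other gives the share of the requested CPU that is actually in use. A small sketch using the two example values from this section; `cpu_utilization` and `percent_to_cores` are hypothetical helper names.

```python
def percent_to_cores(percent: float) -> float:
    """Convert a 'percent of one core' value to a number of cores."""
    return percent / 100.0


def cpu_utilization(usage_percent: float, requested_percent: float) -> float:
    """Return used CPU as a fraction of requested CPU.

    Both arguments are in percent of one core, as returned by the
    CPU usage and CPU request queries.
    """
    if requested_percent <= 0:
        raise ValueError("requested CPU must be positive")
    return usage_percent / requested_percent


if __name__ == "__main__":
    usage, requested = 32.3487092130255, 6181.0
    print(f"requested cores: {percent_to_cores(requested):.2f}")
    print(f"utilization: {cpu_utilization(usage, requested):.2%}")
```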
- Query Current Memory Usage
Construct a PromQL statement and run it with prometheus.py, for example:
    uv run prometheus.py 'sum(container_memory_rss{image!="",application="xxx_gis"}) by (application)'
Example result: "Latest value queried: 21611335680.0" (unit: bytes).
- Query Current Memory Request Amount
Construct a PromQL statement and run it with prometheus.py, for example:
    uv run prometheus.py 'sum(container_spec_memory_limit_bytes{image!="",application="xxx_gis"}) by (application)'
Example result: "Latest value queried: 75161927680.0" (unit: bytes).
- Query Current Load
Construct a PromQL statement and run it with prometheus.py, for example:
    uv run prometheus.py 'sum(container_cpu_load_average_10s{application="xxx_gis"}) by (application)'
Example result: "Latest value queried: 1242.0". Compare this value with the CPU request amount queried earlier: ideally the load stays below 80% of the requested amount. For example, if the CPU request amount is 2000, the threshold is 1600, so a load of 1242 is not high.
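The 80% rule above can be written as a one-line comparison. A minimal sketch; `load_is_healthy` is a hypothetical helper name, and the 0.8 default simply mirrors the threshold stated in the text.

```python
def load_is_healthy(load: float, cpu_requested: float, threshold: float = 0.8) -> bool:
    """Return True when the load is below `threshold` times the CPU request amount."""
    return load < cpu_requested * threshold


if __name__ == "__main__":
    # Example from the text: request 2000, load 1242.
    print(load_is_healthy(1242.0, 2000.0))
```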
- Query Current Heap Usage (HeapUsed)
Construct a PromQL statement and run it with prometheus.py, for example:
    uv run prometheus.py 'sum(jvm_memory_used_bytes{application="xxx_gis"}) by (application)'
Example result: "Latest value queried: 15085025864.0" (unit: bytes).
- Query Current Heap Maximum (HeapMax)
Construct a PromQL statement and run it with prometheus.py, for example:
    uv run prometheus.py 'sum(jvm_memory_max_bytes{application="xxx_gis"}) by (application)'
Example result: "Latest value queried: 100503912448.0" (unit: bytes).
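Once all seven values are collected, the summary step amounts to a few ratios and one threshold check. The following is a sketch of one possible summary, not the skill's actual output format: `ServiceMetrics`, `share` and `summarize` are hypothetical names, and the sample numbers are the example results quoted in this section.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class ServiceMetrics:
    cpu_usage_percent: float      # percent of one core
    cpu_requested_percent: float  # percent of one core
    memory_used_bytes: float
    memory_limit_bytes: float
    load: float
    heap_used_bytes: float
    heap_max_bytes: float


def share(used: float, total: float) -> str:
    """Format used/total as a whole percentage, or 'n/a' when total is not positive."""
    return f"{used / total:.0%}" if total > 0 else "n/a"


def summarize(m: ServiceMetrics, load_threshold: float = 0.8) -> list[str]:
    """Build one human-readable line per resource."""
    load_limit = m.cpu_requested_percent * load_threshold
    load_state = "ok" if m.load < load_limit else "high"
    return [
        f"CPU: {m.cpu_usage_percent:.1f}% used of {m.cpu_requested_percent:.0f}% requested "
        f"({share(m.cpu_usage_percent, m.cpu_requested_percent)})",
        f"Memory: {m.memory_used_bytes / GIB:.1f} GiB of {m.memory_limit_bytes / GIB:.1f} GiB "
        f"({share(m.memory_used_bytes, m.memory_limit_bytes)})",
        f"Load: {m.load:.0f} vs limit {load_limit:.0f} ({load_state})",
        f"Heap: {m.heap_used_bytes / GIB:.1f} GiB of {m.heap_max_bytes / GIB:.1f} GiB "
        f"({share(m.heap_used_bytes, m.heap_max_bytes)})",
    ]


if __name__ == "__main__":
    sample = ServiceMetrics(
        cpu_usage_percent=32.3487092130255,
        cpu_requested_percent=6181.0,
        memory_used_bytes=21611335680.0,
        memory_limit_bytes=75161927680.0,
        load=1242.0,
        heap_used_bytes=15085025864.0,
        heap_max_bytes=100503912448.0,
    )
    print("\n".join(summarize(sample)))
```

The `share` helper returns "n/a" for a non-positive denominator so that a missing limit does not cause a division error.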