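1. Ways to persist queries

Recording rules: create new metrics from queries.

Alerting rules: generate alerts from queries.

Visualization: display queries on dashboards such as Grafana.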

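1.1 Recording rules

A recording rule computes new time series, typically aggregated ones, from existing time series. Recording rules are useful for:

  • Aggregating across multiple time series
  • Precomputing expensive queries
  • Producing time series that alerts can be built on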
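
The three uses listed above can be sketched in a single rule group. This is a minimal illustration rather than a production config: the group name, the `job:http_requests:rate5m` record name, the `http_requests_total` metric, the alert name, and the threshold of 100 requests per second are all assumptions chosen for the example.

```yaml
groups:
  - name: example-aggregation                 # hypothetical group name
    rules:
      # Aggregate across many series and precompute a rate that would be
      # expensive to evaluate on every dashboard refresh.
      - record: job:http_requests:rate5m      # hypothetical record name
        expr: sum by (job) (rate(http_requests_total[5m]))

      # Alert on the precomputed series instead of repeating the raw query.
      - alert: HighRequestRate                # hypothetical alert name
        expr: job:http_requests:rate5m > 100  # threshold is an assumption
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High request rate for job {{ $labels.job }}"
```

Rules within a group are evaluated sequentially, so placing the alerting rule after the recording rule lets it read the value recorded in the same evaluation cycle.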

2. Syntax

Official documentation

rule_group:

yml
# The name of the group. Must be unique within a file.
name: <string>

# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]

# Limit the number of alerts an alerting rule and series a recording
# rule can produce. 0 is no limit.
[ limit: <int> | default = 0 ]

# Offset the rule evaluation timestamp of this particular group by the specified duration into the past.
[ query_offset: <duration> | default = global.rule_query_offset ]

rules:
  [ - <rule> ... ]

rule:

yml
# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]

For example:

yml
groups:
  - name: example
    rules:
    - record: code:prometheus_http_requests_total:sum
      expr: sum by (code) (prometheus_http_requests_total)
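
A slightly fuller sketch that exercises the optional fields from the schema above (`interval`, `limit`, `labels`). The group name, the values, and the `team` label are assumptions for illustration only.

```yaml
groups:
  - name: example-with-options   # hypothetical group name
    interval: 30s                # evaluate this group every 30s instead of the global default
    limit: 100                   # a rule producing more than 100 series fails its evaluation
    rules:
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)
        labels:
          team: platform         # hypothetical label added to every resulting series
```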
  • Check the syntax
shell
promtool check rules /path/to/example.rules.yml

3. Configuration

vim /etc/prometheus/prometheus.yml

yml
rule_files:
  - "rules/*.yml"
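
For context, a sketch of where `rule_files` sits in `prometheus.yml`. Relative globs are resolved against the directory that contains the configuration file, so the second entry shows an absolute path for the case where the rules live outside that directory (here, the `/data/monitor/prometheus/rules` directory created in the steps that follow). The `evaluation_interval` value is an assumption.

```yaml
global:
  evaluation_interval: 15s   # default for groups that do not set their own 'interval'

rule_files:
  # Relative to the directory containing prometheus.yml
  - "rules/*.yml"
  # Absolute path, for a rules directory outside the config directory
  - "/data/monitor/prometheus/rules/*.yml"
```

Only one of the two entries is needed; which one applies depends on where the configuration file and the rules directory actually live.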
  • Create the rules directory
bash
mkdir -p /data/monitor/prometheus/rules
  • Create the recording rules
shell
cat > /data/monitor/prometheus/rules/node_record.yml <<'EOF'
groups:
  - name: test-record
    interval: 10s
    rules:
      # Memory usage (%) per instance: share of total memory that is not available.
      - record: instance:memory:usage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

      # CPU utilization (%) per instance: 100 minus the average idle share over 2 minutes.
      - record: instance:cpu:load
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100)
EOF
  • Check the rules
bash
promtool check rules /data/monitor/prometheus/rules/node_record.yml
  • Hot reload (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://10.103.236.199:19090/prometheus/-/reload
  • Result

[Screenshot: image-20250213151127906]

Official reference:

https://prometheus.io/docs/practices/rules/
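
The linked page recommends naming recorded series as `level:metric:operations`. Below is a sketch of how the two node rules in this guide might be named under that convention. The new names and the choice to record ratios (0 to 1) rather than percentages are suggestions, not names or expressions that the original config uses.

```yaml
groups:
  - name: test-record
    rules:
      # level = instance, metric = node_memory_utilisation, operation = ratio
      - record: instance:node_memory_utilisation:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

      # level = instance, metric = node_cpu_utilisation, operation = rate2m
      - record: instance:node_cpu_utilisation:rate2m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m]))
```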