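1. Ways to persist queries
Recording rules: create new metrics from queries.
Alerting rules: generate alerts from queries.
Visualization: display queries on dashboards such as Grafana.
1.1 Recording rules
A recording rule computes new time series, typically aggregates, from existing time series. It is used to:
- Aggregate across multiple time series
- Precompute expensive queries
- Produce time series that alerts can be built on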
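The three uses listed above can be sketched in a single rule group. This is only an illustration: the group name, the recorded series name `job:prometheus_http_requests:rate5m`, the alert name, and the threshold are assumptions, not values from this document. Rules in a group are evaluated in order, so the alert can read the series recorded just before it.

```yaml
groups:
  - name: example-aggregation          # hypothetical group name
    rules:
      # Aggregate across series and precompute the (potentially expensive) rate.
      - record: job:prometheus_http_requests:rate5m
        expr: sum by (job) (rate(prometheus_http_requests_total[5m]))
      # Build an alert on the precomputed series; the threshold is illustrative.
      - alert: HighRequestRate
        expr: job:prometheus_http_requests:rate5m > 100
        for: 5m
        labels:
          severity: warning
```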
2. Syntax
rule_group:
yml
# The name of the group. Must be unique within a file.
name: <string>
# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]
# Limit the number of alerts an alerting rule and series a recording
# rule can produce. 0 is no limit.
[ limit: <int> | default = 0 ]
# Offset the rule evaluation timestamp of this particular group by the specified duration into the past.
[ query_offset: <duration> | default = global.rule_query_offset ]
rules:
  [ - <rule> ... ]
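A filled-in group may make the template easier to read. The sketch below assumes illustrative values for `interval`, `limit`, and `query_offset`, and a hypothetical group name; `query_offset` is only accepted by Prometheus versions that document this field.

```yaml
groups:
  - name: example-group-options        # hypothetical group name
    interval: 30s                      # overrides global.evaluation_interval for this group
    limit: 100                         # each rule may produce at most 100 series or alerts
    query_offset: 1m                   # evaluate against data from one minute in the past
    rules:
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)
```

Leaving out the three optional fields falls back to the defaults shown in the template.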
rule:
yml
# The name of the time series to output to. Must be a valid metric name.
record: <string>
# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>
# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]
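The optional `labels` field can be illustrated with a short sketch. The group name and the `team: platform` label are hypothetical; the label is attached to every series the rule writes, replacing any label of the same name in the query result.

```yaml
groups:
  - name: example-labels               # hypothetical group name
    rules:
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)
        labels:
          team: platform               # hypothetical label added to each recorded series
```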
For example:
yml
groups:
  - name: example
    rules:
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)
- Check the syntax
shell
promtool check rules /path/to/example.rules.yml
3. Configuration
vim /etc/prometheus/prometheus.yml
yml
rule_files:
  - "rules/*.yml"
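For context, here is a sketch of how `rule_files` might sit next to the global settings that rule groups inherit. The `15s` values are assumptions. A relative glob is resolved against the directory that contains `prometheus.yml`, so an absolute glob is shown as a commented alternative in case the rules directory lives elsewhere.

```yaml
global:
  scrape_interval: 15s                 # illustrative value
  evaluation_interval: 15s             # default interval for groups that set no interval

rule_files:
  - "rules/*.yml"                      # relative to the directory of prometheus.yml
  # - "/data/monitor/prometheus/rules/*.yml"   # absolute alternative (assumed layout)
```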
- Create the rules directory
bash
mkdir -p /data/monitor/prometheus/rules
- Create the recording rules
shell
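cat > /data/monitor/prometheus/rules/node_record.yml <<'EOF'
groups:
  - name: test-record
    interval: 10s
    rules:
      # Memory usage (%): 100 minus the share of memory that is still available.
      - record: instance:memory:usage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
      # CPU utilization (%): 100 minus the average idle share over the last 2 minutes.
      - record: instance:cpu:load
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100)
EOF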
- Check the rules
bash
promtool check rules /data/monitor/prometheus/rules/node_record.yml
- Hot reload (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://10.103.236.199:19090/prometheus/-/reload
- Result
Official documentation: