Skip to content

1. 配置告警

导入sql配置

1.0 测试告警

点击模版管理-->自定义模版

image-20240705111101407

这些就是通过sql导入进来的信息

❌ 注意

此次是以飞书为例,飞书webhook 定义关键字时需注意和你报警模版中的关键字保持一致即可,否则不能发出警告

编辑飞书这个模版,配置好之后,点击--->告警管理--->告警测试--->飞书

image-20240705115043908

  • 飞书效果

image-20240705115114052

1.1 模版配置

1.1.1 添加自定义模版-ec2

image-20240705104726474

image-20240705105026006

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}{{if eq $v.status
"resolved"}}[PROMETHEUS-恢复信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**✅
> <font color="info">告警级别:</font> {{$v.labels.severity}}
> <font color="info">当前状态:</font> **<font color="#67C23A">已恢复</font>**
> <font color="info">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="info">结束时间:</font> {{GetCSTtime $v.endsAt}}
> <font color="info">实例地址:</font> {{$v.labels.instance}}
> <font color="info">**{{$v.annotations.description}}**</font>
{{else}}[PROMETHEUS-告警信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**🔥
> <font color="#FF0000">告警级别:</font> {{$v.labels.severity}}
> <font color="#FF0000">当前状态:</font> **<font color="#E6A23C">需要处理</font>**
> <font color="#FF0000">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="#FF0000">实例地址:</font> {{$v.labels.instance}}
> <font color="#FF0000">**{{$v.annotations.description}}**</font>
{{end}}{{ end }}
{{ $urimsg:=""}}{{ range $key,$value:=.commonLabels }}{{$urimsg =
print $urimsg $key "%3D%22" $value "%22%2C" }}{{end}}[✍点我屏蔽该告警](http://alertmanager.ikubernetes.net/#/silences/new?filter=%7B{{SplitString $urimsg 0 -3}}%7D)
{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}{{if eq $v.status
"resolved"}}[PROMETHEUS-恢复信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**✅
> <font color="info">告警级别:</font> {{$v.labels.severity}}
> <font color="info">当前状态:</font> **<font color="#67C23A">已恢复</font>**
> <font color="info">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="info">结束时间:</font> {{GetCSTtime $v.endsAt}}
> <font color="info">实例地址:</font> {{$v.labels.instance}}
> <font color="info">**{{$v.annotations.description}}**</font>
{{else}}[PROMETHEUS-告警信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**🔥
> <font color="#FF0000">告警级别:</font> {{$v.labels.severity}}
> <font color="#FF0000">当前状态:</font> **<font color="#E6A23C">需要处理</font>**
> <font color="#FF0000">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="#FF0000">实例地址:</font> {{$v.labels.instance}}
> <font color="#FF0000">**{{$v.annotations.description}}**</font>
{{end}}{{ end }}
{{ $urimsg:=""}}{{ range $key,$value:=.commonLabels }}{{$urimsg =
print $urimsg $key "%3D%22" $value "%22%2C" }}{{end}}[✍点我屏蔽该告警](http://alertmanager.ikubernetes.net/#/silences/new?filter=%7B{{SplitString $urimsg 0 -3}}%7D)

1.1.2 添加自定义模版Pod

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}{{if eq $v.status "resolved"}}[PROMETHEUS-恢复信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**✅
> <font color="info">告警级别:</font> {{$v.labels.severity}}
> <font color="info">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="info">结束时间:</font> {{GetCSTtime $v.endsAt}}
> <font color="info">命名空间:</font> {{$v.labels.namespace}}
> <font color="info">实例名称:</font> {{$v.labels.pod}}
> <font color="info">实例地址:</font> {{$v.labels.instance}}
> <font color="info">**{{$v.annotations.description}}**</font>
{{else}}[PROMETHEUS-告警信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**🔥
> <font color="#FF0000">告警级别:</font> {{$v.labels.severity}}
> <font color="#FF0000">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="#FF0000">命名空间:</font> {{$v.labels.namespace}}
> <font color="#FF0000">实例名称:</font> {{$v.labels.pod}}
> <font color="#FF0000">实例地址:</font> {{$v.labels.instance}}
> <font color="#FF0000">**{{$v.annotations.description}}**</font>
{{end}}{{ end }}
{{ $urimsg:=""}}{{ range $key,$value:=.commonLabels }}{{$urimsg =
print $urimsg $key "%3D%22" $value "%22%2C" }}{{end}}[✍点我屏蔽该告警](http://alertmanager.ikubernetes.net/#/silences/new?filter=%7B{{SplitString $urimsg 0 -3}}%7D)
{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}{{if eq $v.status "resolved"}}[PROMETHEUS-恢复信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**✅
> <font color="info">告警级别:</font> {{$v.labels.severity}}
> <font color="info">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="info">结束时间:</font> {{GetCSTtime $v.endsAt}}
> <font color="info">命名空间:</font> {{$v.labels.namespace}}
> <font color="info">实例名称:</font> {{$v.labels.pod}}
> <font color="info">实例地址:</font> {{$v.labels.instance}}
> <font color="info">**{{$v.annotations.description}}**</font>
{{else}}[PROMETHEUS-告警信息]({{$v.generatorURL}})
> **[{{$v.labels.alertname}}]({{$var}})**🔥
> <font color="#FF0000">告警级别:</font> {{$v.labels.severity}}
> <font color="#FF0000">开始时间:</font> {{GetCSTtime $v.startsAt}}
> <font color="#FF0000">命名空间:</font> {{$v.labels.namespace}}
> <font color="#FF0000">实例名称:</font> {{$v.labels.pod}}
> <font color="#FF0000">实例地址:</font> {{$v.labels.instance}}
> <font color="#FF0000">**{{$v.annotations.description}}**</font>
{{end}}{{ end }}
{{ $urimsg:=""}}{{ range $key,$value:=.commonLabels }}{{$urimsg =
print $urimsg $key "%3D%22" $value "%22%2C" }}{{end}}[✍点我屏蔽该告警](http://alertmanager.ikubernetes.net/#/silences/new?filter=%7B{{SplitString $urimsg 0 -3}}%7D)

1.2 告警路由

告警路由的名称,实例-环境-部门-创建人,这样当大量路由创建的时候容易区分

1.2.1 路由作用

可根据路由配置,分发到不同群组

如果不指定默认路由,告警则走默认告警组

1.2.2 创建路由

image-20240705142559024

  • 发送没有指定标签
curl -XPOST -H 'Content-Type: application/json' http://alertmanager.ikubernetes.net/api/v1/alerts -d'[{"labels":{"hostname":"you"},"annotations":{"summary":"This is a testalert"}}]'
curl -XPOST -H 'Content-Type: application/json' http://alertmanager.ikubernetes.net/api/v1/alerts -d'[{"labels":{"hostname":"you"},"annotations":{"summary":"This is a testalert"}}]'
  • 效果

image-20240705142921182

  • 指定根据路由标签名字发送
curl -XPOST -H 'Content-Type: application/json' http://alertmanager.ikubernetes.net/api/v1/alerts -d'[{"labels":{"hostname":"han"},"annotations":{"summary":"This is a testalert"}}]'
curl -XPOST -H 'Content-Type: application/json' http://alertmanager.ikubernetes.net/api/v1/alerts -d'[{"labels":{"hostname":"han"},"annotations":{"summary":"This is a testalert"}}]'
  • 效果,devops组默认接收

image-20240705143140290

1.2.3 路由正则

1.3 Alertmanger接入PrometheusAlert

此次采用Prometheus Rules

1.3.1 alertmanager配置

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitor
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.qq.com:465'     # 邮箱服务器的SMTP主机配置
      smtp_from: 'xxx@qq.com'    # 发送邮件主题
      smtp_auth_username: 'xx@qq.com'      # 登录用户名
      smtp_auth_password: 'djubruxxxeee'    # 此处的auth password是邮箱的第三方登录授权密码,而非用户密码
      smtp_require_tls: false           # 有些邮箱需要开启此配置,这里使用的是企微邮箱,仅做测试,不需要开启此功能。
    templates:
      - '/etc/alertmanager/*.tmpl'
    route:
      group_by: ['env','instance','type','group','job','alertname','cluster']
      group_wait: 10s # 分组内第一个告警等待时间,10s内如有第二个告警会合并一个告警
      group_interval: 1m # 发送新告警间隔时间
      repeat_interval: 10m # 重复告警间隔发送时间,如果没处理过多久再次发送一次
      receiver: 'webhook'  #  默认接收人
      routes:
      - receiver: 'webhook'
        match:
          severity: critical

      - receiver: 'email'
        match:
          severity: critical

      - receiver: 'wechat'
        match:
          severity: critical222

      - receiver: 'dingtalk'
        match:
          severity: warning

    receivers:
    - name: 'email'
      email_configs:
      - to: 'xxxsource@163.com'
        send_resolved: true
        html: '{{ template "email.to.html" . }}'
        headers: { Subject: "系统监控告警{{- if gt (len .Alerts.Resolved) 0 -}}恢复{{ end }}" }

    #- name: 'devops'
    #  email_configs:
    #  - to: 'hxopxxource@163.com,xxx@qq.com'
    #    send_resolved: true
    #    html: '{{ template "email.to.html" . }}'

    - name: 'wechat'
      wechat_configs:
      - corp_id: 'wwe158c08ab4275006'
        to_party: '1'
        to_user: '@all'
        agent_id: 1000004
        api_secret: 'eGORelIo1EqzLfgEQs3oPfsdtAOm5nkGELI-Ag3TTwo'
        send_resolved: true

    - name: 'dingtalk'
      webhook_configs:
      - url: 'http://dingtalk.monitor.svc.cluster.local:8060/dingtalk/webhook/send'
        send_resolved: true

    - name: 'webhook'
      webhook_configs:
      - url: 'http://prometheus-alert-center.monitor.svc:8080/prometheusalert?type=fs&tpl=prometheus-fs&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/51eb144c-cc6a-xxxx-4c9b23'
        send_resolved: true

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

  wechat.tmpl: |-
    {{ define "wechat.default.message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= 监控报警 =========
    告警状态:{{   .Status }}
    告警级别:{{ .Labels.severity }}
    告警类型:{{ $alert.Labels.alertname }}
    故障主机: {{ $alert.Labels.instance }}
    告警主题: {{ $alert.Annotations.summary }}
    告警详情: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    触发阀值:{{ .Annotations.value }}
    故障时间: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= 告警恢复 =========
    告警类型:{{ .Labels.alertname }}
    告警状态:{{   .Status }}
    告警主题: {{ $alert.Annotations.summary }}
    告警详情: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    故障时间: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    恢复时间: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    {{- if gt (len $alert.Labels.instance) 0 }}
    实例信息: {{ $alert.Labels.instance }}
    {{- end }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- end }}

  email.tmpl: |-
    {{ define "email.from" }}xxx.com{{ end }}
    {{ define "email.to" }}xxx.com{{ end }}
    {{ define "email.to.html" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{ range .Alerts }}
    ========= 监控报警 =========<br>
    告警程序: prometheus_alert <br>
    告警级别: {{ .Labels.severity }} <br>
    告警类型: {{ .Labels.alertname }} <br>
    告警主机: {{ .Labels.instance }} <br>
    告警主题: {{ .Annotations.summary }}  <br>
    告警详情: {{ .Annotations.description }} <br>
    触发时间: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}

    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{ range .Alerts }}
    ========= 告警恢复 =========<br>
    告警程序: prometheus_alert <br>
    告警级别: {{ .Labels.severity }} <br>
    告警类型: {{ .Labels.alertname }} <br>
    告警主机: {{ .Labels.instance }} <br>
    告警主题: {{ .Annotations.summary }} <br>
    告警详情: {{ .Annotations.description }} <br>
    触发时间: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    恢复时间: {{ .EndsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}

    {{- end }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitor
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.qq.com:465'     # 邮箱服务器的SMTP主机配置
      smtp_from: 'xxx@qq.com'    # 发送邮件主题
      smtp_auth_username: 'xx@qq.com'      # 登录用户名
      smtp_auth_password: 'djubruxxxeee'    # 此处的auth password是邮箱的第三方登录授权密码,而非用户密码
      smtp_require_tls: false           # 有些邮箱需要开启此配置,这里使用的是企微邮箱,仅做测试,不需要开启此功能。
    templates:
      - '/etc/alertmanager/*.tmpl'
    route:
      group_by: ['env','instance','type','group','job','alertname','cluster']
      group_wait: 10s # 分组内第一个告警等待时间,10s内如有第二个告警会合并一个告警
      group_interval: 1m # 发送新告警间隔时间
      repeat_interval: 10m # 重复告警间隔发送时间,如果没处理过多久再次发送一次
      receiver: 'webhook'  #  默认接收人
      routes:
      - receiver: 'webhook'
        match:
          severity: critical

      - receiver: 'email'
        match:
          severity: critical

      - receiver: 'wechat'
        match:
          severity: critical222

      - receiver: 'dingtalk'
        match:
          severity: warning

    receivers:
    - name: 'email'
      email_configs:
      - to: 'xxxsource@163.com'
        send_resolved: true
        html: '{{ template "email.to.html" . }}'
        headers: { Subject: "系统监控告警{{- if gt (len .Alerts.Resolved) 0 -}}恢复{{ end }}" }

    #- name: 'devops'
    #  email_configs:
    #  - to: 'hxopxxource@163.com,xxx@qq.com'
    #    send_resolved: true
    #    html: '{{ template "email.to.html" . }}'

    - name: 'wechat'
      wechat_configs:
      - corp_id: 'wwe158c08ab4275006'
        to_party: '1'
        to_user: '@all'
        agent_id: 1000004
        api_secret: 'eGORelIo1EqzLfgEQs3oPfsdtAOm5nkGELI-Ag3TTwo'
        send_resolved: true

    - name: 'dingtalk'
      webhook_configs:
      - url: 'http://dingtalk.monitor.svc.cluster.local:8060/dingtalk/webhook/send'
        send_resolved: true

    - name: 'webhook'
      webhook_configs:
      - url: 'http://prometheus-alert-center.monitor.svc:8080/prometheusalert?type=fs&tpl=prometheus-fs&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/51eb144c-cc6a-xxxx-4c9b23'
        send_resolved: true

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

  wechat.tmpl: |-
    {{ define "wechat.default.message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= 监控报警 =========
    告警状态:{{   .Status }}
    告警级别:{{ .Labels.severity }}
    告警类型:{{ $alert.Labels.alertname }}
    故障主机: {{ $alert.Labels.instance }}
    告警主题: {{ $alert.Annotations.summary }}
    告警详情: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    触发阀值:{{ .Annotations.value }}
    故障时间: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= 告警恢复 =========
    告警类型:{{ .Labels.alertname }}
    告警状态:{{   .Status }}
    告警主题: {{ $alert.Annotations.summary }}
    告警详情: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    故障时间: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    恢复时间: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    {{- if gt (len $alert.Labels.instance) 0 }}
    实例信息: {{ $alert.Labels.instance }}
    {{- end }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- end }}

  email.tmpl: |-
    {{ define "email.from" }}xxx.com{{ end }}
    {{ define "email.to" }}xxx.com{{ end }}
    {{ define "email.to.html" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{ range .Alerts }}
    ========= 监控报警 =========<br>
    告警程序: prometheus_alert <br>
    告警级别: {{ .Labels.severity }} <br>
    告警类型: {{ .Labels.alertname }} <br>
    告警主机: {{ .Labels.instance }} <br>
    告警主题: {{ .Annotations.summary }}  <br>
    告警详情: {{ .Annotations.description }} <br>
    触发时间: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}

    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{ range .Alerts }}
    ========= 告警恢复 =========<br>
    告警程序: prometheus_alert <br>
    告警级别: {{ .Labels.severity }} <br>
    告警类型: {{ .Labels.alertname }} <br>
    告警主机: {{ .Labels.instance }} <br>
    告警主题: {{ .Annotations.summary }} <br>
    告警详情: {{ .Annotations.description }} <br>
    触发时间: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    恢复时间: {{ .EndsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}

    {{- end }}
  • 执行apply
  • 热更新
bash
curl -XPOST http://alertmanager.ikubernetes.net/-/reload
curl -XPOST http://alertmanager.ikubernetes.net/-/reload

2. 配置详解