1. DingTalk as an alert channel

1.1 Create a DingTalk robot

1. Open the group's Smart Group Assistant in DingTalk and click Add Robot.

2. Select Custom Robot.

3. Copy the webhook URL, then click Save.
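
Before wiring the robot into Alertmanager, it can help to confirm that the webhook itself works. The sketch below posts a plain-text message to the robot. `DINGTALK_TOKEN` is a hypothetical environment variable holding the `access_token` from the copied webhook URL; without it the script only prints what it would send. If the robot's security setting uses a custom keyword, the message text has to contain that keyword.

```shell
#!/usr/bin/env bash
# Assumption: DINGTALK_TOKEN holds the access_token part of your webhook URL.
DINGTALK_TOKEN="${DINGTALK_TOKEN:-}"
WEBHOOK_URL="https://oapi.dingtalk.com/robot/send?access_token=${DINGTALK_TOKEN}"

# Build the JSON body of a plain-text robot message.
# The text is inserted verbatim, so keep it free of double quotes and backslashes.
build_text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$1"
}

PAYLOAD="$(build_text_payload "test message from the monitoring robot")"

if [ -n "$DINGTALK_TOKEN" ]; then
  # Real call: DingTalk answers with a small JSON document containing errcode and errmsg.
  curl -sS -X POST -H 'Content-Type: application/json' -d "$PAYLOAD" "$WEBHOOK_URL"
else
  # Dry run: show the request body without touching the network.
  echo "dry run, would POST: $PAYLOAD"
fi
```

An `errcode` of 0 in the reply indicates that the robot accepted the message.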


2. Deploy the DingTalk webhook

2.1 Create the ConfigMap

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dingtalk-cm
  namespace: monitor
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/dingding.tmpl
    targets:
      webhook:
        url: https://oapi.dingtalk.com/robot/send?access_token=e5f3dbfbf4e3070427486e1ca288f3077aa5155d51f33ea012a838cc3070eb53
        message:
          text: '{{ template "dingtalk.to.message" . }}'
  dingding.tmpl: |-
    {{ define "dingtalk.to.message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts.Firing -}}

    =========  **Monitoring Alert** ========= <br>

    **Cluster:**     k8s <br>
    **Alert name:**    {{ $alert.Labels.alertname }} <br>
    **Severity:**    {{ $alert.Labels.severity }} <br>
    **Status:**    {{ $alert.Status }} <br>
    **Affected host:**    {{ $alert.Labels.instance }} {{ $alert.Labels.device }} <br>
    **Summary:**    {{ $alert.Annotations.summary }} <br>
    **Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description}} <br>
    **Labels:**    {{ range $alert.Labels.SortedPairs  }}  <br> [{{ .Name }}: {{ .Value | markdown | html }} ] <br>
    {{- end }}

    **Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
    ========= = **end** =  =========
    {{- end }}
    {{- end }}

    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts.Resolved -}}

    ========= **Alert Resolved** ========= <br>
    **Cluster:**     k8s <br>
    **Summary:**    {{ $alert.Annotations.summary }} <br>
    **Affected host:**    {{ $alert.Labels.instance }} <br>
    **Alert name:**    {{ $alert.Labels.alertname }} <br>
    **Severity:**    {{ $alert.Labels.severity }} <br>
    **Status:**    {{ $alert.Status }} <br>
    **Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description}} <br>
    **Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
    **Resolved at:**    {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>

    ========= = **end** =  =========
    {{- end }}
    {{- end }}
    {{- end }}
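
The template shifts `StartsAt` and `EndsAt` with `.Add 28800e9`. Timestamps usually arrive in UTC, and `Add` takes a duration in nanoseconds. The arithmetic below only illustrates what that constant means (8 hours, i.e. UTC+8); the variable names are made up for this example. For another time zone, the same calculation run backwards gives the value to put into the template.

```shell
#!/usr/bin/env bash
# 28800e9 from the template, written out: a duration in nanoseconds.
OFFSET_NS=28800000000000
NS_PER_HOUR=$((3600 * 1000000000))

# Convert the offset to hours.
OFFSET_HOURS=$((OFFSET_NS / NS_PER_HOUR))
echo "template offset: ${OFFSET_HOURS}h"

# Going the other way: the constant to use for a hypothetical UTC+9 zone.
TARGET_HOURS=9
TARGET_NS=$((TARGET_HOURS * NS_PER_HOUR))
echo "offset for UTC+${TARGET_HOURS}: ${TARGET_NS} ns"
```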
  • apply
bash
kubectl apply -f  5.dingtalk-configmap.yaml

2.2 Create the Service and Deployment

yaml
apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitor
spec:
  selector:
    app: dingtalk
  ports:
    - name: http
      protocol: TCP
      port: 8060
      targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitor
  labels:
    app: dingtalk
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      labels:
        app: dingtalk
    spec:
      restartPolicy: "Always"
      containers:
        - name: dingtalk
          image: registry.cn-zhangjiakou.aliyuncs.com/hsuing/prometheus-webhook-dingtalk:v2.1.0
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: dingtalk-conf
              mountPath: /etc/prometheus-webhook-dingtalk/
          resources:
            limits:
              cpu: "400m"
              memory: "500Mi"
            requests:
              cpu: "100m"
              memory: "100Mi"
          ports:
            - containerPort: 8060
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            periodSeconds: 5
            initialDelaySeconds: 30
            successThreshold: 1
            tcpSocket:
              port: 8060
          livenessProbe:
            tcpSocket:
              port: 8060
            initialDelaySeconds: 30
            periodSeconds: 10
      volumes:
        - name: dingtalk-conf
          configMap:
            name: dingtalk-cm
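
Alertmanager reaches this Deployment through the Service, so the receiver URL follows from the names used in the manifests. The sketch below assembles it from the Service name, namespace and port, plus the target name `webhook` defined under `targets:` in `config.yml`. `cluster.local` is assumed to be the cluster DNS domain (the Kubernetes default), and the variable names are only illustrative.

```shell
#!/usr/bin/env bash
SERVICE_NAME="dingtalk"         # metadata.name of the Service
NAMESPACE="monitor"             # metadata.namespace of the Service
PORT=8060                       # spec.ports[].port of the Service
TARGET="webhook"                # key under targets: in config.yml
CLUSTER_DOMAIN="cluster.local"  # assumption: default cluster DNS domain

# In-cluster URL: <service>.<namespace>.svc.<domain>:<port>/dingtalk/<target>/send
RECEIVER_URL="http://${SERVICE_NAME}.${NAMESPACE}.svc.${CLUSTER_DOMAIN}:${PORT}/dingtalk/${TARGET}/send"
echo "$RECEIVER_URL"
```

Renaming the target in `config.yml` or the Service changes the URL accordingly, so the `webhook_configs` entry in Alertmanager has to be kept in step.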
  • apply
bash
 kubectl apply -f 6.dingtalk-webhook-delpoy.yaml
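
Once the Pod is ready, the webhook can be exercised without Alertmanager. The sketch below assumes that a port-forward (`kubectl -n monitor port-forward svc/dingtalk 8060:8060`) is running in another terminal, and posts a hand-written payload in Alertmanager's webhook format. The alert name, instance and timestamps are invented test values, and `RUN_SMOKE_TEST` is a hypothetical switch so that nothing is sent by default.

```shell
#!/usr/bin/env bash
# Assumption: the dingtalk Service is forwarded to localhost:8060.
LOCAL_URL="http://127.0.0.1:8060/dingtalk/webhook/send"

# Hand-written payload in Alertmanager's webhook format (all values are test data).
SMOKE_PAYLOAD='{
  "version": "4",
  "status": "firing",
  "receiver": "webhook",
  "groupLabels": {"alertname": "SmokeTest"},
  "commonLabels": {"alertname": "SmokeTest", "severity": "critical111"},
  "commonAnnotations": {"summary": "smoke test"},
  "externalURL": "http://alertmanager.ikubernetes.net",
  "alerts": [{
    "status": "firing",
    "labels": {"alertname": "SmokeTest", "severity": "critical111", "instance": "test-host"},
    "annotations": {"summary": "smoke test", "description": "sent by hand"},
    "startsAt": "2024-07-03T09:00:00Z",
    "endsAt": "0001-01-01T00:00:00Z"
  }]
}'

if [ "${RUN_SMOKE_TEST:-0}" = "1" ]; then
  # Real call: the webhook renders the template and forwards the message to DingTalk.
  curl -sS -X POST -H 'Content-Type: application/json' -d "$SMOKE_PAYLOAD" "$LOCAL_URL"
else
  # Dry run: print the target only.
  echo "dry run, would POST to $LOCAL_URL"
fi
```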
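  • Hot-reload Alertmanager so that it re-reads its configuration (repeat this after every change to the Alertmanager ConfigMap)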
bash
curl -XPOST http://alertmanager.ikubernetes.net/-/reload

2.3 Configure Alertmanager

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitor
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.qq.com:465'     # SMTP server host and port
      smtp_from: '1046493951@qq.com'    # sender address
      smtp_auth_username: '1046493951@qq.com'      # login user name
      smtp_auth_password: 'djubruffhuolbeee'    # the mailbox's third-party authorization code, not the account password
      smtp_require_tls: false           # some mail servers require this; this is a test setup, so it is left off
    templates:
      - '/etc/alertmanager/*.tmpl'
    route:
      group_by: ['env','instance','type','group','job','alertname','cluster']
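      group_wait: 10s # how long to wait before the first notification for a new group; alerts arriving within these 10s are merged into it
      group_interval: 2m # how long to wait before notifying about new alerts added to a group that has already been notified
      repeat_interval: 10m # how long to wait before re-sending a notification for alerts that are still firing
      receiver: 'wechat'  # default receiver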
      routes:
      - receiver: 'email'
        match:
          severity: critical

      - receiver: 'wechat'
        match:
          severity: critical222

      - receiver: 'webhook'
        match:
          severity: critical111  # test value; change it to match the labels on your production alerts

    receivers:
    - name: 'email'
      email_configs:
      - to: 'hxopensource@163.com'
        send_resolved: true
        html: '{{ template "email.to.html" . }}'
        headers: { Subject: "System monitoring alert{{ if gt (len .Alerts.Resolved) 0 }} resolved{{ end }}" }

    #- name: 'devops'
    #  email_configs:
    #  - to: 'hxopensource@163.com,xxx@qq.com'
    #    send_resolved: true
    #    html: '{{ template "email.to.html" . }}'

    - name: 'wechat'
      wechat_configs:
      - corp_id: 'xxx'
        to_party: '1'
        to_user: '@all'
        agent_id: 1000004
        api_secret: 'eGORelIo1EqzxxxxxxnkGELI-Ag3TTwo'
        send_resolved: true

    - name: 'webhook'
      webhook_configs:
      - url: 'http://dingtalk.monitor.svc.cluster.local:8060/dingtalk/webhook/send'
        send_resolved: true

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
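
The `routes` list is evaluated from top to bottom. Without `continue: true`, the first child route whose `match` fits handles the alert, and anything unmatched falls back to the default receiver. The function below is only a plain-shell imitation of that decision for the test values used here, not something Alertmanager runs; `receiver_for_severity` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Mirror of the route tree: the first matching child route wins,
# otherwise the default receiver of the top-level route is used.
receiver_for_severity() {
  case "$1" in
    critical)    echo "email" ;;
    critical222) echo "wechat" ;;
    critical111) echo "webhook" ;;
    *)           echo "wechat" ;;   # top-level default receiver
  esac
}

for severity in critical critical111 warning; do
  echo "severity=${severity} -> $(receiver_for_severity "$severity")"
done
```

With this configuration only alerts labelled `severity: critical111` reach the DingTalk webhook, which is why the test value has to be replaced before production use.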

2.4 Send a test message

bash
# Alertmanager 0.27 removed API v1; /api/v2/alerts accepts the same JSON array.
curl -XPOST -H 'Content-Type: application/json' http://alertmanager.ikubernetes.net/api/v2/alerts -d'[{"labels":{"severity":"critical111"},"annotations":{"summary":"This is a test alert"}}]'
  • Result

image-20240703171400899
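
To see the recovery message as well, the same alert can be posted a second time with an `endsAt` timestamp that has already passed, which should mark it as resolved. The sketch below builds both request bodies. The alert name `ManualTest`, the helper `build_alert` and the `SEND_ALERTS` switch are made up for this example, and the `/api/v2/alerts` endpoint is assumed to be available on your Alertmanager version.

```shell
#!/usr/bin/env bash
ALERTMANAGER_URL="http://alertmanager.ikubernetes.net/api/v2/alerts"

# build_alert SUMMARY [ENDS_AT] prints a one-element alert list as JSON.
build_alert() {
  local ends_at_field=""
  if [ -n "${2:-}" ]; then
    ends_at_field=",\"endsAt\":\"$2\""
  fi
  printf '[{"labels":{"alertname":"ManualTest","severity":"critical111"},"annotations":{"summary":"%s"}%s}]' "$1" "$ends_at_field"
}

FIRING_BODY="$(build_alert "This is a test alert")"
# An endsAt that is not in the future marks the alert as resolved.
NOW_UTC="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
RESOLVED_BODY="$(build_alert "This is a test alert" "$NOW_UTC")"

if [ "${SEND_ALERTS:-0}" = "1" ]; then
  curl -sS -XPOST -H 'Content-Type: application/json' "$ALERTMANAGER_URL" -d "$FIRING_BODY"
  # Give the firing notification time to go out (group_wait is 10s) before resolving.
  sleep 30
  curl -sS -XPOST -H 'Content-Type: application/json' "$ALERTMANAGER_URL" -d "$RESOLVED_BODY"
else
  # Dry run: print both request bodies.
  printf '%s\n%s\n' "$FIRING_BODY" "$RESOLVED_BODY"
fi
```

Both bodies carry identical labels, so Alertmanager treats them as the same alert; the recovery notification is sent on a later group flush because the `webhook` receiver sets `send_resolved: true`.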