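1. DingTalk as an alerting channel
1.1 Create a DingTalk robot
1. In the DingTalk group, open the Smart Group Assistant and click Add Robot.
2. Choose the Custom robot type.
3. Copy the webhook URL, then click Save.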
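Before wiring the robot into Alertmanager, you can check the webhook URL on its own. The sketch below posts a DingTalk `markdown` message with `curl`. The function names `dingtalk_payload` and `dingtalk_send` and the `DINGTALK_WEBHOOK` environment variable are hypothetical. The helper assumes the title and text contain no double quotes, backslashes or newlines, because they are inserted into the JSON without escaping. If the robot was created with a custom-keyword security setting, the text must contain that keyword, or DingTalk rejects the message.

```bash
# Hypothetical helpers for smoke-testing a DingTalk custom robot webhook.
# Assumption: title/text contain no double quotes, backslashes or newlines
# (they are inserted into the JSON body without escaping).

# Print a DingTalk "markdown" message body as JSON.
dingtalk_payload() {
  local title="$1" text="$2"
  printf '{"msgtype":"markdown","markdown":{"title":"%s","text":"%s"}}' "$title" "$text"
}

# POST a message to the webhook URL in $1 and print DingTalk's JSON reply.
dingtalk_send() {
  local url="$1" title="$2" text="$3"
  curl -sS -H 'Content-Type: application/json' \
    -d "$(dingtalk_payload "$title" "$text")" "$url"
}

# Only send when the webhook URL has been exported, for example:
#   export DINGTALK_WEBHOOK='https://oapi.dingtalk.com/robot/send?access_token=xxxxxx'
if [ -n "${DINGTALK_WEBHOOK:-}" ]; then
  dingtalk_send "$DINGTALK_WEBHOOK" "test" "**test** message"
fi
```

A reply containing `"errcode":0` indicates that DingTalk accepted the message.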
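2. Deploying prometheus-webhook-dingtalk
2.1 Create the ConfigMap
The ConfigMap holds two files. `config.yml` defines a target named `webhook` that forwards to the robot URL from 1.1. `dingding.tmpl` is the message template that target uses.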
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dingtalk-cm
  namespace: monitor
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/dingding.tmpl
    targets:
      webhook:
        # replace with the webhook URL (including access_token) copied in 1.1
        url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxx
        message:
          text: '{{ template "dingtalk.to.message" . }}'
  dingding.tmpl: |-
    {{ define "dingtalk.to.message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts.Firing -}}
    ========= **Monitoring alert** ========= <br>
    **Cluster:** k8s <br>
    **Alert name:** {{ $alert.Labels.alertname }} <br>
    **Severity:** {{ $alert.Labels.severity }} <br>
    **Status:** {{ .Status }} <br>
    **Host:** {{ $alert.Labels.instance }} {{ $alert.Labels.device }} <br>
    **Summary:** {{ .Annotations.summary }} <br>
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}} <br>
    **Labels:** {{ range .Labels.SortedPairs }} </br> [{{ .Name }}: {{ .Value | markdown | html }} ] <br>
    {{- end }}
    **Started at:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
    ========= = **end** = =========
    {{- end }}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts.Resolved -}}
    ========= **Alert resolved** ========= <br>
    **Cluster:** k8s <br>
    **Summary:** {{ $alert.Annotations.summary }} <br>
    **Host:** {{ .Labels.instance }} <br>
    **Alert name:** {{ .Labels.alertname }} <br>
    **Severity:** {{ $alert.Labels.severity }} <br>
    **Status:** {{ .Status }} <br>
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}} <br>
    **Started at:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
    **Resolved at:** {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
    ========= = **end** = =========
    {{- end }}
    {{- end }}
    {{- end }}
```
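The template adds `28800e9` to `StartsAt` and `EndsAt` before formatting them. `time.Time.Add` takes a `time.Duration`, which counts nanoseconds, so the constant is an eight-hour offset. Assuming the timestamps arrive in UTC, the messages therefore show UTC+8 (China Standard Time):

$$
28800\mathrm{e}9\ \text{ns} = 28800 \times 10^{9}\ \text{ns} = 28800\ \text{s} = \frac{28800}{3600}\ \text{h} = 8\ \text{h}
$$

The same arithmetic gives other offsets. For example, UTC+9 would be `32400e9` (9 × 3600 = 32400).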
- Apply, with the manifest above saved as `5.dingtalk-configmap.yaml`:

```bash
kubectl apply -f 5.dingtalk-configmap.yaml
```
2.2 Create the Service and Deployment
```yaml
apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitor
spec:
  selector:
    app: dingtalk
  ports:
    - name: http
      protocol: TCP
      port: 8060        # default listen port of prometheus-webhook-dingtalk
      targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitor
  labels:
    app: dingtalk
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      labels:
        app: dingtalk
    spec:
      restartPolicy: "Always"
      containers:
        - name: dingtalk
          image: registry.cn-zhangjiakou.aliyuncs.com/hsuing/prometheus-webhook-dingtalk:v2.1.0
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            # config.yml and dingding.tmpl from the dingtalk-cm ConfigMap are mounted into this directory
            - name: dingtalk-conf
              mountPath: /etc/prometheus-webhook-dingtalk/
          resources:
            limits:
              cpu: "400m"
              memory: "500Mi"
            requests:
              cpu: "100m"
              memory: "100Mi"
          ports:
            - containerPort: 8060
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            periodSeconds: 5
            initialDelaySeconds: 30
            successThreshold: 1
            tcpSocket:
              port: 8060
          livenessProbe:
            tcpSocket:
              port: 8060
            initialDelaySeconds: 30
            periodSeconds: 10
      volumes:
        - name: dingtalk-conf
          configMap:
            name: dingtalk-cm
```
- Apply, with the manifest above saved as `6.dingtalk-webhook-deploy.yaml`:

```bash
kubectl apply -f 6.dingtalk-webhook-deploy.yaml
```
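Alertmanager reaches this Service through cluster DNS. prometheus-webhook-dingtalk serves each entry under `targets:` in `config.yml` at `/dingtalk/<target>/send`. The receiver URL in the Alertmanager configuration is therefore built from the Service name, namespace and port plus the target name `webhook`. Here is a small sketch of that mapping. `dingtalk_target_url` is a hypothetical helper, and it assumes `cluster.local` is the cluster domain:

```bash
# Hypothetical helper: in-cluster URL of one prometheus-webhook-dingtalk target.
# Assumes the default cluster domain "cluster.local".
dingtalk_target_url() {
  local service="$1" namespace="$2" port="$3" target="$4"
  printf 'http://%s.%s.svc.cluster.local:%s/dingtalk/%s/send\n' \
    "$service" "$namespace" "$port" "$target"
}

# Service "dingtalk" in namespace "monitor", port 8060, target "webhook" from dingtalk-cm
dingtalk_target_url dingtalk monitor 8060 webhook
# prints http://dingtalk.monitor.svc.cluster.local:8060/dingtalk/webhook/send
```

A second target, such as a hypothetical `ops` entry under `targets:`, would be served by the same Service at `/dingtalk/ops/send`.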
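- Hot reload: Alertmanager re-reads its configuration without restarting when it receives a POST on `/-/reload`. Run this after you update the Alertmanager ConfigMap in the next section. The kubelet refreshes mounted ConfigMaps only periodically, so a reload sent right after `kubectl apply` may still load the old file.

```bash
curl -XPOST http://alertmanager.ikubernetes.net/-/reload
```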
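A bare `curl -XPOST` prints nothing when the reload succeeds. When the reload fails (for example, because of a syntax error in `alertmanager.yml`), Alertmanager answers with an error status (HTTP 500) and a short error text, which is easy to overlook. The sketch below wraps the call and checks the status code. `reload_alertmanager` is a hypothetical name, and the base URL is the Ingress host used in this guide:

```bash
# Hypothetical wrapper around Alertmanager's /-/reload endpoint.
# Succeeds only on HTTP 200; otherwise prints the status code and response body.
reload_alertmanager() {
  local base_url="${1%/}" resp code body
  # -w appends the HTTP status code on its own last line
  resp=$(curl -s -w '\n%{http_code}' -XPOST "$base_url/-/reload")
  code=$(printf '%s\n' "$resp" | tail -n 1)
  body=$(printf '%s\n' "$resp" | sed '$d')
  if [ "$code" = "200" ]; then
    echo "reload ok"
  else
    echo "reload failed: HTTP $code $body" >&2
    return 1
  fi
}

# Usage:
#   reload_alertmanager http://alertmanager.ikubernetes.net
```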
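2.3 Configure Alertmanager
Add a `webhook` receiver that points at the `dingtalk` Service, and route the test severity to it: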
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitor
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.qq.com:465'         # SMTP server (host:port) of the mail provider
      smtp_from: '1046493951@qq.com'            # sender address
      smtp_auth_username: '1046493951@qq.com'   # login user name
      smtp_auth_password: 'xxxxxxxx'            # the mailbox's third-party authorization code, not the account password
      smtp_require_tls: false                   # some providers require TLS; not needed for the test mailbox used here
    templates:
      - '/etc/alertmanager/*.tmpl'
    route:
      group_by: ['env','instance','type','group','job','alertname','cluster']
      group_wait: 10s        # wait after the first alert of a new group; alerts arriving within 10s go out in one notification
      group_interval: 2m     # wait before notifying about new alerts added to a group that was already notified
      repeat_interval: 10m   # wait before re-sending a notification for alerts that are still firing
      receiver: 'wechat'     # default receiver
      routes:
        - receiver: 'email'
          match:
            severity: critical
        - receiver: 'wechat'
          match:
            severity: critical222
        - receiver: 'webhook'
          match:
            severity: critical111   # test value; change it to match the severity labels of your real alerts
    receivers:
      - name: 'email'
        email_configs:
          - to: 'hxopensource@163.com'
            send_resolved: true
            html: '{{ template "email.to.html" . }}'
            headers: { Subject: "System monitoring alert{{ if gt (len .Alerts.Resolved) 0 }} (resolved){{ end }}" }
      #- name: 'devops'
      #  email_configs:
      #    - to: 'hxopensource@163.com,xxx@qq.com'
      #      send_resolved: true
      #      html: '{{ template "email.to.html" . }}'
      - name: 'wechat'
        wechat_configs:
          - corp_id: 'xxx'
            to_party: '1'
            to_user: '@all'
            agent_id: '1000004'
            api_secret: 'eGORelIo1EqzxxxxxxnkGELI-Ag3TTwo'
            send_resolved: true
      - name: 'webhook'
        webhook_configs:
          # prometheus-webhook-dingtalk endpoint: /dingtalk/<target name>/send
          - url: 'http://dingtalk.monitor.svc.cluster.local:8060/dingtalk/webhook/send'
            send_resolved: true
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
```
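Alertmanager evaluates the child routes in order. Unless a route sets `continue: true`, it delivers an alert to the first route whose `match` labels all equal the alert's labels. If no route matches, the top-level `receiver` (`wechat`) applies. The function below models that first-match rule for the `severity` label of this configuration only, which shows why an alert labelled `severity=critical111` ends up in DingTalk. It is a simplified illustration that ignores nested routes, grouping and inhibition, and `route_receiver` is a hypothetical name:

```bash
# Simplified model of the route tree above: the first matching child route wins,
# otherwise the default receiver is used. Only the "severity" label is considered.
route_receiver() {
  local severity="$1" route
  # Child routes in configuration order, as "<severity to match>:<receiver>"
  for route in critical:email critical222:wechat critical111:webhook; do
    if [ "${route%%:*}" = "$severity" ]; then
      echo "${route#*:}"   # first match wins (no "continue: true" in this config)
      return 0
    fi
  done
  echo "wechat"            # nothing matched: fall back to route.receiver
}

for s in critical critical111 warning; do
  printf '%s -> %s\n' "$s" "$(route_receiver "$s")"
done
# critical -> email
# critical111 -> webhook
# warning -> wechat
```

To test the real configuration instead of this model, run `amtool config routes test --config.file=alertmanager.yml severity=critical111`. It prints the receiver Alertmanager would choose.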
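2.4 Send a test alert
Post an alert labelled `severity: critical111`, which the `webhook` route sends to DingTalk. The command uses the v2 API because `/api/v1/alerts` was removed in Alertmanager 0.27. The JSON body is the same for both versions.

```bash
curl -XPOST -H 'Content-Type: application/json' http://alertmanager.ikubernetes.net/api/v2/alerts -d'[{"labels":{"severity":"critical111"},"annotations":{"summary":"This is a test alert"}}]'
```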
- Result
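The DingTalk template also has a resolved section. To see it, post the same alert again with `endsAt` set to the current time. Alertmanager then treats the alert as resolved. Because the webhook receiver has `send_resolved: true`, it sends a recovery notification on the group's next flush. With `group_interval: 2m`, expect that notification within a couple of minutes. The sketch below assumes the Ingress host used above. `test_alert_json`, `post_test_alert` and the `alertname` value `DingtalkTest` are hypothetical. The label set must be identical in both calls so that Alertmanager sees one alert:

```bash
# Hypothetical helpers for a firing/resolved round trip via Alertmanager's v2 API.
# Both states use the same labels, so Alertmanager treats them as one alert.
test_alert_json() {
  local state="${1:-firing}" ends=""
  if [ "$state" = "resolved" ]; then
    # endsAt = now (UTC); an alert whose endsAt has passed counts as resolved
    ends=",\"endsAt\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\""
  fi
  printf '[{"labels":{"alertname":"DingtalkTest","severity":"critical111"},"annotations":{"summary":"This is a test alert"}%s}]' "$ends"
}

# $1: Alertmanager base URL, $2: "firing" (default) or "resolved"
post_test_alert() {
  local base_url="${1%/}" state="${2:-firing}"
  curl -sS -XPOST -H 'Content-Type: application/json' \
    "$base_url/api/v2/alerts" -d "$(test_alert_json "$state")"
}

# Usage:
#   post_test_alert http://alertmanager.ikubernetes.net firing
#   post_test_alert http://alertmanager.ikubernetes.net resolved
```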