
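1. Custom resource integration

Prometheus uses various exporters to monitor resources. An exporter can be seen as the agent side of monitoring: it collects the metrics of its resource and exposes an endpoint for Prometheus to scrape.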

2. Scraping ECS data

2.1 Install and configure node-exporter

  • Start the container
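bash
# Note: newer node_exporter releases deprecate --collector.filesystem.ignored-mount-points
# in favor of --collector.filesystem.mount-points-exclude.
docker run -d -p 9100:9100 \
-v "/proc:/host/proc" \
-v "/sys:/host/sys" \
-v "/:/rootfs" \
-v "/etc/localtime:/etc/localtime" \
prom/node-exporter \
--path.procfs /host/proc \
--path.sysfs /host/sys \
--collector.filesystem.ignored-mount-points \
"^/(sys|proc|dev|host|etc)($|/)"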

Verify: curl localhost:9100/metrics
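The verification step can also be scripted. The following is a minimal sketch, not part of the original setup: `has_metric` is a hypothetical helper, and the sample payload is made-up data standing in for the real `curl` output.

```bash
#!/usr/bin/env bash
# has_metric NAME: read Prometheus text-format metrics on stdin and
# succeed if at least one sample line for metric NAME is present.
has_metric() {
  local name="$1"
  # Sample lines start with the metric name followed by "{" or a space;
  # lines starting with "#" are HELP/TYPE comments and are skipped.
  grep -v '^#' | grep -Eq "^${name}(\{| )"
}

# Made-up sample payload standing in for `curl -s localhost:9100/metrics`.
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6'

if printf '%s\n' "$sample" | has_metric node_load1; then
  echo "node_load1 found"
fi
```

Against a live exporter the same check would be `curl -s localhost:9100/metrics | has_metric node_load1`.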

  • Create the scrape job
yaml
- job_name: 'other-ECS'
  static_configs:
  - targets: ['10.103.236.199:9100']
    labels:
      hostname: 'test-node-exporter'
  • Hot reload (Prometheus must be started with --web.enable-lifecycle)
curl -XPOST http://prometheus.ikubernetes.net/-/reload

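3. Process monitoring with process-exporter

Official site

process-exporter is mainly used for process monitoring, for example the number of processes of a service and how much CPU, memory, and other resources they consume.

3.0 Syntax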

yaml
# File: /opt/process-exporter/config/process-exporter.yml (edit it with vim)

process_names:
#  - name: "{{.Comm}}"
#    cmdline:
#    - '.+'

  - name: "{{.Matches}}"
    cmdline:
    - 'nginx'

  - name: "{{.Matches}}"
    cmdline:
    - '/opt/atlassian/confluence/bin/tomcat-juli.jar'

  - name: "{{.Matches}}"
    cmdline:
    - 'vsftpd'

  - name: "{{.Matches}}"
    cmdline:
    - 'redis-server'

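cmdline: a list of regular expressions matched against the process command line, which you can look up with ps -ef; it identifies the process to monitor. If no such process exists, no data is collected for it.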

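Name templates (remember to wrap the variable in {{ }}, for example {{.Comm}}):

Variable     Example groupname                           Meaning
.Comm        groupname="redis-server"                    name of the executable or sh file
.ExeBase     groupname="redis-server *:6379"             /
.ExeFull     groupname="/usr/bin/redis-server *:6379"    full process information as shown by ps
.Username    groupname="redis"                           group by the user that owns the process
.Matches     groupname="map[:redis]"                     the keyword that was matched, here "redis"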
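Before writing a cmdline pattern into the config, it can help to preview how many processes it would match. The sketch below only approximates the exporter with `grep -E` (process-exporter itself uses Go regular expressions, so complex patterns may behave differently); `count_cmdline_matches` and the sample process list are hypothetical.

```bash
#!/usr/bin/env bash
# count_cmdline_matches REGEX: read one command line per line on stdin
# (for example from `ps -eo args=`) and print how many lines match REGEX.
count_cmdline_matches() {
  # grep -c exits non-zero when the count is 0, so keep the status at 0.
  grep -Ec -- "$1" || true
}

# Made-up process list standing in for the output of `ps -eo args=`.
sample_ps='nginx: master process /usr/sbin/nginx
nginx: worker process
/usr/bin/redis-server *:6379
/usr/sbin/sshd -D'

printf '%s\n' "$sample_ps" | count_cmdline_matches 'nginx'         # prints 2
printf '%s\n' "$sample_ps" | count_cmdline_matches 'redis-server'  # prints 1
printf '%s\n' "$sample_ps" | count_cmdline_matches 'vsftpd'        # prints 0
```

On a real host the input would be `ps -eo args= | count_cmdline_matches 'nginx'`. A count of 0 corresponds to the case where no data is collected for that entry.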

3.1 Create the mount directory

bash
mkdir -p /opt/process-exporter/config

cat /opt/process-exporter/config/process-exporter.yml
process_names:
  - name: "{{.Matches}}"  # name template
    cmdline:
    - 'api'  # change this to match your own process

3.2 Install and configure process-exporter

bash
docker run -itd --rm -p 9256:9256 --privileged -v /proc:/host/proc -v /opt/process-exporter/config:/config ncabatoff/process-exporter --procfs /host/proc -config.path /config/process-exporter.yml
  • Verify
bash
curl localhost:9256/metrics

ps aux | grep -v grep | grep api
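
To compare the exporter's view with the `ps` output, the process count of one group can be pulled out of the metrics text. This is a rough sketch: `num_procs` is a hypothetical helper, the sample lines are made-up data, and it assumes sample lines carry no trailing timestamp.

```bash
#!/usr/bin/env bash
# num_procs GROUPNAME: read process-exporter metrics on stdin and print the
# value of namedprocess_namegroup_num_procs for the given groupname label.
num_procs() {
  awk -v g="$1" '
    # Plain substring matching avoids regex trouble with "[" in group names.
    index($0, "namedprocess_namegroup_num_procs{") == 1 &&
    index($0, "groupname=\"" g "\"") > 0 { print $NF }
  '
}

# Made-up sample standing in for `curl -s localhost:9256/metrics`.
sample='# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="map[:api]"} 3
namedprocess_namegroup_num_procs{groupname="map[:nginx]"} 0'

printf '%s\n' "$sample" | num_procs 'map[:api]'    # prints 3
```

With the exporter running, `curl -s localhost:9256/metrics | num_procs 'map[:api]'` should roughly agree with the number of lines printed by the `ps aux` check.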

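4. Process monitoring with process-exporter in k8s

The goal is to monitor the running state of the docker and kubelet processes on every Linux server in the k8s cluster, and to trigger an alert when a process is abnormal.

4.1 Create the config

  • Create the ConfigMap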

1.exporter-cofig.yaml

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: process-exporter-config
  namespace: monitor
data:
  process-exporter-config.yaml: |-
    process_names:
    - name: "{{.Matches}}"
      cmdline:
      - 'docker'
    - name: "{{.Matches}}"
      cmdline:
      - 'kubelet'
  • Apply
bash
kubectl apply -f 1.exporter-cofig.yaml

4.2 Install

  • Create the DaemonSet

2.exporter-dp.yaml

yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: process-exporter
  namespace: monitor
  labels:
    app: process-exporter
spec:
  selector:
    matchLabels:
      app: process-exporter
  template:
    metadata:
      labels:
        app: process-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: process-exporter
          image: registry.cn-zhangjiakou.aliyuncs.com/hsuing/process-exporter:latest
          args:
            - -config.path=/config/process-exporter-config.yaml
          ports:
            - containerPort: 9256
          resources:
            requests:
              cpu: 10m
              memory: 10Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /proc
            - name: config
              mountPath: /config
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: config
          configMap:
            name: process-exporter-config
  • apply
bash
kubectl apply -f 2.exporter-dp.yaml
  • Verify
curl pod-ip:9256/metrics

4.3 Configure the Prometheus scrape job

yaml
    - job_name: 'process-exporter'
      scrape_interval: 1m
      scrape_timeout: 1m
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9256'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        action: replace
        target_label: ip
  • Hot reload
bash
curl -XPOST http://prometheus.ikubernetes.net/-/reload
  • Result

[Image: image-20240630154052134]
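
The first relabel rule in the scrape job is the step that points Prometheus at the exporter: with `role: node`, the discovered `__address__` uses the kubelet port 10250, and the rule swaps it for 9256. The sketch below imitates that rewrite with `sed`; `rewrite_address` is a hypothetical helper and the IP addresses are made up.

```bash
#!/usr/bin/env bash
# rewrite_address ADDR: mimic the first relabel rule. If ADDR ends in
# ":10250" (the kubelet port), replace the port with 9256; otherwise leave
# ADDR unchanged, as a relabel rule with a non-matching regex would.
rewrite_address() {
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:9256/'
}

rewrite_address '10.0.0.5:10250'   # prints 10.0.0.5:9256
rewrite_address '10.0.0.5:9100'    # prints 10.0.0.5:9100
```

Because the DaemonSet runs with `hostNetwork: true`, the rewritten node address is where the exporter listens on each node.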

4.4 Alerting rule

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rule
  labels:
    name: prometheus-rule
  namespace: monitoring
data:
  alert-rules.yaml: |-
    groups:
    - name: node-alert
      rules:
      - alert: service not running
        expr: namedprocess_namegroup_num_procs == 0
        for: 1m
        labels:
          severity: warning
          team: server
        annotations:
          summary: "{{$labels.ip}} service status not running"
          description: "{{$labels.ip}} {{$labels.groupname}} service status not running"
          value: "{{$labels.groupname}}"

The Grafana dashboard template ID is 249.

5. domain-exporter

Documentation: https://github.com/caarlos0/domain_exporter/releases

5.1 Create the Service

cat 1.domain-svc.yaml

yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: domain-exporter
  name: domain-exporter
  namespace: monitor
spec:
  ports:
    - name: domain-exporter
      protocol: TCP
      port: 9222
      targetPort: 9222
  selector:
    app: domain-exporter
  • Run kubectl apply

5.2 Create the Deployment

cat 2.domain-exporter-dp.yaml

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: domain-exporter
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: domain-exporter
  template:
    metadata:
      name: domain-exporter
      labels:
        app: domain-exporter
    spec:
      containers:
        - name: domain-exporter
          image: registry.cn-zhangjiakou.aliyuncs.com/hsuing/domain_exporter:v1.23.0
          ports:
            - name: tcp
              containerPort: 9222
              protocol: TCP
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
            limits:
              cpu: 200m
              memory: 256Mi
          securityContext:
            runAsUser: 1000
            readOnlyRootFilesystem: true
            runAsNonRoot: true
          readinessProbe:
            tcpSocket:
              port: 9222
            initialDelaySeconds: 5
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
  • Run kubectl apply

5.3 Connect to Prometheus

yaml
    - job_name: domain-exporter
      metrics_path: /probe
      relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: domain-exporter:9222 # domain_exporter address
      static_configs:
      - targets:
        - baidu.com # change this for your environment
  • Run kubectl apply

  • Hot reload

bash
curl -XPOST http://prometheus.ikubernetes.net/-/reload
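
The relabeling in this job follows the usual probe pattern: the listed target becomes the `target` query parameter and the exporter's own address replaces `__address__`. As a small sketch, the resulting scrape URL can be rebuilt by hand to test a domain directly; `probe_url` is a hypothetical helper, and the exporter address is the one used in the job above.

```bash
#!/usr/bin/env bash
# probe_url EXPORTER TARGET: build the URL that Prometheus ends up scraping
# after relabeling. The original target becomes the "target" query
# parameter and the exporter address becomes the host.
probe_url() {
  printf 'http://%s/probe?target=%s\n' "$1" "$2"
}

probe_url 'domain-exporter:9222' 'baidu.com'
# prints http://domain-exporter:9222/probe?target=baidu.com
```

From a pod that can resolve the Service name, `curl -s "$(probe_url domain-exporter:9222 baidu.com)"` should return the probe metrics for that domain.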

5.4 Alerting rules

yaml
  domain.rules: |
    groups:
    - name: domain
      rules:
      - alert: Domain probe failed
        expr: domain_probe_success == 0
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: domain probe'
          description: '{{ $labels.domain }}: domain probe failed, please check promptly!'
      - alert: Domain expiring
        expr: domain_expiry_days < 15
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: domain expiring'
          description: '{{ $labels.domain }} expires in less than 15 days, please check promptly!'
      - alert: Domain expiring
        expr: domain_expiry_days < 5
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: domain expiring'
          description: '{{ $labels.domain }} expires in less than 5 days, please check promptly!'
  • Run kubectl apply
  • Hot reload
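
The two expiry rules overlap: any domain below the 5-day threshold is also below the 15-day threshold, so both rules match it. The sketch below illustrates this with plain integer comparisons; `firing_rules` is a hypothetical helper, and it ignores the `for: 2h` hold time.

```bash
#!/usr/bin/env bash
# firing_rules DAYS: print the expiry rule expressions that would match a
# domain with DAYS whole days left until expiry (ignoring `for: 2h`).
firing_rules() {
  local days="$1"
  [ "$days" -lt 15 ] && echo "domain_expiry_days < 15"
  [ "$days" -lt 5 ] && echo "domain_expiry_days < 5"
  return 0
}

firing_rules 10   # prints one line: only the 15-day rule matches
firing_rules 3    # prints two lines: both rules match
```

Since both rules carry the same name and severity, one option is to give the 5-day rule a higher severity so the two alerts can be told apart.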

6. redis-exporter

version: "3.2"
services:
  redis-exporter:
    image: oliver006/redis_exporter
    container_name: redis-exporter
    restart: unless-stopped
    command:
      - "-redis.password-file=/redis_passwd.json"
    volumes:
      - /usr/share/zoneinfo/PRC:/etc/localtime
      - /data/redis-exporter/redis_passwd.json:/redis_passwd.json
    expose:
      - 9121
    network_mode: "host"
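
The compose file mounts a `redis_passwd.json` file but does not show its contents. As far as I understand the redis_exporter README, the file is a JSON object that maps Redis URIs to passwords; treat that format as an assumption and check it against the exporter version in use. The URI and password below are placeholders, and the sketch writes to a temporary file rather than to /data/redis-exporter/redis_passwd.json.

```bash
#!/usr/bin/env bash
# Write an example password file to a temporary location.
# Assumed format: a JSON object mapping each Redis URI to its password.
pw_file="$(mktemp)"
cat > "$pw_file" <<'EOF'
{
  "redis://localhost:6379": "example-password"
}
EOF

# Keep the file readable by its owner only, since it holds a secret.
chmod 600 "$pw_file"
echo "$pw_file"
```

For the compose setup above, the same content would be placed at /data/redis-exporter/redis_passwd.json on the host before starting the container.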

7. mysql-exporter

https://github.com/prometheus/mysqld_exporter

https://github.com/starsliao/TenSunS/blob/main/docs/如何优雅的使用一个mysqld_exporter监控所有的MySQL实例.md

8. PG-exporter

https://github.com/prometheus-community/postgres_exporter

https://cloud.tencent.com/developer/article/1868937

https://pigsty.cc/zh/docs/pgsql/dashboard/

https://demo.pigsty.cc/dashboards/f/pgsql/pgsql

Search for dashboard templates

References:

https://grafana.com/grafana/dashboards/9965

https://cloud.tencent.com/document/product/1416/56038