Documentation: https://etcd.io/docs/v3.4/op-guide/maintenance/

1. Consequences of a full container disk

  • Pods cannot be deleted (stuck in Terminating)
  • Pods cannot be created (stuck in ContainerCreating)

1.1 Two ways a disk can fill up

All disk space is used up

df -h

All inodes are used up

df -i
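
Both conditions surface as the same "No space left on device" error, so it can help to check them side by side. The following is only a sketch: the helper name `over_threshold` and the 90% limit are assumptions made for illustration, and it relies on the POSIX column layout printed by `df -P`.

```shell
#!/usr/bin/env bash
# over_threshold LIMIT
# Reads df-style output on stdin and prints "mountpoint usage" for every
# filesystem whose usage percentage (column 5) is at or above LIMIT.
over_threshold() {
  local limit="$1"
  awk -v limit="$limit" '
    NR > 1 {
      pct = $5
      sub(/%/, "", pct)
      # Skip rows without a numeric percentage (e.g. "-" for inode-less filesystems)
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit) print $6, $5
    }'
}

echo "== disk space =="
df -P | over_threshold 90

echo "== inodes =="
df -Pi | over_threshold 90
```

A filesystem that appears only under "inodes" is full of small files rather than large ones, which changes where to look next.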

Resolution:

1. Mark the Node as unschedulable and evict its Pods

bash
$ kubectl drain "${NODE_NAME}"
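
`kubectl drain` does two things: it cordons the node (marks it unschedulable) and then evicts the Pods running on it. The sketch below separates the two steps and adds a dry-run switch. The helper names `run` and `drain_node`, the `DRY_RUN` variable, and the node name `worker-1` are hypothetical. `--ignore-daemonsets` and `--delete-emptydir-data` are standard kubectl options, though older kubectl releases call the latter `--delete-local-data`.

```shell
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# drain_node NODE
# Stops new Pods from being scheduled onto NODE, then evicts the existing ones.
drain_node() {
  local node="$1"
  run kubectl cordon "$node"
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# Example: print the commands without touching the cluster.
DRY_RUN=1 drain_node "worker-1"
```

Running with `DRY_RUN=1` first makes it easy to confirm the node name before any Pod is evicted.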

2. Find which container's logs take up the most space

bash
# Print the size of each container's log file
for name in $(docker ps -aq); do
  docker inspect --format '{{.LogPath}}' "$name" | xargs -r du -sh
done

# Empty a log file in place
: > xxx.log
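
Emptying a log with a redirect, rather than deleting it, keeps the same file in place, so the space comes back right away even though the writer still has the file open. The sketch below applies that to every log above a size limit. The helper names `truncate_if_larger` and `clean_container_logs` and the 1 GiB limit are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# truncate_if_larger FILE LIMIT_BYTES
# Empties FILE in place when it is larger than LIMIT_BYTES.
truncate_if_larger() {
  local file="$1" limit="$2" size
  size=$(wc -c < "$file") || return 1
  if [ "$size" -gt "$limit" ]; then
    : > "$file"
    echo "truncated $file ($size bytes)"
  fi
}

# clean_container_logs LIMIT_BYTES
# Applies truncate_if_larger to the log file of every container on the node.
clean_container_logs() {
  local limit="$1" id log
  for id in $(docker ps -aq); do
    log=$(docker inspect --format '{{.LogPath}}' "$id")
    # LogPath is empty for containers that do not log to a file
    if [ -n "$log" ] && [ -f "$log" ]; then
      truncate_if_larger "$log" "$limit"
    fi
  done
}
```

On a node, `clean_container_logs $((1024 * 1024 * 1024))` would empty every container log larger than 1 GiB.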

If the space is still not released, run lsof | grep -i delete to find processes that still hold deleted files open, then kill them.
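
A file that is removed while a process still has it open keeps its disk blocks until the descriptor is closed. On Linux, such descriptors show up under /proc/PID/fd as links ending in " (deleted)". The sketch below lists them for a single process. The helper name `deleted_fds` is hypothetical, and the approach is Linux-only.

```shell
#!/usr/bin/env bash
# deleted_fds PID
# Prints "fd -> target" for every descriptor of PID that points at a deleted file.
# Linux only: relies on /proc/PID/fd symlinks ending in " (deleted)".
deleted_fds() {
  local pid="$1" fd target
  for fd in /proc/"$pid"/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      *" (deleted)") echo "${fd##*/} -> $target" ;;
    esac
  done
}

# Example: inspect the current shell.
deleted_fds "$$"
```

When the process cannot be killed, truncating the descriptor through /proc (for example `: > /proc/<pid>/fd/<fd>`) is a common way to free the blocks without stopping it. Treat that as an option to test first, not as part of the procedure above.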

3. Remove the unschedulable mark

$ kubectl uncordon "${NODE_NAME}"

4. Scheduled cleanup script

bash
#!/usr/bin/env bash

# Remove images pulled from harbor.example.com
for image_id in $(docker images | grep 'harbor.example.com' | awk '{print $3}')
do
    docker rmi "$image_id"
done

# Remove images tagged <none>
for image_id in $(docker images | awk '$2 ~ "<none>"{print $3}')
do
    docker rmi "$image_id"
done
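
The script only runs on a schedule once something invokes it periodically. One way to do that, sketched below, is a cron.d entry. The helper name `write_cleanup_cron`, the 03:00 schedule, and the log path are assumptions made for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# write_cleanup_cron CRON_FILE SCRIPT_PATH
# Writes a cron.d-style entry (which includes a user field) that runs
# SCRIPT_PATH as root every day at 03:00 and appends its output to a log.
write_cleanup_cron() {
  local cron_file="$1" script="$2"
  printf '0 3 * * * root %s >> /var/log/docker-image-cleanup.log 2>&1\n' "$script" > "$cron_file"
}
```

For example, `write_cleanup_cron /etc/cron.d/docker-image-cleanup /usr/local/bin/docker-image-cleanup.sh` installs the entry, assuming the script above was saved at that (hypothetical) path and made executable.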

1.2 Check disk usage by Pod

bash
# 1. On the node, check how much space each directory under /var/lib/docker uses, to confirm that Docker is what fills the disk.
du -h --max-depth=1 /var/lib/docker/

# 2. Show how much space images and containers use.
docker system df

# 3. Show the size of each image and container.
docker system df -v

# 4. List containers by disk usage, largest first.
docker ps -a --format "table {{.Size}}\t{{.Names}}" | sort -hr

Common commands

bash
# 1. Show image, ID, ports, and status
docker ps -a --format "table {{.Image}}\t{{.ID}}\t{{.Ports}}\t{{.Status}}" | sort -hr

# 2. List running containers
docker ps -a -f "status=running"

# 3. List exited containers
docker ps -a -f "status=exited"

# 4. Show container disk usage
docker ps -a --format "table {{.Size}}\t{{.Names}}" | sort -hr

# 5. Get container IP addresses
docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -q)

# 6. Get container MAC addresses
docker inspect --format='{{range .NetworkSettings.Networks}}{{.MacAddress}}{{end}}' $(docker ps -a -q)

# 7. Get container names (the second form strips the leading "/")
docker inspect --format='{{.Name}}' $(docker ps -aq)
docker inspect --format='{{.Name}}' $(docker ps -aq) | cut -d"/" -f2

# 8. Get container hostnames
docker inspect --format '{{ .Config.Hostname }}' $(docker ps -q)

# 9. Get hostname, name, and IP
docker inspect --format 'Hostname:{{ .Config.Hostname }}  Name:{{.Name}} IP:{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -q)

# 10. Get container log paths
docker inspect --format='{{.LogPath}}' $(docker ps -a -q)

# 11. Get container images
docker inspect --format='{{.Config.Image}}' $(docker ps -a -q)

2. etcd disk full

2.1 Operating inside the container

bash
$ etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |             ERRORS             |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | 5a62a94db08693c1 |  3.4.14 |  2.1 GB |     false |      false |        55 |    6115946 |            6115946 |   memberID:9834606138033252550 |
|                |                  |         |         |           |            |           |            |                    |                 alarm:NOSPACE  |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+

$ etcdctl alarm list
memberID:9834606138033252550 alarm:NOSPACE

# Get the current revision of the etcd data
$ etcdctl endpoint status --write-out="json" | grep -Eo '"revision":[0-9]*' | grep -Eo '[0-9]+'
6115560


# Compact away older revisions
$ etcdctl compact 6115560
compacted revision 6115560

# Defragment to release the freed space
$ etcdctl defrag

# Check the etcd DB size
$ etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |             ERRORS             |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | 5a62a94db08693c1 |  3.4.14 |  4.1 MB |     false |      false |        55 |    6116017 |            6116017 |   memberID:9834606138033252550 |
|                |                  |         |         |           |            |           |            |                    |                 alarm:NOSPACE  |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+

# Disarm the alarm
$ etcdctl alarm disarm
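
The drop from 2.1 GB to 4.1 MB is the space that compaction had already marked free but that stayed allocated in the DB file until defrag ran. The sketch below estimates that reclaimable amount before defragmenting. It assumes the JSON status output contains `dbSize` and `dbSizeInUse` fields, as etcdctl 3.4 is expected to print. The helper names `json_number` and `reclaimable_bytes` are hypothetical.

```shell
#!/usr/bin/env bash
# json_number KEY
# Prints the first integer value stored under "KEY" in the JSON read from stdin.
json_number() {
  grep -Eo "\"$1\":[0-9]+" | head -n 1 | grep -Eo '[0-9]+$'
}

# reclaimable_bytes
# Reads `etcdctl endpoint status --write-out=json` output on stdin and prints
# dbSize - dbSizeInUse, i.e. roughly what a defrag could give back.
reclaimable_bytes() {
  local json size in_use
  json=$(cat)
  size=$(printf '%s' "$json" | json_number dbSize)
  in_use=$(printf '%s' "$json" | json_number dbSizeInUse)
  if [ -z "$size" ] || [ -z "$in_use" ]; then
    echo "dbSize/dbSizeInUse not found in input" >&2
    return 1
  fi
  echo $(( size - in_use ))
}

# Usage inside the etcd container:
#   etcdctl endpoint status --write-out=json | reclaimable_bytes
```

A large result after compaction suggests a defrag is worthwhile. A result near zero means the data in use really is that big, and only a larger quota or deleting keys will help.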

2.2 Accessing etcd from outside the container

2.2.1 Installation

bash
ETCD_VER=v3.5.14

# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version

2.3 Resolution

Increase capacity

bash
--auto-compaction-mode=revision --auto-compaction-retention=1000 --quota-backend-bytes=8589934592


auto-compaction-mode=revision: compact by revision number
auto-compaction-retention=1000: keep the latest 1000 revisions; every 5 minutes etcd automatically compacts at "latest revision" - 1000
quota-backend-bytes: set the etcd storage quota to 8 GiB (8589934592 bytes)
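
The quota is given in bytes, so the value above is simply 8 × 1024³. A small helper (the name `gib_to_bytes` is hypothetical) avoids typing the digits by hand:

```shell
#!/usr/bin/env bash
# gib_to_bytes N
# Converts N GiB to bytes for use with --quota-backend-bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 8   # prints 8589934592
```

As far as I know, etcd's default quota is 2 GiB, which would match the NOSPACE alarm appearing at a DB size of about 2.1 GB earlier, and the etcd documentation suggests not going beyond 8 GiB.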

Clean up old data

  • Set environment variables
bash
ETCD_CA_CERT="/etc/kubernetes/pki/etcd/ca.crt"
ETCD_CERT="/etc/kubernetes/pki/etcd/server.crt"
ETCD_KEY="/etc/kubernetes/pki/etcd/server.key"
HOST_1=https://xxx.xxx.xxx.xxx:2379
  • Get the current revision of the etcd data
bash
rev=$(ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --endpoints="${HOST_1}" endpoint status --write-out="json" | grep -Eo '"revision":[0-9]*' | grep -Eo '[0-9]+')

echo "$rev"
  • Compact old revisions
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --endpoints="${HOST_1}" compact "$rev"
  • Defragment
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --endpoints="${HOST_1}" defrag
  • Disarm the alarm
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --endpoints="${HOST_1}" alarm disarm
  • Verify that writes succeed again
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --endpoints="${HOST_1}" put helltext 123
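
The steps above can be chained into one script so that the revision read in the first step feeds directly into the compaction. This is a sketch under the same assumptions as the steps themselves: the certificate paths are the ones listed above and `HOST_1` must be set by the caller. The names `ectl`, `extract_revision`, and `reclaim` are hypothetical.

```shell
#!/usr/bin/env bash
# Certificate paths from the environment-variable step; override as needed.
ETCD_CA_CERT="${ETCD_CA_CERT:-/etc/kubernetes/pki/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/kubernetes/pki/etcd/server.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/kubernetes/pki/etcd/server.key}"

# ectl: etcdctl with the TLS flags and endpoint filled in.
ectl() {
  ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" \
    --key="${ETCD_KEY}" --endpoints="${HOST_1}" "$@"
}

# extract_revision: read endpoint-status JSON on stdin, print the first revision.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# reclaim: compact to the current revision, defragment, then clear the alarm.
reclaim() {
  local rev
  rev=$(ectl endpoint status --write-out=json | extract_revision)
  if [ -z "$rev" ]; then
    echo "could not read the current revision" >&2
    return 1
  fi
  ectl compact "$rev" && ectl defrag && ectl alarm disarm && ectl alarm list
}
```

With `HOST_1` set as in the first step, calling `reclaim` runs the whole sequence and stops at the first command that fails. An empty `alarm list` at the end indicates the NOSPACE alarm is gone.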

Auto compaction only compacts the key space; it does not release physical storage, so defrag has to be run periodically.
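
Because a member does not serve requests while it is being defragmented, a periodic job would normally go through the members one at a time instead of all at once. A minimal sketch follows. The helper name `defrag_each`, the `ETCD_CLI` override, and the endpoint list are assumptions. TLS settings are expected to come from etcdctl's own environment variables (`ETCDCTL_CACERT`, `ETCDCTL_CERT`, `ETCDCTL_KEY`).

```shell
#!/usr/bin/env bash
# defrag_each "ep1,ep2,ep3"
# Defragments the listed members sequentially and stops at the first failure.
defrag_each() {
  local endpoints="$1" ep
  local bin="${ETCD_CLI:-etcdctl}"
  local IFS=','
  for ep in $endpoints; do
    echo "defragmenting $ep"
    "$bin" --endpoints="$ep" defrag || return 1
  done
}

# Usage (endpoints are placeholders):
#   export ETCDCTL_API=3 ETCDCTL_CACERT=... ETCDCTL_CERT=... ETCDCTL_KEY=...
#   defrag_each "https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379"
```

Stopping at the first failure keeps a problem on one member from being repeated on the rest of the cluster.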

https://www.kubernetes.org.cn/7569.html

https://blog.51cto.com/u_15966109/6082624