Documentation: https://etcd.io/docs/v3.4/op-guide/maintenance/
1. Consequences of a full container disk
- Pods cannot be deleted (stuck in Terminating)
- Pods cannot be created (stuck in ContainerCreating)
1.1 Two ways a disk can fill up
All disk space is used up (check with df -h)
All inodes are used up (check with df -i)
df -i
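Because both block space and inodes can run out, it can help to check the two with one helper. The following is a minimal sketch, assuming GNU coreutils `df` on a Linux node, where column 5 of `df -P` and of `df -Pi` is the use percentage and column 6 is the mount point. The function name `over_threshold` and the 90% limit are illustrative choices, not part of the original notes.

```shell
# Print "mountpoint usage%" for every filesystem at or above the limit.
# Reads `df -P` or `df -Pi` output on stdin (assumed GNU coreutils layout).
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    pct = $5
    sub(/%$/, "", pct)          # strip the trailing percent sign
    if (pct + 0 >= limit + 0) print $6, pct "%"
  }'
}

df -P  | over_threshold 90   # block usage
df -Pi | over_threshold 90   # inode usage
```

Mount points that contain spaces would be cut at the first space; the sketch ignores that case.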
Resolution:
1. Mark the Node as unschedulable (kubectl drain also evicts the Pods running on it)
bash
$ kubectl drain ${node-name}
2. Find which container's log output takes up the most space
bash
# List the log file of every container, largest first
docker inspect --format '{{.LogPath}}' $(docker ps -aq) | xargs du -sh | sort -hr
# Empty the log in place
truncate -s 0 xxx.log
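If the disk space is still not released, run
lsof | grep -i delete
to find processes that still hold deleted files open. The space is freed only once those processes are killed.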
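Deleting a log file that a process still has open does not free its blocks, which is why the notes empty the file instead of removing it. The sketch below shows in-place truncation on a throwaway file created with `mktemp`; the file and its content are placeholders, not a real container log.

```shell
# Work on a throwaway file so the sketch is safe to run anywhere.
log_file=$(mktemp)
printf 'example log line\n' > "$log_file"

# Truncate in place: the file keeps its inode, so a process that has it
# open keeps writing to the same (now empty) file.
: > "$log_file"

if [ -s "$log_file" ]; then
  echo "still has data"
else
  echo "emptied"
fi
```

Unlike `echo > file`, which leaves a single newline behind, `: > file` leaves the file at zero bytes.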
3. Remove the unschedulable mark
$ kubectl uncordon ${node-name}
4. Scheduled cleanup script
bash
#!/usr/bin/env bash
# Remove images that come from harbor.example.com
for image_id in $(docker images | grep 'harbor.example.com' | awk '{print $3}')
do
docker rmi "$image_id"
done
# Remove <none> images
for image_id in $(docker images | awk '$2 ~ "<none>"{print $3}')
do
docker rmi "$image_id"
done
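The `grep` in the script above matches the registry name anywhere on the line, and an image that carries several tags is listed once per tag. A slightly stricter selection could look like the sketch below. The function name `select_image_ids` and the sample listing are made up for illustration; on a real node the input would come from `docker images`.

```shell
# Print the unique IDs of images whose repository starts with the prefix.
# Reads `docker images`-style output (header row first) on stdin.
select_image_ids() {
  awk -v prefix="$1" 'NR > 1 && index($1, prefix) == 1 { print $3 }' | sort -u
}

# Illustrative input in the layout printed by `docker images`.
sample='REPOSITORY                 TAG   IMAGE ID      CREATED      SIZE
harbor.example.com/app/web v1    aaaaaaaaaaaa  2 weeks ago  120MB
harbor.example.com/app/web v2    bbbbbbbbbbbb  3 days ago   121MB
nginx                      1.25  cccccccccccc  5 weeks ago  187MB'

printf '%s\n' "$sample" | select_image_ids 'harbor.example.com/'
# On a real node: docker images | select_image_ids 'harbor.example.com/'
```

Printing the IDs first, and passing them to `docker rmi` only after reviewing them, keeps the cleanup easy to dry-run.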
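1.2 Checking Pod disk usage
bash
# 1. On the node, check how much space the docker directory uses, to confirm that docker is what fills the disk.
du -h --max-depth=1 /var/lib/docker/
# 2. Show how much space images and containers use.
docker system df
# 3. Show the detailed size of every image and container.
docker system df -v
# 4. Show per-container disk usage on the node, sorted from largest to smallest.
docker ps -a --format "table {{.Size}}\t{{.Names}}" | sort -hr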
Common commands
bash
# 1. Show image, ID, ports, and status
docker ps -a --format "table {{.Image}}\t{{.ID}}\t{{.Ports}}\t{{.Status}}" | sort -hr
# 2. List running containers
docker ps -a -f "status=running"
# 3. List exited containers
docker ps -a -f "status=exited"
# 4. Show container disk usage
docker ps -a --format "table {{.Size}}\t{{.Names}}" | sort -hr
# 5. Get the container IPs
docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -q)
# 6. Get the container MAC addresses
docker inspect --format='{{range .NetworkSettings.Networks}}{{.MacAddress}}{{end}}' $(docker ps -a -q)
# 7. Get the container names
docker inspect --format='{{.Name}}' $(docker ps -aq)
docker inspect --format='{{.Name}}' $(docker ps -aq)|cut -d"/" -f2
# 8. Get the container hostnames
docker inspect --format '{{ .Config.Hostname }}' $(docker ps -q)
# 9. Get hostname, name, and IP
docker inspect --format 'Hostname:{{ .Config.Hostname }} Name:{{.Name}} IP:{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -q)
# 10. Get the container log paths
docker inspect --format='{{.LogPath}}' $(docker ps -a -q)
# 11. Get the container images
docker inspect --format='{{.Config.Image}}' $(docker ps -a -q)
2. etcd disk full
2.1 Operating inside the container
bash
# etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | 5a62a94db08693c1 | 3.4.14 | 2.1 GB | false | false | 55 | 6115946 | 6115946 | memberID:9834606138033252550 |
| | | | | | | | | | alarm:NOSPACE |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
# etcdctl alarm list
memberID:9834606138033252550 alarm:NOSPACE
# Get the current etcd revision
etcdctl endpoint status --write-out="json" | grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+'
6115560
# Compact old revisions
# etcdctl compact 6115560
compacted revision 6115560
# Defragment
# etcdctl defrag
# Check the etcd DB size
# etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | 5a62a94db08693c1 | 3.4.14 | 4.1 MB | false | false | 55 | 6116017 | 6116017 | memberID:9834606138033252550 |
| | | | | | | | | | alarm:NOSPACE |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
# Disarm the alarm
# etcdctl alarm disarm
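The revision lookup in this transcript can be wrapped in a small function so that it is easy to test without a live cluster. This is a sketch only: `extract_revision` is a hypothetical helper name, and the JSON below is an abbreviated, illustrative stand-in for what `etcdctl endpoint status --write-out=json` prints.

```shell
# Pull the first "revision" value out of endpoint-status JSON on stdin.
# `head -n 1` keeps a single value when several endpoints are listed.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Illustrative, abbreviated JSON; real output carries more fields.
sample='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":6115560,"raft_term":55},"version":"3.4.14"}}]'
printf '%s\n' "$sample" | extract_revision
```

On a real member the input would be piped in as `etcdctl endpoint status --write-out=json | extract_revision`.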
2.2 Accessing etcd from outside the container
2.2.1 Installation
bash
ETCD_VER=v3.5.14
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version
2.3 Resolution
Increase capacity
bash
--auto-compaction-mode=revision --auto-compaction-retention=1000 --quota-backend-bytes=8589934592
auto-compaction-mode=revision: compact by revision number
auto-compaction-retention=1000: keep the most recent 1000 revisions; every 5 minutes etcd automatically compacts up to "latest revision" - 1000
quota-backend-bytes: set the etcd backend size limit to 8 GiB (8589934592 bytes)
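The quota value is a plain byte count, so it can be derived rather than typed by hand. A small sketch of the arithmetic behind the 8 GiB figure above (the variable names are illustrative):

```shell
# 8 GiB expressed in bytes: 8 * 1024^3
quota_gib=8
quota_bytes=$((quota_gib * 1024 * 1024 * 1024))
echo "--quota-backend-bytes=${quota_bytes}"
```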
Clean up old data
- Set environment variables
bash
ETCD_CA_CERT="/etc/kubernetes/pki/etcd/ca.crt"
ETCD_CERT="/etc/kubernetes/pki/etcd/server.crt"
ETCD_KEY="/etc/kubernetes/pki/etcd/server.key"
HOST_1=https://xxx.xxx.xxx.xxx:2379
- Get the current etcd revision
bash
rev=$(ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
--endpoints="${HOST_1}" endpoint status --write-out="json" | grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+')
echo "$rev"
- Compact old revisions
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
--endpoints="${HOST_1}" compact "$rev"
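The step above compacts to the current revision, which drops all earlier key history. If some history should be retained, as the `--auto-compaction-retention=1000` setting does, the target revision can be computed first. This is only a sketch: `compaction_target` is a hypothetical helper, and the numbers reuse the revision from the earlier transcript.

```shell
# Print the revision to compact to, keeping the newest $2 revisions.
# Returns 1 when there are not yet more than $2 revisions.
compaction_target() {
  current=$1
  keep=$2
  [ "$current" -gt "$keep" ] || return 1
  echo $((current - keep))
}

compaction_target 6115560 1000
```

The printed value would then be passed to `etcdctl compact` in place of `$rev`.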
- Defragment
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
--endpoints="${HOST_1}" defrag
- Disarm the alarm
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
--endpoints="${HOST_1}" alarm disarm
- Verify
bash
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
--endpoints="${HOST_1}" put helltext 123
Auto compaction only compacts the key space; it does not release physical storage space, so defrag must be run periodically.
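One way to act on this note is a small job that defragments the members one at a time, since a member blocks reads and writes while it is being defragmented. The sketch below is dry-run by default and only prints the commands. The endpoint addresses, the `defrag_each` name, and the `DEFRAG_RUN` switch are assumptions made for illustration. TLS settings are assumed to come from the `ETCDCTL_CACERT`, `ETCDCTL_CERT`, and `ETCDCTL_KEY` environment variables rather than from flags.

```shell
# Dry-run by default: commands are printed, not executed.
# Export DEFRAG_RUN= (empty) to run etcdctl for real.
DEFRAG_RUN=${DEFRAG_RUN-echo}

# Defragment each endpoint in turn; stop at the first failure so that
# a problem on one member does not spread to the rest of the cluster.
defrag_each() {
  for endpoint in "$@"; do
    $DEFRAG_RUN etcdctl --endpoints="$endpoint" defrag || return 1
  done
}

# Placeholder endpoints; replace with the real member URLs.
defrag_each https://10.0.0.1:2379 https://10.0.0.2:2379
```

Scheduling the script (for example from cron) during a quiet period is left to the operator.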