pg_repack 回收表体积
对于某些常进行archiver或者 purge操作的表而言,如果我们不定期回收表空间,则表体积会越涨越大。
但是pg自带的 vacuum full 在回收的过程中会阻塞读写操作,不能在生产环境直接运行。
因此,在生产环境 我们常用的表空间收缩工具是pg_squeeze 和 pg_repack
项目地址: https://github.com/reorg/pg_repack
原理: 新建一个一模一样的影子表,然后拷贝原表的数据,最后rename替换原表。
注意: 待处理的表必须有主键
yum install centos-release-scl-rh
yum install llvm-toolset-7-clang
cd /home/postgres
tar xf pg_repack-ver_1.4.4.tar.gz
export PATH=/usr/local/pgsql-11.5/bin:$PATH -- 需要载入环境变量,不然编译过程中可能找不到pg_config这个文件
cd pg_repack-ver_1.4.4
make && make install
yum install centos-release-scl-rh
yum install llvm-toolset-7-clang
cd /home/postgres
tar xf pg_repack-ver_1.4.4.tar.gz
export PATH=/usr/local/pgsql-11.5/bin:$PATH -- 需要载入环境变量,不然编译过程中可能找不到pg_config这个文件
cd pg_repack-ver_1.4.4
make && make install
另外,会生成一个可执行的文件: /home/postgres/pg_repack-ver_1.4.4/bin/pg_repack
修改配置文件:
vim /usr/local/pgsql-11.5/data/postgresql.conf
shared_preload_libraries = 'pg_repack'
shared_preload_libraries = 'pg_repack'
然后 ,重启pg进程
使用方法:
create database db1;
\c db1
create extension pg_repack;
create table testdata (id integer,course int,grade numeric(4,2),testtime date);
alter table testdata add primary key (id);
insert into testdata
select generate_series(1,100) as id,
10 as course,
10.11 as grade,
'2017-07-06' as testtime;
create database db1;
\c db1
create extension pg_repack;
create table testdata (id integer,course int,grade numeric(4,2),testtime date);
alter table testdata add primary key (id);
insert into testdata
select generate_series(1,100) as id,
10 as course,
10.11 as grade,
'2017-07-06' as testtime;
然后,我们可以去看下PG datadir物理文件大小从1.1GB涨到了1.6GB了
然后,我们再使用命令 delete from testdata where id between 5000000 and 10000000; 对testdata表删除一半的数据 ,此时可以看到物理文件没有任何缩小。
然后,在外部使用pg_repack对 color表做空间回收:
cd /home/postgres/pg_repack-ver_1.4.4/bin
./pg_repack -h 127.0.0.1 --port 5434 -Upostgres -d db1 -t testdata -j 2 -D -k
cd /home/postgres/pg_repack-ver_1.4.4/bin
./pg_repack -h 127.0.0.1 --port 5434 -Upostgres -d db1 -t testdata -j 2 -D -k
pg_repack参数
-a, --all repack all databases
-t, --table=TABLE repack specific table only
-I, --parent-table=TABLE repack specific parent table and its inheritors
-c, --schema=SCHEMA repack tables in specific schema only
-s, --tablespace=TBLSPC move repacked tables to a new tablespace
-S, --moveidx move repacked indexes to TBLSPC too
-o, --order-by=COLUMNS order by columns instead of cluster keys
-n, --no-order do vacuum full instead of cluster
-N, --dry-run print what would have been repacked
-j, --jobs=NUM Use this many parallel jobs for each table
-i, --index=INDEX move only the specified index
-x, --only-indexes move only indexes of the specified table
-T, --wait-timeout=SECS timeout to cancel other backends on conflict
-D, --no-kill-backend don't kill other backends when timed out
-Z, --no-analyze don't analyze at end
-k, --no-superuser-check skip superuser checks in client
-C, --exclude-extension don't repack tables which belong to specific extension
Connection options:
-d, --dbname=DBNAME database to connect
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port
-U, --username=USERNAME user name to connect as
-w, --no-password never prompt for password
-W, --password force password prompt
-a, --all repack all databases
-t, --table=TABLE repack specific table only
-I, --parent-table=TABLE repack specific parent table and its inheritors
-c, --schema=SCHEMA repack tables in specific schema only
-s, --tablespace=TBLSPC move repacked tables to a new tablespace
-S, --moveidx move repacked indexes to TBLSPC too
-o, --order-by=COLUMNS order by columns instead of cluster keys
-n, --no-order do vacuum full instead of cluster
-N, --dry-run print what would have been repacked
-j, --jobs=NUM Use this many parallel jobs for each table
-i, --index=INDEX move only the specified index
-x, --only-indexes move only indexes of the specified table
-T, --wait-timeout=SECS timeout to cancel other backends on conflict
-D, --no-kill-backend don't kill other backends when timed out
-Z, --no-analyze don't analyze at end
-k, --no-superuser-check skip superuser checks in client
-C, --exclude-extension don't repack tables which belong to specific extension
Connection options:
-d, --dbname=DBNAME database to connect
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port
-U, --username=USERNAME user name to connect as
-w, --no-password never prompt for password
-W, --password force password prompt