1.pg_rman介绍
pg_rman(物理备份)
类似于oracle的rman备份策略,实现了全量、增量、归档的层级的备份,可以很灵活的管理PostgreSQL数据库的备份,pg_rman是一款开源的pg的备份恢复插件支持在线和基于PITR的备份恢复方式。
- 特点
○ 使用简单,一个命令即可完成备份和恢复.
○ 支持在线全备,增量备份,归档备份.
○ 支持备份压缩,通过gzip工具实现页内压缩.
○ 自动备份维护,自动删除过期的WAL备份文件.
○ 支持备份验证.
○ 恢复期间无事务丢失,支持基于PITR的配置文件生成器
- 注意事项
○ pg_rman 基于 pg_start_backup
○ pg_rman 基于需要在本地安装,跑的不是流复制协议,而是文件拷贝,所以pg_rman必须和数据库节点跑一起,不能远程备份
○ pg_rman 注意跟数据库版本配套
○ 随便指定一个存在的库,备份的是所有的物理文件
○ 时区一定保持一致,否则指定时间点恢复失败
○ 如果恢复一次数据库,之后必须在全备一次,否则不能进行增量备份
2.安装
- 文档
下载地址: https://github.com/ossc-db/pg_rman/
PostgreSQL Rpm相关依赖、插件包下载地址: https://yum.postgresql.org/10/redhat/rhel-6.7-x86_64/
手册地址: http://ossc-db.github.io/pg_rman/index.html
2.1配置环境变量
基于16实验
[ptgres@other ~]$vim .bashrc
export PGPORT=5532
export PG_HOME=/data/apps/pgsql/16
export PATH=$PG_HOME/bin:$PATH
export PGDATA=/data/pgdata/data
export LD_LIBRARY_PATH=$PG_HOME/lib
export LANG=en_US.utf8
export BACKUP_PATH=/home/ptgres/pg_rman_backups
export SRVLOG_PATH=/data/pgdata/data/log
export ARCLOG_PATH=/data/pgdata/archivedir
[ptgres@other ~]$vim .bashrc
export PGPORT=5532
export PG_HOME=/data/apps/pgsql/16
export PATH=$PG_HOME/bin:$PATH
export PGDATA=/data/pgdata/data
export LD_LIBRARY_PATH=$PG_HOME/lib
export LANG=en_US.utf8
export BACKUP_PATH=/home/ptgres/pg_rman_backups
export SRVLOG_PATH=/data/pgdata/data/log
export ARCLOG_PATH=/data/pgdata/archivedir
❌ 注意
环境一定的配置正确,否则,init的时候会出现报错
[ptgres@other ~]$ pg_rman init
pg_rman: logging.c:226: pg_log_generic_v: Assertion `progname' failed. Aborted (core dumped)
2.2下载
wget https://github.com/ossc-db/pg_rman/releases/download/V1.3.16/pg_rman-1.3.16-pg16.tar.gz
2.3安装
#安装依赖
yum install zlib-devel
#解压
tar zxvf pg_rman-1.3.16-pg16.tar.gz && cd pg_rman-1.3.16-pg16
#编译和安装,默认安装在pg下bin目录下
make && make install
#安装依赖
yum install zlib-devel
#解压
tar zxvf pg_rman-1.3.16-pg16.tar.gz && cd pg_rman-1.3.16-pg16
#编译和安装,默认安装在pg下bin目录下
make && make install
2.4初始化
- 前提
必须开启归档,否则失败
archive_mode = on # enables archiving; off, on, or always
archive_command = 'test ! -f /postgresql/pgsql/archive_log/%f && cp %p /postgresql/pgsql/archive_log/%f'
archive_mode = on # enables archiving; off, on, or always
archive_command = 'test ! -f /postgresql/pgsql/archive_log/%f && cp %p /postgresql/pgsql/archive_log/%f'
- 初始化
[ptgres@other ~]$ pg_rman init
INFO: ARCLOG_PATH is set to '/data/pgdata/archivedir'
INFO: SRVLOG_PATH is set to '/data/pgdata/data/log'
[ptgres@other ~]$ pg_rman init
INFO: ARCLOG_PATH is set to '/data/pgdata/archivedir'
INFO: SRVLOG_PATH is set to '/data/pgdata/data/log'
3.目录结构
[ptgres@other pg_rman_backups]$ ll
total 8
drwx------ 4 ptgres ptgres 34 Mar 20 14:30 backup
-rw-rw-r-- 1 ptgres ptgres 75 Mar 20 14:30 pg_rman.ini
-rw-rw-r-- 1 ptgres ptgres 40 Mar 20 14:30 system_identifier
drwx------ 2 ptgres ptgres 6 Mar 20 14:30 timeline_history
[ptgres@other pg_rman_backups]$ ll
total 8
drwx------ 4 ptgres ptgres 34 Mar 20 14:30 backup
-rw-rw-r-- 1 ptgres ptgres 75 Mar 20 14:30 pg_rman.ini
-rw-rw-r-- 1 ptgres ptgres 40 Mar 20 14:30 system_identifier
drwx------ 2 ptgres ptgres 6 Mar 20 14:30 timeline_history
3.1备份策略
pg_rman.ini
[ptgres@other pg_rman_backups]$ cat pg_rman.ini
#默认只有这两个
ARCLOG_PATH='/data/pgdata/archivedir'
SRVLOG_PATH='/data/pgdata/data/log'
#14天的数据,每周进行一次全备,每周一和周三的2:00做一次增量备份,每天进行一次归档备份
COMPRESS_DATA = YES
KEEP_DATA_GENERATIONS=3
KEEP_DATA_DAYS=14
KEEP_ARCLOG_FILES=30
KEEP_ARCLOG_DAYS=20
KEEP_SRVLOG_FILES=10 # 保留最近10个日志文件(pg_log)
KEEP_SRVLOG_DAYS=20
[ptgres@other pg_rman_backups]$ cat pg_rman.ini
#默认只有这两个
ARCLOG_PATH='/data/pgdata/archivedir'
SRVLOG_PATH='/data/pgdata/data/log'
#14天的数据,每周进行一次全备,每周一和周三的2:00做一次增量备份,每天进行一次归档备份
COMPRESS_DATA = YES
KEEP_DATA_GENERATIONS=3
KEEP_DATA_DAYS=14
KEEP_ARCLOG_FILES=30
KEEP_ARCLOG_DAYS=20
KEEP_SRVLOG_FILES=10 # 保留最近10个日志文件(pg_log)
KEEP_SRVLOG_DAYS=20
参数 | 说明 |
---|---|
KEEP_DATA_GENERATIONS | 需要全备份数 |
KEEP_DATA_DAYS | 保留天数 |
KEEP_ARCLOG_DAYS | wal日志保留时长,比KEEP_DATA_DAYS多一天即可 |
KEEP_SRVLOG_DAYS | 日志保留天数,单位天 |
3.2backup.ini
[ptgres@other 181558]$ cat backup.ini
# configuration
BACKUP_MODE=FULL
FULL_BACKUP_ON_ERROR=false
WITH_SERVERLOG=false
COMPRESS_DATA=false
# result
TIMELINEID=4
START_LSN=0/21000028
STOP_LSN=0/21000138
START_TIME='2024-03-20 18:15:58'
END_TIME='2024-03-20 18:16:00'
RECOVERY_XID=762
RECOVERY_TIME='2024-03-20 18:16:00'
TOTAL_DATA_BYTES=39632069
READ_DATA_BYTES=39631835
READ_ARCLOG_BYTES=201328288
WRITE_BYTES=226860821
BLOCK_SIZE=8192
XLOG_BLOCK_SIZE=8192
STATUS=DELETED
[ptgres@other 181558]$ cat backup.ini
# configuration
BACKUP_MODE=FULL
FULL_BACKUP_ON_ERROR=false
WITH_SERVERLOG=false
COMPRESS_DATA=false
# result
TIMELINEID=4
START_LSN=0/21000028
STOP_LSN=0/21000138
START_TIME='2024-03-20 18:15:58'
END_TIME='2024-03-20 18:16:00'
RECOVERY_XID=762
RECOVERY_TIME='2024-03-20 18:16:00'
TOTAL_DATA_BYTES=39632069
READ_DATA_BYTES=39631835
READ_ARCLOG_BYTES=201328288
WRITE_BYTES=226860821
BLOCK_SIZE=8192
XLOG_BLOCK_SIZE=8192
STATUS=DELETED
LSN将用于比对增量备份时对比数据块的LSN是否发生了变化,是否需要备份
4.备份
4.1全备
- 随便指定一个存在的库,备份的是所有的物理文件,并且备份一次,
必须验证validate
,否则再次备份失败
[ptgres@other ~]$ pg_rman backup -bf -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
[ptgres@other ~]$ pg_rman backup -bf -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
-b --->full(f) ,incremental(i), archive(a)
-C ---->checkpoint
-Z ---->压缩数据
-P ----> 显示进度
- 或者 都以参数形式
pg_rman backup \
-b full \ # 全量备份
-B /data05/back/pgstdbak \ # 备份目录
-D /data04/back/pg_root_1922 \ # 备库的$PGDATA
-s \ # 备份pg_log
-Z \ # 压缩
--keep-data-generations=3 \ # 保留3个全量备份,删除不需要的全量备份
--keep-data-days=10 \ # 保证能恢复到10天内的任意时间点,删除不需要的
--keep-arclog-files=30 \ # 保留最近30个归档文件
--keep-arclog-days=20 \ # 保留20天内的归档文件
--keep-srvlog-files=10 \ # 保留最近10个日志文件(pg_log)
--keep-srvlog-days=20 \ # 保留20天内的日志文件(pg_log)
--standby-host=127.0.0.1 \ # 如何连接standby
--standby-port=1922 \ # 如何连接standby
-h 127.0.0.1 \ # 如何连接primary
-p 1921 \ # 如何连接primary
-U postgres \ # 如何连接primary, standby(超级用户)
-d postgres # 如何连接primary, standby(database name)
pg_rman backup \
-b full \ # 全量备份
-B /data05/back/pgstdbak \ # 备份目录
-D /data04/back/pg_root_1922 \ # 备库的$PGDATA
-s \ # 备份pg_log
-Z \ # 压缩
--keep-data-generations=3 \ # 保留3个全量备份,删除不需要的全量备份
--keep-data-days=10 \ # 保证能恢复到10天内的任意时间点,删除不需要的
--keep-arclog-files=30 \ # 保留最近30个归档文件
--keep-arclog-days=20 \ # 保留20天内的归档文件
--keep-srvlog-files=10 \ # 保留最近10个日志文件(pg_log)
--keep-srvlog-days=20 \ # 保留20天内的日志文件(pg_log)
--standby-host=127.0.0.1 \ # 如何连接standby
--standby-port=1922 \ # 如何连接standby
-h 127.0.0.1 \ # 如何连接primary
-p 1921 \ # 如何连接primary
-U postgres \ # 如何连接primary, standby(超级用户)
-d postgres # 如何连接primary, standby(database name)
4.2增量备
pg_rman最大的亮点就是实现了增量备份,注意不是基于WAL日志的增量备份,是基于上次全量备份之后发生的变化数据块的增量备份
增量备份是基于文件系统的update time时间线,增量备份必须有个对应的全库备份
[ptgres@other ~]$pg_rman backup -bi -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
[ptgres@other ~]$pg_rman backup -bi -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
4.3删除备份
[ptgres@other ~]$ pg_rman show
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
2024-03-20 18:15:58 2024-03-20 18:16:00 FULL 226MB 4 OK
#删除,虽然把备份删除掉了,但是一些备份信息仍然遗留在元数据中
[ptgres@other ~]$ pg_rman delete "2024-03-20 18:15:58" -f
#查看结果
[ptgres@other ~]$ pg_rman show
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
#查看还没有删除的元数据
[ptgres@other ~]$ pg_rman show -a
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
2024-03-20 18:15:58 2024-03-20 18:16:00 FULL 226MB 4 DELETED
[ptgres@other ~]$ pg_rman show
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
2024-03-20 18:15:58 2024-03-20 18:16:00 FULL 226MB 4 OK
#删除,虽然把备份删除掉了,但是一些备份信息仍然遗留在元数据中
[ptgres@other ~]$ pg_rman delete "2024-03-20 18:15:58" -f
#查看结果
[ptgres@other ~]$ pg_rman show
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
#查看还没有删除的元数据
[ptgres@other ~]$ pg_rman show -a
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
2024-03-20 18:15:58 2024-03-20 18:16:00 FULL 226MB 4 DELETED
各列含义解释:
Start : start time of backup
Time : total time of backup
Total : size of whole database cluster
Data : size of read data files
WAL : size of read WAL archive files
Log : size of read serverlog files
Backup: size of backup (= written size)
Status: status of backup. Possible values are:
OK : backup is done and validated.
DONE : backup is done, but not validated yet.
RUNNING : backup is running now.
DELETING : backup is being deleted now.
DELETED : backup has been deleted.
ERROR : backup is unavailable because some errors occur during backup.
CORRUPT : backup is unavailable because it is broken.
5.恢复
5.1增量恢复
- --recovery-target-timeline TIMELINE
- 指定恢复的时间线,如果没有指定,则用$PGDATA/global/pg_control)中的时间线。
- --recovery-target-time TIMESTAMP
- 指定恢复到哪个时间。如果没有指定,则一直恢复到最后的时间。
- --recovery-target-xid XID
- 指定恢复到哪个事务ID(XID),如果没有指定,则一直恢复到最后的XID。
- --recovery-target-inclusive
- 前面指定的恢复点(recovery-target-time、recovery-target-xid),恢复时是刚好超过这个点,还是刚好在这个点之前停掉,默认是超过这个点(即设置为true的情况)
#恢复数据之前,需要停止数据库
systemctl stop postgresql
#查看备份信息
[ptgres@other ~]$ pg_rman show
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
2024-03-20 15:18:02 2024-03-20 15:18:04 INCR 50MB 1 OK
2024-03-20 15:14:49 2024-03-20 15:14:51 FULL 272MB 1 OK
#根据时间,开始恢复
[ptgres@other ~]$ pg_rman restore --recovery-target-time='2024-03-20 15:18:02'
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2024-03-20 15:14:49"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2024-03-20 15:14:49" backup and archive log files by SIZE
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring database files from the full mode backup "2024-03-20 15:14:49"
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring WAL files from backup "2024-03-20 15:14:49"
INFO: backup "2024-03-20 15:18:02" is valid
INFO: restoring WAL files from backup "2024-03-20 15:18:02"
INFO: restoring online WAL files and server log files
INFO: create pg_rman_recovery.conf for recovery-related parameters.
INFO: remove an 'include' directive added by pg_rman in postgresql.conf if exists
INFO: append an 'include' directive in postgresql.conf for pg_rman_recovery.conf
INFO: generating recovery.signal
INFO: removing standby.signal if exists to restore as primary
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started. After the recovery is done, we recommend to remove recovery-related parameters configured by pg_rman.
#--hard-copy 但经测试该参数无效,结果均为实际拷贝,所以加不加无所谓
#启动数据库
[ptgres@other ~]$ pg_ctl start
#登录数据库,此时的数据库状态是recovery,不能创建数据
[ptgres@other ~]$ psql -h 127.0.0.1 -U postgres -p 5532
psql (16.0)
Type "help" for help.
postgres=#
#查看状态,数据库处于 read-only状态,或者 启动数据库实例前在postgresql.conf中添加recovery_target_action='promote'
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
#解除状态
#查找函数
postgres=# \df *resume*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+----------------------+------------------+---------------------+------
pg_catalog | pg_wal_replay_resume | void | | func
(1 row)
postgres=# select pg_wal_replay_resume() ;
#再次查看状态,f----->t 可用
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
#恢复数据之前,需要停止数据库
systemctl stop postgresql
#查看备份信息
[ptgres@other ~]$ pg_rman show
=====================================================================
StartTime EndTime Mode Size TLI Status
=====================================================================
2024-03-20 15:18:02 2024-03-20 15:18:04 INCR 50MB 1 OK
2024-03-20 15:14:49 2024-03-20 15:14:51 FULL 272MB 1 OK
#根据时间,开始恢复
[ptgres@other ~]$ pg_rman restore --recovery-target-time='2024-03-20 15:18:02'
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2024-03-20 15:14:49"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2024-03-20 15:14:49" backup and archive log files by SIZE
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring database files from the full mode backup "2024-03-20 15:14:49"
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring WAL files from backup "2024-03-20 15:14:49"
INFO: backup "2024-03-20 15:18:02" is valid
INFO: restoring WAL files from backup "2024-03-20 15:18:02"
INFO: restoring online WAL files and server log files
INFO: create pg_rman_recovery.conf for recovery-related parameters.
INFO: remove an 'include' directive added by pg_rman in postgresql.conf if exists
INFO: append an 'include' directive in postgresql.conf for pg_rman_recovery.conf
INFO: generating recovery.signal
INFO: removing standby.signal if exists to restore as primary
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started. After the recovery is done, we recommend to remove recovery-related parameters configured by pg_rman.
#--hard-copy 但经测试该参数无效,结果均为实际拷贝,所以加不加无所谓
#启动数据库
[ptgres@other ~]$ pg_ctl start
#登录数据库,此时的数据库状态是recovery,不能创建数据
[ptgres@other ~]$ psql -h 127.0.0.1 -U postgres -p 5532
psql (16.0)
Type "help" for help.
postgres=#
#查看状态,数据库处于 read-only状态,或者 启动数据库实例前在postgresql.conf中添加recovery_target_action='promote'
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
#解除状态
#查找函数
postgres=# \df *resume*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+----------------------+------------------+---------------------+------
pg_catalog | pg_wal_replay_resume | void | | func
(1 row)
postgres=# select pg_wal_replay_resume() ;
#再次查看状态,f----->t 可用
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
6.备库中备份
pg_rman支持在备库的节点做备份,这样可以减少对主库的IO压力。pg_rman在备库上做备份,也需要连接主库中,执行一些操作。另需要在备库中打开归档,即在postgresql.conf文件中做如下配置:
archive_mode = always
archive_command = 'test ! -f /home/postgres/pgarch/%f && cp %p /home/postgres/pgarch/%f'
archive_mode = always
archive_command = 'test ! -f /home/postgres/pgarch/%f && cp %p /home/postgres/pgarch/%f'
注意在备库中打开归档需要archive_mode = always
,而不是archive_mode = on
,因为archive_mode=on
只在主库上生效,在备库上不生效。
vim pg_rman.ini
#添加从库信息
STANDBY_HOST=192.168.6.16
STANDBY_PORT=5432
#如果提示需要输入密码,可以把密码设置到环境变量中
export PGPASSWORD=*******
pg_rman backup -bf --host=192.168.6.15
vim pg_rman.ini
#添加从库信息
STANDBY_HOST=192.168.6.16
STANDBY_PORT=5432
#如果提示需要输入密码,可以把密码设置到环境变量中
export PGPASSWORD=*******
pg_rman backup -bf --host=192.168.6.15
7.案例
#创建表
create table test16(id int, info text);
#插入数据
insert into test16 select n, n||'info' from generate_series(1,10) n;
#切下point
checkpoint ;
select pg_switch_wal();
#rman 初始化
pg_rman init
#rman 开始全备
pg_rman backup -bf -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate
#再次创建表数据
create table test17(id int, info text);
insert into test17 select n, n||'info' from generate_series(1,10) n;
checkpoint ;
select pg_switch_wal();
#rman 开始增量
pg_rman backup -bi -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate
#模拟数据丢失
#删除表
drop table test17 ;
\q
#开始恢复
看标题5
#创建表
create table test16(id int, info text);
#插入数据
insert into test16 select n, n||'info' from generate_series(1,10) n;
#切下point
checkpoint ;
select pg_switch_wal();
#rman 初始化
pg_rman init
#rman 开始全备
pg_rman backup -bf -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate
#再次创建表数据
create table test17(id int, info text);
insert into test17 select n, n||'info' from generate_series(1,10) n;
checkpoint ;
select pg_switch_wal();
#rman 开始增量
pg_rman backup -bi -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate
#模拟数据丢失
#删除表
drop table test17 ;
\q
#开始恢复
看标题5