Skip to content

1.pg_rman介绍

pg_rman(物理备份)类似于oracle的rman备份策略,实现了全量、增量、归档的层级的备份,可以很灵活的管理PostgreSQL数据库的备份,pg_rman是一款开源的pg的备份恢复插件支持在线和基于PITR的备份恢复方式。

  • 特点

○ 使用简单,一个命令即可完成备份和恢复.

○ 支持在线全备,增量备份,归档备份.

○ 支持备份压缩,通过gzip工具实现页内压缩.

○ 自动备份维护,自动删除过期的WAL备份文件.

○ 支持备份验证.

○ 恢复期间无事务丢失,支持基于PITR的配置文件生成器

  • 注意事项

○ pg_rman 基于 pg_start_backup

○ pg_rman 基于需要在本地安装,跑的不是流复制协议,而是文件拷贝,所以pg_rman必须和数据库节点跑一起,不能远程备份

○ pg_rman 注意跟数据库版本配套

随便指定一个存在的库,备份的是所有的物理文件

时区一定保持一致,否则指定时间点恢复失败

如果恢复一次数据库,之后必须在全备一次,否则不能进行增量备份

2.安装

  • 文档

下载地址: https://github.com/ossc-db/pg_rman/

PostgreSQL Rpm相关依赖、插件包下载地址: https://yum.postgresql.org/10/redhat/rhel-6.7-x86_64/

手册地址: http://ossc-db.github.io/pg_rman/index.html

2.1配置环境变量

基于16实验

bash
[ptgres@other ~]$vim .bashrc

export PGPORT=5532
export PG_HOME=/data/apps/pgsql/16
export PATH=$PG_HOME/bin:$PATH
export PGDATA=/data/pgdata/data
export LD_LIBRARY_PATH=$PG_HOME/lib
export LANG=en_US.utf8
export BACKUP_PATH=/home/ptgres/pg_rman_backups
export SRVLOG_PATH=/data/pgdata/data/log
export ARCLOG_PATH=/data/pgdata/archivedir
[ptgres@other ~]$vim .bashrc

export PGPORT=5532
export PG_HOME=/data/apps/pgsql/16
export PATH=$PG_HOME/bin:$PATH
export PGDATA=/data/pgdata/data
export LD_LIBRARY_PATH=$PG_HOME/lib
export LANG=en_US.utf8
export BACKUP_PATH=/home/ptgres/pg_rman_backups
export SRVLOG_PATH=/data/pgdata/data/log
export ARCLOG_PATH=/data/pgdata/archivedir

❌ 注意

环境一定的配置正确,否则,init的时候会出现报错

[ptgres@other ~]$ pg_rman init

pg_rman: logging.c:226: pg_log_generic_v: Assertion `progname' failed. Aborted (core dumped)

2.2下载

wget https://github.com/ossc-db/pg_rman/releases/download/V1.3.16/pg_rman-1.3.16-pg16.tar.gz

2.3安装

bash
#安装依赖
yum install zlib-devel

#解压
tar zxvf pg_rman-1.3.16-pg16.tar.gz && cd pg_rman-1.3.16-pg16

#编译和安装,默认安装在pg下bin目录下
make && make install
#安装依赖
yum install zlib-devel

#解压
tar zxvf pg_rman-1.3.16-pg16.tar.gz && cd pg_rman-1.3.16-pg16

#编译和安装,默认安装在pg下bin目录下
make && make install

2.4初始化

  • 前提

必须开启归档,否则失败

archive_mode = on		# enables archiving; off, on, or always
archive_command = 'test ! -f /postgresql/pgsql/archive_log/%f && cp %p /postgresql/pgsql/archive_log/%f'
archive_mode = on		# enables archiving; off, on, or always
archive_command = 'test ! -f /postgresql/pgsql/archive_log/%f && cp %p /postgresql/pgsql/archive_log/%f'
  • 初始化
bash
[ptgres@other ~]$ pg_rman init
INFO: ARCLOG_PATH is set to '/data/pgdata/archivedir'
INFO: SRVLOG_PATH is set to '/data/pgdata/data/log'
[ptgres@other ~]$ pg_rman init
INFO: ARCLOG_PATH is set to '/data/pgdata/archivedir'
INFO: SRVLOG_PATH is set to '/data/pgdata/data/log'

3.目录结构

[ptgres@other pg_rman_backups]$ ll
total 8
drwx------ 4 ptgres ptgres 34 Mar 20 14:30 backup
-rw-rw-r-- 1 ptgres ptgres 75 Mar 20 14:30 pg_rman.ini
-rw-rw-r-- 1 ptgres ptgres 40 Mar 20 14:30 system_identifier
drwx------ 2 ptgres ptgres  6 Mar 20 14:30 timeline_history
[ptgres@other pg_rman_backups]$ ll
total 8
drwx------ 4 ptgres ptgres 34 Mar 20 14:30 backup
-rw-rw-r-- 1 ptgres ptgres 75 Mar 20 14:30 pg_rman.ini
-rw-rw-r-- 1 ptgres ptgres 40 Mar 20 14:30 system_identifier
drwx------ 2 ptgres ptgres  6 Mar 20 14:30 timeline_history

3.1备份策略

pg_rman.ini

bash
[ptgres@other pg_rman_backups]$ cat pg_rman.ini
#默认只有这两个
ARCLOG_PATH='/data/pgdata/archivedir'
SRVLOG_PATH='/data/pgdata/data/log'

#14天的数据,每周进行一次全备,每周一和周三的2:00做一次增量备份,每天进行一次归档备份
COMPRESS_DATA = YES
KEEP_DATA_GENERATIONS=3
KEEP_DATA_DAYS=14
KEEP_ARCLOG_FILES=30
KEEP_ARCLOG_DAYS=20
KEEP_SRVLOG_FILES=10           # 保留最近10个日志文件(pg_log)  
KEEP_SRVLOG_DAYS=20
[ptgres@other pg_rman_backups]$ cat pg_rman.ini
#默认只有这两个
ARCLOG_PATH='/data/pgdata/archivedir'
SRVLOG_PATH='/data/pgdata/data/log'

#14天的数据,每周进行一次全备,每周一和周三的2:00做一次增量备份,每天进行一次归档备份
COMPRESS_DATA = YES
KEEP_DATA_GENERATIONS=3
KEEP_DATA_DAYS=14
KEEP_ARCLOG_FILES=30
KEEP_ARCLOG_DAYS=20
KEEP_SRVLOG_FILES=10           # 保留最近10个日志文件(pg_log)  
KEEP_SRVLOG_DAYS=20
参数说明
KEEP_DATA_GENERATIONS需要全备份数
KEEP_DATA_DAYS保留天数
KEEP_ARCLOG_DAYSwal日志保留时长,比KEEP_DATA_DAYS多一天即可
KEEP_SRVLOG_DAYS日志保留天数,单位天

image-20240320181741623

3.2backup.ini

bash
[ptgres@other 181558]$ cat backup.ini
# configuration
BACKUP_MODE=FULL
FULL_BACKUP_ON_ERROR=false
WITH_SERVERLOG=false
COMPRESS_DATA=false
# result
TIMELINEID=4
START_LSN=0/21000028
STOP_LSN=0/21000138
START_TIME='2024-03-20 18:15:58'
END_TIME='2024-03-20 18:16:00'
RECOVERY_XID=762
RECOVERY_TIME='2024-03-20 18:16:00'
TOTAL_DATA_BYTES=39632069
READ_DATA_BYTES=39631835
READ_ARCLOG_BYTES=201328288
WRITE_BYTES=226860821
BLOCK_SIZE=8192
XLOG_BLOCK_SIZE=8192
STATUS=DELETED
[ptgres@other 181558]$ cat backup.ini
# configuration
BACKUP_MODE=FULL
FULL_BACKUP_ON_ERROR=false
WITH_SERVERLOG=false
COMPRESS_DATA=false
# result
TIMELINEID=4
START_LSN=0/21000028
STOP_LSN=0/21000138
START_TIME='2024-03-20 18:15:58'
END_TIME='2024-03-20 18:16:00'
RECOVERY_XID=762
RECOVERY_TIME='2024-03-20 18:16:00'
TOTAL_DATA_BYTES=39632069
READ_DATA_BYTES=39631835
READ_ARCLOG_BYTES=201328288
WRITE_BYTES=226860821
BLOCK_SIZE=8192
XLOG_BLOCK_SIZE=8192
STATUS=DELETED

LSN将用于比对增量备份时对比数据块的LSN是否发生了变化,是否需要备份

4.备份

4.1全备

  • 随便指定一个存在的库,备份的是所有的物理文件,并且备份一次,必须验证validate,否则再次备份失败
bash
[ptgres@other ~]$ pg_rman backup -bf -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
[ptgres@other ~]$ pg_rman backup -bf -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb

-b --->full(f) ,incremental(i), archive(a)

-C ---->checkpoint

-Z ---->压缩数据

-P ----> 显示进度

  • 或者 都以参数形式
bash
pg_rman backup \    
-b full \                          # 全量备份   
-B /data05/back/pgstdbak \       # 备份目录  
-D /data04/back/pg_root_1922 \   # 备库的$PGDATA  
-s \                               # 备份pg_log  
-Z \                               # 压缩  
--keep-data-generations=3 \        # 保留3个全量备份,删除不需要的全量备份  
--keep-data-days=10 \              # 保证能恢复到10天内的任意时间点,删除不需要的  
--keep-arclog-files=30 \           # 保留最近30个归档文件  
--keep-arclog-days=20 \            # 保留20天内的归档文件  
--keep-srvlog-files=10 \           # 保留最近10个日志文件(pg_log)  
--keep-srvlog-days=20 \            # 保留20天内的日志文件(pg_log)  
--standby-host=127.0.0.1 \         # 如何连接standby  
--standby-port=1922 \              # 如何连接standby  
-h 127.0.0.1 \                     # 如何连接primary  
-p 1921 \                          # 如何连接primary  
-U postgres \                      # 如何连接primary, standby(超级用户)  
-d postgres                        # 如何连接primary, standby(database name)
pg_rman backup \    
-b full \                          # 全量备份   
-B /data05/back/pgstdbak \       # 备份目录  
-D /data04/back/pg_root_1922 \   # 备库的$PGDATA  
-s \                               # 备份pg_log  
-Z \                               # 压缩  
--keep-data-generations=3 \        # 保留3个全量备份,删除不需要的全量备份  
--keep-data-days=10 \              # 保证能恢复到10天内的任意时间点,删除不需要的  
--keep-arclog-files=30 \           # 保留最近30个归档文件  
--keep-arclog-days=20 \            # 保留20天内的归档文件  
--keep-srvlog-files=10 \           # 保留最近10个日志文件(pg_log)  
--keep-srvlog-days=20 \            # 保留20天内的日志文件(pg_log)  
--standby-host=127.0.0.1 \         # 如何连接standby  
--standby-port=1922 \              # 如何连接standby  
-h 127.0.0.1 \                     # 如何连接primary  
-p 1921 \                          # 如何连接primary  
-U postgres \                      # 如何连接primary, standby(超级用户)  
-d postgres                        # 如何连接primary, standby(database name)

4.2增量备

pg_rman最大的亮点就是实现了增量备份,注意不是基于WAL日志的增量备份,是基于上次全量备份之后发生的变化数据块的增量备份

增量备份是基于文件系统的update time时间线,增量备份必须有个对应的全库备份

bash
[ptgres@other ~]$pg_rman backup -bi -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
[ptgres@other ~]$pg_rman backup -bi -C -Z -P -h 127.0.0.1 -p 5532 -U postgres -d testdb

4.3删除备份

bash
[ptgres@other ~]$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================
2024-03-20 18:15:58  2024-03-20 18:16:00  FULL   226MB     4  OK

#删除,虽然把备份删除掉了,但是一些备份信息仍然遗留在元数据中
[ptgres@other ~]$ pg_rman delete "2024-03-20 18:15:58" -f

#查看结果
[ptgres@other ~]$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================

#查看还没有删除的元数据
[ptgres@other ~]$ pg_rman show  -a
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================
2024-03-20 18:15:58  2024-03-20 18:16:00  FULL   226MB     4  DELETED
[ptgres@other ~]$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================
2024-03-20 18:15:58  2024-03-20 18:16:00  FULL   226MB     4  OK

#删除,虽然把备份删除掉了,但是一些备份信息仍然遗留在元数据中
[ptgres@other ~]$ pg_rman delete "2024-03-20 18:15:58" -f

#查看结果
[ptgres@other ~]$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================

#查看还没有删除的元数据
[ptgres@other ~]$ pg_rman show  -a
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================
2024-03-20 18:15:58  2024-03-20 18:16:00  FULL   226MB     4  DELETED

各列含义解释:

Start : start time of backup

Time : total time of backup

Total : size of whole database cluster

Data : size of read data files

WAL : size of read WAL archive files

Log : size of read serverlog files

Backup: size of backup (= written size)

Status: status of backup. Possible values are:

OK : backup is done and validated.

DONE : backup is done, but not validated yet.

RUNNING : backup is running now.

DELETING : backup is being deleted now.

DELETED : backup has been deleted.

ERROR : backup is unavailable because some errors occur during backup.

CORRUPT : backup is unavailable because it is broken.

5.恢复

5.1增量恢复

  • --recovery-target-timeline TIMELINE
    • 指定恢复的时间线,如果没有指定,则用$PGDATA/global/pg_control)中的时间线。
  • --recovery-target-time TIMESTAMP
    • 指定恢复到哪个时间。如果没有指定,则一直恢复到最后的时间。
  • --recovery-target-xid XID
    • 指定恢复到哪个事务ID(XID),如果没有指定,则一直恢复到最后的XID。
  • --recovery-target-inclusive
    • 前面指定的恢复点(recovery-target-time、recovery-target-xid),恢复时是刚好超过这个点,还是刚好在这个点之前停掉,默认是超过这个点(即设置为true的情况)
bash
#恢复数据之前,需要停止数据库
systemctl stop postgresql

#查看备份信息
[ptgres@other ~]$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================
2024-03-20 15:18:02  2024-03-20 15:18:04  INCR    50MB     1  OK
2024-03-20 15:14:49  2024-03-20 15:14:51  FULL   272MB     1  OK

#根据时间,开始恢复
[ptgres@other ~]$ pg_rman restore --recovery-target-time='2024-03-20 15:18:02' 

INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2024-03-20 15:14:49"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2024-03-20 15:14:49" backup and archive log files by SIZE
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring database files from the full mode backup "2024-03-20 15:14:49"
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring WAL files from backup "2024-03-20 15:14:49"
INFO: backup "2024-03-20 15:18:02" is valid
INFO: restoring WAL files from backup "2024-03-20 15:18:02"
INFO: restoring online WAL files and server log files
INFO: create pg_rman_recovery.conf for recovery-related parameters.
INFO: remove an 'include' directive added by pg_rman in postgresql.conf if exists
INFO: append an 'include' directive in postgresql.conf for pg_rman_recovery.conf
INFO: generating recovery.signal
INFO: removing standby.signal if exists to restore as primary
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started. After the recovery is done, we recommend to remove recovery-related parameters configured by pg_rman.


#--hard-copy 但经测试该参数无效,结果均为实际拷贝,所以加不加无所谓

#启动数据库
[ptgres@other ~]$ pg_ctl start

#登录数据库,此时的数据库状态是recovery,不能创建数据
[ptgres@other ~]$ psql -h 127.0.0.1 -U postgres -p 5532
psql (16.0)
Type "help" for help.

postgres=#

#查看状态,数据库处于 read-only状态,或者 启动数据库实例前在postgresql.conf中添加recovery_target_action='promote'
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

#解除状态
#查找函数
postgres=# \df *resume*
                                 List of functions
   Schema   |         Name         | Result data type | Argument data types | Type
------------+----------------------+------------------+---------------------+------
 pg_catalog | pg_wal_replay_resume | void             |                     | func
(1 row)

postgres=# select pg_wal_replay_resume() ;

#再次查看状态,f----->t 可用
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
#恢复数据之前,需要停止数据库
systemctl stop postgresql

#查看备份信息
[ptgres@other ~]$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status
=====================================================================
2024-03-20 15:18:02  2024-03-20 15:18:04  INCR    50MB     1  OK
2024-03-20 15:14:49  2024-03-20 15:14:51  FULL   272MB     1  OK

#根据时间,开始恢复
[ptgres@other ~]$ pg_rman restore --recovery-target-time='2024-03-20 15:18:02' 

INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2024-03-20 15:14:49"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2024-03-20 15:14:49" backup and archive log files by SIZE
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring database files from the full mode backup "2024-03-20 15:14:49"
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2024-03-20 15:14:49" is valid
INFO: restoring WAL files from backup "2024-03-20 15:14:49"
INFO: backup "2024-03-20 15:18:02" is valid
INFO: restoring WAL files from backup "2024-03-20 15:18:02"
INFO: restoring online WAL files and server log files
INFO: create pg_rman_recovery.conf for recovery-related parameters.
INFO: remove an 'include' directive added by pg_rman in postgresql.conf if exists
INFO: append an 'include' directive in postgresql.conf for pg_rman_recovery.conf
INFO: generating recovery.signal
INFO: removing standby.signal if exists to restore as primary
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started. After the recovery is done, we recommend to remove recovery-related parameters configured by pg_rman.


#--hard-copy 但经测试该参数无效,结果均为实际拷贝,所以加不加无所谓

#启动数据库
[ptgres@other ~]$ pg_ctl start

#登录数据库,此时的数据库状态是recovery,不能创建数据
[ptgres@other ~]$ psql -h 127.0.0.1 -U postgres -p 5532
psql (16.0)
Type "help" for help.

postgres=#

#查看状态,数据库处于 read-only状态,或者 启动数据库实例前在postgresql.conf中添加recovery_target_action='promote'
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

#解除状态
#查找函数
postgres=# \df *resume*
                                 List of functions
   Schema   |         Name         | Result data type | Argument data types | Type
------------+----------------------+------------------+---------------------+------
 pg_catalog | pg_wal_replay_resume | void             |                     | func
(1 row)

postgres=# select pg_wal_replay_resume() ;

#再次查看状态,f----->t 可用
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

6.备库中备份

pg_rman支持在备库的节点做备份,这样可以减少对主库的IO压力。pg_rman在备库上做备份,也需要连接主库中,执行一些操作。另需要在备库中打开归档,即在postgresql.conf文件中做如下配置:

archive_mode = always
archive_command = 'test ! -f /home/postgres/pgarch/%f && cp %p /home/postgres/pgarch/%f'
archive_mode = always
archive_command = 'test ! -f /home/postgres/pgarch/%f && cp %p /home/postgres/pgarch/%f'

注意在备库中打开归档需要archive_mode = always,而不是archive_mode = on,因为archive_mode=on只在主库上生效,在备库上不生效。

bash
vim pg_rman.ini
#添加从库信息
STANDBY_HOST=192.168.6.16
STANDBY_PORT=5432

#如果提示需要输入密码,可以把密码设置到环境变量中
export PGPASSWORD=*******


pg_rman backup -bf --host=192.168.6.15
vim pg_rman.ini
#添加从库信息
STANDBY_HOST=192.168.6.16
STANDBY_PORT=5432

#如果提示需要输入密码,可以把密码设置到环境变量中
export PGPASSWORD=*******


pg_rman backup -bf --host=192.168.6.15

7.案例

bash
#创建表
create table test16(id int, info text);

#插入数据
insert into test16 select n, n||'info' from generate_series(1,10) n;

#切下point
checkpoint ;
select pg_switch_wal();

#rman 初始化
pg_rman init

#rman 开始全备
pg_rman backup -bf -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate

#再次创建表数据
create table test17(id int, info text);
insert into test17 select n, n||'info' from generate_series(1,10) n;
checkpoint ;
select pg_switch_wal();

#rman 开始增量
pg_rman backup -bi -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate

#模拟数据丢失
#删除表
drop table test17 ; 
\q

#开始恢复
看标题5
#创建表
create table test16(id int, info text);

#插入数据
insert into test16 select n, n||'info' from generate_series(1,10) n;

#切下point
checkpoint ;
select pg_switch_wal();

#rman 初始化
pg_rman init

#rman 开始全备
pg_rman backup -bf -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate

#再次创建表数据
create table test17(id int, info text);
insert into test17 select n, n||'info' from generate_series(1,10) n;
checkpoint ;
select pg_switch_wal();

#rman 开始增量
pg_rman backup -bi -C -P -h 127.0.0.1 -p 5532 -U postgres -d testdb
pg_rman validate

#模拟数据丢失
#删除表
drop table test17 ; 
\q

#开始恢复
看标题5