PostgreSQL Data Backup Strategy

Posted by nomanoma on Wed, 24 Jul 2019 12:43:39 +0200

Summary

After any database is built and ready for use, the first thing to be determined is the database backup strategy. Rational and planned backup is the key to data security.
Here is a set of backup strategies designed for PostgreSQL database

WAL Archiving Strategy

wal log

The so-called wal is write ahead log. The history of all changes and operations in the database system is stored to ensure that the database does not lose data due to failures, such as power failures or other failures that cause server crashes. It's written before the writing operation is completed, but when it's really finished, it doesn't seem to have much meaning, but it's never used. Although wal archives are not opened in the default settings, they are automatically deleted.
Sometimes when we use full-time to recover data, the time point is always not so precise, because full-time data is usually a period of time ago, the data between which can be retrieved by using wal. That's why we recommend opening wal archives

Parameter Configuration

Before starting wal log archiving, you first need to set several parameters

  • wal_level = replica
    The optional values of this parameter are minimal, replica and logic. The level of wal increases in turn, and the more information in wal. Because wal at the minimal level does not contain sufficient information to reconstruct data from the underlying backup and wal logs, wal log archives cannot be opened in this mode.
  • archive_mode = on
    The above parameter is on, which means open archive backup. The optional parameter is on,off,always default value is off, so you have to open it manually.
  • archive_command = 'cp %p /pgdata/10/archive_wals%f'
    The default value of this parameter is an empty string whose value can be a shell command or a complex shell script. In shell scripts or commands, you can use "% p" to indicate the filename of the wal file to be archived that contains the full path information, and "% f" to represent the filename of the wal file that does not contain the path information.

Note: Both wal_level and archive_mode parameter modifications require a restart of the database to take effect. Modification of archive_command is not required. So when you configure a new system, whether you need to archive or not, it is recommended to turn on these two parameters.

Archiving Policy Scripts

The strategy for this sharing is to use a shell script to manage the archives:
In the postgres home directory, create bin and log directories respectively. The bin directory stores the script, and the log records the execution log.
The archiving strategy is to archive logs by date in archivedir under / data for 20 days. Automatic deletion 20 days ago
The archive_command parameter is configured to:

archive_command ='/bin/bash /home/postgres/bin/pg_archive.sh %p %f'

The archive_command command command does not use a simple cp command when archiving, but uses a predetermined script to backup, which contains

#!/bin/bash
source /home/postgres/.bash_profile

DATE=`date +%Y%m%d`
DIR="/data/archivedir/$DATE"
BACK="/data/archivedir/"`date -d '-20 day' +%Y%m%d`
if [ -d "$BACK" ]; then
                rm -rf $BACK
                echo "success rm $BACK" > /home/postgres/logs/pg_archive_logs
        else
                echo "the old backup file not exists!" > /home/postgres/logs/pg_archive_logs
fi

(test -d $DIR || mkdir -p $DIR) && cp $1 $DIR/$2

Physical Completeness

Using pg_basebackup

PG database has its own full backup command, which provides a convenient tool for basic backup. This command will copy the data of the whole database instance. It is often used to build backups and make full backups.

It should be noted that on the backup terminal server using pg_basebackup, it is necessary to have access to the database, that is, to specify the right in pg_hba.conf.
Each backup uses a wal slot max_wal_senders.

Backup script

Back up a little every day for two weeks.

00 00 */1 * *     /home/postgres/bin/pgbackup.sh >/home/postgres/logs/pgbackup.log 2>&1 &

#!/bin/bash

back="/data/sas_pgbackup/"`date -d '-2 week' +%Y-%m-%d`

backf="/data/sas_pgbackup/"`date +%F`"/base.tar"

echo "starting backup....."

pg_basebackup -D /data/sas_pgbackup/`date +%F` -x -R -P -F t -U postgres

if [ -f "$backf" ]; then
    echo "success backup to $backf"
        if [ -d "$back" ]; then
                rm -rf $back
        echo "success rm $back"
        else
                echo "the old backup file not exists!"
        fi
else
    echo "backup failed, because of the file $backf not found"
fi

Backup Management Tool

pg_rman

pg_rman is an excellent open source backup software specially designed for PostgreSQL database. Its idea is similar to Oracle's RMAN and it is a very convenient backup management tool.

The biggest highlight of pg_rman is the incremental backup. Note that it is not incremental backup based on WAL logs, but incremental backup based on changed data blocks after the last full backup.

The backup principle of pg_rman is introduced.

  • pg_rman does not use stream replication protocol to copy, but uses file copy, that is, first to execute the pg_start_backup() function on the main library, then to copy the data directory of the whole database, and finally to execute the pg_stop_backup() function to end the backup.
  • So pg_rman must run with the database node. Otherwise, it can be imagined that the data files of the database can not be copied, and thus can not be backed up. Therefore, when backing up, it is necessary to mount the file system of the backup machine to the database machine by using NFS sharing methods such as NAS.
  • pg_rman can backup the whole library on the standby node, but it needs to connect to the main library through the network to execute pg_start_backup and pg_stop_backup.
  • Pg_rman needs the main library to open the archive for better backup, so it needs to open the archive in postgresql. conf, so that pg_rman can determine where the archive directory of the database is. Pg_rman mainly reads the value of archive_command parameter in postgresql.conf file.
  • Pg_rman can also backup the general program log of PostgreSQL database, that is, the log file of pg_log directory. So pg_rman reads the log_directory parameter in postgresql.conf to determine which directory these logs are in. Of course, these program logs are usually not too important to backup.

Use of pg_rman

Installation of pg_rman is not going to be described in detail, but its simple use.

Do it all at once:

$ pg_rman backup --backup-mode=full
INFO: copying database files
INFO: copying archived WAL files
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.

After completing it, check it:

$ pg_rman validate
INFO: validate: "2019-6-13 17:45:07" backup and archive log files by CRC
INFO: backup "2019-6-13 17:45:07" is valid
//Then use pg_rman show to view the status:

$ pg_rman show
=====================================================================
 StartTime EndTime Mode Size TLI Status
=====================================================================
2019-6-13 17:45:07 2019-6-13 17:45:09 FULL 50MB 1 OK

At this point, you can see that full backup is ok

After the increment is completed, if over a period of time, the database changes a lot, you can also make a full backup.

$ pg_rman backup --backup-mode=incremental --with-serverlog
INFO: copying database files
INFO: copying archived WAL files
INFO: copying server log files
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
[postgres@pg01 ~]$ pg_rman show
=====================================================================
 StartTime EndTime Mode Size TLI Status
=====================================================================
2019-6-13 17:47:44 2019-6-13 17:47:46 INCR 67MB 1 DONE
2019-6-13 17:45:07 2019-6-13 17:45:09 FULL 50MB 1 OK

//Do not forget to do pg_rman validate when each backup is completed:

$ pg_rman validate
INFO: validate: "2019-6-13 17:47:44" backup, archive log files and server log files by CRC
INFO: backup "2019-6-13 17:47:44" is valid
$ pg_rman show
=====================================================================
 StartTime EndTime Mode Size TLI Status
=====================================================================
2019-6-13 17:47:44 2019-6-13 17:47:46 INCR 67MB 1 OK
2019-6-13 17:45:07 2019-6-13 17:45:09 FULL 50MB 1 OK

Topics: Database PostgreSQL shell Oracle