High availability | repmgr build PostgreSQL high availability cluster deployment document [recommended collection]

Posted by moriman on Fri, 03 Dec 2021 09:52:27 +0100

Author: Yanbo Qingyun technology database R & D Engineer

At present, he is engaged in the development of PostgreSQL products and is keen on the study and research of PostgreSQL database

Last issue We introduced the PG cluster replication management tool repmgr, which can easily build a highly available cluster of PostgreSQL. After the primary node goes down, select the standby machine to promote it to the primary node and continue to provide services.

This article will introduce in detail the deployment process of repmgr to build a PostgreSQL high availability cluster.

preparation

  1. Install the repmgr tool on all servers in the cluster
  2. The Primary server installs the PostgreSQL database, completes initialization, and starts the database normally (Primary)

|1. Main warehouse

1.1 modify postgresql.conf file

$ vim postgresql.conf
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'hot_standby'
hot_standby = on
archive_mode = on   # repmgr itself does not require WAL file archiving.
archive_command = '/bin/true'

In versions before PG9.6, wal_level is allowed to be set to archive and hot_standby. In the new version, these values are still accepted, but they will be mapped to replica.

1.2 creating repmgr users and libraries

Create PostgreSQL super user and database for repmgr metadata information

# su - postgres 
$ /usr/lib/postgresql/11/bin/createuser -s repmgr
$ /usr/lib/postgresql/11/bin/createdb repmgr -O repmgr

alter user repmgr with password 'test1234';

1.3 modification pg_hba.conf file

repmgr User as repmgr The default database user used by the tool
$ vim pg_hba.conf
local   replication   repmgr                              trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      0.0.0.0/0               trust

local   repmgr        repmgr                              trust
host    repmgr        repmgr      127.0.0.1/32            trust
host    repmgr        repmgr      0.0.0.0/0               trust

repmgr password free login settings

# Modify PG_ reload database takes effect after the hba.conf file
$ su - postgres -c "/usr/lib/postgresql/11/bin/pg_ctl reload"

# su postgres
$ vim ~/.pgpass

# Add the following contents to the ~ /. pgpass file, and change the user, database and password to your own
*:*:repmgr:repmgr:test1234

# Modify ~ /. pgpass file permissions
chmod 600 ~/.pgpass

1.4 create repmgr.conf file

Create a repmgr.conf file on the primary server

node_id,node_name and conninfo need to be different from the slave library

node_id=1                                     # Node ID, node ID of highly available cluster
node_name='node1'                             # Node name, the name of each node in the highly available cluster, corresponding to select * from PG in the cluster_ stat_ replication;  Application found in_ name
conninfo='host=192.168.100.2 port=5432 user=repmgr dbname=repmgr connect_timeout=2'     # All servers in the cluster must be able to connect to the local node using this string
data_directory='/data/pgsql/main'             # pg data directory
replication_user='repmgr'                     # Stream replication database user. repmgr is used by default
repmgr_bindir='/usr/lib/postgresql/11/bin'    # repmgr software directory
pg_bindir='/usr/lib/postgresql/11/bin'        # pg software directory

# Log management
log_level=INFO
log_file='/data/pglog/repmgr/repmgrd.log'      # The log file needs to be created in advance
log_status_interval=10                         # This setting causes repmgrd to issue status log lines at the specified time interval (in seconds, the default is 300), describing the current status of repmgrd, 
              # For example: [2021-09-28 17:51:15] [INFO] monitoring primary node "node1" (ID: 1) in normal state

# pg and repmgr service management commands
service_start_command='/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ start -o \'-c config_file=/etc/postgresql/11/main/postgresql.conf\' -l /data/pglog/start.log'
service_stop_command='/usr/lib/postgresql/11/bin/pg_ctl stop'
service_restart_command='/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ restart -o \'-c config_file=/etc/postgresql/11/main/postgresql.conf\' -l /data/pglog/start.log'
service_reload_command='su - postgres -c \'/usr/lib/postgresql/11/bin/pg_ctl reload\' '

repmgrd_pid_file='/tmp/repmgrd.pid'              # pid file of repmgrd runtime
repmgrd_service_start_command='/usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start'
repmgrd_service_stop_command='kill -9 `cat /tmp/repmgrd.pid`'

# failover settings
failover=automatic
promote_command='/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf standby promote  --log-to-file'        #When repmgrd determines that the current node will become the new master node, promote will be executed in the case of failover_ A program or script defined in command
follow_command='/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf  standby follow --log-to-file --upstream-node-id=%n'        # %n will be replaced with the ID of the new primary node. If not provided, repmgr standby follow will try to determine the new primary repmgr standby follow node by itself,
                                                                                                           # However, if the original master node goes online again after the new master node is promoted, there is a risk that the node will continue to follow the original master node.
# High availability parameter settings
location='location1'                # Any string that defines the location of the node and is used to check the visibility of the current master node during failover
priority=100                        # Node priority, which may be used when selecting primary. (lsn > priority > node_id)
                                    # 0 means that the node will not be promoted as the master node
monitoring_history=yes              # Whether to write monitoring data to the "monitoring_history" table
reconnect_interval=10               # The interval (in seconds) between attempts to reconnect before failover
reconnect_attempts=6                # Number of attempts to reconnect before failover
connection_check_type=ping          # ping: repmg test connection using PQPing() method
                                    # Connection: attempt to establish a new connection with the node
                                    # query: execute SQL statements on nodes through existing connections
monitor_interval_secs=5             # Interval for writing monitoring data
use_replication_slots=true
# failover_validation_command=      # %n (node_id), %a (node_name). 
                                    # Customize the script to validate the failover decisions made by repmgrd
                                    # This script must return an exit code of 0 to indicate that the node should promote itself as the master node.

The repmgr.conf file of this example is placed in the following location: / etc/postgresql/11/main/repmgr.conf.

[notes for use]

  • repmgr.conf should not be stored in the PostgreSQL data directory because it may be overwritten when setting up or reinitializing the PostgreSQL server;
  • If the repmgr binary is placed outside the PostgreSQL installation directory, specify repmgr_bindir to enable repmgr to perform operations on other nodes (for example: repmgr cluster crosscheck).

1.5 register master server

To enable repmgr to support replication clusters, you must register the primary node (repmgr primary register) with repmgr. This installs the repmgr extension and metadata objects, and adds metadata records to the primary server.

# su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf primary register"
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
  • View cluster information
# su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf cluster show"
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                          
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=127.0.0.1 port=5432 user=repmgr dbname=repmgr connect_timeout=2 password=test1234
  • View repmgr metadata table
repmgr=# SELECT * FROM repmgr.nodes;

 node_id | upstream_node_id | active | node_name |  type   | location | priority |                                          conninfo                                           | repluser | slot_name |             config_file             
---------+------------------+--------+-----------+---------+----------+----------+---------------------------------------------------------------------------------------------+----------+-----------+-------------------------------------
       1 |                  | t      | node1     | primary | default  |      100 | host=127.0.0.1 port=5432 user=repmgr dbname=repmgr connect_timeout=2 password=test1234 | repmgr   |           | /etc/postgresql/11/main/repmgr.conf
  • The configuration file changes and needs to be executed at each node
$ repmgr primary register --force -f /path/to/repmgr.conf
$ repmgr standby register --force -f /path/to/repmgr.conf
$ repmgr witness register --force -f /path/to/repmgr.conf -h primary_host

[notes for use]

repmgr cannot run as root.

1.6 start repmgrd

1. Modify the postgresql.conf file

Join the repmgr shared library (just join repmgr in the previous shared library).

shared_preload_libraries = 'passwordcheck, repmgr'

2. Restart database

/usr/lib/postgresql/11/bin/pg_ctl restart

3. Start repmgrd service

# To create a log file, the log file of repmgrd needs to be created manually
su postgres
mkdir -p /data/pglog/repmgr/
touch /data/pglog/repmgr/repmgrd.log

# Start repmgrd service
/usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start

1.7 repmgrd log rotation

To ensure that the current repmgrd log file (the file specified with the parameter log_file in the repmgr.conf configuration file) does not grow indefinitely, configure your system logrotate to rotate it regularly.

vim /etc/logrotate.d/repmgr
    /data/pglog/repmgr/repmgrd.log {
        missingok
        compress
        rotate 52
        maxsize 100M
        weekly
        create 0600 postgres postgres
        postrotate
            /usr/bin/killall -HUP repmgrd
        endscript
    }

1.8 repmgrd reload configuration

# 1. kill old process
kill -9 `cat /tmp/repmgrd.pid`

# 2,start
/usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start

|2. Spare parts

[notes for use]

On the standby database, do not create a PostgreSQL instance (that is, do not execute initdb or any database creation script provided by the package), but ensure that the target data directory (and any other directory you want PostgreSQL to use) exists and belongs to its postgres system users. Permission must be set to 0700 (drwx ------).

2.1 create repmgr.com file

Create a repmgr.conf file on the standby server. The repmgr configuration file is the same as the main library. Pay attention to modifying the node in it_ id,node_name and conninfo are this node.

2.2 check whether the standby database can be cloned

Before the standby server node is registered, it is not necessary to initialize the PostgreSQL database. It can be "one click" deployed through the repmgr tool. Before cloning an alternate server, you can use the following command to test whether it can be cloned.

Use the -- dry run option to check whether the standby database can be cloned

$ su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -h 192.168.100.2 -U repmgr -d repmgr -f /etc/postgresql/11/main/repmgr.conf standby clone --dry-run"

NOTICE: destination directory "/data/pgsql/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.100.2 user=repmgr dbname=repmgr
DETAIL: current installation size is 38 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
  pg_basebackup -l "repmgr base backup"  -D /data/pgsql/main -h 192.168.100.2 -p 5432 -U repmgr -X stream 
INFO: all prerequisites for "standby clone" are met

An error is reported, which proves that the password free login of the primary node is not configured!

NOTICE: destination directory "/data/pgsql/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.100.2 user=repmgr dbname=repmgr
ERROR: connection to database failed
DETAIL: 
fe_sendauth: no password supplied

2.3 clone backup Library

$ su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -h 192.168.100.2 -U repmgr -d repmgr -f /etc/postgresql/11/main/repmgr.conf standby clone"

NOTICE: destination directory "/data/pgsql/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.100.2 user=repmgr dbname=repmgr
DETAIL: current installation size is 38 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/data/pgsql/main"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  pg_basebackup -l "repmgr base backup"  -D /data/pgsql/main -h 192.168.100.2 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /data/pgsql/main start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

This represents PG using PostgreSQL_ The basebackup tool cloned the PostgreSQL data directory file from 192.168.100.2. A recovery.conf file containing the correct parameters to start streaming from the master server is automatically created. By default, any configuration files in the primary data directory are copied to the standby. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may need to be modified before standby starts.

2.4 modifying configuration files

Modify postgresql.conf and pg_hba.conf configuration file to configure password free login.

2.5 start standby database

# su postgres
$ /usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ start -o '-c config_file=/etc/postgresql/11/main/postgresql.conf' -l /data/pglog/start.log

2.6 register slave database as standby server

# su postgres
$ /usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf --upstream-node-id=1 standby register

2.7 start repmgrd

1. Modify the postgresql.conf file and add the repmgr shared library

shared_preload_libraries = 'passwordcheck, repmgr'

2. Restart database

/usr/lib/postgresql/11/bin/pg_ctl restart

3. Start repmgrd

# Create log file
su postgres
mkdir -p /data/pglog/repmgr/
touch /data/pglog/repmgr/repmgrd.log

# Start repmgrd service
/usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start

2.8 repmgrd log rotation

To ensure that the current repmgrd log file (the file specified with the parameter log_file in the repmgr.conf configuration file) does not grow indefinitely, configure your system logrotate to rotate it regularly.

#  vim /etc/logrotate.d/repmgr
    /data/pglog/repmgr/repmgrd.log {
         missingok
        compress
        rotate 52
        maxsize 100M
        weekly
        create 0600 postgres postgres
        postrotate
            /usr/bin/killall -HUP repmgrd
        endscript
    }

2.9 repmgrd overload configuration

# 1. kill old process
kill -9 `cat /tmp/repmgrd.pid`

# 2,start
/usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start

|3 witness server

[notes for use]

  • The witness server is only useful when using repmgrd;
  • In the case of failover, the witness server provides evidence that the primary server itself is unavailable, rather than, for example, network separation between different physical locations (to prevent brain crack problems);
  • Please set up a common PostgreSQL instance on the server in the same network segment as the cluster master server, install repmgr and repmgrd, and register the instance as witness (repmgr witness register) (the witness server Database system identifier cannot be the same as the cluster master server).

3.1 start node postgres service

/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ start

3.2 add repmgr.conf configuration

The basic configuration is the same as that of the main library, and the node is maintained_ id,node_name and conninfo are different from the main database.

3.3 start repmgrd

1. Modify the postgresql.conf file and add the repmgr shared library

shared_preload_libraries = 'passwordcheck, repmgr'

2. Restart database

/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ restart -o '-c config_file=/etc/postgresql/11/main/postgresql.conf' -l /data/pglog/start.log

3. Start repmgrd

# Create log file
su postgres
mkdir -p /data/pglog/repmgr/
touch /data/pglog/repmgr/repmgrd.log

# Start repmgrd service
/usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start

4. To ensure that the current repmgrd log file (the file specified with the parameter log_file in the repmgr.conf configuration file) will not grow indefinitely, configure your system logrotate to rotate it regularly

#  vim /etc/logrotate.d/repmgr
    /data/pglog/repmgr/repmgrd.log {
        missingok
         compress
        rotate 52
        maxsize 100M
        weekly
        create 0600 postgres postgres
        postrotate
            /usr/bin/killall -HUP repmgrd
        endscript
    }

3.4 register witness

/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf witness register -h 192.168.100.2

|Summary

So far, a PostgreSQL high availability cluster has been built based on repmgr (repmgr itself does not provide virtual ip services. If you need virtual ip services, please use kept or other tools). It has the functions of cluster status monitoring, fault detection, fault transfer and so on. For more advanced functions and principles of repmgr, such as dealing with network splitting, main visibility consensus, cascade replication, monitoring the number of connections, event notification, etc., please refer to the official documents for further study.

reference resources

[1]. repmgr.conf configuration: https://raw.githubusercontent...

Topics: Database PostgreSQL