Author: Yanbo Qingyun technology database R & D Engineer
At present, he is engaged in the development of PostgreSQL products and is keen on the study and research of PostgreSQL database
Last issue We introduced the PG cluster replication management tool repmgr, which can easily build a highly available cluster of PostgreSQL. After the primary node goes down, select the standby machine to promote it to the primary node and continue to provide services.
This article will introduce in detail the deployment process of repmgr to build a PostgreSQL high availability cluster.
preparation
- Install the repmgr tool on all servers in the cluster
- The Primary server installs the PostgreSQL database, completes initialization, and starts the database normally (Primary)
|1. Main warehouse
1.1 modify postgresql.conf file
$ vim postgresql.conf max_wal_senders = 10 max_replication_slots = 10 wal_level = 'hot_standby' hot_standby = on archive_mode = on # repmgr itself does not require WAL file archiving. archive_command = '/bin/true'
In versions before PG9.6, wal_level is allowed to be set to archive and hot_standby. In the new version, these values are still accepted, but they will be mapped to replica.
1.2 creating repmgr users and libraries
Create PostgreSQL super user and database for repmgr metadata information
# su - postgres $ /usr/lib/postgresql/11/bin/createuser -s repmgr $ /usr/lib/postgresql/11/bin/createdb repmgr -O repmgr alter user repmgr with password 'test1234';
1.3 modification pg_hba.conf file
repmgr User as repmgr The default database user used by the tool $ vim pg_hba.conf local replication repmgr trust host replication repmgr 127.0.0.1/32 trust host replication repmgr 0.0.0.0/0 trust local repmgr repmgr trust host repmgr repmgr 127.0.0.1/32 trust host repmgr repmgr 0.0.0.0/0 trust
repmgr password free login settings
# Modify PG_ reload database takes effect after the hba.conf file $ su - postgres -c "/usr/lib/postgresql/11/bin/pg_ctl reload" # su postgres $ vim ~/.pgpass # Add the following contents to the ~ /. pgpass file, and change the user, database and password to your own *:*:repmgr:repmgr:test1234 # Modify ~ /. pgpass file permissions chmod 600 ~/.pgpass
1.4 create repmgr.conf file
Create a repmgr.conf file on the primary server
node_id,node_name and conninfo need to be different from the slave library
node_id=1 # Node ID, node ID of highly available cluster node_name='node1' # Node name, the name of each node in the highly available cluster, corresponding to select * from PG in the cluster_ stat_ replication; Application found in_ name conninfo='host=192.168.100.2 port=5432 user=repmgr dbname=repmgr connect_timeout=2' # All servers in the cluster must be able to connect to the local node using this string data_directory='/data/pgsql/main' # pg data directory replication_user='repmgr' # Stream replication database user. repmgr is used by default repmgr_bindir='/usr/lib/postgresql/11/bin' # repmgr software directory pg_bindir='/usr/lib/postgresql/11/bin' # pg software directory # Log management log_level=INFO log_file='/data/pglog/repmgr/repmgrd.log' # The log file needs to be created in advance log_status_interval=10 # This setting causes repmgrd to issue status log lines at the specified time interval (in seconds, the default is 300), describing the current status of repmgrd, # For example: [2021-09-28 17:51:15] [INFO] monitoring primary node "node1" (ID: 1) in normal state # pg and repmgr service management commands service_start_command='/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ start -o \'-c config_file=/etc/postgresql/11/main/postgresql.conf\' -l /data/pglog/start.log' service_stop_command='/usr/lib/postgresql/11/bin/pg_ctl stop' service_restart_command='/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ restart -o \'-c config_file=/etc/postgresql/11/main/postgresql.conf\' -l /data/pglog/start.log' service_reload_command='su - postgres -c \'/usr/lib/postgresql/11/bin/pg_ctl reload\' ' repmgrd_pid_file='/tmp/repmgrd.pid' # pid file of repmgrd runtime repmgrd_service_start_command='/usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start' repmgrd_service_stop_command='kill -9 `cat /tmp/repmgrd.pid`' # failover settings failover=automatic promote_command='/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf standby promote --log-to-file' #When repmgrd determines that the current node will become the new master node, promote will be executed in the case of failover_ A program or script defined in command follow_command='/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf standby follow --log-to-file --upstream-node-id=%n' # %n will be replaced with the ID of the new primary node. If not provided, repmgr standby follow will try to determine the new primary repmgr standby follow node by itself, # However, if the original master node goes online again after the new master node is promoted, there is a risk that the node will continue to follow the original master node. # High availability parameter settings location='location1' # Any string that defines the location of the node and is used to check the visibility of the current master node during failover priority=100 # Node priority, which may be used when selecting primary. (lsn > priority > node_id) # 0 means that the node will not be promoted as the master node monitoring_history=yes # Whether to write monitoring data to the "monitoring_history" table reconnect_interval=10 # The interval (in seconds) between attempts to reconnect before failover reconnect_attempts=6 # Number of attempts to reconnect before failover connection_check_type=ping # ping: repmg test connection using PQPing() method # Connection: attempt to establish a new connection with the node # query: execute SQL statements on nodes through existing connections monitor_interval_secs=5 # Interval for writing monitoring data use_replication_slots=true # failover_validation_command= # %n (node_id), %a (node_name). # Customize the script to validate the failover decisions made by repmgrd # This script must return an exit code of 0 to indicate that the node should promote itself as the master node.
The repmgr.conf file of this example is placed in the following location: / etc/postgresql/11/main/repmgr.conf.
[notes for use]
- repmgr.conf should not be stored in the PostgreSQL data directory because it may be overwritten when setting up or reinitializing the PostgreSQL server;
- If the repmgr binary is placed outside the PostgreSQL installation directory, specify repmgr_bindir to enable repmgr to perform operations on other nodes (for example: repmgr cluster crosscheck).
1.5 register master server
To enable repmgr to support replication clusters, you must register the primary node (repmgr primary register) with repmgr. This installs the repmgr extension and metadata objects, and adds metadata records to the primary server.
# su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf primary register" INFO: connecting to primary database... NOTICE: attempting to install extension "repmgr" NOTICE: "repmgr" extension successfully installed NOTICE: primary node record (ID: 1) registered
- View cluster information
# su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf cluster show" ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=127.0.0.1 port=5432 user=repmgr dbname=repmgr connect_timeout=2 password=test1234
- View repmgr metadata table
repmgr=# SELECT * FROM repmgr.nodes; node_id | upstream_node_id | active | node_name | type | location | priority | conninfo | repluser | slot_name | config_file ---------+------------------+--------+-----------+---------+----------+----------+---------------------------------------------------------------------------------------------+----------+-----------+------------------------------------- 1 | | t | node1 | primary | default | 100 | host=127.0.0.1 port=5432 user=repmgr dbname=repmgr connect_timeout=2 password=test1234 | repmgr | | /etc/postgresql/11/main/repmgr.conf
- The configuration file changes and needs to be executed at each node
$ repmgr primary register --force -f /path/to/repmgr.conf $ repmgr standby register --force -f /path/to/repmgr.conf $ repmgr witness register --force -f /path/to/repmgr.conf -h primary_host
[notes for use]
repmgr cannot run as root.
1.6 start repmgrd
1. Modify the postgresql.conf file
Join the repmgr shared library (just join repmgr in the previous shared library).
shared_preload_libraries = 'passwordcheck, repmgr'
2. Restart database
/usr/lib/postgresql/11/bin/pg_ctl restart
3. Start repmgrd service
# To create a log file, the log file of repmgrd needs to be created manually su postgres mkdir -p /data/pglog/repmgr/ touch /data/pglog/repmgr/repmgrd.log # Start repmgrd service /usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start
1.7 repmgrd log rotation
To ensure that the current repmgrd log file (the file specified with the parameter log_file in the repmgr.conf configuration file) does not grow indefinitely, configure your system logrotate to rotate it regularly.
vim /etc/logrotate.d/repmgr /data/pglog/repmgr/repmgrd.log { missingok compress rotate 52 maxsize 100M weekly create 0600 postgres postgres postrotate /usr/bin/killall -HUP repmgrd endscript }
1.8 repmgrd reload configuration
# 1. kill old process kill -9 `cat /tmp/repmgrd.pid` # 2,start /usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start
|2. Spare parts
[notes for use]
On the standby database, do not create a PostgreSQL instance (that is, do not execute initdb or any database creation script provided by the package), but ensure that the target data directory (and any other directory you want PostgreSQL to use) exists and belongs to its postgres system users. Permission must be set to 0700 (drwx ------).
2.1 create repmgr.com file
Create a repmgr.conf file on the standby server. The repmgr configuration file is the same as the main library. Pay attention to modifying the node in it_ id,node_name and conninfo are this node.
2.2 check whether the standby database can be cloned
Before the standby server node is registered, it is not necessary to initialize the PostgreSQL database. It can be "one click" deployed through the repmgr tool. Before cloning an alternate server, you can use the following command to test whether it can be cloned.
Use the -- dry run option to check whether the standby database can be cloned
$ su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -h 192.168.100.2 -U repmgr -d repmgr -f /etc/postgresql/11/main/repmgr.conf standby clone --dry-run" NOTICE: destination directory "/data/pgsql/main" provided INFO: connecting to source node DETAIL: connection string is: host=192.168.100.2 user=repmgr dbname=repmgr DETAIL: current installation size is 38 MB INFO: "repmgr" extension is installed in database "repmgr" INFO: replication slot usage not requested; no replication slot will be set up for this standby INFO: parameter "max_wal_senders" set to 10 NOTICE: checking for available walsenders on the source node (2 required) INFO: sufficient walsenders available on the source node DETAIL: 2 required, 10 available NOTICE: checking replication connections can be made to the source server (2 required) INFO: required number of replication connections could be made to the source server DETAIL: 2 replication connections required NOTICE: standby will attach to upstream node 1 HINT: consider using the -c/--fast-checkpoint option INFO: would execute: pg_basebackup -l "repmgr base backup" -D /data/pgsql/main -h 192.168.100.2 -p 5432 -U repmgr -X stream INFO: all prerequisites for "standby clone" are met
An error is reported, which proves that the password free login of the primary node is not configured!
NOTICE: destination directory "/data/pgsql/main" provided INFO: connecting to source node DETAIL: connection string is: host=192.168.100.2 user=repmgr dbname=repmgr ERROR: connection to database failed DETAIL: fe_sendauth: no password supplied
2.3 clone backup Library
$ su - postgres -c "/usr/lib/postgresql/11/bin/repmgr -h 192.168.100.2 -U repmgr -d repmgr -f /etc/postgresql/11/main/repmgr.conf standby clone" NOTICE: destination directory "/data/pgsql/main" provided INFO: connecting to source node DETAIL: connection string is: host=192.168.100.2 user=repmgr dbname=repmgr DETAIL: current installation size is 38 MB INFO: replication slot usage not requested; no replication slot will be set up for this standby NOTICE: checking for available walsenders on the source node (2 required) NOTICE: checking replication connections can be made to the source server (2 required) INFO: checking and correcting permissions on existing directory "/data/pgsql/main" NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: pg_basebackup -l "repmgr base backup" -D /data/pgsql/main -h 192.168.100.2 -p 5432 -U repmgr -X stream NOTICE: standby clone (using pg_basebackup) complete NOTICE: you can now start your PostgreSQL server HINT: for example: pg_ctl -D /data/pgsql/main start HINT: after starting the server, you need to register this standby with "repmgr standby register"
This represents PG using PostgreSQL_ The basebackup tool cloned the PostgreSQL data directory file from 192.168.100.2. A recovery.conf file containing the correct parameters to start streaming from the master server is automatically created. By default, any configuration files in the primary data directory are copied to the standby. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may need to be modified before standby starts.
2.4 modifying configuration files
Modify postgresql.conf and pg_hba.conf configuration file to configure password free login.
2.5 start standby database
# su postgres $ /usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ start -o '-c config_file=/etc/postgresql/11/main/postgresql.conf' -l /data/pglog/start.log
2.6 register slave database as standby server
# su postgres $ /usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf --upstream-node-id=1 standby register
2.7 start repmgrd
1. Modify the postgresql.conf file and add the repmgr shared library
shared_preload_libraries = 'passwordcheck, repmgr'
2. Restart database
/usr/lib/postgresql/11/bin/pg_ctl restart
3. Start repmgrd
# Create log file su postgres mkdir -p /data/pglog/repmgr/ touch /data/pglog/repmgr/repmgrd.log # Start repmgrd service /usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start
2.8 repmgrd log rotation
To ensure that the current repmgrd log file (the file specified with the parameter log_file in the repmgr.conf configuration file) does not grow indefinitely, configure your system logrotate to rotate it regularly.
# vim /etc/logrotate.d/repmgr /data/pglog/repmgr/repmgrd.log { missingok compress rotate 52 maxsize 100M weekly create 0600 postgres postgres postrotate /usr/bin/killall -HUP repmgrd endscript }
2.9 repmgrd overload configuration
# 1. kill old process kill -9 `cat /tmp/repmgrd.pid` # 2,start /usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start
|3 witness server
[notes for use]
- The witness server is only useful when using repmgrd;
- In the case of failover, the witness server provides evidence that the primary server itself is unavailable, rather than, for example, network separation between different physical locations (to prevent brain crack problems);
- Please set up a common PostgreSQL instance on the server in the same network segment as the cluster master server, install repmgr and repmgrd, and register the instance as witness (repmgr witness register) (the witness server Database system identifier cannot be the same as the cluster master server).
3.1 start node postgres service
/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ start
3.2 add repmgr.conf configuration
The basic configuration is the same as that of the main library, and the node is maintained_ id,node_name and conninfo are different from the main database.
3.3 start repmgrd
1. Modify the postgresql.conf file and add the repmgr shared library
shared_preload_libraries = 'passwordcheck, repmgr'
2. Restart database
/usr/lib/postgresql/11/bin/pg_ctl -D /data/pgsql/main/ restart -o '-c config_file=/etc/postgresql/11/main/postgresql.conf' -l /data/pglog/start.log
3. Start repmgrd
# Create log file su postgres mkdir -p /data/pglog/repmgr/ touch /data/pglog/repmgr/repmgrd.log # Start repmgrd service /usr/lib/postgresql/11/bin/repmgrd -f /etc/postgresql/11/main/repmgr.conf start
4. To ensure that the current repmgrd log file (the file specified with the parameter log_file in the repmgr.conf configuration file) will not grow indefinitely, configure your system logrotate to rotate it regularly
# vim /etc/logrotate.d/repmgr /data/pglog/repmgr/repmgrd.log { missingok compress rotate 52 maxsize 100M weekly create 0600 postgres postgres postrotate /usr/bin/killall -HUP repmgrd endscript }
3.4 register witness
/usr/lib/postgresql/11/bin/repmgr -f /etc/postgresql/11/main/repmgr.conf witness register -h 192.168.100.2
|Summary
So far, a PostgreSQL high availability cluster has been built based on repmgr (repmgr itself does not provide virtual ip services. If you need virtual ip services, please use kept or other tools). It has the functions of cluster status monitoring, fault detection, fault transfer and so on. For more advanced functions and principles of repmgr, such as dealing with network splitting, main visibility consensus, cascade replication, monitoring the number of connections, event notification, etc., please refer to the official documents for further study.
reference resources
[1]. repmgr.conf configuration: https://raw.githubusercontent...