MySQL master-slave replication (an analysis built around common interview questions)

Posted by PhantomCode on Wed, 09 Feb 2022 04:23:44 +0100

What is the architecture of master-slave replication?

One master and one slave: writes go to the master database and reads go to the slave database.
The slave should be set to read-only. Otherwise it receives writes both from replication and directly from clients, and the two streams conflict.

What is the role of master-slave replication?

  1. Hot backup. If the master database goes down, the slave database can take over and keep the service running.
  2. Read-write separation improves performance, especially when some queries are expensive enough to affect the whole database.
  3. It makes the architecture easier to scale, since a single master is rarely enough in the long run. If write demand is high, writes go to the master and the data can be spread across several databases, split by business domain or by hash. If query demand is high, add slaves: one master and two slaves, or even one master and many slaves. To reduce the pressure on the master, cascade replication can be used. (A is the master database.)
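The hash-based split mentioned in point 3 can be sketched as follows. This is only an illustration: the function name `pick_shard`, the key format, and the choice of CRC32 as the hash are assumptions, not something the article specifies.

```python
import zlib


def pick_shard(key: str, shard_count: int) -> int:
    """Map a business key to a shard index in [0, shard_count).

    CRC32 is used because it gives the same result in every process,
    unlike Python's built-in hash() for strings.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return zlib.crc32(key.encode("utf-8")) % shard_count


# Hypothetical usage: decide which of 4 databases stores an order.
shard = pick_shard("order:1001", 4)
```

The same key always lands on the same shard, so reads and writes for one record go to the same database.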

Principle of master-slave replication?


The process is roughly as follows

  1. The master database executes a write SQL statement and records it in the binlog
  2. The slave database connects to the master database (this is why replication is asynchronous: the master sends the binlog to the slave only after the slave has requested it)
  3. The master creates a binlog dump thread, which sends the binlog to the slave
  4. The slave creates an I/O thread, which reads the binlog content and writes it to the relay log
  5. The slave creates an SQL thread, which reads the content of the relay log and applies it to the slave database
    To sum up, there are two logs and three threads.
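The five steps above can be modelled with a toy simulation. This is a sketch only: the class and method names are invented for illustration, the "threads" are plain method calls rather than real threads, and events are reduced to key/value pairs.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # log 1: the binlog on the master

    def execute_write(self, key, value):
        # Step 1: apply the write and record it in the binlog.
        self.data[key] = value
        self.binlog.append((key, value))

    def binlog_dump(self, from_pos):
        # Step 3 (thread 1, binlog dump): send events the slave has not read.
        return self.binlog[from_pos:]


class Slave:
    def __init__(self, master):
        self.master = master  # Step 2: the slave connects to the master.
        self.data = {}
        self.relay_log = []   # log 2: the relay log on the slave
        self.read_pos = 0     # binlog position already fetched
        self.exec_pos = 0     # relay log position already applied

    def io_thread_step(self):
        # Step 4 (thread 2, I/O): copy new binlog events into the relay log.
        events = self.master.binlog_dump(self.read_pos)
        self.relay_log.extend(events)
        self.read_pos += len(events)

    def sql_thread_step(self):
        # Step 5 (thread 3, SQL): replay relay log events into the slave data.
        while self.exec_pos < len(self.relay_log):
            key, value = self.relay_log[self.exec_pos]
            self.data[key] = value
            self.exec_pos += 1


master = Master()
slave = Slave(master)
master.execute_write("id:1", "alice")
slave.io_thread_step()   # event is in the relay log, not yet applied
slave.sql_thread_step()  # event is now applied on the slave
```

Between `execute_write` and `sql_thread_step` the slave still holds the old data, which is exactly the window in which reads from the slave can be stale.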

Delay of master-slave replication

As the steps show, one round of master-slave replication involves two disk writes. Although it looks synchronous from a distance, it is actually asynchronous replication, so there is always some delay, though usually a small one.
The delay depends on the complexity of the architecture: one master with one slave and one master with multiple slaves will have different delays.
It also depends on the physical deployment: master and slave on one physical machine will see less delay than on two machines, and one data center will see less delay than two.
In the setups I have used, the delay was within 1 ms, which is practically no delay.

Under what circumstances will master-slave inconsistency occur

Generally speaking, there are the following scenarios of master-slave inconsistency

Network Delay

This is mainly a hardware problem. For example, replication across data centers can add enough network delay that the master-slave inconsistency affects normal use. If you really do hit inconsistency caused by long network delay, the only option is to read selectively from the master.
For example, suppose the delay is 1 s:
When the master writes a piece of data, record the written key in a cache.
If a read request hits that cache entry within one second, assume the slave has not finished synchronizing yet and send the query to the master.
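The selective-read idea above can be sketched like this. It is a minimal in-process illustration: the class name `ReadRouter`, the dictionary used as the cache, and the injectable clock are assumptions; a real deployment would more likely use a shared cache with a TTL.

```python
import time


class ReadRouter:
    """Route a read to the master if the key was written very recently."""

    def __init__(self, lag_window=1.0, clock=time.monotonic):
        self.lag_window = lag_window  # assumed replication delay in seconds
        self.clock = clock            # injectable so the logic can be tested
        self._recent_writes = {}      # key -> time of the last write

    def record_write(self, key):
        # Called right after the write has been sent to the master.
        self._recent_writes[key] = self.clock()

    def choose_target(self, key):
        written_at = self._recent_writes.get(key)
        if written_at is None:
            return "slave"
        if self.clock() - written_at < self.lag_window:
            # The slave may not have replicated this key yet.
            return "master"
        # The window has passed; forget the entry and read from the slave.
        del self._recent_writes[key]
        return "slave"


router = ReadRouter(lag_window=1.0)
router.record_write("user:42")
target = router.choose_target("user:42")  # "master" while inside the window
```

Only keys that were just written pay the cost of a master read; every other read still goes to the slave.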

The loads of the master and slave machines are inconsistent

During master-slave replication, the master needs to create one thread and the slave needs to create two. If either machine is short of resources, thread creation can fail or stall, which leads to inconsistency between master and slave.

Parameter configuration problem: inconsistent max_allowed_packet settings

If a long SQL statement fails to replicate, check this parameter. The value on the master is probably relatively large and allows the long statement to run, while the value on the slave is relatively small, so a statement of the same length cannot be executed there.

Inconsistent auto-increment keys

This is generally rare. Suspect it only when a large amount of data is inconsistent and the inconsistency follows an obvious pattern

Setting of synchronization parameters

  • innodb_flush_log_at_trx_commit: controls when the redo log is flushed to disk
  • sync_binlog: controls when the binlog is flushed to disk
    For details, see the article "Do you know the two important configuration parameters of MySQL master-slave replication?".
    These two parameters control how long writes may be delayed. If they are not set properly and MySQL goes down before the log has been written to disk, data and transactions are lost, and master-slave inconsistency naturally follows.

Asynchronous replication itself

With asynchronous replication, the master records the binlog and returns to the client immediately, regardless of whether the slave has synchronized. Semi-synchronous replication is available in MySQL 5.7. The relevant setting is as follows

- AFTER_SYNC (a value of rpl_semi_sync_master_wait_point): the master does not commit the transaction right away; it commits only after the slave has returned its acknowledgement

Duplicate server ID

Several slaves end up with the same server ID because of a manual configuration mistake

Version inconsistency

This is relatively rare. Generally, the version of the master library and slave library will be set to be consistent

How to solve the inconsistency between master and slave

You can refer to the article "mysql master-slave synchronization inconsistency solution".
Once you find a master-slave inconsistency:
First check the thread status on the master

show processlist;

Check the status of the main library

show master status;

Check the status of the slave. If you see the following, the SQL thread has stopped

mysql> show slave status\G                                                

Slave_IO_Running: Yes
Slave_SQL_Running: No

Then look at the error message to see what went wrong

Delete failed

Last_SQL_Error: Could not execute Delete_rows event on table hcy.t1; 
Can't find record in 't1',
Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; 
the event's master log mysql-bin.000006, end_log_pos 254

The row has already been deleted on the master and cannot be found on the slave, so the event can simply be skipped on the slave

stop slave;
-- Skip the one event that failed
set global sql_slave_skip_counter =1;
start slave;

Duplicate primary key

The record already exists in the slave, and the same record is inserted in the master

Last_SQL_Error: Could not execute Write_rows event on table hcy.t1; 
Duplicate entry '2' for key 'PRIMARY', 
Error_code: 1062; 
handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000006, end_log_pos 924

Look at this row on the master and on the slave, then delete the row with the duplicate primary key

Update lost

A piece of data is updated on the master and cannot be found on the slave

Last_SQL_Error: Could not execute Update_rows event on table hcy.t1; 
Can't find record in 't1', 
Error_code: 1032; 
handler error HA_ERR_KEY_NOT_FOUND; 
the event's master log mysql-bin.000010, end_log_pos 794

Insert the missing row into the slave

The fixes above apply when only a small amount of data is inconsistent and can be repaired by hand. If the data volume is large and there are too many inconsistencies to handle one by one, the only option is to resynchronize

Resynchronize

Use a hot backup tool such as IBackup or xtrabackup.
These tools do not lock tables during the backup. Similar to MVCC, they work from a snapshot, so reads from outside continue as normal.
Data changes made during the backup are recorded and applied once the backup is complete.

How to monitor master-slave inconsistency

Through status parameters

mysql>  show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.205
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 30
              Master_Log_File: edu-mysql-bin.000008
          Read_Master_Log_Pos: 120
               Relay_Log_File: edu-mysql-relay-bin.000002
                Relay_Log_Pos: 287
        Relay_Master_Log_File: edu-mysql-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 464
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 205
                  Master_UUID: 7402509d-fd14-11e5-bfd0-000c2963dd15
             Master_Info_File: /home/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)
  • Slave_IO_Running
    This parameter can be used as a monitoring item for io_thread. Yes means the connection between io_thread and the master is normal and replication can proceed. No means communication with the master is abnormal; in most cases this is caused by the network between master and slave.
  • Slave_SQL_Running
    This parameter shows whether sql_thread is working normally, specifically whether statements are being executed successfully. Common failures are a duplicate primary key or a table that does not exist.
  • Seconds_Behind_Master
    The value is the difference between the timestamp (TS for short) of the event being executed by sql_thread and the timestamp of the event copied by io_thread.
    NULL: either io_thread or sql_thread has failed, that is, the Running status of that thread is No rather than Yes.
    0: the value we hope to see. Master-slave replication is healthy and lag can be considered absent.
    Positive value: the slave is lagging behind the master. The larger the number, the further behind it is.
    Negative value: very rare. I have only heard some experienced DBAs say that they have seen it. It is effectively a bug value; this parameter does not support negative values, so it should never appear.
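The three monitoring items above can be checked automatically. The sketch below parses the text that `show slave status\G` prints and applies the rules just described; the function names, the `max_lag_seconds` parameter, and the wording of the messages are assumptions made for illustration.

```python
def parse_slave_status(text):
    """Turn the 'Name: value' lines of SHOW SLAVE STATUS\\G into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips the row separator and the 'mysql>' prompt line
        name, _, value = line.partition(":")
        status[name.strip()] = value.strip()
    return status


def check_replication(status, max_lag_seconds=0):
    """Return a list of problems; an empty list means replication looks fine."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("io_thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("sql_thread is not running")
    lag = status.get("Seconds_Behind_Master", "NULL")
    if lag in ("NULL", ""):
        problems.append("lag unknown (Seconds_Behind_Master is NULL)")
    elif int(lag) < 0:
        problems.append("negative lag reported (treat as a bug value)")
    elif int(lag) > max_lag_seconds:
        problems.append(f"slave is {lag}s behind the master")
    return problems


sample = """
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
        Seconds_Behind_Master: NULL
"""
problems = check_replication(parse_slave_status(sample))
```

For the sample text, `problems` reports both the stopped SQL thread and the unknown lag, matching the NULL case described above.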

The Maatkit tool mk-heartbeat

It can be understood as a wrapper around, and an enhancement of, the status parameters above.
It relies on synchronized clocks and creates the same table on the master and the slave. The table stores the server ID of the slave and the timestamp at which each row was inserted. The tool regularly compares the timestamps of the same row on the master and the slave; if the difference exceeds half a second, the master and slave are considered inconsistent.
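The comparison described above amounts to subtracting two timestamps and checking the result against a threshold. The sketch below shows only that arithmetic; it does not call mk-heartbeat or read any real table, and the function names and sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# The half-second threshold mentioned in the text.
THRESHOLD = timedelta(seconds=0.5)


def heartbeat_lag(master_ts: datetime, slave_ts: datetime) -> timedelta:
    """Difference between the heartbeat row on the master and on the slave."""
    return master_ts - slave_ts


def is_lagging(master_ts: datetime, slave_ts: datetime,
               threshold: timedelta = THRESHOLD) -> bool:
    """True when the slave's copy of the row is older than the threshold allows."""
    return heartbeat_lag(master_ts, slave_ts) > threshold


# Hypothetical readings of the same heartbeat row on both servers.
master_row = datetime(2022, 2, 9, 4, 23, 44, 800000)
slave_row = datetime(2022, 2, 9, 4, 23, 44, 100000)
lagging = is_lagging(master_row, slave_row)  # lag is 0.7 s, above 0.5 s
```

The result is only meaningful if the clocks of the two servers are synchronized, which is why the tool depends on a synchronized clock.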

Topics: Database MySQL