Pull binlog, synchronize data automatically, and the boss wants to give you a raise

Posted by adcripps on Thu, 10 Feb 2022 19:59:20 +0100

Hello, I'm brother Tom~

Everyone is familiar with the MySQL database. Today I'd like to talk about data synchronization.

For data synchronization, the two common strategies are synchronous double-write and asynchronous messaging.

1. Synchronous double-write: literally, synchronous + double write. For example, when an old database model is refactored and the data is migrated to a new database, any data change during the migration must be written to both the old database and the new database, keeping the two sides in step.

  • Advantages: the synchronous mechanism guarantees that the two copies of the data stay consistent.
  • Disadvantages: the extra synchronization logic adds a performance cost to every write.
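The double-write strategy can be sketched as follows. This is a minimal illustration, not a real DAO layer: the two in-memory maps stand in for the old and new databases, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of synchronous double-write, assuming both stores can be
// updated in the same call path. The maps stand in for the old and new databases.
public class DoubleWriteDemo {
    private final Map<Long, String> oldDb = new HashMap<>();
    private final Map<Long, String> newDb = new HashMap<>();

    // Every write hits both stores synchronously; the caller pays for both
    // round-trips, which is exactly the performance cost noted above.
    public void save(long id, String row) {
        oldDb.put(id, row); // write to the old database
        newDb.put(id, row); // write to the new database in the same call
    }

    public String readOld(long id) { return oldDb.get(id); }
    public String readNew(long id) { return newDb.get(id); }

    public static void main(String[] args) {
        DoubleWriteDemo demo = new DoubleWriteDemo();
        demo.save(100L, "income=1000,expend=1000");
        System.out.println(demo.readOld(100L).equals(demo.readNew(100L))); // prints "true"
    }
}
```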

2. Asynchronous messaging: if there are many downstream consumers, we usually publish the heterogeneous change data to an MQ system. Any interested business can subscribe to the message topic, pull the messages, and process them according to its own business logic.

  • Advantages: the architecture is decoupled and the work is done asynchronously, reducing the performance loss on the main link. Even with multiple consumers, the cost on the main link does not multiply.
  • Disadvantages: an asynchronous mechanism cannot guarantee real-time delivery and has a certain delay; only eventual consistency can be achieved.
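The asynchronous strategy can be sketched like this. It is an illustration only: a `BlockingQueue` stands in for the MQ topic (Kafka, RocketMQ, etc.), and the string event format is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the asynchronous-message strategy: the main write path only
// publishes a change event; subscribers consume at their own pace, so the
// replica is eventually consistent rather than immediately consistent.
public class AsyncSyncDemo {
    static final BlockingQueue<String> topic = new ArrayBlockingQueue<>(1024);

    // Main link: write locally, publish the change event, return immediately.
    static void saveAndPublish(List<String> mainDb, String row) {
        mainDb.add(row);              // fast local write on the main link
        topic.offer("INSERT:" + row); // no second database round-trip here
    }

    public static void main(String[] args) {
        List<String> mainDb = new ArrayList<>();
        List<String> replicaDb = new ArrayList<>();

        saveAndPublish(mainDb, "id=100,income=1000");

        // A subscriber drains the topic later and applies the change to its store.
        String event = topic.poll();
        if (event != null && event.startsWith("INSERT:")) {
            replicaDb.add(event.substring("INSERT:".length()));
        }
        System.out.println(mainDb.equals(replicaDb)); // prints "true" once consumed
    }
}
```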

Both schemes above are hard-coded. Is there a general technical solution that does not care what your business is or what data you write? From the platform's point of view, everything can be abstracted as MySQL tables whose data is synchronized directly; only the subscribers care about the data content.

You can borrow MySQL's master-slave replication principle: pull the binlog and simply parse the data in it.

A popular middleware for this is Alibaba's open-source Canal. Today we'll build a technical scheme with it, roughly as follows:

1, Canal introduction

Canal, which means waterway / pipeline / ditch, is mainly used for incremental log parsing based on the MySQL database, providing incremental data subscription and consumption.

Canal was originally created to relieve the pressure that data synchronization between the primary database and multiple standby databases put on the primary.

Over time this pipeline has kept developing, and its application scenarios have multiplied.

The working principle is very simple: Canal disguises itself as a MySQL slave, simulates the MySQL slave's interaction protocol, and sends a dump request to the MySQL master.

The MySQL master receives the dump request sent by Canal and starts pushing the binary log to it. Canal then parses the binlog and delivers the data to different storage media, such as MySQL, Kafka, Elasticsearch, Pulsar, etc.

Business scenario:

  • Real-time database backup
  • Building and maintaining an ES data index
  • Synchronous maintenance of a distributed cache (e.g. Redis)
  • Heterogeneous data: subscribers consume the changes according to their own business needs, e.g. via Kafka, Pulsar, etc.
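The cache-maintenance scenario above can be sketched as follows. The `HashMap` stands in for Redis, and the simplified event format is purely illustrative of how a subscriber maps parsed binlog row events onto cache operations.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of distributed-cache maintenance driven by binlog events:
// INSERT/UPDATE refresh the cached value, DELETE evicts the key.
public class CacheSyncDemo {
    private final Map<String, String> cache = new HashMap<>();

    public void apply(String eventType, String key, String value) {
        if ("DELETE".equals(eventType)) {
            cache.remove(key);
        } else if ("INSERT".equals(eventType) || "UPDATE".equals(eventType)) {
            cache.put(key, value);
        }
    }

    public String get(String key) { return cache.get(key); }

    public static void main(String[] args) {
        CacheSyncDemo demo = new CacheSyncDemo();
        demo.apply("INSERT", "person:100", "income=1000");
        demo.apply("UPDATE", "person:100", "income=2000");
        System.out.println(demo.get("person:100")); // prints "income=2000"
        demo.apply("DELETE", "person:100", null);
        System.out.println(demo.get("person:100")); // prints "null"
    }
}
```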

2, Install MySQL

1. Pull the MySQL image

docker pull mysql:5.7

2. View images

docker images

3. Start the MySQL process

docker run \
--name mysql \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=111111 \
-d mysql:5.7

4. View the process

[root@iZbp12gqydkgwid86ftoauZ mysql]# docker ps -a
CONTAINER ID   IMAGE     COMMAND                  CREATED         STATUS         PORTS                               NAMES
e92827897538   mysql     "docker-entrypoint.s..."   4 seconds ago   Up 2 seconds   0.0.0.0:3306->3306/tcp, 33060/tcp   mysql

5. Enter the MySQL container

docker exec -it mysql /bin/bash

Note: you may run into problems when modifying configuration files inside the container, for example:

bash: vi: command not found when trying to use vi or vim in the Docker container.

This is because the vi editor is not installed; execute the following commands:

apt-get update
apt-get install vim

6. MySQL client common commands

# Log in to mysql
mysql -uroot -p111111

# Display database list
show databases;

# Select database
use mysql;

# Show all tables
show tables;

# Display table structure
describe <table_name>;

3, MySQL related configuration

Create a MySQL user with username tom and password 123456:

create user 'tom'@'%' identified by '123456';

Grant user tom SELECT and replication permissions on all databases:

grant SELECT, REPLICATION SLAVE, REPLICATION CLIENT on *.* to 'tom'@'%' identified by '123456';

Modify the MySQL configuration file my.cnf, located at /etc/my.cnf:

log-bin=mysql-bin # Enable binlog
binlog-format=ROW # Use ROW mode
server_id=1 # Required by MySQL replication; must not clash with canal's slaveId

Note: you need to restart the MySQL container instance for this to take effect: docker restart mysql

View the current binlog mode:

show variables like 'binlog_format';

View the list of binlog files:

show binary logs;

View the binlog file currently being written:

show master status;

4, Install Canal

1. Download the installation package from the official website

Download address: https://github.com/alibaba/canal/releases

This experiment uses the latest version, v1.1.5, whose main additions are personalized support for different clients, which belongs to ecosystem expansion.

For more features, check the official website.

Unpack the tar.gz archive:

tar -zxvf canal.deployer-1.1.5.tar.gz 

Open the configuration file conf/example/instance.properties and modify it as follows:

## slaveId is generated automatically since v1.0.26, so it can be left unconfigured
# canal.instance.mysql.slaveId=0

# Database address
canal.instance.master.address=127.0.0.1:3306
# binlog file name
canal.instance.master.journal.name=
# Starting binlog offset when connecting to the MySQL master
canal.instance.master.position=
# Starting binlog timestamp when connecting to the MySQL master
canal.instance.master.timestamp=

# username/password
# The account and password authorized on the MySQL server
canal.instance.dbUsername=tom
canal.instance.dbPassword=123456
# character set
canal.instance.connectionCharset = UTF-8
# whether to use druid to decrypt the database password
canal.instance.enableDruid=false

# table regex: .*\\..* listens to all tables; specific table names can also be listed, separated by commas
canal.instance.filter.regex=.*\\..*
# blacklist of tables excluded from parsing; multiple tables are separated by commas
canal.instance.filter.black.regex=mysql\\.slave_.*

Start command:

sh bin/startup.sh

Since an Alibaba Cloud ECS server is used here, it turned out that no Java environment was installed.

Download the JDK 8 installation package from the Oracle official website.

Download address: https://www.oracle.com/java/technologies/downloads/#java8

Then upload the installation package to the ECS server with the following command:

scp jdk-8u311-linux-x64.tar.gz root@      //Upload file

Install the JDK 8 environment.

Reference: https://developer.aliyun.com/article/701864

5, Start Canal

Enter canal.deployer-1.1.5/bin

Execute the startup script:

sh startup.sh

Enter canal.deployer-1.1.5/logs/example

If the following content appears in the example.log file, the startup was successful:

2022-01-03 08:23:10.165 [canal-instance-scan-0] INFO  c.a.otter.canal.instance.core.AbstractCanalInstance - stop CannalInstance for null-example 
2022-01-03 08:23:10.177 [canal-instance-scan-0] INFO  c.a.otter.canal.instance.core.AbstractCanalInstance - stop successful....
2022-01-03 08:23:10.298 [canal-instance-scan-0] INFO  c.a.otter.canal.instance.spring.CanalInstanceWithSpring - start CannalInstance for 1-example 
2022-01-03 08:23:10.298 [canal-instance-scan-0] WARN  c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
2022-01-03 08:23:10.298 [canal-instance-scan-0] WARN  c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter : ^mysql\.slave_.*$
2022-01-03 08:23:10.299 [canal-instance-scan-0] INFO  c.a.otter.canal.instance.core.AbstractCanalInstance - start successful....

6, Engineering experiment

Create a SpringBoot project, spring-boot-bulking-canal.

Introduce the related pom dependencies:
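A minimal sketch of the dependency section, assuming Maven and client version 1.1.5 (since 1.1.5 the protocol classes are shipped in a separate canal.protocol artifact):

```xml
<!-- Canal client: connects to the canal server and pulls parsed binlog data -->
<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.5</version>
</dependency>
<!-- Canal protocol: Message/Entry/RowChange classes used when parsing entries -->
<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.protocol</artifactId>
    <version>1.1.5</version>
</dependency>
```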


Write a Java class that establishes a connection with the canal server and pulls the change data from the database:

// create link
CanalConnector connector = CanalConnectors.newSingleConnector(
        new InetSocketAddress("", 11111), "example", "", "");
try {
    // open a connection
    connector.connect();
    // Subscribe to all tables
    connector.subscribe(".*\\..*");
    // Rollback to the last un-acked position; the next fetch resumes from there
    connector.rollback();
    while (true) {
        Message message = connector.getWithoutAck(BATCH_SIZE); // e.g. BATCH_SIZE = 1000
        long batchId = message.getId();
        if (batchId != -1 && !message.getEntries().isEmpty()) {
            // parse the binlog entries here and print the changes
        }
        // batch id submission
        connector.ack(batchId);
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    connector.disconnect();
}

Create a MySQL table in the ds1 database:

CREATE TABLE `person` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto increment primary key',
  `income` bigint(20) NOT NULL COMMENT 'income',
  `expend` bigint(20) NOT NULL COMMENT 'expenditure',
  PRIMARY KEY (`id`),
  KEY `idx_income` (`income`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='Personal income and expenditure statement';

Insert a record:

insert into person values(100,1000,1000);

The Java class parses the binlog and prints the change log on the console:

binlog[mysql-bin.000002:1946] , table[ds1,person] , eventType : INSERT
id : 100    update=true
income : 1000    update=true
expend : 1000    update=true

Modify the record with id=100:

update person set income=2000, expend=2000 where id=100;

The console prints the change log:

binlog[mysql-bin.000002:2252] , table[ds1,person] , eventType : UPDATE
-------> before
id : 100    update=false
income : 1000    update=false
expend : 1000    update=false
-------> after
id : 100    update=false
income : 2000    update=true
expend : 2000    update=true