Use the Canal ClientAdapter to Synchronize MySQL DDL and DML to PostgreSQL

Posted by jalperin on Fri, 24 Sep 2021 00:36:17 +0200

1. Description

The description of each component is shown in the table below:

| Component | Version | Servers | Installation tutorial | Role |
| --- | --- | --- | --- | --- |
| canal | 1.1.5 | canal1, canal2, canal3 | canal HA installation + MySQL multi-node automatic switching | |
| zookeeper | 3.6.3 | canal1, canal2, canal3 | zookeeper fully distributed installation | |
| mysql | 8.0.25 | canal1, canal2 | MySQL master-master architecture construction | Source database |
| mysql | 8.0.25 | canal3 | Install MySQL 8.0.25 on CentOS 7 | ClientAdapter remote configuration database |
| postgresql | 13 | canal3 | Install and configure PostgreSQL 13 on CentOS 7 | Target database |

In PostgreSQL, create the database db1 and the table db1.public.tb1_1.

2. canal.deploy configuration

  1. canal.properties configuration
canal.serverMode = tcp
  2. instance configuration

example_rdb_db1/instance.properties is configured to synchronize only the data of the db1 database, as follows:

#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal2 is 1262 and canal3 is 1263
canal.instance.mysql.slaveId=1261

# enable gtid use true/false
canal.instance.gtidon=false

# position info
canal.instance.master.address=canal2:3306
canal.instance.master.journal.name=mysql-bin.000026
canal.instance.master.position=4
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

canal.instance.standby.address = canal1:3306
#canal.instance.standby.journal.name = 
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=Canal_123
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
canal.instance.filter.regex=db1\\.tb\\d_\\d
# table black regex
canal.instance.filter.black.regex=mysql\\.slave_.*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#################################################


# Heartbeat check needs to be turned on
canal.instance.detecting.enable = true 
# Heartbeat check sql
canal.instance.detecting.sql = select 1
# Heartbeat check frequency
canal.instance.detecting.interval.time = 3 
# Threshold of consecutive heartbeat check failures; once exceeded, the mysql link switches (e.g. to the standby) and binlog consumption continues there
canal.instance.detecting.retry.threshold = 3  
# Whether to enable master/standby switching after the heartbeat check exceeds the threshold of failure times
canal.instance.detecting.heartbeatHaEnable = true 
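The table filter above is a Java-style regex (backslashes are doubled in the properties file). Which `schema.table` names pass the whitelist and blacklist can be sanity-checked offline; a minimal shell sketch, with `\d` rewritten as `[0-9]` for grep's POSIX ERE:

```shell
# ERE equivalents of canal.instance.filter.regex and filter.black.regex
white='^db1\.tb[0-9]_[0-9]$'
black='^mysql\.slave_'

passes() {  # passes <schema.table>: matches the whitelist and not the blacklist
  echo "$1" | grep -Eq "$white" && ! echo "$1" | grep -Eq "$black"
}

passes "db1.tb1_1" && echo "db1.tb1_1: synchronized"
passes "db1.users" || echo "db1.users: filtered out"
passes "mysql.slave_relay_log_info" || echo "mysql.slave_relay_log_info: filtered out"
```

With the regex from this instance, db1.tb1_1 and db1.tb1_2 are synchronized while everything else is filtered out.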

3. Get ClientAdapter

The canal.adapter-1.1.5.tar.gz downloaded from GitHub has PostgreSQL compatibility problems, so build it from source instead (refer here), then upload and unzip:

[root@canal1 ~]# 
[root@canal1 ~]# mkdir canal.adapter-1.1.5
[root@canal1 ~]# 
[root@canal1 ~]# tar -zxvf canal.adapter-1.1.5.tar.gz -C canal.adapter-1.1.5
[root@canal1 ~]# 

4. ClientAdapter configuration

4.1 conf/application.yml configuration

server:
  port: 8081
logging:
  level:
    com.alibaba.otter.canal.client.adapter.rdb: DEBUG
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts: canal1:2181,canal2:2181,canal3:2181
  batchSize: 500
  syncBatchSize: 1000
  retries: -1    # -1 means infinite retry
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    # canal.tcp.server.host: 127.0.0.1:11111
    canal.tcp.zookeeper.hosts: canal1:2181,canal2:2181,canal3:2181
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
    # kafka consumer
    kafka.bootstrap.servers: 127.0.0.1:9092
    kafka.enable.auto.commit: false
    kafka.auto.commit.interval.ms: 1000
    kafka.auto.offset.reset: latest
    kafka.request.timeout.ms: 40000
    kafka.session.timeout.ms: 30000
    kafka.isolation.level: read_committed
    kafka.max.poll.records: 1000
    # rocketMQ consumer
    rocketmq.namespace:
    rocketmq.namesrv.addr: 127.0.0.1:9876
    rocketmq.batch.size: 1000
    rocketmq.enable.message.trace: false
    rocketmq.customized.trace.topic:
    rocketmq.access.channel:
    rocketmq.subscribe.filter:
    # rabbitMQ consumer
    rabbitmq.host:
    rabbitmq.virtual.host:
    rabbitmq.username:
    rabbitmq.password:
    rabbitmq.resource.ownerId:

  srcDataSources:
    defaultDS:
      url: jdbc:mysql://canal1:3306/default_test?useUnicode=true    # the default_test database holds no data
      username: root
      password: Root_123
  canalAdapters:
  - instance: example_rdb_db1   # canal instance Name or mq topic name
    groups:    # An instance can be consumed by multiple groups in parallel
    - groupId: g1
      outerAdapters:
      - name: logger
#      - name: rdb
#        key: mysql1
#        properties:
#          jdbc.driverClassName: com.mysql.jdbc.Driver
#          jdbc.url: jdbc:mysql://127.0.0.1:3306/mytest2?useUnicode=true
#          jdbc.username: root
#          jdbc.password: 121212
#      - name: rdb
#        key: oracle1
#        properties:
#          jdbc.driverClassName: oracle.jdbc.OracleDriver
#          jdbc.url: jdbc:oracle:thin:@localhost:49161:XE
#          jdbc.username: mytest
#          jdbc.password: m121212
      - name: rdb
        key: postgres1      # Unique value, corresponding to outerAdapterKey of table mapping configuration
        properties:
          jdbc.driverClassName: org.postgresql.Driver
          jdbc.url: jdbc:postgresql://canal3:5432/db1    
          jdbc.username: postgres
          jdbc.password: postgres123
          threads: 2
          commitSize: 3000
#      - name: hbase
#        properties:
#          hbase.zookeeper.quorum: 127.0.0.1
#          hbase.zookeeper.property.clientPort: 2181
#          zookeeper.znode.parent: /hbase
#      - name: es
#        hosts: 127.0.0.1:9300 # 127.0.0.1:9200 for rest mode
#        properties:
#          mode: transport # or rest
#          # security.auth: test:123456 #  only used for rest mode
#          cluster.name: elasticsearch
#        - name: kudu
#          key: kudu
#          properties:
#            kudu.master.address: 127.0.0.1 # ',' split multi address

4.2 conf/rdb configuration

The adapter automatically loads every table mapping configuration file ending in .yml under conf/rdb.

  1. db1.yml
#dataSourceKey: defaultDS
#destination: example
#groupId: g1
#outerAdapterKey: mysql1
#concurrent: true
#dbMapping:
#  database: mytest
#  table: user
#  targetTable: mytest2.user
#  targetPk:
#    id: id
#  mapAll: true
#  targetColumns:
#    id:
#    name:
#    role_id:
#    c_time:
#    test1:
#  etlCondition: "where c_time>={}"
#  commitBatch: 3000 # Size of batch submission


## Mirror schema synchronize config
dataSourceKey: defaultDS
destination: example_rdb_db1
groupId: g1
outerAdapterKey: postgres1    # Corresponding to the key of outAdapters in application.yml
concurrent: true              # For parallel synchronized tables, the primary key must not be changed and cannot be a foreign key of other synchronized tables
dbMapping:
  mirrorDb: true
  database: db1        # Source and target database names
  2. tb1_1.yml
    Manually run the curl ETL command to synchronize the full data of the table in this file:
dataSourceKey: defaultDS
destination: example_rdb_db1
groupId: g1
outerAdapterKey: postgres1
concurrent: true
dbMapping:
  database: db1
  table: tb1_1
  targetTable: tb1_1
  targetPk:
    idA: idA
  mapAll: true
  targetColumns:
    id:
    name:
    role_id:
    c_time:
    test1:
  etlCondition: "where c_time>={}"
  commitBatch: 3000 # Size of batch submission


## Mirror schema synchronize config
#dataSourceKey: defaultDS
#destination: example_rdb_db1
#groupId: g1
#outerAdapterKey: postgres1    # Corresponding to the key of outAdapters in application.yml
#concurrent: true              # For parallel synchronized tables, the primary key must not be changed and cannot be a foreign key of other synchronized tables
#dbMapping:
#  mirrorDb: true
#  database: db1        # Source and target database names
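The `{}` in etlCondition is a placeholder that the adapter fills from the parameters of the ETL request (e.g. `-d "params=2021-09-01 00:00:00"` on the curl command). The substitution itself is plain string replacement; a conceptual sketch (the example timestamp is illustrative):

```shell
# etlCondition from tb1_1.yml and a hypothetical params value from the request
cond='where c_time>={}'
params="2021-09-01 00:00:00"
# replace the {} placeholder with the params value, as the adapter does
sql_where=$(printf '%s\n' "$cond" | sed "s/{}/$params/")
echo "$sql_where"
```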

5. Start and stop

[root@canal1 ~]#
[root@canal1 ~]# /root/canal.adapter-1.1.5/bin/restart.sh
[root@canal1 ~]# 
[root@canal1 ~]# /root/canal.adapter-1.1.5/bin/stop.sh
[root@canal1 ~]#

6. REST interface command

  • Query destinations:
[root@canal1 ~]# 
[root@canal1 ~]# curl http://canal1:8081/destinations
[{"destination":"example_rdb_db1","status":"on"}]
[root@canal1 ~]# 
  • Query the switch status of a destination
[root@canal1 ~]# 
[root@canal1 ~]# curl http://canal1:8081/syncSwitch/example_rdb_db1
{"stauts":"on"}
[root@canal1 ~]#
  • Manual operation of full ETL
[root@canal1 ~]# 
[root@canal1 ~]# curl http://canal1:8081/etl/rdb/postgres1/tb1_1.yml -X POST -d ""
{"succeeded":true,"resultMessage":"Import RDB Data: 3"}
[root@canal1 ~]#
  • Client incremental data synchronization switch
[root@canal1 ~]# 
[root@canal1 ~]# curl http://canal1:8081/syncSwitch/example_rdb_db1/off -X PUT
{"code":20000,"message":"example: example_rdb_db1 Closing synchronization succeeded"}
[root@canal1 ~]# 
  • View the total amount of data in the target table
[root@canal1 ~]# 
[root@canal1 ~]# curl http://canal1:8081/count/rdb/postgres1/tb1_1.yml
{"count":3,"targetTable":"tb1_1"}
[root@canal1 ~]#

7. Data synchronization mode

  1. Incremental synchronization
  2. Move tb1_1.yml into the conf/rdb directory
  3. Manually run the curl ETL full synchronization command
  4. Move tb1_1.yml out of the conf/rdb directory
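The move-in / full-ETL / move-out steps above can be scripted. A sketch, with illustrative paths: the staging directory is an assumption, the curl step needs a live adapter and is left as a comment, and temp directories stand in for the real ones so the file moves are exercised:

```shell
RDB_DIR=$(mktemp -d)   # stands in for /root/canal.adapter-1.1.5/conf/rdb
STAGE=$(mktemp -d)     # assumed staging dir where tb1_1.yml is parked
printf 'database: db1\ntable: tb1_1\n' > "$STAGE/tb1_1.yml"

mv "$STAGE/tb1_1.yml" "$RDB_DIR/"    # step 2: let the adapter load the mapping
# step 3: trigger the full ETL against the live adapter, e.g.
#   curl http://canal1:8081/etl/rdb/postgres1/tb1_1.yml -X POST -d ""
mv "$RDB_DIR/tb1_1.yml" "$STAGE/"    # step 4: remove the mapping again
ls "$STAGE"
```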

8. Legacy BUG

  • Running the adapter's bin/stop.sh deletes the example_rdb_db1/1001/cursor node in zookeeper. After restarting, the example_rdb_db1/1001/cursor node is missing; once a row is inserted into the MySQL source database, the example_rdb_db1/1001/cursor node holds data again

  • Initially a MySQL field was scoreA; the table was later dropped and recreated with the field renamed ageA, then dropped and recreated again as scoreA. The corresponding field in the PostgreSQL table is also scoreA. During incremental synchronization, the following error is reported:

c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Target column: ageA not matched
java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Target column: ageA not matched
  • After manually running the curl ETL full synchronization command, MySQL's tb1_1 and tb1_2 hold 5 and 1 rows respectively, but the synchronized row counts are 3 and 3

  • Saving the conf/application.yml configuration to the MySQL remote configuration database does not complete

  • Because of these bugs we abandoned the canal ClientAdapter approach and implemented the synchronization manually, so this article is for reference only

Topics: MySQL PostgreSQL canal