Incremental synchronization of data from MySQL 8.0 to Elasticsearch 7 with Canal (local Windows environment)

Posted by sgiandhu on Fri, 29 Oct 2021 09:50:25 +0200

As of 2021-10-29, the latest Canal release is 1.1.6.

This tutorial uses v1.1.5, which supports MySQL 8.0 and Elasticsearch 7.x.

If you are on MySQL 5.x and Elasticsearch 6.x, v1.1.4 is recommended instead.

The setup steps are as follows.

1. Download and install MySQL and Elasticsearch

The installation and configuration of Elasticsearch are covered in detail in a separate blog post of mine.

So are the installation and configuration of MySQL 8.0.

Note that MySQL 8.0 enables binlog by default after installation. Verify that the binlog format is ROW:

mysql> show variables like 'binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set, 1 warning (0.00 sec)
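
If binlog is disabled or the format is not ROW, a minimal my.ini sketch looks like the following (server-id=1 is an example value; restart MySQL after editing):

[mysqld]
# write binary logs with row-level events (the MySQL 8.0 defaults)
log-bin=mysql-bin
binlog-format=ROW
server-id=1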

2. Download and install Canal

This is the download address

https://github.com/alibaba/canal/releases

When downloading v1.1.5, make sure to grab the release packages (canal.deployer and canal.adapter). I initially downloaded an alpha build, and the ES jar under its plugin directory could not be used.

The download may be slow, so allow some time.

For Windows, simply extract the two archives; that completes the download and installation.

3. Prepare the ES index and the MySQL table structure

First, prepare the required data structures.

Create the index structure in ES:

PUT /rabbit-statis/
{
  "mappings": {
    "properties": {
      "project_name": {
        "type": "text"
      },
      "class_name": {
        "type": "keyword"
      },
      "show_num": {
        "type": "integer"
      },
      "refactor_path": {
        "type": "text"
      }
    }
  }
}

Copy the above request into Kibana's Dev Tools console and run it.
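
To double-check that the index exists, its mapping can be fetched from the same console:

GET /rabbit-statis/_mapping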

Next, prepare the MySQL table structure.

First create a database named rabbit-statis2 (a minimal statement is shown below), then run the SQL script that follows.
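
A minimal sketch of that first step, with the charset chosen to match the table script below:

CREATE DATABASE `rabbit-statis2` CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;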

-- ----------------------------
-- Table structure for statis
-- ----------------------------
DROP TABLE IF EXISTS `statis`;
CREATE TABLE `statis`  (
  `id` int NOT NULL AUTO_INCREMENT,
  `project_name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL,
  `class_name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  `show_num` int NULL DEFAULT NULL,
  `refactor_path` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;

SET FOREIGN_KEY_CHECKS = 1;

4. Configure canal-deployer

The point of configuring Canal is to let it listen to the database's binlog, parse the entries, and synchronize the changes to ES.

Next, configure the deployer.

First open the file /deployer/conf/canal.properties and make the following modification:

# tcp bind ip
canal.ip =127.0.0.1

Then open the configuration file /deployer/conf/example/instance.properties and modify the following items:

# Canal acts as a MySQL replica, so its slaveId must not conflict with any other replica's slaveId
canal.instance.mysql.slaveId=1234

# address, username and password of the source database
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
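
The canal/canal account referenced above must exist in MySQL with replication privileges. Here is a sketch of the grants based on Canal's QuickStart; the final ALTER USER line is a common MySQL 8.0 workaround, only needed if the deployer fails to authenticate with the default caching_sha2_password plugin:

CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
-- only if authentication fails under MySQL 8.0:
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'canal';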

Then double-click /deployer/bin/startup.bat (Windows) to start Canal.

How do you confirm that the canal deployer started successfully?

Check the /deployer/logs/canal/canal.log file:

2021-10-29 12:33:25.300 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[127.0.0.1(127.0.0.1):11111]
2021-10-29 12:33:26.977 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......

Then check the /deployer/logs/example/example.log file:

2021-10-29 11:11:40.568 [main] INFO  c.a.otter.canal.instance.spring.CanalInstanceWithSpring - start CannalInstance for 1-example 
2021-10-29 11:11:40.576 [main] WARN  c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
2021-10-29 11:11:40.576 [main] WARN  c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter : ^mysql\.slave_.*$
2021-10-29 11:11:40.626 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2021-10-29 11:11:40.638 [main] INFO  c.a.otter.canal.instance.core.AbstractCanalInstance - start successful....

This indicates that the startup is successful.
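
Optionally, confirm from a command prompt that the server is listening on its default TCP port 11111 (the port shown in the log above):

netstat -ano | findstr 11111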

5. Configure canal-adapter

First open the file /adapter/conf/application.yml. After modification it looks like this:

server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
    # kafka consumer
    kafka.bootstrap.servers: 127.0.0.1:9092
    kafka.enable.auto.commit: false
    kafka.auto.commit.interval.ms: 1000
    kafka.auto.offset.reset: latest
    kafka.request.timeout.ms: 40000
    kafka.session.timeout.ms: 30000
    kafka.isolation.level: read_committed
    kafka.max.poll.records: 1000
    # rocketMQ consumer
    rocketmq.namespace:
    rocketmq.namesrv.addr: 127.0.0.1:9876
    rocketmq.batch.size: 1000
    rocketmq.enable.message.trace: false
    rocketmq.customized.trace.topic:
    rocketmq.access.channel:
    rocketmq.subscribe.filter:
    # rabbitMQ consumer
    rabbitmq.host:
    rabbitmq.virtual.host:
    rabbitmq.username:
    rabbitmq.password:
    rabbitmq.resource.ownerId:

  srcDataSources:
    defaultDS:
      url: jdbc:mysql://localhost:3306/rabbit-statis2?serverTimezone=GMT%2B8
      username: canal
      password: canal
  canalAdapters:
  - instance: example # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
      - name: es7
        key: exampleKey
        hosts: 127.0.0.1:9300
        properties:
          mode: transport
          # security.auth: test:123456 #  only used for rest mode
          cluster.name: my-application
#        - name: kudu
#          key: kudu
#          properties:
#            kudu.master.address: 127.0.0.1 # ',' split multi address
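
The cluster.name above must match your ES cluster name (whatever elasticsearch.yml sets; my-application is just my local value). If unsure, it can be read from Kibana's console, where the response includes a cluster_name field:

GET /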

Then open the file /adapter/conf/es7/mytest_user.yml. After modification it looks like this:

dataSourceKey: defaultDS
outerAdapterKey: exampleKey     # must match the key of the es7 adapter in application.yml
destination: example
groupId: g1
esMapping:
  _index: rabbit-statis
  _id: _id
  upsert: true
  sql: "select id as _id,project_name,class_name,show_num,refactor_path from statis"
  commitBatch: 3000
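
Finally, to verify the whole pipeline, start the adapter via /adapter/bin/startup.bat, insert a row into MySQL, and search the index in Kibana. The row values below are just sample data:

INSERT INTO statis (project_name, class_name, show_num, refactor_path)
VALUES ('demo-project', 'DemoClass', 1, 'src/demo/DemoClass.java');

GET /rabbit-statis/_search

If synchronization works, the hits contain the new row, with _id equal to the MySQL id.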

Topics: Windows MySQL ElasticSearch