Synchronization of Elastic Search Index Objects Based on Database Structure

Posted by dopp on Fri, 30 Aug 2019 12:51:04 +0200

cause

Because of the characteristics of business, sometimes the data need complex query logic to get. Because the database itself does not support complex queries, so synchronizing data to capability middleware has become a common solution, and synchronizing to Elastic search is one of them.

Scheme Selection

The data source we use is MySql, which chooses to synchronize to ES. The idea is to reduce intrusiveness by subscribing to Binlog. There are also some solutions, such as: go-mysql-elasticsearch,canal,gravity

However, these schemes usually support only one-to-one indexing, that is, one table and one index, while the one-to-one and one-to-many relationships in data tables can not be reflected in the body of the index. This relationship is often needed in business. So we should consider gravity as a plug-in.

gravity is Mobay's open source data synchronization middleware. Currently, data sources support MySql and Mono, TiDB and PostgreSQL in development. Synchronization target support: MySQL/TiDB and Kafka, Elastic search is still in bate phase, and supports 6.

EsModel Synchronization Plug-in

Project address: gravity Welcome star:).

Synchronization strategy

  • Supports the synchronization of main table and one-to-one and one-to-one multi-word table, which can be synchronized into an index structure.
  • One-to-one relationships support synchronization in the form of tiles or sub-objects.
  • Support ES Versions: 6, 7

For example, there are four tables: student, student_class, student_detail, student_parent. Student is the student table (main table), student_class class class (one-to-one sub-table), student_detail student details (one-to-one sub-table), student_parent student parents (one-to-many sub-table). Stud_class is synchronized in the form of sub-objects, and student_detail is synchronized in the form of tiles.

The Sql script is as follows:


DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
  `id` bigint(20) NOT NULL,
  `name` varchar(64) NOT NULL DEFAULT '',
  `birthday` date NOT NULL DEFAULT '1970-01-01',
  `high` int(11) NOT NULL DEFAULT '0',
  `sex` tinyint(4) NOT NULL DEFAULT '1',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of student
-- ----------------------------
BEGIN;
INSERT INTO `student` VALUES (1, 'Zhang San', '2010-02-05', 156, 1, '2019-08-29 19:55:36');
INSERT INTO `student` VALUES (2, 'Li Si', '2010-03-05', 176, 2, '2019-08-29 19:55:36');
INSERT INTO `student` VALUES (3, 'Wang Ping', '2010-03-05', 176, 2, '2019-08-29 20:09:03');
COMMIT;

-- ----------------------------
-- Table structure for student_class
-- ----------------------------
DROP TABLE IF EXISTS `student_class`;
CREATE TABLE `student_class` (
  `id` bigint(20) NOT NULL,
  `student_id` bigint(20) NOT NULL DEFAULT '0',
  `name` varchar(255) NOT NULL DEFAULT '',
  `student_count` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of student_class
-- ----------------------------
BEGIN;
INSERT INTO `student_class` VALUES (1, 1, 'One(1)class', 32);
INSERT INTO `student_class` VALUES (2, 2, 'Two(2)class', 12);
COMMIT;

-- ----------------------------
-- Table structure for student_detail
-- ----------------------------
DROP TABLE IF EXISTS `student_detail`;
CREATE TABLE `student_detail` (
  `id` bigint(20) NOT NULL,
  `student_id` bigint(20) NOT NULL DEFAULT '0',
  `introduce` varchar(255) NOT NULL DEFAULT '',
  `mobile` varchar(64) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of student_detail
-- ----------------------------
BEGIN;
INSERT INTO `student_detail` VALUES (1, 1, 'Introduction by Zhang San', '18888888888');
INSERT INTO `student_detail` VALUES (2, 2, 'Introduction of Li Si', '13333333333');
COMMIT;

-- ----------------------------
-- Table structure for student_parent
-- ----------------------------
DROP TABLE IF EXISTS `student_parent`;
CREATE TABLE `student_parent` (
  `id` bigint(20) NOT NULL,
  `student_id` bigint(20) NOT NULL DEFAULT '0',
  `name` varchar(255) NOT NULL DEFAULT '',
  `birthday` date NOT NULL DEFAULT '1970-01-01',
  `sex` tinyint(11) NOT NULL DEFAULT '1',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of student_parent
-- ----------------------------
BEGIN;
INSERT INTO `student_parent` VALUES (1, 1, 'Zhang San's father', '1980-02-02', 1, '2019-08-29 20:00:58');
INSERT INTO `student_parent` VALUES (2, 1, 'Zhang San's Mother', '1982-07-07', 2, '2019-08-29 20:00:58');
INSERT INTO `student_parent` VALUES (3, 2, 'Four Fathers of Li', '1979-03-03', 1, '2019-08-29 20:00:58');
INSERT INTO `student_parent` VALUES (4, 2, 'Mother Li Si', '1981-06-06', 2, '2019-08-29 20:00:58');
COMMIT;

The synchronization configuration is as follows:

# name must be filled in
name = "mysql2esmodelDemo"

# Internal library name for loci, heartbeat, etc., default to _gravity
internal-db-name = "_gravity"

#
# Definition of the Input plug-in, where mysql is used
#
[input]
type = "mysql"
mode = "replication"
[input.config.source]
host = "192.168.1.148"
username = "root"
password = "mysqldev"
port = 3306
max-idle = 10
max-open = 10


[output]
type = "esmodel"

[output.config]
# Ignore 400 (bad request) return
# Elastic search returns 400 errors when the index name is not standardized and parsing errors occur
# The default is false, which means that an exception will be thrown when a failure occurs and must be handled manually. These requests are ignored when set to true
ignore-bad-request = true

#
# Target-side Elastic search configuration
# - Required selection
#
[output.config.server]
# Connected Elastic search address, required
urls = ["http://192.168.1.152:9200"]
# Whether to sniff nodes, default to false
sniff = false
# Timeout, default 1000ms
timeout = 500

#
# Target End Authentication Configuration
# - Optional
#
[output.config.server.auth]
username = ""
password = ""


[[output.config.routes]]
match-schema = "test"
# Master table
match-table = "student"
#Index Name
index-name="student_index"
#Type name, es7 item is invalid
type-name="student"
#Number of fragments
shards-num=1
#Number of copies
replicas-num=0
#Number of failed retries
retry-count=3
#Included columns, default all
include-column = []
#Excluded columns, not by default
exclude-column = []

# List escalation strategy
[output.config.routes.convert-column]
name = "studentName"


[[output.config.routes.one-one]]
match-schema = "test"
match-table = "student_detail"
#foreign key column
fk-column = "student_id"
#Included columns, default all
include-column = []
#Excluded columns, not by default
exclude-column = []
# Schema, 1: Subobjects, 2 Index Flattening
mode = 2
# Attribute object name, mode 1 valid, default to source table name hump structure
property-name = "studentDetail"
# Attribute prefix, mode 2 valid, default to source table name hump structure
property-pre = "sd_"

[output.config.routes.one-one.convert-column]
introduce = "introduceInfo"

[[output.config.routes.one-one]]
match-schema = "test"
match-table = "student_class"
#foreign key column
fk-column = "student_id"
#Included columns, default all
include-column = []
#Excluded columns, not by default
exclude-column = []
# Schema, 1: Subobjects, 2 Index Flattening
mode = 1
# Attribute object name, mode 1 valid, default to source table name hump structure
property-name = "studentClass"
# Attribute prefix, mode 2 valid, default to source table name hump structure
property-pre = "sc_"

[output.config.routes.one-one.convert-column]
name = "className"

[[output.config.routes.one-more]]
match-schema = "test"
match-table = "student_parent"
#foreign key column
fk-column = "student_id"
#Included columns, default all
include-column = []
#Excluded columns, not by default
exclude-column = []
# Property object name, default to source table name hump structure
property-name = "studentParent"

[output.config.routes.one-more.convert-column]
name = "parentName"

Compile to the project root directory make and execute the synchronization command

./bin/gravity -config  ./docs/2.0/example-mysql2esmodel.toml

The index structure after synchronization is:

{
    "state": "open",
    "settings": {
        "index": {
            "creation_date": "1567160065596",
            "number_of_shards": "1",
            "number_of_replicas": "0",
            "uuid": "noe_V-RdTr6QaFDy4fPRjA",
            "version": {
                "created": "7030199"
            },
            "provided_name": "student_index"
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "birthday": {
                    "type": "date"
                },
                "studentParent": {
                    "type": "nested",
                    "properties": {
                        "birthday": {
                            "type": "date"
                        },
                        "parentName": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            }
                        },
                        "create_time": {
                            "type": "date"
                        },
                        "sex": {
                            "type": "long"
                        },
                        "student_id": {
                            "type": "long"
                        },
                        "id": {
                            "type": "long"
                        }
                    }
                },
                "high": {
                    "type": "long"
                },
                "create_time": {
                    "type": "date"
                },
                "sex": {
                    "type": "long"
                },
                "studentName": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    }
                },
                "sd_student_id": {
                    "type": "long"
                },
                "introduceInfo": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    }
                },
                "sd_id": {
                    "type": "long"
                },
                "id": {
                    "type": "long"
                },
                "sd_mobile": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    }
                },
                "studentClass": {
                    "properties": {
                        "sc_id": {
                            "type": "long"
                        },
                        "className": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            }
                        },
                        "sc_student_count": {
                            "type": "long"
                        },
                        "sc_student_id": {
                            "type": "long"
                        }
                    }
                }
            }
        }
    }
}

The data samples are as follows:

{
    "_index": "student_index",
    "_type": "_doc",
    "_id": "2",
    "_version": 5,
    "_score": 1,
    "_source": {
        "studentClass": {
            "className": "Two(2)class",
            "sc_id": 2,
            "sc_student_count": 12,
            "sc_student_id": 2
        },
        "sd_student_id": 2,
        "introduceInfo": "Introduction of Li Si",
        "sd_id": 2,
        "sd_mobile": "13333333333",
        "studentParent": [
            {
                "birthday": "1981-06-06T00:00:00+08:00",
                "parentName": "Mother Li Si",
                "create_time": "2019-08-29T20:00:58+08:00",
                "sex": 2,
                "student_id": 2,
                "id": 4
            },
            {
                "birthday": "1979-03-03T00:00:00+08:00",
                "parentName": "Four Fathers of Li",
                "create_time": "2019-08-29T20:00:58+08:00",
                "sex": 1,
                "student_id": 2,
                "id": 3
            }
        ],
        "birthday": "2010-03-05T00:00:00+08:00",
        "high": 176,
        "create_time": "2019-08-29T19:55:36+08:00",
        "sex": 2,
        "studentName": "Li Si",
        "id": 2
    }
}

Topics: Programming MySQL Attribute Database ElasticSearch