A preliminary study of sub database and sub table

Posted by Mehdi on Mon, 28 Oct 2019 07:30:03 +0100

This article comes from: Daniel lin07's blog "preliminary analysis of sub database and sub table"

Preface

Sub database and sub table is a very common optimization work in enterprise development data storage, but it has not been carefully understood before. Until recently, I contacted a Spark table to synchronize tens of millions of daily data to MySQL table, I had a preliminary understanding of sub database and sub table.
This paper is a record summary of the preliminary study of sub database and sub table.

Summary

When the amount of data is small, the data is mostly stored in the form of a single table. However, with the expansion of business volume and the increase of data storage, the operation performance of a single table will be greatly reduced, which will affect the normal business work.
At this time, we need to consider the use of sub database and sub table. Generally speaking, when the data volume of a single table reaches about 10 million (recommended by the company's DBA), we can consider the use of sub database and sub table.

Strategy of sub database and sub table

vertical partitioning

To put it simply, vertical segmentation is to segment multiple fields involved in a table into different tables or even libraries for storage. As shown in the figure below:

In fact, the three common database paradigm design is also a kind of vertical segmentation.
Another commonly used vertical segmentation is to segment the hot access field and the cold access field, so that the database can cache more rows with fewer fields, and then improve the performance.

horizontal partitioning

To put it simply, horizontal segmentation is to store the data stored in a table to different tables according to a certain strategy. As shown in the figure below:

Range

The first way of horizontal segmentation is Range, which is to distribute according to a certain Range.
For example, according to the time range, one table is stored for one month's data, or according to the self increasing sequence of user ID, one table is stored for user ID in the range of 000000 to 100000, and one table is stored for user ID in the range of 100001 to 200000, etc.
The advantage of Range distribution is that it is convenient for data expansion. The disadvantage is that it is easy to generate hot data issues.

Hash

The second way of horizontal segmentation is Hash, which is to determine the location of the table to be stored by taking the number of remaining tables - 1 after a Hash operation.
For example: Hash distribution according to user name. The user's name is Xiaoming, the hashcode is calculated, and 754703 is obtained. The number of sub tables is determined to be 8 in advance, and then the remaining 7 is taken to obtain 3, which is distributed to the data table with index 3.
According to the advantages of Hash distribution, data distribution is even and there will be no data hotspot. However, it is inconvenient to expand the capacity, and the Hash value of data needs to be recalculated.

MyBatis + - ShardingJDBC practice sub library sub table

ShardingJDBC is a subproject of ShardingSphere, which provides additional services in the JDBC layer of Java. Visible ShardingPhere official document.

Database preparation

Existing user information needs to be stored, with five fields: uid, name, mobile, credit_id, and create_time.
The current strategy for database and table splitting is:

According to the uid level segmentation, if the last bit of uid is even, it will be divided into sharding0db database, otherwise it will be divided into sharding1db database.
In each database, if the second from the bottom of uid is an even number, it will be divided into t Uuser table 0, otherwise it will be divided into t UUSER Table 1.

So the fields stored in each table are the same. The database Schema script of one table is as follows:

DROP TABLE IF EXISTS `t_user_0`;
CREATE TABLE `t_user_0`  (
  `uid` int(6) NOT NULL,
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `mobile` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `credit_id` varchar(16) NOT NULL,
  `create_time` datetime(0) NULL,
  PRIMARY KEY (`uid`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

After data tables have been established for sharding0db and sharding1db, the structure is as follows:

Maven dependence

Spring boot 2.0.3.release is used in this project. The following Maven dependencies are imported into the project:

<dependencies>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter</artifactId>
   </dependency>

   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-test</artifactId>
      <scope>test</scope>
   </dependency>

   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
   </dependency>

   <dependency>
      <groupId>org.mybatis.spring.boot</groupId>
      <artifactId>mybatis-spring-boot-starter</artifactId>
      <version>1.3.2</version>
   </dependency>

   <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <optional>true</optional>
   </dependency>

   <dependency>
      <groupId>org.apache.shardingsphere</groupId>
      <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
      <version>4.0.0-RC1</version>
   </dependency>

   <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
   </dependency>
</dependencies>

configuration file

Configure in application.yml:

spring:

  shardingsphere:
    datasource:
      names: sharding0db,sharding1db
      sharding0db:
        type: com.zaxxer.hikari.HikariDataSource
        jdbc-url: jdbc:mysql://localhost:3306/sharding0db?useUnicode=true&useSSL=false&useAffectedRows=true&characterEncoding=utf8
        driver-class-name: com.mysql.jdbc.Driver
        username: root
        password:
      sharding1db:
        type: com.zaxxer.hikari.HikariDataSource
        jdbc-url: jdbc:mysql://localhost:3306/sharding1db?useUnicode=true&useSSL=false&useAffectedRows=true&characterEncoding=utf8
        driver-class-name: com.mysql.jdbc.Driver
        username: root
        password:

    sharding:
      # Strategy of sub database and sub table
      default-database-strategy:
        inline:
          # Piecewise column
          sharding-column: uid
          # The fragment expression, groovy language, here is to get rid of uid. If the result is 0, it will be divided into sharding0db, and if the result is 1, it will be divided into sharding1db.
          algorithm-expression: sharding$->{uid % 2}db
      tables:
        t_user:
          actual-data-nodes: sharding$->{0..1}db.t_user_$->{0..1}
          table-strategy:
            inline:
              sharding-column: uid
              # The expression of fragmentation takes the remainder from the penultimate of uid. If it is result 0, it will be divided into t Uuser U0, and if it is result 1, it will be divided into t Uuser U1.
              algorithm-expression: t_user_$->{uid.intdiv(10) % 2}

# MyBatis configuration
mybatis:
  # Mapper map file location
  mapper-locations: classpath:mapper/*.xml
  # Aliases of all classes under the package. Configure aliases to omit the package path when receiving and returning parameter types in the object mapping file
  type-aliases-package: com.daniellin.demosharding.entity

Code

To prepare the UserDAO file:

@Mapper
@Repository
public interface UserDAO {

    /**
     * Get all users
     *
     * @return All users
     */
    List<User> queryList();

    /**
     * Add new users
     *
     * @param user new user
     */
    void insert(User user);
}

To prepare the XML map for UserDAO:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper
        PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-mapper.dtd">

<mapper namespace="com.daniellin.demosharding.dao.UserDAO">

    <resultMap id="UserMapping" type="User">
        <id column="uid" property="uid" javaType="int"/>
        <result column="name" property="name" javaType="String"/>
        <result column="mobile" property="mobile" javaType="long"/>
        <result column="credit_id" property="creditId" javaType="long"/>
        <result column="create_time" property="createTime"/>
    </resultMap>

    <!-- Get all users -->
    <select id="queryList" resultMap="UserMapping">
        SELECT * FROM t_user
    </select>

    <!-- Add new users -->
    <insert id="insert" keyProperty="uid" parameterType="User" >
        INSERT INTO t_user(uid, name, mobile, credit_id, create_time)
        VALUES (#{uid},#{name},#{mobile},#{creditId},#{createTime})
    </insert>

</mapper>

Prepare User entity:

@Data
public class User {

    private Integer uid;

    private String name;

    private String mobile;

    private String creditId;

    private Date createTime;
}

Write unit test to insert data. Here, test by randomly generating UIDs of 100 users:

@RunWith(SpringRunner.class)
@SpringBootTest
public class DemoShardingApplicationTests {

    @Autowired
    private UserDAO userDAO;

    @Test
    public void testInsert() {
        System.out.println(("----- sharding insert method test ------"));
        for (int i = 0; i < 100; i++) {
            User userData = new User();
            userData.setUid(new Random().nextInt(999999));
            userData.setName(UUID.randomUUID().toString().replaceAll("-", ""));
            userData.setCreditId("1234567890");
            userData.setMobile("1234567890");
            userData.setCreateTime(new Date());
            userDAO.insert(userData);
        }
    }
}

Operation result

Viewing the data results, you can see that the data has been successfully inserted into the specified database table.
The last bit is odd, and the last and last bit is even. It is inserted into sharding1db. T ﹣ user ﹣ 0:

If the last bit is even and the last and last bit is odd, it is inserted into sharding0db. T ﹣ user ﹣ 1:

Reference material

#	Article link	author
1	sharding: a database, table and partition that can be read by everyone	Golden Dragon
2	A rare practice of dividing databases and tables	crossoverjie
3	advanced-java	doocs

Topics: Java Database Mybatis JDBC Spring

Programmer Think