This article comes from: Daniel lin07's blog "preliminary analysis of sub database and sub table"
Preface
Sub database and sub table is a very common optimization work in enterprise development data storage, but it has not been carefully understood before. Until recently, I contacted a Spark table to synchronize tens of millions of daily data to MySQL table, I had a preliminary understanding of sub database and sub table.
This paper is a record summary of the preliminary study of sub database and sub table.
Summary
When the amount of data is small, the data is mostly stored in the form of a single table. However, with the expansion of business volume and the increase of data storage, the operation performance of a single table will be greatly reduced, which will affect the normal business work.
At this time, we need to consider the use of sub database and sub table. Generally speaking, when the data volume of a single table reaches about 10 million (recommended by the company's DBA), we can consider the use of sub database and sub table.
Strategy of sub database and sub table
vertical partitioning
To put it simply, vertical segmentation is to segment multiple fields involved in a table into different tables or even libraries for storage. As shown in the figure below:
In fact, the three common database paradigm design is also a kind of vertical segmentation.
Another commonly used vertical segmentation is to segment the hot access field and the cold access field, so that the database can cache more rows with fewer fields, and then improve the performance.
horizontal partitioning
To put it simply, horizontal segmentation is to store the data stored in a table to different tables according to a certain strategy. As shown in the figure below:
Range
The first way of horizontal segmentation is Range, which is to distribute according to a certain Range.
For example, according to the time range, one table is stored for one month's data, or according to the self increasing sequence of user ID, one table is stored for user ID in the range of 000000 to 100000, and one table is stored for user ID in the range of 100001 to 200000, etc.
The advantage of Range distribution is that it is convenient for data expansion. The disadvantage is that it is easy to generate hot data issues.
Hash
The second way of horizontal segmentation is Hash, which is to determine the location of the table to be stored by taking the number of remaining tables - 1 after a Hash operation.
For example: Hash distribution according to user name. The user's name is Xiaoming, the hashcode is calculated, and 754703 is obtained. The number of sub tables is determined to be 8 in advance, and then the remaining 7 is taken to obtain 3, which is distributed to the data table with index 3.
According to the advantages of Hash distribution, data distribution is even and there will be no data hotspot. However, it is inconvenient to expand the capacity, and the Hash value of data needs to be recalculated.
MyBatis + - ShardingJDBC practice sub library sub table
ShardingJDBC is a subproject of ShardingSphere, which provides additional services in the JDBC layer of Java. Visible ShardingPhere official document.
Database preparation
Existing user information needs to be stored, with five fields: uid, name, mobile, credit_id, and create_time.
The current strategy for database and table splitting is:
- According to the uid level segmentation, if the last bit of uid is even, it will be divided into sharding0db database, otherwise it will be divided into sharding1db database.
- In each database, if the second from the bottom of uid is an even number, it will be divided into t Uuser table 0, otherwise it will be divided into t UUSER Table 1.
So the fields stored in each table are the same. The database Schema script of one table is as follows:
DROP TABLE IF EXISTS `t_user_0`; CREATE TABLE `t_user_0` ( `uid` int(6) NOT NULL, `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `mobile` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `credit_id` varchar(16) NOT NULL, `create_time` datetime(0) NULL, PRIMARY KEY (`uid`) USING BTREE ) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
After data tables have been established for sharding0db and sharding1db, the structure is as follows:
Maven dependence
Spring boot 2.0.3.release is used in this project. The following Maven dependencies are imported into the project:
<dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-test</artifactId> <scope>test</scope> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <dependency> <groupId>org.mybatis.spring.boot</groupId> <artifactId>mybatis-spring-boot-starter</artifactId> <version>1.3.2</version> </dependency> <dependency> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <optional>true</optional> </dependency> <dependency> <groupId>org.apache.shardingsphere</groupId> <artifactId>sharding-jdbc-spring-boot-starter</artifactId> <version>4.0.0-RC1</version> </dependency> <dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> </dependency> </dependencies>
configuration file
Configure in application.yml:
spring: shardingsphere: datasource: names: sharding0db,sharding1db sharding0db: type: com.zaxxer.hikari.HikariDataSource jdbc-url: jdbc:mysql://localhost:3306/sharding0db?useUnicode=true&useSSL=false&useAffectedRows=true&characterEncoding=utf8 driver-class-name: com.mysql.jdbc.Driver username: root password: sharding1db: type: com.zaxxer.hikari.HikariDataSource jdbc-url: jdbc:mysql://localhost:3306/sharding1db?useUnicode=true&useSSL=false&useAffectedRows=true&characterEncoding=utf8 driver-class-name: com.mysql.jdbc.Driver username: root password: sharding: # Strategy of sub database and sub table default-database-strategy: inline: # Piecewise column sharding-column: uid # The fragment expression, groovy language, here is to get rid of uid. If the result is 0, it will be divided into sharding0db, and if the result is 1, it will be divided into sharding1db. algorithm-expression: sharding$->{uid % 2}db tables: t_user: actual-data-nodes: sharding$->{0..1}db.t_user_$->{0..1} table-strategy: inline: sharding-column: uid # The expression of fragmentation takes the remainder from the penultimate of uid. If it is result 0, it will be divided into t Uuser U0, and if it is result 1, it will be divided into t Uuser U1. algorithm-expression: t_user_$->{uid.intdiv(10) % 2} # MyBatis configuration mybatis: # Mapper map file location mapper-locations: classpath:mapper/*.xml # Aliases of all classes under the package. Configure aliases to omit the package path when receiving and returning parameter types in the object mapping file type-aliases-package: com.daniellin.demosharding.entity
Code
To prepare the UserDAO file:
@Mapper @Repository public interface UserDAO { /** * Get all users * * @return All users */ List<User> queryList(); /** * Add new users * * @param user new user */ void insert(User user); }
To prepare the XML map for UserDAO:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd"> <mapper namespace="com.daniellin.demosharding.dao.UserDAO"> <resultMap id="UserMapping" type="User"> <id column="uid" property="uid" javaType="int"/> <result column="name" property="name" javaType="String"/> <result column="mobile" property="mobile" javaType="long"/> <result column="credit_id" property="creditId" javaType="long"/> <result column="create_time" property="createTime"/> </resultMap> <!-- Get all users --> <select id="queryList" resultMap="UserMapping"> SELECT * FROM t_user </select> <!-- Add new users --> <insert id="insert" keyProperty="uid" parameterType="User" > INSERT INTO t_user(uid, name, mobile, credit_id, create_time) VALUES (#{uid},#{name},#{mobile},#{creditId},#{createTime}) </insert> </mapper>
Prepare User entity:
@Data public class User { private Integer uid; private String name; private String mobile; private String creditId; private Date createTime; }
Write unit test to insert data. Here, test by randomly generating UIDs of 100 users:
@RunWith(SpringRunner.class) @SpringBootTest public class DemoShardingApplicationTests { @Autowired private UserDAO userDAO; @Test public void testInsert() { System.out.println(("----- sharding insert method test ------")); for (int i = 0; i < 100; i++) { User userData = new User(); userData.setUid(new Random().nextInt(999999)); userData.setName(UUID.randomUUID().toString().replaceAll("-", "")); userData.setCreditId("1234567890"); userData.setMobile("1234567890"); userData.setCreateTime(new Date()); userDAO.insert(userData); } } }
Operation result
Viewing the data results, you can see that the data has been successfully inserted into the specified database table.
The last bit is odd, and the last and last bit is even. It is inserted into sharding1db. T ﹣ user ﹣ 0:
If the last bit is even and the last and last bit is odd, it is inserted into sharding0db. T ﹣ user ﹣ 1:
Reference material
# | Article link | author |
---|---|---|
1 | sharding: a database, table and partition that can be read by everyone | Golden Dragon |
2 | A rare practice of dividing databases and tables | crossoverjie |
3 | advanced-java | doocs |