Distributed ID - snowflake algorithm

Posted by rallokkcaz on Sat, 18 Jan 2020 10:12:11 +0100

Background

As business volume grows, databases are split more and more finely, and sharding (splitting into sub-databases and sub-tables) is increasingly put into practice. Primary keys generated by auto-increment columns or sequences no longer meet the need, so distributed ID generation has emerged. Generally speaking, the generation rules are more complex, which lowers the probability of duplicates.

1, Snowflake algorithm

The original implementation of the snowflake algorithm was written in Scala. The algorithm is used to generate distributed IDs (purely numeric and time-ordered), order numbers, and the like.

Auto-increment ID: not suitable for data-sensitive scenarios or for distributed scenarios.
GUID: a meaningless string. As the amount of data grows, access becomes slow, and it is not suitable for sorting.

Algorithm description:

  • The highest bit is the sign bit. It is always 0 and is not used.
  • A 41-bit timestamp with millisecond precision. 41 bits last about 69 years. The time bits also make the IDs sortable by time.
  • A 10-bit machine ID, which supports deployments of up to 1024 nodes.
  • A 12-bit sequence number that increments within the same millisecond, so each node can generate up to 4096 IDs per millisecond.
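
The figures in the list above follow directly from the bit widths. As a rough check, here is a minimal sketch that recomputes them. The class and method names are made up for illustration and are not part of the original algorithm's code.

```java
import java.time.Duration;

/** Capacity of each field in the 64-bit layout described above (illustrative names). */
final class SnowflakeLayout {
    static final int TIME_BITS = 41;
    static final int MACHINE_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    /** Largest value an unsigned field of the given width can hold. */
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    /** Whole 365-day years covered by a millisecond counter of the given width. */
    static long yearsCovered(int bits) {
        return Duration.ofMillis(1L << bits).toDays() / 365;
    }

    public static void main(String[] args) {
        // 1 sign bit + 41 + 10 + 12 must fill a 64-bit long exactly.
        System.out.println("total bits    = " + (1 + TIME_BITS + MACHINE_BITS + SEQUENCE_BITS));
        System.out.println("years of time = " + yearsCovered(TIME_BITS));
        System.out.println("machine ids   = " + (maxValue(MACHINE_BITS) + 1));
        System.out.println("ids per ms    = " + (maxValue(SEQUENCE_BITS) + 1));
    }
}
```

Running main should report 64 total bits, 69 years, 1024 machine ids and 4096 ids per millisecond, matching the description.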

2, Time part

The logic of the time part is simple: choose a start timestamp, then subtract it from the current timestamp. The difference is the value we want. There is one problem, though. The current time normally comes from the server clock, and the system time can go wrong (for example, be set backwards), so we need to check it:

        // If the current time is earlier than the time of the last generated ID,
        // the clock has moved backwards: log the problem and throw an exception
        if (now < LAST_TIME_STAMP) {
            log.error("System time is abnormal, please check!");
            throw new RuntimeException("System time is abnormal!");
        }
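
To make the subtraction and the check concrete, here is a minimal sketch of the time part on its own. It assumes the same start timestamp as the complete listing later in this article. The injected clock (a LongSupplier) is a hypothetical hook added only so the behavior can be tried with a fake clock; the article's own code calls System.currentTimeMillis() directly.

```java
import java.util.function.LongSupplier;

/** Sketch of the time part: milliseconds elapsed since a custom start time, with a rollback check. */
final class TimePart {
    // Same value as START_TIME in the article's complete listing.
    static final long START_TIME = 1420041600000L;

    private final LongSupplier clock; // hypothetical hook so the clock can be faked
    private long lastTimestamp = -1L;

    TimePart(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns milliseconds since START_TIME, or throws if the clock moved backwards. */
    synchronized long next() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            throw new IllegalStateException(
                    "Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        lastTimestamp = now;
        return now - START_TIME;
    }
}
```

In real use the clock would simply be System::currentTimeMillis, for example new TimePart(System::currentTimeMillis).next().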

3, Machine information section

Machine information takes 10 bits. Here we split it into two parts: the data center id (5 bits) and the machine id (5 bits). Both ids can be set per machine when the project is deployed, which guarantees by hand that every node gets a different value. They can also be derived automatically: the JDK has an API that returns the host name and host address of the local machine, so we use the host name for the data center id and the host address for the machine id. How do we turn the two strings into two numeric ids? Take the byte array of the string, add up its bytes, and take the remainder modulo the number of possible values. Each field is 5 bits, so the maximum value is 31 and the modulus is 32. Different machines can still produce the same remainder, so set the ids manually when uniqueness must be guaranteed. The total number of machines the snowflake algorithm can deploy to is therefore 32 * 32 = 1024, which is the limit of the machine information.

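    // Data center id: derived from the host name, random if the host cannot be resolved
    private static int getDataId() {
        try {
            return getHostId(Inet4Address.getLocalHost().getHostName(), DATA_MAX_NUM);
        } catch (UnknownHostException e) {
            return new Random().nextInt(DATA_RANDOM);
        }
    }

    // Machine id: derived from the host address, random if the host cannot be resolved
    private static int getWorkId() {
        try {
            return getHostId(Inet4Address.getLocalHost().getHostAddress(), WORK_MAX_NUM);
        } catch (UnknownHostException e) {
            return new Random().nextInt(WORK_RANDOM);
        }
    }

    // Sum the bytes of the string and reduce the sum to the range 0..max
    private static int getHostId(String str, int max) {
        byte[] bytes = str.getBytes();
        int sums = 0;
        for (byte b : bytes) {
            // Mask to an unsigned value so non-ASCII bytes cannot make the sum negative
            sums += b & 0xFF;
        }
        return sums % (max + 1);
    }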
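
One property of the byte-sum approach is worth seeing in isolation: two different strings can fold to the same id. The following is a minimal sketch, not the article's exact code. It fixes the charset to UTF-8 and treats each byte as unsigned, and the host names in main are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of folding a string into a small id by summing its bytes. */
final class HostIdDemo {
    static int hostId(String s, int max) {
        int sum = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sum += b & 0xFF; // unsigned byte value, so the sum never goes negative
        }
        return sum % (max + 1);
    }

    public static void main(String[] args) {
        // Hypothetical host names: same bytes in a different order give the same id.
        System.out.println(hostId("node-a1", 31));
        System.out.println(hostId("node-1a", 31));
    }
}
```

Because the sum ignores byte order and only 32 values exist per field, automatically derived ids are a convenience. Manually assigned ids remain the way to guarantee that two nodes never share one.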

4, Sequence in milliseconds

What does "sequence in milliseconds" mean? We generate the timestamp with long now = System.currentTimeMillis(), which has millisecond precision. Even a millisecond is long enough for a computer to generate many IDs, so several IDs may be generated in the same millisecond, that is, with the same value in the time part. To tell them apart, each ID generated within the same millisecond gets its own sequence number, which is the third part. This part is 12 bits long, so its maximum value is 4095 and its range is 0 to 4095. What if more IDs are requested within one millisecond than this range allows? There is no way around it: the generator has to wait for the next millisecond before producing the next ID.
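
The wrap-around of the sequence can be sketched with a bit mask. This is a minimal illustration whose constant names mirror the complete listing in this article; the nextSeq helper is a made-up name.

```java
/** Sketch of the 12-bit per-millisecond sequence and its wrap-around. */
final class SequenceDemo {
    static final int SEQ_LEN = 12;
    static final long SEQ_MAX_NUM = ~(-1L << SEQ_LEN); // 4095

    /** Next sequence value; a result of 0 means the range for this millisecond is used up. */
    static long nextSeq(long current) {
        return (current + 1) & SEQ_MAX_NUM;
    }

    public static void main(String[] args) {
        System.out.println(nextSeq(0));    // 1
        System.out.println(nextSeq(4094)); // 4095
        System.out.println(nextSeq(4095)); // 0: the caller must wait for the next millisecond
    }
}
```

Masking instead of comparing keeps the increment and the overflow test in one expression: when the result is 0, the generator knows it has to wait.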

5, In practice

With the above, we can write the basic methods. To be safe under multithreading, the ID-generating method also has to be protected by a lock. Here is the complete code.

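import java.net.Inet4Address;
import java.net.UnknownHostException;
import java.util.Random;

import lombok.extern.slf4j.Slf4j;

@Slf4j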
public class SnowflakeUtil {
    /**
     * Length of time part
     */
    private static final int TIME_LEN = 41;
    /**
     * Length of data center id
     */
    private static final int DATA_LEN = 5;
    /**
     * Length of machine id
     */
    private static final int WORK_LEN = 5;
    /**
     * Length of sequence in milliseconds
     */
    private static final int SEQ_LEN = 12;
    /**
     * Start time (custom epoch): January 1, 2015 00:00:00 in UTC+8
     */
    private static final long START_TIME = 1420041600000L;
    /**
     * Timestamp of last generated id
     */
    private static long LAST_TIME_STAMP = -1L;
    /**
     * Bits of time part moving left 22
     */
    private static final int TIME_LEFT_BIT = 64 - 1 - TIME_LEN;
    /**
     * Get data center id automatically (any number between 0-31 can be defined manually)
     */
    private static final long DATA_ID = getDataId();
    /**
     * Get machine id automatically (any number between 0-31 can be defined manually)
     */
    private static final long WORK_ID = getWorkId();
    /**
     * Data center id Max 31
     */
    private static final int DATA_MAX_NUM = ~(-1 << DATA_LEN);
    /**
     * Machine id Max 31
     */
    private static final int WORK_MAX_NUM = ~(-1 << WORK_LEN);
    /**
     * Randomly obtaining parameters of data center id 32
     */
    private static final int DATA_RANDOM = DATA_MAX_NUM + 1;
    /**
     * Randomly obtaining parameters of machine id 32
     */
    private static final int WORK_RANDOM = WORK_MAX_NUM + 1;
    /**
     * Data center id left shift 17
     */
    private static final int DATA_LEFT_BIT = TIME_LEFT_BIT - DATA_LEN;
    /**
     * Machine id left shift 12
     */
    private static final int WORK_LEFT_BIT = DATA_LEFT_BIT - WORK_LEN;
    /**
     * Sequence value in last MS
     */
    private static long LAST_SEQ = 0L;
    /**
     * Maximum value of sequence in milliseconds 4095
     */
    private static final long SEQ_MAX_NUM = ~(-1 << SEQ_LEN);

    public static synchronized long getId() {
        long now = System.currentTimeMillis();

        // If the current time is earlier than the time of the last generated ID,
        // the clock has moved backwards: log the problem and throw an exception
        if (now < LAST_TIME_STAMP) {
            log.error("System time is abnormal, please check!");
            throw new RuntimeException("System time is abnormal!");
        }

        if (now == LAST_TIME_STAMP) {
            LAST_SEQ = (LAST_SEQ + 1) & SEQ_MAX_NUM;
            if (LAST_SEQ == 0) {
                now = nextMillis(LAST_TIME_STAMP);
            }
        } else {
            LAST_SEQ = 0;
        }

        LAST_TIME_STAMP = now;

        return ((now - START_TIME) << TIME_LEFT_BIT) | (DATA_ID << DATA_LEFT_BIT) | (WORK_ID << WORK_LEFT_BIT) | LAST_SEQ;
    }


    // Busy-wait until the clock has moved past lastMillis
    private static long nextMillis(long lastMillis) {
        long now = System.currentTimeMillis();
        while (now <= lastMillis) {
            now = System.currentTimeMillis();
        }
        return now;
    }

    private static int getDataId() {
        try {
            return getHostId(Inet4Address.getLocalHost().getHostName(), DATA_MAX_NUM);
        } catch (UnknownHostException e) {
            return new Random().nextInt(DATA_RANDOM);
        }
    }

    private static int getWorkId() {
        try {
            return getHostId(Inet4Address.getLocalHost().getHostAddress(), WORK_MAX_NUM);
        } catch (UnknownHostException e) {
            return new Random().nextInt(WORK_RANDOM);
        }
    }


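    // Sum the bytes of the string and reduce the sum to the range 0..max
    private static int getHostId(String str, int max) {
        byte[] bytes = str.getBytes();
        int sums = 0;
        for (byte b : bytes) {
            // Mask to an unsigned value so non-ASCII bytes cannot make the sum negative
            sums += b & 0xFF;
        }
        return sums % (max + 1);
    }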


    public static void main(String[] args) {
//        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
//        try {
//            long time = dateFormat.parse("2015-01-01 00:00:00").getTime();
//            System.out.println(time);
//        } catch (ParseException e) {
//            e.printStackTrace();
//        }


        for (int i = 0; i < 10; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    System.out.println("-------------");
                    System.out.println(getId());
                }
            }).start();
        }

    }
}
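
Because every field sits at a fixed position, a generated ID can be taken apart again. The following is a minimal sketch that assumes the same bit widths and start timestamp as the listing above. The class and its compose helper are illustrative and not part of SnowflakeUtil.

```java
/** Sketch: split a snowflake id back into its fields, using the article's bit widths. */
final class SnowflakeDecoder {
    static final long START_TIME = 1420041600000L;
    static final int SEQ_LEN = 12;
    static final int WORK_LEN = 5;
    static final int DATA_LEN = 5;
    static final int WORK_SHIFT = SEQ_LEN;               // 12
    static final int DATA_SHIFT = WORK_SHIFT + WORK_LEN; // 17
    static final int TIME_SHIFT = DATA_SHIFT + DATA_LEN; // 22

    /** Bit mask with the lowest `bits` bits set. */
    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Builds an id the same way getId() combines its parts (helper for this demo only). */
    static long compose(long timestampMillis, long dataId, long workId, long seq) {
        return ((timestampMillis - START_TIME) << TIME_SHIFT)
                | (dataId << DATA_SHIFT)
                | (workId << WORK_SHIFT)
                | seq;
    }

    static long timestampMillis(long id) {
        return (id >>> TIME_SHIFT) + START_TIME;
    }

    static long dataId(long id) {
        return (id >>> DATA_SHIFT) & mask(DATA_LEN);
    }

    static long workId(long id) {
        return (id >>> WORK_SHIFT) & mask(WORK_LEN);
    }

    static long sequence(long id) {
        return id & mask(SEQ_LEN);
    }

    public static void main(String[] args) {
        long id = compose(START_TIME + 1000, 3, 7, 42);
        System.out.println(id + " -> time=" + timestampMillis(id)
                + " data=" + dataId(id) + " work=" + workId(id) + " seq=" + sequence(id));
    }
}
```

Decoding like this is mainly useful for debugging, for example to see which node and which millisecond produced a given ID.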
