[Snowflake algorithm] Generating, testing, and using Snowflake IDs in PHP [original]

Posted by mickd on Sun, 19 Dec 2021 01:32:18 +0100

summary

On December 9, a problem occurred in the production environment. Two asynchronous tasks were running at the same time. They generated tens of thousands of rows and wrote them to a table. The table's primary key ID is generated by the Snowflake algorithm, specifically by the nextId method of SnowFlake.php in our shared library.


The general logic is as follows:

foreach ($checkData as $v) {
    $now = date("Y-m-d H:i:s");
    $data = [
        'id' => $snowFlakeModel->nextId(), // primary key from the Snowflake generator
        'client_id' => $v['client_id'],
        'activity_tag' => $v['activity_tag'],
        'id_type' => $v['id_type'],
        'activity_type' => $v['activity_type'],
        'create_time' => $now,
        'update_time' => $now,
    ];
    if (!$relationModel->create($data)) {
        Yii::warning('crowd_relation_info operate failed');
    }
}

This library's Snowflake implementation works correctly with a single process on a single machine, but with multiple processes on the same machine it can occasionally generate duplicate IDs.
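
The source of SnowFlake.php is not shown here, so the following is only a minimal sketch of how this kind of collision can arise. It assumes the generator keeps its sequence counter in process memory and that every process uses the same worker ID. The NaiveSnowflake class and its bit layout are hypothetical, not the library's code. Two objects stand in for two processes:

```php
<?php
// Hypothetical naive generator: the sequence counter lives in process memory,
// so separate processes cannot see each other's counters.
final class NaiveSnowflake
{
    private const WORKER_BITS = 10;   // assumed width, for illustration only
    private const SEQUENCE_BITS = 12; // assumed width, for illustration only

    private int $workerId;
    private int $lastMs = -1;
    private int $sequence = 0;

    public function __construct(int $workerId)
    {
        $this->workerId = $workerId;
    }

    // $nowMs is passed in so the example is deterministic.
    // Sequence overflow handling is omitted to keep the sketch short.
    public function nextId(int $nowMs): int
    {
        if ($nowMs === $this->lastMs) {
            $this->sequence = ($this->sequence + 1) & ((1 << self::SEQUENCE_BITS) - 1);
        } else {
            $this->sequence = 0;
            $this->lastMs = $nowMs;
        }
        return ($nowMs << (self::WORKER_BITS + self::SEQUENCE_BITS))
            | ($this->workerId << self::SEQUENCE_BITS)
            | $this->sequence;
    }
}

// Two objects simulate two processes that share worker ID 1.
$processA = new NaiveSnowflake(1);
$processB = new NaiveSnowflake(1);
$nowMs = 1000; // both "processes" run in the same millisecond

$idA = $processA->nextId($nowMs);
$idB = $processB->nextId($nowMs);
var_dump($idA === $idB); // bool(true): same millisecond, same worker, same sequence
```

Under these assumptions, the only ways to avoid the collision are to give each process its own worker ID or to share the sequence counter between processes.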


test

To verify whether there is a problem, I wrote a PHP script to test. The script is as follows:

Yii project\console\controllers\TestController.php

<?php

namespace app\commands;

use app\helper\SnowFlake;
use yii\console\Controller;
use Yii;

/**
 * Tests whether the global ID generator produces duplicates under multiple processes.
 * Generated IDs are stored in a Redis set.
 *
 * Principle:
 *  1. Generate global IDs with the Snowflake algorithm
 *  2. Store each generated ID in a Redis set. Because set members are unique, an ID that is
 *     already in the set means the same global ID was generated twice
 *
 * Usage:
 *  1. Enter the console directory
 *  2. Run: php yii test/main {number of processes} {number of IDs per process}
 *     The main process starts several child processes that generate global IDs
 *      * Number of processes: defaults to 2 if omitted
 *      * Number of IDs per process: defaults to 20000 if omitted
 *
 *     Note: check the log for the result of each process. Duplicates are written to the log, not to the terminal
 *
 *  3. Run: php yii test/get {count} to view a random sample of the saved global IDs
 *  4. Run: php yii test/del to delete the Redis set [must be done after each test]
 */
class TestController extends Controller
{
    const KEY = 'FIN_PHP:BOLI_TEST_SET';
    protected static $redisObj = null;

    private static function getRedisObj()
    {
        // Cache the connection so later calls reuse it
        if (self::$redisObj === null) {
            self::$redisObj = Yii::$app->redis;
        }
        return self::$redisObj;
    }

    /**
     * The main process starts several child processes
     * @param int $n   Number of processes
     * @param int $num Number of global IDs generated per process
     *
     * Note: by default there are two processes, each generating 20000 global IDs
     */
    public function actionMain(int $n = 2, int $num = 20000)
    {
        var_dump('This is the main process');
        // Start the child processes in the background so they generate IDs concurrently
        for ($i = 1; $i <= $n; $i++) {
            exec('php ' . __DIR__ . '/../yii test/run ' . $i . ' ' . $num . ' >/dev/null  &', $output, $return);
        }
        var_dump('Main process succeeded');
    }

    /**
     * Batch-generate global IDs
     * Generates $num global IDs and saves them to the Redis set
     */
    public function actionRun($pid = 0, $num = 20000): int
    {
        Yii::info("Process: {$pid} start");

        $snowFlakeModel = new SnowFlake();
        $redis = self::getRedisObj();

        for ($i = 1; $i <= $num; $i++) {
            $id = $snowFlakeModel->nextId();    // Generated by the old Snowflake class
            // SADD returns 0 when the member already exists, which means this ID is a duplicate.
            // Using the return value avoids a race between a separate SISMEMBER check and the insert.
            if ((int)$redis->sadd(self::KEY, $id) === 0) {
                Yii::info("Process: {$pid} failed, duplicate ID: {$id}, insertion #{$i}");
                return 1; // non-zero exit code: failure
            }
        }

        Yii::info('Size of the Redis set: ' . $redis->scard(self::KEY));
        Yii::info("Process: {$pid} success");
        return 0; // exit code 0: success
    }


    /**
     * View a random sample of the saved global IDs
     * @param int $num Number of global IDs to show, 10 by default
     * @return void
     */
    public function actionGet(int $num = 10)
    {
        $redis = self::getRedisObj();
        var_dump('Size of the Redis set: ' . $redis->scard(self::KEY));
        var_dump("Random sample of {$num} IDs:");
        // SRANDMEMBER only reads; SPOP would also remove the sampled IDs from the set
        var_dump($redis->srandmember(self::KEY, $num));
    }


    /**
     * Delete the Redis set
     */
    public function actionDel()
    {
        var_dump('Start deletion');
        $redis = self::getRedisObj();

        if (empty($redis->exists(self::KEY))) {
            var_dump('The set does not exist');
            return;
        }

        // The set exists, so delete it; DEL returns the number of keys removed
        if ($redis->del(self::KEY)) {
            var_dump('Delete succeeded');
            return;
        }

        var_dump('Deletion failed');
    }
}

The principle is:

The PHP script runs several processes at the same time (the number can be specified and defaults to two). Each process generates 20000 Snowflake IDs in a loop and inserts them into a Redis set. Because the members of a set are unique, whether an insertion succeeds tells us whether the ID is a duplicate.
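
The detection principle can be sketched without a Redis server. Redis's SADD command returns the number of members actually added, so a return value of 0 signals a duplicate. The InMemorySet class and the firstDuplicate function below are hypothetical stand-ins written for this illustration, not part of yii2-redis:

```php
<?php
// In-memory stand-in for a Redis set (hypothetical helper for illustration).
final class InMemorySet
{
    private array $members = [];

    // Mirrors SADD for a single member: 1 if newly added, 0 if already present.
    public function add(string $member): int
    {
        if (isset($this->members[$member])) {
            return 0;
        }
        $this->members[$member] = true;
        return 1;
    }

    public function count(): int
    {
        return count($this->members);
    }
}

// Returns the first duplicated ID, or null if every ID was new to the set.
function firstDuplicate(iterable $ids, InMemorySet $set): ?string
{
    foreach ($ids as $id) {
        if ($set->add((string)$id) === 0) {
            return (string)$id;
        }
    }
    return null;
}

$set = new InMemorySet();
var_dump(firstDuplicate(['101', '102', '103'], $set)); // NULL
var_dump(firstDuplicate(['104', '102'], $set));        // string(3) "102"
```

Because the add and the duplicate check are a single operation, two writers can never both believe they inserted the same ID first. That is the property the real test gets from Redis.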


Note: for convenience I used a Yii console command; a plain PHP script would work as well.


The test results are as follows:

# php yii test/main
string(24) "This is the main process"
string(22) "Main process succeeded"

The log results are as follows:

2021-12-18 21:22:45 [-][-][-][info][application] Process: 1 start
2021-12-18 21:22:45 [-][-][-][info][application] Process: 1 failed, duplicate ID: 524698059632803840, insertion #6

2021-12-18 21:22:45 [-][-][-][info][application] Process: 2 start
2021-12-18 21:22:49 [-][-][-][info][application] Size of the Redis set: 20002
2021-12-18 21:22:49 [-][-][-][info][application] Process: 2 success

Process 1 hit an error and exited on its sixth insertion because a duplicate ID had been generated.

In other words, the Snowflake library we are currently using has a problem: under multi-process concurrency it may generate duplicate Snowflake IDs.


solution

To solve the duplicate ID problem, I looked at several Snowflake implementations on GitHub, built a new Snowflake class based on one of them, and tested it.

Reference: https://github.com/tengzbiao/php-snowflake


The core is this class file, IDWorker.php:

<?php

namespace app\helper;

/**
 * Snowflake algorithm
 * Up to 4096 IDs can be generated per millisecond
 * Reference: https://github.com/tengzbiao/php-snowflake
 *
 * Note: in my multi-process tests this class generated no duplicate global IDs, so it is the recommended one
 * Note: IDs generated by the SnowFlake class may repeat under multi-process concurrency
 * Requires PHP's System V semaphore and shared memory extensions (sysvsem, sysvshm)
 */
class IDWorker
{
    const WORKER_BITS = 6;
    const DATA_CENTER_BITS = 2;
    const EXTENSION_BITS = 2;
    const SEQUENCE_BITS = 12;

    // Bit layout, high to low: timestamp | data center | worker | extension | sequence
    private $timestampShift = self::SEQUENCE_BITS + self::EXTENSION_BITS + self::WORKER_BITS + self::DATA_CENTER_BITS;
    private $dataCenterShift = self::SEQUENCE_BITS + self::EXTENSION_BITS + self::WORKER_BITS;
    private $workerShift = self::SEQUENCE_BITS + self::EXTENSION_BITS;
    private $extensionShift = self::SEQUENCE_BITS;
    private $workerMax = -1 ^ (-1 << self::WORKER_BITS);
    private $dataCenterMax = -1 ^ (-1 << self::DATA_CENTER_BITS);
    private $sequenceMax = -1 ^ (-1 << self::SEQUENCE_BITS);
    private $extensionMax = -1 ^ (-1 << self::EXTENSION_BITS);

    private static $ins;
    private $workerID; // Node ID
    private $dataCenterID; // Data center ID
    private $timestamp = 0; // Timestamp (ms) of the last generated ID
    private $epoch = 1514736000000; // 2018-01-01 00:00:00 (UTC+8). Do not change it once IDs have been generated, otherwise duplicate IDs may be produced
    private $extension = 0;

    private function __construct($dataCenterID, $workerID, int $epoch)
    {
        if ($dataCenterID < 0 || $dataCenterID > $this->dataCenterMax) {
            throw new \Exception("data center id should be between 0 and " . $this->dataCenterMax);
        }

        if ($workerID < 0 || $workerID > $this->workerMax) {
            throw new \Exception("worker id should be between 0 and " . $this->workerMax);
        }

        $this->dataCenterID = $dataCenterID;
        $this->workerID = $workerID;

        if ($epoch > 0) {
            $this->epoch = $epoch;
        }

//        $epochMax = $this->getUnixTimestamp();
//        $epochMin = $epochMax - strtotime("1 year");
//        if ($this->epoch > $epochMax || $this->epoch < $epochMin) {
//            throw new \Exception(sprintf("epoch should between %s and %s", $epochMin, $epochMax));
//        }
    }

    /**
     * Get the singleton instance. The arguments only take effect on the first call in a process
     * @param int $dataCenterID Data center ID, 0-3 (up to 4 data centers). Use a different ID for each data center
     * @param int $workerID Machine ID, 0-63 (up to 64 machines). Use a different ID for each machine
     * @param int $epoch Custom epoch in milliseconds; 0 keeps the default
     */
    public static function getInstance($dataCenterID = 1, $workerID = 1, int $epoch = 0)
    {
        if (is_null(self::$ins)) {
            self::$ins = new self($dataCenterID, $workerID, $epoch);
        }
        return self::$ins;
    }

    public function id()
    {
        $timestamp = $this->getUnixTimestamp();
        // Tolerate the clock moving backwards
        if ($timestamp < $this->timestamp) {
            $diff = $this->timestamp - $timestamp;
            if ($diff < 2) {
                // Timestamps are in milliseconds, so wait $diff ms rather than $diff seconds
                usleep((int)$diff * 1000);
                $timestamp = $this->getUnixTimestamp();
                if ($timestamp < $this->timestamp) {
                    $this->extension += 1;
                    if ($this->extension > $this->extensionMax) {
                        throw new \Exception("clock moved backwards");
                    }
                }
            } else {
                $this->extension += 1;
                if ($this->extension > $this->extensionMax) {
                    throw new \Exception("clock moved backwards");
                }
            }
        }

        $sequenceID = $this->getSequenceID();
        if ($sequenceID > $this->sequenceMax) {
            // Sequence exhausted: spin until the next millisecond, then fetch a new sequence
            $timestamp = $this->getUnixTimestamp();
            while ($timestamp <= $this->timestamp) {
                $timestamp = $this->getUnixTimestamp();
            }
            $sequenceID = $this->getSequenceID();
        }
        $this->timestamp = $timestamp;
        $id = (int)($timestamp - $this->epoch) << $this->timestampShift
            | $this->dataCenterID << $this->dataCenterShift
            | $this->workerID << $this->workerShift
            | $this->extension << $this->extensionShift
            | $sequenceID;

        return (string)$id;
    }

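    private function getUnixTimestamp()
    {
        // Current Unix time in milliseconds
        return floor(microtime(true) * 1000);
    }

    /**
     * Cross-process sequence counter kept in System V shared memory and guarded by a semaphore
     */
    private function getSequenceID($max = 4096, $min = 0)
    {
        $key = ftok(__FILE__, 'd');
        $var_key = 100;
        $sem_id = sem_get($key);
        $shm_id = shm_attach($key, 4096);
        $cycle_id = 0;

        if (sem_acquire($sem_id)) {
            // Read the shared counter (0 if it has not been written yet), increment it, wrap past $max
            $cycle_id = shm_has_var($shm_id, $var_key) ? intval(shm_get_var($shm_id, $var_key)) : 0;
            $cycle_id++;
            if ($cycle_id > $max) {
                $cycle_id = $min;
            }
            shm_put_var($shm_id, $var_key, $cycle_id);
            sem_release($sem_id);
        }
        shm_detach($shm_id);
        return $cycle_id;
    }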

    private function __clone()
    {
    }
}
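
To make the bit layout of IDWorker easier to see, here is a small sketch that takes an ID apart again. The bit widths and the default epoch are copied from the class above; the decodeId function itself is a hypothetical helper written for this illustration and is not part of the referenced library.

```php
<?php
// Bit widths and default epoch copied from the IDWorker constants above.
const SEQUENCE_BITS = 12;
const EXTENSION_BITS = 2;
const WORKER_BITS = 6;
const DATA_CENTER_BITS = 2;
const EPOCH_MS = 1514736000000;

// Hypothetical helper: split an ID into its fields, lowest bits first.
function decodeId(int $id): array
{
    $mask = static function (int $bits): int {
        return (1 << $bits) - 1;
    };

    $sequence = $id & $mask(SEQUENCE_BITS);
    $id >>= SEQUENCE_BITS;
    $extension = $id & $mask(EXTENSION_BITS);
    $id >>= EXTENSION_BITS;
    $worker = $id & $mask(WORKER_BITS);
    $id >>= WORKER_BITS;
    $dataCenter = $id & $mask(DATA_CENTER_BITS);
    $id >>= DATA_CENTER_BITS;

    return [
        'timestamp_ms' => $id + EPOCH_MS, // what remains is milliseconds since the epoch
        'data_center' => $dataCenter,
        'worker' => $worker,
        'extension' => $extension,
        'sequence' => $sequence,
    ];
}

// Compose an ID with the same shifts IDWorker::id() uses, then decode it.
$extensionShift = SEQUENCE_BITS;
$workerShift = $extensionShift + EXTENSION_BITS;
$dataCenterShift = $workerShift + WORKER_BITS;
$timestampShift = $dataCenterShift + DATA_CENTER_BITS;

$elapsedMs = 123456; // milliseconds since the epoch, chosen arbitrarily
$id = ($elapsedMs << $timestampShift)
    | (1 << $dataCenterShift)   // data center 1
    | (1 << $workerShift)       // worker 1
    | (0 << $extensionShift)    // extension 0
    | 7;                        // sequence 7

print_r(decodeId($id));
```

The low 22 bits hold the data center, worker, extension, and sequence fields, and everything above them is the timestamp. This is why IDs sort roughly by creation time.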

Test it with the script above. (Note: before testing, run php yii test/del to delete the Redis set so that old test data does not affect the result.)

Modify the actionRun method of the script:

/**
 * Batch-generate global IDs
 * Generates $num global IDs and saves them to the Redis set
 * Requires "use app\helper\IDWorker;" at the top of the controller
 */
public function actionRun($pid = 0, $num = 20000): int
{
    Yii::info("Process: {$pid} start");

    $idWorker = IDWorker::getInstance();
    $redis = self::getRedisObj();

    for ($i = 1; $i <= $num; $i++) {
        $id = $idWorker->id();                // Generated by the new Snowflake class
        // SADD returns 0 when the member already exists, which means this ID is a duplicate.
        // Using the return value avoids a race between a separate SISMEMBER check and the insert.
        if ((int)$redis->sadd(self::KEY, $id) === 0) {
            Yii::info("Process: {$pid} failed, duplicate ID: {$id}, insertion #{$i}");
            return 1; // non-zero exit code: failure
        }
    }

    Yii::info('Size of the Redis set: ' . $redis->scard(self::KEY));
    Yii::info("Process: {$pid} success");
    return 0; // exit code 0: success
}

Start 5 processes, each generating 40000 IDs:

# php yii test/main  5 40000
string(24) "This is the main process"
string(22) "Main process succeeded"

Look at the log:

2021-12-18 23:29:21 [-][-][-][info][application] Process: 1 start
2021-12-18 23:30:07 [-][-][-][info][application] Size of the Redis set: 199966
2021-12-18 23:30:07 [-][-][-][info][application] Process: 1 success
2021-12-18 23:29:21 [-][-][-][info][application] Process: 2 start
2021-12-18 23:30:07 [-][-][-][info][application] Size of the Redis set: 199974
2021-12-18 23:30:07 [-][-][-][info][application] Process: 2 success
2021-12-18 23:29:22 [-][-][-][info][application] Process: 3 start
2021-12-18 23:30:07 [-][-][-][info][application] Size of the Redis set: 199983
2021-12-18 23:30:07 [-][-][-][info][application] Process: 3 success
2021-12-18 23:29:22 [-][-][-][info][application] Process: 4 start
2021-12-18 23:30:07 [-][-][-][info][application] Size of the Redis set: 199997
2021-12-18 23:30:07 [-][-][-][info][application] Process: 4 success
2021-12-18 23:29:22 [-][-][-][info][application] Process: 5 start
2021-12-18 23:30:07 [-][-][-][info][application] Size of the Redis set: 200000
2021-12-18 23:30:07 [-][-][-][info][application] Process: 5 success

The test found no duplicates: the final set size is 200000, exactly 5 × 40000. In other words, this Snowflake class generated unique global IDs under multi-process concurrency in this test.
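
What makes this work across processes is that the sequence counter is shared between them instead of living in each process's memory. IDWorker does this with System V semaphores and shared memory, which require the sysvsem and sysvshm extensions. Where those are unavailable, the same idea could be approximated with a counter file guarded by flock(). The sketch below swaps in that different mechanism; nextSequence is a hypothetical function, not part of the referenced library, and it has not been through the multi-process test above.

```php
<?php
// Hypothetical alternative to getSequenceID(): a counter file guarded by flock().
// This replaces System V semaphores/shared memory with a plain file lock.
function nextSequence(string $counterFile, int $max = 4095): int
{
    $fp = fopen($counterFile, 'c+'); // create if missing, do not truncate
    if ($fp === false) {
        throw new RuntimeException("cannot open $counterFile");
    }
    try {
        if (!flock($fp, LOCK_EX)) { // blocks until this process holds the lock
            throw new RuntimeException("cannot lock $counterFile");
        }
        $current = (int)stream_get_contents($fp); // an empty file reads as 0
        $next = $current >= $max ? 0 : $current + 1; // wrap around after $max
        ftruncate($fp, 0);
        rewind($fp);
        fwrite($fp, (string)$next);
        fflush($fp);
        flock($fp, LOCK_UN);
        return $next;
    } finally {
        fclose($fp);
    }
}

$file = tempnam(sys_get_temp_dir(), 'seq');
var_dump(nextSequence($file)); // int(1)
var_dump(nextSequence($file)); // int(2)
unlink($file);
```

A file lock is likely slower than shared memory, so this is only worth considering when the System V extensions cannot be installed.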


usage

I recommend using IDWorker to generate Snowflake IDs.

The new snowflake algorithm (IDWorker) is simple to use:

use app\helper\IDWorker;

$idWorker = IDWorker::getInstance();
$id = $idWorker->id();  

If there are multiple machines, pass the following parameters:

$dataCenterID = 1;
$workerID = 1;   // Use a different ID on each machine
$idWorker = IDWorker::getInstance($dataCenterID, $workerID);
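
One way to give each machine its own worker ID is to read it from the environment at startup. The sketch below assumes an environment variable named SNOWFLAKE_WORKER_ID that the deployment sets per machine. Both that name and the workerIdFromEnv function are hypothetical; IDWorker does not define them.

```php
<?php
// Hypothetical helper: read a per-machine worker ID from an environment variable.
// SNOWFLAKE_WORKER_ID is an assumed name, not something IDWorker defines.
function workerIdFromEnv(string $name = 'SNOWFLAKE_WORKER_ID', int $max = 63): int
{
    $raw = getenv($name);
    if ($raw === false || preg_match('/^\d+$/', $raw) !== 1) {
        throw new InvalidArgumentException("$name must be set to an integer between 0 and $max");
    }
    $id = (int)$raw;
    if ($id > $max) {
        throw new InvalidArgumentException("$name must be between 0 and $max, got $id");
    }
    return $id;
}

putenv('SNOWFLAKE_WORKER_ID=7'); // normally set by the deployment, not in code
$workerID = workerIdFromEnv();
var_dump($workerID); // int(7)

// The result would then be passed on, for example:
// $idWorker = IDWorker::getInstance(1, $workerID);
```

Failing loudly when the variable is missing or out of range is deliberate: silently falling back to a default would put several machines on the same worker ID, which is the situation that causes duplicate IDs.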

See code for details:

/**
 * Get the singleton instance. The arguments only take effect on the first call in a process
 * @param int $dataCenterID Data center ID, 0-3 (up to 4 data centers). Use a different ID for each data center
 * @param int $workerID Machine ID, 0-63 (up to 64 machines). Use a different ID for each machine
 * @param int $epoch Custom epoch in milliseconds; 0 keeps the default
 */
public static function getInstance($dataCenterID = 1, $workerID = 1, int $epoch = 0)
{
    if (is_null(self::$ins)) {
        self::$ins = new self($dataCenterID, $workerID, $epoch);
    }
    return self::$ins;
}
