Generally speaking, whatever database you use, a table has a primary key named ID, and as a primary key it must be unique. For MySQL users it is usually an auto_increment field. Some people prefer a UUID as the primary key, but a UUID is usually a poor choice for MySQL (especially InnoDB): the clustered index keeps rows physically ordered by primary key, and UUIDs are unordered, so inserts cause a lot of unnecessary IO. Hence the conclusion: the best ID is a unique value that is also ordered.
So can we just use MySQL auto_increment and let the field increment itself? The problem is that this does not provide high availability. You can improve availability by running several servers, each configured with a different auto_increment step setting, but the database itself is always the weakest link. As for solutions, there have been many discussions of this kind on the Internet:
- Detailed discussion on distributed ID generation method
- What kind of ID generator does the business system need
- List of generation methods of distributed Unique ID
- Architecture design and evolution of WeChat serial number generator
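The step-size idea mentioned above can be sketched in plain PHP, simulated rather than run against MySQL. Each server starts at a different offset and advances by the same step, so the sequences never overlap. The helper name `simulateAutoIncrement` and the two-server setup are illustrative assumptions; in MySQL the corresponding settings are the `auto_increment_offset` and `auto_increment_increment` system variables.

```php
<?php
// Hypothetical helper: lists the first $n IDs a server would hand out
// when it starts at $offset and advances by $step on every insert.
function simulateAutoIncrement(int $offset, int $step, int $n): array
{
    $ids = [];
    for ($i = 0; $i < $n; $i++) {
        $ids[] = $offset + $i * $step;
    }
    return $ids;
}

$serverA = simulateAutoIncrement(1, 2, 5); // 1, 3, 5, 7, 9
$serverB = simulateAutoIncrement(2, 2, 5); // 2, 4, 6, 8, 10

// The two servers never produce the same ID.
var_dump(array_intersect($serverA, $serverB) === []); // bool(true)
```

The sequences are disjoint, but every insert still depends on a database server being reachable, which is the limitation described above.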
The most popular solution is, of course, Twitter Snowflake. Roughly, to avoid a single point of failure, the ID generator service runs on multiple nodes. Each node has its own node ID, and every generated ID is prefixed with a time component. The clocks of different servers may differ, so absolute order cannot be guaranteed, but the overall trend is still sequential and the extra IO is negligible. A counter is used as the suffix to guarantee uniqueness.
Existing open source ID generators, such as Chronos, all run as services, which is too heavy for me, so I implemented a simplified, non-service ID generator in PHP. It is short, but it is not simplistic: it implements the features Snowflake requires:
<?php

class Sequence
{
    // Custom epoch in milliseconds (2001-09-09 UTC); timestamps are stored relative to it.
    const EPOCH = 1000000000000;

    // Bit layout of an ID: [time | node | count]
    const TIME_BITS = 41;
    const NODE_BITS = 10;
    const COUNT_BITS = 10;

    private $node = 0;

    // Lifetime of each per-millisecond counter in APCu, in seconds.
    private $ttl = 10;

    public function __construct($node)
    {
        $max = $this->max(self::NODE_BITS);

        if (is_int($node) === false || $node > $max || $node < 0) {
            throw new \InvalidArgumentException('node');
        }

        $this->node = $node;
    }

    // Build an ID from the current time (or a given time in milliseconds).
    public function generate($time = null)
    {
        if ($time === null) {
            $time = (int)(microtime(true) * 1000);
        }

        return ($this->time($time) << (self::NODE_BITS + self::COUNT_BITS))
            | ($this->node << self::COUNT_BITS)
            | ($this->count($time));
    }

    // Split an ID back into its time, node and count fields.
    public function restore($id)
    {
        $binary = decbin($id);

        $position = -(self::NODE_BITS + self::COUNT_BITS);

        return array(
            'time'  => bindec(substr($binary, 0, $position)) + self::EPOCH,
            'node'  => bindec(substr($binary, $position, - self::COUNT_BITS)),
            'count' => bindec(substr($binary, - self::COUNT_BITS)),
        );
    }

    public function setTTL($ttl)
    {
        $this->ttl = $ttl;
    }

    // Convert an absolute millisecond timestamp into the offset from EPOCH.
    private function time($time)
    {
        $time -= self::EPOCH;

        $max = $this->max(self::TIME_BITS);

        if (is_int($time) === false || $time > $max || $time < 0) {
            throw new \InvalidArgumentException('time');
        }

        return $time;
    }

    // Per-millisecond counter kept in APCu and shared by all workers of the pool.
    private function count($time)
    {
        $key = "seq:count:" . ($time % ($this->ttl * 1000));

        // While the increment fails, seed the key with a small random
        // start value and a TTL, then try again.
        while (!$count = apcu_inc($key)) {
            apcu_add($key, mt_rand(0, 9), $this->ttl);
        }

        $max = $this->max(self::COUNT_BITS);

        if ($count > $max) {
            throw new \UnexpectedValueException('count');
        }

        return $count;
    }

    // Largest value that fits in $bits bits, i.e. 2^bits - 1.
    private function max($bits)
    {
        return -1 ^ (-1 << $bits);
    }
}

?>
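To make the bit layout easier to check without APCu, here is a minimal sketch that packs and unpacks the same three fields using shifts and masks. The helpers `packId` and `unpackId` and the `SKETCH_*` constants are hypothetical names used only for illustration; they mirror the constants of the class above and assume 64-bit PHP integers.

```php
<?php
// Same layout as the Sequence class: 41 time bits, 10 node bits, 10 counter bits.
const SKETCH_EPOCH = 1000000000000;
const SKETCH_NODE_BITS = 10;
const SKETCH_COUNT_BITS = 10;

// Hypothetical helper: packs the three fields into one 64-bit integer.
function packId(int $timeMs, int $node, int $count): int
{
    return (($timeMs - SKETCH_EPOCH) << (SKETCH_NODE_BITS + SKETCH_COUNT_BITS))
        | ($node << SKETCH_COUNT_BITS)
        | $count;
}

// Hypothetical helper: splits an ID back into its fields with shifts and masks.
function unpackId(int $id): array
{
    return [
        'time'  => ($id >> (SKETCH_NODE_BITS + SKETCH_COUNT_BITS)) + SKETCH_EPOCH,
        'node'  => ($id >> SKETCH_COUNT_BITS) & ((1 << SKETCH_NODE_BITS) - 1),
        'count' => $id & ((1 << SKETCH_COUNT_BITS) - 1),
    ];
}

$id = packId(1500000000000, 5, 7);
$fields = unpackId($id);
print_r($fields); // time 1500000000000, node 5, count 7

// An ID from a later millisecond is larger, whatever the node and counter are.
$earlier = packId(1500000000000, 1023, 1023);
$later   = packId(1500000000001, 0, 0);
var_dump($later > $earlier); // bool(true)
```

Because the time field occupies the highest bits, IDs sort by generation time first, which is what keeps inserts roughly sequential for the clustered index.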
The implementation in this article uses APCu to store data, so it does not need to run as a service. With 41 bits of millisecond time, the largest representable time is theoretically 2039-09-07. Taking EPOCH into account extends this: with an EPOCH of 1000000000000, as in the code, the limit becomes 2071-05-16. In addition, 10 bits are reserved for the node and 10 bits for the counter. Theoretically this allows node IDs from 0 to 1023 (1024 nodes) and at most 1023 IDs per millisecond per node. These limits are basically sufficient: most systems will have fallen over for other reasons long before reaching them.
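The capacity figures above can be reproduced with a few lines of arithmetic. This is a small sketch, assuming 64-bit PHP 7 or later; `maxValue` is a hypothetical helper equivalent to the private `max()` method of the class.

```php
<?php
// Largest value that fits in a field of $bits bits.
function maxValue(int $bits): int
{
    return (1 << $bits) - 1;
}

$epoch   = 1000000000000; // EPOCH from the class, in milliseconds
$maxTime = maxValue(41);  // 2199023255551

// Last representable day without and with the EPOCH offset (UTC).
$plain  = gmdate('Y-m-d', intdiv($maxTime, 1000));
$offset = gmdate('Y-m-d', intdiv($maxTime + $epoch, 1000));
echo $plain, "\n";  // 2039-09-07
echo $offset, "\n"; // 2071-05-16

echo maxValue(10) + 1, "\n"; // 1024 distinct node IDs (0..1023)
echo maxValue(10), "\n";     // 1023 is the largest counter value
```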
BTW: if unrelated PHP processes share one ID generator, for example PHP-FPM and PHP CLI, APCu is not suitable, and you need to use libshmcache instead.
It should be noted that my first design used seconds rather than milliseconds, but seconds had a problem: if PHP-FPM was restarted within a single second, the counters were lost and non-unique values could be produced. I could have avoided this by sleeping for one second in the restart script, but that was too much trouble, so I simply switched to milliseconds. Since PHP-FPM cannot be restarted within a single millisecond, the problem goes away.
However, if the server clock is rolled back, non-unique values are still possible, but only when several conditions hold: first, the server clock is rolled back; second, an ID is generated after the rollback at a timestamp that was already used before it; finally, the server has evicted the relevant cache entry, for example through TTL expiry or LRU. It is hard to meet all of these conditions at once, so for most PHP projects the code in this article can be considered robust enough.
In addition, it is best not to expose the generated ID directly, otherwise others can reverse-engineer information from it, such as how many servers you have. The solution is to encode and decode it with hashids in the application layer. The original ID (BIGINT) is stored in the database, but what the user sees is the hash ID, which protects the data better.
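For projects that do not want to rely on those conditions being rare, one option is to reject timestamps that go backwards. The sketch below is an addition, not part of the code in this article: `ClockGuard` is a hypothetical class that keeps the last timestamp in process memory, so it only protects a single process. Sharing the value between PHP-FPM workers would need shared storage such as APCu.

```php
<?php
// Hypothetical guard, not part of the Sequence class: remembers the last
// timestamp it saw and rejects any timestamp that is earlier.
class ClockGuard
{
    private $last = 0;

    public function check(int $timeMs): int
    {
        if ($timeMs < $this->last) {
            throw new \RuntimeException('clock moved backwards');
        }
        $this->last = $timeMs;
        return $timeMs;
    }
}

$guard = new ClockGuard();
$guard->check(1500000000000);
$guard->check(1500000000001);

try {
    $guard->check(1500000000000); // simulated rollback of 1 ms
    $rejected = false;
} catch (\RuntimeException $e) {
    $rejected = true;
}
var_dump($rejected); // bool(true)
```

The checked value could then be passed as the `$time` argument of `generate()`, at the cost of failing requests while the clock catches up.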