Solving lock loss when the master node goes down in a Redis distributed lock under a master-slave architecture

Posted by shailendra on Tue, 04 Jan 2022 08:08:54 +0100

 

Common implementation

When it comes to Redis distributed locks, most people think of setnx + Lua, or of set key value px milliseconds nx. The core commands of the latter approach are as follows:

- Obtain the lock (unique_value can be a UUID, etc.)

SET resource_name unique_value NX PX 30000 

- Release the lock (in the Lua script, be sure to compare the value to prevent accidental unlocking)

if redis.call("get",KEYS[1]) == ARGV[1] 
then return redis.call("del",KEYS[1]) 
else return 0 end

There are three key points in this implementation (they also come up very often in interviews):

  1. The set command must take the form set key value px milliseconds nx;
  2. The value must be unique;
  3. When releasing the lock, verify the value so that another client's lock is not released by mistake.
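
The three points can be illustrated with a small in-memory model. This is only a sketch: the class SingleNodeLockSketch and its methods setNxPx and releaseIfOwner are hypothetical names that mimic the semantics of the SET ... NX PX command and of the Lua release script; it does not talk to a real Redis server, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// In-memory stand-in for one Redis node (an assumption for illustration, not a Redis client).
class SingleNodeLockSketch {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Mimics: SET key value NX PX ttlMillis. Returns true when the key was set.
    synchronized boolean setNxPx(String key, String value, long ttlMillis, long nowMillis) {
        Entry current = store.get(key);
        if (current != null && current.expiresAtMillis > nowMillis) {
            return false; // key exists and has not expired, so NX refuses to overwrite it
        }
        store.put(key, new Entry(value, nowMillis + ttlMillis));
        return true;
    }

    // Mimics the Lua release script: delete the key only when the stored value matches.
    synchronized boolean releaseIfOwner(String key, String value, long nowMillis) {
        Entry current = store.get(key);
        if (current == null || current.expiresAtMillis <= nowMillis) {
            return false; // nothing left to release
        }
        if (!current.value.equals(value)) {
            return false; // the lock belongs to someone else; do not unlock it by mistake
        }
        store.remove(key);
        return true;
    }

    public static void main(String[] args) {
        SingleNodeLockSketch redis = new SingleNodeLockSketch();
        String clientA = UUID.randomUUID().toString();
        String clientB = UUID.randomUUID().toString();
        long now = 0L;
        System.out.println(redis.setNxPx("resource_name", clientA, 30000, now)); // true
        System.out.println(redis.setNxPx("resource_name", clientB, 30000, now)); // false
        System.out.println(redis.releaseIfOwner("resource_name", clientB, now)); // false
        System.out.println(redis.releaseIfOwner("resource_name", clientA, now)); // true
    }
}
```

Without the value comparison in releaseIfOwner, client B could delete the lock that client A still holds.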

In fact, the biggest disadvantage of this simple implementation is that the lock is taken on only one Redis node. Even if Redis is made highly available through Sentinel, the lock can be lost if a master-slave switch occurs for any reason:

  • A client obtains the lock on the Redis master node;
  • However, the locked key has not yet been synchronized to the slave node;
  • The master fails, failover occurs, and the slave node is promoted to master;
  • The lock is lost: the new master has no record of the key, so another client can obtain the same lock.
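
The sequence above can be replayed with two in-memory maps standing in for the master and the slave. This is a sketch under stated assumptions: Node, setNx and bothClientsHoldLock are hypothetical names, expiry is left out, and "replication" is simply a copy that never happens because the master fails first.

```java
import java.util.HashMap;
import java.util.Map;

class FailoverLockLossSketch {
    // Hypothetical in-memory node: key -> owner value (expiry omitted for brevity).
    static final class Node {
        final Map<String, String> data = new HashMap<>();

        // Mimics SET key value NX: succeeds only when the key is absent.
        boolean setNx(String key, String value) {
            return data.putIfAbsent(key, value) == null;
        }
    }

    // Replays the failover scenario and reports whether two clients hold the lock at once.
    static boolean bothClientsHoldLock() {
        Node master = new Node();
        Node slave = new Node();

        // Client A obtains the lock on the master.
        boolean aHolds = master.setNx("resource_name", "client-A");

        // Replication is asynchronous: the master fails before the key is copied,
        // so nothing is written to the slave here.

        // Failover: the slave is promoted to master.
        Node newMaster = slave;

        // Client B asks the new master for the same lock and succeeds,
        // because the key never arrived there.
        boolean bHolds = newMaster.setNx("resource_name", "client-B");

        return aHolds && bHolds;
    }

    public static void main(String[] args) {
        System.out.println("two holders at once: " + bothClientsHoldLock()); // two holders at once: true
    }
}
```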

For this reason, Redis author antirez proposed a more advanced distributed lock implementation for distributed environments: Redlock.

The author believes that, of all the Redis distributed lock implementations, Redlock is also the one most likely to impress an interviewer.

Redlock implementation

The redlock algorithm proposed by antirez is roughly as follows:

In a Redis distributed environment, we assume that there are N Redis masters. These nodes are completely independent of each other, with no master-slave replication or other cluster coordination mechanism. Locks are obtained and released on each of the N instances using the same method as on a single Redis instance. Now assume that there are five Redis master nodes, running on five different servers so that they will not all go down at the same time.

To get the lock, the client does the following:

  1. Get the current Unix time in milliseconds.
  2. Try to obtain the lock on all 5 instances in turn, using the same key and the same unique value on each. When requesting the lock from Redis, the client should set a network connection and response timeout that is less than the lock expiration time. For example, if the lock automatically expires after 10 seconds, the timeout should be between 5 and 50 milliseconds. This prevents the client from waiting for a response from a Redis server that is already down. If a server fails to respond within the timeout, the client should move on to the next Redis instance as soon as possible.
  3. Subtract the start time (recorded in step 1) from the current time to get the time spent acquiring the lock. The lock is acquired if and only if it was obtained on a majority of the Redis nodes (N/2+1, here three nodes) and the time spent is less than the lock expiration time.
  4. If the lock is acquired, the real validity time of the key equals the original validity time minus the time spent acquiring the lock (the result calculated in step 3).
  5. If acquisition fails for any reason (the lock was not obtained on at least N/2+1 Redis instances, or the time spent exceeded the validity time), the client should unlock on all Redis instances, including those where locking appeared to fail. Otherwise a node may have set the key while the client never received the response, and the lock could not be acquired again until that key expired.
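
The acquisition procedure can be sketched against in-memory nodes. This is an illustration under assumptions, not Redisson's or antirez's code: Node, setNx, releaseIfOwner and tryLock are hypothetical names, key expiry on the nodes is omitted, and a node that is down simply answers "not acquired" instead of timing out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RedlockSketch {
    // Hypothetical in-memory stand-in for one independent Redis master.
    static final class Node {
        final Map<String, String> data = new HashMap<>();
        boolean up = true;

        // Mimics SET key value NX PX ttl; a node that is down models a request timeout.
        boolean setNx(String key, String value) {
            if (!up) {
                return false;
            }
            return data.putIfAbsent(key, value) == null;
        }

        // Mimics the compare-and-delete Lua script.
        void releaseIfOwner(String key, String value) {
            if (up) {
                data.remove(key, value);
            }
        }
    }

    // Returns the remaining validity time in ms, or -1 when the lock was not acquired.
    static long tryLock(List<Node> nodes, String key, String value, long ttlMillis) {
        long start = System.currentTimeMillis();           // record the start time
        int acquired = 0;
        for (Node node : nodes) {                           // try every instance in turn
            if (node.setNx(key, value)) {
                acquired++;
            }
        }
        long elapsed = System.currentTimeMillis() - start;  // time spent acquiring
        int quorum = nodes.size() / 2 + 1;                  // majority, e.g. 3 of 5
        long validity = ttlMillis - elapsed;                // real validity time
        if (acquired >= quorum && validity > 0) {
            return validity;
        }
        for (Node node : nodes) {                           // failure: unlock on all instances
            node.releaseIfOwner(key, value);
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            nodes.add(new Node());
        }
        nodes.get(0).up = false; // two of five nodes are down, a majority is still reachable
        nodes.get(1).up = false;
        System.out.println(tryLock(nodes, "REDLOCK_KEY", "client-A", 10000) > 0);   // true
        System.out.println(tryLock(nodes, "REDLOCK_KEY", "client-B", 10000) == -1); // true
    }
}
```

Because a majority is required, losing one node (or one node's copy of the key) is not enough for a second client to obtain the same lock.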

Redlock source code

Redisson has encapsulated the Redlock algorithm. Next, we briefly introduce its usage and analyze the core source code (assuming five Redis instances).

POM dependency: org.redisson:redisson:3.3.2

Usage

First, let's look at how to use the distributed lock that Redisson builds on the Redlock algorithm. It is very simple and somewhat similar to a reentrant lock:

Config config1 = new Config(); 
config1.useSingleServer()
       .setAddress("redis://192.168.0.1:5378") 
       .setPassword("a123456")
       .setDatabase(0); 
RedissonClient redissonClient1 = Redisson.create(config1); 
Config config2 = new Config(); 
config2.useSingleServer()
        .setAddress("redis://192.168.0.1:5379") 
        .setPassword("a123456")
        .setDatabase(0); 
RedissonClient redissonClient2 = Redisson.create(config2); 
Config config3 = new Config(); 
config3.useSingleServer()
       .setAddress("redis://192.168.0.1:5380") 
       .setPassword("a123456")
       .setDatabase(0); 
RedissonClient redissonClient3 = Redisson.create(config3); 
String resourceName = "REDLOCK_KEY"; 
RLock lock1 = redissonClient1.getLock(resourceName); 
RLock lock2 = redissonClient2.getLock(resourceName); 
RLock lock3 = redissonClient3.getLock(resourceName); 
// Try locking three redis instances 
RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3); 
boolean isLock; 
try {
    // isLock = redLock.tryLock();
    // If the lock cannot be obtained within 500 ms, acquisition is considered failed.
    // 10000 ms (10 s) is the lock expiration time.
    isLock = redLock.tryLock(500, 10000, TimeUnit.MILLISECONDS); 
    System.out.println("isLock = "+isLock); 
    if (isLock) { 
        //TODO if get lock success, do something; 
    }
} catch (Exception e) {
    // Exception handling omitted
} finally {
    // In any case, unlock in the end
    redLock.unlock();
}

Unique ID

A very important point in implementing distributed locks is that the value that is set must be unique. How does Redisson ensure the uniqueness of the value? The answer is UUID + threadId.

The entry point is redissonClient.getLock("REDLOCK_KEY"); the source code is in Redisson.java and RedissonLock.java:

protected final UUID id = UUID.randomUUID(); 
String getLockName(long threadId) {
    return id + ":" + threadId;
}

Acquire lock

The code to obtain the lock is redLock.tryLock() or redLock.tryLock(500, 10000, TimeUnit.MILLISECONDS). Both end up in the core source code below; the difference is that the former uses the default lease time LOCK_EXPIRATION_INTERVAL_SECONDS, i.e. 30s:

<T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    internalLockLeaseTime = unit.toMillis(leaseTime);
    // The Lua script executed on the Redis instance when obtaining the lock
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
        // If the KEY of the distributed lock does not exist, execute hset (hset REDLOCK_KEY uuid+threadId 1)
        // and set the expiration time (also the lease time of the lock) through pexpire
        "if (redis.call('exists', KEYS[1]) == 0) then "
            + "redis.call('hset', KEYS[1], ARGV[2], 1); "
            + "redis.call('pexpire', KEYS[1], ARGV[1]); "
            + "return nil; "
        + "end; "
        // If the KEY already exists and the value matches, the lock is held by the current thread:
        // increase the reentry count by 1 and reset the expiration time
        + "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then "
            + "redis.call('hincrby', KEYS[1], ARGV[2], 1); "
            + "redis.call('pexpire', KEYS[1], ARGV[1]); "
            + "return nil; "
        + "end; "
        // Otherwise return the number of milliseconds until the KEY expires
        + "return redis.call('pttl', KEYS[1]);",
        // These three parameters correspond to KEYS[1], ARGV[1] and ARGV[2] respectively
        Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
}
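
To make the three branches of the script easier to follow, here is a minimal in-memory model of the same logic. It is a sketch, not Redisson code: ReentrantAcquireSketch, tryAcquire and holdCount are hypothetical names, the Redis hash is replaced by a Java map, and the current time is passed in explicitly. Like the script, tryAcquire returns null on success and the remaining time to live otherwise.

```java
import java.util.HashMap;
import java.util.Map;

class ReentrantAcquireSketch {
    // lock key -> (holder name -> reentry count), standing in for the Redis hash
    private final Map<String, Map<String, Integer>> hashes = new HashMap<>();
    // lock key -> absolute expiry time in ms
    private final Map<String, Long> expiresAt = new HashMap<>();

    // Returns null when the lock is acquired or re-entered, otherwise the remaining TTL in ms.
    synchronized Long tryAcquire(String key, long leaseMillis, String holder, long nowMillis) {
        Long expiry = expiresAt.get(key);
        if (expiry != null && expiry <= nowMillis) {      // emulate Redis expiring the key
            hashes.remove(key);
            expiresAt.remove(key);
        }
        Map<String, Integer> hash = hashes.get(key);
        if (hash == null) {                               // exists == 0
            hash = new HashMap<>();
            hash.put(holder, 1);                          // hset key holder 1
            hashes.put(key, hash);
            expiresAt.put(key, nowMillis + leaseMillis);  // pexpire key lease
            return null;
        }
        if (hash.containsKey(holder)) {                   // hexists == 1
            hash.merge(holder, 1, Integer::sum);          // hincrby key holder 1
            expiresAt.put(key, nowMillis + leaseMillis);  // pexpire key lease
            return null;
        }
        return expiresAt.get(key) - nowMillis;            // pttl key
    }

    // Reentry count of the given holder, or 0 when it does not hold the lock.
    synchronized int holdCount(String key, String holder) {
        Map<String, Integer> hash = hashes.get(key);
        return hash == null ? 0 : hash.getOrDefault(holder, 0);
    }

    public static void main(String[] args) {
        ReentrantAcquireSketch redis = new ReentrantAcquireSketch();
        System.out.println(redis.tryAcquire("REDLOCK_KEY", 30000, "uuid:1", 0));    // null
        System.out.println(redis.tryAcquire("REDLOCK_KEY", 30000, "uuid:1", 1000)); // null
        System.out.println(redis.tryAcquire("REDLOCK_KEY", 30000, "uuid:2", 2000)); // 29000
    }
}
```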

In the command to obtain the lock,

  • KEYS[1] is Collections.singletonList(getName()), which represents the key of the distributed lock, namely REDLOCK_KEY;
  • ARGV[1] is internalLockLeaseTime, the lease time of the lock, which is 30s by default;
  • ARGV[2] is getLockName(threadId), the unique value set when obtaining the lock, namely UUID+threadId.

Release the lock

The code for releasing the lock is redLock.unlock(); the core source code is as follows:

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    // The Lua script executed on the Redis instance when releasing the lock
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
        // If the KEY of the distributed lock does not exist, publish a message to the channel
        "if (redis.call('exists', KEYS[1]) == 0) then "
            + "redis.call('publish', KEYS[2], ARGV[1]); "
            + "return 1; "
        + "end; "
        // If the KEY exists but the value does not match, the lock is held by someone else, so return directly
        + "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then "
            + "return nil; "
        + "end; "
        // The current thread holds the lock: reduce the reentry count by 1
        + "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); "
        // If the count is still greater than 0, the lock was re-entered: only reset the expiration time, do not delete
        + "if (counter > 0) then "
            + "redis.call('pexpire', KEYS[1], ARGV[2]); "
            + "return 0; "
        + "else "
            // If the count has reached 0, delete the KEY and publish the unlock message
            + "redis.call('del', KEYS[1]); "
            + "redis.call('publish', KEYS[2], ARGV[1]); "
            + "return 1; "
        + "end; "
        + "return nil;",
        // These five parameters correspond to KEYS[1], KEYS[2], ARGV[1], ARGV[2] and ARGV[3] respectively
        Arrays.<Object>asList(getName(), getChannelName()), LockPubSub.unlockMessage, internalLockLeaseTime, getLockName(threadId));
}
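
The release script can be modelled the same way. Again this is only a sketch with hypothetical names (ReentrantReleaseSketch, release, publishedMessages): one Java map stands in for the hash of a single lock key, a counter stands in for the publish call, and the lease refresh done by pexpire is not modelled. The return value mirrors the script: true for 1, false for 0, null for nil.

```java
import java.util.HashMap;
import java.util.Map;

class ReentrantReleaseSketch {
    // holder name -> reentry count for one lock key; an empty map means the key does not exist
    final Map<String, Integer> hash = new HashMap<>();
    // counts how many unlock messages would have been published to the channel
    int publishedMessages = 0;

    // TRUE = lock fully released (or already gone), FALSE = still held after decrementing,
    // null = the caller is not the holder.
    Boolean release(String holder) {
        if (hash.isEmpty()) {                               // exists == 0
            publishedMessages++;                            // publish channel unlockMessage
            return Boolean.TRUE;
        }
        if (!hash.containsKey(holder)) {                    // hexists == 0
            return null;
        }
        int counter = hash.merge(holder, -1, Integer::sum); // hincrby key holder -1
        if (counter > 0) {
            return Boolean.FALSE;                           // re-entered: keep the key
        }
        hash.clear();                                       // del key
        publishedMessages++;                                // publish channel unlockMessage
        return Boolean.TRUE;
    }

    public static void main(String[] args) {
        ReentrantReleaseSketch lock = new ReentrantReleaseSketch();
        lock.hash.put("uuid:1", 2);                  // held twice by the same thread
        System.out.println(lock.release("uuid:2"));  // null
        System.out.println(lock.release("uuid:1"));  // false
        System.out.println(lock.release("uuid:1"));  // true
    }
}
```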

Reference: https://redis.io/topics/distloc