Implementing a Redis distributed lock in code, from zero to one

Posted by phifgo on Thu, 09 Dec 2021 07:03:02 +0100

Some readers say that understanding the idea is enough. But if you never build it yourself, how can you understand it thoroughly? Let's do it together!

Usage scenario and model selection

In a distributed, multi-node deployment, shared data can be modified by several nodes at the same time. Where data consistency is required, a global lock is needed to keep concurrent operations consistent, for example when deducting inventory or when listing and updating the same product.

Distributed locks are commonly implemented with ZooKeeper or Redis. How do you choose?

In production, performance usually comes first. Weighing the strengths and weaknesses of each, we generally prefer Redis.

Implementing distributed locks from 0 to 1

Step 1: building the basic lock and unlock capability

Jedis.set(key, value, params) 👏🏻
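The extended SET command, with the NX and PX options available since Redis 2.6.12, is really good. It sets the value and the lock timeout in one atomic step, which prevents the deadlock that would occur if the service went down after locking but before setting the expiry~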

(1) A distributed lock object that can lock and unlock needs at least a Jedis client, the Redis key that represents the lock, and the lock timeout:

import redis.clients.jedis.Jedis;

// Building the distributed lock object
public class DistributedLock {
    private Jedis jedis;
    private String lockName;        // Redis key used as the lock
    private long lockExpireSecond;  // lock expiry time, in seconds

    public DistributedLock(Jedis jedis, String lockName, long lockExpireSecond) {
        this.jedis = jedis;
        this.lockName = lockName;
        this.lockExpireSecond = lockExpireSecond;
    }
}

(2) Using the SetParams class provided by Jedis, NX and PX are applied atomically in a single jedis.set call:

public void lock() throws BizException {
    String lockResult = null;
    try {
        // Set NX PX parameters
        SetParams params = new SetParams();
        params.nx();
        params.px(TimeUnit.SECONDS.toMillis(lockExpireSecond));
        // Execute locking; the value is a fixed string for now
        lockResult = this.jedis.set(this.lockName, "lockValue", params);
    } catch (Exception e) {
        LOG.error("lock error", e);
    }

    if ("OK".equals(lockResult)) {
        LOG.debug("locked success,lockName:{}", lockName);
    } else {
        throw new BizException("Get lock failed.");
    }
}

(3) Use the jedis.del command to unlock:

public boolean unlock() {
    boolean unlockResult = false;

    try {
        this.jedis.del(this.lockName);
        unlockResult = true;
    } catch (Exception e) {
        LOG.error("unLock error", e);
    }
    return unlockResult;
}

Step 2: locking failed. Give up right away? Better to retry a few times

The constructor and lock() above are all-or-nothing: if the first attempt fails, the caller gets an exception immediately. That does not meet production needs. In many scenarios the business logic holding the lock runs very quickly, so waiting a little is enough. What can we do?

Let the caller specify a wait timeout and a retry interval, and retry for a bounded time

// New attribute: retry interval, in milliseconds (it is passed to Thread.sleep)
private long retryIntervalTime;

// Initialize the retry interval through the constructor
public DistributedLock(Jedis jedis, String lockName, long lockExpireSecond, long retryIntervalTime) {
    // ... other assignments omitted
    this.retryIntervalTime = retryIntervalTime;
}

// New input parameters: how long to wait for the lock before giving up
public void lock(long timeout, TimeUnit unit) throws TimeoutException {
    String lockResult = null;
    try {
        // Set NX PX parameters
        SetParams params = new SetParams();
        params.nx();
        params.px(TimeUnit.SECONDS.toMillis(lockExpireSecond));

        // Lock start time
        long startTime = System.nanoTime();

        // Bounded retry loop: try again until the lock is acquired or the wait times out.
        // isTimeout(start, limitNanos) is a helper that is not shown in this article.
        while (!"OK".equals(lockResult = this.jedis.set(this.lockName, "lockValue", params))
                && !isTimeout(startTime, unit.toNanos(timeout))) {
            Thread.sleep(retryIntervalTime);
        }
    } catch (InterruptedException e) {
        // Thread.sleep clears the interrupt flag when it throws; restore it for the caller
        Thread.currentThread().interrupt();
        LOG.error("lock interrupted", e);
    } catch (Exception e) {
        LOG.error("lock error", e);
    }

    // The thrown exception type is changed to a timeout exception
    if ("OK".equals(lockResult)) {
        LOG.debug("locked success,lockName:{}", lockName);
    } else {
        throw new TimeoutException("Get lock failed because of timeout.");
    }
}
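
The retry loop calls isTimeout(...), whose body is not shown in the article. A minimal sketch of what it might look like, assuming it receives the System.nanoTime() start value and the wait limit in nanoseconds (the holder class LockTimeouts is a hypothetical name used only to make the snippet compile on its own):

```java
final class LockTimeouts {
    private LockTimeouts() {}

    // True once at least timeoutNanos have elapsed since startNanos.
    // startNanos must come from System.nanoTime(); subtracting before
    // comparing keeps the result correct even if the counter overflows.
    static boolean isTimeout(long startNanos, long timeoutNanos) {
        return System.nanoTime() - startNanos >= timeoutNanos;
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is monotonic, so a wall-clock adjustment during the wait does not shorten or lengthen the timeout.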

Step 3: only release the lock you acquired; never touch anyone else's

Consider a problem: to guard against the machine going down after locking, we gave the lock an expiry time, so that later callers can still acquire the lock even when the node that held it is down and cannot unlock.

In the figure above, unpredictable business execution time (or unexpected pauses such as GC) causes problems for a distributed lock that expires.

Let's look at problem 1 first: thread 1 released the lock held by thread 2! This happens when thread 1's lock expires while it is still working, thread 2 then acquires the lock, and thread 1 finally calls unlock() and deletes thread 2's key. What can we do?

When locking, store the owner's ID as the value; when unlocking, verify it and never release a lock that is not your own

// Other attributes are omitted; the lockOwner ID is added
private String lockOwner;

// Initialize the lockOwner identity through the constructor
public DistributedLock(Jedis jedis, String lockName, String lockOwner, long lockExpireSecond, long retryIntervalTime) {
    // ... other assignments omitted
    this.lockOwner = lockOwner;
}

public void lock(long timeout, TimeUnit unit) throws TimeoutException {
    String lockResult = null;
    try {
        // Set NX PX parameters
        SetParams params = new SetParams();
        params.nx();
        params.px(TimeUnit.SECONDS.toMillis(lockExpireSecond));

        // Lock start time
        long startTime = System.nanoTime();

        // The value written by SET is now lockOwner
        while (!"OK".equals(lockResult = this.jedis.set(this.lockName, this.lockOwner, params))
                && !isTimeout(startTime, unit.toNanos(timeout))) {
            Thread.sleep(retryIntervalTime);
        }
    } catch (InterruptedException e) {
        // Restore the interrupt flag that Thread.sleep cleared
        Thread.currentThread().interrupt();
        LOG.error("lock interrupted", e);
    } catch (Exception e) {
        LOG.error("lock error", e);
    }
    // ... result check omitted (same as in step 2)
}
    
public boolean unlock() {
    boolean unlockResult = false;
    try {
        // Get the value first and match it with the current lockOwner before unlocking
        if (this.lockOwner.equals(this.jedis.get(this.lockName))) {
            this.jedis.del(this.lockName);
            unlockResult = true;
        }
    } catch (Exception e) {
        LOG.error("unLock error", e);
    }
    return unlockResult;
}
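
The lockOwner passed to the constructor has to be unique across every node and thread. A bare thread ID is not, because the same thread IDs occur in every JVM. One possible way to build the value, as a sketch only (the class LockOwners and the "nodeId:threadId" layout are assumptions, not part of the original code):

```java
import java.util.UUID;

final class LockOwners {
    // One random ID per JVM, created when this class is loaded.
    private static final String NODE_ID = UUID.randomUUID().toString();

    private LockOwners() {}

    // Combines the per-JVM ID with the calling thread's ID, so two threads
    // on one node, or equal thread IDs on two nodes, get different values.
    static String currentOwner() {
        return NODE_ID + ":" + Thread.currentThread().getId();
    }
}
```

Because the value is stable for a given thread, the same thread can recognize its own lock later, which is also what a reentrant variant would need.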

Some readers pointed out that the check and the delete in unlock() should be wrapped in a Lua script so that they run atomically. They have a point, and not only for efficiency: GET and DEL here are two separate commands, so the lock can expire and be acquired by another client between them, and the DEL would then remove that client's lock. The window is small, because DEL only runs after the owner check has passed, but a Lua script closes it completely and also saves one network round trip.
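
The Lua-wrapped unlock mentioned above could look like the following sketch. It is an illustration under assumptions, not the author's code: AtomicUnlock and the ScriptRunner interface are hypothetical names, introduced so that the example compiles without Jedis on the classpath. The script is the usual compare-and-delete pattern; because Redis executes a script as a single unit, the lock cannot expire and change hands between the comparison and the DEL.

```java
import java.util.Collections;
import java.util.List;

final class AtomicUnlock {
    // Deletes the key only if its value still equals the caller's owner ID.
    // DEL returns the number of keys removed, so 1 means "unlocked".
    static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('del', KEYS[1]) "
          + "else return 0 end";

    // Hypothetical seam standing in for the Redis client. It mirrors the
    // shape of Jedis.eval(String, List<String>, List<String>).
    interface ScriptRunner {
        Object eval(String script, List<String> keys, List<String> args);
    }

    private AtomicUnlock() {}

    // Returns true only if this owner held the lock and it was deleted.
    static boolean unlock(ScriptRunner redis, String lockName, String lockOwner) {
        Object result = redis.eval(UNLOCK_SCRIPT,
                Collections.singletonList(lockName),
                Collections.singletonList(lockOwner));
        return Long.valueOf(1L).equals(result);
    }
}
```

With a real client, the call would presumably be AtomicUnlock.unlock(jedis::eval, lockName, lockOwner), assuming the Jedis version in use exposes that eval overload.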

Step 4: concurrency conflicts caused by a too-short expiry time

This is problem 2 in the earlier figure: while thread 1 is still executing, its lock expires and is released, so thread 2 acquires the lock and the two threads run the business logic concurrently. What can we do?

While the lock is held, extend its expiry time dynamically as needed

How to trigger the lock renewal is also an important choice. The JDK's native Timer, a scheduled thread pool, and Netty's timer can all do it. Which one is better?

Weighing accuracy against resource consumption, Netty's HashedWheelTimer, which uses the time wheel algorithm, should be the first choice. It can schedule heartbeat detection for thousands of connections, so renewing a lock is well within its reach.

• First, build a global Timer to store and schedule tasks.
• Second, add a timed task after the lock is acquired.
• Third, when the task fires, verify that the current thread still holds the lock before extending it.
• Finally, cancel the timed task when unlocking.
• Note that the task must re-register itself each round, and thread interruption has to be taken into account.

Build a distributed lock context to store the global time wheel scheduler:

public class LockContext {

    private HashedWheelTimer timer;

    private LockContext() {
        // Time wheel parameters can be obtained from the business's own configuration
        // long tickDuration=(Long) config.get("tickDuration");
        // int tickPerWheel=(int) config.get("tickPerWheel"); // Default 1024
        // boolean leakDetection=(Boolean)config.get("leakDetection");
        timer = new HashedWheelTimer(new DefaultThreadFactory("distributedLock-timer", true), 10, TimeUnit.MILLISECONDS, 1024, false);
    }

    // Accessor used later as context.getTimer()
    public HashedWheelTimer getTimer() {
        return timer;
    }

    // ... how the single LockContext instance is created and exposed is not shown
}

Pass the context and scheduler into the distributed lock object through the constructor:

public class DistributedLock {
    // Context
    private LockContext context;
    // Timeout handle of the currently scheduled renewal task
    private volatile Timeout lockTimeout;

    public DistributedLock(Jedis jedis, String lockName, String lockOwner, long lockExpireSecond, long retryIntervalTime, LockContext context) {
        // ... other assignments omitted
        this.context = context;
    }

    // ... lock(), unlock() and the renewal methods follow
}

After the lock is acquired, register the renewal task with the scheduler:

public void lock(long timeout, TimeUnit unit) throws TimeoutException {
    // ... locking strategy (same as in step 3)

    if ("OK".equals(lockResult)) {
        LOG.info("locked success,lockName:{}", lockName);
        try {
            // Register the recurring renewal task
            registerLoopReExpire();
        } finally {
            // If the thread was interrupted, do not keep renewing on its behalf
            if (Thread.currentThread().isInterrupted() && this.lockTimeout != null) {
                LOG.warn("Thread interrupt, scheduled task cancel");
                this.lockTimeout.cancel();
            }
        }
    } else {
        throw new TimeoutException("Get lock failed because of timeout.");
    }
}

The method registerLoopReExpire() contains the actual task registration and the renewal operation:

private void registerLoopReExpire() {
    LOG.info("Distributed lock deferred task registration");

    // Keep the Timeout returned by each registration on the lock object,
    // so that a later unlock can cancel it
    this.lockTimeout = context.getTimer().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {
            // Verify that the lock is still held and extend the expiration time
            boolean isReExpired = reExpireLock(lockName, lockOwner);

            if (isReExpired) {
                // Call itself to register the next round
                registerLoopReExpire();
            } else {
                lockTimeout.cancel();
            }
        }
    }, TimeUnit.SECONDS.toMillis(lockExpireSecond) / 2, TimeUnit.MILLISECONDS);

    LOG.info("Distributed lock delay task registration completed");
}

Here are several points to focus on:

• newTimeout() returns a Timeout object, which we need in order to manage the current task, so it is assigned to a field of the lock object.
• The renewal must be checked against both lockName and lockOwner: the expiry may only be extended while the lock is still held. A Lua script is needed to make the check and the extension atomic.
• After each renewal attempt, act on the result: if it succeeded, register the next round; if it failed, cancel the current task.
• The task interval must be shorter than the lock's expiry time: take 1/2 or 1/3 of the expiry time, or a user-supplied value.
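
The article does not show the body of reExpireLock. As a rough sketch of the check-and-extend script it calls for, under stated assumptions: LockRenewal and ScriptRunner are hypothetical names, the extra expireMillis parameter is an addition that the original two-argument call does not have, and the interface stands in for the Redis client so the snippet compiles without Jedis.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LockRenewal {
    // Extends the TTL only if the key still holds the caller's owner ID.
    // PEXPIRE returns 1 when the timeout was set, so 1 means "renewed".
    static final String RENEW_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('pexpire', KEYS[1], ARGV[2]) "
          + "else return 0 end";

    // Hypothetical seam standing in for the Redis client. It mirrors the
    // shape of Jedis.eval(String, List<String>, List<String>).
    interface ScriptRunner {
        Object eval(String script, List<String> keys, List<String> args);
    }

    private LockRenewal() {}

    // Returns true if the lock was still held by lockOwner and was extended.
    static boolean reExpireLock(ScriptRunner redis, String lockName,
                                String lockOwner, long expireMillis) {
        Object result = redis.eval(RENEW_SCRIPT,
                Collections.singletonList(lockName),
                Arrays.asList(lockOwner, String.valueOf(expireMillis)));
        return Long.valueOf(1L).equals(result);
    }
}
```

A false result covers both "the key expired" and "someone else now holds the lock"; in either case the renewal loop should stop.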

Let's verify: set the lock expiry time to 3 seconds and the business execution time to 10 seconds, then run it:

The scheduled task renewed the lock 6 times. That matches the arithmetic: with a 3-second expiry the task fires every 1.5 seconds, at roughly 1.5 s, 3 s, 4.5 s, 6 s, 7.5 s and 9 s during the 10 seconds of work. The last registration succeeded, but that task was cancelled by the unlock once the business logic had finished.
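
The cancel-on-unlock step is not shown in the code above. The whole life cycle (register after locking, re-register after each successful renewal, stop on failure, cancel on unlock) can be sketched without any dependency by swapping Netty's HashedWheelTimer for the JDK's ScheduledExecutorService, one of the alternatives mentioned earlier. This is a stand-in for illustration, not the author's implementation; RenewalLoop and its members are hypothetical names.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class RenewalLoop {
    private final ScheduledExecutorService scheduler;
    private final BooleanSupplier renew;   // true if the lock was still ours and got extended
    private final long intervalMillis;     // e.g. half of the lock expiry time
    private volatile ScheduledFuture<?> pending;
    private volatile boolean cancelled;

    RenewalLoop(ScheduledExecutorService scheduler, BooleanSupplier renew, long intervalMillis) {
        this.scheduler = scheduler;
        this.renew = renew;
        this.intervalMillis = intervalMillis;
    }

    // Schedules the next round; call it once after the lock has been acquired.
    void start() {
        if (cancelled) {
            return;
        }
        pending = scheduler.schedule(() -> {
            // Re-register only while not cancelled and the renewal succeeded.
            if (!cancelled && renew.getAsBoolean()) {
                start();
            }
        }, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Call from unlock(): stops the loop and cancels the pending round.
    void cancel() {
        cancelled = true;
        ScheduledFuture<?> current = pending;
        if (current != null) {
            current.cancel(false);
        }
    }
}
```

The cancelled flag matters in addition to cancelling the future: if unlock() runs while a round is executing, that round may still schedule one more task, and the flag makes sure the extra task does nothing when it fires.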

Summary and review

In this article we coded a distributed lock from 0 to 1, covering the requirements from basic capability up to what a production environment demands.

It is worth mentioning that, apart from the renewal feature, most of the capabilities above have been tested in production. If you find any problem with the renewal implementation, please leave a comment so we can correct it and improve together.

Of course, some pieces are still missing, such as the Lua script that Jedis runs for the renewal and the conversion into a reentrant lock. For reasons of space they are not posted here. Interested readers can continue along the lines above.

In addition, the implementation above assumes a master-slave Redis architecture, so the lock may misbehave during a master-slave switch or other outages. Personally, I think that in most scenarios it is not necessary to sacrifice efficiency and adopt RedLock for the sake of stability. Several well-known bloggers have described this part very well; search for it and have a look.

Finally, comparing Redisson's distributed lock with our own implementation, the main logic turns out to be basically the same, but Redisson is more complete in terms of reentrancy and efficiency (it is built on Netty).

What is learned on paper always feels shallow; real understanding comes from practice ~ let's encourage each other~
