Cache penetration analysis and stress testing

Posted by error_22 on Tue, 04 Jan 2022 21:07:39 +0100

Cache penetration

Cache penetration refers to querying data that is certain not to exist in the database. The normal cache flow is to query the cache first. If the key does not exist or has expired, the database is queried and the result is put into the cache. If the database returns nothing, nothing is put into the cache, so every request for such a key goes through to the database.
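The read flow described above can be sketched with in-memory maps standing in for the cache and the database. The class name, the keys and the query counter are hypothetical and only serve the illustration; this is a minimal sketch, not the code used in the test later in this article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-ins for the cache and the database.
class CacheAsideDemo {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Map<String, String> database = new ConcurrentHashMap<>();
    static final AtomicInteger dbQueries = new AtomicInteger();

    // Query the cache first, fall back to the database on a miss,
    // and cache the result only when the database returned something.
    static String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        dbQueries.incrementAndGet();
        value = database.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        database.put("user:1", "alice");
        get("user:1");   // miss: loads from the database and fills the cache
        get("user:1");   // hit: no database query
        get("user:404"); // key does not exist: database query, nothing cached
        get("user:404"); // same key again: another database query
        System.out.println("database queries: " + dbQueries.get());
    }
}
```

Because an empty result is never cached, each lookup of the non-existent key costs one database query, which is exactly the penetration case.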

This article discusses one of the manifestations of cache breakdown:

Some keys with an expiration time set may be accessed with very high concurrency at certain points in time; they are very "hot" data. For these keys another problem has to be considered: the cache being "broken down".

  • Concept: a hot key expires at a certain point in time, and at that same moment a large number of concurrent requests arrive for it. When these requests find that the cache has expired, they generally all load the data from the backend DB and write it back to the cache. This burst of concurrent requests may overwhelm the backend DB instantly.
  • How to solve: use a mutex. In short, when the cache misses (the value is judged to be empty), do not load from the DB immediately. First use a cache operation whose return value reports success (such as SETNX in Redis or ADD in Memcached) to set a mutex key. If the operation succeeds, load from the DB and reset the cache; otherwise, retry the entire cache get method. Similar to the following code:
public String get(String key) throws InterruptedException {
    String value = redis.get(key);
    if (value == null) { // The cached value has expired
        // Give the mutex a 3-minute timeout so that a failed del does not block the next reload forever
        if (redis.setnx(key_mutex, 1, 3 * 60) == 1) { // The mutex was set successfully
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(key_mutex);
            return value;
        } else { // Another thread is already loading from the DB and will write the value back to the cache
            sleep(50);
            return get(key); // Retry and return whatever the retry finds
        }
    } else {
        return value;
    }
}
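The pseudocode above can be tried out without Redis. The following is a minimal sketch under stated assumptions: `ConcurrentHashMap.putIfAbsent` plays the role of SETNX, `loadFromDb` is a hypothetical slow lookup, and the class and method names are made up for the illustration. It also re-checks the cache after acquiring the mutex, a small addition to the pseudocode that stops a late thread from reloading a value that was written just before it got the mutex.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory simulation; putIfAbsent plays the role of SETNX.
class MutexCacheDemo {
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger dbLoads = new AtomicInteger();

    // Hypothetical slow database lookup that counts how often it is called.
    static String loadFromDb(String key) {
        dbLoads.incrementAndGet();
        pause(100);
        return "value-of-" + key;
    }

    static String get(String key) {
        String mutexKey = key + ":mutex";
        while (true) {
            String value = cache.get(key);
            if (value != null) {
                return value;
            }
            if (cache.putIfAbsent(mutexKey, "1") == null) { // mutex acquired
                try {
                    // Re-check: another thread may have filled the cache
                    // between our read and our acquiring the mutex.
                    value = cache.get(key);
                    if (value == null) {
                        value = loadFromDb(key);
                        cache.put(key, value);
                    }
                    return value;
                } finally {
                    cache.remove(mutexKey);
                }
            }
            pause(10); // another thread is loading; wait, then retry
        }
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Fires `threads` concurrent reads of one cold key and returns the number of database loads.
    static int run(int threads) {
        cache.clear();
        dbLoads.set(0);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                get("hot");
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return dbLoads.get();
    }

    public static void main(String[] args) {
        System.out.println("database loads: " + run(50));
    }
}
```

In this simulation the cached value never expires, so the counter stays at one load no matter how many threads are started. A real Redis setup additionally has to deal with expiry of both the value and the mutex key.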

Next, run a concurrent stress test and then optimize:

First, run the concurrent stress test without setNX.

The code is as follows:

package cn.chinotan.controller;

import lombok.extern.java.Log;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.concurrent.TimeUnit;

/**
 * @program: test
 * @description: redis test
 * @author: xingcheng
 * @create: 2019-03-09 16:26
 **/
@RestController
@RequestMapping("/redis")
@Log
public class RedisController {

    @Autowired
    StringRedisTemplate redisTemplate;

    public static final String KEY = "chinotan:redis:pass";

    public static final String VALUE = "redis-pass-value";

    /**
     * The simulation takes 3 seconds
     */
    public static final Long TIME_CONSUMING = 3 * 1000L;

    /**
     * VALUE Cache time 5 seconds
     */
    public static final Long VALUE_TIME = 5 * 1000L;

    @GetMapping(value = "/pass")
    public Object hello(HttpServletRequest request) throws Exception {
        long cacheStart = System.currentTimeMillis();
        String value = redisTemplate.opsForValue().get(KEY);
        long cacheEnd = System.currentTimeMillis();
        if (StringUtils.isBlank(value)) {
            // Simulate time-consuming operations and obtain data from the database
            long start = System.currentTimeMillis();
            TimeUnit.MILLISECONDS.sleep(TIME_CONSUMING);
            redisTemplate.opsForValue().set(KEY, VALUE, VALUE_TIME, TimeUnit.MILLISECONDS);
            long end = System.currentTimeMillis();
            log.info("Getting from the database takes time: " + (end - start) + "ms");
            return VALUE;
        } else {
            log.info("Time consuming to get from cache:" + (cacheEnd - cacheStart) + "ms");
            return value;
        }
    }

}

This very simple GET handler reads from the cache first. If the value is not there, it is fetched from the database. The database access is simulated with

TimeUnit.MILLISECONDS.sleep(TIME_CONSUMING);

which stands in for an expensive database query; the delay is set to 3 seconds.

This test is deployed on Spring Boot 2.0.0 or later, and JMeter runs the concurrent stress test.

Before the stress test, tune the thread and connection limits of the Tomcat embedded in Spring Boot, as well as the Redis connection pool.

The connection pool of redis is adjusted as follows:

spring:
  redis:
    database: 0
    host: 127.0.0.1
    jedis:
      pool:
        #Maximum number of connections the pool can allocate at a time
        max-active: 5000
        #Maximum number of idle connections in the pool
        max-idle: 5000
        #Maximum time to wait for a connection when the pool is exhausted. If this time is exceeded, an exception is thrown. Set to -1 to wait indefinitely.
        max-wait: -1
        #Minimum number of idle connections to keep in the pool
        min-idle: 10
#    lettuce:
#      pool:
#        max-active: 5000
#        max-idle: 5000
#        max-wait: -1
#        min-idle: 10
#      shutdown-timeout: 5000ms
    password:
    port: 6379
    timeout: 5000

tomcat is adjusted as follows:

server:
  port: 11111
  tomcat:
    uri-encoding: UTF-8
    max-threads: 500
    max-connections: 10000

With these settings, Redis and Tomcat can handle a large number of concurrent requests.

After changing the settings, check that they have actually taken effect.

For example, the Redis connection pool settings have no effect in this case:

The property names must match the configuration items exactly for the settings to be applied.

Then prepare the stress test: download JMeter and proceed as follows.

After startup, the console prints as follows:

You can see that under heavy concurrency many requests query the database instead of reading from the cache. The cache hit rate is low, so the cache loses much of its value.
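The effect can also be reproduced without Redis or JMeter. The sketch below is an in-memory simulation with hypothetical names, not the controller used in the test. A latch deliberately forces the worst case, in which every request sees the expired key before any of them has written the value back.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Unprotected cache read: every request that sees a miss goes to the database.
class UnguardedCacheDemo {
    // Returns how many simulated database loads `threads` concurrent requests cause.
    static int run(int threads) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        AtomicInteger dbLoads = new AtomicInteger();
        // Opens only after every thread has observed the cache miss (forced worst case).
        CountDownLatch allMissed = new CountDownLatch(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                if (cache.get("hot") != null) {
                    return; // cache hit; cannot happen while the latch is closed
                }
                allMissed.countDown();
                try {
                    allMissed.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                dbLoads.incrementAndGet(); // simulated database query
                cache.put("hot", "value");
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return dbLoads.get();
    }

    public static void main(String[] args) {
        System.out.println("database loads: " + run(50));
    }
}
```

With the latch in place, the number of database loads equals the number of concurrent requests. In a real stress test the number depends on timing, but it grows with the time the database query takes.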

Optimize as follows:

package cn.chinotan.controller;

import lombok.extern.java.Log;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.concurrent.TimeUnit;

/**
 * @program: test
 * @description: redis test
 * @author: xingcheng
 * @create: 2019-03-09 16:26
 **/
@RestController
@RequestMapping("/redis")
@Log
public class RedisController {

    @Autowired
    StringRedisTemplate redisTemplate;

    public static final String KEY = "chinotan:redis:pass";

    public static final String NX_KEY = "chinotan:redis:nx";

    public static final String VALUE = "redis-pass-value";

    /**
     * Retry interval while waiting for the lock: 50 milliseconds
     */
    public static final Long NX_SLEEP_TIME = 50L;

    /**
     * The simulated database query takes 1 second
     */
    public static final Long TIME_CONSUMING = 1 * 1000L;

    /**
     * VALUE Cache time 5 seconds
     */
    public static final Long VALUE_TIME = 5 * 1000L;

    /**
     * Lock cache time 5 minutes
     */
    public static final Long NX_TIME = 5 * 60L;

    @GetMapping(value = "/pass")
    public Object hello() throws Exception {
        long cacheStart = System.currentTimeMillis();
        String value = redisTemplate.opsForValue().get(KEY);
        long cacheEnd = System.currentTimeMillis();
        if (StringUtils.isBlank(value)) {
            long start = System.currentTimeMillis();
            if (setNX(NX_KEY, NX_KEY)) {
                // Simulate time-consuming operations and obtain data from the database
                TimeUnit.MILLISECONDS.sleep(TIME_CONSUMING);
                redisTemplate.opsForValue().set(KEY, VALUE, VALUE_TIME, TimeUnit.MILLISECONDS);
                long end = System.currentTimeMillis();
                redisTemplate.delete(NX_KEY);
                log.info("Getting from the database takes time: " + (end - start) + "ms");
                return VALUE;
            } else {
                TimeUnit.MILLISECONDS.sleep(NX_SLEEP_TIME);
                log.info("Cache penetration recursion");
                return hello();
            }

        } else {
            log.info("Time consuming to get from cache:" + (cacheEnd - cacheStart) + "ms");
            return value;
        }
    }

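    private boolean setNX(String key, String value) {
        // setIfAbsent can return null (for example inside a pipeline or transaction), so compare against TRUE
        boolean acquired = Boolean.TRUE.equals(redisTemplate.opsForValue().setIfAbsent(key, value));
        if (acquired) {
            // Only the lock holder sets the expiry, so waiting requests do not keep extending the lock.
            // This is still a second, separate request to Redis.
            redisTemplate.expire(key, NX_TIME, TimeUnit.SECONDS);
        }
        return acquired;
    }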

}

The setNX command does not overwrite an existing key: when the key already exists, it writes nothing and returns false; when the key does not exist, it writes the value and returns true. It is commonly used to implement distributed locks.

After this optimization, the burst of concurrent requests no longer hits the database. Requests that fail to get the lock retry recursively every 50 ms instead. Only one request queries the database, and the others read the value from the cache once it has been reloaded, which greatly increases the cache hit rate.
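The recursive retry adds one stack frame per attempt and has no upper limit. One possible variation, shown here only as a sketch, is a bounded polling loop. The helper name `poll`, its parameters and the `Supplier` standing in for the cache read are assumptions made for the illustration; they are not part of the controller above.

```java
import java.util.Optional;
import java.util.function.Supplier;

class BoundedRetry {
    // Reads the cache up to maxAttempts times, sleeping between attempts.
    // Returns an empty Optional if the value never shows up.
    static Optional<String> poll(Supplier<String> cacheRead, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String value = cacheRead.get();
            if (value != null) {
                return Optional.of(value);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] reads = {0};
        // Hypothetical cache read that starts returning a value on the third call.
        Optional<String> result = poll(() -> ++reads[0] < 3 ? null : "redis-pass-value", 10, 50);
        System.out.println(result.orElse("gave up") + " after " + reads[0] + " reads");
    }
}
```

The caller decides what to do when the loop gives up, for example returning an error or a stale value, instead of waiting for an unbounded time.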

The stress test results are as follows:

You can see that only one log entry reports a fetch from the database, so the performance problem caused by cache breakdown is avoided.

Next optimization direction:

The setNX operation implemented with RedisTemplate above is not atomic: saving the value and setting the expiration time are two separate requests, which can cause problems under concurrency. How can this be solved? Feel free to leave a comment.
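One common direction is to write the key and its expiration time in a single step. Redis supports this with `SET key value NX EX seconds`, and Spring Data Redis 2.1 and later expose it as `setIfAbsent(key, value, timeout, unit)` on `ValueOperations`; Spring Boot 2.0 ships an older Spring Data Redis, so using that overload would presumably require an upgrade. The sketch below only illustrates the semantics with an in-memory store. The class `AtomicLockStore` and its explicit `nowMillis` parameter are hypothetical and exist to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory illustration of "set if absent, with expiry" as one atomic step.
class AtomicLockStore {
    // Lock key -> time in milliseconds at which the lock expires.
    private final Map<String, Long> expiresAt = new HashMap<>();

    // Takes the lock only if it is free or already expired. The check and the
    // write happen under one monitor, so no other caller can slip in between.
    synchronized boolean setIfAbsent(String key, long ttlMillis, long nowMillis) {
        Long expiry = expiresAt.get(key);
        if (expiry != null && expiry > nowMillis) {
            return false; // lock is held and has not expired yet
        }
        expiresAt.put(key, nowMillis + ttlMillis);
        return true;
    }

    synchronized void delete(String key) {
        expiresAt.remove(key);
    }

    public static void main(String[] args) {
        AtomicLockStore store = new AtomicLockStore();
        long now = System.currentTimeMillis();
        System.out.println(store.setIfAbsent("chinotan:redis:nx", 300_000, now)); // first caller gets the lock
        System.out.println(store.setIfAbsent("chinotan:redis:nx", 300_000, now)); // second caller is rejected
    }
}
```

Because the lock can never exist without an expiry, a process that crashes right after taking the lock cannot leave it behind forever.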