Preface
We often use a cache to speed up data queries. Because cache capacity is limited, once it reaches its upper limit we have to evict some data to make room before new data can be added. Cached data cannot be removed at random; in general it is evicted according to some algorithm. The commonly used eviction algorithms are LRU, LFU, and FIFO. In this article, we will talk about the LRU algorithm.
LRU introduction
LRU is the abbreviation of Least Recently Used. The algorithm assumes that data used recently is hot data and is likely to be used again soon, while data that has rarely been used recently is unlikely to be needed in the near future. When the cache is full, the data that has been used least recently is evicted first.
Suppose the cache currently holds the data shown in the figure:
Here we call the first node in the list the head node, and the last node the tail node.
When the cache is asked for the data with key=1, the LRU algorithm moves node 1 to the head of the list while the remaining nodes stay unchanged, as shown in the figure.
Next we insert a node with key=8. At this point the cache has reached its capacity limit, so some data must be deleted before the new node can be added. Because every query moves the accessed data to the head of the list, data that is never queried sinks toward the tail, so the tail node can be regarded as the least recently accessed data and is the one to delete.
Then we add the data directly to the head node.
To summarize, the specific steps of the LRU algorithm are:
- New data is inserted directly into the list header
- Cache data is hit, move data to list header
- When the cache is full, remove the tail data of the list.
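Before writing our own implementation, it is worth noting that the JDK already offers this behavior: a LinkedHashMap constructed in access order keeps the least recently used entry as its eldest entry and lets us evict it by overriding removeEldestEntry. The sketch below only illustrates the three rules above; the demo class name, the capacity of 2, and the keys used are arbitrary choices for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapLruDemo {
    public static void main(String[] args) {
        final int capacity = 2; // arbitrary capacity chosen for the illustration
        // accessOrder = true: every get/put moves the entry to the most recently used position,
        // so the least recently used entry becomes the "eldest" entry in iteration order
        Map<Integer, Integer> cache =
                new LinkedHashMap<Integer, Integer>(capacity, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                        // evict the least recently used entry once the capacity is exceeded
                        return size() > capacity;
                    }
                };

        cache.put(1, 1);
        cache.put(2, 2);
        cache.get(1);    // key 1 becomes the most recently used entry
        cache.put(3, 3); // cache is full: key 2, the least recently used, is evicted
        System.out.println(cache.keySet()); // prints [1, 3]
    }
}
```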
LRU algorithm implementation
As the example above shows, the LRU algorithm needs to add nodes at the head and delete nodes at the tail. A linked list can add or delete a node in O(1) time, which makes it well suited as the container for the cached data. However, an ordinary singly linked list is not enough, because it has two drawbacks:
- To reach an arbitrary node we have to traverse the list from the head, so looking up a node costs O(N).
- To move or delete an intermediate node we need a reference to its previous node, and a singly linked list has to be traversed again to find it.
To solve the above problems, we can combine the linked list with other data structures.
If the nodes are stored in a hash table, looking up a node drops to O(1). For the movement problem, each node can carry a predecessor pointer that records its previous node, which turns the singly linked list into a doubly linked list.
To sum up, we use a doubly linked list combined with a hash table, and the data structure looks like the figure:
Two "sentry" nodes are specially added to the two-way linked list, which do not need to store any data. Using sentinel nodes, when adding / deleting nodes, you can simplify the programming difficulty and reduce the code complexity without considering the existence of boundary nodes.
The LRU implementation code is shown below. To keep things simple, both key and value are of type int.
```java
import java.util.HashMap;
import java.util.Map;

public class LRUCache {

    Entry head, tail;
    int capacity;
    int size;
    Map<Integer, Entry> cache;

    public LRUCache(int capacity) {
        this.capacity = capacity;
        // Initialize the linked list
        initLinkedList();
        size = 0;
        cache = new HashMap<>(capacity + 2);
    }

    /**
     * If the node does not exist, return -1. If it exists, move it to the head and return its value.
     *
     * @param key
     * @return
     */
    public int get(int key) {
        Entry node = cache.get(key);
        if (node == null) {
            return -1;
        }
        // The node exists: move it to the head
        moveToHead(node);
        return node.value;
    }

    /**
     * Add the node at the head. If the capacity is full, delete the tail node first.
     *
     * @param key
     * @param value
     */
    public void put(int key, int value) {
        Entry node = cache.get(key);
        if (node != null) {
            node.value = value;
            moveToHead(node);
            return;
        }
        // The node does not exist: if the capacity is full, delete the tail node first
        if (size == capacity) {
            Entry lastNode = tail.pre;
            deleteNode(lastNode);
            cache.remove(lastNode.key);
            size--;
        }
        // Then add the new node at the head
        Entry newNode = new Entry();
        newNode.key = key;
        newNode.value = value;
        addNode(newNode);
        cache.put(key, newNode);
        size++;
    }

    private void moveToHead(Entry node) {
        // First remove the node from its current position, then add it at the head
        deleteNode(node);
        addNode(node);
    }

    private void addNode(Entry node) {
        head.next.pre = node;
        node.next = head.next;
        node.pre = head;
        head.next = node;
    }

    private void deleteNode(Entry node) {
        node.pre.next = node.next;
        node.next.pre = node.pre;
    }

    public static class Entry {
        public Entry pre;
        public Entry next;
        public int key;
        public int value;

        public Entry(int key, int value) {
            this.key = key;
            this.value = value;
        }

        public Entry() {
        }
    }

    private void initLinkedList() {
        head = new Entry();
        tail = new Entry();
        head.next = tail;
        tail.pre = head;
    }

    public static void main(String[] args) {
        LRUCache cache = new LRUCache(2);
        cache.put(1, 1);
        cache.put(2, 2);
        System.out.println(cache.get(1));
        cache.put(3, 3);
        System.out.println(cache.get(2));
    }
}
```
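Running the main method above prints 1 (key 1 is still cached after being queried) and then -1 (key 2 was the least recently used entry and was evicted when key 3 was inserted). Since the hash table locates a node directly and the doubly linked list moves or removes it with a few pointer updates, both get and put run in O(1) time.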
LRU algorithm analysis
Cache hit rate is a very important metric for a cache system. If the hit rate is too low, queries fall back to the database, which increases the load on the database.
Let us analyze the advantages and disadvantages of the LRU algorithm.
The advantage of the LRU algorithm is that it is simple to implement and works very well for hot data.
Its disadvantage is that sporadic batch operations, such as a batch query of historical data, may replace the hot data in the cache with that historical data. This cache pollution lowers the cache hit rate and slows down normal queries.
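The pollution effect is easy to reproduce with the LRUCache class above. The scenario below is made up purely for illustration: key 1 plays the role of the hot data, and keys 100 to 102 stand for a one-off batch query of historical data.

```java
public class CachePollutionDemo {
    public static void main(String[] args) {
        LRUCache cache = new LRUCache(3);

        cache.put(1, 1);                  // hot data that is queried all the time
        System.out.println(cache.get(1)); // prints 1: the hot key is cached

        // a sporadic batch query over historical data fills the cache
        for (int key = 100; key <= 102; key++) {
            cache.put(key, key);
        }

        // the batch data has pushed the hot key out of the cache
        System.out.println(cache.get(1)); // prints -1: the cache has been polluted
    }
}
```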
LRU algorithm improvement scheme
The following scheme is derived from the improved LRU algorithm used by MySQL InnoDB.
The list is divided into two parts, a hot data area and a cold data area, as shown in the figure.
After the improvement, the algorithm flow is as follows:
- If the accessed data is located in the hot data area, it is moved to the head of the hot data area, just as in the original LRU algorithm.
- When inserting new data, if the cache is full, the data at the tail of the list is evicted; the new data is then inserted at the head of the cold data area.
- Each time data in the cold data area is accessed, the following judgment is made:
  - If the data has been in the cache longer than a specified time, for example 1 s, it is moved to the head of the hot data area.
  - If the data has been in the cache for less than the specified time, its position remains unchanged.
With this scheme, data from an occasional batch query only enters the cold data area and is soon evicted; the data in the hot data area is unaffected. This solves the drop in cache hit rate that the plain LRU algorithm suffers from.
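The sketch below illustrates this cold/hot split. It is only an approximation of the idea, not MySQL's actual implementation: the half-and-half capacity split, the residencyMillis parameter, and the use of two LinkedHashMaps as the two areas are assumptions made to keep the example short.

```java
import java.util.LinkedHashMap;

// A minimal sketch of an LRU cache with a cold area and a hot area.
public class ColdHotLruCache {
    private static class Entry {
        int value;
        long enteredColdAt; // time the entry entered the cold area
        Entry(int value, long enteredColdAt) { this.value = value; this.enteredColdAt = enteredColdAt; }
    }

    private final int hotCapacity;
    private final int coldCapacity;
    private final long residencyMillis; // minimum stay in the cold area before promotion, e.g. 1000 ms

    // hot area: access order, so iteration starts at its least recently used entry
    private final LinkedHashMap<Integer, Entry> hot = new LinkedHashMap<>(16, 0.75f, true);
    // cold area: insertion order, so iteration starts at its oldest entry (the overall list tail)
    private final LinkedHashMap<Integer, Entry> cold = new LinkedHashMap<>();

    public ColdHotLruCache(int capacity, long residencyMillis) {
        this.hotCapacity = capacity / 2;
        this.coldCapacity = capacity - hotCapacity;
        this.residencyMillis = residencyMillis;
    }

    public int get(int key) {
        Entry hotEntry = hot.get(key); // a hit here already moves the entry to the hot MRU position
        if (hotEntry != null) {
            return hotEntry.value;
        }
        Entry coldEntry = cold.get(key);
        if (coldEntry == null) {
            return -1;
        }
        // promote only if the entry has stayed in the cold area longer than the threshold
        if (System.currentTimeMillis() - coldEntry.enteredColdAt >= residencyMillis) {
            cold.remove(key);
            promoteToHot(key, coldEntry);
        }
        return coldEntry.value;
    }

    public void put(int key, int value) {
        if (hot.containsKey(key)) {
            hot.get(key).value = value;
            return;
        }
        if (cold.containsKey(key)) {
            cold.get(key).value = value;
            return;
        }
        // new data always enters at the head of the cold area;
        // when the cold area is full, evict its oldest entry (the overall list tail)
        if (cold.size() >= coldCapacity) {
            cold.remove(cold.keySet().iterator().next());
        }
        cold.put(key, new Entry(value, System.currentTimeMillis()));
    }

    private void promoteToHot(int key, Entry entry) {
        // if the hot area is full, demote its least recently used entry back to the cold area
        if (hot.size() >= hotCapacity) {
            Integer lruKey = hot.keySet().iterator().next();
            Entry demoted = hot.remove(lruKey);
            demoted.enteredColdAt = System.currentTimeMillis();
            if (cold.size() >= coldCapacity) {
                cold.remove(cold.keySet().iterator().next());
            }
            cold.put(lruKey, demoted);
        }
        hot.put(key, entry);
    }
}
```

With residencyMillis set to 1000 (the 1 s example above), a batch of historical keys that is read only once stays in the cold area and is evicted there, while keys that are accessed again after staying in the cache for more than one second are promoted to the hot area.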
Other improved algorithms include LRU-K, 2Q, and LIRS; interested readers can look them up.
Welcome to follow my WeChat official account: procedure, to get daily technical articles. If you are interested in my content, you can also follow my blog: studyidea.cn