Author: javadoop, Senior Java Engineer. This article is published with the author's authorization.
Original link: https://www.javadoop.com/post...
This article covers flow control in two parts.
The first part analyzes the source code of Guava's RateLimiter, including both of its modes. Most articles online analyze only the simple SmoothBursty mode, not the SmoothWarmingUp mode with preheating.
The second part describes how flow control is implemented in Sentinel. It does not require prior knowledge of Sentinel: this part is only loosely coupled to the rest of Sentinel, so there is no reading pressure.
Sentinel's flow control design draws on Guava's RateLimiter, so the second part builds on the background laid out in the first.
Guava RateLimiter
RateLimiter is based on the leaky bucket algorithm, while borrowing ideas from the token bucket algorithm. Flow control algorithms themselves are not discussed here; interested readers can look them up on their own.
Introduction to RateLimiter use
RateLimiter's interface is very simple. It provides two static factory methods for instantiation; after that, we only need to care about acquire — there isn't even a release operation.
// RateLimiter interface list:

// There are two ways to instantiate:
public static RateLimiter create(double permitsPerSecond) {}
public static RateLimiter create(double permitsPerSecond, long warmupPeriod, TimeUnit unit) {}

public double acquire() {}
public double acquire(int permits) {}

public boolean tryAcquire() {}
public boolean tryAcquire(int permits) {}
public boolean tryAcquire(long timeout, TimeUnit unit) {}
public boolean tryAcquire(int permits, long timeout, TimeUnit unit) {}

public final double getRate() {}
public final void setRate(double permitsPerSecond) {}
RateLimiter is used for rate limiting. We know the java.util.concurrent package provides Semaphore, which also controls access to a resource. Look at the following code:
Semaphore semaphore = new Semaphore(10);
for (int i = 0; i < 100; i++) {
    executor.submit(new Runnable() {
        @Override
        public void run() {
            semaphore.acquireUninterruptibly(1);
            try {
                doSomething();
            } finally {
                semaphore.release();
            }
        }
    });
}
Semaphore controls how many threads may access a resource concurrently. In the code above, we submit 100 tasks, but at most 10 threads can be executing doSomething() at the same time. Semaphore controls the degree of concurrency.
RateLimiter, on the other hand, controls the rate at which a resource is accessed. RateLimiter emphasizes rate: for example, allowing only 100 requests to pass per second, or allowing only 1 MB of data to be sent per second.
It is constructed with a permitsPerSecond parameter, which specifies how many permits are generated per second — that is our rate.
RateLimiter allows permits to be preempted from the future. For example, at five permits per second, we can request 100 permits at once; the request that follows will then have to wait about 20 seconds for its permits.
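As a quick illustration, here is a minimal usage sketch (the wait times in the comments follow from the behavior just described):

import com.google.common.util.concurrent.RateLimiter;

public class RateLimiterDemo {
    public static void main(String[] args) {
        RateLimiter limiter = RateLimiter.create(5.0); // 5 permits per second

        // Preempting 100 permits succeeds almost immediately...
        double waited = limiter.acquire(100);
        System.out.printf("first acquire waited %.3fs%n", waited);

        // ...but the next request pays for it: roughly 100 / 5 = 20 seconds
        waited = limiter.acquire(1);
        System.out.printf("second acquire waited %.3fs%n", waited);
    }
}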
Introduction to SmoothRateLimiter
RateLimiter currently has only one subclass: the abstract class SmoothRateLimiter. SmoothRateLimiter in turn has two implementation classes — the two modes we cover here. We start with a brief introduction to SmoothRateLimiter, then describe its two implementation classes in two subsections.
RateLimiter is an abstract class with only two properties:
private final SleepingStopwatch stopwatch;
private volatile Object mutexDoNotUseDirectly;
stopwatch is important: it is used for timing. RateLimiter takes the instant of instantiation as time 0 and measures relative time from then on, in microseconds.
mutexDoNotUseDirectly is used for locking. RateLimiter relies on synchronized to control concurrency, which is why, as we will see later, none of the other attributes needs to be declared volatile.
Then let's look at the properties of SmoothRateLimiter and what they mean.
// How many permits are currently unused, i.e. how many are stored
double storedPermits;
// The maximum number of permits allowed to be cached, i.e. the ceiling that storedPermits can reach
double maxPermits;
// How long it takes to generate one permit.
// For example, if the constructor asks for five per second, that is one every 200 ms;
// the unit here is microseconds, i.e. 200,000
double stableIntervalMicros;
// The next time permits can be granted. It is relative to RateLimiter's construction time —
// a relative time, think of it as a timestamp
private long nextFreeTicketMicros = 0L;
Looking at these attributes, we can roughly guess the internal implementation:
nextFreeTicketMicros is the key property. Every time we acquire permits, we first draw on storedPermits: if it is enough, storedPermits is simply reduced by the corresponding amount; if not, nextFreeTicketMicros is pushed forward to record how much future time has been preempted. When the next request arrives, if nextFreeTicketMicros has not been reached yet, that request must sleep until that point in time — and, of course, push the value forward in turn.
You may be confused here: since time keeps moving forward, the information in storedPermits can become stale. That is fine, because it is resynchronized and recalculated at the key operations.
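To make that bookkeeping concrete, here is a heavily simplified, single-threaded sketch of the idea — not Guava's actual code (the class name SimpleLimiter is made up, and locking and the warm-up mode are ignored):

class SimpleLimiter {
    double storedPermits;          // unused permits saved up
    double maxPermits;             // ceiling for storedPermits
    double stableIntervalMicros;   // time to produce one permit
    long nextFreeTicketMicros;     // earliest time the next acquire can be served

    // Returns how long the caller must sleep before proceeding.
    long reserve(int permits, long nowMicros) {
        // resync: convert idle time into stored permits
        if (nowMicros > nextFreeTicketMicros) {
            double newPermits = (nowMicros - nextFreeTicketMicros) / stableIntervalMicros;
            storedPermits = Math.min(maxPermits, storedPermits + newPermits);
            nextFreeTicketMicros = nowMicros;
        }
        // wait until the previously preempted time has passed
        long waitMicros = Math.max(nextFreeTicketMicros - nowMicros, 0);
        double fromStored = Math.min(permits, storedPermits);
        double fresh = permits - fromStored;
        // preempt the future for the part the stock cannot cover
        nextFreeTicketMicros += (long) (fresh * stableIntervalMicros);
        storedPermits -= fromStored;
        return waitMicros;
    }
}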
SmoothBursty analysis
Let's start with the simpler SmoothBursty to analyze RateLimiter's source code, then move on to SmoothWarmingUp.
Bursty means sudden — but not in this sense: if we configure 1k permits per second, being able to grab 5k permits at once is not burstiness; that is preempting the permits generated over the next few seconds.
Burstiness here means that RateLimiter caches a certain number of permits in its pool so that sudden requests can be served promptly. Imagine an interface that has not been requested for a long time and suddenly receives several requests at once: if we did not cache some permits, many threads would have to wait.
SmoothBursty caches at most 1 second's worth of permits, and this cannot be changed.
RateLimiter's static construction method:
public static RateLimiter create(double permitsPerSecond) {
    return create(permitsPerSecond, SleepingStopwatch.createFromSystemTimer());
}
The construction parameter permitsPerSecond specifies how many permits can be generated per second.
static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {
    RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
    rateLimiter.setRate(permitsPerSecond);
    return rateLimiter;
}
We see that this instantiates a SmoothBursty instance. Its constructor is trivial: it sets a single attribute, maxBurstSeconds, and contains no other code.
The constructor fixes maxBurstSeconds at 1.0, meaning at most one second's worth of permits — that is, (1.0 * permitsPerSecond) permits — will be cached in the pool.
This 1.0 second ties storedPermits and maxPermits together:
0 <= storedPermits <= maxPermits = permitsPerSecond
Let's move on to the setRate method:
public final void setRate(double permitsPerSecond) {
    checkArgument(
        permitsPerSecond > 0.0 && !Double.isNaN(permitsPerSecond), "rate must be positive");
    synchronized (mutex()) {
        doSetRate(permitsPerSecond, stopwatch.readMicros());
    }
}
setRate is a public method that can be used to adjust the rate at any time. We reach it here as part of initialization, but knowing in advance that it is also used to adjust the rate later is very helpful for understanding the source. Note the synchronized block controlling concurrency.
@Override
final void doSetRate(double permitsPerSecond, long nowMicros) {
    // Synchronize
    resync(nowMicros);
    // Compute the stableIntervalMicros property
    double stableIntervalMicros = SECONDS.toMicros(1L) / permitsPerSecond;
    this.stableIntervalMicros = stableIntervalMicros;
    doSetRate(permitsPerSecond, stableIntervalMicros);
}
The resync method is simple; it adjusts storedPermits and nextFreeTicketMicros. This is what we said above: at the key points, storedPermits must first be brought up to its correct value.
void resync(long nowMicros) {
    // If nextFreeTicket is already in the past — imagine limiter.acquire() not having been
    // called for a long time — nextFreeTicket must be reset to the current time and
    // storedPermits recalculated
    if (nowMicros > nextFreeTicketMicros) {
        double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
        storedPermits = min(maxPermits, storedPermits + newPermits);
        nextFreeTicketMicros = nowMicros;
    }
}
You don't need to pay attention to coolDownIntervalMicros() for now; the implementation in the SmoothBursty class simply returns stableIntervalMicros — the time it takes to generate one permit, as described above.
Careful readers will of course notice that stableIntervalMicros has not been set at this point, so the division above is by zero and the resulting newPermits is actually infinity. maxPermits is still 0 at this moment, but none of that matters here.
Back in the previous method: after resync, stableIntervalMicros is set to its correct value, and we then enter the following method:
@Override
void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
    double oldMaxPermits = this.maxPermits;
    // maxPermits is the number of permits produced in one second
    maxPermits = maxBurstSeconds * permitsPerSecond;
    if (oldMaxPermits == Double.POSITIVE_INFINITY) {
        // if we don't special-case this, we would get storedPermits == NaN, below
        storedPermits = maxPermits;
    } else {
        // Because the range of storedPermits has changed, it must be scaled proportionally
        storedPermits =
            (oldMaxPermits == 0.0)
                ? 0.0 // initial state
                : storedPermits * maxPermits / oldMaxPermits;
    }
}
As we saw above, this method is first reached during initialization with the original permitsPerSecond; when the rate is adjusted later, maxPermits is recalculated and storedPermits is scaled proportionally. For example, raising the rate from 10 to 20 permits per second while storedPermits is 5 changes maxPermits from 10 to 20 and scales storedPermits to 10.
The construction is now complete, and we have an instance of RateLimiter's implementation class SmoothBursty. You may still be confused by parts of the source above, but that's okay: reading on should resolve most of it.
Next, let's analyze the acquire method:
@CanIgnoreReturnValue
public double acquire() {
    return acquire(1);
}

@CanIgnoreReturnValue
public double acquire(int permits) {
    // Make a reservation. If the permits are not directly available right now, we must wait;
    // the return value is how long we need to sleep
    long microsToWait = reserve(permits);
    // Sleep
    stopwatch.sleepMicrosUninterruptibly(microsToWait);
    // Return the time slept, in seconds
    return 1.0 * microsToWait / SECONDS.toMicros(1L);
}
Let's look at the reserve method:
final long reserve(int permits) {
    checkPermits(permits);
    synchronized (mutex()) {
        return reserveAndGetWaitLength(permits, stopwatch.readMicros());
    }
}

final long reserveAndGetWaitLength(int permits, long nowMicros) {
    // Returns the old nextFreeTicketMicros
    long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
    // Compute the wait time
    return max(momentAvailable - nowMicros, 0);
}
Let's keep digging in:
@Override
final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
    // Synchronize here, updating storedPermits and nextFreeTicketMicros (if needed)
    resync(nowMicros);

    // The return value is nextFreeTicketMicros; note that resync has just run,
    // so this is the latest, correct value
    long returnValue = nextFreeTicketMicros;

    // How many permits can be taken from storedPermits
    double storedPermitsToSpend = min(requiredPermits, this.storedPermits);
    // The part that storedPermits cannot cover
    double freshPermits = requiredPermits - storedPermitsToSpend;
    // How long the insufficient part must wait
    long waitMicros =
        storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend) // always returns 0 in SmoothBursty
            + (long) (freshPermits * stableIntervalMicros);

    // Push nextFreeTicketMicros forward
    this.nextFreeTicketMicros = LongMath.saturatedAdd(nextFreeTicketMicros, waitMicros);
    // Subtract the spent part from storedPermits
    this.storedPermits -= storedPermitsToSpend;
    return returnValue;
}
We can see that acquiring permits actually draws on two parts: the stock in storedPermits, and, if the stock is insufficient, freshPermits preempted from the future.
A key point to note: the return value is the old value of nextFreeTicketMicros, because reaching that point in time already means this acquire can return successfully, regardless of whether storedPermits was sufficient. If storedPermits is not enough, nextFreeTicketMicros is simply pushed forward by the corresponding amount of time.
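To trace this with concrete numbers (an assumed setup, not from the original text):

permitsPerSecond = 5  →  stableInterval = 200 ms
state before the call: storedPermits = 2, nextFreeTicketMicros = now

acquire(10):
    returnValue          = nextFreeTicketMicros = now  →  the caller sleeps 0 ms
    storedPermitsToSpend = min(10, 2) = 2
    freshPermits         = 8
    waitMicros           = 8 * 200 ms = 1600 ms
    afterwards: nextFreeTicketMicros = now + 1600 ms, storedPermits = 0

A second acquire(1) arriving 100 ms later must sleep the remaining 1500 ms first.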
That completes the analysis of the acquire method; once you have read this far, you can re-read the code against it. It must be said that the source code of SmoothBursty is very simple.
SmoothWarmingUp analysis
Having analyzed SmoothBursty, analyzing SmoothWarmingUp becomes easier. We said that SmoothBursty can absorb bursts because it caches up to one second's worth of permits; in SmoothWarmingUp we will see a completely different design.
SmoothWarmingUp suits scenarios where resources need preheating. Take an interface whose business uses a database connection: connections need to be warmed up to reach their optimal state. If our system stays under low or zero load for a long time (and likewise when the application has just started), the connections in the pool are gradually released, and we consider the connection pool cold.
Suppose our business, once stable, can serve up to 1000 QPS. If the connection pool is cold, we cannot let 1000 requests in at once — that would crush the system. We need a preheating, warming-up process.
In SmoothWarmingUp, storedPermits keeps growing while the system is under low load. When requests come in, permits are taken from storedPermits, and the most critical point is this: taking permits from storedPermits costs time, because they have not been preheated.
Compare this with the SmoothBursty described earlier: there, taking permits from storedPermits requires no waiting at all, whereas here the reverse holds — taking permits from storedPermits takes extra time. This is the biggest difference between the two; understanding it first will help you follow the source.
Let's establish some rough concepts first, then look at this figure:
The graph is not easy to read at first. The X axis represents the number of storedPermits, and the Y axis represents the time it takes to acquire one permit.
Suppose permitsPerSecond is 10, so stableInterval is 100 ms, and coldInterval is 3 times that, i.e. 300 ms (the coldFactor of 3 is hard-coded and cannot be modified by users). That is, when storedPermits is at maxPermits — the system at its coldest — acquiring one permit takes 300 ms, and when storedPermits is below thresholdPermits, it takes 100 ms.
Imagine a vertical line x = k; its intersection k with the X axis represents the current number of storedPermits:
- When the system is very busy, this line stays at x = 0, i.e. storedPermits is 0;
- When the limiter is unused, the line slowly moves right until x = maxPermits;
- When the limiter is heavily used again, the line slowly moves left, back to x = 0;
When storedPermits is at maxPermits, we consider the permits in the limiter cold, so acquiring a permit takes more time, because preheating is needed; the key dividing point is thresholdPermits.
The preheating time is specified at construction, and in the diagram it is the area of the trapezoid — once preheating completes, we can enter the stable interval. Let us now derive the values of thresholdPermits and maxPermits.
One key point: the time to go from thresholdPermits down to 0 is half the time to go from maxPermits down to thresholdPermits. In terms of the figure, the trapezoid's area is twice the rectangle's area, and the trapezoid's area is warmupPeriod.
Since coldFactor is hard-coded to 3, the rectangle's area is warmupPeriod / 2. The rectangle's area is also stableInterval * thresholdPermits, so:
warmupPeriod = 2 * stableInterval * thresholdPermits
Thus, we derive the value of thresholdPermits:
thresholdPermits = 0.5 * warmupPeriod / stableInterval
Then we use the formula for calculating the trapezoid area:
warmupPeriod = 0.5 * (stableInterval + coldInterval) * (maxPermits - thresholdPermits)
Solving for maxPermits:
maxPermits = thresholdPermits + 2.0 * warmupPeriod / (stableInterval + coldInterval)
This gives us the values of thresholdPermits and maxPermits.
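As a quick sanity check with concrete numbers (an assumed configuration, not from the original text: permitsPerSecond = 10, warmupPeriod = 3 s):

stableInterval   = 1 s / 10 = 100 ms
coldInterval     = 3 * 100 ms = 300 ms
thresholdPermits = 0.5 * 3000 ms / 100 ms = 15
maxPermits       = 15 + 2 * 3000 ms / (100 ms + 300 ms) = 30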
Next, let's look at the cool-down interval, i.e. the rate at which storedPermits grows — equivalently, the speed at which the vertical line x = k moves right. Since going from 0 to maxPermits should take warmupPeriodMicros, the interval is defined as follows:
@Override
double coolDownIntervalMicros() {
    return warmupPeriodMicros / maxPermits;
}

// A quick look at the code shows where this is used — in resync:
void resync(long nowMicros) {
    if (nowMicros > nextFreeTicketMicros) {
        // coolDownIntervalMicros is used here
        double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
        storedPermits = min(maxPermits, storedPermits + newPermits);
        nextFreeTicketMicros = nowMicros;
    }
}
(With the numbers above, that is 3 s / 30 = 100 ms per stored permit.) Based on this analysis, let's look at the rest of SmoothWarmingUp's source.
First, let's look at its doSetRate method. With the previous introduction, the source code for this method is very simple:
@Override
void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
    double oldMaxPermits = maxPermits;
    // coldFactor is fixed at 3
    double coldIntervalMicros = stableIntervalMicros * coldFactor;
    // The formula we derived above
    thresholdPermits = 0.5 * warmupPeriodMicros / stableIntervalMicros;
    // The formula we derived above
    maxPermits =
        thresholdPermits + 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros);
    // The slope of the slanted line: simple math, rise over run
    slope = (coldIntervalMicros - stableIntervalMicros) / (maxPermits - thresholdPermits);
    if (oldMaxPermits == Double.POSITIVE_INFINITY) {
        // if we don't special-case this, we would get storedPermits == NaN, below
        storedPermits = 0.0;
    } else {
        storedPermits =
            (oldMaxPermits == 0.0)
                ? maxPermits // initial state is cold
                : storedPermits * maxPermits / oldMaxPermits;
    }
}
The setRate method is simple. Next we analyze the storedPermitsToWaitTime method. First, recall the code fragment in question:
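This is the waitMicros computation from reserveEarliestAvailable, shown earlier:

long waitMicros =
    storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
        + (long) (freshPermits * stableIntervalMicros);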
This code is the core of the acquire method: waitMicros consists of two parts, the time to take permits out of storedPermits and the time to wait for freshPermits to be generated. In the SmoothBursty implementation, taking permits from storedPermits returns 0 directly, with no waiting.
In the SmoothWarmingUp implementation, taking permits from storedPermits takes time, because preheating is required. Concretely, it computes the area of the shaded region in the figure.
@Override
long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
    double availablePermitsAboveThreshold = storedPermits - thresholdPermits;
    long micros = 0;
    // If the trapezoid (right) part holds permits, take permits from there first
    // and compute the shaded area of the trapezoid part
    if (availablePermitsAboveThreshold > 0.0) {
        // Number of permits taken from the right part
        double permitsAboveThresholdToTake = min(availablePermitsAboveThreshold, permitsToTake);
        // Trapezoid area formula: (top + bottom) * height / 2
        double length =
            permitsToTime(availablePermitsAboveThreshold)
                + permitsToTime(availablePermitsAboveThreshold - permitsAboveThresholdToTake);
        micros = (long) (permitsAboveThresholdToTake * length / 2.0);
        permitsToTake -= permitsAboveThresholdToTake;
    }
    // The shaded area of the rectangular part
    micros += (long) (stableIntervalMicros * permitsToTake);
    return micros;
}

// For a given x value, compute the y value
private double permitsToTime(double permits) {
    return stableIntervalMicros + permits * slope;
}
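Continuing the numbers from the earlier example (thresholdPermits = 15, maxPermits = 30, so slope = (300 − 100) ms / (30 − 15) ≈ 13.3 ms per permit): taking 5 permits while storedPermits = 30 costs the trapezoid area

5 * (permitsToTime(15) + permitsToTime(10)) / 2 = 5 * (300 + 233.3) ms / 2 ≈ 1333 ms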
At this point, the analysis of SmoothWarmingUp is complete.
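Before moving on, here is a minimal sketch of creating the warm-up variant through the public API (a toy demo; the exact printed values depend on timing):

import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class WarmupDemo {
    public static void main(String[] args) {
        // 10 permits per second, with a 3-second warm-up period
        RateLimiter limiter = RateLimiter.create(10.0, 3, TimeUnit.SECONDS);

        // Right after construction, storedPermits == maxPermits (the cold state),
        // so the early acquires pay the longer, preheating intervals
        for (int i = 0; i < 10; i++) {
            System.out.printf("waited %.3fs%n", limiter.acquire());
        }
        // The waits shrink from roughly the cold interval (300 ms)
        // toward the stable interval (100 ms) as the limiter warms up
    }
}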
If you still have doubts about Guava RateLimiter, please leave a message in the comment area. Readers who are not interested in flow control in Sentinel can stop here.
Flow Control in Sentinel
Sentinel is an open-source flow control and circuit-breaking component from Alibaba. We won't introduce it at length here; interested readers can look it up themselves.
In Sentinel's flow control, we can configure flow control rules, mainly to limit QPS and the number of threads. We do not discuss thread-count control here — that code is out of scope for this article. Everything below is about limiting QPS.
RateLimiterController
RateLimiterController is very simple: it records the time of the last passed request in the latestPassedTime property, then decides whether the current request can pass based on the QPS limit in the rule.
A very simple example: set QPS to 10, so one request is allowed through every 100 ms; we check whether the current time is at least 100 ms past the previous request's latestPassedTime. Suppose only 50 ms have passed — then the current thread must sleep another 50 ms before it can pass. And if another request arrives at the same moment? It has to sleep 150 ms.
public class RateLimiterController implements TrafficShapingController {

    // Maximum queueing time, default 500 ms
    private final int maxQueueingTimeMs;
    // The configured QPS value
    private final double count;
    // When the last request passed
    private final AtomicLong latestPassedTime = new AtomicLong(-1);

    public RateLimiterController(int timeOut, double count) {
        this.maxQueueingTimeMs = timeOut;
        this.count = count;
    }

    @Override
    public boolean canPass(Node node, int acquireCount) {
        return canPass(node, acquireCount, false);
    }

    // Usually acquireCount is 1; the prioritized parameter is not our concern here
    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // Pass when acquire count is less or equal than 0.
        if (acquireCount <= 0) {
            return true;
        }
        if (count <= 0) {
            return false;
        }

        long currentTime = TimeUtil.currentTimeMillis();
        // Calculate the interval between every two requests; e.g. if QPS is limited to 10,
        // the interval is 100 ms
        long costTime = Math.round(1.0 * (acquireCount) / count * 1000);

        // Expected pass time of this request.
        long expectedTime = costTime + latestPassedTime.get();

        // If we can pass by simply updating latestPassedTime, return true
        if (expectedTime <= currentTime) {
            // Contention may exist here, but it's okay.
            latestPassedTime.set(currentTime);
            return true;
        } else {
            // Can't pass directly; we need to wait
            long waitTime = costTime + latestPassedTime.get() - TimeUtil.currentTimeMillis();
            // Waiting longer than the maximum returns false
            if (waitTime > maxQueueingTimeMs) {
                return false;
            } else {
                // Push latestPassedTime forward
                long oldTime = latestPassedTime.addAndGet(costTime);
                try {
                    // Time we need to sleep
                    waitTime = oldTime - TimeUtil.currentTimeMillis();
                    if (waitTime > maxQueueingTimeMs) {
                        latestPassedTime.addAndGet(-costTime);
                        return false;
                    }
                    // in race condition waitTime may <= 0
                    if (waitTime > 0) {
                        Thread.sleep(waitTime);
                    }
                    return true;
                } catch (InterruptedException e) {
                }
            }
        }
        return false;
    }
}
This strategy is also easy to understand: simple and crude, and it fails fast when the queue gets too long.
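As a usage sketch, this is roughly how one would configure a rule that selects this controller (assuming the standard sentinel-core rule API; the resource name is made up):

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;
import java.util.Collections;

public class RateLimiterRuleDemo {
    public static void main(String[] args) {
        FlowRule rule = new FlowRule("someResource");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);      // limit by QPS
        rule.setCount(10);                               // 10 QPS -> one pass every 100 ms
        // "rate limiter" behavior = queue up, i.e. the RateLimiterController above
        rule.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_RATE_LIMITER);
        rule.setMaxQueueingTimeMs(500);                  // give up after queueing 500 ms
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}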
WarmUpController
WarmUpController is used to prevent a sudden surge of traffic from pushing the system straight to a load level it could handle in the steady state but cannot handle right now, because many resources are not yet preheated — databases need connections, remote services need to be reached, and so on. That is why warm-up is needed.
To belabor the point: this applies not only to a system that has just started, but also to systems that have sat under low load for a long time — they too need to be preheated again when a burst of traffic arrives.
Guava's SmoothWarmingUp controls the rate at which tokens are acquired, which differs slightly from the QPS control here, but the central idea is the same. We'll discuss the differences after walking through the source.
To help you follow the source, let's fix a scenario: QPS set to 100 and a warm-up time of 10 seconds. Values in square brackets in the code below are computed from this scenario.
public class WarmUpController implements TrafficShapingController {

    // Threshold, i.e. the configured QPS
    protected double count;
    // 3
    private int coldFactor;
    // Number of tokens at the inflection point, same meaning as Guava's thresholdPermits
    // [500]
    protected int warningToken = 0;
    // Maximum number of tokens, same meaning as Guava's maxPermits
    // [1000]
    private int maxToken;
    // Slope of the slanted line
    // [1/25000]
    protected double slope;

    // Accumulated number of tokens, same meaning as Guava's storedPermits
    protected AtomicLong storedTokens = new AtomicLong(0);
    // Time of the last token refill
    protected AtomicLong lastFilledTime = new AtomicLong(0);

    public WarmUpController(double count, int warmUpPeriodInSec, int coldFactor) {
        construct(count, warmUpPeriodInSec, coldFactor);
    }

    public WarmUpController(double count, int warmUpPeriodInSec) {
        construct(count, warmUpPeriodInSec, 3);
    }

    // The construction below mirrors Guava's, except that thresholdPermits and
    // maxPermits have been renamed
    private void construct(double count, int warmUpPeriodInSec, int coldFactor) {
        if (coldFactor <= 1) {
            throw new IllegalArgumentException("Cold factor should be larger than 1");
        }
        this.count = count;
        this.coldFactor = coldFactor;

        // warningToken means the same as thresholdPermits, and the results match:
        // thresholdPermits = 0.5 * warmupPeriod / stableInterval
        // [warningToken = (10*100)/(3-1) = 500]
        warningToken = (int)(warmUpPeriodInSec * count) / (coldFactor - 1);

        // maxToken means the same as maxPermits, and the results match:
        // maxPermits = thresholdPermits + 2*warmupPeriod/(stableInterval+coldInterval)
        // [maxToken = 500 + (2*10*100)/(1.0+3) = 1000]
        maxToken = warningToken + (int)(2 * warmUpPeriodInSec * count / (1.0 + coldFactor));

        // Slope calculation:
        // slope = (coldIntervalMicros - stableIntervalMicros) / (maxPermits - thresholdPermits)
        // [slope = (3-1.0) / 100 / (1000-500) = 1/25000]
        slope = (coldFactor - 1.0) / count / (maxToken - warningToken);
    }

    @Override
    public boolean canPass(Node node, int acquireCount) {
        return canPass(node, acquireCount, false);
    }

    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // Sentinel's QPS statistics use a sliding window.
        // QPS of the current time window
        long passQps = (long) node.passQps();
        // QPS of the previous time window, where one window spans 1 second
        long previousQps = (long) node.previousPassQps();

        // Synchronize: set storedTokens and lastFilledTime to their correct values
        syncToken(previousQps);

        long restToken = storedTokens.get();
        // The token count exceeds warningToken: we are in the trapezoid region
        if (restToken >= warningToken) {
            // In a nutshell: because the current token count exceeds the warningToken
            // threshold, the system is in the stage where it needs preheating.
            // The reciprocal of the time it takes to acquire one token is the
            // maximum QPS capacity of the current system.
            long aboveToken = restToken - warningToken;
            // Compute the warning QPS value, i.e. the highest QPS attainable in the current state.
            // (aboveToken * slope + 1.0 / count) is the time it takes to acquire one token right now
            double warningQps = Math.nextUp(1.0 / (aboveToken * slope + 1.0 / count));
            // If we stay under it, pass; otherwise don't
            if (passQps + acquireCount <= warningQps) {
                return true;
            }
        } else {
            // count is the highest attainable QPS
            if (passQps + acquireCount <= count) {
                return true;
            }
        }
        return false;
    }

    protected void syncToken(long passQps) {
        // The next few lines: synchronization happens only on the first entry into a new second.
        // Off topic: Sentinel defaults to two time windows per second, 500 ms each
        long currentTime = TimeUtil.currentTimeMillis();
        currentTime = currentTime - currentTime % 1000;
        long oldLastFillTime = lastFilledTime.get();
        if (currentTime <= oldLastFillTime) {
            return;
        }

        // Old token count
        long oldValue = storedTokens.get();
        // Compute the new token count; see coolDownTokens below
        long newValue = coolDownTokens(currentTime, passQps);

        if (storedTokens.compareAndSet(oldValue, newValue)) {
            // From the token count, subtract the previous second's QPS, then set the new value
            long currentValue = storedTokens.addAndGet(0 - passQps);
            if (currentValue < 0) {
                storedTokens.set(0L);
            }
            lastFilledTime.set(currentTime);
        }
    }

    // Update the token count
    private long coolDownTokens(long currentTime, long passQps) {
        long oldValue = storedTokens.get();
        long newValue = oldValue;

        // The current token count is below warningToken: add tokens
        if (oldValue < warningToken) {
            newValue = (long)(oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
        } else if (oldValue > warningToken) {
            // The current token count is in the trapezoid stage.
            // If the currently passing QPS is greater than count/coldFactor, the system is
            // consuming tokens faster than the cooling rate, so no tokens are added;
            // otherwise tokens are added
            if (passQps < (int)count / coldFactor) {
                newValue = (long)(oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
            }
        }
        return Math.min(newValue, maxToken);
    }
}
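Plugging in the scenario numbers gives a feel for the dynamics (a rough trace, not from the original text):

idle (previous-second QPS ≈ 0):
    coolDownTokens adds count = 100 tokens per second, so storedTokens passes
    warningToken (500) after about 5 s and caps at maxToken (1000) after about 10 s
busy:
    syncToken subtracts the previous second's QPS every second, and tokens are only
    refilled while passQps < count / coldFactor ≈ 33, so sustained traffic drains
    storedTokens back below warningToken, restoring the full 100 QPS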
The coolDownTokens method computes the new token count, but honestly I don't fully understand the author's design either:
- First, token growth: Guava uses warmupPeriodMicros / maxPermits as the growth rate, because its design takes warmupPeriod to move storedPermits from 0 up to maxPermits. Here, however, tokens grow at a rate of count per second. Why?
- Second, I don't understand the decision in the else branch: why compare passQps with count / coldFactor to decide whether to keep adding tokens?
- My own understanding is that count / coldFactor represents the cooling rate, which would make this reasonable. Discussion is welcome.
Finally, let's briefly compare Guava's SmoothWarmingUp with Sentinel's WarmUpController.
Guava controls the rate at which tokens are acquired; it cares about how long acquiring permits takes — both the part from storedPermits and the freshPermits part — and moves nextFreeTicketMicros to a point in the future.
Sentinel controls QPS. It uses the token count to characterize the current state of the system, adding tokens as time passes and subtracting tokens according to the passing QPS. If QPS keeps declining, storedTokens keeps growing and eventually crosses the warningToken threshold; tokens then keep accumulating (as long as QPS stays below count/3) until maxToken is reached.
storedTokens grows at a rate of count per second, while the decrease is the previous second's QPS. I actually have a question here as well: why is elapsed time taken into account when adding tokens but not when subtracting them? An issue was raised about this, but no one seems to have answered it.
WarmUpRateLimiterController
Note that this class inherits from the WarmUpController just described, and its flow control effect is defined as queueing. The code is effectively the RateLimiterController described earlier plus WarmUpController.
public class WarmUpRateLimiterController extends WarmUpController {

    private final int timeoutInMs;
    private final AtomicLong latestPassedTime = new AtomicLong(-1);

    public WarmUpRateLimiterController(double count, int warmUpPeriodSec, int timeOutMs, int coldFactor) {
        super(count, warmUpPeriodSec, coldFactor);
        this.timeoutInMs = timeOutMs;
    }

    @Override
    public boolean canPass(Node node, int acquireCount) {
        return canPass(node, acquireCount, false);
    }

    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        long previousQps = (long) node.previousPassQps();
        syncToken(previousQps);

        long currentTime = TimeUtil.currentTimeMillis();

        long restToken = storedTokens.get();
        long costTime = 0;
        long expectedTime = 0;

        // This block is the main difference from RateLimiterController
        if (restToken >= warningToken) {
            long aboveToken = restToken - warningToken;
            // current interval = restToken * slope + 1 / count
            double warmingQps = Math.nextUp(1.0 / (aboveToken * slope + 1.0 / count));
            costTime = Math.round(1.0 * (acquireCount) / warmingQps * 1000);
        } else {
            costTime = Math.round(1.0 * (acquireCount) / count * 1000);
        }
        expectedTime = costTime + latestPassedTime.get();

        if (expectedTime <= currentTime) {
            latestPassedTime.set(currentTime);
            return true;
        } else {
            long waitTime = costTime + latestPassedTime.get() - currentTime;
            if (waitTime > timeoutInMs) {
                return false;
            } else {
                long oldTime = latestPassedTime.addAndGet(costTime);
                try {
                    waitTime = oldTime - TimeUtil.currentTimeMillis();
                    if (waitTime > timeoutInMs) {
                        latestPassedTime.addAndGet(-costTime);
                        return false;
                    }
                    if (waitTime > 0) {
                        Thread.sleep(waitTime);
                    }
                    return true;
                } catch (InterruptedException e) {
                }
            }
        }
        return false;
    }
}
The code is simple: it is the RateLimiterController code with warm-up added.
In RateLimiterController, the costTime of a single request is fixed at 1/count; for example, with 100 QPS configured, costTime is 10 ms.
Here, however, warm-up is layered on top: the token count is used to judge what QPS the current system can sustain. If the token count exceeds warningToken, the system's current QPS capacity is below our configured QPS, and costTime is stretched accordingly.
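Plugging in the earlier scenario (count = 100, warm-up 10 s, so warningToken = 500, maxToken = 1000, slope = 1/25000), when the system is fully cold:

aboveToken = 1000 - 500 = 500
interval   = 500 * (1/25000) + 1/100 = 0.02 + 0.01 = 0.03 s
warmingQps ≈ 33.3
costTime   = 1 / 33.3 ≈ 30 ms   (instead of the stable 10 ms)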
Summary
It's been a while since I last wrote an article. Corrections to any mistakes here are welcome.