RateLimiter Source Analysis (Guava and Sentinel Implementations)


Author: javadoop, senior Java engineer. This article is published with the author's authorization.
Original link: https://www.javadoop.com/post...

This article covers flow control in two parts.

The first part analyzes the source code of Guava's RateLimiter, covering both of its modes. Most articles online only analyze the simple SmoothBursty mode, not the SmoothWarmingUp mode with preheating.

The second part describes the implementation of flow control in Sentinel. No prior knowledge of Sentinel is required; this part is only loosely coupled to the rest of Sentinel, so readers should feel no pressure.

Sentinel's flow control design borrows from Guava's RateLimiter, so the second part builds on the background from the first.

Guava RateLimiter

RateLimiter is based on the leaky bucket algorithm, but it also borrows from the token bucket algorithm. Flow control algorithms themselves are not discussed here; interested readers can look them up on their own.

Introduction to RateLimiter usage

RateLimiter's interface is very simple. It has two static methods for instantiation. After instantiation, we only need to care about acquire; there is not even a release operation.

// RateLimiter interface list:

// There are two ways to instantiate:
public static RateLimiter create(double permitsPerSecond){}
public static RateLimiter create(double permitsPerSecond, long warmupPeriod, TimeUnit unit) {}

public double acquire() {}
public double acquire(int permits) {}

public boolean tryAcquire() {}
public boolean tryAcquire(int permits) {}
public boolean tryAcquire(long timeout, TimeUnit unit) {}
public boolean tryAcquire(int permits, long timeout, TimeUnit unit) {}

public final double getRate() {}
public final void setRate(double permitsPerSecond) {}
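Before diving in, here is a minimal usage sketch; the rate of 2 permits per second and the loop are arbitrary choices for illustration:

// Pace a loop at roughly 2 permits per second
RateLimiter limiter = RateLimiter.create(2.0);
for (int i = 0; i < 5; i++) {
    // acquire() blocks as needed and returns the time spent waiting, in seconds
    double waited = limiter.acquire();
    System.out.println("permit " + i + ", waited " + waited + "s");
}
// tryAcquire() is the non-blocking variant: it returns false instead of waiting
if (limiter.tryAcquire()) {
    // do the rate-limited work
}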

RateLimiter is used for rate limiting. We know that the java.util.concurrent package provides Semaphore, which also controls access to resources. Look at the following code:

// Semaphore
ExecutorService executor = Executors.newCachedThreadPool();
Semaphore semaphore = new Semaphore(10);
for (int i = 0; i < 100; i++) {
    executor.submit(new Runnable() {
        @Override
        public void run() {
            semaphore.acquireUninterruptibly(1);
            try {
                doSomething();
            } finally {
                semaphore.release();
            }
        }
    });
}

Semaphore controls how many threads may access a resource concurrently. In the code above, 100 tasks are submitted, but at most 10 threads can be inside doSomething() at the same time. It controls the degree of concurrency.

RateLimiter, on the other hand, controls the rate at which a resource is accessed; the emphasis is on rate. For example, allow only 100 requests to pass per second, or allow only 1 MB of data to be sent per second.

It is constructed by specifying a permitsPerSecond parameter that represents how many permits are generated per second, which is our rate.

RateLimiter allows pre-occupying permits from the future. For example, with 5 permits per second, we can request 100 permits at once; the call returns immediately, but the next request will have to wait about 20 seconds for its permits.
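A minimal sketch of this pre-occupation behavior, using the numbers from the paragraph above:

RateLimiter limiter = RateLimiter.create(5.0); // 5 permits per second
limiter.acquire(100); // returns almost immediately, but books permits ~20 seconds into the future
long start = System.currentTimeMillis();
limiter.acquire(1);   // this caller pays the debt: it waits roughly 20 seconds
System.out.println("waited " + (System.currentTimeMillis() - start) + " ms");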

Introduction to SmoothRateLimiter

RateLimiter currently has only one subclass, the abstract class SmoothRateLimiter. SmoothRateLimiter has two implementation classes, which are the two modes we cover here. We start with a brief introduction to SmoothRateLimiter, followed by two subsections on its implementation classes.

RateLimiter is an abstract class with only two properties:

private final SleepingStopwatch stopwatch;

private volatile Object mutexDoNotUseDirectly;

stopwatch is important: it is used for timing. RateLimiter treats the instant of instantiation as time 0 and thereafter works with relative time in microseconds.

mutexDoNotUseDirectly is the lock object; RateLimiter relies on synchronized to control concurrency, which is why, as we will see, none of the other attributes is even marked volatile.

Then let's look at the properties of SmoothRateLimiter and what they mean.

// Permits currently stored, i.e. generated but not yet consumed
double storedPermits;

// Maximum number of permits that can be cached, i.e. the ceiling of storedPermits
double maxPermits;

// Interval at which one permit is generated.
// For example, if the constructor sets 5 per second, one permit is generated every 200ms; the unit is microseconds, so 200,000
double stableIntervalMicros;

// The next point in time at which permits can be obtained. This is relative to the RateLimiter's construction time; think of it as a timestamp
private long nextFreeTicketMicros = 0L; 

Looking at these attributes, we can roughly guess at the internal implementation:

nextFreeTicketMicros is the key property. Each time permits are requested, we first take from storedPermits; if that covers it, storedPermits is simply reduced by the corresponding amount. If not, nextFreeTicketMicros is pushed forward to record how much of the future has been pre-occupied. When the next request arrives before nextFreeTicketMicros, it must sleep until that point in time, and it in turn pushes the value further forward.

You may be confused here: since time keeps moving forward, storedPermits can become stale; but it only needs to be synchronized and recalculated at the key operations.
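To make this bookkeeping concrete, here is a hypothetical trace, assuming a fresh limiter at 5 permits per second (one permit per 200ms) and storedPermits = 0:

// t = 0ms:   acquire(3) -> storedPermits is 0, so nextFreeTicketMicros is pushed to 600ms;
//            the call itself returns immediately (it only waits until the OLD value, which is 0)
// t = 100ms: acquire(1) -> nextFreeTicketMicros (600ms) is still in the future, so this caller
//            sleeps 500ms, and nextFreeTicketMicros is pushed on to 800ms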

SmoothBursty analysis

Let's start with the simpler SmoothBursty to analyze RateLimiter's source code, then move on to SmoothWarmingUp.

Bursty means burst. It does not refer to the case where the rate is 1k per second and 5k permits are grabbed at once; that is not a burst, that is pre-occupying permits to be generated over the next few seconds.

Burstiness here means that RateLimiter caches a certain number of permits in its pool so that sudden requests can be served promptly. Imagine an interface that has not been requested for a long time and then suddenly receives several requests at once; without some cached permits, many threads would have to wait.

SmoothBursty caches at most 1 second's worth of permits; this default cannot be modified.
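A small sketch of the burst cache in action (timings approximate; assumes the thread is free to sleep):

public static void main(String[] args) throws InterruptedException {
    RateLimiter limiter = RateLimiter.create(10.0); // maxBurstSeconds = 1, so maxPermits = 10
    Thread.sleep(2000);    // stay idle for 2s; storedPermits grows, capped at maxPermits = 10
    for (int i = 0; i < 10; i++) {
        limiter.acquire(); // all ten served from storedPermits, essentially no waiting
    }
    limiter.acquire();     // the stock is exhausted; this call waits ~100ms
}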

RateLimiter's static construction method:

public static RateLimiter create(double permitsPerSecond) {
    return create(permitsPerSecond, SleepingStopwatch.createFromSystemTimer());
}

The construction parameter permitsPerSecond specifies how many permits can be generated per second.

static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {
    RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
    rateLimiter.setRate(permitsPerSecond);
    return rateLimiter;
}

We see that this instantiates a SmoothBursty. Its constructor is trivial: it sets a single attribute, maxBurstSeconds, and contains no other logic.

The constructor sets maxBurstSeconds to 1.0, which means at most one second's worth of permits, i.e. (1.0 * permitsPerSecond) permits, will be cached in the pool.

This 1.0 second ties storedPermits and maxPermits together:

0 <= storedPermits <= maxPermits = permitsPerSecond

Let's move on to the setRate method:

public final void setRate(double permitsPerSecond) {
  checkArgument(
      permitsPerSecond > 0.0 && !Double.isNaN(permitsPerSecond), "rate must be positive");
  synchronized (mutex()) {
    doSetRate(permitsPerSecond, stopwatch.readMicros());
  }
}

setRate is a public method that can be used to adjust the rate at any time. We are following the initialization path here, but knowing in advance that this method also serves for rate adjustment helps a lot when reading the source. Note that synchronized controls concurrency here.

@Override
final void doSetRate(double permitsPerSecond, long nowMicros) {
    // Synchronize
    resync(nowMicros);
    // Compute the stableIntervalMicros property
    double stableIntervalMicros = SECONDS.toMicros(1L) / permitsPerSecond;
    this.stableIntervalMicros = stableIntervalMicros;
    doSetRate(permitsPerSecond, stableIntervalMicros);
}

The resync method is simple; it adjusts storedPermits and nextFreeTicketMicros. This is what we said earlier: at the key operations, storedPermits must first be brought up to date.

void resync(long nowMicros) {
  // If nextFreeTicketMicros is already in the past (imagine limiter.acquire() not having been called for a long time),
  // it must be reset to the current time and storedPermits recalculated
  if (nowMicros > nextFreeTicketMicros) {
    double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
    storedPermits = min(maxPermits, storedPermits + newPermits);
    nextFreeTicketMicros = nowMicros;
  }
}

coolDownIntervalMicros() is a method you don't need to focus on for now. In SmoothBursty it simply returns stableIntervalMicros, i.e. the time it takes to generate one permit, as described above.

Careful readers may notice that stableIntervalMicros has not been set yet at this point, so the division above is a divide by zero and newPermits comes out infinite. maxPermits is still 0 at this time, but that does no harm here.

Back in the previous method: after resync, stableIntervalMicros is set to its correct value, and then we enter the following method:

@Override
void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
  double oldMaxPermits = this.maxPermits;
  // maxPermits here is the number of permits generated in maxBurstSeconds, i.e. in one second
  maxPermits = maxBurstSeconds * permitsPerSecond;
  if (oldMaxPermits == Double.POSITIVE_INFINITY) {
    // if we don't special-case this, we would get storedPermits == NaN, below
    storedPermits = maxPermits;
  } else {
    // Because the range of storedPermits has changed, scale it proportionally
    storedPermits =
        (oldMaxPermits == 0.0)
            ? 0.0 // initial state
            : storedPermits * maxPermits / oldMaxPermits;
  }
}

As we saw above, this method was first called during initialization with the original permitsPerSecond; here the rate is being adjusted. maxPermits is recalculated, and storedPermits is scaled proportionally: for example, if the rate doubles from 10 to 20 permits per second while 5 permits are stored, the stored amount becomes 10.

Now the construction is complete, and we have an instance of RateLimiter's implementation class SmoothBursty. You may still be somewhat confused by the source code above, but that's fine; keep reading and much of the confusion should resolve itself.

Next, let's analyze the acquire method:

@CanIgnoreReturnValue
public double acquire() {
  return acquire(1);
}

@CanIgnoreReturnValue
public double acquire(int permits) {
  // Reserve the permits; if they are not immediately available, we must wait
  // The return value is how many microseconds we need to sleep
  long microsToWait = reserve(permits);
  // sleep
  stopwatch.sleepMicrosUninterruptibly(microsToWait);
  // Return the time slept, in seconds
  return 1.0 * microsToWait / SECONDS.toMicros(1L);
}

Let's look at the reserve method:

final long reserve(int permits) {
  checkPermits(permits);
  synchronized (mutex()) {
    return reserveAndGetWaitLength(permits, stopwatch.readMicros());
  }
}

final long reserveAndGetWaitLength(int permits, long nowMicros) {
  // Returns the old nextFreeTicketMicros
  long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
  // Compute the wait time
  return max(momentAvailable - nowMicros, 0);
}

Digging one level deeper:

@Override
final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
  // Synchronize first, updating storedPermits and nextFreeTicketMicros (if needed)
  resync(nowMicros);
  // The return value is nextFreeTicketMicros; note that resync has just run, so this is the latest correct value
  long returnValue = nextFreeTicketMicros;
  // How many permits can be taken from storedPermits
  double storedPermitsToSpend = min(requiredPermits, this.storedPermits);
  // The shortfall that storedPermits cannot cover
  double freshPermits = requiredPermits - storedPermitsToSpend;
  // How long we must wait for that shortfall
  long waitMicros =
      storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend) // always returns 0 in SmoothBursty
          + (long) (freshPermits * stableIntervalMicros);
  // Push nextFreeTicketMicros forward
  this.nextFreeTicketMicros = LongMath.saturatedAdd(nextFreeTicketMicros, waitMicros);
  // Subtract the spent permits from storedPermits
  this.storedPermits -= storedPermitsToSpend;
  return returnValue;
}

We can see that acquiring permits actually draws on two parts: the stock in storedPermits, and, when the stock is insufficient, freshPermits pre-occupied from the future.

A key point: the return value is the old value of nextFreeTicketMicros, which means the current acquire only waits until that old timestamp and then returns successfully, regardless of whether storedPermits was sufficient. If storedPermits was not enough, the cost is pushed onto future requests by moving nextFreeTicketMicros forward.

That completes the analysis of the acquire method; if anything is unclear, just read back over the code above. The source of SmoothBursty is, frankly, very simple.

SmoothWarmingUp analysis

Having analyzed SmoothBursty, SmoothWarmingUp will be easier. We said SmoothBursty can absorb sudden requests because it caches up to one second's worth of permits; SmoothWarmingUp, as we will see, takes a completely different approach.

SmoothWarmingUp suits scenarios where resources need preheating. Say an interface of ours needs database connections; connections need warming up to reach their optimal state. If the system has been under low or zero load for a long time (the same is true right after startup), the connections in the pool are slowly released, and we consider the pool cold.

Suppose that at steady state our service can handle up to 1000 QPS. If the connection pool is cold, we cannot let 1000 requests in at once, because that would overwhelm the system; there should be a warm-up ramp.

In SmoothWarmingUp, storedPermits keeps increasing while the system is under low load. When requests come in, permits are taken from storedPermits. The most critical point: taking permits from storedPermits is time consuming, because those permits have not been warmed up.

Recall that in SmoothBursty, taking permits from storedPermits costs no time at all; in SmoothWarmingUp it is the opposite: taking permits from storedPermits costs extra time. This is the biggest difference, and understanding it first will make the source much easier to follow.

Let's start with some rough concepts, and then let's look at this picture:
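The original image is missing from this repost; the ASCII chart below, adapted from the comment in Guava's SmoothRateLimiter source, shows the same curve:

          ^ throttling
          |
    cold  +                  /
 interval |                 /.
          |                / .
          |               /  .   <-- the area of the trapezoid between
          |              /   .       thresholdPermits and maxPermits
          |             /    .       is the "warmup period"
          |            /     .
          |           /      .
   stable +----------/  WARM .
 interval |          .   UP  .
          |          . PERIOD.
          |          .       .
        0 +----------+-------+------------> storedPermits
          0  thresholdPermits maxPermits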

This graph is not easy to understand. The X axis is the number of storedPermits, and the Y axis is the time it takes to obtain one permit.

Assume permitsPerSecond is 10, so stableInterval is 100ms and coldInterval is 3 times that, 300ms (coldFactor is hard-coded to 3 and cannot be modified by users). That is, at maxPermits, when the system is coldest, it takes 300ms to obtain one permit, while below thresholdPermits it takes 100ms.

Imagine a vertical line x = k; its position k on the X axis represents the current number of storedPermits:

  • When the system is very busy, this line stays at x=0, where storedPermits is 0
  • When the limiter is not in use, the line slowly moves to the right until x=maxPermits;
  • If the limiter is reused, the line slowly moves to the left until x=0;

When storedPermits is at maxPermits, we consider the permits in the limiter completely cold, so obtaining a permit takes longer because it must be warmed up; the key dividing point is thresholdPermits.

The preheating time (warmupPeriod) is specified at construction. In the diagram it is the area of the trapezoid, because once preheating is complete we enter the stableInterval region. Let's now derive thresholdPermits and maxPermits.

One key design point: the time to go from thresholdPermits down to 0 is half the time to go from maxPermits down to thresholdPermits. In other words, the area of the trapezoid is twice the area of the rectangle, and the area of the trapezoid is warmupPeriod.

The area of the rectangle is therefore warmupPeriod/2; this 2:1 ratio is a consequence of coldFactor being hard-coded to 3.

The trapezoid area is warmupPeriod, that is:

warmupPeriod = 2 * stableInterval * thresholdPermits

Thus, we derive the value of thresholdPermits:

thresholdPermits = 0.5 * warmupPeriod / stableInterval

Then we use the formula for calculating the trapezoid area:

warmupPeriod = 0.5 * (stableInterval + coldInterval) * (maxPermits - thresholdPermits)

The maxPermits are:

maxPermits = thresholdPermits + 2.0 * warmupPeriod / (stableInterval + coldInterval)

This gives us the values of thresholdPermits and maxPermits.
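A quick worked example to make the formulas concrete (the rate of 10 permits per second follows the figure discussion above; the 2-second warmupPeriod is our own assumption):

// permitsPerSecond = 10  =>  stableInterval = 100ms, coldInterval = 3 * 100ms = 300ms
// assume warmupPeriod = 2s
// thresholdPermits = 0.5 * warmupPeriod / stableInterval = 0.5 * 2s / 0.1s = 10
// maxPermits = thresholdPermits + 2 * warmupPeriod / (stableInterval + coldInterval)
//            = 10 + 2 * 2s / (0.1s + 0.3s) = 10 + 10 = 20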

Next, let's look at the cool-down interval, i.e. the interval at which storedPermits grows by one permit, which is the speed at which the vertical line x = k moves to the right. Going from 0 to maxPermits should take warmupPeriodMicros, so it is defined as:

@Override
double coolDownIntervalMicros() {
    return warmupPeriodMicros / maxPermits;
}

Click through the code and you will see that it is used in resync:

void resync(long nowMicros) {
  if (nowMicros > nextFreeTicketMicros) {
    // coolDownIntervalMicros is used here
    double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
    storedPermits = min(maxPermits, storedPermits + newPermits);
    nextFreeTicketMicros = nowMicros;
  }
}

Based on the above analysis, let's look at the other sources for SmoothWarmingUp.

First, let's look at its doSetRate method. With the previous introduction, the source code for this method is very simple:

@Override
void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
    double oldMaxPermits = maxPermits;
    // coldFactor is fixed at 3
    double coldIntervalMicros = stableIntervalMicros * coldFactor;
    // The formula we derived above
    thresholdPermits = 0.5 * warmupPeriodMicros / stableIntervalMicros;
    // Also the formula we derived above
    maxPermits =
        thresholdPermits + 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros);
    // Calculate the slope of the sloped line. Basic math: rise over run
    slope = (coldIntervalMicros - stableIntervalMicros) / (maxPermits - thresholdPermits);
    if (oldMaxPermits == Double.POSITIVE_INFINITY) {
        // if we don't special-case this, we would get storedPermits == NaN, below
        storedPermits = 0.0;
    } else {
        storedPermits =
            (oldMaxPermits == 0.0)
                ? maxPermits // initial state is cold
                : storedPermits * maxPermits / oldMaxPermits;
    }
}

The setRate path is thus very simple. Next, let's analyze the storedPermitsToWaitTime method; recall the waitMicros computation we saw earlier in reserveEarliestAvailable.

That code is the core of acquire: waitMicros consists of two parts, the time charged for taking permits out of storedPermits, and the time spent waiting for freshPermits to be generated. In the SmoothBursty implementation, taking permits from storedPermits costs nothing; storedPermitsToWaitTime returns 0 directly.

In the SmoothWarmingUp implementation, taking permits from storedPermits takes time, because they need preheating. Concretely, the method computes an area under the curve of the figure above, the region between the current storedPermits and storedPermits minus permitsToTake.

@Override
long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
  double availablePermitsAboveThreshold = storedPermits - thresholdPermits;
  long micros = 0;
  // If there are permits above the threshold (the trapezoid part on the right), take from
  // that part first and compute the corresponding shaded trapezoid area
  if (availablePermitsAboveThreshold > 0.0) {
    // Number of permits taken from the trapezoid part
    double permitsAboveThresholdToTake = min(availablePermitsAboveThreshold, permitsToTake);
    // Trapezoid area formula: (top + bottom) * height / 2
    double length =
        permitsToTime(availablePermitsAboveThreshold)
            + permitsToTime(availablePermitsAboveThreshold - permitsAboveThresholdToTake);
    micros = (long) (permitsAboveThresholdToTake * length / 2.0);
    permitsToTake -= permitsAboveThresholdToTake;
  }
  // Shaded area of the rectangular part
  micros += (long) (stableIntervalMicros * permitsToTake);
  return micros;
}

// For a given x value (number of stored permits), compute the y value (time to obtain one permit)
private double permitsToTime(double permits) {
  return stableIntervalMicros + permits * slope;
}

At this point, SmoothWarmingUp is finished.

If you have any questions about Guava's RateLimiter, please leave a comment. Readers not interested in Sentinel's flow control can stop here.

Flow Control in Sentinel

Sentinel is Alibaba's open source flow control and circuit breaking tool. We won't introduce it at length here; interested readers can learn more on their own.

In Sentinel's flow control, we can configure rules that mainly limit QPS and thread count. Thread-count control is out of scope here; everything below is about controlling QPS.

RateLimiterController

RateLimiterController is very simple: it records the time of the last passed request in latestPassedTime, then decides whether the current request may pass based on the QPS limit in the rule.

A simple example: set QPS to 10, so one request may pass every 100ms; whether the current request can pass is decided by checking whether 100ms have elapsed since latestPassedTime. If only 50ms have passed, the current thread must sleep another 50ms before passing. And if another request arrives at the same moment? It has to sleep 150ms.

public class RateLimiterController implements TrafficShapingController {

    // Maximum queueing time, default 500ms
    private final int maxQueueingTimeMs;
    // The configured QPS threshold
    private final double count;
    // Time at which the last request passed
    private final AtomicLong latestPassedTime = new AtomicLong(-1);

    public RateLimiterController(int timeOut, double count) {
        this.maxQueueingTimeMs = timeOut;
        this.count = count;
    }

    @Override
    public boolean canPass(Node node, int acquireCount) {
        return canPass(node, acquireCount, false);
    }

    // Usually acquireCount is 1, prioritized parameter is not a concern here
    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // Pass when acquire count is less or equal than 0.
        if (acquireCount <= 0) {
            return true;
        }
        // A rule whose count <= 0 lets nothing pass
        if (count <= 0) {
            return false;
        }

        long currentTime = TimeUtil.currentTimeMillis();
        // Calculate the interval between every 2 requests, for example, if QPS is limited to 10, then the interval is 100ms
        long costTime = Math.round(1.0 * (acquireCount) / count * 1000);

        // Expected pass time of this request.
        long expectedTime = costTime + latestPassedTime.get();

        // Update latestPassedTime and let the request pass
        if (expectedTime <= currentTime) {
            // Contention may exist here, but it's okay.
            latestPassedTime.set(currentTime);
            return true;
        } else {
            // Can't pass, need to wait
            long waitTime = costTime + latestPassedTime.get() - TimeUtil.currentTimeMillis();
            // If the wait exceeds the maximum queueing time, fail fast
            if (waitTime > maxQueueingTimeMs) {
                return false;
            } else {
                // Push latestPassedTime forward
                long oldTime = latestPassedTime.addAndGet(costTime);
                try {
                    // Time required for sleep
                    waitTime = oldTime - TimeUtil.currentTimeMillis();
                    if (waitTime > maxQueueingTimeMs) {
                        latestPassedTime.addAndGet(-costTime);
                        return false;
                    }
                    // in race condition waitTime may <= 0
                    if (waitTime > 0) {
                        Thread.sleep(waitTime);
                    }
                    return true;
                } catch (InterruptedException e) {
                }
            }
        }
        return false;
    }

}

This strategy is easy to understand: simple, blunt, and it fails fast when the wait would be too long.

WarmUpController

WarmUpController guards against sudden traffic spikes: load that the system could handle in its steady state may be too much at this moment, because many resources (database connections, remote service connections, and so on) have not been preheated. That is why warming up is needed.

To belabor the point: preheating is needed not only right after startup, but also when a system that has been under low load for a long time suddenly receives traffic.

Guava's SmoothWarmingUp controls the rate of token acquisition, which differs a little from the QPS control here, but the central idea is the same. We'll discuss the differences after walking through the source.

To help you follow the source, let's fix a scenario: QPS is set to 100 and the preheating time to 10 seconds. Values in square brackets in the code comments are computed from this scenario.

public class WarmUpController implements TrafficShapingController {

    // The QPS threshold set in the rule
    protected double count;
    // Defaults to 3
    private int coldFactor;
    // Number of tokens at the inflection point, corresponding to Guava's thresholdPermits
    // [500]
    protected int warningToken = 0;
    // Maximum number of tokens, corresponding to Guava's maxPermits
    // [1000]
    private int maxToken;
    // Slope of the sloped line
    // [1/25000]
    protected double slope;

    // Accumulated number of tokens, corresponding to Guava's storedPermits
    protected AtomicLong storedTokens = new AtomicLong(0);
    // Time of the last token update
    protected AtomicLong lastFilledTime = new AtomicLong(0);

    public WarmUpController(double count, int warmUpPeriodInSec, int coldFactor) {
        construct(count, warmUpPeriodInSec, coldFactor);
    }

    public WarmUpController(double count, int warmUpPeriodInSec) {
        construct(count, warmUpPeriodInSec, 3);
    }

    // The construction below mirrors Guava's; thresholdPermits and maxPermits are just renamed
    private void construct(double count, int warmUpPeriodInSec, int coldFactor) {

        if (coldFactor <= 1) {
            throw new IllegalArgumentException("Cold factor should be larger than 1");
        }

        this.count = count;

        this.coldFactor = coldFactor;

        // warningToken corresponds to thresholdPermits; the computed result is the same
        // thresholdPermits = 0.5 * warmupPeriod / stableInterval
        // [warningToken = (10*100)/(3-1) = 500]
        warningToken = (int)(warmUpPeriodInSec * count) / (coldFactor - 1);

        // maxToken corresponds to maxPermits; the computed result is the same
        // maxPermits = thresholdPermits + 2*warmupPeriod/(stableInterval+coldInterval)
        // [maxToken = 500 + (2*10*100)/(1.0+3) = 1000]
        maxToken = warningToken + (int)(2 * warmUpPeriodInSec * count / (1.0 + coldFactor));

        // Slope calculation, as in Guava:
        // slope = (coldIntervalMicros - stableIntervalMicros) / (maxPermits - thresholdPermits)
        // [slope = (3 - 1.0) / 100 / (1000 - 500) = 1/25000]
        slope = (coldFactor - 1.0) / count / (maxToken - warningToken);

    }

    @Override
    public boolean canPass(Node node, int acquireCount) {
        return canPass(node, acquireCount, false);
    }

    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {

        // Sentinel's QPS statistics use a sliding window

        // QPS of the current time window
        long passQps = (long) node.passQps();

        // QPS of the previous time window
        long previousQps = (long) node.previousPassQps();

        // Synchronize: bring storedTokens and lastFilledTime up to date
        syncToken(previousQps);

        long restToken = storedTokens.get();
        // The token count exceeds warningToken: we are in the trapezoid region
        if (restToken >= warningToken) {

            // In short: because the token count exceeds the warningToken threshold, the system still needs preheating.
            // We compute the time to acquire one token; its reciprocal is the maximum QPS the system can currently sustain.

            long aboveToken = restToken - warningToken;

            // Compute the warning QPS, the highest QPS reachable in the current state.
            // (aboveToken * slope + 1.0 / count) is the time it takes to acquire one token in the current state
            double warningQps = Math.nextUp(1.0 / (aboveToken * slope + 1.0 / count));
            // If the warning QPS is not exceeded, pass; otherwise reject
            if (passQps + acquireCount <= warningQps) {
                return true;
            }
        } else {
            // count is the highest achievable QPS
            if (passQps + acquireCount <= count) {
                return true;
            }
        }

        return false;
    }

    protected void syncToken(long passQps) {
        // The next few lines ensure the synchronization below runs only on the first entry into each new second
        // Aside: Sentinel defaults to two time windows per second, 500ms each
        long currentTime = TimeUtil.currentTimeMillis();
        currentTime = currentTime - currentTime % 1000;
        long oldLastFillTime = lastFilledTime.get();
        if (currentTime <= oldLastFillTime) {
            return;
        }

        // Old token count
        long oldValue = storedTokens.get();
        // Compute the new token count; see coolDownTokens below
        long newValue = coolDownTokens(currentTime, passQps);

        if (storedTokens.compareAndSet(oldValue, newValue)) {
            // Subtract the previous window's passed QPS from the token count and store the result
            long currentValue = storedTokens.addAndGet(0 - passQps);
            if (currentValue < 0) {
                storedTokens.set(0L);
            }
            lastFilledTime.set(currentTime);
        }

    }

    // Update number of tokens
    private long coolDownTokens(long currentTime, long passQps) {
        long oldValue = storedTokens.get();
        long newValue = oldValue;

        // Current number of tokens is less than warningToken, add tokens
        if (oldValue < warningToken) {
            newValue = (long)(oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
        } else if (oldValue > warningToken) {
            // The current token count is in the trapezoid region.
            // If the passing QPS exceeds count / coldFactor, tokens are being consumed faster than the cooling rate,
            // so no tokens are added; otherwise, tokens are added
            if (passQps < (int)count / coldFactor) {
                newValue = (long)(oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
            }
        }
        return Math.min(newValue, maxToken);
    }

}
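To make the trapezoid branch of canPass concrete, here is the arithmetic under the scenario fixed above (QPS 100, warm-up 10s); the restToken value of 750 is an assumed example:

// warningToken = 500, maxToken = 1000, slope = 1/25000
// suppose restToken = 750, so aboveToken = 250
// per-token interval = aboveToken * slope + 1.0 / count
//                    = 250 / 25000 + 1 / 100 = 0.01 + 0.01 = 0.02s
// warningQps = 1 / 0.02 = 50
// So while only half warmed up, the system admits at most ~50 QPS instead of 100.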

The coolDownTokens method calculates the new token count, though I don't fully understand the author's design either:

  • First, for token growth: Guava uses warmupPeriodMicros / maxPermits as the interval, because its goal is that going from 0 to maxPermits takes exactly warmupPeriod. Here, tokens instead grow at a rate of count per second. Why?
  • Second, I don't understand the decision in the else branch: why compare passQps against count / coldFactor to decide whether to keep adding tokens?
  • My own reading is that count / coldFactor represents the cooling speed, in which case it does make sense. Discussion is welcome.

Finally, let's briefly compare Guava's SmoothWarmingUp with Sentinel's WarmUpController.

Guava controls the rate of token acquisition: it cares how long acquiring permits should take, whether from storedPermits or from freshPermits, and pushes nextFreeTicketMicros to a point in the future accordingly.

Sentinel controls QPS. It uses the token count to gauge the system's current state, adding tokens as time passes and subtracting tokens by the passed QPS. If QPS keeps declining, storedTokens keeps growing and eventually crosses the warningToken threshold; as long as QPS stays below count / 3, tokens keep accumulating all the way to maxToken.

storedTokens grows at a rate of count per second and is decreased by the passed QPS of the previous window. In fact, I have a question here too: why is elapsed time taken into account when adding tokens but not when subtracting them? This has been raised in the project's issues, but no one seems to have answered.

WarmUpRateLimiterController

Note that this class extends the WarmUpController just described, and its flow control effect is queueing (pacing). The code is essentially the RateLimiterController from earlier combined with WarmUpController.


public class WarmUpRateLimiterController extends WarmUpController {

    private final int timeoutInMs;
    private final AtomicLong latestPassedTime = new AtomicLong(-1);

    public WarmUpRateLimiterController(double count, int warmUpPeriodSec, int timeOutMs, int coldFactor) {
        super(count, warmUpPeriodSec, coldFactor);
        this.timeoutInMs = timeOutMs;
    }

    @Override
    public boolean canPass(Node node, int acquireCount) {
        return canPass(node, acquireCount, false);
    }

    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        long previousQps = (long) node.previousPassQps();
        syncToken(previousQps);

        long currentTime = TimeUtil.currentTimeMillis();

        long restToken = storedTokens.get();
        long costTime = 0;
        long expectedTime = 0;

        // The main difference from RateLimiterController is this block of code

        if (restToken >= warningToken) {
            long aboveToken = restToken - warningToken;

            // current per-request interval = aboveToken * slope + 1.0 / count
            double warmingQps = Math.nextUp(1.0 / (aboveToken * slope + 1.0 / count));
            costTime = Math.round(1.0 * (acquireCount) / warmingQps * 1000);
        } else {
            costTime = Math.round(1.0 * (acquireCount) / count * 1000);
        }
        expectedTime = costTime + latestPassedTime.get();

        if (expectedTime <= currentTime) {
            latestPassedTime.set(currentTime);
            return true;
        } else {
            long waitTime = costTime + latestPassedTime.get() - currentTime;
            if (waitTime > timeoutInMs) {
                return false;
            } else {
                long oldTime = latestPassedTime.addAndGet(costTime);
                try {
                    waitTime = oldTime - TimeUtil.currentTimeMillis();
                    if (waitTime > timeoutInMs) {
                        latestPassedTime.addAndGet(-costTime);
                        return false;
                    }
                    if (waitTime > 0) {
                        Thread.sleep(waitTime);
                    }
                    return true;
                } catch (InterruptedException e) {
                }
            }
        }
        return false;
    }
}

The code is simple: it is the RateLimiterController logic with preheating added.

In RateLimiterController, the costTime of a single request is fixed at 1/count; for example, with QPS set to 100, costTime is 10ms.

Here, however, warm-up is added: the token count determines how much QPS the system should currently sustain. If the token count exceeds warningToken, the system's current QPS capacity is below the preset QPS, and costTime is stretched accordingly. For example, with count = 100 and aboveToken = 250 (slope = 1/25000), the per-request interval becomes 250/25000 + 1/100 = 0.02s, so costTime is 20ms instead of 10ms.

Summary

I haven't written an article for a while; corrections to any mistakes here are welcome.
