Author: Grey
Original address: Design and implementation of reservoir algorithm
Problem to be solved
Suppose a machine spits out distinct balls one after another without stopping, and we have a single bag that holds at most 10 balls. Each ball that comes out is either put into the bag or thrown away forever. How can we guarantee that, after the machine has spat out any number of balls, every ball spat out so far is in the bag with equal probability?
Rule
Balls 1 to 10 all go straight into the bag. We introduce a random function f(i) that, given a value i, returns an integer from 1 to i with equal probability. When ball K comes out (K > 10), we decide what to do with it in two steps:
- Call f(K). If it returns a number no greater than 10, the ball is put into the bag; if it returns a number greater than 10, the ball is thrown away. In other words, ball K enters the bag with probability 10/K.
- If the first step decides that ball K enters the bag, one of the 10 balls already in the bag is chosen with equal probability (1/10 each) and discarded to make room.
Proof
Case 1
When K is 1 ~ 10, according to our rule every ball enters the bag with probability 100%, so all balls have equal probability
Case 2
When K is greater than 10, take K = 927 as a concrete example. Once ball 927 has been spat out, we consider:
A. The probability that each of balls 1 ~ 10 is in the bag
B. The probability that each ball greater than 10 and less than or equal to 927 is in the bag
If A and B turn out to give the same probability, the same argument works for any K,
so the rule gives equal probability in the general case.
A. Once ball 927 has arrived, what is the probability that ball 5 is in the bag?
Ball 5 must survive the arrival of every ball from 11 to 927, that is:
- When ball 11 came, ball 5 survived
- When ball 12 came, ball 5 survived
- ...
- When ball 927 came, ball 5 survived
How does ball 5 survive the arrival of ball 11? If q is the probability that ball 5 is thrown out when ball 11 arrives, then the probability that ball 5 survives is 1 - q
According to our rule, ball 11 is selected into the bag with probability 10 / 11, and ball 5 is then the unlucky one chosen for replacement with probability 1 / 10. So when ball 11 arrives, the probability that ball 5 is replaced is:
(10/11 * 1/10) = 1/11
So the probability that ball 5 will survive is
1 - 1/11 = 10/11
When ball 12 arrives, the same calculation gives ball 5 a survival probability of 1 - (10/12 * 1/10) = 11 / 12
When ball 13 arrives, the probability of ball 5 surviving is 12 / 13
...
When ball 927 arrives, the probability of ball 5 surviving is likewise 926 / 927
Multiplying these step-by-step survival probabilities, the probability that ball 5 is still in the bag after ball 927 has arrived is:
10/11 * 11/12 * 12/13 * ... * 925/926 * 926/927 = 10/927
The same calculation applies to any of balls 1 ~ 10, so each of them is in the bag with probability 10 / 927
So in case A, all balls have equal probability
B. Take any ball greater than 10 and less than 927, for example ball 15, and consider the probability that it is in the bag
For ball 15 to still be in the bag after ball 927 arrives, all of the following must hold:
When ball 15 was spat out, it was selected into the bag, with a probability of 10 / 15
When ball 16 arrived, ball 15 survived. By the same calculation as in A, the probability is 15 / 16
When ball 17 arrived, ball 15 survived, with a probability of 16 / 17
...
When ball 926 arrived, ball 15 survived, with a probability of 925 / 926
When ball 927 arrived, ball 15 survived, with a probability of 926 / 927
Therefore, once ball 927 has arrived, the probability that ball 15 is still in the bag is:
10/15 * 15/16 * 16/17 * ... * 925/926 * 926/927 = 10/927
The same calculation works for any ball greater than 10 and less than 927, giving 10 / 927 each time. Ball 927 itself is selected into the bag with probability 10 / 927 directly.
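The two calculations follow the same pattern, which can be written once for a general bag size. The notation here is an assumption and does not appear in the original: N is the bag capacity, i is the number of the ball we follow, and k is the number of the latest ball (k >= N). Each later ball j evicts ball i with probability (N/j)(1/N) = 1/j, so ball i survives that step with probability (j-1)/j:

```latex
\[
P(\text{ball } i \text{ is in the bag after ball } k) =
\begin{cases}
\displaystyle \prod_{j=N+1}^{k} \frac{j-1}{j} = \frac{N}{k},
  & 1 \le i \le N, \\[2ex]
\displaystyle \frac{N}{i} \prod_{j=i+1}^{k} \frac{j-1}{j}
  = \frac{N}{i} \cdot \frac{i}{k} = \frac{N}{k},
  & N < i \le k.
\end{cases}
\]
```

With N = 10 and k = 927, both branches give 10/927, matching the results for ball 5 and ball 15.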
In both case A and case B the probability is 10 / 927
So the rule satisfies the requirement of the problem.
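The products in the proof can also be checked mechanically. The following is a minimal sketch, not part of the original article: the class name SurvivalProbability and the method inBagProbability are hypothetical, and the code simply multiplies the same fractions as the proof using exact integer arithmetic, reducing after every step.

```java
class SurvivalProbability {

    // Greatest common divisor, used to keep the fraction in lowest terms.
    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // Returns {numerator, denominator} of the probability that `ball`
    // is in the bag after ball number `last` has arrived.
    // Assumes 1 <= ball <= last.
    static long[] inBagProbability(int capacity, int ball, int last) {
        long num;
        long den;
        int firstThreat; // first later ball that could evict `ball`
        if (ball <= capacity) {
            // Balls 1..capacity enter the bag with probability 1.
            num = 1;
            den = 1;
            firstThreat = capacity + 1;
        } else {
            // Later balls enter with probability capacity/ball.
            num = capacity;
            den = ball;
            firstThreat = ball + 1;
        }
        for (int j = firstThreat; j <= last; j++) {
            // Surviving the arrival of ball j has probability (j-1)/j.
            num *= j - 1;
            den *= j;
            long g = gcd(num, den);
            num /= g;
            den /= g;
        }
        long g = gcd(num, den);
        return new long[] {num / g, den / g};
    }

    public static void main(String[] args) {
        for (int ball : new int[] {5, 15, 927}) {
            long[] p = inBagProbability(10, ball, 927);
            System.out.println("ball " + ball + ": " + p[0] + "/" + p[1]);
        }
    }
}
```

Running main prints 10/927 for balls 5, 15 and 927, which agrees with cases A and B above.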
Code
public class Code_0058_ReservoirSampling {

    public static class RandomBox {
        // The bag holding the currently selected balls
        private int[] bag;
        // Bag capacity
        private int capacity;
        // Number of balls seen so far
        private int count;

        public RandomBox(int capacity) {
            bag = new int[capacity];
            this.capacity = capacity;
            count = 0;
        }

        // Random function: returns an integer in 1..max with equal probability.
        // Math.random() returns a double in [0, 1);
        // the (int) cast rounds the non-negative product down.
        private int rand(int max) {
            return (int) (Math.random() * max) + 1;
        }

        public void add(int num) {
            // One more ball has been spat out
            count++;
            if (count <= capacity) {
                // The bag is not full yet, so the ball goes straight in
                bag[count - 1] = num;
            } else if (rand(count) <= capacity) {
                // Enter the bag with probability capacity/count,
                // replacing a ball chosen with equal probability
                bag[rand(capacity) - 1] = num;
            }
        }

        // Returns a copy of the balls currently in the bag
        public int[] choices() {
            int[] res = new int[capacity];
            System.arraycopy(bag, 0, res, 0, capacity);
            return res;
        }
    }
}
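To see the equal-probability claim empirically, one option is to repeat the experiment many times and count how often each ball ends up in the bag. The sketch below is an illustration under stated assumptions rather than the author's code: the class ReservoirCheck and its frequencies method are hypothetical names, it restates the add logic inline so that it is self-contained, and it uses java.util.Random with a fixed seed instead of Math.random() so that runs are repeatable.

```java
import java.util.Random;

class ReservoirCheck {

    // Runs `trials` independent experiments in which balls 1..total are
    // offered to a bag of size `capacity`. Returns an array where index b
    // holds the fraction of trials in which ball b ended up in the bag
    // (index 0 is unused).
    static double[] frequencies(int capacity, int total, int trials, long seed) {
        Random rng = new Random(seed);
        int[] hits = new int[total + 1];
        int[] bag = new int[capacity];
        for (int t = 0; t < trials; t++) {
            for (int k = 1; k <= total; k++) {
                if (k <= capacity) {
                    // The first `capacity` balls go straight into the bag.
                    bag[k - 1] = k;
                } else if (rng.nextInt(k) < capacity) {
                    // Ball k enters with probability capacity/k and
                    // replaces a slot chosen with equal probability.
                    bag[rng.nextInt(capacity)] = k;
                }
            }
            int filled = Math.min(capacity, total);
            for (int i = 0; i < filled; i++) {
                hits[bag[i]]++;
            }
        }
        double[] freq = new double[total + 1];
        for (int b = 1; b <= total; b++) {
            freq[b] = (double) hits[b] / trials;
        }
        return freq;
    }

    public static void main(String[] args) {
        // With capacity 10 and 50 balls, every frequency should be near 10/50.
        double[] freq = frequencies(10, 50, 200_000, 42L);
        for (int b = 1; b <= 50; b++) {
            System.out.printf("ball %d: %.4f%n", b, freq[b]);
        }
    }
}
```

The smaller total of 50 balls is only chosen to keep the run short; the same check works for 927 balls, where every frequency should be close to 10/927.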
more
Algorithm and data structure notes