Design and implementation of the reservoir sampling algorithm

Posted by leszczu on Wed, 01 Dec 2021 20:04:23 +0100

Author: Grey

Original address: Design and implementation of reservoir algorithm

Problem to be solved

Suppose there is a machine that keeps spitting out distinct balls, one after another, and there is only one bag, which can hold 10 balls. Each ball that is spit out is either put into the bag or thrown away forever. How can we make sure that, after the machine spits out each ball, every ball spit out so far is in the bag with equal probability?

Rule

Balls 1 to 10 are all put into the bag. We introduce a random function f(i) that, given a value i, returns an integer from 1 to i with equal probability. When ball K is spit out (K > 10), we decide whether to put it into the bag as follows:

  1. Call f(K). If it returns a number no greater than 10, the ball is put into the bag. If it returns a number greater than 10, the ball is thrown away. That is, ball K enters the bag with probability 10/K.

  2. If step 1 decides that the ball enters the bag, one of the 10 balls already in the bag is chosen with equal probability (1/10 each) and thrown away to make room.
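The two steps above can be sketched as a single decision function. This is only an illustration under assumptions: the class name BagRule and the method slotFor are hypothetical, and java.util.Random with a fixed seed stands in for f(i) so that runs are repeatable.

```java
import java.util.Random;

// Hypothetical helper illustrating the two-step rule; not the author's class.
class BagRule {
    static final int CAPACITY = 10;

    // f(i): returns an integer in 1..i with equal probability.
    static int f(int i, Random rng) {
        return rng.nextInt(i) + 1;
    }

    // Returns the 0-based bag slot that ball k should occupy,
    // or -1 if ball k is thrown away.
    static int slotFor(int k, Random rng) {
        if (k <= CAPACITY) {
            return k - 1;             // balls 1..10 always go in
        }
        if (f(k, rng) > CAPACITY) {
            return -1;                // step 1 rejects with probability 1 - 10/k
        }
        return f(CAPACITY, rng) - 1;  // step 2 picks one of the 10 slots uniformly
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int trials = 100_000;
        int kept = 0;
        for (int t = 0; t < trials; t++) {
            if (slotFor(20, rng) >= 0) {
                kept++;
            }
        }
        // Ball 20 should be kept in roughly 10/20 = 50% of the trials.
        System.out.println((double) kept / trials);
    }
}
```

Returning -1 for a discarded ball keeps the decision separate from the bag itself, which makes the 10/K acceptance rate easy to check on its own.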

Proof

Case 1

When K is between 1 and 10, every ball enters the bag with probability 100% under our rule, so all balls are in the bag with equal probability.

Case 2

When K is any number greater than 10, take K = 927 as an example. Right after ball 927 has been spit out and processed, we consider:

A. The probability that each of balls 1 to 10 is in the bag

B. The probability that each ball numbered greater than 10 and less than or equal to 927 is in the bag

If A and B both give the same probability for every ball,

then the result extends to the general case, because nothing in the argument depends on the particular number 927.

A. After ball 927 has been processed, what is the probability that ball 5 is in the bag?

For ball 5 to still be in the bag at that point, all of the following must hold:

  1. When ball 11 came, ball 5 survived

  2. When ball 12 came, ball 5 survived

  3. ...

  4. When ball 927 came, ball 5 survived

When ball 11 arrives, how does ball 5 survive? If q is the probability that ball 5 is replaced when ball 11 arrives, then the probability that ball 5 survives is 1 - q.

According to our rule, ball 11 is selected into the bag with probability 10/11, and ball 5 is then the unlucky one chosen for replacement with probability 1/10. So, when ball 11 arrives, the probability that ball 5 is replaced is:

(10/11 * 1/10) = 1/11

So the probability that ball 5 will survive is

1 - 1/11 = 10/11

When ball 12 arrives, the same calculation gives ball 5 a survival probability of 1 - (10/12 * 1/10) = 11/12

When ball 13 arrives, the survival probability of ball 5 is 12/13

...

When ball 927 arrives, the survival probability of ball 5 is 926/927

Ball 5 has to survive every one of these arrivals, so the probability that it is still in the bag after ball 927 is the product:

10/11 * 11/12 * 12/13 * ... * 925/926 * 926/927 = 10/927
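The telescoping product can be checked with exact fraction arithmetic rather than by cancelling terms by hand. The following is a small sketch; the class name SurvivalProduct and the method product are hypothetical names introduced for this check.

```java
import java.math.BigInteger;

// Hypothetical helper: multiplies the survival factors exactly as a fraction.
class SurvivalProduct {
    // Returns {numerator, denominator} of
    // (from-1)/from * from/(from+1) * ... * (to-1)/to, in lowest terms.
    static BigInteger[] product(int from, int to) {
        BigInteger num = BigInteger.ONE;
        BigInteger den = BigInteger.ONE;
        for (int j = from; j <= to; j++) {
            num = num.multiply(BigInteger.valueOf(j - 1));
            den = den.multiply(BigInteger.valueOf(j));
            // Reduce after every factor so the numbers stay small.
            BigInteger g = num.gcd(den);
            num = num.divide(g);
            den = den.divide(g);
        }
        return new BigInteger[] {num, den};
    }

    public static void main(String[] args) {
        // Survival of ball 5 across the arrivals of balls 11..927.
        BigInteger[] p = product(11, 927);
        System.out.println(p[0] + "/" + p[1]); // prints 10/927
    }
}
```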

Similarly, the calculation used for ball 5 applies to any of balls 1 to 10, and the probability is 10/927 for each of them.

So in case A, all balls have equal probability.

B. Now take a ball numbered greater than 10, for example ball 15, and consider the probability that it is in the bag after ball 927 has been processed.

For ball 15 to still be in the bag at that point, all of the following must hold:

When ball 15 was spit out, it was selected into the bag, which happens with probability 10/15.

When ball 16 arrived, ball 15 survived. By the same calculation as in A, the probability is 15/16.

When ball 17 arrived, ball 15 survived, with probability 16/17.

...

When ball 926 arrived, ball 15 survived, with probability 925/926.

When ball 927 arrived, ball 15 survived, with probability 926/927.

Therefore, the probability that ball 15 is in the bag after ball 927 has been processed is:

10/15 * 15/16 * 16/17 * ... * 925/926 * 926/927 = 10/927

Similarly, the calculation used for ball 15 applies to any ball numbered greater than 10 and less than 927, and gives 10/927. Ball 927 itself enters the bag with probability 10/927 directly, by step 1 of the rule.

In both case A and case B the probability is 10/927.

So the rule meets the requirement of the problem.
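The two cases follow the same telescoping pattern, which can be written once in general form. The symbols here are introduced only for this summary: N is the bag capacity (10 above), K is the number of balls spit out so far (927 above), and i is the number of the ball being considered.

```latex
% N = bag capacity, K = balls seen so far (K >= N), i = ball number
\begin{align*}
P(\text{ball } i \text{ in bag})
  &= \prod_{j=N+1}^{K} \left(1 - \frac{N}{j}\cdot\frac{1}{N}\right)
   = \prod_{j=N+1}^{K} \frac{j-1}{j}
   = \frac{N}{K}
  && \text{for } 1 \le i \le N, \\
P(\text{ball } i \text{ in bag})
  &= \frac{N}{i} \prod_{j=i+1}^{K} \frac{j-1}{j}
   = \frac{N}{i}\cdot\frac{i}{K}
   = \frac{N}{K}
  && \text{for } N < i \le K.
\end{align*}
```

With N = 10 and K = 927, both lines reduce to 10/927, matching the values computed for ball 5 and ball 15.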

Code

public class Code_0058_ReservoirSampling {
    public static class RandomBox {
        private int[] bag;
        // Bag capacity
        private int capacity;
        // Number of balls seen so far
        private int count;

        public RandomBox(int capacity) {
            bag = new int[capacity];
            this.capacity = capacity;
            count = 0;
        }

        // Random function: returns an integer in 1..max with equal probability
        // Math.random() returns a double in the range [0, 1)
        // Casting to int truncates, which rounds down for non-negative values
        private int rand(int max) {
            return (int) (Math.random() * max) + 1;
        }

        public void add(int num) {
            // One more ball has arrived
            count++;
            // While the bag is not yet full
            if (count <= capacity) {
                // put the ball straight into the next free slot
                bag[count - 1] = num;
            } else if (rand(count) <= capacity) {
                // Otherwise the ball enters with probability capacity/count
                // and replaces a slot chosen uniformly at random
                bag[rand(capacity) - 1] = num;
            }
        }

        // Returns a copy of the balls currently in the bag
        // (fewer than capacity if fewer balls have arrived so far)
        public int[] choices() {
            int size = Math.min(count, capacity);
            int[] res = new int[size];
            System.arraycopy(bag, 0, res, 0, size);
            return res;
        }

    }
}
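A common variation draws a single random index per ball instead of two random numbers: pick j uniformly from 0 to count - 1 and overwrite slot j only when j is smaller than the capacity. The sketch below tries that variation and checks the equal-probability claim empirically. It is not the author's RandomBox: the class name SingleDrawReservoir and the method frequencies are hypothetical, and a seeded java.util.Random is assumed so that runs are repeatable.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical single-draw variant for illustration; not the author's RandomBox.
class SingleDrawReservoir {
    private final int[] bag;
    private final Random rng;
    private int count = 0;

    SingleDrawReservoir(int capacity, Random rng) {
        this.bag = new int[capacity];
        this.rng = rng;
    }

    void add(int num) {
        count++;
        if (count <= bag.length) {
            bag[count - 1] = num;
        } else {
            // One draw in [0, count): it falls in [0, capacity) with
            // probability capacity/count, and is then uniform over the slots.
            int j = rng.nextInt(count);
            if (j < bag.length) {
                bag[j] = num;
            }
        }
    }

    int[] choices() {
        return Arrays.copyOf(bag, Math.min(count, bag.length));
    }

    // Feeds balls 1..balls through a fresh reservoir `trials` times and returns
    // the fraction of trials in which each ball ended up in the bag
    // (index 0 is unused).
    static double[] frequencies(int capacity, int balls, int trials, long seed) {
        Random rng = new Random(seed);
        int[] hits = new int[balls + 1];
        for (int t = 0; t < trials; t++) {
            SingleDrawReservoir r = new SingleDrawReservoir(capacity, rng);
            for (int b = 1; b <= balls; b++) {
                r.add(b);
            }
            for (int b : r.choices()) {
                hits[b]++;
            }
        }
        double[] freq = new double[balls + 1];
        for (int b = 1; b <= balls; b++) {
            freq[b] = (double) hits[b] / trials;
        }
        return freq;
    }

    public static void main(String[] args) {
        double[] freq = frequencies(10, 100, 20_000, 1L);
        // Each frequency should be close to 10/100 = 0.1.
        System.out.println("ball 5: " + freq[5]
                + ", ball 15: " + freq[15]
                + ", ball 100: " + freq[100]);
    }
}
```

The single draw saves one random number per accepted ball and has the same distribution: conditioned on j being below the capacity, j is uniform over the slots.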

more

Algorithm and data structure notes
