Application of Bloom filter

Posted by provision on Fri, 10 Sep 2021 11:32:31 +0200

In an app login scenario, two pieces of information matter: the account and the device number.

The device number identifies one installation of the app and is generated when the app is installed. If the app is uninstalled and reinstalled, a new device number is generated.

Suppose we need to check whether the ratio of active accounts to active devices within a time window is normal, and raise an alert if it falls outside a reasonable range.

There are two common schemes:

Scheme	Characteristics
In-memory Set	Advantage: good performance. Disadvantage: limited storage capacity.
Database	Advantage: can store more data. Disadvantage: relatively poor performance, and data cleanup logic must be maintained.

These two schemes have their own advantages and disadvantages.
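As a point of reference, the in-memory Set scheme might look roughly like the sketch below. The class name `ActiveRatioWindow`, the choice of devices per account as the ratio, and the alert thresholds are all assumptions made for illustration; the original post does not define them.

```java
import java.util.HashSet;
import java.util.Set;

/** In-memory sketch: tracks distinct accounts and devices seen in one time window. */
class ActiveRatioWindow {
    private final Set<String> accounts = new HashSet<>();
    private final Set<String> devices = new HashSet<>();

    /** Records one login event; duplicates are ignored by the sets. */
    void recordLogin(String account, String deviceId) {
        accounts.add(account);
        devices.add(deviceId);
    }

    /** Distinct devices per distinct account; 0 if no account was seen yet. */
    double devicesPerAccount() {
        return accounts.isEmpty() ? 0.0 : (double) devices.size() / accounts.size();
    }

    /** True when the ratio falls outside [min, max]; both bounds are hypothetical thresholds. */
    boolean shouldAlert(double min, double max) {
        double ratio = devicesPerAccount();
        return ratio < min || ratio > max;
    }

    /** Clears both sets so the next time window starts empty. */
    void reset() {
        accounts.clear();
        devices.clear();
    }
}
```

Every account and device string stays in memory until `reset()` is called, which is exactly the storage limit the table mentions.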

So, is there a way to get both good performance and larger capacity?

This is where a Bloom filter comes in handy.

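A Bloom filter also keeps its data in memory, but it stores only a compact bit array of hashed positions rather than the elements themselves, so it can cover far more data while preserving performance.

The trade-off is that you must tolerate a certain false positive rate: the filter may report an element as present when it was never added, although it never reports an added element as absent. Many business scenarios do not need exact figures.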
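To make the idea concrete, here is a minimal toy Bloom filter built on `java.util.BitSet`. It is only a sketch: the class name `SimpleBloomFilter` is hypothetical, and the two base hashes (`String.hashCode` and CRC32) are picked for brevity, not quality. Production code would normally use a stronger hash such as Murmur3.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

/** Toy Bloom filter: each element sets k bit positions derived by double hashing. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    SimpleBloomFilter(int numBits, int numHashFunctions) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    /** Sets every bit position computed for the value. */
    void put(String value) {
        for (int index : indexes(value)) {
            bits.set(index);
        }
    }

    /** False means definitely absent; true means probably present. */
    boolean mightContain(String value) {
        for (int index : indexes(value)) {
            if (!bits.get(index)) {
                return false;
            }
        }
        return true;
    }

    /** Derives k positions as (h1 + i * h2) mod numBits from two base hashes. */
    private int[] indexes(String value) {
        int h1 = value.hashCode();
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        int h2 = (int) crc.getValue();
        int[] result = new int[numHashFunctions];
        for (int i = 0; i < numHashFunctions; i++) {
            // floorMod keeps the index non-negative even if the sum overflows
            result[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return result;
    }
}
```

The filter never stores the strings themselves, only bits, which is why its memory use depends on the configured bit count and not on the length of the inserted values.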

To keep the Bloom filter in local memory, you can use Guava's implementation.

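import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

BloomFilter<CharSequence> bloomFilter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8),
                // Expected number of insertions
                100000,
                // Desired false positive probability
                0.0001);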
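To get a feel for what these two parameters cost in memory, the standard Bloom filter sizing formulas can be evaluated directly. The sketch below uses only the JDK; the class name `BloomSizing` is an assumption for illustration, and the figures are what the textbook formulas give, not a measurement of Guava's actual allocation.

```java
/** Evaluates the standard Bloom filter sizing formulas for n insertions and rate p. */
class BloomSizing {
    /** m = -n * ln(p) / (ln 2)^2: bit count for n insertions at false positive rate p. */
    static long optimalNumOfBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** k = round(m / n * ln 2), never less than 1. */
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 100_000;   // expected insertions, as in the Guava call above
        double p = 0.0001;  // desired false positive probability
        long m = optimalNumOfBits(n, p);
        int k = optimalNumOfHashFunctions(n, m);
        System.out.printf("bits=%d (about %d KiB), hash functions=%d%n", m, m / 8 / 1024, k);
    }
}
```

For 100000 insertions at a 0.0001 false positive rate this works out to roughly 1.9 million bits, about 234 KiB, with 13 hash functions. That is about 19 bits per element regardless of how long each account or device string is.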

However, local memory is still limited, and in practice a cluster of several application instances needs to share one Bloom filter, so a distributed cache component such as Redis is worth considering.

With Redis, there are several ways to implement it:

(1) Install a Bloom filter module for Redis, such as RedisBloom.

(2) Implement it yourself on top of Redis bitmaps:

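import com.google.common.hash.Funnels;
import com.google.common.hash.Hashing;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.StringRedisTemplate;

import java.nio.charset.Charset;
import java.util.List;

public class RedisBloomFilter {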

    // Redis key prefix for Bloom filter bitmaps; a shared prefix makes it easy to measure the filters' Redis usage
    private static final String KEY_PREFIX = "bf:";

    @Autowired
    private StringRedisTemplate stringRedisTemplate;

    /**
     * Length of the bit array. Redis bitmap offsets must be below 2^32, so this must not exceed 2^32.
     */
    private final long numBits;

    // Number of hash functions
    private final int numHashFunctions;

    /**
     * @param expectedInsertions Expected number of inserted elements
     * @param fpp Acceptable false positive probability
     */
    public RedisBloomFilter(long expectedInsertions, double fpp) {
        this.numBits = optimalNumOfBits(expectedInsertions, fpp);
        this.numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
    }

    // Calculate the number of hash functions (method from guava)
    private int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Calculate the bit array length (method from guava)
    private long optimalNumOfBits(long n, double p) {
        if (p == 0) {
            p = Double.MIN_VALUE;
        }
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

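    /**
     * Check whether value has probably been seen under key.
     * Note the side effect: if the value is not present, it is added before false is returned.
     */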
    public boolean contains(String key, String value) {
        long[] indexs = getIndexs(value);
        // Pipeline the GETBIT calls so that all bit reads reach Redis in a single round trip
        List<Object> pipelinedResult = stringRedisTemplate.executePipelined((RedisCallback<Boolean>) connection -> {
            for (long index : indexs) {
                connection.getBit(getRedisKey(key).getBytes(), index);
            }
            return null;
        });

        boolean result = !pipelinedResult.contains(false);

        if (!result) {
            put(key, value);
        }

        return result;
    }

    /**
     * Set the bits for value in the Redis bitmap stored under key
     */
    private void put(String key, String value) {
        long[] indexs = getIndexs(value);
        // Pipeline the SETBIT calls so that all bit writes reach Redis in a single round trip
        stringRedisTemplate.executePipelined((RedisCallback<Boolean>) connection -> {
            for (long index : indexs) {
                connection.setBit(getRedisKey(key).getBytes(), index, true);
            }

            return null;
        });
    }

    /**
     * Compute the bitmap indexes for value using double hashing (adapted from guava)
     */
    private long[] getIndexs(String value) {
        long hash1 = hash(value);
        long hash2 = hash1 >>> 16;
        long[] result = new long[numHashFunctions];
        for (int i = 0; i < numHashFunctions; i++) {
            long combinedHash = hash1 + i * hash2;
            if (combinedHash < 0) {
                combinedHash = ~combinedHash;
            }
            result[i] = combinedHash % numBits;
        }
        return result;
    }

    /**
     * Get a hash value (method from guava)
     */
    private long hash(String value) {
        Charset charset = Charset.forName("UTF-8");
        return Hashing.murmur3_128().hashObject(value, Funnels.stringFunnel(charset)).asLong();
    }

    public String getRedisKey(String key) {
        return KEY_PREFIX + key;
    }

}
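To connect the class above back to the original requirement, one possible way to use its check-and-add `contains` is to increment a counter only when a value has not been seen in the current window. The sketch below is an assumption about how this could be wired up, not the author's code: `SeenFilter`, `WindowCounter`, `InMemorySeenFilter`, and the key naming are all hypothetical, and the in-memory stand-in exists only so the example runs without Redis.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical abstraction over a check-and-add filter such as RedisBloomFilter.contains. */
interface SeenFilter {
    /** Returns true if value was probably seen under key; otherwise records it and returns false. */
    boolean contains(String key, String value);
}

/** Counts approximately distinct accounts and devices for one time window. */
class WindowCounter {
    private final SeenFilter filter;
    private final String window; // e.g. "2021091011" for an hourly window (assumed naming)
    private long accounts;
    private long devices;

    WindowCounter(SeenFilter filter, String window) {
        this.filter = filter;
        this.window = window;
    }

    /** Increments a counter only the first time a value is seen in this window. */
    void recordLogin(String account, String deviceId) {
        if (!filter.contains("account:" + window, account)) {
            accounts++;
        }
        if (!filter.contains("device:" + window, deviceId)) {
            devices++;
        }
    }

    long distinctAccounts() { return accounts; }

    long distinctDevices() { return devices; }
}

/** Exact in-memory stand-in, used here only so the sketch runs without Redis. */
class InMemorySeenFilter implements SeenFilter {
    private final Map<String, Set<String>> seen = new HashMap<>();

    @Override
    public boolean contains(String key, String value) {
        // Set.add returns false when the value was already present
        return !seen.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }
}
```

With the Redis-backed class, a method reference such as `redisBloomFilter::contains` could be passed as the `SeenFilter`. Because a Bloom filter can return false positives, the counts would then slightly undercount distinct values, and in a multi-instance cluster the counters themselves would also need to live in shared storage.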

(3) Use an open-source implementation: Redisson

RBloomFilter<String> bloomFilter = redisson.getBloomFilter("bloom-filter");
// Expect 10000 insertions with a false positive probability of 0.001
bloomFilter.tryInit(10000L, 0.001);
// Add a value, then check whether it is (probably) present
bloomFilter.add("value");
bloomFilter.contains("value");

Topics: Java Redis
