Bitmap (Java-based implementation)

Posted by -[ webdreamer ] on Sun, 16 Jan 2022 13:28:57 +0100

A bitmap uses individual bits to store the state of each item. It suits data sets that are very large but whose items have only a few possible states, and it is typically used to test whether a given value is present.
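To make the idea concrete, here is a minimal sketch of the simplest case: one bit per non-negative integer id, used only to answer "have I seen this id?". The class and method names (PresenceBitmapSketch, mark, contains) are made up for illustration and are not part of the Bitmap class developed in this post.

```java
// Minimal sketch: one bit per id, packed into a long[].
// All names here are illustrative assumptions, not the article's API.
class PresenceBitmapSketch {
    private final long[] words;

    PresenceBitmapSketch(int capacity) {
        // Round up so that every id in [0, capacity) has a bit.
        words = new long[(capacity + 63) >>> 6];
    }

    void mark(int id) {
        // id >>> 6 selects the long, id & 63 selects the bit inside it.
        words[id >>> 6] |= 1L << (id & 63);
    }

    boolean contains(int id) {
        return (words[id >>> 6] & (1L << (id & 63))) != 0;
    }

    public static void main(String[] args) {
        PresenceBitmapSketch seen = new PresenceBitmapSketch(1_000);
        seen.mark(7);
        seen.mark(64);
        System.out.println(seen.contains(7));   // true
        System.out.println(seen.contains(8));   // false
        System.out.println(seen.contains(64));  // true
    }
}
```

A thousand ids fit in 16 longs (128 bytes) here, which is the memory saving that motivates the rest of the design.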

Design principle:

Use memory as fully as possible, and push Java's capabilities and performance to their limits.

Design idea:

Use a long array for storage.

The size of the Bitmap class is a long (an int does not give a high enough limit), so in theory 0 <= size <= 2^63-1.

However, the maximum length of a Java array is 2^31-1 (Integer.MAX_VALUE), so a long array can hold at most 64 * (2^31-1) points when each point has only two states, i.e. 1 bit per point. The size of the Bitmap therefore cannot exceed 64 * (2^31-1), and the actual range is 0 <= size <= 64 * (2^31-1).
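The capacity arithmetic above can be checked with a few lines of Java. This is only a sanity check of the numbers, assuming an array of Integer.MAX_VALUE longs could really be allocated; CapacityCheck and maxBits are hypothetical names.

```java
// Sanity check of the upper bound on bits in a single long[].
// CapacityCheck and maxBits are illustrative names only.
class CapacityCheck {
    static long maxBits() {
        // 64L forces long arithmetic; 64 * Integer.MAX_VALUE as int would overflow.
        return 64L * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long maxBits = maxBits();
        System.out.println(maxBits);                            // 137438953408
        System.out.println(maxBits == (1L << 37) - (1L << 6));  // true
        System.out.println(Long.toHexString(maxBits));          // 1fffffffc0
    }
}
```

Note the 64L literal: the multiplication has to happen in long arithmetic, which is the same reason the size of the Bitmap is a long rather than an int.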

Next, a point in the Bitmap need not be limited to two states, so the Bitmap is designed to let the caller choose the number of states per point, state, where 1 < state <= 2^63-1. It must be greater than 1 because a Bitmap whose points have only one state is meaningless. It is at most 2^63-1 because that is the maximum value of the long type (Long.MAX_VALUE) and no long is larger, which fits the goal of pushing Java to its limits.

The number of bits needed to represent the state of a point also has to be worked out. For a user-defined number of states it is stateBitNum = (the number of bits needed to write the values 0 to state - 1 in binary)

(

A simple derivation first:

2 states need 1 bit: [0, 1]

3 to 4 states need 2 bits: [00, 01, 10, 11]

5 to 8 states need 3 bits: [000, 001, 010, 011, 100, 101, 110, 111]

9 to 16 states need 4 bits: [..., ...]

This gives the formula: stateBitNum = 64 - Long.numberOfLeadingZeros(state - 1);

Long.numberOfLeadingZeros(long i) returns the number of zero bits to the left of the highest-order 1 bit in the 64-bit two's complement representation of i. In other words, it counts zeros from the left and stops at the first 1. For example, Long.numberOfLeadingZeros(2) = 62: 2 is 10 in binary, and in 64 bits (because i is a long) there are 62 zeros in front of that 10. So 64 - Long.numberOfLeadingZeros(2) = 2, yet two states (state = 2) need only 1 bit. That is why 1 is subtracted from state before it is passed to the method: 0 itself counts as a state, so, for example, eight states are represented by the numbers 0 to 7;

Digging further, 64 bits can actually represent 2^64 states, but the largest long value in Java is 2^63-1, so the maximum number of states is state = 2^63-1. Those states need 63 bits, so a single long is enough. There is also a pitfall: Long.numberOfLeadingZeros(0) = 64. It cannot occur here, because the minimum state is 2 and so the minimum state - 1 is 1;

That's it!

);
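The derivation above can be checked directly. The sketch below wraps the formula in a small helper and prints the bit width for the boundary values from the table; StateBits and bitsFor are hypothetical names used only for this illustration.

```java
// Checks the bit-width formula against the table in the derivation.
// StateBits and bitsFor are illustrative names only.
class StateBits {
    static int bitsFor(long stateNum) {
        if (stateNum < 2)
            throw new IllegalArgumentException("stateNum must be at least 2");
        // Subtract 1 because the states are numbered 0 .. stateNum - 1.
        return 64 - Long.numberOfLeadingZeros(stateNum - 1);
    }

    public static void main(String[] args) {
        System.out.println(bitsFor(2));              // 1
        System.out.println(bitsFor(3));              // 2
        System.out.println(bitsFor(4));              // 2
        System.out.println(bitsFor(5));              // 3
        System.out.println(bitsFor(8));              // 3
        System.out.println(bitsFor(9));              // 4
        System.out.println(bitsFor(16));             // 4
        System.out.println(bitsFor(Long.MAX_VALUE)); // 63
    }
}
```

The guard at the top is what keeps Long.numberOfLeadingZeros(0) = 64 from ever being reached.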

Other Bitmap operations are described during implementation.

Implementation:

/**
 * Bitmap class:
 * 1. It can store at most (2^37 - 2^6) points with two state values each;
 * 2. In theory there are (2^37 - 2^6) bits of space, but not all of them may be usable in practice (see the comments in the constructor),
 * so the number of state values per point (stateNum) should preferably satisfy 64 % ⌈log2(stateNum)⌉ == 0, i.e. the number of bits per state should be a factor of 64 so that no bits of a long are wasted;
 * 3. There is no method for deleting points, only add, update and find (the reason is given in item 4);
 * 4. Why there is no delete: it is not impossible, just unnecessary. Deleting a point means shifting the later data to fill the gap, which can be extremely slow for large data sets.
 * Moreover, a bitmap exists to record the state of each item in a massive data set, and for massive data a few erroneous entries will not affect the final statistics (they can still be filtered out in a later stage).
 * The recommended approach is to use the update method to set an unneeded point to 0 to mark it as discarded; the exact logic is up to the user;
 * 5. The elementData array is the dominant storage cost. At its limit of Integer.MAX_VALUE longs it takes about 16 GiB and holds roughly 137 billion two-state points.
 * The number of state values per point (stateNum) and the number of points (length) can both be customized and can each be as large as (2^63 - 1), but length must not exceed ⌊64 / ⌈log2(stateNum)⌉⌋ * Integer.MAX_VALUE.
 * That should be enough for everyday use (someone will always ask what to do when even that is not enough);
 * 6. Answer to the question in 5: create several Bitmap instances and use them together.
 */
public class Bitmap {
    /*
     * The points are stored in a long array
     */
    private final long[] elementData;
    /*
     * Upper bound on the total number of bits the bitmap can store: MAX_STATE_SIZE = 137,438,953,408;
     * 64 (number of bits in a long) * 0x7fffffff (Integer.MAX_VALUE) = MAX_STATE_SIZE (2^37 - 2^6)
     */
//    private static final long MAX_STATE_SIZE = 0x1fffffffc0L;// Not used in the end. My oversight...
    /*
    Number of states a point in the bitmap can take
     */
    private final long stateNum;
    /*
    Number of bits needed to store the state value of one point
     */
    private final int stateBitNum;
    /*
    Maximum number of points the Bitmap can store, fixed when the Bitmap is constructed
     */
    private final long MAX_SIZE;
    /*
    Number of points that fit into one long
     */
    private final long numOfOneLong;
    /*
    Number of points currently stored in the bitmap
     */
    private long size;

    /**
     * @param stateNum number of states per point, stateNum > 1
     * @param length   capacity of the bitmap in points, length > 0.
     *                 If length > ⌊64 / ⌈log2(stateNum)⌉⌋ * Integer.MAX_VALUE, this Bitmap cannot hold length points with stateNum states each, and an exception is thrown
     */
    public Bitmap(long stateNum, long length) {
        if (stateNum > 1 && length > 0) {
            /*
            Number of bits needed to store one state value
             */
            stateBitNum = 64 - Long.numberOfLeadingZeros(stateNum - 1);
            /*
            Number of state values that fit into one long.
            If 64 is not evenly divisible, the leftover bits are discarded:
            reading and writing values that straddle two array elements is very complex, so some bit space is given up to keep the bitmap operations simple
             */
            numOfOneLong = 64 / stateBitNum;
            /*
            One long holds at most numOfOneLong state values,
            so the bitmap can hold at most numOfOneLong * Integer.MAX_VALUE points.
            If the requested length exceeds that, this Bitmap cannot satisfy the request and an exception is thrown
             */
            if (length > numOfOneLong * Integer.MAX_VALUE)
                throw new IllegalArgumentException("The requested bitmap is too large to store");
            this.stateNum = stateNum;
            MAX_SIZE = length;

            if (length % numOfOneLong == 0)
                elementData = new long[(int) (length / numOfOneLong)];
            /*
            If length is not a multiple of numOfOneLong, one extra array element is needed for the remaining points
             */
            else elementData = new long[((int) (length / numOfOneLong)) + 1];
        } else
            throw new IllegalArgumentException("stateNum must be greater than 1 and length must be greater than 0");
    }

    /**
     * Appends a point at the next free position
     *
     * @param state state of the point
     * @return true if the point was added; false if the bitmap is full or the state value is out of range
     */
    public boolean add(long state) {
        if (state > stateNum - 1 || state < 0 || size == MAX_SIZE)
            return false;
        int index = (int) (size / numOfOneLong);
        int left = (int) (size % numOfOneLong);
        elementData[index] |= state << (64 - stateBitNum * (1 + left));
        ++size;
        return true;
    }

    /**
     * Looks up the state of the point at the given index
     *
     * @return the state value, or -1 if the index is out of range
     */
    public long find(long index) {
        if (index < 0 || index > size - 1)
            return -1;
        /*
        Which array element holds the point at this index
         */
        int arrayIndex = (int) (index / numOfOneLong);
        /*
        Which slot inside that element holds the point
         */
        int elementIndex = (int) (index % numOfOneLong);
        /*
        Shift left to drop the bits to the left of the slot, then unsigned shift right to drop the bits to its right; what remains is the state value at this index
         */
        return elementData[arrayIndex] << (stateBitNum * elementIndex) >>> (64 - stateBitNum);
    }

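    /**
     * Overwrites the state of the point at index; returns false if index or state is out of range
     */
    public boolean update(long index, long state) {
        if (index < 0 || index > size - 1 || state > stateNum - 1 || state < 0)
            return false;
        int arrayIndex = (int) (index / numOfOneLong);
        int left = (int) (index % numOfOneLong);
        int shift = 64 - stateBitNum * (1 + left);
        /*
        Clear the old bits of the slot first: OR alone can only set bits, so the old and new states would be merged
         */
        long mask = (-1L >>> (64 - stateBitNum)) << shift;
        elementData[arrayIndex] = (elementData[arrayIndex] & ~mask) | (state << shift);
        return true;
    }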

    /**
     * Returns the number of states in a bitmap
     */
    public long getStateNum() {
        return stateNum;
    }

    /**
     * Returns the maximum number of points that a bitmap can store
     */
    public long getMaxSize() {
        return MAX_SIZE;
    }

    /**
     * Returns the number of points currently stored in the bitmap
     */
    public long getSize() {
        return size;
    }

    /**
     * Helper for toString(): renders each array element as comma-separated binary groups, one group per point
     */
    private String elementDataToString() {
        StringBuilder result = new StringBuilder("[\n");
        for (long element : elementData) {
            String eleString = Long.toBinaryString(element);
            StringBuilder one = new StringBuilder();
            for (int i = 0; i < 64 - eleString.length(); i++)
                one.append("0");
            one.append(eleString);
            for (int i = 0; i < numOfOneLong + 1; i++)
                one.insert((stateBitNum + 1) * i, ',');
            result.append(one.substring(1, one.lastIndexOf(","))).append(",\n");
        }
        return result.append("]").toString();
    }

    @Override
    public String toString() {
        return "Bitmap{\n" +
                "elementData=" + elementDataToString() +
                ", \nstateNum=" + stateNum +
                ", \nstateBitNum=" + stateBitNum +
                ", \nMAX_SIZE=" + MAX_SIZE +
                ", \nsize=" + size +
                "\n}";
    }
}
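One detail of the packed layout is worth isolating: overwriting a slot requires clearing its old bits first, because a bitwise OR can only turn bits on. The sketch below works on a single long with the same layout that add and find use (slot 0 in the highest-order bits). SlotSketch, write and read are hypothetical helpers written for this illustration, and bits is assumed to be between 1 and 63.

```java
// Reading and overwriting fixed-width slots inside one long.
// SlotSketch, write and read are illustrative helpers, not part of Bitmap.
class SlotSketch {
    // Stores state in the given slot; each slot is `bits` wide (1..63),
    // and slot 0 occupies the highest-order bits.
    static long write(long word, int slot, int bits, long state) {
        int shift = 64 - bits * (slot + 1);
        long mask = (-1L >>> (64 - bits)) << shift;  // ones over the slot only
        return (word & ~mask) | (state << shift);    // clear, then set
    }

    static long read(long word, int slot, int bits) {
        // Drop the bits left of the slot, then the bits right of it.
        return word << (bits * slot) >>> (64 - bits);
    }

    public static void main(String[] args) {
        long word = 0L;
        word = write(word, 0, 4, 5);   // slot 0 = 0101
        word = write(word, 1, 4, 3);   // slot 1 = 0011

        // OR without clearing merges old and new: 0101 | 0010 = 0111.
        long orOnly = word | (2L << 60);
        System.out.println(read(orOnly, 0, 4));  // 7

        // Clearing first gives the intended value and leaves slot 1 alone.
        word = write(word, 0, 4, 2);
        System.out.println(read(word, 0, 4));    // 2
        System.out.println(read(word, 1, 4));    // 3
    }
}
```

This matters in particular for the advice to mark a discarded point by updating it to 0: OR-ing in 0 changes nothing, so the slot has to be masked out.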

Test:

public class Test {
    public static void main(String[] args) {
        Bitmap bitmap = new Bitmap(16, 10);
        System.out.println(bitmap.add(2));
        System.out.println(bitmap.add(3));
        System.out.println(bitmap.add(4));
        System.out.println(bitmap.add(5));
        System.out.println(bitmap);
        System.out.println(bitmap.update(3, 7));
        System.out.println(bitmap);
        System.out.println(bitmap.find(3));
    }
}

Summary:

Bitmap class:
1. It can store at most (2^37 - 2^6) points with two state values each;

2. In theory there are (2^37 - 2^6) bits of space, but not all of them may be usable in practice (see the comments in the constructor), so the number of state values per point (stateNum) should preferably satisfy 64 % ⌈log2(stateNum)⌉ == 0, i.e. the number of bits per state should be a factor of 64 so that no bits of a long are wasted;

3. There is no method for deleting points, only add, update and find (the reason is given in item 4);

4. Why there is no delete: it is not impossible, just unnecessary. Deleting a point means shifting the later data to fill the gap, which can be extremely slow for large data sets. Moreover, a bitmap exists to record the state of each item in a massive data set, and for massive data a few erroneous entries will not affect the final statistics (they can still be filtered out in a later stage). The recommended approach is to use the update method to set an unneeded point to 0 to mark it as discarded; the exact logic is up to the user;

5. The elementData array is the dominant storage cost. At its limit of Integer.MAX_VALUE longs it takes about 16 GiB and holds roughly 137 billion two-state points. The number of state values per point (stateNum) and the number of points (length) can both be customized and can each be as large as (2^63 - 1), but length must not exceed ⌊64 / ⌈log2(stateNum)⌉⌋ * Integer.MAX_VALUE. That should be enough for everyday use (someone will always ask what to do when even that is not enough);

6. Answer to the question in 5: create several Bitmap instances and use them together.
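As one possible reading of item 6, the sketch below spreads two-state points over several long arrays and routes each index to the right one. It is not the Bitmap class from this post: ChunkedBitsSketch, wordsPerChunk, set and get are all names invented for this illustration, and it only handles the 1-bit-per-point case to stay short.

```java
// Several long[] chunks behind one long-valued index (two-state points only).
// Every name in this class is an illustrative assumption.
class ChunkedBitsSketch {
    private final long[][] chunks;
    private final int wordsPerChunk;

    ChunkedBitsSketch(long totalBits, int wordsPerChunk) {
        this.wordsPerChunk = wordsPerChunk;
        long totalWords = (totalBits + 63) / 64;  // longs needed, rounded up
        int chunkCount = (int) ((totalWords + wordsPerChunk - 1) / wordsPerChunk);
        chunks = new long[chunkCount][];
        for (int i = 0; i < chunkCount; i++)
            chunks[i] = new long[wordsPerChunk];
    }

    void set(long index) {
        long word = index >>> 6;  // which long overall
        chunks[(int) (word / wordsPerChunk)][(int) (word % wordsPerChunk)]
                |= 1L << (int) (index & 63);
    }

    boolean get(long index) {
        long word = index >>> 6;
        long bits = chunks[(int) (word / wordsPerChunk)][(int) (word % wordsPerChunk)];
        return (bits & (1L << (int) (index & 63))) != 0;
    }

    public static void main(String[] args) {
        ChunkedBitsSketch bits = new ChunkedBitsSketch(1_000_000L, 1024);
        bits.set(0);
        bits.set(999_999);
        System.out.println(bits.get(0));        // true
        System.out.println(bits.get(1));        // false
        System.out.println(bits.get(999_999));  // true
    }
}
```

With chunks sized near the array limit, the same routing would lift the total capacity past 64 * (2^31-1) bits, at the cost of one extra division per access.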

Topics: Java Algorithm data structure