Deep Dive into Java Collections: HashSet Source Code Analysis

Posted by grimmier on Sat, 18 May 2019 15:44:47 +0200

Questions

(1) What's the difference between Collection and Set?

(2) How does HashSet ensure that elements are not duplicated?

(3) Does HashSet allow null elements?

(4) Is HashSet ordered?

(5) Is HashSet thread-safe?

(6) What is fail-fast?

Introduction

The word "collection" is somewhat ambiguous in Java.

In the broad sense, collections are the container classes under the java.util package, including everything related to both Collection and Map.

In the middle sense, collections are only the Collection-related classes; Map-related classes are excluded.

In the narrow sense, a collection is a set in the mathematical sense: a container with no repeated elements, that is, no two elements in it are equal. This corresponds to Set in Java.

Which meaning applies depends on the context.

For example, if you are asked to talk about collections in Java, the broad sense is meant.

As another example, when we later describe adding all the elements of another collection to a Set, "collection" is meant in the middle sense.

HashSet is an implementation of Set. Internally it delegates to a HashMap, whose unique keys guarantee that elements are not duplicated.

Source code analysis

Fields

    // Internal use of HashMap
    private transient HashMap<E,Object> map;

    // Dummy object, used as the value for every key in the map
    private static final Object PRESENT = new Object();

Constructors

public HashSet() {
    map = new HashMap<>();
}

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

// Package-private, used only by LinkedHashSet
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}

Each constructor simply calls the corresponding HashMap constructor.

The last constructor is special. It has no access modifier, so it is package-private and can only be called from within the java.util package. It exists solely for LinkedHashSet, and it creates a LinkedHashMap instead of a HashMap.
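The practical effect of backing the set with a LinkedHashMap is that LinkedHashSet iterates in insertion order. A minimal sketch follows; the class name LinkedHashSetOrderDemo and its helper method are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class LinkedHashSetOrderDemo {
    // Adds a few elements (one of them twice) and returns them in iteration order.
    static List<String> insertionOrder() {
        Set<String> set = new LinkedHashSet<>();
        set.add("banana");
        set.add("apple");
        set.add("cherry");
        set.add("banana"); // duplicate: ignored, original position is kept
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder()); // [banana, apple, cherry]
    }
}
```

A plain HashSet given the same elements makes no promise about iteration order.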

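Adding elements

add() calls HashMap's put() directly, with the element itself as the key and PRESENT as the value, so every value in the map is the same object. put() returns the previous value for the key, so a null result means the element was not already present and add() returns true.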

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
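To see the return value in action, here is a small sketch that adds the same element twice, first through HashSet and then through a bare HashMap the way HashSet does internally. The class AddDemo and the helper addTwice are illustrative names only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class AddDemo {
    // Adds the same element twice and returns both results.
    static boolean[] addTwice(Set<String> set, String e) {
        boolean first = set.add(e);
        boolean second = set.add(e);
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] r = addTwice(new HashSet<String>(), "a");
        System.out.println(r[0] + " " + r[1]); // true false

        // The same check expressed directly with HashMap.put()
        Object present = new Object();
        HashMap<String, Object> map = new HashMap<>();
        System.out.println(map.put("a", present) == null); // true: key was new
        System.out.println(map.put("a", present) == null); // false: key existed
    }
}
```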

Removing elements

remove() calls HashMap's remove() directly. Note that the map's remove() returns the value that was associated with the removed key, while Set's remove() returns a boolean.

So the result is compared with PRESENT: null means the element was not in the set, and any non-null result must be PRESENT.

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}
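A quick sketch of the boolean result, assuming nothing beyond the public HashSet API; RemoveDemo and removeTwice are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class RemoveDemo {
    // Removes the same element twice from a set that contains it once.
    static boolean[] removeTwice() {
        Set<String> set = new HashSet<>();
        set.add("a");
        boolean first = set.remove("a");  // element was present
        boolean second = set.remove("a"); // element is already gone
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] r = removeTwice();
        System.out.println(r[0] + " " + r[1]); // true false
    }
}
```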

Querying elements

Set has no get() method, because retrieving an element you already hold would be pointless; unlike List, a Set has no index to retrieve by.

There is only contains(), which checks whether an element exists by calling the map's containsKey() directly.

public boolean contains(Object o) {
    return map.containsKey(o);
}

Iterating over elements

iterator() simply returns the iterator of the map's keySet().

public Iterator<E> iterator() {
    return map.keySet().iterator();
}
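Since there is no get(), membership checks and iteration are the two ways to read a HashSet. The sketch below shows both; IterationDemo and sum are illustrative names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class IterationDemo {
    // Sums all elements; the for-each loop uses set.iterator() under the hood.
    static int sum(Set<Integer> set) {
        int total = 0;
        for (int n : set) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3));
        System.out.println(set.contains(2)); // true
        System.out.println(set.contains(9)); // false
        System.out.println(sum(set));        // 6
    }
}
```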

Full source code

package java.util;

import java.io.InvalidObjectException;
import sun.misc.SharedSecrets;

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    // Internal elements are stored in HashMap
    private transient HashMap<E,Object> map;

    // Dummy value stored for every key in the map; it has no meaning of its own.
    private static final Object PRESENT = new Object();

    // No-argument constructor
    public HashSet() {
        map = new HashMap<>();
    }

    // Add all the elements of another collection to the current Set
    // Note that the initial capacity of the map is calculated here when it is initialized.
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

    // Specify initial capacity and load factor
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }

    // Specify only the initial capacity
    public HashSet(int initialCapacity) {
        map = new HashMap<>(initialCapacity);
    }

    // Constructor used only by LinkedHashSet
    // dummy is ignored; it only distinguishes this signature from the constructor above.
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<>(initialCapacity, loadFactor);
    }

    // iterator
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    // Number of elements
    public int size() {
        return map.size();
    }

    // Check if it is empty
    public boolean isEmpty() {
        return map.isEmpty();
    }

    // Check whether an element is included
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // Add an element
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

    // Delete elements
    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

    // Empty all elements
    public void clear() {
        map.clear();
    }

    // Cloning method
    @SuppressWarnings("unchecked")
    public Object clone() {
        try {
            HashSet<E> newSet = (HashSet<E>) super.clone();
            newSet.map = (HashMap<E, Object>) map.clone();
            return newSet;
        } catch (CloneNotSupportedException e) {
            throw new InternalError(e);
        }
    }

    // Serialization Writing Method
    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException {
        // Write out non-static, non-transient fields
        s.defaultWriteObject();

        // Write out the capacity and load factor of map
        s.writeInt(map.capacity());
        s.writeFloat(map.loadFactor());

        // Write out the number of elements
        s.writeInt(map.size());

        // Traverse to write out all elements
        for (E e : map.keySet())
            s.writeObject(e);
    }

    // Serialized Reading Method
    private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        // Read in non-static non-transient attributes
        s.defaultReadObject();

        // Read in capacity and check not less than 0
        int capacity = s.readInt();
        if (capacity < 0) {
            throw new InvalidObjectException("Illegal capacity: " +
                                             capacity);
        }

        // Read the load factor and check that it is neither <= 0 nor NaN (Not a Number)
        // java.lang.Float.NaN = 0.0f / 0.0f;
        float loadFactor = s.readFloat();
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new InvalidObjectException("Illegal load factor: " +
                                             loadFactor);
        }

        // Read in the number of elements and check not to be less than 0
        int size = s.readInt();
        if (size < 0) {
            throw new InvalidObjectException("Illegal size: " +
                                             size);
        }
        // Reset capacity based on number of elements
        // This is to ensure that the map has enough capacity to accommodate all elements and prevent meaningless expansion.
        capacity = (int) Math.min(size * Math.min(1 / loadFactor, 4.0f),
                HashMap.MAXIMUM_CAPACITY);

        // One more validity check; not important for this analysis
        SharedSecrets.getJavaOISAccess()
                     .checkArray(s, Map.Entry[].class, HashMap.tableSizeFor(capacity));

        // Create the map: a LinkedHashMap if this is a LinkedHashSet, otherwise a HashMap
        map = (((HashSet<?>)this) instanceof LinkedHashSet ?
               new LinkedHashMap<E,Object>(capacity, loadFactor) :
               new HashMap<E,Object>(capacity, loadFactor));

        // Read in all the elements and put them in the map
        for (int i=0; i<size; i++) {
            @SuppressWarnings("unchecked")
                E e = (E) s.readObject();
            map.put(e, PRESENT);
        }
    }

    // Spliterator (splittable iterator), mainly used for parallel traversal across threads
    public Spliterator<E> spliterator() {
        return new HashMap.KeySpliterator<E,Object>(map, 0, -1, 0, 0);
    }
}

Summary

(1) HashSet stores its elements as the keys of a HashMap, which guarantees that elements are not duplicated;

(2) HashSet is unordered, because the keys of a HashMap are unordered;

(3) HashSet allows one null element, because HashMap allows one null key;

(4) HashSet is not thread-safe;

(5) HashSet has no get() method.
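Point (3) is easy to verify. The following sketch adds null twice plus one ordinary element; NullElementDemo and setWithNull are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class NullElementDemo {
    // Builds a set from null (added twice) and one non-null element.
    static Set<String> setWithNull() {
        Set<String> set = new HashSet<>();
        set.add(null);
        set.add(null); // duplicate null is ignored like any other duplicate
        set.add("a");
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = setWithNull();
        System.out.println(set.size());         // 2
        System.out.println(set.contains(null)); // true
    }
}
```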

Easter eggs

(1) The Alibaba Java Coding Guidelines say that you should specify the initial size when creating a collection. After reading the source code, do you know what initial capacity to pass when initializing a HashMap?

The following constructor shows clearly how to choose the capacity.

If we expect a HashMap to store n elements, its initial capacity should be specified as ((n/0.75f) + 1); if this value is less than 16, 16 is used directly.

Specifying the initial capacity reduces the number of resizes and improves efficiency.

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}
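The sizing rule from that constructor can be wrapped in a small helper for your own code. This is only a sketch: CapacityDemo and initialCapacityFor are made-up names, and the formula is copied from the constructor above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class CapacityDemo {
    // Same formula as HashSet(Collection): room for expectedSize elements
    // at the default load factor of 0.75, with a floor of 16.
    static int initialCapacityFor(int expectedSize) {
        return Math.max((int) (expectedSize / .75f) + 1, 16);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(5));   // 16
        System.out.println(initialCapacityFor(100)); // 134

        // Pre-size a set that is expected to hold 100 elements
        Set<Integer> ids = new HashSet<>(initialCapacityFor(100));
        for (int i = 0; i < 100; i++) {
            ids.add(i);
        }
        System.out.println(ids.size()); // 100
    }
}
```

HashMap rounds the requested capacity up to the next power of two, so 134 becomes a table of 256 buckets, which holds 100 elements without resizing.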

(2) What is fail-fast?

Fail-fast is an error-detection mechanism in Java collections.

When a collection is traversed with an iterator and the iterator detects that the collection has been modified, it fails immediately by throwing ConcurrentModificationException.

The modification may come from another thread, or from the current thread itself, for example by calling the collection's own remove() during iteration.

Not all collections in Java are fail-fast. For example, the iterators of ConcurrentHashMap (weakly consistent) and CopyOnWriteArrayList (snapshot-based) never throw ConcurrentModificationException.
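A single thread is enough to trigger the behavior. The sketch below contrasts removing through the set itself, which trips the check, with removing through the iterator, which is allowed. FailFastDemo and its two helper methods are illustrative names.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class FailFastDemo {
    // Modifies the set directly while a for-each loop is iterating over it.
    // Returns true if the iterator detected the change.
    static boolean removeThroughSet() {
        Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer n : set) {
                set.remove(n); // structural change behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removes odd numbers through the iterator itself, which is allowed.
    // Returns the number of elements left.
    static int removeThroughIterator() {
        Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 1) {
                it.remove();
            }
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(removeThroughSet());      // true
        System.out.println(removeThroughIterator()); // 1
    }
}
```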

So how does fail-fast work?

Careful readers may have noticed a field called modCount in ArrayList and HashMap. It is incremented each time the collection is structurally modified. When an iterator is created, it records the current value in expectedModCount; during traversal it checks that the two still match. If they differ, the collection has changed, and ConcurrentModificationException is thrown.
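The counter mechanism can be sketched in a few lines. TinyList below is a hypothetical, deliberately simplified class written for this illustration; it is not JDK code, and its fixed capacity and String-only storage are simplifying assumptions.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical minimal list showing the modCount / expectedModCount pattern.
public class TinyList implements Iterable<String> {
    private final String[] items = new String[16]; // fixed capacity, for simplicity
    private int size;
    private int modCount; // incremented on every structural modification

    public void add(String s) {
        items[size++] = s;
        modCount++;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int cursor;
            // Snapshot of the counter taken when the iterator is created
            private final int expectedModCount = modCount;

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            public String next() {
                // Fail fast if the list changed since this iterator was created
                if (modCount != expectedModCount) {
                    throw new ConcurrentModificationException();
                }
                if (cursor >= size) {
                    throw new NoSuchElementException();
                }
                return items[cursor++];
            }
        };
    }

    // Adds to the list while iterating over it; returns true if the change was detected.
    static boolean modifyWhileIterating() {
        TinyList list = new TinyList();
        list.add("a");
        list.add("b");
        try {
            for (String s : list) {
                list.add(s + "!");
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true
    }
}
```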
