Java Collection interface: List interface & Set interface

Posted by fredroines on Mon, 10 Jan 2022 03:40:09 +0100

1. List interface

The elements in a List are ordered and may contain duplicates, and each element has a corresponding sequential index.
Each element in the List container corresponds to an integer index that records its position in the container, and elements can be accessed by that index.
Commonly used implementation classes of the List interface in the JDK API are ArrayList, LinkedList and Vector.

1.1 List interface method

In addition to the methods inherited from the Collection interface, List adds methods that operate on elements by index:

void add(int index, Object ele)
boolean addAll(int index, Collection eles)
Object get(int index)
int indexOf(Object obj)
int lastIndexOf(Object obj)
Object remove(int index)
Object set(int index, Object ele)
List subList(int fromIndex, int toIndex)
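
As a quick illustration of the index-based methods listed above, here is a minimal sketch. The class name ListIndexDemo and the sample values are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListIndexDemo {
    // Builds a small list and applies two index-based modifications.
    static List<String> build() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c", "b"));
        list.add(1, "x");   // insert at index 1  -> [a, x, b, c, b]
        list.set(0, "A");   // replace index 0    -> [A, x, b, c, b]
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list.get(1));            // x
        System.out.println(list.indexOf("b"));      // 2 (first occurrence)
        System.out.println(list.lastIndexOf("b"));  // 4 (last occurrence)
        System.out.println(list.remove(1));         // x (the removed element)
        System.out.println(list.subList(1, 3));     // [b, c] (toIndex is exclusive)
    }
}
```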

1.2 List interface iterator ListIterator

In addition to foreach and Iterator iterators, List provides an additional listIterator() method, which returns a ListIterator object. The ListIterator interface inherits the Iterator interface and provides special methods for operating List:

void add(Object obj)
void set(Object obj)
void remove()
boolean hasPrevious()
Object previous()
int previousIndex()
boolean hasNext()
Object next()
int nextIndex()
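
The following sketch shows how these ListIterator methods might be used: modifying a list while walking forward, then walking backward from the end. The class and method names (ListIteratorDemo, editForward, reversed) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Walks forward; replaces "b" with "B" and inserts "x" right after it.
    static List<String> editForward(List<String> source) {
        List<String> list = new ArrayList<>(source);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.equals("b")) {
                it.set("B");   // replaces the element last returned by next()
                it.add("x");   // inserts before the element that next() would return
            }
        }
        return list;
    }

    // Walks backward from the end using hasPrevious()/previous().
    static List<String> reversed(List<String> source) {
        List<String> out = new ArrayList<>();
        ListIterator<String> it = source.listIterator(source.size());
        while (it.hasPrevious()) {
            out.add(it.previous());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(editForward(input));  // [a, B, x, c]
        System.out.println(reversed(input));     // [c, b, a]
    }
}
```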

1.3 List implementation class 1: ArrayList

1. ArrayList overview

ArrayList is the typical and primary implementation class of the List interface.
In essence, ArrayList is a "variable-length" array of object references.
How does the ArrayList implementation differ before and after JDK 1.8?

JDK 1.7: ArrayList is eager (the "hungry" style). It directly creates an array with an initial capacity of 10.
JDK 1.8: ArrayList is lazy. It starts with an array of length 0 and creates an array with a capacity of 10 only when the first element is added.

The List returned by Arrays.asList(...) is neither a java.util.ArrayList instance nor a Vector instance.
The return value of Arrays.asList(...) is a fixed-length List: elements can be replaced, but not added or removed.
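
A small sketch of what "fixed length" means in practice: add() on the list returned by Arrays.asList(...) throws UnsupportedOperationException, while copying it into a new ArrayList gives a growable list. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsListDemo {
    // Returns true if add() on an Arrays.asList(...) result is rejected.
    static boolean addIsRejected() {
        List<String> fixed = Arrays.asList("a", "b");
        try {
            fixed.add("c");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Copying into a java.util.ArrayList yields a list that can grow.
    static List<String> growableCopy() {
        List<String> copy = new ArrayList<>(Arrays.asList("a", "b"));
        copy.add("c");
        return copy;
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("a", "b");
        fixed.set(0, "z");                         // replacing an element is allowed
        System.out.println(fixed);                 // [z, b]
        System.out.println(addIsRejected());       // true
        System.out.println(growableCopy());        // [a, b, c]
        System.out.println(fixed instanceof ArrayList);  // false
    }
}
```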

2. ArrayList source code analysis

// JDK 7: an underlying Object[] array elementData of length 10 is created
ArrayList list = new ArrayList();
// elementData[0] = new Integer(123);
list.add(123);
// If this addition exceeds the capacity of the underlying elementData array, the array is expanded.
list.add(11);

In JDK7, by default, the capacity is expanded to 1.5 times the original capacity. At the same time, the data in the original array needs to be copied to the new array.

Summary: in development, prefer the constructor that takes an initial capacity when the size is roughly known: ArrayList list = new ArrayList(int capacity)
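
To get a feel for why presizing helps, the 1.5x expansion rule can be simulated. This is only a sketch that mirrors the growth formula oldCapacity + (oldCapacity >> 1); it does not inspect a real ArrayList, and the names GrowthSketch, grow and countResizes are hypothetical.

```java
class GrowthSketch {
    // Mirrors the 1.5x rule: grow by half, but never below the required minimum.
    static int grow(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity < minCapacity) {
            newCapacity = minCapacity;
        }
        return newCapacity;
    }

    // Counts how many array copies happen while adding elements one at a time.
    static int countResizes(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int resizes = 0;
        for (int size = 0; size < elements; size++) {
            if (size + 1 > capacity) {
                capacity = grow(capacity, size + 1);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(grow(10, 11));              // 15
        System.out.println(countResizes(10, 1000));    // 12
        System.out.println(countResizes(1000, 1000));  // 0
    }
}
```

Under these assumptions, adding 1000 elements to a list that starts at capacity 10 copies the array 12 times, while a list presized to 1000 never copies.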

 public class ArrayList<E> extends AbstractList<E>
         implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{

     private transient Object[] elementData;

     /**
      * Constructs an empty list with an initial capacity of ten.
      */
     public ArrayList() {
         this(10);
     }

     public ArrayList(int initialCapacity) {
         super();
         if (initialCapacity < 0)
             throw new IllegalArgumentException("Illegal Capacity: "+
                                                initialCapacity);
         // With the no-arg constructor (initialCapacity = 10), an array of length 10 is created here
         this.elementData = new Object[initialCapacity];
     }

     public boolean add(E e) {
         ensureCapacityInternal(size + 1);  // Increments modCount!!
         elementData[size++] = e;
         return true;
     }

     private void ensureCapacityInternal(int minCapacity) {
         modCount++;
         // overflow-conscious code
         if (minCapacity - elementData.length > 0)
             grow(minCapacity);
     }

     private void grow(int minCapacity) {
         // overflow-conscious code
         int oldCapacity = elementData.length;
         int newCapacity = oldCapacity + (oldCapacity >> 1);
         if (newCapacity - minCapacity < 0)
             newCapacity = minCapacity;
         if (newCapacity - MAX_ARRAY_SIZE > 0)
             newCapacity = hugeCapacity(minCapacity);
         // minCapacity is usually close to size, so this is a win:
         elementData = Arrays.copyOf(elementData, newCapacity);
     }

 }

Changes of ArrayList in JDK8

 // The underlying Object[] elementData is initialized to {}; no array of length 10 is created yet
 ArrayList list = new ArrayList();
 // The first call to add() creates an array of length 10 and stores 123 in elementData[0]
 list.add(123);

Subsequent add and expansion operations are the same as in JDK 7.

Summary: creating an ArrayList in JDK 7 resembles the eager ("hungry") style of singleton, while creating one in JDK 8 resembles the lazy style: array creation is deferred until first use, which saves memory.

 public class ArrayList<E> extends AbstractList<E>
         implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
     /**
     * Default initial capacity.
     */
     private static final int DEFAULT_CAPACITY = 10;

     private static final Object[] EMPTY_ELEMENTDATA = {};

     private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

     transient Object[] elementData; // non-private to simplify nested class access


     public ArrayList(int initialCapacity) {
         if (initialCapacity > 0) {
             this.elementData = new Object[initialCapacity];
         } else if (initialCapacity == 0) {
             this.elementData = EMPTY_ELEMENTDATA;
         } else {
             throw new IllegalArgumentException("Illegal Capacity: "+
                                                initialCapacity);
         }
     }

     public ArrayList() {
         // Initialize an empty collection
         this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
     }    

     public boolean add(E e) {
         ensureCapacityInternal(size + 1);  // Increments modCount!!
         elementData[size++] = e;
         return true;
     }

     private void ensureCapacityInternal(int minCapacity) {
         // Only true for a list built with the no-arg constructor that has not grown yet
         // On the first add: minCapacity = 1, DEFAULT_CAPACITY = 10, so the max is 10
         if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
             minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
         }
         // Always executed: grow the array if minCapacity exceeds its current length
         ensureExplicitCapacity(minCapacity);
     }

     private void ensureExplicitCapacity(int minCapacity) {
         modCount++;
         // overflow-conscious code
         if (minCapacity - elementData.length > 0)
             grow(minCapacity);
     }  

     private void grow(int minCapacity) {
         // overflow-conscious code
         int oldCapacity = elementData.length;
         // The new capacity is 1.5 times the original
         // On the first add, oldCapacity = 0, so newCapacity = 0 and minCapacity (10) is used instead
         int newCapacity = oldCapacity + (oldCapacity >> 1);
         if (newCapacity - minCapacity < 0)
             newCapacity = minCapacity;
         if (newCapacity - MAX_ARRAY_SIZE > 0)
             newCapacity = hugeCapacity(minCapacity);
         // minCapacity is usually close to size, so this is a win:
         elementData = Arrays.copyOf(elementData, newCapacity);
     } 

 }

1.4 List implementation class 2: Vector

Vector is a legacy collection that dates back to JDK 1.0. Most of its operations are the same as those of ArrayList, except that Vector is thread-safe.
Among the various List implementations, ArrayList is the best default choice.
Consider LinkedList when elements are frequently inserted or removed;
Vector is generally slower than ArrayList because its methods are synchronized, so try to avoid it.
In both JDK 7 and JDK 8, creating an object through the Vector() constructor creates an underlying array with a length of 10.
By default, each expansion doubles the length of the original array.
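
Unlike ArrayList, Vector exposes its current capacity through the public capacity() method, so the initial length of 10 and the doubling can be observed directly. A minimal sketch (the class and method names are invented for this example):

```java
import java.util.Vector;

class VectorCapacityDemo {
    // Adds the given number of elements to a default Vector and reports its capacity.
    static int capacityAfter(int elements) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < elements; i++) {
            v.add(i);
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(0));   // 10
        System.out.println(capacityAfter(10));  // 10
        System.out.println(capacityAfter(11));  // 20
        System.out.println(capacityAfter(21));  // 40
    }
}
```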

1.5 List implementation class 3: LinkedList

LinkedList: a doubly linked list. In addition to the stored data, each node holds two references:

The prev variable refers to the previous node
The next variable refers to the next node

For frequent insertion or removal of elements, the LinkedList class is recommended, because no array elements need to be shifted

 private static class Node<E> {
     E item;
     Node<E> next;
     Node<E> prev;

     Node(Node<E> prev, E element, Node<E> next) {
         this.item = element;
         this.next = next;
         this.prev = prev;
     }
 }
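
Because each node is linked in both directions, LinkedList can add and remove at either end without shifting elements. A short usage sketch follows; the class name LinkedListDemo and the values are hypothetical.

```java
import java.util.LinkedList;

class LinkedListDemo {
    // Builds a list using operations at the head, the tail and an index.
    static LinkedList<String> build() {
        LinkedList<String> list = new LinkedList<>();
        list.add("b");        // [b]
        list.addFirst("a");   // [a, b]
        list.addLast("c");    // [a, b, c]
        list.add(1, "x");     // [a, x, b, c]
        return list;
    }

    public static void main(String[] args) {
        LinkedList<String> list = build();
        System.out.println(list);                // [a, x, b, c]
        System.out.println(list.removeFirst());  // a
        System.out.println(list.removeLast());   // c
        System.out.println(list);                // [x, b]
    }
}
```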

Interview questions: What are the similarities and differences between ArrayList, LinkedList and Vector? What is the underlying structure of ArrayList, and how does its capacity expansion work? What is the biggest difference between Vector and ArrayList?

ArrayList: the main implementation class of the List interface; not thread-safe, high efficiency; the underlying storage is Object[] elementData

LinkedList: for frequent insert and delete operations, it is more efficient than ArrayList; the underlying storage is a doubly linked list

Vector: a legacy implementation class of the List interface; thread-safe, low efficiency; the underlying storage is Object[] elementData

ArrayList and LinkedList

Neither is thread-safe, so both execute more efficiently than the thread-safe Vector.
ArrayList is a data structure based on a dynamic array.
LinkedList is a data structure based on a linked list.
For random access with get and set, ArrayList performs better than LinkedList, because LinkedList has to traverse node pointers to reach the index.
For add (specifically insert) and remove, LinkedList has an advantage, because ArrayList has to shift the data after the insertion point.

ArrayList and Vector

Vector and ArrayList are almost identical.
The main difference is that Vector is a synchronized class. Therefore its overhead is larger and its access is slower than ArrayList.
Normally, most Java programmers use ArrayList instead of Vector, because synchronization can be controlled explicitly by the programmer.
Vector doubles its size on each expansion, while ArrayList grows to 1.5 times its size.
Vector also has a subclass, Stack.

2. Set interface

The Set interface is a sub-interface of Collection. The Set interface does not provide additional methods.
A Set cannot contain duplicate elements. If you try to add two equal elements to the same Set, the second add operation fails and returns false.
Set determines whether two objects are the same not with the == operator but with the equals method. Therefore, classes whose instances are stored in a Set must override equals appropriately.
The common implementation classes of Set include HashSet, TreeSet and LinkedHashSet.
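
A minimal sketch of the duplicate rule: two String objects that are different references but equal according to equals() count as the same element, and the second add() returns false. The class and method names are invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

class SetAddDemo {
    // Adds two equal-but-distinct String objects and reports each add() result.
    static boolean[] addEqualStrings() {
        Set<String> set = new HashSet<>();
        String first = new String("java");
        String second = new String("java");  // first != second, but first.equals(second)
        boolean firstAdded = set.add(first);
        boolean secondAdded = set.add(second);
        return new boolean[] { firstAdded, secondAdded };
    }

    public static void main(String[] args) {
        boolean[] results = addEqualStrings();
        System.out.println(results[0]);  // true
        System.out.println(results[1]);  // false
    }
}
```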

2.1 Set implementation class 1: HashSet

Characteristics

HashSet is the typical implementation of the Set interface. It is the implementation class used most of the time when working with a Set.
HashSet stores its elements according to a hash algorithm, so it has good access, search and deletion performance.
HashSet has the following characteristics:
The order of elements is not guaranteed
HashSet is not thread-safe
Collection elements can be null
HashSet's criterion for judging two elements equal: the hashCode() methods of the two objects return equal values, and their equals() method returns true.
For objects stored in a Set container, the corresponding class must override the equals(Object obj) and hashCode() methods to implement the object equality rule. That is: "equal objects must have equal hash codes".
The underlying structure is also an array, with an initial capacity of 16. When the usage exceeds the load factor of 0.75 (16 * 0.75 = 12), the capacity is doubled (16 expands to 32, then 64, 128, ...).

Basic principles for overriding hashCode() methods

When the program runs, the same object calls the hashCode() method multiple times and should return the same value.
When the equals() method of two objects returns true, the return value of the hashCode() method of the two objects should also be equal
The fields used for the equals() method comparison in the object should be used to calculate the hashCode value.

Basic principles for overriding the equals() method

Taking a custom Customer class as an example, when do you need to override equals()?

When a class has its own notion of "logical equality". Whenever you override equals(), always override hashCode() as well. According to the class's overridden equals method, two distinct instances may be logically equal, but according to the Object.hashCode() method they are just two different objects.
That violates the rule "equal objects must have equal hash codes".
Conclusion: when overriding the equals method, it is generally necessary to override the hashCode method as well. Generally, the properties used in the calculation of hashCode should also be used in the calculation of equals().
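
One possible shape for such a Customer class is sketched below. The fields name and id are assumptions made for illustration (the text only names the class), and Objects.hash is just one convenient way to combine them.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Customer {
    private final String name;  // hypothetical field
    private final int id;       // hypothetical field

    Customer(String name, int id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Customer other = (Customer) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    // Uses exactly the fields compared in equals(), so equal objects get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }
}

class CustomerDemo {
    // Adds two logically equal customers; the set keeps only one.
    static int distinctCount() {
        Set<Customer> set = new HashSet<>();
        set.add(new Customer("Ann", 1));
        set.add(new Customer("Ann", 1));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount());  // 1
    }
}
```

If hashCode() were left as inherited from Object, the two customers would usually land in different buckets and the set would keep both.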

Source code analysis:

Set: stores unordered, non-repeatable data
Unordered: not the same as random. The stored data is not added to the underlying array in index order; its position is determined by the hash value of the data.
Non-repeatable: no two elements in the set may return true when compared with equals(). That is, equal elements can be added only once.
Process of adding elements, taking HashSet as an example:

We add element a to the HashSet. First, the hashCode() method of element a's class is called to calculate the hash value of element a.
This hash value is then mapped by an algorithm to a storage position (i.e. an index) in the underlying array of the HashSet, and we check whether the array already holds elements at this position:

If there are no other elements at this position, element a is added successfully. (Case 1)
If there is another element b (or several elements in the form of a linked list) at this position, compare the hash values of element a and element b:

If the hash values are different, element a is added successfully. (Case 2)
If the hash values are the same, call the equals() method of element a's class:

If equals() returns true, adding element a fails.
If equals() returns false, element a is added successfully. (Case 3)

For cases 2 and 3, where the addition succeeds, element a and the data already present at that index position are stored as a linked list.

JDK 7: element a is placed in the array and points to the original element.

JDK 8: the original element stays in the array and points to element a.

HashSet underlying structure: array + linked list.
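
The step that maps a hash value to an array index can be sketched as follows. This mirrors the bit-spreading and masking used by java.util.HashMap in JDK 8 (HashSet stores its elements in a HashMap); the class and method names here are hypothetical, and the sketch only computes indexes, it does not reproduce the full add logic.

```java
class BucketIndexSketch {
    // Mixes the high 16 bits of the hash code into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // tableLength is assumed to be a power of two, so the mask keeps only the low bits.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("a", 16));                  // 1 ("a".hashCode() is 97)
        System.out.println(indexFor(Integer.valueOf(17), 16));  // 1 (same bucket, different hash)
        System.out.println(indexFor(null, 16));                 // 0
    }
}
```

Here "a" and 17 fall into the same bucket with different hash values, which corresponds to Case 2 above.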

2.2 Set implementation class 2: LinkedHashSet

LinkedHashSet is a subclass of HashSet.
LinkedHashSet determines the storage location of elements according to their hashCode value, but it also uses a doubly linked list to maintain the order of elements, so elements appear to be stored in insertion order.
The insertion performance of LinkedHashSet is slightly lower than that of HashSet, but it performs well when iterating over all elements in the Set.
LinkedHashSet does not allow duplicate elements.
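
A short sketch of the insertion-order behavior (the class name and sample values are made up for this example):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LinkedHashSetDemo {
    // Returns the elements in iteration order, which follows insertion order.
    static List<String> insertionOrder() {
        Set<String> set = new LinkedHashSet<>();
        set.add("banana");
        set.add("apple");
        set.add("cherry");
        set.add("apple");   // duplicate: ignored, original position is kept
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder());  // [banana, apple, cherry]
    }
}
```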

2.3 Set implementation class 3: TreeSet

TreeSet is an implementation class of the SortedSet interface.
TreeSet keeps its elements in sorted order.
Under the hood, TreeSet stores data in a red-black tree.
TreeSet has two sorting methods: natural sorting and custom sorting.
By default, TreeSet uses natural sorting, which arranges the elements from smallest to largest.
Features: ordered, with faster lookup than a List

Natural sorting:

Natural sorting: TreeSet calls the compareTo(Object obj) method of the elements to compare them, and then arranges the elements in ascending order (by default).
To add an object to a TreeSet that uses natural sorting, the object's class must implement the Comparable interface.
A class that implements Comparable must implement the compareTo(Object obj) method. Two objects are compared through the return value of compareTo(Object obj).

Typical implementation of Comparable:

BigDecimal, BigInteger and the wrapper classes of all numeric types: compared by their numeric value
Character: compared by the Unicode value of the character
Boolean: the wrapper instance for true is greater than the one for false
String: compared by the Unicode values of the characters in the string
Date, Time: a later date or time is greater than an earlier one
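
The natural ordering of these built-in types can be seen with a small sketch. It also shows that an element whose class does not implement Comparable is rejected with a ClassCastException. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class NaturalOrderDemo {
    // Integers are ordered by numeric value; the duplicate 34 is dropped.
    static List<Integer> sortedInts() {
        return new ArrayList<>(new TreeSet<>(Arrays.asList(34, -5, 12, 34, 0)));
    }

    // Strings are ordered by character value, so "A" (65) comes before "a" (97).
    static List<String> sortedStrings() {
        return new ArrayList<>(new TreeSet<>(Arrays.asList("b", "A", "a")));
    }

    // java.lang.Object does not implement Comparable, so adding it fails.
    static boolean rejectsNonComparable() {
        TreeSet<Object> set = new TreeSet<>();
        try {
            set.add(new Object());
            set.add(new Object());
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sortedInts());            // [-5, 0, 12, 34]
        System.out.println(sortedStrings());         // [A, a, b]
        System.out.println(rejectsNonComparable());  // true
    }
}
```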

 public class User implements Comparable{
     private String name;
     private int age;

     public User() {
     }

     public User(String name, int age) {
         this.name = name;
         this.age = age;
     }

     public String getName() {
         return name;
     }

     public void setName(String name) {
         this.name = name;
     }

     public int getAge() {
         return age;
     }

     public void setAge(int age) {
         this.age = age;
     }

     @Override
     public String toString() {
         return "User{" +
                 "name='" + name + '\'' +
                 ", age=" + age +
                 '}';
     }

     @Override
     public boolean equals(Object o) {
         System.out.println("User equals()....");
         if (this == o) return true;
         if (o == null || getClass() != o.getClass()) return false;

         User user = (User) o;

         if (age != user.age) return false;
         return name != null ? name.equals(user.name) : user.name == null;
     }

     @Override
     public int hashCode() { //return name.hashCode() + age;
         int result = name != null ? name.hashCode() : 0;
         result = 31 * result + age;
         return result;
     }

     // Sort by name in descending order, then by age in ascending order
     @Override
     public int compareTo(Object o) {
         if(o instanceof User){
             User user = (User)o;
             int compare = -this.name.compareTo(user.name);
             if(compare != 0){
                 return compare;
             }else{
                 return Integer.compare(this.age,user.age);
             }
         }else{
             throw new RuntimeException("The types entered do not match");
         }

     }
 }

Custom sorting:

Natural sorting in TreeSet requires the element class to implement the Comparable interface. If the element class does not implement Comparable, or you do not want the elements in ascending order (the default), or you want to sort by other attributes, consider custom sorting.
Custom sorting is implemented through the Comparator interface, whose compare(T o1, T o2) method must be implemented.
The int compare(T o1, T o2) method compares o1 and o2: a positive integer means o1 is greater than o2; 0 means they are equal; a negative integer means o1 is less than o2.
To use custom sorting, pass an instance that implements the Comparator interface as an argument to the TreeSet constructor.
Even then, only objects of the same type can be added to the TreeSet. Otherwise, a ClassCastException occurs.
With custom sorting, two elements are considered equal when the Comparator returns 0 for them.

 public void test2(){
     Comparator com = new Comparator() {
         //Arranged from small to large according to age
         @Override
         public int compare(Object o1, Object o2) {
             if(o1 instanceof User && o2 instanceof User){
                 User u1 = (User)o1;
                 User u2 = (User)o2;
                 return Integer.compare(u1.getAge(),u2.getAge());
             }else{
                 throw new RuntimeException("The data types entered do not match");
             }
         }
     };

     TreeSet set = new TreeSet(com);
     set.add(new User("Tom",12));
     set.add(new User("Jerry",32));

     Iterator iterator = set.iterator();
     while(iterator.hasNext()){
         System.out.println(iterator.next());
     }
 }
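
With generics and Java 8's Comparator helpers, the same idea can be written without casts. The sketch below uses a trimmed-down stand-in for the User class above (name and age only) and also shows a consequence of the "Comparator returns 0" rule: two users with the same age count as duplicates unless the comparator has a tie-breaker. All names other than User, getName and getAge are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class ComparatorDemo {
    // Trimmed-down stand-in for the User class in this article.
    static final class User {
        private final String name;
        private final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Fills a TreeSet with three users and returns their names in iteration order.
    static List<String> names(Comparator<User> comparator) {
        TreeSet<User> set = new TreeSet<>(comparator);
        set.add(new User("Tom", 12));
        set.add(new User("Jerry", 32));
        set.add(new User("Mike", 12));  // same age as Tom
        List<String> result = new ArrayList<>();
        for (User u : set) {
            result.add(u.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        Comparator<User> byAge = Comparator.comparingInt(User::getAge);
        Comparator<User> byAgeThenName = byAge.thenComparing(User::getName);

        System.out.println(names(byAge));          // [Tom, Jerry] (Mike compares equal to Tom)
        System.out.println(names(byAgeThenName));  // [Mike, Tom, Jerry]
    }
}
```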

Why does an overridden hashCode() method use the number 31?

When choosing the coefficient, pick one that is reasonably large: the larger the computed hash address, the fewer so-called "conflicts" there are, which improves lookup efficiency. (conflict reduction)
At the same time, 31 occupies only 5 bits, so the probability of overflow caused by multiplication is small.
31 * i can be computed as (i << 5) - i, an optimization that many virtual machines now perform. (improved algorithm efficiency)
31 is a prime number. Multiplying a number by a prime adds only that prime as a new factor: the prime factors of the result are those of the multiplicand plus the prime itself. (conflict reduction)
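
The shift identity and the role of 31 in a hash calculation can be checked with a small sketch. The polynomial loop matches the formula documented for String.hashCode(); the class and method names are hypothetical.

```java
class ThirtyOneDemo {
    // Checks that multiplying by 31 equals a shift by 5 followed by a subtraction.
    static boolean identityHolds(int i) {
        return i * 31 == (i << 5) - i;
    }

    // Polynomial hash with coefficient 31, as documented for String.hashCode().
    static int polyHash(String s) {
        int h = 0;
        for (int k = 0; k < s.length(); k++) {
            h = 31 * h + s.charAt(k);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(identityHolds(7));                       // true
        System.out.println(identityHolds(Integer.MAX_VALUE));       // true (both sides wrap the same way)
        System.out.println(polyHash("hello") == "hello".hashCode()); // true
    }
}
```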
