Master the 24 operations of Collectors in Java8 Stream

Posted by andymoo on Fri, 25 Feb 2022 13:50:09 +0100

Java 8 can be regarded as the mainstream version in the industry. One of its most important additions is the Stream API for stream processing. There is a lot to say about streams; this article focuses on the Collectors utility class.

Collectors is a utility class in the java.util.stream package. Each of its methods returns a Collector that can be passed to java.util.stream.Stream#collect to perform various operations on the elements, including grouping and aggregation. The official documentation gives some examples:

Implementations of Collector that implement various useful reduction operations, such as accumulating elements into collections, summarizing elements according to various criteria, etc.

The following are examples of using the predefined collectors to perform common mutable reduction tasks:

// Accumulate names into a List
List<String> list = people.stream().map(Person::getName).collect(Collectors.toList());

// Accumulate names into a TreeSet
Set<String> set = people.stream().map(Person::getName).collect(Collectors.toCollection(TreeSet::new));

// Convert elements to strings and concatenate them, separated by commas
String joined = things.stream()
        .map(Object::toString)
        .collect(Collectors.joining(", "));

// Compute sum of salaries of employee
int total = employees.stream()
        .collect(Collectors.summingInt(Employee::getSalary));

// Group employees by department
Map<Department, List<Employee>> byDept = employees.stream()
        .collect(Collectors.groupingBy(Employee::getDepartment));

// Compute sum of salaries by department
Map<Department, Integer> totalByDept = employees.stream()
        .collect(Collectors.groupingBy(Employee::getDepartment, Collectors.summingInt(Employee::getSalary)));

// Partition students into passing and failing
Map<Boolean, List<Student>> passingFailing = students.stream()
        .collect(Collectors.partitioningBy(s -> s.getGrade() >= PASS_THRESHOLD));

Define sample data

First, define the object to operate on: a simple Student class (using Lombok):

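import java.time.LocalDate;

import lombok.AllArgsConstructor;
import lombok.Data;

// @Data generates getters, setters, equals/hashCode and toString
@Data
@AllArgsConstructor
public class Student {
    private String id;
    private String name;
    private LocalDate birthday;
    private int age;
    private double score;
}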

Then define a set of test data:

// Lists.newArrayList() comes from Guava (com.google.common.collect.Lists); new ArrayList<>() works equally well
final List<Student> students = Lists.newArrayList();
students.add(new Student("1", "Zhang San", LocalDate.of(2009, Month.JANUARY, 1), 12, 12.123));
students.add(new Student("2", "Li Si", LocalDate.of(2010, Month.FEBRUARY, 2), 11, 22.123));
students.add(new Student("3", "Wang Wu", LocalDate.of(2011, Month.MARCH, 3), 10, 32.123));
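If you want to try the examples without Lombok or Guava, a plain-Java equivalent might look like the sketch below. The hand-written getters and the toString format are assumptions meant to mimic what Lombok generates, and SampleData is a hypothetical helper name, not part of the article:

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the Lombok Student class (no Lombok, no Guava)
class Student {
    private final String id;
    private final String name;
    private final LocalDate birthday;
    private final int age;
    private final double score;

    Student(String id, String name, LocalDate birthday, int age, double score) {
        this.id = id;
        this.name = name;
        this.birthday = birthday;
        this.age = age;
        this.score = score;
    }

    public String getId() { return id; }
    public String getName() { return name; }
    public LocalDate getBirthday() { return birthday; }
    public int getAge() { return age; }
    public double getScore() { return score; }

    // Mimics the format of Lombok's generated toString()
    @Override
    public String toString() {
        return "Student(id=" + id + ", name=" + name + ", birthday=" + birthday
                + ", age=" + age + ", score=" + score + ")";
    }
}

// Hypothetical helper that builds the same three records as the test data
class SampleData {
    static List<Student> students() {
        List<Student> students = new ArrayList<>();
        students.add(new Student("1", "Zhang San", LocalDate.of(2009, Month.JANUARY, 1), 12, 12.123));
        students.add(new Student("2", "Li Si", LocalDate.of(2010, Month.FEBRUARY, 2), 11, 22.123));
        students.add(new Student("3", "Wang Wu", LocalDate.of(2011, Month.MARCH, 3), 10, 32.123));
        return students;
    }
}
```

With this in place, `SampleData.students().stream()` can replace `students.stream()` in the snippets that follow.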

Data statistics

Number of elements: counting

This is the simplest one: it counts the number of elements in the stream and returns the result as a Long:

// 3
students.stream().collect(Collectors.counting())
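counting() on its own is equivalent to Stream#count, so it is most useful as a downstream collector, for example to count elements per key inside groupingBy. A minimal sketch, assuming plain Integer ages instead of Student objects (CountingDemo is an illustrative name):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountingDemo {
    // Counts how many times each age occurs; TreeMap keeps the keys sorted
    static Map<Integer, Long> countByAge(Stream<Integer> ages) {
        return ages.collect(Collectors.groupingBy(a -> a, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        long total = Stream.of(12, 11, 10).collect(Collectors.counting());
        System.out.println(total);                                   // 3
        System.out.println(countByAge(Stream.of(12, 11, 11, 10)));   // {10=1, 11=2, 12=1}
    }
}
```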

Average value: averagingDouble, averagingInt, averagingLong

These methods compute the arithmetic mean of the elements. They differ only in the type the mapping function must produce: averagingDouble takes a ToDoubleFunction, averagingInt a ToIntFunction, and averagingLong a ToLongFunction.

For example, to calculate the average score of the students: since score is a double, averagingDouble can be used directly without any conversion:

// 22.123
students.stream().collect(Collectors.averagingDouble(Student::getScore))

If you can accept losing precision, you can also cast the score to an integral type first:

// 22.0
students.stream().collect(Collectors.averagingInt(s -> (int)s.getScore()))
// 22.0
students.stream().collect(Collectors.averagingLong(s -> (long)s.getScore()))

To compute the average age, since age is an int, any of the three methods works:

// 11.0
students.stream().collect(Collectors.averagingInt(Student::getAge))
// 11.0
students.stream().collect(Collectors.averagingDouble(Student::getAge))
// 11.0
students.stream().collect(Collectors.averagingLong(Student::getAge))

Note: the return values of these three methods are of type Double.
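One detail worth knowing: for an empty stream these collectors return 0.0 rather than signalling "no data", whereas IntStream#average returns an empty OptionalDouble. A small sketch contrasting the two (AveragingDemo and the Integer list input are illustrative assumptions):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

class AveragingDemo {
    // averagingInt yields a Double and falls back to 0.0 when there are no elements
    static double averageAge(List<Integer> ages) {
        return ages.stream().collect(Collectors.averagingInt(Integer::intValue));
    }

    // IntStream#average makes the "no elements" case explicit instead
    static OptionalDouble averageAgeOrEmpty(List<Integer> ages) {
        return ages.stream().mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        System.out.println(averageAge(Arrays.asList(12, 11, 10)));       // 11.0
        System.out.println(averageAge(Collections.emptyList()));         // 0.0
        System.out.println(averageAgeOrEmpty(Collections.emptyList()));  // OptionalDouble.empty
    }
}
```

If a real average of 0.0 must be distinguishable from "no students", the OptionalDouble variant is the safer choice.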

Sum: summingDouble, summingInt, summingLong

These three methods are similar to the averaging methods above, and again the element type matters: when the value has to be converted, an explicit cast is required:

// 66
students.stream().collect(Collectors.summingInt(s -> (int)s.getScore()))
// 66.369
students.stream().collect(Collectors.summingDouble(Student::getScore))
// 66
students.stream().collect(Collectors.summingLong(s -> (long)s.getScore()))

For values that need no cast, any of the three methods works:

// 33
students.stream().collect(Collectors.summingInt(Student::getAge))
// 33.0
students.stream().collect(Collectors.summingDouble(Student::getAge))
// 33
students.stream().collect(Collectors.summingLong(Student::getAge))

Note: unlike the averaging methods, these three return different types: summingDouble returns Double, summingInt returns Integer, and summingLong returns Long.
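The return type also reflects the accumulator type, which is why the element type deserves attention: summingInt adds in an int and can overflow silently, while summingLong adds in a long. A minimal sketch (SummingDemo and the Integer.MAX_VALUE inputs are illustrative):

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SummingDemo {
    // Accumulates in an int: values past Integer.MAX_VALUE wrap around
    static Integer sumAsInt(Stream<Integer> values) {
        return values.collect(Collectors.summingInt(Integer::intValue));
    }

    // Accumulates in a long: no overflow for these inputs
    static Long sumAsLong(Stream<Integer> values) {
        return values.collect(Collectors.summingLong(Integer::longValue));
    }

    public static void main(String[] args) {
        System.out.println(sumAsInt(Stream.of(Integer.MAX_VALUE, Integer.MAX_VALUE)));  // -2
        System.out.println(sumAsLong(Stream.of(Integer.MAX_VALUE, Integer.MAX_VALUE))); // 4294967294
    }
}
```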

Max / min element: maxBy, minBy

As the names suggest, these two collectors find the minimum/maximum element according to the given Comparator. For example, find the youngest and the oldest Student:

// Optional[Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)]. Note that the return type is Optional
students.stream().collect(Collectors.minBy(Comparator.comparing(Student::getAge)))
// Optional[Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123)]. Note that the return type is Optional
students.stream().collect(Collectors.maxBy(Comparator.comparing(Student::getAge)))

The source code shows that these two methods are convenience wrappers for data statistics: internally they delegate to the reducing collector with BinaryOperator.maxBy/minBy. reducing is discussed below.

public static <T> Collector<T, ?, Optional<T>> maxBy(Comparator<? super T> comparator) {
    return reducing(BinaryOperator.maxBy(comparator));
}

public static <T> Collector<T, ?, Optional<T>> minBy(Comparator<? super T> comparator) {
    return reducing(BinaryOperator.minBy(comparator));
}
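The equivalence is easy to check with a small sketch. MaxByDemo and the Integer ages are illustrative, not from the article. It also shows that on an empty stream both forms yield Optional.empty rather than throwing:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class MaxByDemo {
    private static final Comparator<Integer> BY_VALUE = Integer::compare;

    // The convenience collector
    static Optional<Integer> viaMaxBy(List<Integer> ages) {
        return ages.stream().collect(Collectors.maxBy(BY_VALUE));
    }

    // The same thing spelled out, mirroring the JDK source above
    static Optional<Integer> viaReducing(List<Integer> ages) {
        return ages.stream().collect(Collectors.reducing(BinaryOperator.maxBy(BY_VALUE)));
    }

    public static void main(String[] args) {
        System.out.println(viaMaxBy(Arrays.asList(12, 11, 10)));    // Optional[12]
        System.out.println(viaReducing(Arrays.asList(12, 11, 10))); // Optional[12]
        System.out.println(viaReducing(Collections.emptyList()));   // Optional.empty
    }
}
```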

Statistical results: summarizingDouble, summarizingInt, summarizingLong

Data statistics almost always come down to count, sum, average, maximum and minimum, so Collectors also provides a group of methods that compute all of these at once.

These methods work like the summing and averaging methods, so again pay attention to the type. For example, summarizing the scores requires a cast for the int and long variants:

// IntSummaryStatistics{count=3, sum=66, min=12, average=22.000000, max=32}
students.stream().collect(Collectors.summarizingInt(s -> (int) s.getScore()))
// DoubleSummaryStatistics{count=3, sum=66.369000, min=12.123000, average=22.123000, max=32.123000}
students.stream().collect(Collectors.summarizingDouble(Student::getScore))
// LongSummaryStatistics{count=3, sum=66, min=12, average=22.000000, max=32}
students.stream().collect(Collectors.summarizingLong(s -> (long) s.getScore()))

For age statistics, any of the three methods works:

// IntSummaryStatistics{count=3, sum=33, min=10, average=11.000000, max=12}
students.stream().collect(Collectors.summarizingInt(Student::getAge))
// DoubleSummaryStatistics{count=3, sum=33.000000, min=10.000000, average=11.000000, max=12.000000}
students.stream().collect(Collectors.summarizingDouble(Student::getAge))
// LongSummaryStatistics{count=3, sum=33, min=10, average=11.000000, max=12}
students.stream().collect(Collectors.summarizingLong(Student::getAge))

Note: the three methods return different types: summarizingDouble returns DoubleSummaryStatistics, summarizingInt returns IntSummaryStatistics, and summarizingLong returns LongSummaryStatistics.
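Each statistics object exposes its values through getters, so one pass over the data yields all five results. A minimal sketch using the sample ages as plain Integers (SummarizingDemo is an illustrative name):

```java
import java.util.IntSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SummarizingDemo {
    static IntSummaryStatistics ageStats() {
        return Stream.of(12, 11, 10).collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = ageStats();
        System.out.println(stats.getCount());   // 3   (long)
        System.out.println(stats.getSum());     // 33  (long)
        System.out.println(stats.getMin());     // 10  (int)
        System.out.println(stats.getMax());     // 12  (int)
        System.out.println(stats.getAverage()); // 11.0 (double)
    }
}
```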

Aggregation and grouping

Aggregation elements: toList, toSet, toCollection

These methods are straightforward: they gather the elements into a collection and return it. For example, to get the IDs of all students, just pick the method that matches the result type you need:

// List: [1, 2, 3]
final List<String> idList = students.stream().map(Student::getId).collect(Collectors.toList());
// Set: [1, 2, 3]
final Set<String> idSet = students.stream().map(Student::getId).collect(Collectors.toSet());
// TreeSet: [1, 2, 3]
final Collection<String> idTreeSet = students.stream().map(Student::getId).collect(Collectors.toCollection(TreeSet::new));

Note: toList returns a List, toSet returns a Set, and toCollection returns whatever Collection the supplied factory creates. toList and toSet make no guarantees about the concrete type, mutability or thread safety of the result. Since List, Set and many other types are all subtypes of Collection, toCollection is the more flexible choice when you need a specific implementation.
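For instance, the factory passed to toCollection decides ordering and de-duplication. A small sketch with illustrative, deliberately unsorted IDs (ToCollectionDemo is a hypothetical name):

```java
import java.util.LinkedHashSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToCollectionDemo {
    // TreeSet: removes duplicates and sorts
    static TreeSet<String> sorted(Stream<String> ids) {
        return ids.collect(Collectors.toCollection(TreeSet::new));
    }

    // LinkedHashSet: removes duplicates but keeps encounter order
    static LinkedHashSet<String> ordered(Stream<String> ids) {
        return ids.collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(sorted(Stream.of("3", "1", "2", "1")));  // [1, 2, 3]
        System.out.println(ordered(Stream.of("3", "1", "2", "1"))); // [3, 1, 2]
    }
}
```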

Aggregation elements: toMap, toConcurrentMap

Both methods reassemble the elements into a Map, that is, a key-value structure. They are used the same way; the difference is that toMap returns a Map while toConcurrentMap returns a ConcurrentMap, i.e. a thread-safe Map.

For example, to map each student ID to its Student:

// {1=Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123), 2=Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123), 3=Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)}
final Map<String, Student> map11 = students.stream()
    .collect(Collectors.toMap(Student::getId, Function.identity()));

However, if an id occurs twice, this throws java.lang.IllegalStateException (Duplicate key). To be safe, use the toMap overload that also takes a merge function, here keeping the first value:

// {1=Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123), 2=Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123), 3=Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)}
final Map<String, Student> map2 = students.stream()
    .collect(Collectors.toMap(Student::getId, Function.identity(), (x, y) -> x));

toMap has several overloads that support more complex logic. For example, to map each student ID to the student's name:

// {1=Zhang San, 2=Li Si, 3=Wang Wu}
final Map<String, String> map3 = students.stream()
    .collect(Collectors.toMap(Student::getId, Student::getName, (x, y) -> x));

Or, to get the student with the highest score for each age:

// {10=Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123), 11=Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123), 12=Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123)}
final Map<Integer, Student> map5 = students.stream()
    .collect(Collectors.toMap(Student::getAge, Function.identity(), BinaryOperator.maxBy(Comparator.comparing(Student::getScore))));

In short, toMap is very flexible.
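The duplicate-key behaviour, and the merge function that avoids it, can be seen in a small sketch. It keys illustrative words by length (ToMapDemo is a hypothetical name) and uses toConcurrentMap for the merging variant; on a sequential stream the merge function receives the existing value first:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToMapDemo {
    // Two-argument toMap: a repeated key throws IllegalStateException
    static Map<Integer, String> byLength(Stream<String> words) {
        return words.collect(Collectors.toMap(String::length, Function.identity()));
    }

    // toConcurrentMap with a merge function: colliding values are concatenated
    static ConcurrentMap<Integer, String> byLengthMerged(Stream<String> words) {
        return words.collect(Collectors.toConcurrentMap(
                String::length, Function.identity(), (x, y) -> x + "," + y));
    }

    public static void main(String[] args) {
        try {
            byLength(Stream.of("java", "go", "sql", "js"));
        } catch (IllegalStateException e) {
            // The exact message text differs between JDK versions
            System.out.println("duplicate key: " + e.getMessage());
        }
        System.out.println(byLengthMerged(Stream.of("java", "go", "sql", "js")).get(2)); // go,js
    }
}
```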

Grouping: groupingBy, groupingByConcurrent

Both groupingBy and toMap group elements by key. The difference is that toMap produces a 1:1 key-value structure, whereas groupingBy produces a 1:n structure in which each key maps to a collection of elements.

For example, group students by age:

// List: {10=[Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)], 11=[Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123)], 12=[Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123)]}
final Map<Integer, List<Student>> map1 = students.stream().collect(Collectors.groupingBy(Student::getAge));
// Set: {10=[Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)], 11=[Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123)], 12=[Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123)]}
final Map<Integer, Set<Student>> map12 = students.stream().collect(Collectors.groupingBy(Student::getAge, Collectors.toSet()));
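Because every sample student has a different age, each group above holds a single element. The 1:n nature is easier to see with input that shares keys. The sketch below groups illustrative words by length and also uses the three-argument overload to pick a TreeMap, so the keys come out sorted (GroupingDemo is a hypothetical name):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingDemo {
    // classifier, map factory, downstream collector
    static Map<Integer, List<String>> byLength(Stream<String> words) {
        return words.collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(byLength(Stream.of("java", "go", "sql", "js"))); // {2=[go, js], 3=[sql], 4=[java]}
    }
}
```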

Since groupingBy also groups elements, can it do what toMap does? For example, group students by id with a single Student per key:

// {1=Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123), 2=Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123), 3=Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)}
final Map<String, Student> map3 = students.stream()
    .collect(Collectors.groupingBy(Student::getId, Collectors.collectingAndThen(Collectors.toList(), list -> list.get(0))));

For comparison, here is the toMap version:

// {1=Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123), 2=Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123), 3=Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)}
final Map<String, Student> map2 = students.stream()
    .collect(Collectors.toMap(Student::getId, Function.identity(), (x, y) -> x));

If you need a thread-safe Map, use groupingByConcurrent, which returns a ConcurrentMap.
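A minimal sketch of groupingByConcurrent on a parallel stream, counting illustrative words per length (GroupingConcurrentDemo is a hypothetical name). The collector is CONCURRENT and UNORDERED, so a parallel stream can accumulate into a single shared ConcurrentMap; the counts are the same regardless of thread scheduling:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class GroupingConcurrentDemo {
    static ConcurrentMap<Integer, Long> countByLength(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingByConcurrent(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, Long> counts = countByLength(Arrays.asList("java", "go", "sql", "js"));
        System.out.println(counts.get(2)); // 2
    }
}
```

Because the collector is unordered, the order of elements inside each group is not guaranteed when a downstream collector such as toList is used; prefer groupingBy when that order matters.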

Grouping: partitioningBy

partitioningBy differs from groupingBy in that it uses a Predicate to split the elements into exactly two groups, keyed by true and false. For example, partition by whether the age is greater than 11:

// List: {false=[Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123), Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)], true=[Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123)]}
final Map<Boolean, List<Student>> map6 = students.stream().collect(Collectors.partitioningBy(s -> s.getAge() > 11));
// Set: {false=[Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123), Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123)], true=[Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123)]}
final Map<Boolean, Set<Student>> map7 = students.stream().collect(Collectors.partitioningBy(s -> s.getAge() > 11, Collectors.toSet()));
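A practical difference from groupingBy: the partition map always contains both keys, even when one side is empty, so get(true) never returns null. A small sketch with the sample ages as Integers and an illustrative threshold (PartitioningDemo is a hypothetical name):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitioningDemo {
    static Map<Boolean, List<Integer>> olderThan(int limit, Stream<Integer> ages) {
        return ages.collect(Collectors.partitioningBy(a -> a > limit));
    }

    public static void main(String[] args) {
        // Nobody is older than 20, yet the true key is still present
        System.out.println(olderThan(20, Stream.of(12, 11, 10))); // {false=[12, 11, 10], true=[]}
    }
}
```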

Concatenation: joining

joining concatenates CharSequence elements (typically Strings) into a single String, similar to java.lang.String#join. It has three overloads to cover different needs: no arguments, a delimiter, or a delimiter plus a prefix and suffix. For example:

// javagosql
Stream.of("java", "go", "sql").collect(Collectors.joining());
// java, go, sql
Stream.of("java", "go", "sql").collect(Collectors.joining(", "));
// [java, go, sql]
Stream.of("java", "go", "sql").collect(Collectors.joining(", ", "[", "]"));
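One edge case of the prefix/suffix overload: for an empty stream the result is just the prefix followed by the suffix. A small sketch (JoiningDemo is an illustrative name):

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class JoiningDemo {
    static String bracketed(Stream<String> items) {
        return items.collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(bracketed(Stream.of("java", "go", "sql"))); // [java, go, sql]
        System.out.println(bracketed(Stream.empty()));                 // []
    }
}
```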

Operation chain: collectingAndThen

This method already appeared in the groupingBy example. It first collects the elements with a downstream Collector, and then applies a finishing Function to the collected result.

Recall the groupingBy example:

// {1=Student(id=1, name=Zhang San, birthday=2009-01-01, age=12, score=12.123), 2=Student(id=2, name=Li Si, birthday=2010-02-02, age=11, score=22.123), 3=Student(id=3, name=Wang Wu, birthday=2011-03-03, age=10, score=32.123)}
final Map<String, Student> map3 = students.stream()
    .collect(Collectors.groupingBy(Student::getId, Collectors.collectingAndThen(Collectors.toList(), list -> list.get(0))));

Here each group is first collected into a List, and then the element at index 0 is returned, which yields a 1:1 map structure.

Now a slightly more involved example: find the students whose age does not match their birth year:

// [] when run in 2021, the year the sample ages match; from 2022 on, all three students are returned
students.stream()
        .collect(
                Collectors.collectingAndThen(Collectors.toList(),
                        list -> list.stream()
                                .filter(s -> (LocalDate.now().getYear() - s.getBirthday().getYear()) != s.getAge())
                                .collect(Collectors.toList()))
        );

This example only serves to demonstrate collectingAndThen; in practice it simplifies to:

students.stream()
        .filter(s -> (LocalDate.now().getYear() - s.getBirthday().getYear()) != s.getAge())
        .collect(Collectors.toList());
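A more typical use of collectingAndThen is to post-process the collected container, for example wrapping a List so callers cannot modify it. A minimal sketch (CollectingAndThenDemo is an illustrative name):

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectingAndThenDemo {
    // Collect into a List, then apply the finisher Collections::unmodifiableList
    static List<String> names(Stream<String> raw) {
        return raw.collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        List<String> names = names(Stream.of("Zhang San", "Li Si"));
        try {
            names.add("Wang Wu");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only: " + names); // read-only: [Zhang San, Li Si]
        }
    }
}
```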

Transform, then aggregate: mapping

mapping first transforms each element with a Function, and then collects the results with a downstream Collector. For example, get the list of student names:

// [Zhang San, Li Si, Wang Wu]
students.stream()
        .collect(Collectors.mapping(Student::getName, Collectors.toList()));

This is similar to the java.util.stream.Stream#map method:

// [Zhang San, Li Si, Wang Wu]
students.stream()
        .map(Student::getName)
        .collect(Collectors.toList());

In this case, Stream#map is clearer. mapping is most useful as a downstream collector in multi-level reductions, for example inside groupingBy.
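A minimal sketch of that downstream use: group by age, then keep only the names inside each group. The nested Student class is a trimmed-down stand-in for the article's, and the input deliberately gives two students the same age (hypothetical data, unlike the sample set) so a group holds more than one name:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MappingDemo {
    // Trimmed-down stand-in for the article's Student: only name and age
    static final class Student {
        private final String name;
        private final int age;
        Student(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    // groupingBy decides the key, mapping decides what goes into each group
    static Map<Integer, List<String>> namesByAge(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(Student::getAge, TreeMap::new,
                        Collectors.mapping(Student::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("Zhang San", 12), new Student("Li Si", 11), new Student("Wang Wu", 11));
        System.out.println(namesByAge(students)); // {11=[Li Si, Wang Wu], 12=[Zhang San]}
    }
}
```

Here Stream#map cannot be used instead, because the Student is still needed for the grouping key.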

Reduction: reducing

reducing has three overloads:

  • public static <T> Collector<T, ?, Optional<T>> reducing(BinaryOperator<T> op): reduces the elements directly with the BinaryOperator; the result is an Optional

  • public static <T> Collector<T, ?, T> reducing(T identity, BinaryOperator<T> op): starts from the identity value, then reduces with the BinaryOperator

  • public static <T, U> Collector<T, ?, U> reducing(U identity, Function<? super T, ? extends U> mapper, BinaryOperator<U> op): starts from the identity value, maps each element with the Function, then reduces with the BinaryOperator

For example, calculate the total score of all students:

// Optional[66.369]. Note that the return type is Optional
students.stream()
        .map(Student::getScore)
        .collect(Collectors.reducing(Double::sum));
// 66.369
students.stream()
        .map(Student::getScore)
        .collect(Collectors.reducing(0.0, Double::sum));
// 66.369
students.stream()
        .collect(Collectors.reducing(0.0, Student::getScore, Double::sum));

These reductions are similar to the java.util.stream.Stream#reduce method:

// Optional[66.369]. Note that the return type is Optional
students.stream().map(Student::getScore).reduce(Double::sum);
// 66.369
students.stream().map(Student::getScore).reduce(0.0, Double::sum);

As mentioned in the maxBy/minBy section, those two collectors are implemented with reducing.

mapping and reducing follow the map and reduce concepts of functional programming.
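As with mapping, reducing is most useful downstream of groupingBy, where Stream#reduce is not available per group. The sketch below totals scores per age with the three-argument overload. The nested Student class is a trimmed-down stand-in, and the scores are hypothetical values chosen to be exact in binary floating point so the printed result is predictable:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ReducingDemo {
    // Trimmed-down stand-in for the article's Student: only age and score
    static final class Student {
        private final int age;
        private final double score;
        Student(int age, double score) { this.age = age; this.score = score; }
        int getAge() { return age; }
        double getScore() { return score; }
    }

    // identity 0.0, mapper Student::getScore, combiner Double::sum
    static Map<Integer, Double> totalScoreByAge(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(Student::getAge, TreeMap::new,
                        Collectors.reducing(0.0, Student::getScore, Double::sum)));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student(12, 10.5), new Student(11, 20.25), new Student(11, 30.25));
        System.out.println(totalScoreByAge(students)); // {11=50.5, 12=10.5}
    }
}
```

For plain sums, summingDouble(Student::getScore) would give the same result; reducing pays off when the combining step is something other than addition.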

Conclusion

This article has covered the 24 methods that Collectors defines for Java 8 streams. Parallel streams run on the Fork/Join framework, which can bring performance gains for large workloads. Without a good grasp of these methods, code written in this style can be hard to read. After all, Java 8 is basically the baseline in the industry.
