The Stream programming style was introduced in JDK 8. Used flexibly, it makes data processing much more convenient. Today, let's look at how distinct() behaves in Stream and how to build custom deduplication logic with filter().
    import java.util.Arrays;
    import java.util.stream.Stream;

    final int[] distinct = Stream.of(1, 1, 1, 2, 2, 3, 3, 4, 4, 5)
            // Deduplicate using Object.equals()
            .distinct()
            .mapToInt(Integer::intValue)
            .toArray();
    System.out.println(Arrays.toString(distinct)); // [1, 2, 3, 4, 5]
According to its Javadoc, distinct() returns a stream consisting of the distinct elements of this stream, compared using Object.equals(Object).
For ordered streams, the selection of distinct elements is stable: of several equal elements, the one that appears first in the encounter order is kept. For unordered streams, no stability guarantee is made, so which of the equal elements is kept may differ between runs.
For example, the code above contains three 1s. Because Stream.of(...) produces an ordered stream, distinct() keeps the first 1 and drops the other two as duplicates on every run. If the encounter order of the stream is not guaranteed, neither is the choice of which duplicate is dropped. In short, on an ordered stream distinct() keeps the first occurrence of each element and removes the later ones; on an unordered stream it still keeps exactly one of each group of equal elements, but not necessarily the first.
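With integers it is impossible to see which of the equal elements survived, so here is a minimal sketch that makes the "first occurrence wins" behavior visible. The Entry class and the distinctLabels() helper are hypothetical names invented for this illustration; Entry compares by key only, so entries with the same key but different labels count as duplicates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctStabilityDemo {

    /** Hypothetical value type: equality looks at key only, label is ignored. */
    static final class Entry {
        final int key;
        final String label;

        Entry(int key, String label) {
            this.key = key;
            this.label = label;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry && ((Entry) o).key == key;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(key);
        }

        @Override
        public String toString() {
            return key + ":" + label;
        }
    }

    /** Runs distinct() on an ordered, sequential stream and returns the survivors as strings. */
    static List<String> distinctLabels() {
        return Arrays.asList(
                        new Entry(1, "first"),
                        new Entry(1, "second"),
                        new Entry(2, "only"),
                        new Entry(1, "third"))
                .stream()      // a List produces an ordered stream
                .distinct()    // uses Entry.equals() and Entry.hashCode()
                .map(Entry::toString)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctLabels()); // [1:first, 2:only]
    }
}
```

The entry labeled "first" is the one that remains for key 1, which matches the stability rule for ordered streams described above.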
Used directly, distinct() decides whether two elements are duplicates only through their equals() and hashCode() methods. When you need a different notion of equality, you can build a custom distinct yourself with filter().
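Suppose we have the following User class (it uses Lombok):
    import lombok.AllArgsConstructor;
    import lombok.Data;

    @Data
    @AllArgsConstructor // the examples below call new User(id, name, age, addr)
    public class User {
        private Integer id;
        private String name;
        private Integer age;
        private String addr;
    }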
Now suppose I want to deduplicate by the user's name rather than by the equals() and hashCode() of the whole User object. The code looks like this:
    final List<User> users = Arrays.asList(
            new User(1, "yuxin", 26, "beijing"),
            new User(2, "chunfeng", 26, "tianjing"),
            new User(3, "feiyang", 26, "wuzhou"),
            new User(3, "feiyang", 27, "wuzhou"),
            new User(4, "fei", 26, "sichuan"),
            new User(5, "yi", 26, "australia")
    );
    final Map<String, User> map = new ConcurrentHashMap<>();
    final List<User> ret = users.stream()
            // Deduplicate by name: put() returns null only the first time a name is seen
            .filter(user -> map.put(user.getName(), user) == null)
            .collect(Collectors.toList());
    ret.forEach(System.out::println);
As you can see, the map is defined outside the stream and consulted inside the filter operation. If we don't want to expose the deduplication map outside the stream, we can wrap the Predicate in a static method:
    /**
     * Custom deduplication by user name.
     *
     * @return a stateful predicate that accepts only the first user seen with each name
     */
    private static Predicate<User> customDistinct() {
        final Map<String, User> map = new ConcurrentHashMap<>();
        return user -> map.put(user.getName(), user) == null;
    }

    final List<User> users = Arrays.asList(
            new User(1, "yuxin", 26, "beijing"),
            new User(2, "chunfeng", 26, "tianjing"),
            new User(3, "feiyang", 26, "wuzhou"),
            new User(3, "feiyang", 27, "wuzhou"),
            new User(4, "fei", 26, "sichuan"),
            new User(5, "yi", 26, "australia")
    );
    final List<User> ret = users.stream()
            // Call customDistinct() once to obtain the deduplicating predicate
            .filter(customDistinct())
            .collect(Collectors.toList());
    ret.forEach(System.out::println);
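As a point of comparison, the same "one user per name" result can be obtained without a stateful predicate by collecting into a map. The following is only a sketch of that alternative, not the approach used in this article: distinctByName() is a hypothetical helper, and the User class here is a hand-written stand-in (no Lombok) so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDistinctDemo {

    /** Stand-in for the article's User, written without Lombok for self-containment. */
    static final class User {
        private final Integer id;
        private final String name;
        private final Integer age;
        private final String addr;

        User(Integer id, String name, Integer age, String addr) {
            this.id = id;
            this.name = name;
            this.age = age;
            this.addr = addr;
        }

        String getName() {
            return name;
        }

        Integer getAge() {
            return age;
        }

        @Override
        public String toString() {
            return "User(id=" + id + ", name=" + name + ", age=" + age + ", addr=" + addr + ")";
        }
    }

    /** Hypothetical helper: keeps the first user seen for each name. */
    static List<User> distinctByName(List<User> users) {
        Map<String, User> firstByName = users.stream()
                .collect(Collectors.toMap(
                        User::getName,                // key to deduplicate on
                        Function.identity(),          // keep the user itself as the value
                        (first, duplicate) -> first,  // on a key clash, keep the earlier user
                        LinkedHashMap::new));         // remember insertion order
        return new ArrayList<>(firstByName.values());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(3, "feiyang", 26, "wuzhou"),
                new User(3, "feiyang", 27, "wuzhou"),
                new User(4, "fei", 26, "sichuan"));
        distinctByName(users).forEach(System.out::println);
    }
}
```

The trade-off is that this version materializes the whole result in a map before continuing, whereas the filter() approach stays inside the stream pipeline.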
The predicate can use any Map implementation. For a sequential stream, a HashMap is sufficient. For a parallel stream, the map is accessed from several threads at once, so a ConcurrentHashMap is the appropriate choice; note that in a parallel stream the element that reaches the map first is not necessarily the first in encounter order, so which duplicate survives may vary. Because a Set is a natural fit for deduplication, we can also use a Set and store only the key rather than the element itself. Which one to choose depends on your actual needs.
    // Deduplicate with a HashSet; suitable only for sequential (non-concurrent) streams
    private static Predicate<User> customDistinct() {
        final Set<String> set = new HashSet<>();
        // add() returns false when the name is already present
        return user -> set.add(user.getName());
    }
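The customDistinct() idea can be generalized so that the key is passed in rather than hard-coded to the user's name. The sketch below assumes a hypothetical helper called distinctByKey() (it is not part of the JDK) and backs it with ConcurrentHashMap.newKeySet(), a thread-safe Set available since Java 8, so the same predicate can also be used on a parallel stream. Strings are used as the elements only to keep the example short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {

    /**
     * Hypothetical helper: returns a stateful predicate that accepts an element
     * only the first time its key is seen. Keys must be non-null, because the
     * ConcurrentHashMap-backed set rejects null.
     */
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        // add() returns true only if the key was not already in the set
        return element -> seen.add(keyExtractor.apply(element));
    }

    /** Keeps the first word of each length from a sequential, ordered stream. */
    static List<String> firstPerLength(List<String> words) {
        return words.stream()
                .filter(distinctByKey(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPerLength(
                Arrays.asList("yi", "fei", "yuxin", "li", "feiyang")));
        // [yi, fei, yuxin, feiyang]
    }
}
```

With the User class from this article, the call would look like filter(distinctByKey(User::getName)). Each call to distinctByKey() creates a fresh set, so the predicate should be created once per stream and not shared between pipelines.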
Summary
These are a few notes on deduplication with Stream. If you have questions or additions, feel free to leave a comment. Stream is a powerful API, and using it flexibly can speed up development considerably.