Reading Notes on Java 8 in Action - Chapter VII: Parallel Data Processing and Performance

Posted by datafan on Sat, 11 May 2019 00:03:09 +0200

1. Parallel Streams

1. Converting a Sequential Stream to a Parallel Stream

Call the parallel method on a sequential stream:

public static long parallelSum(long n) {
    return Stream.iterate(1L, i -> i + 1)
                 .limit(n)
                 .parallel()
                 .reduce(0L, Long::sum);
}

Internally, calling parallel just sets a boolean flag signalling that you want the operations that follow it to run in parallel. Similarly, you can turn a parallel stream into a sequential one by calling the sequential method. The two calls cannot be mixed to control individual operations, though: the last call to parallel or sequential wins and applies to the entire pipeline.
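A minimal sketch of the "last call wins" behaviour, using the standard isParallel method to inspect the flag. The class and method names (ParallelFlagDemo, endsSequential, endsParallel) are made up for illustration.

```java
import java.util.stream.Stream;

class ParallelFlagDemo {
    // parallel() then sequential(): the last call wins, so the pipeline is sequential
    static boolean endsSequential() {
        return Stream.of(1, 2, 3)
                .parallel()
                .map(i -> i * 2)
                .sequential()
                .isParallel();
    }

    // sequential() then parallel(): the whole pipeline is parallel
    static boolean endsParallel() {
        return Stream.of(1, 2, 3)
                .sequential()
                .map(i -> i * 2)
                .parallel()
                .isParallel();
    }
}
```

Here endsSequential() returns false and endsParallel() returns true, even though both pipelines contain a call to each method.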

2. Measuring Stream Performance

The iterate-based parallelSum above has two problems:

iterate generates boxed objects, which must be unboxed into numbers before they can be summed;
iterate is hard to divide into independent chunks for parallel execution.

iterate is hard to split because each application of the function depends on the result of the previous one. The full list of numbers is not available when the reduction starts, so the stream cannot be divided effectively into chunks for parallel processing. Marking the stream as parallel therefore only adds overhead to what is still sequential processing, because each summation step may be handed to a different thread.
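One way around both problems is LongStream.rangeClosed, which these notes also use in a later example: it works on primitive longs, so nothing is boxed, and it produces a range whose bounds are known up front, so it splits easily. The sketch below assumes that approach; the class and method names (RangedSum, rangedSum, parallelRangedSum) are illustrative.

```java
import java.util.stream.LongStream;

class RangedSum {
    // Sequential sum of 1..n over primitive longs: no boxing
    static long rangedSum(long n) {
        return LongStream.rangeClosed(1, n)
                .reduce(0L, Long::sum);
    }

    // Same pipeline in parallel: the range knows its bounds, so it can be cut into even chunks
    static long parallelRangedSum(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .reduce(0L, Long::sum);
    }
}
```

Whether the parallel version is actually faster depends on n and on the machine, so it is worth measuring both rather than assuming.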

3. Using Parallel Streams Correctly

The main cause of errors from misusing parallel streams is an algorithm that mutates shared state.

public class Accumulator {
    public long total = 0;
    public void add(long value) { total += value; }
}

public static long sideEffectParallelSum(long n) {
    Accumulator accumulator = new Accumulator();
    LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
    return accumulator.total;
}

The code above is inherently sequential, and a data race occurs on every access to total. Multiple threads access the accumulator at the same time, and total += value, simple as it looks, is not an atomic operation. The result is therefore unpredictable (and wrong).
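To make the problem visible, the sketch below runs the racy version next to a side-effect-free one that lets the stream do the summing. The wrapper class SharedStateDemo and the method sideEffectFreeSum are hypothetical names added for this illustration; the Accumulator is a copy of the one above.

```java
import java.util.stream.LongStream;

class SharedStateDemo {
    // Same shape as the Accumulator above: a plain, unsynchronized mutable field
    static class Accumulator {
        long total = 0;
        void add(long value) { total += value; }
    }

    // Racy: several threads update total without any synchronization
    static long sideEffectParallelSum(long n) {
        Accumulator accumulator = new Accumulator();
        LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
        return accumulator.total;
    }

    // Safe: no shared state; each chunk is summed on its own and the partial sums are combined
    static long sideEffectFreeSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        long expected = n * (n + 1) / 2;
        for (int i = 0; i < 5; i++) {
            System.out.println("racy: " + sideEffectParallelSum(n)
                    + "  safe: " + sideEffectFreeSum(n)
                    + "  expected: " + expected);
        }
    }
}
```

On a multi-core machine the racy column will usually differ from run to run and fall short of the expected value, while the safe column always matches it.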

4. Using Parallel Streams Efficiently

Watch out for boxing.
Some operations inherently perform worse on a parallel stream than on a sequential one (for example limit and findFirst, which depend on the order of the elements).
Consider the total computational cost of the stream's pipeline. If N is the number of elements to process and Q is the approximate cost of processing one element through the pipeline, then N*Q is a rough qualitative estimate of that cost. A higher Q means a better chance of good performance from a parallel stream.
For small amounts of data, choosing a parallel stream is almost never a good decision.
Consider whether the data structure behind the stream is easy to decompose.
The characteristics of the stream, and the way intermediate operations in the pipeline modify it, can change the performance of the decomposition process.
Consider the cost of the merge step in the terminal operation.
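Two of the points above, boxing and decomposability, can be sketched in a few lines. This is only an illustration under assumed names (EfficientParallelDemo, sample, boxedSum, primitiveSum); it shows the shape of the two pipelines, not a benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class EfficientParallelDemo {
    // ArrayList is backed by an array, so its spliterator can cut it in half cheaply
    static List<Integer> sample(int n) {
        return IntStream.rangeClosed(1, n)
                .boxed()
                .collect(Collectors.toCollection(ArrayList::new));
    }

    // Boxed: reduces Integer objects, unboxing and re-boxing at every step
    static int boxedSum(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, Integer::sum);
    }

    // Primitive: mapToInt switches to an IntStream, so the sum runs on plain ints
    static int primitiveSum(List<Integer> numbers) {
        return numbers.parallelStream().mapToInt(Integer::intValue).sum();
    }
}
```

Both methods return the same value; the difference is only in how much boxing work the pipeline does along the way.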

2. The Fork/Join Framework

See Chapter VI for details.
Note: Do not call the invoke method of ForkJoinPool from inside a RecursiveTask. Inside a task, always call compute or fork directly; only sequential code should use invoke to start a parallel computation.
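The note above can be sketched with a small array-summing task. This is a minimal example, not the book's listing: the class name SumTask, the sum helper and the THRESHOLD value are all assumptions made for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cutoff; tune by measurement
    private final long[] numbers;
    private final int start;
    private final int end;

    SumTask(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            // Small enough: sum sequentially
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        int mid = start + length / 2;
        SumTask left = new SumTask(numbers, start, mid);
        SumTask right = new SumTask(numbers, mid, end);
        left.fork();                         // schedule the left half asynchronously
        long rightResult = right.compute();  // compute the right half in the current thread
        long leftResult = left.join();       // wait for the left half
        return leftResult + rightResult;
    }

    // Entry point for sequential code: the only place where invoke is used
    static long sum(long[] numbers) {
        return ForkJoinPool.commonPool().invoke(new SumTask(numbers, 0, numbers.length));
    }
}
```

Inside compute the task only ever uses fork, compute and join; invoke appears once, in the sequential entry point.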

3. Spliterator

Spliterator is another new interface added in Java 8; the name stands for "splittable iterator". Like Iterator, a Spliterator traverses the elements of a data source, but it is designed to do so in parallel.
The Spliterator interface:

public interface Spliterator<T> {
    boolean tryAdvance(Consumer<? super T> action);
    Spliterator<T> trySplit();
    long estimateSize();
    int characteristics();
}

As usual, T is the type of the elements the Spliterator traverses. The tryAdvance method behaves like a normal Iterator: it consumes the Spliterator's elements one by one, in order, returning true if an element was consumed and false if none remained. The trySplit method, however, is specific to the Spliterator interface: it partitions off some of the elements to a second Spliterator (the one the method returns) so that the two can be processed in parallel. A Spliterator can also report, through estimateSize, an estimate of how many elements remain to be traversed; even an inexact value, if it is quick to compute, helps split the structure more evenly.
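The three methods can be tried out on the Spliterator that any List already provides. The sketch below is illustrative only; SpliteratorBasics, splitAndDrain and sizesAfterSplit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

class SpliteratorBasics {
    // Splits the list's spliterator once, then drains both parts with tryAdvance
    static List<Integer> splitAndDrain(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source cannot be split

        List<Integer> seen = new ArrayList<>();
        if (prefix != null) {
            while (prefix.tryAdvance(seen::add)) {
                // tryAdvance consumes one element per call
            }
        }
        while (rest.tryAdvance(seen::add)) {
            // same for the part that stayed in the original spliterator
        }
        return seen;
    }

    // Returns the size estimates of the two parts after a single split
    static long[] sizesAfterSplit(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }
}
```

For an ordered source such as a List, the Spliterator returned by trySplit covers a prefix of the elements, so draining it first and the original second reproduces the list order.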

1. The Splitting Process

The algorithm that splits a Stream into parts is recursive, as shown in the figure. In step 1, trySplit is called on the first Spliterator and produces a second one. In step 2, trySplit is called on both of these Spliterators, giving four in total. The framework keeps calling trySplit on a Spliterator until it returns null, which signals that the data structure it is processing cannot be split any further, as shown in step 3. Finally, the recursive splitting terminates in step 4, when every Spliterator has returned null from trySplit.
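The recursion described above can be imitated by hand. The sketch below is a simplified, single-threaded stand-in for what the framework does (the real implementation runs the two halves as parallel tasks); RecursiveSplitDemo and countBySplitting are hypothetical names.

```java
import java.util.Spliterator;

class RecursiveSplitDemo {
    // Keeps splitting until trySplit returns null, then counts the elements of each leaf
    static <T> long countBySplitting(Spliterator<T> spliterator) {
        Spliterator<T> prefix = spliterator.trySplit();
        if (prefix == null) {
            // Leaf: cannot be split any further, so traverse it sequentially
            long[] count = {0};
            spliterator.forEachRemaining(e -> count[0]++);
            return count[0];
        }
        // Recurse on both parts; the framework would process these two in parallel
        return countBySplitting(prefix) + countBySplitting(spliterator);
    }
}
```

Calling countBySplitting on the spliterator of a ten-element list returns 10, however many leaves the splitting produces.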

2. Implementing Your Own Spliterator

The example in this section relies on the three-argument overload of reduce:

<U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner)

It has three parameters:

identity: the initial value. Its type is the generic type U, which is also the return type of reduce. The elements of the Stream are of type T, which may or may not be the same as U, so this overload is more flexible: whatever the element type, U can be any type, such as a primitive wrapper type (Integer, Long, and so on), a String, or a collection type such as ArrayList.
accumulator: a BiFunction that takes a U and a T and returns a U. In other words, its return type matches its first parameter type, and its second parameter type matches the Stream's element type.
combiner: a BinaryOperator that operates on two values of type U.

The combiner is used mainly in parallel computation, where it merges the partial results; if the Stream is not parallel, it is effectively unused.
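A small sketch of the three roles, where the element type T is String and the result type U is Integer. The class and method names (ThreeArgReduceDemo, totalLength) are illustrative.

```java
import java.util.List;

class ThreeArgReduceDemo {
    // Sums the lengths of the words: T is String, U is Integer
    static int totalLength(List<String> words, boolean parallel) {
        return (parallel ? words.parallelStream() : words.stream())
                .reduce(0,                                   // identity: a U
                        (sum, word) -> sum + word.length(),  // accumulator: (U, T) -> U
                        Integer::sum);                       // combiner: (U, U) -> U, merges partial sums
    }
}
```

The sequential and parallel calls return the same total because the accumulator and combiner are associative and the identity is neutral for the combiner.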

Code implementation:

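// Imports needed by the whole listing below
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class WordCounter {
    private final int counter;
    private final boolean lastSpace;

    public WordCounter(int counter, boolean lastSpace) {
        this.counter = counter;
        this.lastSpace = lastSpace;
    }

    // Called once per character: a new word starts when a non-space follows a space
    public WordCounter accumulate(Character c) {
        if (Character.isWhitespace(c)) {
            return lastSpace ? this : new WordCounter(counter, true);
        } else {
            return lastSpace ? new WordCounter(counter + 1, false) : this;
        }
    }

    // Merges the partial results of two sub-streams
    public WordCounter combine(WordCounter wordCounter) {
        return new WordCounter(counter + wordCounter.counter,
                               wordCounter.lastSpace);
    }

    public int getCounter() {
        return counter;
    }
}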

class WordCounterSpliterator implements Spliterator<Character> {
    private final String string;
    private int currentChar = 0;

    public WordCounterSpliterator(String string) {
        this.string = string;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Character> action) {
        // Per the Spliterator contract, return false only when no element remained
        if (currentChar >= string.length()) {
            return false;
        }
        action.accept(string.charAt(currentChar++));
        return true;
    }

    @Override
    public Spliterator<Character> trySplit() {
        int currentSize = string.length() - currentChar;
        if (currentSize < 10) {
            return null; // small enough to process sequentially
        }
        // Start from the middle and move forward to the next space,
        // so that a word is never cut in two
        for (int splitPos = currentSize / 2 + currentChar;
                 splitPos < string.length(); splitPos++) {
            if (Character.isWhitespace(string.charAt(splitPos))) {
                Spliterator<Character> spliterator =
                    new WordCounterSpliterator(string.substring(currentChar,
                                                                splitPos));
                currentChar = splitPos;
                return spliterator;
            }
        }
        return null;
    }

    @Override
    public long estimateSize() {
        return string.length() - currentChar;
    }

    @Override
    public int characteristics() {
        return ORDERED + SIZED + SUBSIZED + NONNULL + IMMUTABLE;
    }
}

// Wrapper class (the name WordCountDemo is illustrative) so the fragments run as a program
class WordCountDemo {
    static final String SENTENCE =
        " Nel mezzo del cammin di nostra vita " +
        "mi ritrovai in una selva oscura" +
        " ché la dritta via era smarrita ";

    private static int countWords(Stream<Character> stream) {
        WordCounter wordCounter = stream.reduce(new WordCounter(0, true),
                                                WordCounter::accumulate,
                                                WordCounter::combine);
        return wordCounter.getCounter();
    }

    public static void main(String[] args) {
        Spliterator<Character> spliterator = new WordCounterSpliterator(SENTENCE);
        // The second argument, true, asks for a parallel stream
        Stream<Character> stream = StreamSupport.stream(spliterator, true);

        System.out.println("Found " + countWords(stream) + " words");
    }
}

The program prints:

Found 19 words
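As a sanity check on that count, a plain iterative counter over the same sentence should give the same answer. This sketch is an assumption-labelled cross-check rather than part of the listing above: IterativeWordCount and countWordsIteratively are illustrative names, and the accented letter is written as a Unicode escape so the source compiles under any file encoding.

```java
class IterativeWordCount {
    // Same sentence as in the listing above
    static final String SENTENCE =
            " Nel mezzo del cammin di nostra vita " +
            "mi ritrovai in una selva oscura" +
            " ch\u00e9 la dritta via era smarrita ";

    // Counts a word each time a non-whitespace character follows whitespace
    static int countWordsIteratively(String s) {
        int counter = 0;
        boolean lastSpace = true;
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c)) {
                lastSpace = true;
            } else {
                if (lastSpace) {
                    counter++;
                }
                lastSpace = false;
            }
        }
        return counter;
    }
}
```

The rule it applies is the same one WordCounter.accumulate uses, only with mutable local variables instead of immutable objects.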

Topics: Java
