In-depth interpretation of Inside Java Newscast, episode 2

Posted by chopficaro on Sun, 16 Jan 2022 17:16:27 +0100

This is an official Java program hosted by nipafx (Nicolai Parlog) from Oracle's Java team. It covers recent progress in JDK development as well as previews and usage of new features, and it comes with subtitles I translated myself. I went through the extensive material nipafx cites, extracted the essentials, and added my own detailed explanations. Video address (subtitled version)

⎯⎯⎯⎯⎯⎯ Chapters ⎯⎯⎯⎯⎯⎯

  • 0:00 - Intro
  • 0:33 - Vector API
  • 0:56 - Vector API - SIMD and Vector Instructions
  • 2:22 - Vector API - Current State
  • 3:10 - Vector API - More
    Inside Java podcast Ep. 7
  • 3:59 - Records Serialization
  • 5:22 - JDK 17 - Enhanced Pseudo-Random Number Generators
  • 6:06 - Outro

This episode is short, but quite interesting.

Vector API

Relevant JEP:

The most important application is SIMD (single instruction, multiple data) processing on the CPU. Data flows through the program in multiple lanes, for example 4, 8, or some other number of individual data elements, and the CPU performs an operation on all lanes in parallel at once, which can greatly increase CPU throughput. With the Vector API, the Java team wants to let Java programmers use this directly from Java code. In the past, they had to write vector math in assembly, or use C/C++ with intrinsics and then expose it to Java through JNI.

A major optimization target is loops. A traditional (scalar) loop processes one element at a time, which is slow. With the Vector API, you can convert scalar algorithms into faster data-parallel ones. Here is an example using the Vector API, written as a JMH benchmark:

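import java.util.concurrent.ThreadLocalRandom;

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

//Measure throughput (operations per second)
@BenchmarkMode(Mode.Throughput)
//Warm up first to exclude the effects of JIT compilation and JVM profiling. Each iteration already invokes the benchmark method many times, so one warmup iteration is enough
@Warmup(iterations = 1)
//Run the benchmark in one forked JVM
@Fork(1)
//10 measurement iterations
@Measurement(iterations = 10)
//A single instance of this class is shared by all benchmark threads
@State(value = Scope.Benchmark)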
public class VectorTest {
    private static final VectorSpecies<Float> SPECIES =
            FloatVector.SPECIES_256;

    final int size = 1000;
    final float[] a = new float[size];
    final float[] b = new float[size];
    final float[] c = new float[size];

    public VectorTest() {
        for (int i = 0; i < size; i++) {
            a[i] = ThreadLocalRandom.current().nextFloat(0.0001f, 100.0f);
            b[i] = ThreadLocalRandom.current().nextFloat(0.0001f, 100.0f);
        }
    }

    @Benchmark
    public void testScalar(Blackhole blackhole) throws Exception {
        for (int i = 0; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }

    @Benchmark
    public void testVector(Blackhole blackhole) {
        int i = 0;
        //Largest multiple of SPECIES.length() that is <= a.length
        int upperBound = SPECIES.loopBound(a.length);
        //Each iteration processes SPECIES.length() elements at once (8 floats for a 256-bit species)
        for (; i < upperBound; i += SPECIES.length()) {
            //va, vb and vc are all FloatVector
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            var vc = va.mul(va)
                    .add(vb.mul(vb))
                    .neg();
            vc.intoArray(c, i);
        }
        //Scalar loop for the remaining elements that do not fill a whole vector
        for (; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(VectorTest.class.getSimpleName()).build();
        new Runner(opt).run();
    }
}

Note that incubator features require an extra option to make their module visible, here --add-modules jdk.incubator.vector. It must be passed both to javac when compiling and to java when running. In IntelliJ IDEA, add it both to the Java compiler's additional command-line parameters and to the VM options of the run configuration.

Test results:

Benchmark               Mode  Cnt         Score         Error  Units
VectorTest.testScalar  thrpt   10   7380697.998 ± 1018277.914  ops/s
VectorTest.testVector  thrpt   10  37151609.182 ± 1011336.900  ops/s

For other uses, see fizzbuzz-simd-style, an interesting article (although its speedup comes not only from SIMD but also from algorithmic optimization).

For more on usage and design ideas, listen to this podcast episode: https://www.youtube.com/watch...

Records Serialization

I also wrote an article analyzing the serialization of Java records; for reference, see "Some thoughts on Java Record - serialization related".

The most important part concerns how several mainstream serialization frameworks became compatible with records.

Because records allow only one way to serialize and deserialize (read the components through their accessors, rebuild the object through the canonical constructor), supporting them is actually simple, much simpler than the framework changes usually needed when the structure of Java classes changes or a new language feature is added.

The three frameworks analyzed in that article implement record support in very similar and fairly simple ways:

  1. Implement a dedicated Serializer and Deserializer for records.
  2. Use reflection or MethodHandles to check whether the running Java version supports records, then obtain the record's canonical constructor (for deserialization) and the accessors of its components (for serialization).
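The two steps above can be sketched with the standard reflection API (Java 16+): Class.getRecordComponents(), RecordComponent.getAccessor(), and the canonical constructor looked up by the component types in declaration order. This is a minimal illustration, not how any particular framework does it; the class name, the Point record and the Map used as a "wire format" are hypothetical. Because the sketch itself uses record syntax, it only compiles on 16+, so its runtime check always succeeds here; a framework compiled against an older JDK would also have to call isRecord and getRecordComponents reflectively.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordCodecSketch {
    // Hypothetical record used only for illustration
    record Point(int x, int y) {}

    // Step 2, version check: Class.isRecord() only exists on Java 16+
    static boolean runtimeSupportsRecords() {
        try {
            Class.class.getMethod("isRecord");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Serialization: read every component through its accessor, in declaration order
    static Map<String, Object> toMap(Record r) throws ReflectiveOperationException {
        Map<String, Object> out = new LinkedHashMap<>();
        for (RecordComponent c : r.getClass().getRecordComponents()) {
            Method accessor = c.getAccessor();
            accessor.setAccessible(true); // for non-public records (unnamed or open modules only)
            out.put(c.getName(), accessor.invoke(r));
        }
        return out;
    }

    // Deserialization: the canonical constructor's parameter types are the component types
    static <T extends Record> T fromMap(Class<T> type, Map<String, Object> in)
            throws ReflectiveOperationException {
        RecordComponent[] components = type.getRecordComponents();
        Class<?>[] paramTypes = new Class<?>[components.length];
        Object[] args = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            paramTypes[i] = components[i].getType();
            args[i] = in.get(components[i].getName());
        }
        Constructor<T> canonical = type.getDeclaredConstructor(paramTypes);
        canonical.setAccessible(true);
        return canonical.newInstance(args); // boxed values are unboxed for primitive parameters
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Point p = new Point(3, 4);
        Map<String, Object> serialized = toMap(p);
        System.out.println(serialized);                       // {x=3, y=4}
        System.out.println(fromMap(Point.class, serialized)); // Point[x=3, y=4]
    }
}
```

Going through the canonical constructor means any validation the record performs in its constructor also runs during deserialization.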

JDK 17 - Enhanced Pseudo-Random Number Generators

Java 17 introduces a unified interface for random number generators (the java.util.random package), with built-in Xoshiro/Xoroshiro algorithms and the new LXM family of algorithms. For details, see my series of articles on this topic.

Here is an excerpt of that analysis:
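As a quick illustration of the unified interface, the following sketch looks up algorithms by name through RandomGenerator and RandomGeneratorFactory (both in java.util.random since Java 17) and uses them through the same interface. The class and helper names are hypothetical; the algorithm names are the ones used in the benchmark below.

```java
import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

public class RandomGeneratorApiSketch {
    // Any algorithm can be used through the common RandomGenerator interface
    static int rollDie(RandomGenerator rng) {
        return rng.nextInt(1, 7); // origin inclusive, bound exclusive: 1..6
    }

    // Two generators created by the same factory with the same seed
    // produce the same sequence (for deterministic algorithms)
    static boolean sameSeedSameSequence(String algorithm, long seed) {
        RandomGeneratorFactory<RandomGenerator> factory = RandomGeneratorFactory.of(algorithm);
        RandomGenerator first = factory.create(seed);
        RandomGenerator second = factory.create(seed);
        for (int i = 0; i < 5; i++) {
            if (first.nextLong() != second.nextLong()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The JDK's default algorithm; which one it is may change between releases
        System.out.println(rollDie(RandomGenerator.getDefault()));
        // Look up a specific algorithm by name
        System.out.println(rollDie(RandomGenerator.of("L64X128MixRandom")));
        System.out.println(sameSeedSameSequence("Xoshiro256PlusPlus", 42L));

        // List every available algorithm with its group and state size
        RandomGeneratorFactory.all()
                .sorted((f1, f2) -> f1.name().compareTo(f2.name()))
                .forEach(f -> System.out.println(
                        f.name() + " (" + f.group() + ", " + f.stateBits() + " state bits)"));
    }
}
```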

As my earlier analysis showed, SplittableRandom is the fastest in a single-threaded environment and ThreadLocalRandom is the fastest in a multi-threaded one. The new algorithm implementations have longer periods and need more computation per value, and the LXM implementations in particular need more computation. These algorithms were added to serve a wider range of random-number applications, not to be faster. Still, to satisfy everyone's curiosity, I wrote the following benchmark. It also shows that the new RandomGenerator API is easier to use:

package prng;

import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

//Measure throughput (operations per second)
@BenchmarkMode(Mode.Throughput)
//Warm up first to exclude the effects of JIT compilation and JVM profiling. Each iteration already invokes the benchmark method many times, so one warmup iteration is enough
@Warmup(iterations = 1)
//Run with 10 concurrent benchmark threads
@Threads(10)
//Run the benchmark in one forked JVM
@Fork(1)
//50 measurement iterations
@Measurement(iterations = 50)
//A single instance of this class is shared by all benchmark threads, so each thread gets its own generator through the ThreadLocal below
@State(value = Scope.Benchmark)
public class TestRandomGenerator {
    @Param({
            "Random", "SecureRandom", "SplittableRandom", "Xoroshiro128PlusPlus", "Xoshiro256PlusPlus", "L64X256MixRandom",
            "L64X128StarStarRandom", "L64X128MixRandom", "L64X1024MixRandom", "L32X64MixRandom", "L128X256MixRandom",
            "L128X128MixRandom", "L128X1024MixRandom"
    })
    private String name;
    ThreadLocal<RandomGenerator> randomGenerator;
    @Setup
    public void setup() {
        final String finalName = this.name;
        randomGenerator = ThreadLocal.withInitial(() -> RandomGeneratorFactory.of(finalName).create());
    }

    @Benchmark
    public void testRandomInt(Blackhole blackhole) throws Exception {
        blackhole.consume(randomGenerator.get().nextInt());
    }

    @Benchmark
    public void testRandomIntWithBound(Blackhole blackhole) throws Exception {
        //Deliberately not a power of two: real-world bounds usually are not, and the implementation has a fast path for power-of-two bounds
        blackhole.consume(randomGenerator.get().nextInt(1, 100));
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(TestRandomGenerator.class.getSimpleName()).build();
        new Runner(opt).run();
    }
}

Test results:

Benchmark                                                  (name)   Mode  Cnt          Score           Error  Units
TestRandomGenerator.testRandomInt                          Random  thrpt   50  276250026.985 ± 240164319.588  ops/s
TestRandomGenerator.testRandomInt                    SecureRandom  thrpt   50    2362066.269 ±   1277699.965  ops/s
TestRandomGenerator.testRandomInt                SplittableRandom  thrpt   50  365417656.247 ± 377568150.497  ops/s
TestRandomGenerator.testRandomInt            Xoroshiro128PlusPlus  thrpt   50  341640250.941 ± 287261684.079  ops/s
TestRandomGenerator.testRandomInt              Xoshiro256PlusPlus  thrpt   50  343279172.542 ± 247888916.092  ops/s
TestRandomGenerator.testRandomInt                L64X256MixRandom  thrpt   50  317749688.838 ± 245196331.079  ops/s
TestRandomGenerator.testRandomInt           L64X128StarStarRandom  thrpt   50  294727346.284 ± 283056025.396  ops/s
TestRandomGenerator.testRandomInt                L64X128MixRandom  thrpt   50  314790625.909 ± 257860657.824  ops/s
TestRandomGenerator.testRandomInt               L64X1024MixRandom  thrpt   50  315040504.948 ± 101354716.147  ops/s
TestRandomGenerator.testRandomInt                 L32X64MixRandom  thrpt   50  311507435.009 ± 315893651.601  ops/s
TestRandomGenerator.testRandomInt               L128X256MixRandom  thrpt   50  187922591.311 ± 137220695.866  ops/s
TestRandomGenerator.testRandomInt               L128X128MixRandom  thrpt   50  218433110.870 ± 164229361.010  ops/s
TestRandomGenerator.testRandomInt              L128X1024MixRandom  thrpt   50  220855813.894 ±  47531327.692  ops/s
TestRandomGenerator.testRandomIntWithBound                 Random  thrpt   50  248088572.243 ± 206899706.862  ops/s
TestRandomGenerator.testRandomIntWithBound           SecureRandom  thrpt   50    1926592.946 ±   2060477.065  ops/s
TestRandomGenerator.testRandomIntWithBound       SplittableRandom  thrpt   50  334863388.450 ±  92778213.010  ops/s
TestRandomGenerator.testRandomIntWithBound   Xoroshiro128PlusPlus  thrpt   50  252787781.866 ± 200544008.824  ops/s
TestRandomGenerator.testRandomIntWithBound     Xoshiro256PlusPlus  thrpt   50  247673155.126 ± 164068511.968  ops/s
TestRandomGenerator.testRandomIntWithBound       L64X256MixRandom  thrpt   50  273735605.410 ±  87195037.181  ops/s
TestRandomGenerator.testRandomIntWithBound  L64X128StarStarRandom  thrpt   50  291151383.164 ± 192343348.429  ops/s
TestRandomGenerator.testRandomIntWithBound       L64X128MixRandom  thrpt   50  217051928.549 ± 177462405.951  ops/s
TestRandomGenerator.testRandomIntWithBound      L64X1024MixRandom  thrpt   50  222495366.798 ± 180718625.063  ops/s
TestRandomGenerator.testRandomIntWithBound        L32X64MixRandom  thrpt   50  305716905.710 ±  51030948.739  ops/s
TestRandomGenerator.testRandomIntWithBound      L128X256MixRandom  thrpt   50  174719656.589 ± 148285151.049  ops/s
TestRandomGenerator.testRandomIntWithBound      L128X128MixRandom  thrpt   50  176431895.622 ± 143002504.266  ops/s
TestRandomGenerator.testRandomIntWithBound     L128X1024MixRandom  thrpt   50  198282642.786 ±  24204852.619  ops/s

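As verified in my earlier results, SplittableRandom performs best in a single thread, while in a multi-threaded environment ThreadLocalRandom performs best, since it uses a similar algorithm optimized for multithreading. Note also that the error margins in the run above are very large, often comparable to the score itself, so most differences between the algorithms are within noise; only SecureRandom is clearly slower.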
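How the two recommended classes are meant to be used across threads can be sketched as follows. This is a hypothetical example, not part of the benchmark: ThreadLocalRandom.current() is called in the thread that uses it (the instance should never be cached and shared), while SplittableRandom, which is not thread-safe, gives each task its own instance via split(). The dice-sum workload and all method names are assumptions for illustration.

```java
import java.util.SplittableRandom;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class MultiThreadedRandomSketch {
    // ThreadLocalRandom: call current() inside whichever thread runs the task
    static long sumWithThreadLocalRandom(int rolls) {
        return IntStream.range(0, rolls).parallel()
                .mapToLong(i -> ThreadLocalRandom.current().nextInt(1, 7))
                .sum();
    }

    // SplittableRandom: split one instance per task up front (on this thread),
    // so no instance is ever touched by two threads; the total is reproducible from the seed
    static long sumWithSplittableRandom(int tasks, int rollsPerTask, long seed) {
        SplittableRandom root = new SplittableRandom(seed);
        SplittableRandom[] perTask = new SplittableRandom[tasks];
        for (int t = 0; t < tasks; t++) {
            perTask[t] = root.split();
        }
        return IntStream.range(0, tasks).parallel()
                .mapToLong(t -> {
                    SplittableRandom rng = perTask[t];
                    long sum = 0;
                    for (int i = 0; i < rollsPerTask; i++) {
                        sum += rng.nextInt(1, 7);
                    }
                    return sum;
                })
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithThreadLocalRandom(1_000));
        System.out.println(sumWithSplittableRandom(4, 1_000, 42L));
    }
}
```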

How to choose a random algorithm

The principle is to look at your business scenario: how many random combinations it needs to cover, and over what range. Then pick the best-performing algorithm among those whose period is larger than that number. For example, suppose the scenario is a deck of playing cards: excluding the two jokers there are 52 cards, and the dealing order is determined by random numbers:

  • First card: randomGenerator.nextInt(0, 52), chosen from the remaining 52 cards
  • Second card: randomGenerator.nextInt(0, 51), chosen from the remaining 51 cards
  • and so on

That gives 52! possible orders, a number between 2^225 and 2^226. If the period of the random number generator we use is smaller than this, some card orders can never be generated. Therefore, we need a random number generator with a period greater than 52!.
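The card example can be checked with a short sketch: compute 52! exactly with BigInteger, list the available algorithms whose declared period (RandomGeneratorFactory.period(), which returns BigInteger.ZERO when the period is unknown) exceeds it, and deal a deck the way the list above describes. The class and method names are hypothetical, and L64X256MixRandom is just one algorithm that passes the check.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

public class DeckPeriodSketch {
    // n! computed exactly
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Names of available algorithms whose declared period exceeds `needed`;
    // algorithms reporting an unknown period (BigInteger.ZERO) are excluded too
    static List<String> algorithmsWithPeriodAbove(BigInteger needed) {
        return RandomGeneratorFactory.all()
                .filter(f -> f.period().compareTo(needed) > 0)
                .map(f -> f.name())
                .sorted()
                .toList();
    }

    // Deal 52 cards (numbered 0..51): each card is drawn with
    // nextInt(0, remaining) from the cards not yet dealt
    static List<Integer> deal(RandomGenerator rng) {
        List<Integer> remaining = new ArrayList<>();
        for (int card = 0; card < 52; card++) {
            remaining.add(card);
        }
        List<Integer> dealt = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // remove(int) removes by index and returns the removed card
            dealt.add(remaining.remove(rng.nextInt(0, remaining.size())));
        }
        return dealt;
    }

    public static void main(String[] args) {
        BigInteger deckOrders = factorial(52);
        // bitLength() == 226 means 2^225 <= 52! < 2^226
        System.out.println("52! needs " + deckOrders.bitLength() + " bits");
        System.out.println("Period > 52!: " + algorithmsWithPeriodAbove(deckOrders));
        System.out.println(deal(RandomGenerator.of("L64X256MixRandom")));
    }
}
```

Among the candidates printed, you would then pick the fastest one according to the benchmark above.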


Topics: Java