This is an official Java show produced by nipafx from Oracle's JDK team. It covers recent JDK development progress as well as the outlook for, and use of, new features. I have added my own translated subtitles, distilled the key points from the extensive material nipafx references, and added detailed explanations of my own. Video link (with translated subtitles)
⎯⎯⎯⎯⎯⎯ Chapters ⎯⎯⎯⎯⎯⎯
- 0:00 - Intro
- 0:33 - Vector API
- 0:56 - Vector API - SIMD and Vector Instructions
- 2:22 - Vector API - Current State
- 3:10 - Vector API - More (Inside Java podcast Ep. 7)
- 3:59 - Records Serialization
- 5:22 - JDK 17 - Enhanced Pseudo-Random Number Generators
- 6:06 - Outro
This episode is short, but quite interesting.
Vector API
Relevant JEPs:
- JEP 338: Vector API (Incubator) : in Java 16
- JEP 414: Vector API (Second Incubator) : in Java 17
- JEP 417: Vector API (Third Incubator) : in Java 18
The most important application is SIMD (single instruction, multiple data) processing on the CPU. A SIMD program moves data through multiple lanes, perhaps 4 or 8 or any other number of individual data elements, and the CPU performs one operation on all lanes at once, in parallel, which can greatly increase CPU throughput. With the Vector API, the Java team is trying to give Java programmers direct access to this from Java code. In the past, they had to write vector math at the assembly level, or use C/C++ with intrinsics and expose the result to Java through JNI.
A major optimization target is loops. A scalar loop processes one element at a time, which is slow. With the Vector API you can convert a scalar algorithm into a faster data-parallel one. An example using the Vector API:
```java
import java.util.concurrent.ThreadLocalRandom;

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// The measured metric is throughput
@BenchmarkMode(Mode.Throughput)
// Warm-up is needed to remove the effects of JIT compilation and JVM profile collection.
// Each invocation already loops many times, so one warm-up iteration is enough
@Warmup(iterations = 1)
// A single fork is enough
@Fork(1)
// Number of measurement iterations: 10
@Measurement(iterations = 10)
// State scope: all benchmark threads share one instance
@State(value = Scope.Benchmark)
public class VectorTest {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;

    final int size = 1000;
    final float[] a = new float[size];
    final float[] b = new float[size];
    final float[] c = new float[size];

    public VectorTest() {
        for (int i = 0; i < size; i++) {
            a[i] = ThreadLocalRandom.current().nextFloat(0.0001f, 100.0f);
            b[i] = ThreadLocalRandom.current().nextFloat(0.0001f, 100.0f);
        }
    }

    @Benchmark
    public void testScalar(Blackhole blackhole) throws Exception {
        for (int i = 0; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }

    @Benchmark
    public void testVector(Blackhole blackhole) {
        int i = 0;
        // Largest multiple of SPECIES.length() that does not exceed the array length
        int upperBound = SPECIES.loopBound(a.length);
        // Each iteration processes SPECIES.length() elements
        for (; i < upperBound; i += SPECIES.length()) {
            // va, vb and vc are all FloatVector instances
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            var vc = va.mul(va)
                    .add(vb.mul(vb))
                    .neg();
            vc.intoArray(c, i);
        }
        // Scalar tail loop for the elements that do not fill a whole vector
        for (; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(VectorTest.class.getSimpleName()).build();
        new Runner(opt).run();
    }
}
```
Note that using an incubating Java feature requires an extra startup option to expose its module, in this case --add-modules jdk.incubator.vector. The option must be passed both to javac at compile time and to java at run time. In IntelliJ IDEA, that means adding it to both the compiler options and the VM options of the run configuration.
Test results:
```
Benchmark               Mode  Cnt         Score         Error  Units
VectorTest.testScalar  thrpt   10   7380697.998 ± 1018277.914  ops/s
VectorTest.testVector  thrpt   10  37151609.182 ± 1011336.900  ops/s
```
For another use case, see fizzbuzz-simd-style, an interesting article (although its performance gain comes not only from SIMD but also from algorithmic optimization).
For a more detailed look at usage and design ideas, see this recording: https://www.youtube.com/watch...
Records Serialization
I also wrote an article analyzing the serialization of Java Records, for reference: [Some thoughts on Java Record - serialization related]
The most important point there is compatibility with some mainstream serialization frameworks.
Because a Record restricts serialization and deserialization to a single path (state is read through the component accessors, and instances are created through the canonical constructor), supporting it is actually quite simple. It is simpler than the framework changes usually required when a new feature alters the structure of Java classes.
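As a minimal sketch of that single path, assuming JDK 16 or later and using the JDK's built-in serialization rather than any of the frameworks discussed here: the hypothetical Point record below counts calls to its canonical constructor, and a round trip through ObjectOutputStream and ObjectInputStream shows that deserialization goes through that same constructor, including its validation. The names Point, CONSTRUCTOR_CALLS and roundTrip are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class RecordSerializationDemo {

    // Hypothetical record, used only for illustration.
    public record Point(int x, int y) implements Serializable {
        public static final AtomicInteger CONSTRUCTOR_CALLS = new AtomicInteger();

        // Compact canonical constructor: runs for "new Point(...)" and for deserialization.
        public Point {
            if (x < 0 || y < 0) {
                throw new IllegalArgumentException("coordinates must be non-negative");
            }
            CONSTRUCTOR_CALLS.incrementAndGet();
        }
    }

    // Serializes the value to a byte array and reads it back.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        int before = Point.CONSTRUCTOR_CALLS.get();
        Point original = new Point(1, 2);
        Point copy = roundTrip(original);
        System.out.println(copy.equals(original));
        // One call for "new Point(1, 2)" plus one call made during deserialization.
        System.out.println(Point.CONSTRUCTOR_CALLS.get() - before);
    }
}
```

Because the constructor is the only way in, a framework that wants to support Records only has to find that constructor and the accessors; it does not need to write fields directly.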
- Issue: Support for record types in JDK 14
- Pull Request: Add support for Record types in JDK 14
- Corresponding version: 1.5.x (not yet released)
These three frameworks implement Record support in very similar and fairly simple ways:
- Implement a dedicated Serializer and Deserializer for Records.
- Use Java reflection or MethodHandles to check whether the running Java version supports Records, and to obtain the Record's canonical constructor (for deserialization) and the accessor of each field (for serialization).
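The two steps above can be sketched roughly as follows. This is not the code of any of the frameworks: it assumes JDK 16 or later and calls Class.isRecord() and getRecordComponents() directly, whereas a framework that must also run on older JDKs would look these methods up reflectively first. The User record and the toMap/fromMap helpers are hypothetical names, and a plain Map stands in for the serialized form.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordReflectionDemo {

    // Hypothetical record, used only for illustration.
    public record User(String name, int age) {}

    // "Serialization": read every component through its accessor method.
    public static Map<String, Object> toMap(Object record) {
        Class<?> type = record.getClass();
        if (!type.isRecord()) {
            throw new IllegalArgumentException(type + " is not a record");
        }
        Map<String, Object> values = new LinkedHashMap<>();
        try {
            for (RecordComponent component : type.getRecordComponents()) {
                values.put(component.getName(), component.getAccessor().invoke(record));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + type, e);
        }
        return values;
    }

    // "Deserialization": the canonical constructor takes the component types in declaration order.
    public static <T> T fromMap(Class<T> type, Map<String, Object> values) {
        if (!type.isRecord()) {
            throw new IllegalArgumentException(type + " is not a record");
        }
        RecordComponent[] components = type.getRecordComponents();
        Class<?>[] parameterTypes = Arrays.stream(components)
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        Object[] arguments = Arrays.stream(components)
                .map(component -> values.get(component.getName()))
                .toArray();
        try {
            Constructor<T> canonical = type.getDeclaredConstructor(parameterTypes);
            return canonical.newInstance(arguments);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type, e);
        }
    }

    public static void main(String[] args) {
        User user = new User("Ann", 30);
        Map<String, Object> values = toMap(user);
        System.out.println(values);
        System.out.println(fromMap(User.class, values).equals(user));
    }
}
```

The sketch only handles public records; for non-public ones a real implementation would also need setAccessible or a suitable MethodHandles.Lookup.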
JDK 17 - Enhanced Pseudo-Random Number Generators
Java 17 introduces a unified interface for random number generators, and adds built-in implementations of the Xoshiro algorithms and of the JDK's own LXM algorithms. See my series of articles:
- Evolution and thinking of hard core Java random number related API (Part 1)
- Evolution and thinking of hard core Java random number related API (Part 2)
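As a small illustration of what the unified interface means in practice, here is a minimal sketch assuming JDK 17 or later. The helper names seeded and rollDie are hypothetical; the algorithm names are ones the JDK ships. Code written against RandomGenerator works unchanged with the legacy java.util.Random and with the new algorithms.

```java
import java.util.Random;
import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

public class RandomGeneratorDemo {

    // Creates a generator for the named algorithm with a fixed seed.
    public static RandomGenerator seeded(String algorithm, long seed) {
        return RandomGeneratorFactory.of(algorithm).create(seed);
    }

    // Depends only on the interface, not on a concrete generator class.
    public static int rollDie(RandomGenerator generator) {
        return generator.nextInt(1, 7); // origin inclusive, bound exclusive
    }

    public static void main(String[] args) {
        // The legacy class and the new algorithms share one interface.
        System.out.println(rollDie(new Random()));
        System.out.println(rollDie(RandomGenerator.of("Xoshiro256PlusPlus")));
        System.out.println(rollDie(seeded("L64X128MixRandom", 42L)));

        // List every algorithm available in this runtime.
        RandomGeneratorFactory.all()
                .map(RandomGeneratorFactory::name)
                .sorted()
                .forEach(System.out::println);
    }
}
```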
An excerpt of that analysis follows:
According to the earlier analysis, SplittableRandom is the fastest in a single-threaded environment and ThreadLocalRandom is the fastest in a multi-threaded environment. Among the newly added implementation classes, a larger period generally requires more computation, and the LXM implementations require more computation still. These algorithms were added to suit a wider range of applications, not to be faster. Still, to satisfy everyone's curiosity, I wrote the following benchmark. It also shows that the new RandomGenerator API is easier to use:
```java
package prng;

import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// The measured metric is throughput
@BenchmarkMode(Mode.Throughput)
// Warm-up is needed to remove the effects of JIT compilation and JVM profile collection.
// Each invocation already loops many times, so one warm-up iteration is enough
@Warmup(iterations = 1)
// Number of threads
@Threads(10)
@Fork(1)
// Number of measurement iterations: 50
@Measurement(iterations = 50)
// State scope: all benchmark threads share one instance
@State(value = Scope.Benchmark)
public class TestRandomGenerator {
    @Param({
            "Random", "SecureRandom", "SplittableRandom",
            "Xoroshiro128PlusPlus", "Xoshiro256PlusPlus",
            "L64X256MixRandom", "L64X128StarStarRandom", "L64X128MixRandom",
            "L64X1024MixRandom", "L32X64MixRandom",
            "L128X256MixRandom", "L128X128MixRandom", "L128X1024MixRandom"
    })
    private String name;

    // One generator per benchmark thread
    ThreadLocal<RandomGenerator> randomGenerator;

    @Setup
    public void setup() {
        final String finalName = this.name;
        randomGenerator = ThreadLocal.withInitial(() -> RandomGeneratorFactory.of(finalName).create());
    }

    @Benchmark
    public void testRandomInt(Blackhole blackhole) throws Exception {
        blackhole.consume(randomGenerator.get().nextInt());
    }

    @Benchmark
    public void testRandomIntWithBound(Blackhole blackhole) throws Exception {
        // Deliberately avoid a power-of-two range: such ranges are uncommon in real
        // applications, but the implementation has an optimized path for them
        blackhole.consume(randomGenerator.get().nextInt(1, 100));
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(TestRandomGenerator.class.getSimpleName()).build();
        new Runner(opt).run();
    }
}
```
Test results:
```
Benchmark                                   (name)                  Mode  Cnt          Score           Error  Units
TestRandomGenerator.testRandomInt           Random                 thrpt   50  276250026.985 ± 240164319.588  ops/s
TestRandomGenerator.testRandomInt           SecureRandom           thrpt   50    2362066.269 ±   1277699.965  ops/s
TestRandomGenerator.testRandomInt           SplittableRandom       thrpt   50  365417656.247 ± 377568150.497  ops/s
TestRandomGenerator.testRandomInt           Xoroshiro128PlusPlus   thrpt   50  341640250.941 ± 287261684.079  ops/s
TestRandomGenerator.testRandomInt           Xoshiro256PlusPlus     thrpt   50  343279172.542 ± 247888916.092  ops/s
TestRandomGenerator.testRandomInt           L64X256MixRandom       thrpt   50  317749688.838 ± 245196331.079  ops/s
TestRandomGenerator.testRandomInt           L64X128StarStarRandom  thrpt   50  294727346.284 ± 283056025.396  ops/s
TestRandomGenerator.testRandomInt           L64X128MixRandom       thrpt   50  314790625.909 ± 257860657.824  ops/s
TestRandomGenerator.testRandomInt           L64X1024MixRandom      thrpt   50  315040504.948 ± 101354716.147  ops/s
TestRandomGenerator.testRandomInt           L32X64MixRandom        thrpt   50  311507435.009 ± 315893651.601  ops/s
TestRandomGenerator.testRandomInt           L128X256MixRandom      thrpt   50  187922591.311 ± 137220695.866  ops/s
TestRandomGenerator.testRandomInt           L128X128MixRandom      thrpt   50  218433110.870 ± 164229361.010  ops/s
TestRandomGenerator.testRandomInt           L128X1024MixRandom     thrpt   50  220855813.894 ±  47531327.692  ops/s
TestRandomGenerator.testRandomIntWithBound  Random                 thrpt   50  248088572.243 ± 206899706.862  ops/s
TestRandomGenerator.testRandomIntWithBound  SecureRandom           thrpt   50    1926592.946 ±   2060477.065  ops/s
TestRandomGenerator.testRandomIntWithBound  SplittableRandom       thrpt   50  334863388.450 ±  92778213.010  ops/s
TestRandomGenerator.testRandomIntWithBound  Xoroshiro128PlusPlus   thrpt   50  252787781.866 ± 200544008.824  ops/s
TestRandomGenerator.testRandomIntWithBound  Xoshiro256PlusPlus     thrpt   50  247673155.126 ± 164068511.968  ops/s
TestRandomGenerator.testRandomIntWithBound  L64X256MixRandom       thrpt   50  273735605.410 ±  87195037.181  ops/s
TestRandomGenerator.testRandomIntWithBound  L64X128StarStarRandom  thrpt   50  291151383.164 ± 192343348.429  ops/s
TestRandomGenerator.testRandomIntWithBound  L64X128MixRandom       thrpt   50  217051928.549 ± 177462405.951  ops/s
TestRandomGenerator.testRandomIntWithBound  L64X1024MixRandom      thrpt   50  222495366.798 ± 180718625.063  ops/s
TestRandomGenerator.testRandomIntWithBound  L32X64MixRandom        thrpt   50  305716905.710 ±  51030948.739  ops/s
TestRandomGenerator.testRandomIntWithBound  L128X256MixRandom      thrpt   50  174719656.589 ± 148285151.049  ops/s
TestRandomGenerator.testRandomIntWithBound  L128X128MixRandom      thrpt   50  176431895.622 ± 143002504.266  ops/s
TestRandomGenerator.testRandomIntWithBound  L128X1024MixRandom     thrpt   50  198282642.786 ±  24204852.619  ops/s
```
The earlier verification already showed that SplittableRandom performs best in a single thread, and that in a multi-threaded environment the best performer is ThreadLocalRandom, which uses a similar algorithm but is optimized for multi-threading.
How to select a random algorithm
The principle is to look at your business scenario: how many random outcomes there are, and how large that range is. Then pick the best-performing algorithm among those whose period is larger than that range. For example, suppose the business scenario is a deck of playing cards: 52 cards once the two jokers are excluded, with the dealing order determined by random numbers:
- First card: randomGenerator.nextInt(0, 52), chosen from the remaining 52 cards
- Second card: randomGenerator.nextInt(0, 51), chosen from the remaining 51 cards
- and so on
That gives 52! possible orderings, a number between 2^225 and 2^226. If the period of the random number generator we use is smaller than this result set, some orderings may never be generated. Therefore, we need to select a random number generator with period > 52!.
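The selection rule can be sketched in code as follows, assuming JDK 17 or later. RandomGeneratorFactory.period() reports each algorithm's period as a BigInteger (zero when it cannot be determined), so the candidates can be filtered against 52!. The class and method names here are hypothetical, and a large enough period is only a necessary condition, not a guarantee that every ordering is reachable from every seed.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;
import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

public class PeriodSelectionDemo {

    // n! as an exact BigInteger.
    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Names of all available algorithms whose period exceeds the number of outcomes.
    public static List<String> algorithmsWithPeriodAbove(BigInteger outcomes) {
        return RandomGeneratorFactory.all()
                .filter(factory -> factory.period().compareTo(outcomes) > 0)
                .map(RandomGeneratorFactory::name)
                .sorted()
                .toList();
    }

    // Deals all cards: each step picks one of the cards that are still left.
    public static int[] deal(RandomGenerator generator, int cards) {
        int[] remaining = new int[cards];
        for (int i = 0; i < cards; i++) {
            remaining[i] = i;
        }
        int[] dealt = new int[cards];
        for (int left = cards; left > 0; left--) {
            int pick = generator.nextInt(0, left);   // nextInt(0, 52), nextInt(0, 51), ...
            dealt[cards - left] = remaining[pick];
            remaining[pick] = remaining[left - 1];   // move the last remaining card into the gap
        }
        return dealt;
    }

    public static void main(String[] args) {
        BigInteger orderings = factorial(52);
        System.out.println("bits needed for 52!: " + orderings.bitLength());
        System.out.println(algorithmsWithPeriodAbove(orderings));
        System.out.println(Arrays.toString(deal(RandomGenerator.of("L64X256MixRandom"), 52)));
    }
}
```

From the resulting candidates, the benchmark figures can then be used to choose the fastest one for the workload.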