Spark Learning Notes - Core Operators

Spark Learning Notes - Core Operators (2): the distinct operator. /** Return a new RDD containing the distinct elements in this RDD. */ def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope { def removeDuplicatesInPartition(partition: Iterator[T]): Iterator[T] = { // Create an instance of extern ...

Posted by Mikester on Wed, 22 Sep 2021 17:46:44 +0200
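
As a rough illustration of the distinct operator in the entry above, here is a minimal sketch in plain Scala, with no Spark dependency. It is not Spark's implementation: the object name DistinctSketch, the use of Seq[Seq[T]] as a stand-in for an RDD's partitions, and the LinkedHashSet are all assumptions made for the example. It only shows the general idea of removing duplicates inside each partition, then routing equal elements to the same target partition and removing duplicates again.

```scala
import scala.collection.mutable

object DistinctSketch {
  // Drop repeated elements inside one partition, keeping first-seen order.
  // (Assumption: an in-memory set is enough for this illustration.)
  def removeDuplicatesInPartition[T](partition: Iterator[T]): Iterator[T] = {
    val seen = mutable.LinkedHashSet.empty[T]
    partition.foreach(seen += _)
    seen.iterator
  }

  // Hypothetical stand-in for an RDD: a sequence of in-memory partitions.
  def distinct[T](partitions: Seq[Seq[T]], numPartitions: Int): Seq[Seq[T]] = {
    // Step 1: remove duplicates locally, partition by partition.
    val locallyUnique =
      partitions.flatMap(p => removeDuplicatesInPartition(p.iterator))
    // Step 2: a simulated shuffle; equal elements hash to the same bucket.
    val shuffled =
      locallyUnique.groupBy(x => math.floorMod(x.##, numPartitions))
    // Step 3: remove the duplicates that came from different source partitions.
    (0 until numPartitions).map { i =>
      removeDuplicatesInPartition(shuffled.getOrElse(i, Seq.empty[T]).iterator).toList
    }
  }
}
```

Calling DistinctSketch.distinct(Seq(Seq(1, 2, 2), Seq(2, 3, 1)), 2) yields two partitions that together hold each of 1, 2, and 3 exactly once.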

Spark Introduction - Extracting Hive Tables

Environment for this document: a CentOS server; Jupyter with the Scala kernel spylon-kernel; spark-2.4.0; scala-2.11.12; hadoop-2.6.0. Main contents: reading Hive table data with Spark, mainly by querying Hive tables directly with SQL, and by reading Hive tables and Hive partitioned tables through HDFS files. Initialize the sparksession t ...

Posted by flash99 on Mon, 20 Sep 2021 18:18:29 +0200

Thread Safety Issues with JUC Collections

Scenario: In everyday work, you often don't think much about high concurrency. Most collections are ordinary Lists, Sets, and Maps, and that rarely causes problems. But in a multi-threaded scenario where multiple threads manipulate a collection at the same time, many problems can arise. Collection safety issue: code demonstration L ...

Posted by SaxMan101 on Fri, 10 Sep 2021 18:13:05 +0200
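
To make the scenario in the entry above concrete, here is a small sketch in Scala of several threads appending to one shared list. The excerpt does not say which class the post uses; CopyOnWriteArrayList from java.util.concurrent is chosen here as one possible thread-safe option, and the names SafeListSketch and fill are hypothetical.

```scala
import java.util.concurrent.CopyOnWriteArrayList

object SafeListSketch {
  // Start `threads` workers that each append `perThread` items to a
  // shared list, wait for all of them, and return the final size.
  def fill(threads: Int, perThread: Int): Int = {
    // Thread-safe list: every write works on a fresh copy of the backing array.
    val list = new CopyOnWriteArrayList[String]()
    val workers = (1 to threads).map { t =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(i => list.add(s"$t-$i"))
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    list.size()
  }
}
```

With a plain java.util.ArrayList in place of the concurrent list, the same code may lose elements or throw, because ArrayList is not synchronized. Copy-on-write trades expensive writes for cheap, lock-free reads, so it fits read-heavy collections best.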

Inside Spark Technology: detailed explanation of Shuffle

Next, we go into the implementation in more detail. Shuffle is undoubtedly a key point of performance tuning. This article analyzes the implementation of Spark Shuffle in depth from the perspective of the source code. The upper boundary of each Stage requires either reading data from external storage or r ...

Posted by rbrown on Wed, 08 Sep 2021 04:50:23 +0200

Scala07_Higher-order function programming

Higher-order functions: a so-called higher-order function is one that uses a function as an object; functions also have types, namely function types. 1 Assigning a function to a variable as a value 1.1 Assigning a parameterless function object to a variable 1. Introduction object Scala05_Function_Hell { def main(args: Arra ...

Posted by fwegan on Wed, 01 Sep 2021 21:09:13 +0200
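
The idea in the entry above, that a function is a value with its own type, can be sketched as follows. All names here (HigherOrderSketch, greet, greetFn, applyTwice, addThree) are hypothetical and are not taken from the post.

```scala
object HigherOrderSketch {
  // An ordinary parameterless method.
  def greet(): String = "hello"

  // Assign the method to a variable as a function value.
  // The variable's type is the function type () => String.
  val greetFn: () => String = greet _

  // A higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A function literal stored in a variable of type Int => Int.
  val addThree: Int => Int = _ + 3
}
```

Here HigherOrderSketch.greetFn() returns "hello", and HigherOrderSketch.applyTwice(HigherOrderSketch.addThree, 1) returns 7.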

akka-typed - cluster: group router, cluster-load-balancing

Let's start with the router actor for akka-typed. Routers are divided into pool routers and group routers. Let's first look at a demonstration of a pool router: val pool = Routers.pool(poolSize = 4)( // make sure the workers are restarted if they fail Behaviors.supervise(WorkerRoutee()).onFailure[Exception](SupervisorStrateg ...

Posted by scnjl on Thu, 11 Jun 2020 03:02:18 +0200

akka-typed - typed-actor, typed messages

It has been a while since Akka 2.6.x was officially released. The core change is that typed actors are now formally enabled, but there are also big changes in modules such as persistence and cluster. Presumably it all starts with changing traditional Any-typed messages to strongly typed ones, so you'll want to take a moment to see how this can have ...

Posted by aod on Tue, 26 May 2020 18:22:55 +0200

Summary and solution of various errors reported by Flink

Table is not an append-only table. Use the toRetractStream() in order to handle add and retract messages. This error occurs because the dynamic table is not in append-only mode, so it has to be converted with toRetractStream: tableEnv.toRetractStreamPerson.print() Today, when starting the Flink task, an error was reported: "Caused by: jav ...

Posted by CaseyC1 on Thu, 07 May 2020 10:54:41 +0200

Quick Learning Scala, Chapter 3: array-related operations

Recently I have been studying this book without a Java background. Following the blogger (I am a painter), I worked through the after-class exercises. In Chapter 3, the blogger solved the questions in several ways, but as a beginner I couldn't follow some of them for lack of background, so I wrote up several questions in my own way. 3.1 Write a piece of code to set a as an array of ...

Posted by tripleaaa on Mon, 04 May 2020 14:26:33 +0200

Writing an RPC communication framework with Akka: a small case simulating Workers connecting to a Master

Guiding idea: 1. Use an RPC communication framework (Akka). 2. Define two classes, Master and Worker. Start the Master first, then all Workers. 1. After a Worker is started, it establishes a connection with the Master in the PreSta ...

Posted by kettle_drum on Thu, 30 Apr 2020 17:12:55 +0200