Spark Learning Notes - Core Operators
Spark Learning Notes - Core Operators (2)
distinct operator
/**
* Return a new RDD containing the distinct elements in this RDD.
*/
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
def removeDuplicatesInPartition(partition: Iterator[T]): Iterator[T] = {
// Create an instance of extern ...
Posted by Mikester on Wed, 22 Sep 2021 17:46:44 +0200
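The distinct operator above removes duplicates through a shuffle. In Spark 2.x, `distinct(numPartitions)` is written as `map(x => (x, null)).reduceByKey((x, _) => x, numPartitions).map(_._1)`. The sketch below imitates that recipe on plain Scala collections so it runs without Spark. `DistinctSketch` and its helpers are hypothetical names, and a `Seq[Seq[T]]` stands in for the RDD's partitions.

```scala
object DistinctSketch {
  // Partitioning rule of Spark's HashPartitioner: null keys go to partition 0,
  // other keys to a non-negative key.hashCode modulo numPartitions
  private def partitionOf(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val m = key.hashCode % numPartitions
      if (m < 0) m + numPartitions else m
    }

  // Local imitation of map(x => (x, null)).reduceByKey((x, _) => x, n).map(_._1)
  def distinct[T](partitions: Seq[Seq[T]], numPartitions: Int): Seq[Seq[T]] = {
    // map: turn every element into a (key, null) pair
    val pairs: Seq[(T, Null)] = partitions.flatten.map(x => (x, null))
    // shuffle: every copy of the same key lands in the same output partition
    val shuffled = pairs.groupBy { case (k, _) => partitionOf(k, numPartitions) }
    (0 until numPartitions).map { p =>
      val bucket = shuffled.getOrElse(p, Seq.empty[(T, Null)])
      // reduceByKey((x, _) => x) keeps one pair per key; ._1 is the final map(_._1)
      bucket.groupBy(_._1).values.map(_.reduce((a, _) => a)._1).toList
    }
  }

  def main(args: Array[String]): Unit = {
    val out = distinct(Seq(Seq(1, 2, 2, 3), Seq(3, 4, 1)), 2)
    out.zipWithIndex.foreach { case (p, i) => println(s"partition $i: ${p.sorted}") }
  }
}
```

Deduplication happens only after the shuffle. Because all copies of a key meet in one partition, each output partition can drop its duplicates independently.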
Spark introduction - reading a Hive table
Environment used in this article
CentOS server
Jupyter with the Scala kernel spylon-kernel
spark-2.4.0
scala-2.11.12
hadoop-2.6.0
Main contents of this article
Reading Hive table data with Spark. This mainly covers querying a Hive table directly with SQL, and reading a Hive table and a Hive partitioned table through their HDFS files. Initialize the SparkSession t ...
Posted by flash99 on Mon, 20 Sep 2021 18:18:29 +0200
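The Hive post above reads a partitioned table through its HDFS files. In that case the partition column values are not stored inside the data files. Hive encodes them in nested `key=value` directory names, and Spark SQL's partition discovery turns those directory names back into columns. The stdlib-only sketch below shows that mapping. The object name and the warehouse path are hypothetical and serve only as illustration.

```scala
object HivePartitionPath {
  // Hive stores each partition of a partitioned table in nested key=value
  // directories, e.g. .../logs/dt=2021-09-20/region=eu/part-00000.
  // This extracts those partition columns from a single file path.
  def partitionValues(path: String): Map[String, String] =
    path.split("/").toSeq
      .filter(_.contains("="))
      .map { segment =>
        // limit 2: split only at the first '=' of the segment
        val Array(key, value) = segment.split("=", 2)
        key -> value
      }
      .toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical warehouse path, for illustration only
    val file = "/user/hive/warehouse/test.db/logs/dt=2021-09-20/region=eu/part-00000"
    println(partitionValues(file))
  }
}
```

The sketch ignores Hive's escaping of special characters in partition values. Real code should let Spark's partition discovery handle the directory names.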
Thread-safety issues of collections and JUC
Scenario
At work we often don't think much about high concurrency. Most code uses ordinary Lists, Sets, and Maps, and that usually causes no trouble. In a multi-threaded scenario, however, where several threads modify the same collection at the same time, these collections run into many problems.
Code demonstrating the collection thread-safety issue
L ...
Posted by SaxMan101 on Fri, 10 Sep 2021 18:13:05 +0200
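The JUC post above concerns collections shared between threads. The sketch below lets several threads append to the same list. It uses only JDK classes, `java.util.concurrent.CopyOnWriteArrayList` and `Collections.synchronizedList`, and both keep every element. A plain `ArrayList` in the same position is not synchronized, so the outcome is nondeterministic and the demo does not assert on it. The `ConcurrentListDemo` object and its `fill` helper are hypothetical names.

```scala
import java.util.{ArrayList, Collections, List => JList}
import java.util.concurrent.{CopyOnWriteArrayList, CountDownLatch}

object ConcurrentListDemo {
  // Starts `threads` threads that each add `perThread` elements to the same list,
  // waits until all of them have finished, and returns the final size.
  def fill(list: JList[Integer], threads: Int, perThread: Int): Int = {
    val done = new CountDownLatch(threads)
    for (t <- 0 until threads) {
      new Thread(new Runnable {
        def run(): Unit =
          try {
            for (i <- 0 until perThread) list.add(Integer.valueOf(t * perThread + i))
          } finally done.countDown()
      }).start()
    }
    done.await() // also makes the workers' writes visible to this thread
    list.size()
  }

  def main(args: Array[String]): Unit = {
    // With a plain `new ArrayList[Integer]()` the size can end up below 4000, or a
    // worker thread can die with an exception, because ArrayList.add is not synchronized.
    println(fill(new CopyOnWriteArrayList[Integer](), 4, 1000))                     // 4000
    println(fill(Collections.synchronizedList(new ArrayList[Integer]()), 4, 1000)) // 4000
  }
}
```

`CopyOnWriteArrayList` copies its backing array on every write. It suits read-mostly lists. For write-heavy workloads a synchronized wrapper or a different JUC structure is usually the better trade-off.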
Inside Spark Technology: detailed explanation of Shuffle
Next, we look at the implementation in more detail.
Shuffle is undoubtedly a key point of performance tuning. This article analyzes the implementation details of Spark Shuffle in depth, starting from the source code.
The upper boundary of each Stage requires either reading data from external storage or r ...
Posted by rbrown on Wed, 08 Sep 2021 04:50:23 +0200
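The shuffle post above divides shuffle into a write side and a read side. The sketch below models the division with plain Scala collections. Each map task splits its output into one bucket per reduce partition, using the same rule as Spark's `HashPartitioner`, and reduce task `p` then fetches bucket `p` from every map output. This is a simplified model under stated assumptions, not Spark's code. The real sort-based shuffle also sorts, spills to disk, and writes a data file plus an index file per map task. All names here are hypothetical.

```scala
object ShuffleSketch {
  // Partitioning rule of Spark's HashPartitioner: null keys go to partition 0,
  // other keys to a non-negative key.hashCode modulo numPartitions
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val m = key.hashCode % numPartitions
      if (m < 0) m + numPartitions else m
    }

  // Map side (shuffle write): one map task splits its output into one bucket per reducer
  def shuffleWrite[K, V](records: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    records.groupBy { case (k, _) => partitionFor(k, numPartitions) }

  // Reduce side (shuffle read): reduce task `p` fetches bucket `p` from every map output
  def shuffleRead[K, V](mapOutputs: Seq[Map[Int, Seq[(K, V)]]], p: Int): Seq[(K, V)] =
    mapOutputs.flatMap(_.getOrElse(p, Seq.empty[(K, V)]))

  def main(args: Array[String]): Unit = {
    // Two map tasks of a word count, three reduce partitions
    val map1 = shuffleWrite(Seq("a" -> 1, "b" -> 1, "a" -> 1), 3)
    val map2 = shuffleWrite(Seq("b" -> 1, "c" -> 1), 3)
    for (p <- 0 until 3) {
      val counts = shuffleRead(Seq(map1, map2), p)
        .groupBy(_._1)
        .map { case (word, kvs) => word -> kvs.map(_._2).sum }
      println(s"reduce task $p: $counts")
    }
  }
}
```

A given key always lands in exactly one reduce partition. This is why the next stage can aggregate per key without any further communication.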
Scala07_Higher-order function programming
Higher-order functions
A higher-order function essentially treats a function as an object (a value). Functions also have types, namely function types.
1 Assigning a function to a variable as a value
1.1 Assigning a parameterless function object to a variable
1. Introducing the phenomenon
object Scala05_Function_Hell {
def main(args: Arra ...
Posted by fwegan on Wed, 01 Sep 2021 21:09:13 +0200
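The higher-order function post above treats functions as values. The sketch below shows the same ideas in a few lines. A parameterless method is turned into a function object of type `() => String` and stored in a variable, one function takes a function as a parameter, and another returns a function. The names are hypothetical and chosen for illustration.

```scala
object FunctionAsValue {
  // A parameterless method
  def greet(): String = "hello"

  // `greet _` turns the method into a function object of type () => String;
  // writing `greet()` instead would call the method and store the String "hello"
  val greetFn: () => String = greet _

  // A higher-order function: it takes a function (Int => Int) as a parameter
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // ... and one that returns a function
  def adder(n: Int): Int => Int = x => x + n

  def main(args: Array[String]): Unit = {
    println(greetFn())            // hello
    println(applyTwice(_ * 3, 2)) // 18
    println(adder(10)(5))         // 15
  }
}
```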
akka-typed - cluster: group router, cluster-load-balancing
Let's start with router actors in akka-typed. Routers come in two kinds: pool routers and group routers. First, let's look at a demonstration of a pool router:
val pool = Routers.pool(poolSize = 4)(
// make sure the workers are restarted if they fail
Behaviors.supervise(WorkerRoutee()).onFailure[Exception](SupervisorStrateg ...
Posted by scnjl on Thu, 11 Jun 2020 03:02:18 +0200
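The router post above starts from a pool router. In akka-typed, pool routers route round-robin by default, while group routers pick a random routee by default. The sketch below illustrates only the round-robin selection logic in plain Scala, with no Akka involved. `Routee` is a hypothetical stand-in for an `ActorRef`, and `RoundRobinRouter` is not Akka's API.

```scala
object RoundRobinSketch {
  // Hypothetical stand-in for a routee; in akka-typed each routee would be an ActorRef
  final case class Routee(name: String)

  // Round-robin routing: the i-th message goes to routee i mod n.
  // The mutable counter is fine only because a router actor processes
  // one message at a time; this class itself is not thread-safe.
  final class RoundRobinRouter(routees: Vector[Routee]) {
    require(routees.nonEmpty, "a router needs at least one routee")
    private var next = 0

    // Picks the routee that `msg` would be forwarded to
    def route(msg: String): Routee = {
      val target = routees(next)
      next = (next + 1) % routees.size
      target
    }
  }

  def main(args: Array[String]): Unit = {
    val router = new RoundRobinRouter(Vector.tabulate(4)(i => Routee(s"worker-$i")))
    (1 to 6).foreach(n => println(s"job-$n -> ${router.route(s"job-$n").name}"))
  }
}
```

Round-robin spreads the load evenly over a routee set that does not change, which matches the fixed size of a pool. Random selection holds up better when routees come and go, as happens with group routers in a cluster.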
akka-typed - typed-actor, typed messages
It has been a while since Akka 2.6.x was officially released. The core change is that typed actors are now officially enabled, but modules such as persistence and cluster have also changed considerably. As the name suggests, the change starts with replacing the traditional Any-typed messages with strongly typed ones, so you'll want to take a moment to see how this can have ...
Posted by aod on Tue, 26 May 2020 18:22:55 +0200
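The typed-messages post above turns on one idea: an actor declares the exact set of messages it accepts. The sketch below shows that idea in plain Scala with a sealed trait as the protocol. A behavior is modeled as a function `(state, message) => newState`, and a plain function stands in for a typed reply `ActorRef`. These names are hypothetical and are not akka-typed's `Behavior` API. In akka-typed the same `Command` trait would appear as the `T` in `Behavior[T]` and `ActorRef[T]`.

```scala
object TypedProtocolSketch {
  // The protocol: the complete set of messages this "actor" understands
  sealed trait Command
  final case class Increment(by: Int) extends Command
  // The reply channel is typed too; a function stands in for an ActorRef[Int] here
  final case class Get(replyTo: Int => Unit) extends Command

  // Behavior as a pure function (state, message) => new state.
  // Since Command is sealed, the compiler can warn about unhandled message types,
  // and passing anything that is not a Command (say, a String) does not compile.
  def counter(state: Int, msg: Command): Int = msg match {
    case Increment(by) => state + by
    case Get(replyTo) =>
      replyTo(state)
      state
  }

  // Feeds messages to the behavior one at a time, like an actor's mailbox
  def run(messages: Seq[Command]): Int =
    messages.foldLeft(0)((state, msg) => counter(state, msg))

  def main(args: Array[String]): Unit =
    run(Seq(Increment(2), Increment(3), Get(n => println(s"count = $n"))))
}
```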
Summary of various Flink errors and their solutions
Table is not an append-only table. Use the toRetractStream() in order to handle add and retract messages.
This happens because the dynamic table is not an append-only table. It has to be converted with toRetractStream instead.
tableEnv.toRetractStream[Person](table).print()
Today, when I started a Flink job, it failed with "Caused by: jav ...
Posted by CaseyC1 on Thu, 07 May 2020 10:54:41 +0200
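The Flink error in the post above appears when a query result can change after it is emitted, as with a grouped count. An append-only stream cannot express "replace the earlier row". `toRetractStream` solves this by emitting `(Boolean, T)` pairs, where `true` adds a row and `false` retracts a row emitted earlier. The plain-Scala sketch below replays such a stream into the current table contents to show what those flags mean. It does not use Flink, and all names are hypothetical.

```scala
object RetractStreamSketch {
  // One element of a retract stream, shaped like Flink's (Boolean, T):
  // true  = add this row
  // false = retract (delete) a previously added copy of this row
  type Change[T] = (Boolean, T)

  // Replays a retract stream into the current table contents (row -> multiplicity)
  def materialize[T](changes: Seq[Change[T]]): Map[T, Int] =
    changes.foldLeft(Map.empty[T, Int]) {
      case (table, (true, row)) =>
        table.updated(row, table.getOrElse(row, 0) + 1)
      case (table, (false, row)) =>
        val remaining = table.getOrElse(row, 0) - 1
        if (remaining <= 0) table - row else table.updated(row, remaining)
    }

  def main(args: Array[String]): Unit = {
    // Roughly what a per-user click count emits when "alice" clicks a second time:
    // the old result row is retracted before the updated one is added
    val changes: Seq[Change[(String, Long)]] = Seq(
      (true, ("alice", 1L)),
      (true, ("bob", 1L)),
      (false, ("alice", 1L)),
      (true, ("alice", 2L))
    )
    println(materialize(changes))
  }
}
```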
Chapter 3 of Quick Learning Scala (Scala for the Impatient): array operations
I have recently been studying this book without a Java background. Following the blogger "I am a painter", I worked through the end-of-chapter exercises. In Chapter 3 the blogger used several approaches that I, as a beginner, could not follow because of gaps in my knowledge, so I wrote several of the exercises my own way.
3.1 Write a piece of code that sets a to an array of ...
Posted by tripleaaa on Mon, 04 May 2020 14:26:33 +0200
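The Chapter 3 post above works with Scala arrays. The sketch below collects the basic operations that chapter's exercises rely on: a fixed-length `Array` versus a growable `ArrayBuffer`, transforming with `for`/`yield` plus a guard, and an in-place loop that swaps adjacent elements, a typical exercise of this kind. The object and method names are hypothetical, and the code is not a solution to any particular numbered exercise.

```scala
import scala.collection.mutable.ArrayBuffer

object ArrayOps {
  // Fixed-length Array vs variable-length ArrayBuffer
  def build(): (Array[Int], ArrayBuffer[Int]) = {
    val fixed = new Array[Int](3) // Array(0, 0, 0): length is fixed at creation
    fixed(0) = 7
    val buf = ArrayBuffer[Int]()
    buf += 1
    buf ++= Seq(2, 3, 4)
    buf.remove(0)                 // drops the element at index 0
    (fixed, buf)
  }

  // for/yield with a guard: keeps the even elements and doubles them
  def doubleEvens(a: Array[Int]): Array[Int] =
    for (x <- a if x % 2 == 0) yield 2 * x

  // Swaps adjacent pairs in place; a trailing odd element stays where it is
  def swapAdjacent(a: Array[Int]): Unit =
    for (i <- 0 until a.length - 1 by 2) {
      val t = a(i)
      a(i) = a(i + 1)
      a(i + 1) = t
    }

  def main(args: Array[String]): Unit = {
    val (fixed, buf) = build()
    println(fixed.mkString(", ") + " | " + buf.mkString(", "))
    println(doubleEvens(Array(1, 2, 3, 4)).mkString(", "))
    val a = Array(1, 2, 3, 4, 5)
    swapAdjacent(a)
    println(a.mkString(", "))
  }
}
```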
Writing an RPC communication framework with Akka: a small example simulating a Worker connecting to a Master
Approach:
1. Use an RPC communication framework (Akka).
2. Define two classes, Master and Worker.
Start the Master first, then all Workers.
1. After a Worker starts, it establishes a connection with the Master in the PreSta ...
Posted by kettle_drum on Thu, 30 Apr 2020 17:12:55 +0200
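Master/Worker examples like the one above usually follow a common pattern. The Worker registers with the Master when it starts, the Master acknowledges, and the Worker then sends periodic heartbeats so the Master can drop Workers that go silent. The heartbeat and timeout steps are assumptions here, since the excerpt stops at the connection step. The sketch below models that protocol in plain Scala without Akka, passing messages as method calls instead of `!`. Every message and class name, plus the master URL, is hypothetical.

```scala
object MasterWorkerSketch {
  // Hypothetical message protocol; in the Akka version these would be sent between actors
  sealed trait ToMaster
  final case class RegisterWorker(id: String, cores: Int, memoryMb: Int) extends ToMaster
  final case class Heartbeat(id: String) extends ToMaster

  sealed trait ToWorker
  final case class RegisteredWorker(masterUrl: String) extends ToWorker

  final case class WorkerInfo(id: String, cores: Int, memoryMb: Int, lastHeartbeat: Long)

  final class Master(url: String, timeoutMs: Long) {
    private var workers = Map.empty[String, WorkerInfo]

    // Handles one message at a time, as an actor would; `now` is a parameter to keep it testable
    def receive(msg: ToMaster, now: Long): Option[ToWorker] = msg match {
      case RegisterWorker(id, cores, mem) =>
        workers += id -> WorkerInfo(id, cores, mem, now)
        Some(RegisteredWorker(url))
      case Heartbeat(id) =>
        workers.get(id).foreach(w => workers += id -> w.copy(lastHeartbeat = now))
        None
    }

    // Periodic check: drops workers whose last heartbeat is older than the timeout
    def removeDeadWorkers(now: Long): Set[String] = {
      val dead = workers.values.filter(now - _.lastHeartbeat > timeoutMs).map(_.id).toSet
      workers --= dead
      dead
    }

    def aliveWorkers: Set[String] = workers.keySet
  }

  def main(args: Array[String]): Unit = {
    val master = new Master("akka://master-system/user/master", timeoutMs = 3000L)
    println(master.receive(RegisterWorker("w1", 4, 1024), now = 0L))
    master.receive(RegisterWorker("w2", 2, 512), now = 0L)
    master.receive(Heartbeat("w1"), now = 2500L)
    println(master.removeDeadWorkers(now = 4000L))
  }
}
```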