[Spark Streaming] Spark Day11: Spark Streaming learning notes

Spark Day11: Spark Streaming 01 - [understand] - review of yesterday's course content. Main topic: a quick start with the Spark Streaming module. 1. Streaming: overview of stream computing - Streaming application scenarios: real-time reports (RealTime Report), real-time incremental ETL, real-time alerting and monitoring, real-time search recomm ...

Posted by richever on Sun, 28 Nov 2021 09:11:50 +0100

[Spark Streaming] Spark Day10: Spark Streaming learning notes

Spark Day10: Spark Streaming 01 - [understand] - review of yesterday's course content. Practical exercise: taking the DMP advertising industry as the background, the processing of advertising click data is divided into two parts [advertising data ETL conversion and business report development], as follows: [[premise]: use Spar ...

Posted by rjs34 on Sun, 28 Nov 2021 07:40:00 +0100

Spark source code reading 02 - analysis of the Spark storage principle

Overall framework: Spark storage adopts a master-slave (Master/Slave) architecture, and the whole framework uses RPC message communication. The Master is responsible for managing and maintaining the data block metadata while the application runs; each Slave is responsible for reporting the status information of ...
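A minimal sketch of how these master/slave storage roles show up in user code, assuming a local session (the object name and sizes below are illustrative, not from the article): persisting an RDD makes each executor's BlockManager store the partitions as blocks and report their metadata to the BlockManagerMaster on the driver.

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object StorageRolesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StorageRolesSketch").master("local[2]").getOrCreate()
    val sc = spark.sparkContext

    // Persist: each executor's BlockManager (slave side) stores the partitions as blocks
    // and reports block status to the BlockManagerMaster in the driver (master side).
    val rdd = sc.parallelize(1 to 1000000, 4).persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count()  // an action is needed before any blocks are actually materialized

    // Driver-side view of the stored blocks, maintained from the slaves' reports
    sc.getRDDStorageInfo.foreach(info => println(info))

    spark.stop()
  }
}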

Posted by machiavelli1079 on Thu, 25 Nov 2021 02:28:07 +0100

Spark Streaming foundation - DStream creation - RDD queue, custom data source, Kafka data source

Chapter 3 DStream creation. 3.1 RDD queue. 3.1.1 Usage and description: during testing, you can create a DStream by using ssc.queueStream(queueOfRDDs); each RDD pushed into the queue is processed as part of the DStream. 3.1.2 Case practice. Requirement: create several RDDs in a loop and put them into the queue, then create a DStream through SparkSt ...
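A rough, self-contained sketch of that requirement (the batch interval, object name and the simple counting job are assumptions, not taken from the truncated excerpt):

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object QueueStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("QueueStreamSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(4))

    // Queue that will hold the RDDs pushed in a loop
    val rddQueue = new mutable.Queue[RDD[Int]]()

    // Each RDD pushed into the queue is consumed as one batch of the DStream
    val inputStream = ssc.queueStream(rddQueue)
    inputStream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()

    ssc.start()

    // Create several RDDs in a loop and push them into the queue, one per second
    for (_ <- 1 to 5) {
      rddQueue += ssc.sparkContext.makeRDD(1 to 300, 10)
      Thread.sleep(1000)
    }

    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}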

Posted by RussellReal on Fri, 19 Nov 2021 23:38:38 +0100

Spark source code: tracing the execution of RDD logic code

1. Driver

val sparkConnf = new SparkConf().setAppName("wordCount").setMaster("local[3]")
val sparkContext = new SparkContext(sparkConnf)
val rdd = sparkContext.parallelize(Array(1, 2, 3, 4, 5), 3)
val rdd_increace = rdd.map(_ + 1)
rdd_increace.collect()
sparkContext.stop()

The above code creates two RDDs without a shuffle dependency, so there are ...

Posted by irandoct on Fri, 19 Nov 2021 10:59:06 +0100

Redis in-memory database

Redis is a high-performance in-memory key-value database. Redis is completely open source and free, and complies with the BSD license. 1. Architecture: a KV database with a single-process, single-thread model; it is completely memory-based and provides data persistence; its data structures are simple and easy to operate; it uses multipl ...
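As a tiny illustration of the key-value usage described above, here is a sketch using the Jedis Java client from Scala (the Jedis dependency, the localhost:6379 server and the key names are assumptions, not part of the original notes):

import redis.clients.jedis.Jedis

object RedisSketch {
  def main(args: Array[String]): Unit = {
    // Connect to a local Redis server (in-memory KV store, single-threaded server)
    val jedis = new Jedis("localhost", 6379)

    // Simple string key-value operations
    jedis.set("page:views", "0")
    jedis.incr("page:views")          // atomic increment performed on the server side
    println(jedis.get("page:views"))  // -> "1"

    // A simple hash structure
    jedis.hset("user:1", "name", "alice")
    println(jedis.hget("user:1", "name"))

    jedis.close()
  }
}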

Posted by scarface222 on Thu, 18 Nov 2021 17:04:10 +0100

Meaning of Spark's spark.sql.hive.caseSensitiveInferenceMode parameter

This article sorts out and summarizes the meaning and use of Spark's spark.sql.hive.caseSensitiveInferenceMode parameter. 1. Parameter meaning: Spark 2.1.1 introduces a new configuration item, spark.sql.hive.caseSensitiveInferenceMode. The default value is NEVER_INFER, which keeps behavior consistent with Spark 2.1.0. However, Spark 2.2.0 changes the de ...
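A minimal sketch of pinning this setting explicitly when building a Hive-enabled session, instead of relying on the version-dependent default (the session setup and table name are illustrative; INFER_AND_SAVE, INFER_ONLY and NEVER_INFER are the values the option accepts):

import org.apache.spark.sql.SparkSession

object CaseSensitiveInferenceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CaseSensitiveInferenceDemo")
      .master("local[2]")
      // Valid values: INFER_AND_SAVE, INFER_ONLY, NEVER_INFER
      .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table name, used only to show that Hive table reads are affected
    spark.sql("SELECT * FROM some_hive_table").show()
    spark.stop()
  }
}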

Posted by kate_rose on Fri, 12 Nov 2021 18:22:59 +0100

Spark's cache and checkpoint

To introduce these two mechanisms, we first write a job that estimates Pi: we only need the ratio of points that fall inside the circle to points that fall inside the square. Two parameters are highlighted here: slices indicates how many tasks are generated, and cnt indicates how many points are generated in each task. For the nu ...
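A minimal Monte Carlo sketch of such a Pi job, using slices and cnt as described; where exactly cache() and checkpoint() are applied is my own assumption, since the excerpt does not show the article's placement.

import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object PiSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PiSketch").setMaster("local[4]"))
    sc.setCheckpointDir("/tmp/spark-ckpt")  // assumed checkpoint directory, adjust as needed

    val slices = 4        // how many tasks are generated
    val cnt = 100000      // how many points are generated in each task

    // Each task throws cnt random points into the unit square and counts those inside the quarter circle
    val hits = sc.parallelize(1 to slices, slices).map { _ =>
      var inside = 0L
      for (_ <- 1 to cnt) {
        val x = Random.nextDouble()
        val y = Random.nextDouble()
        if (x * x + y * y <= 1.0) inside += 1
      }
      inside
    }

    hits.cache()        // keep the per-task counts in memory for reuse
    hits.checkpoint()   // also write them to the checkpoint dir, cutting the lineage

    val total = hits.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * total / (slices.toLong * cnt)}")
    sc.stop()
  }
}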

Posted by mazen on Fri, 12 Nov 2021 05:34:07 +0100

Spark operators - Python

1. Theoretical basis: Spark operators can be divided into: Transformation (transformation/conversion) operators: a transformation does not trigger the submission of a job; it makes up the intermediate steps of the job. Transformation operations are deferred, that is, the operation of transforming one RDD to generate another RDD is not perfor ...
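A small illustration of this deferred behavior (written in Scala here even though the article itself uses Python; the semantics are the same, and the object name is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object LazyTransformations {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LazyTransformations").setMaster("local[2]"))

    val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))

    // Transformation: only records the lineage, no job is submitted yet
    val doubled = rdd.map { x =>
      println(s"mapping $x")  // nothing is printed at this point
      x * 2
    }

    // Action: triggers the actual computation and submits the job
    val result = doubled.collect()
    println(result.mkString(", "))

    sc.stop()
  }
}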

Posted by blackcow on Fri, 22 Oct 2021 09:24:04 +0200

Hadoop, ZooKeeper, Spark installation

Create new folders: a compressed package folder and a software installation directory folder. Unless stated otherwise, all of the following operations are performed on the Master host.

# Recursively create the compressed package folder
mkdir -p /usr/tar
# Recursively create the software installation directory folder
mkdir -p /usr/apps

Install upload and color code c ...

Posted by Skepsis on Thu, 21 Oct 2021 15:31:14 +0200