[Spark Streaming] Spark Day11: Spark Streaming learning notes
Spark Day11: Spark Streaming
01 - [understand] - review of yesterday's course content
Main topic: a quick start with the Spark Streaming module
1. Streaming: an overview of stream computing
- Streaming application scenarios
Real-time reporting (RealTime Report)
Real-time incremental ETL
Real-time early warning and monitoring
Real-time search recomm ...
Posted by richever on Sun, 28 Nov 2021 09:11:50 +0100
[Spark Streaming] Spark Day10: Spark Streaming learning notes
Spark Day10: Spark Streaming
01 - [understand] - review of yesterday's course content
Practical exercise: taking the DMP advertising industry as the background, the processing of advertising click data is divided into two parts [advertising data ETL conversion and business report development], as follows:
[premise]: use Spar ...
Posted by rjs34 on Sun, 28 Nov 2021 07:40:00 +0100
Spark source code reading 02 - analysis of the Spark storage principle
Overall framework
Spark storage adopts the master-slave (Master/Slave) mode, and the whole framework communicates through RPC messages. Specifically:
The Master is responsible for managing and maintaining the data block metadata while the whole application is running; the Slave is responsible for reporting the status information of ...
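Although the BlockManager RPC exchange itself happens inside Spark, the master-side block metadata can be observed from ordinary driver code. A minimal sketch, assuming a local run (persist is a standard RDD method and getRDDStorageInfo is a developer API on SparkContext; the app name and numbers are illustrative):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("storageInfo").setMaster("local[2]"))
// Persisting an RDD makes each executor's BlockManager (the slave side) store its partitions
// and report block status to the driver-side master.
val data = sc.parallelize(1 to 1000, 4).persist(StorageLevel.MEMORY_ONLY)
data.count() // trigger a job so the blocks are actually materialized
// The driver keeps the block metadata; getRDDStorageInfo exposes a summary of it.
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.numCachedPartitions} cached partitions, ${info.memSize} bytes in memory")
}
sc.stop()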
Posted by machiavelli1079 on Thu, 25 Nov 2021 02:28:07 +0100
Spark Streaming foundation - DStream creation - RDD queue, custom data source, Kafka data source
Chapter 3 DStream creation
3.1 RDD queue
3.1.1 usage and description
During testing, you can create a DStream by using ssc.queueStream(queueOfRDDs). Each RDD pushed into the queue will be processed as a batch of the DStream.
3.1.2 case practice
Requirement: create several RDDs in a loop and put them into the queue. Create a DStream through SparkSt ...
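A minimal sketch of that RDD-queue approach, assuming a local run (the queue variable name, batch interval, and loop counts are illustrative, not taken from the post):
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("queueStream").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(4))
// Each RDD pushed into this queue is consumed as one batch of the DStream.
val rddQueue = new mutable.Queue[RDD[Int]]()
ssc.queueStream(rddQueue).reduce(_ + _).print()
ssc.start()
// Create several RDDs in a loop and push them into the queue.
for (_ <- 1 to 5) {
  rddQueue.synchronized { rddQueue += ssc.sparkContext.makeRDD(1 to 100, 2) }
  Thread.sleep(2000)
}
ssc.stop(stopSparkContext = true, stopGracefully = true)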
Posted by RussellReal on Fri, 19 Nov 2021 23:38:38 +0100
Spark source code: tracing the execution of RDD logic code
1. Driver
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("wordCount").setMaster("local[3]")
val sparkContext = new SparkContext(sparkConf)
// Create a source RDD with 3 partitions, then derive a second RDD via a narrow (map) dependency.
val rdd = sparkContext.parallelize(Array(1, 2, 3, 4, 5), 3)
val rddIncrease = rdd.map(_ + 1)
// collect() is an action: it triggers the actual job execution.
rddIncrease.collect()
sparkContext.stop()
The above code creates two RDDs without a shuffle dependency, so there are ...
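The absence of a shuffle dependency can be checked directly from the driver; a small illustrative addition (toDebugString and dependencies are standard RDD methods):
// With only a map (narrow dependency), the whole lineage stays within one stage.
println(rddIncrease.toDebugString)
// Prints OneToOneDependency rather than ShuffleDependency.
rddIncrease.dependencies.foreach(dep => println(dep.getClass.getSimpleName))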
Posted by irandoct on Fri, 19 Nov 2021 10:59:06 +0100
Redis in-memory database
Redis is a high-performance in-memory key-value database. Redis is completely open source and free, and is released under the BSD license.
1. Architecture
A KV database with a single-process, single-threaded model; completely memory-based, with a data persistence function; simple data structures and simple operations; using multipl ...
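To make the key-value model concrete, here is a minimal Scala sketch using the Jedis client (the client library, host, port, and key names are assumptions, not taken from the excerpt):
import redis.clients.jedis.Jedis

// Connect to a local Redis instance, assumed to be running on the default port.
val jedis = new Jedis("localhost", 6379)
// Simple KV operations; the server handles each command in its single-threaded event loop.
jedis.set("page:views", "1")
jedis.incr("page:views")          // atomic increment on the stored value
println(jedis.get("page:views"))  // prints "2"
jedis.close()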
Posted by scarface222 on Thu, 18 Nov 2021 17:04:10 +0100
Meaning of Spark's spark.sql.hive.caseSensitiveInferenceMode parameter
This article sorts out and summarizes the meaning and usage of Spark's spark.sql.hive.caseSensitiveInferenceMode parameter.
1. Parameter meaning: Spark 2.1.1 introduces a new configuration item, spark.sql.hive.caseSensitiveInferenceMode. Its default value is NEVER_INFER, which keeps the behavior consistent with Spark 2.1.0. However, Spark 2.2.0 changes the de ...
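If the behavior should not depend on the version-specific default, the option can be set explicitly when building the session; a minimal sketch (choosing INFER_AND_SAVE here is only an example):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("caseSensitiveInference")
  .enableHiveSupport()
  // Pin the schema-inference mode instead of relying on the version-dependent default.
  .config("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")
  .getOrCreate()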
Posted by kate_rose on Fri, 12 Nov 2021 18:22:59 +0100
Spark's cache and checkpoint
To introduce these two mechanisms, we write a job that estimates Pi: we only need to compute the ratio of the points that fall inside the circle to the points that fall inside the square. Note that slices indicates how many tasks are generated, and cnt indicates how many points are generated in each task. For the nu ...
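A minimal sketch of such a Monte Carlo Pi job with cache() added (the names slices and cnt follow the excerpt; everything else is illustrative):
import scala.util.Random
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("cachedPi").setMaster("local[*]"))
val slices = 4     // how many tasks are generated
val cnt = 100000   // how many points are generated in each task
// 1 if the random point falls inside the unit circle, otherwise 0.
val hits = sc.parallelize(1 to slices * cnt, slices).map { _ =>
  val x = Random.nextDouble() * 2 - 1
  val y = Random.nextDouble() * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.cache()          // cache so a second action can reuse the computed points
val inside = hits.sum()   // first action: computes and caches the RDD
println(s"Pi is roughly ${4.0 * inside / (slices * cnt)}")
println(s"points reused from cache: ${hits.count()}")  // second action reads the cached data
sc.stop()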
Posted by mazen on Fri, 12 Nov 2021 05:34:07 +0100
Spark operator - Python
1. Theoretical basis
Spark operators can be divided into:
Transformation (conversion) operators: a transformation does not trigger job submission; it completes an intermediate stage of the job. Transformation operations are deferred, that is, the transformation from one RDD to generate another RDD is not perfor ...
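As a rough illustration of this deferred behavior, a short sketch (shown in Scala for consistency with the other examples here; names and values are illustrative):
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lazyDemo").setMaster("local[2]"))
// Transformations: nothing runs yet, only the lineage from one RDD to the next is recorded.
val words = sc.parallelize(Seq("spark", "streaming", "rdd"))
val lengths = words.map(_.length)
val longLengths = lengths.filter(_ > 4)
// Action: only now is a job submitted and the whole chain evaluated.
println(longLengths.collect().mkString(", "))
sc.stop()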
Posted by blackcow on Fri, 22 Oct 2021 09:24:04 +0200
Hadoop, zookeeper, spark installation
Create new folders: a folder for compressed packages and a folder for the software installation directory
Unless a host is specified, all of the following operations are performed on the Master host.
# Recursively create the compressed package folder
mkdir -p /usr/tar
# Recursively create the software installation directory folder
mkdir -p /usr/apps
Install upload and color code c ...
Posted by Skepsis on Thu, 21 Oct 2021 15:31:14 +0200