Spark Learning Notes - creation of RDD
By default, Spark divides a job into multiple tasks and sends them to Executor nodes for parallel computation. The number of tasks that can be computed in parallel is called the parallelism, and it can be specified when building an RDD. However, the number of tasks a job is split into is not necessarily equal to the number of tasks executed in paral ...
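The way Spark's parallelize() splits a collection into partitions can be illustrated with a small plain-Python sketch of its slicing rule (partition i covers the index range [i*length/n, (i+1)*length/n)); this mimics the behavior rather than calling Spark itself:

```python
def positions(length, num_slices):
    # Partition i covers indices [i*length // n, (i+1)*length // n),
    # mirroring how Spark slices a local collection for parallelize().
    return [(i * length // num_slices, (i + 1) * length // num_slices)
            for i in range(num_slices)]

def slice_collection(data, num_slices):
    # Apply the computed ranges to produce the per-partition sub-lists.
    return [data[start:end] for start, end in positions(len(data), num_slices)]

# 5 elements split into 2 partitions:
print(slice_collection([1, 2, 3, 4, 5], 2))  # → [[1, 2], [3, 4, 5]]
```

Note that uneven divisions put the extra elements in the later partitions, as the example shows.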
Posted by mona02 on Sun, 02 Jan 2022 09:42:25 +0100
Flink user e-commerce behavior analysis project
Analyzing user e-commerce behavior with Flink
1. Real-time statistical analysis
1.1 Statistics of popular products
Requirement: every 5 minutes, display the top N popular products on the website over the past hour. Format of the displayed data:
Time window information:
No. 1: product ID + view count 1
No. 2: product ...
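The windowed top-N counting described above can be sketched in plain Python (the event shape `(timestamp, product_id)` is an assumption for illustration; the article itself uses Flink windows):

```python
from collections import Counter

def top_n_products(view_events, window_start, window_end, n):
    # Count views per product whose timestamp falls in [window_start, window_end),
    # then return the n most-viewed products as (product_id, count) pairs.
    counts = Counter(pid for ts, pid in view_events
                     if window_start <= ts < window_end)
    return counts.most_common(n)

events = [(0, "p1"), (1, "p2"), (2, "p1"), (3, "p2"), (4, "p1"), (5, "p3")]
print(top_n_products(events, 0, 5, 2))  # → [('p1', 3), ('p2', 2)]
```

A "display every 5 minutes over the last hour" requirement corresponds to calling this with a sliding window: the window length stays 1 hour while the start advances in 5-minute steps.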
Posted by califdon on Sat, 01 Jan 2022 14:40:33 +0100
Analysis of Apache Avro data
Abstract: this article demonstrates how to serialize data into Avro format and how to parse it with Flink SQL.
This article is shared from the Huawei Cloud community post "[Technology sharing] Serialization and deserialization of Apache Avro data && Flink SQL parsing of Avro data", by "third uncle of Nanpai".
Technical background
...
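As background for the serialization the excerpt mentions: an Avro schema is an ordinary JSON document. A minimal, hypothetical example (the record and field names below are illustrative, not taken from the article):

```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example.avro",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

The `["null", "string"]` union with a `null` default is the standard Avro idiom for an optional field.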
Posted by learning_php_mysql on Sat, 01 Jan 2022 14:25:13 +0100
2021-12-30 - the 58th step toward programming
Contents
1. Introduction to Azkaban
2. System architecture of Azkaban
3. Installation modes of Azkaban
3.1 Solo Server installation
3.1.1 Introduction to Solo Server
3.1.2 Installation steps
3.2 Multi Executor Server installation
3.2.1 Node layout
3.2.2 Configure MySQL
3.2.3 Configure the web server
3.2.4 Configure the exec se ...
Posted by evolve4 on Sat, 01 Jan 2022 04:07:23 +0100
Billing data warehouse construction - data warehouse concepts and data collection
1 Data warehouse concept
A data warehouse (abbreviated DW or DWH) is a strategic collection of data that supports all of an enterprise's decision-making processes. Analyzing the data in a data warehouse can help enterprises improve business processes, control costs, and improve product quality. A data warehouse is n ...
Posted by ddragas on Sat, 01 Jan 2022 01:39:50 +0100
Hive file storage format
Hive supports the following formats for storing data: TEXTFILE (row storage), SEQUENCEFILE (row storage), ORC (columnar storage), and PARQUET (columnar storage).
1: Columnar storage vs. row storage
In the figure above, the left side is a logical table; the first layout on the right is row storage, and the second is columnar storage.
The storage ...
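The difference between the two layouts can be shown with a tiny plain-Python sketch, flattening the same logical table both ways:

```python
# The same logical table laid out two ways.
rows = [("a1", "b1", "c1"),
        ("a2", "b2", "c2"),
        ("a3", "b3", "c3")]

# Row storage: each record's fields are stored contiguously.
row_layout = [value for record in rows for value in record]

# Columnar storage: each column's values are stored contiguously,
# so a query that reads one column can skip the others entirely.
col_layout = [rows[r][c] for c in range(3) for r in range(3)]

print(row_layout)  # → ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
print(col_layout)  # → ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
```

Columnar layouts also compress better, since values within a column tend to be similar.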
Posted by Lol5916 on Fri, 31 Dec 2021 12:59:52 +0100
Using MapReduce to count the occurrences of each word in a file
Note: installation and configuration steps are covered in the reference materials at the end
1. Upload the file to be analyzed (no fewer than 100,000 English words) to HDFS
demo.txt is the file to be analyzed
Start Hadoop
Upload the file to the input folder of HDFS
Verify that the upload succeeded
2. Call MapReduce to count the n ...
Posted by crazytoon on Fri, 31 Dec 2021 05:40:31 +0100
Network data transmission process in Flink
Background
This article follows the previous one on StreamTask data flow, where we explained how the data in a task on each executor is read, transformed, and written. In this article, we explain the data transmission process between executors, including Flink's shuffle implementation and the data read/write details of InputChannel and Res ...
Posted by waterox on Thu, 30 Dec 2021 22:32:03 +0100
Hive common function summary
Functions
View all built-in functions:
show functions;
View how to use a function (add extended for detailed information):
desc function [extended] function_name;
UDF: one in, one out (processed row by row); UDAF: many in, one out; UDTF: one in, many out
UDF
NVL: assigns a default to data whose value is NULL. Its format is NVL(value, default_value). Its function is to ...
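The behavior of NVL can be mimicked with a one-line plain-Python sketch (using None to stand in for SQL NULL):

```python
def nvl(value, default_value):
    # Mimics Hive's NVL(value, default_value): return default_value
    # when value is NULL (None here), otherwise return value unchanged.
    return default_value if value is None else value

print(nvl(None, -1))  # → -1
print(nvl(2500, -1))  # → 2500
```

In HiveQL this is typically used to replace NULLs in a column before aggregation or display.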
Posted by jerk on Thu, 30 Dec 2021 17:53:54 +0100
Hive compression and storage
1. Hadoop compression configuration
1.1 Compression codecs supported by MapReduce
To support a variety of compression/decompression algorithms, Hadoop introduces codecs (encoder/decoder classes), as shown in the following table:
Comparison of compression performance
performance comparison ...
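Gzip and bzip2 are among the algorithms Hadoop provides codecs for. A quick plain-Python comparison of their effect on a repetitive payload (using Python's standard modules, not Hadoop's codec classes; the sample data is made up):

```python
import gzip
import bz2

# Hypothetical sample payload: highly repetitive text compresses well.
data = b"hadoop mapreduce " * 1000

gz = gzip.compress(data)
bz = bz2.compress(data)

print(len(data))            # original size: 17000 bytes
print(len(gz) < len(data))  # → True
print(len(bz) < len(data))  # → True
```

The usual trade-off Hadoop documentation describes also holds here: bzip2 tends toward higher compression ratios at higher CPU cost, while gzip is faster; the right choice depends on whether the job is CPU-bound or I/O-bound.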
Posted by sparrrow on Thu, 30 Dec 2021 11:08:41 +0100