Spark Learning Notes - creation of RDD

By default, Spark divides a job into multiple tasks and sends them to Executor nodes for parallel computation. The number of tasks that can run in parallel is called the parallelism. This number can be specified when building the RDD. However, the number of tasks a job is split into is not necessarily equal to the number of tasks executed in paral ...

Posted by mona02 on Sun, 02 Jan 2022 09:42:25 +0100
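
The Spark excerpt above distinguishes the number of tasks a job is split into from the number that actually run at once. Below is a minimal pure-Python sketch of that idea. It does not use Spark itself: the helper names (slice_positions, split_into_partitions, waves_needed) are hypothetical, and the even-split rule is an assumption chosen for illustration. In PySpark the partition count is normally passed as the numSlices argument of SparkContext.parallelize.

```python
from typing import List, Sequence, Tuple


def slice_positions(length: int, num_slices: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs cutting `length` items into `num_slices` ranges."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    return [
        ((i * length) // num_slices, ((i + 1) * length) // num_slices)
        for i in range(num_slices)
    ]


def split_into_partitions(data: Sequence, num_slices: int) -> List[list]:
    """Split a local collection into partitions; each partition would become one task."""
    return [list(data[start:end]) for start, end in slice_positions(len(data), num_slices)]


def waves_needed(num_tasks: int, available_cores: int) -> int:
    """Rounds of execution needed when only `available_cores` tasks can run at once."""
    if available_cores < 1:
        raise ValueError("available_cores must be at least 1")
    return -(-num_tasks // available_cores)  # ceiling division


if __name__ == "__main__":
    partitions = split_into_partitions(list(range(10)), 4)
    print(partitions)
    # 4 tasks on 2 cores: only 2 run in parallel, so 2 rounds are needed
    print(waves_needed(len(partitions), 2))
```

With 4 partitions but only 2 cores, the job is split into 4 tasks while at most 2 execute in parallel, which is the gap the excerpt points at.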

Flink user e-commerce behavior analysis project

Analysis of e-commerce user behavior with Flink. 1. Real-time statistical analysis. 1.1 Popular product statistics. Requirement: every 5 minutes, display the top N most popular products on the website within the last hour. Output format: time window information; No. 1: product ID + view count 1; No. 2: Product ...

Posted by califdon on Sat, 01 Jan 2022 14:40:33 +0100
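
The requirement in the Flink entry above (top N products over the last hour, refreshed every 5 minutes) is a sliding-window aggregation. The following is a rough pure-Python sketch of that logic, not Flink code: the function name top_n_per_window, the (timestamp, product_id) event shape, the half-open window bounds, and the tie-break by product ID are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW_SIZE = 60 * 60  # window length: 1 hour, in seconds
SLIDE = 5 * 60         # a result is produced every 5 minutes


def top_n_per_window(
    events: Iterable[Tuple[int, str]], n: int
) -> List[Tuple[int, List[Tuple[str, int]]]]:
    """Compute the top-n viewed products for each sliding window.

    events: (timestamp_seconds, product_id) page views.
    Returns (window_end, [(product_id, views), ...]) for every non-empty window.
    """
    views = list(events)
    if not views:
        return []
    first_ts = min(ts for ts, _ in views)
    last_ts = max(ts for ts, _ in views)
    # First window end: the first slide boundary strictly after the earliest event
    window_end = (first_ts // SLIDE + 1) * SLIDE
    results = []
    # Keep sliding while the window [window_end - WINDOW_SIZE, window_end) can hold an event
    while window_end - WINDOW_SIZE <= last_ts:
        start = window_end - WINDOW_SIZE
        counts = Counter(pid for ts, pid in views if start <= ts < window_end)
        if counts:
            # Sort by view count descending, then by product ID for stable ties
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            results.append((window_end, ranked[:n]))
        window_end += SLIDE
    return results


if __name__ == "__main__":
    sample = [(0, "a"), (10, "a"), (20, "b"), (400, "b"), (410, "b"), (420, "c")]
    for end, ranking in top_n_per_window(sample, 2)[:2]:
        print(end, ranking)
```

Each event falls into 12 overlapping windows (60 minutes / 5 minutes), which is why a streaming engine usually pre-aggregates counts per window rather than rescanning all events as this sketch does.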

Analysis of Apache Avro data

Abstract: This article demonstrates how to generate Avro data by serialization and parse it using FlinkSQL. This article is shared from the Huawei Cloud community post "[Technology Sharing] Serialization and Deserialization of Apache Avro Data && Parsing Avro Data with FlinkSQL", author: third uncle of Nanpai. Technical background ...

Posted by learning_php_mysql on Sat, 01 Jan 2022 14:25:13 +0100

2021-12-30 the 58th step towards the program

Contents: 1. Introduction to Azkaban 2. System architecture of Azkaban 3. Installing Azkaban 3.1 Solo server installation 3.1.1 Introduction to solo server 3.1.2 Installation steps 3.2 Multi exec server installation 3.2.1 Node layout 3.2.2 Configure MySQL 3.2.3 Configure web server 3.2.4 Configure exec se ...

Posted by evolve4 on Sat, 01 Jan 2022 04:07:23 +0100

Bill data warehouse construction - data warehouse concept and data collection

1. Data warehouse concept. A data warehouse, abbreviated DW or DWH, is a strategic collection of data that supports all decision-making processes of an enterprise. Analyzing the data in a data warehouse can help enterprises improve business processes, control costs and improve product quality. Data warehouse is n ...

Posted by ddragas on Sat, 01 Jan 2022 01:39:50 +0100

Hive file storage format

Hive supports the following formats for storing data: TEXTFILE (row storage), SEQUENCEFILE (row storage), ORC (column storage) and PARQUET (column storage). 1. Column storage and row storage. The left side of the figure above is a logical table; the first layout on the right is row storage, and the second is column storage. The storage ...

Posted by Lol5916 on Fri, 31 Dec 2021 12:59:52 +0100

Call MapReduce to count the occurrences of each word in a file

Note: everything that needs to be installed and configured is covered in the reference materials at the end. 1. Upload the file to be analyzed (no fewer than 100,000 English words) to HDFS. demo.txt is the file to be analyzed. Start Hadoop. Upload the file to the input folder in HDFS. Confirm that the upload succeeded. 2. Call MapReduce to count the n ...

Posted by crazytoon on Fri, 31 Dec 2021 05:40:31 +0100
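
The word-count job in the entry above follows the usual map, shuffle, reduce pattern. The sketch below mimics those three phases in plain Python on in-memory lines. It is not Hadoop code: the function names and the tokenization rule (lowercased runs of letters and apostrophes) are assumptions for illustration only.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the line."""
    for word in re.findall(r"[A-Za-z']+", line.lower()):
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the values collected for one key."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hello world", "hello Hadoop"]))
```

In a real cluster the map and reduce phases run as separate tasks on different nodes and the shuffle moves data over the network; here everything runs in one process to keep the data flow visible.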

Network data transmission process in Flink

Background: This article follows the previous one, StreamTask data flow, in which we explained how the data within a task in each executor is read, transformed and written. In this article, we explain the data transmission process between executors, including Flink's shuffle implementation and the details of reading and writing data of InputChannel and Res ...

Posted by waterox on Thu, 30 Dec 2021 22:32:03 +0100

Hive common function summary

Functions. View all built-in functions: show functions; View how to use a function: desc function <function name>; Show detailed usage: desc function extended <function name>; UDF: one in, one out (measured in rows). UDAF: many in, one out. UDTF: one in, many out. UDF NVL: assigns a value to data whose value is NULL. Its format is NVL(value, default_value). Its function is to ...

Posted by jerk on Thu, 30 Dec 2021 17:53:54 +0100
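
For the NVL function mentioned in the Hive entry above, the behavior is: return the value unless it is NULL, in which case return the default. The snippet below is a small Python stand-in for that rule, using None in place of NULL. The function nvl here is a hypothetical helper for illustration, not part of any Hive client library.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def nvl(value: Optional[T], default_value: T) -> T:
    """Return `value` unless it is None (standing in for NULL), else `default_value`."""
    return default_value if value is None else value


if __name__ == "__main__":
    commissions = [500, None, 0]
    # Only the missing entry is replaced; 0 is a real value and is kept
    print([nvl(c, -1) for c in commissions])
```

Note that only a missing value triggers the default: an empty string or 0 is a real value and is returned unchanged.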

Hive compression and storage

1. Hadoop compression configuration. 1.1 Compression codecs supported by MR. To support a variety of compression/decompression algorithms, Hadoop introduces encoders/decoders (codecs), as shown in the following table: encoder Comparison of compression performance ...

Posted by sparrrow on Thu, 30 Dec 2021 11:08:41 +0100