Spark 2.4.2 source compilation

Software versions: JDK 1.8; Maven 3.6.1 (http://maven.apache.org/download.cgi); Spark 2.4.2 (https://archive.apache.org/dist/spark/spark-2.4.2/); Hadoop hadoop-2.6.0-cdh5.7.0 (the Hadoop version the Spark build targets; it does not need to be installed). To configure Maven: #Configure environment variables [root@hadoop004  ...

Posted by buddymoore on Wed, 20 Nov 2019 18:01:18 +0100

Traffic statistics of MapReduce

Contents: Preparation; Start Hadoop; POJO layer; Mapper layer; Reducer layer; Partition layer; Job layer; Package into a JAR and upload it to the server; Summary; Sort by total traffic; POJO layer; Mapper layer; Reducer layer; Partition layer; Job layer. Preparation: A virtual machine with ha ...

Posted by Hellomonkey on Tue, 19 Nov 2019 18:39:37 +0100

HBase custom MapReduce

Transferring HBase table data. In plain Hadoop, the MR jobs we wrote had two classes: Mapper and Reducer. With HBase, we instead extend two classes: TableMapper and TableReducer. Objective: migrate part of the data in the fruit table to the fruit_mr table through MR. Step 1. Build the ReadFruitMapper class to read the data in the fruit ta ...

Posted by brooky on Sun, 03 Nov 2019 19:16:43 +0100

Cluster construction of hadoop, spark, hive and azkaban under ubuntu

Tuesday, 08. October 2019 11:01 am. Initial preparation: 1. JDK installation. Do the following on all three machines (adjust for the number of machines you have): 1) Install the JDK through apt-get and run whereis java on the command line to find the Java installation path, or download the JDK installation package manually f ...

Posted by mattal999 on Sat, 02 Nov 2019 11:47:52 +0100

2. hdfs architecture

I. Overview of HDFS System Composition. HDFS is a distributed file system suited to write-once, read-many workloads. It contains the following roles: NameNode (nn): stores file metadata, such as file name, directory structure, and file attributes, as well as the block list of each file and DataNode ...

Posted by Nilanka on Mon, 14 Oct 2019 05:24:03 +0200

Could not flush and close the file system output stream

A Flink program that consumes Kafka data, running in Flink on YARN mode, had previously been deployed to the test and production environments and ran without problems. However, after the test environment was restarted and the program was redeployed, the following error was reported: 2019-07-01 15:19:25,984 INFO ...

Posted by foxden on Thu, 10 Oct 2019 06:09:17 +0200

Hive 2.3.0 Installation Notes

Preparation: complete the Hadoop installation; complete the MySQL installation. Download Hive: wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz Or download it from the official website. Unzip to the spe ...

Posted by ubuntu-user on Wed, 18 Sep 2019 13:23:26 +0200

0663-6.2.0 - Get CDSW login information through Nginx

Fayson's GitHub: https://github.com/fayson/cdh project Recommended WeChat official account: "Hadoop Practice", ID: gh_c4c535955d0f. 1 Document Writing Purpose. Task background: We need to record audit information for CDSW logins, such as when th ...

Posted by NiteCloak on Fri, 13 Sep 2019 07:40:47 +0200

MapReduce custom k, partition, and counter

1. Introductory case: WordCount. Requirement: count and output the total number of occurrences of each word in a given set of text files. 1. Data preparation. Create a new file: cd /export/servers, then vim wordcount.txt. Put the following ...
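
The WordCount requirement above can be sketched in plain Python as a map, shuffle, and reduce pipeline. This is only an illustration of the idea, not the post's Hadoop code: the function names are hypothetical, the sample lines are made up (the contents of wordcount.txt are not shown in the excerpt), and splitting on whitespace is an assumption about the data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for each word, as a Mapper would.

    Assumption: words are separated by whitespace.
    """
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the counts for one word, as a Reducer would."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over all input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not the file from the post.
    sample = ["hello world", "hello hadoop"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

In a real MapReduce job the grouping step is done by the framework between the Mapper and the Reducer; it is written out here only to make the data flow visible.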

Posted by Varma69 on Wed, 11 Sep 2019 13:34:17 +0200

Hive format for storing and reading files

Hive stores files in the following formats: TEXTFILE, SEQUENCEFILE, RCFILE, and ORCFILE (since 0.11). TEXTFILE is the default format and is used when no format is specified at table creation. When data is imported, the data files are copied directly ...

Posted by MatrixGL on Fri, 06 Sep 2019 04:28:02 +0200