Spark 2.4.2 source compilation
Software versions:
JDK: 1.8
Maven: 3.6.1 http://maven.apache.org/download.cgi
Spark: 2.4.2 https://archive.apache.org/dist/spark/spark-2.4.2/
Hadoop version: hadoop-2.6.0-cdh5.7.0 (the Hadoop version Spark is compiled against; it does not need to be installed)
Configure Maven:
#Configure environment variables
[root@hadoop004 ...
Posted by buddymoore on Wed, 20 Nov 2019 18:01:18 +0100
Traffic statistics of MapReduce
Preparation
Start Hadoop
POJO layer
Mapper layer
Reducer layer
Partitioner layer
Job layer
Package it as a jar and upload it to the server
Summary
Sort by total flow
POJO layer
Mapper layer
Reducer layer
Partitioner layer
Job layer
Preparation
A virtual machine with ha ...
Posted by Hellomonkey on Tue, 19 Nov 2019 18:39:37 +0100
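The outline above aggregates traffic per key, partitions it, and then sorts by total flow. Below is a minimal plain-Python sketch of that pipeline. It simulates the phases rather than using Hadoop's Java API, and everything concrete in it is an assumption for illustration, not taken from the post: the record layout (`phone up down`), the `FlowBean` field names, the sample data, and the prefix-based partition rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowBean:
    """Hypothetical POJO equivalent: upstream and downstream flow for one phone."""
    up_flow: int
    down_flow: int

    @property
    def sum_flow(self) -> int:
        # Total flow is derived from the two directions.
        return self.up_flow + self.down_flow


def map_phase(lines):
    """Mapper: parse assumed 'phone up down' lines into (phone, FlowBean) pairs."""
    for line in lines:
        phone, up, down = line.split()
        yield phone, FlowBean(int(up), int(down))


def reduce_phase(pairs):
    """Reducer: add up upstream and downstream flow per phone."""
    totals = defaultdict(lambda: FlowBean(0, 0))
    for phone, bean in pairs:
        acc = totals[phone]
        acc.up_flow += bean.up_flow
        acc.down_flow += bean.down_flow
    return dict(totals)


def partition(phone: str, num_partitions: int) -> int:
    """Partitioner (assumed rule): route by the first three digits, others to the last partition."""
    prefix_map = {"135": 0, "136": 1, "137": 2}
    return prefix_map.get(phone[:3], num_partitions - 1)


def sort_by_total_flow(totals):
    """Second job: order phones by total flow, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1].sum_flow, reverse=True)


if __name__ == "__main__":
    records = [
        "13512345678 100 200",
        "13687654321 50 50",
        "13512345678 300 400",
    ]
    totals = reduce_phase(map_phase(records))
    for phone, bean in sort_by_total_flow(totals):
        print(phone, bean.up_flow, bean.down_flow, bean.sum_flow, partition(phone, 4))
```

In a real job, the sort step usually works by making the flow bean the map output key so that the shuffle does the ordering. Here it is simply `sorted` over the reduced totals.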
Custom MapReduce in HBase
Migrating HBase table data
In plain Hadoop, the MR jobs we wrote extend two classes: Mapper and Reducer. In HBase, we extend TableMapper and TableReducer instead.
Objective: use MR to migrate part of the data in the fruit table to the fruit_mr table.
Step 1. Build the ReadFruitMapper class to read the data in the fruit ta ...
Posted by brooky on Sun, 03 Nov 2019 19:16:43 +0100
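The migration described above reads rows from `fruit`, keeps part of each row, and writes the result to `fruit_mr`. Here is a minimal in-memory sketch of that data flow in plain Python, not the HBase Java API. Tables are modeled as dicts of row key to `{"family:qualifier": value}`. The choice of `info:name` and `info:color` as the migrated columns, and the sample rows, are hypothetical.

```python
# Hypothetical subset of columns to migrate from fruit to fruit_mr.
KEEP_COLUMNS = {"info:name", "info:color"}


def read_fruit_mapper(table):
    """Map: for each row, build a 'put' holding only the selected columns."""
    for row_key, cells in table.items():
        put = {col: val for col, val in cells.items() if col in KEEP_COLUMNS}
        if put:  # skip rows that have nothing to migrate
            yield row_key, put


def write_fruit_reducer(pairs, target):
    """Reduce: write each put into the target table, merging cells per row key."""
    for row_key, put in pairs:
        target.setdefault(row_key, {}).update(put)
    return target


if __name__ == "__main__":
    fruit = {
        "1001": {"info:name": "Apple", "info:color": "Red", "info:price": "5"},
        "1002": {"info:price": "3"},
    }
    print(write_fruit_reducer(read_fruit_mapper(fruit), {}))
```

In HBase itself, the mapper would typically emit the row key with a `Put`, and the reducer would pass the puts on to the output table.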
Building a Hadoop, Spark, Hive and Azkaban cluster on Ubuntu
Tuesday, 08. October 2019 11:01 am
Initial preparation:
1. JDK installation
Do the following on all three machines (or however many machines you have):
1) You can install the JDK through apt-get and run whereis java on the command line to find the Java installation path, or download the JDK installation package manually f ...
Posted by mattal999 on Sat, 02 Nov 2019 11:47:52 +0100
2. HDFS architecture
I. Overview of the HDFS System Components
HDFS is a distributed file system suited to write-once, read-many scenarios. It contains the following roles:
NameNode (nn):
Stores file metadata, such as the file name, directory structure and file attributes, as well as each file's block list and DataNode ...
Posted by Nilanka on Mon, 14 Oct 2019 05:24:03 +0200
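To make the NameNode's role above concrete, here is a toy in-memory model of its metadata: a namespace mapping each file to its block list, and each block to the DataNodes holding it. This is a simplified sketch, not how HDFS is implemented. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). It also places replicas round-robin, whereas real HDFS uses a rack-aware placement policy. The DataNode names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from itertools import cycle

BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default dfs.blocksize
REPLICATION = 3                 # default dfs.replication


@dataclass
class FileMeta:
    path: str
    size: int
    blocks: list = field(default_factory=list)  # list of (block_id, [datanode, ...])


class ToyNameNode:
    """Holds metadata only: file -> blocks -> DataNode locations. No file data is stored."""

    def __init__(self, datanodes):
        if not datanodes:
            raise ValueError("need at least one DataNode")
        self.datanodes = list(datanodes)
        self._next_dn = cycle(range(len(self.datanodes)))  # round-robin start node
        self._next_block_id = 0
        self.namespace = {}

    def create(self, path, size):
        # This toy model simply rejects re-creating an existing path.
        if path in self.namespace:
            raise FileExistsError(path)
        meta = FileMeta(path, size)
        num_blocks = math.ceil(size / BLOCK_SIZE)  # an empty file has no blocks
        replicas = min(REPLICATION, len(self.datanodes))
        for _ in range(num_blocks):
            start = next(self._next_dn)
            nodes = [self.datanodes[(start + i) % len(self.datanodes)] for i in range(replicas)]
            meta.blocks.append((self._next_block_id, nodes))
            self._next_block_id += 1
        self.namespace[path] = meta
        return meta

    def block_locations(self, path):
        """What a client asks the NameNode for before reading from DataNodes."""
        return self.namespace[path].blocks


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    for block_id, nodes in nn.create("/data/a.log", 300 * 1024 * 1024).blocks:
        print(block_id, nodes)
```

With these assumptions, a 300 MB file needs three blocks (128 + 128 + 44 MB), and each block is listed on three DataNodes.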
Could not flush and close the file system output stream
A Flink program that consumes Kafka data, running in Flink on YARN mode, had previously been deployed to both the test and production environments and ran without any problems. However, after the test environment was restarted and the program was redeployed, it reported the following error:
2019-07-01 15:19:25,984 INFO ...
Posted by foxden on Thu, 10 Oct 2019 06:09:17 +0200
Hive 2.3.0 Installation Notes
Preparation
Complete the Hadoop installation
Complete the MySQL installation
Download Hive
wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz
Or download it from the official website.
Unzip to the spe ...
Posted by ubuntu-user on Wed, 18 Sep 2019 13:23:26 +0200
0663-6.2.0 - Get CDSW login information through Nginx
Fayson's github: https://github.com/fayson/cdh project
Recommended WeChat official account: "Hadoop Practice", ID: gh_c4c535955d0f
1 Purpose of This Document
Background: We need to record audit information for CDSW logins, such as when th ...
Posted by NiteCloak on Fri, 13 Sep 2019 07:40:47 +0200
MapReduce custom k, partition, and counter
1. Introductory case: WordCount
Requirement: count and output the total number of occurrences of each word across a given set of text files
1. Prepare the data
Create a new file
cd /export/servers
vim wordcount.txt
Put the following ...
Posted by Varma69 on Wed, 11 Sep 2019 13:34:17 +0200
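The WordCount requirement above breaks down into three phases: map each line to `(word, 1)` pairs, shuffle so that all values for a word are grouped together, and reduce by summing. The following plain-Python sketch simulates those phases in one process to show the data flow. It is not Hadoop's Java API, and it assumes words are separated by whitespace.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map: emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: sort by key and group values, as the framework does between map and reduce."""
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, [count for _, count in group]


def reducer(word, counts):
    """Reduce: sum the counts for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["hello world", "hello hadoop"]))
```

For the two sample lines, `hello` is counted twice and `world` and `hadoop` once each.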
Hive formats for storing and reading files
Hive can store files in the following formats:
TEXTFILE
SEQUENCEFILE
RCFILE
ORCFILE (since 0.11)
TEXTFILE is the default format and is used when no format is specified at table creation. When data is imported, data files are copied directly ...
Posted by MatrixGL on Fri, 06 Sep 2019 04:28:02 +0200