Building a Hadoop cluster with CentOS 7
Notes:
1. Prepare three virtual machines; see: Building a CentOS 7 cluster environment (using 3 machines as an example)
2. Configure passwordless SSH login between the virtual machines (a quick sketch follows this list); see: CentOS 7 cluster passwordless login configuration
3. Install jd ...
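Since the excerpt stops before the details, here is a minimal sketch of the passwordless-login step; hostnames are placeholders, not taken from the referenced article:
# generate a key pair without a passphrase (run on each node)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# push the public key to every node in the cluster, itself included
for host in node1 node2 node3; do
  ssh-copy-id "$host"
done
# verify: this should log in without prompting for a password
ssh node2 hostname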
Posted by defx on Sun, 12 Dec 2021 16:58:54 +0100
Hadoop HDFS: folder creation, file upload, file download, folder deletion, file renaming, file details, and file type judgment (folder or file)
Abstract: This article introduces the use of the basic Hadoop HDFS API, covering the Windows-side dependency configuration and the Maven dependency configuration, and then the hands-on part: obtaining a connection to a remote Hadoop HDFS instance and running a series of operations against it, including folder creation, file upload, file download, f ...
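The article works through these operations via the Java API; as a quick cross-reference, the same operations are available from the HDFS shell (a sketch, with all paths as placeholders):
hdfs dfs -mkdir -p /demo/input                          # folder creation
hdfs dfs -put local.txt /demo/input/                    # file upload
hdfs dfs -get /demo/input/local.txt .                   # file download
hdfs dfs -mv /demo/input/local.txt /demo/renamed.txt    # file renaming
hdfs dfs -ls /demo                                      # file details (size, owner, ...)
hdfs dfs -test -d /demo && echo "is a folder"           # file type judgment
hdfs dfs -rm -r /demo/input                             # folder deletion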
Posted by The_Walrus on Thu, 09 Dec 2021 05:10:08 +0100
Senior Big Data Development Engineer - Hive learning notes
Hive advanced chapter
Using Hive
Hive bucketed tables
1. The principle of bucketed tables
Bucketing is a finer-grained division of data than partitioning: a Hive table, or a partition of a table, can be further divided into buckets. To assign a bucket, a hash value is computed for each row according to one column, and that value determines which bucket th ...
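To make the principle concrete, a minimal HiveQL sketch of a bucketed table (table and column names are made up; the SET flag applies to older Hive versions):
hive -e "
SET hive.enforce.bucketing = true;
CREATE TABLE student_bucketed (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- on insert, each row is routed to bucket hash(id) % 4
INSERT OVERWRITE TABLE student_bucketed SELECT id, name FROM student;
"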
Posted by luisluis on Wed, 08 Dec 2021 08:35:11 +0100
Hadoop Source Analysis
2021SC@SDUSC
A brief introduction to the research
Last week, we analyzed the classes MarkableIterator, OutputCommitter, and OutputFormat, together with their subclasses, and learned more about custom format output. We will continue our analysis here, starting with org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE>.
org.apache.hadoop.mapre ...
Posted by paulrichards19 on Tue, 07 Dec 2021 18:55:08 +0100
Sqoop installation and use
Sqoop installation
1. Upload and unzip
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/soft/
2. Rename the folder
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
3. Modify the configuration file
# Switch to the Sqoop configuration directory
cd /usr/local/soft/sqoop-1.4.7/conf
# Copy the configuration file template and rename it
cp sqoop-env ...
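The truncated copy step is presumably the stock template rename; a sketch of how the configuration usually continues (paths are placeholders for this cluster's layout, not from the article):
cp sqoop-env-template.sh sqoop-env.sh
# in sqoop-env.sh, point Sqoop at the local Hadoop/Hive installations
export HADOOP_COMMON_HOME=/usr/local/soft/hadoop
export HADOOP_MAPRED_HOME=/usr/local/soft/hadoop
export HIVE_HOME=/usr/local/soft/hive
Once configured, a typical use is importing a relational table into HDFS (the connection details below are made up):
sqoop import \
  --connect jdbc:mysql://master:3306/testdb \
  --username root \
  --password 123456 \
  --table student \
  --target-dir /sqoop/student \
  --num-mappers 1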
Posted by MaxBodine on Mon, 06 Dec 2021 23:04:46 +0100
Hive UDF (user-defined functions): getting started
1. Introduction
Hive has three types of UDFs: the (normal) UDF, the user-defined aggregate function (UDAF), and the user-defined table-generating function (UDTF).
UDF: the operation acts on a single data row and produces a single data row as output. Most functions, such as mathematical and string functions, fall into this category.
UDAF: accepts multiple input d ...
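The implementation itself is in the truncated part of the article; for orientation, this is how a compiled UDF jar is typically registered and called from Hive (jar path, class name, and function name are hypothetical):
hive -e "
ADD JAR /usr/local/soft/jars/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.hive.MyUpperUDF';
SELECT my_upper(name) FROM student LIMIT 10;
"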
Posted by ublapach on Fri, 03 Dec 2021 05:59:54 +0100
Hive code analysis report: semantic analysis ⑤
2021SC@SDUSC
Contents
Overview
Supplementary description of doPhase1()
Analysis of getMetaData(QB, ReadEntity)
Summary
In the last article, I analyzed the doPhase1() function, which is the initial stage of semantic analysis; the ultimate goal of this stage is to load the AST data into the QB object. The main idea of doPhase1 here is to recursively tra ...
Posted by clewis4343 on Wed, 01 Dec 2021 21:52:13 +0100
Big data offline processing project: website log file data collection, log splitting, collection to HDFS, and preprocessing
Introduction:
This article covers the first stage of a big data offline data processing project: data collection.
Main contents:
1) Use Flume to collect website log file data into access.log
2) Write a shell script that splits the collected log file (otherwise access.log grows too large) and renames it by date as access_MM/dd/yyyy.log; a sketch follows this list.   ...
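A minimal sketch of the splitting script from step 2 (the directory, the date format, and the nginx reopen signal are assumptions, not the article's actual code):
#!/bin/bash
LOG_DIR=/var/log/website
STAMP=$(date +%m-%d-%Y)
# move the live log aside under a dated name so it stays a manageable size
mv "$LOG_DIR/access.log" "$LOG_DIR/access_${STAMP}.log"
# if nginx writes the log, ask it to reopen a fresh access.log
kill -USR1 "$(cat /var/run/nginx.pid)"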
Posted by erth on Tue, 30 Nov 2021 12:59:03 +0100
Scala: lists, tuples, sets, and related knowledge
1. Array
1.1 Overview
An array is a container used to store multiple elements of the same type. Each element has a number (also called a subscript or index), and numbering starts from 0. In Scala there are two kinds of arrays: fixed-length arrays and variable-length arrays.
1.2 Fixed-length arrays
1.2.1 Features ...
Posted by soupy127 on Mon, 29 Nov 2021 23:32:31 +0100
MapReduce core design -- Hadoop RPC framework
Hadoop RPC is divided into four layers:
Serialization layer: converts structured objects into byte streams so they can be transmitted over the network or written to persistent storage; in the RPC framework it is mainly used to turn the parameters or responses in user requests into byte streams for cross-machine transmission.
Function call layer: locates the fu ...
Posted by DuNuNuBatman on Mon, 29 Nov 2021 22:37:21 +0100