Building a Hadoop cluster with CentOS 7

Overview: 1. Prepare three virtual machines; refer to: Building a CentOS 7 cluster environment (taking 3 machines as an example). 2. Configure passwordless login between the virtual machines; refer to: CentOS 7 cluster passwordless login configuration. 3. Install jd ...

Posted by defx on Sun, 12 Dec 2021 16:58:54 +0100

Hadoop HDFS: folder creation, file upload, file download, folder deletion, file renaming, file details, and file type checking (folder or file)

Abstract: This article introduces the basic API of Hadoop HDFS, covering Windows-side dependency configuration and Maven dependency configuration, followed by hands-on operations: obtaining a connection to a remote Hadoop HDFS instance and performing a series of operations on it, including folder creation, file upload, file download, f ...
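A minimal Scala sketch of the FileSystem operations the excerpt lists; the NameNode URI, user name, and paths are placeholder assumptions, not taken from the article:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsOps {
  def main(args: Array[String]): Unit = {
    // Connect to a remote HDFS instance; URI and user are placeholders.
    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration(), "hadoop")

    fs.mkdirs(new Path("/demo"))                                          // folder creation
    fs.copyFromLocalFile(new Path("local.txt"), new Path("/demo/a.txt"))  // file upload
    fs.copyToLocalFile(new Path("/demo/a.txt"), new Path("copy.txt"))     // file download
    fs.rename(new Path("/demo/a.txt"), new Path("/demo/b.txt"))           // file renaming

    // File details and type check (folder or file).
    fs.listStatus(new Path("/demo")).foreach { st =>
      println(s"${st.getPath.getName} len=${st.getLen} isFile=${st.isFile}")
    }

    fs.delete(new Path("/demo"), true)                                    // recursive folder deletion
    fs.close()
  }
}
```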

Posted by The_Walrus on Thu, 09 Dec 2021 05:10:08 +0100

Senior Big Data Development Engineer - Hive learning notes

Hive advanced chapter: using Hive. Hive's bucketed tables. 1. Principle of bucketed tables: bucketing is a more fine-grained division than partitioning. A Hive table or a partition can be further divided into buckets: Hive takes a hash value over the data according to a chosen column and uses it to determine which bucket th ...
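As a sketch of the principle the excerpt describes, bucket assignment hashes the bucketing column and takes the result modulo the bucket count. This hypothetical Scala illustration only shows the rule; Hive's internal implementation differs in detail:

```scala
// Bucketing rule: hash the bucket column, mod the number of buckets.
def bucketFor(columnValue: String, numBuckets: Int): Int =
  (columnValue.hashCode & Int.MaxValue) % numBuckets

// Rows with the same column value always land in the same bucket.
val ids = Seq("user_1", "user_2", "user_3", "user_1")
ids.foreach(id => println(s"$id -> bucket ${bucketFor(id, 4)}"))
```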

Posted by luisluis on Wed, 08 Dec 2021 08:35:11 +0100

Hadoop Source Code Analysis

2021SC@SDUSC. A brief introduction to the research: last week we analyzed MarkableIterator and the OutputCommitter and OutputFormat classes along with their subclasses, and learned more about custom-format output. We continue the analysis here, starting with org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE> org.apache.hadoop.mapre ...
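To accompany the class the excerpt turns to, here is a minimal custom Partitioner sketch in Scala; the key/value types and the partitioning rule are illustrative assumptions:

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Route keys to reducers by the hash of their first character.
// getPartition must return a value in [0, numPartitions).
class FirstCharPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val s = key.toString
    val c = if (s.isEmpty) ' ' else s.charAt(0)
    (c.hashCode & Int.MaxValue) % numPartitions
  }
}
// Registered on a job with job.setPartitionerClass(classOf[FirstCharPartitioner]).
```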

Posted by paulrichards19 on Tue, 07 Dec 2021 18:55:08 +0100

Sqoop installation and use

Sqoop installation and use. Sqoop installation: 1. Upload and unzip: tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/soft/ 2. Rename the folder: mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7 3. Modify the configuration file: # Switch to the sqoop configuration directory cd /usr/local/soft/sqoop-1.4.7/conf # Copy the configuration file and rename it cp sqoop-env ...

Posted by MaxBodine on Mon, 06 Dec 2021 23:04:46 +0100

Hive UDF (user-defined functions): getting started

1. Introduction. Hive has three types of UDFs: (normal) UDF, user-defined aggregate function (UDAF), and user-defined table-generating function (UDTF). UDF: the operation acts on a single data row and produces a data row as output; most functions, such as mathematical and string functions, fall into this category. UDAF: accepts multiple input d ...
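A minimal sketch of the (normal) single-row-in, single-row-out kind of UDF the excerpt describes, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name is an illustrative assumption:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// A (normal) UDF: one input row in, one output row out.
// Hive discovers evaluate() by reflection on the compiled class.
class ToUpperUdf extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null else new Text(input.toString.toUpperCase)
}
```

Once packaged into a jar, such a class would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being called in a query.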

Posted by ublapach on Fri, 03 Dec 2021 05:59:54 +0100

Hive code analysis report: semantic analysis ⑤

2021SC@SDUSC. Contents: summary; supplementary description; doPhase1(); getMetaData(QB, ReadEntity) analysis; summary. In the last article I analyzed the doPhase1() function, which is the initial stage of semantic analysis. The final goal of the program is to load the AST data into the QB. The main idea of doPhase1 at this stage is to recursively tra ...

Posted by clewis4343 on Wed, 01 Dec 2021 21:52:13 +0100

Big data offline processing project: website log file collection, log splitting, ingestion into HDFS, and preprocessing

Introduction: This article covers the first stage of a big data offline processing project: data collection. Main contents: 1) Use Flume to collect website log file data into access.log. 2) Write a shell script to split the collected log file (otherwise access.log grows too large) and rename the pieces to access_MM/dd/yyyy.log. ...
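The shell script itself is elided in the excerpt; as a hedged Scala sketch of the follow-on HDFS-ingestion step, where the file paths, NameNode URI, user, and date pattern are all assumptions:

```scala
import java.net.URI
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object UploadRolledLog {
  def main(args: Array[String]): Unit = {
    // Date-stamp yesterday's rolled log; the exact pattern is an assumption
    // (hyphens used here since slashes would create HDFS subdirectories).
    val stamp = LocalDate.now().minusDays(1).format(DateTimeFormatter.ofPattern("MM-dd-yyyy"))
    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration(), "hadoop")
    // Push the split, renamed log into HDFS for the preprocessing stage.
    fs.copyFromLocalFile(new Path(s"/data/logs/access_$stamp.log"),
                         new Path(s"/weblog/input/access_$stamp.log"))
    fs.close()
  }
}
```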

Posted by erth on Tue, 30 Nov 2021 12:59:03 +0100

Scala ---- lists, tuples, sets, and related knowledge

1. Arrays 1.1 Overview: An array is a container used to store multiple elements of the same type. Each element has a number (also known as a subscript or index), and numbering starts from 0. In Scala there are two kinds of arrays: fixed-length arrays and variable-length arrays. 1.2 Fixed-length arrays 1.2.1 Feature ...
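A short sketch of the two array kinds the excerpt distinguishes:

```scala
// Fixed-length array: size is set at creation; elements can be replaced in place.
val fixed = new Array[Int](3)   // Array(0, 0, 0)
fixed(0) = 10                   // subscripts start from 0

// Variable-length array: can grow and shrink after creation.
import scala.collection.mutable.ArrayBuffer
val variable = ArrayBuffer(1, 2, 3)
variable += 4                   // append an element
variable.remove(0)              // delete the element at index 0

println(fixed.mkString(","))    // 10,0,0
println(variable.mkString(",")) // 2,3,4
```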

Posted by soupy127 on Mon, 29 Nov 2021 23:32:31 +0100

MapReduce core design -- Hadoop RPC framework

Hadoop RPC is divided into four parts. Serialization layer: converts structured objects into byte streams for transmission over the network or for writing to persistent storage; within the RPC framework it is mainly used to convert the parameters or responses of user requests into byte streams for cross-machine transmission. Function call layer: locates the fu ...
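As a sketch of the serialization layer the excerpt describes, here is a minimal Hadoop Writable in Scala; the record's fields are illustrative assumptions:

```scala
import java.io.{DataInput, DataOutput}
import org.apache.hadoop.io.Writable

// Serialization layer: a structured object that Hadoop RPC can turn into a
// byte stream for cross-machine transmission and rebuild on the other side.
class RequestRecord(var user: String, var count: Int) extends Writable {
  def this() = this("", 0)  // no-arg constructor, required for reflection

  override def write(out: DataOutput): Unit = {
    out.writeUTF(user)
    out.writeInt(count)
  }

  override def readFields(in: DataInput): Unit = {
    user = in.readUTF()
    count = in.readInt()
  }
}
```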

Posted by DuNuNuBatman on Mon, 29 Nov 2021 22:37:21 +0100