Hadoop [Page 17] - Programmer Think - where programmers share thinking

Spark 2.4.2 source compilation

Software versions: jdk: 1.8; maven: 3.6.1 (http://maven.apache.org/download.cgi); spark: 2.4.2 (https://archive.apache.org/dist/spark/spark-2.4.2/); Hadoop: hadoop-2.6.0-cdh5.7.0 (the Hadoop version the Spark build targets; it does not need to be installed). To configure maven: #Configure environment variables [root@hadoop004 ...

Posted by buddymoore on Wed, 20 Nov 2019 18:01:18 +0100

Traffic statistics of MapReduce

Contents: Preparation; Start Hadoop; POJO layer; Mapper layer; Reducer layer; Partitioner layer; Job layer; Package into a jar and upload it to the server; Summary; Sort by total traffic: POJO layer; Mapper layer; Reducer layer; Partitioner layer; Job layer. Preparation: a virtual machine with ha ...

Posted by Hellomonkey on Tue, 19 Nov 2019 18:39:37 +0100

HBase custom MapReduce

Transferring HBase table data. In the Hadoop stage, the MR jobs we wrote had two classes: Mapper and Reducer. With HBase, we instead extend two classes: TableMapper and TableReducer. Objective: migrate part of the data in the fruit table to the fruit_mr table through MR. Step 1: build the ReadFruitMapper class to read the data in the fruit ta ...

Posted by brooky on Sun, 03 Nov 2019 19:16:43 +0100

Cluster construction of hadoop, spark, hive and azkaban under ubuntu

Tuesday, 08. October 2019 11:01 am. Initial preparation: 1. JDK installation. Do the following on all three machines (adjust to the number of machines you have): 1) you can install the JDK through apt-get and run whereis java on the command line to get the Java installation path, or download the JDK installation package manually f ...

Posted by mattal999 on Sat, 02 Nov 2019 11:47:52 +0100

2. hdfs architecture

I. Overview of HDFS system composition. HDFS is a distributed file system suited to write-once, read-many scenarios. It contains the following roles: NameNode (nn): stores file metadata, such as file name, file directory structure, and file attributes, as well as the block list of each file and DataNode ...

Posted by Nilanka on Mon, 14 Oct 2019 05:24:03 +0200

Could not flush and close the file system output stream

A Flink program that consumes Kafka data, running in Flink on YARN mode, had previously been released to the test and production environments and ran normally without problems. However, after the test environment was restarted and the program was redeployed, the following error was reported: 2019-07-01 15:19:25,984 INFO ...

Posted by foxden on Thu, 10 Oct 2019 06:09:17 +0200
