Exceptions and solutions when Hadoop runs a MapReduce task

Exception code description: having just started working with Hadoop and MapReduce, I keep running into problems; the following records the issues that kept me stuck for a day, together with their solutions. 1. Execute the MapReduce task: hadoop jar wc.jar hejie.zheng.mapreduce.wordcount2.WordCountDriver /input /output 2. An exception is thrown ...
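For orientation, a driver like the hejie.zheng.mapreduce.wordcount2.WordCountDriver invoked above usually looks roughly like the sketch below (a generic word-count driver assumed for illustration, including the WordCountMapper/WordCountReducer class names; it is not the article's actual code):

// Minimal MapReduce driver sketch (assumes WordCountMapper/WordCountReducer exist in the same package).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);           // lets Hadoop locate the jar
        job.setMapperClass(WordCountMapper.class);           // map: line -> (word, 1)
        job.setReducerClass(WordCountReducer.class);         // reduce: (word, [1,1,..]) -> (word, n)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /input
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /output (must not already exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

One frequent cause of failures when re-running such a command is that the output directory (/output here) is left over from a previous run, although the specific exception discussed in the article is not reproduced in this excerpt.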

Posted by burzvingion on Thu, 03 Feb 2022 14:16:53 +0100

Hadoop distributed cluster installation

Pseudo-distributed cluster setup is covered in: https://blog.csdn.net/weixin_40612128/article/details/119008295?spm=1001.2014.3001.5501 After the pseudo-distributed cluster is completed, let's take a look at what a real distributed cluster looks like. Take a look at this figure: it shows three nodes. The one on the left is the master node and the t ...

Posted by verlen on Wed, 02 Feb 2022 19:57:20 +0100

Migrating CM and the database (Part 2)

9.1 Migrate the data of the original CM node to the new node. 9.1.1 Back up the original CM node's data. This mainly backs up CM's monitoring data and management information. The data directories include: /var/lib/cloudera-host-monitor /var/lib/cloudera-service-monitor /var/lib/cloudera-scm-server /var/lib/cloudera-scm-eventserver /var/lib/clou ...

Posted by cidesign on Wed, 02 Feb 2022 07:50:51 +0100

Implementing e-commerce website user operation analysis with Hive in a Hadoop environment

1. Analysis targets: 1) establish a daily operations metrics system; 2) analyze the composition of existing users and count the daily user composition; 3) analyze users' repurchase behavior to guide subsequent repurchase operations. 2. Data description: User_info, the user information table; user_action, the user behavior table. 3. Imp ...

Posted by gljaber on Wed, 02 Feb 2022 06:13:48 +0100

The road to Spark mastery - a detailed explanation of RDD creation

3.2 RDD programming. In Spark, an RDD is represented as an object, and RDDs are transformed through method calls on that object. After defining an RDD through a series of transformations, you can call actions to trigger its computation. Actions either return results to the application (count, collect, etc.) or save data to the storage system ...
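As a rough illustration of the transformation-versus-action distinction described above, here is a minimal sketch using the Java Spark API (application name, data, and output path are made up for the example):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDCreationSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-creation").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 1) Create an RDD from an in-memory collection.
        JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // 2) Transformations are lazy: nothing is computed yet.
        JavaRDD<Integer> squares = nums.map(x -> x * x);

        // 3) Actions trigger the actual computation.
        long count = squares.count();                 // returns a result to the driver
        System.out.println("count = " + count);
        squares.saveAsTextFile("/tmp/squares-out");   // or persist to storage (illustrative path)

        sc.stop();
    }
}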

Posted by Matty999555 on Tue, 01 Feb 2022 16:47:04 +0100

Hadoop 2.6.5 Mapper class source code analysis

Mapper class

// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)

package org.apache.hadoop.mapreduce;

import java.io.IOException;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Stable;

@Public
@Stable
public cla ...
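To show how this class is normally used, here is a hedged sketch of a typical Mapper subclass (a standard word-count style mapper, not taken from the article):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: the framework calls map() once per input record.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of the line; value is the line text.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }
}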

Posted by warydig on Mon, 31 Jan 2022 03:11:39 +0100

Using UDFs in Hive and Impala

11.1 Introduction to the experimental environment: the cluster environment is running normally; Hive and Impala services are installed in the cluster; operating system: RedHat 6.5; CDH and CM versions are 5.11.1; an EC2 user with sudo permission is used for the operations. 11.2 UDF function development using the IntelliJ tool. Use IntelliJ to develop Hive's ...
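For a sense of what such a UDF looks like, here is a minimal sketch of a Hive UDF in Java (a simple to-uppercase function with an assumed class name; the article's own UDF and its IntelliJ build setup may differ):

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple Hive UDF: returns the input string in upper case.
// Once built into a jar, it can be registered in Hive with, for example:
//   ADD JAR /path/to/udf.jar;
//   CREATE TEMPORARY FUNCTION to_upper AS 'ToUpperUDF';
// (Impala registers Java UDFs with its own CREATE FUNCTION syntax.)
public class ToUpperUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}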

Posted by Design on Sun, 30 Jan 2022 23:08:38 +0100

The big data learning path: Hadoop

1. Introduction to big data 1.1 The big data concept: big data refers to data sets that cannot be captured, managed, and processed by conventional software tools within a reasonable time frame. It is a massive, fast-growing, and diversified information asset that requires new processing modes to deliver stronger decision-making power, insight an ...

Posted by monkuar on Sat, 29 Jan 2022 15:27:44 +0100

Python big data processing library PySpark: hands-on summary III

Shared variables: broadcast variables. Broadcast variables allow a program to cache a read-only variable on each machine in the cluster instead of shipping a copy with every task. With broadcast variables you can share data, such as a global configuration file, more efficiently. from pyspark.sql import SparkSession spark = SparkSe ...
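The article's example uses PySpark; to keep the code samples in this list in a single language, the same broadcast-variable idea is sketched below with the Java Spark API (the configuration map and names are purely illustrative):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("broadcast-demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // A small read-only "global configuration" cached once per executor,
        // instead of being shipped with each task.
        Map<String, String> config = new HashMap<>();
        config.put("country", "CN");
        Broadcast<Map<String, String>> bcConfig = sc.broadcast(config);

        JavaRDD<String> users = sc.parallelize(Arrays.asList("alice", "bob"));
        JavaRDD<String> tagged = users.map(u -> u + "@" + bcConfig.value().get("country"));
        tagged.collect().forEach(System.out::println);

        sc.stop();
    }
}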

Posted by cunoodle2 on Sat, 29 Jan 2022 14:37:23 +0100

zeppelin installation and configuration, and a summary of all command failures

zeppelin installation and configuration. Prerequisite: before installing zeppelin, make sure your Hadoop and Hive are running. Oh, this is very important!!! Step 1: put the downloaded compressed package under the /opt/download/hadoop directory (I put it here; you can choose your own location): cd /opt/download/hadoop ls Then just drag it in. Step ...

Posted by ChaosKnight on Fri, 28 Jan 2022 05:00:43 +0100