Exceptions and solutions when Hadoop runs a MapReduce task
Exception code description
Having only just started working with Hadoop, I still run into MapReduce problems from time to time. The notes below record a problem that kept me stuck for a whole day, together with its solution.
1. Execute MapReduce task
hadoop jar wc.jar hejie.zheng.mapreduce.wordcount2.WordCountDriver /input /output
2. Jump out of exception ...
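The job being run above is the classic WordCount. As a plain-Python sketch (not the article's Java `WordCountDriver`) of what the map and reduce phases compute:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle + reduce: sum the 1s for each distinct word.
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

print(reduce_phase(map_phase(["hello hadoop", "hello world"])))
# -> {'hello': 2, 'hadoop': 1, 'world': 1}
```

In the real job, the map and reduce phases run as separate distributed tasks and the framework performs the shuffle between them.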
Posted by burzvingion on Thu, 03 Feb 2022 14:16:53 +0100
Hadoop distributed cluster installation
Pseudo-distributed cluster setup is covered in:
https://blog.csdn.net/weixin_40612128/article/details/119008295?spm=1001.2014.3001.5501
With the pseudo-distributed cluster done, let's take a look at what a real distributed cluster is like. The figure shows three nodes: the one on the left is the master node, and the t ...
Posted by verlen on Wed, 02 Feb 2022 19:57:20 +0100
Migrating CM and database-2
9.1 migrate the data of the original CM node to the new node
9.1.1 backup the original CM node data
This mainly backs up CM's monitoring data and management information. The data directories include:
/var/lib/cloudera-host-monitor
/var/lib/cloudera-service-monitor
/var/lib/cloudera-scm-server
/var/lib/cloudera-scm-eventserver
/var/lib/clou ...
Posted by cidesign on Wed, 02 Feb 2022 07:50:51 +0100
Implementing e-commerce website user operations analysis with Hive in a Hadoop environment
1, Analysis target
1. Establish a daily operations metrics system; 2. Analyze the composition of the existing user base and count the daily user mix; 3. Analyze users' repurchase behavior to guide subsequent repurchase operations.
2, Data description
User_info (user information table); user_action (user behavior table).
3, Imp ...
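The repurchase analysis in goal 3 boils down to counting users who bought on more than one day. A small sqlite3 sketch of that SQL logic (the article uses Hive; the `user_action` columns used here are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema standing in for the article's user_action table.
conn.execute("CREATE TABLE user_action (user_id TEXT, action TEXT, action_date TEXT)")
conn.executemany(
    "INSERT INTO user_action VALUES (?, ?, ?)",
    [("u1", "buy", "2022-01-01"), ("u1", "buy", "2022-01-05"),
     ("u2", "buy", "2022-01-02"), ("u3", "view", "2022-01-03")],
)
# Repurchasers: users with purchases on more than one distinct day.
rows = conn.execute("""
    SELECT user_id
    FROM user_action
    WHERE action = 'buy'
    GROUP BY user_id
    HAVING COUNT(DISTINCT action_date) > 1
""").fetchall()
print(rows)   # -> [('u1',)]
```

The same `GROUP BY ... HAVING` shape carries over to HiveQL essentially unchanged.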
Posted by gljaber on Wed, 02 Feb 2022 06:13:48 +0100
spark the way of God - detailed explanation of RDD creation
3.2 RDD programming
In Spark, an RDD is represented as an object, and RDDs are transformed through method calls on that object. After defining an RDD through a series of transformations, you can call an action to trigger its computation. An action can return a result to the application (count, collect, etc.) or save the data to a storage system ...
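The transformation-versus-action split described above can be mimicked with Python generators, which are likewise lazy until consumed (an analogy only, not Spark code):

```python
data = range(1, 6)

# "Transformations": nothing is computed yet; only a recipe is built.
squared = (x * x for x in data)
evens = (x for x in squared if x % 2 == 0)

# "Action": consuming the pipeline triggers all the work at once,
# much like calling count() or collect() on an RDD.
result = list(evens)
print(result)   # -> [4, 16]
```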
Posted by Matty999555 on Tue, 01 Feb 2022 16:47:04 +0100
hadoop2.6.5 Mapper class source code analysis
Mapper class
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//
package org.apache.hadoop.mapreduce;
import java.io.IOException;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Stable;
@Public
@Stable
public cla ...
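The truncated source above is the Mapper base class, whose `run()` method drives the setup/map/cleanup lifecycle. A Python paraphrase of that control flow (an illustration of the structure, not the Hadoop API):

```python
class Mapper:
    """Mirrors the control flow of org.apache.hadoop.mapreduce.Mapper."""
    def setup(self, context):            # called once, before any records
        pass
    def map(self, key, value, context):  # identity map, like the Java default
        context.write(key, value)
    def cleanup(self, context):          # called once, after the last record
        pass
    def run(self, context):              # the framework's driver loop
        self.setup(context)
        for key, value in context:
            self.map(key, value, context)
        self.cleanup(context)

class ListContext:
    """A toy stand-in for the Hadoop Context: iterates input, collects output."""
    def __init__(self, records):
        self.records = records
        self.out = []
    def __iter__(self):
        return iter(self.records)
    def write(self, key, value):
        self.out.append((key, value))

ctx = ListContext([(0, "hello"), (6, "world")])
Mapper().run(ctx)
print(ctx.out)   # -> [(0, 'hello'), (6, 'world')]
```

Subclasses normally override only `map()` (and optionally `setup()`/`cleanup()`); `run()` is what the framework invokes per input split.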
Posted by warydig on Mon, 31 Jan 2022 03:11:39 +0100
UDF is used in Hive and Impala
11.1 introduction to experimental environment
The cluster environment is running normally
Hive and Impala services are installed in the cluster
Operating system: RedHat 6.5
CDH and CM versions: 5.11.1
An EC2 user with sudo permission is used for operation
11.2 UDF function development - using Intellij tools
Use Intellij tool to develop Hive's ...
Posted by Design on Sun, 30 Jan 2022 23:08:38 +0100
Big data learning road Hadoop
1. Introduction to big data
1.1 big data concept
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within an acceptable time frame. It is a massive, fast-growing, and diversified information asset that requires new processing modes to deliver stronger decision-making power, insight an ...
Posted by monkuar on Sat, 29 Jan 2022 15:27:44 +0100
Python big data processing library PySpark actual combat summary III
Shared variable
broadcast variable
Broadcast variables let a program cache a read-only variable on each machine in the cluster instead of shipping a copy with every task. With broadcast variables you can share data, such as a global configuration file, more efficiently.
from pyspark.sql import SparkSession
spark = SparkSe ...
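The per-worker-instead-of-per-task idea can be illustrated in plain Python with a process pool whose initializer ships the lookup table to each worker once (a hypothetical analogy, not the PySpark API):

```python
import multiprocessing as mp

_lookup = None  # one read-only copy per worker, like a broadcast variable

def _init_worker(lookup):
    # Runs once per worker process, not once per task.
    global _lookup
    _lookup = lookup

def resolve(code):
    # Each task reads the shared copy instead of carrying its own.
    return _lookup.get(code, "unknown")

def main():
    country_names = {"US": "United States", "DE": "Germany"}
    with mp.Pool(2, initializer=_init_worker, initargs=(country_names,)) as pool:
        return pool.map(resolve, ["US", "DE", "FR"])

if __name__ == "__main__":
    print(main())   # -> ['United States', 'Germany', 'unknown']
```

In PySpark the equivalent handle is created once on the driver and read via its value on the executors.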
Posted by cunoodle2 on Sat, 29 Jan 2022 14:37:23 +0100
zeppelin installation configuration and summary of all command failures
zeppelin installation configuration
Prerequisite: before installing zeppelin, make sure your hadoop and hive are running. This is very important!!!
First step
Put the downloaded archive under the /opt/download/hadoop directory (that's where I put it; you can choose your own location)
cd /opt/download/hadoop
ls
Then just drag it
Step ...
Posted by ChaosKnight on Fri, 28 Jan 2022 05:00:43 +0100