MapReduce program 2 of Maven project -- realize the function of counting the total salaries of employees in each department

Posted by tsiedsma on Tue, 10 Dec 2019 17:00:49 +0100

prerequisite:

1. Install jdk1.8 (under Windows Environment)

2. Install maven 3.3.9 (under Windows Environment)

3. Install eclipse (under Windows Environment)

4. Install hadoop (under Linux environment)

Question:

The input file is: EMP.csv. The contents of the EMP.csv file are as follows: SAL is employee salary (int type), DEPTNO is department number (int type)

Copy the following to an editor such as Sublime / or Notepad + + and save it as EMP.csv

7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10

Question: realize the function of counting the total salaries of employees in each department in EMP.csv through MapReduce?

Programming model analysis:

Realize the function of counting the total salaries of employees in each department through MapReduce

To configure the Maven environment of eclipse: (please skip the previous configuration.) refer to: Configuring eclipse's Maven environment

New Maven project: Project Name: DeptSalarySum

Delete the original App.java in the src/main/java directory,

Create three new classes in src/main/java Directory: DeptSalaySumMain.java, DeptSalaySumMapper.java, deptsalaysumeducator.java

Modify pom.xml file:

a. configure the main class: add the following before < / Project >

<build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <!-- main()Note the modification of the class -->
                  <mainClass>com.DeptSalarySum.DeptSalaySumMain<</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

b. configure the dependency of MapReduce program (important): the package that the program depends on. When configured here, Maven will help us download it and save the trouble of manually adding jar package. Before < / dependencies >, add the following:

    <dependency>
	    <groupId>org.apache.hadoop</groupId>
	    <artifactId>hadoop-common</artifactId>
	    <version>2.7.3</version>
	</dependency>
	<dependency>
	    <groupId>org.apache.hadoop</groupId>
	    <artifactId>hadoop-client</artifactId>
	    <version>2.7.3</version>
	</dependency>
	<dependency>
	    <groupId>org.apache.hadoop</groupId>
	    <artifactId>hadoop-hdfs</artifactId>
	    <version>2.7.3</version>
	</dependency>
	<dependency>
	    <groupId>org.apache.hadoop</groupId>
	    <artifactId>hadoop-mapreduce-client-core</artifactId>
	    <version>2.7.3</version>
    </dependency>

After modifying pom.xml, save it with ctrl + s

Refresh project

Write code:

First, think about how to write the code. Through the analysis of programming model, write the code

Packaging engineering

Submit to Hadoop to run and view the running results.

Topics: Big Data Hadoop Maven Java Apache

Programmer Think