prerequisite:
1. Install jdk1.8 (under Windows Environment)
2. Install maven 3.3.9 (under Windows Environment)
3. Install eclipse (under Windows Environment)
4. Install hadoop (under Linux environment)
Question:
The input file is: EMP.csv. The contents of the EMP.csv file are as follows: SAL is employee salary (int type), DEPTNO is department number (int type)
Copy the following to an editor such as Sublime / or Notepad + + and save it as EMP.csv
7369,SMITH,CLERK,7902,1980/12/17,800,,20 7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30 7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30 7566,JONES,MANAGER,7839,1981/4/2,2975,,20 7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30 7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30 7782,CLARK,MANAGER,7839,1981/6/9,2450,,10 7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20 7839,KING,PRESIDENT,,1981/11/17,5000,,10 7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30 7876,ADAMS,CLERK,7788,1987/5/23,1100,,20 7900,JAMES,CLERK,7698,1981/12/3,950,,30 7902,FORD,ANALYST,7566,1981/12/3,3000,,20 7934,MILLER,CLERK,7782,1982/1/23,1300,,10
Question: realize the function of counting the total salaries of employees in each department in EMP.csv through MapReduce?
Programming model analysis:
Realize the function of counting the total salaries of employees in each department through MapReduce
To configure the Maven environment of eclipse: (please skip the previous configuration.) refer to: Configuring eclipse's Maven environment
New Maven project: Project Name: DeptSalarySum
Delete the original App.java in the src/main/java directory,
Create three new classes in src/main/java Directory: DeptSalaySumMain.java, DeptSalaySumMapper.java, deptsalaysumeducator.java
Modify pom.xml file:
a. configure the main class: add the following before < / Project >
<build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>3.1.0</version> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> <!-- main()Note the modification of the class --> <mainClass>com.DeptSalarySum.DeptSalaySumMain<</mainClass> </transformer> </transformers> </configuration> </execution> </executions> </plugin> </plugins> </build>
b. configure the dependency of MapReduce program (important): the package that the program depends on. When configured here, Maven will help us download it and save the trouble of manually adding jar package. Before < / dependencies >, add the following:
<dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>2.7.3</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.7.3</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> <version>2.7.3</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-mapreduce-client-core</artifactId> <version>2.7.3</version> </dependency>
After modifying pom.xml, save it with ctrl + s
Refresh project
Write code:
First, think about how to write the code. Through the analysis of programming model, write the code
Packaging engineering
Submit to Hadoop to run and view the running results.