Software version:
jdk: 1.8
maven: 3.6.1 http://maven.apache.org/download.cgi
spark: 2.4.2 https://archive.apache.org/dist/spark/spark-2.4.2/
hadoop: hadoop-2.6.0-cdh5.7.0 (the Hadoop version Spark is compiled against; Hadoop itself does not need to be installed)
Configure Maven:
#Configure environment variables
[root@hadoop004 soft]# cat /etc/profile.d/maven.sh
MAVEN_HOME=/usr/local/maven
export PATH=$MAVEN_HOME/bin:$PATH

#Confirm the maven version
[root@hadoop004 maven]# mvn --version
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-05T03:00:29+08:00)
Maven home: /usr/local/maven
Java version: 1.8.0_111, vendor: Oracle Corporation, runtime: /usr/java/jdk1.8.0_111/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-862.3.2.el7.x86_64", arch: "amd64", family: "unix"

#Configure the local repository path of mvn in the settings.xml file
<localRepository>/usr/local/maven/repo</localRepository>

#Configure the mvn download source as Alibaba Cloud's Maven repository to speed up downloads
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
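For context, this is roughly how the two settings sit together in settings.xml; the surrounding skeleton below is just the standard settings.xml structure, shown only to make the placement clear:

<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <!-- local repository where downloaded artifacts are cached -->
  <localRepository>/usr/local/maven/repo</localRepository>
  <mirrors>
    <!-- route requests for central through the Aliyun mirror -->
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>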
Configure Spark:
tar xf spark-2.4.2.tgz
cd spark-2.4.2

#Modify the pom.xml file and add the cloudera repository
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
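The repository entry belongs inside the <repositories> section of the Spark root pom.xml; a rough sketch of the placement (the comment stands in for whatever entries are already there):

<repositories>
  <!-- existing repository entries in the Spark root pom stay as they are -->
  <!-- added: cloudera repository, needed to resolve the hadoop-2.6.0-cdh5.7.0 artifacts -->
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>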
Execute the compile command:
#Execute in the spark source directory
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0

Note: the compilation takes about 35 minutes when no errors occur along the way.
Note: the Scala version configured in the Spark pom is used by default. To build against a specific Scala version, change it first, e.g. to switch to Scala 2.10:
./dev/change-scala-version.sh 2.10
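Because the build runs for a long time, one common approach (not part of the original steps) is to run it in the background and keep a log so errors can be checked afterwards:

#Run the build in the background and capture all output to a log file
nohup ./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz \
  -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver \
  -Dhadoop.version=2.6.0-cdh5.7.0 > build.log 2>&1 &

#Follow the build output and watch for [ERROR] lines
tail -f build.log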
Parameter Description:
--name: the suffix of the generated package name; the prefix defaults to the Spark version, in this case spark-2.4.2-bin
--tgz: package the distribution as a tar archive with the .tgz suffix
-Pyarn: build Spark with support for running on YARN
-Phadoop-2.6: the id of the Maven profile for the Hadoop version Spark builds against
-Dhadoop.version=2.6.0-cdh5.7.0: the exact Hadoop version Spark is built against; if not specified, the default is Hadoop 2.2.0
-Phive -Phive-thriftserver: build with support for Hive and the Hive ThriftServer (the same profiles and properties can also be passed to Maven directly, as sketched below)
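For reference, make-distribution.sh drives a Maven build internally with these same profiles and properties; roughly the equivalent direct invocation using the bundled Maven wrapper would be the following (this builds the jars but does not assemble the .tgz):

#Build Spark directly with Maven, skipping tests; make-distribution.sh wraps an equivalent call
./build/mvn -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver \
  -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package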
Generated files:
In the spark source directory: spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
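A quick sanity check that the package was produced and looks reasonable (a verification step added here, not part of the original instructions):

#Confirm the tarball exists and peek at its contents
ls -lh spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
tar tzf spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz | head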
Deploy using the compiled Spark package:
tar xf spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
ln -s spark-2.4.2-bin-2.6.0-cdh5.7.0 spark

#Configure environment variables for spark
[hadoop@hadoop001 ~]$ vim .bash_profile
export SPARK_HOME=/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0
export PATH=${SPARK_HOME}/bin:$PATH
[hadoop@hadoop001 ~]$ source .bash_profile

#Run spark-shell as a test
[hadoop@hadoop001 ~]$ spark-shell
19/04/29 10:51:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop001:4040
Spark context available as 'sc' (master = local[*], app id = local-1556506274719).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
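Beyond starting the shell, the bundled examples can be used to exercise the build via spark-submit; a sketch, assuming the examples jar under $SPARK_HOME/examples/jars carries the name below (the exact name depends on the Scala version of the build):

#Run the SparkPi example locally against the freshly built distribution
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[2] \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.2.jar 10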