Hadoop program installation and configuration

Posted by rcarr on Thu, 20 Jan 2022 15:05:34 +0100

I Cluster environment setup

Environment preparation

(1) Server configuration

IP            | Host name | Environment configuration                                           | Installed software
10.100.100.42 | node01    | Firewall and SELinux off, host name mapping, clock synchronization | JDK, NameNode, ResourceManager, ZooKeeper
10.100.100.43 | node02    | Firewall and SELinux off, host name mapping, clock synchronization | JDK, DataNode, NodeManager, ZooKeeper
10.100.100.44 | node03    | Firewall and SELinux off, host name mapping, clock synchronization | JDK, DataNode, NodeManager, ZooKeeper

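(2) Set the host name and the host name mapping

  • Set each machine's host name in /etc/hostname (node01, node02, node03)
  • Add the IP-to-host-name mappings of all three machines to /etc/hosts on every node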
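The host mapping step can be scripted. This is a minimal sketch, assuming the three IP/host-name pairs from the server table above; the helper name add_host_mappings is hypothetical, and the hosts file is passed as an argument so the script can be tried on a scratch file before it is pointed at /etc/hosts.

```sh
#!/bin/sh
# Hypothetical helper: append the cluster's IP-to-host-name mappings to a
# hosts file, skipping lines that are already present (safe to run twice).
add_host_mappings() {
    hosts_file="$1"
    touch "$hosts_file"
    for entry in "10.100.100.42 node01" "10.100.100.43 node02" "10.100.100.44 node03"; do
        # -x: whole-line match, -F: fixed string (dots are not regex wildcards)
        if ! grep -qxF "$entry" "$hosts_file"; then
            printf '%s\n' "$entry" >> "$hosts_file"
        fi
    done
}
```

On each node, as root: add_host_mappings /etc/hosts. On CentOS 7 the host name itself can be set with hostnamectl set-hostname node01 (node02 and node03 on the other machines), which writes /etc/hostname.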

(3) Turn off the firewall and SELinux

  • CentOS 6: service iptables stop, then chkconfig iptables off
  • CentOS 7:
    • systemctl stop firewalld.service # stop the firewall
    • systemctl disable firewalld.service # do not start the firewall at boot
  • Disable SELinux
    • Check the status: /usr/sbin/sestatus -v
    • Edit /etc/selinux/config and change SELINUX=enforcing to SELINUX=disabled
    • A reboot is required for the change to take effect
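A sketch of the SELinux edit as a non-interactive command, assuming the stock /etc/selinux/config layout; disable_selinux_config is a hypothetical name. It rewrites the file from a backup copy instead of using sed -i, whose syntax differs between GNU and BSD sed.

```sh
#!/bin/sh
# Hypothetical helper: change the SELINUX= line (enforcing or permissive) to
# SELINUX=disabled. The SELINUXTYPE= line is left alone because "^SELINUX="
# does not match it. A reboot is still needed afterwards.
disable_selinux_config() {
    cfg="$1"
    # Keep a backup next to the original, then write the edited version back.
    cp "$cfg" "$cfg.bak"
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg.bak" > "$cfg"
}
```

Usage on a node: disable_selinux_config /etc/selinux/config, then reboot.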

(4) Passwordless SSH login

  • On each of the three machines, generate a key pair: ssh-keygen -t rsa
  • On each machine, copy its public key to the first machine: ssh-copy-id node01
  • Copy the combined authorized_keys file from the first machine to the other two:
    • scp /root/.ssh/authorized_keys node02:/root/.ssh
    • scp /root/.ssh/authorized_keys node03:/root/.ssh
  • Verify by running ssh node02 on node01; it should log in without asking for a password

(5) Clock synchronization

  • Install ntp: yum install -y ntp
  • Add a scheduled job that synchronizes the clock every minute: crontab -e
    • */1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com;
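crontab -e is interactive. A minimal non-interactive alternative, assuming the ntpdate job above; add_cron_line is a hypothetical filter that passes an existing crontab through and appends the job only if it is missing.

```sh
#!/bin/sh
# Hypothetical filter: copy a crontab from stdin to stdout and append LINE
# only if it is not already present, so re-running never duplicates the job.
add_cron_line() {
    line="$1"
    found=0
    while IFS= read -r existing; do
        printf '%s\n' "$existing"
        [ "$existing" = "$line" ] && found=1
    done
    [ "$found" -eq 1 ] || printf '%s\n' "$line"
}
```

Usage: crontab -l 2>/dev/null | add_cron_line '*/1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com;' | crontab -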

(6) Install JDK 1.8 on the cluster machines

  • Check whether OpenJDK is already installed (uninstall it if so): rpm -qa | grep java
  • Uninstall it: rpm -e <package name> --nodeps
  • Create the installation directory: mkdir -p /export/soft
  • Upload the JDK package and unpack it. Install the upload tool: yum -y install lrzsz
  • Upload a file: rz -E
  • Unpack the file: tar -xvf <installation package path> -C /export/soft
  • Configure the environment variables: vi /etc/profile
    • export JAVA_HOME=/export/soft/jdk1.8.0_144
    • export PATH=$JAVA_HOME/bin:$PATH
  • Make the configuration take effect: source /etc/profile
  • Verify that the JDK is installed: java -version
  • Copy the JDK folder to the other machines: scp -r /export/soft/jdk1.8.0_144 node02:/export/soft (and likewise for node03)
  • Configure the environment variables on the other machines in the same way
  • Turn off the "You have new mail" reminder
    • vi /etc/profile, add unset MAILCHECK, then make the file take effect with source /etc/profile
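The /etc/profile edits above can be made idempotent, so that re-running the setup on a node does not stack duplicate exports. A sketch, assuming the JDK path used in this article; add_java_profile is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: append the JDK exports and the MAILCHECK line to a
# profile file, each only once. JDK path as used in this article.
JDK_DIR=/export/soft/jdk1.8.0_144

add_java_profile() {
    profile="$1"
    touch "$profile"
    # Single quotes keep $JAVA_HOME and $PATH literal so they expand at login.
    for line in "export JAVA_HOME=$JDK_DIR" 'export PATH=$JAVA_HOME/bin:$PATH' 'unset MAILCHECK'; do
        grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
    done
}
```

Usage, as root on each node: add_java_profile /etc/profile && . /etc/profile && java -version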

II ZooKeeper cluster environment setup

(1) Cluster planning

IP            | Host name | myid
10.100.100.42 | node01    | 1
10.100.100.43 | node02    | 2
10.100.100.44 | node03    | 3

(2) Download the ZooKeeper 3.4.9 installation package

(3) Unpack the file

  • tar -xvf <installation package> -C /export/soft

(4) Modify the configuration file

  • cd /export/soft/zookeeper-3.4.9/conf/
  • Copy the template configuration file: cp zoo_sample.cfg zoo.cfg
  • Create the ZooKeeper data directory: mkdir -p /export/soft/zookeeper-3.4.9/zkdatas
  • In zoo.cfg, set the data directory: dataDir=/export/soft/zookeeper-3.4.9/zkdatas
  • Set the number of snapshots to keep (autopurge.snapRetainCount)
  • Set the log purge interval in hours (autopurge.purgeInterval)
  • Configure the ZooKeeper cluster members:
    • server.1=node01:2888:3888
    • server.2=node02:2888:3888
    • server.3=node03:2888:3888
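A sketch of a complete zoo.cfg for this cluster. The tickTime, initLimit, syncLimit and clientPort values are the defaults shipped in zoo_sample.cfg; the two autopurge values are example assumptions only, and write_zoo_cfg is a hypothetical helper. In each server.N line, port 2888 is used by followers to connect to the leader and port 3888 for leader election.

```sh
#!/bin/sh
# Hypothetical generator for zoo.cfg. Defaults from zoo_sample.cfg; the
# autopurge values are example assumptions, not values from the article.
write_zoo_cfg() {
    conf_dir="$1"
    data_dir="$2"
    cat > "$conf_dir/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=$data_dir
# keep the 3 most recent snapshots (example value)
autopurge.snapRetainCount=3
# purge old snapshots and logs every 1 hour (example value)
autopurge.purgeInterval=1
server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888
EOF
}
```

Usage: write_zoo_cfg /export/soft/zookeeper-3.4.9/conf /export/soft/zookeeper-3.4.9/zkdatas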

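(5) Create the myid file

  • Create myid in the data directory (vi myid, or directly with echo), containing this machine's ID from the planning table
  • echo 1 > /export/soft/zookeeper-3.4.9/zkdatas/myid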

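(6) Distribute the installation directory to node02 and node03

  • scp -r /export/soft/zookeeper-3.4.9 node02:/export/soft
  • scp -r /export/soft/zookeeper-3.4.9 node03:/export/soft
  • Then change myid to 2 on node02 and to 3 on node03, so that each ID matches its server.N entry in zoo.cfg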
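Because the whole directory is copied, every node starts with myid set to 1. A sketch of a per-host fix, assuming the host names and IDs from the planning table; write_myid is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: write the myid value for a host (IDs taken from the
# planning table) into the ZooKeeper data directory.
write_myid() {
    host="$1"
    data_dir="$2"
    case "$host" in
        node01) id=1 ;;
        node02) id=2 ;;
        node03) id=3 ;;
        *) echo "unknown host: $host" >&2; return 1 ;;
    esac
    mkdir -p "$data_dir"
    echo "$id" > "$data_dir/myid"
}
```

Usage on each node: write_myid "$(hostname)" /export/soft/zookeeper-3.4.9/zkdatas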

(7) Start the ZooKeeper service on every machine in the cluster

  • Start the service: /export/soft/zookeeper-3.4.9/bin/zkServer.sh start
  • Check the service status: /export/soft/zookeeper-3.4.9/bin/zkServer.sh status
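Instead of logging in to each machine, the start and status commands can be run from node01 over the passwordless SSH configured in section I. A sketch; zk_all and the DRY_RUN switch are hypothetical. /etc/profile is sourced explicitly because a non-interactive SSH command does not read it, so JAVA_HOME would otherwise be missing.

```sh
#!/bin/sh
# Hypothetical wrapper: run a zkServer.sh sub-command (start, status, stop)
# on every node over SSH. With DRY_RUN=1 it only prints the commands.
ZK_HOME=/export/soft/zookeeper-3.4.9

zk_all() {
    action="$1"
    for host in node01 node02 node03; do
        cmd=". /etc/profile; $ZK_HOME/bin/zkServer.sh $action"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $host \"$cmd\""
        else
            ssh "$host" "$cmd"
        fi
    done
}
```

Usage: zk_all start, then zk_all status. In a healthy three-node ensemble, one node reports Mode: leader and the other two Mode: follower.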

III Hadoop cluster environment setup

(1) Cluster planning

Server IP         | 10.100.100.42 | 10.100.100.43 | 10.100.100.44
Host name         | node01        | node02        | node03
NameNode          | yes           | no            | no
SecondaryNameNode | yes           | no            | no
DataNode          | yes           | yes           | yes
ResourceManager   | yes           | no            | no
NodeManager       | yes           | yes           | yes

(2) Download the installation package (Hadoop 2.7.5)

(3) Upload and unpack the file in the same way as above

(4) Modify the configuration files

  • Modify core-site.xml, path: /export/soft/hadoop-2.7.5/etc/hadoop/core-site.xml
  • Content:

<configuration>
  <!-- Default file system: the NameNode address -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node01:8020</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/soft/hadoop-2.7.5/hadoopDatas/tempDatas</value>
  </property>
  <!-- I/O buffer size in bytes -->
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <!-- HDFS trash: minutes before deleted files are purged from the trash (10080 = 7 days) -->
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
</configuration>



  • Modify hdfs-site.xml, path: /export/soft/hadoop-2.7.5/etc/hadoop/hdfs-site.xml
  • Content:

<configuration>
  <!-- SecondaryNameNode web address -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node01:50090</value>
  </property>
  <!-- NameNode web address -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node01:50070</value>
  </property>
  <!-- Where the NameNode stores its metadata (written to both directories for redundancy) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas,file:///export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas2</value>
  </property>
  <!-- Where DataNodes store block data -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas,file:///export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas2</value>
  </property>
  <!-- Where the NameNode stores its edit log -->
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/nn/edits</value>
  </property>
  <!-- Where the SecondaryNameNode stores checkpoint images -->
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/snn/name</value>
  </property>
  <!-- Where the SecondaryNameNode stores checkpoint edits -->
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/dfs/snn/edits</value>
  </property>
  <!-- Number of replicas kept for each block -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Disable HDFS permission checking -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <!-- Block size: 128 MB -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
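A worked example of what dfs.blocksize and dfs.replication mean for storage, assuming the values configured above; hdfs_footprint is a hypothetical helper. A file is split into ceil(size / blocksize) blocks, and because HDFS does not pad the last block, the raw disk used across the cluster is the file size times the replication factor.

```sh
#!/bin/sh
# Worked example for the hdfs-site.xml values above.
BLOCK_SIZE=134217728   # dfs.blocksize = 128 * 1024 * 1024 bytes
REPLICATION=3          # dfs.replication

# Prints "<number of blocks> <raw bytes across the cluster>" for a file size.
hdfs_footprint() {
    size="$1"   # file size in bytes
    # Ceiling division: a partial last block still counts as one block.
    blocks=$(( (size + BLOCK_SIZE - 1) / BLOCK_SIZE ))
    raw=$(( size * REPLICATION ))
    echo "$blocks $raw"
}
```

For example, hdfs_footprint 314572800 (a 300 MB file) prints 3 943718400: three blocks and 900 MB of raw disk across the cluster.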

  • Modify hadoop-env.sh, mainly to set the JDK path:

        export JAVA_HOME=/export/soft/jdk1.8.0_144

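  • Modify mapred-site.xml (Hadoop 2.7.5 ships only mapred-site.xml.template, so first run cp mapred-site.xml.template mapred-site.xml). Content: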

<configuration>
  <!-- Enable uber mode: small jobs run inside the ApplicationMaster's JVM -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- Host and port of the JobHistory server -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node01:10020</value>
  </property>
  <!-- Host and port of the JobHistory web UI -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node01:19888</value>
  </property>
</configuration>

  • Modify yarn-site.xml. Content:

<configuration>
  <!-- Host of the YARN master node (ResourceManager) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node01</value>
  </property>
  <!-- Shuffle service that MapReduce jobs need on every NodeManager -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- How long aggregated logs are kept on HDFS, in seconds (604800 = 7 days) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- YARN memory allocation: memory each NodeManager offers (MB), smallest container (MB), allowed virtual-to-physical memory ratio -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
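A worked example of the YARN memory settings above; the helper names are hypothetical. With 20480 MB per NodeManager and a 2048 MB minimum allocation, one node can hold at most 10 minimum-size containers, and with the default virtual-memory check a 2048 MB container may use up to 2048 * 2.1 = 4300.8 MB of virtual memory. The rounding rule in normalize_request is an assumption about the default CapacityScheduler, which rounds requests up to a multiple of the minimum allocation.

```sh
#!/bin/sh
# Worked example for the yarn-site.xml memory settings above.
NM_MEMORY_MB=20480     # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB=2048      # yarn.scheduler.minimum-allocation-mb
VMEM_RATIO=2.1         # yarn.nodemanager.vmem-pmem-ratio

# Assumption: a request is rounded up to a multiple of the minimum allocation.
normalize_request() {
    req="$1"
    echo $(( (req + MIN_ALLOC_MB - 1) / MIN_ALLOC_MB * MIN_ALLOC_MB ))
}

# Upper bound on containers per NodeManager if each gets the minimum size.
max_containers() {
    echo $(( NM_MEMORY_MB / MIN_ALLOC_MB ))
}

# Virtual-memory limit for a container of the given physical size in MB.
# Shell arithmetic is integer-only, so awk handles the 2.1 ratio.
vmem_limit_mb() {
    awk -v mb="$1" -v r="$VMEM_RATIO" 'BEGIN { printf "%.1f\n", mb * r }'
}
```

For example, a 3000 MB request becomes a 4096 MB container under this assumption, which leaves room for fewer containers per node than max_containers suggests.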
  • Modify mapred-env.sh, mainly to set the JDK path:

        export JAVA_HOME=/export/soft/jdk1.8.0_144

  • Modify the slaves file (etc/hadoop/slaves). It lists, one per line, the hosts on which start-dfs.sh and start-yarn.sh start a DataNode and a NodeManager: node01, node02, node03

  • Create the data directories

        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/tempDatas
        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas
        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas2
        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas
        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas2
        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/nn/edits
        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/snn/name
        mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/dfs/snn/edits
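The eight mkdir commands can be collapsed into a loop. A sketch with a hypothetical helper name; the base directory is a parameter (the Hadoop installation directory in this article). Because these directories sit inside the installation directory, the scp -r distribution step copies them to node02 and node03 as well.

```sh
#!/bin/sh
# Hypothetical helper: create the same data directories as the mkdir
# commands above, under <base>/hadoopDatas.
make_hadoop_dirs() {
    base="$1/hadoopDatas"
    for d in tempDatas namenodeDatas namenodeDatas2 datanodeDatas datanodeDatas2 \
             nn/edits snn/name dfs/snn/edits; do
        mkdir -p "$base/$d"
    done
}
```

Usage: make_hadoop_dirs /export/soft/hadoop-2.7.5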
  • Distribute the installation package (run in /export/soft, so that $PWD is the same path on the target machine)

        scp -r hadoop-2.7.5 node02:$PWD

        scp -r hadoop-2.7.5 node03:$PWD

  • Configure the Hadoop environment variables

        vi /etc/profile

        export HADOOP_HOME=/export/soft/hadoop-2.7.5

        export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

        source /etc/profile

  • Start the cluster on the primary node (node01)

        cd /export/soft/hadoop-2.7.5/

        bin/hdfs namenode -format    (run only once, before the very first start)

        sbin/start-dfs.sh

        sbin/start-yarn.sh

        sbin/mr-jobhistory-daemon.sh start historyserver
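After the start commands above, jps on each node should list the daemons assigned to it in the planning table. A minimal check, assuming jps output in its usual "<pid> <main class>" form; check_daemons is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical check: read `jps` output on stdin and report which of the
# daemon names given as arguments are not running.
check_daemons() {
    jps_out=$(cat)
    missing=""
    for d in "$@"; do
        # jps prints "<pid> <MainClassName>", so compare against the last field.
        printf '%s\n' "$jps_out" | awk '{print $NF}' | grep -qx "$d" || missing="$missing $d"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

On node01: jps | check_daemons NameNode SecondaryNameNode DataNode ResourceManager NodeManager JobHistoryServer. On node02 and node03: jps | check_daemons DataNode NodeManager. ZooKeeper shows up in jps as QuorumPeerMain.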

HDFS web UI: http://node01:50070/explorer.html#/

YARN cluster web UI: http://node01:8088/cluster

JobHistory web UI (completed jobs): http://node01:19888/jobhistory

IV HBase (Hadoop database) installation

(1) Download and upload the installation package (HBase 2.1.0)

Download path: http://archive.apache.org/dist/hbase/ (software version 2.1.0)

(2) Modify the hbase-env.sh file

        export JAVA_HOME=/export/soft/jdk1.8.0_144

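        export HBASE_MANAGES_ZK=false

HBASE_MANAGES_ZK=false tells HBase not to start its own ZooKeeper and to use the ZooKeeper cluster built in section II instead.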

(3) Modify the hbase-site.xml file

<configuration>
  <!-- Location of HBase data in HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node01:8020/hbase</value>
  </property>
  <!-- Run mode: false = standalone mode, true = distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper cluster address -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node01,node02,node03</value>
  </property>
  <!-- ZooKeeper snapshot data directory -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/export/soft/zookeeper-3.4.9/zkdatas</value>
  </property>
  <!-- Local temporary directory -->
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <!-- Do not enforce the hflush/hsync stream capability check -->
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

(4) Configure the environment variables (in /etc/profile)

        export HBASE_HOME=/export/soft/hbase-2.4.0

        export PATH=$HBASE_HOME/bin:$HBASE_HOME/sbin:$PATH

(5) Copy the dependent library

        cp $HBASE_HOME/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar $HBASE_HOME/lib

(6) Modify the regionservers file (conf/regionservers), which lists the hosts that run a RegionServer, one per line
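A sketch of this step, assuming that all three nodes run a RegionServer in the same way they all run DataNodes in the Hadoop planning table; the host list is an assumption and write_regionservers is a hypothetical helper. The regionservers file shipped with HBase contains only localhost, which has to be replaced.

```sh
#!/bin/sh
# Hypothetical helper: overwrite conf/regionservers with the given hosts,
# one per line. Which hosts to list is an assumption (node01..node03 below).
write_regionservers() {
    conf_dir="$1"
    shift
    printf '%s\n' "$@" > "$conf_dir/regionservers"
}
```

Usage: write_regionservers /export/soft/hbase-2.4.0/conf node01 node02 node03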

        

(7) Distribute the installation package (run in /export/soft)

        scp -r hbase-2.4.0/ node02:$PWD

        scp -r hbase-2.4.0/ node03:$PWD

(8) Configure the environment variables on node02 and node03 and make them take effect (source /etc/profile)

(9) Start HBase

First make sure that the ZooKeeper cluster and the Hadoop cluster are running, then start HBase on the primary node.

Enter the HBase bin directory and run start-hbase.sh

(10) Verify that HBase started successfully
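A minimal check using jps, assuming node01 (where start-hbase.sh is run) hosts the HMaster and is also listed in regionservers; check_hbase_jps and its role names are hypothetical.

```sh
#!/bin/sh
# Hypothetical check: read `jps` output on stdin and confirm the HBase
# processes expected for this node's role are running.
check_hbase_jps() {
    role="$1"   # "master" or "regionserver"
    names=$(awk '{print $NF}')
    case "$role" in
        master) expected="HMaster HRegionServer" ;;
        regionserver) expected="HRegionServer" ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac
    for p in $expected; do
        printf '%s\n' "$names" | grep -qx "$p" || { echo "not running: $p"; return 1; }
    done
    echo "HBase processes ok"
}
```

On node01: jps | check_hbase_jps master. On node02 and node03: jps | check_hbase_jps regionserver.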

        

 

(11) HBase web interface

        http://10.100.100.42:16010

V Phoenix (5.0.0) plugin installation

(1) Download the file

Download address: http://phoenix.apache.org/download.html

(2) Upload and unpack the file: tar -xvf phoenix-hbase-2.4.0-5.1.2-bin.tar.gz -C /export/soft

(3) Copy the jar packages to the HBase lib directory

        cp  /export/soft/phoenix-hbase-2.4.0-5.1.2-bin/phoenix-*.jar /export/soft/hbase-2.4.0/lib/

(4) Distribute the jar packages to the other HBase nodes (run in /export/soft/hbase-2.4.0/lib/)

        scp phoenix-*.jar node02:$PWD

        scp phoenix-*.jar node03:$PWD

(5) Modify the configuration file

Modify the HBase configuration file:

cd /export/soft/hbase-2.4.0/conf/

Add the following properties inside the <configuration> element of hbase-site.xml:

  <!-- Map Phoenix schemas to HBase namespaces -->
  <property>
    <name>phoenix.schema.isNamespaceMappingEnabled</name>
    <value>true</value>
  </property>
  <!-- WAL edit codec required by Phoenix secondary indexes -->
  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>

(6) Distribute the configuration file

        scp hbase-site.xml node02:$PWD

        scp hbase-site.xml node03:$PWD

(7) Copy the configuration file to the Phoenix bin directory

        cp hbase-site.xml /export/soft/phoenix-hbase-2.4.0-5.1.2-bin/bin/
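The Phoenix client reads the hbase-site.xml in its bin directory, and Phoenix can reject connections with an "Inconsistent namespace mapping properties" error when phoenix.schema.isNamespaceMappingEnabled differs between client and server. A sketch of a check that the copy still matches the server file; same_config is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical check: compare the server hbase-site.xml with the copy used by
# the Phoenix client; returns non-zero if they differ byte for byte.
same_config() {
    server_cfg="$1"
    client_cfg="$2"
    if cmp -s "$server_cfg" "$client_cfg"; then
        echo "in sync"
    else
        echo "differs: copy $server_cfg to $client_cfg again"
        return 1
    fi
}
```

Usage: same_config /export/soft/hbase-2.4.0/conf/hbase-site.xml /export/soft/phoenix-hbase-2.4.0-5.1.2-bin/bin/hbase-site.xml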

(8) Restart the HBase service (stop-hbase.sh, then start-hbase.sh)

Versions used for the new cluster: ZooKeeper 3.4.9, Hadoop 2.7.5, HBase 2.1.0, Phoenix 5.0.0-HBase-2.0

Check Hadoop 2.7.5 support for native libraries: bin/hadoop checknative

Spring Boot integration with Hadoop: on Windows, the client needs hadoop.dll

These new clusters

 
