I Cluster environment construction
Environment preparation
(1) Server configuration
| IP | Host name | Environment configuration | Installed components |
| --- | --- | --- | --- |
| 10.100.100.42 | node01 | Disable firewall and SELinux, host mapping, clock synchronization | JDK, NameNode, ResourceManager, Zookeeper |
| 10.100.100.43 | node02 | Disable firewall and SELinux, host mapping, clock synchronization | JDK, DataNode, NodeManager, Zookeeper |
| 10.100.100.44 | node03 | Disable firewall and SELinux, host mapping, clock synchronization | JDK, DataNode, NodeManager, Zookeeper |
(2) Set host names and host mapping
- Edit the /etc/hostname file on each node and set it to node01, node02, or node03 accordingly
- Add the IP-to-hostname mappings to /etc/hosts on all three nodes, as shown below
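A minimal /etc/hosts sketch, using the addresses from the table above:

10.100.100.42 node01
10.100.100.43 node02
10.100.100.44 node03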
(3) Turn off firewall and SELinux
- CentOS 6: service iptables stop ; chkconfig iptables off
- CentOS 7
- systemctl stop firewalld.service # stop the firewall
- systemctl disable firewalld.service # disable firewall at startup
- Close SELinux
- View status: /usr/sbin/sestatus -v
- Modify the /etc/selinux/config file and change SELINUX=enforcing to SELINUX=disabled
- A reboot is required for the change to take effect (a scripted form of these steps follows below)
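The CentOS 7 steps above can also be scripted; a minimal sketch (the sed pattern assumes the default SELINUX=enforcing line in the config):

systemctl stop firewalld.service
systemctl disable firewalld.service
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   # then reboot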
(4) SSH password-free login
- Each of the three machines generates a public/private key pair: ssh-keygen -t rsa
- Copy each machine's public key to the first machine: ssh-copy-id node01
- Copy the first machine's aggregated authorized_keys to the other two:
- scp /root/.ssh/authorized_keys node02:/root/.ssh
- scp /root/.ssh/authorized_keys node03:/root/.ssh
- Run ssh node02 on node01 to verify password-free login (full sequence sketched below)
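Put together, the sequence looks like this (run the first two commands on every node, the rest on node01 only):

ssh-keygen -t rsa                                  # on node01, node02, node03; accept the defaults
ssh-copy-id node01                                 # on node01, node02, node03; collects all keys on node01
scp /root/.ssh/authorized_keys node02:/root/.ssh   # on node01 only
scp /root/.ssh/authorized_keys node03:/root/.ssh   # on node01 only
ssh node02                                         # verify login without a password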
(5) Clock synchronization
- Install ntp: yum install -y ntp
- Add a scheduled task (a one-off manual test is shown after this list)
- crontab -e
- */1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com;
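Before relying on the cron job, run the sync once by hand to confirm the NTP server is reachable (assumes outbound access from the node):

/usr/sbin/ntpdate ntp4.aliyun.com   # prints the adjusted offset on success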
(6) Install JDK (1.8) on all cluster machines
- Check whether OpenJDK is installed; if so, it needs to be uninstalled first: rpm -qa | grep java
- Uninstall the JDK: rpm -e <package name> --nodeps
- Create a directory: mkdir -p /export/soft
- Upload the JDK and unzip it. Install the upload tool: yum -y install lrzsz
- Upload file command: rz -E
- Unzip the file: tar -xvf <installation package path> -C /export/soft
- Configure environment variables: vi /etc/profile
- export JAVA_HOME=/export/soft/jdk1.8.0_144
- export PATH=:$JAVA_HOME/bin:$PATH
- Make the configuration file take effect: source /etc/profile
- Verify that the JDK is installed successfully: java -version
- Remotely copy the folder to the other machines: scp -r /export/soft/jdk1.8.0_144 node02:/export/soft (likewise for node03)
- Configure the environment variables of the other machines; see the steps above
- Disable the mail-check notification
- vi /etc/profile, add unset MAILCHECK, then make the file take effect: source /etc/profile (the collected /etc/profile additions are shown below)
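After these steps, /etc/profile carries the following additions (collected from the commands above):

export JAVA_HOME=/export/soft/jdk1.8.0_144
export PATH=:$JAVA_HOME/bin:$PATH
unset MAILCHECK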
II Zookeeper cluster environment construction
(1) Cluster planning
| IP | Host name | myid |
| --- | --- | --- |
| 10.100.100.42 | node01 | 1 |
| 10.100.100.43 | node02 | 2 |
| 10.100.100.44 | node03 | 3 |
(2) Download Zookeeper (3.4.9) installation package
- Download address: https://archive.apache.org/dist/zookeeper/
(3) Unzip file
- tar -xvf zookeeper-3.4.9.tar.gz -C /export/soft
(4) Modify configuration file
- cd /export/soft/zookeeper-3.4.9/conf/
- Copy the template configuration file: cp zoo_sample.cfg zoo.cfg
- Create the zookeeper data directory: mkdir -p /export/soft/zookeeper-3.4.9/zkdatas
- In zoo.cfg, configure the data directory (dataDir)
- Configure the number of snapshots to retain (autopurge.snapRetainCount)
- Configure the log/snapshot purge interval (autopurge.purgeInterval); a zoo.cfg sketch follows this list
- Configure zookeeper cluster
- server.1=node01:2888:3888
- server.2=node02:2888:3888
- server.3=node03:2888:3888
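A zoo.cfg sketch covering the items above; the clientPort and autopurge values are typical defaults, not taken from the original (the remaining settings are inherited from zoo_sample.cfg):

dataDir=/export/soft/zookeeper-3.4.9/zkdatas
clientPort=2181
autopurge.snapRetainCount=3   # number of snapshots to retain
autopurge.purgeInterval=1     # purge interval in hours
server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888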
(5) Create the myid file
- vi myid, or simply:
- echo 1 > /export/soft/zookeeper-3.4.9/zkdatas/myid (per-node commands below)
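The myid value on each host must match its server.N entry in zoo.cfg (ids from the planning table):

echo 1 > /export/soft/zookeeper-3.4.9/zkdatas/myid   # on node01
echo 2 > /export/soft/zookeeper-3.4.9/zkdatas/myid   # on node02
echo 3 > /export/soft/zookeeper-3.4.9/zkdatas/myid   # on node03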
(6) The installation package is distributed to node02 node03
- scp -r /export/soft/zookeeper-3.4.9/ node02:/export/soft
- scp -r /export/soft/zookeeper-3.4.9/ node03:/export/soft
(7) Start the zookeeper service for each machine in the cluster
- Start the service: /export/soft/zookeeper-3.4.9/bin/zkServer.sh start
- View service status: /export/soft/zookeeper-3.4.9/bin/zkServer.sh status
III Building Hadoop cluster environment
(1) Cluster planning
| Server IP | 10.100.100.42 | 10.100.100.43 | 10.100.100.44 |
| --- | --- | --- | --- |
| Host name | node01 | node02 | node03 |
| NameNode | yes | no | no |
| SecondaryNameNode | yes | no | no |
| DataNode | yes | yes | yes |
| ResourceManager | yes | no | no |
| NodeManager | yes | yes | yes |
(2) Installation package download (Hadoop 2.7.5)
(3) See similar operations above for details of file upload and decompression
(4) Modify configuration files
- Modify core-site.xml, file path /export/soft/hadoop-2.7.5/etc/hadoop/core-site.xml
- Modification content:

<configuration>
  <!-- Set the file system type and master node -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node01:8020</value>
  </property>
  <!-- Set the Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/soft/hadoop-2.7.5/hadoopDatas/tempDatas</value>
  </property>
  <!-- Set the buffer size -->
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <!-- Set the HDFS trash interval (time before the recycle bin is emptied), in minutes -->
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
</configuration>

- Modify hdfs-site.xml, file path /export/soft/hadoop-2.7.5/etc/hadoop/hdfs-site.xml
- Modification content:

<configuration>
  <!-- Configure the secondaryNameNode access address -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node01:50090</value>
  </property>
  <!-- Configure the nameNode access address -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node01:50070</value>
  </property>
  <!-- Configure where the nameNode stores metadata -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas,file:///export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas2</value>
  </property>
  <!-- Configure where the dataNode stores data -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas,file:///export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas2</value>
  </property>
  <!-- Configure where the nameNode stores edit logs -->
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/nn/edits</value>
  </property>
  <!-- Configure the checkpoint file storage location -->
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/snn/name</value>
  </property>
  <!-- Configure the checkpoint edits storage location -->
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>file:///export/soft/hadoop-2.7.5/hadoopDatas/dfs/snn/edits</value>
  </property>
  <!-- Number of replicas stored for a single block -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Disable HDFS permission checking -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <!-- Single block size, 128 MB -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>

- Modify hadoop-env.sh: mainly set the JDK path to /export/soft/jdk1.8.0_144
- Modify mapred-site.xml, modification content:

<configuration>
  <!-- Enable the MapReduce small-job (uber) mode -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- Set the host and port of the job history service -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node01:10020</value>
  </property>
  <!-- Set the host and port of the job history web UI -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node01:19888</value>
  </property>
</configuration>
- Modify yarn-site.xml, modification content:

<configuration>
  <!-- Configure the yarn master node location -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node01</value>
  </property>
  <!-- Auxiliary service required for the MapReduce shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- How long aggregated logs are kept on HDFS, in seconds -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- Memory allocation scheme of the yarn cluster -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
- Modify mapred-env.sh: mainly modify the JDK path
export JAVA_HOME=/export/soft/jdk1.8.0_144
- Modify the slaves file to configure the cluster's worker nodes; the contents are shown below
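The slaves file lists one worker hostname per line; per the planning table, all three nodes run DataNode and NodeManager:

node01
node02
node03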
- Create the data directories
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/tempDatas
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/namenodeDatas2
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/datanodeDatas2
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/nn/edits
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/snn/name
mkdir -p /export/soft/hadoop-2.7.5/hadoopDatas/dfs/snn/edits
- Installation package distribution (run from /export/soft)
scp -r hadoop-2.7.5/ node02:$PWD
scp -r hadoop-2.7.5/ node03:$PWD
- Configure hadoop environment variables
vi /etc/profile
export HADOOP_HOME=/export/soft/hadoop-2.7.5
export PATH=:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
- Start the cluster on the primary node
cd /export/soft/hadoop-2.7.5/
bin/hdfs namenode -format   # run only once, before the first startup
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
View HDFS on the web: http://node01:50070/explorer.html#/
View the yarn cluster: http://node01:8088/cluster
View completed history tasks: http://node01:19888/jobhistory
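A quick way to confirm the daemons came up is running jps on each node; the expected processes below follow the planning table and the start commands above:

jps
# node01: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
# node02/node03: DataNode, NodeManager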
IV HBase (Hadoop database) installation
(1) Download and upload the installation package (hbase 2.4.0)
Download path: http://archive.apache.org/dist/hbase
(2) Modify the hbase-env.sh file
export JAVA_HOME=/export/soft/jdk1.8.0_144
export HBASE_MANAGES_ZK=false
(3) Modify the hbase-site.xml file
<configuration>
  <!-- Storage location of HBase data in HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node01:8020/hbase</value>
  </property>
  <!-- HBase run mode: false = standalone, true = distributed -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper cluster address -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node01,node02,node03</value>
  </property>
  <!-- ZooKeeper snapshot data directory -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/export/soft/zookeeper-3.4.9/zkdatas</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <!-- Set to false in distributed mode -->
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>
(4) Configure environment variables (vi /etc/profile, then source /etc/profile)
export HBASE_HOME=/export/soft/hbase-2.4.0
export PATH=:$HBASE_HOME/bin:$HBASE_HOME/sbin:$PATH
(5) Copy the dependent library
cp $HBASE_HOME/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar $HBASE_HOME/lib
(6) Modify the regionservers file; it lists the hosts that run a RegionServer, as shown below
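A sketch of the regionservers file, assuming a RegionServer on every node (one hostname per line):

node01
node02
node03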
(7) Installation package distribution
scp -r hbase-2.4.0/ node02:$PWD
scp -r hbase-2.4.0/ node03:$PWD
(8) Modify the environment variables of node02 and node03 and make them take effect
(9) Start HBase
First, make sure that the zookeeper cluster and the hadoop cluster are running. Then start HBase on the primary node.
Enter the HBase bin directory and execute start-hbase.sh
(10) Verify that HBase started successfully
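One way to check, assuming the install paths above (status is a built-in HBase shell command):

jps   # expect HMaster on node01 and HRegionServer on each node
/export/soft/hbase-2.4.0/bin/hbase shell
# then, at the prompt: status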
(11) HBase web interface
http://10.100.100.42:16010
V Phoenix (5.1.2) plugin installation
(1) File download
File download address: http://phoenix.apache.org/download.html
(2) Upload and unzip the file: tar -xvf phoenix-hbase-2.4.0-5.1.2-bin.tar.gz -C /export/soft
(3) Copy the jar packages to the HBase lib directory
cp /export/soft/phoenix-hbase-2.4.0-5.1.2-bin/phoenix-*.jar /export/soft/hbase-2.4.0/lib/
(4) Distribute the jar packages to the other HBase nodes (run from /export/soft/hbase-2.4.0/lib)
scp phoenix-*.jar node02:$PWD
scp phoenix-*.jar node03:$PWD
(5) Modify configuration files
Modify HBase's configuration file: cd /export/soft/hbase-2.4.0/conf/ and add to hbase-site.xml:

<!-- Support HBase namespace mapping -->
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>
<!-- Support write-ahead-log encoding for indexes -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
(6) Distribute the configuration file
scp hbase-site.xml node02:$PWD
scp hbase-site.xml node03:$PWD
(7) Copy the configuration file to the phoenix directory
cp hbase-site.xml /export/soft/phoenix-hbase-2.4.0-5.1.2-bin/bin/
(8) Restart the HBase service
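After the restart, Phoenix can be verified with the bundled sqlline client pointed at the ZooKeeper quorum (a sketch; assumes Python is available on the node):

/export/soft/phoenix-hbase-2.4.0-5.1.2-bin/bin/sqlline.py node01,node02,node03:2181
# then, at the prompt: !tables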
New cluster component versions: Zookeeper 3.4.9, Hadoop 2.7.5, HBase 2.4.0, Phoenix 5.1.2 (built for HBase 2.4)
Check Hadoop 2.7.5 support for native libraries: bin/hadoop checknative
For Spring Boot integration with Hadoop, the client needs hadoop.dll in a Windows environment