Bill data warehouse construction - data warehouse concept and data collection

Posted by ddragas on Sat, 01 Jan 2022 01:39:50 +0100

1 data warehouse concept

Data Warehouse, abbreviated DW or DWH, is a strategic collection of data that supports the decision-making processes at all levels of an enterprise.
Analyzing the data in a data warehouse helps an enterprise improve business processes, control costs, and raise product quality.
A data warehouse is not the final destination of the data; it prepares the data for its final destination. This preparation includes cleaning, transformation, classification, reorganization, merging, splitting, aggregation, and so on.

2 project requirements

2.1 Project requirement analysis

1. Collect buried-point user behavior data in real time
2. Build the data warehouse in layers
3. Import business data on a daily schedule
4. Produce report analyses from the data in the warehouse

2.2 project framework

2.2.1 Technical selection

Data collection and transport: Flume, Kafka, Logstash, DataX, Sqoop
Data storage: Hive, MySQL, HDFS, HBase, S3
Data computation: Spark, Hive, Tez, Flink, Storm
Data query: Presto, Impala, Kylin

2.2.2 Common system architecture diagram design

2.2.3 Common system data flow design

2.2.4 Framework version selection

Software       Version
Hadoop         2.7.2
Flume          1.7.0
Kafka          0.11.0.2
Kafka Manager  1.3.3.22
Hive           1.2.1
Sqoop          1.4.6
MySQL          5.7
Azkaban        2.5.0
Java           1.8
Zookeeper      3.4.10

Note: when selecting framework versions, avoid the newest release; choose a stable version from roughly half a year before the latest one.
2.2.5 Cluster resource planning and design

                        Server 1            Server 2                      Server 3
HDFS                    NameNode, DataNode  DataNode                      DataNode
Yarn                    NodeManager         ResourceManager, NodeManager  NodeManager
Zookeeper               Zookeeper           Zookeeper                     Zookeeper
Flume (log collection)  Flume               Flume
Kafka                   Kafka               Kafka                         Kafka
Flume (Kafka consumer)                                                    Flume
Hive                    Hive
MySQL                   MySQL

3 data generation module

3.1 Bill data fields

private String sysId;               // Platform code
private String serviceName;         // Interface service name
private String homeProvinceCode;    // Home province code
private String visitProvinceCode;   // Visited province code
private String channelCode;         // Channel code
private String serviceCode;         // Business serial number
private String cdrGenTime;          // Bill generation (start) time
private String duration;            // Duration
private String recordType;          // Bill type
private String imsi;                // IMSI: 15-digit code beginning with 46000
private String msisdn;              // Mobile phone number
private String dataUpLinkVolume;    // Uplink traffic
private String dataDownLinkVolume;  // Downlink traffic
private String charge;              // Fee
private String resultCode;          // Result code

3.2 bill data format:

{"cdrGenTime":"20211206165309844","channelCode":"wRFUC","charge":"62.28","dataDownLinkVolume":"2197","dataUpLinkVolume":"467","duration":"1678","homeProvinceCode":"551","imsi":"460007237312954","msisdn":"15636420864","recordType":"gprs","resultCode":"000000","serviceCode":"398814910321850","serviceName":"a9icp8xcNy","sysId":"U1ob6","visitProvinceCode":"451"}

3.3 Write a project to simulate and generate bill data

See the project cdr_gen_app for details.
Package the project, deploy it to the hadoop101 machine, and run it to test data generation:

[hadoop@hadoop101 ~]$ head datas/call.log 
{"cdrGenTime":"20211206173630762","channelCode":"QEHVc","charge":"78.59","dataDownLinkVolume":"3007","dataUpLinkVolume":"3399","duration":"1545","homeProvinceCode":"571","imsi":"460002441249614","msisdn":"18981080935","recordType":"gprs","resultCode":"000000","serviceCode":"408688053609568","serviceName":"8lr23kV4C2","sysId":"j4FbV","visitProvinceCode":"931"}
{"cdrGenTime":"20211206173630884","channelCode":"TY4On","charge":"31.17","dataDownLinkVolume":"1853","dataUpLinkVolume":"2802","duration":"42","homeProvinceCode":"891","imsi":"460008474876800","msisdn":"15998955319","recordType":"gprs","resultCode":"000000","serviceCode":"739375515156215","serviceName":"6p87h0OHdC","sysId":"UnI9V","visitProvinceCode":"991"}
...
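
As a quick sanity check on the generated file (a sketch; it assumes the jq tool is installed on the machine), confirm that each line is a complete JSON object and inspect the bill type distribution:

[hadoop@hadoop101 ~]$ wc -l datas/call.log                                 # number of generated records
[hadoop@hadoop101 ~]$ head -1 datas/call.log | jq 'keys'                   # field names of the first record
[hadoop@hadoop101 ~]$ jq -r '.recordType' datas/call.log | sort | uniq -c  # bill type distribution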

4 data acquisition

4.1 Hadoop environment preparation

      Hadoop101           Hadoop102                     Hadoop103
HDFS  NameNode, DataNode  DataNode                      DataNode
Yarn  NodeManager         ResourceManager, NodeManager  NodeManager

4.1.1 Add the LZO support package

1) Download the hadoop-lzo project first
https://github.com/twitter/hadoop-lzo/archive/master.zip
2) The downloaded file is hadoop-lzo-master, a zip archive. Unzip it first, then compile it with Maven to produce hadoop-lzo-0.4.20.jar.
3) Put the compiled hadoop-lzo-0.4.20.jar into hadoop-2.7.2/share/hadoop/common/

[hadoop@hadoop101 common]$ pwd
/opt/modules/hadoop-2.7.2/share/hadoop/common
[hadoop@hadoop101 common]$ ls
hadoop-lzo-0.4.20.jar

4) Synchronize hadoop-lzo-0.4.20.jar to hadoop102 and hadoop103

[hadoop@hadoop101 common]$ xsync hadoop-lzo-0.4.20.jar

4.1.2 Add configuration

1) Add the following configuration to core-site.xml to enable LZO compression

<property>
    <name>io.compression.codecs</name>
    <value>
        org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.DefaultCodec,
        org.apache.hadoop.io.compress.BZip2Codec,
        org.apache.hadoop.io.compress.SnappyCodec,
        com.hadoop.compression.lzo.LzoCodec,
        com.hadoop.compression.lzo.LzopCodec
    </value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

2) Synchronize core-site.xml to hadoop102 and hadoop103

[hadoop@hadoop101 hadoop]$ xsync core-site.xml

4.1.3 Start the cluster

[hadoop@hadoop101 hadoop-2.7.2]$ sbin/start-dfs.sh
[hadoop@hadoop102 hadoop-2.7.2]$ sbin/start-yarn.sh

4.1.4 Verification

1) Check the running processes and the web UIs
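
A minimal verification sketch (assuming the Hadoop 2.x default web ports, and that the lzop command-line tool is installed for the LZO round trip):

[hadoop@hadoop101 ~]$ jps    # expect NameNode, DataNode, NodeManager here; ResourceManager on hadoop102
# Web UIs: HDFS NameNode at http://hadoop101:50070 , YARN ResourceManager at http://hadoop102:8088

# LZO round trip: hadoop fs -text should decompress the file through the LzopCodec
[hadoop@hadoop101 ~]$ echo "lzo test" > /tmp/lzo-test.txt && lzop /tmp/lzo-test.txt
[hadoop@hadoop101 ~]$ hadoop fs -put /tmp/lzo-test.txt.lzo /tmp/ && hadoop fs -text /tmp/lzo-test.txt.lzo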

4.2 Zookeeper environment preparation

           Hadoop101  Hadoop102  Hadoop103
Zookeeper  Zookeeper  Zookeeper  Zookeeper

4.2.1 ZK cluster start/stop script

1) Create a script in the /home/hadoop/bin directory of hadoop101

[hadoop@hadoop101 bin]$ vim zk.sh

Write the following in the script

#! /bin/bash

case $1 in
"start"){
    for i in hadoop101 hadoop102 hadoop103
    do
        ssh $i "/opt/modules/zookeeper-3.4.10/bin/zkServer.sh start"
    done
    };;
"stop"){
    for i in hadoop101 hadoop102 hadoop103
    do
        ssh $i "/opt/modules/zookeeper-3.4.10/bin/zkServer.sh stop"
    done
    };;
esac

2) Grant the script execute permission

[hadoop@hadoop101 bin]$ chmod +x zk.sh

3) Start the Zookeeper cluster

[hadoop@hadoop101 modules]$ zk.sh start

4) Stop the Zookeeper cluster

[hadoop@hadoop101 modules]$ zk.sh stop
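
To confirm the ensemble is healthy (a sketch using the stock zkServer.sh status command), check each node's mode; one node should report leader and the other two follower:

[hadoop@hadoop101 ~]$ for i in hadoop101 hadoop102 hadoop103; do ssh $i "/opt/modules/zookeeper-3.4.10/bin/zkServer.sh status"; done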

4.3 Flume environment preparation

                    Hadoop101  Hadoop102  Hadoop103
Flume (collection)  Flume

4.3.1 Flume configuration analysis

4.3.2 Specific configuration of Flume
(1) Create the flume1.conf file in the /opt/datas/calllogs/flume1/ directory
[hadoop@hadoop101 flume1]$ vim flume1.conf
Configure the following in the file

a1.sources=r1
a1.channels=c1 c2 
a1.sinks=k1 k2 

# configure source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/datas/calllogs/flume/log_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /opt/datas/calllogs/records/calllogs.+
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1 c2

#interceptor
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = com.cmcc.jackyan.flume.interceptor.LogETLInterceptor$Builder
a1.sources.r1.interceptors.i2.type = com.cmcc.jackyan.flume.interceptor.LogTypeInterceptor$Builder

# selector
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = recordType
a1.sources.r1.selector.mapping.volte= c1
a1.sources.r1.selector.mapping.cdr= c2

# configure channel
a1.channels.c1.type = memory
a1.channels.c1.capacity=10000
a1.channels.c1.byteCapacityBufferPercentage=20

a1.channels.c2.type = memory
a1.channels.c2.capacity=10000
a1.channels.c2.byteCapacityBufferPercentage=20

# configure sink
# volte-sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = topic-volte
a1.sinks.k1.kafka.bootstrap.servers = hadoop101:9092,hadoop102:9092,hadoop103:9092
a1.sinks.k1.kafka.flumeBatchSize = 2000
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.channel = c1

# cdr-sink
a1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k2.kafka.topic = topic-cdr
a1.sinks.k2.kafka.bootstrap.servers = hadoop101:9092,hadoop102:9092,hadoop103:9092
a1.sinks.k2.kafka.flumeBatchSize = 2000
a1.sinks.k2.kafka.producer.acks = 1
a1.sinks.k2.channel = c2
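
Once this agent is running, a quick way to confirm that the TAILDIR source is making progress (a sketch; the path is the positionFile configured above) is to check that the recorded file offsets grow as new bill files are written:

[hadoop@hadoop101 ~]$ cat /opt/datas/calllogs/flume/log_position.json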

Note: com.cmcc.jackyan.flume.interceptor.LogETLInterceptor and com.cmcc.jackyan.flume.interceptor.LogTypeInterceptor are the fully qualified class names of the custom interceptors; change them to match your own interceptors.

4.3.3 Custom Flume interceptors

Two custom interceptors are used here: an ETL interceptor and a log-type interceptor.
The ETL interceptor filters out logs with illegal timestamps or incomplete JSON data.
The log-type interceptor distinguishes volte bills from other bills so that they can be sent to different Kafka topics.
1) Create a Maven module flume-interceptor
2) Create the package com.cmcc.jackyan.flume.interceptor
3) Add the following configuration to the pom.xml file

    <dependencies>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.7.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>com.cmcc.jackyan.appclient.AppMain</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

4) Create the LogETLInterceptor class in the com.cmcc.jackyan.flume.interceptor package
Flume ETL interceptor LogETLInterceptor:

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class LogETLInterceptor implements Interceptor {

    // Logger for debugging
    private static final Logger logger = LoggerFactory.getLogger(LogETLInterceptor.class);

    @Override
    public void initialize() {

    }

    @Override
    public Event intercept(Event event) {

        String body = new String(event.getBody(), Charset.forName("UTF-8"));
//        logger.info("Before:" + body);

        // Prepend a timestamp to the raw JSON line
        event.setBody(LogUtils.addTimeStamp(body).getBytes());

        body = new String(event.getBody(), Charset.forName("UTF-8"));

//        logger.info("After:" + body);

        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {

        ArrayList<Event> intercepts = new ArrayList<>();

        // Traverse all events, dropping any for which intercept(Event) returned null
        for (Event event : events) {
            Event interceptEvent = intercept(event);
            if (interceptEvent != null) {
                intercepts.add(interceptEvent);
            }
        }
        return intercepts;
    }

    @Override
    public void close() {

    }

    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return new LogETLInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}

5) Flume log processing utility class LogUtils

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogUtils {

    private static Logger logger = LoggerFactory.getLogger(LogUtils.class);

    /**
     * Prepend a timestamp to a log line
     *
     * @param log the raw JSON bill record
     * @return the record prefixed with "<currentTimeMillis>|"
     */
    public static String addTimeStamp(String log) {

        // Sample input: {"cdrGenTime":"20211207091838913","channelCode":"k82Ce","charge":"86.14","dataDownLinkVolume":"3083","dataUpLinkVolume":"2311","duration":"133","homeProvinceCode":"210","imsi":"460004320498004","msisdn":"17268221069","recordType":"mms","resultCode":"000000","serviceCode":"711869363795868","serviceName":"H79cKXuJO4","sysId":"aGIL4","visitProvinceCode":"220"}
        long t = System.currentTimeMillis();

        return t + "|" + log;
    }
}

6) Flume log-type interceptor LogTypeInterceptor

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LogTypeInterceptor implements Interceptor {

    // Logger for debugging
    private static final Logger logger = LoggerFactory.getLogger(LogTypeInterceptor.class);

    @Override
    public void initialize() {

    }

    @Override
    public Event intercept(Event event) {

        // 1 Get the Flume event headers
        Map<String, String> headers = event.getHeaders();

        // 2 Get the JSON body received by Flume and convert it to a string
        byte[] json = event.getBody();
        String jsonStr = new String(json);

        String recordType = "";

        // volte bills
        if (jsonStr.contains("volte")) {
            recordType = "volte";
        }
        // all other bills
        else {
            recordType = "cdr";
        }

        // 3 Store the record type in the event header (read by the channel selector)
        headers.put("recordType", recordType);

//        logger.info("recordType:" + recordType);

        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        ArrayList<Event> interceptors = new ArrayList<>();

        for (Event event : events) {
            interceptors.add(intercept(event));
        }

        return interceptors;
    }

    @Override
    public void close() {

    }

    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return new LogTypeInterceptor();
        }

        @Override
        public void configure(Context context) {

        }
    }
}

7) Package
Once built, only the interceptor jar itself is needed; the jar-with-dependencies does not have to be uploaded. After packaging, put the jar under Flume's lib folder, as sketched below.
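A minimal build-and-deploy sketch (assuming Maven is available on the development machine; the jar name follows from the module's pom.xml):

$ mvn clean package
$ scp target/flume-interceptor-1.0-SNAPSHOT.jar hadoop@hadoop101:/opt/modules/flume/lib/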
8) The packaged jar must end up under the /opt/modules/flume/lib folder of hadoop101.

[hadoop@hadoop101 lib]$ ls | grep interceptor
flume-interceptor-1.0-SNAPSHOT.jar

9) Start Flume

[hadoop@hadoop101 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file /opt/datas/calllogs/flume1/flume1.conf &

4.3.4 Log collection Flume start/stop script
1) Create the script calllog-flume1.sh in the /home/hadoop/bin directory

[hadoop@hadoop101 bin]$ vim calllog-flume1.sh

Fill in the script as follows

#! /bin/bash

case $1 in
"start"){
        for i in hadoop101
        do
                echo " --------start-up $i consumption flume-------"
                ssh $i "nohup /opt/modules/flume/bin/flume-ng agent --conf /opt/modules/flume/conf/ --conf-file /opt/datas/calllogs/flume1/flume1.conf &"
        done
};;
"stop"){
        for i in hadoop101
        do
                echo " --------stop it $i consumption flume-------"
                ssh $i "ps -ef | grep flume1 | grep -v grep | awk '{print \$2}' | xargs kill -9"
        done

};;
esac

Note: nohup keeps the corresponding process running after you log out or close the terminal (nohup = no hang up).
2) Grant the script execute permission

[hadoop@hadoop101 bin]$ chmod +x calllog-flume1.sh

3) Start the flume1 agent

[hadoop@hadoop101 modules]$ calllog-flume1.sh start

4) Stop the flume1 agent

[hadoop@hadoop101 modules]$ calllog-flume1.sh stop

4.4 Kafka environment preparation

       Hadoop101  Hadoop102  Hadoop103
Kafka  Kafka      Kafka      Kafka

4.4.1 Kafka cluster start/stop script

1) Create the script kafka.sh in the /home/hadoop/bin directory

[hadoop@hadoop101 bin]$ vim kafka.sh

Fill in the script as follows

#! /bin/bash

case $1 in
"start"){
        for i in hadoop101 hadoop102 hadoop103
        do
                echo " --------start-up $i kafka-------"
                # For KafkaManager monitoring

                ssh $i "export JMX_PORT=9988 && /opt/modules/kafka/bin/kafka-server-start.sh -daemon /opt/modules/kafka/config/server.properties "
        done
};;
"stop"){
        for i in hadoop101 hadoop102 hadoop103
        do
                echo " --------stop it $i kafka-------"
                ssh $i "ps -ef | grep server.properties | grep -v grep| awk '{print $2}' | xargs kill >/dev/null 2>&1 &"
        done
};;
esac

Note: when starting Kafka, open the JMX port first so that KafkaManager can monitor the brokers later.
2) Grant the script execute permission

[hadoop@hadoop101 bin]$ chmod +x kafka.sh

3) Start the Kafka cluster

[hadoop@hadoop101 modules]$ kafka.sh start

4) Stop the Kafka cluster

[hadoop@hadoop101 modules]$ kafka.sh stop
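
To confirm that all three brokers registered with Zookeeper (a sketch using Zookeeper's stock zkCli.sh; the broker ids depend on each node's server.properties):

[hadoop@hadoop101 ~]$ /opt/modules/zookeeper-3.4.10/bin/zkCli.sh -server hadoop101:2181 ls /brokers/ids
# expect something like [0, 1, 2]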

4.4.2 View all Kafka topics

[hadoop@hadoop101 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181 --list

4.4.3 Create Kafka topics

Enter the /opt/modules/kafka/ directory and create the volte topic and the cdr topic respectively.
1) Create the volte topic

[hadoop@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181  --create --replication-factor 2 --partitions 3 --topic topic-volte  

2) Create the cdr topic

[hadoop@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181  --create --replication-factor 2 --partitions 3 --topic topic-cdr

4.4.4 Delete Kafka topics

[hadoop@hadoop101 kafka]$ bin/kafka-topics.sh --delete --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181 --topic topic-volte

[hadoop@hadoop101 kafka]$ bin/kafka-topics.sh --delete --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181 --topic topic-cdr

4.4.5 Produce messages

[hadoop@hadoop101 kafka]$ bin/kafka-console-producer.sh \
--broker-list hadoop101:9092 --topic topic-volte
>hello world
>hadoop  hadoop

4.4.6 Consume messages

[hadoop@hadoop102 kafka]$ bin/kafka-console-consumer.sh \
--bootstrap-server hadoop101:9092 --from-beginning --topic topic-volte

--from-beginning: reads all the existing data in the topic from the beginning. Include it or not according to the business scenario.

4.4.7 View the details of a topic

[hadoop@hadoop101 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181 \
--describe --topic topic-volte

4.5 Write a Flume consumer to sink Kafka data to HDFS

                        Hadoop101  Hadoop102  Hadoop103
Flume (Kafka consumer)                        Flume

4.5.1 Flume configuration analysis

Specific configuration of Flume
(1) Create the flume2.conf file in the /opt/datas/calllogs/flume2/ directory of hadoop103

[hadoop@hadoop103 flume2]$ vim flume2.conf

Configure the following in the file

## Components
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2

## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = hadoop101:9092,hadoop102:9092,hadoop103:9092
a1.sources.r1.kafka.zookeeperConnect = hadoop101:2181,hadoop102:2181,hadoop103:2181
a1.sources.r1.kafka.topics=topic-volte

## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers = hadoop101:9092,hadoop102:9092,hadoop103:9092
a1.sources.r2.kafka.zookeeperConnect = hadoop101:2181,hadoop102:2181,hadoop103:2181
a1.sources.r2.kafka.topics=topic-cdr

## channel1
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=10000

## channel2
a1.channels.c2.type=memory
a1.channels.c2.capacity=100000
a1.channels.c2.transactionCapacity=10000

## sink1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/calllogs/records/topic-volte/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = volte-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 30
a1.sinks.k1.hdfs.roundUnit = second

##sink2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /origin_data/calllogs/records/topic-cdr/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = cdr-
a1.sinks.k2.hdfs.round = true
a1.sinks.k2.hdfs.roundValue = 30
a1.sinks.k2.hdfs.roundUnit = second

## Do not generate a large number of small files
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0

a1.sinks.k2.hdfs.rollInterval = 30
a1.sinks.k2.hdfs.rollSize = 0
a1.sinks.k2.hdfs.rollCount = 0

## Output compressed files (CompressedStream requires the codeC settings below)
a1.sinks.k1.hdfs.fileType = CompressedStream 
a1.sinks.k2.hdfs.fileType = CompressedStream 

a1.sinks.k1.hdfs.codeC = lzop
a1.sinks.k2.hdfs.codeC = lzop

## Bind sources and sinks to channels
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1

a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2
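
After the agent has been running, verify that bills are landing in HDFS under the dated directories (a sketch; the paths come from the hdfs.path settings above, and %Y-%m-%d corresponds to date +%F):

[hadoop@hadoop101 ~]$ hadoop fs -ls /origin_data/calllogs/records/topic-volte/$(date +%F)
[hadoop@hadoop101 ~]$ hadoop fs -ls /origin_data/calllogs/records/topic-cdr/$(date +%F)
# volte-*.lzo and cdr-*.lzo files should roll in roughly every 30 seconds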

4.5.2 Flume exception handling
1) Problem description: the following exception is thrown when Flume starts consuming

ERROR hdfs.HDFSEventSink: process failed
java.lang.OutOfMemoryError: GC overhead limit exceeded

2) Solution:
(1) Add the following configuration to the /opt/modules/flume/conf/flume-env.sh file on the hadoop101 server

export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

(2) Synchronize the configuration to the hadoop102 and hadoop103 servers

[hadoop@hadoop101 conf]$ xsync flume-env.sh

4.5.3 Log consumption Flume start/stop script
1) Create the script calllog-flume2.sh in the /home/hadoop/bin directory

[hadoop@hadoop101 bin]$ vim calllog-flume2.sh

Fill in the script as follows

#! /bin/bash

case $1 in
"start"){
        for i in hadoop103
        do
                echo " --------start-up $i consumption flume-------"
                ssh $i "nohup /opt/modules/flume/bin/flume-ng agent --conf /opt/modules/flume/conf/ --conf-file /opt/datas/calllogs/flume2/flume2.conf &"
        done
};;
"stop"){
        for i in hadoop103
        do
                echo " --------stop it $i consumption flume-------"
                ssh $i "ps -ef | grep flume2 | grep -v grep | awk '{print \$2}' | xargs kill -9"
        done

};;
esac

2) Grant the script execute permission

[hadoop@hadoop101 bin]$ chmod +x calllog-flume2.sh

3) Start the flume2 agent

[hadoop@hadoop101 modules]$ calllog-flume2.sh start

4) Stop the flume2 agent

[hadoop@hadoop101 modules]$ calllog-flume2.sh stop

4.6 Collection channel start/stop script

1) Create the script cluster.sh in the /opt/datas/calllogs directory

[hadoop@hadoop101 calllogs]$ vim cluster.sh

Fill in the script as follows

#! /bin/bash

case $1 in
"start"){
        echo " -------- Start cluster -------"

        echo " -------- start-up hadoop colony -------"
        /opt/modules/hadoop-2.7.2/sbin/start-dfs.sh 
        ssh hadoop102 "/opt/modules/hadoop-2.7.2/sbin/start-yarn.sh"

        #Start Zookeeper cluster
        zk.sh start

        #Start Flume collection cluster
        calllog-flume1.sh start

        #Start Kafka cluster
        kafka.sh start

        sleep 4s;

        #Start Flume consumption cluster
        calllog-flume2.sh start
};;
"stop"){
        echo " -------- Stop cluster -------"

        #Stop Flume consumption cluster
        calllog-flume2.sh stop

        #Stop Kafka cluster
        kafka.sh stop

        sleep 4s;

        #Stop Flume collection cluster
        calllog-flume1.sh stop

        #Stop Zookeeper cluster
        zk.sh stop

        echo " -------- stop it hadoop colony -------"
        ssh hadoop102 "/opt/modules/hadoop-2.7.2/sbin/stop-yarn.sh"
        /opt/modules/hadoop-2.7.2/sbin/stop-dfs.sh 
};;
esac

2) Grant the script execute permission

[hadoop@hadoop101 calllogs]$ chmod +x cluster.sh

3) Start the whole cluster

[hadoop@hadoop101 calllogs]$ ./cluster.sh start

4) Stop the whole cluster

[hadoop@hadoop101 calllogs]$ ./cluster.sh stop
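
Finally, an end-to-end smoke test (a sketch; it assumes that files matching the TAILDIR pattern /opt/datas/calllogs/records/calllogs.+ from flume1.conf are being appended to, e.g. by the cdr_gen_app generator or by hand):

[hadoop@hadoop101 calllogs]$ ./cluster.sh start
[hadoop@hadoop101 calllogs]$ echo '{"recordType":"gprs","resultCode":"000000"}' >> /opt/datas/calllogs/records/calllogs.test
# the record is routed to topic-cdr and should reach HDFS within about a minute
[hadoop@hadoop101 calllogs]$ hadoop fs -ls /origin_data/calllogs/records/topic-cdr/$(date +%F)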

Topics: Big Data, Data Warehouse