E-commerce log items

Posted by lilleman on Wed, 22 Dec 2021 07:11:49 +0100

Project architecture design

Project system architecture

Based on the real business data architecture of an e-commerce website, the project takes data all the way from collection to use, closing the loop across front-end application, back-end services, data analysis, and platform deployment. The result is an e-commerce log analysis project suited to a teaching curriculum, implemented mainly with offline (batch) technology.

  • User visualization: responsible for interaction with users and display of business data. Implemented mainly in JS; not deployed on the Tomcat server.
  • Business logic program: implements the overall business logic, built with Spring to meet business requirements. Deployed on Tomcat.
  • Data storage
    • Business database: the project uses the widely adopted relational database MySQL, which stores the platform's business logic data.
    • HDFS distributed file storage: the project uses HBase + Hive to store the full history of business data, supporting both high-speed access and massive data volumes for future decision analysis.
  • Offline analysis
    • Log collection service: Flume NG collects users' page access behavior on the business platform and periodically sends it to the HDFS cluster.
    • Offline analysis and ETL: batch statistics are implemented with MapReduce + Hive SQL to compute the metric data.
    • Data transfer service: Sqoop batch-transfers the business data into Hive.

Project data flow

  • Analysis system (bf_transformer)

    • From data collection to page presentation

  • Log collection section

    • Flume reads log updates from the operation logs of the business services and periodically pushes them to HDFS. After HDFS receives these logs, an MR program filters the log information into the user access data stream uid|mid|platform|browser|timestamp. Once the computation completes, the data is merged with the data already in HBase.
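The filtering step above reduces each raw log line to a pipe-delimited access record. A minimal sketch of that parse (not the project's actual MR code; the field names mirror the uid|mid|platform|browser|timestamp stream described above):

```python
def parse_access_line(line):
    """Split a pipe-delimited access record into named fields.

    Returns None for malformed lines, mirroring the filtering the
    MR program performs before the merge into HBase.
    """
    fields = line.strip().split("|")
    if len(fields) != 5:
        return None  # drop malformed records
    uid, mid, platform, browser, timestamp = fields
    if not timestamp.isdigit():
        return None  # timestamp must be numeric epoch millis
    return {
        "uid": uid,
        "mid": mid,
        "platform": platform,
        "browser": browser,
        "timestamp": int(timestamp),
    }

record = parse_access_line("u-123|m-9|website|Chrome|1449137597974")
```

In the real pipeline this logic runs inside a MapReduce mapper; the sketch only shows the record shape.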
  • ETL part

  • Load the data initialized by the system into HBase through MapReduce.

  • Offline analysis part

  • The offline statistics and offline analysis services are scheduled through Oozie; tasks are triggered at their configured run times.

  • The offline analysis service loads data from HBase, runs multiple statistical algorithms, and writes the results to MySQL.

  • Data warehouse analysis service

    • Scheduling and execution of sql script

  • The data warehouse analysis service is likewise scheduled through Oozie, with tasks triggered at their configured run times.

  • The data analysis service loads data from each system's database into HDFS. An MR program then filters the acquired data (unifying the data format) and merges the result with the data in Hive. Hive processes this data with HQL scripts to compute transaction information, access information, and various other indicators, and the results are again merged back into Hive.

  • Application background execution workflow

    Note: instead of using the IP address to identify a user uniquely, we store a UUID in a cookie to identify the user.

    In our JS SDK, different events are defined according to the different data to be collected.

    • For example, for the pageview event, the JS SDK executes as follows:

  • analysis

    • PC side event analysis

    Each final analysis module needs different data, so we analyze the requirements module by module. User basic information analysis looks at the user's browsing behavior, so it only needs the pageview event.

    Browser information analysis and region information analysis simply add browser and region dimensions on top of user basic information analysis. The browser can be identified from window.navigator.userAgent, and the region can be derived from the user's IP address collected by the Nginx server; the pageview event therefore also covers these two modules.

    For external link analysis and user browsing depth analysis, we can add the URL of the current page and the URL of the previous page to the pageview event, so the pageview event covers these two modules as well.

    Order information analysis requires the PC side to send an event when an order is generated, so this module needs a new event, chargeRequest. Event analysis also needs the PC side to send new event data, which we define as the event event. In addition, we set up a launch event to record the first visit of new users.

    Each event on the PC side sends its data to a URL of the following format, with the collected data carried in the parameters after the URL: http://shsxt.com/shsxt.gif?requestdata
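The request above is just a tracking-pixel URL with URL-encoded parameters appended. A minimal sketch of how such a URL can be assembled (the parameter values here are illustrative, not real collected data):

```python
from urllib.parse import urlencode

def build_event_url(server_url, params):
    """Append URL-encoded collected data to the tracking pixel URL,
    matching the http://shsxt.com/shsxt.gif?requestdata format."""
    return server_url + "?" + urlencode(params)

url = build_event_url(
    "http://shsxt.com/shsxt.gif",
    {"en": "e_pv", "pl": "website", "sdk": "js"},
)
```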

    Final analysis module | PC JS SDK event
    User basic information analysis | pageview event
    Browser information analysis | pageview event
    Regional information analysis | pageview event
    External link data analysis | pageview event
    User browsing depth analysis | pageview event
    Order information analysis | chargeRequest event
    Event analysis | event event
    User basic information modification | launch event

PC-side JS SDK events

  • General parameters

The same information is sent with every tracking request (buried point).

Name | Content
Data sent | u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1449137597974&ver=1&pl=website&sdk=js&b_rst=1920*1080&u_ud=12BF4079-223E-4A57-AC60-C1A04D8F7A2F&b_iev=Mozilla%2F5.0%20(Windows%20NT%206.1%3B%20WOW64)%20AppleWebKit%2F537.1%20(KHTML%2C%20like%20Gecko)%20Chrome%2F21.0.1180.77%20Safari%2F537.1&l=zh-CN&en=e_l

Parameter | Type | Description
u_sd | string | Session id
c_time | string | Client creation time
ver | string | Version number, e.g. 0.0.1
pl | string | Platform, e.g. website
sdk | string | SDK type, e.g. js
b_rst | string | Browser resolution, e.g. 1800*678
u_ud | string | User/visitor unique identifier
b_iev | string | Browser userAgent information
l | string | Client language
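On the server side, the "Data sent" string is just a query string that can be decoded back into the parameters above. A minimal sketch using a shortened sample of the payload shown earlier:

```python
from urllib.parse import parse_qs

# Shortened sample of the "Data sent" query string from the table above
sample = ("u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1449137597974"
          "&ver=1&pl=website&sdk=js&b_rst=1920*1080&l=zh-CN&en=e_l")

# parse_qs returns lists per key; each key appears once here, so take [0]
fields = {k: v[0] for k, v in parse_qs(sample).items()}
```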
  • Launch event

This event is triggered when the user visits the website for the first time. No external call interface is provided; the SDK only collects the data for this event.

Name | Content
Data sent | en=e_l&<general parameters>

Parameter | Type | Description
en | string | Event name, e.g. e_l
  • Member login event

This event is triggered when the user logs in to the website. No external call interface is provided, and only the data collection of this event is realized.

Name | Content
Data sent | u_mid=phone&<general parameters>

Parameter | Type | Description
u_mid | string | Member id, consistent with the business system
  • Pageview event, implemented by the onPageView method

This event is triggered when the user accesses / refreshes the page. This event will be called automatically, or it can be called manually by the programmer.

Name | Content
Data sent | en=e_pv&p_ref=www.shsxt.com%3A8080&p_url=http%3A%2F%2Fwww.shsxt.com%3A8080%2Fvst_track%2Findex.html&<general parameters>

Parameter | Type | Description
en | string | Event name, e.g. e_pv
p_url | string | URL of the current page
p_ref | string | URL of the previous page
  • ChargeSuccess event

    This event is triggered when the user successfully places an order, which needs to be actively called by the program.

Method name | onChargeRequest
Data sent | oid=orderid123&on=%E4%BA%A7%E5%93%81%E5%90%8D%E7%A7%B0&cua=1000&cut=%E4%BA%BA%E6%B0%91%E5%B8%81&pt=%E6%B7%98%E5%AE%9&en=e_cs&<general parameters>

Parameter | Type | Required | Description
orderId | string | yes | Order id
on | string | yes | Product purchase description name
cua | double | yes | Order price
cut | string | yes | Currency type
pt | string | yes | Payment method
en | string | yes | Event name, e.g. e_cs
  • ChargeRefund event

    This event is triggered when the user fails to place an order, which needs to be actively called by the program.

Method name | onChargeRequest
Data sent | oid=orderid123&on=%E4%BA%A7%E5%93%81%E5%90%8D%E7%A7%B0&cua=1000&cut=%E4%BA%BA%E6%B0%91%E5%B8%81&pt=%E6%B7%98%E5%AE%9&en=e_cr&<general parameters>

Parameter | Type | Required | Description
orderId | string | yes | Order id
on | string | yes | Product purchase description name
cua | double | yes | Order price
cut | string | yes | Currency type
pt | string | yes | Payment method
en | string | yes | Event name, e.g. e_cr
  • Event event

When a visitor / user triggers a business defined event, the front-end program calls this method.

Method name | onEventDuration
Data sent | ca=%E7%B1%BB%E5%9E%8B&ac=%E5%8A%A8%E4%BD%9C&kv_p_url=http%3A%2F%2Fwww.shsxt.com%3A8080%2Fvst_track%2Findex.html&kv_%E5%B1%9E%E6%80%A7key=%E5%B1%9E%E6%80%A7value&du=1000&en=e_e&<general parameters>

Parameter | Type | Required | Description
ca | string | yes | Event category name
ac | string | yes | Event action name
kv_* | map | no | Custom Event properties
du | long | no | Event duration
en | string | yes | Event name, e.g. e_e
  • Data parameter description

    Different events collect different data and send it to the Nginx server, but the collected data share some common parameters, described below:

    Parameter | Type | Description
    en | string | Event name, e.g. e_pv
    ver | string | Version number, e.g. 0.0.1
    pl | string | Platform, e.g. website
    sdk | string | SDK type, e.g. js
    b_rst | string | Browser resolution, e.g. 1800*678
    b_iev | string | Browser userAgent information
    u_ud | string | User/visitor unique identifier
    l | string | Client language
    u_mid | string | Member id, consistent with the business system
    u_sd | string | Session id
    c_time | string | Client time
    p_url | string | URL of the current page
    p_ref | string | URL of the previous page
    tt | string | Title of the current page
    ca | string | Event category name
    ac | string | Event action name
    kv_* | string | Custom Event properties
    du | string | Event duration
    oid | string | Order id
    on | string | Order name
    cua | string | Payment amount
    cut | string | Payment currency type
    pt | string | Payment method
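The order-related events above mark several parameters as required. A minimal server-side validation sketch (the function name and the dict-based event representation are illustrative, not part of the project's code):

```python
# Required parameters for order (charge) events, from the tables above
REQUIRED_CHARGE_FIELDS = {"oid", "on", "cua", "cut", "pt", "en"}

def missing_charge_fields(event):
    """Return the required charge-event parameters absent from an event dict."""
    return sorted(REQUIRED_CHARGE_FIELDS - event.keys())

# An incomplete event: only the order id and event name were sent
incomplete = missing_charge_fields({"oid": "orderid123", "en": "e_cs"})
```

A collector could drop or flag events for which this returns a non-empty list.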
  • The order workflow is as follows: (similar to refund)

  • analysis

    • Program background event analysis

    In this project, only the chargeSuccess event is triggered from the program background. Its main function is to send order success information to the Nginx server. The send format is the same as on the PC side, and it posts to the same URL. The format is:

    Final analysis module | Background SDK event
    Order information analysis | chargeSuccess event, chargeRefund event
    • chargeSuccess event

      This event is triggered when the member finally pays successfully, which needs to be actively called by the program.

      Method name | onChargeSuccess
      Data sent | u_mid=shsxt&c_time=1449142044528&oid=orderid123&ver=1&en=e_cs&pl=javaserver&sdk=jdk

      Parameter | Type | Required | Description
      orderId | string | yes | Order id
      memberId | string | yes | Member id
    • ChargeRefund event

      This event is triggered when a member performs a refund operation, which needs to be actively called by the program.

      Method name | onChargeRefund
      Data sent | u_mid=shsxt&c_time=1449142044528&oid=orderid123&ver=1&en=e_cr&pl=javaserver&sdk=jdk

      Parameter | Type | Required | Description
      orderId | string | yes | Order id
      memberId | string | yes | Member id
  • Integration mode

The Java SDK can be imported into the project directly or added to the classpath.

  • Data parameter description

The parameters are described as follows:

Parameter | Type | Description
en | string | Event name, e.g. e_cs
ver | string | Version number, e.g. 0.0.1
pl | string | Platform, e.g. website, javaweb, php
sdk | string | SDK type, e.g. java
u_mid | string | Member id, consistent with the business system
c_time | string | Client time
oid | string | Order id

Project data model

  • HBase storage structure

    Here we include the timestamp in the rowkey, and HBase uses a single column family named log. So we finally create the eventlog table with one column family and timestamp-based rowkeys.

    • create ‘eventlog’, ‘log’. The rowkey design rule is: timestamp + CRC encoding of (uid + mid + en).
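The rowkey rule above can be sketched in a few lines. This is an assumption-laden illustration: the concatenation order, the use of CRC32, and the zero-padded formatting are not confirmed by the source, only the "timestamp + CRC of (uid + mid + en)" shape is.

```python
import zlib

def make_rowkey(timestamp, uid, mid, en):
    """Build an eventlog rowkey as timestamp + CRC32(uid + mid + en).

    The exact concatenation and encoding are assumptions; the project's
    real CRC details may differ.
    """
    crc = zlib.crc32((uid + mid + en).encode("utf-8")) & 0xFFFFFFFF
    return "%s_%010d" % (timestamp, crc)

key = make_rowkey("1449137597974", "u-123", "m-9", "e_pv")
```

Leading with the timestamp keeps rows time-ordered, while the CRC suffix spreads events that share a timestamp.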

Environment construction

NginxLog file service (single node)

1. Upload and unzip

# Go to the root directory
cd /root

# Upload the file
rz

# Unpack
tar -zxvf ./nginx-1.8.1.tar.gz

# Delete the archive
rm -rf ./nginx-1.8.1.tar.gz

2. Compile and install

# Install the dependencies nginx requires in advance
yum install gcc pcre-devel zlib-devel openssl-devel -y

# Go to the directory containing the configure script
cd nginx-1.8.1/

# Configure the build
./configure --prefix=/opt/sxt/nginx

# Compile and install
make && make install

3. Start verification

# Locate the nginx startup binary
cd /opt/sxt/nginx/sbin

# Start nginx
./nginx

# Verify in a browser
# http://shsxt-hadoop101:80

# Common commands
nginx -s reload
nginx -s quit

Flume ng (single node)

  • Upload and decompress
# Create the directory and upload the data file
mkdir -p /opt/sxt/flume

cd /opt/sxt/flume

rz

# Unpack
tar -zxvf apache-flume-1.6.0-bin.tar.gz

# Delete the archive
rm -rf apache-flume-1.6.0-bin.tar.gz

  • Modify profile
cd /opt/sxt/flume/apache-flume-1.6.0-bin/conf

cp flume-env.sh.template flume-env.sh

vim flume-env.sh

export JAVA_HOME=/usr/java/jdk1.8.0_231-amd64

# Set the memory size; only used when the channel type is memory
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
  • Modify environment variables
vim /etc/profile

export FLUME_HOME=/opt/sxt/flume/apache-flume-1.6.0-bin
export PATH=$FLUME_HOME/bin:$PATH

source /etc/profile
  • verification
flume-ng version
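The log collection service described earlier (Flume tailing the Nginx access log and periodically writing to HDFS) also needs an agent definition, which this walkthrough does not show. A minimal sketch: the agent/component names (a1, r1, c1, k1), the log path, and the HDFS address are all assumptions to adjust for your environment.

```properties
# Hypothetical Flume 1.6 agent: tail the nginx access log into HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# exec source: follow the nginx access log
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/sxt/nginx/logs/access.log
a1.sources.r1.channels = c1

# memory channel (this is where the JAVA_OPTS heap setting matters)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# HDFS sink: one directory per day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://shsxt-hadoop101:8020/log/%Y%m%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Started with `flume-ng agent -n a1 -f <conf-file>`, this would feed the HDFS directory that the offline MR/ETL jobs read.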

Sqoop (single node)

  • install

Upload, decompress, modify and verify the configuration file

# Create the folder
mkdir -p /opt/sxt/sqoop

# Upload
rz

# Unpack
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

# Delete the archive
rm -rf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

# Configure environment variables (in /etc/profile)
export SQOOP_HOME=/opt/sxt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$SQOOP_HOME/bin:$PATH

source /etc/profile

# Add the MySQL connector jar here
cd ./sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/

# Rename the configuration file
cd /opt/sxt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf

mv sqoop-env-template.sh sqoop-env.sh

# Modify the startup check script (comment out checks for unused components)
cd /opt/sxt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin

vim configure-sqoop

# Comment out the following
##if [ -z "${HCAT_HOME}" ]; then
##  if [ -d "/usr/lib/hive-hcatalog" ]; then
##    HCAT_HOME=/usr/lib/hive-hcatalog
##  elif [ -d "/usr/lib/hcatalog" ]; then
##    HCAT_HOME=/usr/lib/hcatalog
##  else
##    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
##    if [ ! -d ${HCAT_HOME} ]; then
##       HCAT_HOME=${SQOOP_HOME}/../hcatalog
##    fi
##  fi
##fi
##if [ -z "${ACCUMULO_HOME}" ]; then
##  if [ -d "/usr/lib/accumulo" ]; then
##    ACCUMULO_HOME=/usr/lib/accumulo
##  else
##    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
##  fi
##fi

## Moved to be a runtime check in sqoop.
##if [ ! -d "${HCAT_HOME}" ]; then
##  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
##  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
##fi

##if [ ! -d "${ACCUMULO_HOME}" ]; then
##  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
##  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
##fi

##export HCAT_HOME
##export ACCUMULO_HOME


# Verify
sqoop version

# Verify the connection between sqoop and the database
sqoop list-databases --connect jdbc:mysql://shsxt-hadoop101:3306/ --username root --password 123456

Integration of Hive and HBase

hive and hbase synchronization https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

  • Map data from HBase into Hive
    • Both are stored on HDFS
    • So Hive can point at HBase's data storage path when creating a table
    • However, deleting the Hive table does not delete the HBase table
    • Conversely, if the HBase table is deleted, the data behind the Hive table is gone as well
# Copy the handler jar, configure the cluster information, and specify the HBase mapping when creating tables
cp /opt/sxt/apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar /opt/sxt/hbase-1.4.13/lib/

# Check that the jar is in place (on all three nodes)
ls /opt/sxt/hbase-1.4.13/lib/hive-hbase-handler-*

# Add the property to hive's configuration file:
vim /opt/sxt/apache-hive-1.2.1-bin/conf/hive-site.xml

# New property (added on all three nodes):
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>shsxt-hadoop101:2181,shsxt-hadoop102:2181,shsxt-hadoop103:2181</value>
    </property>


# Verify
# First create a mapped table in hive, then query it

CREATE EXTERNAL TABLE brower1 (
`id` string, 
`name` string, 
`version` string)  
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:browser,info:browser_v")  
TBLPROPERTIES ("hbase.table.name" = "event");

CREATE EXTERNAL TABLE tmp_order (
`key` string, 
`name` string, 
`age` string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,info:u_ud,info:u_sd')
TBLPROPERTIES ('hbase.table.name'='event');

Hive on Tez (single node)

1. Upload the tez.tar.gz package from the apache-tez-0.8.5-bin/share directory to HDFS

cd /opt/bdp/

rz

tar -zxvf apache-tez-0.8.5-bin.tar.gz

rm -rf apache-tez-0.8.5-bin.tar.gz

cd apache-tez-0.8.5-bin/share/

hadoop fs -mkdir -p /bdp/tez/

hadoop fs -put tez.tar.gz /bdp/tez/

hadoop fs -chmod -R 777 /bdp

hadoop fs -ls /bdp/tez/

2. Create a tez-site.xml file in the ${HIVE_HOME}/conf directory, as follows:

cd /opt/bdp/apache-hive-1.2.1-bin/conf/

vim tez-site.xml 


<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>/bdp/tez/tez.tar.gz</value>
    </property>
    <property>
        <name>tez.container.max.java.heap.fraction</name>
        <value>0.2</value>
        </property>
</configuration>

Note: the path configured in tez.lib.uris is the HDFS path where the tez.tar.gz package was deployed in the previous step. The tez-site.xml file also needs to be copied to the corresponding conf directories on the nodes running the HiveServer2 and HiveMetastore services.

3. Unpack the tez.tar.gz package in the apache-tez-0.8.5-bin/share directory into a new lib directory

cd /opt/bdp/apache-tez-0.8.5-bin/share/

ll

mkdir lib

tar -zxvf tez.tar.gz -C lib/

4. Copy all jar packages under the lib and lib/lib directories to the ${HIVE_HOME}/lib directory

cd lib

pwd

cp *.jar /opt/bdp/apache-hive-1.2.1-bin/lib/

cp lib/*.jar /opt/bdp/apache-hive-1.2.1-bin/lib/

ll /opt/bdp/apache-hive-1.2.1-bin/lib/tez-*

Note: Tez's dependent packages need to be copied to the corresponding directories of the nodes where HiveServer2 and HiveMetastore services are located.

5. After completing the above operations, restart the HiveServer and HiveMetastore services

nohup hive --service metastore > /dev/null 2>&1 &

nohup hiveserver2 > /dev/null 2>&1 &

netstat -apn |grep 10000
netstat -apn |grep 9083

Hive2 On Tez test: test with hive command

hive

set hive.tez.container.size=3020;

set hive.execution.engine=tez;

use bdp;

select count(*) from test;

Oozie build

Deploy Hadoop (CDH version)

  • Modify Hadoop configuration

    core-site.xml

    <!-- Hosts from which the Oozie server user may act as a proxy -->
    <property>
          <name>hadoop.proxyuser.atguigu.hosts</name>
          <value>*</value>
    </property>
    
    <!-- Groups that the Oozie proxy user may impersonate -->
    <property>
          <name>hadoop.proxyuser.atguigu.groups</name>
         <value>*</value>
    </property>
    

    mapred-site.xml

    <!-- MapReduce JobHistory Server address, default port 10020 -->
    <property>
       <name>mapreduce.jobhistory.address</name>
       <value>shsxt_hadoop102:10020</value>
    </property>
    
    <!-- MapReduce JobHistory Server web UI address, default port 19888 -->
    <property>
       <name>mapreduce.jobhistory.webapp.address</name>
       <value>shsxt_hadoop102:19888</value>
    </property>
    

    yarn-site.xml

    <!-- Task history service -->
    <property> 
          <name>yarn.log.server.url</name> 
          <value>http://shsxt_hadoop102:19888/jobhistory/logs/</value> 
    </property>
    

Remember to synchronize these files to the other machine nodes with scp after completion

  • Restart Hadoop cluster
sbin/start-dfs.sh

sbin/start-yarn.sh

sbin/mr-jobhistory-daemon.sh start historyserver

Note: if you need to start JobHistoryServer, you'd better run an MR task for testing.

Deploy Oozie

  • Unzip Oozie
 tar -zxvf /opt/sxt/cdh/oozie-4.0.0-cdh5.3.6.tar.gz -C ./
  • In the Oozie root directory, unzip oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz
 tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../

After completion, a hadooplibs directory will appear under the Oozie directory.

  • Create the libext directory under the Oozie directory
mkdir libext/
  • Copy dependent Jar package

    • Copy the jar packages in hadooplibs to the libext directory:
    cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
    
    • Copy the Mysql driver package to the libext Directory:
    cp -a /root/mysql-connector-java-5.1.27-bin.jar ./libext/
    
  • Add ext-2.2.zip to the libext/ directory

    ext is a JS framework used to render Oozie's front-end pages:

     cp -a /root/ext-2.2.zip libext/
    
  • Modify Oozie profile

    oozie-site.xml

Property | Value | Description
oozie.service.JPAService.jdbc.driver | com.mysql.jdbc.Driver | JDBC driver
oozie.service.JPAService.jdbc.url | jdbc:mysql://shsxt_hadoop101:3306/oozie | Database address Oozie uses
oozie.service.JPAService.jdbc.username | root | Database user name
oozie.service.JPAService.jdbc.password | 123456 | Database password
oozie.service.HadoopAccessorService.hadoop.configurations | *=/opt/sxt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop | Lets Oozie reference Hadoop's configuration files
  • Create Oozie's database in Mysql

    Enter Mysql and create oozie database:

    mysql -uroot -p000000
    
    create database oozie;
    
    grant all on *.* to root@'%' identified by '123456';
    
    flush privileges;
    
    exit;
    
  • Initialize Oozie

    1) Upload the oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz file under the Oozie directory to HDFS:

    Tip: the yarn sharelib tar.gz file will be unpacked automatically

    bin/oozie-setup.sh sharelib create -fs hdfs://shsxt_hadoop102:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
    

    After successful execution, open the NameNode web UI on port 50070 and check whether files were generated in the corresponding directory.

    2) Create the oozie.sql file and initialize the database

    bin/ooziedb.sh create -sqlfile oozie.sql -run
    

    3) Package the project and generate the war package

    bin/oozie-setup.sh prepare-war
    
  • Startup and shutdown of Oozie

    The startup command is as follows:

    bin/oozied.sh start
    

    The closing command is as follows:

    bin/oozied.sh stop
    

Project realization

System environment:

System | Version
Windows | 10 Professional Edition
Linux | CentOS 7

Development tools:

Tool | Version
IDEA | 2019.2.4
Maven | 3.6.2
JDK | 1.8+

Cluster environment:

Framework | Version
hadoop | 2.6.5
zookeeper | 3.4.10
hbase | 1.3.1
flume | 1.6.0
sqoop | 1.4.6

Hardware environment:

Hardware | hadoop102 | hadoop103 | hadoop104
Memory | 1G | 1G | 1G
CPU | 2 cores | 1 core | 1 core
Hard disk | 50G | 50G | 50G

Data production

data structure

In HBase we include the timestamp in the rowkey and use a single column family named log. So we finally create the eventlog table with one column family and timestamp-based rowkeys: create ‘eventlog’, ‘log’

The rowkey design rule is: timestamp + CRC encoding of (uid + mid + en)

Column | Description | Example
browser | Browser name | 360
browser_v | Browser version | 3
city | City | Guiyang City
country | Country | China
en | Event name | e_l
os | Operating system | linux
os_v | Operating system version | 1
p_url | URL of the current page | http://www.tmall.com
pl | Platform | website
province | Province | Guizhou Province
s_time | System time | 1595605873000
u_sd | Session id | 12344F83-6357-4A64-8527-F09216974234
u_ud | User id | 26866661

Write code

Create a new maven project: shsxt_ecshop

  • pom file configuration (customize the package name and the technology versions you use; the following is required):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.3.2.RELEASE</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.xxxx</groupId>
	<artifactId>bigdatalog</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>bigdatalog</name>
	<description>Demo project for Spring Boot</description>

	<properties>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-freemarker</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
			<exclusions>
				<exclusion>
					<groupId>org.junit.vintage</groupId>
					<artifactId>junit-vintage-engine</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>

</project>

  • analytics.js: new js file

    Create an anonymous self-invoking function that declares: simple cookie operations (get, set), parameter settings, starting a session, the user's first visit (launch), page views, order requests, user-defined events, a method that must run before any external method is called, sending the data to the server, adding the common fields to the data sent to the log collection server, and encoding the parameters into a query string.

Tip: a custom parameter is required: serverUrl: "http://bd1601/shsxt.jpg"

(function () {
//The cookie is used to determine whether it is a new access.
    var CookieUtil = {
        // get the cookie of the key is name
        get: function (name) {
            var cookieName = encodeURIComponent(name) + "=", cookieStart = document.cookie
                .indexOf(cookieName), cookieValue = null;
            if (cookieStart > -1) {
                var cookieEnd = document.cookie.indexOf(";", cookieStart);
                if (cookieEnd == -1) {
                    cookieEnd = document.cookie.length;
                }
                cookieValue = decodeURIComponent(document.cookie.substring(
                    cookieStart + cookieName.length, cookieEnd));
            }
            return cookieValue;
        },
        // set the name/value pair to browser cookie
        set: function (name, value, expires, path, domain, secure) {
            var cookieText = encodeURIComponent(name) + "="
                + encodeURIComponent(value);

            if (expires) {
                // set the expires time
                var expiresTime = new Date();
                expiresTime.setTime(expires);
                cookieText += ";expires=" + expiresTime.toGMTString();
            }

            if (path) {
                cookieText += ";path=" + path;
            }

            if (domain) {
                cookieText += ";domain=" + domain;
            }

            if (secure) {
                cookieText += ";secure";
            }

            document.cookie = cookieText;
        },
        setExt: function (name, value) {
            this.set(name, value, new Date().getTime() + 315360000000, "/");
        }
    };
    

//===================================================================================
    
    // The subject is actually tracker js
    var tracker = {
        // config
        clientConfig: {
            //Address of the log server
            serverUrl: "http://bd1601/shsxt.jpg",
            //session expiration time
            sessionTimeout: 360, // 360s -> 6min
            //Maximum waiting time
            maxWaitTime: 3600, // 3600s -> 60min -> 1h
            //Version version
            ver: "1"
        },

        //cookie expiration time
        cookieExpiresTime: 315360000000, // cookie expiration time, 10 years

        //General data
        columns: {
            // The name of the column sent to the server
            eventName: "en",
            version: "ver",
            platform: "pl",
            sdk: "sdk",
            uuid: "u_ud",
            memberId: "u_mid",
            sessionId: "u_sd",
            clientTime: "c_time",
            language: "l",
            userAgent: "b_iev",
            resolution: "b_rst",
            currentUrl: "p_url",
            referrerUrl: "p_ref",
            title: "tt",
            orderId: "oid",
            orderName: "on",
            currencyAmount: "cua",
            currencyType: "cut",
            paymentType: "pt",
            category: "ca",
            action: "ac",
            kv: "kv_",
            duration: "du"
        },

        //Value to set to common data
        keys: {
            pageView: "e_pv",
            chargeRequestEvent: "e_crt",
            launch: "e_l",
            eventDurationEvent: "e_e",
            sid: "bftrack_sid",
            uuid: "bftrack_uuid",
            mid: "bftrack_mid",
            preVisitTime: "bftrack_previsit",

        },

        /**
         * Get session id
         */
        getSid: function () {
            return CookieUtil.get(this.keys.sid);
        },

        /**
         * Save session id to cookie
         */
        setSid: function (sid) {
            if (sid) {
                CookieUtil.setExt(this.keys.sid, sid);
            }
        },

        /**
         * Get uuid from cookie
         */
        getUuid: function () {
            return CookieUtil.get(this.keys.uuid);
        },

        /**
         * Save uuid to cookie
         */
        setUuid: function (uuid) {
            if (uuid) {
                CookieUtil.setExt(this.keys.uuid, uuid);
            }
        },

        /**
         * Get memberID
         */
        getMemberId: function () {
            return CookieUtil.get(this.keys.mid);
        },

        /**
         * Set mid
         */
        setMemberId: function (mid) {
            if (mid) {
                CookieUtil.setExt(this.keys.mid, mid);
            }
        },

        //Start a session
        startSession: function () {
            // Method triggered when js is loaded
            if (this.getSid()) {
                // The session id exists, indicating that the uuid also exists
                if (this.isSessionTimeout()) {
                    // The session expires and a new session is generated
                    this.createNewSession();
                } else {
                    // The session has not expired. Update the latest access time
                    this.updatePreVisitTime(new Date().getTime());
                }
            } else {
                // The session id does not exist, indicating that the uuid does not exist
                this.createNewSession();
            }
            //Report a pageview for the current visit
            this.onPageView();
        },

        //First visit: trigger the launch event
        onLaunch: function () {
            // Trigger launch event
            var launch = {};
            launch[this.columns.eventName] = this.keys.launch; // Set event name
            this.setCommonColumns(launch); // Set public columns
            this.sendDataToServer(this.parseParam(launch)); // Finally send encoded data
        },
		//User access page
        onPageView: function () {
            // Trigger page view event
            if (this.preCallApi()) {
                var time = new Date().getTime();
                var pageviewEvent = {};
                pageviewEvent[this.columns.eventName] = this.keys.pageView;
                pageviewEvent[this.columns.currentUrl] = window.location.href; // Set current url
                pageviewEvent[this.columns.referrerUrl] = document.referrer; // Set the url of the previous page
                pageviewEvent[this.columns.title] = document.title; // Set title
                this.setCommonColumns(pageviewEvent); // Set public columns
                this.sendDataToServer(this.parseParam(pageviewEvent)); // Finally send encoded data
                this.updatePreVisitTime(time);
            }
        },

        //User order request
        onChargeRequest: function (orderId, name, currencyAmount, currencyType, paymentType) {
            // Event triggered to generate an order
            if (this.preCallApi()) {
                if (!orderId || !currencyType || !paymentType) {
                    this.log("order id,Currency type and payment method cannot be blank");
                    return;
                }

                if (typeof (currencyAmount) == "number") {
                    // Amount must be a number
                    var time = new Date().getTime();
                    var chargeRequestEvent = {};
                    chargeRequestEvent[this.columns.eventName] = this.keys.chargeRequestEvent;
                    chargeRequestEvent[this.columns.orderId] = orderId;
                    chargeRequestEvent[this.columns.orderName] = name;
                    chargeRequestEvent[this.columns.currencyAmount] = currencyAmount;
                    chargeRequestEvent[this.columns.currencyType] = currencyType;
                    chargeRequestEvent[this.columns.paymentType] = paymentType;
                    this.setCommonColumns(chargeRequestEvent); // Set public columns
                    this.sendDataToServer(this.parseParam(chargeRequestEvent)); // Finally send encoded data
                    this.updatePreVisitTime(time);
                } else {
                    this.log("Order amount must be numeric");
                    return;
                }
            }
        },

        //User defined events
        onEventDuration: function (category, action, map, duration) {
            // Trigger event
            if (this.preCallApi()) {
                if (category && action) {
                    var time = new Date().getTime();
                    var event = {};
                    event[this.columns.eventName] = this.keys.eventDurationEvent;
                    event[this.columns.category] = category;
                    event[this.columns.action] = action;
                    if (map) {
                        for (var k in map) {
                            if (k && map[k]) {
                                event[this.columns.kv + k] = map[k];
                            }
                        }
                    }
                    if (duration) {
                        event[this.columns.duration] = duration;
                    }
                    this.setCommonColumns(event); // Set public columns
                    this.sendDataToServer(this.parseParam(event)); // Finally send encoded data
                    this.updatePreVisitTime(time);
                } else {
                    this.log("category and action Cannot be empty");
                }
            }
        },

        /**
         * Methods that must be executed before executing external methods
         */
        preCallApi: function () {
            if (this.isSessionTimeout()) {
                // If true, it indicates that a new one needs to be created
                this.startSession();
            } else {
                this.updatePreVisitTime(new Date().getTime());
            }
            return true;
        },

        //Send data to server
        sendDataToServer: function (data) {

            // alert(data);

            // Send data to the server, where data is a string
            var that = this;
            var i2 = new Image(1, 1);
            // <img src="url"></img>
            i2.onerror = function () {
                // Retry operation can be performed here
            };
            // e.g. http://bd1601/log.gif?<data>, where data is the url-encoded payload
            i2.src = this.clientConfig.serverUrl + "?" + data;
        },

        /**
         * Add the common part sent to the log collection server to the data
         */
        setCommonColumns: function (data) {
            data[this.columns.version] = this.clientConfig.ver;
            data[this.columns.platform] = "website";
            data[this.columns.sdk] = "js";
            data[this.columns.uuid] = this.getUuid(); // Set user id
            data[this.columns.memberId] = this.getMemberId(); // Set member id
            data[this.columns.sessionId] = this.getSid(); // Set sid
            data[this.columns.clientTime] = new Date().getTime(); // Set client time
            data[this.columns.language] = window.navigator.language; // Set browser language
            data[this.columns.userAgent] = window.navigator.userAgent; // Set browser type
            data[this.columns.resolution] = screen.width + "*" + screen.height; // Set browser resolution
        },

        /**
         * Create a new session; if this is the first visit (no uuid yet), generate the uuid and trigger the launch event.
         */
        createNewSession: function () {
            var time = new Date().getTime(); // Get current operation time
            // 1. Update the session
            var sid = this.generateId(); // Generate a session id
            this.setSid(sid);
            this.updatePreVisitTime(time); // Update last access time
            // 2. View uuid
            if (!this.getUuid()) {
                // The uuid does not exist. First create the uuid, then save it to the cookie, and finally trigger the launch event
                var uuid = this.generateId(); // Generate uuid
                this.setUuid(uuid);
                this.onLaunch();
            }
        },

        /**
         * Parameter encoding return string
         */
        parseParam: function (data) {
            var params = "";
//			{key:value,key2:value2}
            for (var e in data) {
                if (e && data[e]) {
                    params += encodeURIComponent(e) + "="
                        + encodeURIComponent(data[e]) + "&";
                }
            }
            if (params) {
                return params.substring(0, params.length - 1);
            } else {
                return params;
            }
        },

        /**
         * Generate uuid
         */
        generateId: function () {
            var chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
            var tmpid = [];
            var r;
            tmpid[8] = tmpid[13] = tmpid[18] = tmpid[23] = '-';
            tmpid[14] = '4';

            for (var i = 0; i < 36; i++) {
                if (!tmpid[i]) {
                    r = 0 | Math.random() * 16;
                    tmpid[i] = chars[(i == 19) ? (r & 0x3) | 0x8 : r];
                }
            }
            return tmpid.join('');
        },

        /**
         * Judge whether the session has expired: if the interval between now and the last visit time is less than this.clientConfig.sessionTimeout,<br/>
         * return false; otherwise return true.
         */
        isSessionTimeout: function () {
            var time = new Date().getTime();
            var preTime = CookieUtil.get(this.keys.preVisitTime);
            if (preTime) {
                // If the latest access time exists, judge the interval
                return time - preTime > this.clientConfig.sessionTimeout * 1000;
            }
            return true;
        },

        /**
         * Update last access time
         */
        updatePreVisitTime: function (time) {
            CookieUtil.setExt(this.keys.preVisitTime, time);
        },

        /**
         * Print log
         */
        log: function (msg) {
            console.log(msg);
        },

    };

    // Methods exposed externally
    window.__AE__ = {
        startSession: function () {
            tracker.startSession();
        },
        onPageView: function () {
            tracker.onPageView();
        },
        onChargeRequest: function (orderId, name, currencyAmount, currencyType, paymentType) {
            tracker.onChargeRequest(orderId, name, currencyAmount, currencyType, paymentType);
        },
        onEventDuration: function (category, action, map, duration) {
            tracker.onEventDuration(category, action, map, duration);
        },
        setMemberId: function (mid) {
            tracker.setMemberId(mid);
        }
    };

    // Automatic loading method
    var autoLoad = function () {
        // Set parameters
        var _aelog_ = window._aelog_ || [];
        var memberId = null;
        for (var i = 0; i < _aelog_.length; i++) {
            _aelog_[i][0] === "memberId" && (memberId = _aelog_[i][1]);
        }
        // Set the value of memberid according to the given memberid
        memberId && __AE__.setMemberId(memberId);
        // Start session
        __AE__.startSession();
    };

    autoLoad();
})();
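A page embedding this tracker would normally predefine `window._aelog_` before the script loads, which is what `autoLoad` reads to pick up the member id. A minimal sketch of that handshake (the stub `window` object and the member id value are made up so the snippet runs outside a browser):

```javascript
// Stub window object so the sketch runs outside a browser (hypothetical page setup)
var window = { _aelog_: [] };

// The page pushes its config before the tracker script loads
window._aelog_.push(["memberId", "zhangsan"]);

// This mirrors what autoLoad() does: scan _aelog_ for a memberId entry
var memberId = null;
for (var i = 0; i < window._aelog_.length; i++) {
    if (window._aelog_[i][0] === "memberId") memberId = window._aelog_[i][1];
}
console.log(memberId); // prints: zhangsan
```

In a real page the array would be filled in an inline script tag placed before the tracker script, so the value is already present when `autoLoad()` runs.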
  • AnalyticsEngineSDK.java

    Checks that the order id and member id are not empty; if both are present, builds the request url with the build method and hands it to the SendDataMonitor class for sending

Requires customization: accessUrl = "http://bd1601/shsxt.jpg";

/**
 * Analysis engine sdk java server-side data collection
 * 
 * @author root
 * @version 1.0
 *
 */
public class AnalyticsEngineSDK {
	// Log print object
	private static final Logger log = Logger.getGlobal();
	// The body of the request url
	public static final String accessUrl = "http://bd1601/shsxt.jpg";
	private static final String platformName = "java_server";
	private static final String sdkName = "jdk";
	private static final String version = "1";

	/**
	 * Trigger the event of successful order payment and send the event data to the server
	 * 
	 * @param orderId
	 *            Order payment id
	 * @param memberId
	 *            Order payment member id
	 * @return true if the data was successfully added to the send queue; otherwise false (invalid parameters or enqueue failure)
	 */
	public static boolean onChargeSuccess(String orderId, String memberId) {
		try {
			if (isEmpty(orderId) || isEmpty(memberId)) {
				// Order id or memberid is null
				log.log(Level.WARNING, "order id And members id Cannot be empty");
				return false;
			}
			// When the code is executed here, it means that neither the order id nor the member id is empty.
			Map<String, String> data = new HashMap<String, String>();
			data.put("u_mid", memberId);
			data.put("oid", orderId);
			data.put("c_time", String.valueOf(System.currentTimeMillis()));
			data.put("ver", version);
			data.put("en", "e_cs");
			data.put("pl", platformName);
			data.put("sdk", sdkName);
			// Create url
			String url = buildUrl(data);
			// Send url & Add url to queue
			SendDataMonitor.addSendUrl(url);
			return true;
		} catch (Throwable e) {

			log.log(Level.WARNING, "Sending data exception", e);
		}
		return false;
	}

	/**
	 * Trigger the order refund event and send the refund data to the server
	 * 
	 * @param orderId
	 *            Refund order id
	 * @param memberId
	 *            Refund member id
	 * @return Returns true if the data is sent successfully. Otherwise, false is returned.
	 */
	public static boolean onChargeRefund(String orderId, String memberId) {
		try {
			if (isEmpty(orderId) || isEmpty(memberId)) {
				// Order id or memberid is null
				log.log(Level.WARNING, "order id And members id Cannot be empty");
				return false;
			}
			// When the code is executed here, it means that neither the order id nor the member id is empty.
			Map<String, String> data = new HashMap<String, String>();
			data.put("u_mid", memberId);
			data.put("oid", orderId);
			data.put("c_time", String.valueOf(System.currentTimeMillis()));
			data.put("ver", version);
			data.put("en", "e_cr");
			data.put("pl", platformName);
			data.put("sdk", sdkName);
			// Build url
			String url = buildUrl(data);
			// Send url & Add url to queue
			SendDataMonitor.addSendUrl(url);
			return true;
		} catch (Throwable e) {
			log.log(Level.WARNING, "Sending data exception", e);
		}
		return false;
	}

	/**
	 * Build the url based on the passed in parameters
	 * 
	 * @param data
	 * @return
	 * @throws UnsupportedEncodingException
	 */
	private static String buildUrl(Map<String, String> data)
			throws UnsupportedEncodingException {
		StringBuilder sb = new StringBuilder();
		//http://node01/log.gif?
		sb.append(accessUrl).append("?");
		for (Map.Entry<String, String> entry : data.entrySet()) {
			if (isNotEmpty(entry.getKey()) && isNotEmpty(entry.getValue())) {
				sb.append(entry.getKey().trim())
						.append("=")
						.append(URLEncoder.encode(entry.getValue().trim(), "utf-8"))
						.append("&");
			}
		}
		return sb.substring(0, sb.length() - 1);// Remove last&
	}

	/**
	 * Judge whether the string is empty. If it is empty, return true. Otherwise, false is returned.
	 * 
	 * @param value
	 * @return
	 */
	private static boolean isEmpty(String value) {
		return value == null || value.trim().isEmpty();
	}

	/**
	 * Judge whether the string is not empty. If not, return true. If it is empty, false is returned.
	 * 
	 * @param value
	 * @return
	 */
	private static boolean isNotEmpty(String value) {
		return !isEmpty(value);
	}
}
  • SendDataMonitor.java

    Sends url requests. getSendDataMonitor() implements the singleton pattern: the constructor is private, and the static method returns the single SendDataMonitor instance. Urls are added to the instance's blocking queue, while SendDataMonitor.monitor.run() loops forever, taking urls from the queue and sending them

No customization required

/**
 * The monitor that sends url data, which is used to start a separate thread to send data
 * 
 * @author root
 *
 */
public class SendDataMonitor {
	// Logging object
	private static final Logger log = Logger.getGlobal();
	// Blocking queue that stores the urls to be sent
	private BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
	// The singleton instance
	private static SendDataMonitor monitor = null;

	private SendDataMonitor() {
		// Private constructor for the singleton pattern
	}

	/**
	 * Get the singleton monitor instance (double-checked locking)
	 * 
	 * @return
	 */
	public static SendDataMonitor getSendDataMonitor() {
		if (monitor == null) {
			synchronized (SendDataMonitor.class) {
				if (monitor == null) {
					monitor = new SendDataMonitor();

					Thread thread = new Thread(new Runnable() {

						@Override
						public void run() {
							// The specific processing method is invoked in the thread.
							SendDataMonitor.monitor.run();
						}
					});
					// Not marked as a daemon thread during testing
					// thread.setDaemon(true);
					thread.start();
				}
			}
		}
		return monitor;
	}

	/**
	 * Add a url to the queue
	 * 
	 * @param url
	 * @throws InterruptedException
	 */
	public static void addSendUrl(String url) throws InterruptedException {
		getSendDataMonitor().queue.put(url);

	}

	/**
	 * Specifically implement the method of sending url
	 * 
	 */
	private void run() {
		while (true) {
			try {
				String url = this.queue.take();
				// Actually send the url
				HttpRequestUtil.sendData(url);
			} catch (Throwable e) {
				log.log(Level.WARNING, "send out url abnormal", e);
			}
		}
	}

	/**
	 * Inner class: http utility for sending data
	 * 
	 * @author root
	 *
	 */
	public static class HttpRequestUtil {
		/**
		 * Specific method of sending url
		 * 
		 * @param url
		 * @throws IOException
		 */
		public static void sendData(String url) throws IOException {
			HttpURLConnection con = null;
			BufferedReader in = null;

			try {
				URL obj = new URL(url); // Create url object
				con = (HttpURLConnection) obj.openConnection(); // Open url connection
				// Set connection parameters
				con.setConnectTimeout(5000); // Connection expiration time
				con.setReadTimeout(5000); // Read data expiration time
				con.setRequestMethod("GET"); // Set the request type to get

				System.out.println("send out url:" + url);
				// Send connection request
				in = new BufferedReader(new InputStreamReader(
						con.getInputStream()));
				// TODO: consider here whether you can
			} finally {
				try {
					if (in != null) {
						in.close();
					}
				} catch (Throwable e) {
					// nothing
				}
				try {
					con.disconnect();
				} catch (Throwable e) {
					// nothing
				}
			}
		}
	}
}
  • Test.java

    Test payment status log

No customization required

public class Test {
	public static String day = "20190607";

	public static void main(String[] args) {
		
		System.out.println("=================Start walking you=================");
		//Insert code to collect logs
		//When payment is successful
		AnalyticsEngineSDK.onChargeSuccess("orderid123", "zhangsan");
		//When an order is refunded
		AnalyticsEngineSDK.onChargeRefund("orderid456", "lisi");
		
		System.out.println("==========Continue to execute my code====================");
        //Normal code, continue execution

	}
}

Packaging test

Testing in Linux

Upload the jar package to the specified directory of linux

mkdir /opt/sxt/datalog/

cd /opt/sxt/datalog/

Start project

nohup java -jar bigdatalog.jar >>/opt/sxt/datalog/runlog.log 2>&1 &

verification

http://192.168.58.201:8080/

data acquisition

Idea:

  • Configure nginx and start the cluster and nginx
  • Configure flume
  • Start flume monitoring task
  • Run log production script
  • Observation test

Nginx configuration

Edit the nginx configuration file: nginx.conf

//Go to the nginx conf directory
cd /opt/sxt/nginx/conf

//Modify file
vim nginx.conf

Modify the following

#############Changes: uncomment log_format my_format and add the location block####################

worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

  	log_format my_format '$remote_addr^A$msec^A$http_host^A$request_uri';

    sendfile        on;	

    keepalive_timeout  65;

    server {
        listen       80;
        server_name  localhost;

        location / {
            root   html;
            index  index.html index.htm;
        }
					
        location = /shsxt.jpg {
            default_type image/gif;
            access_log /opt/data/access.log my_format;
        }
    }
}

verification

###############################Restart verification###########################################

//Create the log storage directory (must match the access_log path in nginx.conf)
mkdir -p /opt/data/

//Monitor the log file
tail -F /opt/data/access.log

//verification
shsxt-hadoop101/shsxt.jpg
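
Each request for shsxt.jpg appends one my_format line to access.log, with the four fields (remote_addr, msec, http_host, request_uri) separated by the ^A (\x01) control character. A quick sanity check of the field layout, using a fabricated sample line rather than a real log entry:

```shell
# Build a fabricated my_format line: remote_addr^Amsec^Ahttp_host^Arequest_uri
line=$(printf '192.168.58.1\x011640141509.123\x01bd1601\x01/shsxt.jpg?en=e_pv')

# Split it on the \x01 delimiter with awk ($'\x01' is bash syntax)
ip=$(printf '%s' "$line" | awk -F $'\x01' '{print $1}')
uri=$(printf '%s' "$line" | awk -F $'\x01' '{print $4}')
echo "$ip $uri"
```

The request_uri field carries the url-encoded event payload, which is what the downstream ETL parses.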

Create script for cutting Log file

##############################Create the nginx log cutting script#######################################
//Create script
vim nginx.log.sh



##############################Add the following#######################################
#!/bin/bash
#nginx log cutting script

#Set log file storage directory
logs_path="/opt/data/"
#Set backup directory
logs_bak_path="/opt/data/access_logs_bak/"

#Make sure the backup directory exists, then rename the log file
mkdir -p ${logs_bak_path}
mv ${logs_path}access.log ${logs_bak_path}access_$(date "+%Y%m%d").log

#Reload nginx: regenerate a new log file
/opt/bdp/nginx/sbin/nginx -s reload
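
The rename step of the script can be exercised in isolation; a sketch using temporary paths in place of the real /opt/data directories:

```shell
# Use temp paths instead of the real /opt/data directories (demo only)
logs_path="/tmp/datalog_demo/"
logs_bak_path="/tmp/datalog_demo/access_logs_bak/"
mkdir -p "$logs_bak_path"

# Simulate a live access log, then rotate it the same way the script does
echo "demo line" > "${logs_path}access.log"
mv "${logs_path}access.log" "${logs_bak_path}access_$(date "+%Y%m%d").log"
```

In production the script would typically be scheduled (e.g. via cron) so each backup file carries one day's data; the nginx reload then reopens a fresh access.log, and Flume's spooldir source picks up the backup files.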

Flume configuration

Create a new Flume configuration file: example.conf

//Create a directory dedicated to storing flume configuration files
mkdir -p /opt/bdp/flume/options

//Move to profile directory
cd /opt/bdp/flume/options

//Create first profile
vim example.conf

Modify Flume configuration file: add the following content

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/data/access_logs_bak

a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://bdp/data/test/flume/%Y%m%d

#Prefix of uploaded file
a1.sinks.k1.hdfs.filePrefix = access_logs
## Roll policy: count, time and size based rolling are all disabled (0); files are closed via idleTimeout instead
# Roll a new file after this many events; 0 disables count-based rolling
a1.sinks.k1.hdfs.rollCount=0
# Roll a new file after this many seconds; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval=0
# Roll a new file when it reaches this many bytes; 0 disables size-based rolling
a1.sinks.k1.hdfs.rollSize=0
# Close and rename the current temporary file after this many seconds without new writes
a1.sinks.k1.hdfs.idleTimeout=3

a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp=true

## Generate a directory every five minutes:
# Whether to round down the timestamp; if enabled, all time escapes except %t are affected
a1.sinks.k1.hdfs.round=true
# Round the timestamp down to a multiple of this value
a1.sinks.k1.hdfs.roundValue=5
# Unit for rounding: second, minute or hour
a1.sinks.k1.hdfs.roundUnit=minute

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start flume

nohup flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console >> /root/flume_log 2>&1 &

Start the production script and observe the test

cat /root/flume_log

Data consumption

If the above operations are successful, start to write the code for operating HBase to consume data, and store the generated data in HBase in real time.

Idea:

  • Write MR, read the data in HDFS cluster and print it to the console to observe whether it is successful;

  • Now that the data in HDFS can be read, the read data can be written to HBase, so write and call HBaseAPI related methods to write the data read from HDFS to HBase;

  • The above two steps are enough to complete the tasks of consuming and storing data, but involve decoupling, so some attribute files need to be externalized in the process, and HBase generic methods need to be encapsulated in a class.
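
The first step (reading and splitting lines from HDFS) boils down to parsing one my_format line per record: split on the \u0001 delimiter, then unpack the event fields from the request_uri query string. A self-contained sketch of that split, outside MapReduce (the class and method names here are illustrative, not part of the project code):

```java
import java.util.HashMap;
import java.util.Map;

public class LogLineDemo {
    // Split one my_format line: remote_addr^Amsec^Ahttp_host^Arequest_uri
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        String[] parts = line.split("\u0001");
        fields.put("ip", parts[0]);
        fields.put("time", parts[1]);
        // Everything after '?' in the request_uri is the url-encoded event payload
        String uri = parts[3];
        int q = uri.indexOf('?');
        if (q >= 0) {
            for (String kv : uri.substring(q + 1).split("&")) {
                String[] pair = kv.split("=", 2);
                if (pair.length == 2) fields.put(pair[0], pair[1]);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> f = parse(
            "192.168.58.1\u00011640141509.123\u0001bd1601\u0001/shsxt.jpg?en=e_pv&u_mid=zhangsan");
        System.out.println(f.get("ip") + " " + f.get("en") + " " + f.get("u_mid"));
        // prints: 192.168.58.1 e_pv zhangsan
    }
}
```

A real Mapper would also url-decode the values and run the IP and user-agent parsers before writing to HBase.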

Create a new module project: shsxt_ecshop_loganalyse

  • pom.xml file configuration:

No customization required

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>shsxt_ecshop_loganalyse</artifactId>
  <version>1.0</version>

  <!--Use the Aliyun Maven repository-->
  <repositories>
    <repository>
      <id>central</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <layout>default</layout>
    </repository>
  </repositories>

  <!--Set basic project parameters-->
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <!--Add dependency-->
  <dependencies>
    <!--hadoop-common dependency-->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.5</version>
    </dependency>
    <!--hadoop client dependency-->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.5</version>
    </dependency>
    <!--hdfs dependency-->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.6.5</version>
    </dependency>
    <!--mapreduce client core dependency-->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.6.5</version>
    </dependency>
    <!--hive exec dependency-->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>2.3.7</version>
    </dependency>
    <!--hbase client dependency-->
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.4.13</version>
    </dependency>
    <!--hbase server dependency-->
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-server</artifactId>
      <version>1.4.13</version>
    </dependency>
    <!--mysql connector dependency-->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.0.20</version>
    </dependency>
    <!--user-agent (browser) parsing dependency-->
    <dependency>
      <groupId>cz.mallat.uasparser</groupId>
      <artifactId>uasparser</artifactId>
      <version>0.6.2</version>
    </dependency>
    <!--Add test dependency-->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Function of ETL code

  • Remove dirty data
  • Split the data into formats that can be processed directly (parse IP, parse browser, split log)
  • Store the results of data cleaning in HBase

ETL coding

  • IpSeeker: creates a new IpSeeker class file

    Parses ip addresses: reads the qqwry.dat file to look up a location by ip. The qqwry.dat format is:

    I File header, 8 bytes in total

    1. Absolute offset of the first starting IP, 4 bytes

    2. Absolute offset of the last starting IP, 4 bytes

    II "End address / country / region" record area each record followed by a four byte ip address is divided into two parts

    1. National Records

    2. Regional records, but regional records are not necessarily available.

      Moreover, there are two forms of national records and regional records

      1. String ending with 0

      2. 4 bytes, one byte may be 0x1 or 0x2

        a. When it is 0x1, it means that the absolute offset is followed by a region record. Note that it is after the absolute offset, not after these four bytes

        b. When it is 0x2, it indicates that there is no area record after absolute offset

        Whether it is 0x1 or 0x2, the following three bytes are the absolute file offset of the actual country name. For area records, the meanings of 0x1 and 0x2 are unknown, but if either byte appears it is followed by a 3-byte offset; otherwise the record is a 0-terminated string

    III "Start address / end address offset" record area

    Each record is 7 bytes, arranged from small to large according to the starting address

    a. starting IP address, 4 bytes

    b. absolute offset of end ip address, 3 bytes

    Note: the ip addresses and all offsets in this file are little-endian, while Java is big-endian, so conversion is required
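
The byte-order note matters when reading the 3-byte offsets: the bytes must be reassembled manually, as IPSeeker's readInt3/readLong3 do. A standalone sketch of the conversion (class name is illustrative):

```java
public class LittleEndianDemo {
    // Convert a 3-byte little-endian offset (as stored in qqwry.dat) to a Java int.
    // b[0] is the least significant byte; & 0xFF avoids sign extension.
    static int readInt3(byte[] b) {
        return (b[0] & 0xFF) | ((b[1] & 0xFF) << 8) | ((b[2] & 0xFF) << 16);
    }

    public static void main(String[] args) {
        // bytes 0x10 0x20 0x00 little-endian == 0x002010 == 8208
        System.out.println(readInt3(new byte[]{0x10, 0x20, 0x00})); // prints: 8208
    }
}
```

IPSeeker's memory-mapped variant gets the same effect by setting the buffer to ByteOrder.LITTLE_ENDIAN and masking the fourth byte with & 0x00FFFFFF.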

    No customization required

public class IPSeeker {
	
	// Some fixed constants, such as record length, etc
	private static final int IP_RECORD_LENGTH = 7;
	private static final byte AREA_FOLLOWED = 0x01;
	private static final byte NO_AREA = 0x2;

	// It is used as a cache. When querying an ip, first check the cache to reduce unnecessary repeated searches
	private Hashtable ipCache;
	// Random file access class
	private RandomAccessFile ipFile;
	// Memory mapping file
	private MappedByteBuffer mbb;
	// Singleton instance
	private static IPSeeker instance = null;
	// Absolute offsets of the first and last records of the index area
	private long ipBegin, ipEnd;
	// Temporary variables used to improve efficiency
	private IPLocation loc;
	private byte[] buf;
	private byte[] b4;
	private byte[] b3;
	
	/**
	 * Protected constructor; use getInstance() to obtain the instance
	 */
	protected IPSeeker() {
		ipCache = new Hashtable();
		loc = new IPLocation();
		buf = new byte[100];
		b4 = new byte[4];
		b3 = new byte[3];
		try {
			String ipFilePath = IPSeeker.class.getResource("/qqwry.dat")
					.getFile();
			ipFile = new RandomAccessFile(ipFilePath, "r");
		} catch (FileNotFoundException e) {
			System.out.println("IP Address information file not found, IP The display function will not be available");
			ipFile = null;

		}
		// If the file is opened successfully, read the file header information
		if (ipFile != null) {
			try {
				ipBegin = readLong4(0);
				ipEnd = readLong4(4);
				if (ipBegin == -1 || ipEnd == -1) {
					ipFile.close();
					ipFile = null;
					
				}
			} catch (IOException e) {
				System.out.println("IP Address information file format error, IP The display function will not be available");
				ipFile = null;
			}
		}
	}
	/**
	 * @return Single instance
	 */
	public static IPSeeker getInstance() {
		if (instance == null) {
			instance = new IPSeeker();
		}
		return instance;
	}
	/**
	 * Given a partial location name, returns all IP range records whose location contains the substring s
	 * @param s location substring
	 * @return List containing IPEntry type
	 */
	public List getIPEntriesDebug(String s) {
		List ret = new ArrayList();
		long endOffset = ipEnd + 4;
		for (long offset = ipBegin + 4; offset <= endOffset; offset += IP_RECORD_LENGTH) {
			// Read end IP offset
			long temp = readLong3(offset);
			// If temp is not equal to - 1, read the location information of IP
			if (temp != -1) {
				IPLocation loc = getIPLocation(temp);
				// Judge whether the location contains s substring. If so, add the record to the List. If not, continue
				if (loc.country.indexOf(s) != -1 || loc.area.indexOf(s) != -1) {
					IPEntry entry = new IPEntry();
					entry.country = loc.country;
					entry.area = loc.area;
					// Get start IP
					readIP(offset - 4, b4);
					entry.beginIp = IPSeekerUtils.getIpStringFromBytes(b4);
					// Get end IP
					readIP(temp, b4);
					entry.endIp = IPSeekerUtils.getIpStringFromBytes(b4);
					// Add this record
					ret.add(entry);
				}
			}
		}
		return ret;
	}

	/**
	 * Given a partial location name, returns all IP range records whose location contains the substring s
	 * 
	 * @param s
	 *            location substring
	 * @return List containing IPEntry type
	 */
	public List getIPEntries(String s) {
		List ret = new ArrayList();
		try {
			// Mapping IP information files to memory
			if (mbb == null) {
				FileChannel fc = ipFile.getChannel();
				mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, ipFile.length());
				mbb.order(ByteOrder.LITTLE_ENDIAN);
			}

			int endOffset = (int) ipEnd;
			for (int offset = (int) ipBegin + 4; offset <= endOffset; offset += IP_RECORD_LENGTH) {
				int temp = readInt3(offset);
				if (temp != -1) {
					IPLocation loc = getIPLocation(temp);
					// Judge whether the location contains s substring. If so, add the record to the List. If not, continue
					if (loc.country.indexOf(s) != -1
							|| loc.area.indexOf(s) != -1) {
						IPEntry entry = new IPEntry();
						entry.country = loc.country;
						entry.area = loc.area;
						// Get start IP
						readIP(offset - 4, b4);
						entry.beginIp = IPSeekerUtils.getIpStringFromBytes(b4);
						// Get end IP
						readIP(temp, b4);
						entry.endIp = IPSeekerUtils.getIpStringFromBytes(b4);
						// Add this record
						ret.add(entry);
					}
				}
			}
		} catch (IOException e) {
			System.out.println(e.getMessage());
		}
		return ret;
	}

	/**
	 * Read a 3-byte little-endian int from the offset position of the memory-mapped file
	 * 
	 * @param offset
	 * @return
	 */
	private int readInt3(int offset) {
		mbb.position(offset);
		return mbb.getInt() & 0x00FFFFFF;
	}

	/**
	 * Read a 3-byte little-endian int from the current position of the memory-mapped file
	 * 
	 * @return
	 */
	private int readInt3() {
		return mbb.getInt() & 0x00FFFFFF;
	}

	/**
	 * Get country name according to IP
	 * 
	 * @param ip
	 *            The ip address in byte array form
	 * @return Country name string
	 */
	public String getCountry(byte[] ip) {
		// Check whether the ip database file is available
		if (ipFile == null)
			return "Invalid IP database file";
		// Save ip and convert ip byte array to string form
		String ipStr = IPSeekerUtils.getIpStringFromBytes(ip);
		// First check whether the cache already contains the result for this ip; if so, skip searching the file
		if (ipCache.containsKey(ipStr)) {
			IPLocation loc = (IPLocation) ipCache.get(ipStr);
			return loc.country;
		} else {
			IPLocation loc = getIPLocation(ip);
			ipCache.put(ipStr, loc.getCopy());
			return loc.country;
		}
	}

	/**
	 * Get country name according to IP
	 * 
	 * @param ip
	 *            The IP address in string form
	 * @return Country name string
	 */
	public String getCountry(String ip) {
		return getCountry(IPSeekerUtils.getIpByteArrayFromString(ip));
	}

	/**
	 * Get region name according to IP
	 * 
	 * @param ip
	 *            The ip address in byte array form
	 * @return Region name string
	 */
	public String getArea(byte[] ip) {
		// Check whether the ip database file is available
		if (ipFile == null)
			return "Invalid IP database file";
		// Save ip and convert ip byte array to string form
		String ipStr = IPSeekerUtils.getIpStringFromBytes(ip);
		// First check whether the cache already contains the result for this ip; if so, skip searching the file
		if (ipCache.containsKey(ipStr)) {
			IPLocation loc = (IPLocation) ipCache.get(ipStr);
			return loc.area;
		} else {
			IPLocation loc = getIPLocation(ip);
			ipCache.put(ipStr, loc.getCopy());
			return loc.area;
		}
	}

	/**
	 * Get region name according to IP
	 * 
	 * @param ip
	 *            The IP address in string form
	 * @return Region name string
	 */
	public String getArea(String ip) {
		return getArea(IPSeekerUtils.getIpByteArrayFromString(ip));
	}

	/**
	 * Look up the given ip in the IP database file and return its IPLocation record
	 * 
	 * @param ip
	 *            IP to query
	 * @return IPLocation structure
	 */
	public IPLocation getIPLocation(byte[] ip) {
		IPLocation info = null;
		long offset = locateIP(ip);
		if (offset != -1)
			info = getIPLocation(offset);
		if (info == null) {
			info = new IPLocation();
			info.country = "Unknown country";
			info.area = "Unknown region";
		}
		return info;
	}

	/**
	 * Read 4 bytes from the offset position as a little-endian long. Java is big-endian, so the byte order is converted manually
	 * 
	 * @param offset
	 * @return The long value read from the file; -1 means the read failed
	 */
	private long readLong4(long offset) {
		long ret = 0;
		try {
			ipFile.seek(offset);
			ret |= (ipFile.readByte() & 0xFF);
			ret |= ((ipFile.readByte() << 8) & 0xFF00);
			ret |= ((ipFile.readByte() << 16) & 0xFF0000);
			ret |= ((ipFile.readByte() << 24) & 0xFF000000);
			return ret;
		} catch (IOException e) {
			return -1;
		}
	}

	/**
	 * Read 3 bytes from the offset position as a little-endian long. Java is big-endian, so the byte order is converted manually
	 * 
	 * @param offset
	 * @return The long value read from the file; -1 means the read failed
	 */
	private long readLong3(long offset) {
		long ret = 0;
		try {
			ipFile.seek(offset);
			ipFile.readFully(b3);
			ret |= (b3[0] & 0xFF);
			ret |= ((b3[1] << 8) & 0xFF00);
			ret |= ((b3[2] << 16) & 0xFF0000);
			return ret;
		} catch (IOException e) {
			return -1;
		}
	}

	/**
	 * Read 3 bytes from the current position as a little-endian long
	 * 
	 * @return
	 */
	private long readLong3() {
		long ret = 0;
		try {
			ipFile.readFully(b3);
			ret |= (b3[0] & 0xFF);
			ret |= ((b3[1] << 8) & 0xFF00);
			ret |= ((b3[2] << 16) & 0xFF0000);
			return ret;
		} catch (IOException e) {
			return -1;
		}
	}

	/**
	 * Read the 4-byte ip address at the offset into the ip array. The returned ip is big-endian, but
	 * the file stores it little-endian, so the byte order is converted
	 * 
	 * @param offset
	 * @param ip
	 */
	private void readIP(long offset, byte[] ip) {
		try {
			ipFile.seek(offset);
			ipFile.readFully(ip);
			byte temp = ip[0];
			ip[0] = ip[3];
			ip[3] = temp;
			temp = ip[1];
			ip[1] = ip[2];
			ip[2] = temp;
		} catch (IOException e) {
			System.out.println(e.getMessage());
		}
	}

	/**
	 * Read the 4-byte ip address at the offset into the ip array. The returned ip is big-endian, but
	 * the file stores it little-endian, so the byte order is converted
	 * 
	 * @param offset
	 * @param ip
	 */
	private void readIP(int offset, byte[] ip) {
		mbb.position(offset);
		mbb.get(ip);
		byte temp = ip[0];
		ip[0] = ip[3];
		ip[3] = temp;
		temp = ip[1];
		ip[1] = ip[2];
		ip[2] = temp;
	}

	/**
	 * Compare the queried ip with beginIp. Note that beginIp is big-endian
	 * 
	 * @param ip
	 *            IP to query
	 * @param beginIp
	 *            IP compared with queried IP
	 * @return 0 if equal, 1 if ip is greater than beginIp, -1 if less
	 */
	private int compareIP(byte[] ip, byte[] beginIp) {
		for (int i = 0; i < 4; i++) {
			int r = compareByte(ip[i], beginIp[i]);
			if (r != 0)
				return r;
		}
		return 0;
	}

	/**
	 * Compare two bytes as unsigned numbers
	 * 
	 * @param b1
	 * @param b2
	 * @return 1 if b1 is greater than b2, 0 if equal, -1 if less
	 */
	private int compareByte(byte b1, byte b2) {
		if ((b1 & 0xFF) > (b2 & 0xFF)) // Compare whether greater than
			return 1;
		else if ((b1 ^ b2) == 0)// Judge whether they are equal
			return 0;
		else
			return -1;
	}

	/**
	 * Locates the record containing the given ip's country and region using binary search, and returns its absolute offset.
	 * 
	 * @param ip
	 *            IP to query
	 * @return If found, return the offset of the end IP. If not found, return - 1
	 */
	private long locateIP(byte[] ip) {
		long m = 0;
		int r;
		// Compare first ip entry
		readIP(ipBegin, b4);
		r = compareIP(ip, b4);
		if (r == 0)
			return ipBegin;
		else if (r < 0)
			return -1;
		// Start binary search
		for (long i = ipBegin, j = ipEnd; i < j;) {
			m = getMiddleOffset(i, j);
			readIP(m, b4);
			r = compareIP(ip, b4);
			// log.debug(Utils.getIpStringFromBytes(b));
			if (r > 0)
				i = m;
			else if (r < 0) {
				if (m == j) {
					j -= IP_RECORD_LENGTH;
					m = j;
				} else
					j = m;
			} else
				return readLong3(m + 4);
		}
		// When the loop ends, i and j are equal. That record is the most likely match, but not
		// necessarily correct, so verify it; if it matches, return the absolute offset of the end-IP area
		m = readLong3(m + 4);
		readIP(m, b4);
		r = compareIP(ip, b4);
		if (r <= 0)
			return m;
		else
			return -1;
	}

	/**
	 * Get the offset recorded in the middle of the begin offset and end offset
	 * 
	 * @param begin
	 * @param end
	 * @return
	 */
	private long getMiddleOffset(long begin, long end) {
		long records = (end - begin) / IP_RECORD_LENGTH;
		records >>= 1;
		if (records == 0)
			records = 1;
		return begin + records * IP_RECORD_LENGTH;
	}

	/**
	 * Given the offset of an ip country and region record, an IPLocation structure is returned
	 * 
	 * @param offset
	 * @return
	 */
	private IPLocation getIPLocation(long offset) {
		try {
			// Skip 4-byte ip
			ipFile.seek(offset + 4);
			// Read the first byte to determine whether the flag byte
			byte b = ipFile.readByte();
			if (b == AREA_FOLLOWED) {
				// Read country offset
				long countryOffset = readLong3();
				// Jump to offset
				ipFile.seek(countryOffset);
				// Check the flag byte again, because this place may still be a redirect at this time
				b = ipFile.readByte();
				if (b == NO_AREA) {
					loc.country = readString(readLong3());
					ipFile.seek(countryOffset + 4);
				} else
					loc.country = readString(countryOffset);
				// Read region flag
				loc.area = readArea(ipFile.getFilePointer());
			} else if (b == NO_AREA) {
				loc.country = readString(readLong3());
				loc.area = readArea(offset + 8);
			} else {
				loc.country = readString(ipFile.getFilePointer() - 1);
				loc.area = readArea(ipFile.getFilePointer());
			}
			return loc;
		} catch (IOException e) {
			return null;
		}
	}

	/**
	 * @param offset
	 * @return
	 */
	private IPLocation getIPLocation(int offset) {
		// Skip 4-byte ip
		mbb.position(offset + 4);
		// Read the first byte to determine whether the flag byte
		byte b = mbb.get();
		if (b == AREA_FOLLOWED) {
			// Read country offset
			int countryOffset = readInt3();
			// Jump to offset
			mbb.position(countryOffset);
			// Check the flag byte again, because this place may still be a redirect at this time
			b = mbb.get();
			if (b == NO_AREA) {
				loc.country = readString(readInt3());
				mbb.position(countryOffset + 4);
			} else
				loc.country = readString(countryOffset);
			// Read region flag
			loc.area = readArea(mbb.position());
		} else if (b == NO_AREA) {
			loc.country = readString(readInt3());
			loc.area = readArea(offset + 8);
		} else {
			loc.country = readString(mbb.position() - 1);
			loc.area = readArea(mbb.position());
		}
		return loc;
	}
	/**
	 * Starting from the offset, parse the following bytes and read out a region name
	 * 
	 * @param offset
	 * @return Region name string
	 * @throws IOException
	 */
	private String readArea(long offset) throws IOException {
		ipFile.seek(offset);
		byte b = ipFile.readByte();
		if (b == 0x01 || b == 0x02) {
			long areaOffset = readLong3(offset + 1);
			if (areaOffset == 0)
				return "Unknown region";
			else
				return readString(areaOffset);
		} else
			return readString(offset);
	}
	/**
	 * @param offset
	 * @return
	 */
	private String readArea(int offset) {
		mbb.position(offset);
		byte b = mbb.get();
		if (b == 0x01 || b == 0x02) {
			int areaOffset = readInt3();
			if (areaOffset == 0)
				return "Unknown region";
			else
				return readString(areaOffset);
		} else
			return readString(offset);
	}
	/**
	 * Reads a string ending in 0 from the offset
	 * 
	 * @param offset
	 * @return The string read; an empty string if a read error occurred
	 */
	private String readString(long offset) {
		try {
			ipFile.seek(offset);
			int i;
			for (i = 0, buf[i] = ipFile.readByte(); buf[i] != 0; buf[++i] = ipFile
					.readByte())
				;
			if (i != 0)
				return IPSeekerUtils.getString(buf, 0, i, "GBK");
		} catch (IOException e) {
			System.out.println(e.getMessage());
		}
		return "";
	}
	/**
	 * Get a string ending in 0 from the offset position of the memory mapping file
	 * 
	 * @param offset
	 * @return
	 */
	private String readString(int offset) {
		try {
			mbb.position(offset);
			int i;
			for (i = 0, buf[i] = mbb.get(); buf[i] != 0; buf[++i] = mbb.get())
				;
			if (i != 0)
				return IPSeekerUtils.getString(buf, 0, i, "GBK");
		} catch (IllegalArgumentException e) {
			System.out.println(e.getMessage());
		}
		return "";
	}

	public String getAddress(String ip) {
		String country = getCountry(ip).equals(" CZ88.NET") ? ""
				: getCountry(ip);
		String area = getArea(ip).equals(" CZ88.NET") ? "" : getArea(ip);
		String address = country + " " + area;
		return address.trim();
	}
	/**
	 * Encapsulates ip-related information. Currently there are only two fields: the country and region of the ip
	 * 
	 * 
	 * @author swallow
	 */
	public class IPLocation {
		public String country;
		public String area;

		public IPLocation() {
			country = area = "";
		}

		public IPLocation getCopy() {
			IPLocation ret = new IPLocation();
			ret.country = country;
			ret.area = area;
			return ret;
		}
	}
	/**
	 * An IP range record includes not only country and region, but also start IP and end IP
	 * 
	 * 
	 * @author root
	 */
	public class IPEntry {
		public String beginIp;
		public String endIp;
		public String country;
		public String area;

		public IPEntry() {
			beginIp = endIp = country = area = "";
		}

		public String toString() {
			return this.area + " " + this.country + " IP range: " + this.beginIp
					+ "-" + this.endIp;
		}
		}
	}
	/**
	 * Operation tool class
	 * 
	 * @author root
	 * 
	 */
	public static class IPSeekerUtils {
		/**
		 * Get the byte array form from the string form of ip
		 * 
		 * @param ip
		 *            ip in string form
		 * @return ip in byte array
		 */
		public static byte[] getIpByteArrayFromString(String ip) {
			byte[] ret = new byte[4];
			java.util.StringTokenizer st = new java.util.StringTokenizer(ip,
					".");
			try {
				ret[0] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF);
				ret[1] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF);
				ret[2] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF);
				ret[3] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF);
			} catch (Exception e) {
				System.out.println(e.getMessage());
			}
			return ret;
		}
		/**
		 * Encode and convert the original string. If it fails, return the original string
		 * 
		 * @param s
		 *            Original string
		 * @param srcEncoding
		 *            Source coding method
		 * @param destEncoding
		 *            Target coding method
		 * @return The re-encoded string; the original string if conversion fails
		 */
		public static String getString(String s, String srcEncoding,
				String destEncoding) {
			try {
				return new String(s.getBytes(srcEncoding), destEncoding);
			} catch (UnsupportedEncodingException e) {
				return s;
			}
		}
		/**
		 * Converts an array of bytes into a string according to some encoding
		 * 
		 * @param b
		 *            Byte array
		 * @param encoding
		 *            Coding mode
		 * @return If encoding is not supported, a default encoded string is returned
		 */
		public static String getString(byte[] b, String encoding) {
			try {
				return new String(b, encoding);
			} catch (UnsupportedEncodingException e) {
				return new String(b);
			}
		}
		/**
		 * Converts an array of bytes into a string according to some encoding
		 * 
		 * @param b
		 *            Byte array
		 * @param offset
		 *            Start position to convert
		 * @param len
		 *            Length to convert
		 * @param encoding
		 *            Coding mode
		 * @return If encoding is not supported, a default encoded string is returned
		 */
		public static String getString(byte[] b, int offset, int len,
				String encoding) {
			try {
				return new String(b, offset, len, encoding);
			} catch (UnsupportedEncodingException e) {
				return new String(b, offset, len);
			}
		}
		/**
		 * @param ip
		 *            The ip address in byte array form
		 * @return ip in string form
		 */
		public static String getIpStringFromBytes(byte[] ip) {
			StringBuffer sb = new StringBuffer();
			sb.append(ip[0] & 0xFF);
			sb.append('.');
			sb.append(ip[1] & 0xFF);
			sb.append('.');
			sb.append(ip[2] & 0xFF);
			sb.append('.');
			sb.append(ip[3] & 0xFF);
			return sb.toString();
		}
	}
	/**
	 * Get a list of all ip address sets
	 * 
	 * @return
	 */
	public List<String> getAllIp() {
		List<String> list = new ArrayList<String>();
		byte[] buf = new byte[4];
		for (long i = ipBegin; i < ipEnd; i += IP_RECORD_LENGTH) {
			try {
				this.readIP(this.readLong3(i + 4), buf); // Read the ip, and finally put the ip into the buf
				String ip = IPSeekerUtils.getIpStringFromBytes(buf);
				list.add(ip);
			} catch (Exception e) {
				// nothing
			}
		}
		return list;
	}
}
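The readInt3/readLong3 helpers above exist because qqwry.dat stores record offsets as 3-byte little-endian integers, while Java's I/O primitives are big-endian. A minimal standalone sketch of the same decode (the class and array names are illustrative, not part of the project):

```java
public class LittleEndian3 {
	// Decode a 3-byte little-endian unsigned integer starting at pos,
	// the same layout qqwry.dat uses for its record offsets.
	static int readInt3(byte[] data, int pos) {
		return (data[pos] & 0xFF)
				| ((data[pos + 1] & 0xFF) << 8)
				| ((data[pos + 2] & 0xFF) << 16);
	}

	public static void main(String[] args) {
		// The value 0x030201 is stored low byte first: 01 02 03
		byte[] buf = {0x01, 0x02, 0x03};
		System.out.println(readInt3(buf, 0)); // prints 197121 (0x030201)
	}
}
```

The `& 0xFF` masks matter: Java bytes are signed, so without them a byte like 0xFF would sign-extend and corrupt the result.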
  • Configuration file qqwry.dat:

    Upload the qqwry.dat configuration file used by the code to the resources directory

    No customization required

    If a file-not-found error is reported, or running the code produces no results, the problem is almost certainly the configuration file's directory: the path must contain only English (ASCII) characters

  • Test IPSeeker code

    Create a new TestIPSeeker class: test whether there is a problem with ip conversion

    No customization required

    Tip: the configuration file path must be an English (ASCII) directory

public class TestIPSeeker {
	public static void main(String[] args) {
		IPSeeker ipSeeker = IPSeeker.getInstance();
		System.out.println(ipSeeker.getCountry("120.197.87.216"));
		System.out.println(ipSeeker.getCountry("115.239.210.27"));
        System.out.println(ipSeeker.getCountry("255.255.255.255"));
	}
}
  • Constant class GlobalConstants coding

    New GlobalConstants class: contains the values of commonly used global variables

    No customization required

public class GlobalConstants {
	//Milliseconds per day
	public static final int DAY_OF_MILLISECONDS = 86400000;
	// Defined runtime variable name
	public static final String RUNNING_DATE_PARAMES = "RUNNING_DATE";
	// Default value
	public static final String DEFAULT_VALUE = "unknown";
	// Specify all column values in the dimension information table
	public static final String VALUE_OF_ALL = "all";
	// Prefix of the defined output collector
	public static final String OUTPUT_COLLECTOR_KEY_PREFIX = "collector_";
	// Specify that the connection table is configured as report
	public static final String WAREHOUSE_OF_REPORT = "report";
	// Batch executed key
	public static final String JDBC_BATCH_NUMBER = "mysql.batch.number";
	// Default batch size
	public static final String DEFAULT_JDBC_BATCH_NUMBER = "500";
	// driver name
	public static final String JDBC_DRIVER = "mysql.%s.driver";
	// JDBC URL
	public static final String JDBC_URL = "mysql.%s.url";
	// username name
	public static final String JDBC_USERNAME = "mysql.%s.username";
	// password name
	public static final String JDBC_PASSWORD = "mysql.%s.password";

}
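The JDBC_* constants are property-key templates: the `%s` placeholder is filled in with a logical warehouse name at runtime. A minimal sketch of how such a template would be expanded (the demo class simply mirrors two of the constants above):

```java
public class GlobalConstantsDemo {
	// Illustrative copies of two constants from GlobalConstants
	public static final String WAREHOUSE_OF_REPORT = "report";
	public static final String JDBC_URL = "mysql.%s.url";

	public static void main(String[] args) {
		// Expand the template into the concrete property key that would
		// be looked up in the job configuration
		String key = String.format(JDBC_URL, WAREHOUSE_OF_REPORT);
		System.out.println(key); // mysql.report.url
	}
}
```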
  • IPSeekerExt coding:

    Create a new IPSeekerExt class, a subclass of IPSeeker.

    Enhance IP resolution:

    • When resolving an ip, the country name, province name and city name are returned separately

    • If it is a foreign ip, the result is set directly to unknown unknown unknown

    • If it is a domestic ip that cannot be parsed, the result is set to China unknown unknown

    • The unknown value is read from the constant class GlobalConstants

    No customization required

    public class IPSeekerExt extends IPSeeker {
    	private RegionInfo DEFAULT_INFO = new RegionInfo();
    
    	/**
    	 * Resolve the ip address and return the country and province information corresponding to the ip address<br/>
    	 * If the ip resolution fails, the default value is returned directly
    	 * @param ip The ip address to resolve, e.g. 120.197.87.216
    	 * @return
    	 */
    	public RegionInfo analyticIp(String ip) {
    		if (ip == null || ip.trim().isEmpty()) {
    			return DEFAULT_INFO;
    		}
    
    		RegionInfo info = new RegionInfo();
    		try {
    			String country = super.getCountry(ip);
    			if ("LAN".equals(country)) {
    				info.setCountry("China");
    				info.setProvince("Shanghai");
    			} else if (country != null && !country.trim().isEmpty()) {
    				// Indicates that the ip is also a resolvable ip
    				country = country.trim();
    				int length = country.length();
    				int index = country.indexOf('省'); // '省' = province
    				if (index > 0) {
    					// The current ip address belongs to one of 23 provinces. The format of country is xxx province (xxx city) (xxx county / district)
    					info.setCountry("China");
    					if (index == length - 1) {
    						info.setProvince(country); // Set Province, format: Guangdong Province
    					} else {
    						// Format: Guangzhou City, Guangdong Province
    						info.setProvince(country.substring(0, index + 1)); // Set Province
    					int index2 = country.indexOf('市', index); // Position of the next '市' (city)
    					if (index2 > 0) {
    						info.setCity(country.substring(index + 1,
    								Math.min(index2 + 1, length))); // Set city
    					}
    					}
    				} else {
    					// The other five autonomous regions, four municipalities directly under the central government and two special administrative regions
    					String flag = country.substring(0, 2); // Take the first two digits of the string
    					switch (flag) {
    					case "Inner Mongolia":
    						info.setCountry("China");
    						info.setProvince("Inner Mongolia Autonomous Region");
    						country = country.substring(3);
    						if (country != null && !country.isEmpty()) {
    							index = country.indexOf('市'); // '市' = city
    							if (index > 0) {
    								info.setCity(country.substring(0,
    										Math.min(index + 1, country.length()))); // Set city
    							}
    						}
    						break;
    					case "Guangxi":
    					case "Tibet":
    					case "Ningxia":
    					case "Xinjiang":
    						info.setCountry("China");
    						info.setProvince(flag);
    						country = country.substring(2);
    						if (country != null && !country.isEmpty()) {
    							index = country.indexOf('市'); // '市' = city
    							if (index > 0) {
    								info.setCity(country.substring(0,
    										Math.min(index + 1, country.length()))); // Set city
    							}
    						}
    						break;
    					case "Shanghai":
    					case "Beijing":
    					case "Tianjin":
    					case "Chongqing":
    						info.setCountry("China");
    						info.setProvince(flag + "city");
    						country = country.substring(3); // Remove this province / Municipality
    						if (country != null && !country.isEmpty()) {
    							index = country.indexOf('区'); // '区' = district
    							if (index > 0) {
    								char ch = country.charAt(index - 1);
    								if (ch != '校' && ch != '小') { // Exclude "校区" (campus) and "小区" (residential compound)
    									info.setCity(country.substring(
    											0,
    											Math.min(index + 1,
    													country.length()))); // Setting area
    								}
    							}
    
    							if (RegionInfo.DEFAULT_VALUE.equals(info.getCity())) {
    								// city is still the default
    								index = country.indexOf('县'); // '县' = county
    								if (index > 0) {
    									info.setCity(country.substring(
    											0,
    											Math.min(index + 1,
    													country.length()))); // Setting area
    								}
    							}
    						}
    						break;
    					case "Hong Kong":
    					case "Macao":
    						info.setCountry("China");
    						info.setProvince(flag + "special administrative region");
    						break;
    					default:
    						break;
    					}
    				}
    			}
    		} catch (Exception e) {
    			// An exception occurred during parsing
    			e.printStackTrace();
    		}
    		return info;
    	}
    	/**
    	 * ip A model related to region
    	 * @author root
    	 */
    	public static class RegionInfo {
    		public static final String DEFAULT_VALUE = GlobalConstants.DEFAULT_VALUE; // Default value
    		private String country = DEFAULT_VALUE; // country
    		private String province = DEFAULT_VALUE; // province
    		private String city = DEFAULT_VALUE; // city
    		public String getCountry() {
    			return country;
    		}
    		public void setCountry(String country) {
    			this.country = country;
    		}
    		public String getProvince() {
    			return province;
    		}
    		public void setProvince(String province) {
    			this.province = province;
    		}
    		public String getCity() {
    			return city;
    		}
    		public void setCity(String city) {
    			this.city = city;
    		}
    		@Override
    		public String toString() {
    			return "RegionInfo [country=" + country + ", province=" + province
    					+ ", city=" + city + "]";
    		}
    	}
    }
    
    
  • Testing the IPseekerExt class

    Create a new TestIPSeekerExt class: check whether the obtained results meet your requirements

    public class TestIPSeekerExt {
    	public static void main(String[] args) {
    		//Get the region information of an IP
    		IPSeekerExt ipSeekerExt = new IPSeekerExt();
    		RegionInfo info = ipSeekerExt.analyticIp("114.114.114.114");
    		System.out.println(info);
    
    		//Get the region information of an invalid ip
    		info = ipSeekerExt.analyticIp("255.255.255.255");
    		System.out.println(info);
    
    		//Get the region information of all IP addresses
    //		List<String> ips = ipSeekerExt.getAllIp();
    //		for (String ip : ips) {
    //			System.out.println(ip + " --- " + ipSeekerExt.analyticIp(ip));
    //		}
    	}
    }
    
    
  • UserAgentUtil class coding

    Create a new UserAgentUtil class file: parses the browser user agent. It relies on the cz.mallat.uasparser jar, added as a dependency in pom.xml.

    No customization required

    /**
     * The tool class that parses the user agent of the browser calls the uasparser jar file internally
     * @author root
     */
    public class UserAgentUtil {
    	static UASparser uasParser = null;
    
    	// static code block to initialize the uasParser object
    	static {
    		try {
    			uasParser = new UASparser(OnlineUpdater.getVendoredInputStream());
    		} catch (IOException e) {
    			e.printStackTrace();
    		}
    	}
    
    	/**
    	 * Parse the user agent string of the browser and return the UserAgentInfo object<br/>
    	 * If the user agent is null, null is returned. If the parsing fails, null is also returned directly.
    	 * 
    	 * @param userAgent user agent string to parse
    	 * @return Returns a specific value
    	 */
    	public static UserAgentInfo analyticUserAgent(String userAgent) {
    		UserAgentInfo result = null;
    		if (!(userAgent == null || userAgent.trim().isEmpty())) {
    			// At this point, the userAgent is not null and does not consist of all spaces
    			try {
    				cz.mallat.uasparser.UserAgentInfo info = null;
    				info = uasParser.parse(userAgent);
    				result = new UserAgentInfo();
    				result.setBrowserName(info.getUaFamily());
    				result.setBrowserVersion(info.getBrowserVersionInfo());
    				result.setOsName(info.getOsFamily());
    				result.setOsVersion(info.getOsName());
    			} catch (IOException e) {
    				// An exception occurred, set the return value to null
    				result = null;
    			}
    		}
    		return result;
    	}
    
    	/**
    	 * Browser information model object after internal parsing
    	 * 
    	 * @author root
    	 *
    	 */
    	public static class UserAgentInfo {
    		private String browserName; // Browser name
    		private String browserVersion; // Browser version number
    		private String osName; // Operating system name
    		private String osVersion; // Operating system version number
    		public String getBrowserName() {
    			return browserName;
    		}
    		public void setBrowserName(String browserName) {
    			this.browserName = browserName;
    		}
    		public String getBrowserVersion() {
    			return browserVersion;
    		}
    		public void setBrowserVersion(String browserVersion) {
    			this.browserVersion = browserVersion;
    		}
    		public String getOsName() {
    			return osName;
    		}
    		public void setOsName(String osName) {
    			this.osName = osName;
    		}
    		public String getOsVersion() {
    			return osVersion;
    		}
    		public void setOsVersion(String osVersion) {
    			this.osVersion = osVersion;
    		}
    		@Override
    		public String toString() {
    			return "UserAgentInfo [browserName=" + browserName + ", browserVersion=" + browserVersion + ", osName="
    					+ osName + ", osVersion=" + osVersion + "]";
    		}
    	}
    }
    
  • UserAgentUtil test class coding

    Create a new TestUserAgentUtil class file: check whether the parsing result is correct

    No customization required

    public class TestUserAgentUtil {
    	
    	public static void main(String[] args) {
    		String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36";
    //		userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; GWX:QUALIFIED; rv:11.0) like Gecko";
    //		userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0";
    		UserAgentInfo info = UserAgentUtil.analyticUserAgent(userAgent);
    		System.out.println(info);
    	}
    }
    
    
  • Create a new date type enumeration class DateEnum:

    Create a new enumeration class DateEnum file: represents the date granularities used when parsing input to the time utility

    No customization required

    /**
     * Date type enumeration class
     * 
     * @author root
     *
     */
    public enum DateEnum {
    	YEAR("year"), SEASON("season"), MONTH("month"), WEEK("week"), DAY("day"), HOUR(
    			"hour");
    
    	public final String name;
    
    	private DateEnum(String name) {
    		this.name = name;
    	}
    	/**
    	 * Get the corresponding type object according to the value of the attribute name
    	 * 
    	 * @param name
    	 * @return
    	 */
    	public static DateEnum valueOfName(String name) {
    		for (DateEnum type : values()) {
    			if (type.name.equals(name)) {
    				return type;
    			}
    		}
    		return null;
    	}
    }
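valueOfName does a linear scan over values() and returns null for unknown names. A standalone sketch (the enum body is re-declared here, slightly abridged, only so the example compiles on its own):

```java
public class DateEnumDemo {
	enum DateEnum {
		YEAR("year"), MONTH("month"), DAY("day");

		final String name;

		DateEnum(String name) {
			this.name = name;
		}

		// Linear scan over the declared constants; null for unknown names
		static DateEnum valueOfName(String name) {
			for (DateEnum type : values()) {
				if (type.name.equals(name)) {
					return type;
				}
			}
			return null;
		}
	}

	public static void main(String[] args) {
		System.out.println(DateEnum.valueOfName("month")); // MONTH
		System.out.println(DateEnum.valueOfName("minute")); // null
	}
}
```

Returning null rather than throwing (as the built-in `Enum.valueOf` does) lets callers treat an unrecognized date type as "no aggregation requested".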
    
  • Create a new time control tool class TimeUtil:

    New tool class file TimeUtil: converts the time in the log file into a timestamp

    No customization required

    public class TimeUtil {
        public static final String DATE_FORMAT = "yyyy-MM-dd";
    
        /**
         * Get date format string data of yesterday
         * 
         * @return
         */
        public static String getYesterday() {
            return getYesterday(DATE_FORMAT);
        }
        /**
         * Get date format string data of the current day
         *
         * @return
         */
        public static String getday() {
            return getday(DATE_FORMAT);
        }
        /**
         * Gets yesterday's date string in the given pattern
         * 
         * @param pattern
         * @return
         */
        public static String getYesterday(String pattern) {
            SimpleDateFormat sdf = new SimpleDateFormat(pattern);
            Calendar calendar = Calendar.getInstance();
            calendar.add(Calendar.DAY_OF_YEAR, -1);
            return sdf.format(calendar.getTime());
        }
    
        /**
         * Gets the current day's date string in the given pattern
         *
         * @param pattern
         * @return
         */
        public static String getday(String pattern) {
            SimpleDateFormat sdf = new SimpleDateFormat(pattern);
            Calendar calendar = Calendar.getInstance();
            calendar.add(Calendar.DAY_OF_YEAR, 0);
            return sdf.format(calendar.getTime());
        }
    
        /**
         * Judge whether the input parameter is a valid time format data
         * 
         * @param input
         * @return
         */
        public static boolean isValidateRunningDate(String input) {
            Matcher matcher = null;
            boolean result = false;
            String regex = "[0-9]{4}-[0-9]{2}-[0-9]{2}";
            if (input != null && !input.isEmpty()) {
                Pattern pattern = Pattern.compile(regex);
                matcher = pattern.matcher(input);
            }
            if (matcher != null) {
                result = matcher.matches();
            }
            return result;
        }
    
        /**
         * Converts a time string in yyyy-MM-dd format to a timestamp
         * 
         * @param input
         * @return
         */
        public static long parseString2Long(String input) {
            return parseString2Long(input, DATE_FORMAT);
        }
    
        /**
         * Converts a time string in the specified format to a timestamp
         * 
         * @param input
         * @param pattern
         * @return
         */
        public static long parseString2Long(String input, String pattern) {
            Date date = null;
            try {
                date = new SimpleDateFormat(pattern).parse(input);
            } catch (ParseException e) {
                throw new RuntimeException(e);
            }
            return date.getTime();
        }
    
        /**
         * Converts a timestamp to a time string in yyyy-MM-dd format
         * 
         * @param input
         * @return
         */
        public static String parseLong2String(long input) {
            return parseLong2String(input, DATE_FORMAT);
        }
    
        /**
         * Converts a timestamp to a string in the specified format
         * 
         * @param input
         * @param pattern
         * @return
         */
        public static String parseLong2String(long input, String pattern) {
            Calendar calendar = Calendar.getInstance();
            calendar.setTimeInMillis(input);
            return new SimpleDateFormat(pattern).format(calendar.getTime());
        }
    
        /**
         * Converts the nginx server time to a timestamp. If parsing fails, returns -1
         * 
         * @param input  1459581125.573
         * @return
         */
        public static long parseNginxServerTime2Long(String input) {
            Date date = parseNginxServerTime2Date(input);
            return date == null ? -1L : date.getTime();
        }
    
        /**
         * Converts the nginx server time to a Date object. If parsing fails, null is returned
         * 
         * @param input
         *            Format: 1449410796.976
         * @return
         */
        public static Date parseNginxServerTime2Date(String input) {
            if (StringUtils.isNotBlank(input)) {
                try {
                    long timestamp = Double.valueOf(Double.valueOf(input.trim()) * 1000).longValue();
                    Calendar calendar = Calendar.getInstance();
                    calendar.setTimeInMillis(timestamp);
                    return calendar.getTime();
                } catch (Exception e) {
                    // nothing
                }
            }
            return null;
        }
    
        /**
         * Obtain the required time information from the timestamp
         * 
         * @param time
         *            time stamp
         * @param type
         * @return If there is no matching type, an exception message is thrown
         */
        public static int getDateInfo(long time, DateEnum type) {
            Calendar calendar = Calendar.getInstance();
            calendar.setTimeInMillis(time);
            if (DateEnum.YEAR.equals(type)) {
                // Year information required
                return calendar.get(Calendar.YEAR);
            } else if (DateEnum.SEASON.equals(type)) {
                // Quarterly information required
                int month = calendar.get(Calendar.MONTH) + 1;
                if (month % 3 == 0) {
                    return month / 3;
                }
                return month / 3 + 1;
            } else if (DateEnum.MONTH.equals(type)) {
                // Month information required
                return calendar.get(Calendar.MONTH) + 1;
            } else if (DateEnum.WEEK.equals(type)) {
                // Week information required
                return calendar.get(Calendar.WEEK_OF_YEAR);
            } else if (DateEnum.DAY.equals(type)) {
                return calendar.get(Calendar.DAY_OF_MONTH);
            } else if (DateEnum.HOUR.equals(type)) {
                return calendar.get(Calendar.HOUR_OF_DAY);
            }
            throw new RuntimeException("There is no corresponding time type:" + type);
        }
    
        /**
         * Gets the time stamp value of the first day of the specified week
         * 
         * @param time
         * @return
         */
        public static long getFirstDayOfThisWeek(long time) {
            Calendar cal = Calendar.getInstance();
            cal.setTimeInMillis(time);
            cal.set(Calendar.DAY_OF_WEEK, 1);
            cal.set(Calendar.HOUR_OF_DAY, 0);
            cal.set(Calendar.MINUTE, 0);
            cal.set(Calendar.SECOND, 0);
            cal.set(Calendar.MILLISECOND, 0);
            return cal.getTimeInMillis();
        }
    }
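    The two conversions TimeUtil is used for below (date string to timestamp and back, plus the nginx seconds.millis format) can be sketched in a self-contained way. The class and method names here are illustrative, not the project's:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative, self-contained sketch of TimeUtil's core conversions
// (class and method names here are ours, not the project's).
public class TimeUtilSketch {
    static final String DATE_FORMAT = "yyyy-MM-dd";

    // yyyy-MM-dd string -> epoch millis, as in parseString2Long
    public static long toMillis(String day) {
        try {
            return new SimpleDateFormat(DATE_FORMAT).parse(day).getTime();
        } catch (ParseException e) {
            throw new RuntimeException(e);
        }
    }

    // epoch millis -> yyyy-MM-dd string, as in parseLong2String
    public static String toDay(long millis) {
        return new SimpleDateFormat(DATE_FORMAT).format(new Date(millis));
    }

    // nginx "seconds.millis" time -> epoch millis, as in parseNginxServerTime2Long
    public static long nginxToMillis(String input) {
        return Double.valueOf(Double.parseDouble(input.trim()) * 1000).longValue();
    }

    public static void main(String[] args) {
        // The two directions invert each other (same default time zone both ways).
        System.out.println(toDay(toMillis("2019-06-01"))); // 2019-06-01
        // Whole seconds survive the double arithmetic exactly.
        System.out.println(nginxToMillis("1449410796.976") / 1000L); // 1449410796
    }
}
```

    Note the nginx value is parsed as a double, so the millisecond part may be off by one after truncation; the whole-second part is always exact.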
    
  • Code writing of EventLogConstants class:

    Create a new EventLogConstants class file: it defines the names of the user data parameters collected by the log collection client, plus the structure of the event_logs HBase table. The user data parameter names are the column names of event_logs

    No customization required

    public class EventLogConstants {
    	
    	/** Event enumeration class. Specifies the name of the event
    	 * 
    	 * @author root
    	 *
    	 **/
    	public static enum EventEnum {
    		LAUNCH(1, "launch event", "e_l"), // launch event, indicating the first visit
    		PAGEVIEW(2, "page view event", "e_pv"), // Page browsing events
    		CHARGEREQUEST(3, "charge request event", "e_crt"), // Order creation event
    		CHARGESUCCESS(4, "charge success event", "e_cs"), // Event triggered when an order is successfully paid
    		CHARGEREFUND(5, "charge refund event", "e_cr"), // Order refund event
    		EVENT(6, "event duration event", "e_e") // event
    		;
    
    		public final int id; // id unique identification
    		public final String name; // name
    		public final String alias; // Alias, short for data collection
    
    		private EventEnum(int id, String name, String alias) {
    			this.id = id;
    			this.name = name;
    			this.alias = alias;
    		}
    
    		/**
    		 * Get the event enumeration object matching the alias. If there is no matching value in the end, null will be returned directly.
    		 * 
    		 * @param alias
    		 * @return
    	     **/
    		public static EventEnum valueOfAlias(String alias) {
    			for (EventEnum event : values()) {
    				if (event.alias.equals(alias)) {
    					return event;
    				}
    			}
    			return null;
    		}
    	}
    	// Table name
    	public static final String HBASE_NAME_EVENT_LOGS = "event";
    	// Column family name of the event_logs table
    	public static final String EVENT_LOGS_FAMILY_NAME = "log";
    	// Log separator
    	public static final String LOG_SEPARTIOR = "\\^A";
    	// User ip address
    	public static final String LOG_COLUMN_NAME_IP = "ip";
    	// Server time
    	public static final String LOG_COLUMN_NAME_SERVER_TIME = "s_time";
    	// Event name
    	public static final String LOG_COLUMN_NAME_EVENT_NAME = "en";
    	// Version information of data collection side
    	public static final String LOG_COLUMN_NAME_VERSION = "ver";
    	// User unique identifier
    	public static final String LOG_COLUMN_NAME_UUID = "u_ud";
    	// Member unique identifier
    	public static final String LOG_COLUMN_NAME_MEMBER_ID = "u_mid";
    	// Session id
    	public static final String LOG_COLUMN_NAME_SESSION_ID = "u_sd";
    	// Client time
    	public static final String LOG_COLUMN_NAME_CLIENT_TIME = "c_time";
    	// language
    	public static final String LOG_COLUMN_NAME_LANGUAGE = "l";
    	// Browser user agent parameters
    	public static final String LOG_COLUMN_NAME_USER_AGENT = "b_iev";
    	// Browser resolution size
    	public static final String LOG_COLUMN_NAME_RESOLUTION = "b_rst";
    	// Define platform
    	public static final String LOG_COLUMN_NAME_PLATFORM = "pl";
    	// Current url
    	public static final String LOG_COLUMN_NAME_CURRENT_URL = "p_url";
    	// url of the previous page
    	public static final String LOG_COLUMN_NAME_REFERRER_URL = "p_ref";
    	// title of the current page
    	public static final String LOG_COLUMN_NAME_TITLE = "tt";
    	// Order id
    	public static final String LOG_COLUMN_NAME_ORDER_ID = "oid";
    	// Order name
    	public static final String LOG_COLUMN_NAME_ORDER_NAME = "on";
    	// Order amount
    	public static final String LOG_COLUMN_NAME_ORDER_CURRENCY_AMOUNT = "cua";
    	// Order currency type
    	public static final String LOG_COLUMN_NAME_ORDER_CURRENCY_TYPE = "cut";
    	// Order payment type
    	public static final String LOG_COLUMN_NAME_ORDER_PAYMENT_TYPE = "pt";
    	// category name
    	public static final String LOG_COLUMN_NAME_EVENT_CATEGORY = "ca";
    	// action name
    	public static final String LOG_COLUMN_NAME_EVENT_ACTION = "ac";
    	// kv prefix
    	public static final String LOG_COLUMN_NAME_EVENT_KV_START = "kv_";
    	// duration
    	public static final String LOG_COLUMN_NAME_EVENT_DURATION = "du";
    	// Operating system name
    	public static final String LOG_COLUMN_NAME_OS_NAME = "os";
    	// Operating system version
    	public static final String LOG_COLUMN_NAME_OS_VERSION = "os_v";
    	// Browser name
    	public static final String LOG_COLUMN_NAME_BROWSER_NAME = "browser";
    	// Browser version
    	public static final String LOG_COLUMN_NAME_BROWSER_VERSION = "browser_v";
    	// Country of ip address resolution
    	public static final String LOG_COLUMN_NAME_COUNTRY = "country";
    	// Province of ip address resolution
    	public static final String LOG_COLUMN_NAME_PROVINCE = "province";
        // City of ip address resolution
    	public static final String LOG_COLUMN_NAME_CITY = "city";
    }
    
  • Coding of LoggerUtil class

    Create a new LoggerUtil class file: a specific working class for processing log data

    • Process the log data logText and return the map collection of processing results
    • If logText does not match the expected data format, an empty collection is returned directly
    • Process the ip address
    • Process the browser userAgent information
    • Process the request parameters

    No customization required

    public class LoggerUtil {
        private static final Logger logger = Logger.getLogger(LoggerUtil.class);
        private static IPSeekerExt ipSeekerExt = new IPSeekerExt();
    
        /**
         * Process the log data logText and return the map collection of processing results<br/>
         * If logText does not match the expected data format, an empty collection is returned directly
         * 
    	 * 
    	 192.168.78.1^A1542816232.816^Anode01^A/log.gif?en=e_e&ca=event%E7%9A%84category%E5%90%8D%E7%A7%B0&ac=event%E7%9A%84action%E5%90%8D%E7%A7%B0
    	 &ver=1&pl=website&sdk=js&u_ud=9061294F-721C-4838-83E3-B4F6E6EB3233&u_mid=zhangsan&u_sd=E9175784-A5D7-4692-806B-5FF384902D9D&c_time=1542781455523&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F60.0.3112.113%20Safari%2F537.36&b_rst=1536*864
    
         * 
         * @param logText
         * @return
         */
        public static Map<String, String> handleLog(String logText) {
            Map<String, String> clientInfo = new HashMap<String, String>();
            if (StringUtils.isNotBlank(logText)) {
                String[] splits = logText.trim().split(EventLogConstants.LOG_SEPARTIOR);
                if (splits.length == 4) {
                    // Log format: ip^Aserver time^Ahost^Arequest parameters
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_IP, splits[0].trim()); // Set ip
                    // Set server time
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_SERVER_TIME, String.valueOf(TimeUtil.parseNginxServerTime2Long(splits[1].trim())));
                    int index = splits[3].indexOf("?");
                    if (index > -1) {
                        String requestBody = splits[3].substring(index + 1); // Get the request parameters, that is, our collected data
                        // Processing request parameters
                        handleRequestBody(requestBody, clientInfo);
                        // Handling userAgent
                        handleUserAgent(clientInfo);
                        // Process ip address
                        handleIp(clientInfo);
                    } else {
                        // Abnormal data format
                        clientInfo.clear();
                    }
                }
            }
            return clientInfo;
        }
    
        /**
         * Process ip address
         * 
         * @param clientInfo
         */
        private static void handleIp(Map<String,String> clientInfo) {
            if (clientInfo.containsKey(EventLogConstants.LOG_COLUMN_NAME_IP)) {
                String ip = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_IP);
                RegionInfo info = ipSeekerExt.analyticIp(ip);
                if (info != null) {
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_COUNTRY, info.getCountry());
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_PROVINCE, info.getProvince());
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_CITY, info.getCity());
                }
            }
        }
    
        /**
         * Process browser userAgent information
         * 
         * @param clientInfo
         */
        private static void handleUserAgent(Map<String, String> clientInfo) {
            if (clientInfo.containsKey(EventLogConstants.LOG_COLUMN_NAME_USER_AGENT)) {
                UserAgentInfo info = UserAgentUtil.analyticUserAgent(clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_USER_AGENT));
                if (info != null) {
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_NAME, info.getOsName());
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_VERSION, info.getOsVersion());
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_NAME, info.getBrowserName());
                    clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_VERSION, info.getBrowserVersion());
                }
            }
        }
    
        /**
         * Processing request parameters
         * en=e_e&ca=event%E7%9A%84category%E5%90%8D%E7%A7%B0&ac=event%E7%9A%84action%E5%90%8D%E7%A7%B0
    	 &ver=1&pl=website&sdk=js&u_ud=9061294F-721C-4838-83E3-B4F6E6EB3233&u_mid=zhangsan&u_sd=E9175784-A5D7-4692-806B-5FF384902D9D&c_time=1542781455523&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F60.0.3112.113%20Safari%2F537.36&b_rst=1536*864
    
         * @param requestBody
         * @param clientInfo
         */
        private static void handleRequestBody(String requestBody, Map<String, String> clientInfo) {
            if (StringUtils.isNotBlank(requestBody)) {
                String[] requestParams = requestBody.split("&");
                for (String param : requestParams) {
                    if (StringUtils.isNotBlank(param)) {
                        int index = param.indexOf("=");
                        if (index < 0) {
                            logger.warn("Unable to parse parameters:" + param + ", The request parameter is:" + requestBody);
                            continue;
                        }
    
                        String key = null, value = null;
                        try {
                            key = param.substring(0, index);
                            value = URLDecoder.decode(param.substring(index + 1), "utf-8");
                        } catch (Exception e) {
                            logger.warn("Exception in decoding operation", e);
                            continue;
                        }
                        if (StringUtils.isNotBlank(key) && StringUtils.isNotBlank(value)) {
                            clientInfo.put(key, value);
                        }
                    }
                }
            }
        }
    }
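    The request-parameter handling in handleRequestBody reduces to: split the query string on "&", take the text before the first "=" as the key, and URL-decode the rest as the value. A standalone reproduction (demo class name is ours):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the handleRequestBody parsing above (illustrative class name).
public class RequestBodySketch {
    public static Map<String, String> parse(String requestBody) {
        Map<String, String> params = new HashMap<String, String>();
        for (String param : requestBody.split("&")) {
            int index = param.indexOf('=');
            if (index < 0) {
                continue; // no "=": LoggerUtil logs a warning and skips such pairs
            }
            String key = param.substring(0, index);
            String value;
            try {
                value = URLDecoder.decode(param.substring(index + 1), "utf-8");
            } catch (UnsupportedEncodingException e) {
                throw new RuntimeException(e); // utf-8 is always supported
            }
            if (!key.isEmpty() && !value.isEmpty()) {
                params.put(key, value);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("en=e_pv&pl=website&u_mid=zhangsan&tt=%E6%B5%8B%E8%AF%95");
        System.out.println(p.get("en")); // e_pv
        System.out.println(p.get("tt")); // the decoded Chinese page title
    }
}
```

    Splitting on the first "=" only matters because URL-encoded values (like the userAgent) may themselves contain "=" after decoding.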
    
  • Verify LoggerUtil class:

    Create a new TestLoggerUtil class file: check whether the parsing result is what you want

    No customization required

    public class TestLoggerUtil {
        public static void main(String[] args) {
            String log = "192.168.100.102^A1449411239.595^A192.168.239.8^A/log.gif?c_time=1449411240818&oid=orderid456&u_mid=zhangsan&pl=java_server&en=e_cr&sdk=jdk&ver=1";
            log = "192.168.100.102^A1449587515.394^A192.168.239.8^A/log.gif?en=e_pv&p_url=http%3A%2F%2Flocalhost%3A8080%2Fbf_track_jssdk%2Fdemo2.jsp&p_ref=http%3A%2F%2Flocalhost%3A8080%2Fbf_track_jssdk%2Fdemo.jsp&tt=%E6%B5%8B%E8%AF%95%E9%A1%B5%E9%9D%A22&ver=1&pl=website&sdk=js&u_ud=948AB94A-E1A5-4EED-BBB8-CEDB74B8B4D0&u_sd=9EF5D22F-5CCD-4290-AFCA-641672988F73&c_time=1449587517241&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%206.1%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F46.0.2490.71%20Safari%2F537.36&b_rst=1280*768";
            System.out.println(LoggerUtil.handleLog(log));
            System.out.println(IPSeekerExt.getInstance().getCountry("192.168.100.102"));
            System.out.println(IPSeekerExt.getInstance().getArea("192.168.100.102"));
        }
    }
    
  • Code writing of AnalyserLogDataRunner

    Create the AnalyserLogDataRunner class file; it calls the mapper class to read the HDFS files and writes the data to HBase

    Path inputPath = new Path("/data/test/flume/" + TimeUtil.parseLong2String(TimeUtil.parseString2Long(date), "yyyyMMdd") + "/");

    public class AnalyserLogDataRunner implements Tool {
    	private static final Logger logger = Logger
    			.getLogger(AnalyserLogDataRunner.class);
    	private Configuration conf = null;
    
    	public static void main(String[] args) {
    		try {
    			ToolRunner.run(new Configuration(), new AnalyserLogDataRunner(), args);
    		} catch (Exception e) {
    			logger.error("Exception while running the log parsing job", e);
    			throw new RuntimeException(e);
    		}
    	}
    
    	@Override
    	public void setConf(Configuration conf) {
    		conf = HBaseConfiguration.create();
    		conf.set("mapreduce.app-submission.cross-platform", "true");
    		conf.set("mapreduce.framework.name", "local");
    		this.conf = HBaseConfiguration.create(conf);
    	}
    
    	@Override
    	public Configuration getConf() {
    		return this.conf;
    	}
    
    	@Override
    	public int run(String[] args) throws Exception {
    		Configuration conf = this.getConf();
    		this.processArgs(conf, args);
    
    		Job job = Job.getInstance(conf, "analyser_logdata");
    
    		// Start: code required to submit the job locally and run it on the cluster
    		// File jarFile = EJob.createTempJar("target/classes");
    		// ((JobConf) job.getConfiguration()).setJar(jarFile.toString());
    		// End: local-submit / cluster-run code
    
    		job.setJarByClass(AnalyserLogDataRunner.class);
    		job.setMapperClass(AnalyserLogDataMapper.class);
    		job.setMapOutputKeyClass(NullWritable.class);
    		job.setMapOutputValueClass(Put.class);
    		// Set reducer configuration
    		// 1. Run on the cluster, packaged as a jar (addDependencyJars must be true, which is the default)
    		// TableMapReduceUtil.initTableReducerJob(EventLogConstants.HBASE_NAME_EVENT_LOGS,
    		// null, job);
    		// 2. For local operation, the parameter addDependencyJars is required to be false
    		TableMapReduceUtil.initTableReducerJob(
    				EventLogConstants.HBASE_NAME_EVENT_LOGS, null, job, null, null,
    				null, null, false);
    		job.setNumReduceTasks(0);
    
    		// Set input path
    		this.setJobInputPaths(job);
    		return job.waitForCompletion(true) ? 0 : -1;
    	}
    
    	/**
    	 * Processing parameters
    	 * 
    	 * @param conf
    	 * @param args
    	 * -d 2019-06-01
    	 */
    	private void processArgs(Configuration conf, String[] args) {
    		String date = null;
    		for (int i = 0; i < args.length; i++) {
    			if ("-d".equals(args[i])) {
    				if (i + 1 < args.length) {
    					date = args[++i];
    					break;
    				}
    			}
    		}
    		
    		System.out.println("-----" + date);
    
    		// The required date format is yyyy-MM-dd
    		if (StringUtils.isBlank(date) || !TimeUtil.isValidateRunningDate(date)) {
    			// date is invalid time data
    			date = TimeUtil.getday(); // Default to the current day
    			System.out.println(date);
    		}
    		conf.set(GlobalConstants.RUNNING_DATE_PARAMES, date);
    	}
    
    	/**
    	 * Set the input path of the job
    	 * 
    	 * @param job
    	 */
    	private void setJobInputPaths(Job job) {
    		Configuration conf = job.getConfiguration();
    		FileSystem fs = null;
    		try {
    			fs = FileSystem.get(conf);
    			String date = conf.get(GlobalConstants.RUNNING_DATE_PARAMES);
    			Path inputPath = new Path("/data/test/flume/"
    					+ TimeUtil.parseLong2String(
    							TimeUtil.parseString2Long(date), "yyyyMMdd")
    					+ "/");
    			System.out.println(inputPath);
    
    //			Path inputPath = new Path("/log/"
    //					+ TimeUtil.parseLong2String(
    //							TimeUtil.parseString2Long(date), "yyyyMMdd")
    //					+ "/");
    			
    			if (fs.exists(inputPath)) {
    				FileInputFormat.addInputPath(job, inputPath);
    			} else {
    				throw new RuntimeException("directory does not exist:" + inputPath);
    			}
    		} catch (IOException e) {
    			throw new RuntimeException("An exception occurred while setting the input path of the MapReduce job", e);
    		} finally {
    			if (fs != null) {
    				try {
    					fs.close();
    				} catch (IOException e) {
    					// nothing
    				}
    			}
    		}
    	}
    
    }
    
  • Code writing of AnalyserLogDataMapper

    Create an AnalyserLogDataMapper class file that reads the HDFS files and parses them. Main functions: filter out bad data, parse logs, and create rowkeys from uuid, memberId and serverTime

    No customization required

    public class AnalyserLogDataMapper extends Mapper<LongWritable, Text, NullWritable, Put> {
    	
    	private final Logger logger = Logger.getLogger(AnalyserLogDataMapper.class);
    	private int inputRecords, filterRecords, outputRecords; // It is mainly used for marking to facilitate viewing and filtering data
    	private byte[] family = Bytes.toBytes(EventLogConstants.EVENT_LOGS_FAMILY_NAME);
    	private CRC32 crc32 = new CRC32();
    	/**
    	 * 
    	 192.168.78.1^A1542816232.816^Anode01^A/log.gif?en=e_e&ca=event%E7%9A%84category%E5%90%8D%E7%A7%B0&ac=event%E7%9A%84action%E5%90%8D%E7%A7%B0
    	 &ver=1&pl=website&sdk=js&u_ud=9061294F-721C-4838-83E3-B4F6E6EB3233&u_mid=zhangsan&u_sd=E9175784-A5D7-4692-806B-5FF384902D9D&c_time=1542781455523&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F60.0.3112.113%20Safari%2F537.36&b_rst=1536*864
    
    	 */
    	@Override
    	protected void map(LongWritable key, Text value, Context context)
    			throws IOException, InterruptedException {
    		
    		this.inputRecords++;
    		this.logger.debug("Analyse data of :" + value);
    
    		try {
    			// Parse log
    			Map<String, String> clientInfo = LoggerUtil.handleLog(value.toString());
    
    			// Filter failed data
    			if (clientInfo.isEmpty()) {
    				this.filterRecords++;
    				return;
    			}
    
    			// Get event name
    			String eventAliasName = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_EVENT_NAME);
    			
    			
    			EventEnum event = EventEnum.valueOfAlias(eventAliasName);
    			if (event == null) {
    				// valueOfAlias returns null for an unknown alias; filter the record out
    				this.filterRecords++;
    				this.logger.warn("The event cannot be resolved. The event name is:" + eventAliasName);
    				return;
    			}
    			switch (event) {
    			case LAUNCH:
    			case PAGEVIEW:
    			case CHARGEREQUEST:
    			case CHARGEREFUND:
    			case CHARGESUCCESS:
    			case EVENT:
    				// Processing data
    				this.handleData(clientInfo, event, context);
    				break;
    			default:
    				this.filterRecords++;
    				this.logger.warn("The event cannot be resolved. The event name is:" + eventAliasName);
    			}
    		} catch (Exception e) {
    			this.filterRecords++;
    			this.logger.error("Exception while processing data. Data:" + value, e);
    		}
    	}
    
    	@Override
    	protected void cleanup(Context context) throws IOException, InterruptedException {
    		super.cleanup(context);
    		logger.info("input data:" + this.inputRecords + ";output data:" + this.outputRecords
    				+ ";Filter data:" + this.filterRecords);
    	}
    
    	/**
    	 * Specific data processing methods
    	 * 
    	 * @param clientInfo
    	 * @param context
    	 * @param event
    	 * @throws InterruptedException
    	 * @throws IOException
    	 */
    	private void handleData(Map<String, String> clientInfo, EventEnum event,
    			Context context) throws IOException, InterruptedException {
    		String uuid = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_UUID);
    		String memberId = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_MEMBER_ID);
    		String serverTime = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_SERVER_TIME);
    		if (StringUtils.isNotBlank(serverTime)) {
    			// Server time required is not empty
    			clientInfo.remove(EventLogConstants.LOG_COLUMN_NAME_USER_AGENT); // Remove browser information
    			// rowkey = timestamp + "_" + crc32(uuid + memberId + event) % 10^8
    			String rowkey = this.generateRowKey(uuid, memberId, event.alias, serverTime);
    			Put put = new Put(Bytes.toBytes(rowkey));
    			for (Map.Entry<String, String> entry : clientInfo.entrySet()) {
    				if (StringUtils.isNotBlank(entry.getKey()) && StringUtils.isNotBlank(entry.getValue())) {
    					put.addColumn(family, Bytes.toBytes(entry.getKey()), Bytes.toBytes(entry.getValue()));
    				}
    			}
    			context.write(NullWritable.get(), put);
    			this.outputRecords++;
    		} else {
    			this.filterRecords++;
    		}
    	}
    
    	/**
    	 * Create a rowkey based on uuid, memberId and serverTime
    	 * 
    	 * @param uuid
    	 * @param memberId
    	 * @param eventAliasName
    	 * @param serverTime
    	 * @return
    	 */
    	private String generateRowKey(String uuid, String memberId, String eventAliasName, String serverTime) {
    		StringBuilder sb = new StringBuilder();
    		sb.append(serverTime).append("_");
    		this.crc32.reset();
    		if (StringUtils.isNotBlank(uuid)) {
    			this.crc32.update(uuid.getBytes());
    		}
    		if (StringUtils.isNotBlank(memberId)) {
    			this.crc32.update(memberId.getBytes());
    		}
    		this.crc32.update(eventAliasName.getBytes());
    		sb.append(this.crc32.getValue() % 100000000L);
    		return sb.toString();
    	}
    }
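    The rowkey scheme above can be checked standalone: the key is the server timestamp, an underscore, then CRC32 over uuid + memberId + eventAlias modulo 100,000,000. The timestamp prefix keeps events time-ordered in HBase; the CRC suffix spreads rows that share a timestamp. A minimal sketch (class name is ours):

```java
import java.util.zip.CRC32;

// Standalone sketch of generateRowKey above:
// rowkey = serverTime + "_" + (CRC32(uuid + memberId + eventAlias) mod 100,000,000)
public class RowKeySketch {
    public static String rowKey(String uuid, String memberId, String eventAlias, String serverTime) {
        CRC32 crc32 = new CRC32();
        if (uuid != null && !uuid.isEmpty()) {
            crc32.update(uuid.getBytes());
        }
        if (memberId != null && !memberId.isEmpty()) {
            crc32.update(memberId.getBytes());
        }
        crc32.update(eventAlias.getBytes());
        return serverTime + "_" + (crc32.getValue() % 100000000L);
    }

    public static void main(String[] args) {
        // Updating the CRC with "a", "b", "c" in turn equals CRC32("abc") = 891568578,
        // so the suffix is 891568578 % 100000000 = 91568578.
        System.out.println(rowKey("a", "b", "c", "1449411240")); // 1449411240_91568578
    }
}
```

    Because the CRC is fed piecewise, the result is identical to hashing the concatenated string, which is why the mapper can reuse one CRC32 instance with reset() between records.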
    

Cluster profile:

You need to replace it with your own configuration file in the resource directory

core-site.xml ,hbase-site.xml ,hdfs-site.xml ,yarn-site.xml

ETL packaging test

Testing in Linux

Upload the jar package to the /root/tmp/jar directory on Linux

cd /root/tmp/jar

Run jar package test

java -jar ETL.jar

verification

//Log in to Hbase
hbase shell

//Scan table
scan 'event'

HBase test data loading

  • The idea of storing simulation data into hbase:
    • First, create a Java class to store the simulated user id, session id and system time;
    • Create the HBase table object;
    • Create a tool class for resolving ip addresses; obtain the ip address through the tool class and output the province, city and country field data;
    • Create methods to randomly generate the system time, event, web address, browser, platform and operating system (the random range can be specified), producing the browser, browser_v, en, os, os_v, p_url and pl field data;
    • Add each piece of generated data to a Put object and return it;
    • Use the table.put method to load the data into the HBase table.

Create a new TestDataMaker test file:

Need to customize TN = "event" (the target HBase table name)

public class TestDataMaker {

    //Table configuration
    private static String TN = "event";
    private static Configuration conf;
    private static Connection connection;
    private static Admin admin;
    private static Table table;


    public static void main(String[] args) throws Exception {

        TestDataMaker tDataMaker = new TestDataMaker();
        Random r = new Random();

        conf = HBaseConfiguration.create();
        connection = ConnectionFactory.createConnection(conf);

        admin = connection.getAdmin();
        table = connection.getTable(TableName.valueOf(TN));

        // User ID u_ud: randomly generate 8 digits
        String uuid = String.format("%08d", r.nextInt(99999999));
        
        // Member ID u_mid: randomly generate 8 digits
        String memberId = String.format("%08d", r.nextInt(99999999));

        List<Put> puts = new ArrayList<Put>();
        
        for (int i = 0; i < 100; i++) {
            if(i%5==0) {
                uuid = String.format("%08d", r.nextInt(99999999));
                memberId = String.format("%08d", r.nextInt(99999999));
            }
            if(i%6==0) {
                uuid = String.format("%08d", r.nextInt(99999999));
                memberId = String.format("%08d", r.nextInt(99999999));
            }

            SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd");
            Calendar calendar = Calendar.getInstance();
            calendar.add(Calendar.DATE, -1);
            Date d = tDataMaker.getDate(df.format(calendar.getTime()));

            String serverTime = ""+d.getTime();

            Put put = tDataMaker.putMaker(uuid, memberId, serverTime);
            puts.add(put);
        }
        
        table.put(puts);
    }

    Random r = new Random();

    private static IPSeekerExt ipSeekerExt = new IPSeekerExt();

    /**
     * Build one piece of test data as a Put
     * @param uuid user id
     * @param memberId member id
     * @param serverTime server timestamp in milliseconds, as a string
     */
    public Put putMaker(String uuid, String memberId, String serverTime) {

        Map<String, Put> map = new HashMap<String, Put>();

        byte[] family = Bytes.toBytes(EventLogConstants.EVENT_LOGS_FAMILY_NAME);

        // Parse log
        Map<String, String> clientInfo = LoggerUtil.handleLog("......");

        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_SERVER_TIME, serverTime);
        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_UUID, uuid);
        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_PLATFORM, "website");

        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_EVENT_NAME, EventNames[r.nextInt(EventNames.length)]);
        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_SESSION_ID, SessionIDs[r.nextInt(SessionIDs.length)]);
        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_CURRENT_URL, CurrentURLs[r.nextInt(CurrentURLs.length)]);


        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_NAME, this.getOsName());
        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_VERSION, this.getOsVersion());
        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_NAME, this.getBrowserName());
        clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_VERSION, this.getBrowserVersion());

        String ip = IPs[r.nextInt(IPs.length)];
        RegionInfo info = ipSeekerExt.analyticIp(ip);
        if (info != null) {
            clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_COUNTRY, info.getCountry());
            clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_PROVINCE, info.getProvince());
            clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_CITY, info.getCity());
        }

        // Reuse the event name already stored in clientInfo so the rowkey and
        // the stored column value refer to the same event
        String eventName = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_EVENT_NAME);

        // Generate rowkey
        String rowkey = this.generateRowKey(uuid, memberId, eventName, serverTime);

        Put put = new Put(Bytes.toBytes(rowkey));
        for (Map.Entry<String, String> entry : clientInfo.entrySet()) {
            put.addColumn(family, Bytes.toBytes(entry.getKey()), Bytes.toBytes(entry.getValue()));
        }

        return put;
    }

    private String[] CurrentURLs = new String[]{"http://www.jd.com",
            "http://www.tmall.com","http://www.sina.com","http://www.weibo.com"};

    private String[] SessionIDs = new String[]{"1A3B4F83-6357-4A64-8527-F092169746D3",
            "12344F83-6357-4A64-8527-F09216974234","1A3B4F83-6357-4A64-8527-F092169746D8"};

    private String[] IPs = new String[]{"58.42.245.255","39.67.154.255",
            "23.13.191.255","14.197.148.38","14.197.149.137","14.197.201.202","14.197.243.254"};

    private String[] EventNames = new String[]{"e_l","e_pv"};

    private String[] BrowserNames = new String[]{"FireFox","Chrome","aoyou","360"};

    /**
     * Gets the random browser name
     * @return
     */
    private String getBrowserName() {
        return BrowserNames[r.nextInt(BrowserNames.length)];
    }


    /**
     * Get random browser version information
     * @return
     */
    private String getBrowserVersion() {
        return (""+r.nextInt(9));
    }

    /**
     * Obtain random system version information
     * @return
     */
    private String getOsVersion() {
        return (""+r.nextInt(3));
    }

    private String[] OsNames = new String[]{"window","linux","ios"};
    
    /**
     * Obtain random system information
     * @return
     */
    private String getOsName() {
        return OsNames[r.nextInt(OsNames.length)];
    }

    private CRC32 crc32 = new CRC32();

    /**
     * Create rowkey based on uuid memberid servertime
     * @param uuid
     * @param memberId
     * @param eventAliasName
     * @param serverTime
     * @return
     */
    private String generateRowKey(String uuid, String memberId, String eventAliasName, String serverTime) {
        StringBuilder sb = new StringBuilder();
        sb.append(serverTime).append("_");
        this.crc32.reset();
        if (StringUtils.isNotBlank(uuid)) {
            this.crc32.update(uuid.getBytes());
        }
        if (StringUtils.isNotBlank(memberId)) {
            this.crc32.update(memberId.getBytes());
        }
        this.crc32.update(eventAliasName.getBytes());
        sb.append(this.crc32.getValue() % 100000000L);
        return sb.toString();
    }

    SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");

    /**
     * Append a random time of day (HHmmss) to a date string and parse it
     * @param str  date in yyyyMMdd format, e.g. 20160101
     * @return
     */
    public Date getDate(String str) {
        str = str + String.format("%02d%02d%02d", new Object[]{r.nextInt(24), r.nextInt(60), r.nextInt(60)});
        Date d = new Date();
        try {
            d = sdf.parse(str);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return d;
    }
}
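
As a standalone check of the rowkey layout that generateRowKey produces (server time, an underscore, then CRC32 of uuid + memberId + eventName modulo 10^8), here is a minimal sketch; the input values are made up for illustration:

```java
import java.util.zip.CRC32;

public class RowKeyDemo {
    // Rebuilds the rowkey scheme used above: serverTime + "_" + crc32 % 1e8
    static String generateRowKey(String uuid, String memberId, String eventName, String serverTime) {
        CRC32 crc32 = new CRC32();
        crc32.update(uuid.getBytes());
        crc32.update(memberId.getBytes());
        crc32.update(eventName.getBytes());
        return serverTime + "_" + (crc32.getValue() % 100000000L);
    }

    public static void main(String[] args) {
        String rowkey = generateRowKey("00000042", "00000007", "e_pv", "1640131200000");
        // Keys sort by time first, so HBase scans over a time range stay cheap
        System.out.println(rowkey);
    }
}
```

Because the timestamp comes first, rows for the same period are stored contiguously, which is what the later scan-by-day queries rely on.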

Data query

Create a new HbaseScan class to query the data stored in HBase.

Entry point, to be customized as needed: scanTable("event", null, null);

public class HbaseScan {

    private static Configuration conf;
    private static Connection connection;
    private static Admin admin;

    static {
        conf = HBaseConfiguration.create();
        try {
            connection = ConnectionFactory.createConnection(conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }


    /**
     * Get HBase administrator
     * @return
     * @throws IOException
     */
    public static Admin getAdmin() throws IOException {
        return connection.getAdmin();
    }


    /**
     * Scan table
     * @param tableName
     * @param startRow  Start row (inclusive)
     * @param stopRow   Stop row (exclusive)
     */
    public static void scanTable(String tableName, String startRow, String stopRow) {
        if(tableName == null || tableName.length() == 0) {
            System.out.println("Please enter the table name correctly!");
            return;
        }
        Table table = null;
        try {
            table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            // Row range is left-closed, right-open: [startRow, stopRow)
            if(startRow != null && stopRow != null) {
                scan.withStartRow(Bytes.toBytes(startRow));
                scan.withStopRow(Bytes.toBytes(stopRow));
            } else {
                // No bounds supplied: full-table scan; these placeholders only
                // feed the log line below
                startRow = "unbounded";
                stopRow = "unbounded";
            }
            ResultScanner resultScanner = table.getScanner(scan);
            Iterator<Result> iterator = resultScanner.iterator();
            System.out.println("scan\t startRow: " + startRow + "\t stopRow: " + stopRow);
            System.out.println("RowKey\tTimeStamp\tcolumnFamilyName\tcolumnQualifierName");
            while(iterator.hasNext()) {
                Result result = iterator.next();
                showCell(result);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeTable(table);
        }
    }

    /**
     * Format output
     * @param result
     */
    public static void showCell(Result result) {
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.print(Bytes.toString(CellUtil.cloneRow(cell)) + "\t");
            System.out.print(cell.getTimestamp() + "\t");
            String columnFamilyName = Bytes.toString(CellUtil.cloneFamily(cell));
            String columnQualifierName = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println(columnFamilyName + ":" + columnQualifierName + "\t\t\t" + value);
        }
    }

    /**
     * Close connection
     */
    public static void closeConn() {
        try {
            if (admin != null) {
                admin.close();
            }
            if (connection != null) {
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Close table
     * @param table
     */
    public static void closeTable(Table table) {
        if(table != null) {
            try {
                table.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }


    /**
     * Gets the specified row
     * @param tableName
     * @param rowKey
     * @param colFamily
     * @param col
     */
    public static void getRow(String tableName, String rowKey, String colFamily, String col) {
        Table table = null;
        try {
            table = connection.getTable(TableName.valueOf(tableName));
            Get g = new Get(Bytes.toBytes(rowKey));
            // Gets the specified column family data
            if(col == null && colFamily != null) {
                g.addFamily(Bytes.toBytes(colFamily));
            } else if(col != null && colFamily != null) {
                // Gets the specified column data
                g.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col));
            }
            Result result = table.get(g);
            System.out.println("getRow\t");
            System.out.println("RowKey\tTimeStamp\tcolumnFamilyName\tcolumnQualifierName");
            showCell(result);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeTable(table);
        }
    }


    public static void main(String[] args) throws IOException {
        admin = getAdmin();
        scanTable("event", null, null);
//        getRow("nevent", "1542988607000_9327110", "log", "city");
        closeConn();
    }
}
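
Because every rowkey starts with the millisecond server time, scanTable can be limited to a single day by passing timestamp-prefixed bounds. A hedged sketch of how such bounds might be built (the date and the dayRange helper are illustrative, not part of the project code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ScanRangeDemo {
    // Builds [startRow, stopRow) bounds for one calendar day, matching the
    // timestamp-prefixed rowkey layout used when the data was written
    static String[] dayRange(String yyyyMMdd) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
        try {
            long start = sdf.parse(yyyyMMdd).getTime();
            long stop = start + 24L * 60 * 60 * 1000;  // next midnight (fixed 24h)
            return new String[]{start + "_", stop + "_"};
        } catch (ParseException e) {
            throw new IllegalArgumentException("expected yyyyMMdd: " + yyyyMMdd, e);
        }
    }

    public static void main(String[] args) {
        String[] range = dayRange("20211221");
        // e.g. scanTable("event", range[0], range[1]);
        System.out.println(range[0] + " -> " + range[1]);
    }
}
```

Since all millisecond timestamps here are 13 digits, lexicographic byte order in HBase matches numeric order, so the string bounds work directly with Scan.withStartRow/withStopRow.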

Data analysis

Design the table structure according to the requirements and objectives. The statistics are computed with MapReduce, grouped by time range (year, month, day).

Idea:

a) A dimension is a particular angle or perspective to count by; here we count along the time dimension. For example, to count the pv of every month and every day in 2017, the dimension can be expressed as year/month/day, e.g. 2017.

b) Aggregate data to Reducer according to different dimensions through Mapper

c) Get the data aggregated according to various dimensions through Reducer, summarize and output

d) According to the business requirements, the Reducer output is written to the database through a custom OutputFormat.
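
Steps b) and c) can be simulated without Hadoop: group records by a dimension key and sum, which is exactly what the shuffle plus Reducer does. A minimal in-memory sketch (the key format and values are made up):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DimensionCountDemo {
    public static void main(String[] args) {
        // Each entry mimics a key the Mapper would emit: platform|browser
        List<String> mapperKeys = Arrays.asList(
                "website|Chrome", "website|FireFox", "website|Chrome");

        // The shuffle groups identical keys; the Reducer then just sums them
        Map<String, Integer> pvByDimension = new HashMap<>();
        for (String key : mapperKeys) {
            pvByDimension.merge(key, 1, Integer::sum);
        }
        System.out.println(pvByDimension);
    }
}
```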

**Data input:** HBase

**Data output:** MySQL

Data source structure in HBase:

Field  | Example & Illustration
rowkey | timestamp + crc of (uid + mid + en)
log    | f1 column family, stores the log information: en event name (e.g. e_pv); ver version number (e.g. 0.0.1); pl platform (e.g. website); sdk SDK type (e.g. js); b_rst browser resolution (e.g. 1800*678); b_iev browser information (user agent); u_ud user/visitor unique identifier; l client language; u_mid member id, consistent with the business system; u_sd session id; c_time client time; p_url url of the current page; p_ref url of the previous page; tt page title; ca event category name; ac event action name; kv_* custom event attributes; event duration; order name; payment amount; payment currency type; pt payment method

a) If the goal is known, it is necessary to consider whether the existing data can support the realization of the goal in combination with the goal;

b) According to the target data structure, build Mysql table structure and create tables;

c) Think about which functional modules need to be involved in the code, and establish the package structure corresponding to different functional modules.

d) The description data must be based on a certain dimension (perspective), so construct a dimension class. For example, aggregate all data according to the combination of "platform" and "browser" as a key, so you can count the relevant results of this user in this year.

e) The custom OutputFormat is used to interface with Mysql to output data.

f) Create related tool classes.
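
For step d), the dimension class used as an aggregation key must define equals and hashCode so that records carrying the same (platform, browser) combination land on the same key; in a real Hadoop job it would additionally implement WritableComparable. A minimal sketch with hypothetical field names:

```java
import java.util.Objects;

public class BrowserDimensionKey {
    private final String platform;
    private final String browserName;

    public BrowserDimensionKey(String platform, String browserName) {
        this.platform = platform;
        this.browserName = browserName;
    }

    // Two keys denote the same dimension cell iff both components match
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BrowserDimensionKey)) return false;
        BrowserDimensionKey other = (BrowserDimensionKey) o;
        return platform.equals(other.platform) && browserName.equals(other.browserName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(platform, browserName);
    }

    public static void main(String[] args) {
        BrowserDimensionKey a = new BrowserDimensionKey("website", "Chrome");
        BrowserDimensionKey b = new BrowserDimensionKey("website", "Chrome");
        System.out.println(a.equals(b));  // same dimension cell
    }
}
```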

MySQL table structure design

We save the analysis results to MySQL to make them easy to query and display on the Web.

  • MySql storage structure

    In MySQL we use three kinds of tables: dimension information tables, statistical result tables and auxiliary analysis tables. Dimension information tables store dimension-related information and are named dimension_*. Statistical result tables store the final statistical analysis results, keyed by dimension ids, and are named stats_*. Auxiliary analysis tables are the other helper tables used during analysis.

    According to the final dimension information, we need to create the following eight dimension tables: platform, date, browser, location, payment, currency_type, event and inbound. In addition, a kpi dimension table and an operating system (os) dimension table are needed; note that the os table will not be used in this project.

    Analysis module                    | Related dimension tables
    User basic information analysis   | platform, date
    Browser information analysis      | platform, date, browser
    Regional information analysis     | platform, date, location
    User browsing depth analysis      | platform, date, kpi
    External link information analysis| platform, date, inbound
    Order information analysis        | platform, date, currency_type, payment
    Event analysis                    | platform, date, event
  • The user basic information analysis module requires data in the following dimensions: number of new users, number of active users, total number of users, number of new members, number of active members, total number of members, number of sessions, and session length. In addition, the platform and date dimension ids are required, plus a created field indicating the modification time. A row is uniquely determined by the platform and date fields. The table is named stats_user. Besides this table, we also need to count data by time period, so we need another table for statistics by hour, named stats_hourly.

    Column                | Type    | Default        | Description
    platform_dimension_id | int(11) | Not null, 0    | Platform id, pkey
    date_dimension_id     | int(11) | Not null, 0    | Date id, pkey
    active_users          | int(11) | Nullable, 0    | Number of active users
    new_install_users     | int(11) | Nullable, 0    | Number of new users
    total_install_users   | int(11) | Nullable, 0    | Total users
    sessions              | int(11) | Nullable, 0    | Number of sessions
    sessions_length       | int(11) | Nullable, 0    | Session length
    total_members         | int(11) | Nullable, 0    | Total members
    active_members        | int(11) | Nullable, 0    | Number of active members
    new_members           | int(11) | Nullable, 0    | Number of new members
    created               | date    | Nullable, null | Record date
  • Browser information analysis covers the same basic indicators as user basic information analysis: number of new users, number of active users, total number of users, number of new members, number of active members, total number of members, number of sessions, and session length. In addition, it adds a pv count indicator, the three dimension ids platform, date and browser, and a created field indicating the modification date. A row is uniquely determined by the platform, date and browser fields. The table is named stats_device_browser.

    Column                | Type    | Default        | Description
    platform_dimension_id | int(11) | Not null, 0    | Platform id, pkey
    date_dimension_id     | int(11) | Not null, 0    | Date id, pkey
    browser_dimension_id  | int(11) | Not null, 0    | Browser id, pkey
    active_users          | int(11) | Nullable, 0    | Number of active users
    new_install_users     | int(11) | Nullable, 0    | Number of new users
    total_install_users   | int(11) | Nullable, 0    | Total users
    sessions              | int(11) | Nullable, 0    | Number of sessions
    sessions_length       | int(11) | Nullable, 0    | Session length
    total_members         | int(11) | Nullable, 0    | Total members
    active_members        | int(11) | Nullable, 0    | Number of active members
    new_members           | int(11) | Nullable, 0    | Number of new members
    pv                    | int(11) | Nullable, 0    | pv count
    created               | date    | Nullable, null | Last modified date
  • The regional information analysis module only analyzes the regional distribution of active users together with the bounce rate, so the following indicators are required: number of active users, number of sessions and number of bounce sessions. In addition, the three dimension ids platform, date and location and a created field indicating the modification date are required. A row is uniquely determined by the platform, date and location fields. The table is named stats_device_location.

    Column                | Type    | Default        | Description
    platform_dimension_id | int(11) | Not null, 0    | Platform id, pkey
    date_dimension_id     | int(11) | Not null, 0    | Date id, pkey
    location_dimension_id | int(11) | Not null, 0    | Region id, pkey
    active_users          | int(11) | Nullable, 0    | Number of active users
    sessions              | int(11) | Nullable, 0    | Number of sessions
    bounce_sessions       | int(11) | Nullable, 0    | Number of bounce sessions
    created               | date    | Nullable, null | Last modified date
  • User browsing depth is expressed by counting the number of users/sessions that visit different numbers of pages. In this project we divide it into 8 buckets: 1 pv, 2 pv, 3 pv, 4 pv, 5-10 pv (including 5, excluding 10), 10-30 pv, 30-60 pv, and 60+ pv. In addition, the three dimension ids platform, date and kpi and a created field indicating the modification date are required. A row is uniquely determined by the platform, date and kpi fields. The table is named stats_view_depth.

    Column                | Type    | Default        | Description
    platform_dimension_id | int(11) | Not null, 0    | Platform id, pkey
    date_dimension_id     | int(11) | Not null, 0    | Date id, pkey
    kpi_dimension_id      | int(11) | Not null, 0    | kpi id, pkey
    pv1                   | int(11) | Nullable, 0    | Visited exactly one page
    pv2                   | int(11) | Nullable, 0    | Visited two pages
    pv3                   | int(11) | Nullable, 0    | Visited three pages
    pv4                   | int(11) | Nullable, 0    | Visited four pages
    pv5_10                | int(11) | Nullable, 0    | Visited [5,10) pages
    pv10_30               | int(11) | Nullable, 0    | Visited [10,30) pages
    pv30_60               | int(11) | Nullable, 0    | Visited [30,60) pages
    pv60+                 | int(11) | Nullable, 0    | Visited [60,...) pages
    created               | date    | Nullable, null | Last modified date
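
The eight buckets map one-to-one onto the pv columns above; the bucketing rule can be sketched as a small helper (the method name is illustrative):

```java
public class ViewDepthBucket {
    // Maps a per-user (or per-session) pv count onto a stats_view_depth column
    static String bucketFor(int pv) {
        if (pv <= 0) return "none";
        if (pv == 1) return "pv1";
        if (pv == 2) return "pv2";
        if (pv == 3) return "pv3";
        if (pv == 4) return "pv4";
        if (pv < 10) return "pv5_10";   // [5, 10)
        if (pv < 30) return "pv10_30";  // [10, 30)
        if (pv < 60) return "pv30_60";  // [30, 60)
        return "pv60+";                 // [60, ...)
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(7));   // pv5_10
    }
}
```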
  • External link (inbound) analysis mainly includes inbound-link composition (preference) analysis and bounce rate analysis. Inbound composition is measured by the number of active users. We need the following indicators: number of active users, number of sessions and number of bounce sessions. In addition, the three dimension ids platform, date and inbound and a created field indicating the modification date are required. A row is uniquely determined by the platform, date and inbound fields. The table is named stats_inbound.

    Column                | Type    | Default        | Description
    platform_dimension_id | int(11) | Not null, 0    | Platform id, pkey
    date_dimension_id     | int(11) | Not null, 0    | Date id, pkey
    inbound_dimension_id  | int(11) | Not null, 0    | Inbound link id, pkey
    active_users          | int(11) | Nullable, 0    | Number of active users
    sessions              | int(11) | Nullable, 0    | Number of sessions
    bounce_sessions       | int(11) | Nullable, 0    | Number of bounce sessions
    created               | date    | Nullable, null | Last modified date
  • Order information analysis covers statistics of order quantity and order amount, so it needs the following indicators: order quantity, successfully paid order quantity, refunded order quantity, order amount, successfully paid amount, refund amount, total successfully paid amount and total refund amount. In addition, the four dimension fields platform, date, currency_type and payment uniquely identify a record, plus a created field representing the data date. The table is named stats_order.

    Column                     | Type    | Default        | Description
    platform_dimension_id      | int(11) | Not null, 0    | Platform id, pkey
    date_dimension_id          | int(11) | Not null, 0    | Date id, pkey
    currency_type_dimension_id | int(11) | Not null, 0    | Currency type id, pkey
    payment_type_dimension_id  | int(11) | Not null, 0    | Payment type id, pkey
    orders                     | int(11) | Nullable, 0    | Order quantity
    success_orders             | int(11) | Nullable, 0    | Number of successfully paid orders
    refund_orders              | int(11) | Nullable, 0    | Number of refunded orders
    order_amount               | int(11) | Nullable, 0    | Order amount
    revenue_amount             | int(11) | Nullable, 0    | Payment amount
    refund_amount              | int(11) | Nullable, 0    | Refund amount
    total_revenue_amount       | int(11) | Nullable, 0    | Total payment amount
    total_refund_amount        | int(11) | Nullable, 0    | Total refund amount
    created                    | date    | Nullable, null | Last modified date
  • In this project, event analysis mainly analyzes how many times events are triggered. The storage structure is therefore: a times indicator, the platform, date and event dimension fields, and a created field. The table is named stats_event.

    Column                | Type    | Default        | Description
    platform_dimension_id | int(11) | Not null, 0    | Platform id, pkey
    date_dimension_id     | int(11) | Not null, 0    | Date id, pkey
    event_dimension_id    | int(11) | Not null, 0    | Event dimension id, pkey
    times                 | int(11) | Nullable, 0    | Trigger times
    created               | date    | Nullable, null | Last modified date

We ensure consistent data display by joining the dimension information tables with the statistical result tables. The command for creating the database is: CREATE DATABASE report DEFAULT CHARACTER SET utf8;

  • In this project, platform-level modeling mainly analyzes: annual average daily pv, monthly average daily pv, weekly average daily pv, monthly average daily transaction amount, annual average order quantity, monthly average order quantity, total transaction amount in the last year, transaction amount in the last month, transaction amount in the last week, total transactions in the last year, transactions in the last month and transactions in the last week. The table is named inner_fct_sxt_deal.

    Column                | Type         | Default        | Description
    platform_dimension_id | varchar(110) | Not null, 0    | Platform id, pkey
    date_dimension_id     | varchar(110) | Not null, 0    | Date id, pkey
    year_avg_pv           | varchar(30)  | Nullable, ''   | Annual average daily pv
    month_avg_pv          | varchar(30)  | Nullable, ''   | Monthly average daily pv
    week_avg_pv           | varchar(30)  | Nullable, ''   | Weekly average daily pv
    month_day_bal         | varchar(30)  | Nullable, ''   | Monthly average daily transaction amount
    year_avg_bal          | varchar(30)  | Nullable, ''   | Annual average order quantity
    month_avg_bal         | varchar(30)  | Nullable, ''   | Monthly average order quantity
    year_sum_bal          | varchar(30)  | Nullable, ''   | Total transaction amount in the last year
    month_sum_bal         | varchar(30)  | Nullable, ''   | Transaction amount in the last month
    week_sum_bal          | varchar(30)  | Nullable, ''   | Transaction amount in the last week
    year_sum_count        | varchar(30)  | Nullable, ''   | Total transactions in the last year
    month_sum_count       | varchar(30)  | Nullable, ''   | Transactions in the last month
    week_sum_count        | varchar(30)  | Nullable, ''   | Transactions in the last week

MySQL table creation statements

#
# Structure for table "dimension_browser"
#

DROP TABLE IF EXISTS `dimension_browser`;
CREATE TABLE `dimension_browser` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `browser_name` varchar(45) NOT NULL DEFAULT '' COMMENT 'Browser name',
  `browser_version` varchar(255) NOT NULL DEFAULT '' COMMENT 'Browser version number',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Browser dimension information table';

#
# Structure for table "dimension_currency_type"
#

DROP TABLE IF EXISTS `dimension_currency_type`;
CREATE TABLE `dimension_currency_type` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `currency_name` varchar(10) DEFAULT NULL COMMENT 'Currency name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Payment currency type dimension information table';

#
# Structure for table "dimension_date"
#

DROP TABLE IF EXISTS `dimension_date`;
CREATE TABLE `dimension_date` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `year` int(11) DEFAULT NULL,
  `season` int(11) DEFAULT NULL,
  `month` int(11) DEFAULT NULL,
  `week` int(11) DEFAULT NULL,
  `day` int(11) DEFAULT NULL,
  `calendar` date DEFAULT NULL,
  `type` enum('year','season','month','week','day') DEFAULT NULL COMMENT 'Date format',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Time dimension information table';

#
# Structure for table "dimension_event"
#

DROP TABLE IF EXISTS `dimension_event`;
CREATE TABLE `dimension_event` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `category` varchar(255) DEFAULT NULL COMMENT 'Event type category',
  `action` varchar(255) DEFAULT NULL COMMENT 'event action name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Event dimension information table';

#
# Structure for table "dimension_inbound"
#

DROP TABLE IF EXISTS `dimension_inbound`;
CREATE TABLE `dimension_inbound` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL COMMENT 'Parent external link id',
  `name` varchar(45) DEFAULT NULL COMMENT 'External link name',
  `url` varchar(255) DEFAULT NULL COMMENT 'External link url',
  `type` int(11) DEFAULT NULL COMMENT 'External link type',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Dimension information table for external link (inbound) source data';

#
# Structure for table "dimension_kpi"
#

DROP TABLE IF EXISTS `dimension_kpi`;
CREATE TABLE `dimension_kpi` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `kpi_name` varchar(45) DEFAULT NULL COMMENT 'kpi Dimension name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='kpi Dimension related information table';

#
# Structure for table "dimension_location"
#

DROP TABLE IF EXISTS `dimension_location`;
CREATE TABLE `dimension_location` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `country` varchar(45) DEFAULT NULL COMMENT 'Country name',
  `province` varchar(45) DEFAULT NULL COMMENT 'Province name',
  `city` varchar(45) DEFAULT NULL COMMENT 'City name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Regional information dimension table';

#
# Structure for table "dimension_os"
#

DROP TABLE IF EXISTS `dimension_os`;
CREATE TABLE `dimension_os` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `os_name` varchar(45) NOT NULL DEFAULT '' COMMENT 'Operating system name',
  `os_version` varchar(45) NOT NULL DEFAULT '' COMMENT 'Operating system version number',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Operating system information dimension table';

#
# Structure for table "dimension_payment_type"
#

DROP TABLE IF EXISTS `dimension_payment_type`;
CREATE TABLE `dimension_payment_type` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `payment_type` varchar(255) DEFAULT NULL COMMENT 'Name of payment method',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Payment method dimension information table';

#
# Structure for table "dimension_platform"
#

DROP TABLE IF EXISTS `dimension_platform`;
CREATE TABLE `dimension_platform` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `platform_name` varchar(45) DEFAULT NULL COMMENT 'Platform name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Platform dimension information table';

#
# Structure for table "event_info"
#

DROP TABLE IF EXISTS `event_info`;
CREATE TABLE `event_info` (
  `event_dimension_id` int(11) NOT NULL DEFAULT '0',
  `key` varchar(255) DEFAULT NULL,
  `value` varchar(255) DEFAULT NULL,
  `times` int(11) DEFAULT '0' COMMENT 'Trigger times',
  PRIMARY KEY (`event_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Describes event attribute information; not used in this project';

#
# Structure for table "order_info"
#

DROP TABLE IF EXISTS `order_info`;
CREATE TABLE `order_info` (
  `order_id` varchar(50) NOT NULL DEFAULT '',
  `date_dimension_id` int(11) NOT NULL DEFAULT '0',
  `amount` int(11) NOT NULL DEFAULT '0' COMMENT 'Order amount',
  `is_pay` int(1) DEFAULT '0' COMMENT 'Indicates whether to pay, 0 indicates not paid, and 1 indicates paid',
  `is_refund` int(1) DEFAULT '0' COMMENT 'Indicates whether to refund, 0 indicates no refund, and 1 indicates refund',
  PRIMARY KEY (`order_id`,`date_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Order information; in this project it is mainly used to deduplicate order data';
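
Given the is_pay and is_refund flags above, several of the stats_order indicators can be derived in one pass over order records. A minimal in-memory sketch with made-up amounts:

```java
public class OrderStatsDemo {
    public static void main(String[] args) {
        // {amount, is_pay, is_refund} triples, mirroring the order_info columns
        int[][] orders = {{100, 1, 0}, {250, 0, 0}, {80, 1, 1}};

        int orderCount = 0, successOrders = 0, refundOrders = 0;
        int orderAmount = 0, revenueAmount = 0, refundAmount = 0;
        for (int[] o : orders) {
            orderCount++;
            orderAmount += o[0];
            if (o[1] == 1) { successOrders++; revenueAmount += o[0]; }
            if (o[2] == 1) { refundOrders++;  refundAmount  += o[0]; }
        }
        System.out.println(orderCount + " orders, revenue " + revenueAmount
                + ", refunds " + refundAmount);  // 3 orders, revenue 180, refunds 80
    }
}
```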

#
# Structure for table "stats_device_browser"
#

DROP TABLE IF EXISTS `stats_device_browser`;
CREATE TABLE `stats_device_browser` (
  `date_dimension_id` int(11) NOT NULL,
  `platform_dimension_id` int(11) NOT NULL,
  `browser_dimension_id` int(11) NOT NULL DEFAULT '0',
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `new_install_users` int(11) DEFAULT '0' COMMENT 'Number of new users',
  `total_install_users` int(11) DEFAULT '0' COMMENT 'Total users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `sessions_length` int(11) DEFAULT '0' COMMENT 'Session length',
  `total_members` int(11) unsigned DEFAULT '0' COMMENT 'Total members',
  `active_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of active members',
  `new_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of new members',
  `pv` int(11) DEFAULT '0' COMMENT 'pv number',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`browser_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistics of browser related analysis data';

#
# Structure for table "stats_device_location"
#

DROP TABLE IF EXISTS `stats_device_location`;
CREATE TABLE `stats_device_location` (
  `date_dimension_id` int(11) NOT NULL,
  `platform_dimension_id` int(11) NOT NULL,
  `location_dimension_id` int(11) NOT NULL DEFAULT '0',
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `bounce_sessions` int(11) DEFAULT '0' COMMENT 'Number of bounce sessions',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`location_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistical table of regional correlation analysis data';

#
# Structure for table "stats_event"
#

DROP TABLE IF EXISTS `stats_event`;
CREATE TABLE `stats_event` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `date_dimension_id` int(11) NOT NULL DEFAULT '0',
  `event_dimension_id` int(11) NOT NULL DEFAULT '0',
  `times` int(11) DEFAULT '0' COMMENT 'Trigger times',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`event_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Statistical table for statistical event related analysis data';

#
# Structure for table "stats_hourly"
#

DROP TABLE IF EXISTS `stats_hourly`;
CREATE TABLE `stats_hourly` (
  `platform_dimension_id` int(11) NOT NULL,
  `date_dimension_id` int(11) NOT NULL,
  `kpi_dimension_id` int(11) NOT NULL,
  `hour_00` int(11) DEFAULT '0',
  `hour_01` int(11) DEFAULT '0',
  `hour_02` int(11) DEFAULT '0',
  `hour_03` int(11) DEFAULT '0',
  `hour_04` int(11) DEFAULT '0',
  `hour_05` int(11) DEFAULT '0',
  `hour_06` int(11) DEFAULT '0',
  `hour_07` int(11) DEFAULT '0',
  `hour_08` int(11) DEFAULT '0',
  `hour_09` int(11) DEFAULT '0',
  `hour_10` int(11) DEFAULT '0',
  `hour_11` int(11) DEFAULT '0',
  `hour_12` int(11) DEFAULT '0',
  `hour_13` int(11) DEFAULT '0',
  `hour_14` int(11) DEFAULT '0',
  `hour_15` int(11) DEFAULT '0',
  `hour_16` int(11) DEFAULT '0',
  `hour_17` int(11) DEFAULT '0',
  `hour_18` int(11) DEFAULT '0',
  `hour_19` int(11) DEFAULT '0',
  `hour_20` int(11) DEFAULT '0',
  `hour_21` int(11) DEFAULT '0',
  `hour_22` int(11) DEFAULT '0',
  `hour_23` int(11) DEFAULT '0',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`kpi_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistics by hour';

#
# Structure for table "stats_inbound"
#

DROP TABLE IF EXISTS `stats_inbound`;
CREATE TABLE `stats_inbound` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `date_dimension_id` int(11) NOT NULL,
  `inbound_dimension_id` int(11) NOT NULL,
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `bounce_sessions` int(11) DEFAULT '0' COMMENT 'Number of bounce sessions',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`inbound_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistics of external link (inbound) information';

#
# Structure for table "stats_order"
#

DROP TABLE IF EXISTS `stats_order`;
CREATE TABLE `stats_order` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `date_dimension_id` int(11) NOT NULL DEFAULT '0',
  `currency_type_dimension_id` int(11) NOT NULL DEFAULT '0',
  `payment_type_dimension_id` int(11) NOT NULL DEFAULT '0',
  `orders` int(11) DEFAULT '0' COMMENT 'Number of orders',
  `success_orders` int(11) DEFAULT '0' COMMENT 'Number of orders successfully paid',
  `refund_orders` int(11) DEFAULT '0' COMMENT 'Number of refund orders',
  `order_amount` int(11) DEFAULT '0' COMMENT 'Order amount',
  `revenue_amount` int(11) DEFAULT '0' COMMENT 'The amount of income, that is, the amount successfully paid',
  `refund_amount` int(11) DEFAULT '0' COMMENT 'refund amount ',
  `total_revenue_amount` int(11) DEFAULT '0' COMMENT 'Total order transactions to date',
  `total_refund_amount` int(11) DEFAULT '0' COMMENT 'Total refund amount to date',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`currency_type_dimension_id`,`payment_type_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Statistics table of order information';

#
# Structure for table "stats_user"
#

DROP TABLE IF EXISTS `stats_user`;
CREATE TABLE `stats_user` (
  `date_dimension_id` int(11) NOT NULL,
  `platform_dimension_id` int(11) NOT NULL,
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `new_install_users` int(11) DEFAULT '0' COMMENT 'Number of new users',
  `total_install_users` int(11) DEFAULT '0' COMMENT 'Total users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `sessions_length` int(11) DEFAULT '0' COMMENT 'Session length',
  `total_members` int(11) unsigned DEFAULT '0' COMMENT 'Total members',
  `active_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of active members',
  `new_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of new members',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistical table for basic user information';

#
# Structure for table "stats_view_depth"
#

DROP TABLE IF EXISTS `stats_view_depth`;
CREATE TABLE `stats_view_depth` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `data_dimension_id` int(11) NOT NULL DEFAULT '0',
  `kpi_dimension_id` int(11) NOT NULL DEFAULT '0',
  `pv1` int(11) DEFAULT '0',
  `pv2` int(11) DEFAULT '0',
  `pv3` int(11) DEFAULT '0',
  `pv4` int(11) DEFAULT '0',
  `pv5_10` int(11) DEFAULT '0',
  `pv10_30` int(11) DEFAULT '0',
  `pv30_60` int(11) DEFAULT '0',
  `pv60+` int(11) DEFAULT '0',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`data_dimension_id`,`kpi_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Statistical table of user browsing depth related analysis data';
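
The `pv*` columns above bucket users by how many pages they viewed in a session: exact columns for 1 to 4 page views, then range columns. The column names only imply the range boundaries, so the half-open intervals below are an assumption; a minimal sketch of the bucketing logic:

```java
public class PvBucket {
    // Map a page-view count to the matching stats_view_depth column name.
    // Boundary choice (half-open [lo, hi) intervals) is an assumption;
    // the DDL column names do not pin down whether bounds are inclusive.
    public static String bucket(int pv) {
        if (pv <= 0) throw new IllegalArgumentException("pv must be positive");
        if (pv <= 4) return "pv" + pv;   // exact columns pv1..pv4
        if (pv < 10) return "pv5_10";
        if (pv < 30) return "pv10_30";
        if (pv < 60) return "pv30_60";
        return "pv60+";
    }

    public static void main(String[] args) {
        System.out.println(bucket(3));   // pv3
        System.out.println(bucket(7));   // pv5_10
        System.out.println(bucket(100)); // pv60+
    }
}
```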

Transformer code function

  • Process the data for each dimension
  • Combine related dimensions
  • Use an MR program to store the combined dimensions and metric data into MySQL
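
Because the stats tables above use composite primary keys over the dimension ids, the write step can be made idempotent with a MySQL upsert, so re-running a day's job overwrites that day's row instead of duplicating it. A sketch that only builds the statement text (column names follow the `stats_user` DDL above; whether the project actually uses `ON DUPLICATE KEY UPDATE` is an assumption, and in practice the statement would be executed through a JDBC `PreparedStatement`):

```java
public class UpsertSql {
    // Build an idempotent upsert against the stats_user table defined above.
    // The primary key (platform_dimension_id, date_dimension_id) makes the
    // ON DUPLICATE KEY UPDATE clause fire on a re-run for the same day.
    public static String statsUserUpsert() {
        return "INSERT INTO stats_user "
             + "(platform_dimension_id, date_dimension_id, active_users, created) "
             + "VALUES (?, ?, ?, ?) "
             + "ON DUPLICATE KEY UPDATE active_users = VALUES(active_users)";
    }

    public static void main(String[] args) {
        System.out.println(statsUserUpsert());
    }
}
```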

Transformer coding

  • Create basic dimension class

    • BaseDimension coding

      Create a new BaseDimension class file: the abstract base class (implementing WritableComparable) that all dimension classes extend

      No customization required

      public abstract class BaseDimension implements WritableComparable<BaseDimension>{
          // nothing
      }
      
    • BrowserDimension coding

      Create a new BrowserDimension class file: get the id, name and version in the browser

      No customization required

      public class BrowserDimension extends BaseDimension {
          private int id; // id
          private String browserName; // name
          private String browserVersion; // version
      
          public BrowserDimension() {
              super();
          }
      
          public BrowserDimension(String browserName, String browserVersion) {
              super();
              this.browserName = browserName;
              this.browserVersion = browserVersion;
          }
      
          public void clean() {
              this.id = 0;
              this.browserName = "";
              this.browserVersion = "";
          }
      
          public static BrowserDimension newInstance(String browserName, String browserVersion) {
              BrowserDimension browserDimension = new BrowserDimension();
              browserDimension.browserName = browserName;
              browserDimension.browserVersion = browserVersion;
              return browserDimension;
          }
      
          /**
           * Build multiple browser dimension information object collections
           * 
           * @param browserName      chrome
           * @param browserVersion    48
           * @return
           */
          public static List<BrowserDimension> buildList(String browserName, String browserVersion) {
              List<BrowserDimension> list = new ArrayList<BrowserDimension>();
              if (StringUtils.isBlank(browserName)) {
                  // If the browser name is empty, it is set to unknown
                  browserName = GlobalConstants.DEFAULT_VALUE;
                  browserVersion = GlobalConstants.DEFAULT_VALUE;
              }
              if (StringUtils.isEmpty(browserVersion)) {
                  browserVersion = GlobalConstants.DEFAULT_VALUE;
              }
              // list.add(BrowserDimension.newInstance(GlobalConstants.VALUE_OF_ALL,
              // GlobalConstants.VALUE_OF_ALL));
              list.add(BrowserDimension.newInstance(browserName, GlobalConstants.VALUE_OF_ALL));
              list.add(BrowserDimension.newInstance(browserName, browserVersion));
              return list;
          }
      
          public int getId() {
              return id;
          }
      
          public void setId(int id) {
              this.id = id;
          }
      
          public String getBrowserName() {
              return browserName;
          }
      
          public void setBrowserName(String browserName) {
              this.browserName = browserName;
          }
      
          public String getBrowserVersion() {
              return browserVersion;
          }
      
          public void setBrowserVersion(String browserVersion) {
              this.browserVersion = browserVersion;
          }
      
          @Override
          public void write(DataOutput out) throws IOException {
              out.writeInt(this.id);
              out.writeUTF(this.browserName);
              out.writeUTF(this.browserVersion);
          }
      
          @Override
          public void readFields(DataInput in) throws IOException {
              this.id = in.readInt();
              this.browserName = in.readUTF();
              this.browserVersion = in.readUTF();
          }
      
          @Override
          public int compareTo(BaseDimension o) {
              if (this == o) {
                  return 0;
              }
      
              BrowserDimension other = (BrowserDimension) o;
              int tmp = Integer.compare(this.id, other.id);
              if (tmp != 0) {
                  return tmp;
              }
              tmp = this.browserName.compareTo(other.browserName);
              if (tmp != 0) {
                  return tmp;
              }
              tmp = this.browserVersion.compareTo(other.browserVersion);
              return tmp;
          }
      
          @Override
          public int hashCode() {
              final int prime = 31;
              int result = 1;
              result = prime * result + ((browserName == null) ? 0 : browserName.hashCode());
              result = prime * result + ((browserVersion == null) ? 0 : browserVersion.hashCode());
              result = prime * result + id;
              return result;
          }
      
          @Override
          public boolean equals(Object obj) {
              if (this == obj)
                  return true;
              if (obj == null)
                  return false;
              if (getClass() != obj.getClass())
                  return false;
              BrowserDimension other = (BrowserDimension) obj;
              if (browserName == null) {
                  if (other.browserName != null)
                      return false;
              } else if (!browserName.equals(other.browserName))
                  return false;
              if (browserVersion == null) {
                  if (other.browserVersion != null)
                      return false;
              } else if (!browserVersion.equals(other.browserVersion))
                  return false;
              if (id != other.id)
                  return false;
              return true;
          }
      }
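
`buildList` above emits one rollup entry `(name, all)` plus the exact entry `(name, version)`, so a single log line feeds both the per-browser total and the per-version breakdown downstream. A self-contained sketch of the same rollup pattern (the literal values `"all"` and `"unknown"` stand in for `GlobalConstants.VALUE_OF_ALL` and `GlobalConstants.DEFAULT_VALUE`, whose actual values are assumptions here):

```java
import java.util.ArrayList;
import java.util.List;

public class BrowserRollup {
    static final String ALL = "all";         // stands in for GlobalConstants.VALUE_OF_ALL
    static final String UNKNOWN = "unknown"; // stands in for GlobalConstants.DEFAULT_VALUE

    // Mirror of BrowserDimension.buildList: one rollup row plus one exact row.
    public static List<String[]> buildList(String name, String version) {
        if (name == null || name.trim().isEmpty()) {
            // blank browser name: mark both fields unknown
            name = UNKNOWN;
            version = UNKNOWN;
        }
        if (version == null || version.isEmpty()) {
            version = UNKNOWN;
        }
        List<String[]> list = new ArrayList<>();
        list.add(new String[] { name, ALL });     // per-browser rollup entry
        list.add(new String[] { name, version }); // exact browser+version entry
        return list;
    }

    public static void main(String[] args) {
        for (String[] d : buildList("chrome", "48")) {
            System.out.println(d[0] + "/" + d[1]); // chrome/all, chrome/48
        }
    }
}
```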
      
    • DateDimension code writing

      Create a new DateDimension class file: get the ID, year, season, month, week, day and type of the date

      No customization required

      public class DateDimension extends BaseDimension {
      	
          private int id; // id,eg: 1
          private int year; // Year: eg: 2015
          private int season; // Quarter, eg:4
          private int month; // Month, eg:12
          private int week; // week
          private int day;
          private String type; // type
          private Date calendar = new Date();
      
          /**
           * Get the corresponding time dimension object according to the type
           * 
           * @param time
           *            time stamp
           * @param type
           *            type
           * @return
           */
          public static DateDimension buildDate(long time, DateEnum type) {
              int year = TimeUtil.getDateInfo(time, DateEnum.YEAR);
              Calendar calendar = Calendar.getInstance();
              calendar.clear();
              if (DateEnum.YEAR.equals(type)) {
                  calendar.set(year, 0, 1);
                  return new DateDimension(year, 0, 0, 0, 0, type.name, calendar.getTime());
              }
              int season = TimeUtil.getDateInfo(time, DateEnum.SEASON);
              if (DateEnum.SEASON.equals(type)) {
                  int month = (3 * season - 2);
                  calendar.set(year, month - 1, 1);
                  return new DateDimension(year, season, 0, 0, 0, type.name, calendar.getTime());
              }
              int month = TimeUtil.getDateInfo(time, DateEnum.MONTH);
              if (DateEnum.MONTH.equals(type)) {
                  calendar.set(year, month - 1, 1);
                  return new DateDimension(year, season, month, 0, 0, type.name, calendar.getTime());
              }
              int week = TimeUtil.getDateInfo(time, DateEnum.WEEK);
              if (DateEnum.WEEK.equals(type)) {
                  long firstDayOfWeek = TimeUtil.getFirstDayOfThisWeek(time); // Gets the timestamp of the first day of the week to which the specified timestamp belongs
                  year = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.YEAR);
                  season = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.SEASON);
                  month = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.MONTH);
                  week = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.WEEK);
                  if (month == 12 && week == 1) {
                      week = 53;
                  }
                  return new DateDimension(year, season, month, week, 0, type.name, new Date(firstDayOfWeek));
              }
              int day = TimeUtil.getDateInfo(time, DateEnum.DAY);
              if (DateEnum.DAY.equals(type)) {
                  calendar.set(year, month - 1, day);
      
                  return new DateDimension(year, season, month, week, day, type.name, calendar.getTime());
              }
              throw new RuntimeException("Unsupported DateEnum type for building a DateDimension: " + type);
          }
      
          public DateDimension() {
              super();
          }
      
          public DateDimension(int year, int season, int month, int week, int day, String type) {
              super();
              this.year = year;
              this.season = season;
              this.month = month;
              this.week = week;
              this.day = day;
              this.type = type;
          }
      
          public DateDimension(int year, int season, int month, int week, int day, String type, Date calendar) {
              this(year, season, month, week, day, type);
              this.calendar = calendar;
          }
      
          public DateDimension(int id, int year, int season, int month, int week, int day, String type, Date calendar) {
              this(year, season, month, week, day, type, calendar);
              this.id = id;
          }
      
          public int getId() {
              return id;
          }
      
          public void setId(int id) {
              this.id = id;
          }
      
          public int getYear() {
              return year;
          }
      
          public void setYear(int year) {
              this.year = year;
          }
      
          public int getSeason() {
              return season;
          }
      
          public void setSeason(int season) {
              this.season = season;
          }
      
          public int getMonth() {
              return month;
          }
      
          public void setMonth(int month) {
              this.month = month;
          }
      
          public int getWeek() {
              return week;
          }
      
          public void setWeek(int week) {
              this.week = week;
          }
      
          public int getDay() {
              return day;
          }
      
          public void setDay(int day) {
              this.day = day;
          }
      
          public String getType() {
              return type;
          }
      
          public void setType(String type) {
              this.type = type;
          }
      
          public Date getCalendar() {
              return calendar;
          }
      
          public void setCalendar(Date calendar) {
              this.calendar = calendar;
          }
      
          @Override
          public void write(DataOutput out) throws IOException {
              out.writeInt(this.id);
              out.writeInt(this.year);
              out.writeInt(this.season);
              out.writeInt(this.month);
              out.writeInt(this.week);
              out.writeInt(this.day);
              out.writeUTF(this.type);
              out.writeLong(this.calendar.getTime());
          }
      
          @Override
          public void readFields(DataInput in) throws IOException {
              this.id = in.readInt();
              this.year = in.readInt();
              this.season = in.readInt();
              this.month = in.readInt();
              this.week = in.readInt();
              this.day = in.readInt();
              this.type = in.readUTF();
              this.calendar.setTime(in.readLong());
          }
      
          @Override
          public int compareTo(BaseDimension o) {
              if (this == o) {
                  return 0;
              }
      
              DateDimension other = (DateDimension) o;
              int tmp = Integer.compare(this.id, other.id);
              if (tmp != 0) {
                  return tmp;
              }
      
              tmp = Integer.compare(this.year, other.year);
              if (tmp != 0) {
                  return tmp;
              }
      
              tmp = Integer.compare(this.season, other.season);
              if (tmp != 0) {
                  return tmp;
              }
      
              tmp = Integer.compare(this.month, other.month);
              if (tmp != 0) {
                  return tmp;
              }
      
              tmp = Integer.compare(this.week, other.week);
              if (tmp != 0) {
                  return tmp;
              }
      
              tmp = Integer.compare(this.day, other.day);
              if (tmp != 0) {
                  return tmp;
              }
      
              tmp = this.type.compareTo(other.type);
              return tmp;
          }
      
          @Override
          public int hashCode() {
              final int prime = 31;
              int result = 1;
              result = prime * result + day;
              result = prime * result + id;
              result = prime * result + month;
              result = prime * result + season;
              result = prime * result + ((type == null) ? 0 : type.hashCode());
              result = prime * result + week;
              result = prime * result + year;
              return result;
          }
      
          @Override
          public boolean equals(Object obj) {
              if (this == obj)
                  return true;
              if (obj == null)
                  return false;
              if (getClass() != obj.getClass())
                  return false;
              DateDimension other = (DateDimension) obj;
              if (day != other.day)
                  return false;
              if (id != other.id)
                  return false;
              if (month != other.month)
                  return false;
              if (season != other.season)
                  return false;
              if (type == null) {
                  if (other.type != null)
                      return false;
              } else if (!type.equals(other.type))
                  return false;
              if (week != other.week)
                  return false;
              if (year != other.year)
                  return false;
              return true;
          }
      }
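
The quarter arithmetic in `buildDate` is worth spelling out: `3 * season - 2` (the expression in the SEASON branch above) recovers the first month of a quarter, and the inverse month-to-quarter mapping is `(month + 2) / 3` with integer division. The latter formula is an assumption about what `TimeUtil.getDateInfo(time, DateEnum.SEASON)` computes; a quick check:

```java
public class QuarterMath {
    // month (1..12) -> quarter (1..4); assumed behavior of TimeUtil's SEASON lookup
    public static int season(int month) {
        return (month + 2) / 3; // integer division
    }

    // quarter (1..4) -> its first month (1, 4, 7, 10), as used in buildDate
    public static int firstMonthOfSeason(int season) {
        return 3 * season - 2;
    }

    public static void main(String[] args) {
        System.out.println(season(12));            // 4
        System.out.println(firstMonthOfSeason(4)); // 10
    }
}
```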
      
    • KpiDimension coding

      Create a new KpiDimension class file: get the KPI id and KPI name

      No customization required

    public class KpiDimension extends BaseDimension {

      private int id;
      private String kpiName;
    
      public KpiDimension() {
          super();
      }
    
      public KpiDimension(String kpiName) {
          super();
          this.kpiName = kpiName;
      }
    
      public KpiDimension(int id, String kpiName) {
          super();
          this.id = id;
          this.kpiName = kpiName;
      }
    
      public int getId() {
          return id;
      }
    
      public void setId(int id) {
          this.id = id;
      }
    
      public String getKpiName() {
          return kpiName;
      }
    
      public void setKpiName(String kpiName) {
          this.kpiName = kpiName;
      }
    
      @Override
      public void write(DataOutput out) throws IOException {
          out.writeInt(this.id);
          out.writeUTF(this.kpiName);
      }
    
      @Override
      public void readFields(DataInput in) throws IOException {
          this.id = in.readInt();
          this.kpiName = in.readUTF();
      }
    
      @Override
      public int compareTo(BaseDimension o) {
          if (this == o) {
              return 0;
          }
    
          KpiDimension other = (KpiDimension) o;
          int tmp = Integer.compare(this.id, other.id);
          if (tmp != 0) {
              return tmp;
          }
          tmp = this.kpiName.compareTo(other.kpiName);
          return tmp;
      }
    
      @Override
      public int hashCode() {
          final int prime = 31;
          int result = 1;
          result = prime * result + id;
          result = prime * result + ((kpiName == null) ? 0 : kpiName.hashCode());
          return result;
      }
    
      @Override
      public boolean equals(Object obj) {
          if (this == obj)
              return true;
          if (obj == null)
              return false;
          if (getClass() != obj.getClass())
              return false;
          KpiDimension other = (KpiDimension) obj;
          if (id != other.id)
              return false;
          if (kpiName == null) {
              if (other.kpiName != null)
                  return false;
          } else if (!kpiName.equals(other.kpiName))
              return false;
          return true;
      }
    }

    
    
    • PlatformDimension coding

      Create a new PlatformDimension class file: get the platform id and platform name

      No customization required

    public class PlatformDimension extends BaseDimension {
    
    	private int id;
    	private String platformName;
    
    	public PlatformDimension() {
    		super();
    	}
    
    	public PlatformDimension(String platformName) {
    		super();
    		this.platformName = platformName;
    	}
    
    	public PlatformDimension(int id, String platformName) {
    		super();
    		this.id = id;
    		this.platformName = platformName;
    	}
    
    	public static List<PlatformDimension> buildList(String platformName) {
    		if (StringUtils.isBlank(platformName)) {
    			platformName = GlobalConstants.DEFAULT_VALUE;
    		}
    		List<PlatformDimension> list = new ArrayList<PlatformDimension>();
    		list.add(new PlatformDimension(GlobalConstants.VALUE_OF_ALL));
    		list.add(new PlatformDimension(platformName));
    		return list;
    	}
    
    	public int getId() {
    		return id;
    	}
    
    	public void setId(int id) {
    		this.id = id;
    	}
    
    	public String getPlatformName() {
    		return platformName;
    	}
    
    	public void setPlatformName(String platformName) {
    		this.platformName = platformName;
    	}
    
    	@Override
    	public void write(DataOutput out) throws IOException {
    		out.writeInt(this.id);
    		out.writeUTF(this.platformName);
    	}
    
    	@Override
    	public void readFields(DataInput in) throws IOException {
    		this.id = in.readInt();
    		this.platformName = in.readUTF();
    	}
    
    	@Override
    	public int compareTo(BaseDimension o) {
    		if (this == o) {
    			return 0;
    		}
    
    		PlatformDimension other = (PlatformDimension) o;
    		int tmp = Integer.compare(this.id, other.id);
    		if (tmp != 0) {
    			return tmp;
    		}
    		tmp = this.platformName.compareTo(other.platformName);
    		return tmp;
    	}
    
    	@Override
    	public int hashCode() {
    		final int prime = 31;
    		int result = 1;
    		result = prime * result + id;
    		result = prime * result
    				+ ((platformName == null) ? 0 : platformName.hashCode());
    		return result;
    	}
    
    	@Override
    	public boolean equals(Object obj) {
    		if (this == obj)
    			return true;
    		if (obj == null)
    			return false;
    		if (getClass() != obj.getClass())
    			return false;
    		PlatformDimension other = (PlatformDimension) obj;
    		if (id != other.id)
    			return false;
    		if (platformName == null) {
    			if (other.platformName != null)
    				return false;
    		} else if (!platformName.equals(other.platformName))
    			return false;
    		return true;
    	}
    
    }
    
  • Create composite dimension class

    • StatsDimension coding

      Create a new StatsDimension class file: the abstract base class for the composite dimension classes below

      No customization required

      public abstract class StatsDimension extends BaseDimension {
          // nothing
      }
      
    • StatsCommonDimension coding

      Create a new StatsCommonDimension class file: the most commonly used base dimension combination, including date, platform, and KPI

    No customization required

    public class StatsCommonDimension extends StatsDimension {
        private DateDimension date = new DateDimension();
        private PlatformDimension platform = new PlatformDimension();
        private KpiDimension kpi = new KpiDimension();
    
        /**
         * Clone an instance object (deep copy)
         * 
         * @param dimension
         * @return
         */
        public static StatsCommonDimension clone(StatsCommonDimension dimension) {
            DateDimension date = new DateDimension(dimension.date.getId(), dimension.date.getYear(), dimension.date.getSeason(), dimension.date.getMonth(), dimension.date.getWeek(), dimension.date.getDay(), dimension.date.getType(), dimension.date.getCalendar());
            PlatformDimension platform = new PlatformDimension(dimension.platform.getId(), dimension.platform.getPlatformName());
            KpiDimension kpi = new KpiDimension(dimension.kpi.getId(), dimension.kpi.getKpiName());
            return new StatsCommonDimension(date, platform, kpi);
        }
    
        public StatsCommonDimension() {
            super();
        }
    
        public StatsCommonDimension(DateDimension date, PlatformDimension platform, KpiDimension kpi) {
            super();
            this.date = date;
            this.platform = platform;
            this.kpi = kpi;
        }
    
        public DateDimension getDate() {
            return date;
        }
    
        public void setDate(DateDimension date) {
            this.date = date;
        }
    
        public PlatformDimension getPlatform() {
            return platform;
        }
    
        public void setPlatform(PlatformDimension platform) {
            this.platform = platform;
        }
    
        public KpiDimension getKpi() {
            return kpi;
        }
    
        public void setKpi(KpiDimension kpi) {
            this.kpi = kpi;
        }
    
        @Override
        public void write(DataOutput out) throws IOException {
            this.date.write(out);
            this.platform.write(out);
            this.kpi.write(out);
        }
    
        @Override
        public void readFields(DataInput in) throws IOException {
            this.date.readFields(in);
            this.platform.readFields(in);
            this.kpi.readFields(in);
        }
    
        @Override
        public int compareTo(BaseDimension o) {
            if (this == o) {
                return 0;
            }
    
            StatsCommonDimension other = (StatsCommonDimension) o;
            int tmp = this.date.compareTo(other.date);
            if (tmp != 0) {
                return tmp;
            }
            tmp = this.platform.compareTo(other.platform);
            if (tmp != 0) {
                return tmp;
            }
            tmp = this.kpi.compareTo(other.kpi);
            return tmp;
        }
    
        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((date == null) ? 0 : date.hashCode());
            result = prime * result + ((kpi == null) ? 0 : kpi.hashCode());
            result = prime * result + ((platform == null) ? 0 : platform.hashCode());
            return result;
        }
    
        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            StatsCommonDimension other = (StatsCommonDimension) obj;
            if (date == null) {
                if (other.date != null)
                    return false;
            } else if (!date.equals(other.date))
                return false;
            if (kpi == null) {
                if (other.kpi != null)
                    return false;
            } else if (!kpi.equals(other.kpi))
                return false;
            if (platform == null) {
                if (other.platform != null)
                    return false;
            } else if (!platform.equals(other.platform))
                return false;
            return true;
        }
    
    }
    
    • StatsUserDimension coding

      Create a new StatsUserDimension class file: the user dimension combination, consisting of the common dimensions (date, platform, KPI) plus the browser dimension

      No customization required

      public class StatsUserDimension extends StatsDimension {
      	
          private StatsCommonDimension statsCommon = new StatsCommonDimension();
          private BrowserDimension browser = new BrowserDimension();
      
          /**
           * Clone an instance object (deep copy)
           * 
           * @param dimension
           * @return
           */
          public static StatsUserDimension clone(StatsUserDimension dimension) {
              BrowserDimension browser = new BrowserDimension(dimension.browser.getBrowserName(), dimension.browser.getBrowserVersion());
              StatsCommonDimension statsCommon = StatsCommonDimension.clone(dimension.statsCommon);
              return new StatsUserDimension(statsCommon, browser);
          }
      
          public StatsUserDimension() {
              super();
          }
      
          public StatsUserDimension(StatsCommonDimension statsCommon, BrowserDimension browser) {
              super();
              this.statsCommon = statsCommon;
              this.browser = browser;
          }
      
          public StatsCommonDimension getStatsCommon() {
              return statsCommon;
          }
      
          public void setStatsCommon(StatsCommonDimension statsCommon) {
              this.statsCommon = statsCommon;
          }
      
          public BrowserDimension getBrowser() {
              return browser;
          }
      
          public void setBrowser(BrowserDimension browser) {
              this.browser = browser;
          }
      
          @Override
          public void write(DataOutput out) throws IOException {
              this.statsCommon.write(out);
              this.browser.write(out);
          }
      
          @Override
          public void readFields(DataInput in) throws IOException {
              this.statsCommon.readFields(in);
              this.browser.readFields(in);
          }
      
          @Override
          public int compareTo(BaseDimension o) {
              if (this == o) {
                  return 0;
              }
      
              StatsUserDimension other = (StatsUserDimension) o;
              int tmp = this.statsCommon.compareTo(other.statsCommon);
              if (tmp != 0) {
                  return tmp;
              }
              tmp = this.browser.compareTo(other.browser);
              return tmp;
          }
      
      }
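
All of these dimension classes serialize through the same `write`/`readFields` pair, and Hadoop shuttles them between map and reduce as raw bytes, so `readFields` must consume fields in exactly the order `write` emitted them. A dependency-free round-trip sketch of that pattern (plain `DataOutputStream`/`DataInputStream` instead of Hadoop's `Writable`, so it runs standalone):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTrip {
    // Write an (id, name) pair the way KpiDimension.write does, then read it back.
    public static String roundTrip(int id, String name) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutput out = new DataOutputStream(buf);
        out.writeInt(id);   // same field order as write(DataOutput)
        out.writeUTF(name);

        // readFields must mirror write: int first, then the UTF string
        DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        return in.readInt() + ":" + in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(7, "active_user")); // 7:active_user
    }
}
```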
      
  • Create a class to get the data of the indicator

    • KpiType coding

      Create a new KpiType enum file: enumerates the names of the user KPI statistics

      No customization required

      public enum KpiType {
      
      	NEW_INSTALL_USER("new_install_user"), // kpi statistics for new users
      	BROWSER_NEW_INSTALL_USER("browser_new_install_user"), // Count new user KPIs for browser dimensions
      	ACTIVE_USER("active_user"), // Statistics of active user KPIs
      	BROWSER_ACTIVE_USER("browser_active_user"), // Counts the active user KPIs for the browser dimension
      	;
      
      	public final String name;
      
      	private KpiType(String name) {
      		this.name = name;
      	}
      
      	/**
      	 * Obtain the corresponding kpitype enumeration object according to the name string value of kpitype
      	 * 
      	 * @param name
      	 * @return
      	 */
      	
      	public static KpiType valueOfName(String name) {
      		for (KpiType type : values()) {
      			if (type.name.equals(name)) {
      				return type;
      			}
      		}
      		throw new RuntimeException("The given name does not belong to the KpiType enum: " + name);
      	}
      }
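
`valueOfName` does a linear scan over `values()`, which is fine for a four-entry enum; the reverse lookup lets a reducer dispatch on the KPI name string it receives. A self-contained copy of the lookup logic (trimmed to two constants for brevity):

```java
public class KpiLookup {
    // Compact copy of the KpiType pattern: enum constants carrying a name string.
    enum KpiType {
        NEW_INSTALL_USER("new_install_user"),
        ACTIVE_USER("active_user");

        final String name;

        KpiType(String name) {
            this.name = name;
        }

        // Linear scan, exactly as in valueOfName above
        static KpiType valueOfName(String name) {
            for (KpiType type : values()) {
                if (type.name.equals(name)) {
                    return type;
                }
            }
            throw new RuntimeException("No KpiType with name: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(KpiType.valueOfName("active_user")); // ACTIVE_USER
    }
}
```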
      
    • BaseStatsValueWritable code writing

      Create a new BaseStatsValueWritable class file: the abstract parent class for statistics value objects, extended by the classes below

      No customization required

      public abstract class BaseStatsValueWritable implements Writable {
          /**
           * Get the kpi value corresponding to the current value
           * 
           * @return
           */
          public abstract KpiType getKpi();
      }
      
    • MapWritableValue coding

      Create a new MapWritableValue class file: wraps a MapWritable row (a record destined for a database table) together with its KpiType

      No customization required

      public class MapWritableValue extends BaseStatsValueWritable {
      
          private MapWritable value = new MapWritable(); // a row record to be inserted into the database table
          private KpiType kpi;
      
          public MapWritableValue() {
              super();
          }
      
          public MapWritableValue(MapWritable value, KpiType kpi) {
              super();
              this.value = value;
              this.kpi = kpi;
          }
      
          public MapWritable getValue() {
              return value;
          }
      
          public void setValue(MapWritable value) {
              this.value = value;
          }
      
          public void setKpi(KpiType kpi) {
              this.kpi = kpi;
          }
      
          @Override
          public void write(DataOutput out) throws IOException {
              this.value.write(out);
              WritableUtils.writeEnum(out, this.kpi);
          }
      
          @Override
          public void readFields(DataInput in) throws IOException {
              this.value.readFields(in);
              this.kpi = WritableUtils.readEnum(in, KpiType.class);
          }
      
          @Override
          public KpiType getKpi() {
              return this.kpi;
          }
      
      }
      
    • TimeOutputValue coding

      Create a new TimeOutputValue class file: carries an id and a timestamp

      No customization required

      public class TimeOutputValue extends BaseStatsValueWritable {
          private String id; // id
          private long time; // time stamp
      
          public String getId() {
              return id;
          }
      
          public void setId(String id) {
              this.id = id;
          }
      
          public long getTime() {
              return time;
          }
      
          public void setTime(long time) {
              this.time = time;
          }
      
          @Override
          public void write(DataOutput out) throws IOException {
              out.writeUTF(this.id);
              out.writeLong(this.time);
          }
      
          @Override
          public void readFields(DataInput in) throws IOException {
              this.id = in.readUTF();
              this.time = in.readLong();
          }
      
          @Override
          public KpiType getKpi() {
              // this value type carries no KPI information
              return null;
          }
      
      }
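
      The write/readFields pair above must serialize and deserialize the fields in the same order. The sketch below illustrates that round trip with plain Java streams standing in for Hadoop's Writable plumbing; the sample id and timestamp are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the write/readFields round trip in TimeOutputValue: the id is
// written with writeUTF and the timestamp with writeLong, then read back in
// the same order. Hadoop's Writable plumbing is replaced by plain Java streams.
public class TimeOutputValueDemo {

    // serialize then deserialize, returning "id time"
    static String roundTrip(String id, long time) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(id);    // mirrors write(): out.writeUTF(this.id)
            out.writeLong(time); // mirrors write(): out.writeLong(this.time)

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return in.readUTF() + " " + in.readLong(); // readFields(): same order as write()
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("uid-42", 1640131909000L)); // uid-42 1640131909000
    }
}
```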
      
  • Create a class to output MR to Mysql

    • IDimensionConverter code writing

      Create a new IDimensionConverter interface file: an interface for dimension-table operations (querying, and inserting into, the dimension tables in the relational database)

      No customization required

      public interface IDimensionConverter {
          /**
           * Get the id according to the value of the dimension.<br/>
           * If it exists in the database, return it directly; if not, insert the dimension and return the newly generated id
           * 
           * @param dimension
           * @return
           * @throws IOException
           */
          public int getDimensionIdByValue(BaseDimension dimension) throws IOException;
      }
      
      
    • IOutputCollector coding

      Create a new IOutputCollector interface file: defines how a statistics record is bound to its SQL statement and collected for output

      No customization required

      public interface IOutputCollector {
      
          /**
           * Specific methods of statistical data insertion
           * 
           * @param conf
           * @param key
           * @param value
           * @param pstmt
           * @param converter
           * @throws SQLException
           * @throws IOException
           */
          public void collect(Configuration conf, BaseDimension key, BaseStatsValueWritable value, PreparedStatement pstmt, IDimensionConverter converter) throws SQLException, IOException;
      }
      
  • Create classes for active user analysis

    • JdbcManager coding

      Create a new JdbcManager class file: a JDBC management utility that builds a database connection from the Hadoop configuration

      No customization required

      public class JdbcManager {
          /**
           * Obtain the jdbc connection of the relational database according to the configuration
           * 
           * @param conf
           *            hadoop configuration information
           * @param flag
           *            Flag bits for distinguishing different data sources
           * @return
           * @throws SQLException
           */
          public static Connection getConnection(Configuration conf, String flag) throws SQLException {
              String driverStr = String.format(GlobalConstants.JDBC_DRIVER, flag);
              String urlStr = String.format(GlobalConstants.JDBC_URL, flag);
              String usernameStr = String.format(GlobalConstants.JDBC_USERNAME, flag);
              String passwordStr = String.format(GlobalConstants.JDBC_PASSWORD, flag);
      
              String driverClass = conf.get(driverStr);
              String url = conf.get(urlStr);
              String username = conf.get(usernameStr);
              String password = conf.get(passwordStr);
              try {
                  Class.forName(driverClass);
              } catch (ClassNotFoundException e) {
                  // nothing
              }
              return DriverManager.getConnection(url, username, password);
          }
      }
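
      The `flag` argument above selects one data source's settings out of the configuration by formatting it into a key template. The GlobalConstants templates are not shown in this section, so the `"jdbc.%s.*"` patterns below are illustrative assumptions, as is the `"report"` flag value:

```java
// Hedged sketch of the flag-based key lookup in JdbcManager.getConnection.
// The real GlobalConstants key templates are not shown in this section; the
// "jdbc.%s.*" patterns below are illustrative assumptions showing how the
// flag picks one data source's settings out of the Hadoop configuration.
public class JdbcKeyDemo {
    static final String JDBC_URL = "jdbc.%s.url";           // assumed template
    static final String JDBC_USERNAME = "jdbc.%s.username"; // assumed template

    public static void main(String[] args) {
        String flag = "report"; // stands in for a flag like GlobalConstants.WAREHOUSE_OF_REPORT
        System.out.println(String.format(JDBC_URL, flag));      // jdbc.report.url
        System.out.println(String.format(JDBC_USERNAME, flag)); // jdbc.report.username
    }
}
```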
      
    • DimensionConverterImpl coding

      Create a new DimensionConverterImpl class file: connects to the database, looks up dimension ids, inserts missing dimension rows, and caches the results in memory

      To customize:

      private static final String URL = "jdbc:mysql://bd1601:3306/test";

      private static final String USERNAME = "root";

      private static final String PASSWORD = "123456";

      public class DimensionConverterImpl implements IDimensionConverter {
          private static final Logger logger = Logger.getLogger(DimensionConverterImpl.class);
          private static final String DRIVER = "com.mysql.cj.jdbc.Driver";
          private static final String URL = "jdbc:mysql://bd1601:3306/test";
          private static final String USERNAME = "root";
          private static final String PASSWORD = "123456";
          private Map<String, Integer> cache = new LinkedHashMap<String, Integer>() {
              private static final long serialVersionUID = 8894507016522723685L;
      
              @Override
      		protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                  return this.size() > 5000;
              };
          };
      
          static {
              try {
                  Class.forName(DRIVER);
              } catch (ClassNotFoundException e) {
                  // nothing
              }
          }
      
          @Override
          public int getDimensionIdByValue(BaseDimension dimension) throws IOException {
              String cacheKey = this.buildCacheKey(dimension); // Get cache key
              System.out.println("---- dimension cache key: " + cacheKey);
              if (this.cache.containsKey(cacheKey)) {
                  return this.cache.get(cacheKey);
              }
      
              Connection conn = null;
              try {
                  // 1. Check whether there is a corresponding value in the database. If yes, return
                  // 2. If there is no value in the first step; First insert our dimension data and get the id
                  String[] sql = null; // Execute sql array
                  if (dimension instanceof DateDimension) {
                      sql = this.buildDateSql();
                  } else if (dimension instanceof PlatformDimension) {
                      sql = this.buildPlatformSql();
                  } else if (dimension instanceof BrowserDimension) {
                      sql = this.buildBrowserSql();
                  } else {
                      throw new IOException("Unsupported dimension type for id lookup: " + dimension.getClass());
                  }
      
                  conn = this.getConnection(); // Get connection
      //            conn=JdbcManager.getConnection(conf, flag)
                  int id = 0;
                  synchronized (this) {
                      id = this.executeSql(conn, cacheKey, sql, dimension);
                      this.cache.put(cacheKey, id);
                  }
                  return id;
              } catch (Throwable e) {
                  logger.error("An exception occurred while operating the database", e);
                  throw new IOException(e);
              } finally {
                  if (conn != null) {
                      try {
                          conn.close();
                      } catch (SQLException e) {
                          // nothing
                      }
                  }
              }
          }
      
          /**
           * Get database connection
           * 
           * @return
           * @throws SQLException
           */
          private Connection getConnection() throws SQLException {
              return DriverManager.getConnection(URL, USERNAME, PASSWORD);
          }
      
          /**
           * Create cache key
           * 
           * @param dimension
           * @return
           */
          private String buildCacheKey(BaseDimension dimension) {
              StringBuilder sb = new StringBuilder();
              if (dimension instanceof DateDimension) {
                  sb.append("date_dimension");
                  DateDimension date = (DateDimension) dimension;
                  sb.append(date.getYear()).append(date.getSeason()).append(date.getMonth());
                  sb.append(date.getWeek()).append(date.getDay()).append(date.getType());
              } else if (dimension instanceof PlatformDimension) {
                  sb.append("platform_dimension");
                  PlatformDimension platform = (PlatformDimension) dimension;
                  sb.append(platform.getPlatformName());
              } else if (dimension instanceof BrowserDimension) {
                  sb.append("browser_dimension");
                  BrowserDimension browser = (BrowserDimension) dimension;
                  sb.append(browser.getBrowserName()).append(browser.getBrowserVersion());
              }
      
              if (sb.length() == 0) {
                  throw new RuntimeException("Unable to build a cache key for dimension: " + dimension.getClass());
              }
              return sb.toString();
          }
      
          /**
           * Set parameters
           * 
           * @param pstmt
           * @param dimension
           * @throws SQLException
           */
          private void setArgs(PreparedStatement pstmt, BaseDimension dimension) throws SQLException {
              int i = 0;
              if (dimension instanceof DateDimension) {
                  DateDimension date = (DateDimension) dimension;
                  pstmt.setInt(++i, date.getYear());
                  pstmt.setInt(++i, date.getSeason());
                  pstmt.setInt(++i, date.getMonth());
                  pstmt.setInt(++i, date.getWeek());
                  pstmt.setInt(++i, date.getDay());
                  pstmt.setString(++i, date.getType());
                  pstmt.setDate(++i, new Date(date.getCalendar().getTime()));
              } else if (dimension instanceof PlatformDimension) {
                  PlatformDimension platform = (PlatformDimension) dimension;
                  pstmt.setString(++i, platform.getPlatformName());
              } else if (dimension instanceof BrowserDimension) {
                  BrowserDimension browser = (BrowserDimension) dimension;
                  pstmt.setString(++i, browser.getBrowserName());
                  pstmt.setString(++i, browser.getBrowserVersion());
              }
          }
      
          /**
           * Create date dimension related sql
           * 
           * @return
           */
          private String[] buildDateSql() {
              String querySql = "SELECT `id` FROM `dimension_date` WHERE `year` = ? AND `season` = ? AND `month` = ? AND `week` = ? AND `day` = ? AND `type` = ? AND `calendar` = ?";
              String insertSql = "INSERT INTO `dimension_date`(`year`, `season`, `month`, `week`, `day`, `type`, `calendar`) VALUES(?, ?, ?, ?, ?, ?, ?)";
              return new String[] { querySql, insertSql };
          }
      
          /**
           * Create platform dimension related sql
           * 
           * @return
           */
          private String[] buildPlatformSql() {
              String querySql = "SELECT `id` FROM `dimension_platform` WHERE `platform_name` = ?";
              String insertSql = "INSERT INTO `dimension_platform`(`platform_name`) VALUES(?)";
              return new String[] { querySql, insertSql };
          }
      
          /**
           * Create browser dimension related sql
           * 
           * @return
           */
          private String[] buildBrowserSql() {
              String querySql = "SELECT `id` FROM `dimension_browser` WHERE `browser_name` = ? AND `browser_version` = ?";
              String insertSql = "INSERT INTO `dimension_browser`(`browser_name`, `browser_version`) VALUES(?, ?)";
              return new String[] { querySql, insertSql };
          }
      
          /**
           * How to execute sql
           * 
           * @param conn
           * @param cacheKey
           * @param sqls
           * @param dimension
           * @return
           * @throws SQLException
           */
          @SuppressWarnings("resource")
          private int executeSql(Connection conn, String cacheKey, String[] sqls, BaseDimension dimension) throws SQLException {
              PreparedStatement pstmt = null;
              ResultSet rs = null;
              try {
                pstmt = conn.prepareStatement(sqls[0]); // create the pstmt for the query sql
                // Set parameters
                this.setArgs(pstmt, dimension);
                rs = pstmt.executeQuery();
                if (rs.next()) {
                    return rs.getInt(1); // the dimension already exists; return its id
                }
                // Reaching here means the dimension is not in the database yet; close the
                // query resources before reusing the variables for the insert
                rs.close();
                pstmt.close();
                pstmt = conn.prepareStatement(sqls[1], java.sql.Statement.RETURN_GENERATED_KEYS);
                // Set parameters
                this.setArgs(pstmt, dimension);
                pstmt.executeUpdate();
                rs = pstmt.getGeneratedKeys(); // get the auto-generated id
                  if (rs.next()) {
                      return rs.getInt(1); // Get return value
                  }
              } finally {
                  if (rs != null) {
                      try {
                          rs.close();
                      } catch (Throwable e) {
                          // nothing
                      }
                  }
                  if (pstmt != null) {
                      try {
                          pstmt.close();
                      } catch (Throwable e) {
                          // nothing
                      }
                  }
              }
            throw new RuntimeException("Failed to get the id from the database");
          }
      }
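
      The `cache` field above is a bounded LRU-style cache: a LinkedHashMap whose `removeEldestEntry` hook evicts the insertion-order eldest entry once the size limit is exceeded (5000 in the real class). The standalone sketch below uses a capacity of 3 and made-up cache keys purely for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the bounded dimension-id cache in DimensionConverterImpl: a
// LinkedHashMap whose removeEldestEntry hook evicts the insertion-order
// eldest entry once the capacity is exceeded (5000 in the real class).
public class DimensionCacheDemo {

    // build a cache that holds at most `capacity` entries
    static Map<String, Integer> buildCache(final int capacity) {
        return new LinkedHashMap<String, Integer>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > capacity; // evict the eldest entry beyond the cap
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = buildCache(3);
        cache.put("date_dimension2021", 1);
        cache.put("platform_dimensionwebsite", 2);
        cache.put("browser_dimensionChrome90", 3);
        cache.put("browser_dimensionFirefox88", 4); // evicts the first key
        System.out.println(cache.containsKey("date_dimension2021")); // false
        System.out.println(cache.size()); // 3
    }
}
```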
      
    • TransformerOutputFormat coding

      Create a new TransformerOutputFormat class file: a custom OutputFormat that writes to mysql, with BaseDimension as the output key type and BaseStatsValueWritable as the output value type

      No customization required

      public class TransformerOutputFormat extends OutputFormat<BaseDimension, BaseStatsValueWritable> {
          private static final Logger logger = Logger.getLogger(TransformerOutputFormat.class);
      
          /**
           * Defines the output handling for each record; one record is what the reducer emits per call of the write method.
           */
          @Override
      	public RecordWriter<BaseDimension, BaseStatsValueWritable> getRecordWriter(TaskAttemptContext context) throws IOException, InterruptedException {
              Configuration conf = context.getConfiguration();
              Connection conn = null;
            // create the dimension converter up front, before obtaining the connection
              IDimensionConverter converter = new DimensionConverterImpl();
              try {
                  conn = JdbcManager.getConnection(conf, GlobalConstants.WAREHOUSE_OF_REPORT);
                  conn.setAutoCommit(false);
              } catch (SQLException e) {
                  logger.error("Failed to get database connection", e);
                  throw new IOException("Failed to get database connection", e);
              }
              return new TransformerRecordWriter(conn, conf, converter);
          }
      
          @Override
          public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
            // output goes to mysql, so there is no output path to check here
          }
      
          @Override
          public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException, InterruptedException {
              return new FileOutputCommitter(FileOutputFormat.getOutputPath(context), context);
          }
      
          /**
           * Customize specific data output writer
           * 
           * @author root
           *
           */
          public class TransformerRecordWriter extends RecordWriter<BaseDimension, BaseStatsValueWritable> {
              private Connection conn = null;
              private Configuration conf = null;
              private IDimensionConverter converter = null;
              private Map<KpiType, PreparedStatement> map = new HashMap<KpiType, PreparedStatement>();
              private Map<KpiType, Integer> batch = new HashMap<KpiType, Integer>();
      
              public TransformerRecordWriter(Connection conn, Configuration conf, IDimensionConverter converter) {
                  super();
                  this.conn = conn;
                  this.conf = conf;
                  this.converter = converter;
              }
      
              @Override
            /**
             * Called automatically by the framework each time the reduce task outputs a record;
             * writes the reducer output to mysql.
             */
              public void write(BaseDimension key, BaseStatsValueWritable value) throws IOException, InterruptedException {
                  if (key == null || value == null) {
                      return;
                  }
      
                  try {
                      KpiType kpi = value.getKpi();
                      PreparedStatement pstmt = null;//Each pstmt object corresponds to an sql statement
                    int count = 1; // batch counter; statements are flushed every JDBC_BATCH_NUMBER writes
                      if (map.get(kpi) == null) {
                          // Use kpi to distinguish, return sql and save it to config
                          pstmt = this.conn.prepareStatement(conf.get(kpi.name));
                          map.put(kpi, pstmt);
                      } else {
                          pstmt = map.get(kpi);
                          count = batch.get(kpi);
                          count++;
                      }
                      batch.put(kpi, count); // Storage of batch times
      
                      String collectorName = conf.get(GlobalConstants.OUTPUT_COLLECTOR_KEY_PREFIX + kpi.name);
                      Class<?> clazz = Class.forName(collectorName);
                    IOutputCollector collector = (IOutputCollector) clazz.getDeclaredConstructor().newInstance(); // the collector that binds this value into mysql; each KPI has its own collector because the target tables differ
                      collector.collect(conf, key, value, pstmt, converter);
      
                      if (count % Integer.valueOf(conf.get(GlobalConstants.JDBC_BATCH_NUMBER, GlobalConstants.DEFAULT_JDBC_BATCH_NUMBER)) == 0) {
                          pstmt.executeBatch();
                          conn.commit();
                          batch.put(kpi, 0); // Corresponding batch calculation deletion
                      }
                  } catch (Throwable e) {
                    logger.error("Exception while writing data in the writer", e);
                      throw new IOException(e);
                  }
              }
      
              @Override
              public void close(TaskAttemptContext context) throws IOException, InterruptedException {
                  try {
                      for (Map.Entry<KpiType, PreparedStatement> entry : this.map.entrySet()) {
                          entry.getValue().executeBatch();
                      }
                  } catch (SQLException e) {
                    logger.error("Exception while executing executeBatch", e);
                      throw new IOException(e);
                  } finally {
                      try {
                          if (conn != null) {
                              conn.commit(); // Submit the connection
                          }
                      } catch (Exception e) {
                          // nothing
                      } finally {
                          for (Map.Entry<KpiType, PreparedStatement> entry : this.map.entrySet()) {
                              try {
                                  entry.getValue().close();
                              } catch (SQLException e) {
                                  // nothing
                              }
                          }
                          if (conn != null)
                              try {
                                  conn.close();
                              } catch (Exception e) {
                                  // nothing
                              }
                      }
                  }
              }
      
          }
      }
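
      The per-KPI batching in the write method above can be sketched in isolation: each write increments a per-KPI counter, and the batch is flushed (executeBatch() plus commit() in the real writer) every `batchSize` records. The sketch replaces the PreparedStatement and database work with a simple flush counter for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-KPI batching logic in TransformerRecordWriter.write: each
// write increments a per-KPI counter, and the batch is flushed every
// `batchSize` records. The database work is replaced by a flush counter.
public class BatchFlushDemo {

    // simulate `writes` records for one KPI; return how many flushes happen
    static int countFlushes(int writes, int batchSize) {
        Map<String, Integer> batch = new HashMap<>();
        int flushes = 0;
        for (int i = 0; i < writes; i++) {
            String kpi = "active_user";
            int count = batch.getOrDefault(kpi, 0) + 1;
            batch.put(kpi, count); // store the batch count, as in write()
            if (count % batchSize == 0) {
                flushes++;         // executeBatch() + conn.commit() in the real writer
                batch.put(kpi, 0); // reset the counter after a flush
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(25, 10)); // 2
    }
}
```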
      
    • ActiveUserCollector coding

      Create a new ActiveUserCollector class file: implements IOutputCollector, binding the active-user statistics into the mysql upsert statement

      No customization required

      public class ActiveUserCollector implements IOutputCollector {
      
      
      	/**
      	 *   INSERT INTO `stats_user`(
      		    `platform_dimension_id`,
      		    `date_dimension_id`,
      		    `active_users`,
      		    `created`)
      		  VALUES(?, ?, ?, ?) ON DUPLICATE KEY UPDATE `active_users` = ?
      	 */
      	@Override
      	public void collect(Configuration conf, BaseDimension key, BaseStatsValueWritable value, PreparedStatement pstmt,
      			IDimensionConverter converter) throws SQLException, IOException {
      		StatsUserDimension userDimension = (StatsUserDimension) key;
      		
      		
      		MapWritableValue mapWritableValue = (MapWritableValue)value;
      		MapWritable mapWritable = mapWritableValue.getValue();
      		IntWritable intWritable = (IntWritable)mapWritable.get(new IntWritable(-1));
      		int activeUsers = intWritable.get();
      	
      		
      		pstmt.setInt(1, converter.getDimensionIdByValue(userDimension.getStatsCommon().getPlatform()));
      		pstmt.setInt(2, converter.getDimensionIdByValue(userDimension.getStatsCommon().getDate()));
      		pstmt.setInt(3, activeUsers);
      		pstmt.setString(4, conf.get(GlobalConstants.RUNNING_DATE_PARAMES));
      		pstmt.setInt(5,activeUsers);
      		
      		pstmt.addBatch();
      
      	}
      
      }
      
      

Article reprinted from Le byte

Topics: Java Front-end Project