Project architecture design
Java project collection
Project system architecture
Based on the real business data architecture of an e-commerce website, the project covers the full path from data collection to data use, closing the loop across the front-end application, back-end program, data analysis, and platform deployment. The result is an e-commerce log analysis project suited to a teaching curriculum, implemented mainly with offline (batch) technology.
- User visualization: responsible for interaction with users and the display of business data. It is implemented mainly in JS and does not run on the Tomcat server.
- Business logic program: implements the overall business logic; built with Spring to meet the business requirements and deployed on Tomcat.
- Data storage
  - Business database: the project adopts the widely used relational database MySQL, which stores the platform's business logic data.
  - HDFS distributed file storage: the project adopts HBase + Hive, which stores the full history of business data, supports both high-speed retrieval and massive data volumes, and thereby supports future decision analysis.
- Offline analysis part
  - Log collection service: Flume NG collects users' page access behavior on the business platform and pushes it to the HDFS cluster on a schedule.
  - Offline analysis and ETL: batch statistics are implemented with MapReduce + Hive SQL to compute the indicator data.
  - Data transfer service: Sqoop batch-transfers the business data into Hive.
Project data flow
Analysis system (bf_transformer)
- From data collection to page presentation
Log collection section
- Flume reads log updates from the operation logs of the business services and periodically pushes them to HDFS. After HDFS receives these logs, an MR program filters the log information to obtain the user access data stream uid | mid | platform | browser | timestamp. Once the computation completes, the result is merged with the data already in HBase.
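As a rough illustration of the filtering step just described, the sketch below splits one collected log line into the five fields named above. The delimiter, field order, and class name are illustrative assumptions, not the project's actual MR code.

```java
// Hypothetical sketch: split one access-log line into the
// uid | mid | platform | browser | timestamp fields described above.
public class AccessLogParser {

    public static String[] parse(String line) {
        // Assumes fields are joined with '|'; adjust to the real delimiter.
        String[] fields = line.split("\\|");
        if (fields.length != 5) {
            return null; // treat malformed lines as dirty data and drop them
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] f = parse("26866661|shsxt|website|Chrome|1595605873000");
        if (f != null) {
            System.out.println("uid=" + f[0] + ", timestamp=" + f[4]);
        }
    }
}
```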
ETL part
Load the data initialized by the system into HBase through MapReduce.
Offline analysis part
The offline statistics service and offline analysis service are scheduled through Oozie; tasks are triggered according to the configured run time.
The offline analysis service loads data from HBase, runs multiple statistical algorithms, and writes the results to MySQL.
Data warehouse analysis service
- Scheduling and execution of SQL scripts
The data warehouse analysis service is scheduled through Oozie; tasks are triggered according to the configured run time.
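For reference, such a trigger can also be issued programmatically. Below is a minimal sketch using the Oozie Java client; the server URL and the HDFS workflow path are placeholder assumptions.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

public class SubmitWorkflow {
    public static void main(String[] args) throws OozieClientException {
        // Placeholder Oozie server URL; replace with the real host.
        OozieClient client = new OozieClient("http://shsxt-hadoop101:11000/oozie");
        Properties conf = client.createConfiguration();
        // Placeholder HDFS path of a deployed workflow.xml.
        conf.setProperty(OozieClient.APP_PATH,
                "hdfs://shsxt-hadoop101:8020/user/root/workflows/analyse");
        conf.setProperty("nameNode", "hdfs://shsxt-hadoop101:8020");
        conf.setProperty("jobTracker", "shsxt-hadoop101:8032");
        String jobId = client.run(conf); // submit and start the workflow
        System.out.println("Submitted Oozie job: " + jobId);
    }
}
```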
The data analysis service loads data from each system's database into HDFS. An MR program then filters the acquired data to unify the data format, and the result is merged with the data already in Hive. Hive then processes the data through HQL scripts to produce transaction information, access information, and several other indicators; once computed, these results are again merged into Hive.
Application background execution workflow
Note: instead of using the IP address to identify a unique user, we store a uuid in a cookie to identify the user uniquely.
In our JS SDK, events are divided according to the different data to be collected.
- For example, for the pageview event, the execution process of the JS SDK is as follows:
Analysis
- PC side event analysis
Each final analysis module needs different data, so we examine the required data module by module. Basic user information analysis is the analysis of the user's browsing behavior, i.e. it only needs the pageview event.
Browser information analysis and region information analysis simply add the browser and region dimensions on top of basic user information analysis. The browser can be identified from window.navigator.userAgent, and the region can be derived from the user's IP address collected by the nginx server; that is, the pageview event also satisfies these two modules.
For external-link data analysis and user browsing depth analysis, we can add the URL of the current page and the URL of the previous page to the pageview event, so the pageview event also satisfies these two modules.
Order information analysis requires the PC side to send an event when an order is generated, so this module needs a new event, chargeRequest. Event analysis also needs a new event sent from the PC side, which we define as the event event. In addition, we set up a launch event to record new users' first visits.
The data URLs sent by the various PC-side events have the following format; the parameters after the `?` are the collected data: http://shsxt.com/shsxt.gif?requestdata
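As a rough sketch of how such a request URL is assembled (the actual JS and Java SDK implementations appear later in this document), the following Java snippet URL-encodes a few parameters behind the beacon address; the host and parameter values are placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class BeaconUrlDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("en", "e_pv");    // event name
        data.put("pl", "website"); // platform
        data.put("sdk", "js");     // sdk type
        data.put("p_url", "http://www.shsxt.com:8080/index.html");

        StringBuilder sb = new StringBuilder("http://shsxt.com/shsxt.gif?");
        for (Map.Entry<String, String> e : data.entrySet()) {
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), "utf-8")).append('&');
        }
        // drop the trailing '&'
        System.out.println(sb.substring(0, sb.length() - 1));
    }
}
```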
Final analysis module | PC JS SDK event |
---|---|
User basic information analysis | pageview event |
Browser information analysis | pageview event |
Regional information analysis | pageview event |
External chain data analysis | pageview event |
User browsing depth analysis | pageview event |
Order information analysis | chargeRequest event |
Event analysis | event event |
User basic information modification | launch event |
PC-side JS SDK events
- General parameters
The following information is sent with every tracking call (buried point).
name | content |
---|---|
Data sent | u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1449137597974&ver=1&pl=website&sdk=js&b_rst=1920*1080&u_ud=12BF4079-223E-4A57-AC60-C1A04D8F7A2F&b_iev=Mozilla%2F5.0%20(Windows%20NT%206.1%3B%20WOW64)%20AppleWebKit%2F537.1%20(KHTML%2C%20like%20Gecko)%20Chrome%2F21.0.1180.77%20Safari%2F537.1&l=zh-CN&en=e_l |
Parameter name | type | description |
---|---|---|
u_sd | string | Session id |
c_time | string | Client creation time |
ver | string | Version number, eg: 0.0.1 |
pl | string | Platform, eg: website |
sdk | string | Sdk type, eg: js |
b_rst | string | Browser resolution, eg: 1800*678 |
u_ud | string | User / visitor unique identifier |
b_iev | string | Browser information useragent |
l | string | Client language |
- Launch event
This event is triggered when the user visits the website for the first time. No external call interface is provided; only the data collection for this event is implemented.
name | content |
---|---|
Data sent | en=e_l&General parameters |
Parameter name | type | description |
---|---|---|
en | string | Event name, eg: e_l |
- Member login event
This event is triggered when the user logs in to the website. No external call interface is provided; only the data collection for this event is implemented.
name | content |
---|---|
Data sent | u_mid=phone&General parameters |
Parameter name | type | description |
---|---|---|
u_mid | string | The member id is consistent with the business system |
- Pageview event, which depends on the onPageView method
This event is triggered when the user accesses or refreshes a page. It is called automatically, but can also be called manually by the programmer.
Method name | content |
---|---|
Data sent | en=e_pv&p_ref=www.shsxt.com%3A8080&p_url=http%3A%2F%2Fwww.shsxt.com%3A8080%2Fvst_track%2Findex.html&General parameters |
Parameter name | type | description |
---|---|---|
en | string | Event name, eg: e_pv |
p_url | string | url of the current page |
p_ref | string | url of the previous page |
- ChargeSuccess event
This event is triggered when the user successfully places an order, which needs to be actively called by the program.
Method name | onChargeRequest |
---|---|
Data sent | oid=orderid123&on=%E4%BA%A7%E5%93%81%E5%90%8D%E7%A7%B0&cua=1000&cut=%E4%BA%BA%E6%B0%91%E5%B8%81&pt=%E6%B7%98%E5%AE%9&en=e_cs&General parameters |
parameter | type | Required | description |
---|---|---|---|
orderId | string | yes | Order id |
on | String | yes | Product purchase description name |
cua | double | yes | Order price |
cut | String | yes | Currency type |
pt | String | yes | Payment method |
en | String | yes | Event name, eg: e_cs |
- ChargeRefund event
This event is triggered when the user fails to place an order, which needs to be actively called by the program.
Method name | onChargeRequest |
---|---|
Data sent | oid=orderid123&on=%E4%BA%A7%E5%93%81%E5%90%8D%E7%A7%B0&cua=1000&cut=%E4%BA%BA%E6%B0%91%E5%B8%81&pt=%E6%B7%98%E5%AE%9&en=e_cr&General parameters |
parameter | type | Required | description |
---|---|---|---|
orderId | string | yes | Order id |
on | String | yes | Product purchase description name |
cua | double | yes | Order price |
cut | String | yes | Currency type |
pt | String | yes | Payment method |
en | String | yes | Event name, eg: e_cr |
- Event event
When a visitor / user triggers a business defined event, the front-end program calls this method.
Method name | onEventDuration |
---|---|
Data sent | ca=%E7%B1%BB%E5%9E%8B&ac=%E5%8A%A8%E4%BD%9C&kv_p_url=http%3A%2F%2Fwww.shsxt.com%3A8080%2Fvst_track%2Findex.html&kv_%E5%B1%9E%E6%80%A7key=%E5%B1%9E%E6%80%A7value&du=1000&en=e_e&General parameters |
parameter | type | Required | description |
---|---|---|---|
ca | string | yes | Category name of Event |
ac | String | yes | action name of Event |
kv_p_url | map | no | Custom properties for Event |
du | long | no | Duration of Event |
en | String | yes | Event name, eg: e_e |
- Data parameter description
Different events collect different data and send it to the nginx server, but the collected data share some common fields. The possible parameters are described below:
Parameter name | type | description |
---|---|---|
en | string | Event name, eg: e_pv |
ver | string | Version number, eg: 0.0.1 |
pl | string | Platform, eg: website |
sdk | string | Sdk type, eg: js |
b_rst | string | Browser resolution, eg: 1800*678 |
b_iev | string | Browser information useragent |
u_ud | string | User / visitor unique identifier |
l | string | Client language |
u_mid | string | The member id, consistent with the business system |
u_sd | string | Session id |
c_time | string | Client time |
p_url | string | url of the current page |
p_ref | string | url of the previous page |
tt | string | Title of the current page |
ca | string | Category name of Event |
ac | string | action name of Event |
kv_* | string | Custom properties for Event |
du | string | Duration of Event |
oid | string | Order id |
on | string | Order name |
cua | string | Payment amount |
cut | string | Payment currency type |
pt | string | Payment method |
The order workflow is as follows (similar for refunds):
Analysis
- Program background event analysis
In this project, only the chargeSuccess event is initiated from the program background. Its main function is to send order-success information to the nginx server. The sending format is the same as the PC-side method, and it accesses the same URL for data transmission:
Final analysis module | SDK event |
---|---|
Order information analysis | chargeSuccess event, chargeRefund event |
- chargeSuccess event
This event is triggered when the member finally pays successfully, which needs to be actively called by the program.
Method name | onChargeSuccess |
---|---|
Data sent | u_mid=shsxt&c_time=1449142044528&oid=orderid123&ver=1&en=e_cs&pl=javaserver&sdk=jdk |

parameter | type | Required | description |
---|---|---|---|
orderId | string | yes | Order id |
memberId | string | yes | Member id |
- ChargeRefund event
This event is triggered when a member performs a refund operation; it needs to be actively called by the program.
Method name | onChargeRefund |
---|---|
Data sent | u_mid=shsxt&c_time=1449142044528&oid=orderid123&ver=1&en=e_cr&pl=javaserver&sdk=jdk |

parameter | type | Required | description |
---|---|---|---|
orderId | string | yes | Order id |
memberId | string | yes | Member id |
Integration mode
The Java SDK can be imported into the project directly or added to the classpath.
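As a rough usage sketch (the SDK classes themselves are listed later in this document), a business method might report a successful payment like this; the order and member ids are placeholders.

```java
public class OrderService {

    public void paySuccess(String orderId, String memberId) {
        // ... business logic that marks the order as paid ...

        // Report the chargeSuccess event; returns false if the data
        // could not be queued (e.g. empty ids).
        boolean queued = AnalyticsEngineSDK.onChargeSuccess(orderId, memberId);
        if (!queued) {
            System.err.println("chargeSuccess event not queued for order " + orderId);
        }
    }
}
```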
- Data parameter description
The parameters are described as follows:
Parameter name | type | description |
---|---|---|
en | string | Event name, eg: e_cs |
ver | string | Version number, eg: 0.0.1 |
pl | string | Platform, eg: website,javaweb,php |
sdk | string | Sdk type, eg: java |
u_mid | string | The member id is consistent with the business system |
c_time | string | Client time |
oid | string | Order id |
Project data model
HBase storage structure
Here we include a timestamp in the rowkey, and use a single HBase column family named log. So finally we create an eventlog table with one column family and a timestamp-based rowkey.
- create 'eventlog', 'log'. The rowkey design rule is: timestamp + CRC code of (uid + mid + en).
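A minimal sketch of this rowkey rule, assuming CRC32 as the "CRC code" and an underscore separator (both assumptions, since the document does not fix them):

```java
import java.util.zip.CRC32;

public class RowKeyDemo {

    // rowkey = timestamp + CRC of (uid + mid + en)
    public static String buildRowKey(long timestamp, String uid, String mid, String en) {
        CRC32 crc = new CRC32();
        crc.update((uid + mid + en).getBytes());
        return timestamp + "_" + crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(buildRowKey(1595605873000L, "26866661", "shsxt", "e_pv"));
    }
}
```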
Environment construction
NginxLog file service (single node)
1. Upload and unzip
```
# go to the root directory
cd /root
# upload the file
rz
# unpack
tar -zxvf ./nginx-1.8.1.tar.gz
# delete the archive
rm -rf ./nginx-1.8.1.tar.gz
```
2. Compile and install
```
# install the dependencies nginx needs in advance
yum install gcc pcre-devel zlib-devel openssl-devel -y
# enter the source directory containing the configure script
cd nginx-1.8.1/
# run configure
./configure --prefix=/opt/sxt/nginx
# compile and install
make && make install
```
3. Start verification
```
# locate the nginx startup binary
cd /opt/sxt/nginx/sbin
# start
./nginx
# verify from a browser: shsxt-hadoop101:80
# common commands
nginx -s reload
nginx -s quit
```
Flume ng (single node)
- Upload and decompress
```
# create the directory and upload the data file
mkdir -p /opt/sxt/flume
cd /opt/sxt/flume
rz
# unpack
tar -zxvf apache-flume-1.6.0-bin.tar.gz
# delete the archive
rm -rf apache-flume-1.6.0-bin.tar.gz
```
- Modify the configuration file
```
cd /opt/sxt/flume/apache-flume-1.6.0-bin/conf
cp flume-env.sh.template flume-env.sh
vim flume-env.sh

export JAVA_HOME=/usr/java/jdk1.8.0_231-amd64
# Set the memory size; only needed when the channel is set to memory storage
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
```
- Modify environment variables
```
vim /etc/profile

export FLUME_HOME=/opt/sxt/flume/apache-flume-1.6.0-bin
export PATH=$FLUME_HOME/bin:$PATH

source /etc/profile
```
- verification
flume-ng version
Sqoop (single node)
- install
Upload, unpack, modify the configuration file, and verify
```
# create the folder
mkdir -p /opt/sxt/sqoop
# upload
rz
# unpack
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
# delete the archive
rm -rf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
# configure environment variables
export SQOOP_HOME=/opt/sxt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$SQOOP_HOME/bin:$PATH
source /etc/profile
# add the mysql connector jar
cd ./sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/
# rename the configuration file
cd /opt/sxt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf
mv sqoop-env-template.sh sqoop-env.sh
# modify the startup script (comment out the unused component checks)
cd /opt/sxt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin
vim configure-sqoop
# comment out the following:
##if [ -z "${HCAT_HOME}" ]; then
##  if [ -d "/usr/lib/hive-hcatalog" ]; then
##    HCAT_HOME=/usr/lib/hive-hcatalog
##  elif [ -d "/usr/lib/hcatalog" ]; then
##    HCAT_HOME=/usr/lib/hcatalog
##  else
##    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
##    if [ ! -d ${HCAT_HOME} ]; then
##      HCAT_HOME=${SQOOP_HOME}/../hcatalog
##    fi
##  fi
##fi
##if [ -z "${ACCUMULO_HOME}" ]; then
##  if [ -d "/usr/lib/accumulo" ]; then
##    ACCUMULO_HOME=/usr/lib/accumulo
##  else
##    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
##  fi
##fi
## Moved to be a runtime check in sqoop.
##if [ ! -d "${HCAT_HOME}" ]; then
##  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
##  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
##fi
##if [ ! -d "${ACCUMULO_HOME}" ]; then
##  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
##  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
##fi
##export HCAT_HOME
##export ACCUMULO_HOME
# verify
sqoop version
# verify the connection between sqoop and the database
sqoop list-databases -connect jdbc:mysql://shsxt-hadoop101:3306/ -username root -password 123456
```
Integration of Hive and HBase
hive and hbase synchronization https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
- Map data from HBase into Hive
- Both are stored on HDFS
- Therefore Hive can point at HBase's data storage path when creating a table
- However, deleting the Hive table does not delete the HBase table
- Conversely, if the HBase table is deleted, Hive's data is deleted with it
```
# copy the jar package, configure the cluster information, and specify the HBase mapping when creating the table
cp /opt/sxt/apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar /opt/sxt/hbase-1.4.13/lib/
# check whether the jar was copied successfully (all three nodes)
ls /opt/sxt/hbase-1.4.13/lib/hive-hbase-handler-*
# add a property to hive's configuration file:
vim /opt/sxt/apache-hive-1.2.1-bin/conf/hive-site.xml
```

New property (added on all three nodes):

```
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>shsxt-hadoop101:2181,shsxt-hadoop102:2181,shsxt-hadoop103:2181</value>
</property>
```

Verification: first create a mapped table in hive, then query it.

```
CREATE EXTERNAL TABLE brower1 (
  `id` string,
  `name` string,
  `version` string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:browser,info:browser_v")
TBLPROPERTIES ("hbase.table.name" = "event");

CREATE EXTERNAL TABLE tmp_order (
  `key` string,
  `name` string,
  `age` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,info:u_ud,info:u_sd')
TBLPROPERTIES ('hbase.table.name'='event');
```
Hive on Tez (single node)
1. Deploy the tez.tar.gz package in the apache-tez-0.8.5-bin/share directory to HDFS
```
cd /opt/bdp/
rz
tar -zxvf apache-tez-0.8.5-bin.tar.gz
rm -rf apache-tez-0.8.5-bin.tar.gz
cd apache-tez-0.8.5-bin/share/
hadoop fs -mkdir -p /bdp/tez/
hadoop fs -put tez.tar.gz /bdp/tez/
hadoop fs -chmod -R 777 /bdp
hadoop fs -ls /bdp/tez/
```
2. Create a tez-site.xml file in the ${HIVE_HOME}/conf directory, as follows:
```
cd /opt/bdp/apache-hive-1.2.1-bin/conf/
vim tez-site.xml
```

```
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>/bdp/tez/tez.tar.gz</value>
  </property>
  <property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.2</value>
  </property>
</configuration>
```
Note: the path configured in tez.lib.uris is the HDFS path where the tez.tar.gz package was deployed in the previous step. The tez-site.xml file also needs to be copied to the corresponding directories of the nodes where the HiveServer2 and HiveMetastore services are located.
3. Unzip the tez.tar.gz package in the apache-tez-0.8.5-bin/share directory into a new lib directory
```
cd /opt/bdp/apache-tez-0.8.5-bin/share/
ll
mkdir lib
tar -zxvf tez.tar.gz -C lib/
```
4. Copy all jar packages under the lib and lib/lib directories to the ${HIVE_HOME}/lib directory
```
cd lib
pwd
scp -r *.jar /opt/bdp/apache-hive-1.2.1-bin/lib/
scp -r lib/*.jar /opt/bdp/apache-hive-1.2.1-bin/lib/
ll /opt/bdp/apache-hive-1.2.1-bin/lib/tez-*
```
Note: Tez's dependent packages need to be copied to the corresponding directories of the nodes where HiveServer2 and HiveMetastore services are located.
5. After completing the above operations, restart the HiveServer2 and HiveMetastore services
```
nohup hive --service metastore > /dev/null 2>&1 &
nohup hiveserver2 > /dev/null 2>&1 &
netstat -apn | grep 10000
netstat -apn | grep 9083
```
Hive2 On Tez test: test with hive command
```
hive
set hive.tez.container.size=3020;
set hive.execution.engine=tez;
use bdp;
select count(*) from test;
```
Oozie build
Deploy Hadoop (CDH version)
Modify Hadoop configuration
core-site.xml
```
<!-- hosts from which the Oozie server user may act as a proxy -->
<property>
  <name>hadoop.proxyuser.atguigu.hosts</name>
  <value>*</value>
</property>
<!-- user groups that Oozie may proxy -->
<property>
  <name>hadoop.proxyuser.atguigu.groups</name>
  <value>*</value>
</property>
```
mapred-site.xml
```
<!-- MapReduce JobHistory Server address, default port 10020 -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>shsxt_hadoop102:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address, default port 19888 -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>shsxt_hadoop102:19888</value>
</property>
```
yarn-site.xml
```
<!-- task history service -->
<property>
  <name>yarn.log.server.url</name>
  <value>http://shsxt_hadoop102:19888/jobhistory/logs/</value>
</property>
```
Remember to scp these files to the other machine nodes after completion.
- Restart Hadoop cluster
```
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
```
Note: after starting JobHistoryServer, it is best to run an MR task to test it.
Deploy Oozie
- Unzip Oozie
tar -zxvf /opt/sxt/cdh/oozie-4.0.0-cdh5.3.6.tar.gz -C ./
- In the Oozie root directory, unzip oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz
tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../
After completion, a hadooplibs directory will appear under the Oozie directory.
- Create the libext directory under the Oozie directory
mkdir libext/
- Copy dependent jar packages
- Copy the jar packages in hadooplibs to the libext directory:
cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
- Copy the MySQL driver package to the libext directory:
cp -a /root/mysql-connector-java-5.1.27-bin.jar ./libext/
- Add ext-2.2.zip to the libext/ directory
ext is a JS framework used to render Oozie's front-end pages:
cp -a /root/ext-2.2.zip libext/
- Modify the Oozie configuration file
oozie-site.xml
Property | Value | Explanation |
---|---|---|
oozie.service.JPAService.jdbc.driver | com.mysql.jdbc.Driver | JDBC driver |
oozie.service.JPAService.jdbc.url | jdbc:mysql://shsxt_hadoop101:3306/oozie | database address required by oozie |
oozie.service.JPAService.jdbc.username | root | database user name |
oozie.service.JPAService.jdbc.password | 123456 | database password |
oozie.service.HadoopAccessorService.hadoop.configurations | *=/opt/sxt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop | lets Oozie reference Hadoop's configuration files |
- Create Oozie's database in MySQL
Enter MySQL and create the oozie database:
```
mysql -uroot -p000000
create database oozie;
grant all on *.* to root@'%' identified by '123456';
flush privileges;
exit;
```
- Initialize Oozie
1) Upload the oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz file under the Oozie directory to HDFS:
Tip: the yarn.tar.gz file will be unpacked automatically
bin/oozie-setup.sh sharelib create -fs hdfs://shsxt_hadoop102:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
After successful execution, check on the web UI (port 50070) whether files were generated in the corresponding directory.
2) Create the oozie.sql file
bin/ooziedb.sh create -sqlfile oozie.sql -run
3) Package the project and generate the war package
bin/oozie-setup.sh prepare-war
Startup and shutdown of Oozie
The startup command is as follows:
bin/oozied.sh start
The closing command is as follows:
bin/oozied.sh stop
Project realization
System environment:
system | version |
---|---|
windows | 10 professional edition |
linux | CentOS 7 |
Development tools:
tool | version |
---|---|
idea | 2019.2.4 |
maven | 3.6.2 |
JDK | 1.8+ |
Cluster environment:
framework | version |
---|---|
hadoop | 2.6.5 |
zookeeper | 3.4.10 |
hbase | 1.3.1 |
flume | 1.6.0 |
sqoop | 1.4.6 |
Hardware environment:
Hardware | hadoop102 | hadoop103 | hadoop104 |
---|---|---|---|
Memory | 1G | 1G | 1G |
CPU | 2 cores | 1 core | 1 core |
Hard disk | 50G | 50G | 50G |
Data production
data structure
In HBase we include a timestamp in the rowkey and use a single column family named log. So finally we create an eventlog table with one column family and a timestamp-based rowkey: create 'eventlog', 'log'
The rowkey design rule is: timestamp + CRC code of (uid + mid + en)
Column | Description | Example |
---|---|---|
browser | Browser name | 360 |
browser_v | Browser version | 3 |
city | city | Guiyang City |
country | country | China |
en | Event name | e_l |
os | operating system | linux |
os_v | Operating system version | 1 |
p_url | url of the current page | http://www.tmall.com |
pl | platform | website |
province | province | Guizhou Province |
s_time | system time | 1595605873000 |
u_sd | Session id | 12344F83-6357-4A64-8527-F09216974234 |
u_ud | User id | 26866661 |
Write code
Create a new maven project: shsxt_ecshop
- pom file configuration (customize the package name and the technology versions used; the following is required):
```
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.2.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.xxxx</groupId>
    <artifactId>bigdatalog</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>bigdatalog</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-freemarker</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.junit.vintage</groupId>
                    <artifactId>junit-vintage-engine</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
```
- analytics.js: new JS file
Create an anonymous self-invoking function containing: simple cookie operations (get, set); parameter settings; and methods to start a session, handle a user's first visit (launch), record a page view, record an order request, record a user-defined event, run the checks required before any externally exposed method executes, send the data to the server, add the common fields to the data sent to the log collection server, and return the parameter-encoded string.
Tip: a custom parameter is required: serverUrl: "http://bd1601/shsxt.jpg"
(function () { //The cookie is used to determine whether it is a new access. var CookieUtil = { // get the cookie of the key is name get: function (name) { var cookieName = encodeURIComponent(name) + "=", cookieStart = document.cookie .indexOf(cookieName), cookieValue = null; if (cookieStart > -1) { var cookieEnd = document.cookie.indexOf(";", cookieStart); if (cookieEnd == -1) { cookieEnd = document.cookie.length; } cookieValue = decodeURIComponent(document.cookie.substring( cookieStart + cookieName.length, cookieEnd)); } return cookieValue; }, // set the name/value pair to browser cookie set: function (name, value, expires, path, domain, secure) { var cookieText = encodeURIComponent(name) + "=" + encodeURIComponent(value); if (expires) { // set the expires time var expiresTime = new Date(); expiresTime.setTime(expires); cookieText += ";expires=" + expiresTime.toGMTString(); } if (path) { cookieText += ";path=" + path; } if (domain) { cookieText += ";domain=" + domain; } if (secure) { cookieText += ";secure"; } document.cookie = cookieText; }, setExt: function (name, value) { this.set(name, value, new Date().getTime() + 315360000000, "/"); } }; //=================================================================================== // The subject is actually tracker js var tracker = { // config clientConfig: { //Address of the log server serverUrl: "http://bd1601/shsxt.jpg", //session expiration time sessionTimeout: 360, // 360s -> 6min //Maximum waiting time maxWaitTime: 3600, // 3600s -> 60min -> 1h //Version version ver: "1" }, //cookie expiration time cookieExpiresTime: 315360000000, // cookie expiration time, 10 years //General data columns: { // The name of the column sent to the server eventName: "en", version: "ver", platform: "pl", sdk: "sdk", uuid: "u_ud", memberId: "u_mid", sessionId: "u_sd", clientTime: "c_time", language: "l", userAgent: "b_iev", resolution: "b_rst", currentUrl: "p_url", referrerUrl: "p_ref", title: "tt", orderId: "oid", orderName: "on", currencyAmount: "cua", currencyType: "cut", paymentType: "pt", category: "ca", action: "ac", kv: "kv_", duration: "du" }, //Value to set to common data keys: { pageView: "e_pv", chargeRequestEvent: "e_crt", launch: "e_l", eventDurationEvent: "e_e", sid: "bftrack_sid", uuid: "bftrack_uuid", mid: "bftrack_mid", preVisitTime: "bftrack_previsit", }, /** * Get session id */ getSid: function () { return CookieUtil.get(this.keys.sid); }, /** * Save session id to cookie */ setSid: function (sid) { if (sid) { CookieUtil.setExt(this.keys.sid, sid); } }, /** * Get uuid from cookie */ getUuid: function () { return CookieUtil.get(this.keys.uuid); }, /** * Save uuid to cookie */ setUuid: function (uuid) { if (uuid) { CookieUtil.setExt(this.keys.uuid, uuid); } }, /** * Get memberID */ getMemberId: function () { return CookieUtil.get(this.keys.mid); }, /** * Set mid */ setMemberId: function (mid) { if (mid) { CookieUtil.setExt(this.keys.mid, mid); } }, //Start a session startSession: function () { // Method triggered when js is loaded if (this.getSid()) { // The session id exists, indicating that the uuid also exists if (this.isSessionTimeout()) { // The session expires and a new session is generated this.createNewSession(); } else { // The session has not expired. 
Update the latest access time this.updatePreVisitTime(new Date().getTime()); } } else { // The session id does not exist, indicating that the uuid does not exist this.createNewSession(); } //If not, just come in and return to pv this.onPageView(); }, //User first login onLaunch: function () { // Trigger launch event var launch = {}; launch[this.columns.eventName] = this.keys.launch; // Set event name this.setCommonColumns(launch); // Set public columns this.sendDataToServer(this.parseParam(launch)); // Finally send encoded data }, //User access page onPageView: function () { // Trigger page view event if (this.preCallApi()) { var time = new Date().getTime(); var pageviewEvent = {}; pageviewEvent[this.columns.eventName] = this.keys.pageView; pageviewEvent[this.columns.currentUrl] = window.location.href; // Set current url pageviewEvent[this.columns.referrerUrl] = document.referrer; // Set the url of the previous page pageviewEvent[this.columns.title] = document.title; // Set title this.setCommonColumns(pageviewEvent); // Set public columns this.sendDataToServer(this.parseParam(pageviewEvent)); // Finally send encoded data this.updatePreVisitTime(time); } }, //User order request onChargeRequest: function (orderId, name, currencyAmount, currencyType, paymentType) { // Event triggered to generate an order if (this.preCallApi()) { if (!orderId || !currencyType || !paymentType) { this.log("order id,Currency type and payment method cannot be blank"); return; } if (typeof (currencyAmount) == "number") { // Amount must be a number var time = new Date().getTime(); var chargeRequestEvent = {}; chargeRequestEvent[this.columns.eventName] = this.keys.chargeRequestEvent; chargeRequestEvent[this.columns.orderId] = orderId; chargeRequestEvent[this.columns.orderName] = name; chargeRequestEvent[this.columns.currencyAmount] = currencyAmount; chargeRequestEvent[this.columns.currencyType] = currencyType; chargeRequestEvent[this.columns.paymentType] = paymentType; this.setCommonColumns(chargeRequestEvent); // Set public columns this.sendDataToServer(this.parseParam(chargeRequestEvent)); // Finally send encoded data this.updatePreVisitTime(time); } else { this.log("Order amount must be numeric"); return; } } }, //User defined events onEventDuration: function (category, action, map, duration) { // Trigger event if (this.preCallApi()) { if (category && action) { var time = new Date().getTime(); var event = {}; event[this.columns.eventName] = this.keys.eventDurationEvent; event[this.columns.category] = category; event[this.columns.action] = action; if (map) { for (var k in map) { if (k && map[k]) { event[this.columns.kv + k] = map[k]; } } } if (duration) { event[this.columns.duration] = duration; } this.setCommonColumns(event); // Set public columns this.sendDataToServer(this.parseParam(event)); // Finally send encoded data this.updatePreVisitTime(time); } else { this.log("category and action Cannot be empty"); } } }, /** * Methods that must be executed before executing external methods */ preCallApi: function () { if (this.isSessionTimeout()) { // If true, it indicates that a new one needs to be created this.startSession(); } else { this.updatePreVisitTime(new Date().getTime()); } return true; }, //Send data to server sendDataToServer: function (data) { // alert(data); // Send data to the server, where data is a string var that = this; var i2 = new Image(1, 1); // <img src="url"></img> i2.onerror = function () { // Retry operation can be performed here }; //http:/bd1601/log. gif? 
Data is the parameter to be uploaded i2.src = this.clientConfig.serverUrl + "?" + data; }, /** * Add the common part sent to the log collection server to the data */ setCommonColumns: function (data) { data[this.columns.version] = this.clientConfig.ver; data[this.columns.platform] = "website"; data[this.columns.sdk] = "js"; data[this.columns.uuid] = this.getUuid(); // Set user id data[this.columns.memberId] = this.getMemberId(); // Set member id data[this.columns.sessionId] = this.getSid(); // Set sid data[this.columns.clientTime] = new Date().getTime(); // Set client time data[this.columns.language] = window.navigator.language; // Set browser language data[this.columns.userAgent] = window.navigator.userAgent; // Set browser type data[this.columns.resolution] = screen.width + "*" + screen.height; // Set browser resolution }, /** * Create a new member and judge whether it is the first time to visit the page. If so, send the launch event. */ createNewSession: function () { var time = new Date().getTime(); // Get current operation time // 1. Update the session var sid = this.generateId(); // Generate a session id this.setSid(sid); this.updatePreVisitTime(time); // Update last access time // 2. View uuid if (!this.getUuid()) { // The uuid does not exist. First create the uuid, then save it to the cookie, and finally trigger the launch event var uuid = this.generateId(); // Product uuid this.setUuid(uuid); this.onLaunch(); } }, /** * Parameter encoding return string */ parseParam: function (data) { var params = ""; // {key:value,key2:value2} for (var e in data) { if (e && data[e]) { params += encodeURIComponent(e) + "=" + encodeURIComponent(data[e]) + "&"; } } if (params) { return params.substring(0, params.length - 1); } else { return params; } }, /** * Generate uuid */ generateId: function () { var chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'; var tmpid = []; var r; tmpid[8] = tmpid[13] = tmpid[18] = tmpid[23] = '-'; tmpid[14] = '4'; for (i = 0; i < 36; i++) { if (!tmpid[i]) { r = 0 | Math.random() * 16; tmpid[i] = chars[(i == 19) ? (r & 0x3) | 0x8 : r]; } } return tmpid.join(''); }, /** * Judge whether the session expires, and check whether the current time and the latest access time interval are less than this clientConfig. sessionTimeout<br/> * If it is less than, return false; Otherwise, return true. 
*/ isSessionTimeout: function () { var time = new Date().getTime(); var preTime = CookieUtil.get(this.keys.preVisitTime); if (preTime) { // If the latest access time exists, judge the interval return time - preTime > this.clientConfig.sessionTimeout * 1000; } return true; }, /** * Update last access time */ updatePreVisitTime: function (time) { CookieUtil.setExt(this.keys.preVisitTime, time); }, /** * Print log */ log: function (msg) { console.log(msg); }, }; // Method name of external exposure window.__AE__ = { startSession: function () { tracker.startSession(); }, onPageView: function () { tracker.onPageView(); }, onChargeRequest: function (orderId, name, currencyAmount, currencyType, paymentType) { tracker.onChargeRequest(orderId, name, currencyAmount, currencyType, paymentType); }, onEventDuration: function (category, action, map, duration) { tracker.onEventDuration(category, action, map, duration); }, setMemberId: function (mid) { tracker.setMemberId(mid); } }; // Automatic loading method var autoLoad = function () { // Set parameters var _aelog_ = _aelog_ || window._aelog_ || []; var memberId = null; for (i = 0; i < _aelog_.length; i++) { _aelog_[i][0] === "memberId" && (memberId = _aelog_[i][1]); } // Set the value of memberid according to the given memberid memberId && __AE__.setMemberId(memberId); // Start session __AE__.startSession(); }; autoLoad(); })();
- AnalyticsEngineSDK.java
Checks whether the order id and member id are empty; if not, builds the url through the build method and then sends it via the SendDataMonitor class.
A custom accessUrl is required: "http://bd1601/shsxt.jpg";
/** * Analysis engine sdk java server-side data collection * * @author root * @version 1.0 * */ public class AnalyticsEngineSDK { // Log print object private static final Logger log = Logger.getGlobal(); // The body of the request url public static final String accessUrl = "http://bd1601/shsxt.jpg"; private static final String platformName = "java_server"; private static final String sdkName = "jdk"; private static final String version = "1"; /** * Trigger the event of successful order payment and send the event data to the server * * @param orderId * Order payment id * @param memberId * Order payment member id * @return If the data is successfully sent (added to the send queue), then true is returned; Otherwise, it returns false (parameter exception & failed to add to send queue) */ public static boolean onChargeSuccess(String orderId, String memberId) { try { if (isEmpty(orderId) || isEmpty(memberId)) { // Order id or memberid is null log.log(Level.WARNING, "order id And members id Cannot be empty"); return false; } // When the code is executed here, it means that neither the order id nor the member id is empty. Map<String, String> data = new HashMap<String, String>(); data.put("u_mid", memberId); data.put("oid", orderId); data.put("c_time", String.valueOf(System.currentTimeMillis())); data.put("ver", version); data.put("en", "e_cs"); data.put("pl", platformName); data.put("sdk", sdkName); // Create url String url = buildUrl(data); // Send url & Add url to queue SendDataMonitor.addSendUrl(url); return true; } catch (Throwable e) { log.log(Level.WARNING, "Sending data exception", e); } return false; } /** * Trigger the order refund event and send the refund data to the server * * @param orderId * Refund order id * @param memberId * Refund member id * @return Returns true if the data is sent successfully. Otherwise, false is returned. */ public static boolean onChargeRefund(String orderId, String memberId) { try { if (isEmpty(orderId) || isEmpty(memberId)) { // Order id or memberid is null log.log(Level.WARNING, "order id And members id Cannot be empty"); return false; } // When the code is executed here, it means that neither the order id nor the member id is empty. Map<String, String> data = new HashMap<String, String>(); data.put("u_mid", memberId); data.put("oid", orderId); data.put("c_time", String.valueOf(System.currentTimeMillis())); data.put("ver", version); data.put("en", "e_cr"); data.put("pl", platformName); data.put("sdk", sdkName); // Build url String url = buildUrl(data); // Send url & Add url to queue SendDataMonitor.addSendUrl(url); return true; } catch (Throwable e) { log.log(Level.WARNING, "Sending data exception", e); } return false; } /** * Build the url based on the passed in parameters * * @param data * @return * @throws UnsupportedEncodingException */ private static String buildUrl(Map<String, String> data) throws UnsupportedEncodingException { StringBuilder sb = new StringBuilder(); //http://node01/log.gif? sb.append(accessUrl).append("?"); for (Map.Entry<String, String> entry : data.entrySet()) { if (isNotEmpty(entry.getKey()) && isNotEmpty(entry.getValue())) { sb.append(entry.getKey().trim()) .append("=") .append(URLEncoder.encode(entry.getValue().trim(), "utf-8")) .append("&"); } } return sb.substring(0, sb.length() - 1);// Remove last& } /** * Judge whether the string is empty. If it is empty, return true. Otherwise, false is returned. 
* * @param value * @return */ private static boolean isEmpty(String value) { return value == null || value.trim().isEmpty(); } /** * Judge whether the string is not empty. If not, return true. If it is empty, false is returned. * * @param value * @return */ private static boolean isNotEmpty(String value) { return !isEmpty(value); } }
- SendDataMonitor.java
Url requests are sent through getSendDataMonitor(): the singleton's constructor is private, the static method returns the SendDataMonitor instance, urls are continuously added to the object's blocking queue, and SendDataMonitor.monitor.run() loops, taking urls from the queue and sending them.
No customization required
/** * The monitor that sends url data, which is used to start a separate thread to send data * * @author root * */ public class SendDataMonitor { // Logging object private static final Logger log = Logger.getGlobal(); // Blocking queue, where the user stores the sending url private BlockingQueue<String> queue = new LinkedBlockingQueue<String>(); // A class object for a single column private static SendDataMonitor monitor = null; private SendDataMonitor() { // Private construction method to create single column mode } /** * Get the monitor object instance of single column, double check * * @return */ public static SendDataMonitor getSendDataMonitor() { if (monitor == null) { synchronized (SendDataMonitor.class) { if (monitor == null) { monitor = new SendDataMonitor(); Thread thread = new Thread(new Runnable() { @Override public void run() { // The specific processing method is invoked in the thread. SendDataMonitor.monitor.run(); } }); // When testing, it is not set to guard mode // thread.setDaemon(true); thread.start(); } } } return monitor; } /** * Add a url to the queue * * @param url * @throws InterruptedException */ public static void addSendUrl(String url) throws InterruptedException { getSendDataMonitor().queue.put(url); } /** * Specifically implement the method of sending url * */ private void run() { while (true) { try { String url = this.queue.take(); // Official send url HttpRequestUtil.sendData(url); } catch (Throwable e) { log.log(Level.WARNING, "send out url abnormal", e); } } } /** * Internal class, http tool class for users to send data * * @author root * */ public static class HttpRequestUtil { /** * Specific method of sending url * * @param url * @throws IOException */ public static void sendData(String url) throws IOException { HttpURLConnection con = null; BufferedReader in = null; try { URL obj = new URL(url); // Create url object con = (HttpURLConnection) obj.openConnection(); // Open url connection // Set connection parameters con.setConnectTimeout(5000); // Connection expiration time con.setReadTimeout(5000); // Read data expiration time con.setRequestMethod("GET"); // Set the request type to get System.out.println("send out url:" + url); // Send connection request in = new BufferedReader(new InputStreamReader( con.getInputStream())); // TODO: consider here whether you can } finally { try { if (in != null) { in.close(); } } catch (Throwable e) { // nothing } try { con.disconnect(); } catch (Throwable e) { // nothing } } } } }
- Test.java
Tests sending the payment status log
No customization required
```java
public class Test {
    public static String day = "20190607";

    public static void main(String[] args) {
        System.out.println("================= Start =================");
        // insert the log-collection calls into the business code
        // when payment succeeds
        AnalyticsEngineSDK.onChargeSuccess("orderid123", "zhangsan");
        // when payment fails (refund)
        AnalyticsEngineSDK.onChargeRefund("orderid456", "lisi");
        System.out.println("========== Business code continues ====================");
        // normal code, continue execution
    }
}
```
Packaging test
Testing in Linux
Upload the jar package to the specified directory on Linux
```
mkdir /opt/sxt/datalog/
cd /opt/sxt/datalog/
```
Start project
nohup java -jar bigdatalog.jar >>/opt/sxt/datalog/runlog.log 2>&1 &
verification
http://192.168.58.201:8080/
data acquisition
Idea:
- Configure nginx and start the cluster and nginx
- Configure flume
- Start flume monitoring task
- Run log production script
- Observation test
Nginx configuration
Create a new nginx configuration file: nginx.conf
```
# find nginx.conf
cd /opt/sxt/nginx/conf
# modify the file
vim nginx.conf
```
Modify the following
```
############# Changes: uncomment log_format my_format and add the location block ####################
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    log_format my_format '$remote_addr^A$msec^A$http_host^A$request_uri';
    sendfile on;
    keepalive_timeout 65;
    server {
        listen 80;
        server_name localhost;
        location / {
            root html;
            index index.html index.htm;
        }
        location = /shsxt.jpg {
            default_type image/gif;
            access_log /opt/data/access.log my_format;
        }
    }
}
```
verification
```
############################### restart and verify ###########################################
# create the log storage directory
mkdir -p /opt/data/
# monitor the log file
tail -F /opt/data/access.log
# verify in a browser:
# http://shsxt-hadoop101/shsxt.jpg
```
Create a script for rotating the nginx log file

```
# create the script
vim nginx.log.sh
```

Add the following content:

```
#!/bin/bash
# nginx log rotation script
# log file directory
logs_path="/opt/data/"
# backup directory
logs_bak_path="/opt/data/access_logs_bak/"
# rename the log file
mv ${logs_path}access.log ${logs_bak_path}access_$(date "+%Y%m%d").log
# reload nginx to regenerate a new log file
/opt/bdp/nginx/sbin/nginx -s reload
```
Flume configuration
Create a new Flume configuration file: example.conf
```
# create a directory dedicated to storing flume configuration files
mkdir -p /opt/bdp/flume/options
# move to the configuration directory
cd /opt/bdp/flume/options
# create the first configuration file
vim example.conf
```
Modify Flume configuration file: add the following content
```
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/data/access_logs_bak

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bdp/data/test/flume/%Y%m%d
# prefix of uploaded files
a1.sinks.k1.hdfs.filePrefix = access_logs
# roll a new file by message count; 0 disables count-based rolling
a1.sinks.k1.hdfs.rollCount = 0
# roll a new file by time; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval = 0
# roll a new file by size; 0 disables size-based rolling
a1.sinks.k1.hdfs.rollSize = 0
# if no data is written to the currently open temporary file within this many
# seconds, the temporary file is closed and renamed to the target file
a1.sinks.k1.hdfs.idleTimeout = 3
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# generate a directory every five minutes:
# whether to round down the timestamp; if enabled, all time escapes except %t are affected
a1.sinks.k1.hdfs.round = true
# the value to round down to
a1.sinks.k1.hdfs.roundValue = 5
# the unit of rounding: second, minute, hour
a1.sinks.k1.hdfs.roundUnit = minute

# use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
Start flume
nohup flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console >> /root/flume_log 2>&1 &
Start the production script and observe the test
cat /root/flume_log
Data consumption
If the above operations succeed, start writing the HBase code that consumes the data, storing the generated data in HBase in real time.
Idea:
- Write an MR job that reads the data from the HDFS cluster and prints it to the console, to check that reading works;
- Now that the data in HDFS can be read, it can also be written out, so call the relevant HBase API methods to write the data read from HDFS into HBase;
- The two steps above are enough to consume and store the data, but for decoupling, some attribute files should be externalized along the way and the generic HBase methods encapsulated in a class. A minimal sketch of the write step follows.
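Below is a minimal sketch of the HBase write step described above, using the eventlog table and log column family defined earlier; the ZooKeeper quorum, rowkey inputs, and columns are placeholder assumptions rather than the project's actual code.

```java
import java.io.IOException;
import java.util.zip.CRC32;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EventLogWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder quorum; matches the hive-site.xml example above.
        conf.set("hbase.zookeeper.quorum",
                "shsxt-hadoop101,shsxt-hadoop102,shsxt-hadoop103");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("eventlog"))) {
            long ts = 1595605873000L;
            CRC32 crc = new CRC32();
            crc.update(("26866661" + "shsxt" + "e_pv").getBytes());
            // rowkey = timestamp + CRC of (uid + mid + en)
            Put put = new Put(Bytes.toBytes(ts + "_" + crc.getValue()));
            put.addColumn(Bytes.toBytes("log"), Bytes.toBytes("en"), Bytes.toBytes("e_pv"));
            put.addColumn(Bytes.toBytes("log"), Bytes.toBytes("u_ud"), Bytes.toBytes("26866661"));
            table.put(put);
        }
    }
}
```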
Create a new module project: shsxt_ecshop_loganalyse
- pom.xml file configuration:
No customization required
```
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>shsxt_ecshop_loganalyse</artifactId>
    <version>1.0</version>

    <!-- use Aliyun's maven repository -->
    <repositories>
        <repository>
            <id>central</id>
            <name>aliyun maven</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
            <layout>default</layout>
        </repository>
    </repositories>

    <!-- basic project parameters -->
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <!-- dependencies -->
    <dependencies>
        <!-- hadoop common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.5</version>
        </dependency>
        <!-- hadoop client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.5</version>
        </dependency>
        <!-- hdfs -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.5</version>
        </dependency>
        <!-- mapreduce client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.6.5</version>
        </dependency>
        <!-- hive -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.7</version>
        </dependency>
        <!-- hbase client -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.4.13</version>
        </dependency>
        <!-- hbase server -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.4.13</version>
        </dependency>
        <!-- mysql connector -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.20</version>
        </dependency>
        <!-- browser (user agent) parsing -->
        <dependency>
            <groupId>cz.mallat.uasparser</groupId>
            <artifactId>uasparser</artifactId>
            <version>0.6.2</version>
        </dependency>
        <!-- test -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
```
Functions of the ETL code
- Remove dirty data
- Split the data into a directly processable format (parse the IP, parse the browser, split the log)
- Store the cleaned results in HBase (a browser-parsing sketch follows this list)
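Since the pom above pulls in cz.mallat.uasparser for browser parsing, here is a rough sketch of resolving a userAgent string to the browser and OS columns stored in HBase; treat the exact API calls as assumptions to verify against uasparser 0.6.2.

```java
import cz.mallat.uasparser.OnlineUpdater;
import cz.mallat.uasparser.UASparser;
import cz.mallat.uasparser.UserAgentInfo;

public class BrowserParseDemo {
    public static void main(String[] args) throws Exception {
        // Load the user-agent definition file bundled with the library.
        UASparser parser = new UASparser(OnlineUpdater.getVendoredInputStream());
        String ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
                + "(KHTML, like Gecko) Chrome/21.0.1180.77 Safari/537.1";
        UserAgentInfo info = parser.parse(ua);
        // e.g. browser = Chrome, browser_v = 21.0.1180.77
        System.out.println("browser=" + info.getUaFamily()
                + ", browser_v=" + info.getBrowserVersionInfo()
                + ", os=" + info.getOsName());
    }
}
```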
ETL coding
- IpSeeker: create a new IPSeeker class file
IP parsing: reads the qqwry.dat file to obtain a location from an IP address. The format of qqwry.dat is:
I. File header, 8 bytes in total
- Absolute offset of the first starting IP, 4 bytes
- Absolute offset of the last starting IP, 4 bytes
II "End address / country / region" record area each record followed by a four byte ip address is divided into two parts
- Country record
- Area record (not always present)
Moreover, country records and area records each come in two forms:
- A string terminated by 0
- 4 bytes, of which the first byte may be 0x1 or 0x2
a. When it is 0x1, an area record follows the absolute offset. Note that it is after the absolute offset, not after these four bytes.
b. When it is 0x2, there is no area record after the absolute offset.
Whether 0x1 or 0x2, the last three bytes are the absolute file offset of the actual country name. For an area record, 0x1 and 0x2 have no special meaning, but if one of these two bytes appears, it must be followed by a 3-byte offset; otherwise the record is a 0-terminated string.
III. "Start address / end address offset" record area
Each record is 7 bytes, arranged from small to large according to the starting address
a. starting IP address, 4 bytes
b. absolute offset of end ip address, 3 bytes
Note that the IP addresses and all offsets in this file are stored in little-endian format, while Java uses big-endian, so pay attention to the conversion.
No customization required
public class IPSeeker { // Some fixed constants, such as record length, etc private static final int IP_RECORD_LENGTH = 7; private static final byte AREA_FOLLOWED = 0x01; private static final byte NO_AREA = 0x2; // It is used as a cache. When querying an ip, first check the cache to reduce unnecessary repeated searches private Hashtable ipCache; // Random file access class private RandomAccessFile ipFile; // Memory mapping file private MappedByteBuffer mbb; // The first mock exam example private static IPSeeker instance = null; // Absolute offset of the start and end of the start region private long ipBegin, ipEnd; // Temporary variables used to improve efficiency private IPLocation loc; private byte[] buf; private byte[] b4; private byte[] b3; /** * private constructors */ protected IPSeeker() { ipCache = new Hashtable(); loc = new IPLocation(); buf = new byte[100]; b4 = new byte[4]; b3 = new byte[3]; try { String ipFilePath = IPSeeker.class.getResource("/qqwry.dat") .getFile(); ipFile = new RandomAccessFile(ipFilePath, "r"); } catch (FileNotFoundException e) { System.out.println("IP Address information file not found, IP The display function will not be available"); ipFile = null; } // If the file is opened successfully, read the file header information if (ipFile != null) { try { ipBegin = readLong4(0); ipEnd = readLong4(4); if (ipBegin == -1 || ipEnd == -1) { ipFile.close(); ipFile = null; } } catch (IOException e) { System.out.println("IP Address information file format error, IP The display function will not be available"); ipFile = null; } } } /** * @return Single instance */ public static IPSeeker getInstance() { if (instance == null) { instance = new IPSeeker(); } return instance; } /** * Given the incomplete name of a location, a series of IP range records containing s substrings are obtained * @param s Ground point string * @return List containing IPEntry type */ public List getIPEntriesDebug(String s) { List ret = new ArrayList(); long endOffset = ipEnd + 4; for (long offset = ipBegin + 4; offset <= endOffset; offset += IP_RECORD_LENGTH) { // Read end IP offset long temp = readLong3(offset); // If temp is not equal to - 1, read the location information of IP if (temp != -1) { IPLocation loc = getIPLocation(temp); // Judge whether the location contains s substring. If so, add the record to the List. If not, continue if (loc.country.indexOf(s) != -1 || loc.area.indexOf(s) != -1) { IPEntry entry = new IPEntry(); entry.country = loc.country; entry.area = loc.area; // Get start IP readIP(offset - 4, b4); entry.beginIp = IPSeekerUtils.getIpStringFromBytes(b4); // Get end IP readIP(temp, b4); entry.endIp = IPSeekerUtils.getIpStringFromBytes(b4); // Add this record ret.add(entry); } } } return ret; } /** */ /** * Given the incomplete name of a location, a series of IP range records containing s substrings are obtained * * @param s * Ground point string * @return List containing IPEntry type */ public List getIPEntries(String s) { List ret = new ArrayList(); try { // Mapping IP information files to memory if (mbb == null) { FileChannel fc = ipFile.getChannel(); mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, ipFile.length()); mbb.order(ByteOrder.LITTLE_ENDIAN); } int endOffset = (int) ipEnd; for (int offset = (int) ipBegin + 4; offset <= endOffset; offset += IP_RECORD_LENGTH) { int temp = readInt3(offset); if (temp != -1) { IPLocation loc = getIPLocation(temp); // Judge whether the location contains s substring. If so, add the record to the List. 
If not, continue if (loc.country.indexOf(s) != -1 || loc.area.indexOf(s) != -1) { IPEntry entry = new IPEntry(); entry.country = loc.country; entry.area = loc.area; // Get start IP readIP(offset - 4, b4); entry.beginIp = IPSeekerUtils.getIpStringFromBytes(b4); // Get end IP readIP(temp, b4); entry.endIp = IPSeekerUtils.getIpStringFromBytes(b4); // Add this record ret.add(entry); } } } } catch (IOException e) { System.out.println(e.getMessage()); } return ret; } /** */ /** * Read an int three bytes from the offset position of the memory mapping file * * @param offset * @return */ private int readInt3(int offset) { mbb.position(offset); return mbb.getInt() & 0x00FFFFFF; } /** */ /** * Read an int three bytes from the current location of the memory mapped file * * @return */ private int readInt3() { return mbb.getInt() & 0x00FFFFFF; } /** */ /** * Get country name according to IP * * @param ip * ip Byte array form of * @return Country name string */ public String getCountry(byte[] ip) { // Check whether the ip address file is normal if (ipFile == null) return "FALSE IP Database file"; // Save ip and convert ip byte array to string form String ipStr = IPSeekerUtils.getIpStringFromBytes(ip); // First check whether the cache already contains the results of this ip, and no more search files if (ipCache.containsKey(ipStr)) { IPLocation loc = (IPLocation) ipCache.get(ipStr); return loc.country; } else { IPLocation loc = getIPLocation(ip); ipCache.put(ipStr, loc.getCopy()); return loc.country; } } /** */ /** * Get country name according to IP * * @param ip * IP String form of * @return Country name string */ public String getCountry(String ip) { return getCountry(IPSeekerUtils.getIpByteArrayFromString(ip)); } /** */ /** * Get region name according to IP * * @param ip * ip Byte array form of * @return Region name string */ public String getArea(byte[] ip) { // Check whether the ip address file is normal if (ipFile == null) return "FALSE IP Database file"; // Save ip and convert ip byte array to string form String ipStr = IPSeekerUtils.getIpStringFromBytes(ip); // First check whether the cache already contains the results of this ip, and no more search files if (ipCache.containsKey(ipStr)) { IPLocation loc = (IPLocation) ipCache.get(ipStr); return loc.area; } else { IPLocation loc = getIPLocation(ip); ipCache.put(ipStr, loc.getCopy()); return loc.area; } } /** * Get region name according to IP * * @param ip * IP String form of * @return Region name string */ public String getArea(String ip) { return getArea(IPSeekerUtils.getIpByteArrayFromString(ip)); } /** */ /** * Search the ip information file according to the ip to obtain the IPLocation structure, and the searched ip parameters are obtained from the class member ip * * @param ip * IP to query * @return IPLocation structure */ public IPLocation getIPLocation(byte[] ip) { IPLocation info = null; long offset = locateIP(ip); if (offset != -1) info = getIPLocation(offset); if (info == null) { info = new IPLocation(); info.country = "Unknown country"; info.area = "Unknown region"; } return info; } /** * Read 4 bytes from the offset position as a long. 
Because java is in big endian format, there is no way to use such a function for conversion * * @param offset * @return Read the long value of the file, and return - 1, indicating that reading the file failed */ private long readLong4(long offset) { long ret = 0; try { ipFile.seek(offset); ret |= (ipFile.readByte() & 0xFF); ret |= ((ipFile.readByte() << 8) & 0xFF00); ret |= ((ipFile.readByte() << 16) & 0xFF0000); ret |= ((ipFile.readByte() << 24) & 0xFF000000); return ret; } catch (IOException e) { return -1; } } /** * Read three bytes from the offset position as a long. Because java is in big endian format, there is no way to use such a function for conversion * * @param offset * @return Read the long value of the file, and return - 1, indicating that reading the file failed */ private long readLong3(long offset) { long ret = 0; try { ipFile.seek(offset); ipFile.readFully(b3); ret |= (b3[0] & 0xFF); ret |= ((b3[1] << 8) & 0xFF00); ret |= ((b3[2] << 16) & 0xFF0000); return ret; } catch (IOException e) { return -1; } } /** * Read 3 bytes from the current position and convert to long * * @return */ private long readLong3() { long ret = 0; try { ipFile.readFully(b3); ret |= (b3[0] & 0xFF); ret |= ((b3[1] << 8) & 0xFF00); ret |= ((b3[2] << 16) & 0xFF0000); return ret; } catch (IOException e) { return -1; } } /** * Read the four byte ip address from the offset position and put it into the ip array. The read ip is in big endian format, but * The file is in the form of little endian, which will be converted * * @param offset * @param ip */ private void readIP(long offset, byte[] ip) { try { ipFile.seek(offset); ipFile.readFully(ip); byte temp = ip[0]; ip[0] = ip[3]; ip[3] = temp; temp = ip[1]; ip[1] = ip[2]; ip[2] = temp; } catch (IOException e) { System.out.println(e.getMessage()); } } /** * Read the four byte ip address from the offset position and put it into the ip array. The read ip is in big endian format, but * The file is in the form of little endian, which will be converted * * @param offset * @param ip */ private void readIP(int offset, byte[] ip) { mbb.position(offset); mbb.get(ip); byte temp = ip[0]; ip[0] = ip[3]; ip[3] = temp; temp = ip[1]; ip[1] = ip[2]; ip[2] = temp; } /** * Compare the class member ip with beginIp. Note that the beginIp is big endian * * @param ip * IP to query * @param beginIp * IP compared with queried IP * @return Equal returns 0, ip greater than beginIp returns 1, less than - 1. */ private int compareIP(byte[] ip, byte[] beginIp) { for (int i = 0; i < 4; i++) { int r = compareByte(ip[i], beginIp[i]); if (r != 0) return r; } return 0; } /** * Compare two byte s as unsigned numbers * * @param b1 * @param b2 * @return If b1 is greater than b2, it returns 1, equal returns 0, and less than - 1 */ private int compareByte(byte b1, byte b2) { if ((b1 & 0xFF) > (b2 & 0xFF)) // Compare whether greater than return 1; else if ((b1 ^ b2) == 0)// Judge whether they are equal return 0; else return -1; } /** * This method will locate the record containing the ip country and region according to the ip content, return an absolute offset, and use the dichotomy method to find it. * * @param ip * IP to query * @return If found, return the offset of the end IP. 
If not found, return - 1 */ private long locateIP(byte[] ip) { long m = 0; int r; // Compare first ip entry readIP(ipBegin, b4); r = compareIP(ip, b4); if (r == 0) return ipBegin; else if (r < 0) return -1; // Start binary search for (long i = ipBegin, j = ipEnd; i < j;) { m = getMiddleOffset(i, j); readIP(m, b4); r = compareIP(ip, b4); // log.debug(Utils.getIpStringFromBytes(b)); if (r > 0) i = m; else if (r < 0) { if (m == j) { j -= IP_RECORD_LENGTH; m = j; } else j = m; } else return readLong3(m + 4); } // If the loop ends, then i and j must be equal. This record is the most likely record, but it is not // It must be. Also check it. If so, return the absolute offset of the end address area m = readLong3(m + 4); readIP(m, b4); r = compareIP(ip, b4); if (r <= 0) return m; else return -1; } /** * Get the offset recorded in the middle of the begin offset and end offset * * @param begin * @param end * @return */ private long getMiddleOffset(long begin, long end) { long records = (end - begin) / IP_RECORD_LENGTH; records >>= 1; if (records == 0) records = 1; return begin + records * IP_RECORD_LENGTH; } /** * Given the offset of an ip country and region record, an IPLocation structure is returned * * @param offset * @return */ private IPLocation getIPLocation(long offset) { try { // Skip 4-byte ip ipFile.seek(offset + 4); // Read the first byte to determine whether the flag byte byte b = ipFile.readByte(); if (b == AREA_FOLLOWED) { // Read country offset long countryOffset = readLong3(); // Jump to offset ipFile.seek(countryOffset); // Check the flag byte again, because this place may still be a redirect at this time b = ipFile.readByte(); if (b == NO_AREA) { loc.country = readString(readLong3()); ipFile.seek(countryOffset + 4); } else loc.country = readString(countryOffset); // Read region flag loc.area = readArea(ipFile.getFilePointer()); } else if (b == NO_AREA) { loc.country = readString(readLong3()); loc.area = readArea(offset + 8); } else { loc.country = readString(ipFile.getFilePointer() - 1); loc.area = readArea(ipFile.getFilePointer()); } return loc; } catch (IOException e) { return null; } } /** * @param offset * @return */ private IPLocation getIPLocation(int offset) { // Skip 4-byte ip mbb.position(offset + 4); // Read the first byte to determine whether the flag byte byte b = mbb.get(); if (b == AREA_FOLLOWED) { // Read country offset int countryOffset = readInt3(); // Jump to offset mbb.position(countryOffset); // Check the flag byte again, because this place may still be a redirect at this time b = mbb.get(); if (b == NO_AREA) { loc.country = readString(readInt3()); mbb.position(countryOffset + 4); } else loc.country = readString(countryOffset); // Read region flag loc.area = readArea(mbb.position()); } else if (b == NO_AREA) { loc.country = readString(readInt3()); loc.area = readArea(offset + 8); } else { loc.country = readString(mbb.position() - 1); loc.area = readArea(mbb.position()); } return loc; } /** * Starting from the offset, parse the following bytes and read out a region name * * @param offset * @return Region name string * @throws IOException */ private String readArea(long offset) throws IOException { ipFile.seek(offset); byte b = ipFile.readByte(); if (b == 0x01 || b == 0x02) { long areaOffset = readLong3(offset + 1); if (areaOffset == 0) return "Unknown region"; else return readString(areaOffset); } else return readString(offset); } /** * @param offset * @return */ private String readArea(int offset) { mbb.position(offset); byte b = mbb.get(); if (b == 0x01 || b 
== 0x02) { int areaOffset = readInt3(); if (areaOffset == 0) return "Unknown region"; else return readString(areaOffset); } else return readString(offset); } /** * Reads a string ending in 0 from the offset * * @param offset * @return An error occurred while reading the string. An empty string was returned */ private String readString(long offset) { try { ipFile.seek(offset); int i; for (i = 0, buf[i] = ipFile.readByte(); buf[i] != 0; buf[++i] = ipFile .readByte()) ; if (i != 0) return IPSeekerUtils.getString(buf, 0, i, "GBK"); } catch (IOException e) { System.out.println(e.getMessage()); } return ""; } /** * Get a string ending in 0 from the offset position of the memory mapping file * * @param offset * @return */ private String readString(int offset) { try { mbb.position(offset); int i; for (i = 0, buf[i] = mbb.get(); buf[i] != 0; buf[++i] = mbb.get()) ; if (i != 0) return IPSeekerUtils.getString(buf, 0, i, "GBK"); } catch (IllegalArgumentException e) { System.out.println(e.getMessage()); } return ""; } public String getAddress(String ip) { String country = getCountry(ip).equals(" CZ88.NET") ? "" : getCountry(ip); String area = getArea(ip).equals(" CZ88.NET") ? "" : getArea(ip); String address = country + " " + area; return address.trim(); } /** * * It is used to encapsulate ip related information. At present, there are only two fields: the country and region where the ip is located * * * @author swallow */ public class IPLocation { public String country; public String area; public IPLocation() { country = area = ""; } public IPLocation getCopy() { IPLocation ret = new IPLocation(); ret.country = country; ret.area = area; return ret; } } /** * An IP range record includes not only country and region, but also start IP and end IP* * * * @author root */ public class IPEntry { public String beginIp; public String endIp; public String country; public String area; public IPEntry() { beginIp = endIp = country = area = ""; } public String toString() { return this.area + " " + this.country + "IP Χ:" + this.beginIp + "-" + this.endIp; } } /** * Operation tool class * * @author root * */ public static class IPSeekerUtils { /** * Get the byte array form from the string form of ip * * @param ip * ip in string form * @return ip in byte array */ public static byte[] getIpByteArrayFromString(String ip) { byte[] ret = new byte[4]; java.util.StringTokenizer st = new java.util.StringTokenizer(ip, "."); try { ret[0] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF); ret[1] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF); ret[2] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF); ret[3] = (byte) (Integer.parseInt(st.nextToken()) & 0xFF); } catch (Exception e) { System.out.println(e.getMessage()); } return ret; } /** * Encode and convert the original string. If it fails, return the original string * * @param s * Original string * @param srcEncoding * Source coding method * @param destEncoding * Target coding method * @return Failed to convert the encoded string. 
The original string is returned */ public static String getString(String s, String srcEncoding, String destEncoding) { try { return new String(s.getBytes(srcEncoding), destEncoding); } catch (UnsupportedEncodingException e) { return s; } } /** * Converts an array of bytes into a string according to some encoding * * @param b * Byte array * @param encoding * Coding mode * @return If encoding is not supported, a default encoded string is returned */ public static String getString(byte[] b, String encoding) { try { return new String(b, encoding); } catch (UnsupportedEncodingException e) { return new String(b); } } /** * Converts an array of bytes into a string according to some encoding * * @param b * Byte array * @param offset * Start position to convert * @param len * Length to convert * @param encoding * Coding mode * @return If encoding is not supported, a default encoded string is returned */ public static String getString(byte[] b, int offset, int len, String encoding) { try { return new String(b, offset, len, encoding); } catch (UnsupportedEncodingException e) { return new String(b, offset, len); } } /** * @param ip * ip Byte array form of * @return ip in string form */ public static String getIpStringFromBytes(byte[] ip) { StringBuffer sb = new StringBuffer(); sb.append(ip[0] & 0xFF); sb.append('.'); sb.append(ip[1] & 0xFF); sb.append('.'); sb.append(ip[2] & 0xFF); sb.append('.'); sb.append(ip[3] & 0xFF); return sb.toString(); } } /** * Get a list of all ip address sets * * @return */ public List<String> getAllIp() { List<String> list = new ArrayList<String>(); byte[] buf = new byte[4]; for (long i = ipBegin; i < ipEnd; i += IP_RECORD_LENGTH) { try { this.readIP(this.readLong3(i + 4), buf); // Read the ip, and finally put the ip into the buf String ip = IPSeekerUtils.getIpStringFromBytes(buf); list.add(ip); } catch (Exception e) { // nothing } } return list; } }
-
Configuration file qqwry.dat:
Upload the qqwry.dat data file that the code depends on to the resource directory
No customization required
If a file-not-found error is reported, or the program runs but returns no results, the problem is with the directory of the configuration file: it must be an English (ASCII) path
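The reason is that `IPSeeker.class.getResource("/qqwry.dat").getFile()` returns a URL-encoded path; any non-ASCII directory name becomes `%XX` escapes that `RandomAccessFile` cannot open. A hedged workaround (an assumption, not part of the original code) is to decode the path first:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ResourcePathDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String rawPath = ResourcePathDemo.class.getResource("/qqwry.dat").getFile();
        // Non-ASCII directories arrive URL-encoded, e.g. "/%E4%B8%AD%E6%96%87/qqwry.dat";
        // decoding restores the real filesystem path.
        String decodedPath = URLDecoder.decode(rawPath, "UTF-8");
        System.out.println(decodedPath);
    }
}
```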
-
Test IPSeeker code
Create a new TestIPSeeker class: test whether there is a problem with ip conversion
No customization required
Tip: the configuration file must be an English directory
public class TestIPSeeker {
    public static void main(String[] args) {
        IPSeeker ipSeeker = IPSeeker.getInstance();
        System.out.println(ipSeeker.getCountry("120.197.87.216"));
        System.out.println(ipSeeker.getCountry("115.239.210.27"));
        System.out.println(ipSeeker.getCountry("255.255.255.255"));
    }
}
-
Constant class GlobalConstants coding
New GlobalConstants class: contains the values of commonly used global variables
No customization required
public class GlobalConstants {
    // Milliseconds per day
    public static final int DAY_OF_MILLISECONDS = 86400000;
    // Name of the runtime date variable
    public static final String RUNNING_DATE_PARAMES = "RUNNING_DATE";
    // Default value
    public static final String DEFAULT_VALUE = "unknown";
    // Value meaning "all" in the dimension information tables
    public static final String VALUE_OF_ALL = "all";
    // Prefix of the defined output collector
    public static final String OUTPUT_COLLECTOR_KEY_PREFIX = "collector_";
    // Name of the report database connection configuration
    public static final String WAREHOUSE_OF_REPORT = "report";
    // Key of the JDBC batch size
    public static final String JDBC_BATCH_NUMBER = "mysql.batch.number";
    // Default batch size
    public static final String DEFAULT_JDBC_BATCH_NUMBER = "500";
    // Driver name template
    public static final String JDBC_DRIVER = "mysql.%s.driver";
    // JDBC URL template
    public static final String JDBC_URL = "mysql.%s.url";
    // Username template
    public static final String JDBC_USERNAME = "mysql.%s.username";
    // Password template
    public static final String JDBC_PASSWORD = "mysql.%s.password";
}
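The `mysql.%s.*` constants are format templates: the `%s` placeholder is filled with a database name such as `WAREHOUSE_OF_REPORT` to build the matching configuration key. A minimal usage sketch (the surrounding configuration lookup is assumed, not shown in the original):

```java
public class JdbcKeyDemo {
    public static void main(String[] args) {
        // Hypothetical illustration: build the configuration key for the "report" database.
        String driverKey = String.format(GlobalConstants.JDBC_DRIVER,
                GlobalConstants.WAREHOUSE_OF_REPORT);
        System.out.println(driverKey); // prints "mysql.report.driver"
    }
}
```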
-
IPSeekerExt coding:
Create a new IPSeekerExt class as a subclass of IPSeeker.
Enhance IP resolution:
-
When an IP is resolved, the final result returns the country name, province name and city name as separate fields
-
If it is a foreign IP, the result is set directly to unknown unknown unknown
-
If it is a domestic IP that cannot be parsed further, the result is set to China unknown unknown
-
The value unknown is read from the constant class GlobalConstants
No customization required
public class IPSeekerExt extends IPSeeker {
    private RegionInfo DEFAULT_INFO = new RegionInfo();

    /**
     * Resolve the ip address and return the country, province and city information
     * corresponding to the ip address.<br/>
     * If the ip resolution fails, the default value is returned directly.
     *
     * @param ip the ip address to resolve, in the format 120.197.87.216
     * @return
     */
    public RegionInfo analyticIp(String ip) {
        if (ip == null || ip.trim().isEmpty()) {
            return DEFAULT_INFO;
        }
        RegionInfo info = new RegionInfo();
        try {
            String country = super.getCountry(ip);
            // qqwry.dat returns GBK Chinese text, so the literals below must stay Chinese
            if ("局域网".equals(country)) { // "局域网" = LAN
                info.setCountry("中国");     // China
                info.setProvince("上海市");  // Shanghai
            } else if (country != null && !country.trim().isEmpty()) {
                // The ip is resolvable
                country = country.trim();
                int length = country.length();
                int index = country.indexOf('省'); // '省' = province
                if (index > 0) {
                    // The ip belongs to one of the 23 provinces; country has the format
                    // xxx省(xxx市)(xxx县/区)
                    info.setCountry("中国");
                    if (index == length - 1) {
                        info.setProvince(country); // set province, format: 广东省
                    } else {
                        // Format: 广东省广州市
                        info.setProvince(country.substring(0, index + 1)); // set province
                        int index2 = country.indexOf('市', index); // '市' = city
                        if (index2 > 0) {
                            info.setCity(country.substring(index + 1,
                                    Math.min(index2 + 1, length))); // set city
                        }
                    }
                } else {
                    // The five autonomous regions, four municipalities and two special
                    // administrative regions
                    String flag = country.substring(0, 2); // first two characters
                    switch (flag) {
                    case "内蒙": // Inner Mongolia
                        info.setCountry("中国");
                        info.setProvince("内蒙古自治区"); // Inner Mongolia Autonomous Region
                        country = country.substring(3);
                        if (country != null && !country.isEmpty()) {
                            index = country.indexOf('市');
                            if (index > 0) {
                                info.setCity(country.substring(0,
                                        Math.min(index + 1, country.length()))); // set city
                            }
                        }
                        break;
                    case "广西": // Guangxi
                    case "西藏": // Tibet
                    case "宁夏": // Ningxia
                    case "新疆": // Xinjiang
                        info.setCountry("中国");
                        info.setProvince(flag);
                        country = country.substring(2);
                        if (country != null && !country.isEmpty()) {
                            index = country.indexOf('市');
                            if (index > 0) {
                                info.setCity(country.substring(0,
                                        Math.min(index + 1, country.length()))); // set city
                            }
                        }
                        break;
                    case "上海": // Shanghai
                    case "北京": // Beijing
                    case "天津": // Tianjin
                    case "重庆": // Chongqing
                        info.setCountry("中国");
                        info.setProvince(flag + "市");
                        country = country.substring(3); // remove the municipality name
                        if (country != null && !country.isEmpty()) {
                            index = country.indexOf('区'); // '区' = district
                            if (index > 0) {
                                char ch = country.charAt(index - 1);
                                // Skip false matches such as 校区 (campus) and 小区 (residential area)
                                if (ch != '校' && ch != '小') {
                                    info.setCity(country.substring(0,
                                            Math.min(index + 1, country.length()))); // set district
                                }
                            }
                            if (RegionInfo.DEFAULT_VALUE.equals(info.getCity())) {
                                // city is still the default value; try '县' (county)
                                index = country.indexOf('县');
                                if (index > 0) {
                                    info.setCity(country.substring(0,
                                            Math.min(index + 1, country.length()))); // set county
                                }
                            }
                        }
                        break;
                    case "香港": // Hong Kong
                    case "澳门": // Macao
                        info.setCountry("中国");
                        info.setProvince(flag + "特别行政区"); // special administrative region
                        break;
                    default:
                        break;
                    }
                }
            }
        } catch (Exception e) {
            // An exception occurred during parsing
            e.printStackTrace();
        }
        return info;
    }

    /**
     * Model of the region an ip belongs to
     *
     * @author root
     */
    public static class RegionInfo {
        public static final String DEFAULT_VALUE = GlobalConstants.DEFAULT_VALUE; // default value
        private String country = DEFAULT_VALUE;  // country
        private String province = DEFAULT_VALUE; // province
        private String city = DEFAULT_VALUE;     // city

        public String getCountry() { return country; }
        public void setCountry(String country) { this.country = country; }
        public String getProvince() { return province; }
        public void setProvince(String province) { this.province = province; }
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }

        @Override
        public String toString() {
            return "RegionInfo [country=" + country + ", province=" + province
                    + ", city=" + city + "]";
        }
    }
}
-
-
Testing the IPSeekerExt class
Create a new TestIPSeekerExt class: check whether the obtained results meet expectations
public class TestIPSeekerExt {
    public static void main(String[] args) {
        // Get the region information of a single IP
        IPSeekerExt ipSeekerExt = new IPSeekerExt();
        RegionInfo info = ipSeekerExt.analyticIp("114.114.114.114");
        System.out.println(info);
        // Get the region information of an unresolvable IP
        info = ipSeekerExt.analyticIp("255.255.255.255");
        System.out.println(info);
        // Get the region information of all IP addresses
        // List<String> ips = ipSeekerExt.getAllIp();
        // for (String ip : ips) {
        //     System.out.println(ip + " --- " + ipSeekerExt.analyticIp(ip));
        // }
    }
}
-
UserAgentUtil class coding
Create a new UserAgentUtil class file: parse the browser user agent. It relies on the uasparser jar from cz.mallat.uasparser, pulled in as a dependency through pom.xml.
No customization required
/** * The tool class that parses the user agent of the browser calls the uasparser jar file internally * @author root */ public class UserAgentUtil { static UASparser uasParser = null; // static code block to initialize the uasParser object static { try { uasParser = new UASparser(OnlineUpdater.getVendoredInputStream()); } catch (IOException e) { e.printStackTrace(); } } /** * Parse the user agent string of the browser and return the UserAgentInfo object< br/> * If the user agent is null, null is returned. If the parsing fails, null is also returned directly. * * @param userAgent user agent string to parse * @return Returns a specific value */ public static UserAgentInfo analyticUserAgent(String userAgent) { UserAgentInfo result = null; if (!(userAgent == null || userAgent.trim().isEmpty())) { // At this point, the userAgent is not null and does not consist of all spaces try { cz.mallat.uasparser.UserAgentInfo info = null; info = uasParser.parse(userAgent); result = new UserAgentInfo(); result.setBrowserName(info.getUaFamily()); result.setBrowserVersion(info.getBrowserVersionInfo()); result.setOsName(info.getOsFamily()); result.setOsVersion(info.getOsName()); } catch (IOException e) { // An exception occurred, set the return value to null result = null; } } return result; } /** * Browser information model object after internal parsing * * @author root * */ public static class UserAgentInfo { private String browserName; // Browser name private String browserVersion; // Browser version number private String osName; // Operating system name private String osVersion; // Operating system version number public String getBrowserName() { return browserName; } public void setBrowserName(String browserName) { this.browserName = browserName; } public String getBrowserVersion() { return browserVersion; } public void setBrowserVersion(String browserVersion) { this.browserVersion = browserVersion; } public String getOsName() { return osName; } public void setOsName(String osName) { this.osName = osName; } public String getOsVersion() { return osVersion; } public void setOsVersion(String osVersion) { this.osVersion = osVersion; } @Override public String toString() { return "UserAgentInfo [browserName=" + browserName + ", browserVersion=" + browserVersion + ", osName=" + osName + ", osVersion=" + osVersion + "]"; } } }
-
UserAgentUtil test class coding
Create a new TestUserAgentUtil class file: check whether the parsing result is correct
No customization required
public class TestUserAgentUtil {
    public static void main(String[] args) {
        String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36";
        // userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; GWX:QUALIFIED; rv:11.0) like Gecko";
        // userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0";
        UserAgentInfo info = UserAgentUtil.analyticUserAgent(userAgent);
        System.out.println(info);
    }
}
-
Create a new date type enumeration class DateEnum:
Create a new enumeration class DateEnum file: it defines the date types that the time tool class uses when extracting time information
No customization required
/**
 * Date type enumeration class
 *
 * @author root
 */
public enum DateEnum {
    YEAR("year"), SEASON("season"), MONTH("month"), WEEK("week"), DAY("day"), HOUR("hour");

    public final String name;

    private DateEnum(String name) {
        this.name = name;
    }

    /**
     * Get the corresponding type object according to the value of the name attribute
     *
     * @param name
     * @return
     */
    public static DateEnum valueOfName(String name) {
        for (DateEnum type : values()) {
            if (type.name.equals(name)) {
                return type;
            }
        }
        return null;
    }
}
-
Create a new time control tool class TimeUtil:
New tool class file TimeUtil: converts the times in the log file into timestamps
No customization required
public class TimeUtil { public static final String DATE_FORMAT = "yyyy-MM-dd"; /** * Get date format string data of yesterday * * @return */ public static String getYesterday() { return getYesterday(DATE_FORMAT); } /** * Get date format string data of the current day * * @return */ public static String getday() { return getday(DATE_FORMAT); } /** * Gets the time string in the corresponding format * * @param pattern * @return */ public static String getYesterday(String pattern) { SimpleDateFormat sdf = new SimpleDateFormat(pattern); Calendar calendar = Calendar.getInstance(); calendar.add(Calendar.DAY_OF_YEAR, -1); return sdf.format(calendar.getTime()); } /** * Gets the time string in the corresponding format * * @param pattern * @return */ public static String getday(String pattern) { SimpleDateFormat sdf = new SimpleDateFormat(pattern); Calendar calendar = Calendar.getInstance(); calendar.add(Calendar.DAY_OF_YEAR, 0); return sdf.format(calendar.getTime()); } /** * Judge whether the input parameter is a valid time format data * * @param input * @return */ public static boolean isValidateRunningDate(String input) { Matcher matcher = null; boolean result = false; String regex = "[0-9]{4}-[0-9]{2}-[0-9]{2}"; if (input != null && !input.isEmpty()) { Pattern pattern = Pattern.compile(regex); matcher = pattern.matcher(input); } if (matcher != null) { result = matcher.matches(); } return result; } /** * Converts a time string in yyyy MM DD format to a timestamp * * @param input * @return */ public static long parseString2Long(String input) { return parseString2Long(input, DATE_FORMAT); } /** * Converts a time string in the specified format to a timestamp * * @param input * @param pattern * @return */ public static long parseString2Long(String input, String pattern) { Date date = null; try { date = new SimpleDateFormat(pattern).parse(input); } catch (ParseException e) { throw new RuntimeException(e); } return date.getTime(); } /** * Converts a timestamp to a time string in yyyy MM DD format * * @param input * @return */ public static String parseLong2String(long input) { return parseLong2String(input, DATE_FORMAT); } /** * Converts a timestamp to a string in the specified format * * @param input * @param pattern * @return */ public static String parseLong2String(long input, String pattern) { Calendar calendar = Calendar.getInstance(); calendar.setTimeInMillis(input); return new SimpleDateFormat(pattern).format(calendar.getTime()); } /** * Convert the nginx server time to a timestamp. If the parsing fails, return - 1 * * @param input 1459581125.573 * @return */ public static long parseNginxServerTime2Long(String input) { Date date = parseNginxServerTime2Date(input); return date == null ? -1L : date.getTime(); } /** * Converts the nginx server time to a date object. 
If the resolution fails, null is returned * * @param input * Format: 1449410796.976 * @return */ public static Date parseNginxServerTime2Date(String input) { if (StringUtils.isNotBlank(input)) { try { long timestamp = Double.valueOf(Double.valueOf(input.trim()) * 1000).longValue(); Calendar calendar = Calendar.getInstance(); calendar.setTimeInMillis(timestamp); return calendar.getTime(); } catch (Exception e) { // nothing } } return null; } /** * Obtain the required time information from the timestamp * * @param time * time stamp * @param type * @return If there is no matching type, an exception message is thrown */ public static int getDateInfo(long time, DateEnum type) { Calendar calendar = Calendar.getInstance(); calendar.setTimeInMillis(time); if (DateEnum.YEAR.equals(type)) { // Year information required return calendar.get(Calendar.YEAR); } else if (DateEnum.SEASON.equals(type)) { // Quarterly information required int month = calendar.get(Calendar.MONTH) + 1; if (month % 3 == 0) { return month / 3; } return month / 3 + 1; } else if (DateEnum.MONTH.equals(type)) { // Month information required return calendar.get(Calendar.MONTH) + 1; } else if (DateEnum.WEEK.equals(type)) { // Week information required return calendar.get(Calendar.WEEK_OF_YEAR); } else if (DateEnum.DAY.equals(type)) { return calendar.get(Calendar.DAY_OF_MONTH); } else if (DateEnum.HOUR.equals(type)) { return calendar.get(Calendar.HOUR_OF_DAY); } throw new RuntimeException("There is no corresponding time type:" + type); } /** * Gets the time stamp value of the first day of the specified week * * @param time * @return */ public static long getFirstDayOfThisWeek(long time) { Calendar cal = Calendar.getInstance(); cal.setTimeInMillis(time); cal.set(Calendar.DAY_OF_WEEK, 1); cal.set(Calendar.HOUR_OF_DAY, 0); cal.set(Calendar.MINUTE, 0); cal.set(Calendar.SECOND, 0); cal.set(Calendar.MILLISECOND, 0); return cal.getTimeInMillis(); } }
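A short usage sketch (illustrative only) of the TimeUtil helpers above: converting an nginx server time such as 1449410796.976 to a timestamp and extracting date parts from it.

```java
public class TimeUtilDemo {
    public static void main(String[] args) {
        // nginx logs seconds with millisecond fractions; parse to epoch milliseconds
        long ts = TimeUtil.parseNginxServerTime2Long("1449410796.976");
        System.out.println(ts); // 1449410796976

        // Format back to a yyyy-MM-dd string and pull out an individual field
        System.out.println(TimeUtil.parseLong2String(ts));
        System.out.println(TimeUtil.getDateInfo(ts, DateEnum.MONTH));
    }
}
```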
-
Code writing of EventLogConstants class:
Create a new EventLogConstants class file: it defines the names of the user data parameters collected by the log collection client and the structure information of the event_logs HBase table; each user data parameter name is a column name of event_logs
No customization required
public class EventLogConstants { /** Event enumeration class. Specifies the name of the event * * @author root * **/ public static enum EventEnum { LAUNCH(1, "launch event", "e_l"), // launch event, indicating the first visit PAGEVIEW(2, "page view event", "e_pv"), // Page browsing events CHARGEREQUEST(3, "charge request event", "e_crt"), // Order production event CHARGESUCCESS(4, "charge success event", "e_cs"), // Event triggered when an order is successfully paid CHARGEREFUND(5, "charge refund event", "e_cr"), // Order refund event EVENT(6, "event duration event", "e_e") // event ; public final int id; // id unique identification public final String name; // name public final String alias; // Alias, short for data collection private EventEnum(int id, String name, String alias) { this.id = id; this.name = name; this.alias = alias; } /** * Get the event enumeration object matching the alias. If there is no matching value in the end, null will be returned directly. * * @param alias * @return **/ public static EventEnum valueOfAlias(String alias) { for (EventEnum event : values()) { if (event.alias.equals(alias)) { return event; } } return null; } } // Table name public static final String HBASE_NAME_EVENT_LOGS = "event"; // event_ Column cluster name of logs table public static final String EVENT_LOGS_FAMILY_NAME = "log"; // Log separator public static final String LOG_SEPARTIOR = "\\^A"; // User ip address public static final String LOG_COLUMN_NAME_IP = "ip"; // Server time public static final String LOG_COLUMN_NAME_SERVER_TIME = "s_time"; // Event name public static final String LOG_COLUMN_NAME_EVENT_NAME = "en"; // Version information of data collection side public static final String LOG_COLUMN_NAME_VERSION = "ver"; // User unique identifier public static final String LOG_COLUMN_NAME_UUID = "u_ud"; // Member unique identifier public static final String LOG_COLUMN_NAME_MEMBER_ID = "u_mid"; // Session id public static final String LOG_COLUMN_NAME_SESSION_ID = "u_sd"; // Client time public static final String LOG_COLUMN_NAME_CLIENT_TIME = "c_time"; // language public static final String LOG_COLUMN_NAME_LANGUAGE = "l"; // Browser user agent parameters public static final String LOG_COLUMN_NAME_USER_AGENT = "b_iev"; // Browser resolution size public static final String LOG_COLUMN_NAME_RESOLUTION = "b_rst"; // Define platform public static final String LOG_COLUMN_NAME_PLATFORM = "pl"; // Current url public static final String LOG_COLUMN_NAME_CURRENT_URL = "p_url"; // url of the previous page public static final String LOG_COLUMN_NAME_REFERRER_URL = "p_ref"; // title of the current page public static final String LOG_COLUMN_NAME_TITLE = "tt"; // Order id public static final String LOG_COLUMN_NAME_ORDER_ID = "oid"; // Order name public static final String LOG_COLUMN_NAME_ORDER_NAME = "on"; // Order amount public static final String LOG_COLUMN_NAME_ORDER_CURRENCY_AMOUNT = "cua"; // Order currency type public static final String LOG_COLUMN_NAME_ORDER_CURRENCY_TYPE = "cut"; // Order payment amount public static final String LOG_COLUMN_NAME_ORDER_PAYMENT_TYPE = "pt"; // category name public static final String LOG_COLUMN_NAME_EVENT_CATEGORY = "ca"; // action name public static final String LOG_COLUMN_NAME_EVENT_ACTION = "ac"; // kv prefix public static final String LOG_COLUMN_NAME_EVENT_KV_START = "kv_"; // duration public static final String LOG_COLUMN_NAME_EVENT_DURATION = "du"; // Operating system name public static final String LOG_COLUMN_NAME_OS_NAME = "os"; // Operating system version public 
static final String LOG_COLUMN_NAME_OS_VERSION = "os_v"; // Browser name public static final String LOG_COLUMN_NAME_BROWSER_NAME = "browser"; // Browser version public static final String LOG_COLUMN_NAME_BROWSER_VERSION = "browser_v"; // Country of ip address resolution public static final String LOG_COLUMN_NAME_COUNTRY = "country"; // Province of ip address resolution public static final String LOG_COLUMN_NAME_PROVINCE = "province"; // City of ip address resolution public static final String LOG_COLUMN_NAME_CITY = "city"; }
-
Coding of LoggerUtil class
Create a new LoggerUtil class file: a specific working class for processing log data
- Process log data logText and return the map collection of processing results
- If logText does not match the specified data format, an empty collection is returned directly
- Process ip address
- Process browser userAgent information
- Processing request parameters
No customization required
public class LoggerUtil { private static final Logger logger = Logger.getLogger(LoggerUtil.class); private static IPSeekerExt ipSeekerExt = new IPSeekerExt(); /** * Process log data logText and return the map set of processing results < br / > * If logText does not specify a data format, the collection of empty is returned directly * * 192.168.78.1^A1542816232.816^Anode01^A/log.gif?en=e_e&ca=event%E7%9A%84category%E5%90%8D%E7%A7%B0&ac=event%E7%9A%84action%E5%90%8D%E7%A7%B0 &ver=1&pl=website&sdk=js&u_ud=9061294F-721C-4838-83E3-B4F6E6EB3233&u_mid=zhangsan&u_sd=E9175784-A5D7-4692-806B-5FF384902D9D&c_time=1542781455523&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F60.0.3112.113%20Safari%2F537.36&b_rst=1536*864 * * @param logText * @return */ public static Map<String, String> handleLog(String logText) { Map<String, String> clientInfo = new HashMap<String, String>(); if (StringUtils.isNotBlank(logText)) { String[] splits = logText.trim().split(EventLogConstants.LOG_SEPARTIOR); if (splits.length == 4) { // Log format: IP ^ aserver time ^ Ahost^A request parameter clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_IP, splits[0].trim()); // Set ip // Set server time clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_SERVER_TIME, String.valueOf(TimeUtil.parseNginxServerTime2Long(splits[1].trim()))); int index = splits[3].indexOf("?"); if (index > -1) { String requestBody = splits[3].substring(index + 1); // Get the request parameters, that is, our collected data // Processing request parameters handleRequestBody(requestBody, clientInfo); // Handling userAgent handleUserAgent(clientInfo); // Process ip address handleIp(clientInfo); } else { // Abnormal data format clientInfo.clear(); } } } return clientInfo; } /** * Process ip address * * @param clientInfo */ private static void handleIp(Map<String,String> clientInfo) { if (clientInfo.containsKey(EventLogConstants.LOG_COLUMN_NAME_IP)) { String ip = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_IP); RegionInfo info = ipSeekerExt.analyticIp(ip); if (info != null) { clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_COUNTRY, info.getCountry()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_PROVINCE, info.getProvince()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_CITY, info.getCity()); } } } /** * Process browser userAgent information * * @param clientInfo */ private static void handleUserAgent(Map<String, String> clientInfo) { if (clientInfo.containsKey(EventLogConstants.LOG_COLUMN_NAME_USER_AGENT)) { UserAgentInfo info = UserAgentUtil.analyticUserAgent(clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_USER_AGENT)); if (info != null) { clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_NAME, info.getOsName()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_VERSION, info.getOsVersion()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_NAME, info.getBrowserName()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_VERSION, info.getBrowserVersion()); } } } /** * Processing request parameters * en=e_e&ca=event%E7%9A%84category%E5%90%8D%E7%A7%B0&ac=event%E7%9A%84action%E5%90%8D%E7%A7%B0 &ver=1&pl=website&sdk=js&u_ud=9061294F-721C-4838-83E3-B4F6E6EB3233&u_mid=zhangsan&u_sd=E9175784-A5D7-4692-806B-5FF384902D9D&c_time=1542781455523&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F60.0.3112.113%20Safari%2F537.36&b_rst=1536*864 * @param requestBody * @param clientInfo */ private 
static void handleRequestBody(String requestBody, Map<String, String> clientInfo) { if (StringUtils.isNotBlank(requestBody)) { String[] requestParams = requestBody.split("&"); for (String param : requestParams) { if (StringUtils.isNotBlank(param)) { int index = param.indexOf("="); if (index < 0) { logger.warn("Unable to parse parameters:" + param + ", The request parameter is:" + requestBody); continue; } String key = null, value = null; try { key = param.substring(0, index); value = URLDecoder.decode(param.substring(index + 1), "utf-8"); } catch (Exception e) { logger.warn("Exception in decoding operation", e); continue; } if (StringUtils.isNotBlank(key) && StringUtils.isNotBlank(value)) { clientInfo.put(key, value); } } } } } }
-
Verify LoggerUtil class:
Create a new TestLoggerUtil class file: check whether the parsing result is what you want
No customization required
public class TestLoggerUtil {
    public static void main(String[] args) {
        String log = "192.168.100.102^A1449411239.595^A192.168.239.8^A/log.gif?c_time=1449411240818&oid=orderid456&u_mid=zhangsan&pl=java_server&en=e_cr&sdk=jdk&ver=1";
        log = "192.168.100.102^A1449587515.394^A192.168.239.8^A/log.gif?en=e_pv&p_url=http%3A%2F%2Flocalhost%3A8080%2Fbf_track_jssdk%2Fdemo2.jsp&p_ref=http%3A%2F%2Flocalhost%3A8080%2Fbf_track_jssdk%2Fdemo.jsp&tt=%E6%B5%8B%E8%AF%95%E9%A1%B5%E9%9D%A22&ver=1&pl=website&sdk=js&u_ud=948AB94A-E1A5-4EED-BBB8-CEDB74B8B4D0&u_sd=9EF5D22F-5CCD-4290-AFCA-641672988F73&c_time=1449587517241&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%206.1%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F46.0.2490.71%20Safari%2F537.36&b_rst=1280*768";
        System.out.println(LoggerUtil.handleLog(log));
        System.out.println(IPSeekerExt.getInstance().getCountry("192.168.100.102"));
        System.out.println(IPSeekerExt.getInstance().getArea("192.168.100.102"));
    }
}
-
Code writing of AnalyserLogDataRunner
Create the AnalyserLogDataRunner class file; it calls the mapper class to read the HDFS files and write the data to HBase.
The input path is built from the run date (for example, -d 2019-06-01 reads /data/test/flume/20190601/):
Path inputPath = new Path("/data/test/flume/" + TimeUtil.parseLong2String(TimeUtil.parseString2Long(date), "yyyyMMdd") + "/");
public class AnalyserLogDataRunner implements Tool { private static final Logger logger = Logger .getLogger(AnalyserLogDataRunner.class); private Configuration conf = null; public static void main(String[] args) { try { ToolRunner.run(new Configuration(), new AnalyserLogDataRunner(), args); } catch (Exception e) { logger.error("Perform log parsing job abnormal", e); throw new RuntimeException(e); } } @Override public void setConf(Configuration conf) { conf = HBaseConfiguration.create(); conf.set("mapreduce.app-submission.corss-paltform", "true"); conf.set("mapreduce.framework.name", "local"); this.conf = HBaseConfiguration.create(conf); } @Override public Configuration getConf() { return this.conf; } @Override public int run(String[] args) throws Exception { Configuration conf = this.getConf(); this.processArgs(conf, args); Job job = Job.getInstance(conf, "analyser_logdata"); // Code is required to set local job submission and run the cluster // File jarFile = EJob.createTempJar("target/classes"); // ((JobConf) job.getConfiguration()).setJar(jarFile.toString()); // Set the local submit job, and the cluster runs. The code ends job.setJarByClass(AnalyserLogDataRunner.class); job.setMapperClass(AnalyserLogDataMapper.class); job.setMapOutputKeyClass(NullWritable.class); job.setMapOutputValueClass(Put.class); // Set reducer configuration // 1. Run on the cluster and print it as jar (the addDependencyJars parameter is required to be true, which is true by default) // TableMapReduceUtil.initTableReducerJob(EventLogConstants.HBASE_NAME_EVENT_LOGS, // null, job); // 2. For local operation, the parameter addDependencyJars is required to be false TableMapReduceUtil.initTableReducerJob( EventLogConstants.HBASE_NAME_EVENT_LOGS, null, job, null, null, null, null, false); job.setNumReduceTasks(0); // Set input path this.setJobInputPaths(job); return job.waitForCompletion(true) ? 0 : -1; } /** * Processing parameters * * @param conf * @param args * -d 2019-06-01 */ private void processArgs(Configuration conf, String[] args) { String date = null; for (int i = 0; i < args.length; i++) { if ("-d".equals(args[i])) { if (i + 1 < args.length) { date = args[++i]; break; } } } System.out.println("-----" + date); // The required date format is yyyy mm DD if (StringUtils.isBlank(date) || !TimeUtil.isValidateRunningDate(date)) { // date is an invalid time data date = TimeUtil.getday(); // The default time is yesterday System.out.println(date); } conf.set(GlobalConstants.RUNNING_DATE_PARAMES, date); } /** * Set the input path of the job * * @param job */ private void setJobInputPaths(Job job) { Configuration conf = job.getConfiguration(); FileSystem fs = null; try { fs = FileSystem.get(conf); String date = conf.get(GlobalConstants.RUNNING_DATE_PARAMES); Path inputPath = new Path("/data/test/flume/" + TimeUtil.parseLong2String( TimeUtil.parseString2Long(date), "yyyyMMdd") + "/"); System.out.println(inputPath); // Path inputPath = new Path("/log/" // + TimeUtil.parseLong2String( // TimeUtil.parseString2Long(date), "yyyyMMdd") // + "/"); if (fs.exists(inputPath)) { FileInputFormat.addInputPath(job, inputPath); } else { throw new RuntimeException("directory does not exist:" + inputPath); } } catch (IOException e) { throw new RuntimeException("set up job of mapreduce An exception occurred in the input path", e); } finally { if (fs != null) { try { fs.close(); } catch (IOException e) { // nothing } } } } }
-
Code writing of AnalyserLogDataMapper
Create an AnalyserLogDataMapper class file; it reads the HDFS files and parses them. Main functions: filter out bad data, parse logs, and create rowkeys from uuid, memberId and serverTime
No customization required
public class AnalyserLogDataMapper extends Mapper<LongWritable, Text, NullWritable, Put> { private final Logger logger = Logger.getLogger(AnalyserLogDataMapper.class); private int inputRecords, filterRecords, outputRecords; // It is mainly used for marking to facilitate viewing and filtering data private byte[] family = Bytes.toBytes(EventLogConstants.EVENT_LOGS_FAMILY_NAME); private CRC32 crc32 = new CRC32(); /** * 192.168.78.1^A1542816232.816^Anode01^A/log.gif?en=e_e&ca=event%E7%9A%84category%E5%90%8D%E7%A7%B0&ac=event%E7%9A%84action%E5%90%8D%E7%A7%B0 &ver=1&pl=website&sdk=js&u_ud=9061294F-721C-4838-83E3-B4F6E6EB3233&u_mid=zhangsan&u_sd=E9175784-A5D7-4692-806B-5FF384902D9D&c_time=1542781455523&l=zh-CN&b_iev=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F60.0.3112.113%20Safari%2F537.36&b_rst=1536*864 */ @Override protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { this.inputRecords++; this.logger.debug("Analyse data of :" + value); try { // Parse log Map<String, String> clientInfo = LoggerUtil.handleLog(value.toString()); // Filter failed data if (clientInfo.isEmpty()) { this.filterRecords++; return; } // Get event name String eventAliasName = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_EVENT_NAME); EventEnum event = EventEnum.valueOfAlias(eventAliasName); switch (event) { case LAUNCH: case PAGEVIEW: case CHARGEREQUEST: case CHARGEREFUND: case CHARGESUCCESS: case EVENT: // Processing data this.handleData(clientInfo, event, context); break; default: this.filterRecords++; this.logger.warn("The event cannot be resolved. The event name is:" + eventAliasName); } } catch (Exception e) { this.filterRecords++; this.logger.error("Processing data, sending exception, data:" + value, e); } } @Override protected void cleanup(Context context) throws IOException, InterruptedException { super.cleanup(context); logger.info("input data:" + this.inputRecords + ";output data:" + this.outputRecords + ";Filter data:" + this.filterRecords); } /** * Specific data processing methods * * @param clientInfo * @param context * @param event * @throws InterruptedException * @throws IOException */ private void handleData(Map<String, String> clientInfo, EventEnum event, Context context) throws IOException, InterruptedException { String uuid = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_UUID); String memberId = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_MEMBER_ID); String serverTime = clientInfo.get(EventLogConstants.LOG_COLUMN_NAME_SERVER_TIME); if (StringUtils.isNotBlank(serverTime)) { // Server time required is not empty clientInfo.remove(EventLogConstants.LOG_COLUMN_NAME_USER_AGENT); // Remove browser information String rowkey = this.generateRowKey(uuid, memberId, event.alias, serverTime); // timestamp // + // (uuid+memberid+event).crc Put put = new Put(Bytes.toBytes(rowkey)); for (Map.Entry<String, String> entry : clientInfo.entrySet()) { if (StringUtils.isNotBlank(entry.getKey()) && StringUtils.isNotBlank(entry.getValue())) { put.addColumn(family, Bytes.toBytes(entry.getKey()), Bytes.toBytes(entry.getValue())); } } context.write(NullWritable.get(), put); this.outputRecords++; } else { this.filterRecords++; } } /** * Create rowkey based on uuid memberid servertime * * @param uuid * @param memberId * @param eventAliasName * @param serverTime * @return */ private String generateRowKey(String uuid, String memberId, String eventAliasName, String serverTime) { StringBuilder sb = new 
StringBuilder(); sb.append(serverTime).append("_"); this.crc32.reset(); if (StringUtils.isNotBlank(uuid)) { this.crc32.update(uuid.getBytes()); } if (StringUtils.isNotBlank(memberId)) { this.crc32.update(memberId.getBytes()); } this.crc32.update(eventAliasName.getBytes()); sb.append(this.crc32.getValue() % 100000000L); return sb.toString(); } }
Cluster configuration files:
You need to replace them with your own configuration files in the resource directory:
core-site.xml, hbase-site.xml, hdfs-site.xml, yarn-site.xml
ETL packaging test
Testing in Linux
Upload the jar package to the /root/tmp/jar directory on Linux
cd /root/tmp/jar
Run the jar package test
java -jar ETL.jar
verification
# Log in to the HBase shell
hbase shell
# Scan the table
scan 'event'
HBase test data loading
- The idea of storing simulated data into HBase:
- First, create a java class to store the simulated user id, session id and system time;
- Create the HBase table object;
- Create a tool class for resolving ip; obtain the ip address through the tool class and finally output the province, city and country field data;
- Create a method that randomly generates the system time, event, web address, browser, platform and operating system. The random range can be specified; it finally generates and outputs the browser, browser_v, en, os, os_v, p_url and pl field data;
- Assemble each piece of output data into a Put and return the Put object;
- Use the Table.put method to load the data into the HBase table;
Create a new TestDataMaker test file:
Need to customize TN = "event"
public class TestDataMaker { //Table configuration private static String TN = "event"; private static Configuration conf; private static Connection connection; private static Admin admin; private static Table table; public static void main(String[] args) throws Exception { TestDataMaker tDataMaker = new TestDataMaker(); Random r = new Random(); conf = HBaseConfiguration.create(); connection = ConnectionFactory.createConnection(conf); Admin admin = connection.getAdmin(); table = connection.getTable(TableName.valueOf(TN)); // User ID u_ud randomly generates 8 bits String uuid = String.format("%08d", r.nextInt(99999999)); // Member ID u_mid randomly generates 8 bits String memberId = String.format("%08d", r.nextInt(99999999)); List<Put> puts = new ArrayList<Put>(); for (int i = 0; i < 100; i++) { if(i%5==0) { uuid = String.format("%08d", r.nextInt(99999999)); memberId = String.format("%08d", r.nextInt(99999999)); } if(i%6==0) { uuid = String.format("%08d", r.nextInt(99999999)); memberId = String.format("%08d", r.nextInt(99999999)); } SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd"); Calendar calendar = Calendar.getInstance(); calendar.add(Calendar.DATE, -1); Date d = tDataMaker.getDate(df.format(calendar.getTime())); String serverTime = ""+d.getTime(); Put put = tDataMaker.putMaker(uuid, memberId, serverTime); puts.add(put); } table.put(puts); } Random r = new Random(); private static IPSeekerExt ipSeekerExt = new IPSeekerExt(); /** * test data * day Date: * lognum Number of logs */ public Put putMaker(String uuid, String memberId, String serverTime) { Map<String, Put> map = new HashMap<String, Put>(); byte[] family = Bytes.toBytes(EventLogConstants.EVENT_LOGS_FAMILY_NAME); // Parse log Map<String, String> clientInfo = LoggerUtil.handleLog("......"); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_SERVER_TIME, serverTime); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_UUID, uuid); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_PLATFORM, "website"); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_EVENT_NAME, EventNames[r.nextInt(EventNames.length)]); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_SESSION_ID, SessionIDs[r.nextInt(SessionIDs.length)]); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_CURRENT_URL, CurrentURLs[r.nextInt(CurrentURLs.length)]); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_NAME, this.getOsName()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_OS_VERSION, this.getOsVersion()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_NAME, this.getBrowserName()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_BROWSER_VERSION, this.getBrowserVersion()); String ip = IPs[r.nextInt(IPs.length)]; RegionInfo info = ipSeekerExt.analyticIp(ip); if (info != null) { clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_COUNTRY, info.getCountry()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_PROVINCE, info.getProvince()); clientInfo.put(EventLogConstants.LOG_COLUMN_NAME_CITY, info.getCity()); } String eventName = EventNames[r.nextInt(EventNames.length)]; //Generate rowkey String rowkey = this.generateRowKey(uuid, memberId, eventName, serverTime); Put put = new Put(Bytes.toBytes(rowkey)); for (Map.Entry<String, String> entry : clientInfo.entrySet()) { put.addColumn(family, Bytes.toBytes(entry.getKey()), Bytes.toBytes(entry.getValue())); } return put; } private String[] CurrentURLs = new String[]{"http://www.jd.com", "http://www.tmall.com","http://www.sina.com","http://www.weibo.com"}; private String[] SessionIDs = new 
String[]{"1A3B4F83-6357-4A64-8527-F092169746D3", "12344F83-6357-4A64-8527-F09216974234","1A3B4F83-6357-4A64-8527-F092169746D8"}; private String[] IPs = new String[]{"58.42.245.255","39.67.154.255", "23.13.191.255","14.197.148.38","14.197.149.137","14.197.201.202","14.197.243.254"}; private String[] EventNames = new String[]{"e_l","e_pv"}; private String[] BrowserNames = new String[]{"FireFox","Chrome","aoyou","360"}; /** * Gets the random browser name * @return */ private String getBrowserName() { return BrowserNames[r.nextInt(BrowserNames.length)]; } /** * Get random browser version information * @return */ private String getBrowserVersion() { return (""+r.nextInt(9)); } /** * Obtain random system version information * @return */ private String getOsVersion() { return (""+r.nextInt(3)); } private String[] OsNames = new String[]{"window","linux","ios"}; /** * Obtain random system information * @return */ private String getOsName() { return OsNames[r.nextInt(OsNames.length)]; } private CRC32 crc32 = new CRC32(); /** * Create rowkey based on uuid memberid servertime * @param uuid * @param memberId * @param eventAliasName * @param serverTime * @return */ private String generateRowKey(String uuid, String memberId, String eventAliasName, String serverTime) { StringBuilder sb = new StringBuilder(); sb.append(serverTime).append("_"); this.crc32.reset(); if (StringUtils.isNotBlank(uuid)) { this.crc32.update(uuid.getBytes()); } if (StringUtils.isNotBlank(memberId)) { this.crc32.update(memberId.getBytes()); } this.crc32.update(eventAliasName.getBytes()); sb.append(this.crc32.getValue() % 100000000L); return sb.toString(); } SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss"); /** * Random generation time * @param str Mm / DD / 20160101 * @return */ public Date getDate(String str) { str = str + String.format("%02d%02d%02d", new Object[]{r.nextInt(24), r.nextInt(60), r.nextInt(60)}); Date d = new Date(); try { d = sdf.parse(str); } catch (ParseException e) { e.printStackTrace(); } return d; } }
Data query
Create a new HbaseScan class file: query the data in HBase
To be customized: scanTable("event", null, null);
public class HbaseScan { private static Configuration conf; private static Connection connection; private static Admin admin; static { conf = HBaseConfiguration.create(); try { connection = ConnectionFactory.createConnection(conf); } catch (IOException e) { e.printStackTrace(); } } /** * Get HBase administrator * @return * @throws IOException */ public static Admin getAdmin() throws IOException { return connection.getAdmin(); } /** * Scan table * @param tableName * @param startRow Starting position * @param stopRow End position */ public static void scanTable(String tableName, String startRow, String stopRow) { if(tableName == null || tableName.length() == 0) { System.out.println("Please enter the table name correctly!"); return; } Table table = null; try { table = connection.getTable(TableName.valueOf(tableName)); Scan scan = new Scan(); // Left closed right open if(startRow != null && stopRow != null) { scan.withStartRow(Bytes.toBytes(startRow)); scan.withStopRow(Bytes.toBytes(stopRow)); } else { startRow = "invalid"; stopRow = "invalid"; } ResultScanner resultScanner = table.getScanner(scan); Iterator<Result> iterator = resultScanner.iterator(); System.out.println("scan\t startRow: " + startRow + "\t stopRow: " + stopRow); System.out.println("RowKey\tTimeStamp\tcolumnFamilyName\tcolumnQualifierName"); while(iterator.hasNext()) { Result result = iterator.next(); showCell(result); } } catch (IOException e) { e.printStackTrace(); } finally { closeTable(table); } } /** * Format output * @param result */ public static void showCell(Result result) { Cell[] cells = result.rawCells(); for (Cell cell : cells) { System.out.print(Bytes.toString(CellUtil.cloneRow(cell)) + "\t"); System.out.print(cell.getTimestamp() + "\t"); String columnFamilyName = Bytes.toString(CellUtil.cloneFamily(cell)); String columnQualifierName = Bytes.toString(CellUtil.cloneQualifier(cell)); String value = Bytes.toString(CellUtil.cloneValue(cell)); System.out.println(columnFamilyName + ":" + columnQualifierName + "\t\t\t" + value); } } /** * Close connection */ public static void closeConn() { try { if (null != admin){ admin.close(); connection.close(); } } catch (IOException e) { e.printStackTrace(); } } /** * Close table * @param table */ public static void closeTable(Table table) { if(table != null) { try { table.close(); } catch (IOException e) { e.printStackTrace(); } } } /** * Gets the specified row * @param tableName * @param rowKey * @param colFamily * @param col */ public static void getRow(String tableName, String rowKey, String colFamily, String col) { Table table = null; try { table = connection.getTable(TableName.valueOf(tableName)); Get g = new Get(Bytes.toBytes(rowKey)); // Gets the specified column family data if(col == null && colFamily != null) { g.addFamily(Bytes.toBytes(colFamily)); } else if(col != null && colFamily != null) { // Gets the specified column data g.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col)); } Result result = table.get(g); System.out.println("getRow\t"); System.out.println("RowKey\tTimeStamp\tcolumnFamilyName\tcolumnQualifierName"); showCell(result); } catch (IOException e) { e.printStackTrace(); } finally { closeTable(table); } } public static void main(String[] args) throws IOException { admin = getAdmin(); scanTable("event", null, null); // getRow("nevent", "1542988607000_9327110", "log", "city"); closeConn(); } }
Data analysis
Design the table structure according to the requirements and objectives. We use MapReduce to compute the statistics grouped by time range (year, month, and day).
Idea:
a) A dimension is a particular angle or perspective on the data; here we count along the time dimension. For example, to count the pv of every month and every day in 2017, the dimension can be expressed as year, year-month, and year-month-day (2017, 2017-01, 2017-01-01).
b) The Mapper aggregates the data to the Reducer, keyed by the different dimensions.
c) The Reducer receives the data grouped by each dimension, summarizes it, and outputs the result.
d) According to the business requirements, the Reducer output is written to the target store through a custom OutputFormat. (A minimal Mapper/Reducer sketch follows the data input/output notes below.)
**Data input:** HBase
**Data output:** Mysql
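To make steps b) and c) concrete, here is a minimal, self-contained sketch of a pv count keyed by the day dimension. It assumes plain tab-separated input whose first field is a millisecond timestamp; the real project keys on the dimension classes built later in this section and writes through a custom OutputFormat rather than text output.

```java
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: extract the time dimension (year-month-day) from each log line and emit (dimension, 1).
public class PvByDateMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumption: the first tab-separated field is a millisecond timestamp.
        String[] fields = value.toString().split("\t");
        long ts = Long.parseLong(fields[0]);
        outKey.set(new SimpleDateFormat("yyyy-MM-dd").format(new Date(ts)));
        context.write(outKey, ONE);
    }
}

// Reducer: sum the pv counts that were grouped under the same time dimension.
class PvByDateReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```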
Data source structure in HBase:
| label | Example & Illustration |
|---|---|
| rowkey | timestamp + crc32(uid + mid + en) |
| log | Column family storing the log information: en event name (eg: e_pv); ver version number (eg: 0.0.1); pl platform (eg: website); sdk sdk type (eg: js); b_rst browser resolution (eg: 1800*678); b_iev browser information (useragent); u_ud user/visitor unique identifier; l client language; u_mid member id, consistent with the business system; u_sd session id; c_time client time; p_url url of the current page; p_ref url of the previous page; tt page title; ca event category name; ac event action name; kv_* custom event attributes; du event duration; oid order id; on order name; cua payment amount; cut payment currency type; pt payment method |
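To make the rowkey layout concrete, the following small program reproduces the serverTime_crc32 pattern from the generator code above; the sample values are illustrative.

```java
import java.util.zip.CRC32;

public class RowKeyDemo {
    public static void main(String[] args) {
        String serverTime = "1542988607000";                      // event server time (ms)
        String uuid = "1A3B4F83-6357-4A64-8527-F092169746D3";     // u_ud
        String memberId = "4027";                                 // u_mid (sample value)
        String eventName = "e_pv";                                // en

        CRC32 crc32 = new CRC32();
        crc32.update(uuid.getBytes());
        crc32.update(memberId.getBytes());
        crc32.update(eventName.getBytes());

        // rowkey = timestamp + "_" + (crc32 of uid + mid + en, bounded to 8 digits)
        String rowKey = serverTime + "_" + (crc32.getValue() % 100000000L);
        System.out.println(rowKey); // e.g. 1542988607000_XXXXXXXX
    }
}
```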
a) Given the goal, consider whether the existing data can support achieving it;
b) Based on the target data structure, design the Mysql table structure and create the tables;
c) Think about which functional modules the code needs, and set up a package structure corresponding to those modules;
d) Statistics must be described along some dimension (perspective), so construct dimension classes. For example, aggregating all data keyed by the combination of "platform" and "browser" lets us count the related results for that combination over the year;
e) Write a custom OutputFormat that writes the results to Mysql;
f) Create the related utility classes.
Mysql table structure design
We save the analysis result data to Mysql to facilitate query and display on the Web.
-
MySql storage structure
In mysql we use three kinds of tables: dimension information tables, statistical analysis result tables, and analysis auxiliary tables. Dimension information tables store dimension-related information and are named with the prefix dimension_. Statistical analysis result tables store the final statistical results, use the dimension ids as the primary key, and are named with the prefix stats_. Analysis auxiliary tables are the other helper tables used during analysis.
According to the final dimension information, we need to create the following eight dimension tables: platform, date, browser, location, payment, currency_type, event and inbound, plus a kpi dimension table and an operating system (os) dimension table. Note that the os table is not used in this project.
| Analysis module | Related dimension tables |
|---|---|
| User basic information analysis | platform, date |
| Browser information analysis | platform, date, browser |
| Regional information analysis | platform, date, location |
| User browsing depth analysis | platform, date, kpi |
| External link information analysis | platform, date, inbound |
| Order information analysis | platform, date, currency_type, payment |
| Event analysis | platform, date, event |

-
The user basic information analysis module needs a table with the following measures: number of new users, number of active users, total number of users, number of new members, number of active members, total number of members, number of sessions, and session length. It also needs the platform and date dimension ids and a created field recording the modification time. A row is uniquely determined by the platform and date fields. The table is named stats_user. Besides this table, we also count the data by hour, so an additional table stores the hourly statistics; it is named stats_hourly.
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | int(11) | Not null, 0 | Platform id, pkey |
| date_dimension_id | int(11) | Not null, 0 | Date id, pkey |
| active_users | int(11) | Nullable, 0 | Number of active users |
| new_install_users | int(11) | Nullable, 0 | Number of new users |
| total_install_users | int(11) | Nullable, 0 | Total users |
| sessions | int(11) | Nullable, 0 | Number of sessions |
| sessions_length | int(11) | Nullable, 0 | Session length |
| total_members | int(11) | Nullable, 0 | Total members |
| active_members | int(11) | Nullable, 0 | Number of active members |
| new_members | int(11) | Nullable, 0 | Number of new members |
| created | date | Nullable, null | Record date |

-
Browser information analysis uses the same basic measures as user basic information analysis: number of new users, number of active users, total number of users, number of new members, number of active members, total number of members, number of sessions, and session length, plus one extra indicator, the pv count. It also needs the platform, date and browser dimension ids and a created field for the modification date. A row is uniquely determined by the platform, date and browser fields. The table is named stats_device_browser.
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | int(11) | Not null, 0 | Platform id, pkey |
| date_dimension_id | int(11) | Not null, 0 | Date id, pkey |
| browser_dimension_id | int(11) | Not null, 0 | Browser id, pkey |
| active_users | int(11) | Nullable, 0 | Number of active users |
| new_install_users | int(11) | Nullable, 0 | Number of new users |
| total_install_users | int(11) | Nullable, 0 | Total users |
| sessions | int(11) | Nullable, 0 | Number of sessions |
| sessions_length | int(11) | Nullable, 0 | Session length |
| total_members | int(11) | Nullable, 0 | Total members |
| active_members | int(11) | Nullable, 0 | Number of active members |
| new_members | int(11) | Nullable, 0 | Number of new members |
| pv | int(11) | Nullable, 0 | pv number |
| created | date | Nullable, null | Last modified date |

-
The regional information analysis module only analyzes the regional distribution of active users and the bounce rate, so it needs the following measures: number of active users, number of sessions, and number of bounce sessions. It also needs the platform, date and location dimension ids and a created field for the modification date. A row is uniquely determined by the platform, date and location fields. The table is named stats_device_location.
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | int(11) | Not null, 0 | Platform id, pkey |
| date_dimension_id | int(11) | Not null, 0 | Date id, pkey |
| location_dimension_id | int(11) | Not null, 0 | Region id, pkey |
| active_users | int(11) | Nullable, 0 | Number of active users |
| sessions | int(11) | Nullable, 0 | Number of sessions |
| bounce_sessions | int(11) | Nullable, 0 | Number of bounce sessions |
| created | date | Nullable, null | Last modified date |

-
Browsing depth is expressed by counting how many users/sessions viewed a given number of pages. In this project we divide depth into 8 buckets: 1 pv, 2 pv, 3 pv, 4 pv, 5-10 pv (including 5, excluding 10), 10-30 pv, 30-60 pv, and 60+ pv. The table also needs the platform, date and kpi dimension ids and a created field for the modification date. A row is uniquely determined by the platform, date and kpi fields. The table is named stats_view_depth. (See the bucketing sketch after the column table below.)
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | int(11) | Not null, 0 | Platform id, pkey |
| date_dimension_id | int(11) | Not null, 0 | Date id, pkey |
| kpi_dimension_id | int(11) | Not null, 0 | kpi id, pkey |
| pv1 | int(11) | Nullable, 0 | Visited exactly 1 page |
| pv2 | int(11) | Nullable, 0 | Visited 2 pages |
| pv3 | int(11) | Nullable, 0 | Visited 3 pages |
| pv4 | int(11) | Nullable, 0 | Visited 4 pages |
| pv5_10 | int(11) | Nullable, 0 | Visited [5,10) pages |
| pv10_30 | int(11) | Nullable, 0 | Visited [10,30) pages |
| pv30_60 | int(11) | Nullable, 0 | Visited [30,60) pages |
| pv60+ | int(11) | Nullable, 0 | Visited [60,...) pages |
| created | date | Nullable, null | Last modified date |

-
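The bucketing rule can be sketched as a small helper; this is an illustration of the ranges above, not the project's actual reducer code.

```java
// Maps a session's page-view count to the stats_view_depth bucket column it belongs to.
public class ViewDepthBucket {
    public static String bucketOf(int pv) {
        if (pv >= 60) return "pv60+";
        if (pv >= 30) return "pv30_60";
        if (pv >= 10) return "pv10_30";
        if (pv >= 5)  return "pv5_10";  // [5,10): includes 5, excludes 10
        if (pv == 4)  return "pv4";
        if (pv == 3)  return "pv3";
        if (pv == 2)  return "pv2";
        if (pv == 1)  return "pv1";
        return null; // no pages viewed
    }

    public static void main(String[] args) {
        System.out.println(bucketOf(7));  // pv5_10
        System.out.println(bucketOf(60)); // pv60+
    }
}
```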
External link (inbound) analysis mainly covers the composition (preference) of external links and bounce-rate analysis. Composition is measured by the number of active users, so we need these measures: number of active users, number of sessions, and number of bounce sessions. The table also needs the platform, date and inbound dimension ids and a created field for the modification date. A row is uniquely determined by the platform, date and inbound fields. The table is named stats_inbound.
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | int(11) | Not null, 0 | Platform id, pkey |
| date_dimension_id | int(11) | Not null, 0 | Date id, pkey |
| inbound_dimension_id | int(11) | Not null, 0 | External link id, pkey |
| active_users | int(11) | Nullable, 0 | Number of active users |
| sessions | int(11) | Nullable, 0 | Number of sessions |
| bounce_sessions | int(11) | Nullable, 0 | Number of bounce sessions |
| created | date | Nullable, null | Last modified date |

-
Order information analysis covers statistics related to order counts and order amounts, so it needs the following measures: order quantity, successfully paid orders, refunded orders, order amount, successfully paid amount, refund amount, total successfully paid amount, and total refund amount. It also uses the four dimension fields platform, date, currency_type and payment to identify a unique record, plus a created field for the data date. The table is named stats_order.
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | int(11) | Not null, 0 | Platform id, pkey |
| date_dimension_id | int(11) | Not null, 0 | Date id, pkey |
| currency_type_dimension_id | int(11) | Not null, 0 | Currency type id, pkey |
| payment_type_dimension_id | int(11) | Not null, 0 | Payment type id, pkey |
| orders | int(11) | Nullable, 0 | Order quantity |
| success_orders | int(11) | Nullable, 0 | Number of orders successfully paid |
| refund_orders | int(11) | Nullable, 0 | Number of refunded orders |
| order_amount | int(11) | Nullable, 0 | Order amount |
| revenue_amount | int(11) | Nullable, 0 | Payment amount |
| refund_amount | int(11) | Nullable, 0 | Refund amount |
| total_revenue_amount | int(11) | Nullable, 0 | Total payment amount |
| total_refund_amount | int(11) | Nullable, 0 | Total refund amount |
| created | date | Nullable, null | Last modified date |

-
In this project, event analysis mainly analyzes how many times events are triggered, so the storage structure is: a times measure, the platform, date and event dimension fields, and a created field. The table is named stats_event.
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | int(11) | Not null, 0 | Platform id, pkey |
| date_dimension_id | int(11) | Not null, 0 | Date id, pkey |
| event_dimension_id | int(11) | Not null, 0 | Event dimension id, pkey |
| times | int(11) | Nullable, 0 | Trigger times |
| created | date | Nullable, null | Last modified date |
The integration of the dimension information tables and the statistical analysis result tables keeps the data display consistent. The database is created with: CREATE DATABASE report DEFAULT CHARACTER SET utf8;
-
In this project, the platform modeling mainly analyzes: annual average daily pv, monthly average daily pv, weekly average daily pv, monthly daily average transaction amount, annual average order quantity, monthly average order quantity, total transaction amount in the last year, transaction amount in the last month, transaction amount in the last week, total transactions in the last year, transactions in the last month, and transactions in the last week. The table is named inner_fct_sxt_deal.
| Column | Type | Default value | Description |
|---|---|---|---|
| platform_dimension_id | varchar(110) | Not null, 0 | Platform id, pkey |
| date_dimension_id | varchar(110) | Not null, 0 | Date id, pkey |
| year_avg_pv | varchar(30) | Nullable, '' | Annual average daily pv |
| month_avg_pv | varchar(30) | Nullable, '' | Monthly average daily pv |
| week_avg_pv | varchar(30) | Nullable, '' | Weekly average daily pv |
| month_day_bal | varchar(30) | Nullable, '' | Monthly daily average transaction amount |
| year_avg_bal | varchar(30) | Nullable, '' | Annual average order quantity |
| month_avg_bal | varchar(30) | Nullable, '' | Monthly average order quantity |
| year_sum_bal | varchar(30) | Nullable, '' | Total transaction amount in the last year |
| month_sum_bal | varchar(30) | Nullable, '' | Transaction amount in the last month |
| week_sum_bal | varchar(30) | Nullable, '' | Transaction amount in the last week |
| year_sum_count | varchar(30) | Nullable, '' | Total transactions in the last year |
| month_sum_count | varchar(30) | Nullable, '' | Transactions in the last month |
| week_sum_count | varchar(30) | Nullable, '' | Transactions in the last week |
Mysql table creation statement
```sql
#
# Structure for table "dimension_browser"
#
DROP TABLE IF EXISTS `dimension_browser`;
CREATE TABLE `dimension_browser` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `browser_name` varchar(45) NOT NULL DEFAULT '' COMMENT 'Browser name',
  `browser_version` varchar(255) NOT NULL DEFAULT '' COMMENT 'Browser version number',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Browser dimension information table';

#
# Structure for table "dimension_currency_type"
#
DROP TABLE IF EXISTS `dimension_currency_type`;
CREATE TABLE `dimension_currency_type` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `currency_name` varchar(10) DEFAULT NULL COMMENT 'Currency name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Payment currency type dimension information table';

#
# Structure for table "dimension_date"
#
DROP TABLE IF EXISTS `dimension_date`;
CREATE TABLE `dimension_date` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `year` int(11) DEFAULT NULL,
  `season` int(11) DEFAULT NULL,
  `month` int(11) DEFAULT NULL,
  `week` int(11) DEFAULT NULL,
  `day` int(11) DEFAULT NULL,
  `calendar` date DEFAULT NULL,
  `type` enum('year','season','month','week','day') DEFAULT NULL COMMENT 'Date format',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Time dimension information table';

#
# Structure for table "dimension_event"
#
DROP TABLE IF EXISTS `dimension_event`;
CREATE TABLE `dimension_event` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `category` varchar(255) DEFAULT NULL COMMENT 'Event type category',
  `action` varchar(255) DEFAULT NULL COMMENT 'Event action name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Event dimension information table';

#
# Structure for table "dimension_inbound"
#
DROP TABLE IF EXISTS `dimension_inbound`;
CREATE TABLE `dimension_inbound` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL COMMENT 'Parent external link id',
  `name` varchar(45) DEFAULT NULL COMMENT 'External link name',
  `url` varchar(255) DEFAULT NULL COMMENT 'External link url',
  `type` int(11) DEFAULT NULL COMMENT 'External link type',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Dimension information table of external link source data';

#
# Structure for table "dimension_kpi"
#
DROP TABLE IF EXISTS `dimension_kpi`;
CREATE TABLE `dimension_kpi` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `kpi_name` varchar(45) DEFAULT NULL COMMENT 'kpi dimension name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='kpi dimension related information table';

#
# Structure for table "dimension_location"
#
DROP TABLE IF EXISTS `dimension_location`;
CREATE TABLE `dimension_location` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `country` varchar(45) DEFAULT NULL COMMENT 'Country name',
  `province` varchar(45) DEFAULT NULL COMMENT 'Province name',
  `city` varchar(45) DEFAULT NULL COMMENT 'City name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Regional information dimension table';

#
# Structure for table "dimension_os"
#
DROP TABLE IF EXISTS `dimension_os`;
CREATE TABLE `dimension_os` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `os_name` varchar(45) NOT NULL DEFAULT '' COMMENT 'Operating system name',
  `os_version` varchar(45) NOT NULL DEFAULT '' COMMENT 'Operating system version number',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Operating system information dimension table';

#
# Structure for table "dimension_payment_type"
#
DROP TABLE IF EXISTS `dimension_payment_type`;
CREATE TABLE `dimension_payment_type` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `payment_type` varchar(255) DEFAULT NULL COMMENT 'Name of payment method',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Payment method dimension information table';

#
# Structure for table "dimension_platform"
#
DROP TABLE IF EXISTS `dimension_platform`;
CREATE TABLE `dimension_platform` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `platform_name` varchar(45) DEFAULT NULL COMMENT 'Platform name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Platform dimension information table';

#
# Structure for table "event_info"
#
DROP TABLE IF EXISTS `event_info`;
CREATE TABLE `event_info` (
  `event_dimension_id` int(11) NOT NULL DEFAULT '0',
  `key` varchar(255) DEFAULT NULL,
  `value` varchar(255) DEFAULT NULL,
  `times` int(11) DEFAULT '0' COMMENT 'Trigger times',
  PRIMARY KEY (`event_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Describes event attribute information; not used in this project';

#
# Structure for table "order_info"
#
DROP TABLE IF EXISTS `order_info`;
CREATE TABLE `order_info` (
  `order_id` varchar(50) NOT NULL DEFAULT '',
  `date_dimension_id` int(11) NOT NULL DEFAULT '0',
  `amount` int(11) NOT NULL DEFAULT '0' COMMENT 'Order amount',
  `is_pay` int(1) DEFAULT '0' COMMENT 'Whether paid: 0 not paid, 1 paid',
  `is_refund` int(1) DEFAULT '0' COMMENT 'Whether refunded: 0 no refund, 1 refunded',
  PRIMARY KEY (`order_id`,`date_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Order information table; mainly used in this project to deduplicate order data';

#
# Structure for table "stats_device_browser"
#
DROP TABLE IF EXISTS `stats_device_browser`;
CREATE TABLE `stats_device_browser` (
  `date_dimension_id` int(11) NOT NULL,
  `platform_dimension_id` int(11) NOT NULL,
  `browser_dimension_id` int(11) NOT NULL DEFAULT '0',
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `new_install_users` int(11) DEFAULT '0' COMMENT 'Number of new users',
  `total_install_users` int(11) DEFAULT '0' COMMENT 'Total users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `sessions_length` int(11) DEFAULT '0' COMMENT 'Session length',
  `total_members` int(11) unsigned DEFAULT '0' COMMENT 'Total members',
  `active_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of active members',
  `new_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of new members',
  `pv` int(11) DEFAULT '0' COMMENT 'pv number',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`browser_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistics of browser related analysis data';

#
# Structure for table "stats_device_location"
#
DROP TABLE IF EXISTS `stats_device_location`;
CREATE TABLE `stats_device_location` (
  `date_dimension_id` int(11) NOT NULL,
  `platform_dimension_id` int(11) NOT NULL,
  `location_dimension_id` int(11) NOT NULL DEFAULT '0',
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `bounce_sessions` int(11) DEFAULT '0' COMMENT 'Number of bounce sessions',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`location_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistical table of regional correlation analysis data';

#
# Structure for table "stats_event"
#
DROP TABLE IF EXISTS `stats_event`;
CREATE TABLE `stats_event` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `date_dimension_id` int(11) NOT NULL DEFAULT '0',
  `event_dimension_id` int(11) NOT NULL DEFAULT '0',
  `times` int(11) DEFAULT '0' COMMENT 'Trigger times',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`event_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Statistical table for event related analysis data';

#
# Structure for table "stats_hourly"
#
DROP TABLE IF EXISTS `stats_hourly`;
CREATE TABLE `stats_hourly` (
  `platform_dimension_id` int(11) NOT NULL,
  `date_dimension_id` int(11) NOT NULL,
  `kpi_dimension_id` int(11) NOT NULL,
  `hour_00` int(11) DEFAULT '0',
  `hour_01` int(11) DEFAULT '0',
  `hour_02` int(11) DEFAULT '0',
  `hour_03` int(11) DEFAULT '0',
  `hour_04` int(11) DEFAULT '0',
  `hour_05` int(11) DEFAULT '0',
  `hour_06` int(11) DEFAULT '0',
  `hour_07` int(11) DEFAULT '0',
  `hour_08` int(11) DEFAULT '0',
  `hour_09` int(11) DEFAULT '0',
  `hour_10` int(11) DEFAULT '0',
  `hour_11` int(11) DEFAULT '0',
  `hour_12` int(11) DEFAULT '0',
  `hour_13` int(11) DEFAULT '0',
  `hour_14` int(11) DEFAULT '0',
  `hour_15` int(11) DEFAULT '0',
  `hour_16` int(11) DEFAULT '0',
  `hour_17` int(11) DEFAULT '0',
  `hour_18` int(11) DEFAULT '0',
  `hour_19` int(11) DEFAULT '0',
  `hour_20` int(11) DEFAULT '0',
  `hour_21` int(11) DEFAULT '0',
  `hour_22` int(11) DEFAULT '0',
  `hour_23` int(11) DEFAULT '0',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`kpi_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistics by hour';

#
# Structure for table "stats_inbound"
#
DROP TABLE IF EXISTS `stats_inbound`;
CREATE TABLE `stats_inbound` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `date_dimension_id` int(11) NOT NULL,
  `inbound_dimension_id` int(11) NOT NULL,
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `bounce_sessions` int(11) DEFAULT '0' COMMENT 'Number of bounce sessions',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`inbound_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistics of external link information';

#
# Structure for table "stats_order"
#
DROP TABLE IF EXISTS `stats_order`;
CREATE TABLE `stats_order` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `date_dimension_id` int(11) NOT NULL DEFAULT '0',
  `currency_type_dimension_id` int(11) NOT NULL DEFAULT '0',
  `payment_type_dimension_id` int(11) NOT NULL DEFAULT '0',
  `orders` int(11) DEFAULT '0' COMMENT 'Number of orders',
  `success_orders` int(11) DEFAULT '0' COMMENT 'Number of orders successfully paid',
  `refund_orders` int(11) DEFAULT '0' COMMENT 'Number of refund orders',
  `order_amount` int(11) DEFAULT '0' COMMENT 'Order amount',
  `revenue_amount` int(11) DEFAULT '0' COMMENT 'The amount of income, that is, the amount successfully paid',
  `refund_amount` int(11) DEFAULT '0' COMMENT 'Refund amount',
  `total_revenue_amount` int(11) DEFAULT '0' COMMENT 'Total order transactions to date',
  `total_refund_amount` int(11) DEFAULT '0' COMMENT 'Total refund amount to date',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`,`currency_type_dimension_id`,`payment_type_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Statistics table of order information';

#
# Structure for table "stats_user"
#
DROP TABLE IF EXISTS `stats_user`;
CREATE TABLE `stats_user` (
  `date_dimension_id` int(11) NOT NULL,
  `platform_dimension_id` int(11) NOT NULL,
  `active_users` int(11) DEFAULT '0' COMMENT 'Number of active users',
  `new_install_users` int(11) DEFAULT '0' COMMENT 'Number of new users',
  `total_install_users` int(11) DEFAULT '0' COMMENT 'Total users',
  `sessions` int(11) DEFAULT '0' COMMENT 'Number of sessions',
  `sessions_length` int(11) DEFAULT '0' COMMENT 'Session length',
  `total_members` int(11) unsigned DEFAULT '0' COMMENT 'Total members',
  `active_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of active members',
  `new_members` int(11) unsigned DEFAULT '0' COMMENT 'Number of new members',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`date_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='Statistical table for basic user information';

#
# Structure for table "stats_view_depth"
#
DROP TABLE IF EXISTS `stats_view_depth`;
CREATE TABLE `stats_view_depth` (
  `platform_dimension_id` int(11) NOT NULL DEFAULT '0',
  `data_dimension_id` int(11) NOT NULL DEFAULT '0',
  `kpi_dimension_id` int(11) NOT NULL DEFAULT '0',
  `pv1` int(11) DEFAULT '0',
  `pv2` int(11) DEFAULT '0',
  `pv3` int(11) DEFAULT '0',
  `pv4` int(11) DEFAULT '0',
  `pv5_10` int(11) DEFAULT '0',
  `pv10_30` int(11) DEFAULT '0',
  `pv30_60` int(11) DEFAULT '0',
  `pv60+` int(11) DEFAULT '0',
  `created` date DEFAULT NULL,
  PRIMARY KEY (`platform_dimension_id`,`data_dimension_id`,`kpi_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Statistical table of user browsing depth related analysis data';
```
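Once the tables exist, a statistics row can be read back with its dimension ids resolved against the dimension tables. A hedged JDBC example follows; the host, user, and password mirror the customization values used later in this section, and the report database is the one created above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StatsUserQueryDemo {
    public static void main(String[] args) throws Exception {
        // Join stats_user back to its dimension tables so ids become readable values.
        String sql = "SELECT p.platform_name, d.calendar, s.active_users, s.sessions "
                   + "FROM stats_user s "
                   + "JOIN dimension_platform p ON p.id = s.platform_dimension_id "
                   + "JOIN dimension_date d ON d.id = s.date_dimension_id "
                   + "WHERE d.type = 'day' ORDER BY d.calendar";
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://bd1601:3306/report", "root", "123456");
             PreparedStatement pstmt = conn.prepareStatement(sql);
             ResultSet rs = pstmt.executeQuery()) {
            while (rs.next()) {
                System.out.printf("%s %s active=%d sessions=%d%n",
                        rs.getString(1), rs.getDate(2), rs.getInt(3), rs.getInt(4));
            }
        }
    }
}
```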
Transformer code function
- Process the data for the various dimensions
- Combine certain dimensions
- Use an MR program to store the combined dimensions and measures into mysql
Transformer coding
-
Create basic dimension class
-
BaseDimension coding
Create a new BaseDimension class file: the abstract base class that all concrete dimension classes extend
No customization required
```java
public abstract class BaseDimension implements WritableComparable<BaseDimension> {
    // nothing
}
```
-
BrowserDimension coding
Create a new BrowserDimension class file: holds the browser id, name, and version
No customization required
```java
public class BrowserDimension extends BaseDimension {
    private int id;                // id
    private String browserName;    // name
    private String browserVersion; // version

    public BrowserDimension() {
        super();
    }

    public BrowserDimension(String browserName, String browserVersion) {
        super();
        this.browserName = browserName;
        this.browserVersion = browserVersion;
    }

    public void clean() {
        this.id = 0;
        this.browserName = "";
        this.browserVersion = "";
    }

    public static BrowserDimension newInstance(String browserName, String browserVersion) {
        BrowserDimension browserDimension = new BrowserDimension();
        browserDimension.browserName = browserName;
        browserDimension.browserVersion = browserVersion;
        return browserDimension;
    }

    /**
     * Build a collection of browser dimension information objects
     *
     * @param browserName    chrome
     * @param browserVersion 48
     * @return
     */
    public static List<BrowserDimension> buildList(String browserName, String browserVersion) {
        List<BrowserDimension> list = new ArrayList<BrowserDimension>();
        if (StringUtils.isBlank(browserName)) {
            // If the browser name is empty, set it to unknown
            browserName = GlobalConstants.DEFAULT_VALUE;
            browserVersion = GlobalConstants.DEFAULT_VALUE;
        }
        if (StringUtils.isEmpty(browserVersion)) {
            browserVersion = GlobalConstants.DEFAULT_VALUE;
        }
        // list.add(BrowserDimension.newInstance(GlobalConstants.VALUE_OF_ALL,
        // GlobalConstants.VALUE_OF_ALL));
        list.add(BrowserDimension.newInstance(browserName, GlobalConstants.VALUE_OF_ALL));
        list.add(BrowserDimension.newInstance(browserName, browserVersion));
        return list;
    }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getBrowserName() { return browserName; }
    public void setBrowserName(String browserName) { this.browserName = browserName; }
    public String getBrowserVersion() { return browserVersion; }
    public void setBrowserVersion(String browserVersion) { this.browserVersion = browserVersion; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        out.writeUTF(this.browserName);
        out.writeUTF(this.browserVersion);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.browserName = in.readUTF();
        this.browserVersion = in.readUTF();
    }

    @Override
    public int compareTo(BaseDimension o) {
        if (this == o) {
            return 0;
        }
        BrowserDimension other = (BrowserDimension) o;
        int tmp = Integer.compare(this.id, other.id);
        if (tmp != 0) return tmp;
        tmp = this.browserName.compareTo(other.browserName);
        if (tmp != 0) return tmp;
        tmp = this.browserVersion.compareTo(other.browserVersion);
        return tmp;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((browserName == null) ? 0 : browserName.hashCode());
        result = prime * result + ((browserVersion == null) ? 0 : browserVersion.hashCode());
        result = prime * result + id;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        BrowserDimension other = (BrowserDimension) obj;
        if (browserName == null) {
            if (other.browserName != null) return false;
        } else if (!browserName.equals(other.browserName)) return false;
        if (browserVersion == null) {
            if (other.browserVersion != null) return false;
        } else if (!browserVersion.equals(other.browserVersion)) return false;
        if (id != other.id) return false;
        return true;
    }
}
```
-
DateDimension code writing
Create a new DateDimension class file: get the ID, year, season, month, week, day and type of the date
No customization required
```java
public class DateDimension extends BaseDimension {
    private int id;      // id, eg: 1
    private int year;    // year, eg: 2015
    private int season;  // quarter, eg: 4
    private int month;   // month, eg: 12
    private int week;    // week
    private int day;
    private String type; // type
    private Date calendar = new Date();

    /**
     * Get the time dimension object corresponding to the given type
     *
     * @param time timestamp
     * @param type type
     * @return
     */
    public static DateDimension buildDate(long time, DateEnum type) {
        int year = TimeUtil.getDateInfo(time, DateEnum.YEAR);
        Calendar calendar = Calendar.getInstance();
        calendar.clear();
        if (DateEnum.YEAR.equals(type)) {
            calendar.set(year, 0, 1);
            return new DateDimension(year, 0, 0, 0, 0, type.name, calendar.getTime());
        }

        int season = TimeUtil.getDateInfo(time, DateEnum.SEASON);
        if (DateEnum.SEASON.equals(type)) {
            int month = (3 * season - 2);
            calendar.set(year, month - 1, 1);
            return new DateDimension(year, season, 0, 0, 0, type.name, calendar.getTime());
        }

        int month = TimeUtil.getDateInfo(time, DateEnum.MONTH);
        if (DateEnum.MONTH.equals(type)) {
            calendar.set(year, month - 1, 1);
            return new DateDimension(year, season, month, 0, 0, type.name, calendar.getTime());
        }

        int week = TimeUtil.getDateInfo(time, DateEnum.WEEK);
        if (DateEnum.WEEK.equals(type)) {
            // Gets the timestamp of the first day of the week to which the given timestamp belongs
            long firstDayOfWeek = TimeUtil.getFirstDayOfThisWeek(time);
            year = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.YEAR);
            season = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.SEASON);
            month = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.MONTH);
            week = TimeUtil.getDateInfo(firstDayOfWeek, DateEnum.WEEK);
            if (month == 12 && week == 1) {
                week = 53;
            }
            return new DateDimension(year, season, month, week, 0, type.name, new Date(firstDayOfWeek));
        }

        int day = TimeUtil.getDateInfo(time, DateEnum.DAY);
        if (DateEnum.DAY.equals(type)) {
            calendar.set(year, month - 1, day);
            return new DateDimension(year, season, month, week, day, type.name, calendar.getTime());
        }
        throw new RuntimeException("Unsupported DateEnum type for building a DateDimension: " + type);
    }

    public DateDimension() {
        super();
    }

    public DateDimension(int year, int season, int month, int week, int day, String type) {
        super();
        this.year = year;
        this.season = season;
        this.month = month;
        this.week = week;
        this.day = day;
        this.type = type;
    }

    public DateDimension(int year, int season, int month, int week, int day, String type, Date calendar) {
        this(year, season, month, week, day, type);
        this.calendar = calendar;
    }

    public DateDimension(int id, int year, int season, int month, int week, int day, String type, Date calendar) {
        this(year, season, month, week, day, type, calendar);
        this.id = id;
    }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public int getYear() { return year; }
    public void setYear(int year) { this.year = year; }
    public int getSeason() { return season; }
    public void setSeason(int season) { this.season = season; }
    public int getMonth() { return month; }
    public void setMonth(int month) { this.month = month; }
    public int getWeek() { return week; }
    public void setWeek(int week) { this.week = week; }
    public int getDay() { return day; }
    public void setDay(int day) { this.day = day; }
    public String getType() { return type; }
    public void setType(String type) { this.type = type; }
    public Date getCalendar() { return calendar; }
    public void setCalendar(Date calendar) { this.calendar = calendar; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        out.writeInt(this.year);
        out.writeInt(this.season);
        out.writeInt(this.month);
        out.writeInt(this.week);
        out.writeInt(this.day);
        out.writeUTF(this.type);
        out.writeLong(this.calendar.getTime());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.year = in.readInt();
        this.season = in.readInt();
        this.month = in.readInt();
        this.week = in.readInt();
        this.day = in.readInt();
        this.type = in.readUTF();
        this.calendar.setTime(in.readLong());
    }

    @Override
    public int compareTo(BaseDimension o) {
        if (this == o) {
            return 0;
        }
        DateDimension other = (DateDimension) o;
        int tmp = Integer.compare(this.id, other.id);
        if (tmp != 0) return tmp;
        tmp = Integer.compare(this.year, other.year);
        if (tmp != 0) return tmp;
        tmp = Integer.compare(this.season, other.season);
        if (tmp != 0) return tmp;
        tmp = Integer.compare(this.month, other.month);
        if (tmp != 0) return tmp;
        tmp = Integer.compare(this.week, other.week);
        if (tmp != 0) return tmp;
        tmp = Integer.compare(this.day, other.day);
        if (tmp != 0) return tmp;
        tmp = this.type.compareTo(other.type);
        return tmp;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + day;
        result = prime * result + id;
        result = prime * result + month;
        result = prime * result + season;
        result = prime * result + ((type == null) ? 0 : type.hashCode());
        result = prime * result + week;
        result = prime * result + year;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        DateDimension other = (DateDimension) obj;
        if (day != other.day) return false;
        if (id != other.id) return false;
        if (month != other.month) return false;
        if (season != other.season) return false;
        if (type == null) {
            if (other.type != null) return false;
        } else if (!type.equals(other.type)) return false;
        if (week != other.week) return false;
        if (year != other.year) return false;
        return true;
    }
}
```
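For orientation, a hedged usage fragment of buildDate, assuming the project's DateEnum and TimeUtil helper classes referenced above are on the classpath:

```java
// Derive the day- and week-level dimensions for one event timestamp.
long time = 1483372800000L;                                        // some event timestamp (ms)
DateDimension day = DateDimension.buildDate(time, DateEnum.DAY);   // year/season/month/week/day filled
DateDimension week = DateDimension.buildDate(time, DateEnum.WEEK); // calendar = first day of that week
```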
-
KpiDimension coding
Create a new KpiDimension class file: holds the kpi id and kpi name
No customization required
```java
public class KpiDimension extends BaseDimension {
    private int id;
    private String kpiName;

    public KpiDimension() {
        super();
    }

    public KpiDimension(String kpiName) {
        super();
        this.kpiName = kpiName;
    }

    public KpiDimension(int id, String kpiName) {
        super();
        this.id = id;
        this.kpiName = kpiName;
    }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getKpiName() { return kpiName; }
    public void setKpiName(String kpiName) { this.kpiName = kpiName; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        out.writeUTF(this.kpiName);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.kpiName = in.readUTF();
    }

    @Override
    public int compareTo(BaseDimension o) {
        if (this == o) {
            return 0;
        }
        KpiDimension other = (KpiDimension) o;
        int tmp = Integer.compare(this.id, other.id);
        if (tmp != 0) return tmp;
        tmp = this.kpiName.compareTo(other.kpiName);
        return tmp;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + id;
        result = prime * result + ((kpiName == null) ? 0 : kpiName.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        KpiDimension other = (KpiDimension) obj;
        if (id != other.id) return false;
        if (kpiName == null) {
            if (other.kpiName != null) return false;
        } else if (!kpiName.equals(other.kpiName)) return false;
        return true;
    }
}
```
-

PlatformDimension coding

Create a new PlatformDimension class file: get the platform id and platform name

No customization required

```java
public class PlatformDimension extends BaseDimension {
    private int id;
    private String platformName;

    public PlatformDimension() {
        super();
    }

    public PlatformDimension(String platformName) {
        super();
        this.platformName = platformName;
    }

    public PlatformDimension(int id, String platformName) {
        super();
        this.id = id;
        this.platformName = platformName;
    }

    public static List<PlatformDimension> buildList(String platformName) {
        if (StringUtils.isBlank(platformName)) {
            platformName = GlobalConstants.DEFAULT_VALUE;
        }
        List<PlatformDimension> list = new ArrayList<PlatformDimension>();
        list.add(new PlatformDimension(GlobalConstants.VALUE_OF_ALL));
        list.add(new PlatformDimension(platformName));
        return list;
    }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getPlatformName() { return platformName; }
    public void setPlatformName(String platformName) { this.platformName = platformName; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        out.writeUTF(this.platformName);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.platformName = in.readUTF();
    }

    @Override
    public int compareTo(BaseDimension o) {
        if (this == o) {
            return 0;
        }
        PlatformDimension other = (PlatformDimension) o;
        int tmp = Integer.compare(this.id, other.id);
        if (tmp != 0) return tmp;
        tmp = this.platformName.compareTo(other.platformName);
        return tmp;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + id;
        result = prime * result + ((platformName == null) ? 0 : platformName.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        PlatformDimension other = (PlatformDimension) obj;
        if (id != other.id) return false;
        if (platformName == null) {
            if (other.platformName != null) return false;
        } else if (!platformName.equals(other.platformName)) return false;
        return true;
    }
}
```
-
-
Create composite dimension class
-
StatsDimension coding
Create a new StatsDimension class file: the abstract base class for the composite dimensions below
No customization required
```java
public abstract class StatsDimension extends BaseDimension {
    // nothing
}
```
-
StatsCommonDimension coding
Create a new StatsCommonDimension class file: the most commonly used dimension combination, containing date, platform, and kpi
No customization required
```java
public class StatsCommonDimension extends StatsDimension {
    private DateDimension date = new DateDimension();
    private PlatformDimension platform = new PlatformDimension();
    private KpiDimension kpi = new KpiDimension();

    /**
     * Clone an instance object
     *
     * @param dimension
     * @return
     */
    public static StatsCommonDimension clone(StatsCommonDimension dimension) {
        DateDimension date = new DateDimension(dimension.date.getId(), dimension.date.getYear(),
                dimension.date.getSeason(), dimension.date.getMonth(), dimension.date.getWeek(),
                dimension.date.getDay(), dimension.date.getType(), dimension.date.getCalendar());
        PlatformDimension platform = new PlatformDimension(dimension.platform.getId(),
                dimension.platform.getPlatformName());
        KpiDimension kpi = new KpiDimension(dimension.kpi.getId(), dimension.kpi.getKpiName());
        return new StatsCommonDimension(date, platform, kpi);
    }

    public StatsCommonDimension() {
        super();
    }

    public StatsCommonDimension(DateDimension date, PlatformDimension platform, KpiDimension kpi) {
        super();
        this.date = date;
        this.platform = platform;
        this.kpi = kpi;
    }

    public DateDimension getDate() { return date; }
    public void setDate(DateDimension date) { this.date = date; }
    public PlatformDimension getPlatform() { return platform; }
    public void setPlatform(PlatformDimension platform) { this.platform = platform; }
    public KpiDimension getKpi() { return kpi; }
    public void setKpi(KpiDimension kpi) { this.kpi = kpi; }

    @Override
    public void write(DataOutput out) throws IOException {
        this.date.write(out);
        this.platform.write(out);
        this.kpi.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.date.readFields(in);
        this.platform.readFields(in);
        this.kpi.readFields(in);
    }

    @Override
    public int compareTo(BaseDimension o) {
        if (this == o) {
            return 0;
        }
        StatsCommonDimension other = (StatsCommonDimension) o;
        int tmp = this.date.compareTo(other.date);
        if (tmp != 0) return tmp;
        tmp = this.platform.compareTo(other.platform);
        if (tmp != 0) return tmp;
        tmp = this.kpi.compareTo(other.kpi);
        return tmp;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((date == null) ? 0 : date.hashCode());
        result = prime * result + ((kpi == null) ? 0 : kpi.hashCode());
        result = prime * result + ((platform == null) ? 0 : platform.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        StatsCommonDimension other = (StatsCommonDimension) obj;
        if (date == null) {
            if (other.date != null) return false;
        } else if (!date.equals(other.date)) return false;
        if (kpi == null) {
            if (other.kpi != null) return false;
        } else if (!kpi.equals(other.kpi)) return false;
        if (platform == null) {
            if (other.platform != null) return false;
        } else if (!platform.equals(other.platform)) return false;
        return true;
    }
}
```
-
StatsUserDimension coding
Create a new StatsUserDimension class file: the user-oriented dimension combination, containing date, platform, kpi, and browser
No customization required
```java
public class StatsUserDimension extends StatsDimension {
    private StatsCommonDimension statsCommon = new StatsCommonDimension();
    private BrowserDimension browser = new BrowserDimension();

    /**
     * Clone an instance object
     *
     * @param dimension
     * @return
     */
    public static StatsUserDimension clone(StatsUserDimension dimension) {
        BrowserDimension browser = new BrowserDimension(dimension.browser.getBrowserName(),
                dimension.browser.getBrowserVersion());
        StatsCommonDimension statsCommon = StatsCommonDimension.clone(dimension.statsCommon);
        return new StatsUserDimension(statsCommon, browser);
    }

    public StatsUserDimension() {
        super();
    }

    public StatsUserDimension(StatsCommonDimension statsCommon, BrowserDimension browser) {
        super();
        this.statsCommon = statsCommon;
        this.browser = browser;
    }

    public StatsCommonDimension getStatsCommon() { return statsCommon; }
    public void setStatsCommon(StatsCommonDimension statsCommon) { this.statsCommon = statsCommon; }
    public BrowserDimension getBrowser() { return browser; }
    public void setBrowser(BrowserDimension browser) { this.browser = browser; }

    @Override
    public void write(DataOutput out) throws IOException {
        this.statsCommon.write(out);
        this.browser.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.statsCommon.readFields(in);
        this.browser.readFields(in);
    }

    @Override
    public int compareTo(BaseDimension o) {
        if (this == o) {
            return 0;
        }
        StatsUserDimension other = (StatsUserDimension) o;
        int tmp = this.statsCommon.compareTo(other.statsCommon);
        if (tmp != 0) return tmp;
        tmp = this.browser.compareTo(other.browser);
        return tmp;
    }
}
```
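Putting the dimension classes together, a reducer key for one record could be assembled as follows. This is illustrative only; "website", "active_user", "Chrome" and "48" are sample values, and KpiType is introduced in the next section.

```java
public class DimensionKeyDemo {
    public static void main(String[] args) {
        // Day-level date dimension for "now" (uses the project's DateEnum helper).
        DateDimension date = DateDimension.buildDate(System.currentTimeMillis(), DateEnum.DAY);
        PlatformDimension platform = new PlatformDimension("website");
        KpiDimension kpi = new KpiDimension("active_user");
        StatsCommonDimension common = new StatsCommonDimension(date, platform, kpi);
        // The full composite key: common dimensions plus the browser dimension.
        StatsUserDimension key = new StatsUserDimension(common,
                BrowserDimension.newInstance("Chrome", "48"));
        System.out.println(key.getStatsCommon().getKpi().getKpiName()); // active_user
    }
}
```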
-
-
Create classes that carry the indicator (kpi) data
-
KpiType coding
Create a new KpiType class file: an enum listing the kpi names used in the statistics
No customization required
```java
public enum KpiType {
    NEW_INSTALL_USER("new_install_user"),                 // kpi statistics for new users
    BROWSER_NEW_INSTALL_USER("browser_new_install_user"), // new-user kpi for the browser dimension
    ACTIVE_USER("active_user"),                           // statistics of active-user kpi
    BROWSER_ACTIVE_USER("browser_active_user"),           // active-user kpi for the browser dimension
    ;

    public final String name;

    private KpiType(String name) {
        this.name = name;
    }

    /**
     * Obtain the KpiType enumeration object corresponding to the given name string
     *
     * @param name
     * @return
     */
    public static KpiType valueOfName(String name) {
        for (KpiType type : values()) {
            if (type.name.equals(name)) {
                return type;
            }
        }
        throw new RuntimeException("The given name does not belong to the KpiType enumeration: " + name);
    }
}
```
-
BaseStatsValueWritable code writing
Create a new BaseStatsValueWritable class file: the abstract parent class of the statistics values, extended by the classes below
No customization required
```java
public abstract class BaseStatsValueWritable implements Writable {
    /**
     * Get the kpi corresponding to the current value
     *
     * @return
     */
    public abstract KpiType getKpi();
}
```
-
MapWritableValue coding
Create a new MapWritableValue class file: wraps one output row (a MapWritable) together with its kpi
No customization required
```java
public class MapWritableValue extends BaseStatsValueWritable {
    // One row record that is about to be inserted into the database table
    private MapWritable value = new MapWritable();
    private KpiType kpi;

    public MapWritableValue() {
        super();
    }

    public MapWritableValue(MapWritable value, KpiType kpi) {
        super();
        this.value = value;
        this.kpi = kpi;
    }

    public MapWritable getValue() { return value; }
    public void setValue(MapWritable value) { this.value = value; }
    public void setKpi(KpiType kpi) { this.kpi = kpi; }

    @Override
    public void write(DataOutput out) throws IOException {
        this.value.write(out);
        WritableUtils.writeEnum(out, this.kpi);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.value.readFields(in);
        this.kpi = WritableUtils.readEnum(in, KpiType.class);
    }

    @Override
    public KpiType getKpi() {
        return this.kpi;
    }
}
```
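For orientation, this is roughly how a reducer hands a result to the output side: the count sits in the MapWritable under key -1, which is exactly where ActiveUserCollector (shown later) reads it back out. A hedged fragment, assuming it runs inside a Reducer whose output key is StatsUserDimension:

```java
// activeUsers: the count this reduce() call produced; statsUserDimension: the grouped key.
MapWritable map = new MapWritable();
map.put(new IntWritable(-1), new IntWritable(activeUsers)); // key -1 is the project's convention
MapWritableValue outputValue = new MapWritableValue(map, KpiType.ACTIVE_USER);
context.write(statsUserDimension, outputValue);
```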
-
TimeOutputValue coding
Create a new TimeOutputValue class file: carries an id and a timestamp
No customization required
```java
public class TimeOutputValue extends BaseStatsValueWritable {
    private String id; // id
    private long time; // timestamp

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public long getTime() { return time; }
    public void setTime(long time) { this.time = time; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.id);
        out.writeLong(this.time);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readUTF();
        this.time = in.readLong();
    }

    @Override
    public KpiType getKpi() {
        return null; // no kpi is associated with this value type
    }
}
```
-
-
Create classes that output the MR results to Mysql
-
IDimensionConverter code writing
Create a new IDimensionConverter class file: provides an interface for special operations (querying and inserting dimension tables from relational databases)
No customization required
```java
public interface IDimensionConverter {
    /**
     * Get the id for the given dimension value.<br/>
     * If it already exists in the database, return it directly; otherwise insert
     * the dimension first and return the newly generated id.
     *
     * @param dimension
     * @return
     * @throws IOException
     */
    public int getDimensionIdByValue(BaseDimension dimension) throws IOException;
}
```
-
IOutputCollector coding
Create a new IOutputCollector class file: the interface for collectors that execute the concrete sql output for one kpi
No customization required
```java
public interface IOutputCollector {
    /**
     * Concrete method for inserting statistics data
     *
     * @param conf
     * @param key
     * @param value
     * @param pstmt
     * @param converter
     * @throws SQLException
     * @throws IOException
     */
    public void collect(Configuration conf, BaseDimension key, BaseStatsValueWritable value,
            PreparedStatement pstmt, IDimensionConverter converter) throws SQLException, IOException;
}
```
-
-
Create classes for active user analysis
-
JdbcManager coding
Create a new JdbcManager class file: a JDBC helper that builds connections from the Hadoop configuration
No customization required
```java
public class JdbcManager {
    /**
     * Obtain a jdbc connection to the relational database according to the configuration
     *
     * @param conf hadoop configuration information
     * @param flag flag bit used to distinguish different data sources
     * @return
     * @throws SQLException
     */
    public static Connection getConnection(Configuration conf, String flag) throws SQLException {
        String driverStr = String.format(GlobalConstants.JDBC_DRIVER, flag);
        String urlStr = String.format(GlobalConstants.JDBC_URL, flag);
        String usernameStr = String.format(GlobalConstants.JDBC_USERNAME, flag);
        String passwordStr = String.format(GlobalConstants.JDBC_PASSWORD, flag);
        String driverClass = conf.get(driverStr);
        String url = conf.get(urlStr);
        String username = conf.get(usernameStr);
        String password = conf.get(passwordStr);
        try {
            Class.forName(driverClass);
        } catch (ClassNotFoundException e) {
            // nothing
        }
        return DriverManager.getConnection(url, username, password);
    }
}
```
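How the configuration is expected to look is a project detail; the fragment below is hypothetical, with key names invented for illustration. The real patterns live in the GlobalConstants JDBC_* format strings, so check those before copying anything.

```java
// Hypothetical wiring for JdbcManager; the literal key names are assumptions.
Configuration conf = new Configuration();
conf.set("mysql.report.driver", "com.mysql.cj.jdbc.Driver");
conf.set("mysql.report.url", "jdbc:mysql://bd1601:3306/report");
conf.set("mysql.report.username", "root");
conf.set("mysql.report.password", "123456");
Connection conn = JdbcManager.getConnection(conf, "report"); // the flag selects the data source
```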
-
DimensionConverterImpl coding
Create a new DimensionConverterImpl class file: create a connection with the database and write the dimension data into it
To customize:
private static final String URL = "jdbc:mysql://bd1601:3306/test";
private static final String USERNAME = "root";
private static final String PASSWORD = "123456";
```java
public class DimensionConverterImpl implements IDimensionConverter {
    private static final Logger logger = Logger.getLogger(DimensionConverterImpl.class);

    private static final String DRIVER = "com.mysql.cj.jdbc.Driver";
    private static final String URL = "jdbc:mysql://bd1601:3306/test";
    private static final String USERNAME = "root";
    private static final String PASSWORD = "123456";

    private Map<String, Integer> cache = new LinkedHashMap<String, Integer>() {
        private static final long serialVersionUID = 8894507016522723685L;

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
            return this.size() > 5000;
        }
    };

    static {
        try {
            Class.forName(DRIVER);
        } catch (ClassNotFoundException e) {
            // nothing
        }
    }

    @Override
    public int getDimensionIdByValue(BaseDimension dimension) throws IOException {
        String cacheKey = this.buildCacheKey(dimension); // Get the cache key
        System.out.println("----Dimension cache key: " + cacheKey);
        if (this.cache.containsKey(cacheKey)) {
            return this.cache.get(cacheKey);
        }
        Connection conn = null;
        try {
            // 1. Check whether there is a corresponding value in the database; if so, return it
            // 2. Otherwise insert our dimension data first and get the generated id
            String[] sql = null; // The sql array to execute
            if (dimension instanceof DateDimension) {
                sql = this.buildDateSql();
            } else if (dimension instanceof PlatformDimension) {
                sql = this.buildPlatformSql();
            } else if (dimension instanceof BrowserDimension) {
                sql = this.buildBrowserSql();
            } else {
                throw new IOException("Getting a dimension id is not supported for: " + dimension.getClass());
            }
            conn = this.getConnection(); // Get connection
            // conn = JdbcManager.getConnection(conf, flag)
            int id = 0;
            synchronized (this) {
                id = this.executeSql(conn, cacheKey, sql, dimension);
                this.cache.put(cacheKey, id);
            }
            return id;
        } catch (Throwable e) {
            logger.error("An exception occurred while operating the database", e);
            throw new IOException(e);
        } finally {
            if (conn != null) {
                try {
                    conn.close();
                } catch (SQLException e) {
                    // nothing
                }
            }
        }
    }

    /**
     * Get a database connection
     *
     * @return
     * @throws SQLException
     */
    private Connection getConnection() throws SQLException {
        return DriverManager.getConnection(URL, USERNAME, PASSWORD);
    }

    /**
     * Create the cache key
     *
     * @param dimension
     * @return
     */
    private String buildCacheKey(BaseDimension dimension) {
        StringBuilder sb = new StringBuilder();
        if (dimension instanceof DateDimension) {
            sb.append("date_dimension");
            DateDimension date = (DateDimension) dimension;
            sb.append(date.getYear()).append(date.getSeason()).append(date.getMonth());
            sb.append(date.getWeek()).append(date.getDay()).append(date.getType());
        } else if (dimension instanceof PlatformDimension) {
            sb.append("platform_dimension");
            PlatformDimension platform = (PlatformDimension) dimension;
            sb.append(platform.getPlatformName());
        } else if (dimension instanceof BrowserDimension) {
            sb.append("browser_dimension");
            BrowserDimension browser = (BrowserDimension) dimension;
            sb.append(browser.getBrowserName()).append(browser.getBrowserVersion());
        }
        if (sb.length() == 0) {
            throw new RuntimeException("Unable to create a cache key for dimension: " + dimension.getClass());
        }
        return sb.toString();
    }

    /**
     * Set the statement parameters
     *
     * @param pstmt
     * @param dimension
     * @throws SQLException
     */
    private void setArgs(PreparedStatement pstmt, BaseDimension dimension) throws SQLException {
        int i = 0;
        if (dimension instanceof DateDimension) {
            DateDimension date = (DateDimension) dimension;
            pstmt.setInt(++i, date.getYear());
            pstmt.setInt(++i, date.getSeason());
            pstmt.setInt(++i, date.getMonth());
            pstmt.setInt(++i, date.getWeek());
            pstmt.setInt(++i, date.getDay());
            pstmt.setString(++i, date.getType());
            pstmt.setDate(++i, new Date(date.getCalendar().getTime()));
        } else if (dimension instanceof PlatformDimension) {
            PlatformDimension platform = (PlatformDimension) dimension;
            pstmt.setString(++i, platform.getPlatformName());
        } else if (dimension instanceof BrowserDimension) {
            BrowserDimension browser = (BrowserDimension) dimension;
            pstmt.setString(++i, browser.getBrowserName());
            pstmt.setString(++i, browser.getBrowserVersion());
        }
    }

    /**
     * Create the date-dimension related sql
     *
     * @return
     */
    private String[] buildDateSql() {
        String querySql = "SELECT `id` FROM `dimension_date` WHERE `year` = ? AND `season` = ? AND `month` = ? AND `week` = ? AND `day` = ? AND `type` = ? AND `calendar` = ?";
        String insertSql = "INSERT INTO `dimension_date`(`year`, `season`, `month`, `week`, `day`, `type`, `calendar`) VALUES(?, ?, ?, ?, ?, ?, ?)";
        return new String[] { querySql, insertSql };
    }

    /**
     * Create the platform-dimension related sql
     *
     * @return
     */
    private String[] buildPlatformSql() {
        String querySql = "SELECT `id` FROM `dimension_platform` WHERE `platform_name` = ?";
        String insertSql = "INSERT INTO `dimension_platform`(`platform_name`) VALUES(?)";
        return new String[] { querySql, insertSql };
    }

    /**
     * Create the browser-dimension related sql
     *
     * @return
     */
    private String[] buildBrowserSql() {
        String querySql = "SELECT `id` FROM `dimension_browser` WHERE `browser_name` = ? AND `browser_version` = ?";
        String insertSql = "INSERT INTO `dimension_browser`(`browser_name`, `browser_version`) VALUES(?, ?)";
        return new String[] { querySql, insertSql };
    }

    /**
     * Execute the sql
     *
     * @param conn
     * @param cacheKey
     * @param sqls
     * @param dimension
     * @return
     * @throws SQLException
     */
    @SuppressWarnings("resource")
    private int executeSql(Connection conn, String cacheKey, String[] sqls, BaseDimension dimension)
            throws SQLException {
        PreparedStatement pstmt = null;
        ResultSet rs = null;
        try {
            pstmt = conn.prepareStatement(sqls[0]); // Create the pstmt object for the query sql
            // Set parameters
            this.setArgs(pstmt, dimension);
            rs = pstmt.executeQuery();
            if (rs.next()) {
                return rs.getInt(1); // Return the existing id
            }
            // Reaching here means the dimension is not in the database yet, so insert it
            pstmt = conn.prepareStatement(sqls[1], java.sql.Statement.RETURN_GENERATED_KEYS);
            // Set parameters
            this.setArgs(pstmt, dimension);
            pstmt.executeUpdate();
            rs = pstmt.getGeneratedKeys(); // Get the automatically generated id
            if (rs.next()) {
                return rs.getInt(1); // Return the generated id
            }
        } finally {
            if (rs != null) {
                try {
                    rs.close();
                } catch (Throwable e) {
                    // nothing
                }
            }
            if (pstmt != null) {
                try {
                    pstmt.close();
                } catch (Throwable e) {
                    // nothing
                }
            }
        }
        throw new RuntimeException("Failed to get the id from the database");
    }
}
```
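The cache above relies on LinkedHashMap's removeEldestEntry hook to cap its size: once the map exceeds the limit, each new put evicts the oldest entry. A stand-alone demonstration of that trick:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCacheDemo {
    public static void main(String[] args) {
        // Insertion-ordered map that evicts its eldest entry beyond a size limit.
        Map<String, Integer> cache = new LinkedHashMap<String, Integer>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > 3; // DimensionConverterImpl uses 5000
            }
        };
        for (int i = 0; i < 5; i++) {
            cache.put("dim" + i, i);
        }
        System.out.println(cache.keySet()); // [dim2, dim3, dim4]
    }
}
```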
-
Transformer output format coding
Create a new TransformerOutputFormat class file: a custom OutputFormat that writes to mysql, taking BaseDimension as the output key and BaseStatsValueWritable as the output value
No customization required
```java
public class TransformerOutputFormat extends OutputFormat<BaseDimension, BaseStatsValueWritable> {
    private static final Logger logger = Logger.getLogger(TransformerOutputFormat.class);

    /**
     * Defines how each record is written; one record is the data output by the
     * reducer's write call each time a reduce task executes.
     */
    @Override
    public RecordWriter<BaseDimension, BaseStatsValueWritable> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Connection conn = null;
        IDimensionConverter converter = new DimensionConverterImpl();
        try {
            conn = JdbcManager.getConnection(conf, GlobalConstants.WAREHOUSE_OF_REPORT);
            conn.setAutoCommit(false);
        } catch (SQLException e) {
            logger.error("Failed to get database connection", e);
            throw new IOException("Failed to get database connection", e);
        }
        return new TransformerRecordWriter(conn, conf, converter);
    }

    @Override
    public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
        // Checks the output space; no check is needed when outputting to mysql
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        return new FileOutputCommitter(FileOutputFormat.getOutputPath(context), context);
    }

    /**
     * Custom record writer that performs the concrete data output
     */
    public class TransformerRecordWriter extends RecordWriter<BaseDimension, BaseStatsValueWritable> {
        private Connection conn = null;
        private Configuration conf = null;
        private IDimensionConverter converter = null;
        private Map<KpiType, PreparedStatement> map = new HashMap<KpiType, PreparedStatement>();
        private Map<KpiType, Integer> batch = new HashMap<KpiType, Integer>();

        public TransformerRecordWriter(Connection conn, Configuration conf, IDimensionConverter converter) {
            super();
            this.conn = conn;
            this.conf = conf;
            this.converter = converter;
        }

        /**
         * Called automatically by the framework when the reduce task outputs data;
         * writes the reducer output to mysql.
         */
        @Override
        public void write(BaseDimension key, BaseStatsValueWritable value)
                throws IOException, InterruptedException {
            if (key == null || value == null) {
                return;
            }
            try {
                KpiType kpi = value.getKpi();
                PreparedStatement pstmt = null; // Each pstmt object corresponds to one sql statement
                int count = 1; // sql statements are executed in batches of a configurable size
                if (map.get(kpi) == null) {
                    // Distinguished by kpi; the insert sql is read from the configuration
                    pstmt = this.conn.prepareStatement(conf.get(kpi.name));
                    map.put(kpi, pstmt);
                } else {
                    pstmt = map.get(kpi);
                    count = batch.get(kpi);
                    count++;
                }
                batch.put(kpi, count); // Store the batch count
                String collectorName = conf.get(GlobalConstants.OUTPUT_COLLECTOR_KEY_PREFIX + kpi.name);
                Class<?> clazz = Class.forName(collectorName);
                // The collector knows how to fill the insert for this kpi; each kpi maps to
                // a different table, so a single fixed insert cannot be used.
                IOutputCollector collector = (IOutputCollector) clazz.newInstance();
                collector.collect(conf, key, value, pstmt, converter);
                if (count % Integer.valueOf(conf.get(GlobalConstants.JDBC_BATCH_NUMBER,
                        GlobalConstants.DEFAULT_JDBC_BATCH_NUMBER)) == 0) {
                    pstmt.executeBatch();
                    conn.commit();
                    batch.put(kpi, 0); // Reset the batch count
                }
            } catch (Throwable e) {
                logger.error("Exception while writing data in the writer", e);
                throw new IOException(e);
            }
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            try {
                // Flush any remaining batched statements
                for (Map.Entry<KpiType, PreparedStatement> entry : this.map.entrySet()) {
                    entry.getValue().executeBatch();
                }
            } catch (SQLException e) {
                logger.error("Exception while executing executeBatch", e);
                throw new IOException(e);
            } finally {
                try {
                    if (conn != null) {
                        conn.commit(); // Commit the connection
                    }
                } catch (Exception e) {
                    // nothing
                } finally {
                    for (Map.Entry<KpiType, PreparedStatement> entry : this.map.entrySet()) {
                        try {
                            entry.getValue().close();
                        } catch (SQLException e) {
                            // nothing
                        }
                    }
                    if (conn != null) {
                        try {
                            conn.close();
                        } catch (Exception e) {
                            // nothing
                        }
                    }
                }
            }
        }
    }
}
```
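Wiring the custom OutputFormat into a job driver could look like the following fragment. ActiveUserRunner is a hypothetical runner class and the key/value classes vary per statistics job; the configuration must also carry the insert sql (under each kpi name) and the collector class for each kpi.

```java
// Hedged driver fragment: route the reducer output through TransformerOutputFormat.
Job job = Job.getInstance(conf, "active_user");
job.setJarByClass(ActiveUserRunner.class);
job.setMapOutputKeyClass(StatsUserDimension.class);
job.setMapOutputValueClass(TimeOutputValue.class);
job.setOutputKeyClass(StatsUserDimension.class);
job.setOutputValueClass(MapWritableValue.class);
job.setOutputFormatClass(TransformerOutputFormat.class); // reducer output goes to mysql
```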
-
ActiveUserCollector coding
Create a new ActiveUserCollector class file: implements IOutputCollector, filling the PreparedStatement that writes active-user counts to mysql
No customization required
```java
public class ActiveUserCollector implements IOutputCollector {
    /**
     * INSERT INTO `stats_user`(`platform_dimension_id`, `date_dimension_id`, `active_users`, `created`)
     * VALUES(?, ?, ?, ?) ON DUPLICATE KEY UPDATE `active_users` = ?
     */
    @Override
    public void collect(Configuration conf, BaseDimension key, BaseStatsValueWritable value,
            PreparedStatement pstmt, IDimensionConverter converter) throws SQLException, IOException {
        StatsUserDimension userDimension = (StatsUserDimension) key;
        MapWritableValue mapWritableValue = (MapWritableValue) value;
        MapWritable mapWritable = mapWritableValue.getValue();
        IntWritable intWritable = (IntWritable) mapWritable.get(new IntWritable(-1));
        int activeUsers = intWritable.get();
        pstmt.setInt(1, converter.getDimensionIdByValue(userDimension.getStatsCommon().getPlatform()));
        pstmt.setInt(2, converter.getDimensionIdByValue(userDimension.getStatsCommon().getDate()));
        pstmt.setInt(3, activeUsers);
        pstmt.setString(4, conf.get(GlobalConstants.RUNNING_DATE_PARAMES));
        pstmt.setInt(5, activeUsers);
        pstmt.addBatch();
    }
}
```
-
Article reprinted from Le byte