Integrating Kafka with nginx Lua (OpenResty)

Posted by damdempsel on Sat, 03 Aug 2019 10:05:15 +0200


Step 1: Enter the openresty directory

[root@node03 openresty]# cd /export/servers/openresty/
[root@node03 openresty]# ll
total 356
drwxr-xr-x  2 root root   4096 Jul 26 11:33 bin
drwxrwxr-x 44 1000 1000   4096 Jul 26 11:31 build
drwxrwxr-x 43 1000 1000   4096 Nov 13  2017 bundle
-rwxrwxr-x  1 1000 1000  45908 Nov 13  2017 configure
-rw-rw-r--  1 1000 1000  22924 Nov 13  2017 COPYRIGHT
drwxr-xr-x  6 root root   4096 Jul 26 11:33 luajit
drwxr-xr-x  6 root root   4096 Aug  1 08:14 lualib
-rw-r--r--  1 root root   5413 Jul 26 11:32 Makefile
drwxr-xr-x 11 root root   4096 Jul 26 11:35 nginx
drwxrwxr-x  2 1000 1000   4096 Nov 13  2017 patches
drwxr-xr-x 44 root root   4096 Jul 26 11:33 pod
-rw-rw-r--  1 1000 1000   3689 Nov 13  2017 README.markdown
-rw-rw-r--  1 1000 1000   8690 Nov 13  2017 README-win32.txt
-rw-r--r--  1 root root 218352 Jul 26 11:33 resty.index
drwxr-xr-x  5 root root   4096 Jul 26 11:33 site
drwxr-xr-x  2 root root   4096 Aug  1 10:54 testlua
drwxrwxr-x  2 1000 1000   4096 Nov 13  2017 util
[root@node03 openresty]# 

Note: next, we focus on two directories: lualib and nginx.

1. lualib: the Lua libraries bundled with OpenResty

2. nginx: the nginx server directory

Next, let's go into the lualib directory and see what it contains:

[root@node03 openresty]# cd lualib/
[root@node03 lualib]# ll
total 116
-rwxr-xr-x 1 root root 101809 Jul 26 11:33 cjson.so
drwxr-xr-x 3 root root   4096 Jul 26 11:33 ngx
drwxr-xr-x 2 root root   4096 Jul 26 11:33 rds
drwxr-xr-x 2 root root   4096 Jul 26 11:33 redis
drwxr-xr-x 9 root root   4096 Aug  1 10:34 resty

Here we see the ngx and redis packages, which means we can use the ngx API and Redis from Lua without importing any extra dependencies.

Now let's look at what is inside resty:

[root@node03 lualib]# cd resty/
[root@node03 resty]# ll
total 152
-rw-r--r-- 1 root root  6409 Jul 26 11:33 aes.lua
drwxr-xr-x 2 root root  4096 Jul 26 11:33 core
-rw-r--r-- 1 root root   596 Jul 26 11:33 core.lua
drwxr-xr-x 2 root root  4096 Jul 26 11:33 dns
drwxr-xr-x 2 root root  4096 Aug  1 10:42 kafka   #This is what we imported ourselves.
drwxr-xr-x 2 root root  4096 Jul 26 11:33 limit
-rw-r--r-- 1 root root  4616 Jul 26 11:33 lock.lua
drwxr-xr-x 2 root root  4096 Jul 26 11:33 lrucache
-rw-r--r-- 1 root root  4620 Jul 26 11:33 lrucache.lua
-rw-r--r-- 1 root root  1211 Jul 26 11:33 md5.lua
-rw-r--r-- 1 root root 14544 Jul 26 11:33 memcached.lua
-rw-r--r-- 1 root root 21577 Jul 26 11:33 mysql.lua
-rw-r--r-- 1 root root   616 Jul 26 11:33 random.lua
-rw-r--r-- 1 root root  9227 Jul 26 11:33 redis.lua
-rw-r--r-- 1 root root  1192 Jul 26 11:33 sha1.lua
-rw-r--r-- 1 root root  1045 Jul 26 11:33 sha224.lua
-rw-r--r-- 1 root root  1221 Jul 26 11:33 sha256.lua
-rw-r--r-- 1 root root  1045 Jul 26 11:33 sha384.lua
-rw-r--r-- 1 root root  1359 Jul 26 11:33 sha512.lua
-rw-r--r-- 1 root root   236 Jul 26 11:33 sha.lua
-rw-r--r-- 1 root root   698 Jul 26 11:33 string.lua
-rw-r--r-- 1 root root  5178 Jul 26 11:33 upload.lua
drwxr-xr-x 2 root root  4096 Jul 26 11:33 upstream
drwxr-xr-x 2 root root  406 Jul 26 11:33 websocket

Here we see the familiar mysql.lua and redis.lua; the rest can be ignored for now.

Note: there is no kafka package by default, because OpenResty does not bundle a Kafka client. I imported the Kafka package (the lua-resty-kafka library) in advance.

Let's see which files the kafka directory contains:

[root@node03 resty]# cd kafka
[root@node03 kafka]# ll
total 48
-rw-r--r-- 1 root root  1369 Aug  1 10:42 broker.lua
-rw-r--r-- 1 root root  5537 Aug  1 10:42 client.lua
-rw-r--r-- 1 root root   710 Aug  1 10:42 errors.lua
-rw-r--r-- 1 root root 10718 Aug  1 10:42 producer.lua
-rw-r--r-- 1 root root  4072 Aug  1 10:42 request.lua
-rw-r--r-- 1 root root  2118 Aug  1 10:42 response.lua
-rw-r--r-- 1 root root  1494 Aug  1 10:42 ringbuffer.lua
-rw-r--r-- 1 root root  4845 Aug  1 10:42 sendbuffer.lua

Download link for the Kafka package:

Link: https://pan.baidu.com/s/1pFLhz3E_txb3ZWIRWxfQYg
Extraction code: 0umg

Step 2: Create a kafka test lua file

1. Go back to the openresty directory

[root@node03 kafka]# cd /export/servers/openresty/

2. Create a directory for the test script

[root@node03 openresty]# mkdir -p testlua

You can name this directory anything and put it anywhere, but remember its path: it is referenced in nginx.conf in Step 3.

3. Enter the folder you just created and create the kafkalua.lua script file

Create the file with vim kafkalua.lua or touch kafkalua.lua

[root@node03 openresty]# cd testlua/
[root@node03 testlua]# ll
total 8
-rw-r--r-- 1 root root 3288 Aug  1 10:54 kafkalua.lua

kafkalua.lua:

-- Test output; this line can be removed
ngx.say('hello kafka file configuration successful!!!!!!')

-- Collection threshold: if the number of active connections exceeds this value, no data is collected
local DEFAULT_THRESHOLD = 100000
-- Number of Kafka partitions; must match the partition count of the topic (created with --partitions 3 in Step 6)
local PARTITION_NUM = 3
-- Kafka topic name
local TOPIC = 'B2CDATA_COLLECTION1'
-- Key of the round-robin counter in the shared dictionary
local POLLING_KEY = "POLLING_KEY"
-- Custom partitioner: the message key is the partition id computed below;
-- "% num" keeps the id valid even if PARTITION_NUM and the topic disagree
local function partitioner(key, num, correlation_id)
    return tonumber(key) % num
end
-- Kafka cluster: broker addresses (the IPs must match Kafka's host.name configuration)
local BROKER_LIST = {{host="192.168.52.100",port=9092},{host="192.168.52.110",port=9092},{host="192.168.52.120",port=9092}}
-- Kafka producer parameters
local CONNECT_PARAMS = { producer_type = "async", socket_timeout = 30000, flush_time = 10000, request_timeout = 20000, partitioner = partitioner }
-- Shared-memory counter used for round-robin partition selection (lua_shared_dict shared_data in nginx.conf)
local shared_data = ngx.shared.shared_data
local pollingVal = shared_data:get(POLLING_KEY)
if not pollingVal then
    pollingVal = 1
    shared_data:set(POLLING_KEY, pollingVal)
end
-- Take the counter modulo PARTITION_NUM so messages are spread evenly across partitions
local partitions = '' .. (tonumber(pollingVal) % PARTITION_NUM)
shared_data:incr(POLLING_KEY, 1)

-- concurrency control
local isGone = true
-- Overload protection (rate limiting): stop collecting when ngx.var.connections_active exceeds the threshold
if tonumber(ngx.var.connections_active) > tonumber(DEFAULT_THRESHOLD) then
    isGone = false
end
-- Data collection
if isGone then

    local time_local = ngx.var.time_local
    if time_local == nil then
        time_local = ""
    end

    local request = ngx.var.request
    if request == nil then
        request = ""
    end

    local request_method = ngx.var.request_method
    if request_method == nil then
        request_method = ""
    end

    local content_type = ngx.var.content_type
    if content_type == nil then
        content_type = ""
    end
    ngx.req.read_body()
    local request_body = ngx.var.request_body
    if request_body == nil then
        request_body = ""
    end

    local http_referer = ngx.var.http_referer
    if http_referer == nil then
        http_referer = ""
    end

    local remote_addr = ngx.var.remote_addr
    if remote_addr == nil then
        remote_addr = ""
    end

    local http_user_agent = ngx.var.http_user_agent
    if http_user_agent == nil then
        http_user_agent = ""
    end

    local time_iso8601 = ngx.var.time_iso8601
    if time_iso8601 == nil then
        time_iso8601 = ""
    end

    local server_addr = ngx.var.server_addr
    if server_addr == nil then
        server_addr = ""
    end

    local http_cookie = ngx.var.http_cookie
    if http_cookie == nil then
        http_cookie = ""
    end
    -- Assemble the message; fields are separated by #CS#
    local message = time_local .."#CS#".. request .."#CS#".. request_method .."#CS#".. content_type .."#CS#".. request_body .."#CS#".. http_referer .."#CS#".. remote_addr .."#CS#".. http_user_agent .."#CS#".. time_iso8601 .."#CS#".. server_addr .."#CS#".. http_cookie;
    -- Load the Kafka producer module
    local producer = require "resty.kafka.producer"
    -- Create the producer
    local bp = producer:new(BROKER_LIST, CONNECT_PARAMS)
    -- Send the message
    local ok, err = bp:send(TOPIC, partitions, message)
    -- Log any error
    if not ok then
        ngx.log(ngx.ERR, "kafka send err:", err)
        return
    end
end
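The partition selection and overload guard in the script above can be modelled in a few lines of Java. This is a sketch under stated assumptions, not the script itself: an AtomicLong stands in for the shared dictionary counter (combining the script's separate get and incr into one atomic step), the class name RoundRobinPartitions is hypothetical, and 3 partitions are assumed to match the topic created in Step 6.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of kafkalua.lua's partition choice: counter % partitionNum.
public class RoundRobinPartitions {
    // Mirrors DEFAULT_THRESHOLD in the Lua script
    public static final int DEFAULT_THRESHOLD = 100000;

    // Stands in for the POLLING_KEY counter; the script starts it at 1
    private final AtomicLong pollingVal = new AtomicLong(1);
    private final int partitionNum;

    public RoundRobinPartitions(int partitionNum) {
        this.partitionNum = partitionNum;
    }

    // Partition for the next message; getAndIncrement reads and advances atomically
    public int nextPartition() {
        return (int) (pollingVal.getAndIncrement() % partitionNum);
    }

    // Same rule as the script: collect only while active connections <= threshold
    public static boolean shouldCollect(int connectionsActive) {
        return connectionsActive <= DEFAULT_THRESHOLD;
    }

    // How many of `messages` consecutive messages each partition receives
    public int[] distribute(int messages) {
        int[] counts = new int[partitionNum];
        for (int i = 0; i < messages; i++) {
            counts[nextPartition()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        RoundRobinPartitions rr = new RoundRobinPartitions(3);
        System.out.println(Arrays.toString(rr.distribute(9))); // [3, 3, 3]
        System.out.println(shouldCollect(100001));             // false
    }
}
```

Because the counter advances by one per request, consecutive messages cycle through the partitions, which is what keeps them balanced.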

Step 3: Modify the nginx configuration file nginx.conf

1. Go to the nginx/conf directory

[root@node03 openresty]# cd /export/servers/openresty/nginx/conf/
[root@node03 conf]# ll
total 76
-rw-r--r-- 1 root root 1077 Jul 26 11:33 fastcgi.conf
-rw-r--r-- 1 root root 1077 Jul 26 11:33 fastcgi.conf.default
-rw-r--r-- 1 root root 1007 Jul 26 11:33 fastcgi_params
-rw-r--r-- 1 root root 1007 Jul 26 11:33 fastcgi_params.default
-rw-r--r-- 1 root root 2837 Jul 26 11:33 koi-utf
-rw-r--r-- 1 root root 2223 Jul 26 11:33 koi-win
-rw-r--r-- 1 root root 5170 Jul 26 11:33 mime.types
-rw-r--r-- 1 root root 5170 Jul 26 11:33 mime.types.default
-rw-r--r-- 1 root root 3191 Aug  1 10:52 nginx.conf
-rw-r--r-- 1 root root 2656 Jul 26 11:33 nginx.conf.default
-rw-r--r-- 1 root root  636 Jul 26 11:33 scgi_params
-rw-r--r-- 1 root root  636 Jul 26 11:33 scgi_params.default
-rw-r--r-- 1 root root  664 Jul 26 11:33 uwsgi_params
-rw-r--r-- 1 root root  664 Jul 26 11:33 uwsgi_params.default
-rw-r--r-- 1 root root 3610 Jul 26 11:33 win-utf

2. Modify nginx.conf

[root@node03 conf]# vim nginx.conf

        # 1. Find the first server block.
        # 2. Add the two lines below inside the http block, just above that server block.
        # 3. Add the Kafka location inside the server block.

#Additional code
 # Declare a 10 MB shared dictionary named shared_data, shared by all nginx worker processes
 lua_shared_dict shared_data 10m;
 # DNS server that Lua uses to resolve host names (needed if Kafka brokers are addressed by host name)
 resolver 127.0.0.1;
#Additional code

 server {
        listen       80;
        server_name  localhost;

        #charset koi8-r;

        #access_log  logs/host.access.log  main;
        location / {
            root   html;
            index  index.html index.htm;
        }

        #Additional code
        location /kafkalua {  # kafkalua is the URI path (the "project name"); choose any name
            # Enable nginx status monitoring (stub_status)
            stub_status on;
            # Content type of the response
            default_type text/html;
            # Path of the Lua script, i.e. the kafkalua.lua created in Step 2
            content_by_lua_file /export/servers/openresty/testlua/kafkalua.lua;
        }
        #Additional code
}

Note: in location /kafkalua {...}, kafkalua is the URI path (the "project name"). You can choose any name, but remember it: you will type it in the browser.

Notice that two locations are configured above: location / {...} and location /kafkalua {...}. What is the difference between them? The browser test in Step 4 makes it clear.
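For prefix locations like these two, nginx picks the longest prefix that matches the start of the request URI. The Java sketch below models only that rule; it deliberately ignores regex locations and the `=` and `^~` modifiers, and the class name LocationMatch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of nginx prefix-location selection: longest matching prefix wins.
public class LocationMatch {
    public static String match(String uri, List<String> prefixes) {
        String best = null;
        for (String prefix : prefixes) {
            // A prefix location matches when the URI starts with it
            if (uri.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best; // null means no prefix location matched
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("/", "/kafkalua");
        System.out.println(match("/index.html", locations));                                // /
        System.out.println(match("/kafkalua/B2C40/query/jaxb/direct/query.ao", locations)); // /kafkalua
    }
}
```

This is why any path beginning with /kafkalua runs the Lua script, while everything else falls back to location /.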

Step 4: Start nginx

1. Go to nginx/sbin

[root@node03 sbin]# cd /export/servers/openresty/nginx/sbin/
[root@node03 sbin]# ll
total 16356
-rwxr-xr-x 1 root root 16745834 Jul 26 11:33 nginx

2. Test whether the configuration file is correct

[root@node03 sbin]# ./nginx -t
nginx: the configuration file /export/servers/openresty/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /export/servers/openresty/nginx/conf/nginx.conf test is successful
# The configuration is valid.

3. Start nginx

[root@node03 sbin]# ./nginx
# No output usually means nginx started successfully.

4. Check whether nginx started successfully

[root@node03 sbin]# ps -ef | grep nginx
root       3730      1  0 09:24 ?        00:00:00 nginx: master process nginx
nobody     3731   3730  0 09:24 ?        00:00:20 nginx: worker process is shutting down
nobody     5766   3730  0 12:17 ?        00:00:00 nginx: worker process
root       5824   3708  0 12:24 pts/1    00:00:00 grep nginx
# A master process and a worker process are running, so nginx started successfully.
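The same check can be done by a script instead of by eye. A minimal Java sketch, assuming the `ps -ef` output is already available as lines: it counts the master and the live workers, skipping the `grep` line and any worker that is "shutting down" (typically left over from a reload, as in the output above). The class name NginxPsCheck and the "exactly one master" rule are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: inspect `ps -ef` lines for a running nginx.
public class NginxPsCheck {
    // Returns {masters, liveWorkers}
    public static int[] countProcesses(List<String> psLines) {
        int masters = 0;
        int workers = 0;
        for (String line : psLines) {
            if (line.contains("nginx: master process")) {
                masters++;
            } else if (line.contains("nginx: worker process") && !line.contains("is shutting down")) {
                workers++; // a worker that is shutting down will exit soon, so it is not counted
            }
        }
        return new int[] {masters, workers};
    }

    // Assumed rule: exactly one master and at least one live worker
    public static boolean isRunning(List<String> psLines) {
        int[] c = countProcesses(psLines);
        return c[0] == 1 && c[1] >= 1;
    }

    public static void main(String[] args) {
        // Lines taken from the ps output above
        List<String> lines = Arrays.asList(
                "root       3730      1  0 09:24 ?        00:00:00 nginx: master process nginx",
                "nobody     3731   3730  0 09:24 ?        00:00:20 nginx: worker process is shutting down",
                "nobody     5766   3730  0 12:17 ?        00:00:00 nginx: worker process",
                "root       5824   3708  0 12:24 pts/1    00:00:00 grep nginx");
        System.out.println(isRunning(lines)); // true
    }
}
```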

5. Access nginx from a browser

Enter in the browser: node03/kafkalua

Note: if node03 is not mapped in your hosts file, use the IP address of the machine running OpenResty instead, e.g. 192.168.52.120/kafkalua

Enter in the browser: node03/ or 192.168.52.120/

Enter in the browser: node03:80/kafkalua and node03:80/try

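Compare these URLs with nginx.conf:

In node03:80/kafkalua, node03 is the server's host name (you can also write its IP address), and 80 is the port set by listen. Port 80 can be omitted; if the config said listen 8088, you would have to enter node03:8088/kafkalua, and the 8088 could not be omitted. kafkalua is the location path (the "project name"). nginx matches these locations by prefix, so node03/ is served by location / (the static index.html in the html directory), while node03/kafkalua, and any path beginning with /kafkalua, is handled by location /kafkalua, which runs kafkalua.lua.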
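The host/port/path split described above can be seen with java.net.URI. A small sketch, assuming plain http (a browser adds the http:// prefix when you type node03/kafkalua); UrlParts is a hypothetical name. URI.getPort() returns -1 when no port is written, and the sketch then falls back to 80, the HTTP default.

```java
import java.net.URI;

// Hypothetical helper: split a browser URL into host, effective port and path.
public class UrlParts {
    // Only correct for http; https would default to 443
    public static int effectivePort(URI uri) {
        return uri.getPort() == -1 ? 80 : uri.getPort();
    }

    public static void main(String[] args) {
        URI a = URI.create("http://node03/kafkalua");
        URI b = URI.create("http://node03:8088/kafkalua");
        System.out.println(a.getHost() + " " + effectivePort(a) + " " + a.getPath()); // node03 80 /kafkalua
        System.out.println(b.getHost() + " " + effectivePort(b) + " " + b.getPath()); // node03 8088 /kafkalua
    }
}
```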


Step 5: Create a test crawler

1. Create a Maven project and add the dependencies

    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.11.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.4</version>
        </dependency>
    </dependencies>

2. Simulated crawler program

import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;

public class SpiderGoAirCN {
    private static String basePath = "http://node03/kafkalua";
    // One client shared by all requests, closed when main() ends
    private static final CloseableHttpClient httpClient = HttpClients.createDefault();

    private static final String REFERER_PAGE =
            "http://b2c.csair.com/B2C40/modules/bookingnew/main/flightSelectDirect.html";
    private static final String REFERER_FIXED =
            REFERER_PAGE + "?t=S&c1=CAN&c2=WUH&d1=2018-02-20&at=1&ct=0&it=0";
    private static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36";
    private static final String QUERY_JSON =
            "{\"depcity\":\"CAN\", \"arrcity\":\"WUH\", \"flightdate\":\"20180220\", \"adultnum\":\"1\", \"childnum\":\"0\", \"infantnum\":\"0\", \"cabinorder\":\"0\", \"airline\":\"1\", \"flytype\":\"0\", \"international\":\"0\", \"action\":\"0\", \"segtype\":\"1\", \"cache\":\"0\", \"preUrl\":\"\", \"isMember\":\"\"}";

    public static void main(String[] args) throws Exception {
        try {
            for (int i = 0; i < 50000; i++) {
                // Request for query information
                spiderQueryao();
                // Request html
                spiderHtml();
                // Request js
                spiderJs();
                // Request css
                spiderCss();
                // Request png
                spiderPng();
                // Request jpg
                spiderJpg();
                Thread.sleep(100);
            }
        } finally {
            httpClient.close();
        }
    }

    public static void spiderQueryao() throws Exception {
        // Target matches ^.*/B2C40/query/jaxb/direct/query.ao.*$; referer and cookie use today's date
        send(basePath + "/B2C40/query/jaxb/direct/query.ao",
                REFERER_PAGE + "?t=S&c1=CAN&c2=WUH&d1=" + getGoTime() + "&at=1&ct=0&it=0",
                "192.168.56.80", "243.45.78.132", buildCookie(getGoTime()));
    }

    public static void spiderHtml() throws Exception {
        // Target matches ^.*html.*$
        send(basePath + "/B2C40/modules/bookingnew/main/flightSelectDirect.html?t=S&c1=CAN&c2=CTU&d1=2018-01-17&at=1&ct=0&it=0",
                REFERER_FIXED, "192.168.56.1", "192.168.56.80", buildCookie("2018-01-17"));
    }

    public static void spiderJs() throws Exception {
        send(basePath + "/B2C40/dist/main/modules/common/requireConfig.js",
                REFERER_FIXED, "192.168.56.1", "192.168.56.80", buildCookie("2018-01-17"));
    }

    public static void spiderCss() throws Exception {
        // The css request sends the referer page without a query string
        send(basePath + "/B2C40/dist/main/css/flight.css",
                REFERER_PAGE, "192.168.56.1", "192.168.56.80", buildCookie("2018-01-17"));
    }

    public static void spiderPng() throws Exception {
        send(basePath + "/B2C40/dist/main/images/common.png",
                REFERER_FIXED, "192.168.56.1", "192.168.56.80", buildCookie("2018-01-17"));
    }

    public static void spiderJpg() throws Exception {
        send(basePath + "/B2C40/dist/main/images/loadingimg.jpg",
                REFERER_FIXED, "192.168.56.1", "192.168.56.80", buildCookie("2018-01-17"));
    }

    // Builds and sends one mock POST request; all six spider methods share this logic
    private static void send(String url, String referer, String remoteAddr,
                             String serverAddr, String cookie) throws Exception {
        // 1. Target URL
        HttpPost httpPost = new HttpPost(url);
        // 2. Mock request headers (header names cannot contain spaces, hence the hyphens)
        httpPost.setHeader("Time-Local", getLocalDateTime());
        httpPost.setHeader("Request", "POST /B2C40/query/jaxb/direct/query.ao HTTP/1.1");
        httpPost.setHeader("Request-Method", "POST");
        httpPost.setHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        httpPost.setHeader("Referer", referer);
        httpPost.setHeader("Remote-Address", remoteAddr);
        httpPost.setHeader("User-Agent", USER_AGENT);
        httpPost.setHeader("Time-Iso8601", getISO8601Timestamp());
        httpPost.setHeader("Server-Address", serverAddr);
        httpPost.setHeader("Cookie", cookie);
        // 3. Form body with a single "json" parameter
        List<BasicNameValuePair> parameters = new ArrayList<>();
        parameters.add(new BasicNameValuePair("json", QUERY_JSON));
        httpPost.setEntity(new UrlEncodedFormEntity(parameters));
        // 4. Send, print the status code, and always release the connection
        try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
            EntityUtils.consume(response.getEntity());
            System.out.println(response.getStatusLine().getStatusCode());
        }
    }

    // Mock cookie; `date` fills the two date slots (temp_zh and WT.al_orgdate1)
    private static String buildCookie(String date) {
        return "JSESSIONID=782121159357B98CA6112554CF44321E; sid=b5cc11e02e154ac5b0f3609332f86803; aid=8ae8768760927e280160bb348bef3e12; identifyStatus=N; userType4logCookie=M; userId4logCookie=13818791413; useridCookie=13818791413; userCodeCookie=13818791413; temp_zh=cou%3D0%3Bsegt%3D%E5%8D%95%E7%A8%8B%3Btime%3D2018-01-13%3B%E5%B9%BF%E5%B7%9E-%E5%8C%97%E4%BA%AC%3B1%2C0%2C0%3B%26cou%3D1%3Bsegt%3D%E5%8D%95%E7%A8%8B%3Btime%3D"
                + date
                + "%3B%E5%B9%BF%E5%B7%9E-%E6%88%90%E9%83%BD%3B1%2C0%2C0%3B%26; JSESSIONID=782121159357B98CA6112554CF44321E; WT-FPC=id=211.103.142.26-608782688.30635197:lv=1516170718655:ss=1516170709449:fs=1513243317440:pn=2:vn=10; language=zh_CN; WT.al_flight=WT.al_hctype(S)%3AWT.al_adultnum(1)%3AWT.al_childnum(0)%3AWT.al_infantnum(0)%3AWT.al_orgcity1(CAN)%3AWT.al_dstcity1(CTU)%3AWT.al_orgdate1("
                + date + ")";
    }

    public static String getLocalDateTime() {
        DateFormat df = new SimpleDateFormat("dd/MMM/yyyy'T'HH:mm:ss +08:00", Locale.ENGLISH);
        return df.format(new Date());
    }

    public static String getISO8601Timestamp() {
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss+08:00");
        return df.format(new Date());
    }

    public static String getGoTime() {
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
        return df.format(new Date());
    }

    public static String getBackTime() {
        Calendar calendar = new GregorianCalendar();
        calendar.setTime(new Date());
        // Move the date forward by one day (use -1 to move it back instead)
        calendar.add(Calendar.DATE, 1);
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
        return formatter.format(calendar.getTime());
    }
}

Step 6: Start kafka

1. Create topic

[root@node01 bin]# kafka-topics.sh --zookeeper node01:2181 --partitions 3 \
--replication-factor 3 --create --topic B2CDATA_COLLECTION1
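This topic has 3 partitions, so its valid partition ids are 0 to 2. The Lua script computes ids as counter % PARTITION_NUM and hands them to its partitioner, so PARTITION_NUM in kafkalua.lua must not exceed the partition count given here. A small Java sketch of that consistency check; PartitionCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: which ids from "counter % partitionNum" do not exist in the topic?
public class PartitionCheck {
    // The scheme produces ids 0..partitionNum-1; the topic only has 0..topicPartitions-1
    public static List<Integer> invalidIds(int partitionNum, int topicPartitions) {
        List<Integer> invalid = new ArrayList<>();
        for (int id = topicPartitions; id < partitionNum; id++) {
            invalid.add(id);
        }
        return invalid;
    }

    public static void main(String[] args) {
        System.out.println(invalidIds(6, 3)); // [3, 4, 5]
        System.out.println(invalidIds(3, 3)); // []
    }
}
```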

2. Start a console consumer

[root@node01 bin]# kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 \
--topic B2CDATA_COLLECTION1

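Step 7: Run the crawler program and observe the results

1. Run the crawler program (the main method of SpiderGoAirCN)

2. Watch the consumer window: each collected request appears as one message whose fields are separated by #CS#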
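A downstream consumer has to split each message back into fields. A minimal Java sketch, assuming the 11-field order built in kafkalua.lua; the class name CsMessageParser and the field names (borrowed from the nginx variables) are illustrative. Note that if a field itself contained "#CS#", this naive split would break.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "#CS#"-joined messages produced by kafkalua.lua.
public class CsMessageParser {
    // Same order as the concatenation in the Lua script
    private static final String[] FIELDS = {
        "time_local", "request", "request_method", "content_type", "request_body",
        "http_referer", "remote_addr", "http_user_agent", "time_iso8601",
        "server_addr", "http_cookie"
    };

    public static Map<String, String> parse(String message) {
        // "#" is not special in a Java regex here; limit -1 keeps trailing empty fields
        String[] parts = message.split("#CS#", -1);
        if (parts.length != FIELDS.length) {
            throw new IllegalArgumentException(
                    "expected " + FIELDS.length + " fields, got " + parts.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            record.put(FIELDS[i], parts[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        // Made-up sample values, not real captured traffic
        String sample = String.join("#CS#", "03/Aug/2019:10:05:15 +0200",
                "POST /kafkalua HTTP/1.1", "POST", "application/x-www-form-urlencoded",
                "json=...", "", "192.168.52.1", "Mozilla/5.0", "2019-08-03T10:05:15+02:00",
                "192.168.52.120", "");
        System.out.println(parse(sample).get("request_method")); // POST
    }
}
```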

Step 8: Observe the topic with kafka-manager

1. Start kafka-manager

[root@node01 conf]# cd /export/servers/kafka-manager-1.3.3.23/bin/
[root@node01 bin]# ll
total 36
-rwxr-xr-x 1 root root 13747 May  1 06:27 kafka-manager
-rw-r--r-- 1 root root  9975 May  1 06:27 kafka-manager.bat
-rwxr-xr-x 1 root root  1383 May  1 06:27 log-config
-rw-r--r-- 1 root root   105 May  1 06:27 log-config.bat
[root@node01 bin]# 

# Start kafka-manager
[root@node01 bin]# ./kafka-manager 


2. Open it in a browser

Enter in the browser: node01:9000

kafka-manager usage is not covered here; open the topic B2CDATA_COLLECTION1 and look at its partitions:

The topic has three partitions. If all three receive messages in roughly equal numbers, the setup works.

If they do not, kafkalua.lua is probably not configuring a partitioning strategy: the default partitioning can skew the data onto a few partitions, so configure your own partitioning strategy.

Done!
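"Roughly equal" can be made concrete by comparing each partition's count with the mean. A Java sketch, assuming you copy per-partition message counts from kafka-manager by hand; the class name, the sample counts and the 10% tolerance are all hypothetical choices, not values from the original setup.

```java
// Hypothetical balance check over per-partition message counts.
public class PartitionBalance {
    // Largest relative deviation of any partition from the mean (0.1 = 10%)
    public static double maxDeviation(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        double mean = (double) total / counts.length;
        if (mean == 0) {
            return 0; // no messages yet, nothing to compare
        }
        double worst = 0;
        for (long c : counts) {
            worst = Math.max(worst, Math.abs(c - mean) / mean);
        }
        return worst;
    }

    // Balanced if no partition deviates from the mean by more than `tolerance`
    public static boolean isBalanced(long[] counts, double tolerance) {
        return maxDeviation(counts) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced(new long[] {1000, 1000, 1000}, 0.10)); // true
        System.out.println(isBalanced(new long[] {3000, 0, 0}, 0.10));       // false
    }
}
```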
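To see how key-based partitioning can skew data, here is a Java model of hash partitioning. It assumes, without confirming it against the library source, that lua-resty-kafka's default partitioner computes CRC32 of the message key modulo the partition count. Under that rule, messages that share a key always land on the same partition, whereas the round-robin counter in kafkalua.lua spreads them out. HashPartitionSkew is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical model of key-hash partitioning: partition = CRC32(key) % partitionCount.
public class HashPartitionSkew {
    public static int partitionFor(String key, int partitionCount) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative 32-bit value, so the result is in [0, partitionCount)
        return (int) (crc.getValue() % partitionCount);
    }

    // How many of the given keys land on each partition
    public static int[] counts(String[] keys, int partitionCount) {
        int[] counts = new int[partitionCount];
        for (String key : keys) {
            counts[partitionFor(key, partitionCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Every message carries the same key: all six land on one partition
        String[] sameKey = {"k", "k", "k", "k", "k", "k"};
        System.out.println(Arrays.toString(counts(sameKey, 3)));
    }
}
```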
