Integrating Kafka with NGINX Lua (openresty)
Step 1: Enter the openresty directory
[root@node03 openresty]# cd /export/servers/openresty/
[root@node03 openresty]# ll
total 356
drwxr-xr-x 2 root root 4096 Jul 26 11:33 bin
drwxrwxr-x 44 1000 1000 4096 Jul 26 11:31 build
drwxrwxr-x 43 1000 1000 4096 Nov 13 2017 bundle
-rwxrwxr-x 1 1000 1000 45908 Nov 13 2017 configure
-rw-rw-r-- 1 1000 1000 22924 Nov 13 2017 COPYRIGHT
drwxr-xr-x 6 root root 4096 Jul 26 11:33 luajit
drwxr-xr-x 6 root root 4096 Aug 1 08:14 lualib
-rw-r--r-- 1 root root 5413 Jul 26 11:32 Makefile
drwxr-xr-x 11 root root 4096 Jul 26 11:35 nginx
drwxrwxr-x 2 1000 1000 4096 Nov 13 2017 patches
drwxr-xr-x 44 root root 4096 Jul 26 11:33 pod
-rw-rw-r-- 1 1000 1000 3689 Nov 13 2017 README.markdown
-rw-rw-r-- 1 1000 1000 8690 Nov 13 2017 README-win32.txt
-rw-r--r-- 1 root root 218352 Jul 26 11:33 resty.index
drwxr-xr-x 5 root root 4096 Jul 26 11:33 site
drwxr-xr-x 2 root root 4096 Aug 1 10:54 testlua
drwxrwxr-x 2 1000 1000 4096 Nov 13 2017 util
[root@node03 openresty]#
Note: next, let's focus on two directories, lualib and nginx:
1. lualib: stores the libraries bundled with openresty
2. nginx: the nginx service directory
Next, let's go into the lualib directory and see what it contains:
[root@node03 openresty]# cd lualib/
[root@node03 lualib]# ll
total 116
-rwxr-xr-x 1 root root 101809 Jul 26 11:33 cjson.so
drwxr-xr-x 3 root root 4096 Jul 26 11:33 ngx
drwxr-xr-x 2 root root 4096 Jul 26 11:33 rds
drwxr-xr-x 2 root root 4096 Jul 26 11:33 redis
drwxr-xr-x 9 root root 4096 Aug 1 10:34 resty
Here we see the redis and ngx packages, which means nginx and redis can be used from Lua without importing any extra dependency packages.
Now let's look at what is inside resty:
[root@node03 lualib]# cd resty/
[root@node03 resty]# ll
total 152
-rw-r--r-- 1 root root 6409 Jul 26 11:33 aes.lua
drwxr-xr-x 2 root root 4096 Jul 26 11:33 core
-rw-r--r-- 1 root root 596 Jul 26 11:33 core.lua
drwxr-xr-x 2 root root 4096 Jul 26 11:33 dns
drwxr-xr-x 2 root root 4096 Aug 1 10:42 kafka    # This is the package we imported ourselves.
drwxr-xr-x 2 root root 4096 Jul 26 11:33 limit
-rw-r--r-- 1 root root 4616 Jul 26 11:33 lock.lua
drwxr-xr-x 2 root root 4096 Jul 26 11:33 lrucache
-rw-r--r-- 1 root root 4620 Jul 26 11:33 lrucache.lua
-rw-r--r-- 1 root root 1211 Jul 26 11:33 md5.lua
-rw-r--r-- 1 root root 14544 Jul 26 11:33 memcached.lua
-rw-r--r-- 1 root root 21577 Jul 26 11:33 mysql.lua
-rw-r--r-- 1 root root 616 Jul 26 11:33 random.lua
-rw-r--r-- 1 root root 9227 Jul 26 11:33 redis.lua
-rw-r--r-- 1 root root 1192 Jul 26 11:33 sha1.lua
-rw-r--r-- 1 root root 1045 Jul 26 11:33 sha224.lua
-rw-r--r-- 1 root root 1221 Jul 26 11:33 sha256.lua
-rw-r--r-- 1 root root 1045 Jul 26 11:33 sha384.lua
-rw-r--r-- 1 root root 1359 Jul 26 11:33 sha512.lua
-rw-r--r-- 1 root root 236 Jul 26 11:33 sha.lua
-rw-r--r-- 1 root root 698 Jul 26 11:33 string.lua
-rw-r--r-- 1 root root 5178 Jul 26 11:33 upload.lua
drwxr-xr-x 2 root root 4096 Jul 26 11:33 upstream
drwxr-xr-x 2 root root 406 Jul 26 11:33 websocket
Here we see the familiar mysql.lua and redis.lua; we can ignore the rest for now.
Note: openresty does not ship a kafka package, so a fresh installation has no kafka directory here. The one in the listing is a Kafka integration package that I imported in advance (its file layout appears to match the open-source lua-resty-kafka library).
Let's see which files the kafka package contains:
[root@node03 resty]# cd kafka
[root@node03 kafka]# ll
total 48
-rw-r--r-- 1 root root 1369 Aug 1 10:42 broker.lua
-rw-r--r-- 1 root root 5537 Aug 1 10:42 client.lua
-rw-r--r-- 1 root root 710 Aug 1 10:42 errors.lua
-rw-r--r-- 1 root root 10718 Aug 1 10:42 producer.lua
-rw-r--r-- 1 root root 4072 Aug 1 10:42 request.lua
-rw-r--r-- 1 root root 2118 Aug 1 10:42 response.lua
-rw-r--r-- 1 root root 1494 Aug 1 10:42 ringbuffer.lua
-rw-r--r-- 1 root root 4845 Aug 1 10:42 sendbuffer.lua
Attached is the kafka integration package:
Link: https://pan.baidu.com/s/1pFLhz3E_txb3ZWIRWxfQYg
Extraction code: 0umg
Step 2: Create a kafka test lua file
1. Go back to the openresty directory
[root@node03 kafka]# cd /export/servers/openresty/
2. Create a test directory
[root@node03 openresty]# mkdir -p testlua
Note: you can choose the directory name and its location yourself, but remember the path, because it is needed later in nginx.conf.
3. Enter the folder you just created and create the kafkalua.lua script file
Create files: vim kafkalua.lua or touch kafkalua.lua
[root@node03 openresty]# cd testlua/
[root@node03 testlua]# ll
total 8
-rw-r--r-- 1 root root 3288 Aug 1 10:54 kafkalua.lua
kafkalua.lua:
-- Test statement; it can be removed once the configuration is verified
ngx.say('hello kafka file configuration successful!!!!!!')

-- Data collection threshold: if the number of active connections exceeds it, nothing is collected
local DEFAULT_THRESHOLD = 100000
-- Number of kafka partitions; it must match the partition count of the topic
local PARTITION_NUM = 6
-- kafka topic name
local TOPIC = 'B2CDATA_COLLECTION1'
-- Key of the polling counter in the shared dictionary
local POLLING_KEY = "POLLING_KEY"

-- Custom partitioner: use the message key directly as the partition number
local function partitioner(key, num, correlation_id)
    return tonumber(key)
end

-- kafka cluster: broker list (the IPs must be consistent with host.name in the kafka configuration)
local BROKER_LIST = {
    {host="192.168.52.100", port=9092},
    {host="192.168.52.110", port=9092},
    {host="192.168.52.120", port=9092}
}

-- kafka producer parameters
local CONNECT_PARAMS = {
    producer_type = "async",
    socket_timeout = 30000,
    flush_time = 10000,
    request_timeout = 20000,
    partitioner = partitioner
}

-- Shared-memory counter, used to rotate across kafka partitions
local shared_data = ngx.shared.shared_data
local pollingVal = shared_data:get(POLLING_KEY)
if not pollingVal then
    pollingVal = 1
    shared_data:set(POLLING_KEY, pollingVal)
end

-- Take the counter modulo PARTITION_NUM so that messages are balanced across partitions
local partitions = '' .. (tonumber(pollingVal) % PARTITION_NUM)
shared_data:incr(POLLING_KEY, 1)

-- Concurrency control
local isGone = true
-- Overload protection (rate limiting): skip collection when the number of
-- active connections (ngx.var.connections_active) exceeds the threshold
if tonumber(ngx.var.connections_active) > tonumber(DEFAULT_THRESHOLD) then
    isGone = false
end

-- Data collection
if isGone then
    -- Missing values (nil) are replaced by an empty string
    local time_local = ngx.var.time_local or ""
    local request = ngx.var.request or ""
    local request_method = ngx.var.request_method or ""
    local content_type = ngx.var.content_type or ""
    ngx.req.read_body()
    local request_body = ngx.var.request_body or ""
    local http_referer = ngx.var.http_referer or ""
    local remote_addr = ngx.var.remote_addr or ""
    local http_user_agent = ngx.var.http_user_agent or ""
    local time_iso8601 = ngx.var.time_iso8601 or ""
    local server_addr = ngx.var.server_addr or ""
    local http_cookie = ngx.var.http_cookie or ""

    -- Assemble the message, with fields separated by "#CS#"
    local message = time_local .. "#CS#" .. request .. "#CS#" .. request_method
        .. "#CS#" .. content_type .. "#CS#" .. request_body .. "#CS#" .. http_referer
        .. "#CS#" .. remote_addr .. "#CS#" .. http_user_agent .. "#CS#" .. time_iso8601
        .. "#CS#" .. server_addr .. "#CS#" .. http_cookie

    -- Load the kafka producer module
    local producer = require "resty.kafka.producer"
    -- Create the producer
    local bp = producer:new(BROKER_LIST, CONNECT_PARAMS)
    -- Send the data
    local ok, err = bp:send(TOPIC, partitions, message)
    -- Log the error if sending failed
    if not ok then
        ngx.log(ngx.ERR, "kafka send err:", err)
        return
    end
end
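To make the polling counter and the overload check easier to follow, here is a minimal Java sketch of the same two ideas. It is only an illustration: the class RoundRobinKey and its methods are hypothetical names, and an in-process AtomicLong stands in for the shared dictionary, which in nginx is shared between worker processes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the script's polling counter and threshold check.
class RoundRobinKey {
    private final AtomicLong counter = new AtomicLong(1); // the script also starts at 1
    private final int partitionNum;

    RoundRobinKey(int partitionNum) {
        this.partitionNum = partitionNum;
    }

    // Returns the current counter modulo the partition count as a string key,
    // then advances the counter (get followed by incr in the script).
    String nextKey() {
        long current = counter.getAndIncrement();
        return Long.toString(current % partitionNum);
    }

    // Mirrors the overload check: collect only while the number of active
    // connections does not exceed the threshold.
    static boolean shouldCollect(long activeConnections, long threshold) {
        return activeConnections <= threshold;
    }

    public static void main(String[] args) {
        RoundRobinKey keys = new RoundRobinKey(6);
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            line.append(keys.nextKey()).append(' ');
        }
        System.out.println(line.toString().trim());          // 1 2 3 4 5 0 1 2
        System.out.println(shouldCollect(100001, 100000));   // false
    }
}
```

With six partitions the keys cycle through 1, 2, 3, 4, 5, 0, so consecutive requests are spread evenly as long as the topic really has that many partitions.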
Step 3: Modify the nginx configuration file nginx.conf
1. Enter the nginx/conf directory
[root@node03 openresty]# cd /export/servers/openresty/nginx/conf/
[root@node03 conf]# ll
total 76
-rw-r--r-- 1 root root 1077 Jul 26 11:33 fastcgi.conf
-rw-r--r-- 1 root root 1077 Jul 26 11:33 fastcgi.conf.default
-rw-r--r-- 1 root root 1007 Jul 26 11:33 fastcgi_params
-rw-r--r-- 1 root root 1007 Jul 26 11:33 fastcgi_params.default
-rw-r--r-- 1 root root 2837 Jul 26 11:33 koi-utf
-rw-r--r-- 1 root root 2223 Jul 26 11:33 koi-win
-rw-r--r-- 1 root root 5170 Jul 26 11:33 mime.types
-rw-r--r-- 1 root root 5170 Jul 26 11:33 mime.types.default
-rw-r--r-- 1 root root 3191 Aug 1 10:52 nginx.conf
-rw-r--r-- 1 root root 2656 Jul 26 11:33 nginx.conf.default
-rw-r--r-- 1 root root 636 Jul 26 11:33 scgi_params
-rw-r--r-- 1 root root 636 Jul 26 11:33 scgi_params.default
-rw-r--r-- 1 root root 664 Jul 26 11:33 uwsgi_params
-rw-r--r-- 1 root root 664 Jul 26 11:33 uwsgi_params.default
-rw-r--r-- 1 root root 3610 Jul 26 11:33 win-utf
2. Modify nginx.conf
[root@node03 conf]# vim nginx.conf
#1. Find the first server block
#2. Add the two directives below just above it (inside the http block)
#3. Add the kafka-related location inside the server block, as shown

#Added code
#Declare a 10 MB shared dictionary (it is shared by all nginx worker processes)
lua_shared_dict shared_data 10m;
#Configure local domain name resolution
resolver 127.0.0.1;
#Added code

server {
    listen 80;
    server_name localhost;
    #charset koi8-r;
    #access_log logs/host.access.log main;

    location / {
        root html;
        index index.html index.htm;
    }

    #Added code
    #kafkalua is a path name of your own choosing
    location /kafkalua {
        #Turn on nginx monitoring
        stub_status on;
        #Response content type
        default_type text/html;
        #Location of the kafka lua file, i.e. the kafkalua.lua we just created
        content_by_lua_file /export/servers/openresty/testlua/kafkalua.lua;
    }
    #Added code
}
Note: in location /kafkalua {...}, kafkalua is the path name (the "project name" in the URL). You can choose any name you like, but remember it, because it is used when accessing nginx from the browser.
We now have two locations configured: the first is location / {...} and the second is location /kafkalua {...}. What is the difference between them? The browser tests in Step 4 show each of them in action.
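As a rough mental model, nginx picks the longest configured prefix that the request path starts with. The Java sketch below illustrates just that rule for the two locations above. It is a simplification under stated assumptions: only plain prefix locations are considered (no regex or exact-match modifiers), and LocationMatcher is a hypothetical class, not part of nginx.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of nginx prefix-location selection.
class LocationMatcher {
    // The two prefix locations configured in nginx.conf above
    static final List<String> LOCATIONS = Arrays.asList("/", "/kafkalua");

    // Returns the longest configured prefix that the path starts with,
    // or null when no prefix matches.
    static String match(String path) {
        String best = null;
        for (String prefix : LOCATIONS) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(match("/kafkalua"));                       // /kafkalua
        System.out.println(match("/index.html"));                     // /
        System.out.println(match("/kafkalua/B2C40/query/query.ao"));  // /kafkalua
    }
}
```

So a request for /kafkalua (or anything below it) runs the Lua script, while every other path falls through to location / and is served from the html directory.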
Step 4: Start nginx
1. Enter nginx/sbin
[root@node03 sbin]# cd /export/servers/openresty/nginx/sbin/
[root@node03 sbin]# ll
total 16356
-rwxr-xr-x 1 root root 16745834 Jul 26 11:33 nginx
2. Test whether the configuration file is correct
[root@node03 sbin]# nginx -t
nginx: the configuration file /export/servers/openresty/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /export/servers/openresty/nginx/conf/nginx.conf test is successful
# This output means the configuration is valid.
3. Start nginx
[root@node03 sbin]# nginx
# No output generally means nginx started successfully.
4. Check whether nginx started successfully
[root@node03 sbin]# ps -ef | grep nginx
root 3730 1 0 09:24 ? 00:00:00 nginx: master process nginx
nobody 3731 3730 0 09:24 ? 00:00:20 nginx: worker process is shutting down
nobody 5766 3730 0 12:17 ? 00:00:00 nginx: worker process
root 5824 3708 0 12:24 pts/1 00:00:00 grep nginx
# Seeing the nginx master and worker processes means nginx started successfully.
5. Browser accesses nginx
Enter in the browser: node03/kafkalua
Note: if the hosts file is not configured, enter the IP address of the machine where openresty is running instead, for example 192.168.52.120/kafkalua
Enter in the browser: node03/ or 192.168.52.120/
Enter in the browser: node03:80/kafkalua and node03:80/try
Move to nginx.conf and see:
In node03:80/kafkalua, node03 is the host alias of the server (you can write the server's IP address instead), and 80 is the configured listen port. Port 80 can be omitted. If the configuration said listen 8088, the browser would need node03:8088/kafkalua (8088 cannot be omitted). kafkalua is the name configured in the location.
server {
    listen 80;
    server_name localhost;
    #charset koi8-r;
    #access_log logs/host.access.log main;

    location / {
        root html;
        index index.html index.htm;
    }

    #Added code
    #kafkalua is a path name of your own choosing
    location /kafkalua {
        #Turn on nginx monitoring
        stub_status on;
        #Response content type
        default_type text/html;
        #Location of the kafka lua file, i.e. the kafkalua.lua we just created
        content_by_lua_file /export/servers/openresty/testlua/kafkalua.lua;
    }
}
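The relation between the listen port and the address typed into the browser can be written down as a small Java sketch. UrlBuilder and httpUrl are hypothetical names used only for illustration; the point is that the port appears in the URL only when it differs from the HTTP default of 80.

```java
// Hypothetical helper showing when the port must appear in the URL.
class UrlBuilder {
    static final int DEFAULT_HTTP_PORT = 80;

    // Builds an http URL; the port is left out when it is the default (80).
    static String httpUrl(String host, int port, String path) {
        String authority = (port == DEFAULT_HTTP_PORT) ? host : host + ":" + port;
        return "http://" + authority + path;
    }

    public static void main(String[] args) {
        System.out.println(httpUrl("node03", 80, "/kafkalua"));          // http://node03/kafkalua
        System.out.println(httpUrl("node03", 8088, "/kafkalua"));        // http://node03:8088/kafkalua
        System.out.println(httpUrl("192.168.52.120", 80, "/kafkalua"));  // http://192.168.52.120/kafkalua
    }
}
```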
Step 5: Create a test crawler
1. Create a Maven project and import the dependencies
<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.11.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.4</version>
    </dependency>
</dependencies>
2. Simulated crawler program
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;

public class SpiderGoAirCN {

    private static String basePath = "http://node03/kafkalua";

    // Referer page; most requests append the fixed query string below
    private static final String REFERER_PAGE =
            "http://b2c.csair.com/B2C40/modules/bookingnew/main/flightSelectDirect.html";
    private static final String FIXED_REFERER =
            REFERER_PAGE + "?t=S&c1=CAN&c2=WUH&d1=2018-02-20&at=1&ct=0&it=0";

    private static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36";

    // Form parameter "json" that is sent with every request
    private static final String QUERY_JSON =
            "{\"depcity\":\"CAN\", \"arrcity\":\"WUH\", \"flightdate\":\"20180220\", \"adultnum\":\"1\", \"childnum\":\"0\", \"infantnum\":\"0\", \"cabinorder\":\"0\", \"airline\":\"1\", \"flytype\":\"0\", \"international\":\"0\", \"action\":\"0\", \"segtype\":\"1\", \"cache\":\"0\", \"preUrl\":\"\", \"isMember\":\"\"}";

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 50000; i++) {
            // Request for query information
            spiderQueryao();
            // Request html
            spiderHtml();
            // Request js
            spiderJs();
            // Request css
            spiderCss();
            // Request png
            spiderPng();
            // Request jpg
            spiderJpg();
            Thread.sleep(100);
        }
    }

    public static void spiderQueryao() throws Exception {
        // Target: ^.*/B2C40/query/jaxb/direct/query.ao.*$
        String goTime = getGoTime();
        post(basePath + "/B2C40/query/jaxb/direct/query.ao",
                REFERER_PAGE + "?t=S&c1=CAN&c2=WUH&d1=" + goTime + "&at=1&ct=0&it=0",
                "192.168.56.80", "243.45.78.132", goTime);
    }

    public static void spiderHtml() throws Exception {
        // Target: ^.*html.*$
        post(basePath + "/B2C40/modules/bookingnew/main/flightSelectDirect.html?t=S&c1=CAN&c2=CTU&d1=2018-01-17&at=1&ct=0&it=0",
                FIXED_REFERER, "192.168.56.1", "192.168.56.80", "2018-01-17");
    }

    public static void spiderJs() throws Exception {
        post(basePath + "/B2C40/dist/main/modules/common/requireConfig.js",
                FIXED_REFERER, "192.168.56.1", "192.168.56.80", "2018-01-17");
    }

    public static void spiderCss() throws Exception {
        post(basePath + "/B2C40/dist/main/css/flight.css",
                REFERER_PAGE, "192.168.56.1", "192.168.56.80", "2018-01-17");
    }

    public static void spiderPng() throws Exception {
        post(basePath + "/B2C40/dist/main/images/common.png",
                FIXED_REFERER, "192.168.56.1", "192.168.56.80", "2018-01-17");
    }

    public static void spiderJpg() throws Exception {
        post(basePath + "/B2C40/dist/main/images/loadingimg.jpg",
                FIXED_REFERER, "192.168.56.1", "192.168.56.80", "2018-01-17");
    }

    /**
     * Sends one POST request with the shared headers and form body.
     * The client and the response are closed after each request.
     */
    private static void post(String url, String referer, String remoteAddress,
                             String serverAddress, String cookieDate) throws Exception {
        // 1. Create the request for the target URL
        HttpPost httpPost = new HttpPost(url);
        // 2. Set the request headers
        httpPost.setHeader("Time-Local", getLocalDateTime());
        httpPost.setHeader("Requst", "POST /B2C40/query/jaxb/direct/query.ao HTTP/1.1");
        httpPost.setHeader("Request Method", "POST");
        httpPost.setHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        httpPost.setHeader("Referer", referer);
        httpPost.setHeader("Remote Address", remoteAddress);
        httpPost.setHeader("User-Agent", USER_AGENT);
        httpPost.setHeader("Time-Iso8601", getISO8601Timestamp());
        httpPost.setHeader("Server Address", serverAddress);
        httpPost.setHeader("Cookie", buildCookie(cookieDate));
        // 3. Set the request parameters
        ArrayList<BasicNameValuePair> parameters = new ArrayList<BasicNameValuePair>();
        parameters.add(new BasicNameValuePair("json", QUERY_JSON));
        httpPost.setEntity(new UrlEncodedFormEntity(parameters));
        // 4. Send the request and print whether a response came back
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            System.out.println(response != null);
        }
    }

    // Builds the Cookie header; the given date is inserted in two places
    private static String buildCookie(String date) {
        return "JSESSIONID=782121159357B98CA6112554CF44321E; sid=b5cc11e02e154ac5b0f3609332f86803; aid=8ae8768760927e280160bb348bef3e12; identifyStatus=N; userType4logCookie=M; userId4logCookie=13818791413; useridCookie=13818791413; userCodeCookie=13818791413; temp_zh=cou%3D0%3Bsegt%3D%E5%8D%95%E7%A8%8B%3Btime%3D2018-01-13%3B%E5%B9%BF%E5%B7%9E-%E5%8C%97%E4%BA%AC%3B1%2C0%2C0%3B%26cou%3D1%3Bsegt%3D%E5%8D%95%E7%A8%8B%3Btime%3D"
                + date
                + "%3B%E5%B9%BF%E5%B7%9E-%E6%88%90%E9%83%BD%3B1%2C0%2C0%3B%26; JSESSIONID=782121159357B98CA6112554CF44321E; WT-FPC=id=211.103.142.26-608782688.30635197:lv=1516170718655:ss=1516170709449:fs=1513243317440:pn=2:vn=10; language=zh_CN; WT.al_flight=WT.al_hctype(S)%3AWT.al_adultnum(1)%3AWT.al_childnum(0)%3AWT.al_infantnum(0)%3AWT.al_orgcity1(CAN)%3AWT.al_dstcity1(CTU)%3AWT.al_orgdate1("
                + date + ")";
    }

    public static String getLocalDateTime() {
        DateFormat df = new SimpleDateFormat("dd/MMM/yyyy'T'HH:mm:ss +08:00", Locale.ENGLISH);
        return df.format(new Date());
    }

    public static String getISO8601Timestamp() {
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss+08:00");
        return df.format(new Date());
    }

    public static String getGoTime() {
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
        return df.format(new Date());
    }

    public static String getBackTime() {
        Date date = new Date(); // Current time
        Calendar calendar = new GregorianCalendar();
        calendar.setTime(date);
        // Move the date forward by one day; use a negative number to move it back
        calendar.add(Calendar.DATE, 1);
        date = calendar.getTime();
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
        return formatter.format(date);
    }
}
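The crawler's date helpers hard-code the +08:00 offset inside the pattern. As an alternative, here is a hedged sketch that uses java.time and takes the offset from the date-time value itself. NginxTimestamps and its two methods are hypothetical names; the patterns follow the usual shape of nginx's $time_local and $time_iso8601 values.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helpers that format a timestamp the way nginx usually prints it.
class NginxTimestamps {
    // e.g. 17/Jan/2018:10:15:30 +0800
    private static final DateTimeFormatter TIME_LOCAL =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    // e.g. 2018-01-17T10:15:30+08:00
    private static final DateTimeFormatter TIME_ISO8601 =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssxxx", Locale.ENGLISH);

    static String timeLocal(ZonedDateTime time) {
        return TIME_LOCAL.format(time);
    }

    static String timeIso8601(ZonedDateTime time) {
        return TIME_ISO8601.format(time);
    }

    public static void main(String[] args) {
        ZonedDateTime sample = ZonedDateTime.of(2018, 1, 17, 10, 15, 30, 0, ZoneOffset.ofHours(8));
        System.out.println(timeLocal(sample));    // 17/Jan/2018:10:15:30 +0800
        System.out.println(timeIso8601(sample));  // 2018-01-17T10:15:30+08:00
    }
}
```

Note that kafkalua.lua reads the time from nginx's own variables, so values like these only matter if the custom request headers are consumed somewhere else.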
Step 6: Start kafka
1. Create topic
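[root@node01 bin]# kafka-topics.sh --zookeeper node01:2181 --partitions 3 --replication-factor 3 --create --topic B2CDATA_COLLECTION1
Note: this command creates 3 partitions, while kafkalua.lua sets PARTITION_NUM = 6. Make the two values agree (for example, set PARTITION_NUM to 3); otherwise the script computes partition numbers 3 to 5, which this topic does not have.
2. Start a kafka console consumer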
[root@node01 bin]# kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic B2CDATA_COLLECTION1
Step 7: Open the crawler program and observe the results
1. Start the crawler program
2. Look at the consumer window as follows
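Each message that arrives in the consumer is one line whose fields are joined by "#CS#", in the order used by kafkalua.lua. A downstream program could split it back into fields roughly like this. This is a minimal sketch: CollectedMessage is a hypothetical class, and it assumes that no field value itself contains "#CS#".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical parser for the "#CS#"-separated messages produced by kafkalua.lua.
class CollectedMessage {
    static final String SEPARATOR = "#CS#";

    // Field order as assembled in the Lua script
    static final List<String> FIELD_NAMES = Arrays.asList(
            "time_local", "request", "request_method", "content_type",
            "request_body", "http_referer", "remote_addr", "http_user_agent",
            "time_iso8601", "server_addr", "http_cookie");

    // Splits a message into its fields. The limit -1 keeps trailing empty
    // fields, which occur when the last values (e.g. the cookie) are empty.
    static String[] split(String message) {
        return message.split(Pattern.quote(SEPARATOR), -1);
    }

    public static void main(String[] args) {
        // Made-up sample message with an empty referer and an empty cookie
        String sample = String.join(SEPARATOR,
                "17/Jan/2018:10:15:30 +0800", "POST /kafkalua HTTP/1.1", "POST",
                "application/x-www-form-urlencoded", "json=%7B%7D", "",
                "192.168.52.1", "Mozilla/5.0", "2018-01-17T10:15:30+08:00",
                "192.168.52.120", "");
        String[] fields = split(sample);
        for (int i = 0; i < fields.length; i++) {
            System.out.println(FIELD_NAMES.get(i) + " = " + fields[i]);
        }
    }
}
```

A plain split without the limit argument would silently drop the empty trailing fields, so the field count would no longer line up with the names.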
Step 8: Start kafka-manager observation
1. Start kafka-manager
[root@node01 conf]# cd /export/servers/kafka-manager-1.3.3.23/bin/
[root@node01 bin]# ll
total 36
-rwxr-xr-x 1 root root 13747 May 1 06:27 kafka-manager
-rw-r--r-- 1 root root 9975 May 1 06:27 kafka-manager.bat
-rwxr-xr-x 1 root root 1383 May 1 06:27 log-config
-rw-r--r-- 1 root root 105 May 1 06:27 log-config.bat
[root@node01 bin]#
# Start kafka-manager
[root@node01 bin]# ./kafka-manager
After startup window:
2. Browser access
Browser input: node01:9000
How to use kafka-manager needs no explanation here; simply observe the consumption of B2CDATA_COLLECTION1:
There are three partitions, and each of them shows its own consumed messages, which indicates success.
If that is not the case, the kafkalua.lua script has no partitioning policy configured. The default partitioning causes data skew, so we need to configure our own partitioning policy.
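To see why the partitioning policy matters, the following Java sketch counts how 300 messages would be distributed over 3 partitions in two cases: a round-robin key, and a single fixed key that is hashed. It is only an illustration under assumptions: PartitionSkew is a hypothetical class, CRC32 is used as an example hash, and nothing here is taken from the Kafka library's own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical comparison of round-robin and fixed-key partition assignment.
class PartitionSkew {
    // Round-robin: the i-th message goes to partition i % partitions
    static int[] roundRobin(int messages, int partitions) {
        int[] counts = new int[partitions];
        for (int i = 0; i < messages; i++) {
            counts[i % partitions]++;
        }
        return counts;
    }

    // Hashing one fixed key: every message lands in the same partition
    static int[] fixedKey(String key, int messages, int partitions) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        int partition = (int) (crc.getValue() % partitions);
        int[] counts = new int[partitions];
        counts[partition] = messages;
        return counts;
    }

    // Largest per-partition count, a simple measure of skew
    static int max(int[] counts) {
        int best = 0;
        for (int c : counts) {
            best = Math.max(best, c);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundRobin(300, 3)));  // [100, 100, 100]
        System.out.println(max(fixedKey("same-key", 300, 3)));    // 300
    }
}
```

The round-robin key gives every partition the same share, which is what the polling counter in kafkalua.lua is meant to achieve.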
Done!