Follow the WeChat official account K brother crawler for ongoing posts on advanced crawling and JS/Android reverse engineering!
Statement
Everything in this article is for learning and discussion only. Captured packets, sensitive URLs, and data interfaces have been desensitized. Commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, contact me and it will be removed immediately!
Reverse-engineering target
- Objective: webmaster anti-crawler practice platform, challenge 6: JS encryption with environment-simulation detection
- Link: http://spider.wangluozhe.com/challenge/6
- Introduction: as before, collect all the numbers across the 100 pages and compute their sum. Note: do not reuse a parameter value. Don't deceive yourself!
Packet capture analysis
Packet capture shows that, unlike the previous challenges, the Payload carries no encrypted parameter; instead there is a hexin-v field in the Request Headers whose value changes on every request. Anyone who has written a crawler for the Huashun finance site will recognize this parameter, which is widely used on that site, as shown in the figure below:
Locating the encryption
First, search for hexin-v directly: it only appears in 6.js. This file is clearly obfuscated, so the search alone does not lead to the place where the value is generated. Looking more closely, the whole of 6.js is a self-executing function (IIFE) that receives 7 arrays as arguments, bound to n, t, r, e, a, u and c respectively, as shown below:
!function (n, t, r, e, a, u, c) { }( [],[],[],[],[],[],[] );
Inside 6.js, values are read from these arrays by index, so the obfuscation is quite simple. To restore the code, you could write a script that replaces each array lookup with the corresponding value. This example is simple enough that deobfuscation is not necessary.
Because hexin-v lives in the Request Headers, we can use a Hook that breaks into the debugger when the hexin-v header is set (how to inject Hook code is explained in detail in brother K's previous articles and is not repeated here):
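The restore script mentioned above is not shown in the article. The following is only a minimal sketch of the idea in Python, under stated assumptions: the function name inline_array_lookups is hypothetical, the arrays are passed in as a dict of already-extracted literal values, and the sample values are placeholders rather than real contents of 6.js. It ignores JavaScript scoping, so a local variable that shadows a parameter name (such as var n inside N()) would be rewritten incorrectly and needs manual review.

```python
import json
import re


def inline_array_lookups(body, arrays):
    """Replace lookups such as n[3] with the literal stored at that index.

    body   -- JavaScript source of the IIFE body (a string)
    arrays -- dict mapping a parameter name to its list of values
    """
    names = "|".join(re.escape(name) for name in arrays)
    # The lookbehind skips longer identifiers (an[0]) and properties (x.n[0]).
    pattern = re.compile(rf"(?<![\w$.])({names})\[(\d+)\]")

    def substitute(match):
        values = arrays[match.group(1)]
        index = int(match.group(2))
        if index >= len(values):
            return match.group(0)  # out-of-range lookup: leave untouched
        value = values[index]
        if value is None or isinstance(value, (str, int, float, bool)):
            return json.dumps(value)  # a JSON literal is a valid JS literal
        return match.group(0)  # non-literal entries are left as lookups

    return pattern.sub(substitute, body)


if __name__ == "__main__":
    # Placeholder arrays for illustration only, not taken from 6.js.
    sample_arrays = {"n": ["hexin-v", 7], "t": ["length"]}
    print(inline_array_lookups("x[t[0]] + n[1]; h = n[0];", sample_arrays))
```

A regex pass like this is only suitable for a first read-through of the code; a parser-based rewrite would be the safer choice if the restored file has to run.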
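(function () {
    'use strict';
    // Keep a reference to the native method so requests still work
    var org = window.XMLHttpRequest.prototype.setRequestHeader;
    window.XMLHttpRequest.prototype.setRequestHeader = function (key, value) {
        // Pause as soon as the hexin-v header is about to be set
        if (key == 'hexin-v') {
            debugger;
        }
        return org.apply(this, arguments);
    };
})();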
The next step is to walk up the call stack. One frame up, in 6.js, the value of h is the value we want: h = ct.update(), and ct.update() is actually x(), as shown in the following figure:
Stepping into x(), t is the value we want, and t = N():
Stepping further into N(), et.encode(n) is the final value. You can also see calls that read mouse movement, clicks, and similar events:
As analyzed earlier, 6.js is a self-executing function and the amount of code is not large, so we simply define a global variable and export the N function through it, instead of extracting the code function by function. The pseudocode is as follows:
// Define a global variable
var Hexin;

!function (n, t, r, e, a, u, c) {
    // ... many lines of code omitted ...
    function N() {
        S[T]++,
        S[f] = ot.serverTimeNow(),
        S[l] = ot.timeNow(),
        S[k] = zn,
        S[I] = it.getMouseMove(),
        S[_] = it.getMouseClick(),
        S[y] = it.getMouseWhell(),
        S[E] = it.getKeyDown(),
        S[A] = it.getClickPos().x,
        S[C] = it.getClickPos().y;
        var n = S.toBuffer();
        return et.encode(n)
    }
    // Assign the N function to the global variable
    Hexin = N
}(
    [],[],[],[],[],[],[]
);

// Custom function that returns the final hexin-v value
function getHexinV(){
    return Hexin()
}
Completing the environment
After the rewrite above, debugging locally shows that window and document are undefined. As in previous articles, we first define them as empty objects, and then get the error getElementsByTagName is not a function. getElementsByTagName returns the elements with a given tag name and is part of the HTML DOM, which a local Node.js runtime does not provide.
Here is a way to create a DOM environment directly in Node.js: the jsdom library. The official introduction is as follows:
jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node.js. In general, the goal of the project is to emulate enough of a subset of a web browser to be useful for testing and scraping real-world web applications. The latest versions of jsdom require Node.js v12 or newer. (Versions of jsdom below v17 still work with previous Node.js versions, but are unsupported.) For specific usage, refer to the jsdom documentation.
Note that jsdom relies on the canvas package for canvas support, so the canvas library should be installed as well. The HTML canvas element is used to draw graphics dynamically from scripts (usually JavaScript). For details and usage, refer to the canvas documentation.
After adding the following code to the local JS file, we have a DOM environment and the script runs successfully:
// var canvas = require("canvas");
var jsdom = require("jsdom");
var {JSDOM} = jsdom;
var dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
navigator = window.navigator;
Then, in Python, send a different hexin-v in the request header on every request, add up the data of each page one by one, and the final submission succeeds:
Complete code
Follow K brother crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/
Only part of the key code is shown below, and it cannot be run directly! Full code repository: https://github.com/kgepachong/crawler/
JavaScript encryption key code
/* ==================================
# @Time    : 2021-12-20
# @Author  : WeChat official account: K brother crawler
# @FileName: challenge_6.js
# @Software: PyCharm
# ================================== */

var TOKEN_SERVER_TIME = 1611313000.340;
var Hexin;

var jsdom = require("jsdom");
var {JSDOM} = jsdom;
var dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
navigator = window.navigator;

!function(n, t, r, e, a, u, c) {
    !function() {
        function Gn() {}
        var Qn = [new a[23](n[20]), new e[3](f + l + d + p)];
        function Zn() {}
        var Jn = [new t[16](c[13]), new u[9](e[19])],
            qn = a[24][u[16]] || a[24].getElementsByTagName(st(r[19], r[20]))[a[25]],
            nt;
        !function(o) {}(nt || (nt = {}));
        var tt;
        !function(o) {}(tt || (tt = {}));
        var rt = function() {}(), et;
        RT = rt;
        !function(o) {}(et || (et = {}));
        function at() {}
        var ot;
        !function(o) {}(ot || (ot = {}));
        var it;
        !function(o) {}(it || (it = {}));
        var ut;
        !function(s) {}(ut || (ut = {}));
        var ct;
        !function(o) {
            function x() {}
            function L() {}
            function M() {}
            o[a[105]] = M;
            function N() {
                S[T]++,
                S[f] = ot.serverTimeNow(),
                S[l] = ot.timeNow(),
                S[k] = zn,
                S[I] = it.getMouseMove(),
                S[_] = it.getMouseClick(),
                S[y] = it.getMouseWhell(),
                S[E] = it.getKeyDown(),
                S[A] = it.getClickPos().x,
                S[C] = it.getClickPos().y;
                var n = S.toBuffer();
                return et.encode(n)
            }
            // Export N through the global variable
            Hexin = N;
            o[r[81]] = x
        }(ct || (ct = {}));
        function st() {}
        var vt;
        !function(o) {}(vt || (vt = {}));
        var ft;
        !function(r) {}(ft || (ft = {}))
    }()
}(
    [],[],[],[],[],[],[]
);

function getHexinV(){
    return Hexin()
}

// Test output
// console.log(getHexinV())
Python calculation key code
# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-12-20
# @Author  : WeChat official account: K brother crawler
# @FileName: challenge_6.py
# @Software: PyCharm
# ==================================

import execjs
import requests

challenge_api = "http://spider.wangluozhe.com/challenge/api/6"
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "Replace this with your own cookie!",
    "Host": "spider.wangluozhe.com",
    "Origin": "http://spider.wangluozhe.com",
    "Referer": "http://spider.wangluozhe.com/challenge/6",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}


def get_hexin_v():
    # Compile the local JS file and call the exported getHexinV function
    with open('challenge_6.js', 'r', encoding='utf-8') as f:
        wlz_js = execjs.compile(f.read())
    hexin_v = wlz_js.call("getHexinV")
    print("hexin-v: ", hexin_v)
    return hexin_v


def main():
    result = 0
    for page in range(1, 101):
        data = {
            "page": page,
            "count": 10,
        }
        # Generate a fresh hexin-v for every request
        headers["hexin-v"] = get_hexin_v()
        response = requests.post(url=challenge_api, headers=headers, data=data).json()
        for d in response["data"]:
            result += d["value"]
    print("The result is: ", result)


if __name__ == '__main__':
    main()
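The challenge asks that a parameter value never be reused, and the page loop can be written so that this rule is checked explicitly. The following is a minimal offline sketch, not the author's implementation: sum_all_pages, fetch_page and make_token are hypothetical names, and the two callables are injected so the loop can be exercised without the network or a JS runtime. The only detail taken from the article is that each row of a page carries a "value" field.

```python
def sum_all_pages(fetch_page, make_token, pages=100):
    """Sum the "value" field of every row on pages 1..pages.

    fetch_page(page, token) -- hypothetical callable returning a list of rows
    make_token()            -- hypothetical callable returning a new hexin-v
    """
    total = 0
    seen_tokens = set()
    for page in range(1, pages + 1):
        token = make_token()
        # Fail loudly instead of silently reusing a header value.
        if token in seen_tokens:
            raise ValueError(f"hexin-v reused on page {page}")
        seen_tokens.add(token)
        rows = fetch_page(page, token)
        total += sum(row["value"] for row in rows)
    return total


if __name__ == "__main__":
    # Stand-in callables for illustration; no request is sent.
    counter = iter(range(10_000))
    demo_total = sum_all_pages(
        fetch_page=lambda page, token: [{"value": page}],
        make_token=lambda: f"demo-{next(counter)}",
        pages=3,
    )
    print(demo_total)
```

In a real run, make_token would wrap the call to getHexinV and fetch_page would wrap the POST to the challenge API with the token placed in the hexin-v header.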