[100 JS Reverse-Engineering Cases] Question 1 on the Wangluozhe anti-crawler practice platform: JS obfuscation and encryption, anti-Hook operations

Posted by dfowler on Tue, 14 Dec 2021 03:17:19 +0100

Follow the WeChat official account "K Brother Crawler" for a steady stream of advanced crawler and JS/Android reverse-engineering write-ups!

Disclaimer

Everything in this article is for learning and discussion only. Captured packets, sensitive URLs and data interfaces have been desensitized. Commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, please contact me and it will be removed immediately!

Foreword

The challenge itself is not very hard, but it contains many pitfalls, mainly the anti-Hook tricks and the work of debugging locally and patching the environment. This article walks through every pitfall in detail rather than just showing one way to pass.

In this article you will learn:

  1. How to hook Function and the timer to get rid of the infinite debugger;
  2. How to work around the anti-Hook measure and locate the encrypted parameter _signature with a Hook;
  3. How the browser and the local environment differ, how to find where navigator, document, location and other objects are used, and how to patch the local environment;
  4. How to debug locally with PyCharm to pinpoint the differences between the local and browser environments and so pass the detection.

Reverse-engineering target

  • Objective: Wangluozhe anti-crawler practice platform, question 1: JS obfuscation and encryption, anti-Hook operations
  • Link: http://spider.wangluozhe.com/challenge/1
  • Introduction: the answer is the sum of all the data across 100 pages. The challenge must be solved with Hooks. Do not solve it with AST techniques or by extracting the code, and do not decode it with JS deobfuscation tools. (How to write and use Hook code was covered in Brother K's previous articles and is not repeated in detail here.)

Bypassing the infinite debugger

First, notice that the URL does not change when you turn the page, so the data is most likely loaded by an Ajax request in which some parameters change on every call. Press F12 as usual to look for the encrypted parameter and execution pauses immediately: we are stuck in an infinite debugger. Following the call stack up one frame shows the word debugger, as shown in the figure below:

This also occurred in an earlier case by Brother K, where we simply rewrote the JS and replaced the word debugger. This challenge, however, clearly wants us to get past the infinite debugger with a Hook. Besides debugger, note the word constructor in front of it. constructor is JavaScript's construction method, normally invoked when an object is created or instantiated; its basic syntax is constructor([arguments]) { ... } (for details, see the MDN page on constructors). Here it is reached through a function object, so it resolves to Function.prototype.constructor, that is, the Function constructor, and the string debugger is the argument passed to it. We can therefore write the following Hook code to get past the infinite debugger:

// Keep a reference to the original constructor first
Function.prototype.constructor_ = Function.prototype.constructor;
Function.prototype.constructor = function (a) {
    // If the argument is "debugger", return an empty function
    if (a === "debugger") {
        return function () {};
    }
    // Otherwise forward every argument to the original constructor
    return Function.prototype.constructor_.apply(this, arguments);
};
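To see why this Hook works, it helps to reproduce the trap in isolation. The sketch below is a minimal illustration, not the site's actual code: it assumes the obfuscated script builds its breakpoint with a call of the form (function () {}).constructor("debugger"), and the names originalConstructor, trap and add are hypothetical.

```javascript
// Keep the original Function constructor so normal calls can be forwarded
const originalConstructor = Function.prototype.constructor;

Function.prototype.constructor = function (...args) {
    // The anti-debug call passes the single string "debugger" as the function body
    if (args.length === 1 && args[0] === "debugger") {
        return function () {};
    }
    // Everything else is forwarded unchanged
    return originalConstructor.apply(this, args);
};

// Assumed shape of the anti-debug call: build a function whose body is "debugger"
const trap = (function () {}).constructor("debugger");
// An ordinary dynamic function must still work after hooking
const add = (function () {}).constructor("a", "b", "return a + b");

trap();                      // no breakpoint: the hook returned an empty function
console.log(add(2, 3));      // 5
```

Forwarding all arguments matters because the same constructor is also a legitimate way to build functions with named parameters, as the add example shows.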

There are many ways to inject Hook code: typing it into the console of the browser's developer tools (it is lost when the page refreshes), injecting with a Fiddler plug-in, injecting with a Tampermonkey script, writing your own browser extension, and so on. These methods have been covered in Brother K's previous articles and are not repeated here.

This time we inject with the Fiddler plug-in. After injecting the Hook above, we land in an infinite debugger again, this time triggered through setInterval. The timer takes two required parameters: the first is the function to execute and the second is the interval between calls, in milliseconds (for details, see the Runoob tutorial on Window setInterval()). We can hook it away in the same manner:

// Keep a reference to the original timer first
var setInterval_ = setInterval;
setInterval = function (func, time) {
    // If the interval is 0x7d0 (2000 ms), return an empty function instead of starting the timer
    // You could also return directly without checking; there are many ways to write this
    if (time == 0x7d0) {
        return function () {};
    }
    // Otherwise call the original timer
    return setInterval_(func, time);
};
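The effect of the timer Hook can be checked outside the browser as well. The following is a small sketch under stated assumptions: it runs in Node.js, the 2000 ms anti-debug timer is simulated by hand, and originalSetInterval, blockedDelays, blockedId and normalId are hypothetical names.

```javascript
// Keep the original timer so harmless intervals still work
const originalSetInterval = globalThis.setInterval;
const blockedDelays = [];

globalThis.setInterval = function (callback, delay, ...rest) {
    // 0x7d0 === 2000: the interval used by the anti-debug timer in this challenge
    if (delay === 0x7d0) {
        blockedDelays.push(delay);   // remember what was dropped
        return undefined;            // the timer is never registered
    }
    return originalSetInterval(callback, delay, ...rest);
};

// Simulated anti-debug timer: silently dropped by the hook
const blockedId = setInterval(function () { debugger; }, 2000);
// Any other timer is registered as usual
const normalId = setInterval(function () {}, 50);
clearInterval(normalId);             // clean up so the script can exit

console.log(blockedDelays);          // [ 2000 ]
```

Filtering on the delay alone is crude; if the page used other 2000 ms timers they would be dropped too, so a stricter filter may be needed on other sites.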

Paste both Hook snippets into the browser plug-in, enable the Hook and refresh the page; the infinite debugger is gone.

Hooking the parameter

With the infinite debugger out of the way, click any page and the captured traffic shows a POST request. In Form Data, page is the page number, count is the number of items per page, and _signature is the parameter we need to reverse, as shown in the figure below:
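For orientation, the form body of such a request can be sketched as follows. This is only an illustration: the count of 10 is taken from the Python code at the end of this article, and SIGNATURE_PLACEHOLDER is a made-up stand-in for the real _signature value.

```javascript
// Build the urlencoded form body that the page sends with each POST
const formBody = new URLSearchParams({
    page: "1",                               // page number
    count: "10",                             // items per page
    _signature: "SIGNATURE_PLACEHOLDER"      // hypothetical stand-in for the real value
}).toString();

console.log(formBody);   // page=1&count=10&_signature=SIGNATURE_PLACEHOLDER
```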

Searching for _signature directly gives only one result: a window.get_sign() method whose job is to set _signature, as shown in the figure below:

Here comes the catch! Look at the challenge title again: JS obfuscation and encryption, anti-Hook operations. The author has stressed repeatedly that this challenge tests Hook skills, yet so far we have not run into any anti-Hook measure. Finding _signature by a plain search is clearly too easy; we are expected to capture _signature with a Hook, and that Hook will not go smoothly!

Without further ado, here is the code that hooks window._signature:

(function () {
    // Strict mode
    'use strict';
    // window is the object to hook; _signature is the property being hooked
    var _signatureTemp = "";
    Object.defineProperty(window, '_signature', {
        // Hook the setter, which runs on assignment
        set: function (val) {
            console.log('Hook captured _signature set ->', val);
            debugger;
            _signatureTemp = val;
            return val;
        },
        // Hook the getter, which runs on read
        get: function () {
            return _signatureTemp;
        }
    });
})();
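The same pattern can be tried outside the browser. The sketch below is a generic variant under assumptions of my own: fakeWindow stands in for window, the debugger statement is replaced by a callback that records the value and the call stack, and hookProperty and captured are hypothetical names.

```javascript
// Replace a property with an accessor pair that reports every assignment
function hookProperty(target, name, onSet) {
    let stored = target[name];
    Object.defineProperty(target, name, {
        configurable: true,
        enumerable: true,
        set(value) {
            // Pass the new value and the current call stack to the callback
            onSet(value, new Error().stack);
            stored = value;
        },
        get() {
            return stored;
        }
    });
}

const fakeWindow = {};          // stand-in for the browser's window object
const captured = [];
hookProperty(fakeWindow, "_signature", function (value, stack) {
    captured.push({ value: value, stack: stack });
});

fakeWindow._signature = "abc123";           // triggers the setter
console.log(captured[0].value);             // abc123
console.log(fakeWindow._signature);         // abc123
```

Recording the stack serves the same purpose as the debugger statement in the browser version: it shows which code performed the assignment.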

Put the two Hook snippets that bypass the infinite debugger together with this _signature Hook and inject them with the Fiddler plug-in (note that the debugger-bypass code should be placed after the _signature Hook code, otherwise it may not take effect, which may be a bug in the plug-in). Refresh the page again and you will find that the pagination buttons have disappeared. Open the developer tools and you can see two errors in the upper right corner; click them to jump to the offending code, or read the error messages in the Console, as shown in the figure below:

The whole of 1.js has been obfuscated with sojson jsjiami v6. We print some of the obfuscated expressions in the console and restore this snippet by hand. The two variables i1I1i1li and illllli1 are hard on the eyes, so we simply rename them a and b:

(function() {
    'use strict';
    var a = '';
    Object["defineProperty"](window, "_signature", {
        set: function(b) {
            a = b;
            return b;
        },
        get: function() {
            return a;
        }
    });
}());

Look familiar? It has get and set methods: isn't this exactly a Hook on window._signature? The logic is that setting _signature stores the value in a, and reading _signature returns a. In effect it does nothing to _signature. So what is this code for, and why does adding our own Hook cause errors?

Look at the error message: Uncaught TypeError: Cannot redefine property: _signature. Our Hook runs Object.defineProperty(window, '_signature', {...}) as soon as the page loads, and a property defined this way is non-configurable by default, so when the site's JS calls defineProperty on the same property again, an error is thrown. The fix is simple: since redefinition is not allowed and the site's own Hook has no effect on _signature anyway, just delete it! This is most likely the anti-Hook measure.
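The error is easy to reproduce in isolation. In this sketch the objects page and page2 are hypothetical stand-ins for window, and the two defineProperty calls play the roles of our injected Hook and the site's own script.

```javascript
// First definition: what the injected Hook does ("configurable" defaults to false)
const page = {};
Object.defineProperty(page, "_signature", {
    get() { return "ours"; },
    set(value) {}
});

let errorText = "";
try {
    // Second definition: what the site's own script does afterwards
    Object.defineProperty(page, "_signature", {
        get() { return "site"; },
        set(value) {}
    });
} catch (err) {
    errorText = err.name + ": " + err.message;
}
console.log(errorText);   // a TypeError about redefining _signature

// With configurable: true the second call succeeds, but it replaces our accessor
const page2 = {};
Object.defineProperty(page2, "_signature", { configurable: true, get() { return "ours"; } });
Object.defineProperty(page2, "_signature", { get() { return "site"; } });
console.log(page2._signature);   // site
```

The second half shows why simply making our Hook configurable is not enough: the site's later definition would silently overwrite it, so removing the site's definition, as done in this article, is the more reliable route.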

Save the original 1.js locally, delete its Hook code, and use Fiddler's AutoResponder to replace the response (there are many ways to do the replacement, also covered in Brother K's previous articles). Refresh again: the exception is gone and _signature is hooked successfully.

Reversing the parameter

After the Hook succeeds, following the call stack exposes the method directly: window._signature = window.byted_acrawler(window.sign())

Start with window.sign(). Select it and you can see that it is just a 13-digit millisecond timestamp. Follow it into 1.js to see how it is implemented:

We restore the obfuscated parts by hand:

window["sign"] = function sign() {
    try {
        div = document["createElement"];
        return Date["parse"](new Date())["toString"]();
    } catch (IIl1lI1i) {
        return "123456789abcdefghigklmnopqrstuvwxyz";
    }
}

Beware: there is a trap buried here. If you skip past it thinking a timestamp holds nothing of interest, you are badly mistaken! This is a try...catch statement, and it contains the line div = document["createElement"];, which accesses the createElement method of the HTML DOM Document object, the method used to create tags such as div. Run in a browser, this is harmless: the try branch executes and the timestamp is returned. Run locally in Node.js, it throws document is not defined, falls into the catch branch and returns the string of digits and letters, so the final result is bound to be wrong!

The fix is simple. In the local code, either remove the try...catch and return the timestamp directly, define document at the top, or comment out the line that touches createElement. Brother K recommends defining document, because who can guarantee there are no similar traps elsewhere? If one is hidden deep enough to go unnoticed, all the effort is wasted.
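The difference between the two environments can be demonstrated with the restored sign() function itself. The sketch assumes it is run in plain Node.js, where no global document exists; the stub assigned to globalThis.document is a hypothetical minimum that defines only the property the check touches.

```javascript
// Restored sign() from above, unchanged in behaviour
function sign() {
    try {
        // Throws ReferenceError when no global "document" exists (plain Node.js)
        const div = document["createElement"];
        return Date.parse(new Date()).toString();
    } catch (err) {
        return "123456789abcdefghigklmnopqrstuvwxyz";
    }
}

// 1) No document defined: the catch branch returns the decoy string
const withoutDocument = sign();

// 2) Minimal stub defined: the try branch returns the timestamp
globalThis.document = { createElement: function () { return {}; } };
const withDocument = sign();

console.log(withoutDocument);   // 123456789abcdefghigklmnopqrstuvwxyz
console.log(withDocument);      // a 13-digit timestamp string
```

Note that Date.parse(new Date()) goes through the date's string form, so the last three digits of the timestamp are always 000.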

Next look at window.byted_acrawler(). Its return statement mainly uses sign(), that is, the window.sign() method, and the IIl1llI1() method. Following in, IIl1llI1() also uses a try...catch, with the line nav = navigator[liIIIi11('2b')]; which plays the same role as div above, so it is likewise recommended to define navigator directly, as shown in the figure below:

That covers essentially all the methods involved. After defining window, document and navigator, run the code locally and you get the error window[liIIIi11(...)] is not a function:

Back on the web page, this method turns out to be a timer that serves no real purpose, so just comment it out:

Local debugging with PyCharm

After the steps above, run it locally again and you get window.signs is not a function. The error comes from an eval statement. Checking the same eval statement in the browser shows that it is clearly window.sign(), so why does it become window.signs() locally? Where does the extra s come from?

There is only one explanation: a difference between the local environment and the browser environment. The obfuscated code must contain an environment check that, outside a browser, alters the code passed to eval by adding an s. If you simply delete the whole function containing the eval statement, along with the setInterval timer above, the code also runs fine. But Brother K always pursues the details! We must find out where the extra s comes from!

We debug locally in PyCharm to see where the s is added. The error is at the eval statement, so click that line to set a breakpoint, then right-click and choose Debug to enter the debugging view (PS: the original code contains an infinite debugger; left untreated, debugging in PyCharm gets stuck in it too. Either add the earlier Hook code to the local file or delete the corresponding function or variable):

The call stack is on the left and the variable values on the right; overall it resembles the developer tools in Chrome. For detailed usage see the official JetBrains documentation. The 8 buttons in the figure below are the main ones:

  1. Show Execution Point (Alt + F10): if the cursor is on another line or in another file, jumps back to the line where execution is currently paused;
  2. Step Over (F8): executes line by line; if the line calls a method, the method is not entered;
  3. Step Into (F7): steps in; if the current line calls a method, enters it. Generally used for your own code; it does not enter official library code;
  4. Force Step Into (Alt + Shift + F7): forced step in; enters any method, including official library code, which is useful when reading the underlying source;
  5. Step Out (Shift + F8): steps out of the current method back to the call site; the method has finished executing, but the assignment of its result has not yet been made;
  6. Restart Frame: abandons the current position and re-executes the current frame;
  7. Run to Cursor (Alt + F9): runs until the cursor line is reached, without needing a breakpoint there;
  8. Evaluate Expression (Alt + F8): evaluates an expression directly, without typing it on the command line.

Click Step Into to enter function IIlIliii(), which also uses a try...catch. Step once more and an exception is caught with the message Cannot read property 'location' of undefined, as shown in the figure below:

We print the value of each variable and restore the code by hand:

function IIlIliii(II1, iIIiIIi1) {
    try {
        href = window["document"]["location"]["href"];
        check_screen = screen["availHeight"];
        window["code"] = "gnature = window.byted_acrawler(window.sign())";
        return '';
    } catch (I1IiI1il) {
        window["code"] = "gnature = window.byted_acrawlers(window.signs())";
        return '';
    }
}

Now we have our lead. Locally there are no document, location, href or availHeight objects, so execution falls into the catch branch, the code becomes window.signs(), and an error follows. The fix is again simple: either delete the redundant code and hard-code the string without the extra s, or patch the environment. Check the values of href and screen in the browser and define them:

var window = {
    "document": {
        "location": {
            "href": "http://spider.wangluozhe.com/challenge/1"
        }
    },
}

var screen = {
    "availHeight": 1040
}
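As a complement to stepping through the debugger, property accesses on the stubbed environment can also be logged automatically. This is an alternative technique that the article itself does not use, sketched with hypothetical names (traceEnv, accessLog, tracedWindow): a Proxy wraps the stub and records every property path that is read, marking the ones that are missing.

```javascript
// Wrap an object in a Proxy that records every property path read from it
function traceEnv(target, path, log) {
    return new Proxy(target, {
        get(obj, prop) {
            if (typeof prop === "symbol") {
                return obj[prop];            // ignore internal symbol lookups
            }
            const fullPath = path + "." + prop;
            const value = obj[prop];
            log.push(value === undefined ? fullPath + " (missing)" : fullPath);
            // Keep tracing nested objects so deep paths are recorded too
            return (value !== null && typeof value === "object")
                ? traceEnv(value, fullPath, log)
                : value;
        }
    });
}

const accessLog = [];
const tracedWindow = traceEnv({
    document: { location: { href: "http://spider.wangluozhe.com/challenge/1" } }
}, "window", accessLog);

// Simulate what the environment check reads
const href = tracedWindow["document"]["location"]["href"];
const nav = tracedWindow["navigator"];

console.log(accessLog);
```

Entries marked (missing) point at the objects that still need to be defined locally; here that would be window.navigator.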

Run it again and you get sign is not defined. Here sign() is really window.sign(), that is, the window[liIIIi11('a')] method below; either spelling works:

Run it once more and there are no errors. We can now write our own function to obtain _signature; either of the following works:

function getSign(){
    return window[liIIIi11('9')](window[liIIIi11('a')]())
}

function getSign(){
    return window.byted_acrawler(window.sign())
}

// Test output
console.log(getSign())

Running it produces no output in PyCharm. Likewise, calling console.log in the console of the challenge page prints nothing, as shown in the figure below:

Evidently console.log has been tampered with as well. This is not really a problem, since a Python script can call the getSign() function we wrote earlier to obtain the value of _signature. But once again, Brother K always pursues the details! We have to find where console.log is processed and make it work normally!

We again debug in PyCharm, which is also good practice for local debugging. Set a breakpoint on the console.log(getSign()) statement and step in. You will reach the statement var IlII1li1 = function() {}; and, checking the variable values at that point, find that console.log, console.warn and the other console methods have been emptied, as shown in the figure below:

Stepping further, the function returns immediately. Most likely the console methods are emptied the first time the JS runs, so set a few breakpoints inside the method suspected of handling console and debug again. This time execution reaches the else branch, which assigns IlII1li1, the empty function, to the console methods, as shown in the figure below:

Having located the problem, comment out the if...else statement so that the methods are no longer emptied, then debug again. The result is now printed normally.
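If you would rather not edit the obfuscated file, another option is to save the console methods before the script runs and restore them afterwards. This is an alternative to the approach above, shown as a sketch: the emptyConsole function merely imitates the site's behaviour, and savedConsole and wasSilenced are hypothetical names.

```javascript
// Keep references before the obfuscated code gets a chance to replace them
const savedConsole = { log: console.log, warn: console.warn };

// Stand-in for the site's behaviour: console methods are replaced by an empty function
(function emptyConsole() {
    const noop = function () {};
    console.log = noop;
    console.warn = noop;
})();

console.log("this line prints nothing");
const wasSilenced = console.log !== savedConsole.log;

// Put the saved methods back
Object.assign(console, savedConsole);
console.log("console.log works again");
```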

Finally, call the JS from Python, send _signature with each request, add up the data page by page, and the submission succeeds.

Complete code

Follow K Brother Crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

**The following shows only part of the key code and cannot be run as-is.** Full code repository: https://github.com/kgepachong/crawler/

Key JavaScript encryption code (skeleton)

var window = {
    "document": {
        "location": {
            "href": "http://spider.wangluozhe.com/challenge/1"
        }
    },
}

var screen = {
    "availHeight": 1040
}
var document = {}
var navigator = {}
var location = {}

// Keep the original constructor first
Function.prototype.constructor_ = Function.prototype.constructor;
Function.prototype.constructor = function (a) {
    // If the parameter is debugger, an empty method is returned
    if(a == "debugger") {
        return function (){};
    }
    // If the parameter is not debugger, the original method is returned
    return Function.prototype.constructor_(a);
};

// Keep the original timer first
var setInterval_ = setInterval
setInterval = function (func, time){
    // If the time parameter is 0x7d0, an empty method is returned
    // Of course, you can directly return null without judgment. There are many ways to write it
    if(time == 0x7d0)
    {
        return function () {};
    }
    // If the time parameter is not 0x7d0, the original method is returned
    return setInterval_(func, time)
}

var iil = 'jsjiami.com.v6'
  , iiIIilii = [iil, '\x73\x65\x74\x49\x6e\x74\x65\x72\x76\x61\x6c', '\x6a\x73\x6a', ...];
var liIIIi11 = function(_0x11145e, _0x3cbe90) {
    _0x11145e = ~~'0x'['concat'](_0x11145e);
    var _0x636e4d = iiIIilii[_0x11145e];
    return _0x636e4d;
};
(function(_0x52284d, _0xfd26eb) {
    var _0x1bba22 = 0x0;
    for (_0xfd26eb = _0x52284d['shift'](_0x1bba22 >> 0x2); _0xfd26eb && _0xfd26eb !== (_0x52284d['pop'](_0x1bba22 >> 0x3) + '')['replace'](/[fnwRwdGKbwKrRFCtSC=]/g, ''); _0x1bba22++) {
        _0x1bba22 = _0x1bba22 ^ 0x661c2;
    }
}(iiIIilii, liIIIi11));
// window[liIIIi11('0')](function() {
//     var l111IlII = liIIIi11('1') + liIIIi11('2');
//     if (typeof iil == liIIIi11('3') + liIIIi11('4') || iil != l111IlII + liIIIi11('5') + l111IlII[liIIIi11('6')]) {
//         var Ilil11iI = [];
//         while (Ilil11iI[liIIIi11('6')] > -0x1) {
//             Ilil11iI[liIIIi11('7')](Ilil11iI[liIIIi11('6')] ^ 0x2);
//         }
//     }
//     iliI1lli();
// }, 0x7d0);
(function() {
    var iiIIiil = function() {}();
    var l1liii11 = function() {}();
    window[liIIIi11('9')] = function byted_acrawler() {};
    window[liIIIi11('a')] = function sign() {};
    (function() {}());
    // (function() {
    //     'use strict';
    //     var i1I1i1li = '';
    //     Object[liIIIi11('1f')](window, liIIIi11('21'), {
    //         '\x73\x65\x74': function(illllli1) {
    //             i1I1i1li = illllli1;
    //             return illllli1;
    //         },
    //         '\x67\x65\x74': function() {
    //             return i1I1i1li;
    //         }
    //     });
    // }());
    var iiil1 = 0x0;
    var l11il1l1 = '';
    var ii1Ii = 0x8;
    function i1Il11i(iiIll1i) {}
    function I1lIIlil(l11l1iIi) {}
    function lllIIiI(IIi1lIil) {}

    // N functions are omitted here
    
    window[liIIIi11('37')]();
}());

function iliI1lli(lil1I1) {
    function lili11I(l11I11l1) {
        if (typeof l11I11l1 === liIIIi11('38')) {
            return function(lllI11i) {}
            [liIIIi11('39')](liIIIi11('3a'))[liIIIi11('8')](liIIIi11('3b'));
        } else {
            if (('' + l11I11l1 / l11I11l1)[liIIIi11('6')] !== 0x1 || l11I11l1 % 0x14 === 0x0) {
                (function() {
                    return !![];
                }
                [liIIIi11('39')](liIIIi11('3c') + liIIIi11('3d'))[liIIIi11('3e')](liIIIi11('3f')));
            } else {
                (function() {
                    return ![];
                }
                [liIIIi11('39')](liIIIi11('3c') + liIIIi11('3d'))[liIIIi11('8')](liIIIi11('40')));
            }
        }
        lili11I(++l11I11l1);
    }
    try {
        if (lil1I1) {
            return lili11I;
        } else {
            lili11I(0x0);
        }
    } catch (liIlI1il) {}
}
;iil = 'jsjiami.com.v6';

// function getSign(){
//     return window[liIIIi11('9')](window[liIIIi11('a')]())
// }

function getSign(){
    return window.byted_acrawler(window.sign())
}

console.log(getSign())

Key Python code

# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-12-01
# @Author: WeChat official account: K brother crawler
# @FileName: challenge_1.py
# @Software: PyCharm
# ==================================


import execjs
import requests

challenge_api = "http://spider.wangluozhe.com/challenge/api/1"
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "Replace this with your own cookie value!",
    "Host": "spider.wangluozhe.com",
    "Origin": "http://spider.wangluozhe.com",
    "Referer": "http://spider.wangluozhe.com/challenge/1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}


def get_signature():
    # Compile the local JS file and call getSign() to obtain a fresh _signature
    with open('challenge_1.js', 'r', encoding='utf-8') as f:
        challenge_js = execjs.compile(f.read())
    signature = challenge_js.call("getSign")
    print("signature: ", signature)
    return signature


def main():
    result = 0
    for page in range(1, 101):
        data = {
            "page": page,
            "count": 10,
            "_signature": get_signature()
        }
        response = requests.post(url=challenge_api, headers=headers, data=data).json()
        for d in response["data"]:
            result += d["value"]
    print("The result is: ", result)


if __name__ == '__main__':
    main()
