Python 3 Web Crawler in Practice 44: Recognizing Touch (Point-and-Click) Verification Codes

Posted by jskywalker on Wed, 07 Aug 2019 17:25:36 +0200

In the last section, we implemented recognition of the GeeTest verification code. Besides GeeTest, there is another common and widely used type of verification code; a representative example is the touch verification code.

You may not know the name, but you have almost certainly seen this kind of verification code. The one used by 12306 is a typical touch verification code, as shown in Figure 8-18:

Figure 8-18 12306 Verification Code

We need to click directly on the images that meet the requirement. If all the answers are correct, the verification succeeds; if any answer is wrong, it fails. This kind of verification code is called a touch verification code.

There is also a site dedicated to providing touch verification code services, called TouClick, whose official website is https://www.touclick.com/. This section uses it as an example to illustrate how such verification codes can be recognized.

1. Goal of This Section

In this section, our goal is to recognize touch verification codes and pass the verification programmatically.

2. Preparations

This time we use the Selenium Python library with the Chrome browser. Before starting, make sure that the Selenium library, Chrome, and ChromeDriver are installed correctly; see Chapter 1 for the installation process.

3. Understanding touch verification codes

The verification code on TouClick's official website looks like Figure 8-19.

Figure 8-19 Verification Code Style

It resembles the 12306 code, but here we click on text in the picture rather than on pictures. Touch verification codes come in other forms as well, with slightly different interactions, but the basic principle is the same.

Next, we walk through a general recognition process for this kind of touch verification code.

4. Recognition Approach

This kind of verification code is very difficult to recognize with image recognition alone.

For example, for 12306, there are two difficulties in its recognition. The first is character recognition, as shown in Figure 8-20.

Figure 8-20 12306 Verification Code

Suppose the task is to click on all the funnels in the picture. The word "funnel" in the prompt has been deformed, scaled, and blurred, so the OCR techniques mentioned earlier recognize it with much lower accuracy, or return no result at all. The second difficulty is image recognition: each picture has to be converted into text so that it can be matched against the prompt. Various image recognition interfaces can be used for this, but in my tests their accuracy was very low. They often returned wrong matches or no match at all, and the images themselves are not sharp enough, which makes recognition even harder. In addition, all eight pictures have to be recognized at once, and every one of the correct answers has to be matched for the verification to pass. Overall, this approach is basically not feasible.

Take TouClick for example, as shown in Figure 8-21:

Figure 8-21 Verification Code Example

We need to find the word "plant" in this picture, but the background interferes to some degree, so OCR can hardly produce a result. Some will ask: why not simply pick out the white text? But what if the verification code changes, as shown in Figure 8-22:

Figure 8-22 Verification Code Example

In this picture the text has turned blue and also has a white shadow, which makes recognition much harder.

So is there no way to solve this kind of verification code? Of course there is. What does the solution depend on? People.

Solved by people? Then what is the program for? Don't worry, the people here are not ourselves. There are many verification code service platforms on the Internet that provide recognition around the clock (7 x 24 hours). A picture gets a result within a few seconds, and the accuracy can exceed 90%. The service has to be paid for, since these platforms run for profit, but recognizing one verification code costs only a few cents.

One platform I personally recommend is Chaojiying (Super Eagle), whose official website is https://www.chaojiying.com (this is not an advertisement).

It provides a wide range of services and can recognize many types of verification codes, including touch verification codes.

In addition, Chaojiying also supports simple graphic verification codes. If OCR struggles with a code, it can be sent to the platform in the same way as in this section. Here are some of the services the platform provides:

  • English letters and digits: mixed recognition of up to 20 characters
  • Chinese characters: recognition of up to 7 characters
  • Pure English: recognition of up to 12 letters
  • Pure digits: recognition of up to 11 digits
  • Arbitrary special characters: variable-length Chinese characters, English letters and digits, initial letters, arithmetic questions, idiom questions, container numbers, and so on
  • Coordinate selection: complex arithmetic questions, four-choice questions, question-and-answer questions, clicking on matching words, objects, animals, and so on, returning multiple coordinates

In case of any change, the official website shall prevail: https://www.chaojiying.com/price.html.

In this section, we need to solve the last category, coordinate multi-choice recognition. What we need to do is to submit the verification code picture to the platform, and then the platform will return the coordinate position of the recognition result in the image. Next, we will parse the coordinates to simulate the click.

The principle is simple. Let's try it in code.

5. Registering an Account

Before we start, we need to register a Chaojiying account and apply for a software ID. The registration page is https://www.chaojiying.com/user/reg/. After registering, add a software ID in the developer center of the user dashboard. Finally, top up some points; the amount depends on the price and on how many codes you expect to recognize.

6. Getting the API

After the above preparation, we can start connecting our program to the platform.

First, we can download the corresponding Python API from the official website.

The modified API is as follows:

import requests
from hashlib import md5


class Chaojiying(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        # Send the MD5 hex digest of the password rather than the plain text
        self.password = md5(password.encode('utf-8')).hexdigest()
        self.soft_id = soft_id
        # Fields sent with every request
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def post_pic(self, im, codetype):
        """
        im: picture bytes
        codetype: verification code type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()

    def report_error(self, im_id):
        """
        im_id: ID of the picture whose recognition result was wrong
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()

This code defines a Chaojiying class. Its constructor takes three parameters, the Chaojiying user name, password, and software ID, and saves them for later use.
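As a small illustration of what the constructor stores, the sketch below rebuilds the same three form fields outside the class. The function name build_base_params and the demo credentials are hypothetical; only the field names user, pass2, and softid come from the code above.

```python
from hashlib import md5


def build_base_params(username, password, soft_id):
    """Return the form fields sent with every request (hypothetical helper)."""
    return {
        'user': username,
        # The password is hashed, so the plain text never goes into the request
        'pass2': md5(password.encode('utf-8')).hexdigest(),
        'softid': soft_id,
    }


# Demo values only, not a real account
params = build_base_params('demo_user', 'demo_password', 123456)
print(params['pass2'])  # a 32-character hexadecimal digest
```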

The most important method is post_pic(), which takes the image bytes and the verification code type. It sends the image and the related information to the Chaojiying backend for recognition and returns the resulting JSON.

The other method, report_error(), is for reporting a wrong result. If a verification code was recognized incorrectly, calling this method returns the points charged for it.
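Before using a response from post_pic(), it may be worth checking whether the platform reported a problem. The following is a minimal sketch, assuming that err_no equal to 0 means success (as in the sample response shown later in this section) and that any other value is an error. RecognitionError, recognize_checked, and FakeClient are hypothetical names; FakeClient only stands in for the Chaojiying class so the example runs without a network connection.

```python
class RecognitionError(Exception):
    """Raised when the platform answers with a non-zero err_no."""


def recognize_checked(client, image_bytes, codetype):
    """Call client.post_pic() and return the response only if err_no is 0."""
    result = client.post_pic(image_bytes, codetype)
    if result.get('err_no') != 0:
        # Assumption: a non-zero err_no signals a platform-side failure
        raise RecognitionError(result.get('err_str', 'unknown error'))
    return result


class FakeClient:
    """Hypothetical stand-in for Chaojiying so the sketch runs offline."""

    def post_pic(self, im, codetype):
        return {'err_no': 0, 'err_str': 'OK', 'pic_id': '1', 'pic_str': '132,127|56,77'}


response = recognize_checked(FakeClient(), b'fake image bytes', 9102)
print(response['pic_str'])  # prints 132,127|56,77
```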

Next, we take TouClick's official website as an example to demonstrate the recognition process for touch verification codes. The link is http://admin.touclick.com/. If you do not have an account, register one first.

7. Initialization

First, we need to initialize some variables, such as WebDriver, Chaojiying objects and so on. The code implementation is as follows:

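import time
from io import BytesIO
from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Chaojiying is the class defined in the previous step

EMAIL = 'cqc@cuiqingcai.com'
PASSWORD = ''
# Chaojiying user name, password, software ID, verification code type
CHAOJIYING_USERNAME = 'Germey'
CHAOJIYING_PASSWORD = ''
CHAOJIYING_SOFT_ID = 893590
CHAOJIYING_KIND = 9102


class CrackTouClick():

    def __init__(self):
        self.url = 'http://admin.touclick.com/login.html'
        self.browser = webdriver.Chrome()
        self.wait = WebDriverWait(self.browser, 20)
        self.email = EMAIL
        self.password = PASSWORD
        self.chaojiying = Chaojiying(CHAOJIYING_USERNAME, CHAOJIYING_PASSWORD, CHAOJIYING_SOFT_ID)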

Replace the account and password values here with your own.

8. Getting the Verification Code

The next step is to fill in the form and then simulate a click to bring up the verification code. This step is simple, and the code is as follows:

def open(self):
    """
    Open the web page and enter the email and password
    :return: None
    """
    self.browser.get(self.url)
    email = self.wait.until(EC.presence_of_element_located((By.ID, 'email')))
    password = self.wait.until(EC.presence_of_element_located((By.ID, 'password')))
    email.send_keys(self.email)
    password.send_keys(self.password)

def get_touclick_button(self):
    """
    Get the initial verification button
    :return: button element
    """
    button = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'touclick-hod-wrap')))
    return button

Here, the open() method fills in the form, and the get_touclick_button() method returns the verification button, which we then click to bring up the verification code.

Next, we need to get the position and size of the verification code image, as in the previous section, and then crop the image out of a screenshot of the web page. The code is as follows:

def get_touclick_element(self):
    """
    Get the verification picture element
    :return: picture element
    """
    element = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'touclick-pub-content')))
    return element

def get_position(self):
    """
    Get the position of the verification code
    :return: verification code position tuple (top, bottom, left, right)
    """
    element = self.get_touclick_element()
    # Wait briefly before reading the position
    time.sleep(2)
    location = element.location
    size = element.size
    top, bottom = location['y'], location['y'] + size['height']
    left, right = location['x'], location['x'] + size['width']
    return (top, bottom, left, right)

def get_screenshot(self):
    """
    Get a screenshot of the web page
    :return: screenshot as an Image object
    """
    screenshot = self.browser.get_screenshot_as_png()
    screenshot = Image.open(BytesIO(screenshot))
    return screenshot

def get_touclick_image(self, name='captcha.png'):
    """
    Get the verification code picture
    :return: picture as an Image object
    """
    top, bottom, left, right = self.get_position()
    print('Verification Code Location', top, bottom, left, right)
    screenshot = self.get_screenshot()
    # Image.crop() takes (left, upper, right, lower)
    captcha = screenshot.crop((left, top, right, bottom))
    return captcha

Here the get_touclick_image() method crops the verification code picture out of the page screenshot, using the position returned by get_position(), and returns an Image object.

9. Recognizing the Verification Code

Then we call the post_pic() method of the Chaojiying object to send the picture to the Chaojiying backend. The image must be sent as bytes. The code is as follows:
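The crop above assumes that one pixel in the screenshot corresponds to one unit of the element's location and size. On high-density displays the screenshot can be larger than the page coordinates suggest, in which case the crop box would need scaling. The sketch below shows that arithmetic only; scaled_crop_box and its pixel_ratio parameter are hypothetical, and reading the ratio from the browser (for example with execute_script('return window.devicePixelRatio')) is an assumption about your setup rather than part of the original code.

```python
def scaled_crop_box(location, size, pixel_ratio=1.0):
    """Convert an element rectangle in page units into screenshot pixels.

    Returns (left, top, right, bottom), the order Image.crop() expects.
    """
    left = location['x'] * pixel_ratio
    top = location['y'] * pixel_ratio
    right = (location['x'] + size['width']) * pixel_ratio
    bottom = (location['y'] + size['height']) * pixel_ratio
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Hypothetical element at (100, 50), 320x200 in size, on a display with ratio 2
box = scaled_crop_box({'x': 100, 'y': 50}, {'width': 320, 'height': 200}, pixel_ratio=2)
print(box)  # (200, 100, 840, 500)
```

If the picture is scaled this way, the coordinates returned by the platform would be in screenshot pixels as well and would have to be divided by the same ratio before clicking.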

image = self.get_touclick_image()
bytes_array = BytesIO()
image.save(bytes_array, format='PNG')
# Recognize the verification code
result = self.chaojiying.post_pic(bytes_array.getvalue(), CHAOJIYING_KIND)
print(result)

The result variable now holds the recognition result from the Chaojiying backend. The call may take several seconds, since the recognition is done by people behind the scenes.

The result is JSON. If the recognition succeeds, a typical result looks like this:

{'err_no': 0, 'err_str': 'OK', 'pic_id': '6002001380949200001', 'pic_str': '132,127|56,77', 'md5': '1f8e1d4bef8b11484cb1f1f34299865b'}

Here pic_str holds the coordinates of the recognized text as a string: the x and y of each point are separated by a comma, and the points are separated by |. We only need to parse the string and then simulate the clicks. The code is as follows:

def get_points(self, captcha_result):
    """
    Parse the recognition result
    :param captcha_result: recognition result
    :return: list of [x, y] coordinates
    """
    groups = captcha_result.get('pic_str').split('|')
    locations = [[int(number) for number in group.split(',')] for group in groups]
    return locations

def touch_click_words(self, locations):
    """
    Click the recognized points on the verification picture
    :param locations: click locations
    :return: None
    """
    for location in locations:
        print(location)
        ActionChains(self.browser).move_to_element_with_offset(self.get_touclick_element(), location[0], location[1]).click().perform()
        time.sleep(1)

Here the get_points() method turns the recognition result into a list of coordinates. The touch_click_words() method then passes each parsed coordinate to move_to_element_with_offset() and clicks.
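The coordinates returned by the platform are measured from the top-left corner of the submitted picture. Older Selenium releases interpreted the offsets of move_to_element_with_offset() the same way, but as far as we know, newer Selenium 4 releases measure them from the center of the element, so the clicks may land in the wrong place there. The helper below is a minimal sketch of the conversion under that assumption; to_center_offset is a hypothetical name, and it is worth checking the documentation of your installed Selenium version before relying on it.

```python
def to_center_offset(x, y, width, height):
    """Convert an offset measured from the top-left corner
    into one measured from the center of a width x height element."""
    return x - width // 2, y - height // 2


# A point 132 px right and 127 px down from the top-left of a 320x200 picture
print(to_center_offset(132, 127, 320, 200))  # (-28, 27)
```

The width and height would come from element.size, the same dictionary that get_position() reads.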

In this way, we can simulate the completion of the coordinate point selection, and the operation effect is shown in Figure 8-23.

Figure 8-23 Point Selection Effect

The last step is to click the submit button, wait for the verification to pass, and then click the login button to log in. That part of the implementation is not covered here.

In this way, we have recognized a touch verification code with the help of an online verification code platform. The method is general: verification codes such as the one used by 12306 can be handled in exactly the same way.
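Since the platform's accuracy is high but not perfect, one possible way to tie the steps together is a retry loop that reports wrong answers through report_error(). The following is only a sketch under stated assumptions: solve_with_retries, get_image_bytes, and click_and_submit are hypothetical names, err_no equal to 0 is assumed to mean success, and FakeClient replaces the real Chaojiying object so the example runs offline.

```python
def solve_with_retries(client, get_image_bytes, click_and_submit, codetype, max_attempts=3):
    """Try to pass the verification up to max_attempts times.

    get_image_bytes() returns the current verification picture as bytes.
    click_and_submit(pic_str) clicks the points and returns True if the site accepts them.
    Returns the number of the successful attempt, or None if all attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        result = client.post_pic(get_image_bytes(), codetype)
        if result.get('err_no') != 0:
            continue  # platform-side failure, nothing to report
        if click_and_submit(result['pic_str']):
            return attempt
        # The site rejected the answer, so report this picture as wrong
        client.report_error(result['pic_id'])
    return None


class FakeClient:
    """Hypothetical offline stand-in for the Chaojiying class."""

    def __init__(self):
        self.reported = []

    def post_pic(self, im, codetype):
        return {'err_no': 0, 'pic_id': 'p%d' % (len(self.reported) + 1), 'pic_str': '1,2'}

    def report_error(self, im_id):
        self.reported.append(im_id)
        return {'err_no': 0}


# Simulate a site that rejects the first answer and accepts the second
outcomes = iter([False, True])
client = FakeClient()
attempts = solve_with_retries(client, lambda: b'img', lambda pic_str: next(outcomes), 9102)
print(attempts, client.reported)  # 2 ['p1']
```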

10. Concluding remarks

In this section, we recognized verification codes with the help of an online coding platform. This approach is very powerful and works for almost any verification code. When recognition gets difficult, a coding platform is an excellent choice.
