A Python expert shows you how to beat anti-crawler checks: solving sliding verification codes

Posted by volatileboy on Thu, 17 Feb 2022 23:18:01 +0100

Anyone who writes crawlers keeps running into anti-crawling restrictions of every kind. The first line of anti-crawling defense usually appears at login: to block automated logins, every site does its utmost. As the saying goes, virtue rises one foot and the devil rises ten.

Mini lesson


Steps:
(1) Calculate the sliding distance
(2) Simulate a human drag (the general idea is fast first, then slow)

Now let's take a look at the Douban login page. Here we deliberately enter a wrong password so that the verification code appears.

Slide and refresh several times and a pattern emerges: the y coordinate stays fixed and only the x coordinate changes. For Douban's sliding verification code, the x-axis distance is about 207. If a precise measurement is needed, compare pixels.
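The pixel comparison mentioned above can be sketched in plain Python. This is only an illustration under assumptions: it presumes you can obtain two same-sized images, one complete and one with the gap cut out, as rows of (r, g, b) tuples. Whether Douban's verification code exposes both images is not confirmed here, and the function name `find_gap_x` and the `threshold` value are hypothetical.

```python
def find_gap_x(full, notched, threshold=60, start_x=0):
    """Return the leftmost column where the two images clearly differ.

    `full` and `notched` are lists of rows, each row a list of (r, g, b)
    tuples, and both images are assumed to have the same size.
    Returns None when no column differs by more than `threshold`.
    """
    height = len(full)
    width = len(full[0])
    # Scan column by column so the first hit is the left edge of the gap
    for x in range(start_x, width):
        for y in range(height):
            r1, g1, b1 = full[y][x]
            r2, g2, b2 = notched[y][x]
            # Sum of absolute channel differences for this pixel
            if abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2) > threshold:
                return x
    return None
```

The returned column, minus the slider's own starting x position, would give the distance to drag.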
Next, find the slider through selenium and move it. There is one problem: moving straight from (x1, y1) to (x2, y2) amounts to teleporting. The move takes almost no time and may be detected by the site.

Next, we need to simulate the drag track of a real person, which generally accelerates first and then decelerates. Here we assume uniform acceleration followed by uniform deceleration.
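Uniform acceleration means each small time slice t follows the two standard constant-acceleration formulas. As a worked sketch, take a = 20 and t = 0.2 (the values used in the code later in this post) and start from rest:

```latex
\begin{aligned}
s_0 &= v_0 t + \tfrac{1}{2} a t^2, & v &= v_0 + a t \\
\text{step 1: } s_0 &= 0 \cdot 0.2 + \tfrac{1}{2} \cdot 20 \cdot 0.2^2 = 0.4, & v &= 0 + 20 \cdot 0.2 = 4 \\
\text{step 2: } s_0 &= 4 \cdot 0.2 + \tfrac{1}{2} \cdot 20 \cdot 0.2^2 = 1.2, & v &= 4 + 20 \cdot 0.2 = 8
\end{aligned}
```

Each accelerating step therefore covers a t^2 = 0.8 more than the one before it, which is what makes the drag start slowly and speed up.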
After sliding, if the check does not pass, click the refresh button and slide again until it passes. (After passing, the page normally redirects, so a changed title, or any other element that differs between the two pages, tells you the check succeeded.)
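The retry idea can be sketched as a small helper with an attempt cap, so the script does not loop forever when the check keeps failing. The name `slide_until_passed` and its three callbacks are hypothetical. In a real script they would wrap the selenium drag, the title check and the refresh click.

```python
import time


def slide_until_passed(attempt, passed, refresh, max_attempts=5, pause=2.0):
    """Retry a slide until `passed()` is true or the attempts run out.

    attempt: callable that performs one drag
    passed:  callable returning True once the page has moved on
    refresh: callable that clicks the refresh button
    Returns the number of the successful attempt, or None on failure.
    """
    for n in range(1, max_attempts + 1):
        attempt()
        time.sleep(pause)      # give the page time to react
        if passed():
            return n
        refresh()
        time.sleep(pause)      # wait for the new puzzle to load
    return None
```

Capping the attempts is a design choice of this sketch. An endless loop would also work, but it hangs if the site starts rejecting every try.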

Simulate uniform acceleration and deceleration


Code implementation:

def get_tracks(distance, rate=0.6, t=0.2, v=0):
    """
    Split the total distance into small segments
    :param distance: Total distance
    :param rate: Critical ratio between acceleration and deceleration
    :param t: Unit time
    :param v: Initial speed
    :return: List of short distances
    """
    tracks = []
    # Critical value of acceleration and deceleration
    mid = rate * distance
    # Current displacement
    s = 0
    # loop
    while s < distance:
        # Initial speed
        v0 = v
        if s < mid:
            a = 20
        else:
            a = -3
        # Calculate the distance traveled in the current t time period
        s0 = v0 * t + 0.5 * a * t * t
        # Calculate current speed
        v = v0 + a * t
        # Round off the distance because pixels have no decimals
        tracks.append(round(s0))
        # Calculate current distance
        s += s0

    return tracks

if __name__ == '__main__':
    tracks = get_tracks(100)
    print(tracks)
    print(sum(tracks))

Let's take a look at the running results:

As you can see, the simulation of uniform acceleration and deceleration works.
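One caveat about get_tracks: each segment is rounded on its own, and the loop only stops once the ideal displacement has passed distance, so sum(tracks) can differ from the distance you asked for. Below is a minimal sketch of one way around this, assuming distance is an integer. The name `get_exact_tracks`, the `accel` and `decel` parameters and the minimum-speed floor are illustrative additions, not part of the original code.

```python
def get_exact_tracks(distance, rate=0.6, t=0.2, v=0.0, accel=20.0, decel=-3.0):
    """Split an integer `distance` into positive integer steps that sum to it."""
    tracks = []
    mid = rate * distance   # switch from acceleration to deceleration here
    s = 0.0                 # ideal (float) displacement so far
    moved = 0               # whole pixels already emitted
    while moved < distance:
        a = accel if s < mid else decel
        v0 = v
        s0 = v0 * t + 0.5 * a * t * t
        # Floor the speed so the track always keeps moving forward
        v = max(v0 + a * t, 1.0)
        # Clamp so the ideal displacement never passes the target
        s = min(s + s0, distance)
        # Emit only the whole pixels gained since the last step,
        # carrying the fractional remainder into the next one
        step = round(s) - moved
        if step > 0:
            tracks.append(step)
            moved += step
    return tracks
```

Because every step is the difference between two rounded cumulative positions, the steps add up to exactly distance.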

Analyzing the login page


First, the login URL is:

https://accounts.douban.com/passport/login

The page after opening is as follows:

Now let's take a look at how normal people log in to Douban.

Now let's start analyzing the page and complete these operations through selenium.

Analyzing the page structure

  1. Password login

//*[@id="account"]/div[2]/div[2]/div/div[1]/ul[1]/li[2]

  2. User account
  3. User password
  4. Login button ("Login Douban")
  5. Slider
  6. Refresh button
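The elements listed above can be gathered into one lookup table, using the XPath strings that appear in the code later in this post. This is a sketch only: the dictionary keys and the `find` helper are hypothetical names, and the last two elements live inside the verification iframe, so the driver must switch into it before looking them up.

```python
# XPath for each element, copied from the code in this post
LOCATORS = {
    "password_login_tab": '//*[@id="account"]/div[2]/div[2]/div/div[1]/ul[1]/li[2]',
    "username": '//*[@id="username"]',
    "password": '//*[@id="password"]',
    "login_button": '//*[@id="account"]/div[2]/div[2]/div/div[2]/div[1]/div[4]/a',
    "slider": '//*[@id="tcaptcha_drag_button"]',   # inside the iframe
    "reload": '//*[@id="reload"]',                 # inside the iframe
}


def find(driver, name):
    """Look up one named element; "xpath" is the string value of By.XPATH."""
    return driver.find_element("xpath", LOCATORS[name])
```

Keeping the locators in one place means only this table has to change when the page layout does.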

With the analysis done, let's implement it in code.

Code implementation

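import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# get_tracks() is the function defined in the previous section

url = "https://accounts.douban.com/passport/login"
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("./chromedriver/chromedriver.exe"))
driver.get(url)
print("Current title: ", driver.title)

# Switch to password login, fill in the form and click the login button
driver.find_element(By.XPATH, '//*[@id="account"]/div[2]/div[2]/div/div[1]/ul[1]/li[2]').click()
driver.find_element(By.XPATH, '//*[@id="username"]').send_keys("account number")
driver.find_element(By.XPATH, '//*[@id="password"]').send_keys("password")
driver.find_element(By.XPATH, '//*[@id="account"]/div[2]/div[2]/div/div[2]/div[1]/div[4]/a').click()
# Pause and wait for the verification code to appear
time.sleep(2)

# Switch to the iframe that holds the verification code
driver.switch_to.frame(1)
block = driver.find_element(By.XPATH, '//*[@id="tcaptcha_drag_button"]')
reload = driver.find_element(By.XPATH, '//*[@id="reload"]')

# Sliding requires an action chain
# Press and hold the slider
ActionChains(driver).click_and_hold(block).perform()
# Cover most of the distance in one move
ActionChains(driver).move_by_offset(180, 0).perform()
# Split the remaining distance into small segments
tracks = get_tracks(30)
# Move segment by segment
for track in tracks:
    ActionChains(driver).move_by_offset(track, 0).perform()
# Release the slider
ActionChains(driver).release().perform()
# Give the page time to redirect before checking
time.sleep(2)
# Still on the login page means the check failed
# (the page title is "登录豆瓣", i.e. "Login Douban")
if driver.title == "登录豆瓣":
    print("fail...Once more...")
    # Click the refresh button to get a new verification code
    reload.click()
    time.sleep(2)
else:
    print("success!")

time.sleep(5)
driver.quit()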

Login process test

Complete code

# encoding: utf-8
'''
  @author python Eye of
  @create 2020-11-14 14:41
  Mycsdn: https://buwenbuhuo.blog.csdn.net/
  @contact: 459804692@qq.com
  @software: Pycharm
  @file: Douban login.py
  @Version: 1.0
  
'''
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By


def get_tracks(distance, rate=0.6, t=0.2, v=0):
    """
    Split the total distance into small segments
    :param distance: Total distance
    :param rate: Critical ratio between acceleration and deceleration
    :param t: Unit time
    :param v: Initial speed
    :return: List of short distances
    """
    tracks = []
    # Critical value between acceleration and deceleration
    mid = rate * distance
    # Current displacement
    s = 0
    # Keep adding segments until the distance is covered
    while s < distance:
        # Initial speed of this segment
        v0 = v
        if s < mid:
            a = 20
        else:
            a = -3
        # Calculate the distance traveled in the current t time period
        s0 = v0 * t + 0.5 * a * t * t
        # Calculate current speed
        v = v0 + a * t
        # Round off the distance because pixels have no decimals
        tracks.append(round(s0))
        # Calculate current distance
        s += s0

    return tracks


def slide(driver):
    """Solve the sliding verification code"""
    # Switch to the iframe that holds the verification code
    driver.switch_to.frame(1)
    # Find the slider
    block = driver.find_element(By.XPATH, '//*[@id="tcaptcha_drag_button"]')
    # Find the refresh button
    reload = driver.find_element(By.XPATH, '//*[@id="reload"]')
    while True:
        # Press and hold the slider
        ActionChains(driver).click_and_hold(block).perform()
        # Cover most of the distance in one move
        ActionChains(driver).move_by_offset(180, 0).perform()
        # Split the remaining distance into small segments
        tracks = get_tracks(30)
        # Move segment by segment
        for track in tracks:
            ActionChains(driver).move_by_offset(track, 0).perform()
        # Release the slider
        ActionChains(driver).release().perform()
        # Give the page time to redirect
        time.sleep(2)
        # Still on the login page means the check failed
        # (the page title is "登录豆瓣", i.e. "Login Douban")
        if driver.title == "登录豆瓣":
            print("fail...Once more...")
            # Click the refresh button to get a new verification code
            reload.click()
            time.sleep(2)
        else:
            break


def main():
    """Main program"""
    url = "https://accounts.douban.com/passport/login"
    # Selenium 4 takes the chromedriver path through a Service object
    driver = webdriver.Chrome(service=Service("./chromedriver/chromedriver.exe"))
    driver.get(url)
    driver.find_element(By.XPATH, '//*[@id="account"]/div[2]/div[2]/div/div[1]/ul[1]/li[2]').click()
    driver.find_element(By.XPATH, '//*[@id="username"]').send_keys("account number")
    driver.find_element(By.XPATH, '//*[@id="password"]').send_keys("password")
    driver.find_element(By.XPATH, '//*[@id="account"]/div[2]/div[2]/div/div[2]/div[1]/div[4]/a').click()
    # Pause and wait for the verification code to appear
    time.sleep(2)
    # Solve the sliding verification code
    slide(driver)

    print("success")
    driver.quit()


if __name__ == '__main__':
    main()


Topics: Python Selenium AI