Being a crawler will always encounter all kinds of anti climbing restrictions. The first defense line of anti climbing often appears when logging in. In order to limit the automatic login of crawlers, all families have tried their best. The so-called way is one foot high and the devil is one foot high.
Small class
Steps:
(1) Calculate sliding distance
(2) Simulated human sliding (the general idea is to be fast first and then slow)
Now let's take a look at the Douban login interfaceAt this time, we enter the wrong password to make it appear the verification code.
Slide and refresh several times and find some rules. The y-axis remains unchanged and the x-axis is changing. For the sliding verification code Douban, the x-axis distance is about 207. If accurate measurement is required, pixel comparison is required.
Next, find the slider through selenium and move it. But there is a problem. If you directly move (x1,y1) to (x2,y2), it is equivalent to the effect of blinking. The time is very short and may be detected by the other party.
Next, we need to use the click sliding track of the simulated real person, which is generally accelerated first and then accelerated, assuming uniform acceleration and uniform deceleration.
After sliding, if you don't pass, you can refresh the button and then slide until you pass (because after passing, the general page starts to jump to a different title or find another comparison to find a different one)
Simulate uniform acceleration and deceleration
Code implementation:
def get_tracks(distance, rate=0.6, t=0.2, v=0): """ take distance Distance divided into small segments :param distance: Total distance :param rate: Critical ratio of acceleration and deceleration :param a1: acceleration :param a2: deceleration :param t: unit time :param t: Initial speed :return: Short distance set """ tracks = [] # Critical value of acceleration and deceleration mid = rate * distance # Current displacement s = 0 # loop while s < distance: # Initial speed v0 = v if s < mid: a = 20 else: a = -3 # Calculate the distance traveled in the current t time period s0 = v0 * t + 0.5 * a * t * t # Calculate current speed v = v0 + a * t # Round off the distance because pixels have no decimals tracks.append(round(s0)) # Calculate current distance s += s0 return tracks if __name__ == '__main__': tracks = get_tracks(100) print(tracks) print(sum(tracks)) 123456789101112131415161718192021222324252627282930313233343536373839
Let's take a look at the running results:
We can see that the operation of simulating uniform acceleration and deceleration has been completed.
Analysis login page
First, through the URL, we found
https://accounts.douban.com/passport/login
The page after opening is as follows:
Now let's take a look at how normal people log in to Douban.
Now let's start analyzing the page and complete these operations through selenium.
Analyze web page structure
- Password login
//*[@id="account"]/div[2]/div[2]/div/div[1]/ul[1]/li[2] 1
- User account
- User password
- Login Douban
- Find slider
- Refresh button
After the analysis is completed, let's start the code implementation
code implementation
url = "https://accounts.douban.com/passport/login" driver = webdriver.Chrome("./chromedriver/chromedriver.exe") driver.get(url) print("Current title: ",driver.title) driver.find_element_by_xpath('//*[@id="account"]/div[2]/div[2]/div/div[1]/ul[1]/li[2]').click() driver.find_element_by_xpath('//*[@id="username"]').send_keys("account number") driver.find_element_by_xpath('//*[@id="password"]').send_keys("password") driver.find_element_by_xpath('//*[@id="account"]/div[2]/div[2]/div/div[2]/div[1]/div[4]/a').click() # Stop and wait for it to appear time.sleep(2) # Switch iframe driver.switch_to.frame(1) block = driver.find_element_by_xpath('//*[@id="tcaptcha_drag_button"]') reload = driver.find_element_by_xpath('//*[@id="reload"]') # Action chain is required for sliding operation # Press the slider ActionChains(driver).click_and_hold(block).perform() # move ActionChains(driver).move_by_offset(180, 0).perform() # Get displacement tracks = get_tracks(30) # loop for track in tracks: # move ActionChains(driver).move_by_offset(track, 0).perform() # release ActionChains(driver).release().perform() # judge if driver.title == "Login Douban": print("fail...Once more...") # Click the refresh button to refresh reload.click() # Stop it time.sleep(2) else: print("success!") time.sleep(5) driver.quit() 123456789101112131415161718192021222324252627282930313233343536373839404142
Login process test
Complete code
# encoding: utf-8 ''' @author python Eye of @create 2020-11-14 14:41 Mycsdn: https://buwenbuhuo.blog.csdn.net/ @contact: 459804692@qq.com @software: Pycharm @file: Douban login.py @Version: 1.0 ''' import requests import time from selenium import webdriver from selenium.webdriver.common.action_chains import ActionChains def get_tracks(distance, rate=0.6, t=0.2, v=0): """ take distance Distance divided into small segments :param distance: Total distance :param rate: Critical ratio of acceleration and deceleration :param a1: acceleration :param a2: deceleration :param t: unit time :param t: Initial speed :return: Short distance set """ tracks = [] # Critical value of acceleration and deceleration mid = rate * distance # Current displacement s = 0 # loop while s < distance: # Initial speed v0 = v if s < mid: a = 20 else: a = -3 # Calculate the distance traveled in the current t time period s0 = v0 * t + 0.5 * a * t * t # Calculate current speed v = v0 + a * t # Round off the distance because pixels have no decimals tracks.append(round(s0)) # Calculate current distance s += s0 return tracks def slide(driver): """Sliding verification code""" # Switch iframe driver.switch_to.frame(1) #Find slider block = driver.find_element_by_xpath('//*[@id="tcaptcha_drag_button"]') #Refresh found reload = driver.find_element_by_xpath('//*[@id="reload"]') while True: # Press the slider ActionChains(driver).click_and_hold(block).perform() # move ActionChains(driver).move_by_offset(180, 0).perform() #Get displacement tracks = get_tracks(30) #loop for track in tracks: #move ActionChains(driver).move_by_offset(track, 0).perform() # release ActionChains(driver).release().perform() #Stop it time.sleep(2) #judge if driver.title == "Login Douban": print("fail...Once more...") #Click the refresh button to refresh reload.click() # Stop it time.sleep(2) else: break def main(): """main program""" url = "https://accounts.douban.com/passport/login" driver = webdriver.Chrome("./chromedriver/chromedriver.exe") driver.get(url) driver.find_element_by_xpath('//*[@id="account"]/div[2]/div[2]/div/div[1]/ul[1]/li[2]').click() driver.find_element_by_xpath('//*[@id="username"]').send_keys("account number") driver.find_element_by_xpath('//*[@id="password"]').send_keys("password") driver.find_element_by_xpath('//*[@id="account"]/div[2]/div[2]/div/div[2]/div[1]/div[4]/a').click() # Stop and wait for it to appear time.sleep(2) #Sliding verification code slide(driver) print("success") driver.quit() if __name__ == '__main__': main() 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106
Finally, Xiaobian is a python development engineer. Here is a complete set of Python learning materials and routes sorted out by myself. Anyone who wants these materials can pay attention to Xiaobian and receive them by private letter "01".