Cracking the 12306 CAPTCHA with a crawler to log in automatically
1, Preparation
Before crawling, let's see what the 12306 CAPTCHA looks like.
Seeing this CAPTCHA, are you a little panicked? Can this thing really be cracked???
Answer: of course.
Let's get to know a CAPTCHA recognition platform: Chaojiying (Super Eagle).
If you don't have an account yet, you can register one. The platform is easy to use for personal testing, and the price is not high.
After registering successfully, go to the user center.
Tip: you get 1000 points the first time you bind your WeChat account. After all, I'm a freeloader, hehe.
Open the Software ID page and generate a software ID. From then on, the software ID is all we need.
Then click Development Documents, choose Python, and download the sample.
After downloading it, you get the following.
2, Full code
That completes the basic preparation. Without further ado, on to the code.
Open chaojiying.py and copy its contents into the Python file you are going to write.
First I will give you the complete code, and then we will analyze it step by step.
import requests
from hashlib import md5


class Chaojiying_Client(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        password = password.encode('utf8')
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: CAPTCHA type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: ID of the image whose answer was wrong
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()

# =========================================================================================
# chaojiying = Chaojiying_Client('ppx666', '07244058664', '906006')  # User Center >> Software ID: generate one to replace 96001
# im = open('a.jpg', 'rb').read()  # Path to the local image file, replacing a.jpg; Windows sometimes needs //
# print(chaojiying.PostPic(im, 9004)['pic_str'])  # 1902 is a CAPTCHA type, see the official website

from selenium import webdriver
from PIL import Image
import time
from selenium.webdriver import ActionChains

bro = webdriver.Chrome()
# Full screen
bro.maximize_window()
bro.get("https://kyfw.12306.cn/otn/resources/login.html")
time.sleep(2)
# Note: the find_element_by_* helpers were removed in Selenium 4.3;
# on newer versions use bro.find_element(By.CLASS_NAME, ...) and similar instead
bro.find_element_by_class_name("login-hd-account").click()

# save_screenshot takes a screenshot of the current page and saves it
bro.save_screenshot('aa.png')

# Determine the upper-left and lower-right coordinates of the CAPTCHA image
code_img_ele = bro.find_element_by_id("J-loginImg")
location = code_img_ele.location  # x, y of the upper-left corner of the CAPTCHA image
size = code_img_ele.size  # Width and height of the CAPTCHA element
# Upper-left and lower-right coordinates
rangle = (
    int(location['x']), int(location['y']), int(location['x'] + size['width']), int(location['y'] + size['height'])
)
i = Image.open("./aa.png")
code_img_name = './code.png'
# Crop the screenshot
frame = i.crop(rangle)
frame.save(code_img_name)

chaojiying = Chaojiying_Client('your Chaojiying account', 'your Chaojiying password', 'the software ID prepared earlier')  # User Center >> Software ID: generate one to replace 96001
im = open('code.png', 'rb').read()  # Path to the local image file; Windows sometimes needs //
print(chaojiying.PostPic(im, 9004)['pic_str'])  # 9004 is the CAPTCHA type, see the official website

# ========================================================================================
result = chaojiying.PostPic(im, 9004)['pic_str']
all_list = []  # Stores the coordinates to be clicked
if '|' in result:
    list_1 = result.split("|")
    count_1 = len(list_1)
    for i in range(count_1):
        xy_list = []
        x = int(list_1[i].split(",")[0])
        y = int(list_1[i].split(",")[1])
        xy_list.append(x)
        xy_list.append(y)
        all_list.append(xy_list)
else:
    x = int(result.split(",")[0])
    y = int(result.split(",")[1])
    xy_list = []
    xy_list.append(x)
    xy_list.append(y)
    all_list.append(xy_list)
print(all_list)

# Traverse the list and use an action chain to click the x, y position given by each element
for l in all_list:
    x = l[0]
    print(x)
    y = l[1]
    print(y)
    # move_to_element_with_offset moves to a position offset from an element (measured from its upper-left corner)
    ActionChains(bro).move_to_element_with_offset(code_img_ele, x, y).click().perform()
    time.sleep(0.5)

# ========================================================================================
bro.find_element_by_id("J-userName").send_keys("Fill in your 12306 account")
time.sleep(1)
bro.find_element_by_id("J-password").send_keys("Fill in your 12306 password")
time.sleep(1)
bro.find_element_by_id("J-login").click()
time.sleep(1)
3, Code analysis
OK, let's start the analysis! The code is divided into four parts, which I separated with "====" lines.
1, Part I
The first part is the Python file you were asked to download. Don't ask me how it works, I just don't know (∀ˇˇ). After all, I haven't studied it. If you are interested, you can look into it yourself.
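For the curious: judging only from the class shown in the full code, the client seems to do little more than hash the password with MD5 and merge the account fields into the form data of each request. Here is a minimal sketch of that step with no network access. The helper name build_params and the account values are made up for illustration; they are not part of the downloaded file.

```python
from hashlib import md5


def build_params(username, password, soft_id, codetype):
    """Rebuild the form fields that PostPic sends (hypothetical helper)."""
    # Same fields as base_params in Chaojiying_Client.__init__
    base_params = {
        'user': username,
        'pass2': md5(password.encode('utf8')).hexdigest(),  # MD5 hex digest, not the raw password
        'softid': soft_id,
    }
    # PostPic starts from the CAPTCHA type and then merges in the account fields
    params = {'codetype': codetype}
    params.update(base_params)
    return params


# Placeholder account values, for illustration only
params = build_params('demo_user', 'demo_password', '96001', 9004)
print(params)
```

So the plain-text password is not what gets posted; the 32-character hex digest is sent in its place.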
2, Part II
# chaojiying = Chaojiying_Client('ppx666', '07244058664', '906006')  # User Center >> Software ID: generate one to replace 96001
# im = open('a.jpg', 'rb').read()  # Path to the local image file, replacing a.jpg; Windows sometimes needs //
# print(chaojiying.PostPic(im, 9004)['pic_str'])  # 1902 is a CAPTCHA type, see the official website

from selenium import webdriver
from PIL import Image
import time
from selenium.webdriver import ActionChains

bro = webdriver.Chrome()
# Full screen
bro.maximize_window()
bro.get("https://kyfw.12306.cn/otn/resources/login.html")
time.sleep(2)
bro.find_element_by_class_name("login-hd-account").click()

# save_screenshot takes a screenshot of the current page and saves it
bro.save_screenshot('aa.png')

# Determine the upper-left and lower-right coordinates of the CAPTCHA image
code_img_ele = bro.find_element_by_id("J-loginImg")
location = code_img_ele.location  # x, y of the upper-left corner of the CAPTCHA image
size = code_img_ele.size  # Width and height of the CAPTCHA element
# Upper-left and lower-right coordinates
rangle = (
    int(location['x']), int(location['y']), int(location['x'] + size['width']), int(location['y'] + size['height'])
)
i = Image.open("./aa.png")
code_img_name = './code.png'
# Crop the screenshot
frame = i.crop(rangle)
frame.save(code_img_name)

chaojiying = Chaojiying_Client('your Chaojiying account', 'your Chaojiying password', 'the software ID prepared earlier')  # User Center >> Software ID: generate one to replace 96001
im = open('code.png', 'rb').read()  # Path to the local image file; Windows sometimes needs //
print(chaojiying.PostPic(im, 9004)['pic_str'])  # 9004 is the CAPTCHA type, see the official website
I won't go over the imports at the top.
As for Selenium, I'm sure you've already learned it. If you haven't, it doesn't matter; here is the documentation for you: Selenium Chinese documentation
# Drive the Chrome browser
bro = webdriver.Chrome()
# Open the window full screen; I found that without this it does not open full screen for me, though it may for you
bro.maximize_window()
# Use get to reach the 12306 login page
bro.get("https://kyfw.12306.cn/otn/resources/login.html")
# Sleep for two seconds; don't go too fast
time.sleep(2)
# The login page opens in QR-code mode by default, so find the account-login tab by its class name and click it
bro.find_element_by_class_name("login-hd-account").click()
# save_screenshot takes a screenshot of the current page and saves it
bro.save_screenshot('aa.png')
# Find the CAPTCHA image by its id
code_img_ele = bro.find_element_by_id("J-loginImg")
# x, y of the upper-left corner of the CAPTCHA image
# (the location property returns the image's position in the browser as a dictionary)
location = code_img_ele.location
# Width and height of the CAPTCHA element (size returns the width and height of the image)
size = code_img_ele.size
# Upper-left and lower-right coordinates
rangle = (
    int(location['x']), int(location['y']), int(location['x'] + size['width']), int(location['y'] + size['height'])
)
# Open the screenshot
i = Image.open("./aa.png")
code_img_name = './code.png'
# Crop according to the coordinates above
frame = i.crop(rangle)
# Save the cropped image
frame.save(code_img_name)
# User Center >> Software ID: generate one to replace 96001
chaojiying = Chaojiying_Client('your Chaojiying account', 'your Chaojiying password', 'the software ID prepared earlier')
# Path to the local image file; Windows sometimes needs //
im = open('code.png', 'rb').read()
# 9004 is the CAPTCHA type
print(chaojiying.PostPic(im, 9004)['pic_str'])
A 12306 CAPTCHA generally has at most four pictures to click, so CAPTCHA type 9004 is used.
OK, the second part is finished.
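The rangle computation is the step most likely to go wrong, so here it is on its own as a small sketch. The location and size dictionaries below are invented values that only mimic the shape of what Selenium returns. The scale parameter is my own assumption, not something from the original code: on some high-DPI displays the screenshot may contain more pixels than the page coordinates suggest, and multiplying by that ratio is one possible fix.

```python
def crop_box(location, size, scale=1.0):
    """Return (left, upper, right, lower), the box format expected by Image.crop().

    location and size have the same shape as a Selenium element's
    .location and .size dictionaries. scale is a hypothetical knob for
    screens where the screenshot is larger than the page coordinates.
    """
    left = int(location['x'] * scale)
    upper = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    lower = int((location['y'] + size['height']) * scale)
    return (left, upper, right, lower)


# Invented example values, not measured from the real 12306 page
location = {'x': 800, 'y': 280}
size = {'width': 300, 'height': 190}

print(crop_box(location, size))             # (800, 280, 1100, 470)
print(crop_box(location, size, scale=1.5))  # (1200, 420, 1650, 705)
```

If code.png turns out to show the wrong part of the page, checking this box against the screenshot's actual dimensions is a good first step.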
3, Part III
# Get the coordinates returned by Chaojiying
result = chaojiying.PostPic(im, 9004)['pic_str']
# Stores the coordinates to be clicked; the coordinate format is x,y|x,y (x and y are numbers)
all_list = []
# Multiple coordinates are joined by "|"; a single coordinate contains no "|"
if '|' in result:
    # Split on "|"
    list_1 = result.split("|")
    # Number of coordinates after splitting
    count_1 = len(list_1)
    # The rest should be easy to follow, so I won't analyze it; it just takes each coordinate apart according to the format
    for i in range(count_1):
        xy_list = []
        x = int(list_1[i].split(",")[0])
        y = int(list_1[i].split(",")[1])
        xy_list.append(x)
        xy_list.append(y)
        all_list.append(xy_list)
else:
    x = int(result.split(",")[0])
    y = int(result.split(",")[1])
    xy_list = []
    xy_list.append(x)
    xy_list.append(y)
    all_list.append(xy_list)
print(all_list)

# Traverse the list and use an action chain to click the x, y position given by each element
for l in all_list:
    x = l[0]
    print(x)
    y = l[1]
    print(y)
    # move_to_element_with_offset moves to a position offset from an element (measured from its upper-left corner)
    ActionChains(bro).move_to_element_with_offset(code_img_ele, x, y).click().perform()
    time.sleep(0.5)
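The parsing can also be written more compactly. Since split("|") on a string without "|" simply returns a one-element list, the if/else branch is not strictly needed. Below is a small sketch; parse_points is a name I made up, and the sample strings are invented rather than real platform responses.

```python
def parse_points(pic_str):
    """Turn 'x1,y1|x2,y2' into [[x1, y1], [x2, y2]]."""
    points = []
    # split('|') also handles a single 'x,y' pair: it yields one element
    for pair in pic_str.split('|'):
        x, y = pair.split(',')
        points.append([int(x), int(y)])
    return points


print(parse_points('108,133|227,140'))  # [[108, 133], [227, 140]]
print(parse_points('50,60'))            # [[50, 60]]
```

The returned list has the same shape as all_list, so the click loop can iterate over it unchanged.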
4, Part IV
This part needs little explanation: find the account and password input boxes by their ids and fill them in, then click the login button. Pause briefly between steps so it doesn't go too fast.
bro.find_element_by_id("J-userName").send_keys("fill in your 12306 account")
time.sleep(1)
bro.find_element_by_id("J-password").send_keys("fill in your 12306 password")
time.sleep(1)
bro.find_element_by_id("J-login").click()
time.sleep(1)
That's the end. If you found this useful, please give it a like! Thank you so much. (If there are any mistakes, please point them out in the comments, thank you!)