[python] [crawler] Logging in with cookies using the selenium module

Posted by kelly3330 on Sun, 16 Jan 2022 17:17:25 +0100

Contents

Use the Session object to get cookies

Use selenium module to obtain cookies to realize automatic login

Get cookies

Automatic login using cookies

Existing problems

In the previous article, we simulated a login by sending a POST request to the server. That approach fails when the site requires a verification code. Instead, we can reuse the cookies of an account that is already logged in, so that the next visit to the website is logged in automatically.

You can use a packet-capture tool or the browser's developer tools (F12, Network tab) to find the cookies of the current login, but a cookie value copied this way may not be usable directly. This article presents two ways to obtain cookies automatically and use them to restore the login state.
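If you do copy the Cookie request header from the Network tab, it arrives as a single `name=value; name=value` string and has to be split into pairs before a library such as requests can take it as a `cookies` dictionary. A minimal sketch, assuming that header format; the function name `parse_cookie_header` and the sample values are made up for illustration:

```python
def parse_cookie_header(raw):
    """Split a raw 'Cookie' header string into a {name: value} dict."""
    cookies = {}
    for pair in raw.split(';'):
        # partition() splits on the first '=' only, so values may contain '='
        name, sep, value = pair.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


raw_header = "BDUSS=abc123; token=x=y; theme=dark"  # made-up sample values
print(parse_cookie_header(raw_header))
```

The resulting dictionary can be passed as `cookies=` to a requests call. Whether the site accepts it still depends on the cookies being complete and not expired.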

Use the Session object to get cookies

Use the Session object from the requests module to send the login request. The session records the cookies the server returns, so later requests made through the same object send them automatically and no other login parameters are needed.

import requests

session = requests.Session()    # Create a session; it keeps cookies between requests
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
data = {'userName':'12312451235', 'password':'123456'}    # Login form fields as a dictionary; check the real field names with F12 first
url = 'http://passport.baidu.com/v2/?login'
# Send the login POST through the session
response = session.post(url=url, data=data, headers=headers)
print(response.cookies)    # Cookies set by the login response

# Request the page again without the data field; the session sends the stored cookies
response2 = session.get(url=url, headers=headers).text    # .text is an attribute, not a method
print(response2)

However, the cookies obtained this way live only in the session object. The next time you run a script against the site, you still have to send the login request again. The selenium module avoids this: you only need to add the saved cookies to the browser driver to log in automatically (I think it is more convenient).

Use selenium module to obtain cookies to realize automatic login

The selenium method get_cookies() returns the cookies of the page the browser is currently on, including those set by logging in. Convert the result to a JSON string and store it in a txt file. The next time you use selenium to access the website, you can add the cookies back to restore the login state.

Get cookies

As before, you need to log in to the web page once before you can get the cookies of the current account.

from selenium import webdriver
from selenium.webdriver.common.by import By
import json

driver = webdriver.Firefox()
driver.get("https://passport.baidu.com/v2/?login")
# Sign in
# Alternatively, call time.sleep() here and log in manually while the script waits
# Assumes the two fields are located by their name attribute; check the page with F12
driver.find_element(By.NAME, "userName").send_keys("12312341234")
driver.find_element(By.NAME, "password").send_keys("123456")
driver.find_element(By.LINK_TEXT, "Sign in").click()

cookies = driver.get_cookies()    # List of cookie dictionaries for the current page
with open('cookie.txt', 'w') as f1:    # Store the cookies in a file as a JSON string
    f1.write(json.dumps(cookies))

driver.close()
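The save and load steps can be wrapped in two small helpers so that both scripts share them. This is only a sketch: `save_cookies` and `load_cookies` are hypothetical names, and the sample cookie stands in for what `driver.get_cookies()` would return.

```python
import json
import os
import tempfile


def save_cookies(cookies, path):
    """Write a list of cookie dictionaries to a file as JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(cookies, f)


def load_cookies(path):
    """Read the list of cookie dictionaries back from the file."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


# Made-up sample in the shape of a Selenium cookie dictionary
sample = [{'name': 'sessionid', 'value': 'abc123', 'domain': '.example.com', 'path': '/'}]
demo_path = os.path.join(tempfile.gettempdir(), 'cookie_demo.txt')
save_cookies(sample, demo_path)
print(load_cookies(demo_path))
```

Using `with` closes the file even if writing fails, which the explicit open/close pair does not guarantee.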

Automatic login using cookies

from selenium import webdriver
import json

driver = webdriver.Firefox()
driver.get("https://passport.baidu.com/v2/?login")    # Open the site first; cookies are added for the current domain
# Read the cookies from the saved file
with open('cookie.txt') as f1:
    cookie_list = json.loads(f1.read())    # Parse the JSON string back into a list of cookies
for c in cookie_list:
    driver.add_cookie(c)    # Add each saved cookie to the driver

driver.refresh()    # After the refresh, the page shows the logged-in state
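The saved cookie list can also be reused on the requests side, since requests accepts a plain name-to-value dictionary through its `cookies=` parameter. A minimal sketch, assuming the list has the shape returned by `get_cookies()`; the helper name `to_name_value_dict` and the sample data are illustrative only:

```python
def to_name_value_dict(cookie_list):
    """Reduce Selenium-style cookie dictionaries to a {name: value} mapping."""
    return {c['name']: c['value'] for c in cookie_list}


# Made-up sample in the shape of driver.get_cookies() output
cookie_list = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': '.example.com', 'path': '/'},
    {'name': 'theme', 'value': 'dark', 'domain': '.example.com', 'path': '/'},
]
simple = to_name_value_dict(cookie_list)
print(simple)
# Possible use with requests: requests.get(url, headers=headers, cookies=simple)
```

This drops the domain, path and expiry fields, so it is only suitable when every cookie belongs to the site being requested.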

Existing problems

Because cookies are time sensitive, an error may be reported when adding them. With method 1 you can simply log in again to get the latest cookies. With method 2, you can use the following workaround:

When reading the cookies from the file, delete the "expiry" field that causes the error.

with open('cookie.txt') as f1:
    cookie_list = json.loads(f1.read())    # Parse the saved cookies
for c in cookie_list:
    if 'expiry' in c:
        del c['expiry']    # Delete the expiry field that causes the error
    driver.add_cookie(c)    # Add each cookie to the driver
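Deleting the field hides the expiry time rather than checking it. An alternative sketch, assuming `expiry` holds seconds since the Unix epoch (as the WebDriver cookie format describes): skip cookies that have already expired and cast the rest to an integer, on the assumption that a non-integer value is what the driver rejects. The helper name `prepare_cookies` is hypothetical.

```python
import time


def prepare_cookies(cookie_list, now=None):
    """Return copies of the cookies that are still valid, with integer expiry."""
    now = time.time() if now is None else now
    usable = []
    for c in cookie_list:
        expiry = c.get('expiry')
        if expiry is None:
            usable.append(dict(c))    # No expiry field, nothing to check
        elif expiry > now:
            usable.append(dict(c, expiry=int(expiry)))    # Still valid; store expiry as an int
        # Cookies with expiry <= now are skipped
    return usable


# Made-up sample data; 'now' is fixed so the result does not depend on the clock
sample = [
    {'name': 'a', 'value': '1', 'expiry': 2000.5},
    {'name': 'b', 'value': '2', 'expiry': 500},
    {'name': 'c', 'value': '3'},
]
print(prepare_cookies(sample, now=1000))
```

If the returned list is empty or the page still shows the login form after the refresh, the saved cookies are no longer valid and you need to log in again to save new ones.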
