Implementing automatic web clock-in with Python + Selenium

Posted by XaeroDegreaz on Thu, 20 Jan 2022 08:20:03 +0100

Summary:

In the context of XX, you need to log in to a website every day to report your body temperature and location. This post analyzes the requests sent during login and uses Python and Selenium to clock in automatically.

Approach

Drive the browser through Selenium in three steps: log in -> fill in the form -> clock in repeatedly on a schedule.

  1. Login: there are two ways to log in automatically. The first is to locate the verification code, recognize its characters through OCR, save the cookie after logging in, and later load that cookie into the browser to skip the login. The second is to maintain the cookie manually. Analysis shows that when a login request is sent, the client generates a random string and puts it in the Cookie request header, ASP.NET_SessionId=ryhrbiqalg14hgrxhmhd4ie5, and the server adds CenterSoftWeb=xxxxx; expires=Sun, 23-Jan-2022 07:45:06 GMT; path=/; HttpOnly in the Set-Cookie response header. Together these form a complete cookie that allows the login to be skipped.
  2. Fill in the form: operate the page elements through Selenium to complete the required sequence of actions. (You need to install Chrome and download the chromedriver version that matches it.)
  3. Clock in regularly and repeatedly: crontab plus a shell script can start the job at a fixed time every day.

For more about using Selenium from Python, please check the Selenium Python documentation.

Concrete implementation

To recognize the verification code by machine, its position must be located first. Use Selenium to take a browser screenshot in the current runtime environment, then open the image in Photoshop and read the pixel coordinates of the four corners of the verification code.

Once the four corner positions are known, the verification code image can be cropped, saved, and sent to an OCR service for recognition.

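import json

from PIL import Image
from selenium.webdriver.common.by import By

import moyunocr  # OCR helper used by the author (presumably their own module, not shown here)

# `browser`, `stunumber` and `pwd` are assumed to be defined earlier in the script

# Save a screenshot of the login page
browser.save_screenshot('verification/' + stunumber + '.png')
# Fill in the student number and password
# (Selenium 4 uses find_element(By...) in place of find_element_by_id / find_element_by_xpath)
browser.find_element(By.ID, "StudentId").send_keys(stunumber)
browser.find_element(By.XPATH, "/html/body/div[1]/div[2]/div/div[2]/form/div[2]/div[2]/input").send_keys(pwd)

# Crop the verification code out of the screenshot
left = 781
top = 336
right = 870
bottom = 369
picture = Image.open('verification/' + stunumber + '.png')
picture = picture.crop((left, top, right, bottom))
picture.save('verification/' + stunumber + '-ed.png')

# Send the cropped image to the OCR API
yzm_code = moyunocr.getYzm('verification/' + stunumber + '-ed.png')

# After logging in, write the cookies to a file
dictCookies = browser.get_cookies()
jsonCookies = json.dumps(dictCookies)
with open('cookies/' + stunumber + '_cookie.txt', 'w') as f:
    f.write(jsonCookies)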

Stored cookie content

Note: the cookies are serialized to text with the json library, so the stored format differs slightly from the standard cookie header format.

[
{"domain": "dxg.cxxu.edu.cn", "expiry": 1615705350, "httpOnly": true, "name": "CenterSoftWeb", "path": "/", "secure": false, "value": "08CBB2BBDCB3E268D3EE9D0C8046B4C979910509ACF95A1F302336B2826DCAD9CE60311213F70E114CAF84ED18695CB3C6260D17B502F8D42ADC8966B6886C5C78D8B479AC5F152E8CC526E9517C1CF3783D3462632A5EDC0104C19C58AFCE9B020937BD766C253F8A327AE72F11F5ECD4F91E72021B3ECC9C7232BFBCAEF0BA38ACD4246EA9CD293CD7ADFD1BC682DF7FEB194CB8E433804D93B3CF00B299E55DA5530BF7D32523977316BAF9486DBAB85221C5348E587F05220BC9A44EE8901C33ADD15097D87E8305C0FEF8A4A3B9D2B358172EA280D622DE78B53CAE5EE88379B2600F38740BD2D33CFC0192CCD9B2DCE5C8BE4124A4466623C8A96B2499B25C16C9EA128D19C88BA0FD8BC4746F"}, 
{"domain": "dxg.cxxu.edu.cn", "httpOnly": true, "name": "ASP.NET_SessionId", "path": "/", "secure": false, "value": "qe45ulzkfahnqjvvexj45kcr"}
]
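
To make that format difference concrete, the sketch below joins a cookie list of the shape stored above into the single `name=value; name=value` string that would appear in a Cookie request header. The function name `cookies_to_header` and the shortened sample values are assumptions made for this example and are not part of the original script.

```python
import json


def cookies_to_header(cookies):
    """Join Selenium-style cookie dicts into a Cookie request-header value."""
    # Only name and value travel in the request header; attributes such as
    # domain, path, expiry and httpOnly stay with the client.
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Shortened, made-up values in the same shape as the stored file
stored = json.loads(
    '[{"domain": "example.edu", "httpOnly": true, "name": "CenterSoftWeb",'
    ' "path": "/", "secure": false, "value": "08CBB2"},'
    ' {"domain": "example.edu", "httpOnly": true, "name": "ASP.NET_SessionId",'
    ' "path": "/", "secure": false, "value": "qe45ul"}]'
)
header_value = cookies_to_header(stored)
print(header_value)
```

A string like this is only useful when requests are sent outside the browser; inside Selenium the cookies are added one by one instead.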

What is a Cookie

Cookie documentation

Because HTTP itself is stateless, the server cannot tell whether two requests come from the same user, so identifying state information is placed in the header of each request. This leads to authentication methods such as cookies and tokens, both of which can be understood simply as identifiers.

A cookie is sent in the Cookie request header when a request is initiated.

Format:

A cookie header is a list of name=value pairs separated by ';'. Example: PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1

What is Set-Cookie

The Set-Cookie HTTP response header is used to send a cookie from the server to the user agent so that the user agent can send it back to the server later. To send multiple cookies, multiple Set-Cookie headers should be sent in the same response.

Attributes

Attribute | Description | Required
cookie-name=cookie-value | The name is self-defined but must be unique; the value is whatever the application needs to store | yes
Expires=date | (1) The maximum lifetime of the cookie. (2) If not set, the cookie is a session cookie. (3) The expiration date is relative to the client on which the cookie is set; the cookie is deleted once it expires | no
Max-Age=number | Number of seconds until the cookie expires; similar to Expires, but takes precedence over it | no
Domain=domain-value | (1) The host that receives the cookie. (2) A leading dot, as in .example.com, covers subdomains but may be ignored | no
Path=path-value | (1) The path that must exist in the requested URL for the cookie to be sent | no
Secure | The cookie is sent to the server only with HTTPS requests | no
HttpOnly | Prevents JavaScript from accessing the cookie | no
SameSite=samesite-value | Controls whether the cookie is sent with cross-site requests. Options: Strict (never sent cross-site), Lax (sent when the user navigates from an external site to the original site; the default), None (sent cross-site; Secure must also be set) | no
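
As a small illustration of how these attributes look once parsed, the following sketch feeds a Set-Cookie value to Python's standard `http.cookies.SimpleCookie`. The header string is a hypothetical value modeled on the login response described in this article (the real cookie value is elided as xxxxx there), so treat it as an example under that assumption rather than captured traffic.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie value modeled on the login response in this article
raw = "CenterSoftWeb=xxxxx; expires=Sun, 23-Jan-2022 07:45:06 GMT; path=/; HttpOnly"

jar = SimpleCookie()
jar.load(raw)

# Each cookie becomes a Morsel: .value holds the value, and the attributes
# are available through dict-style access with lowercase keys.
morsel = jar["CenterSoftWeb"]
print("value:", morsel.value)
print("expires:", morsel["expires"])
print("path:", morsel["path"])
print("httponly:", bool(morsel["httponly"]))
print("secure:", bool(morsel["secure"]))
```

Attributes that the header does not set, such as Secure here, come back empty.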

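Carrying cookies in requests

import json

# `browser` must already have opened a page on the target domain,
# otherwise add_cookie rejects the cookie
with open('cookies/' + stunumber + '_cookie.txt', 'r', encoding='utf8') as f:
    listCookies = json.loads(f.read())
# Add the cookies to the browser, keeping only keys that add_cookie documents
for cookie in listCookies:
    cookie_dict = {
        'domain': '.mnnu.edu.cn',
        'name': cookie.get('name'),
        'value': cookie.get('value'),
        'path': '/',
        'httpOnly': False,
        'secure': False,
    }
    browser.add_cookie(cookie_dict)
# Reload so the page is requested again with the cookies attached
browser.refresh()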

Selenium configuration under Linux

Premise: a Linux server usually has no graphical interface, so extra options must be specified when the browser is initialized.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_options = webdriver.ChromeOptions()
# Headless mode
chrome_options.add_argument('--headless')
# Disable GPU acceleration
chrome_options.add_argument('--disable-gpu')
# Selenium 4 style: the driver path goes through Service, the options through `options`
browser = webdriver.Chrome(service=Service('/root/chromedriver'), options=chrome_options)
# Global implicit wait time, in seconds
browser.implicitly_wait(10)

Script for monitoring cookie expiration

Traverse the cookie files in the specified folder and compute the interval between each file's last modification time and the current time. The cookie is treated as valid for 7 days; once the interval exceeds 6 days, an email reminds the user to update it.

import logging
import os
import time

SIX_DAY_SECONDS = 6 * 24 * 60 * 60
SEVEN_DAY_SECONDS = 7 * 24 * 60 * 60

def listFile(rootpath):
    # The original relied on dateFormat/daysDiff helpers that are not shown;
    # standard-library calls stand in for them here
    todayStamp = time.time()
    todayTime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(todayStamp))
    logging.info("===========" + todayTime + "====START====")
    email_msg = ""
    for filename in os.listdir(rootpath):
        fullpath = os.path.join(rootpath, filename)
        # Skip anything that is not a regular file
        if not os.path.isfile(fullpath):
            continue
        # Files are named <stunumber>_cookie.txt, so recover the student number
        stunumber = filename.split('_')[0]
        # Seconds since the cookie file was last modified
        diffSeconds = todayStamp - os.path.getmtime(fullpath)
        if diffSeconds >= SEVEN_DAY_SECONDS:
            logging.info(stunumber + ": your cookie has expired")
            continue
        hoursLeft = str(int((SEVEN_DAY_SECONDS - diffSeconds) // 3600))
        if diffSeconds >= SIX_DAY_SECONDS:
            msg = stunumber + ": your cookie is expiring soon. Valid time remaining: " + hoursLeft + " hours"
            logging.info(msg)
            # Collect the text for the email reminder
            email_msg += msg + "\n"
        else:
            logging.info(stunumber + ": your cookie is valid. Time remaining: " + hoursLeft + " hours")
    # Sending the email is not shown in the original; the text is returned to the caller
    return email_msg
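
File modification time is only a proxy for the cookie's age. Because the stored cookie file shown earlier also records an `expiry` Unix timestamp, one possible variation is to read that field directly. The sketch below is an alternative under stated assumptions, not the author's script: `seconds_until_expiry` is a hypothetical helper, and it assumes the text is the JSON list written by `json.dumps(browser.get_cookies())`.

```python
import json
import time


def seconds_until_expiry(cookie_json, name="CenterSoftWeb", now=None):
    """Return the seconds left before the named cookie expires, or None."""
    now = time.time() if now is None else now
    for cookie in json.loads(cookie_json):
        if cookie.get("name") == name:
            expiry = cookie.get("expiry")
            # Session cookies carry no expiry field, so nothing can be computed
            return None if expiry is None else expiry - now
    return None  # the cookie is not present in the file


# Made-up sample in the same shape as the stored file, checked one hour early
sample = '[{"name": "CenterSoftWeb", "expiry": 1615705350, "value": "abc"}]'
remaining = seconds_until_expiry(sample, now=1615705350 - 3600)
print(remaining)
```

A negative result means the cookie has already expired, and `None` means the file gives no expiry to check, in which case the modification-time approach above is still needed.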

Topics: Python Selenium