Implementing automatic web clock-in with Python + Selenium
Summary:
In the context of XX, you may need to log in to a website every day to report your body temperature and location. This post analyzes the requests sent during login and implements automatic clock-in with Python and Selenium.
Approach
Drive the browser with Selenium: log in -> fill in the form -> clock in on a recurring schedule.
- Login: there are two ways to log in automatically. The first is to locate the CAPTCHA on the page, recognize its characters with OCR, save the cookie after logging in, and then load that cookie into the browser to skip the login. The second is to maintain the cookie manually. Analysis shows that when a login request is sent, the request's Cookie header carries a random string, ASP.NET_SessionId=ryhrbiqalg14hgrxhmhd4ie5, and the server responds with a Set-Cookie header, CenterSoftWeb=xxxxx; expires=Sun, 23-Jan-2022 07:45:06 GMT; path=/; HttpOnly. Together these two cookies form a complete cookie that lets the login step be skipped.
- Form filling: use Selenium to operate the elements on the page and complete the sequence of steps. (You need to download a browser and its matching driver, e.g. Chrome + chromedriver.)
- Scheduled clock-in: crontab plus a shell script starts the job at a fixed time every day.
For details on Selenium's Python API, please refer to its website.
Concrete implementation
Locating the CAPTCHA for machine recognition: first run Selenium to take a screenshot of the browser in the current operating environment, then open the screenshot in Photoshop (PS) and read off the pixel coordinates of the CAPTCHA's four corners.
Once the corner positions are known, the CAPTCHA region can be cropped, saved, and sent to the OCR service for recognition.
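```python
import json

from PIL import Image
from selenium.webdriver.common.by import By

import moyunocr  # the author's OCR API client (not shown in the original)

# browser, stunumber and pwd are defined earlier in the script

# Save a screenshot of the login page
browser.save_screenshot('verification/' + stunumber + '.png')

# Fill in the student ID and password
# (Selenium 4 replaced find_element_by_* with find_element(By..., ...))
browser.find_element(By.ID, "StudentId").send_keys(stunumber)
browser.find_element(By.XPATH, "/html/body/div[1]/div[2]/div/div[2]/form/div[2]/div[2]/input").send_keys(pwd)

# Crop the CAPTCHA out of the screenshot (corner pixels measured in PS)
left, top, right, bottom = 781, 336, 870, 369
picture = Image.open('verification/' + stunumber + '.png')
picture = picture.crop((left, top, right, bottom))
picture.save('verification/' + stunumber + '-ed.png')

# Send the cropped image to the OCR API; yzm_code is the recognized CAPTCHA text
yzm_code = moyunocr.getYzm('verification/' + stunumber + '-ed.png')
# (Entering yzm_code into the CAPTCHA field and submitting the login form
#  are not shown in the original.)

# Get the cookies and write them to a file as JSON
dictCookies = browser.get_cookies()
jsonCookies = json.dumps(dictCookies)
with open('cookies/' + stunumber + '_cookie.txt', 'w') as f:
    f.write(jsonCookies)
```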
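Measuring the corners by hand in PS ties the coordinates to one window size and resolution. An alternative, not the author's method, is to compute the crop box from the CAPTCHA element's `location` and `size` properties, which Selenium's WebElement exposes as dicts. The sketch below is a pure helper under that assumption; `captcha_box` and its `scale` parameter (an assumed device pixel ratio of the screenshot) are hypothetical names.

```python
def captcha_box(location, size, scale=1.0):
    """Return a (left, top, right, bottom) box suitable for PIL's Image.crop.

    location: dict shaped like WebElement.location, e.g. {'x': 781, 'y': 336}
    size:     dict shaped like WebElement.size, e.g. {'width': 89, 'height': 33}
    scale:    assumed device pixel ratio of the screenshot (usually 1.0 headless)
    """
    left = round(location['x'] * scale)
    top = round(location['y'] * scale)
    right = round((location['x'] + size['width']) * scale)
    bottom = round((location['y'] + size['height']) * scale)
    return (left, top, right, bottom)


# Values chosen to reproduce the hand-measured box used above
print(captcha_box({'x': 781, 'y': 336}, {'width': 89, 'height': 33}))  # → (781, 336, 870, 369)
```

With a live browser this would be used as `picture.crop(captcha_box(elem.location, elem.size))`, where `elem` is the CAPTCHA image element found with `find_element`; its locator is not given in the original. Selenium also offers `WebElement.screenshot(filename)`, which saves only that element.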
Contents of the stored cookie file
PS: the cookies obtained from Selenium are serialized to text with the json library, so the format differs slightly from the standard cookie format.
```json
[
  {
    "domain": "dxg.cxxu.edu.cn",
    "expiry": 1615705350,
    "httpOnly": true,
    "name": "CenterSoftWeb",
    "path": "/",
    "secure": false,
    "value": "08CBB2BBDCB3E268D3EE9D0C8046B4C979910509ACF95A1F302336B2826DCAD9CE60311213F70E114CAF84ED18695CB3C6260D17B502F8D42ADC8966B6886C5C78D8B479AC5F152E8CC526E9517C1CF3783D3462632A5EDC0104C19C58AFCE9B020937BD766C253F8A327AE72F11F5ECD4F91E72021B3ECC9C7232BFBCAEF0BA38ACD4246EA9CD293CD7ADFD1BC682DF7FEB194CB8E433804D93B3CF00B299E55DA5530BF7D32523977316BAF9486DBAB85221C5348E587F05220BC9A44EE8901C33ADD15097D87E8305C0FEF8A4A3B9D2B358172EA280D622DE78B53CAE5EE88379B2600F38740BD2D33CFC0192CCD9B2DCE5C8BE4124A4466623C8A96B2499B25C16C9EA128D19C88BA0FD8BC4746F"
  },
  {
    "domain": "dxg.cxxu.edu.cn",
    "httpOnly": true,
    "name": "ASP.NET_SessionId",
    "path": "/",
    "secure": false,
    "value": "qe45ulzkfahnqjvvexj45kcr"
  }
]
```
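Because the saved file is Selenium's JSON list rather than a raw header, it can help to turn it back into a `Cookie` request header, for example to replay the session with a plain HTTP client. A minimal sketch; `cookie_header_from_selenium` is a hypothetical helper, and the cookie values in the example are shortened placeholders.

```python
import json


def cookie_header_from_selenium(json_text):
    """Build a Cookie request header value from Selenium's get_cookies() JSON."""
    cookies = json.loads(json_text)
    # Only name=value pairs go into the request header; domain, path, expiry,
    # httpOnly and secure tell the browser *whether* to send the cookie.
    return "; ".join(c["name"] + "=" + c["value"] for c in cookies)


# Shortened placeholder values, same structure as the saved file
saved = '[{"name": "CenterSoftWeb", "value": "08CB"}, {"name": "ASP.NET_SessionId", "value": "qe45"}]'
print(cookie_header_from_selenium(saved))  # → CenterSoftWeb=08CB; ASP.NET_SessionId=qe45
```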
What is a Cookie
Because HTTP itself is stateless, the server cannot tell whether separate requests come from the same user, so identifying state information is placed in the header of every request. Two common authentication mechanisms follow from this, cookies and tokens; both can simply be thought of as an identifier.
The cookie is carried in the Cookie request header when a request is sent.
Format:
The Cookie header is a list of name=value pairs separated by '; ', for example: PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1
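The name=value format can be split with a few lines of standard-library Python. This sketch assumes well-formed input (no quoted values containing ';'); `parse_cookie_header` is an illustrative name, and Python's `http.cookies.SimpleCookie` provides a more complete parser.

```python
def parse_cookie_header(header):
    """Split a Cookie request header into a {name: value} dict."""
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';'
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


print(parse_cookie_header("PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1"))
# → {'PHPSESSID': '298zf09hf012fh2', 'csrftoken': 'u32t4o3tb3gg43', '_gat': '1'}
```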
What is Set-Cookie
The Set-Cookie HTTP response header sends a cookie from the server to the user agent so that the user agent can send it back to the server later. To send multiple cookies, the server sends multiple Set-Cookie headers in the same response.
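Python's standard `http.cookies.SimpleCookie` can illustrate the one-header-per-cookie rule: `output()` emits a separate `Set-Cookie:` line for each cookie, joined by CRLF. The cookie names mirror the ones in this post; the values are placeholders.

```python
from http.cookies import SimpleCookie

jar = SimpleCookie()
jar["ASP.NET_SessionId"] = "placeholder-session-id"
jar["CenterSoftWeb"] = "placeholder-token"
# Attributes are set on the individual cookie (Morsel)
jar["CenterSoftWeb"]["path"] = "/"
jar["CenterSoftWeb"]["httponly"] = True

# output() returns one "Set-Cookie: ..." line per cookie, separated by CRLF
headers = jar.output()
for line in headers.split("\r\n"):
    print(line)
```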
Attributes
Attribute | Description | Required |
---|---|---|
cookie-name=cookie-value | The name is user-defined but must be unique; the value is whatever the business logic needs to store | yes |
Expires=date | 1. Maximum lifetime of the cookie. 2. If not set, the cookie is a session cookie. 3. The expiry time is relative to the client that set the cookie; once it passes, the cookie is deleted | no |
Max-Age=number | Similar to Expires (a lifetime in seconds), but takes precedence over Expires | no |
Domain=domain-value | 1. Defines the host(s) that receive the cookie. 2. A value such as .example.com includes subdomains, though the leading dot may be ignored | no |
Path=path-value | A path that must exist in the requested URL for the cookie to be sent | no |
Secure | The cookie is sent to the server only over HTTPS | no |
HttpOnly | Forbids JavaScript from accessing the cookie | no |
SameSite=samesite-value | Controls whether the cookie is sent with cross-site requests. Values: Strict (never sent cross-site), Lax (sent when the user navigates to the original site from an external site; the default), None (sent cross-site; Secure must also be set) | no |
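To see how these attributes map onto a real header, the Set-Cookie value quoted in the login analysis can be parsed with `http.cookies.SimpleCookie`. Attribute names are case-insensitive and come back lowercased; `xxxxx` is the placeholder value from the original text.

```python
from http.cookies import SimpleCookie

set_cookie = "CenterSoftWeb=xxxxx; expires=Sun, 23-Jan-2022 07:45:06 GMT; path=/; HttpOnly"

jar = SimpleCookie()
jar.load(set_cookie)
morsel = jar["CenterSoftWeb"]

print(morsel.value)            # → xxxxx
print(morsel["expires"])       # → Sun, 23-Jan-2022 07:45:06 GMT (kept as a string)
print(morsel["path"])          # → /
print(morsel["httponly"])      # → True (flag attributes become True when present)
print(repr(morsel["secure"]))  # → '' (absent attributes stay empty)
```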
Sending requests with the saved cookies
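```python
import json

# Load the cookies saved after the OCR login
with open('cookies/' + stunumber + '_cookie.txt', 'r', encoding='utf8') as f:
    listCookies = json.loads(f.read())

# Add the cookies to the browser. add_cookie only succeeds once the browser
# has already opened a page on the cookie's domain.
for cookie in listCookies:
    cookie_dict = {
        'domain': '.mnnu.edu.cn',
        'name': cookie.get('name'),
        'value': cookie.get('value'),
        'path': '/',
        'httpOnly': False,
        'secure': False,  # WebDriver expects lowercase 'secure'
    }
    browser.add_cookie(cookie_dict)

# Reload so the page is requested with the injected cookies
browser.refresh()
```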
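Rather than rebuilding each cookie by hand, one option is to keep only the fields defined for the W3C WebDriver "Add Cookie" command (name, value, path, domain, secure, httpOnly, expiry, sameSite) and drop everything else. This is a sketch, not the author's code; `to_webdriver_cookie` is a hypothetical helper, and the `HostOnly` key in the example is included only to show that unknown keys are removed.

```python
import json

# Cookie fields defined by the W3C WebDriver "Add Cookie" command
WEBDRIVER_COOKIE_KEYS = {"name", "value", "path", "domain", "secure", "httpOnly", "expiry", "sameSite"}


def to_webdriver_cookie(cookie):
    """Keep only keys WebDriver understands, dropping empty ('' or None) values."""
    return {k: v for k, v in cookie.items()
            if k in WEBDRIVER_COOKIE_KEYS and v not in ("", None)}


saved = json.loads('[{"domain": "dxg.cxxu.edu.cn", "httpOnly": true, "name": "ASP.NET_SessionId", '
                   '"path": "/", "secure": false, "value": "qe45ulzkfahnqjvvexj45kcr", "HostOnly": false}]')
cleaned = [to_webdriver_cookie(c) for c in saved]
print(cleaned)
```

In the script this would become `for c in listCookies: browser.add_cookie(to_webdriver_cookie(c))`, which also keeps each cookie's own domain instead of a hard-coded one.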
Selenium configuration under Linux
Prerequisite: a Linux server usually has no graphical display, so extra options must be specified when initializing the browser.
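```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_options = webdriver.ChromeOptions()
# Headless mode: run Chrome without a GUI
chrome_options.add_argument('--headless')
# Disable GPU acceleration
chrome_options.add_argument('--disable-gpu')
# Selenium 4 takes the driver path via a Service object and the options via options=
browser = webdriver.Chrome(service=Service('/root/chromedriver'), options=chrome_options)
# Global implicit wait time, in seconds
browser.implicitly_wait(10)
```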
Script for monitoring cookie expiration
For each cookie file in the specified folder, compute the interval between the file's last modification time and the current time. When the interval exceeds 6 days, send an email reminder to update the cookie.
```python
import logging
import os
import time

SIX_DAY_SECONDS = 6 * 24 * 60 * 60
SEVEN_DAY_SECONDS = 7 * 24 * 60 * 60


def dateFormat(timestamp):
    # Helper used by the original code: format a Unix timestamp as local time
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp))


def daysDiff(nowStamp, lastStamp):
    # Helper used by the original code: difference between timestamps, in seconds
    return nowStamp - lastStamp


def listFile(rootpath):
    todayStamp = time.time()
    todayTime = dateFormat(todayStamp)
    logging.info("===========" + todayTime + "====START====")
    email_msg = ""
    for fileName in os.listdir(rootpath):
        fullpath = os.path.join(rootpath, fileName)
        # Only look at regular files
        if not os.path.isfile(fullpath):
            continue
        # Cookie files are named <stunumber>_cookie.txt
        stunumber = fileName.split('_')[0]
        # Seconds since the file was last modified
        lastTimeStamp = os.path.getmtime(fullpath)
        diffSeconds = daysDiff(todayStamp, lastTimeStamp)
        remainingHours = int((SEVEN_DAY_SECONDS - diffSeconds) // 3600)
        if diffSeconds >= SEVEN_DAY_SECONDS:
            logging.info(stunumber + ": your cookie has expired")
        elif diffSeconds >= SIX_DAY_SECONDS:
            msg = stunumber + ": your cookie is about to expire. Valid time remaining: " + str(remainingHours) + " hours"
            logging.info(msg)
            # Collect the text for the email reminder
            email_msg += msg + "\n"
        else:
            logging.info(stunumber + ": your cookie's valid time remaining: " + str(remainingHours) + " hours")
    # The caller sends email_msg as the reminder email
    return email_msg
```
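File modification time only approximates a cookie's age and assumes a 7-day lifetime. Since Selenium already stores an `expiry` Unix timestamp for persistent cookies in the saved JSON, another option, not the author's, is to read it directly. A sketch under that assumption; `hours_until_expiry` is a hypothetical name, and session cookies without `expiry` are skipped.

```python
import json
import time


def hours_until_expiry(cookie_json, now=None):
    """Hours until the earliest-expiring persistent cookie in a saved file.

    Returns None when no cookie has an 'expiry' field (session cookies only).
    A negative result means the cookie has already expired.
    """
    if now is None:
        now = time.time()
    expiries = [c["expiry"] for c in json.loads(cookie_json) if "expiry" in c]
    if not expiries:
        return None
    return (min(expiries) - now) / 3600


saved = ('[{"name": "CenterSoftWeb", "value": "x", "expiry": 1615705350}, '
         '{"name": "ASP.NET_SessionId", "value": "y"}]')
# Pretend "now" is exactly one day before the stored expiry
print(hours_until_expiry(saved, now=1615705350 - 24 * 3600))  # → 24.0
```

Sending the reminder when the result drops below 24 hours matches the original's threshold of 6 out of 7 days.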