Server
The following is a small server written with Flask. Each route prints the headers of the incoming request, and two of the three routes set a cookie in their response.
```python
# -*- coding: utf-8 -*-
from flask import Flask, make_response, request

app = Flask(__name__)


@app.route('/a1')
def a1():
    print(request.headers)          # show the headers the client sent
    rp = make_response()
    rp.set_cookie('a1', '123')      # response sets cookie a1=123
    return rp


@app.route('/a2')
def a2():
    print(request.headers)
    rp = make_response()
    # rp.set_cookie('a2', '234')    # /a2 deliberately sets no cookie
    return rp


@app.route('/a3')
def a3():
    print(request.headers)
    rp = make_response()
    rp.set_cookie('a3', '345')      # response sets cookie a3=345
    return rp


if __name__ == '__main__':
    app.run(host='0.0.0.0')         # listen on all interfaces (default port 5000)
```
Client
```python
# -*- coding: utf-8 -*-
import requests

url1 = 'http://192.168.2.159:5000/a1'
url2 = 'http://192.168.2.159:5000/a2'
url3 = 'http://192.168.2.159:5000/a3'

cookies = requests.utils.cookiejar_from_dict({'test': 'test'})
print(type(cookies), cookies)  # a RequestsCookieJar object

s = requests.Session()
s.cookies = cookies       # the cookie test=test set here is attached to every request
s.headers = {'h1': 'h1'}  # the header h1=h1 set here is attached to every request
                          # (assigning a dict replaces requests' default headers, incl. User-Agent)

# Add cookie r1=r1 and header h2=h2 for this request only;
# neither is sent on the next request
r1 = s.get(url1, cookies={'r1': 'r1'}, headers={'h2': 'h2'})
r2 = s.get(url2)

# Permanently add cookie xx=xx to the session; it is sent on all later requests
requests.utils.add_dict_to_cookiejar(s.cookies, {'xx': 'xx'})
r3 = s.get(url3)

# r1.cookies is a RequestsCookieJar; requests.utils.dict_from_cookiejar(r1.cookies)
# converts it to a dict, but plain dict() also works and is shorter to write
print(dict(r1.cookies))  # cookies set by the response to r1
print(dict(r2.cookies))  # cookies set by the response to r2
print(dict(r3.cookies))  # cookies set by the response to r3
print(dict(s.cookies))   # all cookies kept by the session (per-request ones such as r1 are excluded)
```
Start the server first, then run the client.
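If the Flask server is not available, the way a session merges its own cookies and headers with per-request ones can also be checked offline. The sketch below is an illustration, not part of the original setup: it uses `Session.prepare_request` to build requests without sending anything, and the URL `http://127.0.0.1:5000/a1` is only a placeholder.

```python
import requests

# Placeholder URL: requests are only prepared here, never sent over the network.
URL = 'http://127.0.0.1:5000/a1'

s = requests.Session()
s.cookies = requests.utils.cookiejar_from_dict({'test': 'test'})  # session-level cookie
s.headers.update({'h1': 'h1'})                                      # session-level header

# Per-request cookies/headers are merged into this one request only.
p1 = s.prepare_request(requests.Request('GET', URL,
                                        cookies={'r1': 'r1'},
                                        headers={'h2': 'h2'}))

# A later request without extras carries only the session-level values.
p2 = s.prepare_request(requests.Request('GET', URL))

# add_dict_to_cookiejar modifies s.cookies in place, so xx persists from now on.
requests.utils.add_dict_to_cookiejar(s.cookies, {'xx': 'xx'})
p3 = s.prepare_request(requests.Request('GET', URL))

print(p1.headers.get('Cookie'), p1.headers.get('h2'))
print(p2.headers.get('Cookie'), p2.headers.get('h2'))
print(p3.headers.get('Cookie'))
print(dict(s.cookies))  # r1 never entered the session's jar
```

The prepared `Cookie` header of `p1` contains both `test=test` and `r1=r1`, while `p2` only carries `test=test`, mirroring the server log of the real run.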
Results
Server output
```
192.168.2.159 - - [26/Jun/2019 17:28:00] "GET /a1 HTTP/1.1" 200 -
Host: 192.168.2.159:5000
Accept-Encoding: identity
H1: h1
H2: h2
Cookie: test=test; r1=r1

192.168.2.159 - - [26/Jun/2019 17:28:00] "GET /a2 HTTP/1.1" 200 -
Host: 192.168.2.159:5000
Accept-Encoding: identity
H1: h1
Cookie: test=test; a1=123

192.168.2.159 - - [26/Jun/2019 17:28:00] "GET /a3 HTTP/1.1" 200 -
Host: 192.168.2.159:5000
Accept-Encoding: identity
H1: h1
Cookie: test=test; xx=xx; a1=123
```
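The log above contains no User-Agent line. A likely explanation, sketched here under the assumption of a reasonably recent requests release, is that assigning a plain dict to `s.headers` discards the session's default headers, while `s.headers.update()` keeps them. As before, nothing is sent and the URL is a placeholder.

```python
import requests

URL = 'http://127.0.0.1:5000/a1'  # placeholder; requests are only prepared

s_replaced = requests.Session()
s_replaced.headers = {'h1': 'h1'}        # replaces the whole default header mapping

s_updated = requests.Session()
s_updated.headers.update({'h1': 'h1'})   # keeps requests' defaults and adds h1

p_replaced = s_replaced.prepare_request(requests.Request('GET', URL))
p_updated = s_updated.prepare_request(requests.Request('GET', URL))

print(p_replaced.headers.get('User-Agent'))  # no User-Agent left after replacement
print(p_updated.headers.get('User-Agent'))   # the default python-requests/<version>
```

Using `update()` is usually the safer pattern when only a few headers need to be added to a session.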
Client output
```
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[<Cookie test=test for />]>
{'a1': '123'}
{}
{'a3': '345'}
{'test': 'test', 'xx': 'xx', 'a1': '123', 'a3': '345'}
```
Summary and suggestions
- The server log shows no User-Agent header at all, because assigning a plain dict to s.headers replaces the default headers of requests. If s.headers is left alone (or extended with s.headers.update()), requests sends its default User-Agent, python-requests/<version> (python-requests/2.21.0 here), which does not look like a normal browser. This is one reason a crawler should set the User-Agent header explicitly.
- requests.Session() keeps all cookies received during the session, so we do not have to read the cookies from the previous response, merge them, and pass them again with the next request.
- Cookies and headers set through s.cookies and s.headers are carried on every request in the session.
- Cookies and headers passed per request, as in s.get(url1, cookies={'r1': 'r1'}, headers={'h2': 'h2'}), do not change s.cookies or s.headers. They are merged into that single request only, so r1 and h2 are not sent on the next request.
- requests.utils.add_dict_to_cookiejar(s.cookies, {'xx': 'xx'}) adds a persistent cookie xx to s. Unlike a per-request cookie, it is sent on all subsequent requests.
- r1.cookies is a RequestsCookieJar and can be converted with dict(). It contains only the cookies set in the response headers (Set-Cookie) of that request; if the response sets no new cookie, as with r2, the result is an empty dict.
- s.cookies holds all cookies accumulated during the whole session (every request sent through s), and dict(s.cookies) returns all of them.
- When reusing a session, set the shared parts in advance, such as headers, cookies, and proxies.
- If the session ends up storing the same cookie name more than once (for example for different domains or paths), dict(s.cookies) can fail, since looking up an ambiguous name raises requests.cookies.CookieConflictError. The safer way to get a dict is requests.utils.dict_from_cookiejar(s.cookies), which does not raise; when a name repeats, only one of its values is kept.
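The difference between dict() and dict_from_cookiejar can be reproduced without a server. In this sketch the cookie name `token` and the two domains are made up; storing one name under two domains is what makes a plain name lookup ambiguous.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
# Same (hypothetical) cookie name stored for two different domains.
jar.set('token', 'aaa', domain='a.example.com', path='/')
jar.set('token', 'bbb', domain='b.example.com', path='/')

try:
    as_dict = dict(jar)  # dict() looks each name up, and 'token' matches two cookies
except requests.cookies.CookieConflictError as exc:
    as_dict = None
    print('dict() failed:', exc)

# dict_from_cookiejar simply iterates the cookies; a repeated name keeps one value.
safe = requests.utils.dict_from_cookiejar(jar)
print(safe)

# To pick a specific cookie, narrow the lookup by domain (or path).
print(jar.get('token', domain='a.example.com'))
```

When the exact value matters, querying with `domain=` or `path=` is more predictable than either dict conversion.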