Cookie usage with urllib:
If you already know the cookie, or have obtained it by capturing packets, you can log in directly by putting it in the request headers.
On JD (Jingdong), the cookie sent before login is different from the one sent after login.
So log in to JD first, capture the cookie, and then reuse it to visit any page on the site.
    import urllib.request

    url = "http://www.jd.com"
    header = {
        "user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
        "cookie": "xxxxx",
    }
    req = urllib.request.Request(url=url, headers=header)
    res = urllib.request.urlopen(req)
    text = res.read()
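To check what will actually be sent before touching the network, the request can be built and inspected on its own. This is a minimal sketch: `build_request` is a hypothetical helper, and the cookie string is a placeholder rather than a real JD cookie.

```python
import urllib.request


def build_request(url, cookie, user_agent="Mozilla/5.0"):
    """Return a Request carrying a pre-obtained cookie string (hypothetical helper)."""
    headers = {"User-Agent": user_agent, "Cookie": cookie}
    return urllib.request.Request(url=url, headers=headers)


# Placeholder cookie value; substitute the string captured from the browser.
req = build_request("http://www.jd.com", "name1=value1; name2=value2")

# Constructing a Request does not open a connection, so this is safe to run offline.
print(req.get_header("Cookie"))
```

Note that `Request` stores header names in capitalized form, so the user agent is looked up as `req.get_header("User-agent")`.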
urllib's cookie-related modules
In Python 2 the cookie module is imported with: import cookielib
In Python 3 the cookie module is imported with: import http.cookiejar
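If the same script has to run under both versions, one common pattern is to try the Python 3 import first and fall back to the Python 2 name. The alias `cookiejar` below is just a name chosen for this sketch.

```python
try:
    import http.cookiejar as cookiejar  # Python 3
except ImportError:
    import cookielib as cookiejar  # Python 2

# Either module exposes the same CookieJar class.
jar = cookiejar.CookieJar()
print(len(jar))  # a freshly created jar holds no cookies
```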
The concept of opener
When you fetch a URL you use an opener (an instance of urllib.request.OpenerDirector in Python 3, urllib2.OpenerDirector in Python 2). So far we have always used the default opener through urlopen.
urlopen can be understood as a special, preconfigured opener. The parameters we have passed to it are only url, data and timeout.
If we need to handle cookies, this default opener is not enough, so we have to build a more general opener that has cookie support.
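As a rough sketch of what "a more general opener" means, the snippet below builds one with a cookie handler attached and inspects it without making any request. The variable names are illustrative only.

```python
import http.cookiejar
import urllib.request

# The jar stores cookies; the processor reads and writes it on every request.
jar = http.cookiejar.CookieJar()
cookie_handler = urllib.request.HTTPCookieProcessor(jar)

# build_opener returns an OpenerDirector with the default handlers plus ours.
opener = urllib.request.build_opener(cookie_handler)

has_cookie_handler = any(
    isinstance(h, urllib.request.HTTPCookieProcessor) for h in opener.handlers
)
print(type(opener).__name__, has_cookie_handler)
```

Requests sent with `opener.open(...)` then share the same jar. Calling `urllib.request.install_opener(opener)` would additionally make plain `urlopen` use this opener.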
Printing the cookie object to the terminal
    import urllib.request
    import http.cookiejar

    url = "http://www.hao123.com"
    req = urllib.request.Request(url)
    cookiejar = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(cookiejar)
    opener = urllib.request.build_opener(handler)
    r = opener.open(req)
    print(cookiejar)

Output:

    <CookieJar[<Cookie BAIDUID=93B415355E0704B2BC94B5D514468898:FG=1 for .hao123.com/>, <Cookie hz=0 for .www.hao123.com/>, <Cookie ft=1 for www.hao123.com/>, <Cookie v_pg=normal for www.hao123.com/>]>
Saving cookies to a file:
    import urllib.request
    import http.cookiejar

    url = "http://www.hao123.com"
    req = urllib.request.Request(url)
    cookieFileName = "cookie.txt"
    cookiejar = http.cookiejar.MozillaCookieJar(cookieFileName)  # file-backed cookie jar
    handler = urllib.request.HTTPCookieProcessor(cookiejar)
    opener = urllib.request.build_opener(handler)
    r = opener.open(req)
    print(cookiejar)
    # Saved in the file cookie.txt. Session cookies are skipped
    # unless save() is called with ignore_discard=True.
    cookiejar.save()

MozillaCookieJar inherits from FileCookieJar, which inherits from CookieJar.

Reading cookie information from a file and using it for access:

    import urllib.request
    import http.cookiejar

    cookie_filename = 'cookie.txt'
    cookie = http.cookiejar.MozillaCookieJar(cookie_filename)
    cookie.load(cookie_filename, ignore_discard=True, ignore_expires=True)
    print(cookie)

    url = "http://www.hao123.com"
    req = urllib.request.Request(url)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    # Create an opener with urllib.request.build_opener
    opener = urllib.request.build_opener(handler)
    response = opener.open(req)
    # Decode the bytes to avoid garbled output
    print(response.read().decode("utf-8"))
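The save and load steps can also be tried without any network access. The sketch below is only an illustration: `make_cookie` is a hypothetical helper, the cookie name, value, domain and expiry are made up, and the file goes into a temporary directory instead of the working directory.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    """Build a persistent cookie by hand (hypothetical helper for this demo)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False,
        expires=2000000000,  # made-up expiry far in the future (Unix time)
        discard=False,       # not a session cookie, so save() keeps it
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookie.txt")

# Write one cookie to disk in the Mozilla/Netscape cookies.txt format.
jar = http.cookiejar.MozillaCookieJar(path)
jar.set_cookie(make_cookie("demo", "1", ".example.com"))
jar.save()

# Read it back with a fresh jar; load() falls back to the filename given above.
loaded = http.cookiejar.MozillaCookieJar(path)
loaded.load()
names = [c.name for c in loaded]
print(names)

# The inheritance chain: MozillaCookieJar -> FileCookieJar -> CookieJar.
print(issubclass(http.cookiejar.MozillaCookieJar, http.cookiejar.FileCookieJar))
print(issubclass(http.cookiejar.FileCookieJar, http.cookiejar.CookieJar))
```

A jar loaded this way can be passed to `urllib.request.HTTPCookieProcessor` exactly like the one read from cookie.txt.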