The urllib module in Python's standard library already provides most of the functionality we usually need, but its API is awkward to use. Requests advertises itself as "HTTP for Humans", which signals that it is more concise and convenient to use.
1, Installation and documentation links
Requests can be installed easily with pip:
pip install requests
Chinese documentation: http://docs.python-requests.org/zh_CN/latest/index.html
GitHub repository: https://github.com/requests/requests
2, Send GET request
The simplest way to send a GET request is to call requests.get:
import requests

response = requests.get("https://www.baidu.com/")
Add headers and query parameters
To add headers, pass a dictionary through the headers parameter; its entries are added to the request headers. To pass parameters in the URL, use the params parameter. Example code:
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER"
}
data = {
    "kw": "China"
}
url = "https://www.baidu.com/s"

# params accepts query parameters as a dict or a string; a dict is URL-encoded
# automatically, so there is no need to call urlencode() yourself
response = requests.get(url=url, params=data, headers=headers)

# View the status code
print("status_code: {}".format(response.status_code))  # status_code: 200
# View the encoding used to decode response.text
print("encoding: {}".format(response.encoding))  # encoding: utf-8
# View the final URL of the response (after any redirects)
print("url: {}".format(response.url))  # url: https://www.baidu.com/
# response.content is the raw body as bytes
print("content: {}".format(response.content))
# response.text is the decoded body; requests decodes most Unicode charsets seamlessly
print("text: {}".format(response.text))
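To see what params does to the URL, here is a minimal standard-library sketch of the same encoding step using urllib.parse.urlencode. It is only an illustration: build_url is a hypothetical helper, and requests itself also handles cases this skips, such as a base URL that already carries a query string.

```python
from urllib.parse import urlencode


def build_url(base: str, params: dict) -> str:
    """Hypothetical helper: append a dict of query parameters to a URL.

    urlencode() percent-encodes keys and values (as UTF-8 by default),
    which is the step requests performs for you when you pass params=.
    """
    if not params:
        return base
    return base + "?" + urlencode(params)


print(build_url("https://www.baidu.com/s", {"kw": "China"}))
# https://www.baidu.com/s?kw=China
print(build_url("https://www.baidu.com/s", {"kw": "中国"}))
# https://www.baidu.com/s?kw=%E4%B8%AD%E5%9B%BD
```

Non-ASCII values such as Chinese search terms are the main reason not to build the query string by hand.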
Difference between response.text and response.content
1, response.content: the data received directly from the network, without any decoding, so its type is bytes. Data stored on disk and transmitted over the network is always bytes.
2, response.text: a str, produced by decoding response.content. Decoding requires an encoding, and requests infers one on its own, so the inference is sometimes wrong and the decoded text comes out garbled. In that case, decode the bytes yourself with response.content.decode("utf-8").
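The garbled-text problem can be reproduced without any network access. In this sketch a hard-coded byte string stands in for response.content, and ISO-8859-1 is used as the "wrong guess" purely for illustration.

```python
# Stand-in for response.content: the raw bytes of a UTF-8 encoded body
raw = "中国".encode("utf-8")

# Decoding with the wrong encoding (ISO-8859-1 here) does not raise an error,
# it silently produces garbled text
garbled = raw.decode("iso-8859-1")

# Decoding explicitly with the right encoding, as in response.content.decode("utf-8")
correct = raw.decode("utf-8")

print(type(raw).__name__, type(correct).__name__)  # bytes str
print(garbled == "中国", correct == "中国")  # False True
```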
3, Send POST request
The most basic POST request uses the post method:
response = requests.post("http://www.baidu.com/",data=data)
Passing data
There is no need to encode the data with urlencode; just pass a dictionary. For example, the following code requests position data from the job site Lagou (lagou.com):
import requests

url = "https://www.lagou.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false"
headers = {
    "Referer": "https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER",
    # Single quotes keep the empty quoted values (unick="", _putrc="") intact
    "Cookie": 'LGUID=20161229121751-c39adc5c-cd7d-11e6-8409-5254005c3644; user_trace_token=20210531172956-5c418892-9a48-4fa1-8377-2c4f7a20b741; LG_HAS_LOGIN=1; hasDeliver=0; privacyPolicyPopup=false; WEBTJ-ID=20210531185241-179c20de542206-0e1348daf26807-2b6f686a-1049088-179c20de543150; RECOMMEND_TIP=true; __lg_stoken__=677cc1b348553c3ed5e9cbb7b390a2ff300eb24fefe8a8e97e42e2872fc9543fba2800c9390bbd1d173c49e0c0362f67288bd32b2db49b0ed2db58d21a0b452d975350e4ed22; index_location_city=%E5%B9%BF%E5%B7%9E; login=false; unick=""; _putrc=""; JSESSIONID=ABAAAECABIEACCAAF01E6707FDF7DE8820405BA09C6C439; PRE_UTM=; PRE_HOST=; PRE_SITE=; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; X_HTTP_TOKEN=34e72e60c648e0f923883522611a51a83da2b43601; sensorsdata2015session=%7B%7D; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2221787087%22%2C%22first_id%22%3A%22179c20de60575-03879bc46582c2-2b6f686a-1049088-179c20de606146%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24os%22%3A%22Windows%22%2C%22%24browser%22%3A%22Chrome%22%2C%22%24browser_version%22%3A%2257.0.2987.98%22%7D%2C%22%24device_id%22%3A%22179c20de60575-03879bc46582c2-2b6f686a-1049088-179c20de606146%22%7D; _gat=1; _ga=GA1.2.2125950788.1622453397; _gid=GA1.2.1174524155.1622458625; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1622453401,1622453584,1622458348,1622460836; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1622538830; TG-TRACK-CODE=index_search; LGSID=20210601171352-cf8ce267-baa7-496a-85b4-d5c3239d2f39; LGRID=20210601171402-1ac136e6-0d19-488f-9639-d1ad2bb75fe7; SEARCH_ID=6a4a97b38c434d13830a64516514c7e3'
}
data = {
    "first": "true",
    "pn": 1,
    "kd": "python"
}
response = requests.post(url=url, data=data, headers=headers)
print(response.text)
print(response.json())  # Call the built-in JSON decoder to parse the data
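When a dict is passed as data=, requests form-encodes it into the request body and sets the Content-Type header to application/x-www-form-urlencoded. The standard-library sketch below only shows what that body looks like for the data dict above; it sends nothing.

```python
from urllib.parse import urlencode

data = {"first": "true", "pn": 1, "kd": "python"}

# A dict passed as data= is sent form-encoded; urlencode() produces the same
# key=value&key=value format (non-string values such as 1 are converted with str())
body = urlencode(data)
content_type = "application/x-www-form-urlencoded"

print(content_type)  # application/x-www-form-urlencoded
print(body)  # first=true&pn=1&kd=python
```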
Sending a POST request is simple: just call requests.post.
If the server returns JSON, call response.json() to convert the JSON string into a dictionary. If JSON decoding fails, response.json() raises an exception.
To check whether the request succeeded, call response.raise_for_status() or check whether response.status_code matches what you expect.
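The two checks above can be sketched with the standard library alone. check_status and parse_json are hypothetical helpers, not part of requests: check_status copies the rule raise_for_status() applies (4xx codes are client errors, 5xx codes are server errors), and parse_json uses json.loads, whose JSONDecodeError plays the role of the exception response.json() raises on invalid JSON. The sample JSON string is made up for illustration.

```python
import json


def check_status(status_code: int) -> None:
    """Hypothetical stand-in for response.raise_for_status().

    requests raises requests.exceptions.HTTPError for 4xx and 5xx codes;
    RuntimeError is used here so the sketch needs only the standard library.
    """
    if 400 <= status_code < 500:
        raise RuntimeError(f"{status_code} Client Error")
    if 500 <= status_code < 600:
        raise RuntimeError(f"{status_code} Server Error")


def parse_json(text: str):
    """Parse a JSON body as response.json() does; invalid JSON raises an error."""
    return json.loads(text)


check_status(200)  # no exception for a successful response
print(parse_json('{"success": true, "pageNo": 1}'))  # {'success': True, 'pageNo': 1}

try:
    parse_json("<html>not json</html>")
except json.JSONDecodeError as exc:
    print("decode failed:", exc.msg)
```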
4, Using a proxy
Adding a proxy with requests is also simple: pass the proxies parameter to the request method (such as get or post). The example code is as follows:
import requests

url = "http://httpbin.org/ip"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER",
}
# The key is the URL scheme the proxy applies to: this one is used for http:// URLs
proxy = {
    "http": "42.193.23.248:16817"
}
resp = requests.get(url=url, headers=headers, proxies=proxy)
print(resp.text)
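The keys of the proxies dict are matched against the scheme of the request URL, so the dict above only affects http:// URLs; to proxy https:// URLs too, you would add an "https" key. A minimal sketch of that lookup, with pick_proxy as a hypothetical, simplified helper:

```python
from typing import Optional
from urllib.parse import urlparse


def pick_proxy(url: str, proxies: dict) -> Optional[str]:
    """Hypothetical, simplified helper: choose the proxy whose key is the URL's scheme.

    requests' real lookup also tries more specific keys such as "http://host"
    and the catch-all "all"; this sketch only checks the scheme.
    """
    scheme = urlparse(url).scheme
    return proxies.get(scheme)


proxy = {"http": "42.193.23.248:16817"}
print(pick_proxy("http://httpbin.org/ip", proxy))   # 42.193.23.248:16817
print(pick_proxy("https://httpbin.org/ip", proxy))  # None: no "https" key, so no proxy
```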
5, Cookies
If a response includes cookies, you can read them through the cookies attribute:
import requests

url = "http://www.baidu.com"
resp = requests.get(url)
print("cookies: {}".format(resp.cookies))
print("cookies_dict: {}".format(resp.cookies.get_dict()))
"""
result:
cookies: <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
cookies_dict: {'BDORZ': '27315'}
"""
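requests parses the server's Set-Cookie headers into the RequestsCookieJar for you (internally it builds on http.cookiejar). As a rough standard-library illustration of the name/value pairs that get_dict() returns, http.cookies.SimpleCookie can turn a Set-Cookie value into a plain dict. cookies_to_dict is a hypothetical helper, and the header value is an assumed example modelled on the BDORZ cookie in the output above.

```python
from http.cookies import SimpleCookie


def cookies_to_dict(set_cookie_value: str) -> dict:
    """Hypothetical helper: parse a Set-Cookie value into {name: value}."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Attributes such as max-age, domain and path are stored on each Morsel,
    # so only the cookie names and values end up in the dict
    return {name: morsel.value for name, morsel in jar.items()}


# Assumed header value, for illustration only
header = "BDORZ=27315; max-age=86400; domain=.baidu.com; path=/"
print(cookies_to_dict(header))  # {'BDORZ': '27315'}
```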
6, session
A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance.
A Session also uses connection pooling, so if you send several requests to the same host, the underlying TCP connection is reused, which can give a significant performance improvement (see HTTP persistent connection).
With the urllib library, you could use an opener to send multiple requests that share cookies. To share cookies with requests, use the Session object the library provides. Note that this session is not the session concept from web development; it is simply a session object. The example below uses requests to log in to a WordPress blog and then fetch its admin page:
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER"
}
data = {
    "log": "123123@qq.com",
    "pwd": "12312",
    "wp-submit": "Log In",
    "redirect_to": "http://47.106.134.xx:10086/wp-blog/wp-admin/",
    "testcookie": "1"
}
login_url = "http://47.106.134.xx:10086/wp-blog/wp-login.php"
admin_url = "http://47.106.134.xx:10086/wp-blog/wp-admin/"

session = requests.Session()
# Log in; the session stores the cookies the server sets
session.post(login_url, data=data, headers=headers)
# Open the admin page; the login cookies are sent automatically
response = session.get(admin_url)
with open("admin.html", "wb") as f:
    f.write(response.content)
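For comparison with the urllib opener approach mentioned above, here is a self-contained standard-library sketch in which an opener shares cookies through http.cookiejar. The local server, the /login and /admin paths and the sid cookie are hypothetical stand-ins for a real site; in requests, requests.Session() takes the place of the cookie-aware opener.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local site: /login sets a cookie, /admin checks for it."""

    def do_GET(self):
        if self.path == "/login":
            body = b"logged in"
            self.send_response(200)
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
        elif self.path == "/admin":
            logged_in = "sid=abc123" in self.headers.get("Cookie", "")
            body = b"admin page" if logged_in else b"please log in"
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_admin(share_cookies: bool) -> str:
    """Log in, then request /admin with an opener that does or does not keep cookies."""
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    # ProxyHandler({}) ignores proxy environment variables for this local demo
    handlers = [urllib.request.ProxyHandler({})]
    if share_cookies:
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        opener.open(base + "/login").read()
        return opener.open(base + "/admin").read().decode()
    finally:
        server.shutdown()
        server.server_close()


print(fetch_admin(share_cookies=True))   # admin page
print(fetch_admin(share_cookies=False))  # please log in
```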
7, Handling untrusted SSL certificates
For websites whose SSL certificate is trusted, such as https://www.baidu.com/, requests returns the response normally. For websites whose certificate is not trusted, pass verify=False to skip certificate verification. The example code is as follows:
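import requests

# verify=False skips certificate verification (urllib3 emits an InsecureRequestWarning)
resp = requests.get('https://www.12306.cn/mormhweb/', verify=False)
print(resp.content.decode('utf-8'))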
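The idea behind verify=False can also be shown with Python's ssl module. This sketch only builds two client-side TLS contexts and inspects their settings without connecting anywhere; make_context is a hypothetical helper, and it is an analogy to requests' verify flag rather than how requests configures TLS internally.

```python
import ssl


def make_context(verify: bool) -> ssl.SSLContext:
    """Hypothetical helper: build a client TLS context with or without certificate checks."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames by default
    if not verify:
        ctx.check_hostname = False  # must be turned off before verify_mode can be CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE  # accept any certificate (insecure)
    return ctx


strict = make_context(True)
lax = make_context(False)
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)  # True True
print(lax.verify_mode == ssl.CERT_NONE, lax.check_hostname)  # True False
```

Such a context could be passed to urllib.request.urlopen(url, context=lax); with requests, the verify argument plays this role.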