requests is a much simpler HTTP library than urllib: completing a request takes fewer steps, which speeds up development.
Installing the module

```shell
pip install requests
```
GET request
Use **get()** to send a GET request. Likewise there are **post()**, **put()**, **delete()**, **head()** and **options()** for the other request types.
```python
import requests

url = "http://www.baidu.com"
response = requests.get(url)
print(response.content.decode('utf-8'))
```
Adding request headers using headers
With urllib we had to build a Request object just to add a request header; here there is nothing to build — we simply pass a headers parameter, and cookies can go into the headers too, so no CookieJar is needed. Convenient, isn't it?
```python
import requests

url = "https://www.baidu.com/s?"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36",
    "cookie": "cookie"
}
# headers must be passed by keyword: the second positional argument of get() is params
response = requests.get(url, headers=headers)
print(response.content.decode('utf8'))
```
Passing URL parameters using params
When we need to pass data in the URL's query string, it goes after a question mark as key-value pairs. The get() function has a params parameter for exactly this, so we don't need to build the URL ourselves.
Observe Baidu's search URL: the query parameter key is wd.
```python
import requests

url = "https://www.baidu.com/s"
params = {
    "wd": "Ultraman"
}
response = requests.get(url=url, params=params)
```
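To see the URL that requests builds from params without actually sending anything, you can prepare the request first — a minimal sketch using requests.Request and prepare(), which needs no network access:

```python
import requests

# build the request object without sending it, to inspect the generated URL
req = requests.Request("GET", "https://www.baidu.com/s", params={"wd": "Ultraman"}).prepare()
print(req.url)  # https://www.baidu.com/s?wd=Ultraman
```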
Processing of response content
The text attribute decodes the body for you automatically, but the guessed encoding is sometimes wrong.
The content attribute returns the raw bytes. For binary files such as images or videos, content is exactly what you want; for text, the bytes still need to be decoded with decode().
```python
import requests

picture_url = 'https://img-home.csdnimg.cn/images/20210129020554.jpg'
url = 'https://www.csdn.net/'

response1 = requests.get(picture_url)
with open("1.jpg", "wb") as f:
    f.write(response1.content)

response2 = requests.get(url)
print(response2.encoding)  # the encoding auto-detected by the response object; it can be changed
print(response2.text)
```
POST request
The data for a POST request is also wrapped in a dictionary, but unlike with urllib, no URL encoding is required.
```python
import requests

def spider(url):
    data = {
        "form": "data",
        "key": "value"
    }
    response = requests.post(url, data=data)
    print(response.text)

if __name__ == "__main__":
    url = "http://httpbin.org/post"
    spider(url)
```
Session maintenance
This is similar to http.cookiejar: a Session manages cookies automatically and keeps the current session alive. For example, after a simulated login the login state is stored in a cookie; with a Session, later requests send that cookie by default, so we don't have to handle cookies by hand.
```python
import requests

session = requests.Session()
session.get("http://httpbin.org/cookies/set/sessioncookie/123456789")
response = session.get("http://httpbin.org/cookies")
print(response.text)
```
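Besides cookies, a Session also carries other state across requests, such as default headers. A minimal sketch that runs without any network access (the User-Agent string is a made-up example):

```python
import requests

session = requests.Session()
# headers set on the session are merged into every request it sends
session.headers.update({"User-Agent": "my-spider/1.0"})  # hypothetical UA string
print(session.headers["User-Agent"])  # my-spider/1.0
```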
Proxies
The proxies parameter passes a proxy. The proxies are likewise wrapped in a dictionary: the key is the request protocol (http/https) and the value has two parts, address + port.
```python
import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50"
}
response = requests.get("http://python.org", headers=headers, proxies=proxies)
```
Other
- status_code
```python
import requests

res = requests.get("https://www.baidu.com")
print(res.status_code)
```
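If you want a non-2xx status to raise an exception instead of checking status_code by hand, raise_for_status() does that. A sketch built on a manually constructed Response object so it runs without network access (a real script would call it on the response returned by get()):

```python
import requests

resp = requests.models.Response()
resp.status_code = 404  # pretend the server answered 404

try:
    resp.raise_for_status()  # raises HTTPError for 4xx/5xx responses
except requests.exceptions.HTTPError as e:
    print("bad status:", e)
```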
- timeout setting
```python
import requests

response = requests.get("http://www.baidu.com", timeout=0.5)
print(response)
```
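If the server does not answer within the timeout, requests raises requests.exceptions.Timeout, which you usually want to catch. A sketch using a non-routable address (10.255.255.1 is a made-up target chosen so the connection attempt actually stalls):

```python
import requests

try:
    # 10.255.255.1 is a non-routable address, so the connection attempt hangs
    response = requests.get("http://10.255.255.1", timeout=0.5)
except requests.exceptions.Timeout:
    print("request timed out after 0.5 seconds")
except requests.exceptions.RequestException as e:
    print("request failed:", e)
```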