Request methods, part 2 - the requests module

Posted by jerryroy on Sun, 16 Jan 2022 12:55:06 +0100

requests is a simpler request library than urllib: the steps needed to complete a request are short, which speeds up development.

Installing the module

pip install requests

GET request

Use get() to send a GET request. Similarly, there are post(), put(), delete(), head(), and options() for the other request types.

import requests

url = "http://www.baidu.com"
response = requests.get(url)
print(response.content.decode('utf-8'))
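Each of those helpers maps to one HTTP method. Without touching the network, you can see what requests would send by preparing the request yourself (httpbin.org here is just a placeholder target):

```python
from requests import Request

# Build (but don't send) one request per HTTP verb and inspect the method used.
for method in ("GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"):
    prepared = Request(method, "http://httpbin.org/anything").prepare()
    print(prepared.method, prepared.url)
```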

Adding request headers using headers

With urllib we had to build a Request object just to add request headers; here we simply pass a headers parameter, and cookies no longer require a CookieJar. Very convenient, isn't it?

url = "https://www.baidu.com/s?"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36",
    "cookie": "cookie"
}
response = requests.get(url, headers=headers)
print(response.content.decode('utf-8'))
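You can confirm the header really ends up on the outgoing request by preparing it locally; the User-Agent string below is just an illustrative value:

```python
from requests import Request

headers = {"User-Agent": "my-spider/1.0"}  # hypothetical UA string for illustration
req = Request("GET", "https://www.baidu.com/s", headers=headers).prepare()
print(req.headers["User-Agent"])
```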

Passing URL parameters using params

When we need to pass data in the URL's query string, it goes after the URL as key-value pairs following a question mark. The get() function has a params parameter for exactly this, so we don't need to build the URL ourselves.

Looking at Baidu's search URL, the query parameter key is wd.

import requests

url = "https://www.baidu.com/s"
params = {
    "wd": "Ultraman"
}
response = requests.get(url=url, params=params)
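To see exactly what URL requests builds from params, you can prepare the URL without sending anything:

```python
from requests.models import PreparedRequest

# PreparedRequest is what requests builds internally before sending.
req = PreparedRequest()
req.prepare_url("https://www.baidu.com/s", {"wd": "Ultraman"})
print(req.url)  # the params dict became the ?wd=... query string
```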

Processing the response content

The text property decodes the response for you automatically, but it sometimes picks the wrong encoding.

The content property returns the raw binary data. For binary files such as images or video, content is exactly what you want; for text, the bytes need to be decoded with decode().

import requests

picture_url = 'https://img-home.csdnimg.cn/images/20210129020554.jpg'
url = 'https://www.csdn.net/'

response1 = requests.get(picture_url)
with open("1.jpg", "wb") as f:
    f.write(response1.content)

response2 = requests.get(url)
print(response2.encoding) # the auto-detected encoding; it can be reassigned before reading .text
print(response2.text)
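The relation between content, encoding, and text can be shown with a Response object built by hand (the _content attribute is internal to requests and is set here only for illustration; normally requests fills it in for you):

```python
import requests

resp = requests.models.Response()
resp._content = "hello requests".encode("utf-8")  # what .content would hold
resp.encoding = "utf-8"                           # what .text decodes with
print(resp.content)  # raw bytes
print(resp.text)     # decoded str
```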

POST request

The data passed in a POST request is also wrapped in a dictionary, but no manual encoding is required.

import requests

def spider(url):
    data = {"form": "data", "key": "value"}
    response = requests.post(url, data=data)
    print(response.text)

if __name__ == "__main__":
    url = "http://httpbin.org/post"
    spider(url)
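The dict really is form-encoded for you; preparing the request locally shows the body and the Content-Type header that requests sets:

```python
from requests import Request

# Prepare (without sending) a POST carrying a form dict.
req = Request("POST", "http://httpbin.org/post", data={"key": "value"}).prepare()
print(req.body)                     # key=value
print(req.headers["Content-Type"])  # application/x-www-form-urlencoded
```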

Session maintenance

This works much like http.cookiejar: a Session automatically manages cookies and maintains the current session. For example, after a simulated login, the login state is saved in a cookie; if we use a Session, that cookie is sent automatically on the next request, so we don't have to handle cookies manually.

import requests

session = requests.Session()
session.get("http://httpbin.org/cookies/set/sessioncookie/123456789")
response = session.get("http://httpbin.org/cookies")
print(response.text)
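The cookie persistence lives in session.cookies, a jar you can also inspect or pre-populate directly without any network round trip:

```python
import requests

session = requests.Session()
# Set the same cookie the httpbin endpoint above would set on the session's jar.
session.cookies.set("sessioncookie", "123456789")
print(session.cookies.get("sessioncookie"))
```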

Proxies

The proxies parameter passes the proxy configuration, also wrapped in a dictionary: the key is the request protocol (http/https), and the value is the proxy address plus port.

import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50"
}
response = requests.get("http://python.org", headers=headers, proxies=proxies)
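If every request should go through the proxy, the mapping can be attached once to a Session instead of being passed on each call (the addresses are the same placeholder values as above):

```python
import requests

session = requests.Session()
# Session-level proxies apply to all requests made through this session.
session.proxies.update({
    "http": "http://10.10.1.10:3128",   # placeholder proxy address
    "https": "http://10.10.1.10:1080",
})
print(session.proxies["http"])
```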

Other

  • status_code

    import requests
    
    res = requests.get("https://www.baidu.com")
    print(res.status_code)
    
  • timeout setting

    import requests
    
    response = requests.get("http://www.baidu.com", timeout=0.5)
    print(response)
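Note that timeout is a number of seconds. When it expires, requests raises requests.exceptions.Timeout rather than returning; a small wrapper (a sketch, not part of the original post) makes that explicit:

```python
import requests

def fetch(url, seconds=0.5):
    """Return the status code, or None if the request times out."""
    try:
        return requests.get(url, timeout=seconds).status_code
    except requests.exceptions.Timeout:
        return None
```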
    

Topics: Python