Want to learn Python but don't know how to install and use the requests module?

Posted by Pro Ninja on Tue, 14 Dec 2021 06:22:11 +0100

This article covers requests, an HTTP module used to send requests and obtain responses. There are alternatives, such as the urllib module, but requests is the one most used in practice. Its code is simple and easy to understand: compared with the bloated urllib module, a crawler written with requests needs less code, and implementing a feature is simpler. The requests module is therefore very important in Python, and it is well worth mastering.

requests module

Let's learn how to implement a crawler in code.

1. Introduction to the requests module

requests documentation: http://docs.python-requests.org/zh_CN/latest/index.html

**1.1 Function of the requests module:**

Send HTTP requests and get the response data

1.2 The requests module is a third-party module and must be installed in your Python (virtual) environment

pip/pip3 install requests

1.3 Sending GET requests with the requests module

    Requirement: use requests to send a request to the Baidu home page and obtain the page's source code

    Run the following code and observe the output
# 1.2.1 - simple code implementation
import requests 

# Destination url
url = 'https://www.baidu.com' 

# Send get request to target url
response = requests.get(url)

# Print response content
print(response.text)

Key point: master sending GET requests with the requests module

2. The response object

If you look at the output of the code above, you may find a lot of garbled characters. This happens because the character set used to decode the bytes differs from the one used to encode them. Try the following approach to fix the garbled Chinese text:
# 1.2.2-response.content
import requests 

# Destination url
url = 'https://www.baidu.com' 

# Send get request to target url
response = requests.get(url)

# Print response content
# print(response.text)
print(response.content.decode()) # Pay attention here!

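1. response.text is the body decoded with the character set that requests infers: it uses the charset declared in the Content-Type response header, falls back to ISO-8859-1 for text/* responses that declare no charset, and only guesses from the body itself (with chardet or charset_normalizer) when the headers give no encoding at all. A page served as text/html without a charset, as the Baidu home page may be, is therefore decoded as ISO-8859-1, which garbles Chinese text

2. Data transmitted over the network is bytes, so response.text is equivalent to response.content.decode('inferred character set')

3. You can search the page source for charset to see which character set the page declares, but note that this declaration may be inaccurate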
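To see the mechanism without a network connection, the sketch below encodes a sample Chinese string to UTF-8 bytes (standing in for response.content, on the assumption that the page is UTF-8 encoded) and decodes them with the wrong and the right character set. The sample string is chosen purely for illustration.

```python
# Stand-in for response.content: the raw bytes received over the network.
# Assumption: the page was encoded as UTF-8.
body = "百度一下，你就知道".encode("utf-8")

# Decoding with the wrong character set produces mojibake rather than an error,
# because ISO-8859-1 maps every possible byte value to some character.
garbled = body.decode("iso-8859-1")

# Decoding with the character set the bytes were actually encoded in fixes it.
fixed = body.decode("utf-8")

print(garbled)
print(fixed)  # 百度一下，你就知道
```

With a real response, the equivalent fixes are response.content.decode('utf-8'), or setting response.encoding = 'utf-8' before reading response.text; response.apparent_encoding gives the encoding requests guesses from the body content.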

2.1 The difference between response.text and response.content:

response.text

    Type: str

    Decoding: requests makes an educated guess about the encoding of the response based on the HTTP headers, i.e. the inferred text encoding

response.content

    Type: bytes

    Decoding: none; these are the raw bytes of the response body

2.2 Fixing garbled Chinese text by decoding response.content

response.content.decode() (defaults to utf-8)

response.content.decode("GBK")

Common coded character sets
  • utf-8
  • gbk
  • gb2312
  • ascii (pronounced "ASK-ee")
  • iso-8859-1
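When the right character set is unknown, one common approach is to try candidates in order and keep the first that decodes without error. The sketch below assumes that approach; decode_with_fallback is a hypothetical helper, not part of requests. It tries only utf-8, gbk and iso-8859-1 because ascii is a subset of utf-8 and gb2312 a subset of gbk, so those two would never be reached in this order. iso-8859-1 goes last because it never fails. A successful decode does not prove the guess is correct, only that the bytes are valid in that character set.

```python
# Hypothetical helper: try candidate character sets in order.
# iso-8859-1 is last because every byte is valid in it, so it is only a fallback.
CANDIDATE_ENCODINGS = ("utf-8", "gbk", "iso-8859-1")


def decode_with_fallback(data, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes data without error."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this character set, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# GBK-encoded bytes are not valid UTF-8, so the helper falls through to gbk.
text, used = decode_with_fallback("中文".encode("gbk"))
print(text, used)  # 中文 gbk
```

In practice you would pass response.content to the helper.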

2.3 Other common attributes and methods of the response object

In response = requests.get(url), response is the response object returned by sending the request. Besides text and content, which give the response body, the response object has other commonly used attributes and methods:

response.url: the URL of the response; sometimes it differs from the requested URL (for example, after a redirect)

response.status_code: the response status code

response.request.headers: the headers of the request that produced this response

response.headers: the response headers

response.request._cookies: the cookies sent with the request that produced this response; returns a cookieJar (note that _cookies is a private attribute)

response.cookies: the cookies set by the response (via Set-Cookie headers); returns a cookieJar

response.json(): converts a response body containing a JSON string into a Python object (dict or list)
# 1.2.3-response other common attributes
import requests

# Destination url
url = 'https://www.baidu.com'

# Send get request to target url
response = requests.get(url)

# Print response content
# print(response.text)
# print(response.content.decode())             # Pay attention here!
print(response.url)                            # Print the url of the response
print(response.status_code)                    # Print the status code of the response
print(response.request.headers)                # Print the request header of the response object
print(response.headers)                        # Print response header
print(response.request._cookies)            # Print the cookies carried by the request
print(response.cookies)                        # Print the cookies carried in the response
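The code above does not exercise response.json(), since the Baidu home page is HTML. For a JSON body, response.json() is essentially json.loads applied to the decoded body. The sketch below shows that conversion with the standard library on a hypothetical JSON body, so it runs without a network connection.

```python
import json

# Hypothetical response body from a JSON API (illustrative data only).
body_text = '{"login": "octocat", "repos": ["hello-world", "spoon-knife"]}'

# response.json() performs essentially this conversion on the response body.
data = json.loads(body_text)

print(type(data).__name__)  # dict
print(data["repos"][0])     # hello-world
```

If the body is not valid JSON (the Baidu home page, for instance), response.json() raises an exception, so it is worth checking response.status_code and the Content-Type header first.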


3. Sending requests with the requests module

3.1 Sending a request with headers

First, write code to fetch the Baidu home page:
import requests

url = 'https://www.baidu.com'

response = requests.get(url)

print(response.content.decode())

# Print the request header information corresponding to the response request
print(response.request.headers)

3.1.1 Questions to consider

How does the source of the Baidu home page in the browser differ from the source obtained by the code?

    How to view the page source:

        Right-click > View page source, or

        Right-click > Inspect

How does the response content for the corresponding URL, as seen in the browser, differ from the source obtained by the code?

    How to view the response content for that URL:

        Right-click > Inspect

        Click Network

        Tick Preserve log

        Refresh the page

        Look at the Response of the entry whose Name matches the URL in the browser address bar

Why is the source of the Baidu home page obtained by the code so short?

    We need to send request headers

        Recall the idea behind a crawler: imitate a browser, fool the server, and get the same content the browser gets

    The request header has many fields; of these, User-Agent is essential. It describes the client's operating system and browser
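One likely reason for the difference: without a headers argument, requests identifies itself with its own User-Agent rather than a browser's. The snippet below prints the default headers without sending any request; the exact version number depends on your installation.

```python
import requests

# The headers requests sends when no headers argument is given.
# requests.get() creates a Session internally, and a new Session starts with these.
default = requests.utils.default_headers()
print(default["User-Agent"])  # python-requests/<installed version>

with requests.Session() as session:
    session_ua = session.headers["User-Agent"]

print(session_ua == default["User-Agent"])  # True
```

A server that sees python-requests/... instead of a browser User-Agent may well return a stripped-down page or block the request.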

3.1.2 How to send a request with headers

requests.get(url, headers=headers)

The headers parameter takes the request headers as a dictionary

Header field names are the keys and field values are the values
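To check which headers a request would carry without actually sending it, requests can build a prepared request. The sketch below is illustrative: it reuses the User-Agent string from the article and only prepares the request, so no network access is needed.

```python
import requests

url = "https://www.baidu.com"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# Build the request without sending it, to inspect the headers that would go out.
prepared = requests.Request("GET", url, headers=headers).prepare()

print(prepared.method)                 # GET
print(prepared.headers["User-Agent"])  # the browser User-Agent from the dictionary
# Header lookup is case-insensitive, as in HTTP itself.
print(prepared.headers["user-agent"] == headers["User-Agent"])  # True
```

A request prepared this way contains only the headers you passed; a Session adds its default headers when it prepares the request. requests.get(url, headers=headers) remains the usual way to send it.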

3.1.3 Complete code

Copy the User-Agent from the browser and build the headers dictionary, then run the code below and look at the result:
import requests

url = 'https://www.baidu.com'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# Bring the user agent in the request header to simulate the browser to send the request
response = requests.get(url, headers=headers) 

print(response.content)

# Print request header information
print(response.request.headers)

3.2 Sending requests with parameters

When searching on Baidu, you will often see a ? in the URL. Everything after the question mark is the request parameters, also called the query string.
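The query string can be pulled apart with the standard library's urllib.parse, which makes its name=value structure visible. The sketch uses the Baidu search URL from section 3.2.1.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.baidu.com/s?wd=python"

parts = urlsplit(url)
print(parts.path)   # /s
print(parts.query)  # wd=python

# parse_qs maps each parameter name to a list of values, since a name may repeat.
params = parse_qs(parts.query)
print(params)       # {'wd': ['python']}
```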

3.2.1 Carrying parameters in the URL

Send a request directly to the URL that already contains the parameters:

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

url = 'https://www.baidu.com/s?wd=python'

response = requests.get(url, headers=headers)

3.2.2 Carrying a parameter dictionary via params

1. Build the request parameter dictionary

2. Pass the dictionary as the params argument when sending the request

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# This is the target url
# url = 'https://www.baidu.com/s?wd=python'

# Whether or not the URL ends with a question mark, the result is the same
url = 'https://www.baidu.com/s?'

# The request parameter is a dictionary, namely wd=python
kw = {'wd': 'python'}

# Initiate a request with request parameters and obtain a response
response = requests.get(url, headers=headers, params=kw)

print(response.content)
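To confirm that both approaches produce the same request, you can prepare the requests without sending them and compare the final URLs. The sketch also shows, with an illustrative Chinese search term, that params percent-encodes non-ASCII values as UTF-8.

```python
import requests

kw = {"wd": "python"}

# Prepare (but do not send) the two variants from sections 3.2.1 and 3.2.2.
in_url = requests.Request("GET", "https://www.baidu.com/s?wd=python").prepare()
via_params = requests.Request("GET", "https://www.baidu.com/s?", params=kw).prepare()

print(in_url.url)      # https://www.baidu.com/s?wd=python
print(via_params.url)  # https://www.baidu.com/s?wd=python

# params also percent-encodes non-ASCII values as UTF-8 bytes.
chinese = requests.Request("GET", "https://www.baidu.com/s", params={"wd": "爬虫"}).prepare()
print(chinese.url)     # https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
```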

3.3 Carrying cookies in the headers parameter

Websites often use the Cookie field in the request header to maintain the user's login state, so we can add a Cookie to the headers parameter to simulate the request of a logged-in user. Let's take GitHub login as an example:

3.3.1 Capturing and analyzing the GitHub login traffic

Open the browser, right-click > Inspect, click Network, and tick Preserve log

Visit the GitHub login URL https://github.com/login

Enter your username and password and log in, then visit a URL whose correct content is only available after login, for example by clicking Your profile in the upper-right corner to visit https://github.com/USER_NAME

Once the URL is determined, identify the request header fields needed to send the request: User-Agent and Cookie

3.3.2 Complete the code

Copy the User-Agent and Cookie from the browser

The header field names and values in the headers parameter must match those in the browser

In the headers dictionary, the value of the Cookie key is a string
import requests

url = 'https://github.com/USER_NAME'

# Construct request header dictionary
headers = {
    # User agent copied from browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36',
    # Cookie copied from the browser
    'Cookie': 'xxx paste the cookie string copied from the browser here'
}

# The cookie string is carried in the request header parameter dictionary
resp = requests.get(url, headers=headers)

print(resp.text)

3.3.3 Run the code and verify the result

Search the printed output for title. If the text of the HTML title contains your GitHub username, you have successfully carried the cookie in the headers parameter and retrieved a page that is only accessible after login.
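Rather than searching the output by eye, the title can be extracted with the standard library's html.parser. The sketch below is a minimal illustration: TitleParser and extract_title are hypothetical names, and the sample HTML is made up; in practice you would pass resp.text from the code above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Hypothetical page snippet standing in for resp.text.
sample = "<html><head><title>USER_NAME (Your Name) · GitHub</title></head><body></body></html>"
print(extract_title(sample))  # USER_NAME (Your Name) · GitHub
```

Checking whether your username appears in extract_title(resp.text) then tells you whether the cookie was accepted.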

Key point: master carrying cookies in headers

3.4 Using the cookies parameter

In the previous section we carried the cookie in the headers parameter; alternatively, you can use the dedicated cookies parameter.

Form of the cookies parameter: a dictionary

cookies = {"cookie name": "cookie value"}

    The dictionary corresponds to the Cookie string in the request header, in which the name=value pairs are separated by a semicolon and a space

    The part to the left of each equals sign is the cookie name, which becomes a key of the cookies dictionary

    The part to the right of the equals sign becomes the corresponding value in the cookies dictionary

How to use the cookies parameter:

response = requests.get(url, cookies=cookies)
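The cookies dictionary has to be passed by keyword: the second positional parameter of requests.get is params, so a positional dictionary would end up in the query string instead. The sketch below prepares a request without sending it to show the Cookie header requests builds from the dictionary; the cookie names and values are hypothetical.

```python
import requests

url = "https://github.com/USER_NAME"
# Hypothetical cookie names and values, for illustration only.
cookies = {"logged_in": "yes", "_octo": "GH1.1.123"}

# Prepare the request without sending it, then look at the Cookie header
# that requests generated from the cookies dictionary.
prepared = requests.Request("GET", url, cookies=cookies).prepare()
print(prepared.headers["Cookie"])
```

The printed header has the same name=value; name=value shape as the Cookie string copied from the browser, which is why the two approaches are interchangeable.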

Convert the cookie string into the dictionary required by the cookies parameter:

cookies_dict = {name: value for name, _, value in (cookie.partition('=') for cookie in cookies_str.split('; '))}
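Splitting each pair on every '=' (as split('=')[-1] does) breaks values that themselves contain '=', which is common with base64-encoded tokens: token=YWJjZA== would yield an empty value. Written out as a small function, a sketch that splits only on the first '=' looks like this; cookie_str_to_dict is a hypothetical helper name and the cookie string is made up.

```python
def cookie_str_to_dict(cookies_str):
    """Convert a browser Cookie string like 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies_dict = {}
    for pair in cookies_str.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';' or doubled separators
        # partition() splits on the FIRST '=' only, so '=' inside a value is kept
        name, _, value = pair.partition("=")
        cookies_dict[name] = value
    return cookies_dict


# Hypothetical cookie string; the last value ends in base64 '=' padding.
cookies_str = "_octo=GH1.1.123; logged_in=yes; token=YWJjZA=="
print(cookie_str_to_dict(cookies_str))
# {'_octo': 'GH1.1.123', 'logged_in': 'yes', 'token': 'YWJjZA=='}
```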

Note: cookies usually have an expiration time; once a cookie expires, you need to obtain it again
import requests

url = 'https://github.com/USER_NAME'

# Construct request header dictionary
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
}
# Construct cookie dictionary
cookies_str = 'paste the cookie string copied from the browser here'

# Split each pair on the first '=' only, so values containing '=' stay intact
cookies_dict = {name: value for name, _, value in (cookie.partition('=') for cookie in cookies_str.split('; '))}

# Carry the cookies via the dedicated cookies parameter
resp = requests.get(url, headers=headers, cookies=cookies_dict)

print(resp.text)

Key point: master using the cookies parameter

3.5 Converting a CookieJar object into a cookies dictionary

The response object obtained with requests has a cookies attribute. Its value is of type cookieJar and contains the cookies the server set on the client. How do we convert it into a cookies dictionary?

Conversion method

cookies_dict = requests.utils.dict_from_cookiejar(response.cookies)

Here, response.cookies returns an object of type cookieJar

The requests.utils.dict_from_cookiejar function returns a cookies dictionary
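The conversion can be tried offline by building a cookie jar by hand; response.cookies is the same RequestsCookieJar type. The cookie names, values and domain below are hypothetical. The sketch also shows the reverse direction with requests.cookies.cookiejar_from_dict.

```python
import requests

# Build a cookie jar offline; a real response.cookies is the same type (RequestsCookieJar).
jar = requests.cookies.RequestsCookieJar()
jar.set("logged_in", "no", domain="github.com", path="/")
jar.set("_gh_sess", "abc123", domain="github.com", path="/")

# CookieJar -> dict
cookies_dict = requests.utils.dict_from_cookiejar(jar)
print(cookies_dict)  # {'logged_in': 'no', '_gh_sess': 'abc123'}

# dict -> CookieJar (the reverse conversion)
jar_again = requests.cookies.cookiejar_from_dict(cookies_dict)
print(requests.utils.dict_from_cookiejar(jar_again) == cookies_dict)  # True
```

RequestsCookieJar also offers jar.get_dict(), which returns the same dictionary; note that converting to a dict drops the domain and path information stored in the jar.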

Key point: master how to convert a cookieJar


Topics: Python