HTTPX: a next-generation HTTP request library, fully comparable to requests

Posted by wookie on Tue, 01 Feb 2022 15:41:57 +0100

HTTPX is a next-generation HTTP request library: it supports everything requests does, and adds an async API and HTTP/2 support. From the official description, its main features are:

  • Standard synchronous interface and asynchronous support
  • HTTP/1.1 and HTTP/2
  • Make requests directly to WSGI/ASGI applications
  • Strict timeouts everywhere
  • Fully type annotated
  • 100% test coverage

Quick start

>>> import httpx
>>> r = httpx.get('https://github.com')
>>> r
<Response [200 OK]>
>>> r.status_code
200
>>> r.text
'<!DOCTYPE html>\n<html lang="en" class="html-fluid"> ...'

Alternatively, use the async API:

>>> async with httpx.AsyncClient() as client:
...     r = await client.get('https://github.com')
>>> r
<Response [200 OK]>
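
The session above assumes an async-aware console such as python -m asyncio, where await works at the prompt. As a standalone script, a minimal equivalent sketch:

import asyncio
import httpx

async def main():
    # One client for the whole program; requests share its connection pool
    async with httpx.AsyncClient() as client:
        r = await client.get('https://github.com')
        print(r.status_code)

asyncio.run(main())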

And HTTP/2 (requires the optional http2 install described below):

>>> client = httpx.Client(http2=True)
>>> r = client.get('https://github.com')
>>> r
<Response [200 OK]>
>>> r.http_version
'HTTP/2'

Install

Install with pip:

pip install httpx

[Optional] HTTP/2 support

pip install httpx[http2]

[Optional] Brotli decoder support

pip install httpx[brotli]

Basic use

Making requests

>>> httpx.get('*')
>>> httpx.post('*')
>>> httpx.put('*')
>>> httpx.delete('*')
>>> httpx.head('*')
>>> httpx.options('*')
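
These shortcut functions all build on the generic httpx.request(), which takes the HTTP method as a string; that form is convenient when the method is only known at runtime:

import httpx

# Equivalent to httpx.get(...); the method is just a string argument
r = httpx.request('GET', 'https://www.example.com')
print(r.status_code)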

Passing parameters

# Query string parameters (GET)
httpx.get(url, params={'key1': 'value1', 'key2': ['1', '2']})

# Form data (POST)
httpx.post(url, data={'username': '123'})

# JSON body
httpx.post(url, json={'query': 'hello'})

# File upload
httpx.post(url, files={'file': open('report.xls', 'rb')})

# Custom headers
httpx.get(url, headers={'User-Agent': 'baiduspider'})

# Cookies
httpx.get(url, cookies={'sessionid': '***'})
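
These options can be combined freely in a single call. A quick sketch using httpbin.org (a public echo service, used here only for illustration) to inspect what actually gets sent:

import httpx

# httpbin.org echoes the request back in its JSON response
r = httpx.post(
    'https://httpbin.org/post',
    params={'debug': '1'},                 # query string
    headers={'User-Agent': 'my-crawler'},  # custom header
    json={'query': 'hello'},               # JSON body
)
print(r.json()['json'])  # -> {'query': 'hello'}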

Response

>>> r = httpx.get('https://github.com')

>>> r.url
URL('https://github.com')

>>> r.encoding
'utf-8'

>>> r.status_code
200

# Text response
>>> r.text

# Binary response
>>> r.content

# JSON response
>>> r.json()
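
For error handling, Response.raise_for_status() turns 4xx/5xx responses into exceptions. A sketch (the 404 URL is made up):

import httpx

r = httpx.get('https://github.com/this-page-does-not-exist')
try:
    r.raise_for_status()  # raises httpx.HTTPStatusError for 4xx/5xx responses
except httpx.HTTPStatusError as exc:
    print(f'{exc.response.status_code} while requesting {exc.request.url}')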

Redirects

By default, HTTPX follows redirects for all HTTP methods.

The history property shows the redirect trail of a request: it contains the list of redirect responses, in the order they were received.

For example, GitHub automatically redirects HTTP requests to HTTPS:

>>> r = httpx.get('http://github.com')
>>> r.url
URL('https://github.com/')
>>> r.status_code
200
>>> r.history
[<Response [301 Moved Permanently]>]

You can disable the default redirect behavior with the allow_redirects parameter:

>>> r = httpx.get('http://github.com', allow_redirects=False)
>>> r.url
URL('http://github.com/')
>>> r.status_code
301
>>> r.history
[]
>>> r.next_request
<Request('GET', 'https://github.com/')>
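
next_request makes it possible to walk a redirect chain by hand. A sketch using a Client (covered in the Advanced section below), written against the allow_redirects API this article uses:

import httpx

with httpx.Client() as client:
    r = client.get('http://github.com', allow_redirects=False)
    # Follow the redirect chain one hop at a time
    while r.next_request is not None:
        print(f'{r.status_code} -> {r.next_request.url}')
        r = client.send(r.next_request, allow_redirects=False)
    print(r.status_code, r.url)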

Timeouts

HTTPX lets you control timeout behavior at a fine granularity, with four separate timeout types: connect, read, write and pool.

  • connect: the maximum time to establish a socket connection; raises ConnectTimeout when exceeded
  • read: the maximum time to wait for a chunk of data; raises ReadTimeout when exceeded
  • write: the maximum time to send a chunk of data; raises WriteTimeout when exceeded
  • pool: the maximum time to acquire a connection from the connection pool; raises PoolTimeout when exceeded

# 60-second connect timeout; 10 seconds for all other timeout types
timeout = httpx.Timeout(10, connect=60)
r = httpx.get(url, timeout=timeout)
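
All four exceptions share the base class httpx.TimeoutException, so you can catch one specific type or all timeouts at once. A sketch:

import httpx

timeout = httpx.Timeout(10, connect=60)
try:
    r = httpx.get('https://www.example.com', timeout=timeout)
except httpx.ConnectTimeout:
    print('could not establish a connection within 60s')
except httpx.TimeoutException:
    # Base class: also covers ReadTimeout, WriteTimeout and PoolTimeout
    print('request timed out')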

Advanced

Client

When you make requests through the top-level API such as httpx.get(), HTTPX has to establish a new connection for every request (connections are not reused). As the number of requests grows, this quickly becomes inefficient.

A Client uses HTTP connection pooling. When you send multiple requests to the same host, the client reuses the underlying TCP connection, which can bring significant performance improvements, including:

  • Reduced request latency (no repeated handshakes)
  • Reduced CPU usage and fewer round trips
  • Reduced network congestion

As well as:

  • Cookie persistence across requests
  • Configuration shared by all requests
  • HTTP proxy support
  • HTTP/2 support

If you are coming from requests, httpx.Client() can be used in place of requests.Session().

Usage

(1) Using a context manager

with httpx.Client() as client:
    ...

(2) Explicitly closing the connection

client = httpx.Client()
try:
    ...
finally:
    client.close()
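
Either way, requests issued through the same client share its connection pool. A minimal sketch (the /about path is made up for illustration):

import httpx

with httpx.Client() as client:
    # Both requests hit the same host, so the TCP connection is reused
    r1 = client.get('https://www.example.com')
    r2 = client.get('https://www.example.com/about')
    print(r1.status_code, r2.status_code)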

Shared configuration

The Client allows you to pass in parameters that apply to all outgoing requests

headers = {'user-agent': 'httpx/0.18.1'}
with httpx.Client(headers=headers) as client:
    r = client.get(url)
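
Client-level configuration also includes base_url, which is joined with every request path. A sketch with a hypothetical API host:

import httpx

headers = {'user-agent': 'httpx/0.18.1'}
# base_url and headers apply to every request made through this client
with httpx.Client(base_url='https://api.example.com', headers=headers) as client:
    r = client.get('/users/1')  # requests https://api.example.com/users/1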

Merge configuration

When a parameter is configured on both the client and the request, two things can happen:

  1. For headers, params, and cookies, the values are merged:
>>> headers = {'user-agent': 'httpx/0.18.1'}
>>> with httpx.Client(headers=headers) as client:
...     r = client.get(url, headers={'Content-Type': 'application/json'})
...     print(r.headers)
Headers({..., 'user-agent': 'httpx/0.18.1', 'content-type': 'application/json'})
  2. For all other parameters, the value given on the request takes precedence, as shown in the sketch below.
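
For example, a timeout set on the request wins over the one set on the client. A sketch:

import httpx

# The client default is 5 seconds; this particular request gets 30 seconds
with httpx.Client(timeout=httpx.Timeout(5.0)) as client:
    r = client.get('https://www.example.com', timeout=httpx.Timeout(30.0))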

Event hooks

HTTPX allows you to register hook functions on the client, which are called automatically after a specific type of event occurs.

There are currently two types of events:

  1. request: called just before the request is sent.
  2. response: called after the response has been received.

Event hooks should only read the request or response, not modify it. For example, read-only logging hooks:

>>> def log_request(request):
...     print(f'Request event hook: {request.method} - {request.url} sending..')

>>> def log_response(response):
...     request = response.request
...     print(f"Response event hook: {request.method} {request.url} - Status {response.status_code}")

>>> with httpx.Client(event_hooks={'request': [log_request], 'response': [log_response]}) as client:
...     r = client.get('https://github.com')
...     print(r.status_code)

Request event hook: GET - https://github.com sending..
Response event hook: GET https://github.com - Status 200
200

event_hooks maps each event type to a list, so you can register multiple hook functions per event, as in the pattern below.
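
A common pattern from the HTTPX documentation combines a logging hook with a response hook that raises on error status codes:

import httpx

def log_request(request):
    print(f'Request event hook: {request.method} {request.url}')

def raise_on_4xx_5xx(response):
    response.raise_for_status()

# One list per event type; hooks run in registration order
client = httpx.Client(event_hooks={
    'request': [log_request],
    'response': [raise_on_4xx_5xx],
})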

HTTP proxies

# Proxy all requests
httpx.Client(proxies="http://localhost:8030")

# Choose a proxy per scheme
proxies = {
    "http://": "http://localhost:8030",
    "https://": "http://localhost:8031",
}

# More complex proxy routing
proxies = {
    # Proxy requests to a specific port
    "all://*:1234": "http://localhost:8030",
    # Proxy requests to any subdomain of example.com
    "all://*.example.com": "http://localhost:8030",
    # Send plain-HTTP requests without a proxy
    "http://*": None
}
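
The routing table is passed to the client through the proxies argument. A sketch using the last mapping above (the hosts are placeholders):

import httpx

with httpx.Client(proxies=proxies) as client:
    client.get('https://api.example.com/')  # proxied: matches all://*.example.com
    client.get('http://example.org/')       # not proxied: matches http://* -> None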

Stream

When requesting a large file, we don't need to read the whole body into memory at once. Instead, we can stream the response and read it piece by piece until we have all the content.

You can stream the binary data of the response:

>>> with httpx.stream("GET", "https://www.example.com") as r:
...     for data in r.iter_bytes():
...         print(data)

While streaming, response.content and response.text are not available up front, but you can still conditionally load the response body inside the stream:

>>> with httpx.stream("GET", "https://www.example.com") as r:
...     if int(r.headers['Content-Length']) < TOO_LONG:
...         r.read()
...         print(r.text)
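
A practical use of streaming is downloading a large file straight to disk without holding it in memory. A sketch with a made-up file URL:

import httpx

with httpx.stream('GET', 'https://www.example.com/big.zip') as r:
    with open('big.zip', 'wb') as f:
        # Write the body chunk by chunk as it arrives
        for chunk in r.iter_bytes():
            f.write(chunk)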

Topics: Python, network, crawler, HTTP