WebSocket communication process and implementation

Posted by thedualmind on Wed, 02 Mar 2022 08:51:11 +0100

What is WebSocket?

WebSocket is a standard protocol for two-way data transmission between client and server. It is an independent protocol built directly on TCP; HTTP is used only for the initial handshake that upgrades the connection.

In the past, a client that wanted to track the server's progress had to poll with Ajax: the browser sent a request every few seconds, which put heavy load on the server. Another approach is long polling, which is like a phone call where you don't hang up until you hear something. After the client opens a connection, the server holds the request and does not return a response until it has a message, so the connection stays blocked in the meantime.

WebSocket solves these problems of HTTP. After the protocol upgrade (HTTP -> WebSocket) is complete, the server can actively push data to the client, which removes the synchronization delay caused by polling. Because WebSocket needs only one HTTP handshake, the server and client can keep communicating over the same connection until it is closed. The server no longer has to parse HTTP headers for every message, which reduces resource cost.

As the standard has matured, WebSocket support in mainstream browsers has become good (it is not supported by IE versions below 10).

On the front end, usage is fairly standardized. JavaScript supports the ws protocol natively, and it feels like a lightly wrapped Socket. Where you previously had to maintain the Socket connection yourself, you can now do it in a standard way.

Now let's talk about the communication process of WebSocket in combination with the above figure.

Establish connection

Client request message Header

Client request message:

GET / HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Host: example.com
Origin: http://example.com
Sec-WebSocket-Key: sN9cRrP/n9NdMgdcy2VJFQ==
Sec-WebSocket-Version: 13

Different from traditional HTTP messages:

Upgrade: websocket
Connection: Upgrade

These two headers tell the server that the client wants to switch to the WebSocket protocol.

Sec-WebSocket-Key: sN9cRrP/n9NdMgdcy2VJFQ==
Sec-WebSocket-Version: 13

Sec-WebSocket-Key is a Base64-encoded random value generated by the browser. It provides basic protection against malicious or unintended connections.

Sec-WebSocket-Version is the version of the WebSocket protocol. Early on there were many drafts, and different vendors shipped their own versions, but the version is now settled at 13. If the server does not support the requested version, it must return a Sec-WebSocket-Version header listing the versions it supports.
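
As a rough illustration of the request above, the following Python sketch builds the same handshake by hand. The function name build_handshake_request and its parameters are hypothetical and exist only for this example; what comes from the protocol is the set of header names and the fact that Sec-WebSocket-Key is 16 random bytes encoded in Base64.

```python
import base64
import os


def build_handshake_request(host, path='/', origin=None):
    """Return (request_bytes, key) for a WebSocket opening handshake.

    Hypothetical helper for illustration only.
    """
    # The key is a fresh 16-byte random value, Base64-encoded (24 characters)
    key = base64.b64encode(os.urandom(16)).decode('ascii')
    lines = [
        'GET {0} HTTP/1.1'.format(path),
        'Host: {0}'.format(host),
        'Upgrade: websocket',
        'Connection: Upgrade',
        'Sec-WebSocket-Key: {0}'.format(key),
        'Sec-WebSocket-Version: 13',
    ]
    if origin is not None:
        lines.append('Origin: {0}'.format(origin))
    # The header section ends with an empty line
    request = '\r\n'.join(lines) + '\r\n\r\n'
    return request.encode('ascii'), key


request, key = build_handshake_request('example.com', origin='http://example.com')
print(request.decode('ascii'))
```

In the browser none of this is needed, since the WebSocket object generates the key and sends these headers itself.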

To create a WebSocket object:

var ws = new WebSocket("ws://127.0.0.1:8001");

The ws:// scheme selects the WebSocket protocol; it is followed by the address and port.

Complete client code:

<script type="text/javascript">
    var ws;
    var box = document.getElementById('box');

    function startWS() {
        ws = new WebSocket('ws://127.0.0.1:8001');
        ws.onopen = function (msg) {
            console.log('WebSocket opened!');
        };
        ws.onmessage = function (message) {
            console.log('receive message: ' + message.data);
            box.insertAdjacentHTML('beforeend', '<p>' + message.data + '</p>');
        };
        ws.onerror = function (error) {
            // The error event carries no name or number, so log the event object itself
            console.log('WebSocket error:', error);
        };
        ws.onclose = function () {
            console.log('WebSocket closed!');
        };
    }

    function sendMessage() {
        console.log('Sending a message...');
        var text = document.getElementById('text');
        ws.send(text.value);
    }

    window.onbeforeunload = function () {
        ws.onclose = function () {};  // Detach the close handler so it does not fire during unload
        ws.close();
    };
</script>

Server response message Header

First, let's look at the response message of the server:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
Sec-WebSocket-Protocol: chat

Let's explain line by line:

  1. First, the 101 status code indicates that the server has understood the client's request and, through the Upgrade header, tells the client which protocol the connection is switching to;
  2. Then, Sec-WebSocket-Accept is the server's confirmation of the client's Sec-WebSocket-Key. It is a hash of the key, not an encryption of it;
  3. Finally, Sec-WebSocket-Protocol is the subprotocol the server has selected.

Calculation method of Sec-WebSocket-Accept:

  1. Concatenate the Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11;
  2. Compute the SHA-1 digest of the result and encode it as a Base64 string.
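
The two steps above can be sketched in a few lines of Python. The function name compute_accept is an assumption made for this example; the sample key and expected value are the pair given in RFC 6455.

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket specification (RFC 6455)
MAGIC_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'


def compute_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from a Sec-WebSocket-Key."""
    joined = (sec_websocket_key + MAGIC_GUID).encode('ascii')
    digest = hashlib.sha1(joined).digest()            # 20 raw bytes
    return base64.b64encode(digest).decode('ascii')   # 28 characters


# Sample key from RFC 6455, section 1.3
print(compute_accept('dGhlIHNhbXBsZSBub25jZQ=='))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the header it receives.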

Note: the Sec-WebSocket-Key / Sec-WebSocket-Accept exchange provides only a basic guarantee that the server understands the WebSocket protocol. It does not ensure that the connection is secure, that the data is safe, or that the client or server is legitimate.

Create a main thread to accept the WebSocket establishment request:

import base64
import hashlib
import logging
import socket


def create_socket():
    # Start the socket and listen for connections
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # SO_REUSEADDR must be set before bind(). It lets the server rebind the port
        # right after a restart instead of waiting for old connections in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(('127.0.0.1', 8001))
        sock.listen(5)
    except Exception as e:
        logging.error(e)
        return
    else:
        logging.info('Server running...')

    # Wait for clients
    while True:
        conn, addr = sock.accept()  # Blocks until a client connects

        # Decode the bytes rather than calling str() on them, so line endings are real '\r\n'
        data = conn.recv(1024).decode('utf-8', errors='replace')
        logging.debug(data)

        header_dict = {}
        header, _, _ = data.partition('\r\n\r\n')
        for line in header.split('\r\n')[1:]:  # Skip the request line
            if ': ' in line:
                key, val = line.split(': ', 1)
                header_dict[key] = val

        if 'Sec-WebSocket-Key' not in header_dict:
            logging.error('This socket is not websocket, client close.')
            conn.close()
            continue  # Keep serving other clients instead of stopping the server

        magic_key = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'
        sec_key = header_dict['Sec-WebSocket-Key'] + magic_key
        key = base64.b64encode(hashlib.sha1(bytes(sec_key, encoding='utf-8')).digest())
        key_str = key.decode('ascii')
        logging.debug(key_str)

        # Sec-WebSocket-Protocol is omitted: the client above requests no subprotocol
        response = 'HTTP/1.1 101 Switching Protocols\r\n' \
                   'Connection: Upgrade\r\n' \
                   'Upgrade: websocket\r\n' \
                   'Sec-WebSocket-Accept: {0}\r\n\r\n'.format(key_str)
        conn.sendall(bytes(response, encoding='utf-8'))

        logging.debug('Send the handshake data')

        # WebSocketThread is the per-connection worker thread (its definition is not shown here)
        WebSocketThread(conn).start()

Communicate

The server parses the WebSocket message

The server receives the frames sent by the client and must parse them.

Client frame format

1.FIN: 1 bit

  • 0: not the last fragment of the message
  • 1: It's the last fragment of the message

2.RSV1, RSV2, RSV3: 1 bit each

Generally all three are 0. When the client and server negotiate a WebSocket extension, these flag bits may be non-zero, with their meaning defined by the extension. If a non-zero value appears and no extension has been negotiated, the receiver must fail the connection.

3.Opcode: 4bit

  • %x0: indicates a continuation frame. When Opcode is 0, it means that the data transmission adopts data fragmentation, and the currently received data frame is one of the data fragmentation;
  • %x1: indicates that this is a text frame;
  • %x2: indicates that this is a binary frame;
  • %x3-7: reserved operation code, used for non control frames defined later;
  • %x8: indicates a connection close frame;
  • %x9: indicates that this is a heartbeat request (ping);
  • %xA: indicates that this is a heartbeat response (pong);
  • %xB-F: reserved operation code for subsequent defined control frames.

4.Mask: 1bit

Indicates whether the payload is masked. Frames sent from client to server must be masked; frames sent from server to client must not be.

  • 0: no
  • 1: Yes

5.Payload length: 7bit or (7 + 16)bit or (7 + 64)bit

Represents the length of the payload.

  • 0 ~ 125: the length of the data is equal to this value;
  • 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data;
  • 127: the next 8 bytes represent a 64-bit unsigned integer (the highest bit must be 0) whose value is the length of the data.
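
To make fields 1 to 5 concrete, here is a minimal Python sketch that decodes them from the first bytes of a frame. The function name parse_frame_header and the dictionary it returns are assumptions for illustration; the sketch also assumes the whole header is already in the buffer.

```python
import struct


def parse_frame_header(data):
    """Decode the fixed part of a frame and return its fields as a dict."""
    first, second = data[0], data[1]
    fin = bool(first & 0x80)        # Highest bit of byte 0
    rsv = (first >> 4) & 0x07       # RSV1..RSV3 as a 3-bit number
    opcode = first & 0x0F           # Low 4 bits of byte 0
    masked = bool(second & 0x80)    # Highest bit of byte 1
    length = second & 0x7F          # Low 7 bits of byte 1
    offset = 2                      # Index where the next field starts

    if length == 126:
        # Next 2 bytes: 16-bit unsigned length in network byte order
        (length,) = struct.unpack('!H', data[2:4])
        offset = 4
    elif length == 127:
        # Next 8 bytes: 64-bit unsigned length in network byte order
        (length,) = struct.unpack('!Q', data[2:10])
        offset = 10

    return {'fin': fin, 'rsv': rsv, 'opcode': opcode,
            'masked': masked, 'length': length, 'offset': offset}


# 0x81 = FIN + text opcode, 0x85 = Mask bit + length 5
print(parse_frame_header(bytes([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D])))
```

The offset value tells the caller where the masking key (if any) begins, which is the next field in the frame.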

6.Masking-key: 0 or 4bytes

  • When Mask is 1, 4-byte masking key is carried;
  • When Mask is 0, there is no masking key.
  • Masking algorithm: XOR the payload cyclically, byte by byte. For the byte at index i, take x = masking-key[i mod 4] and XOR the byte with x to obtain the real data.

Note: masking is not meant to prevent data leakage. It exists to prevent proxy cache poisoning attacks and similar problems found in earlier versions of the protocol.
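
The masking step is small enough to show on its own. In this sketch the function name apply_mask is hypothetical; the key and the masked bytes are the "Hello" example from RFC 6455. Because XOR is its own inverse, the same function both masks and unmasks.

```python
def apply_mask(payload, masking_key):
    """XOR each payload byte with masking_key[i % 4]."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


key = bytes([0x37, 0xFA, 0x21, 0x3D])   # Example key from RFC 6455, section 5.7
masked = apply_mask(b'Hello', key)
print(masked.hex())                      # 7f9f4d5158
print(apply_mask(masked, key))           # b'Hello'
```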

7.Payload Data: the payload itself

The code for parsing WebSocket message is as follows:

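def read_msg(data):
    # Assumes data holds one complete, masked text frame from the client
    logging.debug(data)

    msg_len = data[1] & 127  # 7-bit payload length field
    if msg_len == 126:
        mask = data[4:8]  # Masking key follows the 2-byte extended length
        content = data[8:]  # Masked payload
    elif msg_len == 127:
        mask = data[10:14]  # Masking key follows the 8-byte extended length
        content = data[14:]
    else:
        mask = data[2:6]
        content = data[6:]

    # Unmask byte by byte, then decode once so multi-byte UTF-8 characters survive
    raw = bytes(d ^ mask[i % 4] for i, d in enumerate(content))
    return raw.decode('utf-8')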

The server sends WebSocket message

The server does not mask the frames it sends, so the Mask bit is 0. After the first byte it writes the length according to the payload size, and finally the payload itself.

struct module parsing

struct.pack(fmt, v1, v2, ...)

According to the given format fmt, the values are packed into a bytes object (a byte stream laid out like a C struct).

The supported formats in struct are as follows:

Format  C Type              Python type        Standard size
x       pad byte            no value
c       char                bytes of length 1  1
b       signed char         integer            1
B       unsigned char       integer            1
?       _Bool               bool               1
h       short               integer            2
H       unsigned short      integer            2
i       int                 integer            4
I       unsigned int        integer            4
l       long                integer            4
L       unsigned long       integer            4
q       long long           integer            8
Q       unsigned long long  integer            8
n       ssize_t             integer
N       size_t              integer
e       (half precision)    float              2
f       float               float              4
d       double              float              8
s       char[]              bytes
p       char[]              bytes
P       void *              integer

To exchange data with C structs, remember that C and C++ compilers usually align fields, typically to 4-byte boundaries on 32-bit systems. By default, struct uses the native byte order, size, and alignment of the local machine. The first character of the format string changes the byte order, size, and alignment, as defined below:

Character  Byte order              Size      Alignment
@          native                  native    native
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none
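
A short example may help show what the byte-order prefix changes. It uses only struct.pack, struct.unpack, and struct.calcsize from the standard library; the value 300 is an arbitrary payload length chosen for illustration.

```python
import struct

# Same value, different byte order
print(struct.pack('!H', 300).hex())   # 012c (network / big-endian)
print(struct.pack('<H', 300).hex())   # 2c01 (little-endian)

# '!' also disables padding, so B (1 byte) + H (2 bytes) is exactly 3 bytes
print(struct.calcsize('!BH'))         # 3

# Length prefix of a frame with a 300-byte payload: marker 126, then the 16-bit length
prefix = struct.pack('!BH', 126, 300)
print(prefix.hex())                   # 7e012c

# Unpacking reverses the operation
print(struct.unpack('!BH', prefix))   # (126, 300)
```

WebSocket length fields are always in network byte order, which is why the '!' prefix is the natural choice when writing frames.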

The code of sending WebSocket message is as follows:

import logging
import struct


def write_msg(message):
    payload = bytes(message, encoding='utf-8')  # Encode first: lengths are counted in bytes
    data = struct.pack('B', 129)  # First byte 10000001: FIN=1, opcode=0x1 (text)

    # Write the payload length; the Mask bit stays 0 for server frames
    msg_len = len(payload)
    if msg_len <= 125:
        data += struct.pack('B', msg_len)
    elif msg_len <= (2 ** 16 - 1):
        data += struct.pack('!BH', 126, msg_len)
    elif msg_len <= (2 ** 63 - 1):  # Highest bit of the 64-bit length must be 0
        data += struct.pack('!BQ', 127, msg_len)
    else:
        logging.error('Message is too long!')
        return

    data += payload  # Write message content
    logging.debug(data)
    return data

Summary

WebSocket provides true full-duplex transmission between browser and server, which polling cannot. Many developers still use Ajax polling, but it is a less elegant solution. WebSocket was slow to be adopted, perhaps because early versions of the protocol had security problems and few browsers supported it, but both issues have since been resolved. Consider using WebSocket if you have these requirements:

  1. Interaction between multiple users;
  2. Data that must be refreshed from the server frequently.

For example: bullet-screen comments, message subscriptions, multiplayer games, collaborative editing, real-time stock and fund quotes, video conferencing, etc.