Writing a streaming media server in Rust, in practice: pitfalls I hit with RTMP chunks

Posted by Mieke23 on Sat, 19 Feb 2022 23:50:29 +0100

In recent months I have been focused on a new open source project: xiu, a streaming media server written in Rust.
I ran into a lot of pitfalls while implementing it. Today, let's talk about the chunk in RTMP.

The RTMP protocol is genuinely complex. I read many posts and the official specification before starting this project, but never felt I fully understood it. After implementing the protocol, things are a lot clearer.

The project does not have enough tests yet, but some problems have already surfaced. Chunks have been around for a long time, yet many people still do not understand them. To summarize: apart from the three handshake packets, everything else in RTMP, both signaling and media data (audio and video), is encapsulated in chunks.

Leftover data after the handshake

TCP is a byte stream and does not preserve the boundaries of protocol messages. One read may return a single message, and sometimes it returns several. During the RTMP handshake, data is read from the TCP stream in one go, so once the handshake completes some bytes may be left over. Those bytes must be placed into the chunk parsing buffer rather than discarded.
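As a rough illustration of carrying the leftover bytes into the chunk parser, here is a minimal sketch. It assumes the server side of the simple handshake, where the client sends C0 (1 byte), C1 (1536 bytes) and C2 (1536 bytes). `split_handshake` is a hypothetical helper written for this post, not a function from xiu.

```rust
/// Sizes from the RTMP spec: C0 is 1 byte, C1 and C2 are 1536 bytes each.
const C0_LEN: usize = 1;
const C1_LEN: usize = 1536;
const C2_LEN: usize = 1536;
const CLIENT_HANDSHAKE_LEN: usize = C0_LEN + C1_LEN + C2_LEN;

/// Hypothetical helper: given everything received from the client so far,
/// return (handshake bytes, leftover bytes) once the handshake is complete,
/// or None if more data is still needed.
fn split_handshake(received: &[u8]) -> Option<(&[u8], &[u8])> {
    if received.len() < CLIENT_HANDSHAKE_LEN {
        return None;
    }
    Some(received.split_at(CLIENT_HANDSHAKE_LEN))
}

fn main() {
    // Simulate reads that delivered the full handshake plus the first
    // 5 bytes of the chunk that follows it.
    let mut received = vec![0u8; CLIENT_HANDSHAKE_LEN];
    received.extend_from_slice(&[0x03, 0x00, 0x00, 0x00, 0x00]);

    let (_handshake, leftover) = split_handshake(&received).unwrap();

    // The leftover must seed the chunk parsing buffer, not be dropped.
    let mut chunk_buffer: Vec<u8> = Vec::new();
    chunk_buffer.extend_from_slice(leftover);
    println!("leftover bytes: {}", chunk_buffer.len());
}
```

The point of the sketch is only the last step: whatever follows the handshake in the read buffer belongs to the first chunk.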

chunk size

The initial chunk size should be set to 128.

My test and troubleshooting process records are as follows:
At the beginning, I set the chunk size to 4096. When I played the stream with ffplay and it sent the connect message, there was always one extra byte, which made AMF parsing fail. I captured the packets with Wireshark, and the decoded message did not show this byte. At first I assumed Wireshark could not be wrong and suspected the tokio network library, so I replaced it with the standard TCP library. The byte was still there. Then I tried a crude method: I took an open source RTMP server and printed out the same message. The byte was also present when the TCP data was first received, yet AMF parsing succeeded. The next step was to print the data at every stage, from chunk parsing to AMF parsing, to see where the byte disappeared. In the end it turned out that this byte is the first byte of a chunk, the basic header (fmt + csid). The connect message is longer than 128 bytes, so the client splits it into several chunks, and each later chunk starts with its own basic header. Because my initial chunk size was wrong, the parser treated that header byte as payload.
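To see why a wrong chunk size produces a stray byte, it helps to look at how a message longer than the chunk size is laid out. The sketch below is illustrative only: `chunk_payload_sizes` is a hypothetical helper, and the 200-byte message length is a made-up figure. It computes how many payload bytes each chunk carries.

```rust
/// Hypothetical helper: split a message of `msg_len` bytes into the payload
/// sizes of its chunks, given the current maximum chunk size.
fn chunk_payload_sizes(msg_len: usize, chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk size must be positive");
    let mut sizes = Vec::new();
    let mut remaining = msg_len;
    while remaining > 0 {
        let n = remaining.min(chunk_size);
        sizes.push(n);
        remaining -= n;
    }
    sizes
}

fn main() {
    // With the default chunk size of 128, a 200-byte message is sent as two
    // chunks; the second one starts with its own 1-byte basic header.
    println!("{:?}", chunk_payload_sizes(200, 128));
    // A parser that wrongly assumes 4096 expects a single 200-byte chunk,
    // so it reads the second chunk's header byte as payload.
    println!("{:?}", chunk_payload_sizes(200, 4096));
}
```

The sender and the receiver have to agree on the chunk size. Until a Set Chunk Size message arrives, that size is 128.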

State retention

Before explaining state retention, let's go over the components of a chunk. According to the official specification, a chunk consists of four parts:

  • basic header
  • message header
  • extended timestamp
  • payload

The first three parts can be compressed.

basic header

 /******************************************************************
 * 5.3.1.1. Chunk Basic Header
 * The Chunk Basic Header encodes the chunk stream ID and the chunk
 * type(represented by fmt field in the figure below). Chunk type
 * determines the format of the encoded message header. Chunk Basic
 * Header field may be 1, 2, or 3 bytes, depending on the chunk stream
 * ID.
 *
 * The bits 0-5 (least significant) in the chunk basic header represent
 * the chunk stream ID.
 *
 * Chunk stream IDs 2-63 can be encoded in the 1-byte version of this
 * field.
 *    0 1 2 3 4 5 6 7
 *   +-+-+-+-+-+-+-+-+
 *   |fmt|   cs id   |
 *   +-+-+-+-+-+-+-+-+
 *   Figure 6 Chunk basic header 1
 *
 * Chunk stream IDs 64-319 can be encoded in the 2-byte version of this
 * field. ID is computed as (the second byte + 64).
 *   0                   1
 *   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
 *   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 *   |fmt|    0      | cs id - 64    |
 *   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 *   Figure 7 Chunk basic header 2
 *
 * Chunk stream IDs 64-65599 can be encoded in the 3-byte version of
 * this field. ID is computed as ((the third byte)*256 + the second byte
 * + 64).
 *    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
 *   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 *   |fmt|     1     |         cs id - 64            |
 *   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 *   Figure 8 Chunk basic header 3
 *
 * cs id: 6 bits
 * fmt: 2 bits
 * cs id - 64: 8 or 16 bits
 *
 * Chunk stream IDs with values 64-319 could be represented by both 2-
 * byte version and 3-byte version of this field.
 ***********************************************************************/

The first two bits of the first byte are the format (fmt), which takes the values 0, 1, 2 and 3. These values control how the message header is compressed, as described below. The remaining six bits are the chunk stream ID, or csid (this field hides a pitfall, explained later). Six bits give a range of [0, 63]; 0 and 1 are reserved as markers, and 2 to 63 are actual csid values. The two markers mean:

  • 0 means the basic header is 2 bytes long: csid = second byte + 64

  • 1 means the basic header is 3 bytes long: csid = third byte * 256 + second byte + 64

    The parsing code is as follows:

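     // The low 6 bits of the first byte hold the csid (or the 0/1 marker).
     let mut csid = (byte & 0b00111111) as u32;
     match csid {
         0 => {
             // 2-byte form: one more byte, csid = second byte + 64.
             if self.reader.len() < 1 {
                 return Ok(UnpackResult::NotEnoughBytes);
             }
             csid = 64;
             csid += self.reader.read_u8()? as u32;
         }
         1 => {
             // 3-byte form: two more bytes are read, so check for 2.
             if self.reader.len() < 2 {
                 return Ok(UnpackResult::NotEnoughBytes);
             }
             csid = 64;
             csid += self.reader.read_u8()? as u32;
             csid += self.reader.read_u8()? as u32 * 256;
         }
         _ => {}
     }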
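For readers who want to try the basic header rules in isolation, here is a self-contained sketch that works on a byte slice instead of the reader used above. `BasicHeader` and `parse_basic_header` are hypothetical names for this example and do not come from the xiu code.

```rust
/// Parsed basic header: chunk format, chunk stream ID, and header length.
#[derive(Debug, PartialEq)]
struct BasicHeader {
    fmt: u8,
    csid: u32,
    len: usize,
}

/// Hypothetical standalone parser. Returns None when `buf` does not yet
/// hold the whole basic header.
fn parse_basic_header(buf: &[u8]) -> Option<BasicHeader> {
    let first = *buf.first()?;
    let fmt = first >> 6;
    match first & 0b0011_1111 {
        0 => {
            // 2-byte form: csid = second byte + 64.
            let b1 = *buf.get(1)? as u32;
            Some(BasicHeader { fmt, csid: 64 + b1, len: 2 })
        }
        1 => {
            // 3-byte form: csid = third byte * 256 + second byte + 64.
            let b1 = *buf.get(1)? as u32;
            let b2 = *buf.get(2)? as u32;
            Some(BasicHeader { fmt, csid: 64 + b1 + b2 * 256, len: 3 })
        }
        id => Some(BasicHeader { fmt, csid: id as u32, len: 1 }),
    }
}

fn main() {
    // 0xC3 = 0b11_000011: fmt 3, csid 3 (a typical continuation chunk).
    println!("{:?}", parse_basic_header(&[0xC3]));
    // 2-byte form: marker 0, second byte 10, so csid = 74.
    println!("{:?}", parse_basic_header(&[0x00, 10]));
    // 3-byte form with only two bytes available: not enough data yet.
    println!("{:?}", parse_basic_header(&[0x01, 10]));
}
```

Returning `None` for a short buffer plays the same role as `NotEnoughBytes` in the snippet above: the caller waits for more data and tries again.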

message header

Next comes the message header, which is the complicated part. It has four types, corresponding to values 0 to 3 of the format field in the basic header.

type 0

/*****************************************************************/
/*      5.3.1.2.1. Type 0                                        */
/*****************************************************************
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                timestamp(3bytes)              |message length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| message length (cont)(3bytes) |message type id| msg stream id |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       message stream id (cont) (4bytes)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*****************************************************************/

No fields are omitted.

type 1

/*****************************************************************/
/*      5.3.1.2.2. Type 1                                        */
/*****************************************************************
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            timestamp delta(3bytes)            |message length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| message length (cont)(3bytes) |message type id|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*****************************************************************/

The message stream ID is omitted; the value from the previous chunk is reused. The 3-byte timestamp field holds a delta rather than an absolute timestamp.

type 2

 /************************************************/
 /*      5.3.1.2.3. Type 2                       */
 /************************************************
  0                   1                   2
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |            timestamp delta(3bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 ***************************************************/

Going further, the message stream ID, message length and message type ID are all omitted and reused from the previous chunk. Only the 3-byte timestamp delta remains.

type 3

Type 3 has no message header at all; every field is reused from the previous chunk.
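The four layouts can be summarized in a short sketch. The names here (`message_header_len`, `parse_type0`, `MessageHeader`) are hypothetical and chosen for this example. The byte order follows the RTMP specification: the 3-byte fields are big-endian, and the message stream ID is little-endian.

```rust
/// Number of message header bytes for each chunk format (fmt) value.
fn message_header_len(fmt: u8) -> usize {
    match fmt {
        0 => 11, // timestamp + length + type id + stream id
        1 => 7,  // timestamp delta + length + type id
        2 => 3,  // timestamp delta only
        _ => 0,  // type 3: nothing on the wire
    }
}

/// Read a 3-byte big-endian integer.
fn read_u24_be(b: &[u8]) -> u32 {
    (b[0] as u32) << 16 | (b[1] as u32) << 8 | (b[2] as u32)
}

/// Fields of a full (type 0) message header.
#[derive(Debug, PartialEq)]
struct MessageHeader {
    timestamp: u32,
    msg_length: u32,
    msg_type_id: u8,
    msg_stream_id: u32,
}

/// Hypothetical parser for a type 0 header; None if fewer than 11 bytes.
fn parse_type0(buf: &[u8]) -> Option<MessageHeader> {
    if buf.len() < message_header_len(0) {
        return None;
    }
    Some(MessageHeader {
        timestamp: read_u24_be(&buf[0..3]),
        msg_length: read_u24_be(&buf[3..6]),
        msg_type_id: buf[6],
        // The message stream ID is the one little-endian field.
        msg_stream_id: u32::from_le_bytes([buf[7], buf[8], buf[9], buf[10]]),
    })
}

fn main() {
    // timestamp 1000, length 32, type 8 (audio), stream id 12345
    let raw = [0x00, 0x03, 0xE8, 0x00, 0x00, 0x20, 0x08, 0x39, 0x30, 0x00, 0x00];
    println!("{:?}", parse_type0(&raw));
}
```

Types 1 and 2 read a prefix of the same layout; the fields they leave out have to come from saved state.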

extended timestamp

This field is optional and takes up 4 bytes. It is read only when the 3-byte timestamp field in the message header equals 0xFFFFFF; in that case the real timestamp is carried here.
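A minimal sketch of that rule, assuming the bytes after the message header are available as a slice. `resolve_timestamp` is a hypothetical helper for this example.

```rust
/// Hypothetical helper: resolve the effective timestamp.
/// `field` is the 3-byte timestamp (or delta) from the message header and
/// `rest` is the data that follows the message header.
/// Returns (timestamp, bytes consumed from `rest`), or None if the 4-byte
/// extended timestamp is required but not fully available yet.
fn resolve_timestamp(field: u32, rest: &[u8]) -> Option<(u32, usize)> {
    const EXTENDED_MARKER: u32 = 0xFF_FFFF;
    if field < EXTENDED_MARKER {
        // The 3-byte field already holds the value; nothing more to read.
        return Some((field, 0));
    }
    // The extended timestamp is a 4-byte big-endian integer.
    let b = rest.get(0..4)?;
    Some((u32::from_be_bytes([b[0], b[1], b[2], b[3]]), 4))
}

fn main() {
    println!("{:?}", resolve_timestamp(1000, &[]));
    println!("{:?}", resolve_timestamp(0xFF_FFFF, &[0x01, 0x00, 0x00, 0x00]));
    // Marker present but only 2 of the 4 bytes have arrived so far.
    println!("{:?}", resolve_timestamp(0xFF_FFFF, &[0x01, 0x00]));
}
```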

payload

Finally, the payload. The total payload length of a message is given by the message length field in the message header, and a single chunk carries at most chunk size bytes of it.

The overall process of reading a chunk is as follows. This is how I implemented it at first (it has a problem):

  1. Read the first byte of a chunk and parse the format and chunk stream ID.

  2. Parse message header according to format:

    • If it is 0, each field must be parsed from the TCP stream.
    • If it is 1, the message stream ID of the previous chunk is used.
    • If it is 2, the message stream id, message length and message type id of the previous chunk are used.
    • If it is 3, the message stream id, message length, message type id and timestamp of the previous chunk are used.
  3. Decide whether to read the 4-byte extended timestamp based on the timestamp value.

  4. Read the payload according to the message length. There is a special case here: a message payload may be split across two or more chunks. In this step, the segments must be reassembled into one complete message before it is returned. That is, if the payload read so far is shorter than message length, go back to step 1 and keep reading the remaining payload from the next chunk until the message is complete.

    That covers the basic process. The "state retention" in the section title has two meanings. The first concerns the problem in the steps above: I said the omitted fields are taken from the previous chunk. That is wrong, because of situations like the following:

    +--------+---------+-----+------------+-----------+------------+
    |        | Chunk   |Chunk|Header Data |No.of Bytes|Total No.of |
    |        |Stream ID|Type |            | After     |Bytes in the|
    |        |         |     |            |Header     |Chunk       |
    +--------+---------+-----+------------+-----------+------------+
    |Chunk#1 | 3       | 0   | delta: 1000| 32        | 44         |
    |        |         |     | length: 32,|           |            |
    |        |         |     | type: 8,   |           |            |
    |        |         |     | stream ID: |           |            |
    |        |         |     | 12345 (11  |           |            |
    |        |         |     | bytes)     |           |            |
    +--------+---------+-----+------------+-----------+------------+
    |Chunk#2 | 3       | 2   | 20 (3      | 32        | 36         |
    |        |         |     | bytes)     |           |            |
    +--------+---------+-----+------------+-----------+------------+
    |Chunk#3 | 4       | 3   | none (0    | 32        | 33         |
    |        |         |     | bytes)     |           |            |
    +--------+---------+-----+------------+-----------+------------+
    |Chunk#4 | 3       | 3   | none (0    | 32        | 33         |
    |        |         |     | bytes)     |           |            |
    +--------+---------+-----+------------+-----------+------------+

Note: message header fields are reused per chunk stream ID, not from whichever chunk happened to arrive immediately before.

Therefore, in the case above, chunk 2 can reuse the message header of chunk 1, but chunk 4 cannot reuse the message header of chunk 3, because they belong to different chunk streams. This needs special handling in the code: the message header of each csid must be saved, and when a new chunk is parsed, the saved message header for its csid must be restored right after the basic header is read.
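One way to keep a header per csid is a map keyed by chunk stream ID. The following is a minimal sketch under that assumption; `HeaderStore`, `restore` and `save` are hypothetical names and not necessarily how xiu structures it. The values used for csid 4 are made up.

```rust
use std::collections::HashMap;

/// Header fields remembered per chunk stream (names are illustrative).
#[derive(Clone, Debug, Default, PartialEq)]
struct MessageHeader {
    timestamp: u32,
    msg_length: u32,
    msg_type_id: u8,
    msg_stream_id: u32,
}

/// Hypothetical store keyed by csid.
#[derive(Default)]
struct HeaderStore {
    by_csid: HashMap<u32, MessageHeader>,
}

impl HeaderStore {
    /// Called right after the basic header is read: restore what this
    /// chunk stream last used (or defaults if the csid is new).
    fn restore(&self, csid: u32) -> MessageHeader {
        self.by_csid.get(&csid).cloned().unwrap_or_default()
    }

    /// Called once the chunk's header has been fully parsed.
    fn save(&mut self, csid: u32, header: MessageHeader) {
        self.by_csid.insert(csid, header);
    }
}

fn main() {
    let mut store = HeaderStore::default();
    // State after chunks on csid 3, then a chunk on csid 4.
    store.save(3, MessageHeader { timestamp: 1020, msg_length: 32, msg_type_id: 8, msg_stream_id: 12345 });
    store.save(4, MessageHeader { timestamp: 500, msg_length: 64, msg_type_id: 9, msg_stream_id: 1 });

    // A type 3 chunk on csid 3 must see csid 3's header, not csid 4's.
    let header = store.restore(3);
    println!("{:?}", header);
}
```

With this in place, "previous chunk" always means the previous chunk on the same csid, however the streams are interleaved.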

The second meaning is something else I did not anticipate when writing the code:

TCP data can be split at any point.

In other words, the available TCP data may run out before a whole chunk has been read, and the parser has to wait for more. In that case the progress of reading each field must be preserved, so every read step records a state. That is why I wrote the top-level states below, plus four sub-states inside the message header.

#[derive(Copy, Clone)]
enum ChunkReadState {
    ReadBasicHeader = 1,
    ReadMessageHeader = 2,
    ReadExtendedTimestamp = 3,
    ReadMessagePayload = 4,
    Finish = 5,
}

#[derive(Copy, Clone)]
enum MessageHeaderReadState {
    ReadTimeStamp = 1,
    ReadMsgLength = 2,
    ReadMsgTypeID = 3,
    ReadMsgStreamID = 4,
}

For example, the extended timestamp read in the ReadExtendedTimestamp state takes 4 bytes, but only 2 bytes may be left in the buffer at that point. The parser should stay in this state; the next time new data arrives from TCP, it resumes from here and reads the remaining 2 bytes.
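The example can be sketched as a tiny resumable parser. This is a simplified illustration with only two states and hypothetical names (`Parser`, `feed`), not the full state machine from the project.

```rust
/// Minimal sketch of resumable parsing; only two states are modelled.
#[derive(Copy, Clone, Debug, PartialEq)]
enum State {
    ReadExtendedTimestamp,
    ReadMessagePayload,
}

struct Parser {
    buf: Vec<u8>,   // bytes received but not yet consumed
    state: State,   // where to resume on the next call
    timestamp: u32, // value read from the extended timestamp
}

impl Parser {
    /// Append newly received TCP data and try to make progress.
    /// Returns false if the parser is still waiting for more bytes.
    fn feed(&mut self, data: &[u8]) -> bool {
        self.buf.extend_from_slice(data);
        if self.state == State::ReadExtendedTimestamp {
            if self.buf.len() < 4 {
                // Not enough bytes: keep the state and wait for more data.
                return false;
            }
            let b: Vec<u8> = self.buf.drain(..4).collect();
            self.timestamp = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
            self.state = State::ReadMessagePayload;
        }
        true
    }
}

fn main() {
    let mut p = Parser {
        buf: Vec::new(),
        state: State::ReadExtendedTimestamp,
        timestamp: 0,
    };
    // The first TCP read delivers only 2 of the 4 bytes.
    assert!(!p.feed(&[0x01, 0x00]));
    assert_eq!(p.state, State::ReadExtendedTimestamp);
    // The next read delivers the rest; parsing resumes from the saved state.
    assert!(p.feed(&[0x00, 0x02]));
    println!("timestamp = {}, state = {:?}", p.timestamp, p.state);
}
```

The key design choice is that unconsumed bytes stay in the buffer and the state is left unchanged, so a short read costs nothing beyond a retry.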

The complete Rust implementation of RTMP chunk parsing is in the project repository.

Finally, stars are welcome.

Topics: Rust