Learning audio and video with RTSP: understanding the RTP protocol

Posted by racerxfactor on Sat, 15 Jan 2022 10:06:37 +0100

1: Theoretical understanding of relevant details

The actual transmission of media data (video/audio) is through rtp.

RTP can be carried over UDP or TCP. (This looks questionable at first, since most descriptions say RTP runs over UDP; the TCP case comes up again in 2.1.)

==> How should out-of-order packets, lost packets, and pictures too large for one packet be handled when packing and unpacking?

When RTP carries H.264 video, you need to understand the H.264 data format, how to split it into packets for sending, and how to reassemble it after receiving.

When RTP carries AAC audio, you need to understand the AAC formats (AAC has two formats) and likewise work out how to pack and unpack it.

During RTSP testing, I found that when audio is sent from a timer, playback stutters. What can be done about this?

The previous article left open how to push a stream with RTP and why playback failed. At least this much is clear:

==> RTP packets carry only part of the information. Playing audio and video also needs session information (SDP), such as the type of each stream and the audio sampling rate.

==> 1: When pushing with RTP, generate an SDP file and use it to pull and play the stream.

==> 2: Receive the data over RTP, parse it, store it locally, and then play it.

The previous article also asked how to push a stream with OBS:

==> OBS is a powerful live streaming and video recording application that supports stream pushing.
==> OBS can capture the camera, audio, desktop, windows, and more; only the push-stream test matters here.
==> Before pushing, configure it under File -> Settings -> Push: fill in the server details, choose the capture sources, and click Start Push (tested successfully).

2: Understand rtp-related protocols (code is already available from the course)

2.1: Take a look at the concepts:

RTP, the Real-time Transport Protocol, is usually described as a transport protocol and usually runs over UDP.

==> In actual transmission, the Maximum Transmission Unit (MTU) has to be considered.

An RTP packet carries a piece of the media stream: either a complete frame (such as an audio frame) or part of a frame (such as a slice of a large picture).
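The MTU point can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes IPv4 without options (20 bytes), UDP (8 bytes), and the 12-byte fixed RTP header with no CSRC list or extension, and it ignores any payload-format headers (such as the ones H.264 fragmentation adds). The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes assumed for this sketch: IPv4 without options, UDP,
 * and the RTP fixed header without CSRC entries or extension. */
enum { IPV4_HDR = 20, UDP_HDR = 8, RTP_FIXED_HDR = 12 };

/* Largest RTP payload that fits in one packet for a given MTU. */
size_t rtp_max_payload(size_t mtu)
{
    assert(mtu > (size_t)(IPV4_HDR + UDP_HDR + RTP_FIXED_HDR));
    return mtu - IPV4_HDR - UDP_HDR - RTP_FIXED_HDR;
}

/* Number of RTP packets needed to carry one frame (rounded up). */
size_t rtp_packets_needed(size_t frame_bytes, size_t mtu)
{
    size_t max_payload = rtp_max_payload(mtu);
    return (frame_bytes + max_payload - 1) / max_payload;
}
```

With a typical Ethernet MTU of 1500, this leaves 1460 bytes of payload per packet, so a 30000-byte picture needs 21 packets while a small audio frame fits in one.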

The payload inside RTP can be data in various formats, such as H.264, AAC, and others.

RTP supports multiple streams; for example, a single RTP link can carry both an H.264 stream and an AAC stream.

==> Because RTP supports multiple formats and streams, some information has to be negotiated (via SDP) before the actual transmission.
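As an illustration of what that negotiation carries, SDP describes each stream with an "a=rtpmap" attribute of the form payload-type, encoding name, and clock rate. The sketch below parses one such line; the struct and function names are hypothetical and not taken from the course code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one parsed "a=rtpmap" line. */
struct rtpmap {
    int  payload_type;   /* matches the PT field of the RTP header */
    char encoding[32];   /* e.g. "H264" */
    int  clock_rate;     /* timestamp clock in Hz, e.g. 90000 */
};

/* Parse a line such as "a=rtpmap:96 H264/90000".
 * Returns 0 on success, -1 if the line is not a valid rtpmap attribute. */
int sdp_parse_rtpmap(const char *line, struct rtpmap *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "a=rtpmap:%d %31[^/]/%d",
               &out->payload_type, out->encoding, &out->clock_rate) != 3)
        return -1;
    return 0;
}
```

For "a=rtpmap:96 H264/90000" this yields payload type 96, encoding "H264", and a 90000 Hz clock, which is the information a player needs before it can interpret the RTP packets.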

The rtp protocol is usually used in conjunction with the rtcp protocol.

How should RTP over UDP and RTP over RTSP (TCP) be understood?

==> RTP is often called a transport-layer protocol, but in practice it is implemented at the application layer, sitting just below the media application and on top of UDP or TCP.

==> RTP and RTCP can both be carried over UDP or TCP.

==> RTSP sets up multiple transmission channels and negotiates the ports used for RTP transmission.
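For the TCP case, RFC 2326 (section 10.12) lets RTP and RTCP packets be interleaved on the RTSP connection itself: each packet is prefixed with a '$' byte, a one-byte channel identifier, and a two-byte length in network byte order. The sketch below builds such a frame; the function name is made up for illustration and is not part of the test source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: wrap one RTP or RTCP packet in an RTSP
 * interleaved frame ('$' + channel + 16-bit length + data).
 * Returns the frame size, or 0 if the output buffer is too small. */
size_t rtsp_interleave_frame(uint8_t channel, const uint8_t *pkt, uint16_t pktlen,
                             uint8_t *out, size_t outcap)
{
    assert(pkt != NULL && out != NULL);
    if (outcap < (size_t)pktlen + 4)
        return 0;
    out[0] = 0x24;                      /* '$' marks an interleaved frame */
    out[1] = channel;                   /* channel agreed during SETUP */
    out[2] = (uint8_t)(pktlen >> 8);    /* length, high byte first */
    out[3] = (uint8_t)(pktlen & 0xFF);
    memcpy(out + 4, pkt, pktlen);
    return (size_t)pktlen + 4;
}
```

The length prefix is what makes TCP workable here: TCP is a byte stream without packet boundaries, so the receiver needs it to cut the stream back into individual RTP packets.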

2.2: Understanding the protocol

According to the Chinese translation of RFC 3550, RTP can be used for multicast audio conferences, audio and video conferences, mixers, translators, layered encodings, monitors, and so on.

RTCP is the RTP control protocol. It tracks participants joining and leaving a session, reports reception quality so that the available bandwidth can be estimated, and helps control the RTP sending rate.

==> RTCP itself consumes bandwidth, so the specification constrains how often it may be sent.

==> RTCP has different packet types: SR (sender report), RR (receiver report), SDES (source description items), BYE (end of participation), APP (application-defined functions).
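The packet types listed above have fixed numeric values in RFC 3550 (200 to 204), carried in the second byte of every RTCP packet. A minimal sketch of reading that byte follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RTCP packet type values defined by RFC 3550. */
enum rtcp_type {
    RTCP_SR   = 200,  /* sender report */
    RTCP_RR   = 201,  /* receiver report */
    RTCP_SDES = 202,  /* source description */
    RTCP_BYE  = 203,  /* goodbye */
    RTCP_APP  = 204   /* application-defined */
};

/* Return the packet type of an RTCP packet, or -1 if the buffer is
 * shorter than the 4-byte common header or the version is not 2. */
int rtcp_packet_type(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4 || (buf[0] >> 6) != 2)
        return -1;
    return buf[1];
}

/* Map a packet type value to a short name for logging. */
const char *rtcp_type_name(int pt)
{
    switch (pt) {
    case RTCP_SR:   return "SR";
    case RTCP_RR:   return "RR";
    case RTCP_SDES: return "SDES";
    case RTCP_BYE:  return "BYE";
    case RTCP_APP:  return "APP";
    default:        return "unknown";
    }
}
```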

2.2.1:rtp message format

[Figure: RTP message format (original image unavailable)]

2.2.2: Simple description of rtp message format

The first 12 bytes appear in every RTP packet; the CSRC identifier list is present only when inserted by a mixer.

Version (V): the RTP protocol version, 2 bits. The current version is 2.

Padding (P): the padding flag, 1 bit. If P=1, the packet ends with one or more additional padding bytes that are not part of the payload.

==> The last padding byte holds the number of padding bytes, including itself. Padding may be needed by encryption algorithms that use fixed block sizes, or when several RTP packets are carried in one lower-layer protocol data unit.
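To show the padding rule from the sending side, here is a minimal sketch that pads an already serialized packet up to a multiple of a given block size. It is an illustration under the assumption that the buffer starts with the RTP fixed header; the function name is made up and the course code does not contain it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pad a serialized RTP packet to a multiple of
 * `align` bytes. Returns the new length, or 0 if `cap` is too small. */
size_t rtp_add_padding(uint8_t *pkt, size_t len, size_t cap, size_t align)
{
    assert(pkt != NULL && len >= 12 && align >= 1 && align <= 255);
    size_t pad = (align - len % align) % align;
    if (pad == 0)
        return len;                     /* already aligned, P stays clear */
    if (len + pad > cap)
        return 0;
    memset(pkt + len, 0, pad - 1);      /* filler bytes */
    pkt[len + pad - 1] = (uint8_t)pad;  /* last byte = padding count, itself included */
    pkt[0] |= 0x20;                     /* set the P bit in the first header byte */
    return len + pad;
}
```

A 13-byte packet padded to a 4-byte boundary grows to 16 bytes, and its last byte becomes 3, which is exactly the value a receiver subtracts from the payload length.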

Extension (X): the extension bit, 1 bit. If X=1, the fixed header is followed by exactly one header extension. (The extension is 32-bit aligned; the widely used one-byte extension format of RFC 8285 starts with the identifier 0xBEDE.)

CSRC count (CC): 4 bits, the number of CSRC identifiers that follow the fixed header.

Marker (M): 1 bit; its meaning depends on the payload.

==> For video, it marks the end of a frame; for audio, it typically marks the start of a talkspurt.

Payload type (PT): 7 bits, identifies the type of payload in the RTP packet, such as GSM audio or JPEG images.

Sequence number: 16 bits, identifies each RTP packet sent by the sender and increases by 1 for every packet sent.

==> Receivers use the sequence number to detect lost packets, reorder packets, and restore the data. (The initial value is random.)
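Because the sequence number is only 16 bits, it wraps from 65535 back to 0, so a plain "greater than" comparison breaks at the wrap. The sketch below shows one common way to compare and count gaps with modulo arithmetic; it is not taken from the test source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if sequence number a is newer than b, treating a
 * forward distance of less than half the 16-bit range as "newer". */
int rtp_seq_newer(uint16_t a, uint16_t b)
{
    uint16_t d = (uint16_t)(a - b);     /* distance modulo 65536 */
    return d != 0 && d < 0x8000;
}

/* Number of packets missing between two received packets,
 * where cur must be newer than prev. */
uint16_t rtp_seq_lost(uint16_t prev, uint16_t cur)
{
    assert(rtp_seq_newer(cur, prev));
    return (uint16_t)(cur - prev - 1);
}
```

For example, receiving 65534 followed by 1 means packets 65535 and 0 are missing, so the gap is 2 even though 1 is numerically smaller than 65534.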

Timestamp: 32 bits, reflects the sampling instant of the first byte of the RTP payload.

==> The receiver uses the timestamp to calculate delay and delay jitter and to synchronize playback.

==> The clock frequency depends on the payload format and is specified in the profile or payload format description.

==> If RTP packets are generated periodically, the nominal sampling instant from the sampling clock is used rather than a reading of the system clock. For fixed-rate audio, the timestamp clock increases by 1 per sampling period: if an audio application reads blocks of 160 samples from the input device, the timestamp increases by 160 for each block.

==> The initial value of the timestamp should be random, as for the sequence number. Several consecutive RTP packets have the same timestamp if they are generated at the same instant, for example when they belong to the same video frame.

==> If the data being transmitted is stored rather than sampled in real time, a virtual presentation timeline derived from a reference clock is used.
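The timestamp rules above boil down to counting ticks of the media clock. The sketch below works through that arithmetic; it assumes an 8000 Hz clock for the audio example (which gives the 160-tick step mentioned above for 20 ms blocks) and the 90000 Hz clock commonly used for video. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a duration in milliseconds to timestamp ticks. */
uint32_t rtp_ts_from_ms(uint32_t clock_rate, uint32_t ms)
{
    return (uint32_t)(((uint64_t)clock_rate * ms) / 1000);
}

/* Timestamp step between consecutive video frames at a fixed frame rate. */
uint32_t rtp_video_ts_step(uint32_t clock_rate, uint32_t fps)
{
    assert(fps > 0);
    return clock_rate / fps;
}

/* Advance a timestamp; unsigned arithmetic wraps modulo 2^32,
 * which is the behavior RTP timestamps need. */
uint32_t rtp_ts_next(uint32_t ts, uint32_t step)
{
    return ts + step;
}
```

At 8000 Hz a 20 ms audio block advances the timestamp by 160, and 25 fps video on a 90000 Hz clock advances it by 3600 per frame. All packets that carry fragments of the same frame reuse the same timestamp and differ only in sequence number.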

Synchronization source (SSRC): 32 bits, identifies the synchronization source.

==> All packets with the same SSRC identifier are processed together.

==> All packets from one synchronization source share the same timing and sequence number space, so the receiver groups packets by synchronization source for playback.

==> The identifier is chosen randomly, and two synchronization sources in the same session must not share an SSRC (collisions have to be resolved).

==> Microphones, cameras, and RTP mixers are examples of synchronization sources.

==> A synchronization source may change its data format over time, for example its audio encoding.

Contributing sources (CSRC): each CSRC identifier is 32 bits, and there can be 0 to 15 of them (stored as a list).

==> A contributing source is a source whose stream was combined by a mixer into the packet.

==> The number of identifiers is given by the CSRC count (CC).

==> CSRC list: identifies all the contributing sources for the payload in this RTP packet.

==> It is inserted by the mixer and lists the SSRCs of all the sources that were mixed.

==> For example, in an audio conference it identifies which speakers were mixed into a packet, so the listener can tell who is talking.

2.2.3:rtp header definition

typedef struct _rtp_header_t
{
    uint32_t v:2;		/* version, 2 bits, currently 2 */
    uint32_t p:1;		/* padding flag, 1 bit; used by some encryption schemes or when several RTP packets share one lower-layer unit */
    uint32_t x:1;		/* extension flag, 1 bit; a header extension with a fixed, 32-bit-aligned format follows */
    uint32_t cc:4;		/* CSRC count, 4 bits; number of contributing sources */
    uint32_t m:1;		/* marker, 1 bit; end of frame for video, start of talkspurt for audio */
    uint32_t pt:7;		/* payload type, 7 bits, e.g. GSM audio, JPEG image */
    uint32_t seq:16;	/* sequence number, 16 bits; used to detect loss, reorder, and restore data */
    uint32_t timestamp; /* timestamp, 32 bits; used for delay and jitter control */
    uint32_t ssrc;		/* synchronization source, 32 bits; packets with the same SSRC are processed together */
    					/* CSRC entries (32 bits each, inserted by a mixer) are not included here */
} rtp_header_t;

2.3: Understand the code

For any protocol, there are a few things to understand:

1: Define the structures according to the protocol

2: Construct protocol messages (serialize)

3: Parse protocol messages (deserialize)

The following is a brief walk-through of these points based on the test source:

2.3.1: Header structure

// Header structure
typedef struct _rtp_header_t
{
    uint32_t v:2;       /* protocol version */
    uint32_t p:1;       /* padding flag */
    uint32_t x:1;       /* header extension flag */
    uint32_t cc:4;      /* CSRC count */
    uint32_t m:1;       /* marker bit */
    uint32_t pt:7;      /* payload type */
    uint32_t seq:16;    /* sequence number */
    uint32_t timestamp; /* timestamp */
    uint32_t ssrc;      /* synchronization source */
} rtp_header_t;


struct rtp_packet_t     // An RTP packet: header + [csrc/extension] + payload
{
    rtp_header_t rtp;
    uint32_t csrc[16];      // CSRC list (CC is 4 bits, so at most 15 entries are used)
    const void* extension; // extension(valid only if rtp.x = 1)
    uint16_t extlen; // extension length in bytes
    uint16_t reserved; // extension reserved
    const void* payload; //  rtp payload
    int payloadlen; // payload length in bytes
};

2.3.2: Construct rtp packets to send

The functions here take an already populated RTP packet structure and only serialize it for sending; the rest of the logic comes later.

// Write the fields of an rtp_header_t structure to ptr in network byte order
static inline void nbo_write_rtp_header(uint8_t *ptr, const rtp_header_t *header)
{
    ptr[0] = (uint8_t)((header->v << 6) | (header->p << 5) | (header->x << 4) | header->cc);
    ptr[1] = (uint8_t)((header->m << 7) | header->pt);
    ptr[2] = (uint8_t)(header->seq >> 8);
    ptr[3] = (uint8_t)(header->seq & 0xFF);

    nbo_w32(ptr+4, header->timestamp);
    nbo_w32(ptr+8, header->ssrc);
}

// Serialize the header part of an RTP packet (fixed header + CSRC list + extension) into data
int rtp_packet_serialize_header(const struct rtp_packet_t *pkt, void* data, int bytes)
{
    int hdrlen;
    uint32_t i;
    uint8_t* ptr;

    if (RTP_VERSION != pkt->rtp.v || 0 != (pkt->extlen % 4))
    {
        assert(0); // RTP version field must equal 2 (p66)
        return -1;
    }

    // RFC3550 5.1 RTP Fixed Header Fields(p12)
    hdrlen = RTP_FIXED_HEADER + pkt->rtp.cc * 4 + (pkt->rtp.x ? 4 : 0);
    if (bytes < hdrlen + pkt->extlen)
        return -1;

    ptr = (uint8_t *)data;
    //Write rtp_header_t-related data including timestamps and ssrc
    nbo_write_rtp_header(ptr, &pkt->rtp);
    ptr += RTP_FIXED_HEADER;

    // pkt contributing source
    //Write csrc
    for (i = 0; i < pkt->rtp.cc; i++, ptr += 4)
    {
        nbo_w32(ptr, pkt->csrc[i]);     // csrc list encapsulated to the head
    }

    // pkt header extension
    //If there is an extension flag, write it behind the rtp header
    if (1 == pkt->rtp.x)
    {
        // 5.3.1 RTP Header Extension
        assert(0 == (pkt->extlen % 4));
        nbo_w16(ptr, pkt->reserved);
        nbo_w16(ptr + 2, pkt->extlen / 4);
        memcpy(ptr + 4, pkt->extension, pkt->extlen);   // extension encapsulated to the head
        ptr += pkt->extlen + 4;
    }

    return hdrlen + pkt->extlen;
}

// Serialize the whole packet (header + payload) into data, ready to be sent
int rtp_packet_serialize(const struct rtp_packet_t *pkt, void* data, int bytes)
{
    int hdrlen;

    //Write rtp header data to data
    hdrlen = rtp_packet_serialize_header(pkt, data, bytes);
    if (hdrlen < RTP_FIXED_HEADER || hdrlen + pkt->payloadlen > bytes)
        return -1;

    //Write the actual payload to data
    memcpy(((uint8_t*)data) + hdrlen, pkt->payload, pkt->payloadlen);
    //Returns the actual size of the entire data
    return hdrlen + pkt->payloadlen;
}

2.3.3: Parse incoming rtp packets

// Parse a received RTP packet; pay attention to how the padding and the flag bits are handled
/*
 0               1               2               3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|   CC  |M|     PT      |      sequence number          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                synchronization source (SSRC) identifier       |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 contributing source (CSRC) identifiers        |
|                               ....                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*/
// Deserialize the received data into a parsed RTP packet; bytes is the number of bytes received
int rtp_packet_deserialize(struct rtp_packet_t *pkt, const void* data, int bytes)
{
    uint32_t i, v;
    int hdrlen;
    const uint8_t *ptr;

    if (bytes < RTP_FIXED_HEADER) // RFC3550 5.1 RTP Fixed Header Fields(p12)
        return -1;
    ptr = (const unsigned char *)data;
    memset(pkt, 0, sizeof(struct rtp_packet_t));

    // pkt header, read in network byte order
    v = nbo_r32(ptr);   // The first 4 bytes, read as one uint32
    pkt->rtp.v = RTP_V(v);
    pkt->rtp.p = RTP_P(v);
    pkt->rtp.x = RTP_X(v);
    pkt->rtp.cc = RTP_CC(v);
    pkt->rtp.m = RTP_M(v);
    pkt->rtp.pt = RTP_PT(v);
    pkt->rtp.seq = RTP_SEQ(v);
    pkt->rtp.timestamp = nbo_r32(ptr + 4);  // The next four bytes: timestamp
    pkt->rtp.ssrc = nbo_r32(ptr + 8);       // SSRC
    assert(RTP_VERSION == pkt->rtp.v);      // Debug-build check only

    hdrlen = RTP_FIXED_HEADER + pkt->rtp.cc * 4;    // Header length including the CSRC list
    // Verify the version, then check the length against the header, extension, and padding flags of the rtp header
    if (RTP_VERSION != pkt->rtp.v || bytes < hdrlen + (pkt->rtp.x ? 4 : 0) + (pkt->rtp.p ? 1 : 0))
        return -1;      // Report an error

    // pkt contributing source
    // Read the CSRC list if there are contributing sources
    for (i = 0; i < pkt->rtp.cc; i++)
    {
        pkt->csrc[i] = nbo_r32(ptr + 12 + i * 4);
    }

    assert(bytes >= hdrlen);
    pkt->payload = (uint8_t*)ptr + hdrlen;      // Skip the header to reach the payload
    pkt->payloadlen = bytes - hdrlen;           // payload length

    // pkt header extension
    // If the extension flag is set
    if (1 == pkt->rtp.x)
    {
        const uint8_t *rtpext = ptr + hdrlen;
        assert(pkt->payloadlen >= 4);
        // The RTP header extension has its own fixed format
        pkt->extension = rtpext + 4; // Extension data starts after the 4-byte extension header
        pkt->reserved = nbo_r16(rtpext); // First 16 bits: profile-defined identifier
        pkt->extlen = nbo_r16(rtpext + 2) * 4; // Next 16 bits: length in 32-bit words, converted to bytes
        if (pkt->extlen + 4 > pkt->payloadlen)
        {
            assert(0);
            return -1;
        }
        else
        {
            pkt->payload = rtpext + pkt->extlen + 4;
            pkt->payloadlen -= pkt->extlen + 4;
        }
    }

    // If the padding flag is set, the last byte gives the padding length
    if (1 == pkt->rtp.p)
    {
        uint8_t padding = ptr[bytes - 1];
        if (pkt->payloadlen < padding)
        {
            assert(0);
            return -1;
        }
        else
        {
            pkt->payloadlen -= padding;
        }
    }

    return 0;
}
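The listings above rely on several helpers and macros (nbo_r16, nbo_r32, nbo_w16, nbo_w32, RTP_V and friends, RTP_VERSION, RTP_FIXED_HEADER) that are not shown in this post. The sketch below is one plausible set of definitions that is consistent with how they are used above; the names come from the listings, but the bodies are an assumption and may differ from the actual test source.

```c
#include <assert.h>
#include <stdint.h>

#define RTP_VERSION      2    /* value checked against the V field */
#define RTP_FIXED_HEADER 12   /* fixed header size in bytes */

/* Field extraction from the first 32-bit word, already in host order. */
#define RTP_V(v)   (((v) >> 30) & 0x03)   /* version */
#define RTP_P(v)   (((v) >> 29) & 0x01)   /* padding flag */
#define RTP_X(v)   (((v) >> 28) & 0x01)   /* extension flag */
#define RTP_CC(v)  (((v) >> 24) & 0x0F)   /* CSRC count */
#define RTP_M(v)   (((v) >> 23) & 0x01)   /* marker */
#define RTP_PT(v)  (((v) >> 16) & 0x7F)   /* payload type */
#define RTP_SEQ(v) ((v) & 0xFFFF)         /* sequence number */

/* Read and write big-endian (network byte order) integers byte by byte,
 * so the result does not depend on host endianness or alignment. */
static inline uint16_t nbo_r16(const uint8_t *ptr)
{
    return (uint16_t)((ptr[0] << 8) | ptr[1]);
}

static inline uint32_t nbo_r32(const uint8_t *ptr)
{
    return ((uint32_t)ptr[0] << 24) | ((uint32_t)ptr[1] << 16) |
           ((uint32_t)ptr[2] << 8)  |  (uint32_t)ptr[3];
}

static inline void nbo_w16(uint8_t *ptr, uint16_t val)
{
    ptr[0] = (uint8_t)(val >> 8);
    ptr[1] = (uint8_t)(val & 0xFF);
}

static inline void nbo_w32(uint8_t *ptr, uint32_t val)
{
    ptr[0] = (uint8_t)(val >> 24);
    ptr[1] = (uint8_t)(val >> 16);
    ptr[2] = (uint8_t)(val >> 8);
    ptr[3] = (uint8_t)(val & 0xFF);
}
```

Reading the wire bytes explicitly like this, rather than casting the buffer to rtp_header_t, matters because the layout of C bit-fields is implementation-defined and would not reliably match the on-wire bit order.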

3: Summary and Next Step

According to the documents I read, RTP is described as a transport-layer protocol that runs over UDP. However, as I understand it, RTP can sometimes also be carried over TCP, which is a doubt left for later.

Next step:

How does RTP work with media formats such as H.264 and AAC? Walk through the process of reading an H.264 file and pushing it.

RTCP has several message types (SR, RR, SDES, BYE, APP). Go through what each one does and the control role RTCP plays in the overall flow.

Analyze the RTP test source and work through how RTP transmits H.264 and AAC.


Topics: rtp