[Implementing an H.264 stream parser from zero]: Importing Exp-Golomb decoding and implementing preliminary NALU parsing

Posted by kcgame on Thu, 22 Aug 2019 08:33:07 +0200

Link to the original text: https://mp.weixin.qq.com/s/LMQKMyt8q4fkittRPSoCaA

In the previous article we located each NALU in the stream. Starting with this article, we will gradually build a framework for parsing NALUs. The core tasks of this article are:

  • 1. Implement nal_to_rbsp(), that is, extract the RBSP from a NALU
  • 2. Implement rbsp_to_sodb(), that is, locate the trailing bits of the RBSP to obtain the SODB length
  • 3. Import the Exp-Golomb decoding code already implemented in the [h264/avc Syntax and Semantic Explanation] series

The content is simple. To tidy things up first, let's restructure the previous project a little.

(1) File names

We renamed main.c to decode.c, h264_nal.h to nalu.h, and h264_nal.c to nalu.c.

(2) Encapsulating the file open and read operations

In the previous article, the code that opens and reads the h264 file lived in main.c. We now create two new files, stream.h and stream.c, to encapsulate these stream operations (see the source code linked at the end of this article).

The interface has two steps, read and release, where read also performs the open. The buffer holding the h264 file contents is defined as the global variable file_buff. In decode.c, reading the file therefore takes a single call:

// Read h264 file
int buff_size = readAnnexbBitStreamFile("silent_cif_baseline_5_frames.h264");
printf("totalSize: %d\n", buff_size);
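
The stream.c listing itself is not reproduced in this article. As a rough illustration only, a wrapper of this kind might look like the sketch below. The function names read_stream_file_sketch() and free_stream_buffer_sketch() are hypothetical stand-ins, not the project's actual readAnnexbBitStreamFile() and freeFilebuffer(), and the error handling (returning -1) is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Global buffer holding the whole Annex B stream (name taken from the article). */
static uint8_t *file_buff = NULL;

/* Hypothetical stand-in for readAnnexbBitStreamFile(): opens the file,
   reads it entirely into file_buff and returns its size in bytes,
   or -1 on any failure. Assumes the file size fits in an int. */
static int read_stream_file_sketch(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Determine the file size by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return -1; }
    rewind(fp);

    uint8_t *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(fp); return -1; }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return -1; }

    free(file_buff);   /* drop any previously loaded stream */
    file_buff = buf;
    return (int)size;
}

/* Hypothetical stand-in for freeFilebuffer(): releases the global buffer. */
static void free_stream_buffer_sketch(void)
{
    free(file_buff);
    file_buff = NULL;
}
```

Keeping open, size query and read inside one function is what lets the caller load the stream with a single call, at the cost of holding the whole file in memory.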

(3) Introducing the nalu_t structure

In the previous article, every call to find_nal_unit() copied the NALU data into the global variable uint8_t nalu_buf[1024*1024], which is inconvenient for later processing. So we drop that variable and introduce the nalu_t structure; the data of each NALU read is stored in nalu->buf.

We will also parse the three syntax elements of nalu_header later, so the structure can be defined as follows:

/**
Network Abstraction Layer (NAL) unit
@see 7.3.1 NAL unit syntax
*/
typedef struct
{
   // nal header
   int forbidden_zero_bit;                     // f(1)
   int nal_ref_idc;                            // u(2)
   int nal_unit_type;                          // u(5)
   int len;                // Save nalu_size first, then rbsp_size, and finally the length of SODB
   uint8_t *buf;
} nalu_t;

The value of nalu->len changes as parsing proceeds, as we will see below.

Combining the file_buff global variable from step (2) with the nalu_t structure from step (3), we slightly modify the find_nal_unit() function implemented earlier so that the data of each NALU read is stored in nalu->buf for later parsing. The former main.c, now decode.c, then looks like this:

int main(int argc, const char * argv[]) {
   // 0. Read h264 file
   int buff_size = readAnnexbBitStreamFile("silent_cif_baseline_5_frames.h264");
   printf("totalSize: %d\n", buff_size);
   
   // 1. Allocate a nalu_t to hold nalu_header and SODB
   nalu_t *nalu = allocNalu(MAX_NALU_SIZE);
   
   int curr_nal_start = 0;  // nalu start location currently found
   int curr_find_index = 0; // Location index of current lookup
   
   // 2. Find each nalu in the h264 stream
   while ((nalu->len = find_nal_unit(nalu, buff_size, &curr_nal_start, &curr_find_index)) > 0) {
   }
   freeNalu(nalu);
   freeFilebuffer();
   return 0;
}

allocNalu() simply allocates the nalu_t structure, so its code is not pasted here.
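
For readers who want something concrete, here is a minimal sketch of what such an allocator and its matching release function could look like. The names alloc_nalu_sketch() and free_nalu_sketch() are hypothetical, and zero-initializing the structure and buffer is an assumption rather than the project's confirmed behaviour.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the nalu_t shown in the article. */
typedef struct
{
    int forbidden_zero_bit;   /* f(1) */
    int nal_ref_idc;          /* u(2) */
    int nal_unit_type;        /* u(5) */
    int len;
    uint8_t *buf;
} nalu_t;

/* Assumed behaviour: allocate the struct plus a zeroed data buffer of
   buffer_size bytes; return NULL if the size is invalid or an
   allocation fails. */
nalu_t *alloc_nalu_sketch(int buffer_size)
{
    if (buffer_size <= 0)
        return NULL;

    nalu_t *nalu = calloc(1, sizeof(*nalu));
    if (nalu == NULL)
        return NULL;

    nalu->buf = calloc((size_t)buffer_size, sizeof(uint8_t));
    if (nalu->buf == NULL) {
        free(nalu);
        return NULL;
    }
    return nalu;
}

/* Release the data buffer first, then the struct itself. */
void free_nalu_sketch(nalu_t *nalu)
{
    if (nalu == NULL)
        return;
    free(nalu->buf);
    free(nalu);
}
```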

Now on to the main content of this article.

1. Implement nal_to_rbsp()

The code is implemented as follows:

/**
Remove the emulation prevention byte 0x03 from the NALU
@see 7.3.1 NAL unit syntax
@see 7.4.1.1 Encapsulation of an SODB within an RBSP
@return Size of the NALU after the 0x03 bytes are removed
*/
int nal_to_rbsp(nalu_t *nalu)
{
   int nalu_size = nalu->len;
   int j = 0;
   int count = 0;
   // When encountering 0x000003, remove 03, including the case where the end of cabac_zero_word is 0x000003
   for (int i = 0; i < nalu_size; i++)
   {
       if (count == 2 && nalu->buf[i] == 0x03)
       {
           if (i == nalu_size - 1) // Ending at 0x000003
           {
               break; // Break
           }
           else
           {
               i++; // Go on to the next one
               count = 0;
           }
       }
       nalu->buf[j] = nalu->buf[i];
       if (nalu->buf[i] == 0x00)
       {
           count++;
       }
       else
       {
           count = 0;
       }
       
       j++;
   }
   return j;
}

Note that at this point nalu->buf holds all the data of the current NALU. Converting the NALU to an RBSP therefore means removing the 03 of every 0x000003 sequence in nalu->buf, writing the result back into nalu->buf, and obtaining a new nalu->len: the length of the RBSP after the 0x03 bytes are removed, plus the 1-byte nalu_header.

So we traverse nalu->buf byte by byte, nalu->len times, looking for 0x000003. The loop variable i serves two purposes:

  • (1) Scanning byte by byte
  • (2) Controlling what is written back to nalu->buf: when 0x000003 is found, i skips one byte so that it is never copied, which removes the 0x03. The separate index j tracks the write position.

The variable count detects 0x000003. It is incremented on each 0x00 byte and reset to zero otherwise; once two consecutive zero bytes have been seen, the next byte is checked for 0x03.
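
To make the counting logic easier to follow, here is a small standalone sketch of the same idea that works on a plain byte array instead of nalu_t. The function name strip_emulation_prevention() is hypothetical and the code is an illustration, not the project's implementation.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies src to dst while dropping every 0x03 that directly follows two
   zero bytes. Returns the number of bytes written. dst may be the same
   buffer as src, because the write index never overtakes the read index. */
size_t strip_emulation_prevention(const uint8_t *src, size_t n, uint8_t *dst)
{
    size_t j = 0;    /* write position */
    int zeros = 0;   /* consecutive 0x00 bytes seen so far */

    for (size_t i = 0; i < n; i++) {
        uint8_t b = src[i];
        if (zeros == 2 && b == 0x03) {
            zeros = 0;   /* drop the 0x03 and restart the count */
            continue;
        }
        zeros = (b == 0x00) ? zeros + 1 : 0;
        dst[j++] = b;
    }
    return j;
}
```

Given the 8 bytes 67 00 00 03 01 00 00 03, the function keeps 67 00 00 01 00 00 and returns 6: both 0x03 bytes, including the one at the very end, are dropped.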

2. Implementing rbsp_to_sodb()

The code is implemented as follows:

/**
Calculate the length of SODB
[Note] RBSP = SODB + trailing_bits
*/
int rbsp_to_sodb(nalu_t *nalu)
{
   int ctr_bit, bitoffset, last_byte_pos;
   bitoffset = 0;
   last_byte_pos = nalu->len - 1;
   
   // 0. Start with the lowest bit of the last byte of nalu->buf
   ctr_bit = (nalu->buf[last_byte_pos] & (0x01 << bitoffset));
   
   // 1. Loop to find rbsp_stop_one_bit in trailing_bits
   while (ctr_bit == 0)
   {
       bitoffset++;
       if(bitoffset == 8)
       {
           // nalu->buf holds nalu_header + RBSP, so once the first RBSP byte (index 1) is exhausted, the whole RBSP has been searched
           if(last_byte_pos == 1)
               printf(" Panic: All zero data sequence in RBSP \n");
           assert(last_byte_pos != 1);
           last_byte_pos -= 1;
           bitoffset = 0;
       }
       // Test the next bit of the current byte
       ctr_bit = nalu->buf[last_byte_pos] & (0x01 << bitoffset);
   }
   // [Note] last_byte_pos started at nalu->len - 1, so it already excludes the 1-byte nalu_header: it is the SODB size in bytes, which the caller assigns to nalu->len.
   return last_byte_pos;
}

Note that in this process we do not rewrite nalu->buf. When extracting the RBSP from a NALU, 0x000003 may appear anywhere in the data, so removing the 0x03 bytes changes the buffer contents. Extracting the SODB is different: it only drops the trailing bits of the RBSP, which by definition sit at the very end, so we only need to locate them and recompute nalu->len.

So the task is to find the RBSP trailing bits. The key is rbsp_stop_one_bit, a single bit with value 1 at the start of the trailing bits. Everything before it is the SODB; the stop bit and the zero bits after it are the trailing bits.

So we scan nalu->buf bit by bit, starting from the lowest bit of the last byte and moving backward. The search ends at the first bit whose value is 1. The byte containing rbsp_stop_one_bit is then treated as the last byte of the SODB, and the last_byte_pos value of that byte is the length of the SODB in bytes.
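
The backward search can also be written as two nested loops. The sketch below is a hypothetical helper, find_rbsp_stop_bit(), which unlike rbsp_to_sodb() takes the RBSP bytes without the NAL header and additionally reports the bit position of rbsp_stop_one_bit.

```c
#include <stdint.h>

/* rbsp points at the RBSP bytes only (no NAL header).
   Returns the number of bytes up to and including the byte that holds
   rbsp_stop_one_bit, and stores that bit's position in *stop_bit
   (0 = least significant bit). Returns -1 if every bit is zero. */
int find_rbsp_stop_bit(const uint8_t *rbsp, int len, int *stop_bit)
{
    for (int pos = len - 1; pos >= 0; pos--) {   /* last byte first */
        for (int bit = 0; bit < 8; bit++) {      /* lowest bit first */
            if (rbsp[pos] & (1u << bit)) {
                *stop_bit = bit;
                return pos + 1;
            }
        }
    }
    return -1;
}
```

For the two bytes AB C8, the last byte is 1100 1000 in binary, so the stop bit is bit 3 and the function returns 2. The SODB then holds 8 bits from the first byte plus the 4 bits above the stop bit, 12 bits in total.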

3. Import Exp-Golomb decoding as bs.h

In the [h264/avc Syntax and Semantic Explanation] series, we split the Exp-Golomb code into .h and .c files. Here it is slightly modified and implemented entirely in the .h file: these functions are called frequently throughout decoding, so they are defined as inline functions.
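
As a reminder of what such a header contains, here is a minimal sketch of an MSB-first bit reader with u(n), ue(v) and se(v) decoding. The type bitreader_t and the br_* function names are hypothetical; the project's bs_t and bs_* functions may be organized differently.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical minimal bit reader; the article's bs_t may differ. */
typedef struct {
    const uint8_t *data;
    size_t size;      /* buffer size in bytes */
    size_t bit_pos;   /* index of the next bit to read, MSB first */
} bitreader_t;

/* Read one bit; reading past the end of the buffer yields 0. */
static inline uint32_t br_read_u1(bitreader_t *br)
{
    uint32_t bit = 0;
    size_t byte = br->bit_pos >> 3;
    if (byte < br->size)
        bit = (br->data[byte] >> (7 - (br->bit_pos & 7))) & 1u;
    br->bit_pos++;
    return bit;
}

/* u(n): n-bit unsigned integer, most significant bit first (n <= 32). */
static inline uint32_t br_read_u(bitreader_t *br, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 1) | br_read_u1(br);
    return v;
}

/* ue(v): count the leading zero bits, skip the 1 bit, then
   codeNum = 2^zeros - 1 + (the next `zeros` bits). */
static inline uint32_t br_read_ue(bitreader_t *br)
{
    int zeros = 0;
    while (br_read_u1(br) == 0 && zeros < 31)   /* cap guards malformed data */
        zeros++;
    return ((1u << zeros) - 1u) + br_read_u(br, zeros);
}

/* se(v): odd codeNum k maps to +(k + 1) / 2, even k maps to -(k / 2). */
static inline int32_t br_read_se(bitreader_t *br)
{
    uint32_t k = br_read_ue(br);
    return (k & 1u) ? (int32_t)((k + 1u) / 2u) : -(int32_t)(k / 2u);
}
```

For example, the bit string 1 010 011 00100 00101, packed into the bytes A6 42 80, decodes with br_read_ue() as 0, 1, 2, 3, 4 and with br_read_se() as 0, 1, -1, 2, -2.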

At this point, we can use the bs_t structure to parse the three syntax elements of nalu_header from nalu->buf:

// Initialize the bit reader handle
bs_t *bs = bs_new(nalu->buf, nalu->len);
// Read nal header 7.3.1
nalu->forbidden_zero_bit = bs_read_u(bs, 1);
nalu->nal_ref_idc = bs_read_u(bs, 2);
nalu->nal_unit_type = bs_read_u(bs, 5);
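
Because the NAL header is exactly one byte, the three reads above can be cross-checked by hand with shifts and masks. The following sketch uses a hypothetical helper, split_nal_header(), that is not part of the project.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of the one-byte NAL header. */
typedef struct {
    int forbidden_zero_bit;   /* bit 7      */
    int nal_ref_idc;          /* bits 6..5  */
    int nal_unit_type;        /* bits 4..0  */
} nal_header_fields_t;

/* Split the header byte with shifts and masks instead of a bit reader. */
nal_header_fields_t split_nal_header(uint8_t b)
{
    nal_header_fields_t h;
    h.forbidden_zero_bit = (b >> 7) & 0x01;
    h.nal_ref_idc        = (b >> 5) & 0x03;
    h.nal_unit_type      =  b       & 0x1F;
    return h;
}
```

The byte 0x67 is 0110 0111 in binary, which gives forbidden_zero_bit = 0, nal_ref_idc = 3 and nal_unit_type = 7, an SPS. Likewise 0x68 yields type 8 (PPS) and 0x65 yields type 5 (IDR slice).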

4. Integration

Combining steps 1, 2 and 3, we can parse nalu->buf as follows after each find_nal_unit() call:

/**
Read a nalu
@see 7.3.1 NAL unit syntax
@see 7.4.1 NAL unit semantics
*/
void read_nal_unit(nalu_t *nalu)
{
   // 1. Remove emulation_prevention_three_byte:0x03 from nalu
   nalu->len = nal_to_rbsp(nalu);
   
   // 2. Initialize the bit reader handle
   bs_t *bs = bs_new(nalu->buf, nalu->len);
   
   // 3. Read nal header 7.3.1
   nalu->forbidden_zero_bit = bs_read_u(bs, 1);
   nalu->nal_ref_idc = bs_read_u(bs, 2);
   nalu->nal_unit_type = bs_read_u(bs, 5);
   
   switch (nalu->nal_unit_type)
   {
       case H264_NAL_SPS:
           nalu->len = rbsp_to_sodb(nalu);
           break;
           
       case H264_NAL_PPS:
           nalu->len = rbsp_to_sodb(nalu);
           break;
           
       case H264_NAL_SLICE:
       case H264_NAL_IDR_SLICE:
           nalu->len = rbsp_to_sodb(nalu);
           break;
           
       case H264_NAL_DPA:
           nalu->len = rbsp_to_sodb(nalu);
           break;
           
       case H264_NAL_DPB:
           nalu->len = rbsp_to_sodb(nalu);
           break;
           
       case H264_NAL_DPC:
           nalu->len = rbsp_to_sodb(nalu);
           break;
           
       default:
           break;
   }
   
   bs_free(bs);
}

The enumeration values for nalu->nal_unit_type follow the H.264 specification and the JM and FFmpeg implementations. Several commonly used ones are defined here:

/* 7.4.1 Table 7-1 NAL unit types */
enum nal_unit_type {
   H264_NAL_UNKNOWN         = 0,
   H264_NAL_SLICE           = 1,
   H264_NAL_DPA             = 2,
   H264_NAL_DPB             = 3,
   H264_NAL_DPC             = 4,
   H264_NAL_IDR_SLICE       = 5,
   H264_NAL_SEI             = 6,
   H264_NAL_SPS             = 7,
   H264_NAL_PPS             = 8,
   H264_NAL_AUD             = 9,
   H264_NAL_END_SEQUENCE    = 10,
   H264_NAL_END_STREAM      = 11,
   H264_NAL_FILLER_DATA     = 12,
   H264_NAL_SPS_EXT         = 13,
   H264_NAL_AUXILIARY_SLICE = 19,
};

Finally, to simplify the parsing work that follows, I switched to a new h264 test file: it is encoded with the Baseline profile, uses only I frames, and is trimmed to just 5 frames. In the subsequent decoding we only need to handle Baseline I frames at first, and can switch to other material when needed.

The source code address for this article is as follows (in H264Analysis_02):

GitHub: https://github.com/Gosivn/H264Analysis
