Analysis of JPEG principle and debugging of JPEG decoder

Posted by sunwukung on Mon, 31 Jan 2022 20:03:36 +0100

1, Experimental purpose

Master the basic principle of JPEG codec system. Preliminarily master the implementation of complex data compression algorithm, and be able to output the corresponding data according to the needs of theoretical analysis.

2, Experimental content

1. JPEG codec principle

JPEG is the most commonly used image file format. It mainly adopts the joint coding method of predictive coding (DPCM), discrete cosine transform (DCT) and entropy coding to remove redundant image and color data. It belongs to lossy compression format. It can compress the image in a small storage space, which will damage the image data to a certain extent. In particular, the use of too high compression ratio will reduce the quality of the restored image after final decompression. If you pursue high-quality images, it is not appropriate to use too high compression ratio.

Zero offset Level Offset:
U. The V component is positive and negative. Subtract 128 from the Y component to become positive and negative.
For pixels with a gray level of 2n, the unsigned integer value is changed into a signed number by subtracting 2n-1.
For n=8, that is, the value range from 0 to 255 is converted into a value with a value range between - 128 to 127 by subtracting 128.
Objective: to greatly increase the probability of 3-digit decimal system in the absolute value of pixels
DCT transformation:
For each individual color image component, the whole component image is divided into 8 * 8 image blocks and used as the input of two-dimensional discrete cosine transform DCT
quantification:
Because human eyes are more sensitive to brightness signals than color difference signals, two quantization tables are used: brightness quantization value and color difference quantization value
According to the visual characteristics of human eyes (sensitive to low frequency and insensitive to high frequency), fine quantization is adopted for low-frequency components and coarse quantization is adopted for high-frequency parts. This will introduce errors. If the original image is rich in detail, more data will be removed.
Differential coding of DC coefficients:
The DC coefficient of 8 * 8 image blocks after DCT transformation has two characteristics. The value of the coefficient is large, and the DC coefficient of adjacent 8 * 8 image blocks changes little: redundancy
According to this characteristic, JPEG algorithm adopts DCMP (differential pulse modulation coding) to encode the difference DIFF of quantized DC coefficients between adjacent image blocks.
Z-scan of AC coefficient:
After DCT transformation, most of the coefficients are concentrated in the upper left corner, that is, the low-frequency component area. Therefore, the Z-shape is used to read out in the order of high and low frequency, and even zeros can appear. Run length coding can be used. If they are all 0, EOB can be given directly
Huffman code:

Huffman coding is used for the results of DC coefficient DPCM and AC coefficient RLE. The category ID is encoded by unary code, and the intra category index is encoded by fixed length code.
Take the DC coefficient as an example. For example, if the difference DIFF = 3, the corresponding category ID = 2, and the intra category index = 3, the codeword is 100 11.
There are four Huffman coding tables: brightness DC, brightness AC, color difference DC and color difference AC.
Decoding is the inverse of encoding

2. Analysis of JPEG file format

JPEG is composed of segments, and each segment starts with a marker. Makers start with 0xFF, followed by a 1-byte tag identifier, a 2-byte tag length, and the payload corresponding to the tag

The high-order part of the tag length is in the front and the low-order part is in the back. It does not contain the first two bytes of the tag
The data of entropy coding part is inserted into 0x00 by the encoder after 0xFF. The decoder skips this byte during decoding and will not process it
SOI (start of image) and EOI (end of image) tags have no payload
It must start with 0xFFD8, which means that the image starts SOI
It must end with 0xFFD9, indicating that the image ends EOI

Use the following picture as the test image

SOI and EOI
APP0
Length of APP0 block: 16 bytes (excluding 0xFFE0)
Quantization table DQT, value 0xDB
Generally, there are two quantization tables, one for brightness and one for chroma
Starting with 0xFFDB, the length of the quantization table is generally 0043 (or 0084), and this picture is 0043.
Quantization table information (1 byte)
Bit 0~3 QT number (only 0 ~ 3 can be taken, otherwise it will be wrong)
Bit 4~7 QT accuracy (0 is 8bit, otherwise 16 bit)
Frame image start SOF0, value 0xFFC0
Huffman table, value 0xFFC4

There are four Huffman tables with lengths of 29, 62, 30 and 47 respectively
Table ID and table type: 0x00 DC No. 0 table, 0x10 AC No. 0 table, 0x01 DC No. 1 table, 0X11 AC No. 1 table
Number of codewords with different bits (16 bytes): taking DC No. 0 table as an example, it means that there is no 1-bit Huffman codeword, there are 3 2-bit Huffman codewords, 1 Huffman codeword of 3 ~ 9 bits, and there is no Huffman codeword of 10 bits or more.
Coding content (10 bytes): take DC 0 table as an example. It is known from the previous part that there are 10 items in this field of this table. This data indicates that 10 leaf nodes are arranged from small to large, and their weights (Huffval) are 04, 05, 06, 03, 02, 01, 00, 09, 07 and 08 (hexadecimal).
SOS, value 0xFFDA

Start with FFDA, indicating the length of the field and the number of color components
Spectrum selection start and end and fixed value 003F00. Then there is the official image data.

3, Experimental code

1. Structure

Store Huffman code table

struct huffman_table
{
  /* Fast look up table, using HUFFMAN_HASH_NBITS bits we can have directly the symbol,
   * if the symbol is <0, then we need to look into the tree table */
  short int lookup[HUFFMAN_HASH_SIZE];
  /* code size: give the number of bits of a symbol is encoded */
  unsigned char code_size[HUFFMAN_HASH_SIZE];
  /* some place to store value that is not encoded in the lookup table 
   * FIXME: Calculate if 256 value is enough to store all values
   */
  uint16_t slowtable[16-HUFFMAN_HASH_NBITS][256];
};

Stores information about decoding in the current image block

struct component 
{
  unsigned int Hfactor;
  unsigned int Vfactor;
  float *Q_table;		/* Pointer to the quantisation table to use */
  struct huffman_table *AC_table;
  struct huffman_table *DC_table;
  short int previous_DC;	/* Previous DC coefficient */
  short int DCT[64];		/* DCT coef */
#if SANITY_CHECK
  unsigned int cid;
#endif
};

Store image width and height, code table and other information

struct jdec_private
{
  /* Public variables */
  uint8_t *components[COMPONENTS];
  unsigned int width, height;	/* Size of the image */
  unsigned int flags;

  /* Private variables */
  const unsigned char *stream_begin, *stream_end;
  unsigned int stream_length;

  const unsigned char *stream;	/* Pointer to the current stream */
  unsigned int reservoir, nbits_in_reservoir;

  struct component component_infos[COMPONENTS];
  float Q_tables[COMPONENTS][64];		/* quantization tables */
  struct huffman_table HTDC[HUFFMAN_TABLES];	/* DC huffman tables   */
  struct huffman_table HTAC[HUFFMAN_TABLES];	/* AC huffman tables   */
  int default_huffman_table_initialized;
  int restart_interval;
  int restarts_to_go;				/* MCUs left in this restart interval */
  int last_rst_marker_seen;			/* Rst marker is incremented each time */

  /* Temp space used after the IDCT to store each components */
  uint8_t Y[64*4], Cr[64], Cb[64];

  jmp_buf jump_state;
  /* Internal Pointer use for colorspace conversion, do not modify it !!! */
  uint8_t *plane[COMPONENTS];

};

2. Decoding

int convert_one_image(const char *infilename, const char *outfilename, int output_format)
{
  FILE *fp;
  unsigned int length_of_file;
  unsigned int width, height;
  unsigned char *buf;
  struct jdec_private *jdec;
  unsigned char *components[3];

  /* Load the Jpeg into memory */
  fp = fopen(infilename, "rb");
  if (fp == NULL)
    exitmessage("Cannot open filename\n");
  length_of_file = filesize(fp);
  buf = (unsigned char *)malloc(length_of_file + 4);
  if (buf == NULL)
    exitmessage("Not enough memory for loading file\n");
  fread(buf, length_of_file, 1, fp);
  fclose(fp);

  /* Decompress it */
  jdec = tinyjpeg_init();
  if (jdec == NULL)
    exitmessage("Not enough memory to alloc the structure need for decompressing\n");

  if (tinyjpeg_parse_header(jdec, buf, length_of_file)<0)
    exitmessage(tinyjpeg_get_errorstring(jdec));

  /* Get the size of the image */
  tinyjpeg_get_size(jdec, &width, &height);

  snprintf(error_string, sizeof(error_string),"Decoding JPEG image...\n");
  if (tinyjpeg_decode(jdec, output_format) < 0)
    exitmessage(tinyjpeg_get_errorstring(jdec));

  /* 
   * Get address for each plane (not only max 3 planes is supported), and
   * depending of the output mode, only some components will be filled 
   * RGB: 1 plane, YUV420P: 3 planes, GREY: 1 plane
   */
  tinyjpeg_get_components(jdec, components);

  /* Save it */
  switch (output_format)
   {
    case TINYJPEG_FMT_RGB24:
    case TINYJPEG_FMT_BGR24:
      write_tga(outfilename, output_format, width, height, components);
      break;
    case TINYJPEG_FMT_YUV420P:
      write_yuv(outfilename, width, height, components);
      break;
    case TINYJPEG_FMT_GREY:
      write_pgm(outfilename, width, height, components);
      break;
   }

  /* Only called this if the buffers were allocated by tinyjpeg_decode() */
  tinyjpeg_free(jdec);
  /* else called just free(jdec); */

  free(buf);
  return 0;
}

JPEG file header parsing

int tinyjpeg_parse_header(struct jdec_private *priv, const unsigned char *buf, unsigned int size)
{
  int ret;

  /* Identify the file */
  if ((buf[0] != 0xFF) || (buf[1] != SOI))
    snprintf(error_string, sizeof(error_string),"Not a JPG file ?\n");

  priv->stream_begin = buf+2;
  priv->stream_length = size-2;
  priv->stream_end = priv->stream_begin + priv->stream_length;

  ret = parse_JFIF(priv, priv->stream_begin);

  return ret;
}

Parse marker flag

static int parse_JFIF(struct jdec_private *priv, const unsigned char *stream)
{
  int chuck_len;
  int marker;
  int sos_marker_found = 0;
  int dht_marker_found = 0;
  const unsigned char *next_chunck;

  /* Parse marker */
  while (!sos_marker_found)
   {
     if (*stream++ != 0xff)
       goto bogus_jpeg_format;
     /* Skip any padding ff byte (this is normal) */
     while (*stream == 0xff)
       stream++;

     marker = *stream++;
     chuck_len = be16_to_cpu(stream);
     next_chunck = stream + chuck_len;
     switch (marker)
      {
       case SOF:
	 if (parse_SOF(priv, stream) < 0)
	   return -1;
	 break;
       case DQT:
	 if (parse_DQT(priv, stream) < 0)
	   return -1;
	 break;
       case SOS:
	 if (parse_SOS(priv, stream) < 0)
	   return -1;
	 sos_marker_found = 1;
	 break;
       case DHT:
	 if (parse_DHT(priv, stream) < 0)
	   return -1;
	 dht_marker_found = 1;
	 break;
       case DRI:
	 if (parse_DRI(priv, stream) < 0)
	   return -1;
	 break;
       default:
#if TRACE
	fprintf(p_trace,"> Unknown marker %2.2x\n", marker);
	fflush(p_trace);
#endif
	 break;
      }

     stream = next_chunck;
   }

  if (!dht_marker_found) {
#if TRACE
	  fprintf(p_trace,"No Huffman table loaded, using the default one\n");
	  fflush(p_trace);
#endif
    build_default_huffman_tables(priv);
  }

#ifdef SANITY_CHECK
  if (   (priv->component_infos[cY].Hfactor < priv->component_infos[cCb].Hfactor)
      || (priv->component_infos[cY].Hfactor < priv->component_infos[cCr].Hfactor))
    snprintf(error_string, sizeof(error_string),"Horizontal sampling factor for Y should be greater than horitontal sampling factor for Cb or Cr\n");
  if (   (priv->component_infos[cY].Vfactor < priv->component_infos[cCb].Vfactor)
      || (priv->component_infos[cY].Vfactor < priv->component_infos[cCr].Vfactor))
    snprintf(error_string, sizeof(error_string),"Vertical sampling factor for Y should be greater than vertical sampling factor for Cb or Cr\n");
  if (   (priv->component_infos[cCb].Hfactor!=1) 
      || (priv->component_infos[cCr].Hfactor!=1)
      || (priv->component_infos[cCb].Vfactor!=1)
      || (priv->component_infos[cCr].Vfactor!=1))
    snprintf(error_string, sizeof(error_string),"Sampling other than 1x1 for Cr and Cb is not supported");
#endif

  return 0;
bogus_jpeg_format:
#if TRACE
  fprintf(p_trace,"Bogus jpeg format\n");
  fflush(p_trace);
#endif
  return -1;
}

Analytical quantization table

static int parse_DQT(struct jdec_private *priv, const unsigned char *stream)
{
  int qi;
  float *table;
  const unsigned char *dqt_block_end;
#if TRACE
  fprintf(p_trace,"> DQT marker\n");
  fflush(p_trace);
#endif
  dqt_block_end = stream + be16_to_cpu(stream);
  stream += 2;	/* Skip length */

  while (stream < dqt_block_end)
   {
     qi = *stream++;
#if SANITY_CHECK
     if (qi>>4)
       snprintf(error_string, sizeof(error_string),"16 bits quantization table is not supported\n");
     if (qi>4)
       snprintf(error_string, sizeof(error_string),"No more 4 quantization table is supported (got %d)\n", qi);
#endif
     table = priv->Q_tables[qi];
     build_quantization_table(table, stream);
     stream += 64;
   }
#if TRACE
  fprintf(p_trace,"< DQT marker\n");
  fflush(p_trace);
#endif
  return 0;
}

Establish quantitative table

static void build_quantization_table(float *qtable, const unsigned char *ref_table)
{
  /* Taken from libjpeg. Copyright Independent JPEG Group's LLM idct.
   * For float AA&N IDCT method, divisors are equal to quantization
   * coefficients scaled by scalefactor[row]*scalefactor[col], where
   *   scalefactor[0] = 1
   *   scalefactor[k] = cos(k*PI/16) * sqrt(2)    for k=1..7
   * We apply a further scale factor of 8.
   * What's actually stored is 1/divisor so that the inner loop can
   * use a multiplication rather than a division.
   */
  int i, j;
  static const double aanscalefactor[8] = {
     1.0, 1.387039845, 1.306562965, 1.175875602,
     1.0, 0.785694958, 0.541196100, 0.275899379
  };
  const unsigned char *zz = zigzag;
  const unsigned char *zz2 = zigzag;
  for (i=0; i<8; i++) {
     for (j=0; j<8; j++) {
       *qtable++ = ref_table[*zz++] * aanscalefactor[i] * aanscalefactor[j];
     }
   }

 #if TRACE
  for (i=0; i<8; i++)
  {
   for (j=0; j<8; j++)
   {
    fprintf(p_trace,"%-6d",ref_table[*zz2++]);
   }
   fprintf(p_trace,"\n");
  }
#endif
}

Create Huffman table

static void build_huffman_table(const unsigned char *bits, const unsigned char *vals, struct huffman_table *table)
{
  unsigned int i, j, code, code_size, val, nbits;
  unsigned char huffsize[HUFFMAN_BITS_SIZE+1], *hz;
  unsigned int huffcode[HUFFMAN_BITS_SIZE+1], *hc;
  int next_free_entry;

  /*
   * Build a temp array 
   *   huffsize[X] => numbers of bits to write vals[X]
   */
  hz = huffsize;
  for (i=1; i<=16; i++)
   {
     for (j=1; j<=bits[i]; j++)
       *hz++ = i;
   }
  *hz = 0;

  memset(table->lookup, 0xff, sizeof(table->lookup));
  for (i=0; i<(16-HUFFMAN_HASH_NBITS); i++)
    table->slowtable[i][0] = 0;

  /* Build a temp array
   *   huffcode[X] => code used to write vals[X]
   */
  code = 0;
  hc = huffcode;
  hz = huffsize;
  nbits = *hz;
  while (*hz)
   {
     while (*hz == nbits)
      {
	*hc++ = code++;
	hz++;
      }
     code <<= 1;
     nbits++;
   }

  /*
   * Build the lookup table, and the slowtable if needed.
   */
  next_free_entry = -1;
  for (i=0; huffsize[i]; i++)
   {
     val = vals[i];
     code = huffcode[i];
     code_size = huffsize[i];
	#if TRACE
     fprintf(p_trace,"val=%2.2x code=%8.8x codesize=%2.2d\n", val, code, code_size);
	 fflush(p_trace);
    #endif
     table->code_size[val] = code_size;
     if (code_size <= HUFFMAN_HASH_NBITS)
      {
	/*
	 * Good: val can be put in the lookup table, so fill all value of this
	 * column with value val 
	 */
	int repeat = 1UL<<(HUFFMAN_HASH_NBITS - code_size);
	code <<= HUFFMAN_HASH_NBITS - code_size;
	while ( repeat-- )
	  table->lookup[code++] = val;

      }
     else
      {
	/* Perhaps sorting the array will be an optimization */
	uint16_t *slowtable = table->slowtable[code_size-HUFFMAN_HASH_NBITS-1];
	while(slowtable[0])
	  slowtable+=2;
	slowtable[0] = code;
	slowtable[1] = val;
	slowtable[2] = 0;
	/* TODO: NEED TO CHECK FOR AN OVERFLOW OF THE TABLE */
      }

   }
}

Parsing Huffman table

static int parse_DHT(struct jdec_private *priv, const unsigned char *stream)
{
  unsigned int count, i;
  unsigned char huff_bits[17];
  int length, index;

  length = be16_to_cpu(stream) - 2;
  stream += 2;	/* Skip length */
#if TRACE
  fprintf(p_trace,"> DHT marker (length=%d)\n", length);
  fflush(p_trace);
#endif

  while (length>0) {
     index = *stream++;

     /* We need to calculate the number of bytes 'vals' will takes */
     huff_bits[0] = 0;
     count = 0;
     for (i=1; i<17; i++) {
	huff_bits[i] = *stream++;
	count += huff_bits[i];
     }
#if SANITY_CHECK
     if (count >= HUFFMAN_BITS_SIZE)
       snprintf(error_string, sizeof(error_string),"No more than %d bytes is allowed to describe a huffman table", HUFFMAN_BITS_SIZE);
     if ( (index &0xf) >= HUFFMAN_TABLES)
       snprintf(error_string, sizeof(error_string),"No more than %d Huffman tables is supported (got %d)\n", HUFFMAN_TABLES, index&0xf);
#if TRACE
     fprintf(p_trace,"Huffman table %s[%d] length=%d\n", (index&0xf0)?"AC":"DC", index&0xf, count);
	 fflush(p_trace);
#endif
#endif

     if (index & 0xf0 )
       build_huffman_table(huff_bits, stream, &priv->HTAC[index&0xf]);
     else
       build_huffman_table(huff_bits, stream, &priv->HTDC[index&0xf]);

     length -= 1;
     length -= 16;
     length -= count;
     stream += count;
  }
#if TRACE
  fprintf(p_trace,"< DHT marker\n");
  fflush(p_trace);
#endif
  return 0;
}

Analyze SOS

static int parse_SOS(struct jdec_private *priv, const unsigned char *stream)
{
  unsigned int i, cid, table;
  unsigned int nr_components = stream[2];
#if TRACE
  fprintf(p_trace,"> SOS marker\n");
  fflush(p_trace);
#endif

#if SANITY_CHECK
  if (nr_components != 3)
    snprintf(error_string, sizeof(error_string),"We only support YCbCr image\n");
#endif

  stream += 3;
  for (i=0;i<nr_components;i++) {
     cid = *stream++;
     table = *stream++;
#if SANITY_CHECK
     if ((table&0xf)>=4)
	snprintf(error_string, sizeof(error_string),"We do not support more than 2 AC Huffman table\n");
     if ((table>>4)>=4)
	snprintf(error_string, sizeof(error_string),"We do not support more than 2 DC Huffman table\n");
     if (cid != priv->component_infos[i].cid)
        snprintf(error_string, sizeof(error_string),"SOS cid order (%d:%d) isn't compatible with the SOF marker (%d:%d)\n",
	      i, cid, i, priv->component_infos[i].cid);
#if TRACE
     fprintf(p_trace,"ComponentId:%d  tableAC:%d tableDC:%d\n", cid, table&0xf, table>>4);
	 fflush(p_trace);
#endif
#endif
     priv->component_infos[i].AC_table = &priv->HTAC[table&0xf];
     priv->component_infos[i].DC_table = &priv->HTDC[table>>4];
  }
  priv->stream = stream+3;
#if TRACE
  fprintf(p_trace,"< SOS marker\n");
  fflush(p_trace);
#endif
  return 0;
}

Parse SOF

static int parse_SOF(struct jdec_private *priv, const unsigned char *stream)
{
  int i, width, height, nr_components, cid, sampling_factor;
  int Q_table;
  struct component *c;
#if TRACE
  fprintf(p_trace,"> SOF marker\n");
  fflush(p_trace);
#endif
  print_SOF(stream);

  height = be16_to_cpu(stream+3);
  width  = be16_to_cpu(stream+5);
  nr_components = stream[7];
#if SANITY_CHECK
  if (stream[2] != 8)
    snprintf(error_string, sizeof(error_string),"Precision other than 8 is not supported\n");
  if (width>JPEG_MAX_WIDTH || height>JPEG_MAX_HEIGHT)
    snprintf(error_string, sizeof(error_string),"Width and Height (%dx%d) seems suspicious\n", width, height);
  if (nr_components != 3)
    snprintf(error_string, sizeof(error_string),"We only support YUV images\n");
  if (height%16)
    snprintf(error_string, sizeof(error_string),"Height need to be a multiple of 16 (current height is %d)\n", height);
  if (width%16)
    snprintf(error_string, sizeof(error_string),"Width need to be a multiple of 16 (current Width is %d)\n", width);
#endif
  stream += 8;
  for (i=0; i<nr_components; i++) {
     cid = *stream++;
     sampling_factor = *stream++;
     Q_table = *stream++;
     c = &priv->component_infos[i];
#if SANITY_CHECK
     c->cid = cid;
     if (Q_table >= COMPONENTS)
       snprintf(error_string, sizeof(error_string),"Bad Quantization table index (got %d, max allowed %d)\n", Q_table, COMPONENTS-1);
#endif
     c->Vfactor = sampling_factor&0xf;
     c->Hfactor = sampling_factor>>4;
     c->Q_table = priv->Q_tables[Q_table];
#if TRACE
     fprintf(p_trace,"Component:%d  factor:%dx%d  Quantization table:%d\n",
           cid, c->Hfactor, c->Hfactor, Q_table );
	 fflush(p_trace);
#endif

  }
  priv->width = width;
  priv->height = height;
#if TRACE
  fprintf(p_trace,"< SOF marker\n");
  fflush(p_trace);
#endif

  return 0;
}

Analyze actual image data

int tinyjpeg_decode(struct jdec_private *priv, int pixfmt)
{
  unsigned int x, y, xstride_by_mcu, ystride_by_mcu;
  unsigned int bytes_per_blocklines[3], bytes_per_mcu[3];
  decode_MCU_fct decode_MCU;
  const decode_MCU_fct *decode_mcu_table;
  const convert_colorspace_fct *colorspace_array_conv;
  convert_colorspace_fct convert_to_pixfmt;

  if (setjmp(priv->jump_state))
    return -1;

  /* To keep gcc happy initialize some array */
  bytes_per_mcu[1] = 0;
  bytes_per_mcu[2] = 0;
  bytes_per_blocklines[1] = 0;
  bytes_per_blocklines[2] = 0;

  decode_mcu_table = decode_mcu_3comp_table;
  switch (pixfmt) {
     case TINYJPEG_FMT_YUV420P:
       colorspace_array_conv = convert_colorspace_yuv420p;
       if (priv->components[0] == NULL)
	 priv->components[0] = (uint8_t *)malloc(priv->width * priv->height);
       if (priv->components[1] == NULL)
	 priv->components[1] = (uint8_t *)malloc(priv->width * priv->height/4);
       if (priv->components[2] == NULL)
	 priv->components[2] = (uint8_t *)malloc(priv->width * priv->height/4);
       bytes_per_blocklines[0] = priv->width;
       bytes_per_blocklines[1] = priv->width/4;
       bytes_per_blocklines[2] = priv->width/4;
       bytes_per_mcu[0] = 8;
       bytes_per_mcu[1] = 4;
       bytes_per_mcu[2] = 4;
       break;

     case TINYJPEG_FMT_RGB24:
       colorspace_array_conv = convert_colorspace_rgb24;
       if (priv->components[0] == NULL)
	 priv->components[0] = (uint8_t *)malloc(priv->width * priv->height * 3);
       bytes_per_blocklines[0] = priv->width * 3;
       bytes_per_mcu[0] = 3*8;
       break;

     case TINYJPEG_FMT_BGR24:
       colorspace_array_conv = convert_colorspace_bgr24;
       if (priv->components[0] == NULL)
	 priv->components[0] = (uint8_t *)malloc(priv->width * priv->height * 3);
       bytes_per_blocklines[0] = priv->width * 3;
       bytes_per_mcu[0] = 3*8;
       break;

     case TINYJPEG_FMT_GREY:
       decode_mcu_table = decode_mcu_1comp_table;
       colorspace_array_conv = convert_colorspace_grey;
       if (priv->components[0] == NULL)
	 priv->components[0] = (uint8_t *)malloc(priv->width * priv->height);
       bytes_per_blocklines[0] = priv->width;
       bytes_per_mcu[0] = 8;
       break;

     default:
#if TRACE
		 fprintf(p_trace,"Bad pixel format\n");
		 fflush(p_trace);
#endif
       return -1;
  }

  xstride_by_mcu = ystride_by_mcu = 8;
  if ((priv->component_infos[cY].Hfactor | priv->component_infos[cY].Vfactor) == 1) {
     decode_MCU = decode_mcu_table[0];
     convert_to_pixfmt = colorspace_array_conv[0];
#if TRACE
     fprintf(p_trace,"Use decode 1x1 sampling\n");
	 fflush(p_trace);
#endif
  } else if (priv->component_infos[cY].Hfactor == 1) {
     decode_MCU = decode_mcu_table[1];
     convert_to_pixfmt = colorspace_array_conv[1];
     ystride_by_mcu = 16;
#if TRACE
     fprintf(p_trace,"Use decode 1x2 sampling (not supported)\n");
	 fflush(p_trace);
#endif
  } else if (priv->component_infos[cY].Vfactor == 2) {
     decode_MCU = decode_mcu_table[3];
     convert_to_pixfmt = colorspace_array_conv[3];
     xstride_by_mcu = 16;
     ystride_by_mcu = 16;
#if TRACE 
	 fprintf(p_trace,"Use decode 2x2 sampling\n");
	 fflush(p_trace);
#endif
  } else {
     decode_MCU = decode_mcu_table[2];
     convert_to_pixfmt = colorspace_array_conv[2];
     xstride_by_mcu = 16;
#if TRACE
     fprintf(p_trace,"Use decode 2x1 sampling\n");
	 fflush(p_trace);
#endif
  }

  resync(priv);

  /* Don't forget to that block can be either 8 or 16 lines */
  bytes_per_blocklines[0] *= ystride_by_mcu;
  bytes_per_blocklines[1] *= ystride_by_mcu;
  bytes_per_blocklines[2] *= ystride_by_mcu;

  bytes_per_mcu[0] *= xstride_by_mcu/8;
  bytes_per_mcu[1] *= xstride_by_mcu/8;
  bytes_per_mcu[2] *= xstride_by_mcu/8;

  /* Just the decode the image by macroblock (size is 8x8, 8x16, or 16x16) */
  for (y=0; y < priv->height/ystride_by_mcu; y++)
   {
     //trace("Decoding row %d\n", y);
     priv->plane[0] = priv->components[0] + (y * bytes_per_blocklines[0]);
     priv->plane[1] = priv->components[1] + (y * bytes_per_blocklines[1]);
     priv->plane[2] = priv->components[2] + (y * bytes_per_blocklines[2]);
     for (x=0; x < priv->width; x+=xstride_by_mcu)
      {
	decode_MCU(priv);
	convert_to_pixfmt(priv);
	priv->plane[0] += bytes_per_mcu[0];
	priv->plane[1] += bytes_per_mcu[1];
	priv->plane[2] += bytes_per_mcu[2];
	if (priv->restarts_to_go>0)
	 {
	   priv->restarts_to_go--;
	   if (priv->restarts_to_go == 0)
	    {
	      priv->stream -= (priv->nbits_in_reservoir/8);
	      resync(priv);
	      if (find_next_rst_marker(priv) < 0)
		return -1;
	    }
	 }
      }
   }
#if TRACE
  fprintf(p_trace,"Input file size: %d\n", priv->stream_length+2);
  fprintf(p_trace,"Input bytes actually read: %d\n", priv->stream - priv->stream_begin + 2);
  fflush(p_trace);
#endif

  return 0;
}

4, Experimental results

1.yuv

 snprintf(temp, 1024, "%s.YUV", filename);
  F = fopen(temp, "wb");
  fwrite(components[0], width, height, F);
  fwrite(components[1], width*height/4, 1, F);
  fwrite(components[2], width*height/4, 1, F);

2. Quantization matrix

static void build_quantization_table(float *qtable, const unsigned char *ref_table)
{
  /* Taken from libjpeg. Copyright Independent JPEG Group's LLM idct.
   * For float AA&N IDCT method, divisors are equal to quantization
   * coefficients scaled by scalefactor[row]*scalefactor[col], where
   *   scalefactor[0] = 1
   *   scalefactor[k] = cos(k*PI/16) * sqrt(2)    for k=1..7
   * We apply a further scale factor of 8.
   * What's actually stored is 1/divisor so that the inner loop can
   * use a multiplication rather than a division.
   */
  int i, j;
  static const double aanscalefactor[8] = {
     1.0, 1.387039845, 1.306562965, 1.175875602,
     1.0, 0.785694958, 0.541196100, 0.275899379
  };
  const unsigned char *zz = zigzag;
  const unsigned char *zz_2 = zigzag;
  for (i=0; i<8; i++) {
     for (j=0; j<8; j++) {
       *qtable++ = ref_table[*zz++] * aanscalefactor[i] * aanscalefactor[j];
     }
   }

 #if TRACE
  for (i=0; i<8; i++)
  {
   for (j=0; j<8; j++)
   {
    fprintf(p_trace,"%-6d",ref_table[*zz_2++]);
   }
   fprintf(p_trace,"\n");
  }
#endif
}

Programmer Think