Pixel format conversion based on FFMPEG (swscale, tribute to Lei Xiaohua)

Posted by phprocky on Thu, 09 Dec 2021 16:24:33 +0100

A few days ago I wrote several introductory articles on ffmpeg programming, covering remuxing (container conversion). The next step was to write about transcoding or encoding. However, it turns out that both transcoding and encoding run into image pixel format conversion. The images we normally display in a software interface are RGB (RGB24 or RGB32), while the images stored in video files are almost always YUV (YUV420P or YUV422P). To continue with my later development work, I needed to brush up on the YUV format, including how to convert directly between YUV and RGB.

I wrote a blog about YUV and RGB conversion:

https://blog.csdn.net/liyuanbhu/article/details/68951683

After reading that post you should be able to do the conversion yourself. However, since ffmpeg already provides functions for this, we might as well use them; after all, the code in ffmpeg is surely better optimized than anything I would write.

This blog refers to Dr. Lei's blog:

https://blog.csdn.net/leixiaohua1020/article/details/14215391

Dr. Lei wrote that post several years ago, and some of the functions it uses are no longer applicable; I will give the current alternatives here. In addition, my post is organized differently from Dr. Lei's: I have tried to keep the code as concise as possible.

libswscale in ffmpeg handles pixel format conversion and image scaling. For convenience, my code also uses Qt, because QImage is very handy. QImage does have one drawback, though: it does not support YUV formats.

Image scaling

libswscale is easy to use. The basic workflow consists of three functions:

sws_getContext()
sws_scale()
sws_freeContext()

sws_getContext() and sws_freeContext() are simple. The last three parameters of sws_getContext() are generally not needed and can be passed as nullptr. flags selects the interpolation algorithm: SWS_BICUBIC usually gives better quality, while SWS_FAST_BILINEAR can be used if speed matters more. The two functions are declared as follows:

SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
                           int dstW, int dstH, enum AVPixelFormat dstFormat,
                           int flags, SwsFilter *srcFilter,
                           SwsFilter *dstFilter, const double *param);
void sws_freeContext(struct SwsContext *swsContext);
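
To make the calling pattern concrete, here is a minimal sketch of creating and releasing a context (the sizes and formats are arbitrary, chosen just for illustration):

    SwsContext *ctx = sws_getContext(1920, 1080, AV_PIX_FMT_YUV420P,  // input size and format
                                     1280, 720, AV_PIX_FMT_RGB24,     // output size and format
                                     SWS_BICUBIC,                     // interpolation flags
                                     nullptr, nullptr, nullptr);      // filters/params, not needed
    if(!ctx)
    {
        // this size/format combination is not supported
    }
    // ... one or more sws_scale() calls go here ...
    sws_freeContext(ctx);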

sws_scale() is the focus. Let's start with a relatively simple task: image scaling. The declaration of sws_scale() is as follows:

int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
              const int srcStride[], int srcSliceY, int srcSliceH,
              uint8_t *const dst[], const int dstStride[]);

Here srcSlice is the array of data pointers for the input image and dst is the array of data pointers for the output image. srcStride and dstStride are the number of bytes per line, equivalent to QImage's bytesPerLine(). srcSliceY can usually be set to 0, and srcSliceH is the number of lines of the input image.
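
One caveat: the stride is not always width × bytes-per-pixel, because lines may be padded for alignment. When in doubt, libavutil can compute the unpadded stride for us; a small sketch (av_image_get_linesize() comes from libavutil/imgutils.h):

    // bytes per line of plane 0 of a 1280-pixel-wide RGB24 image: 3840
    int rgbStride = av_image_get_linesize(AV_PIX_FMT_RGB24, 1280, 0);

    // for planar YUV420P the chroma planes have their own, smaller stride
    int yStride = av_image_get_linesize(AV_PIX_FMT_YUV420P, 1280, 0);  // 1280
    int uStride = av_image_get_linesize(AV_PIX_FMT_YUV420P, 1280, 1);  // 640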

Let's take a closer look at the two image data pointers:

const uint8_t *const srcSlice[];
uint8_t *const dst[];

They are different from what you might expect. You would probably assume the two parameters look like this:

const uint8_t *srcSlice;
uint8_t *dst;

In fact, both parameters are arrays of pointers. Why? Because the image data may consist of several separate memory blocks. In other words, the image data can live in several arrays, and then a single pointer is not enough. Recall that there are two kinds of data layout in images: planar and packed.

packed is the common case, for example RGB24 images, where the pixel data is interleaved: R, G, B, R, G, B, R, G, B, ...

planar is usually found in video data, for example YUV422P, where the pixel data is laid out as Y Y ... U U ... V V ...

The Y, U and V data of one image sit in three separate arrays.
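
To make the difference concrete, here is how the pointer arrays would be filled in each case (a sketch; rgbBuffer, yPlane, uPlane and vPlane are assumed to point at existing pixel data):

    // packed RGB24: one plane, so one pointer and one stride
    uint8_t *rgbData[1] = { rgbBuffer };
    int rgbLinesize[1] = { width * 3 };

    // planar YUV422P: three planes, so three pointers and three strides
    uint8_t *yuvData[3] = { yPlane, uPlane, vPlane };
    int yuvLinesize[3] = { width, width / 2, width / 2 };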

Let's first write scaling code for a QImage. QImage data is packed, so we don't have to worry about multiple planes. Here is the code:

bool scale(const QImage &inImage, QImage &outImage, double scaleX, double scaleY)
{
    int srcW = inImage.width();
    int srcH = inImage.height();
    int desW = srcW * scaleX;
    int desH = srcH * scaleY;
    if(outImage.size() != QSize(desW, desH) || outImage.format() != inImage.format())
    {
        outImage = QImage(QSize(desW, desH), inImage.format());
    }
    AVPixelFormat srcFormat = toAVPixelFormat(inImage.format());
    if(srcFormat == AV_PIX_FMT_NONE) return false; // unsupported QImage format

    // QImage data is packed, so a single plane (array length 1) is enough
    uint8_t *in_data[1];
    int in_linesize[1];
    in_data[0] = (uint8_t *) inImage.bits();
    in_linesize[0] = inImage.bytesPerLine();
    //av_image_fill_arrays(in_data, in_linesize, inImage.bits(), AV_PIX_FMT_YUYV422, srcW, srcH, 1);

    uint8_t *out_data[1];
    int out_linesize[1];
    out_data[0] = outImage.bits();
    out_linesize[0] = outImage.bytesPerLine();

    // same pixel format in and out: only the size changes
    SwsContext * pContext = sws_getContext(srcW, srcH, srcFormat,
                                           desW, desH, srcFormat, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    if(!pContext) return false;
    sws_scale(pContext, in_data, in_linesize, 0, srcH,
              out_data, out_linesize);
    sws_freeContext(pContext);
    return true;
}
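
Calling it is straightforward. For example (the file names are just placeholders):

    QImage src("test.png");  // hypothetical input image
    QImage dst;
    if(scale(src, dst, 0.5, 0.5))  // shrink to half size in both directions
    {
        dst.save("test_small.png");
    }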

Let's go through it piece by piece:

uint8_t *in_data[1];
int in_linesize[1];
in_data[0] = (uint8_t *) inImage.bits();
in_linesize[0] = inImage.bytesPerLine();

QImage stores its pixels in a single block of memory, so one pointer is enough; our pointer array therefore has length 1, and so does the in_linesize[] array. The code also needs a mapping from QImage::Format to AVPixelFormat, for which I wrote the following function.

enum AVPixelFormat toAVPixelFormat(QImage::Format format)
{
    switch (format) {
    case QImage::Format_Invalid:
    case QImage::Format_MonoLSB:
        return AV_PIX_FMT_NONE;
    case QImage::Format_Mono:
        return AV_PIX_FMT_MONOBLACK;
    case QImage::Format_Indexed8:
        return AV_PIX_FMT_PAL8;
    case QImage::Format_Alpha8:
    case QImage::Format_Grayscale8:
        return AV_PIX_FMT_GRAY8;
    case QImage::Format_Grayscale16:
        return AV_PIX_FMT_GRAY16LE;
    case QImage::Format_RGB32:
    case QImage::Format_ARGB32:
    case QImage::Format_ARGB32_Premultiplied:
        return AV_PIX_FMT_BGRA;
    case QImage::Format_RGB16:
    case QImage::Format_ARGB8565_Premultiplied:
        return AV_PIX_FMT_RGB565LE;
    case QImage::Format_RGB666:
    case QImage::Format_ARGB6666_Premultiplied:
        return AV_PIX_FMT_NONE;
    case QImage::Format_RGB555:
    case QImage::Format_ARGB8555_Premultiplied:
        return AV_PIX_FMT_BGR555LE;
    case QImage::Format_RGB888:
        return AV_PIX_FMT_RGB24;
    case QImage::Format_RGB444:
    case QImage::Format_ARGB4444_Premultiplied:
        return AV_PIX_FMT_RGB444LE;
    case QImage::Format_RGBX8888:
    case QImage::Format_RGBA8888:
    case QImage::Format_RGBA8888_Premultiplied:
        return AV_PIX_FMT_RGBA;
    case QImage::Format_BGR30:
    case QImage::Format_A2BGR30_Premultiplied:
    case QImage::Format_RGB30:
    case QImage::Format_A2RGB30_Premultiplied:
        return AV_PIX_FMT_NONE;
    case QImage::Format_RGBX64:
    case QImage::Format_RGBA64:
    case QImage::Format_RGBA64_Premultiplied:
        return AV_PIX_FMT_RGBA64LE;
    case QImage::Format_BGR888:
        return AV_PIX_FMT_BGR24;
    default:
        return AV_PIX_FMT_NONE;
    }
    return AV_PIX_FMT_NONE;
}

This toAVPixelFormat() function is not fully tested, so some of the mappings may be wrong. The common formats QImage::Format_Grayscale8, QImage::Format_RGB32 and QImage::Format_RGB888 should be fine, though.

The following example converts YUV422P data, given as three separate plane pointers, to an RGB32 QImage.

QImage YUV422pToQImageRGB32(const uchar *y, const uchar *u, const uchar *v, int width, int height)
{
    QImage image(QSize(width, height), QImage::Format_RGB32);
    // QImage::Format_RGB32 is laid out as BGRA in memory, hence AV_PIX_FMT_BGRA
    SwsContext * pContext = sws_getContext(width, height, AV_PIX_FMT_YUV422P,
                                           width, height, AV_PIX_FMT_BGRA, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    if(!pContext) return image;

    const uint8_t *in_data[4];
    int in_linesize[4];
    in_data[0] = y;
    in_data[1] = u;
    in_data[2] = v;
    in_linesize[0] = width;
    in_linesize[1] = width / 2;
    in_linesize[2] = width / 2;

    uint8_t *out_data[1];
    int out_linesize[1];
    out_data[0] = image.bits();
    out_linesize[0] = image.bytesPerLine();

    sws_scale(pContext, in_data, in_linesize, 0, height, out_data, out_linesize);
    sws_freeContext(pContext);

    return image;
}
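
Note that this function assumes each plane is tightly packed, i.e. the stride equals the width. Frames coming out of a real decoder are often padded, so if you already have an AVFrame it is safer to pass its own data and linesize arrays to sws_scale() directly. A sketch, assuming pFrame holds a decoded AV_PIX_FMT_YUV422P frame:

    QImage image(QSize(pFrame->width, pFrame->height), QImage::Format_RGB32);
    SwsContext *pContext = sws_getContext(pFrame->width, pFrame->height, AV_PIX_FMT_YUV422P,
                                          pFrame->width, pFrame->height, AV_PIX_FMT_BGRA,
                                          SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);

    uint8_t *out_data[1] = { image.bits() };
    int out_linesize[1] = { image.bytesPerLine() };

    // pFrame->data and pFrame->linesize already describe the three planes, padding included
    sws_scale(pContext, pFrame->data, pFrame->linesize, 0, pFrame->height, out_data, out_linesize);
    sws_freeContext(pContext);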

Sometimes the YUV422P data is stored contiguously, so there is only one pointer to the start of the buffer.

QImage YUV422pToQImageRGB32(const uchar *yuv, int width, int height)
{
    QImage image(QSize(width, height), QImage::Format_RGB32);
    SwsContext * pContext = sws_getContext(width, height, AV_PIX_FMT_YUV422P,
                                           width, height, AV_PIX_FMT_BGRA, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    if(!pContext) return image;

    const uint8_t *in_data[4];
    int in_linesize[4];
    in_data[0] = yuv;                              // Y plane
    in_data[1] = in_data[0] + width * height;      // U plane, 1/2 the size of Y
    in_data[2] = in_data[1] + width * height / 2;  // V plane
    in_linesize[0] = width;
    in_linesize[1] = width / 2;
    in_linesize[2] = width / 2;

    uint8_t *out_data[1];
    int out_linesize[1];
    out_data[0] = image.bits();
    out_linesize[0] = image.bytesPerLine();

    sws_scale(pContext, in_data, in_linesize, 0, height, out_data, out_linesize);
    sws_freeContext(pContext);

    return image;
}

If the data is YUV420P instead, the code looks like this:

QImage YUV420pToQImageRGB32(const uchar *yuv, int width, int height)
{
    QImage image(QSize(width, height), QImage::Format_RGB32);
    SwsContext * pContext = sws_getContext(width, height, AV_PIX_FMT_YUV420P,
                                           width, height, AV_PIX_FMT_BGRA, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    if(!pContext) return image;

    const uint8_t *in_data[4];
    int in_linesize[4];
    in_data[0] = yuv;                              // Y plane
    in_data[1] = in_data[0] + width * height;      // U plane, 1/4 the size of Y
    in_data[2] = in_data[1] + width * height / 4;  // V plane
    in_linesize[0] = width;
    in_linesize[1] = width / 2;
    in_linesize[2] = width / 2;

    uint8_t *out_data[1];
    int out_linesize[1];
    out_data[0] = image.bits();
    out_linesize[0] = image.bytesPerLine();

    sws_scale(pContext, in_data, in_linesize, 0, height, out_data, out_linesize);
    sws_freeContext(pContext);

    return image;
}
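
Going in the other direction, from RGB to YUV, is what encoding needs. Here is a minimal sketch mirroring the function above, with the same layout assumptions (tightly packed planes, width and height even); the function name is mine, chosen for illustration:

void QImageRGB32ToYUV420p(const QImage &image, uchar *yuv)
{
    int width = image.width();
    int height = image.height();
    // QImage::Format_RGB32 is BGRA in memory, see toAVPixelFormat()
    SwsContext *pContext = sws_getContext(width, height, AV_PIX_FMT_BGRA,
                                          width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    if(!pContext) return;

    const uint8_t *in_data[1];
    int in_linesize[1];
    in_data[0] = image.bits();
    in_linesize[0] = image.bytesPerLine();

    uint8_t *out_data[3];
    int out_linesize[3];
    out_data[0] = yuv;                              // Y plane
    out_data[1] = out_data[0] + width * height;     // U plane
    out_data[2] = out_data[1] + width * height / 4; // V plane
    out_linesize[0] = width;
    out_linesize[1] = width / 2;
    out_linesize[2] = width / 2;

    sws_scale(pContext, in_data, in_linesize, 0, height, out_data, out_linesize);
    sws_freeContext(pContext);
}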

Some readers may ask: how do I know this code is correct? In fact, before writing it I wrote another small program that generates a YUV420P buffer itself.

    int width = 1280;
    int height = 960;
    int yuvBufferSize = av_image_get_buffer_size(AV_PIX_FMT_YUV420P, width, height, 1);
    uchar * yuvBuffer = (uint8_t*)av_malloc(yuvBufferSize);

    qDebug() << "width = " << width << ", height = " << height;
    qDebug() << "yuvBufferSize = " << yuvBufferSize;

    uint8_t *out_data[4];
    int out_linesize[4];
    // let ffmpeg compute each plane's start pointer and stride inside yuvBuffer (alignment 1)
    av_image_fill_arrays(out_data, out_linesize, yuvBuffer, AV_PIX_FMT_YUV420P, width, height, 1);

    qDebug() << "out_data[0] = " << out_data[0] << ", size = " << out_data[1] - out_data[0] << ", out_linesize[0] = " << out_linesize[0];
    qDebug() << "out_data[1] = " << out_data[1] << ", size = " << out_data[2] - out_data[1] << ", out_linesize[1] = " << out_linesize[1];
    qDebug() << "out_data[2] = " << out_data[2] << ", out_linesize[2] = " << out_linesize[2];

    av_free(yuvBuffer);

The output of this code is as follows:

width =  1280 , height =  960
yuvBufferSize =  1843200
out_data[0] =  0x1f7dda88080 , size =  1228800 , out_linesize[0] =  1280
out_data[1] =  0x1f7ddbb4080 , size =  307200 , out_linesize[1] =  640
out_data[2] =  0x1f7ddbff080 , out_linesize[2] =  640

We know that 1280 * 960 = 1228800, and 1843200 = 1228800 * 1.5.

Therefore, in YUV420P format the Y plane takes 1228800 bytes and U and V together take 614400 bytes; each of the U and V planes is 1/4 the size of the Y plane, and the number of bytes per line of U and V is half that of Y.

If the program is changed to YUV422P, the result is:

width =  1280 , height =  960
yuvBufferSize =  2457600
out_data[0] =  0x23c9bd5c080 , size =  1228800 , out_linesize[0] =  1280
out_data[1] =  0x23c9be88080 , size =  614400 , out_linesize[1] =  640
out_data[2] =  0x23c9bf1e080 , out_linesize[2] =  640

Here each of the U and V planes is 1/2 the size of the Y plane, and the number of bytes per line of U and V is still half that of Y.

The difference between YUV422P and YUV420P: in YUV420P the U and V planes have half the rows and half the columns of the picture, while in YUV422P the U and V planes have the same number of rows as the picture (and half the columns).

The code above also shows that when we are not sure how to fill in srcStride and dstStride, av_image_fill_arrays() can work them out for us.
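
For example, the pointer arithmetic in YUV420pToQImageRGB32() above could have been written like this (a sketch; the alignment argument is 1 because our buffer has no padding between lines):

    uint8_t *in_data[4];
    int in_linesize[4];
    // computes the three plane pointers into yuv and their strides for us
    av_image_fill_arrays(in_data, in_linesize, yuv, AV_PIX_FMT_YUV420P, width, height, 1);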

When we process video images, we usually don't define out_data and out_linesize by hand as I did above; we usually use an AVFrame. You can refer to the following code snippet:

    AVFrame *pFrameYUV420P = av_frame_alloc();
    pFrameYUV420P->width = 1280;
    pFrameYUV420P->height = 960;
    pFrameYUV420P->format = AV_PIX_FMT_YUV420P;

    int yuvBufferSize = av_image_get_buffer_size((AVPixelFormat) pFrameYUV420P->format,
                                                 pFrameYUV420P->width,
                                                 pFrameYUV420P->height, 1);

    uchar * yuvBuffer = (uint8_t*)av_malloc(yuvBufferSize);

    av_image_fill_arrays(pFrameYUV420P->data,
                         pFrameYUV420P->linesize,
                         yuvBuffer,
                         (AVPixelFormat) pFrameYUV420P->format,
                         pFrameYUV420P->width,
                         pFrameYUV420P->height,
                         1);

    // Here to fill in the specific data

    av_frame_free(&pFrameYUV420P); // frees only the frame, not the manually attached buffer
    av_free(yuvBuffer);            // yuvBuffer must be released separately

This code can also be written another way:

    AVFrame *pFrameYUV420P = av_frame_alloc();
    pFrameYUV420P->width = 1280;
    pFrameYUV420P->height = 960;
    pFrameYUV420P->format = AV_PIX_FMT_YUV420P;

    // allocates the image buffer and fills data/linesize in one call
    av_image_alloc(pFrameYUV420P->data,
                   pFrameYUV420P->linesize,
                   pFrameYUV420P->width,
                   pFrameYUV420P->height,
                   (AVPixelFormat) pFrameYUV420P->format,
                   1);

    // Here to fill in the specific data

    av_freep(&pFrameYUV420P->data[0]); // memory from av_image_alloc() must be freed explicitly
    av_frame_free(&pFrameYUV420P);
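
In newer versions of ffmpeg there is a third option, av_frame_get_buffer(), which attaches reference-counted buffers to the frame, so that av_frame_free() really does release the image memory along with the frame:

    AVFrame *pFrame = av_frame_alloc();
    pFrame->width = 1280;
    pFrame->height = 960;
    pFrame->format = AV_PIX_FMT_YUV420P;

    // allocates pFrame->data / pFrame->linesize; 0 lets ffmpeg pick the alignment
    av_frame_get_buffer(pFrame, 0);

    // Here to fill in the specific data

    av_frame_free(&pFrame); // frame and its buffers are released together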

Well, that's enough for one post. Readers who have made it this far should now have a good grasp of using sws_scale() for pixel format conversion and image scaling.