Using ffmpeg to split the video file into several ppm pictures

Posted by iBlizz on Sun, 23 Feb 2020 05:29:45 +0100

  • Approach
    Get the file path -> demux -> get the video stream information -> decode -> convert YUV420 to RGB -> store in PPM format.

  • Pseudo code
    Step 1. Register the formats and codecs
    Step 2. Open the audio/video file
    Step 3. Get the audio and video stream information
    Step 4. Find the video stream, i.e. the stream whose type is AVMEDIA_TYPE_VIDEO
    Step 5. Find the decoder
    Step 6. Copy the AVCodecContext of the video stream
    Step 7. Open the decoder
    Step 8. Compute the size of one RGB frame and allocate heap space for it
    Step 9. Initialize the YUV-to-RGB conversion
    Step 10. Loop: read one packet of video stream data at a time until the end of the stream is reached
             Decode one frame of data
             Convert the frame from YUV to RGB
             Save the frame to a file in PPM format
    Step 11. Free the memory and close the audio/video file

  • The complete source code, written in C, is as follows

#include <stdio.h>
#include <assert.h>
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libavutil/pixfmt.h"
#include "libswscale/swscale.h"

void save_frame(AVFrame *pframe, int width, int height, int iframe);
int main(int argc, char *argv[])
{
    const char *file_path = "/root/vidoLean/xingJiDaZhan.mp4";
    const int frameNumber = 100;   // maximum number of frames to save

    AVFormatContext *pctx = NULL;

    // Register all formats and codecs (needed before FFmpeg 4.0)
    av_register_all();

    // Open the input file and read its header
    if (avformat_open_input(&pctx, file_path, NULL, NULL) != 0)
        return -1;
    // Called outside assert() so that it still runs when NDEBUG is defined
    if (avformat_find_stream_info(pctx, NULL) < 0)
        return -1;
    av_dump_format(pctx, 0, file_path, 0);

    int video_stream = -1;
    int i = 0;
    for (i = 0; i < (int)pctx->nb_streams; i++) {
        // Find the first video stream
        if (pctx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
            video_stream = i;
            break;
        }
    }

    if (-1 == video_stream) {
        printf("no video stream detected\n");
        return -1;
    }
    // pcodec_ctx_orig points to the codec context of the first video stream
    AVCodecContext *pcodec_ctx_orig = pctx->streams[video_stream]->codec;

    AVCodec *pcodec = NULL;
    // Find the decoder corresponding to the video stream
    pcodec = avcodec_find_decoder(pcodec_ctx_orig->codec_id);
    if (NULL == pcodec) {
        printf("unsupported codec.\n");
        return -1;
    }
    // The AVCodecContext of the video stream must not be used directly, so decode with a copy
    AVCodecContext *pcodec_ctx = avcodec_alloc_context3(pcodec);
    if (NULL == pcodec_ctx || avcodec_copy_context(pcodec_ctx, pcodec_ctx_orig) != 0) {
        printf("couldn't copy codec context\n");
        return -1;
    }
    // Open codec
    if (avcodec_open2(pcodec_ctx, pcodec, NULL) < 0) {
        printf("couldn't open codec\n");
        return -1;
    }

    AVFrame *pframe = av_frame_alloc();
    AVFrame *pframe_rgb = av_frame_alloc();
    assert(pframe && pframe_rgb);

    // Number of bytes needed to hold one frame in RGB24 format
    int num_bytes = avpicture_get_size(AV_PIX_FMT_RGB24,
                                       pcodec_ctx->width, pcodec_ctx->height);
    /* av_malloc() is FFmpeg's memory allocation function. It is a thin wrapper
       around malloc() that also guarantees an aligned address, which helps
       performance. As with malloc(), take care to avoid memory leaks and
       double frees. */
    uint8_t *buffer = av_malloc(num_bytes * sizeof(uint8_t));
    if (NULL == buffer)
        return -1;

    // Point the data pointers of pframe_rgb at the newly allocated buffer
    avpicture_fill((AVPicture *)pframe_rgb, buffer, AV_PIX_FMT_RGB24,
                   pcodec_ctx->width, pcodec_ctx->height);

    int frame_finished;
    AVPacket pkt;
    // Initialize the sws context for the YUV to RGB conversion: the decoded video is YUV, while PPM stores RGB.
    // SWS_BILINEAR is a common choice of scaling algorithm.
    struct SwsContext *sws_ctx = sws_getContext(
        pcodec_ctx->width, pcodec_ctx->height, pcodec_ctx->pix_fmt,
        pcodec_ctx->width, pcodec_ctx->height, AV_PIX_FMT_RGB24,
        SWS_BILINEAR, NULL, NULL, NULL);
    if (NULL == sws_ctx)
        return -1;

    i = 0;
    // av_read_frame() reads one packet of compressed data; stop once frameNumber frames are saved
    while (i < frameNumber && av_read_frame(pctx, &pkt) >= 0) {
        if (pkt.stream_index == video_stream) {
            // Decode one frame: the compressed data comes from pkt, the decoded picture goes to pframe
            avcodec_decode_video2(pcodec_ctx, pframe, &frame_finished, &pkt);
            if (frame_finished) {
                // Convert the frame from its original format (pcodec_ctx->pix_fmt) to RGB24
                sws_scale(sws_ctx, (const uint8_t *const *)pframe->data, pframe->linesize,
                          0, pcodec_ctx->height, pframe_rgb->data, pframe_rgb->linesize);
                // Save the frame as a PPM picture file
                save_frame(pframe_rgb, pcodec_ctx->width, pcodec_ctx->height, ++i);
            }
        }
        // Release the packet data allocated by av_read_frame()
        av_free_packet(&pkt);
    }

    // Free memory
    sws_freeContext(sws_ctx);
    av_free(buffer);
    av_frame_free(&pframe_rgb);
    av_frame_free(&pframe);
    // Close codec
    avcodec_free_context(&pcodec_ctx);
    // Close open file
    avformat_close_input(&pctx);

    return 0;
}

void save_frame(AVFrame *pframe, int width, int height, int iframe)
{
    char filename[32];
    int y;

    snprintf(filename, sizeof(filename), "frame%d.ppm", iframe);
    // Binary mode: the P6 pixel data is raw bytes
    FILE *fp = fopen(filename, "wb");
    if (NULL == fp)
        return;

    // PPM header: magic number, width and height, maximum color value
    fprintf(fp, "P6\n%d %d\n255\n", width, height);

    // Write row by row, because linesize[0] may be larger than width*3
    for (y = 0; y < height; y++)
        fwrite(pframe->data[0] + y * pframe->linesize[0], 1, width * 3, fp);

    fclose(fp);
}

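
The row-by-row write used for the PPM file can be tried without FFmpeg. The following is only a minimal sketch, assuming a packed RGB24 buffer in memory. The function name write_ppm and its stride parameter are made up for this illustration; stride plays the role that linesize[0] plays in the code above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write a packed RGB24 image as a binary PPM (P6) file.
 * `stride` is the number of bytes between the starts of two consecutive
 * rows; it may be larger than width * 3 when rows are padded.
 * Returns 0 on success, -1 on failure. */
int write_ppm(const char *path, const uint8_t *rgb,
              int width, int height, int stride)
{
    assert(rgb != NULL && width > 0 && height > 0 && stride >= width * 3);

    FILE *fp = fopen(path, "wb");   /* binary mode: pixel data is raw bytes */
    if (fp == NULL)
        return -1;

    /* Header: magic number, dimensions, maximum channel value. */
    fprintf(fp, "P6\n%d %d\n255\n", width, height);

    /* Copy only the visible width * 3 bytes of each row, skipping padding. */
    for (int y = 0; y < height; y++) {
        const uint8_t *row = rgb + (size_t)y * (size_t)stride;
        if (fwrite(row, 1, (size_t)width * 3, fp) != (size_t)width * 3) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling write_ppm("test.ppm", pixels, 2, 2, 8) on a 2x2 image whose rows are padded to 8 bytes writes an 11-byte header followed by 12 bytes of pixel data; the two padding bytes at the end of each row are not written.
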
  • Explanation of the FFmpeg APIs used in the code

av_register_all(): registers the codecs, hardware accelerators (hwaccel) and parsers, as well as the muxers and demuxers.

avformat_alloc_context(): allocates space for and creates an AVFormatContext object. Use avformat_free_context() to clean up and free the object.

avformat_open_input() does the following:
It initializes the input/output structure AVIOContext.
It identifies the protocol of the input data (such as RTMP or file) through a scoring mechanism: 1. check the suffix of the file name; 2. read the file header data and compare it.
The URLProtocol with the highest score is connected to FFmpeg through function pointers.
After that, the functions of that URLProtocol are called to open, read, and so on.

avformat_find_stream_info(): reads some audio and video data to obtain stream information.
It is mainly used to fill in the AVStream structure of each media stream (audio/video). A look at its source code shows that it finds the decoder, opens the decoder, reads audio and video frames, and decodes them. In other words, the function actually "walks" through the whole decoding process once.

avcodec_find_decoder(): takes the ID of a decoder and returns the decoder that was found (NULL if none was found).
The argument is an enumeration value. The function loops over the registered codecs until it finds a decoder whose ID matches.

avcodec_open2(): opens the decoder. It initializes the AVCodecContext of a video or audio codec.

sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
int dstW, int dstH, enum AVPixelFormat dstFormat,
int flags, SwsFilter *srcFilter,
SwsFilter *dstFilter, const double *param)
Returns a pointer to a SwsContext structure on success.
Parameter 1: width of the source image
Parameter 2: height of the source image
Parameter 3: pixel format of the source image, e.g. YUV or RGB (an enumeration value such as AV_PIX_FMT_YUV420P; the available formats are listed in libavutil/pixfmt.h)
Parameter 4: width of the image after conversion
Parameter 5: height of the image after conversion
Parameter 6: pixel format of the image after conversion, the same kind of value as parameter 3
Parameter 7: algorithm used for the conversion
Parameter 8: NULL
Parameter 9: NULL
Parameter 10: NULL
The available conversion algorithms are listed in libswscale/swscale.h

avpicture_fill(AVPicture *picture, const uint8_t *ptr,
enum AVPixelFormat pix_fmt, int width, int height)
avpicture_fill() attaches the data pointed to by ptr to picture, but does not copy it. It only points the data pointers in the picture structure at the memory of ptr.

avpicture_get_size(): here FFmpeg calculates for us the number of bytes needed to store an image of the given size in the given pixel format.
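
To make the idea of "computing the size" and "pointing without copying" concrete, here is a minimal sketch in plain C. It is not FFmpeg's implementation: struct demo_picture, demo_rgb24_size() and demo_fill_rgb24() are hypothetical stand-ins, and the sketch assumes packed RGB24 with no row padding, where the size works out to width * height * 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the data/linesize fields of a picture structure. */
struct demo_picture {
    uint8_t *data[4];     /* one pointer per plane; RGB24 uses only data[0] */
    int      linesize[4]; /* bytes per row of each plane */
};

/* Bytes needed for a tightly packed RGB24 image: 3 bytes per pixel. */
size_t demo_rgb24_size(int width, int height)
{
    assert(width > 0 && height > 0);
    return (size_t)width * (size_t)height * 3;
}

/* Point the picture at an existing buffer. No pixel data is copied,
 * so the caller still owns `buf` and must keep it alive. */
void demo_fill_rgb24(struct demo_picture *pic, uint8_t *buf, int width)
{
    assert(pic != NULL && buf != NULL && width > 0);
    for (int i = 0; i < 4; i++) {
        pic->data[i] = NULL;
        pic->linesize[i] = 0;
    }
    pic->data[0] = buf;
    pic->linesize[0] = width * 3;
}
```

Because only pointers are stored, anything later written into the buffer is visible through the picture, and freeing the buffer while the picture is still in use would leave a dangling pointer.
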

av_new_packet(); // Allocate packet data
AVPacket stores the data before decoding, that is, the compressed data. The structure itself does not directly contain the data; it holds a pointer to the data buffer. Many data structures in FFmpeg manage their data this way.

av_dump_format(): prints information about the input format and its streams; used for debugging.

av_read_frame(): reads several audio frames or one video frame from the stream. For example, when decoding a video, you call av_read_frame() to obtain the compressed data of one video frame before decoding it (for example, a frame of compressed data in H.264 usually corresponds to a NAL).

The function of avcodec_decode_video2() in FFmpeg is to decode one frame of video data. Its input is a compressed AVPacket structure and its output is a decoded AVFrame structure.

sws_scale(): mainly used to convert the pixel format and resolution of video. Its advantage is that a single function can perform: 1. image color space conversion; 2. resolution scaling; 3. image filtering before and after the conversion. Its disadvantage is relatively low efficiency compared with libyuv or a shader.
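
To show what a YUV420 to RGB conversion involves, here is a minimal sketch in plain C. It is not how libswscale is implemented; it assumes limited-range BT.601 video and uses a widely known integer approximation of that conversion. The names yuv_to_rgb_pixel and i420_to_rgb24 are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one limited-range BT.601 YUV sample to RGB using integer math. */
void yuv_to_rgb_pixel(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3])
{
    int c = (int)y - 16, d = (int)u - 128, e = (int)v - 128;
    rgb[0] = clamp_u8((298 * c + 409 * e + 128) / 256);            /* R */
    rgb[1] = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);  /* G */
    rgb[2] = clamp_u8((298 * c + 516 * d + 128) / 256);            /* B */
}

/* Convert a planar YUV 4:2:0 frame to packed RGB24.
 * Each 2x2 block of luma samples shares one U and one V sample,
 * which is why the chroma planes are indexed with row / 2 and col / 2. */
void i420_to_rgb24(const uint8_t *yp, int y_stride,
                   const uint8_t *up, const uint8_t *vp, int uv_stride,
                   uint8_t *rgb, int rgb_stride, int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            uint8_t y = yp[row * y_stride + col];
            uint8_t u = up[(row / 2) * uv_stride + col / 2];
            uint8_t v = vp[(row / 2) * uv_stride + col / 2];
            yuv_to_rgb_pixel(y, u, v, rgb + row * rgb_stride + col * 3);
        }
    }
}
```

The stride parameters correspond to the linesize values of a frame. A real converter such as sws_scale() additionally handles many pixel formats, scaling and filtering.
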

What is the difference between sws_scale() and sws_getContext()?
Answer: sws_getContext() initializes the conversion. sws_scale() performs the actual conversion.

  • Results

    Double-click the program to run it

    Screenshot after running
