Audio and Video Series 4: Acquiring Audio and Video Frame Data with FFmpeg

Posted by techite on Mon, 29 Nov 2021 14:57:39 +0100

title: Audio and Video Series 4: Acquiring Audio and Video Frame Data with FFmpeg

categories: [ffmpeg]

tags: [audio and video programming]

date: 2021/11/29

<div align='right'>Author: Hackett</div>

<div align='right'>WeChat official account: Overtime Ape</div>

1. Decoding audio and video frames into AVFrame

1. The program below uses FFmpeg to parse 20 frames of data from an FLV file and print their parameters. AVFrame is a structure containing many stream parameters; it is defined in libavutil/frame.h.

Full code:

#include <stdio.h>

#ifdef __cplusplus
extern "C" {
#endif
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#ifdef __cplusplus
};
#endif

int openCodecContext(const AVFormatContext* pFormatCtx, int* pStreamIndex, enum AVMediaType type, AVCodecContext** ppCodecCtx) {
    int streamIdx = -1;
    // Find the index of the first stream of the requested media type
    for (unsigned int i = 0; i < pFormatCtx->nb_streams; i++) {
        if (pFormatCtx->streams[i]->codec->codec_type == type) {
            streamIdx = i;
            break;
        }
    }
    if (streamIdx == -1) {
        printf("find stream failed!\n");
        exit(-1);
    }
    // Find decoder
    AVCodecContext* pCodecCtx = pFormatCtx->streams[streamIdx]->codec;
    AVCodec* pCodec = avcodec_find_decoder(pCodecCtx->codec_id);
    if (NULL == pCodec) {
        printf("avcode find decoder failed!\n");
        exit(-1);
    }

    //Open decoder
    if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) {
        printf("avcode open failed!\n");
        exit(-1);
    }
    *ppCodecCtx = pCodecCtx;
    *pStreamIndex = streamIdx;

    return 0;
}

int main(void)
{
    AVFormatContext* pInFormatCtx = NULL;
    AVCodecContext* pVideoCodecCtx = NULL;
    AVCodecContext* pAudioCodecCtx = NULL;
    AVPacket* pPacket = NULL;
    AVFrame* pFrame = NULL;
    int ret;
    /* Support for local files and network URLs */
    const char streamUrl[] = "./ouput_1min.flv";

    /* 1. register */
    av_register_all();

    pInFormatCtx = avformat_alloc_context();

    /* 2. Open stream */
    if (avformat_open_input(&pInFormatCtx, streamUrl, NULL, NULL) != 0) {
        printf("Couldn't open input stream.\n");
        return -1;
    }

    /* 3. Get stream information */
    if (avformat_find_stream_info(pInFormatCtx, NULL) < 0) {
        printf("Couldn't find stream information.\n");
        return -1;
    }

    int videoStreamIdx = -1;
    int audioStreamIdx = -1;
    /* 4. Find and open decoder */
    openCodecContext(pInFormatCtx, &videoStreamIdx, AVMEDIA_TYPE_VIDEO, &pVideoCodecCtx);
    openCodecContext(pInFormatCtx, &audioStreamIdx, AVMEDIA_TYPE_AUDIO, &pAudioCodecCtx);

    pPacket = av_packet_alloc();
    pFrame = av_frame_alloc();

    int cnt = 20; // Read 20 frames of data (audio and video)
    while (cnt--) {
        /* 5. Read the stream data, and store the uncoded data in the pPacket */
        ret = av_read_frame(pInFormatCtx, pPacket);
        if (ret < 0) {
            printf("av_read_frame error\n");
            break;
        }

        /* 6. Decode and store the decoded data in pFrame */
        /* Video decoding */
        if (pPacket->stream_index == videoStreamIdx) {
            int gotPicture = 0;
            if (avcodec_decode_video2(pVideoCodecCtx, pFrame, &gotPicture, pPacket) < 0) {
                printf("video decode error!\n");
                av_packet_unref(pPacket);
                continue;
            }
            if (!gotPicture) { /* No frame output yet: the decoder needs more input */
                av_packet_unref(pPacket);
                continue;
            }
            printf("* * * * * * video * * * * * * * * *\n");
            printf("___height: [%d]\n", pFrame->height);
            printf("____width: [%d]\n", pFrame->width);
            printf("pict_type: [%d]\n", pFrame->pict_type);
            printf("key_frame: [%d]\n", pFrame->key_frame); // 1 = keyframe, 0 = not a keyframe
            printf("___format: [%d]\n", pFrame->format);
            printf("* * * * * * * * * * * * * * * * * * *\n\n");
        }

        /* Audio decoding */
        if (pPacket->stream_index == audioStreamIdx) {
            int gotFrame = 0;
            if (avcodec_decode_audio4(pAudioCodecCtx, pFrame, &gotFrame, pPacket) < 0) {
                printf("audio decode error!\n");
                av_packet_unref(pPacket);
                continue;
            }
            if (!gotFrame) { /* No frame output yet: the decoder needs more input */
                av_packet_unref(pPacket);
                continue;
            }
            printf("* * * * * * audio * * * * * * * * * *\n");
            printf("____nb_samples: [%d]\n", pFrame->nb_samples);
            printf("__samples_rate: [%d]\n", pFrame->sample_rate);
            printf("channel_layout: [%llu]\n", (unsigned long long)pFrame->channel_layout); // channel_layout is uint64_t, so cast for printf
            printf("________format: [%d]\n", pFrame->format);
            printf("* * * * * * * * * * * * * * * * * * *\n\n");
        }
        av_packet_unref(pPacket); /* Decrements the reference count of the packet's buffer by 1 and resets the other fields to their initial values; when the count reaches 0, the buffer is freed automatically */
    }
    /* Release resources */
    av_frame_free(&pFrame);
    av_packet_free(&pPacket);
    avcodec_close(pVideoCodecCtx);
    avcodec_close(pAudioCodecCtx);
    avformat_close_input(&pInFormatCtx);

    return 0;
}

2. A brief description of each function used in the flow:

av_register_all(): registers all FFmpeg components (muxers, demuxers, codecs); deprecated and no longer required since FFmpeg 4.0.

avformat_open_input(): opens the input stream and fills in the AVFormatContext.

avformat_find_stream_info(): reads information about the streams.

avcodec_find_decoder(): finds a decoder for the given codec ID.

avcodec_open2(): opens the decoder.

av_read_frame(): reads one packet of stream data.

avcodec_decode_video2(): decodes a video packet (deprecated since FFmpeg 3.1 in favor of avcodec_send_packet()/avcodec_receive_frame()).

avcodec_decode_audio4(): decodes an audio packet (likewise deprecated).

av_packet_unref(): decrements the reference count of the packet's buffer by 1 and resets the other fields in the packet to their initial values; if the reference count reaches 0, the buffer is freed automatically.
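Since avcodec_decode_video2() and avcodec_decode_audio4() are deprecated since FFmpeg 3.1, newer code uses a send/receive pair instead. Below is a minimal sketch of the replacement loop; the helper name decodePacket is illustrative, and the fragment assumes pCodecCtx, pPacket and pFrame are set up as in the full example above (it needs the FFmpeg headers and libraries to build):

```c
/* Sketch of the send/receive decode loop that replaces
 * avcodec_decode_video2()/avcodec_decode_audio4() in FFmpeg 3.1+. */
static int decodePacket(AVCodecContext *pCodecCtx, const AVPacket *pPacket, AVFrame *pFrame)
{
    /* Feed one compressed packet into the decoder */
    int ret = avcodec_send_packet(pCodecCtx, pPacket);
    if (ret < 0)
        return ret;

    /* Drain every frame the decoder has ready: one packet can yield
     * zero, one, or several frames */
    while ((ret = avcodec_receive_frame(pCodecCtx, pFrame)) >= 0) {
        /* ...use pFrame here (width/height, nb_samples, etc.)... */
        av_frame_unref(pFrame);
    }

    /* AVERROR(EAGAIN) just means "send more input"; AVERROR_EOF means
     * the decoder was flushed. Neither is a real error. */
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}
```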

2. The AVFrame data structure

The AVFrame structure is generally used to store raw, uncompressed data (YUV or RGB for video, PCM for audio), along with related metadata.

The comments in the source are lengthy, so they are omitted here.

typedef struct AVFrame {
#define AV_NUM_DATA_POINTERS 8
    uint8_t *data[AV_NUM_DATA_POINTERS];
    int linesize[AV_NUM_DATA_POINTERS];
    uint8_t **extended_data;
    int width, height;
    int nb_samples;
    int format;
    int key_frame;
    enum AVPictureType pict_type;
    AVRational sample_aspect_ratio;
    int64_t pts;
#if FF_API_PKT_PTS
    attribute_deprecated
    int64_t pkt_pts;
#endif
    int64_t pkt_dts;
    int coded_picture_number;
    int display_picture_number;
    int quality;
    void *opaque;
#if FF_API_ERROR_FRAME
    attribute_deprecated
    uint64_t error[AV_NUM_DATA_POINTERS];
#endif
    int repeat_pict;
    int interlaced_frame;
    int top_field_first;
    int palette_has_changed;
    int64_t reordered_opaque;
    int sample_rate;
    uint64_t channel_layout;
    AVBufferRef *buf[AV_NUM_DATA_POINTERS];
    AVBufferRef **extended_buf;
    int        nb_extended_buf;
    AVFrameSideData **side_data;
    int            nb_side_data;
#define AV_FRAME_FLAG_CORRUPT       (1 << 0)
#define AV_FRAME_FLAG_DISCARD   (1 << 2)
    int flags;
    enum AVColorRange color_range;
    enum AVColorPrimaries color_primaries;
    enum AVColorTransferCharacteristic color_trc;
    enum AVColorSpace colorspace;
    enum AVChromaLocation chroma_location;
    int64_t best_effort_timestamp;
    int64_t pkt_pos;
    int64_t pkt_duration;
    AVDictionary *metadata;
    int decode_error_flags;
#define FF_DECODE_ERROR_INVALID_BITSTREAM   1
#define FF_DECODE_ERROR_MISSING_REFERENCE   2
#define FF_DECODE_ERROR_CONCEALMENT_ACTIVE  4
#define FF_DECODE_ERROR_DECODE_SLICES       8
    int channels;
    int pkt_size;
#if FF_API_FRAME_QP
    attribute_deprecated
    int8_t *qscale_table;
    attribute_deprecated
    int qstride;
    attribute_deprecated
    int qscale_type;
    attribute_deprecated
    AVBufferRef *qp_table_buf;
#endif
    AVBufferRef *hw_frames_ctx;
    AVBufferRef *opaque_ref;
    size_t crop_top;
    size_t crop_bottom;
    size_t crop_left;
    size_t crop_right;
    AVBufferRef *private_ref;
} AVFrame;

Next, let's focus on some common structure members:

2.1 data

 uint8_t *data[AV_NUM_DATA_POINTERS]; // Raw decoded data (YUV/RGB for video, PCM for audio)

data is an array of pointers: each element points to one plane of a video picture, or to one channel plane of planar audio. For packed formats, all of the data lives in data[0].

2.2 linesize

int linesize[AV_NUM_DATA_POINTERS]; // Size in bytes of one row of the corresponding data plane; note it may be larger than the visible image width

For video, each linesize element is the size in bytes of one image row in the corresponding plane; pay attention to alignment requirements.

For audio, each linesize element is the size in bytes of the corresponding audio plane.

For performance reasons, rows may be padded with extra bytes, so linesize can be larger than the size of the usable audio/video data.

2.3 width, height;

int width, height; // Video frame width and height in pixels (e.g. 1920x1080, 1280x720)

2.4 nb_samples

int nb_samples; // Number of samples per channel in this audio frame

2.5 format

int format; // Original data type after decoding

For video frames, this value corresponds to enum AVPixelFormat

enum AVPixelFormat {  
    AV_PIX_FMT_NONE = -1,  
    AV_PIX_FMT_YUV420P,   ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)  
    AV_PIX_FMT_YUYV422,   ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr  
    AV_PIX_FMT_RGB24,     ///< packed RGB 8:8:8, 24bpp, RGBRGB...  
    AV_PIX_FMT_BGR24,     ///< packed RGB 8:8:8, 24bpp, BGRBGR...  
    AV_PIX_FMT_YUV422P,   ///< planar YUV 4:2:2, 16bpp, (1 Cr & Cb sample per 2x1 Y samples)  
    AV_PIX_FMT_YUV444P,   ///< planar YUV 4:4:4, 24bpp, (1 Cr & Cb sample per 1x1 Y samples)  
    AV_PIX_FMT_YUV410P,   ///< planar YUV 4:1:0,  9bpp, (1 Cr & Cb sample per 4x4 Y samples)  
    AV_PIX_FMT_YUV411P,   ///< planar YUV 4:1:1, 12bpp, (1 Cr & Cb sample per 4x1 Y samples)  
    AV_PIX_FMT_GRAY8,     ///<        Y        ,  8bpp  
    AV_PIX_FMT_MONOWHITE, ///<        Y        ,  1bpp, 0 is white, 1 is black, in each byte pixels are ordered from the msb to the lsb  
    AV_PIX_FMT_MONOBLACK, ///<        Y        ,  1bpp, 0 is black, 1 is white, in each byte pixels are ordered from the msb to the lsb  
    AV_PIX_FMT_PAL8,      ///< 8 bit with PIX_FMT_RGB32 palette  
    AV_PIX_FMT_YUVJ420P,  ///< planar YUV 4:2:0, 12bpp, full scale (JPEG), deprecated in favor of PIX_FMT_YUV420P and setting color_range
    ...((omitted)  
}  

For audio frames, this value corresponds to enum AVSampleFormat

enum AVSampleFormat {  
    AV_SAMPLE_FMT_NONE = -1,  
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits  
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits  
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits  
    AV_SAMPLE_FMT_FLT,         ///< float  
    AV_SAMPLE_FMT_DBL,         ///< double  
    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar  
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar  
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar  
    AV_SAMPLE_FMT_FLTP,        ///< float, planar  
    AV_SAMPLE_FMT_DBLP,        ///< double, planar 
    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically  
};  

2.6 key_frame

int key_frame; // Whether this frame is a keyframe (1 = yes, 0 = no)

2.7 pict_type

enum AVPictureType pict_type; // Video frame type (I, B, P, etc.)

enum AVPictureType {
    AV_PICTURE_TYPE_NONE = 0, ///< Undefined
    AV_PICTURE_TYPE_I,     ///< Intra
    AV_PICTURE_TYPE_P,     ///< Predicted
    AV_PICTURE_TYPE_B,     ///< Bi-dir predicted
    AV_PICTURE_TYPE_S,     ///< S(GMC)-VOP MPEG-4
    AV_PICTURE_TYPE_SI,    ///< Switching Intra
    AV_PICTURE_TYPE_SP,    ///< Switching Predicted
    AV_PICTURE_TYPE_BI,    ///< BI type
};

2.8 sample_aspect_ratio

AVRational sample_aspect_ratio; // Sample (pixel) aspect ratio; 0/1 if unknown. The display aspect ratio (16:9, 4:3, ...) is width:height scaled by this value

2.9 pts

int64_t pts; // Presentation timestamp, in time_base units

2.10 pkt_dts

int64_t pkt_dts;

The decoding timestamp (DTS) of the packet that was decoded to produce this frame, copied from that AVPacket. If the packets carry only DTS values and no PTS, this value also serves as the presentation timestamp of the frame.

2.11 coded_picture_number

int coded_picture_number; // Picture number in coded (bitstream) order

2.12 display_picture_number

int display_picture_number; // Picture number in display order

2.13 interlaced_frame

int interlaced_frame; // Is it interlaced

2.14 sample_rate

int sample_rate; // Audio sampling rate

2.15 buf

AVBufferRef *buf[AV_NUM_DATA_POINTERS]; 

The frame's data buffers can be managed through AVBufferRef, which provides FFmpeg's AVBuffer reference mechanism.

AVBuffer is a commonly used FFmpeg buffer type that is managed by reference counting; buf[i] holds the reference backing data[i].

2.16 pkt_pos

int64_t pkt_pos; // Byte offset in the input file of the last packet fed into the decoder

2.17 pkt_duration

int64_t pkt_duration; // Duration of the corresponding packet, in AVStream->time_base units

2.18 channels

int channels;// Number of audio channels

2.19 pkt_size

int pkt_size;// Corresponding packet size

2.20 crop_

size_t crop_top;
size_t crop_bottom;
size_t crop_left;
size_t crop_right;

Used for cropping the video frame: the four values give the number of pixels to crop from the top/bottom/left/right edges of the frame.

If you found this article helpful, please like, favorite, and share.

I'm Overtime Ape. See you next time!

Topics: ffmpeg