Anyone familiar with ffmpeg knows that, with nothing but the ffmpeg command line, you can publish a live stream over UDP, RTP, RTMP, and so on, or even produce HLS output, saving the m3u8 playlist and segment files to a web server so that ordinary players can play the live stream directly.
True, but as technology enthusiasts we are curious about the mechanisms and principles behind this, and we want to do it in code. Besides, the camera picture published by the ffmpeg command line is not very flexible when it comes to adding watermarks or displaying custom text. Suppose, for example, that we want the main picture to be the computer desktop, with the camera picture shown in the top-left corner; I believe the stock ffmpeg command cannot do that. So is there a way? Yes: if we find the part of ffmpeg that captures the video, we can simply replace it with whatever we need. That is, we grab the desktop picture, grab the camera picture, then scale and superimpose them.
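As a taste of what "scale and superimpose" means in code, here is a minimal sketch of the picture-in-picture idea, assuming tightly packed YUV420P buffers with even dimensions and a camera frame no larger than the desktop frame (none of this is from the original post):

#include <stdint.h>
#include <string.h>

/* Copy a small camera frame into the top-left corner of a desktop frame.
 * Both buffers are assumed to be tightly packed YUV420P. */
static void overlay_topleft(uint8_t *dst, int dw, int dh,
                            const uint8_t *src, int sw, int sh)
{
    /* Y plane: one byte per pixel */
    for (int y = 0; y < sh; y++)
        memcpy(dst + y * dw, src + y * sw, sw);

    /* U and V planes are quarter-size (half width, half height) */
    uint8_t *du = dst + dw * dh, *dv = du + (dw / 2) * (dh / 2);
    const uint8_t *su = src + sw * sh, *sv = su + (sw / 2) * (sh / 2);
    for (int y = 0; y < sh / 2; y++) {
        memcpy(du + y * (dw / 2), su + y * (sw / 2), sw / 2);
        memcpy(dv + y * (dw / 2), sv + y * (sw / 2), sw / 2);
    }
}

Scaling the camera frame first (for example with ffmpeg's sws_scale) would let the inset be any size.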
So now we have a concrete goal based on what we just imagined. ffmpeg already has a feature that saves the computer desktop picture to a file (it can also publish it to the network), which is remarkably close to what we need. A quick search (Baidu will do) turns up the following code:
AVFormatContext *pFormatCtx = avformat_alloc_context();
AVInputFormat *ifmt = av_find_input_format("gdigrab");
avformat_open_input(&pFormatCtx, "desktop", ifmt, NULL);   /* "desktop" selects the whole screen */
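Once the device is open, frames can be pulled in the usual demuxer fashion. The loop below is our own minimal sketch, not part of the snippet above; av_free_packet() is the era-appropriate call for the old API used throughout this article (newer versions use av_packet_unref()):

AVPacket pkt;
while (av_read_frame(pFormatCtx, &pkt) >= 0) {
    /* pkt.data now holds one raw desktop frame: scale/encode/publish it here */
    av_free_packet(&pkt);
}
avformat_close_input(&pFormatCtx);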
So with a few lines we can open a device called gdigrab (it records the Windows desktop) and read the picture from it frame by frame. In the libavdevice directory we find gdigrab.c, which implements this ffmpeg input device; its declaration is structured as follows:
/** gdi grabber device demuxer declaration */
AVInputFormat ff_gdigrab_demuxer = {
    .name           = "gdigrab",
    .long_name      = NULL_IF_CONFIG_SMALL("GDI API Windows frame grabber"),
    .priv_data_size = sizeof(struct gdigrab),
    .read_header    = gdigrab_read_header,
    .read_packet    = gdigrab_read_packet,
    .read_close     = gdigrab_read_close,
    .flags          = AVFMT_NOFILE,
    .priv_class     = &gdigrab_class,
};
Notice the three read_* members: they open the device, read data from it, and close it. Skimming these three functions shows fairly simple code, mostly operations on Windows GDI device contexts (DCs; some familiarity with Windows drawing helps here). Reading this, an idea forms: we can imitate this file to implement what we want. Since such a file is compiled into the ffmpeg library, we want to add a few interfaces (callbacks) without breaking the ffmpeg framework: when the device is opened, our callback fires; when a picture is read, our callback fires; when the device is closed, our callback fires. The C file we write is really just a skeleton whose concrete implementation is supplied by the external user, so we define the following three interfaces:
typedef int (*fnVideoCapInitCallback)(int index, int width, int height, int framerate);
typedef int (*fnVideoCapReadCallback)(int index, unsigned char *buff, int len, int width, int height, int framerate, int format);
typedef int (*fnVideoCapCloseCallback)(int index);
void av_setVideoCapInitCallback(fnVideoCapInitCallback callback);
void av_setVideoCapReadCallback(fnVideoCapReadCallback callback);
void av_setVideoCapCloseCallback(fnVideoCapCloseCallback callback);
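The original post does not show the bodies of these setters, but a minimal sketch just stores the function pointers in static variables; the s_videoCap*Callback names below are chosen to match the device code that follows:

static fnVideoCapInitCallback  s_videoCapInitCallback  = NULL;
static fnVideoCapReadCallback  s_videoCapReadCallback  = NULL;
static fnVideoCapCloseCallback s_videoCapCloseCallback = NULL;

void av_setVideoCapInitCallback(fnVideoCapInitCallback callback)
{
    s_videoCapInitCallback = callback;
}

void av_setVideoCapReadCallback(fnVideoCapReadCallback callback)
{
    s_videoCapReadCallback = callback;
}

void av_setVideoCapCloseCallback(fnVideoCapCloseCallback callback)
{
    s_videoCapCloseCallback = callback;
}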
The following are the concrete implementations of the three methods referenced by the device structure:
static int mygrab_read_header(AVFormatContext *s1)
{
    struct mygrab *mygrab = s1->priv_data;
    AVStream *st = NULL;

    printf("call mygrab_read_header\n");
    if (mygrab->width <= 0 || mygrab->height <= 0) {
        av_log(s1, AV_LOG_ERROR, "video size (%d %d) is invalid\n",
               mygrab->width, mygrab->height);
        return -1;
    }
    st = avformat_new_stream(s1, NULL);
    if (!st)
        return AVERROR(ENOMEM);

    printf("avpriv_set_pts_info\n");
    avpriv_set_pts_info(st, 64, 1, 1000000); /* 64-bit pts in microseconds */

    if (mygrab->framerate.num <= 0 || mygrab->framerate.den <= 0) {
        av_log(s1, AV_LOG_WARNING, "framerate not set, using default framerate\n");
        mygrab->framerate.num = 10;
        mygrab->framerate.den = 1;
    }
    mygrab->time_base  = av_inv_q(mygrab->framerate);
    mygrab->time_frame = av_gettime() / av_q2d(mygrab->time_base);
    mygrab->frame_size = mygrab->width * mygrab->height * 3 / 2; /* YUV420P frame size */

    st->codec->codec_type = AVMEDIA_TYPE_VIDEO;
    st->codec->codec_id   = AV_CODEC_ID_RAWVIDEO;
    st->codec->pix_fmt    = AV_PIX_FMT_YUV420P; /* or AV_PIX_FMT_RGB24 */
    st->codec->width      = mygrab->width;
    st->codec->height     = mygrab->height;
    st->codec->time_base  = mygrab->time_base;
    st->codec->bit_rate   = mygrab->frame_size * 1 / av_q2d(st->codec->time_base) * 8;

    if (s_videoCapInitCallback != NULL) {
        av_log(s1, AV_LOG_INFO, "video size (%d %d) frameRate:%d\n",
               st->codec->width, st->codec->height,
               mygrab->framerate.num / mygrab->framerate.den);
        s_videoCapInitCallback(0, st->codec->width, st->codec->height,
                               mygrab->framerate.num / mygrab->framerate.den);
        return 0;
    }
    av_log(s1, AV_LOG_ERROR, "video cap not call av_setVideoCapInitCallback\n");
    return -1;
}

static int mygrab_read_packet(AVFormatContext *s1, AVPacket *pkt)
{
    struct mygrab *s = s1->priv_data;
    int64_t curtime, delay;

    /* Calculate the time of the next frame */
    s->time_frame += INT64_C(1000000);

    /* wait based on the frame rate */
    for (;;) {
        curtime = av_gettime();
        delay = s->time_frame * s->time_base.num / s->time_base.den - curtime;
        if (delay <= 0) {
            if (delay < INT64_C(-1000000) * s->time_base.num / s->time_base.den) {
                /* grabbing is late: skip ahead one frame interval */
                s->time_frame += INT64_C(1000000);
            }
            break;
        }
        av_usleep(delay);
    }

    if (av_new_packet(pkt, s->frame_size) < 0)
        return AVERROR(EIO);

    pkt->pts = curtime;
    if (s_videoCapReadCallback != NULL) {
        s_videoCapReadCallback(0, pkt->data, pkt->size, s->width, s->height,
                               s->framerate.num / s->framerate.den, AV_PIX_FMT_YUV420P);
        return pkt->size;
    }
    av_log(s1, AV_LOG_ERROR, "video cap not call av_setVideoCapReadCallback\n");
    return 0;
}

static int mygrab_read_close(AVFormatContext *s1)
{
    if (s_videoCapCloseCallback != NULL) {
        s_videoCapCloseCallback(0);
    }
    return 0;
}
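To make the device registrable, mygrab.c also needs an AVInputFormat declaration mirroring ff_gdigrab_demuxer above. This is a sketch, assuming struct mygrab and its mygrab_class options table are declared the same way gdigrab declares its own:

AVInputFormat ff_mygrab_demuxer = {
    .name           = "mygrab",
    .long_name      = NULL_IF_CONFIG_SMALL("custom frame grabber with callbacks"),
    .priv_data_size = sizeof(struct mygrab),
    .read_header    = mygrab_read_header,
    .read_packet    = mygrab_read_packet,
    .read_close     = mygrab_read_close,
    .flags          = AVFMT_NOFILE,
    .priv_class     = &mygrab_class,
};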
Then we register our custom device in alldevices.c so that it can later be found by the name mygrab. The registration logic is:
void avdevice_register_all(void)
{
    ......
    REGISTER_INDEV(MYGRAB, mygrab);
    ......
}
Similarly, we create a new myoss.c that implements custom sound capture and processing, which brings the total to six new interfaces. To keep the API simple, we merge the six setters into a single one:
void av_setVideoAudioCapCallbacks(fnVideoCapInitCallback  callback1,
                                  fnVideoCapReadCallback  callback2,
                                  fnVideoCapCloseCallback callback3,
                                  fnAudioCapInitCallback  callback4,
                                  fnAudioCapReadCallback  callback5,
                                  fnAudioCapCloseCallback callback6);
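Its body is not shown in the original post; a sketch simply forwards to the six individual setters (the av_setAudioCap* names are our assumption, mirroring the video setters):

void av_setVideoAudioCapCallbacks(fnVideoCapInitCallback  callback1,
                                  fnVideoCapReadCallback  callback2,
                                  fnVideoCapCloseCallback callback3,
                                  fnAudioCapInitCallback  callback4,
                                  fnAudioCapReadCallback  callback5,
                                  fnAudioCapCloseCallback callback6)
{
    av_setVideoCapInitCallback(callback1);
    av_setVideoCapReadCallback(callback2);
    av_setVideoCapCloseCallback(callback3);
    av_setAudioCapInitCallback(callback4);   /* audio setters assumed analogous */
    av_setAudioCapReadCallback(callback5);
    av_setAudioCapCloseCallback(callback6);
}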
After recompiling ffmpeg, the library supports our custom and very flexible video and audio capture. Using it typically takes just the following lines of code:
av_setVideoAudioCapCallbacks(.., .., .., .., .., ..);      /* register the six callbacks */

AVFormatContext *pVideoFmtCtx = avformat_alloc_context();
AVInputFormat *vifmt = av_find_input_format("mygrab");
avformat_open_input(&pVideoFmtCtx, NULL, vifmt, NULL);     /* open the custom video device */

AVFormatContext *pAudioFmtCtx = avformat_alloc_context();
AVInputFormat *aifmt = av_find_input_format("myoss");
avformat_open_input(&pAudioFmtCtx, NULL, aifmt, NULL);     /* open the custom audio device */
From then on, the callbacks registered in the first step are invoked exactly as they would be for any other device. With the groundwork above, audio and video capture for the later live streaming system is in place; no further tedious recompiling is required, and we can focus entirely on implementing our callbacks.
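For illustration, here is what a hypothetical set of video callbacks might look like on the application side (all names and the capture source are our own invention; the audio callbacks would be implemented analogously):

static int myVideoInit(int index, int width, int height, int framerate)
{
    /* open the camera / prepare desktop grabbing here */
    return 0;
}

static int myVideoRead(int index, unsigned char *buff, int len,
                       int width, int height, int framerate, int format)
{
    /* fill buff with one YUV420P frame; len == width * height * 3 / 2 */
    return len;
}

static int myVideoClose(int index)
{
    /* release the capture resources here */
    return 0;
}

/* registration: */
av_setVideoAudioCapCallbacks(myVideoInit, myVideoRead, myVideoClose,
                             myAudioInit, myAudioRead, myAudioClose);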
Now that the ffmpeg library is compiled, we can start on the main task: camera capture, encoding, and stream pushing. ffmpeg ships with a good reference example called muxing.c, which saves pictures and sound as a video file, so we can build our feature on top of it. In muxing.c the pictures and sounds are generated by code; we just need to take them from the camera and sound card instead. Of course, we will not simply hard-code capture logic into it, otherwise all the fuss above would have been pointless: we open the two custom devices described earlier and read the data from them.
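As a rough sketch of how the pieces fit together, the inner loop might look like the following. It is not self-contained: pVideoFmtCtx is the opened mygrab input, while oc, video_st, the opened encoder context c, the reusable frame, and the next_pts counter come from the muxing.c-style setup; we keep the deprecated st->codec-era API used throughout this article, and encoder flushing is omitted:

AVPacket rawPkt, encPkt;
int got_packet = 0;
while (av_read_frame(pVideoFmtCtx, &rawPkt) >= 0) {
    /* wrap the raw YUV420P bytes from our device into the reusable frame */
    avpicture_fill((AVPicture *)frame, rawPkt.data, AV_PIX_FMT_YUV420P,
                   c->width, c->height);
    frame->pts = next_pts++;

    av_init_packet(&encPkt);
    encPkt.data = NULL;
    encPkt.size = 0;
    if (avcodec_encode_video2(c, &encPkt, frame, &got_packet) == 0 && got_packet) {
        av_packet_rescale_ts(&encPkt, c->time_base, video_st->time_base);
        encPkt.stream_index = video_st->index;
        av_interleaved_write_frame(oc, &encPkt);
        av_free_packet(&encPkt);
    }
    av_free_packet(&rawPkt);
}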
Having completed the steps above, I believe you can now save the camera's video and sound as an mp4 or other file.
Publishing to the network is only one step further: to publish the video over the network instead of saving it to a file, just change a few lines of code. For example, to publish as HLS, RTMP, or a plain file:
if (type == TYPE_HLS) {
    sprintf(filename, "%s\\playlist.m3u8", szPath);
    avformat_alloc_output_context2(&oc, NULL, "hls", filename);
} else if (type == TYPE_RTMP) {
    sprintf(filename, "%s", szPath);
    avformat_alloc_output_context2(&oc, NULL, "flv", filename);
} else if (type == TYPE_FILE) {
    sprintf(filename, "%s", szPath);
    avformat_alloc_output_context2(&oc, NULL, NULL, filename);
}
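For the RTMP case, szPath is simply the URL of the stream endpoint; for example (the address below is illustrative):

avformat_alloc_output_context2(&oc, NULL, "flv", "rtmp://192.168.1.10/live/test");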
In this way we can publish our live broadcast to the network. For the live streaming system we set up a streaming media server such as red5, push the stream to it, and play it with VLC or ffplay. With that, a complete live broadcasting system has been implemented in code, and what appears in the broadcast is entirely up to us: not just the camera picture, but anything we can imagine.
Below is a live camera-capture demo built with MFC. A PC browser can watch it in real time through a web test page, and phones and tablets can watch it through players such as VLC. It is worth mentioning that the latency on the web test page is lower; playing through a standalone player shows a larger delay, because the Flash player on the web test page sets its RTMP buffer to the minimum, while third-party players keep a relatively large buffer.
In addition, the camera capture in the demo is done with OpenCV, because OpenCV makes image processing easy: adding text to the live picture, flipping it, transforming it, and so on.
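For instance, stamping text on a captured frame takes only a couple of calls with OpenCV's legacy C API (a hypothetical snippet, matching the C used elsewhere in this article; img is an IplImage holding the current frame):

#include <opencv2/imgproc/imgproc_c.h>

CvFont font;
cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, 1.0, 1.0, 0, 2, 8);
cvPutText(img, "LIVE", cvPoint(20, 40), &font, cvScalar(0, 0, 255, 0));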
--------
Statement: This article was reposted by Cloud Leopard Technology from the blog of ce6581281; contact the author for removal in case of infringement.