Intelligent customer service building - MRCP Server ASR plug-in development

Posted by satyac46 on Wed, 05 Jan 2022 10:21:47 +0100

1. Preparation before coding

1.1 Create a plugin

  Because UniMRCP uses automake to manage source compilation, adding the source code is not enough; we also need to add the corresponding build configuration.

1.2 Modify configure.ac

  First edit the configure.ac file and add the following. These are essentially macro definitions that will be used later in the Makefiles, plus the registration of the Makefile we add later:

dnl ShengHan recognizer plugin.
UNI_PLUGIN_ENABLED(shenghanrecog)

AM_CONDITIONAL([SHENGHANRECOG_PLUGIN],[test "${enable_shenghanrecog_plugin}" = "yes"])

...

AC_CONFIG_FILES([
    plugins/shenghan-recog/Makefile
])

...

AC_OUTPUT
echo ShengHan recognizer plugin.... : $enable_shenghanrecog_plugin

1.3 New source code and directory

  Under the plugins directory, create a new shenghan-recog directory, and a new src directory inside it. You can copy demo_recog_engine.c into this src directory, rename it shenghan_recog_engine.c, and replace every occurrence of the demo keyword in the source with shenghan.

  Create a Makefile.am file in the shenghan-recog directory, as follows:

AM_CPPFLAGS                = $(UNIMRCP_PLUGIN_INCLUDES)

plugin_LTLIBRARIES         = shenghanrecog.la

shenghanrecog_la_SOURCES       = src/shenghan_recog_engine.c
shenghanrecog_la_LDFLAGS       = $(UNIMRCP_PLUGIN_OPTS) -std=c++11 -pthread

include $(top_srcdir)/build/rules/uniplugin.am

  Modify the Makefile.am file in the plugins directory and add the following:

if SHENGHANRECOG_PLUGIN
SUBDIRS               += shenghan-recog
endif

2. Coding

2.1 Voice engine encapsulation

  Wrap the voice engine module you are integrating with in a class. It is recommended that the class include the following methods.

/** Initialize the voice Engine and return the created class pointer */
static AsrShengHan* create_shenghan_engine();

/** After the ws connection succeeds, send voice stream data to the ws server */
void ws_send_buffer(const void * wave, const size_t wave_size);

/** Check whether a speech recognition result pushed by the ws server is available */
bool have_asr_result();

/** Get the speech recognition result pushed by ws server */
std::string get_asr_result();

/** It is used to destroy the voice Engine, including destroying threads, freeing websocket memory and freeing Engine memory */
void release_shenghan_engine();
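
  To make the interface above more concrete, here is a minimal C++ sketch of how such a class could be laid out. It is only an illustration under assumptions: the websocket transport is stubbed out (audio is just buffered locally), and on_ws_result and pending_bytes are hypothetical helpers that are not part of the interface listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch of the wrapper class; the real websocket code is omitted.
class AsrShengHan {
public:
    // Create the engine; the caller later calls release_shenghan_engine().
    static AsrShengHan* create_shenghan_engine() { return new AsrShengHan(); }

    // Stub: a real implementation would write to the websocket here.
    void ws_send_buffer(const void* wave, const size_t wave_size) {
        assert(wave != nullptr || wave_size == 0);
        const uint8_t* p = static_cast<const uint8_t*>(wave);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_audio_.insert(pending_audio_.end(), p, p + wave_size);
    }

    // True when at least one recognition result is waiting to be read.
    bool have_asr_result() {
        std::lock_guard<std::mutex> lock(mutex_);
        return !results_.empty();
    }

    // Pop the oldest result, or return an empty string if there is none.
    std::string get_asr_result() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (results_.empty()) return std::string();
        std::string text = results_.front();
        results_.pop();
        return text;
    }

    // Hypothetical hook: the websocket receive thread would call this.
    void on_ws_result(const std::string& text) {
        std::lock_guard<std::mutex> lock(mutex_);
        results_.push(text);
    }

    // Hypothetical helper, only used to inspect the stubbed audio buffer.
    size_t pending_bytes() {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_audio_.size();
    }

    // A real implementation would also stop threads and close the websocket.
    void release_shenghan_engine() { delete this; }

private:
    AsrShengHan() = default;
    ~AsrShengHan() = default;

    std::mutex mutex_;                   // guards both containers below
    std::vector<uint8_t> pending_audio_; // stand-in for the outgoing ws stream
    std::queue<std::string> results_;    // results pushed by the ws server
};
```

  The mutex matters because, in a layout like this, the audio is written from the MRCP media thread while results arrive on the websocket thread.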

2.2 MRCP Server framework coding

2.2.1 Include the header file

  With the voice engine wrapped in a class, include its header file in shenghan_recog_engine.c.

#include "AsrShengHan.h"

2.2.2 New class variable

  Add a voice engine class member to the shenghan_recog_channel_t structure.

/** Declaration of shenghan recognizer channel */
struct shenghan_recog_channel_t {
	/** Back pointer to engine */
	shenghan_recog_engine_t     *shenghan_engine;
	/** Engine channel base */
	mrcp_engine_channel_t   *channel;

	/** Active (in-progress) recognition request */
	mrcp_message_t          *recog_request;
	/** Pending stop response */
	mrcp_message_t          *stop_response;
	/** Indicates whether input timers are started */
	apt_bool_t               timers_started;
	/** Voice activity detector */
	mpf_activity_detector_t *detector;
	/** File to write utterance to */
	FILE                    *audio_out;

    /** Shenghan asr engine parameter */
    AsrShengHan             *shenghan_asr; // ShengHan engine ASR class
};

2.2.3 Introduction to framework core functions

  This section introduces the core functions of the MRCP Server framework and explains where the ShengHan voice engine code is integrated.

mrcp_plugin_create

  This is the place for voice engine login / registration / authentication. For example, for the iFLYTEK voice engine you could add a call such as XXX_login() here. The ShengHan engine does not involve registration for now, so nothing is done in this part.

/** Create shenghan recognizer engine */
MRCP_PLUGIN_DECLARE(mrcp_engine_t*) mrcp_plugin_create(apr_pool_t *pool);

shenghan_recog_engine_destroy

  This is used for global destruction of the voice engine. If the voice engine class / SDK needs global cleanup, add that code here.

/** Destroy recognizer engine */
static apt_bool_t shenghan_recog_engine_destroy(mrcp_engine_t *engine);

shenghan_recog_engine_channel_create

  When an MRCP Client connects to the MRCP Server, a channel is created and the framework calls this method. This is where to instantiate the voice engine that handles the ASR results for that channel; create_shenghan_engine is called in this function.

/* create shenghan recognizer channel */
static mrcp_engine_channel_t* shenghan_recog_engine_channel_create(mrcp_engine_t *engine, apr_pool_t *pool);

shenghan_recog_channel_destroy

  When an MRCP Client disconnects from the MRCP Server, the channel is destroyed and the framework calls this method. This is where to destroy the voice engine and release the objects created in the class; release_shenghan_engine is called in this function.

/** Destroy engine channel */
static apt_bool_t shenghan_recog_channel_destroy(mrcp_engine_channel_t *channel);

shenghan_recog_stream_write

  As the function name suggests, this function receives the voice stream sent by the MRCP Client. After receiving the voice stream, it does two things:

  • Passes the voice to the shenghan_recog_stream_recog function for asynchronous speech recognition
  • Passes the voice to the mpf_activity_detector_process function for energy calculation and endpoint detection

  These two functions are described next.

shenghan_recog_stream_recog

  Within the created channel, the shenghan_recog_stream_recog function takes the voice stream and hands it over for speech recognition. Inside this function, ws_send_buffer is called to send the audio for recognition.

static apt_bool_t shenghan_recog_stream_recog(shenghan_recog_channel_t *recog_channel, const void *voice_data, unsigned int voice_len);

mpf_activity_detector_process

  This function is provided by MRCP and covers energy calculation and endpoint detection. If you look at the shenghan_recog_engine_channel_create function, you will find that the detector passed to mpf_activity_detector_process is the voice activity detector created by mpf_activity_detector_create.

  In the implementation of mpf_activity_detector_create, you can see that level_threshold sets the energy threshold.

MPF_DECLARE(mpf_activity_detector_t*) mpf_activity_detector_create(apr_pool_t *pool)
{
	mpf_activity_detector_t *detector = apr_palloc(pool,sizeof(mpf_activity_detector_t));
	detector->level_threshold = 12; /* 0 .. 255 */
	detector->speech_timeout = 300; /* 0.3 s */
	detector->silence_timeout = 300; /* 0.3 s */
	detector->noinput_timeout = 5000; /* 5 s */
	detector->duration = 0;
	detector->state = DETECTOR_STATE_INACTIVITY;
	return detector;
}

  I tested with an anchor microphone. With the default energy threshold of 12, the end of speech could not be detected. After raising it to 25, the endpoint was detected sometimes but not always. Overall, the VAD did not work well in my tests. Incidentally, the energy threshold can be set directly with the mpf_activity_detector_level_set function.

recog_channel->detector = mpf_activity_detector_create(pool);
mpf_activity_detector_level_set(recog_channel->detector, 30);
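
  To illustrate why a fixed energy threshold is so sensitive to the microphone and gain, here is a simplified C++ sketch of an energy check over one frame of 16-bit PCM. It is an illustration only, not UniMRCP's actual implementation: frame_level and is_voice are hypothetical names, and using the mean absolute amplitude as the level is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Mean absolute amplitude of a frame of 16-bit PCM samples (assumed measure).
static std::size_t frame_level(const int16_t* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    if (count == 0) return 0;
    std::size_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Widen to int first so that -32768 has a representable absolute value.
        sum += static_cast<std::size_t>(std::abs(static_cast<int>(samples[i])));
    }
    return sum / count;
}

// A frame counts as "voice" when its level reaches the threshold.
static bool is_voice(const int16_t* samples, std::size_t count,
                     std::size_t level_threshold) {
    return frame_level(samples, count) >= level_threshold;
}
```

  With a check of this kind, background noise slightly above the threshold keeps every frame classified as voice, so the silence timeout never starts. That would be consistent with the missed end-of-speech behaviour described above.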

  Later I looked online for opinions on MRCP's endpoint detection, and the general verdict was that it is basically useless.

  One of the posts is listed below, in case you are interested:

  Unimrcp voice activity speech detection.

  For those who want to replace the built-in VAD with their own, see these two articles:

  Interpretation of VAD process in WebRTC

  Replace VAD module of unimrcp

  Because the voice engine I use supports VAD itself, I did not study this further. The integration of the ShengHan VAD is described in 2.2.4.

shenghan_recog_result_load

  This function is called after the recognition-ended event. It gets the recognition result, assembles it into XML, and sends it to the MRCP Client, which completes the speech transmission and recognition flow.

/* Load shenghan recognition result */
static apt_bool_t shenghan_recog_result_load(shenghan_recog_channel_t *recog_channel, mrcp_message_t *message);
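
  As a rough idea of the XML assembly step, the following C++ sketch wraps a recognized text in an NLSML-style result document. The helper names xml_escape and build_nlsml are hypothetical, and the exact set of elements and attributes your MRCP Client expects is an assumption; adjust it to whatever the client actually parses.

```cpp
#include <string>

// Escape the characters that are not allowed verbatim in XML text.
static std::string xml_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}

// Build a minimal NLSML-style body around the recognized text (assumed layout).
static std::string build_nlsml(const std::string& text) {
    const std::string safe = xml_escape(text);
    std::string xml;
    xml += "<?xml version=\"1.0\"?>\n";
    xml += "<result>\n";
    xml += "  <interpretation>\n";
    xml += "    <instance>" + safe + "</instance>\n";
    xml += "    <input mode=\"speech\">" + safe + "</input>\n";
    xml += "  </interpretation>\n";
    xml += "</result>\n";
    return xml;
}
```

  In the plugin, the resulting string would then be copied into the body of the outgoing message, in the same way the demo plugin fills the body from its result file.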

2.2.4 ShengHan VAD integration

  This part shows how the core of MRCP's VAD handling is replaced by wrapping the voice engine's VAD event, as follows.

/** Callback is called from MPF engine context to write/send new frame */
static apt_bool_t shenghan_recog_stream_write(mpf_audio_stream_t *stream, const mpf_frame_t *frame)
{
	...
    
	if(recog_channel->recog_request) {
        // The following call is commented out because MRCP's native VAD performs poorly; the ShengHan VAD is used instead
		// mpf_detector_event_e det_event = mpf_activity_detector_process(recog_channel->detector,frame);
        ......

        // With the ShengHan VAD, the MRCP completion event is sent once an ASR result is available
        bool shenghan_vad_completion = recog_channel->shenghan_asr->have_asr_result();
        if (shenghan_vad_completion == true)
        {
            apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[shenghan] Detected Voice Inactivity " APT_SIDRES_FMT,
                MRCP_MESSAGE_SIDRES(recog_channel->recog_request));
            shenghan_recog_recognition_complete(recog_channel,RECOGNIZER_COMPLETION_CAUSE_SUCCESS);
        }
        
        
        
		if(recog_channel->audio_out) {
			fwrite(frame->codec_frame.buffer,1,frame->codec_frame.size,recog_channel->audio_out);
		}
	}
	return TRUE;
}
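
  The per-frame flow of this callback can be simulated without UniMRCP. The C++ sketch below uses stand-in types only (FakeAsr, FakeChannel, complete and on_frame are all hypothetical), and it assumes that the elided part of the callback forwards the frame to the engine and that the completion routine clears the active request, so that completion is signalled only once.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the engine wrapper: only the two calls used in the callback.
struct FakeAsr {
    std::size_t bytes_sent = 0;
    bool result_ready = false;
    void ws_send_buffer(const void*, const std::size_t n) { bytes_sent += n; }
    bool have_asr_result() const { return result_ready; }
};

// Stand-in for shenghan_recog_channel_t.
struct FakeChannel {
    FakeAsr asr;
    bool request_active = true; // mirrors recog_channel->recog_request != NULL
    int completions = 0;        // number of completion events raised
};

// Stand-in for the completion routine: raise the event, clear the request.
static void complete(FakeChannel& ch) {
    ch.completions++;
    ch.request_active = false;
}

// One call per incoming audio frame, like the stream write callback.
static bool on_frame(FakeChannel& ch, const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (!ch.request_active) {
        return true; // no recognition in progress: ignore the frame
    }
    ch.asr.ws_send_buffer(data, size); // forward audio to the engine
    if (ch.asr.have_asr_result()) {    // the engine's own VAD has finished
        complete(ch);
    }
    return true;
}
```

  Because the request is cleared on completion, frames that arrive after the result are ignored rather than triggering a second completion event.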

3. Compilation

3.1 Standard compilation method

  After coding, recompile and install; the plugin can then be configured and used.

  For the compilation and installation steps, please refer to: Intelligent customer service setup (1) - MRCP Server setup.

3.2 Independent compilation method

  Write a Makefile that adds the lib and include paths of MRCP and of the engine class, and compile with it.

  This plugin is compiled with g++. For details, please refer to the Makefile.

  Once development is complete and testing is stable, I will consider publishing the code. Since the SDK integrated here has not been released publicly yet, I will write a detailed article on integrating the Ali / iFLYTEK voice engines later.

  Put the compiled .so into /usr/local/unimrcp/plugin.

  The next article introduces how MRCP connects with FreeSWITCH.

Topics: C++ Linux