Intelligent customer service building - FreeSWITCH + mod_unimrcp + ESL integration

Posted by tomz0r on Fri, 18 Feb 2022 19:23:48 +0100

1. Preface

A lot of preparatory work was done in the previous articles. The next step is the actual integration. The goal is to run real-time speech recognition on both sides of a call through the unimrcp module, analyze the recognized content in real time, and support real-time monitoring, intelligent quality inspection, and so on.

The following assumes that you already have some understanding of FreeSWITCH. Some points are described only briefly; you can study the details in depth on your own.

2. Overall approach

An internal extension number is configured in advance in the FreeSWITCH dialplan. When a call to that number comes in, the dialplan uses Outbound mode: it connects to the event socket and hands control to a self-developed ESL service, which controls the features and call flow through DTMF. Because the flow includes TTS voice announcements, the unimrcp TTS parameters are also configured.

<extension name="esl-mrcp">
    <condition field="destination_number" expression="^4567$">
        <action application="answer"/>
        <action application="set" data="tts_engine=unimrcp"/>
        <action application="set" data="tts_voice=aixia"/>
        <action application="socket" data="192.168.160.11:16023 async full"/>
    </condition>
</extension>

Incidentally, inside the ESL service each function can be unit-tested by mapping it to a different key. Once every function passes, integrated end-to-end testing can follow.

Next, the implementation logic of the ESL service. After the call is answered, play a welcome announcement, then prompt the user to press a key to select a function. When the user presses the key for transfer to a human agent, execute bridge to connect the user and the agent. Once the bridge succeeds, execute the speech recognition command on each of the two channels and collect the recognition results until the call ends.

3. Implementation

3.1 Starting speech recognition

The play_and_detect_speech function was verified in the earlier article "Intelligent customer service building (3) - connection between MRCP server and FreeSWITCH", so it is not covered in detail here.

In this scenario, both parties of a live call must be monitored continuously, so the detect_speech command is used to start unimrcp speech recognition. The command is as follows:

detect_speech unimrcp {parameters1=value1,parameters2=value2}hello default

The content inside {} is passed to the MRCP server as custom MRCP header parameters. It typically carries call information.
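
A minimal sketch, in Python, of how an ESL service might assemble that argument string. The function name and the `caller` and `role` parameter names are hypothetical; the `hello` grammar and `default` path are simply copied from the command above.

```python
def build_detect_speech_args(params, grammar="hello", path="default"):
    """Build the argument string passed to the detect_speech application.

    `params` maps custom parameter names to values; they end up inside
    the {...} prefix in front of the grammar name.
    """
    reserved = set("{},= \n")
    pairs = []
    for key, value in params.items():
        key, value = str(key), str(value)
        # Reject characters that would break the {k=v,k=v} syntax.
        if not key or reserved & set(key) or reserved & set(value):
            raise ValueError(f"unsupported character in {key!r}={value!r}")
        pairs.append(f"{key}={value}")
    prefix = "{" + ",".join(pairs) + "}" if pairs else ""
    return f"unimrcp {prefix}{grammar} {path}"


if __name__ == "__main__":
    # Hypothetical call information attached to the recognition request.
    print(build_detect_speech_args({"caller": "1004", "role": "agent"}))
```

Rejecting the separator characters outright is a conservative choice for a sketch; a real service may need an escaping scheme that its MRCP server understands.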

3.2 Listening for speech recognition results

Subscribe to all events with the command event plain all. Each time speech recognition is started, a DETECTED_SPEECH event arrives after the speaker finishes talking. Its content is as follows:

Event-Name: DETECTED_SPEECH
Core-UUID: 4ad02bbf-e4a5-4d29-964b-018962db3354
Event-Calling-Line-Number: 4732
Speech-Type: detected-speech
ASR-Completion-Cause: 0
Channel-State: CS_EXECUTE
Channel-Call-State: ACTIVE
Content-Length: 192

<?xml version="1.0"?>
<result>
  <interpretation confidence="99">
    <engineName>shenghan</engineName>
    <engineStartTime>1636615997148</engineStartTime>
    <result>The result of voice transcription.</result>
    <beginTime>4090</beginTime>
    <endTime>5370</endTime>
    <volumeMax>19</volumeMax>
    <volumeMin>1</volumeMin>
    <volumeAvg>8</volumeAvg>
  </interpretation>
</result>
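
An event like the one above can be split into headers and body and the transcript pulled out of the XML. The following Python sketch uses only the standard library. The function names are hypothetical, and the XML layout (an `interpretation` element with a nested `result`) is taken from this particular engine's output, so other MRCP servers may differ.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def parse_plain_event(raw):
    """Split one plain-format event (bytes) into (headers, body)."""
    head, _, rest = raw.partition(b"\n\n")
    headers = {}
    for line in head.decode("utf-8").splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            # Header values arrive URL-encoded in the plain format.
            headers[name] = unquote(value)
    # Content-Length counts bytes, so slice before decoding.
    length = int(headers.get("Content-Length", "0"))
    body = rest[:length].decode("utf-8")
    return headers, body


def extract_transcript(body):
    """Return (text, confidence) from the XML body, or None if absent."""
    root = ET.fromstring(body.strip())
    interpretation = root.find("interpretation")
    if interpretation is None:
        return None
    text = (interpretation.findtext("result") or "").strip()
    return text, int(interpretation.get("confidence", "0"))
```

`extract_transcript` should only be called when the body is non-empty, since an empty string is not valid XML.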

When Speech-Type is detected-speech, recognition has finished. Testing showed that the FreeSWITCH unimrcp module implements short (single-utterance) speech recognition and does not restart recognition automatically once an utterance completes. The advantage is that in IVR scenarios with many concurrent calls, this saves speech recognition channels.

Speech recognition therefore has to be restarted after the recognition-finished event is received. There are two ways to do this: start it again with the command introduced in 3.1, or use the resume command:

detect_speech resume

3.3 Transferring to an agent

There are three ways to connect the caller to the required phone:

bridge user/1002
fifo ivr in
callcenter ivr_queue

Anyone interested can consult the official FreeSWITCH documentation. Experience shows that the official documentation is still the most useful resource.

When the agent answers, the ESL service receives a CHANNEL_BRIDGE event. At that point it executes the speech recognition command on both channels.
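
A minimal Python sketch of that step, assuming the event has already been parsed into a dict of headers. To my understanding CHANNEL_BRIDGE carries the two leg UUIDs in `Bridge-A-Unique-ID` and `Bridge-B-Unique-ID`; treat those names as an assumption to verify against your own event dump. The `role` parameter and the function name are hypothetical.

```python
def commands_for_bridge(headers, grammar="hello", path="default"):
    """Return one (uuid, app, arg) tuple per leg of a CHANNEL_BRIDGE event.

    Returns an empty list for any other event or when a leg UUID is missing.
    """
    if headers.get("Event-Name") != "CHANNEL_BRIDGE":
        return []
    # Assumed header names for the two bridged legs.
    legs = [
        ("caller", headers.get("Bridge-A-Unique-ID")),
        ("agent", headers.get("Bridge-B-Unique-ID")),
    ]
    if not all(uuid for _, uuid in legs):
        return []
    return [
        # "role" is a hypothetical custom parameter that lets the MRCP
        # server tell the two audio streams apart.
        (uuid, "detect_speech", f"unimrcp {{role={role}}}{grammar} {path}")
        for role, uuid in legs
    ]
```

Returning plain tuples keeps the decision separate from the transport, so the same function works whether the commands are sent with sendmsg or through an ESL client library.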

I expected things to wrap up here, but it was not that simple. It is as if, when God gives a person a lot of money, he takes something away in return, such as that person's troubles. For me it went the other way: the money was taken away and the trouble was delivered. I know God just made a mistake this time, and I hope he makes it up to me next time.

So the journey of exploration began.

4. Problems encountered

After the transfer to the agent, DETECTED_SPEECH events no longer arrive. In fs_cli, the FreeSWITCH log shows the results recognized by the MRCP server, but the DETECTED_SPEECH event never reaches the ESL service, which was rather awkward.

I also consulted Mr. Du, who suggested that bridge might be blocking. I then pressed a key several times after the bridge to execute detect_speech. The speech recognition command ran successfully and the recognition results appeared in the log, but the DETECTED_SPEECH event that should in theory have been returned never arrived.

Later I tested again: A called B directly, and I manually ran the following commands in Inbound mode:

nc 192.168.160.84 8021

auth ClueCon

event DETECTED_SPEECH

sendmsg 9c603a5f-fa12-4506-a93c-c2a971bd5a5a
call-command: execute
execute-app-name: detect_speech
execute-app-arg: unimrcp {test}hello default
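
The manual session above can also be generated from code. This Python sketch only builds the text frames; the function names are hypothetical, and the subscription is written as `event plain DETECTED_SPEECH`, which spells out the event format explicitly.

```python
def esl_command(line):
    """Frame a single-line ESL command; a blank line terminates it."""
    return f"{line}\n\n"


def format_sendmsg(uuid, app, arg=""):
    """Frame a sendmsg that executes a dialplan application on a channel."""
    lines = [
        f"sendmsg {uuid}",
        "call-command: execute",
        f"execute-app-name: {app}",
    ]
    if arg:
        lines.append(f"execute-app-arg: {arg}")
    return "\n".join(lines) + "\n\n"


def inbound_session(password, uuid):
    """Return the frames of the manual test above, in order."""
    return [
        esl_command(f"auth {password}"),
        esl_command("event plain DETECTED_SPEECH"),
        format_sendmsg(uuid, "detect_speech", "unimrcp {test}hello default"),
    ]
```

Each frame can then be written to a TCP connection to the event socket (port 8021 in the test above), for example with `sock.sendall(frame.encode())`.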

The result was the same: in fs_cli, the FreeSWITCH log shows the recognition results from the MRCP server, but no DETECTED_SPEECH event is received.

Based on these attempts, the conclusion is that bridge interferes with the normal delivery of DETECTED_SPEECH events.

5. Solutions

To solve this problem I read part of the source code, but the codebase is fairly large and I could not spare the time to understand it fully. I therefore also looked for material online. One article stood out: "MRCP protocol stack source code modification to support real-time speech recognition". Its general idea is to turn short speech recognition into long (continuous) speech recognition by modifying the MRCP protocol. Since my MRCP server already captures the transcribed content and sends it to MQ, this was a feasible option for me. On second thought, however, it requires modifying the MRCP server in a way that deviates from the standard protocol, which would make it impossible to connect to standard MRCP implementations in the future, so I did not adopt it.

Later I read the mod_unimrcp source code and, considering the time cost, decided to add a new custom event to the mod_unimrcp source.

Thanks to the earlier troubleshooting I was already familiar with this part of the code. I located the point where recognition completes, that is, where the RECOGNIZER_RECOGNITION_COMPLETE event is received from the MRCP server, and in mod_unimrcp.c added a CUSTOM unimrcp::asrend event that carries the MRCP header and body data. The code is as follows:

/**
 * Handle the MRCP responses/events
 */
static apt_bool_t recog_on_message_receive(mrcp_application_t *application, mrcp_session_t *session, mrcp_channel_t *channel, mrcp_message_t *message)
{
    ...
    if (message->start_line.message_type == MRCP_MESSAGE_TYPE_RESPONSE) {
        ...
    } else if (message->start_line.message_type == MRCP_MESSAGE_TYPE_EVENT) {
        /* received MRCP event */
        if (message->start_line.method_id == RECOGNIZER_RECOGNITION_COMPLETE) {
            ...
            recog_channel_set_result_headers(schannel, recog_hdr);
            recog_channel_set_results(schannel, result);
            /* add event begin: publish the result as CUSTOM unimrcp::asrend.
             * `event` is assumed to be a switch_event_t * declared in the elided code. */
            if (switch_event_create(&event, SWITCH_EVENT_CUSTOM) == SWITCH_STATUS_SUCCESS) {
                event->subclass_name = strdup("unimrcp::asrend");
                switch_event_add_header_string(event, SWITCH_STACK_BOTTOM, "Event-Subclass", event->subclass_name);
                ...
                /* raw XML body returned by the MRCP server */
                switch_event_add_header_string(event, SWITCH_STACK_BOTTOM, "MRCP-Body", result);
                switch_event_fire(&event);
            }
            /* add event end */
    ...
}

Next, in the FreeSWITCH source directory, run make mod_unimrcp-install to compile and deploy the module, then start debugging. The custom event was received successfully:

Callernumber: "1004"
Core-Uuid: "ceddd45d-80aa-495d-9301-297356ccf05f"
Event-Calling-File: "mod_unimrcp.c"
Event-Calling-Function: "recog_on_message_receive"
Event-Calling-Line-Number: "3694"
Event-Date-Gmt: "Tue, 15 Feb 2022 10:04:11 GMT"
Event-Date-Local: "2022-02-15 18:04:11"
Event-Date-Timestamp: "1644919451678906"
Event-Name: "CUSTOM"
Event-Sequence: "592"
Event-Subclass: "unimrcp::asrend"
Freeswitch-Hostname: "freeswitch-seat"
Freeswitch-Ipv4: "192.168.160.84"
Freeswitch-Ipv6: "::1"
Freeswitch-Switchname: "freeswitch-seat"
Mrcp-Body: "<?xml version=\"1.0\"?>\n<result>\n  <interpretation confidence=\"99\">\n    <engineName>shenghan</engineName>\n    <engineStartTime>1644919435900</engineStartTime>\n    <result>Today's weather.</result>\n    <beginTime>8490</beginTime>\n    <endTime>9510</endTime>\n    <volumeMax>29</volumeMax>\n    <volumeMin>4</volumeMin>\n    <volumeAvg>13</volumeAvg>\n  </interpretation>\n</result>\n"
Source: "0"
Uuid: "ff8fd9ba-5a26-4f55-ad60-441db85c5bf1"

Finally, in your own ESL service, subscribe to the CUSTOM unimrcp::asrend event and, on receiving it, execute detect_speech resume to continue speech recognition.
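
A small Python sketch of that decision, assuming events are already parsed into a dict of headers. The function name is hypothetical, and reading the channel UUID from the `Uuid` header is an assumption based on the event dump above; adjust it to whatever header your modified module actually sets. The sketch also covers the ordinary DETECTED_SPEECH case from section 3.2.

```python
def resume_target(headers):
    """Return the channel UUID that needs `detect_speech resume`, or None."""
    # Look headers up case-insensitively: the dump above shows "Mrcp-Body"
    # although the C code adds the header as "MRCP-Body".
    lowered = {name.lower(): value for name, value in headers.items()}
    name = lowered.get("event-name")
    if name == "CUSTOM" and lowered.get("event-subclass") == "unimrcp::asrend":
        # Assumption: the custom event carries the channel UUID in "Uuid".
        return lowered.get("uuid")
    if name == "DETECTED_SPEECH" and lowered.get("speech-type") == "detected-speech":
        # Standard channel events identify the channel with "Unique-ID".
        return lowered.get("unique-id")
    return None
```

When the function returns a UUID, the service would execute `detect_speech` with the argument `resume` on that channel; any other event is ignored.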

6. Open questions for further study

I still hope to find time later to study why bridge affects the DETECTED_SPEECH event. Of course, if any expert knows the reason or a solution, please leave a comment. Thank you very much.

I hope we can learn from each other and make progress together.

Topics: FreeSwitch
