Natural language processing - application scenario - chatbot: code encapsulation and external interface

Posted by dent_moosely on Sat, 05 Mar 2022 04:00:36 +0100

1, Code encapsulation

When encapsulating the code, note that many intermediate results across the project are dumped to local disk so that they do not have to be recomputed every time. These dumped results should therefore be loaded into memory once, in advance, rather than being loaded again on every call.
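One way to load each dumped result only once is to memoize the loader. The sketch below assumes the results are pickle files; `load_artifact` and the throwaway demo file are hypothetical names used for illustration, not part of the original project.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_artifact(path):
    """Unpickle the file at `path`; repeated calls reuse the cached object."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with a throwaway file standing in for a dumped vocabulary.
demo_path = str(Path(tempfile.mkdtemp()) / "ws.pkl")
with open(demo_path, "wb") as f:
    pickle.dump({"python": 0, "java": 1}, f)

first = load_artifact(demo_path)
second = load_artifact(demo_path)  # served from the cache, no second disk read
```

The classes later in this post reach the same goal by loading inside `__init__` and creating each object once when the service starts.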

1. Encapsulate the intent recognition code

Write the code that judges the user's intention, that is, classify the user's input sentence with the fastText models

import fastText
import re
from lib import jieba_cut

# Two separate models: one for each of the two segmentations of the sentence
fc_mode = fastText.load_model("./classify/data/ft_classify.model")
fc_word_mode = fastText.load_model("./classify/data/ft_classify_words.model")



def is_QA(sentence_info):
    python_qs_list = [" ".join(sentence_info["cuted_sentence"])]
    result = fc_mode.predict(python_qs_list)

    python_qs_list = [" ".join(sentence_info["cuted_word_sentence"])]
    words_result = fc_word_mode.predict(python_qs_list)
    for index, (label,acc,word_label,word_acc) in enumerate(zip(*result,*words_result)):
        label = label[0]
        acc = acc[0]
        word_label = word_label[0]
        word_acc = word_acc[0]
        # If the predicted label is __label__chat, convert it to __label__QA:
        # probability of QA = 1 - probability of chat
        if label == "__label__chat":
            label = "__label__QA"
            acc = 1-acc
        if word_label == "__label__chat":
            word_label = "__label__QA"
            word_acc = 1 - word_acc
        if acc>0.95 or word_acc>0.95:
            # It's QA
            return True
        else:
            return False
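The label conversion and the threshold test can be separated from the models and checked in isolation. This is a minimal sketch, assuming a binary classifier whose only labels are `__label__QA` and `__label__chat`; the helper names `qa_probability` and `is_qa` are hypothetical.

```python
def qa_probability(label, prob):
    """Return P(QA) given the top-1 label and its probability."""
    if label == "__label__chat":
        # Binary case: P(QA) = 1 - P(chat)
        return 1.0 - prob
    return prob


def is_qa(first_pred, second_pred, threshold=0.95):
    """Each prediction is a (label, probability) pair; either model may vote QA."""
    return any(qa_probability(label, prob) > threshold
               for label, prob in (first_pred, second_pred))


confident = is_qa(("__label__QA", 0.97), ("__label__chat", 0.90))
unsure = is_qa(("__label__chat", 0.99), ("__label__QA", 0.60))
```

Here `confident` is true because the first model exceeds the 0.95 threshold, while `unsure` is false because neither converted probability does.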

2. Complete the encapsulation of chatbot code

Provide a predict interface

"""
Prepare the chat model
"""
import pickle
from lib import jieba_cut
import numpy as np
from chatbot import Sequence2Sequence

class Chatbot:
    def __init__(self,ws_path="./chatbot/data/ws.pkl",save_path="./chatbot/model/seq2seq_chatbot.ckpt"):
        self.ws_chatbot = pickle.load(open(ws_path, "rb"))
        self.save_path = save_path
        #TODO .....


    def predict(self,s):
        """
        :param s: raw sentence, not yet segmented
        :return: the reply generated by the chat model
        """
        #TODO ...
        return ans

3. Complete the encapsulation of the recall step of the question answering system

"""
Method of recall
"""
import os
import pickle


class Recall:
    def __init__(self,topk=20):
        # Prepare the models and data needed for Q&A recall
        self.topk = topk
        #TODO load self.QA_dict, the mapping used by get_answer

    def predict(self,sentence):
        """
        :param sentence: sentence info returned by process_user_sentence
        :return: [recall list],[entity]
        """
        #TODO recall
        return recall_list, entity

    def get_answer(self,s):
        return self.QA_dict[s]

4. Complete the encapsulation of the question and answer ranking model

"""
Deep learning ranking
"""
import tensorflow as tf
import pickle
from DNN2 import SiamsesNetwork
from lib import jieba_cut


class DNNSort():
    def __init__(self):
        # The mean of the word-level and single-character models is used as the final result
        self.dnn_sort_words = DNNSortWords()
        self.dnn_sort_single_word = DNNSortSingleWord()

    def predict(self,s,c_list):
        sort1 = self.dnn_sort_words.predict(s,c_list)
        sort2 = self.dnn_sort_single_word.predict(s,c_list)
        for i in sort1:
            sort1[i] = (sort1[i]+ sort2[i])/2
        sorts = sorted(sort1.items(),key=lambda x:x[-1],reverse=True)
        return sorts[0][0],sorts[0][1]

class DNNSortWords:
    def __init__(self,ws_path="./DNN2/data/ws_80000.pkl",save_path="./DNN2/model_keras/esim_model_softmax.ckpt"):
        self.ws = pickle.load(open(ws_path, "rb"))
        self.save_path = save_path
        #TODO ...

    def predict(self,s,c_list):
        """
        :param s: raw sentence, not yet segmented
        :param c_list: list of candidate sentences to compare against
        :return: dict mapping each candidate to its similarity score
        """
        #TODO ...
        return sim_dict

class DNNSortSingleWord:
    def __init__(self,ws_path="./DNN2/data/ws_word.pkl",save_path="./DNN2/data/esim_word_model_softmax.ckpt"):
        self.ws = pickle.load(open(ws_path, "rb"))
        self.save_path = save_path
        #TODO ...

    def predict(self,s,c_list):
        """
        :param s: raw sentence, not yet segmented
        :param c_list: list of candidate sentences to compare against
        :return: dict mapping each candidate to its similarity score
        """
        #TODO ...
        return sim_dict
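The averaging step in `DNNSort.predict` can be tried on its own with plain dictionaries. The sketch below assumes each model returns a `{candidate: score}` dict; `average_and_pick` is a hypothetical helper, and unlike the code above it only averages candidates that both models scored instead of raising a `KeyError`.

```python
def average_and_pick(scores_a, scores_b):
    """Average two {candidate: score} dicts and return the best (candidate, score)."""
    shared = scores_a.keys() & scores_b.keys()
    if not shared:
        raise ValueError("the two models scored no candidate in common")
    averaged = {c: (scores_a[c] + scores_b[c]) / 2 for c in shared}
    best = max(averaged, key=averaged.get)
    return best, averaged[best]


best, score = average_and_pick(
    {"what is python": 0.9, "how to install python": 0.4},
    {"what is python": 0.7, "how to install python": 0.6},
)
```

The returned score is what the service later compares with its answer threshold.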

5. Save chat records

For each user, messages that follow one another within 10 minutes belong to the same round of conversation. If no further message arrives within 10 minutes, the round is considered over, and a message sent after that starts the next round. Chat topics are saved per round so that basic dialogue management can be built on them later. For example, if the user has just asked a question about Python and a follow-up question has no subject, the Python entity stored in redis is taken as its subject

The main implementation logic is:

Use redis to store basic user data
Use mongodb to store conversation records

The specific ideas are as follows:

Obtain the conversation id from the user id, and use it to judge whether a current conversation exists
If the conversation id exists:
1. Update the entity of the conversation and the time of the last message, and reset the expiration time of the conversation id
2. Save the data to mongodb
If the conversation id does not exist:
1. Create the user's basic information (user_id, entity, conversation time)
2. Store the user's basic information in redis, and set the conversation id together with its expiration time
3. Save the data to mongodb
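The round logic above can be sketched without redis or mongodb by keeping the expiry time in a dictionary. `InMemorySessions` and `touch` are hypothetical names; the real implementation that follows relies on redis key expiration instead of comparing timestamps.

```python
import time
from uuid import uuid4


class InMemorySessions:
    """Stand-in for the redis logic: one conversation id per user, with a TTL."""

    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self._sessions = {}  # user_id -> (conversation_id, expires_at)

    def touch(self, user_id, now=None):
        """Return (conversation_id, is_new) and push the expiry ttl seconds ahead."""
        now = time.time() if now is None else now
        current = self._sessions.get(user_id)
        if current is not None and current[1] > now:
            conversation_id, is_new = current[0], False
        else:
            conversation_id, is_new = uuid4().hex, True
        self._sessions[user_id] = (conversation_id, now + self.ttl)
        return conversation_id, is_new


sessions = InMemorySessions(ttl_seconds=600)
first_id, first_new = sessions.touch("u1", now=0)
same_id, same_new = sessions.touch("u1", now=300)   # within 10 minutes: same round
next_id, next_new = sessions.touch("u1", now=1000)  # 700 s after the last message
```

Every message refreshes the expiry, so a round lasts as long as consecutive messages are less than 10 minutes apart.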

"""
Get and update user information
"""
from pymongo import MongoClient
import redis
from uuid import uuid1
import time
import json

"""
### redis
{
user_id:"id",
user_background:{},
last_entity:[],
last_conversation_time:int(time)
}

userid_conversation_id:""

### mongodb stores conversation records
{user_id:,conversation_id:,from:user/bot,message:"",create_time,entity:[],attention:[]}
"""

HOST = "localhost"
CNVERSION_EXPERID_TIME = 60 * 10  # 10 minutes. If the user sends nothing for 10 consecutive minutes, the conversation is over


class MessageManager:
    def __init__(self):
        self.client = MongoClient(host=HOST)
        self.m = self.client["toutiao"]["dialogue"]
        self.r = redis.Redis(host=HOST, port=6379, db=10)

    def last_entity(self, user_id):
        """Entity from the user's last message"""
        # The hash field is written as "last_entity" in user_message_pipeline
        return json.loads(self.r.hget(user_id, "last_entity"))

    def gen_conversation_id(self):
        return uuid1().hex

    def bot_message_pipeline(self, user_id, message):
        """Save the reply record of the robot"""
        conversation_id_key = "{}_conversion_id".format(user_id)
        conversation_id = self.user_exist(conversation_id_key)
        if conversation_id:
            # Refresh the expiration time of the conversation id
            self.r.expire(conversation_id_key, CNVERSION_EXPERID_TIME)
            data = {"user_id": user_id,
                    "conversation_id": conversation_id,
                    "from": "bot",
                    "message": message,
                    "create_time": int(time.time()),
                    }
            # insert_one() replaces Collection.save(), which was removed in pymongo 4
            self.m.insert_one(data)

        else:
            raise ValueError("No conversation id, but the robot tried to reply")

    def user_message_pipeline(self, user_id, message, create_time, attention, entity=None):
        # Identify user-related information
        # 1. Check whether the user exists
        # 2. If the user exists, return the user's latest entity and save the latest conversation
        # 3.1 Judge whether this is a new conversation. If it is, open a new conversation and update the user's conversation information
        # 3.2 If it is not a new conversation, update the user's conversation information
        # 4. Update the user's basic information
        # 5. Return user-related information
        # 6. Call the prediction interface and send the dialogue result

        # Avoid a mutable default argument: treat a missing entity as an empty list
        entity = entity or []

        # Data to be saved; conversation_id is filled in below
        data = {
            "user_id": user_id,
            "from": "user",
            "message": message,
            "create_time": create_time,
            "entity": json.dumps(entity),
            "attention": attention,
        }

        conversation_id_key = "{}_conversion_id".format(user_id)
        conversation_id = self.user_exist(conversation_id_key)
        print("conversation_id",conversation_id)
        if conversation_id:
            if entity:
                # Update the current user's last_entity
                self.r.hset(user_id, "last_entity", json.dumps(entity))
            # Update last conversation time
            self.r.hset(user_id, "last_conversion_time", create_time)
            # Setting the expiration time of conversation
            self.r.expire(conversation_id_key, CNVERSION_EXPERID_TIME)

            # Save chat records to mongodb
            data["conversation_id"] = conversation_id

            # insert_one() replaces Collection.save(), which was removed in pymongo 4
            self.m.insert_one(data)
            print("Data saved to mongodb successfully")

        else:
            # No active conversation: create the user's basic information
            user_basic_info = {
                "user_id": user_id,
                "last_conversion_time": create_time,
                "last_entity": json.dumps(entity)
            }
            self.r.hmset(user_id, user_basic_info)
            print("user_basic_info stored in redis successfully")
            conversation_id = self.gen_conversation_id()
            print("generate conversation_id",conversation_id)

            # Set the conversation id together with its expiration time
            self.r.set(conversation_id_key, conversation_id, ex=CNVERSION_EXPERID_TIME)
            # Save chat records to mongodb
            data["conversation_id"] = conversation_id
            # insert_one() replaces Collection.save(), which was removed in pymongo 4
            self.m.insert_one(data)
            print("Data saved to mongodb successfully")


    def user_exist(self, conversation_id_key):
        """
        Determine whether the user has an active conversation
        :param conversation_id_key: redis key that holds the user's conversation id
        :return: the conversation id, or None if it does not exist or has expired
        """
        conversation_id = self.r.get(conversation_id_key)
        if conversation_id:
            conversation_id = conversation_id.decode()
        print("load conversation_id",conversation_id)
        return conversation_id

2, External interface

1. Use gRPC to provide external services

1.1 Install the gRPC environment

Install gRPC: `pip install grpcio`
Install the Python library for ProtoBuf: `pip install protobuf`
Install the protobuf compilation tools for Python gRPC: `pip install grpcio-tools`

1.2 Define the gRPC interface

//chatbot.proto file
syntax = "proto3";

message ReceivedMessage {
    string user_id = 1; //User id
    string user_message = 2; //Messages delivered by the current user
    int32 create_time = 3; //The time when the current message was sent
}

message ResponsedMessage {
    string user_response = 1; //Messages returned to users
    int32 create_time = 2; //Time returned to the user
}

service ChatBotService {
  rpc Chatbot (ReceivedMessage) returns (ResponsedMessage);
}

1.3 Compile and generate the protobuf files

Compile with the following command to get the chatbot_pb2.py and chatbot_pb2_grpc.py files

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ./chatbot.proto

1.4 Use gRPC to provide the service

import dialogue
from classify import is_QA
from dialogue.process_sentence import process_user_sentence

from chatbot_grpc import chatbot_pb2_grpc
from chatbot_grpc import chatbot_pb2
import time



class chatServicer(chatbot_pb2_grpc.ChatBotServiceServicer):

    def __init__(self):
        #Load various models in advance
        self.recall = dialogue.Recall(topk=20)
        self.dnnsort = dialogue.DNNSort()
        self.chatbot = dialogue.Chatbot()
        self.message_manager = dialogue.MessageManager()

    def Chatbot(self, request, context):
        user_id = request.user_id
        message = request.user_message
        create_time = request.create_time
        # Basic processing of the user's input, such as word segmentation
        message_info = process_user_sentence(message)
        if is_QA(message_info):
            attention = "QA"
            #Save dialog data
            self.message_manager.user_message_pipeline(user_id, message, create_time, attention, entity=message_info["entity"])
            recall_list,entity = self.recall.predict(message_info)
            line, score = self.dnnsort.predict(message,recall_list)
            if score > 0.7:
                ans = self.recall.get_answer(line)
                user_response = ans["ans"]

            else:
                user_response = "Sorry, I haven't learned this problem yet..."
        else:
            attention = "chat"
            # Save dialog data
            self.message_manager.user_message_pipeline(user_id,message,create_time,attention,entity=message_info["entity"])
            user_response = self.chatbot.predict(message)

        self.message_manager.bot_message_pipeline(user_id,user_response)

        create_time = int(time.time())
        return chatbot_pb2.ResponsedMessage(user_response=user_response,create_time=create_time)

def serve():
    import grpc
    from concurrent import futures
    # Multithreaded server
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    # Register local service
    chatbot_pb2_grpc.add_ChatBotServiceServicer_to_server(chatServicer(), server)
    # Listening port
    server.add_insecure_port("[::]:9999")
    # Start receiving requests for service
    server.start()
    # server.start() does not block, so keep the main thread alive; use ctrl+c to exit the service
    try:
        while True:
            time.sleep(1000)
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == '__main__':
    serve()
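To try the service, a small client can call the `Chatbot` RPC through the generated stub. This is a minimal sketch, assuming the generated modules live in the `chatbot_grpc` package as in the server code and that the server is listening on port 9999 of the local machine; `build_request_fields` and `ask` are hypothetical helper names.

```python
import time


def build_request_fields(user_id, message, now=None):
    """Collect the three ReceivedMessage fields defined in chatbot.proto."""
    return {
        "user_id": str(user_id),
        "user_message": message,
        "create_time": int(time.time() if now is None else now),
    }


def ask(user_id, message, target="localhost:9999"):
    """Send one message to the running service and return the reply text."""
    # Imported lazily so this file can be loaded without grpc or the stubs.
    import grpc
    from chatbot_grpc import chatbot_pb2, chatbot_pb2_grpc

    with grpc.insecure_channel(target) as channel:
        stub = chatbot_pb2_grpc.ChatBotServiceStub(channel)
        request = chatbot_pb2.ReceivedMessage(**build_request_fields(user_id, message))
        return stub.Chatbot(request).user_response
```

With the server running, `ask("user_1", "what is python")` would return the text of `user_response`.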

2. Use supervisor to complete the management of services

2.1 Write a simple execution script

#!/bin/bash

# Switch to the directory that contains this script; abort if that fails
cd "$(dirname "$0")" || exit 1
#source activate ds
python grpc_predict.py

Add executable permission: chmod +x run.sh

2.2 installation and configuration of supervisor

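When this was written, the released version of supervisor supported only Python 2, so the Python 3 version was installed from the repository with the following command. Supervisor 4.0 and later also support Python 3 when installed from PyPI.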

pip3 install git+https://github.com/Supervisor/supervisor

Write the supervisor configuration file. In the conf file, a semicolon starts a comment

;conf.d
[program:chat_service]

command=/root/chat_service/run.sh  ;command to execute

stdout_logfile=/root/chat_service/log/out.log ;location of the stdout log

stderr_logfile=/root/chat_service/log/error.log  ;location of the error log

directory=/root/chat_service  ;working directory

autostart=true  ;start automatically when supervisord starts

autorestart=true  ;restart automatically after the program exits

startretries=10 ;maximum number of failed start attempts

Include the above configuration file in the main supervisor configuration

;/etc/supervisord/supervisor.conf 
[include]
files=/root/chat_service/conf.d

Run Supervisor

supervisord -c /etc/supervisord/supervisor.conf

Topics: AI NLP
