Natural language processing - application scenario - chatbot: Code encapsulation and external interface

Posted by dent_moosely on Sat, 05 Mar 2022 04:00:36 +0100

1, Code encapsulation and external interfaces

During code encapsulation, note that throughout the pipeline many intermediate results are dumped to disk to avoid recomputing them on every request. These results should therefore be loaded into memory once, up front, rather than calling load every time a request arrives.
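This load-once pattern can be sketched as follows (a minimal example; `load_or_build` and the cache path are illustrative names, not part of the project):

```python
import os
import pickle

CACHE_PATH = "./data/qa_index.pkl"  # hypothetical cache location

def load_or_build(build_fn, cache_path=CACHE_PATH):
    """Load a precomputed result from disk, building and dumping it only on the first run."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = build_fn()  # the expensive computation runs only once
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

A module would then call `load_or_build` once at import time and keep the result in memory, instead of re-loading it inside every request handler.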

1. Complete intent identification code encapsulation

Complete the code that judges the user's intent, i.e. classifies the user's input sentence with the fastText model.

import fastText
import re
from lib import jieba_cut

# Load both models once at import time, not on every request
fc_sentence_mode = fastText.load_model("./classify/data/ft_classify.model")
fc_word_mode = fastText.load_model("./classify/data/ft_classify_words.model")

def is_QA(sentence_info):
    python_qs_list = [" ".join(sentence_info["cuted_sentence"])]
    result = fc_sentence_mode.predict(python_qs_list)
    python_qs_list = [" ".join(sentence_info["cuted_word_sentence"])]
    words_result = fc_word_mode.predict(python_qs_list)
    for label, acc, word_label, word_acc in zip(*result, *words_result):
        label = label[0]
        acc = acc[0]
        word_label = word_label[0]
        word_acc = word_acc[0]
        # If the predicted label is __label__chat, flip it to __label__QA,
        # with probability P(QA) = 1 - P(chat)
        if label == "__label__chat":
            label = "__label__QA"
            acc = 1 - acc
        if word_label == "__label__chat":
            word_label = "__label__QA"
            word_acc = 1 - word_acc
        if acc > 0.95 or word_acc > 0.95:
            # It is a QA question
            return True
        else:
            return False

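The label flip in `is_QA` can be sanity-checked in isolation. The helper below is invented for illustration and uses made-up probabilities; fastText's `predict` returns labels and probabilities per input in the same spirit:

```python
def to_qa_prob(label, acc):
    """Convert a predicted (label, probability) pair into the probability of __label__QA."""
    if label == "__label__chat":
        return "__label__QA", 1 - acc
    return label, acc

# A sentence predicted as chat with probability 0.12 becomes QA with probability 0.88,
# which is below the 0.95 threshold, so is_QA would return False for it.
label, acc = to_qa_prob("__label__chat", 0.12)
```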
2. Complete the encapsulation of chatbot code

Provide a predict interface.

Prepare the chat model:
import pickle
from lib import jieba_cut
import numpy as np
from chatbot import Sequence2Sequence

class Chatbot:
    def __init__(self, ws_path="./chatbot/data/ws.pkl", save_path="./chatbot/model/seq2seq_chatbot.ckpt"):
        self.ws_chatbot = pickle.load(open(ws_path, "rb"))
        self.save_path = save_path
        # TODO .....

    def predict(self, s):
        """
        :param s: sentence without word segmentation
        :param ws:
        :param ws_words:
        """
        # TODO ...
        return ans

3. Complete the encapsulation of the question-and-answer system's recall step

The recall method:
import os
import pickle

class Recall:
    def __init__(self, topk=20):
        # Prepare modules such as the model for Q&A
        self.topk = topk

    def predict(self, sentence):
        """
        :param sentence:
        :param debug:
        :return: [recall list], [entity]
        """
        # TODO recall
        return recall_list

    def get_answer(self, s):
        return self.QA_dict[s]
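The recall body above is left as a TODO. As a toy illustration of the idea (an assumption for illustration, not the project's actual method), candidates can be ranked by word overlap with the segmented query:

```python
def overlap_recall(sentence_words, candidates, topk=20):
    """Rank candidate questions by word overlap with the query and keep the top k.

    sentence_words: list of tokens from the segmented user question.
    candidates: dict mapping candidate question -> list of its tokens.
    """
    query = set(sentence_words)
    scored = [(q, len(query & set(toks))) for q, toks in candidates.items()]
    scored.sort(key=lambda x: x[1], reverse=True)
    return [q for q, score in scored[:topk] if score > 0]

candidates = {
    "how to install python": ["how", "to", "install", "python"],
    "what is a list": ["what", "is", "a", "list"],
}
overlap_recall(["install", "python"], candidates, topk=1)
# → ["how to install python"]
```

In practice the recall stage would use an inverted index or vector search over the full question bank; the `topk` parameter plays the same role as in the `Recall` class above.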

4. Complete the encapsulation of the question and answer ranking model

Deep learning ranking:
import tensorflow as tf
import pickle
from DNN2 import SiamsesNetwork
from lib import jieba_cut

class DNNSort():
    def __init__(self):
        # The mean of the word-level and character-level models is used as the final score
        self.dnn_sort_words = DNNSortWords()
        self.dnn_sort_single_word = DNNSortSingleWord()

    def predict(self, s, c_list):
        sort1 = self.dnn_sort_words.predict(s, c_list)
        sort2 = self.dnn_sort_single_word.predict(s, c_list)
        for i in sort1:
            sort1[i] = (sort1[i] + sort2[i]) / 2
        sorts = sorted(sort1.items(), key=lambda x: x[-1], reverse=True)
        return sorts[0][0], sorts[0][1]

class DNNSortWords:
    def __init__(self, ws_path="./DNN2/data/ws_80000.pkl", save_path="./DNN2/model_keras/esim_model_softmax.ckpt"):
        self.ws = pickle.load(open(ws_path, "rb"))
        self.save_path = save_path
        # TODO ...

    def predict(self, s, c_list):
        """
        :param s: sentence without word segmentation
        :param c_list: list of candidates to compare against
        :param ws:
        :param ws_words:
        """
        # TODO ...
        return sim_dict

class DNNSortSingleWord:
    def __init__(self, ws_path="./DNN2/data/ws_word.pkl", save_path="./DNN2/data/esim_word_model_softmax.ckpt"):
        self.ws = pickle.load(open(ws_path, "rb"))
        self.save_path = save_path
        # TODO ...

    def predict(self, s, c_list):
        """
        :param s: sentence without word segmentation
        :param c_list: list of candidates to compare against
        :param ws:
        :param ws_words:
        """
        # TODO ...
        return sim_dict
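The score averaging and ranking in `DNNSort.predict` can be checked on toy score dictionaries (the values are made up):

```python
sort1 = {"q1": 0.9, "q2": 0.4}  # word-level model similarities
sort2 = {"q1": 0.7, "q2": 0.8}  # character-level model similarities

# Average the two models' scores in place, as DNNSort.predict does
for i in sort1:
    sort1[i] = (sort1[i] + sort2[i]) / 2

# Pick the candidate with the highest averaged score
sorts = sorted(sort1.items(), key=lambda x: x[-1], reverse=True)
best, score = sorts[0]
# best is "q1", with an averaged score of 0.8
```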

5. Implement the saving of chat records

For each user, consecutive messages within 10 minutes are treated as one round of conversation. If no message arrives for 10 minutes, the round is considered over; a message after that starts the next round. Saving the chat topics of different rounds enables basic dialogue management later on. For example, if the user has just asked a question about python and the next question contains no subject, the "python" entity stored in redis can be used as its subject.

The main implementation logic is:

  1. redis is used to store basic user data
  2. Using mongodb to store conversation records

The specific ideas are as follows:

  1. Obtain the dialogue id according to the user id, and judge whether the current dialogue exists according to the dialogue id
  2. If conversation id exists:
    1. Update the entity of the conversation, the time of the last conversation, and set the expiration time of the conversation id
    2. Save data to mongodb
  3. If the conversation id does not exist:
    1. Create user's basic information (user_id,entity, conversation time)
    2. Store the user's basic information in redis, and set the conversation id and expiration time at the same time
    3. Save data to mongodb
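The round-splitting rule can be sketched without redis by simulating its key-expiry semantics in memory (all names below are illustrative; the real implementation uses redis `SET ... EX` and `EXPIRE`):

```python
import time

SESSION_TTL = 60 * 10  # 10 minutes

class SessionStore:
    """Toy stand-in for redis SET-with-expiry semantics."""
    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, ex, now=None):
        now = time.time() if now is None else now
        self._data[key] = (value, now + ex)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        value, expires_at = self._data.get(key, (None, 0.0))
        return value if now < expires_at else None

store = SessionStore()
store.set("u1_conversion_id", "abc123", ex=SESSION_TTL, now=0)
store.get("u1_conversion_id", now=599)  # → "abc123": still the same round
store.get("u1_conversion_id", now=601)  # → None: the round has expired, a new one starts
```

In the real pipeline, every new user message refreshes the expiry (redis `EXPIRE`), so the 10-minute window is measured from the last message, not from the start of the round.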
Get and update user information:
from pymongo import MongoClient
import redis
from uuid import uuid1
import time
import json

### redis


### mongodb stores conversation records

HOST = "localhost"
CNVERSION_EXPERID_TIME = 60 * 10  # 10 minutes; if there is no message for 10 consecutive minutes, the session is considered over

class MessageManager:
    def __init__(self):
        self.client = MongoClient(host=HOST)
        self.m = self.client["toutiao"]["dialogue"]
        self.r = redis.Redis(host=HOST, port=6379, db=10)

    def last_entity(self, user_id):
        """Return the entity from the user's last message"""
        return json.loads(self.r.hget(user_id, "last_entity"))

    def gen_conversation_id(self):
        return uuid1().hex

    def bot_message_pipeline(self, user_id, message):
        """Save the bot's reply record"""
        conversation_id_key = "{}_conversion_id".format(user_id)
        conversation_id = self.user_exist(conversation_id_key)
        if conversation_id:
            # Refresh the expiration time of the conversation id
            self.r.expire(conversation_id_key, CNVERSION_EXPERID_TIME)
            data = {"user_id": user_id,
                    "conversation_id": conversation_id,
                    "from": "bot",
                    "message": message,
                    "create_time": int(time.time()),
                    }
            self.m.insert_one(data)
        else:
            raise ValueError("No session id, but the bot tried to reply....")

    def user_message_pipeline(self, user_id, message, create_time, attention, entity=[]):
        # Identify user related information:
        # 1. Does the user exist?
        # 2. If the user exists, return the user's latest entity and save the latest conversation
        # 3. Judge whether this is a new conversation: if so, open a new round and update
        #    the user's conversation information; otherwise just update it
        # 4. Update the user's basic information
        # 5. Return user related information
        # 6. Call the prediction interface and send the dialogue structure

        # The data to be saved, still missing conversation_id
        data = {
            "user_id": user_id,
            "from": "user",
            "message": message,
            "create_time": create_time,
            "entity": json.dumps(entity),
            "attention": attention,
        }
        conversation_id_key = "{}_conversion_id".format(user_id)
        conversation_id = self.user_exist(conversation_id_key)
        if conversation_id:
            if entity:
                # Update the current user's last_entity
                self.r.hset(user_id, "last_entity", json.dumps(entity))
            # Update the time of the last conversation
            self.r.hset(user_id, "last_conversion_time", create_time)
            # Refresh the expiration time of the conversation
            self.r.expire(conversation_id_key, CNVERSION_EXPERID_TIME)

            # Save the chat record to mongodb
            data["conversation_id"] = conversation_id
            self.m.insert_one(data)
            print("mongodb data saved successfully")
        else:
            # The conversation does not exist yet
            user_basic_info = {
                "user_id": user_id,
                "last_conversion_time": create_time,
                "last_entity": json.dumps(entity)
            }
            self.r.hmset(user_id, user_basic_info)
            print("redis saved user_basic_info successfully")
            conversation_id = self.gen_conversation_id()
            print("generate conversation_id", conversation_id)

            # Set the id of the session
            self.r.set(conversation_id_key, conversation_id, ex=CNVERSION_EXPERID_TIME)
            # Save the chat record to mongodb
            data["conversation_id"] = conversation_id
            self.m.insert_one(data)
            print("mongodb data saved successfully")

    def user_exist(self, conversation_id_key):
        """
        Determine whether the current conversation exists
        :param conversation_id_key: the redis key of the user's conversation id
        """
        conversation_id = self.r.get(conversation_id_key)
        if conversation_id:
            conversation_id = conversation_id.decode()
        print("load conversation_id", conversation_id)
        return conversation_id

2, External interface

1. Use GRPC to provide external services

1.1 Install the grpc-related environment

Install gRPC: `pip install grpcio`
Install the ProtoBuf python dependency library: `pip install protobuf`
Install the protobuf compilation tool for python grpc: `pip install grpcio-tools`

1.2 define the interface of GRPC

//chatbot.proto file
syntax = "proto3";

message ReceivedMessage {
    string user_id = 1; //User id
    string user_message = 2; //The message sent by the current user
    int32 create_time = 3; //The time when the current message was sent
}

message ResponsedMessage {
    string user_response = 1; //The message returned to the user
    int32 create_time = 2; //The time of the reply to the user
}

service ChatBotService {
  rpc Chatbot (ReceivedMessage) returns (ResponsedMessage);
}
1.3 compile and generate protobuf file

Compile with the following command to obtain the chatbot_pb2.py and chatbot_pb2_grpc.py files:

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ./chatbot.proto

1.4 using grpc to provide services

import dialogue
from classify import is_QA
from dialogue.process_sentence import process_user_sentence

from chatbot_grpc import chatbot_pb2_grpc
from chatbot_grpc import chatbot_pb2
import time

class chatServicer(chatbot_pb2_grpc.ChatBotServiceServicer):

    def __init__(self):
        #Load various models in advance
        self.recall = dialogue.Recall(topk=20)
        self.dnnsort = dialogue.DNNSort()
        self.chatbot = dialogue.Chatbot()
        self.message_manager = dialogue.MessageManager()

    def Chatbot(self, request, context):
        user_id = request.user_id
        message = request.user_message
        create_time = request.create_time
        # Basic processing of the user's input, such as word segmentation
        message_info = process_user_sentence(message)
        if is_QA(message_info):
            attention = "QA"
            # Save the dialogue data
            self.message_manager.user_message_pipeline(user_id, message, create_time, attention, entity=message_info["entity"])
            recall_list, entity = self.recall.predict(message_info)
            line, score = self.dnnsort.predict(message, recall_list)
            if score > 0.7:
                ans = self.recall.get_answer(line)
                user_response = ans["ans"]
            else:
                user_response = "Sorry, I haven't learned this problem yet..."
        else:
            attention = "chat"
            # Save the dialogue data
            self.message_manager.user_message_pipeline(user_id, message, create_time, attention, entity=message_info["entity"])
            user_response = self.chatbot.predict(message)

        create_time = int(time.time())
        return chatbot_pb2.ResponsedMessage(user_response=user_response, create_time=create_time)

def serve():
    import grpc
    from concurrent import futures
    # Multithreaded server
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    # Register the local service
    chatbot_pb2_grpc.add_ChatBotServiceServicer_to_server(chatServicer(), server)
    # Listen on a port (the port number here is an example)
    server.add_insecure_port("[::]:9999")
    # Start receiving requests for the service
    server.start()
    # Use ctrl+c to exit the service
    try:
        while True:
            time.sleep(1000)
    except KeyboardInterrupt:
        server.stop(0)

if __name__ == '__main__':
    serve()
2. Use supervisor to complete the management of services

2.1 write simple execution script


cd `dirname $0` || exit 0
#source activate ds

Add executable permission: chmod +x file name

2.2 installation and configuration of supervisor

The current official release of supervisor still targets python2, but a python3-compatible version can be installed with the following command:

pip3 install git+
  1. Write the supervisor configuration file; note that in a .conf file the semicolon is the comment symbol

    command=/root/chat_service/  ;the command to execute
    stdout_logfile=/root/chat_service/log/out.log  ;location of the log
    stderr_logfile=/root/chat_service/log/error.log  ;location of the error log
    directory=/root/chat_service  ;working directory
    autostart=true  ;start automatically
    autorestart=true  ;whether to restart automatically
    startretries=10  ;maximum number of failed start attempts
  2. Add the above configuration file to the basic configuration of supervisor

  3. Run Supervisor

    supervisord -c /etc/supervisord/supervisor.conf

Topics: AI NLP