NewsSort - News intelligence subsystem

Posted by hennety on Sun, 28 Nov 2021 04:09:07 +0100

Demo video portal: NewsSort - News intelligence subsystem

1, Project introduction and description

1.1 project introduction:

In the previous project, "2021 China Software Cup - News intelligence subsystem", we used PaddleHub to fine-tune a 10-class long-text news classification model on a crawled and consolidated news data set, and built a visual demo interface with PyQt5. The current project covers the web deployment of that model, which classifies news text into 10 categories: finance, real estate, education, science and technology, military, automobile, sports, games, entertainment and others.

Technology stack: front end: Vue+Element UI; Back end: FastAPI+PaddleHub.

Effective screening and classification of news text lets users find valuable news more efficiently and lowers the cost of obtaining information. Internet companies, in turn, use text classification to sort news into category-specific libraries and make accurate automatic recommendations based on users' needs, which greatly reduces manual effort and resource costs.

1.2 technical route:

a. First, the pre-trained model ernie_tiny (ERNIE Tiny) is fine-tuned with PaddleHub on the crawled and consolidated 10-class news data set, producing the long-text news classification model.

b. Then, the model is deployed and the back-end API is built with FastAPI, and the API's logic and behavior are tested with Postman.

c. Finally, the web front end of the news intelligence subsystem is built with Vue+ElementUI and connected to the back-end API service through Axios network requests, completing front-end/back-end integration.

1.3 how to run the source code:

The complete project source code is attached as a data set for easy management and download. Source address: https://aistudio.baidu.com/aistudio/datasetdetail/117783

If you are interested, download it, unzip it, and configure it following the included "project description document.txt". If you run into problems while running the project, please leave feedback in the comment area.

I hope this project is helpful to you. If you like it, please fork, like and follow ❤

# The complete project is attached as a data set. Unzip it and look at the contents
%cd /home/aistudio/data/data117783/
!unzip NewsSort.zip
/home/aistudio/data/data117783
Archive:  NewsSort.zip
   creating: NewsSort/NewsSort-API/
   creating: NewsSort/NewsSort-API/__pycache__/
  inflating: NewsSort/NewsSort-API/__pycache__/main.cpython-37.pyc  
   creating: NewsSort/NewsSort-API/best_model/
  inflating: NewsSort/NewsSort-API/best_model/model.pdparams  
  inflating: NewsSort/NewsSort-API/main.py  
   creating: NewsSort/NewsSort-Web/
  inflating: NewsSort/NewsSort-Web/index.html  
  inflating: NewsSort/NewsSort-Web/package.json  
  inflating: NewsSort/NewsSort-Web/package-lock.json  
   creating: NewsSort/NewsSort-Web/public/
  inflating: NewsSort/NewsSort-Web/public/favicon.ico  
   creating: NewsSort/NewsSort-Web/src/
  inflating: NewsSort/NewsSort-Web/src/App.vue  
   creating: NewsSort/NewsSort-Web/src/assets/
  inflating: NewsSort/NewsSort-Web/src/assets/background.png  
   creating: NewsSort/NewsSort-Web/src/components/
  inflating: NewsSort/NewsSort-Web/src/components/HelloWorld.vue  
  inflating: NewsSort/NewsSort-Web/src/main.js  
  inflating: NewsSort/NewsSort-Web/vite.config.js  
  inflating: NewsSort/╧ю─┐╦╡├ў╬─╡╡.txt  

Source file description:

a. The NewsSort-API folder is the back-end API service module. Its best_model folder stores the parameters of the 10-class long-text news classification model trained with PaddleHub, and main.py is the main program of the back-end API service.

b. The NewsSort-Web folder is the web front-end module. The front-end pages of the news intelligence subsystem are built with Vue+ElementUI components and connect to the back-end API service through Axios network requests, completing front-end/back-end integration. The main interface code is in src/App.vue.

2, Prediction with the 10-class long-text news classification model

The training of the 10-class long-text news classification model is described in detail in the previous project (2021 Software Cup - News intelligence subsystem). If you are interested, see that project for more.

This section demonstrates the model's predictions.

# Install the latest version of paddlehub
!pip install -U paddlehub -i https://pypi.tuna.tsinghua.edu.cn/simple
# Pin numpy and matplotlib versions to keep the environment consistent
!pip install numpy==1.19
!pip install matplotlib==3.3.0
# Import paddlehub and paddle packages
import paddle
import paddlehub as hub
# View the files in the current path
!ls
NewsSort  NewsSort.zip
# Define 10 categories to classify
label_list=['Finance and Economics', 'house property', 'education', 'science and technology', 'military', 'automobile', 'Sports', 'game', 'entertainment', 'other']
label_map = { 
    idx: label_text for idx, label_text in enumerate(label_list)
}

# Load the trained model
model = hub.Module(
    name='ernie_tiny',
    task='seq-cls',
    num_classes=10,    # Set the classification category to 10
    load_checkpoint='./NewsSort/NewsSort-API/best_model/model.pdparams',  # Load the fine-tuned model weights. The path must be correct, otherwise predictions will be very poor!
    label_map=label_map
    )
INFO:filelock:Lock 140516872109264 acquired on /home/aistudio/.paddlehub/tmp/ernie_tiny


Download https://bj.bcebos.com/paddlehub/paddlehub_dev/ernie_tiny_2.0.2.tar.gz
[##################################################] 100.00%
Decompress /home/aistudio/.paddlehub/tmp/tmprx6d3j8g/ernie_tiny_2.0.2.tar.gz
[##################################################] 100.00%


[2021-11-24 21:48:50,323] [    INFO] - Successfully installed ernie_tiny-2.0.2
INFO:filelock:Lock 140516872109264 released on /home/aistudio/.paddlehub/tmp/ernie_tiny
[2021-11-24 21:48:50,329] [    INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/ernie_tiny.pdparams and saved to /home/aistudio/.paddlenlp/models/ernie-tiny
[2021-11-24 21:48:50,332] [    INFO] - Downloading ernie_tiny.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/ernie_tiny.pdparams
100%|██████████| 354158/354158 [00:07<00:00, 45763.40it/s]
W1124 21:48:58.175362   123 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1124 21:48:58.179697   123 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2021-11-24 21:49:09,092] [    INFO] - Loaded parameters from /home/aistudio/data/data117783/NewsSort/NewsSort-API/best_model/model.pdparams
# Input for the 10-class news classifier: news title + news body
# News title
title = "The new governor is committed to improving the quality of public education in Canada"
# News body (the inner double quotes are escaped so the string literal stays valid)
body = "Mr. Johnston, President of the University of Waterloo, assumed the post of governor of Canada on October 1. Mr. Johnston was also the president of McGill University and held teaching positions at the University of Toronto, Queen's University and the University of Western Ontario.  In his inaugural speech, Mr. Johnston said that he would build Canada into a \"smart and caring country\". To achieve this goal, he proposed three pillars: supporting and caring for families and children; Encourage learning and creativity; Promote charity and volunteer spirit. In particular, he stressed the need to care for and respect teachers, and fully develop everyone's talents through public education."
# Check whether the current GPU is available
paddle.utils.run_check()
Running verify PaddlePaddle program ... 
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Note: if you are using a non-GPU environment, change use_gpu=True to use_gpu=False when running the following code.

from time import time

# Concatenate the news title and body
data = title + body


# Simple cleaning: strip the ends and remove newlines, carriage returns, spaces and tabs
def process(x):
    return x.strip().replace('\n', '').replace('\r', '').replace(" ", "").replace('\t', '')


data = process(data)
# model.predict takes a list of samples, each sample being a list of texts
# (the original code used a variable named `list`, which shadows the built-in)
newslist = [[data]]

# Run the prediction and time it. Note: the first call is slow because of loading; later calls are faster.
begin_time = time()
# 10-class long-text news prediction. use_gpu defaults to False (CPU); set use_gpu=True if a GPU environment is configured
label, probs = model.predict(newslist, max_seq_len=256, batch_size=1, return_prob=True, use_gpu=True)
end_time = time()
print('Total prediction time (MS): %.3f' % ((end_time - begin_time) * 1000.0))

# Output results
print('News headlines: {} \n News text: {} \n News category: {} \n Confidence: {}'.format(title, body, label[0], max(probs[0])))
Total prediction time (MS): 13.762
 News headlines: The new governor is committed to improving the quality of public education in Canada 
 News text: Mr. Johnston, President of the University of Waterloo, assumed the post of governor of Canada on October 1. Mr. Johnston was also the president of McGill University and held teaching positions at the University of Toronto, Queen's University and the University of Western Ontario.  In his inaugural speech, Mr. Johnston said that he would build Canada into a "smart and caring country". To achieve this goal, he proposed three pillars: supporting and caring for families and children; Encourage learning and creativity; Promote charity and volunteer spirit. In particular, he stressed the need to care for and respect teachers, and fully develop everyone's talents through public education. 
 News category: education 
 Confidence: 0.9986690282821655
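
The confidence printed above is just the largest entry of probs[0], so the same output can also be used to list the runner-up categories. Here is a minimal sketch, assuming probs[0] is a plain list of 10 floats in the same order as label_list. The top_k helper and the sample numbers are made up for illustration and are not part of the project:

```python
from typing import List, Tuple

label_list = ['Finance and Economics', 'house property', 'education',
              'science and technology', 'military', 'automobile',
              'Sports', 'game', 'entertainment', 'other']


def top_k(prob_row: List[float], labels: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, highest first."""
    if len(prob_row) != len(labels):
        raise ValueError("expected one probability per label")
    ranked = sorted(zip(labels, prob_row), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up probabilities for illustration only; real values come from model.predict.
example_probs = [0.01, 0.01, 0.90, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
for label, prob in top_k(example_probs, label_list):
    print(f"{label}: {prob:.2f}")
```

With the real model you would pass probs[0] instead of example_probs. Showing the top few categories is handy for news that sits between two topics.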

3, Build back-end API services based on FastAPI+PaddleHub

This section deploys the model and builds the back-end API with FastAPI, then tests the API's behavior and logic with Postman.

3.1 build API interface based on FastAPI

FastAPI is a modern, fast (high-performance) web framework for building APIs. It is quick and simple to use, helps developers code efficiently, and reduces human error.

FastAPI document address: https://fastapi.tiangolo.com/zh/

The complete back-end API service code is in main.py, in the data/data117783/NewsSort/NewsSort-API/ directory.

Handling cross-origin (CORS) requests:

Defining the API endpoint handler:

Running the back-end service:

After this step the back-end API service is up and running. Note that 127.0.0.1 is the local machine address. If you have a server, you can deploy the project there; once the program is running, the service can be reached at <public IP>:8000/.

The API endpoint created in this project is http://127.0.0.1:8000/newssort, and the HTTP method is POST. Next, we run a simple test against it.
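
Besides Postman, the endpoint can be called from a short Python script. This sketch uses only the standard library and assumes the request body is JSON with title and body fields (hypothetical names, see main.py for the real ones):

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/newssort"


def build_request(title: str, body: str) -> urllib.request.Request:
    """Build a JSON POST request; the field names are assumptions."""
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(title: str, body: str) -> dict:
    """Send the request and decode the JSON reply (needs the service running)."""
    with urllib.request.urlopen(build_request(title, body), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so this runs without the server.
request = build_request("Example title", "Example body")
print(request.get_method(), request.full_url)
```

Once the back-end service is running, call_api("some title", "some body") sends the request and returns the decoded reply.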

3.2 Postman interface test

Postman is an API debugging and testing tool that supports the HTTP protocol. It is powerful yet simple to use. Tutorial: Postman tools tutorial

Next, we use Postman to check that the endpoint just created behaves correctly and returns the expected results.
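
What counts as a "normal" result can also be written down as a small check. A sketch, assuming the reply is a JSON object with category and confidence fields (hypothetical names) and that the category is one of the 10 labels from section 2:

```python
label_list = ['Finance and Economics', 'house property', 'education',
              'science and technology', 'military', 'automobile',
              'Sports', 'game', 'entertainment', 'other']


def check_response(payload: dict) -> list:
    """Return a list of problems found in a reply; an empty list means it looks fine."""
    problems = []
    category = payload.get("category")
    confidence = payload.get("confidence")
    if category not in label_list:
        problems.append(f"unknown category: {category!r}")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence should be a number in [0, 1]: {confidence!r}")
    return problems


print(check_response({"category": "education", "confidence": 0.99}))
print(check_response({"category": "weather", "confidence": 1.5}))
```

The same two conditions are what you would eyeball in Postman's response panel: a known category and a confidence between 0 and 1.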

4, Building front-end web pages based on Vue+ElementUI

Vue is a progressive framework for building user interfaces. Vue's core library focuses only on the view layer, so it is easy to pick up and easy to integrate with third-party libraries or existing projects.

Vue official documentation: https://v3.cn.vuejs.org/

Element UI is a Vue based desktop component library for developers, designers and product managers. It provides exquisite and rich components to help developers build websites quickly.

ElementUI document: https://element.eleme.cn/#/zh-CN

Web interface construction and front-end/back-end integration

This section builds the web interface of the news intelligence subsystem with Vue+ElementUI and connects it to the back-end API through Axios network requests, completing front-end/back-end integration.

See data/data117783/NewsSort/NewsSort-Web/src/App.vue for the details of the web interface.

The interaction between the front end and the back-end API works as follows:

Starting the front-end project:

After starting the project, open http://localhost:3000/ to see the news intelligence subsystem interface.

5, Author introduction

Nickname: Alchemist 233

PaddlePaddle developer technical expert (PPDE)

Main direction: development, focusing on NLP and data mining related competitions or projects

https://aistudio.baidu.com/aistudio/personalcenter/thirdview/330406 Follow me and I will share more great projects next time!

Topics: NLP paddlepaddle