Exploring movie review classification and model optimization with a fully connected neural network in TensorFlow 2

Posted by fowlerlfc on Tue, 10 Mar 2020 07:11:25 +0100

The dataset we use is the IMDB movie review dataset bundled with TensorFlow 2 (keras.datasets.imdb).

When the dataset is loaded for the first time it may need to be downloaded from the Internet, which can be slow. I have put a copy on a network disk; the download link is https://pan.baidu.com/s/1tq kw3tldmwn-cmtcbrplw
Extraction code: 4r7f

Copy it into place after downloading (the check below shows where Keras looks for it by default).
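
The original post doesn't show the target path. As a minimal sketch: by default, keras.datasets.imdb.load_data() caches its data as imdb.npz under ~/.keras/datasets and skips the network download when a valid copy is already there (the filename is an assumption based on load_data's default path argument):

import os

# Default Keras dataset cache; load_data() looks for imdb.npz here and
# only downloads when a valid copy is not already present.
target = os.path.join(os.path.expanduser("~"), ".keras", "datasets", "imdb.npz")
print(target, "exists:", os.path.exists(target))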

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

data = keras.datasets.imdb
# Keep only the 10,000 most frequent words
max_word = 10000
(x_train, y_train), (x_test, y_test) = data.load_data(num_words=max_word)

View data shape

x_train.shape, y_train.shape

Output: ((25000,), (25000,))
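
Each sample is a Python list of integer word indices rather than raw text. As a side note not in the original post, the indices can be mapped back to words with get_word_index(); by default load_data() shifts every index by 3 to reserve 0, 1 and 2 for padding, start-of-sequence and out-of-vocabulary markers, so the decoding below assumes those defaults:

word_index = keras.datasets.imdb.get_word_index()
# Reverse the word -> index mapping, accounting for the default offset of 3
# (0 = padding, 1 = start token, 2 = unknown word)
reverse_index = {v + 3: k for k, v in word_index.items()}
print(" ".join(reverse_index.get(i, "?") for i in x_train[0][:20]))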

View the labels

print(y_train)
print(y_train.max())
print(y_train.min())

Output:

[1 0 0 ... 0 1 0]

1

0

So this is a binary classification problem.

The inputs to a neural network must all have the same length, so let's look at the lengths of the training samples.

data_lens = [len(x) for x in x_train]
print(max(data_lens))
print(min(data_lens))

Output:

2494

11

The lengths differ, so the next step is to pad the samples to a common length and then train the text into dense vectors. If anything here is unfamiliar, please look up the relevant material yourself; this article focuses on the implementation process rather than the theory. Let's look at the length distribution of the data.

import matplotlib.pyplot as plt
import numpy as np

# Scatter plot of review lengths (x is just an evenly spaced index, y is the length)
p_x = np.linspace(0, 2494, len(data_lens))
p_y = data_lens
plt.scatter(p_x, p_y)
plt.show()

We will fix the length at a value somewhat larger than the average length, which is:

np.mean(data_lens)

Output: 238.71364

OK, let's take 300.

# Pad (or truncate) every review to exactly 300 tokens
x_train = keras.preprocessing.sequence.pad_sequences(x_train, 300)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, 300)
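
A quick sanity check (not part of the original post): after padding, every review has the same fixed length, so the training data is now a regular 2-D array.

print(x_train.shape)  # (25000, 300): 25000 reviews, each padded/truncated to 300 word indices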

Now we build the model. Turning the text into dense vectors is handled by the Embedding layer that TensorFlow 2 provides.

model = keras.models.Sequential()
# Embedding layer: trains the text into dense vectors,
# data shape (samples, 300) -> (samples, 300, 50)
model.add(layers.Embedding(max_word, 50, input_length=300))
# Flatten layer: (samples, 300, 50) -> (samples, 300 * 50), ready for the fully connected layers
model.add(layers.Flatten())
# Fully connected layer
model.add(layers.Dense(128, activation='relu'))
# Output layer
model.add(layers.Dense(1))
# Sigmoid activation for the binary classification problem
model.add(layers.Activation('sigmoid'))
# View the built model
model.summary()

Output: (the model summary listing each layer's output shape and parameter count)

Now compile and train the model.

model.compile(optimizer=keras.optimizers.Adam(lr=0.001),  # optimizer and learning rate lr
              loss='binary_crossentropy',                  # binary cross-entropy loss
              metrics=['acc'])                             # track accuracy as the evaluation metric
history = model.fit(x_train,
                    y_train,
                    epochs=10,        # number of training epochs
                    batch_size=128,
                    validation_data=(x_test, y_test))

Train on 25000 samples, validate on 25000 samples Epoch 1/10 25000/25000 [==============================] - 7s 278us/sample - loss: 0.4376 - acc: 0.7748 - val_loss: 0.2951 - val_acc: 0.8735

Epoch 2/10 25000/25000 [==============================] - 6s 246us/sample - loss: 0.1181 - acc: 0.9600 - val_loss: 0.3628 - val_acc: 0.8563

Epoch 3/10 25000/25000 [==============================] - 6s 228us/sample - loss: 0.0171 - acc: 0.9972 - val_loss: 0.4186 - val_acc: 0.8649

Epoch 4/10 25000/25000 [==============================] - 6s 235us/sample - loss: 0.0026 - acc: 0.9999 - val_loss: 0.4444 - val_acc: 0.8678

Epoch 5/10 25000/25000 [==============================] - 6s 234us/sample - loss: 0.0010 - acc: 1.0000 - val_loss: 0.4723 - val_acc: 0.8694

Epoch 6/10 25000/25000 [==============================] - 6s 238us/sample - loss: 5.7669e-04 - acc: 1.0000 - val_loss: 0.4912 - val_acc: 0.8698

Epoch 7/10 25000/25000 [==============================] - 6s 238us/sample - loss: 3.7859e-04 - acc: 1.0000 - val_loss: 0.5088 - val_acc: 0.8702

Epoch 8/10 25000/25000 [==============================] - 6s 233us/sample - loss: 2.6826e-04 - acc: 1.0000 - val_loss: 0.5232 - val_acc: 0.8705

Epoch 9/10 25000/25000 [==============================] - 6s 248us/sample - loss: 1.9853e-04 - acc: 1.0000 - val_loss: 0.5364 - val_acc: 0.8701

Epoch 10/10 25000/25000 [==============================] - 7s 262us/sample - loss: 1.5131e-04 - acc: 1.0000 - val_loss: 0.5492 - val_acc: 0.8708

As you can see, accuracy on the training set quickly reaches 1, while on the test set it rises at first and then falls. It is easier to observe this by plotting the two curves.
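
The original post only shows the resulting plot as an image; here is a minimal sketch of the plotting code, assuming the training history was captured in the history variable returned by model.fit above:

# Plot training vs. validation accuracy per epoch
plt.plot(history.history['acc'], label='training acc')
plt.plot(history.history['val_acc'], label='validation acc')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()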

Clearly the model performs well on the training set but much worse on the test set: it is overfitting.

To deal with overfitting, besides collecting more data, we usually turn to the following: 1. adding a Dropout layer; 2. adding L1 or L2 regularization; 3. cross-validation. The modified model below uses the first two.

Now let's modify the model accordingly.

model = tf.keras.models.Sequential()
# Embedding layer: trains the text into dense vectors, (samples, 300) -> (samples, 300, 50)
model.add(tf.keras.layers.Embedding(max_word, 50, input_length=300))
# Replace the Flatten layer with global average pooling: (samples, 300, 50) -> (samples, 50)
# model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.GlobalAveragePooling1D())
# Fully connected layer with L2 regularization on its weights
model.add(tf.keras.layers.Dense(128, activation='relu',
                                kernel_regularizer=tf.keras.regularizers.l2()))
# Dropout layer
model.add(tf.keras.layers.Dropout(0.8))
# Output layer
model.add(tf.keras.layers.Dense(1))
# Sigmoid activation for the binary classification problem
model.add(tf.keras.layers.Activation('sigmoid'))
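
The post doesn't repeat the compile and fit step for the modified model; here is a minimal sketch assuming the same settings as before:

model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.001),
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_data=(x_test, y_test))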

Results after the changes:

The training and validation accuracy curves now cross at about 0.89, a clear improvement. Further optimization and refinement will not be discussed here; that's all for now.
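
As a final check (not shown in the original post), the test-set accuracy can also be read off directly with evaluate():

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print('test accuracy:', test_acc)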
