TensorFlow in Practice, Lesson 7 (Dropout to Solve Overfitting)

Posted by ev5unleash on Mon, 05 Aug 2019 12:42:05 +0200

Dropout Solves Overfitting

 

Overfitting, also known as over-learning, is a common problem in machine learning.

 

 

In the figure above, the black curve is a normal model and the green curve is an overfitted model. Although the green curve separates all the training data correctly, it does not capture the overall characteristics of the data and generalizes poorly to new test data.

Take regression as an example.

 

The third curve suffers from overfitting: although it passes through every training point, it does not reflect the overall trend of the data, and its predictive ability is poor. TensorFlow provides a dropout method to deal with overfitting.

 

Building the Dropout Layer

This example uses a dataset that ships with the sklearn (scikit-learn) library. If you have not installed sklearn yet, install it first (for example with pip install scikit-learn), then import the following:

import tensorflow as tf
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
keep_prob = tf.placeholder(tf.float32)
...
...
Wx_plus_b = tf.nn.dropout(Wx_plus_b, keep_prob)

Here keep_prob is the keep probability, i.e. the proportion of the activations we want to keep. Since it is a placeholder, its value is passed in via feed_dict when the graph is run. When keep_prob=1, everything is kept (100% retention), i.e. dropout has no effect.
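To get a feel for what tf.nn.dropout actually does, here is a minimal sketch (not part of the original lesson): each element is zeroed with probability 1 - keep_prob, and the surviving elements are scaled up by 1/keep_prob so that the expected sum stays the same.

import tensorflow as tf

x = tf.ones([1, 10])
dropped = tf.nn.dropout(x, 0.5)  # keep_prob = 0.5

with tf.Session() as sess:
    print(sess.run(dropped))  # roughly half the entries are 0, the rest are 2.0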

 

Next, prepare the data:

digits = load_digits()
X = digits.data
y = digits.target
y = LabelBinarizer().fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)
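As a quick sanity check (optional, not in the original code), you can print the shapes of the prepared arrays: load_digits() returns 1797 flattened 8x8 images (64 features each), and LabelBinarizer one-hot encodes the 10 digit classes.

print(X_train.shape, y_train.shape)  # approximately (1257, 64) (1257, 10)
print(X_test.shape, y_test.shape)    # approximately (540, 64) (540, 10)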

X_train is the training data and X_test is the test data. Next, add the hidden layer and the output layer (add_layer is defined in the complete code at the end of this post):

 

# add hidden layer and output layer
l1 = add_layer(xs, 64, 50, 'l1', activation_function=tf.nn.tanh)
prediction = add_layer(l1, 50, 10, 'l2', activation_function=tf.nn.softmax)

The activation function of the hidden layer is tanh. (Honestly, it was not explained in the course why; other activation functions caused errors here.) One plausible reason, not stated in the lesson, is that with an unbounded activation such as relu the softmax output can saturate to exactly 0, and tf.log(prediction) in the cross-entropy used later then produces NaN; tanh keeps the hidden activations bounded.
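If you do want to experiment with other activations, a common workaround (not used in this lesson, shown here only as a sketch) is to have the output layer return raw logits and use TensorFlow's numerically stable cross-entropy instead of applying tf.log to the softmax output yourself:

# hypothetical variant: the output layer returns raw logits (no activation)
logits = add_layer(l1, 50, 10, 'l2', activation_function=None)
prediction = tf.nn.softmax(logits)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=ys, logits=logits))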

 

Training

sess.run(train_step, feed_dict={xs: X_train, ys: y_train, keep_prob: 0.5})
#sess.run(train_step, feed_dict={xs: X_train, ys: y_train, keep_prob: 1})

The keep probability specifies how much is retained. Let's compare keep_prob=0.5 with keep_prob=1 and observe the visualization results generated by TensorBoard.
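To view the curves, start TensorBoard and point it at the log directory (the full script below writes its summaries to logs/train and logs/test):

tensorboard --logdir logs

Then open the address it prints (by default http://localhost:6006) in a browser.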

 

Visualization Results

When keep_prob=1, the overfitting problem is exposed; when keep_prob=0.5, dropout takes effect.

When keep_prob=1, the model fits the training data much better than the test data, which is overfitting. The red line is the training error and the blue line is the test error.

 

When keep_prob=0.5

You can clearly see the change in the loss: the training and test curves now stay much closer together.

 

The complete code with comments is as follows:

import tensorflow as tf
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

#Load data
digits = load_digits()
X = digits.data
y = digits.target
#preprocessing.LabelBinarizer is a very useful tool:
#for example, it can convert labels such as yes/no or event/normal into 0 and 1.
#Here it one-hot encodes the 10 digit classes.
y = LabelBinarizer().fit_transform(y)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=.3)

def add_layer(inputs,in_size,out_size,layer_name,activation_function=None,):
    Weights = tf.Variable(tf.random_normal([in_size,out_size]))
    biases = tf.Variable(tf.zeros([1,out_size])+ 0.1)
    Wx_plus_b = tf.matmul(inputs,Weights)+biases
    #here to drop
    Wx_plus_b = tf.nn.dropout(Wx_plus_b, keep_prob)
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b,)
    tf.summary.histogram(layer_name + '/outputs', outputs)
    return outputs

#define placeholder for inputs to network
keep_prob = tf.placeholder(tf.float32)#keep probability: the proportion of activations we keep
xs = tf.placeholder(tf.float32,[None,64])#8*8
ys = tf.placeholder(tf.float32,[None,10])

#add hidden layer and output layer
l1 = add_layer(xs,64,50,'l1',activation_function=tf.nn.tanh)
prediction = add_layer(l1,50,10,'l2',activation_function=tf.nn.softmax)

#the loss between prediction and real data
#loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction),reduction_indices=[1]))

tf.summary.scalar('loss',cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.Session()
merged = tf.summary.merge_all()

#summary writer goes in here
train_writer = tf.summary.FileWriter("logs/train",sess.graph)
test_writer = tf.summary.FileWriter("logs/test",sess.graph)

init = tf.global_variables_initializer() #Variable initialization

sess.run(init)

for i in range(500):
    #here to determine the keeping probability
    sess.run(train_step,feed_dict={xs:X_train,ys:y_train,keep_prob:0.5})
    if i %50 ==0:
        #record loss
        train_result = sess.run(merged,feed_dict={xs:X_train,ys:y_train,keep_prob:1})
        test_result = sess.run(merged,feed_dict={xs:X_test,ys:y_test,keep_prob:1})
        train_writer.add_summary(train_result,i)
        test_writer.add_summary(test_result,i)
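As an optional extension (not part of the original lesson), you can also check the test accuracy after training. The snippet below assumes the variables defined in the script above and evaluates with dropout disabled (keep_prob=1):

#optional: test-set accuracy after training, with dropout disabled
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(ys, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={xs: X_test, ys: y_test, keep_prob: 1}))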
