# Preventing Overfitting with Early Stopping

Posted by Duke555 on Thu, 03 Feb 2022 04:43:23 +0100

# Early Stopping

### Brief Introduction

When we train deep neural networks, we usually want the best possible generalization performance. However, all standard neural network architectures, such as the MLP, are prone to overfitting: while the error on the training set keeps getting lower, at some point the performance on the test set actually begins to deteriorate.

PS: in this figure, because of small fluctuations in the validation loss, the "U-shape" of the validation curve is not very obvious.
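To make the turning point concrete, here is a minimal sketch that locates the epoch where the validation loss is lowest. The loss values are made up for illustration (they are not from the experiment in this post), and `best_epoch` is a hypothetical helper name.

```python
# Illustrative, made-up loss curves (not measured results).
train_loss = [1.90, 1.40, 1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val_loss = [1.95, 1.50, 1.20, 1.05, 1.00, 1.02, 1.08, 1.15]


def best_epoch(losses):
    """Return the index of the smallest loss (the bottom of the U-shape)."""
    return min(range(len(losses)), key=lambda epoch: losses[epoch])


turning_point = best_epoch(val_loss)
print("validation loss is lowest at epoch", turning_point)

# After the turning point the training loss keeps falling
# while the validation loss rises: the model is overfitting.
still_improving_on_train = train_loss[-1] < train_loss[turning_point]
getting_worse_on_val = val_loss[-1] > val_loss[turning_point]
print(still_improving_on_train, getting_worse_on_val)
```

With real curves the minimum is usually noisier, which is why early stopping waits for several bad evaluations before giving up.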

### How to solve overfitting

1. Reduce the number of dimensions of the parameter space.
2. Reduce the effective size of each dimension.

Methods that reduce the number of parameters include greedy constructive learning, pruning, and weight sharing. Methods that reduce the effective size of each parameter dimension are mainly regularization techniques, such as weight decay, and early stopping.
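As a rough illustration of how weight decay limits the effective size of a parameter, the sketch below applies plain gradient descent with an L2 penalty to a single weight. The function name `sgd_step` and all the numbers are assumptions made for this example; they are not part of any library.

```python
def sgd_step(weight, grad, lr=0.1, weight_decay=0.0):
    """One gradient-descent step with an L2 penalty (weight decay).

    The penalty adds weight_decay * weight to the gradient, so every
    step pulls the weight toward zero.
    """
    return weight - lr * (grad + weight_decay * weight)


w_plain = 2.0
w_decay = 2.0
for _ in range(10):
    # Zero data gradient: only the penalty can change the weight.
    w_plain = sgd_step(w_plain, grad=0.0)
    w_decay = sgd_step(w_decay, grad=0.0, weight_decay=0.5)

print(w_plain)  # stays at its initial value
print(w_decay)  # has shrunk toward zero
```

PyTorch optimizers such as `torch.optim.Adam` accept a `weight_decay` argument that serves the same purpose.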

### Early stopping

#### Brief Introduction

During training, evaluate the model on the validation set. When its performance on the validation set begins to decline, stop training.

#### Specific steps

1. Split the original training set into a training set and a validation set.
2. Train only on the training set, and compute the model's error on the validation set once every T epochs (for example, every 15 epochs or mini-batches). Whenever the validation error is the best so far, save the current model parameters.
3. Stop training once P consecutive evaluations fail to improve on the best validation error (P can be understood as the patience, or tolerance).
4. Use the parameters saved at the best validation result as the final parameters of the model.
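The steps above can be sketched without any deep learning framework. The `EarlyStopping` class below is a hypothetical helper written for this post, not a library class, and the validation losses are made-up numbers. It counts consecutive evaluations without improvement and signals a stop once the count reaches the patience.

```python
class EarlyStopping:
    """Track the best validation loss and signal when to stop.

    Hypothetical helper written for this example; not a library class.
    """

    def __init__(self, patience):
        self.patience = patience        # P: allowed bad evaluations in a row
        self.best_loss = float("inf")   # best validation loss seen so far
        self.best_step = None           # evaluation index where it was seen
        self.bad_checks = 0             # consecutive evaluations without improvement

    def update(self, val_loss, step):
        """Record one validation result; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_step = step       # a checkpoint would be saved here
            self.bad_checks = 0
            return False
        self.bad_checks += 1
        return self.bad_checks >= self.patience


# Made-up validation losses, one per evaluation (not measured results).
val_losses = [1.20, 1.00, 0.90, 0.95, 0.93, 0.97, 0.99]

stopper = EarlyStopping(patience=3)
stopped_at = None
for step, loss in enumerate(val_losses):
    if stopper.update(loss, step):
        stopped_at = step
        break

print("stopped at evaluation", stopped_at)
print("best evaluation", stopper.best_step, "with loss", stopper.best_loss)
```

Note that the counter resets whenever the validation loss improves, so a single noisy evaluation does not end training early.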

#### Codes

The following example applies early stopping to a simple three-layer GCN.

```
PyTorch = 1.7.1, Python = 3.6, torch-geometric = 1.7.1, CUDA = 10.1
```
```python
import random

import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv


# Define the network: a three-layer GCN
class GCN_NET3(torch.nn.Module):
    '''
    Three-layer GCN.
    Note: a two-layer GCN has better performance.
    '''

    def __init__(self, num_features, hidden_size1, hidden_size2, classes):
        '''
        :param num_features: each node has a [1, D] feature vector
        :param hidden_size1: the size of the first hidden layer
        :param hidden_size2: the size of the second hidden layer
        :param classes: the number of classes
        '''
        super(GCN_NET3, self).__init__()
        self.conv1 = GCNConv(num_features, hidden_size1)
        self.relu = torch.nn.ReLU()
        self.dropout = torch.nn.Dropout(p=0.5)  # dropout to reduce overfitting
        self.conv2 = GCNConv(hidden_size1, hidden_size2)
        self.conv3 = GCNConv(hidden_size2, classes)
        # Log-probabilities over each row, as expected by F.nll_loss below
        self.log_softmax = torch.nn.LogSoftmax(dim=1)

    def forward(self, Graph):
        x, edge_index = Graph.x, Graph.edge_index
        out = self.conv1(x, edge_index)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.conv2(out, edge_index)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.conv3(out, edge_index)
        out = self.log_softmax(out)
        return out


def setup_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    random.seed(seed)


setup_seed(seed=20)  # seed before building the model so initialization is reproducible
dataset = Planetoid(root='./', name='Cora')  # reuses the dataset under root if present, otherwise downloads it
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use cpu or gpu
model = GCN_NET3(dataset.num_node_features, 128, 64, dataset.num_classes).to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)  # define optimizer

# define some parameters
eval_T = 5  # evaluation period (in epochs)
P = 3  # patience
i = 0  # number of consecutive validation checks without improvement
max_epoch = 300
temp_val_loss = float('inf')  # best validation loss so far
L = []  # training loss per epoch
L_val = []  # validation loss per epoch

# training process
for epoch in range(max_epoch):
    # one optimization step on the training nodes only
    model.train()
    optimizer.zero_grad()
    out = model(data)
    # The loss computation is missing from the original listing; negative
    # log-likelihood on the log-softmax output is assumed here.
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

    # evaluate on the validation nodes with dropout disabled
    model.eval()
    with torch.no_grad():
        val_out = model(data)
        loss_val = F.nll_loss(val_out[data.val_mask], data.y[data.val_mask])
        _, val_pred = val_out.max(dim=1)
    val_correct = (val_pred[data.val_mask] == data.y[data.val_mask]).sum().item()
    val_acc = val_correct / data.val_mask.sum().item()

    L.append(loss.item())
    L_val.append(loss_val.item())
    print('Epoch: {}  loss : {:.4f}  val_loss: {:.4f}  val_acc: {:.4f}'.format(
        epoch, loss.item(), loss_val.item(), val_acc))

    # early stopping: check the validation loss every eval_T epochs
    if epoch % eval_T == 0:
        if loss_val.item() < temp_val_loss:
            temp_val_loss = loss_val.item()
            torch.save(model.state_dict(), "GCN_NET3.pth")  # save the current best
            i = 0  # reset i
        else:
            i = i + 1
            if i >= P:  # P bad checks in a row
                print("Early Stopping! Epoch : ", epoch)
                break

# test with the best saved parameters, not the last ones
model.load_state_dict(torch.load("GCN_NET3.pth"))
model.eval()
with torch.no_grad():
    _, pred = model(data).max(dim=1)
correct = (pred[data.test_mask] == data.y[data.test_mask]).sum().item()
acc = correct / data.test_mask.sum().item()
print("test accuracy is {:.4f}".format(acc))

# plot the loss curves
n = list(range(len(L)))
plt.plot(n, L, label='train')
plt.plot(n, L_val, label='val')
plt.legend()  # show the labels
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
```

#### Result

Output result:

```
Early Stopping!  Epoch :  28
test accuracy is 0.8030
```

Early stopping noticeably improves the test accuracy. Without early stopping, the test accuracy is about 76%; with it, the accuracy is about 78%, reaching as high as 80.3%.