Deep Learning: Train and Test Records on a ResNet-18 Model

Posted by Tubby on Sun, 21 Nov 2021 12:26:55 +0100

ResNet-18: Train and Test Records


**Invariants:** 200 epochs, ResNet18, input image size 32 (pad=4, RandomCrop(32)), batch size 256

**Variables:** learning rate, optimizer, LR schedule (policy)

Note: because this experiment uses 32 × 32 inputs, max pooling (thanks to teacher Shi) should not be applied before the second layer. I did not know this at the beginning, so training could not reach 90% accuracy; after the modification, high accuracy is achievable. Since the setup is otherwise identical in essence, I only ran one group of experiments to compare results (LR=0.1, SGD+M=0.9, CosineAnnealingWarmRestarts).
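
For reference, a minimal sketch of how the model and input pipeline above could be set up, assuming torchvision's resnet18 and a 10-class CIFAR-style dataset (the article does not show its model code; the 3 × 3 stem convolution is a common companion change and an assumption here, as the note above only calls for removing the max pooling):

import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

def resnet18_32x32(num_classes=10):
    # Start from the stock ResNet-18 and adapt the stem for 32x32 inputs
    net = torchvision.models.resnet18(num_classes=num_classes)
    net.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)  # assumption: 3x3 stem instead of 7x7/stride-2
    net.maxpool = nn.Identity()  # remove the max pooling before the second layer, as in the note above
    return net

# Augmentation matching the invariants: pad=4, RandomCrop(32)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])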

1, Fixed learning rate

1. LR=0.1 (SGD+M!!!)

Different optimizers (a construction sketch follows this list)

1.SGD
Experiment 1.1-SGD

Experiment 1.2-SGD+M=0.9

Experiment 1.3-SGD+M=0.9+weight_decay=1e-3

Experiment 1.4-SGD+M=0+Nesterov=True
Experiment 1.5-SGD+M=0.9+Nesterov=True

2.Adam
Experiment 1.6-Adam

3.Adagrad
Experiment 1.7-Adagrad

4.RMSprop
Experiment 1.8-RMSprop
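
As mentioned above, a sketch of how the optimizers compared in Experiments 1.1-1.8 might be constructed at the fixed LR=0.1, assuming net is the model sketched earlier; anything not named in the experiment titles is left at PyTorch defaults, and only one optimizer is used per run:

import torch.optim as optim

lr = 0.1
optimizers = {
    "1.1 SGD":            optim.SGD(net.parameters(), lr=lr),
    "1.2 SGD+M":          optim.SGD(net.parameters(), lr=lr, momentum=0.9),
    "1.3 SGD+M+wd":       optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=1e-3),
    "1.5 SGD+M+Nesterov": optim.SGD(net.parameters(), lr=lr, momentum=0.9, nesterov=True),
    "1.6 Adam":           optim.Adam(net.parameters(), lr=lr),
    "1.7 Adagrad":        optim.Adagrad(net.parameters(), lr=lr),
    "1.8 RMSprop":        optim.RMSprop(net.parameters(), lr=lr),
}
# Note: Experiment 1.4 (momentum=0, nesterov=True) is omitted here because PyTorch's SGD
# requires a non-zero momentum when nesterov=True.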

2. LR=1e-2 (SGD+M!!!)

Experiment 1.9-SGD

Experiment 1.10-SGD+M

Experiment 1.11-Adam

Experiment 1.12-Adagrad

Experiment 1.13-RMSprop

3. LR=1e-3

Experiment 1.14-SGD

2, Changing the learning rate

1. Cosine annealing (CosineAnnealingLR)

T_max=25, i.e. the half cycle is 25 epochs and the full cycle is 50 epochs, so there are 4 cycles in 200 epochs.
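
A sketch of the assumed per-epoch training loop behind these runs (the article only shows the optimizer/scheduler construction); train_one_epoch and evaluate are hypothetical helpers:

import torch
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

for epoch in range(200):
    train_one_epoch(net, optimizer)   # hypothetical helper: one pass over the training set
    evaluate(net)                     # hypothetical helper: accuracy on the test set
    scheduler.step()                  # one scheduler step per epoch; the LR follows the cosine down
                                      # to its minimum at epoch 25 and back up by epoch 50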

Different optimizers
1.SGD+Momentum

Experiment 2.1: LR=0.1, M=0.9, no weight decay

Initial learning rate 0.1, SGD optimizer with momentum (0.9), no weight decay

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

Training set loss

Test set acc

Impression: without weight_decay, overfitting problems appear.

Experiment 2.2: LR=0.1, M=0.9, Weight_decay=1e-3

Initial learning rate 0.1, SGD optimizer with momentum (0.9), T_max=25, weight_decay=1e-3

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

Training set loss

Test set acc

Experiment 2.3: LR=0.1, M=0.9, Weight_decay=1e-4

Initial learning rate 0.1, SGD optimizer with momentum (0.9), weight_decay=1e-4

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

Training set loss

Test set acc

Comparison of the first three experiments

2.Adam
Experiment 2.4: LR=0.1, beta1=0.9, beta2=0.999, no weight decay

Initial learning rate 0.1, Adam optimizer, beta1=0.9, beta2=0.999, no weight decay

optimizer_Adam = optim.Adam(net.parameters(), lr=args.lr, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer_Adam, T_max=25)

Training set loss

Test set acc

Compared with the ResNet18 run without weight decay:

Experiment 2.5: LR=0.1, beta1=0.9, beta2=0.999, weight_decay=1e-3

The initial learning rate is 0.1, Adam optimizer, beta1=0.9, beta2=0.999, weight_decay=1e-3

optimizer_Adam = optim.Adam(net.parameters(), lr=args.lr, betas=(0.9, 0.999), weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer_Adam, T_max=25)

Training set loss

Test set acc

The results are very poor.

Comparison with SGD at the same weight_decay:

Experiment 2.6: LR=0.1, beta1=0.9, beta2=0.999, weight_decay=1e-4

The initial learning rate is 0.1, Adam optimizer, beta1=0.9, beta2=0.999, weight_decay=1e-4

optimizer_Adam = optim.Adam(net.parameters(), lr=args.lr, betas=(0.9, 0.999), weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer_Adam, T_max=25)

Training set loss

Test set acc

The results are very poor.

Comparison with SGD at the same weight_decay, and with the weight_decay=1e-3 run:

Summary: compared with the SGD optimizer at the same weight_decay, SGD's train_loss and test_acc are better; still, this run is an improvement over the weight_decay=1e-3 run (pink line).

3.Adagrad
Experiment 2.7: LR=0.1

Initial learning rate 0.1, Adagrad optimizer

optimizer = optim.Adagrad(net.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

Training set loss

Test set acc

The accuracy is not what we want. The official PyTorch documentation lists 0.01 as Adagrad's default learning rate, while we used 0.1, so we plan to lower the initial learning rate and try again.

Experiment 2.8: LR=0.01

Initial learning rate 0.01, Adagrad optimizer

optimizer = optim.Adagrad(net.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

Training set loss

Test set acc

Comparison of the two Adagrad runs above:

The results are very similar, so we try an improved variant, RMSprop, to see whether it makes progress.

4.RMSprop
Experiment 2.9: LR=0.1

Initial learning rate 0.1, RMSprop optimizer

optimizer = optim.RMSprop(net.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

Training set loss

Test set acc

Compared with Adagrad with the same parameters:

2. Cosine annealing with warm restarts (CosineAnnealingWarmRestarts)

SGDR: Stochastic Gradient Descent with Warm Restarts.
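
A sketch (assumed usage, placeholder model) of the restart pattern: with T_0=25 and T_mult=1 the LR jumps back to 0.1 every 25 epochs, while with T_mult=2 each period doubles, so within 200 epochs the restarts fall at epochs 25, 75 and 175:

import torch
import torch.optim as optim

model = torch.nn.Linear(10, 10)  # placeholder model, just to trace the schedule
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=25, T_mult=2)

lrs = []
for epoch in range(200):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()    # in real training this runs after backward() on every batch
    scheduler.step()    # stepped once per epoch here
print(lrs[0], lrs[25], lrs[75])  # the LR is back at 0.1 at each restart epoch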

1.SGD+Momentum
Experiment 2.10: LR=0.1, M=0.9, T_0=25,T_mult=1

The initial learning rate is 0.1 and the schedule restarts every 25 epochs (T_mult=1)

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=25)

Training set loss

Test set acc

Experiment 2.11: LR=0.1, M=0.9, T_0=25,T_mult=2

The initial learning rate is 0.1; the first restart comes after 25 epochs and each subsequent period doubles (T_mult=2), so restarts fall at epochs 25, 75 and 175

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=25, T_mult=2)

Training set loss

Test set acc

Comparison of the two warm-restart runs:

3. Equal interval adjustment (StepLR)

Experiment 2.12: step_size=40

The learning rate is multiplied by gamma=0.1 every 40 epochs; a worked sketch of this schedule follows Experiment 2.13

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1, last_epoch=-1)

Training set loss

Test set acc

Experiment 2.13: step_size=40

The learning rate is again decayed by gamma=0.1 every 40 epochs

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1, last_epoch=-1)

Training set loss

Test set acc
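
As referenced above, a sketch (assumed usage, placeholder model) of the step schedule in Experiments 2.12 and 2.13: the LR is multiplied by gamma=0.1 every 40 epochs, giving 0.1, 0.01, 1e-3, 1e-4 and 1e-5 over the 200 epochs:

import torch
import torch.optim as optim

model = torch.nn.Linear(10, 10)  # placeholder model, just to trace the schedule
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(200):
    if epoch % 40 == 0:
        print(epoch, scheduler.get_last_lr())  # 0.1 at epoch 0, 0.01 at 40, 1e-3 at 80, ...
    optimizer.step()
    scheduler.step()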

3, New experiment

The reason for this experiment is the structural change noted at the beginning of the article (removing the max pooling before the second layer for 32 × 32 inputs).

1. Cosine annealing with warm restarts (CosineAnnealingWarmRestarts)

SGD+Momentum
Experiment 3.1: LR=0.1, M=0.9, T_0=25,T_mult=2

The initial learning rate is 0.1; the first restart comes after 25 epochs and each subsequent period doubles (T_mult=2)

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=25, T_mult=2)

Training set loss

Test set acc

2. Control experiment

Conclusion: the ResNet18 with the modified structure (max pooling removed) performs better.

4, Summary

The learning rate, optimizer and scheduler really matter. A shortcoming of this set of experiments is that dropout was not used, but it has deepened my familiarity with tuning deep learning hyperparameters. Note also that different network models, datasets and image sizes respond differently to the learning rate, optimizer and scheduler, so do not over-generalize.

Finally, a combined plot of all 28 experiments:

Training set loss

Test set acc


Topics: Machine Learning neural networks Pytorch Deep Learning CV