Reading notes on Section 5.3 (pretrained networks) of Deep Learning with Python

Posted by jmboblee on Wed, 12 Jan 2022 07:47:21 +0100

5.3 Using pretrained convolutional neural networks

Pretrained network:

  • A pretrained network is a saved network that was previously trained on a large dataset (usually a large-scale image classification task).
  • The spatial hierarchy of features learned by a pretrained network can effectively act as a generic model of the visual world, so its features are portable across a wide range of problems.
  • Thanks to pretrained networks, deep learning is very effective for small-data problems.

There are two ways to use a pretrained network: feature extraction and fine-tuning.

5.3.1 Feature extraction

  1. Definition:
    Feature extraction uses the representations learned by a previously trained network to extract interesting features from new samples.

  2. Convolutional base:

    • A convolutional neural network for image classification consists of two parts: first a series of convolution and pooling layers, and finally a densely connected classifier. The first part is called the convolutional base of the model.
    • Feature extraction takes the convolutional base of a previously trained network, runs the new data through it, and trains a new classifier on top of the output.
      1. Why not reuse the dense layers: the representations in densely connected layers no longer contain information about where objects are located in the input image. Dense layers discard the notion of space, whereas object location is still described by the convolutional feature maps.
    • The generality (and therefore reusability) of the representations extracted by a convolution layer depends on the depth of that layer in the model.

      Layers closer to the bottom of the model extract local, highly generic feature maps (such as visual edges, colors and textures), while layers closer to the top extract more abstract concepts (such as "cat ears" or "dog eyes").

    • If your new dataset differs a lot from the dataset the original model was trained on, it is best to use only the first few layers of the model for feature extraction, rather than the whole convolutional base.
    • Common models are built into Keras. You can import them from the keras.applications module (tensorflow.keras.applications in the code below).
  3. VGG16 model

    from tensorflow.keras.applications import VGG16
    conv_base = VGG16(weights='imagenet',
    				  include_top=False,
    				  input_shape=(150, 150, 3))
    

    Here, three arguments are passed to the constructor.

    • weights specifies the weight checkpoint from which the model is initialized.
    • include_top specifies whether to include the densely connected classifier on top of the network. By default, this classifier corresponds to the 1000 classes of ImageNet. Because we intend to use our own densely connected classifier (with only two classes: cat and dog), we don't need to include it.
    • input_shape is the shape of the image tensors fed into the network. This argument is optional: if it is not passed, the network can process inputs of any size.
    conv_base.summary()
    

    The final feature map has shape (4, 4, 512). We will add a densely connected classifier on top of this feature. There are two options for the next step.

    • Fast feature extraction without data augmentation.
      This method is fast and computationally cheap, because it only needs to run the convolutional base once for each input image, and the convolutional base is by far the most expensive part of the pipeline.
    • Feature extraction with data augmentation.
      Add Dense layers on top to extend the existing model (i.e. conv_base) and run the whole model end to end on the input data.
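
The (4, 4, 512) shape quoted above can be checked by hand. The sketch below assumes the standard VGG16 layout: five blocks, each ending in a 2×2 max-pooling layer with stride 2 (which halves the spatial size, rounding down), with the last block producing 512 channels. The helper name vgg16_feature_map_shape is made up for this illustration and is not a Keras function.

```python
def vgg16_feature_map_shape(height, width, n_pool_stages=5, last_channels=512):
    """Spatial size after repeated 2x2 max pooling (stride 2, floor division)."""
    for _ in range(n_pool_stages):
        height, width = height // 2, width // 2
    return (height, width, last_channels)

shape = vgg16_feature_map_shape(150, 150)
print(shape)                            # (4, 4, 512): 150 -> 75 -> 37 -> 18 -> 9 -> 4
print(shape[0] * shape[1] * shape[2])   # 8192 values per image once flattened
```

The second number is the input size a classifier sees if the feature map is flattened before the dense layers.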

5.3.1.1. Fast feature extraction without data augmentation

First, use an ImageDataGenerator instance to read the images and their labels as NumPy arrays, then call the predict method of the conv_base model to extract features from these images. In this first method, you save the output of conv_base for your data and then use these outputs as inputs to a new model.
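
To make the cost argument concrete, here is a toy sketch in plain NumPy. frozen_base is a hypothetical stand-in for conv_base (a fixed random projection, not VGG16), and the sample and epoch counts are only illustrative. The snippet just counts how many samples pass through the expensive part under each of the two options.

```python
import numpy as np

rng = np.random.default_rng(0)
frozen_weights = rng.normal(size=(12, 8))   # stand-in for pretrained weights
counter = {"samples": 0}                    # samples pushed through the base

def frozen_base(batch):
    """Hypothetical frozen feature extractor: fixed projection followed by ReLU."""
    counter["samples"] += len(batch)
    return np.maximum(batch @ frozen_weights, 0.0)

x_train = rng.normal(size=(4000, 12))       # 4000 toy "images"
epochs = 30

# Option 1: run the base once, cache the features, train only on the cache.
cached_features = frozen_base(x_train)
cost_cached = counter["samples"]

# Option 2: the base sits inside the model, so it runs again on every epoch.
counter["samples"] = 0
for _ in range(epochs):
    frozen_base(x_train)
cost_end_to_end = counter["samples"]

print(cached_features.shape)          # (4000, 8)
print(cost_cached, cost_end_to_end)   # 4000 120000
```

The price of the cheap option is that the cached features are fixed, so the inputs cannot be augmented during training.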

  1. Feature extraction using the pretrained convolutional base

    import os
    import numpy as np
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    
    base_dir = 'C:\\Users\\Administrator\\deep-learning-with-python-notebooks-master\\cats_and_dogs_small'
    train_dir = os.path.join(base_dir, 'train')
    validation_dir = os.path.join(base_dir, 'validation')
    test_dir = os.path.join(base_dir, 'test')
    
    datagen = ImageDataGenerator(rescale=1./255)
    batch_size = 20
    
    def extract_features(directory, sample_count):
    	features = np.zeros(shape=(sample_count, 4, 4, 512))
    	labels = np.zeros(shape=(sample_count))
    	generator = datagen.flow_from_directory(
    		directory,
    		target_size=(150, 150),
    		batch_size=batch_size,
    		class_mode='binary')
    	i = 0
    	for inputs_batch, labels_batch in generator:
    		features_batch = conv_base.predict(inputs_batch)
    		features[i * batch_size : (i + 1) * batch_size] = features_batch
    		labels[i * batch_size : (i + 1) * batch_size] = labels_batch
    		i += 1
    		if i * batch_size >= sample_count:
    			break
    	# Note that the generator yields data indefinitely,
    	# so the loop above must break once every image has been seen once
    	return features, labels
    
    train_features, train_labels = extract_features(train_dir, 4000)
    validation_features, validation_labels = extract_features(validation_dir, 2000)
    test_features, test_labels = extract_features(test_dir, 2000)
    
    # The extracted features have shape (samples, 4, 4, 512). To feed them to a densely connected classifier,
    # they must first be flattened to (samples, 8192)
    train_features = np.reshape(train_features, (4000, 4 * 4 * 512))
    validation_features = np.reshape(validation_features, (2000, 4 * 4 * 512))
    test_features = np.reshape(test_features, (2000, 4 * 4 * 512))
    
  2. Define and train the densely connected classifier
    Use dropout regularization and train the classifier on the features and labels just saved.

    from tensorflow.keras import models
    from tensorflow.keras import layers
    from tensorflow.keras import optimizers
    model = models.Sequential()
    model.add(layers.Dense(256, activation='relu', input_dim=4 * 4 * 512))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer=optimizers.RMSprop(learning_rate=2e-5),  # `lr` is the deprecated name
    			  loss='binary_crossentropy',
    			  metrics=['acc'])
    history = model.fit(train_features, train_labels,
    					epochs=30,
    					batch_size=20,
    					validation_data=(validation_features, validation_labels))
    
  3. Plot the results

    import matplotlib.pyplot as plt
    
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    
    epochs = range(1, len(acc) + 1)
    
    plt.plot(epochs, acc, 'bo', label='Training acc')
    plt.plot(epochs, val_acc, 'b', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    
    plt.figure()
    
    plt.plot(epochs, loss, 'bo', label='Training loss')
    plt.plot(epochs, val_loss, 'b', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()
    


Although the dropout rate is quite large, the model overfits almost from the start. This is because this method does not use data augmentation, which is essential for preventing overfitting on small image datasets.
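
To give a feel for what data augmentation does, here is a minimal NumPy sketch. It is not how ImageDataGenerator is implemented. It only applies a random horizontal flip and a random horizontal shift (wrapping around via np.roll, a simplification made for brevity), so that the model rarely sees exactly the same array twice. The function name augment and the toy image are made up for illustration.

```python
import numpy as np

def augment(image, rng, max_shift=0.2):
    """Return a randomly flipped and horizontally shifted copy of an HxWxC image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                    # horizontal flip
    limit = int(max_shift * image.shape[1])      # shift by at most 20% of the width
    shift = int(rng.integers(-limit, limit + 1))
    return np.roll(out, shift, axis=1)           # wrap-around shift (simplification)

rng = np.random.default_rng(42)
image = rng.random((150, 150, 3))                # toy stand-in for a rescaled photo
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)                               # (4, 150, 150, 3)
```

Every array in the batch contains the same pixel values as the source image, only rearranged, which is why augmentation adds variety but no genuinely new information.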

5.3.1.2. Feature extraction with data augmentation

This method is slower and more computationally expensive, but it allows data augmentation during training. It works by extending the conv_base model and then running the model end to end on the input data.

This method is so computationally expensive that it should only be attempted on a GPU; it is practically intractable on a CPU. If you can't run your code on a GPU, use the first method.

  1. Add a densely connected classifier on top of the convolutional base
    from tensorflow.keras import models
    from tensorflow.keras import layers
    
    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))
    
    model.summary()
    
    • The convolutional base of VGG16 has 14,714,688 parameters, which is a lot. The classifier added on top has about 2 million parameters.
    • The convolutional base must be "frozen" before compiling and training the model. Freezing one or more layers means keeping their weights unchanged during training. If you don't do this, the representations previously learned by the convolutional base will be modified during training. Because the Dense layers on top are randomly initialized, very large weight updates would propagate through the network and severely damage the previously learned representations.
  2. Freezing the model

    In Keras, you freeze a network by setting its trainable attribute to False.

    >>> print('This is the number of trainable weights '
    	'before freezing the conv base:', len(model.trainable_weights))
    This is the number of trainable weights before freezing the conv base: 30
    
    >>> conv_base.trainable = False
    
    >>> print('This is the number of trainable weights '
    	'after freezing the conv base:', len(model.trainable_weights))
    This is the number of trainable weights after freezing the conv base: 4
    
    With this setting, only the weights of the two added Dense layers will be trained. There are four weight tensors in total, two per layer (the main weight matrix and the bias vector).

    If you modify the trainable attribute of weights after compilation, you should recompile the model; otherwise the change will be ignored.

  3. Train the model end to end with the frozen convolutional base
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras import optimizers
    train_datagen = ImageDataGenerator(
    	rescale=1./255,
    	rotation_range=40,
    	width_shift_range=0.2,
    	height_shift_range=0.2,
    	shear_range=0.2,
    	zoom_range=0.2,
    	horizontal_flip=True,
    	fill_mode='nearest')
    	
    # Note that validation data must not be augmented
    test_datagen = ImageDataGenerator(rescale=1./255)
    
    train_generator = train_datagen.flow_from_directory(
    	train_dir,  # Target directory
    	target_size=(150, 150),  # Resize all images to 150 x 150
    	batch_size=20,
    	# binary_crossentropy loss is used, so binary labels are needed
    	class_mode='binary')
    	
    validation_generator = test_datagen.flow_from_directory(
    	validation_dir,
    	target_size=(150, 150),
    	batch_size=20,
    	class_mode='binary')
    model.compile(loss='binary_crossentropy',
    	optimizer=optimizers.RMSprop(learning_rate=2e-5),
    	metrics=['acc'])
    # In TensorFlow 2.1+, fit accepts generators directly (fit_generator is deprecated)
    history = model.fit(
    	train_generator,
    	steps_per_epoch=200,  # 4000 training images / batch size of 20
    	epochs=30,
    	validation_data=validation_generator,
    	validation_steps=50)
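
The figures quoted in this section (14,714,688 parameters in the convolutional base, roughly 2 million in the classifier, 30 and then 4 trainable weight tensors) can be reproduced with a little arithmetic. The sketch assumes the standard VGG16 configuration of thirteen 3×3 convolutions with biases. The channel list and helper names are written out for this illustration and are not read from Keras.

```python
# (in_channels, out_channels) of the 13 conv layers in VGG16's convolutional base
VGG16_CONVS = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]

def conv_params(c_in, c_out, kernel=3):
    return kernel * kernel * c_in * c_out + c_out   # kernel weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out                     # weight matrix + biases

base_params = sum(conv_params(i, o) for i, o in VGG16_CONVS)
head_params = dense_params(4 * 4 * 512, 256) + dense_params(256, 1)

# Each conv or dense layer owns two weight tensors: a kernel and a bias.
tensors_before_freezing = 2 * len(VGG16_CONVS) + 2 * 2
tensors_after_freezing = 2 * 2

print(base_params)                                       # 14714688
print(head_params)                                       # 2097665
print(tensors_before_freezing, tensors_after_freezing)   # 30 4
```

Almost all of the classifier's parameters sit in the first Dense layer, because it is connected to all 8192 flattened features.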
    

5.3.2 Fine-tuning the model

  1. Fine-tuning and feature extraction complement each other.

  2. Definition of fine-tuning:
    Starting from the frozen model base used for feature extraction, fine-tuning means "unfreezing" a few of its top layers and jointly training these unfrozen layers and the newly added part. In this example, fine-tuning adjusts the last few layers of the original base together with the new classifier.

    • The convolutional base of VGG16 is frozen so that a randomly initialized classifier can be trained on top of it.
    • Only once the classifier on top has been trained can the top layers of the convolutional base be fine-tuned.
  3. Steps to fine-tune a network

    1. Add your custom network on top of an already trained base network.
    2. Freeze the base network.
    3. Train the part you added.
    4. Unfreeze some layers in the base network.
    5. Jointly train these unfrozen layers and the part you added.

    The first three steps are feature extraction.

  4. Why fine-tune only the layers closer to the top?

    • Layers closer to the bottom encode more generic, reusable features, while layers closer to the top encode more specialized features.
    • The more parameters you train, the greater the risk of overfitting.
  5. Freeze all layers up to a specific layer

    conv_base.trainable = True
    
    set_trainable = False
    for layer in conv_base.layers:
    	if layer.name == 'block5_conv1':
    		set_trainable = True
    	if set_trainable:
    		layer.trainable = True
    	else:
    		layer.trainable = False
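
To see which layers the loop above leaves trainable, the same flag logic can be run on plain layer names. The list below assumes the usual Keras naming of VGG16's blocks. The input layer's name varies between versions, so 'input_layer' is only a placeholder.

```python
layer_names = ['input_layer']                      # placeholder name
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    layer_names += [f'block{block}_conv{i}' for i in range(1, n_convs + 1)]
    layer_names.append(f'block{block}_pool')

trainable = {}
set_trainable = False
for name in layer_names:
    if name == 'block5_conv1':
        set_trainable = True        # from this layer on, everything is unfrozen
    trainable[name] = set_trainable

unfrozen = [name for name, flag in trainable.items() if flag]
print(unfrozen)
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

Only the three convolution layers of block 5 carry weights; the pooling layer has none, so unfreezing it changes nothing.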
    

  6. Fine-tune the model

    model.compile(loss='binary_crossentropy',
    	optimizer=optimizers.RMSprop(learning_rate=1e-5),
    	metrics=['acc'])
    # In TensorFlow 2.1+, fit accepts generators directly (fit_generator is deprecated)
    history = model.fit(
    	train_generator,
    	steps_per_epoch=200,
    	epochs=100,
    	validation_data=validation_generator,
    	validation_steps=50)
    
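  7. Smooth the curves

    import matplotlib.pyplot as plt

    # Re-read the metrics from the fine-tuning run before plotting
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(1, len(acc) + 1)
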
    def smooth_curve(points, factor=0.8):
    	smoothed_points = []
    	for point in points:
    		if smoothed_points:
    			previous = smoothed_points[-1]
    			smoothed_points.append(previous * factor + point * (1 - factor))
    		else:
    			smoothed_points.append(point)
    	return smoothed_points
    plt.plot(epochs,
    	smooth_curve(acc), 'bo', label='Smoothed training acc')
    plt.plot(epochs,
    	smooth_curve(val_acc), 'b', label='Smoothed validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    plt.figure()
    plt.plot(epochs,
    	smooth_curve(loss), 'bo', label='Smoothed training loss')
    plt.plot(epochs,
    	smooth_curve(val_loss), 'b', label='Smoothed validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()
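
The smooth_curve helper above is an exponential moving average: each plotted point is 80% of the previous smoothed point plus 20% of the raw value. Here is a small worked example, repeating the function so the snippet runs on its own (the input values are made up):

```python
def smooth_curve(points, factor=0.8):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points

raw = [1.0, 0.0, 1.0, 0.0]
smoothed = smooth_curve(raw)
print([round(v, 3) for v in smoothed])   # [1.0, 0.8, 0.84, 0.672]
```

The raw values jump between 0 and 1, while the smoothed ones drift slowly, which is what makes noisy per-epoch curves readable.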
    
    Epoch 97/100
    200/200 [==============================] - 224s 1s/step - loss: 0.0158 - acc: 0.9950 - val_loss: 0.3688 - val_acc: 0.9450
    Epoch 98/100
    200/200 [==============================] - 223s 1s/step - loss: 0.0153 - acc: 0.9945 - val_loss: 0.4035 - val_acc: 0.9410
    Epoch 99/100
    200/200 [==============================] - 223s 1s/step - loss: 0.0260 - acc: 0.9901 - val_loss: 0.3588 - val_acc: 0.9440
    Epoch 100/100
    200/200 [==============================] - 223s 1s/step - loss: 0.0155 - acc: 0.9946 - val_loss: 0.4835 - val_acc: 0.9410
    

    The accuracy improves to about 94%, while the loss curve does not change much.

    Judging from the loss curve, there is no real improvement compared with before (it is actually getting worse). If the loss isn't decreasing, how can the accuracy stay stable or improve? The answer is simple: the plot shows the average of the pointwise loss values, but what matters for accuracy is the distribution of the loss values, not their average, because accuracy is the result of a binary thresholding of the class probability predicted by the model. The model may still be improving even if this isn't reflected in the average loss.
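
Here is a small numeric example of this effect, with made-up predictions for four samples whose true label is 1. Binary cross-entropy is computed by hand in NumPy, and the 0.5 decision threshold is the usual default, assumed here.

```python
import numpy as np

def mean_bce(y_true, y_prob):
    """Average binary cross-entropy over the samples."""
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((y_prob > threshold) == (y_true == 1)))

y_true = np.array([1, 1, 1, 1])
earlier = np.array([0.45, 0.45, 0.45, 0.45])   # all slightly on the wrong side
later = np.array([0.55, 0.55, 0.55, 0.01])     # three correct, one confidently wrong

print(mean_bce(y_true, earlier), accuracy(y_true, earlier))
print(mean_bce(y_true, later), accuracy(y_true, later))
```

In this toy case the average loss roughly doubles between the two sets of predictions, yet accuracy rises from 0.0 to 0.75: a single confidently wrong sample dominates the mean, while the threshold only cares which side of 0.5 each prediction falls on.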

5.3.2.1 Final evaluation of this model

test_generator = test_datagen.flow_from_directory(
	test_dir,
	target_size=(150, 150),
	batch_size=20,
	class_mode='binary')
	
test_loss, test_acc = model.evaluate(test_generator, steps=50)  # evaluate_generator is deprecated
print('test acc:', test_acc)

5.3.3 Summary

For image classification problems, especially with small datasets:

  • Convolutional neural networks are the best type of machine learning model for computer vision tasks. A convolutional neural network can be trained from scratch even on a very small dataset, with decent results.
  • On small datasets, the main problem is overfitting. When working with image data, data augmentation is a powerful way to reduce overfitting.
  • Feature extraction makes it easy to reuse an existing convolutional neural network on a new dataset. This is a valuable technique for small image datasets.
  • As a complement to feature extraction, you can also use fine-tuning, which applies some of the representations previously learned by an existing model to a new problem. This can further improve the performance of the model.

Complete code

## Data preparation
import os, shutil
# The path to the original dataset decompression directory
original_dataset_dir = 'C:\\Users\\Administrator\\Python_learning\\kaggle_original_data'
# Directory to save smaller datasets
base_dir = 'C:\\Users\\Administrator\\Python_learning\\cats_and_dogs_small'
os.mkdir(base_dir)

# Directories for the training, validation and test splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

# Cat training image directory
train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
# Dog training image directory
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)
# Cat validation image directory
validation_cats_dir = os.path.join(validation_dir, 'cats')
os.mkdir(validation_cats_dir)
# Dog validation image directory
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)
# Cat test image directory
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
# Dog test image directory
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)

# Copy the first 2000 cat images to the train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
# Copy the next 1000 cat images to validation_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(2000, 3000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)
# Copy the next 1000 cat images to test_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(3000, 4000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy the first 2000 dog images to the train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)
# Copy the next 1000 dog images to validation_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(2000, 3000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)
# Copy the next 1000 dog images to test_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(3000, 4000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)

## Data preprocessing with data augmentation
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
	rescale=1./255,
	rotation_range=40,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.2,
	zoom_range=0.2,
	horizontal_flip=True,)
# Note that validation data must not be augmented
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
		train_dir,               # Target directory
		target_size=(150, 150),  # Resize all images to 150 x 150
		# It's 32 in the book, but 32 * 200 (steps_per_epoch) is greater than 4000,
		# which raises an error at run time, so it is changed to 20
		batch_size=20,
		# binary_crossentropy loss is used, so binary labels are needed
		class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
		validation_dir,
		target_size=(150, 150),
		batch_size=20,  # Changed from 32 to 20, as for the training generator
		class_mode='binary')
		
## Call VGG16
from tensorflow.keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
				  include_top=False,
				  input_shape=(150, 150, 3))
				  
## Feature extraction with data augmentation
# Add a densely connected classifier on top of the convolutional base
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

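## Fine-tune the model
# Note: as described in 5.3.2, the classifier should first be trained with
# conv_base frozen (see 5.3.1.2) before the top layers are unfrozen here.
conv_base.trainable = True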

set_trainable = False
for layer in conv_base.layers:
	if layer.name == 'block5_conv1':
		set_trainable = True
	if set_trainable:
		layer.trainable = True
	else:
		layer.trainable = False

# Train the model
from tensorflow.keras import optimizers

model.compile(loss='binary_crossentropy',
	optimizer=optimizers.RMSprop(learning_rate=1e-5),
	metrics=['acc'])
# In TensorFlow 2.1+, fit accepts generators directly (fit_generator is deprecated)
history = model.fit(
	train_generator,
	steps_per_epoch=200,
	epochs=100,
	validation_data=validation_generator,
	validation_steps=50)

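## Plot smoothed curves
import matplotlib.pyplot as plt

# Read the metrics recorded during training
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
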
def smooth_curve(points, factor=0.8):
	smoothed_points = []
	for point in points:
		if smoothed_points:
			previous = smoothed_points[-1]
			smoothed_points.append(previous * factor + point * (1 - factor))
		else:
			smoothed_points.append(point)
	return smoothed_points
plt.plot(epochs,
	smooth_curve(acc), 'bo', label='Smoothed training acc')
plt.plot(epochs,
	smooth_curve(val_acc), 'b', label='Smoothed validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs,
	smooth_curve(loss), 'bo', label='Smoothed training loss')
plt.plot(epochs,
	smooth_curve(val_loss), 'b', label='Smoothed validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

## Evaluating models on test sets
test_generator = test_datagen.flow_from_directory(
	test_dir,
	target_size=(150, 150),
	batch_size=20,
	class_mode='binary')
	
test_loss, test_acc = model.evaluate(test_generator, steps=50)  # evaluate_generator is deprecated
print('test acc:', test_acc)
