Reproduction of vgg16 for picture recognition

Posted by phpnwx on Tue, 16 Jun 2020 02:38:27 +0200

1. Structure

The image above is a replication of all the files of vgg16. The data folder is a test image. This replication only calls another trained model to identify the picture.vgg16.py reproduces the network structure of vgg16 and imports trained model parameters.Utils.pyFor input picture preprocessing program,Nclasses.py Is the label given to each image and the corresponding index value.App.pyIs our call file for image recognition.

2. Code Details

1,vgg16.py

  1 import tensorflow as tf
  2 import numpy as np
  3 import os
  4 import time
  5 import matplotlib.pyplot as plt
  6 from Nclasses import labels
  7 
  8 VGG_MEAN = [103.939,116.779,123.68] #sample RGB Average
  9 
 10 class Vgg16:
 11     def __init__(self,vgg16_npy_path=None):
 12         if vgg16_npy_path is None:
 13             vgg16_npy_path = os.path.join(os.getcwd(),'vgg16.npy') # os.getcwd() Method is used to return the current working directory.
 14             print(vgg16_npy_path)
 15         # Traversing key-value pairs,Import Model Parameters
 16         self.data_dict = np.load(vgg16_npy_path,encoding='latin1').item() # Load here vgg16.npy Parameters
 17         # ergodic data_dict Each key in
 18         for i in self.data_dict:
 19             print(i)
 20     #Forward Propagation Model
 21     def inference(self,images):
 22         start_time = time.time() # Get the start time of forward propagation
 23         print('build model started')
 24         ## Multiply 255.0 pixels by pixels (according to the initialization steps described in the original paper)
 25         images_scaled = images * 255
 26         # from GRB Convert color channels to BGR，Also available cv In GRBtoBGR
 27         #tf.split()Put the rgb Channel Separation
 28         red,green,blue = tf.split(images_scaled,num_or_size_splits=3,axis=3)
 29         # Subtracts the average pixel value for each channel by sample, which removes the average brightness value of the image. This method is commonly used on gray-scale images.
 30         bgr = tf.concat(values=[
 31             blue-VGG_MEAN[0],
 32             green-VGG_MEAN[1],
 33             red-VGG_MEAN[2]
 34         ],axis=3)
 35 
 36 
 37         #structure vgg network structure
 38         ## Next, build a 16-layer network of VGG Gs (with 5 convolutions and 3 full connections) and read network parameters layer by layer from the namespace
 39         # The first convolution, consisting of two convolution layers followed by a maximum pooling layer, is used to reduce the size of the picture
 40         # Incoming namespace name，To get the convolution kernel and offset of the layer, do the convolution, and return the value after the activation function
 41         conv1_1 = self.conv_layer(bgr,'conv1_1')
 42         conv1_2 = self.conv_layer(conv1_1,'conv1_2')
 43         # According to the incoming pooling Name pooling the layer accordingly
 44         pool1 = self.max_pool(conv1_2,'pool1')
 45 
 46         #Second layer convolution
 47         # The following forward propagation process is the same as the first one
 48         # The second convolution contains two convolution layers, one maximum pooled layer
 49         conv2_1 = self.conv_layer(pool1,'conv2_1')
 50         conv2_2 = self.conv_layer(conv2_1,'conv2_2')
 51         pool2 = self.max_pool(conv2_2,'pool2')
 52         # The third convolution, consisting of three convolution layers and one maximum pooled layer
 53         conv3_1 = self.conv_layer(pool2,'conv3_1')
 54         conv3_2 = self.conv_layer(conv3_1,'conv3_2')
 55         conv3_3 = self.conv_layer(conv3_2,'conv3_3')
 56         pool3 = self.max_pool(conv3_3,'pool3')
 57         # The fourth convolution, which consists of three convolution layers and one maximum pooled layer
 58         conv4_1 = self.conv_layer(pool3,'conv4_1')
 59         conv4_2 = self.conv_layer(conv4_1,'conv4_2')
 60         conv4_3 = self.conv_layer(conv4_2,'conv4_3')
 61         pool4 = self.max_pool(conv4_3,'pool4')
 62         # Segment 5 convolution, consisting of three convolution layers and one maximum pooled layer
 63         conv5_1 = self.conv_layer(pool4, 'conv5_1')
 64         conv5_2 = self.conv_layer(conv5_1, 'conv5_2')
 65         conv5_3 = self.conv_layer(conv5_2, 'conv5_3')
 66         pool5 = self.max_pool(conv5_3, 'pool5')
 67         # Layer 6 Full Connection
 68         #Full Connection Layer
 69         fc6 = self.fc_layer(pool5,'fc6') # According to Namespace name Do weighted sum operation
 70         fc6_relu = tf.nn.relu(fc6)
 71         fc7 = self.fc_layer(fc6_relu,'fc7')
 72         fc7_relu = tf.nn.relu(fc7)
 73         fc8 = self.fc_layer(fc7_relu,'fc8')
 74         # After the last layer is fully connected, do it again softmax Classify and get the probability of belonging to each category
 75         self.out = tf.nn.softmax(fc8,name='prediction')
 76 
 77         # Empty the dictionary of model parameters obtained this time to end the forward propagation time
 78         self.data_dict = None
 79         print(("build model finished: %ds" % (time.time() - start_time)))
 80 
 81     #Create convolution layer function, convolution kernel, offset from vgg16.npy Get in
 82     def conv_layer(self,conv_in,name):
 83         with tf.variable_scope(name): # Finding network parameters for the corresponding convolution layer from the namespace
 84             conv_W = tf.constant(self.data_dict[name][0],name= name + '_filter') # Read the convolution core of this layer
 85             conv_biase = tf.constant(self.data_dict[name][1],name= name + '_biases') # Read offset
 86             conv = tf.nn.conv2d(conv_in,conv_W,strides=[1,1,1,1],padding='SAME') # Convolution calculation
 87             return tf.nn.relu(tf.nn.bias_add(conv,conv_biase)) # Add offset and do activation calculation
 88 
 89     #Maximum pooled layer
 90     def max_pool(self,input_data,name):
 91         return tf.nn.max_pool(input_data,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME',name=name)
 92 
 93     #Average pooled layer
 94     def avg_pool(self, input_data, name):
 95         return tf.nn.avg_pool(input_data, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
 96 
 97     # Define forward propagation calculation for fully connected layers
 98     def fc_layer(self,input_data,name):
 99         fc_W = tf.constant(self.data_dict[name][0],name=name + '_Weight')
100         fc_b = tf.constant(self.data_dict[name][1],name=name + '_b')
101 
102         shape = input_data.get_shape().as_list() #Get a list of input dimension information
103         dim =1
104         for i in shape[1:]:
105             dim *= i # Multiply the dimensions of each layer, that is, long*wide*Number of Channels
106         # Changing the shape of the signature graph, that is, stretching the resulting multidimensional signatures only into the sixth fully connected layer
107         input_data = tf.reshape(input_data,[-1,dim])
108 
109         return tf.nn.bias_add(tf.matmul(input_data,fc_W),fc_b) # Weighted sum with offset for this layer of input
110 
111     # Define percentage conversion function
112     def percent(self,value):
113         return '%.2f%%' % (value * 100)
114     #Judging Picture Category Function
115     def test(self,file_path,prob):
116         #Define a figure Draw the window and specify the name of the window. You can also set the size of the window
117         fig = plt.figure(u"Top-5 Forecast results")
118         synset = [l.strip() for l in open(file_path).readlines()]
119 
120         # argsort Function returns index values from smallest to largest array values
121         # np.argsort The function returns the predicted value ( probability Data structure[[Probability values for each prediction category]])Index values from small to large,
122         # Five index values with the highest probability of prediction are extracted.
123         top5 = np.argsort(prob)[-1:-6:-1]
124         #print(pred)
125 
126         # Define two list---Corresponding probability values and actual labels ( zebra)
127         values = []
128         bar_label = []
129         # Five possibilities for large output probability groups,And list them one by one with the label
130         for n, i in enumerate(top5):
131             print("n:", n)
132             print("i:", i)
133             values.append(prob[i])# Remove and place the predicted probability value corresponding to the index value values
134             bar_label.append(labels[i]) # Remove the actual label according to the index value and put it in bar_label
135             # Probability of belonging to this category
136             print(i, ":", labels[i], "----", self.percent(prob[i])) # Print probability of belonging to a category
137         # Column Chart Processing
138         ax = fig.add_subplot(111) # Divide the canvas into rows and columns and place the following image in them
139         # bar()Function plots a column graph with parameters range(len(values)Is the column subscript, values A list of column heights (that is, five predicted probability values,
140         # tick_label Is the label displayed on each column (the actual label), width Is the width of the column, fc Is the color of the column)
141         ax.bar(range(len(values)), values, tick_label=bar_label, width=0.5, fc='g')
142         ax.set_ylabel(u'probabilityit') # Set Horizontal Axis Label
143         ax.set_title(u'Top-5') # Add Title
144         for a, b in zip(range(len(values)), values):
145             # Add a corresponding prediction probability value at the top of each column. a， b Represents coordinates, b+0.0005 Indicates that text information is to be placed above the top of each column
146             # 0.0005 Location,
147             #  center Is the middle position indicating that the text is located horizontally at the top of the column. bottom Place text horizontally at the bottom of the vertical direction of the top of the column
148             # Location, fontsize Is Font Size
149             ax.text(a, b + 0.0005, self.percent(b), ha='center', va='bottom', fontsize=7)
150         plt.savefig('.result.jpg')  # Save Pictures
151         plt.show()# Pop-up window display image

In this section, we have created the forward propagation of VGG16 network structure, including convolution core, offset, pooling layer and full connection layer. What we need to mention here is the establishment of full connection layer. To create a full connection layer, we first need to read the dimension information list of that layer, then we need to change the shape of the feature map and stretch the multidimensional features that will be obtained in the sixth layer.Make it conform to the input of the fully connected layer, where the shape has elements [-1], flattening the dimension to one dimension for the purpose of dimension reduction.

2,utils.py

from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from pylab import mpl
import os
mpl.rcParams['font.sans-serif']=['SimHei'] # Normal display of Chinese labels
mpl.rcParams['axes.unicode_minus']=False # Display plus and minus sign normally

def load_image(path):
    fig = plt.figure('Center and Resize')

    img = io.imread(path) # Read pictures from incoming paths
    img = img / 255.0 # Normalize pixels to[0,1]
    ax0 = fig.add_subplot(131)
    ax0.set_xlabel(u'Original Picture')  # Add Sublabel
    ax0.imshow(img)
    ## Image processing section
    #Subtract the width and height of the picture from the shortest edge and average it to take out the center image
    short_edge = min(img.shape[:2]) # Find the shortest edge of the image
    y = int((img.shape[0] - short_edge) // 2)
    x = int((img.shape[1] - short_edge)// 2)
    crop_img = img[y:y + short_edge,x:x + short_edge] #Remove the sliced central image

    ax1 = fig.add_subplot(132)
    ax1.set_xlabel(u"Center Picture")
    ax1.imshow(crop_img)

    #Central image resize 224, 224
    resized_img = transform.resize(crop_img,(224,224))

    ax2 = fig.add_subplot(133)
    ax2.set_xlabel('Divide pictures') # Add Sublabel
    ax2.imshow(resized_img)

    #plt.show()

    img_read = resized_img.reshape((1,224,224,3))  # shape [1, 224, 224, 3]
    return img_read

def load_data():
    imgs = {'tiger':[],'kittycat':[]}
    for k in imgs.keys():
        dir = './data/' + k
        for file in os.listdir(dir):
            if not file.lower().endswith('.jpg'):
                continue
            try:
                resized_img = load_image(os.path.join(dir,file))
            except OSError:
                continue

            imgs[k].append(resized_img)    # [1, height, width, depth] * n
            if len(imgs[k]) == 2:          # only use 400 imgs to reduce my memory load
                break

    return imgs['tiger'],imgs['kittycat']

In this part, the input pictures are processed, clipped, scaled, etc. to meet the needs of network input. The main idea is to normalize the image and process it, the results are as follows:

3,app.py

 1 import numpy as np
 2 import tensorflow as tf
 3 import matplotlib.pyplot as plt
 4 import vgg16
 5 import utils
 6 #Input of image to be detected and preprocessed
 7 batch = utils.load_image(r'E:\vgg16\for_transfer_learning\data\kittycat/000129037.jpg')
 8 
 9 #Define a figure Draw the window and specify the name of the window. You can also set the size of the window
10 fig = plt.figure(u"Top-5 Forecast results")
11 print('Net built')
12 with tf.Session() as sess:
13     # Define a dimension as[？,224,224,3],Type is float32 Of tensor placeholder
14     images = tf.placeholder(tf.float32, [None, 224, 224, 3])
15     feed_dict = {images: batch}
16     #class Vgg16 Instantiate vgg
17     vgg = vgg16.Vgg16()
18     # Call member methods of classes inference()，And pass in the image to be tested, which is also the process of network forward propagation
19     vgg.inference(images)
20     ## Feed a batch's data into the network to get the predicted output of the network
21     probability = sess.run(vgg.out, feed_dict=feed_dict)
22 
23     vgg.test('./synset.txt',probability[0])

This part calls the above programs to identify the image. What we need to do is call the network structure of VGG16, then calculate the probability, the five possibilities of the highest output probability, and one-to-one correspondence with the label, and finally draw the result with a column chart to express the result.

3. Testing

FunctionApp.pyReplace the picture directory in batch with the one you want to recognize

first group

Group 2

Topics: network encoding Session

Programmer Think

Reproduction of vgg16 for picture recognition

Hot Topics