[Baidu PaddlePaddle] Food classification with 11 categories

Project description

Train a simple convolutional neural network to classify food pictures.

Data set introduction

This project uses the food-11 dataset, which contains 11 categories:

Bread, Dairy product, Dessert, Egg, Fried food, Meat, Noodles/Pasta, Rice, Seafood, Soup, and Vegetable/Fruit.
Training set: 9866 images
Validation set: 3430 images
Testing set: 3347 images

Data format
After downloading and unzipping the archive, there are three folders: training, validation and testing.
Photos in training and validation are named [category]_[number].jpg; for example, 3_100.jpg is a photo of category 3 (the number after the underscore carries no meaning).
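
The naming rule can be illustrated with a minimal sketch. The helper name parse_category is hypothetical and not part of the original project; it only shows how the category could be read back from a file name that follows the convention described above.

```python
import os

def parse_category(file_name):
    """Return the integer category encoded before the underscore."""
    # Drop any directory part and the extension: '/x/3_100.jpg' -> '3_100'
    stem = os.path.splitext(os.path.basename(file_name))[0]
    # Everything before the first underscore is the category
    category, _, _number = stem.partition('_')
    return int(category)

print(parse_category('3_100.jpg'))
print(parse_category('/home/aistudio/work/food-11/training/10_707.jpg'))
```

Files in the testing folder carry no category prefix, so a helper like this only applies to the training and validation folders.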

Dataset Download

Decompress data

Run only once!

!unzip -d work data/data76472/food-11.zip # Decompress the food-11 dataset

Import dependencies

import os
import paddle
import paddle.vision.transforms as T
import numpy as np
import pandas as pd
from PIL import Image
import paddle.nn.functional as F

Preprocessing

#Only run once!!!!
An index file is generated from the picture names, using the [category]_[number].jpg naming convention described above.

data_path = '/home/aistudio/work/food-11/'  # Root directory of the unzipped dataset

def write_index(folder, out_file, with_label=True):
    # Write one line per picture: absolute path, then (for labelled sets) a tab and the category
    imgs = os.listdir(os.path.join(data_path, folder))
    with open(out_file, 'a') as f:  # 'a' appends, so running this twice duplicates every entry
        for img in imgs:
            line = os.path.join(data_path, folder, img)
            if with_label:
                line += '\t' + img.split('_')[0]  # The category is the text before the underscore
            f.write(line + '\n')
    print(folder, len(imgs))

# Each index file is built from its own folder only
write_index('training', './training_set.txt')
write_index('validation', './validation_set.txt')
write_index('testing', './test_set.txt', with_label=False)  # Test pictures have no label
training 9866
validation 3430
testing 3347

Verify whether the corresponding relationship is correct

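tf = './training_set.txt'  # Assumed path: the training index file, which is also read into df below
with open(tf) as f:
    for i in f.readlines()[:5]:  # Print the first few "path<TAB>label" lines for a visual check
        print(i.strip())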

EDA (Exploratory Data Analysis)

View the data volume and distribution of each category.
pandas is used here for data processing.
For txt files without a header row, use the names parameter of read_table to name the columns (refer to this article).
For more on using pandas, see the following article:
pandas learning record

#Read txt text
df =  pd.read_table(tf,sep='\t',names=['name','label'])
#Then select the data under the label column to draw
d = df['label'].hist().get_figure()

                                                  name    label
0       /home/aistudio/work/food-11/training/3_288.jpg        3
1        /home/aistudio/work/food-11/training/4_36.jpg        4
2       /home/aistudio/work/food-11/training/6_328.jpg        6
3      /home/aistudio/work/food-11/training/10_707.jpg       10
4       /home/aistudio/work/food-11/training/2_957.jpg        2
...                                                ...      ...
16638     /home/aistudio/work/food-11/testing/0542.jpg  0542.jp
16639     /home/aistudio/work/food-11/testing/3091.jpg  3091.jp
16640     /home/aistudio/work/food-11/testing/0722.jpg  0722.jp
16641     /home/aistudio/work/food-11/testing/0805.jpg  0805.jp
16642     /home/aistudio/work/food-11/testing/1566.jpg  1566.jp

[16643 rows x 2 columns]

label shuffling

It can be seen from the figure above that the categories are unevenly represented (the counts printed further down range from 280 images for category 7 to 1500 for categories 2 and 9). Such an imbalance may cause the model to overfit the dominant categories during training and generalize poorly. To counter this, we use the label shuffling method.

It works as follows (thanks to the user with ID "idle people" for the description of the principle):

First, sort the original image list by label. Then count the samples in each category and find the size of the largest category. For each category, generate a random permutation whose length is that maximum size. Take each number in the permutation modulo the category's own sample count to obtain an index, and pick the image at that index within the category; this gives a random image list for the category with as many entries as the largest one. Finally, concatenate the lists of all categories, shuffle the result once more, and train on this final list.
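
The index step described above can be tried on a toy example. This is only a sketch under stated assumptions: the function name balanced_indices and the three toy class sizes are made up for illustration, and the standard library random module stands in for np.random.permutation.

```python
import random

def balanced_indices(class_sizes, seed=0):
    """For each class, return max_size row indices that all fall inside that class."""
    rng = random.Random(seed)
    max_size = max(class_sizes.values())  # size of the largest class
    result = {}
    for label, size in class_sizes.items():
        perm = list(range(max_size))
        rng.shuffle(perm)  # random arrangement of 0..max_size-1
        # The remainder maps every value into 0..size-1, so small classes are repeated
        result[label] = [p % size for p in perm]
    return result

toy = balanced_indices({0: 5, 1: 2, 2: 3})
for label, idx in toy.items():
    print(label, idx)
```

Every class ends up with five indices: the largest class uses each of its rows exactly once, while the smaller classes reuse their rows almost evenly. The same effect explains why 11 classes with at most 1500 samples each give 16500 rows after shuffling.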

#View data format
#shuffle program
from sklearn.utils import shuffle
def labelShuffling(dataFrame, groupByName='label'):
    groupDataFrame = dataFrame.groupby(groupByName)
    labels = groupDataFrame.size()  # Number of samples per label
    print("length of label is ", len(labels))
    maxNum = max(labels)  # Size of the largest class
    parts = []
    for i in range(len(labels)):
        print("Processing label  :", i)
        tmpGroupBy = groupDataFrame.get_group(i)
        # Permute 0..maxNum-1, then take the remainder so every index falls inside this class
        createdShuffleLabels = np.random.permutation(maxNum) % labels[i]
        print("Num of the label is : ", labels[i])
        parts.append(tmpGroupBy.iloc[createdShuffleLabels])
    # DataFrame.append was removed in pandas 2.0, so the per-class pieces are concatenated instead
    lst = pd.concat(parts, ignore_index=True)
    return lst

all_size = len(df)
print("Training set size:", all_size)

# train_image_list = df

df1 = labelShuffling(df)
df1 = shuffle(df1)
print("shuffle Post dataset size:", len(df1))

train_image_path_list = df1['name'].values
label_list = df1['label'].values
label_list = paddle.to_tensor(label_list, dtype='int64')
train_label_list = paddle.nn.functional.one_hot(label_list, num_classes=11)

Training set size: 9866
length of label is  11
Processing label  : 0
Num of the label is :  994
Processing label  : 1
Num of the label is :  429
Processing label  : 2
Num of the label is :  1500
Processing label  : 3
Num of the label is :  986
Processing label  : 4
Num of the label is :  848
Processing label  : 5
Num of the label is :  1325
Processing label  : 6
Num of the label is :  440
Processing label  : 7
Num of the label is :  280
Processing label  : 8
Num of the label is :  855
Processing label  : 9
Num of the label is :  1500
Processing label  : 10
Num of the label is :  709
shuffle Post dataset size: 16500
#View the data format after shuffling
(16500, 2)
                                                  name  label
13196   /home/aistudio/work/food-11/training/8_813.jpg      8
3445    /home/aistudio/work/food-11/training/2_783.jpg      2
5117    /home/aistudio/work/food-11/training/3_818.jpg      3
12593   /home/aistudio/work/food-11/training/8_719.jpg      8
15083  /home/aistudio/work/food-11/training/10_655.jpg     10
...                                                ...    ...
7211    /home/aistudio/work/food-11/training/4_139.jpg      4
3078    /home/aistudio/work/food-11/training/2_328.jpg      2
9574    /home/aistudio/work/food-11/training/6_199.jpg      6
9338    /home/aistudio/work/food-11/training/6_205.jpg      6
1035     /home/aistudio/work/food-11/training/0_32.jpg      0

[16500 rows x 2 columns]
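#Convert the shuffled list into txt and save it
with open('./t1.txt','a') as f:
    for i in range(len(df1)):
        # Assumed layout: path, tab, label, matching the index files generated earlier
        f.write(f"{df1.iloc[i]['name']}\t{df1.iloc[i]['label']}\n")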
#Processing validation sets
#Read txt text
val_image_list = pd.read_table('./validation_set.txt',sep='\t',names=['name','label'])

val_image_path_list = val_image_list['name'].values
val_label_list = val_image_list['label'].values
val_label_list = paddle.to_tensor(val_label_list, dtype='int64')
val_label_list = paddle.nn.functional.one_hot(val_label_list, num_classes=11)
                                                  name  label
0     /home/aistudio/work/food-11/validation/3_316.jpg      3
1     /home/aistudio/work/food-11/validation/2_314.jpg      2
2     /home/aistudio/work/food-11/validation/3_167.jpg      3
3      /home/aistudio/work/food-11/validation/8_73.jpg      8
4      /home/aistudio/work/food-11/validation/1_56.jpg      1
...                                                ...    ...
3425  /home/aistudio/work/food-11/validation/3_212.jpg      3
3426  /home/aistudio/work/food-11/validation/6_115.jpg      6
3427  /home/aistudio/work/food-11/validation/4_264.jpg      4
3428  /home/aistudio/work/food-11/validation/5_148.jpg      5
3429  /home/aistudio/work/food-11/validation/2_105.jpg      2

[3430 rows x 2 columns]
#For testing
Tensor(shape=[11], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
Tensor(shape=[11], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Tensor(shape=[11], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
Tensor(shape=[11], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.])
Tensor(shape=[11], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
Tensor(shape=[11], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.])

Create dataset

# # Define data preprocessing
# data_transforms = T.Compose([
#     T.RandomResizedCrop(224, scale=(0.8, 1.2), ratio=(3. / 4, 4. / 3), interpolation='bilinear'),
#     T.RandomHorizontalFlip(),
#     T.RandomVerticalFlip(),
#     T.RandomRotation(15),
#     T.Transpose(),    # HWC -> CHW
#     T.Normalize(
#         mean=[127.5, 127.5, 127.5],        # normalization
#         std=[127.5, 127.5, 127.5],
#         to_rgb=True)    
# ])
# Build Dataset
class MyDataset(paddle.io.Dataset):
    """Step 1: inherit the paddle.io.Dataset class"""
    def __init__(self, train_img_list, val_img_list, train_label_list, val_label_list, mode='train'):
        """Step 2: implement the constructor, define how the data is read, and split the training and validation sets"""
        super(MyDataset, self).__init__()
        self.img = []
        self.label = []
        # Picture paths and one-hot labels prepared in the previous steps
        self.train_images = train_img_list
        self.test_images = val_img_list
        self.train_label = train_label_list
        self.test_label = val_label_list
        if mode == 'train':
            # Read train_images data
            for img,la in zip(self.train_images, self.train_label):
                self.img.append(img)
                self.label.append(la)
        else:
            # Read test_images data
            for img,la in zip(self.test_images, self.test_label):
                self.img.append(img)
                self.label.append(la)

    def load_img(self, image_path):
        # Read the picture with Pillow (PIL) and force three RGB channels
        image = Image.open(image_path).convert('RGB')
        return image

    def __getitem__(self, index):
        """Step 3: implement __getitem__, which returns the picture and label for a given index"""
        image = self.load_img(self.img[index])
        label = self.label[index]
        ## label = paddle.to_tensor(label)
        img = image.resize((100, 100), Image.LANCZOS)  # Resize to a uniform size (ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter)
        img = np.array(img).astype('float32')  # Convert to a float32 array
        img = img.transpose((2, 0, 1))     # HWC -> CHW: from interleaved rgb,rgb,... to RRR..., GGG..., BBB...
        img = img/255.0  # Scale the data to the range 0-1
        label = np.argmax(label)  # Convert the one-hot label back to a class index
        return img, label
        #return data_transforms(image), paddle.nn.functional.label_smooth(label)

    def __len__(self):
        """Step 4: implement __len__, which returns the total number of samples"""
        return len(self.img)

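PLACE = paddle.CUDAPlace(0)
# train_loader
train_dataset = MyDataset(
    train_img_list=train_image_path_list,
    val_img_list=val_image_path_list,
    train_label_list=train_label_list,
    val_label_list=val_label_list,
    mode='train')
# batch_size and shuffle are assumed values; batch_size mirrors the one passed to model.fit
train_loader = paddle.io.DataLoader(
    train_dataset, places=PLACE, batch_size=64, shuffle=True)

# val_loader
val_dataset = MyDataset(
    train_img_list=train_image_path_list,
    val_img_list=val_image_path_list,
    train_label_list=train_label_list,
    val_label_list=val_label_list,
    mode='test')

val_loader = paddle.io.DataLoader(
    val_dataset, places=PLACE, batch_size=64, shuffle=False)

print('train size:', train_dataset.__len__())
print('val size:', val_dataset.__len__())

# View picture data, size and label
for data, label in train_dataset:
    print(data.shape)
    break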
train Size: 16500
val Size: 3430
(3, 100, 100)

Build network

The network is adapted from a public project by the user with ID "three-year-old"; many thanks.

class MyCNN(paddle.nn.Layer):
    def __init__(self):
        super(MyCNN, self).__init__()  # Required so that paddle.nn.Layer can register the sub-layers
        self.conv0 = paddle.nn.Conv2D(in_channels=3, out_channels=20, kernel_size=5, padding=0)  # Two-dimensional convolution
        self.pool0 = paddle.nn.MaxPool2D(kernel_size =2, stride =2)  # Max pooling layer
        self._batch_norm_0 = paddle.nn.BatchNorm2D(num_features = 20)  # Batch normalization layer

        self.conv1 = paddle.nn.Conv2D(in_channels=20, out_channels=50, kernel_size=5, padding=0)
        self.pool1 = paddle.nn.MaxPool2D(kernel_size =2, stride =2)
        self._batch_norm_1 = paddle.nn.BatchNorm2D(num_features = 50)

        self.conv2 = paddle.nn.Conv2D(in_channels=50, out_channels=50, kernel_size=5, padding=0)
        self.pool2 = paddle.nn.MaxPool2D(kernel_size =2, stride =2)
        self.fc1 = paddle.nn.Linear(in_features=4050, out_features=218)  # Linear layer
        self.fc2 = paddle.nn.Linear(in_features=218, out_features=100)
        self.fc3 = paddle.nn.Linear(in_features=100, out_features=11)
    def forward(self,input):
        input = paddle.reshape(input,shape=[-1,3,100,100])  # Reshape to [batch, channels, height, width]
        # print(input.shape)
        x = self.conv0(input)  # Convolution
        x = F.relu(x)  # Activation
        x = self.pool0(x)  # Pooling
        x = self._batch_norm_0(x)  # Batch normalization

        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool1(x)
        x = self._batch_norm_1(x)

        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool2(x)
        x = paddle.reshape(x, [x.shape[0], -1])
        # print(x.shape)

        x = self.fc1(x)  # Linear layer
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        y = F.softmax(x)  # classifier 
        return y
network = MyCNN()  # Model instantiation
paddle.summary(network, (1,3,100,100))  # Model structure view
 Layer (type)       Input Shape          Output Shape         Param #    
   Conv2D-1      [[1, 3, 100, 100]]    [1, 20, 96, 96]         1,520     
  MaxPool2D-1    [[1, 20, 96, 96]]     [1, 20, 48, 48]           0       
 BatchNorm2D-1   [[1, 20, 48, 48]]     [1, 20, 48, 48]          80       
   Conv2D-2      [[1, 20, 48, 48]]     [1, 50, 44, 44]        25,050     
  MaxPool2D-2    [[1, 50, 44, 44]]     [1, 50, 22, 22]           0       
 BatchNorm2D-2   [[1, 50, 22, 22]]     [1, 50, 22, 22]          200      
   Conv2D-3      [[1, 50, 22, 22]]     [1, 50, 18, 18]        62,550     
  MaxPool2D-3    [[1, 50, 18, 18]]      [1, 50, 9, 9]            0       
   Linear-1         [[1, 4050]]            [1, 218]           883,118    
   Linear-2          [[1, 218]]            [1, 100]           21,900     
   Linear-3          [[1, 100]]            [1, 11]             1,111     
Total params: 995,529
Trainable params: 995,249
Non-trainable params: 280
Input size (MB): 0.11
Forward/backward pass size (MB): 3.37
Params size (MB): 3.80
Estimated Total Size (MB): 7.29

{'total_params': 995529, 'trainable_params': 995249}
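
The shapes and parameter counts in the summary can be checked by hand. The sketch below is an illustration only; the helper names (conv_out, pool_out, conv_params, linear_params) are hypothetical and use the standard formulas for unpadded convolution and 2x2 pooling, not anything taken from Paddle itself.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling."""
    return (size - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return (in_ch * kernel * kernel + 1) * out_ch

def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features

size = 100
trace = []
for _ in range(3):  # three conv + pool stages, all with 5x5 kernels and no padding
    size = conv_out(size, kernel=5)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flat_features = 50 * size * size  # 50 channels after the last stage
print(trace)
print(flat_features)
print(conv_params(3, 20, 5), conv_params(20, 50, 5), conv_params(50, 50, 5))
print(linear_params(flat_features, 218))
```

The flattened size is what fixes in_features of fc1: changing the input resolution or any kernel size would require recomputing it.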


Label shuffling enlarges the training set (from 9866 to 16500 samples), so a larger batch_size can be used.
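
As a rough guide to what a batch size means for training length, the number of steps per epoch is the sample count divided by the batch size, rounded up. The sketch below is illustrative; steps_per_epoch is a hypothetical helper, and the batch sizes other than 64 are arbitrary examples.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

for batch_size in (32, 64, 128):
    train_steps = steps_per_epoch(16500, batch_size)  # shuffled training set
    val_steps = steps_per_epoch(3430, batch_size)     # validation set
    print(batch_size, train_steps, val_steps)
```

With batch_size=64 this gives 258 training steps and 54 validation steps, which matches the step counts in the training log.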

model = paddle.Model(network)  # Model encapsulation

# Configure optimizer, loss function and evaluation index
model.prepare(paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters()),
              paddle.nn.CrossEntropyLoss(),  # Assumed loss, consistent with the loss column in the log
              paddle.metric.Accuracy())      # Assumed metric, consistent with the acc column in the log

# Start the whole process training of the model
model.fit(train_dataset,  # Training data set
          val_dataset,   # Evaluation data set
          epochs=5,       # Total rounds of training
          batch_size=64,  # Batch size for training
          verbose=1)      # Log display form
The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/5

step 258/258 [==============================] - loss: 2.1116 - acc: 0.2521 - 3s/step         
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 54/54 [==============================] - loss: 2.4477 - acc: 0.0469 - 775ms/step        
Eval samples: 3430
Epoch 2/5
step 258/258 [==============================] - loss: 2.1425 - acc: 0.3990 - 3s/step         
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 54/54 [==============================] - loss: 2.5067 - acc: 0.0207 - 825ms/step        
Eval samples: 3430
Epoch 3/5
step 258/258 [==============================] - loss: 2.0256 - acc: 0.4742 - 3s/step         
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 54/54 [==============================] - loss: 2.4263 - acc: 0.0825 - 776ms/step        
Eval samples: 3430
Epoch 4/5
step 258/258 [==============================] - loss: 1.9744 - acc: 0.5348 - 3s/step         
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 54/54 [==============================] - loss: 2.4410 - acc: 0.0985 - 809ms/step        
Eval samples: 3430
Epoch 5/5
step 258/258 [==============================] - loss: 1.8584 - acc: 0.5815 - 3s/step         
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 54/54 [==============================] - loss: 2.3978 - acc: 0.1087 - 792ms/step        
Eval samples: 3430
model.save('finetuning/mnist')  # Save model
def openimg():  # Read the test pictures
    with open('test_set.txt') as f:  # Read the index file
        test_img = []
        txt =  []
        for line in f.readlines():  # Loop over each line
            img = Image.open(line.strip()).convert('RGB')  # Open the picture as 3-channel RGB
            img = img.resize((100, 100), Image.LANCZOS)  # Size normalization (ANTIALIAS was removed in Pillow 10)
            img = np.array(img).astype('float32')  # Convert to array
            img = img.transpose((2, 0, 1))     # HWC -> CHW: from interleaved rgb,rgb,... to RRR..., GGG..., BBB...
            img = img/255.0  # Scale to the range 0-1
            txt.append(line.strip())  # Collect the picture path
            test_img.append(img)  # Collect the preprocessed picture
        return txt,test_img
img_path, img = openimg()  # Read list


Build a lookup list so that the prediction can be displayed as a category name

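from PIL import Image
label_name=['bread','dairy','Dessert','egg','fried food','meat','noodle/pasta','rice','seafood','soup','Vegetables/Fruits']
site = 255  # Index of the picture to read
model_state_dict = paddle.load('finetuning/mnist.pdparams')  # Load the saved weights
model = MyCNN()  # Instantiate the model
model.set_state_dict(model_state_dict)  # Apply the loaded weights; without this the model keeps its random initialization
model.eval()  # Switch to evaluation mode so batch normalization uses its running statistics

ceshi = model(paddle.to_tensor(img[site]))  # Run the prediction
print('The predicted result is:', label_name[np.argmax(ceshi.numpy())])  # Map the most probable class to its name
Image.open(img_path[site])  # display picture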
label: 10
 The predicted result is: Vegetables/Fruits

