Preface
I'm a deep learning beginner, so please bear with any mistakes.
1. Dataset loading and preprocessing
The dataset can be downloaded directly from the Internet. Here it is split into only a training set and a test set, but more often we would split it into training, cross-validation, and test sets, so that hyperparameters can be tuned on the validation set and the final model generalizes better.
(x, y), (x_test, y_test) = datasets.fashion_mnist.load_data()
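As a minimal sketch of the three-way split just mentioned (the 50,000/10,000 split is an illustrative assumption, not part of the original program), part of the training data can be held out as a validation set:

# Sketch: hold out the last 10,000 training samples as a validation set (assumed split)
x_train, y_train = x[:50000], y[:50000]
x_val, y_val = x[50000:], y[50000:]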
Here x, y, x_test, y_test are NumPy arrays and should be converted to tensors:
def preprocess(x, y):
    # Convert the types and normalize
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y
This function is passed to the dataset's map() method, which applies the type conversion and normalization to every sample.
Then, to speed up computation, the full dataset is sliced into mini-batches of batch-size samples.
For convenience, we create a Dataset object and use its batch() member method:
db = tf.data.Dataset.from_tensor_slices((x, y))
db = db.map(preprocess).shuffle(10000).batch(batches)
Do the same for the test set, as shown below.
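For reference, this is the test-set pipeline from the complete program; shuffle() is omitted because the evaluation order does not matter:

db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.map(preprocess).batch(batches)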
To take one batch of data samples from the dataset:
sample = next(iter(db))
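Printing the shapes, as the complete program does, should show a [128, 28, 28] image batch and a [128] label batch:

print('batch:', sample[0].shape, sample[1].shape)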
2. Building the fully connected network
Use the Sequential container to create an instance of the Sequential class:
model = Sequential([
    # [b, 784] @ [784, 256] --> [b, 256]
    layers.Dense(256, activation=tf.nn.relu),
    # [b, 256] --> [b, 128]
    layers.Dense(128, activation=tf.nn.relu),
    # [b, 128] --> [b, 64]
    layers.Dense(64, activation=tf.nn.relu),
    # [b, 64] --> [b, 32]
    layers.Dense(32, activation=tf.nn.relu),
    # [b, 32] --> [b, 10], output layer
    layers.Dense(10)
])
The build() member function initializes the network's weights, biases, and input dimensions; summary() prints the model's parameters.
model.build(input_shape=[None, 28 * 28])
model.summary()
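Each Dense layer holds in_dim × out_dim weights plus out_dim biases, so summary() should report:

784 × 256 + 256 = 200,960
256 × 128 + 128 = 32,896
128 × 64 + 64 = 8,256
64 × 32 + 32 = 2,080
32 × 10 + 10 = 330

for a total of 244,522 trainable parameters.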
Construct the optimizer
The optimizer's apply_gradients() method takes (gradient, variable) pairs and performs one update step on the given variables; alternatively, the minimize() method both computes the gradients of an objective function and applies the update.
optimizer = optimizers.Adam(learning_rate=1e-3)  # Adam optimizer with learning rate 1e-3
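A minimal sketch of the minimize() alternative mentioned above, assuming x and y are one batch as in the training loop below (minimize() expects the loss as a zero-argument callable):

# Sketch only: the loss is wrapped in a callable so minimize() can re-evaluate it
def loss_fn():
    logits = model(tf.reshape(x, [-1, 28 * 28]))
    y_onehot = tf.one_hot(y, depth=10)
    return tf.reduce_mean(
        tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True))

optimizer.minimize(loss_fn, var_list=model.trainable_variables)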
3. Computing the gradient and cost function and updating the parameters
To compute gradients with automatic differentiation, the forward pass must be placed inside a tf.GradientTape() context. The tape's gradient() method then computes the gradients of the loss with respect to the parameters, and the optimizer object applies the update. Passing from_logits=True lets TensorFlow combine the softmax and the cross-entropy in one numerically stable operation.
with tf.GradientTape() as tape:
    logits = model(x)
    y_onehot = tf.one_hot(y, depth=10)
    loss_ce = tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True)
    loss_ce = tf.reduce_mean(loss_ce)
grads = tape.gradient(loss_ce, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
4. Complete program
# -*- coding: utf-8 -*-
# @Time : 10:02
# @Author : Paranipd
# @File : mnist_test.py
# @Software : PyCharm

import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics  # datasets, layers, optimizers, container

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # Suppress unnecessary TensorFlow logging


# Preprocessing: convert the types (the loaded dataset is NumPy) and normalize
def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y


# Load the dataset = training set + test set; x, y are ordinary NumPy arrays
# x.shape: (60000, 28, 28)   y.shape: (60000,)
# x values: [0, 255]         y values: [0, 9]
(x, y), (x_test, y_test) = datasets.fashion_mnist.load_data()
print(x.shape, x.dtype, y.shape, y.dtype)

# Number of samples per batch
batches = 128

# Create a dataset whose elements are slices of the given tensors:
# tf.data.Dataset.from_tensor_slices turns (x, y) into a Dataset object
db = tf.data.Dataset.from_tensor_slices((x, y))
# map applies the type conversion, shuffle randomly scatters the dataset,
# and batch groups samples into batches to speed up computation.
# Note that chaining these in a different order gives different results.
db = db.map(preprocess).shuffle(10000).batch(batches)
# print('db:', db)

db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.map(preprocess).batch(batches)

# Iterate over the Dataset object db to obtain the next batch
db_iter = iter(db)
sample = next(db_iter)
print('batch:', sample[0].shape, sample[1].shape)

# Encapsulate the whole network in a Sequential container
model = Sequential([
    # [b, 784] @ [784, 256] --> [b, 256]
    layers.Dense(256, activation=tf.nn.relu),
    # [b, 256] --> [b, 128]
    layers.Dense(128, activation=tf.nn.relu),
    # [b, 128] --> [b, 64]
    layers.Dense(64, activation=tf.nn.relu),
    # [b, 64] --> [b, 32]
    layers.Dense(32, activation=tf.nn.relu),
    # [b, 32] --> [b, 10], output layer
    layers.Dense(10)
])
# Initialize the weights and input dimensions of the network
model.build(input_shape=[None, 28 * 28])
# Print the parameters of each layer to inspect the model structure
# (build and summary are methods of the Sequential class)
model.summary()

# w = w - lr * grad
# Set the learning rate for the parameter updates
optimizer = optimizers.Adam(learning_rate=1e-3)


def main():
    # Iterate over the entire dataset 30 times
    for epoch in range(30):
        # Iterate over the dataset object; training one batch is called a step
        # (128 samples are processed at a time)
        for step, (x, y) in enumerate(db):
            # x: [b, 28, 28] ==> [b, 784], flattened to one dimension
            # y: [b]
            x = tf.reshape(x, [-1, 28 * 28])

            # To compute gradients automatically, the forward pass must run
            # inside a tf.GradientTape() context; the tape's gradient() method
            # computes the parameter gradients and the optimizer applies the update
            with tf.GradientTape() as tape:  # gradient recorder
                # [b, 784] ==> [b, 10]
                # model(x) actually invokes the class's __call__ method
                # Forward pass through the network
                logits = model(x)
                # One-hot encoding
                y_onehot = tf.one_hot(y, depth=10)
                # Mean squared error loss
                loss_mse = tf.reduce_mean(tf.losses.MSE(y_onehot, logits))
                # Cross-entropy loss
                loss_ce = tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True)
                loss_ce = tf.reduce_mean(loss_ce)

            # Differentiate with respect to all trainable variables
            grads = tape.gradient(loss_ce, model.trainable_variables)
            # Update the trainable variables;
            # zip pairs the corresponding gradients and variables into tuples
            optimizer.apply_gradients(zip(grads, model.trainable_variables))

            if step % 100 == 0:
                print(epoch, step, 'loss:', float(loss_ce), float(loss_mse))

        # Test: compute the accuracy
        total_correct = 0
        total_num = 0
        for x, y in db_test:
            # x: [b, 28, 28] ==> [b, 784]
            # y: [b]
            x = tf.reshape(x, [-1, 28 * 28])
            # [b, 10]
            logits = model(x)
            # logits --> prob: [b, 10]
            # Normalize the outputs into probabilities that sum to 1
            prob = tf.nn.softmax(logits, axis=1)  # values in [0, 1]
            # Index of the maximum value along the class dimension
            pred = tf.argmax(prob, axis=1)
            pred = tf.cast(pred, dtype=tf.int32)
            # pred: [b]   y: [b]
            # correct: [b], True(1): equal; False(0): not equal
            correct = tf.equal(pred, y)
            correct = tf.reduce_sum(tf.cast(correct, dtype=tf.int32))
            # Number of correct predictions
            total_correct += int(correct)
            # Total number of samples
            total_num += x.shape[0]
        # Accuracy
        acc = total_correct / total_num
        print(epoch, 'test acc:', acc)


if __name__ == "__main__":
    main()
Model results
The final test accuracy is about 0.87. With some optimization and error analysis, the accuracy could be pushed higher.
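As one illustration of such an optimization (my own suggestion, not something the original program includes), Dropout layers could be inserted between the Dense layers to reduce overfitting; note that model(x, training=True) must then be used in the training loop so dropout is active:

# Sketch: same network with dropout regularization (rates are assumed; tune on a validation set)
model = Sequential([
    layers.Dense(256, activation=tf.nn.relu),
    layers.Dropout(0.3),
    layers.Dense(128, activation=tf.nn.relu),
    layers.Dropout(0.3),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(32, activation=tf.nn.relu),
    layers.Dense(10)
])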
Summary
This article walked through loading and preprocessing Fashion-MNIST with tf.data, building a five-layer fully connected network with the Sequential container, training it with the Adam optimizer and cross-entropy loss inside tf.GradientTape, and evaluating it on the test set, reaching roughly 0.87 accuracy.