Building an ACGAN in Keras to generate MNIST handwritten digits

Posted by ari_aaron on Sat, 25 Dec 2021 04:25:24 +0100

I CGAN and ACGAN

1. CGAN

The input of an ordinary GAN is an N-dimensional vector of normally distributed random numbers; CGAN attaches a label to these random numbers.
CGAN uses the Embedding layer to convert an integer label (index value) into a dense vector of fixed size, and this dense vector is multiplied element-wise with the N-dimensional normally distributed random vector to obtain a labeled random vector.
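
The label conditioning described above can be sketched in plain NumPy. This is only an illustration, not the Keras implementation: the embedding table is filled with random values (in the real model the Embedding layer learns it), and the names `embedding_table` and `condition_noise` are made up for this example.

```python
import numpy as np

num_classes = 10   # digits 0-9
latent_dim = 100   # N, the size of the noise vector

rng = np.random.default_rng(0)
# Stand-in for the weights of an Embedding layer: one row of size latent_dim per class.
embedding_table = rng.normal(size=(num_classes, latent_dim))

def condition_noise(noise, labels):
    """Look up one dense vector per label and multiply it element-wise with the noise."""
    label_vectors = embedding_table[labels]   # (batch, latent_dim)
    return noise * label_vectors              # (batch, latent_dim)

noise = rng.normal(size=(4, latent_dim))
labels = np.array([3, 3, 7, 0])
labeled_noise = condition_noise(noise, labels)
print(labeled_noise.shape)  # (4, 100)
```

Two samples with the same label share the same embedding row, so the label scales every noise vector of that class in the same way.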

2. ACGAN

ACGAN brings a deep convolutional network into the labeled GAN, so that higher-quality pictures can be generated.
ACGAN is equivalent to a combination of DCGAN (deep convolutional generative adversarial network) and CGAN (conditional generative adversarial network): both the deep convolutional network and the labels are brought into the GAN.

II Construction of ACGAN neural network

2.1 generator

Input: a random vector carrying a label.
The specific steps are:
- Generate an N-dimensional normally distributed random vector, and use the Embedding layer to convert the integer label (index value) into an N-dimensional dense vector;
- Multiply this dense vector element-wise by the N-dimensional normally distributed random vector.
Model structure: reshape the input, then generate an image of the required size through a series of upsampling and convolution layers.

Input code:

noise = Input(shape=(self.latent_dim,))  # (none,100)
label = Input(shape=(1,), dtype='int32')  # (none,1)
# Converts a positive integer (index value) to a dense vector of fixed size.
label_embedding = Flatten()(Embedding(self.num_classes, self.latent_dim)(label))

model_input = multiply([noise, label_embedding])  # Desired number * randomly generated noise
img = model(model_input)

Generator code

 def build_generator(self):

     model = Sequential(name='generator')

     # Fully connected layer from the latent_dim-sized input to 32 * 7 * 7 units
     model.add(Dense(32 * 7 * 7, activation="relu", input_dim=self.latent_dim))
     # Reshape into a 7 * 7 * 32 feature map
     model.add(Reshape((7, 7, 32)))

     # The output is 7 * 7 * 64
     model.add(Conv2D(filters=64, kernel_size=3, padding="same"))
     model.add(BatchNormalization(momentum=0.8))
     model.add(Activation("relu"))
     # Up sampling
     # 7*7*64->14*14*128
     model.add(UpSampling2D())
     model.add(Conv2D(filters=128, kernel_size=3, padding="same"))
     model.add(BatchNormalization(momentum=0.8))
     model.add(Activation("relu"))

     # 14*14*128->28*28*64
     model.add(UpSampling2D())
     model.add(Conv2D(filters=64, kernel_size=3, padding="same"))
     model.add(BatchNormalization(momentum=0.8))
     model.add(Activation("relu"))

     # 28*28*64->28*28*1
     model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
     model.add(Activation("tanh"))

     model.summary()

     noise = Input(shape=(self.latent_dim,))  # (none,100)
     label = Input(shape=(1,), dtype='int32')  # (none,1)
     # Converts a positive integer (index value) to a dense vector of fixed size.
     label_embedding = Flatten()(Embedding(self.num_classes, self.latent_dim)(label))

     model_input = multiply([noise, label_embedding])  # Desired number * randomly generated noise
     img = model(model_input)

     return Model([noise, label], img)
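
To check the shape comments in the generator, the layer sequence can be traced with two small helper functions. This is a sketch in plain Python; `upsample` and `conv_same` are hypothetical helpers written for this example, not Keras functions. It relies on the fact that UpSampling2D with its default size doubles height and width, and that a stride-1 convolution with padding="same" keeps them unchanged.

```python
def upsample(shape, factor=2):
    """UpSampling2D repeats rows and columns, so height and width scale by `factor`."""
    h, w, c = shape
    return (h * factor, w * factor, c)

def conv_same(shape, filters):
    """A stride-1 convolution with padding="same" keeps height and width and sets the channel count."""
    h, w, _ = shape
    return (h, w, filters)

shape = (7, 7, 32)                        # after Dense(32 * 7 * 7) and Reshape((7, 7, 32))
shape = conv_same(shape, 64)              # (7, 7, 64)
shape = conv_same(upsample(shape), 128)   # (14, 14, 128)
shape = conv_same(upsample(shape), 64)    # (28, 28, 64)
shape = conv_same(shape, 1)               # (28, 28, 1), one channel for MNIST
print(shape)  # (28, 28, 1)
```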

2.2 discriminator

The discriminator of ACGAN is built from convolution layers. The discriminator of an ordinary GAN only judges whether the input picture is real or fake; the ACGAN discriminator must judge not only authenticity but also the class of the picture.

Model input: a picture, which can be real or fake.
Model structure: a series of strided (downsampling) convolutions that turn the picture into a feature map.
Model output: the result of judging authenticity and class. There are two outputs:
- One is a number between 0 and 1: 1 means the picture is judged to be real and 0 means it is judged to be fake;
- The other is a vector that indicates which class the picture belongs to.
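
The two outputs can be illustrated with NumPy. This is a sketch under assumptions: the 128-value feature vector and both weight matrices are random stand-ins (`w_valid` and `w_class` are hypothetical names), whereas in the real model they come from the trained convolution layers and the two Dense heads.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 128))          # stand-in for the pooled feature vector of 2 pictures
w_valid = rng.normal(size=(128, 1)) * 0.1     # hypothetical weights of the real/fake head
w_class = rng.normal(size=(128, 10)) * 0.1    # hypothetical weights of the class head

validity = sigmoid(features @ w_valid)        # (2, 1), each value in (0, 1)
class_probs = softmax(features @ w_class)     # (2, 10), each row sums to 1
predicted_digit = class_probs.argmax(axis=1)  # most likely class per picture
```

The sigmoid head answers "real or fake", the softmax head answers "which digit"; both read the same feature vector.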

Discriminant model code

  def build_discriminator(self):
      model = Sequential(name='discriminator')

      # 28*28*1->14*14*16
      model.add(Conv2D(filters=16, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
      model.add(LeakyReLU(alpha=0.2))
      model.add(Dropout(0.25))

      # 14*14*16->7*7*32
      model.add(Conv2D(filters=32, kernel_size=3, strides=2, padding="same"))
      model.add(LeakyReLU(alpha=0.2))
      model.add(Dropout(0.25))
      model.add(BatchNormalization(momentum=0.8))
      # 7*7*32->8*8*32 (zero padding)->4*4*64
      model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))  # add one row and one column: 7*7->8*8
      model.add(Conv2D(filters=64, kernel_size=3, strides=2, padding="same"))
      model.add(LeakyReLU(alpha=0.2))
      model.add(Dropout(0.25))
      model.add(BatchNormalization(momentum=0.8))
      # 4*4*64->4*4*128
      model.add(Conv2D(filters=128, kernel_size=3, strides=1, padding="same"))
      model.add(LeakyReLU(alpha=0.2))
      model.add(Dropout(0.25))
      model.add(GlobalAveragePooling2D())

      # Input picture
      img = Input(shape=self.img_shape)  # (none,28,28,1)
      features = model(img)  # (none,128)

      validity = Dense(1, activation="sigmoid")(features)  # (none,1)
      label = Dense(self.num_classes, activation="softmax")(features)  # (none,10)

      return Model(img, [validity, label])
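
The spatial sizes inside the discriminator can be verified with a few lines of arithmetic. This sketch assumes the usual rule that a convolution with padding="same" and stride s maps a side of length n to ceil(n / s); `conv_same` is a hypothetical helper for this example.

```python
import math

def conv_same(size, stride):
    """Spatial size after a convolution with padding="same": ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
size = conv_same(size, 2)   # 14
size = conv_same(size, 2)   # 7
size = size + 1             # ZeroPadding2D(((0, 1), (0, 1))) adds one row and one column -> 8
size = conv_same(size, 2)   # 4
size = conv_same(size, 1)   # 4
print(size)  # 4
```

GlobalAveragePooling2D then averages each of the 128 channels over the remaining 4 * 4 positions, which gives the (none,128) feature vector.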

2.3 model training

Training follows roughly the same idea as a traditional GAN, with the classification output added on top.
The training of ACGAN consists of the following steps:

- Randomly select batch_size real pictures and their labels.
- Randomly generate batch_size N-dimensional vectors and their corresponding labels, combine them using the Embedding layer, and pass them into the Generator to produce batch_size fake pictures.
- The loss function of the Discriminator consists of two parts: one compares the real/fake judgment with the truth, and the other compares the predicted class of the picture with its true label.
- The loss function of the Generator also consists of two parts: one measures whether the generated picture is judged as 1 (real) by the Discriminator, and the other measures whether the generated picture is assigned to the correct class.
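
The two-part losses can be written out with NumPy. This is a sketch with made-up numbers (three classes instead of ten, to keep it short), and the two functions are simplified re-implementations for illustration, not the Keras ones. It assumes the two parts are simply added, which is what Keras does by default when a model has two outputs and no loss weights are given.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true integer label."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

# Made-up discriminator outputs for two fake pictures.
validity = np.array([[0.2], [0.6]])
class_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
labels = np.array([0, 2])

class_loss = sparse_categorical_crossentropy(labels, class_probs)
# Discriminator: fake pictures should score 0 and be classified correctly.
d_loss_fake = binary_crossentropy(np.zeros((2, 1)), validity) + class_loss
# Generator: the same pictures should score 1 and be classified correctly.
g_loss = binary_crossentropy(np.ones((2, 1)), validity) + class_loss
```

The class term is identical for both players; only the real/fake target flips between 0 and 1.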

def train(self, epochs, batch_size=128, sample_interval=50):
     # Load data
     (X_train, y_train), (_,  _) = mnist.load_data()

     # Data preprocessing and normalization
     X_train = (X_train.astype(np.float32) - 127.5) / 127.5
     X_train = np.expand_dims(X_train, axis=3)
     Y_train = y_train.reshape(-1, 1)

     valid = np.ones((batch_size, 1))  # (batch_size, 1) targets for real pictures
     fake = np.zeros((batch_size, 1))  # (batch_size, 1) targets for fake pictures

     for epoch in range(epochs):
         # Random selection of real pictures
         idx = np.random.randint(0, X_train.shape[0], batch_size)
         img, labels = X_train[idx], Y_train[idx]
         # Use the generator to produce fake pictures
         noise = np.random.normal(0, 1, (batch_size, self.latent_dim))  # (batch_size, 100)
         sampled_label = np.random.randint(0, 10, (batch_size, 1))  # (batch_size, 1) labels
         gen_imgs = self.generator.predict([noise, sampled_label])  # (batch_size, 28, 28, 1) generated pictures

         # Train the discriminator; each call returns 5 floats:
         # total loss, 2 per-output losses, 2 per-output accuracies
         d_loss_real = self.discriminator.train_on_batch(img, [valid, labels])
         d_loss_fake = self.discriminator.train_on_batch(gen_imgs, [fake, sampled_label])
         d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

         # Train the generator through the combined model
         g_loss = self.combined.train_on_batch([noise, sampled_label], [valid, sampled_label])
         # returns 3 floats: total loss and 2 per-output losses

         print("%d [D loss: %f, acc.: %.2f%%, op_acc: %.2f%%] [G loss: %f]" % (
             epoch, d_loss[0], 100 * d_loss[3], 100 * d_loss[4], g_loss[0]))
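
The preprocessing step in train() maps pixel values from [0, 255] to [-1, 1], which matches the range of the generator's tanh output. A small NumPy check of this mapping, together with the inverse that the sampling code in the complete listing applies before plotting (the four pixel values are chosen only for illustration):

```python
import numpy as np

pixels = np.array([0, 127, 128, 255], dtype=np.uint8)

# Same preprocessing as in train(): map [0, 255] to [-1, 1], the range of tanh.
scaled = (pixels.astype(np.float32) - 127.5) / 127.5

# Inverse used before plotting: map [-1, 1] back to [0, 1].
restored = 0.5 * scaled + 0.5

print(scaled.min(), scaled.max())      # -1.0 1.0
print(restored.min(), restored.max())  # 0.0 1.0
```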

2.4 model call

Load the weights of the generator. Given a random vector and a specific label, the generator can be called to generate (28,28,1) pictures for the purpose of data augmentation.

if __name__ == '__main__':
    acgan = ACGAN()
    weight_path = "./weights/gen_epoch20000.h5"
    acgan.test(Label =np.array([[3]]),weight_path = weight_path)
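
The complete code in this post does not define the test method called here. A minimal sketch of what such a helper might do is given below, under stated assumptions: `generate_digit` and `_StubGenerator` are hypothetical names, and the stub only imitates the predict() interface so that the example runs without trained weights. With the real model you would pass `acgan.generator` after loading the saved weights into it with its `load_weights` method.

```python
import numpy as np

def generate_digit(generator, digit, latent_dim=100, rng=None):
    """Return one generated picture for `digit`, rescaled from [-1, 1] to [0, 1]."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0, 1, (1, latent_dim))   # one latent vector
    label = np.array([[digit]])                 # shape (1, 1), as the generator expects
    img = generator.predict([noise, label])     # (1, 28, 28, 1), values in [-1, 1]
    return 0.5 * img[0] + 0.5

class _StubGenerator:
    """Hypothetical stand-in with the same predict() interface as the Keras generator."""
    def predict(self, inputs):
        noise, label = inputs
        # Constant picture in the tanh range; a trained generator would return a digit.
        return np.tanh(np.zeros((len(noise), 28, 28, 1)) + noise.mean())

picture = generate_digit(_StubGenerator(), digit=3, rng=np.random.default_rng(0))
print(picture.shape)  # (28, 28, 1)
```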

III Complete code

from __future__ import print_function, division
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, multiply, Dropout
from tensorflow.keras.layers import BatchNormalization, Activation, Embedding, ZeroPadding2D, GlobalAveragePooling2D,MaxPooling2D
from tensorflow.keras.layers import LeakyReLU, UpSampling2D, Conv2D
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam

import matplotlib.pyplot as plt
import os
import numpy as np

class ACGAN():
    def __init__(self):
        # Size of the input picture
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        # Ten digits, so ten classes
        self.num_classes = 10
        # Dimension of the latent noise vector fed to the generator
        self.latent_dim = 100
        # Adam optimizer
        optimizer = Adam(learning_rate=0.0002, beta_1=0.5)
        # Discriminant model
        losses = ['binary_crossentropy', 'sparse_categorical_crossentropy']
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss=losses,
                                   optimizer=optimizer,
                                   metrics=['accuracy'])
        # Generator
        self.generator = self.build_generator()
        # The combined model stacks the generator and the discriminator
        # and is used to train the generator; the discriminator is
        # frozen (trainable = False) inside it
        noise = Input(shape=(self.latent_dim,))  # [none, 100]
        label = Input(shape=(1,))                # [none, 1]
        img = self.generator([noise, label])     # [none, 28, 28, 1]

        self.discriminator.trainable = False

        valid, target_labels = self.discriminator(img)  # valid [none, 1]; target_labels [none, 10]

        # noise + label -> validity + class prediction
        self.combined = Model([noise, label], [valid, target_labels])
        self.combined.compile(loss=losses,
                              optimizer=optimizer)

    def build_generator(self):

        model = Sequential(name='generator')

        # Fully connected layer from the latent_dim-sized input to 32 * 7 * 7 units
        model.add(Dense(32 * 7 * 7, activation="relu", input_dim=self.latent_dim))
        # Reshape into a 7 * 7 * 32 feature map
        model.add(Reshape((7, 7, 32)))

        # The output is 7 * 7 * 64
        model.add(Conv2D(filters=64, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        # Up sampling
        # 7*7*64->14*14*128
        model.add(UpSampling2D())
        model.add(Conv2D(filters=128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))

        # 14*14*128->28*28*64
        model.add(UpSampling2D())
        model.add(Conv2D(filters=64, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))

        # 28*28*64->28*28*1
        model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
        model.add(Activation("tanh"))

        model.summary()

        noise = Input(shape=(self.latent_dim,))  # (none,100)
        label = Input(shape=(1,), dtype='int32')  # (none,1)
        # Converts a positive integer (index value) to a dense vector of fixed size.
        label_embedding = Flatten()(Embedding(self.num_classes, self.latent_dim)(label))

        model_input = multiply([noise, label_embedding])  # Desired number * randomly generated noise
        img = model(model_input)

        return Model([noise, label], img)

    def build_discriminator(self):
        model = Sequential(name='discriminator')

        # 28*28*1->14*14*16
        model.add(Conv2D(filters=16, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))

        # 14*14*16->7*7*32
        model.add(Conv2D(filters=32, kernel_size=3, strides=2, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(BatchNormalization(momentum=0.8))
        # 7*7*32->8*8*32 (zero padding)->4*4*64
        model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))  # add one row and one column: 7*7->8*8
        model.add(Conv2D(filters=64, kernel_size=3, strides=2, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(BatchNormalization(momentum=0.8))
        # 4*4*64->4*4*128
        model.add(Conv2D(filters=128, kernel_size=3, strides=1, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(GlobalAveragePooling2D())

        # Input picture
        img = Input(shape=self.img_shape)  # (none,28,28,1)
        features = model(img)  # (none,128)

        validity = Dense(1, activation="sigmoid")(features)  # (none,1)
        label = Dense(self.num_classes, activation="softmax")(features)  # (none,10)

        return Model(img, [validity, label])

    def train(self, epochs, batch_size=128, sample_interval=50):
        # Load data
        (X_train, y_train), (_,  _) = mnist.load_data()

        # Data preprocessing and normalization
        X_train = (X_train.astype(np.float32) - 127.5) / 127.5
        X_train = np.expand_dims(X_train, axis=3)
        Y_train = y_train.reshape(-1, 1)

        valid = np.ones((batch_size, 1))  # (batch_size, 1) targets for real pictures
        fake = np.zeros((batch_size, 1))  # (batch_size, 1) targets for fake pictures

        for epoch in range(epochs):
            # Random selection of real pictures
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            img, labels = X_train[idx], Y_train[idx]
            # Use the generator to produce fake pictures
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))  # (batch_size, 100)
            sampled_label = np.random.randint(0, 10, (batch_size, 1))  # (batch_size, 1) labels
            gen_imgs = self.generator.predict([noise, sampled_label])  # (batch_size, 28, 28, 1) generated pictures

            # Train the discriminator; each call returns 5 floats:
            # total loss, 2 per-output losses, 2 per-output accuracies
            d_loss_real = self.discriminator.train_on_batch(img, [valid, labels])
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, [fake, sampled_label])
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # Train the generator through the combined model
            g_loss = self.combined.train_on_batch([noise, sampled_label], [valid, sampled_label])
            # returns 3 floats: total loss and 2 per-output losses

            print("%d [D loss: %f, acc.: %.2f%%, op_acc: %.2f%%] [G loss: %f]" % (
                epoch, d_loss[0], 100 * d_loss[3], 100 * d_loss[4], g_loss[0]))

            if epoch % sample_interval == 0:
                self.sample_image(epoch)

    def sample_image(self, epoch):
        r, c = 2, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))  # one noise vector per digit
        sampled_labels = np.arange(0, 10).reshape(-1, 1)

        gen_imgs = self.generator.predict([noise, sampled_labels])
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
                axs[i, j].set_title("Digit: %d" % sampled_labels[cnt, 0])
                axs[i, j].axis('off')
                cnt += 1
        fig.savefig("images/%d.png" % epoch)
        plt.close()

    def save_model(self):

        def save(model, model_name):
            model_path = "saved_model/%s.json" % model_name
            weights_path = "saved_model/%s_weights.hdf5" % model_name
            options = {"file_arch": model_path,
                       "file_weight": weights_path}
            json_string = model.to_json()
            with open(options['file_arch'], 'w') as f:
                f.write(json_string)
            model.save_weights(options['file_weight'])

        save(self.generator, "generator")
        save(self.discriminator, "discriminator")


if __name__ == '__main__':
    if not os.path.exists("./images"):
        os.makedirs("./images")
    acgan = ACGAN()
    acgan.train(epochs=30000, batch_size=256, sample_interval=50)
