Learning notes on HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification

Posted by legohead6 on Mon, 04 Oct 2021 05:08:49 +0200

Basic information of the paper

  • Title: HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification
  • Authors: Swalpa Kumar Roy, Student Member, IEEE; Gopal Krishna; Shiv Ram Dubey, Member, IEEE; and Bidyut B. Chaudhuri, Life Fellow, IEEE
  • Time: 2019
  • Paper address: arXiv:1902.06701v3 [cs.CV] 3 Jul 2019
  • code: https://github.com/OUCTheoryGroup/colab_demo/blob/master/202003_models/HybridSN_GRSL2020.ipynb

Research background

Hyperspectral image (HSI) classification is widely used in the analysis of remote sensing images. A hyperspectral image contains images of many different spectral bands. Convolutional neural networks (CNNs) are a common deep-learning method for visual data, and CNN-based HSI classification appears in many recent papers. Most of these methods rely on 2-D CNNs, yet HSI classification performance depends strongly on both spatial and spectral information; few methods use 3-D CNNs because of their higher computational complexity. This paper proposes a hybrid spectral convolutional neural network (HybridSN) for HSI classification. HybridSN is essentially a spectral-spatial 3-D CNN followed by a spatial 2-D CNN. The 3-D CNN learns a joint spatial-spectral feature representation from the stack of spectral bands, and the 2-D CNN on top of it learns a more abstract hierarchical spatial representation. In addition, the hybrid design reduces model complexity compared with using a 3-D CNN alone.

Questions after reading the summary

  • What is hyperspectral?
  • Model structure of HybridSN

Answers to the above questions after reading the article

Hyperspectral Fundamentals

  • Monochromatic light: light of a single wavelength (or frequency) that cannot produce dispersion.
  • Polychromatic light: light composed of several monochromatic lights.
  • Dispersion: the phenomenon in which polychromatic light is decomposed into monochromatic components, forming a spectrum.
  • Spectrum (optical spectrum): the pattern formed when polychromatic light is split by a dispersion system (such as a prism or grating) into monochromatic components arranged in order of wavelength (or frequency).
  • Grating: an optical device consisting of a large number of parallel slits of equal width and equal spacing.
  • Hyperspectral image: an image finely sampled along the spectral dimension; instead of the traditional grayscale or RGB channels, it has N channels in the spectral dimension. For example, the 400 nm-1000 nm range can be divided into 300 channels. A hyperspectral instrument therefore produces a data cube: image information extended along the spectral dimension. From it we can read the full spectrum of each pixel, as well as the image of any spectral band.
  • Imaging principle of hyperspectral imaging: one-dimensional spatial information passes through the lens and slit, and light of different wavelengths is dispersed to different degrees. Each point of the one-dimensional image is diffracted by the grating into a spectral band that falls on the detector; the position and intensity of each pixel on the detector encode wavelength and intensity.
  • HSI (color model): a digital image model (not to be confused with the hyperspectral image abbreviation) that reflects how the human visual system perceives color, using three basic quantities: hue (H), saturation (S) and intensity (I).
  • Hue (H): related to the frequency of the light wave; it represents the sensation of different colors, and can also denote a range of colors;
  • Saturation (S): the purity of the color. A pure spectral color is fully saturated; adding white light dilutes the saturation. The higher the saturation, the more vivid the color looks.
  • Intensity (I): corresponds to imaging brightness and image gray level; it is the brightness of the color.
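To make the HSI color model above concrete, here is a minimal sketch of the standard RGB-to-HSI conversion (the function name rgb_to_hsi is my own, not from the paper or its code):

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB (each in 0..1) to the HSI color model.

    H is the hue angle in radians, S the saturation (0..1),
    I the intensity (mean of the three channels).
    """
    i = (r + g + b) / 3.0
    # Saturation: how far the color is from gray (S = 0 for pure gray)
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    # Hue: angle of the color vector in the chromaticity plane
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = 0.0 if den == 0 else math.acos(max(-1.0, min(1.0, num / den)))
    if b > g:  # hue angles past pi run counter-clockwise
        h = 2 * math.pi - h
    return h, s, i

# Pure red is fully saturated with hue 0; gray has zero saturation
print(rgb_to_hsi(1.0, 0.0, 0.0))
print(rgb_to_hsi(0.5, 0.5, 0.5))
```

Pure red (1, 0, 0) maps to hue 0 and saturation 1, while any gray maps to saturation 0, matching the bullet-point definitions above.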


As described above, it is difficult for a 2-D CNN to fully exploit the spectral information of hyperspectral data, while 3-D CNNs are rarely used because of their heavy computation.
This paper combines the two: a 3-D CNN extracts joint spatial-spectral features, and a 2-D CNN then learns a more abstract hierarchical spatial representation.

The network structure is shown in the figure above (3 three-dimensional convolutions, 1 two-dimensional convolution, and 3 fully connected layers):

  • In the three-dimensional convolutions, the kernel dimensions are 8 × 3 × 3 × 7 × 1, 16 × 3 × 3 × 5 × 8, and 32 × 3 × 3 × 3 × 16 (e.g., 16 three-dimensional kernels of size 3 × 3 × 5 in the second layer)
  • In the two-dimensional convolution, the kernel dimension is 64 × 3 × 3 × 576 (576 is the number of two-dimensional input feature maps)

Experimental verification

Dataset description

The experiments use three open-source hyperspectral image datasets: Indian Pines (IP), University of Pavia (UP) and Salinas Scene (SA).

IP image spatial dimensions: 145 × 145, wavelength range 400-2500 nm, 224 spectral bands
UP image spatial dimensions: 610 × 340, wavelength range 430-860 nm, 103 spectral bands
SA image spatial dimensions: 512 × 217, wavelength range 360-2500 nm, 224 spectral bands

Classification results

Overall Accuracy (OA), Average Accuracy (AA) and the Kappa coefficient (Kappa) are used to judge HSI classification performance. OA is the fraction of correctly classified samples among all test samples; AA is the mean of the per-class classification accuracies; Kappa is a statistical measure of the agreement between the classification map and the ground-truth map. The results of the proposed HybridSN model are compared with widely used supervised methods such as SVM [33], 2D-CNN [34], 3D-CNN [35], M3D-CNN [36] and SSRN [27]. The data were randomly split into 30% for training and 70% for testing, and the publicly available code of the comparison methods was used to compute their results.
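The three metrics can be illustrated with a toy confusion matrix (the numbers below are invented, purely for illustration):

```python
import numpy as np

# Toy 2-class confusion matrix (rows: ground truth, columns: prediction)
cm = np.array([[8, 2],
               [1, 9]])
total = cm.sum()

# OA: fraction of all test samples classified correctly
oa = np.trace(cm) / total

# AA: mean of the per-class accuracies (diagonal over each row sum)
aa = np.mean(np.diag(cm) / cm.sum(axis=1))

# Kappa: agreement beyond what chance agreement (pe) would produce
pe = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / total ** 2
kappa = (oa - pe) / (1 - pe)

print(oa, aa, kappa)  # 0.85 0.85 0.7
```

Here 17 of 20 samples are correct (OA = 0.85), the per-class accuracies are 0.8 and 0.9 (AA = 0.85), and chance agreement pe = 0.5 gives Kappa = 0.7.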

Table 2 shows that the hybrid model outperforms all comparison methods on every dataset while maintaining the smallest standard deviation.
Figure 3 shows the confusion matrices of the hybrid network's HSI classification on the IP, UP and SA datasets.
The accuracy and loss convergence over 100 epochs on the training and validation sets are shown in Figure 4; the model converges in about 50 epochs, showing that the authors' method converges quickly.
The computational efficiency of the hybrid model, in terms of training and testing times, is shown in Table 3.

Code analysis

Three-dimensional convolution part:
conv1: (1, 30, 25, 25), 8 convolution kernels of 7x3x3 ==> (8, 24, 23, 23)
conv2: (8, 24, 23, 23), 16 convolution kernels of 5x3x3 ==> (16, 20, 21, 21)
conv3: (16, 20, 21, 21), 32 convolution kernels of 3x3x3 ==> (32, 18, 19, 19)
For the subsequent two-dimensional convolution, the 32 and 18 dimensions are merged by a reshape, giving (576, 19, 19).
Two-dimensional convolution part:
(576, 19, 19) with 64 convolution kernels of 3x3 gives (64, 17, 17).
Next is a flatten operation, producing a vector of 18496 dimensions,
followed by fully connected layers of 256 and 128 nodes, both using Dropout with a ratio of 0.4.
The final output has 16 nodes, the number of classification categories.
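The shape bookkeeping above can be verified with simple arithmetic: with no padding and stride 1, each convolution shrinks a dimension by (kernel size − 1). A small sketch:

```python
def conv_out(n, k):
    # 'valid' convolution with stride 1: output size = n - k + 1
    return n - k + 1

depth, h, w = 30, 25, 25                   # input cube: 30 PCA bands, 25x25 patch
for kd, ks in [(7, 3), (5, 3), (3, 3)]:    # depth/spatial sizes of the three 3-D kernels
    depth, h, w = conv_out(depth, kd), conv_out(h, ks), conv_out(w, ks)
print(depth, h, w)                         # 18 19 19

channels_2d = 32 * depth                   # reshape (32, 18, 19, 19) -> (576, 19, 19)
h, w = conv_out(h, 3), conv_out(w, 3)      # 64 kernels of 3x3
flat = 64 * h * w
print(channels_2d, flat)                   # 576 18496
```

This reproduces the 576 input channels of the 2-D convolution and the 18496-dimensional flatten vector fed to the first fully connected layer.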

! wget http://www.ehu.eus/ccwintco/uploads/6/67/Indian_pines_corrected.mat
! wget http://www.ehu.eus/ccwintco/uploads/c/c4/Indian_pines_gt.mat
! pip install spectral
#Get the data and import it into the basic library
import numpy as np
import matplotlib.pyplot as plt
import scipy.io as sio
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report, cohen_kappa_score
import spectral
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

Define the HybridSN class:

class_num = 16

class HybridSN(nn.Module):
  def __init__(self):
    super(HybridSN, self).__init__()
    self.conv3d_1 = nn.Sequential(
        nn.Conv3d(1, 8, kernel_size=(7, 3, 3), stride=1, padding=0),
        nn.ReLU(inplace=True),
    )
    self.conv3d_2 = nn.Sequential(
        nn.Conv3d(8, 16, kernel_size=(5, 3, 3), stride=1, padding=0),
        nn.ReLU(inplace=True),
    )
    self.conv3d_3 = nn.Sequential(
        nn.Conv3d(16, 32, kernel_size=(3, 3, 3), stride=1, padding=0),
        nn.ReLU(inplace=True),
    )
    self.conv2d_4 = nn.Sequential(
        nn.Conv2d(576, 64, kernel_size=(3, 3), stride=1, padding=0),
        nn.ReLU(inplace=True),
    )
    self.fc1 = nn.Linear(18496, 256)
    self.fc2 = nn.Linear(256, 128)
    self.fc3 = nn.Linear(128, class_num)
    self.dropout = nn.Dropout(p=0.4)

  def forward(self,x):
    out = self.conv3d_1(x)
    out = self.conv3d_2(out)
    out = self.conv3d_3(out)
    out = self.conv2d_4(out.reshape(out.shape[0],-1,19,19))
    out = out.reshape(out.shape[0],-1)
    out = F.relu(self.dropout(self.fc1(out)))
    out = F.relu(self.dropout(self.fc2(out)))
    out = self.fc3(out)
    return out
# Random input to test whether the network structure is connected
# x = torch.randn(1, 1, 30, 25, 25)
# net = HybridSN()
# y = net(x)
# print(y.shape)

Create dataset
First, PCA dimensionality reduction is applied to the hyperspectral data; then the data are arranged into patch cubes in an easily processed format; finally 10% of the data are randomly selected as the training set and the rest as the test set.

First, define the basic function:

# Applying PCA transform to hyperspectral data X
def applyPCA(X, numComponents):
    newX = np.reshape(X, (-1, X.shape[2]))
    pca = PCA(n_components=numComponents, whiten=True)
    newX = pca.fit_transform(newX)
    newX = np.reshape(newX, (X.shape[0], X.shape[1], numComponents))
    return newX

# When a patch is extracted around a single pixel, the edge pixels cannot be obtained. Therefore, this part of pixels is padded
def padWithZeros(X, margin=2):
    newX = np.zeros((X.shape[0] + 2 * margin, X.shape[1] + 2* margin, X.shape[2]))
    x_offset = margin
    y_offset = margin
    newX[x_offset:X.shape[0] + x_offset, y_offset:X.shape[1] + y_offset, :] = X
    return newX
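As a quick sanity check, padWithZeros has the same effect as numpy's built-in np.pad applied to the two spatial axes only:

```python
import numpy as np

X = np.arange(5 * 5 * 3, dtype=float).reshape(5, 5, 3)
margin = 2

# Same effect as padWithZeros above: zero-pad only the two spatial axes
padded = np.pad(X, ((margin, margin), (margin, margin), (0, 0)))

print(padded.shape)  # (9, 9, 3)
# The original cube sits untouched in the center of the padded one
assert np.array_equal(padded[margin:margin + 5, margin:margin + 5, :], X)
```

With margin = 2 a 5 × 5 × 3 cube grows to 9 × 9 × 3, which is exactly what the patch extraction below relies on for edge pixels.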

# The patch is extracted around each pixel and then created into a format consistent with keras processing
def createImageCubes(X, y, windowSize=5, removeZeroLabels = True):
    # padding X
    margin = int((windowSize - 1) / 2)
    zeroPaddedX = padWithZeros(X, margin=margin)
    # split patches
    patchesData = np.zeros((X.shape[0] * X.shape[1], windowSize, windowSize, X.shape[2]))
    patchesLabels = np.zeros((X.shape[0] * X.shape[1]))
    patchIndex = 0
    for r in range(margin, zeroPaddedX.shape[0] - margin):
        for c in range(margin, zeroPaddedX.shape[1] - margin):
            patch = zeroPaddedX[r - margin:r + margin + 1, c - margin:c + margin + 1]   
            patchesData[patchIndex, :, :, :] = patch
            patchesLabels[patchIndex] = y[r-margin, c-margin]
            patchIndex = patchIndex + 1
    if removeZeroLabels:
        patchesData = patchesData[patchesLabels>0,:,:,:]
        patchesLabels = patchesLabels[patchesLabels>0]
        patchesLabels -= 1
    return patchesData, patchesLabels
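A miniature version of the patch-extraction logic on an invented 4 × 4 toy image (not one of the real datasets) shows what createImageCubes produces: one window-sized cube per labeled pixel, with labels shifted to start at 0:

```python
import numpy as np

# Toy image with 2 bands and a label map containing unlabeled (zero) pixels
X = np.random.rand(4, 4, 2)
y = np.array([[0, 1, 1, 0],
              [2, 2, 0, 0],
              [0, 1, 2, 2],
              [1, 0, 0, 1]])
window = 3
margin = window // 2
Xp = np.pad(X, ((margin, margin), (margin, margin), (0, 0)))

patches, labels = [], []
for r in range(margin, Xp.shape[0] - margin):
    for c in range(margin, Xp.shape[1] - margin):
        if y[r - margin, c - margin] > 0:   # drop unlabeled pixels
            patches.append(Xp[r - margin:r + margin + 1, c - margin:c + margin + 1])
            labels.append(y[r - margin, c - margin] - 1)  # labels become 0-based
patches, labels = np.asarray(patches), np.asarray(labels)

print(patches.shape, labels.shape)  # (9, 3, 3, 2) (9,)
```

The 9 nonzero pixels of the toy label map each yield one 3 × 3 × 2 patch, mirroring what the helper above does for a 25 × 25 window on the real data.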

def splitTrainTestSet(X, y, testRatio, randomState=345):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=testRatio, random_state=randomState, stratify=y)
    return X_train, X_test, y_train, y_test

Read and create the dataset below:

# Feature category
class_num = 16
X = sio.loadmat('Indian_pines_corrected.mat')['indian_pines_corrected']
y = sio.loadmat('Indian_pines_gt.mat')['indian_pines_gt']

# Proportion of samples used for testing
test_ratio = 0.90
# The size of the extracted patch around each pixel
patch_size = 25
# PCA dimension reduction is used to obtain the number of principal components
pca_components = 30

print('Hyperspectral data shape: ', X.shape)
print('Label shape: ', y.shape)

print('\n... ... PCA tranformation ... ...')
X_pca = applyPCA(X, numComponents=pca_components)
print('Data shape after PCA: ', X_pca.shape)

print('\n... ... create data cubes ... ...')
X_pca, y = createImageCubes(X_pca, y, windowSize=patch_size)
print('Data cube X shape: ', X_pca.shape)
print('Data cube y shape: ', y.shape)

print('\n... ... create train & test data ... ...')
Xtrain, Xtest, ytrain, ytest = splitTrainTestSet(X_pca, y, test_ratio)
print('Xtrain shape: ', Xtrain.shape)
print('Xtest  shape: ', Xtest.shape)

# Change the shape of Xtrain and Xtest to the Keras-style layout with a trailing channel dimension
Xtrain = Xtrain.reshape(-1, patch_size, patch_size, pca_components, 1)
Xtest  = Xtest.reshape(-1, patch_size, patch_size, pca_components, 1)
print('before transpose: Xtrain shape: ', Xtrain.shape) 
print('before transpose: Xtest  shape: ', Xtest.shape) 

# To fit the pytorch layout, the data must be transposed
Xtrain = Xtrain.transpose(0, 4, 3, 1, 2)
Xtest  = Xtest.transpose(0, 4, 3, 1, 2)
print('after transpose: Xtrain shape: ', Xtrain.shape) 
print('after transpose: Xtest  shape: ', Xtest.shape) 

""" Training dataset"""
class TrainDS(torch.utils.data.Dataset): 
    def __init__(self):
        self.len = Xtrain.shape[0]
        self.x_data = torch.FloatTensor(Xtrain)
        self.y_data = torch.LongTensor(ytrain)        
    def __getitem__(self, index):
        # Return data and corresponding labels according to the index
        return self.x_data[index], self.y_data[index]
    def __len__(self): 
        # Returns the number of file data
        return self.len

""" Testing dataset"""
class TestDS(torch.utils.data.Dataset): 
    def __init__(self):
        self.len = Xtest.shape[0]
        self.x_data = torch.FloatTensor(Xtest)
        self.y_data = torch.LongTensor(ytest)
    def __getitem__(self, index):
        # Return data and corresponding labels according to the index
        return self.x_data[index], self.y_data[index]
    def __len__(self): 
        # Returns the number of file data
        return self.len

# Create trainloader and testloader
trainset = TrainDS()
testset  = TestDS()
train_loader = torch.utils.data.DataLoader(dataset=trainset, batch_size=128, shuffle=True, num_workers=2)
test_loader  = torch.utils.data.DataLoader(dataset=testset,  batch_size=128, shuffle=False, num_workers=2)

Start training
Training runs for 50 epochs; a GPU is used if one is available, otherwise the CPU.

# Using GPU training, you can set it in the menu "code execution tool" - > "change runtime type"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Put the network on the GPU
net = HybridSN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Start training
total_loss = 0
for epoch in range(50):
    for i, (inputs, labels) in enumerate(train_loader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # Optimizer gradient zeroing
        optimizer.zero_grad()
        # Forward propagation + back propagation + optimization
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print('[Epoch: %d]   [loss avg: %.4f]   [current loss: %.4f]' %(epoch + 1, total_loss/(epoch+1), loss.item()))

print('Finished Training')

Model test

count = 0
# Model test (eval mode disables Dropout)
net.eval()
for inputs, _ in test_loader:
    inputs = inputs.to(device)
    outputs = net(inputs)
    outputs = np.argmax(outputs.detach().cpu().numpy(), axis=1)
    if count == 0:
        y_pred_test = outputs
        count = 1
    else:
        y_pred_test = np.concatenate((y_pred_test, outputs))

# Generate classification Report
classification = classification_report(ytest, y_pred_test, digits=4)

Helper functions
The following helper functions compute the per-class accuracies and assemble the full report, for reference

from operator import truediv

def AA_andEachClassAccuracy(confusion_matrix):
    counter = confusion_matrix.shape[0]
    list_diag = np.diag(confusion_matrix)
    list_raw_sum = np.sum(confusion_matrix, axis=1)
    each_acc = np.nan_to_num(truediv(list_diag, list_raw_sum))
    average_acc = np.mean(each_acc)
    return each_acc, average_acc

def reports(test_loader, y_test, name):
    count = 0
    # Model test
    for inputs, _ in test_loader:
        inputs = inputs.to(device)
        outputs = net(inputs)
        outputs = np.argmax(outputs.detach().cpu().numpy(), axis=1)
        if count == 0:
            y_pred = outputs
            count = 1
        else:
            y_pred = np.concatenate((y_pred, outputs))

    if name == 'IP':
        target_names = ['Alfalfa', 'Corn-notill', 'Corn-mintill', 'Corn'
                        ,'Grass-pasture', 'Grass-trees', 'Grass-pasture-mowed',
                        'Hay-windrowed', 'Oats', 'Soybean-notill', 'Soybean-mintill',
                        'Soybean-clean', 'Wheat', 'Woods', 'Buildings-Grass-Trees-Drives',
                        'Stone-Steel-Towers']
    elif name == 'SA':
        target_names = ['Brocoli_green_weeds_1','Brocoli_green_weeds_2','Fallow','Fallow_rough_plow','Fallow_smooth',
                        'Stubble','Celery','Grapes_untrained','Soil_vinyard_develop','Corn_senesced_green_weeds',
                        'Lettuce_romaine_4wk','Lettuce_romaine_5wk','Lettuce_romaine_6wk','Lettuce_romaine_7wk',
                        'Vinyard_untrained','Vinyard_vertical_trellis']
    elif name == 'PU':
        target_names = ['Asphalt','Meadows','Gravel','Trees', 'Painted metal sheets','Bare Soil','Bitumen',
                        'Self-Blocking Bricks','Shadows']
    classification = classification_report(y_test, y_pred, target_names=target_names)
    oa = accuracy_score(y_test, y_pred)
    confusion = confusion_matrix(y_test, y_pred)
    each_acc, aa = AA_andEachClassAccuracy(confusion)
    kappa = cohen_kappa_score(y_test, y_pred)
    return classification, confusion, oa*100, each_acc*100, aa*100, kappa*100

Write the test results to a file:

classification, confusion, oa, each_acc, aa, kappa = reports(test_loader, ytest, 'IP')
classification = str(classification)
confusion = str(confusion)
file_name = "classification_report.txt"

with open(file_name, 'w') as x_file:
    x_file.write('{} Kappa accuracy (%)\n'.format(kappa))
    x_file.write('{} Overall accuracy (%)\n'.format(oa))
    x_file.write('{} Average accuracy (%)\n'.format(aa))

The following code is used to display the classification results:

# load the original image
X = sio.loadmat('Indian_pines_corrected.mat')['indian_pines_corrected']
y = sio.loadmat('Indian_pines_gt.mat')['indian_pines_gt']

height = y.shape[0]
width = y.shape[1]

X = applyPCA(X, numComponents= pca_components)
X = padWithZeros(X, patch_size//2)

# Pixel by pixel prediction category
outputs = np.zeros((height,width))
for i in range(height):
    for j in range(width):
        if int(y[i,j]) == 0:
            continue
        else :
            image_patch = X[i:i+patch_size, j:j+patch_size, :]
            image_patch = image_patch.reshape(1,image_patch.shape[0],image_patch.shape[1], image_patch.shape[2], 1)
            X_test_image = torch.FloatTensor(image_patch.transpose(0, 4, 3, 1, 2)).to(device)                                   
            prediction = net(X_test_image)
            prediction = np.argmax(prediction.detach().cpu().numpy(), axis=1)
            outputs[i][j] = prediction+1
    if i % 20 == 0:
        print('... ... row ', i, ' handling ... ...')

Remaining questions

Topics: neural networks Pytorch Deep Learning