Principal Component Analysis

Posted by marsooka on Sat, 23 Oct 2021 03:25:11 +0200

Introduction
In this experiment, we will implement the principal component analysis (PCA) method and use it to obtain low-dimensional representations of face images.

The data sets used in this experiment include:

ex4data1.mat - 2D simulation dataset
ex4data2.mat - LFW face dataset
The scoring criteria are as follows:

Key point 1: implement the PCA algorithm ------------ (20 points)
Key point 2: reduce the dimensionality of the simulation data ------------ (20 points)
Key point 3: reconstruct the simulation data ------------ (20 points)
Key point 4: reduce the dimensionality of the face data ------------ (20 points)
Key point 5: reconstruct the face data ------------ (20 points)

# Import the required library files
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sb
from scipy.io import loadmat
import scipy

%matplotlib inline

1 Implement principal component analysis

In this part of the experiment, the principal component analysis algorithm will be implemented. The algorithm steps are as follows:

Step 1: standardize all samples so that each feature has zero mean and unit standard deviation

Step 2: compute the covariance matrix of the samples: $\mathbf{X}\mathbf{X}^T$

Step 3: perform an eigenvalue decomposition (or singular value decomposition) of the covariance matrix $\mathbf{X}\mathbf{X}^T$

Step 4: take the eigenvectors $\mathbf{w}_1,\cdots,\mathbf{w}_{d}$ corresponding to the $d$ largest eigenvalues

Output: the projection matrix $\mathbf{W}=[\mathbf{w}_1,\cdots,\mathbf{w}_{d}]$
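
With this projection matrix, reducing the data to $d$ dimensions is a single matrix product (a standard PCA identity, shown here for reference):

$$\mathbf{Z} = \mathbf{W}^{T}\mathbf{X} \in \mathbb{R}^{d \times m}$$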

# ======================Fill in the code here=======================
def PCA(X, d):
    """   
    input
    ----------
    X : Matrix of size (D, m) whose i-th column is the i-th sample,
        where D is the sample dimension and m is the number of samples.
    d : Target dimension after reduction.
    
    output
    -------
    W : Projection matrix of size (D, d) whose i-th column is the
        eigenvector of the covariance matrix associated with the i-th
        largest eigenvalue.
    """
    # Step 1: standardize each feature (row) to zero mean and unit standard deviation
    X = (X - np.mean(X, axis=1, keepdims=True)) / np.std(X, axis=1, ddof=1, keepdims=True)
    # Steps 2-3: covariance matrix (up to a constant factor, which does not
    # change the eigenvectors) and its singular value decomposition
    U, S, V = np.linalg.svd(np.matmul(X, X.T))
    # Step 4: keep the d leading eigenvectors as the projection matrix
    W = U[:, :d]
    return W 
# ============================================================= 

If the above PCA function is complete, the following code can be used to test it. If the result is [-0.707107, -0.707107], the check passes.

# Load the dataset into the variable X 
data = loadmat(os.path.join('ex4data1.mat'))
X = data['X']

#  Run PCA
W= PCA(X.T,2)


print('First eigenvector: W[:, 0] = [{:.6f}, {:.6f}]'.format(W[0, 0], W[1, 0]))

First eigenvector: W[:, 0] = [-0.707107, -0.707107]

2 Apply PCA to the simulation data

In this part of the experiment, the PCA algorithm implemented above is applied to dataset 1. The samples in this dataset are two-dimensional, so the data before and after dimensionality reduction can be compared visually.

#  Visualizing raw data sets
plt.plot(X[:, 0], X[:, 1], 'bo', ms=10, mec='k', mew=1)
plt.axis([0.5, 6.5, 2, 8])
plt.gca().set_aspect('equal')
plt.grid(False)

#  Visualize the standardized dataset
mu = np.mean(X,axis=0)
sigma = np.std(X,axis=0, ddof=1)
X_norm = (X - mu)/sigma
plt.plot(X_norm[:, 0], X_norm[:, 1], 'bo', ms=10, mec='k', mew=1)
plt.axis([-3, 2.75, -3, 2.75])
plt.gca().set_aspect('equal')
plt.grid(False)
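
As an optional visual check (not part of the graded steps), the two principal directions returned by PCA can be drawn on top of the standardized data; a minimal sketch, reusing X_norm from the cell above:

#  Hypothetical extra plot: principal directions over the standardized data
W = PCA(X.T, 2)
fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(X_norm[:, 0], X_norm[:, 1], 'bo', ms=10, mec='k', mew=1)
for i, length in enumerate([2.0, 1.0]):   # arbitrary lengths; the first direction is drawn longer
    ax.plot([0, length * W[0, i]], [0, length * W[1, i]], '-k', lw=2)
ax.set_aspect('equal')
plt.axis([-3, 2.75, -3, 2.75])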

2.1 Dimensionality reduction using PCA

With the PCA code above, the simulation data can be reduced to one dimension.

# ======================Fill in the code here=======================
def ProjectData(X, W):
    """   
    input
    ----------
    X : Matrix of size (D, m) whose i-th column is the i-th sample,
        where D is the sample dimension and m is the number of samples.
    W : Projection matrix of size (D, d) whose i-th column is the
        i-th eigenvector of the covariance matrix.
    
    output
    -------
    Z : Matrix of size (d, m) whose i-th column is the i-th sample
        after dimensionality reduction.
    """
    # Standardize each feature (row) exactly as in PCA, then project onto the columns of W
    X = (X - np.mean(X, axis=1, keepdims=True)) / np.std(X, axis=1, ddof=1, keepdims=True)
    Z = np.matmul(W.T, X)
    return Z
# ============================================================= 

If the above function ProjectData is complete, the following code can be used to test it. If the result is 1.481274, the check passes.

#  Reduce data to one dimension 
W = PCA(X.T,1)
Z = ProjectData(X.T, W)
print('Data after dimension reduction of the first sample: {:.6f}'.format(Z[0, 0]))

Data after dimension reduction of the first sample: 1.481274
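
As a side check (not required by the exercise), the fraction of variance retained by the single component can be estimated from the eigenvalues of the standardized covariance matrix; a minimal sketch:

#  Hypothetical check: variance captured by the first principal component
X_std = (X.T - X.T.mean(axis=1, keepdims=True)) / X.T.std(axis=1, ddof=1, keepdims=True)
_, S, _ = np.linalg.svd(X_std @ X_std.T)   # eigenvalues of the (scaled) covariance matrix
print('Variance retained by 1 component: {:.1%}'.format(S[0] / S.sum()))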

2.2 Reconstruct the original data from the reduced-dimension data

Using the reduced-dimension data matrix $\mathbf{Z}$, the original data can be approximately reconstructed.
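
Because the columns of $\mathbf{W}$ are orthonormal, the reconstruction (of the standardized data) is again a single matrix product:

$$\hat{\mathbf{X}} = \mathbf{W}\mathbf{Z} = \mathbf{W}\mathbf{W}^{T}\mathbf{X}$$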

# ======================Fill in the code here=======================
def RecoverData(Z, W):
    """   
    input
    ----------
    Z : Matrix of size (d, m) whose i-th column is the i-th sample
        after dimensionality reduction.
    W : Projection matrix of size (D, d) whose i-th column is the
        i-th eigenvector of the covariance matrix.
    
    output
    -------
    X_rec : Matrix of size (D, m) whose i-th column is the i-th
        reconstructed sample.
    """
    # The columns of W are orthonormal, so the pseudo-inverse of W.T is W itself
    # and the reconstruction is a single matrix product.
    X_rec = np.matmul(W, Z)
    return X_rec
# ============================================================= 

If the above function RecoverData is complete, the following code can be used to test it. If the result is [-1.047419, -1.047419], the check passes.

#  Reconstruct original data
X_rec = RecoverData(Z, W).T

print('The first reconstructed data: X_rec[0, :] = [{:.6f}, {:.6f}]'.format(X_rec[0, 0], X_rec[0, 1]))

The first reconstructed data: X_rec[0, :] = [-1.047419, -1.047419]

mu = np.mean(X,axis=0)
sigma = np.std(X,axis=0, ddof=1)
X_norm = (X - mu)/sigma

#  Plot the standardized dataset
fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(X_norm[:, 0], X_norm[:, 1], 'bo', ms=8, mec='b', mew=0.5)
ax.set_aspect('equal')
ax.grid(False)
plt.axis([-3, 2.75, -3, 2.75])

# Draw lines connecting the projected points to the original points
ax.plot(X_rec[:, 0], X_rec[:, 1], 'ro', mec='r', mew=2, mfc='none')
for xnorm, xrec in zip(X_norm, X_rec):
    ax.plot([xnorm[0], xrec[0]], [xnorm[1], xrec[1]], '--k', lw=1)

3 Apply PCA to face reconstruction

In this part of the experiment, the implemented PCA algorithm is applied to dataset 2, a subset of the LFW face dataset.

#  Load Face dataset
data = loadmat(os.path.join( 'ex4data2.mat'))
X = data['X']
# Define a helper function to display face images
def displayData(X, example_width=None, figsize=(8, 8)):

    """   
    input
    ----------
    X : Size (m, D)Matrix of, page i Listed as No i Samples, D Is the dimension of the sample, m Is the number of samples.   
    """
    
    # Compute rows, cols
    if X.ndim == 2:
        m, n = X.shape
    elif X.ndim == 1:
        n = X.size
        m = 1
        X = X[None]  # Promote to a 2 dimensional array
    else:
        raise IndexError('Input X should be 1 or 2 dimensional.')

    example_width = example_width or int(np.round(np.sqrt(n)))
    example_height = int(n / example_width)

    # Compute number of items to display
    display_rows = int(np.floor(np.sqrt(m)))
    display_cols = int(np.ceil(m / display_rows))

    fig, ax_array = plt.subplots(display_rows, display_cols, figsize=figsize)
    fig.subplots_adjust(wspace=0.025, hspace=0.025)

    ax_array = [ax_array] if m == 1 else ax_array.ravel()

    for i, ax in enumerate(ax_array):
        ax.imshow(X[i].reshape(example_height, example_width, order='F'), cmap='gray')
        ax.axis('off')
#  Display the first 64 faces in the dataset
displayData(X[:64, :], figsize=(8, 8))

3.1 Dimensionality reduction of face images using PCA

# ======================Fill in the code here=======================
W1 = PCA(X.T, 64)           # reduce the face data to 64 principal components
print(W1[:36])              # inspect the first 36 rows of the projection matrix
Z1 = ProjectData(X.T, W1)   # 64-dimensional representation of each face



# ============================================================= 
[[-0.01425307 -0.03606596 -0.04561884 ...  0.0061877   0.02367001
   0.05076207]
 [-0.01474233 -0.03809858 -0.04756249 ...  0.00101133  0.01959809
   0.05250727]
 [-0.01501482 -0.03988747 -0.05082085 ... -0.00338655  0.00776122
   0.05543446]
 ...
 [-0.01711169 -0.04114578 -0.04517263 ...  0.0064823   0.0409063
   0.02625426]
 [-0.01721429 -0.04300162 -0.04873793 ...  0.00549935  0.02761684
   0.03023218]
 [-0.01742302 -0.04464136 -0.05105597 ...  0.01009193  0.00885712
   0.03474425]]
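
The raw numbers above are hard to interpret; as an optional sanity check (not part of the original hand-in), the leading principal components can themselves be rendered as images ("eigenfaces") with the displayData helper defined above; a minimal sketch:

#  Hypothetical visualization: the top 36 principal components as images.
#  displayData expects one sample per row, so the columns of W1 are transposed.
displayData(W1[:, :36].T, figsize=(6, 6))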

3.2 Reconstruct the original face data from the principal components

# ======================Fill in the code here=======================
X1_rec = RecoverData(Z1, W1).T   # reconstruct from the 64-dimensional representation

displayData(X1_rec[:64, :], figsize=(8, 8))   # display the first 64 reconstructed faces
# =============================================================  
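
To get a rough, quantitative sense of how much information is lost when keeping only 64 components, one might compare the reconstruction with the standardized original data; a minimal sketch, assuming the same ddof=1 standardization used inside PCA:

#  Hypothetical check: mean squared reconstruction error on the standardized face data
X_std = (X.T - X.T.mean(axis=1, keepdims=True)) / X.T.std(axis=1, ddof=1, keepdims=True)
mse = np.mean((X_std - X1_rec.T) ** 2)
print('Mean squared reconstruction error with 64 components: {:.4f}'.format(mse))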

Topics: Python Machine Learning