Speech feature signal classification -- BP neural network -- MATLAB implementation

Posted by riginosz on Sun, 30 Jan 2022 20:53:01 +0100

Speech feature recognition is an important aspect in the field of speech recognition, which is generally solved by the principle of pattern matching. The operation process of speech recognition is:

Firstly, the speech to be recognized is transformed into an electrical signal and then input into the recognition system. After preprocessing, the speech feature signal is extracted by mathematical method. The extracted speech feature signal can be regarded as the pattern of the speech.
Then, the speech model is compared with the known reference mode to obtain the best matching reference mode as the recognition result of the speech.
The speech recognition process is shown in Figure 1.

Figure 1

stay Baidu online disk attachment data - extraction code: qwer In, four kinds of music speech characteristic signals have been extracted. Different speech signals are identified with 1, 2, 3 and 4 respectively, and the extracted signals are stored in data1 mat , data2. mat , data3. mat , data4. In the mat database file, the data content is shown in Figure 2.
Each group of data is 25 dimensions, the first dimension is category identification, and the last 24 dimensions are speech feature signals. The four kinds of speech characteristic signals are combined into a group, 1500 groups of data are randomly selected as training data and 500 groups of data are used as test data, and the training data are normalized. Set the expected output value of each group of voice signals according to the voice category identification. For example, when the identification category is 1, the expected output vector is [1 0].

Figure 2

Before BP neural network prediction, the network must be trained first. Through training, the network has the ability of associative memory and prediction. The training process of BP neural network includes the following steps.

Network initialization. Determine the number of network input layer nodes n, hidden layer nodes l and output layer nodes m according to the system input and output sequence (X,Y), and initialize the connection weights between input layer, hidden layer and output layer neurons ω i j \omega _{{\rm{ij}}} ωij, ω j k \omega _{{\rm{jk}}} ω jk, initialize hidden layer threshold a, output layer threshold b, given learning rate and neuron excitation function.
Hidden layer output calculation. According to the input vector X, the connection weight between the input layer and the hidden layer ω i j \omega _{{\rm{ij}}} ω ij, and the hidden layer threshold a, calculate the hidden layer output H.
Output layer output calculation. According to the hidden layer output H, connect the weight ω j k \omega _{{\rm{jk}}} ω jk ， and threshold b, calculate the BP neural network prediction output O
Error calculation. The network prediction error e is calculated according to the network prediction output O and the expected output Y.
Weight update. Update the network connection weight according to the network prediction error e ω i j \omega _{{\rm{ij}}} ωij, ω j k \omega _{{\rm{jk}}} ωjk
Threshold update. According to the network prediction error e, the network node thresholds A and B are updated.
Judge whether the algorithm iteration is over. If not, return to step 2.

The specific codes are as follows:

%% Clear environment variables
clc
clear

%% Training data prediction data extraction and normalization

%Download four types of voice signals
load data1 c1
load data2 c2
load data3 c3
load data4 c4
%Four characteristic signal matrices are combined into one matrix
data(1:500,:)=c1(1:500,:);
data(501:1000,:)=c2(1:500,:);
data(1001:1500,:)=c3(1:500,:);
data(1501:2000,:)=c4(1:500,:);

%Random sorting from 1 to 2000
k=rand(1,2000);
[m,n]=sort(k);  %n It is the number of 2000 randomly sorted

%Input / output data
input=data(:,2:25);  %Information of the last 24 locations of the original 2000 numbers
output1 =data(:,1);  %Information of the first position of the original 2000 numbers

%Change the output from 1D to 4D
for i=1:2000
    switch output1(i)
        case 1
            output(i,:)=[1 0 0 0];
        case 2
            output(i,:)=[0 1 0 0];
        case 3
            output(i,:)=[0 0 1 0];
        case 4
            output(i,:)=[0 0 0 1];
    end
end

%In accordance with the previously disrupted order, 1500 samples were randomly selected as training samples and 500 samples as prediction samples
input_train=input(n(1:1500),:)';
output_train=output(n(1:1500),:)';
input_test=input(n(1501:2000),:)';
output_test=output(n(1501:2000),:)';

%Input data normalization
[inputn,inputps]=mapminmax(input_train);  % Normalization principle formula: Xk=(Xk-Xmin)/(Xmax-Xmin)

%% Network structure initialization
%Network size initialization
innum=24;
midnum=25;
outnum=4;
 

%Weight initialization
w1=rands(midnum,innum);
b1=rands(midnum,1);
w2=rands(midnum,outnum);
b2=rands(outnum,1);

w2_1=w2;
w2_2=w2_1;
w1_1=w1;
w1_2=w1_1;

b1_1=b1;
b1_2=b1_1;
b2_1=b2;
b2_2=b2_1;

%Learning rate
xite=0.1;
alfa=0.01;

The BP neural network is trained with the training data, and the weight and threshold of the network are adjusted according to the network prediction error in the training process.

%% Network training
for ii=1:10
    E(ii)=0;
    for i=1:1:1500
       %% Network prediction output 
        x=inputn(:,i);
        % Hidden layer output
        for j=1:1:midnum
            I(j)=inputn(:,i)'*w1(j,:)'+b1(j);
            Iout(j)=1/(1+exp(-I(j)));
        end
        % Output layer output
        yn=w2'*Iout'+b2;
        
       %% Weight threshold correction
        %calculation error
        e=output_train(:,i)-yn;     
        E(ii)=E(ii)+sum(abs(e));
        
        %Calculate weight change rate
        dw2=e*Iout;
        db2=e';
        
        for j=1:1:midnum
            S=1/(1+exp(-I(j)));
            FI(j)=S*(1-S);
        end      
        for k=1:1:innum
            for j=1:1:midnum
                dw1(k,j)=FI(j)*x(k)*(e(1)*w2(j,1)+e(2)*w2(j,2)+e(3)*w2(j,3)+e(4)*w2(j,4));
                db1(j)=FI(j)*(e(1)*w2(j,1)+e(2)*w2(j,2)+e(3)*w2(j,3)+e(4)*w2(j,4));
            end
        end
           
        w1=w1_1+xite*dw1';
        b1=b1_1+xite*db1';
        w2=w2_1+xite*dw2';
        b2=b2_1+xite*db2';
        
        w1_2=w1_1;w1_1=w1;
        w2_2=w2_1;w2_1=w2;
        b1_2=b1_1;b1_1=b1;
        b2_2=b2_1;b2_1=b2;
    end
end

The trained BP neural network is used to classify speech characteristic signals, and the classification ability of BP neural network is analyzed according to the classification results.

%% Speech feature signal classification
inputn_test=mapminmax('apply',input_test,inputps);

for ii=1:1
    for i=1:500%1500
        %Hidden layer output
        for j=1:1:midnum
            I(j)=inputn_test(:,i)'*w1(j,:)'+b1(j);
            Iout(j)=1/(1+exp(-I(j)));
        end
        
        fore(:,i)=w2'*Iout'+b2;
    end
end

Analyze the experimental results. Draw the error diagram. Show accuracy.

%% Result analysis
%Find out what kind of data it belongs to according to the network output
for i=1:500
    output_fore(i)=find(fore(:,i)==max(fore(:,i)));
end

%BP Network prediction error
error=output_fore-output1(n(1501:2000))';



%Draw the classification diagram of predicted speech types and actual speech types
figure(1)
plot(output_fore,'r')
hold on
plot(output1(n(1501:2000))','b')
legend('Predicted speech category','Actual voice category')

%Draw the error diagram
figure(2)
plot(error)
title('BP Network classification error','fontsize',12)
xlabel('speech signal ','fontsize',12)
ylabel('Classification error','fontsize',12)

%print -dtiff -r600 1-4

k=zeros(1,4);  
%Find out the category of judgment errors
for i=1:500
    if error(i)~=0
        [b,c]=max(output_test(:,i));
        switch c
            case 1 
                k(1)=k(1)+1;
            case 2 
                k(2)=k(2)+1;
            case 3 
                k(3)=k(3)+1;
            case 4 
                k(4)=k(4)+1;
        end
    end
end

%Identify the individuals and of each category
kk=zeros(1,4);
for i=1:500
    [b,c]=max(output_test(:,i));
    switch c
        case 1
            kk(1)=kk(1)+1;
        case 2
            kk(2)=kk(2)+1;
        case 3
            kk(3)=kk(3)+1;
        case 4
            kk(4)=kk(4)+1;
    end
end

%Correct rate
rightridio=(kk-k)./kk

The final prediction effect is shown in Figure 3:

Figure 3

Seeing this content in the book, I want to write a blog to record it. If this blog is helpful to you, I hope I can praise, collect and pay attention to it. I will share high-quality blogs in the future. Thank you!

Topics: MATLAB neural networks

Programmer Think

Speech feature signal classification -- BP neural network -- MATLAB implementation

Hot Topics