Speech recognition: time-domain analysis (autocorrelation not finished)

Posted by dreado on Wed, 24 Jul 2019 11:28:57 +0200

Reference resources:
https://blog.csdn.net/qq_39516859/article/details/80166842

Speech framing and windowing

Speech is time-varying, but over a short interval its characteristics stay approximately constant. Speech is therefore said to be short-term stationary.

For analysis, the speech signal is divided into segments, and each segment is called a "frame". The frame length is usually 10-30 ms.

The distance between the start of one frame and the start of the next is called the frame shift. Consecutive frames overlap by (frame length - frame shift) samples.

The overlap is usually between 0 and 1/2 of the frame length.
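Below is a minimal sketch of splitting a signal into frames. It assumes that the frame shift is the hop between frame starts, which is how `inc` is passed to `enframe` in this post's code. It also assumes that an incomplete last frame is dropped. The toy signal and the name `frameIdx` are illustrative only.

```matlab
% Split a signal into overlapping frames, one frame per column.
% Assumption: inc is the hop between frame starts; an incomplete tail is dropped.
x = (1:10)';                 % toy signal, column vector
wlen = 4;                    % frame length in samples
inc = 2;                     % frame shift (hop) in samples
fn = floor((length(x) - wlen) / inc) + 1;   % number of complete frames
% Row m, column n holds the index of sample m of frame n.
frameIdx = bsxfun(@plus, (1:wlen)', (0:fn-1) * inc);
X = x(frameIdx);             % wlen-by-fn matrix of frames
overlap = wlen - inc;        % samples shared by consecutive frames
disp(X)
```

With wlen = 200 and inc = 100, as in the code below, consecutive frames overlap by 100 samples, which is half a frame.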

Window function

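Framing is implemented with a finite-length window that is moved along the signal. Each frame is the segment under the window, multiplied sample by sample by the window function.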
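Below is a sketch of the weighting step with a Hamming window. The window is written out from its formula $w(m)=0.54-0.46\cos\left(2\pi m/(N-1)\right)$, $m=0,\dots,N-1$, so it does not depend on a toolbox. The toy frames are hypothetical. In this post's code the window `win=hamming(wlen)` is passed to `enframe` instead.

```matlab
% Weight each frame with a Hamming window (symmetric form, as returned by hamming(N)).
wlen = 4;                                   % window length in samples
m = (0:wlen-1)';                            % sample index inside the frame
w = 0.54 - 0.46 * cos(2*pi*m/(wlen-1));     % Hamming window, column vector
X = ones(wlen, 3);                          % three toy frames of all ones
Xw = bsxfun(@times, X, w);                  % multiply every column (frame) by the window
```

The window tapers both ends of each frame toward 0.08, which reduces the discontinuities at the frame edges.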

Window length:

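The sampling period $T_s=1/f_s$, the window length $N$ (in samples) and the frequency resolution $\Delta f$ are related by
$$\Delta f=\frac{1}{NT_s}=\frac{f_s}{N}$$
A longer window therefore gives finer frequency resolution but coarser time resolution.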
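Below is a worked example of the formula. It assumes a sampling rate of 8 kHz, which is a hypothetical value; the actual rate of zj3.wav is not stated here.

```matlab
% Frequency resolution of an N-sample window: df = 1/(N*Ts) = fs/N.
fs = 8000;          % assumed sampling rate in Hz (zj3.wav's actual rate may differ)
Ts = 1/fs;          % sampling period in seconds
N = 200;            % window length in samples (same as wlen in the code below)
df = 1/(N*Ts);      % frequency resolution in Hz
winDur = N*Ts;      % window duration in seconds
fprintf('df = %.1f Hz, window = %.1f ms\n', df, 1000*winDur);
```

At this assumed rate, doubling N to 400 halves df to 20 Hz. However, each frame then lasts 50 ms, which is longer than the 10-30 ms range over which speech is treated as stationary.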
Comparison of three window functions

[Figure: comparison of three window functions]

Time-domain analysis

(All analyses use a Hamming window.)

(The speech file is zj3.wav.)

Frame length: 200 samples

Frame shift: 100 samples

Short-time energy

Let $E_n$ denote the short-time energy of the $n$-th frame $x_n(m)$ of the speech signal, where $N$ is the frame length:
$$E_n=\sum_{m=0}^{N-1}x_n^2(m)$$
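Each column of the frame matrix is one frame, so the energy of all frames can be computed with a single column-wise sum. Below is a sketch using a hypothetical 3-by-3 frame matrix.

```matlab
% Short-time energy: E_n = sum over m of x_n(m)^2, one value per frame (column).
X = [1 0 2; -1 3 0; 2 -1 1];   % toy frames of 3 samples, one frame per column
E = sum(X.^2, 1);              % row vector with the energy of every frame
```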

[Figure: short-time energy]

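Short-time average magnitude

This is another measure of the signal level within a frame. Because it sums absolute values instead of squares, it is less dominated by a few large samples than the short-time energy:
$$M_n=\sum_{m=0}^{N-1}|x_n(m)|$$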
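Below is a sketch contrasting the two measures on two hypothetical frames that have the same average magnitude. Because energy squares each sample, the frame with one large sample ends up with four times the energy.

```matlab
% Short-time average magnitude vs. short-time energy, one frame per column.
X = [1 0; 1 0; 1 0; 1 4];      % frame 1: four samples of 1; frame 2: one sample of 4
M = sum(abs(X), 1);            % average magnitude per frame
E = sum(X.^2, 1);              % energy per frame, for comparison
```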
[Figure: short-time average magnitude]

Short-time zero-crossing rate

This is the number of times the speech waveform crosses the horizontal axis (zero level) within one frame, that is, the number of sign changes between positive and negative values:
$$Z_n=\frac{1}{2}\sum_{m=0}^{N-1}\left|\operatorname{sgn}[x_n(m)]-\operatorname{sgn}[x_n(m-1)]\right|$$

$$\operatorname{sgn}[x]=\begin{cases}1, & x\geq 0\\ -1, & x<0\end{cases}$$

In MATLAB, a zero crossing between adjacent samples of a discrete signal can be detected by checking whether their product is negative:
$$x_n(m)\times x_n(m+1)<0$$
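Below is a sketch that counts the crossings of one hypothetical frame in two ways: with the sign formula and with the product test. Like the loop in the post's code, it only counts the N-1 sample pairs inside the frame. The $m=0$ term of the formula, which needs the last sample of the previous frame, is left out.

```matlab
% Zero crossings of one frame, counted two ways.
y = [0.5; -0.2; -0.1; 0.3; 0.4; -0.6];      % toy frame
% (1) Product test: adjacent samples with opposite signs give a negative product.
zProd = sum(y(1:end-1) .* y(2:end) < 0);
% (2) Sign formula inside the frame, with sgn(x) = 1 for x >= 0 and -1 otherwise.
s = 2*(y >= 0) - 1;
zSgn = 0.5 * sum(abs(s(2:end) - s(1:end-1)));
```

The two counts can differ when a sample is exactly zero. The product test does not count such a sample as a crossing, while sgn treats 0 as positive.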
[Figure: short-time zero-crossing rate]

Code:

% Calculation and display of short-time time-domain analysis parameters
clear all;
clc;
filedir=[];                      % set up path
filename='D:\matlab\music\zj3.wav';
file=[filedir filename];
[x,Fs]=audioread(file);
x=x(:,1);                        % keep the first channel in case the file is stereo

wlen=200;                        % frame length
inc=100;                         % frame shift
win=hamming(wlen);               % Hamming window
N=length(x);                     % signal length
time=(0:N-1)/Fs;                 % time axis of the signal

% These functions did not work here:
% En1=STEn(x,win,inc);           % short-time energy
% Mn1=STMn(x,win,inc);           % short-time average magnitude
% Zcr1=STZcr(x,win,inc);         % short-time zero-crossing rate
% They returned a matrix rather than a vector, i.e. not one value per frame.

% enframe (VOICEBOX) and frame2time are toolbox functions, not built into MATLAB.
X=enframe(x,win,inc)';           % framing and windowing; one column per frame
fn=size(X,2);                    % number of frames
frameTime=frame2time(fn,wlen,inc,Fs);  % time of each frame
% The formula used by frame2time still needs to be checked.

% Short-time energy
E=zeros(1,fn);
for i=1:fn
    y=X(:,i);                    % samples of frame i
    E(i)=sum(y.^2);
end

% Short-time average magnitude
M=zeros(1,fn);
for i=1:fn
    y=X(:,i);                    % samples of frame i
    M(i)=sum(abs(y));
end

% Short-time zero-crossing rate
Z=zeros(1,fn);                   % initialization
for i=1:fn
    y=X(:,i);                    % samples of frame i
    b=0;
    for m=1:wlen-1
        if y(m)*y(m+1)<0         % adjacent samples have opposite signs
            b=b+1;
        end
    end
    Z(i)=b;                      % store the count after scanning the whole frame
end

% Short-time autocorrelation: R(k+1,i) is the value at lag k for frame i
R=zeros(wlen,fn);
for i=1:fn
    y=X(:,i);                    % samples of frame i
    for k=0:wlen-1
        R(k+1,i)=sum(y(1:wlen-k).*y(1+k:wlen));   % sum of x(m)*x(m+k)
    end
end
  
% % Plot: short-time energy
% figure(2)
% subplot(211)
% plot(time,x)
% title('Original speech')
% ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
% subplot(212)
% plot(frameTime,E);
% title('Short-time energy');
% ylabel('energy'); xlabel(['time/s' 10 '(b)']);

% % Plot: short-time average magnitude
% figure(3)
% subplot(211)
% plot(time,x)
% title('Original speech')
% ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
% subplot(212)
% plot(frameTime,M);
% title('Short-time average magnitude');
% ylabel('magnitude'); xlabel(['time/s' 10 '(b)']);

% % Plot: short-time zero-crossing rate
% figure(4)
% subplot(211)
% plot(time,x)
% title('Original speech')
% ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
% subplot(212)
% plot(frameTime,Z);
% title('Short-time zero-crossing rate');
% ylabel('crossings per frame'); xlabel(['time/s' 10 '(b)']);

% % Plot: short-time autocorrelation of one frame
% figure(5)
% subplot(211)
% plot(time,x)
% title('Original speech')
% ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
% subplot(212)
% frameIdx=round(fn/2);          % frame to inspect (any 1..fn)
% plot(0:wlen-1,R(:,frameIdx));
% title('Short-time autocorrelation');
% ylabel('amplitude'); xlabel(['lag k (samples)' 10 '(b)']);

Short-time autocorrelation

For voiced speech, the autocorrelation function can be used to estimate the pitch period of the waveform. A periodic signal's autocorrelation has peaks at lags that are multiples of the period.

The autocorrelation function is also used in linear prediction analysis.

For the $n$-th frame $x_n(m)$ of the speech signal, the short-time autocorrelation function $R_n(k)$ is given below, where $K$ is the maximum lag (delay) in samples:
$$R_n(k)=\sum_{m=0}^{N-1-k}x_n(m)\,x_n(m+k)\quad(0\leq k\leq K)$$
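Below is a minimal sketch, not the author's method, that evaluates $R_n(k)$ with the N-k available products per lag. It uses a synthetic 100 Hz sine frame, assumes an 8 kHz sampling rate, and takes the pitch period as the lag of the largest peak after skipping small lags. The lower search limit `minLag` is an arbitrary assumption.

```matlab
% Short-time autocorrelation R(k) = sum_{m=0}^{N-1-k} x(m) x(m+k), k = 0..K,
% followed by a simple pitch estimate from the largest peak away from k = 0.
fs = 8000;                         % assumed sampling rate in Hz
N = 200;                           % frame length in samples
f0 = 100;                          % pitch of the synthetic "voiced" frame in Hz
m = (0:N-1)';
x = sin(2*pi*f0*m/fs);             % synthetic periodic frame, period fs/f0 = 80 samples
K = 150;                           % maximum lag in samples
R = zeros(K+1, 1);
for k = 0:K
    R(k+1) = sum(x(1:N-k) .* x(1+k:N));   % MATLAB is 1-based: R(k+1) holds lag k
end
minLag = 20;                       % skip the main lobe around k = 0 (assumption)
[~, idx] = max(R(minLag+1:end));
T0 = minLag + idx - 1;             % lag of the peak = pitch period in samples
f0Est = fs / T0;                   % pitch estimate in Hz
```

The sum runs over fewer products as k grows, so peaks at larger lags are smaller. This is one reason real pitch trackers normalize or weight R before picking a peak.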

Short-time average magnitude difference function (AMDF)

This parameter is similar to the autocorrelation function. For a periodic frame, it has minima, rather than maxima, at lags that are multiples of the period:
$$F_n(k)=\sum_{m=0}^{N-1-k}\left|x_n(m)-x_n(m+k)\right|$$
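Below is a sketch of the AMDF on the same kind of synthetic frame: a 100 Hz sine at an assumed 8 kHz sampling rate. The pitch period is read off as the lag of the deepest minimum after skipping small lags. `minLag` is again an arbitrary assumption.

```matlab
% Short-time AMDF F(k) = sum_{m=0}^{N-1-k} |x(m) - x(m+k)|, k = 0..K.
fs = 8000;                         % assumed sampling rate in Hz
N = 200;                           % frame length in samples
f0 = 100;                          % pitch of the synthetic frame in Hz
m = (0:N-1)';
x = sin(2*pi*f0*m/fs);             % synthetic periodic frame, period 80 samples
K = 150;                           % maximum lag in samples
F = zeros(K+1, 1);
for k = 0:K
    F(k+1) = sum(abs(x(1:N-k) - x(1+k:N)));   % F(k+1) holds lag k
end
minLag = 20;                       % skip lags near 0, where F is also small (assumption)
[~, idx] = min(F(minLag+1:end));
T0 = minLag + idx - 1;             % lag of the deepest minimum = pitch period in samples
```

The AMDF needs only subtractions and absolute values, whereas autocorrelation needs multiplications.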
