Reference:
https://blog.csdn.net/qq_39516859/article/details/80166842
Speech framing and windowing
Speech is time-varying, but over a short interval its characteristics remain essentially unchanged, that is, relatively stable. Speech is therefore said to be short-time stationary.
For analysis, the speech signal is divided into segments. Each segment is called a "frame". The frame length is usually 10-30 ms.
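Adjacent frames usually overlap. The frame shift is the offset from the start of one frame to the start of the next, so the overlap equals the frame length minus the frame shift.
The ratio of frame shift to frame length is usually between 0 and 1/2.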
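As a rough illustration of framing, the sketch below splits a toy signal into overlapping frames using only base MATLAB. The toy signal and the variable names `startIdx` and `frames` are assumptions made for illustration; `inc` is treated as the hop between successive frame starts, which is how `inc` is used in the code later in this post.

```matlab
% Toy signal: 10 samples with values 1..10 (hypothetical data)
x = (1:10)';
wlen = 4;                                  % frame length in samples
inc  = 2;                                  % frame shift (hop) in samples
fn = floor((length(x) - wlen)/inc) + 1;    % number of complete frames
frames = zeros(wlen, fn);                  % one frame per column
for i = 1:fn
    startIdx = (i-1)*inc + 1;              % first sample of frame i
    frames(:, i) = x(startIdx : startIdx + wlen - 1);
end
disp(frames);
```

With a frame length of 4 and a shift of 2, each frame shares its last two samples with the start of the next one, the same 1/2 ratio as the 200/100 setting used below.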
Window function
Framing is implemented by weighting the signal with a movable, finite-length window.
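A minimal sketch of this weighting, assuming a Hamming window written out from its usual formula so that no toolbox is required (to my knowledge this is the symmetric form that `hamming(wlen)` returns). The constant-valued toy frame is hypothetical.

```matlab
wlen = 8;                                  % frame length (hypothetical)
n = (0:wlen-1)';
w = 0.54 - 0.46*cos(2*pi*n/(wlen-1));      % symmetric Hamming window
frame = ones(wlen, 1);                     % toy frame of constant amplitude
windowed = frame .* w;                     % weight each sample by the window
disp([w(1) w(end) max(w)]);
```

The window tapers the frame toward its edges, which reduces the discontinuities that a plain rectangular cut would introduce.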
Window length:
The sampling period $T_s=1/f_s$, the window length $N$, and the frequency resolution $\Delta f$ are related by
$$\Delta f=\frac{1}{NT_s}$$
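As a quick worked example of the relation between window length and frequency resolution, take the window length N = 200 used later in this post and assume a sampling frequency of 8000 Hz (an assumed value; the post does not state the sampling rate of its recording):

$$T_s=\frac{1}{f_s}=\frac{1}{8000\,\text{Hz}}=0.125\,\text{ms},\qquad NT_s=200\times 0.125\,\text{ms}=25\,\text{ms},\qquad \Delta f=\frac{1}{NT_s}=\frac{1}{0.025\,\text{s}}=40\,\text{Hz}$$

Under that assumption a 200-sample frame lasts 25 ms, which falls inside the usual 10-30 ms range. A longer window would give a finer frequency resolution at the cost of time resolution.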
Comparison of three window types
[Figure: comparison of three windows (assets/three window comparison.jpg)]
Time domain analysis
All of the analyses below use a Hamming window.
The speech file is zj3.wav.
Frame length: 200 samples
Frame shift: 100 samples
Short-time energy
Let $E_n$ denote the short-time energy of the $n$-th frame $x_n(m)$ of the speech signal:
$$E_n=\sum^{N-1}_{m=0}x_n^2(m)$$
[Figure: short-time energy (assets/short-term energy.jpg)]
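A minimal sketch of the short-time energy sum on two hypothetical toy frames, stored one per column as in the full listing further down:

```matlab
% Two toy frames stored as columns (hypothetical data)
X = [ 1   0.5;
     -2   0.5;
      2  -0.5;
     -1  -0.5];
E = sum(X.^2, 1);      % E(i) = sum over m of X(m,i)^2
disp(E);               % displays 10 and 1
```

Summing along the first dimension gives one energy value per frame, so no explicit loop over samples is needed.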
Short-time average magnitude
The short-time average magnitude also characterizes the energy level of a frame of speech, using absolute values instead of squares:
$$M_n=\sum^{N-1}_{m=0}|x_n(m)|$$
[Figure: short-time average magnitude (assets/short-term average amplitude.jpg)]
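The same kind of sketch for the short-time average magnitude, again on two hypothetical toy frames stored as columns:

```matlab
% Two toy frames stored as columns (hypothetical data)
X = [ 1   0.5;
     -2   0.5;
      2  -0.5;
     -1  -0.5];
M = sum(abs(X), 1);    % M(i) = sum over m of |X(m,i)|
disp(M);               % displays 6 and 2
```

For these frames the energies would be 10 and 1, a ratio of 10:1, while the magnitudes differ only by 3:1. Because the samples are not squared, the magnitude is less dominated by large samples.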
Short-time zero-crossing rate
The short-time zero-crossing rate is the number of times the speech waveform crosses the horizontal axis (zero level) within one frame, i.e. the number of sign changes between positive and negative.
$$Z_n=\frac{1}{2}\sum^{N-1}_{m=0}\left|\operatorname{sgn}[x_n(m)]-\operatorname{sgn}[x_n(m-1)]\right|$$
$$\operatorname{sgn}[x]=\begin{cases}1, & x\geq 0 \\ -1, & x<0 \end{cases}$$
In MATLAB, a zero crossing between two adjacent samples of a discrete signal can be detected from the sign change: the product of the two samples is negative.
$$x_i(m)\times x_i(m+1)<0$$
[Figure: short-time zero-crossing rate (assets/short-time zero-crossing rate.jpg)]
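To check that the sign-difference formula and the product test count the same crossings, here is a small sketch on a hypothetical toy frame. One assumption to note: the comparison is restricted to sample pairs inside the frame, so the term that would need the sample before the frame is left out.

```matlab
y = [0.5; -0.2; -0.4; 0.3; 0.1; -0.6];           % toy frame (hypothetical data)
s = ones(size(y));
s(y < 0) = -1;                                   % sgn as defined above: +1 for y >= 0, -1 otherwise
Zsgn  = 0.5 * sum(abs(s(2:end) - s(1:end-1)));   % sign-difference formula
Zprod = sum(y(1:end-1) .* y(2:end) < 0);         % product test on adjacent samples
disp([Zsgn Zprod]);                              % both count 3 crossings
```

The two counts can differ when a sample is exactly zero: the sgn definition treats zero as positive, whereas a product involving a zero sample is never negative.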
Code:

    % Compute and display short-time time-domain analysis parameters.
    % enframe and frame2time are external helper functions and must be on the MATLAB path.
    clear all; clc;
    filedir = [];                               % set up the path
    filename = 'D:\matlab\music\zj3.wav';
    file = [filedir filename];
    [x, Fs] = audioread(file);
    wlen = 200;                                 % frame length
    inc = 100;                                  % frame shift
    win = hamming(wlen);                        % Hamming window
    N = length(x);                              % signal length
    time = (0:N-1)/Fs;                          % time axis of the signal

    % These functions are not used here:
    % En1 = STEn(x,win,inc);                    % short-time energy
    % Mn1 = STMn(x,win,inc);                    % short-time average magnitude
    % Zcr1 = STZcr(x,win,inc);                  % short-time zero-crossing rate

    % enframe returns a matrix rather than a vector, because each frame is a
    % sequence of samples, not a single value.
    X = enframe(x, win, inc)';                  % framing; each column is one frame
    fn = size(X, 2);                            % number of frames
    frameTime = frame2time(fn, wlen, inc, Fs);  % time corresponding to each frame
    % (this formula needs another look)

    E = zeros(1, fn);                           % short-time energy
    M = zeros(1, fn);                           % short-time average magnitude
    Z = zeros(1, fn);                           % short-time zero-crossing rate
    K = wlen - 1;                               % maximum lag for the autocorrelation
    R = zeros(K+1, fn);                         % R(k+1,i) holds lag k of frame i
    for i = 1:fn
        y = X(:, i);                            % data of frame i
        E(i) = sum(y.^2);
        M(i) = sum(abs(y));
        Z(i) = sum(y(1:wlen-1) .* y(2:wlen) < 0);   % sign changes between adjacent samples
        for k = 0:K
            R(k+1, i) = sum(y(1:wlen-k) .* y(1+k:wlen));
        end
    end

    % % Plot: short-time energy
    % figure(2)
    % subplot(211)
    % plot(time,x)
    % title('Original speech')
    % ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
    % subplot(212)
    % plot(frameTime,E);
    % title('short-time energy');
    % ylabel('amplitude'); xlabel(['time/s' 10 '(b)']);
    % % Plot: short-time average magnitude
    % figure(3)
    % subplot(211)
    % plot(time,x)
    % title('Original speech')
    % ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
    % subplot(212)
    % plot(frameTime,M);
    % title('short-time average magnitude');
    % ylabel('amplitude'); xlabel(['time/s' 10 '(b)']);
    % % Plot: short-time zero-crossing rate
    % figure(4)
    % subplot(211)
    % plot(time,x)
    % title('Original speech')
    % ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
    % subplot(212)
    % plot(frameTime,Z);
    % title('short-time zero-crossing rate');
    % ylabel('count'); xlabel(['time/s' 10 '(b)']);
    % % Plot: short-time autocorrelation (not finished yet; shows frame 1 only)
    % figure(5)
    % subplot(211)
    % plot(time,x)
    % title('Original speech')
    % ylabel('amplitude'); xlabel(['time/s' 10 '(a)']);
    % subplot(212)
    % plot(0:K,R(:,1));
    % title('short-time autocorrelation');
    % ylabel('amplitude'); xlabel(['lag/samples' 10 '(b)']);
Short-time autocorrelation
For voiced speech, the pitch period of the waveform can be estimated from the autocorrelation function.
The autocorrelation function is also used in linear prediction analysis.
For a speech frame $x_n(m)$, the short-time autocorrelation function $R_n(k)$ is defined as follows, where $K$ is the maximum lag:
$$R_n(k)=\sum^{N-1-k}_{m=0}x_n(m)x_n(m+k) \quad (0\leq k\leq K)$$
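To make the link between autocorrelation and pitch period concrete, the sketch below computes the autocorrelation of a toy sinusoidal frame and picks the lag of its largest peak away from lag 0. The frame, its period `P`, the search range and the name `pitchLag` are all assumptions for illustration, and the sum at each lag k is taken over the N-k sample pairs that lie inside the frame.

```matlab
N = 40;  P = 8;                            % frame length and period in samples (hypothetical)
m = (0:N-1)';
y = sin(2*pi*m/P);                         % toy periodic frame
K = 20;                                    % maximum lag
R = zeros(K+1, 1);
for k = 0:K
    R(k+1) = sum(y(1:N-k) .* y(1+k:N));    % R(k), stored at index k+1
end
[~, idx] = max(R(5:end));                  % search lags 4..K, skipping the peak around k = 0
pitchLag = idx + 3;                        % index 1 of R(5:end) corresponds to lag 4
disp(pitchLag);
```

For real speech the search range is normally restricted to plausible pitch lags; the fixed lower bound of 4 samples used here is only suitable for this toy frame.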
Short-time average magnitude difference
The short-time average magnitude difference function is a parameter similar to the autocorrelation function:
$$F_n(k)=\sum^{N-1-k}_{m=0}|x_n(m)-x_n(m+k)|$$
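A matching sketch for the average magnitude difference, assuming a hypothetical integer-valued periodic frame so that the result is exact. Where the autocorrelation peaks at the period, this function dips to a minimum there. The names `onePeriod` and `pitchLag` and the search range are illustrative choices.

```matlab
onePeriod = [0; 1; 2; 1; 0; -1; -2; -1];   % one period of a toy triangular wave (8 samples)
y = repmat(onePeriod, 5, 1);               % frame of N = 40 samples
N = length(y);
K = 12;                                    % maximum lag (covers a single period)
F = zeros(K+1, 1);
for k = 0:K
    F(k+1) = sum(abs(y(1:N-k) - y(1+k:N)));   % F(k), stored at index k+1
end
[~, idx] = min(F(5:end));                  % search lags 4..K, skipping the zero at k = 0
pitchLag = idx + 3;                        % index 1 of F(5:end) corresponds to lag 4
disp(pitchLag);
```

The function needs only subtractions and absolute values, with no multiplications, which is one reason it is sometimes preferred over autocorrelation for pitch estimation.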