STFT understanding using librosa

Question

I have an audio sample of about 14 seconds with an 8 kHz sample rate. I'm using librosa to extract some features from this audio file.

import numpy as np
import librosa
import librosa.display

n_fft = 2048  # librosa default
y, sr = librosa.load(file_name)  # resampled to 22050 Hz by default
stft = np.abs(librosa.stft(y, n_fft=n_fft))

# file_length = 14.650022675736961  # seconds
# hop_length = 512  # default: win_length / 4 = n_fft / 4 (win_length defaults to n_fft)
# windowsTime = n_fft * Ts  # Ts = 1 / sr

stft.shape
# (1025, 631)

Specshow:

librosa.display.specshow(stft, x_axis='time', y_axis='log')

[![stft sr = 22050][1]][1]

Now I can understand the shape of the STFT:

631 time bins ≈ 4 * (file_length / windowsTime), because the windows overlap (hop_length = n_fft / 4)
1025 frequency bins, spaced sr / n_fft apart,
so there are 1025 frequencies from 0 to sr/2 (Nyquist)
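The shape reasoning above can be checked with plain arithmetic. This is a minimal sketch, assuming librosa's default centered STFT (`center=True`), where the frame count is `1 + n_samples // hop_length`. The helper name `stft_shape` is hypothetical and not part of librosa.

```python
def stft_shape(n_samples, n_fft, hop_length):
    """Expected (frequency bins, frames) of a centered STFT."""
    n_bins = 1 + n_fft // 2                  # bins from 0 Hz up to sr / 2
    n_frames = 1 + n_samples // hop_length   # one frame per hop, plus the first
    return n_bins, n_frames


sr = 22050                                   # librosa.load default rate
n_fft = 2048
hop_length = n_fft // 4                      # 512
n_samples = round(14.650022675736961 * sr)   # file_length * sr

print(stft_shape(n_samples, n_fft, hop_length))  # (1025, 631)
print(sr / n_fft)                                # spacing between bins in Hz
```

The result matches the `stft.shape` reported in the question.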

What I can't understand is why the plots differ for two sample rates when the parameters keep the same ratios: (1) 22050 Hz, the librosa default, and (2) 8 kHz, the sampling rate of the file.

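y2, sr2 = librosa.load(file_name, sr=None)  # sr=None keeps the file's native 8 kHz rate

n_fft2 = 743  # same ratio, to get the same visuals for comparison
hop_length = 186  # about n_fft2 / 4; not passed below, so librosa uses its default (win_length // 4)

stft2 = np.abs(librosa.stft(y2, n_fft=n_fft2))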

So of course the shape of the STFT will be different:

stft2.shape
# (372, 634)
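The 634 frames (rather than 631) are consistent with `hop_length` not being passed to `librosa.stft`, which then falls back to `win_length // 4 = 743 // 4 = 185`. A minimal sketch of that arithmetic follows; the sample count of 117200 is an assumption derived from roughly 14.65 s at 8000 Hz, not a value stated in the question.

```python
def n_frames(n_samples, hop_length):
    """Frame count of a centered STFT."""
    return 1 + n_samples // hop_length


n_samples_8k = 117200        # assumption: ~14.65 s * 8000 Hz
n_fft2 = 743
default_hop = n_fft2 // 4    # 185, used when hop_length is not passed

print(1 + n_fft2 // 2)                      # 372 frequency bins
print(n_frames(n_samples_8k, default_hop))  # 634 frames with hop 185
print(n_frames(n_samples_8k, 186))          # 631 frames with hop 186
```

Passing `hop_length=186` explicitly would therefore give the same number of frames as the 22050 Hz case.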


[![stft sr = 743][2]][2]

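1. But why are the absolute frequencies not the same? It is the same signal, just not oversampled, so each sample is unique. What am I missing? Is the y-axis static?

2. I can't understand the values of the time bins. I expected the bins to be frame numbers, with the first at the hop length and each following one a windowTime after the previous point, up to the end of the file. But the units look strange?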

I want to be able to extract the magnitude of a specific frequency bin (fBin) at a specific time (frame), and additionally to sum some of those values to get the magnitude for a time range.

Therefore, if I take stft[fBin index], which is one row out of the 1025 fBins, and look at its contents, stft[0] contains 630 points. These are exactly the 630 time points for that frequency, so each of the 1025 rows has the same time points.

So if I take one row that stands for all the other fBins as well (same time points), namely stft[0], I should be able to choose a time frame and an fBin and get the specific magnitude:

times =  librosa.core.frames_to_time(stft2[0], sr=sr2, n_fft=n_fft2, hop_length=hop_length) 

fft_bin = 6
time_idx = 10

print('freq (Hz)', freqs[fft_bin])
print('time (s)', times[time_idx])
print('amplitude', stft[fft_bin, time_idx])

array([0.047375, 0.047625, 0.04825 , 0.04825 , 0.046875, 0.04675 , 0.05 , 0.051625, 0.051 , 0.048 , 0.05225 , 0.050375, 0.04925 , 0.04725 , 0.051625, 0.0465 , 0.05225 , 0.05 , 0.053 , 0.053875, 0.048 , 0.0485 , 0.047875, 0.04775 , 0.0485 , 0.049 , 0.051375, 0.047125, 0.051125, 0.047125, 0.04725 , 0.05025 , 0.05425 , 0.05475 , 0.051375, 0.060375, 0.050625, 0.04875 , 0.054125, 0.048 , 0.05025 , 0.052375, 0.04975 , 0.054125, 0.055625, 0.047125, 0.0475 , 0.047 , 0.049875, 0.05025 , 0.048375, 0.047 , 0.050625, 0.05 , 0.046625, 0.04925 , 0.048 , 0.049125, 0.05375 , 0.0545 , 0.04925 , 0.049125, 0.049125, 0.049625, 0.047 , 0.047625, 0.0535 , 0.051875, 0.05075 , 0.04975 , 0.047375, 0.049 , 0.0485 , 0.050125, 0.048 , 0.05475 , 0.05175 , 0.050125, 0.04725 , 0.0575 , 0.056875, 0.047 , 0.0485 , 0.055375, 0.04975 , 0.047 , 0.0495 , 0.051375, 0.04675 , 0.04925 , 0.052125, 0.04825 , 0.048125, 0.046875, 0.047 , 0.048625, 0.050875, 0.05125 , 0.04825 , 0.052125, 0.052375, 0.05125 , 0.049875, 0.048625, 0.04825 , 0.0475 , 0.048375, 0.050875, 0.052875, 0.0475 , 0.0485 , 0.05225 , 0.053625, 0.05075 , 0.0525 , 0.047125, 0.0485 , 0.048875, 0.049 , 0.0515 , 0.055875, 0.0515 , 0.05025 , 0.05125 , 0.054625, 0.05525 , 0.047 , 0.0545 , 0.052375, 0.049875, 0.051 , 0.048625, 0.0475 , 0.048 , 0.048875, 0.050625, 0.05375 , 0.051875, 0.048125, 0.052125, 0.048125, 0.051 , 0.052625, 0.048375, 0.047625, 0.05 , 0.048125, 0.050375, 0.049125, 0.053125, 0.053875, 0.05075 , 0.052375, 0.048875, 0.05325 , 0.05825 , 0.055625, 0.0465 , 0.05475 , 0.051125, 0.048375, 0.0505 , 0.04675 , 0.0495 , 0.04725 , 0.046625, 0.049625, 0.054 , 0.056125, 0.05175 , 0.050625, 0.050375, 0.047875, 0.047 , 0.048125, 0.048875, 0.050625, 0.049875, 0.047 , 0.0505 , 0.047 , 0.053125, 0.047625, 0.05025 , 0.04825 , 0.05275 , 0.051625, 0.05 , 0.051625, 0.05425 , 0.052 , 0.04775 , 0.047 , 0.049125, 0.05375 , 0.0535 , 0.04925 , 0.05125 , 0.046375, 0.04775 , 0.04775 , 0.0465 , 0.047 , 0.04675 , 0.04675 , 0.04925 , 0.05125 , 0.046375, 
0.04825 , 0.0525 , 0.057875, 0.056375, 0.054375, 0.04825 , 0.0535 , 0.05475 , 0.0485 , 0.048875, 0.048625, 0.0485 , 0.047625, 0.046875, 0.0465 , 0.05125 , 0.054 , 0.05 , 0.048 , 0.047875, 0.0515 , 0.048125, 0.055875, 0.054875, 0.051625, 0.048125, 0.047625, 0.048375, 0.052875, 0.0485 , 0.0475 , 0.0495 , 0.05025 , 0.05675 , 0.0585 , 0.051625, 0.05625 , 0.0605 , 0.052125, 0.0495 , 0.049 , 0.047875, 0.051375, 0.054125, 0.0525 , 0.0515 , 0.057875, 0.055 , 0.05375 , 0.046375, 0.04775 , 0.0485 , 0.050125, 0.050875, 0.04925 , 0.049125, 0.0465 , 0.04975 , 0.053375, 0.05225 , 0.0475 , 0.046375, 0.05375 , 0.049875, 0.049875, 0.047375, 0.049125, 0.049375, 0.04875 , 0.048125, 0.05075 , 0.0505 , 0.046375, 0.047375, 0.048625, 0.0485 , 0.047125, 0.052625, 0.051125, 0.04725 , 0.050875, 0.053875, 0.0475 , 0.0495 , 0.051 , 0.055 , 0.053 , 0.050125, 0.04675 , 0.05375 , 0.054375, 0.04725 , 0.046875, 0.04925 , 0.04725 , 0.0495 , 0.05075 , 0.050875, 0.04775 , 0.05125 , 0.050125, 0.047875, 0.04825 , 0.046625, 0.0475 , 0.046375, 0.04775 , 0.05075 , 0.048125, 0.046375, 0.049625, 0.0495 , 0.04675 , 0.046625, 0.0475 , 0.04825 , 0.053 , 0.050875, 0.049 , 0.057875, 0.058875, 0.049875, 0.049125, 0.0475 , 0.05225 , 0.055 , 0.055375, 0.053875, 0.051125, 0.049875, 0.05025 , 0.050875, 0.049 , 0.0575 , 0.051875, 0.049375, 0.04775 , 0.051125, 0.050375, 0.0465 , 0.047375, 0.0465 , 0.046375, 0.048875, 0.051875, 0.047 , 0.047125, 0.047125, 0.046875, 0.049625, 0.048625, 0.051 , 0.049 , 0.046375, 0.049 , 0.056125, 0.054625, 0.047625, 0.046625, 0.0475 , 0.051875, 0.05175 , 0.047625, 0.050375, 0.055125, 0.05275 , 0.047125, 0.05325 , 0.060125, 0.056625, 0.053 , 0.052125, 0.047125, 0.04825 , 0.050375, 0.05025 , 0.048 , 0.046625, 0.047125, 0.04875 , 0.047 , 0.05525 , 0.0535 , 0.047 , 0.0495 , 0.0535 , 0.05125 , 0.046625, 0.0495 , 0.04675 , 0.04875 , 0.047125, 0.04975 , 0.047 , 0.049875, 0.046875, 0.047125, 0.048 , 0.046375, 0.0495 , 0.04975 , 0.05125 , 0.048375, 0.049125, 0.0515 , 0.048375, 0.052375, 0.051125, 
0.046375, 0.047125, 0.050375, 0.0465 , 0.052375, 0.05375 , 0.04925 , 0.05025 , 0.0565 , 0.054875, 0.048 , 0.049375, 0.052625, 0.055375, 0.053375, 0.05075 , 0.048875, 0.05475 , 0.05075 , 0.0485 , 0.049125, 0.0475 , 0.047375, 0.047375, 0.047 , 0.052125, 0.053875, 0.049 , 0.052625, 0.0485 , 0.04675 , 0.04875 , 0.05 , 0.0545 , 0.05025 , 0.0495 , 0.0515 , 0.0485 , 0.05025 , 0.0465 , 0.0465 , 0.048375, 0.06375 , 0.10175 , 0.11975 , 0.118375, 0.121375, 0.12675 , 0.123 , 0.095375, 0.055 , 0.05525 , 0.04775 , 0.053125, 0.052375, 0.056625, 0.0565 , 0.046875, 0.048 , 0.05175 , 0.048 , 0.052 , 0.048 , 0.048 , 0.05175 , 0.05025 , 0.049625, 0.049625, 0.047375, 0.046625, 0.052375, 0.0555 , 0.051375, 0.050625, 0.052375, 0.050125, 0.048 , 0.052125, 0.052125, 0.0495 , 0.048875, 0.048 , 0.049875, 0.051125, 0.050625, 0.048 , 0.0465 , 0.048 , 0.04675 , 0.050875, 0.048 , 0.046625, 0.0495 , 0.050375, 0.046625, 0.0515 , 0.049875, 0.049625, 0.04675 , 0.049125, 0.05025 , 0.050375, 0.04725 , 0.047625, 0.047 , 0.051625, 0.0485 , 0.05225 , 0.046875, 0.0475 , 0.04825 , 0.050375, 0.05725 , 0.052375, 0.048 , 0.046375, 0.0475 , 0.0495 , 0.047875, 0.046375, 0.049875, 0.046875, 0.048 , 0.046875, 0.048625, 0.047125, 0.046625, 0.05 , 0.048875, 0.04675 , 0.050125, 0.05425 , 0.051375, 0.050125, 0.053375, 0.052 , 0.053875, 0.048 , 0.05575 , 0.049875, 0.052125, 0.048875, 0.047375, 0.048875, 0.049125, 0.047375, 0.047375, 0.047625, 0.0495 , 0.04825 , 0.047875, 0.04875 , 0.054 , 0.052125, 0.051 , 0.046625, 0.04925 , 0.05075 , 0.054375, 0.0555 , 0.051625, 0.046625, 0.052125, 0.055875, 0.047 , 0.053875, 0.050875, 0.0505 , 0.0465 , 0.053125, 0.050875, 0.050625, 0.051125, 0.050875, 0.056875, 0.04925 , 0.050625, 0.054125, 0.056625, 0.05025 , 0.0465 , 0.04675 , 0.049625, 0.047 , 0.048375, 0.047125, 0.04875 , 0.048375, 0.048875, 0.04775 , 0.04775 , 0.047 , 0.052125, 0.050875, 0.054 , 0.058375, 0.054 , 0.049125, 0.04675 , 0.051875, 0.05425 , 0.050125, 0.04675 , 0.047625, 0.046375, 0.05275 , 0.053 , 0.04875 , 
0.049125, 0.047125, 0.049375, 0.0475 , 0.051125, 0.0495 , 0.052375, 0.047 , 0.047125, 0.050875])


  [1]: https://i.imgur.com/OeKzvrb.png
  [2]: https://i.imgur.com/ALtba5F.png

Recommended answer

Question 1:

You need to specify the sampling rate when using specshow:

librosa.display.specshow(stft, x_axis='time', y_axis='log', sr=sr)

Otherwise the default value (22,050 Hz) will be used (see docs).
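To illustrate why the axis labels change, here is a minimal sketch using the usual DFT relation, where bin `k` sits at `k * sr / n_fft`. The helper `bin_frequency` is hypothetical. The same spectrogram row gets a different frequency label depending on which rate is assumed.

```python
def bin_frequency(k, sr, n_fft):
    """Center frequency in Hz of FFT bin k."""
    return k * sr / n_fft


n_fft2 = 743
top_bin = n_fft2 // 2                               # 371, the last row of stft2

true_hz = bin_frequency(top_bin, 8000, n_fft2)      # the file's real rate
assumed_hz = bin_frequency(top_bin, 22050, n_fft2)  # rate assumed when sr is omitted

print(round(true_hz, 1))     # 3994.6, just below the 4 kHz Nyquist limit
print(round(assumed_hz, 1))  # 11010.2
print(assumed_hz / true_hz)  # every label is stretched by 22050 / 8000
```

The time axis depends on the rate in the same way, since one hop lasts `hop_length / sr` seconds.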

Question 2:

librosa.core.frames_to_time does not take stft[0] as its argument; stft[0] holds the magnitudes of the first frequency bin across all frames. Instead, it takes frame numbers (frame indices) as its first argument.

Imagine you have an audio signal with sr = 10000 Hz and you run an STFT over it using n_fft = 2000 and hop_length = 1000. You get one frame per hop, and each hop is 0.1 s long, because 10000 samples correspond to 1 s and 1000 samples (one hop) therefore correspond to 0.1 s.
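The hop arithmetic in this example can be written out directly. This is a minimal sketch with a hypothetical helper `frame_to_time`; as far as I know, `librosa.frames_to_time` performs the same conversion when called without `n_fft`.

```python
def frame_to_time(frame, sr, hop_length):
    """Start time in seconds of the given frame index."""
    return frame * hop_length / sr


sr, hop_length = 10000, 1000
for frame in range(4):
    # frames 0, 1, 2, 3 start 0.1 s apart
    print(frame, frame_to_time(frame, sr, hop_length))
```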

stft[0] is not a frame number. Instead, stft has shape (1 + n_fft/2, t) (see the librosa.stft documentation). This means the first dimension is the frequency bin and the second dimension is the frame number (t).

The total number of frames in stft is therefore stft.shape[1]. To get the length of the source audio, you could do:

time = librosa.core.frames_to_time(stft.shape[1], sr=sr, hop_length=hop_length, n_fft=n_fft)
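With the axes sorted out, the original goal (the magnitude of one bin at one frame, or summed over a time range) reduces to indexing. The sketch below uses a toy nested list in place of the real spectrogram. The helpers `time_to_frame` and `magnitude_in_range` are hypothetical, and the inclusive end frame is a design choice rather than anything librosa prescribes.

```python
def time_to_frame(t, sr, hop_length):
    """Index of the frame that starts at or just before time t (seconds)."""
    return int(round(t * sr)) // hop_length


def magnitude_in_range(spec, fft_bin, t_start, t_end, sr, hop_length):
    """Sum the magnitudes of one frequency bin over [t_start, t_end]."""
    first = time_to_frame(t_start, sr, hop_length)
    last = time_to_frame(t_end, sr, hop_length)
    return sum(spec[fft_bin][first:last + 1])


# Toy magnitude spectrogram: 3 frequency bins (rows) x 5 frames (columns).
toy = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0, 0.0],
]
sr, hop_length = 10000, 1000

print(toy[2][1])                                             # bin 2 at frame 1
print(magnitude_in_range(toy, 1, 0.1, 0.3, sr, hop_length))  # frames 1..3 of bin 1
```

With the NumPy array returned by `np.abs(librosa.stft(...))`, the equivalent indexing would be `stft[fft_bin, time_idx]` and `stft[fft_bin, first:last + 1].sum()`.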
