Getting the frequencies associated with STFT in Librosa

Problem description

When using librosa.stft() to calculate a spectrogram, how does one get back the associated frequency values? I am not interested in generating an image as in librosa.display.specshow, but rather I want to have those values in hand.

import librosa
import numpy as np
import scipy.signal as sig
y, sr = librosa.load('../recordings/high_pitch.m4a')
stft = librosa.stft(y, n_fft=256, window=sig.windows.hamming)
spec = np.abs(stft)

spec gives me the 'amplitude' or 'power' of each frequency, but not the frequency bins themselves. I have seen that librosa.display.specshow will display these frequency values on the vertical axis of a heatmap, but it does not return the values themselves.

I'm looking for something similar to np.fft.fftfreq() for a single FFT, but cannot find its equivalent in the librosa documentation.

Solution

I would like to point out one question and answer in particular: How do I obtain the frequencies of each value in an FFT? Combined with the librosa documentation for the STFT, it tells us that the horizontal axis of the spectrogram is time and the vertical axis is frequency. Each column in the spectrogram is the FFT of one slice in time: a window of n_fft=256 samples centred on that time point.
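To make that layout concrete, here is a plain-NumPy sketch of a centred STFT that produces the same shape convention as librosa.stft: 1 + n_fft / 2 rows (frequency bins) by one column per frame. It is not librosa's implementation. The function name simple_stft is hypothetical, and the zero padding and np.hamming window are assumptions (librosa's default pad mode has changed between versions), so edge values may differ slightly, but rows and columns mean the same thing.

```python
import numpy as np

def simple_stft(y, n_fft=256, hop_length=None):
    """Illustrative centred STFT: returns shape (1 + n_fft // 2, n_frames)."""
    if hop_length is None:
        hop_length = n_fft // 4  # same default ratio librosa uses when win_length == n_fft
    window = np.hamming(n_fft)   # symmetric Hamming window of n_fft points
    # Zero-pad n_fft // 2 samples on both sides so frame t is centred on sample t * hop_length
    y_pad = np.pad(y, n_fft // 2, mode="constant")
    n_frames = 1 + (len(y_pad) - n_fft) // hop_length
    # Each column holds one windowed slice of n_fft samples
    frames = np.stack(
        [y_pad[t * hop_length : t * hop_length + n_fft] * window for t in range(n_frames)],
        axis=1,
    )
    # rfft along axis 0: rows become frequency bins, columns stay time frames
    return np.fft.rfft(frames, axis=0)

y = np.random.default_rng(0).standard_normal(22050)  # one second of noise at 22050 Hz
S = simple_stft(y, n_fft=256)
print(S.shape)  # (129, 345): 129 frequency rows, 345 time columns
```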

We also know that there is a hop length, which tells us how many audio samples to skip before computing the next FFT. By default this is n_fft / 4 (strictly win_length // 4, with win_length defaulting to n_fft), so for every 256 / 4 = 64 samples of your audio we compute a new 256-point FFT centred at that point. If you want the exact time at which each window is centred (with the default center=True), it is simply i / Fs, where i is the sample index of the window centre, which is a multiple of 64: frame t is centred on sample t * 64.
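The time mapping can be sketched in a few lines of NumPy. The sampling rate of 22050 Hz (librosa.load's default resampling rate) and the frame count of 5 are illustrative assumptions; in practice n_frames would be spec.shape[1]. librosa's own librosa.frames_to_time(frame_idx, sr=Fs, hop_length=hop_length) computes the same frame-index-to-seconds conversion.

```python
import numpy as np

Fs = 22050                # assumed sampling rate (librosa.load resamples to 22050 Hz by default)
n_fft = 256
hop_length = n_fft // 4   # 64 samples, the default when win_length == n_fft
n_frames = 5              # hypothetical number of STFT columns, e.g. spec.shape[1]

frame_idx = np.arange(n_frames)
centre_samples = frame_idx * hop_length   # with center=True, frame t is centred on sample t * hop_length
centre_times = centre_samples / Fs        # convert sample index to seconds: i / Fs

for t, (i, sec) in enumerate(zip(centre_samples, centre_times)):
    print(f"frame {t}: centred on sample {i} = {sec:.5f} s")
```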

Now, for a real-valued signal the spectrum of each FFT window is conjugate-symmetric, so only the non-negative-frequency half of the FFT needs to be kept. The documentation confirms this: the number of rows, and hence the number of frequency components, is 1 + n_fft / 2, where the extra 1 is the DC (0 Hz) component. Following the post above, bin number i corresponds to the frequency i * Fs / n_fft, where Fs is the sampling frequency and n_fft = 256 is the number of points in the FFT window. Because we only keep half the spectrum, i runs from 0 to n_fft / 2 inclusive (1 + n_fft / 2 bins) rather than from 0 to n_fft - 1. The bins above n_fft / 2 are simply the mirrored (complex-conjugate) version of the lower half, so we do not consider frequency components beyond Fs / 2 Hz, the Nyquist frequency.
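Both claims, the conjugate symmetry and the i * Fs / n_fft mapping, can be checked numerically. This sketch uses illustrative values (Fs = 16384 Hz, n_fft = 256, so bins are 64 Hz apart) and a hypothetical test bin k = 10: a cosine at exactly k * Fs / n_fft should peak in bin 10 of the one-sided spectrum, which maps back to 640 Hz.

```python
import numpy as np

Fs = 16384          # sampling rate in Hz (illustrative)
n_fft = 256         # FFT length -> bin spacing Fs / n_fft = 64 Hz
k = 10              # hypothetical bin to test
n = np.arange(n_fft)
x = np.cos(2 * np.pi * (k * Fs / n_fft) * n / Fs)   # real sinusoid at bin k's frequency (640 Hz)

X = np.fft.fft(x)                    # full spectrum, n_fft bins
# Conjugate symmetry of a real signal: X[n_fft - i] == conj(X[i]) for i = 1 .. n_fft - 1
assert np.allclose(X[1:][::-1], np.conj(X[1:]))

X_half = np.fft.rfft(x)              # keeps bins 0 .. n_fft // 2 only
print(len(X_half))                   # 129 == 1 + n_fft // 2
peak_bin = int(np.argmax(np.abs(X_half)))
print(peak_bin, peak_bin * Fs / n_fft)   # 10 640.0
```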

If you wanted to generate a NumPy array of these frequencies, you could just do:

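import numpy as np
Fs = sr        # sampling rate returned by librosa.load (22050 Hz unless you pass sr=None)
n_fft = 256    # must match the n_fft passed to librosa.stft
freqs = np.arange(0, 1 + n_fft // 2) * Fs / n_fft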

freqs is an array that maps each bin number in the FFT (that is, each row index of spec) to its frequency in Hz. As an illustrative example, suppose the sampling frequency is 16384 Hz and n_fft = 256. Then:

In [1]: import numpy as np

In [2]: Fs = 16384

In [3]: n_fft = 256

In [4]: np.arange(0, 1 + n_fft / 2) * Fs / n_fft
Out[4]:
array([   0.,   64.,  128.,  192.,  256.,  320.,  384.,  448.,  512.,
        576.,  640.,  704.,  768.,  832.,  896.,  960., 1024., 1088.,
       1152., 1216., 1280., 1344., 1408., 1472., 1536., 1600., 1664.,
       1728., 1792., 1856., 1920., 1984., 2048., 2112., 2176., 2240.,
       2304., 2368., 2432., 2496., 2560., 2624., 2688., 2752., 2816.,
       2880., 2944., 3008., 3072., 3136., 3200., 3264., 3328., 3392.,
       3456., 3520., 3584., 3648., 3712., 3776., 3840., 3904., 3968.,
       4032., 4096., 4160., 4224., 4288., 4352., 4416., 4480., 4544.,
       4608., 4672., 4736., 4800., 4864., 4928., 4992., 5056., 5120.,
       5184., 5248., 5312., 5376., 5440., 5504., 5568., 5632., 5696.,
       5760., 5824., 5888., 5952., 6016., 6080., 6144., 6208., 6272.,
       6336., 6400., 6464., 6528., 6592., 6656., 6720., 6784., 6848.,
       6912., 6976., 7040., 7104., 7168., 7232., 7296., 7360., 7424.,
       7488., 7552., 7616., 7680., 7744., 7808., 7872., 7936., 8000.,
       8064., 8128., 8192.])

In [5]: freqs = _; len(freqs)
Out[5]: 129

We have generated a 1 + n_fft / 2 = 129-element array that gives the frequency of each corresponding bin number.
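Since the question asked for an np.fft.fftfreq equivalent: NumPy's one-sided counterpart, np.fft.rfftfreq(n_fft, d=1/Fs), returns exactly this array, and librosa provides librosa.fft_frequencies(sr=..., n_fft=...) for the same purpose. A quick check with the illustrative values from the session above:

```python
import numpy as np

Fs = 16384
n_fft = 256

freqs_manual = np.arange(0, 1 + n_fft // 2) * Fs / n_fft
freqs_rfft = np.fft.rfftfreq(n_fft, d=1.0 / Fs)   # one-sided counterpart of np.fft.fftfreq

assert np.allclose(freqs_manual, freqs_rfft)
print(freqs_rfft[:4], freqs_rfft[-1])  # first bins 0, 64, 128, 192 Hz; last bin 8192 Hz (Fs / 2)
```

With spec from the question and freqs computed from the matching sr and n_fft, freqs[np.argmax(spec, axis=0)] would give the strongest frequency in each time frame.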


A word of caution

Note that librosa.display.specshow assumes a default sampling rate of 22050 Hz, so if you don't set the sampling rate (sr) to the sampling frequency of your audio signal, the vertical and horizontal axes will be wrong. Make sure you pass the sr argument to match the sampling frequency of the incoming audio. Likewise, specshow assumes hop_length=512 by default, so pass the hop_length you used in librosa.stft (64 here) or the time axis will be off.
