STFT Clarification (FFT for real-time input)


Question

I understand how the DFT via correlation works, and use that as a basis for understanding the results of the FFT. If I have a discrete signal that was sampled at 44.1 kHz, then 1 s of data gives me 44,100 samples. To run the FFT on that, I would need an array of 44,100 samples and a DFT with N = 44,100 in order to get the resolution necessary to detect frequencies up to 22 kHz, right? (Because the FFT can only correlate the input with sinusoidal components up to a frequency of N/2.)

That's obviously a lot of data points and calculation time, and I have read that this is where the short-time Fourier transform (STFT) comes in. If I take the first 1024 samples (~23 ms) and run the FFT on them, then take an overlapping block of 1024 samples, I can get a continuous frequency-domain view of the signal every 23 ms. How do I then interpret the output? If the output of the FFT on static data is N/2 data points with fs/(N/2) bandwidth, what is the bandwidth of the STFT's frequency output?

Here's an example that I ran in Mathematica:

100 Hz sine wave at 44.1 kHz sample rate:

Then I run the FFT on only the first 1024 points:

The frequency of interest is then at data point 3, which should somehow correspond to 100 Hz. I think 44100/1024 ≈ 43 is something like a scaling factor, meaning that a signal of 1 Hz in this small window corresponds to a signal of 43 Hz in the full data array. However, this would give me an output of 43 Hz × 3 = 129 Hz. Is my logic correct but not my implementation?

Recommended answer

As I have already stated in my earlier comments, the variable N affects the resolution achievable in the output frequency spectrum, not the range of frequencies you can detect; that range is set by the sample rate. A larger N gives you higher resolution at the expense of more computation time. A smaller N reduces computation time, but it gives coarser frequency bins and can make spectral leakage more visible, which is the effect you have seen in your last figure.
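To make the resolution point concrete, here is a minimal sketch using the numbers from the question (fs = 44100, N = 1024, a 100 Hz sine). It assumes the Mathematica plot is indexed from 1, so that "data point 3" is bin k = 2; the helper name `dft_bin` is illustrative only, and a direct correlation sum stands in for a real FFT routine.

```python
import cmath
import math

def dft_bin(x, k):
    """Return the k-th DFT coefficient of x, computed by direct correlation."""
    n = len(x)
    return sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))

fs = 44100    # sample rate in Hz (from the question)
n = 1024      # window length in samples (from the question)
f0 = 100.0    # test tone in Hz (from the question)

# First 1024 samples of the 100 Hz sine.
x = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]

bin_width = fs / n  # spacing between bins, about 43.07 Hz

# Only the first few bins are needed to locate a 100 Hz tone.
mags = [abs(dft_bin(x, k)) for k in range(8)]
peak = max(range(8), key=lambda k: mags[k])

print(f"bin width: {bin_width:.2f} Hz")
for k in range(4):
    print(f"k={k} (1-based index {k + 1}): {k * bin_width:7.2f} Hz")
print(f"largest magnitude at k={peak}, centred on {peak * bin_width:.2f} Hz")
```

Under that indexing assumption, bin k sits at k × fs/N, so the third data point is centred near 86 Hz rather than 129 Hz. The 100 Hz tone falls between the bins at roughly 86 Hz and 129 Hz, which is why its energy leaks into neighbouring bins.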

As for your other question: theoretically the bandwidth of an FFT is infinite, but we band-limit our result to frequencies in the range [-fs/2, fs/2], because all frequencies outside that band are susceptible to aliasing and are therefore of no use. Furthermore, if the input signal is real (which is true in most cases, including ours), the frequencies in [-fs/2, 0] are just a reflection of those in [0, fs/2], so some FFT procedures output only the spectrum from [0, fs/2], which I think applies to your case. This means that the N/2 data points you received as output represent the frequencies in the range [0, fs/2], and that is the bandwidth you are working with, both for the FFT and for the STFT (the STFT is just a series of FFTs; each FFT in an STFT gives you a spectrum with data points in this band).
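The mirror property for real input can be checked numerically. The sketch below uses a small, made-up test signal and a direct DFT (the names `dft`, `max_mismatch` and `one_sided_freqs` are illustrative). Note that counting both the DC bin and the fs/2 bin gives N/2 + 1 non-redundant points, slightly more than the N/2 quoted above.

```python
import cmath
import math

def dft(x):
    """Direct O(n^2) DFT; adequate for a small illustrative n."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

fs = 44100   # sample rate in Hz (from the question)
n = 64       # kept small so the direct DFT stays fast (assumption for the demo)

# Arbitrary real-valued test signal: two sinusoids.
x = [math.sin(2 * math.pi * 5 * i / n) + 0.5 * math.cos(2 * math.pi * 12 * i / n)
     for i in range(n)]
X = dft(x)

# For real input, bin n-k is the complex conjugate of bin k,
# so the upper half of the output carries no new information.
max_mismatch = max(abs(X[n - k] - X[k].conjugate()) for k in range(1, n))

# Centre frequencies of the non-redundant bins k = 0 .. n/2.
one_sided_freqs = [k * fs / n for k in range(n // 2 + 1)]

print(f"largest |X[n-k] - conj(X[k])|: {max_mismatch:.2e}")
print(f"{len(one_sided_freqs)} bins from {one_sided_freqs[0]} Hz "
      f"to {one_sided_freqs[-1]} Hz")
```

The same band, 0 to fs/2, applies to every frame of an STFT; only the spacing between bins changes with the frame length.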

I would also like to point out that the STFT will most likely not reduce your computation time if your input is a varying signal such as music, because in that case you will need to perform it many times over the duration of the song for it to be of any use. It will, however, let you understand the frequency characteristics of your song much better than a single FFT would.
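A rough back-of-the-envelope comparison illustrates the computation-time remark. It assumes a hypothetical 3-minute song, 1024-sample frames with 50% overlap, and an n·log2(n) operation count as a crude proxy for FFT cost; none of these figures come from the original question beyond fs and the frame length.

```python
import math

def frame_count(num_samples, frame_len, hop):
    """Number of full frames of frame_len samples, one every hop samples."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop

def fft_cost(n):
    """Crude cost proxy for one FFT of length n: n * log2(n)."""
    return n * math.log2(n)

fs = 44100
duration_s = 180            # hypothetical 3-minute song
num_samples = fs * duration_s
frame_len = 1024            # frame length from the question
hop = 512                   # assumed 50% overlap

frames = frame_count(num_samples, frame_len, hop)
stft_cost = frames * fft_cost(frame_len)   # many short FFTs
single_cost = fft_cost(num_samples)        # one FFT over the whole song

print(f"frames: {frames}")
print(f"STFT cost / single-FFT cost: {stft_cost / single_cost:.2f}")
```

With these assumptions the two totals come out in the same order of magnitude, and a smaller hop (more overlap) pushes the STFT total higher. The proxy ignores windowing and memory effects, so treat it as an estimate only.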

To visualise the results of an FFT you use frequency (and/or phase) spectrum plots, but to visualise the results of an STFT you will most probably need to create a spectrogram, which is basically a graph made by placing the individual FFT spectra side by side. The process of creating a spectrogram can be seen in the figure below (Source: Dan Ellis - Introduction to Speech Processing). The spectrogram shows how your signal's frequency characteristics change over time; how you interpret it depends on the specific features you want to extract or detect from the audio. You might want to look at the spectrogram Wikipedia page for more information.
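The "spectra side by side" idea can be sketched in a few lines. This is a minimal illustration, not the procedure from the cited figure: the Hann window, the small frame length, the 8 kHz sample rate and the two-tone test signal are all assumptions chosen so that a direct DFT runs quickly, and `stft_magnitudes` is a hypothetical helper name.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft_magnitudes(x, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len/2) per frame."""
    window = hann(frame_len)
    # Precomputed complex exponentials, indexed by (k * i) mod frame_len.
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]
    columns = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + i] * window[i] for i in range(frame_len)]
        column = [abs(sum(frame[i] * twiddle[(k * i) % frame_len]
                          for i in range(frame_len)))
                  for k in range(frame_len // 2 + 1)]
        columns.append(column)
    return columns

fs = 8000          # illustrative sample rate, not the 44.1 kHz of the question
frame_len = 64     # illustrative frame length (bin width 125 Hz)
hop = 32           # 50% overlap
n_total = 1024

# Test signal: 500 Hz for the first half, 2000 Hz for the second half.
x = [math.sin(2 * math.pi * (500 if i < n_total // 2 else 2000) * i / fs)
     for i in range(n_total)]

spec = stft_magnitudes(x, frame_len, hop)   # spec[t][k]: frame t, bin k

# Frequency of the strongest bin in each frame, i.e. one value per column.
peaks = [max(range(len(col)), key=lambda k: col[k]) * fs / frame_len
         for col in spec]
print(f"{len(spec)} frames x {len(spec[0])} bins")
print(f"first frame peak: {peaks[0]} Hz, last frame peak: {peaks[-1]} Hz")
```

Plotting `spec` as an image, with frame index on the horizontal axis and bin index on the vertical axis, gives the spectrogram. The per-frame peak moving from one tone to the other is the kind of change over time that a single FFT of the whole signal would not localise.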
