FFT Pitch Detection - Melody Extraction

Question

I am creating a pitch detection program that extracts the fundamental frequency from the power spectrum obtained from the FFT of a frame. This is what I have so far:

  • divide the input audio signal into frames
  • multiply each frame by a Hamming window
  • compute the FFT of the frame and its magnitude, sqrt(real^2 + imag^2)
  • find the fundamental frequency (peak) using the harmonic product spectrum (HPS)
  • convert the frequency of the peak (bin frequency) to a note (e.g. ~440 Hz is A4)
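The framing, windowing, magnitude and HPS steps above could look roughly like the following minimal NumPy sketch. The function names `frame_signal` and `hps_pitch`, the number of harmonics (4) and the `fmin` cutoff are illustrative assumptions, not the asker's actual code.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def hps_pitch(frame, sr, n_harmonics=4, fmin=50.0):
    """Estimate the fundamental (Hz) of one frame with the harmonic product spectrum."""
    windowed = frame * np.hamming(len(frame))   # Hamming window
    mag = np.abs(np.fft.rfft(windowed))         # sqrt(real^2 + imag^2)
    # HPS: multiply the spectrum by copies of itself decimated by 2, 3, ...
    # Index k of mag[::h] is mag[h*k], so at the f0 bin every copy picks up
    # one harmonic (2*f0, 3*f0, ...) and the product peaks there.
    n = len(mag) // n_harmonics
    hps = mag[:n].copy()
    for h in range(2, n_harmonics + 1):
        hps *= mag[::h][:n]
    bin_hz = sr / len(frame)
    min_bin = int(np.ceil(fmin / bin_hz))       # skip DC and very low bins
    peak = min_bin + int(np.argmax(hps[min_bin:]))
    return peak * bin_hz


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    # Test tone: 250 Hz fundamental plus four harmonics with 1/k amplitudes.
    x = sum(np.sin(2 * np.pi * 250 * k * t) / k for k in range(1, 6))
    for frame in frame_signal(x, frame_len=2048, hop=512)[:3]:
        print(hps_pitch(frame, sr))
```

The frequency resolution is `sr / frame_len` (about 3.9 Hz here), so the estimate can only land on bin centres. That coarseness is one reason the converted notes can jump around from frame to frame.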

Now the program produces an integer from 0 to 87 for each frame. Each integer corresponds to a piano note according to a formula I found here. I am now trying to imitate the melodies in the input signal by synthesizing sounds from the calculated notes. I tried simply generating a sine wave whose amplitude and frequency correspond to the fundamental, but the result sounded nothing like the original (it sounded almost like random beeps).
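The question does not reproduce the formula it links to. The sketch below assumes the standard 88-key equal-temperament mapping f = 440 · 2^((n − 49)/12) for a 1-based key n, shifted to the 0..87 range described above (A0 = 0, A4 = 48, C8 = 87). `freq_to_key` and `key_to_freq` are hypothetical helper names.

```python
import math

A4_HZ = 440.0
A4_KEY = 48  # 0-based index of A4 when A0 = 0 and C8 = 87 (assumed 88-key layout)


def freq_to_key(freq_hz):
    """Nearest 0-based piano key (0..87) for a frequency in Hz (assumes freq_hz > 0)."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / A4_HZ)
    key = round(semitones_from_a4) + A4_KEY  # rounding quantizes to the nearest semitone
    return min(max(key, 0), 87)              # clamp to the 88-key range


def key_to_freq(key):
    """Equal-tempered frequency in Hz of a 0-based piano key."""
    return A4_HZ * 2.0 ** ((key - A4_KEY) / 12.0)


if __name__ == "__main__":
    print(freq_to_key(250.0), key_to_freq(freq_to_key(250.0)))
```

For example, a 250 Hz estimate rounds to key 38 (B3, about 246.94 Hz). Every frame's pitch is snapped to the nearest semitone in this way before it is resynthesized.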

I don't know much about music. Based on what I have, can I generate a sound whose melody resembles the input (instrument, voice, or instrument + voice) using the information I get from the fundamental frequency? If not, what other ideas could I try with the code I currently have?

Thanks!

Solution

It depends greatly on the musical content you want to work with: extracting the pitch of a monophonic recording (i.e. a single instrument or voice) is not the same as extracting the pitch of a single instrument from a polyphonic mixture (e.g. extracting the pitch of the melody from a polyphonic recording).

For monophonic pitch extraction there are various algorithms you could try implementing, in both the time domain and the frequency domain. A couple of examples are YIN (time domain) and HPS (frequency domain); links to further details on both are provided on Wikipedia:
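As a rough illustration of the time-domain approach, here is a simplified YIN sketch. It implements the difference function, the cumulative mean normalized difference and the absolute threshold. It omits YIN's parabolic interpolation and best-local-estimate refinements. The name `yin_pitch` and the defaults (`fmin`, `fmax`, `threshold=0.1`) are illustrative assumptions.

```python
import numpy as np


def yin_pitch(frame, sr, fmin=50.0, fmax=1000.0, threshold=0.1):
    """Simplified YIN f0 estimate (Hz) for one frame; returns None if no period is found."""
    frame = np.asarray(frame, dtype=float)
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame too short for the requested fmin")
    # Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Step 2: cumulative mean normalized difference, d'(0) = 1 and
    # d'(tau) = d(tau) * tau / sum_{j=1..tau} d(j).
    cmnd = np.ones_like(d)
    cumsum = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(cumsum, 1e-12)
    # Step 3: first lag that dips below the threshold, then follow the dip
    # down to its local minimum and report that lag as the period.
    for tau in range(max(tau_min, 1), tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau
    return None


if __name__ == "__main__":
    sr = 8000
    t = np.arange(1024) / sr
    x = sum(np.sin(2 * np.pi * 250 * k * t) / k for k in range(1, 6))
    print(yin_pitch(x, sr))
```

Because YIN searches over integer lags rather than FFT bins, its resolution depends on the period length in samples. It also gives a natural "unvoiced" answer (`None`) when no lag is periodic enough.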

However, neither will work well if you want to extract the melody from polyphonic material. Melody extraction from polyphonic music is still a research problem, and there isn't a simple set of steps you can follow. There are some tools provided by the research community that you can try out (though only for non-commercial use), namely:

As a final note, when synthesizing your output I'd recommend synthesizing the continuous pitch curve you extract. The easiest way to do this is to estimate the pitch every X ms (e.g. 10 ms) and synthesize a sine wave that changes frequency at that same interval, ensuring continuous phase. This will make your result sound a lot more natural, and you avoid the extra error involved in quantizing a continuous pitch curve into discrete notes (which is a problem in its own right).
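The continuous-phase synthesis described above could be sketched as follows. The pitch curve holds one estimate per 10 ms hop, and the phase is accumulated sample by sample instead of restarting `sin(2*pi*f*t)` at every hop, which would produce clicks. `synthesize_pitch_curve`, the default amplitude of 0.5 and the convention that 0 or NaN marks an unvoiced hop are illustrative assumptions.

```python
import numpy as np


def synthesize_pitch_curve(f0_hz, sr, hop_s=0.01, amp=None):
    """Render a sine that follows a pitch curve sampled every hop_s seconds.

    f0_hz: one pitch estimate (Hz) per hop; 0 or NaN marks an unvoiced hop (silence).
    amp:   optional per-hop amplitude, same length as f0_hz; defaults to 0.5.
    """
    f0 = np.nan_to_num(np.asarray(f0_hz, dtype=float))  # NaN -> 0 (unvoiced)
    amps = np.full(len(f0), 0.5) if amp is None else np.asarray(amp, dtype=float)
    hop = int(round(hop_s * sr))  # samples per pitch estimate
    # Hold each estimate for one hop: per-sample frequency and amplitude.
    f_per_sample = np.repeat(f0, hop)
    a_per_sample = np.repeat(np.where(f0 > 0, amps, 0.0), hop)
    # Continuous phase: integrate the instantaneous frequency, so the
    # waveform never jumps when the frequency changes between hops.
    phase = 2 * np.pi * np.cumsum(f_per_sample) / sr
    return a_per_sample * np.sin(phase)


if __name__ == "__main__":
    sr = 16000
    # Hypothetical pitch curve: one second gliding from 220 Hz to 440 Hz.
    curve = np.linspace(220.0, 440.0, 100)
    y = synthesize_pitch_curve(curve, sr)
    print(len(y))  # 100 hops of 160 samples each
```

The `hop_s` used here should match the hop used when estimating the pitch. Otherwise the resynthesized melody is stretched or compressed in time relative to the input.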
