Algorithms for determining the key of an audio sample

Question

I am interested in determining the musical key of an audio sample. How would (or could) an algorithm go about trying to approximate the key of a musical audio sample?

Antares Autotune and Melodyne are two pieces of software that do this sort of thing.

Can anyone give a bit of a layman's explanation about how this would work? To mathematically deduce the key of a song by analysing the frequency spectrum for chord progressions etc.

This topic interests me a lot!

Edit: Brilliant sources and a wealth of information can be found from everyone who contributed to this question.

Especially from: the_mandrill and Daniel Brückner.

Recommended answer

It's worth being aware that this is a very tricky problem and if you don't have a background in signal processing (or an interest in learning about it) then you have a very frustrating time ahead of you. If you're expecting to throw a couple of FFTs at the problem then you won't get very far. I hope you do have the interest as it is a really fascinating area.

Initially there is the problem of pitch recognition, which is reasonably easy to do for simple monophonic instruments (e.g. voice) using a method such as autocorrelation or harmonic sum spectrum (e.g. see Paul R's link). However, you'll often find that this gives the wrong result: you'll often get half or double the pitch you were expecting. This is called pitch period doubling or an octave error, and it occurs essentially because the FFT or autocorrelation assumes that the data has constant characteristics over time. If you have an instrument played by a human there will always be some variation.
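
To make the autocorrelation idea concrete, here is a minimal sketch in plain Python. The function names, the 80 to 1000 Hz search range and the 0.9 threshold are my own assumptions for illustration, not anything from a particular library. It picks the earliest strong autocorrelation peak rather than the single largest one, which is one common way to reduce (not eliminate) the octave errors described above.

```python
import math


def autocorrelation_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0, threshold=0.9):
    """Estimate the fundamental frequency (Hz) of one monophonic frame."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset

    # Only search lags that correspond to pitches between fmax and fmin.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)

    # Autocorrelation at each candidate lag, normalised by the overlap length.
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        overlap = n - lag
        scores[lag] = sum(x[i] * x[i + lag] for i in range(overlap)) / overlap

    best_lag = max(scores, key=scores.get)

    # Multiples of the true period score almost as well as the period itself,
    # so prefer the earliest local peak that is nearly as strong as the best.
    for lag in range(min_lag + 1, max_lag):
        is_peak = scores[lag] > scores[lag - 1] and scores[lag] >= scores[lag + 1]
        if is_peak and scores[lag] >= threshold * scores[best_lag]:
            return sample_rate / lag
    return sample_rate / best_lag


def make_tone(freq, sample_rate, n):
    """Synthetic tone with three harmonics, standing in for a sung note."""
    return [
        sum(math.sin(2 * math.pi * freq * k * i / sample_rate) / k for k in (1, 2, 3))
        for i in range(n)
    ]


print(autocorrelation_pitch(make_tone(200.0, 8000, 2000), 8000))
```

A clean synthetic tone like this is the easy case. On real recordings the peaks are much less tidy, which is where the half and double pitch results tend to creep in.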

Some people approach the problem of key recognition as being a matter of doing the pitch recognition first and then finding the key from the sequence of pitches. This is incredibly difficult if you have anything other than a monophonic sequence of pitches. Even if you do have a monophonic sequence of pitches, there is still no clear-cut method of determining the key: how do you deal with chromatic notes, for instance, or determine whether it's major or minor? So you'd need to use a method similar to Krumhansl's key finding algorithm.
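
As a rough illustration of that kind of key finding, the sketch below correlates a duration-weighted pitch-class histogram against the 24 rotations of a major and a minor key profile, in the style usually called Krumhansl-Schmuckler. The profile numbers are the Krumhansl-Kessler ratings as they are commonly quoted, so check them against a published source before relying on them. The function names and the (MIDI pitch, duration) input format are assumptions of mine.

```python
import math

# Krumhansl-Kessler key profiles as commonly quoted (index 0 = tonic).
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def find_key(notes):
    """notes: iterable of (midi_pitch, duration) pairs. Returns (tonic, mode)."""
    histogram = [0.0] * 12
    for pitch, duration in notes:
        histogram[pitch % 12] += duration  # MIDI pitch class 0 is C

    best = None
    for tonic in range(12):
        # Rotate the histogram so that index 0 lines up with the candidate tonic.
        rotated = [histogram[(tonic + i) % 12] for i in range(12)]
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            r = pearson(rotated, profile)
            if best is None or r > best[0]:
                best = (r, NOTE_NAMES[tonic], mode)
    return best[1], best[2]


# A C major scale up to a long C, then G, E and a long C again.
melody = [(60, 1), (62, 1), (64, 1), (65, 1), (67, 1), (69, 1), (71, 1),
          (72, 2), (67, 1), (64, 1), (60, 2)]
print(find_key(melody))
```

Because the profiles give some weight to every pitch class, a few chromatic notes lower the correlation slightly instead of breaking the result, and the major and minor profiles compete directly with each other.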

So, given the complexity of this approach, an alternative is to look at all the notes being played at the same time. If you have chords, or more than one instrument, then you're going to have a rich spectral soup of many sinusoids playing at once. Each individual note is made up of multiple harmonics of a fundamental frequency, so A (at 440 Hz) will consist of sinusoids at 440, 880, 1320... Furthermore, if you play an E (see this diagram for pitches) then that is 659.25 Hz, which is almost one and a half times that of A (actually 1.498). This means that every 3rd harmonic of A very nearly coincides with every 2nd harmonic of E. This is the reason that chords sound pleasant: they share harmonics. (As an aside, the whole reason that western harmony works is the quirk of fate that the twelfth root of 2 to the power 7 is nearly 1.5.)
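
The arithmetic is easy to check for yourself. This small sketch computes the equal-tempered fifth and lists which harmonics of A and E land close together; the helper name and the 0.5% tolerance are arbitrary choices of mine.

```python
SEMITONE = 2 ** (1 / 12)   # frequency ratio of one equal-tempered semitone
FIFTH = SEMITONE ** 7      # seven semitones up, close to 3/2


def shared_harmonics(f1, f2, count=12, tolerance=0.005):
    """Return (m, n) pairs where harmonic m of f1 is within tolerance of harmonic n of f2."""
    pairs = []
    for m in range(1, count + 1):
        for n in range(1, count + 1):
            if abs(m * f1 - n * f2) <= tolerance * n * f2:
                pairs.append((m, n))
    return pairs


a = 440.0
e = a * FIFTH

print(round(FIFTH, 4))         # ratio of E to A
print(round(e, 2))             # frequency of the E above A 440
print(shared_harmonics(a, e))  # harmonic numbers of A and E that nearly coincide
```

The pairs that come out are (3, 2) and its multiples, which is the "every 3rd harmonic of A, every 2nd harmonic of E" relationship, off by only about 0.1% because the ratio is 1.498 rather than exactly 1.5.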

If you look beyond this interval of a 5th to major, minor and other chords then you'll find other ratios. I think that many key finding techniques will enumerate these ratios and then fill a histogram for each spectral peak in the signal. So in the case of detecting the chord A5 you would expect to find peaks at 440, 880, 659, 1320, 1760, 1977. For B5 it'll be 494, 988, 741, etc. So create a frequency histogram and, for every sinusoidal peak in the signal (e.g. from the FFT power spectrum), increment the corresponding histogram entry. Then for each key A-G tally up the bins in your histogram, and the one with the most entries is most likely to be your key.
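
Here is one way the tallying could look, as a sketch only. It assumes you already have a list of spectral peak frequencies (peak picking from the FFT is not shown), it only considers root-plus-fifth chords with roots in the octave starting at 440 Hz, and the function names and 1% matching tolerance are my own choices.

```python
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def expected_peaks(root_hz, n_harmonics=4):
    """Harmonics of the root and of the fifth above it (a root-plus-fifth chord)."""
    fifth_hz = root_hz * 2 ** (7 / 12)
    return [f * k for f in (root_hz, fifth_hz) for k in range(1, n_harmonics + 1)]


def tally_chords(peaks_hz, base_hz=440.0, tolerance=0.01):
    """Count, for each candidate root, how many observed peaks match an expected peak."""
    scores = {}
    for semitone, name in enumerate(NOTE_NAMES):
        root_hz = base_hz * 2 ** (semitone / 12)
        targets = expected_peaks(root_hz)
        scores[name] = sum(
            1 for peak in peaks_hz
            if any(abs(peak - t) <= tolerance * t for t in targets)
        )
    return scores


# The A5 peaks listed above, as if read off an FFT power spectrum.
peaks = [440, 880, 659, 1320, 1760, 1977]
scores = tally_chords(peaks)
best = max(scores, key=scores.get)
print(best, scores[best])
```

Notice that E and D also pick up a few votes here, because their harmonics overlap with those of A. That overlap is why a single frame is rarely conclusive on its own.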

That's just a very simple approach but may be enough to find the key of a strummed or sustained chord. You'd also have to chop the signal into small intervals (e.g. 20 ms) and analyse each one to build up a more robust estimate.
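
The chopping and combining step might be sketched like this. The `classify` argument is a placeholder for whatever per-frame analysis you use (it is not a real library function), and taking a simple majority vote is just one assumption about how to combine the frames.

```python
from collections import Counter


def split_frames(samples, sample_rate, frame_ms=20.0):
    """Cut the signal into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def vote_over_frames(samples, sample_rate, classify, frame_ms=20.0):
    """Run classify(frame) on every frame and return the most common label."""
    votes = Counter()
    for frame in split_frames(samples, sample_rate, frame_ms):
        label = classify(frame)
        if label is not None:  # let the classifier abstain on silent or noisy frames
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy stand-in for a real per-frame key or chord estimate.
def toy_classifier(frame):
    return "A" if frame[0] == 0.0 else "B"


signal = [0.0] * 480 + [1.0] * 320  # 100 ms at 8000 Hz: three "A" frames, two "B" frames
print(len(split_frames(signal, 8000)))
print(vote_over_frames(signal, 8000, toy_classifier))
```

Bear in mind that a 20 ms frame holds fewer than two periods of an 80 Hz tone, so for low notes you may want longer or overlapping frames.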

Edit:
If you want to experiment then I'd suggest downloading a package like Octave or CLAM which makes it easier to visualise audio data and run FFTs and other operations.

Other useful links:

  • My PhD thesis on some aspects of pitch recognition -- the maths is a bit heavy going but chapter 2 is (I hope) quite an accessible introduction to the different approaches of modelling musical audio
  • http://en.wikipedia.org/wiki/Auditory_scene_analysis -- Bregman's Auditory Scene analysis which though not talking about music has some fascinating findings about how we perceive complex scenes
  • Dan Ellis has done some great papers in this and similar areas
  • Keith Martin has some interesting approaches
