Perceptual similarity between two audio sequences

Question

I would like to get some sort of distance measure between two pieces of audio. For example, I want to compare the sound of an animal to the sound of a human mimicking that animal, and then return a score of how similar the two sounds are.

It seems like a difficult problem. What would be the best way to approach it? I was thinking of extracting a few features from the audio signals and then computing a Euclidean distance or cosine similarity (or something like that) on those features. What kind of features would be easy to extract and useful for determining the perceptual difference between sounds?
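As a minimal sketch of the two measures mentioned above, assuming each sound has already been reduced to a fixed-length feature vector (the vectors and the function names below are made up for illustration, not taken from any library):

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors; the second is the first scaled by 2.
animal = [3.0, 4.0, 0.0]
human = [6.0, 8.0, 0.0]

print(euclidean_distance(animal, human))  # 5.0
print(cosine_similarity(animal, human))   # 1.0
```

The two measures disagree here on purpose: cosine similarity ignores overall scale (for example, loudness), while Euclidean distance does not. Which behavior is wanted depends on the features chosen.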

(I saw something on how Shazam uses hashing, but that seems like a different problem, because there the two pieces of audio are exactly the same apart from added noise. In this case the two pieces of audio are not the same; they are just perceptually similar.)

Recommended answer

The process of comparing a set of sounds for similarity is studied in computer science research under the names content-based audio indexing, retrieval (http://www.google.com/search?hl=en&sa=X&oi=spell&resnum=0&ct=result&cd=1&q=content+based+audio+retrieval&spell=1), and fingerprinting (http://www.google.com/search?hl=en&q=content+based+audio+fingerprinting&btnG=Search).

One way to do this is:

  1. Run several signal-processing steps on each audio file to extract features, such as pitch over time, frequency spectrum, autocorrelation, dynamic range, transients, etc.

  2. Put all the features for each audio file into a multi-dimensional array and store each array in a database.

  3. Use optimization techniques (such as gradient descent) or a nearest-neighbor search to find the best match for a given audio file in your database of multi-dimensional data.
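The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the answer's actual implementation: the signals are synthetic sine tones, the sample rate, feature set, scaling, and all function names are hypothetical, the "database" is a plain dictionary, and the matching step uses a brute-force nearest-neighbor search with Euclidean distance rather than gradient descent.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz


def sine(freq_hz, seconds=0.1, amplitude=1.0):
    """Synthetic test tone standing in for a decoded audio file."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def rms(samples):
    """Root-mean-square level, a crude loudness feature."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)


def autocorrelation_pitch(samples, min_hz=50, max_hz=1000):
    """Estimate pitch as the lag with the highest autocorrelation.

    The sum is left unnormalized, so shorter lags have more terms; this
    breaks ties between a period and its integer multiples.
    """
    min_lag = SAMPLE_RATE // max_hz
    max_lag = SAMPLE_RATE // min_hz
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag


def extract_features(samples):
    """Step 1: reduce a signal to a small feature vector (crudely scaled)."""
    return [autocorrelation_pitch(samples) / 1000.0,
            zero_crossing_rate(samples),
            rms(samples)]


def best_match(query_features, database):
    """Step 3: name of the stored vector closest to the query."""
    def distance(name):
        return math.sqrt(sum((q - d) ** 2
                             for q, d in zip(query_features, database[name])))
    return min(database, key=distance)


# Step 2: an in-memory "database" mapping names to feature vectors.
database = {
    "low_tone": extract_features(sine(200)),
    "high_tone": extract_features(sine(400)),
}

query = extract_features(sine(210, amplitude=0.8))
print(best_match(query, database))  # low_tone
```

Real recordings would need framing, windowing, and features tracked over time rather than one vector per file, and the features would need careful scaling so that no single one dominates the distance.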

The trick to making this work well is choosing which features to use. Doing this automatically and getting good results can be tricky. The people at Pandora do this really well, and in my opinion they have the best similarity matching around. They encode their vectors by hand, though, by having people listen to music and rate it in many different ways. See their Music Genome Project and the List of Music Genome Project attributes for more info.

For automatic distance measurements, there are several projects that do this kind of thing, including Marsyas, MusicBrainz, and The Echo Nest.

The Echo Nest has one of the simplest APIs (http://developer.echonest.com/) I've seen in this space. It is very easy to get started with.
