Compare two voices in Android

Question

I am working on a voice messaging application, and I need to compare two voices, as follows:

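  1. Register with the application by recording your voice.
  2. Send a voice message to another user by recording it, but first compare this recording against the voice stored in the sender's profile.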

This is for security purposes: I need to know whether a recorded message comes from a specific user or not.

I tried:

Compare two sounds in Android

But that did not give me any idea of how to do voice comparison.

Please share if anybody knows how to do this. I didn't find any sample that does it.

Recommended answer

Since you indicated it's for security purposes, I'd like to first share a few things on voice biometry :-)

The problem with authenticating someone is that you need to be sure they were actually there, saying the things that were recorded... and that's a whole different level of complexity than merely comparing voice characteristics.

Algorithms that extract voice features from a sample and later calculate the distance between a new sample and the first one can easily be fooled by a recording put together by an attacker.
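As a rough illustration of the "extract features, then compute a distance" approach described above, here is a minimal sketch. The class name `FeatureDistance`, the feature values, and the threshold are all made up for the example; real systems use far richer features and models.

```java
/** Illustrative only: compares two fixed-length voice feature vectors. */
final class FeatureDistance {

    /** Euclidean distance between two feature vectors of equal length. */
    static double euclidean(double[] enrolled, double[] candidate) {
        if (enrolled.length != candidate.length) {
            throw new IllegalArgumentException("feature vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < enrolled.length; i++) {
            double d = enrolled[i] - candidate[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Accepts the candidate when it is close enough to the enrolled voice print. */
    static boolean matches(double[] enrolled, double[] candidate, double threshold) {
        return euclidean(enrolled, candidate) <= threshold;
    }

    public static void main(String[] args) {
        double[] enrolled = {1.0, 2.0, 3.0};   // hypothetical feature values
        double[] candidate = {1.0, 2.0, 5.0};  // hypothetical feature values
        System.out.println(euclidean(enrolled, candidate));    // prints 2.0
        System.out.println(matches(enrolled, candidate, 1.5)); // prints false
    }
}
```

Note that a replayed or edited recording of the genuine user produces features close to the enrolled ones, so a check like `matches` accepts it. That is exactly the weakness described above.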

Since in your case there's a human recipient, creating a message made up of chopped words or sentences from random conversations is actually quite difficult and time consuming. But it's not completely impossible...

There is very good-sounding software created for the music industry that will, for example, take some voice audio input and make it sound (in intonation and timing) like a second audio sample (a guide, made by the fraudster). Vocalign Pro by SynchroArts does this to help get perfect backing vocal tracks. An attacker could further tweak the audio by hand using other voice editing software and achieve an acceptable level of quality that wouldn't be immediately detected by the recipient.

Depending on what the attacker wants your user to say, the process could take anywhere from an hour to a day, provided he has all the recording material he needs...

To fight this type of attack, you need to detect that the audio sample has been edited. Digital editing leaves unnatural traces, e.g. in the background noise surrounding the voice.

AFAICT, only the best commercial software achieves this level of security checking, but I can't tell how far it goes in detecting such edits.

From a pure security perspective, you'd also need to be sure the device was not compromised. So these voice verification checks should happen server side and not on the phone itself.

Please note these are general considerations, and it all depends on what sort of security measures you actually need for your use case. My car alarm is certainly not unbreakable, but it helps raise the bar so that fewer attackers are able to steal the car...

Another thing to consider is that biometry is by definition a statistical process, and it will yield a certain percentage of false positives and false negatives. By changing the acceptance threshold, you can lower one of these rates at the cost of raising the other.
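To make the threshold trade-off concrete, here is a small sketch that computes both error rates for a few thresholds. It assumes a distance-based matcher (smaller distance means a better match); the class name `ErrorRates` and all the distance values are invented for illustration and are not measurements.

```java
/** Illustrative only: error rates of a distance-based matcher at a given threshold. */
final class ErrorRates {

    /** Fraction of impostor attempts wrongly accepted (distance <= threshold). */
    static double falseAcceptRate(double[] impostorDistances, double threshold) {
        int accepted = 0;
        for (double d : impostorDistances) {
            if (d <= threshold) accepted++;
        }
        return (double) accepted / impostorDistances.length;
    }

    /** Fraction of genuine attempts wrongly rejected (distance > threshold). */
    static double falseRejectRate(double[] genuineDistances, double threshold) {
        int rejected = 0;
        for (double d : genuineDistances) {
            if (d > threshold) rejected++;
        }
        return (double) rejected / genuineDistances.length;
    }

    public static void main(String[] args) {
        double[] genuine = {0.2, 0.4, 0.5, 0.9};   // same speaker, made-up distances
        double[] impostor = {0.6, 0.8, 1.2, 1.5};  // different speakers, made-up distances
        for (double threshold : new double[] {0.45, 0.7, 1.0}) {
            System.out.println("threshold=" + threshold
                    + " falseAccept=" + falseAcceptRate(impostor, threshold)
                    + " falseReject=" + falseRejectRate(genuine, threshold));
        }
    }
}
```

With these made-up numbers, the strict threshold 0.45 accepts no impostor but rejects half of the genuine attempts, while the loose threshold 1.0 does the opposite.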

Selecting an appropriate threshold will require a fair amount of test data. Say a 1-minute recording from each of at least 200 speakers to start getting a picture.

One more thing I think you'll need to consider is the inherent variability of the human voice. People may be sick, which in some cases might render the voice unrecognizable. The emotional state might also play a role: sadness or anger will yield different-sounding voices...

And last but not least, the surrounding noise might pose a problem. Say the user enrolled while at home and later records a message while on the go in a busy city environment: the system might have trouble making sure it's actually the same person speaking. The signal-to-noise ratio is definitely going to be one of your main issues. Small tip: depending on the distance from the microphone to the mouth, the ratio will be quite different. You'll get much better results when the user holds the phone close to their face, as in a regular phone conversation, than when the user looks at the screen while recording the message.
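For a feel of what the signal-to-noise ratio means in numbers, here is a crude sketch that compares the mean power of a speech segment with that of a noise-only segment (for instance, a moment of silence before the user starts talking). The class name `SnrEstimate` and the sample values are assumptions for the example; production systems estimate noise much more carefully.

```java
/** Illustrative only: crude signal-to-noise estimate from two sample segments. */
final class SnrEstimate {

    /** Mean power (average of squared samples) of a segment. */
    static double meanPower(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s * s;
        }
        return sum / samples.length;
    }

    /** Ratio of speech-segment power to noise-segment power, in decibels. */
    static double snrDb(double[] speechSegment, double[] noiseSegment) {
        double noisePower = meanPower(noiseSegment);
        if (noisePower == 0.0) {
            throw new IllegalArgumentException("noise segment has zero power");
        }
        return 10.0 * Math.log10(meanPower(speechSegment) / noisePower);
    }

    public static void main(String[] args) {
        double[] speech = {0.5, -0.5, 0.5, -0.5};     // made-up normalized samples
        double[] noise = {0.05, -0.05, 0.05, -0.05};  // ten times smaller amplitude
        System.out.println(snrDb(speech, noise));     // close to 20 dB
    }
}
```

A tenfold amplitude difference corresponds to roughly 20 dB, so moving the microphone away from the mouth while the street noise stays constant lowers this figure quickly.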

Voice variability and signal-to-noise ratio are probably the main reasons behind false negative results.

Hopefully, you now have a better understanding of the challenges awaiting you, and I can start sharing some pointers to open source and commercial libraries.

AFAIK, there are no open source libraries that include fraudster detection... You may want to check Nuance Communication for the state of the art. There are plenty of other vendors, just check with Google; I only mentioned Nuance because of its reputation.

There is an OSS library called Alize (written in C++, under the LGPL license) which uses a feature extraction algorithm called MFCC (Mel-Frequency Cepstral Coefficients). MFCC is known to bring excellent results. Expect a steep learning curve, as this software is aimed at researchers willing to improve the state of the art on this topic, and the vocabulary used is very specific.
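The "Mel" in MFCC refers to the mel scale, a perceptual frequency scale on which the filter bank of an MFCC front end is spaced. As a small taste of the vocabulary, here is a sketch of one commonly used Hz-to-mel conversion formula. This is not taken from Alize; the class name `MelScale` is hypothetical, and a full MFCC pipeline also involves framing, an FFT, a filter bank, a logarithm, and a discrete cosine transform.

```java
/** Illustrative only: one commonly used Hz <-> mel conversion. */
final class MelScale {

    /** Converts a frequency in Hz to mels. */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Converts a value in mels back to Hz (inverse of hzToMel). */
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    public static void main(String[] args) {
        // With this formula, 1000 Hz maps to approximately 1000 mel.
        System.out.println(hzToMel(1000.0));
        // Higher frequencies are compressed: 8000 Hz is far less than 8000 mel.
        System.out.println(hzToMel(8000.0));
    }
}
```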

I wrote an OSS library named Recognito (Java, Apache 2.0) aimed at regular developers, so you should be able to test it in a matter of minutes. The lib is very young, and I first focused on its API before improving the algorithms. The algorithm I use for the moment is called Linear Predictive Coding (LPC) and is known to bring good results (and I do have good results, provided the recordings are of the same level of quality :-)). I'm currently in the process of releasing a new version that includes a likelihood coefficient in the match results. An MFCC implementation is on the road map. There is plenty of javadoc, and the code should be very straightforward... https://github.com/amaurycrickx/recognito
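For readers curious about what Linear Predictive Coding computes, here is a textbook-style sketch using autocorrelation and the Levinson-Durbin recursion. It is not Recognito's actual code; the class name `LpcSketch` and its methods are hypothetical, and real implementations usually apply pre-emphasis and windowing to each frame first.

```java
/** Illustrative only: LPC coefficients via autocorrelation and Levinson-Durbin. */
final class LpcSketch {

    /** Autocorrelation r[0..order] of one frame of samples. */
    static double[] autocorrelation(double[] frame, int order) {
        double[] r = new double[order + 1];
        for (int lag = 0; lag <= order; lag++) {
            double sum = 0.0;
            for (int n = lag; n < frame.length; n++) {
                sum += frame[n] * frame[n - lag];
            }
            r[lag] = sum;
        }
        return r;
    }

    /**
     * Returns a[0..order] with a[0] = 1, such that frame[n] is predicted by
     * -(a[1]*frame[n-1] + ... + a[order]*frame[n-order]).
     */
    static double[] lpc(double[] frame, int order) {
        double[] r = autocorrelation(frame, order);
        if (r[0] == 0.0) {
            throw new IllegalArgumentException("silent frame");
        }
        double[] a = new double[order + 1];
        a[0] = 1.0;
        double error = r[0]; // prediction error energy so far
        for (int i = 1; i <= order; i++) {
            double acc = r[i];
            for (int j = 1; j < i; j++) {
                acc += a[j] * r[i - j];
            }
            double k = -acc / error; // reflection coefficient
            double[] prev = a.clone();
            for (int j = 1; j < i; j++) {
                a[j] = prev[j] + k * prev[i - j];
            }
            a[i] = k;
            error *= (1.0 - k * k);
            if (error <= 0.0) {
                break; // frame is perfectly predictable; leave the rest at zero
            }
        }
        return a;
    }
}
```

The resulting coefficients summarize the spectral envelope of the frame. Under the assumptions above, such coefficient vectors are the kind of features that can then be compared between an enrolled sample and a new one.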

Recognito has a dependency on the javax.sound packages for audio file handling. You may want to check this post for what it takes to use it in Android: Voice matching in android
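Since the javax.sound packages are not part of the Android SDK, one common workaround is to record raw PCM on the device and decode it by hand. The sketch below converts 16-bit PCM bytes into normalized doubles using only `java.nio`, which is available on Android. The class name `PcmDecoder` is hypothetical, little-endian byte order is an assumption about the recording, and how the resulting samples are then fed to a library is left open.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: decodes raw 16-bit little-endian mono PCM into [-1, 1) doubles. */
final class PcmDecoder {

    static double[] pcm16LeToDoubles(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        // Assumption: samples are little-endian, as is typical for WAV data.
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double[] samples = new double[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = buffer.getShort() / 32768.0; // scale short range to [-1, 1)
        }
        return samples;
    }

    public static void main(String[] args) {
        byte[] pcm = {0x00, 0x40, 0x00, (byte) 0x80}; // 16384 and -32768
        double[] samples = pcm16LeToDoubles(pcm);
        System.out.println(samples[0]); // prints 0.5
        System.out.println(samples[1]); // prints -1.0
    }
}
```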

Given that many people need something for Android, I'll do something about it in the near future instead of just saying how one should modify the lib :-)

HTH
