Speaker Recognition using MARF


Question

I am using MARF (Modular Audio Recognition Framework) to recognize a speaker's voice. I trained MARF with the voice of person 'A' and tested it with the voice of person 'B'. I trained using --train training-samples and tested using --ident testing-samples/G.wav. My speakers.txt file lists the voice samples of both persons, A and B.

But I am not getting the correct response: the training voice and the testing voice are different, yet MARF reports that the audio sample matches.

I have also gone through this link:

http://stackoverflow.com/questions/4837511/speaker-recognition

Result

    Config: [SL: WAVE, PR: NORMALIZATION (100), FE: FFT (301), CL: EUCLIDEAN_DISTANCE (503), ID: -1]
         Speaker's ID: 26
   Speaker identified: G

Am I doing something wrong, or is there another speaker recognition method available?

EDIT ------------------------ I am now using vText, which is easy to use; see the link below. vText also uses MATLAB to produce its output.

http://basic-signalprocessing.com/voiceRecognition.php

I am getting the correct frequency/time-domain graph, but I am not able to compare the two voice samples. I get this error:

Exception: com.mathworks.toolbox.javabuilder.MWException: Error using ==> eq
Matrix dimensions must agree.
{??? Error using ==> eq
Matrix dimensions must agree.

Error in ==> recognizePartial10k at 10


} 

Does anyone have any ideas about this?

Recommended answer

The first thing I'd say is that, in my experience, the FFT algorithm won't give you the best results: try LPC in MARF.

Second: MARF assumes what speech people call a "closed set", which means it always returns a result even if the speaker is not known to the system. You would have to judge the likelihood of the response based on a distance threshold.
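A minimal sketch of the distance-threshold idea, in Python rather than through MARF's own API. The function names, the feature vectors, and the threshold value are all hypothetical; a usable threshold would have to be tuned on real data.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_open_set(sample, enrolled, threshold):
    """Return the closest enrolled speaker, or None when even the best
    match is farther away than the threshold (speaker treated as unknown)."""
    best_name, best_distance = None, math.inf
    for name, features in enrolled.items():
        distance = euclidean_distance(sample, features)
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance > threshold:
        return None
    return best_name


# Hypothetical feature vectors, for illustration only.
enrolled = {"A": [1.0, 2.0, 3.0]}
print(identify_open_set([1.1, 2.0, 2.9], enrolled, threshold=0.5))  # A
print(identify_open_set([9.0, 9.0, 9.0], enrolled, threshold=0.5))  # None
```

A closed-set classifier corresponds to dropping the threshold check, which is why it reports a speaker even for a voice it has never been trained on.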

Also make sure the sliding window (Hamming window) size is set according to your file's sample rate: for example, a window of 512 sampled values at a sample rate of 22050 Hz yields a window of about 23 ms, which in my experience returned the best results on a data set of 500 speakers.

Since a sample rate of 22050 Hz means 22050 samples per second, finding the number of samples for the desired length of around 25 ms is easy for any sample rate: sample rate / 1000 * 25
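The arithmetic above can be sketched as two small helpers; the function names are hypothetical and the 25 ms default simply mirrors the figure quoted in the text.

```python
def window_length_in_samples(sample_rate_hz, window_ms=25):
    """Number of samples covering window_ms at the given sample rate."""
    return round(sample_rate_hz * window_ms / 1000)


def window_duration_ms(window_samples, sample_rate_hz):
    """Duration in milliseconds of a window of window_samples samples."""
    return window_samples * 1000 / sample_rate_hz


print(window_length_in_samples(22050))            # 551
print(round(window_duration_ms(512, 22050), 1))   # 23.2
```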

Please note that the FFT algorithm used in MARF requires a window size that is exactly a power of 2 (256 / 512 / 1024 / ...).

That's not required for the LPC algorithm (though a power of 2 may be slightly more efficient for the processor, since powers of 2 are all it knows :-))
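To reconcile a target of roughly 25 ms with the power-of-2 constraint for FFT, one option is to round the sample count to the nearest power of 2. This is only a sketch: the helper name is hypothetical, and rounding to the nearest power (rather than always up or down) is an assumption, not something MARF prescribes.

```python
def nearest_power_of_two(n):
    """Return the power of two closest to n (ties go to the lower power)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return lower if (n - lower) <= (upper - n) else upper


# 551 samples is about 25 ms at 22050 Hz; 200 samples is 25 ms at 8000 Hz.
print(nearest_power_of_two(551))  # 512
print(nearest_power_of_two(200))  # 256
```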

Oh, and don't forget that if you're using a stereo file, the window is twice as long... But I would advise using a mono file: a multichannel file adds no value for voice processing; it only makes processing longer and less precise.
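If only a stereo recording is available, it can be downmixed before processing. The sketch below assumes interleaved PCM samples (left, right, left, right, ...) already decoded into a list of numbers; the function name is hypothetical. With interleaved data, each instant in time takes two values, which is why the same duration needs twice as many values.

```python
def downmix_to_mono(interleaved, channels=2):
    """Average each frame of interleaved samples into one mono sample."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    if len(interleaved) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    return [
        sum(interleaved[i:i + channels]) / channels
        for i in range(0, len(interleaved), channels)
    ]


# Two stereo frames: (left=1, right=3) and (left=2, right=4).
print(downmix_to_mono([1, 3, 2, 4]))  # [2.0, 3.0]
```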

A word on sample rate: the selected sample rate should be twice the highest frequency you're interested in. People usually consider the highest frequency for voice to be 4000 Hz and thus select a sample rate of 8000 Hz. Please note that this is not entirely correct: "s" and "sh" sounds reach higher frequencies. It's true that you don't need those frequencies to understand what the speaker is saying, but when extracting a voiceprint, it might be useful to use a broader spectrum. My preference goes to 22050 Hz. Some vocal password packages don't allow you to go below 11000 Hz.
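The "twice the highest frequency" rule can be written out as follows; the function names are hypothetical.

```python
def minimum_sample_rate(highest_frequency_hz):
    """Lowest sample rate that still captures the given frequency."""
    return 2 * highest_frequency_hz


def highest_representable_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


print(minimum_sample_rate(4000))               # 8000
print(highest_representable_frequency(22050))  # 11025.0
```

At 22050 Hz the recording keeps content up to 11025 Hz, well above the 4000 Hz usually assumed for speech.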

A word on bit depth: 8 bits vs 16 bits. While the sample rate is the precision regarding time, the bit depth relates to the precision of the amplitude. 8 bits gives you 256 values; 16 bits gives you 65536 values.

Needless to say, that is why you should use 16 bits for vocal biometrics :-)
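A short sketch of what the bit depth means for amplitude precision. The function names are hypothetical, and the step size assumes amplitudes normalized to the range -1 to 1.

```python
def quantization_levels(bits):
    """Number of distinct amplitude values for a given bit depth."""
    return 2 ** bits


def quantization_step(bits):
    """Smallest amplitude difference, assuming a signal normalized to [-1, 1)."""
    return 2 / quantization_levels(bits)


print(quantization_levels(8))    # 256
print(quantization_levels(16))   # 65536
print(quantization_step(8))      # 0.0078125
```

Going from 8 to 16 bits makes the smallest representable amplitude step 256 times finer.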

For reference, an audio CD uses 44100 Hz / 16 bit.

About vText: as I told you earlier, the Fourier transform (FFT) is not something I've found to be usable on large data sets. It lacks precision.

Here it looks like something goes wrong when delegating calculations to MATLAB. Without the code, IMHO, it's nearly impossible to give you more information.

Don't hesitate to ask for clarification on anything I said; I might take some things for granted and not realize they aren't that clear :-)

FWIW, I just wrote a speaker recognition tool in Java called Recognito. I don't believe it's much better than MARF in terms of recognition capabilities, but it's definitely easier on the user for the initial steps, uses a licensing model that doesn't require your software to be open source, and supports calls from multiple concurrent threads.

In case you want to give Recognito a shot: https://github.com/amaurycrickx/recognito
