Gender detection of the speaker from wave data of the audio

Problem description

I would like to add gender detection to a news video translator app I'm working on, so that the app can switch between a male and a female voice according to the voice on screen. I'm not expecting 100% accuracy. I used EZAudio to obtain the waveform data for a time period of audio, and used the average RMS value to set a threshold (cutOff) between male and female. Initially cutOff = 3.3.

    - (void)setInitialVoiceGenderDetectionParameters:(NSArray *)arrayAudioDetails
    {
        float initialMaleAvg = ((ConvertedTextDetails *)[arrayAudioDetails firstObject]).audioAverageRMS;
        // The average RMS value of a time period of Audio, say 5 sec
        float initialMaleVector = initialMaleAvg * 80;
        // MaleVector is the parameter to change the threshold according to different news clippings
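        // Use the scaled value below 5.3, half of it above 23, and 5.3 otherwise
        if (initialMaleVector < 5.3) {
            cutOff = initialMaleVector;
        } else if (initialMaleVector > 23) {
            cutOff = initialMaleVector / 2;
        } else {
            cutOff = 5.3;
        }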
    }

Initially adjustValue = -0.9 and tanCutOff = 0.45. The values 5.3, 23, cutOff, adjustValue and tanCutOff were obtained through rigorous testing. The tan of the values is used to magnify the differences between them.

    - (BOOL)checkGenderWithPeekRMS:(float)pRMS andAverageRMS:(float)aRMS
    {
        //pRMS is the peak RMS value in the audio snippet and aRMS is the average RMS value
        BOOL male = NO;
        if(tan(pRMS) < tanCutOff)
        {
            if(pRMS/aRMS > cutOff)
            {
                cutOff = cutOff + adjustValue;
                NSLog(@"FEMALE....");
                male = NO;
            }
            else
            {
                NSLog(@"MALE....");
                male = YES;
                cutOff = cutOff - adjustValue;
            }
        }
        else
        {
            NSLog(@"FEMALE.");
            male = NO;
        }

        return male;
    }

adjustValue is used to calibrate the threshold each time a news video is translated, since each video has a different noise level. But I know this method is noob-ish. What can I do to create a stable threshold? Or how can I normalise each audio snippet?

Alternative or more efficient ways to determine gender from audio wave data are also welcome.

Edit: Following Nikolay's suggestion, I researched gender recognition using CMU Sphinx. Can anybody suggest how I can extract MFCC features and feed them into a GMM/SVM classifier using OpenEars (CMU Sphinx for the iOS platform)?

Recommended answer

Accurate gender identification can be implemented with a GMM classifier over MFCC features. You can read about it here:

Age and Gender Recognition for Telephone Applications Based on GMM Supervectors and Support Vector Machines

To date, I am not aware of an open source implementation of this, though many of the components are available in open source speech recognition toolkits such as CMUSphinx.
