Audio analysis to detect human voice, gender, age and emotion: any prior open-source work done?

Question

Is there prior open-source work in the field of audio analysis to detect the human voice (say, in spite of some background noise), determine the speaker's gender, and possibly determine the number of speakers, the age of the speaker(s), and the emotion of the speakers?

My hunch is that speech recognition software like CMU Sphinx could be a good place to start, but if there's something better, that would be great.

Answer

I'm a graduate student doing speech recognition research. These are open research problems, and, unfortunately, I'm not aware of open-source packages that can do these things out of the box.

If you have some background in implementing signal-processing or machine-learning algorithms, you could try looking up academic papers using some of these search terms:

  • gender identification (sometimes called gender recognition): predicting the gender of the speaker from the speech utterance
  • age identification: predicting the age of the speaker
  • speaker identification: predicting, from a set of possible speakers, the most likely speaker in a speech utterance
  • speaker verification: accepting or rejecting an utterance as belonging to a speaker (imagine a "voiceprint"-type authorization system)
  • speaker diarization: taking an audio file with multiple speakers and labeling which segments of speech belong to which speaker
  • emotion recognition: predicting the speaker's emotion from a speech utterance (a very new area of research).

According to the Sphinx-4 FAQ (http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html#speaker_identification), CMU Sphinx, which is probably the leading open-source speech recognizer out there, does not support speaker identification; I'm doubtful that it has any of the other capabilities described above.

Some academic researchers post their code online, and/or might be willing to share it with you. A search of Google Scholar reveals many people who've written Master's or PhD theses using Sphinx, so that could be a good place to start.

Lastly, if you know a little signal processing, you could try implementing a very crude gender-recognition algorithm without getting into the speech recognizer itself. Male and female voices differ mainly in their fundamental frequency: according to Wikipedia (http://en.wikipedia.org/wiki/Voice_frequency), a typical adult male voice has a fundamental frequency between 85 and 180 Hz, while a typical adult female voice lies between 165 and 255 Hz. You could use a tool like sox to compute the frequency spectrum of an utterance (using the fast Fourier transform) and classify the speech as "male" or "female" based on some summary statistic such as the average frequency (see http://classicalconvert.com/tag/sox/). To make this work robustly (i.e. across many speakers, microphones, or recording environments), there is a lot more you can do. I can't predict how much time and effort it would take to reach 70% accuracy, since that depends on the nature of your task; my sense is that 90%+ would be very hard.
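
A minimal Python/NumPy sketch of this kind of crude classifier follows; it is an illustration under stated assumptions, not a tested system. One deliberate deviation: instead of averaging the whole FFT spectrum (the spectral average of speech usually sits well above the fundamental because of harmonics and formants), it estimates the fundamental frequency of each short frame by FFT-based autocorrelation, a standard pitch-estimation technique swapped in here, and uses the median over frames as the summary statistic. The 172 Hz threshold, 40 ms frames and energy gate are illustrative guesses, and `synthetic_voice` is a hypothetical test helper; loading real audio (for example after converting it with sox) is left out.

```python
from typing import List, Tuple

import numpy as np

# Pitch search range (Hz), taken from the male/female ranges quoted above.
F0_MIN_HZ, F0_MAX_HZ = 85.0, 255.0
# Assumed decision threshold (Hz): roughly the middle of the 165-180 Hz band
# where the two quoted ranges overlap. Illustrative, not tuned.
THRESHOLD_HZ = 172.0


def estimate_f0(frame: np.ndarray, sample_rate: int) -> float:
    """Estimate one frame's fundamental frequency via FFT-based autocorrelation."""
    frame = frame - frame.mean()
    n = len(frame)
    # Zero-pad to 2n so the inverse FFT of the power spectrum gives the
    # linear (not circular) autocorrelation for lags 0..n-1.
    spectrum = np.fft.rfft(frame, 2 * n)
    autocorr = np.fft.irfft(np.abs(spectrum) ** 2)[:n]
    # Only consider lags (pitch periods) that fall inside the search range.
    min_lag = int(sample_rate / F0_MAX_HZ)
    max_lag = min(int(sample_rate / F0_MIN_HZ), n - 1)
    lag = min_lag + int(np.argmax(autocorr[min_lag:max_lag + 1]))
    return sample_rate / lag


def classify_gender(signal: np.ndarray, sample_rate: int,
                    frame_ms: float = 40.0) -> Tuple[str, float]:
    """Return ("male" or "female", median F0 in Hz) for a mono signal."""
    frame_len = int(sample_rate * frame_ms / 1000)
    frames: List[np.ndarray] = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    if rms.max() == 0:
        raise ValueError("signal is silent")
    # Crude voice-activity gate (assumption): skip frames far quieter than the loudest.
    voiced = [f for f, e in zip(frames, rms) if e > 0.1 * rms.max()]
    # Summary statistic: the median is less sensitive to outlier frames than the mean.
    median_f0 = float(np.median([estimate_f0(f, sample_rate) for f in voiced]))
    return ("male" if median_f0 < THRESHOLD_HZ else "female"), median_f0


def synthetic_voice(f0: float, sample_rate: int = 16000,
                    seconds: float = 1.0) -> np.ndarray:
    """Hypothetical test signal: a tone at f0 plus two weaker harmonics."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return sum((0.5 ** k) * np.sin(2 * np.pi * (k + 1) * f0 * t) for k in range(3))


if __name__ == "__main__":
    for f0 in (120.0, 210.0):
        label, est = classify_gender(synthetic_voice(f0), 16000)
        print(f"true F0 {f0:.0f} Hz -> estimated {est:.1f} Hz, {label}")
```

On clean synthetic tones like these the estimate should land within a few hertz of the true pitch; on real recordings, background noise, unvoiced consonants and octave errors will push it around, which is where the robustness work mentioned above comes in.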

Good luck!
