Simple speech recognition methods

Question

Yes, I'm aware that speech recognition is fairly complicated (to put it mildly). What I'm looking for is a method for distinguishing between maybe 20-30 phrases. The ability to split words (discrete speech is fine) would be nice, but isn't required. The software will be user-dependent (i.e. for use by me only). I'm not looking for existing software, but for a good way of doing this myself. I've looked into various existing methods, and splitting the sound into phonemes, while common, seems excessive for my needs.

For some context, I'm just looking for a way to control some aspects of my computer with a few simple voice commands. I'm aware that Windows already has speech recognition software, but I'd like to build this myself as a learning exercise. Commands would be simple, like "Open Google" or "Mute". What I had in mind (not sure if this is a good idea) is that some commands would be compound. So "Mute" would just be "Mute", whereas the "Open" command could be recognized on its own, and its suffixes (Google, Photoshop, etc.) recognized with another network/model/whatever. But I'm not sure whether looking for prefixes/word breaks in this way would produce better results than dealing with a larger number of individual commands.
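Both the optional word splitting and the compound-command idea depend on finding word breaks. For discrete speech (words separated by short pauses), one common and simple approach is to threshold the short-time energy of the signal. The sketch below assumes a mono recording already loaded as a NumPy array of samples; the function name `split_words` and its parameters (`frame_ms`, `threshold_ratio`, `min_gap_frames`) are hypothetical, and the default values are illustrative guesses that would need tuning for a real microphone and room.

```python
import numpy as np

def split_words(signal, sample_rate, frame_ms=20, threshold_ratio=0.1, min_gap_frames=5):
    """Split a discrete-speech recording into (start, end) sample ranges.

    Hypothetical helper: a frame counts as "voiced" when its mean energy
    exceeds threshold_ratio * (loudest frame's energy), and a word ends after
    min_gap_frames consecutive silent frames. Values are assumptions.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return []
    # Chop the signal into non-overlapping frames (drop any trailing partial frame).
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)  # short-time energy per frame
    voiced = energy > threshold_ratio * energy.max()

    segments, start, silence_run = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i  # a new word begins
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_gap_frames:
                end = i - silence_run + 1  # one past the last voiced frame
                segments.append((start * frame_len, end * frame_len))
                start, silence_run = None, 0
    if start is not None:  # recording ended mid-word
        segments.append((start * frame_len, (n_frames - silence_run) * frame_len))
    return segments
```

Under this assumption, the first segment of "Open Google" could be passed to a small classifier that only knows the prefixes ("Open", "Mute", ...), and the remaining segment to a second classifier that only knows the targets, which keeps each classifier's vocabulary small.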

I've been looking into perceptrons, Hopfield networks (though from what I understand they're somewhat obsolete) and HMMs, and while I understand the ideas behind them (I've implemented ANNs before), I don't really know which is best suited to this task. I'm assuming that learning vector quantization (LVQ) models would also be appropriate, but I can't find much literature on using them for this. Any guidance/resources would be greatly appreciated.

Answer

Some time ago, I read a white paper about a limited-vocabulary system that used a simple recognition process. The system divided each utterance into a small number of bins (6 in time and 4 in magnitude, if I remember correctly, for 24 in total), and all it did was count the number of audio samples falling in each bin. A fuzzy-logic rule base then interpreted each utterance's 24 bin counts and produced an interpretation.
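The binning step can be sketched as a 2-D histogram over (time slice, magnitude level). The white paper's exact definitions aren't given, so this is a minimal sketch under stated assumptions: "magnitude" is taken to be the absolute sample value normalized by the utterance's peak, both axes use equal-width bins, and the function name `bin_counts` is hypothetical. The fuzzy-logic rule base is not reproduced here.

```python
import numpy as np

def bin_counts(signal, time_bins=6, mag_bins=4):
    """Return a flat vector of time_bins * mag_bins sample counts.

    Assumptions (not from the white paper): magnitude = |sample| / peak,
    and all bins are equal width. `signal` must be non-empty.
    """
    x = np.abs(np.asarray(signal, dtype=float))
    peak = x.max()
    if peak > 0:
        x = x / peak  # normalize so loud and quiet utterances are comparable
    # Time bin of each sample: split the utterance into equal-length slices.
    t_idx = np.arange(len(x)) * time_bins // len(x)
    # Magnitude bin 0..mag_bins-1; a sample equal to the peak lands in the top bin.
    m_idx = np.minimum((x * mag_bins).astype(int), mag_bins - 1)
    counts = np.zeros((time_bins, mag_bins), dtype=int)
    np.add.at(counts, (t_idx, m_idx), 1)  # unbuffered increment per sample
    return counts.ravel()
```

The result for the default settings is the 24-element count vector described above; its entries always sum to the number of samples in the utterance.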

I imagine that (for some applications) a simple matching process might work just as well, in which the 24 bin counts of the current utterance are simply matched against those of each of your stored prototypes, and the one with the least overall difference is the winner.
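A minimal sketch of that nearest-prototype matching, assuming each command's prototype is a bin-count vector you recorded earlier and that "overall difference" means the sum of absolute differences (squared differences would be an equally reasonable choice). The function name `classify` and the dictionary layout are hypothetical.

```python
import numpy as np

def classify(counts, prototypes):
    """Return the label whose stored prototype differs least from `counts`.

    `prototypes` maps a command label (e.g. "mute") to a bin-count vector.
    Distance here is the sum of absolute differences (an assumption).
    """
    counts = np.asarray(counts, dtype=float)
    best_label, best_dist = None, float("inf")
    for label, proto in prototypes.items():
        dist = np.abs(counts - np.asarray(proto, dtype=float)).sum()
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For example, with `prototypes = {"mute": [10, 0, 0, 10], "open": [0, 10, 10, 0]}`, calling `classify([9, 1, 0, 11], prototypes)` returns `"mute"`. Because raw counts grow with utterance length, dividing each vector by its sum before matching may make the comparison less sensitive to how quickly a command was spoken.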
