Isolated word recognition without API

Problem description

I need to build a small project that can recognize a few words.
The problem is that I don't know how to format the data, or in what format to feed it into my neural network's input layer.
I need your help.

Recommended answer

API means "Application Programming Interface". Whatever your solution, you should have an API.

I believe what you mean is that you do not want to use an off-the-shelf isolated word speech recognition API.

What you are asking for is a significant project that will take some time, and the result will likely work much worse than well-developed existing solutions.

At the highest level, you want to do:

1. Feature extraction
2. Feature post-processing
3. Pattern recognition

For speech recognition, there are usually two phases of pattern recognition: first extract phoneme symbols from the features extracted from the audio stream, then extract words from the stream of phonemes. This is a non-trivial problem. The best existing solutions for speech recognition were invented by a man named Jim Baker, who wrote a classic paper on using Hidden Markov Models (HMMs) for speech recognition.
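To make the second phase concrete, here is a deliberately naive sketch of turning per-frame phoneme labels into a word. Everything in it is an assumption for illustration: the label names, the `LEXICON` table, and the `min_run` threshold are hypothetical, and a plain dictionary lookup stands in for what an HMM decoder would do far more robustly.

```python
from itertools import groupby

# Hypothetical lexicon: phoneme sequence -> word. A real system would use a
# pronunciation dictionary and a probabilistic decoder instead of exact lookup.
LEXICON = {
    ("y", "eh", "s"): "yes",
    ("n", "ow"): "no",
}


def collapse(frame_labels, min_run=2, silence="sil"):
    """Collapse per-frame labels into a phoneme sequence.

    Runs shorter than min_run frames are treated as classifier glitches and
    dropped; silence frames are dropped; repeated phonemes are merged.
    """
    phonemes = []
    for label, run in groupby(frame_labels):
        if label == silence or len(list(run)) < min_run:
            continue
        if not phonemes or phonemes[-1] != label:
            phonemes.append(label)
    return tuple(phonemes)


def recognize(frame_labels):
    """Return the matching word, or None if the sequence is not in LEXICON."""
    return LEXICON.get(collapse(frame_labels))


# Frame labels as a first-phase classifier might emit them (one per frame),
# including a one-frame glitch ("n") in the middle of the vowel.
frames = ["sil", "sil", "y", "y", "y", "eh", "eh", "n", "eh", "eh", "s", "s", "sil"]
word = recognize(frames)
```

The point is only the data flow: phase one produces one label per frame, phase two reduces that stream to a short symbol sequence and matches it against the known words.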

I have read that neural nets have been used with some success, but I believe the best systems, such as Sphinx, the open source Carnegie Mellon recognizer, use HMMs.

A relatively simple solution that can recognize a limited set of sufficiently different words is to take FFTs of overlapping windowed segments of the audio. For one such solution, sample at 16 kHz and compute 256-point Fast Fourier Transforms (FFTs) on frames that overlap by 128 samples. You can find open source FFT source code. Compute the power spectrum, and then train a pattern recognizer to convert frequency spectrum patterns/sequences to phonemes. Then use another pattern recognizer to convert phoneme sequences to words.
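As a rough sketch of that feature extraction step, the code below frames a signal into 256-sample windows that overlap by 128 samples and computes the power spectrum of each. The 16 kHz rate, frame size, and overlap come from the text above; the Hann window, the function names, and the synthetic test tone are my assumptions. The small hand-written FFT is only there to keep the example dependency-free; in practice you would use an existing FFT library.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # Hz
FRAME_SIZE = 256      # samples per FFT
HOP_SIZE = 128        # step between frames, so neighbours overlap by 128


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrogram(samples):
    """Return one row of FRAME_SIZE // 2 + 1 power values per frame."""
    # Assumption: a Hann window, to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / FRAME_SIZE)
              for i in range(FRAME_SIZE)]
    rows = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, HOP_SIZE):
        frame = [samples[start + i] * window[i] for i in range(FRAME_SIZE)]
        spectrum = fft(frame)
        # For real input, bins above FRAME_SIZE // 2 mirror the lower ones.
        rows.append([abs(c) ** 2 for c in spectrum[:FRAME_SIZE // 2 + 1]])
    return rows


# Synthetic input: 0.1 s of a 1 kHz tone instead of recorded speech.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(1600)]
spec = power_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each row of `spec` is a fixed-length vector of 129 values, which is one plausible answer to the formatting question: feed one row (or a few consecutive rows, often log-scaled) to the input layer per time step.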

Such a solution will work for a limited set of carefully chosen isolated words. Speech recognition is a very complex problem, and a more general solution requires more features, massive amounts of training data, and a lot of computing power to process that data. I have read that the most advanced solutions today use a cochlear ear model rather than an FFT, although FFT solutions can provide good results. I think the more advanced models are just better in noisy environments.

I realize that, if you don't know Digital Signal Processing, what I wrote above will only lead to more questions. A forum answer is too short to teach college courses in signal theory, filter theory, and pattern recognition. You will have to experiment with these yourself.

A classmate and I built an isolated word recognizer as a college project around 1980. We used hardware we built for feature extraction and a KIM-1 microcomputer to recognize specific words, and since we did not have much computing power, we used zero crossings for the feature extraction. It worked horribly, but it did work after a fashion. With an FFT and a lot of time spent tweaking, you should be able to get something useful today, but it will not compete with something like Sphinx, and even Sphinx, which is excellent, is not nearly as good as the commercial solutions I've seen.
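For comparison, zero-crossing features are cheap enough to compute on almost anything. The sketch below counts sign changes per frame; it is not a reconstruction of the 1980 project, and the frame size, hop size, and test tones are assumptions chosen to match the FFT setup described earlier.

```python
import math


def zero_crossing_counts(samples, frame_size=256, hop_size=128):
    """Count sign changes between neighbouring samples in each frame."""
    counts = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = samples[start:start + frame_size]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        counts.append(crossings)
    return counts


def tone(freq_hz, n_samples=1024, sample_rate=16_000):
    """Synthetic sine tone; the phase offset keeps samples off exact zeros."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate + 0.5)
            for n in range(n_samples)]


low = zero_crossing_counts(tone(500))
high = zero_crossing_counts(tone(2000))
```

A higher-pitched signal crosses zero more often, so `high` holds larger counts than `low`. That single number per frame is a very coarse stand-in for spectral content, which fits the experience described above of it working only after a fashion.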

