Android: Voice recognition

Question

[Possible duplicate] But I did not find answers to my questions below.

I have been researching voice recognition for the past two days and could not find answers to these questions:

  1. Is it possible to run voice recognition as a service? I would like to implement something like this: I need to call a number through voice recognition even while my phone is in sleep mode.
  2. Does voice recognition detect words reliably when I am on a train, bus, etc.?
  3. Is there any sensor that can detect a voice, apart from voice recognition?
  4. For voice recognition to work properly, does the user need to speak close to the phone?

Solution

1) Putting voice recognition into a service is the proper approach, as is done in the Google API, where callback methods deliver the results. To keep it running continuously, the service must hold a wake lock, which prevents the device from falling into sleep mode. Some more information is provided in the question "Wake locks android service recurring". This approach has one big disadvantage: high battery usage, caused by the CPU working continuously and by the continuous processing of incoming sound data. (This can be reduced with filters, thresholds, etc.)

2) Voice recognition is not a simple task. It requires a huge number of calculations and a large amount of reference data. If the input audio is not clear (noise, many human voices, etc.), it is harder to get proper output. Accuracy can be improved by filtering the input audio: noise suppression, a low-pass filter, etc. You cannot expect 100% accuracy, but 80-95% can be achieved.
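
As a rough illustration of the low-pass filtering mentioned above, here is a minimal sketch of a single-pole (exponential smoothing) filter over 16-bit PCM samples. The class name `LowPassFilter` and the parameter `alpha` are made up for this example; a real noise-suppression stage would be considerably more involved.

```java
final class LowPassFilter {
    /**
     * Smooths 16-bit PCM samples with a single-pole low-pass filter.
     * alpha is an assumed tuning parameter in (0, 1]:
     * smaller values attenuate high frequencies more strongly.
     */
    static short[] apply(short[] pcm, double alpha) {
        short[] out = new short[pcm.length];
        double y = 0.0; // filter state, starts from silence
        for (int i = 0; i < pcm.length; i++) {
            // Move the state a fraction of the way towards the new sample.
            y += alpha * (pcm[i] - y);
            out[i] = (short) Math.round(y);
        }
        return out;
    }
}
```

A slowly changing input passes through almost unchanged, while a rapidly alternating input (high-frequency content) comes out with a much smaller amplitude.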

Filtering out multiple human voices is harder. However, simple amplitude (audio strength level) algorithms with an adaptive threshold can be used to decide when a word begins and ends. The idea is that the voice of interest is the loudest one, i.e. the one nearest to the phone/device. So, regarding 4), accuracy is better when the user speaks close to the microphone, because that is then the loudest voice.
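
The amplitude idea could be sketched roughly as follows. This is only an assumption-laden illustration, not a production endpointer: the class name `EnergyEndpointer`, the parameters `ratio` and `adapt`, and the `MIN_THRESHOLD` constant are all hypothetical, and it assumes the first frame contains only background noise.

```java
import java.util.ArrayList;
import java.util.List;

final class EnergyEndpointer {
    // Assumed floor so that digital silence does not give a zero threshold.
    private static final double MIN_THRESHOLD = 50.0;

    /** Root-mean-square level of one frame of 16-bit PCM samples. */
    static double rms(short[] pcm, int start, int len) {
        double sum = 0.0;
        for (int i = start; i < start + len; i++) {
            sum += (double) pcm[i] * pcm[i];
        }
        return Math.sqrt(sum / len);
    }

    /**
     * Returns segments as {startFrame, endFrameExclusive}.
     * A frame counts as speech when its level exceeds ratio * noise floor;
     * the noise floor is adapted only during quiet frames.
     */
    static List<int[]> detect(short[] pcm, int frameSize, double ratio, double adapt) {
        List<int[]> segments = new ArrayList<>();
        int frames = pcm.length / frameSize;
        if (frames == 0) {
            return segments;
        }
        double noise = rms(pcm, 0, frameSize); // assumption: first frame is background
        int start = -1;                        // -1 means "not inside a word"
        for (int f = 0; f < frames; f++) {
            double level = rms(pcm, f * frameSize, frameSize);
            double threshold = Math.max(noise * ratio, MIN_THRESHOLD);
            if (level > threshold) {
                if (start < 0) {
                    start = f;                 // word begins
                }
            } else {
                if (start >= 0) {
                    segments.add(new int[] {start, f}); // word ends
                    start = -1;
                }
                noise += adapt * (level - noise);       // track slow changes in background
            }
        }
        if (start >= 0) {
            segments.add(new int[] {start, frames});    // word still open at end of buffer
        }
        return segments;
    }
}
```

Because the threshold follows the background level, the same code can tolerate a quiet room and a noisy bus, as long as the speaker is clearly louder than the surroundings. Quieter voices further from the microphone tend to stay below the threshold and are ignored.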

3) I don't know what you mean by sensor, but there are algorithms that simply detect a human voice rather than decode words. These algorithms are called Voice Activity Detection (VAD). Some code should be available in the Speex project documentation: http://www.speex.org/

The simplest way to handle voice recognition is to use the Google Speech API, which is pretty good and recognizes plenty of languages, but it needs an Internet connection and takes a while to return a result.
CMU Sphinx is faster, but it has few language models and needs more RAM and processor time, since all decoding is done on the device. In my opinion it is very good when the dictionary (the set of words to be recognized) is small, such as commands (left, right, backward, stop, start, etc.).
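
Whichever recognizer is used, a small command vocabulary also makes the result easy to interpret. The sketch below is not tied to the Google or CMU Sphinx APIs: it assumes the recognizer hands back a list of text hypotheses ordered from most to least likely, and the names `CommandMatcher` and `Command` are invented for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class CommandMatcher {
    /** The small vocabulary of supported commands. */
    enum Command { LEFT, RIGHT, BACKWARD, STOP, START }

    /**
     * Scans the hypotheses in order and returns the first command word found.
     * Returns an empty Optional when no hypothesis contains a known command.
     */
    static Optional<Command> match(List<String> hypotheses) {
        for (String hypothesis : hypotheses) {
            String[] words = hypothesis.trim().toLowerCase(Locale.ROOT).split("\\s+");
            for (String word : words) {
                for (Command command : Command.values()) {
                    if (command.name().toLowerCase(Locale.ROOT).equals(word)) {
                        return Optional.of(command);
                    }
                }
            }
        }
        return Optional.empty();
    }
}
```

Checking lower-ranked hypotheses helps when the top result is a near miss: a command word that only appears in the second or third hypothesis is still picked up.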
