How "ok google" technology is implemented

Question

I've read a little about speech/voice recognition, and I wonder how it works, for instance "ok Google" on Android and similar cases.

I would like to know how it works: how to isolate and analyze a word in a continuous feed to find out whether it's a keyword. If I think about it as a continuous text feed, one way of doing it would be to isolate a given length of the feed and then look for a keyword in it. An audio feed is a little harder to understand, as there is no pure silence between words, and isolating a given length risks cutting a keyword at the beginning or at the end of the selected sub-feed. How does it work?

And finally, if you know of any libraries (C/C++ if possible) that are capable of doing this, I'd be glad to use them to implement a "keyword spotter".

Thank you.

Recommended answer

Keyword spotting is usually implemented with dynamic programming: you search for the best chunk of audio containing the keyword, considering all possible start points and all possible end points. You need to model both the keyword and its alternatives. Basically, at every moment in time you score both the keyword and other sounds, and once the probability of the keyword is higher than the probability of other speech, you raise the signal. The false alarm rate is controlled by a threshold. You do not need to handle silence specifically because it is covered by the "other speech" model. The algorithm is covered in detail in the following thesis:

http://eprints.qut.edu.au/37254/
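
As a rough illustration of the dynamic-programming idea described above, here is a minimal sketch. It is not the algorithm from the thesis or from any particular library. It assumes you already have, for every audio frame, a log-likelihood ratio between each keyword state and the "other speech" model; the names spot_keyword and toy_llr and the toy +1/-1 scores are made up for the example.

```python
NEG_INF = float("-inf")


def spot_keyword(llr, threshold):
    """Return (end_frame, score) pairs where the keyword score exceeds threshold.

    llr[t][s] is assumed to be
        log p(frame t | keyword state s) - log p(frame t | other speech),
    with keyword states ordered left to right.
    """
    n_states = len(llr[0])
    prev = [NEG_INF] * n_states
    hits = []
    for t, frame in enumerate(llr):
        cur = [NEG_INF] * n_states
        # State 0 either continues, or starts fresh at this frame. Starting
        # fresh means everything before was "other speech" (ratio 0), which
        # is how all possible start points are covered.
        cur[0] = frame[0] + max(prev[0], 0.0)
        for s in range(1, n_states):
            best_prev = max(prev[s], prev[s - 1])  # stay, or advance one state
            if best_prev > NEG_INF:
                cur[s] = frame[s] + best_prev
        # Checking the last state at every frame covers all possible ends.
        if cur[-1] > threshold:
            hits.append((t, cur[-1]))
        prev = cur
    return hits


def toy_llr(stream, keyword_states, match=1.0, mismatch=-1.0):
    """Toy scores: +1 when a frame symbol matches a keyword state, else -1."""
    return [[match if sym == st else mismatch for st in keyword_states]
            for sym in stream]


stream = list("xxookggxx")  # "x" stands for other speech
hits = spot_keyword(toy_llr(stream, "okg"), threshold=4.5)
print(hits)
```

Raising the threshold gives fewer false alarms but more misses, and lowering it does the opposite. Silence needs no special case here because it simply scores better under the "other speech" side of the ratio.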

For an implementation of keyword spotting, you can check out pocketsphinx and the pocketsphinx Android demo. It is a C library that can spot words in a continuous stream. You can find the tutorial here:

http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx

To spot a keyword from the microphone, you can try something simple like

  pocketsphinx_continuous -inmic yes -keyphrase "ok google" -kws_threshold 1e-20
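
Since the threshold controls the false alarm rate, you will probably want to try a few values. The sketch below only builds the command line shown above for several thresholds and prints it; it does not launch anything. The helper name kws_command and the list of thresholds are just for illustration, and the flags are the ones from the command above.

```python
import shlex


def kws_command(keyphrase, threshold, binary="pocketsphinx_continuous"):
    """Build the argv list for a microphone keyword-spotting run.

    threshold is passed through as a string such as "1e-20".
    """
    return [binary, "-inmic", "yes",
            "-keyphrase", keyphrase,
            "-kws_threshold", threshold]


# Hypothetical sweep: which values work depends on the keyphrase.
for threshold in ("1e-10", "1e-20", "1e-30"):
    cmd = kws_command("ok google", threshold)
    print(shlex.join(cmd))
```

If the binary is installed, each list can be passed to subprocess.run(cmd) to try that threshold.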

The original "Ok Google" technology is described in the following publication:

"Small-footprint keyword spotting using deep neural networks" by Guoguo Chen, Carolina Parada, and Georg Heigold

https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenSemester2201314/chen2014small.pdf
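
To give a feel for what happens after the neural network in that paper, here is a sketch of the posterior-handling step as I recall it: the per-frame label posteriors are smoothed with a moving average, and the confidence at a frame is the geometric mean, over the keyword labels, of the maximum smoothed posterior in a sliding window. The network itself is not shown. The function names, the toy posteriors, and the default window sizes (30 and 100 frames) are assumptions here, so check the paper for the exact definitions.

```python
def smooth_posteriors(post, w_smooth=30):
    """Average each label's posterior over the last w_smooth frames.

    post[j][i] is the posterior of label i at frame j; label 0 is assumed
    to be the filler ("everything else") label.
    """
    smoothed = []
    for j in range(len(post)):
        window = post[max(0, j - w_smooth + 1):j + 1]
        n_labels = len(post[j])
        smoothed.append([sum(f[i] for f in window) / len(window)
                         for i in range(n_labels)])
    return smoothed


def confidence(smoothed, j, w_max=100):
    """Geometric mean, over keyword labels, of the max smoothed posterior
    within the last w_max frames ending at frame j."""
    window = smoothed[max(0, j - w_max + 1):j + 1]
    n_labels = len(smoothed[j])
    product = 1.0
    for i in range(1, n_labels):  # skip label 0, the filler
        product *= max(f[i] for f in window)
    return product ** (1.0 / (n_labels - 1))


# Toy posteriors for the labels [filler, "ok", "google"].
post = ([[1.0, 0.0, 0.0]] * 3      # other speech
        + [[0.0, 1.0, 0.0]] * 2    # "ok"
        + [[0.0, 0.0, 1.0]] * 2)   # "google"
smoothed = smooth_posteriors(post, w_smooth=2)
scores = [confidence(smoothed, j, w_max=10) for j in range(len(post))]
print(scores)
```

A detection would be raised when the confidence crosses a threshold. Note that this scheme only needs each keyword label to peak somewhere inside the window, which keeps the decoding step very cheap.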

It is pretty advanced technology and, more importantly, it requires a lot of specific data for training.
