Continuous Speech Recognition on browser like "ok google" or "hey siri"


Question

I am doing a POC, and my requirement is to implement a feature like "OK Google" or "Hey Siri" in the browser.

I am using the Chrome browser's Web Speech API. One thing I noticed is that I can't keep the recognition running continuously: it terminates automatically after a certain period of time, which I understand is a security measure. As a workaround, when the SpeechRecognition terminates I start it again from its end event, but this isn't a good solution: if I open two instances of the same application in different browser tabs, or run another application in the browser that also uses speech recognition, neither behaves as expected. I am looking for the best approach to solve this problem.
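
To make that workaround concrete, it amounts to something like the following (a minimal sketch assuming Chrome's webkitSpeechRecognition, not the asker's actual code):

// minimal sketch of the restart-on-end hack (assumes Chrome's webkitSpeechRecognition)
const recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';
recognition.continuous = true;
recognition.interimResults = false;

recognition.onresult = e => {
  // handle the transcripts here
};

// Chrome ends the session after a while; restarting it from the end event
// keeps it listening, but breaks down when several tabs/apps need the mic at once
recognition.onend = () => recognition.start();

recognition.start();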

Thanks in advance.

Answer

Since your problem is that you can't run the SpeechRecognition continuously for long periods of time, one way would be to start the SpeechRecognition only when you get some input in the mic.

This way you start the SR only when there is some input, and listen for your magic_word.
If the magic_word is found, then you can use the SR normally for your other tasks.

This can be detected with the Web Audio API, which is not bound by the time restriction the SR suffers from. You can feed it a LocalMediaStream from MediaDevices.getUserMedia.

For more info on the script below, you can see this answer.

Here is how you could attach it to a SpeechRecognition:

const magic_word = ##YOUR_MAGIC_WORD##;

// initialize our SpeechRecognition object
let recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;
recognition.continuous = true;

// detect the magic word
recognition.onresult = e => {
  // extract all the transcripts
  var transcripts = [].concat.apply([], [...e.results]
    .map(res => [...res]
      .map(alt => alt.transcript)
    )
  );
  if (transcripts.some(t => t.indexOf(magic_word) > -1)) {
    // do something awesome, like starting your own command listeners
  }
  else {
    // didn't understand...
  }
};
// called when we detect silence
function stopSpeech() {
  recognition.stop();
}
// called when we detect sound
function startSpeech() {
  try { // calling it twice will throw...
    recognition.start();
  }
  catch (e) {}
}
// request a LocalMediaStream
navigator.mediaDevices.getUserMedia({ audio: true })
  // add our listeners
  .then(stream => detectSilence(stream, stopSpeech, startSpeech))
  .catch(e => console.log(e.message));


function detectSilence(
  stream,
  onSoundEnd = _ => {},
  onSoundStart = _ => {},
  silence_delay = 500,
  min_decibels = -80
) {
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  const streamNode = ctx.createMediaStreamSource(stream);
  streamNode.connect(analyser);
  analyser.minDecibels = min_decibels;

  const data = new Uint8Array(analyser.frequencyBinCount); // will hold our data
  let silence_start = performance.now();
  let triggered = false; // trigger only once per silence event

  function loop(time) {
    requestAnimationFrame(loop); // check again on the next animation frame
    analyser.getByteFrequencyData(data); // get current data
    if (data.some(v => v)) { // if there is data above the given db limit
      if (triggered) {
        triggered = false;
        onSoundStart();
      }
      silence_start = time; // set it to now
    }
    if (!triggered && time - silence_start > silence_delay) {
      onSoundEnd();
      triggered = true;
    }
  }
  loop(performance.now()); // pass a timestamp so the first iteration has one to compare against
}

Here it is as a plunker, since neither StackSnippets' nor jsfiddle's iframes will allow gUM in two versions...
