How to get Microsoft Azure Speech To Text to start transcribing when program is run? (Unity, C#)


Question

I am trying to build a simple app using Microsoft Azure's Cognitive Services Speech To Text SDK in Unity3D. I've been following this tutorial, and it worked quite well. The only problem with the tutorial is that the speech-to-text is activated by a button. When you press the button, it transcribes for the duration of one sentence, and you have to press the button again for it to transcribe again. I'd like it to start transcribing as soon as the program is run in Unity, rather than having to press a button each time I want to transcribe a sentence.

Here's the code.

    public async void ButtonClick()
    {
        // Creates an instance of a speech config with specified subscription key and service region.
        // Replace with your own subscription key and service region (e.g., "westus").
        var config = SpeechConfig.FromSubscription("[My API Key]", "westus");

        // Make sure to dispose the recognizer after use!
        using (var recognizer = new SpeechRecognizer(config))
        {
            lock (threadLocker)
            {
                waitingForReco = true;
            }

            // Starts speech recognition, and returns after a single utterance is recognized. The end of a
            // single utterance is determined by listening for silence at the end or until a maximum of 15
            // seconds of audio is processed.  The task returns the recognition text as result.
            // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
            // shot recognition like command or query.
            // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
            var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

            // Checks result.
            string newMessage = string.Empty;
            if (result.Reason == ResultReason.RecognizedSpeech)
            {
                newMessage = result.Text;
            }
            else if (result.Reason == ResultReason.NoMatch)
            {
                newMessage = "NOMATCH: Speech could not be recognized.";
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = CancellationDetails.FromResult(result);
                newMessage = $"CANCELED: Reason={cancellation.Reason} ErrorDetails={cancellation.ErrorDetails}";
            }

            lock (threadLocker)
            {
                message = newMessage;
                waitingForReco = false;
            }
        }
    }

    void Start()
    {
        if (outputText == null)
        {
            UnityEngine.Debug.LogError("outputText property is null! Assign a UI Text element to it.");
        }
        else if (startRecoButton == null)
        {
            message = "startRecoButton property is null! Assign a UI Button to it.";
            UnityEngine.Debug.LogError(message);
        }
        else
        {
            // Continue with normal initialization, Text and Button objects are present.
        }
    }

    void Update()
    {
        lock (threadLocker)
        {
            if (startRecoButton != null)
            {
                startRecoButton.interactable = !waitingForReco && micPermissionGranted;
            }
        }
    }

I've tried removing the Button object, but then the speech-to-text won't run.

Any tips or advice would be amazing. Thank you.

Recommended answer

Per the comments in the script of the tutorial you referenced:

// Starts speech recognition, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed.  The task returns the recognition text as result.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.

But it's not as simple as replacing 'RecognizeOnceAsync' with 'StartContinuousRecognitionAsync', because the two behave differently. RecognizeOnceAsync basically turns on your mic for a single utterance (a maximum of 15 seconds of audio) and then stops listening. StartContinuousRecognitionAsync keeps listening until you stop it and delivers its results through events rather than through a return value.

Instead, turn the button into a 'should I listen continuously or not?' toggle using StartContinuousRecognitionAsync and StopContinuousRecognitionAsync, and then change your Start function to simply create a new recognizer and have it wait for the speech recognizer's event to come through. Below is the script I used to enable this functionality:

using UnityEngine;
using UnityEngine.UI;
using Microsoft.CognitiveServices.Speech;

public class HelloWorld : MonoBehaviour
{
    public Text outputText;
    public Button startRecordButton;

    // PULLED OUT OF BUTTON CLICK
    SpeechRecognizer recognizer;
    SpeechConfig config;

    private object threadLocker = new object();
    private bool speechStarted = false; //checking to see if you've started listening for speech
    private string message;

    private bool micPermissionGranted = false;

    private void RecognizingHandler(object sender, SpeechRecognitionEventArgs e)
    {
        lock (threadLocker)
        {
            message = e.Result.Text;
        }
    }
    public async void ButtonClick()
    {
        if (speechStarted)
        {
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false); // this stops the listening when you click the button, if it's already on
            lock(threadLocker)
            {
                speechStarted = false;
            }
        }
        else
        {
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false); // this will start the listening when you click the button, if it's already off
            lock (threadLocker)
            {
                speechStarted = true;
            }
        }

    }

    void Start()
    {
        startRecordButton.onClick.AddListener(ButtonClick);
        config = SpeechConfig.FromSubscription("KEY", "REGION");
        recognizer = new SpeechRecognizer(config);
        recognizer.Recognizing += RecognizingHandler;
    }

    void Update()
    {

        lock (threadLocker)
        {
            if (outputText != null)
            {
                outputText.text = message;
            }
        }
    }
}

And below is a gif of me using this functionality. You'll note that I don't click the button at all (it was only clicked once, prior to the gif being recorded). Also, sorry for the strange sentences; my coworkers kept interrupting to ask who I was talking to.
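
If the goal is to drop the button entirely, as the question asks, one possible variation on the script above is to call StartContinuousRecognitionAsync from Start itself. The following is only a sketch under a few assumptions: the class name ContinuousTranscriber is hypothetical, "KEY" and "REGION" are placeholders as before, and microphone permission is assumed to be granted already (for example when running in the Unity editor on a desktop). It also subscribes to the Recognized and Canceled events, which the script above does not use, so that finished utterances and errors can be told apart from the partial results delivered by Recognizing.

```csharp
using UnityEngine;
using UnityEngine.UI;
using Microsoft.CognitiveServices.Speech;

// Hypothetical component name; attach it to any GameObject in the scene.
public class ContinuousTranscriber : MonoBehaviour
{
    public Text outputText;

    private SpeechRecognizer recognizer;
    private readonly object threadLocker = new object();
    private string message = string.Empty;

    private async void Start()
    {
        // Placeholders: replace with your own subscription key and region.
        var config = SpeechConfig.FromSubscription("KEY", "REGION");
        recognizer = new SpeechRecognizer(config);

        recognizer.Recognizing += OnRecognizing; // partial text while speaking
        recognizer.Recognized += OnRecognized;   // final text per utterance
        recognizer.Canceled += OnCanceled;       // errors or end of stream

        // Start listening as soon as the scene runs; no button click needed.
        await recognizer.StartContinuousRecognitionAsync();
    }

    // SDK events are raised on a background thread, so only store the text
    // here and let Update() touch the UI on Unity's main thread.
    private void OnRecognizing(object sender, SpeechRecognitionEventArgs e)
    {
        lock (threadLocker) { message = e.Result.Text; }
    }

    private void OnRecognized(object sender, SpeechRecognitionEventArgs e)
    {
        if (e.Result.Reason == ResultReason.RecognizedSpeech)
        {
            lock (threadLocker) { message = e.Result.Text; }
        }
    }

    private void OnCanceled(object sender, SpeechRecognitionCanceledEventArgs e)
    {
        lock (threadLocker)
        {
            message = $"CANCELED: Reason={e.Reason} ErrorDetails={e.ErrorDetails}";
        }
    }

    private void Update()
    {
        lock (threadLocker)
        {
            if (outputText != null)
            {
                outputText.text = message;
            }
        }
    }

    private async void OnDestroy()
    {
        if (recognizer == null)
        {
            return;
        }

        recognizer.Recognizing -= OnRecognizing;
        recognizer.Recognized -= OnRecognized;
        recognizer.Canceled -= OnCanceled;

        // Stop listening and release the microphone when the object goes away.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        recognizer.Dispose();
    }
}
```

With this variation, the only scene setup needed is assigning a UI Text element to outputText. On platforms that ask for microphone permission at runtime, the start call would probably need to wait until that permission has been granted.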
