对于Watson的语音转文本Unity SDK,如何指定关键字? [英] For Watson's Speech-To-Text Unity SDK, how can you specify keywords?

查看:80
本文介绍了对于Watson的语音转文本Unity SDK,如何指定关键字?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在Watson的语音转文本Unity SDK中指定关键字,但是我不确定如何做到这一点.

I am trying to specify keywords in Watson's Speech-To-Text Unity SDK, but I'm unsure how to do this.

详细信息页面未显示示例(请参见此处:

The details page doesn't show an example (see here: https://www.ibm.com/watson/developercloud/doc/speech-to-text/output.shtml),

和其他论坛帖子都是针对Java应用程序编写的(请参见此处:

and other forum posts are written for Java applications (see here: How to specify phonetic keywords for IBM Watson speech2text service?).

我曾尝试像这样在识别"函数中创建的RecognizeRequest类中对这些值进行硬编码,但没有成功:

I've tried hard-coding these values in the RecognizeRequest class created in the "Recognize" function like so, but without success:

**编辑-此函数永远不会被调用-**

**EDIT - this function never gets called -- **

public bool Recognize(AudioClip clip, OnRecognize callback)
    {
        if (clip == null)
            throw new ArgumentNullException("clip");
        if (callback == null)
            throw new ArgumentNullException("callback");

        RESTConnector connector = RESTConnector.GetConnector(SERVICE_ID, "/v1/recognize");
        if (connector == null)
            return false;

        RecognizeRequest req = new RecognizeRequest();
        req.Clip = clip;
        req.Callback = callback;

        req.Headers["Content-Type"] = "audio/wav";
        req.Send = WaveFile.CreateWAV(clip);
        if (req.Send.Length > MAX_RECOGNIZE_CLIP_SIZE)
        {
            Log.Error("SpeechToText", "AudioClip is too large for Recognize().");
            return false;
        }
        req.Parameters["model"] = m_RecognizeModel;
        req.Parameters["continuous"] = "false";
        req.Parameters["max_alternatives"] = m_MaxAlternatives.ToString();
        req.Parameters["timestamps"] = m_Timestamps ? "true" : "false";
        req.Parameters["word_confidence"] = m_WordConfidence ? "true" :false";

        //these "keywords" and "keywords_threshold" and "keywordsThreshold" parameters
        //are just my guess for how to set these values            
        req.Parameters["keywords"] = new string[] {"fun", "match", "test" };
        req.Parameters["keywordsThreshold"] = .2;
        req.Parameters["keywords_threshold"] = .2;
        //end my test insertions

        req.OnResponse = OnRecognizeResponse;

        return connector.Send(req);
    }

,但是返回的SpeechRecognitionEvent结果值不包含任何keywords_result.这是我的目标.我正在尝试像这样查看keywords_result对象中每个关键字的置信度,但是keywords_result对象以null的形式返回.

but the returned SpeechRecognitionEvent result value does not contain any keywords_result. This is my aim. I'm trying to view the confidence for each keyword in the keywords_result object like so, but the keywords_result object comes back as null.

private void OnRecognize(SpeechRecognitionEvent result) {
    Debug.Log("Recognizing!");
    m_ResultOutput.SendData(new SpeechToTextData(result));

    if (result != null && result.results.Length > 0) {
        if (m_Transcript != null)
            m_Transcript.text = "";

        foreach (var res in result.results) {
            //the res.keywords_result comes back as null
            foreach (var keyword in res.keywords_result.keyword) {
                string text = keyword.normalized_text;
                float confidence = keyword.confidence;
                Debug.Log(text + ": " + confidence);                                            
            }
        }
    }
}

有人在Unity或C#中使用Watson的语音转文本SDK成功实现了关键字置信度评估吗?欢迎提出所有想法和建议.

Has anyone successfully implemented Keyword Confidence Evaluation with Watson's Speech-To-Text SDK in Unity or C#? All ideas and suggestions are welcome.

PS,这是我的第一篇文章:)

PS This is my first post :)

推荐答案

原来,我需要在"SendStart"函数中指定关键字,如下所示:

Turns out I needed to specify the keywords in the "SendStart" function like so:

private void SendStart() {
        if (m_ListenSocket == null)
            throw new WatsonException("SendStart() called with null connector.");

        Dictionary<string, object> start = new Dictionary<string, object>();
        start["action"] = "start";
        start["content-type"] = "audio/l16;rate=" + m_RecordingHZ.ToString() + ";channels=1;";
        start["continuous"] = EnableContinousRecognition;
        start["max_alternatives"] = m_MaxAlternatives;
        start["interim_results"] = EnableInterimResults;
        start["word_confidence"] = m_WordConfidence;
        start["timestamps"] = m_Timestamps;

        //specify keywords here
        start["keywords"] = keywordsToCheck.ToArray();
        start["keywords_threshold"] = 0.05;
        //end additions here 

        m_ListenSocket.Send(new WSConnector.TextMessage(Json.Serialize(start)));
        m_LastStartSent = DateTime.Now;
    }

并在"ParseRecognizeResponse"函数中编写一些代码以正确解析keyword_results:

and write some code to parse the keyword_results properly in the "ParseRecognizeResponse" function:

private SpeechRecognitionEvent ParseRecognizeResponse(IDictionary resp){

        if (resp == null)
            return null;


        List<SpeechRecognitionResult> results = new List<SpeechRecognitionResult>();
        IList iresults = resp["results"] as IList;
        if (iresults == null)
            return null;

        foreach (var r in iresults)
        {
            IDictionary iresult = r as IDictionary;
            if (iresults == null)
                continue;

            SpeechRecognitionResult result = new SpeechRecognitionResult();

            //added this section, starting here
            IDictionary iKeywords_result = iresult["keywords_result"] as IDictionary;
            result.keywords_result = new KeywordResults();
            List<KeywordResult> keywordResults = new List<KeywordResult>();
            foreach (string key in keywordsToCheck) {
                if (iKeywords_result[key] != null) {
                    IList keyword_Results = iKeywords_result[key] as IList;
                    if (keyword_Results == null) {
                        continue;
                    }
                    foreach (var res in keyword_Results) {
                        IDictionary kw_resultDic = res as IDictionary;
                        KeywordResult keyword_Result = new KeywordResult();
                        keyword_Result.confidence = (double)kw_resultDic["confidence"];
                        keyword_Result.end_time = (double)kw_resultDic["end_time"];
                        keyword_Result.start_time = (double)kw_resultDic["start_time"];
                        keyword_Result.normalized_text = (string)kw_resultDic["normalized_text"];
                        keywordResults.Add(keyword_Result);
                    }
                }
            }
            result.keywords_result.keyword = keywordResults.ToArray();                   
            //ends here

            result.final = (bool)iresult["final"];

            IList ialternatives = iresult["alternatives"] as IList;
            if (ialternatives == null)
                continue;

            List<SpeechRecognitionAlternative> alternatives = new List<SpeechRecognitionAlternative>();
            foreach (var a in ialternatives)
            {
                IDictionary ialternative = a as IDictionary;
                if (ialternative == null)
                    continue;

                SpeechRecognitionAlternative alternative = new SpeechRecognitionAlternative();
                alternative.transcript = (string)ialternative["transcript"];
                if (ialternative.Contains("confidence"))
                    alternative.confidence = (double)ialternative["confidence"];

                if (ialternative.Contains("timestamps"))
                {
                    IList itimestamps = ialternative["timestamps"] as IList;

                    TimeStamp[] timestamps = new TimeStamp[itimestamps.Count];
                    for (int i = 0; i < itimestamps.Count; ++i)
                    {
                        IList itimestamp = itimestamps[i] as IList;
                        if (itimestamp == null)
                            continue;

                        TimeStamp ts = new TimeStamp();
                        ts.Word = (string)itimestamp[0];
                        ts.Start = (double)itimestamp[1];
                        ts.End = (double)itimestamp[2];
                        timestamps[i] = ts;
                    }

                    alternative.Timestamps = timestamps;
                }
                if (ialternative.Contains("word_confidence"))
                {
                    IList iconfidence = ialternative["word_confidence"] as IList;

                    WordConfidence[] confidence = new WordConfidence[iconfidence.Count];
                    for (int i = 0; i < iconfidence.Count; ++i)
                    {
                        IList iwordconf = iconfidence[i] as IList;
                        if (iwordconf == null)
                            continue;

                        WordConfidence wc = new WordConfidence();
                        wc.Word = (string)iwordconf[0];
                        wc.Confidence = (double)iwordconf[1];
                        confidence[i] = wc;
                    }

                    alternative.WordConfidence = confidence;
                }

                alternatives.Add(alternative);
            }
            result.alternatives = alternatives.ToArray();
            results.Add(result);
        }

        return new SpeechRecognitionEvent(results.ToArray());                        
    }

因此,现在,当OnRecognize通过此SpeechRecognitionEvent时,我已将用于显示单词替换项及其置信度分数的代码更改为显示关键字结果及其置信度分数,如下所示:

So that now, when OnRecognize gets passed this SpeechRecognitionEvent, I've changed the code for displaying word alternatives and their confidence score, to displaying keyword results and their confidence score, like so:

private void OnRecognize(SpeechRecognitionEvent result) {
    //Debug.Log("Recognizing!");
    m_ResultOutput.SendData(new SpeechToTextData(result));

    if (result != null && result.results.Length > 0) {
        if (m_Transcript != null)
            m_Transcript.text = "";

        foreach (var res in result.results) {
            //start keyword recognition changes here
            if (res.keywords_result != null) {
                if (res.keywords_result.keyword != null) {
                    foreach (var keyword in res.keywords_result.keyword) {
                        m_Transcript.text += string.Format("{0} ({1}, {2:0.00})\n",
                            keyword.normalized_text, res.final ? "Final" : "Interim", keyword.confidence);
                    }
                }
            }
            //end here                
        }
    }
}

请注意,使用关键字结果置信度值比进行一些硬编码检查(看沃森是否正与您的关键字匹配)然后在此处使用置信度值进行一些硬编码检查更有价值.当检查keyword_results.keyword [].confidence值时,置信度值要高得多,因为它已经在检查这些词了.这就是完成此过程并解析SpeechRecognitionEvent结果值以正确包含keyword_result值的动力.

Note, using the keyword results confidence values is much more valuable than doing some hardcoded check to see if the word alternatives Watson is getting match your keywords, and then using the confidence value there. The confidence values come back much higher when checking the keyword_results.keyword[].confidence values because it's already checking against those words. That was the impetus for going through with this process and parsing the SpeechRecognitionEvent result value to properly include the keywords_result values.

在某些背景下,我正在为有阅读障碍的孩子创建一个节奏游戏,以学习单词形成过程,所以认为Guitar Hero遇到了芝麻街.

For some background, I'm creating a rhythm game for children with dyslexia to learn word formation, so think Guitar Hero meets Sesame street.

这篇关于对于Watson的语音转文本Unity SDK,如何指定关键字?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆