Finding pronunciation correctness

Question

I need to assess the "quality" of a user's pronunciation with the help of the Microsoft Speech SDK (System.Speech.Recognition). I am using the MS Speech Engine - US, so what I actually need is to find out how close the speaker's voice is to the "North American" accent.

One way of doing this is to check how close the user's voice is to the US English phonetic pronunciation. As mentioned in MSDN, it seems this comparison is done inside the speech SDK itself, so I need to get that information out. Since we can also supply the phonetic pronunciation to the engine ourselves, I am sure this is possible.

However, I have no clear idea of what I have to do. So, what can I do to find out the quality of the user's pronunciation, that is, how close it is to the US North American English phonetic pronunciation? The user will only have to speak pre-defined sentences like "Hello World. I am here".

Please help.

Update

I got some kind of "phonemes" (as mentioned in MSDN) by using the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Windows.Forms;
using System.IO;

namespace US_Speech_Recognizer
{
    public class RecognizeSpeech
    {
        private SpeechRecognitionEngine sEngine; //Speech recognition engine
        private SpeechSynthesizer sSpeak; //Speech synthesizer
        string text3 = "";

        public RecognizeSpeech()
        {
            //Make the recognizer ready
            sEngine = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));


            //Load grammar
            Choices sentences = new Choices();
            sentences.Add(new string[] { "I am hungry" });

            GrammarBuilder gBuilder = new GrammarBuilder(sentences);

            Grammar g = new Grammar(gBuilder);

            sEngine.LoadGrammar(g);

            //Add a handler
            sEngine.SpeechRecognized +=new EventHandler<SpeechRecognizedEventArgs>(sEngine_SpeechRecognized);


            sSpeak = new SpeechSynthesizer();
            sSpeak.Rate = -2;



            //Computer speaks the words to get the phones
            Stream stream = new MemoryStream();
            sSpeak.SetOutputToWaveStream(stream);


            sSpeak.Speak("I was hungry");
            stream.Position = 0;
            sSpeak.SetOutputToNull();


            //Configure the recognizer to stream
            sEngine.SetInputToWaveStream(stream);

            sEngine.RecognizeAsync(RecognizeMode.Single);


        }


        //Handle the SpeechRecognized event: collect the pronunciation of each recognized word
        private void sEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            string text = "";

            if (e.Result.Text == "I am hungry")
            {
                foreach (RecognizedWordUnit wordUnit in e.Result.Words)
                {
                    text = text + wordUnit.Pronunciation + "\n";
                }

                MessageBox.Show(e.Result.Text + "\n" + text);
            }


        }
    }
}

This is the code snippet directly related to the phonemes (extracted from the code above):

    //Handle the SpeechRecognized event: collect the pronunciation of each recognized word
    private void sEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        string text = "";

        if (e.Result.Text == "I am hungry")
        {
            foreach (RecognizedWordUnit wordUnit in e.Result.Words)
            {
                text = text + wordUnit.Pronunciation + "\n";
            }

            MessageBox.Show(e.Result.Text + "\n" + text);
        }


    }



The following is my output. The phonemes I got are displayed starting from the second line. The first line simply shows the recognized sentence.

So, please tell me: according to MSDN, these are "phonemes". Are these actually phonemes? I ask because I have never seen them written like this before.

The above code was written according to this link: http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.srgsgrammar.srgstoken.pronunciation(v=office.14).aspx

Recommended answer

OK, here's how I'd approach the problem.

First, load up the dictation engine with the Pronunciation topic, which will return the phonemes spoken by the user (in the Recognition event).

Second, get the reference phonemes for the word using the ISpEnginePronunciation::GetPronunciations method (as I outlined here).

Once you have the two sets of phonemes, you can compare them. Essentially, the phonemes are separated by spaces, and each phoneme is represented by a short tag (described in the American English Phoneme Representation spec).

Given this, you should be able to compute a score by comparing the two phoneme sequences with any of a number of approximate string matching schemes (e.g., Levenshtein distance).
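As a rough illustration of that comparison, here is a minimal sketch that treats each space-separated phoneme tag as one token and computes the Levenshtein distance over tokens rather than characters. The class and method names (PhonemeDistance, Tokenize, Levenshtein, Similarity) are made up for this example and are not part of any speech API, and normalizing the distance by the longer sequence length is just one possible way to turn it into a score.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of System.Speech or SAPI.
public static class PhonemeDistance
{
    // Splits a space-separated phoneme string into individual phoneme tags.
    public static string[] Tokenize(string phonemes)
    {
        return phonemes.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Levenshtein distance between two token sequences: the minimum number of
    // insertions, deletions and substitutions that turn a into b.
    // Generic, so it also works on integer arrays.
    public static int Levenshtein<T>(IReadOnlyList<T> a, IReadOnlyList<T> b)
    {
        EqualityComparer<T> cmp = EqualityComparer<T>.Default;
        int[] prev = new int[b.Count + 1];
        int[] curr = new int[b.Count + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Count; j++) prev[j] = j;

        for (int i = 1; i <= a.Count; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Count; j++)
            {
                int cost = cmp.Equals(a[i - 1], b[j - 1]) ? 0 : 1;
                int insertOrDelete = Math.Min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.Min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Count];
    }

    // Similarity in [0, 1]: 1.0 means identical sequences.
    // Assumption: normalize by the length of the longer sequence.
    public static double Similarity(string reference, string spoken)
    {
        string[] r = Tokenize(reference);
        string[] s = Tokenize(spoken);
        int longest = Math.Max(r.Length, s.Length);
        if (longest == 0) return 1.0;
        return 1.0 - (double)Levenshtein<string>(r, s) / longest;
    }
}
```

For example, PhonemeDistance.Similarity("h ah ng g r iy", "h ah n g r iy") differs in one of six tokens. The phoneme strings here are only illustrative; the real ones would come from the recognizer and from the reference pronunciation.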

You might find the problem simpler if you compare phone IDs rather than strings; ISpPhoneConverter::PhoneToId can convert the phoneme strings to an array of phone IDs, one ID per phoneme. That would give you a pair of null-terminated integer arrays, perhaps better suited to your comparison algorithm.

You could use the engine confidence to penalize matches, as low engine confidence indicates that the incoming audio doesn't closely match the engine's idea of the phoneme.
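One possible way to apply such a penalty is sketched below. It assumes you already have a phoneme similarity in [0, 1] from whatever comparison you chose, and an engine confidence in [0, 1] (System.Speech exposes one as e.Result.Confidence). The class name PronunciationScore, the Combine method, and the weight parameter are all made up for this example; the linear penalty is an arbitrary choice, not something the SDK prescribes.

```csharp
using System;

// Hypothetical helper, not part of System.Speech or SAPI.
public static class PronunciationScore
{
    // Scales the phoneme similarity down as the engine confidence drops.
    // weight = 0 ignores confidence; weight = 1 multiplies similarity by confidence.
    public static double Combine(double phonemeSimilarity, double engineConfidence, double weight)
    {
        double s = Clamp01(phonemeSimilarity);
        double c = Clamp01(engineConfidence);
        double w = Clamp01(weight);
        return s * (1.0 - w * (1.0 - c));
    }

    // Restricts a value to the range [0, 1].
    private static double Clamp01(double value)
    {
        return Math.Max(0.0, Math.Min(1.0, value));
    }
}
```

With weight set to 1, a similarity of 0.8 at confidence 0.5 is halved; a smaller weight softens the penalty. The right weight is something you would have to tune against real recordings.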
