Free-form text with custom SRGS-based grammar


Question

I am trying to develop a voice-based application that accepts user input as speech and performs actions based on that input. This is my first venture into this technology, and I am learning as I build it.

I am using the Microsoft SAPI that ships with .NET 4 to recognize speech. So far, I have learned about the two modes it supports:


Speech recognition (SR) has two modes of operation:


  • Dictation mode: an unconstrained, free-form speech interpretation mode that uses a built-in grammar provided by the recognizer for a specific language. This is the default recognizer.

  • Grammar mode: matches spoken words to one or more specific context-free grammars (CFGs). A CFG is a structure that defines a specific set of words and the combinations of those words that can be used. In basic terms, a CFG defines the sentences that are valid for SR. Grammars must be supplied by the application, either as precompiled grammar files or at runtime as W3C Speech Recognition Grammar Specification (SRGS) markup or the older CFG specification. The Windows SDK includes a grammar compiler: gc.exe.
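
To make the grammar-mode description concrete, here is a minimal sketch of an SRGS grammar. It is not taken from the quoted documentation; the rule id `command` and the phrases are made up for illustration. A recognizer loaded with only this grammar would accept just the four sentences it spells out, such as "open the file" or "close the window".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example: rule id and phrases are illustrative only -->
<grammar version="1.0" xml:lang="en-US" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="command" scope="public">
    <!-- First word: exactly one of these verbs -->
    <one-of>
      <item>open</item>
      <item>close</item>
    </one-of>
    <!-- Followed by exactly one of these objects -->
    <one-of>
      <item>the file</item>
      <item>the window</item>
    </one-of>
  </rule>
</grammar>
```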

So essentially, the engine recognizes only the words I specify in the grammar. But I also want to allow some free-form text alongside the structured grammar. People's names are one example: to capture a name from speech, I would need to have that name specified in the grammar, which is not possible if the application is open for anyone to use.

Is there a way to extract text that is not already part of the grammar?

How can I get the system to recognize sentences such as "My name is Gary and I am 25 years old"? The name can be absolutely anything, so how do I define it in my grammar?

Recommended answer

You can mix dictation mode with grammar mode; see this example from MSDN:

http://msdn.microsoft.com/zh-cn/library/ms723634(v=vs.85).aspx

<GRAMMAR>
    <!-- command to handle first and last names with semantic properties -->
    <!-- By using semantic properties, the application can ignore all of
        the text returned, except for the text associated with the dictation
        tags' semantic properties "PID_FirstName" and "PID_LastName" -->
    <RULE ID="SubmitName" TOPLEVEL="ACTIVE">
        <P>
            my first name is
            <!-- Note the implicit maximum is only one word -->
            <DICTATION PROPID="PID_FirstName"/>
            and my last name is
            <!-- Note the explicit maximum is two words -->
            <DICTATION PROPID="PID_LastName" MAX="2"/>
        </P>
    </RULE>
</GRAMMAR>
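
Note that the MSDN example above is written in the SAPI 5 XML grammar format, not SRGS. Since the question asks about SRGS, here is a rough sketch of the same idea for the sentence in the question. It assumes the recognizer supports Microsoft's special `grammar:dictation` rule reference (System.Speech exposes it as `SrgsRuleRef.Dictation`); the rule id `introduce` is made up, and I have not verified this grammar against a particular engine.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch: assumes the engine understands the
     Microsoft-specific "grammar:dictation" rule reference -->
<grammar version="1.0" xml:lang="en-US" root="introduce"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="introduce" scope="public">
    <item>my name is</item>
    <!-- Free-form slot for the name -->
    <ruleref uri="grammar:dictation"/>
    <item>and I am</item>
    <!-- Free-form slot for the age; a fixed list of numbers
         would probably be more reliable here -->
    <ruleref uri="grammar:dictation"/>
    <item>years old</item>
  </rule>
</grammar>
```

If you build grammars in code instead of markup, `GrammarBuilder.AppendDictation()` in System.Speech serves the same purpose. Either way, the application still has to pick the dictated words out of the recognition result, for example by skipping the fixed phrases around them.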
