Non-Speech Noise or Sound Recognition Software?


Question

I'm working on some software for children, and I'm looking to add the ability for the software to respond to a number of non-speech sounds. For instance, clapping, barking, whistling, fart noises, etc.

I've used CMU Sphinx and the Windows Speech API in the past. However, as far as I can tell, neither of these has any support for non-speech noises, and in fact I believe they actively filter them out.

In general I'm looking for "How do I get this functionality?", but I suspect it may help if I break it down into three questions that are my guesses for what to search for next:

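  1. Is there a way to use one of the major speech recognition engines to recognize non-word sounds by changing the acoustic model or the pronunciation lexicon?
  2. (or) Is there already an existing library that does non-word noise recognition?
  3. (or) I have a little familiarity with Hidden Markov Models and the underlying technology of speech recognition from college, but no good estimate of how difficult it would be to create a very small noise/sound recognizer from scratch (suppose <20 noises to be recognized). If 1) and 2) fail, is there any estimate of how long it would take to roll my own?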

Thanks

Recommended Answer

Yes, you can use speech recognition software like CMU Sphinx to recognize non-speech sounds. For this, you need to create your own acoustic and language models and define a lexicon restricted to your task. But to train the corresponding acoustic model, you must have enough training data in which the sounds of interest are annotated.

In short, the sequence of steps is as follows:

First, prepare the resources for training: lexicon, dictionary, etc. The process is described here: http://cmusphinx.sourceforge.net/wiki/tutorialam. But in your case, you need to redefine the phoneme set and the lexicon. Namely, you should model fillers as real words (so, no ++ around them), and you don't need to define the full phoneme set. There are many possibilities, but probably the simplest one is to have a single model for all speech phonemes. Thus, your lexicon will look like:

CLAP CLAP
BARK BARK
WHISTLE WHISTLE
FART FART
SPEECH SPEECH
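
A lexicon of this shape can be generated from a plain list of labels. The following is a minimal sketch, assuming each sound class is written as one word mapped to a single phone of the same name, as in the listing above. The function names `build_lexicon` and `phone_set` are hypothetical helpers, not part of the CMU Sphinx tooling.

```python
def build_lexicon(labels):
    """Return lexicon lines that map each label to one phone of the same name.

    Assumption: one word per sound class and one phone per word,
    mirroring the "CLAP CLAP" layout shown in the answer.
    """
    seen = []
    for label in labels:
        word = label.strip().upper()
        if word and word not in seen:  # skip blanks and duplicates
            seen.append(word)
    return ["{0} {0}".format(word) for word in seen]


def phone_set(lexicon_lines):
    """Collect the distinct phones on the right-hand side of each lexicon line."""
    phones = []
    for line in lexicon_lines:
        for phone in line.split()[1:]:
            if phone not in phones:
                phones.append(phone)
    return phones


if __name__ == "__main__":
    lines = build_lexicon(["clap", "bark", "whistle", "fart", "speech"])
    print("\n".join(lines))
    print(phone_set(lines))
```

Keeping the label list in one place may help keep the lexicon, the phone list, and the annotations consistent with each other.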

Second, prepare training data with labels: something similar to VoxForge, but the text annotations must contain only labels from your lexicon. Of course, non-speech sounds must be labeled correctly as well. A good question here is where to get a large enough amount of such data, but I guess it should be possible.
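
One way to keep annotations restricted to the lexicon is to validate them while writing the transcription lines. This is a minimal sketch, assuming the `<s> ... </s> (file_id)` transcription layout shown in the CMU Sphinx training tutorial; the helper name `transcription_line`, the file id `rec_001`, and the `ALLOWED` set are hypothetical.

```python
# Assumption: the labels allowed in annotations are exactly the lexicon words.
ALLOWED = {"CLAP", "BARK", "WHISTLE", "FART", "SPEECH"}


def transcription_line(file_id, labels, allowed=ALLOWED):
    """Format one transcription line and reject labels outside the lexicon."""
    unknown = [label for label in labels if label not in allowed]
    if unknown:
        raise ValueError("labels not in lexicon: {}".format(", ".join(unknown)))
    return "<s> {} </s> ({})".format(" ".join(labels), file_id)


if __name__ == "__main__":
    # A recording with two claps followed by a bark.
    print(transcription_line("rec_001", ["CLAP", "CLAP", "BARK"]))
```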

Having that, you can train your model. The task is simpler than speech recognition: for instance, you don't need to use triphones, just monophones.

Assuming an equal prior probability for every sound/speech class, the simplest language model can be a loop-like grammar (http://cmusphinx.sourceforge.net/wiki/tutoriallm):

#JSGF V1.0;
/**
 * JSGF grammar: one or more sound labels, in any order
 */
grammar foo;
public <foo> = (CLAP | BARK | WHISTLE | FART | SPEECH)+ ;
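
The same loop grammar can be produced from the label list, so that it stays in sync with the lexicon. A minimal sketch follows; the function name `loop_grammar` is hypothetical, and the output only reproduces the JSGF structure shown above.

```python
def loop_grammar(labels, name="foo"):
    """Build a JSGF grammar that accepts one or more of the labels in any order."""
    if not labels:
        raise ValueError("at least one label is required")
    alternatives = " | ".join(labels)
    return "\n".join([
        "#JSGF V1.0;",
        "grammar {};".format(name),
        "public <{}> = ({})+ ;".format(name, alternatives),
    ])


if __name__ == "__main__":
    print(loop_grammar(["CLAP", "BARK", "WHISTLE", "FART", "SPEECH"]))
```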

This is the very basic approach to using an ASR toolkit for your task. It can be further improved by fine-tuning the HMM configurations, using statistical language models, and using fine-grained phoneme modeling (e.g. distinguishing vowels and consonants instead of having a single SPEECH model; this depends on the nature of your training data).

Outside the framework of speech recognition, you can build a simple static classifier that analyzes the input data frame by frame. Convolutional neural networks that operate over spectrograms perform quite well for this task.
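
To illustrate the frame-by-frame idea only, here is a minimal standard-library sketch. It is not a convolutional network and does not use spectrograms: as a stand-in, it computes two hand-crafted features per frame (log energy and zero-crossing rate), assigns each frame to the nearest class centroid, and takes a majority vote. All names, the frame sizes, and the synthetic tones are assumptions made for the example.

```python
import math
from collections import Counter


def frame_features(samples, frame_len=256, hop=128):
    """Return (log_energy, zero_crossing_rate) for each overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((math.log(energy + 1e-10), crossings / (frame_len - 1)))
    return feats


def train_centroids(examples):
    """examples maps label -> samples; returns label -> mean feature vector."""
    centroids = {}
    for label, samples in examples.items():
        feats = frame_features(samples)
        if not feats:
            raise ValueError("recording too short for label " + label)
        centroids[label] = tuple(sum(f[i] for f in feats) / len(feats) for i in range(2))
    return centroids


def classify(samples, centroids):
    """Label every frame by its nearest centroid, then take a majority vote."""
    votes = Counter()
    for feat in frame_features(samples):
        nearest = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(feat, centroids[lab])),
        )
        votes[nearest] += 1
    if not votes:
        raise ValueError("recording too short to classify")
    return votes.most_common(1)[0][0]


def tone(freq, n=2000, rate=8000):
    """Synthetic sine wave, used here only as toy input data."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


if __name__ == "__main__":
    model = train_centroids({"low": tone(200), "high": tone(2000)})
    print(classify(tone(220), model), classify(tone(1800), model))
```

A real system would replace the two features with spectrogram frames and the centroid rule with a trained model such as the CNN mentioned above; the framing and voting structure stays the same.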
