Speech to Phoneme in .NET


Question

The problem is that I want to get the phonemes of recorded speech in C#. Say you have an audio file such as "x.wav" in which someone says "hello dear Shamim". I want to extract all the phonemes of the speech and their relative timings, something like the picture below:

I used the System.Speech library (both the recognition and synthesis namespaces), but I didn't find what I wanted. Don't get me wrong: I don't want the phonemes of the known sentence "hello dear Shamim"; I want to extract the phonemes from an unknown audio input in which someone speaks an English sentence. I tried System.Speech.Recognition, but it tries to extract words from the audio file, not phonemes! And as you may have guessed, about 30% of the words are wrong! ;)

Recommended answer

Phoneme recognition requires a somewhat specialized setup compared to word recognition, and most engines don't support it directly (a dictionary of single-phoneme "words" doesn't usually give good accuracy). A big reason is that phoneme recognition is much less accurate than word recognition, because word recognition is more constrained: it filters out all phone combinations that don't map to real words, which is most of them. HTK does support it, though. You can use it by executing shell commands (there's nothing evil in doing that from C#) or by P/Invoking the libraries.
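The shell-command route can be sketched roughly as follows. This is a minimal sketch, not a tested HTK integration: the type and method names (PhoneSegment, HtkPhoneRecognizer, RunTool, ParseLabels) are made up for illustration, and it assumes .NET 5 or later and that the recognizer writes its result as an HTK label file with lines of the form "start end label score". HTK expresses label times in units of 100 ns, which happens to be the same unit as a .NET TimeSpan tick, so the conversion is direct.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;

// One recognized phone with its start and end time (hypothetical type).
public sealed record PhoneSegment(string Label, TimeSpan Start, TimeSpan End);

public static class HtkPhoneRecognizer
{
    // Runs an external command-line tool and returns its standard output.
    // Throws if the process cannot be started or exits with a non-zero code.
    public static string RunTool(string fileName, string arguments)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException($"Could not start {fileName}.");
        // Read stderr asynchronously so a full pipe cannot block the child.
        var stderrTask = process.StandardError.ReadToEndAsync();
        string stdout = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException(
                $"{fileName} exited with code {process.ExitCode}: {stderrTask.Result}");
        return stdout;
    }

    // Parses HTK label text ("start end label [score]" per line).
    // Header lines, quoted file names, the "." terminator and any
    // line without two leading integer times are skipped.
    public static List<PhoneSegment> ParseLabels(string labelText)
    {
        var segments = new List<PhoneSegment>();
        foreach (string rawLine in labelText.Split('\n'))
        {
            string line = rawLine.Trim();
            if (line.Length == 0 || line == "." || line[0] == '"' ||
                line.StartsWith("#!MLF!#", StringComparison.Ordinal))
                continue;

            string[] fields = line.Split(new[] { ' ', '\t' },
                                         StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length < 3 ||
                !long.TryParse(fields[0], NumberStyles.Integer,
                               CultureInfo.InvariantCulture, out long start) ||
                !long.TryParse(fields[1], NumberStyles.Integer,
                               CultureInfo.InvariantCulture, out long end))
                continue;

            // HTK time units are 100 ns, the same as TimeSpan ticks.
            segments.Add(new PhoneSegment(fields[2],
                                         TimeSpan.FromTicks(start),
                                         TimeSpan.FromTicks(end)));
        }
        return segments;
    }
}
```

A call might then look like RunTool("HVite", "-C config -H hmmdefs -w phone_loop.net -i x.mlf dict monophones x.wav") followed by ParseLabels(File.ReadAllText("x.mlf")). Every file name in that command is a placeholder: the configuration, the acoustic models, the phone-loop network, the dictionary and the phone list all have to be prepared separately, and whether HVite can read the WAV file directly depends on that configuration.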
