Convert sound to a list of phonemes in Python

Question

How do I convert any sound signal to a list of phonemes?

I.e., the actual methodology and/or code to go from a digital signal to the list of phonemes that the sound recording is made of, e.g.:

lPhonemes = audio_to_phonemes(aSignal)

where, for example:

from scipy.io.wavfile import read
iSampleRate, aSignal = read(sRecordingDir)

# aSignal is a numpy array for the recorded word 'hear'; the desired output is:
lPhonemes = ['HH', 'IY1', 'R']

I need the function audio_to_phonemes.

Not all sounds are language words, so I cannot just use something that uses the Google API, for example.

Edit
I don't want audio to words, I want audio to phonemes. Most libraries don't seem to output that. Any library you recommend needs to be able to output the ordered list of phonemes that the sound is made up of, and it needs to be in Python.

I would also love to know how the process of going from sound to phonemes works, if not for implementation purposes, then for interest's sake.

Recommended answer

Accurate phoneme recognition is not easy to achieve because phonemes themselves are pretty loosely defined. Even on good audio, the best systems today have a phoneme error rate of about 18% (see the LSTM-RNN results on TIMIT published by Alex Graves).
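To make the error-rate figure concrete: phoneme error rate is conventionally the edit distance (substitutions, insertions, and deletions) between the reference phoneme sequence and the recognized one, divided by the length of the reference. The following is a minimal sketch of that calculation, not the scoring procedure used in any particular paper; the function names and the sample hypothesis are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme labels."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative data: the reference for 'hear' and a hypothetical recognizer output.
reference = ['HH', 'IY', 'R']
hypothesis = ['HH', 'IH', 'R']
per = phoneme_error_rate(reference, hypothesis)
print("PER = %.2f" % per)  # one substitution out of three phonemes
```

Because the rate is normalized by the reference length, insertions can push it above 100% on short references.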

In CMUSphinx, phoneme recognition in Python is done like this:

from os import environ, path

from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

MODELDIR = "../../../model"
DATADIR = "../../../test/data"

# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-allphone', path.join(MODELDIR, 'en-us/en-us-phone.lm.dmp'))
config.set_float('-lw', 2.0)
config.set_float('-beam', 1e-10)
config.set_float('-pbeam', 1e-10)

# Decode streaming data.
decoder = Decoder(config)

decoder.start_utt()
# Feed the headerless audio file to the decoder in 1024-byte chunks.
with open(path.join(DATADIR, 'goforward.raw'), 'rb') as stream:
    while True:
        buf = stream.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

# In allphone mode each decoded segment is a phoneme rather than a word.
print ('Best phonemes: ', [seg.word for seg in decoder.seg()])

You need to check out the latest pocketsphinx from GitHub in order to run this example. The result should look like this:

  ('Best phonemes: ', ['SIL', 'G', 'OW', 'F', 'AO', 'R', 'W', 'ER', 'D', 'T', 'AE', 'N', 'NG', 'IY', 'IH', 'ZH', 'ER', 'Z', 'S', 'V', 'SIL'])
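The example above reads a headerless .raw file, whereas the recording in the question is a WAV file, and the output includes 'SIL' (silence) segments that are not phonemes of the word. The sketch below shows one way to bridge both gaps. It uses the standard-library wave module rather than scipy to avoid extra dependencies, and it assumes the acoustic model expects 16 kHz, 16-bit mono audio. The helper names wav_to_raw_pcm and strip_non_speech are hypothetical and not part of pocketsphinx.

```python
import wave


def wav_to_raw_pcm(wav_path, expected_rate=16000):
    """Return the headerless PCM bytes of a WAV file after checking its format.

    Assumption: the acoustic model expects 16 kHz, 16-bit, mono audio.
    """
    with wave.open(wav_path, 'rb') as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError("expected %d Hz audio" % expected_rate)
        return wav.readframes(wav.getnframes())


def strip_non_speech(segments, non_speech=('SIL',)):
    """Drop non-speech labels such as silence from a decoded segment list."""
    return [label for label in segments if label not in non_speech]


# Illustrative input shaped like the decoder output shown above.
decoded = ['SIL', 'G', 'OW', 'F', 'AO', 'R', 'W', 'ER', 'D', 'SIL']
print(strip_non_speech(decoded))
```

The bytes returned by wav_to_raw_pcm could then be passed to decoder.process_raw in place of the chunks read from goforward.raw. If the decoder emits other filler labels, they can be added to the non_speech tuple.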

See also the wiki page.
