Encoding MP3 from an audio stream of PyTTS


Question

I am working on text-to-speech conversion, turning text into MP3 audio files, using Python 2.5.

I use pyTTS as the Python text-to-speech module to convert text into .wav audio files (pyTTS cannot encode to MP3 directly). Afterwards, I encode those WAV files to MP3 with the lame command-line encoder.

Now the problem is that I would like to insert, at a particular point in an MP3 audio file (between two words), an external sound file (such as a warning sound) or, if possible, a generated warning sound.

My questions are:

1) I have seen that PyTTS can save the audio stream either to a file or to a memory stream, using two functions:

tts.SpeakToWave(file, text) or tts.SpeakToMemory(text)

Using the tts.SpeakToMemory(text) function together with PyMedia, I have been able to save an MP3 directly, but on playback the MP3 file sounds incomprehensible, like Donald Duck! :-) Here is a snippet of the code:

    params = {'id': acodec.getCodecID('mp3'), 'bitrate': 128000, 'sample_rate': 44100, 'ext': 'mp3', 'channels': 2}

    m = tts.SpeakToMemory(p.Text)
    soundBytes = m.GetData()

    enc = acodec.Encoder(params)

    frames = enc.encode(soundBytes)
    f = file("test.mp3", 'wb')
    for frame in frames:
        f.write(frame)
    f.close()

I cannot understand where the problem is. If this approach worked correctly, it would let me skip the intermediate WAV file conversion step.

2) As a second problem, I need to concatenate the MP3 audio file (obtained from the text-to-speech module) with a particular warning sound.

Obviously, it would be great if I could concatenate the in-memory audio stream of the text (after the text-to-speech module) with the stream of a warning sound, before encoding the whole in-memory audio stream into a single MP3 file.

I have also seen that the tksnack libraries can concatenate audio, but they are not able to write MP3 files.

I hope I have been clear. :-)

Many thanks for your answers to my questions.

Giulio

Recommended answer

I don't think PyTTS produces default PCM data (i.e. 44100 Hz, stereo, 16-bit). You should check the format like this:

memStream = tts.SpeakToMemory("some text")
format = memStream.Format.GetWaveFormatEx()

...and hand it over correctly to acodec. To do that, you can use the attributes format.Channels, format.BitsPerSample and format.SamplesPerSec.
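
A minimal sketch of that hand-over, written so it runs without pyTTS or PyMedia installed. The helper names `encoder_params_from_format` and `playback_speedup`, and the `FakeFormat` stand-in, are hypothetical and only model the three attributes named above; in real code you would presumably pass the object returned by `memStream.Format.GetWaveFormatEx()` and the value of `acodec.getCodecID('mp3')` from the question. The second helper illustrates why wrongly declared parameters could produce the "Donald Duck" effect, under the assumption that the bit depth is the same on both sides.

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by GetWaveFormatEx();
# only the three attributes named in the answer are modelled.
FakeFormat = namedtuple("FakeFormat", "Channels BitsPerSample SamplesPerSec")


def encoder_params_from_format(fmt, codec_id, bitrate=128000):
    """Build the encoder parameter dict from the actual PCM format
    instead of hard-coding 44100 Hz stereo."""
    return {
        "id": codec_id,
        "bitrate": bitrate,
        "sample_rate": fmt.SamplesPerSec,
        "channels": fmt.Channels,
        "ext": "mp3",
    }


def playback_speedup(actual, declared_rate, declared_channels):
    """Return how many times too fast the audio plays when the encoder is
    told the wrong sample rate / channel count (same bit depth assumed)."""
    actual_values_per_sec = actual.SamplesPerSec * actual.Channels
    declared_values_per_sec = declared_rate * declared_channels
    return declared_values_per_sec / float(actual_values_per_sec)


# Illustrative values only, not a documented PyTTS default.
example = FakeFormat(Channels=1, BitsPerSample=16, SamplesPerSec=22050)
params = encoder_params_from_format(example, codec_id=None)
speedup = playback_speedup(example, declared_rate=44100, declared_channels=2)
```

If the engine happened to emit 22050 Hz mono, for instance, declaring 44100 Hz stereo would consume the samples four times too quickly, which would be consistent with the sped-up, high-pitched result described in the question.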

As to your second question, if the sounds are in the same format, you should be able to simply pass them all to enc.encode, one after another.
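
As a rough illustration of the "same format" requirement, the sketch below generates a warning beep as raw PCM and splices it into a speech buffer before anything is encoded. It uses only the standard library and modern Python (it would need adapting for Python 2.5). The function names `make_beep` and `insert_pcm` are hypothetical, and 16-bit signed little-endian samples are an assumption; check format.BitsPerSample before relying on it.

```python
import math
import struct


def make_beep(sample_rate, channels, freq_hz=880.0, seconds=0.5, volume=0.5):
    """Return a sine-wave beep as 16-bit signed little-endian PCM bytes,
    generated with the same sample rate and channel count as the speech."""
    n_frames = int(sample_rate * seconds)
    peak = int(32767 * volume)
    out = bytearray()
    for i in range(n_frames):
        value = int(peak * math.sin(2.0 * math.pi * freq_hz * i / sample_rate))
        # Interleaved layout: repeat the same sample for every channel.
        out += struct.pack("<h", value) * channels
    return bytes(out)


def insert_pcm(speech, beep, at_seconds, sample_rate, channels,
               bytes_per_sample=2):
    """Insert `beep` into `speech` at `at_seconds`, cutting on a frame
    boundary so the channels stay aligned."""
    frame_size = channels * bytes_per_sample
    offset = int(at_seconds * sample_rate) * frame_size
    # Clamp to the last whole frame if the requested time is past the end.
    offset = min(offset, len(speech) - len(speech) % frame_size)
    return speech[:offset] + beep + speech[offset:]


# One second of silence stands in for the bytes returned by m.GetData().
speech = b"\x00\x00" * 22050
beep = make_beep(sample_rate=22050, channels=1)
combined = insert_pcm(speech, beep, at_seconds=0.5,
                      sample_rate=22050, channels=1)
```

The combined buffer could then be handed to enc.encode in place of the original bytes. Finding the exact time offset between two words is the hard part; a simpler route may be to synthesize the text before and after the insertion point separately and put the beep between the two buffers.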
