Encoding MP3 from an audio stream of PyTTS
Question
I am working on text-to-speech, converting text into MP3 audio files, using Python 2.5.
I use pyTTS as the Python text-to-speech module to convert text into .wav audio files (pyTTS cannot encode to MP3 directly). Afterwards, I encode these WAV files to MP3 with the lame command-line encoder.
Now the problem is that I would like to insert a particular external sound file (such as a warning sound), or if possible a generated warning sound, at a particular point of an MP3 audio file, between two words.
The questions are:
1) I have seen that PyTTS can save the audio stream either to a file or to a memory stream, using two functions:
tts.SpeakToWave(file, text) or tts.SpeakToMemory(text)
Using the tts.SpeakToMemory(text) function together with PyMedia, I have been able to save an MP3 directly, but on playback the MP3 file sounds incomprehensible, like Donald Duck! :-) Here is a snippet of code:
params = {'id': acodec.getCodecID('mp3'), 'bitrate': 128000, 'sample_rate': 44100, 'ext': 'mp3', 'channels': 2}
m = tts.SpeakToMemory(p.Text)
soundBytes = m.GetData()
enc = acodec.Encoder(params)
frames = enc.encode(soundBytes)
f = file("test.mp3", 'wb')
for frame in frames:
    f.write(frame)
f.close()
I cannot understand where the problem is. If this approach worked correctly, it would let me skip the intermediate WAV file conversion step.
2) As a second problem, I need to concatenate the MP3 audio file (obtained from the text-to-speech module) with a particular warning sound.
Obviously, it would be great if I could concatenate the in-memory audio stream of the text (after the text-to-speech module) with the stream of a warning sound before encoding the whole in-memory audio stream into a single MP3 file.
I have also seen that the tksnack libraries can concatenate audio, but they are not able to write MP3 files.
I hope I have been clear. :-)
Many thanks for your answers to my questions.
Giulio
Recommended answer
I don't think PyTTS produces default PCM data (i.e. 44100 Hz, stereo, 16-bit). You should check the format like this:
memStream = tts.SpeakToMemory("some text")
format = memStream.Format.GetWaveFormatEx()
...and hand it over correctly to acodec. To do that, you can use the attributes format.Channels, format.BitsPerSample and format.SamplesPerSec.
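As a rough illustration of why a wrong format would produce the "Donald Duck" effect, here is a minimal sketch. It assumes a format object exposing the three attributes named above; `WaveFormat`, `build_encoder_params` and `speedup_factor` are hypothetical names introduced for this example, and the 16-bit check is an assumption about what the encoder expects, not something confirmed in this thread. In real use, `codec_id` would come from acodec.getCodecID('mp3') as in the question.

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by
# memStream.Format.GetWaveFormatEx(); only the three attributes
# named in the answer are modelled here.
WaveFormat = namedtuple("WaveFormat", "Channels BitsPerSample SamplesPerSec")


def build_encoder_params(fmt, codec_id, bitrate=128000):
    """Build the encoder parameter dict from the actual PCM format."""
    if fmt.BitsPerSample != 16:
        # Assumption: the encoder is fed 16-bit samples.
        raise ValueError("expected 16-bit PCM, got %d-bit" % fmt.BitsPerSample)
    return {
        "id": codec_id,
        "bitrate": bitrate,
        "sample_rate": fmt.SamplesPerSec,  # taken from the stream, not hard-coded
        "channels": fmt.Channels,          # taken from the stream, not hard-coded
        "ext": "mp3",
    }


def speedup_factor(fmt, declared_rate, declared_channels):
    """Playback speed-up if the encoder is told the wrong rate/channel count."""
    actual = fmt.SamplesPerSec * fmt.Channels
    declared = declared_rate * declared_channels
    return float(declared) / actual
```

For example, if the TTS engine happened to produce 22050 Hz mono (a purely illustrative value) while the encoder was told 44100 Hz stereo, `speedup_factor(WaveFormat(1, 16, 22050), 44100, 2)` gives 4.0, i.e. the speech would play back four times too fast.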
As to your second question, if the sounds are in the same format, you should be able to simply pass them all to enc.encode, one after another.
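The "same format" condition can be checked before encoding. The sketch below uses only Python's standard `wave` module to read raw PCM frames and join them, refusing to mix formats; `read_pcm` and `concat_pcm` are hypothetical helper names, and it assumes the warning sound is available as a WAV file. The joined bytes (or each chunk in turn) could then be handed to enc.encode as described above.

```python
import wave


def read_pcm(wav_file):
    """Return ((channels, sample_width, rate), raw_frames) for a WAV file.

    wav_file may be a path or an open binary file-like object.
    """
    reader = wave.open(wav_file, "rb")
    try:
        fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
        return fmt, reader.readframes(reader.getnframes())
    finally:
        reader.close()


def concat_pcm(chunks):
    """Join (fmt, frames) pairs into one PCM byte string, refusing mixed formats."""
    chunks = list(chunks)
    first_fmt = chunks[0][0]
    for fmt, _ in chunks:
        if fmt != first_fmt:
            raise ValueError("format mismatch: %r vs %r" % (fmt, first_fmt))
    return first_fmt, b"".join(frames for _, frames in chunks)
```

A call such as `concat_pcm([read_pcm("speech1.wav"), read_pcm("warning.wav"), read_pcm("speech2.wav")])` (file names are illustrative) would place the warning between two spoken parts, provided all three share the same channel count, sample width and sample rate.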