Creating suitable WAV files for Google Speech API

Question

I'm using pyaudio to record my voice as a WAV file, with the following code:

import wave

import pyaudio


def voice_recorder():
    FORMAT = pyaudio.paInt16  # 16-bit signed PCM, i.e. LINEAR16
    CHANNELS = 2
    RATE = 22050
    CHUNK = 1024
    RECORD_SECONDS = 4
    WAVE_OUTPUT_FILENAME = "first.wav"

    audio = pyaudio.PyAudio()

    # start recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("konusun...")  # Turkish for "speak..."
    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    # print("finished recording")

    # stop recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # the WAV header stores CHANNELS and RATE exactly as set here
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()
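The loop above only performs whole CHUNK-sized reads, so it records slightly less than RECORD_SECONDS of audio. A quick check of the arithmetic, using a hypothetical helper `recorded_frames` (not part of pyaudio):

```python
def recorded_frames(rate, chunk, seconds):
    """Frames captured by the recording loop: it only does whole CHUNK-sized reads."""
    n_reads = int(rate / chunk * seconds)  # "/" is true division in Python 3
    return n_reads * chunk


frames = recorded_frames(22050, 1024, 4)
print(frames, round(frames / 22050, 3))  # 88064 3.994
```

Under Python 2 integer division, `RATE / CHUNK` evaluates to 21 instead of 21.53, which gives 84 reads, 86016 frames, or about 3.90 s.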

I'm using the following code for the Google Speech API, which converts the speech in the WAV file to text: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api-client/transcribe.py

When I pass the WAV file generated by pyaudio to Google's code, I get the following error:

googleapiclient.errors.HttpError: <HttpError 400 when requesting https://speech.googleapis.com/v1beta1/speech:syncrecognize?alt=json returned "Invalid Configuration, Does not match Wav File Header.
Wav Header Contents:
Encoding: LINEAR16
Channels: 2
Sample Rate: 22050.
Request Contents:
Encoding: linear16
Channels: 1
Sample Rate: 22050.">
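The error compares the WAV header with the request configuration field by field, and here only the channel count differs (2 in the file, 1 in the request). A minimal sketch for inspecting a file before sending it, using only the standard `wave` module; `wav_header` and `check_linear16_mono` are hypothetical helpers, and treating "mono, 2-byte samples, matching rate" as the requirement is an assumption read off this particular error message:

```python
import wave


def wav_header(path):
    """Return (channels, sample_rate, sample_width_bytes) from a WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getframerate(), wf.getsampwidth()


def check_linear16_mono(path, expected_rate):
    """List header fields that would not match a mono LINEAR16 request."""
    channels, rate, width = wav_header(path)
    problems = []
    if width != 2:
        problems.append(f"sample width is {width} bytes, LINEAR16 needs 2")
    if channels != 1:
        problems.append(f"file has {channels} channels, request assumes 1")
    if rate != expected_rate:
        problems.append(f"file rate {rate} Hz != request rate {expected_rate} Hz")
    return problems
```

For the recording above, `check_linear16_mono("first.wav", 22050)` would be expected to report only the channel mismatch.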

As a workaround, I convert the WAV file to MP3 with ffmpeg, and then convert the MP3 file back to WAV with sox:

import subprocess


def wav_to_mp3():
    # downmix to mono, resample to 16 kHz, encode MP3 at a bitrate of 6400 bit/s
    subprocess.call(['ffmpeg', '-i', 'first.wav', '-ac', '1', '-ab', '6400', '-ar', '16000', 'second.mp3', '-y'],
                    stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)

def mp3_to_wav():
    subprocess.call(['sox', 'second.mp3', '-r', '16000', 'son.wav'])

Google's API accepts this WAV output, but because the quality drops too much, recognition doesn't perform well.

So how can I create a Google-compatible WAV file with pyaudio in the first place?
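One lossless alternative to the MP3 round trip, sketched under the assumption that the input is 16-bit stereo as produced by the recording code: in the error message, the header's encoding (LINEAR16) and sample rate (22050) already match the request, and only the channel count differs. Averaging the two channels into a mono WAV keeps 16-bit PCM at the original rate with no lossy step. `stereo_to_mono_wav` is a hypothetical helper; averaging is one common downmix choice rather than anything the question or Google prescribes, and this has not been checked against the API. Setting `CHANNELS = 1` in the recorder would avoid the conversion entirely, provided the input device accepts mono capture.

```python
import array
import sys
import wave


def stereo_to_mono_wav(src, dst):
    """Average the two channels of a 16-bit stereo WAV into a 16-bit mono WAV.

    The sample rate is kept and the data stays uncompressed PCM.
    """
    with wave.open(src, "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()

    # Interleaved L, R, L, R, ... -> one averaged sample per frame.
    # (L + R) // 2 always stays within the int16 range.
    mono = array.array(
        "h", ((l + r) // 2 for l, r in zip(samples[0::2], samples[1::2]))
    )

    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(mono.tobytes())
```

Usage would be `stereo_to_mono_wav("first.wav", "first_mono.wav")` (output name hypothetical), after which the file's header reports one channel at 22050 Hz.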

Answer

Converting the WAV file to FLAC with avconv and sending the FLAC file to the Google Speech API solved the problem:

import subprocess

subprocess.call(['avconv', '-i', 'first.wav', '-y', '-ar', '48000', '-ac', '1', 'last.flac'])
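The one-liner in the answer ignores the tool's exit status, so a failed conversion would go unnoticed until the API call. A minimal sketch that surfaces failures: `flac_command` and `convert_to_flac` are hypothetical helper names, and falling back to ffmpeg when avconv is missing is an assumption (avconv comes from the Libav fork; ffmpeg accepts the same `-i`, `-y`, `-ar` and `-ac` options used here).

```python
import shutil
import subprocess


def flac_command(src, dst, rate=48000, tool="avconv"):
    """Build the answer's conversion command: mono FLAC resampled to `rate` Hz."""
    return [tool, "-i", src, "-y", "-ar", str(rate), "-ac", "1", dst]


def convert_to_flac(src, dst, rate=48000):
    """Run the conversion with avconv, or with ffmpeg if avconv is not on PATH."""
    tool = shutil.which("avconv") or shutil.which("ffmpeg")
    if tool is None:
        raise FileNotFoundError("neither avconv nor ffmpeg found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(flac_command(src, dst, rate, tool), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
```

With the defaults, `flac_command("first.wav", "last.flac")` reproduces the argument list from the answer exactly.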
