Creating suitable WAV files for Google Speech API

Views: 68 Published: 2021/6/23 19:31:05 Tags: python wav pyaudio google-speech-api


Problem description

I'm using pyaudio to record my voice as a WAV file, with the following code:

import wave

import pyaudio

def voice_recorder():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 22050
    CHUNK = 1024
    RECORD_SECONDS = 4
    WAVE_OUTPUT_FILENAME = "first.wav"

    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
    print("konusun...")  # prompt shown to the speaker (Turkish for "speak...")
    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    #print "finished recording"


    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

I'm using the following code for the Google Speech API, which basically converts the speech in the WAV file to text: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api-client/transcribe.py

When I pass the WAV file generated by pyaudio to Google's code, I get the following error:

googleapiclient.errors.HttpError: <HttpError 400 when requesting https://speech.googleapis.com/v1beta1/speech:syncrecognize?alt=json returned "Invalid Configuration, Does not match Wav File Header.
Wav Header Contents:
Encoding: LINEAR16
Channels: 2
Sample Rate: 22050.
Request Contents:
Encoding: linear16
Channels: 1
Sample Rate: 22050.">
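The error compares the WAV header with the request configuration: the file declares 2 channels while the request declares 1. As a minimal sketch, the header of a file can be checked locally with the standard-library `wave` module before it is sent. The helper name `describe_wav` is made up for this illustration and is not part of Google's sample.

```python
import wave

def describe_wav(path):
    """Return the WAV header fields that the error message compares."""
    with wave.open(path, "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),  # 2 bytes = 16-bit (LINEAR16)
            "sample_rate": wav_file.getframerate(),
        }
```

For the recording above, `describe_wav("first.wav")` would be expected to report 2 channels, a 2-byte sample width, and a 22050 Hz sample rate, in line with the "Wav Header Contents" in the error.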

As a workaround, I convert the WAV file to MP3 with ffmpeg and then convert the MP3 file back to WAV with sox:

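import os
import subprocess

def wav_to_mp3():
    # -ac 1: mono, -ar 16000: 16 kHz, -y: overwrite second.mp3; ffmpeg's output is discarded
    with open(os.devnull, 'w') as FNULL:
        subprocess.call(['ffmpeg', '-i', 'first.wav', '-ac', '1', '-ab', '6400', '-ar', '16000', 'second.mp3', '-y'], stdout=FNULL, stderr=subprocess.STDOUT)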

def mp3_to_wav():
    subprocess.call(['sox', 'second.mp3', '-r', '16000', 'son.wav'])

Google's API accepts this WAV output, but because the audio quality drops so much, recognition performs poorly.
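MP3 is a lossy format, so the round trip through it discards audio detail. As a rough sketch of a lossless alternative for an already recorded file, the two channels can be averaged into one using only the standard library. This assumes 16-bit stereo input and that the channel count is the only mismatch, as the error above suggests; the function name `stereo_to_mono` is made up for this illustration, and the sample rate is left unchanged.

```python
import array
import sys
import wave

def stereo_to_mono(src_path, dst_path):
    """Average the two channels of a 16-bit stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved: left, right, left, right, ...
    mono = array.array("h", (
        (samples[i] + samples[i + 1]) // 2
        for i in range(0, len(samples) - 1, 2)
    ))
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Calling `stereo_to_mono("first.wav", "mono.wav")` (the output name is hypothetical) would keep the 22050 Hz rate of the recording, so the sample rate in the request configuration would still need to be 22050.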

So how can I create a Google-compatible WAV file with pyaudio in the first place?
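One possible direction, sketched under assumptions rather than confirmed against the API: since the error reports 2 channels in the file but 1 in the request, recording in mono from the start may avoid the conversion step entirely. The sketch below reuses the pyaudio calls from the question with `CHANNELS = 1` and keeps the 22050 Hz rate that the request already declares. It assumes the input device supports mono capture; the names `save_wav` and `record_mono` and the output file name are made up for this illustration.

```python
import wave

CHANNELS = 1            # mono, matching "Channels: 1" in the request
RATE = 22050            # kept from the question; must match the request's sample rate
CHUNK = 1024
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples, i.e. pyaudio.paInt16 / LINEAR16

def save_wav(path, frames, channels=CHANNELS, rate=RATE):
    """Write raw 16-bit PCM chunks to a WAV file with a matching header."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"".join(frames))

def record_mono(path, seconds=4):
    """Record from the default input device in mono and save to path."""
    import pyaudio  # imported here so save_wav works without audio hardware

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    try:
        frames = [stream.read(CHUNK)
                  for _ in range(int(RATE / CHUNK * seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
    save_wav(path, frames)
```

A call such as `record_mono("first_mono.wav")` would then produce a file whose header declares a single channel, which is what the request in the error message expects.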
