Python, speech_recognition tool does not recognize .wav file


Problem description

I have generated a .wav audio file containing some speech with some other interfering speech in the background. This code worked for me with a test .wav file:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:  # AudioFile is the current name for WavFile
        audio = r.record(source)

    text = r.recognize_google(audio)

If I use my own .wav file, I get the following error:

ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format

The situation improves slightly if I re-save this .wav file with soundfile:

    import soundfile as sf

    # Read the original file, then write the same samples to a new path.
    wav, samplerate = sf.read(wav_path)
    sf.write(saved_wav_path, wav, samplerate)

and then load the new saved_wav_path back into the first block of code. This time I get:

    if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()

The audio file was originally saved with:

    wavfile.write(wav_path, fs, data)

where wav_path = 'data.wav'. Any ideas?

Solution:

Saving the audio data the following way generates a correct .wav file:

    import wavio
    wavio.write(wav_path, data, fs, sampwidth=2)

Recommended answer

From a brief look at the code in the speech_recognition package, it appears to use wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to make sure that the files you give to speech_recognition were saved in an integer format.
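One way to check whether a given file is the floating point kind is to read the format tag from its header. The following is a minimal sketch using only the standard library; the helper names wav_format_tag and is_float_wav are made up for this illustration, and it assumes a plain RIFF/WAVE file (format tag 1 means integer PCM, 3 means IEEE float). Files that use the "extensible" format tag store the real format in a sub-format field, which this sketch does not decode.

```python
import struct

# Format tags stored in the 'fmt ' chunk of a RIFF/WAVE header.
WAVE_FORMAT_PCM = 0x0001         # integer PCM
WAVE_FORMAT_IEEE_FLOAT = 0x0003  # floating point samples
WAVE_FORMAT_EXTENSIBLE = 0xFFFE  # real format is in a sub-format field (not decoded here)


def wav_format_tag(path):
    """Return the format tag from the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to be a WAV file")
        riff, _size, wave_id = struct.unpack("<4sI4s", header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                # The format tag is the first 16-bit field of the chunk.
                (tag,) = struct.unpack("<H", f.read(2))
                return tag
            # Skip other chunks; chunks are padded to an even byte count.
            f.seek(chunk_size + (chunk_size & 1), 1)


def is_float_wav(path):
    """True if the header declares IEEE floating point samples."""
    return wav_format_tag(path) == WAVE_FORMAT_IEEE_FLOAT
```

If is_float_wav('data.wav') returns True, the file needs to be re-saved with integer samples before it is passed to speech_recognition.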

SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:

    import numpy as np
    from scipy.io import wavfile

    # Normalize `data` to [-1, 1], then scale and convert to 32 bit integers:
    y = (np.iinfo(np.int32).max * (data/np.abs(data).max())).astype(np.int32)

    wavfile.write(wav_path, fs, y)

Then try to read that file with speech_recognition.

Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
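To illustrate why a file written through wave is readable by wave, here is a rough sketch that writes float samples as 16-bit integer PCM using only the standard library. This is not wavio's actual implementation; the function name write_pcm16_wav is hypothetical, and the sketch assumes mono audio with samples already in the range [-1.0, 1.0] (values outside that range are clipped).

```python
import struct
import wave


def write_pcm16_wav(path, samples, rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit integer PCM WAV file."""
    # Clip each sample to [-1.0, 1.0] and scale it to the signed 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    # WAV sample data is little-endian, hence the "<" in the format string.
    frames = struct.pack("<%dh" % len(ints), *ints)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)     # mono (an assumption of this sketch)
        w.setsampwidth(2)     # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)  # sample rate in Hz, must be an integer
        w.writeframes(frames)
```

For example, write_pcm16_wav('data_int16.wav', samples, 16000) with a list of floats should produce a file that wave.open, and therefore speech_recognition, can read. Multi-channel data would need the channels interleaved before packing.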
