How to convert .wav files into a Pandas DataFrame in order to feed it to a neural network?
Problem description
I'm trying to feed .wav files to a neural network in order to train it to detect what's being said. I have around 10,000 .wav files and the transcriptions of the audio, but when I try to feed the CSV file to the neural network I get this error: ValueError: setting an array element with a sequence.
I'm using SoundFile to read the .wav data without the header and putting it into a list. I've tried other libraries too, but the result was the same.
import os
import numpy as np
from tqdm import tqdm
import pandas as pd
import soundfile as sf
path = os.getcwd() + "/stft wav/"
audios = []
total = len(os.listdir(path))
pbar = tqdm(total = total)
for file in os.listdir(path):
    data, sr = sf.read(path + file)  # data: samples as floats, sr: sample rate
    audios.append(data)
    pbar.update(1)
pbar.close()
Then I read the file with the transcriptions and create the dataset that's going to be fed to the neural network.
dict = pd.read_csv("dictionary.csv", sep = '\t')
dataset = pd.DataFrame(columns = ['Audio', 'Word'])
dataset.Audio = audios
dataset.Word = dict.Romaji
The dataset now looks like this:
Audio Word
0 [-2.686136382767934e-11, 1.5804246800144028e-1... inshou
1 [5.0145061436523974e-09, 1.3923349584388234e-0... taishou
2 [-2.253151087927563e-08, 2.173326230092698e-08... genshou
3 [3.0560468644580396e-07, 1.0646554073900916e-0... kishou
4 [0.0, 2.499070395067804e-12, 1.206467304531999... chuushouteki
The arrays in the Audio column don't all have the same size, but I already tried padding them with zeros and the error message stays the same.
This is how I padded them, in case you're wondering:
X = dataset.copy()  # copy the whole frame so that X['Audio'] below is valid
pbar = tqdm(total = len(X['Audio']))
for i in range(0, len(X['Audio'])):
    X['Audio'][i] = np.resize(X['Audio'][i], len(max(X['Audio'], key = len)))
    pbar.update(1)
pbar.close()
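One thing worth checking in the padding step: `np.resize` fills the extra length with repeated copies of the original samples, not with zeros, whereas `np.pad` appends actual zeros. Below is a minimal sketch of zero padding on made-up data; the helper name `pad_to_length` and the toy arrays are assumptions for illustration, not part of the original code.

```python
import numpy as np

def pad_to_length(signal, target_len):
    """Append zeros to a 1-D signal until it has target_len samples."""
    signal = np.asarray(signal, dtype=float)
    missing = target_len - len(signal)
    if missing < 0:
        raise ValueError("signal is longer than target_len")
    # (0, missing): add nothing in front, `missing` zeros at the end.
    return np.pad(signal, (0, missing), mode="constant")

# Toy stand-ins for the audio arrays (hypothetical values).
audios = [np.array([0.1, 0.2]), np.array([0.3, 0.4, 0.5, 0.6]), np.array([0.7])]

longest = max(len(a) for a in audios)  # computed once, outside the loop
padded = np.stack([pad_to_length(a, longest) for a in audios])

print(padded.shape)  # one row per clip, one column per sample
```

Stacking the padded signals gives a single 2-D float array, which is usually an easier input for a network than a column of separate arrays.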
A weird thing I noticed is that when I save this dataset as a CSV file and read it again, the Audio column's float arrays come back as strings. The only way I found to keep them the way they should be is saving the dataset as a pickle file.
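The round trip can be reproduced on a tiny frame. This is only a sketch with made-up values, written to an in-memory buffer instead of a real file: each array is written to the CSV as its text representation, and `read_csv` hands that text back as a plain string.

```python
import io

import numpy as np
import pandas as pd

# Two toy rows with arrays of different lengths (hypothetical values).
df = pd.DataFrame({
    "Audio": [np.array([1e-9, 2.5e-12]), np.array([0.0, 3e-7, 1e-3])],
    "Word": ["inshou", "taishou"],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # each array cell is written as text
buffer.seek(0)
restored = pd.read_csv(buffer)

first_cell = restored.loc[0, "Audio"]
print(type(first_cell))  # a str, no longer a NumPy array
print(first_cell)
```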
Since we're at it, feel free to suggest other methods to feed the .wav files to the neural network. I'm trying to use this method instead of spectrograms because I read here that it's not a good idea.
Solution
I was looking into similar problems and found a simple and elegant solution. After the train-test split, when passing the Audio column to the neural network, use list(X) instead of just X.
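A likely reason this works: a column of arrays is a 1-D object container, and converting it straight to a numeric array fails, while a list of equal-length arrays converts cleanly to a 2-D float array. The sketch below shows this with NumPy only; the toy values are made up, and the exact conversion a given neural network library performs is an assumption.

```python
import numpy as np
import pandas as pd

# Toy column of equal-length (already padded) signals; values are made up.
X = pd.Series([np.array([0.1, 0.2, 0.0]), np.array([0.3, 0.4, 0.5])])

as_object = X.to_numpy()  # 1-D object array whose elements are arrays
print(as_object.dtype, as_object.shape)

try:
    as_object.astype(float)  # roughly what a numeric conversion attempts
    conversion_error = None
except (ValueError, TypeError) as exc:
    conversion_error = exc
print(conversion_error)

as_matrix = np.array(list(X), dtype=float)  # list of arrays -> 2-D float array
print(as_matrix.shape)
```

Note that this only works when every array has the same length, so the padding step still has to come first.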
About the CSV file converting the float arrays to strings: a CSV cell can only hold text, so Pandas writes each array out as its string representation (scientific notation included) and reads that text back as a string rather than as an array of floats. As I said previously, saving the dataframe as a pickle file works, but it takes too long to read compared to saving the Audio column separately as a .npy file.
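A minimal sketch of the .npy route, assuming the audio has already been padded into one 2-D array. The file name `audios.npy` and the toy matrix are placeholders, and a temporary directory is used so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for the padded audio matrix (hypothetical values).
audio_matrix = np.zeros((3, 4), dtype=np.float32)
audio_matrix[0, :2] = [0.1, 0.2]

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "audios.npy")  # placeholder file name
    np.save(npy_path, audio_matrix)  # binary format, keeps dtype and shape
    loaded = np.load(npy_path)

print(loaded.dtype, loaded.shape)
```

The transcriptions can stay in an ordinary CSV next to it, as long as the row order matches the order of the rows in the array.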
Recommended answer
Looks like you already solved this, but here are a couple of other items that don't seem to have been mentioned yet. First, wave is a module in the Python standard library; it was included in my Py3.6 install.
https://docs.python.org/3/library/wave.html
This code is (sorta) stolen from here:
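import numpy
from wave import open as open_wave
filename = "<filename>"  # placeholder: replace with the path to your .wav file
waveFile = open_wave(filename, 'rb')
nframes = waveFile.getnframes()
wavFrames = waveFile.readframes(nframes)
waveFile.close()
# assumes 16-bit PCM samples; frombuffer replaces the deprecated numpy.fromstring
ys = numpy.frombuffer(wavFrames, dtype=numpy.int16)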
That should enable you to put your data into a DF pretty easily, which appears to be the main item you're asking about, based on your thread title.
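For a self-contained illustration, the sketch below writes a tiny 16-bit mono file with `wave`, reads it back, and places the samples in a DataFrame. The helper name `read_wav_int16`, the file name `tone.wav`, and the sample values are all made up for the example, and the helper deliberately rejects anything that isn't 16-bit mono PCM.

```python
import os
import tempfile
import wave

import numpy as np
import pandas as pd

def read_wav_int16(path):
    """Read a 16-bit PCM mono .wav file into a NumPy int16 array."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2 or wav_file.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        frames = wav_file.readframes(wav_file.getnframes())
    return np.frombuffer(frames, dtype="<i2")  # little-endian 16-bit samples

with tempfile.TemporaryDirectory() as tmp_dir:
    wav_path = os.path.join(tmp_dir, "tone.wav")  # placeholder file name
    angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
    samples = (np.sin(angles) * 1000).astype("<i2")
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16 bit
        wav_file.setframerate(8000)
        wav_file.writeframes(samples.tobytes())
    ys = read_wav_int16(wav_path)

# One row per clip: the samples in one column, the transcription in the other.
dataset = pd.DataFrame({"Audio": [ys], "Word": ["inshou"]})
print(dataset.shape)
```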
Lastly, regarding your DF issues with dtypes, note that the DataFrame constructor has a dtype option that forces a data type. I have used it in situations like the one you find yourself in.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
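A small sketch of that option, under the assumption that the audio is stored with one column per sample (a 2-D array) rather than one array per cell. The `dtype` argument applies to the whole frame, so the label column is added afterwards; all values here are toy data.

```python
import numpy as np
import pandas as pd

# Toy padded signals and labels (hypothetical values).
padded = np.array([[0.1, 0.2, 0.0], [0.3, 0.4, 0.5]])
words = ["inshou", "taishou"]

features = pd.DataFrame(padded, dtype="float32")  # dtype forces the cast
features.insert(0, "Word", words)                 # labels added after the cast

X = features.drop(columns="Word").to_numpy()
print(X.dtype, X.shape)
```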