How to convert .wav files into a Pandas DataFrame in order to feed it to a neural network?

Problem description

I'm trying to feed .wav files to a neural network in order to train it to detect what's being said. I have around 10,000 .wav files and the transcriptions of the audio, but when I try to feed the CSV file to the neural network I get this error: ValueError: setting an array element with a sequence.

I'm using Soundfile to get the .wav data without the header and putting it into a list. I've tried other libraries too but the result was the same.

import os
import numpy as np
from tqdm import tqdm
import pandas as pd
import soundfile as sf

path = os.getcwd() + "/stft wav/"
audios = []
total = len(os.listdir(path))
pbar = tqdm(total = total)
for file in os.listdir(path):
    data, sr = sf.read(path + file)
    audios.append(data)
    pbar.update(1)
pbar.close()

Then I read the file with the transcription and create the dataset that's going to be fed to the neural network.

dict = pd.read_csv("dictionary.csv", sep = '\t')
dataset = pd.DataFrame(columns = ['Audio', 'Word'])
dataset.Audio = audios
dataset.Word = dict.Romaji

The dataset now looks like this :

    Audio                                               Word
0   [-2.686136382767934e-11, 1.5804246800144028e-1...   inshou
1   [5.0145061436523974e-09, 1.3923349584388234e-0...   taishou
2   [-2.253151087927563e-08, 2.173326230092698e-08...   genshou
3   [3.0560468644580396e-07, 1.0646554073900916e-0...   kishou
4   [0.0, 2.499070395067804e-12, 1.206467304531999...   chuushouteki

The arrays in the Audio column don't all have the same size, but I already tried padding them with zeros and the error message stays the same.

This is how I padded it in case you're wondering :

X = dataset.Audio.copy()  # X is a Series, so it is indexed directly
max_len = len(max(X, key = len))  # length of the longest clip, computed once
pbar = tqdm(total = len(X))
for i in range(0, len(X)):
    X[i] = np.resize(X[i], max_len)
    pbar.update(1)
pbar.close()
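
One thing worth checking here: np.resize fills the extra length with repeated copies of the signal rather than with zeros. A minimal sketch of actual zero-padding with np.pad follows. It uses short made-up arrays in place of the real audio, and the helper name pad_to_length is hypothetical.

```python
import numpy as np

def pad_to_length(signal, length):
    """Right-pad a 1-D signal with zeros up to `length` samples."""
    return np.pad(signal, (0, length - len(signal)), mode="constant")

# Made-up stand-ins for the variable-length audio arrays.
audios = [np.array([0.1, 0.2, 0.3]), np.array([0.4]), np.array([0.5, 0.6])]

max_len = max(len(a) for a in audios)
# Stack the padded clips into one regular 2-D array: one row per clip.
padded = np.stack([pad_to_length(a, max_len) for a in audios])
print(padded.shape)
```

Stacking into a single 2-D array is a design choice for this sketch. It gives a regular numeric array instead of a column of separate array objects.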

A weird thing I noticed is that when I save this CSV file and read it again the Audio column's float arrays are automatically converted into string arrays. The only way I found to keep it the way it should be is saving it as a pickle file.

While we're at it, feel free to suggest other methods to feed the .wav files to the neural network. I'm trying to use this method instead of spectrograms because I read here that it's not a good idea.

Solution

I was looking into similar problems and found a simple and elegant solution. After the train-test split, when passing the audios' column to the neural network, use list(X) instead of just X.
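
A rough sketch of why this may help, using made-up data: a Series of arrays is an object-dtype container, while a list of equal-length arrays converts cleanly into a regular 2-D float array. This assumes the arrays have already been padded to a common length. The explicit np.array call is an illustration and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the padded audio column (all arrays the same length).
X = pd.Series([np.array([0.1, 0.2]), np.array([0.3, 0.4]), np.array([0.5, 0.6])])

# The Series holds array objects, so its NumPy view has dtype object.
print(X.to_numpy().dtype)

# Converting through a list yields a regular 2-D array, one row per clip.
features = np.array(list(X))
print(features.shape, features.dtype)
```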

About the CSV file converting the float array to string, it's because of the power notation. There's a letter in the middle of the numbers, so Pandas writes it as float, but reads it as string. As I said previously, saving the dataframe as a pickle file works, but it takes too long to read compared to saving the audios' column separately as a .npy file.
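
A CSV cell can only hold text, so an array stored in a single cell comes back as a string when the file is re-read. A minimal sketch of the .npy route mentioned above, assuming the audio has been padded into one 2-D array. The data and the file name audio.npy are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Made-up stand-in for the padded audio data: 3 clips of 4 samples each.
audio = np.arange(12, dtype=np.float64).reshape(3, 4) * 1e-9

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "audio.npy")
np.save(path, audio)      # binary format that keeps dtype and shape
restored = np.load(path)

print(restored.dtype, restored.shape)
print(np.array_equal(audio, restored))
```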

Recommended answer

Looks like you already solved this, but here are a couple of other items that it looks like haven't been mentioned. First, wave is a Python utility that was included in my Py3.6 install.

https://docs.python.org/3/library/wave.html

This code is (sorta) stolen from here:

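import numpy
from wave import open as open_wave

waveFile = open_wave('<filename>', 'rb')  # replace <filename> with the path to a .wav file
nframes = waveFile.getnframes()
wavFrames = waveFile.readframes(nframes)  # raw PCM bytes
waveFile.close()
# frombuffer replaces the deprecated fromstring; int16 assumes 16-bit samples
ys = numpy.frombuffer(wavFrames, dtype=numpy.int16)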

That should enable you to put your data into a DF pretty easily, which appears to be the main item you're asking about based on your thread title.
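
As a rough, self-contained sketch of that last step: the example below first writes a tiny 16-bit mono file so there is something to read, then loads it with wave and places the samples in a DataFrame. The sample values, the file name example.wav, and the one-row layout are assumptions for illustration. The label is borrowed from the table in the question.

```python
import os
import tempfile
import wave

import numpy as np
import pandas as pd

# Write a tiny 16-bit mono file so the example is self-contained (made-up samples).
path = os.path.join(tempfile.mkdtemp(), "example.wav")
samples = np.array([0, 1000, -1000, 32767], dtype=np.int16)
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
    out.setframerate(16000)
    out.writeframes(samples.tobytes())

# Read the frames back and interpret the raw bytes as int16 samples.
with wave.open(path, "rb") as wav_file:
    frames = wav_file.readframes(wav_file.getnframes())
ys = np.frombuffer(frames, dtype=np.int16)

# One row per clip: the sample array in one column, its label in the other.
df = pd.DataFrame({"Audio": [ys], "Word": ["inshou"]})
print(df.shape)
```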

Lastly, regarding your DF issues with dtypes, note that the DataFrame invocation has a dtype forcing option that I have used in situations like the one you find yourself in.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
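
A small sketch of that dtype option, assuming the clips have already been padded into a regular 2-D array. The values are made up, and float32 is just one possible target type.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for padded 16-bit audio: 2 clips of 3 samples each.
padded = np.array([[0, 1000, -1000], [5, 0, 0]], dtype=np.int16)

# The dtype argument forces every column to the requested type on construction.
df = pd.DataFrame(padded, dtype=np.float32)
print(df.dtypes.unique())
```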
