ValueError: could not broadcast input array from shape (20,590) into shape (20)


Problem description

I am trying to extract features from .wav files using the MFCCs of the sound files. I get an error when I try to convert my list of MFCCs to a NumPy array. I am fairly sure the error occurs because the list contains MFCC arrays with different shapes, but I am unsure how to solve the issue.

I have looked at two other Stack Overflow posts, but they don't solve my problem because they are too specific to a particular task:

ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)

ValueError: could not broadcast input array from shape (857,3) into shape (857)

Full error message:

Traceback (most recent call last):
  File "/..../.../...../Batch_MFCC_Data.py", line 68, in
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)

Code example:

import glob

import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.preprocessing import OneHotEncoder

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = []   # list holding one MFCC matrix per .wav file
labels = []  # list holding one label per .wav file

for wav_path in all_wav_paths:
    # MFCC_from_wav() -> returns the MFCC coefficients, shape (20, n_frames)
    individual_MFCC = MFCC_from_wav(wav_path)

    # get_class() -> returns the label of the .wav file, either 0 or 1
    label = get_class(wav_path)

    # add features and label to the lists
    MFCCs.append(individual_MFCC)
    labels.append(label)

# Must convert the training data to a NumPy array for
# train_test_split and saving to the local drive
X = np.array(MFCCs)  # THIS LINE CRASHES WITH THE ABOVE ERROR

# one-hot encode the labels; OneHotEncoder expects a 2-D input
# (the parameter is sparse_output since scikit-learn 1.2; older versions call it sparse)
onehot_encoder = OneHotEncoder(sparse_output=False)
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

# create train/test data from the stacked feature array X
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)

Here is a snapshot of the shapes of the MFCCs (one per .wav file) in the MFCCs list:

...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....

As you can see, the MFCCs in the list do not all have the same shape, because the recordings are not all the same length. Is this why I can't convert the list to a NumPy array? If so, how can I give every MFCC in the list the same shape?
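A minimal sketch reproducing the cause, assuming only NumPy: `np.array` can stack a list of 2-D arrays into one 3-D array only when every element has exactly the same shape. Here the first dimension (20 coefficients) matches but the frame counts differ, so the conversion fails. The exact wording of the error depends on the NumPy version, so the sketch only checks that a `ValueError` is raised. The zero-filled arrays are made-up stand-ins for real MFCCs.

```python
import numpy as np

# Two MFCC-like matrices: same number of coefficients (20), different frame counts.
ragged = [np.zeros((20, 590)), np.zeros((20, 345))]

try:
    np.array(ragged)
    ragged_error = None
except ValueError as exc:
    # Older NumPy reports a "could not broadcast input array ..." message like the one above;
    # NumPy >= 1.24 reports an "inhomogeneous shape" message instead. Both are ValueErrors.
    ragged_error = exc

print(type(ragged_error).__name__)  # ValueError

# Once every matrix has the same shape, np.array stacks them along a new first axis.
uniform = [np.zeros((20, 345)), np.zeros((20, 345))]
print(np.array(uniform).shape)  # (2, 20, 345)
```

So yes: unequal frame counts are the reason, and the fix is to make every array the same shape before stacking.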

Any code snippets and advice for accomplishing this would be greatly appreciated!

Thanks!

Recommended answer

One option is to truncate the arrays to min_shape, i.e. keep only the first min_shape[1] columns (time frames) of every larger array and discard the rest:

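import numpy as np

# MFCCs: the list of (20, n_frames) arrays built in the question's loop
min_cols = min(arr.shape[1] for arr in MFCCs)  # e.g. 345 for the shapes in the snapshot
min_shape = (20, min_cols)

# keep only the first min_shape[1] frames of every array
MFCCs = [arr[:, :min_shape[1]] for arr in MFCCs]

batch_arr = np.array(MFCCs)  # shape: (num_wav_files, 20, min_cols)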

Then you can stack these arrays into a batch array, as in the minimal example below:

In [33]: a1 = np.random.randn(2, 3)    
In [34]: a2 = np.random.randn(2, 5)    
In [35]: a3 = np.random.randn(2, 10)

In [36]: MFCCs = [a1, a2, a3]

In [37]: min_shape = (2, 2)

In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:     

In [42]: batch_arr = np.array(MFCCs)

In [43]: batch_arr.shape
Out[43]: (3, 2, 2)


The second strategy is to pad the smaller arrays up to max_shape: follow similar logic, but fill the missing values with either zeros or NaN, whichever you prefer.

Then, again, you can stack the arrays into a batch array of shape (num_arrays, dim1, dim2). In your case the shape would be (num_wav_files, 20, max_column).
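A minimal sketch of the padding strategy, assuming each MFCC is a float array of shape (20, n_frames) as in the question's snapshot. `pad_to_max` is a hypothetical helper name: it uses `np.pad` to append `fill_value` columns on the time axis only, then `np.stack` builds the (num_wav_files, 20, max_column) batch. The toy arrays are made-up stand-ins for real MFCCs.

```python
import numpy as np


def pad_to_max(mfccs, fill_value=0.0):
    """Right-pad every (n_coeffs, n_frames) array to the longest n_frames in the list.

    Hypothetical helper. Returns an array of shape (len(mfccs), n_coeffs, max_frames).
    fill_value can be 0.0 or np.nan (NaN requires float arrays).
    """
    max_frames = max(arr.shape[1] for arr in mfccs)
    padded = [
        np.pad(
            arr,
            # no padding on the coefficient axis; pad the time axis at the end only
            pad_width=((0, 0), (0, max_frames - arr.shape[1])),
            mode="constant",
            constant_values=fill_value,
        )
        for arr in mfccs
    ]
    return np.stack(padded)


# Toy data standing in for the question's MFCC list (shapes are made up).
MFCCs = [np.ones((20, 345)), np.ones((20, 590)), np.ones((20, 423))]

X = pad_to_max(MFCCs)                         # zero padding
X_nan = pad_to_max(MFCCs, fill_value=np.nan)  # NaN padding
print(X.shape)  # (3, 20, 590)
```

Zero padding keeps every frame at the cost of extra columns; NaN padding makes the padded region easy to detect, but it has to be handled (for example masked or replaced) before it reaches a model that rejects NaN.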
