Resampling data: using SMOTE from imblearn with 3D numpy arrays
Problem description
I want to resample my dataset. It consists of categorically transformed data with labels for 3 classes. The number of samples per class is:
- Class A count: 6945
- Class B count: 650
- Class C count: 9066
- Total samples: 16661
The data shape without labels is (16661, 1000, 256), that is, 16661 samples of shape (1000, 256). I would like to up-sample the data to the number of samples of the majority class, that is, class A -> (6945)
However, when calling:
from imblearn.over_sampling import SMOTE
print(categorical_vector.shape)
sm = SMOTE(random_state=2)
X_train_res, y_labels_res = sm.fit_sample(categorical_vector, labels.ravel())
it keeps raising ValueError: Found array with dim 3. Estimator expected <= 2.
How can I flatten the data so that the estimator can fit it and the result still makes sense? Furthermore, how can I unflatten X_train_res back to 3D afterwards?
Recommended answer
I am considering a dummy 3D array and choosing the 2D array size myself:
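import numpy as np
# Dummy 3D array standing in for the real data
arr = np.random.rand(160, 10, 25)
orig_shape = arr.shape
print(orig_shape)
Output: (160, 10, 25)
# Collapse to 2D; the element count must stay the same (160 * 25 * 10 = 40000)
arr = np.reshape(arr, (arr.shape[0] * arr.shape[2], arr.shape[1]))
print(arr.shape)
Output: (4000, 10)
# Restore the original 3D shape
arr = np.reshape(arr, orig_shape)
print(arr.shape)
Output: (160, 10, 25)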
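For SMOTE specifically, each row of the 2D array has to line up with one label, so one option is to keep the sample axis and flatten only the two trailing axes, then restore them after resampling. The sketch below is an illustration under assumptions, not part of the original answer: `resample_3d` is a hypothetical helper name, and it expects a sampler object exposing `fit_resample`, which is the method name in recent imbalanced-learn releases (`fit_sample` in the question is the older name).

```python
import numpy as np


def resample_3d(X, y, sampler):
    """Flatten each sample to one row, resample, then restore the per-sample shape.

    X       : array of shape (n_samples, d1, d2, ...)
    y       : array of shape (n_samples,)
    sampler : any object with a fit_resample(X_2d, y) method (assumption),
              for example an imblearn over-sampler such as SMOTE.
    """
    sample_shape = X.shape[1:]               # e.g. (1000, 256)
    X_2d = X.reshape(X.shape[0], -1)         # (n_samples, d1 * d2 * ...)
    X_res, y_res = sampler.fit_resample(X_2d, y)
    # -1 lets numpy infer the new number of samples after resampling
    X_res_3d = np.asarray(X_res).reshape((-1,) + sample_shape)
    return X_res_3d, y_res


# Usage with the names from the question (requires imbalanced-learn):
# from imblearn.over_sampling import SMOTE
# X_train_res, y_labels_res = resample_3d(
#     categorical_vector, labels.ravel(), SMOTE(random_state=2)
# )
```

Flattening this way treats every one of the 1000 * 256 positions as an independent feature, so the synthetic samples are interpolations in that flattened space. Whether that is meaningful depends on what the two trailing axes represent.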