Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead
Problem description
I am working with sklearn's StratifiedKFold, and when I attempt to split using a multi-class target, I receive an error (see below). When I split using a binary target, it works without a problem.
import numpy as np
import keras
from sklearn.model_selection import StratifiedKFold

num_classes = len(np.unique(y_train))
y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
# splitting data into different folds
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
keras.utils.to_categorical produces a one-hot encoded class vector, i.e. the multilabel-indicator mentioned in the error message. StratifiedKFold is not designed to work with such input; from the split method docs:
split(X, y, groups=None)
[...]
y : array-like, shape (n_samples,)
The target variable for supervised learning problems. Stratification is done based on the y labels.
i.e. your y must be a 1-D array of your class labels.
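To see the distinction the error message is drawing, here is a small sketch using sklearn.utils.multiclass.type_of_target, the utility that reports these target type names. The toy label array and the NumPy one-hot encoding (a stand-in for keras.utils.to_categorical, so the snippet runs without Keras) are assumptions for illustration, not part of the original question.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy labels: a 1-D array of integer class ids.
y_labels = np.array([0, 1, 2, 1, 0, 2])

# One-hot encode with NumPy (assumed stand-in for keras.utils.to_categorical).
y_one_hot = np.eye(3)[y_labels]

# The 1-D labels are a valid stratification target ...
label_kind = type_of_target(y_labels)
# ... while the 2-D one-hot matrix is classified as a multilabel indicator.
one_hot_kind = type_of_target(y_one_hot)

# If only the one-hot matrix is available, argmax recovers the 1-D labels.
y_recovered = np.argmax(y_one_hot, axis=1)

print(label_kind, one_hot_kind, y_recovered)
```

If the integer labels are no longer at hand, the argmax step is one way to get back an array that split will accept.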
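Essentially, all you have to do is reverse the order of the operations: split first (using your initial y_train), and convert with to_categorical afterwards. In the code from the question, that means passing y_train rather than y_train_categorical to kf.split, then one-hot encoding y_train_kf and y_val_kf inside the loop.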
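A minimal sketch of the reordered loop might look like the following. The random x_train, the balanced y_train, and the one_hot helper are hypothetical placeholders so the snippet runs on its own without Keras; in the original setup, keras.utils.to_categorical(labels, num_classes) would presumably take the place of one_hot.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 30 samples, 4 features, 3 balanced classes.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(30, 4))
y_train = np.repeat(np.arange(3), 10)  # 1-D integer class labels
num_classes = len(np.unique(y_train))


def one_hot(labels, n_classes):
    """Assumed stand-in for keras.utils.to_categorical."""
    return np.eye(n_classes, dtype="float32")[labels]


kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)

fold_shapes = []
# Stratify on the 1-D labels, not on the one-hot matrix.
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    # One-hot encode only after the split, fold by fold.
    y_train_kf = one_hot(y_train[train_index], num_classes)
    y_val_kf = one_hot(y_train[val_index], num_classes)
    fold_shapes.append((y_train_kf.shape, y_val_kf.shape))

print(fold_shapes)
```

Encoding inside the loop keeps the stratification working on class labels while the model still receives one-hot targets for each fold.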