How can I split data into 3 or more parts with sklearn?
Question
I want to split my data into stratified train, test, and validation datasets, but sklearn only provides cross_validation.train_test_split, which can only divide the data into two pieces. What should I do if I want three?
Recommended answer
If you want a stratified train/test split, you can use StratifiedKFold from sklearn. Suppose X holds your features and y holds your labels. Based on the scikit-learn documentation example for StratifiedKFold:
from sklearn.model_selection import StratifiedKFold

# X and y are assumed to be NumPy arrays so they can be indexed by index arrays
skf = StratifiedKFold(n_splits=3)
for train_index, test_index in skf.split(X, y):
    # Each iteration yields one stratified train/test partition
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
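StratifiedKFold produces several train/test folds rather than one three-way partition. If a single stratified train/test/validation split is what you need, one possible approach (not part of the original answer) is to call train_test_split twice with its stratify argument. The toy data, the 70/10/20 proportions, and random_state below are assumptions chosen only for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, two balanced classes
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 50 + [1] * 50)

# Step 1: keep 70% for training, preserving the class ratio
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)

# Step 2: split the remaining 30% into test (1/3 of it, 10% overall)
# and validation (2/3 of it, 20% overall), again stratified
X_test, X_validate, y_test, y_validate = train_test_split(
    X_rest, y_rest, train_size=1 / 3, stratify=y_rest, random_state=0)

print(len(X_train), len(X_test), len(X_validate))
```

The second call stratifies on y_rest, not y, because it only sees the samples left over from the first call.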
Update: to split the data into, say, three parts by percentage, you can use numpy.split(). The cut points at 70% and 80% of the length below give 70% train, 10% test, and 20% validation:
import numpy as np

X_train, X_test, X_validate = np.split(X, [int(.7*len(X)), int(.8*len(X))])
y_train, y_test, y_validate = np.split(y, [int(.7*len(y)), int(.8*len(y))])
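Note that np.split cuts the arrays at fixed positions, so it neither shuffles nor stratifies. If the rows are ordered (for example, sorted by label), it may help to apply one shared random permutation to X and y before splitting. A minimal sketch, with toy data and a seed that are assumptions for illustration only:

```python
import numpy as np

# Hypothetical toy data: 100 samples with alternating labels
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Shuffle X and y with the same permutation so rows stay aligned
rng = np.random.default_rng(0)
perm = rng.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]

# Cut at 70% and 80% of the length: 70% train, 10% test, 20% validation
cuts = [int(.7 * len(X)), int(.8 * len(X))]
X_train, X_test, X_validate = np.split(X_shuffled, cuts)
y_train, y_test, y_validate = np.split(y_shuffled, cuts)

print(len(X_train), len(X_test), len(X_validate))
```

This gives a random split but still does not guarantee equal class proportions in each part.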