How can I split data into 3 or more parts with sklearn?

Question

I want to split my data into stratified train, test, and validation datasets, but sklearn only provides cross_validation.train_test_split, which can only divide the data into 2 parts. What should I do if I want three?

Recommended answer

If you want a stratified train/test split, you can use StratifiedKFold from sklearn.

Suppose X holds your features and y holds your labels, both as NumPy arrays. Following the StratifiedKFold example in the scikit-learn documentation:

from sklearn.model_selection import StratifiedKFold

# Each of the 3 folds keeps roughly the same class proportions as y
skf = StratifiedKFold(n_splits=3)
for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    # Index the arrays with the row positions chosen for this fold
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
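
To see what "stratified" means in practice, here is a minimal sketch on toy data. The arrays, the 2:1 class ratio, and the choice of 4 folds are assumptions made for illustration only; they are not part of the original answer. It counts the classes in each test fold to check that the ratio is preserved.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (assumed for illustration): 12 samples, two classes in a 2:1 ratio.
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 8 + [1] * 4)

skf = StratifiedKFold(n_splits=4)
fold_class_counts = []
for train_index, test_index in skf.split(X, y):
    # Count how many samples of each class landed in this test fold.
    counts = np.bincount(y[test_index], minlength=2)
    fold_class_counts.append(counts.tolist())
    print("TEST class counts:", counts)
```

Each test fold should contain two samples of class 0 and one of class 1, matching the 2:1 ratio of the full label array.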

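Update: to split the data into, say, 3 parts by percentage, you can use numpy.split(). With cut points at 70% and 80% of the length, the first 70% of the rows become the training set, the next 10% the test set, and the last 20% the validation set. Note that numpy.split() slices the rows in their existing order, so it neither shuffles nor stratifies the data: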

import numpy as np
# Cut points at 70% and 80% of the length: 70% train, 10% test, 20% validate
X_train, X_test, X_validate = np.split(X, [int(.7*len(X)), int(.8*len(X))])
y_train, y_test, y_validate = np.split(y, [int(.7*len(y)), int(.8*len(y))])
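
If the three parts also need to be stratified, as the question asks, one common alternative is to call train_test_split twice with its stratify argument. This is a sketch of that alternative, not the method from the answer above. The toy data, the 60:40 class ratio, and the random_state values are assumptions for illustration; the sizes are passed as absolute sample counts to reproduce the same 70/10/20 proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 100 samples, classes in a 60:40 ratio.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 60 + [1] * 40)

# Step 1: hold out 30 samples (30%), keeping the class proportions of y.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=30, stratify=y, random_state=0
)

# Step 2: split the 30 held-out samples into 10 for test and 20 for
# validation, again stratified on the held-out labels.
X_test, X_validate, y_test, y_validate = train_test_split(
    X_rest, y_rest, test_size=20, stratify=y_rest, random_state=0
)

print(len(X_train), len(X_test), len(X_validate))
```

Unlike numpy.split(), train_test_split shuffles the rows by default, so the three parts do not depend on the original row order.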
