Stratified Train/Test-split in scikit-learn

Question

I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below:

X, Xt, userInfo, userInfo_train = sklearn.cross_validation.train_test_split(X, userInfo)   

However, I'd like to stratify my training dataset. How do I do that? I've been looking into the StratifiedKFold method, but it doesn't let me specify the 75%/25% split and stratify only the training dataset.

Recommended answer

[update for 0.17]

See sklearn.model_selection.train_test_split:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    stratify=y, 
                                                    test_size=0.25)

[/update for 0.17]
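
As a quick check of what stratify=y does, here is a minimal sketch on made-up data. The toy arrays X and y and the random_state value are assumptions for illustration and do not come from the question. It splits 75%/25% and compares the class proportions in the two parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    stratify=y,       # preserve the class ratio of y in both parts
    test_size=0.25,   # 25% test, 75% train
    random_state=0,   # fixed seed so the split is reproducible
)

# Fraction of class 1 in each part; both should be close to the 20% in y.
train_ratio = y_train.mean()
test_ratio = y_test.mean()
print(len(y_train), len(y_test), train_ratio, test_ratio)
```

Note that stratify=y keeps the class proportions in both the training and the test part, not in the training part alone.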

There is a pull request here. But you can simply do train, test = next(iter(StratifiedKFold(...))) and use the train and test indices if you want.
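
The StratifiedKFold route can be sketched roughly as follows. This assumes a recent scikit-learn, where the class lives in sklearn.model_selection and the folds come from its split(X, y) method rather than from iterating over the object itself. The toy data and the seed are made up for illustration. With 4 folds, each test fold holds about 25% of the samples:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (hypothetical): 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# 4 folds means each test fold is roughly 1/4 of the data.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Take only the first (train, test) pair of index arrays.
train_idx, test_idx = next(skf.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(len(train_idx), len(test_idx), y_train.mean(), y_test.mean())
```

This only gives split sizes of the form 1/n_splits, so train_test_split with stratify is the more direct choice when an arbitrary test fraction is needed.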
