Stratified Train/Test-split in scikit-learn
Question
I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below:
X, Xt, userInfo, userInfo_train = sklearn.cross_validation.train_test_split(X, userInfo)
However, I'd like to stratify my training dataset. How do I do that? I've been looking into the StratifiedKFold
method, but it doesn't let me specify the 75%/25% split and only stratify the training dataset.
Recommended answer
[update for 0.17]
See sklearn.model_selection.train_test_split:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    stratify=y,
                                                    test_size=0.25)
[/update for 0.17]
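To check that the stratify argument does what the question asks, here is a minimal sketch on made-up data. The toy arrays, the 80/20 class imbalance, and the random_state value are assumptions chosen for illustration; they are not from the original question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# stratify=y asks for the class ratio of y to be kept in both subsets;
# test_size=0.25 gives the 75%/25% split from the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Count how many samples of each class landed in each subset.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train class counts:", train_counts)
print("test class counts:", test_counts)
```

Note that stratification applies to both subsets at once: the class ratio of y is preserved in the training set and in the test set.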
There is a pull request here.
But you can simply do train, test = next(iter(StratifiedKFold(...)))
and use the train and test indices if you want.
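The StratifiedKFold approach can be sketched as follows. This assumes the sklearn.model_selection API (scikit-learn 0.18 or later), where the splitter takes n_splits and the labels are passed to split(); older releases used a different constructor signature. The toy data and the choice of n_splits=4 (so that one fold is 25% of the data) are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# With 4 folds, each test fold holds one quarter of the samples,
# which yields a 75%/25% train/test split.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Take only the first (train, test) pair of index arrays.
train_idx, test_idx = next(iter(skf.split(X, y)))

# Use the indices to slice the data.
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(len(train_idx), len(test_idx))
```

The split ratio here is tied to the number of folds (1/n_splits goes to the test set), so this only works for ratios such as 75/25 or 80/20; train_test_split with stratify is more direct for arbitrary sizes.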