How to set the test size in stratified K-fold sampling in Python?


Problem description

Using sklearn, I want 3 splits (i.e. n_splits = 3) of the sample dataset with a train/test ratio of 70:30. I am able to split the set into 3 folds, but I am not able to define the test size (as I can with the train_test_split method). Is there a way to define the test sample size in StratifiedKFold?

from sklearn.model_selection import StratifiedKFold as SKF

# X and y are assumed to be NumPy arrays (features and class labels)
skf = SKF(n_splits=3)
skf.get_n_splits(X, y)  # returns the number of splits, here 3
for train_index, test_index in skf.split(X, y):
    # Loops over 3 iterations, each yielding a stratified train/test split
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Recommended answer

StratifiedKFold performs, by definition, a K-fold split. That is, on each iteration the returned iterator yields (K-1) folds for training and 1 fold for testing. K is controlled by n_splits, so the data is divided into K folds of roughly n_samples/K samples each, and every fold is used as the test set exactly once while the remaining K-1 folds are used for training. Refer to Wikipedia, or search for K-fold cross-validation, for more information.
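The K-fold behaviour described above can be checked with a minimal sketch. The toy arrays X and y below are made up for illustration (they are not from the question); the sketch counts how often each sample ends up in a test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 30 samples, 2 balanced classes.
X = np.arange(60).reshape(30, 2)
y = np.array([0] * 15 + [1] * 15)

skf = StratifiedKFold(n_splits=3)

# Count how many times each sample lands in a test fold.
test_counts = np.zeros(len(y), dtype=int)
for train_index, test_index in skf.split(X, y):
    # Within one iteration, train and test indices never overlap.
    assert np.intersect1d(train_index, test_index).size == 0
    test_counts[test_index] += 1

# Each sample is used for testing in exactly one of the K iterations.
print(test_counts)
```

Because every sample is tested exactly once, the K test folds partition the dataset, which is why their size cannot be chosen independently of K.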

In short, the test set holds 1/K (i.e. 1/n_splits) of the data, so n_splits is the only parameter that controls the test size (e.g. n_splits=3 gives a test split of 1/3, about 33% of your data). However, StratifiedKFold always iterates K times, holding out a different fold each time and training on the other K-1, which might not be what you want.

Having said that, you might be interested in StratifiedShuffleSplit, which lets you configure both the number of splits and the train/test ratio. If you just want a single split, you can set n_splits=1 and still keep test_size=0.3 (or whatever ratio you want).
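A minimal sketch of that suggestion, applied to the 3 splits at 70:30 asked for in the question. The toy data and the random_state value are assumptions made for reproducibility; they are not part of the original answer.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 100 samples, 2 balanced classes.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 50 + [1] * 50)

# 3 independent stratified splits, each with 30% of the data held out.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=0)

split_sizes = []
test_class_counts = []
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    split_sizes.append((len(train_index), len(test_index)))
    # Class counts in the test set, to confirm the split is stratified.
    test_class_counts.append(np.bincount(y_test))

print(split_sizes)
```

Unlike the folds of StratifiedKFold, these splits are drawn independently, so the same sample may appear in the test set of more than one split.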
