How to set the test size in stratified K-fold sampling in Python?
Question
Using sklearn, I want 3 splits (i.e. n_splits = 3) of a sample dataset with a train/test ratio of 70:30. I am able to split the set into 3 folds, but I cannot define the test size (as in the train_test_split method). Is there a way to define the test sample size in StratifiedKFold?
from sklearn.model_selection import StratifiedKFold as SKF

# X and y are the feature array and the label array (NumPy arrays)
skf = SKF(n_splits=3)
skf.get_n_splits(X, y)
for train_index, test_index in skf.split(X, y):
    # Loops over 3 iterations, each giving a stratified train/test split
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
Recommended answer
StratifiedKFold performs, by definition, a K-fold split. That is, on each iteration the returned iterator yields K-1 folds for training and 1 fold for testing. K is controlled by n_splits, so the data is divided into K groups of roughly n_samples/K samples each, and every group serves as the test set exactly once while the remaining K-1 groups are used for training. Refer to Wikipedia or search for K-fold cross-validation for more information.
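The behaviour described above can be checked on a small example. This is a minimal sketch on made-up toy data (30 samples in two balanced classes, chosen only for illustration); it records the train/test sizes of each iteration and counts how often each sample lands in a test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (assumed for illustration): 30 samples, 2 balanced classes.
X = np.arange(60).reshape(30, 2)
y = np.array([0] * 15 + [1] * 15)

skf = StratifiedKFold(n_splits=3)

fold_sizes = []                             # (n_train, n_test) per iteration
test_counts = np.zeros(len(y), dtype=int)   # times each sample is tested

for train_index, test_index in skf.split(X, y):
    fold_sizes.append((len(train_index), len(test_index)))
    test_counts[test_index] += 1

print(fold_sizes)                            # sizes of each train/test pair
print(test_counts.min(), test_counts.max())  # both 1 if every sample is tested once
```

With K = 3, each iteration should hold out about one third of the samples, and across the three iterations every sample is used for testing once.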
In short, the size of the test set will be 1/K (i.e. 1/n_splits) of your data, so you can tune that parameter to control the test size (e.g. n_splits=3 gives a test split of 1/3 ≈ 33% of your data). However, StratifiedKFold will iterate over all K folds, training on the other K-1 each time, which might not be what you want.
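To make the 1/K relationship concrete, here is a small arithmetic sketch. The helper kfold_test_fraction is a hypothetical name introduced only for this illustration and is not part of sklearn. It shows why an exact 30% test share is not reachable with K-fold: that would need K = 1/0.3 ≈ 3.33, and K must be an integer.

```python
def kfold_test_fraction(n_splits):
    """Approximate share of the data held out per K-fold iteration.

    Hypothetical helper for illustration; not an sklearn function.
    """
    if n_splits < 2:
        raise ValueError("K-fold needs at least 2 splits")
    return 1.0 / n_splits


# Test share for a few common values of K.
for k in (2, 3, 4, 5, 10):
    print(k, round(kfold_test_fraction(k), 3))
```

The closest choices to 70:30 are K = 3 (about 67:33) and K = 4 (75:25).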
Having said that, you might be interested in StratifiedShuffleSplit, which lets you configure both the number of splits and the train/test ratio. If you want just a single split, set n_splits=1 and keep test_size=0.3 (or whatever ratio you want).
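A minimal sketch of the StratifiedShuffleSplit approach, again on made-up toy data (100 samples in two balanced classes; the data and random_state=0 are assumptions for illustration). It requests 3 splits with a 70:30 ratio, as in the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (assumed for illustration): 100 samples, 2 balanced classes.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 50 + [1] * 50)

# 3 independent random splits, each holding out about 30% for testing.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=0)

split_sizes = []   # (n_train, n_test, n_test_class_1) per split
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    split_sizes.append((len(train_index), len(test_index), int(y_test.sum())))

print(split_sizes)
```

Unlike StratifiedKFold, the splits are drawn independently, so the test sets of different splits may overlap and some samples may never be tested.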