How to implement n-times repeated k-fold cross validation that yields n*k folds in sklearn?

Views: 486 | Published: 2021/2/14 20:40:25 | Tags: python, scikit-learn, keras

Question

I ran into some trouble implementing a cross-validation setting that I saw in a paper. It is explained in the attached picture:

So, it says that they use 5 folds, which means k = 5. But then the authors say that they repeat the cross-validation 20 times, which creates 100 folds in total. Does that mean I can just use this piece of code:

kfold = StratifiedKFold(n_splits=100, shuffle=True, random_state=seed)

Because my code also yields 100 folds. Any recommendations?

Recommended answer

I'm pretty sure they are talking about RepeatedStratifiedKFold. There are two simple ways to create 5 folds 20 times.

Method 1:

For your case, use n_splits=5, n_repeats=20. The code below is just the sample from the scikit-learn website.

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])

# 2 folds repeated 2 times -> 4 train/test splits in total
rskf = RepeatedStratifiedKFold(n_splits=2, n_repeats=2, random_state=42)
for train_index, test_index in rskf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

# Output has this form (the exact indices depend on random_state):
# TRAIN: [1 2] TEST: [0 3]   # repeat 1: the folds are [1 2] and [0 3]
# TRAIN: [0 3] TEST: [1 2]
# TRAIN: [1 3] TEST: [0 2]   # repeat 2: the folds are [1 3] and [0 2]
# TRAIN: [0 2] TEST: [1 3]
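Applied to the setting in the question (k = 5, repeated 20 times), the same class can be passed directly as the `cv` argument of `cross_val_score`. The following is only a minimal sketch: the synthetic data from `make_classification` and the `LogisticRegression` classifier are placeholders chosen for illustration, not anything taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic data, used only for illustration (an assumption, not from the paper).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 5 folds repeated 20 times -> 5 * 20 = 100 train/test splits.
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
n_total_splits = rskf.get_n_splits(X, y)

# cross_val_score returns one score per split; the model is a placeholder.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=rskf)
print(n_total_splits, len(scores), scores.mean())
```

Each of the 100 scores comes from a test fold holding one fifth of the data, so averaging them should give a less noisy estimate than a single 5-fold run.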

Method 2:

You can achieve the same effect with a loop. Note that random_state must not be the same fixed number on every iteration; otherwise you will get the same 5 folds 20 times.

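from sklearn.model_selection import StratifiedKFold
for i in range(20):
    # a different seed per repetition gives a different 5-fold partition
    kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)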
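The loop only constructs the splitters; to actually use them, each one still has to be iterated with `split`. A minimal sketch of collecting all 100 splits this way is shown here, assuming a small toy dataset (50 samples, two balanced classes) made up purely for illustration; `all_splits` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data for illustration only: 50 samples, two balanced classes.
X = np.arange(100).reshape(50, 2)
y = np.array([0, 1] * 25)

all_splits = []  # hypothetical container for (train_index, test_index) pairs
for seed in range(20):  # 20 repetitions, one seed each
    kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_index, test_index in kfold.split(X, y):
        all_splits.append((train_index, test_index))

print(len(all_splits))  # 20 repetitions * 5 folds
```

Within one repetition the five test folds partition the data, so every sample is used for validation exactly once per repetition.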

Why is it different from your code?

Say you have 10000 data points and you create 100 folds. The size of one fold is 100, so your training set has 9900 points and your validation set has 100.

RepeatedStratifiedKFold creates 5 folds for your model, each of size 2000. It then creates a new set of 5 folds again and again, 20 times in total. That means you get 100 folds, but with a much larger validation set (2000 points instead of 100). Depending on your objective, you might want a larger validation set, e.g. to have enough data to validate properly, and RepeatedStratifiedKFold lets you create the same number of folds in a different way (with a different training-validation proportion). Other than that, I'm not sure whether there are any other objectives.
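The difference in fold sizes can be checked directly. The sketch below assumes 10000 dummy points with two balanced classes, matching the numbers in the explanation above; the feature values are irrelevant here, so `X` is just zeros.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

# 10000 dummy points with two balanced classes (illustrative assumption).
X = np.zeros((10000, 1))
y = np.array([0, 1] * 5000)

single = StratifiedKFold(n_splits=100, shuffle=True, random_state=0)
repeated = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)

# Collect the distinct validation-set sizes produced by each splitter.
single_sizes = {len(test) for _, test in single.split(X, y)}
repeated_sizes = {len(test) for _, test in repeated.split(X, y)}

print(single.get_n_splits(X, y), single_sizes)      # 100 {100}
print(repeated.get_n_splits(X, y), repeated_sizes)  # 100 {2000}
```

Both splitters yield 100 train/test splits, but the validation sets differ in size by a factor of 20.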

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RepeatedStratifiedKFold.html

Thanks for RepeatedStratifiedKFold.
