Difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

Question

As the title says, I am wondering what the difference is between

StratifiedKFold with the parameter shuffle=True

StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

StratifiedShuffleSplit

StratifiedShuffleSplit(n_splits=10, test_size='default', train_size=None, random_state=0)

and what is the advantage of using StratifiedShuffleSplit?

Answer

In KFolds, the test sets do not overlap, even with shuffle=True. With KFolds and shuffle, the data is shuffled once at the start and then divided into the desired number of splits. The test data is always one of the splits; the train data is the rest.

In ShuffleSplit, the data is shuffled every time, and then split. This means the test sets may overlap between the splits.

See this block for an example of the difference. Note the overlap of the elements in the test sets for ShuffleSplit.

from sklearn.model_selection import StratifiedShuffleSplit, StratifiedKFold

splits = 5

# Toy data: 10 samples, 5 per class
tx = range(10)
ty = [0] * 5 + [1] * 5

# Shuffles once, then cuts the data into 5 disjoint test folds
kfold = StratifiedKFold(n_splits=splits, shuffle=True, random_state=42)
# Reshuffles for every split and draws a test set of 2 samples each time
shufflesplit = StratifiedShuffleSplit(n_splits=splits, random_state=42, test_size=2)

print("KFold")
for train_index, test_index in kfold.split(tx, ty):
    print("TRAIN:", train_index, "TEST:", test_index)

print("Shuffle Split")
for train_index, test_index in shufflesplit.split(tx, ty):
    print("TRAIN:", train_index, "TEST:", test_index)

Output:

KFold
TRAIN: [0 2 3 4 5 6 7 9] TEST: [1 8]
TRAIN: [0 1 2 3 5 7 8 9] TEST: [4 6]
TRAIN: [0 1 3 4 5 6 8 9] TEST: [2 7]
TRAIN: [1 2 3 4 6 7 8 9] TEST: [0 5]
TRAIN: [0 1 2 4 5 6 7 8] TEST: [3 9]
Shuffle Split
TRAIN: [8 4 1 0 6 5 7 2] TEST: [3 9]
TRAIN: [7 0 3 9 4 5 1 6] TEST: [8 2]
TRAIN: [1 2 5 6 4 8 9 0] TEST: [3 7]
TRAIN: [4 6 7 8 3 5 1 2] TEST: [9 0]
TRAIN: [7 2 6 5 4 3 0 9] TEST: [1 8]
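
The overlap can also be checked programmatically rather than by eye. The following is a minimal sketch on the same kind of toy data (10 samples, 5 per class); the helper name count_test_appearances is made up for illustration and is not part of sklearn. It counts how often each sample index ends up in a test set.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

# Toy data: 10 samples, 5 per class
X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 5 + [1] * 5)


def count_test_appearances(splitter):
    """Count how many times each sample index lands in a test set."""
    counts = Counter()
    for _, test_index in splitter.split(X, y):
        counts.update(int(i) for i in test_index)
    return counts


kfold_counts = count_test_appearances(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
)
shuffle_counts = count_test_appearances(
    StratifiedShuffleSplit(n_splits=5, test_size=2, random_state=42)
)

# StratifiedKFold: the test folds partition the data,
# so every index is counted exactly once.
print("KFold:        ", sorted(kfold_counts.items()))
# StratifiedShuffleSplit: each split is drawn independently,
# so an index may be counted several times or not at all.
print("Shuffle Split:", sorted(shuffle_counts.items()))
```

For StratifiedKFold every count is 1. For StratifiedShuffleSplit the counts only have to add up to n_splits * test_size, so some samples can be tested repeatedly while others are never tested.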

As for when to use them, I tend to use KFolds for any cross-validation, and I use ShuffleSplit with a split of 2 for my train/test set splits. But I'm sure there are other use cases for both.
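
A rough sketch of that workflow, under some assumptions: the synthetic dataset from make_classification, the LogisticRegression model, and the 80/20 hold-out ratio are placeholders chosen for illustration, and "a split of 2" is read here as a single split into a train part and a test part (n_splits=1).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    StratifiedShuffleSplit,
    cross_val_score,
)

# Synthetic binary classification data (an assumption for this sketch)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Cross-validation: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("CV accuracy per fold:", scores)

# Single stratified train/test split: one shuffle, one 80/20 cut
holdout = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_index, test_index = next(holdout.split(X, y))
model.fit(X[train_index], y[train_index])
print("Hold-out accuracy:", model.score(X[test_index], y[test_index]))
```

Note that test_size is chosen independently of n_splits in StratifiedShuffleSplit, whereas in StratifiedKFold each test fold holds roughly 1/n_splits of the data.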
