为什么在传递StratifiedKFold()作为GridSearchCV的参数期间应该调用split()函数? [英] Why we should call split() function during passing StratifiedKFold() as a parameter of GridSearchCV?

查看:537
本文介绍了为什么在传递StratifiedKFold()作为GridSearchCV的参数期间应该调用split()函数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我要做什么?

我正在尝试在GridSearchCV()中使用StratifiedKFold().

然后,什么让我感到困惑?

当我们使用K折叠交叉验证时,我们只需传递GridSearchCV()内的CV编号,如下所示.

When we use K Fold Cross Validation, we just pass the number of CV inside GridSearchCV() like the following.

grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=5, scoring='f1', return_train_score=True, n_jobs=2)

然后,当我需要使用StratifiedKFold()时,我认为该过程应该保持不变.也就是说,仅将分割数-StratifiedKFold(n_splits=5)设置为cv.

Then, when I will need to use StratifiedKFold(), I think the procedure should remain same. That is, set the number of splits only - StratifiedKFold(n_splits=5) to cv.

grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=StratifiedKFold(n_splits=5), scoring='f1', return_train_score=True, n_jobs=2)

但是这个答案

无论使用哪种交叉验证策略,所需要做的只是 按照建议使用功能split提供生成器:

whatever the cross validation strategy used, all that is needed is to provide the generator using the function split, as suggested:

kfolds = StratifiedKFold(5)
clf = GridSearchCV(estimator, parameters, scoring=qwk, cv=kfolds.split(xtrain,ytrain))
clf.fit(xtrain, ytrain)

此外,此问题的答案之一也建议这样做这.这意味着,他们建议在使用GridSearchCV()时调用split函数:StratifiedKFold(n_splits=5).split(xtrain,ytrain).但是,我发现调用split()而不调用split()给了我相同的f1分数.

Moreover, one of the answers of this question also suggest to do this. This means, they suggest to call split function :StratifiedKFold(n_splits=5).split(xtrain,ytrain) during using GridSearchCV(). But, I have found that calling split() and without calling split() give me the same f1 score.

因此,我的问题

  • 我不明白为什么在分层K折叠过程中需要调用split()函数 我们在K Fold CV期间不需要做这类事情.

  • I do not understand why do we need to call split() function during Stratified K Fold as we do not need to do such type of things during K Fold CV.

如果调用split()函数,GridSearchCV()将如何作为Split()函数

If split() function is called, how GridSearchCV() will work as Split() function returns training and testing data set indices? That is, I want to know how GridSearchCV() will use those indices?

推荐答案

基本上GridSearchCV很聪明,可以为该cv参数采用多个选项-一个数字,一个拆分索引的迭代器或一个带有拆分函数的对象.您可以在此处,复制到下面.

Basically GridSearchCV is clever and can take multiple options for that cv parameter - a number, an iterator of split indices or an object with a split function. You can look at the code here, copied below.

cv = 5 if cv is None else cv
if isinstance(cv, numbers.Integral):
    if (classifier and (y is not None) and
            (type_of_target(y) in ('binary', 'multiclass'))):
        return StratifiedKFold(cv)
    else:
        return KFold(cv)

if not hasattr(cv, 'split') or isinstance(cv, str):
    if not isinstance(cv, Iterable) or isinstance(cv, str):
        raise ValueError("Expected cv as an integer, cross-validation "
                         "object (from sklearn.model_selection) "
                         "or an iterable. Got %s." % cv)
    return _CVIterableWrapper(cv)

return cv  # New style cv objects are passed without any modification

基本上,如果您不传递任何内容,它将使用带有5的KFold.如果这是分类问题并且目标是二进制/多类,那么它也足够聪明以自动使用StratifedKFold.

Basically if you don't pass anything, it uses a KFold with 5. It's also clever enough to automatically use StratifedKFold, if it's a classification problem and the target is binary/multiclass.

如果传递带有拆分功能的对象,它将仅使用该功能.而且,如果您不传递任何参数,而是传递一个可迭代的对象,则它将假定它是拆分索引的可迭代对象,然后为您进行包装.

If you pass an object with a split function, it just uses that. And if you don't pass any of them, but pass an iterable, it assumes that is an iterable of the split indices and wraps that up for you.

因此,在您的情况下,假设这是二进制/多类目标的分类问题,则以下所有内容都将给出完全相同的结果/拆分-不管您使用哪一个都可以!

So in your case, assuming it's a classification problem with a binary/multiclass target, all the below will give the exact same results/splits - it does not matter which one you use!

cv=5
cv=StratifiedKFold(5)
cv=StratifiedKFold(5).split(xtrain,ytrain)

这篇关于为什么在传递StratifiedKFold()作为GridSearchCV的参数期间应该调用split()函数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆