Using an explicit (predefined) validation set for grid search with sklearn

Question

I have a dataset, which has previously been split into 3 sets: train, validation and test. These sets have to be used as given in order to compare the performance across different algorithms.

I would now like to optimize the parameters of my SVM using the validation set. However, I cannot find how to input the validation set explicitly into sklearn.grid_search.GridSearchCV(). Below is some code I've previously used for doing K-fold cross-validation on the training set. However, for this problem I need to use the validation set as given. How can I do that?

from sklearn import svm
# sklearn.cross_validation and sklearn.grid_search were removed in
# scikit-learn 0.20; their classes now live in sklearn.model_selection.
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# (some code left out to simplify things)

# 5-fold stratified CV on the training set only; y is passed at fit time
skf = StratifiedKFold(n_splits=5, shuffle=True)
clf = GridSearchCV(svm.SVC(tol=0.005, cache_size=6000,
                           class_weight=penalty_weights),
                   param_grid=tuned_parameters,
                   n_jobs=2,
                   pre_dispatch="n_jobs",
                   cv=skf,
                   scoring=scorer)
clf.fit(X_train, y_train)

Recommended answer

Use PredefinedSplit:

from sklearn.model_selection import PredefinedSplit
ps = PredefinedSplit(test_fold=your_test_fold)

then set cv=ps in GridSearchCV.

test_fold : array-like, shape (n_samples,)

test_fold[i] gives the test set fold of sample i. A value of -1 indicates that the corresponding sample is not part of any test set folds, but will instead always be put into the training fold.
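To see what these fold labels mean in practice, here is a small sketch with a made-up six-sample test_fold array (the values are purely illustrative): the four samples marked -1 always land in the training indices, and the two samples marked 0 form the only test fold.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Hypothetical fold labels for 6 samples:
# -1 -> never tested, always in the training part
#  0 -> belongs to test fold 0 (here: the validation set)
test_fold = np.array([-1, -1, -1, -1, 0, 0])
ps = PredefinedSplit(test_fold=test_fold)

# One split per distinct non-negative label, so 1 split here.
print(ps.get_n_splits())
for train_idx, test_idx in ps.split():
    print("train:", train_idx, "test:", test_idx)
```

Because only the label 0 occurs, there is exactly one train/test split, which is what a single fixed validation set needs.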

Also see here:

When using a validation set, set test_fold to 0 for all samples that are part of the validation set, and to -1 for all other samples.
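Putting the recipe together, a minimal end-to-end sketch might look like the following. It assumes the given splits are NumPy arrays named X_train/y_train and X_val/y_val (hypothetical names; synthetic data from make_classification stands in for the real sets) and uses an illustrative parameter grid. Training and validation rows are stacked into one array, and test_fold marks which rows belong to the validation fold.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Synthetic stand-in for the pre-split data (assumption: in practice
# X_train/y_train and X_val/y_val are loaded from the given splits).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

# GridSearchCV.fit takes a single X and y, so stack train + validation...
X_all = np.concatenate([X_train, X_val])
y_all = np.concatenate([y_train, y_val])

# ...and label each row: -1 = always in the training part,
# 0 = member of the single test fold (the validation set).
test_fold = np.concatenate([
    np.full(len(X_train), -1, dtype=int),
    np.zeros(len(X_val), dtype=int),
])
ps = PredefinedSplit(test_fold=test_fold)

# Hypothetical parameter grid, for illustration only.
tuned_parameters = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

# refit=False: do not retrain on train + validation after the search.
clf = GridSearchCV(svm.SVC(), param_grid=tuned_parameters, cv=ps,
                   refit=False)
clf.fit(X_all, y_all)
print(clf.best_params_)
print(clf.best_score_)  # validation-set accuracy of the best candidate

# Retrain the chosen configuration on the training set only.
final_model = svm.SVC(**clf.best_params_).fit(X_train, y_train)
```

With the default refit=True, GridSearchCV would refit the best estimator on everything passed to fit, i.e. the training and validation samples together. Setting refit=False and retraining on X_train alone keeps the three given sets separate, which matters when the results must be comparable across algorithms; the untouched test set can then be used for the final evaluation.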
