Scikit Learn GridSearchCV without cross validation (unsupervised learning)

Question

Is it possible to use GridSearchCV without cross validation? I am trying to optimize the number of clusters in KMeans clustering via grid search, so I neither need nor want cross validation.

The documentation also confuses me: the fit() method has an option for unsupervised learning (it says to pass None as y for unsupervised learning). But if you want to do unsupervised learning, you need to do it without cross validation, and there appears to be no option to turn cross validation off.

Recommended answer

After much searching, I was able to find this thread. It appears that you can get rid of cross validation in GridSearchCV if you use:

cv=[(slice(None), slice(None))]

I have tested this against my own hand-coded version of grid search without cross validation, and both methods give the same results. I am posting this answer to my own question in case others run into the same issue.
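As a rough illustration of why this works (a sketch, not taken from the linked thread): the cv argument accepts an iterable of (train, test) index pairs, and slice(None) selects every row. The single "split" therefore trains and scores on the full dataset. The toy array X below is made up for the demonstration.

```python
import numpy as np

# Toy data, purely illustrative: 6 samples, 2 features
X = np.arange(12).reshape(6, 2)

# One (train, test) pair in which both "index sets" are slice(None)
cv = [(slice(None), slice(None))]
train_idx, test_idx = cv[0]

# Indexing with slice(None) is the same as X[:], i.e. every row
X_train = X[train_idx]
X_test = X[test_idx]

print(len(cv), X_train.shape, X_test.shape)
```

Because the list holds exactly one pair, each parameter combination is fitted and scored once, on all of the data.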

To answer jjrr's question in the comments, here is an example use case:

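from sklearn.cluster import MeanShift
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV

def cv_silhouette_scorer(estimator, X, y=None):
    # Refit on X so that labels_ corresponds to the rows being scored
    estimator.fit(X)
    cluster_labels = estimator.labels_
    num_labels = len(set(cluster_labels))
    num_samples = len(X)
    # silhouette_score needs at least 2 clusters and fewer clusters than samples
    if num_labels == 1 or num_labels == num_samples:
        return -1
    return silhouette_score(X, cluster_labels)

# A single "split" whose train and test indices both select every row
cv = [(slice(None), slice(None))]
# param_dict, df and cols_of_interest are defined elsewhere by the caller
gs = GridSearchCV(estimator=MeanShift(), param_grid=param_dict,
                  scoring=cv_silhouette_scorer, cv=cv, n_jobs=-1)
gs.fit(df[cols_of_interest])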
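For the KMeans case from the question, a self-contained variant might look like the sketch below. The synthetic blobs, the candidate values for n_clusters, and the name silhouette_scorer are all assumptions made for illustration; only the cv trick and the silhouette-based scoring come from the answer above. Since KMeans provides predict(), this version scores the already-fitted estimator instead of refitting it.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV

def silhouette_scorer(estimator, X, y=None):
    """Score a fitted clusterer by the silhouette of its assignments on X."""
    labels = estimator.predict(X)
    n_labels = len(set(labels))
    # silhouette_score is undefined for 1 cluster or one cluster per sample
    if n_labels == 1 or n_labels == len(X):
        return -1.0
    return silhouette_score(X, labels)

# Synthetic data (assumption): four well-separated blobs in 2D
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

search = GridSearchCV(
    estimator=KMeans(n_init=10, random_state=0),
    param_grid={"n_clusters": [2, 3, 4, 5, 6]},  # illustrative candidates
    scoring=silhouette_scorer,
    cv=[(slice(None), slice(None))],  # one "split": fit and score on all rows
)
search.fit(X)

best_k = search.best_params_["n_clusters"]
print(best_k, search.best_score_)
```

With a single split, cv_results_["mean_test_score"] is simply the silhouette score of each candidate on the full dataset.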
