Not getting better results after using GridSearchCV(), rather getting better results manually

Problem description

I was trying to learn how GridSearchCV works by testing it on KNeighborsClassifier. When I assigned n_neighbors = 9, my classifier gave a score of 0.9122807017543859,

but when I used GridSearchCV with n_neighbors = 9 included in the parameter list, I got a score of 0.8947368421052632.

What could possibly be the reason? Any help is appreciated. Here's my code:

from sklearn import datasets
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split as splitter
import pickle       
from sklearn.neighbors import KNeighborsClassifier  
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Data pre-processing  <-----------------------

data = datasets.load_breast_cancer()
p=data
add=data.target.reshape(569,1)  
columns = np.append(data.feature_names, 
                    data.target_names[0],
                    axis=None)
data = np.append(data.data,
                 add,
                 axis=1)                        
df = pd.DataFrame(data=data,columns=columns)

X_train,X_test,y_train,y_test = splitter(p.data,
                                         p.target,
                                         test_size=0.3,
                                         random_state=12)

gauss = KNeighborsClassifier(n_neighbors=9)

param_grid = {'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}

gausCV = GridSearchCV(KNeighborsClassifier(), param_grid, verbose=False)


gauss.fit(X_train,y_train)
gausCV.fit(X_train,y_train)

print(gauss.score(X_test,y_test))
print(gausCV.score(X_test,y_test))

This is what I get:

0.9122807017543859
0.8947368421052632
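As a diagnostic, it can help to look at what GridSearchCV actually selected rather than only its test score. The sketch below rebuilds the same breast-cancer split as above and inspects the standard best_params_ and best_score_ attributes (illustrative; not part of the original question):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data and split parameters as in the question above
X_train, X_test, y_train, y_test = train_test_split(
    *datasets.load_breast_cancer(return_X_y=True),
    test_size=0.3, random_state=12)

grid = GridSearchCV(KNeighborsClassifier(),
                    {'n_neighbors': list(range(1, 14))}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)          # the n_neighbors the grid search picked (may not be 9)
print(grid.best_score_)           # mean cross-validated accuracy on the training folds
print(grid.score(X_test, y_test)) # held-out test accuracy of the refit best estimator
```

If best_params_ is not {'n_neighbors': 9}, the grid search chose a different neighbor count because it scored better across the cross-validation folds, even if it happens to score worse on this particular test split.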

Recommended answer

The issue is not in the number of neighbors, but in the cross-validation. The GridSearchCV process not only tries all of the values you have in param_grid, but also performs some data manipulation: the "folds" of the data. This resamples the data multiple times so as to help make the final classifier as robust to new data as possible. Given how close the scores of the gauss and gausCV models are, it is almost certain that the particular data drawn in each split is affecting the results, but not heavily.

This is a good example of why just accepting the model with the highest score might not always be the best path: I would have greater faith in a model that scored well after going through cross-validation than in one that had not (all else being equal).

Cross-validation is described well here.
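To see the fold-to-fold variation the answer describes, here is a minimal sketch using cross_val_score on the same breast-cancer data (illustrative; not from the original answer):

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

data = datasets.load_breast_cancer()
knn = KNeighborsClassifier(n_neighbors=9)

# 5-fold cross-validation: each fold holds out a different subset of the
# data, so each fold reports a slightly different accuracy.
scores = cross_val_score(knn, data.data, data.target, cv=5)
print(scores)        # one accuracy per fold
print(scores.mean()) # the mean GridSearchCV uses to compare n_neighbors values
```

The spread between the per-fold scores is typically of the same order as the gap between 0.912 and 0.895 in the question, which is why a single train/test split can rank n_neighbors = 9 differently than the cross-validated average does.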
