GridSearchCV.best_score_ meaning when scoring set to 'accuracy' and CV


Question

I'm trying to find the best neural network model for classifying breast cancer samples on the well-known Wisconsin Cancer dataset (569 samples, 31 features + target). I'm using sklearn 0.18.1. I'm not using normalization so far; I'll add it once I solve this question.

# some init code omitted
X_train, X_test, y_train, y_test = train_test_split(X, y)
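
For completeness, the snippets below rely on imports roughly like these (a sketch; load_breast_cancer is just one possible way to get the Wisconsin data, the actual init code is omitted above):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# assumed loading of the Wisconsin breast cancer data (569 samples)
data = load_breast_cancer()
X, y = data.data, data.target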

Define the NN params for GridSearchCV:

tuned_params = [{'solver': ['sgd'], 'learning_rate': ['constant'], "learning_rate_init" : [0.001, 0.01, 0.05, 0.1]},
                {"learning_rate_init" : [0.001, 0.01, 0.05, 0.1]}]

CV method and model:

cv_method = KFold(n_splits=4, shuffle=True)
model = MLPClassifier()

Apply the grid search:

grid = GridSearchCV(estimator=model, param_grid=tuned_params, cv=cv_method, scoring='accuracy')
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

If I run:

print(grid.best_score_)
print(accuracy_score(y_test, y_pred))

the result is 0.746478873239 and 0.902097902098.

According to the doc, "best_score_ : float, Score of best_estimator on the left out data". I assume it is the best accuracy among the ones obtained by running the 8 different configurations specified in tuned_params, the number of times specified by KFold, on the left-out data as defined by KFold. Am I right?

One more question: is there a method to find the optimal size of the test data to use in train_test_split, which defaults to 0.25?
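
(As far as I know there is no built-in helper for this; a rough, hand-rolled sketch, hypothetical and building on the variables above, would be to repeat the split/fit/score loop for a few candidate sizes and compare:)

# hypothetical sketch: compare a few candidate test sizes
for test_size in (0.1, 0.2, 0.25, 0.3, 0.4):
    scores = []
    for _ in range(10):  # average over several random splits to reduce noise
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size)
        scores.append(MLPClassifier().fit(X_tr, y_tr).score(X_te, y_te))
    print(test_size, sum(scores) / len(scores))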

Thanks a lot.

References

  • http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
  • http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV
  • http://scikit-learn.org/stable/modules/grid_search.html
  • http://scikit-learn.org/stable/modules/cross_validation.html
  • http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-auto-examples-model-selection-plot-nested-cross-validation-iris-py

Answer

grid.best_score_ is the average over all CV folds for a single combination of the parameters you specify in tuned_params (the best-performing combination).
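
In other words, it is the entry of cv_results_['mean_test_score'] for the winning parameter combination, not the score on the held-out X_test, which is why the two numbers you printed differ. A quick way to check this (a sketch reusing the fitted grid from the question):

best_idx = grid.best_index_                           # row of the best parameter combination
print(grid.best_params_)
print(grid.best_score_)
print(grid.cv_results_['mean_test_score'][best_idx])  # same value as best_score_

# the per-fold scores behind that mean (4 folds from KFold(n_splits=4))
for i in range(4):
    print(grid.cv_results_['split%d_test_score' % i][best_idx])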

In order to access other relevant details about the grid searching process, you can look at the grid.cv_results_ attribute.

From the documentation of GridSearchCV:

cv_results_ : dict of numpy (masked) ndarrays

A dict with keys as column headers and values as columns, 
that can be imported into a pandas DataFrame

It contains keys like 'split0_test_score', 'split1_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score', 'split0_train_score', 'split1_train_score', 'mean_train_score', etc., which give additional information about the whole execution.
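
For example, cv_results_ can be loaded into a DataFrame and sorted by rank to compare all eight configurations side by side (a small sketch, assuming pandas is available):

import pandas as pd

results = pd.DataFrame(grid.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))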

