Model help using Scikit-learn when using GridSearch

Views: 127 | Published: 2020/5/4 8:53:15 | Tags: python machine-learning scikit-learn cross-validation grid-search


Problem description

As part of the Enron project, I built the model below; the steps are summarized after the code.

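from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# pipe, clf_params, features and labels are defined earlier (not shown)
cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)

gcv.fit(features, labels)  # fit with the full dataset

for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    # predict on the test split without refitting on the training split
    gcv.best_estimator_.predict(x_test)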

The model below gives more reasonable, but lower, scores:

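from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# pipe, clf_params, features and labels are defined earlier (not shown)
cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)

gcv.fit(features, labels)  # fit with the full dataset

for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    # refit the best estimator on the training split, then predict on the test split
    gcv.best_estimator_.fit(x_train, y_train)
    gcv.best_estimator_.predict(x_test)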

Used KBest to score the features, sorted them, and tried combinations of higher- and lower-scoring features.

Used an SVM with GridSearch, with a StratifiedShuffleSplit as the cross-validator.

Used best_estimator_ to predict and to calculate precision and recall.
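The question does not show `pipe` or `clf_params`, so the following is only a minimal sketch of what the first two steps might look like. It assumes that "KBest" refers to scikit-learn's `SelectKBest` with the `f_classif` score function, uses synthetic data from `make_classification` in place of the Enron features, and uses a made-up parameter grid and step names (`kbest`, `svm`). The number of splits is reduced from 100 to 10 only to keep the example quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the Enron features/labels (assumption: NumPy arrays).
features, labels = make_classification(
    n_samples=150, n_features=20, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Step 1: score every feature and rank them from highest to lowest score.
selector = SelectKBest(score_func=f_classif, k="all").fit(features, labels)
ranked = np.argsort(selector.scores_)[::-1]
print("feature indices, best first:", ranked)

# Step 2: hypothetical pipeline, feature selection followed by an SVM.
pipe = Pipeline([
    ("kbest", SelectKBest(score_func=f_classif)),
    ("svm", SVC()),
])

# Hypothetical grid; keys follow the "<step name>__<parameter>" convention.
clf_params = {
    "kbest__k": [5, 10],
    "svm__C": [1, 10],
}

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)
gcv.fit(features, labels)
print("best parameters:", gcv.best_params_)
```

Putting `SelectKBest` inside the pipeline means the feature selection is redone on the training part of every split during the search, rather than once on all rows.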

The problem is that the estimator is spitting out perfect scores, in some cases 1.

But when I refit the best classifier on the training data and then run the test, it gives reasonable scores.

My question is what exactly GridSearch does with the test data after the split produced by the shuffle split object we pass to it. I assumed it would not fit anything on the test data; if that were true, then predicting on the same test data should not give such high scores, right? Since I set random_state, the shuffle split should have produced the same splits for the grid fit and for the prediction.
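One way to probe this is the sketch below, which is not taken from the question. It relies on scikit-learn's documented default `refit=True`: after the search, `GridSearchCV` refits the best parameter setting on all rows passed to `fit`, so every test split drawn afterwards from the same data has already been seen by `best_estimator_`. The data is synthetic, the grid is hypothetical, and accuracy (via `score`) stands in for precision and recall to keep the comparison short.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Synthetic, deliberately noisy stand-in for the Enron data (assumption).
features, labels = make_classification(
    n_samples=200, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

cv = StratifiedShuffleSplit(n_splits=20, test_size=0.2, random_state=42)
# Hypothetical grid; the question does not show its clf_params.
gcv = GridSearchCV(SVC(), {"C": [1, 10, 100], "gamma": ["scale", 0.1]}, cv=cv)
gcv.fit(features, labels)  # refit=True: best_estimator_ is refit on ALL rows

predict_only, refit_on_train = [], []
for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    # As in the first listing: x_test was part of the data used for the refit.
    predict_only.append(gcv.best_estimator_.score(x_test, y_test))

    # As in the second listing, but on a clone so best_estimator_ is untouched.
    model = clone(gcv.best_estimator_).fit(x_train, y_train)
    refit_on_train.append(model.score(x_test, y_test))

print("predict only:        ", np.mean(predict_only))
print("refit on train split:", np.mean(refit_on_train))
print("gcv.best_score_:     ", gcv.best_score_)
```

Because the integer `random_state` makes `cv.split` return the same splits each time, the "refit on train split" average should reproduce `gcv.best_score_`, while the "predict only" figure is effectively a training-set score.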

So, is it wrong to use the same shuffle split for both?
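A common way to sidestep the issue, offered here as a hedged sketch rather than as the accepted answer, is to set aside a test set before the search and run `GridSearchCV` only on the remainder. The names `x_dev` and `y_dev`, the synthetic data, and the parameter grid are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import (
    GridSearchCV, StratifiedShuffleSplit, train_test_split,
)
from sklearn.svm import SVC

# Synthetic stand-in for the Enron features/labels (assumption).
features, labels = make_classification(
    n_samples=200, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

# Set aside a final test set that the grid search never sees.
x_dev, x_test, y_dev, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42
)

# Tune on the remaining rows only; the inner splits are drawn from x_dev.
cv = StratifiedShuffleSplit(n_splits=20, test_size=0.2, random_state=42)
gcv = GridSearchCV(SVC(), {"C": [1, 10, 100]}, cv=cv)  # hypothetical grid
gcv.fit(x_dev, y_dev)

# best_estimator_ was refit on x_dev, so x_test is unseen data.
pred = gcv.best_estimator_.predict(x_test)
precision = precision_score(y_test, pred, zero_division=0)
recall = recall_score(y_test, pred, zero_division=0)
print("precision:", precision, "recall:", recall)
```

With a dataset as small as Enron's, a single held-out split gives a noisy estimate, so repeating this over several outer splits may be worth considering.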
