Model help using Scikit-learn when using GridSearch


Question

As part of the Enron project, I built the attached model. Below is a summary of the steps:

from sklearn.model_selection import StratifiedShuffleSplit, GridSearchCV

cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)

gcv.fit(features, labels)  # fit with the full dataset

for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    gcv.best_estimator_.predict(x_test)

The following model gives more reasonable, but lower, scores:

cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)

gcv.fit(features, labels)  # fit with the full dataset

for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    gcv.best_estimator_.fit(x_train, y_train)  # refit on the training split only
    gcv.best_estimator_.predict(x_test)

  1. Used KBest to find the feature scores, sorted the features, and tried combinations of the higher and lower scores.

  2. Used SVM with a GridSearch using a StratifiedShuffle split (a sketch of how pipe and clf_params might be set up follows this list).

  3. Used the best_estimator_ to predict and calculate the precision and recall.
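The question does not show how pipe and clf_params were defined; a minimal sketch of what they might look like for the steps above, assuming a SelectKBest + SVC pipeline (the step names, k values and SVC parameters here are purely illustrative, not from the original post):

# Hypothetical reconstruction of pipe and clf_params (illustrative only)
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

pipe = Pipeline([
    ('select', SelectKBest(score_func=f_classif)),
    ('svc', SVC()),
])

# Parameter grid; keys follow the "<step>__<param>" convention used by Pipeline
clf_params = {
    'select__k': [5, 10, 15],
    'svc__C': [1, 10, 100],
    'svc__gamma': [0.001, 0.01, 0.1],
}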

The problem is that the estimator is spitting out perfect scores, in some cases 1.

But when I refit the best classifier on the training data and then run the test, it gives reasonable scores.

My doubt/question is: what exactly does GridSearch do with the test data after the split, using the ShuffleSplit object we pass in to it? I assumed it would not fit anything on the test data; if that were true, then when I predict using that same test data, it should not give such high scores, right? Since I used a random_state value, the ShuffleSplit should have created the same splits for the grid fit and for the predict.
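For reference, GridSearchCV's refit behaviour matters here: with refit=True (the default), it refits best_estimator_ on all of the data passed to fit(), so predicting on indices drawn from that same data evaluates the model on points it was trained on. A minimal, self-contained sketch with synthetic data (not the Enron features) that shows this:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

features, labels = make_classification(n_samples=200, random_state=0)

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
gcv = GridSearchCV(SVC(), {'C': [1, 10]}, cv=cv)
gcv.fit(features, labels)  # best_estimator_ is refit here on ALL 200 samples

train_ind, test_ind = next(cv.split(features, labels))
pred = gcv.best_estimator_.predict(features[test_ind])

# Optimistic: every point in test_ind was already seen during the final refit
print(accuracy_score(labels[test_ind], pred))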

So, is using the same ShuffleSplit for both wrong?

Answer

Basically the grid search will (see the sketch after this list):

  • Try every combination of your parameter grid
  • For each of them, do a K-fold cross-validation
  • Select the best one.
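In other words, a rough, simplified equivalent of that loop (a sketch only, assuming a plain SVC and a small illustrative grid; the real GridSearchCV also handles scorers, parallelism, and refitting the winner on the full data):

from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC

features, labels = make_classification(n_samples=200, random_state=0)
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

param_grid = {'C': [1, 10], 'gamma': [0.01, 0.1]}
best_score, best_params = -np.inf, None

for values in product(*param_grid.values()):  # try every combination of the grid
    params = dict(zip(param_grid.keys(), values))
    scores = []
    for train_ind, test_ind in cv.split(features, labels):  # cross-validate this combination
        est = SVC(**params)
        est.fit(features[train_ind], labels[train_ind])
        scores.append(est.score(features[test_ind], labels[test_ind]))
    if np.mean(scores) > best_score:  # keep the best combination
        best_score, best_params = np.mean(scores), params

print(best_params, best_score)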

So your second case is the good one. Otherwise you are actually predicting data that you trained with (which is not the case in the second option; there you only keep the best parameters from your grid search).
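Concretely, the second pattern with the precision/recall calculation made explicit might look like this (a sketch assuming binary labels and NumPy arrays; pipe, clf_params, features and labels come from the question and are not defined here):

import numpy as np
from sklearn.metrics import precision_score, recall_score

precisions, recalls = [], []
for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    clf = gcv.best_estimator_   # the hyper-parameters chosen by the grid search
    clf.fit(x_train, y_train)   # refit on the training split only
    pred = clf.predict(x_test)  # evaluate on data unseen during this fit

    precisions.append(precision_score(y_test, pred))
    recalls.append(recall_score(y_test, pred))

print(np.mean(precisions), np.mean(recalls))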
