Scikit-Learn RFECV number of features based on grid scores only


Problem Description

From the scikit-learn RFE documentation, successively smaller sets of features are selected by the algorithm and only the features with the highest weights are preserved. Features with low weights are dropped and this process repeats itself until the number of features remaining matches that specified by the user (or is taken to be half of the original number of features by default).
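The pruning loop described above can be sketched directly. This is an illustrative re-implementation, not scikit-learn's actual code; the dataset and estimator are arbitrary choices:

```python
# Illustrative sketch of plain RFE: repeatedly fit the estimator,
# rank features by their weights, and drop the single weakest one
# until half of the original features remain (the RFE default).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
keep = np.arange(X.shape[1])   # indices of the surviving features
target = X.shape[1] // 2       # default: half the original feature count

while len(keep) > target:
    svc = SVC(kernel="linear").fit(X[:, keep], y)
    weights = np.abs(svc.coef_).sum(axis=0)      # one weight per feature
    keep = np.delete(keep, np.argmin(weights))   # drop the weakest feature

print("Surviving features:", sorted(keep.tolist()))
```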

The RFECV docs indicate that the features are ranked with RFE and KFCV.

Here is the RFECV example from the docs:

from sklearn.svm import SVC
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV, RFE
from sklearn.datasets import make_classification

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000, n_features=25, n_informative=3,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

# Create the RFE object and compute a cross-validated score.
svc = SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(y, 2), scoring='accuracy')
rfecv.fit(X, y)
rfe = RFE(estimator=svc, step=1)
rfe.fit(X, y)

print('Original number of features is %s' % X.shape[1])
print("RFE final number of features : %d" % rfe.n_features_)
print("RFECV final number of features : %d" % rfecv.n_features_)
print('')

import numpy as np
g_scores = rfecv.grid_scores_
indices = np.argsort(g_scores)[::-1]
print('Printing RFECV results:')
for f in range(X.shape[1]):
    print("%d. Number of features: %d; Grid_Score: %f"
          % (f + 1, indices[f] + 1, g_scores[indices[f]]))

Here is the output I get:

Original number of features is 25
RFE final number of features : 12
RFECV final number of features : 3

Printing RFECV results:
1. Number of features: 3; Grid_Score: 0.818041
2. Number of features: 4; Grid_Score: 0.816065
3. Number of features: 5; Grid_Score: 0.816053
4. Number of features: 6; Grid_Score: 0.799107
5. Number of features: 7; Grid_Score: 0.797047
6. Number of features: 8; Grid_Score: 0.783034
7. Number of features: 10; Grid_Score: 0.783022
8. Number of features: 9; Grid_Score: 0.781992
9. Number of features: 11; Grid_Score: 0.778028
10. Number of features: 12; Grid_Score: 0.774052
11. Number of features: 14; Grid_Score: 0.762015
12. Number of features: 13; Grid_Score: 0.760075
13. Number of features: 15; Grid_Score: 0.752003
14. Number of features: 16; Grid_Score: 0.750015
15. Number of features: 18; Grid_Score: 0.750003
16. Number of features: 22; Grid_Score: 0.748039
17. Number of features: 17; Grid_Score: 0.746003
18. Number of features: 19; Grid_Score: 0.739105
19. Number of features: 20; Grid_Score: 0.739021
20. Number of features: 21; Grid_Score: 0.738003
21. Number of features: 23; Grid_Score: 0.729068
22. Number of features: 25; Grid_Score: 0.725056
23. Number of features: 24; Grid_Score: 0.725044
24. Number of features: 2; Grid_Score: 0.506952
25. Number of features: 1; Grid_Score: 0.272896

In this particular example:

  1. For RFE: the code always returns 12 features (roughly half of the 25 features, as expected from the docs)
  2. For RFECV: the code returns a different number between 1-25 (not half the number of features)

It seems to me that when RFECV is being selected, the number of features is being picked only based on the KFCV scores - i.e. the cross validation scores are over-riding RFE's successive pruning of features.

Is this true? If one would like to use the native recursive feature elimination algorithm, then is RFECV using this algorithm or is it using a hybrid version of it?

In RFECV, is the cross-validation being done on the subset of features remaining after pruning? If so, how many features are kept after each prune in RFECV?

Answer

In the cross-validated version, the features are re-ranked at each step and the lowest-ranked feature is dropped -- this is referred to as "recursive feature elimination" in the docs.

If you want to compare this to the naive version, you'll need to compute the cross-validated score for the features selected by RFE. My guess is that the RFECV answer is correct -- judging from the sharp increase in model performance as features are removed, you probably have some highly correlated features that are harming your model's performance.
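One way to carry out that comparison is to cross-validate a pipeline that runs plain RFE before the classifier, so elimination is repeated inside each fold. A sketch with an arbitrary small dataset:

```python
# Cross-validate naive RFE (half the features, as in the question) wrapped
# in a pipeline, so feature elimination is re-run inside each fold; the
# resulting mean accuracy can be compared against RFECV's grid scores.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)
pipe = make_pipeline(RFE(SVC(kernel="linear"), step=1),
                     SVC(kernel="linear"))
scores = cross_val_score(pipe, X, y, cv=2, scoring="accuracy")
print("Naive RFE mean CV accuracy: %.3f" % scores.mean())
```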
