Feature selection with caret rfe and training with another method


Question

Right now, I'm trying to use the caret rfe function to perform feature selection, because I'm in a situation with p >> n, and most regression techniques that don't involve some sort of regularisation can't be used well. I have already used a few techniques with regularisation (the lasso), but what I want to try now is to reduce the number of features so that I can run, at least decently, any kind of regression algorithm on them.

control <- rfeControl(functions=rfFuncs, method="cv", number=5)
model <- rfe(trainX, trainY, rfeControl=control)
predict(model, testX)

Right now, if I do it like this, a feature selection algorithm using random forest will be run, and then the model with the best set of features, according to the 5-fold cross-validation, will be used for the prediction, right?

I'm curious about two things here: 1) Is there an easy way to take the set of features and train another function on it than the one used for the feature selection? For example, reducing the number of features from 500 to the 20 or so that seem most important, and then applying k-nearest neighbours.

I'm imagining an easy way to do it that would look like this:

control <- rfeControl(functions=rfFuncs, method="cv", number=5)
model <- rfe(trainX, trainY, method = "knn", rfeControl=control)
predict(model, testX)

2) Is there a way to tune the parameters of the feature selection algorithm? I would like to have some control over the value of mtry, the same way you can pass a grid of values when using the train function from caret. Is there a way to do such a thing with rfe?

Answer

Here is a short example of how to perform rfe with an inbuilt model:

library(caret)
library(mlbench) #for the data
data(Sonar)

rctrl1 <- rfeControl(method = "cv",
                     number = 3,
                     returnResamp = "all",
                     functions = caretFuncs,
                     saveDetails = TRUE)

model <- rfe(Class ~ ., data = Sonar,
             sizes = c(1, 5, 10, 15),
             method = "knn",
             trControl = trainControl(method = "cv",
                                      classProbs = TRUE),
             tuneGrid = data.frame(k = 1:10),
             rfeControl = rctrl1)

model
#output
Recursive feature selection

Outer resampling method: Cross-Validated (3 fold) 

Resampling performance over subset size:

 Variables Accuracy  Kappa AccuracySD KappaSD Selected
         1   0.6006 0.1984    0.06783 0.14047         
         5   0.7113 0.4160    0.04034 0.08261         
        10   0.7357 0.4638    0.01989 0.03967         
        15   0.7741 0.5417    0.05981 0.12001        *
        60   0.7696 0.5318    0.06405 0.13031         

The top 5 variables (out of 15):
   V11, V12, V10, V49, V9

model$fit$results
#output
    k  Accuracy     Kappa AccuracySD   KappaSD
1   1 0.8082684 0.6121666 0.07402575 0.1483508
2   2 0.8089610 0.6141450 0.10222599 0.2051025
3   3 0.8173377 0.6315411 0.07004865 0.1401424
4   4 0.7842208 0.5651094 0.08956707 0.1761045
5   5 0.7941775 0.5845479 0.07367886 0.1482536
6   6 0.7841775 0.5640338 0.06729946 0.1361090
7   7 0.7932468 0.5821317 0.07545889 0.1536220
8   8 0.7687229 0.5333385 0.05164023 0.1051902
9   9 0.7982468 0.5918922 0.07461116 0.1526814
10 10 0.8030087 0.6024680 0.06117471 0.1229467
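
As a usage note (sketched against the Sonar fit above, using standard caret accessors): the selected variable names, the underlying train object, and predictions on new data can all be pulled from the rfe result. In Sonar, Class is column 61, so dropping it stands in here for a test set.

```r
# The rfe object exposes the selected variables and the final tuned fit.
predictors(model)   # names of the selected variables
model$fit           # the underlying train() object (knn, tuned over k)

# predict() applies the final model to new data; here the training
# frame minus the Class column is used purely for illustration.
predict(model, Sonar[, -61])
```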

For more customization, see:

https://topepo.github.io/caret/recursive-feature-elimination.html
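
On question 2, the same caretFuncs route used above should also cover tuning the feature-selection model itself: extra arguments passed through rfe's ... (such as method and tuneGrid) are forwarded to train. A minimal sketch, assuming the trainX/trainY objects from the question and an illustrative mtry grid:

```r
library(caret)

# Sketch: with functions = caretFuncs, rfe forwards method and tuneGrid
# to train(), so a grid of mtry values can be tried for random forest.
control <- rfeControl(functions = caretFuncs, method = "cv", number = 5)

model <- rfe(trainX, trainY,
             sizes = c(5, 10, 20),                       # subset sizes to test
             method = "rf",                              # inner model
             tuneGrid = data.frame(mtry = c(2, 5, 10)),  # illustrative grid
             rfeControl = control)
```

With the built-in rfFuncs, by contrast, a single fixed mtry can likely be passed through ... to randomForest, but evaluating a grid of values requires going through caretFuncs as above.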

