Logistic Regression Tuning Parameter Grid in R Caret Package?


Question

I am trying to fit a logistic regression model in R using the caret package. I have done the following:

model <- train(dec_var ~., data=vars, method="glm", family="binomial",
                 trControl = ctrl, tuneGrid=expand.grid(C=c(0.001, 0.01, 0.1, 1,10,100, 1000)))

However, I am unsure what the tuning parameter should be for this model, and I am having a difficult time finding it. I assumed it is C because C is the parameter used in sklearn. Currently, I am getting the following error:

Error: The tuning parameter grid should have columns parameter

Do you have any suggestions on how to fix this?

Recommended answer

Per Max Kuhn's web book (search for method = 'glm'), there is no tuning parameter for glm within caret.

We can easily verify this by testing a few basic train calls. First, let's start with a method (rpart) that does have a tuning parameter (cp) according to the web book.

library(caret)
data(GermanCredit)

# Check tuning parameter via `modelLookup` (matches up with the web book)
modelLookup('rpart')
#  model parameter                label forReg forClass probModel
#1 rpart        cp Complexity Parameter   TRUE     TRUE      TRUE

# Observe that the `cp` parameter is tuned
set.seed(1)
model_rpart <- train(Class ~., data=GermanCredit, method='rpart')
model_rpart
#CART 

#1000 samples
#  61 predictor
#   2 classes: 'Bad', 'Good' 

#No pre-processing
#Resampling: Bootstrapped (25 reps) 
#Summary of sample sizes: 1000, 1000, 1000, 1000, 1000, 1000, ... 
#Resampling results across tuning parameters:

#  cp          Accuracy   Kappa    
#  0.01555556  0.7091276  0.2398993
#  0.03000000  0.7025574  0.1950021
#  0.04444444  0.6991700  0.1316720

#Accuracy was used to select the optimal model using  the largest value.
#The final value used for the model was cp = 0.01555556.

We see that the cp parameter was tuned. Now let's try glm.

# Check tuning parameter via `modelLookup` (shows a parameter called 'parameter')
modelLookup('glm')
#  model parameter     label forReg forClass probModel
#1   glm parameter parameter   TRUE     TRUE      TRUE

# Try out the train function to see if 'parameter' gets tuned
set.seed(1)
model_glm <- train(Class ~., data=GermanCredit, method='glm')
model_glm
#Generalized Linear Model 

#1000 samples
#  61 predictor
#   2 classes: 'Bad', 'Good' 

#No pre-processing
#Resampling: Bootstrapped (25 reps) 
#Summary of sample sizes: 1000, 1000, 1000, 1000, 1000, 1000, ... 
#Resampling results:

#  Accuracy   Kappa    
#  0.7386384  0.3478527

With glm above, no parameter tuning was performed. In my experience, the parameter named parameter appears to be just a placeholder and not a real tuning parameter. As the code that follows demonstrates, even if we try to force train to tune parameter, it effectively evaluates only a single value.

set.seed(1)
model_glm2 <- train(Class ~., data=GermanCredit, method='glm',
                    tuneGrid=expand.grid(parameter=c(0.001, 0.01, 0.1, 1,10,100, 1000)))
model_glm2
#Generalized Linear Model 

#1000 samples
#  61 predictor
#   2 classes: 'Bad', 'Good' 

#No pre-processing
#Resampling: Bootstrapped (25 reps) 
#Summary of sample sizes: 1000, 1000, 1000, 1000, 1000, 1000, ... 
#Resampling results across tuning parameters:

#  Accuracy   Kappa      parameter
#  0.7386384  0.3478527  0.001    
#  0.7386384  0.3478527  0.001    
#  0.7386384  0.3478527  0.001    
#  0.7386384  0.3478527  0.001    
#  0.7386384  0.3478527  0.001    
#  0.7386384  0.3478527  0.001    
#  0.7386384  0.3478527  0.001    

#Accuracy was used to select the optimal model using  the largest value.
#The final value used for the model was parameter = 0.001.
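
Putting this together, the call from the question can be sketched without tuneGrid. The snippet below is a minimal sketch rather than a definitive fix: it uses GermanCredit in place of the question's data, and the 5-fold cross-validation setup stands in for the question's ctrl object, whose definition was not shown. If a penalty similar to sklearn's C is what you are after, one option (an assumption about the goal, not something stated in the question) is method = 'glmnet', which caret tunes over alpha and lambda. lambda is a penalty strength, so it plays roughly the inverse role of C, and the grid values here are purely illustrative.

```r
library(caret)
data(GermanCredit)

# Stand-in for the question's `ctrl`: 5-fold cross-validation (arbitrary choice)
ctrl <- trainControl(method = "cv", number = 5)

# Plain logistic regression: glm has nothing to tune, so no tuneGrid is passed
set.seed(1)
model_glm_cv <- train(Class ~ ., data = GermanCredit, method = "glm",
                      family = "binomial", trControl = ctrl)

# Penalized logistic regression (requires the glmnet package to be installed).
# The grid columns must be named after the method's tuning parameters,
# which for glmnet are alpha and lambda.
grid <- expand.grid(alpha = c(0, 0.5, 1),
                    lambda = c(0.001, 0.01, 0.1, 1))
set.seed(1)
model_glmnet <- train(Class ~ ., data = GermanCredit, method = "glmnet",
                      trControl = ctrl, tuneGrid = grid)

# Inspect which alpha/lambda combination was selected
model_glmnet$bestTune
```

The column names passed to expand.grid have to match the parameter column reported by modelLookup for the chosen method, which is why a grid with a column named C triggers the error shown in the question.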
