Why can't I use cv.glm on the output of bestglm?

Posted: 2020/5/4 3:21:04 · Tags: r, machine-learning, logistic-regression, cross-validation


Problem description

I am trying to do best subset selection on the wine dataset, and then I want to estimate the test error rate using 10-fold CV. The code I used is:

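library(bestglm)   # best subset selection
library(boot)      # cv.glm

# Misclassification rate at a 0.5 threshold: observed 0/1 response vs. predicted probability
cost1 <- function(good, pi=0) mean(abs(good-pi) > 0.5)
res.best.logistic <-
    bestglm(Xy = winedata,
            family = binomial,          # binomial family for logistic regression
            IC = "AIC",                 # information criterion
            method = "exhaustive")
res.best.logistic$BestModels
best.cv.err <- cv.glm(winedata, res.best.logistic$BestModel, cost1, K=10)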

However, this gives the error:

Error in UseMethod("family") : no applicable method for 'family' applied to an object of class "NULL"

I thought that $BestModel is the lm object that represents the best fit, and that is what the manual says too. If so, why can't I estimate the test error on it with 10-fold CV using cv.glm?

The dataset is the white wine dataset from https://archive.ics.uci.edu/ml/datasets/Wine+Quality. The packages used are boot (for cv.glm) and bestglm.

The data was preprocessed as follows:

winedata <- read.delim("winequality-white.csv", sep = ';')
# Recode quality as a binary factor in one step: "1" if quality >= 7, "0" otherwise.
# (Assigning "0"/"1" strings in two passes turns the column into character
# and makes the second comparison a string comparison.)
winedata$quality <- factor(as.integer(winedata$quality >= 7))
names(winedata)[names(winedata) == "quality"] <- "good"      # rename 'quality' to 'good'

Recommended answer

bestglm rearranges your data and names the response variable y, so the call stored in the fit refers to y rather than good. cv.glm refits the model by re-evaluating that stored call on subsets of the data you pass in. winedata has no column y, so the refit fails and everything crashes after that.
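The failure mode can be reproduced without bestglm. The sketch below is an illustration on simulated data, not bestglm's actual internals: the helper fit_inside and the columns x1 and x2 are hypothetical, and it mimics only the renaming of the response to y inside a wrapper.

```r
library(boot)

set.seed(1)
# Simulated stand-in for winedata; x1, x2 are hypothetical predictors.
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$good <- rbinom(200, 1, plogis(dat$x1))

# Hypothetical wrapper: copies the data, renames the response to y, then fits.
fit_inside <- function(df) {
  Xi <- data.frame(df[c("x1", "x2")], y = df$good)
  glm(y ~ ., family = binomial, data = Xi)
}
fit <- fit_inside(dat)

# The stored call mentions y and Xi, neither of which exists outside the wrapper.
fit$call

# cv.glm re-evaluates fit$call with data replaced by subsets of dat,
# which has a column good but no column y, so the refit errors out.
res <- try(cv.glm(dat, fit, K = 5), silent = TRUE)
inherits(res, "try-error")
```

The fit itself is fine; only the stored call is unusable outside the function that created it.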

It is always worth checking the class of the object first:

class(res.best.logistic$BestModel)
[1] "glm" "lm"

But look at the call stored in res.best.logistic$BestModel, and at the model frame it was fitted on:

res.best.logistic$BestModel$call

glm(formula = y ~ ., family = family, data = Xi, weights = weights)

head(res.best.logistic$BestModel$model)
  y fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
1 0           7.0             0.27        0.36           20.7     0.045
2 0           6.3             0.30        0.34            1.6     0.049
3 0           8.1             0.28        0.40            6.9     0.050
4 0           7.2             0.23        0.32            8.5     0.058
5 0           7.2             0.23        0.32            8.5     0.058
6 0           8.1             0.28        0.40            6.9     0.050
  free.sulfur.dioxide density   pH sulphates
1                  45  1.0010 3.00      0.45
2                  14  0.9940 3.30      0.49
3                  30  0.9951 3.26      0.44
4                  47  0.9956 3.19      0.40
5                  47  0.9956 3.19      0.40
6                  30  0.9951 3.26      0.44
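One quick way to see the mismatch is to list the names the stored call depends on and compare them with the columns of the data. This sketch uses base R's all.vars on a quoted copy of the call printed above; the vector cols is an abbreviated stand-in for names(winedata), not the full column list.

```r
# Quoted copy of the call stored in the bestglm fit (copied from the printed call).
stored_call <- quote(glm(formula = y ~ ., family = family, data = Xi, weights = weights))

# Symbols the call refers to; function names such as glm are not included.
vars <- all.vars(stored_call)
vars

# Abbreviated stand-in for names(winedata), for illustration only.
cols <- c("fixed.acidity", "volatile.acidity", "pH", "sulphates", "good")

# Names the call needs that the data cannot supply ("." just means "all other columns").
missing_names <- setdiff(vars, c(cols, "."))
missing_names
```

Every name left in missing_names would have to be resolved somewhere else when the call is re-evaluated, which is why patching the call by hand gets messy.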

You could patch the names in the stored call, but that gets messy. Fitting is cheap, so refit the selected model on winedata and pass that fit to cv.glm:

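# BestModels has one logical column per predictor plus a final Criterion column;
# row 1 is taken to be the best model. Indexing row 1 directly avoids apply(),
# which returns a matrix instead of a list when all rows select the same number of variables.
best_row = res.best.logistic$BestModels[1, -ncol(winedata)]
# take the variable names for best model
best_var = names(best_row)[unlist(best_row)]
new_form = reformulate(best_var, response = "good")
fit = glm(new_form, data = winedata, family = "binomial")

best.cv.err <- cv.glm(winedata, fit, cost1, K=10)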
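The same refit-then-cross-validate pattern can be tried as a self-contained sketch on simulated data, so it runs without the wine file or bestglm. The data frame sim, its columns a, b, c, and the selected vector are hypothetical stand-ins for winedata and for whatever variables best subset selection picks.

```r
library(boot)

set.seed(42)
n <- 300
# Simulated stand-in for winedata: three predictors and a binary factor response.
sim <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
sim$good <- factor(rbinom(n, 1, plogis(1.5 * sim$a - sim$b)))

# Misclassification rate at a 0.5 threshold (same cost function as in the question).
cost1 <- function(good, pi = 0) mean(abs(good - pi) > 0.5)
cost1(c(0, 0, 1, 1), c(0.1, 0.7, 0.4, 0.9))  # 0.5: two of the four cases are misclassified

# Hypothetical result of variable selection.
selected <- c("a", "b")

# Refit on the original data so the stored call refers to names that exist there.
form <- reformulate(selected, response = "good")
fit <- glm(form, data = sim, family = binomial)

# 10-fold cross-validation of the refitted model.
cv <- cv.glm(sim, fit, cost = cost1, K = 10)
cv$delta
```

cv.glm returns delta, a vector of two numbers: the raw cross-validation estimate of the cost and a bias-adjusted version. With cost1, both are estimates of the misclassification rate. Note that this measures the error of the final model only; the variable selection step is not repeated inside each fold.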
