Cross validation for glm() models

Question
I'm trying to do a 10-fold cross validation for some glm models that I have built earlier in R. I'm a little confused about the cv.glm() function in the boot package, although I've read a lot of help files. When I provide the following formula:
library(boot)
cv.glm(data, glmfit, K=10)
Does the "data" argument here refer to the whole dataset or only to the test set?
The examples I have seen so far provide the "data" argument as the test set, but that does not really make sense: why run 10 folds on the same test set? They would all give exactly the same result (I assume!).
Unfortunately, ?cv.glm only explains it as:
data: A matrix or data frame containing the data. The rows should be cases and the columns correspond to variables, one of which is the response
My other question would be about the $delta[1]
result. Is this the average prediction error over the 10 trials? What if I want to get the error for each fold?
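For reference, a minimal sketch of how cv.glm() is typically called on the full dataset rather than a held-out test set (assuming a data frame data with a 0/1 response groupcol and predictors var1–var3, the names used in the script below):

```r
library(boot)

# Fit the model on the FULL dataset; cv.glm() handles the
# train/test splitting internally for each of the K folds.
model <- glm(groupcol ~ var1 + var2 + var3,
             family = "binomial", data = data)

# Cost function for a binary response (misclassification rate);
# without it, cv.glm() defaults to average squared error.
cost <- function(y, pred) mean(abs(y - pred) > 0.5)

cv.res <- cv.glm(data, model, cost, K = 10)
cv.res$delta[1]  # cross-validated estimate of prediction error
```

Note that cv.glm() only returns the aggregated delta estimates; it does not expose the error for each individual fold.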
Here is what my script looks like:
##data partitioning
sub <- sample(nrow(data), floor(nrow(data) * 0.9))
training <- data[sub, ]
testing <- data[-sub, ]
##model building
model <- glm(formula = groupcol ~ var1 + var2 + var3,
family = "binomial", data = training)
##cross-validation
cv.glm(testing, model, K=10)
Answer
I am always a little cautious about using various packages' built-in 10-fold cross-validation methods. I have my own simple script to create the test and training partitions manually for any machine learning package:
#Randomly shuffle the data
yourData <- yourData[sample(nrow(yourData)), ]
#Create 10 equally sized folds
folds <- cut(seq(1, nrow(yourData)), breaks=10, labels=FALSE)
#Perform 10-fold cross validation
for(i in 1:10){
  #Segment your data by fold using the which() function
  testIndexes <- which(folds==i, arr.ind=TRUE)
  testData <- yourData[testIndexes, ]
  trainData <- yourData[-testIndexes, ]
  #Use the test and train data partitions however you desire...
}
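This structure also answers the per-fold error question: each iteration has its own train/test split, so the error can be recorded per fold. A sketch filling in the loop body, assuming the same binomial glm as in the question and that groupcol is coded 0/1 (these names come from the question, not from any package):

```r
foldErrors <- numeric(10)
for(i in 1:10){
  testIndexes <- which(folds == i)
  testData  <- yourData[testIndexes, ]
  trainData <- yourData[-testIndexes, ]
  # Refit the model on the training portion of this fold
  fit <- glm(groupcol ~ var1 + var2 + var3,
             family = "binomial", data = trainData)
  # Predicted probabilities on the held-out fold
  pred <- predict(fit, newdata = testData, type = "response")
  # Misclassification rate for this fold (0.5 threshold)
  foldErrors[i] <- mean((pred > 0.5) != (testData$groupcol == 1))
}
foldErrors        # error for each individual fold
mean(foldErrors)  # aggregate, comparable to cv.glm()$delta[1]
```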