SuperLearner Predictions for Out-of-Sample Test Set

Question

The SuperLearner package in R returns predicted values for every observation in the training set under SL.predict, along with the coefficients (coef) that weight the underlying algorithms in each cross-validation fold. However, I cannot figure out how to use the package to get predicted values for an out-of-sample test set. For example, below is the toy example from the package manual; the only change I have made is to add a held-out test set, X2 and Y2, at the end. How do I obtain predicted values for this out-of-sample test set from the SuperLearner model fit on the training set? And how can I save the fitted model so that I can generate predictions from the same model in the future?

library(SuperLearner)


set.seed(23432)
## training set
n <- 500
p <- 50
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)
# build Library and run Super Learner
SL.library <- c("SL.glm", "SL.randomForest", "SL.gam", "SL.polymars", "SL.mean")
## Not run:
test <- CV.SuperLearner(Y = Y, X = X, V = 10, SL.library = SL.library,
  verbose = TRUE, method = "method.NNLS")
test
summary(test)
# Look at the coefficients across folds
coef(test)
## End(Not run)

## added held-out test set
X2 <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X2) <- paste("X", 1:p, sep="")
X2 <- data.frame(X2)
Y2 <- X2[, 1] + sqrt(abs(X2[, 2] * X2[, 3])) + X2[, 2] - X2[, 3] + rnorm(n)

Recommended answer

You can use the predict method for SuperLearner objects after estimating your model on all the data (CV.SuperLearner estimates the model on several subsets of the data, not the whole data).

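# fit the ensemble once on the full training set
r <- SuperLearner(Y = Y, X = X, SL.library = SL.library, verbose = TRUE, method = "method.NNLS")
# predict for the held-out rows and compare against the observed outcomes
plot(Y2 ~ predict(r, newdata = X2)$pred)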
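The question also asks how to keep the fitted model for later use, which the answer above does not cover. One possible approach, sketched here as an assumption rather than something the package requires, is to serialize the fitted object with base R's saveRDS and restore it with readRDS. The example is self-contained and uses a smaller simulated data set and a two-learner library (SL.glm, SL.mean) so that it runs without extra packages; the names fit, X_new, Y_new, and model_path are illustrative and do not come from the original post.

```r
library(SuperLearner)

set.seed(1)
n <- 200

# small simulated training set (illustrative, not the data from the question)
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
Y <- X$X1 + X$X2^2 + rnorm(n)

# held-out test set drawn from the same process
X_new <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
Y_new <- X_new$X1 + X_new$X2^2 + rnorm(n)

# fit the ensemble on the full training set
fit <- SuperLearner(Y = Y, X = X,
                    SL.library = c("SL.glm", "SL.mean"),
                    method = "method.NNLS")

# out-of-sample predictions and test-set mean squared error
pred <- predict(fit, newdata = X_new)$pred
mse <- mean((Y_new - pred)^2)

# save the fitted object, then restore it as a later session would
model_path <- tempfile(fileext = ".rds")  # hypothetical location
saveRDS(fit, model_path)
fit_restored <- readRDS(model_path)

# the restored model should reproduce the same predictions
pred_restored <- predict(fit_restored, newdata = X_new)$pred
stopifnot(isTRUE(all.equal(pred, pred_restored)))
```

In a new R session, the SuperLearner package, plus any packages the chosen learners depend on, would need to be loaded before calling predict on the restored object.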
