SuperLearner Predictions for Out-of-Sample Test Set
Problem description
The SuperLearner package in R returns predicted values for all observations in the training set under SL.predict, and also returns coefficients (coef) that weight the underlying algorithms making up the SuperLearner for each fold of the cross-validation. However, I cannot figure out how to use the package to get predicted values for an out-of-sample test set. For example, below is the toy example from the package manual. The only change I have made is to add a hold-out test set, X2 and Y2, at the end. How do I estimate predicted values for this out-of-sample test set based on the SuperLearner model fitted on the training set? And how can I save the model results so that I can estimate predicted values in the future from this same model?
library(SuperLearner)
set.seed(23432)
## training set
n <- 500
p <- 50
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)
# build Library and run Super Learner
SL.library <- c("SL.glm", "SL.randomForest", "SL.gam", "SL.polymars", "SL.mean")
## Not run:
test <- CV.SuperLearner(Y = Y, X = X, V = 10, SL.library = SL.library,
verbose = TRUE, method = "method.NNLS")
test
summary(test)
# Look at the coefficients across folds
coef(test)
## End(Not run)
###Added Test Set
X2 <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X2) <- paste("X", 1:p, sep="")
X2 <- data.frame(X2)
Y2 <- X2[, 1] + sqrt(abs(X2[, 2] * X2[, 3])) + X2[, 2] - X2[, 3] + rnorm(n)
Recommended answer
You can use the predict method for SuperLearner objects after estimating your model on all the data (CV.SuperLearner estimates the model on several subsets of the data, not on the whole data).
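# fit the ensemble once on the full training set
r <- SuperLearner(Y = Y, X = X, SL.library = SL.library, verbose = TRUE, method = "method.NNLS")
# predict on the hold-out set and compare with the observed outcomes
plot( Y2 ~ predict(r, newdata=X2)$pred )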
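The answer above does not cover the second part of the question, saving the model for later use. A minimal sketch of one way to do this uses base R's saveRDS and readRDS on the fitted object. It assumes the fit was made with the default control settings, which keep the fitted library inside the object. The smaller data set, the reduced library (SL.glm and SL.mean only, to avoid extra package dependencies), the helper make_data, and the temporary file path are illustrative choices, not part of the original example.

```r
library(SuperLearner)

set.seed(23432)
n <- 200
p <- 5

# Hypothetical helper: simulate data with the same structure as the toy example
make_data <- function(n, p) {
  X <- data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  colnames(X) <- paste("X", 1:p, sep = "")
  Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)
  list(X = X, Y = Y)
}

train <- make_data(n, p)
holdout <- make_data(n, p)

# Fit the ensemble once on the full training set
fit <- SuperLearner(Y = train$Y, X = train$X,
                    SL.library = c("SL.glm", "SL.mean"),
                    method = "method.NNLS")

# Out-of-sample predictions; onlySL = TRUE skips learners with zero weight
pred <- predict(fit, newdata = holdout$X, onlySL = TRUE)$pred

# Save the fitted object, then restore it as a later session would
model_file <- tempfile(fileext = ".rds")  # illustrative path
saveRDS(fit, model_file)
fit_restored <- readRDS(model_file)
pred_restored <- predict(fit_restored, newdata = holdout$X, onlySL = TRUE)$pred

# Hold-out mean squared error of the ensemble
holdout_mse <- mean((holdout$Y - pred)^2)
```

The restored object should give the same predictions as the original fit. Some library wrappers may also need the original training data passed to predict through its X and Y arguments, so it is worth checking the predict.SuperLearner help page for the learners you use.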