predict.glmnet: Some Factors Have Only One Level in New Data


Question

I've trained an elastic net model in R using glmnet and would like to use it to make predictions on a new data set.

But I'm having trouble producing the matrix to pass to the predict() method, because some of my factor variables (dummy variables indicating the presence of comorbidities) have only one level in the new data set (the comorbidities were never observed). That means I can't use

model.matrix(RESPONSE ~ ., new_data)

because it gives me the (expected) error

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

I'm at a loss for how to get around this issue. Is there a way in R to construct an appropriate matrix for use in predict() in this situation, or do I need to prepare the matrix outside of R? In either case, how might I go about doing it?

Here is a toy example that reproduces the issue I'm having:

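# Simulate one continuous predictor, two binary factors, and a binary outcome
x1 <- rnorm(100)
x2 <- as.factor(rbinom(100, 1, 0.6))
x3 <- as.factor(rbinom(100, 1, 0.4))
y <- rbinom(100, 1, 0.2)

toy_data <- data.frame(x1, x2, x3, y)
colnames(toy_data) <- c("Continuous", "FactorA", "FactorB", "Outcome")

# Training design matrix (intercept column dropped) and response
mat1 <- model.matrix(Outcome ~ ., toy_data)[, -1]
y1 <- toy_data$Outcome

# New data in which FactorB takes a single value, so it has only one level
new_data <- toy_data
new_data$FactorB <- as.factor(0)

#summary(new_data) # Just to verify that FactorB now only contains one level

# This call fails with the "contrasts can be applied only to factors ..." error
mat2 <- model.matrix(Outcome ~ ., new_data)[, -1]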

Recommended answer

In your example, you can set the levels of the factor in the new data set to match its levels in the complete data set. A factor can list a value among its levels even when that value never occurs in the variable.
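As a small illustration of that point, here is a sketch using a made-up vector `f` (not part of the question's data) whose only observed value is "0" but which declares two levels:

```r
# A factor whose only observed value is "0", but which declares two levels.
f <- factor(c("0", "0", "0"), levels = c("0", "1"))

nlevels(f)   # 2: both declared levels are counted
table(f)     # level "1" is listed with a count of 0
```

Because model.matrix() builds contrasts from the declared levels rather than the observed values, a factor like this should no longer trigger the error.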

You can do this with the levels argument of factor():

new_data$FactorB <- factor(0, levels = levels(toy_data$FactorB))

Or by assigning with the levels() function:

levels(new_data$FactorB) <- levels(toy_data$FactorB)
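One caveat may be worth keeping in mind: assigning with levels() relabels the existing levels by position, whereas factor(..., levels = ) matches by value. In the toy example the only observed value is "0", which is also the first training level, so both give the same result. The sketch below uses a hypothetical vector `only_ones` (not from the question) to show where they could differ:

```r
# Hypothetical case: the only value observed in the new data is "1".
train_levels <- c("0", "1")
only_ones <- factor(c("1", "1"))

# levels<- relabels by position, so the single level "1" is renamed to "0".
relabelled <- only_ones
levels(relabelled) <- train_levels
as.character(relabelled)   # "0" "0"

# factor(..., levels = ) matches by value, so "1" stays "1".
matched <- factor(as.character(only_ones), levels = train_levels)
as.character(matched)      # "1" "1"
```

If the single observed level might not be the first training level, the factor() form is probably the safer choice.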

With either approach, model.matrix() works properly because the factor now has more than one level:

head( model.matrix(Outcome ~ ., new_data)[,-1] )
   Continuous FactorA1 FactorB1
1 -1.91632972        0        0
2  1.11411267        0        0
3 -1.21333837        1        0
4 -0.06311276        0        0
5  1.31599915        0        0
6  0.36374591        1        0
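When several factors are affected, the same idea can be applied to every factor column at once. The following is a minimal sketch in base R; the helper name `align_factor_levels` and the small `train` and `new` data frames are assumptions made for illustration, not part of the original answer. It checks that the new design matrix ends up with the same columns as the training one, which is what predict() on a glmnet fit needs from `newx`.

```r
# Hypothetical helper: give every factor column in `new` the levels it had in `train`.
# Values not present in the training levels become NA.
align_factor_levels <- function(new, train) {
  for (nm in names(new)) {
    if (is.factor(train[[nm]])) {
      new[[nm]] <- factor(as.character(new[[nm]]), levels = levels(train[[nm]]))
    }
  }
  new
}

set.seed(1)
train <- data.frame(
  Continuous = rnorm(20),
  FactorA = factor(rep(c(0, 1), 10)),
  FactorB = factor(rep(c(0, 0, 1, 1), 5)),
  Outcome = rbinom(20, 1, 0.5)
)

# New data in which FactorB has collapsed to a single level
new <- train
new$FactorB <- factor(0)

new <- align_factor_levels(new, train)
mat_train <- model.matrix(Outcome ~ ., train)[, -1]
mat_new <- model.matrix(Outcome ~ ., new)[, -1]

# Both matrices should now have the same columns in the same order
identical(colnames(mat_train), colnames(mat_new))
```

The resulting `mat_new` could then be passed as the `newx` argument when predicting from the fitted model. Note that rows containing values unseen in training would become NA and be dropped by model.matrix() under the default NA handling, so it may be worth checking for them first.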
