model.matrix() with na.action=NULL?
Question
I have a formula and a data frame, and I want to extract the model.matrix(). However, I need the resulting matrix to include the NAs that were found in the original dataset. If I were using model.frame() for this, I would simply pass it na.action=NULL. However, the output I need is in the model.matrix() format. Specifically, I need only the right-hand side variables, I need the output to be a matrix (not a data frame), and I need factors to be converted to a series of dummy variables.
I'm sure I could hack something together using loops or similar, but I was wondering if anyone could suggest a cleaner and more efficient workaround. Thanks a lot for your time!
Here is an example:
dat <- data.frame(matrix(rnorm(20),5,4), gl(5,2))
dat[3,5] <- NA
names(dat) <- c(letters[1:4], 'fact')
ff <- a ~ b + fact
# This omits the row with a missing observation on the factor
model.matrix(ff, dat)
# This keeps the NA, but it gives me a data frame and does not dichotomize the factor
model.frame(ff, dat, na.action=NULL)
This is what I would like to obtain:
   (Intercept)          b fact2 fact3 fact4 fact5
1            1  0.7266086     0     0     0     0
2            1 -0.6088697     0     0     0     0
3           NA  0.4643360    NA    NA    NA    NA
4            1 -1.1666248     1     0     0     0
5            1 -0.7577394     0     1     0     0
6            1  0.7266086     0     1     0     0
7            1 -0.6088697     0     0     1     0
8            1  0.4643360     0     0     1     0
9            1 -1.1666248     0     0     0     1
10           1 -0.7577394     0     0     0     1
Recommended answer
You can tweak the model.matrix object a little, based on the row names:
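MM <- model.matrix(ff, dat)                      # drops row 3 (NA in fact)
MM <- MM[match(rownames(dat), rownames(MM)), ]   # re-expand; the dropped row becomes all NA
MM[, "b"] <- dat$b                               # restore the observed values of b
rownames(MM) <- rownames(dat)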
This gives:
> MM
   (Intercept)         b fact2 fact3 fact4 fact5
1            1 0.9583010     0     0     0     0
2            1 0.3266986     0     0     0     0
3           NA 1.4992358    NA    NA    NA    NA
4            1 1.2867461     1     0     0     0
5            1 0.5024700     0     1     0     0
6            1 0.9583010     0     1     0     0
7            1 0.3266986     0     0     1     0
8            1 1.4992358     0     0     1     0
9            1 1.2867461     0     0     0     1
10           1 0.5024700     0     0     0     1
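The row-name trick relies on two pieces of base R behaviour: match() returns NA for names it cannot find, and indexing a matrix with an NA row index yields a row of NAs. A minimal sketch on a toy matrix (the objects m, all_rows, idx and expanded are made up for illustration and are not part of the answer):

```r
# Toy matrix whose row "3" is missing, as after model.matrix() dropped it.
m <- matrix(1:6, nrow = 3, dimnames = list(c("1", "2", "4"), c("x", "y")))
all_rows <- c("1", "2", "3", "4")

# Position of every wanted row name in m; NA where the row is absent.
idx <- match(all_rows, rownames(m))

# Indexing with NA inserts an all-NA row at that position.
expanded <- m[idx, , drop = FALSE]

# The NA index also produces an NA row name, so restore the names.
rownames(expanded) <- all_rows
```

This is why the answer resets rownames(MM) after the match() step.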
Alternatively, you can use contrasts() to do the work for you. Constructing the matrix by hand would look like this:
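# Look up the contrast row for each observation; an NA level gives an NA row
cont <- contrasts(dat$fact)[as.numeric(dat$fact), ]
colnames(cont) <- paste("fact", colnames(cont), sep = "")
out <- cbind(1, dat$b, cont)
out[is.na(dat$fact), 1] <- NA                    # blank the intercept where fact is missing
colnames(out)[1:2] <- c("Intercept", "b")
rownames(out) <- rownames(dat)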
This gives:
> out
   Intercept          b fact2 fact3 fact4 fact5
1          1  0.2534288     0     0     0     0
2          1  0.2697760     0     0     0     0
3         NA -0.8236879    NA    NA    NA    NA
4          1 -0.6053445     1     0     0     0
5          1  0.4608907     0     1     0     0
6          1  0.2534288     0     1     0     0
7          1  0.2697760     0     0     1     0
8          1 -0.8236879     0     0     1     0
9          1 -0.6053445     0     0     0     1
10         1  0.4608907     0     0     0     1
In any case, both methods can be incorporated into a function that can deal with more complex formulae. I leave that as an exercise for the reader (how I loathe that sentence when I meet it in a paper ;-) )
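For what it's worth, here is one way such a function might look. Note that this sketch does not use either method from the answer: it swaps in a different technique, building the model frame with na.action = na.pass and handing that frame to model.matrix(). The function name model_matrix_keep_na and its na_intercept argument are hypothetical, chosen only for illustration. By default the intercept column stays 1 in incomplete rows; setting na_intercept = TRUE blanks it to mimic the output requested in the question.

```r
# Hypothetical helper; the name and the na_intercept argument are illustrative.
model_matrix_keep_na <- function(formula, data, na_intercept = FALSE) {
  # Keep only the right-hand side of the formula.
  rhs <- delete.response(terms(formula, data = data))
  # Build the model frame without dropping incomplete rows.
  mf <- model.frame(rhs, data, na.action = na.pass)
  # Expand factors into dummy columns; NA factor values give NA dummies.
  mm <- model.matrix(rhs, mf)
  # Optionally set the intercept to NA in rows that contain any NA.
  if (na_intercept && "(Intercept)" %in% colnames(mm)) {
    mm[!complete.cases(mf), "(Intercept)"] <- NA
  }
  mm
}

# Same construction as the example data in the question.
set.seed(1)
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")

out <- model_matrix_keep_na(a ~ b + fact, dat, na_intercept = TRUE)
```

Because the formula is passed through terms() with the data, this should also cope with interactions or a `.` on the right-hand side, though I have only tried it on the simple case above.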